Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_tap_class, and inspect fields instead of signature for data models #128

Merged
merged 40 commits into from
Mar 10, 2024
Merged

to_tap_class, and inspect fields instead of signature for data models #128

merged 40 commits into from
Mar 10, 2024

Conversation

kddubey
Copy link
Contributor

@kddubey kddubey commented Jan 2, 2024

(Updated summary of changes)

This is a WIP following up on #125. It should also close #112.

2 advantages to using tap_class_from_data_model instead of tapify:

  1. Gets field descriptions for the -h help string
  2. Returns a Tap class, which allows the user to override configure and process_args. See the example here.

Future refactor

I'm thinking we can allow tapify to return a Tap class instead of calling the input object.

from tap_class_from_data_model import _tap_class  # in this PR

def tapify(class_or_function, return_tap_class: bool=False, ...):
    fields_data = ... 
    # from inspecting the signature of class_or_function (as already done in tapify),
    # or by extracting field data from class_or_function if it's a data model (to get
    # the field descriptions)
    tap_class = _tap_class(fields_data)
    if return_tap_class:
        return tap_class
    # rest of tapify as before
    tap = tap_class(description=description, explicit_bool=explicit_bool)
    command_line_args = tap.parse_args()
    command_line_args_dict = command_line_args.as_dict()
    return class_or_function(**command_line_args_dict)

@martinjm97 What do you think of this refactor? Lmk if I should open a new issue for this, or a PR if that makes things more concrete.

For now, my plan is to focus on the todos above. The downside to merging in tap_class_from_data_model is that there would be 2 interfaces—tapify and tap_class_from_data_model—which "tapify" a Pydantic model or dataclass, when there could be 1: tapify.

@kddubey kddubey marked this pull request as draft January 2, 2024 06:28
@martinjm97
Copy link
Collaborator

Hi @kddubey,

The PR looks great so far! Thank you for all the work you put in!

@swansonk14 and I will meet to discuss this PR within a week.

My opinion is that both of the advantages that you listed of tap_class_from_data_model over tapify should be incorporated into tapify. As a result, I'm inclined to put the new support for Pydantic in tapify as well.

We'll keep you posted. Thank you again!

--Jesse

@martinjm97
Copy link
Collaborator

martinjm97 commented Jan 11, 2024

Hi @kddubey,

We are in support of two different functions convert_to_tap_class and tapify. Then convert_to_tap_class (a renaming of tap_class_from_data_model) produces a Tap class from a class, dataclass, function, or pydantic model and tapify would call convert_to_tap under the hood and then either instantiate the class or run the function.

Here's a version that we were working on https://github.com/swansonk14/typed-argument-parser/blob/convert_to_tap/tap/tapify.py, but we haven't had time to complete it. Any more contributions in this direction would be greatly appreciated! Thank you for what you have done already!

We'll revisit this PR in the next few weeks and find a way to get this into main!

Best,
JK (Jesse and Kyle)

@martinjm97 martinjm97 marked this pull request as ready for review January 13, 2024 17:07
setup.py Show resolved Hide resolved
tap/tapify.py Outdated Show resolved Hide resolved
@kddubey
Copy link
Contributor Author

kddubey commented Jan 18, 2024

@martinjm97 thank you for the feedback about convert_to_tap_class, and the typing fixes from a few days ago. I updated the PR. Current tests should be passing.

Remaining todos for me before I un-draft this PR and ask for a full review:

  • don't require pydantic
  • test on pydantic dataclasses and BaseModels in tests/test_tapify.py
  • test to_tap_class (rn it's just demo'd)
  • get this to work for pydantic v1
  • add to README

tap/tapify.py Show resolved Hide resolved
@martinjm97
Copy link
Collaborator

Thank you so much again! Wow!

@martinjm97
Copy link
Collaborator

martinjm97 commented Jan 24, 2024

Very sorry! @swansonk14 and I are just clearing out time to give this a proper code review. This is our number 1 priority on Tap until it's in. Thank you so much for your tremendous effort and extremely high quality work.

--Jesse

@kddubey
Copy link
Contributor Author

kddubey commented Jan 24, 2024

No worries, take your time :-)

@martinjm97
Copy link
Collaborator

Hi @kddubey,

Thank you for all the tremendous work you have put into this PR! We learned a lot from your use of pytest fixtures/parameterize and appreciated the organization of tapify and to_tap_class.

We read over all of your code and it looks great! However, when started testing we found a small bug. When we tapifyd a pydantic class and provided an unused argument, tapify did not raise an error during parsing even though it should have.

Dataclasses have the desired functionality:

from dataclasses import dataclass
from tap import Tap, tapify
@dataclass
class Model:
    id: int = 1

args = tapify(Model)
print(args.id)
>>> python demo_data_model.py --id 7 --junk 89
usage: demo_data_model.py [--id ID] [-h]
demo_data_model.py: error: unrecognized arguments: --junk 89

pydantic classes do not raise an error:

from tap import Tap, tapify
from pydantic import BaseModel
class Model(BaseModel):
    id: int = 1

args = tapify(Model)
print(args.id)
>>> python demo_data_model.py --id 7 --junk 89
7

We still want to spend more time testing your implementation before merging it into main.

Thank you so much again for all of this fantastic work! We plan to merge this in as soon as we've finished testing and (you or we) have fixed issues that arise.

Best,
JK

@kddubey
Copy link
Contributor Author

kddubey commented Jan 29, 2024

@martinjm97 see this comment and this comment. Note that, by default, a Pydantic BaseModel (surprisingly, to me at least) ignores extra fields that are passed to it; it doesn't raise an error. Should I change the two variables (mentioned in the first comment) to be False so that supplying extra arguments always raises an error?

tests/test_tapify.py Outdated Show resolved Hide resolved
@martinjm97
Copy link
Collaborator

Sorry for missing your comments! We'll be sure to read all the comments before reporting supposed bugs!

Thanks,
JK

@kddubey
Copy link
Contributor Author

kddubey commented Jan 29, 2024

No worries. That comment from a week ago wasn't really clear now that I'm reading it. And I appreciate that you're testing the code independently!

@kddubey
Copy link
Contributor Author

kddubey commented Feb 29, 2024

Hi @martinjm97, the last commit changes the Pydantic implementation to not allow extra arguments for Pydantic BaseModels by default. That way it's backwards compatible and sensible. Sorry for causing confusion.

I also added support for passing extra command line args to the pydantic model if the model is configured that way. This change is backwards compatible, i.e., you can already do this in main.

For example, for this script—

# demo_extra_args.py
from pydantic import BaseModel, ConfigDict
from tap import tapify

class User(BaseModel):
    model_config = ConfigDict(extra="allow")  # explicitly allow extra attributes

    name: str

if __name__ == "__main__":
    user = tapify(User)
    print(user)

—running

python demo_extra_args.py --name Tapify --number 12

prints

name='Tapify' number='12'

(Extra args get cast to string, which matches what happens in main. The user can add custom validation if they want to change this behavior.)

Removing the model_config = ... line will cause that last command to raise an error:

usage: demo_extra_args.py --name NAME [-h]
demo_extra_args.py: error: unrecognized arguments: --number 12

Lmk what you think. Thanks!

@martinjm97
Copy link
Collaborator

Hi @kddubey,

Thank you for all of the extensive testing and bearing with us! @swansonk14 and I went through all of the examples in https://docs.pydantic.dev/latest/concepts/fields/ and did our own testing. Everything that we could think of seems to be working.

Thank you again for all your hard work and excellent execution! We will merge it in. We plan to do another pass over code to make small changes to the code style and comments before including it in the next release.

Best,
JK

@martinjm97 martinjm97 merged commit 8147f75 into swansonk14:main Mar 10, 2024
15 checks passed
@kddubey
Copy link
Contributor Author

kddubey commented Mar 10, 2024

Sounds good, and feel free to lmk if you need any help or clarification on this code in the future :-)

@kddubey kddubey deleted the kddubey/from-data-model branch March 10, 2024 23:12
@martinjm97
Copy link
Collaborator

Thank you! It has been a pleasure to learn from your work and to work with you. Sorry for the slow turn around. Kyle and I are so happy this landed!

--Jesse

@tinkerware
Copy link

I'd love to see a new release that includes this change; looking forward to using Pydantic models with tapify.

@kddubey
Copy link
Contributor Author

kddubey commented Apr 5, 2024

@tinkerware tapify should already work w/ Pydantic models. It's just that the field descriptions are missing from the -h message.

@martinjm97
Copy link
Collaborator

We've been trying to find time to get it at least above the codecov threshold before making the new release. Sorry for the delay!

@kddubey
Copy link
Contributor Author

kddubey commented Apr 7, 2024

@martinjm97 Happy to help w/ that since I'm not as busy. And cuz I broke it lol :-)

I believe the root cause is that the coverage % is based on a coverage report generated by tests ran in an evironment which doesn't have pydantic installed. The best solution is to take the union of covered lines across the 3 tests in the test environments in the testing workflow (no pydantic, pydantic v1, pydantic v2), but I'm not sure if that feature is supported. Will look into it soon

@martinjm97
Copy link
Collaborator

@kddubey, that would be absolutely spectacular if possible. No worries if it doesn't work out though. Thank you again!

@kddubey kddubey mentioned this pull request Apr 7, 2024
@martinjm97
Copy link
Collaborator

@tinkerware, thanks to the work of @kddubey, the implementation was prepared for release. You can see it here: https://github.com/swansonk14/typed-argument-parser/releases/tag/v_1.10.0.

--Jesse

@tinkerware
Copy link

Great, now I can ditch the git-ref in my poetry dependencies list! I needed the changes in this PR so that I could use to_tap_class.

Thanks again to the OP and the maintainers for all the work and effort you put into this project!

@tinkerware
Copy link

tinkerware commented Apr 11, 2024

So, I wanted to write up a quick summary of how I'm using Pydantic and this project together to parse CLI flags, so it can be a pointer for others looking to do the same and for me to learn from comments.

I use Pydantic to parse configuration and to represent the directives of the CLI flags with a "command" abstraction:

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
  model_config = SettingsConfigDict(
        env_prefix="APP_",
        env_file='.env',
        env_file_encoding='utf-8',
        env_ignore_empty=True,
        str_strip_whitespace=True,
        revalidate_instances='always',
        frozen=True
    )

  api_key: str | None = None
  debug: bool = False
  ...

ShowOption: TypeAlias = Literal[...]
ShowInput: TypeAlias = Literal['all', ShowOption]

class AppCommand(BaseModel):
  """Makes hard things easy in a pretty way."""

  show: list[ShowInput] = Field(
    default=["sensible-default"],
    description="..."
  )
  ...

  @field_validator("show")
  @classmethod
  def validate_show(cls, show: list[ShowInput]) -> list[ShowOption]:
    if "all" in show:
      return [...]  # can expand convenience flags into underlying options
    return show


def parse_flags(*flags: str) -> tuple[AppConfig, AppCommand]:
  class AppFlags(to_tap_class(AppConfig), to_tap_class(AppCommand)):
    def configure() -> None:
      self.description = AppCommand.__doc__

   args = AppFlags().parse_args(args=flags or None).as_dict()
   return env_config(**args), AppCommand(**args)

This lets me override any configuration using CLI flags, and allows to use Pydantic validators to convert CLI flags to the data that the rest of the app expects.

Configuration overrides from CLI flags look something like this (e.g. the env_config function above):

try:
    _ENV_CONFIG = AppConfig()
except ValidationError as e:
    fail("Ensure all `APP_*` environment variables are set correctly: {}", e)


def env_config(**overrides: Any) -> AppConfig:
    if not overrides:
        return _ENV_CONFIG

    # The actual changes are overrides that are different from default values.
    changes = (
        _ENV_CONFIG
        .model_copy(update=overrides)
        .model_dump(exclude_defaults=True)
    )
    return AppConfig.model_validate(
      _ENV_CONFIG.model_copy(update=changes)
    )

This lets me pass a --debug flag in CLI, or define a APP_DEBUG=1 environment variable to control debug behavior, though it does not allow overriding an enabled debug config with a disabled debug CLI flag. That would require me to use the explicit bool option in Tap.parse_args in addition to tracking the set/not-set state of an attribute like a lot of other packages end up doing.

I hope this makes sense and gives you an idea of how others can make use of better integration with Pydantic.

@kddubey
Copy link
Contributor Author

kddubey commented Apr 11, 2024

This lets me override any configuration using CLI flags, and allows to use Pydantic validators to convert CLI flags to the data that the rest of the app expects.

Wow, very cool to see that level of sophistication @tinkerware. Thank you for sharing this example and explaining it!

@kddubey
Copy link
Contributor Author

kddubey commented Apr 11, 2024

@swansonk14 minor note: should the to_tap_class feature also be mentioned in the release notes?

@martinjm97
Copy link
Collaborator

I've updated the release notes with the example from @tinkerware as well as a discussion of the to_tap_class that @kddubey was gracious enough to remind us of. Thank you both for your contributions!

--Jesse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor tapify to enable subparsers
3 participants