Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Environment Parsing with Default Nested Class #347

Closed
kschwab opened this issue Jul 22, 2024 · 15 comments · Fixed by #348
Closed

Environment Parsing with Default Nested Class #347

kschwab opened this issue Jul 22, 2024 · 15 comments · Fixed by #348

Comments

@kschwab
Copy link
Contributor

kschwab commented Jul 22, 2024

@hramezani, is the below case expected or a bug? I would have thought them to be equivalent. I came across this when updating the CLI handling of defaults. I'll work to resolve it if it's a bug but wanted to confirm first before proceeding forward.

I added a potential fix if it is an issue.

import os
from pydantic import BaseModel, ValidationError
from pydantic_settings import BaseSettings

class NestedA(BaseModel):
    v0: bool
    v1: bool

class NestedB(BaseModel):
    v0: bool = False
    v1: bool = True

class SettingsGood(BaseSettings, env_nested_delimiter='__'):
    nested: NestedB = NestedB()

class SettingsBad(BaseSettings, env_nested_delimiter='__'):
    nested: NestedA = NestedA(v0=False, v1=True)

os.environ['NESTED__V0'] = 'True'
print(SettingsGood().model_dump())
"""
{'nested': {'v0': True, 'v1': True}}
"""

try:
    print(SettingsBad().model_dump())
except ValidationError as e:
    print(e)
"""
1 validation error for SettingsBad
nested.v1
  Field required [type=missing, input_value={'v0': 'True'}, input_type=dict]
"""
@hramezani
Copy link
Member

Thanks @kschwab for reporting this.

Probably related to #345. Please check my answer.

BTW, it would be a breaking change as well. So, we can't merge it in V2.

@kschwab
Copy link
Contributor Author

kschwab commented Jul 25, 2024

@hramezani actually, that is a slightly different issue. In #345, the user is providing a full object to be used, so I agree the result is expected. It would be analogous to the below if using environment variables:

DB1='{"env": "prd"}'

This issue relates to #244, or nested overrides but with class defaults. So, using the same example, it would be analogous to this:

DB1__ENV=prd

Similarly, as you noted (and just like #244), it is a breaking change and would have to be merged in with V3. The PR for this issue (#348) is failing because it depends on #244. If accepted, can we move this fix under #244 due to the coupling? That way the tests can pass etc.

It does put the CLI in a bit of a bind though. I resolved a bug where the CLI was not grabbing defaults correctly. Modifying the example to put it in context:

from os import environ
from pydantic import BaseModel
from pydantic_settings import BaseSettings


class DatabaseSettings(BaseModel):
    name: str
    env: str


class Settings(BaseSettings, cli_prog_name="example.py", env_nested_delimiter='__'):
    db1: DatabaseSettings = DatabaseSettings(name="database1", env="test")
    db2: DatabaseSettings = DatabaseSettings(name="database2", env="test")


Settings(_cli_parse_args=['--help'])
"""
usage: example.py [-h] [--db1 JSON] [--db1.name str] [--db1.env str] [--db2 JSON] [--db2.name str] [--db2.env str]

options:
  -h, --help      show this help message and exit

db1 options:
  --db1 JSON      set db1 from JSON string
  --db1.name str  (default: database1)
  --db1.env str   (default: test)                <-- Note the default is correctly populated from class default

db2 options:
  --db2 JSON      set db2 from JSON string
  --db2.name str  (default: database2)
  --db2.env str   (default: test)
"""

# We then hit our issue below when trying to use the optional arg
Settings(_cli_parse_args=['--db1.env=prd'])
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
 db1.name
   Field required [type=missing, input_value={'env': 'prd'}, input_type=dict]
     For further information visit https://errors.pydantic.dev/2.7/v/missing
"""

# And we get the same with environment variables
environ['DB1__ENV'] = 'prd'
Settings()
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
 db1.name
   Field required [type=missing, input_value={'env': 'prd'}, input_type=dict]
     For further information visit https://errors.pydantic.dev/2.7/v/missing
"""

It would be nice if we could find some kind of workaround in the interim.

@hramezani
Copy link
Member

Thanks @kschwab for elaborating on this.

The current behavior is the same as pydantic. If we use BaseModel instead of BaseSettings in your example:

class SettingsBad(BaseModel):
    nested: NestedA = NestedA(v0=False, v1=True)

class NestedA(BaseModel):
    v0: bool
    v1: bool

Then passing initial value instead on envs,

print(SettingsBad(**{'nested': {'v0': True}}).model_dump())

we will get the same error. pydantic-settings only collects the values from different sources and pass it to pydantic

Also, the new DefaultSettingsSource code look complex to me and we really avoid adding complexity to pydantic-settings because of maintenance.

The problem that the new source settings and the _deep_update in #244 trying to address (if we can consider it as a problem) is not a serious problem and may introduce breaking change.

BTW, We can keep the PR open and reconsider it on V3

Thanks @kschwab for your understanding ❤️

@kschwab
Copy link
Contributor Author

kschwab commented Jul 26, 2024

Sorry @hramezani, I may not be clear enough. This is an issue specific to this statement:

pydantic-settings only collects the values from different sources and pass it to pydantic

When "collecting values from different sources" the below should be equivalent but they're not:

import os
from pydantic import BaseModel
from pydantic_settings import BaseSettings, PydanticBaseSettingsSource


class NestedA(BaseModel):
    v0: bool
    v1: bool


class SettingsBad(BaseSettings, env_nested_delimiter='__'):
    nested: NestedA

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return env_settings, init_settings, file_secret_settings


os.environ['NESTED__V0'] = 'True'
print(SettingsBad(nested={'v0': False, 'v1': True}).model_dump())    # passes
print(SettingsBad(nested=NestedA(v0=False, v1=True)).model_dump())   # fails

#244 fixes collection of values from a NestedA object while preserving typing of NestedA. i.e., the issue was: "not gathering values from a default model object".

This issue (and #345) similarly relate to: "not gathering values from a class default model object" (and I was wrong, the example provided in #345 was valid). This is because there is no class default settings source.

Given the above, are we still saying this isn't a pydantic settings issue?

the new DefaultSettingsSource code look complex to me

Fair enough. I'll look into pydantic internal and see if I can find an alias resolution function. That should simplify it.

@hramezani
Copy link
Member

#244 fixes collection of values from a NestedA object while preserving typing of NestedA. i.e., the issue was: "not gathering values from a default model object".

I got your point.

as you know, pydantic-settings makes a dict and passes it as model params. so with your above example, here are the collected dict:

print(SettingsBad(nested={'v0': False, 'v1': True}).model_dump()) -> {'nested': {'v0': 'True', 'v1': True}}
print(SettingsBad(nested=NestedA(v0=False, v1=True)).model_dump()) -> {'nested': {'v0': 'True'}}

I agree with you that this is a pydantic-settings related thing and it would be good to fix it but:

  • It probably introduces a breaking change
  • it is complex and as a project maintainer I would prefer to not add a new settings source and this amount of code for this issue

Also, I can remember that you had a PR before to fix this problem but we reverted it because it was a breaking change.

can the TypeAdapter approach that you used on that PR solve the problem and helping us to get rid of all the complexity? if so, I think we can consider it on V3

@kschwab
Copy link
Contributor Author

kschwab commented Jul 26, 2024

Now we're on the same page 😄

Yes, the original fix used a TypeAdapter, but then we were exposed to #241. i.e., a simple conversion to dict loses the original object type and reference (copy-by-value vs copy-by-reference). That's why we can't do a model_dump as proposed in #345 solution. We have to keep the original object, which was the cause of breaking change.

To solve that, deep_update has to be modified to preserve objects when encountered. #244 modifies deep_update such that when an object is encountered, it applies the updates to the original object. i.e., the original object type and reference are maintained. The only exception is if another source with a higher priority introduces a new object, in which case the higher priority object overrides the original.

In this sense, it isn't really a "breaking" change as was the case for the first PR, but it does modify a core piece of pydantic settings, so carries risk. This is why we moved it to V3 category.

I do not believe there is a simpler way to resolve it unfortunately.

@hramezani
Copy link
Member

Agree that there is not a simple way to resolve it. at least with the current implementation of pydantic-settings I can't see an easy way.

I need to discuss with the team about this change and let you know what their feedback is.

Thanks again for your great work here ❤️

@samuelcolvin
Copy link
Member

@kschwab thanks so much for reporting this.

I get where you're coming from on this being unexpected, but I'm inclined not to change the behavior, especially since the change proposed in #348 are quite extensive.

I think it would be better to update the documentation to explain why pydantic-settings behaves the way it does, and how to work around it.

@kschwab
Copy link
Contributor Author

kschwab commented Jul 29, 2024

This may have gotten buried in the threads above, but the TL;DR of my request for this issue was:

Similarly, as you noted (and just like #244), it is a breaking change and would have to be merged in with V3. The PR for this issue (#348) is failing because it depends on #244. If accepted, can we move this fix under #244 due to the coupling? That way the tests can pass etc.

I mentioned it would be nice to have a workaround in V2 if possible, but I'm ok with noting it in the docs as a limitation.

The larger question I was trying to close on was whether this is accepted as an issue, and if we can expect it to be resolved now or in the future. @samuelcolvin I can't tell if your feedback is saying the current behavior will remain going forwards, or if it's just until resolution.

Lastly, with respect to #348, it really shouldn't be that extensive. I'm sure the team could do it better. It just needs to dump defaults by validation alias and not expand sub-models (i.e., keep them as objects).

@hramezani
Copy link
Member

hramezani commented Jul 30, 2024

@kschwab

I mentioned it would be nice to have a workaround in V2 if possible, but I'm ok with noting it in the docs as a limitation.

I see, let's add this limitation to the doc for now.

The larger question I was trying to close on was whether this is accepted as an issue, and if we can expect it to be resolved now or in the future. @samuelcolvin I can't tell if your feedback is saying the current behavior will remain going forwards, or if it's just until resolution.

Yes, this is an issue. the fix that you provided is not easy. That's why we prefer not to merge it. it means this limitation will remain and we are looking for a simple fix for this (probably not possible to have an easy fix), Otherwise, we will prefer to keep the limitation.

BTW, we can keep the issue open and consider it in V3.

@kschwab
Copy link
Contributor Author

kschwab commented Jul 30, 2024

Understood, I've closed the PR. This can be reevaluated in V3.

@kschwab
Copy link
Contributor Author

kschwab commented Aug 9, 2024

@hramezani, I have an idea on how we can proceed forward with an interim solution.

The preferred easy fix is to use copy-by-value on default objects (i.e., the TypeAdapter solution). We can introduce a new config flag default_objects_copy_by_value that defaults to False, which preserves backwards compatibility but allows for users to disable copy-by-reference in favor of copy-by-value if needed.

Then, in V3, provided a more robust solution is found, the default_objects_copy_by_value config could be deprecated.

Thoughts?

@hramezani
Copy link
Member

@kschwab what kind of problem will this flag solve? what problems still remain after this flag?

@kschwab
Copy link
Contributor Author

kschwab commented Aug 11, 2024

what kind of problem will this flag solve?

When enabled - It will solve this issue, #244, #345 and #363. It does this by allowing copy-by-value instead of copy-by-reference.

When disabled - It will use current behavior of copy-by-reference. This keeps backwards compatibility.

In summary, it is a compromise on behavior. copy-by-value fixes this issue but breaks #241, whereas copy-by-reference fixes #241 but breaks this issue. Having a flag to choose between them provides a solution that is easy to maintain while giving users a choice for what they want.

what problems still remain after this flag?

A better solution that handles both cases internally without the need for a flag. This is what we discussed above but is more complex. This can be tackled in V3.

Using a flag gives us a solution until V3. It is not the best solution since a user must choose either-or behavior, but it is better than no solution at all. I think it is a good compromise for an easy fix until we have something better in V3.

@hramezani
Copy link
Member

Thanks @kschwab for the explanation.

Yeah, let's have the flag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants