Environment Parsing with Default Nested Class #347

kschwab · 2024-07-22T13:38:03Z

@hramezani, is the below case expected or a bug? I would have thought them to be equivalent. I came across this when updating the CLI handling of defaults. ~~I'll work to resolve it if it's a bug but wanted to confirm first before proceeding forward.~~

I added a potential fix if it is an issue.

import os
from pydantic import BaseModel, ValidationError
from pydantic_settings import BaseSettings

class NestedA(BaseModel):
    v0: bool
    v1: bool

class NestedB(BaseModel):
    v0: bool = False
    v1: bool = True

class SettingsGood(BaseSettings, env_nested_delimiter='__'):
    nested: NestedB = NestedB()

class SettingsBad(BaseSettings, env_nested_delimiter='__'):
    nested: NestedA = NestedA(v0=False, v1=True)

os.environ['NESTED__V0'] = 'True'
print(SettingsGood().model_dump())
"""
{'nested': {'v0': True, 'v1': True}}
"""

try:
    print(SettingsBad().model_dump())
except ValidationError as e:
    print(e)
"""
1 validation error for SettingsBad
nested.v1
  Field required [type=missing, input_value={'v0': 'True'}, input_type=dict]
"""

hramezani · 2024-07-23T09:36:10Z

Thanks @kschwab for reporting this.

Probably related to #345. Please check my answer.

BTW, it would be a breaking change as well. So, we can't merge it in V2.

kschwab · 2024-07-25T18:58:38Z

@hramezani actually, that is a slightly different issue. In #345, the user is providing a full object to be used, so I agree the result is expected. It would be analogous to the below if using environment variables:

DB1='{"env": "prd"}'

This issue relates to #244, or nested overrides but with class defaults. So, using the same example, it would be analogous to this:

DB1__ENV=prd

Similarly, as you noted (and just like #244), it is a breaking change and would have to be merged in with V3. The PR for this issue (#348) is failing because it depends on #244. If accepted, can we move this fix under #244 due to the coupling? That way the tests can pass etc.

It does put the CLI in a bit of a bind though. I resolved a bug where the CLI was not grabbing defaults correctly. Modifying the example to put it in context:

from os import environ
from pydantic import BaseModel
from pydantic_settings import BaseSettings


class DatabaseSettings(BaseModel):
    name: str
    env: str


class Settings(BaseSettings, cli_prog_name="example.py", env_nested_delimiter='__'):
    db1: DatabaseSettings = DatabaseSettings(name="database1", env="test")
    db2: DatabaseSettings = DatabaseSettings(name="database2", env="test")


Settings(_cli_parse_args=['--help'])
"""
usage: example.py [-h] [--db1 JSON] [--db1.name str] [--db1.env str] [--db2 JSON] [--db2.name str] [--db2.env str]

options:
  -h, --help      show this help message and exit

db1 options:
  --db1 JSON      set db1 from JSON string
  --db1.name str  (default: database1)
  --db1.env str   (default: test)                <-- Note the default is correctly populated from class default

db2 options:
  --db2 JSON      set db2 from JSON string
  --db2.name str  (default: database2)
  --db2.env str   (default: test)
"""

# We then hit our issue below when trying to use the optional arg
Settings(_cli_parse_args=['--db1.env=prd'])
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
 db1.name
   Field required [type=missing, input_value={'env': 'prd'}, input_type=dict]
     For further information visit https://errors.pydantic.dev/2.7/v/missing
"""

# And we get the same with environment variables
environ['DB1__ENV'] = 'prd'
Settings()
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
 db1.name
   Field required [type=missing, input_value={'env': 'prd'}, input_type=dict]
     For further information visit https://errors.pydantic.dev/2.7/v/missing
"""

It would be nice if we could find some kind of workaround in the interim.

hramezani · 2024-07-26T10:43:53Z

Thanks @kschwab for elaborating on this.

The current behavior is the same as pydantic. If we use BaseModel instead of BaseSettings in your example:

class SettingsBad(BaseModel):
    nested: NestedA = NestedA(v0=False, v1=True)

class NestedA(BaseModel):
    v0: bool
    v1: bool

Then passing initial value instead on envs,

print(SettingsBad(**{'nested': {'v0': True}}).model_dump())

we will get the same error. pydantic-settings only collects the values from different sources and pass it to pydantic

Also, the new DefaultSettingsSource code look complex to me and we really avoid adding complexity to pydantic-settings because of maintenance.

The problem that the new source settings and the _deep_update in #244 trying to address (if we can consider it as a problem) is not a serious problem and may introduce breaking change.

BTW, We can keep the PR open and reconsider it on V3

Thanks @kschwab for your understanding ❤️

kschwab · 2024-07-26T14:07:38Z

Sorry @hramezani, I may not be clear enough. This is an issue specific to this statement:

pydantic-settings only collects the values from different sources and pass it to pydantic

When "collecting values from different sources" the below should be equivalent but they're not:

import os
from pydantic import BaseModel
from pydantic_settings import BaseSettings, PydanticBaseSettingsSource


class NestedA(BaseModel):
    v0: bool
    v1: bool


class SettingsBad(BaseSettings, env_nested_delimiter='__'):
    nested: NestedA

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        return env_settings, init_settings, file_secret_settings


os.environ['NESTED__V0'] = 'True'
print(SettingsBad(nested={'v0': False, 'v1': True}).model_dump())    # passes
print(SettingsBad(nested=NestedA(v0=False, v1=True)).model_dump())   # fails

#244 fixes collection of values from a NestedA object while preserving typing of NestedA. i.e., the issue was: "not gathering values from a default model object".

This issue (and #345) similarly relate to: "not gathering values from a class default model object" (and I was wrong, the example provided in #345 was valid). This is because there is no class default settings source.

Given the above, are we still saying this isn't a pydantic settings issue?

the new DefaultSettingsSource code look complex to me

Fair enough. I'll look into pydantic internal and see if I can find an alias resolution function. That should simplify it.

hramezani · 2024-07-26T14:53:31Z

#244 fixes collection of values from a NestedA object while preserving typing of NestedA. i.e., the issue was: "not gathering values from a default model object".

I got your point.

as you know, pydantic-settings makes a dict and passes it as model params. so with your above example, here are the collected dict:

print(SettingsBad(nested={'v0': False, 'v1': True}).model_dump()) -> {'nested': {'v0': 'True', 'v1': True}}
print(SettingsBad(nested=NestedA(v0=False, v1=True)).model_dump()) -> {'nested': {'v0': 'True'}}

I agree with you that this is a pydantic-settings related thing and it would be good to fix it but:

It probably introduces a breaking change
it is complex and as a project maintainer I would prefer to not add a new settings source and this amount of code for this issue

Also, I can remember that you had a PR before to fix this problem but we reverted it because it was a breaking change.

can the TypeAdapter approach that you used on that PR solve the problem and helping us to get rid of all the complexity? if so, I think we can consider it on V3

kschwab · 2024-07-26T15:19:19Z

Now we're on the same page 😄

Yes, the original fix used a TypeAdapter, but then we were exposed to #241. i.e., a simple conversion to dict loses the original object type and reference (copy-by-value vs copy-by-reference). That's why we can't do a model_dump as proposed in #345 solution. We have to keep the original object, which was the cause of breaking change.

To solve that, deep_update has to be modified to preserve objects when encountered. #244 modifies deep_update such that when an object is encountered, it applies the updates to the original object. i.e., the original object type and reference are maintained. The only exception is if another source with a higher priority introduces a new object, in which case the higher priority object overrides the original.

In this sense, it isn't really a "breaking" change as was the case for the first PR, but it does modify a core piece of pydantic settings, so carries risk. This is why we moved it to V3 category.

I do not believe there is a simpler way to resolve it unfortunately.

hramezani · 2024-07-26T15:37:39Z

Agree that there is not a simple way to resolve it. at least with the current implementation of pydantic-settings I can't see an easy way.

I need to discuss with the team about this change and let you know what their feedback is.

Thanks again for your great work here ❤️

samuelcolvin · 2024-07-29T13:44:43Z

@kschwab thanks so much for reporting this.

I get where you're coming from on this being unexpected, but I'm inclined not to change the behavior, especially since the change proposed in #348 are quite extensive.

I think it would be better to update the documentation to explain why pydantic-settings behaves the way it does, and how to work around it.

kschwab · 2024-07-29T17:42:22Z

This may have gotten buried in the threads above, but the TL;DR of my request for this issue was:

Similarly, as you noted (and just like #244), it is a breaking change and would have to be merged in with V3. The PR for this issue (#348) is failing because it depends on #244. If accepted, can we move this fix under #244 due to the coupling? That way the tests can pass etc.

I mentioned it would be nice to have a workaround in V2 if possible, but I'm ok with noting it in the docs as a limitation.

The larger question I was trying to close on was whether this is accepted as an issue, and if we can expect it to be resolved now or in the future. @samuelcolvin I can't tell if your feedback is saying the current behavior will remain going forwards, or if it's just until resolution.

Lastly, with respect to #348, it really shouldn't be that extensive. I'm sure the team could do it better. It just needs to dump defaults by validation alias and not expand sub-models (i.e., keep them as objects).

hramezani · 2024-07-30T13:55:35Z

@kschwab

I mentioned it would be nice to have a workaround in V2 if possible, but I'm ok with noting it in the docs as a limitation.

I see, let's add this limitation to the doc for now.

The larger question I was trying to close on was whether this is accepted as an issue, and if we can expect it to be resolved now or in the future. @samuelcolvin I can't tell if your feedback is saying the current behavior will remain going forwards, or if it's just until resolution.

Yes, this is an issue. the fix that you provided is not easy. That's why we prefer not to merge it. it means this limitation will remain and we are looking for a simple fix for this (probably not possible to have an easy fix), Otherwise, we will prefer to keep the limitation.

BTW, we can keep the issue open and consider it in V3.

kschwab · 2024-07-30T16:57:19Z

Understood, I've closed the PR. This can be reevaluated in V3.

kschwab · 2024-08-09T16:41:30Z

@hramezani, I have an idea on how we can proceed forward with an interim solution.

The preferred easy fix is to use copy-by-value on default objects (i.e., the TypeAdapter solution). We can introduce a new config flag default_objects_copy_by_value that defaults to False, which preserves backwards compatibility but allows for users to disable copy-by-reference in favor of copy-by-value if needed.

Then, in V3, provided a more robust solution is found, the default_objects_copy_by_value config could be deprecated.

Thoughts?

hramezani · 2024-08-11T15:18:38Z

@kschwab what kind of problem will this flag solve? what problems still remain after this flag?

kschwab · 2024-08-11T23:09:56Z

what kind of problem will this flag solve?

When enabled - It will solve this issue, #244, #345 and #363. It does this by allowing copy-by-value instead of copy-by-reference.

When disabled - It will use current behavior of copy-by-reference. This keeps backwards compatibility.

In summary, it is a compromise on behavior. copy-by-value fixes this issue but breaks #241, whereas copy-by-reference fixes #241 but breaks this issue. Having a flag to choose between them provides a solution that is easy to maintain while giving users a choice for what they want.

what problems still remain after this flag?

A better solution that handles both cases internally without the need for a flag. This is what we discussed above but is more complex. This can be tackled in V3.

Using a flag gives us a solution until V3. It is not the best solution since a user must choose either-or behavior, but it is better than no solution at all. I think it is a good compromise for an easy fix until we have something better in V3.

hramezani · 2024-08-12T11:47:46Z

Thanks @kschwab for the explanation.

Yeah, let's have the flag

pydantic-hooky bot assigned samuelcolvin Jul 22, 2024

pydantic-hooky bot added the unconfirmed label Jul 22, 2024

kschwab mentioned this issue Jul 22, 2024

Add nested_model_default_partial_update flag and DefaultSettingsSource #348

Merged

hramezani removed the unconfirmed label Jul 23, 2024

hramezani unassigned samuelcolvin Jul 23, 2024

hramezani mentioned this issue Aug 8, 2024

partial specify the sub model value does not work #363

Closed

kschwab mentioned this issue Aug 9, 2024

help message of sub model in basesetting do not show the proper default value if the top field has a default #362

Closed

hramezani closed this as completed in #348 Aug 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Environment Parsing with Default Nested Class #347

Environment Parsing with Default Nested Class #347

kschwab commented Jul 22, 2024 •

edited

Loading

hramezani commented Jul 23, 2024

kschwab commented Jul 25, 2024

hramezani commented Jul 26, 2024

kschwab commented Jul 26, 2024

hramezani commented Jul 26, 2024

kschwab commented Jul 26, 2024 •

edited

Loading

hramezani commented Jul 26, 2024

samuelcolvin commented Jul 29, 2024

kschwab commented Jul 29, 2024

hramezani commented Jul 30, 2024 •

edited

Loading

kschwab commented Jul 30, 2024

kschwab commented Aug 9, 2024 •

edited

Loading

hramezani commented Aug 11, 2024

kschwab commented Aug 11, 2024 •

edited

Loading

hramezani commented Aug 12, 2024

Environment Parsing with Default Nested Class #347

Environment Parsing with Default Nested Class #347

Comments

kschwab commented Jul 22, 2024 • edited Loading

hramezani commented Jul 23, 2024

kschwab commented Jul 25, 2024

hramezani commented Jul 26, 2024

kschwab commented Jul 26, 2024

hramezani commented Jul 26, 2024

kschwab commented Jul 26, 2024 • edited Loading

hramezani commented Jul 26, 2024

samuelcolvin commented Jul 29, 2024

kschwab commented Jul 29, 2024

hramezani commented Jul 30, 2024 • edited Loading

kschwab commented Jul 30, 2024

kschwab commented Aug 9, 2024 • edited Loading

hramezani commented Aug 11, 2024

kschwab commented Aug 11, 2024 • edited Loading

hramezani commented Aug 12, 2024

kschwab commented Jul 22, 2024 •

edited

Loading

kschwab commented Jul 26, 2024 •

edited

Loading

hramezani commented Jul 30, 2024 •

edited

Loading

kschwab commented Aug 9, 2024 •

edited

Loading

kschwab commented Aug 11, 2024 •

edited

Loading