Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug panic in datetime #6205

Open
robertbastian opened this issue Feb 27, 2025 · 6 comments
Open

Debug panic in datetime #6205

robertbastian opened this issue Feb 27, 2025 · 6 comments
Assignees
Labels
C-datetime Component: datetime, calendars, time zones T-bug Type: Bad behavior, security, privacy

Comments

@robertbastian
Copy link
Member

FixedCalendarDateTimeFormatter::try_new(locale!("ru").into(), fieldsets::YMDE::short()).unwrap()
    .format(&Date::try_new_gregorian(2020, 1, 1).unwrap())
    .to_string();
thread 'main' panicked at components/datetime/src/neo.rs:992:17:
unexpected error in FormattedDateTime: NamesNotLoaded(ErrorField(Field { symbol: Weekday(StandAlone), length: Three }))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@sffc
Copy link
Member

sffc commented Feb 27, 2025

Possibly related: #5892

@sffc
Copy link
Member

sffc commented Feb 27, 2025

Conveniently ru is a testdata locale.

https://github.com/unicode-org/icu4x/blob/main/provider/source/data/debug/datetime/GregorianDateNeoSkeletonPatternsV1/ym0de/ru.json

{
  "has_explicit_medium": true,
  "has_explicit_short": true,
  "variant_pattern_indices": [
    0,
    0,
    0,
    4,
    5,
    6
  ],
  "elements": [
    "EEEE, d MMMM y г.",
    "E, d MMM y г.",
    "ccc, dd.MM.y г.",
    "EEEE, d MMMM y г. GGG",
    "E, d MMM y г. GGG",
    "E, dd MMM y г. GGG"
  ]
}

So for whatever reason, the pattern in question has ccc instead of E. That part might be a bug (#5892). But, even so, the names are loaded based on the pattern, so the code should be able to handle this type of case.

I suspect that the issue might be an inconsistency between the year style variants. The data model is documented here:

https://unicode-org.github.io/icu4x/rustdoc/icu/datetime/provider/struct.PackedPatterns.html

For a Short length, all of these patterns are reachable depending on the input year:

  1. Abbreviated year: "ccc, dd.MM.y г."
  2. Full year: (inherit)
  3. Year with era: "E, dd MMM y г. GGG"

Aha, so the resolution of year style results in different data for the weekday field (and also the month field, but this is not a concern because one is numeric). This is problematic because ccc and E could be different names.

Two things that need to be fixed, and solutions:

  1. Fallback behavior if something like invalid data causes this to happen again:
    1. Keep status quo: debug-assert and GIGO behavior in prod
    2. Automatic, silent fallback from ccc to E or vice-versa
    3. Debug-assert sooner, in the DateTimeFormatter constructor, if multiple year styles require loading conflicting year names
    4. Same as above but return an error from the DateTimeFormatter constructor
  2. Behavior to handle the current case in Russian datetime formatting
    1. Adopt one of the solutions above and keep that behavior
    2. Change DateTimeNames to support multiple weekday lengths and load both lengths (Handle multiple lengths of the same field in DateTimeNames #4337)
    3. Fix datagen to massage the patterns if such cases happen, requiring that year style variations don't cause multiple different weekday or month spellout lengths to appear in data

I'm leaning toward 2iii and either 1i or 1iii.

Thoughts? @robertbastian @Manishearth @zbraniecki

@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Feb 27, 2025
@sffc sffc added T-bug Type: Bad behavior, security, privacy C-datetime Component: datetime, calendars, time zones labels Feb 27, 2025
@sffc
Copy link
Member

sffc commented Feb 27, 2025

For context: the weekday names for E == EEE

{
  "names": [
    "вс",
    "пн",
    "вт",
    "ср",
    "чт",
    "пт",
    "сб"
  ]
}

and for ccc

{
  "names": [
    "вс",
    "пн",
    "вт",
    "ср",
    "чт",
    "пт",
    "сб"
  ]
}

So they are currently the same, but of course this isn't guaranteed for ever moving forward.

@robertbastian
Copy link
Member Author

It's worrying that this happens in the format function, where your mantra so far has been "infallible if using field sets". We don't seem to have the test coverage to back up this claim, we test a handful of locales against a handful of field sets. Is it feasible to test all locales x calendars x field sets?

@Manishearth
Copy link
Member

That part might be a bug (#5892). But, even so, the names are loaded based on the pattern, so the code should be able to handle this type of case.

My immediate reaction to this is that this is a datagen concern, which should try to normalize it or something. Which seems to be 2iii?

Debug-assert sooner, in the DateTimeFormatter constructor, if multiple year styles require loading conflicting year names

If we keep the assertion, having it be sooner sounds correct to me.

Silent fallback sounds nice too, though.

@sffc
Copy link
Member

sffc commented Feb 27, 2025

It's worrying that this happens in the format function, where your mantra so far has been "infallible if using field sets".

The claim has been "infallible if using field sets and if data is valid." Which would infer that the data here is invalid, i.e. a datagen bug, unless this uncovers a case we hadn't considered before.

We don't seem to have the test coverage to back up this claim

No, we don't.

Once we get data-driven testing wired up with more locales, we should be able to catch more of these bugs.

Is it feasible to test all locales x calendars x field sets?

That wouldn't even catch this bug. We need to do that and also format dates covering all patterns (all year styles and time precisions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones T-bug Type: Bad behavior, security, privacy
Projects
None yet
Development

No branches or pull requests

3 participants