Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

red-knot: infer string literal types #13113

Merged
merged 3 commits into from
Aug 26, 2024

Conversation

chriskrycho
Copy link
Contributor

@chriskrycho chriskrycho commented Aug 26, 2024

Summary

Introduce a StringLiteralType with corresponding Display type and a relatively basic test that the resulting representation is as expected.

Note: we currently always allocate for StringLiteral types. This may end up being a perf issue later, at which point we may want to look at other ways of representing value here, i.e. with some kind of smarter string structure which can reuse types. That is most likely to show up with e.g. concatenation.

Contributes to #12701.

Test Plan

Added a test for individual strings with both single and double quotes as well as concatenated strings with both forms.

Introduce a `StringLiteralType` with corresponding `Display` type and a
relatively basic test that the resulting representation is as expected.

Note: we currently always allocate for `StringLiteral` types. This may
end up being a perf issue later, at which point we may want to look at
other ways of representing `value` here, i.e. with some kind of smarter
string structure which can reuse types. That is most likely to show up
with e.g. concatenation.
Comment on lines 284 to 285
// TODO fix this comment with whatever it is we should say we
// want to do in the future.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this is probably something in the space of what we say for BytesLiteral?

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small changes, but generally looks great!!

crates/red_knot_python_semantic/src/types/display.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Aug 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@carljm carljm merged commit c4d628c into astral-sh:main Aug 26, 2024
20 checks passed
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! (Sorry for the post-merge review; I've been out for most of the day)

@@ -278,6 +280,10 @@ impl<'db> Type<'db> {
Type::Unknown
}
Type::BooleanLiteral(_) => Type::Unknown,
Type::StringLiteral(_) => {
// TODO defer to Type::Instance(<str from typeshed>).member
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a somewhat annoying and pedantic nitpick from me, but: this comment isn't strictly accurate. In between Literal[<some string value>] and str, there's another Python type: typing.LiteralString. Explaining exactly how LiteralString works would take quite a few words and isn't really relevant to this PR 😄 but for now I think I'd just use this as the TODO comment instead:

Suggested change
// TODO defer to Type::Instance(<str from typeshed>).member
// TODO defer to `typing.LiteralString`/`builtins.str`
// methods from typeshed's stubs

Copy link
Contributor

@carljm carljm Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the relevance to this specific TODO comment is that for many members of a string Literal, we can infer the return type as a LiteralString instead of str? This is true. It doesn't seem critical to me to reflect that detail in this TODO comment, though it certainly doesn't hurt. (The other nitpick on the TODO for both this and bytes is that for some methods we can probably safely infer an actual Literal type for the member, e.g. we can infer Literal["foo"].upper() to be Literal["FOO"] -- but I also didn't think we needed to mention that in the TODO.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, yeah -- as I say, it's kind of a pedantic nitpick. But given the problems that mypy has had with implementing LiteralString (well, it still doesn't actually have an implementation, because it requires rewriting a lot of assumptions across the code), I'd prefer for us to keep LiteralString in mind from the very beginning when thinking about implementing str and Literal[<some string value>] types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update in a follow-on PR tomorrow – and I am personally a big fan of pedantic nitpicks when it comes to stuff like this. 😅

#[salsa::interned]
pub struct StringLiteralType<'db> {
#[return_ref]
value: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use Box<str> here, to save a bit of space, since the data should always be immutable? Though it maybe wouldn't make much difference

Copy link
Contributor

@carljm carljm Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great call; this is probably worth doing, considering it saves an entire usize of memory, which is eight ascii characters (on a 64-bit system); it could halve the memory used for a lot of small string literal types. (Not really, if you consider the Salsa interning overhead as well. But still, not insignificant if there are lots of small string literals.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Yeah, I consistently forget that Box<str> is an option. 👍🏼

@chriskrycho chriskrycho deleted the rk-infer-string-literal branch August 26, 2024 19:16
@carljm carljm added the red-knot Multi-file analysis & type inference label Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants