-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Teach source generation to reference more interesting types. #4244
Conversation
This teaches our source generation tool to create interesting type references. This include both referencing a weighted distribution of explicitly specified types, and referencing types that are being defined in the generated file. Generating more interesting explicit types will exercise more of Carbon's prelude, but because C++ doesn't have an automatic prelude with fundamental types like `int64_t` or tuples, we include some minimal headers when generating the C++ analog. This likely makes the comparison more fair rather than less fair as Carbon's toolchain isn't processing just the generated source, but also its prelude. The current set of fixed types is based primarily on the set of types that the toolchain currently implements and a set that seems reasonably interesting to exercise for compile time performance. We want to try to cover things that should be optimized in the toolchain, even if a single source file might not typically hit all of them. The weights of everything are completely arbitrary, based on intuition and some hand inspection of some random source files. There is also an intentional bias towards non-zero coverage and so the tail is much larger than it should be in reality. The result is that the weights more reflect the _priority_ of optimizing compile time than the _observed_ distribution in practice. We can refine the weighting scheme in the future though, potentially with multiple modes to separate coverage from maximally representative weights, etc. The goal is just to have a starting point. The scheme for referencing the defined types requires some care and complexity to avoid referencing types before they are defined while still referencing all of the types defined and ensuring the number of references is stable even as the order is randomized to avoid fixed patterns in the source code. All of this also triggered some minor refactoring of the state used to generate class definitions in the source generator. There are probably some good follow-on refactoring opportunities, but I'd prefer to leave those to future work. I don't have any tests here because most of how this is observable is already tested -- the existing tests ensure the file sizes remain consistent and that the generated code is compiled correctly. But if folks have any ideas of useful tests here, happy to add them.
auto public_function_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return public_function_param_counts_; | ||
} | ||
auto public_method_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return public_method_param_counts_; | ||
} | ||
auto private_function_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return private_function_param_counts_; | ||
} | ||
auto private_method_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return private_method_param_counts_; | ||
} | ||
|
||
auto class_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return class_names_; | ||
} | ||
auto member_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return member_names_; | ||
} | ||
auto param_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return param_names_; | ||
} | ||
|
||
auto type_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return type_names_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These accessors aren't clearly adding value -- why not just make these members public and access them directly? Particularly since this class is local to this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow
Co-authored-by: josh11b <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, PTAL!
auto public_function_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return public_function_param_counts_; | ||
} | ||
auto public_method_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return public_method_param_counts_; | ||
} | ||
auto private_function_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return private_function_param_counts_; | ||
} | ||
auto private_method_param_counts() -> llvm::SmallVectorImpl<int>& { | ||
return private_method_param_counts_; | ||
} | ||
|
||
auto class_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return class_names_; | ||
} | ||
auto member_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return member_names_; | ||
} | ||
auto param_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return param_names_; | ||
} | ||
|
||
auto type_names() -> llvm::SmallVectorImpl<llvm::StringRef>& { | ||
return type_names_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow
Co-authored-by: josh11b <[email protected]>
Co-authored-by: josh11b <[email protected]>
Thanks, merging! Will follow up on the accessors vs. public members and fix forward if needed. |
This teaches our source generation tool to create interesting type references. This include both referencing a weighted distribution of explicitly specified types, and referencing types that are being defined in the generated file.
Generating more interesting explicit types will exercise more of Carbon's prelude, but because C++ doesn't have an automatic prelude with fundamental types like
int64_t
or tuples, we include some minimal headers when generating the C++ analog. This likely makes the comparison more fair rather than less fair as Carbon's toolchain isn't processing just the generated source, but also its prelude.The current set of fixed types is based primarily on the set of types that the toolchain currently implements and a set that seems reasonably interesting to exercise for compile time performance. We want to try to cover things that should be optimized in the toolchain, even if a single source file might not typically hit all of them.
The weights of everything are completely arbitrary, based on intuition and some hand inspection of some random source files. There is also an intentional bias towards non-zero coverage and so the tail is much larger than it should be in reality. The result is that the weights more reflect the priority of optimizing compile time than the observed distribution in practice. We can refine the weighting scheme in the future though, potentially with multiple modes to separate coverage from maximally representative weights, etc. The goal is just to have a starting point.
The scheme for referencing the defined types requires some care and complexity to avoid referencing types before they are defined while still referencing all of the types defined and ensuring the number of references is stable even as the order is randomized to avoid fixed patterns in the source code.
All of this also triggered some minor refactoring of the state used to generate class definitions in the source generator. There are probably some good follow-on refactoring opportunities, but I'd prefer to leave those to future work.
I don't have any tests here because most of how this is observable is already tested -- the existing tests ensure the file sizes remain consistent and that the generated code is compiled correctly. But if folks have any ideas of useful tests here, happy to add them.