Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teach source generation to reference more interesting types. #4244

Merged
merged 11 commits into from
Aug 24, 2024

Conversation

chandlerc
Copy link
Contributor

This teaches our source generation tool to create interesting type references. This include both referencing a weighted distribution of explicitly specified types, and referencing types that are being defined in the generated file.

Generating more interesting explicit types will exercise more of Carbon's prelude, but because C++ doesn't have an automatic prelude with fundamental types like int64_t or tuples, we include some minimal headers when generating the C++ analog. This likely makes the comparison more fair rather than less fair as Carbon's toolchain isn't processing just the generated source, but also its prelude.

The current set of fixed types is based primarily on the set of types that the toolchain currently implements and a set that seems reasonably interesting to exercise for compile time performance. We want to try to cover things that should be optimized in the toolchain, even if a single source file might not typically hit all of them.

The weights of everything are completely arbitrary, based on intuition and some hand inspection of some random source files. There is also an intentional bias towards non-zero coverage and so the tail is much larger than it should be in reality. The result is that the weights more reflect the priority of optimizing compile time than the observed distribution in practice. We can refine the weighting scheme in the future though, potentially with multiple modes to separate coverage from maximally representative weights, etc. The goal is just to have a starting point.

The scheme for referencing the defined types requires some care and complexity to avoid referencing types before they are defined while still referencing all of the types defined and ensuring the number of references is stable even as the order is randomized to avoid fixed patterns in the source code.

All of this also triggered some minor refactoring of the state used to generate class definitions in the source generator. There are probably some good follow-on refactoring opportunities, but I'd prefer to leave those to future work.

I don't have any tests here because most of how this is observable is already tested -- the existing tests ensure the file sizes remain consistent and that the generated code is compiled correctly. But if folks have any ideas of useful tests here, happy to add them.

This teaches our source generation tool to create interesting type
references. This include both referencing a weighted distribution of
explicitly specified types, and referencing types that are being defined
in the generated file.

Generating more interesting explicit types will exercise more of
Carbon's prelude, but because C++ doesn't have an automatic prelude with
fundamental types like `int64_t` or tuples, we include some minimal
headers when generating the C++ analog. This likely makes the comparison
more fair rather than less fair as Carbon's toolchain isn't processing
just the generated source, but also its prelude.

The current set of fixed types is based primarily on the set of types
that the toolchain currently implements and a set that seems reasonably
interesting to exercise for compile time performance. We want to try to
cover things that should be optimized in the toolchain, even if a single
source file might not typically hit all of them.

The weights of everything are completely arbitrary, based on intuition
and some hand inspection of some random source files. There is also an
intentional bias towards non-zero coverage and so the tail is much
larger than it should be in reality. The result is that the weights more
reflect the _priority_ of optimizing compile time than the _observed_
distribution in practice. We can refine the weighting scheme in the
future though, potentially with multiple modes to separate coverage from
maximally representative weights, etc. The goal is just to have
a starting point.

The scheme for referencing the defined types requires some care and
complexity to avoid referencing types before they are defined while
still referencing all of the types defined and ensuring the number of
references is stable even as the order is randomized to avoid fixed
patterns in the source code.

All of this also triggered some minor refactoring of the state used to
generate class definitions in the source generator. There are probably
some good follow-on refactoring opportunities, but I'd prefer to leave
those to future work.

I don't have any tests here because most of how this is observable is
already tested -- the existing tests ensure the file sizes remain
consistent and that the generated code is compiled correctly. But if
folks have any ideas of useful tests here, happy to add them.
@github-actions github-actions bot requested a review from josh11b August 23, 2024 02:34
Comment on lines +42 to +67
auto public_function_param_counts() -> llvm::SmallVectorImpl<int>& {
return public_function_param_counts_;
}
auto public_method_param_counts() -> llvm::SmallVectorImpl<int>& {
return public_method_param_counts_;
}
auto private_function_param_counts() -> llvm::SmallVectorImpl<int>& {
return private_function_param_counts_;
}
auto private_method_param_counts() -> llvm::SmallVectorImpl<int>& {
return private_method_param_counts_;
}

auto class_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return class_names_;
}
auto member_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return member_names_;
}
auto param_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return param_names_;
}

auto type_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return type_names_;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These accessors aren't clearly adding value -- why not just make these members public and access them directly? Particularly since this class is local to this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow

testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.h Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Show resolved Hide resolved
testing/base/source_gen.h Outdated Show resolved Hide resolved
testing/base/source_gen.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, PTAL!

Comment on lines +42 to +67
auto public_function_param_counts() -> llvm::SmallVectorImpl<int>& {
return public_function_param_counts_;
}
auto public_method_param_counts() -> llvm::SmallVectorImpl<int>& {
return public_method_param_counts_;
}
auto private_function_param_counts() -> llvm::SmallVectorImpl<int>& {
return private_function_param_counts_;
}
auto private_method_param_counts() -> llvm::SmallVectorImpl<int>& {
return private_method_param_counts_;
}

auto class_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return class_names_;
}
auto member_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return member_names_;
}
auto param_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return param_names_;
}

auto type_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
return type_names_;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow

testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.h Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Show resolved Hide resolved
@chandlerc chandlerc requested a review from josh11b August 23, 2024 21:09
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
testing/base/source_gen.cpp Outdated Show resolved Hide resolved
@chandlerc chandlerc requested a review from josh11b August 23, 2024 21:42
@chandlerc
Copy link
Contributor Author

Thanks, merging! Will follow up on the accessors vs. public members and fix forward if needed.

@chandlerc chandlerc enabled auto-merge August 24, 2024 05:10
@chandlerc chandlerc added this pull request to the merge queue Aug 24, 2024
Merged via the queue into carbon-language:trunk with commit d9cd385 Aug 24, 2024
8 checks passed
@chandlerc chandlerc deleted the gen-name-refs branch August 24, 2024 05:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants