Teach source generation to reference more interesting types. #4244

chandlerc · 2024-08-23T02:34:18Z

This teaches our source generation tool to create interesting type references. This includes both referencing a weighted distribution of explicitly specified types and referencing types that are defined in the generated file.

Generating more interesting explicit types will exercise more of Carbon's prelude, but because C++ doesn't have an automatic prelude with fundamental types like `int64_t` or tuples, we include some minimal headers when generating the C++ analog. This likely makes the comparison more fair rather than less, as Carbon's toolchain isn't processing just the generated source, but also its prelude.
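As a rough illustration of the header handling described above, here is a minimal sketch. The `Language` enum, the `GeneratePreamble` name, and the specific headers (`<cstdint>`, `<tuple>`) are assumptions made for this example based on the `int64_t` and tuple mentions; they are not taken from the PR.

```cpp
#include <string>

// Hypothetical language selector, used only for this illustration.
enum class Language { Carbon, Cpp };

// Returns the text emitted at the top of a generated file. Carbon gets its
// fundamental types from the prelude, so nothing is added there. For C++ we
// assume two headers are enough to cover fixed-width integers and tuples.
auto GeneratePreamble(Language language) -> std::string {
  if (language == Language::Cpp) {
    return "#include <cstdint>\n#include <tuple>\n\n";
  }
  return "";
}
```

The preamble is kept separate from the generated declarations so that the two languages share the rest of the generation logic.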

The current set of fixed types is based primarily on the set of types that the toolchain currently implements and a set that seems reasonably interesting to exercise for compile-time performance. We want to try to cover things that should be optimized in the toolchain, even if a single source file might not typically hit all of them.

The weights of everything are completely arbitrary, based on intuition and some hand inspection of some random source files. There is also an intentional bias towards non-zero coverage, so the tail is much larger than it should be in reality. The result is that the weights reflect the _priority_ of optimizing compile time more than the _observed_ distribution in practice. We can refine the weighting scheme in the future though, potentially with multiple modes to separate coverage from maximally representative weights, etc. The goal is just to have a starting point.
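For readers unfamiliar with weighted selection, a minimal sketch of the idea follows. It uses `std::discrete_distribution` from the standard library; the `WeightedType` struct and the `SampleTypes` function are hypothetical names for this example and may not match how the source generator is actually structured.

```cpp
#include <random>
#include <string_view>
#include <vector>

// Hypothetical entry pairing a type spelling with a relative weight.
struct WeightedType {
  std::string_view name;
  double weight;
};

// Picks `count` type names, each chosen with probability proportional to its
// weight. An entry with weight zero is never selected.
auto SampleTypes(const std::vector<WeightedType>& types, int count,
                 std::mt19937& rng) -> std::vector<std::string_view> {
  std::vector<double> weights;
  weights.reserve(types.size());
  for (const auto& type : types) {
    weights.push_back(type.weight);
  }
  std::discrete_distribution<int> dist(weights.begin(), weights.end());

  std::vector<std::string_view> result;
  result.reserve(count);
  for (int i = 0; i < count; ++i) {
    result.push_back(types[dist(rng)].name);
  }
  return result;
}
```

Giving rare types a small but non-zero weight is what produces the larger-than-realistic tail mentioned above: every listed type has some chance of appearing in a sufficiently large file.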

The scheme for referencing the defined types requires some care and complexity. It has to avoid referencing types before they are defined while still referencing all of the defined types, and it has to keep the number of references stable even as the order is randomized to avoid fixed patterns in the source code.
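To make these constraints concrete, here is one simplified way they could be satisfied. This is a sketch under stated assumptions, not the scheme the PR implements: `AssignReferences` is a hypothetical function, classes are identified by their position in the emitted order, and each class other than the last is referenced exactly once by a randomly chosen later class.

```cpp
#include <random>
#include <vector>

// For `num_classes` classes emitted in index order, returns for each class the
// indices of the earlier classes it should reference.
auto AssignReferences(int num_classes, std::mt19937& rng)
    -> std::vector<std::vector<int>> {
  std::vector<std::vector<int>> refs(num_classes);
  // Each class except the last is referenced exactly once, by a randomly
  // chosen class that is defined after it. The total is always
  // `num_classes - 1` references, regardless of the random choices.
  for (int target = 0; target + 1 < num_classes; ++target) {
    std::uniform_int_distribution<int> later(target + 1, num_classes - 1);
    refs[later(rng)].push_back(target);
  }
  return refs;
}
```

In this simplified version the last class can never be referenced, because no class is defined after it; a real generator would need another place to reference it, such as a function emitted after all of the classes.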

All of this also triggered some minor refactoring of the state used to generate class definitions in the source generator. There are probably some good follow-on refactoring opportunities, but I'd prefer to leave those to future work.

I don't have any tests here because most of how this is observable is already tested -- the existing tests ensure the file sizes remain consistent and that the generated code is compiled correctly. But if folks have any ideas of useful tests here, happy to add them.


josh11b · 2024-08-23T18:42:49Z

testing/base/source_gen.cpp

+  auto public_function_param_counts() -> llvm::SmallVectorImpl<int>& {
+    return public_function_param_counts_;
+  }
+  auto public_method_param_counts() -> llvm::SmallVectorImpl<int>& {
+    return public_method_param_counts_;
+  }
+  auto private_function_param_counts() -> llvm::SmallVectorImpl<int>& {
+    return private_function_param_counts_;
+  }
+  auto private_method_param_counts() -> llvm::SmallVectorImpl<int>& {
+    return private_method_param_counts_;
+  }
+
+  auto class_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
+    return class_names_;
+  }
+  auto member_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
+    return member_names_;
+  }
+  auto param_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
+    return param_names_;
+  }
+
+  auto type_names() -> llvm::SmallVectorImpl<llvm::StringRef>& {
+    return type_names_;
+  }


These accessors aren't clearly adding value -- why not just make these members public and access them directly? Particularly since this class is local to this file.

Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow


chandlerc

Thanks, PTAL!

chandlerc · 2024-08-23T20:52:22Z

Because my understanding of the style guide precludes that. If I'm wrong, I'd be delighted for these to be public. @jonmeow


chandlerc · 2024-08-24T05:10:27Z

Thanks, merging! Will follow up on the accessors vs. public members and fix forward if needed.

github-actions bot added the toolchain label Aug 23, 2024

github-actions bot requested a review from josh11b August 23, 2024 02:34

josh11b reviewed Aug 23, 2024

chandlerc and others added 5 commits August 23, 2024 14:00

Apply suggestions from code review

d6a4e4a

Co-authored-by: josh11b <[email protected]>

code review

965e683

remove artifact

e31dcea

format

8ee0466

simplify based on review

ae28461

chandlerc commented Aug 23, 2024

chandlerc requested a review from josh11b August 23, 2024 21:09

josh11b reviewed Aug 23, 2024

chandlerc and others added 3 commits August 23, 2024 14:36

Apply suggestions from code review

99740a1

Co-authored-by: josh11b <[email protected]>

better wording

1dbc21d

format

f4f056a

chandlerc requested a review from josh11b August 23, 2024 21:42

josh11b reviewed Aug 24, 2024

josh11b approved these changes Aug 24, 2024

chandlerc and others added 2 commits August 23, 2024 22:06

Update testing/base/source_gen.cpp

d39cad5

Co-authored-by: josh11b <[email protected]>

format

6c0b0f5

CarbonInfraBot reviewed Aug 24, 2024

chandlerc enabled auto-merge August 24, 2024 05:10

chandlerc added this pull request to the merge queue Aug 24, 2024

Merged via the queue into carbon-language:trunk with commit d9cd385 Aug 24, 2024
8 checks passed

chandlerc deleted the gen-name-refs branch August 24, 2024 05:19
