Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run the codebase through IWYU #79239

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

moxian
Copy link
Contributor

@moxian moxian commented Jan 19, 2025

Summary

None

Purpose of change

Hello
I heard IWYU was supposed to be good and useful, and I tried to apply it to the codebase.
This is the result.
Thank you.

But in seriousness, it's supposedly good because it tries to ensure the code "compiles" and not just "happens to compile". Which is probably a worthy goal.
It might also make the compile times a bit shorter, but see below on that.
(The actual original reason is that I was trying to see if C++20 modules would help with compile times, but I ran into the header mess, and thought that i could leverage IWYU to help me here...)

Describe the solution

Build IWYU from source (with llvm/clang-19, from clang_19 branch, on windows).
Run IWYU. Fix errors. Run it again. Fix new errors, add pragmas. Keep running and fixing things until it converges to a more or less stable state with no new suggestions
I've documented the steps to build it in tools/iwyu/running-on-windows.md. It's not the best location for it, but I can rearrange it in a followup pr.
There were a few "nontrivial" code changes (as in: not just touching the #include section), which i extracted into separate commits (first two).
The bulk of the change is just a mechanical run of the tool, and I don't think it's reasonably reviewable, sadly.

Describe alternatives you've considered

Not do this because it's just churn/line thrashing for little benefit. Although if that's the decision, then we should also document said decision in DEVELOPER_TOOLING.

I considered splitting it into several PRs but I don't think it makes reviewing any easier at all. And merge conflicts are surprisingly rare since people don't tend to touch includes much (I encountered just a single merge conflict so far). (I believe the only in-flight PR that will suffer is the save compression one...)
Splitting might also be a bit tricky due to the intertangled nature of the headers.
But let me know if this is desired, and I'll try to accomodate.

Another thing I tried to keep things reviewable was to separate changes into "automated" commits and "manual" fixups, but that ended up not helping in the slightest, so I just abandoned the idea, and dumped everything into a single megacommit.

Testing

CI passes in my fork moxian#9

Game runs for me locally on windows [at least, ran before the latest merge]

Build time impact

I've measured build times of the PR as reported by github CI when supplied a hacky workflow with ccache disabled. Then I compared those to a "reasonably fresh" commit (also with ccache disabled), and once more to a build at the exact same branching point as the staged IWYU changes. Here's the summary:

workflow original [w][m]
original2[w][m]
iwyu
[w][m]
delta %
Windows Build 35m 7s 33m 44s -83s -3.9%
36m 52s -3m8s -8.5%
Basic Build and Test
(Clang 10, Ubuntu, Curses)
33m 22s 32m 10s -72s -3.6%
32m 41s -31s -1.6%
Clang 12, Ubuntu, Tiles, ASan 41m 52s 42m 9s +17s +0.7%
43m 8s -59s -2.3%
Clang 14, macOS 12, Tiles, Sound,
x64 and arm64 Universal Binary
1h 24m 6s 1h 1m 26s -22m 40s -27%
79m 15s -17m 49s -22.5%
emscripten / Build 1h 41m 20s 1h 50m 3s +8m 43s +8.6%
1h 59m 31s +32s +0.5%
GCC 9, Curses, LTO 36m 14s 35m 8s -66s -3.0%
35m 58s -50s -2.3%
GCC 9, Ubuntu, Tiles, Sound, CMake 28m 7s 28m 14s +7s +0.4%
27m 59s +15s +0.9%
GCC 11, Ubuntu, Curses, ASan 49m 15s 48m 44s -31s -1.0%
50m 16s -92s -3.1%

My reading is that the build times only barely improved.

CI integration

It would be nice to add an IWYU check to github actions so as to prevent backsliding. However there are two big issues. 1) Sometimes IWYU suggests adding a header, and then, on a subsequent pass suggests removing it (with no other changes). 2) sometimes it gets hopelessly confused and suggests removing actual working code (for example it suggested to remove this whole section in generic_factory_test.cpp).

There was a suggestion in another pr (#73040 (comment)) to enable misc-include-cleaner check in our clang-tidy, but in my limited testing it seems that it's much more strict than IWYU. For example if you have two files

// file.h
#include <cstddef>
size_t return_number();
// file.cpp
#include "file.h"

size_t return_number() { return 42 }

then IWYU would be happy with that, whereas misc-include-cleaner would complain about the lack of size_t-providing header in file.cpp

Perhaps either or both of those issues are fixable, but I have not spent enough effort investigating.

Additional context

This is not fully IWYU-clean, there are a few leftovers left, but I really struggle doing finishing touches on a PR this large. The small fixes can happen in a followup PR I hope.

Having these be scattered across the codebase makes IWYU confused
and do silly things like suggest adding clzones.h when we want to
use a faction_id
.. multiple times, apply manual fixes as needed.
This is, understandably, unreviewable
@github-actions github-actions bot added <Documentation> Design documents, internal info, guides and help. NPC / Factions NPCs, AI, Speech, Factions, Ownership Info / User Interface Game - player communication, menus, etc. Translation I18n Items: Magazines Ammo holding items and objects. Missions Quests and missions Bionics CBM (Compact Bionic Modules) Map / Mapgen Overmap, Mapgen, Map extras, Map display Vehicles Vehicles, parts, mechanics & interactions Items: Food / Vitamins Comestibles and drinks Items: Battery / UPS Electric power management Crafting / Construction / Recipes Includes: Uncrafting / Disassembling Mutations / Traits / Professions/ Hobbies Mutations / Traits / Professions/ Hobbies Code: Tests Measurement, self-control, statistics, balancing. [C++] Changes (can be) made in C++. Previously named `Code` Monsters Monsters both friendly and unfriendly. Fields / Furniture / Terrain / Traps Objects that are part of the map or its features. [Markdown] Markdown issues and PRs Items: Ammo / Guns Ammunition for all kinds of weapons and these weapons themselves Melee Melee weapons, tactics, techniques, reach attack Scenarios New Scenarios, balancing, bugs with scenarios Character / World Generation Issues and enhancements concerning stages of creating a character or a world Player Faction Base / Camp All about the player faction base/camp/site Mechanics: Weather Rain, snow, portal storms and non-temperature environment labels Jan 19, 2025
@github-actions github-actions bot added Code: Tooling Tooling that is not part of the main game but is part of the repo. Mechanics: Enchantments / Spells Enchantments and spells Martial Arts Arts, Techniques, weapons and anything touching martial arts. Items: Armor / Clothing Armor and clothing Appliance/Power Grid Anything to do with appliances and power grid Limbs Limbs, mutable limbs, and code related to them. EOC: Effects On Condition Anything concerning Effects On Condition astyled astyled PR, label is assigned by github actions json-styled JSON lint passed, label assigned by github actions labels Jan 19, 2025
@moxian moxian marked this pull request as ready for review January 19, 2025 01:16
@github-actions github-actions bot requested review from dseguin and KorGgenT January 19, 2025 01:16
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-requesting reviews from non-collaborators: @jbytheway @wapcaplet @Qrox @andrei8l

@Maleclypse
Copy link
Member

Maleclypse commented Jan 19, 2025

@Brambor am I remembering correctly that you've done some of this before to help with header inclusions I think?

edit: Not suggesting we shouldn't do this, just that Brambor might be able to help with it.

src/system_locale.cpp Outdated Show resolved Hide resolved
@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Jan 19, 2025
@Brambor
Copy link
Contributor

Brambor commented Jan 19, 2025

If this works, I am all for it. This is a better approach than mine, which was half automatic. In my PR, I reviewed all the changes, but it probably isn't necessary.

Note for merger: merge this separately. If this is merged with something else, it might break the compilation. This might work better than (my) previous IWYU attempts, as this does the entire project.

Another thing that can be measured are includes per file rather than the whole project compilation time, as I did in #73040 section "Were changes useful?".

All in all, I cannot really help, moxian is doing better than me :)

@moxian
Copy link
Contributor Author

moxian commented Jan 19, 2025

Note for merger: merge this separately. If this is merged with something else, it might break the compilation.

It will break the compilation no matter what the merger does.
The only way to ensure this keeps compiling is to merge/rebase the branch, then wait for the CI to pass, then immediately merge it, but I don't think that approach is practical given how long it halts the PR queue.
And even then, other PRs merging in later might accidentally end up breaking the build (even without experiencing merge conflicts). Even if they are merged "separately". They'd need to go do this whole dance of rebase -> wait for CI -> merge themselves if we want to be 100% free from compilation errors. And that's just not practical.

I would much rather merge it whenever, and then fix the compilation errors separately. It should not be tricky.

The changes are based on the ca1949d which landed ~4 days ago. When merging master into the branch I encountered one merge conflict, and then had to fix one compilation error. We don't have a high enough change volume right now for the conflicts to be a problem.
IMO.

Another thing that can be measured are includes per file rather than the whole project compilation time

I don't think it's a useful metric, but sure

At ~master:

$ make includes SOUNDS=1 TILES=1 LOCALIZE=1 -j 4
(...)
$ find objwin/ -name '*.inc' | xargs grep "src/" | wc -l
51122

With the PR:

$ make includes SOUNDS=1 TILES=1 LOCALIZE=1 -j 4
(...)
$ find objwin/ -name '*.inc' | xargs grep "src/" | wc -l
87990

that's a 72% increase in the amount of includes. But that's kinda intended? As in, you could simply include "item.h" + "avatar.h" and get the majority of your symbols that way, but IWYU sorta explicitly wants that you don't do that, and that you actually do put #include "type_id.h" in your cpp files even though you could simply get it transitively trough item.h

The includes have a #pragma once on them so there's a negligible amount of extra time spent when they are processed multiple times. So the changes to multiplicity of an include doesn't really affect anything meaningful.

@github-actions github-actions bot removed the BasicBuildPassed This PR builds correctly, label assigned by github actions label Jan 19, 2025
@moxian moxian marked this pull request as draft January 19, 2025 04:48
@moxian moxian marked this pull request as ready for review January 19, 2025 04:48
@andrei8l
Copy link
Contributor

Since I got pinged here as well, I'll repeat my statement from the clang-tidy PR: it doesn't make sense to address these unless they're caught by our CI or they cause significant issues. Since your own measurements show that build time is not affected, a significant issue would be some nasty include tangle and I don't think we have that either (anymore).

We've had another PR like this 10 months ago and we don't need another 7k loc churn. Please focus on catching these in CI first.

@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Jan 19, 2025
@moxian moxian marked this pull request as draft January 19, 2025 22:25
@moxian
Copy link
Contributor Author

moxian commented Jan 26, 2025

I'm a bit tired and burned out, so I'm unsure when (if ever) i will revisit this. I'll leave the PR in draft hell for now (feel free to close if you're reading this sufficiently in the future).

Some notes for future me/future other people who'd want to revisit this:

  • CI integration is very doable. Just give it -o clang and it will spit the diagnostics in a way GitHub Actions understands. Here's a sample workflow link: moxian@29664ae (output (don't read too much into the specific errors, that's a WIP code still, obviously))
  • Some files need to be excluded from the check because iwyu bugs out on them, but it's a really short list (as in, something like 2-5 total)
  • CI integration is neccessarily done on linux, not windows. This is important because
  • .. there are slight discrepancies between output on linux and on windows. I have not investigated this enough, but it seems to not depend on the compiler, but rather the OS itself. I.e. you get ~same-ish output with on windows cmd with clang as you get on windows mingw with clang as you get on windows mingw with gcc, but you get different output entirely on WSL. And by "different output" i mean "iwyu sees completely different set of headers and resolves what should be included differently", and has nothing to do with our platform-specific defines a la #ifdef _WIN32.
  • iwyu says that it's "very GCC-centric". In practice this means that the output you get on windows is pretty bad. For example it suggests adding #include <list> for any file that's using iterators, because there's some list mention deep in the bowels of msvc iterator implementation. I was trying to be mindful of issues like these when I was spot-checking the changes, but I was not digilent enough, and lots of those seeped through. So this PR as is is unmergeable really. Thanks a lot @andrei8l for [indirectly] catching this!
  • One can make the output not terrible on windows, but it requires a lot of effort mapping all the symbols appropriately. Perhaps someone on the internets has already done said effort, and we could take their mappings, but i didn't investigate this really. Most I've seen is someone saying "i tried to do that, and I failed miserably".
    • perhaps that's still doable by hand really, but that's the part where I too gave up
  • Output on linux, however, is nice and clean and totally usable and (definitionally) congruent with CI.

So what needs to happen there is I just need to run the tool on WSL.
But I don't want to, because WSL is mildly misconfirugred on my machine and I'm too lazy to fix it, and too annoyed to use.
.. oh and the docs need to be updated to say not to try running on windows.

Thus, draft hell.

@Brambor
Copy link
Contributor

Brambor commented Jan 26, 2025

My addition to the problems:

Another thing that can be measured are includes per file rather than the whole project compilation time

I don't think it's a useful metric, but sure

At ~master:

$ make includes SOUNDS=1 TILES=1 LOCALIZE=1 -j 4
(...)
$ find objwin/ -name '*.inc' | xargs grep "src/" | wc -l
51122

With the PR:

$ make includes SOUNDS=1 TILES=1 LOCALIZE=1 -j 4
(...)
$ find objwin/ -name '*.inc' | xargs grep "src/" | wc -l
87990

that's a 72% increase in the amount of includes. But that's kinda intended? As in, you could simply include "item.h" + "avatar.h" and get the majority of your symbols that way, but IWYU sorta explicitly wants that you don't do that, and that you actually do put #include "type_id.h" in your cpp files even though you could simply get it transitively trough item.h

The includes have a #pragma once on them so there's a negligible amount of extra time spent when they are processed multiple times. So the changes to multiplicity of an include doesn't really affect anything meaningful.

I don't think there should be an increase. Apart from making new header files or moving a definition to header files so that header now has to be included. If no such changes are made then by running IWYU the number of #include X in code increase, but they were already counted in the .inc file (otherwise it wouldn't compile). That file lists a header even if it is transitively included. That's why it is useful.

I thought the huge increase was your addition, moving definition/declaration to a header, but reading your long list of issues it is probably this:

  • iwyu says that it's "very GCC-centric". In practice this means that the output you get on windows is pretty bad. For example it suggests adding #include <list> for any file that's using iterators, because there's some list mention deep in the bowels of msvc iterator implementation. I was trying to be mindful of issues like these when I was spot-checking the changes, but I was not digilent enough, and lots of those seeped through.

Therefore, for future, I recommend moving code to headers and including them in separate commits from pure IWYU commits. IWYU commits should be checked with .inc "metric" and if it increases, something is wrong AFAIK.

I recommend the reader to read the .inc file. It is human readable. Also see #73040, future reader, I know moxian read it already.

@kevingranade
Copy link
Member

Basically the way forward with this is:

  1. get IWYU running on some ci (it doesn't have to be every one!)
  2. Add a opt-in or opt-out mechanism so we can push more code modules into iwyu coverage over time (but likely never reach 100%).
  3. Issue a potentially very long series of PRs opting modules into IWYU coverage.

It doesn't have to run cleanly on every platform, the system headers are different on different platforms and I don't think there's any way to reconcile that, IWYU is intended to work acolross a huge monorepo with a single build platform. Passing on one platform and failing IWYU but at least building on other platforms is still a valuable state to be in.

It's not the primary goal, but a good metric to look at for "is IWYU doing good things" is the total size of all preprocessed code modules. Including just the minimal set of headers in each .cpp file will tend to reduce this size, while the opposite extreme of putting every header file in every .cpp file will increase it.

If you run the primary compiler invocation with the -S argument it will output preprocessed modules instead of preprocessing and compiling to object files in one step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Appliance/Power Grid Anything to do with appliances and power grid astyled astyled PR, label is assigned by github actions BasicBuildPassed This PR builds correctly, label assigned by github actions Bionics CBM (Compact Bionic Modules) [C++] Changes (can be) made in C++. Previously named `Code` Character / World Generation Issues and enhancements concerning stages of creating a character or a world Code: Tests Measurement, self-control, statistics, balancing. Code: Tooling Tooling that is not part of the main game but is part of the repo. Crafting / Construction / Recipes Includes: Uncrafting / Disassembling <Documentation> Design documents, internal info, guides and help. EOC: Effects On Condition Anything concerning Effects On Condition Fields / Furniture / Terrain / Traps Objects that are part of the map or its features. Game: Achievements / Conducts / Scores Player goals and how they are tracked. Info / User Interface Game - player communication, menus, etc. Items: Ammo / Guns Ammunition for all kinds of weapons and these weapons themselves Items: Archery Bows, crossbows, arrows, bolts Items: Armor / Clothing Armor and clothing Items: Battery / UPS Electric power management Items: Containers Things that hold other things Items: Food / Vitamins Comestibles and drinks Items: Magazines Ammo holding items and objects. json-styled JSON lint passed, label assigned by github actions Limbs Limbs, mutable limbs, and code related to them. Lore Game lore, in-game communication. Also the Lore tab. Map / Mapgen Overmap, Mapgen, Map extras, Map display [Markdown] Markdown issues and PRs Martial Arts Arts, Techniques, weapons and anything touching martial arts. Mechanics: Enchantments / Spells Enchantments and spells Mechanics: Weather Rain, snow, portal storms and non-temperature environment Melee Melee weapons, tactics, techniques, reach attack Missions Quests and missions Monsters Monsters both friendly and unfriendly. Mutations / Traits / Professions/ Hobbies Mutations / Traits / Professions/ Hobbies NPC / Factions NPCs, AI, Speech, Factions, Ownership Player Faction Base / Camp All about the player faction base/camp/site Scenarios New Scenarios, balancing, bugs with scenarios Translation I18n Vehicles Vehicles, parts, mechanics & interactions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants