Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/More proper use of DI for Synchronizer #7500

Open
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

asdacap
Copy link
Contributor

@asdacap asdacap commented Sep 27, 2024

Introduce DI

Why?

  • Reproducable service construction.

    • This mean test and production use the same code to construct class.
    • This reduces mistake and boilerplate.
    • I'm sure many of you have experience the impressive amount of code needed to create integration tests. And those impressive code need to be updated everytime we change a constructor. And those code are not necessarily the same code used in prod.
  • Easier code change.

    • This include refactor, new functionality or removing functionality.
    • We have many boilerplate code that just construct classes. These code create friction everytime we add or remove functionality.
    • I'm sure many of you have experience that changing a class means changing 6 other files that just wire the main class that you are changing. And don't forget integration tests. One or two class is fine, but the codebase is getting ridiculous, and more importantly, this add friction.
  • Lazy initialization and automatic dispose.

    • Instead of figuring out which component we need to disable whenever we just want to do one thing, for example verify trie, or era import/export, with dependency injection we just need to resolve the component that we need and only the required dependencies is created.
    • This saves a lot of time double checking what component is accidentally run unnecessarily, not just now, but in the future also.
    • Ow, did I mention that it automatically dispose resolved service in the right order?
    • Yes! It does! So you also, don't have to worry about how to dispose the service (and its dependencies!).
  • Pluggability

    • We added hooks within the initialization for plugins to use.
    • But these hooks are fixed. When a plugin can't achieve what it need to do, it mutates code in unexpected places, like the merge plugin modifying header validator in InitializeNetwork.
      • What if header validator is used before InitializeNetwork? Maybe not now, but in the future maybe. Maybe only some of the code uses the original header validator.
      • Automatic DI automatically resolve which class depends on which class so this is automatic and more importantly, reduces mistake.
      • And remember, what about integration tests?
    • We want to move more code into plugins, but what would the interface of the plugins look like? The initialization order may not be the same with one combination of plugins versus the other.
    • With dependency injection, its relatively easy to just extract TrieStore or Paprika or Portal or DiscV4/5 into an Autofac Module. The next question is then which part do you want to make plugin (and to be clear, you dont have to), not which hook to add.

Q&A

  • Do all code need to support dependency injection.

    • No, no. You can always wrap it around a big class like CompositeDiscoveryApp, and that big class can be resolved with DI.
    • But then whats the point?
      • Look, if it helps you, great. If it does not, then just don't. We don't have to do everything in one way.
      • However, it is in my opinion, that the application's subsystem's integration with each other should use dependency injection due to the benefit outline above and due to the sheer size of the application.
      • Then said, most part of the code can benefit using automatic dependency injection, and from what I can gather, most of the team member support dependency injection. Its impressive that we still don't have any sort of dependency injection in the code.
  • The configuration for dependency injection can be complicated maybe it will make the codebase more complicated.

    • Perhaps, in fact I was originally against adding it.
    • But IMO the codebase is already complicated and too large to handle without it, this move the complication to other part.
    • However, even IF it is more complicated, you still get:
      • Reproducability in test.
      • Pluggability.
      • Easier code change.
      • Lazy initialization.
  • This PR only modify some of part of the codebase while kinda hackily inject the rest of the dependency into the container. What about the construction of the dependencies?

    • The goal is to be able to support dependency injection on different initialization code slowly, building Module along the way.
    • There will be multiple IContainer and this IS an antipattern.
    • But we are starting to create the configuration which can be easily plugged in towards a more complete solution and that is the point.
      • On my last attempt, most of the initialization code is replaced, with Module, which is the point, but we have a lot of them. Putting one (or really, a few) huge pr is gonna conflict with other code. The way I see it, it must be done slowly, or we won't have it at all.
    • Even now, it can remove some unnecessary boilerplate, see (or no longer see) IBlockDownloaderFactory.cs.
  • A lot of code is not compatible with DI because of mismatched constructor or mutating config/dependency at runtime or nullable dependency. We should modify them first.

    • Firstly, knowing if it is compatible with DI or not is hard to notice at first without actually migrating it to DI.
      • Constructor conflict can be mitigated, by simply not changing it and passing its dependency one by one, which sounds pointless, but remember, you still got the rest of the benefit of DI like reproducable build and such.
      • The bigger problem is plugin integration which mutate INethermindApi, cyclic dependency, and lifetime.
      • And really, you won't realize them UNTIL you try to migrate.
    • So really, either try to move it to DI first without refactoring them, or refactor them at the same time, or don't refactor them at all.
    • Also, its easier to refactor code with dependency injection in the first place as you don't have to update 6 different place when you change the code.
    • I'd say, just integrate it first imperfectly, THEN clean it up.
  • This constructor clearly can use ISyncFeed[] and we can loop them all and wait for them. And we can just don't declare some of the feed instead of passing in a NoopSyncFeed.

    • Its going to be glaringly obvous that certain thing can be much easily or cleanly done some other way.
    • But its glarily obvious because this PR introduce DI.
    • Help me out here, let this PR land first, then that can be clean up and more importantly verify and test if it still works.
    • DI make it easier to refactor. But it still need to work! Thats why I don't want to change too many things at once.
  • This <insert class name here> class can also be dependency injected.

    • Look, help me out over here.
    • Like, really, once it touch a class which can be dependency injected, its going to touch another class, then that class can be dependency injected, then the whole InitializeNetwork can be dependency injected, then the MergePlugin can be injected and so on.
    • Which is great because you now have the drive to migrate things. Help me by doing exactly that... on a different PR, because trust me, moving everything is a lot code and a lot of work, (the integration test! think of the integration tests!), and if we make this PR large, its going to start to conflict with other PR.
    • Plus someone else can start adding DI on another part of the code in parallel.
    • If you feel itchy with InitializeNetwork specifically you can look at this branch which migrate the whole InitializeNetwork.
  • Test code shows that now Synchronizer is completely opaque: there is no way to know that we have to build a IServiceCollection with those specific services without looking into the implementation..

    • If its dependencies are also declared nicely then we DONT have to build them. Currently, this is an imperfect implementation, thats why there are list of them. This will be the case, until we migrate enough of the code.
    • That said, it still can know which service is required because it will fail at runtime, which IS a downside.
    • But think of the upside too, the wiring of the test can be made automatic AND use the same code as prod initilizer code, see this, this, this.
    • Keep in mind this test is an integration tests, it must construct those other classes.
  • Give me a vision.

    • There are no INethermindApi.
    • Some IStep is removed as it only construct but not run.
    • A new IStepMeta is used to declare steps, its dependency and dependents.
      • Can and should be declared by plugins.
      • The IStep for the IStepMeta is only resolved on its turn, meaning that if the step does not need to be constructed if it does not need to run.
      • There is now a final FullAppStep that starts other needed stops. Maybe you just need VerifyTrieStop, or EraImportStep.
      • You are no longer anxious that the RPC or the network or the special consensus client is started, or if the state need a special network bootup for some reason when you just need to provide simple import export.
    • Plugins are essentially Autofac's Module, registered after all core module, lets call it NethermindModule, which essentially have all components needed to start nethermind.
    • Integration tests, so high level, its like hive test, just faster, can be created with a container that have a final module, TestEnvironmentModule which replaces db and network.
    • Same set of tests can be used with a different plugin by simply registering the plugin's module after NethermindModule, before TestEnvironmentModule.
    • Paprika is a Module, pruning trie store is a Module, portal state store is a Module.

Changes

  • This time it resolve ISynchronizer completely from container.
  • Also INethermind*Api would fetch from container if implemented. Since ISynchronizer also forward ISyncModeSelector and ISyncProgressResolver, might as well directly get them from container also.
  • This provider a pretty straightforward way of adding DI little by little without moving everything, however, having a full DI can be hard to comprehend. See how full/fast sync and fast/beacon header is handled here. But at the same time it allows plugin to have a single point of configuration see MergeSynchronizer.ConfigureMergeComponent.
  • Although more complicated, AutoFac also brings automatic niceness like automatic ILogger, and IConfig, and IDb and friends, but not implemented here.

Types of changes

What types of changes does your code introduce?

  • Refactoring

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

@asdacap asdacap marked this pull request as ready for review September 27, 2024 11:59
@asdacap asdacap requested a review from rubo as a code owner September 27, 2024 11:59
Copy link
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is a stop-gap to start moving into DI direction.
That being said I am confused about two things:

  1. Multiple containers - I kind of don't see how that helps to achieve the goal? If the goal is to have seamless DI than this feels like counter productive. While it is fine not to migrate everything at once, how we should handle container(s) is the major design decision that we should investigate upfront and have some sensible answer (even if we change it later). Otherwise I think this doesn't achieve that much and potentially in itself increases complexity rather than decreases it?
  2. Direct registrations vs module registrations - this is probably unavoidable, but I would expect to understand the divide better, but I don't at the moment.

Would love to hear more opinions and reviews before we progress.

src/Nethermind/Nethermind.Api/IApiWithNetwork.cs Outdated Show resolved Hide resolved
src/Nethermind/Nethermind.Db/IDbProvider.cs Outdated Show resolved Hide resolved
src/Nethermind/Nethermind.Db/IDbProvider.cs Outdated Show resolved Hide resolved
public class MergeSynchronizer(
[KeyFilter(nameof(BeaconHeadersSyncFeed))] SyncFeedComponent<HeadersSyncBatch> beaconHeaderComponent,
ISyncConfig syncConfig,
Synchronizer baseSynchronizer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Synchronizer baseSynchronizer,
ISynchronizer baseSynchronizer,

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need the WireFeedWithModeSelector which does not fit nicely into ISynchronizer. Alternatively, could copy the whole thing, which is also, not very nice.

@asdacap
Copy link
Contributor Author

asdacap commented Oct 1, 2024

  • The end goal is in fact single container. The problem is that, we don't have complete components.
    • For example, let say we have service, a, b, c, constructed in step A, B, C. b depends on a, and c depends on b.
    • Let say b is not migrated, but is used by c.
    • If we use single container, during the container building (which is gonna happen in A), b is not gonna get constructed yet at - that point, so it is not in the container and so c initialization will fail.
  • One solution is to have different lifetime for c where during the configuration of the lifetime in C, it would pick up b.
  • Another solution is to have more magic with Autofac that might solve by fetching b again at runtime with a custom registration source or something.
  • Another solution is to wait until b is migrated and when that happen, we can combine the three container into one.

@asdacap
Copy link
Contributor Author

asdacap commented Oct 1, 2024

I like the approach that each registration is done in a module. It looks nicer and make it easier to reuse in tests. I just don't do it completely here.

Copy link
Contributor

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highly appreciate the detailed summary.

I'm sure many of you have experience that changing a class means changing 6 other files that just wire the main class that you are changing.

Agree, this is rather annoying but it's not a dealbreaker in my opinion.

Lazy initialization and automatic dispose.

I think this is a killer feature, in particular lazy initialization that removes the need to keep track which things are initialized first.

Pluggability

I also think this is a really important feature considering how we're moving towars a plugin-based architecture. We had to fall back to static properties during the Transaction refactoring (#7313) due to the lack of hooks or required initialization order.

The configuration for dependency injection can be complicated maybe it will make the codebase more complicated.

I don't think so. Initialization code is already complicated and has a lot of ad-hoc mechanisms like "WaitForX" to make it work. A unified DI container based approach should be more manageable.

Even now, it can remove some unnecessary boilerplate

This is good. We're closing to ~300.000 LOC and any deletion is welcomed.

Help me out here, let this PR land first, then that can be clean up and more importantly verify and test if it still works.

I'll approve since I agree with the vision and the proposed PR contains reasonable, localized changes (as you mention, let's go slowly step by step).

The end goal is in fact single container. The problem is that, we don't have complete components.

  • For example, let say we have service, a, b, c, constructed in step A, B, C. b depends on a, and c depends on b.
  • Let say b is not migrated, but is used by c.
  • If we use single container, during the container building (which is gonna happen in A), b is not gonna get constructed yet at - that point, so it is not in the container and so c initialization will fail.

I don't fully understand the example. Do we have a and c migrated? If so, I think the problem is that we migrated in the wrong order, that is, start from the leaves and move upwards (c, then b, then a). Maybe you could elaborate further?

Regarding the proposed solutions to this problem, I would avoid the introduction of lifetimes or Autofac magic for these intermediate refactoring steps.

@asdacap
Copy link
Contributor Author

asdacap commented Oct 3, 2024

Well, yes. But the point is to allow migrating out of order.

Copy link
Contributor

@flcl42 flcl42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide like a big picture of into what specific modules and components you would split the system

@@ -6,6 +6,8 @@
<PackageVersion Include="AspNetCore.HealthChecks.UI" Version="8.0.1" />
<PackageVersion Include="AspNetCore.HealthChecks.UI.Client" Version="8.0.1" />
<PackageVersion Include="AspNetCore.HealthChecks.UI.InMemory.Storage" Version="8.0.1" />
<PackageVersion Include="Autofac" Version="8.1.0" />
Copy link
Contributor

@alexb5dh alexb5dh Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Autofac needed on top of Microsoft.Extensions.DependencyInjection? If because of modules - can it be replicated by using separate IServiceBuilders instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be my question as well. I believe Autofac provides named registrations while Microsoft.Extensions.DependencyInjection may lack it. Not sure if this is the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong. ServiceKey etc are the way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For named registrations, there's IKeyedServiceProvider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as outlined here #6483.

I originally wanted to stick with Microsoft.Extensions.DependencyInjection, but can't make it work. See #7485.

src/Nethermind/Nethermind.Api/IApiWithBlockchain.cs Outdated Show resolved Hide resolved
@Scooletz
Copy link
Contributor

Scooletz commented Oct 3, 2024

  1. I like the idea of automatic composability. It's pleasing.
  2. I don't understand why Autofac and go straight with Microsoft.Extensions.DependencyInjection. What is the reason?
  3. I think that the initial proposition should go with one container and there should be a limitation of having a single one. This will increase the cost of adoption but will never leave oh there are 5 so I can add one more.
  4. This PR promises deletion and less configuration yet it doubles the registrations and resolutions in tests. To sell this idea, IMHO, one needs to remove some dirt from test initialization.

I think that in its current form, this brings not that much. More deletions, less configuration in tests, one container. That would be my goal.

@asdacap
Copy link
Contributor Author

asdacap commented Oct 3, 2024

Autofac because it has more feature, which can be a downside that is true. Specifically the main one is the ability to create and reconfigure hierarcy of lifetime. In this example, it is used to support full sync and fast sync, which use the exact same type set of service but only change IFeed. If its only one class, then its fine, can just use named type. But when it become a graph of them then it become a problem. This is needed in nethermind, specifically with block processing/block producer using the same type.

Also, with a simple registration source (not in this PR), you can inject ILogger directly instead of ILogManager. Which is nice. Same thing with IConfig.

I tried single container before, the cost is severe. See #6483, #6531. At least with multiple container at first, we would already have the module configuration.

More reduction in boilerplate can be seen here https://github.com/NethermindEth/nethermind/pull/7520/files#diff-2d1d749000abfdbd86d6f5187b2765d94b7e83dcb99dcb55f9f4d51fecf08c16. That will be after this PR.

@asdacap
Copy link
Contributor Author

asdacap commented Oct 3, 2024

Could you provide like a big picture of into what specific modules and components you would split the system

In the last attempt there is a:

  • BaseModule
  • DatabaseModule
  • CoreModule
  • StateModule
  • BlockchainModule
  • NetworkModule
  • KeyStoreModule

Plugins are literally more module. I guees whats left is block producer and rpc. And network was not complete at that time..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants