DM-46799: refactor DatasetRecordStorageManager #1095

The lone concrete dataset storage manager class and its intermediate base class have been merged, and many function parameters (all internal, of course) that plumb the SQL dataset ID type through many layers have been dropped.

This is a mostly-mechanical change; for each method:
- move a method to the manager class;
- give it a dataset type argument, and have it look up the storage object internally;
- change calling code (SqlRegistry) to stop looking up the storage object before calling it.

The ultimate goal is to get rid of DimensionRecordStorageManager, at least as a public interface. But the steps after this one won't be nearly as mechanical, so it's useful to separate them.
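As a rough illustration of the per-method pattern described above, here is a minimal sketch. All names in it (`Manager`, `Storage`, `_find_storage`, `insert`) are hypothetical stand-ins, not the actual daf_butler classes or signatures; the point is only that the caller passes a dataset type and the manager does the storage lookup itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical per-dataset-type storage object (illustration only)."""

    dataset_type_name: str
    runs: list[str] = field(default_factory=list)


class Manager:
    """Hypothetical manager that owns the storage objects."""

    def __init__(self) -> None:
        self._storage: dict[str, Storage] = {}

    def register(self, dataset_type_name: str) -> None:
        # Create the storage object once per dataset type.
        self._storage.setdefault(dataset_type_name, Storage(dataset_type_name))

    def _find_storage(self, dataset_type_name: str) -> Storage:
        # The lookup that calling code previously did itself now lives here.
        try:
            return self._storage[dataset_type_name]
        except KeyError:
            raise LookupError(
                f"Dataset type {dataset_type_name!r} is not registered."
            ) from None

    def insert(self, dataset_type_name: str, run: str) -> int:
        # The method takes the dataset type as an argument and looks up
        # the storage object internally, then returns the new row count.
        storage = self._find_storage(dataset_type_name)
        storage.runs.append(run)
        return len(storage.runs)


# Calling code no longer fetches a storage object before calling the method.
manager = Manager()
manager.register("raw")
count = manager.insert("raw", "run1")
```

With this shape the calling code never holds a storage object, so the storage class can later become private or disappear without touching callers.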

The dataset type cache now holds just the dataset type definition and dataset ID by name, with the SQLAlchemy table objects instead cached by DimensionGroup so they can be shared by multiple dataset types.

DatasetRecordStorage has been removed (both the base class and its sole implementation) - its methods had already been moved to DatasetRecordStorageManager, and it no longer works as the opaque thing to put in the cache. Instead there's a new subpackage-private DynamicTables class that is cached by DimensionGroup (this is where the lazy loading of SQLAlchemy table objects now happens), and a module-private _DatasetRecordStorage struct that just combines that with the dataset type and its ID, to make it more convenient to pass all of that around.

I also threw in some changes to the insert/import implementations because I started trying to reduce the degree to which DatasetRecordStorage was being passed things that were either totally unused or assumed (without checking) to have some value. I quickly realized that this problem is ubiquitous (especially with storage classes) and should be a separate ticket, but I've kept what I already did since I think it's a step in the right direction.
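The two-level cache described above (definitions and IDs keyed by name, table objects keyed by DimensionGroup) might be sketched roughly as follows. This is not the real implementation: `CacheSketch`, `TablesSketch`, and `StorageSketch` are hypothetical names, a `frozenset` of dimension names stands in for DimensionGroup, and a plain string stands in for a SQLAlchemy table object.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TablesSketch:
    """Hypothetical stand-in for the tables shared by one dimension group."""

    dimensions: frozenset[str]
    load_count: int = 0
    _tags: Optional[str] = None

    def tags(self) -> str:
        # Lazy loading: build the (fake) table object on first use only.
        if self._tags is None:
            self.load_count += 1
            self._tags = "tags_" + "_".join(sorted(self.dimensions))
        return self._tags


@dataclass(frozen=True)
class StorageSketch:
    """Hypothetical struct bundling a dataset type, its ID, and its tables."""

    dataset_type_name: str
    dataset_type_id: int
    tables: TablesSketch


class CacheSketch:
    """Hypothetical cache with separate by-name and by-dimensions maps."""

    def __init__(self) -> None:
        # Keyed by dataset type name: (dimensions, dataset type ID).
        self._by_name: dict[str, tuple[frozenset[str], int]] = {}
        # Keyed by dimensions, so dataset types with the same dimensions
        # share a single tables object.
        self._by_dimensions: dict[frozenset[str], TablesSketch] = {}

    def add(self, name: str, dimensions: set[str], dataset_type_id: int) -> None:
        key = frozenset(dimensions)
        self._by_name[name] = (key, dataset_type_id)
        self._by_dimensions.setdefault(key, TablesSketch(key))

    def find(self, name: str) -> StorageSketch:
        # Combine the per-name entry with the shared tables on demand.
        dimensions, dataset_type_id = self._by_name[name]
        return StorageSketch(name, dataset_type_id, self._by_dimensions[dimensions])


cache = CacheSketch()
cache.add("raw", {"instrument", "exposure", "detector"}, 1)
cache.add("postISRCCD", {"instrument", "exposure", "detector"}, 2)
raw = cache.find("raw")
post = cache.find("postISRCCD")
```

In this sketch both dataset types resolve to the same `TablesSketch` instance, so the lazy load happens at most once per set of dimensions rather than once per dataset type.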

Supporting these would be extra complexity I don't think we need.

When the given dataset type differs from the registered dataset type in imports, it's not clear what the ideal behavior is, but the right choice for *this* ticket is clearly to not change that behavior.

Commits on Oct 16, 2024