Skip to content

Commit

Permalink
Revert implementation of Butler.put that used Datastore.put_new.
Browse files Browse the repository at this point in the history
After trying to re-implement datastore unit tests I realized that the
`Datastore.put_new` and `Registry.store_datastore_records` combination
(used to implement `Butler.put`) is not very compatible with transaction
rollback system that we now have in place. So I decided to keep
`Butler.put` unchanged for now until we do something more drastic with
transactions. I keep `Datastore.put_new` and `Registry.store_datastore_records`
implementations, they are not used but may be useful in the future.
  • Loading branch information
andy-slac committed Oct 11, 2023
1 parent 1c5ee3e commit 163e135
Showing 1 changed file with 18 additions and 38 deletions.
56 changes: 18 additions & 38 deletions python/lsst/daf/butler/_butler.py
Original file line number Diff line number Diff line change
Expand Up @@ -1180,62 +1180,42 @@ def put(
TypeError
Raised if the butler is read-only or if no run has been provided.
"""
if not self.isWriteable():
raise TypeError("Butler is read-only.")

if isinstance(datasetRefOrType, DatasetRef):
# This is a direct put of predefined DatasetRef.
log.debug("Butler put direct: %s", datasetRefOrType)
if run is not None:
warnings.warn("Run collection is not used for DatasetRef", stacklevel=3)
ref = datasetRefOrType

# If registry already has a dataset with the same dataset ID,
# dataset type and DataId, then _importDatasets will do nothing and
# just return an original ref. We have to raise in this case, there
# is a datastore check below for that.
self._registry._importDatasets([ref], expand=True)
self._registry._importDatasets([datasetRefOrType], expand=True)
# Before trying to write to the datastore check that it does not
# know this dataset. This is prone to races, of course.
if self._datastore.knows(ref):
raise ConflictingDefinitionError(f"Datastore already contains dataset: {ref}")
if self._datastore.knows(datasetRefOrType):
raise ConflictingDefinitionError(f"Datastore already contains dataset: {datasetRefOrType}")
# Try to write dataset to the datastore, if it fails due to a race
# with another write, the content of stored data may be
# unpredictable.
try:
stored_refs = self._datastore.put_new(obj, ref)
self._datastore.put(obj, datasetRefOrType)
except IntegrityError as e:
raise ConflictingDefinitionError(f"Datastore already contains dataset: {e}") from e
# TODO: we should re-order calls to datastore and registry and
# store records in _importDatasets()
self._registry.store_datastore_records(stored_refs)
return datasetRefOrType

log.debug("Butler put: %s, dataId=%s, run=%s", datasetRefOrType, dataId, run)
if not self.isWriteable():
raise TypeError("Butler is read-only.")
datasetType, dataId = self._standardizeArgs(datasetRefOrType, dataId, **kwargs)

# Handle dimension records in dataId
dataId, kwargs = self._rewrite_data_id(dataId, datasetType, **kwargs)

# Add Registry Dataset entry.
dataId = self._registry.expandDataId(dataId, graph=datasetType.dimensions, **kwargs)
(ref,) = self._registry.insertDatasets(datasetType, run=run, dataIds=[dataId])
self._datastore.put(obj, ref)

else:
log.debug("Butler put: %s, dataId=%s, run=%s", datasetRefOrType, dataId, run)
if not self.isWriteable():
raise TypeError("Butler is read-only.")
datasetType, dataId = self._standardizeArgs(datasetRefOrType, dataId, **kwargs)

# Handle dimension records in dataId
dataId, kwargs = self._rewrite_data_id(dataId, datasetType, **kwargs)

# Add Registry Dataset entry.
dataId = self.registry.expandDataId(dataId, graph=datasetType.dimensions, **kwargs)

# Add Registry Dataset entry.
dataId = self._registry.expandDataId(dataId, graph=datasetType.dimensions, **kwargs)
(ref,) = self._registry.insertDatasets(datasetType, run=run, dataIds=[dataId])
stored_refs = self._datastore.put_new(obj, ref)
# TODO: we should re-order calls to datastore and registry and
# store records in insertDatasets()
self._registry.store_datastore_records(stored_refs)

# We probably want to return a ref with datastore records, take any one
# from the stored refs (in case it's not empty).
# TODO: Disabled for now, this messes up logic in datastore.exists()
# and breaks unit tests.
# if stored_refs:
# ref = next(iter(stored_refs.values()))
return ref

# TODO: remove on DM-40067.
Expand Down

0 comments on commit 163e135

Please sign in to comment.