This essentially reverts #1393: we need to rely on the DeduplicationMappingConverter again, and we are also bringing back the logic that translates content identifiers based on dedup relations imported from the graph.
Three years ago this logic was replaced by #1264, which relied on the mapping between the original identifiers and persistent identifiers. Since we had decided back then to run the IIS on a non-deduped version of the graph, we could simply drop the id translation based on the dedup identifiers.
Now we also want to be able to run the IIS on a deduped version of the graph, which means we need to rely on two layers of id translation for contents:
* the already implemented translation between the original identifiers (as defined in the PDF Aggregation System) and persistent identifiers (defined in the graph)
* the "resurrected" translation between the persistent identifiers and deduplicated identifiers, applied whenever a given persistent id was deduped
This way we should end up with content identifiers that are fully matchable with the deduped graph. When running the IIS on a non-deduplicated graph, the second id translation will not be applied, because there are no merges relations in a non-deduped graph.
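The two layers described above can be illustrated with a minimal sketch. This is not the actual DeduplicationMappingConverter or IIS code: the function names, the plain-dictionary representation of the mappings, the `(dedup_id, persistent_id)` direction of a merges relation, and returning `None` for an unmapped original id are all assumptions made only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_dedup_mapping(
    merges_relations: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Map each merged persistent id to its deduplicated id.

    Assumption: every relation is a (dedup_id, persistent_id) pair,
    read as "dedup_id merges persistent_id".
    """
    return {persistent_id: dedup_id for dedup_id, persistent_id in merges_relations}


def translate_content_id(
    original_id: str,
    original_to_persistent: Dict[str, str],
    persistent_to_dedup: Dict[str, str],
) -> Optional[str]:
    """Apply both translation layers to a single content identifier."""
    # Layer 1: original id (PDF Aggregation System) -> persistent id (graph).
    persistent_id = original_to_persistent.get(original_id)
    if persistent_id is None:
        # Hypothetical choice: unmapped content ids are reported as None.
        return None
    # Layer 2: persistent id -> deduplicated id, only if the id was merged.
    # With no merges relations the mapping is empty and this is a no-op.
    return persistent_to_dedup.get(persistent_id, persistent_id)


# Hypothetical sample data.
original_to_persistent = {"pdf-001": "pid-A", "pdf-002": "pid-B"}

# Deduped graph: pid-A was merged into dedup-X, pid-B was not merged.
deduped = build_dedup_mapping([("dedup-X", "pid-A")])
merged_result = translate_content_id("pdf-001", original_to_persistent, deduped)
unmerged_result = translate_content_id("pdf-002", original_to_persistent, deduped)

# Non-deduped graph: no merges relations, so layer 2 changes nothing.
non_deduped = build_dedup_mapping([])
plain_result = translate_content_id("pdf-001", original_to_persistent, non_deduped)
```

In this sketch the same code path serves both kinds of graph: an empty dedup mapping makes the second layer a pass-through, which mirrors the statement that the second translation is simply not applied when there are no merges relations.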
…ing content identifiers aligned with graph identifiers
The InfoSpace importer now relies on two layers of mappings in order to fully support import from both deduplicated and non-deduplicated graphs:
* between the original identifiers (as defined in PDF Aggregation System) and persistent identifiers (defined in the graph)
* between the persistent identifiers and deduplicated identifiers whenever a given entity with a persistent id was deduplicated
The first mapping has been in use until now, while IIS was mostly run on non-deduplicated data. The second mapping is reintroduced here; it had been replaced by the first mapping as part of git#1264.