Getting Started with Moab and moab-versioning
These notes are shamelessly cribbed, with permission, from a UCSD doc "Using Moab Versioning System for Chronopolis (Kai Lin - December 2016)" https://docs.google.com/document/d/14p3EMRHjoo_osOCRgwaIGCoACUr1yoZmSbJ0fXwGWAk
Each object must have a primary key identifier in the repository system, such as "jq937jp0017". This is what we often call the "object id", "digital object id", "id", etc.
Each version of an object in Moab has a unique sub-identifier using a "natural" numbering system with sequential integers, like v0001, v0002. These version identifiers are automatically generated by the moab-versioning gem.
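Because version ids are sequential integers with a "v" prefix, formatting, parsing, and choosing the next id are simple operations. The sketch below is illustrative Python, not the moab-versioning gem's own (Ruby) code. The four-digit zero padding is assumed from the v0001/v0002 examples, and all function names are hypothetical.

```python
import re

# Assumption: ids are "v" plus a zero-padded integer of at least 4 digits,
# matching the v0001, v0002 examples in the text.
VERSION_RE = re.compile(r"^v(\d{4,})$")

def version_id(number: int) -> str:
    """Format a sequential version number as a Moab-style id (hypothetical helper)."""
    if number < 1:
        raise ValueError("version numbers start at 1")
    return f"v{number:04d}"

def version_number(vid: str) -> int:
    """Parse an id such as 'v0002' back into its integer."""
    match = VERSION_RE.match(vid)
    if match is None:
        raise ValueError(f"not a version id: {vid!r}")
    return int(match.group(1))

def next_version_id(existing: list[str]) -> str:
    """Return the id following the highest existing version (v0001 if none)."""
    highest = max((version_number(v) for v in existing), default=0)
    return version_id(highest + 1)
```

Comparing parsed integers rather than strings keeps the ordering correct even if a repository ever grew past v9999.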
- Simple folder structure for storing all objects.
- Fixity data (e.g. checksums, file size) is used to eliminate duplication.
  - MD5, SHA1, or SHA256 hash values are generated for each file ingested by moab-versioning. These checksums are used along with the file size to determine whether a newly submitted file is a duplicate of another file in the same version or an exact copy of a file that was previously ingested. The file signature catalog collects this data across versions.
- Files are immutable.
- Files are stored using original names.
- Version manifests describe an entire object at a point in time:
  - The version inventory manifest specifies the complete set of files for a version of the digital object.
  - The signature catalog provides a union list of all file manifestations that have been preserved in any version of the digital object.
  - The version additions manifest lists which files were added to the object's catalog in a given version.
  - The file inventory difference shows the details of the changes that occurred between any two given versions.
- Each version's new or modified content files are stored in a folder within that version's own directory.
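The duplicate check described in the list above can be sketched as follows. This is a hypothetical Python model, not the gem's Ruby implementation: it computes all three checksums in a single pass, treats size plus checksums as a file's identity, and uses a plain set to stand in for the signature catalog.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class FileSignature:
    """Hypothetical fixity record: file size plus MD5, SHA1, and SHA256 digests."""
    size: int
    md5: str
    sha1: str
    sha256: str

def signature_of_chunks(chunks: Iterable[bytes]) -> FileSignature:
    """Compute the size and all three digests in one pass over the data."""
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    size = 0
    for chunk in chunks:
        size += len(chunk)
        for digest in (md5, sha1, sha256):
            digest.update(chunk)
    return FileSignature(size, md5.hexdigest(), sha1.hexdigest(), sha256.hexdigest())

def signature_of(path: str) -> FileSignature:
    """Stream a file from disk in 64 KiB chunks."""
    with open(path, "rb") as fh:
        return signature_of_chunks(iter(lambda: fh.read(65536), b""))

def classify(sig: FileSignature, catalog: set, seen_this_version: set) -> str:
    """Label a submitted file relative to what is already preserved."""
    if sig in catalog:
        return "copy of previously ingested file"
    if sig in seen_this_version:
        return "duplicate within this version"
    seen_this_version.add(sig)
    return "new content"
```

Only files labeled "new content" would need to be written to disk; the other two cases can be recorded in the manifests by reference to the already-stored copy.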
The Moab storage repository supports multiple root directories, which can be configured in the file spec/spec_config.rb.
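With more than one storage root, locating an object means checking each root in turn. The Python sketch below is hypothetical: the function names and root paths are illustrative, it assumes the object directory sits directly under a root, and it does not read spec/spec_config.rb or use the gem's own configuration API.

```python
from pathlib import Path
from typing import Iterable, Optional

def candidate_paths(object_id: str, storage_roots: Iterable[str]) -> list[Path]:
    """Where the object directory would live under each configured root.

    Assumption: the object directory is a direct child of the root; real
    deployments may nest it more deeply.
    """
    return [Path(root) / object_id for root in storage_roots]

def find_object_dir(object_id: str, storage_roots: Iterable[str]) -> Optional[Path]:
    """Return the first existing object directory, or None if no root holds it."""
    for candidate in candidate_paths(object_id, storage_roots):
        if candidate.is_dir():
            return candidate
    return None
```

Keeping the path computation separate from the filesystem check makes the lookup order (first configured root wins) easy to see and to test.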
Here is an example of the folders and files for an object 'bj102hs9687':
bj102hs9687
├── v0001
│   ├── data
│   │   ├── content
│   │   │   ├── eric-smith-dissertation-augmented.pdf
│   │   │   └── eric-smith-dissertation.pdf
│   │   └── metadata
│   │       ├── contentMetadata.xml
│   │       ├── descMetadata.xml
│   │       ├── identityMetadata.xml
│   │       ├── provenanceMetadata.xml
│   │       ├── relationshipMetadata.xml
│   │       ├── rightsMetadata.xml
│   │       ├── technicalMetadata.xml
│   │       └── versionMetadata.xml
│   └── manifests
│       ├── fileInventoryDifference.xml
│       ├── manifestInventory.xml
│       ├── signatureCatalog.xml
│       ├── versionAdditions.xml
│       └── versionInventory.xml
└── v0002
    ├── data
    │   └── metadata
    │       ├── contentMetadata.xml
    │       ├── embargoMetadata.xml
    │       ├── events.xml
    │       ├── identityMetadata.xml
    │       ├── provenanceMetadata.xml
    │       ├── relationshipMetadata.xml
    │       ├── rightsMetadata.xml
    │       ├── versionMetadata.xml
    │       └── workflows.xml
    └── manifests
        ├── fileInventoryDifference.xml
        ├── manifestInventory.xml
        ├── signatureCatalog.xml
        ├── versionAdditions.xml
        └── versionInventory.xml
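Because every version directory in the example follows the same data/ plus manifests/ pattern, paths can be computed rather than searched for. These Python helpers are a hypothetical illustration based only on the example tree; they are not part of moab-versioning, and limiting data to content and metadata subfolders is an assumption drawn from this one object.

```python
import re
from pathlib import Path, PurePosixPath

# Manifest file names copied from the example tree for bj102hs9687.
MANIFEST_NAMES = (
    "fileInventoryDifference.xml",
    "manifestInventory.xml",
    "signatureCatalog.xml",
    "versionAdditions.xml",
    "versionInventory.xml",
)

def data_path(object_id: str, version: str, category: str, filename: str) -> PurePosixPath:
    """Relative path of a stored file, e.g. bj102hs9687/v0002/data/metadata/events.xml."""
    if category not in ("content", "metadata"):
        raise ValueError(f"unexpected data category: {category!r}")
    return PurePosixPath(object_id, version, "data", category, filename)

def manifest_paths(object_id: str, version: str) -> list[PurePosixPath]:
    """Relative paths of the manifests expected for one version."""
    return [PurePosixPath(object_id, version, "manifests", name) for name in MANIFEST_NAMES]

def list_versions(object_dir: Path) -> list[str]:
    """Names of the vNNNN directories under an object directory, oldest first."""
    names = [p.name for p in object_dir.iterdir()
             if p.is_dir() and re.fullmatch(r"v\d{4,}", p.name)]
    return sorted(names, key=lambda name: int(name[1:]))
```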
Each digital object is stored in a separate filesystem directory named with the object ID. The first version's data folder usually contains the majority of the object's files. The data folder of each subsequent version contains only files newly added to the object or modified versions of files from an earlier version.
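The rule that later versions store only new or changed files can be sketched by comparing a version's inventory against the signature catalog. This is a simplified, hypothetical Python model: an inventory is a dict from path to any hashable signature (for example a (size, sha256) tuple), and the difference report uses four illustrative categories that may not match the categories the gem itself reports.

```python
from typing import Dict, Hashable, Set

# Hypothetical model: an inventory maps a path (relative to data/) to its signature.
Inventory = Dict[str, Hashable]

def version_additions(inventory: Inventory, catalog: Set[Hashable]) -> Inventory:
    """Files whose content is not yet preserved; only these go in the new data folder."""
    preserved = set(catalog)
    additions = {}
    for path, sig in sorted(inventory.items()):
        if sig not in preserved:
            additions[path] = sig
            preserved.add(sig)  # identical new content is stored only once
    return additions

def inventory_difference(old: Inventory, new: Inventory) -> Dict[str, list]:
    """Simplified path-by-path comparison of two version inventories."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "unchanged": sorted(p for p in new if p in old and new[p] == old[p]),
    }
```

In this model a file that was merely renamed appears as one deletion plus one addition, yet version_additions does not store it again because its signature is already in the catalog.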
The fixity properties of a file (e.g. checksums) are kept in a FileSignature.xml file. The Moab design assumes that the observed file size, together with the checksums in FileSignature.xml, is sufficient to determine file equality and eliminate file redundancy.