This repository has been archived by the owner on Jan 9, 2023. It is now read-only.

Command Line Support to Provenance

Marco Brandizi edited this page Feb 10, 2015 · 3 revisions

The provenance command line is available by building the corresponding module, or by downloading the binaries from the Maven repository.

The command line includes the main commands, plus additional provenance-specific ones:

14:13:13 [brandizi@poderosaii myequivalents_prov_shell_0.4-SNAPSHOT]$ ./myeq.sh --help
...
 provenance find ...
   Finds provenance records, '%' can be used as wildcard

 provenance find-entity <entityId> [<user-email>...]
   Finds all the mapping operations about entity-id, which contributed to define the mappings entityId belongs to.
   If user emails are specified, it returns only entries created by the specified users

 provenance find-entity <entityId> <entityId> [<user-email>...]
   Finds all the mapping operations which contributed to define the mapping between the two entities
   If user emails are specified, it returns only provenance entries created by the specified users

 provenance purge [--prov-from <YYYYMMDD[-HHMMSS]]> [--prov-to <YYYYMMDD[-HHMMSS]]
   Remove old provenance entries in a given date range. For each parameter found in the range,
   all the entries about such parameter are removed, except the most recent one, in order to keep
   track of what/who produced a given record)
...
  -a,--prov-from <YYYMMDD[-HHMMSS]>            provenance find/purge, period to search, use
                                               something like $(date -v -1y +%Y%m%d) for calculating
                                               1 year ago
  -b,--prov-to <YYYMMDD[-HHMMSS]>              provenance find/purge, period to search
...
  -e,--prov-user <arg>                         provenance find, searched user email/login
...
  -m,--prov-param <type:value[:extraValue]>    provenance find, operation parameters to search
                                               (option can be repeated)
  -o,--prov-operation <arg>                    provenance find, operation to search

Like the corresponding base module, the provenance command line can be configured to use different back ends. To support all the commands above, however, you need a back end that supports the provenance extension, such as the DB provenance extension or the web service client provenance extension. Conversely, the base module for the command line can work with a back end supporting provenance, since, as noted in the introduction, the two are backward compatible: in that scenario the extended commands are not available, but the provenance of data-changing operations is still tracked automatically.
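
The --prov-from and --prov-to options in the help output above take timestamps in the form YYYYMMDD[-HHMMSS]. The `date -v -1y` hint shown there is specific to BSD/macOS `date`, so a platform-independent way to compute such values may be handy. The following is only a minimal sketch: the helper names `prov_timestamp` and `purge_args` are hypothetical, a year is approximated as 365 days, and the argument order simply mirrors the usage line printed by --help.

```python
from datetime import datetime, timedelta


def prov_timestamp(moment, with_time=False):
    """Format a datetime as YYYYMMDD or YYYYMMDD-HHMMSS (hypothetical helper)."""
    fmt = "%Y%m%d-%H%M%S" if with_time else "%Y%m%d"
    return moment.strftime(fmt)


def purge_args(prov_from=None, prov_to=None):
    """Build the argument list for 'provenance purge' (hypothetical helper).

    Only the options listed in the help output are used.
    """
    args = ["./myeq.sh", "provenance", "purge"]
    if prov_from is not None:
        args += ["--prov-from", prov_timestamp(prov_from)]
    if prov_to is not None:
        args += ["--prov-to", prov_timestamp(prov_to)]
    return args


# Assumption: "one year ago" is approximated as 365 days before now.
one_year_ago = datetime.now() - timedelta(days=365)
print(" ".join(purge_args(prov_to=one_year_ago)))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` unchanged.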

Purging old data

The provenance purge command is of particular interest for the command line interface. It wraps the corresponding method in the ProvRegistryManager and can be used to remove old provenance entries. Note that purging always preserves the records about the last operations that contributed to the formation of an equivalence class.

For instance, if we have this sequence of operations (and no others that involve the mentioned entities):

1 store.mapping ( a, b )
2 store.mapping ( b, c )
3 store.mapping ( d, e ) # (a, b, c) and (d, e) at this point  
4 store.mapping ( b, a ) # nothing changes, we already know it

the operations are grouped by parameter, most recent first:

a: 4, 1
b: 4, 2, 1
c: 2
d: 3
e: 3

and then, for each parameter, every operation except the most recent one becomes a candidate for removal. A candidate is actually purged only if it is not needed as the most recent operation for another of its parameters. So, in the example above, operation 1 is the only one that is purged, because: 4 keeps track of a and b; 2 keeps track of b and c; 3 keeps track of d and e.
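
The selection rule in this example can be sketched in a few lines of Python. This is not the ProvRegistryManager implementation, only an illustration of the rule as described here: the function names and the `(id, parameters)` tuples are assumptions, and the date range filtering done by --prov-from/--prov-to is left out.

```python
def group_by_param(operations):
    """Map each parameter to the ids of the operations that mention it,
    most recent first. 'operations' must be in chronological order."""
    groups = {}
    for op_id, params in operations:
        for param in params:
            groups.setdefault(param, []).insert(0, op_id)
    return groups


def purgeable(operations):
    """Return the ids of operations that are not the most recent
    record for any of their parameters."""
    groups = group_by_param(operations)
    keep = {ids[0] for ids in groups.values()}
    return [op_id for op_id, _ in operations if op_id not in keep]


# The four store.mapping operations from the example above.
ops = [
    (1, ("a", "b")),
    (2, ("b", "c")),
    (3, ("d", "e")),
    (4, ("b", "a")),
]

print(group_by_param(ops))
print(purgeable(ops))  # [1]
```

Operation 2 survives even though a newer operation mentions b, because it is still the latest record for c.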