Sync model from CSV

This script allows easy integration of data coming from other systems through CSV files.

It was created for the architecture team of La Poste, who kindly agreed to share it with everyone under the MIT license (see below). The script has since been updated by other contributors (see the script header for a detailed list).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Note: The first published version of the script (v0.5, published in the master branch) is a monolith (a single script containing both your configuration and the synchronization code), so you have to duplicate it for each use case. The next version (1.0) will isolate the synchronization code in a shared script library, making your own scripts lighter and easier to upgrade. This documentation covers both versions.

Here's a list of supported features:

Create or update elements (v0.5 always creates new elements and updates existing ones; v1.0 will make it possible to explicitly enable or disable creation and update)
Synchronize element name, documentation and properties (each of them being either the value of a column or the result of a custom function)
Create or update relationships. 1:N is supported from both sides: multiple rows can have the same external id in a column, or a single row can have multiple ids (separated by newlines) in a single column.
Synchronize relationship names
Synchronize only a subset of your CSV file (by defining id as a function returning null for filtered-out rows)
Match CSV rows and model elements based on any criteria (a function can be defined to solve complex needs)
De-duplicate CSV rows (for each mapping id, only the first row is kept)
Move new and updated elements into a model folder
Keep track of the last synchronization date in a user-defined property
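
Two of the features above, filtering rows through the id function and storing several ids in one cell, can be sketched in plain JavaScript. This is only an illustration: the column names (Status, Code, Depends on) and the helper splitIds are hypothetical, and rows are assumed to be plain objects keyed by column name, which may differ from what the script does internally.

```javascript
// Hypothetical id function: rows whose Status is not 'Active' get a null
// mapping id and are therefore left out of the synchronization.
var activeOnlyId = function(row) {
  return row['Status'] === 'Active' ? row['Code'] : null;
};

// Hypothetical helper: split a cell holding several ids separated by newlines.
var splitIds = function(cell) {
  return String(cell || '')
    .split(/\r?\n/)
    .map(function(id) { return id.trim(); })
    .filter(function(id) { return id !== ''; });
};

// Made-up sample rows, as they might look once the CSV has been parsed.
var rows = [
  { 'Code': 'A1', 'Status': 'Active', 'Depends on': 'B1\nB2' },
  { 'Code': 'A2', 'Status': 'Retired', 'Depends on': '' }
];

// Only the first row survives the filter; its cell yields two target ids.
var keptIds = rows.map(activeOnlyId).filter(function(id) { return id !== null; });
var targets = splitIds(rows[0]['Depends on']);
```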

How does it work?

Each mapping has its own configuration object. A mapping is a specific use case you want to address. A mapping relies on a mapping id being defined for both the CSV rows and the model elements, and targets only one kind of model element. A mapping id is a key which uniquely identifies a row from the CSV file and a model element.

The element synchronization logic is:

Load the CSV file in memory and build an index based on the CSV mapping id. Rows leading to a null mapping id are filtered out.
Filter the model elements of the target type which have a non-null mapping id and build an index for them.
For each indexed CSV row, check whether a model element has been indexed with the same mapping id.
1. If an element is found (and, in v1.0, if update is enabled), update it. Note that, for performance reasons, this is a brute-force update (i.e. the element is updated even if it was already up to date).
2. If no element is found (and, in v1.0, if creation is enabled), create a new element.
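
The steps above can be sketched as follows. This is a simplified illustration, not the script's actual code: rows and elements are plain objects rather than jArchi objects, only the name is synchronized, id is assumed to be a function, and the helper names indexBy and syncElements are made up for the example.

```javascript
// Build an index keyed by mapping id. Items with a null id are filtered out,
// and only the first item is kept for each id (de-duplication).
var indexBy = function(items, getKey) {
  var index = Object.create(null);
  items.forEach(function(item) {
    var key = getKey(item);
    if (key !== null && key !== undefined && !(key in index)) {
      index[key] = item;
    }
  });
  return index;
};

var syncElements = function(rows, elements, config) {
  var rowIndex = indexBy(rows, config.id);
  var elementIndex = indexBy(elements, config.getId);
  var result = { created: [], updated: [] };

  Object.keys(rowIndex).forEach(function(key) {
    var row = rowIndex[key];
    var element = elementIndex[key];
    if (element) {
      // Brute-force update: overwrite even if nothing changed.
      element.name = row[config.name];
      result.updated.push(key);
    } else {
      var created = { name: row[config.name] };
      config.setId(created, row);
      elements.push(created);
      result.created.push(key);
    }
  });
  return result;
};

// Usage with made-up data: one existing element, four CSV rows.
var config = {
  id: function(row) { return row['Code'] || null; },
  name: 'Label',
  getId: function(element) { return element.externalId || null; },
  setId: function(element, row) { element.externalId = row['Code']; }
};
var elements = [{ name: 'Old name', externalId: 'FR' }];
var rows = [
  { 'Code': 'FR', 'Label': 'France' },
  { 'Code': 'DE', 'Label': 'Germany' },
  { 'Code': 'FR', 'Label': 'Duplicate row' },
  { 'Code': '', 'Label': 'No id' }
];
var result = syncElements(rows, elements, config);
```

In this run the duplicate FR row and the row without an id are ignored, the existing FR element is renamed, and a new DE element is created.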

In order to create or update relationships, you have to make sure that the target elements have been updated if needed. Element synchronization and relationship synchronization are therefore two separate steps. It is usually a good approach to run all your element synchronizations first, and then run your relationship synchronizations.

The relationship synchronization logic is:

For each indexed CSV row, check whether a matching model element exists. If no element is found, warn and skip to the next indexed row.
For each matching element and for each relationship mapping defined in the mapping configuration object:
1. Get the list of potential targets from their associated mapping configuration object (which can be a different one if the relationships to be created involve different element types).
2. For each of these potential targets, check whether a relationship already exists. If it does, update its synchronization date; if not, create it.
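
A minimal sketch of this logic, under stated assumptions: elements and relationships are plain objects, the indexes are plain objects keyed by mapping id, and the function name syncRelationships, its parameters and the lastSync field are invented for the example. It also assumes that isReversed means the relationship goes from the referenced element to the current one, which may not match the script's actual convention.

```javascript
var syncRelationships = function(sourceIndex, targetIndex, rowIndex, relation, relationships, now) {
  var warnings = [];
  Object.keys(rowIndex).forEach(function(key) {
    var source = sourceIndex[key];
    if (!source) {
      // No matching model element: warn and skip to the next indexed row.
      warnings.push('No element found for mapping id ' + key);
      return;
    }
    // A cell may hold several target ids separated by newlines.
    var cell = String(rowIndex[key][relation.column] || '');
    cell.split(/\r?\n/).forEach(function(targetId) {
      var target = targetIndex[targetId.trim()];
      if (!target) { return; }
      var from = relation.isReversed ? target : source;
      var to = relation.isReversed ? source : target;
      var existing = relationships.filter(function(r) {
        return r.type === relation.targetType && r.source === from && r.target === to;
      })[0];
      if (existing) {
        existing.lastSync = now; // already there: refresh the sync date
      } else {
        relationships.push({ type: relation.targetType, source: from, target: to, lastSync: now });
      }
    });
  });
  return warnings;
};

// Usage with made-up data: countries reference regions through a Region column.
var countries = { 'FR': { name: 'France' } };
var regions = { 'EU': { name: 'Europe' } };
var rowIndex = {
  'FR': { 'Code': 'FR', 'Region': 'EU' },
  'XX': { 'Code': 'XX', 'Region': 'EU' }
};
var relation = { column: 'Region', targetType: 'aggregation-relationship', isReversed: true };
var relationships = [];
var warnings = syncRelationships(countries, regions, rowIndex, relation, relationships, '2024-01-01');
```

Running the same call again with a later date only refreshes lastSync on the existing relationship instead of creating a second one.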

Plan your configuration

Implement your configuration

Run the script

Old documentation

Will be merged into the current one.

Each CSV file is described through a configuration object that you can later load and sync. Most of the mapping information (Id, Name, Documentation, Properties...) can be expressed through the name of a column from the CSV file or through a function (which allows more advanced computation).
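
To illustrate the "column name or function" convention, here is one way such a setting could be resolved against a row. The helper resolveValue and the column names are hypothetical, and the function is assumed to receive the row as its only argument; the script itself may handle this differently.

```javascript
// Hypothetical helper: a mapping is either a column name or a function of the row.
var resolveValue = function(mapping, row) {
  if (typeof mapping === 'function') {
    return mapping(row);
  }
  return row[mapping];
};

var row = { 'Name': 'France', 'ISO2': 'FR', 'ISO3': 'FRA' };

// Simple case: the value is read straight from a column.
var nameFromColumn = resolveValue('Name', row);

// Advanced case: the value is computed from several columns.
var docFromFunction = resolveValue(function(r) {
  return 'ISO codes: ' + r['ISO2'] + ' / ' + r['ISO3'];
}, row);
```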

Each source of information (data source) has its own object which contains:

The label describing the data source (label)
The path of the CSV file (csv)
The ArchiMate concept type used when importing (targetType)
The function which extracts the "logical" id from an ArchiMate element, or returns null if the element should be skipped (getId)
The function which stores the external id into an ArchiMate element (setId)
The column name (or function) used to get (or compute) a unique id (id)
The column name (or function) used to get (or compute) the element's name (name)
The column name (or function) used to get (or compute) the element's documentation (documentation)
A mapping of external attributes to elements' properties (propMapping)
A description of external relations to map to ArchiMate relationships (relations)

Template:

var xxx = {
  'label': '',
  'csv': __DIR__+'xxx.csv',
  'targetType': '',
  'targetFolder': getFolder('layer', 'folderName'),
  'getId': function(element) {
    // Do something to extract id
  },
  'setId': function(element, row) {
    // Do something to store id
  },
  'id': '', // Can also be a function
  'name': '', // Can also be a function
  'documentation': '', // Can also be a function
  'propMapping': {
    'Property_name': 'Column_name',
    'Another_property_name': 'Another_column_name'
  },
  'relations': {
    'Relation_Label': {               // can be changed, will be used in logs
      'column': '',                   // column name
      'reference': dataSourceObject,  // reference to the referenced data source object
      'targetType': '',               // ArchiMate relationship type
                                      //  see https://github.com/archimatetool/archi-scripting-plugin/wiki/jArchi-Collection#object-selector-type
      'accessType': '',               // sets access type if targetType='access-relationship'
                                      //  see https://github.com/archimatetool/archi-scripting-plugin/wiki/jArchi-Object#accesstype
      'isReversed': false             // true or false: sets direction of the relationship
    },
    // You can add other relations if needed
  }
};
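
As an illustration of the getId and setId placeholders, one possible approach is to keep the external id in an element property. The property name External ID and the Code column are assumptions made for this sketch. In jArchi, prop(name) reads a property and prop(name, value) sets it; the fakeElement function below is only a stand-in so the sketch can run outside Archi.

```javascript
var ID_PROPERTY = 'External ID'; // hypothetical property name

// Return the stored external id, or null so that the element is skipped.
var getId = function(element) {
  var value = element.prop(ID_PROPERTY);
  return value ? value : null;
};

// Store the external id taken from a hypothetical 'Code' column.
var setId = function(element, row) {
  element.prop(ID_PROPERTY, row['Code']);
};

// Minimal stand-in for a jArchi element (not part of jArchi).
var fakeElement = function() {
  var props = {};
  return {
    prop: function(name, value) {
      if (arguments.length === 1) {
        return Object.prototype.hasOwnProperty.call(props, name) ? props[name] : null;
      }
      props[name] = value;
      return this;
    }
  };
};

var element = fakeElement();
var idBefore = getId(element);   // no property yet
setId(element, { 'Code': 'FR' });
var idAfter = getId(element);    // id read back from the property
```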

Once configured, you can load and sync elements with a simple call:

loadAndSync(datasource);
loadAndSync(anotherDatasource);

And when elements have been synced, you can sync relationships with a simple call:

syncModelRelationships(datasource);

The provided script is configured to load Regions, Sub-regions and Countries from the country-codes dataset. Simply download both the script and the dataset from this gist and try them on a new or existing model. This should provide enough information for you to start customizing it.
