ckanext-attribution

A CKAN extension that adds support for complex attribution.

Overview

This extension standardises author/contributor attribution for datasets, enabling enhanced metadata and greater linkage between datasets. It currently integrates with the ORCID and ROR APIs; contributors ('agents') can be added directly from these databases, or manually.

Contributors can be added and edited via actions or via a Vue app that can be inserted into the package_metadata_fields.html template snippet.

Schema

The schema is (partially) based on the RDA/TDWG recommendations. Three new models are added: Agent (contributors), ContributionActivity, and Affiliation (plus small linking models between these and Package records).

`Agent`

Defines one agent.

Field	Type	Values	Notes
`agent_type`	string	'person', 'org', 'other'
`family_name`	string		only used for 'person' records
`given_names`	string		only used for 'person' records
`given_names_first`	bool	True, False	only used for 'person' records; if the given names should be displayed first according to the person's culture/language (default True)
`name`	string		used for non-'person' records
`location`	string		used for non-person records, optional; a location to display for the organisation to help differentiate between similar names (e.g. 'Natural History Museum (London)' and 'Natural History Museum (Dublin)')
`external_id`	string		an identifier from an external service like ORCID or ROR
`external_id_scheme`	string	'orcid', 'ror', other	the scheme for the `external_id`; currently only 'orcid' and 'ror' are fully supported, though basic support for others can be implemented by adding to the `attribution_controlled_lists` action
`user_id`	string	`User.id` foreign key	link to a user account on the CKAN instance

`ContributionActivity`

Defines one activity performed by one agent on one specific dataset.

Field	Type	Values	Notes
`activity`	string	[controlled vocabulary]	the activity/role the agent is associated with, e.g. 'Editor', 'Methodology'; roles are defined in the `attribution_controlled_lists` action, which currently lists the Datacite and CRediT role taxonomies (but can be expanded)
`scheme`	string	[controlled vocabulary]	name of the defined scheme from `attribution_controlled_lists`
`level`	string	'Lead', 'Equal', 'Supporting'	optional degree of contribution (from CRediT)
`time`	datetime		optional date/time of the activity
`order`	integer		order of the agent within all who are associated with the same activity, e.g. 1st Editor, 3rd DataCollector (optional)

A specialised ContributionActivity entry with a '[citation]' activity is used to define the order in which contributors should be cited (and/or if they should be cited at all).

`Affiliation`

Defines a relationship between two agents, either as a 'universal' (persistent) affiliation or for a single package (e.g. a project affiliation).

Field	Type	Values	Notes
`agent_a_id`	string	`Agent.id` foreign key	one of the two agents (a/b order does not matter)
`agent_b_id`	string	`Agent.id` foreign key	one of the two agents (a/b order does not matter)
`affiliation_type`	string		very short description (1 or 2 words) of affiliation, e.g. 'employment' (optional)
`description`	string		longer description of affiliation (optional)
`start_date`	date		date at which the relationship began, e.g. employment start date (optional)
`end_date`	date		date at which the relationship ended (optional)
`package_id`	string	`Package.id` foreign key	links affiliation to a specific package/dataset (optional)

Installation

Path variables used below:

$INSTALL_FOLDER (i.e. where CKAN is installed), e.g. /usr/lib/ckan/default
$CONFIG_FILE, e.g. /etc/ckan/default/development.ini

Installing from PyPI

pip install ckanext-attribution
# to use the CLI as well:
pip install ckanext-attribution[cli]

Installing from source

Clone the repository into the src folder:

cd $INSTALL_FOLDER/src
git clone https://github.com/NaturalHistoryMuseum/ckanext-attribution.git

Activate the virtual env:
```
. $INSTALL_FOLDER/bin/activate
```

Install via pip:

pip install $INSTALL_FOLDER/src/ckanext-attribution
# to use the cli as well:
pip install $INSTALL_FOLDER/src/ckanext-attribution[cli]

Installing in editable mode

Installing from a pyproject.toml in editable mode (i.e. pip install -e) requires setuptools>=64; however, CKAN 2.9 requires setuptools==44.1.0. See our CKAN fork for a version of v2.9 that uses an updated setuptools if this functionality is something you need.

Post-install setup

Add 'attribution' to the list of plugins in your $CONFIG_FILE:
```
ckan.plugins = ... attribution
```
Install lessc globally:
```
npm install -g "less@~4.1"
```

Add this block to package_metadata_fields.html to show the Vue app:

{% block package_custom_fields_agent %}
     {{ super() }}
{% endblock %}

Change the authors field in your SOLR schema.xml to set up faceting.
```
<schema>
    <fields>
        <...>
        <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
        <...>
    </fields>
    <...>
    <copyField source="author" dest="text"/>
</schema>
```
After making the changes, restart SOLR and reindex (ckan -c $CONFIG_FILE search-index rebuild). You will also have to enable the config option (see below) to see this in the UI.

Configuration

These are the options that can be specified in your .ini config file. NB: setting ckanext.attribution.debug to True means that the API accesses sandbox.orcid.org instead of orcid.org. Although both run by the ORCID organisation, these are different websites and you will need a separate account/set of credentials for each. It is also worth noting that you will not have access to the full set of authors on the sandbox.

API credentials [REQUIRED]

Name	Description	Options
`ckanext.attribution.orcid_key`	Your ORCID API client ID/key
`ckanext.attribution.orcid_secret`	Your ORCID API client secret

Optional

Name	Description	Options	Default
`ckanext.attribution.debug`	If true, use sandbox.orcid.org (for testing)	True/False	True
`ckanext.attribution.enable_faceting`	Enable filtering by contributor name (requires change to SOLR schema)	True/False	False

Usage

Actions

This extension adds numerous new actions. These are primarily CRUD actions for managing models, with inline documentation and predictable interactions. It's probably more helpful to only go over the more "unusual" new actions here.

`agent_list`

Search for agents by name or external ID, or just list all agents.

data_dict = {
    'q': 'QUERY',  # optional; searches in name, family_name, given_names, and external_id
}

toolkit.get_action('agent_list')({}, data_dict)

`package_contributions_show`

Show all contribution records for a package, grouped by agent. Optionally provide a limit and offset for pagination.

data_dict = {
    'id': 'PACKAGE_ID',
    'limit': 'PAGE_SIZE',
    'offset': 'OFFSET'
}

toolkit.get_action('package_contributions_show')({}, data_dict)

Returns a dict:

{
    'contributions': [
        {
            'agent': {
                # Agent.as_dict()
            },
            'activities': [
                # list of Activity.as_dict()
            ],
            'affiliations': [
                {
                    'affiliation': {
                        # Affiliation.as_dict()
                    },
                    'other_agent': {
                        # Agent.as_dict()
                    }
                },
                # ...
            ]
        },
        # ...
    ],
    'total': total,
    'offset': offset,
    'page_size': limit or total
}

`agent_affiliations`

Show all affiliations for a given agent, optionally limited to a specific dataset/package (plus ' global' affiliations).

data_dict = {
    'agent_id': 'AGENT_ID',
    'package_id': 'PACKAGE_ID'  # optional
}

toolkit.get_action('agent_affiliations')({}, data_dict)

Returns a list of records formatted as such:

{
    'affiliation': {
        # Affiliation.as_dict()
    },
    'other_agent': {
        # Agent.as_dict()
    }
}

`attribution_controlled_lists`

Returns collections of defined values (which can be modified by using @toolkit.chained_action).

data_dict = {
    'lists': ['NAME1', 'NAME2']  # optional; only return these lists
}

toolkit.get_action('attribution_controlled_lists')({}, data_dict)

There are four collections:

agent_types describes valid types for agents and adds additional detail;
contribution_activity_types contains role/activity taxonomies (i.e. Datacite and CRediT) and lists the available activity values;
contribution_activity_levels is a list of contribution levels (i.e. 'lead', 'equal', and ' supporting', from CRediT);
agent_external_id_schemes describes valid schemes for external IDs (currently, ORCID and ROR).

These collections are useful for validation and frontend connectivity/standardisation. They are contained within an action to a. enable frontend access via AJAX requests, and b. allow users to override values as needed.

`agent_external_search`

Search external sources (ORCID and ROR) for agent data. Ignores records that already exist in the database.

data_dict = {
    'q': 'QUERY_STRING',
    'sources': ['SOURCE1', 'SOURCE2']  # optional; only search these sources
}

toolkit.get_action('agent_external_search')({}, data_dict)

Results are returned formatted as such:

{
    'SCHEME_NAME': {
        'records': [
            # list of agent dicts
        ]
        'remaining': 10000  # number of other records found
    }
}

`agent_external_read`

Read data from an external source like ORCID or ROR, either from an existing record or a new external ID.

# EITHER
data_dict_existing = {
    'agent_id': 'AGENT_ID',
    'diff': False
    # optional; only show values that differ from the record's current values (default False)
}

# OR
data_dict_new = {
    'external_id': 'EXTERNAL_ID',
    'external_id_scheme': 'orcid'  # or 'ror', etc.
}

toolkit.get_action('agent_external_read')({}, data_dict)

Commands

NB: you will have to install the optional [cli] packages to use several of these commands.

`initdb`

ckan -c $CONFIG_FILE attribution initdb

Initialise database tables.

`sync`

ckan -c $CONFIG_FILE attribution sync $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Retrieve up-to-date information from external APIs for contributors with an external ID set.

`refresh-packages`

ckan -c $CONFIG_FILE attribution refresh-packages $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Update the author string for all (or the specified) packages.

`agent-external-search`

ckan -c $CONFIG_FILE attribution agent-external-search --limit 10 $OPTIONAL_ID $ANOTHER_OPTIONAL_ID

Search external APIs for contributors without an external ID set. Run refresh-packages and rebuild the search index after this command.

`merge-agents`

ckan -c $CONFIG_FILE attribution merge-agents --q $SEARCH_QUERY --match-threshold 75

Find agents with similar names (optionally matching the search query) and merge them. Run refresh-packages and rebuild the search index after this command.

`migratedb`

ckan -c $CONFIG_FILE attribution migratedb --limit 10 --dry-run --no-search-api

Attempt to extract names of contributors from author fields and convert them to the new format.

--limit will only convert a certain number of packages at a time.
--dry-run prevents saving to the database.
--no-search-api just extracts the names, without searching external APIs for contributors after.

It is recommended to run merge-agents, refresh-packages, and rebuild the search index after running this command.

Testing

There is a Docker compose configuration available in this repository to make it easier to run tests. The ckan image uses the Dockerfile in the docker/ folder.

To run the tests against ckan 2.9.x on Python3:

Build the required images:
```
docker compose build
```
Then run the tests. The root of the repository is mounted into the ckan container as a volume by the Docker compose configuration, so you should only need to rebuild the ckan image if you change the extension's dependencies.
```
docker compose run ckan
```

Name		Name	Last commit message	Last commit date
Latest commit History 266 Commits
.github		.github
ckanext		ckanext
docker		docker
docs		docs
tests		tests
.coveragerc		.coveragerc
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
setup.py		setup.py
test.ini		test.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ckanext-attribution

Overview

Schema

`Agent`

`ContributionActivity`

`Affiliation`

Installation

Installing from PyPI

Installing from source

Installing in editable mode

Post-install setup

Configuration

API credentials [REQUIRED]

Optional

Usage

Actions

`agent_list`

`package_contributions_show`

`agent_affiliations`

`attribution_controlled_lists`

`agent_external_search`

`agent_external_read`

Commands

`initdb`

`sync`

`refresh-packages`

`agent-external-search`

`merge-agents`

`migratedb`

Testing

About

Releases 24

Packages

Contributors 3

Languages

License

NaturalHistoryMuseum/ckanext-attribution

Folders and files

Latest commit

History

Repository files navigation

ckanext-attribution

Overview

Schema

Agent

ContributionActivity

Affiliation

Installation

Installing from PyPI

Installing from source

Installing in editable mode

Post-install setup

Configuration

API credentials [REQUIRED]

Optional

Usage

Actions

agent_list

package_contributions_show

agent_affiliations

attribution_controlled_lists

agent_external_search

agent_external_read

Commands

initdb

sync

refresh-packages

agent-external-search

merge-agents

migratedb

Testing

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 24

Packages 0

Contributors 3

Languages

`Agent`

`ContributionActivity`

`Affiliation`

`agent_list`

`package_contributions_show`

`agent_affiliations`

`attribution_controlled_lists`

`agent_external_search`

`agent_external_read`

`initdb`

`sync`

`refresh-packages`

`agent-external-search`

`merge-agents`

`migratedb`

Packages