Software Heritage - Indexer

Tools to compute multiple indexes on SWH's raw contents:

An indexer is in charge of:

There are multiple indexers working on different object types:

content indexer: works with content sha1 hashes

revision indexer: works with revision sha1 hashes

origin indexer: works with origin identifiers

Indexation procedure:

Current content indexers:

mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype
fossology-license (queue swh_indexer_fossology_license): compute the license
metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta vocabulary)
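
To make the metadata translation above more concrete, here is a minimal sketch of turning one ecosystem-specific file (an npm package.json) into a JSON-LD document using CodeMeta terms. It is not the swh.indexer implementation: the function name translate_package_json, the NPM_TO_CODEMETA table, and the choice of fields are assumptions made for illustration, and the actual mappings cover many more fields and ecosystems.

```python
import json

# CodeMeta 2.0 JSON-LD context identifier.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical, deliberately tiny mapping from package.json keys to
# CodeMeta/schema.org property names (illustration only).
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "homepage": "url",
}


def translate_package_json(raw: bytes) -> dict:
    """Translate the raw bytes of a package.json file into a JSON-LD dict."""
    package = json.loads(raw.decode("utf-8"))
    doc = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}

    # Copy over the plain string fields that have a direct counterpart.
    for npm_key, codemeta_key in NPM_TO_CODEMETA.items():
        value = package.get(npm_key)
        if isinstance(value, str) and value:
            doc[codemeta_key] = value

    # Assumption: the license field holds a single SPDX identifier,
    # which is expanded into the corresponding SPDX license URL.
    license_id = package.get("license")
    if isinstance(license_id, str) and license_id:
        doc["license"] = f"https://spdx.org/licenses/{license_id}"

    return doc


if __name__ == "__main__":
    sample = b'{"name": "demo", "version": "1.0.0", "license": "MIT"}'
    print(json.dumps(translate_package_json(sample), indent=2))
```

Fields without a mapping are dropped rather than guessed, which keeps the output limited to terms that exist in the target vocabulary.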

Current origin indexers:

metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta and ForgeFed vocabularies)
