Add MQA documentation to README.md
mjanez committed Feb 8, 2024
1 parent e5d2222 commit 7092531
73 changes: 70 additions & 3 deletions README.md

This Docker Compose configuration integrates the powerful MQA toolset seamlessly with CKAN endpoints and European Data Portal catalogs, enabling users to perform in-depth assessments of metadata quality effortlessly. The setup provides an efficient way to run comprehensive quality checks on various metadata attributes, including data relevance, schema compliance, data format consistency, and adherence to standard vocabularies.


>**Note**<br>
> You can test it against a CKAN-based open data portal such as [mjanez/ckan-docker](https://github.com/mjanez/ckan-docker)[^1].

### [Metadata Quality Assessment Methodology](https://data.europa.eu/mqa/methodology)
The MQA scores metadata against a set of indicators, grouped into the dimensions described below. The results of the checks are stored using the Data Quality Vocabulary ([DQV](https://www.w3.org/TR/vocab-dqv/)), a W3C specification for describing the quality of a dataset.

**Dimension** | **Maximum points**
:----------------:|:------------------:
Findability | 100
Accessibility | 100
Interoperability | 110
Reusability | 75
Contextuality | 20
*Sum* | 405

The dimensions are derived from the FAIR principles:
* **Findability**
Metrics that help people and machines find datasets. A maximum of 100 points can be scored in this area.

* **Accessibility**
Metrics that determine whether access to the data referenced by the distributions is guaranteed. A maximum of 100 points can be scored in this area.

* **Interoperability**
Metrics that determine whether a distribution is considered interoperable. Under the assumption that a dataset's distributions carry identical content, only the highest-scoring distribution counts towards the points. A maximum of 110 points can be scored in this area.

* **Reusability**
Metrics that check the reusability of the data. A maximum of 75 points can be scored in this area.

* **Contextuality**
Lightweight properties that give the user more context. A maximum of 20 points can be scored in this area.
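
The interoperability rule (only the best-scoring distribution of a dataset counts) can be sketched as follows. This illustrates the rule as the methodology states it, not necessarily how ckan-mqa computes its summary; the function name and the input shape (a list of per-distribution scores) are assumptions made for the example.

```python
def interoperability_points(distribution_scores):
    """Return a dataset's interoperability score.

    Hypothetical helper: takes the interoperability points already
    computed for each distribution of one dataset and keeps only the
    highest value. A dataset without distributions scores 0.
    """
    return max(distribution_scores, default=0)


# Three distributions of the same dataset, each scored out of 110.
print(interoperability_points([40, 90, 65]))  # 90
```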

![5 MQA_dimensions png](https://github.com/mjanez/ckan-mqa/assets/96422458/0c54d8c3-e454-4a6a-bcd6-ebc0a0dae080)

The final rating is assigned through four rating categories. The table below maps points to categories. The MQA presents the rating exclusively through these categories, so providers can reach the highest rating even with a slight deduction of points.

**Rating** | **Range of points**
:----------:|:-------------------:
Excellent | 351 – 405
Good | 221 – 350
Sufficient | 121 – 220
Bad | 0 – 120
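
A minimal sketch of the mapping in the table above. The function name is hypothetical, and the treatment of fractional totals that fall between two ranges (for example 350.5) is an assumption: the table only lists whole-number bounds, so here each category starts at its lower bound.

```python
MAX_POINTS = 405  # sum of the five dimension maxima


def rating_for(points: float) -> str:
    """Map a total MQA score to its rating category."""
    if not 0 <= points <= MAX_POINTS:
        raise ValueError(f"points must be between 0 and {MAX_POINTS}")
    if points >= 351:
        return "Excellent"
    if points >= 221:
        return "Good"
    if points >= 121:
        return "Sufficient"
    return "Bad"


print(rating_for(280.31))  # Good
```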


#### Example of ckan-mqa results summary

**Dimension** | **Indicator/property** | **Count** | **Population** | **Percentage** | **Points** | **Weight**
:----------------:|:-----------------------------------------:|:---------:|:--------------:|:--------------:|:----------:|:----------:
Findability | dcat:keyword | 46 | 46 | 1.0 | 30.0 | 30
Findability | dcat:theme | 46 | 46 | 1.0 | 30.0 | 30
Findability | dct:spatial | 42 | 46 | 0.91 | 18.26 | 20
Findability | dct:temporal | 0 | 46 | 0.0 | 0 | 20
Accessibility | dcat:accessURL code=200 | 255 | 255 | 1.0 | 50.0 | 50
Accessibility | dcat:downloadURL | 0 | 255 | 0.0 | 0 | 20
Accessibility | dcat:downloadURL code=200 | 0 | 255 | 0.0 | 0 | 30
Interoperability | dct:format | 255 | 255 | 1.0 | 20.0 | 20
Interoperability | dcat:mediaType | 255 | 255 | 1.0 | 10.0 | 10
Interoperability | dct:format/dcat:mediaType from vocabulary | 378 | 510 | 0.74 | 7.41 | 10
Interoperability | dct:format non-proprietary | 131 | 255 | 0.51 | 10.27 | 20
Interoperability | dct:format machine-readable | 252 | 255 | 0.99 | 19.76 | 20
Interoperability | DCAT-AP compliance | 0 | 46 | 0.0 | 0 | 30
Reusability | dct:license | 255 | 255 | 1.0 | 20.0 | 20
Reusability | dct:license from vocabulary | 245 | 255 | 0.96 | 9.61 | 10
Reusability | dct:accessRights | 46 | 46 | 1.0 | 10.0 | 10
Reusability | dct:accessRights from vocabulary | 0 | 46 | 0.0 | 0 | 5
Reusability | dcat:contactPoint | 46 | 46 | 1.0 | 20.0 | 20
Reusability | dct:publisher | 46 | 46 | 1.0 | 10.0 | 10
Contextuality | dct:rights | 255 | 255 | 1.0 | 5.0 | 5
Contextuality | dcat:byteSize | 0 | 255 | 0.0 | 0 | 5
Contextuality | dct:issued | 46 | 46 | 1.0 | 5.0 | 5
Contextuality | dct:modified | 46 | 46 | 1.0 | 5.0 | 5
Total points | Rating: Good | | | 0.69 | 280.31 | 405
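
The values in the table are consistent with each indicator's points being its share of the population multiplied by its weight, rounded to two decimals. The sketch below reproduces that arithmetic; the formula is inferred from the figures above rather than taken from the ckan-mqa source, and the function name is hypothetical.

```python
def indicator_points(count: int, population: int, weight: float) -> float:
    """Points for one indicator: (count / population) * weight.

    An empty population scores 0 to avoid dividing by zero.
    """
    if population == 0:
        return 0.0
    return round(count / population * weight, 2)


# dct:spatial row: 42 of 46 datasets, weight 20
print(indicator_points(42, 46, 20))  # 18.26
# dct:format non-proprietary row: 131 of 255 distributions, weight 20
print(indicator_points(131, 255, 20))  # 10.27
```

Summing the **Points** column this way gives the 280.31 total shown in the last row.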

## Quick start
First, copy the `.env.example` template to `.env`, then set `CKAN_CATALOG_URL` and, if needed, the DCAT-AP profile version (`DCATAP_FILES_VERSION`).
