This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/
The Anonymizer module is part of X-Road Metrics, which includes the following modules:
- Database module
- Collector module
- Corrector module
- Reports module
- Anonymizer module
- Opendata module
- Networking/Visualizer module
- Opendata Collector module
The Anonymizer module is responsible for preparing the operational monitoring data for publication through the Opendata module. The Anonymizer configuration allows the X-Road Metrics extension administrator to set fine-grained rules for excluding whole operational monitoring data records or for modifying selected data fields before the data is published.
The anonymizer module uses as input the operational monitoring data that the Corrector module has prepared and stored in MongoDB. The anonymizer processes the data using the configured rule set and stores the output in the opendata PostgreSQL database for publication.
The Anonymizer prepares data for the Opendata module. The module architecture related to publishing operational monitoring data through the Opendata module is as follows:
MongoDB is used to store "non-anonymized" operational monitoring data that should be accessible only to the X-Road Metrics administrators. Anonymized operational monitoring data that can be published to a wider audience is stored in PostgreSQL. The Opendata UI needs access only to PostgreSQL. To follow the "principle of least privilege", it is recommended to install the Opendata UI on a dedicated host that has no access at all to MongoDB. However, the Anonymizer module also needs access to the non-public data, so it should run on a host that has access to both MongoDB and PostgreSQL.
The anonymizer module provides no incoming network interfaces.
For a connection to be known SSL-secured, SSL usage must be configured on both the client and the server before the connection is made. If it is only configured on the server, the client may end up sending sensitive information before it knows that the server requires high security.
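To ensure secure connections, the `ssl-mode` and `ssl-root-cert` parameters have to be provided in the settings file.
Possible values for `ssl-mode`: `disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`. Only `verify-ca` and `verify-full` verify the server certificate against the root certificate; `verify-full` additionally checks that the server host name matches the certificate.
For detailed information see https://www.postgresql.org/docs/current/libpq-ssl.html
To configure the path to the SSL root certificate, set `ssl-root-cert`.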
Example of a `postgres` entry in `/etc/xroad-metrics/anonymizer/settings.yaml`:
```yaml
postgres:
  host: localhost
  port: 5432
  user: postgres
  password: '*******'
  database-name: postgres
  table-name: logs
  ssl-mode: verify-full
  ssl-root-cert: /etc/ssl/certs/root.crt
```
This section describes the necessary steps to install the anonymizer module on an Ubuntu 20.04 or Ubuntu 22.04 Linux host. For a complete overview of the different modules and machines, please refer to the System Architecture documentation.
Add the repository signing key and the X-Road Metrics package repository:
```bash
wget -qO - https://artifactory.niis.org/api/gpg/key/public | sudo apt-key add -
sudo add-apt-repository 'https://artifactory.niis.org/xroad-extensions-release-deb main'
```
The following information can be used to verify the key:
- key hash: 935CC5E7FA5397B171749F80D6E3973B
- key fingerprint: A01B FE41 B9D8 EAF4 872F A3F1 FB0D 532C 10F6 EC5B
- 3rd party key server: Ubuntu key server
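Before trusting the key, its fingerprint can be compared with the one listed above, for example by listing the downloaded key with `gpg --show-keys --with-colons`. The sketch below only performs the comparison on that colon-formatted text; how the text is captured (a file, a pipe, a subprocess) is left open, and the helper names are hypothetical. It relies on GnuPG's colon listing format, in which the `fpr` record that follows a `pub` record carries the primary key fingerprint in its 10th field.

```python
# Hypothetical helper: compares the documented key fingerprint with the
# output of `gpg --show-keys --with-colons` captured as text.
EXPECTED_FINGERPRINT = "A01B FE41 B9D8 EAF4 872F A3F1 FB0D 532C 10F6 EC5B"


def normalize(fingerprint: str) -> str:
    """Remove whitespace and upper-case a fingerprint for comparison."""
    return "".join(fingerprint.split()).upper()


def primary_fingerprints(colon_output: str) -> list:
    """Collect the fingerprint of each primary key in GnuPG colon format.

    The "fpr" record directly following a "pub" record holds the primary
    key fingerprint in its 10th field (index 9 after splitting on ':').
    """
    found = []
    previous_type = ""
    for line in colon_output.splitlines():
        fields = line.split(":")
        if fields[0] == "fpr" and previous_type == "pub" and len(fields) > 9:
            found.append(fields[9].upper())
        previous_type = fields[0]
    return found


def key_matches(colon_output: str, expected: str = EXPECTED_FINGERPRINT) -> bool:
    """True if any primary key in the listing has the expected fingerprint."""
    return normalize(expected) in primary_fingerprints(colon_output)
```

Comparing only primary key fingerprints keeps the check aligned with the single fingerprint published above; subkey `fpr` records are ignored.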
To install xroad-metrics-anonymizer and all its dependencies, execute the commands below:
```bash
sudo apt-get update
sudo apt-get install xroad-metrics-anonymizer
```
The installation package automatically installs the following items:
- xroad-metrics-anonymizer command
- Linux user named xroad-metrics and group xroad-metrics
- configuration files:
  - /etc/xroad-metrics/anonymizer/settings.yaml
  - /etc/xroad-metrics/anonymizer/field_data.yaml
  - /etc/xroad-metrics/anonymizer/field_translations.yaml
- cron job /etc/cron.d/xroad-metrics-anonymizer-cron to run the anonymizer periodically
- log folders under /var/log/xroad-metrics/anonymizer/
Only the xroad-metrics user can access the settings files and run the xroad-metrics-anonymizer command.
To use the anonymizer, you need to fill in your X-Road, MongoDB and PostgreSQL configuration in the settings file. The next chapter has detailed instructions on how to configure the anonymizer module.
Before configuring the Anonymizer module, make sure you have done the following:
- installed and configured the Database_Module
- created the MongoDB user accounts
- installed and configured the Opendata database
- created the Opendata database user accounts
To use the anonymizer, you need to fill in your X-Road, MongoDB and PostgreSQL configuration in the settings file (here, vi is used):
```bash
sudo vi /etc/xroad-metrics/anonymizer/settings.yaml
```
Settings that the user must fill in:
- X-Road instance name
- MongoDB host
- username and password for the anonymizer module MongoDB user
- host and port of the PostgreSQL server
- username and password for the anonymizer PostgreSQL user
- name of PostgreSQL database where to store the anonymized data
- list of PostgreSQL users that should have read-only access to the anonymized data
The read-only PostgreSQL users should be the users that Opendata-UI and Networking modules use to read data from the PostgreSQL.
The anonymizer can be configured to hide (exclude) whole data records from the opendata set by defining hiding rules in the settings.yaml file.
A hiding rule consists of a list of feature-regular expression pairs. A record is excluded from the opendata set if the contents of every field listed in the rule match the corresponding regex; when several hiding rules are defined, a record that matches any one of them is excluded.
A typical example is to exclude all operational monitoring data records related to specific clients, services or member types. The example below defines two hiding rules. The first rule excludes all records where the client member code is "foo" and the service member code is "bar". The second rule excludes all records where the service member class is not "GOV".
```yaml
# settings.yaml
anonymizer:
  # ...
  hiding-rules:
    -
      - feature: 'clientMemberCode'
        regex: '^(foo)$'
      - feature: 'serviceMemberCode'
        regex: '^(bar)$'
    -
      - feature: 'serviceMemberClass'
        regex: '^(?!GOV$).*$'
```
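The matching logic described above can be sketched in Python as follows. This is a minimal sketch, not the anonymizer's own implementation: it assumes each record is a flat dict keyed by feature name, that missing fields are treated as empty strings, and that `re.search` is used for matching (the example regexes are anchored with `^` and `$`, so `re.fullmatch` would behave the same). `HIDING_RULES` mirrors the YAML example; `rule_matches` and `is_hidden` are hypothetical names.

```python
import re

# Hiding rules from the YAML example: a list of rules, each rule being a
# list of {feature, regex} pairs.
HIDING_RULES = [
    [
        {"feature": "clientMemberCode", "regex": r"^(foo)$"},
        {"feature": "serviceMemberCode", "regex": r"^(bar)$"},
    ],
    [
        {"feature": "serviceMemberClass", "regex": r"^(?!GOV$).*$"},
    ],
]


def rule_matches(record: dict, rule: list) -> bool:
    """A rule matches only if every feature/regex pair matches (logical AND)."""
    for pair in rule:
        value = record.get(pair["feature"])
        text = "" if value is None else str(value)  # missing field -> "" (assumption)
        if re.search(pair["regex"], text) is None:
            return False
    return True


def is_hidden(record: dict, rules: list) -> bool:
    """A record is excluded if any one of the rules matches (logical OR)."""
    return any(rule_matches(record, rule) for rule in rules)
```

With these rules, a "GOV" record with client member code "foo" and service member code "bar" is hidden by the first rule, the same record with service member code "baz" is kept, and any record whose service member class is "COM" is hidden by the second rule.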
The anonymizer can be configured to substitute the values of selected fields in the opendata set for records that fulfill a set of conditions. These substitution rules are defined in the settings.yaml file.
A substitution rule has two parts. The first part, `conditions`, is a set of feature-regex pairs that defines the records to which the substitution applies; these conditions have the same format as the hiding rules above. The second part, `substitutes`, consists of feature-value pairs, where feature is the name of the field to be substituted and value contains the substitute string.
The example below defines two substitution rules. The first rule substitutes the client and service member codes with "N/A" if the client member code is "foo2". The second rule substitutes the message id with "0" if the client member code is "bar2" and the service member code is "foo2".
```yaml
# settings.yaml
anonymizer:
  # ...
  substitution-rules:
    - conditions:
        - feature: 'clientMemberCode'
          regex: '^foo2$'
      substitutes:
        - feature: 'clientMemberCode'
          value: 'N/A'
        - feature: 'serviceMemberCode'
          value: 'N/A'
    - conditions:
        - feature: 'clientMemberCode'
          regex: '^bar2$'
        - feature: 'serviceMemberCode'
          regex: '^foo2$'
      substitutes:
        - feature: 'messageId'
          value: '0'
```
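Substitution can be sketched in the same style: when all conditions of a rule match, each listed feature is overwritten with its substitute value. This is a hypothetical sketch, assuming flat dict records, that conditions are always evaluated against the original record (not a partially substituted copy), and that rules are applied in the order they are listed; the anonymizer's actual evaluation order is not documented here. `SUBSTITUTION_RULES` follows the two rules described above.

```python
import re

# Substitution rules following the two rules described above.
SUBSTITUTION_RULES = [
    {
        "conditions": [{"feature": "clientMemberCode", "regex": r"^foo2$"}],
        "substitutes": [
            {"feature": "clientMemberCode", "value": "N/A"},
            {"feature": "serviceMemberCode", "value": "N/A"},
        ],
    },
    {
        "conditions": [
            {"feature": "clientMemberCode", "regex": r"^bar2$"},
            {"feature": "serviceMemberCode", "regex": r"^foo2$"},
        ],
        "substitutes": [{"feature": "messageId", "value": "0"}],
    },
]


def _field_text(record: dict, feature: str) -> str:
    # Missing or null fields are matched as empty strings (assumption).
    value = record.get(feature)
    return "" if value is None else str(value)


def conditions_match(record: dict, conditions: list) -> bool:
    # Every condition must match (logical AND), as with hiding rules.
    return all(
        re.search(cond["regex"], _field_text(record, cond["feature"])) is not None
        for cond in conditions
    )


def apply_substitutions(record: dict, rules: list) -> dict:
    """Return a copy of the record with all matching substitutions applied."""
    result = dict(record)  # the input record is left untouched
    for rule in rules:
        if conditions_match(record, rule["conditions"]):
            for sub in rule["substitutes"]:
                result[sub["feature"]] = sub["value"]
    return result
```

Returning a copy keeps the original record available for the condition checks of later rules, which is what makes the evaluation order irrelevant in this sketch.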
To run the anonymizer for multiple X-Road instances, a settings profile can be created for each instance.
For example, to have profiles DEV, TEST and PROD, create three copies of the settings.yaml file named settings_DEV.yaml, settings_TEST.yaml and settings_PROD.yaml.
Then fill in the profile-specific settings in each file and use the `--profile` flag when running xroad-metrics-anonymizer. For example, to run the anonymizer manually using the TEST profile:
```bash
xroad-metrics-anonymizer --profile TEST
```
The xroad-metrics-anonymizer command searches for the settings file first in the current working directory, then in /etc/xroad-metrics/anonymizer/.
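The lookup order described above can be sketched as follows. The helper names are hypothetical and this is not the command's own implementation; the sketch assumes the default profile uses `settings.yaml` and a named profile uses `settings_<PROFILE>.yaml`, as in the DEV/TEST/PROD example, and returns the first existing candidate.

```python
from pathlib import Path
from typing import Iterable, Optional


def settings_file_name(profile: Optional[str] = None) -> str:
    """settings.yaml for the default profile, settings_<PROFILE>.yaml otherwise."""
    return f"settings_{profile}.yaml" if profile else "settings.yaml"


def find_settings_file(profile: Optional[str] = None,
                       search_dirs: Optional[Iterable] = None) -> Path:
    """Return the first existing settings file in the search directories."""
    if search_dirs is None:
        # Current working directory first, then the system-wide directory.
        search_dirs = [Path.cwd(), Path("/etc/xroad-metrics/anonymizer")]
    search_dirs = [Path(d) for d in search_dirs]
    name = settings_file_name(profile)
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"{name} not found in: {searched}")
```

For example, `find_settings_file("TEST")` returns `settings_TEST.yaml` from the current directory if it exists there, otherwise from /etc/xroad-metrics/anonymizer/, and raises `FileNotFoundError` if neither location has it.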
The anonymizer module can be executed by calling the xroad-metrics-anonymizer command.
The command should be executed as the xroad-metrics user, so change to that user:
```bash
sudo su xroad-metrics
```
Currently the following command line arguments are supported:
```bash
xroad-metrics-anonymizer --help                     # Show description of the command line arguments
xroad-metrics-anonymizer --limit <number>           # Optional flag to limit the number of records to process.
xroad-metrics-anonymizer --profile <profile name>   # Run with a non-default settings profile
```
The default installation includes a cron job in /etc/cron.d/xroad-metrics-anonymizer-cron that runs the anonymizer monthly. This job runs the anonymizer using the default settings profile (/etc/xroad-metrics/anonymizer/settings.yaml).
If you want to change the anonymizer cron job scheduling or settings profiles, edit the file, e.g. with vi:
```bash
sudo vi /etc/cron.d/xroad-metrics-anonymizer-cron
```
and make your changes. For example, to run the anonymizer twice a month (on the 1st and 15th day) using settings profiles PROD and TEST:
```
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# m h dom mon dow user command
30 15 1,15 * * xroad-metrics xroad-metrics-anonymizer --profile TEST
16 15 1,15 * * xroad-metrics xroad-metrics-anonymizer --profile PROD
```
If the anonymizer is to be run only manually, comment out the default cron task:
```
# 30 15 15 * * xroad-metrics xroad-metrics-anonymizer
```
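Each cron field has a fixed range (minute 0-59, hour 0-23, day of month 1-31, month 1-12, day of week 0-7), so a schedule such as `15 30 ...`, which asks for hour 30, can never fire. The hypothetical helper below checks the five time fields of an /etc/cron.d line before the file is installed. It is a sketch that only understands numbers, `*`, ranges, lists and steps; month and weekday names and `@daily`-style shortcuts are not handled.

```python
# (field name, lowest value, highest value) for the five cron time fields
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 0 and 7 both mean Sunday
]


def field_is_valid(field: str, low: int, high: int) -> bool:
    """Check one field made of numbers, '*', ranges 'a-b', lists and '/step'."""
    for part in field.split(","):
        base, slash, step = part.partition("/")
        if slash and (not step.isdigit() or int(step) == 0):
            return False
        if base == "*":
            continue
        start, dash, end = base.partition("-")
        if not start.isdigit() or (dash and not end.isdigit()):
            return False
        first = int(start)
        last = int(end) if dash else first
        if not low <= first <= last <= high:
            return False
    return True


def invalid_fields(cron_line: str) -> list:
    """Return the names of out-of-range time fields in an /etc/cron.d line."""
    fields = cron_line.split()
    if len(fields) < 7:  # five time fields, user name and command
        raise ValueError("an /etc/cron.d line needs 5 time fields, a user and a command")
    return [
        name
        for (name, low, high), value in zip(FIELD_RANGES, fields[:5])
        if not field_is_valid(value, low, high)
    ]
```

For example, `invalid_fields("15 30 1,15 * * xroad-metrics xroad-metrics-anonymizer --profile TEST")` returns `["hour"]`, while `30 15 1,15 * * xroad-metrics xroad-metrics-anonymizer --profile TEST` returns an empty list.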
Opendata has already been anonymized to some extent. To make sure that the data does not contain any sensitive information, the anonymizer can also process opendata: in this mode it skips the anonymization steps that are not relevant, such as field translation and masking, and applies only the hiding rules, substitutions and transformations.
To anonymize opendata, add a crontab entry to /etc/cron.d/xroad-metrics-anonymizer-cron, for example to run every 15 minutes from Monday to Friday:
```
*/15 * * * 1-5 xroad-metrics xroad-metrics-anonymizer --profile TEST --only_opendata
```
The Anonymizer module benefits from an index on the `insertTime` field when performing opendata anonymization. Refer to Indexes for details.
The settings for the log file in the settings file are the following:
```yaml
xroad:
  instance: EXAMPLE

# ...

logger:
  name: anonymizer
  module: anonymizer

  # Possible logging levels from least to most verbose are:
  # CRITICAL, FATAL, ERROR, WARNING, INFO, DEBUG
  level: INFO

  # Logs and heartbeat files are stored under these paths.
  # Also configure external log rotation and app monitoring accordingly.
  log-path: /var/log/xroad-metrics/anonymizer/logs
```
The log file is written to `log-path` and the log file name contains the X-Road instance name. The above example configuration would write logs to /var/log/xroad-metrics/anonymizer/logs/log_anonymizer_EXAMPLE.json.
The anonymizer module log handler is compatible with the logrotate utility. To configure log rotation for the example setup above, create the file:
```bash
sudo vi /etc/logrotate.d/xroad-metrics-anonymizer
```
and add the following content:
```
/var/log/xroad-metrics/anonymizer/logs/log_anonymizer_EXAMPLE.json {
    rotate 10
    size 2M
}
```
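Log rotation and monitoring configuration both need the resolved log file name. The sketch below derives it from the settings and renders the logrotate entry shown above. The naming pattern `log_<module>_<instance>.json` is inferred from the single example configuration and is an assumption (the example cannot tell whether `logger.name` or `logger.module` is used, since both are `anonymizer`); the function names are hypothetical.

```python
from pathlib import PurePosixPath


def log_file_path(settings: dict) -> PurePosixPath:
    # Assumed naming pattern, inferred from the example configuration:
    # <logger.log-path>/log_<logger.module>_<xroad.instance>.json
    logger = settings["logger"]
    instance = settings["xroad"]["instance"]
    return PurePosixPath(logger["log-path"]) / f"log_{logger['module']}_{instance}.json"


def logrotate_stanza(path: PurePosixPath, rotate: int = 10, size: str = "2M") -> str:
    # Same directives as the example /etc/logrotate.d entry.
    return f"{path} {{\n    rotate {rotate}\n    size {size}\n}}\n"


if __name__ == "__main__":
    example = {
        "xroad": {"instance": "EXAMPLE"},
        "logger": {"name": "anonymizer", "module": "anonymizer",
                   "level": "INFO", "log-path": "/var/log/xroad-metrics/anonymizer/logs"},
    }
    print(logrotate_stanza(log_file_path(example)), end="")
```

Generating the logrotate entry from the same settings avoids the two drifting apart when a profile's instance name or log path changes.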
For further log rotation options, please refer to the logrotate manual:
```bash
man logrotate
```
The settings for the heartbeat file in the settings file are the following:
```yaml
xroad:
  instance: EXAMPLE

# ...

logger:
  # ...
  heartbeat-path: /var/log/xroad-metrics/anonymizer/heartbeat
```
The heartbeat file is written to `heartbeat-path` and the heartbeat file name contains the X-Road instance name. The above example configuration would write the heartbeat file to /var/log/xroad-metrics/anonymizer/heartbeat/heartbeat_anonymizer_EXAMPLE.json.
The heartbeat file contains the last message of the log file and a status:
- status: possible values "FAILED", "SUCCEEDED"
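An external monitoring check can read the heartbeat file and raise an alert when the status is not "SUCCEEDED". The sketch below is hypothetical: it assumes the heartbeat file is a single JSON object with a top-level `status` key, and builds the path with the `heartbeat_<module>_<instance>.json` pattern inferred from the example above. Check a real heartbeat file before relying on either assumption.

```python
import json
from pathlib import Path


def heartbeat_file_path(heartbeat_dir: str, module: str, instance: str) -> Path:
    # Assumed naming pattern, inferred from the example configuration:
    # <heartbeat-path>/heartbeat_<module>_<instance>.json
    return Path(heartbeat_dir) / f"heartbeat_{module}_{instance}.json"


def heartbeat_ok(path: Path) -> bool:
    """True only if the file exists, parses as a JSON object and reports SUCCEEDED."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # A missing, unreadable or malformed heartbeat counts as a failure.
        return False
    return isinstance(data, dict) and data.get("status") == "SUCCEEDED"


if __name__ == "__main__":
    hb = heartbeat_file_path("/var/log/xroad-metrics/anonymizer/heartbeat", "anonymizer", "EXAMPLE")
    print(hb, "OK" if heartbeat_ok(hb) else "FAILED or missing")
```

Treating a missing or unparsable file as a failure means that a run which never wrote its heartbeat is reported as well.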
Metrics statistics is an executable script (xroad-metrics-statistics) that calculates useful statistical data about X-Road Metrics.
The gathered data is stored in the database.
The Opendata module has an API endpoint, api/statistics, for viewing this data.
Before viewing statistics data, make sure you have installed and configured the Database_Module and created the database credentials. See Database_Module.
Add a cron job entry to calculate metrics statistics regularly. The example below runs every minute; adjust the schedule as needed:
```
* * * * * xroad-metrics-statistics --profile TEST
```
This task calculates the statistical data and stores it in the database.
To only print this data to the output without storing it in the database, use the optional parameter `--output_only`:
```bash
xroad-metrics-statistics --profile TEST --output_only
```