X-ROAD European Union / European Regional Development Fund / Investing in your future

X-Road Metrics - Anonymizer Module

License

This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/

About

The Anonymizer module is one of the modules that make up X-Road Metrics.

The Anonymizer module is responsible for preparing the operational monitoring data for publication through the Opendata module. The Anonymizer configuration allows the X-Road Metrics extension administrator to set fine-grained rules for excluding whole operational monitoring data records or for modifying selected data fields before the data is published.

As input, the anonymizer module uses the operational monitoring data that the Corrector module has prepared and stored in MongoDb. The anonymizer processes the data using the configured ruleset and stores the output in the opendata PostgreSQL database for publication.

Architecture

Anonymizer prepares data for the Opendata module. An overview of the module architecture related to publishing operational monitoring data through the Opendata module is shown in the diagram below: (system diagram)

Networking

MongoDb is used to store "non-anonymized" operational monitoring data that should be accessible only to the X-Road Metrics administrators. Anonymized operational monitoring data that can be published to a wider audience is stored in PostgreSQL. The Opendata UI needs access only to PostgreSQL. To follow the "principle of least privilege", it is recommended to install the Opendata UI on a dedicated host that has no access at all to MongoDb. The Anonymizer module, however, also needs access to the "not-public" data, so it should run on a host that has access to both MongoDb and PostgreSQL.

The anonymizer module provides no incoming network interfaces.

Database (PostgreSQL) setup

See Opendata database

Establish encrypted SSL/TLS client connection

For a connection to be known SSL-secured, SSL usage must be configured on both the client and the server before the connection is made. If it is only configured on the server, the client may end up sending sensitive information before it knows that the server requires high security.

To ensure secure connections, the ssl-mode and ssl-root-cert parameters have to be provided in the settings file. Possible values for ssl-mode are: disable, allow, prefer, require, verify-ca, verify-full. For detailed information see https://www.postgresql.org/docs/current/libpq-ssl.html

To configure the path to the SSL root certificate, set ssl-root-cert.

Example of /etc/settings.yaml entry:

postgres:
  host: localhost
  port: 5432
  user: postgres
  password: *******
  database-name: postgres
  table-name: logs
  ssl-mode: verify-full
  ssl-root-cert: /etc/ssl/certs/root.crt
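
To illustrate how these settings relate to the libpq parameters described in the PostgreSQL documentation linked above, here is a minimal Python sketch that turns a `postgres` settings section into a libpq keyword/value connection string. The function `build_dsn` and the key mapping are assumptions made for illustration and are not taken from the anonymizer source code; `sslmode` and `sslrootcert` are the standard libpq parameter names.

```python
# Hypothetical helper for illustration only; not part of X-Road Metrics.

def _quote(value) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    text = str(value)
    if text == "" or any(ch in text for ch in " '\\"):
        # libpq expects single quotes, with backslash-escaped ' and \
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def build_dsn(pg: dict) -> str:
    """Map a 'postgres' settings section to a libpq connection string."""
    params = {
        "host": pg["host"],
        "port": pg.get("port", 5432),
        "user": pg["user"],
        "password": pg["password"],
        "dbname": pg["database-name"],
    }
    # The SSL keys are only added when they are present in the settings.
    if pg.get("ssl-mode"):
        params["sslmode"] = pg["ssl-mode"]
    if pg.get("ssl-root-cert"):
        params["sslrootcert"] = pg["ssl-root-cert"]
    return " ".join(f"{key}={_quote(val)}" for key, val in params.items())


if __name__ == "__main__":
    example = {
        "host": "localhost",
        "port": 5432,
        "user": "postgres",
        "password": "example-password",  # placeholder value
        "database-name": "postgres",
        "ssl-mode": "verify-full",
        "ssl-root-cert": "/etc/ssl/certs/root.crt",
    }
    print(build_dsn(example))
```

With ssl-mode set to verify-full, libpq verifies both the certificate chain against the root certificate and that the server host name matches the certificate, so the host setting must be the name that appears in the server certificate.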

Installation

This section describes the necessary steps to install the anonymizer module on an Ubuntu 20.04 or Ubuntu 22.04 Linux host. For a complete overview of the different modules and machines, please refer to the System Architecture documentation.

Add X-Road Extensions Package Repository for Ubuntu

wget -qO - https://artifactory.niis.org/api/gpg/key/public | sudo apt-key add -
sudo add-apt-repository 'https://artifactory.niis.org/xroad-extensions-release-deb main'

The following information can be used to verify the key:

  • key hash: 935CC5E7FA5397B171749F80D6E3973B
  • key fingerprint: A01B FE41 B9D8 EAF4 872F A3F1 FB0D 532C 10F6 EC5B
  • 3rd party key server: Ubuntu key server

Install Anonymizer Package

To install xroad-metrics-anonymizer and all dependencies execute the commands below:

sudo apt-get update
sudo apt-get install xroad-metrics-anonymizer

The installation package automatically installs the following items:

  • xroad-metrics-anonymizer command
  • Linux user named xroad-metrics and group xroad-metrics
  • configuration files:
    • /etc/xroad-metrics/anonymizer/settings.yaml
    • /etc/xroad-metrics/anonymizer/field_data.yaml
    • /etc/xroad-metrics/anonymizer/field_translations.yaml
  • cron job /etc/cron.d/xroad-metrics-anonymizer-cron to run anonymizer periodically
  • log folders under /var/log/xroad-metrics/anonymizer/

Only the xroad-metrics user can access the settings files and run the xroad-metrics-anonymizer command.

To use anonymizer, you need to fill in your X-Road, MongoDb and PostgreSQL configuration in the settings file. The next chapter has detailed instructions on how to configure the anonymizer module.

Usage

Anonymizer General Settings

Before configuring the Anonymizer module, make sure you have done the following:

  • installed and configured the Database_Module
  • created the MongoDB user accounts
  • installed and configured the Opendata database
  • created the Opendata database user accounts

To use anonymizer, you need to fill in your X-Road, MongoDB and PostgreSQL configuration in the settings file. Open the file in a text editor (vi is used here):

sudo vi /etc/xroad-metrics/anonymizer/settings.yaml

Settings that the user must fill in:

  • X-Road instance name
  • mongodb host
  • username and password for the anonymizer module MongoDB user
  • host and port of the PostgreSQL server
  • username and password for the anonymizer PostgreSQL user
  • name of the PostgreSQL database where the anonymized data is stored
  • list of PostgreSQL users that should have read-only access to the anonymized data

The read-only PostgreSQL users should be the users that the Opendata-UI and Networking modules use to read data from PostgreSQL.

Configuration of Hiding Rules

Anonymizer can be configured to hide (exclude) whole data records from the opendata set by defining hiding rules in the settings.yaml file.

A hiding rule consists of a list of feature and regular expression pairs, where feature is the name of a field in the record. If the contents of every listed field match the corresponding regex, the record is excluded from the opendata set.

A typical example is to exclude all operational monitoring data records related to specific clients, services or member types. The example below defines two hiding rules. The first rule excludes all records where the client member code is "foo" and the service member code is "bar". The second rule excludes all records where the service member class is not "GOV".

# settings.yaml
anonymizer:

  ...

  hiding-rules:
    -
      - feature: 'clientMemberCode'
        regex: '^(foo)$'
      - feature: 'serviceMemberCode'
        regex: '^(bar)$'

    -
      - feature: 'serviceMemberClass'
        regex: '^(?!GOV$).*$'
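
The matching logic described above can be sketched in Python as follows. This is not the anonymizer's actual implementation: the function names, the use of `re.search`, and the choice to treat a missing field as an empty string are all assumptions made for illustration. The rule list mirrors the YAML example once it has been parsed.

```python
import re

# The two hiding rules from the example above, as parsed from YAML.
HIDING_RULES = [
    [
        {"feature": "clientMemberCode", "regex": "^(foo)$"},
        {"feature": "serviceMemberCode", "regex": "^(bar)$"},
    ],
    [
        {"feature": "serviceMemberClass", "regex": "^(?!GOV$).*$"},
    ],
]


def field_as_text(record: dict, feature: str) -> str:
    """Assumption: a missing or null field is matched as an empty string."""
    value = record.get(feature)
    return "" if value is None else str(value)


def rule_matches(rule: list, record: dict) -> bool:
    """A rule matches only if every feature/regex pair in it matches."""
    return all(
        re.search(pair["regex"], field_as_text(record, pair["feature"]))
        for pair in rule
    )


def is_hidden(record: dict, rules: list = HIDING_RULES) -> bool:
    """A record is excluded as soon as any single rule matches it."""
    return any(rule_matches(rule, record) for rule in rules)


if __name__ == "__main__":
    record = {
        "clientMemberCode": "foo",
        "serviceMemberCode": "bar",
        "serviceMemberClass": "GOV",
    }
    print(is_hidden(record))
```

Note that a negated rule such as the second one also matches an empty value, so under the assumption made here a record without a service member class would be hidden as well.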

Configuration of Substitution Rules

Anonymizer can be configured to substitute the values of selected fields in the opendata set for records that fulfill a set of conditions. These substitution rules are defined in the settings.yaml file.

A substitution rule has two parts. First, the conditions part defines the set of records to which the substitution applies. These conditions have the same format as the hiding rules above. Second, the substitutes part consists of feature-value pairs, where feature is the name of the field to be substituted and value contains the substitute string.

The example below defines two substitution rules. The first rule substitutes the client and service member codes with "N/A" if the client member code is "foo2". The second rule substitutes the message id with 0 if the client member code is "bar2" and the service member code is "foo2".

# settings.yaml
anonymizer:

  ...

  substitution-rules:
    - conditions:
        - feature: 'clientMemberCode'
          regex: '^foo2$'

      substitutes:
        - feature: 'clientMemberCode'
          value: 'N/A'
        - feature: 'serviceMemberCode'
          value: 'N/A'

    - conditions:
        - feature: 'clientMemberCode'
          regex: '^bar2$'
        - feature: 'serviceMemberCode'
          regex: '^foo2$'

      substitutes:
        - feature: 'messageId'
          value: '0'
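
As a rough illustration of the behaviour described in the text above, the following Python sketch applies the two substitution rules to a record. It is a sketch under stated assumptions, not the anonymizer's own code: the function name `apply_substitutions` is hypothetical, `re.search` is assumed for matching, and conditions are assumed to be evaluated against the original record rather than against values already substituted by an earlier rule.

```python
import re

# The two substitution rules as described in the text, as parsed from YAML.
SUBSTITUTION_RULES = [
    {
        "conditions": [
            {"feature": "clientMemberCode", "regex": "^foo2$"},
        ],
        "substitutes": [
            {"feature": "clientMemberCode", "value": "N/A"},
            {"feature": "serviceMemberCode", "value": "N/A"},
        ],
    },
    {
        "conditions": [
            {"feature": "clientMemberCode", "regex": "^bar2$"},
            {"feature": "serviceMemberCode", "regex": "^foo2$"},
        ],
        "substitutes": [
            {"feature": "messageId", "value": "0"},
        ],
    },
]


def conditions_hold(conditions: list, record: dict) -> bool:
    """All conditions must match (missing fields are treated as '')."""
    return all(
        re.search(cond["regex"], str(record.get(cond["feature"], "")))
        for cond in conditions
    )


def apply_substitutions(record: dict, rules: list = SUBSTITUTION_RULES) -> dict:
    """Return a copy of the record with all matching substitutions applied."""
    result = dict(record)
    for rule in rules:
        # Assumption: conditions are checked against the original record.
        if conditions_hold(rule["conditions"], record):
            for sub in rule["substitutes"]:
                result[sub["feature"]] = sub["value"]
    return result


if __name__ == "__main__":
    print(apply_substitutions(
        {"clientMemberCode": "foo2", "serviceMemberCode": "abc", "messageId": "42"}
    ))
```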

Settings Profiles

To run anonymizer for multiple X-Road instances, a settings profile can be created for each instance. For example, to have profiles DEV, TEST and PROD, create three copies of the settings.yaml file named settings_DEV.yaml, settings_TEST.yaml and settings_PROD.yaml. Then fill in the profile-specific settings in each file and use the --profile flag when running xroad-metrics-anonymizer. For example, to run anonymizer manually using the TEST profile:

xroad-metrics-anonymizer --profile TEST

The xroad-metrics-anonymizer command searches for the settings file first in the current working directory, then in /etc/xroad-metrics/anonymizer/.
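
The lookup order described above can be sketched in Python as follows. The function `find_settings_file` and its `search_dirs` parameter are hypothetical and exist only for this illustration; the file naming (settings.yaml or settings_<profile>.yaml) and the search order are taken from the text.

```python
from pathlib import Path
from typing import Iterable, Optional

# Default configuration directory named in this document.
DEFAULT_DIR = Path("/etc/xroad-metrics/anonymizer")


def find_settings_file(
    profile: Optional[str] = None,
    search_dirs: Optional[Iterable[Path]] = None,
) -> Optional[Path]:
    """Return the first matching settings file, or None if none exists."""
    filename = f"settings_{profile}.yaml" if profile else "settings.yaml"
    if search_dirs is None:
        # Current working directory first, then the default directory.
        search_dirs = [Path.cwd(), DEFAULT_DIR]
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_settings_file("TEST"))
```

One practical consequence of this order is that a settings file in the directory where the command is started takes precedence over the one under /etc, which is worth keeping in mind when running the command manually.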

Manual usage

The anonymizer module can be executed by calling the xroad-metrics-anonymizer command. The command should be executed as the user xroad-metrics, so change to that user:

sudo su xroad-metrics

Currently the following command line arguments are supported:

xroad-metrics-anonymizer --help                     # Show description of the command line arguments
xroad-metrics-anonymizer --limit <number>           # Optional flag to limit the number of records to process.
xroad-metrics-anonymizer --profile <profile name>   # Run with a non-default settings profile

Cron settings

The default installation includes a cronjob in /etc/cron.d/xroad-metrics-anonymizer-cron that runs anonymizer monthly. This job runs anonymizer using the default settings profile (/etc/xroad-metrics/anonymizer/settings.yaml).

If you want to change the anonymizer cronjob scheduling or settings profiles, edit the file, e.g. with vi

vi /etc/cron.d/xroad-metrics-anonymizer-cron

and make your changes. For example, to run anonymizer twice a month (on the 1st and 15th) using settings profiles TEST and PROD:

SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# m   h  dom    mon dow  user       command
  30  15  1,15   *   *   xroad-metrics      xroad-metrics-anonymizer --profile TEST
  15  16  1,15   *   *   xroad-metrics      xroad-metrics-anonymizer --profile PROD

If anonymizer is to be run only manually, comment out the default cron task by adding a # at the start of its line:

# 15 30 15 * * xroad-metrics xroad-metrics-anonymizer

Opendata Anonymization

The data in the opendata database has already been anonymized once. To make sure that it does not contain any sensitive information, anonymizer can also be run on the opendata itself. In this mode, the steps that are not relevant for already published data, such as field translation and masking, are skipped, and only hiding rules, substitution and transformation are applied.

To anonymize opendata, add a crontab entry to /etc/cron.d/xroad-metrics-anonymizer-cron:

*/15  *  *  *  1-5  xroad-metrics  xroad-metrics-anonymizer --profile TEST --only_opendata

Database indexes

The anonymizer module benefits from an index on insertTime when performing opendata anonymization. Refer to Indexes.

Monitoring and Status

Logging

The settings for the log file in the settings file are the following:

xroad:
  instance: EXAMPLE

#  ...

logger:
  name: anonymizer
  module: anonymizer

  # Possible logging levels from least to most verbose are:
  # CRITICAL, FATAL, ERROR, WARNING, INFO, DEBUG
  level: INFO

  # Logs and heartbeat files are stored under these paths.
  # Also configure external log rotation and app monitoring accordingly.
  log-path: /var/log/xroad-metrics/anonymizer/logs

The log file is written to log-path and log file name contains the X-Road instance name. The above example configuration would write logs to /var/log/xroad-metrics/anonymizer/logs/log_anonymizer_EXAMPLE.json.
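
The way the log file path follows from the settings above can be sketched in Python. The function `log_file_path` is hypothetical, and it is an assumption that the "anonymizer" part of the file name comes from the logger name setting (in the example both name and module are "anonymizer", so the source does not say which one is used).

```python
from pathlib import PurePosixPath

# Settings from the example above, as they would look after YAML parsing.
SETTINGS = {
    "xroad": {"instance": "EXAMPLE"},
    "logger": {
        "name": "anonymizer",
        "module": "anonymizer",
        "level": "INFO",
        "log-path": "/var/log/xroad-metrics/anonymizer/logs",
    },
}


def log_file_path(settings: dict) -> PurePosixPath:
    """Compose <log-path>/log_<name>_<instance>.json from the settings."""
    logger = settings["logger"]
    instance = settings["xroad"]["instance"]
    # Assumption: the middle part of the file name is logger.name.
    return PurePosixPath(logger["log-path"]) / f"log_{logger['name']}_{instance}.json"


if __name__ == "__main__":
    print(log_file_path(SETTINGS))
```

Because the instance name is part of the file name, each settings profile writes to its own log file, and log rotation has to be configured for each of them.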

The anonymizer module log handler is compatible with the logrotate utility. To configure log rotation for the example setup above, create the file:

sudo vi /etc/logrotate.d/xroad-metrics-anonymizer

and add the following content:

/var/log/xroad-metrics/anonymizer/logs/log_anonymizer_EXAMPLE.json {
    rotate 10
    size 2M
}

For further log rotation options, please refer to the logrotate manual:

man logrotate

Heartbeat

The settings for the heartbeat file in the settings file are the following:

xroad:
  instance: EXAMPLE

#  ...

logger:
  #  ...
  heartbeat-path: /var/log/xroad-metrics/anonymizer/heartbeat

The heartbeat file is written to heartbeat-path, and the heartbeat file name contains the X-Road instance name. The above example configuration would write the heartbeat to /var/log/xroad-metrics/anonymizer/heartbeat/heartbeat_anonymizer_EXAMPLE.json.

The heartbeat file contains the last message of the log file and a status:

  • status: possible values "FAILED", "SUCCEEDED"
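
An external monitoring job could check the heartbeat along the following lines. This is a minimal sketch that assumes the heartbeat file is a single JSON object with a top-level "status" key; the exact layout of the file is not specified in this document, and the function `heartbeat_ok` is hypothetical.

```python
import json
from pathlib import Path

# Heartbeat path from the example configuration above.
HEARTBEAT_FILE = Path(
    "/var/log/xroad-metrics/anonymizer/heartbeat/heartbeat_anonymizer_EXAMPLE.json"
)


def heartbeat_ok(path: Path = HEARTBEAT_FILE) -> bool:
    """Return True only if the heartbeat file reports status SUCCEEDED."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A missing or unreadable heartbeat is treated as a failure.
        return False
    # Assumption: the status is stored under a top-level "status" key.
    return isinstance(data, dict) and data.get("status") == "SUCCEEDED"


if __name__ == "__main__":
    print("OK" if heartbeat_ok() else "ALERT")
```

Treating a missing or malformed file as a failure is a deliberate choice in this sketch, so that a run that never started is not reported as healthy.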

Metrics statistics

Metrics statistics is an executable script that calculates useful statistical data on Metrics. The gathered data is stored in the database. The Opendata module has an API endpoint for viewing this data at api/statistics.

Database Configuration

Before viewing statistics data, make sure you have installed and configured the Database_Module and created the database credentials. See Database_Module

Cron Settings

Add a cronjob entry to calculate metrics statistics regularly:

* * * * * xroad-metrics-statistics --profile TEST

This task calculates the statistical data and stores it in the database.

To only print this data to the output without storing it in the database, use the optional parameter --output_only:

xroad-metrics-statistics --profile TEST --output_only