Merge pull request #11 from CindyKiran/feature-ELEBUILD-140
[ELEBUILD-140] Anonymizer script should not abort when it …
Shuyinsama authored Jul 20, 2023
2 parents 39edbfd + 6ca058f commit 56d5926
Showing 5 changed files with 119 additions and 48 deletions.
78 changes: 56 additions & 22 deletions README.md
@@ -1,70 +1,104 @@
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

## Eleven Anonymizer

This Anonymizer program is based on [DivanteLtd/Anonymizer](https://github.com/DivanteLtd/anonymizer).

This version is written in [Typescript](https://www.typescriptlang.org/) and [Deno](https://deno.land) and can be built into an executable.
This version is written in [Typescript](https://www.typescriptlang.org/) and [Deno](https://deno.land) and can be built
into an executable.

### Usage
Keep in mind that this tool performs actions on the database. Please test it first before committing any changes to the database.

Keep in mind that this tool performs actions on the database. Please test it first before
committing any changes to the database.
There is no warranty for using this script, use at your own risk.

### Why make this?
While the original Anonymizer by DivanteLtd works fine it needs much more work to be installed and implemented on our computers/servers.
We wanted the functionality used in that version, but be able to just call the Anonymizer from anywhere and point to the needed configurations.

Also by making most of the configurations as command line arguments we are able to use the same Anonymizer and run it on a DTAP environment without creating multiple configuration files.
While the original Anonymizer by DivanteLtd works fine it needs much more work to be installed and implemented on our
computers/servers.
We wanted the functionality used in that version, but be able to just call the Anonymizer from anywhere and point to the
needed configurations.

Also by making most of the configurations as command line arguments we are able to use the same Anonymizer and run it on
a DTAP environment without creating multiple configuration files.

### Why Typescript and Deno

Typescript is something we are more comfortable with in terms of implementation.
As a trial we wanted to make an executable CLI script. This is something that Typescript + Node could give us. Initially this project was made using Typescript + Node, but after a while we chose Deno as it was a much better fit for this project.
The benefit of using Deno over Node is that Deno is secure by default. Things like file access and network access are disabled by default and can be enabled explicitly. Even when enabled, they can be restricted to specific locations or hosts/IPs/ports.
As a trial we wanted to make an executable CLI script. This is something that Typescript + Node could give us. Initially
this project was made using Typescript + Node, but after a while we chose Deno as it was a much better fit for this
project.
The benefit of using Deno over Node is that Deno is secure by default. Things like file access and network access are
disabled by default and can be enabled explicitly. Even when enabled, they can be restricted to specific locations or
hosts/IPs/ports.

Since this script's goal is to anonymize databases, security should be one of the main focuses.

### Making sure the integrity of the imports is correct
Deno can verify that imported packages have not changed on the server since they were locked, so modified code is never retrieved silently.

Deno can verify that imported packages have not changed on the server since they were locked, so modified code is
never retrieved silently.
That's why we also have a `lock.json` in the project.

When this project is opened for the first time on a new computer please run:

```deno cache --lock=lock.json src/deps.ts```

This will make sure that the correct versions are downloaded into the computer's cache, where each import is integrity checked.
This will make sure that the correct versions are downloaded into the computer's cache, where each import is integrity
checked.

### How to build to an executable
To compile the executable, the following arguments need to be set during compilation; otherwise the executable will not be able to run correctly:

To compile the executable, the following arguments need to be set during compilation; otherwise the executable will not be
able to run correctly:

```deno compile --allow-read --allow-net --allow-env=ANONYMIZER_LOCAL_HOSTNAME,ANONYMIZER_LOCAL_PORT,ANONYMIZER_LOCAL_DATABASE,ANONYMIZER_LOCAL_USERNAME,ANONYMIZER_LOCAL_PASSWORD,ANONYMIZER_CONFIG,ANONYMIZER_LOCAL_CONNECTION_TIMEOUT,FAKER_LOCALE --output=build/anonymizer src/anonymizer.ts```

It is recommended to limit the `--allow` flags like for example `--allow-read=/var/www` and `--allow-net=127.0.0.1`.

### How to use

Using the executable can be done as follows

```ANONYMIZER_LOCAL_DATABASE=<db_name> ANONYMIZER_LOCAL_USERNAME=<db_user> ANONYMIZER_LOCAL_PASSWORD=<db_pass> ANONYMIZER_CONFIG=<path/to/json/config/file> FAKER_LOCALE=<isolang> anonymizer```
```ANONYMIZER_LOCAL_DATABASE=<db_name> ANONYMIZER_LOCAL_USERNAME=<db_user> ANONYMIZER_LOCAL_PASSWORD=<db_password> ANONYMIZER_CONFIG=<path/to/json/config/file> FAKER_LOCALE=<isolang> anonymizer```

### Use the example to test

You can use the provided `example.sql` and `example.json` to verify the tool

1. Use the `example.sql` to create an `example` database with a `dummy_data` table and some entries
2. Create an `example` user and password and only grant it permissions to the `example` database.
3. Then run the Anonymizer using: `ANONYMIZER_LOCAL_DATABASE=example ANONYMIZER_LOCAL_USERNAME=example ANONYMIZER_LOCAL_PASSWORD=example ANONYMIZER_CONFIG=<path/to/example.json> FAKER_LOCALE=en anonymizer`
1. Import `example.sql` into your database, with the name `example`, to obtain dummy data
2. Create a user and password in your database and only grant it permissions to this database. In this case the user and password are both `example`
3. Then run, based on your database name, user and password, the Anonymizer using:
```bash
ANONYMIZER_LOCAL_DATABASE=example ANONYMIZER_LOCAL_USERNAME=example ANONYMIZER_LOCAL_PASSWORD=example ANONYMIZER_CONFIG=./example/example.json FAKER_LOCALE=en anonymizer
```
4. The following ENV variables are optional
* `ANONYMIZER_LOCAL_HOSTNAME` sets the hostname of MySQL (defaults to `127.0.0.1`)
* `ANONYMIZER_LOCAL_PORT` sets the port number of MySQL (defaults to `3306` )
* `ANONYMIZER_LOCAL_CONNECTION_TIMEOUT` sets the DB connection timeout in seconds (defaults to `60` )
* `FAKER_LOCALE` sets the faker locale (defaults to `en` )
* `ANONYMIZER_LOCAL_HOSTNAME` sets the hostname of MySQL (defaults to `127.0.0.1`)
* `ANONYMIZER_LOCAL_PORT` sets the port number of MySQL (defaults to `3306` )
* `ANONYMIZER_LOCAL_CONNECTION_TIMEOUT` sets the DB connection timeout in seconds (defaults to `60` )
* `FAKER_LOCALE` sets the faker locale (defaults to `en` )
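
The defaults listed above could be applied along the following lines. This is only a sketch: `resolveOptionalSettings` and the `OptionalSettings` shape are hypothetical names chosen for illustration, not the contents of `src/database/config.ts`, and the environment is passed in as a plain record so the function can be tried without any Deno permissions.

```typescript
// Hypothetical shape for the four optional settings described above.
interface OptionalSettings {
  hostname: string;
  port: number;
  connectionTimeoutSeconds: number;
  fakerLocale: string;
}

// Resolve the optional settings from an environment-like record,
// falling back to the defaults documented in the README.
function resolveOptionalSettings(env: Record<string, string | undefined>): OptionalSettings {
  return {
    hostname: env.ANONYMIZER_LOCAL_HOSTNAME ?? '127.0.0.1',
    port: Number(env.ANONYMIZER_LOCAL_PORT ?? '3306'),
    connectionTimeoutSeconds: Number(env.ANONYMIZER_LOCAL_CONNECTION_TIMEOUT ?? '60'),
    fakerLocale: env.FAKER_LOCALE ?? 'en',
  };
}

// Example: only the port is overridden; everything else uses its default.
const exampleSettings = resolveOptionalSettings({ ANONYMIZER_LOCAL_PORT: '3307' });
console.log(exampleSettings);
```

In Deno each value could be read with `Deno.env.get(name)`, which needs the corresponding `--allow-env` permission.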

### Unit tests
You can run the unit tests with the following command `deno test`

You can run the unit tests with the following command

``` bash
deno test --allow-env
```

You can also generate a coverage report by running the following commands

`deno test --allow-env --coverage=cov_profile`
``` bash
deno test --allow-env --coverage=cov_profile
```

`deno coverage cov_profile --lcov > cov_profile/cov_profile.lcov`
```bash
deno coverage cov_profile --lcov > cov_profile/cov_profile.lcov
```

If you have the `genhtml` package, you can generate an HTML report of the coverage

`genhtml -o cov_profile/html cov_profile/cov_profile.lcov`
```bash
genhtml -o cov_profile/html cov_profile/cov_profile.lcov
```
5 changes: 4 additions & 1 deletion example/example.json
@@ -42,8 +42,11 @@
}
},
"custom_queries": {
"before": [
"UPDATE dummy_data SET wrong_column = CONCAT(id, '@anonymizer.nl') WHERE wrong_column NOT LIKE '%@eleven.nl';"
],
"after": [
"UPDATE dummy_data SET custom = CONCAT(id, '@anonymizer.nl') WHERE custom NOT LIKE '%@eleven.nl';"
"UPDATE dummy_data SET custom_column = CONCAT(id, '@anonymizer.nl') WHERE custom_column NOT LIKE '%@eleven.nl';"
]
}
}
14 changes: 11 additions & 3 deletions example/example.sql
@@ -34,14 +34,22 @@ CREATE TABLE `dummy_data` (
`city` varchar(250) DEFAULT NULL,
`email` varchar(250) DEFAULT NULL,
`telephone` varchar(250) DEFAULT NULL,
`custom` varchar(250) DEFAULT NULL,
`custom_column` varchar(250) DEFAULT NULL, /*Column for testing custom queries*/
`ignored_column` varchar(250) DEFAULT NULL, /*Column to test if script still runs if column does exist in database, but not mentioned in config file*/
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
);

/*Table with empty columns to test if script still runs if table does exist in database, but not mentioned in config file*/
CREATE TABLE `ignored_table` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`)
);

LOCK TABLES `dummy_data` WRITE;
/*!40000 ALTER TABLE `dummy_data` DISABLE KEYS */;

INSERT INTO `dummy_data` (`id`, `username`, `first_name`, `last_name`, `street`, `city`, `email`, `telephone`, `custom`)
INSERT INTO `dummy_data` (`id`, `username`, `first_name`, `last_name`, `street`, `city`, `email`, `telephone`, `custom_column`)
VALUES
(1,'user 1','first name 1','last name 1','street 1','city 1','[email protected]','0612345678','[email protected]'),
(2,'user 2','first name 2','last name 2','street 2','city 2','[email protected]','0612345678','[email protected]'),
16 changes: 13 additions & 3 deletions src/anonymizer.ts
@@ -18,14 +18,24 @@ import {config} from './database/config.ts';
import {client} from './database/connection.ts';
import {executeCustomQueries, runQueriesFromConfig} from './database/transactions.ts';

const errors: Error[] = [];

// Do the first custom queries
await executeCustomQueries(config.custom_queries.before);
await executeCustomQueries(config.custom_queries.before, errors);

// Do the other queries specified in the config
await runQueriesFromConfig();
await runQueriesFromConfig(errors);

// Do the final custom queries
await executeCustomQueries(config.custom_queries.after);
await executeCustomQueries(config.custom_queries.after, errors);

// Done! Close the connection.
await client.close();

if (errors.length > 0) {
console.log('------------------------------------------Error Report------------------------------------------');

for (const error of errors) {
console.log(' [error] ' + error.message);
}
}
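
The pattern this change introduces, running every step, collecting failures in a shared list, and reporting them once at the end, can be sketched in isolation. The names `Step`, `runAllSteps` and `formatErrorReport` are hypothetical and no database is involved; unlike the commit's code, this sketch also keeps the original failure reason in the collected message.

```typescript
// Hypothetical step type: a label plus an async action that may throw.
type Step = { label: string; run: () => Promise<void> };

// Run every step in order. A failing step is recorded in `errors`
// instead of aborting the remaining steps.
async function runAllSteps(steps: Step[], errors: Error[]): Promise<void> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (e) {
      const reason = e instanceof Error ? e.message : String(e);
      errors.push(new Error(`Failed to execute ${step.label}: ${reason}`));
    }
  }
}

// Turn the collected errors into report lines, using the same
// ' [error] ' prefix as the report printed in anonymizer.ts.
function formatErrorReport(errors: Error[]): string[] {
  return errors.map((error) => ' [error] ' + error.message);
}

// Demo: the second step fails, yet the third one still runs.
const collectedErrors: Error[] = [];
runAllSteps(
  [
    { label: 'step one', run: async () => console.log('step one ok') },
    { label: 'step two', run: async () => { throw new Error('simulated failure'); } },
    { label: 'step three', run: async () => console.log('step three ok') },
  ],
  collectedErrors,
).then(() => {
  for (const line of formatErrorReport(collectedErrors)) {
    console.log(line);
  }
});
```

Passing the `errors` array in from the caller, as the commit does, lets several phases (custom queries before, config queries, custom queries after) share one report.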
54 changes: 35 additions & 19 deletions src/database/transactions.ts
@@ -24,13 +24,17 @@ import {parseRowConfig, RowConfig, truncate, updateToFakerValue, updateToStaticV
*
* @param queries
*/
const executeCustomQueries = async (queries: string[]): Promise<void> => {
const executeCustomQueries = async (queries: string[], errors: Error[]): Promise<void> => {
if (!queries || queries.length < 1) {
return;
}

for (const query of queries) {
await executeCustomQuery(query);
try {
await executeCustomQuery(query);
} catch (_e) {
errors.push(new Error(`Failed to execute custom query: ${query}`));
}
}
};

@@ -42,11 +46,19 @@ const executeCustomQueries = async (queries: string[]): Promise<void> => {
const executeCustomQuery = async (query: string): Promise<void> => {
await client.transaction(async (conn) => {
return await conn.query(query);
}).catch(() => {
throw new Error();
});
};

async function getDatabaseTables() {
return (await client.execute('SHOW TABLES;')).rows.map((row) => {
async function getDatabaseTables(): Promise<string[]> {
const tables = await client.execute('SHOW TABLES;');

if (tables === undefined || tables.rows === undefined || tables.rows.length === 0) {
return [];
}

return tables.rows.map((row) => {
return row[`Tables_in_${client.config.db}`]
});
}
@@ -85,21 +97,30 @@ async function getDatabaseTableColumns(table: string) {
* Only process columns that exist in the database and are specified in the config file.
* Also skip the 'id' column because it's the primary key and thereby not allowed to be anonymized.
*/
async function getColumnsToBeProcessed(table: string, configColumns: Record<string, Column>) {
return (await getDatabaseTableColumns(table)).rows.map(row => row['COLUMN_NAME'])
async function getColumnsToBeProcessed(table: string, configColumns: Record<string, Column>): Promise<string[]> {
const columns = await getDatabaseTableColumns(table);

if (columns === undefined || columns.rows === undefined || columns.rows.length === 0) {
return [];
}

return columns.rows.map(row => row['COLUMN_NAME'])
.filter(column => column !== 'id')
.filter(column => Object.keys(configColumns).includes(column));
}

/**
* Check whether the tables specified in the config file exist in the database; if not, record an error.
* @param configTables tables as defined in the config file (i.e. the json file in anonymizer folder)
* @param databaseTables tables found in the database
* @param toBeProcessedTables tables found in the database
* @param errors collects validation errors so they can be reported after all queries have run
*/
function validateConfigTables(configTables: Record<string, Table>, databaseTables: string[], errors: Error[]) {
function validateConfigTables(configTables: Record<string, Table>, toBeProcessedTables: string[], errors: Error[]) {
if (toBeProcessedTables.length === 0) {
errors.push(new Error(`No tables found in database '${client.config.db}'`));
}
Object.keys(configTables)
.filter(table => !databaseTables.includes(table))
.filter(table => !toBeProcessedTables.includes(table))
.forEach(table => errors.push(new Error(`Given table '${table}' does not exist in the database`)));
}

@@ -111,6 +132,10 @@ function validateConfigTables(configTables: Record<string, Table>, databaseTable
* @param table the name of the table
*/
function validateConfigColumns(configColumns: Record<string, Column>, toBeProcessedColumns: string[], errors: Error[], table: string) {
if (toBeProcessedColumns.length === 0) {
errors.push(new Error(`No columns found in database table '${table}'`));
}

Object.keys(configColumns)
.filter(column => !toBeProcessedColumns.includes(column))
.forEach(column => errors.push(new Error(`Given column '${column}' does not exist in database table '${table}'`)));
@@ -119,10 +144,9 @@ function validateConfigColumns(configColumns: Record<string, Column>, toBeProces
/**
* Run the queries specified in the JSON config file.
*/
const runQueriesFromConfig = async () => {
const runQueriesFromConfig = async (errors: Error[]) => {
console.time('Anonymizer done in: ');

const errors: Error[] = [];
const configTables: Record<string, Table> = config.tables;
const databaseTables: string[] = await getDatabaseTables();
const toBeProcessedTables: string[] = Object.keys(configTables).filter(table => databaseTables.includes(table));
@@ -143,14 +167,6 @@ const runQueriesFromConfig = async () => {

await client.execute('DROP TABLE IF EXISTS `ANONYMIZER_JOIN_TABLE`;');
console.timeEnd('Anonymizer done in: ');

if (errors.length > 0) {
console.log('------------------------------------------Error Report------------------------------------------');

for (const error of errors) {
console.log(' [error] ' + error.message);
}
}
};

export {executeCustomQueries, runQueriesFromConfig};

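
The table validation above boils down to comparing the names in the config file with the names found in the database. A minimal standalone sketch of that comparison follows; `partitionConfiguredNames` and `missingTableErrors` are hypothetical helpers, and `typo_table` is an invented name used only to trigger the error case.

```typescript
// Split the configured names into those that exist in the database
// and those that do not.
function partitionConfiguredNames(
  configured: string[],
  existing: string[],
): { toProcess: string[]; missing: string[] } {
  const known = new Set(existing);
  const toProcess = configured.filter((name) => known.has(name));
  const missing = configured.filter((name) => !known.has(name));
  return { toProcess, missing };
}

// Build one error per configured table that the database does not contain,
// using the same message as validateConfigTables.
function missingTableErrors(missing: string[]): Error[] {
  return missing.map((table) => new Error(`Given table '${table}' does not exist in the database`));
}

// Example: `ignored_table` exists in the database but is not configured,
// so it is neither processed nor reported.
const partition = partitionConfiguredNames(
  ['dummy_data', 'typo_table'],
  ['dummy_data', 'ignored_table'],
);
console.log(partition.toProcess);
console.log(missingTableErrors(partition.missing).map((error) => error.message));
```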