feat: partition mask (#383)
* feat(partition): add venom test

* feat(partition): create empty partition mask

* feat(partition): test partition conditions

* feat(partition): exec active partition

* fix(partition): partitions must be ordered

* feat(partition): update docs
adrienaury authored Jan 15, 2025
1 parent a3e799c commit f2acba3
Showing 7 changed files with 269 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ Types of changes
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [1.30.0]

- `Added` mask `partition` to handle fields containing different types of values by applying distinct transformations

## [1.29.1]

- `Fixed` mock command ignores global seed flag
30 changes: 30 additions & 0 deletions README.md
@@ -165,6 +165,7 @@ The following types of masks can be used :
* [`replacement`](#replacement) is to mask a data with another data from the jsonline.
* [`pipe`](#pipe) is a mask to handle complex nested array structures, it can read an array as an object stream and process it with a sub-pipeline.
* [`apply`](#apply) process selected data with a sub-pipeline.
* [`partitions`](#partitions) will rely on conditions to identify specific cases.
* [`luhn`](#luhn) can generate valid numbers using the Luhn algorithm (e.g. french SIRET or SIREN).
* [`markov`](#markov) can generate pseudo text based on a sample text.
* [`findInCSV`](#findincsv) get one or multiple csv lines which matched with Json entry value from CSV files.
@@ -1069,6 +1070,35 @@ By default, if not specified otherwise, these classes will be used (input -> out

[Return to list of masks](#possible-masks)

### Partitions

[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFNMkkBaJCEAGxAGMAXGMcq7pAKwngAHXKwAWyFLgCuYjlh55CXHkmFhWUDfAjKVNJHFzYQEkHigN5e7gHdRIREgDkAbxcA6abLBIAPkgATEAAzaQZWVBQ-JGwpCFYAJRASEAAPAFkRZlFUAD0AbVxqAC8AXQBqAAFCkoqAHTr3GrLygBIogF8Op0prKjEHXT79Zm1WXDhWCQn4AE9sSoCYczh3UewrPVpDYwkoAM3rO0HnN08ZUQ5ooNCpcMjo2PiklIysnJQCgAZqAE4K9pILo9YZIAaIXqg2ijODxCZTVBfJHIlGHHjbIwmVAwAZgNEqcFDPrQsbw6ZwOYbTBAA&i=N4KABGBECGCuAuALA9gJ0gLjAbXBKApgLbQCWANgAIAmyJpAdgHQDGdkANHhJAIwBMAZgAsAVgBsnblABSyRAzAARZAUh4AuiAC+QA)

The `partitions` mask evaluates a list of conditions in order and applies the masks defined for the first case whose condition is active. Example configuration:

```yaml
  - selector:
      jsonpath: "ID"
    mask:
      partitions: # only the first active condition will execute
        - name: case1
          when: '{{ regexMatch "P[A-Z]{3}[0-9]{3}" .ID }}'
          then:
            # List of masks for case 1
            - constant: "this is case 1"
        - name: case2
          when: '{{ regexMatch "G[0-9]{11}" .ID }}'
          then:
            # List of masks for case 2
            - constant: "this is case 2"
        - name: default # a case with no "when" condition is always active
          then:
            # List of masks for unrecognized cases
            - constant: "this is another case"
```
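The first-match behaviour of this configuration can be sketched in plain Go. This is only an illustration, not PIMO's implementation: the `rule` type and the `firstActive` and `readmeRules` helpers are hypothetical, conditions are reduced to regular expressions, and each `then` list is reduced to the constant it would produce.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical stand-in for one partition: a name, an optional
// condition, and the constant that its "then" masks would produce.
type rule struct {
	name   string
	when   *regexp.Regexp // nil means no "when": the rule is always active
	result string
}

// firstActive returns the result of the first rule whose condition holds,
// or the input unchanged when no rule is active.
func firstActive(rules []rule, id string) string {
	for _, r := range rules {
		if r.when == nil || r.when.MatchString(id) {
			return r.result
		}
	}
	return id
}

// readmeRules mirrors the three cases of the configuration above.
func readmeRules() []rule {
	return []rule{
		{"case1", regexp.MustCompile(`P[A-Z]{3}[0-9]{3}`), "this is case 1"},
		{"case2", regexp.MustCompile(`G[0-9]{11}`), "this is case 2"},
		{"default", nil, "this is another case"},
	}
}

func main() {
	for _, id := range []string{"PABC123", "G12345678901", "hello"} {
		fmt.Printf("%s -> %s\n", id, firstActive(readmeRules(), id))
	}
}
```

Because the rules are tried in order, a case without a condition only makes sense in last position: anything placed after it would never run. In this sketch the patterns are unanchored, so a value that merely contains `PABC123` also selects the first case; adding `^` and `$` restricts the match to the whole value.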

[Return to list of masks](#possible-masks)

### FindInCSV

[![Try it](https://img.shields.io/badge/-Try%20it%20in%20PIMO%20Play-brightgreen)](https://cgi-fr.github.io/pimo-play/#c=G4UwTgzglg9gdgLgAQCICMKBQBbAhhAayjgHMFMkkBaJCEAGxAGMAXGMcyrpAKwngAOuFgAtkKYgDMYWbnkIRO3aklwATNUnGzlNScTUBJOAGEAygDUlyrgFcwUcSJYsBigPTuSUCCwB03qK2AEa2dGBM8CwgcP6R2O64YNje9IwQ7mgAnAAswUySkgDMAKwADGVoIADsIMElRbgATLgAHME5OVXtTUwAbO5guADu7llNTRX5Zbh91UVqJUwgTWhoM7jqOSWdIGp9TDmVZZJ9rdXuAjAEINjwfkwQwDo2lCAAHrisALLCTGIUV7cR7AZAAcgA3hCABQGD5IPyoAAqAE8BCAkBgAJRIAA+SHoMGG4CQAF9SWDAUC3rEwCjxFC-Cw0SAAPpockvV48L5MJJqazUkHgxkAOVw2Ax+MJxLAZIpVOpMRYdIZEL8cAlUpl4E51K4ipsH3RrD24mEVEY+BYVHgIC5NhEIHU4GQKtsIENyhVUGwbrAHswQA&i=N4KABGBEAuCeAOBTA+gRkgLigMwJYCdFIAacKAOwEMBbIrSAY0v1vIBNF9IQBfIA)
2 changes: 2 additions & 0 deletions internal/app/pimo/pimo.go
@@ -45,6 +45,7 @@ import (
	"github.com/cgi-fr/pimo/pkg/markov"
	"github.com/cgi-fr/pimo/pkg/model"
	"github.com/cgi-fr/pimo/pkg/parquet"
	"github.com/cgi-fr/pimo/pkg/partition"
	"github.com/cgi-fr/pimo/pkg/pipe"
	"github.com/cgi-fr/pimo/pkg/randdate"
	"github.com/cgi-fr/pimo/pkg/randdura"
@@ -343,6 +344,7 @@ func injectMaskFactories() []model.MaskFactory {
		sequence.Factory,
		sha3.Factory,
		apply.Factory,
		partition.Factory,
	}
}

7 changes: 7 additions & 0 deletions pkg/model/model.go
@@ -241,6 +241,12 @@ type ApplyType struct {
	URI string `yaml:"uri" json:"uri" jsonschema_description:"URI of the mask resource"`
}

type PartitionType struct {
	Name string     `yaml:"name" json:"name" jsonschema_description:"name of the partition"`
	When string     `yaml:"when,omitempty" json:"when,omitempty" jsonschema_description:"template to execute, if true the condition is active"`
	Then []MaskType `yaml:"then" json:"then" jsonschema_description:"list of masks to execute if the condition is active"`
}
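To illustrate what this type accepts, here is a minimal sketch that decodes one partition from JSON with the standard library. It rests on several assumptions: the `partition` struct is a simplified, hypothetical mirror of `PartitionType` (the `then` masks stay as raw JSON), `decodePartition` is not a PIMO function, and the checks only approximate the schema added by this commit (for instance, an empty `name` is rejected here although the schema only requires the key to be present).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// partition is a simplified, hypothetical mirror of model.PartitionType:
// the masks in "then" are kept as raw JSON instead of []model.MaskType.
type partition struct {
	Name string            `json:"name"`
	When string            `json:"when,omitempty"`
	Then []json.RawMessage `json:"then"`
}

// decodePartition parses one partition and applies checks close to the
// schema: unknown keys are rejected, "name" and "then" must be set.
func decodePartition(doc string) (partition, error) {
	var p partition

	dec := json.NewDecoder(strings.NewReader(doc))
	dec.DisallowUnknownFields() // approximates "additionalProperties": false

	if err := dec.Decode(&p); err != nil {
		return partition{}, err
	}

	if p.Name == "" || p.Then == nil {
		return partition{}, errors.New(`"name" and "then" are required`)
	}

	return p, nil
}

func main() {
	p, err := decodePartition(`{"name": "default", "then": [{"constant": "this is another case"}]}`)
	if err != nil {
		fmt.Println("invalid partition:", err)
		return
	}

	fmt.Printf("partition %q has a condition: %t\n", p.Name, p.When != "")
}
```

Leaving `when` optional is what makes a default case possible: a partition that omits it decodes with an empty condition.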

type MaskType struct {
	Add Entry `yaml:"add,omitempty" json:"add,omitempty" jsonschema:"oneof_required=Add,title=Add Mask,description=Add a new field in the JSON stream"`
	AddTransient Entry `yaml:"add-transient,omitempty" json:"add-transient,omitempty" jsonschema:"oneof_required=AddTransient,title=Add Transient Mask" jsonschema_description:"Add a new temporary field, that will not show in the JSON output"`
@@ -280,6 +286,7 @@ type MaskType struct {
	Sequence SequenceType `yaml:"sequence,omitempty" json:"sequence,omitempty" jsonschema:"oneof_required=Sequence,title=Sequence Mask" jsonschema_description:"Generate a sequenced ID that follows specified format"`
	Sha3 Sha3Type `yaml:"sha3,omitempty" json:"sha3,omitempty" jsonschema:"oneof_required=Sha3,title=Sha3 Mask" jsonschema_description:"Generate a variable-length cryptographic hash (collision resistant)"`
	Apply ApplyType `yaml:"apply,omitempty" json:"apply,omitempty" jsonschema:"oneof_required=Apply,title=Apply Mask" jsonschema_description:"Call external masking file"`
	Partition []PartitionType `yaml:"partitions,omitempty" json:"partitions,omitempty" jsonschema:"oneof_required=Partition,title=Partition Mask" jsonschema_description:"Identify specific cases and apply a defined list of masks for each case"`
}

type Masking struct {
142 changes: 142 additions & 0 deletions pkg/partition/partition.go
@@ -0,0 +1,142 @@
package partition

import (
	"bytes"
	"hash/fnv"
	tmpl "text/template"

	"github.com/cgi-fr/pimo/pkg/template"

	"github.com/cgi-fr/pimo/pkg/model"
	"github.com/rs/zerolog/log"
)

type MaskEngine struct {
	partitions []Partition
	seed       int64
	seeder     model.Seeder
}

type Partition struct {
	name string
	when *template.Engine
	exec model.Pipeline
}

func buildDefinition(masks []model.MaskType, globalSeed int64) model.Definition {
	definition := model.Definition{
		Version:   "1",
		Seed:      globalSeed,
		Functions: nil,
		Masking:   []model.Masking{},
		Caches:    nil,
	}

	for _, mask := range masks {
		definition.Masking = append(definition.Masking, model.Masking{
			Selector: model.SelectorType{Jsonpath: "."},
			Mask:     mask,
		})
	}

	return definition
}

// NewMask returns a MaskEngine built from a list of partitions
func NewMask(partitions []model.PartitionType, caches map[string]model.Cache, fns tmpl.FuncMap, seed int64, seeder model.Seeder, seedField string) (MaskEngine, error) {
	parts := []Partition{}

	// Build one pipeline per partition
	for _, partition := range partitions {
		template, err := template.NewEngine(partition.When, fns, seed, seedField)
		if err != nil {
			return MaskEngine{}, err
		}

		// a partition without "when" has no condition and is always active
		if partition.When == "" {
			template = nil
		}

		definition := buildDefinition(partition.Then, seed)
		pipeline := model.NewPipeline(nil)
		pipeline, _, err = model.BuildPipeline(pipeline, definition, caches, fns, "", "")
		if err != nil {
			return MaskEngine{}, err
		}

		parts = append(parts, Partition{
			name: partition.Name,
			when: template,
			exec: pipeline,
		})
	}

	return MaskEngine{parts, seed, seeder}, nil
}

func execPipeline(pipeline model.Pipeline, e model.Entry) (model.Entry, error) {
	var result []model.Entry

	err := pipeline.
		WithSource(model.NewSourceFromSlice([]model.Dictionary{model.NewDictionary().With(".", e)})).
		// Process(model.NewCounterProcessWithCallback("internal", 1, updateContext)).
		AddSink(model.NewSinkToSlice(&result)).
		Run()
	if err != nil {
		return nil, err
	}

	if len(result) == 0 {
		return nil, nil
	}

	return result[0], nil
}

func (me MaskEngine) Mask(e model.Entry, context ...model.Dictionary) (model.Entry, error) {
	log.Info().Msg("Mask partition")

	// evaluate partitions in order and execute only the first active one
	for _, partition := range me.partitions {
		var output bytes.Buffer

		if partition.when != nil {
			if err := partition.when.Execute(&output, context[0].UnpackUnordered()); err != nil {
				return nil, err
			}
		} else {
			// no condition: the partition is always active
			output.WriteString("true")
		}

		if output.String() == "true" {
			log.Info().Msgf("Mask partition - executing partition %s", partition.name)

			result, err := execPipeline(partition.exec, e)
			if err != nil {
				return e, err
			}

			return result, nil
		}
	}

	// no active partition: the entry is returned unchanged
	return e, nil
}

// Factory creates a mask from a configuration
func Factory(conf model.MaskFactoryConfiguration) (model.MaskEngine, bool, error) {
	if len(conf.Masking.Mask.Partition) > 0 {
		seeder := model.NewSeeder(conf.Masking.Seed.Field, conf.Seed)

		// set different seeds for different jsonpaths
		h := fnv.New64a()
		h.Write([]byte(conf.Masking.Selector.Jsonpath))
		conf.Seed += int64(h.Sum64()) //nolint:gosec
		mask, err := NewMask(conf.Masking.Mask.Partition, conf.Cache, conf.Functions, conf.Seed, seeder, conf.Masking.Seed.Field)
		if err != nil {
			return mask, true, err
		}
		return mask, true, nil
	}
	return nil, false, nil
}
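
`Factory` offsets the configured seed by an FNV-1a hash of the selector's jsonpath, so two `partitions` masks on different fields do not share a seed. The sketch below isolates that step with the standard library only; `deriveSeed` is a hypothetical helper name introduced for illustration and does not exist in PIMO.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// deriveSeed offsets a global seed by the 64-bit FNV-1a hash of a jsonpath.
// The uint64 hash is reinterpreted as int64 and the addition may wrap
// around, which is acceptable for seeding purposes.
func deriveSeed(globalSeed int64, jsonpath string) int64 {
	h := fnv.New64a()
	h.Write([]byte(jsonpath)) // Write on a hash.Hash never returns an error

	return globalSeed + int64(h.Sum64())
}

func main() {
	// The same global seed gives a stable, path-specific seed.
	for _, path := range []string{"id", "name"} {
		fmt.Printf("%s -> %d\n", path, deriveSeed(42, path))
	}
}
```

The derivation is deterministic: a given pair of global seed and jsonpath always yields the same value, which keeps masked output reproducible across runs.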
39 changes: 39 additions & 0 deletions schema/v1/pimo.schema.json
@@ -584,6 +584,12 @@
            "apply"
          ],
          "title": "Apply"
        },
        {
          "required": [
            "partitions"
          ],
          "title": "Partition"
        }
      ],
      "properties": {
@@ -778,6 +784,14 @@
          "$ref": "#/$defs/ApplyType",
          "title": "Apply Mask",
          "description": "Call external masking file"
        },
        "partitions": {
          "items": {
            "$ref": "#/$defs/PartitionType"
          },
          "type": "array",
          "title": "Partition Mask",
          "description": "Identify specific cases and apply a defined list of masks for each case"
        }
      },
      "additionalProperties": false,
@@ -877,6 +891,31 @@
        "name"
      ]
    },
    "PartitionType": {
      "properties": {
        "name": {
          "type": "string",
          "description": "name of the partition"
        },
        "when": {
          "type": "string",
          "description": "template to execute, if true the condition is active"
        },
        "then": {
          "items": {
            "$ref": "#/$defs/MaskType"
          },
          "type": "array",
          "description": "list of masks to execute if the condition is active"
        }
      },
      "additionalProperties": false,
      "type": "object",
      "required": [
        "name",
        "then"
      ]
    },
    "PipeType": {
      "properties": {
        "masking": {
45 changes: 45 additions & 0 deletions test/suites/masking_partition.yml
@@ -0,0 +1,45 @@
name: partition mask
testcases:
  - name: simple partition with default case
    steps:
      - script: |-
          cat > masking.yml <<EOF
          version: "1"
          seed: 42
          masking:
            - selector:
                jsonpath: "id"
              mask:
                partitions:
                  - name: idrh
                    when: '[[ .id | default "" | mustRegexMatch "^P[A-Z]{3}[0-9]{3}$" ]]'
                    then:
                      - constant: "IDRH"
                  - name: digits
                    when: '[[ .id | default "" | mustRegexMatch "^[0-9]+$" ]]'
                    then:
                      - constant: "DIGITS"
                  - name: others
                    then:
                      - constant: "OTHER"
          EOF
      - script: sed -i "s/\[\[/\{\{/g" masking.yml
      - script: sed -i "s/\]\]/\}\}/g" masking.yml
      - script: |-
          pimo <<EOF
          {"case": 1, "id": "PZZZ123"}
          {"case": 2, "id": "12345"}
          {"case": 3, "id": "PABC000"}
          {"case": 4, "id": "PABCD000"}
          {"case": 5, "id": ""}
          {"case": 6, "id": null}
          EOF
        assertions:
          - result.code ShouldEqual 0
          - 'result.systemout ShouldContainSubstring {"case":1,"id":"IDRH"}'
          - 'result.systemout ShouldContainSubstring {"case":2,"id":"DIGITS"}'
          - 'result.systemout ShouldContainSubstring {"case":3,"id":"IDRH"}'
          - 'result.systemout ShouldContainSubstring {"case":4,"id":"OTHER"}'
          - 'result.systemout ShouldContainSubstring {"case":5,"id":"OTHER"}'
          - 'result.systemout ShouldContainSubstring {"case":6,"id":"OTHER"}'
          - result.systemerr ShouldBeEmpty
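
The expected outputs of this suite can be cross-checked with a small standalone program. This is a sketch under assumptions, not part of the commit: `classify` is a hypothetical helper that applies the two anchored patterns of the test in the same order with Go's `regexp` package, and a `null` id is represented by the empty string, as `default ""` does in the templates.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the "when" conditions of the test configuration.
var (
	idrh   = regexp.MustCompile(`^P[A-Z]{3}[0-9]{3}$`)
	digits = regexp.MustCompile(`^[0-9]+$`)
)

// classify returns the constant of the first partition that would be active.
func classify(id string) string {
	switch {
	case idrh.MatchString(id):
		return "IDRH"
	case digits.MatchString(id):
		return "DIGITS"
	default:
		return "OTHER"
	}
}

func main() {
	// Inputs of the test; the last entry stands for both "" and null.
	for _, id := range []string{"PZZZ123", "12345", "PABC000", "PABCD000", ""} {
		fmt.Printf("%q -> %s\n", id, classify(id))
	}
}
```

`PABCD000` falls through to the last partition because it has four letters after the `P`, where the first pattern allows exactly three.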
