Hash plugin (#726)
* Init hash plugin

* Add docs&tests

* Optimize regexps

* Fix doc

* Fix after merge

* Fix doc

* Add LM normalizer

* Remove re normalizer

* Fix url regexp

* Rename hash_field to result_fields

* gen-doc

* Add max_size

* Change default max_size

* Fix doc
kirillov6 authored Feb 18, 2025
1 parent b74dccd commit b92c905
Showing 16 changed files with 927 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ TBD: throughput on production servers.

**Input**: [dmesg](plugin/input/dmesg/README.md), [fake](plugin/input/fake/README.md), [file](plugin/input/file/README.md), [http](plugin/input/http/README.md), [journalctl](plugin/input/journalctl/README.md), [k8s](plugin/input/k8s/README.md), [kafka](plugin/input/kafka/README.md)

-**Action**: [add_file_name](plugin/action/add_file_name/README.md), [add_host](plugin/action/add_host/README.md), [convert_date](plugin/action/convert_date/README.md), [convert_log_level](plugin/action/convert_log_level/README.md), [convert_utf8_bytes](plugin/action/convert_utf8_bytes/README.md), [debug](plugin/action/debug/README.md), [decode](plugin/action/decode/README.md), [discard](plugin/action/discard/README.md), [flatten](plugin/action/flatten/README.md), [join](plugin/action/join/README.md), [join_template](plugin/action/join_template/README.md), [json_decode](plugin/action/json_decode/README.md), [json_encode](plugin/action/json_encode/README.md), [json_extract](plugin/action/json_extract/README.md), [keep_fields](plugin/action/keep_fields/README.md), [mask](plugin/action/mask/README.md), [modify](plugin/action/modify/README.md), [move](plugin/action/move/README.md), [parse_es](plugin/action/parse_es/README.md), [parse_re2](plugin/action/parse_re2/README.md), [remove_fields](plugin/action/remove_fields/README.md), [rename](plugin/action/rename/README.md), [set_time](plugin/action/set_time/README.md), [split](plugin/action/split/README.md), [throttle](plugin/action/throttle/README.md)
+**Action**: [add_file_name](plugin/action/add_file_name/README.md), [add_host](plugin/action/add_host/README.md), [convert_date](plugin/action/convert_date/README.md), [convert_log_level](plugin/action/convert_log_level/README.md), [convert_utf8_bytes](plugin/action/convert_utf8_bytes/README.md), [debug](plugin/action/debug/README.md), [decode](plugin/action/decode/README.md), [discard](plugin/action/discard/README.md), [flatten](plugin/action/flatten/README.md), [hash](plugin/action/hash/README.md), [join](plugin/action/join/README.md), [join_template](plugin/action/join_template/README.md), [json_decode](plugin/action/json_decode/README.md), [json_encode](plugin/action/json_encode/README.md), [json_extract](plugin/action/json_extract/README.md), [keep_fields](plugin/action/keep_fields/README.md), [mask](plugin/action/mask/README.md), [modify](plugin/action/modify/README.md), [move](plugin/action/move/README.md), [parse_es](plugin/action/parse_es/README.md), [parse_re2](plugin/action/parse_re2/README.md), [remove_fields](plugin/action/remove_fields/README.md), [rename](plugin/action/rename/README.md), [set_time](plugin/action/set_time/README.md), [split](plugin/action/split/README.md), [throttle](plugin/action/throttle/README.md)

**Output**: [clickhouse](plugin/output/clickhouse/README.md), [devnull](plugin/output/devnull/README.md), [elasticsearch](plugin/output/elasticsearch/README.md), [file](plugin/output/file/README.md), [gelf](plugin/output/gelf/README.md), [kafka](plugin/output/kafka/README.md), [postgres](plugin/output/postgres/README.md), [s3](plugin/output/s3/README.md), [splunk](plugin/output/splunk/README.md), [stdout](plugin/output/stdout/README.md)

1 change: 1 addition & 0 deletions _sidebar.md
@@ -31,6 +31,7 @@
- [decode](plugin/action/decode/README.md)
- [discard](plugin/action/discard/README.md)
- [flatten](plugin/action/flatten/README.md)
- [hash](plugin/action/hash/README.md)
- [join](plugin/action/join/README.md)
- [join_template](plugin/action/join_template/README.md)
- [json_decode](plugin/action/json_decode/README.md)
1 change: 1 addition & 0 deletions cmd/file.d/file.d.go
@@ -24,6 +24,7 @@ import (
_ "github.com/ozontech/file.d/plugin/action/decode"
_ "github.com/ozontech/file.d/plugin/action/discard"
_ "github.com/ozontech/file.d/plugin/action/flatten"
_ "github.com/ozontech/file.d/plugin/action/hash"
_ "github.com/ozontech/file.d/plugin/action/join"
_ "github.com/ozontech/file.d/plugin/action/join_template"
_ "github.com/ozontech/file.d/plugin/action/json_decode"
1 change: 1 addition & 0 deletions e2e/start_work_test.go
@@ -30,6 +30,7 @@ import (
_ "github.com/ozontech/file.d/plugin/action/decode"
_ "github.com/ozontech/file.d/plugin/action/discard"
_ "github.com/ozontech/file.d/plugin/action/flatten"
_ "github.com/ozontech/file.d/plugin/action/hash"
_ "github.com/ozontech/file.d/plugin/action/join"
_ "github.com/ozontech/file.d/plugin/action/join_template"
_ "github.com/ozontech/file.d/plugin/action/json_decode"
2 changes: 2 additions & 0 deletions go.mod
@@ -37,6 +37,7 @@ require (
github.com/satori/go.uuid v1.2.0
github.com/stretchr/testify v1.9.0
github.com/tidwall/gjson v1.18.0
github.com/timtadh/lexmachine v0.2.3
github.com/twmb/franz-go v1.17.0
github.com/twmb/franz-go/pkg/kadm v1.12.0
github.com/twmb/franz-go/plugin/kzap v1.1.2
@@ -129,6 +130,7 @@ require (
github.com/spf13/pflag v1.0.5 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.1 // indirect
github.com/timtadh/data-structures v0.6.1 // indirect
github.com/twmb/franz-go/pkg/kmsg v1.8.0 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/xdg-go/pbkdf2 v1.0.0 // indirect
5 changes: 5 additions & 0 deletions go.sum
@@ -378,6 +378,11 @@ github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JT
github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
github.com/tidwall/pretty v1.2.1 h1:qjsOFOWWQl+N3RsoF5/ssm1pHmJJwhjlSbZ51I6wMl4=
github.com/tidwall/pretty v1.2.1/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
github.com/timtadh/data-structures v0.6.1 h1:76eDpwngj2rEi9r/qvdH6YL7wMXGsoFFzhEylo/IacA=
github.com/timtadh/data-structures v0.6.1/go.mod h1:uYUnI1cQi/5yMCc7s23I+x8Mn8BCMf4WgK+7/4QSEk4=
github.com/timtadh/getopt v1.0.0/go.mod h1:L3EL6YN2G0eIAhYBo9b7SB9d/kEQmdnwthIlMJfj210=
github.com/timtadh/lexmachine v0.2.3 h1:ZqlfHnfMcAygtbNM5Gv7jQf8hmM8LfVzDjfCrq235NQ=
github.com/timtadh/lexmachine v0.2.3/go.mod h1:oK1NW+93fQSIF6s+J6sXBFWsCPCFbNmrwKV1i0aqvW0=
github.com/twmb/franz-go v1.17.0 h1:hawgCx5ejDHkLe6IwAtFWwxi3OU4OztSTl7ZV5rwkYk=
github.com/twmb/franz-go v1.17.0/go.mod h1:NreRdJ2F7dziDY/m6VyspWd6sNxHKXdMZI42UfQ3GXM=
github.com/twmb/franz-go/pkg/kadm v1.12.0 h1:I8P/gpXFzhl73QcAYmJu+1fOXvrynyH/MAotr2udEg4=
5 changes: 5 additions & 0 deletions plugin/README.md
@@ -322,6 +322,11 @@ pipelines:
It transforms `{"animal":{"type":"cat","paws":4}}` into `{"pet_type":"cat","pet_paws":"4"}`.

[More details...](plugin/action/flatten/README.md)
## hash
It calculates the hash of one of the specified event fields and adds a new field with the result to the event root.
> Fields can be of any type except object and array.

[More details...](plugin/action/hash/README.md)
## join
It makes one big event from a sequence of events.
It is useful for reassembling "exceptions" or "panics" that were written line by line.
5 changes: 5 additions & 0 deletions plugin/action/README.md
@@ -165,6 +165,11 @@ pipelines:
It transforms `{"animal":{"type":"cat","paws":4}}` into `{"pet_type":"cat","pet_paws":"4"}`.

[More details...](plugin/action/flatten/README.md)
## hash
It calculates the hash of one of the specified event fields and adds a new field with the result to the event root.
> Fields can be of any type except object and array.

[More details...](plugin/action/hash/README.md)
## join
It makes one big event from a sequence of events.
It is useful for reassembling "exceptions" or "panics" that were written line by line.
8 changes: 8 additions & 0 deletions plugin/action/hash/README.idoc.md
@@ -0,0 +1,8 @@
# Hash plugin
@introduction

## Examples
@examples

## Config params
@config-params|description
108 changes: 108 additions & 0 deletions plugin/action/hash/README.md
@@ -0,0 +1,108 @@
# Hash plugin
It calculates the hash of one of the specified event fields and adds a new field with the result to the event root.
> Fields can be of any type except object and array.
## Examples
Hashing without normalization (first found field is `error.code`):
```yaml
pipelines:
  example_pipeline:
    ...
    actions:
    - type: hash
      fields:
        - field: error.code
        - field: level
      result_field: hash
    ...
```
The original event:
```json
{
  "level": "error",
  "error": {
    "code": "unauthenticated",
    "message": "bad token format"
  }
}
```
The resulting event:
```json
{
  "level": "error",
  "error": {
    "code": "unauthenticated",
    "message": "bad token format"
  },
  "hash": 6584967863753642363
}
```
---
Hashing with normalization (first found field is `message`):
```yaml
pipelines:
  example_pipeline:
    ...
    actions:
    - type: hash
      fields:
        - field: error.code
        - field: message
          format: normalize
      result_field: hash
    ...
```
The original event:
```json
{
  "level": "error",
  "message": "2023-10-30T13:35:33.638720813Z error occurred, client: 10.125.172.251, upstream: \"http://10.117.246.15:84/download\", host: \"mpm-youtube-downloader-38.name.com:84\""
}
```

The normalized `message`:
`<datetime> error occurred, client: <ip>, upstream: "<url>", host: "<host>:<int>"`

The resulting event:
```json
{
  "level": "error",
  "message": "2023-10-30T13:35:33.638720813Z error occurred, client: 10.125.172.251, upstream: \"http://10.117.246.15:84/download\", host: \"mpm-youtube-downloader-38.name.com:84\"",
  "hash": 13863947727397728753
}
```
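A rough Go sketch of the `normalize` idea: variable tokens are masked with placeholders, so messages that differ only in timestamps, addresses or ports yield the same string and therefore the same hash. Per the commit log, the plugin itself uses a normalizer built on `lexmachine` (the regexp-based one was removed), so the standard `regexp` rules below are a simplified substitute. The token set and patterns are assumptions chosen to reproduce this one example, and `normalize` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule replaces every match of re with a placeholder token.
type rule struct {
	re          *regexp.Regexp
	placeholder string
}

// Order matters: broader patterns (datetime, URL) run first so the numbers and
// IPs inside them are not masked separately; bare integers run last.
var rules = []rule{
	{regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?`), "<datetime>"},
	{regexp.MustCompile(`https?://[^\s"]+`), "<url>"},
	{regexp.MustCompile(`\b\d{1,3}(?:\.\d{1,3}){3}\b`), "<ip>"},
	{regexp.MustCompile(`\b[A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}\b`), "<host>"},
	{regexp.MustCompile(`\b\d+\b`), "<int>"},
}

// normalize masks the variable parts of a message with placeholders.
func normalize(s string) string {
	for _, r := range rules {
		s = r.re.ReplaceAllLiteralString(s, r.placeholder)
	}
	return s
}

func main() {
	msg := `2023-10-30T13:35:33.638720813Z error occurred, client: 10.125.172.251, upstream: "http://10.117.246.15:84/download", host: "mpm-youtube-downloader-38.name.com:84"`
	// prints: <datetime> error occurred, client: <ip>, upstream: "<url>", host: "<host>:<int>"
	fmt.Println(normalize(msg))
}
```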

## Config params
**`fields`** *`[]Field`* *`required`*

Prioritized list of fields. The first field found will be used to calculate the hash.

`Field` params:
* **`field`** *`cfg.FieldSelector`* *`required`*

The event field for calculating the hash.

* **`format`** *`string`* *`default=no`* *`options=no|normalize`*

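The format of the field value, which determines how it is prepared for hashing: with `no`, the value is hashed as is; with `normalize`, variable parts such as dates, IP addresses, URLs, hosts and numbers are first replaced with placeholders (see the normalization example above).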

* **`max_size`** *`int`* *`default=0`*

The maximum field size, in bytes, used in the hash calculation, regardless of `format`.
If set to `0`, the entire field is used.

> If the field size is greater than `max_size`, only the first `max_size` bytes are used in the hash calculation.
>
> This can help avoid performance degradation when hashing long fields.
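The `max_size` rule amounts to hashing a byte prefix. A minimal Go sketch, with FNV-1a again standing in for the plugin's actual hash function and `hashPrefix` as a hypothetical name:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashPrefix hashes at most maxSize bytes of value; maxSize == 0 means
// "use the whole value", as documented for max_size.
// FNV-1a is an illustrative choice, not necessarily the plugin's hash function.
func hashPrefix(value []byte, maxSize int) uint64 {
	if maxSize > 0 && len(value) > maxSize {
		value = value[:maxSize]
	}
	h := fnv.New64a()
	h.Write(value)
	return h.Sum64()
}

func main() {
	a := []byte("connection refused: retry 1 of 5")
	b := []byte("connection refused: retry 2 of 5")
	// With max_size=20 only "connection refused: " is hashed, so both collide.
	fmt.Println(hashPrefix(a, 20) == hashPrefix(b, 20)) // true
	fmt.Println(hashPrefix(a, 0) == hashPrefix(b, 0))   // false
}
```

The trade-off is visible in `main`: values that share their first `max_size` bytes get the same hash, so a small limit speeds up hashing of long fields at the cost of merging values that differ only after the cut.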

<br>

**`result_field`** *`cfg.FieldSelector`* *`required`*

The event field to put the hash into.

<br>
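To make the required, default and option rules above concrete, here is a hypothetical Go mirror of these params with validation. The type and function names are illustrative, not the plugin's own config types; JSON stands in for the pipeline YAML because Go's standard library has no YAML decoder, and selectors are kept as plain strings rather than `cfg.FieldSelector`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Field mirrors one entry of the documented `fields` list (hypothetical type).
type Field struct {
	Field   string `json:"field"`
	Format  string `json:"format"`
	MaxSize int    `json:"max_size"`
}

// Config mirrors the documented top-level params (hypothetical type).
type Config struct {
	Fields      []Field `json:"fields"`
	ResultField string  `json:"result_field"`
}

// parseConfig decodes the config and applies the documented defaults and options.
func parseConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, err
	}
	if len(cfg.Fields) == 0 {
		return cfg, errors.New("fields is required")
	}
	if cfg.ResultField == "" {
		return cfg, errors.New("result_field is required")
	}
	for i := range cfg.Fields {
		f := &cfg.Fields[i]
		if f.Field == "" {
			return cfg, fmt.Errorf("fields[%d].field is required", i)
		}
		if f.Format == "" {
			f.Format = "no" // documented default; max_size already defaults to 0
		}
		if f.Format != "no" && f.Format != "normalize" {
			return cfg, fmt.Errorf("fields[%d].format: unknown value %q", i, f.Format)
		}
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"fields":[{"field":"error.code"},{"field":"message","format":"normalize"}],"result_field":"hash"}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```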


<br>*Generated using [__insane-doc__](https://github.com/vitkovskii/insane-doc)*