Commit

Merge branch 'main' into refactor/xdg-go
kruskall authored Jun 14, 2024
2 parents 221528c + 8239434 commit 79f1e26
Showing 1 changed file with 8 additions and 6 deletions.
14 changes: 8 additions & 6 deletions x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -120,7 +120,7 @@ characters. This only applies to non-JSON logs. See <<_encoding_3>>.
==== `decoding`

The file decoding option is used to specify a codec that will be used to
decode the file contents. This can apply to any file stream data.
An example config is shown below:

[source,yaml]
@@ -131,17 +131,17 @@ An example config is shown below:
Currently supported codecs are given below:

1. <<attrib-decoding-parquet,Parquet>>: This codec decodes parquet compressed data streams.

[id="attrib-decoding-parquet"]
[float]
==== `the parquet codec`
The `parquet` codec is used to decode parquet compressed data streams.
Enabling the codec without further settings uses the default codec options. The parquet codec supports
two sub attributes which can make parquet decoding more efficient: the `batch_size` attribute and
the `process_parallel` attribute. The `batch_size` attribute can be used to specify the number of
records to read from the parquet stream at a time. By default the `batch_size` is set to `1` and
`process_parallel` is set to `false`. If the `process_parallel` attribute is set to `true` then functions
which read multiple columns will read those columns in parallel from the parquet stream with a
number of readers equal to the number of columns. Setting `process_parallel` to `true` will greatly
increase the rate of processing at the cost of increased memory usage. Having a larger `batch_size`
also helps to increase the rate of processing. An example config is shown below:
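
The example this paragraph points to lies outside the hunk shown here. As a rough sketch only, assuming the codec options nest under a `decoding.codec.parquet` key (that key path and the values below are assumptions, not taken from this diff), a configuration that sets both sub attributes might look like:

[source,yaml]
----
# Sketch only: the `decoding.codec.parquet` key path is assumed, not shown in this diff.
decoding.codec.parquet.enabled: true
# Read columns in parallel: faster, at the cost of more memory.
decoding.codec.parquet.process_parallel: true
# Illustrative value; the default described above is 1.
decoding.codec.parquet.batch_size: 1000
----

A larger `batch_size` trades memory for throughput in the same way `process_parallel` does, so it is probably worth raising the two gradually rather than together.
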
@@ -162,6 +162,8 @@ value can be assigned the name of the field or `.[]`. This setting will be able
the messages under the group value into separate events. For example, CloudTrail
logs are in JSON format and events are found under the JSON object "Records".

NOTE: When using `expand_event_list_from_field`, the `content_type` config parameter must be set to `application/json`.
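
As a minimal sketch of the note above, assuming an SQS-driven `aws-s3` input (the queue URL is a hypothetical placeholder, and the surrounding input settings are not part of this diff), the two options could be combined like this:

[source,yaml]
----
filebeat.inputs:
- type: aws-s3
  # Hypothetical placeholder; substitute your own queue URL.
  queue_url: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
  # Split the array found under "Records" into separate events.
  expand_event_list_from_field: Records
  # Required alongside expand_event_list_from_field, per the note above.
  content_type: application/json
----

With settings along these lines, each element under `Records` in a CloudTrail object like the one below would be emitted as its own event.
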

["source","json"]
----
{