From d8aad09bdb50ccd349313555c3c4f6fa6ccaf5fd Mon Sep 17 00:00:00 2001 From: Sam Crauwels Date: Wed, 8 May 2024 19:14:10 +0200 Subject: [PATCH] Add a detailed explanation about the different field definition fields. (#3792) * Add a detailed explanation about the different field definition fields. Added a description of the different files in the `fields` directory of the integration: - agent.yml - base-fields.yml - ecs.yml - fields.yml * Verified the use of the file:// reference instead of GIT@ in the _dev/build/build.yml file to point at the ECS reference file. * Expanded explanation on mapping methods Further documented the four ways ECS mappings can be defined in integrations. - Using the `fields` directory in the integration. - The import_mappings: true option in the `_dev/build/build.yml` file. - Fleet pre-installed template `ecs@mapping` (>8.13.0) - Local ECS file * Intermediary commit * Integrated (no pun) feedback Integrated the feedback and expanded the field section. Also replaced some nginx references with apache. * Add link to fields section of general guidelines --- .../integrations/build-integration.asciidoc | 138 ++++++++++++++++-- 1 file changed, 123 insertions(+), 15 deletions(-) diff --git a/docs/en/integrations/build-integration.asciidoc b/docs/en/integrations/build-integration.asciidoc index ad475e0fe2..1f746707df 100644 --- a/docs/en/integrations/build-integration.asciidoc +++ b/docs/en/integrations/build-integration.asciidoc @@ -242,7 +242,7 @@ Learn more in the {ref}/ingest.html[ingest pipeline reference]. **** Ingest pipelines are defined in the `elasticsearch/ingest_pipeline` directory. -They only apply to the parent data stream within which they live. +They only apply to the parent data stream within which they live. For our example, this would be the `apache.access` dataset. 
For example, the https://github.com/elastic/integrations/tree/main/packages/apache[Apache integration]: @@ -294,10 +294,78 @@ Each document is a collection of fields, each having its own data type. When map To learn more, see {ref}/mapping.html[mapping]. **** -Mappings are defined in the `fields` directory. -Like ingest pipelines, mappings only apply to the parent data stream. -The Apache integration has four different field definitions: +In the integration, the `fields` directory serves as the blueprint used to create component templates for the integration. The content from all files in this directory will be unified when the integration is built, so the mappings need to be unique per data stream dataset. +Like ingest pipelines, mappings only apply to the data stream dataset; in our example, the `apache.access` dataset. ++ +NOTE: The names of these files are conventions; any file name with a `.yml` extension will work. + +Integrations have had significant enhancements in how ECS fields are defined. Below is a guide on which approach to use, based on the version of Elastic your integration will support. ++ +. ECS mappings component template (>=8.13.0) +Integrations *only* supporting version 8.13.0 and up can use the https://github.com/elastic/elasticsearch/blob/c2a3ec42632b0339387121efdef13f52c6c66848/x-pack/plugin/core/template-resources/src/main/resources/ecs%40mappings.json[ecs@mappings] component template installed by Fleet. +This makes explicitly declaring ECS fields unnecessary; the `ecs@mappings` component template in Elasticsearch will automatically detect and configure them. +However, should ECS fields be explicitly defined, they will overwrite the dynamic mapping provided by the `ecs@mappings` component template. +They can also be imported with an `external` declaration, as seen in the example below. ++ +. 
Dynamic mappings imports (<8.13.0 & >=8.13.0) +Integrations supporting the Elastic stack below version 8.13.0 can still dynamically import ECS field mappings by defining `import_mappings: true` in the ECS section of the `_dev/build/build.yml` file in the root of the package directory. +This introduces a https://github.com/elastic/elastic-package/blob/f439b96a74c27c5adfc3e7810ad584204bfaf85d/internal/builder/_static/ecs_mappings.yaml[dynamic mapping] with most of the ECS definitions. +Using this method means that, just like the previous approach, ECS fields don't need to be defined in your integration; they are dynamically integrated into the package at build time. +Explicitly defined ECS fields can be used and will also overwrite this mechanism. + +An example of the aforementioned `build.yml` file for this method: ++ +[source,yaml] +---- +dependencies: + ecs: + reference: git@v8.6.0 + import_mappings: true +---- ++ +. Explicit ECS mappings +As mentioned in the previous two approaches, ECS mappings can still be set explicitly and will overwrite the dynamic mappings. +This can be done in two ways: +- Using an `external: ecs` reference to import the definition of a specific field. +- Literally defining the ECS field. + +The `external: ecs` definition instructs the `elastic-package` command line tool to refer to an external ECS reference to resolve specific fields. By default it looks at the https://raw.githubusercontent.com/elastic/ecs/v8.6.0/generated/ecs/ecs_nested.yml[ECS reference] file hosted on GitHub. +This external reference file is determined by a Git reference found in the `_dev/build/build.yml` file, in the root of the package directory. +The `build.yml` file set up for external references: ++ +[source,yaml] +---- +dependencies: + ecs: + reference: git@v8.6.0 +---- + +Literal definition of an ECS field: +[source,yaml] +---- +- name: cloud.account.id + level: extended + type: keyword + ignore_above: 1024 + description: 'The cloud account or organ....' 
+ example: 43434343 +---- + +. Local ECS reference file (air-gapped setup) +By changing the Git reference in `_dev/build/build.yml` to the path of the downloaded https://raw.githubusercontent.com/elastic/ecs/v8.6.0/generated/ecs/ecs_nested.yml[ECS reference] file, it is possible for the `elastic-package` command line tool to look for this file locally. Note that the path should be the full path to the reference file. +Doing this, our `build.yml` file looks like: ++ +---- +dependencies: + ecs: + reference: file:///home/user/integrations/packages/apache/ecs_nested.yml +---- + +The `access` data stream dataset of the Apache integration has four different field definitions: ++ +NOTE: The `apache` integration below has not yet been updated to use the dynamic ECS field definition and uses `external` references to define ECS fields in `ecs.yml`. ++ [source,text] ---- apache @@ -306,23 +374,63 @@ apache │ │ └───elasticsearch/ingest_pipeline │ │ │ default.yml │ │ └───fields -│ │ agent.yml <1> -│ │ base-fields.yml <2> -│ │ ecs.yml <3> -│ │ fields.yml <4> +│ │ agent.yml +│ │ base-fields.yml +│ │ ecs.yml +│ │ fields.yml │ └───error +│ │ └───elasticsearch/ingest_pipeline +│ │ │ default.yml +│ │ └───fields +│ │ agent.yml +│ │ base-fields.yml +│ │ ecs.yml +│ │ fields.yml │ └───status ---- -<1> `agent.yml` fields the Elastic agent uses -<2> `base-fields.yml` never changes and is required for all integrations -<3> Defines the relevant ECS fields -<4> Custom Apache access log fields +=== agent.yml +The `agent.yml` file defines fields used by default processors. +Examples: `cloud.account.id`, `container.id`, `input.type` -// Need more on mapping +=== base-fields.yml +In this file, the `data_stream` subfields `type`, `dataset` and `namespace` are defined as type `constant_keyword`; the values for these fields are added by the integration. 
+The `event.module` and `event.dataset` fields are defined with a fixed value specific to this integration: +- `event.module: apache` +- `event.dataset: apache.access` +Field `@timestamp` is defined here as type `date`. + +=== ecs.yml +This file specifies every Elastic Common Schema (ECS) field used by the integration that is not defined in the `agent.yml` or `base-fields.yml` files. It uses `external: ecs` references. +For example: ++ +[source,yaml] +---- +- external: ecs + name: client.ip +- external: ecs + name: destination.domain +---- + +=== fields.yml +Here we define fields that we need in our integration and that are not found in ECS. +The example below defines the field `apache.access.ssl.protocol` in the Apache integration. ++ +[source,yaml] +---- +- name: apache.access + type: group + fields: + - name: ssl.protocol + type: keyword + description: | + SSL protocol version. +---- // Maybe something on ECS too?? +Learn more about fields in the https://www.elastic.co/guide/en/integrations-developer/current/general-guidelines.html#_document_all_fields[general guidelines]. + [[create-dashboards]] == Create and export dashboards @@ -705,13 +813,13 @@ vars: required: true <1> show_user: true <2> title: Access log paths <3> - description: Paths to the nginx access log file. <4> + description: Paths to the Apache access log file. <4> type: text <5> multi: true <6> hide_in_deployment_modes: <7> - agentless default: - - /var/log/nginx/access.log* + - /var/log/httpd/access.log* ---- <1> option is required <2> don't hide the configuration option (collapsed menu)