Amazon Kinesis Data Firehose setup guide

Prerequisites

- You have an AWS account where you can create a Firehose delivery stream.
- You have a deployment in Elastic Cloud running Elastic Stack version 7.17 or greater on AWS.

Limitations

- When using Elastic integrations with Firehose, only a single log type (for example, VPC Flow Logs) may be sent per delivery stream. This is due to how Firehose records are routed into data streams in Elasticsearch. It is possible to combine multiple log types in one delivery stream, but this precludes the use of Elastic integrations (by default, all Firehose logs are sent to the logs-generic-default data stream).
- It is not possible to configure a delivery stream to send data to Elastic Cloud via PrivateLink (VPC endpoint). This is a current limitation in Firehose, which we are working with AWS to resolve.

Instructions

Install the relevant integrations in Kibana.

To make the most of your data, install AWS integrations to load index templates, ingest pipelines, and dashboards into Kibana.
In Kibana, navigate to Management > Integrations in the sidebar.
Find the AWS integration by searching or browsing the catalog.
Navigate to the Settings tab and click Install AWS assets. Confirm by clicking Install AWS in the popup.

Create a delivery stream in Amazon Kinesis Data Firehose.

Sign in to the AWS console and navigate to Amazon Kinesis. Click Create delivery stream.
Configure the delivery stream using the following settings:

Choose source and destination
Unless you are streaming data from Kinesis Data Streams, set source to Direct PUT (see Setup guide for more details on data sources).
Set destination to Elastic.

Delivery stream name
Provide a meaningful name that will allow you to identify this delivery stream later.

Transform records - optional
For advanced use cases, source records can be transformed by invoking a custom Lambda function. When using Elastic integrations, this should not be required.

Destination settings
Set Elastic endpoint URL to point to your Elasticsearch cluster running in Elastic Cloud. This endpoint can be found in the Elastic Cloud console. An example is https://my-deployment-28u274.es.eu-west-1.aws.found.io.
API key should be a Base64 encoded Elastic API key, which can be created in Kibana by following the instructions under API Keys. If you are using an API key with “Restrict privileges”, be sure to review the Indices privileges and grant at least the "auto_configure" and "write" permissions for the indices you will be using with this delivery stream.
We recommend leaving Content encoding set to GZIP for improved network efficiency.
Retry duration determines how long Firehose continues retrying the request in the event of an error. A duration of 60-300s should be suitable for most use cases.
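
If you only have the ID and secret of an API key, the Base64 encoded form can be derived from them. The following is a minimal sketch, assuming the encoded value is the Base64 encoding of the UTF-8 string id:api_key (this matches the encoded value Elasticsearch returns when an API key is created). The function name and the credential values are hypothetical placeholders.

```python
import base64


def encode_api_key(key_id: str, api_key: str) -> str:
    """Return the Base64 encoding of "<id>:<api_key>" as an ASCII string."""
    raw = f"{key_id}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder values for illustration only, not real credentials.
encoded = encode_api_key("my-key-id", "my-key-secret")
print(encoded)
```

The printed value is what would be pasted into the API key field of the delivery stream.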

Parameters:

Elastic recommends setting the es_datastream_name parameter to help route data to the correct integration data streams. If this parameter is not specified, data is sent to the logs-generic-default data stream by default.

The default data stream will change to logs-awsfirehose-default in January 2024. To avoid breaking changes, do not leave es_datastream_name empty. To try the new routing functionality, set es_datastream_name to logs-awsfirehose-default.

You can use the es_datastream_name parameter to route documents to any data stream. When the Amazon Kinesis Data Firehose integration is installed, routing is done automatically when es_datastream_name is set to logs-awsfirehose-default. When using Elastic AWS integrations without the Firehose integration, you must set this parameter to a specific data stream, such as logs-aws.vpcflow-default for ingesting VPC flow logs.

Elastic integrations use data streams with specific naming conventions, and Firehose records need to be routed to the relevant data stream to use preconfigured index mappings, ingest pipelines, and dashboards.
A separate Firehose delivery stream is required for each log type in AWS to make use of Elastic integrations.
The following is a list of common AWS log types and the es_datastream_name value that needs to be set to route the logs to the correct integration.

es_datastream_name value

- logs-aws.cloudfront_logs-default
- logs-aws.cloudtrail-default
- logs-aws.cloudwatch_logs-default
- logs-aws.ec2_logs-default
- logs-aws.elb_logs-default
- logs-aws.firewall_logs-default
- logs-aws.route53_public_logs-default
- logs-aws.route53_resolver_logs-default
- logs-aws.s3access-default
- logs-aws.vpcflow-default
- logs-aws.waf-default

As per the data stream naming conventions, the "namespace" is a user-configurable arbitrary grouping and can be changed from default to fit your use case. For example, you may want to organize WAF Logs per environment into logs-aws.waf-production and logs-aws.waf-qa data streams for more granular control over rollover, retention, and security permissions.

For log types not listed above, review the relevant integration documentation to determine the correct es_datastream_name value. The data stream components can be found in the example event for each integration.
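
The naming convention described above can be sketched as a small helper that assembles an es_datastream_name value from its components. This is an illustration only: the function name is hypothetical, and the rule that a component may not contain a hyphen is an assumption based on the hyphen being the separator between components.

```python
def datastream_name(dataset: str, namespace: str = "default", data_type: str = "logs") -> str:
    """Build a "<type>-<dataset>-<namespace>" data stream name."""
    for label, part in (("type", data_type), ("dataset", dataset), ("namespace", namespace)):
        if not part:
            raise ValueError(f"{label} must not be empty")
        # Assumption: "-" separates the components, so it cannot appear inside one.
        if "-" in part:
            raise ValueError(f"{label} must not contain '-': {part!r}")
    return f"{data_type}-{dataset}-{namespace}"


# WAF logs organized per environment, as in the example above.
print(datastream_name("aws.waf", "production"))
print(datastream_name("aws.waf", "qa"))
# VPC flow logs in the default namespace.
print(datastream_name("aws.vpcflow"))
```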

The include_cw_extracted_fields parameter is optional and can be set when using a CloudWatch logs subscription filter as the Firehose data source. When set to true, extracted fields generated by the filter pattern in the subscription filter will be collected. Setting this parameter can add many fields to each record and may significantly increase data volume in Elasticsearch. As such, consider its use carefully and enable it only when the extracted fields are required for specific filtering and/or aggregation.

The include_event_original parameter is optional and should only be used for debugging purposes. When set to true, each log record will contain an additional field named event.original, which contains the raw (unprocessed) log message. This parameter will increase the data volume in Elasticsearch and should be used with care.

Elastic requires a Buffer size of 1MiB to avoid exceeding the Elasticsearch http.max_content_length setting (typically 100MB) when the buffer is uncompressed.

The default Buffer interval of 60s is recommended to ensure data freshness in Elastic.

Backup settings
It’s recommended to configure S3 backup for failed records. It’s then possible to configure workflows to automatically retry failed records, for example using Elastic Serverless Forwarder.
Whilst Firehose guarantees at-least-once delivery of data to the destination, if your data is highly sensitive, it’s also recommended to back up all records to S3 in case there are any ingest issues in Elasticsearch.
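
For readers who script the setup instead of using the console, the settings above can be collected into one structure. The sketch below rests on two assumptions that this guide does not state: that the Elastic destination corresponds to the Firehose HTTP endpoint destination in the AWS API, and that the console Parameters map to its common attributes. The function name, ARNs, and API key are hypothetical placeholders.

```python
def build_destination_config(endpoint_url, encoded_api_key, datastream,
                             role_arn, backup_bucket_arn, retry_seconds=60):
    """Collect the recommended delivery stream settings in one dictionary."""
    return {
        "EndpointConfiguration": {
            "Url": endpoint_url,
            "Name": "Elastic",
            "AccessKey": encoded_api_key,  # Base64 encoded Elastic API key
        },
        "RequestConfiguration": {
            "ContentEncoding": "GZIP",
            # Assumption: console "Parameters" are sent as common attributes.
            "CommonAttributes": [
                {"AttributeName": "es_datastream_name", "AttributeValue": datastream},
            ],
        },
        # Buffer size of 1 MiB and the default interval of 60 seconds.
        "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
        "RetryOptions": {"DurationInSeconds": retry_seconds},
        # Back up failed records only; "AllData" would back up every record.
        "S3BackupMode": "FailedDataOnly",
        "RoleARN": role_arn,
        "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
    }


config = build_destination_config(
    endpoint_url="https://my-deployment-28u274.es.eu-west-1.aws.found.io",
    encoded_api_key="<base64-encoded-api-key>",          # placeholder
    datastream="logs-aws.vpcflow-default",
    role_arn="arn:aws:iam::123456789012:role/example",   # placeholder
    backup_bucket_arn="arn:aws:s3:::example-backup",     # placeholder
)

# With boto3 installed and AWS credentials configured, the dictionary could be
# passed on like this (not executed here):
#   import boto3
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="example-stream",
#       DeliveryStreamType="DirectPut",
#       HttpEndpointDestinationConfiguration=config,
#   )
```

Keeping the settings in a plain dictionary makes it possible to review or unit test them before anything is created in AWS.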

Send data to the Firehose delivery stream.

Consult the AWS documentation for details on how to configure a variety of log sources to send data to Firehose delivery streams.
Several services support writing data directly to delivery streams, including CloudWatch Logs. In addition, there are other ways to create streaming data pipelines to Firehose, e.g. using AWS DMS.
An example workflow for sending VPC Flow Logs to Firehose would be:

- Publish VPC Flow Logs to a CloudWatch log group. To learn how, refer to the AWS documentation about publishing flow logs.
- Create a subscription filter in the CloudWatch log group to the Firehose delivery stream. To learn how, refer to the AWS documentation about using subscription filters.
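
The second step of this workflow can also be scripted. The sketch below only assembles the arguments for the CloudWatch Logs put_subscription_filter call in boto3. The function name, log group name, filter name, and ARNs are hypothetical placeholders, and the IAM role is assumed to already allow CloudWatch Logs to write to the delivery stream.

```python
def subscription_filter_args(log_group, delivery_stream_arn, role_arn,
                             filter_name="to-firehose", filter_pattern=""):
    """Arguments for subscribing a log group to a Firehose delivery stream."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        # An empty pattern forwards every log event in the group.
        "filterPattern": filter_pattern,
        "destinationArn": delivery_stream_arn,
        "roleArn": role_arn,
    }


args = subscription_filter_args(
    log_group="/example/vpc-flow-logs",  # placeholder
    delivery_stream_arn="arn:aws:firehose:eu-west-1:123456789012:deliverystream/example-stream",
    role_arn="arn:aws:iam::123456789012:role/example-cwl-to-firehose",
)

# With boto3 installed and AWS credentials configured (not executed here):
#   import boto3
#   boto3.client("logs").put_subscription_filter(**args)
```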