diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 9ad7aa51598c..c1dd7d7b945a 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -99,6 +99,7 @@ /packages/azure_functions @elastic/obs-infraobs-integrations /packages/azure_functions/data_stream/functionapplogs @elastic/obs-infraobs-integrations /packages/azure_functions/data_stream/metrics @elastic/obs-infraobs-integrations +/packages/azure_logs @elastic/obs-ds-hosted-services /packages/azure_metrics @elastic/obs-ds-hosted-services /packages/azure_metrics/data_stream/compute_vm @elastic/obs-ds-hosted-services /packages/azure_metrics/data_stream/compute_vm_scaleset @elastic/obs-ds-hosted-services diff --git a/packages/azure_logs/LICENSE.txt b/packages/azure_logs/LICENSE.txt new file mode 100644 index 000000000000..809108b857ff --- /dev/null +++ b/packages/azure_logs/LICENSE.txt @@ -0,0 +1,93 @@ +Elastic License 2.0 + +URL: https://www.elastic.co/licensing/elastic-license + +## Acceptance + +By using the software, you agree to all of the terms and conditions below. + +## Copyright License + +The licensor grants you a non-exclusive, royalty-free, worldwide, +non-sublicensable, non-transferable license to use, copy, distribute, make +available, and prepare derivative works of the software, in each case subject to +the limitations and conditions below. + +## Limitations + +You may not provide the software to third parties as a hosted or managed +service, where the service provides users with access to any substantial set of +the features or functionality of the software. + +You may not move, change, disable, or circumvent the license key functionality +in the software, and you may not remove or obscure any functionality in the +software that is protected by the license key. + +You may not alter, remove, or obscure any licensing, copyright, or other notices +of the licensor in the software. Any use of the licensor’s trademarks is subject +to applicable law. 
+ +## Patents + +The licensor grants you a license, under any patent claims the licensor can +license, or becomes able to license, to make, have made, use, sell, offer for +sale, import and have imported the software, in each case subject to the +limitations and conditions in this license. This license does not cover any +patent claims that you cause to be infringed by modifications or additions to +the software. If you or your company make any written claim that the software +infringes or contributes to infringement of any patent, your patent license for +the software granted under these terms ends immediately. If your company makes +such a claim, your patent license ends immediately for work on behalf of your +company. + +## Notices + +You must ensure that anyone who gets a copy of any part of the software from you +also gets a copy of these terms. + +If you modify the software, you must include in any modified copies of the +software prominent notices stating that you have modified the software. + +## No Other Rights + +These terms do not imply any licenses other than those expressly granted in +these terms. + +## Termination + +If you use the software in violation of these terms, such use is not licensed, +and your licenses will automatically terminate. If the licensor provides you +with a notice of your violation, and you cease all violation of this license no +later than 30 days after you receive that notice, your licenses will be +reinstated retroactively. However, if you violate these terms after such +reinstatement, any additional violation of these terms will cause your licenses +to terminate automatically and permanently. 
+ +## No Liability + +*As far as the law allows, the software comes as is, without any warranty or +condition, and the licensor will not be liable to you for any damages arising +out of these terms or the use or nature of the software, under any kind of +legal claim.* + +## Definitions + +The **licensor** is the entity offering these terms, and the **software** is the +software the licensor makes available under these terms, including any portion +of it. + +**you** refers to the individual or entity agreeing to these terms. + +**your company** is any legal entity, sole proprietorship, or other kind of +organization that you work for, plus all organizations that have control over, +are under the control of, or are under common control with that +organization. **control** means ownership of substantially all the assets of an +entity, or the power to direct its management and policies by vote, contract, or +otherwise. Control can be direct or indirect. + +**your licenses** are all the licenses granted to you for the software under +these terms. + +**use** means anything you do with the software requiring one of your licenses. + +**trademark** means trademarks, service marks, and similar rights. diff --git a/packages/azure_logs/_dev/build/docs/README.md b/packages/azure_logs/_dev/build/docs/README.md new file mode 100644 index 000000000000..d9c9de14de77 --- /dev/null +++ b/packages/azure_logs/_dev/build/docs/README.md @@ -0,0 +1,401 @@ +# Custom Azure Logs + +The Custom Azure Logs integration collects logs from Azure Event Hub. + +Use the integration to collect logs from: + +* Azure services that support exporting logs to Event Hub +* Any other source that can send logs to an Event Hub + +## Data streams + +The Custom Azure Logs integration collects one type of data stream: logs. + +The integration does not use a pre-defined Elastic data stream. You can select your dataset and namespace of choice when configuring the integration. 
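The resulting data stream name follows the Elastic `<type>-<dataset>-<namespace>` naming scheme. A minimal sketch of that convention (illustrative only; this helper is not part of the integration):

```python
# Illustrative sketch: how the final data stream name is assembled from the
# <type>-<dataset>-<namespace> naming scheme. The example values match the
# ones used in this document.

def data_stream_name(dataset: str, namespace: str, stream_type: str = "logs") -> str:
    """Build a data stream name following the <type>-<dataset>-<namespace> scheme."""
    return f"{stream_type}-{dataset}-{namespace}"

print(data_stream_name("azure.custom", "default"))  # logs-azure.custom-default
```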
+
+For example, if you select `azure.custom` as your dataset, and `default` as your namespace, the integration will send the data to the `logs-azure.custom-default` data stream.
+
+Custom Logs integrations give you the flexibility to configure the integration to fit your needs.
+
+## Requirements
+
+You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.
+You can use our hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.
+
+Before using the Custom Azure Logs integration, you will need:
+
+* One **event hub** to store in-flight logs exported by Azure services (or other sources) and make them available to Elastic Agent.
+* A **storage account** to store information about logs consumed by the Elastic Agent.
+
+### Event Hub
+
+[Azure Event Hubs](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-about) is a data streaming platform and event ingestion service. It can receive and temporarily store millions of events.
+
+Elastic Agent with the Custom Azure Logs integration will consume logs from the Event Hubs service.
+
+```text
+  ┌────────────────┐      ┌───────────┐
+  │   myeventhub   │      │  Elastic  │
+  │      <>        │─────▶│   Agent   │
+  └────────────────┘      └───────────┘
+```
+
+To learn more about Event Hubs, refer to [Features and terminology in Azure Event Hubs](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-features).
+
+### Storage Account Container
+
+The [Storage account](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview) is a versatile Azure service that allows you to store data in various storage types, including blobs, file shares, queues, tables, and disks.
+
+The Custom Azure Logs integration requires a Storage account container to work.
+
+The integration uses the Storage Account container for checkpointing; it stores data about the Consumer Group (state, position, or offset) and shares it among the Elastic Agents. Sharing such information allows multiple Elastic Agents assigned to the same agent policy to work together; this enables horizontal scaling of the logs processing when required.
+
+```text
+  ┌────────────────┐                      ┌───────────┐
+  │   myeventhub   │         logs         │  Elastic  │
+  │      <>        │────────────────────▶│   Agent   │
+  └────────────────┘                      └───────────┘
+                                                │
+               consumer group info              │
+  ┌────────────────┐  (state, position, or      │
+  │ log-myeventhub │        offset)             │
+  │      <>        │◀───────────────────────────┘
+  └────────────────┘
+```
+
+The Elastic Agent automatically creates one container for the Custom Azure Logs integration. The Agent will then create one blob for each partition on the event hub.
+
+For example, if the integration is configured to fetch data from an event hub with four partitions, the Agent will create the following:
+
+* One storage account container.
+* Four blobs in that container.
+
+The information stored in the blobs is small (usually < 500 bytes per blob) and accessed frequently. Elastic recommends using the Hot storage tier.
+
+Keep the Storage Account container for as long as you run the integration with the Elastic Agent. If you delete a Storage Account container, the Elastic Agent stops working and creates a new one the next time it starts.
+
+When a Storage Account container is deleted, the Elastic Agent loses track of the last message processed and starts processing messages from the beginning of the event hub retention period.
+
+## Setup
+
+Before adding the integration, you must complete the following tasks.
+
+### Create an Event Hub
+
+The event hub receives the logs exported from the Azure service and makes them available to the Elastic Agent to pick up.
+
+Here's the high-level overview of the required steps:
+
+* Create a resource group, or select an existing one.
+* Create an Event Hubs namespace.
+* Create an event hub.
+
+For a detailed step-by-step guide, check the quickstart [Create an event hub using Azure portal](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create).
+
+Take note of the event hub **Name**, which you will use later when specifying an **eventhub** in the integration settings.
+
+#### Event Hubs Namespace vs Event Hub
+
+You should use the event hub name (not the Event Hubs namespace name) as a value for the **eventhub** option in the integration settings.
+
+If you are new to Event Hubs, think of the Event Hubs namespace as the cluster and the event hub as the topic. You will typically have one cluster and multiple topics.
+
+If you are familiar with Kafka, here's a conceptual mapping between the two:
+
+| Kafka Concept  | Event Hub Concept |
+|----------------|-------------------|
+| Cluster        | Namespace         |
+| Topic          | An event hub      |
+| Partition      | Partition         |
+| Consumer Group | Consumer Group    |
+| Offset         | Offset            |
+
+#### How many partitions?
+
+The number of partitions is essential to balancing event hub cost and performance.
+
+Here are a few examples with one or multiple agents, with recommendations on picking the correct number of partitions for your use case.
+
+##### Single Agent
+
+With a single Agent deployment, increasing the number of partitions on the event hub is the primary driver of scale-up performance. The Agent creates one worker for each partition.
+ +```text +┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + +│ │ │ │ + +│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ + │ partition 0 │◀───────────│ worker │ +│ └─────────────────┘ │ │ └─────────────────┘ │ + ┌─────────────────┐ ┌─────────────────┐ +│ │ partition 1 │◀──┼────┼───│ worker │ │ + └─────────────────┘ └─────────────────┘ +│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ + │ partition 2 │◀────────── │ worker │ +│ └─────────────────┘ │ │ └─────────────────┘ │ + ┌─────────────────┐ ┌─────────────────┐ +│ │ partition 3 │◀──┼────┼───│ worker │ │ + └─────────────────┘ └─────────────────┘ +│ │ │ │ + +│ │ │ │ + +└ Event Hub ─ ─ ─ ─ ─ ─ ─ ┘ └ Agent ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ +``` + +##### Two or more Agents + +With more than one Agent, setting the number of partitions is crucial. The agents share the existing partitions to scale out performance and improve availability. + +The number of partitions must be at least the number of agents. + +```text +┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + +│ │ │ ┌─────────────────┐ │ + ┌──────│ worker │ +│ ┌─────────────────┐ │ │ │ └─────────────────┘ │ + │ partition 0 │◀────┘ ┌─────────────────┐ +│ └─────────────────┘ │ ┌──┼───│ worker │ │ + ┌─────────────────┐ │ └─────────────────┘ +│ │ partition 1 │◀──┼─┘ │ │ + └─────────────────┘ ─Agent─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +│ ┌─────────────────┐ │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + │ partition 2 │◀────┐ +│ └─────────────────┘ │ │ │ ┌─────────────────┐ │ + ┌─────────────────┐ └─────│ worker │ +│ │ partition 3 │◀──┼─┐ │ └─────────────────┘ │ + └─────────────────┘ │ ┌─────────────────┐ +│ │ └──┼──│ worker │ │ + └─────────────────┘ +│ │ │ │ + +└ Event Hub ─ ─ ─ ─ ─ ─ ─ ┘ └ Agent ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ +``` + +##### Recommendations + +Create an event hub with at least two partitions. Two partitions allow low-volume deployment to support high availability with two agents. Consider creating four partitions or more to handle medium-volume deployments with availability. 
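As a rough illustration of how partitions spread across agents, the sketch below assumes workers are distributed as evenly as the partition count allows. This is an approximation for capacity planning, not the integration's actual assignment logic (which is coordinated through the checkpoint store described above):

```python
# Illustrative sketch (assumption: partitions are spread as evenly as
# possible among agents): estimate how many partition workers each
# Elastic Agent would run.

def workers_per_agent(partitions: int, agents: int) -> list[int]:
    """Return the estimated number of partition workers per agent."""
    if agents < 1 or partitions < agents:
        raise ValueError("need at least one partition per agent")
    base, extra = divmod(partitions, agents)
    return [base + 1 if i < extra else base for i in range(agents)]

# Four partitions with a single agent: one worker per partition.
print(workers_per_agent(4, 1))  # [4]
# Four partitions shared by two agents.
print(workers_per_agent(4, 2))  # [2, 2]
```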
+
+To learn more about event hub partitions, read an in-depth guide from Microsoft at https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create.
+
+To learn more about event hub partitions from a performance perspective, check the scalability-focused document at https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-scalability#partitions.
+
+#### Consumer Group
+
+Like all other event hub clients, Elastic Agent needs a consumer group name to access the event hub.
+
+A Consumer Group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple agents to each have a separate view of the event stream, and to read the logs independently at their own pace and with their own offsets.
+
+Consumer groups allow multiple Elastic Agents assigned to the same agent policy to work together; this enables horizontal scaling of the logs processing when required.
+
+In most cases, you can use the default consumer group named `$Default`. If `$Default` is already used by other applications, you can create a consumer group dedicated to the Azure Logs integration.
+
+#### Connection string
+
+The Elastic Agent requires a connection string to access the event hub and fetch the exported logs. The connection string contains details about the event hub used and the credentials required to access it.
+
+To get the connection string for your Event Hubs namespace:
+
+1. Visit the **Event Hubs namespace** you created in a previous step.
+1. Select **Settings** > **Shared access policies**.
+
+Create a new Shared Access Policy (SAS):
+
+1. Select **Add** to open the creation panel.
+1. Add a **Policy name** (for example, "ElasticAgent").
+1. Select the **Listen** claim.
+1. Select **Create**.
+
+When the SAS Policy is ready, select it to display the information panel.
+
+Take note of the **Connection string–primary key**, which you will use later when specifying a **connection_string** in the integration settings.
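The connection string follows the standard Event Hubs shape, `Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>`. A small sketch showing how its parts relate to the integration settings (the namespace, policy name, and key below are placeholders, not real credentials):

```python
# Illustrative sketch: parse an Event Hubs connection string into its parts.
# All values below are placeholders.

connection_string = (
    "Endpoint=sb://my-namespace.servicebus.windows.net/;"
    "SharedAccessKeyName=ElasticAgent;"
    "SharedAccessKey=<base64-key>"
)

# Split "key=value" pairs on ";", keeping any "=" inside the value intact.
parts = dict(
    pair.split("=", 1) for pair in connection_string.split(";") if pair
)

print(parts["Endpoint"])             # sb://my-namespace.servicebus.windows.net/
print(parts["SharedAccessKeyName"])  # ElasticAgent
```

The whole string (not just the key) is what you paste into the **connection_string** setting.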
+
+### Create a Diagnostic Setting
+
+Diagnostic settings export logs from Azure services to a destination; to use the Azure Logs integration, that destination must be an event hub.
+
+To create a diagnostic setting to export logs:
+
+1. Locate the diagnostic settings for the service (for example, Microsoft Entra ID).
+1. Select **Diagnostic settings** in the **Monitoring** section of the service. Note that different services may place the diagnostic settings in different positions.
+1. Select **Add diagnostic setting**.
+
+On the diagnostic settings page, select the source **log categories** you want to export, then select their **destination**.
+
+#### Select log categories
+
+Each Azure service exports a well-defined list of log categories. Check the individual integration doc to learn which log categories are supported by the integration.
+
+#### Select the destination
+
+Select the **subscription** and the **Event Hubs namespace** you previously created. Select the event hub dedicated to this integration.
+
+```text
+  ┌───────────────┐   ┌──────────────┐   ┌───────────────┐      ┌───────────┐
+  │  MS Entra ID  │   │  Diagnostic  │   │    adlogs     │      │  Elastic  │
+  │      <>       ├──▶│   Settings   │──▶│      <>       │─────▶│   Agent   │
+  └───────────────┘   └──────────────┘   └───────────────┘      └───────────┘
+```
+
+### Create a Storage account container
+
+The Elastic Agent stores the consumer group information (state, position, or offset) in a storage account container. Making this information available to all agents allows them to share the logs processing and resume from the last processed logs after a restart.
+
+NOTE: Use the storage account as a checkpoint store only.
+
+To create the storage account:
+
+1. Sign in to the [Azure Portal](https://portal.azure.com/) and create your storage account.
+1. While configuring your project details, make sure you select the following recommended default settings:
+   - Hierarchical namespace: disabled
+   - Minimum TLS version: Version 1.2
+   - Access tier: Hot
+   - Enable soft delete for blobs: disabled
+   - Enable soft delete for containers: disabled
+
+1. When the new storage account is ready, take note of the storage account name and the storage account access keys; you will use them later to authenticate your Elastic application's requests to this storage account.
+
+This is the final diagram of the setup for collecting Activity logs from the Azure Monitor service.
+
+```text
+  ┌───────────────┐   ┌──────────────┐   ┌────────────────┐          ┌───────────┐
+  │  MS Entra ID  │   │  Diagnostic  │   │     adlogs     │   logs   │  Elastic  │
+  │      <>       ├──▶│   Settings   │──▶│       <>       │─────────▶│   Agent   │
+  └───────────────┘   └──────────────┘   └────────────────┘          └───────────┘
+                                                                           │
+        ┌──────────────┐     consumer group info                           │
+        │  azurelogs   │    (state, position, or                           │
+        │      <>      │◀──────────── offset) ─────────────────────────────┘
+        └──────────────┘
+```
+
+#### How many Storage account containers?
+
+The Elastic Agent can use one Storage Account (SA) for multiple integrations.
+
+The Agent creates one SA container for the integration. The SA container name is a combination of the event hub name and a prefix (`azure-eventhub-input-[eventhub]`).
+
+### Running the integration behind a firewall
+
+When you run the Elastic Agent behind a firewall, to ensure proper communication with the necessary components, you need to allow traffic on ports `5671` and `5672` for the event hub, and port `443` for the Storage Account container.
+ +```text +┌────────────────────────────────┐ ┌───────────────────┐ ┌───────────────────┐ +│ │ │ │ │ │ +│ ┌────────────┐ ┌───────────┐ │ │ ┌──────────────┐ │ │ ┌───────────────┐ │ +│ │ diagnostic │ │ event hub │ │ │ │azure-eventhub│ │ │ │ activity logs │ │ +│ │ setting │──▶│ │◀┼AMQP─│ <> │─┼──┼▶│<>│ │ +│ └────────────┘ └───────────┘ │ │ └──────────────┘ │ │ └───────────────┘ │ +│ │ │ │ │ │ │ +│ │ │ │ │ │ │ +│ │ │ │ │ │ │ +│ ┌─────────────┬─────HTTPS─┼──────────┘ │ │ │ +│ ┌───────┼─────────────┼──────┐ │ │ │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ ▼ ▼ │ │ └─Agent─────────────┘ └─Elastic Cloud─────┘ +│ │ ┌──────────┐ ┌──────────┐ │ │ +│ │ │ 0 │ │ 1 │ │ │ +│ │ │ <> │ │ <> │ │ │ +│ │ └──────────┘ └──────────┘ │ │ +│ │ │ │ +│ │ │ │ +│ └─Storage Account Container──┘ │ +│ │ +│ │ +└─Azure──────────────────────────┘ +``` + +#### Event Hub + +Port `5671` and `5672` are commonly used for secure communication with the event hub. These ports are used to receive events. By allowing traffic on these ports, the Elastic Agent can establish a secure connection with the event hub. + +For more information, check the following documents: + +* [What ports do I need to open on the firewall?](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-faq#what-ports-do-i-need-to-open-on-the-firewall) from the [Event Hubs frequently asked questions](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-faq#what-ports-do-i-need-to-open-on-the-firewall). +* [AMQP outbound port requirements](https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-amqp-protocol-guide#amqp-outbound-port-requirements) + +#### Storage Account Container + +Port `443` is used for secure communication with the Storage Account container. This port is commonly used for HTTPS traffic. By allowing traffic on port 443, the Elastic Agent can securely access and interact with the Storage Account container, which is essential for storing and retrieving checkpoint data for each event hub partition. 
+
+#### DNS
+
+Optionally, you can restrict the traffic to the following domain names:
+
+```text
+*.servicebus.windows.net
+*.blob.core.windows.net
+*.cloudapp.net
+```
+
+## Settings
+
+Use the following settings to configure the Azure Logs integration when you add it to Fleet.
+
+`eventhub` :
+_string_
+The name of the event hub to read logs from. Elastic recommends using only letters, numbers, and the hyphen (-) character for event hub names to maximize compatibility. You can use existing event hubs with underscores (_) in their names; in this case, the integration will replace underscores with hyphens (-) when it uses the event hub name to create dependent Azure resources behind the scenes (e.g., the storage account container to store event hub consumer offsets). Elastic also recommends using a separate event hub for each log type, as the field mappings of each log type differ.
+Default value: `insights-operational-logs`.
+
+`consumer_group` :
+_string_
+Enable the publish/subscribe mechanism of Event Hubs with consumer groups. A consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and with their own offsets.
+Default value: `$Default`
+
+`connection_string` :
+_string_
+
+The connection string required to communicate with Event Hubs. See [Get an Event Hubs connection string](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string) for more information.
+
+A Blob Storage account is required to store/retrieve/update the offset or state of the event hub messages. This allows the integration to start back up at the spot where it stopped processing messages.
+
+`storage_account` :
+_string_
+The name of the storage account where the state/offsets will be stored and updated.
+
+`storage_account_key` :
+_string_
+The storage account key. Used to authorize access to data in your storage account.
+
+`storage_account_container` :
+_string_
+The storage account container where the integration stores the checkpoint data for the consumer group. It is an advanced option to use with extreme care. You MUST use a dedicated storage account container for each Azure log type (activity, sign-in, audit logs, and others). DO NOT REUSE the same container name for more than one Azure log type. See [Container Names](https://docs.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#container-names) for details on naming rules from Microsoft. The integration generates a default container name if not specified.
+
+`pipeline` :
+_string_
+Optional. Overrides the default ingest pipeline for this integration.
+
+`resource_manager_endpoint` :
+_string_
+Optional. By default, the integration uses the Azure public environment. To override this and use a different Azure environment, provide a specific resource manager endpoint.
+
+Examples:
+
+* Azure ChinaCloud: `https://management.chinacloudapi.cn/`
+* Azure GermanCloud: `https://management.microsoftazure.de/`
+* Azure PublicCloud: `https://management.azure.com/`
+* Azure USGovernmentCloud: `https://management.usgovcloudapi.net/`
+
+This setting can also be used to define your own endpoints, like for hybrid cloud models.
+
+## Handling Malformed JSON in Azure Logs
+
+Azure services have been observed to send [malformed JSON](https://learn.microsoft.com/en-us/answers/questions/1001797/invalid-json-logs-produced-for-function-apps) documents occasionally. These logs can disrupt the expected JSON formatting and lead to parsing issues during processing.
+
+To address this issue, the advanced settings section of each data stream offers two sanitization options:
+
+* Sanitizes New Lines: removes new lines in logs.
+* Sanitizes Single Quotes: replaces single quotes with double quotes in logs, excluding single quotes occurring within double quotes.
+
+Malformed logs can be identified by:
+
+* Presence of a `records` array in the `message` field, indicating a failure to unmarshal the byte slice.
+* Existence of an `error.message` field containing the text "Received invalid JSON from the Azure Cloud platform. Unable to parse the source log message."
+
+Known data streams that might produce malformed logs:
+
+* Platform Logs
+* Spring Apps Logs
+* PostgreSQL Flexible Servers Logs
diff --git a/packages/azure_logs/agent/input/input.yml.hbs b/packages/azure_logs/agent/input/input.yml.hbs
new file mode 100644
index 000000000000..57d44412dfb7
--- /dev/null
+++ b/packages/azure_logs/agent/input/input.yml.hbs
@@ -0,0 +1,51 @@
+{{#if connection_string}}
+connection_string: {{connection_string}}
+{{/if}}
+{{#if storage_account_container }}
+storage_account_container: {{storage_account_container}}
+{{else}}
+{{#if eventhub}}
+storage_account_container: azure-eventhub-input-{{eventhub}}
+{{/if}}
+{{/if}}
+{{#if eventhub}}
+eventhub: {{eventhub}}
+{{/if}}
+{{#if consumer_group}}
+consumer_group: {{consumer_group}}
+{{/if}}
+{{#if storage_account}}
+storage_account: {{storage_account}}
+{{/if}}
+{{#if storage_account_key}}
+storage_account_key: {{storage_account_key}}
+{{/if}}
+{{#if pipeline}}
+pipeline: {{pipeline}}
+{{/if}}
+{{#if resource_manager_endpoint}}
+resource_manager_endpoint: {{resource_manager_endpoint}}
+{{/if}}
+data_stream:
+  dataset: {{data_stream.dataset}}
+tags:
+{{#if preserve_original_event}}
+  - preserve_original_event
+{{/if}}
+{{#each tags as |tag i|}}
+  - {{tag}}
+{{/each}}
+{{#contains "forwarded" tags}}
+publisher_pipeline.disable_host: true
+{{/contains}}
+{{#if processors}}
+processors:
+{{processors}}
+{{/if}}
+sanitize_options:
+{{#if sanitize_newlines}}
+  - NEW_LINES
+{{/if}}
+{{#if sanitize_singlequotes}}
+  - SINGLE_QUOTES
+{{/if}}
\ No newline at end of file
diff --git a/packages/azure_logs/changelog.yml b/packages/azure_logs/changelog.yml
new file mode 100644
index 000000000000..42e95ebc7a89
--- /dev/null
+++ b/packages/azure_logs/changelog.yml
@@ -0,0 +1,6 @@
+# newer versions go on top
+- version: "0.1.0"
+  changes:
+    - description: Add Custom Azure Logs to collect log events from Azure Event Hubs
+      type: enhancement
+      link: https://github.com/elastic/integrations/pull/11552
diff --git a/packages/azure_logs/docs/README.md b/packages/azure_logs/docs/README.md
new file mode 100644
index 000000000000..d9c9de14de77
--- /dev/null
+++ b/packages/azure_logs/docs/README.md
@@ -0,0 +1,401 @@
+# Custom Azure Logs
+
+The Custom Azure Logs integration collects logs from Azure Event Hub.
+
+Use the integration to collect logs from:
+
+* Azure services that support exporting logs to Event Hub
+* Any other source that can send logs to an Event Hub
+
+## Data streams
+
+The Custom Azure Logs integration collects one type of data stream: logs.
+
+The integration does not use a pre-defined Elastic data stream. You can select your dataset and namespace of choice when configuring the integration.
+
+For example, if you select `azure.custom` as your dataset, and `default` as your namespace, the integration will send the data to the `logs-azure.custom-default` data stream.
+
+Custom Logs integrations give you the flexibility to configure the integration to fit your needs.
+
+## Requirements
+
+You need Elasticsearch for storing and searching your data and Kibana for visualizing and managing it.
+You can use our hosted Elasticsearch Service on Elastic Cloud, which is recommended, or self-manage the Elastic Stack on your own hardware.
+
+Before using the Custom Azure Logs integration, you will need:
+
+* One **event hub** to store in-flight logs exported by Azure services (or other sources) and make them available to Elastic Agent.
+* A **storage account** to store information about logs consumed by the Elastic Agent.
+
+### Event Hub
+
+[Azure Event Hubs](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-about) is a data streaming platform and event ingestion service. It can receive and temporarily store millions of events.
+
+Elastic Agent with the Custom Azure Logs integration will consume logs from the Event Hubs service.
+
+```text
+  ┌────────────────┐      ┌───────────┐
+  │   myeventhub   │      │  Elastic  │
+  │      <>        │─────▶│   Agent   │
+  └────────────────┘      └───────────┘
+```
+
+To learn more about Event Hubs, refer to [Features and terminology in Azure Event Hubs](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-features).
+
+### Storage Account Container
+
+The [Storage account](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview) is a versatile Azure service that allows you to store data in various storage types, including blobs, file shares, queues, tables, and disks.
+
+The Custom Azure Logs integration requires a Storage account container to work.
+
+The integration uses the Storage Account container for checkpointing; it stores data about the Consumer Group (state, position, or offset) and shares it among the Elastic Agents. Sharing such information allows multiple Elastic Agents assigned to the same agent policy to work together; this enables horizontal scaling of the logs processing when required.
+
+```text
+  ┌────────────────┐                      ┌───────────┐
+  │   myeventhub   │         logs         │  Elastic  │
+  │      <>        │────────────────────▶│   Agent   │
+  └────────────────┘                      └───────────┘
+                                                │
+               consumer group info              │
+  ┌────────────────┐  (state, position, or      │
+  │ log-myeventhub │        offset)             │
+  │      <>        │◀───────────────────────────┘
+  └────────────────┘
+```
+
+The Elastic Agent automatically creates one container for the Custom Azure Logs integration. The Agent will then create one blob for each partition on the event hub.
+
+For example, if the integration is configured to fetch data from an event hub with four partitions, the Agent will create the following:
+
+* One storage account container.
+* Four blobs in that container.
+
+The information stored in the blobs is small (usually < 500 bytes per blob) and accessed frequently. Elastic recommends using the Hot storage tier.
+
+Keep the Storage Account container for as long as you run the integration with the Elastic Agent. If you delete a Storage Account container, the Elastic Agent stops working and creates a new one the next time it starts.
+
+When a Storage Account container is deleted, the Elastic Agent loses track of the last message processed and starts processing messages from the beginning of the event hub retention period.
+
+## Setup
+
+Before adding the integration, you must complete the following tasks.
+
+### Create an Event Hub
+
+The event hub receives the logs exported from the Azure service and makes them available to the Elastic Agent to pick up.
+
+Here's the high-level overview of the required steps:
+
+* Create a resource group, or select an existing one.
+* Create an Event Hubs namespace.
+* Create an event hub.
+
+For a detailed step-by-step guide, check the quickstart [Create an event hub using Azure portal](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create).
+
+Take note of the event hub **Name**, which you will use later when specifying an **eventhub** in the integration settings.
+
+#### Event Hubs Namespace vs Event Hub
+
+You should use the event hub name (not the Event Hubs namespace name) as a value for the **eventhub** option in the integration settings.
+
+If you are new to Event Hubs, think of the Event Hubs namespace as the cluster and the event hub as the topic. You will typically have one cluster and multiple topics.
+ +If you are familiar with Kafka, here's a conceptual mapping between the two: + +| Kafka Concept | Event Hub Concept | +|----------------|-------------------| +| Cluster | Namespace | +| Topic | An event hub | +| Partition | Partition | +| Consumer Group | Consumer Group | +| Offset | Offset | + +#### How many partitions? + +The number of partitions is essential to balance the event hub cost and performance. + +Here are a few examples with one or multiple agents, with recommendations on picking the correct number of partitions for your use case. + +##### Single Agent + +With a single Agent deployment, increasing the number of partitions on the event hub is the primary driver in scale-up performances. The Agent creates one worker for each partition. + +```text +┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + +│ │ │ │ + +│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ + │ partition 0 │◀───────────│ worker │ +│ └─────────────────┘ │ │ └─────────────────┘ │ + ┌─────────────────┐ ┌─────────────────┐ +│ │ partition 1 │◀──┼────┼───│ worker │ │ + └─────────────────┘ └─────────────────┘ +│ ┌─────────────────┐ │ │ ┌─────────────────┐ │ + │ partition 2 │◀────────── │ worker │ +│ └─────────────────┘ │ │ └─────────────────┘ │ + ┌─────────────────┐ ┌─────────────────┐ +│ │ partition 3 │◀──┼────┼───│ worker │ │ + └─────────────────┘ └─────────────────┘ +│ │ │ │ + +│ │ │ │ + +└ Event Hub ─ ─ ─ ─ ─ ─ ─ ┘ └ Agent ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ +``` + +##### Two or more Agents + +With more than one Agent, setting the number of partitions is crucial. The agents share the existing partitions to scale out performance and improve availability. + +The number of partitions must be at least the number of agents. 
+ +```text +┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + +│ │ │ ┌─────────────────┐ │ + ┌──────│ worker │ +│ ┌─────────────────┐ │ │ │ └─────────────────┘ │ + │ partition 0 │◀────┘ ┌─────────────────┐ +│ └─────────────────┘ │ ┌──┼───│ worker │ │ + ┌─────────────────┐ │ └─────────────────┘ +│ │ partition 1 │◀──┼─┘ │ │ + └─────────────────┘ ─Agent─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +│ ┌─────────────────┐ │ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ + │ partition 2 │◀────┐ +│ └─────────────────┘ │ │ │ ┌─────────────────┐ │ + ┌─────────────────┐ └─────│ worker │ +│ │ partition 3 │◀──┼─┐ │ └─────────────────┘ │ + └─────────────────┘ │ ┌─────────────────┐ +│ │ └──┼──│ worker │ │ + └─────────────────┘ +│ │ │ │ + +└ Event Hub ─ ─ ─ ─ ─ ─ ─ ┘ └ Agent ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘ +``` + +##### Recommendations + +Create an event hub with at least two partitions. Two partitions allow low-volume deployment to support high availability with two agents. Consider creating four partitions or more to handle medium-volume deployments with availability. + +To learn more about event hub partitions, read an in-depth guide from Microsoft at https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create. + +To learn more about event hub partition from the performance perspective, check the scalability-focused document at https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-scalability#partitions. + +#### Consumer Group + +Like all other event hub clients, Elastic Agent needs a consumer group name to access the event hub. + +A Consumer Group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple agents to each have a separate view of the event stream, and to read the logs independently at their own pace and with their own offsets. + +Consumer groups allow multiple Elastic Agents assigned to the same agent policy to work together; this enables horizontal scaling of the logs processing when required. 
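The sharing described above can be modeled with a short, illustrative sketch. The real partition balancing is handled by the event hub client library inside the Agent; this only demonstrates the even split between agents shown in the diagram (agent names are hypothetical):

```python
def assign_partitions(partitions: int, agents: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions across agents, as an even-split illustration."""
    assignment: dict[str, list[int]] = {agent: [] for agent in agents}
    for p in range(partitions):
        # Partition p goes to agent (p mod number-of-agents).
        assignment[agents[p % len(agents)]].append(p)
    return assignment

# Four partitions shared by two agents, as in the diagram above.
print(assign_partitions(4, ["agent-1", "agent-2"]))
# {'agent-1': [0, 2], 'agent-2': [1, 3]}
```

With fewer partitions than agents, some agents would sit idle, which is why the number of partitions must be at least the number of agents.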
+
+In most cases, you can use the default consumer group named `$Default`. If `$Default` is already used by other applications, you can create a consumer group dedicated to the Azure Logs integration.
+
+#### Connection string
+
+The Elastic Agent requires a connection string to access the event hub and fetch the exported logs. The connection string contains details about the event hub used and the credentials required to access it.
+
+To get the connection string for your Event Hubs namespace:
+
+1. Visit the **Event Hubs namespace** you created in a previous step.
+1. Select **Settings** > **Shared access policies**.
+
+Create a new shared access policy (SAS policy):
+
+1. Select **Add** to open the creation panel.
+1. Add a **Policy name** (for example, "ElasticAgent").
+1. Select the **Listen** claim.
+1. Select **Create**.
+
+When the SAS policy is ready, select it to display the information panel.
+
+Take note of the **Connection string–primary key**, which you will use later when specifying a **connection_string** in the integration settings.
+
+### Create a Diagnostic Setting
+
+A diagnostic setting exports logs from an Azure service to a destination. To use the Azure Logs integration, the destination must be an event hub.
+
+To create a diagnostic setting to export logs:
+
+1. Locate the diagnostic settings for the service (for example, Microsoft Entra ID).
+1. Select diagnostic settings in the **Monitoring** section of the service. Note that different services may place the diagnostic settings entry in different locations.
+1. Select **Add diagnostic setting**.
+
+On the diagnostic settings page, select the source **log categories** you want to export, and then select their **destination**.
+
+#### Select log categories
+
+Each Azure service exports a well-defined list of log categories. Check the individual integration doc to learn which log categories are supported by the integration.
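Each exported record names its log category, so you can inspect which categories a service emits by looking at the message envelope. A minimal sketch, assuming the standard Azure diagnostic-logs envelope (a JSON object with a `records` array); the category values shown are illustrative:

```python
import json

# A message as delivered by the event hub: a JSON envelope whose "records"
# array holds the individual log events, each tagged with its "category".
message = json.dumps({
    "records": [
        {"time": "2024-05-01T12:00:00Z", "category": "AuditLogs", "operationName": "Update user"},
        {"time": "2024-05-01T12:00:01Z", "category": "SignInLogs", "operationName": "Sign-in activity"},
    ]
})

records = json.loads(message)["records"]
categories = sorted({r["category"] for r in records})
print(categories)
# ['AuditLogs', 'SignInLogs']
```

Only the categories you tick in the diagnostic setting appear in the exported envelopes.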
+
+#### Select the destination
+
+Select the **subscription** and the **Event Hubs namespace** you previously created. Select the event hub dedicated to this integration.
+
+```text
+ ┌───────────────┐   ┌──────────────┐   ┌───────────────┐      ┌───────────┐
+ │  MS Entra ID  │   │  Diagnostic  │   │    adlogs     │      │  Elastic  │
+ │  <<service>>  ├──▶│   Settings   │──▶│ <<event hub>> │─────▶│   Agent   │
+ └───────────────┘   └──────────────┘   └───────────────┘      └───────────┘
+```
+
+### Create a Storage account container
+
+The Elastic Agent stores the consumer group information (state, position, or offset) in a storage account container. Making this information available to all agents allows them to share the logs processing and to resume from the last processed log after a restart.
+
+NOTE: Use the storage account as a checkpoint store only.
+
+To create the storage account:
+
+1. Sign in to the [Azure Portal](https://portal.azure.com/) and create your storage account.
+1. While configuring your project details, make sure you select the following recommended default settings:
+   - Hierarchical namespace: disabled
+   - Minimum TLS version: Version 1.2
+   - Access tier: Hot
+   - Enable soft delete for blobs: disabled
+   - Enable soft delete for containers: disabled
+
+1. When the new storage account is ready, take note of the storage account name and the storage account access keys, as you will use them later to authenticate your Elastic application’s requests to this storage account.
+
+This is the final diagram of the setup for collecting Activity logs from the Azure Monitor service.
+ +```text + ┌───────────────┐ ┌──────────────┐ ┌────────────────┐ ┌───────────┐ + │ MS Entra ID │ │ Diagnostic │ │ adlogs │ logs │ Elastic │ + │ <> ├──▶│ Settings │──▶│ <> │────────▶│ Agent │ + └───────────────┘ └──────────────┘ └────────────────┘ └───────────┘ + │ + ┌──────────────┐ consumer group info │ + │ azurelogs │ (state, position, or │ + │<> │◀───────────────offset)──────────────┘ + └──────────────┘ +``` + +#### How many Storage account containers? + +The Elastic Agent can use one Storage Account (SA) for multiple integrations. + +The Agent creates one SA container for the integration. The SA container name is a combination of the event hub name and a prefix (`azure-eventhub-input-[eventhub]`). + +### Running the integration behind a firewall + +When you run the Elastic Agent behind a firewall, to ensure proper communication with the necessary components, you need to allow traffic on port `5671` and `5672` for the event hub, and port `443` for the Storage Account container. + +```text +┌────────────────────────────────┐ ┌───────────────────┐ ┌───────────────────┐ +│ │ │ │ │ │ +│ ┌────────────┐ ┌───────────┐ │ │ ┌──────────────┐ │ │ ┌───────────────┐ │ +│ │ diagnostic │ │ event hub │ │ │ │azure-eventhub│ │ │ │ activity logs │ │ +│ │ setting │──▶│ │◀┼AMQP─│ <> │─┼──┼▶│<>│ │ +│ └────────────┘ └───────────┘ │ │ └──────────────┘ │ │ └───────────────┘ │ +│ │ │ │ │ │ │ +│ │ │ │ │ │ │ +│ │ │ │ │ │ │ +│ ┌─────────────┬─────HTTPS─┼──────────┘ │ │ │ +│ ┌───────┼─────────────┼──────┐ │ │ │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ ▼ ▼ │ │ └─Agent─────────────┘ └─Elastic Cloud─────┘ +│ │ ┌──────────┐ ┌──────────┐ │ │ +│ │ │ 0 │ │ 1 │ │ │ +│ │ │ <> │ │ <> │ │ │ +│ │ └──────────┘ └──────────┘ │ │ +│ │ │ │ +│ │ │ │ +│ └─Storage Account Container──┘ │ +│ │ +│ │ +└─Azure──────────────────────────┘ +``` + +#### Event Hub + +Port `5671` and `5672` are commonly used for secure communication with the event hub. These ports are used to receive events. 
By allowing traffic on these ports, the Elastic Agent can establish a secure connection with the event hub.
+
+For more information, check the following documents:
+
+* [What ports do I need to open on the firewall?](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-faq#what-ports-do-i-need-to-open-on-the-firewall) from the Event Hubs frequently asked questions.
+* [AMQP outbound port requirements](https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-amqp-protocol-guide#amqp-outbound-port-requirements)
+
+#### Storage Account Container
+
+Port `443` is used for secure communication with the Storage Account container. This port is commonly used for HTTPS traffic. By allowing traffic on port `443`, the Elastic Agent can securely access and interact with the Storage Account container, which is essential for storing and retrieving checkpoint data for each event hub partition.
+
+#### DNS
+
+Optionally, you can restrict the traffic to the following domain names:
+
+```text
+*.servicebus.windows.net
+*.blob.core.windows.net
+*.cloudapp.net
+```
+
+## Settings
+
+Use the following settings to configure the Azure Logs integration when you add it to Fleet.
+
+`eventhub` :
+_string_
+A fully managed, real-time data ingestion service. Elastic recommends using only letters, numbers, and the hyphen (-) character in event hub names to maximize compatibility. You can use existing event hubs with underscores (_) in their names; in this case, the integration replaces underscores with hyphens (-) when it uses the event hub name to create dependent Azure resources behind the scenes (for example, the storage account container that stores event hub consumer offsets). Elastic also recommends using a separate event hub for each log type, since the field mappings of each log type differ.
+Default value: `insights-operational-logs`
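The naming behavior described above (the `azure-eventhub-input-[eventhub]` container prefix and the underscore-to-hyphen replacement) can be sketched as follows. This is an illustrative approximation; the exact derivation is internal to the integration:

```python
def container_name(eventhub: str) -> str:
    """Derive the checkpoint container name from the event hub name.

    Underscores are replaced with hyphens because Azure storage container
    names allow only lowercase letters, numbers, and hyphens.
    """
    return "azure-eventhub-input-" + eventhub.replace("_", "-")

print(container_name("adlogs"))   # azure-eventhub-input-adlogs
print(container_name("ad_logs"))  # azure-eventhub-input-ad-logs
```

This is also why short event hub names are safer: the derived container name must stay within Azure's container-name length limit.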
+
+`consumer_group` :
+_string_
+Enable the publish/subscribe mechanism of Event Hubs with consumer groups. A consumer group is a view (state, position, or offset) of an entire event hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and with their own offsets.
+Default value: `$Default`
+
+`connection_string` :
+_string_
+The connection string required to communicate with Event Hubs. See [Get an Event Hubs connection string](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string) for more information.
+
+A Blob Storage account is required to store/retrieve/update the offset or state of the event hub messages. This allows the integration to start back up at the spot where it stopped processing messages.
+
+`storage_account` :
+_string_
+The name of the storage account where the state/offsets will be stored and updated.
+
+`storage_account_key` :
+_string_
+The storage account key. Used to authorize access to data in your storage account.
+
+`storage_account_container` :
+_string_
+The storage account container where the integration stores the checkpoint data for the consumer group. This is an advanced option; use it with extreme care. You MUST use a dedicated storage account container for each Azure log type (activity, sign-in, audit logs, and others). DO NOT REUSE the same container name for more than one Azure log type. See [Container Names](https://docs.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#container-names) for details on naming rules from Microsoft. The integration generates a default container name if not specified.
+
+`pipeline` :
+_string_
+Optional. Overrides the default ingest pipeline for this integration.
+
+`resource_manager_endpoint` :
+_string_
+Optional. By default, the integration uses the Azure public environment.
To override this and use a different Azure environment, you can provide a specific resource manager endpoint.
+
+Examples:
+
+* Azure ChinaCloud: `https://management.chinacloudapi.cn/`
+* Azure GermanCloud: `https://management.microsoftazure.de/`
+* Azure PublicCloud: `https://management.azure.com/`
+* Azure USGovernmentCloud: `https://management.usgovcloudapi.net/`
+
+This setting can also be used to define your own endpoints, for example for hybrid cloud models.
+
+## Handling Malformed JSON in Azure Logs
+
+Azure services have been observed to occasionally send [malformed JSON](https://learn.microsoft.com/en-us/answers/questions/1001797/invalid-json-logs-produced-for-function-apps) documents. These logs can disrupt the expected JSON formatting and lead to parsing issues during processing.
+
+To address this issue, the advanced settings section of each data stream offers two sanitization options:
+
+* Sanitizes New Lines: removes new lines in logs.
+* Sanitizes Single Quotes: replaces single quotes with double quotes in logs, excluding single quotes occurring within double quotes.
+
+Malformed logs can be identified by:
+
+* The presence of a `records` array in the `message` field, indicating a failure to unmarshal the byte slice.
+* The existence of an `error.message` field containing the text "Received invalid JSON from the Azure Cloud platform. Unable to parse the source log message."
+
+Known data streams that might produce malformed logs:
+
+* Platform Logs
+* Spring Apps Logs
+* PostgreSQL Flexible Servers Logs
diff --git a/packages/azure_logs/fields/base-fields.yml b/packages/azure_logs/fields/base-fields.yml
new file mode 100644
index 000000000000..7c798f4534ca
--- /dev/null
+++ b/packages/azure_logs/fields/base-fields.yml
@@ -0,0 +1,12 @@
+- name: data_stream.type
+  type: constant_keyword
+  description: Data stream type.
+- name: data_stream.dataset
+  type: constant_keyword
+  description: Data stream dataset.
+- name: data_stream.namespace + type: constant_keyword + description: Data stream namespace. +- name: '@timestamp' + type: date + description: Event timestamp. diff --git a/packages/azure_logs/img/icon.svg b/packages/azure_logs/img/icon.svg new file mode 100644 index 000000000000..173fdec5072e --- /dev/null +++ b/packages/azure_logs/img/icon.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/packages/azure_logs/manifest.yml b/packages/azure_logs/manifest.yml new file mode 100644 index 000000000000..dc1ffbcf8fd0 --- /dev/null +++ b/packages/azure_logs/manifest.yml @@ -0,0 +1,149 @@ +format_version: 3.3.0 +name: azure_logs +title: "Custom Azure Logs" +version: 0.1.0 +source: + license: Elastic-2.0 +description: "Collect log events from Azure Event Hubs with Elastic Agent" +type: input +categories: + - azure + - custom + - observability +conditions: + kibana: + version: "^8.13.0" + elastic: + subscription: "basic" +icons: + - src: "/img/icon.svg" + type: "image/svg+xml" +policy_templates: + - name: azure-logs + type: logs + title: Collect Azure logs from Event Hub + description: Collect Azure logs from Event Hub using the azure-eventhub input. + input: azure-eventhub + template_path: input.yml.hbs + vars: + - name: eventhub + type: text + title: Event Hub Name + multi: false + required: true + show_user: true + description: >- + The event hub name that contains the logs to ingest. + Do not use the event hub namespace here. Elastic + recommends using one event hub for each integration. + Visit [Create an event hub](https://docs.elastic.co/integrations/azure#create-an-event-hub) + to learn more. Use event hub names up to 30 characters long + to avoid compatibility issues. 
+      - name: consumer_group
+        type: text
+        title: Consumer Group
+        multi: false
+        required: true
+        show_user: true
+        default: $Default
+      - name: connection_string
+        type: password
+        secret: true
+        title: Connection String
+        multi: false
+        required: true
+        show_user: true
+        description: >-
+          The connection string required to communicate with Event Hubs.
+          See [Get an Event Hubs connection string](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-get-connection-string)
+          to learn more.
+      - name: storage_account
+        type: text
+        title: Storage Account
+        multi: false
+        required: true
+        show_user: true
+        description: >-
+          The name of the storage account where the consumer group's state/offsets
+          will be stored and updated.
+      - name: storage_account_key
+        type: password
+        secret: true
+        title: Storage Account Key
+        multi: false
+        required: true
+        show_user: true
+        description: >-
+          The storage account key. This key will be used to authorize access to
+          data in your storage account.
+      - name: data_stream.dataset
+        type: text
+        title: Dataset name
+        description: >-
+          Dataset to write data to. Changing the dataset will send the data to a different index.
+          You can't use `-` in the name of a dataset; only valid characters for
+          [Elasticsearch index names](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html) are allowed.
+        default: azure_logs.generic
+        required: true
+        show_user: true
+      - name: pipeline
+        type: text
+        title: Ingest Pipeline
+        description: >-
+          The ingest pipeline ID to use for processing the data. If provided,
+          replaces the default pipeline for this integration.
+        required: false
+        show_user: true
+      - name: resource_manager_endpoint
+        type: text
+        title: Resource Manager Endpoint
+        description: >-
+          The Azure Resource Manager endpoint to use for authentication.
+ multi: false + required: false + show_user: false + - name: tags + type: text + title: Tags + multi: true + required: true + show_user: false + default: + - azure-eventhub + - forwarded + - name: processors + type: yaml + title: Processors + multi: false + required: false + show_user: false + description: > + Processors are used to reduce the number of fields in the exported + event or to enhance the event with metadata. This executes in the agent + before the logs are parsed. + See [Processors](https://www.elastic.co/guide/en/beats/filebeat/current/filtering-and-enhancing-data.html) + for details. + - name: sanitize_newlines + type: bool + title: Sanitizes New Lines + description: > + Removes new lines in logs to ensure proper formatting of JSON data and + avoid parsing issues during processing. + multi: false + required: false + show_user: false + default: false + - name: sanitize_singlequotes + required: true + show_user: false + title: Sanitizes Single Quotes + description: > + Replaces single quotes with double quotes (single quotes inside double + quotes are omitted) in logs to ensure proper formatting of JSON data + and avoid parsing issues during processing. + type: bool + multi: false + default: false +owner: + github: elastic/obs-ds-hosted-services + type: elastic