
Draft: Allow parsing of IPv6 addresses in ingest pipeline #34387

Open
wants to merge 6 commits into
base: main
from


@cFire commented Jan 25, 2023

@zmoog: Draft PR for #34277. Will add test cases soon.

  • Enhancement

What does this PR do?

Adds support for parsing IPv6 addresses in the filebeat Azure activitylogs, auditlogs, and platformlogs ingest pipelines.

Why is it important?

Currently, logs from these Azure log sources that have an IPv6 address as the source are not ingested into Elasticsearch, because the ingest pipeline throws an error when processing them.

Checklist

  • My code follows the style guidelines of this project
    - [ ] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

No easy test yet; test cases still need to be added.

Related issues

This adds support for parsing IPv6 addresses in the filebeat
Azure activitylogs, auditlogs, and platformlogs ingest pipelines.
@botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) on Jan 25, 2023
@cla-checker-service bot commented Jan 25, 2023

💚 CLA has been signed

@mergify bot assigned cFire on Jan 25, 2023
@mergify bot (Contributor) commented Jan 25, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @cFire? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine (Collaborator) commented Jan 25, 2023

❕ Build Aborted

The PR is not allowed to run in the CI yet



Build stats

  • Start Time: 2024-03-26T11:59:59.706+0000

  • Duration: 5 min 27 sec

Steps errors 2


Load a resource file from a library
  • Took 0 min 0 sec
  • Description: approval-list/elastic/beats.yml
Error signal
  • Took 0 min 0 sec
  • Description: githubApiCall: The REST API call https://api.github.com/orgs/elastic/members/cFire return the message : java.lang.Exception: httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/cFire : httpRequest: Failure connecting to the service https://api.github.com/orgs/elastic/members/cFire : Code: 404Error: {"message":"User does not exist or is not a member of the organization","documentation_url":"https://docs.github.com/rest/orgs/members#check-organization-membership-for-a-user"}

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@zmoog added the bug and Team:Cloud-Monitoring (Label for the Cloud Monitoring team) labels on Jan 26, 2023
@botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) on Jan 26, 2023
@@ -39,6 +39,7 @@ processors:
- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - "%{IPV6:source.ip}"
Contributor

I remember a previous PR where other people switched from the grok processor to the convert processor with type: ip.

For example, https://github.com/elastic/integrations/pull/3411/files.

This looks like the recommended approach. Can you test if it also works for your use case?

Author

It seems we might be able to port those same changes from the Elastic package to these pipelines. The behavior will change slightly, since the value can no longer be a hostname or an ip:port or host:port pair. As far as I'm aware this never happens in Azure logging, but it would be nice to find some documentation confirming that rather than assuming it.
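A rough way to see the behavior change described above: a strict "convert to ip" step accepts a value only if the entire value is an IP address. The sketch below uses Python's standard ipaddress module as a stand-in for that check (an assumption for illustration; it is not the Elasticsearch convert processor, and edge cases may differ). The function name to_ip is hypothetical.

```python
import ipaddress


def to_ip(value):
    """Return the normalized IP string, or None if value is not a bare IP.

    Stand-in for a strict "convert to ip" step: hostnames and ip:port /
    host:port pairs are rejected as a whole instead of being split apart.
    """
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None


samples = [
    "2603:1022:c000:301:41db:be0e:16ba:b4d7",  # bare IPv6: accepted
    "10.0.4.1",                                # bare IPv4: accepted
    "10.0.4.1:1234",                           # ip:port: rejected
    "[fd5d:12c9::f]:80",                       # bracketed IPv6 + port: rejected
    "example.com",                             # hostname: rejected
]
for s in samples:
    print(f"{s!r:45} -> {to_ip(s)}")
```

Under this model, any Azure value that carries a port or a hostname would need a fallback path rather than being parsed.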

Contributor

Do you mean some documentation from the Azure side about the expected formats for these with IP addresses and ports?

@zmoog (Contributor), Jan 27, 2023

Hey @cFire, I am sorry. I see these Azure logs are using a variety of formats for the IP, so grok is better suited for this task.

Let me double-check some more log samples to see if this is the case.

Contributor

I am looking for some test documents because, according to the Grok processor's reusable patterns, the pattern named IPORHOST should already support IPv6.

@zmoog (Contributor) commented Jan 31, 2023

Hey @cFire, could you add a test document for each data stream in the azure/DATA_STREAM/test directory?

This directory contains file pairs (.log and .log-expected.json).

For example, at https://github.com/cFire/beats/tree/34277-azure-ipv6/x-pack/filebeat/module/azure/activitylogs/test you can see that:

  • activitylogs.log contains the original log events
  • activitylogs.log-expected.json contains the event processed by the ingest pipeline

Having real test documents to test against the expected results is important.

@cFire (Author) commented Feb 1, 2023

Certainly. I had it in my schedule for some time this week, but I'll see if I can get it done later today.

@cFire (Author) commented Feb 1, 2023

@zmoog I have a test case I want to try locally to make sure it works, but I can't seem to find the way to run these tests. None of the make/mage commands I've found seem to run these tests. Perhaps you have any hints on where the documentation is that would explain how to run them?

@zmoog (Contributor) commented Feb 2, 2023

Perhaps you have any hints on where the documentation is that would explain how to run them?

Hey @cFire, I can't find the docs that explain how to run the ingest pipelines. So I created a gist to capture what I know so far:

https://gist.github.com/zmoog/cf50d14416f5c732656fb2f41a1e7acf

I'll open a PR to update the Beats docs if this information is unavailable. Let me know if it works for you, here or on the gist.

@cFire (Author) commented Feb 2, 2023

Thanks for the info! I took a quick glance at the gist, and it looks like that'll do it. I'll see if I can find some time in the workday tomorrow to give it a shot.

@cFire (Author) commented Feb 3, 2023

It seems I'm still missing a piece. Running with PYTEST_ADDOPTS="-k test_modules", it errors out because pytest doesn't recognize the --timeout=90 argument. Perhaps a different (older?) pytest version is needed?

Without the PYTEST_ADDOPTS var set it runs into ERROR: Could not install packages due to an EnvironmentError: [Errno 39] Directory not empty: '_vendor' but I haven't yet looked into why that's happening.

Just to check: I'm using the ubuntu2004 vagrant box. Should I be using a different one, maybe?

@zmoog (Contributor) commented Feb 3, 2023

Just to check: I'm using the ubuntu2004 vagrant box. Should I be using a different one, maybe?

I am on macOS, but I can try it on a ubuntu2004 virtual machine.

In the meantime, can you add the source document to the PR? I will run the tests and generate the expected documents.

@cFire (Author) commented Feb 8, 2023

Just as a status update so you know it's still in progress: Source documents are coming. Since they're samples from a live environment they have to be anonymized and double-checked etc. etc.

@zmoog (Contributor) commented Feb 9, 2023

Just as a status update so you know it's still in progress: Source documents are coming. Since they're samples from a live environment they have to be anonymized and double-checked etc. etc.

Great, thanks! And yes, please double-check that we are not leaking any private details.

@cFire (Author) commented Feb 13, 2023

The test cases have been added. The audit- and signinlogs are anonymized samples from our own infrastructure. For the activitylog sample we re-used the existing test case and replaced the IPv4 address with an IPv6 address.

@zmoog (Contributor) commented Mar 14, 2023

Hey @cFire, my apologies for the delay! 🙇

I am resuming the review of this PR. I remember checking the grok patterns for IP addresses, like IPORHOST, and noticing that IPv6 should already be supported by the grok processor in Elasticsearch.

So I am running a test using one of the sample documents you added in the last two commits with:

  • Integration version: v1.5.11
  • Elasticsearch: 8.5.3
$ pbpaste | jq
{
  "callerIpAddress": "2603:c022:c000:301:41db:be0e:16ba:b4d7",
  "category": "Action",
  "correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
  "durationMs": 0,
  "identity": {
    "authorization": {
      "action": "Microsoft.EventHub/namespaces/authorizationRules/listKeys/action",
      "evidence": {
        "principalId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
        "principalType": "ServicePrincipal",
        "role": "Azure EventGrid Service BuiltIn Role",
        "roleAssignmentId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
        "roleAssignmentScope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53",
        "roleDefinitionId": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
      },
      "scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey"
    },
    "claims": {
      "aio": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "appid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "appidacr": "2",
      "aud": "https://management.core.windows.net/",
      "exp": "1571904826",
      "http://schemas.microsoft.com/identity/claims/identityprovider": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
      "http://schemas.microsoft.com/identity/claims/objectidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "http://schemas.microsoft.com/identity/claims/tenantid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "iat": "1571875726",
      "iss": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
      "nbf": "1571875726",
      "uti": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
      "ver": "1.0"
    }
  },
  "level": "Information",
  "location": "global",
  "operationName": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
  "resourceId": "/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY",
  "resultSignature": "Started.",
  "resultType": "Start",
  "time": "2019-10-24T00:13:46.3554259Z"
}

I'm turning the JSON document into a string:

$ pbpaste | jq 'tojson'
"{\"callerIpAddress\":\"2603:c022:c000:301:41db:be0e:16ba:b4d7\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}"

Now it's ready for the dev tools.
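The jq 'tojson' step can also be done in Python when building the _simulate request body. A minimal sketch, assuming the event is already loaded as a dict (the abbreviated event below is illustrative, not the full sample): json.dumps with compact separators produces a single-line string comparable to jq's tojson output, which then becomes the message field. Keeping it on one line matters, because a raw line break inside a JSON string makes the request body invalid.

```python
import json

# Abbreviated version of the sample activity log event (illustrative only).
event = {
    "callerIpAddress": "2603:c022:c000:301:41db:be0e:16ba:b4d7",
    "category": "Action",
    "resultType": "Start",
    "time": "2019-10-24T00:13:46.3554259Z",
}

# Compact, single-line JSON, comparable to `jq 'tojson'`.
message = json.dumps(event, separators=(",", ":"))

# Body for POST _ingest/pipeline/<pipeline>/_simulate; json.dumps below
# takes care of escaping the embedded quotes in `message`.
simulate_body = {
    "docs": [
        {
            "_source": {
                "tags": [],
                "@timestamp": "2022-10-04T13:05:22.643+1300",
                "message": message,
            }
        }
    ]
}

print(json.dumps(simulate_body, indent=2))
```

The printed body can be pasted into Dev Tools after the POST line.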

I am now simulating the activity logs pipeline from the most recent version (1.5.11) of the integration, using Dev Tools in Kibana:

POST _ingest/pipeline/logs-azure.activitylogs-1.5.11/_simulate
{
  "docs": [
    {
      "_source": {
        "tags": [],
        "@timestamp": "2022-10-04T13:05:22.643+1300",
        "message": "{\"callerIpAddress\":\"2603:c022:c000:301:41db:be0e:16ba:b4d7\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}"
      }
    }
  ]
}

This is the output:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "geo": {
            "continent_name": "Europe",
            "region_iso_code": "NL-NH",
            "city_name": "Amsterdam",
            "country_iso_code": "NL",
            "country_name": "Netherlands",
            "region_name": "North Holland",
            "location": {
              "lon": 4.9392,
              "lat": 52.352
            }
          },
          "cloud": {
            "provider": "azure"
          },
          "@timestamp": "2019-10-24T00:13:46.355Z",
          "ecs": {
            "version": "8.0.0"
          },
          "related": {
            "ip": [
              "2603:c022:c000:301:41db:be0e:16ba:b4d7"
            ]
          },
          "log": {
            "level": "Information"
          },
          "client": {
            "ip": "2603:c022:c000:301:41db:be0e:16ba:b4d7"
          },
          "source": {
            "geo": {
              "continent_name": "Europe",
              "region_iso_code": "NL-NH",
              "city_name": "Amsterdam",
              "country_iso_code": "NL",
              "country_name": "Netherlands",
              "region_name": "North Holland",
              "location": {
                "lon": 4.9392,
                "lat": 52.352
              }
            },
            "as": {
              "number": 31898,
              "organization": {
                "name": "ORACLE-BMC-31898"
              }
            },
            "ip": "2603:c022:c000:301:41db:be0e:16ba:b4d7"
          },
          "event": {
            "duration": 0,
            "kind": "event",
            "action": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
            "type": [
              "change"
            ]
          },
          "tags": [],
          "azure": {
            "subscription_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
            "resource": {
              "provider": "MICROSOFT.EVENTHUB",
              "namespace": "AZURELSEVENTS",
              "id": "/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY",
              "authorization_rule": "ROOTMANAGESHAREDACCESSKEY",
              "group": "SA-HEMA"
            },
            "correlation_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
            "activitylogs": {
              "operation_name": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
              "result_type": "Start",
              "identity": {
                "authorization": {
                  "evidence": {
                    "role_definition_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                    "role": "Azure EventGrid Service BuiltIn Role",
                    "role_assignment_scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53",
                    "role_assignment_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                    "principal_type": "ServicePrincipal",
                    "principal_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
                  },
                  "scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey",
                  "action": "Microsoft.EventHub/namespaces/authorizationRules/listKeys/action"
                },
                "claims": {
                  "ver": "1.0",
                  "aio": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                  "iss": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
                  "uti": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                  "http://schemas_microsoft_com/identity/claims/identityprovider": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
                  "http://schemas_xmlsoap_org/ws/2005/05/identity/claims/nameidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                  "aud": "https://management.core.windows.net/",
                  "http://schemas_microsoft_com/identity/claims/tenantid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                  "nbf": "1571875726",
                  "appidacr": "2",
                  "appid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
                  "exp": "1571904826",
                  "iat": "1571875726",
                  "http://schemas_microsoft_com/identity/claims/objectidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
                }
              },
              "category": "Action",
              "event_category": "Administrative",
              "result_signature": "Started."
            }
          }
        },
        "_ingest": {
          "timestamp": "2023-03-14T21:36:02.158240685Z"
        }
      }
    }
  ]
}

The IPv6 address seems to be handled correctly, with no need for additional care.

@cFire, are you getting a different result in your cluster?

@cFire (Author) commented Mar 16, 2023

@zmoog If you are running the pipeline from the integration package, this was indeed already fixed. The pipeline filebeat pushes is slightly different and does break. (I'll re-test on a fresh dev setup with the latest versions probably today or tomorrow to confirm.)

@cFire (Author) commented Mar 16, 2023

I've just made a very relevant observation: it only breaks if the second group of the IPv6 address does not contain any a-f characters. "2603:c022:c000:301:41db:be0e:16ba:b4d7" works fine, but "2603:1022:c000:301:41db:be0e:16ba:b4d7" breaks ('c' -> '1' in the second group of the IPv6 address).

In that second case it throws the error "'2603' is not an IP string literal.".

In retrospect, it makes sense that this is the case where grok could confuse an IPv6 address with IPORHOST:PORT_NUMBER.

@cFire (Author) commented Mar 16, 2023

Side note: I also just noticed I added a test case for signinlogs instead of platformlogs. I'll keep the signinlogs test case, since it's good to have an IPv6 test case there too, and I'll still add one for platformlogs.

@cFire (Author) commented Mar 16, 2023

Some further testing produced two simple test cases:
fd5d:a2c9::f (works)
fd5d:12c9::f (breaks)

It seems that if the first character after the first : is a digit (0-9), grok confuses the IPv6 address with a value that has a :port suffix.
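To illustrate the ambiguity, here is a deliberately simplified model of the host:port pattern: the host part is a single hostname label and the port is one or more digits. These regexes are hypothetical simplifications, not the real grok HOSTNAME and INT definitions, and the match is assumed to be anchored at the start of the value but not at the end. Under those assumptions the model reproduces the works/breaks split reported above, and the extracted host "2603" matches the "'2603' is not an IP string literal." error.

```python
import re

# Simplified stand-ins for %{HOSTNAME}:%{INT} (hypothetical, not the real grok definitions).
HOST_PORT = re.compile(r"(?P<host>[0-9A-Za-z][0-9A-Za-z-]*):(?P<port>[0-9]+)")


def looks_like_host_port(value):
    """Return (host, port) if the start of value matches host:port, else None.

    re.match anchors only at the start of the string, so any text after the
    port (here, the rest of the IPv6 address) is silently ignored.
    """
    m = HOST_PORT.match(value)
    return (m.group("host"), int(m.group("port"))) if m else None


for value in [
    "fd5d:a2c9::f",                              # second group starts with a letter
    "fd5d:12c9::f",                              # second group starts with a digit
    "2603:c022:c000:301:41db:be0e:16ba:b4d7",
    "2603:1022:c000:301:41db:be0e:16ba:b4d7",
]:
    print(value, "->", looks_like_host_port(value))
```

Only the two values whose second group starts with a digit are mistaken for host:port here.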

@zmoog (Contributor) commented Mar 19, 2023

If you are running the pipeline from the integration package, this was indeed already fixed. The pipeline filebeat pushes is slightly different and does break. (I'll re-test on a fresh dev setup with the latest versions probably today or tomorrow to confirm.)

Hey @cFire, apologies for this error; you are right. I assumed the two pipelines were identical, but as you already noticed, it's not the case.

Thank you so much for investing the time to investigate this issue and nail down two simple cases.

It seems the IPORHOST pattern is capable of parsing standard IPv6 addresses, and the issue with the callerIpAddress field is rooted in the order of the patterns in the grok processor.

By looking at https://github.com/elastic/beats/blob/main/x-pack/filebeat/module/azure/activitylogs/ingest/pipeline.yml#L39-L46 I see the patterns with the ports are tried first, and then the plain %{IPORHOST:source.ip}:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
      - "%{IPORHOST:source.ip}"
    ignore_missing: true
    ignore_failure: true

To avoid the ambiguity, we should probably go from general to specific and try the plain IP address first (%{IPORHOST:source.ip}), and then the variants with the port number like special cases:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - "%{IPORHOST:source.ip}"
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
    ignore_missing: true
    ignore_failure: true

I copied the standard pipeline filebeat-8.2.0-azure-activitylogs-pipeline as filebeat-8.2.0-azure-activitylogs-pipeline-ipv6-testing, where I only changed the order of the patterns, moving %{IPORHOST:source.ip} to the top.


Using this pipeline, I can now successfully process the following documents:

POST _ingest/pipeline/filebeat-8.2.0-azure-activitylogs-pipeline-ipv6-testing/_simulate
{
  "docs": [
    {
      "_source": {
        "tags": [],
        "@timestamp": "2022-10-04T13:05:22.643+1300",
        "message": "{\"callerIpAddress\":\"[fd5d:12c9::f]:80\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}"
      }
    }
  ]
}

POST _ingest/pipeline/filebeat-8.2.0-azure-activitylogs-pipeline-ipv6-testing/_simulate
{
  "docs": [
    {
      "_source": {
        "tags": [],
        "@timestamp": "2022-10-04T13:05:22.643+1300",
        "message": "{\"callerIpAddress\":\"fd5d:12c9::f\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}"
      }
    }
  ]
}

POST _ingest/pipeline/filebeat-8.2.0-azure-activitylogs-pipeline-ipv6-testing/_simulate
{
  "docs": [
    {
      "_source": {
        "tags": [],
        "@timestamp": "2022-10-04T13:05:22.643+1300",
        "message": "{\"callerIpAddress\":\"fd5d:a2c9::f\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}"
      }
    }
  ]
}

@zmoog
Contributor

zmoog commented Mar 20, 2023

This does not work as I expected.

Given the following grok processor:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - "%{IPORHOST:source.ip}"
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
    ignore_missing: true
    ignore_failure: true

If I send a document with a value like this:

{
  "callerIpAddress": "10.0.4.1:1234"
}

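grok tries the patterns in order and stops at the first one that matches. Because grok patterns are not anchored, the first pattern already matches the leading 10.0.4.1, so only the IP address is extracted, yielding the following outcome: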

{
  "source": {
    "ip": "10.0.4.1"
  }
}

This is NOT correct, since the port number is missing.
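To make the first-match behaviour concrete, here is a minimal Python sketch that mimics the processor above with `re.search` (which, like grok, is unanchored). The `IPV4`, `HOSTNAME`, `IPORHOST`, and `INT` regexes are simplified stand-ins I made up for illustration, not the real grok definitions shipped with Elasticsearch.

```python
import re

# Simplified, hypothetical stand-ins for the grok building blocks.
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
HOSTNAME = r"[0-9A-Za-z][0-9A-Za-z.-]*"
IPORHOST = rf"(?:{IPV4}|{HOSTNAME})"
INT = r"\d+"

# Same order as the processor above: the bare-host pattern comes first.
PATTERNS = [
    rf"(?P<ip>{IPORHOST})",
    rf"\[(?P<ip>{IPORHOST})\]:(?P<port>{INT})",
    rf"(?P<ip>{IPORHOST}):(?P<port>{INT})",
]


def first_match(value):
    """Return the captures of the first pattern that matches anywhere in value."""
    for pattern in PATTERNS:
        m = re.search(pattern, value)  # unanchored, like grok
        if m:
            return {k: v for k, v in m.groupdict().items() if v is not None}
    return None


print(first_match("10.0.4.1:1234"))  # {'ip': '10.0.4.1'}
```

The third pattern would have captured the port, but it is never reached because the first, more general pattern succeeds on a prefix of the value.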

Since the port syntax is specific to the IP version (an IPv6 address must be wrapped in brackets before the port), maybe we can use the following patterns, with the more specific ones first:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - \[%{IPV6:source.ip}\]:%{INT:source.port:int}
      - "%{IPV4:source.ip}:%{INT:source.port:int}"
      - "%{IP:source.ip}$"
    ignore_missing: true
    ignore_failure: true

The source.ip field is mapped as an ip type field, so it cannot hold a hostname anyway; this means we can use the IP, IPV4, and IPV6 patterns instead of the IPORHOST pattern.
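As a rough sketch of what the proposed ordering does, the Python function below tries the same three shapes in the same order. It is a simplification, not the grok processor itself: it uses full-string matching (grok only anchors the third pattern at the end) and validates addresses with Python's standard `ipaddress` module instead of grok's `IPV4`/`IPV6` regexes. The names `parse_caller_ip` and `_is_ip` are hypothetical.

```python
import ipaddress
import re


def _is_ip(candidate, version):
    """Return True if candidate parses as an IP address of the given version."""
    try:
        return ipaddress.ip_address(candidate).version == version
    except ValueError:
        return False


def parse_caller_ip(value):
    """Try the three proposed shapes in order; return ECS-style fields or None."""
    # 1. "[IPv6]:port", e.g. "[fd5d:a2c9::f]:443"
    m = re.fullmatch(r"\[([0-9A-Fa-f:.]+)\]:(\d+)", value)
    if m and _is_ip(m.group(1), 6):
        return {"source.ip": m.group(1), "source.port": int(m.group(2))}
    # 2. "IPv4:port", e.g. "10.0.4.1:1234"
    m = re.fullmatch(r"([0-9.]+):(\d+)", value)
    if m and _is_ip(m.group(1), 4):
        return {"source.ip": m.group(1), "source.port": int(m.group(2))}
    # 3. A bare IPv4 or IPv6 address
    if _is_ip(value, 4) or _is_ip(value, 6):
        return {"source.ip": value}
    # Anything else (e.g. a hostname) matches no pattern; with
    # ignore_failure: true the document would be kept without source.ip.
    return None


print(parse_caller_ip("10.0.4.1:1234"))  # {'source.ip': '10.0.4.1', 'source.port': 1234}
print(parse_caller_ip("fd5d:a2c9::f"))   # {'source.ip': 'fd5d:a2c9::f'}
```

The brackets are what make the IPv6 case unambiguous: "2001:db8::1:80" is itself a valid IPv6 address, so without brackets a trailing ":80" cannot be told apart from the last group of the address.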

@zmoog
Contributor

zmoog commented Mar 22, 2023

@cFire we also have the option to align the Filebeat behaviour with the Elastic Agent integration and use the convert processor:

- convert:
    field: azure.activitylogs.callerIpAddress
    target_field: source.ip
    type: ip
    ignore_missing: true
    on_failure:
      - rename:
          field: azure.activitylogs.callerIpAddress
          target_field: source.address
          ignore_missing: true
          ignore_failure: true
- remove:
    field: azure.activitylogs.callerIpAddress
    if: 'ctx.source?.ip != null'
    ignore_missing: true

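The convert-based pipeline above can be approximated in Python to see which values end up in source.ip and which in source.address. This sketch assumes that convert with `type: ip` succeeds only when the entire value is a valid IPv4 or IPv6 address and keeps the original string; it also uses a flat dict with dotted keys instead of nested objects for simplicity. The function name `convert_caller_ip` is hypothetical.

```python
import ipaddress

FIELD = "azure.activitylogs.callerIpAddress"


def convert_caller_ip(doc):
    """Mimic the convert + on_failure rename + remove steps on a flat dict."""
    value = doc.get(FIELD)
    if value is None:
        return doc  # ignore_missing: true
    try:
        ipaddress.ip_address(value)  # assumed equivalent of convert type: ip
        doc["source.ip"] = value
    except ValueError:
        # on_failure: move the raw value to source.address
        doc["source.address"] = doc.pop(FIELD)
    # remove: drop the original field once source.ip is set
    if doc.get("source.ip") is not None:
        doc.pop(FIELD, None)
    return doc


print(convert_caller_ip({FIELD: "fd5d:a2c9::f"}))   # {'source.ip': 'fd5d:a2c9::f'}
print(convert_caller_ip({FIELD: "10.0.4.1:1234"}))  # {'source.address': '10.0.4.1:1234'}
```

Under this assumption, any value carrying a port or a hostname is moved to source.address unsplit, which matches the results reported for the convert method in this thread.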
In your datasets, do the actual azure.activitylogs.callerIpAddress field values contain only an IP address, or IP+port?

@cFire
Author

cFire commented Mar 22, 2023

In our datasets I've only seen IPv4 and IPv6 addresses; I haven't seen any with a port or hostname specified. However, I have no evidence either way as to whether this is specific to our logs or universal.

I do like the idea of using the convert processor: since it was quite well tested in the Agent package, we can reasonably assume it works as intended. I'll make some time over the weekend or next week to run it in our test setup.

@zmoog
Contributor

zmoog commented Mar 23, 2023

Thank you @cFire, let me know how it goes.

I will check other datasets. I am willing to switch to the convert processor and align Beats and Elastic Agent integrations.

@cFire
Author

cFire commented Mar 27, 2023

Testing the convert method, I got quite different results.

What works:

  • plain IPv6 address (including case that broke previously)
  • plain IPv4 address

What doesn't work:

  • [IPv6]:port notation: the whole value is copied to source.address instead of being split into the IPv6 address and port.
  • IPv4:port notation: the whole value is copied to source.address instead of being split into the IPv4 address and port.
  • hostname: this was previously added to source.ip, but is now in source.address.
  • hostname:port: the whole value is copied to source.address instead of being split into host and port.

The behaviour does not seem absurd to me (for hostnames it seems reasonable), but putting IPv4:port and [IPv6]:port values in source.address as a whole seems incorrect, or rather incomplete. I imagine that copying the convert processor verbatim from the Azure package like this would break a lot of queries and detection rules for people.

Contributor

mergify bot commented Feb 5, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @cFire? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch, where \d is the digit

@cFire
Author

cFire commented Feb 24, 2024

To get progress on this unstuck, I suggest that, for now, we limit the scope of the work to just adding IPv6 parsing to the pipelines. It would be nice to share the same code with the integration package at some point, but for now I would prefer to prioritize fixing the parsing issue with a reasonable guarantee of no unintended consequences.

@cFire cFire marked this pull request as ready for review February 24, 2024 15:36
@cFire cFire requested a review from a team as a code owner February 24, 2024 15:36
Contributor

mergify bot commented Dec 26, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Dec 26, 2024
Labels
backport-8.x: Automated backport to the 8.x branch with mergify
bug
Team:Cloud-Monitoring: Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow filebeat Azure activitylogs, platformlogs and auditlogs to work with IPv6 addresses
3 participants