Draft: Allow parsing of IPv6 addresses in ingest pipeline #34387
base: main
Conversation
This adds support for parsing IPv6 addresses in the filebeat Azure activitylogs, auditlogs, and platformlogs ingest pipelines.
💚 CLA has been signed

This pull request does not have a backport label. To fix this, add the backport labels for the needed branches.
❕ Build Aborted
@@ -39,6 +39,7 @@ processors:
   - grok:
       field: azure.activitylogs.callerIpAddress
       patterns:
+        - "%{IPV6:source.ip}"
I remember a previous PR where other people switched from the grok processor to the convert processor with type: ip.
For example, https://github.com/elastic/integrations/pull/3411/files.
This looks like the recommended approach. Can you test if it also works for your use case?
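As an illustration of why a validating "type: ip" conversion covers both address families, here is a minimal Python sketch using the standard ipaddress module (this is only an analogy for the convert processor's behaviour, not Elasticsearch internals):

```python
# Illustrative only: ipaddress.ip_address accepts both IPv4 and
# IPv6 literals and rejects host:port strings, which mirrors what
# a validating "type: ip" conversion does.
import ipaddress

def is_valid_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

print(is_valid_ip("81.2.69.144"))                             # IPv4 -> True
print(is_valid_ip("2603:c022:c000:301:41db:be0e:16ba:b4d7"))  # IPv6 -> True
print(is_valid_ip("81.2.69.144:443"))                         # ip:port -> False
```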
Seems like we might be able to just port those same changes from the elastic package to these pipelines. The behavior will change slightly, since the field can no longer contain hostnames or ip:port / host:port pairs. As far as I'm aware this never happens in Azure logging, but it would be nice to find documentation confirming that rather than assuming it.
Do you mean some documentation from the Azure side about the expected formats of these fields, regarding IP addresses and ports?
Hey @cFire, I am sorry. I see these Azure logs use a variety of formats for the IP, so grok is better suited for this task.
Let me double-check some more log samples to confirm this is the case.
I am looking for some test documents because, according to the Grok processor's reusable patterns, the pattern named IPORHOST should already support IPv6.

Hey @cFire, could you add a test document for each data stream? The module's test directory contains file pairs: a source log file and the expected documents produced by the pipeline. For example, see https://github.com/cFire/beats/tree/34277-azure-ipv6/x-pack/filebeat/module/azure/activitylogs/test.

Having real test documents to check against the expected results is important.
Certainly. I had it in my schedule for sometime this week, but I'll see if I can get it done later today.
@zmoog I have a test case I want to try locally to make sure it works, but I can't seem to find the way to run these tests. None of the make/mage commands I've found seem to run them. Do you have any hints on where the documentation is that explains how to run them?
Hey @cFire, I can't find the docs that explain how to run the ingest pipelines, so I created a gist to capture what I know so far: https://gist.github.com/zmoog/cf50d14416f5c732656fb2f41a1e7acf I'll open a PR to update the Beats docs if this information is missing. Let me know if it works for you, here or on the gist.
Thanks for the info! I took a quick glance at the gist, and it looks like that'll do it. I'll see if I can find some time in the workday tomorrow to give it a shot.
It seems I'm still missing a piece: running the tests as described doesn't work for me. Just to check: I'm using the ubuntu2004 vagrant box. Should I be using a different one maybe?
I am on macOS, but I can try it on a ubuntu2004 virtual machine. In the meantime, can you add the source document to the PR? I will run the tests and generate the expected documents.
Just as a status update so you know it's still in progress: source documents are coming. Since they're samples from a live environment, they have to be anonymized and double-checked, etc.

Great, thanks! And yes, please double-check that we are not leaking any private details.

The test cases have been added. The audit- and signinlogs are anonymized samples from our own infrastructure. For the activitylog sample we re-used the existing test case and replaced the IPv4 address with an IPv6 address.
Hey @cFire, my apologies for the delay! 🙇 I am resuming the review of this PR. I remember checking the grok patterns for the IP addresses, so I am running a test using one of the sample documents you added in the last two commits:
$ pbpaste | jq
{
"callerIpAddress": "2603:c022:c000:301:41db:be0e:16ba:b4d7",
"category": "Action",
"correlationId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"durationMs": 0,
"identity": {
"authorization": {
"action": "Microsoft.EventHub/namespaces/authorizationRules/listKeys/action",
"evidence": {
"principalId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"principalType": "ServicePrincipal",
"role": "Azure EventGrid Service BuiltIn Role",
"roleAssignmentId": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"roleAssignmentScope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53",
"roleDefinitionId": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
},
"scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey"
},
"claims": {
"aio": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"appid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"appidacr": "2",
"aud": "https://management.core.windows.net/",
"exp": "1571904826",
"http://schemas.microsoft.com/identity/claims/identityprovider": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
"http://schemas.microsoft.com/identity/claims/objectidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"http://schemas.microsoft.com/identity/claims/tenantid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"iat": "1571875726",
"iss": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
"nbf": "1571875726",
"uti": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"ver": "1.0"
}
},
"level": "Information",
"location": "global",
"operationName": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
"resourceId": "/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY",
"resultSignature": "Started.",
"resultType": "Start",
"time": "2019-10-24T00:13:46.3554259Z"
}

I'm turning the JSON document into a string:

$ pbpaste | jq 'tojson'
"{\"callerIpAddress\":\"2603:c022:c000:301:41db:be0e:16ba:b4d7\",\"category\":\"Action\",\"correlationId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"durationMs\":0,\"identity\":{\"authorization\":{\"action\":\"Microsoft.EventHub/namespaces/authorizationRules/listKeys/action\",\"evidence\":{\"principalId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"principalType\":\"ServicePrincipal\",\"role\":\"Azure EventGrid Service BuiltIn Role\",\"roleAssignmentId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleAssignmentScope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"roleDefinitionId\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\"},\"scope\":\"/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey\"},\"claims\":{\"aio\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"appidacr\":\"2\",\"aud\":\"https://management.core.windows.net/\",\"exp\":\"1571904826\",\"http://schemas.microsoft.com/identity/claims/identityprovider\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"http://schemas.microsoft.com/identity/claims/objectidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.microsoft.com/identity/claims/tenantid\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"iat\":\"1571875726\",\"iss\":\"https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/\",\"nbf\":\"1571875726\",\"uti\":\"8a4de8b5-095c-47d0-a96f-a75130c61d53\",\"ver\":\"1.0\"}},\"level\":\"Information\",\"location\":\"global\",\"operationName\":\"MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION\",\"resourceId\":\"/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY
\",\"resultSignature\":\"Started.\",\"resultType\":\"Start\",\"time\":\"2019-10-24T00:13:46.3554259Z\"}" Now it's ready for the dev tools. I am now simulating the activity logs pipeline using the Dev Tools in Kibana using the most recent version (1.5.11) of the integration:
This is the output:

{
"docs": [
{
"doc": {
"_index": "_index",
"_id": "_id",
"_version": "-3",
"_source": {
"geo": {
"continent_name": "Europe",
"region_iso_code": "NL-NH",
"city_name": "Amsterdam",
"country_iso_code": "NL",
"country_name": "Netherlands",
"region_name": "North Holland",
"location": {
"lon": 4.9392,
"lat": 52.352
}
},
"cloud": {
"provider": "azure"
},
"@timestamp": "2019-10-24T00:13:46.355Z",
"ecs": {
"version": "8.0.0"
},
"related": {
"ip": [
"2603:c022:c000:301:41db:be0e:16ba:b4d7"
]
},
"log": {
"level": "Information"
},
"client": {
"ip": "2603:c022:c000:301:41db:be0e:16ba:b4d7"
},
"source": {
"geo": {
"continent_name": "Europe",
"region_iso_code": "NL-NH",
"city_name": "Amsterdam",
"country_iso_code": "NL",
"country_name": "Netherlands",
"region_name": "North Holland",
"location": {
"lon": 4.9392,
"lat": 52.352
}
},
"as": {
"number": 31898,
"organization": {
"name": "ORACLE-BMC-31898"
}
},
"ip": "2603:c022:c000:301:41db:be0e:16ba:b4d7"
},
"event": {
"duration": 0,
"kind": "event",
"action": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
"type": [
"change"
]
},
"tags": [],
"azure": {
"subscription_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"resource": {
"provider": "MICROSOFT.EVENTHUB",
"namespace": "AZURELSEVENTS",
"id": "/SUBSCRIPTIONS/8a4de8b5-095c-47d0-a96f-a75130c61d53/RESOURCEGROUPS/SA-HEMA/PROVIDERS/MICROSOFT.EVENTHUB/NAMESPACES/AZURELSEVENTS/AUTHORIZATIONRULES/ROOTMANAGESHAREDACCESSKEY",
"authorization_rule": "ROOTMANAGESHAREDACCESSKEY",
"group": "SA-HEMA"
},
"correlation_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"activitylogs": {
"operation_name": "MICROSOFT.EVENTHUB/NAMESPACES/AUTHORIZATIONRULES/LISTKEYS/ACTION",
"result_type": "Start",
"identity": {
"authorization": {
"evidence": {
"role_definition_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"role": "Azure EventGrid Service BuiltIn Role",
"role_assignment_scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53",
"role_assignment_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"principal_type": "ServicePrincipal",
"principal_id": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
},
"scope": "/subscriptions/8a4de8b5-095c-47d0-a96f-a75130c61d53/resourceGroups/sa-hem/providers/Microsoft.EventHub/namespaces/azurelsevents/authorizationRules/RootManageSharedAccessKey",
"action": "Microsoft.EventHub/namespaces/authorizationRules/listKeys/action"
},
"claims": {
"ver": "1.0",
"aio": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"iss": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
"uti": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"http://schemas_microsoft_com/identity/claims/identityprovider": "https://sts.windows.net/8a4de8b5-095c-47d0-a96f-a75130c61d53/",
"http://schemas_xmlsoap_org/ws/2005/05/identity/claims/nameidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"aud": "https://management.core.windows.net/",
"http://schemas_microsoft_com/identity/claims/tenantid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"nbf": "1571875726",
"appidacr": "2",
"appid": "8a4de8b5-095c-47d0-a96f-a75130c61d53",
"exp": "1571904826",
"iat": "1571875726",
"http://schemas_microsoft_com/identity/claims/objectidentifier": "8a4de8b5-095c-47d0-a96f-a75130c61d53"
}
},
"category": "Action",
"event_category": "Administrative",
"result_signature": "Started."
}
}
},
"_ingest": {
"timestamp": "2023-03-14T21:36:02.158240685Z"
}
}
}
]
}

The IPv6 addresses seem to be handled correctly, with no need for additional care. @cFire, are you getting a different result in your cluster?
@zmoog If you are running the pipeline from the integration package, this was indeed already fixed. The pipeline Filebeat pushes is slightly different and does break. (I'll re-test on a fresh dev setup with the latest versions, probably today or tomorrow, to confirm.)
I've made a very relevant observation just now: it only breaks if the second chunk of the IPv6 address does not contain any a-f characters. "2603:c022:c000:301:41db:be0e:16ba:b4d7" works fine, but "2603:1022:c000:301:41db:be0e:16ba:b4d7" breaks ('c' -> '1' in the second part of the address). In that second case it throws an error. In retrospect it makes some sense that this is the case where grok could confuse an IPv6 address with IPORHOST:PORT_NUMBER.
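The ambiguity can be illustrated with simplified stand-ins for the grok HOSTNAME and INT sub-patterns (these are not the exact library regexes, just a sketch of the matching behaviour):

```python
# Simplified stand-ins for grok's HOSTNAME and INT patterns: a
# hostname-like token, a colon, then a run of digits as the port.
import re

HOST_PORT = re.compile(r"^([0-9A-Za-z][0-9A-Za-z-]*):([0-9]+)")

def try_host_port(value: str):
    m = HOST_PORT.match(value)
    return m.groups() if m else None

# Second chunk is all digits: misparsed as host "2603", port "1022".
print(try_host_port("2603:1022:c000:301:41db:be0e:16ba:b4d7"))

# Second chunk contains a-f characters: no host:port match, so a
# later plain-IP pattern gets a chance to parse the full address.
print(try_host_port("2603:c022:c000:301:41db:be0e:16ba:b4d7"))
```

This reproduces the observation: the value only misparses as host:port when the group after the first colon is purely numeric.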
Side note: I also just noticed I added a test case for signinlogs instead of platformlogs. I'll leave the signinlogs test case, since it's good to have an IPv6 case there too, and I'll still add one for platformlogs.
Some further testing produced two simple test cases; the behaviour seems to depend on the characters after the first ':' in the address.
Hey @cFire, apologies for this error; you are right. I assumed the two pipelines were identical, but as you already noticed, it's not the case. Thank you so much for investing the time to investigate this issue and nail down two simple cases.

By looking at https://github.com/elastic/beats/blob/main/x-pack/filebeat/module/azure/activitylogs/ingest/pipeline.yml#L39-L46 I see the patterns with the ports are tried first, and then the plain IP pattern:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
      - "%{IPORHOST:source.ip}"
    ignore_missing: true
    ignore_failure: true

To avoid the ambiguity, we should probably go from general to specific and try the plain IP address first:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - "%{IPORHOST:source.ip}"
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
    ignore_missing: true
    ignore_failure: true

I copied the standard pipeline and applied this change. Using this pipeline, I can now successfully process the test documents.
This does not work as I expected. Given the following grok processor:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - "%{IPORHOST:source.ip}"
      - \[%{IPORHOST:source.ip}\]:%{INT:source.port:int}
      - "%{IPORHOST:source.ip}:%{INT:source.port:int}"
    ignore_missing: true
    ignore_failure: true

If I send a document with a value like this:

{
"callerIpAddress": "10.0.4.1:1234"
}

grok will find a match with the first pattern in the list, extracting the IP address only and yielding the following outcome:

{
"source": {
"ip": "10.0.4.1"
}
}

This is NOT correct, since the port number is missing. Since the port syntax is protocol-version specific, maybe we can use the following patterns:

- grok:
    field: azure.activitylogs.callerIpAddress
    patterns:
      - \[%{IPV6:source.ip}\]:%{INT:source.port:int}
      - "%{IPV4:source.ip}:%{INT:source.port:int}"
      - "%{IP:source.ip}$"
    ignore_missing: true
    ignore_failure: true
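A Python sketch of this protocol-specific pattern order, with crude regex stand-ins for the grok IPV4/IPV6/IP patterns (illustrative only; the real grok patterns are stricter):

```python
# Sketch of the proposed pattern order: bracketed IPv6 with port,
# then IPv4 with port, then a fully anchored plain IP address.
import re

IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
IPV6 = r"[0-9A-Fa-f:]*:[0-9A-Fa-f:]+"  # crude stand-in for grok IPV6

PATTERNS = [
    re.compile(rf"^\[(?P<ip>{IPV6})\]:(?P<port>\d+)$"),  # [v6]:port
    re.compile(rf"^(?P<ip>{IPV4}):(?P<port>\d+)$"),      # v4:port
    re.compile(rf"^(?P<ip>{IPV4}|{IPV6})$"),             # plain IP, anchored
]

def parse_caller_ip(value: str):
    for pattern in PATTERNS:
        m = pattern.match(value)
        if m:
            return m.groupdict()
    return None

print(parse_caller_ip("10.0.4.1:1234"))       # keeps the port
print(parse_caller_ip("[2603:1022::1]:443"))  # keeps the port
print(parse_caller_ip("2603:1022:c000:301:41db:be0e:16ba:b4d7"))
```

With the plain-IP pattern anchored and tried last, 10.0.4.1:1234 no longer loses its port, and a bare IPv6 address with an all-digit second group is no longer misread as host:port.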
@cFire we also have the option to align the Filebeat behaviour with the Agent, and use the convert processor:

- convert:
    field: azure.activitylogs.callerIpAddress
    target_field: source.ip
    type: ip
    ignore_missing: true
    on_failure:
      - rename:
          field: azure.activitylogs.callerIpAddress
          target_field: source.address
          ignore_missing: true
          ignore_failure: true
      - remove:
          field: azure.activitylogs.callerIpAddress
          if: 'ctx.source?.ip != null'
          ignore_missing: true

In your datasets, do the actual callerIpAddress values ever contain hostnames or ports?
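The convert-with-fallback semantics can be sketched in Python (field names follow the YAML above; this is an analogy for the processor's behaviour, not Elasticsearch code):

```python
# Sketch of the convert-processor fallback: validate the value as
# an IP; on failure, keep the raw value in source.address instead.
import ipaddress

def convert_caller_ip(doc: dict) -> dict:
    value = doc.pop("callerIpAddress", None)
    if value is None:
        return doc  # ignore_missing: true
    try:
        ipaddress.ip_address(value)
        doc.setdefault("source", {})["ip"] = value
    except ValueError:
        # on_failure: rename the raw value to source.address
        doc.setdefault("source", {})["address"] = value
    return doc

print(convert_caller_ip({"callerIpAddress": "2603:1022::1"}))
print(convert_caller_ip({"callerIpAddress": "10.0.4.1:1234"}))
```

Note that an ip:port value ends up whole in source.address, which is exactly the behaviour questioned later in this thread.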
In our datasets I've only seen IPv4 and IPv6 addresses; I've not seen any that have a port or hostname specified. But I have no evidence either way whether this is specific to our logs or universal. I do like the idea of using the convert processor, since we can reasonably assume it works as intended, given that it was quite well tested in the agent package. I'll make some time over the weekend or next week to run that in our test setup.
Thank you @cFire, let me know how it goes. I will check other datasets. I am willing to switch to the convert processor if it covers the cases we know of.
Testing the 'convert' method, the results are quite different.

What doesn't work:

The behaviour does not seem absurd to me (especially for hostnames it seems reasonable), but putting IPv4/IPv6:port values entirely in source.address seems like incorrect, or rather incomplete, behaviour. I would imagine this will break a lot of queries/detection rules for people if we copy the convert processor verbatim from the azure package like this.
In order to un-stall progress on this, I suggest we limit the scope for now to just adding IPv6 parsing to the pipelines. It would be nice to match the integration package's code at some point, but for now I would prefer to prioritize fixing the parsing issue with a reasonable guarantee of no unintended consequences.
@zmoog: Draft PR for #34277. Will add test cases soon.
What does this PR do?
Adds support for parsing IPv6 addresses in the filebeat Azure activitylogs, auditlogs, and platformlogs ingest pipelines.
Why is it important?
Currently, any logs from these Azure log sources which have an IPv6 source address are not ingested into Elasticsearch, because the ingest pipeline throws an error when attempting to process them.
Checklist

- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the default configuration files
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

How to test this PR locally
No easy test yet, test cases still need to be added
Related issues