-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for data stream overrides for specific integrations based on a defined list #1913
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to investigate this more, it is unexpected to need to add a exception for a package.
@@ -122,6 +122,9 @@ var ( | |||
}, | |||
}, | |||
} | |||
dataStreamOverrides = map[string]bool{ | |||
"amazon_security_lake": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't add exceptions to specific packages in elastic-package. We should have generic mechanisms so this is available for any package that supports it.
On this specific case, we have code to support routing rules, as well as custom datasets. If this is not enough for this package we should check why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsoriano the reroute was added for input packages, not integration packages if I recall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples and the test package in #1372 are integration packages.
@@ -937,7 +940,7 @@ func (r *tester) prepareScenario(ctx context.Context, config *testConfig, svcInf | |||
|
|||
// Input packages can set `data_stream.dataset` by convention to customize the dataset. | |||
dataStreamDataset := ds.Inputs[0].Streams[0].DataStream.Dataset | |||
if scenario.pkgManifest.Type == "input" { | |||
if scenario.pkgManifest.Type == "input" || dataStreamOverrides[scenario.pkgManifest.Name] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should do this in all cases and not only for input packages 🤔 Though integration packages should not need to configure datasets.
Or maybe this amazon_security_lake
package should be an input package if it is expected to be used with custom datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this for all integration packages seems unnecessary and might cause unexpected results in tests. The thing is with this specific integration, there's a central data stream that runs the tests, but then reroutes the results to a separate data stream. This reroute mechanism causes the system test to fail, because elastic-package always checks for hits in the data stream the test is running. In future other integration packages might also do this, hence why I created this override list.
This also cannot be an input package cause it has multiple data streams and ingest pipelines have a lot of mapping logic. Generally input packages just consume raw inputs in a single generic data stream and dump them into elastic search.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add support for a new package spec variable, whose presence could trigger this ? Then we can remove the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is with this specific integration, there's a central data stream that runs the tests, but then reroutes the results to a separate data stream.
I see we have some packages with routing rules, but don't have system tests, it would be good to identify what is missing and fix it so all these packages can have system tests.
On the other hand I see that the entityanalytics_entra_id
package has routing rules and system tests. Is it doing some trick to workaround the current limitations? Or is this package doing something we could apply in other cases?
Maybe we can add support for a new package spec variable, whose presence could trigger this ? Then we can remove the list.
Yes, I would prefer this before adding a list with exceptions, but lets try to identify first what is missing, or what other packages are doing to have system tests with routing rules.
I think it would be good to open an issue identifying the problem, and then we could discuss about possible solutions. @ShourieG would this work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsoriano I saw the entityanalytics_entra_id integration, and I don't think the systems tests over there are working properly, if you see the sample_event.json, it only contains agent data and does not contain any fields under the entityanalytics_entra_id namespace as per the field mappings. It also seems that the entityanalytics input has some internal mechanism to publish logs to all datasets which is not supported by the aws input, hence why a sample_event.json is even being generated in the first place even though the contents are incorrect.
💚 Build Succeeded
|
created a new issue as suggested here : #1917 |
Some integrations like amazon security lake use routing rules to route results to a different data stream from a central data stream. Having such a mechanism the standard system tests fail since it always searches for hits in the central data stream. We need a way to dynamically instruct elastic-package to look for hits in a specific data stream during system tests.
A similar functionally was recently added with a recent PR here. This is just a small iteration on top of that change to allow elastic-package to honour dynamically defined data streams for select integration packages.
NOTE: I have tested the functionally locally.