Added support for data stream overrides for specific integrations based on a defined list #1913

Closed
wants to merge 1 commit into from

Conversation

ShourieG

@ShourieG ShourieG commented Jun 18, 2024

Some integrations, such as Amazon Security Lake, use routing rules to route documents from a central data stream to a different data stream. With this mechanism the standard system tests fail, because elastic-package always searches for hits in the central data stream. We need a way to dynamically instruct elastic-package to look for hits in a specific data stream during system tests.
Similar functionality was recently added in a PR here. This is a small iteration on top of that change that allows elastic-package to honour dynamically defined data streams for select integration packages.

NOTE: I have tested the functionality locally.
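
As a rough illustration of the mismatch described above, the following sketch assumes the `<type>-<dataset>-<namespace>` data stream naming scheme. The dataset names and the `ep` namespace are hypothetical and are not taken from the package; `dataStreamName` is an illustrative helper, not an elastic-package function.

```go
package main

import "fmt"

// dataStreamName builds a data stream name following the
// <type>-<dataset>-<namespace> naming scheme (hypothetical helper).
func dataStreamName(streamType, dataset, namespace string) string {
	return fmt.Sprintf("%s-%s-%s", streamType, dataset, namespace)
}

func main() {
	// Hypothetical dataset names and namespace, for illustration only.
	central := dataStreamName("logs", "amazon_security_lake.event", "ep")
	routed := dataStreamName("logs", "amazon_security_lake.findings", "ep")

	// The system test searches the central stream, while the routing
	// rules have already sent the documents to the routed stream.
	fmt.Println("test searches:    ", central)
	fmt.Println("documents land in:", routed)
	fmt.Println("same stream:      ", central == routed)
}
```

Because the two names differ, a search restricted to the central stream finds no hits even though ingestion worked.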

@ShourieG ShourieG requested a review from jsoriano June 18, 2024 11:22
Member

@jsoriano jsoriano left a comment


We need to investigate this more; it is unexpected to need an exception for a specific package.

@@ -122,6 +122,9 @@ var (
 		},
 	},
 }
+	dataStreamOverrides = map[string]bool{
+		"amazon_security_lake": true,
Member

@jsoriano jsoriano Jun 18, 2024


We don't add exceptions for specific packages in elastic-package. We should have generic mechanisms so that this is available to any package that supports it.

In this specific case, we have code to support routing rules, as well as custom datasets. If this is not enough for this package, we should check why.

Member


Support for reroute was added in #1372 and #1391.

Author


@jsoriano reroute support was added for input packages, not integration packages, if I recall correctly.

Member


The examples and the test package in #1372 are integration packages.

@@ -937,7 +940,7 @@ func (r *tester) prepareScenario(ctx context.Context, config *testConfig, svcInf

 	// Input packages can set `data_stream.dataset` by convention to customize the dataset.
 	dataStreamDataset := ds.Inputs[0].Streams[0].DataStream.Dataset
-	if scenario.pkgManifest.Type == "input" {
+	if scenario.pkgManifest.Type == "input" || dataStreamOverrides[scenario.pkgManifest.Name] {
Member


I wonder if we should do this in all cases and not only for input packages 🤔 Though integration packages should not need to configure datasets.

Or maybe this amazon_security_lake package should be an input package if it is expected to be used with custom datasets?

Author

@ShourieG ShourieG Jun 18, 2024


Adding this for all integration packages seems unnecessary and might cause unexpected results in tests. The thing is, with this specific integration there's a central data stream that runs the tests but then reroutes the results to a separate data stream. This reroute mechanism causes the system test to fail, because elastic-package always checks for hits in the data stream the test is running against. In the future, other integration packages might also do this, which is why I created this override list.

This also cannot be an input package, because it has multiple data streams and its ingest pipelines contain a lot of mapping logic. Generally, input packages just consume raw inputs in a single generic data stream and send them to Elasticsearch.
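
To make the proposed behaviour concrete, here is a minimal, self-contained sketch of the dataset selection that the diff above changes. The `manifest` type, the `selectDataset` helper, and the idea of reading `data_stream.dataset` from a plain map of test variables are simplifying assumptions for illustration; they are not the actual elastic-package types or code.

```go
package main

import "fmt"

// manifest is a simplified, hypothetical stand-in for the package manifest.
type manifest struct {
	Name string
	Type string
}

// dataStreamOverrides mirrors the allowlist proposed in this PR.
var dataStreamOverrides = map[string]bool{
	"amazon_security_lake": true,
}

// selectDataset returns the dataset the system test should search.
// vars is a hypothetical representation of the test configuration variables.
func selectDataset(m manifest, defaultDataset string, vars map[string]any) string {
	// Only input packages and allowlisted packages may customize the dataset.
	if m.Type != "input" && !dataStreamOverrides[m.Name] {
		return defaultDataset
	}
	if v, ok := vars["data_stream.dataset"].(string); ok && v != "" {
		return v
	}
	return defaultDataset
}

func main() {
	// Hypothetical dataset names, for illustration only.
	vars := map[string]any{"data_stream.dataset": "amazon_security_lake.findings"}

	listed := manifest{Name: "amazon_security_lake", Type: "integration"}
	other := manifest{Name: "some_other_package", Type: "integration"}

	fmt.Println(selectDataset(listed, "amazon_security_lake.event", vars))
	fmt.Println(selectDataset(other, "some_other_package.log", vars))
}
```

With the allowlist, only the listed package has its configured dataset honoured; every other integration package keeps searching its default data stream.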

Author

@ShourieG ShourieG Jun 18, 2024


Maybe we can add support for a new package spec variable whose presence could trigger this? Then we can remove the list.
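
One way this idea could look, as a sketch only: the opt-in moves from a hardcoded list in elastic-package to a per-package setting. The `CustomTestDataset` field below is purely hypothetical and does not exist in the package spec; it stands in for whatever variable the spec might eventually define.

```go
package main

import "fmt"

// packageManifest is a simplified, hypothetical view of a package manifest.
type packageManifest struct {
	Name string
	Type string
	// CustomTestDataset is a hypothetical opt-in flag, not part of the
	// package spec. A package would set it to request custom dataset
	// handling during system tests.
	CustomTestDataset bool
}

// honoursCustomDataset reports whether the tester should look up a custom
// dataset for this package, without consulting any package-name allowlist.
func honoursCustomDataset(m packageManifest) bool {
	return m.Type == "input" || m.CustomTestDataset
}

func main() {
	optedIn := packageManifest{Name: "amazon_security_lake", Type: "integration", CustomTestDataset: true}
	plain := packageManifest{Name: "some_other_package", Type: "integration"}

	fmt.Println(optedIn.Name, honoursCustomDataset(optedIn))
	fmt.Println(plain.Name, honoursCustomDataset(plain))
}
```

The design difference is that any package can opt in through its own manifest, so elastic-package carries no package-specific exceptions.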

Member


> The thing is with this specific integration, there's a central data stream that runs the tests, but then reroutes the results to a separate data stream.

I see we have some packages with routing rules that don't have system tests. It would be good to identify what is missing and fix it so that all these packages can have system tests.

On the other hand, I see that the entityanalytics_entra_id package has routing rules and system tests. Is it doing some trick to work around the current limitations? Or is this package doing something we could apply in other cases?

> Maybe we can add support for a new package spec variable, whose presence could trigger this? Then we can remove the list.

Yes, I would prefer this over adding a list of exceptions, but let's first try to identify what is missing, or what other packages are doing to have system tests with routing rules.

I think it would be good to open an issue identifying the problem, and then we could discuss possible solutions. @ShourieG would this work for you?

Author


@jsoriano I looked at the entityanalytics_entra_id integration, and I don't think the system tests there are working properly. If you look at the sample_event.json, it only contains agent data and does not contain any fields under the entityanalytics_entra_id namespace defined in the field mappings. It also seems that the entityanalytics input has some internal mechanism to publish logs to all datasets, which the aws input does not support. That is why a sample_event.json is generated in the first place, even though its contents are incorrect.

@elasticmachine
Collaborator

💚 Build Succeeded

@ShourieG
Author

Created a new issue as suggested here: #1917

@ShourieG ShourieG closed this Aug 10, 2024