x-pack/filebeat/input/entityanalytics/provider/activedirectory: don't publish unmodified updates #41179

Merged
merged 9 commits into from
Oct 22, 2024

Conversation

efd6
Contributor

efd6 commented Oct 9, 2024

Proposed commit message

See title.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

efd6 added the Filebeat, bugfix and Team:Security-Service Integrations labels Oct 9, 2024
efd6 self-assigned this Oct 9, 2024
botelastic bot added and removed the needs_team label Oct 9, 2024
Contributor

mergify bot commented Oct 9, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @efd6? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Oct 9, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

mergify bot added the backport-8.x label Oct 9, 2024
efd6 added the backport-8.15 label Oct 9, 2024
efd6 marked this pull request as ready for review October 9, 2024 05:18
efd6 requested a review from a team as a code owner October 9, 2024 05:18
@elasticmachine
Collaborator

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

Contributor

mergify bot commented Oct 11, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b non-mod-entities upstream/non-mod-entities
git merge upstream/main
git push upstream non-mod-entities

Contributor

chrisberkhout left a comment

Is this to avoid duplicates or to avoid redundant work?

Does this mean that a computer added after the last full sync won't be ingested until the next full sync?

@efd6
Contributor Author

efd6 commented Oct 14, 2024

Is this to avoid duplicates or to avoid redundant work?

To avoid redundant work.

Does this mean that a computer added after the last full sync won't be ingested until the next full sync?

No, it should be added with an update. I will reassess.

@efd6
Contributor Author

efd6 commented Oct 14, 2024

/test

Contributor

chrisberkhout left a comment

As discussed elsewhere, the way forward here may be to:

  • do all the additions from incremental changes
  • do the deletes only on full sync

Redundant additions could be skipped during full sync but I think it's better to do them anyway, as a way to periodically recover from missed updates.
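A minimal Go sketch of that split, under simplifying assumptions: `entity`, `store`, `applyIncremental` and `applyFullSync` are hypothetical names, not the provider's code, and the state store is reduced to an in-memory map of IDs. Incremental results only add or update entries; only a full sync, which sees the complete set, marks anything as deleted, and it republishes everything it sees.

```go
package main

import (
	"fmt"
	"sort"
)

// entity is a hypothetical, simplified stand-in for a stored directory object.
type entity struct {
	ID      string
	Deleted bool
}

// store is a hypothetical in-memory stand-in for the input's state store.
type store map[string]*entity

// applyIncremental upserts the entities returned by an incremental query and
// returns the IDs to publish. An incremental result only contains changed
// objects, so absence from it says nothing about deletion.
func (s store) applyIncremental(changed []string) []string {
	publish := make([]string, 0, len(changed))
	for _, id := range changed {
		s[id] = &entity{ID: id}
		publish = append(publish, id)
	}
	return publish
}

// applyFullSync republishes everything the server returned, which recovers
// from any missed incremental updates, and marks stored entities that the
// server no longer returns as deleted.
func (s store) applyFullSync(all []string) (publish, deleted []string) {
	seen := make(map[string]bool, len(all))
	for _, id := range all {
		seen[id] = true
		s[id] = &entity{ID: id}
		publish = append(publish, id)
	}
	for id, e := range s {
		if !seen[id] && !e.Deleted {
			e.Deleted = true
			deleted = append(deleted, id)
		}
	}
	sort.Strings(deleted) // map iteration order is unspecified
	return publish, deleted
}

func main() {
	s := store{}
	s.applyFullSync([]string{"alice", "bob", "carol"})

	// An incremental update that only returns bob must not delete the others.
	fmt.Println(s.applyIncremental([]string{"bob"}), s["carol"].Deleted)

	// A later full sync that no longer returns carol marks her as deleted.
	_, deleted := s.applyFullSync([]string{"alice", "bob"})
	fmt.Println(deleted)
}
```

Republishing during the full sync costs some redundant documents, but it is what gives the periodic recovery from missed updates.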

p.logger.Errorw("Error running incremental update", "error", err)
p.metrics.updateError.Inc()
}
p.metrics.updateTotal.Inc()
p.metrics.updateProcessingTime.Update(time.Since(start).Nanoseconds())
updateTimer.Reset(p.cfg.UpdateInterval)
p.logger.Debugf("Next update expected at: %v", time.Now().Add(p.cfg.UpdateInterval))
last = start
Contributor

If time was used I think it would be better to get it from state.whenChanged in order to minimize issues from disagreements between client and server time (specifically, when the server doesn't provide all data up to the time of a request).

Contributor Author

It's a little more subtle than that, but I think I have got what you want in my next change.

@efd6
Contributor Author

efd6 commented Oct 16, 2024

@chrisberkhout PTAL. Requires careful review.

Contributor

chrisberkhout left a comment

Activedirectory could use a final tweak of the time value.

For Jamf, I'm not sure whether the changes are necessary. It also seems not to handle deletes.

@@ -350,7 +351,7 @@ func (p *jamfInput) runFullSync(inputCtx v2.Context, store *kvstore.Store, clien
// runIncrementalUpdate will run an incremental update. The process is similar
// to full synchronization, except only users which have changed (newly
// discovered, modified, or deleted) will be published.
Contributor

only users which have changed (newly discovered, modified, or deleted) will be published

I don't see deletes being handled anywhere.

In doFetchComputers it will always fetch everything and return whatever is new or modified compared to the store (unless it's a fullSync, in which case it returns nothing).

I don't see any mention of deletes in the GET /api/preview/computers documentation.

efd6 force-pushed the non-mod-entities branch 2 times, most recently from ddd9e66 to 9e7674c October 17, 2024 22:40
efd6 changed the title from x-pack/filebeat/input/entityanalytics/provider/{jamf,activedirectory}: don't publish unmodified updates to x-pack/filebeat/input/entityanalytics/provider/activedirectory: don't publish unmodified updates Oct 17, 2024
Contributor

mergify bot commented Oct 18, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b non-mod-entities upstream/non-mod-entities
git merge upstream/main
git push upstream non-mod-entities

This leaves a small window when it is possible to send redundant data,
but it is simpler and safer than tightening the bounds.
Only mark users as deleted on the basis of full sync since this is the only
time that we can know that a user is absent. Also remove users from the store
on close if they have been marked as deleted.
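A sketch of that two-step handling, under assumptions: `user`, `state`, `markAbsent` and `Close` are hypothetical names, and the persisted store is modelled as an in-memory map. Users absent from a full sync are first marked (so deletion events can be published for them) and are only dropped from the state on close.

```go
package main

import (
	"fmt"
	"sort"
)

// user is a hypothetical, simplified stored user record.
type user struct {
	ID      string
	Deleted bool
}

// state is a hypothetical stand-in for the provider's persisted user state.
type state struct {
	users map[string]*user
}

// markAbsent marks stored users missing from a full-sync result as deleted
// and returns their IDs so deletion events can be published. Call it only
// with full-sync results: only a full sync can show that a user is absent.
func (s *state) markAbsent(fullSync []string) []string {
	present := make(map[string]bool, len(fullSync))
	for _, id := range fullSync {
		present[id] = true
	}
	var gone []string
	for id, u := range s.users {
		if !present[id] && !u.Deleted {
			u.Deleted = true
			gone = append(gone, id)
		}
	}
	sort.Strings(gone) // map iteration order is unspecified
	return gone
}

// Close drops users that have been marked as deleted so they are not
// carried over into the next run's state.
func (s *state) Close() {
	for id, u := range s.users {
		if u.Deleted {
			delete(s.users, id) // deleting during range is permitted in Go
		}
	}
}

func main() {
	s := &state{users: map[string]*user{
		"alice": {ID: "alice"},
		"bob":   {ID: "bob"},
	}}
	fmt.Println(s.markAbsent([]string{"alice"})) // [bob]: publish a deletion for bob
	s.Close()
	fmt.Println(len(s.users)) // 1
}
```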
Contributor

chrisberkhout left a comment

Looks good 👍

There's a duplicate line in the changelog.

Also the PR description and changelog could say more clearly what the changed behavior is:

  • I think "don't publish unmodified updates" is only happening in that the timestamp handling is improved. It's not actually skipping publication of things received from the server but already known (because the incremental sync doesn't get them and the full sync should do a full publish anyway).
  • The other, more important change is that the old version was deleting anything that didn't appear in the new incremental sync response, and now it will only do the deletions during the full sync and only of things that the server actually doesn't have anymore.

@@ -164,6 +164,8 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Fixed failed job handling and removed false-positive error logs in the GCS input. {pull}41142[41142]
- Bump github.com/elastic/go-sfdc dependency used by x-pack/filebeat/input/salesforce. {pull}41192[41192]
- Log bad handshake details when websocket connection fails {pull}41300[41300]
- Don't send redundant documents for non-modified entities in the Jamf and Active Directory entityanalytics input. {pull}41179[41179]
Contributor

Suggested change
- Don't send redundant documents for non-modified entities in the Jamf and Active Directory entityanalytics input. {pull}41179[41179]

efd6 merged commit 6b54074 into elastic:main Oct 22, 2024
22 checks passed
mergify bot pushed a commit that referenced this pull request Oct 22, 2024
…ve modification time and deletion logic (#41179)

This improves the update time stamps of modified events by using the
documents' whenChanged fields in the case of returned documents, and the
current time when a document is identified as having been deleted. The latest
of these is used to determine the time filter for the next Active Directory
query.

Documents are marked as deleted only when they are found not to exist in the full
sync collection, and are removed from the state store when they are
identified as deleted. No longer using incremental updates to identify deletions
corrects behaviour that could cause older, but not deleted, entities to be
deleted from the index.

(cherry picked from commit 6b54074)

# Conflicts:
#	x-pack/filebeat/input/entityanalytics/provider/activedirectory/activedirectory.go
mergify bot pushed a commit that referenced this pull request Oct 22, 2024
…ve modification time and deletion logic (#41179)

efd6 added a commit that referenced this pull request Oct 23, 2024
…/activedirectory: don't publish unmodified updates (#41386)

* x-pack/filebeat/input/entityanalytics/provider/activedirectory: improve modification time and deletion logic (#41179)

* remove irrelevant changelog entry

---------

Co-authored-by: Dan Kortschak <[email protected]>
efd6 added a commit that referenced this pull request Oct 23, 2024
…ve modification time and deletion logic (#41179)

efd6 added a commit that referenced this pull request Oct 23, 2024
…r/activedirectory: improve modification time and deletion logic (#41385)

* x-pack/filebeat/input/entityanalytics/provider/activedirectory: improve modification time and deletion logic (#41179)

* fix cherry pick failures

---------

Co-authored-by: Dan Kortschak <[email protected]>
Labels
  • backport-8.x: Automated backport to the 8.x branch with mergify
  • backport-8.15: Automated backport to the 8.15 branch with mergify
  • bugfix
  • Filebeat
  • Team:Security-Service Integrations: Security Service Integrations Team
Projects
None yet

3 participants