Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "[Service Runtime] Add ability for service operations to be retried (#2889)" #3092

Merged
merged 2 commits into from
Jul 18, 2023

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Jul 17, 2023

What does this PR do?

This PR reverts #2889.

Why is it important?

The changes introduced in #2889 introduced a critical bug: #3077. While we have a fix for that particular bug (#3086), the fundamental nature of the change introduced in #2889 — changing previously synchronous code paths for service install and uninstall operations to asynchronous — has the potential for introducing other unforeseen bugs in these critical operations.

As mentioned in #3086 (comment),

We should probably step back and consider having blocking retries for all the service operations (as opposed to non-blocking retries, as implemented in #2889). It would greatly simplify the implementation but also probably reduce the likelihood of introducing bugs due to the unforeseen effects of non-blocking/async code paths.

How to test this PR locally

The example commands below are for MacOS systems.

Adding Elastic Defend integration should install Endpoint

  1. Build Elastic Agent with this PR.

    $ DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=tar.gz mage package
    
  2. Extract the built Agent artifact and replace Endpoint files with signed versions. Get signed versions by downloading the most recently released Agent artifact.

    $ cd /tmp
    $ tar xzf ~/development/github/elastic-agent/build/distributions/elastic-agent-8.10.0-SNAPSHOT-darwin-aarch64.tar.gz
    $ tar xzf ~/Downloads/elastic-agent-8.8.2-darwin-aarch64.tar.gz
    $ cp elastic-agent-8.8.2-darwin-aarch64/data/elastic-agent-cdc5ba/components/endpoint-security* elastic-agent-8.10.0-SNAPSHOT-darwin-aarch64/data/elastic-agent-2d1d32/components
    
  3. Enroll the Elastic Agent in Fleet.

    $ cd elastic-agent-8.10.0-SNAPSHOT-darwin-aarch64
    $ sudo ./elastic-agent install --url=https://XXXXXXXXX:443 --enrollment-token=XXXXXXXXXXXXX
    
  4. Verify that Endpoint isn't already installed on the system.

    $ sudo ls -l /Library/Elastic/Endpoint
    ls: /Library/Elastic/Endpoint: No such file or directory
    
  5. In the Fleet UI, edit the enrolled Agent's policy and add the Elastic Defend integration.

  6. After a few seconds, verify that Endpoint is installed on the system.

    $ sudo ls -l /Library/Elastic
    total 0
    drwxr-x---@ 17 root  wheel  544 Jul 17 15:02 Agent
    drwxr-xr-x@  9 root  wheel  288 Jul 17 15:07 Endpoint
    
  7. In the Fleet UI, verify that the Agent is healthy.

Unenrolling Elastic Agent should uninstall Endpoint

  1. Complete the steps for adding Elastic Defend Integration, which should install Endpoint.
  2. In the Fleet UI, unenroll Agent (do NOT check the "Remove Agent immediately" checkbox).
  3. Wait until Agent disappears from the Fleet UI.
  4. Verify that Endpoint isn't installed on the system.
    $ sudo ls -l /Library/Elastic/Endpoint
    ls: /Library/Elastic/Endpoint: No such file or directory
    

Removing Elastic Defend integration should uninstall Endpoint

  1. Complete the steps for adding Elastic Defend Integration, which should install Endpoint.
  2. In the Fleet UI, remove the Elastic Defend integration from the enrolled Agent's policy.
  3. Verify that Endpoint isn't installed on the system.
    $ sudo ls -l /Library/Elastic/Endpoint
    ls: /Library/Elastic/Endpoint: No such file or directory
    

Switching to policy without Elastic Defend integration should uninstall Endpoint

  1. Complete the steps for adding Elastic Defend Integration, which should install Endpoint.
  2. In the Fleet UI, create a new policy that does not include Elastic Defend.
  3. In the Fleet UI, update the enrolled Agent's policy to the new policy.
  4. Verify that Endpoint isn't installed on the system.
    $ sudo ls -l /Library/Elastic/Endpoint
    ls: /Library/Elastic/Endpoint: No such file or directory
    

Uninstalling Agent should uninstall Endpoint

  1. Complete the steps for adding Elastic Defend Integration, which should install Endpoint.
  2. From the command-line, uninstall Agent.
  3. Verify that Endpoint isn't installed on the system.
    $ sudo ls -l /Library/Elastic/Endpoint
    ls: /Library/Elastic/Endpoint: No such file or directory
    

Related issues

@elasticmachine
Copy link
Contributor

elasticmachine commented Jul 17, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-17T18:42:56.864+0000

  • Duration: 21 min 29 sec

Test stats 🧪

Test Results
Failed 0
Passed 5939
Skipped 27
Total 5966

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ycombinator ycombinator marked this pull request as ready for review July 17, 2023 18:42
@ycombinator ycombinator requested a review from a team as a code owner July 17, 2023 18:42
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Copy link
Member

@aleksmaus aleksmaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. less is more. thanks!

@elasticmachine
Copy link
Contributor

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.718% (77/78) 👍
Files 67.416% (180/267) 👍
Classes 66.329% (327/493) 👎 -0.068
Methods 53.642% (1031/1922) 👎 -0.248
Lines 39.88% (11771/29516) 👎 -0.23
Conditionals 100.0% (0/0) 💚

@kevinlog
Copy link

We should be sure to test the following after this revert PR goes in.

  • unenrolling agent removes endpoint
  • uninstalling agent removes endpoint
  • changing to a non-defend policy removes endpoint
  • removing defend from the policy removes endpoint

Is the integration test that @cmacknz added to this PR relevant here? #3086 . Could help us get coverage in place for the eventual re-introduction of this logs

@ycombinator
Copy link
Contributor Author

Is the integration test that @cmacknz added to this PR relevant here? #3086 . Could help us get coverage in place for the eventual re-introduction of this logs

Yes, good idea. Instead of closing #3086 unmerged, we'll just trim it down to the integration test.

@ycombinator
Copy link
Contributor Author

@kevinlog I tested the scenarios you mentioned in #3092 (comment) on my Mac with an Agent built from this PR. I also added detailed steps for each scenario to this PR's description. Each scenario worked mostly as expected; at the start of each scenario, I had to keep manually deleting /Applications/ElasticEndpoint.app. Otherwise, installing Endpoint as part of the subsequent scenario would fail. Is this a bug?

@ycombinator
Copy link
Contributor Author

buildkite test this

@ycombinator ycombinator merged commit 7e6b4f1 into elastic:main Jul 18, 2023
6 of 9 checks passed
@ycombinator ycombinator deleted the revert-2889 branch July 18, 2023 10:35
mergify bot pushed a commit that referenced this pull request Jul 18, 2023
…tried (#2889)" (#3092)

* Revert "[Service Runtime] Add ability for service operations to be retried (#2889)"

This reverts commit a3403e5.

* Remove extra parameter

(cherry picked from commit 7e6b4f1)
ycombinator added a commit that referenced this pull request Jul 18, 2023
…tried (#2889)" (#3092) (#3095)

* Revert "[Service Runtime] Add ability for service operations to be retried (#2889)"

This reverts commit a3403e5.

* Remove extra parameter

(cherry picked from commit 7e6b4f1)

Co-authored-by: Shaunak Kashyap <[email protected]>
@kevinlog
Copy link

@ycombinator thank you for testing the scenarios!

Each scenario worked mostly as expected; at the start of each scenario, I had to keep manually deleting /Applications/ElasticEndpoint.app. Otherwise, installing Endpoint as part of the subsequent scenario would fail. Is this a bug?

This doesn't sound right, but my question would be if it's a regression or not. Did the 8.8 Agent/Endpoint behave this way?

@ferullo @nfritts do you know?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.9.0 Automated backport with mergify >bug Cleanup skip-changelog Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Agent fails to uninstall endpoint successfully when handling the Unenroll action
5 participants