Agent upgraded_at field keeps updating to current time #3263
Comments
I found the bug, |
@juliaElastic does this mean a newly installed agent on 8.12.0 would have been successfully upgraded to 8.12.1? |
An agent on 8.12.0 cannot be upgraded to 8.12.1 via Fleet UI currently without the workaround Julia drafted here: #3264 (comment).
We need an automated test on all release branches in which an agent on the latest available patch release for that branch is upgraded to the build for the current HEAD of that release branch. e.g. on the Additionally, we could have a daily run that does the same using the daily snapshot build instead of a PR build. |
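As a rough illustration of the "latest available patch release for that branch" part of this proposal, the starting version for such a test could be picked as in the sketch below. The function names and the list of released versions are hypothetical and only serve the example; real release data would have to come from wherever the CI job looks up published artifacts.

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'major.minor.patch' string into integers for comparison."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def latest_patch_release(released: Iterable[str], branch: str) -> Optional[str]:
    """Return the highest released patch version on `branch` (e.g. '8.12'), or None."""
    branch_major, branch_minor = (int(part) for part in branch.split("."))
    candidates = [
        v for v in released
        if parse_version(v)[:2] == (branch_major, branch_minor)
    ]
    # Compare numerically so that e.g. 8.12.10 sorts after 8.12.9.
    return max(candidates, key=parse_version, default=None)


# Hypothetical list of published releases, for illustration only.
released_versions = ["8.11.4", "8.12.0", "8.12.1"]
start_version = latest_patch_release(released_versions, "8.12")
print(start_version)
```

The test would then enroll an agent on `start_version` and upgrade it to the build from the HEAD of the same branch.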
@pierrehilbert @cmacknz regarding Kyle's comment above isn't this something we test already in the elastic agent testing framework? |
Hi @jlind23
We don't have a documented testcase for this scenario; we cover it as part of exploratory testing. Testing details: We weren't able to directly upgrade using Fleet UI till 8.12.1 BC1 from the immediately previous version (8.12.0). Thanks! |
@amolnater-qasource but once you were able to test 8.12.0 to 8.12.1 it worked right? |
@jlind23 We have revalidated on the released 8.12.1 and observed that we are not able to upgrade from the UI.
Screenshots/Recordings: Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-02-12.14-45-10.mp4
Please let us know if anything else is required from our end. |
Thanks @amolnater-qasource this makes sense as the next patch release isn't published during the BC phase and thus won't be shown in Fleet UI (maybe something we can file an enhancement for). To clarify: was upgrading from 8.12.0 -> 8.12.1 via the API successful during the BC test? If so can you share a summary of the test steps used as well? My Testrail access has lapsed as I don't log in frequently 🙃, otherwise I would check myself. Many thanks. |
We test a single upgrade, that is we install an agent build from the head of the current branch and upgrade it to the latest snapshot in that branch. This would be 8.13.0-SNAPSHOT or 8.12.0-SNAPSHOT for main and 8.12 respectively. We don't test two consecutive upgrades because from the agent's perspective there is no reason to, once the agent completes the upgrade state machine reported in the upgrade details it can upgrade again. There is no other state in the agent that can prevent this. |
@amolnater-qasource We looked at testrail with @kpollich and it looks like the test case below does not exist:
Can we make sure this is added please? |
Thanks, Craig. I think the agent test coverage is sufficient here and consecutive updates aren't something we should pursue adding. The coverage gaps lie elsewhere in Fleet. |
@kpollich yes the direct 8.12.0>8.12.1 BC1 was successful, which is part of our testcases using below API under the Dev tools:
Further, even on 8.12.1 released kibana environment, we are successfully able to upgrade 8.12.0>8.12.1 from Fleet UI. Screen Recording: Agents.-.Fleet.-.Elastic.-.Google.Chrome.2024-02-12.20-25-50.mp4
We do not have any testcase for upgrading agents twice, e.g. 8.11.4 > 8.12.0 > 8.12.1. However, we have testcases for upgrading from one lower version on all OSes: Please let us know if anything else is required from our end. cc: @jlind23 |
@kpollich @juliaElastic according to #3263 (comment) it means that fresh install on 8.12.0 can be upgraded to 8.12.1, is that expected? |
Thanks @amolnater-qasource - this is extremely helpful in understanding our existing test coverage here.
I can confirm this is working as expected. I created a fresh
So, if I'm understanding the smoke tests properly, we wouldn't have caught this issue in smoke tests. In our smoke tests, we create a cloud instance on the latest release, then enroll an agent on the previous release, then attempt to upgrade it. To confirm this, I performed the same steps as above, but initially enrolled an agent on
So, in order to catch this bug in the QAS smoke tests, we would've needed to test a sequential upgrade from |
I don't think it matters, either sequence would reproduce the bug wouldn't it? Probably best to always use the latest versions with the most bug fixes to minimize other issues. This regression test using real agents is a good idea but it also feels like you could write an automated test for the upgrade state directly in Fleet server. The simplest version of this would use mock agents (similar to horde) and you could query the resulting changes out of Elasticsearch directly, although a better test would probably use the Fleet API in Kibana. The agent test framework we use can provide the guarantee that the agent half of the upgrade works as expected, so you don't need to reverify that. Using mock agents would also allow you to have them do adversarial things like make requests with incorrect and out of order upgrade details. While Fleet shouldn't have to verify the agent part of the contract, it also shouldn't assume the agent will never have a bug in how it talks to Fleet and it should defend itself against that. |
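To make the mock-agent idea a bit more concrete, here is a minimal sketch of the kind of invariant such a test could assert on the agent documents it reads back after each check-in. The snapshot shape (`version` and `upgraded_at` keys) and the function name are assumptions made for the example, not the actual Fleet Server document schema or test code.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified view of an agent document read back after a check-in.
Snapshot = Dict[str, str]


def spurious_upgraded_at_changes(snapshots: List[Snapshot]) -> List[Tuple[int, str, str]]:
    """Return (index, old, new) for each upgraded_at change made while the version stayed the same."""
    problems = []
    for i in range(1, len(snapshots)):
        prev, cur = snapshots[i - 1], snapshots[i]
        same_version = cur["version"] == prev["version"]
        if same_version and cur["upgraded_at"] != prev["upgraded_at"]:
            problems.append((i, prev["upgraded_at"], cur["upgraded_at"]))
    return problems


# Expected behaviour: upgraded_at only moves when the version changes.
healthy = [
    {"version": "8.11.4", "upgraded_at": "2024-02-12T09:00:00Z"},
    {"version": "8.12.0", "upgraded_at": "2024-02-12T10:00:00Z"},
    {"version": "8.12.0", "upgraded_at": "2024-02-12T10:00:00Z"},
]

# Behaviour described in this issue: upgraded_at keeps moving between check-ins.
buggy = [
    {"version": "8.12.0", "upgraded_at": "2024-02-12T10:00:00Z"},
    {"version": "8.12.0", "upgraded_at": "2024-02-12T10:05:00Z"},
    {"version": "8.12.0", "upgraded_at": "2024-02-12T10:10:00Z"},
]

print(spurious_upgraded_at_changes(healthy))
print(spurious_upgraded_at_changes(buggy))
```

A check like this does not depend on which side caused the change, so it could be reused whether the snapshots come from Elasticsearch directly or from the Fleet API in Kibana.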
Yes I should clarify: either scenario would reproduce this issue, but I meant to codify this process for future test runs. Using the latest versions sounds good to me. We'd codify this in TestRail as follows, to be run on patch releases
For minors, we'd stick with
I agree that ultimately this case should be covered in Fleet Server tests. There are substantial barriers to handling this in Kibana CI (we need to spawn "real" agents off of snapshot builds, for example) that don't exist in Fleet Server. Spawning a live Kibana server in Fleet Server CI is a good idea, but I don't know that we do that today. I know that's how the agent tests we're talking about work, so we also do this in Fleet Server for better test fidelity. I'm working on capturing all of this in an RCA doc that I'll send out later today, then we'll meet tomorrow as a group to make sure we're aligned on next steps. |
Hi Team, We have created 01 testcase for this scenario under our fleet test suite at link: Please let us know if anything else is required from our end. |
Hi Team, We have revalidated this issue on 8.12.2 BC1 kibana cloud environment and had below observations: Observations:
Logs: Build details: Hence, we are marking this issue as QA:Validated. Please let us know if we are missing anything here. |
Stack version 8.12.1 and possibly others.
There seems to be an issue where the agent's upgraded_at field keeps being updated to the current time. As a result, Fleet UI does not show "Upgrade available" when it should, and the "Upgrade agent" action is disabled, because Fleet UI doesn't consider an agent upgradeable if it was updated in the last 10 minutes.
It's not clear yet if the issue is on fleet-server or agent side.
Reproduced on a fresh 8.12.1 cluster by enrolling an 8.11.4 agent, upgrading it to 8.12.0, and waiting 10 minutes.
The agent is still not allowed to be upgraded again to 8.12.1, and the upgraded_at field looks recent, even though the last upgrade happened more than 10m ago.
Workaround:
"force": true flag, or upgrade_details:null value from agent docs
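The effect of the 10-minute rule described above can be modelled with a small sketch. This is not the actual Fleet UI code: the function name, the handling of a missing upgraded_at, and the way `force` is modelled are assumptions made for illustration, based only on the behaviour reported in this issue.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed value, taken from the "last 10 minutes" rule described in this issue.
UPGRADE_COOLDOWN = timedelta(minutes=10)


def is_upgradeable(upgraded_at: Optional[datetime], now: datetime, force: bool = False) -> bool:
    """Hypothetical model of the upgradeability check; not the real implementation."""
    if force:
        # Models the "force": true workaround, which skips the check.
        return True
    if upgraded_at is None:
        # Assumption: an agent that was never upgraded is not blocked by this rule.
        return True
    return now - upgraded_at >= UPGRADE_COOLDOWN


now = datetime(2024, 2, 12, 12, 0, tzinfo=timezone.utc)
real_upgrade_time = now - timedelta(hours=1)     # when the upgrade actually finished
refreshed_value = now - timedelta(seconds=30)    # what the field looks like with this bug

print(is_upgradeable(real_upgrade_time, now))            # would allow the upgrade
print(is_upgradeable(refreshed_value, now))              # blocked by the refreshed field
print(is_upgradeable(refreshed_value, now, force=True))  # workaround bypasses the check
```

Because the field is refreshed continuously, the time difference never reaches the cooldown, so waiting longer does not help; only bypassing the check or fixing the stored value does.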
value from agent docsThe text was updated successfully, but these errors were encountered: