Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assume installed agent on upgrade #5879

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Kind can be one of:
# - breaking-change: a change to previously-documented behavior
# - deprecation: functionality that is being removed in a later release
# - bug-fix: fixes a problem in a previous version
# - enhancement: extends functionality but does not break or fix existing behavior
# - feature: new functionality
# - known-issue: problems that we are aware of in a given version
# - security: impacts on the security of a product or a user’s deployment.
# - upgrade: important information for someone upgrading from a prior version
# - other: does not fit into any of the other categories
kind: bug-fix

# Change summary; a 80ish characters long description of the change.
summary: This PR assumes installed agent on upgrade in order to initialize control path properly.

# Long description; in case the summary is not enough to describe the change
# this field accommodate a description without length limits.
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment.
#description:

# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc.
component: elastic-agent

# PR URL; optional; the PR number that added the changeset.
# If not present is automatically filled by the tooling finding the PR where this changelog fragment has been added.
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number.
# Please provide it if you are adding a fragment for a different PR.
pr: https://github.com/owner/repo/5879

# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of).
# If not present is automatically filled by the tooling with the issue linked to the PR number.
issue: https://github.com/owner/repo/5872
122 changes: 122 additions & 0 deletions internal/pkg/agent/application/paths/paths_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,12 @@
package paths

import (
"fmt"
"runtime"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func TestEqual(t *testing.T) {
Expand Down Expand Up @@ -92,3 +94,123 @@ func TestHasPrefixWindows(t *testing.T) {
})
}
}

func TestResolveControlSocketWithInstalledState(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I appreciate that you added a unit test.

So, we only have one case where ResolveControlSocketWithInstalledState returns a different result than ControlSocketFromPath and it's for Windows when the agent is detected as installed...

I was more thinking about having a small UT for ResolveControlSocket() where we can prepare a temp directory to be the top path with and without the .installed marker and having explicit expected results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand what you're trying to achieve.
i'd rather address this in a followup as we may hit the window for next release with actual state of the PR and tests which should be good enough to get us covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolveControlSocket() is only called from a single place, and that code already calls paths.RunningInstalled() right before ResolveControlSocket().

https://github.com/michalpristas/elastic-agent/blob/be8773ff18061ec0a7570fa43db902ff86e709fe/internal/pkg/agent/cmd/run.go#L713-L736

So IMO ResolveControlSocket just should always take whether we know we are installed as an argument and we can avoid having two functions. Then having the installed detection is never a part of this functions job, so we don't need to test that part here at all.

Just reading the test cases here it is really hard to get a picture of what is supposed to happen, fundamentally:

  1. When on Windows, whether you are installed or not changes the result.
  2. On Darwin and Linux, being installed makes no difference.

It's not clear why we have 3 test cases for each platform to me, when there are naturally only 2.

It feels like the tests could be simpler.

P.S. 8.16 is likely going to be delayed and there will be another BC sometime next week. If there isn't, this PR has already missed the release. So let's not worry about landing this and instead focus on whether we are happy with it as that is already out of our control until we know the next BC date.

testCases := []struct {
os string
controlSocket string
topPath string
runningInstalled bool
expectedSocket string
}{
{
"darwin",
ControlSocketFromPath("darwin", "/top"),
"/top",
true,
ControlSocketFromPath("darwin", "/top"),
},
{
"darwin",
ControlSocketFromPath("darwin", "/top"),
"/top",
false,
ControlSocketFromPath("darwin", "/top"),
},
{
"darwin",
"/control/socket",
"/top",
true,
"/control/socket",
},
{
"darwin",
"/control/socket",
"/top",
false,
"/control/socket",
},

{
"linux",
ControlSocketFromPath("linux", "/top"),
"/top",
true,
ControlSocketFromPath("linux", "/top"),
},
{
"linux",
ControlSocketFromPath("linux", "/top"),
"/top",
false,
ControlSocketFromPath("linux", "/top"),
},
{
"linux",
"/control/socket",
"/top",
true,
"/control/socket",
},
{
"linux",
"/control/socket",
"/top",
false,
"/control/socket",
},

{
"windows",
ControlSocketFromPath("windows", "/top"),
"/top",
true,
WindowsControlSocketInstalledPath,
},
{
"windows",
ControlSocketFromPath("windows", "/top"),
"/top",
false,
ControlSocketFromPath("windows", "/top"),
},

{
"windows",
"/control/socket",
"/top",
true,
"/control/socket",
},
{
"windows",
"/control/socket",
"/top",
false,
"/control/socket",
},
}

for i, tc := range testCases {
if runtime.GOOS != tc.os {
continue
}
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
prevControlSocketPath := controlSocketPath
prevTopPath := topPath
defer func() {
// just to be sure
controlSocketPath = prevControlSocketPath
topPath = prevTopPath
}()
controlSocketPath = tc.controlSocket
topPath = tc.topPath

ResolveControlSocketWithInstalledState(tc.runningInstalled)

require.Equal(t, tc.expectedSocket, controlSocketPath)
})

}
}
3 changes: 3 additions & 0 deletions internal/pkg/agent/application/paths/paths_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,9 @@ func initialControlSocketPath(topPath string) string {
// ResolveControlSocket does nothing on non-Windows hosts.
func ResolveControlSocket() {}

// ResolveControlSocketWithInstalledState updates the control socket path.
func ResolveControlSocketWithInstalledState(_ bool) {}

// HasPrefix tests if the path starts with the prefix.
func HasPrefix(path string, prefix string) bool {
if path == "" || prefix == "" {
Expand Down
7 changes: 6 additions & 1 deletion internal/pkg/agent/application/paths/paths_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,13 @@ func initialControlSocketPath(topPath string) string {
// RunningInstalled will always be false, even when it is an installed version. Once
// that is fixed from the upgrade process the control socket path needs to be updated.
func ResolveControlSocket() {
ResolveControlSocketWithInstalledState(RunningInstalled())
}

// ResolveControlSocketWithInstalledState updates the control socket path.
func ResolveControlSocketWithInstalledState(runningInstalled bool) {
michalpristas marked this conversation as resolved.
Show resolved Hide resolved
currentPath := ControlSocket()
if currentPath == ControlSocketFromPath(runtime.GOOS, topPath) && RunningInstalled() {
if currentPath == ControlSocketFromPath(runtime.GOOS, topPath) && runningInstalled {
// path is not correct being that it's installed
// reset the control socket path to be the installed path
SetControlSocket(WindowsControlSocketInstalledPath)
Expand Down
6 changes: 6 additions & 0 deletions internal/pkg/agent/application/upgrade/watcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ import (

"google.golang.org/grpc"

"github.com/elastic/elastic-agent/internal/pkg/agent/application/paths"
"github.com/elastic/elastic-agent/pkg/control/v2/client"
"github.com/elastic/elastic-agent/pkg/core/logger"
)
Expand Down Expand Up @@ -50,6 +51,11 @@ type AgentWatcher struct {

// NewAgentWatcher creates a new agent watcher.
func NewAgentWatcher(ch chan error, log *logger.Logger, checkInterval time.Duration) *AgentWatcher {
// when starting watcher from pre 8.8 version of agent control socket is evaluated incorrectly and upgrade fails.
// resolving control socket updates it to a proper value before client is initiated
// upgrade is only available for installed agent so we can assume
paths.ResolveControlSocketWithInstalledState(true)

c := client.New()
ec := &AgentWatcher{
notifyChan: ch,
Expand Down
3 changes: 2 additions & 1 deletion testing/upgradetest/watcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,10 @@ import (
)

// FastWatcherCfg is configuration that makes the watcher run faster.
// we need to set grace period to 100s to be able to detect 5 failures 15 seconds apart and have a little buffer.
const FastWatcherCfg = `
agent.upgrade.watcher:
grace_period: 1m
grace_period: 100s
error_check.interval: 15s
crash_check.interval: 15s
`
Expand Down