Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config-manager: Only use tedge-write after normal copy fails due to permissions and improve tedge-write logging #3069

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

Bravo555
Copy link
Contributor

@Bravo555 Bravo555 commented Aug 19, 2024

Proposed changes

  • Only use tedge-write for privilege elevation after normal copy fails due to PermissionDenied error and tedge-write is enabled

  • Put more details into warn and error logs when deploying configuration files using tedge-write:

    before: "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH"
    after:  "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH, using tedge-write directly"
    
    before: "ERROR: config-manager failed writing updated configuration file: Starting tedge-write process failed"
    after:  "ERROR: config-manager failed writing updated configuration file: failed to start process '/usr/bin/tedge-write': No such file or directory (os error 2)"
    
    before: "ERROR: config-manager failed writing updated configuration file: sudo: /usr/bin/tedge-write: command not found"
    after:  "ERROR: config-manager failed writing updated configuration file: process returned non-zero exit code (1); stderr="sudo: /usr/bin/tedge-write: command not found""
    

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link

codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 50.44248% with 56 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...rates/extensions/tedge_config_manager/src/actor.rs 56.84% 28 Missing and 13 partials ⚠️
crates/core/tedge_write/src/api.rs 12.50% 13 Missing and 1 partial ⚠️
crates/common/tedge_config/src/sudo.rs 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

📢 Thoughts on this report? Let us know!

Copy link
Contributor

github-actions bot commented Aug 20, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
499 0 2 499 100 1h25m36.385858s

Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for one minor unwarp issues.


if !output.status.success() {
let stderr = String::from_utf8(output.stderr).expect("output should be utf-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that we're using expect here as we know that the output of tedge-write process can't be non-UTF8.


if !output.status.success() {
let stderr = String::from_utf8(output.stderr).expect("output should be utf-8");
let stderr = stderr.trim();
let exit_code = output.status.code().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, isn't it better to be defenesive and account for the None status code case as well, as there is the theoretical possibility of the tedge-write process getting killed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think that tedge-write process being terminated by a signal is very unlikely, as it is spawned and joined by tedge-agent, but you're right, it's better for that edge case to be properly handled as well.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, as this as been an issue observed by a user, it would be good to have a issue giving the context (#3073). Notably, it is not obvious reading this, that the root cause is that configuration management is not working on a device where /usr/bin is readonly the path for tedge_write being hard coded.

Second, instead of an option to disable tedge_write, it would be better to have a setting telling where tedge_write has been installed.

crates/common/tedge_config/src/sudo.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss what is the best option : tedge_write.disableor tedge_write.path.

@Bravo555
Copy link
Contributor Author

Bravo555 commented Aug 20, 2024

During the call with @albinsuresh and @didier-wenzek we decided the following:

  1. A ticket with details about original user issue and their environment which required changes for tedge-write to be made will be created. It should explain the interaction between tedge-write and sudoers which makes it tricky.
    configuration update fails due to tedge-write when thin-edge.io is installed in a non-standard location #3073
  2. In the above mentioned user environment, tedge-agent is run as root. If tedge-agent runs as root, there is no reason for privilege elevation, so tedge-agent should never spawn tedge-write and just do a normal copy. This way we solve the original issue more elegantly, without adding yet more configuration options. This PR will be changed to use that approach instead of adding another configuration option.
    EDIT: Instead of checking if we're running as root, we should just try copying normally and only try to elevate once we fail with PermissionDenied. Limiting ourselves to root makes no sense if we could have permissions by some other means, and checking permissions ahead of time is also bad because it leads to TOCTOU bugs. Normal copy + tedge-write if failed is what should have been done from the very beginning.
  3. Ability to customise the path of tedge-write: if a user has a custom environment where they install thin-edge.io in non-standard locations, they should have an ability to customise paths of all thin-edge components, including tedge-write. However, for tedge-write to work as intended, there needs to be an entry in /etc/sudoers with a full path to tedge-write, e.g. /usr/bin/tedge-write and that full path also needs to be provided when spawning the process. If we allow users to use an alternative location for tedge-write, we need to make sure to let the user know they need to change the appropriate /etc/sudoers entry for privilege elevation to work.
    Thus, instead of hardcoding the path we can use another flow:
    1. use which to find full path of tedge-write, e.g. /opt/thinedge/bin/tedge-write
    2. spawn a sudo process using that path, i.e. sudo /opt/thinedge/bin/tedge-write
    3. Check if we're running as sudo/have necessary write permissions/copying file returns a success. If not, then we need to print a hint telling the user to edit their /etc/sudoers configuration.

@Bravo555 Bravo555 marked this pull request as draft August 20, 2024 10:21
@Bravo555 Bravo555 changed the title Allow user to disable tedge-write and improve its logging config-manager: Only use tedge-write after normal copy fails due to permissions and improve tedge-write logging Aug 20, 2024
@didier-wenzek
Copy link
Contributor

Side remark. It's a pity that the unit tests fail for a reason already highlighted by clippy: unused import: crate::FileEntry. We should not run the unit tests with RUSTFLAGS -D warnings

let error_message = format!("config-manager failed downloading a file: {err}",);
let err =
anyhow::Error::from(err).context("config-manager failed downloading a file");
let error_message = format!("{err:#}");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{err:#} on anyhow::Error prints all the lower level causes in error 1: error 2: error 3 format.

@Bravo555 Bravo555 marked this pull request as ready for review August 27, 2024 16:11
@Bravo555 Bravo555 added the theme:configuration Theme: Configuration management label Aug 28, 2024
Put more details into warn and error logs when deploying configuration
files using `tedge-write`:

- before: "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH"
- after:  "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH, using tedge-write directly"

- before: "ERROR: config-manager failed writing updated configuration file: Starting tedge-write process failed"
- after:  "ERROR: config-manager failed writing updated configuration file: failed to start process '/usr/bin/tedge-write': No such file or directory (os error 2)"

- before: "ERROR: config-manager failed writing updated configuration file: sudo: /usr/bin/tedge-write: command not found"
- after:  "ERROR: config-manager failed writing updated configuration file: process '/usr/bin/sudo' returned non-zero exit code (1); stderr="sudo: /usr/bin/tedge-write: command not found""

Signed-off-by: Marcel Guzik <[email protected]>
Configuration file deployment was changed to always do a regular copy
first, and only use tedge-write if copy failes due to lacking
permissions.

Signed-off-by: Marcel Guzik <[email protected]>
`Trigger config_update operation from another workflow` and `Trigger
config_snapshot operation from another operation` were failing if
started directly and not part of test suite. Because of the missing `Set
Device Context` call, workflow toml files were being copied to the wrong
device.

Signed-off-by: Marcel Guzik <[email protected]>
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Comment on lines +413 to +414
// TODO: source temporary file should be cleaned up automatically
let _ = std::fs::remove_file(from);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has definitely to be done.

But not necessarily here. This file has been created and populated outside this method. It should be automatically cleaned by the caller. Not easy to implement with the current design, though. Can this be done once merged this refactoring

/// - `src` doesn't exist
/// - user or group doesn't exist
/// - we have no write/execute permissions to the directory
fn move_file_atomic_set_permissions_if_doesnt_exist(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn move_file_atomic_set_permissions_if_doesnt_exist(
fn move_file_preserving_target_permissions(

Can we also move this API into tedge_utils so that it can be reused? After removing the dependency on FileEntry, of course. It could take the expected permissions instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have this done for a follow-up PR

.tempfile_in(dest.parent().context("invalid path")?)
.with_context(|| format!("could not create temporary file at '{dest}'"))?;

std::io::copy(&mut src_file, &mut tempfile).context("failed to copy")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name implies move but we're doing a copy here. Can we attempt a file move at least when the source and target files are under the same parent directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, move is probably a little misleading for now, but this function will be used by both config-manager directly and by tedge-write, which reads from stdin, so the function will be renamed to write_... in a follow-up PR

Comment on lines +598 to +608
tempfile
.as_file()
.set_permissions(std::fs::Permissions::from_mode(target_permissions.mode))
.context("failed to set mode on the destination file")?;

fchown(
tempfile.as_file(),
Some(target_permissions.uid),
Some(target_permissions.gid),
)
.context("failed to change ownership of the destination file")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have been done with tedge_utils::file::PermissionEntry::apply() as well, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK PermissionEntry::apply() uses a chown, but since we have a file descriptor open and want all these operations finished before the final rename, I decided to use fchown, which should complete before the file descriptor is closed.

There is also one problem with PermissionEntry::apply() when used in this context:

Since we're doing an atomic write as either tedge or root, the temporary file will be owned by tedge or root but the target configuration file will likely be owned by some other user, so most likely we will always have to change file owner. We want the atomic write to look like a normal write so it has to preserve original ownership no matter what it is.

But it is technically possible for a file to have uid/gid of a user that does not exist on the system. Perhaps the file was created by a program coming from a package that set up its own user, and the package was since uninstalled and the user was removed. Still in this case we still want to preserve original permissions, so we have to use original uid/gid directly. But PermissionEntry::apply() uses a name, so if we wanted to use it here we'd need to look up the name of the owner of the file, which can fail.

So not using PermissionEntry::apply() avoids an extra failure mode.

Ok(Permissions { uid, gid, mode })
}

struct Permissions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not reuse tedge_utils::file::PermissionEntry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3069 (comment)

We want to use uid/gid directly instead of names, since doing so avoids an extra failure mode.

@Bravo555 Bravo555 added this pull request to the merge queue Aug 29, 2024
Merged via the queue into thin-edge:main with commit d41cd84 Aug 29, 2024
31 checks passed
Copy link
Contributor

@gligorisaev gligorisaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the test and it is doing the expected verification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme:configuration Theme: Configuration management
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants