-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config-manager: Only use tedge-write after normal copy fails due to permissions and improve tedge-write logging #3069
Conversation
Codecov ReportAttention: Patch coverage is Additional details and impacted files📢 Thoughts on this report? Let us know! |
Robot Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for one minor unwarp issues.
crates/core/tedge_write/src/api.rs
Outdated
|
||
if !output.status.success() { | ||
let stderr = String::from_utf8(output.stderr).expect("output should be utf-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that we're using expect
here as we know that the output of tedge-write
process can't be non-UTF8.
crates/core/tedge_write/src/api.rs
Outdated
|
||
if !output.status.success() { | ||
let stderr = String::from_utf8(output.stderr).expect("output should be utf-8"); | ||
let stderr = stderr.trim(); | ||
let exit_code = output.status.code().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, isn't it better to be defenesive and account for the None
status code case as well, as there is the theoretical possibility of the tedge-write
process getting killed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think that tedge-write
process being terminated by a signal is very unlikely, as it is spawned and joined by tedge-agent, but you're right, it's better for that edge case to be properly handled as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, as this as been an issue observed by a user, it would be good to have a issue giving the context (#3073). Notably, it is not obvious reading this, that the root cause is that configuration management is not working on a device where /usr/bin
is readonly the path for tedge_write
being hard coded.
Second, instead of an option to disable tedge_write
, it would be better to have a setting telling where tedge_write
has been installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to discuss what is the best option : tedge_write.disable
or tedge_write.path
.
During the call with @albinsuresh and @didier-wenzek we decided the following:
|
f9fc679
to
2884a0f
Compare
2884a0f
to
e130de6
Compare
e130de6
to
febaea6
Compare
Side remark. It's a pity that the unit tests fail for a reason already highlighted by clippy: unused import: |
febaea6
to
9c7a381
Compare
9c7a381
to
ec8562c
Compare
ec8562c
to
8f6bf66
Compare
8f6bf66
to
b89069c
Compare
b89069c
to
2d755e5
Compare
let error_message = format!("config-manager failed downloading a file: {err}",); | ||
let err = | ||
anyhow::Error::from(err).context("config-manager failed downloading a file"); | ||
let error_message = format!("{err:#}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{err:#}
on anyhow::Error
prints all the lower level causes in error 1: error 2: error 3
format.
Put more details into warn and error logs when deploying configuration files using `tedge-write`: - before: "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH" - after: "WARN: `sudo.enable` set to `true`, but sudo not found in $PATH, using tedge-write directly" - before: "ERROR: config-manager failed writing updated configuration file: Starting tedge-write process failed" - after: "ERROR: config-manager failed writing updated configuration file: failed to start process '/usr/bin/tedge-write': No such file or directory (os error 2)" - before: "ERROR: config-manager failed writing updated configuration file: sudo: /usr/bin/tedge-write: command not found" - after: "ERROR: config-manager failed writing updated configuration file: process '/usr/bin/sudo' returned non-zero exit code (1); stderr="sudo: /usr/bin/tedge-write: command not found"" Signed-off-by: Marcel Guzik <[email protected]>
Configuration file deployment was changed to always do a regular copy first, and only use tedge-write if copy failes due to lacking permissions. Signed-off-by: Marcel Guzik <[email protected]>
`Trigger config_update operation from another workflow` and `Trigger config_snapshot operation from another operation` were failing if started directly and not part of test suite. Because of the missing `Set Device Context` call, workflow toml files were being copied to the wrong device. Signed-off-by: Marcel Guzik <[email protected]>
2d755e5
to
33d220d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
// TODO: source temporary file should be cleaned up automatically | ||
let _ = std::fs::remove_file(from); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has definitely to be done.
But not necessarily here. This file has been created and populated outside this method. It should be automatically cleaned by the caller. Not easy to implement with the current design, though. Can this be done once merged this refactoring
/// - `src` doesn't exist | ||
/// - user or group doesn't exist | ||
/// - we have no write/execute permissions to the directory | ||
fn move_file_atomic_set_permissions_if_doesnt_exist( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn move_file_atomic_set_permissions_if_doesnt_exist( | |
fn move_file_preserving_target_permissions( |
Can we also move this API into tedge_utils
so that it can be reused? After removing the dependency on FileEntry
, of course. It could take the expected permissions instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have this done for a follow-up PR
.tempfile_in(dest.parent().context("invalid path")?) | ||
.with_context(|| format!("could not create temporary file at '{dest}'"))?; | ||
|
||
std::io::copy(&mut src_file, &mut tempfile).context("failed to copy")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name implies move but we're doing a copy here. Can we attempt a file move at least when the source and target files are under the same parent directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, move is probably a little misleading for now, but this function will be used by both config-manager directly and by tedge-write, which reads from stdin, so the function will be renamed to write_...
in a follow-up PR
tempfile | ||
.as_file() | ||
.set_permissions(std::fs::Permissions::from_mode(target_permissions.mode)) | ||
.context("failed to set mode on the destination file")?; | ||
|
||
fchown( | ||
tempfile.as_file(), | ||
Some(target_permissions.uid), | ||
Some(target_permissions.gid), | ||
) | ||
.context("failed to change ownership of the destination file")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have been done with tedge_utils::file::PermissionEntry::apply()
as well, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK PermissionEntry::apply()
uses a chown
, but since we have a file descriptor open and want all these operations finished before the final rename, I decided to use fchown
, which should complete before the file descriptor is closed.
There is also one problem with PermissionEntry::apply()
when used in this context:
Since we're doing an atomic write as either tedge
or root
, the temporary file will be owned by tedge
or root
but the target configuration file will likely be owned by some other user, so most likely we will always have to change file owner. We want the atomic write to look like a normal write so it has to preserve original ownership no matter what it is.
But it is technically possible for a file to have uid/gid of a user that does not exist on the system. Perhaps the file was created by a program coming from a package that set up its own user, and the package was since uninstalled and the user was removed. Still in this case we still want to preserve original permissions, so we have to use original uid/gid directly. But PermissionEntry::apply()
uses a name, so if we wanted to use it here we'd need to look up the name of the owner of the file, which can fail.
So not using PermissionEntry::apply()
avoids an extra failure mode.
Ok(Permissions { uid, gid, mode }) | ||
} | ||
|
||
struct Permissions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not reuse tedge_utils::file::PermissionEntry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to use uid/gid directly instead of names, since doing so avoids an extra failure mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the test and it is doing the expected verification
Proposed changes
Only use
tedge-write
for privilege elevation after normal copy fails due toPermissionDenied
error and tedge-write is enabledPut more details into warn and error logs when deploying configuration files using
tedge-write
:Types of changes
Paste Link to the issue
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments