-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add class RestProfile for local maintenance of users' REST parameters. #5463
base: main
Are you sure you want to change the base?
Conversation
22adc16
to
d285baa
Compare
tiledb/sm/rest/rest_profile.h
Outdated
void save(); | ||
|
||
/** Removes this profile from the local file. */ | ||
void remove(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer it this way.
void remove(); | |
static void remove(const std::string& name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments to begin with, I am reviewing actively so I'll be probably posting more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern around atomicity that we'd need to re-work. But keep me honest if I have misunderstood something.
if (std::filesystem::exists(filepath_)) { | ||
// Temporarily append filename with a random label to guarantee atomicity. | ||
filepath_ = filepath_ + random_label(); | ||
if (std::rename(original_filepath.c_str(), filepath_.c_str()) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a safe way to guarantee atomicity. What if we rename and then another process finds there is no file and recreates it? Then its racy, and we'll end up with partial data as well.
Couldn't we just use some mutex
for loading, saving and removing the file? Also I think this is a write after read case, and loading and saving should happen in the same atomic block for guaranteeing correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to synchronize across processes, we would need a named mutex, which is not available in the C++ standard library.
I'm not sure atomicity is important to have; modifying profiles is not a frequent operation either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I read that we cared about it in the design doc @bekadavis9 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few bits to untangle here, but the gist of it is that we're renaming backwards. I shall resist the urge to go too far down the rabbit hole but I will try and explain a bit to make sure we're all on the same page. Also, I'm gonna have to rely on @teo-tsirpanis et al to figure out how this applies to Windows APIs. I'm vaguely remember that there's an equivalent, but my Windows knowledge is slim to none.
First things first, we do want to be using rename here, but not quite like what exists in the PR. Updating the profile.json should always follow this process:
- Create a temporary file. Adding a random label is perfect here (though I'd probably add a
.
separator for aesthetic reasons) - Write new contents to temporary file:
a. open file
b. write json to file
c. fsync file (yes really)
d. close file - Rename temporary file to the expected filename
The reason this works is because rename
is guaranteed to be "atomic" which means that the inode pointed at by the hard link is swapped from one to the other. This notably doesn't mean "guaranteed" or "orderable" or anything beyond the very narrow of "it either happened or didn't, the contents of the destination file are either unchanged or exactly the contents of the temp file".
This means a few things:
- The profiles.json file will either exist or it won't. It'll never exist in a partially written state which gives resilience to things like a segfault or someone tripping on the power cord while the profile is being updated.
- If someone does have multiple processes updating this file we can still guarantee consistency of the file, which notably is not the same as guaranteeing the contents of the file. That's the best we can do though and it's a pattern that works fine for these sorts of things given the frequency of updates. Someone finding issue with this is most definitely Doing it Wrong®.
- That's pretty much it.
Some concrete suggestions for tidying up the PR:
- I'd create two simple methods that are "save_file" and "load_file". All I/O happens in these two methods and nowhere else.
- All of the "add/update/remove" logic is external to file handling. I noticed a few different comments around that logic which I haven't got much opinion on other than try and be internally consistent at least.
- That... is also pretty much it.
Also, I should probably note for anyone wondering, this approach to file handling also avoids issues on reads. And if you're wondering how, its because files don't have to be linked on disk to be readable or writable. This means you only have two states to worry about:
- The file doesn't exist. Fine, return an empty object.
- The file does exist, yay we got a reference to its inode.
Notably, step 2 doesn't care if that inode is replaced while its reading the file since the OS guarantees that the file will be readable (If memory serves Windows diverges here a bit in that you can't delete a file while its being read, absolutely no clue how that plays out here).
And one last note since I've already written this much: The bottom line here is to avoid corrupting that profile.json file where "not corrupt" means "if it exists, it contains a valid JSON object". We can't make guarantees about the contents of that file under concurrent writes, but we can at least guarantee that its valid. If I were to write a test for this, I'd spawn N threads that all do random mutations to this file for some largish number of iterations (i.e., ~1000 * N seems reasonable). Then another thread is spun up and all it does is attempt to repeatedly call load_file
or w/e as fast as possible ensuring that either the file doesn't exist (only at the very beginning of the test, once it exists, it should always exist until the user deletes it) or contains valid JSON.
Thank you for coming to my Ted Talk.
// If a profile of the given name already exists, remove it. | ||
if (data.contains(name_)) { | ||
data.erase(name_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the behavior we concluded we want? If we concluded that profiles should be immutable, I'd suggest that we just throw in this case, with an informative error that the profile with this name already exists, and if the user wish to replace it then he should first explicitly ask to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes around file update semantics.
if (std::filesystem::exists(filepath_)) { | ||
// Temporarily append filename with a random label to guarantee atomicity. | ||
filepath_ = filepath_ + random_label(); | ||
if (std::rename(original_filepath.c_str(), filepath_.c_str()) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few bits to untangle here, but the gist of it is that we're renaming backwards. I shall resist the urge to go too far down the rabbit hole but I will try and explain a bit to make sure we're all on the same page. Also, I'm gonna have to rely on @teo-tsirpanis et al to figure out how this applies to Windows APIs. I'm vaguely remember that there's an equivalent, but my Windows knowledge is slim to none.
First things first, we do want to be using rename here, but not quite like what exists in the PR. Updating the profile.json should always follow this process:
- Create a temporary file. Adding a random label is perfect here (though I'd probably add a
.
separator for aesthetic reasons) - Write new contents to temporary file:
a. open file
b. write json to file
c. fsync file (yes really)
d. close file - Rename temporary file to the expected filename
The reason this works is because rename
is guaranteed to be "atomic" which means that the inode pointed at by the hard link is swapped from one to the other. This notably doesn't mean "guaranteed" or "orderable" or anything beyond the very narrow of "it either happened or didn't, the contents of the destination file are either unchanged or exactly the contents of the temp file".
This means a few things:
- The profiles.json file will either exist or it won't. It'll never exist in a partially written state which gives resilience to things like a segfault or someone tripping on the power cord while the profile is being updated.
- If someone does have multiple processes updating this file we can still guarantee consistency of the file, which notably is not the same as guaranteeing the contents of the file. That's the best we can do though and it's a pattern that works fine for these sorts of things given the frequency of updates. Someone finding issue with this is most definitely Doing it Wrong®.
- That's pretty much it.
Some concrete suggestions for tidying up the PR:
- I'd create two simple methods that are "save_file" and "load_file". All I/O happens in these two methods and nowhere else.
- All of the "add/update/remove" logic is external to file handling. I noticed a few different comments around that logic which I haven't got much opinion on other than try and be internally consistent at least.
- That... is also pretty much it.
Also, I should probably note for anyone wondering, this approach to file handling also avoids issues on reads. And if you're wondering how, its because files don't have to be linked on disk to be readable or writable. This means you only have two states to worry about:
- The file doesn't exist. Fine, return an empty object.
- The file does exist, yay we got a reference to its inode.
Notably, step 2 doesn't care if that inode is replaced while its reading the file since the OS guarantees that the file will be readable (If memory serves Windows diverges here a bit in that you can't delete a file while its being read, absolutely no clue how that plays out here).
And one last note since I've already written this much: The bottom line here is to avoid corrupting that profile.json file where "not corrupt" means "if it exists, it contains a valid JSON object". We can't make guarantees about the contents of that file under concurrent writes, but we can at least guarantee that its valid. If I were to write a test for this, I'd spawn N threads that all do random mutations to this file for some largish number of iterations (i.e., ~1000 * N seems reasonable). Then another thread is spun up and all it does is attempt to repeatedly call load_file
or w/e as fast as possible ensuring that either the file doesn't exist (only at the very beginning of the test, once it exists, it should always exist until the user deletes it) or contains valid JSON.
Thank you for coming to my Ted Talk.
Add class
RestProfile
for local maintenance of users'REST
parameters.class RestProfile
is the first step toward configuration unification. This internal API stores a user'sREST
parameters after they invokeTileDB-Cloud-Py::login
upstream. Once a user has created aRestProfile
and set their desired parameters,RestProfile::save()
will serialize the entire profile onto the local, on-disk file,$HOME/.tiledb/profiles.json
.Previously, the upstream API would save users'
REST
credentials to a different local file,$HOME/.tiledb/cloud.json
. To maintain backward compatibility,RestProfile
will search for that file and inherit the parameters within, saving them to the new local file,$HOME/.tiledb/profiles.json
. Going forward, we expect users to interact only with the latter of the local files. TheTileDB-Cloud-Py
API will be updated to reflect this change.A
RestProfile
default-initializes its parameter-value mapping, and is therefore complete for saving upon creation. Attempting to create multiple profiles of the same name will overwrite the local file with the latest data. Neither local file should be changed by the unit tests, so the test wrapper uses aTemporaryLocalDirectory
as a pseudo-$HOME
. As such, the local files are written to/tmp
during the test and removed at the end of theTEST_CASE
.A followup PR will make use of this class in
class Config
.[sc-62705]
TYPE: NO_HISTORY
DESC: Add class
RestProfile
for local maintenance of users'REST
parameters.