Add class RestProfile for local maintenance of users' REST parameters. #5463

bekadavis9 · 2025-02-19T18:55:12Z

Add class RestProfile for local maintenance of users' REST parameters.

class RestProfile is the first step toward configuration unification. This internal API stores a user's REST parameters after they invoke TileDB-Cloud-Py::login upstream. Once a user has created a RestProfile and set their desired parameters, RestProfile::save() will serialize the entire profile onto the local, on-disk file, $HOME/.tiledb/profiles.json.

Previously, the upstream API would save users' REST credentials to a different local file, $HOME/.tiledb/cloud.json. To maintain backward compatibility, RestProfile will search for that file and inherit the parameters within, saving them to the new local file, $HOME/.tiledb/profiles.json. Going forward, we expect users to interact only with the latter of the local files. The TileDB-Cloud-Py API will be updated to reflect this change.

A RestProfile default-initializes its parameter-value mapping, and is therefore complete for saving upon creation. Attempting to create multiple profiles of the same name will overwrite the local file with the latest data. Neither local file should be changed by the unit tests, so the test wrapper uses a TemporaryLocalDirectory as a pseudo-$HOME. As such, the local files are written to /tmp during the test and removed at the end of the TEST_CASE.

A followup PR will make use of this class in class Config.

[sc-62705]

TYPE: NO_HISTORY
DESC: Add class RestProfile for local maintenance of users' REST parameters.

tiledb/common/filesystem/home_directory.cc

tiledb/common/filesystem/home_directory.h

tiledb/sm/rest/rest_profile.h

tiledb/sm/rest/rest_profile.cc

teo-tsirpanis · 2025-02-19T20:18:04Z

tiledb/sm/rest/rest_profile.h

+  void save();
+
+  /** Removes this profile from the local file. */
+  void remove();


I would prefer it this way.

Suggested change

void remove();

static void remove(const std::string& name);

tiledb/sm/rest/rest_profile.cc

tiledb/sm/rest/test/unit_rest_profile.cc

tiledb/sm/rest/rest_profile.cc

tiledb/sm/rest/test/unit_rest_profile.cc

ypatia

A few comments to begin with, I am reviewing actively so I'll be probably posting more

tiledb/sm/rest/rest_profile.cc

ypatia

I have a concern around atomicity that we'd need to re-work. But keep me honest if I have misunderstood something.

tiledb/sm/rest/rest_profile.cc

tiledb/common/filesystem/home_directory.h

tiledb/sm/rest/rest_profile.cc

ypatia · 2025-02-26T11:25:11Z

tiledb/sm/rest/rest_profile.cc

+  if (std::filesystem::exists(filepath_)) {
+    // Temporarily append filename with a random label to guarantee atomicity.
+    filepath_ = filepath_ + random_label();
+    if (std::rename(original_filepath.c_str(), filepath_.c_str()) != 0) {


I don't think this is a safe way to guarantee atomicity. What if we rename and then another process finds there is no file and recreates it? Then its racy, and we'll end up with partial data as well.
Couldn't we just use some mutex for loading, saving and removing the file? Also I think this is a write after read case, and loading and saving should happen in the same atomic block for guaranteeing correctness.

In order to synchronize across processes, we would need a named mutex, which is not available in the C++ standard library.

I'm not sure atomicity is important to have; modifying profiles is not a frequent operation either way.

I thought I read that we cared about it in the design doc @bekadavis9 ?

A few bits to untangle here, but the gist of it is that we're renaming backwards. I shall resist the urge to go too far down the rabbit hole but I will try and explain a bit to make sure we're all on the same page. Also, I'm gonna have to rely on @teo-tsirpanis et al to figure out how this applies to Windows APIs. I'm vaguely remember that there's an equivalent, but my Windows knowledge is slim to none.

First things first, we do want to be using rename here, but not quite like what exists in the PR. Updating the profile.json should always follow this process:

Create a temporary file. Adding a random label is perfect here (though I'd probably add a . separator for aesthetic reasons)

Write new contents to temporary file:
a. open file
b. write json to file
c. fsync file (yes really)
d. close file

Rename temporary file to the expected filename

The reason this works is because rename is guaranteed to be "atomic" which means that the inode pointed at by the hard link is swapped from one to the other. This notably doesn't mean "guaranteed" or "orderable" or anything beyond the very narrow of "it either happened or didn't, the contents of the destination file are either unchanged or exactly the contents of the temp file".

This means a few things:

The profiles.json file will either exist or it won't. It'll never exist in a partially written state which gives resilience to things like a segfault or someone tripping on the power cord while the profile is being updated.

If someone does have multiple processes updating this file we can still guarantee consistency of the file, which notably is not the same as guaranteeing the contents of the file. That's the best we can do though and it's a pattern that works fine for these sorts of things given the frequency of updates. Someone finding issue with this is most definitely Doing it Wrong®.

That's pretty much it.

Some concrete suggestions for tidying up the PR:

I'd create two simple methods that are "save_file" and "load_file". All I/O happens in these two methods and nowhere else.

All of the "add/update/remove" logic is external to file handling. I noticed a few different comments around that logic which I haven't got much opinion on other than try and be internally consistent at least.

That... is also pretty much it.

Also, I should probably note for anyone wondering, this approach to file handling also avoids issues on reads. And if you're wondering how, its because files don't have to be linked on disk to be readable or writable. This means you only have two states to worry about:

The file doesn't exist. Fine, return an empty object.

The file does exist, yay we got a reference to its inode.

Notably, step 2 doesn't care if that inode is replaced while its reading the file since the OS guarantees that the file will be readable (If memory serves Windows diverges here a bit in that you can't delete a file while its being read, absolutely no clue how that plays out here).

And one last note since I've already written this much: The bottom line here is to avoid corrupting that profile.json file where "not corrupt" means "if it exists, it contains a valid JSON object". We can't make guarantees about the contents of that file under concurrent writes, but we can at least guarantee that its valid. If I were to write a test for this, I'd spawn N threads that all do random mutations to this file for some largish number of iterations (i.e., ~1000 * N seems reasonable). Then another thread is spun up and all it does is attempt to repeatedly call load_file or w/e as fast as possible ensuring that either the file doesn't exist (only at the very beginning of the test, once it exists, it should always exist until the user deletes it) or contains valid JSON.

Thank you for coming to my Ted Talk.

ypatia · 2025-02-26T11:32:38Z

tiledb/sm/rest/rest_profile.cc

+    // If a profile of the given name already exists, remove it.
+    if (data.contains(name_)) {
+      data.erase(name_);
+    }


Is this the behavior we concluded we want? If we concluded that profiles should be immutable, I'd suggest that we just throw in this case, with an informative error that the profile with this name already exists, and if the user wish to replace it then he should first explicitly ask to remove it.

tiledb/sm/rest/rest_profile.h

tiledb/sm/rest/rest_profile.cc

davisp

Some notes around file update semantics.

davisp · 2025-02-27T18:38:03Z

tiledb/sm/rest/rest_profile.cc

+  if (std::filesystem::exists(filepath_)) {
+    // Temporarily append filename with a random label to guarantee atomicity.
+    filepath_ = filepath_ + random_label();
+    if (std::rename(original_filepath.c_str(), filepath_.c_str()) != 0) {


A few bits to untangle here, but the gist of it is that we're renaming backwards. I shall resist the urge to go too far down the rabbit hole but I will try and explain a bit to make sure we're all on the same page. Also, I'm gonna have to rely on @teo-tsirpanis et al to figure out how this applies to Windows APIs. I'm vaguely remember that there's an equivalent, but my Windows knowledge is slim to none.

First things first, we do want to be using rename here, but not quite like what exists in the PR. Updating the profile.json should always follow this process:

Create a temporary file. Adding a random label is perfect here (though I'd probably add a . separator for aesthetic reasons)

Write new contents to temporary file:
a. open file
b. write json to file
c. fsync file (yes really)
d. close file

Rename temporary file to the expected filename

The reason this works is because rename is guaranteed to be "atomic" which means that the inode pointed at by the hard link is swapped from one to the other. This notably doesn't mean "guaranteed" or "orderable" or anything beyond the very narrow of "it either happened or didn't, the contents of the destination file are either unchanged or exactly the contents of the temp file".

This means a few things:

The profiles.json file will either exist or it won't. It'll never exist in a partially written state which gives resilience to things like a segfault or someone tripping on the power cord while the profile is being updated.

If someone does have multiple processes updating this file we can still guarantee consistency of the file, which notably is not the same as guaranteeing the contents of the file. That's the best we can do though and it's a pattern that works fine for these sorts of things given the frequency of updates. Someone finding issue with this is most definitely Doing it Wrong®.

That's pretty much it.

Some concrete suggestions for tidying up the PR:

I'd create two simple methods that are "save_file" and "load_file". All I/O happens in these two methods and nowhere else.

All of the "add/update/remove" logic is external to file handling. I noticed a few different comments around that logic which I haven't got much opinion on other than try and be internally consistent at least.

That... is also pretty much it.

Also, I should probably note for anyone wondering, this approach to file handling also avoids issues on reads. And if you're wondering how, its because files don't have to be linked on disk to be readable or writable. This means you only have two states to worry about:

The file doesn't exist. Fine, return an empty object.

The file does exist, yay we got a reference to its inode.

Notably, step 2 doesn't care if that inode is replaced while its reading the file since the OS guarantees that the file will be readable (If memory serves Windows diverges here a bit in that you can't delete a file while its being read, absolutely no clue how that plays out here).

And one last note since I've already written this much: The bottom line here is to avoid corrupting that profile.json file where "not corrupt" means "if it exists, it contains a valid JSON object". We can't make guarantees about the contents of that file under concurrent writes, but we can at least guarantee that its valid. If I were to write a test for this, I'd spawn N threads that all do random mutations to this file for some largish number of iterations (i.e., ~1000 * N seems reasonable). Then another thread is spun up and all it does is attempt to repeatedly call load_file or w/e as fast as possible ensuring that either the file doesn't exist (only at the very beginning of the test, once it exists, it should always exist until the user deletes it) or contains valid JSON.

Thank you for coming to my Ted Talk.

bekadavis9 requested review from ypatia, ihnorton and teo-tsirpanis February 19, 2025 18:55

Add class RestProfile for local maintenance of users' REST parameters.

d285baa

bekadavis9 force-pushed the rd/profile branch from 22adc16 to d285baa Compare February 19, 2025 19:57

teo-tsirpanis reviewed Feb 19, 2025

View reviewed changes

teo-tsirpanis reviewed Feb 20, 2025

View reviewed changes

tiledb/sm/rest/rest_profile.cc Outdated Show resolved Hide resolved

teo-tsirpanis and others added 4 commits February 20, 2025 23:51

Use SHGetKnownFolderPath.

625ffd0

Add missing ifdefs.

0c97dc7

Address comments.

1facd99

Add constructor overload to take in homedir.

70a2ccf

teo-tsirpanis reviewed Feb 24, 2025

View reviewed changes

tiledb/sm/rest/test/unit_rest_profile.cc Outdated Show resolved Hide resolved

Address comment.

c6aca03

ypatia reviewed Feb 25, 2025

View reviewed changes

tiledb/sm/rest/rest_profile.cc Show resolved Hide resolved

tiledb/sm/rest/rest_profile.cc Outdated Show resolved Hide resolved

tiledb/sm/rest/rest_profile.cc Outdated Show resolved Hide resolved

ypatia reviewed Feb 26, 2025

View reviewed changes

Address most comments, still locally investigating some.

f4f192a

kounelisagis self-requested a review February 27, 2025 00:48

Revert minor change, remove HomeDirectory class and optional behavior.

04de830

davisp reviewed Feb 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add class RestProfile for local maintenance of users' REST parameters. #5463

Add class RestProfile for local maintenance of users' REST parameters. #5463

bekadavis9 commented Feb 19, 2025 •

edited

Loading

teo-tsirpanis Feb 19, 2025

ypatia left a comment

ypatia left a comment

ypatia Feb 26, 2025

teo-tsirpanis Feb 26, 2025

ypatia Feb 26, 2025

davisp Feb 27, 2025

ypatia Feb 26, 2025

davisp left a comment

davisp Feb 27, 2025

Add class RestProfile for local maintenance of users' REST parameters. #5463

Are you sure you want to change the base?

Add class RestProfile for local maintenance of users' REST parameters. #5463

Conversation

bekadavis9 commented Feb 19, 2025 • edited Loading

teo-tsirpanis Feb 19, 2025

Choose a reason for hiding this comment

ypatia left a comment

Choose a reason for hiding this comment

ypatia left a comment

Choose a reason for hiding this comment

ypatia Feb 26, 2025

Choose a reason for hiding this comment

teo-tsirpanis Feb 26, 2025

Choose a reason for hiding this comment

ypatia Feb 26, 2025

Choose a reason for hiding this comment

davisp Feb 27, 2025

Choose a reason for hiding this comment

ypatia Feb 26, 2025

Choose a reason for hiding this comment

davisp left a comment

Choose a reason for hiding this comment

davisp Feb 27, 2025

Choose a reason for hiding this comment

bekadavis9 commented Feb 19, 2025 •

edited

Loading