
CP-52226: Implement XS_DIRECTORY_PART call #2

Merged

Conversation


@last-genius last-genius commented Nov 12, 2024

CP-52226 - oxenstored: Implement partial directory call for dom0 with per-connection cache

Implements XS_DIRECTORY_PART (originally specified in
https://lore.kernel.org/xen-devel/[email protected]/).
Like in CXenstored, it's limited to dom0. It is intended to list
directories larger than the size of a single payload (4096 bytes) over
several calls (not necessarily inside a single transaction).

To alleviate the impact of the differences in how a node's children are
represented between the C and OCaml Xenstoreds, this implements a
per-connection cache that allows the string representation of a node's
children to be generated once and reused later, as long as the underlying
list of children doesn't change (and the connection is reused).

Whereas CXenstored uses a global generation count, modified at
transaction creation and on node changes, the OCaml version relies on
comparisons of immutable trees. A per-node generation count is instead
faked in the connection cache; it lets clients know that the
underlying information has changed and that they should restart the
sequence of partial directory calls to get a consistent picture.
(That is, while the generation count returned by CXenstored is globally
unique, this is not useful for this particular function, since the
generation count is only compared against the results of previous
invocations of partial directory calls on the same node.)

The cache is refreshed by OXenstored itself, and is limited to 8 entries
(by default, configurable).

String handling and allocation are a direct translation of the C code.
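As a rough illustration of the caching scheme described above (all names and types here are hypothetical, not the actual oxenstored code): each cache entry pairs the flattened children string with a faked generation count that is bumped whenever the children change, and the cache is capped at a small number of entries.

```ocaml
(* Illustrative sketch only: a per-connection directory cache keyed by
   path, holding the flattened children string and a faked per-node
   generation count. Names do not match the real oxenstored code. *)

type cache_entry = {
  generation : int64 ;  (* faked generation count for this node *)
  children : string ;   (* NUL-separated child names, built once *)
}

let max_entries = 8  (* default cache size, per the description above *)

let cache : (string, cache_entry) Hashtbl.t = Hashtbl.create max_entries

(* Look up [path]; reuse the entry if the node's children are unchanged,
   otherwise rebuild it with a bumped generation count. *)
let lookup path current_children =
  match Hashtbl.find_opt cache path with
  | Some e when e.children = current_children -> e
  | prev ->
      let generation =
        match prev with
        | Some e -> Int64.add e.generation 1L
        | None -> 1L
      in
      (* Simplistic eviction for the sketch: drop everything when full. *)
      if Hashtbl.length cache >= max_entries then Hashtbl.reset cache ;
      let e = { generation; children = current_children } in
      Hashtbl.replace cache path e ;
      e
```

A client compares the returned generation count against the one from its previous XS_DIRECTORY_PART call on the same node; a change tells it to restart the sequence.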


CP-52226 - oxenstored: Return XS_ERROR (E2BIG) on socket connections too

The original XSA fix only returned the XS_ERROR payload with E2BIG inside
for domU connections. libxenstore, however, expects an E2BIG response for
dom0 connections as well, for proper handling of partial directory calls.

In practice this removes the possibility for a "large packets" patch:
https://github.com/xenserver/xen.pg/blob/9cde7213128d777c4856d719c10945c052df0ce1/patches/oxenstore-large-packets.patch

It should now be superseded by the proper partial directory support.
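The behavioural change this commit describes can be sketched as follows (a minimal illustration with invented names, not the real oxenstored types): the oversized-reply check now returns E2BIG regardless of whether the connection is a domU ring or a dom0 socket.

```ocaml
(* Illustrative sketch: the E2BIG check no longer depends on the kind of
   connection. Previously only ring (domU) connections got E2BIG, while
   sockets could receive oversized replies (the "large packets" patch
   relied on that). All names here are hypothetical. *)

let max_payload = 4096

type connection = Domain of int | Socket

let check_reply_size ~conn ~len =
  ignore conn ;  (* the connection kind is deliberately not consulted *)
  if len > max_payload then Error "E2BIG" else Ok len
```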

====

This has been tested on large directories: OXenstored used by C clients (which did not know about partial packets and did not handle larger packets) no longer errors out and complies with the existing protocol:

```sh
xenstore-write /test "test"

for i in `seq 100 500`
do
    xenstore-write /test/entry_with_very_long_name_$i $i
done

xenstore-ls
xenstore-rm /test
```

I've also confirmed that the generation count is incremented on changes to a node's children list between calls, and that the cache does indeed serve its purpose.

@last-genius
Contributor Author

Client support for this call in mirage/ocaml-xenstore will come separately.

@last-genius
Contributor Author

NOTE: the cache size is said to be configurable but isn't yet - I wasn't sure whether it's worth doing and, if so, through what means. Whatever the size, it doesn't stop any calls from making progress; it just makes them potentially slower (if you are running X+1 chains of calls on different nodes at the same time).

```ocaml
let child_length = ref 0 in
let i = ref offset in
let cache_length = String.length children in
while !buffer_length + !child_length < maxlen && !i < cache_length do
```
Contributor


This could be written as a fold to avoid the mutations. I think the goal is to add items, but only entire items as long as they fit, right?

Contributor Author


I've thought about expressing this in a nicer way, but Array.fold, iter, and the like are implemented with mutation anyway, and I need to mutate several things, so the niceness wouldn't really be maintained anyhow.
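For comparison with the reviewer's suggestion, a fold over a child list can express the same "take whole items while they fit" logic without refs. This is a standalone sketch with invented names, not the actual buffer-filling code, and it glosses over the blitting into the reply buffer:

```ocaml
(* Illustrative sketch: accumulate whole children as long as they fit in
   [maxlen] bytes (counting a NUL separator per child), stopping at the
   first child that does not fit so the taken prefix stays contiguous. *)

let take_while_fits ~maxlen children =
  let taken, _len, _stopped =
    List.fold_left
      (fun (acc, len, stopped) child ->
        let len' = len + String.length child + 1 (* NUL separator *) in
        if stopped || len' > maxlen then (acc, len, true)
        else (child :: acc, len', false))
      ([], 0, false) children
  in
  List.rev taken
```

The extra `stopped` flag is what makes this a direct replacement for the while-loop: a plain fold would keep scanning and could admit a later, shorter child out of order.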

```ocaml
| Some (_, generation_count, _, _) ->
    Int64.add generation_count 1L
| None ->
    1L
```
Contributor


I think we need to remember what the largest previous value was and start there, or find some other way to ensure it is unique. Otherwise, if we have to evict something from the cache (because the size got exceeded) and then re-add it, it'll go back to 1 and the client may not notice that it has changed.

We could store a highest-evicted counter, a counter of how many operations we've done on the connection cache, etc.

Contributor Author


I can do that, but then if the cache entry was evicted, the client will be forced to restart the sequence of calls, as the generation count will have some new value. This is better than being oblivious to changes in between calls, of course.

Contributor Author


@edwintorok could you please review the just-pushed fixup commit? I've created a per-connection generation count, so being pushed out of the cache will increment the generation count and force the client to restart the sequence of calls.
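The fixup described above can be sketched roughly as follows (illustrative names only, not the merged code): a single monotonic counter lives on the connection and is bumped on every cache insertion, so an entry that is evicted and later re-added gets a fresh, never-reused generation count.

```ocaml
(* Illustrative sketch of a per-connection generation counter: eviction
   followed by re-insertion yields a strictly larger generation count,
   forcing the client to restart its sequence of partial calls. *)

type conn = {
  mutable next_generation : int64 ;             (* per-connection counter *)
  cache : (string, int64 * string) Hashtbl.t ;  (* path -> (gen, children) *)
}

let make_conn () = { next_generation = 1L; cache = Hashtbl.create 8 }

let insert conn path children =
  let gen = conn.next_generation in
  conn.next_generation <- Int64.add gen 1L ;
  Hashtbl.replace conn.cache path (gen, children) ;
  gen
```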

```ocaml
in
String.blit generation_s 0 buffer 0 genlen ;
String.blit children offset buffer genlen !buffer_length ;
if last_chunk then
```
Contributor


Should we remove the cached entry if this was the last chunk?

Contributor Author


If the connection is kept and another sequence of calls is issued, it's still useful.

@last-genius last-genius changed the title Implement XS_DIRECTORY_PART call CP-52226: Implement XS_DIRECTORY_PART call Nov 12, 2024
Comment on lines 241 to 251
```ocaml
; transactions= Hashtbl.create 5
; directory_cache
; next_tid= initial_next_tid
; watches= Hashtbl.create 8
```


Is the size here indicating an intent? It's an implementation concern, but giving an initial size of < 16 is basically redundant: these will all start with the same initial size.

@last-genius
Contributor Author

These changes passed Ring3 BST+BVT tests (Run ID #207701)

@last-genius last-genius force-pushed the private/asultanov/partial_directory branch from 427fbe7 to 7b90ff6 Compare December 5, 2024 09:38
@last-genius
Contributor Author

last-genius commented Dec 5, 2024

Still passing BST+BVT with the fixup commit ✅

@lindig
Contributor

lindig commented Dec 11, 2024

Are there outstanding changes or reasons not to approve this? @edwintorok @psafont

Contributor

@edwintorok edwintorok left a comment


The per-connection directory generation count looks OK now.

@last-genius last-genius force-pushed the private/asultanov/partial_directory branch from 7b90ff6 to 623e263 Compare December 12, 2024 08:03
@last-genius
Contributor Author

Squashed the fixup commit, merging

@last-genius last-genius merged commit 4ec8a41 into xapi-project:main Dec 12, 2024
2 checks passed