SDK retrieves skipped documents for a targeted multi-partition query on a sub-partitioned container when the query targets only one physical partition and ODE is enabled #4917

Open
alesk-kontent opened this issue Nov 28, 2024 · 6 comments

@alesk-kontent

alesk-kontent commented Nov 28, 2024

Describe the bug

When I run a targeted multi-partition query with the OFFSET LIMIT clause on a sub-partitioned container, the SDK retrieves skipped documents if the following conditions are met:

  • The query options include the first part of the partition key.
  • The query targets only one physical partition.
  • ODE is enabled.

To reproduce

I wrote a console application to reproduce the issue. Please download the solution, open the Program.cs file and specify a connection string to an Azure Cosmos DB for NoSQL account. The application creates a new database with a container Subpartitioned. It also upserts random data, runs a query (with and without a partition key in the request options) and displays relevant information. You can run the application several times; it will only create resources and upsert data once.

The container contains 1,500 documents (articles) with the following properties:

Id : string
CustomerId : Guid
Label : string
Title : string

The value of the Id property is a random GUID. The value of the CustomerId and Label properties is always the same. The value of the Title property is unique for each document.

The indexing policy is automatic and consistent. The partition key has two components, customerId and id, so there are 1,500 logical partitions, each containing one document. The maximum throughput is 20,000 RU/s, so the container has two physical partitions, one of which is empty.
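The partition counts above follow from simple arithmetic. The sketch below assumes the documented ceiling of 10,000 RU/s per physical partition; the function name is made up for illustration, and the service can also split partitions by storage size, which this ignores.

```python
import math
import uuid

# Assumption: each physical partition serves at most 10,000 RU/s.
MAX_RU_PER_PHYSICAL_PARTITION = 10_000


def min_physical_partitions(max_throughput_ru: int) -> int:
    """Lower bound on the physical partitions needed for the given RU/s."""
    return math.ceil(max_throughput_ru / MAX_RU_PER_PHYSICAL_PARTITION)


# One customer with 1,500 articles and a hierarchical key (customerId, id).
customer_id = str(uuid.uuid4())
partition_keys = {(customer_id, str(uuid.uuid4())) for _ in range(1500)}

print(min_physical_partitions(20_000))  # 2
print(len(partition_keys))              # 1500 distinct logical partitions
```

Every full key is distinct because the id component is a random GUID, while all keys share the same customerId prefix.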

The query is SELECT * FROM c WHERE c.customerId = @customerId ORDER BY c.title OFFSET 1000 LIMIT 500 and the application runs it twice:

  • Request options contain a partition key with the customer id.
  • Request options do not contain a partition key.

I can see the following results:

Container: Subpartitioned
Partition key: ["df165b31-7641-4664-9549-37862ed806ee"]
Request charge: 47,65
Retrieved documents: 1500

Container: Subpartitioned
Partition key:
Request charge: 14,71
Retrieved documents: 500

Expected behavior

Because the query targets only one physical partition, the request charge should be the same in both runs, and with ODE enabled the SDK should retrieve only 500 documents.

Actual behavior

With a partition key in the request options, the SDK also retrieves the skipped documents.
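One way to read the two counts (1500 versus 500) is the simplified model below. It is an assumption for illustration only, not the SDK's actual implementation, and both function names are hypothetical: when OFFSET/LIMIT is applied on the client, every targeted partition has to return its first offset + limit documents in sort order and the client discards the skipped ones; when the whole query runs in the backend against one partition, only the final page comes back.

```python
from typing import List, Tuple


def client_side_offset_limit(
    partitions: List[List[str]], offset: int, limit: int
) -> Tuple[List[str], int]:
    """Fetch the top (offset + limit) titles per partition, merge, then skip."""
    fetched: List[str] = []
    for docs in partitions:
        fetched.extend(sorted(docs)[: offset + limit])
    page = sorted(fetched)[offset : offset + limit]
    return page, len(fetched)


def server_side_offset_limit(
    docs: List[str], offset: int, limit: int
) -> Tuple[List[str], int]:
    """Apply OFFSET/LIMIT where the data lives; only the page is transferred."""
    page = sorted(docs)[offset : offset + limit]
    return page, len(page)


titles = [f"title-{i:04d}" for i in range(1500)]
partitions = [titles, []]  # two physical partitions, one of them empty

page_a, retrieved_a = client_side_offset_limit(partitions, 1000, 500)
page_b, retrieved_b = server_side_offset_limit(titles, 1000, 500)

print(retrieved_a, retrieved_b)  # 1500 500
print(page_a == page_b)          # True: same page, different transfer cost
```

Both paths produce the same page; the difference is how many documents travel to the client, which is what the request charge reflects.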

Environment summary

SDK Version: 3.46.0
OS Version: Windows 11 Enterprise (10.0.22631 Build 22631)

Additional context

It happens in both Direct and Gateway mode.

@alesk-kontent
Author

It's been over a month and still no response. Have you been able to reproduce the issue? What could be the cause? Is this an expected behavior?

@Pilchie
Member

Pilchie commented Jan 10, 2025

@adityasa - can someone from query take a look here?

@adityasa adityasa self-assigned this Jan 10, 2025
@adityasa
Contributor

adityasa commented Jan 10, 2025

@alesk-kontent thanks for reporting, we will look into this issue. We ran into a number of issues with ODE (originally related to partition split scenarios, but others as well), and as a result ODE has been turned off by default for a while. Moreover, the interaction of ODE with hierarchically partitioned containers needs further hardening. Therefore, the general recommendation for all customers is to turn off ODE in their workloads.

@adityasa adityasa assigned leminh98 and unassigned adityasa Jan 14, 2025
@adityasa
Contributor

@leminh98 - please account for this scenario when hardening the ODE code and tests.

@alesk-kontent
Author

@adityasa, thank you for the confirmation and valuable insight into the current state of ODE and hierarchical partitioning. This puts us in a difficult position because some customers need classic pagination, i.e. the offset limit clause. ODE helps us to reduce the cost of queries because in most cases the data is only stored in one physical partition and the SDK does not have to retrieve the skipped documents.

This also brings up an important question: how do we know that a problem has been fixed? For example, we reported this issue related to partition splitting. At our request, it was added to the list of known issues. Since we use hierarchical partitioning, we no longer include the partition key in the request options, as failures caused by partition splitting are unacceptable. Now the changelog states that the issue has been fixed in version 3.39.0. Does this mean that it is now safe to include the partition key in the request options and the SDK will always return the expected results when a partition splits?

The situation with ODE is similar. I assume you will close this issue as your recommendation is not to use ODE. But how can I, as a developer, decide if it is safe to use ODE? There is a known issue with ODE, but it is different from this one. Should we stop using ODE because there are other issues we are not aware of? Or can we continue to use ODE because there is only one known issue and it does not affect us?

I realize that it is difficult to answer these questions, because hierarchical partitioning seems to be quite difficult to get right. But the main problem seems to be a lack of insight into the current state of the SDK. There are bits and pieces of information scattered across GitHub issues and PRs, but no guidelines to point us in the right direction.

@adityasa
Contributor

@alesk-kontent - this is a great set of questions, and admittedly not with great answers. Here's how I see the current state and how (I think) we arrived here:

Specific to ODE - for a feature as important as ODE, the team would normally double down and implement forward fixes for any issues found. Unfortunately, due to the severity of the issues and the lack of quick fixes (for the whole class of issues), we decided not to leave users exposed to this feature for a longer period, and we had to make the very painful decision to turn the feature off. The full set of issues with ODE has not been uncovered yet. There is also a design issue related to continuation token compatibility, which leaves a subset of scenarios (those that require reusing continuation tokens after days) exposed and unrecoverable with ODE.
Since that would be a public API change, we cannot make the option internal to keep users from running into this.
That said, it is recommended not to use ODE until we are able to address these issues more holistically. <-- This is the general guidance related to ODE. I will add this general recommendation to the known issues list.

There are also cross-cutting features that heavily impact ODE, e.g. newer features such as hierarchical partitioning that have been developed and hardened alongside it (where there may be gaps in either feature related to the other scenario).

Specific to hierarchical partitioning - the feature is released, but the degree of sophistication varies across the stack (SDKs, services, backend, etc.). Specific to query, one part that can improve is partition elimination: it is not always on par with single-partitioned containers. Based on my read, this is because the query collateral was not made aware of hierarchical partitioning in the correct way (or from the ground up). Because of this, when it comes to it, we opt for progress first (as in the case of the infinite loop issue), followed by correctness (as part of the same change), over the most optimal solution (which generally happens later). This is why some queries execute in a cross-partition manner when it may seem unnecessary.
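To make "partition elimination" with a prefix key concrete, here is a toy model. Everything in it is invented for illustration (the 16-bit hash, the tuple-ordered key space, the range boundaries, and the function names); the service's real hashing and routing differ. The idea it sketches: a full key maps to a single point in the key space, while a prefix maps to a range, so a prefix can overlap one physical partition or several depending on where the split points fall.

```python
import hashlib
from typing import List, Sequence, Tuple

SPACE = 1 << 16  # toy hash space per key component
Point = Tuple[int, int]
Range = Tuple[Point, Point]  # half-open: [min, max)


def h(component: str) -> int:
    """Toy hash: first two bytes of SHA-256, not the real service hash."""
    digest = hashlib.sha256(component.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def target_partitions(key: Sequence[str], ranges: List[Range]) -> List[int]:
    """Indexes of physical partitions whose range overlaps the key's span."""
    hashes = [h(c) for c in key]
    if len(hashes) == 2:   # full key: a single point
        lo, hi = (hashes[0], hashes[1]), (hashes[0], hashes[1] + 1)
    else:                  # prefix: every key that starts with it
        lo, hi = (hashes[0], 0), (hashes[0] + 1, 0)
    return [i for i, (rmin, rmax) in enumerate(ranges) if lo < rmax and rmin < hi]


customer = "df165b31-7641-4664-9549-37862ed806ee"
h1 = h(customer)

# Split point on a first-component boundary: the prefix stays in one partition.
even_split = [((0, 0), (SPACE // 2, 0)), ((SPACE // 2, 0), (SPACE, 0))]
# Split point inside the customer's own range: the prefix spans both partitions.
inner_split = [((0, 0), (h1, SPACE // 2)), ((h1, SPACE // 2), (SPACE, 0))]

print(len(target_partitions([customer], even_split)))            # 1
print(len(target_partitions([customer], inner_split)))           # 2
print(len(target_partitions([customer, "doc-1"], inner_split)))  # 1
```

In this model an ideal planner would send the prefix query to a single partition in the first layout, which is the case the issue describes; a planner that does less elimination would fan out to every partition regardless.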
