Add BulkResponse wrapper for improved decoding of HTTP bulk responses #649

Draft
wants to merge 4 commits into
base: master

amotl
Member

@amotl amotl commented Oct 2, 2024

About

CrateDB HTTP bulk responses include rowcount= items, signalling whether each bulk operation succeeded or failed.

  • success means rowcount=1
  • failure means rowcount=-2

https://cratedb.com/docs/crate/reference/en/latest/interfaces/http.html#error-handling
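As a rough illustration of the rule above, here is a minimal sketch that splits a decoded bulk result into successes and failures. The input shape mirrors the list used in this PR's test (`[{"rowcount": 1}, {"rowcount": -2}]`). The function name `count_outcomes` is hypothetical and not part of the driver, and treating every value other than `-2` as a success is an assumption.

```python
from typing import Any, Dict, List, Tuple

# Per the CrateDB HTTP documentation linked above, -2 marks a failed bulk operation.
FAILED_ROWCOUNT = -2


def count_outcomes(results: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (success_count, failed_count) for a decoded bulk result.

    Hypothetical helper: anything that is not -2 is counted as a success.
    """
    failed = sum(1 for item in results if item.get("rowcount") == FAILED_ROWCOUNT)
    return len(results) - failed, failed


results = [{"rowcount": 1}, {"rowcount": -2}]
success_count, failed_count = count_outcomes(results)
print(success_count, failed_count)
```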

References

The code comes from CrateDB Toolkit, but is generally usable beyond it.

@cla-bot cla-bot bot added the cla-signed label Oct 2, 2024
src/crate/client/test_result.py (outdated)
Member Author

tests.py is the entrypoint for zope.testing to discover test cases. Many utility functions had to be refactored away from here, in order to avoid circular imports.

@amotl amotl force-pushed the bulk-response-wrapper branch 3 times, most recently from 7ef36cd to ccbffd2 Compare October 2, 2024 21:06
@amotl amotl requested review from seut and matriv October 2, 2024 21:13
@amotl amotl marked this pull request as ready for review October 2, 2024 21:13
Contributor

@matriv matriv left a comment

Thx for improving the bulk response handling. Left some comments.

src/crate/client/result.py (outdated)
src/crate/client/test_result.py (outdated)
src/crate/client/test_result.py (outdated)
@amotl amotl requested a review from matriv October 3, 2024 12:47
@amotl amotl force-pushed the bulk-response-wrapper branch 2 times, most recently from 50347d4 to 433dfdf Compare October 3, 2024 13:03
src/crate/client/test_result.py (outdated)
CrateDB HTTP bulk responses include `rowcount=` items, either signalling
if a bulk operation succeeded or failed.

- success means `rowcount=1`
- failure means `rowcount=-2`

https://cratedb.com/docs/crate/reference/en/latest/interfaces/http.html#error-handling
self.assertEqual(result, [{"rowcount": 1}, {"rowcount": -2}])

# Verify decoded response.
bulk_response = BulkResponse(invalid_records, result)
Contributor

Looking at this more carefully, I don't like that BulkResponse is something you need to construct manually. Couldn't we directly return it from the insert execution, instead of a list of BulkResultItem?

Member Author

@amotl amotl Oct 6, 2024

Hi. I don't think we can do anything like this here, because the Python database driver must adhere to the Python Database API Specification, so the BulkResponse is just meant as an optional extension to it.
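To illustrate what an optional extension on top of the DB API could look like, the sketch below pairs the submitted records with the per-record results the driver already returns, leaving the cursor interface untouched. It is not the implementation from this PR: the class name `BulkResponseSketch` and all attribute names are hypothetical, and only the constructor argument order (records first, then results) follows the test excerpt above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class BulkResponseSketch:
    """Hypothetical wrapper around the records sent and the results received."""

    records: List[Dict[str, Any]]
    results: List[Dict[str, Any]]

    @property
    def failed_records(self) -> List[Dict[str, Any]]:
        # A record failed when the result at the same position reports rowcount=-2.
        return [
            record
            for record, result in zip(self.records, self.results)
            if result.get("rowcount") == -2
        ]

    @property
    def failed_count(self) -> int:
        return len(self.failed_records)

    @property
    def success_count(self) -> int:
        return len(self.results) - self.failed_count


# Made-up sample data; `results` has the shape asserted in the PR's test.
records = [{"id": 1, "name": "foo"}, {"id": None, "name": "bar"}]
results = [{"rowcount": 1}, {"rowcount": -2}]

response = BulkResponseSketch(records, results)
print(response.success_count, response.failed_count, response.failed_records)
```

Because the wrapper only consumes what `executemany` already hands back, code that never imports it keeps working unchanged.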

Member Author

@amotl amotl Oct 6, 2024

af32409, just added, provides a bit of documentation for that extension in the section about bulk operations.

Member

@mfussenegger mfussenegger left a comment

Could you elaborate a bit more on the motivation for this?

Seems to me as if getting the info without this is not really that much more difficult.

@amotl
Member Author

amotl commented Oct 7, 2024

Hi @mfussenegger,

The code comes from CrateDB Toolkit, but is generally usable beyond it.

The idea is to have a concise code representation over here, for the BulkProcessor component, by abstracting away the handling of magic numbers from user-side code.

Because CrateDB Toolkit is rather heavy, and I wanted to make it reusable, I wanted to add it elsewhere. Another option would be to use the now separate cratedb-sqlalchemy package. BulkProcessor depends on it anyway, but BulkResponse doesn't, and would provide a generic little convenience wrapper around CrateDB's special responses to bulk requests.

@mfussenegger
Member

> The idea is to have a concise code representation over here, for the BulkProcessor component, by abstracting away the handling of magic numbers from user-side code.

That kinda confirms my point in that there's not much more code without the BulkResponse wrapper:

 cursor = self.connection.execute(statement=statement, parameters=operation.parameters)
 self.connection.commit()
 # Fall back to an empty list when no bulk result was recorded.
 cratedb_bulk_result = getattr(cursor.context, "last_executemany_result", None) or []
 failed_records = 0
 success_count = 0
 for result in cratedb_bulk_result:
     # CrateDB reports rowcount=-2 for a failed bulk operation.
     if result["rowcount"] == -2:
         failed_records += 1
     else:
         success_count += 1
 self._metrics.count_success_total += success_count
 if self.progress_bar:
     self.progress_bar.update(n=success_count)

This to me would even be more readable as it avoids the additional indirection.

Also, given that the DB API spec says:

> Return values are not defined.

I think we could extend our return value - as long as we can keep it compatible with the existing one.
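One way to read "extend the return value while keeping it compatible" is a `list` subclass: existing callers still get a list of `{"rowcount": ...}` dicts, while new callers can ask for counts. This is only a sketch under that assumption. The class name `BulkResultList` and its properties are hypothetical and do not exist in the driver.

```python
from typing import Any, Dict, List


class BulkResultList(list):
    """Hypothetical list subclass that adds convenience counters.

    It behaves like the plain list of dicts returned today, so indexing,
    iteration, and equality checks in existing code keep working.
    """

    @property
    def failed_count(self) -> int:
        # rowcount=-2 marks a failed bulk operation.
        return sum(1 for item in self if item["rowcount"] == -2)

    @property
    def success_count(self) -> int:
        return len(self) - self.failed_count


plain: List[Dict[str, Any]] = [{"rowcount": 1}, {"rowcount": -2}]
result = BulkResultList(plain)

# Existing-style usage still works on the extended value.
print(result == plain, result[1]["rowcount"])
# New-style usage reads the counters directly.
print(result.success_count, result.failed_count)
```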

@amotl amotl marked this pull request as draft October 7, 2024 22:26