Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB paginator fails when partition key is of type number #2932

Open
twitu opened this issue Jul 16, 2021 · 9 comments · May be fixed by boto/botocore#3203
Open

DynamoDB paginator fails when partition key is of type number #2932

twitu opened this issue Jul 16, 2021 · 9 comments · May be fixed by boto/botocore#3203
Labels
bug This issue is a confirmed bug. dynamodb p2 This is a standard priority issue pagination

Comments

@twitu
Copy link

twitu commented Jul 16, 2021

Describe the bug
Pagination for dynamodb fails when the partition key is of type number.

This happens because the returned LastEvaluatedKey is of type Decimal and it cannot be json encoded (json.dumps fails). This happens in botocore.paginate.encode it tries to do convert the token to string.

This issue talks about the issue with Decimal types. The gist is that a Number type is returned like this { value: 3} -> { value: { N: Decimal("3")}}. This cannot be converted to string by json.

Steps to reproduce

  1. Create a table, choose Number as type of partition key, choose a sort key (anything is fine)
  2. Add two records with same partition key and different sort key
  3. Run a query similar to the following.
ddb_client = boto3.resource("dynamodb").meta.client
query_paginator = ddb_client.get_paginator("query")

page_iterator = query_paginator.paginate(
    TableName=<TABLE-NAME>,
    KeyConditionExpression=Key(<PK>).eq(<PK-VALUE>),
    ScanIndexForward=False,
    PaginationConfig = {
        MaxItems: 1,
        PageSize: 1,
    }
)

for page in page_iterator:
  for return_value in page["Items"]:
      print(return_value)

Expected behavior
Pagination should succeed. It should handle the Decimal type by itself.

@twitu twitu added the needs-triage This issue or PR still needs to be triaged. label Jul 16, 2021
@stobrien89
Copy link
Contributor

Hi @twitu,

Thanks for pointing this out. I can definitely see where this could cause some problems. Would you be able to provide debug logs by adding boto3.set_stream_logger('') to your code? Please redact any sensitive information, such as account numbers. Thanks!

@stobrien89 stobrien89 self-assigned this Jul 19, 2021
@stobrien89 stobrien89 added guidance Question that needs advice or information. response-requested Waiting on additional information or feedback. bug This issue is a confirmed bug. and removed needs-triage This issue or PR still needs to be triaged. guidance Question that needs advice or information. labels Jul 19, 2021
@github-actions
Copy link

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jul 26, 2021
@stobrien89 stobrien89 removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jul 27, 2021
@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Jul 27, 2021
@twitu
Copy link
Author

twitu commented Jul 29, 2021

I've added debug logs here. This is from running the function in a vanilla Python3.8 Lambda.

botocore.retryhandler [DEBUG] No retry needed.

botocore.hooks [DEBUG] Event after-call.dynamodb.Query: calling handler <bound method TransformationInjector.inject_attribute_value_output of <boto3.dynamodb.transform.TransformationInjector object at 0x7f8f4a706d30>>

[ERROR] TypeError: Object of type Decimal is not JSON serializable
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 158, in lambda_handler
    for page in page_iterator:
  File "/var/runtime/botocore/paginate.py", line 280, in __iter__
    self._truncate_response(parsed, primary_result_key,
  File "/var/runtime/botocore/paginate.py", line 424, in _truncate_response
    self.resume_token = next_token
  File "/var/runtime/botocore/paginate.py", line 230, in resume_token
    self._resume_token = self._token_encoder.encode(value)
  File "/var/runtime/botocore/paginate.py", line 65, in encode
    json_string = json.dumps(encoded_token)
  File "/var/lang/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/var/lang/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/var/lang/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/var/lang/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '

@twitu
Copy link
Author

twitu commented Jul 29, 2021

I have a specific use case where I want n records satisfying a filter condition. I don't really need the ability to paginate. Here's a partial workaround** for anyone who might be facing this issue.

In the pseudocode below I just set the MaxItem to the desired n in PaginationConfig i.e. I do not set the PageSize parameter. This makes the first page automatically itself yield the required number of values.


page_iterator = ddb_client.get_paginator("query").paginate(PaginationConfig = { "MaxItem": n }, **other_arguments)

for page in page_iterator:
  for return_value in page["Items"]:
    print(return_value)

@stobrien89
Copy link
Contributor

Hi @twitu,

Thanks for the additional information. Leaving this marked as a bug for now, as it appears #369 may need to be addressed before anything can be done about this.

@bwo
Copy link

bwo commented Aug 20, 2021

I don't see why that's the case. It's possible to handle encoding decimal.Decimal objects without making not using decimals a generally available feature, especially since the failing function, TokenEncoder.encode, is documented as producing an "opaque string". Presumably one could extend TokenEncoder._encode, or provide a subclass of JSONEncoder that handles decimals.

@bwo
Copy link

bwo commented Aug 20, 2021

 $ git diff                                                                                                            
diff --git a/botocore/paginate.py b/botocore/paginate.py                                                                                                                   
index b08c7ed8b..501f8bf74 100644
--- a/botocore/paginate.py
+++ b/botocore/paginate.py
@@ -15,6 +15,7 @@ from itertools import tee
 
 from botocore.compat import six
 
+from decimal import Decimal
 import jmespath
 import json
 import base64
@@ -27,6 +28,13 @@ from botocore.utils import set_value_from_jmespath, merge_dicts
 log = logging.getLogger(__name__)
 
 
+class DecimalEncoder(json.JSONEncoder):
+    def default(self, o):
+        if isinstance(o, Decimal):
+            return str(o)
+        return json.JSONEncoder.default(o)
+
+
 class TokenEncoder(object):
     """Encodes dictionaries into opaque strings.
 
@@ -52,7 +60,7 @@ class TokenEncoder(object):
         try:
             # Try just using json dumps first to avoid having to traverse
             # and encode the dict. In 99.9999% of cases this will work.
-            json_string = json.dumps(token)
+            json_string = json.dumps(token, cls=DecimalEncoder)
         except (TypeError, UnicodeDecodeError):
             # If normal dumping failed, go through and base64 encode all bytes.
             encoded_token, encoded_keys = self._encode(token, [])

seems to work

(ETA: well, you'd also want to update the second call to json.dumps.)

@stobrien89 stobrien89 removed their assignment Jan 7, 2022
@antoinejeannot
Copy link

Up. Also having this issue. Unfortunately, the fix proposed by @twitu does not work with a large number of items, since boto seems to self paginate. However, it does works for a small number of items.

Any update @stobrien89 ?
Seems like the #369 issue could last another 7 years :)

Anyway, thanks for your time and sweat.

@arunim2405
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a confirmed bug. dynamodb p2 This is a standard priority issue pagination
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants