Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Followup: Return TIMESTAMP columns as native Python datetime objects #437

Closed
wants to merge 23 commits into from

Conversation

amotl
Copy link
Member

@amotl amotl commented Jul 18, 2022

Dear @Aymaru,

apologies for the late reply.

On the patch you submitted at #395 the other day, I recently exercised #426 separately. Your other improvements from there have been converged into this very patch, which has been slightly cleaned up to reflect the changes without the gist of #426 and other spurious commits introduced by merging from the master branch.

I've tried to keep your original commits for now, but I might finally squash them together while working on the patch. The next steps are rebasing upon master and adding eventual fixup commits.

Feel free to also add additional comments or suggestions, specifically if you can spot a place where I missed to reflect the improvements from your original patch #395 correctly.

With kind regards,
Andreas.


Backlog

@staticmethod
def _transform_date_columns(row, gen_flags):
"""
Generates iterates over each value from a row and converts timestamps to pandas TIMESTAMP
Copy link
Contributor

@matriv matriv Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something more elegant with lambdas? (I have basic python knowledge though).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the generator here is fine. Let me know if you think otherwise.

Copy link
Member Author

@amotl amotl Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at this again, specifically after cleaning up a bit with f355164, I tend to agree with you. It is probably enough that _transform_result_types, which iterates all records, is a generator, in order to save memory.

On the other hand, _transform_datetime_columns, which iterates all columns per record, does not necessarily have to be a generator. I will keep the original patch unchanged on this matter yet, in order to not change too much prematurely, and will await the official review.

@amotl amotl force-pushed the amo/aymaru-datetime-format branch from db91b3d to ea4f9c7 Compare July 25, 2022 08:40
Comment on lines 87 to 88
if value < 0:
yield None
Copy link
Member Author

@amotl amotl Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this guard? Was it thought to protect against negative timestamps? In practice, they would translate to Python well, no?

>>> datetime.fromtimestamp(-2342342342415 / 1e3)
datetime.datetime(1895, 10, 10, 14, 20, 57, 585000)

@amotl amotl force-pushed the amo/aymaru-datetime-format branch from 2a9b4c2 to 6522e43 Compare July 26, 2022 15:00
Comment on lines -83 to -85
>>> (now - location.datetime_tz).seconds < 4
True

Copy link
Member Author

@amotl amotl Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this test case protect against anything significant? Currently, it croaks on my machine with a difference of 79600s (22.11 h), and I can not make much other sense of it. Most probably, I am missing the point.

Otherwise, the values of the datetime objects will depend on the time
zone setting of the system, making it difficult to compare
deterministically.
- Naming things
- Slight structural changes
- Use iterator instead of generator for column type flagging
- Improve inline documentation
@amotl amotl force-pushed the amo/aymaru-datetime-format branch from 6522e43 to f355164 Compare July 26, 2022 15:07

Currently, only converting to native `datetime` objects is implemented.
"""
datetime_column_types = [11, 15]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does someone have a resource at hand where those column code types are enumerated? The table at 1 coincidentally lists _timestamp without time zone as 1115, but here the code is apparently expecting two-digit integer numbers.

I will be happy to receive further pointers for better educating myself on this topic.

Footnotes

  1. https://crate.io/docs/crate/reference/en/5.0/interfaces/postgres.html#pg-type

Copy link
Member

@mfussenegger mfussenegger Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://crate.io/docs/crate/reference/en/5.0/interfaces/http.html#column-types

The code here will probably also need to handle arrays of timestamps.

Copy link
Member Author

@amotl amotl Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will add the conversion for container types as well.

Reading up on your reference, I am asking myself whether it is appropriate that type=15 (Unchecked object) is handled here as well?

break

if flag and value is not None:
value = datetime.fromtimestamp(value / 1e3)
Copy link
Member

@mfussenegger mfussenegger Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using utcfromtimestamp is the less error prone option here.

There are two scenarios:

The user stores UTC on the server:

  • fromtimestamp would be wrong, or at least confusing as it would convert to localtime but without attaching a timezone info.
  • utcfromtimestamp would at least not be wrong.

The user stores local timestamps on the server. Unless they also store the timezone in a separate field this is error prone but

  • utcfromtimestamp would at least preserve the value used on the server
  • fromtimestamp would be wrong, because it would apply another offset.

It's important that we don't attach a tzinfo given that we don't have one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could be to to let users configure the timezone on the client (with default UTC) and always create timezone aware objects.

The default could turn out to be wrong, but at least the user has an easy option to fix it, and it's easier to spot a incorrectly attached tzone then it is to figure out that a conversion happened and why/where some conversion happend.
It's also way more convenient/less error prone to continue working with tz aware datetimes.

Copy link
Member Author

@amotl amotl Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. With 3154d32, I've switched to use datetime.utcfromtimestamp instead.

Another option could be to to let users configure the timezone on the client (with default UTC) and always create timezone aware objects.

Shall we bring in the timezone awareness (#359, #361) with a subsequent patch right after this one?

@amotl
Copy link
Member Author

amotl commented Sep 29, 2022

@amotl amotl closed this Sep 29, 2022
@amotl amotl deleted the amo/aymaru-datetime-format branch September 29, 2022 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants