Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types: Unlock supporting timezone-aware DateTime fields #22

Merged
merged 4 commits into from
Jun 25, 2024

Conversation

amotl
Copy link
Member

@amotl amotl commented Jan 15, 2024

About

Timezone-aware SQLAlchemy DateTime fields apparently are supported already. This patch just removes a corresponding assertion which prevented them from actually being used.

References

Originally, this is coming from a monkeypatch to meltano-target-cratedb. Most probably, it is also related to those requests:

Other than this, that other patch is also related:

Backlog

  • Software tests.
  • Changelog item.
  • Check if documentation needs to be updated.

@amotl amotl changed the title Types: Unlock supporting timezone-aware DateTime fields [TYPES] Unlock supporting timezone-aware DateTime fields Jan 15, 2024
@amotl amotl changed the title [TYPES] Unlock supporting timezone-aware DateTime fields Types: Unlock supporting timezone-aware DateTime fields Jan 15, 2024
@amotl amotl requested review from seut, matriv and surister June 22, 2024 20:38
@amotl amotl marked this pull request as ready for review June 22, 2024 20:39
@amotl amotl force-pushed the amo/timezone-aware branch 2 times, most recently from 4fb90ac to 347f2a7 Compare June 22, 2024 21:03
Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, thx.

CHANGES.md Outdated Show resolved Hide resolved
tests/datetime_test.py Outdated Show resolved Hide resolved
tests/datetime_test.py Outdated Show resolved Hide resolved
assert result["date"].year == 2009
assert result["datetime"].year == 2009
assert result["datetime"].tzname() is None
assert result["datetime"].timetz() == dt.time(19, 19, 30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any conversion, using the CST timezone, the tz is not set on the result (as it should be) but the value stored shouldn't it be converted to UTC first ? (so hour should be 18?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When inserting or updating records using timezone-aware Python date or datetime objects, timezone information will not be preserved.

-- https://cratedb.com/docs/sqlalchemy-cratedb/data-types.html

Shouldn't it be converted to UTC first?

I am not sure. We may want to validate how it works with PostgreSQL and its SQLAlchemy dialect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use timestamp WITH time zone as the column data type to make the conversion, otherwise the time zone information is just stripped:

cr> create table test(id string, tz timestamp with time zone);
CREATE OK, 1 row affected (0.359 sec)
cr> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('+2h', '2024-06-24 10:20:30.123+02:00');
INSERT OK, 1 row affected (0.009 sec)
cr> insert into test values('+3h', '2024-06-24 10:20:30.123+03:00');
INSERT OK, 1 row affected (0.018 sec)
cr> select * from test order by id;
+------+---------------+
| id   |            tz |
+------+---------------+
| +2h  | 1719217230123 |
| +3h  | 1719213630123 |
| UTC1 | 1719224430123 |
| UTC2 | 1719224430123 |
+------+---------------+
SELECT 4 rows in set (0.004 sec)
cr> drop table test;
DROP OK, 1 row affected (0.045 sec)
cr> create table test(id string, tz timestamp without time zone);
CREATE OK, 1 row affected (0.361 sec)
cr> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT OK, 1 row affected (0.017 sec)
cr> insert into test values('+2h', '2024-06-24 10:20:30.123+02:00');
INSERT OK, 1 row affected (0.018 sec)
cr> insert into test values('+3h', '2024-06-24 10:20:30.123+03:00');
INSERT OK, 1 row affected (0.017 sec)
cr> select * from test order by id;
+------+---------------+
| id   |            tz |
+------+---------------+
| +2h  | 1719224430123 |
| +3h  | 1719224430123 |
| UTC1 | 1719224430123 |
| UTC2 | 1719224430123 |
+------+---------------+
SELECT 4 rows in set (0.004 sec)

Can we have a test with time zone for the col type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, we follow the behavior of postgres:

matriv=> create table test(id text, tz timestamp without time zone);
CREATE TABLE
matriv=> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT 0 1
matriv=> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT 0 1
matriv=> insert into test values('+2hr', '2024-06-24 10:20:30.123+02:00');
INSERT 0 1
matriv=> insert into test values('+3hr', '2024-06-24 10:20:30.123+03:00');
INSERT 0 1
matriv=> select * from test;
  id  |           tz            
------+-------------------------
 UTC1 | 2024-06-24 10:20:30.123
 UTC2 | 2024-06-24 10:20:30.123
 +2hr | 2024-06-24 10:20:30.123
 +3hr | 2024-06-24 10:20:30.123
(4 rows)

matriv=> drop table test;
DROP TABLE
matriv=> create table test(id text, tz timestamp with time zone);
CREATE TABLE
matriv=> insert into test values('UTC1', '2024-06-24 10:20:30.123UTC');
INSERT 0 1
matriv=> insert into test values('UTC2', '2024-06-24 10:20:30.123');
INSERT 0 1
matriv=> insert into test values('+2hr', '2024-06-24 10:20:30.123+02:00');
INSERT 0 1
matriv=> insert into test values('+3hr', '2024-06-24 10:20:30.123+03:00');
INSERT 0 1
matriv=> select * from test;
  id  |             tz             
------+----------------------------
 UTC1 | 2024-06-24 13:20:30.123+03
 UTC2 | 2024-06-24 10:20:30.123+03
 +2hr | 2024-06-24 11:20:30.123+03
 +3hr | 2024-06-24 10:20:30.123+03
(4 rows)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions. I will add more test cases, and also validate the implementation using the whole set of possible datetime/timestamp column types.

Copy link
Member Author

@amotl amotl Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

619c991, added just now, starts using time zone Europe/Kyiv for testing, but it does not change anything on the test outcome.

Copy link
Member Author

@amotl amotl Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inspected the wire using sudo ngrep -d lo0 -Wbyline port 4200, and found that on the JSON level, serialization does not take the time zone into account when invoking the software test test_datetime_tz using pytest --no-cov -k test_datetime_tz?

The library just sends 2009-05-13T19:19:30.123456Z to the wire, effectively ignoring time zone information at data submission time already. It is clear that there is a flaw.

{"stmt": "INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES (?, ?, ?, ?)", "args": ["foo", "2009-05-13", "2009-05-13T19:19:30.123456Z", "2009-05-13T19:19:30.123456Z"]}

Copy link
Member Author

@amotl amotl Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to the observation above, like @matriv suggested, it looks like it works well when using direct SQL for querying.

DDL

SHOW CREATE TABLE doc.foobar;
CREATE TABLE IF NOT EXISTS "doc"."foobar" (    
   "name" TEXT NOT NULL,                       
   "date" TIMESTAMP WITHOUT TIME ZONE,         
   "datetime_notz" TIMESTAMP WITHOUT TIME ZONE,
   "datetime_tz" TIMESTAMP WITH TIME ZONE,     
   PRIMARY KEY ("name")                        
)                                              

DML

INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES ('foo', '2009-05-13', '2009-05-13T19:19:30.123456Z', '2009-05-13T19:19:30.123456Z');
INSERT INTO foobar (name, date, datetime_notz, datetime_tz) VALUES ('bar', '2009-05-13', '2009-05-13T19:19:30.123456+03:00', '2009-05-13T19:19:30.123456+03:00');

DQL

SELECT * FROM foobar;
+------+---------------+---------------+---------------+
| name |          date | datetime_notz |   datetime_tz |
+------+---------------+---------------+---------------+
| foo  | 1242172800000 | 1242242370123 | 1242242370123 |
| bar  | 1242172800000 | 1242242370123 | 1242231570123 |
+------+---------------+---------------+---------------+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this spirit, it is clear that there is a flaw, and the relevant spot can be discovered easily. 💥

return value.strftime('%Y-%m-%dT%H:%M:%S.%fZ')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

145f701 aims to get the improvement straight, test outcomes now reflect the relevant conversion implicitly performed by CrateDB.

@amotl amotl force-pushed the amo/timezone-aware branch 2 times, most recently from c7212b2 to 833328b Compare June 24, 2024 10:15
@amotl amotl requested a review from matriv June 24, 2024 10:18
@amotl amotl marked this pull request as draft June 24, 2024 12:46
@amotl amotl force-pushed the amo/timezone-aware branch 3 times, most recently from 4482076 to 009d233 Compare June 24, 2024 18:27
Comment on lines 62 to 59
INPUT_DATETIME_NOTZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123456)
INPUT_DATETIME_TZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123456, tzinfo=CST())
OUTPUT_DATE = INPUT_DATE
OUTPUT_TIME = dt.time(19, 19, 30, 123000)
OUTPUT_DATETIME_NOTZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123000)
OUTPUT_DATETIME_TZ = dt.datetime(2009, 5, 13, 19, 19, 30, 123000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit confusing that there isn't a difference when passing in a time zone or not. I guess this relates that when no time zone is given the local one which is also CST will be used in python? Inside CrateDB, all timestamps without timezone are treat as UTC. Shouldn't we then rather use an exotic one so the outputs aren't the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be happy to steer this patch into that direction. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this relates that when no time zone is given the local one which is also CST will be used in Python?

The root cause was different.

amotl and others added 3 commits June 25, 2024 13:29
... instead of `sa.DateTime` and `sa.TIMESTAMP`. Introduce
`visit_TIMESTAMP` from PGTypeCompiler to render SQL DDL clauses like
`TIMESTAMP WITH|WITHOUT TIME ZONE`.
@amotl amotl marked this pull request as ready for review June 25, 2024 12:58
@amotl amotl requested a review from seut June 25, 2024 12:58
dt = datetime.strptime(args[1], '%Y-%m-%dT%H:%M:%S.%fZ')
dt = datetime.strptime(args[1], '%Y-%m-%dT%H:%M:%S.%f')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can not use the %z suffix here. Otherwise:

ValueError: time data '2024-06-25T13:48:33.295461' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'

dt = datetime.strptime(args[2], '%Y-%m-%dT%H:%M:%S.%fZ')
dt = datetime.strptime(args[2], '%Y-%m-%dT%H:%M:%S.%f')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dito.

Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thx!

@amotl amotl merged commit fb4c7be into main Jun 25, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants