Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return TIMESTAMP columns as native Python datetime objects to improve support for Apache Superset #395

Closed
wants to merge 28 commits into from
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
0212487
return a list instead of a set on get_pk_constraint
Aymaru Feb 22, 2021
f9bd08c
Changed the sql_path
Aymaru Feb 22, 2021
66e7b51
Transform dates from crate to python datetime
Aymaru Feb 22, 2021
3dc5cb7
Datetime conversion implemented using map and generator
Aymaru Mar 2, 2021
8dfcf80
Updated tests
Aymaru Mar 2, 2021
b08cb16
Using generators to work with large datasets
Aymaru Mar 3, 2021
efe8bfb
changed tests
Aymaru Mar 3, 2021
2d1f41f
fix
Aymaru Mar 3, 2021
246d917
test
Aymaru Mar 3, 2021
94b3d1e
updated datetime transformation using generators and updated test cases
Aymaru Mar 4, 2021
85b7c7f
cleaning debug prints
Aymaru Mar 4, 2021
2cf32e2
Passing a generator of the flags instead of passing the list of values
Aymaru Mar 4, 2021
fbef448
Removed tests
Aymaru Mar 4, 2021
3b66626
Merge branch 'master' into datetime-format
Aymaru Mar 4, 2021
57ec090
updated conversion of timestamps
Aymaru Mar 30, 2021
9b82ac2
Added pandas dependency
Aymaru Mar 30, 2021
5b4d5ec
Changed pandas timestamp to python datetime && deleted pandas dependecy
Aymaru Mar 30, 2021
e9788ea
fixed - E226 missing whitespace around arithmetic operator
Aymaru Mar 30, 2021
eebd670
Changed yield value
Aymaru Mar 31, 2021
187ed3f
Adjust badges, bump SQLAlchemy test version and add "workflow_dispatch"
amotl Feb 23, 2021
e3b8d50
updated conversion of timestamps
Aymaru Mar 30, 2021
32a4826
Added pandas dependency
Aymaru Mar 30, 2021
2c9e3aa
Changed pandas timestamp to python datetime && deleted pandas dependecy
Aymaru Mar 30, 2021
2adf833
fixed - E226 missing whitespace around arithmetic operator
Aymaru Mar 30, 2021
9bc5f51
Changed yield value
Aymaru Mar 31, 2021
8e53da5
Merge branch 'master' into datetime-format
Aymaru Mar 31, 2021
5ddced3
Merge remote-tracking branch 'origin/datetime-format' into datetime-f…
Aymaru Mar 31, 2021
aaaf7ee
Validate date type in processors
Aymaru Mar 31, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions src/crate/client/cursor.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
from .exceptions import ProgrammingError
from distutils.version import StrictVersion
import warnings
from datetime import datetime

BULK_INSERT_MIN_VERSION = StrictVersion("0.42.0")

Expand Down Expand Up @@ -53,8 +54,35 @@ def execute(self, sql, parameters=None, bulk_parameters=None):
self._result = self.connection.client.sql(sql, parameters,
bulk_parameters)
if "rows" in self._result:
if "col_types" in self._result:
col_types = self._result["col_types"]
tmp_data = self._result["rows"]

rows_to_convert = self._get_rows_to_convert_to_date(col_types)
tmp_data = self._convert_dates_to_datetime(tmp_data, rows_to_convert)

self._result["rows"] = tmp_data

self.rows = iter(self._result["rows"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can come up with a more elegant and efficient implementation. When trying to follow the code, I recognize that it builds up a list of boolean flags which designates which columns to apply the datetime conversion to, right?

Based on that information, the code will then apply the conversion function on each column while mapping each row into memory through the list() function.

I am a bit hesitant on this and wanted to share my specific concerns:

a) The code might allocate a significant amount of memory on larger results.
b) On a very large result set, it might even run out of memory.
c) When applying the conversion, there are currently no sanity checks around which test for NULL- or other emptyness. I am specifically looking at float(str(x)[0:10]) here. Also, there is no other exception handling either. We might want to improve on this aspect.

Maybe we can wrap that conversion into a generator function in order to keep the memory footprint low and, while being at it, add some more bells and whistles on the error handling side?


@staticmethod
def _get_rows_to_convert_to_date(col_types):
return [True if col_type == 11 or col_type == 15 else False for col_type in col_types]

@staticmethod
def _date_to_datetime(row, rows_to_convert):
return list(
map(lambda x, y:
datetime.fromtimestamp(float(str(x)[0:10])) if y else x,
row,
rows_to_convert))

def _convert_dates_to_datetime(self, rows, rows_to_convert):
return list(
map(lambda x:
self._date_to_datetime(x, rows_to_convert),
rows))

def executemany(self, sql, seq_of_parameters):
"""
Prepare a database operation (query or command) and then execute it
Expand Down
2 changes: 1 addition & 1 deletion src/crate/client/http.py
Original file line number Diff line number Diff line change
Expand Up @@ -315,7 +315,7 @@ class Client(object):
Crate connection client using crate's HTTP API.
"""

SQL_PATH = '/_sql'
SQL_PATH = '/_sql?types'
"""Crate URI path for issuing SQL statements."""

retry_interval = 30
Expand Down
5 changes: 3 additions & 2 deletions src/crate/client/sqlalchemy/dialect.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@
"smallint": sqltypes.SmallInteger,
"timestamp": sqltypes.TIMESTAMP,
"timestamp with time zone": sqltypes.TIMESTAMP,
"timestamp without time zone": sqltypes.TIMESTAMP,
"object": Object,
"integer": sqltypes.Integer,
"long": sqltypes.NUMERIC,
Expand All @@ -64,6 +65,7 @@
TYPES_MAP["smallint_array"] = ARRAY(sqltypes.SmallInteger)
TYPES_MAP["timestamp_array"] = ARRAY(sqltypes.TIMESTAMP)
TYPES_MAP["timestamp with time zone_array"] = ARRAY(sqltypes.TIMESTAMP)
TYPES_MAP["timestamp without time zone_array"] = ARRAY(sqltypes.TIMESTAMP)
TYPES_MAP["long_array"] = ARRAY(sqltypes.NUMERIC)
TYPES_MAP["bigint_array"] = ARRAY(sqltypes.NUMERIC)
TYPES_MAP["double_array"] = ARRAY(sqltypes.DECIMAL)
Expand All @@ -75,7 +77,6 @@
except Exception:
pass


log = logging.getLogger(__name__)


Expand Down Expand Up @@ -261,7 +262,7 @@ def get_pk_constraint(self, engine, table_name, schema=None, **kw):

def result_fun(result):
rows = result.fetchall()
return set(map(lambda el: el[0], rows))
return list(set(map(lambda el: el[0], rows)))
Comment on lines -264 to +269
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't know about the specific background why there has been a set used here, can you elaborate why this should now return a list?

Copy link
Member

@amotl amotl Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @Aymaru,

apologies for the late reply. We've split off this amendment into #426. For you, it apparently solved this issue:

Changing the return type from set to list on get_pk_constraint(), solves a compatibility issue from Apache Superset that doesn't show the metadata from CrateDB tables in the SQL Lab.

With the patch at #426, we are observing a regression. Maybe due to the upgrade to SQLAlchemy 1.4, and maybe other changes in Apache Superset, this is not needed anymore?

With kind regards,
Andreas.

else:
query = """SELECT constraint_name
FROM information_schema.table_constraints
Expand Down