Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return TIMESTAMP columns as native Python datetime objects to improve support for Apache Superset #395

Closed
wants to merge 28 commits into from

Conversation

Aymaru
Copy link

@Aymaru Aymaru commented Feb 22, 2021

Summary of the changes / Why this is an improvement

Changing the return type from set to list on get_pk_constraint(), solves a compatibility issue from Apache Superset that doesn't show the metadata from CrateDB tables in the SQL Lab.

CrateDB types TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE, are returned as python datetime objects, correctly displaying date columns in charts and SQL Lab from Apache Superset.

Checklist

  • [x ] CLA is signed

This is to fix the apache superset bug, that doesn't show the cratedb table metadata in the sqleditor
This returns the data type of the columns in a query, to allow us to transform the dates from cratedb (long int) to python datetime
In execute(), transform the columns with type timestamp and timestamp without time zone to python datetime, this will correctly display dates in apache superset
@amotl
Copy link
Member

amotl commented Feb 23, 2021

Dear @Aymaru,

thanks a stack for your contribution. We will have a look and might come back to you with some comments.

With kind regards,
Andreas.

@amotl
Copy link
Member

amotl commented Mar 1, 2021

Dear @Aymaru,

first of all, thank you very much for your contribution again. I was able to take a more detailed look at the implementation right now and wanted to start a discussion around your patch.

Apologies that the CI job apparently didn't work when you submitted your patch. I enabled it again (see #396 (comment)) and just triggered it by pushing (and then removing) an empty commit to your PR.

The outcome is that the patch currently doesn't pass the tests. The spots are:

Will you be able to fix those? I also added some thoughts about the implementation itself.

With kind regards,
Andreas.

Comment on lines 58 to 66
col_types = self._result["col_types"]
tmp_data = self._result["rows"]

rows_to_convert = self._get_rows_to_convert_to_date(col_types)
tmp_data = self._convert_dates_to_datetime(tmp_data, rows_to_convert)

self._result["rows"] = tmp_data

self.rows = iter(self._result["rows"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can come up with a more elegant and efficient implementation. When trying to follow the code, I recognize that it builds up a list of boolean flags which designates which columns to apply the datetime conversion to, right?

Based on that information, the code will then apply the conversion function on each column while mapping each row into memory through the list() function.

I am a bit hesitant on this and wanted to share my specific concerns:

a) The code might allocate a significant amount of memory on larger results.
b) On a very large result set, it might even run out of memory.
c) When applying the conversion, there are currently no sanity checks around which test for NULL- or other emptyness. I am specifically looking at float(str(x)[0:10]) here. Also, there is no other exception handling either. We might want to improve on this aspect.

Maybe we can wrap that conversion into a generator function in order to keep the memory footprint low and, while being at it, add some more bells and whistles on the error handling side?

@amotl amotl changed the title Datetime format Return TIMESTAMP columns as native Python datetime objects to improve support for Apache Superset Mar 1, 2021
@amotl
Copy link
Member

amotl commented Mar 15, 2021

Dear @Aymaru,

thank you very much for continuing your work on this. Please let me know if you are still able to spend time for more iterations on it or whether you would like to see us taking over. On the patch itself, I will add some more comments.

With kind regards,
Andreas.

rows_to_convert = self._get_rows_to_convert_to_date(self._result["col_types"])
for flag in rows_to_convert:
if flag:
t_rows = (row for row in self._result["rows"])
Copy link
Member

@amotl amotl Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 61 will probably still load the full result into memory, right?

I was thinking about swapping in the generator right in between of self.rows = iter(self._result["rows"]), which is then responsible for converting between data types (timestamp epoch integer to Python datetime object).

In that manner, the pre-computation using _get_rows_to_convert_to_date() might become obsolete.

Comment on lines -264 to +265
return set(map(lambda el: el[0], rows))
return list(set(map(lambda el: el[0], rows)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't know about the specific background why there has been a set used here, can you elaborate why this should now return a list?

Copy link
Member

@amotl amotl Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @Aymaru,

apologies for the late reply. We've split off this amendment into #426. For you, it apparently solved this issue:

Changing the return type from set to list on get_pk_constraint(), solves a compatibility issue from Apache Superset that doesn't show the metadata from CrateDB tables in the SQL Lab.

With the patch at #426, we are observing a regression. Maybe due to the upgrade to SQLAlchemy 1.4, and maybe other changes in Apache Superset, this is not needed anymore?

With kind regards,
Andreas.

1373932800000,
datetime.datetime(2013, 7, 15, 18, 0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently raises an exception on CI, see [1]. The current code expects datetime.datetime(2013, 7, 16, 0, 0) here.

[1] https://github.com/crate/crate-python/runs/2114206441?check_suite_focus=true#step:5:388

@lgtm-com
Copy link

lgtm-com bot commented Mar 30, 2021

This pull request introduces 1 alert when merging 57ec090 into 71adf4d - view on LGTM.com

new alerts:

  • 1 for Unguarded next in generator

@lgtm-com
Copy link

lgtm-com bot commented Mar 30, 2021

This pull request introduces 1 alert when merging 9b82ac2 into 71adf4d - view on LGTM.com

new alerts:

  • 1 for Unguarded next in generator

@amotl
Copy link
Member

amotl commented Mar 31, 2021

Dear @Aymaru,

first things first: As I noticed recent activity from you on this patch again, I wanted to thank you for keeping up to it.

Further, I wanted to take the chance to tell you that we recently fixed some flaws within the test suite. Besides it now also works on macOS, more importantly, it received different fixes where the outcome of the software tests was indeterministic before. So, you might want to rebase your branch on top of the current master branch in order to also receive those improvements.

With kind regards,
Andreas.

P.S.: Because we updated the installation procedure (i.e. removing bootstrap.py), I am also humbly asking you to reevaluate the DEVELOP.rst file in order to be fully up to speed with the recent improvements.

@amotl
Copy link
Member

amotl commented Sep 29, 2022

Dear @Aymaru,

the main aspects of your patch have been converged into #442, #445, and crate/sqlalchemy-cratedb#89.

With #426, we evaluated that the specific detail of returning a list from get_pk_constraint creates a regression when used with Apache Superset, so we did not take it into further consideration. Please correct me if you think I am wrong.

With kind regards,
Andreas.

@amotl amotl marked this pull request as draft October 18, 2022 19:41
@amotl
Copy link
Member

amotl commented May 8, 2024

Hi again.

@hlcianfagna: Do you think that with your recent patch to fix the CrateDB adapter in Apache Superset, this issue can be considered as resolved, or do you think it is referring to a similar but different detail on datetime/timestamp compatibility matters?

@Aymaru: May we humbly ask you to check again, using the most recent versions of both Apache Superset, and CrateDB?

With kind regards,
Andreas.

@amotl
Copy link
Member

amotl commented Jun 17, 2024

Dear @hlcianfagna and @Aymaru,

we would like to thank you once again for your excellent improvements on this matter. It looks like Hernan's recent improvements made it into the upstream codebase of Apache Superset, so we may consider this issue as resolved?

@Aymaru: Can you verify if everything you aimed at works well with CrateDB now, when using a recent version of Apache Superset?

With kind regards,
Andreas.

@amotl
Copy link
Member

amotl commented Jun 26, 2024

Hi again. Based on those improvements, and that we didn't hear about any other flaws recently, we are closing this PR, in the assumption that everything works well in this regard.

Please let us know if you think anything is still wrong, best raised on the issue tracker of sqlalchemy-cratedb. Thank you again for your efforts on this!

@amotl amotl closed this Jun 26, 2024
@amotl amotl requested a review from hlcianfagna June 26, 2024 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants