Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ibis bokeh datetime axis #5522

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

MarcSkovMadsen
Copy link
Collaborator

Closes #5521

This is my first real attempt at a contribution to HoloViews. So please be nice to me 😄

I hope an experienced core contributor can show me how this should really be solved or point me in the right direction. It a bit hard to do a real unit test because the functions contains so much code.

@MarcSkovMadsen MarcSkovMadsen added this to the 5521 milestone Nov 26, 2022
@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2022

Codecov Report

Merging #5522 (d66dd2e) into master (0317237) will decrease coverage by 0.50%.
The diff coverage is 12.28%.

@@            Coverage Diff             @@
##           master    #5522      +/-   ##
==========================================
- Coverage   88.07%   87.57%   -0.51%     
==========================================
  Files         302      302              
  Lines       62283    62335      +52     
==========================================
- Hits        54855    54589     -266     
- Misses       7428     7746     +318     
Impacted Files Coverage Δ
holoviews/core/util.py 85.91% <ø> (ø)
holoviews/tests/core/data/test_ibisinterface.py 4.80% <10.63%> (-75.69%) ⬇️
holoviews/core/data/ibis.py 31.14% <20.00%> (-48.32%) ⬇️
holoviews/core/data/util.py 53.57% <0.00%> (-5.36%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@MarcSkovMadsen
Copy link
Collaborator Author

MarcSkovMadsen commented Nov 27, 2022

I could see that after my change a CategoricalAxis would not be correctly identified.

I have tried for 1½ hours now to fix, but the function _axes_props is doing way too much and there is no documentation. I don't have a chance to fix. I give up.

@MarcSkovMadsen MarcSkovMadsen modified the milestones: 5521, 1.15.3 Nov 27, 2022
@MarcSkovMadsen
Copy link
Collaborator Author

MarcSkovMadsen commented Nov 27, 2022

Ok. Wasted another hour. I can that this function does not work for the ibis backend either.

image

And it is so long and complicated. And its even changing the specified parameters ranges in place. Makes it quite hard to reason about.

@@ -111,7 +111,13 @@ def nonzero(cls, dataset):
@cached
def range(cls, dataset, dimension):
dimension = dataset.get_dimension(dimension, strict=True)
if cls.dtype(dataset, dimension).kind in 'SUO':
if cls.dtype(dataset, dimension).kind == 'O':
# Can this be done more efficiently?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your previous implementation was better. This will be slow if the database table has a billion rows.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the previous implementation did not produce the expected results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a problem with ibis. I will probably add an if statement to your previous implementation, checking if the output is a pd.series and take the first value if it is.

import os
import sqlite3

import ibis
import pandas as pd


def create_sqlite(df, name):
    filename = "tmp.sqlite"
    os.remove(filename)
    con = sqlite3.Connection(filename)
    df.to_sql(name, con, index=False)
    return ibis.sqlite.connect(filename).table(name)


df = pd.DataFrame(
    {"string": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]},
)
sqlite_table = create_sqlite(df, "table")
pandas_table = ibis.pandas.connect({"df": df}).table("df")

pandas_table["string"], sqlite_table["string"]
pandas_table["string"].first().execute()
sqlite_table["string"].first().execute()

image

Copy link
Member

@philippjfr philippjfr Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to figure out that we are working with a datetime column? This approach for getting the first and last value really should be removed entirely. Indeed at some point I planned to consistently return None for all non-numeric and non-datetime types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for a datetime column but for a categorical column.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcSkovMadsen can correct me. But the fix for the datetime column was done by adding dimension_type to the class.

Copy link
Member

@philippjfr philippjfr Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to fully explain, range is really an operation that should be reserved for numeric and datetime values. Early on I decided that in the categorical/string/object case I would simply grab the first/last value in the array. This was fine when the only interfaces we had were in-memory but for lazy interfaces like SQL or dask backed data the behavior is often ill-defined or actually quite expensive to compute. So the correct thing going forward is to simply abort early and return (None, None) if the column is categorical. The remaining problem is that some interfaces do not correctly declare datetime columns, reporting an object dtype. So we will still need to handle that case correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply don't understand what I should do here to resolve

Copy link
Member

@philippjfr philippjfr Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's understandable, I was just trying to provide context and didn't have any prescription for what should be done.

My question is whether the change to dimension_type is sufficient to fix the issue? If so I would suggest reverting the change to range and eventually removing this conditional branch entirely.

Copy link
Collaborator Author

@MarcSkovMadsen MarcSkovMadsen Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not enough. Both the DateTime and Categorical axis where broken. Needs two different fixes

holoviews/tests/core/data/test_ibisinterface.py Outdated Show resolved Hide resolved
holoviews/core/util.py Show resolved Hide resolved
holoviews/core/data/ibis.py Outdated Show resolved Hide resolved
@hoxbro hoxbro modified the milestones: 1.15.3, 1.15.x Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ibis Timestamp not displaying as datetime axis
4 participants