DateTimeIndex error when tsmode=True #1631

Closed
3 tasks done
lpeti69 opened this issue Jul 26, 2024 · 6 comments · Fixed by #1635
Labels
bug 🐛 Something isn't working

@lpeti69

lpeti69 commented Jul 26, 2024

Current Behaviour

I get the following error:

ValueError: Date ordinal 27743050.0 converts to 77927-11-23T00:00:00.000000 (using epoch 1970-01-01T00:00:00), but Matplotlib dates must be between year 0001 and 9999.
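Some context on the message: Matplotlib represents dates as fractional days since an epoch (1970-01-01 here, as the message says), so an ordinal of 27743050.0 days lands tens of thousands of years in the future. As a worked check, and an observation rather than a confirmed diagnosis, 27743050 is exactly the number of minutes from that epoch to the first timestamp in the reproducer below (2022-10-01 00:10:00). That is consistent with a value counted in minutes being read as days somewhere before Matplotlib sees it. The standard-library sketch below verifies the arithmetic; the helper names are illustrative only.

```python
from datetime import datetime, timedelta

# Epoch quoted in the Matplotlib error message.
EPOCH = datetime(1970, 1, 1)


def days_since_epoch(ts: datetime) -> float:
    """Matplotlib-style date ordinal: fractional days since EPOCH."""
    return (ts - EPOCH) / timedelta(days=1)


def minutes_since_epoch(ts: datetime) -> float:
    """The same instant counted in minutes instead of days."""
    return (ts - EPOCH) / timedelta(minutes=1)


first_ts = datetime(2022, 10, 1, 0, 10)  # first row of the reproducer
print(days_since_epoch(first_ts))     # a valid ordinal, roughly 19266.007
print(minutes_since_epoch(first_ts))  # 27743050.0, the value in the error

# Read as days, 27743050 overflows Python's datetime (max year 9999),
# analogous to the year 0001-9999 limit Matplotlib enforces.
try:
    EPOCH + timedelta(days=27743050)
except OverflowError as exc:
    print("out of range:", exc)
```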

Expected Behaviour

Compiled & rendered report

Data Description

See code snippet.

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport

data = {
    'value': [1, 2, 3, 4],
    'datetime': ['2022-10-01 00:10:00', '2022-10-02 00:20:00', '2022-10-03 00:30:00', '2022-10-04 00:40:00']
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'], errors='raise')
df.set_index('datetime', inplace=True)

profile = ProfileReport(df, tsmode=True, title="Pandas Profiling Report")
profile.to_file("report.html")
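Running the reproducer's data preparation on its own suggests the input itself is well formed. The pandas-only sketch below (it does not call ydata-profiling) confirms that the index is a sorted DatetimeIndex and converts each timestamp to days since 1970-01-01, the epoch quoted in the error. Every value falls between 19266 and 19270, nowhere near the 27743050.0 that Matplotlib rejected. This is a sanity check on the data, not a fix.

```python
import pandas as pd

# Same input as the reproducer above.
data = {
    "value": [1, 2, 3, 4],
    "datetime": ["2022-10-01 00:10:00", "2022-10-02 00:20:00",
                 "2022-10-03 00:30:00", "2022-10-04 00:40:00"],
}
df = pd.DataFrame(data)
df["datetime"] = pd.to_datetime(df["datetime"], errors="raise")
df = df.set_index("datetime")

# The index is a sorted DatetimeIndex.
assert isinstance(df.index, pd.DatetimeIndex)
assert df.index.is_monotonic_increasing

# Express each timestamp as days since 1970-01-01 (Matplotlib's default epoch).
days = (df.index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)
print(list(days))  # values between 19266 and 19270, well inside years 0001-9999
```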

pandas-profiling version

latest (ydata-profiling 4.9.0, as listed under Dependencies)

Dependencies

pandas==1.5.3
pandas-profiling==3.6.6
ydata-profiling==4.9.0

OS

macOS

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@lpeti69
Author

lpeti69 commented Jul 26, 2024

Update:
I also tried not setting the datetime column as the index and passing it via the sortby parameter instead, but it did not help.

import pandas as pd
from ydata_profiling import ProfileReport

data = {
    'value': [1, 2, 3, 4],
    'datetime': ['2022-10-01 00:10:00', '2022-10-02 00:20:00', '2022-10-03 00:30:00', '2022-10-04 00:40:00']
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'], errors='raise')

profile = ProfileReport(df, tsmode=True, sortby="datetime", title="Pandas Profiling Report")
profile.to_file("report.html")

@fabclmnt fabclmnt added bug 🐛 Something isn't working and removed needs-triage labels Aug 1, 2024
@fabclmnt
Contributor

fabclmnt commented Aug 1, 2024

Hi @lpeti69,

Can you please share the Python version you are using, along with the other packages and versions you have installed?

@bruno2009

bruno2009 commented Aug 24, 2024

I am having the same problem.

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
pandas version: 2.2.2
ydata_profiling version: 0.0.dev0
from ucimlrepo import fetch_ucirepo 

import pandas as pd

from ydata_profiling import ProfileReport

# fetch dataset 
data = fetch_ucirepo(id=374) 

# Convert the data to a pandas DataFrame
df = pd.DataFrame(data=data.data.features)

re_express = r'(\d{4})-(\d{2})-(\d{2})(\d{2}):(\d{2}):(\d{2})'

# Replace using the pattern
df['date'] = df['date'].str.replace(re_express, r'\1-\2-\3 \4:\5:\6', regex=True)

# Convert the 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S', errors='raise')

# df['date_'] = pd.date_range(start='2016-01-11 17:00:00', end='2016-05-27 18:00:00', freq='10min')

# If the data has target labels, you can also add them to the DataFrame
df['target'] = data.data.targets

# Set the 'date' column as the index
df.set_index('date', inplace=True)

# Optionally, sort the DataFrame by the new datetime index
df.sort_index(inplace=True)

# Display the first few rows of the DataFrame
# df.head()

profile = ProfileReport(df, tsmode=True, title="Pandas Profiling Report")
profile.to_file("report.html")
# profile.to_notebook_iframe()
Summarize dataset: 100% [-------] 821/821 [04:13<00:00,  3.02it/s, Completed]
Generate report structure:   0% [------------] 0/1 [02:08<?, ?it/s]

{
	"name": "ValueError",
	"message": "Date ordinal 24208860.0 converts to 68251-08-11T00:00:00.000000 (using epoch 1970-01-01T00:00:00), but Matplotlib dates must be between year 0001 and 9999.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 2
      1 profile = ProfileReport(df, tsmode=True, title=\"Pandas Profiling Report\")
----> 2 profile.to_file(\"report.html\")

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/profile_report.py:379, in ProfileReport.to_file(self, output_file, silent)
    376         self.config.html.assets_prefix = str(output_file.stem) + \"_assets\"
    377     create_html_assets(self.config, output_file)
--> 379 data = self.to_html()
    381 if output_file.suffix != \".html\":
    382     suffix = output_file.suffix

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/profile_report.py:496, in ProfileReport.to_html(self)
    488 def to_html(self) -> str:
    489     \"\"\"Generate and return complete template as lengthy string
    490         for using with frameworks.
    491 
   (...)
    494 
    495     \"\"\"
--> 496     return self.html

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/profile_report.py:292, in ProfileReport.html(self)
    289 @property
    290 def html(self) -> str:
    291     if self._html is None:
--> 292         self._html = self._render_html()
    293     return self._html

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/profile_report.py:409, in ProfileReport._render_html(self)
    406 def _render_html(self) -> str:
    407     from ydata_profiling.report.presentation.flavours import HTMLReport
--> 409     report = self.report
    411     with tqdm(
    412         total=1, desc=\"Render HTML\", disable=not self.config.progress_bar
    413     ) as pbar:
    414         html = HTMLReport(copy.deepcopy(report)).render(
    415             nav=self.config.html.navbar_show,
    416             offline=self.config.html.use_local_assets,
   (...)
    424             version=self.description_set.package[\"ydata_profiling_version\"],
    425         )

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/profile_report.py:286, in ProfileReport.report(self)
    283 @property
    284 def report(self) -> Root:
    285     if self._report is None:
--> 286         self._report = get_report_structure(self.config, self.description_set)
    287     return self._report

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/report/structure/report.py:387, in get_report_structure(config, summary)
    368 section_items: List[Renderable] = [
    369     Container(
    370         get_dataset_items(config, summary, alerts),
   (...)
    374     ),
    375 ]
    377 if len(summary.variables) > 0:
    378     section_items.append(
    379         Dropdown(
    380             name=\"Variables\",
    381             anchor_id=\"variables-dropdown\",
    382             id=\"variables-dropdown\",
    383             is_row=True,
    384             classes=[\"dropdown-toggle\"],
    385             items=list(summary.variables),
    386             item=Container(
--> 387                 render_variables_section(config, summary),
    388                 sequence_type=\"accordion\",
    389                 name=\"Variables\",
    390                 anchor_id=\"variables\",
    391             ),
    392         )
    393     )
    395 scatter_items = get_interactions(config, summary.scatter)
    396 if len(scatter_items) > 0:

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/report/structure/report.py:162, in render_variables_section(config, dataframe_summary)
    160     variable_type = summary[\"type\"]
    161 render_map_type = render_map.get(variable_type, render_map[\"Unsupported\"])
--> 162 template_variables.update(render_map_type(config, template_variables))
    164 # Ignore these
    165 if reject_variables:

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/report/structure/variables/render_timeseries.py:175, in render_timeseries(config, summary)
    100 table1 = Table(
    101     [
    102         {
   (...)
    133     style=config.html.style,
    134 )
    136 table2 = Table(
    137     [
    138         {
   (...)
    171     style=config.html.style,
    172 )
    174 mini_plot = Image(
--> 175     mini_ts_plot(config, summary[\"series\"]),
    176     image_format=image_format,
    177     alt=\"Mini TS plot\",
    178 )
    180 template_variables[\"top\"] = Container(
    181     [info, table1, table2, mini_plot], sequence_type=\"grid\"
    182 )
    184 quantile_statistics = Table(
    185     [
    186         {
   (...)
    226     style=config.html.style,
    227 )

File ~/anaconda3/envs/MLOps/lib/python3.12/contextlib.py:81, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     78 @wraps(func)
     79 def inner(*args, **kwds):
     80     with self._recreate_cm():
---> 81         return func(*args, **kwds)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/ydata_profiling/visualisation/plot.py:708, in mini_ts_plot(config, series, figsize)
    705 plot.xaxis.set_tick_params(rotation=45)
    706 plt.rc(\"ytick\", labelsize=3)
--> 708 for tick in plot.xaxis.get_major_ticks():
    709     if isinstance(series.index, pd.DatetimeIndex):
    710         tick.label1.set_fontsize(6)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/axis.py:1666, in Axis.get_major_ticks(self, numticks)
   1651 r\"\"\"
   1652 Return the list of major `.Tick`\\s.
   1653 
   (...)
   1663     Use `.set_tick_params` instead if possible.
   1664 \"\"\"
   1665 if numticks is None:
-> 1666     numticks = len(self.get_majorticklocs())
   1668 while len(self.majorTicks) < numticks:
   1669     # Update the new tick label properties from the old.
   1670     tick = self._get_tick(major=True)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/axis.py:1531, in Axis.get_majorticklocs(self)
   1529 def get_majorticklocs(self):
   1530     \"\"\"Return this Axis' major tick locations in data coordinates.\"\"\"
-> 1531     return self.major.locator()

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/dates.py:1314, in AutoDateLocator.__call__(self)
   1312 def __call__(self):
   1313     # docstring inherited
-> 1314     dmin, dmax = self.viewlim_to_dt()
   1315     locator = self.get_locator(dmin, dmax)
   1316     return locator()

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/dates.py:1099, in DateLocator.viewlim_to_dt(self)
   1097 if vmin > vmax:
   1098     vmin, vmax = vmax, vmin
-> 1099 return num2date(vmin, self.tz), num2date(vmax, self.tz)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/dates.py:484, in num2date(x, tz)
    458 \"\"\"
    459 Convert Matplotlib dates to `~datetime.datetime` objects.
    460 
   (...)
    481 For details, see the module docstring.
    482 \"\"\"
    483 tz = _get_tzinfo(tz)
--> 484 return _from_ordinalf_np_vectorized(x, tz).tolist()

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
   2369     self._init_stage_2(*args, **kwargs)
   2370     return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
   2362     vargs = [args[_i] for _i in inds]
   2363     vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/numpy/lib/function_base.py:2455, in vectorize._vectorize_call(self, func, args)
   2452 # Convert args to object arrays first
   2453 inputs = [asanyarray(a, dtype=object) for a in args]
-> 2455 outputs = ufunc(*inputs)
   2457 if ufunc.nout == 1:
   2458     res = asanyarray(outputs, dtype=otypes[0])

File ~/anaconda3/envs/MLOps/lib/python3.12/site-packages/matplotlib/dates.py:350, in _from_ordinalf(x, tz)
    347 dt = (np.datetime64(get_epoch()) +
    348       np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
    349 if dt < np.datetime64('0001-01-01') or dt >= np.datetime64('10000-01-01'):
--> 350     raise ValueError(f'Date ordinal {x} converts to {dt} (using '
    351                      f'epoch {get_epoch()}), but Matplotlib dates must be '
    352                       'between year 0001 and 9999.')
    353 # convert from datetime64 to datetime:
    354 dt = dt.tolist()

ValueError: Date ordinal 24208860.0 converts to 68251-08-11T00:00:00.000000 (using epoch 1970-01-01T00:00:00), but Matplotlib dates must be between year 0001 and 9999."
}
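The bottom frames of the traceback show where the error is raised: Matplotlib's AutoDateLocator converts the axis view limits back to datetimes via num2date, and _from_ordinalf rejects anything outside years 0001 to 9999. The sketch below re-creates that bounds check with NumPy, following the lines quoted from matplotlib/dates.py; it is an illustration, not Matplotlib's own function. Dividing the reported ordinal by 1440 (minutes per day) gives a valid date, 2016-01-11 17:00:00, which matches the start of the commented-out date_range in the snippet above. As with the first report, this is consistent with minutes being read as days, though the traceback alone does not prove it.

```python
import numpy as np

MICROSECONDS_PER_DAY = 86_400 * 1_000_000
EPOCH = np.datetime64("1970-01-01T00:00:00")  # epoch quoted in the error


def ordinal_to_datetime64(x: float) -> np.datetime64:
    """Convert a day-based date ordinal to datetime64 (microsecond precision)."""
    return EPOCH + np.timedelta64(int(np.round(x * MICROSECONDS_PER_DAY)), "us")


def in_matplotlib_range(x: float) -> bool:
    """Mirror of the year 0001-9999 bounds check shown in the traceback."""
    dt = ordinal_to_datetime64(x)
    return bool(np.datetime64("0001-01-01") <= dt < np.datetime64("10000-01-01"))


reported = 24208860.0
print(ordinal_to_datetime64(reported))         # 68251-08-11T00:00:00.000000
print(in_matplotlib_range(reported))           # False
print(ordinal_to_datetime64(reported / 1440))  # 2016-01-11T17:00:00.000000
```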

@fabclmnt
Contributor

Hi @bruno2009,

The fix for this issue is waiting for the next release to become widely available (planned for tomorrow).

That said, the ydata-profiling version you reported (ydata_profiling version: 0.0.dev0) does not look correct.

Where and how are you installing the package? Can you share more details?

Cheers.

@lpeti69
Author

lpeti69 commented Aug 26, 2024 via email

@bruno2009

bruno2009 commented Aug 26, 2024

This is the command that I used to install the package:

conda install -c conda-forge ydata-profiling

However, it does not matter which one I use to install the package (pip or conda): both report the same version.

Projects
Status: Approval
5 participants