From dba15fe843baf043a52c01550259caf0c3d988aa Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Tue, 5 Mar 2024 23:42:52 +0100
Subject: [PATCH] Time Series QA: Make notebooks self-contained, also adding DDL and DML

Otherwise, people or QA jobs invoking individual notebooks, or in a
different order, are having a hard time.
---
 .../exploratory_data_analysis.ipynb           | 27 ++++++++++++++++++-
 topic/timeseries/requirements-dev.txt         |  4 +--
 topic/timeseries/requirements.txt             |  1 +
 .../time-series-decomposition.ipynb           | 27 ++++++++++++++++++-
 ...timeseries-queries-and-visualization.ipynb |  6 ++---
 5 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/topic/timeseries/exploratory_data_analysis.ipynb b/topic/timeseries/exploratory_data_analysis.ipynb
index 247484c8..2b9615ad 100644
--- a/topic/timeseries/exploratory_data_analysis.ipynb
+++ b/topic/timeseries/exploratory_data_analysis.ipynb
@@ -102,12 +102,37 @@
     "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "First, import data into CrateDB. This is a shorthand notation for the same code\n",
+    "illustrated in `timeseries-queries-and-visualization.ipynb`, running corresponding\n",
+    "SQL DDL and DML statements, to load the data."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from cratedb_toolkit.datasets import load_dataset\n",
+    "\n",
+    "dataset = load_dataset(\"tutorial/weather-basic\")\n",
+    "dataset.dbtable(dburi=CONNECTION_STRING, table=\"weather_data\").load()"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
   {
    "cell_type": "markdown",
    "id": "cdae15fa",
    "metadata": {},
    "source": [
-    "The next step fetches data from CrateDB and load it into a pandas data frame:"
+    "Then, load data from CrateDB into a pandas data frame:"
    ]
   },
   {
diff --git a/topic/timeseries/requirements-dev.txt b/topic/timeseries/requirements-dev.txt
index 4f771791..cfd81eee 100644
--- a/topic/timeseries/requirements-dev.txt
+++ b/topic/timeseries/requirements-dev.txt
@@ -1,5 +1,5 @@
 # Real.
-# pueblo[notebook,testing]>=0.0.7
+pueblo[notebook,testing]>=0.0.9
 
 # Development.
-pueblo[notebook,testing] @ git+https://github.com/pyveci/pueblo.git@amo/testbook
+# pueblo[notebook,testing] @ git+https://github.com/pyveci/pueblo.git@amo/testbook
diff --git a/topic/timeseries/requirements.txt b/topic/timeseries/requirements.txt
index bbc66e95..a75b6aa2 100644
--- a/topic/timeseries/requirements.txt
+++ b/topic/timeseries/requirements.txt
@@ -1,4 +1,5 @@
 crate[sqlalchemy]==0.34.0
+cratedb-toolkit==0.0.6
 refinitiv-data<1.7
 pandas<2
 pycaret>=3.0,<3.4
diff --git a/topic/timeseries/time-series-decomposition.ipynb b/topic/timeseries/time-series-decomposition.ipynb
index c6e88764..71a051e6 100644
--- a/topic/timeseries/time-series-decomposition.ipynb
+++ b/topic/timeseries/time-series-decomposition.ipynb
@@ -106,12 +106,37 @@
     "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "First, import data into CrateDB. This is a shorthand notation for the same code\n",
+    "illustrated in `timeseries-queries-and-visualization.ipynb`, running corresponding\n",
+    "SQL DDL and DML statements, to load the data."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from cratedb_toolkit.datasets import load_dataset\n",
+    "\n",
+    "dataset = load_dataset(\"tutorial/weather-basic\")\n",
+    "dataset.dbtable(dburi=CONNECTION_STRING, table=\"weather_data\").load()"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
   {
    "cell_type": "markdown",
    "id": "cdae15fa",
    "metadata": {},
    "source": [
-    "The next step fetches data from CrateDB and load it into a pandas data frame:"
+    "Then, load data from CrateDB into a pandas data frame:"
    ]
   },
   {
diff --git a/topic/timeseries/timeseries-queries-and-visualization.ipynb b/topic/timeseries/timeseries-queries-and-visualization.ipynb
index 41b6dc19..ca9f1ed0 100644
--- a/topic/timeseries/timeseries-queries-and-visualization.ipynb
+++ b/topic/timeseries/timeseries-queries-and-visualization.ipynb
@@ -200,9 +200,9 @@
    "id": "226e67f8",
    "metadata": {},
    "source": [
-    "After inserting data, it is recommended to `ANALYZE` the tables to make the query optimizer obtain\n",
-    "important statistics information about them. Let's also invoke a `REFRESH` statement beforehand,\n",
-    "to make sure that the data is up-to-date."
+    "After inserting data, let's invoke a `REFRESH` statement, to make sure it is\n",
+    "up-to-date. It is also recommended to `ANALYZE` the tables, to make the query\n",
+    "optimizer obtain important statistics information about them."
    ]
   },
   {
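
The reordered explanation in `timeseries-queries-and-visualization.ipynb` (first `REFRESH`, then `ANALYZE`) could be expressed as a small helper. This is a hypothetical sketch and is not part of the patch. It assumes a DB-API 2.0 cursor, such as one obtained from the `crate` client package, and the `weather_data` table that the new notebook cells load. The name `refresh_and_analyze` is illustrative only.

```python
import re
from typing import List

# Hypothetical helper, not part of this patch: run the statements described in
# the notebook text. REFRESH comes first, so freshly inserted rows are visible;
# ANALYZE follows, so the query optimizer collects up-to-date table statistics.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def refresh_and_analyze(cursor, table: str = "weather_data") -> List[str]:
    """Execute REFRESH TABLE <table>, then ANALYZE, on a DB-API cursor.

    Returns the executed statements, which keeps the helper easy to test.
    """
    # The table name is interpolated into the SQL text, because DB-API
    # parameters cannot bind identifiers, so only accept plain identifiers.
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsupported table name: {table!r}")
    statements = [f"REFRESH TABLE {table}", "ANALYZE"]
    for statement in statements:
        cursor.execute(statement)
    return statements
```

Assuming a CrateDB instance on `localhost:4200` and the `crate` package installed, a call could look like `refresh_and_analyze(client.connect("http://localhost:4200").cursor())`, after `from crate import client`. Keeping the cursor as a parameter means the helper does not depend on any particular driver.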