Commit

Final changes.
mike-matera committed Jun 14, 2021
1 parent 0cee092 commit 1b86812
Showing 3 changed files with 134 additions and 1 deletion.
133 changes: 133 additions & 0 deletions notebooks/Project/importing_data.ipynb
@@ -0,0 +1,133 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python385jvsc74a57bd0b122a4c4bdb99d3165b2cc8edf569b9c7d6451311834707c9fe8c2d5b9d3ea9a",
"display_name": "Python 3.8.5 64-bit ('venv-databases': venv)"
},
"metadata": {
"interpreter": {
"hash": "b122a4c4bdb99d3165b2cc8edf569b9c7d6451311834707c9fe8c2d5b9d3ea9a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"source": [
"# Data Import Tools \n",
"\n",
"This notebook has a refinement of the cell that you used in the Cabrillo Courses notebook from a couple of weeks ago. It will import CSVs into a corresponding tables. Use it to get your project started. \n",
"\n",
"Start by running the next cell to get setup. "
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The sql extension is already loaded. To reload it, use:\n %reload_ext sql\n"
]
}
],
"source": [
"%load_ext sql\n",
"%config SqlMagic.autolimit=500\n",
" \n",
"import re \n",
"import pathlib\n",
"import subprocess \n",
"import folium\n",
"import folium.plugins\n",
"import pandas as pd \n",
"from sqlalchemy import create_engine"
]
},
{
"source": [
"## Import Your Data Files \n",
"\n",
"On my Jupyter server you can drag and drop data files alongside this notebook and use them in the next cell. If you're using Google Colaboratory you should connect your Google Drive to the notebook. I recoreded how to do this in class. Here's a video showing the process: \n",
"\n",
"https://youtu.be/mNTqIw-Oy44\n",
"\n",
"Change the data files in the next cell. You can have as many as you like and as few as one. Name the files (with their path in Colab) inside of quotes and separated by commas. \n",
"\n",
"**This cell may take a long time if you have lots of data.** "
],
"cell_type": "markdown",
"metadata": {}
},
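{
"source": [
"If you're working on Google Colaboratory, the next cell is a minimal, optional sketch of mounting your Google Drive so the file paths in the import cell can point at files in your Drive. It assumes the standard `/content/drive` mount point; your paths may look different, so follow the video above if in doubt. Skip this cell on the class Jupyter server.\n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: only needed on Google Colaboratory. Skip this cell on the class Jupyter server.\n",
"# After mounting, files usually appear under /content/drive/My Drive/ -- that path is an\n",
"# assumption, so adjust it to match what you see in the Colab file browser.\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},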
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Change these to match your data files. \n",
"datafiles = [\"MasterCourseFile.csv\", \"ProgramFile.csv\", \"ProgramCourseFile.csv\"]\n",
"\n",
"# This will create a file called projdata.sqlite3 with your schema.\n",
"url = 'sqlite:///projdata.sqlite3'\n",
"engine = create_engine(url, echo=False)\n",
"for f in map(pathlib.Path, datafiles):\n",
" df = pd.read_csv(f)\n",
" df.to_sql(f.name.replace('.csv',''), con=engine, if_exists='replace')"
]
},
{
"source": [
"The previous cell creates one table for each CSV file. The table will have the same name as the CSV without the file path and the `.csv` extension. Here are examples to show you what your tables are named:\n",
"\n",
"| File Name | Table Name | \n",
"| --- | --- | \n",
"| ../../labs/MasterCourseFile.csv | MasterCourseFile | \n",
"| ProgramCourseFile.csv | ProgramCourseFile | \n",
"| /data/drive/My Drive/MyData.csv | MyData | \n",
"\n",
"Re running the previous cell will re-load the CSV data dropping any data that was in the tables before. If you reformat your CSV files (for example by changing the columns) the import may fail. You can fix that by deleting the `projdata.sqlite3` file and re-running the previous cell.\n",
"\n"
],
"cell_type": "markdown",
"metadata": {}
},
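{
"source": [
"If you want to double-check what was created, the next cell is a small sketch that lists the tables in `projdata.sqlite3`. It assumes you have already run the setup and import cells above, so `pd` and `engine` are defined.\n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: list the tables in projdata.sqlite3.\n",
"# Assumes the setup and import cells above have been run, so pd and engine exist.\n",
"pd.read_sql(\"SELECT name FROM sqlite_master WHERE type='table'\", engine)"
]
},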
{
"source": [
"## Use and Query Your Data \n",
"\n",
"SQL cells should start with the `%%sql` command shown in the cell below: "
],
"cell_type": "markdown",
"metadata": {}
},
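{
"source": [
"The next cell is a minimal example query. `MasterCourseFile` is just the table name from the example file list above; replace it with one of your own tables. The empty cell after it is a place to start your own queries.\n"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql sqlite:///projdata.sqlite3\n",
"-- Example only: MasterCourseFile comes from the sample file list above.\n",
"-- Replace it with one of your own table names.\n",
"SELECT * FROM MasterCourseFile LIMIT 5;"
]
},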
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql sqlite:///projdata.sqlite3\n"
]
}
]
}
Binary file added notebooks/Project/projdata.sqlite3
Binary file not shown.
2 changes: 1 addition & 1 deletion projects/using_public_data_2.md
@@ -17,7 +17,7 @@ Write two queries that show something interesting in your data. For each query a

## Export Your Data

For the final submission export your data from MySQL as an SQL dump file.
You can use either MySQL or a Jupyter notebook. If you use MySQL, export your data as an SQL dump file. If you use Jupyter, export the SQLite database file.
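
If you used the Jupyter/SQLite option and also want a plain SQL dump (optional), here is a minimal sketch. It assumes the database file is named `projdata.sqlite3`, as in the import notebook:

```python
# Optional: write a plain-SQL dump of the SQLite database, similar in spirit
# to a MySQL dump file. Assumes the file created by the import notebook is
# named projdata.sqlite3.
import sqlite3

conn = sqlite3.connect("projdata.sqlite3")
with open("projdata_dump.sql", "w") as out:
    for statement in conn.iterdump():  # yields the CREATE/INSERT statements
        out.write(statement + "\n")
conn.close()
```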

## Turn In

