-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
0cee092
commit 1b86812
Showing
3 changed files
with
134 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
{ | ||
"metadata": { | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.5" | ||
}, | ||
"orig_nbformat": 2, | ||
"kernelspec": { | ||
"name": "python385jvsc74a57bd0b122a4c4bdb99d3165b2cc8edf569b9c7d6451311834707c9fe8c2d5b9d3ea9a", | ||
"display_name": "Python 3.8.5 64-bit ('venv-databases': venv)" | ||
}, | ||
"metadata": { | ||
"interpreter": { | ||
"hash": "b122a4c4bdb99d3165b2cc8edf569b9c7d6451311834707c9fe8c2d5b9d3ea9a" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2, | ||
"cells": [ | ||
{ | ||
"source": [ | ||
"# Data Import Tools \n", | ||
"\n", | ||
"This notebook has a refinement of the cell that you used in the Cabrillo Courses notebook from a couple of weeks ago. It will import CSVs into a corresponding tables. Use it to get your project started. \n", | ||
"\n", | ||
"Start by running the next cell to get setup. " | ||
], | ||
"cell_type": "markdown", | ||
"metadata": {} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"The sql extension is already loaded. To reload it, use:\n %reload_ext sql\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%load_ext sql\n", | ||
"%config SqlMagic.autolimit=500\n", | ||
" \n", | ||
"import re \n", | ||
"import pathlib\n", | ||
"import subprocess \n", | ||
"import folium\n", | ||
"import folium.plugins\n", | ||
"import pandas as pd \n", | ||
"from sqlalchemy import create_engine" | ||
] | ||
}, | ||
{ | ||
"source": [ | ||
"## Import Your Data Files \n", | ||
"\n", | ||
"On my Jupyter server you can drag and drop data files alongside this notebook and use them in the next cell. If you're using Google Colaboratory you should connect your Google Drive to the notebook. I recoreded how to do this in class. Here's a video showing the process: \n", | ||
"\n", | ||
"https://youtu.be/mNTqIw-Oy44\n", | ||
"\n", | ||
"Change the data files in the next cell. You can have as many as you like and as few as one. Name the files (with their path in Colab) inside of quotes and separated by commas. \n", | ||
"\n", | ||
"**This cell may take a long time if you have lots of data.** " | ||
], | ||
"cell_type": "markdown", | ||
"metadata": {} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Change these to match your data files. \n", | ||
"datafiles = [\"MasterCourseFile.csv\", \"ProgramFile.csv\", \"ProgramCourseFile.csv\"]\n", | ||
"\n", | ||
"# This will create a file called projdata.sqlite3 with your schema.\n", | ||
"url = 'sqlite:///projdata.sqlite3'\n", | ||
"engine = create_engine(url, echo=False)\n", | ||
"for f in map(pathlib.Path, datafiles):\n", | ||
" df = pd.read_csv(f)\n", | ||
" df.to_sql(f.name.replace('.csv',''), con=engine, if_exists='replace')" | ||
] | ||
}, | ||
{ | ||
"source": [ | ||
"The previous cell creates one table for each CSV file. The table will have the same name as the CSV without the file path and the `.csv` extension. Here are examples to show you what your tables are named:\n", | ||
"\n", | ||
"| File Name | Table Name | \n", | ||
"| --- | --- | \n", | ||
"| ../../labs/MasterCourseFile.csv | MasterCourseFile | \n", | ||
"| ProgramCourseFile.csv | ProgramCourseFile | \n", | ||
"| /data/drive/My Drive/MyData.csv | MyData | \n", | ||
"\n", | ||
"Re running the previous cell will re-load the CSV data dropping any data that was in the tables before. If you reformat your CSV files (for example by changing the columns) the import may fail. You can fix that by deleting the `projdata.sqlite3` file and re-running the previous cell.\n", | ||
"\n" | ||
], | ||
"cell_type": "markdown", | ||
"metadata": {} | ||
}, | ||
{ | ||
"source": [ | ||
"## Use and Query Your Data \n", | ||
"\n", | ||
"SQL cells should start with the `%%sql` command shown in the cell below: " | ||
], | ||
"cell_type": "markdown", | ||
"metadata": {} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sql sqlite:///projdata.sqlite3\n" | ||
] | ||
} | ||
] | ||
} |
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters