Add incomplete files and visualization lesson

UofUDELPHI · Jan 30, 2024 · c0117af · c0117af
1 parent dcafb39
commit c0117af

Showing 34 changed files with 4,271 additions and 256 deletions.
diff --git a/.Rhistory b/.Rhistory
diff --git a/content/complete/01_variables.ipynb b/content/complete/01_variables.ipynb
@@ -13,6 +13,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Simple computations\n",
+    "\n",
     "We can use Python to do simple computations, like this:"
    ]
   },
@@ -41,6 +43,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Defining variables/objects\n",
+    "\n",
     "If I want to use the \"output\" of this code, we need to assign it to a variable/object."
    ]
   },
@@ -166,6 +170,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Overwriting variables\n",
+    "\n",
     "You can overwrite variables by reassigning them:"
    ]
   },
@@ -204,6 +210,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The `+=` shortcut\n",
+    "\n",
     "There is a shortcut that will let you add a number to a variable *and* update its value: `+=`"
    ]
   },

diff --git a/content/complete/02_types.ipynb b/content/complete/02_types.ipynb
@@ -65,6 +65,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The `type()` function\n",
+    "\n",
     "We can check the type of `y` using the `type()` function"
    ]
   },

diff --git a/content/complete/03_type_conversions.ipynb b/content/complete/03_type_conversions.ipynb
@@ -62,6 +62,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Converting to a string using `str()`\n",
+    "\n",
     "The `str()` function will convert whatever value it is given to a string (whose shorthand is `str`). \n",
     "\n",
     "Below, we convert the integer `4` to a string, assign it to a variable called `a` and then we check the type of `a` (which is `str`):"
@@ -130,6 +132,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Converting to an integer using `int()`\n",
+    "\n",
     "Converting the float `3.0` to an integer removes the decimal point:"
    ]
   },
@@ -215,6 +219,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Converting to a boolean using `bool()`\n",
+    "\n",
     "When you convert a number to a boolean using `bool()`, it is always converted to `True`, unless the number is equal to `0` (this is the only number that is converted to `False`):"
    ]
   },

diff --git a/content/complete/04_boolean_operations.ipynb b/content/complete/04_boolean_operations.ipynb
@@ -30,6 +30,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Asking if two things are equal with `==`\n",
+    "\n",
     "To ask a question of equality, we use two equal signs `==`"
    ]
   },
@@ -86,6 +88,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Asking if two things are not equal with `!=`\n",
+    "\n",
     "The \"not equal to\" operator is written `!=`. The following question asks if the `age` variable is \"not equal\" to 10:"
    ]
   },
@@ -114,6 +118,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Less than or greater than with `<` and `>`\n",
+    "\n",
     "Next, to ask questions of greater than or less than, we use the `<` and `>` operators:"
    ]
   },
@@ -219,6 +225,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Less than or greater than for strings\n",
+    "\n",
     "Strings are treated alphabetically, so `'apple'` is \"less\" than `'banana'` because the first letter of apple \"a\" comes before the first letter of banana \"b\" in the alphabet:"
    ]
   },
@@ -264,6 +272,13 @@
     "'carrot' < 'banana'"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

diff --git a/content/complete/05_numpy.ipynb b/content/complete/05_numpy.ipynb
@@ -20,6 +20,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Installing numpy\n",
+    "\n",
     "Just like an application on your computer, where you need to first download and install the application before you can use it on your computer, before you can use Python libraries, you need to first download and install them. \n",
     "\n",
     "The way that you will install Python libraries depends on your Python installation. \n",
@@ -66,7 +68,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You *don't* need to include this `pip install numpy` code in your notebook.\n",
+    "You *don't* need to include this `pip install numpy` code in your notebook.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Importing numpy\n",
     "\n",
     "Once you've successfully installed the numpy library once, you can import the library (make its functions available) using the `import <libraryname> as <nickname>` command below. \n",
     "\n",
@@ -87,6 +97,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Using numpy functions\n",
+    "\n",
     "Let's take a look at some of the functions that the numpy library provides.\n",
     "\n",
     "First, let's define a variable `x` that contains the value `2`:"

diff --git a/content/complete/06_pandas_dataframes.ipynb b/content/complete/06_pandas_dataframes.ipynb
@@ -41,6 +41,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Loading a data file into a pandas DataFrame\n",
+    "\n",
     "To load a .csv data file into our space, we need to use the `read_csv()` function from the pandas library. Make sure that you have saved the `gapminder.csv` file in a `data` subfolder that lives in the same place where this notebook is saved.\n",
     "\n",
     "Let's load the gapminder dataset:"
@@ -618,6 +620,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The shape attribute\n",
+    "\n",
     "To extract an attribute from an object in Python, we use the `object.attribute` syntax. So if we want to extract the `shape` attribute from the `gapminder` DataFrame object, we can do so as follows:"
    ]
   },
@@ -653,6 +657,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The head() method\n",
+    "\n",
     "The `head()` function typically prints out the first few rows of a DataFrame. However, `head()` is not a regular function. If `head()` were a regular function, we would be able to apply it like this:"
    ]
   },
@@ -794,6 +800,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Arguments\n",
+    "\n",
     "You can provide additional arguments to the `head()` inside the parentheses. For example, if you want to print 10 rows instead of 5, you can do so as follows:"
    ]
   },

diff --git a/content/complete/07_index.ipynb b/content/complete/07_index.ipynb
@@ -150,6 +150,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Changing the index\n",
+    "\n",
     "You can change the index using the `set_index()` method and providing, for example, a column name as a string."
    ]
   },

diff --git a/content/complete/08_series.ipynb b/content/complete/08_series.ipynb
@@ -227,7 +227,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "There are several ways to extract a column from a DataFrame. The first, involves writing the name of the DataFrame object followed by square parentheses inside which you provide the name of the column you want to extract as a string:"
+    "There are several ways to extract a column from a DataFrame. \n",
+    "\n",
+    "### Method 1: Using square brackets\n",
+    "\n",
+    "The first involves writing the name of the DataFrame object followed by square brackets inside which you provide the name of the column you want to extract as a string:"
    ]
   },
   {
@@ -266,6 +270,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Method 2: Using the column attribute with `.`\n",
+    "\n",
     "Another way to do the same thing is to use the `.` syntax to extract the named column attribute from the DataFrame object, such as:"
    ]
   },
@@ -428,6 +434,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The Series index\n",
+    "\n",
     "They do however have an `index` (row name) attribute, which is inherited from the DataFrame from which the Series came:"
    ]
   },
@@ -456,6 +464,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### The vectorized nature of Series objects\n",
+    "\n",
     "The nice thing about Pandas Series objects is that they are **vectorized**. \n",
     "\n",
     "This means that when you apply simple mathematical operations to them, the operation will be applied to *every* entry in the Series. For example, if we add `5` to the `year` Series object, `5` will be added to *every* value in the `year` Series object:"

diff --git a/content/complete/09_subsetting.ipynb b/content/complete/09_subsetting.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Working with DataFrames\n",
+    "# Extracting subsets of data frames\n",
     "\n",
     "In this notebook, we will learn how to manipulate pandas DataFrame objects, starting with extracting subsets."
    ]
@@ -120,13 +120,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Extracting subsets of data frames"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
+    "### Extracting multiple columns\n",
+    "\n",
     "Suppose that you want to extract multiple columns at once from your DataFrame object. You might imagine that you can do this by providing two column names inside the square parentheses that follow the object name, as follows:"
    ]
   },
@@ -362,6 +357,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Using `:` with `.loc` to select all rows/columns\n",
+    "\n",
     "If you want to extract all rows (or columns), you can replace the corresponding index entry with `:`. So the following code will extract all rows for the `gdpPercap` column:"
    ]
   },
@@ -702,6 +699,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Using `.loc` with non-numeric indexes\n",
+    "\n",
     "Note that the fact that we can index the rows using `.loc` with integers is solely a result of the fact that the row index corresponds to integers. If, instead the row index corresponded to the `country` values, such as in `gapminder_country`, we would not be able to use integers to subset the rows, and we would instead need to use the country names. \n",
     "\n",
     "Let's create `gapminder_country`, whose row index corresponds to the country variable:"

diff --git a/content/complete/10_filtering_logical.ipynb b/content/complete/10_filtering_logical.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Filtering using logical operations and `.loc`"
+    "# Filtering using logical operations and `.loc`"
    ]
   },
   {
@@ -114,6 +114,13 @@
     "gapminder.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Filtering with `.loc` using a boolean series\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -159,6 +166,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "\n",
+    "\n",
     "We can use this boolean series to subset/filter the rows of our DataFrame by providing it in the row indexing position of the `.loc` indexer. The following will filter the `gapminder` DataFrame just to the rows where the `country` value equals `'Australia'`:"
    ]
   },

diff --git a/content/complete/11_filtering_query.ipynb b/content/complete/11_filtering_query.ipynb
@@ -473,6 +473,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Filtering using `.query()`\n",
+    "\n",
     "The `.query()` method does the same thing, but the syntax is a bit different. Since `query` is a method, it is followed by round parentheses `()` rather than square parentheses `[]`, and unlike in the above examples where we need to explicitly create a boolean Series object from the `country` column, e.g., `gapminder['country'] == \"Australia\"`, we instead provide a string (text) argument in which we just write the name of the column that we are using to filter, `country`, followed by the condition `== \"Australia\"`."
    ]
   },
@@ -653,6 +655,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### External variables in the `.query()` method\n",
+    "\n",
     "Note that if you want to use an \"external\" variable in your filtering query, you need to access it within the argument using `@variable_name`. For example, if we have defined an external variable, `selected_country` that contains the name of the country that we want to use to filter to in our query, to access this `selected_country` variable inside our query argument, we need to write `@selected_country` with the `@` symbol, which will impute the value stored in `selected_country` when the query is executed."
    ]
   },
@@ -835,6 +839,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Combining `.query()` with `.loc`\n",
+    "\n",
     "Note that since `gapminder.query()` outputs a DataFrame itself, you can follow a query method call with further subsetting which will then apply to the outputted DataFrame. The code below filters to just the country rows equal to \"Brazil\", and then uses the `.loc` indexer to subset just the \"year\" and \"lifeExp\" columns for the eventual output:"
    ]
   },