add lx.DataArray

vlc · Jan 31, 2022 · 971ad15
1 parent d168fb2

Showing 11 changed files with 508 additions and 216 deletions.
diff --git a/book/_config.yml b/book/_config.yml
@@ -73,3 +73,5 @@ sphinx:
     conda:
       - "https://docs.conda.io/projects/conda/en/latest/"
       - null
+  config:
+    bibtex_reference_style: author_year
diff --git a/book/_toc.yml b/book/_toc.yml
@@ -11,6 +11,7 @@ parts:
     - file: user-guide/data-fundamentals
       title: Data Fundamentals
     - file: user-guide/linear-funcs
+    - file: bibliography
 
   - caption: API Reference
     chapters:

diff --git a/book/bibliography.md b/book/bibliography.md
@@ -0,0 +1,5 @@
+# References
+
+```{bibliography}
+:style: plain
+```
diff --git a/book/example/000_mtc_data.ipynb b/book/example/000_mtc_data.ipynb
@@ -22,7 +22,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The MTC sample dataset is the same data used in the Self Instructing Manual for discrete choice modeling:\n",
+    "The MTC sample dataset is the same data used in the Self Instructing Manual {cite:p}`koppelman2006self` for discrete choice modeling:\n",
     "\n",
     "> The San Francisco Bay Area work mode choice data set comprises 5029 home-to-work commute trips in the\n",
     "> San Francisco Bay Area. The data is drawn from the San Francisco Bay Area Household Travel Survey\n",
@@ -226,4 +226,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
+}
diff --git a/book/example/201_exville_mode_choice.ipynb b/book/example/201_exville_mode_choice.ipynb
@@ -18,11 +18,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import os\n",
-    "import numpy as np\n",
-    "import pandas as pd \n",
+    "# HIDDEN\n",
     "import larch.numba as lx\n",
-    "from larch import P, X"
+    "from pytest import approx"
    ]
   },
   {
@@ -31,7 +29,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "lx.__version__"
+    "import os\n",
+    "import numpy as np\n",
+    "import pandas as pd \n",
+    "import larch.numba as lx\n",
+    "from larch import P, X"
    ]
   },
   {
@@ -72,9 +74,7 @@
     "The Exampville data output contains a set of files similar to what we might\n",
     "find for a real travel survey: network skims, and tables of households, persons,\n",
     "and tours.  We'll need to connect these tables together to create a composite dataset\n",
-    "for mode choice model estimation.\n",
-    "\n",
-    "We can merge data from other tables using the usual pandas syntax for merging.\n"
+    "for mode choice model estimation, using the DataTree structure."
    ]
   },
   {
@@ -100,71 +100,20 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "tour"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "tour_dataset = lx.Dataset.from_idco(tour.set_index('TOURID'), alts=Mode)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "tour_dataset"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dt = lx.DataTree(tour=tour_dataset)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dt.add_dataset('hh', hh.set_index('HHID'), relationships=\"tours.HHID @ hh.HHID\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dt.add_dataset('person', pp.set_index('PERSONID'), relationships=\"tours.PERSONID @ person.PERSONID\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "tour_dataset = lx.Dataset.from_idco(tour.set_index('TOURID'), alts=Mode)\n",
     "od_skims = lx.Dataset.from_omx(skims)\n",
-    "dt.add_dataset(\n",
-    "    'od', od_skims, \n",
+    "\n",
+    "dt = lx.DataTree(\n",
+    "    tour=tour_dataset,\n",
+    "    hh=hh.set_index('HHID'),\n",
+    "    person=pp.set_index('PERSONID'),\n",
+    "    od=od_skims,\n",
+    "    do=od_skims,\n",
     "    relationships=(\n",
+    "        \"tours.HHID @ hh.HHID\",\n",
+    "        \"tours.PERSONID @ person.PERSONID\",\n",
     "        \"hh.HOMETAZ @ od.otaz\",\n",
     "        \"tours.DTAZ @ od.dtaz\",\n",
-    "    ),\n",
-    ")\n",
-    "dt.add_dataset(\n",
-    "    'do', od_skims, \n",
-    "    relationships=(\n",
     "        \"hh.HOMETAZ @ do.dtaz\",\n",
     "        \"tours.DTAZ @ do.otaz\",\n",
     "    ),\n",
@@ -281,15 +230,6 @@
     "'Motor' nest."
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dt_work.root_dataset"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -379,24 +319,6 @@
     "## Model Estimation"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "m.loglike()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "m.data_as_loaded"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -411,7 +333,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# m.dataframes.choice_avail_summary()"
+    "m.choice_avail_summary()"
    ]
   },
   {
@@ -420,7 +342,21 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# m.dataframes.data_co.statistics()"
+    "# TEST\n",
+    "summary = m.choice_avail_summary()\n",
+    "assert (summary.to_markdown()) == '''\n",
+    "|                            | name     |   chosen | available   | availability condition   |\n",
+    "|:---------------------------|:---------|---------:|:------------|:-------------------------|\n",
+    "| 1                          | DA       |      810 | 7564        | AGE >= 16                |\n",
+    "| 2                          | SR       |      196 | 4179        | 1                        |\n",
+    "| 3                          | Walk     |       72 | 7564        | WALK_TIME < 60           |\n",
+    "| 4                          | Bike     |      434 | 4199        | BIKE_TIME < 60           |\n",
+    "| 5                          | Transit  |     6862 | 7564        | TRANSIT_FARE>0           |\n",
+    "| 6                          | Car      |      268 | 7564        |                          |\n",
+    "| 7                          | NonMotor |     7296 | 7564        |                          |\n",
+    "| 8                          | Motor    |     7564 | 7564        |                          |\n",
+    "| < Total All Alternatives > |          |     6052 |             |                          |\n",
+    "'''[1:-1]"
    ]
   },
   {
@@ -437,7 +373,18 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "result = m.maximize_loglike(method='slsqp')"
+    "m.set_cap(20) # improves optimization stability\n",
+    "result = m.maximize_loglike()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TEST\n",
+    "assert result.loglike == approx(-3493.0397298749467)"
    ]
   },
   {
@@ -474,6 +421,34 @@
     "m.parameter_summary()"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TEST\n",
+    "assert (m.parameter_summary().data.to_markdown()) == '''\n",
+    "|                   |   Value |   Std Err |   t Stat | Signif   |   Null Value |\n",
+    "|:------------------|--------:|----------:|---------:|:---------|-------------:|\n",
+    "| ASC_Bike          |  -0.258 |    1.34   |    -0.19 |          |            0 |\n",
+    "| ASC_SR            |   1.42  |    1      |     1.42 |          |            0 |\n",
+    "| ASC_Transit       |   6.75  |    2.06   |     3.27 | **       |            0 |\n",
+    "| ASC_Walk          |   8.62  |    1.14   |     7.57 | ***      |            0 |\n",
+    "| Cost              |  -0.176 |    0.12   |    -1.47 |          |            0 |\n",
+    "| InVehTime         |  -0.124 |    0.0292 |    -4.24 | ***      |            0 |\n",
+    "| LogIncome:Bike    |  -0.197 |    0.124  |    -1.59 |          |            0 |\n",
+    "| LogIncome:SR      |  -0.194 |    0.135  |    -1.43 |          |            0 |\n",
+    "| LogIncome:Transit |  -0.557 |    0.169  |    -3.29 | ***      |            0 |\n",
+    "| LogIncome:Walk    |  -0.523 |    0.1    |    -5.21 | ***      |            0 |\n",
+    "| Mu:Car            |   0.259 |    0.181  |    -4.1  | ***      |            1 |\n",
+    "| Mu:Motor          |   0.802 |    0.201  |    -0.99 |          |            1 |\n",
+    "| Mu:NonMotor       |   0.854 |    0.112  |    -1.3  |          |            1 |\n",
+    "| NonMotorTime      |  -0.266 |    0.0163 |   -16.29 | ***      |            0 |\n",
+    "| OutVehTime        |  -0.255 |    0.0646 |    -3.95 | ***      |            0 |\n",
+    "'''[1:-1]"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -558,7 +533,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.2"
+   "version": "3.9.9"
   },
   "toc": {
    "base_numbering": 1,
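
Note on the `# TEST` cells added in `201_exville_mode_choice.ipynb`: they compare rendered tables against triple-quoted literals sliced with `[1:-1]`, and compare the log likelihood with `approx`. The sketch below illustrates both idioms on their own. It is not code from this commit: the table text and the `recomputed` value are hypothetical, and `math.isclose` is used only as a standard-library stand-in for `pytest.approx`, so the snippet runs without larch or pytest installed.

```python
import math

# Hypothetical table text, standing in for the output of `summary.to_markdown()`.
rendered = "| name | chosen |\n|:-----|-------:|\n| DA   |    810 |"

# The literal starts and ends with a newline so the table rows line up in the
# source; [1:-1] drops exactly those two characters before comparing.
expected = '''
| name | chosen |
|:-----|-------:|
| DA   |    810 |
'''[1:-1]

tables_match = rendered == expected

# Stand-in for `pytest.approx` (assumption: a relative tolerance of 1e-6 is
# acceptable here). The reported value is the one asserted in the diff; the
# recomputed value is a hypothetical result from a rerun.
reported = -3493.0397298749467
recomputed = -3493.0397301
loglike_matches = math.isclose(recomputed, reported, rel_tol=1e-6)
```

Exact string comparison on a rendered table is strict: any change in column width or number formatting fails the assertion, which is presumably the intent for a documentation regression test, while the tolerance-based comparison absorbs small floating-point differences between platforms.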