From 0c865e58ca27efdeb1eb370b0a92ca37f930ab03 Mon Sep 17 00:00:00 2001
From: "martin.holmer@gmail.com" <martin.holmer@gmail.com>
Date: Mon, 26 Feb 2024 15:55:43 -0500
Subject: [PATCH 1/4] Add examination/results.md document and associated files
 and links

---
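Reviewer note: the percentages quoted in the "Comments on Phase 1
Dataset" section of results.md can be rechecked from the table values
with a short script.  This is only a minimal sketch: the dictionary
layout and the function names `pct_diff` and `all_pct_diffs` are
illustrative and not part of the repository, and differences are
measured on absolute values so that the negative NIIT amounts compare
by magnitude.

```python
# Figures ($ billion) copied from the CY2023 tables in results.md.
# "agency" is the federal agency estimate named in the key comment;
# "phase1" is the Tax-Calculator + phase 1 dataset estimate.
ESTIMATES = {
    "payroll tax": {"agency": 1580.0, "phase1": 1696.7},  # CBO
    "income tax": {"agency": 2512.3, "phase1": 2012.9},   # CBO
    "NIIT": {"agency": -56.5, "phase1": -69.2},           # JCT
    "CGQD": {"agency": 259.3, "phase1": 292.7},           # JCT
    "QBID vs JCT": {"agency": 56.2, "phase1": 16.6},
    "QBID vs TSY": {"agency": 50.4, "phase1": 16.6},
}


def pct_diff(model: float, agency: float) -> float:
    """Percent by which the model magnitude differs from the agency magnitude."""
    return 100.0 * (abs(model) - abs(agency)) / abs(agency)


def all_pct_diffs() -> dict:
    """Return the percent difference for every row, rounded to one decimal."""
    return {
        name: round(pct_diff(row["phase1"], row["agency"]), 1)
        for name, row in ESTIMATES.items()
    }


if __name__ == "__main__":
    for name, pct in all_pct_diffs().items():
        print(f"{name:12s} {pct:+6.1f}%")
```

A positive value means the phase 1 estimate is larger in magnitude
than the agency estimate; a negative value means it is smaller.
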
 README.md                                     |  15 +-
 .../examination/methods.md                    |   8 +-
 .../examination/results.md                    | 158 ++++++++++++++++++
 .../examination/taxcalculator/pe23.res-expect |  24 +++
 4 files changed, 199 insertions(+), 6 deletions(-)
 create mode 100644 tax_microdata_benchmarking/examination/results.md
 create mode 100644 tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect

diff --git a/README.md b/README.md
index 9a79dc21..0b98a266 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,16 @@
 # tax-microdata
 
-This repository contains all working files for a project to develop a general-purpose validated microdata file for use in PolicyEngine and Tax-Calculator, benchmarked against existing microdata files used in either model.
+This repository contains all working files for a project to develop a
+general-purpose validated microdata file for use in
+[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us) and
+[Tax-Calculator](https://github.com/PSLmodels/Tax-Calculator).  The
+development will proceed in several phases.
 
-To install, clone the repository and run `pip install -e .` from the root directory. To check that the installation was successful, run `make test` or `pytest .` from the root directory.
+To install, clone the repository and run `pip install -e .` from the
+root directory.  To check that the installation was successful, run
+`make test` or `pytest .` from the root directory.
+
+To assess the data, review the [examination
+results](./tax_microdata_benchmarking/examination/results.md) that
+compare federal agency tax estimates with those generated using the
+microdata file created in each project phase.
diff --git a/tax_microdata_benchmarking/examination/methods.md b/tax_microdata_benchmarking/examination/methods.md
index 83ba118e..92a06ed2 100644
--- a/tax_microdata_benchmarking/examination/methods.md
+++ b/tax_microdata_benchmarking/examination/methods.md
@@ -49,10 +49,10 @@ calendar year 2023.
 
 The second, which is called the `phase 1 dataset`, is a CSV-formatted
 version of the hierarchical dataset created for the
-[Policyengine-US](https://github.com/PolicyEngine) microsimulation
-model.  It contains 2022 CPS data, enhanced with 2015 TSY SOI PUF
-data, that is extrapolated to calendar year 2023.  (Subsequent phases
-of this project will develop other datasets.)
+[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us)
+microsimulation model.  It contains 2022 CPS data, enhanced with 2015
+TSY SOI PUF data, that is extrapolated to calendar year 2023.
+(Subsequent phases of this project will develop other datasets.)
 
 In both these input dataset cases, the same procedure is used to
 estimate the amounts corresponding to the federal agency estimates.
diff --git a/tax_microdata_benchmarking/examination/results.md b/tax_microdata_benchmarking/examination/results.md
new file mode 100644
index 00000000..e74ad3d3
--- /dev/null
+++ b/tax_microdata_benchmarking/examination/results.md
@@ -0,0 +1,158 @@
+Data Examination Results
+========================
+
+This project is developing a new dataset for use by income and payroll
+tax microsimulation models.  The project is progressing in several
+phases.  At the end of each phase the current dataset is used as input
+to a tax microsimulation model to generate several basic tax
+statistics for calendar year 2023.  These statistics are then compared
+with corresponding estimates generated by federal agencies using their
+more comprehensive data.  Examining the differences between the
+statistics generated from this project's data and the statistics
+published by the federal agencies should provide ideas about how to
+improve the development of this project's dataset.
+
+For more on the source of the federal agency estimates and on how the
+model-plus-dataset estimates are generated, see the [examination
+methods document](./methods.md).
+
+<br>
+
+**CY2023 Payroll Tax Liability** ($ billion)<br>
+(federal employee plus employer share)
+| Amount | Source |
+| ---:   | :---   |
+| 1580.0 | CBO    |
+| 1696.7 | Tax-Calculator + phase 1 dataset |
+| 1482.1 | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 Individual Income Tax Liability** ($ billion)<br>
+(federal individual income tax)
+| Amount | Source |
+| ---:   | :---   |
+| 2512.3 | CBO    |
+| 2012.9 | Tax-Calculator + phase 1 dataset |
+| 2154.4 | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 CTC Tax Expenditure** ($ billion)<br>
+(from the federal child tax credit)
+| Amount | Source |
+| ---:   | :---   |
+| 122.1  | JCT    |
+| 108.6  | TSY    |
+| 113.9  | Tax-Calculator + phase 1 dataset |
+| 126.6  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 EITC Tax Expenditure** ($ billion)<br>
+(from the federal earned income credit)
+| Amount | Source |
+| ---:   | :---   |
+| 71.9   | JCT    |
+| 63.6   | TSY    |
+| 64.0   | Tax-Calculator + phase 1 dataset |
+| 75.0   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 SSBEN Tax Expenditure** ($ billion)<br>
+(from excluding some social security benefits from federal AGI)
+| Amount | Source |
+| ---:   | :---   |
+| 45.9   | JCT    |
+| 31.4   | TSY    |
+| 46.9   | Tax-Calculator + phase 1 dataset |
+| 58.4   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 NIIT Tax Expenditure** ($ billion)<br>
+(from the 3.8% federal surtax on investment income)
+| Amount | Source |
+| ---:   | :---   |
+| -56.5  | JCT    |
+| ----   | TSY    |
+| -69.2  | Tax-Calculator + phase 1 dataset |
+| -56.6  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 CGQD Tax Expenditure** ($ billion)<br>
+(from taxing long-term capital gains and qualified dividends at lower federal rates)
+| Amount | Source |
+| ---:   | :---   |
+| 259.3  | JCT    |
+| 153.9  | TSY    |
+| 292.7  | Tax-Calculator + phase 1 dataset |
+| 224.5  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 QBID Tax Expenditure** ($ billion)<br>
+(from the 20% federal qualified business income deduction)
+| Amount | Source |
+| ---:   | :---   |
+| 56.2   | JCT    |
+| 50.4   | TSY    |
+| 16.6   | Tax-Calculator + phase 1 dataset |
+| 18.2   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 ACA-PTC Tax Expenditure** ($ billion)<br>
+(from the federal ACA premium tax credit)
+| Amount | Source |
+| ---:   | :---   |
+| 83.9   | JCT    |
+| 53.0   | TSY    |
+| ----   | Tax-Calculator + phase 1 dataset |
+| ----   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+Comments on Phase 1 Dataset
+---------------------------
+
+The phase 1 dataset, which is a flat-file version of the most recent
+PolicyEngine-US 2023 PUF-enhanced CPS input dataset, comes reasonably
+close to the federal agency estimates in most cases.
+
+Relative to CBO, the phase 1 dataset overestimates payroll
+tax liability by about 7% yet underestimates income tax
+liability by about 20%.  This combination suggests that high
+earnings, unearned income, or both may be underrepresented in
+the phase 1 dataset.  More detailed calculations planned for
+future phases should illuminate the reasons for these
+discrepancies.
+
+The phase 1 dataset estimates for the CTC, EITC, and SSBEN tax
+expenditures are reasonably close to the JCT and TSY tax expenditure
+estimates.  The fact that the NIIT and CGQD tax expenditure estimates
+are noticeably larger in magnitude than the JCT estimates (NIIT by
+about 22% and CGQD by almost 13%) suggests that investment income,
+especially among high-AGI tax units, is overrepresented.  This casts
+doubt on the suggestion made in the previous paragraph that unearned
+income might be underrepresented in the phase 1 dataset.
+
+The phase 1 dataset estimate for the QBID tax expenditure is
+significantly lower than either federal agency estimate: about 70%
+lower than JCT and about 67% lower than TSY.  The model has been
+extensively tested with a variety of hypothetical tax units with
+qualified business income, so there is confidence that the model
+represents the QBID tax rules correctly.  However, there is a
+shortage of publicly-available information about the nature of
+the businesses generating the qualified business income, and it
+is exactly that kind of business information that is required
+to accurately estimate the QBID tax expenditure.
+
+And finally, Tax-Calculator does not include any logic for the
+Affordable Care Act premium tax credit, so it is impossible to
+generate an estimate of this tax expenditure in phase 1.
+
+
+
diff --git a/tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect b/tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect
new file mode 100644
index 00000000..fb5be6fe
--- /dev/null
+++ b/tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect
@@ -0,0 +1,24 @@
+Weighted Tax Reform Totals by Baseline Expanded-Income Decile
+    Returns    ExpInc    IncTax    PayTax     LSTax    AllTax
+ A   189.54   18241.5    2012.9    1696.7       0.0    3709.6
+
+==> pe23-23-#-cgqd-#-tab.text <==
+ A   189.54   18241.5     292.7       0.0       0.0     292.7
+
+==> pe23-23-#-clp-#-tab.text <==
+ A   189.54   18241.5       0.0       0.0       0.0       0.0
+
+==> pe23-23-#-ctc-#-tab.text <==
+ A   189.54   18241.5     113.9       0.0       0.0     113.9
+
+==> pe23-23-#-eitc-#-tab.text <==
+ A   189.54   18241.5      64.0       0.0       0.0      64.0
+
+==> pe23-23-#-niit-#-tab.text <==
+ A   189.54   18241.5     -69.2       0.0       0.0     -69.2
+
+==> pe23-23-#-qbid-#-tab.text <==
+ A   189.54   18241.5      16.6       0.0       0.0      16.6
+
+==> pe23-23-#-ssben-#-tab.text <==
+ A   189.54   18241.5      46.9       0.0       0.0      46.9

From f00e1b3c07b41c1f49da2f2793b9655b49850c52 Mon Sep 17 00:00:00 2001
From: "martin.holmer@gmail.com" <martin.holmer@gmail.com>
Date: Mon, 26 Feb 2024 16:16:48 -0500
Subject: [PATCH 2/4] Minor changes to the results.md document

---
 tax_microdata_benchmarking/examination/results.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tax_microdata_benchmarking/examination/results.md b/tax_microdata_benchmarking/examination/results.md
index e74ad3d3..80d53be8 100644
--- a/tax_microdata_benchmarking/examination/results.md
+++ b/tax_microdata_benchmarking/examination/results.md
@@ -144,15 +144,12 @@ significantly lower than either federal agency estimate: about 70%
 lower than JCT and about 67% lower than TSY.  The model has been
 extensively tested with a variety of hypothetical tax units with
 qualified business income, so there is confidence that the model
-represents the QBID tax rules correctly.  However, there is a
-shortage of publicly-available information about the nature of
-the businesses generating the qualified business income, and it
-is exactly that kind of business information that is required
-to accurately estimate the QBID tax expenditure.
+represents the QBID tax rules correctly.  However, there is a shortage
+of publicly available information about the attributes of the
+businesses generating the qualified business income, and it is exactly
+that kind of business information that is required to estimate the
+QBID tax expenditure accurately.
 
 And finally, Tax-Calculator does not include any logic for the
 Affordable Care Act premium tax credit, so it is impossible to
 generate an estimate of this tax expenditure in phase 1.
-
-
-

From 22e535d0dcdc6f3c972a6235bc8a345213784b17 Mon Sep 17 00:00:00 2001
From: "martin.holmer@gmail.com" <martin.holmer@gmail.com>
Date: Tue, 27 Feb 2024 08:37:16 -0500
Subject: [PATCH 3/4] Minor edits to methods.md and results.md documents

---
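Reviewer note: methods.md describes converting two fiscal-year
estimates into one calendar-year estimate with a simple linear
adjustment done by `fy2cy.awk`.  That script is not part of this
patch, so the sketch below is only one plausible reading of the
description, not the script's actual logic.  It assumes the federal
fiscal year ends on September 30, so that a calendar year shares nine
months with the same-numbered fiscal year and three months with the
next one.  The function name `fy_to_cy` and the input amounts are
hypothetical.

```python
def fy_to_cy(fy_first: float, fy_second: float, months_in_first: int = 9) -> float:
    """Blend two fiscal-year amounts into one calendar-year amount.

    Assumption (not taken from fy2cy.awk): the calendar year overlaps
    the first fiscal year for `months_in_first` months and the second
    fiscal year for the remaining months, and each fiscal-year amount
    is spread evenly over its twelve months.
    """
    if not 0 <= months_in_first <= 12:
        raise ValueError("months_in_first must be between 0 and 12")
    weight = months_in_first / 12.0
    return weight * fy_first + (1.0 - weight) * fy_second


if __name__ == "__main__":
    # Hypothetical amounts, not agency estimates: FY2023 = 100, FY2024 = 120.
    print(fy_to_cy(100.0, 120.0))
```

With these made-up inputs the blended amount sits a quarter of the way
from the first fiscal-year figure to the second.
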
 .../examination/methods.md                    | 23 ++++++++++---------
 .../examination/results.md                    |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/tax_microdata_benchmarking/examination/methods.md b/tax_microdata_benchmarking/examination/methods.md
index 92a06ed2..a1e9edd0 100644
--- a/tax_microdata_benchmarking/examination/methods.md
+++ b/tax_microdata_benchmarking/examination/methods.md
@@ -26,11 +26,11 @@ comparable to the model-plus-dataset estimates, the estimates for the
 two fiscal years overlapping with the calendar year are used in a
 simple linear interpolation and linear extrapolation of the two fiscal
 year estimate to produce a calendar year estimate.  This linear
-adjustment is done by the [fy2cy.awk]() script using as input one of
-three files containing estimates for fiscal years 2023 and 2024:
-[cy23_cbo.csv](./cy23_cbo.csv), [cy23_jct.csv](./cy23_jct.csv), or
-[cy23_tsy.csv](./cy23_tsy.csv).  These three `.csv` files contain
-detailed information about the source of the federal agency estimates.
+adjustment is done by the `fy2cy.awk` script using as input one of
+three files containing federal agency estimates for fiscal years 2023
+and 2024: `cy23_cbo.csv`, `cy23_jct.csv`, or `cy23_tsy.csv`.  These
+three `.csv` files contain detailed information about the source of
+the federal agency estimates.
 
 Model-plus-Dataset Estimates for Phase 1
 ----------------------------------------
@@ -58,15 +58,16 @@ In both these input dataset cases, the same procedure is used to
 estimate the amounts corresponding to the federal agency estimates.
 This procedure involves using the Tax-Calculator's
 command-line-interface tool,
-[`tc`](https://taxcalc.pslmodels.org/guide/cli.html).  The payroll and
-individual income tax liabilities are estimated using the
+[`tc`](https://taxcalc.pslmodels.org/guide/cli.html), in the
+`examination/taxcalculator` directory.  The payroll and individual
+income tax liabilities are estimated using the
 [`clp.json`](./taxcalculator/clp.json) null reform to produce
 estimates for 2023 baseline tax policy.  Each tax expenditure estimate
 is generated using a simple reform that negates that feature of
 baseline tax policy.  The several `tc` runs are collected into a
-single shell script called [`runs.sh`](./taxcalculator/runs.sh), which
-in turn calls the [`execute.sh`](./taxcalculator/execute.sh) script
-for each run.  The simple tax expenditure reforms are included in the
+single shell script called `examination/taxcalculator/runs.sh`, which
+in turn calls the `examination/taxcalculator/execute.sh` script for
+each run.  The simple tax expenditure reforms are included in the
 following JSON files:
 
 * **CTC Tax Expenditure**: [`ctc.json`](./taxcalculator/ctc.json)
@@ -83,4 +84,4 @@ no `ptc.json` reform file.
 Model-plus-Dataset Estimates for Phase 2
 ----------------------------------------
 
-*Text to be added during phase 2 of the project.*
+*Text to be added at the end of phase 2 of the project.*
diff --git a/tax_microdata_benchmarking/examination/results.md b/tax_microdata_benchmarking/examination/results.md
index 80d53be8..0efb8cf6 100644
--- a/tax_microdata_benchmarking/examination/results.md
+++ b/tax_microdata_benchmarking/examination/results.md
@@ -14,7 +14,7 @@ improve the development of this project's dataset.
 
 For more on the source of the federal agency estimates and on how the
 model-plus-dataset estimates are generated, see the [examination
-methods document](./methods.md).
+methods](./methods.md) document.
 
 <br>
 

From 2a170a0d1ccb2ca8d6c644262ba6b978c1cf9d8f Mon Sep 17 00:00:00 2001
From: "martin.holmer@gmail.com" <martin.holmer@gmail.com>
Date: Tue, 27 Feb 2024 13:14:36 -0500
Subject: [PATCH 4/4] Add file location of results from the td23 and pe23 runs
 to methods.md

---
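Reviewer note: the `.res-expect` files referenced in this patch use
the plain-text layout visible in `pe23.res-expect` (added in patch
1/4): `==> name <==` marker lines followed by an ` A ` totals row.  A
minimal sketch of reading that layout into a dictionary follows.  The
function name `parse_res`, the `baseline` label given to the first
unlabeled block, and the abbreviated `SAMPLE` text are illustrative
assumptions; the column names come from the header row in the file.

```python
import re

# Abbreviated excerpt of pe23.res-expect as added in patch 1/4.
SAMPLE = """\
Weighted Tax Reform Totals by Baseline Expanded-Income Decile
    Returns    ExpInc    IncTax    PayTax     LSTax    AllTax
 A   189.54   18241.5    2012.9    1696.7       0.0    3709.6

==> pe23-23-#-cgqd-#-tab.text <==
 A   189.54   18241.5     292.7       0.0       0.0     292.7

==> pe23-23-#-ctc-#-tab.text <==
 A   189.54   18241.5     113.9       0.0       0.0     113.9
"""

COLUMNS = ("Returns", "ExpInc", "IncTax", "PayTax", "LSTax", "AllTax")
# Captures the reform name between the two "-#-" separators.
MARKER_RE = re.compile(r"^==> .*-#-(\w+)-#-.* <==$")


def parse_res(text: str) -> dict:
    """Map each block label to a dict of column name -> amount."""
    results = {}
    label = "baseline"  # assumed label for the first block, which has no marker
    for line in text.splitlines():
        marker = MARKER_RE.match(line.strip())
        if marker:
            label = marker.group(1)
            continue
        fields = line.split()
        # Totals rows start with "A" and carry one number per column.
        if fields and fields[0] == "A" and len(fields) == len(COLUMNS) + 1:
            results[label] = dict(zip(COLUMNS, map(float, fields[1:])))
    return results


if __name__ == "__main__":
    for name, row in parse_res(SAMPLE).items():
        print(name, row["IncTax"], row["PayTax"])
```

Keying the rows by reform name makes it straightforward to compare a
freshly generated results file against the expected one field by field.
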
 tax_microdata_benchmarking/examination/methods.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tax_microdata_benchmarking/examination/methods.md b/tax_microdata_benchmarking/examination/methods.md
index a1e9edd0..ec88ecc0 100644
--- a/tax_microdata_benchmarking/examination/methods.md
+++ b/tax_microdata_benchmarking/examination/methods.md
@@ -81,6 +81,10 @@ Tax-Calculator is being used as the model in phase 1, but that model
 contains no logic to estimate the ACA premium tax credit, so there is
 no `ptc.json` reform file.
 
+The results of the two sets of `tc` runs are in the
+`examination/taxcalculator/td23.res-expect` and the
+`examination/taxcalculator/pe23.res-expect` files.
+
 Model-plus-Dataset Estimates for Phase 2
 ----------------------------------------