Merge pull request #17 from martinholmer/examination-results

Add Phase 1 data examination results
PSLmodels · Feb 27, 2024 · bf28cef
2 parents abf1ba4 + a74bd48

Showing 4 changed files with 212 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,16 @@
 # tax-microdata
 
-This repository contains all working files for a project to develop a general-purpose validated microdata file for use in PolicyEngine and Tax-Calculator, benchmarked against existing microdata files used in either model.
+This repository contains all working files for a project to develop a
+general-purpose validated microdata file for use in
+[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us) and
+[Tax-Calculator](https://github.com/PSLmodels/Tax-Calculator).  The
+development will proceed in several phases.
 
-To install, clone the repository and run `pip install -e .` from the root directory. To check that the installation was successful, run `make test` or `pytest .` from the root directory.
+To install, clone the repository and run `pip install -e .` from the
+root directory.  To check that the installation was successful, run
+`make test` or `pytest .` from the root directory.
+
+To assess the microdata file, review the [examination
+results](./tax_microdata_benchmarking/examination/results.md) that
+compare federal agency tax estimates with those generated using the
+microdata file created in each project phase.
diff --git a/tax_microdata_benchmarking/examination/methods.md b/tax_microdata_benchmarking/examination/methods.md
@@ -26,11 +26,11 @@ comparable to the model-plus-dataset estimates, the estimates for the
 two fiscal years overlapping with the calendar year are used in a
 simple linear interpolation and linear extrapolation of the two fiscal
 year estimates to produce a calendar year estimate.  This linear
-adjustment is done by the [fy2cy.awk]() script using as input one of
-three files containing estimates for fiscal years 2023 and 2024:
-[cy23_cbo.csv](./cy23_cbo.csv), [cy23_jct.csv](./cy23_jct.csv), or
-[cy23_tsy.csv](./cy23_tsy.csv).  These three `.csv` files contain
-detailed information about the source of the federal agency estimates.
+adjustment is done by the `fy2cy.awk` script using as input one of
+three files containing federal agency estimates for fiscal years 2023
+and 2024: `cy23_cbo.csv`, `cy23_jct.csv`, or `cy23_tsy.csv`.  These
+three `.csv` files contain detailed information about the source of
+the federal agency estimates.
 
 Model-plus-Dataset Estimates for Phase 1
 ----------------------------------------
@@ -49,24 +49,25 @@ calendar year 2023.
 
 The second, which is called the `phase 1 dataset`, is a CSV-formatted
 version of the hierarchical dataset created for the
-[Policyengine-US](https://github.com/PolicyEngine) microsimulation
-model.  It contains 2022 CPS data, enhanced with 2015 TSY SOI PUF
-data, that is extrapolated to calendar year 2023.  (Subsequent phases
-of this project will develop other datasets.)
+[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us)
+microsimulation model.  It contains 2022 CPS data, enhanced with 2015
+TSY SOI PUF data, that is extrapolated to calendar year 2023.
+(Subsequent phases of this project will develop other datasets.)
 
 In both these input dataset cases, the same procedure is used to
 estimate the amounts corresponding to the federal agency estimates.
 This procedure involves using the Tax-Calculator's
 command-line-interface tool,
-[`tc`](https://taxcalc.pslmodels.org/guide/cli.html).  The payroll and
-individual income tax liabilities are estimated using the
+[`tc`](https://taxcalc.pslmodels.org/guide/cli.html), in the
+`examination/taxcalculator` directory.  The payroll and individual
+income tax liabilities are estimated using the
 [`clp.json`](./taxcalculator/clp.json) null reform to produce
 estimates for 2023 baseline tax policy.  Each tax expenditure estimate
 is generated using a simple reform that negates that feature of
 baseline tax policy.  The several `tc` runs are collected into a
-single shell script called [`runs.sh`](./taxcalculator/runs.sh), which
-in turn calls the [`execute.sh`](./taxcalculator/execute.sh) script
-for each run.  The simple tax expenditure reforms are included in the
+single shell script called `examination/taxcalculator/runs.sh`, which
+in turn calls the `examination/taxcalculator/execute.sh` script for
+each run.  The simple tax expenditure reforms are included in the
 following JSON files:
 
 * **CTC Tax Expenditure**: [`ctc.json`](./taxcalculator/ctc.json)
@@ -80,7 +81,11 @@ Tax-Calculator is being used as the model in phase 1, but that model
 contains no logic to estimate the ACA premium tax credit, so there is
 no `ptc.json` reform file.
 
+The results of the two sets of `tc` runs are in the
+`examination/taxcalculator/td23.res-expect` and the
+`examination/taxcalculator/pe23.res-expect` files.
+
 Model-plus-Dataset Estimates for Phase 2
 ----------------------------------------
 
-*Text to be added during phase 2 of the project.*
+*Text to be added at the end of phase 2 of the project.*
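
Note on the fiscal-year to calendar-year adjustment described in the methods.md hunk above: the diff does not show `fy2cy.awk`, so the following is only a minimal Python sketch of one plausible linear adjustment, not the script's actual logic. The function name `fy_to_cy` and the default weight of 0.25 are assumptions. The weight follows from the federal fiscal year starting on October 1, so that calendar year 2023 overlaps nine months of FY2023 and three months of FY2024.

```python
def fy_to_cy(fy_first: float, fy_second: float, weight_second: float = 0.25) -> float:
    """Linearly combine two adjacent fiscal-year estimates into one
    calendar-year estimate.

    ASSUMPTION: weight_second = 0.25 reflects the three calendar-year
    months (Oct-Dec) that fall in the second fiscal year; the real
    fy2cy.awk script may use different weights.

    A weight between 0 and 1 is an interpolation; a weight outside
    that range is a linear extrapolation along the same line.
    """
    return fy_first + weight_second * (fy_second - fy_first)


# Hypothetical amounts in $ billion, not taken from the cy23_*.csv files.
print(fy_to_cy(100.0, 120.0))        # interpolation -> 105.0
print(fy_to_cy(100.0, 120.0, 1.25))  # extrapolation -> 125.0
```

Writing the adjustment as a single line through the two fiscal-year points keeps interpolation and extrapolation in one formula, which matches the "linear interpolation and linear extrapolation" wording in the methods text.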
diff --git a/tax_microdata_benchmarking/examination/results.md b/tax_microdata_benchmarking/examination/results.md
@@ -0,0 +1,155 @@
+Data Examination Results
+========================
+
+This project is developing a new dataset for use by income and payroll
+tax microsimulation models.  The project is progressing in several
+phases.  At the end of each phase the current dataset is used as input
+to a tax microsimulation model to generate several basic tax
+statistics for calendar year 2023.  These statistics are then compared
+with corresponding estimates generated by federal agencies using their
+more comprehensive data.  Examining the differences between the
+statistics generated from this project's data and the statistics
+published by the federal agencies should provide ideas about how to
+improve the development of this project's dataset.
+
+For more on the source of the federal agency estimates and on how the
+model-plus-dataset estimates are generated, see the [examination
+methods](./methods.md) document.
+
+<br>
+
+**CY2023 Payroll Tax Liability** ($ billion)<br>
+(federal employee plus employer share)
+| Amount | Source |
+| ---:   | :---   |
+| 1580.0 | CBO    |
+| 1696.7 | Tax-Calculator + phase 1 dataset |
+| 1482.1 | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 Individual Income Tax Liability** ($ billion)<br>
+(federal individual income tax)
+| Amount | Source |
+| ---:   | :---   |
+| 2512.3 | CBO    |
+| 2012.9 | Tax-Calculator + phase 1 dataset |
+| 2154.4 | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 CTC Tax Expenditure** ($ billion)<br>
+(from the federal child tax credit)
+| Amount | Source |
+| ---:   | :---   |
+| 122.1  | JCT    |
+| 108.6  | TSY    |
+| 113.9  | Tax-Calculator + phase 1 dataset |
+| 126.6  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 EITC Tax Expenditure** ($ billion)<br>
+(from the federal earned income credit)
+| Amount | Source |
+| ---:   | :---   |
+| 71.9   | JCT    |
+| 63.6   | TSY    |
+| 64.0   | Tax-Calculator + phase 1 dataset |
+| 75.0   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 SSBEN Tax Expenditure** ($ billion)<br>
+(from excluding some social security benefits from federal AGI)
+| Amount | Source |
+| ---:   | :---   |
+| 45.9   | JCT    |
+| 31.4   | TSY    |
+| 46.9   | Tax-Calculator + phase 1 dataset |
+| 58.4   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 NIIT Tax Expenditure** ($ billion)<br>
+(from the 3.8% federal surtax on investment income)
+| Amount | Source |
+| ---:   | :---   |
+| -56.5  | JCT    |
+| ----   | TSY    |
+| -69.2  | Tax-Calculator + phase 1 dataset |
+| -56.6  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 CGQD Tax Expenditure** ($ billion)<br>
+(from taxing long-term capital gains and qualified dividends at lower federal rates)
+| Amount | Source |
+| ---:   | :---   |
+| 259.3  | JCT    |
+| 153.9  | TSY    |
+| 292.7  | Tax-Calculator + phase 1 dataset |
+| 224.5  | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 QBID Tax Expenditure** ($ billion)<br>
+(from the 20% federal qualified business income deduction)
+| Amount | Source |
+| ---:   | :---   |
+| 56.2   | JCT    |
+| 50.4   | TSY    |
+| 16.6   | Tax-Calculator + phase 1 dataset |
+| 18.2   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+**CY2023 ACA-PTC Tax Expenditure** ($ billion)<br>
+(from the federal ACA premium tax credit)
+| Amount | Source |
+| ---:   | :---   |
+| 83.9   | JCT    |
+| 53.0   | TSY    |
+| ----   | Tax-Calculator + phase 1 dataset |
+| ----   | Tax-Calculator + taxdata dataset |
+
+<br>
+
+Comments on Phase 1 Dataset
+---------------------------
+
+The phase 1 dataset, which is a flat-file version of the most recent
+PolicyEngine-US 2023 PUF-enhanced CPS input dataset, comes reasonably
+close to the federal agency estimates in most cases.
+
+The fact that it overestimates payroll tax liability by about 7%
+(relative to CBO) yet underestimates income tax liability by about
+20% (also relative to CBO) suggests the possibility that high
+earnings may be underrepresented in the phase 1 dataset and/or
+unearned income may be underrepresented in the phase 1 dataset.
+There are plans in future phases to do more detailed calculations
+that should illuminate the reasons for these discrepancies.
+
+The phase 1 dataset estimates for the CTC, EITC, and SSBEN tax
+expenditures are reasonably close to the JCT and TSY tax expenditure
+estimates.  The fact that the NIIT and CGQD tax expenditure estimates
+are larger in magnitude than the JCT estimates (NIIT by about 22% and
+CGQD by almost 13%) suggests that investment income, especially among
+high-AGI tax units, is overrepresented.  This is interesting because
+it casts doubt on the suggestion made in the previous paragraph that
+unearned income might be underrepresented in the phase 1 dataset.
+
+The phase 1 dataset estimate for the QBID tax expenditure is
+significantly lower than either federal agency estimate: about 70%
+lower than JCT and about 67% lower than TSY.  The model has been
+extensively tested with a variety of hypothetical tax units with
+qualified business income, so there is confidence that the model
+represents the QBID tax rules correctly.  However, there is a shortage
+of publicly available information about the attributes of the
+businesses generating the qualified business income, and it is exactly
+that kind of business information that is required to estimate
+accurately the QBID tax expenditure.
+
+And finally, Tax-Calculator does not include any logic for the
+Affordable Care Act premium tax credit, so it is impossible to
+generate an estimate of this tax expenditure in phase 1.
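
Note on the percentages quoted in the "Comments on Phase 1 Dataset" section above: they can be rechecked from the table amounts with a short calculation. The sketch below is illustrative only; the helper name `pct_gap` is hypothetical, and the ratio form is an assumption chosen so that the negative NIIT amounts are compared by relative size rather than by sign.

```python
def pct_gap(model: float, agency: float) -> float:
    """Percent by which the model estimate exceeds (positive) or falls
    short of (negative) the agency estimate in relative size.

    Uses the ratio model/agency, so it assumes both amounts are
    nonzero and have the same sign (true for every pair below).
    """
    return 100.0 * (model / agency - 1.0)


# (phase 1 dataset amount, federal agency amount) in $ billion,
# copied from the results.md tables in this diff.
PAIRS = {
    "Payroll tax vs CBO": (1696.7, 1580.0),
    "Income tax vs CBO": (2012.9, 2512.3),
    "NIIT vs JCT": (-69.2, -56.5),
    "CGQD vs JCT": (292.7, 259.3),
    "QBID vs JCT": (16.6, 56.2),
    "QBID vs TSY": (16.6, 50.4),
}

for label, (model, agency) in PAIRS.items():
    print(f"{label:20s} {pct_gap(model, agency):+6.1f}%")
```

The printed gaps line up with the rounded figures in the comments: roughly +7% and -20% for the two liability totals, about +22% and +13% for NIIT and CGQD, and about -70% and -67% for QBID.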
diff --git a/tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect b/tax_microdata_benchmarking/examination/taxcalculator/pe23.res-expect
@@ -0,0 +1,24 @@
+Weighted Tax Reform Totals by Baseline Expanded-Income Decile
+    Returns    ExpInc    IncTax    PayTax     LSTax    AllTax
+ A   189.54   18241.5    2012.9    1696.7       0.0    3709.6
+
+==> pe23-23-#-cgqd-#-tab.text <==
+ A   189.54   18241.5     292.7       0.0       0.0     292.7
+
+==> pe23-23-#-clp-#-tab.text <==
+ A   189.54   18241.5       0.0       0.0       0.0       0.0
+
+==> pe23-23-#-ctc-#-tab.text <==
+ A   189.54   18241.5     113.9       0.0       0.0     113.9
+
+==> pe23-23-#-eitc-#-tab.text <==
+ A   189.54   18241.5      64.0       0.0       0.0      64.0
+
+==> pe23-23-#-niit-#-tab.text <==
+ A   189.54   18241.5     -69.2       0.0       0.0     -69.2
+
+==> pe23-23-#-qbid-#-tab.text <==
+ A   189.54   18241.5      16.6       0.0       0.0      16.6
+
+==> pe23-23-#-ssben-#-tab.text <==
+ A   189.54   18241.5      46.9       0.0       0.0      46.9
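
Note on reading the `pe23.res-expect` file added above: its layout (an optional `==> name <==` header followed by an ` A ` totals row) is regular enough to parse mechanically. The following Python sketch is a hypothetical helper, not part of the repository; the function name `parse_res`, the `(no header)` label for the first block, and the column tuple are assumptions based only on the lines visible in this diff. The sample text omits the leading `+` diff markers.

```python
import re

# Matches header lines such as "==> pe23-23-#-ctc-#-tab.text <==".
HEADER = re.compile(r"^==> (?P<name>\S+) <==$")
# Column names as they appear in the file's second line.
COLUMNS = ("Returns", "ExpInc", "IncTax", "PayTax", "LSTax", "AllTax")


def parse_res(text: str, first_label: str = "(no header)") -> dict:
    """Map each block label to its 'A' (all deciles) totals row.

    ASSUMPTION: every totals row starts with 'A' followed by six
    numbers, and rows before the first header belong to first_label.
    """
    results = {}
    label = first_label
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            label = match.group("name")
            continue
        fields = line.split()
        if len(fields) == 7 and fields[0] == "A":
            results[label] = dict(zip(COLUMNS, map(float, fields[1:])))
    return results


SAMPLE = """\
Weighted Tax Reform Totals by Baseline Expanded-Income Decile
    Returns    ExpInc    IncTax    PayTax     LSTax    AllTax
 A   189.54   18241.5    2012.9    1696.7       0.0    3709.6

==> pe23-23-#-qbid-#-tab.text <==
 A   189.54   18241.5      16.6       0.0       0.0      16.6
"""

totals = parse_res(SAMPLE)
print(totals["(no header)"]["PayTax"])
print(totals["pe23-23-#-qbid-#-tab.text"]["IncTax"])
```

Keying the result by the header name keeps each reform's totals traceable to the `tc` output file it came from.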