Merge pull request #17 from martinholmer/examination-results
Add Phase 1 data examination results
martinholmer authored Feb 27, 2024
2 parents abf1ba4 + a74bd48 commit bf28cef
Showing 4 changed files with 212 additions and 17 deletions.
15 changes: 13 additions & 2 deletions README.md
@@ -1,5 +1,16 @@
# tax-microdata

This repository contains all working files for a project to develop a general-purpose validated microdata file for use in PolicyEngine and Tax-Calculator, benchmarked against existing microdata files used in either model.
This repository contains all working files for a project to develop a
general-purpose validated microdata file for use in
[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us) and
[Tax-Calculator](https://github.com/PSLmodels/Tax-Calculator). The
development will proceed in several phases.

To install, clone the repository and run `pip install -e .` from the root directory. To check that the installation was successful, run `make test` or `pytest .` from the root directory.
To install, clone the repository and run `pip install -e .` from the
root directory. To check that the installation was successful, run
`make test` or `pytest .` from the root directory.

To assess the microdata, review the [examination
results](./tax_microdata_benchmarking/examination/results.md), which
compare federal agency tax estimates with those generated using the
microdata file created in each project phase.
35 changes: 20 additions & 15 deletions tax_microdata_benchmarking/examination/methods.md
@@ -26,11 +26,11 @@ comparable to the model-plus-dataset estimates, the estimates for the
two fiscal years overlapping with the calendar year are used in a
simple linear interpolation and linear extrapolation of the two fiscal
year estimates to produce a calendar year estimate. This linear
adjustment is done by the [fy2cy.awk]() script using as input one of
three files containing estimates for fiscal years 2023 and 2024:
[cy23_cbo.csv](./cy23_cbo.csv), [cy23_jct.csv](./cy23_jct.csv), or
[cy23_tsy.csv](./cy23_tsy.csv). These three `.csv` files contain
detailed information about the source of the federal agency estimates.
adjustment is done by the `fy2cy.awk` script using as input one of
three files containing federal agency estimates for fiscal years 2023
and 2024: `cy23_cbo.csv`, `cy23_jct.csv`, or `cy23_tsy.csv`. These
three `.csv` files contain detailed information about the source of
the federal agency estimates.
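
A minimal sketch of one way to blend two fiscal year amounts into a
calendar year amount is shown below. It is not taken from `fy2cy.awk`,
whose contents are not shown here. The weights are an assumption based
on the federal fiscal year running from October through September, so
that calendar year 2023 overlaps fiscal year 2023 for nine months and
fiscal year 2024 for three months. The function name and the sample
amounts are hypothetical, and the sketch covers only the time-weighted
blend, not any extrapolation step.

```python
def fiscal_to_calendar(fy_current: float, fy_next: float,
                       weight_current: float = 0.75) -> float:
    """Blend two fiscal-year amounts into one calendar-year amount.

    ASSUMPTION: the calendar year overlaps the current fiscal year for
    nine of twelve months (weight 0.75) and the next fiscal year for the
    remaining three months (weight 0.25). The actual fy2cy.awk logic may
    differ.
    """
    return weight_current * fy_current + (1.0 - weight_current) * fy_next


# Hypothetical amounts in $ billion, not actual agency estimates.
cy_estimate = fiscal_to_calendar(100.0, 120.0)
print(f"calendar-year estimate: {cy_estimate:.1f}")
```

Keeping the weight as a parameter makes it easy to substitute whatever
blend the script actually uses.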

Model-plus-Dataset Estimates for Phase 1
----------------------------------------
@@ -49,24 +49,25 @@ calendar year 2023.

The second, which is called the `phase 1 dataset`, is a CSV-formatted
version of the hierarchical dataset created for the
[Policyengine-US](https://github.com/PolicyEngine) microsimulation
model. It contains 2022 CPS data, enhanced with 2015 TSY SOI PUF
data, that is extrapolated to calendar year 2023. (Subsequent phases
of this project will develop other datasets.)
[PolicyEngine-US](https://github.com/PolicyEngine/policyengine-us)
microsimulation model. It contains 2022 CPS data, enhanced with 2015
TSY SOI PUF data, that is extrapolated to calendar year 2023.
(Subsequent phases of this project will develop other datasets.)

For both input datasets, the same procedure is used to
estimate the amounts corresponding to the federal agency estimates.
This procedure involves using Tax-Calculator's
command-line-interface tool,
[`tc`](https://taxcalc.pslmodels.org/guide/cli.html). The payroll and
individual income tax liabilities are estimated using the
[`tc`](https://taxcalc.pslmodels.org/guide/cli.html), in the
`examination/taxcalculator` directory. The payroll and individual
income tax liabilities are estimated using the
[`clp.json`](./taxcalculator/clp.json) null reform to produce
estimates for 2023 baseline tax policy. Each tax expenditure estimate
is generated using a simple reform that negates that feature of
baseline tax policy. The several `tc` runs are collected into a
single shell script called [`runs.sh`](./taxcalculator/runs.sh), which
in turn calls the [`execute.sh`](./taxcalculator/execute.sh) script
for each run. The simple tax expenditure reforms are included in the
single shell script called `examination/taxcalculator/runs.sh`, which
in turn calls the `examination/taxcalculator/execute.sh` script for
each run. The simple tax expenditure reforms are included in the
following JSON files:

* **CTC Tax Expenditure**: [`ctc.json`](./taxcalculator/ctc.json)
@@ -80,7 +81,11 @@ Tax-Calculator is being used as the model in phase 1, but that model
contains no logic to estimate the ACA premium tax credit, so there is
no `ptc.json` reform file.

The results of the two sets of `tc` runs are in the
`examination/taxcalculator/td23.res-expect` and the
`examination/taxcalculator/pe23.res-expect` files.
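
As an illustration of what a single `tc` run might look like, the
sketch below builds a command line using the `--reform` and `--tables`
options of the `tc` tool and derives the name of the table file that
such a run would be expected to write. The dataset file name `pe23.csv`
and both helper functions are hypothetical, and the naming pattern is
inferred from table file names such as `pe23-23-#-ctc-#-tab.text`. What
`runs.sh` and `execute.sh` actually do is not shown here.

```python
from pathlib import Path


def tc_command(dataset_csv: str, tax_year: int, reform_json: str) -> list[str]:
    """Return an argument list for one tc run that writes a table file."""
    return ["tc", dataset_csv, str(tax_year),
            "--reform", reform_json, "--tables"]


def expected_table_name(dataset_csv: str, tax_year: int, reform_json: str) -> str:
    """Guess the table file name from the input, year, and reform names.

    ASSUMPTION: the pattern is <input>-<yy>-#-<reform>-#-tab.text, where
    each '#' marks an option (baseline, assumptions) that was not given.
    """
    stem = Path(dataset_csv).stem
    reform = Path(reform_json).stem
    return f"{stem}-{tax_year % 100:02d}-#-{reform}-#-tab.text"


# 'pe23.csv' is a hypothetical name for the phase 1 dataset file.
print(" ".join(tc_command("pe23.csv", 2023, "ctc.json")))
print(expected_table_name("pe23.csv", 2023, "ctc.json"))
```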

Model-plus-Dataset Estimates for Phase 2
----------------------------------------

*Text to be added during phase 2 of the project.*
*Text to be added at the end of phase 2 of the project.*
155 changes: 155 additions & 0 deletions tax_microdata_benchmarking/examination/results.md
@@ -0,0 +1,155 @@
Data Examination Results
========================

This project is developing a new dataset for use by income and payroll
tax microsimulation models. The project is progressing in several
phases. At the end of each phase the current dataset is used as input
to a tax microsimulation model to generate several basic tax
statistics for calendar year 2023. These statistics are then compared
with corresponding estimates generated by federal agencies using their
more comprehensive data. Examining the differences between the
statistics generated from this project's data and the statistics
published by the federal agencies should provide ideas about how to
improve the development of this project's dataset.

For more on the source of the federal agency estimates and on how the
model-plus-dataset estimates are generated, see the [examination
methods](./methods.md) document.

<br>

**CY2023 Payroll Tax Liability** ($ billion)<br>
(federal employee plus employer share)
| Amount | Source |
| ---: | :--- |
| 1580.0 | CBO |
| 1696.7 | Tax-Calculator + phase 1 dataset |
| 1482.1 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 Individual Income Tax Liability** ($ billion)<br>
(federal individual income tax)
| Amount | Source |
| ---: | :--- |
| 2512.3 | CBO |
| 2012.9 | Tax-Calculator + phase 1 dataset |
| 2154.4 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 CTC Tax Expenditure** ($ billion)<br>
(from the federal child tax credit)
| Amount | Source |
| ---: | :--- |
| 122.1 | JCT |
| 108.6 | TSY |
| 113.9 | Tax-Calculator + phase 1 dataset |
| 126.6 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 EITC Tax Expenditure** ($ billion)<br>
(from the federal earned income credit)
| Amount | Source |
| ---: | :--- |
| 71.9 | JCT |
| 63.6 | TSY |
| 64.0 | Tax-Calculator + phase 1 dataset |
| 75.0 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 SSBEN Tax Expenditure** ($ billion)<br>
(from excluding some social security benefits from federal AGI)
| Amount | Source |
| ---: | :--- |
| 45.9 | JCT |
| 31.4 | TSY |
| 46.9 | Tax-Calculator + phase 1 dataset |
| 58.4 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 NIIT Tax Expenditure** ($ billion)<br>
(from the 3.8% federal surtax on investment income)
| Amount | Source |
| ---: | :--- |
| -56.5 | JCT |
| ---- | TSY |
| -69.2 | Tax-Calculator + phase 1 dataset |
| -56.6 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 CGQD Tax Expenditure** ($ billion)<br>
(from taxing long-term capital gains and qualified dividends at lower federal rates)
| Amount | Source |
| ---: | :--- |
| 259.3 | JCT |
| 153.9 | TSY |
| 292.7 | Tax-Calculator + phase 1 dataset |
| 224.5 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 QBID Tax Expenditure** ($ billion)<br>
(from the 20% federal qualified business income deduction)
| Amount | Source |
| ---: | :--- |
| 56.2 | JCT |
| 50.4 | TSY |
| 16.6 | Tax-Calculator + phase 1 dataset |
| 18.2 | Tax-Calculator + taxdata dataset |

<br>

**CY2023 ACA-PTC Tax Expenditure** ($ billion)<br>
(from the federal ACA premium tax credit)
| Amount | Source |
| ---: | :--- |
| 83.9 | JCT |
| 53.0 | TSY |
| ---- | Tax-Calculator + phase 1 dataset |
| ---- | Tax-Calculator + taxdata dataset |

<br>

Comments on Phase 1 Dataset
---------------------------

The phase 1 dataset, which is a flat-file version of the most recent
PolicyEngine-US 2023 PUF-enhanced CPS input dataset, comes reasonably
close to the federal agency estimates in most cases.

The fact that it overestimates payroll tax liability by about 7% yet
underestimates income tax liability by about 20% (both relative to
CBO) suggests that high earnings and/or unearned income may be
underrepresented in the phase 1 dataset. More detailed calculations
planned for future phases should illuminate the reasons for these
discrepancies.

The phase 1 dataset estimates for the CTC, EITC, and SSBEN tax
expenditures are reasonably close to the JCT and TSY tax expenditure
estimates. The fact that the NIIT and CGQD tax expenditure estimates
are noticeably larger in magnitude than the JCT estimates (NIIT by
about 22% and CGQD by almost 13%) suggests that investment income,
especially among high-AGI tax units, is overrepresented. This is
interesting because it casts doubt on the suggestion made in the
previous paragraph that unearned income might be underrepresented in
the phase 1 dataset.

The phase 1 dataset estimate for the QBID tax expenditure is
significantly lower than either federal agency estimate: about 70%
lower than JCT and about 67% lower than TSY. The model has been
extensively tested with a variety of hypothetical tax units with
qualified business income, so there is confidence that the model
represents the QBID tax rules correctly. However, there is a shortage
of publicly available information about the attributes of the
businesses generating the qualified business income, and it is exactly
that kind of business information that is required to estimate the
QBID tax expenditure accurately.
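
The percentages quoted in the comments above can be checked against
the tables with a short calculation. The sketch below uses only
amounts that appear in the tables in this document. The `pct_diff`
helper and the dictionary labels are illustrative names, not part of
the project.

```python
def pct_diff(estimate: float, benchmark: float) -> float:
    """Percent difference of a model estimate from an agency benchmark.

    When both amounts are negative (as with NIIT), a positive result
    means the estimate is larger in magnitude than the benchmark.
    """
    return 100.0 * (estimate - benchmark) / benchmark


# (phase 1 dataset estimate, agency estimate), in $ billion for CY2023.
comparisons = {
    "Payroll tax vs CBO": (1696.7, 1580.0),
    "Income tax vs CBO": (2012.9, 2512.3),
    "NIIT vs JCT": (-69.2, -56.5),
    "CGQD vs JCT": (292.7, 259.3),
    "QBID vs JCT": (16.6, 56.2),
    "QBID vs TSY": (16.6, 50.4),
}

for label, (estimate, benchmark) in comparisons.items():
    print(f"{label}: {pct_diff(estimate, benchmark):+.1f}%")
```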

Finally, Tax-Calculator does not include any logic for the
Affordable Care Act premium tax credit, so it is impossible to
generate an estimate of this tax expenditure in phase 1.
@@ -0,0 +1,24 @@
Weighted Tax Reform Totals by Baseline Expanded-Income Decile
Returns ExpInc IncTax PayTax LSTax AllTax
A 189.54 18241.5 2012.9 1696.7 0.0 3709.6

==> pe23-23-#-cgqd-#-tab.text <==
A 189.54 18241.5 292.7 0.0 0.0 292.7

==> pe23-23-#-clp-#-tab.text <==
A 189.54 18241.5 0.0 0.0 0.0 0.0

==> pe23-23-#-ctc-#-tab.text <==
A 189.54 18241.5 113.9 0.0 0.0 113.9

==> pe23-23-#-eitc-#-tab.text <==
A 189.54 18241.5 64.0 0.0 0.0 64.0

==> pe23-23-#-niit-#-tab.text <==
A 189.54 18241.5 -69.2 0.0 0.0 -69.2

==> pe23-23-#-qbid-#-tab.text <==
A 189.54 18241.5 16.6 0.0 0.0 16.6

==> pe23-23-#-ssben-#-tab.text <==
A 189.54 18241.5 46.9 0.0 0.0 46.9
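
A listing in this format can be read programmatically. The sketch
below is a hypothetical helper, not part of the repository. It assumes
that each `==> ... <==` line names the table file whose `A` (all
returns) row follows, that the reform name sits between the two `-#-`
separators in that file name, and that the columns follow the
`Returns ExpInc IncTax PayTax LSTax AllTax` header shown above.

```python
def parse_tab_totals(text: str) -> dict[str, float]:
    """Map each reform name to the IncTax amount in its 'A' row."""
    totals: dict[str, float] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==>") and line.endswith("<=="):
            filename = line[3:-3].strip()
            # e.g. 'pe23-23-#-cgqd-#-tab.text' -> 'cgqd'
            current = filename.split("-#-")[1]
        elif line.startswith("A ") and current is not None:
            fields = line.split()
            # fields: A, Returns, ExpInc, IncTax, PayTax, LSTax, AllTax
            totals[current] = float(fields[3])
    return totals


# Two entries copied from the listing above.
sample = """\
==> pe23-23-#-cgqd-#-tab.text <==
A 189.54 18241.5 292.7 0.0 0.0 292.7

==> pe23-23-#-ctc-#-tab.text <==
A 189.54 18241.5 113.9 0.0 0.0 113.9
"""

print(parse_tab_totals(sample))
```

Rows that appear before any `==> ... <==` line are skipped, because
the sketch has no file name to attach them to.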
