Runtimes

Full Data

TARGET=outcome
DROPS='search_conducted,search_disposition,search_type'

~15 minutes for type inference and univariate metrics ~30 minutes to tune first embedded model (I think LGBM?) ~24 hours to tune second embedded model (Linear?)

Observations:

### Continuous Features (Accuracy: Higher = More important)

| feature         |       acc |
|:----------------|----------:|
| hour_of_stop    | 1.200e-02 |
| year_of_stop    | 7.333e-03 |
| license_lat     | 1.333e-03 |
| vehicle_year    | 0.000e+00 |
|       ...       |    ...    |

### Categorical Features (Accuracy: Higher = More important)

| feature           |       acc |
|:------------------|----------:|
| violation_type    | 3.627e-01 |
| chrg_title        | 1.140e-01 |
| chrg_sect_mtch    | 9.200e-02 |
| chrg_title_mtch   | 5.933e-02 |
| stop_chrg_title   | 4.867e-02 |
| article           | 2.600e-02 |
| race              | 2.267e-02 |
| accident          | 1.333e-02 |
| acc_blame         | 1.333e-02 |
| prop_dmg          | 6.000e-03 |
| unmarked_arrest   | 4.667e-03 |
| pers_injury       | 4.000e-03 |
| licensed_outstate | 4.000e-03 |
| comm_license      | 4.000e-03 |
| belts             | 3.333e-03 |
| patrol_entity     | 2.667e-03 |
| vehicle_make      | 2.000e-03 |
| sex               | 0.000e+00 |
|        ...        |    ...    |

Above suggests charge and violation_type should be removed as features (are results, not causes). This also means charge-derived features (eg. "chrg_*_mtch" features) are suspect for leakage. This also includes the "article" feature.

Features

subagency: categorical

'3rd District, Silver Spring'
'6th District, Gaithersburg / Montgomery Village'
'4th District, Wheaton'
'2nd District, Bethesda'
'Headquarters and Special Operations'
'5th District, Germantown'
'1st District, Rockville'
'W15'
'S15' latitude: continuous
latitude of stop longitude: continuous
longitude of stop accident: bool
is accident or not belts: bool
seatbelts worn / present pers_injury: bool
any person injured prop_dmg: bool
any property damage fatal: bool
any fatalities comm_license: bool
commercial license hazmat: bool
??? comm_vehicle: bool
commercial vehicle alcohol: bool
alcohol involved work_zone: bool
stop in a work zone search_conducted: bool
if search conducted on stop search_disposition: categorical
"nan", "Property Only", None, "Contraband Only", "Contraband and Property" search_type: categorical
'nan', 'Both', 'Person', 'Property', None vehicle_type: categorical
auto, truck, other, cycle, trailer, bus, farm, etc., 10 categories vehicle_year: ordinal
year of vehicle vehicle_make: categorical
chevy, honda, toyota, etc. vehicle_color: categorical
paint color violation_type: categorical
Citation, ESERO, SERO, Warning article: categorical
'Maryland Rules', 'Transportation Article', 'nan', 'other' acc_blame: bool
accept blame? >97% are true, not a very useful feature race: categorical
ancestry sex: categorical
biological sex ("M", "F", "U") patrol_entity: categorical
e.g. marked patrol, unmarked patrol, marked laser, etc. hour_of_stop: continuous
0-24.0 year_of_stop: ordinal
2023, ... month_of_stop: ordinal
0-11 weeknum_of_stop: ordinal
week of year weekday_of_stop: ordinal
day of week reg_lat: continuous
registration latitude reg_long: continuous
registration longitude reg_km: continuous
distance between vehicle registration location and stop/search location home_lat: continuous
home latitude (converted from state) home_long: continuous
home longitude (converted from state) license_lat: continuous
license latitude license_long: continuous
license longitude chrg_title: ordinal
charge title number (suspected charge / stop reason) stop_chrg_title: ordinal
stop actual charge title number chrg_title_mtch: bool
if suspected charge matches stop charge chrg_sect_mtch: bool
if suspected charge section number matches stop charge section number unmarked_arrest: bool
if arrest was unmarked tech_arrest: bool
if arrest used technology only (e.g. laser, camera) home_outstate: bool
if driver lives out of state licensed_outstate: bool
if driver is licensed out of state

outcome: categorical [TARGET]

'Citation', 'Arrest', 'nan', 'Warning', 'SERO', None

Drops (Invalid Predictors)

There are a number of highly-related categorical variables that can each function as a target (i.e. are caused outcomes), and so should never be included in the same model as features. The Cramer V correlations are:

               final   stop    charge   charge
               charge  charge  section  title   search  search          viol-
               title   title   matches  matches condu-  dispos  search  ation
                               stop     stop    -cted   ion     type    type   article

outcome 0.16 0.12 0.61 0.70 0.82 0.50 0.49 0.52 0.35 article 0.71 0.46 0.21 0.16 0.03 0.02 0.02 0.58 violation_type 0.34 0.14 0.28 0.20 0.16 0.07 0.09 search_type 0.04 0.04 0.03 0.08 1.00 0.61 search_disposition 0.03 0.04 0.03 0.06 1.00 search_conducted 0.06 0.05 0.03 0.08 chrg_title_mtch 0.29 0.15 0.80 chrg_sect_mtch 0.27 0.08 stop_chrg_title 0.60

search_conducted: bool

if search conducted on stop search_disposition: categorical
"nan", "Property Only", None, "Contraband Only", "Contraband and Property" search_type: categorical
'nan', 'Both', 'Person', 'Property', None

chrg_title: ordinal

charge title number (suspected charge / stop reason) stop_chrg_title: ordinal
stop actual charge title number chrg_title_mtch: bool
if suspected charge matches stop charge chrg_sect_mtch: bool
if suspected charge section number matches stop charge section number

violation_type: categorical

Citation, ESERO, SERO, Warning outcome: categorical [TARGET]
'Citation', 'Arrest', 'nan', 'Warning', 'SERO', None

Unhelpful

article: categorical

'Maryland Rules', 'Transportation Article', 'nan', 'other' acc_blame: bool
accept blame? >97% are true, not a very useful feature

Targets

search_conducted: bool

if search conducted on stop search_disposition: categorical
"nan", "Property Only", None, "Contraband Only", "Contraband and Property" search_type: categorical
'nan', 'Both', 'Person', 'Property', None

chrg_title_mtch: bool

if suspected charge matches stop charge chrg_sect_mtch: bool
if suspected charge section number matches stop charge section number

Runs

TARGET=outcome
TARGET=violation_type
TARGET=search_conducted
TARGET=chrg_title_mtch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

traffic_notes.md

traffic_notes.md

Runtimes

Full Data

Features

Drops (Invalid Predictors)

Unhelpful

Targets

Runs

Files

traffic_notes.md

Latest commit

History

traffic_notes.md

File metadata and controls

Runtimes

Full Data

Features

Drops (Invalid Predictors)

Unhelpful

Targets

Runs