What should be done if, given an ILCS, there are multiple matching IUCR codes? #4

Closed
bepetersn opened this issue Aug 11, 2014 · 3 comments

@bepetersn
Member

It seems that our iucr package currently does nothing useful when it finds more than one IUCR code while trying to associate an IUCR code with an ILCS statute reference. More specifically, the iucr package raises an exception, and statute.py in this data project responds by setting the disposition's iucr_code field to the empty string. In absolute terms, we are currently losing about 30% of our IUCR data to this alone.

However, it's actually a little worse than 30%, because some statutes are affected disproportionately. I plan to post a JSON document listing every statute for which this happens, along with counts for each. For now, consider 720-5/19-1(a) (burglary): around 15,000 dispositions have no IUCR code because of this issue, which translates into roughly half as many convictions with no IUCR code.

Here are some of the other statutes disproportionately affected by this issue:

  • 625-5/11-501(a)
  • 720-570/401(c-d)
  • 720-5/16-1(a)(1)
  • 625-5/4-103(a)(1)

In my opinion, there isn't an obvious solution to this problem. The shape of the data varies among statutes, but there is typically at least SOME relationship between the multiple IUCR codes associated with a single statute, so from one perspective it might not matter much which one we pick. The simplest thing I can think of is to return the first IUCR code associated with a statute. Where it adds value, we could make the choice slightly smarter, for instance by choosing the most "severe" IUCR code.
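A minimal sketch of the two fallback strategies proposed above: take the first matching code, or take the most "severe" one. It assumes the matches arrive as a list of IUCR code strings. The `SEVERITY` table, its codes and ranks, and the function names are placeholders invented for illustration; neither the iucr package nor this project defines a severity ordering like this.

```python
from typing import Dict, List, Optional

# Placeholder severity ranking (lower number = more severe).
# These codes and ranks are illustrative only, not a real ordering.
SEVERITY: Dict[str, int] = {
    "0610": 1,
    "0620": 2,
    "0630": 3,
}


def pick_first(codes: List[str]) -> Optional[str]:
    """Return the first matching IUCR code, or None if there are no matches."""
    return codes[0] if codes else None


def pick_most_severe(codes: List[str]) -> Optional[str]:
    """Return the code with the lowest (most severe) rank.

    Codes missing from SEVERITY are ranked last; on ties, min() keeps
    the earliest code in the list.
    """
    if not codes:
        return None
    return min(codes, key=lambda c: SEVERITY.get(c, float("inf")))
```

With this made-up table, `pick_first(["0620", "0610"])` returns `"0620"` while `pick_most_severe(["0620", "0610"])` returns `"0610"`. Either strategy always fills iucr_code, which is exactly the arbitrariness the next comment pushes back on.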

Thoughts, @ghing?

@bepetersn
Member Author

This is related to #3.

@ghing
Contributor

ghing commented Aug 11, 2014

@bepetersn Good catch.

More specifically, the iucr package raises an exception, which statute.py of this data project responds to by setting a disposition's iucr_code field to the empty string.

Is this precisely what happens? The iucr.lookup_by_ilcs() should return a list of Offense objects when an ILCS code maps to more than one offense. See https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L108 through https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L110.
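Since iucr.lookup_by_ilcs() can return a list, here is a minimal sketch of normalizing the lookup result so that statute.py could branch on the number of matches instead of blanking the field. It assumes the lookup returns a single object, a sequence of objects, or None when nothing matches; that return contract and the `Offense` stand-in below (with `code`, `description`, and `category` fields) are assumptions for illustration, not taken from the python-iucr source.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class Offense:
    """Hypothetical stand-in for iucr.Offense; only the fields used here."""
    code: str
    description: str
    category: str


# Assumed shapes a lookup might return.
LookupResult = Optional[Union[Offense, Sequence[Offense]]]


def as_offense_list(result: LookupResult) -> List[Offense]:
    """Normalize a lookup result into a (possibly empty) list of offenses."""
    if result is None:
        return []
    if isinstance(result, Offense):
        return [result]
    return list(result)
```

A caller can then treat `len(offenses) == 1` as an unambiguous match and hand anything longer to a disambiguation step, rather than treating multiple matches as an error.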

In any case, the important observation is that we currently don't try to disambiguate between the multiple IUCR codes and just set it to an empty string.

Here are a few thoughts off the top of my head:

  • Doing nothing (the current behavior) is wrong.
  • I don't want to arbitrarily set the IUCR code to the first or the most severe matching IUCR code.
  • Can you check whether we can use the final_chrgdesc field to disambiguate when an ILCS code maps to multiple IUCR codes? I added this field in a recent pull request, so you'll need a new version of the database dump to get it. I'll get started on adding it to drive. If you want to start looking at this, just check whether the chrgdesc and ammndchrgdescr fields provide enough to disambiguate. final_chrgdesc is simply ammndchrgdescr if it exists, otherwise chrgdesc.
  • In most cases, do the ambiguous IUCR mappings share the same IUCR category? I know that for 720-5/19-1(a) they span Burglary and Burglary or Theft From Motor Vehicle, but I don't think that's a huge deal. If the category is at least the same, we can leave the iucr_code field empty and just set the iucr_category field instead. For most of the questions we've seen so far, I think this gives us enough information to construct our queries.
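A minimal sketch of how the last two suggestions could fit together: compute final_chrgdesc as described, try to pick the offense whose description best matches it, and fall back to setting only the category when every candidate shares one. The `Offense` class, the word-overlap matching rule, and the `resolve()` name are assumptions for illustration, not part of python-iucr or statute.py. Real charge descriptions are often abbreviated, so an actual matcher would probably need more than shared words.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Offense:
    """Hypothetical stand-in for an IUCR offense; only the fields used here."""
    code: str
    description: str
    category: str


def final_chrgdesc(chrgdesc: Optional[str], ammndchrgdescr: Optional[str]) -> str:
    """Use the amended charge description if present, otherwise the original.

    An empty amended description is treated the same as a missing one.
    """
    return ammndchrgdescr or chrgdesc or ""


def _tokens(text: str) -> Set[str]:
    """Split into uppercase alphanumeric words (a deliberately crude normalization)."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.upper())
    return set(cleaned.split())


def resolve(offenses: List[Offense], charge_desc: str) -> Tuple[str, str]:
    """Return (iucr_code, iucr_category), using "" for an empty field.

    1. A single match is used as-is.
    2. Otherwise, pick the offense whose description shares the most words
       with the charge description, if exactly one offense has the top score.
    3. Otherwise, if all matches share a category, leave the code empty
       and set only the category.
    4. Otherwise, leave both fields empty.
    """
    if not offenses:
        return "", ""
    if len(offenses) == 1:
        return offenses[0].code, offenses[0].category

    charge = _tokens(charge_desc)
    scores = [len(charge & _tokens(o.description)) for o in offenses]
    best = max(scores)
    if best > 0 and scores.count(best) == 1:
        winner = offenses[scores.index(best)]
        return winner.code, winner.category

    categories = {o.category for o in offenses}
    if len(categories) == 1:
        return "", categories.pop()
    return "", ""
```

For example, with two hypothetical offenses described as "BURGLARY FORCIBLE ENTRY" and "BURGLARY UNLAWFUL ENTRY" in the same category, a charge description of "BURGLARY" ties and yields only the category, while "UNLAWFUL ENTRY BURGLARY" selects the second offense.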

@bepetersn
Member Author

#4 is part of an answer to this. The rest is that we ultimately don't need IUCR codes for every single statute, especially when the gaps are caused not by our own mistakes but by the way statutes and IUCR codes get assigned.

Between using charge descriptions and @ghing's work to roll up IUCR codes into our categories of interest, we will be able to handle statutes with multiple IUCR codes.
