This repository has been archived by the owner on Mar 6, 2019. It is now read-only.
I noticed that a lot of mappings were
a) just wrong (e.g. linked to the wrong record, like col a vs col b, or the wrong version number / line item)
b) missing (e.g. no field to capture some data in a record)
c) duplicated (e.g. multiple fields mapped to the same name)
d) inconsistently named
e) not well segregated (e.g. comma or newline within fields that aren't escaped and are comma/newline separated)
So I'm working on a major overhaul of the source mapping, deriving it directly from the eFilingFormats file (e-filing headers all versions.xlsx). While I'm at it, I'm having it support versions 1 & 2 as well as deprecated forms.
Because the data import will have to be re-done anyway (because of a-c above), I'm being a bit aggressive about making the names consistent and semantic — e.g. total_receipts_ytd instead of col_b_total_receipts. I'm hoping to reduce the total number of canonical field names from the current ~1.2k to something a bit more sane. ;-)
The new version will have a regex-based mapping file that uses the ASCII unit separator (US, ASCII 31) as its delimiter and carries field type/size data, both to make it easier to edit in the future and to make it possible to automatically output a database migration file.
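To make the idea concrete, here is a rough sketch of what reading such a mapping file could look like. It is not the actual format (which isn't finalized): the column order, the sample regexes, the name contributor_last_name and the helpers parse_mapping, fields_for and migration_columns are all assumptions for illustration. It is in Python only for brevity.

```python
import re

US = "\x1f"  # ASCII 31, the unit separator

# Hypothetical mapping rows, one per line:
# version regex, row-type regex, canonical name, SQL type, size.
# Commas inside a regex ("{1,2}") or a size ("12,2") are harmless,
# because fields are split on US rather than on commas.
MAPPING_TEXT = "\n".join([
    US.join([r"^[3-5]\.[0-9]{1,2}$", r"^F3[ANT]?$", "total_receipts_ytd", "decimal", "12,2"]),
    US.join([r"^[3-5]\.[0-9]{1,2}$", r"^SA", "contributor_last_name", "varchar", "30"]),
])

def parse_mapping(text):
    """Parse US-delimited mapping lines into dicts with compiled regexes."""
    rows = []
    for line in text.split("\n"):
        if not line:
            continue  # skip blank lines
        version_re, row_re, name, sql_type, size = line.split(US)
        rows.append({
            "version": re.compile(version_re),
            "row_type": re.compile(row_re),
            "name": name,
            "sql_type": sql_type,
            "size": size,
        })
    return rows

def fields_for(rows, version, row_type):
    """Return canonical names whose regexes match this version and row type."""
    return [r["name"] for r in rows
            if r["version"].search(version) and r["row_type"].search(row_type)]

def migration_columns(rows):
    """Build one column definition per canonical name, in first-seen order."""
    seen = {}
    for r in rows:
        seen.setdefault(r["name"], f'{r["name"]} {r["sql_type"]}({r["size"]})')
    return list(seen.values())

rows = parse_mapping(MAPPING_TEXT)
print(fields_for(rows, "5.3", "F3N"))
print(migration_columns(rows))
```

Because the type and size sit next to each name, the same file can drive both parsing and the generated migration, which is the point of keeping them together.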
I'm expecting to be done in about a week and will make a pull request then. Right now it's not in a fully consistent state.
So @dwillis et al, please hold off on working on this part of the code for the moment.
(Also, I'll be publishing an .sql.gz dump of the full import to date.)
This is a pretty significant undertaking; I appreciate the effort. I do want to say one thing about conventions: using something like ytd all the time isn't correct; in some cases col_b is cycle-to-date, and we want to reflect that where we can. Reducing the list of canonical field names is something I'm very interested in, but I want to make sure we don't lose anything we actually need.
Agreed re. naming. I intend to distinguish them properly. Thanks for
reminding me of the YTD vs CTD difference. (Do you happen to remember what
forms use it?)
In any case, it wouldn't risk losing anything. I'm enforcing unique names per
version and row type. It might just require a review to ensure the names are
apt; if they aren't, with my new scheme, renaming a column is very easy.
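For what it's worth, the uniqueness rule is simple to check mechanically. A minimal sketch, assuming each mapping entry can be reduced to a (version, row type, canonical name) tuple; the sample entries and the helper duplicate_names are made up for illustration:

```python
from collections import Counter

def duplicate_names(fields):
    """Return the (version, row_type, name) tuples that occur more than once."""
    counts = Counter(fields)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical entries: the second and third collide within the same
# version and row type, so one of them would need renaming.
fields = [
    ("5.3", "F3", "total_receipts_period"),
    ("5.3", "F3", "total_receipts_ytd"),
    ("5.3", "F3", "total_receipts_ytd"),
    ("5.3", "SA", "total_receipts_ytd"),  # same name, different row type: fine
]

print(duplicate_names(fields))
```

An empty result means no name is reused within a given version and row type, so no captured field gets silently overwritten by another.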