Structures cause error #32
Comments
@SergeNov thanks for the report. I'll attempt to reproduce and fix the issue.
jcrobak added a commit that referenced this issue Oct 1, 2016
Rather than flattening schemas and adding a '.' between paths in the schema (e.g. `foo.bar`), support the schema path as a first-class object (for schema operations, at least). This is an experimental implementation and likely has bugs, but it supports some simple cases.

This implementation changes behavior. Specifically:
* `DictReader()` now has a `flatten` argument that defaults to `False`. If `flatten` is false, DictReader will read nested data as `{'foo': {'bar': 1}}` instead of as `{'foo.bar': 1}`.
* Likewise, this is the new default behavior for the command-line tool with `--format json`. This can be changed with `--flatten`.

Known issues:
* Repetition levels still aren't supported. A file with arrays will break.
* Nulls aren't interpreted at the correct level (e.g. `{"foo": null}` will be interpreted as `{"foo": {"bar": null}}` if `foo` has a child of `bar`).

Refs: #32
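To make the two record shapes in the commit message concrete, here is a minimal sketch of how a nested record maps to the old dotted-key form. It does not call the library at all; the `flatten` helper below is hypothetical and only illustrates the difference between `{'foo': {'bar': 1}}` and `{'foo.bar': 1}`.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration only; not the library's code.
    """
    flat = {}
    for key, value in record.items():
        path = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into the struct and keep extending the dotted path.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


nested = {"foo": {"bar": 1}, "baz": 2}   # shape read when flatten is false
flat = flatten(nested)                   # shape of the older, flattened output
print(flat)  # {'foo.bar': 1, 'baz': 2}
```

Note that the dotted form cannot distinguish a null struct from a struct whose children are null, which is related to the null-handling caveat listed under known issues.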
Still experiencing this issue in version 1.3.1.
Hi Joe and others
I am trying to use your module to read a parquet file, and I ran into a problem here:
schema.py, line 21:
assert len(self.schema_elements) == len(self.schema_elements_by_name)
Apparently the init method assumes that no two schema elements share a name, but my structure has multiple fields with the same name, so the assertion fails. The module works correctly if you comment out this line, though.
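A small sketch of why that length check can fail. It assumes `schema_elements_by_name` is a dict keyed by element name (the assertion suggests this, but I have not confirmed it against schema.py); the `SchemaElement` namedtuple is a hypothetical stand-in for the real Thrift object.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the real schema element; only `name` matters here.
SchemaElement = namedtuple("SchemaElement", ["name", "num_children"])

schema_elements = [
    SchemaElement("campaignsinfo", 1),
    SchemaElement("bag", 1),
    SchemaElement("array_element", 6),
    SchemaElement("media_types", 1),
    SchemaElement("bag", 1),
    SchemaElement("array_element", None),
]

# Assumed construction: a dict keyed by name silently keeps one entry per name.
schema_elements_by_name = {se.name: se for se in schema_elements}

name_counts = Counter(se.name for se in schema_elements)
duplicates = sorted(name for name, n in name_counts.items() if n > 1)

print(len(schema_elements), len(schema_elements_by_name))  # 6 4
print(duplicates)  # ['array_element', 'bag']
```

With repeated names the two lengths differ, so an assertion that they are equal raises `AssertionError`. Commenting the assertion out hides the symptom, but lookups by bare name would still return only one of the duplicated elements.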
Originally these files were used by Hive, and here is the list of fields in the table:
fileid bigint,
version bigint,
ip_geocode struct<countrycode:string,regionname:string,city:string,postalcode:string,metrocode:string,dmacode:string>,
timestamp bigint,
region bigint,
pixel bigint,
uuid bigint,
uuid_exists boolean,
referingurl string,
useragent string,
ip string,
querystring string,
campaignsinfo array<struct<campaign_id:bigint,media_types:array<bigint>,advertiser_id:bigint,funnel_step_id:bigint,funnel_step_value:bigint,track_conversion:boolean>>,
opted_out boolean,
event_id string
Here is the list of fields that the module sees:
name=u'hive_schema', field_id=None, repetition_type=None, type_length=None, precision=None, num_children=17, converted_type=None, type=None
name=u'fileid', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'version', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'ip_geocode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=6, converted_type=None, type=None
name=u'countrycode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'regionname', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'city', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'postalcode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'metrocode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'dmacode', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'timestamp', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'region', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'pixel', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'uuid', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'uuid_exists', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'referingurl', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'useragent', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'ip', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'querystring', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'campaignsinfo', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=1, converted_type=3, type=None
name=u'bag', field_id=None, repetition_type=2, type_length=None, precision=None, num_children=1, converted_type=None, type=None
name=u'array_element', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=6, converted_type=None, type=None
name=u'campaign_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'media_types', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=1, converted_type=3, type=None
name=u'bag', field_id=None, repetition_type=2, type_length=None, precision=None, num_children=1, converted_type=None, type=None
name=u'array_element', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'advertiser_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'funnel_step_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'funnel_step_value', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=2
name=u'track_conversion', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'opted_out', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=0
name=u'event_id', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=6
name=u'dt', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=1
name=u'hr', field_id=None, repetition_type=1, type_length=None, precision=None, num_children=None, converted_type=None, type=1
Apparently there are two elements named 'array_element' and two named 'bag'. I assume these fields just come with array structures.
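One way to tell the repeated names apart is to key each element by its full path instead of its bare name. The sketch below assumes the list above is in depth-first order and that `num_children` gives the number of direct children, which matches the dump (the root reports 17 children). The `schema_paths` function and the shortened element list are hypothetical, not part of the module.

```python
def schema_paths(elements):
    """Return dotted paths for every element below the root.

    `elements` is a depth-first list of (name, num_children) pairs, with
    num_children set to None for leaf columns. The first pair is the root.
    """
    it = iter(elements)
    _root_name, root_children = next(it)
    paths = []

    def walk(prefix, count):
        # Consume `count` direct children, recursing into nested groups.
        for _ in range(count):
            name, num_children = next(it)
            path = prefix + (name,)
            paths.append(".".join(path))
            if num_children:
                walk(path, num_children)

    walk((), root_children)
    return paths


# Shortened version of the schema above, keeping the duplicated names.
elements = [
    ("hive_schema", 3),
    ("campaignsinfo", 1),
    ("bag", 1),
    ("array_element", 2),
    ("campaign_id", None),
    ("media_types", 1),
    ("bag", 1),
    ("array_element", None),
    ("opted_out", None),
    ("event_id", None),
]

for path in schema_paths(elements):
    print(path)
```

Here the two 'bag' elements become `campaignsinfo.bag` and `campaignsinfo.bag.array_element.media_types.bag`, so a mapping keyed by path keeps one entry per element.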