Two different errors when reading two different files #54
Comments
Hi @Khris777 - what version of Python are you on?
I'm using Python 2.7.
@Khris777 can you try opening the files in binary mode? i.e.
With binary mode the script never finishes. It does not lock up; it just keeps running. Both files are smaller than 1 MB, so this is odd. When I kill the process after several minutes, it throws the usual
Any chance you could try to reproduce this on the 1.2 release that was just published?
I will once I figure out why the latest python-snappy version fails to install.
Okay, here is where things stand now. I installed parquet and snappy into my Python 3.6 environment, and there parquet works flawlessly: I can read everything just as I can with fastparquet. This was a fresh install, fetching a precompiled snappy wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and getting the latest parquet with pip. On Python 2.7, however, it still doesn't work. I updated the parquet package normally with pip after also installing the precompiled snappy wheel for 2.7. I have the same data in three different formats: uncompressed, snappy-compressed, and gzip-compressed. All three throw the same error, so it doesn't seem to be a compression problem. My testing code:
throws this error:
Without binary mode
Oh interesting. I'd love to try to recreate this issue. How are you generating the parquet file that it fails on?
The files are generated in Java on a Cloudera Hadoop cluster, version 5.4.4, by a colleague. I asked him for some code and he gave me the parts that write the parquet file, though they are part of a larger file:
Maybe this helps a little. I can't provide you with the files because of company policies.
I'm using parquet on Windows 10 and I have two different parquet files for testing: one is snappy-compressed, the other is uncompressed.
Simple test code for reading:
The uncompressed file throws this error:
Reading the compressed file like that gives:
I can open both files with fastparquet 0.0.5 just fine, so there's nothing wrong with the files.
What am I doing wrong?
Do I have to explicitly uncompress the data with snappy, or does parquet do that by itself?
More generally, could you provide some more documentation on basic usage?
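Since the thread turns on binary mode and on whether the files themselves are intact, a library-independent sanity check may help narrow things down. The sketch below is not part of the parquet package. It only assumes the standard Parquet file layout: the 4-byte magic `PAR1` at the start, and a 4-byte little-endian footer length followed by `PAR1` at the end. The helper name `looks_like_parquet` is hypothetical. It uses only the standard library and should run on both Python 2.7 and 3.x.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration; it checks only the outer framing,
    not the footer metadata or the column data.
    """
    size = os.path.getsize(path)
    # Smallest possible framing: magic + 4-byte footer length + magic.
    if size < 12:
        return False
    # "rb" matters: text mode on Windows may translate line endings
    # and corrupt the bytes that are read.
    with open(path, "rb") as fo:
        head = fo.read(4)
        fo.seek(-8, os.SEEK_END)
        footer_len_bytes = fo.read(4)
        tail = fo.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return False
    # The footer length is a little-endian 32-bit integer and must fit
    # between the leading magic and the trailing 8 bytes.
    (footer_len,) = struct.unpack("<i", footer_len_bytes)
    return 0 <= footer_len <= size - 12
```

If this returns `True` for both test files, the outer framing is intact and the problem is more likely in how the reader is invoked, for example text mode versus binary mode, than in the files themselves.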