This repository has been archived by the owner on Jun 15, 2021. It is now read-only.

RAR library doesn't handle 5.0 format. #280

Open
gkoh opened this issue May 11, 2016 · 4 comments

@gkoh
Contributor

gkoh commented May 11, 2016

Just recently my postprocess issued an error on a single release:

2016-05-10 19:05:07 CRITICAL Traceback (most recent call last):
  File "/opt/local/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/opt/local/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
psycopg2.DataError: bigint out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/www/pynab/postprocess.py", line 57, in process_rars
    return pynab.rars.process(100)
  File "/var/www/pynab/pynab/rars.py", line 287, in process
    db.commit()
<snip>

PostgreSQL wasn't happy either, confirming the 'bigint out of range' error:

2016-05-10 19:05:06 ACST ERROR:  bigint out of range
2016-05-10 19:05:06 ACST STATEMENT:  UPDATE releases SET size=18446744073709551615, passworded='NO' WHERE releases.id = 1159275

I have since deleted the release from the DB to stop the errors.

I tracked this down to the built-in RAR library we use not handling that release properly (I downloaded the first part and pulled it apart with the library).
In particular, this release looks like RAR 5.0 format, which our library does not support. The library still finds a mostly matching RAR header and decodes the parts that are complete, but those don't include valid size data (the size bytes are all 0xFF).
Improving it to support RAR 5.0 doesn't look nice; it would be cleaner to rewrite :(
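
For reference, the size in the failed UPDATE is exactly the largest unsigned 64-bit value (eight 0xFF bytes), while PostgreSQL's bigint is a signed 64-bit type. Below is a minimal sketch of a guard that could run before the commit. `sanitize_size` is a hypothetical helper for illustration, not something that exists in pynab:

```python
# Hypothetical guard, not part of pynab: drop sizes that cannot be stored.
PG_BIGINT_MAX = 2**63 - 1   # largest value a PostgreSQL bigint can hold
UINT64_MAX = 2**64 - 1      # eight 0xFF bytes read as an unsigned integer


def sanitize_size(size):
    """Return size if it fits in a bigint, otherwise None (unknown)."""
    if size is None or size < 0 or size > PG_BIGINT_MAX:
        return None
    return size


# The value from the PostgreSQL log above is exactly UINT64_MAX.
bad = 18446744073709551615
print(bad == UINT64_MAX)    # True
print(sanitize_size(bad))   # None
```

This only hides the symptom; the header would still be parsed incorrectly.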

When we encountered similar issues (#217, #233), @Murodese mentioned it was the only library that could extract the needed information from partial headers, and I haven't found another one either.
newznab handles this with a hand-crafted RAR header decoder written in PHP.

I've been experimenting with unrarlib here:
https://github.com/matiasb/python-unrar

Basic testing suggests we can use it for the same purpose.
However, that Python library relies on ctypes and needs users to install the libunrar binary package (which should be available for all platforms):
http://www.rarlab.com/rar_add.htm

Of course, doing it this way makes it much more difficult for users to install and use pynab.

So, call for comments on either:

  1. Use an external library, make installation messier
  2. Spend time finding/writing a pure Python RAR header decoder
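
If option 1 were chosen, the extra dependency could at least be detected at startup instead of failing mid-postprocess. A rough sketch, assuming the shared library is discoverable under the name `unrar` (this varies by platform and is an assumption). `libunrar_available` and `pick_rar_backend` are hypothetical names, not part of pynab or python-unrar:

```python
import ctypes.util


def libunrar_available():
    """Return True if a shared library named 'unrar' can be located."""
    # find_library returns None when nothing is found on the search path.
    return ctypes.util.find_library("unrar") is not None


def pick_rar_backend():
    """Hypothetical selector: name of the header parser to use."""
    return "libunrar" if libunrar_available() else "builtin"


print(pick_rar_backend())
```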
@NeilBetham

To me it seems like a better idea to have a purpose-built tool that just parses headers than to try to contort an external lib into working on headers only. This would also reduce the chance of memory leaks inside a C extension. I can look into working on a new lib if people think that is the way it should go.

@gkoh
Contributor Author

gkoh commented Nov 17, 2016

Agreed. The current RAR header decoder is quite brittle, but functional for RAR 4.
There's a pretty good reference in the nZEDb project for all the compression headers they support (RAR5 included).
Additionally, the RAR5 header seems reasonably well documented.
It would be great to have a pure Python library if you have the time.
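
As a starting point, here is a minimal sketch of the two pieces a pure Python decoder would need first: telling RAR 4 from RAR 5 by signature, and reading RAR5's variable-length integers (7 data bits per byte, high bit as continuation flag, least significant group first). The function names are made up for illustration, and this is nowhere near a full header parser:

```python
RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"      # RAR 1.5 to 4.x
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"  # RAR 5.0


def detect_rar_version(data):
    """Return 5, 4 or None based on the leading signature bytes."""
    if data.startswith(RAR5_SIGNATURE):
        return 5
    if data.startswith(RAR4_SIGNATURE):
        return 4
    return None


def read_vint(data, offset=0):
    """Decode one RAR5 variable-length integer.

    Returns (value, next_offset). Raises ValueError if the data ends
    before the integer is complete, which matters for partial downloads.
    """
    value = 0
    shift = 0
    while offset < len(data):
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this integer
            return value, offset
        shift += 7
    raise ValueError("truncated vint")
```

Raising on truncated input, rather than returning whatever was decoded so far, would avoid the kind of bogus size that triggered this issue.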

@NeilBetham

Okay, I will work on updating or replacing the current lib.

@brookesy2
Collaborator

@NeilBetham Legend :)
