Memory error #10

Open
canasdiaz opened this issue Nov 19, 2013 · 5 comments

$ python mlstats --db-user=root --db-password=root --db-name=mlstats_innodb --db-admin-user=root --db-admin-password=root https://lists.libresoft.es/pipermail/metrics-grimoire/ &> report.log

Traceback (most recent call last):
  File "mlstats", line 37, in <module>
    pymlstats.start()
  File "/home/luis/repos/MailingListStats/pymlstats/__init__.py", line 154, in start
    web_user, web_password)
  File "/home/luis/repos/MailingListStats/pymlstats/main.py", line 145, in __init__
    t,s,np = self.__analyze_mailing_list(mailing_list)
  File "/home/luis/repos/MailingListStats/pymlstats/main.py", line 298, in __analyze_mailing_list
    total, stored, non_parsed = self.__analyze_list_of_files(mailing_list, archives_to_analyze)
  File "/home/luis/repos/MailingListStats/pymlstats/main.py", line 451, in __analyze_list_of_files
    messages, non_parsed_messages = self.mail_parser.get_messages()
  File "/home/luis/repos/MailingListStats/pymlstats/analyzer.py", line 166, in get_messages
    filtered_message['body'])
  File "/home/luis/repos/MailingListStats/pymlstats/analyzer.py", line 265, in make_msgid
    m = hashlib.md5(message.encode('utf-8')).hexdigest()
MemoryError
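
The last frame shows `message.encode('utf-8')` building a complete encoded copy of the message before it is hashed, so a very large body needs a second full-size buffer at that point. A minimal sketch of hashing the text one slice at a time instead, assuming Python 3; the helper name `md5_hexdigest_chunked` and the `chunk_chars` parameter are hypothetical and not part of mlstats:

```python
import hashlib


def md5_hexdigest_chunked(text, chunk_chars=1 << 20):
    """Return the MD5 hex digest of text's UTF-8 encoding.

    Hypothetical helper: encodes and hashes one slice at a time, so only
    a small encoded buffer exists at any moment.
    """
    digest = hashlib.md5()
    for start in range(0, len(text), chunk_chars):
        # Slicing a str by code points and encoding each slice gives the
        # same bytes, in order, as encoding the whole string at once.
        digest.update(text[start:start + chunk_chars].encode("utf-8"))
    return digest.hexdigest()
```

This only lowers the peak memory of the hashing step. It does not explain why the message is so large in the first place.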

ghost assigned sduenas on Nov 19, 2013

sduenas commented Nov 19, 2013

There seems to be a problem parsing the file 2007-February.txt from the metrics-grimoire mailing list. When mlstats parses it, it loops forever.

sduenas commented Nov 19, 2013

Dave Neaty reported the same error almost two years ago:

https://bugzilla.libresoft.es/show_bug.cgi?id=325

gpoo commented Nov 19, 2013

Maybe related to #1

mlstats does not handle attachments correctly; for that matter, it does not handle MIME objects.

IIUC, some points of the rant on MIME parsers in http://jeffreystedfast.blogspot.ca/2013/09/time-for-rant-on-mime-parsers.html apply to mlstats (no, mlstats is not the target of the rant, but some things seem to apply to the way mlstats parses mbox files).
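
To illustrate what handling MIME objects could look like, here is a minimal sketch that uses the standard library `email` package to walk a message's parts and keep only the inline `text/plain` content, skipping attachments. It assumes Python 3; the function name `extract_text_body` is hypothetical and this is not how mlstats currently parses messages:

```python
import email
from email import policy


def extract_text_body(raw_bytes):
    """Return the inline text/plain content of a raw message (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_type() != "text/plain":
            continue  # skip HTML, images, binaries, ...
        if part.get_content_disposition() == "attachment":
            continue  # skip text files sent as attachments
        texts.append(part.get_content())  # decoded to str using the part's charset
    return "\n".join(texts)
```

With something along these lines, a large binary attachment would not end up inside the body string that is later hashed.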

gpoo added a commit to gpoo/MailingListStats that referenced this issue May 21, 2014

gpoo commented May 21, 2014

I have a branch where this problem is partially solved. The diff is here:
gpoo@ad87c8f

and the branch is https://github.com/gpoo/MailingListStats/tree/strictmbox

I say partially because in some messages my branch might not take into account an extra (empty) line that is in the message. I have not looked at it in detail, and I wrote it a couple of months ago, too long ago to remember :-)

gpoo commented May 23, 2014

FWIW, in the source code of mailbox, with respect to the old classes, there is the following comment:

# This algorithm, and the way it interacts with _search_start() and
# _search_end() may not be completely correct, because it doesn't check
# that the two characters preceding "From " are \n\n or the beginning of
# the file.  Fixing this would require a more extensive rewrite than is
# necessary.  For convenience, we've added a PortableUnixMailbox class
# which does no checking of the format of the 'From' line.

Even though the algorithm changed later, I don't think it changed in a way that solves this issue. Just to keep it in mind.
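
The check described in that comment can be sketched as follows: a line starting with "From " only opens a new message when it is at the beginning of the file or directly follows an empty line. This is a minimal illustration, assuming Python 3 and text lines; `split_mbox_strict` is a hypothetical name and this is not necessarily what the strictmbox branch or the `mailbox` module does:

```python
def split_mbox_strict(lines):
    """Yield one list of lines per message (hypothetical helper).

    A "From " line is treated as a separator only at the start of the
    input or right after an empty line.
    """
    current = []
    prev_blank = True  # the start of the file counts as a valid boundary
    for line in lines:
        if line.startswith("From ") and prev_blank:
            if current:
                yield current
            current = [line]
        else:
            # Includes body lines that merely happen to start with "From ".
            current.append(line)
        prev_blank = line.strip("\r\n") == ""
    if current:
        yield current
```

Any text that precedes the first separator is yielded as its own chunk, so a caller that wants to be stricter would have to discard it.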

gpoo added a commit to gpoo/MailingListStats that referenced this issue May 23, 2014