Configurable memory limit #420
I've been running into an issue where parsing large documents (a few hundred MiB) causes massive memory usage that can slow services down or crash them outright in production, sometimes via OOM kills. The ability to limit kuchiki's memory usage would be invaluable. A memory limit could also allow the use of preallocated buffers, which would do wonders for performance as well.
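To make the request concrete, here is one hypothetical shape such a knob could take. Nothing below exists in kuchiki or html5ever today; `ParseLimits`, `ParseError`, and `try_parse_html` are invented names for illustration only.

```rust
use kuchiki::traits::*;

// Hypothetical API sketch -- none of these types exist in kuchiki today.
// The idea: let callers state a budget up front and get an error back
// instead of an OOM kill when a document would exceed it.
pub struct ParseLimits {
    /// Refuse inputs larger than this many bytes before parsing starts.
    pub max_input_bytes: Option<usize>,
    /// Abort once the tree sink has allocated this many nodes.
    pub max_nodes: Option<usize>,
}

#[derive(Debug)]
pub enum ParseError {
    InputTooLarge { size: usize, limit: usize },
}

pub fn try_parse_html(html: &str, limits: &ParseLimits) -> Result<kuchiki::NodeRef, ParseError> {
    if let Some(limit) = limits.max_input_bytes {
        if html.len() > limit {
            return Err(ParseError::InputTooLarge { size: html.len(), limit });
        }
    }
    // A real implementation would thread `limits.max_nodes` through the
    // html5ever TreeSink so parsing can stop mid-document; no such hook
    // exists, so only the up-front size check is enforced in this sketch.
    Ok(kuchiki::parse_html().one(html))
}
```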
Comments

Are there any examples you can share? It seems bizarre that documents of a few hundred MiB can cause an OOM. While limiting buffers is one solution, this is probably a pathological case that the parser should be able to handle.
This (300 MiB) is our most problematic example; it and similarly sized ones have OOM'd a 15 GB server, and even smaller ones have similarly painful consequences. The code we use to handle HTML is here; it doesn't seem like it should have any specific issues.

Edit: here's the memory and CPU usage when that file was accessed; the cut is due to the OOM killer.
Other than downloading it from Firefox Send, it's possible to regenerate that file by running:
The file will be located at:
The file is a rendered code highlighting of a 946k-line source file.
As a point of interest, I just ran the jni-android-sys HTML through kuchiki's find_matches example program. htop didn't report the program taking more than 4 GB of memory over the course of its execution.
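For reference, a minimal standalone version of that measurement might look like the sketch below, along the lines of kuchiki's find_matches example. The file name and the `pre` selector are placeholders, not what the original run used.

```rust
use std::path::Path;

use kuchiki::traits::*;

fn main() -> std::io::Result<()> {
    // Parse the file straight from disk through html5ever's lossy UTF-8
    // decoder; the entire DOM is built in memory.
    let document = kuchiki::parse_html()
        .from_utf8()
        .from_file(Path::new("jni_android_sys.html"))?;

    // Iterate matches lazily rather than collecting them, so the only
    // large allocation is the DOM itself.
    for css_match in document.select("pre").unwrap() {
        // Touch each match so the traversal does real work.
        let _ = css_match.as_node().text_contents().len();
    }
    Ok(())
}
```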
We took a closer look at the metrics for the VM and there were four bursts of downloads from S3, so it's likely that four requests happened at the same time. That would explain the OOM if a single parse takes around 4 GB of RAM. An increase in memory usage of 10x the file size seems a little large to me; is it possible to improve that? We're hoping to start parsing only a single file at a time, which should help, but ideally we'd use less memory in the first place.
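The "one parse at a time" idea can be approximated with a process-wide lock around the parse. A minimal sketch, with `PARSE_GATE` and `parse_one_at_a_time` being invented names:

```rust
use std::sync::Mutex;

use kuchiki::traits::*;

// Process-wide gate so only one document is parsed at a time: peak
// memory is then bounded by the largest single DOM rather than the
// sum of every concurrent parse. (A counting semaphore would allow
// up to N parses instead of exactly one.)
static PARSE_GATE: Mutex<()> = Mutex::new(());

fn parse_one_at_a_time(html: &str) -> kuchiki::NodeRef {
    let _guard = PARSE_GATE.lock().unwrap();
    kuchiki::parse_html().one(html)
    // `_guard` drops here, releasing the gate for the next caller.
}
```

One caveat: kuchiki trees are Rc-based and not Send, so the returned DOM has to be consumed on the thread that parsed it; in an async server the gate would more naturally be a semaphore around the whole handle-one-file task.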
I've filed kuchiki-rs/kuchiki#73 as one quick win for memory usage in your use case.