Memory Leak on extracting text from files #138

Open
raghav-axero opened this issue Sep 20, 2019 · 0 comments

raghav-axero commented Sep 20, 2019

We noticed that every time we extract text with TikaOnDotNet, memory usage grows and is not released after the extraction has finished.

The code is as simple as the one given in your samples:

new TextExtractor().Extract(filePhysicalPath);

We are already using the latest DLLs:

TikaOnDotNet.dll (version 1.17.1.0)
TikaOnDotNet.TextExtraction.dll (version 1.17)

IIS version we are testing on: 10

Target Framework: 4.7.2

Memory leak detection with ANTS Profiler:

The first snapshot is the baseline, taken before we started any extraction; the second was taken after the extraction had completed.

The second snapshot confirms that memory usage increased and stayed elevated even after the extraction had completed.

[Screenshots: ANTS Profiler results]

You can see from the above screenshots that live "LinkedHashMap + Entry" objects from "java.util" remain in memory even after the extraction has completed.

I am attaching a PDF (200 MB) that you can use to reproduce the above test:
https://drive.google.com/file/d/1DWdWfkHebS9aLpqiLAbaRwwiSamGw8Ym/view?usp=sharing

EDIT:

If I run the following code before and after the Tika extraction, memory returns to normal levels:

                // Force a full, blocking GC to reclaim the memory retained after Tika extraction
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
                GC.WaitForPendingFinalizers();