"close" syscall run twice on same file descriptor when loading binary LZW-compressed TIFF #7042
Updated example that actually leads to bugs:

**create_tiff_file.py**

```python
#!/usr/bin/env python
import numpy as np
import PIL.Image as Image

# Create a really big file so it will take time to load. But an even bigger
# file will cause libtiff to complain that we are trying to zip-bomb it.
Image.fromarray(np.random.rand(5000, 30000) > 0.9) \
    .save("example.tif", format="TIFF", compression="tiff_lzw")
```

**reproduce.py**

```python
#!/usr/bin/env python
from threading import Thread
import os
from collections import deque

finish = False

def load_files():
    descriptors = deque()
    while not finish:
        try:
            fd = os.open("test", os.O_RDONLY)
            descriptors.append(fd)
        except OSError:  # in case of too many open descriptors
            os.close(descriptors.pop())
    for d in descriptors:
        os.close(d)
    print("success")

t = Thread(target=load_files)

import PIL.Image as Image

print("Right before the with-block", flush=True)
with Image.open("example.tif") as mask:
    print("Right in the beginning of the with-block", flush=True)
    t.start()
    mask.load()  # replace it with time.sleep(10) and it works just fine without exceptions
    print("Right after load()", flush=True)
print("Right after the with-block", flush=True)
finish = True
t.join()
```

Steps to reproduce:

```
touch test
./create_tiff_file.py
./reproduce.py
```

Note: you need to delete `example.tif` and `test` before every run; otherwise the OS might cache the files and the problem won't reproduce, because libtiff will load the file too fast.
Cause of bug

If another thread decides to open a new file between steps 2 and 3 (like
This fixes the bug, but I'm not sure that libtiff will always close the descriptor, even if some error occurs:

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..d5ea99075 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1296,6 +1296,8 @@ class TiffImageFile(ImageFile.ImageFile):
                 self.fp.seek(0)
                 # 4 bytes, otherwise the trace might error out
                 n, err = decoder.decode(b"fpfp")
+                # fp should be closed after call to decode
+                fp = None
             else:
                 # we have something else.
                 logger.debug("don't have fileno or getvalue. just reading")
```
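The idea behind the patch can be sketched in isolation: once ownership of the descriptor has been handed off, clearing the variable keeps the later cleanup path from issuing a second close. A minimal illustration of the pattern, not Pillow's actual code (the hand-off to libtiff is simulated by closing the descriptor ourselves):

```python
import os

def decode_with_handoff(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        # stand-in for handing fd to libtiff, which closes it itself
        os.close(fd)
        fd = None  # descriptor is no longer ours to close
    finally:
        if fd is not None:
            os.close(fd)  # cleanup runs only if the hand-off never happened

decode_with_handoff(os.devnull)
print("no double close")
```

Without the `fd = None` line, the `finally` block would close an already-freed descriptor number, which is exactly the race described above.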
Because it's unclear whether libtiff will close the file descriptor or not, what about this as an alternative solution? It checks whether the file descriptor is open first, by trying to read 0 bytes. I've tested it against create_tiff_file.py and reproduce.py and it works.

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..97d184f5f 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1305,6 +1305,9 @@ class TiffImageFile(ImageFile.ImageFile):
             if fp:
                 try:
+                    # Test if the file descriptor is still open
+                    os.read(fp, 0)
+
                     os.close(fp)
                 except OSError:
                     pass
```
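The zero-byte read works as a cheap liveness probe: on Linux, `os.read` with a count of 0 still validates the descriptor, raising `OSError` with `EBADF` when it is already closed, without consuming any data. A standalone demonstration of that behavior (assumed for POSIX systems; Windows may differ):

```python
import errno
import os

fd = os.open(os.devnull, os.O_RDONLY)
assert os.read(fd, 0) == b""  # open descriptor: zero-byte read succeeds
os.close(fd)

caught = False
try:
    os.read(fd, 0)  # closed descriptor: rejected before any data is read
except OSError as e:
    caught = (e.errno == errno.EBADF)
print("closed descriptor detected:", caught)
```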
I reproduced the problem with this fix. Unfortunately, the reproduction code doesn't work in all cases, and I don't know how to make a better version without hooking something into libtiff. I can only suggest storing the file "test" on some very slow drive, to ensure the read is really slow.

How this can still lead to a bug:

I checked the libtiff code and it should always close the descriptor after performing the decode. It will stay open only if libtiff contains a bug (but that would probably lead to much more serious errors anyway): https://github.com/python-pillow/Pillow/blob/main/src/libImaging/TiffDecode.c#L723

If libtiff may not close the fd in some cases, the proper fix would be to either:
As to why we are closing the file pointer ourselves in the first place, it is because of #5936
As to why we're duping the file pointer in the first place, I think the original logic was:

One option was to use Python/our code for all the byte shuffling at the file level. IIRC there was a pretty decent performance and complexity improvement, for cases when we did have a file pointer, in being able to send that into libtiff and let it manage all of that. (edit: yeah, that's definitely my comment style)
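The duplication described above can be sketched with `os.dup`: Python keeps its original descriptor while the C library gets an independent copy whose lifetime it manages, so the library closing its copy doesn't invalidate Python's. A simplified illustration of the design (an assumption about the intent, not Pillow's actual hand-off code):

```python
import os

f = open(os.devnull, "rb")   # Python-managed file object
lib_fd = os.dup(f.fileno())  # independent duplicate handed to the "library"

os.close(lib_fd)             # the library closes its copy after decoding

data = f.read(0)             # the original descriptor is still valid
print("original descriptor unaffected:", data == b"")
f.close()
```

The bug arises when both sides believe they own the *same* descriptor number and each tries to close it.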
I met the same problem. Agree with @nurumaik. As a workaround, I changed `im = Image.open(file_path)` to `im = Image.open(io.BytesIO(open(file_path, "rb").read()))`.
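The workaround sidesteps the descriptor hand-off entirely: reading the file into a `BytesIO` means Pillow decodes from an in-memory buffer and never passes an OS-level descriptor to libtiff. A self-contained sketch (requires Pillow; the file name is a placeholder standing in for the problematic TIFF):

```python
import io
from PIL import Image

# create a small LZW-compressed TIFF to stand in for the problematic file
Image.new("1", (64, 64)).save("example.tif", format="TIFF", compression="tiff_lzw")

# workaround: decode from an in-memory copy instead of the file itself
with open("example.tif", "rb") as f:
    buf = io.BytesIO(f.read())
with Image.open(buf) as im:
    im.load()  # no real file descriptor is shared with libtiff here
    size = im.size
print("decoded from memory:", size)
```

The trade-off is that the whole compressed file is held in memory, which may matter for very large TIFFs.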
@crazyyao0 do you have a different way to reproduce the bug that you could share in a self-contained example?

To be clear, applying the suggested fix from #7042 (comment) does cause the scenario from #5936 to fail, which would otherwise have passed.

I've created PR #7199 to stop duplicating the file pointer in the first place. Does that resolve this?
I don't think this issue is fixed completely. I can still reproduce the "Bad file descriptor" error both on Windows and Linux in heavily multi-threaded applications, e.g. with the following script:

```python
import argparse
import tempfile
import threading

from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args()

def test():
    with Image.open(args.filename) as im, tempfile.NamedTemporaryFile('wb') as f:
        im.save(f, format="TIFF", compression="tiff_lzw")

threads = []
for i in range(32):
    t = threading.Thread(target=test, daemon=True)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

Run it with the attached example tiff (had to gzip it here so that GitHub would accept it), 3-pages.tiff.gz, and you will see many such errors. It seems that the
I tested it with the latest stable release 10.4.0 with the binary wheels from PyPI. Maybe increasing the number of threads could help? |
I've created #8458 to address this. Here are some wheels for you to test if you would like to verify the fix - wheels.zip |
Looks good to me, I tested with the provided wheels.
If I open an LZW-compressed binary TIFF file using Pillow and load its data, Pillow calls the "close" syscall on one of the two relevant file descriptors twice. The second call results in an "EBADF (Bad file descriptor)" error. The program doesn't crash and everything seems to work OK, but if multithreading is used and a race condition happens, this might crash the program.
In this example, the expected behaviour would be that there is only one "close" syscall on each of the two relevant file descriptors and that there is no "EBADF" error. Below, I show how to reproduce the problem.
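The reason a stray second close is dangerous under multithreading is descriptor-number reuse: POSIX hands out the lowest free number, so another thread's `open` issued between the two closes can receive exactly the number just freed, and the second close then silently destroys that thread's file. The reuse can be demonstrated single-threaded (POSIX semantics assumed):

```python
import os

fd = os.open(os.devnull, os.O_RDONLY)
os.close(fd)                            # first close frees the number

fd2 = os.open(os.devnull, os.O_RDONLY)  # stands in for another thread's open
reused = (fd2 == fd)                    # POSIX: lowest free number is returned

os.close(fd)                            # the buggy second close hits fd2 instead
victim_dead = False
try:
    os.read(fd2, 0)
except OSError:
    victim_dead = True
print("number reused:", reused, "| victim closed:", victim_dead)
```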
Also, I want to note that this once caused my neural-network training to crash: PyTorch's DataLoader uses multithreading and a race condition occurred.