"close" syscall run twice on same file descriptor when loading binary LZW-compressed TIFF #7042
Updated example that actually leads to bugs:

**create_tiff_file.py**

```python
#!/usr/bin/env python
import numpy as np
import PIL.Image as Image

# Create a really big file so it will take time to load. But an even bigger
# file will cause libtiff to complain that we are trying to zip-bomb it.
Image.fromarray(np.random.rand(5000, 30000) > 0.9) \
    .save("example.tif", format="TIFF", compression="tiff_lzw")
```

**reproduce.py**

```python
#!/usr/bin/env python
from threading import Thread
import os
from collections import deque

finish = False

def load_files():
    descriptors = deque()
    while not finish:
        try:
            fd = os.open("test", os.O_RDONLY)
            descriptors.append(fd)
        except OSError:  # in case of too many open descriptors
            os.close(descriptors.pop())
    for d in descriptors:
        os.close(d)
    print("success")

t = Thread(target=load_files)

import PIL.Image as Image

print("Right before the with-block", flush=True)
with Image.open("example.tif") as mask:
    print("Right in the beginning of the with-block", flush=True)
    t.start()
    mask.load()  # replace it with time.sleep(10) and it works just fine without exceptions
    print("Right after load()", flush=True)
print("Right after the with-block", flush=True)
finish = True
t.join()
```

Steps to reproduce:

```
touch test
./create_tiff_file.py
./reproduce.py
```

Note: you need to delete `example.tif` and `test` before every run; otherwise the OS might cache the files and the problem won't reproduce, because libtiff will load the file too fast.
Cause of bug

If another thread decides to open a new file between steps 2 and 3 (like
This fixes the bug, but I'm not sure that libtiff will always close the descriptor, even if some error occurs:

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..d5ea99075 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1296,6 +1296,8 @@ class TiffImageFile(ImageFile.ImageFile):
                 self.fp.seek(0)
                 # 4 bytes, otherwise the trace might error out
                 n, err = decoder.decode(b"fpfp")
+                # fp should be closed after call to decode
+                fp = None
             else:
                 # we have something else.
                 logger.debug("don't have fileno or getvalue. just reading")
```
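The idea behind the patch can be sketched in isolation: once ownership of the descriptor has been handed off, clearing the variable keeps the later cleanup path from issuing a second close. A minimal illustration of the pattern, not Pillow's actual code (the hand-off to libtiff is simulated by closing the descriptor ourselves):

```python
import os

def decode_with_handoff(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        # stand-in for handing fd to libtiff, which closes it itself
        os.close(fd)
        fd = None  # descriptor is no longer ours to close
    finally:
        if fd is not None:
            os.close(fd)  # cleanup runs only if the hand-off never happened

decode_with_handoff(os.devnull)
print("no double close")
```

Without the `fd = None` line, the `finally` block would close an already-freed descriptor number, which is exactly the race described above.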
Because it's unclear whether libtiff will close the file descriptor or not, what about this as an alternative solution? It checks whether the file descriptor is open first, by trying to read 0 bytes. I've tested it against create_tiff_file.py and reproduce.py and it works.

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..97d184f5f 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1305,6 +1305,9 @@ class TiffImageFile(ImageFile.ImageFile):
             if fp:
                 try:
+                    # Test if the file descriptor is still open
+                    os.read(fp, 0)
+
                     os.close(fp)
                 except OSError:
                     pass
```
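The zero-byte read works as a cheap liveness probe: on Linux, `os.read` with a count of 0 still validates the descriptor, raising `OSError` with `EBADF` when it is already closed, without consuming any data. A standalone demonstration of that behavior (assumed for POSIX systems; Windows may differ):

```python
import errno
import os

fd = os.open(os.devnull, os.O_RDONLY)
assert os.read(fd, 0) == b""  # open descriptor: zero-byte read succeeds
os.close(fd)

caught = False
try:
    os.read(fd, 0)  # closed descriptor: rejected before any data is read
except OSError as e:
    caught = (e.errno == errno.EBADF)
print("closed descriptor detected:", caught)
```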
I reproduced the problem with this fix. Unfortunately, the reproduction code doesn't work in all cases, and I don't know how to make a better version without hooking something into libtiff. I can only suggest storing the file "test" on some very slow drive, to ensure the read is really slow.

How this can still lead to a bug:

I checked the libtiff code and it should always close the descriptor after performing the decode. It will stay open only if libtiff contains a bug (but that would probably lead to much more serious errors anyway): https://github.com/python-pillow/Pillow/blob/main/src/libImaging/TiffDecode.c#L723

If libtiff may not close the fd in some cases, the proper fix would be to either:
As to why we are closing the file pointer ourselves in the first place, it is because of #5936
As to why we're duping the file pointer in the first place, I think the original logic was:

One option was to use Python/our code for all the byte shuffling at the file level. IIRC there was a pretty decent performance and complexity improvement, for cases when we did have a file pointer, in being able to send that into libtiff and let it manage all of that. (edit: yeah, that's definitely my comment style)
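The duplication described above can be sketched with `os.dup`: Python keeps its original descriptor while the C library gets an independent copy whose lifetime it manages, so the library closing its copy doesn't invalidate Python's. A simplified illustration of the design (an assumption about the intent, not Pillow's actual hand-off code):

```python
import os

f = open(os.devnull, "rb")   # Python-managed file object
lib_fd = os.dup(f.fileno())  # independent duplicate handed to the "library"

os.close(lib_fd)             # the library closes its copy after decoding

data = f.read(0)             # the original descriptor is still valid
print("original descriptor unaffected:", data == b"")
f.close()
```

The bug arises when both sides believe they own the *same* descriptor number and each tries to close it.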
I met the same problem. Agree with @nurumaik. As a workaround, I changed `im = Image.open(file_path)` to `im = Image.open(io.BytesIO(open(file_path, "rb").read()))`.
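The workaround sidesteps the descriptor hand-off entirely: reading the file into a `BytesIO` means Pillow decodes from an in-memory buffer and never passes an OS-level descriptor to libtiff. A self-contained sketch (requires Pillow; the file name is a placeholder standing in for the problematic TIFF):

```python
import io
from PIL import Image

# create a small LZW-compressed TIFF to stand in for the problematic file
Image.new("1", (64, 64)).save("example.tif", format="TIFF", compression="tiff_lzw")

# workaround: decode from an in-memory copy instead of the file itself
with open("example.tif", "rb") as f:
    buf = io.BytesIO(f.read())
with Image.open(buf) as im:
    im.load()  # no real file descriptor is shared with libtiff here
    size = im.size
print("decoded from memory:", size)
```

The trade-off is that the whole compressed file is held in memory, which may matter for very large TIFFs.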
@crazyyao0 do you have a different way to reproduce the bug that you could share in a self-contained example?

To be clear, applying the suggested fix from #7042 (comment) does cause the scenario from #5936 to fail, which would otherwise have passed.

I've created PR #7199 to stop duplicating the file pointer in the first place. Does that resolve this?
I don't think this issue is fixed completely. I can still reproduce the "Bad file descriptor" error both on Windows and Linux in heavily multi-threaded applications, e.g. with the following script:

```python
import argparse
import tempfile
import threading

from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args()

def test():
    with Image.open(args.filename) as im, tempfile.NamedTemporaryFile('wb') as f:
        im.save(f, format="TIFF", compression="tiff_lzw")

threads = []
for i in range(32):
    t = threading.Thread(target=test, daemon=True)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

Run it with the attached example tiff (had to gzip it here so that GitHub would accept it), 3-pages.tiff.gz, and you will see many such errors. It seems that the
I tested it with the latest stable release 10.4.0 with the binary wheels from PyPI. Maybe increasing the number of threads could help? |
I've created #8458 to address this. Here are some wheels for you to test if you would like to verify the fix - wheels.zip |
Looks good to me, I tested with the provided wheels.
If I open an LZW-compressed binary TIFF file using Pillow and load its data, Pillow calls the "close" syscall on one of the two relevant file descriptors twice. The second call results in an "EBADF (Bad file descriptor)" error. The program doesn't crash and everything seems to work OK, but if multithreading is used and a race condition happens, this might crash the program.
In this example, the expected behaviour would be that there is only one "close" syscall on each of the two relevant file descriptors and that there is no "EBADF" error. Below, I show how to reproduce the problem.
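The reason a stray second close is dangerous under multithreading is descriptor-number reuse: POSIX hands out the lowest free number, so another thread's `open` issued between the two closes can receive exactly the number just freed, and the second close then silently destroys that thread's file. The reuse can be demonstrated single-threaded (POSIX semantics assumed):

```python
import os

fd = os.open(os.devnull, os.O_RDONLY)
os.close(fd)                            # first close frees the number

fd2 = os.open(os.devnull, os.O_RDONLY)  # stands in for another thread's open
reused = (fd2 == fd)                    # POSIX: lowest free number is returned

os.close(fd)                            # the buggy second close hits fd2 instead
victim_dead = False
try:
    os.read(fd2, 0)
except OSError:
    victim_dead = True
print("number reused:", reused, "| victim closed:", victim_dead)
```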
Also, I want to note that this once caused my neural-network training to crash: PyTorch's DataLoader uses multithreading and a race condition occurred.