Migration to Pillow and huge performance improvements #137

Open: wants to merge 11 commits into master

QSchulz (Contributor) commented Mar 3, 2021

A notable change is that the resize option for images only accepts
percentages for now.
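To illustrate what "percentages only" means in practice, here is a minimal sketch that turns a value such as "50%" into a target size. The helper name `scale_by_percentage` and the exact accepted syntax are assumptions for illustration, not prosopopee's actual option parser.

```python
def scale_by_percentage(size, resize):
    """Return a new (width, height) scaled by a percentage string like "50%".

    Hypothetical helper: only percentages are accepted, mirroring the
    restriction described above; anything else raises ValueError.
    """
    resize = resize.strip()
    if not resize.endswith("%"):
        raise ValueError(f"only percentages are supported, got {resize!r}")
    factor = float(resize[:-1]) / 100
    width, height = size
    # Round to the nearest pixel and never go below 1x1.
    return max(1, round(width * factor)), max(1, round(height * factor))


print(scale_by_percentage((4000, 3000), "50%"))  # (2000, 1500)
```

The resulting (width, height) tuple is the kind of size argument Pillow's Image.resize() and Image.thumbnail() take.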

Another notable change is that the .copy() function now also applies
the quality setting, unlike the GraphicsMagick implementation.

This has been tested against Pillow from 6.0.0 to 8.1.0 and Pillow-SIMD
7.0.0.post3.

Here are the benchmarks. Setup: ~1400 photos spread across 31 galleries, everything built from scratch. "GraphicsMagick" means the current implementation in prosopopee. "Built Pillow 8.1.0" can be reproduced by installing libjpeg-turbo and then running `pip3 install --no-binary :all: --force-reinstall pillow`. Please follow https://pillow.readthedocs.io/en/stable/installation.html#building-from-source to make sure all the packages Pillow needs are installed in your distribution before trying to compile it.

| Computer | GraphicsMagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
| --- | --- | --- | --- | --- |
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 1:37:13.06 | 26:57.71 | 17:43.66 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 5:35:32.57 | 1:44:21.93 | 1:16:32.42 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 1:42:33.79 | 46:10.00 | 26:10.30 | 17:30.49 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 33:01.63 | 6:00.16 | 3:40.09 | 2:16.03 |
| Raspberry Pi 4 4GB RAM (Ubuntu Server 20.10) | 3:44:57.00 | 44:43.67 | 33:29.86 | N/A |
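To read the full-build table as speedups rather than raw durations, the durations can be converted to seconds and divided. The sketch below does this for the Intel Core i7-8700 row; it assumes the cells are elapsed times written as h:mm:ss.ss, m:ss.ss or plain seconds, which is how they appear in the table.

```python
def to_seconds(duration):
    """Convert "h:mm:ss.ss", "m:ss.ss" or "ss.sss" strings to seconds."""
    seconds = 0.0
    for part in duration.split(":"):
        # Each colon shifts the accumulated value one unit up (h -> min -> s).
        seconds = seconds * 60 + float(part)
    return seconds


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return to_seconds(baseline) / to_seconds(candidate)


# Intel Core i7-8700 row of the full-build table.
graphicsmagick = "33:01.63"
for label, duration in [("Pillow 8.1.0", "6:00.16"),
                        ("Built Pillow 8.1.0", "3:40.09"),
                        ("Pillow-SIMD 7.0.0.post3", "2:16.03")]:
    print(f"{label}: {speedup(graphicsmagick, duration):.1f}x")
```

For that row this prints 5.5x, 9.0x and 14.6x respectively.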

Regenerating only one gallery of 71 photos:

| Computer | GraphicsMagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
| --- | --- | --- | --- | --- |
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 3:59.49 | 1:37.47 | 1:13.93 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 14:59.40 | 6:09.36 | 5:04.45 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 4:33.75 | 2:31.85 | 1:45.11 | 1:18.00 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 1:31.09 | 21.179 | 12.373 | 8.665 |
| Raspberry Pi 4 4GB RAM (Ubuntu Server 20.10) | 10:26.63 | 2:23.45 | 2:22.50 | N/A |

Currently, thumbnail generation is done in a single thread while parsing
the galleries, by calling GraphicsMagick for every thumbnail to be
generated. This is suboptimal even though GraphicsMagick spreads its
workload over all available CPU cores.

After some quick-and-dirty benchmarking, multiprocessed Pillow turned
out to be much more efficient than GraphicsMagick for generating
thumbnails.

This PR adds support for generating thumbnails with multiprocessed
Pillow.

Multiple processes have to be used rather than multiple threads because
CPython still has a Global Interpreter Lock (GIL): only one thread can
execute Python bytecode at a time, so threads cannot run CPU-intensive
tasks such as thumbnail generation concurrently.
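A minimal sketch of the process-based approach using the standard library's multiprocessing.Pool. The CPU-bound `fake_thumbnail_work` function is a hypothetical stand-in for the real Pillow work; each worker is a separate process with its own interpreter and GIL, so the jobs can run on different cores at the same time.

```python
import multiprocessing


def fake_thumbnail_work(n):
    """Hypothetical stand-in for CPU-bound thumbnail generation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_processes(jobs, processes=None):
    """Spread jobs over worker processes; results keep the input order."""
    # processes=None lets Pool start os.cpu_count() workers.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(fake_thumbnail_work, jobs)


if __name__ == "__main__":
    # The guard is required so child processes do not re-run this block.
    print(len(run_in_processes([200_000] * 8)))  # 8
```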

Multiprocessing brings its own set of challenges because most data
structures, such as the cache, cannot be shared between processes. All
data modified by any of the processes should therefore be of a type
handled by multiprocessing.Manager.
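A minimal sketch of sharing a cache between Pool workers through multiprocessing.Manager. The `record_thumbnail` worker and the cache layout (thumbnail name mapped to its options) are hypothetical; Manager, Manager().dict() and Pool are standard-library APIs.

```python
import multiprocessing


def record_thumbnail(job):
    """Worker: store the options used for one generated thumbnail in the cache."""
    cache, name, options = job
    # Item assignment on a DictProxy is forwarded to the manager process,
    # so every worker writes into the same dictionary.
    cache[name] = options


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        cache = manager.dict()  # a plain dict would only be updated per process
        jobs = [(cache, f"IMG_{i:04d}.jpg", {"quality": 90}) for i in range(4)]
        with multiprocessing.Pool(processes=2) as pool:
            pool.map(record_thumbnail, jobs)
        print(len(cache))  # 4
```

With a plain dict in place of manager.dict(), each worker would modify its own pickled copy and the parent's dict would stay empty.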

For the best performance, all thumbnails of an image should be
generated at once so that the original image is opened only once. This
requires keeping track of the original images and attaching to each of
them the thumbnails to be created. This is done via a factory which is
passed to the Jinja templates so that they can request thumbnails for a
given image knowing only the original path, the name of the original
image and the parameters of the thumbnails to create.
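The "open the original once" idea can be sketched as an object that only records thumbnail requests while templates render and does the work later in one pass. This `BaseImage` is a hypothetical illustration, not the PR's class; image opening and resizing are injected callables so the sketch runs without Pillow. With Pillow, `make_thumbnail` would work on `original.copy()`, since Image.thumbnail() resizes in place.

```python
class BaseImage:
    """Hypothetical sketch of an original image that collects thumbnail requests."""

    def __init__(self, path):
        self.path = path
        self.requested_sizes = []  # (width, height) tuples asked for by templates

    def thumbnail(self, size):
        # Only record the request while templates render; no pixels are decoded.
        if size not in self.requested_sizes:
            self.requested_sizes.append(size)
        width, height = size
        return f"{self.path}-{width}x{height}"

    def generate(self, open_image, make_thumbnail):
        """Decode the original once, then derive every requested thumbnail from it."""
        original = open_image(self.path)
        return [make_thumbnail(original, size) for size in self.requested_sizes]


opened = []


def fake_open(path):
    # Stand-in for PIL.Image.open(); records how often the original is decoded.
    opened.append(path)
    return f"<pixels of {path}>"


image = BaseImage("gallery/IMG_0001.jpg")
for size in [(600, 400), (1200, 800), (600, 400)]:  # the last request is a duplicate
    image.thumbnail(size)
thumbs = image.generate(fake_open, lambda pixels, size: (pixels, size))
print(len(opened), len(thumbs))  # 1 2
```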

The ImageFactory keeps all of those original images in a dictionary
keyed by a virtual path made from the original image name and a CRC32
of all the options that apply to its thumbnails. This lets prosopopee
group thumbnails per set of options (e.g. when options are passed in a
gallery's settings.yaml).
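A minimal sketch of keying original images by name plus a CRC32 of their options. Serializing the options with json.dumps(sort_keys=True) before zlib.crc32() is an assumption made here so that key order does not matter; the virtual path format and the option names ("quality", "strip") are illustrative only.

```python
import json
import zlib


def options_crc32(options):
    """CRC32 of a thumbnail options dict, independent of key order."""
    # sort_keys=True makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically.
    payload = json.dumps(options, sort_keys=True).encode("utf-8")
    return zlib.crc32(payload)


class ImageFactory:
    """Hypothetical sketch: one original-image entry per (name, options) pair."""

    def __init__(self):
        self.base_images = {}

    def get(self, name, options):
        virtual_path = f"{name}-{options_crc32(options):08x}"
        if virtual_path not in self.base_images:
            self.base_images[virtual_path] = {"name": name, "options": options}
        return self.base_images[virtual_path]


factory = ImageFactory()
a = factory.get("IMG_0001.jpg", {"quality": 90, "strip": True})
b = factory.get("IMG_0001.jpg", {"strip": True, "quality": 90})  # same options, other order
c = factory.get("IMG_0001.jpg", {"quality": 75})                 # e.g. a per-gallery override
print(a is b, a is c)
```

The same picture requested with the same options maps to one entry, while a gallery that overrides the options gets its own entry and thus its own group of thumbnails.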

The original image (or BaseImage) is returned by the ImageFactory and
the templates can then request .copy() or .thumbnail() for it.

The thumbnails are kept in a dictionary keyed by thumbnail name, which
is made of the original name, the thumbnail size and the CRC32 of the
original image and the options that apply to it. This way, each
thumbnail is guaranteed to be unique even if templates request it
multiple times.
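A minimal sketch of how such names deduplicate repeated requests. The `<stem>-<width>x<height>-<crc>.<ext>` layout is a hypothetical naming scheme, and the CRC below is computed over a made-up byte string standing in for "the original image and its options".

```python
import zlib


def thumbnail_name(original_name, size, base_crc):
    """Hypothetical naming scheme: <stem>-<width>x<height>-<crc>.<ext>."""
    stem, dot, ext = original_name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = original_name, ""
    width, height = size
    name = f"{stem}-{width}x{height}-{base_crc:08x}"
    return f"{name}.{ext}" if ext else name


thumbnails = {}
crc = zlib.crc32(b"IMG_0001.jpg quality=90")  # stand-in for the real CRC input
for size in [(600, 400), (600, 400), (1200, 800)]:
    # Dict keys deduplicate requests made by several templates.
    thumbnails[thumbnail_name("IMG_0001.jpg", size, crc)] = size
print(len(thumbnails))  # 2
```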

The image size is now read with imagesize.getsize() only once, the
first time the ratio property or .copy() is used, to keep the
performance impact minimal.
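Reading the size lazily and at most once maps naturally onto functools.cached_property. In this hypothetical sketch the size reader is injected: in the PR that role is played by imagesize.getsize(), which returns (width, height) from the file header; here a fake reader counts its calls so the sketch runs without the imagesize package.

```python
from functools import cached_property


class LazySizeImage:
    """Hypothetical sketch: read the image size at most once, on first use."""

    def __init__(self, path, read_size):
        self.path = path
        self._read_size = read_size  # e.g. imagesize.getsize

    @cached_property
    def size(self):
        # Computed on first access, then stored on the instance.
        return self._read_size(self.path)

    @property
    def ratio(self):
        width, height = self.size
        return width / height


calls = []


def fake_getsize(path):
    calls.append(path)
    return (4000, 3000)


img = LazySizeImage("IMG_0001.jpg", fake_getsize)
print(img.ratio, img.ratio, len(calls))  # 1.3333333333333333 1.3333333333333333 1
```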

Since multiprocessing.Pool.map splits the iterable into pre-defined
chunks which are then assigned to the worker processes, the best
performance requires chunks with roughly the same workload, so that no
process sits idle while another one is working at 100%. To that end,
original images whose thumbnails are all cached are removed from the
list of images before it is passed to multiprocessing.Pool.map, so that
each process gets roughly the same workload.
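A minimal sketch of why the filtering matters. When no chunksize is given, CPython's Pool.map currently picks roughly four chunks per worker (`default_chunksize` below mirrors that implementation detail, which may change between versions). The numbers are hypothetical: 1400 originals of which only the first 70 have uncached thumbnails.

```python
def default_chunksize(n_items, processes):
    """Mirror of CPython's Pool.map default: about 4 chunks per worker."""
    chunksize, extra = divmod(n_items, processes * 4)
    return chunksize + 1 if extra else chunksize


def filter_uncached(base_images, is_fully_cached):
    """Keep only originals that still have at least one thumbnail to generate."""
    return [image for image in base_images if not is_fully_cached(image)]


originals = [f"IMG_{i:04d}.jpg" for i in range(1400)]
changed = set(originals[:70])
todo = filter_uncached(originals, lambda image: image not in changed)
print(len(todo), default_chunksize(len(originals), 4), default_chunksize(len(todo), 4))
# 70 88 5
```

Without filtering, the 70 pending images all fall into the first chunk of 88 items, so one worker does all the real work while the others go through chunks of cached no-ops. After filtering, the same 70 images are split into chunks of 5 and spread over all 4 workers.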

Thanks,
Quentin

This move makes sense if one wants to reuse remove_superficial_options,
since it is not necessarily specific to cache.py.

This prepares prosopopee for Pillow support.

Signed-off-by: Quentin Schulz <[email protected]>
Dry runs (`prosopopee test`) shouldn't dump the cache: nothing is done
except creating the HTML files, so the cache is more or less
meaningless in that case.

Let's dump the cache only when doing a normal build run.

Signed-off-by: Quentin Schulz <[email protected]>
For images, calls to copy() are only needed when {{ image }} is used
later in the template.

Let's remove those copy() calls since they trigger the creation of
thumbnails that will never be used.

Signed-off-by: Quentin Schulz <[email protected]>
Big gallery covers should be used for lines where only one gallery
cover appears.

With the current logic, if there is a prime number of galleries (other
than 2 and 3), the first gallery and all galleries whose index is prime
(except the 2nd and 3rd) get a big cover.

In the end, all that matters is that if a galleries_line contains only
one gallery, that gallery gets a big cover.

Signed-off-by: Quentin Schulz <[email protected]>
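The intended rule ("big cover only when a gallery is alone on its line") can be sketched in Python, although in prosopopee the logic lives in a Jinja template. How galleries are split into lines is an assumption here (`per_line=2`), as is the `cover_sizes` helper itself.

```python
def cover_sizes(galleries, per_line=2):
    """Hypothetical sketch: a gallery gets a big cover only when alone on its line."""
    lines = [galleries[i:i + per_line] for i in range(0, len(galleries), per_line)]
    return {
        gallery: "big" if len(line) == 1 else "small"
        for line in lines
        for gallery in line
    }


print(cover_sizes(["a", "b", "c", "d", "e"]))
# {'a': 'small', 'b': 'small', 'c': 'small', 'd': 'small', 'e': 'big'}
```

With five galleries (a prime number), only the last one, which ends up alone on its line, gets a big cover.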
Loggers work by hierarchy: a logger whose level is NOTSET (the default)
takes its effective level from its closest ancestor that has a level
set. This applies to the loglevel, which prosopopee changes according to
the --log-level argument.

Since the root logger (obtained with logger = logging.getLogger()) is
the ancestor of ALL loggers, including those declared in any third-party
module, prosopopee's loglevel also applies to those modules. This is
usually not wanted, especially since prosopopee's default loglevel is
the most verbose one available.

This is very annoying with Pillow since it's pretty verbose when saving
files.

Instead, let's declare a logger for prosopopee only. Unfortunately,
since the package layout is unconventional (all *.py files in the same
directory instead of subdirectories), the recommended
logger = logging.getLogger(__name__) cannot be used: __name__ is
__main__ in prosopopee.py and the bare module name in the other files
(e.g. cache in cache.py). Those loggers are therefore unrelated in the
eyes of the logging module, and prosopopee.py's loglevel would not
apply to the other *.py files in the project.

Instead, the value __name__ would have in a more conventional packaging
layout is simulated by prepending prosopopee. to __name__, except in
prosopopee.py, whose logger is the parent logger and is thus simply
named prosopopee.

Since prosopopee's logger is no longer the root logger, the NOTSET
loglevel cannot be used anymore: it basically means "defer to the
parent logger's level", and the root logger's default loglevel is
WARNING, so prosopopee's default loglevel would not print anything
labelled INFO or DEBUG.

cf. https://stackoverflow.com/a/50755200

Signed-off-by: Quentin Schulz <[email protected]>
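The naming scheme and the NOTSET pitfall described above can be checked with the standard logging module. `get_logger` is a hypothetical helper mimicking the "prepend prosopopee." approach; "PIL.Image" is only used as an example of a third-party logger name, and the last check assumes logging has not been configured elsewhere.

```python
import logging


def get_logger(module_name):
    """Hypothetical helper mimicking package-style names in a flat layout."""
    # prosopopee.py runs as __main__ and owns the parent logger "prosopopee";
    # any other module, e.g. cache.py, becomes "prosopopee.cache".
    if module_name in ("__main__", "prosopopee"):
        return logging.getLogger("prosopopee")
    return logging.getLogger(f"prosopopee.{module_name}")


parent = get_logger("__main__")
child = get_logger("cache")
third_party = logging.getLogger("PIL.Image")

# Give the parent an explicit level: with NOTSET it would defer to root (WARNING).
parent.setLevel(logging.DEBUG)

print(child.parent is parent)                              # True
print(child.getEffectiveLevel() == logging.DEBUG)          # True: inherited from "prosopopee"
print(third_party.getEffectiveLevel() == logging.WARNING)  # True: still inherits root's default
```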
In order to prepare for multiprocessing support, migrate Cache.cache
from a plain dict to a Manager().dict, which is one of the data types
that can be safely modified from other processes.

Signed-off-by: Quentin Schulz <[email protected]>
…uration

json.dumps(), which is used to write the cache dict to a file, turns
tuples into lists. With the current implementation, if a tuple is
supposed to be cached, the needs_to_be_generated method will therefore
always return True, even when the cached entry is actually up to date.

To support tuples in cache entries, let's pass the options given to the
method through json.loads(json.dumps()) so that cached options and
to-be-compared options have the same format.

This will be used in a later commit which adds a tuple (width, height) to
the cache.

Signed-off-by: Quentin Schulz <[email protected]>
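The tuple-versus-list mismatch fixed by the commit above is easy to reproduce with the json module alone. `normalize` is a hypothetical name for the json.loads(json.dumps()) round trip the commit describes.

```python
import json


def normalize(options):
    """Round-trip through JSON so tuples become lists, as in the cache file."""
    return json.loads(json.dumps(options))


cached = json.loads(json.dumps({"size": (600, 400)}))  # as read back from disk
fresh = {"size": (600, 400)}                            # as passed by the caller

print(cached == fresh)             # False: [600, 400] != (600, 400)
print(cached == normalize(fresh))  # True
```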
…ration

Thumbnails are generated in parallel worker processes via
multiprocessing.Pool. By default, Pool starts as many worker processes
as os.cpu_count() reports, i.e. the number of CPU threads on the host
machine.

Let's allow users to select the number of processes Pool can use.

Signed-off-by: Quentin Schulz <[email protected]>
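A minimal sketch of wiring a user setting into Pool(processes=...). The `pool_size` helper and its validation are hypothetical; the actual name and location of the setting in prosopopee's configuration are not shown here. Passing None keeps multiprocessing's default of os.cpu_count() workers.

```python
import multiprocessing


def pool_size(setting):
    """Hypothetical helper: turn a user setting into a Pool(processes=...) value."""
    if setting is None:
        return None  # let Pool use os.cpu_count()
    processes = int(setting)
    if processes < 1:
        raise ValueError("the number of processes must be at least 1")
    return processes


if __name__ == "__main__":
    with multiprocessing.Pool(processes=pool_size(2)) as pool:
        print(pool.map(abs, [-3, -2, -1]))  # [3, 2, 1]
```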