Avoid reprocessing the same file over and over again #109

Open
dmitryd opened this issue May 14, 2024 · 8 comments

@dmitryd

dmitryd commented May 14, 2024

If a file passes the threshold check, it is reprocessed on every run, and the result is discarded if the new size is the same or larger. It could save resources if the file were marked in some way to avoid reprocessing. The goal is to avoid running resizes on hundreds of files that still exceed the threshold. Executing ImageMagick is VERY resource-consuming!

I imagine this could use the file metadata: a new field holding a hash computed as follows:

sha1(serialize([
  $fileModificationDate,
  $fileSize,
  $imageWidth,
  $imageHeight,
  file_get_contents('....../image_autoresize.config.php')
]));

When running, get the metadata, compute the hash and compare with the stored hash. If they match, do not attempt to resize it.
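A minimal sketch of the proposed check, in PHP like the snippet above. The function names `computeResizeFingerprint()` and `canSkipResize()` are hypothetical; where the fingerprint is stored (the proposed new metadata field) and how the configuration file is read are assumptions left out of the sketch. The values are serialized before hashing because `sha1()` takes a string, not an array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: fingerprint everything that influences the resize result.
 * Assumption: $configuration is the raw content of the extension's configuration
 * file; reading it from disk is left to the caller.
 */
function computeResizeFingerprint(
    int $modificationTime,
    int $fileSize,
    int $width,
    int $height,
    string $configuration
): string {
    // sha1() expects a string, so the values are serialized into one first.
    return sha1(serialize([$modificationTime, $fileSize, $width, $height, $configuration]));
}

/**
 * Hypothetical helper: a file can be skipped when a fingerprint was stored on a
 * previous run (assumed to live in a new metadata field) and it still matches.
 */
function canSkipResize(?string $storedFingerprint, string $currentFingerprint): bool
{
    return $storedFingerprint !== null && $storedFingerprint === $currentFingerprint;
}

// Usage: same file, same configuration => skip; changed configuration => process again.
$stored = computeResizeFingerprint(1715680000, 102400, 3000, 2000, 'threshold=50K');
var_dump(canSkipResize($stored, computeResizeFingerprint(1715680000, 102400, 3000, 2000, 'threshold=50K'))); // bool(true)
var_dump(canSkipResize($stored, computeResizeFingerprint(1715680000, 102400, 3000, 2000, 'threshold=80K'))); // bool(false)
```

Including the configuration means that changing the threshold or target dimensions invalidates every stored fingerprint, so each file is re-evaluated once under the new settings. The fingerprint would be written after a run whose result was discarded; if a file is actually resized, its modification time and size change and the stored value no longer matches anyway.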

Possible question: why not fine-tune the threshold?
Answer: because the threshold is about file size, and file size greatly depends on the content of the image. For example, a 3000x2000px solid white JPEG can be smaller in bytes than a 1000x600px JPEG of the sea or a city.

What do you think?

If you do not think it is a good idea, would you consider at least adding an event at the beginning of ImageResizer::processFile() to let other extensions decide whether the file should be processed?
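A sketch of what such an event could look like, in plain PHP. `BeforeImageResizeEvent`, `SkipAlreadyCheckedFiles` and `processFileSketch()` are hypothetical names; they are not part of the extension or of TYPO3. In a real TYPO3 setup the event would presumably go through the PSR-14 event dispatcher rather than the simple loop used here.

```php
<?php
declare(strict_types=1);

/** Hypothetical event dispatched at the start of processFile(). */
final class BeforeImageResizeEvent
{
    private bool $skipped = false;

    public function __construct(private string $fileName)
    {
    }

    public function getFileName(): string
    {
        return $this->fileName;
    }

    /** A listener calls this to veto processing of the current file. */
    public function skipProcessing(): void
    {
        $this->skipped = true;
    }

    public function isProcessingSkipped(): bool
    {
        return $this->skipped;
    }
}

/** Hypothetical listener: skips files recorded as already checked (e.g. via a stored hash). */
final class SkipAlreadyCheckedFiles
{
    /** @param array<string, true> $checkedFiles */
    public function __construct(private array $checkedFiles)
    {
    }

    public function __invoke(BeforeImageResizeEvent $event): void
    {
        if (isset($this->checkedFiles[$event->getFileName()])) {
            $event->skipProcessing();
        }
    }
}

/** Stand-in for the beginning of processFile(): dispatch the event, then bail out if vetoed. */
function processFileSketch(string $fileName, callable ...$listeners): bool
{
    $event = new BeforeImageResizeEvent($fileName);
    foreach ($listeners as $listener) {
        $listener($event);
    }
    if ($event->isProcessingSkipped()) {
        return false; // no ImageMagick call for this file
    }
    // ... the existing resize logic would run here ...
    return true;
}
```

The extension itself would only need to dispatch the event and honour the veto; the decision logic (hash comparison, path rules, and so on) stays in third-party listeners.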

@xperseguers
Owner

What I am missing in your description is the context in which you think (or have found) that the file is "reprocessed again".

If you do not think it is a good idea, would you consider at least adding an event at the beginning of ImageResizer::processFile() to let other extensions decide whether the file should be processed?

Regardless of whether I think it's a good idea, adding an event is basically no problem for me, as extensibility is at the heart of TYPO3's spirit.

@dmitryd
Author

dmitryd commented May 14, 2024

What I am missing in your description is the context in which you think (or have found) that the file is "reprocessed again".

I set the threshold to 50K. I have a file, which is 100K in size.

Each time I run the command, this line is executed on the same set of files:

$tempFileInfo = $gifCreator->imageMagickConvert($fileName, $destExtension, '', '', $imParams, '', $options, true);

So the file is resized with ImageMagick on every scheduler run, consuming CPU time.

Then there is this check:

        } elseif (!$isRotated && filesize($tempFileInfo[3]) >= $originalFileSize - 10240 && $destExtension === $fileExtension) {
            // Conversion leads to same or bigger file (rounded to 10KB to accommodate tiny variations in compression) => skip!
            @unlink($tempFileInfo[3]);
            $tempFileInfo = null;
        }

And the result of the resizing is discarded. What is interesting: on each scheduler run it tries to scale to the same width/height, and the result is always discarded because the file was already resized on an earlier run.
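To make the cost concrete: the quoted condition discards the converted file unless it is more than 10240 bytes (10 KiB) smaller than the original, the image was rotated, or the extension changed. The function below restates that condition in isolation; `isConversionDiscarded()` is a hypothetical name, and the sizes in the usage lines are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical restatement of the quoted check: true when the converted file
 * would be thrown away. Note that ImageMagick has already run at this point.
 */
function isConversionDiscarded(
    int $originalSize,
    int $convertedSize,
    bool $isRotated,
    string $sourceExtension,
    string $destExtension
): bool {
    // 10240 bytes of tolerance absorbs tiny variations in compression.
    return !$isRotated
        && $convertedSize >= $originalSize - 10240
        && $destExtension === $sourceExtension;
}

// A 100 KiB JPEG (102400 bytes) must shrink below 92160 bytes for the result to be kept.
var_dump(isConversionDiscarded(102400, 100000, false, 'jpg', 'jpg')); // bool(true)
var_dump(isConversionDiscarded(102400, 92159, false, 'jpg', 'jpg'));  // bool(false)
```

For a file that already reached its target dimensions on an earlier run, each new conversion lands close to the original size, so every scheduled run pays for the ImageMagick call and then deletes the output.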

@xperseguers
Owner

OK, so the context is that it runs over and over again for the same files when the scheduler task for batch processing is invoked.

@dmitryd
Copy link
Author

dmitryd commented May 14, 2024

Yes. Our editors can upload huge 6000x4000 images, so we need to run the task regularly. Manual execution is not an option because it is a closed system with no console access, and there are many sites like this. Thus, the scheduler runs daily. It would be good to optimize the resizing process :)

@xperseguers
Owner

And just to understand: why are you not resizing on the fly during upload? Are your editors pushing those files via FTP or some external system?

@dmitryd
Author

dmitryd commented May 14, 2024

We already have many sites with a lot of files. The problem has existed for quite some time.

We could run the job once and rely on the resizing on the fly for new files. Is this what you suggest? 🤔

@xperseguers
Owner

Yes, this is what I suggest. In my experience, resizing on the fly during upload works fine, unless the GFX part is misconfigured and you don't notice quickly enough, or you have misconfigured how to resize. This makes the upload slightly slower, that's true, but that usually doesn't really bother anyone.

I typically run the batch processing only once, or again when I see that the resizing was somehow not properly configured, or I had GFX problems and my editors had just uploaded a bunch of huge photos. But I really don't run that task on a daily basis, as uploading is only possible through the Backend (or through a custom Frontend plugin, but in that case, if you do it correctly, metadata extraction, resizing and everything else work fine as well).

@dmitryd
Author

dmitryd commented May 14, 2024

Thank you! We will do it like this. 🙇

Please feel free to close the ticket (or add an event using this ticket).
