Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OIRReader: Major performance improvement when reading a ROI within a plane #4205

Merged
merged 10 commits into from
Sep 10, 2024

Conversation

NicoKiaru
Copy link
Contributor

In the implementation before this PR each time a small ROI is read from an image, a full OIR Pixelblock was read. Combined with the fact that the Optimal Tile Size was not overriden and defaulted to a thin stripe in y (128 pixels), this could lead to a particularly bad scenario for big images:

  • Imagine a 10000 by 10000 pixel plane read by tiles of 10000x128 pixels -> 79 tiles have to be read to read the full plane
  • For big tiled OIR images, a PixelBlock typically spans the full plane
  • This means that, to acquire all tiles of this plane, the full plane is actually read 97 times

The proposed new implementation of this PR:

  • prevents reading fully all PixelBlocks when one needs to read a ROI within them
  • overrides getOptimalTile size methods to match the underlying PixelBlock size (or crops at 2048 pixels)

When requesting a cropped region within big images, the speedup is massive.

Here's a real time display with big dataviewer with the current implementation:
previousreader

Here's a real time display with big dataviewer with the implementation in this PR:
prreader

…e OIRReader:

- prevents reading fully all PixelBlocks when one needs to read a ROI within them
- overrides getOptimalTile size methods to match the underlying PixelBlock size (or crops at 2048 pixels)
@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/why-is-bioformats-reading-large-oir-files-so-much-slower-than-the-microscope-companion-software-provided-by-olympus/66922/6

@dgault dgault added the include label Jul 1, 2024
@dgault
Copy link
Member

dgault commented Jul 1, 2024

Thanks again @NicoKiaru for the hard work investigating and getting the PR opened. I have marked it for inclusion for the repository tests to check for any side effects in other datasets.

@NicoKiaru
Copy link
Contributor Author

Thanks, I have tested it on very few files, so while I tried to keep the changes to the minimum, I'm not sure I did not break something! Let's see.

@jburel
Copy link
Member

jburel commented Jul 3, 2024

Failed with the following errors

  [testng] java.lang.NullPointerException: Cannot assign field "yStart" because "block" is null
   [testng] 	at loci.formats.in.OIRReader.initFile(OIRReader.java:432)
   [testng] 	at loci.formats.FormatReader.setId(FormatReader.java:1480)
   [testng] java.lang.ArrayIndexOutOfBoundsException: Index 1023 out of bounds for length 1023
at loci.formats.in.OIRReader.initFile(OIRReader.java:420)

@melissalinkert
Copy link
Member

Note that some of the test failures occur on public data:

@NicoKiaru
Copy link
Contributor Author

So I've fixed what I could test with the public files.

@dgault
Copy link
Member

dgault commented Jul 15, 2024

Added a commit to fix a NullPointerException in the nightly tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/18844/console)

@dgault
Copy link
Member

dgault commented Jul 16, 2024

Added a further commit to address failures in the nightly tests: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/19008/console

These last 2 failures related to spectral images using modulo C. I have updated the indexing for the blocks to match the previous behaviour.

@dgault dgault removed the include label Jul 18, 2024
@dgault
Copy link
Member

dgault commented Jul 18, 2024

I am still seeing failures for the modulo C files in the nightly tests, though from testing it looks as though the original implementation currently in the reader may also be incorrect. In both cases a single set of pixel blocks is being used for each of the modulo C frames, the tests fail because this PR is defaults to a different frame.

The real issue seems to be that in the case of the Lambda files as set here, that I am unsure how to identify from the pixel block uids which index they correctly belong to. Currently both implementations are matching the channel id which is the same for each step and results in the same frames each time. I will try and dig further to see if I can correctly identify the correct blocks.

@NicoKiaru
Copy link
Contributor Author

NicoKiaru commented Jul 18, 2024

Ok, good luck with that. If you have a public lambda file, I could also investigate. But if not I can also try to create one and make it public.

@NicoKiaru
Copy link
Contributor Author

NicoKiaru commented Jul 18, 2024

I've submitted some example file to the bio-formats QA community in Zenodo. They cover:

  • lambda scan
  • lambda scan + t
  • lambda scan + z
  • lambda scan + z + t

EDIT: Zenodo repo: https://zenodo.org/records/12773657

And it looks like none of these files are properly opened by the current reader (and I'm pretty sure this PR does not improve the situation).

NicoKiaru added a commit to BIOP/ijp-kheops that referenced this pull request Jul 19, 2024
@NicoKiaru
Copy link
Contributor Author

I've made a small modification, which, in the files I could test, fixes the reading of spectral acquisition.

@NicoKiaru
Copy link
Contributor Author

Hello, do you think you would be able to give it a (new) go ? I've shared some sample spectral dataset + the previous version was not able to read spectral dataset from FV4000. Of course there may still be issued with private files, but I can't fix anything in that regard.

@sbesson sbesson added the include label Sep 2, 2024
@sbesson sbesson added this to the 8.0.0 milestone Sep 2, 2024
@melissalinkert
Copy link
Member

Thanks for the updates, @NicoKiaru, and apologies for the delay. This will now be included in tonight's builds, so we'll know the status in the morning.

@melissalinkert
Copy link
Member

This appears to have passed all tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/27045/console), so is on my list to review for inclusion in 8.0.0.

@NicoKiaru
Copy link
Contributor Author

Thanks! Before doing that please wait a day or two: I want to upload such a big 2D dataset since I'm not sure you have that in the demo files.

@NicoKiaru
Copy link
Contributor Author

Ok, I've submitted a tiled image on Zenodo bio-formats submission community.

I think to demo the performance increase I should rather have done a single very large plane than a z-stack. But timing the opening with a 'crop on import' the perf increase should still be visible I hope.

@joshmoore
Copy link
Member

see https://zenodo.org/records/13680725

Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am generally fine with this, there are just a few small things that should be fixed before merging, the null check on line 213-214 being most important.

Several tests of showinf -crop ... with https://zenodo.org/records/13680725 in particular show a definite performance improvement with these changes compared to the current state of develop. I also tried reading the matl.omp2info (see OlympusTileReader); while that wasn't dramatically faster, it certainly wasn't any worse, and did demonstrate that the optimal tile size changes here are correctly propagated to OlympusTileReader.

As previously indicated, automated tests are passing. Since those tests don't thoroughly exercise reading cropped planes, I also ran a simple test to compare plane hashes and read times across all of our .oir data:

import java.security.MessageDigest;
import java.util.ArrayList;
import loci.common.DataTools;
import loci.common.DebugTools;
import loci.tests.testng.*;
import loci.formats.in.OIRReader;

public class PixTest {
  public static void main(String[] args) throws Exception {
    DebugTools.enableLogging("WARN");
    MessageDigest md = MessageDigest.getInstance("MD5");
    try (OIRReader reader = new OIRReader()) {
      ArrayList<String> testFiles = new ArrayList<String>();
      TestTools.getFiles(args[0], testFiles, new ConfigurationTree(args[0], args[1], null), null, null, "");
      for (String testFile : testFiles) {
        if (!reader.isThisType(testFile)) {
          continue;
        }
        System.out.println(testFile);
        reader.setId(testFile);

        int x = reader.getSizeX() /4;
        int y = reader.getSizeY() / 4;
        int w = (int) Math.min(reader.getSizeX() / 3, 2048);
        int h = (int) Math.min(reader.getSizeY() / 3, 2048);

        long t0 = System.currentTimeMillis();
        for (int plane=0; plane<reader.getImageCount(); plane++) {
          byte[] buf = reader.openBytes(plane, x, y, w, h);
          md.reset();
          md.update(buf);
          String md5 = DataTools.bytesToHex(md.digest());
          System.out.println("plane #" + plane + " MD5: " + md5);
        }
        long t1 = System.currentTimeMillis();
        System.out.println("total plane read time: " + (t1 - t0) + " ms");
        reader.close();
      }
    }
  }
}

Without this change, that test against all of our existing .oir data took 75m22s locally. With this change and the addition of a null check between lines 213/214, it takes 58m58s. There are MD5 differences for the following files, but these appear to be correct and a result of better lambda handling:

olympus/spectral_scan_XYL.oir
qa-17409/Stitch_A01_G001.oir

For both files, showinf with develop shows the same data for all planes, which is not the case with this change.

So, once the few suggestions here have been addressed and we have one more passing build that includes the updates, I am happy to see this merged.

@NicoKiaru
Copy link
Contributor Author

Thanks a lot @melissalinkert for this thorough review! I'll get back to it next week.

* set optimalTileHeight to 0 in close
* check reader initialisation in optimal tile size method calls
* checks and warns in case of null PixelBlocks
@NicoKiaru
Copy link
Contributor Author

I've done the modifications you suggested. Thanks again @melissalinkert !

Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melissalinkert melissalinkert merged commit 1f03e09 into ome:develop Sep 10, 2024
18 checks passed
@NicoKiaru NicoKiaru deleted the oir-crop-pixel-block branch September 11, 2024 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants