I have been working with OIIO's texture and environment lookups for some time now, investigating speed, quality and usability. Some issues have cropped up during this investigation, but I feel that a discussion might be a better place to address the topic. I think the OIIO code could benefit from some fresh ideas, and some things I had hoped to find simply aren't there. So I'll start out with a few points where I'd like to see a discussion:
- environment lookup only accepts lat/lon environments, not cubemaps
- texture lookup is SIMDized vertically and batched lookups use a loop over single-point lookups
- the default antialiasing filter requires a lot of input (the derivatives)
- the texture system seems to be entirely file-based
I'd like to say a few words about each of these points to clarify what I'd like to discuss.
environment lookup only accepts lat/lon environments, not cubemaps
This was a bit of a disappointment, and opened a can of worms for me. Initially I thought that my input wasn't correct, so to produce cubemaps which would comply, I used OpenEXR's exrenvmap utility to generate a cubemap from a lat/lon environment map and tried to use that as source data for OIIO's 'environment' function. But to no avail. I think that cubemaps simply aren't supported, and the discussion in this issue suggests that this is indeed the case. Because I wasn't happy with my experience with exrenvmap, I decided to write some code to convert lat/lon to cubemap format and back - the results can be seen here. The utility defaults to using OIIO's texture and environment lookups - as best as I could employ them - but I also wrote code which does the lookup on its own, and this code might hint at ways to add new modes of texture lookup to OIIO.
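For reference, this is roughly the call I was hoping to feed a cubemap to - a minimal sketch, not envutil's actual code; the filename and direction values are placeholders, and the exact create/destroy details vary between OIIO versions:

```cpp
// Minimal sketch of a single lat/lon environment lookup via OIIO's
// public API. "latlon.exr" and the direction values are made up.
#include <OpenImageIO/texture.h>
#include <Imath/ImathVec.h>

using namespace OIIO;

int main()
{
    auto ts = TextureSystem::create();    // return type varies by OIIO version
    TextureOpt opt;
    Imath::V3f R(0.0f, 0.0f, 1.0f);       // look-up direction
    Imath::V3f dRdx(0.001f, 0.0f, 0.0f);  // directional derivatives,
    Imath::V3f dRdy(0.0f, 0.001f, 0.0f);  // steering the filter footprint
    float rgb[3];
    ts->environment(ustring("latlon.exr"), opt, R, dRdx, dRdy, 3, rgb);
}
```

Which leads me to the next point.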
texture lookup is SIMDized vertically and batched lookups use a loop over single-point lookups
Both the 'environment' and the 'texture' function in OIIO's texture system code have batched variants, which process sixteen lanes at once. I was happy to see these signatures, because my 'strip-mining' code is SIMDized, and so I thought that my batches of input would just carry on to produce batches of output in sixteen-lane SIMD code. When I stumbled upon an issue with the code (the one I linked to above), I looked at the implementation and saw that it handles the batches by rearranging the batched arguments and processing them in a loop over the individual points. As Larry pointed out, the individual point lookups are SIMDized to an extent, but the SIMDization is vertical, filling only as many lanes as the pixel has channels. This is severe underpopulation with today's SIMD register sizes.
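To show what the batched interface looks like from the caller's side, here is a sketch based on my reading of the headers - the wrapper function and the data layout comments are mine, the signature is OIIO's:

```cpp
// Sketch of a batched environment lookup (Tex::BatchWidth is 16).
// Arguments are structure-of-arrays: R holds 16 x components first,
// then 16 y components, then 16 z components; results likewise.
#include <OpenImageIO/texture.h>

using namespace OIIO;

void lookup_batch(TextureSystem* ts, ustring filename,
                  const float R[3 * Tex::BatchWidth],
                  const float dRdx[3 * Tex::BatchWidth],
                  const float dRdy[3 * Tex::BatchWidth],
                  float result[3 * Tex::BatchWidth])
{
    TextureOptBatch opt;                            // batch-wide options
    ts->environment(filename, opt, Tex::RunMaskOn,  // all 16 lanes active
                    R, dRdx, dRdy, 3, result);
    // As of the version discussed here, this unpacks the batch and loops
    // over single-point lookups instead of running sixteen lanes wide.
}
```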
the default antialiasing filter requires a lot of input (the derivatives)
Given the processing logic of the lookup, passing the derivatives is a good way to control the shape of the antialiasing filter. The resulting lookups are perfectly general, and the signature is uniform. It's good, solid code, it works with every scale change, and it handles tiled input, which is a great advantage. But the code is what I'd call 'bulky', and the way it's SIMDized makes it quite slow. So I tried to come up with an alternative which would reduce the number of required arguments and lend itself to horizontal SIMDization, which tends to be faster and easier to code. I came up with a method I call 'twining', which I implemented for envutil; envutil is easy to build from source, so its performance can be evaluated against OIIO's lookup. I'll explain what 'twining' does a bit further down.
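For context, this is the shape of the single-point signature I mean - the four derivatives describe how the pick-up coordinate moves per output pixel and together define the filter footprint. A sketch; 'lookup_rgb' is my own hypothetical wrapper:

```cpp
// The derivative-driven single-point lookup: dsdx/dtdx/dsdy/dtdy say how
// (s,t) changes per output pixel step in x and y, which steers the
// anisotropic antialiasing filter.
#include <OpenImageIO/texture.h>

using namespace OIIO;

bool lookup_rgb(TextureSystem* ts, ustring filename,
                float s, float t,
                float dsdx, float dtdx, float dsdy, float dtdy,
                float rgb[3])
{
    TextureOpt opt;
    return ts->texture(filename, opt, s, t,
                       dsdx, dtdx, dsdy, dtdy, 3, rgb);
}
```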
the texture system seems to be entirely file-based
When I was looking for a way to process cubemaps, I wrote code to 'augment' the raw cubemap data to a form where each 'cube face proper' is surrounded by a 'frame' of extra pixels, making the image size a multiple of a given tile size and providing support for 'better' interpolators. I wanted to access these data with OIIO's texture lookup. The only way I found to do that was to write the data to an image file and feed that to the texture system. This seems like a wasteful way of handling the data - I have them in memory, after all - so I was wondering whether there isn't a route to feed a buffer of in-memory data directly to the texture system. Am I missing something?
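The closest thing I could find in the headers is the ImageCache underneath the TextureSystem, which has add_file() and add_tile() calls that look like they might allow injecting tiles without a real file - but I'm not sure this covers the full texturing path, so the sketch below is only my reading of the docs, with a made-up virtual filename:

```cpp
// Hedged sketch: registering a virtual file with the ImageCache and
// injecting a tile of in-memory pixel data. Whether this is the intended
// route for feeding the texture system, I'm not certain.
#include <OpenImageIO/imagecache.h>

using namespace OIIO;

void inject_tile(ImageCache* ic, const float* tile_pixels,
                 int width, int height, int tilesize)
{
    ustring name("virtual:augmented_cubemap");  // made-up virtual filename
    ImageSpec config(width, height, 3, TypeDesc::FLOAT);
    config.tile_width  = tilesize;
    config.tile_height = tilesize;
    ic->add_file(name, nullptr, &config);       // declare the virtual image
    // hand over the tile at origin (0,0); repeat for the remaining tiles
    ic->add_tile(name, 0 /*subimage*/, 0 /*miplevel*/,
                 0, 0, 0, 0, 3, TypeDesc::FLOAT, tile_pixels);
}
```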
twining - in-line oversampling and low-pass filtering
Here's my idea for handling the lookups with horizontal SIMD code and a leaner signature:
Let's first look at the usual way to do a lookup. You start with an incoming target coordinate, which you convert to a source coordinate - let's call it the 'pick-up' coordinate. Then you access the source data with an interpolator/filter, producing a result pixel. The filter has to reflect the geometrical relation between source and target - usually something which can be modeled locally as an affine transformation. To steer the filter, information about the target's relation to the source has to be carried over to the lookup in the source data - hence the derivatives in OIIO's lookup code.
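In code, carrying that information over typically means differencing the coordinate transform - a sketch with a hypothetical to_source mapping, not OIIO or envutil code:

```cpp
// Sketch: obtaining the derivatives by finite differences of the
// target-to-source transform. 'to_source' is a stand-in for whatever
// mapping the application uses.
#include <utility>

std::pair<float, float> to_source(float x, float y);  // app-specific mapping

struct Derivs { float dsdx, dtdx, dsdy, dtdy; };

Derivs derivatives_at(float x, float y)
{
    auto [s, t]   = to_source(x, y);
    auto [sx, tx] = to_source(x + 1.0f, y);   // one target pixel to the right
    auto [sy, ty] = to_source(x, y + 1.0f);   // one target pixel down
    return { sx - s, tx - t, sy - s, ty - t };
}
```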
Why do we interpolate/filter? Because picking the pixel up naively may result in aliasing when down-scaling, and when up-scaling, the interpolator is responsible for avoiding staircase artifacts and the like. Suitable interpolators for up-scaling are easily found (OIIO's bicubic is perfectly adequate), but antialiasing is tricky. There is a 'brute force' approach to avoiding aliasing: if the output has sufficiently high resolution, aliasing simply doesn't happen. So one can approach the problem by rendering not to the scaled-down target but to one which is large enough to avoid aliasing. Subsequently, the oversized target can be low-pass filtered and decimated - now in the target domain. At first this seems like a silly idea - one would need a potentially very large target and produce a lot of bus traffic.
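Spelled out as code, the brute-force route looks like this - a sketch with a stand-in render function, just to make the cost visible:

```cpp
// The 'brute force' route: render to a k-times oversized target, then
// decimate with a box filter. Memory use and bus traffic grow with k*k.
#include <vector>

void render(std::vector<float>& img, int w, int h);  // stand-in: fills w*h gray pixels

std::vector<float> render_decimated(int w, int h, int k)
{
    std::vector<float> big(size_t(w) * k * h * k);
    render(big, w * k, h * k);                 // oversampled rendering pass
    std::vector<float> out(size_t(w) * h, 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < k; ++dy)     // average each k*k block
                for (int dx = 0; dx < k; ++dx)
                    sum += big[size_t(y * k + dy) * (w * k) + (x * k + dx)];
            out[size_t(y) * w + x] = sum / float(k * k);
        }
    return out;
}
```

This is where 'twining' comes in.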
If we use a simple interpolator - e.g. bilinear interpolation - we get a signal which is continuous and already 'quite good'. If we oversample this signal so that the sampling steps are of the same order of magnitude as the source signal's sampling steps, we won't see aliasing. Twining 'inlines' the oversampling: for a given pick-up point, the twining code instead picks up from a population of locations in the vicinity and immediately forms a (potentially weighted) sum, which constitutes its output. So there are still many pick-ups, but in close vicinity (the source data will likely be in-cache), and the memory traffic is reduced to the pick-ups, whereas there is no need for an oversized target - the 'compression' of the result happens in the twining code and is transparent to the caller.

This scheme is easily SIMDized horizontally: instead of doing single-point lookups, groups of N pick-up locations are processed in parallel. envutil has code to do just that. If you pass --itp 1 to use bilinear interpolation, you can pass e.g. --twine 3 to oversample to a 3x3 vicinity with subsequent box filtering (averaging). To make the code 'tastier', I've added options to weight with a gaussian and to modify the extent of the vicinity - the filter itself is a generalization of convolution, combining, for each sub-pick-up, its offset from the original pick-up with a weight. This isn't fully exploited in envutil (yet) - only 'regular' shapes are used.
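The core of the idea fits in a few lines - a sketch, not envutil's actual code; all names are made up:

```cpp
// Sketch of the twining idea. Each output pixel is a weighted sum of
// several bilinear pick-ups scattered around the nominal pick-up point;
// with unit weights on a k*k grid this amounts to inlined oversampling
// plus box filtering.
#include <vector>

struct Tap { float ds, dt, weight; };   // sub-pick-up offset plus weight

float bilinear(const float* img, int w, int h, float s, float t);  // stand-in

float twined_lookup(const float* img, int w, int h,
                    float s, float t, const std::vector<Tap>& taps)
{
    float acc = 0.0f, wsum = 0.0f;
    for (const Tap& tap : taps) {       // candidate loop for horizontal SIMD:
        acc  += tap.weight * bilinear(img, w, h, s + tap.ds, t + tap.dt);
        wsum += tap.weight;             // N taps could be processed per pass
    }
    return acc / wsum;
}
```

With --twine 3, the tap set would be a 3x3 grid of unit-weight taps around the pick-up point.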
I have run the conversion between lat/lon and cubemap with varying degrees of down-scaling and found that 'twining' is fast and produces appealing results. So my idea might hint at a route towards horizontal SIMDization of the lookup with reduced parameter passing, and I'd like to invite interested parties to play with it and voice their opinion. I think that a way to speed up texture lookup without compromising quality would be beneficial to OIIO. Up-scaling, on the other hand, does not need antialiasing - a good interpolator does the trick, and interpolation could also be SIMDized horizontally quite easily. There is little to be gained from up-scaling environment data, though - after all, the content won't improve.