Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for double-precision calculations #13

Open
evanmiller opened this issue Sep 20, 2017 · 5 comments
Open

Support for double-precision calculations #13

evanmiller opened this issue Sep 20, 2017 · 5 comments

Comments

@evanmiller
Copy link
Owner

(Following up on the discussion in #4, cc @BobBane)

I avoided double-precision in the past because Apple's OpenCL compiler was very buggy with doubles (crashing etc). The compiler quality has improved considerably since 2011, and my other OpenCL project (not open source, and not geo-related) uses double-precision exclusively now.

I see a few paths forward here:

  1. Migrate everything to double-precision

  2. Maintain separate kernels for single-precision and double-precision

  3. Use typedefs / #defines with a single set of kernels

I'm hesitant to adopt the #define KFLOAT float approach because the algorithms themselves may need to differ between single and double-precision. I.e. I use a lot of tricks to avoid round-off in the single-precision world that wouldn't be necessary with double-precision. Then there's the various tolerance levels and iteration counts that will need to differ between the two precisions, as well as the annoyance that literals must have an "f" specifier in the single-precision world (i.e. 0.5 has to be written 0.5f).

The only real reason to maintain single-precision is for applications that (strongly) prefer speed over accuracy, or to support GPUs that lack double-precision support. With Magic Maps, the slowdown might be an acceptable trade-off, but I won't really know until I try it out. So I'm hesitant to rip out the single-precision code willy-nilly.

For now I'm leaning toward a two-kernel world, implementing (porting) double-precision versions of projections as needed. I imagine an extra argument to pl_context_init would specify the desired computation precision — which would later be passed to pl_find_kernel — and I think that a wrapper function (or several) around clSetKernelArg should mean we won't have to duplicate many host-side functions. I'm envisioning two separate folders kernel/float/ and kernel/double/, with all OpenCL functions prefixed with either plf_ or pld_ depending on the precision. Generally only one set of kernels would be compiled for a given PLContext.

To keep a unified C API, I am fine with requiring all input/output buffers to be double-precision; for my application, copying matters much less than raw computation speed, so I'm okay with sending in and reading out only double and double *. So the user would only need to think about the precision choice when initializing the context, and thus could easily switch between computation precisions without having to rework all the client code.

What do you think?

@BobBane
Copy link
Contributor

BobBane commented Sep 20, 2017

In kernels, nothing is as simple as it seems. Given the syntactic quirks (0.5f - ouch) and semantic differences, having separate float/double kernels seems mandatory.

All I/O in double precision would certainly meet my needs, as long as it doesn't overburden yours.

@evanmiller
Copy link
Owner Author

OK then — in that case I'll start laying some groundwork in the internal API, so you'll be productive when you get a chance to tackle double-precision properly. (Nothing too drastic... if your priorities change it won't hurt my feelings :-)

evanmiller added a commit that referenced this issue Sep 20, 2017
This is preliminary work for supporting double-precision kernels (issue #13)
@evanmiller
Copy link
Owner Author

By the way, the 0.5/0.5f dilemma can be avoided with -cl-single-precision-constant (under "Math Intrinsics Options"):

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clBuildProgram.html

That at least opens the door to unified kernels, though I still think separate ones will be preferable.

@BobBane
Copy link
Contributor

BobBane commented Sep 21, 2017

We're still left with:

  • the rounding-sensitive computations (which probably won't hurt single->double, just look odd)
  • the number of iterations and tolerance values (compiler could handle that at compile time with case (sizeof (KFLOAT)) ... , but yuck)

Separating the kernels still seems right for now.

I am likely to start on the lat/lon extension next week - I have a crash in the field to diagnose today.

@evanmiller
Copy link
Owner Author

Ok, sounds good to me. My only ambition in the meantime is to try to get the single-precision accuracy down to 1 arc-second for all the routines; looks like everything's hitting that target except Mr. Transverse Mercator...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants