
[TensorFlow] Added headers from common_runtime/gpu/* #863

Open · okdzhimiev wants to merge 1 commit into master
Conversation

okdzhimiev

As requested by Samuel.

  • added those headers I needed
  • error log posted here

@okdzhimiev okdzhimiev changed the title Added headers from common_runtime/gpu/* [TensorFlow] Added headers from common_runtime/gpu/* Apr 3, 2020
@saudet (Member) commented Apr 5, 2020

Yeah, that is going to require some amount of work to get all this mapped in a meaningful way...

BTW, Google isn't supporting the C++ API anymore, so this is all deprecated. We should, if possible, use the C API. Could you provide more details about which features you need to access?

/cc @karllessard

@okdzhimiev (Author) commented Apr 6, 2020

Basically what is needed is something like this:

```cpp
TensorShape shape = TensorShape({256});
PlatformGpuId platform_gpu_id(0);

GPUMemAllocator* sub_allocator =
    new GPUMemAllocator(
        GpuIdUtil::ExecutorForPlatformGpuId(platform_gpu_id).ValueOrDie(),
        platform_gpu_id, /*use_unified_memory=*/false, {}, {});

// Note: sizeof(DT_UINT8) would measure the DataType enum, not the element;
// for uint8 elements the intended pool size is num_elements() * sizeof(uint8).
GPUBFCAllocator* allocator =
    new GPUBFCAllocator(sub_allocator,
                        shape.num_elements() * sizeof(tensorflow::uint8),
                        "GPU_0_bfc");

auto inputTensor = Tensor(allocator, DT_UINT8, shape);
```

Ultimately I would like to be able to feed the graph from GPU memory directly. For that, two things are needed:

  1. Create a tensor in GPU memory. This is missing in JavaCPP. Need to:
    • create the tensor in GPU memory (see the code above);
    • get a pointer to its memory to run through some CUDA kernel (currently doing this with JCuda). Normally the pointer can be acquired via `const void* p = t.tensor_data().data();` or, through the C API, `void* p = TF_TensorData(t);`.
  2. Run the graph with specific options. This seems to exist in JavaCPP, though I haven't tested it. Also, see direct_session_test.cc#L2387; the workflow is:
    construct `CallableOptions` -> run `makeCallable` -> run `runCallable`
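For reference, here is a rough sketch of that callable workflow against the TF 2.1 C++ API. This is only a sketch under assumptions: it needs a TensorFlow build to compile, and the node names (`input:0`, `output:0`) and device string are placeholders, not names from any real graph.

```cpp
#include <vector>
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/protobuf/config.pb.h"  // CallableOptions
#include "tensorflow/core/public/session.h"

// Runs one step of an already-created session, feeding a tensor that is
// already resident in GPU memory. Placeholder node/device names throughout.
tensorflow::Status RunWithGpuFeed(tensorflow::Session* session,
                                  const tensorflow::Tensor& gpu_tensor,
                                  std::vector<tensorflow::Tensor>* outputs) {
  tensorflow::CallableOptions opts;
  opts.add_feed("input:0");    // placeholder feed node
  opts.add_fetch("output:0");  // placeholder fetch node
  // Declare that the feed already lives on GPU 0, so the session consumes
  // it in place instead of copying it from host memory:
  (*opts.mutable_feed_devices())["input:0"] =
      "/job:localhost/replica:0/task:0/device:GPU:0";

  tensorflow::Session::CallableHandle handle;
  TF_RETURN_IF_ERROR(session->MakeCallable(opts, &handle));
  tensorflow::Status s =
      session->RunCallable(handle, {gpu_tensor}, outputs, nullptr);
  session->ReleaseCallable(handle).IgnoreError();
  return s;
}
```

The `feed_devices` entry is the piece that makes the GPU-resident feed work; without it the session expects the feed tensor in host memory.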

Notes:

  • Didn't know it's all deprecated, though it's still there in the current master branch (TF 2.1.0).
  • In the direct_session_test.cc they get gpu_tensor from one runCallable() then feed it to another runCallable(). I haven't explored if I could just do the same trick and 'bypass' GPUMemAllocator. If that worked then no changes are required to JavaCPP.
  • Also, I managed to add the required modifications to the built-in TF's JNI (here). First tried to build JavaCPP and contacted you, then went ahead with the TF's JNI.

@saudet (Member) commented Apr 7, 2020

If I follow you, what you would need is a way to allocate tensors in GPU memory directly and be able to specify which device exactly? Can you bring this up on the SIG JVM mailing list at https://groups.google.com/a/tensorflow.org/forum/#!forum/jvm or the Gitter channel at https://gitter.im/tensorflow/sig-jvm? The guys at Google have started using JavaCPP for their Java bindings, so TensorFlow is basically a downstream project of JavaCPP now...

> • Didn't know it's all deprecated. Though it's still there in the current master branch - TF 2.1.0

Yes, it looks like they will leave it there for a while, but from what I know it is no longer being updated, so it will most likely either become unusable somewhere down the road or become the target of internal refactoring efforts without prior notice.

> • In the direct_session_test.cc they get gpu_tensor from one runCallable() then feed it to another runCallable(). I haven't explored if I could just do the same trick and 'bypass' GPUMemAllocator. If that worked then no changes are required to JavaCPP.

From what I understand of the way TensorFlow works, all input/output tensors are first allocated in host memory, but they can also have GPU memory associated with them once they get used in sessions and so on, which TensorFlow manages. I stumbled on a nice thread about that at tensorflow/tensorflow#5902. It's not clear to me how any of this is supposed to help when we actually want to do everything manually, though.

> • Also, I managed to add the required modifications to the built-in TF's JNI (here). First tried to build JavaCPP and contacted you, then went ahead with the TF's JNI.

That's cool, but like I said, that's all deprecated, so the SIG JVM will probably not want to use it anyway (unless this becomes part of the official upstream C API, which I would encourage you to contribute to). Let's see what these guys say, though.

@okdzhimiev (Author) commented Apr 7, 2020

> If I follow you, what you would need is a way to allocate tensors in GPU memory directly and be able to specify which device exactly?

Yes. Here's the workflow:

  1. Open the image stream and get the resolution for the tensor shape.
  2. Create a Tensor in GPU memory and acquire its device pointer p beforehand.
  3. Read an image from the input stream.
  4. Run the image through a CUDA kernel (distortions, aberrations and other linear transforms, so the NN stays hardware agnostic), placing the result at p to avoid unnecessary transfers between device and host.
  5. Run the graph.
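Steps 2-4 might look roughly like this. Again only a sketch under assumptions: it presumes the `GPUBFCAllocator` built as in my earlier comment, and `undistort_kernel`, `raw_frame_dev`, and the frame dimensions are hypothetical names, not anything from a real codebase.

```cpp
// Step 2: a GPU-backed tensor (allocator from the GPUBFCAllocator snippet).
const int height = 1024, width = 1280;  // from the image stream (step 1)
tensorflow::TensorShape shape({height, width});
tensorflow::Tensor input(allocator, tensorflow::DT_UINT8, shape);

// Device pointer into the tensor's buffer: tensor_data() returns a
// StringPiece whose data() points at the underlying (here: device) memory.
uint8_t* p = reinterpret_cast<uint8_t*>(
    const_cast<char*>(input.tensor_data().data()));

// Steps 3-4: the raw frame is already on the device (raw_frame_dev);
// correct it with a hypothetical CUDA kernel, writing straight into the
// tensor's device buffer so the host never touches the corrected image.
dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
undistort_kernel<<<grid, block>>>(raw_frame_dev, p, height, width);

// Step 5: feed `input` to the graph via makeCallable/runCallable with a
// feed_devices entry pointing at /device:GPU:0.
```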

> From what I understand of the way TensorFlow works, all input/output tensors are first allocated in host memory, but they can also have GPU memory associated with them once they get used in sessions and so on, which TensorFlow manages. I stumbled on a nice thread about that at tensorflow/tensorflow#5902. It's not clear to me how any of this is supposed to help when we actually want to do everything manually, though.

Yeah, I read that. With some help from @fierval, in my C++ test program the tensor gets allocated as in #5902, comment #263944891 (also with the GPUBFCAllocator from my previous comment here), followed by some code from direct_session_test.cc.

Thanks for your suggestions. I'll try to post to the resources you mentioned.
