WIP Allow Python backend to directly write Numpy arrays to SHM #264

Draft: wants to merge 6 commits into base r23.05

Conversation

@asos-danielbunting

No description provided.

@Tabrizian (Member) left a comment:

@asos-danielbunting thanks for the PR. What is the use case this PR is trying to address? Is the idea to pre-allocate buffers in shared memory and work with them directly to speed up the inference process?

Could you please share more details about the places where this becomes useful?

@asos-danielbunting (Author) replied:
Hi @Tabrizian, I'm looking at speeding up the transfer of a large tensor between a Python BLS model that does preprocessing and a TensorFlow inference model.

As you say, the idea is to allocate the buffer in shared memory and write my data into it directly from the Python side, avoiding an extra allocation and copy. I've run a couple of tests, and for my use case this can reduce inference time by a decent amount; e.g. for a 100000 x 200 float32 tensor the saving was about 30 ms.
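For context, a minimal sketch of how this could be used from a BLS model's model.py. The new_shm_tensor name matches the binding added in this PR, but the Python-side argument list, the INPUT0/OUTPUT0 tensor names, and the assumption that as_numpy() returns a writable view over the shared-memory buffer are illustrative only, not a settled API:

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_arr = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            # Hypothetical API from this PR: allocate the output tensor directly
            # in shared memory instead of building a separate numpy array first.
            out_tensor = pb_utils.new_shm_tensor("OUTPUT0", [100000, 200], np.float32)
            # Write the preprocessed data straight into the shared-memory buffer,
            # skipping the extra host allocation and copy the current path incurs.
            np.copyto(out_tensor.as_numpy(), in_arr.astype(np.float32, copy=False))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses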

@@ -0,0 +1,31 @@
FROM asnpdsacr.azurecr.io/public/tritonserver:23.05-tf2-python-py3
@Tabrizian (Member):

Remove

@@ -431,8 +431,12 @@ Stub::StubSetup()
py::setattr(
@Tabrizian (Member):

Remove all the changes except the ones in the src directory.


c_python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get());
python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get());
@Tabrizian (Member):

This is not needed.

@@ -494,6 +498,7 @@ Stub::Initialize(bi::managed_external_buffer::handle_t map_handle)
python_backend_utils, "InferenceResponse",
c_python_backend_utils.attr("InferenceResponse"));
c_python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get());
python_backend_utils.attr("shared_memory") = py::cast(shm_pool_.get());
@Tabrizian (Member):

Not required.

@@ -1603,6 +1608,8 @@ PYBIND11_EMBEDDED_MODULE(c_python_backend_utils, module)

py::register_exception<PythonBackendException>(
module, "TritonModelException");

module.def("new_shm_tensor", &PbTensor::CreateInSHM, "Creates a new Tensor directly into shared memory");
@Tabrizian (Member):

Can we rename this to pb.Tensor.new(shape, dtype, device='cpu')?
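If renamed along those lines, the call site might look like the following. The pb_utils alias and the exact spelling of the constructor are assumptions; the arguments simply mirror the reviewer's pb.Tensor.new(shape, dtype, device='cpu') suggestion:

import numpy as np
import triton_python_backend_utils as pb_utils

# Hypothetical spelling of the suggested constructor; not a confirmed API.
tensor = pb_utils.Tensor.new([100000, 200], np.float32, device='cpu')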

reinterpret_cast<char*>(tensor_shm_ptr) + pb_memory_offset,
shm_handle + pb_memory_offset, false);
tensor_shm_ptr->memory = 0;
std::cout << "Offset is - " << pb_memory_offset<< "\n";
@Tabrizian (Member):

Remove print statement.

{

// Input params of tensor
//std::vector<int64_t> dims = std::vector<int64_t>({10, 10});
@Tabrizian (Member):

Remove comment.
