❓ Questions & Help
I am experiencing a large degradation in the performance of the `Loader` when adding a transform with `EmbeddingOperator` to load data from a pretrained-embeddings NumPy array. I have been following the method shown in this tutorial notebook.

Without the `transforms` argument the entire dataset is consumed in 6 seconds, while with the loading from the pretrained embeddings array it takes almost 40 minutes! My "validation.parquet" is a small NVTabular dataset with 16 partitions, totalling almost 200 MB.

Specifically, with the `transforms` enabled I am seeing very low CPU and GPU utilization, as well as close to zero GPU memory consumption; neither the CPU nor the GPU gets utilized above 6%. It seems very strange to me that simply reading `batch_size` specific rows from a NumPy array should take that much time, even accounting for moving them to the GPU.
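For reference, here is a minimal standalone sketch of the per-batch work I would expect the transform to roughly amount to: fancy-indexing `batch_size` rows out of a NumPy array and copying them to the GPU. The array shape, batch size, and iteration count are hypothetical placeholders, not my actual data:

```python
import time

import numpy as np
import torch

# Hypothetical sizes: 1M pretrained embeddings of dim 128, batches of 1024 ids.
embeddings = np.random.rand(1_000_000, 128).astype(np.float32)
ids = np.random.randint(0, embeddings.shape[0], size=1024)

start = time.perf_counter()
for _ in range(1_000):
    rows = embeddings[ids]                    # gather batch_size rows on the CPU
    gpu_rows = torch.from_numpy(rows).cuda()  # host-to-device copy
torch.cuda.synchronize()
print(f"{time.perf_counter() - start:.2f}s for 1,000 batches")
```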
Details

Here is a minimal working example to reproduce this degradation.
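This is a minimal sketch of my setup, assuming the merlin-dataloader API used in the tutorial; the file paths, the `item_id` lookup column, the embedding name, and `batch_size=1024` are placeholders for my actual configuration:

```python
import time

import numpy as np
from merlin.dataloader.ops.embeddings import EmbeddingOperator
from merlin.dataloader.torch import Loader
from merlin.io import Dataset

dataset = Dataset("validation.parquet", engine="parquet")
pretrained = np.load("pretrained_embeddings.npy")  # (num_items, dim) float32 array

# Baseline: no transforms; the whole dataset is consumed in ~6 seconds.
loader = Loader(dataset, batch_size=1024, shuffle=False)
start = time.perf_counter()
for batch in loader:
    pass
print(f"plain: {time.perf_counter() - start:.1f}s")

# With the embedding lookup: the same iteration takes almost 40 minutes.
loader = Loader(
    dataset,
    batch_size=1024,
    shuffle=False,
    transforms=[
        EmbeddingOperator(
            pretrained,
            lookup_key="item_id",          # hypothetical id column in the parquet data
            embedding_name="item_embedding",
        )
    ],
)
start = time.perf_counter()
for batch in loader:
    pass
print(f"with EmbeddingOperator: {time.perf_counter() - start:.1f}s")
```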
Question
Is this behaviour intended? What are the possible bottlenecks here? Would something like data prefetching or asynchronous loading be applicable?