Dump v #200

Lifann · 2024-08-13T16:22:33Z

Here are the costs, in microseconds, of dump_kernel and dump_kernel_v2 when exporting to either pinned host memory or device memory, on a table of capacity 2^24 with half of its contents exported. The table values are stored in pure GPU buckets.

capacity: 2^24 (the table is full when export_batch_if runs)
num exported: 8388771
dim: 64

A100 + AMD

| Output | dump_kernel | dump_kernel_v2 | dump_kernel_v2_vectorized |
|---|---:|---:|---:|
| Pinned host memory | 14887.594 | 2116.001 | 607.138 |
| Device | 24.700 | 6.012 | 3.957 |

H20 + Intel

| Output | dump_kernel | dump_kernel_v2 | dump_kernel_v2_vectorized |
|---|---:|---:|---:|
| Pinned host memory | 624.399 | 44.536 | 44.143 |
| Device | 16.615 | 4.546 | 2.359 |
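
To make the tables easier to compare, here is a small host-only sketch that turns the timings above into speedup factors relative to dump_kernel. The `Timing` struct, `kTimings`, `speedup` and `print_speedups` are illustrative names only, not part of HierarchicalKV; the numbers are copied from the two tables.

```cpp
#include <cassert>
#include <cstdio>

// One row of the benchmark tables above; all times are in microseconds.
// (Hypothetical struct, used only for this illustration.)
struct Timing {
  const char* platform;
  const char* output;
  double dump_kernel;
  double dump_kernel_v2;
  double dump_kernel_v2_vectorized;
};

// Values copied verbatim from the two tables.
constexpr Timing kTimings[] = {
    {"A100 + AMD", "Pinned host memory", 14887.594, 2116.001, 607.138},
    {"A100 + AMD", "Device", 24.700, 6.012, 3.957},
    {"H20 + Intel", "Pinned host memory", 624.399, 44.536, 44.143},
    {"H20 + Intel", "Device", 16.615, 4.546, 2.359},
};

// Speedup of an optimized kernel relative to the baseline kernel.
constexpr double speedup(double baseline_us, double optimized_us) {
  return baseline_us / optimized_us;
}

// Prints one line per table row with both speedup factors.
void print_speedups() {
  for (const Timing& t : kTimings) {
    std::printf("%-12s %-20s v2: %.2fx  v2_vectorized: %.2fx\n",
                t.platform, t.output,
                speedup(t.dump_kernel, t.dump_kernel_v2),
                speedup(t.dump_kernel, t.dump_kernel_v2_vectorized));
  }
}
```

Calling `print_speedups()` from `main` prints one line per table row. For instance, on the A100 with pinned host output, dump_kernel_v2_vectorized comes out roughly 24.5x faster than dump_kernel.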

github-actions · 2024-08-13T16:23:57Z

Documentation preview

https://nvidia-merlin.github.io/HierarchicalKV/review/pr-200

jiashuy · 2024-08-14T00:39:20Z

include/merlin/core_kernels.cuh

+template <class K, class V, class S,
+          template <typename, typename> class PredFunctor,
+          int TILE_SIZE>
+__global__ void dump_kernel_v2(const Table<K, V, S>* __restrict table,


No call to this kernel?

jiashuy · 2024-08-14T03:02:30Z

include/merlin/core_kernels.cuh

+  int dim = table->dim;
+  auto g = cg::tiled_partition<TILE_SIZE>(cg::this_thread_block());
+
+  __shared__ block_acc;


block_acc is not used.

Lifann · 2024-08-14T13:22:14Z

Hi @jiashuy, this PR is still under development. I'll fix the problems ASAP.

jiashuy

LGTM

jiashuy · 2024-08-18T20:21:18Z

/blossom-ci

jiashuy · 2024-08-19T10:24:42Z

/blossom-ci

jiashuy · 2024-08-20T05:03:44Z

/blossom-ci

jiashuy · 2024-08-21T15:07:53Z

tests/export_batch_if_test.cc.cu

+  cudaEventCreate(&start);
+  cudaEventCreate(&stop);
+  cudaEventRecord(start);
+  table->export_batch_if<ExportIfPredFunctor>(


There are three kernel templates in total. Has each kernel been tested? If not, is it necessary to test each one?

Lifann · 2024-08-22

I've tested them separately, but haven't added them to the test cases, since this is an internal option and not part of the public API.


Lifann force-pushed the dump-v branch from 9eea84c to 808b073 on August 13, 2024 16:23

Lifann force-pushed the dump-v branch from a8e6171 to c4e82fc on August 13, 2024 16:24

jiashuy reviewed Aug 14, 2024

Lifann force-pushed the dump-v branch from 12ccbd3 to 93e2c85 on August 14, 2024 14:51

Lifann requested a review from rhdong on August 16, 2024 12:40

jiashuy previously approved these changes Aug 18, 2024

Lifann dismissed jiashuy’s stale review via 7aaabe7 on August 20, 2024 14:25

Lifann force-pushed the dump-v branch from 2d8a9b6 to 9b79bc9 on August 21, 2024 06:50

jiashuy reviewed Aug 21, 2024

opt(export_batch_if): Optimize the export_batch_if in cond to reduce memory wavefronts (989c5cf)

Lifann force-pushed the dump-v branch from 54a6501 to 989c5cf on August 22, 2024 08:30
