Merge pull request #3007 from vespa-engine/jobergum/gpu
arnej27959 authored Nov 24, 2023
2 parents e6e8aa4 + a6141cd commit 4f32a51
Showing 2 changed files with 10 additions and 6 deletions.
13 changes: 8 additions & 5 deletions en/cross-encoders.md
@@ -244,7 +244,7 @@ schema my_document {
}
}</pre>

-Notice that both tokens uses the same mapped embedding dimension name `p`.
+Notice that both tokens use the same mapped embedding dimension name `p`.

<pre>
rank-profile max-paragraph-into-cross-encoder inherits default {
@@ -268,15 +268,18 @@ rank-profile max-paragraph-into-cross-encoder inherits default {
function my_attention_mask() {
expression: tokenAttentionMask(256, query(tokens), best_input)
}
match-features: best_input my_input_ids my_token_type_ids my_attention_mask
global-phase {
rerank-count: 25
-expression: onnx(cross_encoder){d0:0,d1:0}
+expression: onnx(cross_encoder){d0:0,d1:0} # Slice
}
}
</pre>

-The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor
-which then returns the tokens of the best matching paragraph, this is then fed into the other transformer
-related functions as the document tokens.
+The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor,
+which then returns the tokens of the best-matching (closest) paragraph.
+
+This tensor is used in the other Transformer-related functions
+(`tokenTypeIds`, `tokenAttentionMask`, `tokenInputIds`) as the document tokens.
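The effect of that join can be sketched outside Vespa. Below is a minimal numpy illustration of the idea, not Vespa code: all names, shapes, and data are assumptions made for the example.

```python
import numpy as np

def best_input(query_embedding, paragraph_embeddings, paragraph_tokens):
    """Return the token sequence of the paragraph closest to the query.

    query_embedding:      (d,)   the query vector
    paragraph_embeddings: (p, d) one embedding per paragraph (mapped dimension p)
    paragraph_tokens:     (p, n) token ids per paragraph (same mapped dimension p)
    """
    # Analogue of closest(embedding): index of the nearest paragraph
    # (dot-product similarity for this sketch)
    closest = int(np.argmax(paragraph_embeddings @ query_embedding))
    # Join on the shared mapped dimension p, keeping only the closest row
    return paragraph_tokens[closest]

tokens = best_input(
    np.array([1.0, 0.0]),                 # query embedding
    np.array([[0.1, 0.9], [0.9, 0.1]]),   # paragraph 1 is closer to the query
    np.array([[11, 12, 13], [21, 22, 23]]),
)
# tokens now holds the token ids of the best-matching paragraph,
# ready to be fed to the Transformer input functions
```

The key point this illustrates is why both tensors must share the mapped dimension `p`: the join lines up paragraphs with their token sequences, so selecting the closest paragraph selects its tokens.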


3 changes: 2 additions & 1 deletion en/reference/schema-reference.html
@@ -2397,7 +2397,8 @@ <h2 id="onnx-model">onnx-model</h2>
<td>Zero or one</td>
<td>
Set the GPU device number to use for computation, starting at 0, i.e.
-if your GPU is <code>/dev/nvidia0</code> set this to 0. This must be an Nvidia CUDA-enabled GPU.
+if your GPU is <code>/dev/nvidia0</code> set this to 0. This must be an Nvidia CUDA-enabled GPU. Currently only
+models used in <a href="#globalphase-rank">global-phase</a> can make use of GPU acceleration.
</td>
</tr>
<tr>
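To make the documented element concrete, a hedged sketch of an `onnx-model` declaration selecting the first GPU might look like the following; the model name and file path are hypothetical, not taken from this change:

<pre>
onnx-model cross_encoder {
    file: models/cross-encoder.onnx  # hypothetical model file
    gpu-device: 0                    # use /dev/nvidia0 (CUDA-enabled GPU)
}
</pre>

Per the sentence added in this diff, such a model would only benefit from the GPU when evaluated in global-phase ranking.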
