Merge pull request #3007 from vespa-engine/jobergum/gpu
arnej27959 authored Nov 24, 2023
2 parents e6e8aa4 + a6141cd commit 4f32a51
Showing 2 changed files with 10 additions and 6 deletions.
13 changes: 8 additions & 5 deletions en/cross-encoders.md
@@ -244,7 +244,7 @@ schema my_document {
}
}</pre>

-Notice that both tokens uses the same mapped embedding dimension name `p`.
+Notice that both tokens use the same mapped embedding dimension name `p`.

<pre>
rank-profile max-paragraph-into-cross-encoder inherits default {
@@ -268,15 +268,18 @@ rank-profile max-paragraph-into-cross-encoder inherits default {
function my_attention_mask() {
expression: tokenAttentionMask(256, query(tokens), best_input)
}
match-features: best_input my_input_ids my_token_type_ids my_attention_mask
global-phase {
rerank-count: 25
-expression: onnx(cross_encoder){d0:0,d1:0}
+expression: onnx(cross_encoder){d0:0,d1:0} # Slice
}
}
</pre>

-The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor
-which then returns the tokens of the best matching paragraph, this is then fed into the other transformer
-related functions as the document tokens.
+The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor,
+which then returns the tokens of the best-matching (closest) paragraph.
+
+This tensor is used in the other Transformer-related functions
+(`tokenTypeIds`, `tokenAttentionMask`, `tokenInputIds`) as the document tokens.
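The effect of that join can be sketched outside Vespa. Below is a minimal numpy illustration of the idea, not Vespa code: all names, shapes, and data are assumptions made for the example.

```python
import numpy as np

def best_input(query_embedding, paragraph_embeddings, paragraph_tokens):
    """Return the token sequence of the paragraph closest to the query.

    query_embedding:      (d,)   the query vector
    paragraph_embeddings: (p, d) one embedding per paragraph (mapped dimension p)
    paragraph_tokens:     (p, n) token ids per paragraph (same mapped dimension p)
    """
    # Analogue of closest(embedding): index of the nearest paragraph
    # (dot-product similarity for this sketch)
    closest = int(np.argmax(paragraph_embeddings @ query_embedding))
    # Join on the shared mapped dimension p, keeping only the closest row
    return paragraph_tokens[closest]

tokens = best_input(
    np.array([1.0, 0.0]),                 # query embedding
    np.array([[0.1, 0.9], [0.9, 0.1]]),   # paragraph 1 is closer to the query
    np.array([[11, 12, 13], [21, 22, 23]]),
)
# tokens now holds the token ids of the best-matching paragraph,
# ready to be fed to the Transformer input functions
```

The key point this illustrates is why both tensors must share the mapped dimension `p`: the join lines up paragraphs with their token sequences, so selecting the closest paragraph selects its tokens.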


3 changes: 2 additions & 1 deletion en/reference/schema-reference.html
@@ -2397,7 +2397,8 @@ <h2 id="onnx-model">onnx-model</h2>
<td>Zero or one</td>
<td>
Set the GPU device number to use for computation, starting at 0, i.e.
-if your GPU is <code>/dev/nvidia0</code> set this to 0. This must be an Nvidia CUDA-enabled GPU.
+if your GPU is <code>/dev/nvidia0</code> set this to 0. This must be an Nvidia CUDA-enabled GPU. Currently only
+models used in <a href="#globalphase-rank">global-phase</a> can make use of GPU acceleration.
</td>
</tr>
<tr>
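To make the documented element concrete, a hedged sketch of an `onnx-model` declaration selecting the first GPU might look like the following; the model name and file path are hypothetical, not taken from this change:

<pre>
onnx-model cross_encoder {
    file: models/cross-encoder.onnx  # hypothetical model file
    gpu-device: 0                    # use /dev/nvidia0 (CUDA-enabled GPU)
}
</pre>

Per the sentence added in this diff, such a model would only benefit from the GPU when evaluated in global-phase ranking.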
