-
Notifications
You must be signed in to change notification settings - Fork 62
无进位乘法和GHASH
Sun Yimin edited this page Aug 21, 2023
·
25 revisions
参考Reference 1,Page 35 - 39
- The PCLMULQDQ + AES-NI combination for AES-GCM
- AES-NI facilitate high performance AES encryption and decryption
- PCLMULQDQ
$64 \times 64 \rightarrow 128$ (carry-less)- Binary polynomial multiplication; speed up computations in binary fields
- Using it for AES-GCM:
- To use it for GHASH computations:
$GF(2^{128})$ multiplication:- Compute
$128 \times 128 \rightarrow 256$ via carry-less multiplication (of 64-bit operands) - Reduction:
${256 \rightarrow 128} \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (done efficiently via software)
- Compute
- 128-bit Carry-less Multiplication using PCLMULQDQ
(Gueron Kounavis, 2009) Multiply$128 \times 128 \rightarrow 256 \ [A_1 : A_0]\cdot[B_1 : B_0]$ -
Schoolbook (4 PCLMULQDQ invocations)
$A_0 \cdot B_0 = [C_1 : C_0], \ A_1 \cdot B_1 = [D_1 : D_0]$
$A_0 \cdot B_1 = [E_1 : E_0], \ A_1 \cdot B_0 = [F_1 : F_0]$
$[A_1 : A_0] \cdot [B_1 : B_0] = [D_1:D_0 \oplus E_1 \oplus F_1:C_1 \oplus E_0 \oplus F_0 : C_0]$ -
Carry-less Karatsuba (3 PCLMULQDQ invocations)
$A_1 \cdot B_1 = [C_1 : C_0], \ A_0 \cdot B_0 = [D_1 : D_0]$
$(A_1 \oplus A_0) \cdot (B_1 \oplus B_0) = [E_1 : E_0]$
$[A_1 : A_0] \cdot [B_1 : B_0] = [C_1:C_0 \oplus C_1 \oplus D_1 \oplus E_1 : D_1 \oplus C_0 \oplus D_0 \oplus E_0 : D_0]$
-
Schoolbook (4 PCLMULQDQ invocations)
- A new interpretation to GHASH operations
- GHASH does not use
$GF(2^{128})$ COMPUTATIONS "as expected"- Not in the usual polynomial representation convention
- The bits inside the 128-bit operands are reflected
- Actually - it is an operation on a permutation of elements of
$GF(2^{128})$ $T1 = reflect(A)$ $T2 = reflect(B)$ -
$T3 \times T2 \ modulo \ {x^{128} + x^7 + x^2 + x + 1}$ (a$GF(2^{128})$ multiplication) $reflect(T3)$
- It can be proved that this operation is:
-
$A \times B \times x^{-127} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$ - A weird Montgomery Multiplication in
$GF(2^{128})$ modulo a reversed poly
- A weird Montgomery Multiplication in
- Better written as
$A \times (B \times x) \times x^{-128} \ mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
-
- GHASH does not use
- Fast reduction modulo
$x^{128} + x^{127} + x^{126} + x^{121} + 1$ - Input 256-bit operand
$[X_3:X_2:X_1:X_0]$ $[A_1:A_0] = X_0 \cdot 0xc200000000000000 $ $[B_1:B_0] = [X_0 \oplus A_1 : X_1 \oplus A_0]$ $[C_1:C_0] = B_0 \cdot 0xc200000000000000 $ $[D_1:D_0] = [B_0 \oplus C_1 : B_1 \oplus C_0]$ - Output:
$[D_1 \oplus X_3 : D_0 \oplus X_2]$
- Input 256-bit operand
; Input is in T1:T0
vmodqa T3, [W] ; poly
vpclmulqda T2, T3, T0, 0x01
vpshufd T4, T0, 78
vpxor T4, T4, T2
vpclmulqda T2, T3, T4, 0x01
vpshufd T4, T4, 78
vpxor T4, T4, T2
vpxor T1, T1, T4 ; result in T1
- Aggregated Reduction
- In a Horner form (iterative computation)
-
$Y_i = MM[(X_i \oplus Y_{i-1}), Hx]$ ... everyting$mod \ x^{128} + x^{127} + x^{126} + x^{121} + 1$
-
- 4-way expanded Horner form (aggregate results to defer the reduction)
$MM[X_i , Hx] \oplus MM[X_{i-1} , {(Hx)}^2] \oplus MM[X_{i-2} , {(Hx)}^3] \oplus MM[(X_{i-3} \oplus Y_{i-4}, {(Hx)}^4] $ - Can be expanded to N > 4 blocks, we use 8 blocks now.
- Overhead: pre-calculate the powers of Hx (amortized for reasonably long buffer)
- The gain: reduction deffered to once per "N" blocks
- In a Horner form (iterative computation)
- Cryptographic Hardware and Software and useful architectures
- The Galois/Counter Mode of Operation (GCM)
- Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC
- Intel® Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode
- Optimized Galois-Counter Mode implementation on Intel® Architecture Processors
- Enabling High Performance Galois-Counter Mode on Intel® Architecture Processors