• Model compression [KMY16]
• Optimization algorithms [KMR16]
• Client selection [NY18]
• Resource constraints, IoT, edge computing [WTS18]
• Data distribution and selection [ZLL18]
• Personalization [SCS18]
• Local training
• Secret sharing of aggregated updates
• Robust to vanishing (dropped-out) clients
• Individual gradients not disclosed
• Honest-but-curious setting: the server does not collude with users
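The bullets above can be sketched with the pairwise-masking idea behind secure aggregation: every pair of clients agrees on a shared random value that one adds and the other subtracts, so the masks cancel in the server-side sum and no individual update is disclosed. A minimal single-process sketch (client IDs and values are made up; a real protocol derives the pairwise masks from key agreement and secret-shares them to tolerate dropped clients):

```python
import random

def pairwise_masks(client_ids, modulus, seed=0):
    """For every client pair (i, j) with i < j, draw one shared random
    mask; client i adds it, client j subtracts it (mod the modulus)."""
    rng = random.Random(seed)
    masks = {c: 0 for c in client_ids}
    for a in range(len(client_ids)):
        for b in range(a + 1, len(client_ids)):
            i, j = client_ids[a], client_ids[b]
            m = rng.randrange(modulus)
            masks[i] = (masks[i] + m) % modulus
            masks[j] = (masks[j] - m) % modulus
    return masks

MOD = 2**16
clients = ["u1", "u2", "u3"]
updates = {"u1": 5, "u2": 11, "u3": 7}   # each client's private update
masks = pairwise_masks(clients, MOD)

# Each client uploads only its masked update; the server sums them.
masked = {c: (updates[c] + masks[c]) % MOD for c in clients}
aggregate = sum(masked.values()) % MOD   # masks cancel pairwise

assert aggregate == sum(updates.values()) % MOD
```

The server learns only the aggregate (here 5 + 11 + 7 = 23), never any single client's value.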
Retail A Data

ID   X1   X2   X3
U1    9   80  600
U2    4   50  550
U3    2   35  520
U4   10  100  600
U5    5   75  600
U6    5   75  520
U7    8   80  600
Bank B Data

ID    X4   X5   Y
U1   6000  600  No
U2   5500  500  Yes
U3   7200  500  Yes
U4   6000  600  No
U8   6000  600  No
U9   4520  500  Yes
U10  6000  600  No
Goal: ➢ Party A and Party B jointly build a model
Assumptions: ➢ Only one party holds the label Y ➢ Neither party exposes its data
Challenges: ➢ The party holding only X cannot build a model on its own ➢ The two parties cannot exchange or share data
Expectation: ➢ Both parties' data are protected ➢ The model suffers no accuracy loss (LOSSLESS)
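Before joint training, the two parties must align on their common users (U1–U4 in the tables above) without revealing non-overlapping IDs. Production systems use cryptographic private set intersection (e.g. blind-RSA based); the effect can be sketched with salted-hash ID matching (illustrative only — plain salted hashing is not private against brute force on low-entropy IDs):

```python
import hashlib

def hashed_ids(ids, salt):
    # Both parties hash their IDs with an agreed salt and compare digests only.
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i for i in ids}

retail_a = ["U1", "U2", "U3", "U4", "U5", "U6", "U7"]   # Party A's users
bank_b   = ["U1", "U2", "U3", "U4", "U8", "U9", "U10"]  # Party B's users
salt = "shared-session-salt"                            # agreed out of band

a_hashed = hashed_ids(retail_a, salt)
b_hashed = hashed_ids(bank_b, salt)
common = sorted(a_hashed[h] for h in a_hashed.keys() & b_hashed.keys())
print(common)   # → ['U1', 'U2', 'U3', 'U4'], the joint training sample set
```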
Honest-but-curious vs. Malicious
Zero knowledge vs. Some knowledge
Adversarial server vs. Adversarial client
• Guarantees information-level data security
• Zero-knowledge proofs
• Requires multiple participating parties
• Heavy communication overhead
• Relaxing the security requirement can improve efficiency
• Privacy-preserving machine learning for linear regression, logistic regression and neural network training
• Combines secret sharing, garbled circuits and oblivious transfer
• Two-party computation: learns via two untrusted, but non-colluding servers
• Computationally secure, but expensive
• Fully Homomorphic Encryption (FHE) and Partially Homomorphic Encryption (PHE)
• Data-level information protection
Addition : [[u]] + [[v]] = [[u+v]]
Scalar multiplication: n[[u]] = [[nu]]
• For public key pk = n, the encryption of m ∈ {0, …, n − 1} is Enc(m) = r^n (1 + n)^m mod n^2, where r is chosen uniformly at random from Z_n^* (1 ≤ r < n, gcd(r, n) = 1).
• For a float q = (s, e), encrypt [[q]] = ([[s]], e), where q = s · β^e is the base-β exponential representation: the significand is encrypted, the exponent stays in the clear.
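The additive properties above can be checked with a toy Paillier implementation (tiny primes, for illustration only; real deployments use keys of 2048 bits or more). Note that "ciphertext addition" is ciphertext multiplication modulo n², and scalar multiplication is ciphertext exponentiation:

```python
from math import gcd
import random

# Toy Paillier keypair from small primes (insecure, illustration only).
p, q = 17, 19
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # λ = lcm(p-1, q-1)
mu = pow(lam, -1, n)                           # valid since g = 1 + n

def enc(m, rng=random.Random(42)):
    while True:
        r = rng.randrange(1, n)                # r uniform in Z_n^*
        if gcd(r, n) == 1:
            break
    # c = r^n (1 + n)^m mod n^2, matching the formula above
    return pow(r, n, n2) * pow(1 + n, m, n2) % n2

def dec(c):
    # m = L(c^λ mod n²) · μ mod n, with L(x) = (x - 1) / n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

u, v, k = 9, 80, 5
cu, cv = enc(u), enc(v)
assert dec(cu * cv % n2) == (u + v) % n        # [[u]] + [[v]] = [[u+v]]
assert dec(pow(cu, k, n2)) == (k * u) % n      # k[[u]] = [[ku]]
```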
① Polynomial approximation of the logarithm function
② Encrypted computation of each term in the polynomial
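Step ① can be illustrated with the common second-order Taylor approximation used in Paillier-based logistic regression, log(1 + e^{−z}) ≈ log 2 − z/2 + z²/8: the result contains only additions and scalar multiplications, so each term can be evaluated under additive homomorphic encryption (step ②). A plain-float sketch of the approximation quality:

```python
import math

def logistic_loss(z):
    # Exact per-sample loss for the label-adjusted margin z = y * w.x
    return math.log(1.0 + math.exp(-z))

def poly_loss(z):
    # Degree-2 Taylor expansion around z = 0: log 2 - z/2 + z^2/8.
    # Only + and scalar * appear, so every term is computable on
    # additively encrypted values.
    return math.log(2.0) - z / 2.0 + z * z / 8.0

for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"z={z:+.1f}  exact={logistic_loss(z):.4f}  approx={poly_loss(z):.4f}")
```

The approximation is exact at z = 0 and stays close for small |z|, which is why inputs are typically normalized before encrypted training.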
• Leveled homomorphic encryption
• Privately evaluate neural network
• Yao’s GC
• Offline lattice-based AHE
• Online GC and secret-sharing
• 3-server model
• trusted third-party dealers
• Lattice-based packed additive HE for linear layers
• Garbled Circuit for non-linear layer
• Three orders of magnitude faster than CryptoNets [DBL+16]
• All parties are honest-but-curious.
• We assume a threat model with a semi-honest adversary who can corrupt at most one of the two data clients.
• For a protocol P performing (O_A, O_B) = P(I_A, I_B), where O_A and O_B are party A's and B's outputs and I_A and I_B are their inputs,
P is secure against A if there exist infinitely many pairs (I'_B, O'_B) such that (O_A, O'_B) = P(I_A, I'_B).
• A practical solution to control information disclosure.
• All that C learns is the masked gradients; the randomness and secrecy of the masking matrix are guaranteed
• Party A learns its own gradient at each step, but this is not enough for A to learn any information about B's data
• Security rests on the inability to solve n equations with more than n unknowns
Step 1 Party A and B send public keys to each other
Step 2 Parties compute, encrypt and exchange intermediate results
Step 3 Parties compute encrypted gradients, add masks and send to each other
Step 4 Parties decrypt gradients and exchange, unmask and update model locally
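Steps 3–4 hinge on masking: a party never lets its counterpart see a decrypted gradient directly, only a masked version. A minimal sketch with the cryptosystem replaced by an opaque stand-in (class and variable names are illustrative, not the protocol's actual interfaces):

```python
import random

MOD = 2**32  # gradients assumed integer-encoded into Z_MOD

class OpaqueBox:
    """Stand-in for an additively homomorphic cryptosystem such as
    Paillier: values can be added 'inside' the box, and only the key
    holder (party B here) can open it. No real crypto: illustration only."""
    def encrypt(self, m):
        return ("ct", m % MOD)
    def add(self, c1, c2):
        return ("ct", (c1[1] + c2[1]) % MOD)
    def decrypt(self, c):
        return c[1]

box = OpaqueBox()                    # decryption key held by party B

# Step 3 (party A): compute the encrypted gradient and add a random mask.
grad_a = 1234
mask = random.randrange(MOD)
enc_masked = box.add(box.encrypt(grad_a), box.encrypt(mask))

# Step 4 (party B): decrypt the masked gradient -- B sees only noise.
masked_plain = box.decrypt(enc_masked)

# Step 4 (party A): remove the mask and update the model locally.
recovered = (masked_plain - mask) % MOD
assert recovered == grad_a
```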
Step 1 Party B computes and sends intermediate results to A
Step 2 Party A computes the encrypted result, adds a mask and sends it back to B
Step 3 Party B decrypts masked result and sends back to A
Step 4 Party A unmasks results, predicts, and sends to B
• No exposure of raw data
• No exposure of encrypted raw data
• No third party
• Almost lossless accuracy
• Encrypted data transfer and encrypted computation are the dominant cost factors
• Communication cost grows linearly with data volume
• Encrypted-computation cost grows linearly with model complexity (number of parameters)
Step 1 : Private transfer learning
Step 2 : Private federated learning
Step 3 : Private model integration
Step 4 : Private model inference
1. Choose a machine learning algorithm and design a secure multi-party computation protocol
2. Define the data variables exchanged among parties
3. Build the algorithm execution workflow
4. Implement each functional component of the workflow on top of the EggRoll & Federation APIs
• Secure Logistic Regression
• Secure Boosting Tree
• Secure DNN/CNN(Coming Soon)
• Secure Logistic Regression
• Secure Boosting Tree(Coming Soon)
• Secure DNN/CNN(Coming Soon)
• Define the execution workflow of federated algorithm components
• Parameter initialization component
• Data loading and transformation component
• Training and prediction components
• Evaluation component
• Model saving component
• Define the gradient and loss computation formulas
• Design the algorithm's parallelization scheme
• Implement distributed gradient aggregation and loss computation via the Eggroll API
• Define the exchanged algorithm messages, i.e. gradients (JSON configuration file specifying data source and destination)
• Generate a unique identifier for each gradient exchange
• Send and receive gradient messages via the Federation API
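The exchange pattern above can be sketched as a tiny in-process mock of a publish/retrieve federation channel. The names `remote` and `get` and the tag format are illustrative assumptions here, not the actual FATE Federation API:

```python
import uuid

class MockFederation:
    """In-process stand-in for a federation channel: `remote` publishes a
    value under a (name, tag, destination) key and `get` retrieves it
    (a real implementation would block until the value arrives)."""
    def __init__(self):
        self._bus = {}
    def remote(self, obj, name, tag, dst):
        self._bus[(name, tag, dst)] = obj
    def get(self, name, tag, dst):
        return self._bus.pop((name, tag, dst))

fed = MockFederation()

# Unique identifier per exchange, e.g. variable name + iteration number.
iteration = 3
tag = f"grad.iter_{iteration}.{uuid.uuid4().hex[:8]}"

# Host side: publish its local gradient for the guest.
fed.remote([0.12, -0.05, 0.33], name="host_gradient", tag=tag, dst="guest")

# Guest side: receive the gradient under the same tag and aggregate.
host_grad = fed.get(name="host_gradient", tag=tag, dst="guest")
print(host_grad)
```

Tagging each message with a per-iteration unique identifier is what keeps concurrent training rounds from consuming each other's gradients.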