Research and Applications of Federated Learning (联邦学习的研究与应用)
1.Research Progress in Federated Machine Learning

• System efficiency

• Model compression [KMY16]
• Optimization algorithms [KMR16]
• Client selection [NY18]
• Resource constraints, IoT, edge computing [WTS18]

• Model performance

• Data distribution and selection [ZLL18]
• Personalization [SCS18]

• Data security

2.Secure Aggregation [BIK+17]

• Local training
• Secret-share the aggregated update
• Robust to vanishing (dropped-out) clients
• Individual gradients are not disclosed
• Honest-but-curious setting; the server does not collude with users
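The core idea behind [BIK+17] can be sketched with pairwise masks: each pair of clients agrees on a random value that one adds and the other subtracts, so the server sees only masked updates, yet the masks cancel in the sum. This is a minimal sketch that omits the secret sharing used to tolerate dropped-out clients.

```python
import random

def pairwise_masks(num_clients, modulus, seed=0):
    """Each pair (i, j), i < j, shares a random mask s_ij.
    Client i adds s_ij, client j subtracts it, so all masks cancel in the sum."""
    rng = random.Random(seed)
    masks = [[0] * num_clients for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            s = rng.randrange(modulus)
            masks[i][j] = s          # client i adds s
            masks[j][i] = -s         # client j subtracts s
    return masks

MOD = 2 ** 16
updates = [12, 7, 30]                # each client's local model update
masks = pairwise_masks(len(updates), MOD)

# Each client uploads only its masked update; the server never sees raw values.
masked = [(u + sum(masks[i])) % MOD for i, u in enumerate(updates)]

# The pairwise masks cancel, so the server recovers the exact sum.
total = sum(masked) % MOD
print(total)  # 49 == 12 + 7 + 30
```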

3.Vertical Federated Learning

Retail A Data:

| ID | X1 | X2  | X3  |
|----|----|-----|-----|
| U1 | 9  | 80  | 600 |
| U2 | 4  | 50  | 550 |
| U3 | 2  | 35  | 520 |
| U4 | 10 | 100 | 600 |
| U5 | 5  | 75  | 600 |
| U6 | 5  | 75  | 520 |
| U7 | 8  | 80  | 600 |

Bank B Data:

| ID  | X4   | X5  | Y   |
|-----|------|-----|-----|
| U1  | 6000 | 600 | No  |
| U2  | 5500 | 500 | Yes |
| U3  | 7200 | 500 | Yes |
| U4  | 6000 | 600 | No  |
| U8  | 6000 | 600 | No  |
| U9  | 4520 | 500 | Yes |
| U10 | 6000 | 600 | No  |

Goal: ➢ Parties A and B jointly build a model
Assumptions: ➢ Only one party holds the label Y ➢ Neither party exposes its data
Challenges: ➢ The party holding only X cannot build a model alone ➢ The two parties cannot exchange or share data
Expectations: ➢ Both parties' data stay protected ➢ The model is LOSSLESS
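Before joint training, the two parties must align on their overlapping users (U1 to U4 in the tables above). FATE does this with a secure-intersection protocol; the plain, non-private version below only illustrates what the aligned result looks like.

```python
# Non-private illustration of sample alignment. In practice a secure
# intersection protocol is used so neither side reveals its full ID list.
retail_a = {"U1", "U2", "U3", "U4", "U5", "U6", "U7"}
bank_b = {"U1", "U2", "U3", "U4", "U8", "U9", "U10"}

shared = sorted(retail_a & bank_b)
print(shared)  # ['U1', 'U2', 'U3', 'U4'] -- only these rows enter vertical training
```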

4.Privacy-Preserving Machine Learning

Honest-but-curious VS malicious adversaries
Zero knowledge VS some knowledge
Adversarial server VS adversarial client

5.Secure Multi-Party Computation (MPC)

• Guarantees data security at the information level
• Zero-knowledge guarantees
• Requires multiple participating parties

• Drawbacks:

• Heavy communication overhead
• Efficiency can be improved by relaxing the security requirements

6.SecureML [MZ17]

• Privacy-preserving machine learning for linear regression, logistic regression, and neural network training
• Combines secret sharing, garbled circuits, and oblivious transfer
• Learning is done via two untrusted but non-colluding servers
• Computationally secure, but expensive

7.Homomorphic Encryption (HE)

• Fully homomorphic encryption and partially homomorphic encryption
• Data-level information protection

Paillier partially homomorphic encryption

Addition: [[u]] + [[v]] = [[u + v]]
Scalar multiplication: k[[u]] = [[ku]]
• For public key pk = n, the encryption of m ∈ {0, …, n − 1} is [[m]] = (1 + n)^m r^n mod n², where r is chosen uniformly at random from {1, …, n − 1} with gcd(r, n) = 1.
• A float q is encoded as a pair (s, e) with q = sβ^e (base-β exponential representation) and encrypted as [[q]] = ([[s]], e).
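A toy Paillier implementation makes the two homomorphic properties concrete. Note that the ciphertext-space operation realizing [[u]] + [[v]] is modular multiplication, and scalar multiplication is modular exponentiation. The primes below are tiny demo values, not a secure parameter choice.

```python
import math
import random

# Toy Paillier with g = n + 1, so Enc(m) = (1+n)^m * r^n mod n^2.
# For illustration only; real keys use primes of thousands of bits.
p, q = 61, 53                      # insecure demo primes
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)       # lambda = lcm(p-1, q-1)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:     # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # L(x) = (x - 1) / n, applied to c^lambda mod n^2
    u = (pow(c, lam, n2) - 1) // n
    mu = pow((pow(1 + n, lam, n2) - 1) // n, -1, n)  # L(g^lambda)^-1 mod n
    return (u * mu) % n

cu, cv = encrypt(20), encrypt(22)
assert decrypt((cu * cv) % n2) == 42   # [[u]] * [[v]] decrypts to u + v
assert decrypt(pow(cu, 3, n2)) == 60   # [[u]]^3 decrypts to 3u
print("additive homomorphism holds")
```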

8.Applying HE to Machine Learning

① Polynomial approximation of the logarithm function
② Encrypted computation of each term in the polynomial
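For example, the logistic loss log(1 + e^(-x)) can be replaced by its second-order Taylor expansion around 0, whose terms involve only additions and scalar multiplications and are therefore computable under Paillier encryption. The sketch below checks the approximation in the clear.

```python
import math

def log_loss(x):
    """Exact logistic loss log(1 + exp(-x))."""
    return math.log(1.0 + math.exp(-x))

def log_loss_poly(x):
    """Second-order Taylor approximation around 0: log 2 - x/2 + x^2/8.
    Every term is polynomial in x, so it can be evaluated on Paillier
    ciphertexts using only additions and scalar multiplications."""
    return math.log(2.0) - x / 2.0 + x * x / 8.0

for x in (-0.5, 0.0, 0.5):
    print(f"x={x:+.1f}  exact={log_loss(x):.4f}  poly={log_loss_poly(x):.4f}")

# Near x = 0 the approximation error is small (it grows for large |x|).
assert abs(log_loss(0.25) - log_loss_poly(0.25)) < 1e-3
```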

9.Privacy-Preserving Deep Learning Inference

CryptoNets

• Leveled homomorphic encryption
• Privately evaluate neural network

DeepSecure

• Yao’s GC

MiniONN

• Offline lattice-based AHE
• Online GC and secret-sharing

Chameleon

• 3-server model
• trusted third-party dealers

GAZELLE [GVC18]

• Lattice-based packed additive HE for the linear layers
• Garbled circuits for the non-linear layers
• Three orders of magnitude faster than CryptoNets [DBL+16]

10.Security Definition

• All parties are honest-but-curious.
• We assume a threat model with a semi-honest adversary who can corrupt at most one of the two data clients.
• For a protocol P computing (OA, OB) = P(IA, IB), where OA and OB are parties A's and B's outputs and IA and IB are their inputs,
P is secure against A if there exist infinitely many pairs (I'B, O'B) such that (OA, O'B) = P(IA, I'B).
• A practical criterion for controlling information disclosure: A cannot distinguish B's true input among infinitely many candidates.
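The condition above can be stated compactly (this formalization is my restatement of the bullets, not notation from the original slides):

```latex
% P is secure against party A if A's view (I_A, O_A) is consistent with
% infinitely many candidate input/output pairs of B:
\[
  \bigl|\{\, (I'_B, O'_B) \;:\; (O_A, O'_B) = P(I_A, I'_B) \,\}\bigr| = \infty
\]
```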

11.A pile of mathematical expressions, so hard to look at

12.Federated learning can drop the third party and yield a two-party scheme

13.A genuine pile of mathematical expressions this time, so hard to look at QAQ

14.Security Analysis

• Security against the third party C

• All C learns is the masked gradients; the randomness and secrecy of the masking matrix are guaranteed

• Security of the parties against each other

• Party A learns its own gradient at each step, but this is not enough for A to learn any information about B
• One cannot solve n equations in more than n unknowns

• Security holds in the semi-honest setting

15.Still a pile of mathematical analysis; does anyone really read it carefully?

16.System Architecture

Step 1 Party A and B send public keys to each other
Step 2 Parties compute, encrypt and exchange intermediate results
Step 3 Parties compute encrypted gradients, add masks and send to each other
Step 4 Parties decrypt gradients and exchange, unmask and update model locally
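The mask/unmask trick in steps 3 and 4 can be sketched as follows. Real FATE encrypts the gradients under Paillier before masking; here plain integers stand in for ciphertexts so that only the masking logic is shown.

```python
import random

# Sketch of steps 3-4: a party masks its (conceptually encrypted) gradient
# before the other side decrypts it, then removes the mask locally.
def mask(value, rng):
    r = rng.randrange(1, 10 ** 6)    # random mask, known only to its owner
    return value + r, r

rng = random.Random(42)

grad_a = 17                          # party A's gradient (ciphertext in FATE)
masked_a, r_a = mask(grad_a, rng)    # step 3: A adds a mask and sends to B

decrypted_by_b = masked_a            # step 4: B decrypts, returning grad_a + r_a

recovered_a = decrypted_by_b - r_a   # A removes its mask and updates locally
assert recovered_a == grad_a         # B only ever saw the masked value
print("A recovers its gradient; B saw only grad_a + r_a")
```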

17.Mathematical analysis...

18.Model Inference

Step 1 Party B computes intermediate results and sends them to A
Step 2 Party A computes the encrypted result, adds a mask, and sends it back to B
Step 3 Party B decrypts the masked result and sends it back to A
Step 4 Party A unmasks the result, makes the prediction, and sends it to B

19.Mathematical formulation

20.Advantages

• No exposure of raw data
• No exposure of even the encrypted raw data
• No third party
• Almost lossless accuracy

21.Scalability

• Encrypted data transfer and encrypted computation are the dominant cost factors
• Communication volume grows in proportion to the amount of data
• Encryption cost grows in proportion to model complexity (number of parameters)

22.Feature-based Heterogeneous FTL (HFTL)

Step 1: Private transfer learning
Step 2: Private federated learning
Step 3: Private model integration
Step 4: Private model inference

23.Basic Process of Developing a Federated AI Algorithm

1.Choose a machine learning algorithm and design a secure multi-party computation protocol
2.Define the data variables exchanged between the parties
3.Build the algorithm's execution workflow
4.Implement each functional component of the workflow on top of the EggRoll & Federation APIs

24.Algorithms and Examples Currently in the FATE Project

• Secure Intersection for Sample Alignment

• Vertical-Split Feature Space Federated Learning

• Secure Logistic Regression
• Secure Boosting Tree
• Secure DNN/CNN(Coming Soon)

• Horizontal-Split Sample Space Federated Learning

• Secure Logistic Regression
• Secure Boosting Tree(Coming Soon)
• Secure DNN/CNN(Coming Soon)

• Secure Federated Transfer Learning

25.Workflow Example

• Workflow

• Defines the execution workflow of the federated algorithm's components

• Components

• Parameter initialization component
• Data loading and transformation component
• Training and prediction components
• Evaluation component
• Model saving component

26.FederatedML Functions Example

• One-party distributed computation of the vertical LR gradient

• Define the gradient and loss formulas
• Design the parallelization scheme for the algorithm
• Implement distributed gradient aggregation and loss computation through the Eggroll API
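The distributed aggregation can be sketched as a map over data partitions followed by a reduce. Plain Python stands in for the Eggroll map/reduce API here; the actual Eggroll function names are not shown, and the gradient formula is the standard logistic-regression one.

```python
import math
from functools import reduce

# Conceptual sketch of distributed LR gradient aggregation: each worker
# computes a partial gradient over its partition, then partials are summed.
def partial_gradient(partition, w):
    """Per-partition gradient of the logistic loss: sum of (sigmoid(w.x) - y) * x."""
    g = [0.0] * len(w)
    for x, y in partition:
        z = sum(wi * xi for wi, xi in zip(w, x))
        err = 1.0 / (1.0 + math.exp(-z)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g

def add_vectors(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

w = [0.0, 0.0]
partitions = [                       # data sharded across workers
    [([1.0, 2.0], 1), ([2.0, 1.0], 0)],
    [([0.5, 0.5], 1)],
]

# "map" each partition to a partial gradient, then "reduce" by vector addition
grads = [partial_gradient(p, w) for p in partitions]
total = reduce(add_vectors, grads)
print(total)  # [0.25, -0.75]
```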

27.Federation API Example

• Two-party federation of the vertical LR gradient

• Define the algorithm's interaction variable, the gradient, in a JSON configuration file (data source and destination)
• Generate a unique tag for each gradient exchange
• Send and receive the gradient through the Federation API