From 1301c2b35a618f83c91ef5775453882421973887 Mon Sep 17 00:00:00 2001
From: tarantula-leo <54618933+tarantula-leo@users.noreply.github.com>
Date: Thu, 29 Feb 2024 10:53:22 +0800
Subject: [PATCH] [OSCP] Add linear and logistic regression tutorials for TEEU (#1161)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Create teeu_linerregression.md

Create teeu_linerregression.md

* Update teeu_linerregression.md

* Create teeu_regression.po

* Rename docs/tutorial/teeu_linerregression.md to docs/tutorial/teeu/teeu_linerregression.md

* Rename docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu_regression.po to docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu/teeu_regression.po

* Update index.rst

* Update teeu_linerregression.md
---
 .../tutorial/teeu/teeu_regression.po       | 368 +++++++++++
 docs/tutorial/index.rst                    |   2 +
 docs/tutorial/teeu/teeu_linerregression.md | 596 ++++++++++++++++++
 3 files changed, 966 insertions(+)
 create mode 100644 docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu/teeu_regression.po
 create mode 100644 docs/tutorial/teeu/teeu_linerregression.md

diff --git a/docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu/teeu_regression.po b/docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu/teeu_regression.po
new file mode 100644
index 000000000..c14cb69d0
--- /dev/null
+++ b/docs/locales/zh_CN/LC_MESSAGES/tutorial/teeu/teeu_regression.po
@@ -0,0 +1,368 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2022 Ant Group Co., Ltd.
+# This file is distributed under the same license as the SecretFlow package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
+#
+msgid ""
+msgstr ""
+"Project-Id-Version: SecretFlow \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2024-02-27 13:45+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.14.0\n"
+
+#: ../../tutorial/teeu/teeu_regression.md:1
+msgid "TEEU Example: LinerRegression"
+msgstr "TEEU 示例:线性回归"
+
+#: ../../tutorial/teeu/teeu_regression.md:3
+msgid "**Tips**"
+msgstr "**提示**"
+
+#: ../../tutorial/teeu/teeu_regression.md:5
+msgid ""
+"Before reading this article, it is strongly recommended to read [TEEU "
+"Getting Started Guide](../teeu.md) at first."
+msgstr "在阅读本文之前,强烈推荐先阅读 [TEEU上手指南](../teeu.md) 。"
+
+#: ../../tutorial/teeu/teeu_regression.md:9
+msgid ""
+"TEEU (`TEE` processing `U`nit) is a TEE device in SecretFlow. Through "
+"TEEU, users can conveniently put data in TEE for calculation, and achieve"
+" the purpose of protecting data integrity and security."
+msgstr ""
+"TEEU(`TEE` processing `U`nit)是 SecretFlow 中的 TEE 设备,通过 "
+"TEEU,用户可以方便地把数据放在 TEE 内进行计算,并且达到保护数据完整和安全的目的。"
+
+#: ../../tutorial/teeu/teeu_regression.md:11
+msgid ""
+"This article will demonstrate how to run LinerRegression in TEEU for "
+"model training."
+msgstr "本文将演示如何在 TEEU 中使用线性回归训练模型。"
+
+#: ../../tutorial/teeu/teeu_regression.md:13
+msgid "1.1 Simulation mode"
+msgstr "1.1 仿真模式"
+
+#: ../../tutorial/teeu/teeu_regression.md:15
+msgid ""
+"To facilitate users who do not have access to a real TEE environment, "
+"SecretFlow offers a TEEU simulation mode. This feature allows users to "
+"try out TEEU functions on an ordinary machine. Code writing and usage in "
+"the simulation mode are almost same with the non-simulation mode, so it "
+"is recommended to use the simulation mode for quick experimental "
+"verification first."
+msgstr ""
+"为了方便用户在没有真实 TEE 环境的情况下对 TEEU 进行尝试,SecretFlow 提供了 TEEU "
+"仿真模式,这意味着您在普通机器上仍然可以尝试 TEEU "
+"的功能。在仿真模式下,代码编写和使用体感与非仿真模式几乎无差别,因此建议可以先使用仿真模式进行快速实验验证。"
+
+#: ../../tutorial/teeu/teeu_regression.md:18
+msgid ""
+"Note that since the real TEE environment is not used, the simulation mode"
+" lacks security features that depend on the TEE environment, such as "
+"remote attestation and memory encryption isolation, and cannot protect "
+"data integrity and confidentiality. Simulation mode is not secure and "
+"should not be used in production, keep this in mind."
+msgstr ""
+"注意,由于并没有使用真正的 TEE 环境,因此仿真模式缺乏远程认证和内存加密隔离等依赖 TEE "
+"环境的安全特性,无法保护数据的完整性与机密性。仿真模式并不是安全的,不能用于生产上,请牢记这一点。"
+
+#: ../../tutorial/teeu/teeu_regression.md:20
+msgid "Pre-work"
+msgstr "前置工作"
+
+#: ../../tutorial/teeu/teeu_regression.md:22
+msgid "Understand the SecretFlow deployment of multi-ray cluster mode"
+msgstr "了解多 Ray 集群模式下的 SecretFlow 部署"
+
+#: ../../tutorial/teeu/teeu_regression.md:24
+msgid ""
+"For security reasons, Ray running in TEE is an independent cluster, so "
+"currently SecretFlow only supports the use of TEEU in multiple Ray "
+"cluster mode. You can read the [SecretFlow Deployment "
+"Documentation](../../getting_started/deployment.md#production) in advance to"
+" understand the deployment of multiple Ray clusters."
+msgstr ""
+"出于安全原因,运行在 TEE 里的 Ray 是独立的集群,因此目前 SecretFlow 仅支持在多个 Ray 集群模式下使用 "
+"TEEU。您可以事先阅读[SecretFlow部署文档](../../getting_started/deployment.md#production)了解多个"
+" Ray 集群的部署。"
+
+#: ../../tutorial/teeu/teeu_regression.md:26
+msgid "Prepare to run the simulated TEEU machine"
+msgstr "准备运行仿真 TEEU 的机器"
+
+#: ../../tutorial/teeu/teeu_regression.md:28
+msgid ""
+"At present, SecretFlow TEEU only provides docker images. Due to some "
+"limitations of the basic technology, TEE programs currently require a "
+"large amount of memory to run successfully. You need to ensure that the "
+"available memory for the Docker container is at least 30GB or more, "
+"depending on the size of the data to be processed in TEEU."
+msgstr ""
+"目前 SecretFlow TEEU 仅提供 docker 镜像,由于基础技术的一些限制,目前 TEE 程序需要较大内存才能运行成功,您需要确保 "
+"docker 容器可使用内存至少大于 30GB 或者可能更大,取决于TEEU要处理的数据大小。"
+
+#: ../../tutorial/teeu/teeu_regression.md:30
+msgid "Deploy AuthManager"
+msgstr "部署 AuthManager"
+
+#: ../../tutorial/teeu/teeu_regression.md:32
+msgid "AuthManager is the module responsible for authorization management."
+msgstr "AuthManager是负责授权管理的模块。"
+
+#: ../../tutorial/teeu/teeu_regression.md:34
+msgid "Download the docker image"
+msgstr "下载 docker 镜像"
+
+#: ../../tutorial/teeu/teeu_regression.md:39
+msgid "Enter the docker image"
+msgstr "进入 docker 镜像"
+
+#: ../../tutorial/teeu/teeu_regression.md:44
+msgid "(Optional) Configure TLS"
+msgstr "(可选)配置 TLS"
+
+#: ../../tutorial/teeu/teeu_regression.md:46
+msgid ""
+"AuthManager enables TLS by default. If you only use it for local "
+"simulation, you can turn off TLS by set `enable_tls` to `false` in "
+"`/root/occlum_release/config.yaml`."
+msgstr ""
+"AuthManager 默认启用 TLS,如果您只是为了本机仿真,可以关闭TLS功能,具体方法为编辑 `/root/occlum_release/config.yaml` 文件,将 "
+"`enable_tls` 设置为 `false`。"
+
+#: ../../tutorial/teeu/teeu_regression.md:48
+msgid "Start the service"
+msgstr "启动服务"
+
+#: ../../tutorial/teeu/teeu_regression.md:54
+msgid ""
+"The default port is 8835. Feel free to modify the `port` in config.yaml "
+"if port conflicts."
+msgstr "默认端口号是8835。如果发生端口冲突,请修改为其他未占用端口。"
+
+#: ../../tutorial/teeu/teeu_regression.md:56
+msgid "Example: LinerRegression in TEEU"
+msgstr "示例:在 TEEU 中运行线性回归"
+
+#: ../../tutorial/teeu/teeu_regression.md:58
+msgid ""
+"Next, we will demonstrate how to combine data from multiple parties in "
+"TEEU, and then use LinerRegression to train it."
+msgstr "接下来,我们将演示如何在 TEEU 中合并多方的数据,并使用线性回归进行训练。"
+
+#: ../../tutorial/teeu/teeu_regression.md:60
+msgid "Example code"
+msgstr "示例代码"
+
+#: ../../tutorial/teeu/teeu_regression.md:62
+msgid ""
+"Assuming that Alice and Bob have the same feature space, but the sample "
+"space does not overlap with each other, and each has some user features, "
+"Alice and Bob intend to use TEEU to safely combine their samples and use "
+"LinerRegression to train a model. At the same time, Carol acts as the "
+"provider of TEEU."
+msgstr "假设 Alice 和 Bob 拥有相同的特征空间,但是样本空间互不重叠,各自拥有部分用户的特征。Alice 和 Bob 打算使用 TEEU 安全地合并他们的样本,并使用线性回归训练出一个模型。与此同时,Carol 作为 TEEU 的提供方。"
+
+#: ../../tutorial/teeu/teeu_regression.md:64
+msgid "The core code of the above case is as follows."
+msgstr "上述案例的核心代码如下。"
+
+#: ../../tutorial/teeu/teeu_regression.md:153
+msgid "Alice runs the code"
+msgstr "Alice 运行代码"
+
+#: ../../tutorial/teeu/teeu_regression.md:155
+#: ../../tutorial/teeu/teeu_regression.md:301
+msgid "Start the ray master node"
+msgstr "启动 Ray 主节点"
+
+#: ../../tutorial/teeu/teeu_regression.md:157
+msgid ""
+"You should modify the following command to match the actual situation, as"
+" it currently assumes that Alice's Ray master node is listening at "
+"192.168.0.10:10000."
+msgstr "下列命令假设 Alice 的 Ray 主节点监听地址为 192.168.0.10:10000,请根据实际情况修改。"
+
+#: ../../tutorial/teeu/teeu_regression.md:163
+#: ../../tutorial/teeu/teeu_regression.md:308
+msgid "Generate a public-private key pair"
+msgstr "生成公私钥对"
+
+#: ../../tutorial/teeu/teeu_regression.md:165
+msgid ""
+"As Alice's data needs to be encrypted and sent to TEEU, it is imperative "
+"to generate a pair of public and private keys. Below, you may find the "
+"code that, upon execution, generates the public and private keys, which "
+"will be stored in the current directory in PEM format as "
+"\"private_key.pem\" and \"public_key.pem\", respectively."
+msgstr ""
+"因为 Alice 的数据需要加密发送给 TEEU,所以需要事先生成一对公私钥。您可以执行下列代码生成公私钥,公私钥以 pem "
+"格式分别存放在当前目录的 private_key.pem,public_key.pem。"
+
+#: ../../tutorial/teeu/teeu_regression.md:172
+msgid "Execute code"
+msgstr "执行代码"
+
+#: ../../tutorial/teeu/teeu_regression.md:174
+msgid ""
+"Add the SecretFlow initialization related code in front of the code to "
+"get the following code. First, you need to modify the configuration items"
+" in the code."
+msgstr "在代码的前面加上SecretFlow初始化相关代码,得到下列的代码。首先您需要对代码中的配置项进行修改。"
+
+#: ../../tutorial/teeu/teeu_regression.md:176
+msgid ""
+"The code assumes that Alice's communication address is "
+"192.168.0.10:20001, please modify it according to the actual situation"
+msgstr "代码中假设 Alice 通信地址为 192.168.0.10:20001,请您根据实际情况修改"
+
+#: ../../tutorial/teeu/teeu_regression.md:177
+#: ../../tutorial/teeu/teeu_regression.md:322
+#: ../../tutorial/teeu/teeu_regression.md:456
+msgid "You need to fill in the correct `auth_manager_config`"
+msgstr "您需要填写正确的 `auth_manager_config`"
+
+#: ../../tutorial/teeu/teeu_regression.md:178
+#: ../../tutorial/teeu/teeu_regression.md:323
+msgid "`host` is the listening address of the AuthManager service"
+msgstr "`host`为 AuthManager 的服务监听地址"
+
+#: ../../tutorial/teeu/teeu_regression.md:179
+msgid ""
+"`ca_cert` is the CA certificate address of AuthManager, if AuthManager "
+"does not start with TLS, no configuration is required."
+msgstr "`ca_cert`为 AuthManager 的 CA 证书地址,如果 AuthManager 未启动 TLS,则不需要配置。"
+
+#: ../../tutorial/teeu/teeu_regression.md:181
+msgid ""
+"Suppose we save the code as `demo.py`, and then execute `python demo.py` "
+"on Alice's machine."
+msgstr "假设我们把代码保存为 `demo.py`,然后在 Alice 的机器上执行 `python demo.py`。"
+
+#: ../../tutorial/teeu/teeu_regression.md:299
+msgid "Bob runs the code"
+msgstr "Bob 运行代码"
+
+#: ../../tutorial/teeu/teeu_regression.md:303
+msgid ""
+"You should modify the following command to match the actual situation, as"
+" it currently assumes that Bob's Ray master node is listening at "
+"192.168.0.20:10000."
+msgstr "下列命令假设 Bob 的 Ray 主节点监听在 192.168.0.20:10000,请根据实际情况修改。"
+
+#: ../../tutorial/teeu/teeu_regression.md:310
+msgid ""
+"As Bob's data needs to be encrypted and sent to TEEU, it is imperative to"
+" generate a pair of public and private keys. Below, you may find the code"
+" that, upon execution, generates the public and private keys, which will "
+"be stored in the current directory in PEM format as \"private_key.pem\" "
+"and \"public_key.pem\", respectively."
+msgstr ""
+"因为 Bob 的数据需要加密发送给 TEEU,所以需要事先生成一对公私钥。您可以执行下列代码生成公私钥,公私钥以 pem 格式分别存放在当前目录的"
+" private_key.pem,public_key.pem。"
+
+#: ../../tutorial/teeu/teeu_regression.md:316
+msgid "Run the code"
+msgstr "运行代码"
+
+#: ../../tutorial/teeu/teeu_regression.md:318
+msgid ""
+"Similar to Alice, add the SecretFlow initialization code in front of the "
+"code to get the following code. First, you need to modify the "
+"configuration items in the code."
+msgstr "与 Alice 类似,在代码前面加上 SecretFlow 初始化相关代码,得到下列的代码。首先您需要对代码中的配置项进行修改。"
+
+#: ../../tutorial/teeu/teeu_regression.md:321
+msgid ""
+"The code assumes that Bob's communication address is 192.168.0.20:20001, "
+"please modify it according to the actual situation"
+msgstr "代码中假设 Bob 通信地址为 192.168.0.20:20001,请您根据实际情况修改"
+
+#: ../../tutorial/teeu/teeu_regression.md:324
+msgid ""
+"`ca_cert` is the CA certificate address of AuthManager, if AuthManager "
+"does not start tls, no configuration is required."
+msgstr "`ca_cert`为 AuthManager 的 CA 证书地址,如果 AuthManager 未启动 TLS,则不需要配置。"
+
+#: ../../tutorial/teeu/teeu_regression.md:326
+msgid ""
+"Suppose we save the code as `demo.py`, and then execute `python demo.py` "
+"on Bob's machine."
+msgstr "假设我们把代码保存为 `demo.py`,然后在 Bob 的机器上执行 `python demo.py`。"
+
+#: ../../tutorial/teeu/teeu_regression.md:444
+msgid "Carol runs code (executed in TEE)"
+msgstr "Carol 运行代码(在TEE中执行)"
+
+#: ../../tutorial/teeu/teeu_regression.md:446
+msgid "Run the SecretFlow TEE image firstly."
+msgstr "首先启动 SecretFlow TEE 镜像。"
+
+#: ../../tutorial/teeu/teeu_regression.md:452
+msgid ""
+"Similarly, add the SecretFlow initialization code in front of the code to"
+" get the following code. Unlike the previous one, Carol's code needs to "
+"run in TEE, so some extra steps are required. First, you need to modify "
+"the configuration items in the code."
+msgstr ""
+"类似地,在代码前面加上 SecretFlow "
+"初始化相关代码,得到下列的代码。但是与前面有所区别,因为Carol是在TEE中运行,因此需要一些额外的步骤。首先,你需要修改代码中的配置项。"
+
+#: ../../tutorial/teeu/teeu_regression.md:455
+msgid ""
+"In the code, it is assumed that Carol's communication address is "
+"192.168.0.30:20001, please modify it according to the actual situation"
+msgstr "代码中假设 Carol 通信地址为 192.168.0.30:20001,请您根据实际情况修改"
+
+#: ../../tutorial/teeu/teeu_regression.md:457
+msgid "`host` is the listen address of AuthManager"
+msgstr "`host`为 AuthManager 的服务监听地址"
+
+#: ../../tutorial/teeu/teeu_regression.md:458
+msgid ""
+"`ca_cert` is the CA certificate path of AuthManager, if AuthManager does "
+"not enable TLS, no configuration is required."
+msgstr "`ca_cert` 为 AuthManager 的 CA 证书地址,如果 AuthManager 未启动 TLS,则不需要配置。"
+
+#: ../../tutorial/teeu/teeu_regression.md:460
+msgid ""
+"After modification, please save the file to "
+"`/root/occlum_instance/image/root/demo.py`."
+msgstr "修改完毕后,请把该文件保存至 `/root/occlum_instance/image/root/demo.py`。"
+
+#: ../../tutorial/teeu/teeu_regression.md:582
+msgid "Then we run the script with the following command."
+msgstr "然后我们通过下列命令运行脚本。"
+
+#: ../../tutorial/teeu/teeu_regression.md:592
+msgid "1.2 Non-simulation mode"
+msgstr "1.2 非仿真模式"
+
+#: ../../tutorial/teeu/teeu_regression.md:594
+msgid ""
+"When it is necessary to use the real TEE environment to protect the "
+"confidentiality and integrity of the data in the computing process, the "
+"user needs to enable the non-simulation mode, and at this time, the "
+"security mechanisms provided by the TEE such as remote attestation and "
+"memory encryption will be enabled. To enable the non-simulation mode, the"
+" user needs to have the TEE hardware supported by the current SecretFlow "
+"TEEU. Currently, SecretFlow only supports Intel SGX2.0, and more TEE "
+"types will be supported in the future."
+msgstr ""
+"当需要使用真实的 TEE 环境保护计算过程中数据的机密性和完整性时,用户需要开启非仿真模式,此时远程认证以及内存加密等由 TEE "
+"提供的安全机制将被开启。开启非仿真模式,用户需要持有当前 SecretFlow TEEU 支持的 TEE 硬件,当前 SecretFlow 仅支持"
+" Intel SGX2.0,未来会支持更多 TEE 种类。"
+
+#: ../../tutorial/teeu/teeu_regression.md:596
+msgid ""
+"Please check [Non-simulation](../teeu.md#summary) for running in non-"
+"simulation mode."
+msgstr "请查阅 [Non-simulation](../teeu.md#summary) 了解如何在非仿真模式下运行。"
+
diff --git a/docs/tutorial/index.rst b/docs/tutorial/index.rst
index 93f6a6245..31c327fec 100644
--- a/docs/tutorial/index.rst
+++ b/docs/tutorial/index.rst
@@ -49,6 +49,8 @@ We hope you enjoy these toturials from SecretFlow developers.
 
    teeu
    teeu_xgboost
+   teeu/teeu_onehotencoder
+   teeu/teeu_regression
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/tutorial/teeu/teeu_linerregression.md b/docs/tutorial/teeu/teeu_linerregression.md
new file mode 100644
index 000000000..4cef9ef09
--- /dev/null
+++ b/docs/tutorial/teeu/teeu_linerregression.md
@@ -0,0 +1,596 @@
+# TEEU Example: Linear Regression
+
+**Tips**
+
+Before reading this article, we strongly recommend reading the [TEEU Getting Started Guide](../teeu.md) first.
+
+---
+
+TEEU (`TEE` processing `U`nit) is a TEE device in SecretFlow. With TEEU, users can conveniently move data into a TEE for computation while protecting the integrity and confidentiality of that data.
+
+This article demonstrates how to train a linear regression model in TEEU; the example also trains a logistic regression model on the same data.
+
+## 1.1 Simulation mode
+
+For users who do not have access to a real TEE environment, SecretFlow offers a TEEU simulation mode that lets you try out TEEU functions on an ordinary machine.
+Code is written and run in almost exactly the same way in simulation mode as in non-simulation mode, so we recommend using simulation mode for quick experimental verification first.
+
+Note that because no real TEE environment is used, simulation mode lacks the security features that depend on a TEE, such as remote attestation and memory encryption isolation, and cannot protect data integrity or confidentiality. Keep in mind that simulation mode is not secure and must not be used in production.
+
+### Pre-work
+
+#### Understand SecretFlow deployment in multiple Ray cluster mode
+
+For security reasons, the Ray instance running in the TEE is an independent cluster, so SecretFlow currently supports TEEU only in multiple Ray cluster mode. Read the [SecretFlow Deployment Documentation](../../getting_started/deployment.md#production) in advance to understand how multiple Ray clusters are deployed.
+
+#### Prepare a machine to run the simulated TEEU
+
+At present, SecretFlow TEEU is provided only as Docker images. Due to limitations of the underlying technology, TEE programs currently need a large amount of memory to run successfully. Make sure the Docker container has at least 30 GB of available memory; more may be required depending on the size of the data processed in TEEU.
+
+#### Deploy AuthManager
+
+AuthManager is the module responsible for authorization management.
+
+1. Download the Docker image
+```shell
+docker pull secretflow/authmanager-release-sim-ubuntu:latest
+```
+
+2. Start a container from the Docker image
+```shell
+docker run -it --net host secretflow/authmanager-release-sim-ubuntu:latest
+```
+
+3. (Optional) Configure TLS
+
+AuthManager enables TLS by default. If you only use it for local simulation, you can turn off TLS by setting `enable_tls` to `false` in `/root/occlum_release/config.yaml`.
+
+4. Start the service
+
+```shell
+cd occlum_release
+occlum run /bin/auth-manager --config_path /host/config.yaml
+```
+The default port is 8835. If that port is already in use, change `port` in `config.yaml`.
+
+### Example: Linear Regression in TEEU
+
+Next, we demonstrate how to combine data from multiple parties in TEEU and then train models on the combined data.
+
+#### Example code
+
+Suppose Alice and Bob share the same feature space but their sample spaces do not overlap, that is, each of them holds the features of a different set of users. Alice and Bob want to use TEEU to securely combine their samples and train a model with linear regression (the example also trains a logistic regression model). Carol acts as the TEEU provider.
+
+The core code for this scenario is as follows.
+
+```python
+import secretflow as sf
+import numpy as np
+
+def gen_data():
+    """
+    Generate random binary classification data for simulation.
+    """
+    from sklearn.datasets import make_classification
+
+    num_classes = 2
+    x, y = make_classification(n_samples=1000, n_informative=5,
+                               n_classes=num_classes)
+    return x, y
+
+
+def liner_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a linear regression model on them.
+    """
+    from sklearn.linear_model import LinearRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    # Split features and labels in one call so that their rows stay aligned.
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LinearRegression()
+    model.fit(x_train, y_train)
+    y_pred = model.predict(x_test)
+    # Threshold the continuous predictions at 0.5 to obtain 0/1 labels.
+    y_pred = np.where(y_pred > 0.5, 1, 0)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+
+def logistic_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a logistic regression model on them.
+    """
+    from sklearn.linear_model import LogisticRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LogisticRegression()
+    model.fit(x_train, y_train)
+    # predict() already returns 0/1 class labels, so no thresholding is needed.
+    y_pred = model.predict(x_test)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+alice = sf.PYU('alice')
+bob = sf.PYU('bob')
+
+# Alice generates its samples.
+x_a, y_a = alice(gen_data, num_returns=2)()
+# Bob generates its samples.
+x_b, y_b = bob(gen_data, num_returns=2)()
+
+from secretflow.device import TEEU
+
+# mr_enclave can be left empty in simulation mode.
+teeu = TEEU('carol', mr_enclave='')
+
+# Transfer data to TEEU; only the functions in allow_funcs may process it.
+x_a_teeu = x_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_a_teeu = y_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+x_b_teeu = x_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_b_teeu = y_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+# Run logistic_regression.
+res = teeu(logistic_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+logistic_acc = sf.reveal(res)
+print(f'Logistic_Regression_accuracy: {logistic_acc}')
+
+# Run liner_regression.
+res = teeu(liner_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+liner_acc = sf.reveal(res)
+print(f'Linear_Regression_accuracy: {liner_acc}')
+
+```
+
+#### Alice runs the code
+
+1. Start the Ray head node
+
+The following command assumes that Alice's Ray head node listens at 192.168.0.10:10000; modify it to match your environment.
+
+```bash
+ray start --head --node-ip-address="192.168.0.10" --port="10000" --include-dashboard=False --disable-usage-stats
+```
+
+2. Generate a public-private key pair
+
+Because Alice's data is encrypted before it is sent to TEEU, Alice needs a public-private key pair. The following commands generate a 3072-bit RSA key pair and store it in the current directory in PEM format as `private_key.pem` and `public_key.pem`.
+
+```bash
+openssl genrsa -3 -out private_key.pem 3072
+openssl rsa -in private_key.pem -pubout -out public_key.pem
+```
+
+3. Execute the code
+
+Prepend the SecretFlow initialization code to the core code above to obtain the script below.
+Before running it, modify the following configuration items:
+- The code assumes that Alice's communication address is 192.168.0.10:20001; modify it to match your environment.
+- Fill in the correct `auth_manager_config`:
+  - `host` is the listening address of the AuthManager service.
+  - `ca_cert` is the path to the AuthManager CA certificate. If AuthManager runs without TLS, it does not need to be set.
+
+Save the code as `demo.py`, and then run `python demo.py` on Alice's machine.
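Optionally, before launching the full three-party script that follows, you can sanity-check the training logic on a single machine without SecretFlow or TEEU. The sketch below is an illustration only and assumes just NumPy and scikit-learn are installed. `train_and_score` is a hypothetical helper, not part of SecretFlow; it mirrors the concatenate, split, fit, and score steps of `liner_regression` and `logistic_regression` above and fixes `random_state` so runs are reproducible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(model, x_slices, y_slices, threshold=None):
    """Hypothetical helper: concatenate the parties' slices, split, fit and score.

    If `threshold` is given, continuous predictions (e.g. from LinearRegression)
    are turned into 0/1 labels before accuracy is computed.
    """
    x = np.concatenate(x_slices)
    y = np.concatenate(y_slices)
    # A single call keeps the rows of x and y aligned across the split.
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    if threshold is not None:
        y_pred = np.where(y_pred > threshold, 1, 0)
    return accuracy_score(y_test, y_pred)


# Simulate Alice's and Bob's non-overlapping samples locally.
x_a, y_a = make_classification(n_samples=1000, n_informative=5, random_state=1)
x_b, y_b = make_classification(n_samples=1000, n_informative=5, random_state=2)

linear_acc = train_and_score(LinearRegression(), [x_a, x_b], [y_a, y_b], threshold=0.5)
logistic_acc = train_and_score(LogisticRegression(), [x_a, x_b], [y_a, y_b])
print(f'Linear regression accuracy: {linear_acc:.3f}')
print(f'Logistic regression accuracy: {logistic_acc:.3f}')
```

As in the tutorial's `gen_data`, the two parties' samples here come from independently generated datasets, so the accuracy is only a smoke test of the pipeline, not a benchmark.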
+
+```python
+import secretflow as sf
+
+cluster_config = {
+    'parties': {
+        'alice': {'address': '192.168.0.10:20001', 'listen_address': '0.0.0.0:20001'},
+        'bob': {'address': '192.168.0.20:20001', 'listen_address': '0.0.0.0:20001'},
+        'carol': {'address': '192.168.0.30:20001', 'listen_address': '0.0.0.0:20001'},
+    },
+    'self_party': 'alice',
+}
+
+party_key_pair = {
+    'alice': {'private_key': './private_key.pem', 'public_key': './public_key.pem'}
+}
+
+auth_manager_config = {
+    'host': 'host of AuthManager',
+    'ca_cert': 'path_of_AuthManager_ca_certificate',
+    'mr_enclave': ''
+}
+
+# Connect to Alice's Ray cluster.
+sf.init(
+    address='192.168.0.10:10000',
+    cluster_config=cluster_config,
+    party_key_pair=party_key_pair,
+    auth_manager_config=auth_manager_config,
+    tee_simulation=True,
+)
+
+import numpy as np
+
+def gen_data():
+    """
+    Generate random binary classification data for simulation.
+    """
+    from sklearn.datasets import make_classification
+
+    num_classes = 2
+    x, y = make_classification(n_samples=1000, n_informative=5,
+                               n_classes=num_classes)
+    return x, y
+
+
+def liner_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a linear regression model on them.
+    """
+    from sklearn.linear_model import LinearRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    # Split features and labels in one call so that their rows stay aligned.
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LinearRegression()
+    model.fit(x_train, y_train)
+    y_pred = model.predict(x_test)
+    # Threshold the continuous predictions at 0.5 to obtain 0/1 labels.
+    y_pred = np.where(y_pred > 0.5, 1, 0)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+
+def logistic_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a logistic regression model on them.
+    """
+    from sklearn.linear_model import LogisticRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LogisticRegression()
+    model.fit(x_train, y_train)
+    # predict() already returns 0/1 class labels, so no thresholding is needed.
+    y_pred = model.predict(x_test)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+alice = sf.PYU('alice')
+bob = sf.PYU('bob')
+
+# Alice generates its samples.
+x_a, y_a = alice(gen_data, num_returns=2)()
+# Bob generates its samples.
+x_b, y_b = bob(gen_data, num_returns=2)()
+
+from secretflow.device import TEEU
+
+# mr_enclave can be left empty in simulation mode.
+teeu = TEEU('carol', mr_enclave='')
+
+# Transfer data to TEEU; only the functions in allow_funcs may process it.
+x_a_teeu = x_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_a_teeu = y_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+x_b_teeu = x_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_b_teeu = y_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+# Run logistic_regression.
+res = teeu(logistic_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+logistic_acc = sf.reveal(res)
+print(f'Logistic_Regression_accuracy: {logistic_acc}')
+
+# Run liner_regression.
+res = teeu(liner_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+liner_acc = sf.reveal(res)
+print(f'Linear_Regression_accuracy: {liner_acc}')
+
+```
+
+#### Bob runs the code
+
+1. Start the Ray head node
+
+The following command assumes that Bob's Ray head node listens at 192.168.0.20:10000; modify it to match your environment.
+```bash
+ray start --head --node-ip-address="192.168.0.20" --port="10000" --include-dashboard=False --disable-usage-stats
+```
+
+2. Generate a public-private key pair
+
+Because Bob's data is encrypted before it is sent to TEEU, Bob needs a public-private key pair. The following commands generate a 3072-bit RSA key pair and store it in the current directory in PEM format as `private_key.pem` and `public_key.pem`.
+```bash
+openssl genrsa -3 -out private_key.pem 3072
+openssl rsa -in private_key.pem -pubout -out public_key.pem
+```
+
+3. Run the code
+
+As with Alice, prepend the SecretFlow initialization code to the core code to obtain the script below.
+Before running it, modify the following configuration items:
+
+- The code assumes that Bob's communication address is 192.168.0.20:20001; modify it to match your environment.
+- Fill in the correct `auth_manager_config`:
+  - `host` is the listening address of the AuthManager service.
+  - `ca_cert` is the path to the AuthManager CA certificate. If AuthManager runs without TLS, it does not need to be set.
+
+Save the code as `demo.py`, and then run `python demo.py` on Bob's machine.
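Alice's, Bob's, and Carol's scripts all use the same `cluster_config` and differ only in `self_party` (plus the party-specific `sf.init` arguments). To keep the three copies from drifting apart, one option is to build the dictionary from a single table of addresses. The sketch below assumes this approach; `make_cluster_config`, `PARTY_HOSTS`, and `COMM_PORT` are hypothetical names, not SecretFlow APIs, and the helper only reproduces the dictionary layout shown in this tutorial with its example addresses and port 20001.

```python
# Hypothetical helper (not a SecretFlow API): builds the cluster_config
# dictionary used by Alice's, Bob's and Carol's scripts in this tutorial.
PARTY_HOSTS = {
    'alice': '192.168.0.10',
    'bob': '192.168.0.20',
    'carol': '192.168.0.30',
}
COMM_PORT = 20001  # Example port from this tutorial; replace as needed.


def make_cluster_config(self_party: str, hosts: dict = PARTY_HOSTS,
                        port: int = COMM_PORT) -> dict:
    """Return a cluster_config in which only `self_party` differs per party."""
    if self_party not in hosts:
        raise ValueError(f'unknown party: {self_party!r}')
    parties = {
        name: {
            # Address the other parties use to reach this party.
            'address': f'{host}:{port}',
            # Listen on all local interfaces.
            'listen_address': f'0.0.0.0:{port}',
        }
        for name, host in hosts.items()
    }
    return {'parties': parties, 'self_party': self_party}


bob_config = make_cluster_config('bob')
print(bob_config['self_party'], bob_config['parties']['bob']['address'])
```

In Bob's script you would then write `cluster_config = make_cluster_config('bob')` instead of spelling out the dictionary by hand.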
+
+```python
+import secretflow as sf
+
+cluster_config = {
+    'parties': {
+        'alice': {'address': '192.168.0.10:20001', 'listen_address': '0.0.0.0:20001'},
+        'bob': {'address': '192.168.0.20:20001', 'listen_address': '0.0.0.0:20001'},
+        'carol': {'address': '192.168.0.30:20001', 'listen_address': '0.0.0.0:20001'},
+    },
+    'self_party': 'bob',
+}
+
+party_key_pair = {
+    'bob': {'private_key': './private_key.pem', 'public_key': './public_key.pem'}
+}
+
+auth_manager_config = {
+    'host': 'host of AuthManager',
+    'ca_cert': 'path_of_AuthManager_ca_certificate',
+    'mr_enclave': ''
+}
+
+# Connect to Bob's Ray cluster.
+sf.init(
+    address='192.168.0.20:10000',
+    cluster_config=cluster_config,
+    party_key_pair=party_key_pair,
+    auth_manager_config=auth_manager_config,
+    tee_simulation=True,
+)
+
+import numpy as np
+
+def gen_data():
+    """
+    Generate random binary classification data for simulation.
+    """
+    from sklearn.datasets import make_classification
+
+    num_classes = 2
+    x, y = make_classification(n_samples=1000, n_informative=5,
+                               n_classes=num_classes)
+    return x, y
+
+
+def liner_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a linear regression model on them.
+    """
+    from sklearn.linear_model import LinearRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    # Split features and labels in one call so that their rows stay aligned.
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LinearRegression()
+    model.fit(x_train, y_train)
+    y_pred = model.predict(x_test)
+    # Threshold the continuous predictions at 0.5 to obtain 0/1 labels.
+    y_pred = np.where(y_pred > 0.5, 1, 0)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+
+def logistic_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a logistic regression model on them.
+    """
+    from sklearn.linear_model import LogisticRegression
+    from sklearn.model_selection import train_test_split
+    from sklearn.metrics import accuracy_score
+
+    x = np.concatenate(x_slices)
+    y = np.concatenate(y_slices)
+    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
+    model = LogisticRegression()
+    model.fit(x_train, y_train)
+    # predict() already returns 0/1 class labels, so no thresholding is needed.
+    y_pred = model.predict(x_test)
+    result = accuracy_score(y_test, y_pred)
+    return result
+
+alice = sf.PYU('alice')
+bob = sf.PYU('bob')
+
+# Alice generates its samples.
+x_a, y_a = alice(gen_data, num_returns=2)()
+# Bob generates its samples.
+x_b, y_b = bob(gen_data, num_returns=2)()
+
+from secretflow.device import TEEU
+
+# mr_enclave can be left empty in simulation mode.
+teeu = TEEU('carol', mr_enclave='')
+
+# Transfer data to TEEU; only the functions in allow_funcs may process it.
+x_a_teeu = x_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_a_teeu = y_a.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+x_b_teeu = x_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+y_b_teeu = y_b.to(teeu, allow_funcs=[logistic_regression, liner_regression])
+
+# Run logistic_regression.
+res = teeu(logistic_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+logistic_acc = sf.reveal(res)
+print(f'Logistic_Regression_accuracy: {logistic_acc}')
+
+# Run liner_regression.
+res = teeu(liner_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
+liner_acc = sf.reveal(res)
+print(f'Linear_Regression_accuracy: {liner_acc}')
+
+```
+
+#### Carol runs the code (executed in TEE)
+
+First, start a container from the SecretFlow TEEU image.
+
+```bash
+docker run -it --network host secretflow/secretflow-teeu:latest
+```
+
+Similarly, prepend the SecretFlow initialization code to the core code to obtain the script below. Unlike Alice's and Bob's scripts, Carol's code runs inside the TEE, so a few extra steps are required.
+First, modify the following configuration items:
+
+1. The code assumes that Carol's communication address is 192.168.0.30:20001; modify it to match your environment.
+2. Fill in the correct `auth_manager_config`:
+   - `host` is the listening address of AuthManager.
+   - `ca_cert` is the path to the AuthManager CA certificate. If AuthManager runs without TLS, it does not need to be set.
+
+After making these changes, save the file to `/root/occlum_instance/image/root/demo.py`.
+
+```python
+
+# Generate a self-signed TLS certificate and key first.
+from tls_cert import generate_self_signed_tls_certs
+
+generate_self_signed_tls_certs('/root/server.crt', '/root/server.key')
+
+
+import secretflow as sf
+
+cluster_config = {
+    'parties': {
+        'alice': {'address': '192.168.0.10:20001', 'listen_address': '0.0.0.0:20001'},
+        'bob': {'address': '192.168.0.20:20001', 'listen_address': '0.0.0.0:20001'},
+        'carol': {'address': '192.168.0.30:20001', 'listen_address': '0.0.0.0:20001'},
+    },
+    'self_party': 'carol',
+}
+
+auth_manager_config = {
+    'host': 'host of AuthManager',
+    'ca_cert': 'path_of_AuthManager_ca_certificate',
+    'mr_enclave': ''
+}
+
+# Start a local Ray cluster inside the TEE.
+sf.init(
+    address='local',
+    cluster_config=cluster_config,
+    auth_manager_config=auth_manager_config,
+    tee_simulation=True,
+    _temp_dir="/host/tmp/ray",
+    _plasma_directory="/tmp",
+)
+
+import numpy as np
+
+def gen_data():
+    """
+    Generate random binary classification data for simulation.
+    """
+    from sklearn.datasets import make_classification
+
+    num_classes = 2
+    x, y = make_classification(n_samples=1000, n_informative=5,
+                               n_classes=num_classes)
+    return x, y
+
+
+def liner_regression(x_slices, y_slices):
+    """
+    Concatenate the input x and y slices, then train a linear regression model on them.
    """
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    x = np.concatenate(x_slices)
    y = np.concatenate(y_slices)
    x_train, x_test = train_test_split(x, random_state=0)
    y_train, y_test = train_test_split(y, random_state=0)
    model = LinearRegression()
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    y_pred = np.where(y_pred > 0.5, 1, 0)
    result = accuracy_score(y_test, y_pred)
    return result


def logistic_regression(x_slices, y_slices):
    """
    Cancat the input x and y, then train them with Logistic_Regression.
    """
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    x = np.concatenate(x_slices)
    y = np.concatenate(y_slices)
    x_train, x_test = train_test_split(x, random_state=0)
    y_train, y_test = train_test_split(y, random_state=0)
    model = LogisticRegression()
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    y_pred = np.where(y_pred > 0.5, 1, 0)
    result = accuracy_score(y_test, y_pred)
    return result

alice = sf.PYU('alice')
bob = sf.PYU('bob')

# Alice generates its samples.
x_a, y_a = alice(gen_data, num_returns=2)()
# Bob generates its samples.
x_b, y_b = bob(gen_data, num_returns=2)()

from secretflow.device import TEEU

# mrenclave can be omitted in simulation mode.
teeu = TEEU('carol', mr_enclave='')

# Transfer data to teeu.
x_a_teeu = x_a.to(teeu, allow_funcs=[logistic_regression,liner_regression])
y_a_teeu = y_a.to(teeu, allow_funcs=[logistic_regression,liner_regression])

x_b_teeu = x_b.to(teeu, allow_funcs=[logistic_regression,liner_regression])
y_b_teeu = y_b.to(teeu, allow_funcs=[logistic_regression,liner_regression])

# Run logistic_regression.
res = teeu(logistic_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
logistic_acc = sf.reveal(res)
print(f'Logistic_Regression_accuracy: {logistic_acc}')

# Run liner_regression.
res = teeu(liner_regression)([x_a_teeu, x_b_teeu], [y_a_teeu, y_b_teeu])
liner_acc = sf.reveal(res)
print(f'Liner_Regression_accuracy: {liner_acc}')

```

Then build the Occlum instance and run the script with the following commands. The `openssl` commands generate a 3072-bit RSA key with public exponent 3 (the form SGX requires for enclave signing), which `occlum build` uses to sign the enclave; `--sgx-mode sim` builds it for simulation mode.

```bash
cd /root/occlum_instance
openssl genrsa -3 -out private_key.pem 3072
openssl rsa -in private_key.pem -pubout -out public_key.pem
occlum build --sgx-mode sim --sign-key private_key.pem
occlum run /bin/python3 /root/demo.py
```

## 1.2 Non-simulation mode

To protect the confidentiality and integrity of data during computation with a real TEE, enable the non-simulation mode. In this mode, the security mechanisms provided by the TEE, such as remote attestation and memory encryption, are enabled. Non-simulation mode requires TEE hardware supported by SecretFlow TEEU. Currently SecretFlow supports only Intel SGX 2.0; more TEE types will be supported in the future.

See [Non-simulation](../teeu.md#summary) for how to run in non-simulation mode.
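Whichever mode you use, it can help to check the training logic locally before running it in TEEU. The sketch below reproduces the logic of `liner_regression` and `logistic_regression` with plain scikit-learn and no SecretFlow. The helpers `make_party_data` and `train_and_score` and the fixed seeds are hypothetical, introduced only for this check. The sketch also shows why the two separate `train_test_split(..., random_state=0)` calls in the functions above keep features and labels aligned: with the same `random_state` and the same number of rows, they select the same rows as a single joint call.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_party_data(seed):
    """Hypothetical stand-in for gen_data, with a fixed seed for reproducibility."""
    return make_classification(n_samples=1000, n_informative=5, n_classes=2,
                               random_state=seed)


def train_and_score(model, x_slices, y_slices, threshold=None):
    """Concatenate the parties' slices, train `model`, return test accuracy."""
    x = np.concatenate(x_slices)
    y = np.concatenate(y_slices)
    # A single call splits x and y with the same shuffled indices.
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    if threshold is not None:
        # LinearRegression predicts real values; map them to labels 0/1.
        y_pred = np.where(y_pred > threshold, 1, 0)
    return accuracy_score(y_test, y_pred)


# Two "parties" generate their data independently, as Alice and Bob do.
x_a, y_a = make_party_data(seed=1)
x_b, y_b = make_party_data(seed=2)

logistic_acc = train_and_score(LogisticRegression(), [x_a, x_b], [y_a, y_b])
linear_acc = train_and_score(LinearRegression(), [x_a, x_b], [y_a, y_b],
                             threshold=0.5)
print(f'logistic accuracy: {logistic_acc:.3f}')
print(f'linear accuracy:   {linear_acc:.3f}')

# Separate calls with the same random_state and row count pick the same
# indices, so they match the single joint call.
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
xs_train, xs_test = train_test_split(x, random_state=0)
ys_train, ys_test = train_test_split(y, random_state=0)
xj_train, xj_test, yj_train, yj_test = train_test_split(x, y, random_state=0)
splits_aligned = (np.array_equal(xs_train, xj_train)
                  and np.array_equal(ys_test, yj_test))
print(f'separate and joint splits match: {splits_aligned}')
```

Passing `x` and `y` to one `train_test_split` call is the more common idiom, since alignment then does not depend on two calls sharing a seed.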