
numpy fails while trying to run in python #267

Closed
tarikova opened this issue Oct 24, 2018 · 25 comments

@tarikova

tarikova commented Oct 24, 2018

I am working on a project that entails running the Python scientific computing stack inside Intel SGX enclaves. I have found Graphene very helpful for running Python programs. While I can run all the pure-Python libraries using the manifest that comes with /LibOS/shim/test/apps/python, when I try to run slightly more complex libraries, i.e., ones with compiled C library (.so) dependencies, the program fails.

I am trying to run this simple program:

import numpy as np

arr = np.linspace(0, 100)

print(arr)

I tried two different solutions:

  1. Changed the manifest file to mount the path to the Python libraries, and added the full library folder dist-packages (which includes all the Python libraries, so that dependencies are not an issue) to allowed_files.

  2. Because I realized that I might be missing some additional dependencies or .so library files, I tried a program called pyinstaller (https://www.pyinstaller.org/). It bundles a single Python program into an executable together with its .so dependencies, so that one can run it on any computer (with a similar OS). The nice thing is that it shows me all the .so files I need to run the program. However, after running this I get the same error as with method 1.

Both of these methods end with the following error:

Cannot attach to any TCS!
Memory Mapping Exception in Untrusted Code (RIP = 55ae916d52cd)

I tried reading the underlying thread_map code in Graphene (Pal/src/host/Linux-SGX/sgx_thread.c), but could not figure out a fix for this.

I would really appreciate it if you could help me find a solution or point me toward some approaches I might try. I think that if we can run the scientific computing stack (numpy, pandas, scikit-learn) inside Graphene, it would be of tremendous benefit to the scientific community and to anyone trying to run secure computation on untrusted servers or needing guarantees about their data's security.

Manifest:

#!$(PAL)

loader.preload = file:$(SHIMPATH)
loader.exec = file:/usr/bin/python
loader.execname = python
loader.env.LD_LIBRARY_PATH = /graphene:/graphene/resolv:/host:/usr/lib:/usr/lib/x86_64-linux-gnu
loader.env.PATH = /usr/bin:/bin
loader.env.USERNAME =
loader.env.HOME =
loader.env.PWD =
loader.debug_type = none

fs.mount.lib1.type = chroot
fs.mount.lib1.path = /graphene
fs.mount.lib1.uri = file:$(LIBCDIR)

fs.mount.lib2.type = chroot
fs.mount.lib2.path = /host
fs.mount.lib2.uri = file:/lib/x86_64-linux-gnu

fs.mount.bin.type = chroot
fs.mount.bin.path = /bin
fs.mount.bin.uri = file:/bin

fs.mount.usr.type = chroot
fs.mount.usr.path = /usr
fs.mount.usr.uri = file:/usr

fs.mount.etc.type = chroot
fs.mount.etc.path = /etc
fs.mount.etc.uri = file:/etc

fs.mount.home.type = chroot
fs.mount.home.path = /home
fs.mount.home.uri = file:/home

sys.stack.size = 1M
sys.brk.size = 4M
glibc.heap_size = 16M

sgx.trusted_files.ld = file:$(LIBCDIR)/ld-linux-x86-64.so.2
sgx.trusted_files.libc = file:$(LIBCDIR)/libc.so.6
sgx.trusted_files.libdl = file:$(LIBCDIR)/libdl.so.2
sgx.trusted_files.libm = file:$(LIBCDIR)/libm.so.6
sgx.trusted_files.libpthread = file:$(LIBCDIR)/libpthread.so.0
sgx.trusted_files.libutil = file:$(LIBCDIR)/libutil.so.1
sgx.trusted_files.libz = file:/lib/x86_64-linux-gnu/libz.so.1
sgx.trusted_files.libnss1 = file:/lib/x86_64-linux-gnu/libnss_compat.so.2
sgx.trusted_files.libnss2 = file:/lib/x86_64-linux-gnu/libnss_files.so.2
sgx.trusted_files.libnss3 = file:$(LIBCDIR)/libnss_dns.so.2
sgx.trusted_files.libssl = file:/lib/x86_64-linux-gnu/libssl.so.1.0.0
sgx.trusted_files.libcrypto = file:/lib/x86_64-linux-gnu/libcrypto.so.1.0.0
sgx.trusted_files.libresolv = file:$(LIBCDIR)/libresolv.so.2
sgx.trusted_files.hosts = file:hosts
sgx.trusted_files.resolv = file:resolv.conf
sgx.trusted_files.gai = file:gai.conf

sgx.allowed_files.pyhome = file:/usr/lib/python2.7
sgx.allowed_files.pyhome2 = file:scripts
sgx.allowed_files.pyhome3 = file:/home/$(USER)/.local/lib/python2.7/site-packages
@dimakuv

dimakuv commented Oct 24, 2018

Cannot attach to any TCS!
Memory Mapping Exception in Untrusted Code (RIP = 55ae916d52cd)

These are two issues, probably unrelated to the C libraries needed by Graphene-SGX.

  1. The first line (Cannot attach to any TCS!) implies that Graphene-SGX didn't allocate enough threads at startup. Recall that in the current SGX environment, all enclave threads must be pre-allocated.

By default, Graphene-SGX allocates four enclave threads. There is a special knob for this in the SGX manifest: sgx.thread_num. See the Wiki page. Try increasing the number of enclave threads to e.g. 8 in your manifest file. Also experiment with bigger enclave sizes:

...
sgx.thread_num = 8
sgx.enclave_size = 1024M

  2. The second issue (Memory Mapping Exception in Untrusted Code) may be resolved once you resolve the first one, but it can also be completely unrelated. If you still experience it after resolving the first issue, try to debug it (GDB=1 SGX=1 ...) and feel free to report your findings here.

I would really appreciate it if you could help me find a solution or point me toward some approaches I might try. I think that if we can run the scientific computing stack (numpy, pandas, scikit-learn) inside Graphene, it would be of tremendous benefit to the scientific community and to anyone trying to run secure computation on untrusted servers or needing guarantees about their data's security.

Agreed. I would also be interested in trying more Python workloads. Feel free to share your repo where you experiment with Python+Numpy so we can provide more direct feedback.

P.S. Graphene-SGX prints a clear warning/error message if it cannot find a shared library, so you will usually notice a missing library in Graphene-SGX's output. For debugging, it also helps to enable more verbose output with loader.debug_type = inline. And use strace -f ... when in doubt.

@thomasknauth
Contributor

Possibly, Python tries to create a new process under the hood to find the location of the shared library to load (see the code in cpython/Lib/ctypes/util.py). To get more insight, you may want to compile cpython from source with debug info and run it under gdb (Graphene has gdb support). That should bring you closer to understanding why it fails. Monitoring the system calls, as Dmitrii suggested, may also help.

I have no idea how pyinstaller works, but if it just packs all the dependencies into a single directory, it won't help in this scenario: the Python runtime probably still fork()s to discover the shared libraries to load.
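
To see that lookup path in action outside Graphene, here is a minimal sketch; on Linux, find_library() may shell out to ldconfig/gcc in a child process:

# Sketch: ctypes resolves a library name by spawning helper processes
# (see cpython/Lib/ctypes/util.py), which requires fork/exec support.
from ctypes.util import find_library

print(find_library("m"))  # e.g. 'libm.so.6'

If this call (or import numpy itself) dies inside Graphene, the hidden fork/exec is a likely suspect.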

@donporter
Contributor

I agree with these comments. You might try to figure out how many threads this application requires on Linux and increase the sgx.thread_num parameter until the TCS error goes away.
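
One rough way to measure this on stock Linux is to count the kernel tasks of the Python process while your workload runs; a quick sketch, not Graphene-specific:

# Sketch: list /proc/<pid>/task to count the threads the interpreter
# has created so far; leave headroom for Graphene's internal IPC/async
# threads when setting sgx.thread_num.
import os

print("threads in use:", len(os.listdir("/proc/%d/task" % os.getpid())))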

Let us know whether you are still having issues after that.

@tarikova
Author

Thanks so much everyone! I increased the resources as suggested by making the following changes:

sys.stack.size = 1M
sys.brk.size = 4M
glibc.heap_size = 16M

sgx.enclave_size = 1024M
sgx.thread_num = 16

In addition, I had to add some other .so library files to trusted files. Then it worked without any issues!

I was able to train a random forest using the sklearn, pandas, and numpy libraries on a cancer-cell image dataset :D

Given that we have Professor Porter on the thread: may I ask whether there is any known easy way to implement sealing and remote attestation (well, at least sealing) inside a Graphene enclave? I tried something very similar to this library: https://github.com/adombeck/python-sgx by running some C code inside Graphene and passing the result to Python. But no luck--the program fails with no output. I should probably start another issue to discuss that topic. Thanks again for the help.

@donporter
Contributor

@chiache ?

I do not believe we support sealing, but there is some support for remote attestation.

@thomasknauth
Contributor

Maybe https://github.com/cloud-security-research/sgx-ra-tls/blob/master/README.md could help? There are examples of how to use the RA-TLS library with Graphene. Doing it from within Python should be doable with minimal effort. In fact, the repo demonstrates how to do it from within Python, but for SGX-LKL instead of Graphene.

@tarikova
Author

Oh wow! This is very helpful @thomasknauth 👍 Thanks so much. I will take a look.

Sealing would be incredibly helpful, as it is one way to persist, say, keys or certificates on a server that we do not trust by design. I will keep trying the SWIG-based approach I am rooting for: compile a library file that can call the underlying sgx_seal_data function from sgx_tseal.h, and then call it from Python code running inside the enclave.
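
For illustration, this is the shape of what I have in mind, as a minimal ctypes sketch. The library name libseal.so and the seal_data() signature below are hypothetical placeholders for a self-built wrapper around sgx_seal_data, not part of any SDK:

# Hypothetical sketch: call a self-built wrapper around sgx_seal_data()
# from Python via ctypes. "libseal.so" and seal_data()'s signature are
# assumptions for illustration only.
import ctypes

lib = ctypes.CDLL("./libseal.so")  # hypothetical wrapper library
lib.seal_data.argtypes = [ctypes.c_char_p, ctypes.c_uint32,
                          ctypes.c_char_p, ctypes.c_uint32]
lib.seal_data.restype = ctypes.c_int

secret = b"key material to persist"
sealed = ctypes.create_string_buffer(4096)  # generously sized for the demo
print("seal_data returned", lib.seal_data(secret, len(secret), sealed, len(sealed)))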

I wonder if anyone has ever been successful at doing this?

I will keep everyone updated here and in issue #157.

Thanks again for the helpful pointers.

@tarikova
Author

tarikova commented Oct 25, 2018

One quick addition--nothing major, but I still want to report it. When I run a not-so-basic ML algorithm inside Graphene, I get the following warning message (and I get a ton of them):

file_map does not currently support writeable pass-through mappings on SGX.  You may add the PAL_PROT_WRITECOPY (MAP_PRIVATE) flag to your file mapping to keep the writes inside the enclave but they won't be reflected outside of the enclave.

I am wondering: what are pass-through mappings here? Can anyone enlighten me about those? Also, do they have any potential performance implications?

P.S. This is the piece of code I ran inside Graphene:

print(__doc__)
import sys
sys.path.append('/home/tarik/.local/lib/python2.7/site-packages')

from time import time
t = time()
import numpy as np

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

np.random.seed(42)

digits = load_digits()
data = scale(digits.data)

n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target

sample_size = 300

print("n_digits: %d, \t n_samples %d, \t n_features %d"
      % (n_digits, n_samples, n_features))


print(82 * '_')
print('init\t\ttime\tinertia\thomo\tcompl\tv-meas\tARI\tAMI\tsilhouette')

N = 1

def bench_k_means(estimator, name, data):
    t0 = time()
    estimator.fit(data)
    print('%-9s\t%.2fs\t%i\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f'
          % (name, (time() - t0), estimator.inertia_,
             metrics.homogeneity_score(labels, estimator.labels_),
             metrics.completeness_score(labels, estimator.labels_),
             metrics.v_measure_score(labels, estimator.labels_),
             metrics.adjusted_rand_score(labels, estimator.labels_),
             metrics.adjusted_mutual_info_score(labels,  estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean',
                                      sample_size=sample_size)))

bench_k_means(KMeans(init='k-means++', n_clusters=n_digits, n_init=10, n_jobs=N),
              name="k-means++", data=data)

bench_k_means(KMeans(init='random', n_clusters=n_digits, n_init=10, n_jobs=N),
              name="random", data=data)

# in this case the seeding of the centers is deterministic, hence we run the
# kmeans algorithm only once with n_init=1
pca = PCA(n_components=n_digits).fit(data)
bench_k_means(KMeans(init=pca.components_, n_clusters=n_digits, n_init=1, n_jobs=N),
              name="PCA-based",
              data=data)
print(82 * '_')

# #############################################################################
# Visualize the results on PCA-reduced data

reduced_data = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10, n_jobs=1)
kmeans.fit(reduced_data)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02     # point in the mesh [x_min, x_max]x[y_min, y_max].

# Plot the decision boundary. For that, we will assign a color to each point in the mesh.
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

print "Took %.2f seconds" % (time() - t)

@donporter
Contributor

A pass-through mapping writes output to a file on the untrusted host. The issue is that Graphene neither encrypts this output nor integrity-checks it when you re-read the file's contents.

These are features that should either be added to Graphene or be part of a larger system that provides an encrypted, integrity-checked file system. Their absence doesn't create a performance problem, but it does create an attack vector on an adversarial host. At this point, our goal with that warning is to, well, warn the user about the state of things.
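
To illustrate the distinction in plain Python (a sketch that runs outside Graphene): a private, copy-on-write mapping keeps writes in memory, while a pass-through (shared) mapping writes them back to the host file.

# Sketch: ACCESS_COPY corresponds to MAP_PRIVATE -- writes stay inside
# the process and never reach the file, which is what the warning
# suggests. ACCESS_WRITE is a pass-through mapping: writes hit the
# (untrusted) host file.
import mmap

with open("data.bin", "wb") as f:
    f.write(b"\0" * mmap.PAGESIZE)

with open("data.bin", "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    m[:5] = b"hello"  # visible in memory only
    m.close()

with open("data.bin", "rb") as f:
    print(f.read(5))  # still five zero bytes on disk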

@tarikova
Author

Thanks a lot for the clarification. Good to know the security risks.

@Vampsj

Vampsj commented Dec 26, 2018

Hi @donporter, sorry for having so many problems. When I try to run Python with the numpy library, I get the following error message:

ImportError: /graphene/libm.so.6: version `GLIBC_2.23' not found (required by /usr/lib/x86_64-linux-gnu/libquadmath.so.0)

I have checked issue #179 and tried adding asm(".symver realpath,realpath@GLIBC_2.19"); as suggested, but it still doesn't work, giving the same error message as before.

I would really appreciate it if someone could help me with how to recompile/run the file against glibc 2.19. Thank you.

@donporter
Contributor

I think you will either need a patched glibc 2.23, or a version of libm compiled against glibc 2.19. I might consider just downgrading to an older version of Ubuntu in the interest of getting things working. I'm guessing you are on 18.04?
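
As a quick sanity check, you can print which glibc your process actually loads; gnu_get_libc_version() is a standard glibc call:

# Sketch: report the glibc version the interpreter is running against,
# to compare the host libc with the one Graphene mounts at /graphene.
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print(libc.gnu_get_libc_version())  # e.g. 2.23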

@Vampsj

Vampsj commented Dec 26, 2018

Thanks for your reply! I will look into both options. I am running 16.04, but its glibc is 2.23...

@thomasknauth
Contributor

thomasknauth commented Jan 2, 2019

We are running some numpy code with Graphene-SGX without the problems you mentioned. It may help to use virtualenv to install numpy and use that instead of the package version. Thinking about it, that was only one of the problems we ran into. I don't think we are ready to share our modified version of Graphene just yet.

@Vampsj

Vampsj commented Jan 5, 2019

Thank you, @thomasknauth. I will try again with virtualenv.

@Khallu

Khallu commented Dec 19, 2019

Hey @tarikova,
I'm trying to run a few ML libraries inside Graphene too.
I would highly appreciate it if you could share the manifest and the rest of the setup for this script: #267 (comment)!

I keep getting the following every time I try to import an ML package (scikit, tf, keras) whose dependencies I listed in the manifest:

Cannot open manifest file: python3.6.manifest.sgx
USAGE: /home/khallu/graphene/Pal/src/../../Runtime/pal-Linux-SGX [executable|manifest] args ...

and when I remove the import statement, everything else runs fine.

@Khallu

Khallu commented Dec 19, 2019

@dimakuv yes, I ran them too. But I'm unable to use the Python tf or keras modules. I get the above-mentioned error even though I have mapped the necessary shared objects in the manifest.
I also tried modifying the same example to run tf/keras/scikit, but I get the same error.

@dimakuv

dimakuv commented Dec 19, 2019

@Khallu, could you explain a bit more about your issue? Do you have the file python3.6.manifest.sgx in your directory? What is the import statement you mention?

@Khallu

Khallu commented Dec 20, 2019

@dimakuv I'm using the same manifest. I'm adding the dependencies required for tensorflow to PY_LIBS in the Makefile, like this:

$(PYTHONSITEHOME)/numpy/core/_multiarray_umath.cpython-$(PYTHONSHORTVERSION)m-x86_64-linux-gnu.so \
$(PYTHONSITEHOME)/scipy/sparse/_sparsetools.cpython-$(PYTHONSHORTVERSION)m-x86_64-linux-gnu.so \
$(PYTHONSITEHOME)/tensorflow/python/_pywrap_tensorflow_internal.so \
$(PYTHONSITEHOME)/h5py/h5.cpython-$(PYTHONSHORTVERSION)m-x86_64-linux-gnu.so

But when I run the Python scripts (with import tensorflow in them) using Graphene, I get:

Cannot open manifest file: python3.6.manifest.sgx
USAGE: /home/khallu/graphene/Pal/src/../../Runtime/pal-Linux-SGX [executable|manifest] args ...

But when I remove the import statement, it runs fine.

The name of the manifest is python.manifest.sgx in both cases, but trying to import tf makes it look for python3.6 as the executable.

@dimakuv

dimakuv commented Dec 20, 2019

Ah, that's interesting. It looks like import tensorflow wants to spawn a new process which is not just python (a symlink) but the actual executable name python3.6. I suggest you rename your python.manifest.template to python3.6.manifest.template and change python everywhere inside your Makefile/manifest to python3.6.
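
A quick way to confirm the name under which the child interpreter is spawned (a small sketch):

# Sketch: libraries that re-spawn Python use the real interpreter path;
# if this prints .../python3.6, that explains why Graphene looks for
# python3.6.manifest.sgx for the child process.
import sys

print(sys.executable)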

@Khallu

Khallu commented Dec 21, 2019

Thanks, that seems to have fixed that specific issue. But now, while executing, the program causes my machine to log out and reboot automatically. I'm not sure why. Could it be a memory issue?

The following are the general options in the manifest:

# Graphene general options

# Graphene creates stacks of 256KB by default. It is not enough for SciPy/NumPy
# packages, e.g., libopenblas dependency assumes more than 512KB-sized stacks.
sys.stack.size = 2M

# SGX general options

# Set the virtual memory size of the SGX enclave. For SGX v1, the enclave
# size must be specified during signing. If Python needs more virtual memory
# than the enclave size, Graphene will not be able to allocate it.
sgx.enclave_size = 4G

# Set the maximum number of enclave threads. For SGX v1, the number of enclave
# TCSes must be specified during signing, so the application cannot use more
# threads than the number of TCSes. Note that Graphene also creates an internal
# thread for handling inter-process communication (IPC), and potentially another
# thread for asynchronous events. Therefore, the actual number of threads that
# the application can create is (sgx.thread_num - 2).
sgx.thread_num = 64

@Khallu

Khallu commented Dec 21, 2019

It works with 32 as the thread number. But now I get this (after adding tupletable and cputable as trusted files):

Unknown or illegal instruction at RIP 0x00000000ec829cec
Internal illegal fault at 0xec829cec (IP = 0xec829cec, VMID = 2284740226, TID = 1)
(the same two lines repeat many times)

@dimakuv

dimakuv commented Dec 30, 2019

Now this looks like a bug in Graphene :)

Could you create a minimal test case? And attach the required Makefile + manifest.template to your comment? At this point, we need to debug and understand the root cause.

@Khallu

Khallu commented Jan 3, 2020

Okay, I'll try to reproduce the issue and post the test case.
