docs: document available security measures (#2270)
docs: document available security measures

Several security measures can be used to mitigate risk when processing
potentially malicious input.

This change adds documentation about available security measures and
examples and tests that illustrate their usage.
aucampia authored Mar 16, 2023
1 parent 60d98db commit 1c25676
Showing 13 changed files with 719 additions and 15 deletions.
16 changes: 16 additions & 0 deletions docs/apidocs/examples.rst
@@ -115,3 +115,19 @@ These examples all live in ``./examples`` in the source-distribution of RDFLib.
    :undoc-members:
    :show-inheritance:

:mod:`~examples.secure_with_audit` Module
-----------------------------------------

.. automodule:: examples.secure_with_audit
    :members:
    :undoc-members:
    :show-inheritance:


:mod:`~examples.secure_with_urlopen` Module
-------------------------------------------

.. automodule:: examples.secure_with_urlopen
    :members:
    :undoc-members:
    :show-inheritance:
13 changes: 13 additions & 0 deletions docs/index.rst
@@ -26,6 +26,18 @@ RDFLib is a pure Python package for working with `RDF <http://www.w3.org/RDF/>`_

* both Queries and Updates are supported

.. caution::

   RDFLib is designed to access arbitrary network and file resources. In some
   cases these are directly requested resources; in other cases they are
   indirectly referenced resources.

   If you are using RDFLib to process untrusted documents or queries, you should
   take measures to restrict file and network access.

   For information on available security measures, see the RDFLib
   :doc:`Security Considerations </security_considerations>`
   documentation.

Getting started
---------------
@@ -56,6 +68,7 @@ If you are familiar with RDF and are looking for details on how RDFLib handles i
merging
upgrade5to6
upgrade4to5
security_considerations


Reference
113 changes: 113 additions & 0 deletions docs/security_considerations.rst
@@ -0,0 +1,113 @@
.. _security_considerations: Security Considerations

=======================
Security Considerations
=======================

RDFLib is designed to access arbitrary network and file resources. In some cases
these are directly requested resources; in other cases they are indirectly
referenced resources.

An example of where indirect resources are accessed is JSON-LD processing, where
network or file resources referenced by ``@context`` values will be loaded and
processed.
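
As a rough illustration of indirect access, the sketch below lists the external
references that ``@context`` values in a JSON-LD document point to, using only
the standard library. The helper name ``context_references`` and the sample
document are assumptions made for this illustration. It is a simplified
pre-screen, not RDFLib's own loading logic, and it does not cover every way a
document can reference other resources.

```python
import json
from typing import Any, List

# Sample document, used only for illustration.
EXAMPLE_DOCUMENT = """{
    "@context": "http://example.org/blocked.jsonld",
    "@id": "example:subject",
    "example:predicate": { "@id": "example:object" }
}"""


def context_references(node: Any) -> List[str]:
    """Collect string-valued ``@context`` entries found anywhere in ``node``."""
    refs: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                # A context may be one reference or a list mixing references
                # with inline context objects; only strings are references.
                candidates = value if isinstance(value, list) else [value]
                refs.extend(item for item in candidates if isinstance(item, str))
            # Keep walking so nested nodes and scoped contexts are covered.
            refs.extend(context_references(value))
    elif isinstance(node, list):
        for item in node:
            refs.extend(context_references(item))
    return refs


if __name__ == "__main__":
    print(context_references(json.loads(EXAMPLE_DOCUMENT)))
```

Each returned reference is a resource that a JSON-LD processor may try to load
even though the caller never requested it directly.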

RDFLib also supports SPARQL, which has federated query capabilities that allow
a query to retrieve data from arbitrary remote endpoints.

If you are using RDFLib to process untrusted documents or queries you should
take measures to restrict file and network access.

Some measures that can be taken to restrict file and network access are:

* `Operating System Security Measures`_.
* `Python Runtime Audit Hooks`_.
* `Custom URL Openers`_.

Of these, operating system security measures are recommended. The other
measures work, but they are less effective than operating system security
measures and should be used in addition to them, not instead of them.

Operating System Security Measures
==================================

Most operating systems provide functionality that can be used to restrict
network and file access of a process.

Some examples of these include:

* `Open Container Initiative (OCI) Containers
  <https://www.opencontainers.org/>`_ (aka Docker containers).

  Most OCI runtimes provide mechanisms to restrict network and file access of
  containers. For example, using Docker, you can limit your container to only
  being able to access files explicitly mapped into the container and to only
  access the network through a firewall. For more information, refer to the
  documentation of the tool you use to manage your OCI containers:

  * `Kubernetes <https://kubernetes.io/docs/home/>`_
  * `Docker <https://docs.docker.com/>`_
  * `Podman <https://podman.io/>`_

* `firejail <https://firejail.wordpress.com/>`_ can be used to
  sandbox a process on Linux and restrict its network and file access.

* File and network access restrictions.

  Most operating systems provide a way to restrict operating system users to
  only being able to access files and network resources that are explicitly
  allowed. Applications that process untrusted input could be run as a user with
  these restrictions in place.

Many other measures are available; however, listing them is outside the scope
of this document.

Of the listed measures, OCI containers are recommended. In most cases, OCI
containers are constrained by default: they can't access the host's loopback
interface and can only access files that are explicitly mapped into the
container.
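
The sketch below shows one way such a container could be launched from Python.
It only builds the argument list for ``docker run``. The image name
``rdf-worker:latest``, the host directory, and the command are hypothetical
values chosen for illustration. Note that ``--network none`` disables
networking entirely, which is stricter than routing traffic through a firewall.

```python
from typing import List


def docker_run_args(image: str, data_dir: str, command: List[str]) -> List[str]:
    """Build a ``docker run`` command line for a locked-down container."""
    return [
        "docker",
        "run",
        "--rm",  # remove the container when it exits
        "--network",
        "none",  # no network access at all
        "--read-only",  # mount the container's root filesystem read-only
        "--volume",
        f"{data_dir}:/data:ro",  # expose a single host directory, read-only
        image,
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical image, directory and command, for illustration only.
    print(docker_run_args("rdf-worker:latest", "/srv/rdf-data", ["python", "process.py"]))
```

The resulting list can be passed to ``subprocess.run`` on a host where Docker
is installed.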

Python Runtime Audit Hooks
==========================

From Python 3.8 onwards, Python provides a mechanism to install runtime audit
hooks that can be used to limit access to files and network resources.

The runtime audit hook system is described in more detail in `PEP 578 – Python
Runtime Audit Hooks <https://peps.python.org/pep-0578/>`_.

Runtime audit hooks can be installed using the `sys.addaudithook
<https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ function, and
will then get called when audit events occur. The audit events raised by the
Python runtime and standard library are described in Python's `audit events
table <https://docs.python.org/3/library/audit_events.html>`_.

RDFLib uses `urllib.request.urlopen` for HTTP, HTTPS and other network access,
and this function raises a ``urllib.Request`` audit event. For file access,
RDFLib uses `open`, which raises an ``open`` audit event.
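
To see what these events look like, a hook can simply record them. The sketch
below is an assumption-laden illustration: the names ``record_audit_events``
and ``read_devnull`` are made up for this example. Because audit hooks cannot
be removed once added, the hook is installed once and switched on and off with
a flag.

```python
import os
import sys
from typing import Any, Callable, List, Tuple

AuditEvent = Tuple[str, Tuple[Any, ...]]

_events: List[AuditEvent] = []
_recording = False
_installed = False


def _recording_hook(name: str, args: Tuple[Any, ...]) -> None:
    # Only keep the two events discussed above, and only while recording.
    if _recording and name in ("open", "urllib.Request"):
        _events.append((name, args))


def record_audit_events(action: Callable[[], None]) -> List[AuditEvent]:
    """Run ``action`` and return the file and URL audit events it raised."""
    global _recording, _installed
    if not _installed:
        sys.addaudithook(_recording_hook)
        _installed = True
    _events.clear()
    _recording = True
    try:
        action()
    finally:
        _recording = False
    return list(_events)


def read_devnull() -> None:
    """A stand-in action that opens a file."""
    with open(os.devnull) as handle:
        handle.read()


if __name__ == "__main__":
    for name, args in record_audit_events(read_devnull):
        print(name, args)
```

For the ``open`` event, the first argument is the path that was passed to
``open``. For ``urllib.Request``, it is the full URL of the request.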

Users of RDFLib can install audit hooks that react to these audit events and
raise an exception when an attempt is made to access files or network resources
that are not explicitly allowed.
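
A minimal sketch of an allowlist-based hook follows. The policy values
``ALLOWED_HOSTS`` and ``ALLOWED_ROOT`` and all function names are hypothetical
and exist only for this illustration. The hook is defined but not installed
here; installing it is a single ``sys.addaudithook(allowlist_audit_hook)`` call.

```python
import os
from typing import Any, Tuple
from urllib.parse import urlsplit

# Hypothetical policy values, chosen only for illustration.
ALLOWED_HOSTS = frozenset({"example.org"})
ALLOWED_ROOT = os.path.realpath("/srv/rdf-data")


def is_allowed_url(url: str) -> bool:
    """Allow a URL only if its host is on the allowlist."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


def is_allowed_path(path: Any) -> bool:
    """Allow a path only if it resolves to a location under ALLOWED_ROOT."""
    if isinstance(path, int):
        # A file descriptor was opened earlier; there is no path to check.
        return True
    try:
        resolved = os.path.realpath(os.fsdecode(path))
        return os.path.commonpath([resolved, ALLOWED_ROOT]) == ALLOWED_ROOT
    except (TypeError, ValueError):
        # Unexpected path types or incomparable paths are refused.
        return False


def allowlist_audit_hook(name: str, args: Tuple[Any, ...]) -> None:
    """Raise PermissionError for URLs and files that are not allowed."""
    if name == "urllib.Request" and not is_allowed_url(str(args[0])):
        raise PermissionError("Permission denied for URL")
    if name == "open" and not is_allowed_path(args[0]):
        raise PermissionError("Permission denied for file")
```

A file rule this strict also applies to the interpreter's own file access, such
as importing modules that are not yet loaded, so a real policy would likely
need a broader allowlist.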

RDFLib's test suite includes tests which verify that audit hooks can block
access to network and file resources.

RDFLib also includes an example that shows how runtime audit hooks can be
used to restrict network and file access in :mod:`~examples.secure_with_audit`.

Custom URL Openers
==================

RDFLib uses `urllib.request.urlopen` for HTTP, HTTPS and other network
access. This function will use a `urllib.request.OpenerDirector` installed with
`urllib.request.install_opener` to open the URLs.

Users of RDFLib can install a custom URL opener that raises an exception when an
attempt is made to access network resources that are not explicitly allowed.
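
One possible shape for such an opener is sketched below, assuming a host
allowlist is the desired policy. The class ``HostAllowlistHandler`` and the
function ``build_restricted_opener`` are hypothetical names for this
illustration. The handler checks each HTTP and HTTPS request before it is sent,
and the opener registers only HTTP and HTTPS handlers.

```python
from typing import Iterable
from urllib.parse import urlsplit
from urllib.request import (
    BaseHandler,
    HTTPHandler,
    HTTPSHandler,
    OpenerDirector,
    Request,
)


class HostAllowlistHandler(BaseHandler):
    """Reject requests to hosts that are not explicitly allowed."""

    def __init__(self, allowed_hosts: Iterable[str]) -> None:
        self.allowed_hosts = frozenset(allowed_hosts)

    def http_request(self, req: Request) -> Request:
        # Called by the opener as a pre-processing step, before any
        # connection is made.
        host = urlsplit(req.full_url).hostname
        if host not in self.allowed_hosts:
            raise PermissionError(f"Permission denied for host {host!r}")
        return req

    # Apply the same check to HTTPS requests.
    https_request = http_request


def build_restricted_opener(allowed_hosts: Iterable[str]) -> OpenerDirector:
    """Create an opener that only talks to the given hosts."""
    opener = OpenerDirector()
    opener.add_handler(HostAllowlistHandler(allowed_hosts))
    opener.add_handler(HTTPHandler())
    opener.add_handler(HTTPSHandler())
    return opener
```

Passing the result to ``urllib.request.install_opener`` makes it the opener
used by ``urllib.request.urlopen`` for the whole process.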

RDFLib's test suite includes tests which verify that custom URL openers can be
used to block access to network resources.

RDFLib also includes an example that shows how a custom opener can be used to
restrict network access in :mod:`~examples.secure_with_urlopen`.
120 changes: 120 additions & 0 deletions examples/secure_with_audit.py
@@ -0,0 +1,120 @@
"""
This example demonstrates how to use `Python audit hooks
<https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ to block access
to files and URLs.

It installs an audit hook with `sys.addaudithook
<https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ that blocks
access to files and URLs that end with ``blocked.jsonld``.

The code in the example then verifies that the audit hook is blocking access to
URLs and files as expected.
"""

import logging
import os
import sys
from typing import Any, Optional, Tuple

from rdflib import Graph


def audit_hook(name: str, args: Tuple[Any, ...]) -> None:
    """
    An audit hook that blocks access when an attempt is made to open a
    file or URL that ends with ``blocked.jsonld``.

    Details of the audit events can be seen in the `audit events
    table <https://docs.python.org/3/library/audit_events.html>`_.

    :param name: The name of the audit event.
    :param args: The arguments of the audit event.
    :return: `None` if the audit hook does not block access.
    :raises PermissionError: If the file or URL being accessed ends with ``blocked.jsonld``.
    """
    if name == "urllib.Request" and args[0].endswith("blocked.jsonld"):
        raise PermissionError("Permission denied for URL")
    if name == "open" and args[0].endswith("blocked.jsonld"):
        raise PermissionError("Permission denied for file")
    return None


def main() -> None:
    """
    The main code of the example.

    The important steps are:

    * Install a custom audit hook that blocks some URLs and files.
    * Attempt to parse a JSON-LD document that will result in a blocked URL being accessed.
    * Verify that the audit hook blocked access to the URL.
    * Attempt to parse a JSON-LD document that will result in a blocked file being accessed.
    * Verify that the audit hook blocked access to the file.
    """

    logging.basicConfig(
        level=os.environ.get("PYTHON_LOGGING_LEVEL", logging.INFO),
        stream=sys.stderr,
        datefmt="%Y-%m-%dT%H:%M:%S",
        format=(
            "%(asctime)s.%(msecs)03d %(process)d %(thread)d %(levelno)03d:%(levelname)-8s "
            "%(name)-12s %(module)s:%(lineno)s:%(funcName)s %(message)s"
        ),
    )

    if sys.version_info < (3, 8):
        logging.warning("This example requires Python 3.8 or higher")
        return None

    # Install the audit hook
    #
    # note on type error: This is needed because we are running mypy with python
    # 3.7 mode, so mypy thinks the previous condition will always be true.
    sys.addaudithook(audit_hook)  # type: ignore[unreachable]

    graph = Graph()

    # Attempt to parse a JSON-LD document that will result in the blocked URL
    # being accessed.
    error: Optional[PermissionError] = None
    try:
        graph.parse(
            data=r"""{
            "@context": "http://example.org/blocked.jsonld",
            "@id": "example:subject",
            "example:predicate": { "@id": "example:object" }
        }""",
            format="json-ld",
        )
    except PermissionError as caught:
        logging.info("Permission denied: %s", caught)
        error = caught

    # `Graph.parse` would have resulted in a `PermissionError` being raised from
    # the audit hook.
    assert isinstance(error, PermissionError)
    assert error.args[0] == "Permission denied for URL"

    # Attempt to parse a JSON-LD document that will result in the blocked file
    # being accessed.
    error = None
    try:
        graph.parse(
            data=r"""{
            "@context": "file:///srv/blocked.jsonld",
            "@id": "example:subject",
            "example:predicate": { "@id": "example:object" }
        }""",
            format="json-ld",
        )
    except PermissionError as caught:
        logging.info("Permission denied: %s", caught)
        error = caught

    # `Graph.parse` would have resulted in a `PermissionError` being raised from
    # the audit hook.
    assert isinstance(error, PermissionError)
    assert error.args[0] == "Permission denied for file"


if __name__ == "__main__":
    main()
82 changes: 82 additions & 0 deletions examples/secure_with_urlopen.py
@@ -0,0 +1,82 @@
"""
This example demonstrates how to use a custom global URL opener installed with `urllib.request.install_opener` to block access to URLs.
"""
import http.client
import logging
import os
import sys
from typing import Optional
from urllib.request import HTTPHandler, OpenerDirector, Request, install_opener

from rdflib import Graph


class SecuredHTTPHandler(HTTPHandler):
    """
    An HTTP handler that blocks access to URLs that end with "blocked.jsonld".
    """

    def http_open(self, req: Request) -> http.client.HTTPResponse:
        """
        Block access to URLs that end with "blocked.jsonld".

        :param req: The request to open.
        :return: The response.
        :raises PermissionError: If the URL ends with "blocked.jsonld".
        """
        if req.get_full_url().endswith("blocked.jsonld"):
            raise PermissionError("Permission denied for URL")
        return super().http_open(req)


def main() -> None:
    """
    The main code of the example.

    The important steps are:

    * Install a custom global URL opener that blocks some URLs.
    * Attempt to parse a JSON-LD document that will result in a blocked URL being accessed.
    * Verify that the URL opener blocked access to the URL.
    """

    logging.basicConfig(
        level=os.environ.get("PYTHON_LOGGING_LEVEL", logging.INFO),
        stream=sys.stderr,
        datefmt="%Y-%m-%dT%H:%M:%S",
        format=(
            "%(asctime)s.%(msecs)03d %(process)d %(thread)d %(levelno)03d:%(levelname)-8s "
            "%(name)-12s %(module)s:%(lineno)s:%(funcName)s %(message)s"
        ),
    )

    opener = OpenerDirector()
    opener.add_handler(SecuredHTTPHandler())
    install_opener(opener)

    graph = Graph()

    # Attempt to parse a JSON-LD document that will result in the blocked URL
    # being accessed.
    error: Optional[PermissionError] = None
    try:
        graph.parse(
            data=r"""{
            "@context": "http://example.org/blocked.jsonld",
            "@id": "example:subject",
            "example:predicate": { "@id": "example:object" }
        }""",
            format="json-ld",
        )
    except PermissionError as caught:
        logging.info("Permission denied: %s", caught)
        error = caught

    # `Graph.parse` would have resulted in a `PermissionError` being raised from
    # the url opener.
    assert isinstance(error, PermissionError)
    assert error.args[0] == "Permission denied for URL"


if __name__ == "__main__":
    main()