docs: document available security measures (#2270)

Several security measures can be used to mitigate risk when processing potentially malicious input. This change adds documentation about the available security measures, together with examples and tests that illustrate their usage.
RDFLib · Mar 16, 2023 · 1c25676
1 parent 60d98db
Showing 13 changed files with 719 additions and 15 deletions.
diff --git a/docs/apidocs/examples.rst b/docs/apidocs/examples.rst
@@ -115,3 +115,19 @@ These examples all live in ``./examples`` in the source-distribution of RDFLib.
     :undoc-members:
     :show-inheritance:
 
+:mod:`~examples.secure_with_audit` Module
+-----------------------------------------
+
+.. automodule:: examples.secure_with_audit
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
+:mod:`~examples.secure_with_urlopen` Module
+-------------------------------------------
+
+.. automodule:: examples.secure_with_urlopen
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/docs/index.rst b/docs/index.rst
@@ -26,6 +26,18 @@ RDFLib is a pure Python package for working with `RDF <http://www.w3.org/RDF/>`_
 
   * both Queries and Updates are supported
 
+.. caution::
+
+   RDFLib is designed to access arbitrary network and file resources. In some
+   cases these are directly requested resources; in other cases they are
+   indirectly referenced resources.
+
+   If you are using RDFLib to process untrusted documents or queries you should
+   take measures to restrict file and network access.
+
+   For information on available security measures, see the RDFLib
+   :doc:`Security Considerations </security_considerations>`
+   documentation.
 
 Getting started
 ---------------
@@ -56,6 +68,7 @@ If you are familiar with RDF and are looking for details on how RDFLib handles i
    merging
    upgrade5to6
    upgrade4to5
+   security_considerations
 
 
 Reference

diff --git a/docs/security_considerations.rst b/docs/security_considerations.rst
@@ -0,0 +1,113 @@
+.. _security_considerations: Security Considerations
+
+=======================
+Security Considerations
+=======================
+
+RDFLib is designed to access arbitrary network and file resources. In some cases
+these are directly requested resources; in other cases they are indirectly
+referenced resources.
+
+An example of where indirect resources are accessed is JSON-LD processing, where
+network or file resources referenced by ``@context`` values will be loaded and
+processed.
+
+RDFLib also supports SPARQL, which has federated query capabilities that allow
+a query to retrieve data from arbitrary remote endpoints.
+
+If you are using RDFLib to process untrusted documents or queries you should
+take measures to restrict file and network access.
+
+Some measures that can be taken to restrict file and network access are:
+
+* `Operating System Security Measures`_.
+* `Python Runtime Audit Hooks`_.
+* `Custom URL Openers`_.
+
+Of these, operating system security measures are recommended. The other
+measures work, but they are not as effective as operating system security
+measures, and even if they are used they should be used in conjunction with
+operating system security measures.
+
+Operating System Security Measures
+==================================
+
+Most operating systems provide functionality that can be used to restrict
+network and file access of a process.
+
+Some examples of these include:
+
+* `Open Container Initiative (OCI) Containers
+  <https://www.opencontainers.org/>`_ (aka Docker containers).
+
+  Most OCI runtimes provide mechanisms to restrict network and file access of
+  containers. For example, using Docker, you can limit your container so that
+  it can only access files explicitly mapped into it and can only reach the
+  network through a firewall. For more information, refer to the
+  documentation of the tool you use to manage your OCI containers:
+
+  * `Kubernetes <https://kubernetes.io/docs/home/>`_
+  * `Docker <https://docs.docker.com/>`_
+  * `Podman <https://podman.io/>`_
+
+* `firejail <https://firejail.wordpress.com/>`_ can be used to
+  sandbox a process on Linux and restrict its network and file access.
+
+* File and network access restrictions.
+
+  Most operating systems provide a way to restrict operating system users to
+  only being able to access files and network resources that are explicitly
+  allowed. Applications that process untrusted input could be run as a user with
+  these restrictions in place.
+
+Many other measures are available; however, listing them is outside the scope
+of this document.
+
+Of the listed measures, OCI containers are recommended. In most cases, OCI
+containers are constrained by default: they can't access the host's loopback
+interface and can only access files that are explicitly mapped into the container.
+
+Python Runtime Audit Hooks
+==========================
+
+From Python 3.8 onwards, Python provides a mechanism to install runtime audit
+hooks that can be used to limit access to files and network resources.
+
+The runtime audit hook system is described in more detail in `PEP 578 – Python
+Runtime Audit Hooks <https://peps.python.org/pep-0578/>`_.
+
+Runtime audit hooks can be installed using the `sys.addaudithook
+<https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ function, and
+will then get called when audit events occur. The audit events raised by the
+Python runtime and standard library are described in Python's `audit events
+table <https://docs.python.org/3/library/audit_events.html>`_.
+
+RDFLib uses `urllib.request.urlopen` for HTTP, HTTPS and other network access,
+and this function raises a ``urllib.Request`` audit event. For file access,
+RDFLib uses `open`, which raises an ``open`` audit event.
+
+Users of RDFLib can install audit hooks that react to these audit events and
+raise an exception when an attempt is made to access files or network resources
+that are not explicitly allowed.
+
+RDFLib's test suite includes tests which verify that audit hooks can block
+access to network and file resources.
+
+RDFLib also includes an example that shows how runtime audit hooks can be
+used to restrict network and file access in :mod:`~examples.secure_with_audit`.
+
+Custom URL Openers
+==================
+
+RDFLib uses `urllib.request.urlopen` for HTTP, HTTPS and other network
+access. This function will use a `urllib.request.OpenerDirector` installed with
+`urllib.request.install_opener` to open the URLs.
+
+Users of RDFLib can install a custom URL opener that raises an exception when an
+attempt is made to access network resources that are not explicitly allowed.
+
+RDFLib's test suite includes tests which verify that custom URL openers can be
+used to block access to network resources.
+
+RDFLib also includes an example that shows how a custom opener can be used to
+restrict network access in :mod:`~examples.secure_with_urlopen`.
diff --git a/examples/secure_with_audit.py b/examples/secure_with_audit.py
@@ -0,0 +1,120 @@
+"""
+This example demonstrates how to use `Python audit hooks
+<https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ to block access
+to files and URLs.
+
+It installs an audit hook with `sys.addaudithook <https://docs.python.org/3/library/sys.html#sys.addaudithook>`_ that blocks access to files and
+URLs that end with ``blocked.jsonld``.
+
+The code in the example then verifies that the audit hook is blocking access to
+URLs and files as expected.
+"""
+
+import logging
+import os
+import sys
+from typing import Any, Optional, Tuple
+
+from rdflib import Graph
+
+
+def audit_hook(name: str, args: Tuple[Any, ...]) -> None:
+    """
+    An audit hook that blocks access when an attempt is made to open a
+    file or URL that ends with ``blocked.jsonld``.
+
+    Details of the audit events can be seen in the `audit events
+    table <https://docs.python.org/3/library/audit_events.html>`_.
+
+    :param name: The name of the audit event.
+    :param args: The arguments of the audit event.
+    :return: `None` if the audit hook does not block access.
+    :raises PermissionError: If the file or URL being accessed ends with ``blocked.jsonld``.
+    """
+    target = args[0] if args and isinstance(args[0], str) else ""
+    if name in ("urllib.Request", "open") and target.endswith("blocked.jsonld"):
+        kind = "URL" if name == "urllib.Request" else "file"
+        raise PermissionError(f"Permission denied for {kind}")
+    return None
+
+
+def main() -> None:
+    """
+    The main code of the example.
+
+    The important steps are:
+
+    * Install a custom audit hook that blocks some URLs and files.
+    * Attempt to parse a JSON-LD document that will result in a blocked URL being accessed.
+    * Verify that the audit hook blocked access to the URL.
+    * Attempt to parse a JSON-LD document that will result in a blocked file being accessed.
+    * Verify that the audit hook blocked access to the file.
+    """
+
+    logging.basicConfig(
+        level=os.environ.get("PYTHON_LOGGING_LEVEL", logging.INFO),
+        stream=sys.stderr,
+        datefmt="%Y-%m-%dT%H:%M:%S",
+        format=(
+            "%(asctime)s.%(msecs)03d %(process)d %(thread)d %(levelno)03d:%(levelname)-8s "
+            "%(name)-12s %(module)s:%(lineno)s:%(funcName)s %(message)s"
+        ),
+    )
+
+    if sys.version_info < (3, 8):
+        logging.warning("This example requires Python 3.8 or higher")
+        return None
+
+    # Install the audit hook
+    #
+    # note on type error: This is needed because we are running mypy with python
+    # 3.7 mode, so mypy thinks the previous condition will always be true.
+    sys.addaudithook(audit_hook)  # type: ignore[unreachable]
+
+    graph = Graph()
+
+    # Attempt to parse a JSON-LD document that will result in the blocked URL
+    # being accessed.
+    error: Optional[PermissionError] = None
+    try:
+        graph.parse(
+            data=r"""{
+            "@context": "http://example.org/blocked.jsonld",
+            "@id": "example:subject",
+            "example:predicate": { "@id": "example:object" }
+        }""",
+            format="json-ld",
+        )
+    except PermissionError as caught:
+        logging.info("Permission denied: %s", caught)
+        error = caught
+
+    # `Graph.parse` would have resulted in a `PermissionError` being raised from
+    # the audit hook.
+    assert isinstance(error, PermissionError)
+    assert error.args[0] == "Permission denied for URL"
+
+    # Attempt to parse a JSON-LD document that will result in the blocked file
+    # being accessed.
+    error = None
+    try:
+        graph.parse(
+            data=r"""{
+            "@context": "file:///srv/blocked.jsonld",
+            "@id": "example:subject",
+            "example:predicate": { "@id": "example:object" }
+        }""",
+            format="json-ld",
+        )
+    except PermissionError as caught:
+        logging.info("Permission denied: %s", caught)
+        error = caught
+
+    # `Graph.parse` would have resulted in a `PermissionError` being raised from
+    # the audit hook.
+    assert isinstance(error, PermissionError)
+    assert error.args[0] == "Permission denied for file"
+
+
+if __name__ == "__main__":
+    main()
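
The hook in `examples/secure_with_audit.py` blocks targets by suffix, while the documentation in this change speaks of resources that are "not explicitly allowed". A minimal sketch of an allow-list variant for URLs might look like the following. It is not part of this commit: `make_audit_hook`, `allowed_hosts` and `blocked_dirs` are hypothetical names, and the hosts and directories are placeholders. Only `str` paths are checked, and symlinks are not resolved.

```python
import os
from typing import Any, Callable, Iterable, Tuple
from urllib.parse import urlsplit

AuditHook = Callable[[str, Tuple[Any, ...]], None]


def make_audit_hook(
    allowed_hosts: Iterable[str], blocked_dirs: Iterable[str]
) -> AuditHook:
    """Build a hook that allows only listed URL hosts and denies listed directories.

    Hypothetical helper for illustration; not an RDFLib API.
    """
    hosts = frozenset(host.lower() for host in allowed_hosts)
    # Normalise the directories once, outside the hook.
    dirs = tuple(os.path.abspath(d) for d in blocked_dirs)

    def hook(name: str, args: Tuple[Any, ...]) -> None:
        if name == "urllib.Request":
            # args[0] is the full URL of the request being opened.
            host = urlsplit(args[0]).hostname
            if host is None or host.lower() not in hosts:
                raise PermissionError(f"Permission denied for URL: {args[0]}")
        elif name == "open" and isinstance(args[0], str):
            # The ``open`` event is also raised for file descriptors (int)
            # and when Python imports modules, so this sketch uses a
            # deny-list of directories for files rather than an allow-list.
            path = os.path.abspath(args[0])
            for blocked in dirs:
                if path == blocked or path.startswith(blocked + os.sep):
                    raise PermissionError(f"Permission denied for file: {path}")

    return hook
```

To activate it, the returned hook would be passed to `sys.addaudithook`, for example `sys.addaudithook(make_audit_hook(["example.org"], ["/srv/private"]))`. Hooks added this way cannot be removed for the lifetime of the interpreter, which is why the sketch only builds the hook and leaves installation to the caller.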
diff --git a/examples/secure_with_urlopen.py b/examples/secure_with_urlopen.py
@@ -0,0 +1,82 @@
+"""
+This example demonstrates how to use a custom global URL opener installed with `urllib.request.install_opener` to block access to URLs.
+"""
+import http.client
+import logging
+import os
+import sys
+from typing import Optional
+from urllib.request import HTTPHandler, OpenerDirector, Request, install_opener
+
+from rdflib import Graph
+
+
+class SecuredHTTPHandler(HTTPHandler):
+    """
+    An HTTP handler that blocks access to URLs that end with "blocked.jsonld".
+    """
+
+    def http_open(self, req: Request) -> http.client.HTTPResponse:
+        """
+        Block access to URLs that end with "blocked.jsonld".
+
+        :param req: The request to open.
+        :return: The response.
+        :raises PermissionError: If the URL ends with "blocked.jsonld".
+        """
+        if req.get_full_url().endswith("blocked.jsonld"):
+            raise PermissionError("Permission denied for URL")
+        return super().http_open(req)
+
+
+def main() -> None:
+    """
+    The main code of the example.
+
+    The important steps are:
+
+    * Install a custom global URL opener that blocks some URLs.
+    * Attempt to parse a JSON-LD document that will result in a blocked URL being accessed.
+    * Verify that the URL opener blocked access to the URL.
+    """
+
+    logging.basicConfig(
+        level=os.environ.get("PYTHON_LOGGING_LEVEL", logging.INFO),
+        stream=sys.stderr,
+        datefmt="%Y-%m-%dT%H:%M:%S",
+        format=(
+            "%(asctime)s.%(msecs)03d %(process)d %(thread)d %(levelno)03d:%(levelname)-8s "
+            "%(name)-12s %(module)s:%(lineno)s:%(funcName)s %(message)s"
+        ),
+    )
+
+    opener = OpenerDirector()
+    opener.add_handler(SecuredHTTPHandler())
+    install_opener(opener)
+
+    graph = Graph()
+
+    # Attempt to parse a JSON-LD document that will result in the blocked URL
+    # being accessed.
+    error: Optional[PermissionError] = None
+    try:
+        graph.parse(
+            data=r"""{
+            "@context": "http://example.org/blocked.jsonld",
+            "@id": "example:subject",
+            "example:predicate": { "@id": "example:object" }
+        }""",
+            format="json-ld",
+        )
+    except PermissionError as caught:
+        logging.info("Permission denied: %s", caught)
+        error = caught
+
+    # `Graph.parse` would have resulted in a `PermissionError` being raised from
+    # the URL opener.
+    assert isinstance(error, PermissionError)
+    assert error.args[0] == "Permission denied for URL"
+
+
+if __name__ == "__main__":
+    main()
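
The opener in `examples/secure_with_urlopen.py` registers only an HTTP handler and blocks by suffix. A hedged sketch of a variant that checks a host allow-list for both HTTP and HTTPS is shown below. It is not part of this commit: `ALLOWED_HOSTS`, `check_allowed`, `AllowlistHTTPHandler`, `AllowlistHTTPSHandler` and `build_allowlist_opener` are hypothetical names, and it assumes Python was built with SSL support so that `urllib.request.HTTPSHandler` is available.

```python
import http.client
from urllib.parse import urlsplit
from urllib.request import HTTPHandler, HTTPSHandler, OpenerDirector, Request

# Hypothetical allow-list; replace with the hosts your application trusts.
ALLOWED_HOSTS = frozenset({"example.org"})


def check_allowed(req: Request) -> None:
    """Raise PermissionError unless the request targets an allowed host."""
    url = req.get_full_url()
    host = urlsplit(url).hostname
    if host is None or host.lower() not in ALLOWED_HOSTS:
        raise PermissionError(f"Permission denied for URL: {url}")


class AllowlistHTTPHandler(HTTPHandler):
    """HTTP handler that checks the allow-list before connecting."""

    def http_open(self, req: Request) -> http.client.HTTPResponse:
        check_allowed(req)
        return super().http_open(req)


class AllowlistHTTPSHandler(HTTPSHandler):
    """HTTPS handler that checks the allow-list before connecting."""

    def https_open(self, req: Request) -> http.client.HTTPResponse:
        check_allowed(req)
        return super().https_open(req)


def build_allowlist_opener() -> OpenerDirector:
    """Create an opener that only handles HTTP and HTTPS, both checked."""
    # A bare OpenerDirector starts with no handlers, so only the schemes
    # registered here are handled by this opener.
    opener = OpenerDirector()
    opener.add_handler(AllowlistHTTPHandler())
    opener.add_handler(AllowlistHTTPSHandler())
    return opener
```

As in the example above, the opener would take effect globally once passed to `urllib.request.install_opener`, for example `install_opener(build_allowlist_opener())`. The check runs before any connection is attempted, so a denied URL raises `PermissionError` without touching the network.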