TonicAI · akamor · Nov 21, 2024
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -25,3 +25,5 @@ The quickstart provides information on how to install the SDK and set up an API
    quickstart/getting_started
    parse/index
    redact/index
+   redact/api
+   parse/api
diff --git a/docs/source/parse/index.rst b/docs/source/parse/index.rst
@@ -15,4 +15,3 @@ To learn more about how to use Textual to redact entities within text and files
    parsing_files
    pipelines
    working_with_parsed_output
-   api
diff --git a/docs/source/redact/index.rst b/docs/source/redact/index.rst
@@ -1,28 +1,39 @@
 Redact
 =============
 
-The Textual redact functionality allows you to identify entities in files, and then optionally redact/synthesize these entities to create a safe version of your unstructured text.  This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats.
+The Textual redact functionality allows you to identify entities in files, and then optionally tokenize/synthesize these entities to create a safe version of your unstructured text.  This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats.
 
 Before you can use these functions, read the :doc:`Getting started <../quickstart/getting_started>` guide and create an API key.
 
-Redacting strings
+Redacting text
 -----------------
 
-To identify entities in a raw string, call the **redact** function.
+You can redact text directly in a variety of formats, such as plain text, JSON, XML, and HTML.  All redaction requests return a response that includes the original text, the redacted text, and a list of the found entities with their locations.  Additionally, all redact functions allow you to specify which entities are tokenized and which are synthesized.
 
-.. code-block:: python
-
-    from tonic_textual.redact_api import TextualNer
+The inputs common to all redact functions are:
 
-    textual = TonicTextual("https://textual.tonic.ai")
-
-    raw_redaction = textual.redact("My name is John, and today I am demo-ing Textual, a software product created by Tonic")
+* **generator_default**
+   The default operation performed on an entity. The options are 'Redact', 'Synthesis', and 'Off'.
+* **generator_config**
+   A dictionary whose keys are entity labels and whose values specify how to handle each entity.  The options are 'Redact', 'Synthesis', and 'Off'.
+
+   Example: {'NAME_GIVEN': 'Synthesis'}
+* **label_allow_lists**
+   A dictionary whose keys are entity labels and whose values are lists of regexes.  If a piece of text matches a regex, it is flagged as that entity type.
+
+   Example: {'HEALTHCARE_ID': [r'[a-zA-Z]{3}\\d{6,}']}
+* **label_block_lists**
+   A dictionary whose keys are entity labels and whose values are lists of regexes.  If a piece of text matches a regex, it is ignored for that entity type.
+
+   Example: {'NUMERIC_VALUE': [r'\\d{3}']}
 
-The response provides a list of identified entities, with information about each entity.
+The JSON and XML redact functions also accept additional inputs, which are described in their respective sections.
 
-It also returns a redacted string that replaces the found entities with tokens. You can configure how to handle each type of entities - whether to redact or synthesize them.
+.. toctree::
+   :hidden:
+   :maxdepth: 2
 
-To learn more about to redact raw strings, go to :doc:`Redacting text <redacting_text>`.
+   redacting_text
 
 Redacting files
 ---------------
@@ -51,6 +62,12 @@ To generated redacted/synthesized files:
 
 To learn more about how to generate redacted and synthesized files, go to :doc:`Redacting files <redacting_files>`.
 
+.. toctree::
+   :maxdepth: 2
+   :hidden:
+
+   redacting_files
+
 Working with datasets
 ---------------------
 
@@ -60,8 +77,7 @@ To help automate workflows, you can work with datasets directly from the SDK. To
 
 
 .. toctree::
-
-   redacting_text
-   redacting_files
+   :maxdepth: 2
+   :hidden:
+
    datasets
-   api
diff --git a/docs/source/redact/redacting_text.rst b/docs/source/redact/redacting_text.rst
@@ -1,6 +1,3 @@
-🅰 Text
-=========================
-
 Redact raw text
 ---------------
 To redact sensitive information from a text string, pass the string to the `redact` method:
@@ -18,7 +15,7 @@ This produces the following output:
 
 .. code-block:: console
 
-    My name is Alfonzo, and today I am demoing Textual, a software product created by New Ignition Worldwide
+    My name is [NAME_GIVEN_HI1h7], and [DATE_TIME_4hKfrH] I am demoing Textual, a software product created by [ORGANIZATION_P5XLAH]
     {
         "start": 11,
         "end": 15,
@@ -28,97 +25,94 @@ This produces the following output:
         "text": "John",
         "score": 0.9,
         "language": "en",
-        "new_text": "[NAME_GIVEN_dySb5]"
-    }
-    {
-        "start": 79,
-        "end": 84,
-        "new_start": 93,
-        "new_end": 114,
-        "label": "ORGANIZATION",
-        "text": "Tonic",
-        "score": 0.9,
-        "language": "en",
-        "new_text": "[ORGANIZATION_5Ve7OH]"
+        "new_text": "[NAME_GIVEN_HI1h7]"
     }
-
-Synthesize raw text
--------------------
-The following example passes the same string to the `redact` method, but sets some categories to `Synthesis`, which indicates to use realistic replacement values:
-
-.. code-block:: python
-
-    from tonic_textual.redact_api import TextualNer
-
-    textual = TextualNer()
-    generator_config = {"NAME_GIVEN":"Synthesis", "ORGANIZATION":"Synthesis"}
-    raw_synthesis = textual.redact(
-        "My name is John, and today I am demoing Textual, a software product created by Tonic", 
-        generator_config=generator_config)
-    print(raw_synthesis.describe())
-
-This produces the following output:
-
-.. code-block:: console
-
-    My name is Alfonzo, and today I am demoing Textual, a software product created by New Ignition Worldwide
     {
-        "start": 11,
-        "end": 15,
-        "new_start": 11,
-        "new_end": 18,
-        "label": "NAME_GIVEN",
-        "text": "John",
+        "start": 21,
+        "end": 26,
+        "new_start": 35,
+        "new_end": 53,
+        "label": "DATE_TIME",
+        "text": "today",
         "score": 0.9,
         "language": "en",
-        "new_text": "Alfonzo"
+        "new_text": "[DATE_TIME_4hKfrH]"
     }
     {
         "start": 79,
         "end": 84,
-        "new_start": 82,
-        "new_end": 104,
+        "new_start": 106,
+        "new_end": 127,
         "label": "ORGANIZATION",
         "text": "Tonic",
         "score": 0.9,
         "language": "en",
-        "new_text": "New Ignition Worldwide"
-    }          
+        "new_text": "[ORGANIZATION_P5XLAH]"
+    }
 
-Using LLM synthesis
--------------------
-The following example passes the same string to the `llm_synthesis` method:
+Bulk redact raw text
+---------------------
+Just as the `redact` method redacts a single string, the `redact_bulk` method redacts many strings at once.  Each string is redacted individually: the strings are fed into the model independently and cannot affect one another.  To redact sensitive information from a list of text strings, pass the list to the `redact_bulk` method:
 
 .. code-block:: python
 
     from tonic_textual.redact_api import TextualNer
 
     textual = TextualNer()
 
-    raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
-    print(raw_synthesis.describe())
+    raw_redaction = textual.redact_bulk(["Tonic was founded in 2018", "John Smith is a person"])
+    print(raw_redaction.describe())
 
-This produces the following output:
+This produces the following output.  Note that the 'idx' property denotes the position in the original string list to which the result pertains.
 
 .. code-block:: console
 
-    My name is Matthew, and today I am demoing Textual, a software product created by Google.
-    {
-        "start": 11,
-        "end": 15,
-        "label": "NAME_GIVEN",
-        "text": "John",
-        "score": 0.9
-    }
-    {
-        "start": 79,
-        "end": 84,
-        "label": "ORGANIZATION",
-        "text": "Tonic",
-        "score": 0.9
-    }
-
-Note that LLM Synthesis is non-deterministic — you will likely get different results each time you run.
+    [ORGANIZATION_5Ve7OH] was founded in [DATE_TIME_DnuC1]
+    {
+      "start": 0,
+      "end": 5,
+      "new_start": 0,
+      "new_end": 21,
+      "label": "ORGANIZATION",
+      "text": "Tonic",
+      "score": 0.9,
+      "language": "en",
+      "new_text": "[ORGANIZATION_5Ve7OH]"
+    }
+    {
+      "start": 21,
+      "end": 25,
+      "new_start": 37,
+      "new_end": 54,
+      "label": "DATE_TIME",
+      "text": "2018",
+      "score": 0.9,
+      "language": "en",
+      "new_text": "[DATE_TIME_DnuC1]"
+    }
+    [NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person
+    {
+      "start": 0,
+      "end": 4,
+      "new_start": 0,
+      "new_end": 18,
+      "label": "NAME_GIVEN",
+      "text": "John",
+      "score": 0.9,
+      "language": "en",
+      "new_text": "[NAME_GIVEN_dySb5]"
+    }
+    {
+      "start": 5,
+      "end": 10,
+      "new_start": 19,
+      "new_end": 39,
+      "label": "NAME_FAMILY",
+      "text": "Smith",
+      "score": 0.9,
+      "language": "en",
+      "new_text": "[NAME_FAMILY_7w4Db3]"
+    }
 
 Redact JSON data
 ----------------
@@ -220,3 +214,83 @@ To redact sensitive information from HTML, pass the HTML document string to the
     xml_redaction = textual.redact_html(html_content)
 
 The response includes entity level information, including the XPATH at which the sensitive entity is found. The start and end positions are relative to the beginning of thhe XPATH location where the entity is found.
+
+Choosing tokenization or synthesis for raw text
+-----------------------------------------------
+You can choose whether a given entity is synthesized or tokenized.  By default, all entities are tokenized.  To specify which entities to synthesize or tokenize, use the `generator_config` parameter.  This works the same way for all of the `redact` functions.
+
+The following example passes the same string to the `redact` method, but sets some entities to `Synthesis`, which replaces them with realistic values:
+
+.. code-block:: python
+
+    from tonic_textual.redact_api import TextualNer
+
+    textual = TextualNer()
+    generator_config = {"NAME_GIVEN":"Synthesis", "ORGANIZATION":"Synthesis"}
+    raw_synthesis = textual.redact(
+        "My name is John, and today I am demoing Textual, a software product created by Tonic", 
+        generator_config=generator_config)
+    print(raw_synthesis.describe())
+
+This produces the following output:
+
+.. code-block:: console
+
+    My name is Alfonzo, and today I am demoing Textual, a software product created by New Ignition Worldwide
+    {
+        "start": 11,
+        "end": 15,
+        "new_start": 11,
+        "new_end": 18,
+        "label": "NAME_GIVEN",
+        "text": "John",
+        "score": 0.9,
+        "language": "en",
+        "new_text": "Alfonzo"
+    }
+    {
+        "start": 79,
+        "end": 84,
+        "new_start": 82,
+        "new_end": 104,
+        "label": "ORGANIZATION",
+        "text": "Tonic",
+        "score": 0.9,
+        "language": "en",
+        "new_text": "New Ignition Worldwide"
+    }          
+
+Using LLM synthesis
+-------------------
+The following example passes the same string to the `llm_synthesis` method:
+
+.. code-block:: python
+
+    from tonic_textual.redact_api import TextualNer
+
+    textual = TextualNer()
+
+    raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
+    print(raw_synthesis.describe())
+
+This produces the following output:
+
+.. code-block:: console
+
+    My name is Matthew, and today I am demoing Textual, a software product created by Google.
+    {
+        "start": 11,
+        "end": 15,
+        "label": "NAME_GIVEN",
+        "text": "John",
+        "score": 0.9
+    }
+    {
+        "start": 79,
+        "end": 84,
+        "label": "ORGANIZATION",
+        "text": "Tonic",
+        "score": 0.9
+    }
+
+Note that LLM synthesis is non-deterministic: you will likely get different results each time you run it.