diff --git a/docs/architecture-decisions/0009-define-categorization-architecture.md b/docs/architecture-decisions/0009-define-categorization-architecture.md
new file mode 100644
index 0000000..abb3d51
--- /dev/null
+++ b/docs/architecture-decisions/0009-define-categorization-architecture.md
@@ -0,0 +1,427 @@
+# 9. Define categorization architecture
+Date: 2024-09-06
+## Status
+## Context
+We need to define the data model and workflow for TACOS and its users to place search terms into categories. This
+includes a discussion about how those categories themselves will be represented (and what they are), and how existing
+structures like Detectors contribute to that categorization activity.
+A future decision, which should be considered now although not yet resolved, is how to enable users to validate these
+categorization actions.
+### The relationship between Terms, Detectors, and Categories
+At a very high level, TACOS works according to the following flowchart:
+flowchart LR
+ Terms
+ Detectors
+ Categories
+ Terms -- are evaluated by --> Detectors
+ Detectors <-- are mapped to --> Categories
+ Categories -- get linked with --> Terms
+Search terms are received from a contributing system, and are evaluated by a set of Detectors which look for specific
+patterns. Those Detectors are mapped to one or more Categories. As a result of these detections and their relationship
+with each category, TACOS is able to calculate the strength of the link between each term and category.
+The decision being documented here is how we achieve this relationship.
+## Options considered
+We evaluated multiple ways of implementing these relationships through prototyping, diagramming, and extensive
+discussions. Each is documented here.
+Each of the options described below uses the same graphic language, with three groups of elements distinguished by
+outline color:
+* Terms (green outline), which flow in continuously with Search Events;
+* A knowledge graph (orange outline), which includes the categories, detectors, and relationships
+ between the two which TACOS defines and maintains, and which is consulted during categorization; and
+* The linkages between these terms and the graph (blue dashed outline), which record which signals are
+ detected in each term, and how those signals are interpreted to place the term into a category.
+A simple way to describe the Categorization workflow would be to say that Categorization involves populating the blue
+tables in the diagrams below.
+### Prototype Zero
+The simplest option to relate these elements is a single three-way join model, which would have pointers back to each
+of the Term, Detector, and Category models.
+ direction TB
+ Term <-- Link
+ Detector <-- Link
+ Category <-- Link
+ class Term
+ Term: +Integer id
+ Term: +String phrase
+ class Category
+ Category: +Integer id
+ Category: +String name
+ class Link:::styleClass
+ Link: +Integer id
+ Link: +Integer term_id
+ Link: +Integer category_id
+ Link: +Integer detector_id
+ class Detector
+ Detector: +Integer id
+ Detector: +String name
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Category fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detector fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Link fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+This option was rejected almost immediately because it does not allow for enough flexibility and would spawn far too
+many extraneous records.
+### Prototype A
+The "A" prototype defined its linking records in two large models. The `Detection` model would record the relationship
+between every `Term` and each detector in the application, with a field for each output. The `Categorization` model
+would then build upon those detections, with a field for a calculated score according to each category. The category
+with the highest score would finally be stored in the `Term` model for better performance.
+The knowledge graph in this prototype would be comparatively sparse, with models for each lookup-style detector. The
+relationships between detectors and categories would be defined directly within methods in the `Categorization` model.
+ direction LR
+ Term --< Detection: has many
+ Detection <-- Categorization: based on
+ Categorization --> SuggestedResource: looks up
+ Detection --> SuggestedResource: looks up
+ Detection --> Journal: looks up
+ class Term
+ Term: +Integer id
+ Term: +String phrase
+ Term: +Enum category
+ class SuggestedResource
+ SuggestedResource: +Integer id
+ SuggestedResource: +String title
+ SuggestedResource: +String url
+ SuggestedResource: +String phrase
+ SuggestedResource: +String fingerprint
+ SuggestedResource: +Enum category
+ SuggestedResource: calculateFingerprint()
+ class Journal
+ Journal: +Integer id
+ Journal: +String title
+ class Detection
+ Detection: +Integer id
+ Detection: +Integer term_id
+ Detection: +Integer detector_version
+ Detection: +Boolean DOI
+ Detection: +Boolean ISBN
+ Detection: +Boolean ISSN
+ Detection: +Boolean PMID
+ Detection: +Boolean Journal
+ Detection: +Boolean SuggestedResource
+ Detection: initialize()
+ Detection: setDetectionVersion()
+ Detection: recordDetections()
+ Detection: recordPatterns()
+ Detection: recordJournals()
+ Detection: recordSuggestedResource()
+ class Categorization
+ Categorization: +Integer id
+ Categorization: +Integer detection_id
+ Categorization: +Float information_score
+ Categorization: +Float navigation_score
+ Categorization: +Float transaction_score
+ Categorization: initialize()
+ Categorization: assign()
+ Categorization: evaluate()
+ Categorization: calculateAll()
+ Categorization: calculateInformation()
+ Categorization: calculateNavigation()
+ Categorization: calculateTransaction()
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Category fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detector fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Journal fill:#000,stroke:#fc8d62,color:#fc8d62
+ style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detection fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+ style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+A benefit of this prototype is that the `Detection` and `Categorization` models would be very intuitive to work with,
+and allow for repeated classification as our application evolves. Querying these models from the controller level would
+be very simple.
+An area of uncertainty in this prototype was how to calculate confidence values and categorization scores for each
+detector and category. We discussed multiple options for this question, but ultimately did not decide on a single
+approach.
+### Prototype B
+The "B" prototype makes a different choice for recording both the knowledge graph and the linkages to the terms flowing
+into the application. The knowledge graph is more explicitly modeled in the database, with models for `Category`,
+`Detectinator`, and the `DetectinatorCategory` model which maps between the two.
+Because each of these records is now a separate entry, this prototype further breaks up the large models for detection
+and categorization outputs. The detection result is spread across multiple records in the `TermDetectinator` and
+`TermSuggestedResource` models. The final categorization process is also recorded in multiple `TermCategory` records.
+Because of this dispersion of information across multiple records, the methods needed to do the work end up being
+defined in the `Term` model - shown here as methods like `evaluate_detectinators()` and `categorize()`.
+ direction LR
+ Term >-- TermDetectinator
+ TermDetectinator --> Detectinator
+ Category <-- DetectinatorCategory
+ DetectinatorCategory --> Detectinator
+ Term --> TermCategory
+ TermCategory <-- Category
+ SuggestedResource --> Category
+ Term <-- TermSuggestedResource
+ TermSuggestedResource --> SuggestedResource
+ class Term
+ Term: +Integer id
+ Term: +String phrase
+ Term: categorize()
+ Term: evaluate_detectinators()
+ Term: evaluate_identifiers()
+ Term: evaluate_journals()
+ Term: evaluate_suggested_resources()
+ class TermDetectinator
+ TermDetectinator: +Integer term_id
+ TermDetectinator: +Integer detector_id
+ TermDetectinator: +Boolean result
+ class Detectinator
+ Detectinator: +Integer id
+ Detectinator: +String name
+ Detectinator: +Float confidence
+ class Category
+ Category: +Integer id
+ Category: +String name
+ Category: +String note
+ class DetectinatorCategory
+ DetectinatorCategory: +Integer detectinator_id
+ DetectinatorCategory: +Integer category_id
+ DetectinatorCategory: +Float confidence
+ class TermCategory
+ TermCategory: +Integer term_id
+ TermCategory: +Integer category_id
+ TermCategory: +Float confidence
+ TermCategory: +Integer user_id
+ class SuggestedResource
+ SuggestedResource: +Integer id
+ SuggestedResource: +String title
+ SuggestedResource: +String fingerprint
+ SuggestedResource: +URL url
+ SuggestedResource: +Integer category_id
+ class TermSuggestedResource
+ TermSuggestedResource: +Integer term_id
+ TermSuggestedResource: +Integer suggested_resource_id
+ TermSuggestedResource: +Boolean result
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Category fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detectinator fill:#000,stroke:#fc8d62,color:#fc8d62
+ style DetectinatorCategory fill:#000,stroke:#fc8d62,color:#fc8d62
+ style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62
+ style TermDetectinator fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+ style TermSuggestedResource fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+ style TermCategory fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+One immediate advantage of this approach is that we have appropriate fields in the knowledge graph for storing
+confidence values, which would be multiplied together to generate the final score that is recorded in the `confidence`
+field of the `TermCategory` records.
+A drawback to this prototype is the duplication between the Detectinator and SuggestedResource models (remembering that
+SuggestedResource is one of the application's detectors). While this set of models was meant to allow different
+SuggestedResource records to be affiliated with different categories, that feature can be supported via code, rather
+than relying on the data model.
+### Prototype C
+The "C" prototype was a further evolution of the "A" prototype, which attempted to combine all detection and
+categorization outputs in a single model. By changing the `Detection` table to store floats rather than boolean
+values, we attempted to reduce the number of models needed in the application.
+ direction LR
+ Term --< Detection: has many
+ class Term
+ Term: +Integer id
+ Term: +String phrase
+ Term: calculateCategory()
+ class Detection
+ Detection: +Integer id
+ Detection: +Integer term_id
+ Detection: +Integer detector_version
+ Detection: +Float DOI
+ Detection: +Float ISBN
+ Detection: +Float ISSN
+ Detection: +Float PMID
+ Detection: +Float Journal
+ Detection: +Float SuggestedResource
+ Detection: initialize()
+ Detection: setDetectionVersion()
+ Detection: recordDetections()
+ Detection: recordPatterns()
+ Detection: recordJournals()
+ Detection: recordSuggestedResource()
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Detection fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+Development of this prototype was halted fairly early, after realizing that the calculation of categorization values
+would not necessarily be helped by combining models in this way.
+### Prototype D
+The "D" prototype was a further evolution of the "B" prototype, focused primarily on removing the separate structures
+for SuggestedResources. There is still a knowledge graph spread across Detectors, Categories, and the mapping between
+them. Detection and Categorization results are also spread across multiple link records.
+Further refinements in this prototype are the inclusion of a `detector_version` value in the Detection model, and the
+removal of a `user_id` field from the Categorization model (we are still debating the role of user-supplied
+categorizations, compared to the user-supplied validation of existing categorizations).
+ direction LR
+ Term "1" --> "1..*" Detection
+ Term "1" --> "0..*" Categorization
+ Detection "0..*" --> "1" Detector
+ DetectionCategory "0..*" --> "1" Category
+ Categorization "0..*" --> "1" Category
+ Detector "1" --> "0..*" DetectionCategory
+ class Term
+ Term: +Integer id
+ Term: +String phrase
+ Term: calculateCategory()
+ class Detection
+ Detection: +Integer id
+ Detection: +Integer term_id
+ Detection: +Integer detector_id
+ Detection: +Integer detector_version
+ Detection: +Float confidence
+ Detection: initialize()
+ Detection: setDetectionVersion()
+ Detection: recordDetections()
+ Detection: recordPatterns()
+ Detection: recordJournals()
+ Detection: recordSuggestedResource()
+ class Detector
+ Detector: +Integer id
+ Detector: +String name
+ Detector: +Float confidence
+ Detector: incrementConfidence()
+ Detector: decrementConfidence()
+ class Category
+ Category: +Integer id
+ Category: +String name
+ class Categorization
+ Categorization: +Integer category_id
+ Categorization: +Integer term_id
+ Categorization: +Float confidence
+ class DetectionCategory
+ DetectionCategory: +Integer id
+ DetectionCategory: +Integer detector_id
+ DetectionCategory: +Integer category_id
+ DetectionCategory: +Float confidence
+ DetectionCategory: incrementConfidence()
+ DetectionCategory: decrementConfidence()
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Category fill:#000,stroke:#fc8d62,color:#fc8d62
+ style DetectionCategory fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detector fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+ style Detection fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+The significant benefit of this prototype is the removal of the SuggestedResource models, which leaves a more
+straightforward data model which records only Detectors and Categories, without special consideration for any one
+detector.
+## Decision
+We will pursue the "D" prototype, with explicit models for the application's knowledge graph, and detection and
+categorization outputs spread across linking records rather than concentrated in a single record.
+## Consequences
+There are still unknowns which we will confront while implementing this design. Among those are how the user permissions
+model will intersect with these models, and how the controller and view layers will be defined to enable this to
+function. Additionally, while we have discussed the process of calculating confidence values, writing this
+implementation may reveal shortcomings we have not yet recognized.
+Our commitment at this stage, due to these uncertainties, is that we will further develop the "D" prototype by
+attempting to implement it. Only time will tell whether we will successfully do so, or if we will need to change course.
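To make the Prototype D decision above more concrete, here is a minimal Python sketch of how a term's score for each category might be derived from the linking records. It assumes, as described for Prototype B, that the confidence values are multiplied together. Taking the highest product when several detectors map to the same category is purely an illustrative assumption, as are the function name `calculate_category` and the sample confidence values. This is a sketch, not the TACOS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detector:
    id: int
    name: str
    confidence: float  # how much this detector is trusted overall


@dataclass(frozen=True)
class DetectionCategory:
    detector_id: int
    category_id: int
    confidence: float  # strength of the detector-to-category mapping


@dataclass(frozen=True)
class Detection:
    term_id: int
    detector_id: int
    detector_version: int
    confidence: float  # strength of this particular detection


def calculate_category(term_id, detections, detectors, mappings):
    """Return {category_id: score} for one term.

    Assumption: score = detection * detector * mapping confidence, and the
    highest product wins when several detectors map to the same category.
    """
    detectors_by_id = {d.id: d for d in detectors}
    scores = {}
    for detection in detections:
        if detection.term_id != term_id:
            continue
        detector = detectors_by_id[detection.detector_id]
        for mapping in mappings:
            if mapping.detector_id != detector.id:
                continue
            score = detection.confidence * detector.confidence * mapping.confidence
            scores[mapping.category_id] = max(scores.get(mapping.category_id, 0.0), score)
    return scores


# Hypothetical sample data: a DOI detector mapped to category 1 (Transactional).
detectors = [Detector(1, "DOI", 0.95)]
mappings = [DetectionCategory(1, 1, 0.9)]
detections = [Detection(term_id=4, detector_id=1, detector_version=1, confidence=1.0)]
scores = calculate_category(4, detections, detectors, mappings)
```

With these made-up values, the score linking term 4 to category 1 is the product 1.0 × 0.95 × 0.9, or about 0.855. Each returned entry would correspond to one `Categorization` record.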
diff --git a/docs/explanation/categorization-workflow.md b/docs/explanation/categorization-workflow.md
new file mode 100644
index 0000000..35037ae
--- /dev/null
+++ b/docs/explanation/categorization-workflow.md
@@ -0,0 +1,107 @@
+# Categorization workflow
+## Conceptual diagram
+There are three basic models which we are attempting to relate to each other:
+Terms, Detectors, and Categories. The relationship looks like this:
+flowchart LR
+ Terms
+ Detectors
+ Categories
+ Terms -- are evaluated by --> Detectors
+ Detectors <-- are mapped to --> Categories
+ Categories -- get linked with --> Terms
+## Example data
+### Terms
+| id | phrase |
+|----|--------|
+| 1 | web of science |
+| 2 | pitchbook |
+| 3 | vaibbhav taraate |
+| 4 | doi.org/10.1080/17460441.2022.2084607 |
+We have received more than 40,000 unique search terms from the Bento system in
+the first three months of TACOS' operation.
+### Categories
+| id | name | note |
+|----|------|------|
+| 1 | Transactional | The user wants to complete an _action_ (i.e. to receive an item) |
+| 2 | Navigational | The user wants to reach a _place_ which might be a web page, or perhaps talk to a person. |
+| 3 | Informational | The user wants _information_ about an idea or concept. |
+Thus far, we have only focused on these three categories of search intent. It
+should be noted that the SEO literature references additional categories, such
+as "commercial" or "conversational".
+Additionally, some of these categories may be sub-divided. Transactional
+searches might be looking for a book, a journal article, or a thesis.
+Navigational searches might be satisfied by visiting the desired webpage, or
+contacting a liaison.
+### Detectors
+| id | name | note |
+|----|------|------|
+| 1 | DOI | Regex detection |
+| 2 | ISBN | Regex detection |
+| 3 | ISSN | Regex detection |
+| 4 | PMID | Regex detection |
+| 5 | Journal name | Term lookup |
+| 6 | Suggested resource | Term lookup |
+Our detectors so far fall into one of two broad types: those which use regular expressions to detect patterns within
+the search term, and those which check whether the search term appears in an external list of resources.
+## Workflow
+Most of the time, this workflow will be followed automatically when a new search phrase is recorded by the application
+for the first time. Occasionally, we will re-run this workflow (either manually, or via a schedule) when the application
+changes enough that a prior workflow is no longer valid. Our method of determining when prior work is no longer valid is
+to rely on the `detector_version` value in the Detection model.
+When a search phrase is received which has already been categorized, the prior scores are used without needing to follow
+this workflow again.
+### Pass the term through our suite of detectors
+Passing the search phrase through all of our detectors is done via a method like `recordDetections()`, which is part of
+the `Detection` model. Should only a subset of detectors need to be consulted, there are internal methods which can
+accomplish this.
+### Calculate the categorization scores based on these detections
+The `Term` model has a method which looks up all the Detectors which found a positive result for that term. This
+`calculateCategory()` method performs the necessary math to determine the score for each Category in the system, and
+creates the needed `Categorization` records. The calculated score is stored in the `confidence` field of this model.
+One detector in this application is associated with different categories on a record-by-record basis - the
+SuggestedResource detector. The `calculateCategory()` method includes a lookup for this detector to make sure that any
+detections are scored appropriately.
+### Human validation of these operations
+There will be an ability for humans to inspect these operations, and to submit feedback about any actions which were
+not correct. These validations will be used to further refine the confidence values associated with our `Detector` and
+`DetectionCategory` records, as well as to refine the operation of the detectors, or the mappings between detectors and
+categories.
+This validation workflow has not been defined yet, nor has the data model been expanded to support this feedback. We do
+anticipate, however, that successful or unsuccessful validations would end up adjusting the relevant confidence values
+via the `incrementConfidence()` or `decrementConfidence()` methods.
+Further discussion of this design can be found in the [Classes diagram](../reference/classes.md).
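The detection step of the workflow above, together with the `detector_version` check, can be sketched in Python as follows. The regular expressions are deliberately simplified stand-ins rather than the patterns TACOS uses, and `DETECTOR_VERSION`, `record_detections`, `needs_redetection`, and the sample lookup list are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical version marker; bumping it would invalidate earlier detections.
DETECTOR_VERSION = 2

# Simplified, illustrative patterns (assumptions, not the production regexes).
PATTERNS = {
    "DOI": re.compile(r"\b10\.\d{4,9}/\S+"),
    "ISSN": re.compile(r"\b\d{4}-\d{3}[\dXx]\b"),
    "PMID": re.compile(r"\bPMID:?\s*\d+\b", re.IGNORECASE),
}

# Stand-in for an external list of resources used by lookup-style detectors.
SUGGESTED_RESOURCES = {"web of science", "pitchbook"}


def record_detections(phrase):
    """Return the names of the detectors that fire for a search phrase."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(phrase)]
    if phrase.strip().lower() in SUGGESTED_RESOURCES:
        hits.append("Suggested resource")
    return hits


def needs_redetection(stored_version):
    """A stored detection is stale when it predates the current version."""
    return stored_version is None or stored_version < DETECTOR_VERSION
```

In this sketch, a phrase that has never been seen, or whose stored detections carry an older version number, would be passed through `record_detections` again. Otherwise the prior scores would be reused.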
diff --git a/docs/explanation/work-activity-analysis.md b/docs/explanation/work-activity-analysis.md
index 0ba76fd..07cd787 100644
--- a/docs/explanation/work-activity-analysis.md
+++ b/docs/explanation/work-activity-analysis.md
@@ -50,7 +50,7 @@ _Our initial minimal product will only include this staff workflow to allow us t
graph TD
- A("Liaison 🧑") --> C{Dashboard}
+ A("Expert 🧑") --> C{Dashboard}
C --> E(View uncategorized)
G --> E
E --> G("Enter categorization (and optional comments)")
@@ -68,7 +68,7 @@ One way to frame this is "Is this search a match with this category" (a yes/no q
graph TD
- A("Liaison 🧑") --> C{Dashboard}
+ A("Expert 🧑") --> C{Dashboard}
G --> D
C --> D(View algorithm predictions)
D --> F{Correct prediction?}
diff --git a/docs/reference/classes.md b/docs/reference/classes.md
index dd28493..2b26047 100644
--- a/docs/reference/classes.md
+++ b/docs/reference/classes.md
@@ -1,56 +1,90 @@
+# Modeling categorization
+The application includes the following entities, most of which can be grouped into one of the following three areas:
+* Search activity, which flows in continuously with Terms and Search Events;
+* A knowledge graph, which includes the categories, detectors, and relationships
+ between the two which TACOS defines and maintains, and which is consulted during categorization; and
+* The linkages between these search terms and the graph, which record which signals are
+ detected in each term, and how those signals are interpreted to place the term into a category.
- direction TB
+ direction LR
- AdminUser --> User : Is a Type of
Term --> SearchEvent : has many
- User --> Categorization : Creates a
- User --> Category : Proposes a
- Categorization --> Term : Includes a
- Categorization --> Category : Includes a
+ Term "1" --> "1..*" Detection
+ Term "1" --> "0..*" Categorization
+ Detection "0..*" --> "1" Detector
+ DetectionCategory "0..*" --> "1" Category
+ Categorization "0..*" --> "1" Category
+ Detector "1" --> "0..*" DetectionCategory
+ class User
+ User: +String uid
+ User: +String email
+ User: +Boolean admin
class Term
Term: id
Term: +String phrase
- Term: calculate_certainty(term)
- Term: list_unique_terms_with_counts()
- Term: uncategorized_term()
- Term: categorized_term()
+ Term: calculateCategory()
class SearchEvent
SearchEvent: +Integer id
SearchEvent: +Integer term_id
SearchEvent: +String source
- SearchEvent: +Timestamp timestamp
+ SearchEvent: +Timestamp created_at
+ SearchEvent: single_month()
- class User
- User: +String kerbid
- User: +Boolean admin
- User: categorize_term(term, category, notes (optional))
- User: propose_category(name, description, reason)
- User: view_next_term()
- class AdminUser
- AdminUser: approve_category()
- AdminUser: create_category()
- AdminUser: upload_batch()
- AdminUser: view_proposed_categories()
+ class Detection
+ Detection: +Integer id
+ Detection: +Integer term_id
+ Detection: +Integer detector_id
+ Detection: +Integer detector_version
+ Detection: +Float confidence
+ Detection: initialize()
+ Detection: setDetectionVersion()
+ Detection: recordDetections()
+ Detection: recordPatterns()
+ Detection: recordJournals()
+ Detection: recordSuggestedResource()
+ class Detector
+ Detector: +Integer id
+ Detector: +String name
+ Detector: +Float confidence
+ Detector: incrementConfidence()
+ Detector: decrementConfidence()
class Category
+ Category: +Integer id
Category: +String name
- Category: +String reason
- Category: +Boolean approved
- Category: +Text description
class Categorization
- Categorization: id
Categorization: +Integer category_id
Categorization: +Integer term_id
- Categorization: +Integer user_id
- Categorization: +Text notes
+ Categorization: +Float confidence
+ class DetectionCategory
+ DetectionCategory: +Integer id
+ DetectionCategory: +Integer detector_id
+ DetectionCategory: +Integer category_id
+ DetectionCategory: +Float confidence
+ DetectionCategory: incrementConfidence()
+ DetectionCategory: decrementConfidence()
+ style SearchEvent fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Term fill:#000,stroke:#66c2a5,color:#66c2a5,stroke-width:4px;
+ style Category fill:#000,stroke:#fc8d62,color:#fc8d62
+ style DetectionCategory fill:#000,stroke:#fc8d62,color:#fc8d62
+ style Detector fill:#000,stroke:#fc8d62,color:#fc8d62
- class Report
- Report: percent_categorized()
- Report: category_history()
+ style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
+ style Detection fill:#000,stroke:#8da0cb,color:#8da0cb,stroke-dasharray: 3 5;
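The `incrementConfidence()` and `decrementConfidence()` methods in the diagram above are not yet specified, so the following Python sketch is only one possible reading. It assumes a fixed step size and a confidence value clamped to the range 0.0 to 1.0. Both of those choices, along with the `STEP` constant and the snake_case method names, are assumptions made for illustration.

```python
from dataclasses import dataclass

STEP = 0.05  # hypothetical adjustment size per validation


@dataclass
class Detector:
    id: int
    name: str
    confidence: float

    def increment_confidence(self, step=STEP):
        """Raise confidence after a successful validation, capped at 1.0."""
        self.confidence = min(1.0, self.confidence + step)

    def decrement_confidence(self, step=STEP):
        """Lower confidence after an unsuccessful validation, floored at 0.0."""
        self.confidence = max(0.0, self.confidence - step)
```

The same pair of methods could be applied to `DetectionCategory` records. Whether a fixed step is the right adjustment rule is an open question that the validation workflow would need to settle.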