From bba78812300689db102e6ede4691e0395d9d32f2 Mon Sep 17 00:00:00 2001 From: Matthew Bernhardt Date: Fri, 23 Aug 2024 10:21:43 -0400 Subject: [PATCH] Write up documentation for workflow so far Update validation workflow doc Update workflow explanation This adds the Ruby code block to categorize all terms Separate prototype data model document This needs to be cleaned up, along with classes.md Further documentation work Updates to documentation --- docs/explanation/validation-workflow-a.md | 161 ++++++++++++++ docs/explanation/validation-workflow-b.md | 3 + docs/reference/classes-prototype-a.md | 226 +++++++++++++++++++ docs/reference/classes-prototype-b.md | 256 ++++++++++++++++++++++ docs/reference/classes-prototype-zero.md | 36 +++ docs/reference/classes.md | 228 +------------------ 6 files changed, 686 insertions(+), 224 deletions(-) create mode 100644 docs/explanation/validation-workflow-a.md create mode 100644 docs/explanation/validation-workflow-b.md create mode 100644 docs/reference/classes-prototype-a.md create mode 100644 docs/reference/classes-prototype-b.md create mode 100644 docs/reference/classes-prototype-zero.md diff --git a/docs/explanation/validation-workflow-a.md b/docs/explanation/validation-workflow-a.md new file mode 100644 index 0000000..6264b9f --- /dev/null +++ b/docs/explanation/validation-workflow-a.md @@ -0,0 +1,161 @@ +# The categorization and validation workflow + +This document describes the workflow for categorizing, and then validating, how +a given term has been processed by TACOS. + +## Preparation + +Pick what record we're working with. In production, this would happen as new +terms are recorded, but for now we're working with a randomly chosen example. + +```ruby +t = Term.all.sample +``` + +## Pass the term through our suite of detectors + +This assumes that all of our detection algorithms are integrated with the +Detector model, which creates a record of their output for processing during the +Categorization phase. 
+ +```ruby +d = Detection.new(t) +d.save +``` + +So far, the Detection model only records activations by each detector, as +boolean values. Future development might add more details, such as which records +are matched, or what external lookups return. It might also be relevant to note +whether multiple patterns are found. + +```ruby +irb(main):013> d +=> +# +``` + +In this example, none of the detectors found anything. + +The `detection_version` value in these records is stored in ENV, and is +incremented as our detection algorithms change. This helps identify whether a +Detection is outdated and needs to be refreshed. + +## Generate the Categorization values based on these detections + +```ruby +c = Categorization.new(d) +c.save +``` + +The creation of the record includes the calculation of scores for each of the +three categories. So far, the logic is exceedingly simple, but this can +be made more nuanced with time. + +```ruby +irb(main):019> c +=> +# +``` + +These scores are used by the `evaluate` method to assign the term to a category, +if relevant. Because none of the detectors fired in the previous step, all of +the category scores are 0.0 and the term will be placed in the "unknown" +category. + +```ruby +t.category = c.evaluate +t.save +``` + +There is also an `assign` method at the moment, which combines the above steps. +This may not make sense in production, however. + +The result of the Categorization workflow is that the original Term record has +now been placed in a category: + +```ruby +irb(main):008> t +=> +# +``` + +From end to end, the code to categorize all untouched term records is then this: + +```ruby +Term.where("category is null").each { |t| + d = Detection.new(t) + d.save + c = Categorization.new(d) + c.assign +} +``` + +## Validation + +Humans will be asked to inspect the outcomes of the previous steps, and provide +feedback about whether any decisions were made incorrectly. 
+ +```ruby +v = Validation.new(c) +v.save +``` + +Validation records have a boolean flag for each decision which went into the +process thus far: + +```ruby +irb(main):011> v +=> +# +``` + +This includes a flag for the final result, each component score, each individual +detection, and a final flag that indicates the Term itself needs review. The +intent of this final flag is for the case where a search term is somehow +problematic and needs to be expunged. + +There are no methods yet on this model, because all values are meant to be set +individually via the web interface. + +There is not - yet - a notes field on the Validation model, but this is +something that we've discussed in case the validator has more detailed feedback +about some part of the decision-making that is being reviewed. + diff --git a/docs/explanation/validation-workflow-b.md b/docs/explanation/validation-workflow-b.md new file mode 100644 index 0000000..5578bec --- /dev/null +++ b/docs/explanation/validation-workflow-b.md @@ -0,0 +1,3 @@ +# The categorization and validation workflow + +Need to write up how Prototype B would operate from start to end... diff --git a/docs/reference/classes-prototype-a.md b/docs/reference/classes-prototype-a.md new file mode 100644 index 0000000..5b26d61 --- /dev/null +++ b/docs/reference/classes-prototype-a.md @@ -0,0 +1,226 @@ +# Prototype A ("Code") + +This prototype relies on fewer tables, with one record in each, and leans more heavily on behavior in code. + +## Shared preface + +The same color scheme is used for both prototypes: + +* Terms (green), which flow in continuously with Search Events; +* A knowledge graph (orange), which includes the categories, detectors, and relationships + between the two which TACOS defines and maintains, and which is consulted during categorization; and +* The linkages between these terms and the graph (blue), which record which signals are + detected in each term, and how those signals are interpreted to place the term into a category. 
+ +A simple way to describe the Categorization workflow would be to say that Categorization involves populating the blue +tables in the diagrams below. + +## Categorization + +```mermaid +classDiagram + direction LR + + Term --< Detection: has many + Detection <-- Categorization: based on + Categorization --> SuggestedResource: looks up + Detection --> SuggestedResource: looks up + Detection --> Journal: looks up + + class Term + Term: +Integer id + Term: +String phrase + Term: +Enum category + + class SuggestedResource + SuggestedResource: +Integer id + SuggestedResource: +String title + SuggestedResource: +String url + SuggestedResource: +String phrase + SuggestedResource: +String fingerprint + SuggestedResource: +Enum category + SuggestedResource: calculateFingerprint() + + class Journal + Journal: +Integer id + Journal: +String title + + class Detection + Detection: +Integer id + Detection: +Integer term_id + Detection: +Integer detector_version + Detection: +Boolean DOI + Detection: +Boolean ISBN + Detection: +Boolean ISSN + Detection: +Boolean PMID + Detection: +Boolean Journal + Detection: +Boolean SuggestedResource + Detection: initialize() + Detection: setDetectionVersion() + Detection: recordDetections() + Detection: recordPatterns() + Detection: recordJournals() + Detection: recordSuggestedResource() + + class Categorization + Categorization: +Integer id + Categorization: +Integer detection_id + Categorization: +Float information_score + Categorization: +Float navigation_score + Categorization: +Float transaction_score + Categorization: initialize() + Categorization: assign() + Categorization: evaluate() + Categorization: calculateAll() + Categorization: calculateInformation() + Categorization: calculateNavigation() + Categorization: calculateTransaction() + + style Term fill:#000,stroke:#66c2a5,color:#66c2a5 + + style Category fill:#000,stroke:#fc8d62,color:#fc8d62 + style Detector fill:#000,stroke:#fc8d62,color:#fc8d62 + style Journal 
fill:#000,stroke:#fc8d62,color:#fc8d62 + style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62 + + style Detection fill:#000,stroke:#8da0cb,color:#8da0cb + style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb +``` + +### Order of operations + +1. A new `Term` is registered. +2. A `Detection` record for that `Term` is created (which allows repeat detection operations as TACOS gains new + capabilities). +3. The various `Detection` records (either the most recent for each term, or all detections over time) are processed via + code to generate scores for each potential category. These results are stored as `Categorization` records. +4. The three category scores are compared, and the one with the highest score is stored back in the `Term` record. + +### Category values + +There is no `Category` table, but two models have separate enumerated fields. The `Detector::SuggestedResource` model +has three possible values (Informational, Navigational, and Transactional), while the `Term` model has an additional +value ("Unknown") which is assigned during Categorization if two category scores are equal. + +(This lack of a category table is not a fundamental aspect of this prototype, but it does indicate the general choice to +rely on code, rather than database records, as much as possible. Such a model could be accommodated, or perhaps +implemented via a shared helper method.) + +### Calculating the category scores + +At the moment, category scores are assigned in methods like: + +```ruby +# FILE: app/models/categorization.rb + def calculate_transactional + self.transaction_score = 0.0 + self.transaction_score = 1.0 if %i[doi isbn issn pmid journal].any? do |signal| + self.detection[signal] + end + self.transaction_score = 1.0 if Detector::SuggestedResource.full_term_match(self.detection.term.phrase).first&.category == 'transactional' + end +``` + +This is effectively an "all or nothing" approach, where any detection at all results in the maximum possible score. 
This +lacks nuance, obviously, and we've talked about ways to include a confidence value in these calculations. As yet, this +prototype has not attempted to include that feature however. + +**Note:** I've tried to anticipate how to include confidence values appropriately in this prototype, and it is not at +all clear how that might happen. This gets to the mathematical operations involved in calculating the category scores, +which might need to be documented separately. + +## Validations + +```mermaid +classDiagram + direction LR + + Term --< Detection: has many + Detection <-- Categorization: based on + Categorization --> SuggestedResource: looks up + Detection --> SuggestedResource: looks up + Detection --> Journal: looks up + Categorization >-- Validation: subject to + + class Term + Term: +Integer id + Term: +String phrase + Term: +Enum category + + class SuggestedResource + SuggestedResource: +Integer id + SuggestedResource: +String title + SuggestedResource: +String url + SuggestedResource: +String phrase + SuggestedResource: +String fingerprint + SuggestedResource: +Enum category + SuggestedResource: calculateFingerprint() + + class Journal + Journal: +Integer id + Journal: +String title + + class Detection + Detection: +Integer id + Detection: +Integer term_id + Detection: +Integer detector_version + Detection: +Boolean DOI + Detection: +Boolean ISBN + Detection: +Boolean ISSN + Detection: +Boolean PMID + Detection: +Boolean Journal + Detection: +Boolean SuggestedResource + Detection: initialize() + Detection: setDetectionVersion() + Detection: recordDetections() + Detection: recordPatterns() + Detection: recordJournals() + Detection: recordSuggestedResource() + + class Categorization + Categorization: +Integer id + Categorization: +Integer detection_id + Categorization: +Float information_score + Categorization: +Float navigation_score + Categorization: +Float transaction_score + Categorization: initialize() + Categorization: assign() + Categorization: 
evaluate() + Categorization: calculateAll() + Categorization: calculateInformation() + Categorization: calculateNavigation() + Categorization: calculateTransaction() + + class Validation + Validation: +Integer id + Validation: +Integer categorization_id + Validation: +Integer user_id + Validation: +Boolean approve_transaction + Validation: +Boolean approve_information + Validation: +Boolean approve_navigation + Validation: +Boolean approve_doi + Validation: +Boolean approve_isbn + Validation: +Boolean approve_issn + Validation: +Boolean approve_pmid + Validation: +Boolean approve_journal + Validation: +Boolean approve_suggested_resource + + style Term fill:#000,stroke:#66c2a5,color:#66c2a5 + + style Category fill:#000,stroke:#fc8d62,color:#fc8d62 + style Detector fill:#000,stroke:#fc8d62,color:#fc8d62 + style Journal fill:#000,stroke:#fc8d62,color:#fc8d62 + style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62 + + style Detection fill:#000,stroke:#8da0cb,color:#8da0cb + style Categorization fill:#000,stroke:#8da0cb,color:#8da0cb + + style Validation fill:#000,stroke:#ffd407,color:#ffd407 +``` + +Validations, in this prototype, are collected in a single table with a field for each decision which came before it. As +the application expands, any new detectors or categories would result in new fields, both in the Detection or +Categorization models and also in the Validation model. + +Multiple validations are possible for a single Categorization decision, enabled by the user_id field, which allows for +feedback provided by multiple users if bandwidth allows. diff --git a/docs/reference/classes-prototype-b.md b/docs/reference/classes-prototype-b.md new file mode 100644 index 0000000..18c14ac --- /dev/null +++ b/docs/reference/classes-prototype-b.md @@ -0,0 +1,256 @@ +# Prototype B ("Data") + +This prototype relies on more models, more linking records, and as a result relies less on behavior in code. 
+ +## Shared preface + +* Terms, which flow in continuously with Search Events; +* A knowledge graph, which includes the categories, detectors, and relationships + between the two which TACOS defines and maintains, and which is consulted during categorization; and +* The linkages between these terms and the graph, which record which signals are + detected in each term, and how those signals are interpreted to place the term into a category. + +A simple way to describe the Categorization workflow would be to say that Categorization involves populating the blue +tables in the diagrams below. + +## Categorization + +```mermaid +classDiagram + direction LR + + Term >-- TermDetectinator + TermDetectinator --> Detectinator + Category <-- Mapping + Mapping --> Detectinator + Term --> TermCategory + TermCategory <-- Category + SuggestedResource --> Category + Term <-- TermSuggestedResource + TermSuggestedResource --> SuggestedResource + + class Term:::primarytable + Term: +Integer id + Term: +String phrase + Term: categorize() + Term: evaluate_detectinators() + Term: evaluate_identifiers() + Term: evaluate_journals() + Term: evaluate_suggested_resources() + + class TermDetectinator + TermDetectinator: +Integer term_id + TermDetectinator: +Integer detector_id + TermDetectinator: +Boolean result + + class Detectinator + Detectinator: +Integer id + Detectinator: +String name + Detectinator: +Float confidence + + class Category + Category: +Integer id + Category: +String name + Category: +String note + + class Mapping + Mapping: +Integer detectinator_id + Mapping: +Integer category_id + Mapping: +Float confidence + + class TermCategory + TermCategory: +Integer term_id + TermCategory: +Integer category_id + TermCategory: +Integer user_id + + class SuggestedResource + SuggestedResource: +Integer id + SuggestedResource: +String title + SuggestedResource: +String fingerprint + SuggestedResource: +URL url + SuggestedResource: +Integer category_id + + class TermSuggestedResource + 
TermSuggestedResource: +Integer term_id + TermSuggestedResource: +Integer suggested_resource_id + TermSuggestedResource: +Boolean result + + style Term fill:#000,stroke:#66c2a5,color:#66c2a5 + + style Category fill:#000,stroke:#fc8d62,color:#fc8d62 + style Detectinator fill:#000,stroke:#fc8d62,color:#fc8d62 + style Mapping fill:#000,stroke:#fc8d62,color:#fc8d62 + style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62 + + style TermDetectinator fill:#000,stroke:#8da0cb,color:#8da0cb + style TermSuggestedResource fill:#000,stroke:#8da0cb,color:#8da0cb + style TermCategory fill:#000,stroke:#8da0cb,color:#8da0cb +``` + +### The "knowledge graph" + +The relationship between Detectors and Categories would generally be set ahead of time. Detectors produce a boolean +output in the cleanest case - they either detect a signal, or they do not. Relatedly, detectors have an influence over +whether a given Category is relevant, or not: + +* If the Detector for a DOI pattern returns `true`, then this influences the `transactional` Category to a significant + degree. +* However, the Detector for a DOI pattern does almost nothing to influence the `navigational` Category. +* If Categorization is a zero-sum activity, however, the DOI pattern detector would _exclusively_ claim a Term for the + `transactional` Category - so it would effectively rule out the other two Categories. + +The exception to this Detector rule is the SuggestedResource detector - which has variability in its records. Some +SuggestedResources are in each of the three Categories, so there is a more complicated decision-making algorithm, and +thus a different set of database tables. + +### Category scores + +At the moment, category scores are assigned in methods like... + +### Order of operations + +The linkages between these tables are filled in at different moments. + +The Detector-Category linkage is determined as either set of resources is created, +and on a relatively slow cadence. 
Operationally, the links which matter are made +as new Terms flow into TACOS. + +1. A new Term is recorded in the system. +2. That Term is compared with each Detector, and any positive responses are recorded. Negative responses may be + discarded, or recorded for the sake of completeness (to confirm that the link was tested). These outcomes are stored + as several TermDetectinator records. +3. Those TermDetectinator records are then used to perform the Categorization work, comparing the confidence values of + each Detectinator and Mapping. This work results in records being created in the TermCategory table. + +### Questions + +* The application defines a `Detector` module/namespace. Ideally I want a `Detector` class for the records of our + various detectors, but I'm not sure this is possible (or I haven't figured out how). If `Detector` is not possible, + should we use an un-namespaced option like `Detectinator`, or instead go with something like `Detector::Detector` or + `Detector::Base`? + * One of the reasons why I went with an un-namespaced class here is to make defining link tables easier + (`Term_Detectinator` instead of `Term_DetectorBase`). +* The `TermDetectinator` table records the results of our suite of detectors in response to a given term. Should we + record only positive results, or should we also record negative results? + * The `Mapping` table (which should be named `CategoryDetectinator`) has a similar question - whether we should + record no-confidence mappings (for example, a DOI detection would have 0 confidence toward a navigational + categorization). + +## Validations + +Validations might get thorny in this model, because the results we are validating are spread across multiple records in +the same class. For example, a single term record like `Collins HK. When listening is spoken. doi: 10.1016/j.copsyc.2022.101402. 
PMID: 35841883.` +would result in multiple records in the `TermDetectinator` table, each of which would be subject to validation. As a +result it might make sense to embed the validation throughout the data model, rather than in a separate field? + +```mermaid +classDiagram + direction LR + + Term >-- TermDetectinator + TermDetectinator --> Detectinator + Category <-- Mapping + Mapping --> Detectinator + Term --> TermCategory + TermCategory <-- Category + SuggestedResource --> Category + Term <-- TermSuggestedResource + TermSuggestedResource --> SuggestedResource + Validation <-- ValidTermDetectinator + ValidTermDetectinator --> TermDetectinator + Validation <-- ValidTermCategory + ValidTermCategory --> TermCategory + Validation <-- ValidTermSuggestedResource + ValidTermSuggestedResource --> TermSuggestedResource + + class Term:::primarytable + Term: +Integer id + Term: +String phrase + Term: categorize() + Term: evaluate_detectinators() + Term: evaluate_identifiers() + Term: evaluate_journals() + Term: evaluate_suggested_resources() + + class TermDetectinator + TermDetectinator: +Integer term_id + TermDetectinator: +Integer detector_id + TermDetectinator: +Boolean result + + class Detectinator + Detectinator: +Integer id + Detectinator: +String name + Detectinator: +Float confidence + + class Category + Category: +Integer id + Category: +String name + Category: +String note + + class Mapping + Mapping: +Integer detectinator_id + Mapping: +Integer category_id + Mapping: +Float confidence + + class TermCategory + TermCategory: +Integer term_id + TermCategory: +Integer category_id + TermCategory: +Integer user_id + + class SuggestedResource + SuggestedResource: +Integer id + SuggestedResource: +String title + SuggestedResource: +String fingerprint + SuggestedResource: +URL url + SuggestedResource: +Integer category_id + + class TermSuggestedResource + TermSuggestedResource: +Integer term_id + TermSuggestedResource: +Integer suggested_resource_id + 
TermSuggestedResource: +Boolean result + + class Validation + Validation: +Integer id + Validation: +Integer user_id + + class ValidTermCategory + ValidTermCategory: +Integer validation_id + ValidTermCategory: +Integer termcategory_id + ValidTermCategory: +Boolean valid + + class ValidTermDetectinator + ValidTermDetectinator: +Integer validation_id + ValidTermDetectinator: +Integer termdetectinator_id + ValidTermDetectinator: +Boolean valid + + class ValidTermSuggestedResource + ValidTermSuggestedResource: +Integer validation_id + ValidTermSuggestedResource: +Integer termsuggestedresource_id + ValidTermSuggestedResource: +Boolean valid + + + style Term fill:#000,stroke:#66c2a5,color:#66c2a5 + + style Category fill:#000,stroke:#fc8d62,color:#fc8d62 + style Detectinator fill:#000,stroke:#fc8d62,color:#fc8d62 + style Mapping fill:#000,stroke:#fc8d62,color:#fc8d62 + style SuggestedResource fill:#000,stroke:#fc8d62,color:#fc8d62 + + style TermDetectinator fill:#000,stroke:#8da0cb,color:#8da0cb + style TermSuggestedResource fill:#000,stroke:#8da0cb,color:#8da0cb + style TermCategory fill:#000,stroke:#8da0cb,color:#8da0cb + + style Validation fill:#000,stroke:#ffd407,color:#ffd407 + style ValidTermCategory fill:#000,stroke:#ffd407,color:#ffd407 + style ValidTermDetectinator fill:#000,stroke:#ffd407,color:#ffd407 + style ValidTermSuggestedResource fill:#000,stroke:#ffd407,color:#ffd407 +``` + +This is an extension of the original class diagram, adding the validation data model in yellow. The thesis of the model +is that every decision made during Categorization is subject to review during Validation, potentially by multiple +reviewers. + +If validation is only performed once, we don't need any of the yellow tables, and we instead could just add a boolean +`valid` flag to each categorization table. 
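As a rough illustration of the trade-off above, the sketch below models one of the yellow link tables with plain Ruby `Struct`s rather than ActiveRecord models. The `id` field on `TermDetectinator`, the sample ids, and the `verdicts_for` helper are assumptions made for this example and are not part of the prototype.

```ruby
# Plain-Ruby stand-ins for tables in the diagram; these are not ActiveRecord models.
# ASSUMPTION: TermDetectinator has an id column (the diagram does not list one).
TermDetectinator = Struct.new(:id, :term_id, :detector_id, :result)
Validation = Struct.new(:id, :user_id)
ValidTermDetectinator = Struct.new(:validation_id, :termdetectinator_id, :valid)

# One detection result, reviewed by two different users (ids are made up).
detection = TermDetectinator.new(1, 10, 3, true)
validations = [Validation.new(1, 100), Validation.new(2, 200)]
reviews = [
  ValidTermDetectinator.new(1, detection.id, true),
  ValidTermDetectinator.new(2, detection.id, false)
]

# Hypothetical helper: map each reviewing user_id to that user's verdict
# on a single detection record.
def verdicts_for(detection, reviews, validations)
  users = validations.to_h { |v| [v.id, v.user_id] }
  reviews
    .select { |r| r.termdetectinator_id == detection.id }
    .to_h { |r| [users.fetch(r.validation_id), r.valid] }
end

verdicts = verdicts_for(detection, reviews, validations)
puts verdicts.inspect
```

With a single `valid` flag on `TermDetectinator` itself, only one of these two verdicts could be kept, which is the cost of dropping the link tables.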
\ No newline at end of file diff --git a/docs/reference/classes-prototype-zero.md b/docs/reference/classes-prototype-zero.md new file mode 100644 index 0000000..7330a7c --- /dev/null +++ b/docs/reference/classes-prototype-zero.md @@ -0,0 +1,36 @@ +# Prototype Zero + +This was the simplest possible way to join the three basic resources (Terms, Detectors, and Categories). + +```mermaid +classDiagram + direction TB + + Term --> Link + Category --> Link + Detector --> Link + + class Term + Term: +Integer id + Term: +String phrase + + class Category + Category: +Integer id + Category: +String name + + class Link + Link: +Integer + Link: +Integer term_id + Link: +Integer category_id + Link: +Integer detector_id + + class Detector + Detector: +Integer id + Detector: +String name +``` + +This was not developed further, because the other two prototypes (A and B) immediately seemed more capable than this +approach. Having a single join table link all three resources is a recipe for duplicate and inconsistent data that is +hard to work with. + +It is included here only for the sake of completeness. diff --git a/docs/reference/classes.md b/docs/reference/classes.md index bb0b88e..b194159 100644 --- a/docs/reference/classes.md +++ b/docs/reference/classes.md @@ -140,230 +140,10 @@ contacting a liaison. 
| 5 | Journal name | Term lookup | | 6 | Suggested resource | Term lookup | - -## One central join table -```mermaid -classDiagram - direction TB - - Term --> Link - Category --> Link - Detector --> Link - - class Term - Term: +Integer id - Term: +String phrase - - class Category - Category: +Integer id - Category: +String name - - class Link - Link: +Integer - Link: +Integer term_id - Link: +Integer category_id - Link: +Integer detector_id - - class Detector - Detector: +Integer id - Detector: +String name -``` ---- -# Sets of two-way join tables - -```mermaid -classDiagram - direction LR - - Term >-- TermDetector - TermDetector --> Detector - Category <-- DetectorCategory - DetectorCategory --> Detector - Term --> TermCategory - TermCategory <-- Category - SuggestedResource --> Category - Term <-- TermSuggestedResource - TermSuggestedResource --> SuggestedResource - - class Term:::primarytable - Term: +Integer id - Term: +String phrase - - class TermDetector - TermDetector: +Integer term_id - TermDetector: +Integer detector_id - TermDetector: +Boolean result - - class Detector - Detector: +Integer id - Detector: +String name - Detector: hasMatch() - - class Category - Category: +Integer id - Category: +String name - Category: +String note - - class DetectorCategory - DetectorCategory: +Integer detector_id - DetectorCategory: +Integer category_id - - class TermCategory - TermCategory: +Integer term_id - TermCategory: +Integer category_id - TermCategory: +Integer user_id - - class SuggestedResource - SuggestedResource: +Integer id - SuggestedResource: +String title - SuggestedResource: +String fingerprint - SuggestedResource: +URL url - SuggestedResource: +Integer category_id - - class TermSuggestedResource - TermSuggestedResource: +Integer term_id - TermSuggestedResource: +Integer suggested_resource_id - TermSuggestedResource: +Boolean result - - style Category fill:#000,stroke:#ffd407,color:#ffd407 - style Detector fill:#000,stroke:#ffd407,color:#ffd407 - style 
Term fill:#000,stroke:#ffd407,color:#ffd407 -``` - -The principle resources are Terms, Categories, and Detectors. Terms flow in -continuously. Detectors are less fluid, but might still be expected to change as -we improve our operations. Categories are the slowest changing. - -The relationship between Detectors and Categories would be generally set ahead -of time. Detectors produce a boolean output in the cleanest case - they either -detect a signal, or they do not. Relatedly, detectors have an influence over -whether a given Category is relevant, or not: - -* If the Detector for a DOI pattern returns `true`, then this influences the - `transactional` Category to a significant degree. -* However, the Detector for a DOI pattern does almost nothing to influence the - `navigational` Category. -* If Categorization is a zero-sum activity, however, the DOI pattern detector - would _exclusively_ claim a Term for the `transactional` Category - so it - would effectively rule out the other two Categories. - -The exception to this Detector rule is the SuggestedResource detector - which -has variability in its records. Some SuggestedResources are in each of the three -Categories, so there is a more complicated decision-making algorithm, and thus -a different set of database tables. - -## Order of operations - -The linkages between these tables are filled in at different moments. - -The Detector-Category linkage is determined as either set of resource is made, -and on a relatively slow cadence. Operationally, the links which matter are made -as new Terms flow into TACOS. - -1. A new Term is recorded in the system. -2. That Term is compared with each Detector, and any positive responses are - recorded. Negative responses may be discarded, or recorded for the sake of - completeness (to confirm that the link was tested). -3. Those Term-Detector responses are then used to perform the Categorization - work, which results in records being created in the TermCategory table. 
- --- -# Less "pure" implementation -```mermaid -classDiagram - - Term >-- Detection: has many - Detection >-- Categorization: based on - Category >-- SuggestedResource: belongs to - Categorization --> SuggestedResource: looks up - Detection --> SuggestedResource: looks up - Detection --> Journal: looks up - Categorization >-- Validation: subject to +Further discussion of the class diagram can be found in the three prototype files: - class Term - Term: +Integer id - Term: +String phrase - - class SuggestedResource - SuggestedResource: +Integer id - SuggestedResource: +String title - SuggestedResource: +String url - SuggestedResource: +String phrase - SuggestedResource: +String fingerprint - SuggestedResource: +Integer category_id - SuggestedResource: calculateFingerprint() - - class Journal - Journal: +Integer id - Journal: +String title - - class Detection - Detection: +Integer id - Detection: +Integer term_id - Detection: +Integer detector_version - Detection: +Boolean DOI - Detection: +Boolean ISBN - Detection: +Boolean ISSN - Detection: +Boolean PMID - Detection: +Boolean Journal - Detection: +Integer journal_id - Detection: +Boolean SuggestedResource - Detection: +Integer suggested_resource_id - Detection: +Boolean LCSH - Detection: +Boolean WebsitePageTitle - Detection: hasDOI() - Detection: hasISBN() - Detection: hasISSN() - Detection: hasPMID() - Detection: hasJournal() - Detection: hasSuggestedResource() - Detection: hasLCSH() - Detection: hasWebsitePageTitle() - - class Detector - Detector: +Integer id - Detector: +String name - Detector: +Float DOI_Confidence - - class Category - Category: +Integer id - Category: +String name - - class Categorization - Categorization: +Integer id - Categorization: +Integer detection_id - Categorization: +Float transaction_score - Categorization: +Float information_score - Categorization: +Float navigation_score - Categorization: evaluateTransaction() - Categorization: evaluateInformation() - Categorization: 
evaluateNavigation() - - class Validation - Validation: +Integer id - Validation: +Integer categorization_id - Validation: +Boolean approve_transaction - Validation: +Boolean approve_information - Validation: +Boolean approve_navigation - Validation: +Boolean approve_doi - Validation: +Boolean approve_isbn - Validation: +Boolean approve_issn - Validation: +Boolean approve_pmid - Validation: +Boolean approve_journal - Validation: +Boolean approve_suggested_resource - Validation: +Boolean approve_lcsh - Validation: +Boolean approve_webpage - - style Term fill:#000,stroke:#ffd407,color:#ffd407 - style Detector fill:#000,stroke:#ffd407,color:#ffd407 - style Category fill:#000,stroke:#ffd407,color:#ffd407 -``` -This makes the order of operation a bit more explicit: - -1. A new Term is registered. -2. The Detection table entry for that Term is populated (which allows repeat - Detection passes as the detector models change). -3. The output of various Detection passes (either the most recent for each term, - or all detections over time) are processed via code to generate scores for - each potential category. \ No newline at end of file +* [Prototype zero (abandoned)](./classes-prototype-zero.md) +* [Prototype A ("Code")](./classes-prototype-a.md) +* [Prototype B ("Data")](./classes-prototype-b.md)