
Use Cases

Following are a few use cases that Polygenea makes straightforward but that are more difficult under other common data models, as well as some that are not first-class in Polygenea but are nonetheless easily implemented.

Fixing a Mistake

If a node is in error, you simply

  1. Add another node that is not in error

  2. Add a Connection node noting the new node is an "update-of" the old one

The error gets corrected and the fact that the error existed and was corrected gets recorded without any chance of stepping on the toes of another researcher.
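For example (a sketch only: the quoted numbers and the birth-year key are placeholders, not defined identifiers), if node 9 were a Property giving the wrong birth year for person 15, the fix would be one corrected Property plus one Connection pointing back at the erroneous node:

<Property key="birth year" of="15" value="1890"/>
<Connection key="update-of" of="16" value="9"/>

Here "16" is assumed to be however other nodes would refer to the new Property.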

Distributed Research

Because nodes are immutable, there is no danger of incompatible edits. If a hundred users and databases each have a copy of some nodes and each add new nodes to the mix, all that is needed to re-synchronize the various copies is to send one another the new nodes.

The only possible challenge is in ensuring that the id fields of independently-created Thing nodes are unique. I consider this a minor challenge at best because many solutions to the unique identifier problem are known, including UUID versions 1 and 4 among others.

Privacy and Data Ownership

Private data is easily handled: it's just a set of nodes that do not get shared with others.

Because data is immutable, sharing it with others does not risk it being changed in any way. Thus, most of the importance of data ownership is moot. If you wish to keep a copy of some data, doing so does not hurt anyone else.

Paywalled data is a bit more difficult. It can easily be tagged with a Property whose key is meta-paywall-owner so that tools will know it is paywalled, but keeping someone from spreading it once they have it is not feasible.
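A sketch of such a tag, with an invented owner value and a placeholder node reference:

<Property key="meta-paywall-owner" of="23" value="Example Records Inc."/>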

I usually assume that each user will have a "belief set", a set of nodes that the user in question has decided to accept. This set could be kept secret or revealed to others, but I would be hesitant to make a single set shared by multiple users. However, the belief set idea is really a matter of the user experience; it is not part of the data itself.

Handling Disagreement

If you and I disagree about some fact, all we need to do is have different sets of nodes we choose to believe. We can still share with one another the new nodes we create, and most of these will probably apply to both of our views of the world, whether the disagreement is rooted in a match, a property, a connection, or a source. Edit wars are difficult to create in a world without edits.

Handling Uncertainty

Even more useful than being able to disagree with collaborators is being able to disagree with one's self. I can, for example, have data asserting that persons A and B are the same individual, that persons B and C are the same individual, and that A and C are distinct individuals. Even though this state contains a logical contradiction and cannot be how history actually looked, I can leave it in the data to represent my confusion until such time as I have sufficient evidence to resolve the puzzle. In general, I can enter both sides of a difficult choice into the data and explore what the world would look like under each scenario, only later deciding that one of them is superior in some way to the others.
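As a rough and purely hypothetical sketch (neither the shape of a Match node nor a "distinct-from" key is defined on this page; the numbers are placeholders for the Thing nodes representing A, B, and C), the contradictory state might be held as three nodes along the lines of:

<Match of="5" value="6"/>
<Match of="6" value="7"/>
<Connection key="distinct-from" of="5" value="7"/>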

Polygenea can also model probabilistic uncertainty with meta-probability properties, though the merits of such properties are not at present evident to me.

Recording Provenance

Provenance of sources is easily modelled as two sources with a "derived-from" Connection between them. Those connection nodes themselves may be sourced to express the rationale behind the provenance decisions, or may be left without a source.
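For instance (the numbers are placeholder references): if source 7 is an abstract made from source 3, the provenance is a single node:

<Connection key="derived-from" of="7" value="3"/>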

Attribution

Attribution (to the creator of a node) is easily handled with meta-creator properties. Nodes that are independently created several times by several researchers will simply accrue a set of creator properties.
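A sketch of such a property, reusing the user-identifier form that appears in the belief-set examples below and a placeholder node reference:

<Property key="meta-creator" of="12" value="_fhiso_username=ltychonievich"/>

A second researcher who independently created node 12 would simply add another meta-creator Property alongside this one.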

Safe, Small, Interesting Tasks

The Polygenea data model was designed with the goal of having each atomic research step result in the creation of a few new nodes. If I have succeeded in this goal, then any small-but-useful contribution one might make can be represented by the addition of a few nodes. Additionally, the impact of a mistake is limited, since others can simply ignore it.

Rule Creation

I envision a simple way to guide novices through the process of creating a Rule node:

  1. User creates a ruleless Inference and the nodes that will cite it as their source.

  2. An overly-constrained Rule is generated by

    1. Copying each antecedent node of the Inference into the Rule's antecedent
    2. Copying each node that cites the Inference as its source into the Rule's consequent pool
    3. Adding to the antecedent pool any nodes referenced by a consequent that are not yet either an antecedent or a consequent
    4. Removing all fields of antecedent nodes that reference nodes not in the antecedent pool
    5. Replacing all remaining references with local references
  3. The Rule is generalized by asking the user questions like the following:

    • "Is the fact that the name was 'John' important to this inference?"
    • "Both things in this example had the same date property; is that important to this inference?"

    …and so on, to determine which parts of the antecedents can be loosened (as in the sketch below).
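As a purely hypothetical illustration of step 3: suppose the over-constrained Rule contained the antecedent

{"!class":"Property","key":"name","of":0,"value":"John"}

If the user answers that the exact name does not matter, the value constraint could be loosened, shown here by simply dropping the field:

{"!class":"Property","key":"name","of":0}

How a Rule actually expresses "any value" is not specified on this page; omitting the field is only one possible encoding.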

Tool-Specific Values

Many tools have specific keys or values that are not shared by other tools. One example that came up in [email protected] was FamilySearch's Ternary parent-child.

I suggest that keys that do not begin with an underscore or exclamation point be treated as unconstrained: if a source suggests person 15 had a magical ability of 47.8, we are free to add <Property key="magical ability" of="15" value="47.8"/>. I further suggest that a standards body like FHISO allocate the keys that do start with _ and !: ! should prefix keys whose meaning is agreed upon by the standards body itself, while _ should prefix keys in namespaces allocated to individual organisations. Thus we might have a date key for the literal date contained in a source, a !date key for a standard-format date value, a _fs_date key for FamilySearch's in-house date standard, etc.
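Under that scheme the three kinds of date key might look like the following sketch (the values, the node reference, and the particular formats shown for !date and _fs_date are all invented for illustration):

<Property key="date" of="15" value="Christmas day, 1844"/>
<Property key="!date" of="15" value="1844-12-25"/>
<Property key="_fs_date" of="15" value="25 December 1844"/>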

How would a controlled key be added to Polygenea? I suggest it be defined by some Rules. For example, an enforced-ternary child relationship could be created by a set of rules like

{"!class":"Rule","antecedents":[
	{"!class":"Thing"},
	{"!class":"Property","key":"!type","of":0,"value":"_fs_ternary"},
	{"!class":"Thing"},
	{"!class":"Connection","key":"_fs_mother","of":0,"value":2},
	{"!class":"Thing"},
	{"!class":"Connection","key":"_fs_mother","of":0,"value":4},
],"consequents":[
	{"!class":"Property","key":"meta-validity","of":0,"value":"invalid"},
]}

that mark any non-ternary relationship as invalid.

Belief Sets

One use of BitItem nodes is to represent belief sets. When working with others I might want to keep track of the nodes that I believe and the nodes each of my contributors has chosen to believe as well. To do that I add an OutRef like <OutRef type="belief" user="_fhiso_username=ltychonievich"/> and a lot of Connection nodes like <Connection key="member" of="42" value="8"/> which, if "42" were a reference to the belief OutRef, would mean that node 8 is part of my belief set.

Another use for belief sets is to have several views of my own work. Often there are two or more mutually exclusive variants of possible history, each made up of many claims. I'd make the nodes for both groups and then add a few belief sets like

<OutRef option="A" type="belief" user="_fhiso_username=ltychonievich"/>
<OutRef option="B" type="belief" user="_fhiso_username=ltychonievich"/>

to connect the various possible versions of history together. The user interface could then display the different sets however it chooses, perhaps by showing only data linked from the currently-viewed belief, showing several color-coded beliefs at once, etc.
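Continuing that sketch, if the option-A and option-B OutRefs were referenced as, say, "50" and "51", then a node believed only under option A and another believed under both options might be connected like so:

<Connection key="member" of="50" value="8"/>
<Connection key="member" of="50" value="9"/>
<Connection key="member" of="51" value="9"/>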

Note that many uses of belief sets can be automated using inferences. Conflicting ideas often stem from a detectable conflict: from node set A you can infer that the nodes in set B are not true, and vice versa. If these inferences are recorded as Inference nodes that serve as the source of not-true properties, it should be straightforward for a computer to detect those patterns and present the set of alternatives without any need for manual belief-set identification. Similarly, it should be possible for a computer to identify a minimal set of core differences between the various options, etc. However, since the algorithms to do this are not yet in production, manual belief sets are probably best for now.

Authority-Controlled Attributions

For various reasons, some groups want to be able to control the attribution of certain values related to family history. Examples include membership in various "daughters-of-____" groups, tribal rights, and LDS temple ordinances. But problems arise as the details of the deceased-as-believed-to-be change: if we realize that Tom Jones was really two people, which one (if either) gets to keep the attribution?

I suggest that controlled attributes be attached to a single Thing node within a Match-based individual. Properties may be discredited and Matches made or broken, but no ambiguity results: there is still one attributed Thing node. If at some point several different attributed Thing nodes are Matched together, we have unambiguous data showing that the attribution was applied multiple times.
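A sketch, with an invented key and value and placeholder references: the authority attaches its attribution to exactly one Thing node (here 31), and later Matches involving that Thing leave the attribution itself untouched:

<Property key="_example_authority_membership" of="31" value="confirmed"/>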

I am assuming that the authority would keep its own list of attribution Property nodes which it might make publicly readable, but that it would not accept these attributions from outside sources. The attributions could also be cryptographically signed so that third parties could verify if attributions they hear about were created by the authority or not.

My suggestions for a solution to this are not unique to Polygenea: they would work in DeadEnds, Lifelines, Behold, and other persona/source-detail systems as well.