Proposed Dataset API changes #2591
Replies: 6 comments 13 replies
-
It also looks like a simple |
-
I support all of the proposals in this discussion. This has been a long time coming: we've noticed these things for years but have never done anything about them, and they still hurt us. @edmondchuc is battling with datasets in a current project. I suggest we also remove the |
-
I don't think |
-
I've written down my thoughts on what the expected interfaces are, in a pseudo-Python/rdflib format:
Hopefully it's a coherent perspective; it may take some effort to reconcile or integrate with others'. I will have a go at this next.

Minimal class definitions

Only enough to illustrate the thinking and scenarios.

from enum import Enum
from typing import Literal

from rdflib import URIRef

class GraphType(Enum):
    DEFAULT = "default"
    NAMED = "named"

class Graph:
    def __init__(
        self,
        identifier: URIRef | None = None,
        graph_type: GraphType | None = None,
    ):
        pass

Dataset:

class Dataset:
    def __init__(self):
        pass

    def quads(
        self,
        context: GraphType | URIRef | list[GraphType | URIRef] | None = None,
    ):
        pass

    def triples(
        self,
        context: GraphType | URIRef | list[GraphType | URIRef] | None = None,
    ):
        pass

    def add_graph(
        self,
        graph: Graph,
        # Literal[...] limits the non-URI target to the DEFAULT member only
        target: URIRef | Literal[GraphType.DEFAULT] | None = None,
    ):
        pass

Graph Scenarios

Scenario 1: Default Graph (Start with Triple)

Graph instantiated without context becomes a "default" or contextless graph when the first thing added is a triple.

g = Graph()
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
print(g.graph_type)
> default # graph type is now "default"; any triples or quads added after this have no context
g.parse(data="<ex:s2> <ex:p2> <ex:o2> <ex:graph> .", format="nquads")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None), ('<ex:s2>', '<ex:p2>', '<ex:o2>', None)]

Scenario 2: Named Graph (Start with Quad)

Graph instantiated without context gets context from parsed quad.

g = Graph()
g.parse(data="<ex:s2> <ex:p2> <ex:o2> <ex:g2> .", format="nquads")
print(list(g.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(g.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g2>')]
print(g.graph_type)
> named
g.parse(data="<ex:s3> <ex:p3> <ex:o3> .", format="turtle")
print(list(g.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>'), ('<ex:s3>', '<ex:p3>', '<ex:o3>')]
print(list(g.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g2>'), ('<ex:s3>', '<ex:p3>', '<ex:o3>', '<ex:g2>')]

Scenario 3: Named Graph with Identifier

Triples added to graph inherit the context.

g = Graph(identifier="ex:g1")
print(g.graph_type)
> named
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', '<ex:g1>')]

Scenario 4: Add quad to default graph

Context is ignored.

g = Graph(graph_type="default")
g.parse(data="<ex:s1> <ex:p1> <ex:o1> <ex:graph> .", format="nquads")
print(list(g.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(g.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]

Dataset Scenarios

Scenario 5: Add a Default Graph to a Dataset

g = Graph(graph_type="default")
g.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
ds = Dataset()
ds.add_graph(g)
print(list(ds.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(ds.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]

Scenario 6: Add a Named Graph to a Dataset

g = Graph(identifier="ex:g1")
g.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g)
print(list(ds.triples()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.quads()))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]

Scenario 7: Add a Graph to the Default Context

Graph ID of graph being added (if present) is overridden by "target".

g = Graph(identifier="ex:g1")
g.parse(data="<ex:s3> <ex:p3> <ex:o3> .", format="turtle")
ds = Dataset()
ds.add_graph(g, target="default")
print(list(ds.triples()))
> [('<ex:s3>', '<ex:p3>', '<ex:o3>')]
print(list(ds.quads()))
> [('<ex:s3>', '<ex:p3>', '<ex:o3>', None)]

Scenario 8: Add Graphs to Dataset changing the graph

Graph ID of graph being added (if present) is overridden by "target".

g = Graph(identifier="ex:g2", graph_type="named")
g.parse(data="<ex:s4> <ex:p4> <ex:o4> .", format="turtle")
ds = Dataset()
ds.add_graph(g, target="ex:newg")
print(list(ds.triples()))
> [('<ex:s4>', '<ex:p4>', '<ex:o4>')]
print(list(ds.quads()))
> [('<ex:s4>', '<ex:p4>', '<ex:o4>', '<ex:newg>')]

Scenario 9: Iterate Over Triples with Contexts

g1 = Graph(graph_type="default")
g1.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
g2 = Graph(identifier="ex:g1")
g2.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g1)
ds.add_graph(g2)
print(list(ds.triples()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context=["NAMED", "DEFAULT"]))) # equivalent to default behaviour when not specifying context
> [('<ex:s1>', '<ex:p1>', '<ex:o1>'), ('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context="NAMED")))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>')]
print(list(ds.triples(context="DEFAULT")))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]
print(list(ds.triples(context=["DEFAULT", "ex:g2"]))) # ex:g2 is not in the dataset so no data returned from this graph.
> [('<ex:s1>', '<ex:p1>', '<ex:o1>')]

Scenario 10: Iterate Over Quads with Contexts

g1 = Graph(graph_type="default")
g1.parse(data="<ex:s1> <ex:p1> <ex:o1> .", format="turtle")
g2 = Graph(identifier="ex:g1")
g2.parse(data="<ex:s2> <ex:p2> <ex:o2> .", format="turtle")
ds = Dataset()
ds.add_graph(g1)
ds.add_graph(g2)
print(list(ds.quads()))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None), ('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]
print(list(ds.quads(context="NAMED")))
> [('<ex:s2>', '<ex:p2>', '<ex:o2>', '<ex:g1>')]
print(list(ds.quads(context="DEFAULT")))
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
print(list(ds.quads(context=["DEFAULT", "ex:g2"]))) # ex:g2 is not in the dataset so no data returned from this graph.
> [('<ex:s1>', '<ex:p1>', '<ex:o1>', None)]
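
The scenarios above describe an API that does not exist in rdflib today, so they cannot be run as written. As a rough way to check that the rules are mutually consistent, here is a small plain-Python model of them. Everything in it is an assumption made for illustration: `ToyGraph` and `ToyDataset` are hypothetical stand-ins, terms are plain strings, parsing is replaced by an `add` method, and contexts are selected with `GraphType` members rather than the `"NAMED"` / `"DEFAULT"` strings used in the pseudocode.

```python
from enum import Enum


class GraphType(Enum):
    DEFAULT = "default"
    NAMED = "named"


class ToyGraph:
    """Hypothetical stand-in for the proposed Graph; terms are plain strings."""

    def __init__(self, identifier=None, graph_type=None):
        self.identifier = identifier
        # An identifier implies a named graph (Scenario 3).
        self.graph_type = GraphType.NAMED if identifier else graph_type
        self._triples = []

    def add(self, s, p, o, g=None):
        if self.graph_type is None:
            # The first statement fixes the graph type (Scenarios 1 and 2).
            self.graph_type = GraphType.NAMED if g else GraphType.DEFAULT
        if self.graph_type is GraphType.NAMED and self.identifier is None:
            self.identifier = g
        # Once the type is fixed, a statement's own graph term is ignored.
        self._triples.append((s, p, o))

    def triples(self):
        return list(self._triples)

    def quads(self):
        return [(s, p, o, self.identifier) for s, p, o in self._triples]


class ToyDataset:
    """Hypothetical stand-in for the proposed Dataset."""

    def __init__(self):
        self._quads = []

    def add_graph(self, graph, target=None):
        # "target" overrides the graph's own identifier (Scenarios 7 and 8).
        if target is GraphType.DEFAULT:
            name = None
        elif target is not None:
            name = target
        else:
            name = graph.identifier
        self._quads.extend((s, p, o, name) for s, p, o in graph.triples())

    @staticmethod
    def _selected(name, selectors):
        if name is None:
            return GraphType.DEFAULT in selectors
        return GraphType.NAMED in selectors or name in selectors

    def quads(self, context=None):
        if context is None:
            selectors = [GraphType.DEFAULT, GraphType.NAMED]
        elif isinstance(context, list):
            selectors = context
        else:
            selectors = [context]
        return [q for q in self._quads if self._selected(q[3], selectors)]

    def triples(self, context=None):
        return [q[:3] for q in self.quads(context)]


# Mirrors Scenarios 9 and 10.
g1 = ToyGraph(graph_type=GraphType.DEFAULT)
g1.add("ex:s1", "ex:p1", "ex:o1")
g2 = ToyGraph(identifier="ex:g1")
g2.add("ex:s2", "ex:p2", "ex:o2")

ds = ToyDataset()
ds.add_graph(g1)
ds.add_graph(g2)
print(ds.quads())
print(ds.quads(context=GraphType.NAMED))
print(ds.triples(context=[GraphType.DEFAULT, "ex:g2"]))
```

One design point the model makes visible: the dataset stores quads rather than graph objects, so `add_graph` copies statements and later changes to the source graph would not be reflected. Whether the real API should copy or reference is left open by the scenarios.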
-
Observations from afar ...
Would the dataset (the storage unit) have a default context setting? Otherwise, if an app changes, it might require every API call to be tracked down and changed. FWIW, Fuseki has both modes: union default graph is SPARQL only, and it is a view of the dataset at query time. The usual way is to have a setting on the dataset, but it can also be set per query execution. For update, where do new triples go in an inclusive dataset?
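
One possible reading of this suggestion, sketched in plain Python: the dataset carries a default setting, individual reads may override it, and the union is only a view. The class name `UnionAwareDataset`, the `default_union` flag, and the `union` argument are hypothetical names chosen for this sketch. It also assumes two answers to the open question: the union view includes the stored default graph plus all named graphs, and writes always go to a concrete graph (the stored default graph unless a name is given).

```python
class UnionAwareDataset:
    """Toy model: a dataset-level union setting with a per-call override."""

    def __init__(self, default_union=False):
        self.default_union = default_union
        self._default = set()   # the stored default graph
        self._named = {}        # graph name -> set of triples

    def add(self, triple, graph=None):
        # Assumption: writes never target the union view, only a real graph.
        if graph is None:
            self._default.add(triple)
        else:
            self._named.setdefault(graph, set()).add(triple)

    def default_view(self, union=None):
        # A per-call value wins; otherwise fall back to the dataset setting.
        use_union = self.default_union if union is None else union
        result = set(self._default)
        if use_union:
            for triples in self._named.values():
                result |= triples
        return result


t1 = ("ex:s1", "ex:p1", "ex:o1")
t2 = ("ex:s2", "ex:p2", "ex:o2")

ds = UnionAwareDataset()          # dataset-level setting: no union
ds.add(t1)
ds.add(t2, graph="ex:g1")
print(ds.default_view())          # stored default graph only
print(ds.default_view(union=True))  # union requested for this call only
```

With the setting on the dataset, an application that switches modes changes one constructor argument instead of every read call.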
-
The Dataset is quite weird and assumes that standalone Graphs have identifiers, which will be phased out (#2537). For example, adding a named graph to a Dataset looks like this:

Moreover, Dataset uses the term context when referring to named graphs. I think it should be phased out as well.

If in doubt, I suggest just copying Jena's Dataset API.

My suggestions for Dataset:

add_named_graph(uri: IdentifiedNode, graph: Graph) method
has_named_graph(uri: IdentifiedNode) method
remove_named_graph(uri: IdentifiedNode) method
replace_named_graph(uri: IdentifiedNode, graph: Graph) method
graphs() method as an alias for contexts()
default_graph property as an alias for default_context
get_named_graph as an alias for get_graph
graph(graph) method
remove_graph(graph) method
contexts() method

Using IdentifiedNode as a super-interface for URIRef and BNode (since both are allowed as graph names in RDF 1.1).

The above example would become something like this after these changes: