Missing full Compliance to the Linked Data Cloud principles #9

Open
4 of 5 tasks
kuzeko opened this issue Oct 31, 2019 · 13 comments

@kuzeko
Contributor

kuzeko commented Oct 31, 2019

To have the bonsai dataset added to the Linked Data Cloud, we need to satisfy the Linked Data principles.

In particular, we need to satisfy the following (quoting from the website):

  • There must be resolvable http:// (or https://) URIs.
  • They must resolve, with or without content negotiation, to RDF data in one of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples).
  • The dataset must contain at least 1000 triples.
  • The dataset must be connected via RDF links to a dataset that is already in the diagram. This means, either your dataset must use URIs from the other dataset, or vice versa. We arbitrarily require at least 50 links.
  • Access of the entire dataset must be possible via RDF crawling, via an RDF dump, or via a SPARQL endpoint.
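
The triple-count and link-count thresholds above can be checked offline against an N-Triples dump. The following is a minimal sketch, assuming one triple per line (as N-Triples requires). The helper name `count_triples_and_links`, the external namespace `http://example.org/other/`, and the sample data are all hypothetical, and counting any triple that mentions the external prefix is a simplification of the criterion.

```python
def count_triples_and_links(ntriples_text, external_prefix):
    """Return (triple_count, link_count) for an N-Triples document.

    A "link" here is any triple that mentions an IRI starting with
    external_prefix. This is a simplification of the criterion quoted above.
    """
    triples = 0
    links = 0
    for line in ntriples_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        triples += 1
        if "<" + external_prefix in line:
            links += 1
    return triples, links


# Hypothetical sample data, for illustration only.
sample = """\
# two triples, one of which links to an external dataset
<http://rdf.bonsai.uno/flowobject/exiobase3_3_17/C_ADDC> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/other/ADDC> .
<http://rdf.bonsai.uno/flowobject/exiobase3_3_17/C_ADDC> <http://www.w3.org/2000/01/rdf-schema#label> "example label" .
"""

triples, links = count_triples_and_links(sample, "http://example.org/other/")
print(triples >= 1000, links >= 50)
```

For a real check, a proper RDF parser would be more robust than this line-based count.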

The missing piece is then the "resolving" of URIs. I think we need something like an .htaccess file or a script that redirects to our RDF server or to the appropriate query at our SPARQL endpoint.

@kuzeko kuzeko added the enhancement New feature or request label Oct 31, 2019
@tngTUDOR
Contributor

tngTUDOR commented Nov 3, 2019

@kuzeko can you please elaborate a bit more on the second item?
Say, for example, by naming a specific tool that "resolves".
For example, if I do:

wget http://rdf.bonsai.uno/XX/YY

The result must be:

content-type: rdf-ttl 
XXXX: YYYY
data: ZZZ
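
One way to make "resolves" testable is to request the URI with an `Accept` header and inspect the `Content-Type` that comes back. This is only a sketch: `rdf_format_of` and `check_resolves` are hypothetical helper names, and the table covers just three registered RDF media types (RDFa is embedded in HTML and is not handled here).

```python
import urllib.request

# Registered media types for three of the formats named in the criteria.
RDF_MEDIA_TYPES = {
    "text/turtle": "Turtle",
    "application/rdf+xml": "RDF/XML",
    "application/n-triples": "N-Triples",
}


def rdf_format_of(content_type_header):
    """Map a Content-Type header value to an RDF format name, or None."""
    if not content_type_header:
        return None
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return RDF_MEDIA_TYPES.get(media_type)


def check_resolves(uri, accept="text/turtle", timeout=10):
    """Fetch uri with an Accept header and return the RDF format served.

    Returns None if the response is not one of the known RDF media types.
    """
    request = urllib.request.Request(uri, headers={"Accept": accept})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rdf_format_of(response.headers.get("Content-Type"))
```

Calling `check_resolves("http://rdf.bonsai.uno/XX/YY")` would then return `"Turtle"` only if the server answers with `Content-Type: text/turtle`.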

@tngTUDOR tngTUDOR pinned this issue Nov 3, 2019
@cmutel
Contributor

cmutel commented Nov 3, 2019

First off, this is an excellent issue report; I only wish I were as thorough.

I don't understand the issue itself - is the problem that rdf.bonsai.uno/flowobject/exiobase3_3_17/#C_ADDC doesn't provide any references to the actual datasets? Or is the problem that rdf.bonsai.uno itself doesn't provide a way to traverse its own directory tree?

BTW, htaccess is basically considered an anti-pattern, at least by some.

@tngTUDOR
Contributor

tngTUDOR commented Nov 4, 2019

I think I understand better now. With content negotiation it would be something like:

curl -H 'accept: text/turtle' https://rdf.bonsai.uno/flowobject/exiobase3_3_17/#C_ADDC

@tngTUDOR
Contributor

tngTUDOR commented Nov 4, 2019

The rdf contents are statically served (with nginx). I'll look for a solution that allows for content negotiation.
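
To illustrate what content negotiation has to decide, here is a rough sketch of the server-side logic: parse the `Accept` header and pick a static file extension. It is not an nginx configuration. The `SUPPORTED` mapping and the function names are assumptions, and the parsing is simplified (it ignores partial wildcards such as `text/*` and the specificity rules of full HTTP negotiation).

```python
# Hypothetical mapping from media type to the extension of a static file.
SUPPORTED = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
    "application/n-triples": ".nt",
}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_extension(accept_header, default=".ttl"):
    """Return the extension to serve, or None if nothing acceptable matches."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
        if media_type == "*/*":
            return default
    return None
```

A `None` result would correspond to answering with `406 Not Acceptable` or falling back to a default representation.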

@tngTUDOR
Contributor

tngTUDOR commented Nov 4, 2019

Right now, without any modifications of nginx, this is possible:

curl -H 'accept: text/turtle' http://rdf.bonsai.uno/flowobject/exiobase3_3_17/exiobase3_3_17.ttl

but I couldn't get a specific # reference:

curl -H 'accept: text/turtle' https://rdf.bonsai.uno/flowobject/exiobase3_3_17/exiobase3_3_17.ttl\#C_ADDC 
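
A likely explanation is that the fragment (the part after `#`) is handled on the client side and is not sent to the server in an HTTP request, so the server only ever sees the document URL. The sketch below shows the split using the standard library.

```python
from urllib.parse import urldefrag

uri = "https://rdf.bonsai.uno/flowobject/exiobase3_3_17/exiobase3_3_17.ttl#C_ADDC"

# urldefrag separates the part that is requested from the server
# from the fragment, which the client resolves within the returned document.
document_url, fragment = urldefrag(uri)

print(document_url)
print(fragment)
```

A client that wants only `C_ADDC` therefore has to fetch the whole Turtle document and select the matching subject locally.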

@kuzeko
Contributor Author

kuzeko commented Nov 4, 2019

So the best possible (utopian?) solution would be the following:

curl -H 'accept: text/turtle' https://rdf.bonsai.uno/flowobject/exiobase3_3_17/C_ADDC

Should return the result of the query:

SELECT *
FROM <http://rdf.bonsai.uno/flowobject/exiobase3_3_17/>
WHERE {
  BIND (<http://rdf.bonsai.uno/flowobject/exiobase3_3_17/C_ADDC> AS ?s)
  ?s ?p ?o .
}

@kuzeko
Contributor Author

kuzeko commented Nov 4, 2019

Also note that there is no # in that specific URI (shall we decide on a convention here?)

@cmutel
Contributor

cmutel commented Nov 6, 2019

We shouldn't be reinventing the wheel. If we need to expose quasi-database endpoints, maybe our URIs should just point to the database. Otherwise, as this is a standard, there should be standard software that we can just throw .ttl files at and not have to think too much about it, right?

@kuzeko
Contributor Author

kuzeko commented Nov 6, 2019

We cannot just point to the database.
As you say, the option for NOT reinventing the wheel is to use some additional application software.
For instance, well-known endpoints use Virtuoso instead of Jena.
This provides the full stack, e.g.,
https://bio2rdf.org/clinicaltrials:NCT00050895

@kuzeko
Contributor Author

kuzeko commented Nov 14, 2019

I've found the following SPARQL to be the equivalent of what we need:

CONSTRUCT {
  ?s ?p ?o .
}
FROM <http://rdf.bonsai.uno/flowobject/exiobase3_3_17/>
WHERE {
  BIND (<http://rdf.bonsai.uno/flowobject/exiobase3_3_17/C_ADDC> AS ?s)
  ?s ?p ?o .
}

Jena also understands content negotiation.

Hence, if a URI is not found as a file, it can be substituted into the BIND and sent to the Jena endpoint.
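
The file-first, query-second fallback could look roughly like the sketch below. Several things here are assumptions: the endpoint URL is a placeholder (the real one is not given in this thread), the graph URI is taken to be the resource URI up to its last `/` (which matches the example above), and `resolve` and `construct_query` are hypothetical names. The query is passed in the `query` parameter of a GET request, as in the SPARQL Protocol. No input validation is done, which a real deployment would need.

```python
from pathlib import Path
from urllib.parse import urlencode

BASE_URI = "http://rdf.bonsai.uno/"
# Placeholder endpoint URL; the thread does not state the real one.
SPARQL_ENDPOINT = "http://sparql.example.org/dataset/query"


def construct_query(resource_uri, graph_uri):
    """Build the CONSTRUCT query from the comment above for one resource."""
    return (
        "CONSTRUCT { ?s ?p ?o . }\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  BIND (<{resource_uri}> AS ?s)\n"
        "  ?s ?p ?o .\n"
        "}"
    )


def resolve(path, static_root):
    """Return ("file", Path) if a static file exists, else ("sparql", url)."""
    relative = path.lstrip("/")
    candidate = Path(static_root) / relative
    if candidate.is_file():
        return "file", candidate
    resource_uri = BASE_URI + relative
    # Assumption: the named graph is the resource URI up to its last slash.
    graph_uri = resource_uri.rsplit("/", 1)[0] + "/"
    query = construct_query(resource_uri, graph_uri)
    return "sparql", SPARQL_ENDPOINT + "?" + urlencode({"query": query})
```

The web server would serve the file in the first case and proxy the request to the returned URL in the second, forwarding the client's `Accept` header so that the endpoint can pick the serialization.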

@tngTUDOR
Contributor

tngTUDOR commented Aug 21, 2020

Virtuoso does indeed seem to be used by quite a few existing repositories.
A clear list of other open source alternatives (the one that caught my eye is Marmotta) can be found in the Linked Data Platform Implementation Conformance Report.

@kuzeko
Contributor Author

kuzeko commented Aug 24, 2020

Hi @tngTUDOR, are you still working on a way to solve this issue?

@kuzeko
Contributor Author

kuzeko commented Sep 1, 2020

This tool may also be an option:
https://github.com/KNowledgeOnWebScale/walder
