Parameterized Queries #15

Open
bryonjacob opened this issue Jun 26, 2017 · 5 comments

Comments

@bryonjacob

Parameterized Queries

I'd like to propose a spec for how parameterized queries should be processed by the query endpoints. There are two parts to it:

  • How Parameter Values and Types are specified.
  • How Parameter Names and Values are provided on the URL.

Parameter Values and Types

For both SQL and SPARQL, I think we should support "simple", "safe", and "RDF" parameter values.

  • RDF parameters allow for complete precision in the same language the underlying query engine understands, at the cost of a pretty verbose and esoteric syntax. If you want total precision, use this syntax.
  • "safe" parameters allow you to be specific about String/URI parameters by wrapping values in "" or <>, which is definitely a necessity for user-entered content. If you're building an SDK or integration, this helps make sure that you can be precise about types without giving up readability the way RDF typed literals do.
  • "simple" parameters mean that we'll do the right thing with most values, defaulting to String where we can't make a better guess. This maximizes the chance that ad-hoc queries will return results when the user's meaning is clear.

To parse a value, try each of the following rules in order and use the first one that matches:

"RDF" parameters:

if value matches /^"(.*)"\^\^<([^<>]*)>$/ :

  "abcdef"^^<http://www.w3.org/2001/XMLSchema#string>
  "3"^^<http://www.w3.org/2001/XMLSchema#integer>
  "4.2"^^<http://www.w3.org/2001/XMLSchema#decimal>
  "true"^^<http://www.w3.org/2001/XMLSchema#boolean>

(matches two groups: the lexical value and the URI of its datatype)

"Safe" parameters:

if value matches /^"(.*)"$/ :

  "abcdef"                                 <- String
  "3"                                      <- String
  "4.2"                                    <- String
  "true"                                   <- String
  "https://data.world/"                    <- String

(matches one group - the string value)

if value matches /^<(.*)>$/ :

  <https://data.world/>                    <- URI
  <abcdef>                                 <- URI
  <3>                                      <- URI

(matches one group - the URI)

"Simple" parameters:

if value matches /^([0-9]+)$/ :

  3                                        <- Integer

if value matches /^([0-9]*[.][0-9]+)$/ :

  4.2                                      <- Decimal

if value matches /^(true|false)$/ :

  true                                     <- Boolean

if value matches /^([a-z]+:\/\/.*)$/ :

  https://data.world/                      <- URI

(all of the above match one group - the value to interpret as Integer/Decimal/Boolean/URI)

otherwise :

  abcdef                                   <- String

(just treat the whole value as a String if nothing else matches)
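A minimal Python sketch of the parsing rules above, assuming the parser returns a hypothetical `Param` tuple (kind, lexical value, datatype URI) rather than any existing data.world type. The regular expressions are the ones from the proposal, tried in the proposal's order; `fullmatch` is used so that a trailing newline cannot satisfy `$`.

```python
import re
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

class Param(NamedTuple):
    # Hypothetical result type for one parsed parameter value.
    kind: str                       # "literal" or "uri"
    value: str                      # lexical form (literals) or URI text
    datatype: Optional[str] = None  # datatype URI for literals; None for URIs

# "RDF" and "safe" patterns, copied from the proposal.
RDF_LITERAL = re.compile(r'^"(.*)"\^\^<([^<>]*)>$')
SAFE_STRING = re.compile(r'^"(.*)"$')
SAFE_URI = re.compile(r'^<(.*)>$')

# "Simple" patterns, in the order the proposal lists them.
SIMPLE_RULES = [
    (re.compile(r'^([0-9]+)$'), "literal", XSD + "integer"),
    (re.compile(r'^([0-9]*[.][0-9]+)$'), "literal", XSD + "decimal"),
    (re.compile(r'^(true|false)$'), "literal", XSD + "boolean"),
    (re.compile(r'^([a-z]+://.*)$'), "uri", None),
]

def parse_param(raw: str) -> Param:
    """Classify one raw parameter string using the RDF / safe / simple rules."""
    m = RDF_LITERAL.fullmatch(raw)
    if m:
        return Param("literal", m.group(1), m.group(2))
    m = SAFE_STRING.fullmatch(raw)
    if m:
        return Param("literal", m.group(1), XSD + "string")
    m = SAFE_URI.fullmatch(raw)
    if m:
        return Param("uri", m.group(1))
    for pattern, kind, datatype in SIMPLE_RULES:
        m = pattern.fullmatch(raw)
        if m:
            return Param(kind, m.group(1), datatype)
    # Nothing matched: treat the whole value as a plain String.
    return Param("literal", raw, XSD + "string")

for raw in ['"4.2"^^<http://www.w3.org/2001/XMLSchema#decimal>', '"3"',
            "<abcdef>", "3", "4.2", "true", "https://data.world/", "abcdef"]:
    print(raw, "->", parse_param(raw))
```

As written, the simple patterns accept no sign or exponent, so a value like -3 falls through to the String fallback; a client that needs it typed as an integer would use the RDF syntax.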

Parameter Names and Values

For SPARQL:

SPARQL supports named parameters, and variables in queries can be written either as ?var or $var. It's a very common convention to use ?var for variables that are meant to be matched and $var for variables that are bound at query execution. Because of that, using the $ syntax for query string parameters is a common way to pass bound variables on an HTTP URL. No reason we shouldn't use that syntax here:

  .../sparql/user/dataset?query=<QUERY>&$var1=<VALUE1>&$var2=<VALUE2>

where <VALUE1> and <VALUE2> are values according to the spec above.
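A sketch of how a client might build such a URL with Python's standard `urllib.parse`; the `sparql_url` helper and the relative path are illustrative only. Values are passed through as raw strings in the syntax above, and `urlencode` does the URL-encoding, with `safe="$"` keeping the `$` prefix readable.

```python
from typing import Mapping
from urllib.parse import parse_qs, urlencode, urlsplit

def sparql_url(base: str, query: str, bindings: Mapping[str, str]) -> str:
    """Build a GET URL with the query plus one $-prefixed parameter per binding."""
    params = {"query": query}
    for name, raw_value in bindings.items():
        params["$" + name] = raw_value  # raw_value uses the RDF/safe/simple syntax
    # safe="$" leaves the $ unescaped; quotes, <, >, / and spaces are encoded.
    return base + "?" + urlencode(params, safe="$")

# Hypothetical path; user/dataset are placeholders, as in the proposal.
url = sparql_url("/sparql/user/dataset",
                 "SELECT ?s WHERE { ?s ?p $o }",
                 {"o": '"abcdef"', "p": "<https://data.world/>"})
print(url)
print(parse_qs(urlsplit(url).query))  # decoding recovers the raw values
```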

For SQL:

SQL only supports positional parameters. Luckily, HTTP query strings have a straightforward way to specify an arbitrary-length sequence of values for one parameter: simply repeat the same parameter name, and the multiple instances are treated as a sequence of values. I'm proposing that we use p for the name of our parameter variable (to keep the URLs nice and short), but param or parameter would work too:

  .../sql/user/dataset?query=<QUERY>&p=<VALUE1>&p=<VALUE2>

where, again, <VALUE1> and <VALUE2> are values according to the spec above.
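With Python's standard library, repeated parameters come from `urlencode(..., doseq=True)`, and `parse_qs` returns them as a list in order of appearance, which is what positional binding needs. A sketch; `sql_url` and the path are hypothetical:

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlsplit

def sql_url(base: str, query: str, values: Sequence[str]) -> str:
    """Build a GET URL with one repeated p= parameter per positional value."""
    # doseq=True expands the list into p=...&p=... in list order.
    return base + "?" + urlencode({"query": query, "p": list(values)}, doseq=True)

url = sql_url("/sql/user/dataset",
              "SELECT * FROM MyTable WHERE a = ? AND b = ?",
              ["3", '"abcdef"'])
print(url)
print(parse_qs(urlsplit(url).query)["p"])  # positional order is preserved
```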

In both cases (SPARQL and SQL) the way we interpret values is identical. Clearly the values will need to be URL-encoded when actually sent on a URL (as with any value)...

@shawnsmith
Contributor

Given that this is a JSON API, I'd like the ability to specify parameters using JSON formatting:

Standard JSON primitive types:
3                  <- xsd:integer
3.4                <- xsd:double
true               <- xsd:boolean
"foo"              <- xsd:string
"foo\nbar"         <- xsd:string using JSON escaping rules

Fall back to a JSON object patterned on "application/sparql-results+json" for other data types:

{"datatype": "http://www.w3.org/2001/XMLSchema#decimal", "value": "3.4"}

For SQL queries this allows using the most common primitive types (boolean, integer, double, string) without need to understand any RDF concepts.
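One way a server might turn these JSON values into typed literals, sketched in Python on already-decoded JSON; the function name and the (lexical form, datatype URI) return shape are hypothetical. No string sniffing is needed: the JSON type alone picks the XSD datatype.

```python
import json
import math
from typing import Any, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

def json_param_to_literal(value: Any) -> Tuple[str, str]:
    """Map one decoded JSON parameter to (lexical form, datatype URI)."""
    # bool first: in Python, True and False are also instances of int.
    if isinstance(value, bool):
        return ("true" if value else "false", XSD + "boolean")
    if isinstance(value, int):
        return (str(value), XSD + "integer")
    if isinstance(value, float):
        if not math.isfinite(value):  # json.loads accepts NaN/Infinity by default
            raise ValueError("non-finite numbers are not valid parameters")
        return (repr(value), XSD + "double")
    if isinstance(value, str):
        return (value, XSD + "string")  # JSON escapes are already decoded here
    if isinstance(value, dict) and "datatype" in value and "value" in value:
        return (value["value"], value["datatype"])  # explicit fallback form
    raise ValueError(f"unsupported parameter value: {value!r}")

params = json.loads('[3, 3.4, true, "foo\\nbar", '
                    '{"datatype": "http://www.w3.org/2001/XMLSchema#decimal", "value": "3.4"}]')
for p in params:
    print(json_param_to_literal(p))
```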

Also, I think this will work nicely with a purely JSON request object. For example, here's how a SQL query might support positional parameters:

{
   "language": "SQL",
   "query": "SELECT * FROM MyTable WHERE Field >= ?",
   "parameters": [ "foo" ]
}

And a SPARQL query should support named parameters (here type is either uri or literal as used with application/sparql-results+json):

{
   "language": "SPARQL",
   "query": "select ?s ?o WHERE { ?s ?p ?o }",
   "parameters": {
      "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#Class"}
   }
}
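A sketch of server-side shape checking for these request bodies, assuming the field names shown above (`language`, `query`, `parameters`); the function name and error messages are hypothetical. SQL parameters must be an array because binding is positional, and SPARQL parameters an object keyed by variable name, each term using the `type`/`value` shape of application/sparql-results+json.

```python
import json
from typing import Any, Dict

def validate_query_request(body: str) -> Dict[str, Any]:
    """Check the shape of a JSON query request as proposed in this thread."""
    req = json.loads(body)
    if not isinstance(req, dict):
        raise ValueError("request body must be a JSON object")
    if not isinstance(req.get("query"), str):
        raise ValueError("'query' must be a string")
    language = req.get("language")
    if language == "SQL":
        params = req.get("parameters", [])
        # SQL placeholders are positional, so parameters form an array.
        if not isinstance(params, list):
            raise ValueError("SQL parameters must be a JSON array")
    elif language == "SPARQL":
        params = req.get("parameters", {})
        # SPARQL bindings are named, keyed by variable name.
        if not isinstance(params, dict):
            raise ValueError("SPARQL parameters must be a JSON object")
        for name, term in params.items():
            if not isinstance(term, dict) or term.get("type") not in ("uri", "literal"):
                raise ValueError(f"parameter {name!r}: type must be 'uri' or 'literal'")
            if not isinstance(term.get("value"), str):
                raise ValueError(f"parameter {name!r}: value must be a string")
    else:
        raise ValueError(f"unsupported language: {language!r}")
    return {"language": language, "query": req["query"], "parameters": params}
```

A missing `parameters` field is treated here as an empty array or object, so unparameterized queries need no extra boilerplate.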

I'd like to move SQL/SPARQL query parameters out of the URL and into the request body.

@shawnsmith
Contributor

I'll point out that in my proposal we never have to parse a string to guess what its type is, and there are no data.world-specific string escaping rules to learn.

@bryonjacob
Author

For most purposes I'm okay with going your direction instead. It means that all queries need to be POSTs, which is a change, but we can deprecate the old endpoints if we go this route (leaving them in place to support the SDKs in the wild that use them).

One caveat is that our SPARQL query endpoints are currently valid SPARQL endpoints (plus the addition of an auth filter), so we need to do our best to keep it that way. That probably means supporting GET as well as POST, which in turn means we'd need to accept SPARQL query parameters on the URL's query string. This is all a bit hand-wavy, since the SPARQL spec doesn't define handling for parameters, so any implementation is going to be vendor-specific, but it should be ADDITIVE to spec-compliant SPARQL.
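For GET requests, one additive approach is to leave the SPARQL 1.1 Protocol's own query-string parameters (such as `query`, `default-graph-uri`, `named-graph-uri`) untouched and treat only `$`-prefixed keys as bindings. A sketch of that split, assuming each variable may be bound at most once; the function name is hypothetical:

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

def split_sparql_params(query_string: str) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Separate ordinary protocol parameters from $-prefixed variable bindings."""
    protocol: Dict[str, List[str]] = {}
    bindings: Dict[str, str] = {}
    # parse_qs decodes both names and values (%24o -> $o, + -> space).
    for key, values in parse_qs(query_string, keep_blank_values=True).items():
        if key.startswith("$") and len(key) > 1:
            if len(values) != 1:
                raise ValueError(f"variable {key} is bound more than once")
            bindings[key[1:]] = values[0]  # raw value, still to be typed
        else:
            protocol[key] = values  # e.g. query, default-graph-uri, named-graph-uri
    return protocol, bindings

protocol, bindings = split_sparql_params(
    "query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%24o+%7D"
    "&default-graph-uri=http%3A%2F%2Fexample.org%2Fg"
    "&%24o=%22abcdef%22")
print(protocol)
print(bindings)
```

A client that never sends `$` keys sees plain SPARQL protocol behavior, which is what keeps the extension additive.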

My main point in writing this up was to point out that the current API for specifying parameterized queries is a mess and we need a better version, with a formal specification, and it needs to be in the doc. I like your proposal a lot @shawnsmith and if we're able to rationalize the SPARQL protocol compatibility I'm all for it.

@shawnsmith
Contributor

I agree it’s a mess. How about implementing two apis:

  • Spec-compatible SPARQL endpoint w/ $var named parameters à la common SPARQL database extensions

  • Swagger-friendly public API data.world query endpoint that requires POST w/ a JSON body and supports both SQL and SPARQL, plus whatever other extensions we find useful. This API should look and act like the rest of our public APIs.

Do we need SQL support in the first, or can we leave it out?

One of the issues we've had w/ parameterized queries is that Swagger doesn't support a variable number of query parameters. The $var convention can't be represented cleanly w/ Swagger.

@bryonjacob
Author

Agreed with this entire plan: add new endpoints as needed, and deprecate entire old endpoints, or parameters on existing ones, as needed.

No, we don't need SQL support in the first. The existing SQL protocol looks a lot like the SPARQL protocol because that was the pattern we were emulating.

And to skateboard this more quickly, I'd even say we just leave parameterized queries off the spec-compatible SPARQL endpoint and add them to the Swagger-friendly query endpoint.
