-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backslash escape character is always stripped #224
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add BDD tests.
f80f2f1
to
8b77714
Compare
DO NOT MERGE To avoid forcing users to re-load, we will save merging this until we have to break backward compatibility for other reasons. Changes required when merging:
|
I am writing a hard coded workaround to this issue atm, as shown here https://forum.vaticle.com/t/escape-hell-handling-string-values/166/8 |
We have further considerations for when this issue comes back: One proposal is that the parser should parse "\n" pair of raw text characters as one newline character, which is then serialisable to the server. In client-java, we have to use an unformatted-toString() to make sure that we don't introduce new tabs into strings with newlines in them with this solution (alternatively, we can introduce templated queries and serialise attribute values separately). On printing of an attribute value, we would print a newline character as a pair of "\n" characters, making this operation fully reversible. If the user wishes to write a pair of "\n" characters explicitly, they would write "\n". One interesting edge case here is that writing an actual newline (eg. "enter" key) in console leaves another way of getting a newline character into the database - perhaps this one should be banned? |
hmm, so your suggestion would work, but currently it works anyway, just using a substitution approach as shown here https://forum.vaticle.com/t/escape-hell-handling-string-values/166/12 It just took me some time to work it out, and its the kind of thing that would be useful in documentation , for the less talented people like myself (i.e. why i posted the fix in the discussion forum) so certainly this should be way down the list of priorities, and I would need to change that code when you make the change |
DO NOT MERGE (see comment)
What is the goal of this PR?
We address various String escaping difficulties by which the user could not get a raw string containing a quote, or a backslash, into TypeDB without also serialising an escape character into the database. This is contrary to the existing database conventions.
We now expect that a user can insert any dataset into TypeDB by applying the following operation to their raw input strings:
This process will replace each
\
character with a\\
sequence, and then replace each"
with an\"
sequence. This will ensure that by the time the query is parsed, the user's raw data is held in memory on the server side.Example
If
raw_data
above containshello " world \
, then theescaped_data
will containhello \" world \\
. After the query is submitted to the server and parsed, the data value held for attributedata
will behello " world \
and this is the value that will be persisted into the database.What are the changes implemented in this PR?
\
as the official escape character"
and\
, but we always kept the escape character after parsing - as a result the user data plus backslashes ended up stored on disk.Existing database conventions:
Let's split query languages into two domains: console languages, and programmatic languages.
Programmatic languages include parametrised queries, for example in SQL a query can exclude attribute values, and pass them to the server separately to the parametrised query. This means escaping is never needed (and protects against SQL injection attacks) because a String attribute is passed directly to the server as a string. TypeDB does not yet support parametrised queries. As a result, we must always construct query strings that are parsed by the parser in full. Neo4j and MongoDB have their own parametrised input mechanisms that behave similarly.
Console languages are languages whose queries must be inputted entirely as strings. All TypeQL queries are provided as strings, so let's define them as console languages. Console languages all have an escape mechanism to input quoting into string attribute values. SQL and Neo4j have console languages available that use different conventions:
SQL uses the quoting character itself as the escape character:
This will be parsed as
it's escaped
and utilised in the query exection. The escaping'
is removed.Neo4J uses backslashes:
In this case, the parsed value removes the escape character
\
in both usages.We can see that existing console languages use the convention of stripping the escape characters from the input, which TypeQL now matches.