Hi team,
I have come across some unexpected behaviour with blobs read from the Cassandra database, and I would like to discuss it and make sure it is fixed correctly. I currently have a script that works around the bug, but it feels more like a hack than an actual patch, so I would like to address it properly.
The problem appeared when I tried to read the value of the test column (which is a blob) from the test_by_score table. The relevant line seems to be:
https://github.com/gwicke/testreduce/blob/master/CassandraBackend.js#L830
Sometimes, when we read a value from Cassandra and call the toString() method on it, the result is not a valid JSON string. Either Cassandra does not convert the blob to a valid JSON string, or we are inserting the wrong string during testing in this script:
https://github.com/gwicke/testreduce/blob/master/articles/importCassandra.js
Specifically, the string returned by toString() looked something like this:
{"prefix": "enwiki", "title": ""Aghnadarragh""}
The problem is that the title ""Aghnadarragh"" is wrapped in doubled " characters instead of single ones, which makes the JSON invalid. One way to fix it is a regex that replaces each doubled " with a single ", as I have in:
https://github.com/brando90/testreduce/blob/feature_oneSkip_oneFail_OtherFails_FlaggedRegressions/server.js#L480-L484
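For readers who do not want to follow the link, here is a minimal sketch of that kind of workaround. It is not the code at the linked lines; the helper name `collapseDoubledQuotes` and the exact pattern are assumptions made for illustration only.

```javascript
// Hypothetical helper (not the linked server.js code): collapses a value
// wrapped in doubled quotes, e.g. ""Aghnadarragh"" -> "Aghnadarragh".
function collapseDoubledQuotes(raw) {
  // Matches ""value"" where value itself contains no quote characters.
  return raw.replace(/""([^"]*)""/g, '"$1"');
}

// The invalid string reported in this issue.
const raw = '{"prefix": "enwiki", "title": ""Aghnadarragh""}';

const fixed = collapseDoubledQuotes(raw);
const parsed = JSON.parse(fixed); // no longer throws
console.log(parsed.title);
```

A pattern like this can also rewrite input that was already valid: `collapseDoubledQuotes('["", ""]')` returns `[", "]`, which still parses but no longer means the same thing. That is part of why the regex feels fragile to me.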
This seems to "fix it", since it ensures that invalid JSON is not fed to the parser. However, it seems like an odd solution.
I have looked into the actual data file where Aghnadarragh lives, at:
https://github.com/brando90/testreduce/blob/feature_oneSkip_oneFail_OtherFails_FlaggedRegressions/articles/enwiki-10000.json#L4
and the title does not seem to be inserted into the database with doubled quotes, since the inserting function at:
https://github.com/gwicke/testreduce/blob/master/articles/importCassandra.js#L21-L23
does not add doubled quotes, nor does the original data (in my previous link) contain them.
My guess is that the .toString() call inserted the doubled quotes for some reason.
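One way to test this guess outside Cassandra is a plain Node.js round trip, sketched below. It assumes the blob arrives as a Node `Buffer` and that the stored bytes are UTF-8 encoded JSON; the sample object is made up to mirror the failing row, and the `Buffer` here is built locally, not read from the database.

```javascript
// Assumption: the driver hands the blob back as a Node Buffer of UTF-8 bytes.
const original = { prefix: 'enwiki', title: 'Aghnadarragh' };
const json = JSON.stringify(original);

// Simulate what would be stored, then decode it the way the backend does.
const blob = Buffer.from(json, 'utf8');
const text = blob.toString('utf8');

// Decoding alone should return exactly the bytes that were encoded.
const roundTripsCleanly = text === json;

// Check whether the raw bytes themselves already contain a doubled quote.
const hasDoubledQuoteBytes = blob.includes('""');

console.log({ roundTripsCleanly, hasDoubledQuoteBytes });
```

If the same `includes('""')` check returns true on a blob read from the test_by_score table, the doubled quotes are already in the stored bytes, which would point at the insert path or at something in between and not at toString().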
I am not sure whether this is because we are using Cassandra 2.0.5 specifically or because it is the intended behaviour of a BLOB.
I would like to know what the rest of the team thinks about this issue. I was also wondering why that column (test) cannot be a VARCHAR instead of a BLOB. Is there a specific reason why we chose a BLOB, and would switching to a VARCHAR fix the problem? I am a little worried that, although the regex I have fixes the doubled-quote problem for now, we have no way of knowing that we won't have to keep adding more odd regexes (which shouldn't be there in the first place) to fix future unexpected JSON string problems. Is this an issue with Cassandra 2.0.5? Would it be fixed by upgrading to the most recent Cassandra version? Is that even an option?
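As a possible alternative to stacking up regexes, the parse could be guarded so that bad rows are reported with their raw contents instead of being silently patched. This is only a sketch: `parseTestBlob` is a hypothetical name, and it again assumes the blob is a Node `Buffer` holding UTF-8 JSON.

```javascript
// Hypothetical helper: parse a blob as JSON, and on failure return the raw
// text and hex bytes so the malformed row can be inspected and reported.
function parseTestBlob(blob) {
  const text = blob.toString('utf8');
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return {
      ok: false,
      raw: text,
      hex: blob.toString('hex'),
      error: err.message
    };
  }
}

// A well-formed row and the malformed row from this issue.
const good = parseTestBlob(Buffer.from('{"prefix": "enwiki", "title": "Aghnadarragh"}'));
const bad = parseTestBlob(Buffer.from('{"prefix": "enwiki", "title": ""Aghnadarragh""}'));

console.log(good.ok, bad.ok);
```

This does not fix the underlying data, but it would tell us how many rows are affected and what they actually contain before we decide between a schema change and an upgrade.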