Bug when inserting title blobs and returning them from Cassandra db #43

Open
brando90 opened this issue May 3, 2014 · 0 comments
brando90 commented May 3, 2014

Hi team,

I have run into unexpected behaviour with blobs returned from the Cassandra database, and I would like to discuss it and make sure it is fixed correctly. I currently have a script that works around the bug, but it feels more like a hack than an actual patch, so I would like to address it properly.

I hit the problem when reading the value of the test column (which is a blob) from the test_by_score table. The relevant line is:

https://github.com/gwicke/testreduce/blob/master/CassandraBackend.js#L830

Sometimes, when we read a value from Cassandra and call the toString() method on it, the result is not valid JSON. Either the blob is not being converted back into the string we stored, or we are inserting a malformed string during testing in the script:

https://github.com/gwicke/testreduce/blob/master/articles/importCassandra.js

Specifically, the string returned by toString() looked something like the following:

{"prefix": "enwiki", "title": ""Aghnadarragh""}

The problem is that the title ""Aghnadarragh"" is wrapped in doubled " characters instead of single ones, which is invalid JSON. One way to fix it is a regex that replaces each doubled " with a single ", as I have in:

https://github.com/brando90/testreduce/blob/feature_oneSkip_oneFail_OtherFails_FlaggedRegressions/server.js#L480-L484

This seems to "fix it", since it makes sure invalid JSON is never fed to the parser, but it seems like an odd solution.
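To make the workaround concrete, here is a minimal sketch of the kind of repair I mean. It is not the code at the server.js link above: the function name parseTestBlob and both regexes are made up for illustration. It only touches the string if a normal JSON.parse fails first, so valid blobs are never rewritten.

```javascript
// Hypothetical helper for illustration; not the actual server.js code.
// Tries a normal parse first and only repairs doubled quotes on failure.
function parseTestBlob(raw) {
  try {
    return JSON.parse(raw);
  } catch (firstError) {
    const repaired = raw
      // Opening side:  : ""Aghnadarragh  ->  : "Aghnadarragh
      // The lookahead leaves genuine empty strings (: "") alone.
      .replace(/:\s*""(?=[^",}\s])/g, ': "')
      // Closing side:  Aghnadarragh""}  ->  Aghnadarragh"}
      .replace(/([^":,\s])""(?=\s*[,}])/g, '$1"');
    // Still throws if the string is broken in some other way.
    return JSON.parse(repaired);
  }
}

const parsed = parseTestBlob('{"prefix": "enwiki", "title": ""Aghnadarragh""}');
console.log(parsed.title);
```

Even in this form it is still a patch over the symptom: any other kind of malformed string would need yet another regex, which is exactly my worry.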

I have looked at the actual data file where Aghnadarragh lives:

https://github.com/brando90/testreduce/blob/feature_oneSkip_oneFail_OtherFails_FlaggedRegressions/articles/enwiki-10000.json#L4

The title does not have doubled quotes there, and the inserting function at:

https://github.com/gwicke/testreduce/blob/master/articles/importCassandra.js#L21-L23

does not appear to add them either.

My guess is that, for some reason, .toString() is inserting the doubled quotes.
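One way to test that guess outside the database is sketched below. It assumes the driver hands the blob back as a Node Buffer, which I have not verified for our driver. A plain Buffer round trip returns the stored string unchanged, so the second half shows one hypothetical way the exact symptom could be produced on the insert side instead: concatenating a title that already carries its own quotes. The function buildByConcatenation is invented for this example and is not taken from importCassandra.js.

```javascript
// Part 1: a Buffer round trip does not add or remove characters.
const stored = JSON.stringify({ prefix: 'enwiki', title: 'Aghnadarragh' });
const blob = Buffer.from(stored, 'utf8');   // stand-in for the blob column value
const roundTripped = blob.toString('utf8');
console.log(roundTripped === stored);

// Part 2 (hypothetical): building the JSON by hand from a title that is
// already a quoted JSON string reproduces the doubled quotes.
function buildByConcatenation(prefix, title) {
  return '{"prefix": "' + prefix + '", "title": "' + title + '"}';
}

const alreadyQuoted = JSON.stringify('Aghnadarragh'); // includes its own quotes
const broken = buildByConcatenation('enwiki', alreadyQuoted);
console.log(broken);

// JSON.parse rejects the result, which matches the failure I am seeing.
let parseFailed = false;
try {
  JSON.parse(broken);
} catch (err) {
  parseFailed = true;
}
console.log(parseFailed);
```

If logging the raw string right before the insert shows single quotes, this second explanation is ruled out and the read path is the place to look.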

I am not sure whether this happens because we are using Cassandra 2.0.5 specifically, or because it is the intended behaviour of a BLOB.

I wanted to know what the rest of the team thinks about this issue. I was also wondering why that column (test) can't be a VARCHAR instead of a BLOB. Is there a specific reason we chose a BLOB, and would switching to a VARCHAR fix the problem? I am a little worried that the regex I have fixes the double quote problem for now, but how do we know we won't have to keep adding more odd regexes (which shouldn't be there in the first place) to handle future unexpected JSON string problems? Is this an issue with Cassandra 2.0.5? Would it be fixed by using the most up-to-date version of Cassandra? Is that even an option?

brando90 added the bug label May 3, 2014