-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Baseline Django Field Tests #62
base: master
Are you sure you want to change the base?
Conversation
django_scylla/cql/compiler.py
Outdated
from django.db.models.sql import compiler | ||
|
||
from datetime import timedelta | ||
from cassandra.util import Duration | ||
|
||
|
||
def unique_rowid(): | ||
# TODO: guarantee that this is globally unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This caused a failure in the test for SmallAutoField
because the number generated was beyond the 32767 limit for short ints. Which leads me to:
I don't think supporting SmallAutoField
is very important, but I do think that models should be able to generate unique identifiers consistently without risk of collision. I think that row IDs generated with this driver should use the standard UUID, since Scylla does not actually support the paradigm of "unique counters".
Realistically, the only reason you would specifically want to preserve the tradition of keeping row_ids as ints is because it gives you an idea of when an object was created chronologically and because it makes URL slugs cleaner, but if you are consciously choosing to use Scylla I think that's an acceptable tradeoff and you can model that requirement in better ways.
@r4fek are you opposed to the idea of making these UUID values by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all. Let's use uuids by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an update to this:
There are actually many major obstacles to using UUIDs as the default for AutoFields. Django assumes by default that all AutoFields are subclasses of IntegerField [1], and the workarounds for that require a lot of monkeypatching. The canonical approach is to have an AutoID field + a separate UUID field, but that sort of defeats the purpose since Scylla can't really do that anyway...
Transparently overriding AutoFields into UUIDs doesn't work so well because the generated UUID gets cast as an integer regardless internally due to field validation in many places (and UUIDs translate to 128 bit integers, much larger than even bigint's range).
I did try creating a UUIDAutoField that subclasses the existing AutoField so it could be used by setting DEFAULT_AUTO_FIELD
to UUIDAutoField
. With that I was able to override the existing validation methods and prevent it from being cast to an integer, but even in that situation it would lead to column conflict errors since internal models still expect a BigAutoField
value-- meaning it would generate a UUID value and pass every validation check internally but fail once the actual query executed.
Django devs seem to be aware of this and working on it, so I figured at this point it is probably best to wait for full support with an official UUIDAutoField and double down on the original approach (all AutoFields as BigInts, generate suitably random number). I've reverted back to the original approach and added a few additional steps so that generated values are guaranteed to be more random.
[1] https://code.djangoproject.com/ticket/32577
[2] https://code.djangoproject.com/ticket/33450
|
||
# Ensure that the other side of this relationship has been created and is selectable | ||
for library in libraries: | ||
assert library.book_set is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the SmallAutoField
test, this is currently the only failure.
Creation works as expected, but the ORM's internal JOIN behavior is hard to work around when these values are being selected afterward.
pid = getpid() | ||
sys_random = SystemRandom().randint(0, 65535) | ||
|
||
return timestamp + counter['value'] + pid + sys_random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this approach means that given the same code running simultaneously: each worker will return a different pid, each thread of that worker will return a different counter, and each call will return a different sys_random
value.
The counter in particular is worth noting because it uses a little known "pitfall" of Python, which is that default values specified as dicts are persistent between runs.
e.g.
>>> def foo(counter={'value':0}):
... counter['value'] += 1
... return counter['value']
...
>>> foo()
1
>>> foo()
2
>>> foo()
3
>>> foo()
4
This PR adds testing and confirms functionality for all of Django's ORM coverage and field types.
mixer
has been added as a dev dependency for test fuzzing.I've opened this as a draft for discussion on a few points.