Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INTPYTHON-348 add support for QuerySet.raw_aggregate() #183

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

aclark4life
Copy link
Collaborator

@aclark4life aclark4life commented Nov 4, 2024

test changes: mongodb-forks/django#12

django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
@aclark4life
Copy link
Collaborator Author

aclark4life commented Nov 4, 2024

For @Jibola

Creating test database for alias 'default'...
System check identified no issues (1 silenced).
E
======================================================================
ERROR: test_raw_query_lazy (raw_query.tests.RawQueryTests.test_raw_query_lazy)
Raw queries are lazy: they aren't actually executed until they're
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/just-django/src/django/tests/raw_query/tests.py", line 150, in test_raw_query_lazy
    list(q)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2127, in __iter__
    self._fetch_all()
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2114, in _fetch_all
    self._result_cache = list(self.iterator())
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2141, in iterator
    yield from RawModelIterable(self)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 160, in __iter__
    query_iterator = iter(query)
                     ^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 339, in __iter__
    self.cursor = self._execute_query()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 369, in _execute_query
    return collection.aggregate(self.sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/pymongo/pymongo/synchronous/collection.py", line 2978, in aggregate
    return self._aggregate(
           ^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/pymongo/pymongo/_csot.py", line 119, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/pymongo/pymongo/synchronous/collection.py", line 2876, in _aggregate
    cmd = aggregation_command(
          ^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/pymongo/pymongo/synchronous/aggregation.py", line 66, in __init__
    pipeline = common.validate_list("pipeline", pipeline)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/pymongo/pymongo/common.py", line 517, in validate_list
    raise TypeError(f"{option} must be a list")
TypeError: pipeline must be a list

----------------------------------------------------------------------
Ran 1 test in 0.037s

FAILED (errors=1)
Destroying test database for alias 'default'...

Actually w/test data updated, back to this:


Testing against Django installed in '/Users/alex.clark/Developer/just-django/src/django/django'
Found 1 test(s).
Creating test database for alias 'default'...
System check identified no issues (1 silenced).
E
======================================================================
ERROR: test_raw_query_lazy (raw_query.tests.RawQueryTests.test_raw_query_lazy)
Raw queries are lazy: they aren't actually executed until they're
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/just-django/src/django/tests/raw_query/tests.py", line 151, in test_raw_query_lazy
    list(q)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2127, in __iter__
    self._fetch_all()
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2114, in _fetch_all
    self._result_cache = list(self.iterator())
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2141, in iterator
    yield from RawModelIterable(self)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 170, in __iter__
    raise exceptions.FieldDoesNotExist(
django.core.exceptions.FieldDoesNotExist: Raw query must include the primary key

----------------------------------------------------------------------
Ran 1 test in 0.032s

FAILED (errors=1)
Destroying test database for alias 'default'...

django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
@aclark4life
Copy link
Collaborator Author

One step further past primary key issue, this looks promising to me.


======================================================================
ERROR: test_raw_query_lazy (raw_query.tests.RawQueryTests.test_raw_query_lazy)
Raw queries are lazy: they aren't actually executed until they're
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/just-django/src/django/tests/raw_query/tests.py", line 151, in test_raw_query_lazy
    list(q)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2127, in __iter__
    self._fetch_all()
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2114, in _fetch_all
    self._result_cache = list(self.iterator())
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 2141, in iterator
    yield from RawModelIterable(self)
  File "/Users/alex.clark/Developer/just-django/src/django/django/db/models/query.py", line 181, in __iter__
    model_init_values = [values[pos] for pos in model_init_pos]
                         ~~~~~~^^^^^
KeyError: 0


Because values is a dict

-> model_init_values = [values[pos] for pos in model_init_pos]
(Pdb) values
{'_id': ObjectId('67296173f362ea5893fe2536'), 'first_name': 'Joe', 'last_name': 'Smith', 'dob': datetime.datetime(1950, 9, 20, 0, 0)}

which maybe can be fixed in MongoRawQuery

(Pdb) l
180  	            for values in query_iterator:
181  	                # Associate fields to values
182  	                model_init_values = [values[pos] for pos in model_init_pos]
183  	                instance = model_cls.from_db(db, model_init_names, model_init_values)
184  	                if annotation_fields:
185  	                    for column, pos in annotation_fields:
186  	                        setattr(instance, column, values[pos])
187  	                yield instance
188  	        finally:
189  	            # Done iterating the Query. If it has its own cursor, close it.
190  	            if hasattr(query, "cursor") and query.cursor:
(Pdb) type(query_iterator)
<class 'pymongo.synchronous.command_cursor.CommandCursor'>
(Pdb) type(query)
<class 'django_mongodb.query.MongoRawQuery'>
(Pdb) iter(query)
<pymongo.synchronous.command_cursor.CommandCursor object at 0x105c73a70>

@aclark4life aclark4life marked this pull request as ready for review November 5, 2024 19:30
@aclark4life
Copy link
Collaborator Author

aclark4life commented Nov 5, 2024

@timgraham @Jibola @WaVEV Any advice for getting a "real" mql attribute set via the __str__ method of MongoRawQuery ?

class MongoRawQuery(RawQuery):
    def __init__(self, sql, using, model, params=()):
        super().__init__(sql, using, params)
        self.model = model

    def __iter__(self):
        self.cursor = self._execute_query()
        if not connections[self.using].features.can_use_chunked_reads:
            result = list(self.cursor)                
        else:                              
            result = self.cursor
        return iter(result)           
                                       
    def _execute_query(self):    
        connection = connections[self.using]
        collection = connection.get_collection(self.model._meta.db_table)
        return collection.aggregate(self.sql)

    def get_columns(self):
        if self.cursor is None:
            self._execute_query()
        converter = connections[self.using].introspection.identifier_converter
        return [converter(column_name) for column_name in self.model._meta.fields]
                                                                             
            
    def __str__(self):
        return "self.mql"

Edit: never mind, misguided question. I've moved on to replacing the SQL in django tests with MQL. Thanks all.

django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
django_mongodb/query.py Outdated Show resolved Hide resolved
@timgraham
Copy link
Collaborator

test_missing_fields fails because self.queryset.columns returns all model fields (MongoRawQuery.get_columns()) rather than the fields in the query (RawQuery.get_columns() uses cursor.description). I'm not sure how we could solve this besides "peeking" at the first document to see what keys the document has.

@aclark4life
Copy link
Collaborator Author

test_missing_fields fails because self.queryset.columns returns all model fields (MongoRawQuery.get_columns()) rather than the fields in the query (RawQuery.get_columns() uses cursor.description). I'm not sure how we could solve this besides "peeking" at the first document to see what keys the document has.

@Jibola 🤔 ^

@aclark4life
Copy link
Collaborator Author

@Jibola @timgraham In case it helps, less args to $project reproduces the error


(Pdb) query
[{'$project': {'dob': 1, 'last_name': 1, 'first_name': 1, 'id': 1}}]

(Pdb) Author.objects.raw_mql(query)
<MongoRawQuerySet: [{'$project': {'dob': 1, 'last_name': 1, 'first_name': 1, 'id': 1}}]>

(Pdb) [i for i in Author.objects.raw_mql([{'$project': {'dob': 1} }])]
*** IndexError: list index out of range

@timgraham
Copy link
Collaborator

Right. What I meant by "peeking at the first document" is that we can't use resolve_model_init_order(). Instead we have to figure out the info that method generates (what columns and annotations are returned and what database converters we need to process them) by examining the keys of the first document in queryset_iterator.

@aclark4life
Copy link
Collaborator Author

E.g. something like this?


diff --git a/django_mongodb/query.py b/django_mongodb/query.py
index 89b0085..10b56ec 100644
--- a/django_mongodb/query.py
+++ b/django_mongodb/query.py
@@ -1,7 +1,7 @@
 from functools import reduce, wraps
 from operator import add as add_operator
 
-from django.core.exceptions import EmptyResultSet, FullResultSet
+from django.core.exceptions import EmptyResultSet, FullResultSet, FieldDoesNotExist
 from django.db import DatabaseError, IntegrityError, NotSupportedError, connections
 from django.db.models import QuerySet
 from django.db.models.expressions import Case, Col, When
@@ -360,6 +360,23 @@ class MongoRawQuerySet(RawQuerySet):
     def iterator(self):
         yield from MongoRawModelIterable(self)
 
+    def resolve_model_init_order(self, query):
+        """Resolve the init field names and value positions."""
+        converter = connections[self.db].introspection.identifier_converter
+        model_init_fields = [
+            f for f in self.model._meta.fields if converter(f.column) in self.columns and converter(f.column) in [i for i in query][0].keys()
+        ]
+        annotation_fields = [
+            (column, pos)
+            for pos, column in enumerate(self.columns)
+            if column not in self.model_fields
+        ]
+        model_init_order = [
+            self.columns.index(converter(f.column)) for f in model_init_fields
+        ]
+        model_init_names = [f.attname for f in model_init_fields]
+        return model_init_names, model_init_order, annotation_fields
+
 
 class MongoRawModelIterable(RawModelIterable):
     """
@@ -379,10 +396,10 @@ class MongoRawModelIterable(RawModelIterable):
                 model_init_names,
                 model_init_pos,
                 annotation_fields,
-            ) = self.queryset.resolve_model_init_order()
+            ) = self.queryset.resolve_model_init_order(query)
             model_cls = self.queryset.model
             if model_cls._meta.pk.attname not in model_init_names:
-                raise exceptions.FieldDoesNotExist(
+                raise FieldDoesNotExist(
                     "Raw query must include the primary key"
                 )
             fields = [self.queryset.model_fields.get(c) for c in self.queryset.columns]


That works for me along with mongodb-forks/django@1351d2c

@timgraham
Copy link
Collaborator

I haven't examined the details, but yea, something like that. Probably it would be a cleaner abstraction to pass a columns argument rather than query (and then could columns be used instead of self.columns in that method?).

Btw, in any code you copy you can can omit identifier_converter since that isn't customized by our backend and the default implementation returns the value unchanged.

django_mongodb/query.py Outdated Show resolved Hide resolved
Comment on lines 367 to 370
model_init_fields = [f for f in self.model._meta.fields if f.column in columns]
annotation_fields = [
(column, pos) for pos, column in enumerate(columns) if column not in self.model_fields
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does self.model._meta.fields differ from self.model_fields?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to pdb,

(Pdb) self.model._meta.fields
(<django_mongodb.fields.auto.ObjectIdAutoField: id>, <django.db.models.fields.CharField: first_name>, <django.db.models.fields.CharField: last_name>, <django.db.models.fields.DateField: dob>)
(Pdb) self.model_fields
{'_id': <django_mongodb.fields.auto.ObjectIdAutoField: id>, 'first_name': <django.db.models.fields.CharField: first_name>, 'last_name': <django.db.models.fields.CharField: last_name>, 'dob': <django.db.models.fields.DateField: dob>}
(Pdb) 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored so we don't have to override this method anymore.

@timgraham timgraham changed the title INTPYTHON-348 add support for QuerySet.raw_mql() INTPYTHON-348 add support for QuerySet.raw_aggregate() Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants