Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeDecodeError with UTF-8 queries #77

Open
karanlyons opened this issue Feb 15, 2017 · 12 comments
Open

UnicodeDecodeError with UTF-8 queries #77

karanlyons opened this issue Feb 15, 2017 · 12 comments

Comments

@karanlyons
Copy link

Traceback (most recent call last):
  File "/venv/pypy/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/venv/pypy/site-packages/django/db/models/query.py", line 449, in bulk_create
    self._batched_insert(objs_with_pk, fields, batch_size)
  File "/venv/pypy/site-packages/django/db/models/query.py", line 1062, in _batched_insert
    inserted_id = self._insert(item, fields=fields, using=self.db, return_id=True)
  File "/venv/pypy/site-packages/django/db/models/query.py", line 1045, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/venv/pypy/site-packages/django/db/models/sql/compiler.py", line 1054, in execute_sql
    cursor.execute(sql, params)
  File "/venv/pypy/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/venv/pypy/site-packages/psycopg2cffi/_impl/cursor.py", line 30, in check_closed_
    return func(self, *args, **kwargs)
  File "/venv/pypy/site-packages/psycopg2cffi/_impl/cursor.py", line 244, in execute
    self._query = _combine_cmd_params(query, parameters, conn)
  File "/venv/pypy/site-packages/psycopg2cffi/_impl/cursor.py", line 1002, in _combine_cmd_params
    return b''.join(parts)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128)

This might be a personal issue, but it seems like there may still be some lingering encoding issues with this package. I see that there’s some work done in this function from #57, but it’s only under python 3.

I’m getting this error in 2.7 with unicode_literals.

@lopuhin
Copy link
Member

lopuhin commented Feb 15, 2017

Thanks for the report! I don't think it's a personal issue, since it's Django that is creating the query.
What Django version are you using?

@karanlyons
Copy link
Author

Django 1.10, though I can likely test this in some other versions as well.

I’m fairly confident that just having _combine_cmd_params encode the strings in all versions of Python will do the trick, but I haven’t had a chance to sit down and debug it yet.

@karanlyons
Copy link
Author

Okay, the string causing issues is 'Club La Unión'. Since this isn’t in Py3 the strings don’t get encoded. Removing the if six.PY3 checks is enough to fix the problem.

I think what we’re trying to do here is encode the string if it’s unicode, right? If so, just checking against six.text_type is enough for both versions of Python, since in 2.7 that’s unicode, and str in 3.

So if that all sounds right, I think a two line pull request should fix our problems 😄

@lopuhin
Copy link
Member

lopuhin commented Feb 16, 2017

@karanlyons a PR that fixes an issue for you is more then welcome! Even better if it comes with tests, but if not I'll add them. I'm not sure yet which six.PY3 check you mean, but I haven't debugged the issue yet.

@karanlyons
Copy link
Author

Sure, give me a little bit to write the tests and I’ll submit a PR we can look at.

@karanlyons
Copy link
Author

karanlyons commented Feb 16, 2017

Alright, worryingly I’m having a hard time reproducing this in a test. The way this query is generated in my Django codebase is...questionable, but the error also happens with Django’s bulk_create. Still, I can’t get it to throw an error in either CPython or PyPy once I start reducing it down.

I’m not yet convinced that this isn’t actually my fault.

@karanlyons
Copy link
Author

karanlyons commented Feb 16, 2017

Okay, so the issue turned out to be completely different than I thought:

Django has a DateTimeRangeField in django.contrib.postgres.fields but it doesn’t support setting custom bounds. So I had a subclass that allowed me to set bounds for a field and have it instantiate DateTimeTZRange with those bounds. Basically, this:

from __future__ import unicode_literals
from psycopg2.extras import DateTimeTZRange
from django.contrib.postgres.fields import DateTimeRangeField


class DateTimeRangeFieldWithBounds(DateTimeRangeField):
	range_type = DateTimeTZRange
	
	def __init__(self, *args, **kwargs):
		self.default_bounds = kwargs.pop('default_bounds', '[)')
		super(DateTimeRangeFieldWithBounds, self).__init__(*args, **kwargs)
	...

unicode_literals means that that the bounds string is actually unicode, which meant (for reasons I haven’t sussed out yet) that the final tstzrange string that made its way into _combine_cmd_params was unicode instead of bytes/str.

Making the bounds kwarg bytes instead of unicode fixes everything in PyPy. In CPython there’s no problem with either.

And...I’m not sure where this leaves us. There definitely is a difference between what happens on CPython with psycopg2 and PyPy with psycopg2cffi, but I’m not totally certain where in that large variable surface the problem actually lies.

@lopuhin
Copy link
Member

lopuhin commented Feb 16, 2017

Thanks for digging into it @karanlyons ! I think I finally understand the issue and what six.PY3 check you were talking about. Looks like in some cases https://github.com/chtd/psycopg2cffi/blob/master/psycopg2cffi/_impl/adapters.py#L288 can return unicode under python 2 as well. I need to check with psycopg2 - if conversion to unicode should always happen in the adapters, I'll fix it there, and if not - then we can just remove six.PY3 check as you suggested.

@karanlyons
Copy link
Author

Yeah, I think just removing the six.PY3 checks in _combine_cmd_params should be fine, but we may also want to make sure the return types for adapters have parity between psycopg2 and psycopg2cffi for good measure (especially in case someone’s using these “internal” functions as opposed to just working through the higher level API).

@lopuhin
Copy link
Member

lopuhin commented Feb 16, 2017

@karanlyons hm, I just checked how _combine_cmd_params works with unicode both in CPython 2.7 and in PyPy, and I could not reproduce it yet. This is what I tried:

In [1]: from psycopg2cffi import connect
In [2]: c = connect('host=localhost')
In [3]: from psycopg2cffi._impl.cursor import _combine_cmd_params
In [4]: _combine_cmd_params('select %(foo)s from x', {'foo': u'ё'}, c)
Out[4]: "select '\xd1\x91' from x"

Can you check the arguments (and their types) that cause _combine_cmd_params(query, parameters, conn) to fail?

@karanlyons
Copy link
Author

karanlyons commented Feb 16, 2017

import psycopg2cffi
from psycopg2cffi.extras import DateTimeTZRange
from psycopg2cffi._impl.cursor import _combine_cmd_params
from datetime import datetime

conn = psycopg2cffi.connect("dbname='psycopg2_test' user='postgres' host='localhost'")

ascii_string = 'Test'
unicode_string = u'T\xe9st'
ascii_range = DateTimeTZRange(datetime.now(), datetime.now(), '[]')
unicode_range = DateTimeTZRange(datetime.now(), datetime.now(), u'[]')

# Works
_combine_cmd_params('%s', [ascii_string], conn)
_combine_cmd_params('%s', [unicode_string], conn)
_combine_cmd_params('%s', [ascii_range], conn)
_combine_cmd_params('%s', [unicode_range], conn)

_combine_cmd_params('%s %s', [ascii_string, ascii_range], conn)
_combine_cmd_params('%s %s', [unicode_string, ascii_range], conn)
_combine_cmd_params('%s %s', [ascii_range, unicode_range], conn)

# Fails
_combine_cmd_params('%s %s', [unicode_string, unicode_range], conn)

Please tell me that last one fails for you too. This bug is either more complex than I thought or I’m insane, and I’m really hoping for the former.

@lopuhin
Copy link
Member

lopuhin commented Feb 16, 2017

Yes, that does fail for me, thanks a lot for building a reproducer! Technically this is a bug in how Range objects are adapted, but the actual issue is deeper - b helper defined here https://github.com/chtd/psycopg2cffi/blob/master/psycopg2cffi/extensions.py#L69 is not what you would expect given it's name :) Ideally I should get rid of it - luckily it's used only in _range and in tests, so it should not touch too much code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants