@@ -3,8 +3,8 @@ Unicode data
 ============

 Django natively supports Unicode data everywhere. Providing your database can
-somehow store the data, you can safely pass around Unicode strings to
-templates, models and the database.
+somehow store the data, you can safely pass around strings to templates,
+models, and the database.

 This document tells you what you need to know if you're writing applications
 that use data or templates that are encoded in something other than ASCII.
@@ -35,11 +35,10 @@ able to store certain characters in the database, and information will be lost.
 .. _section 2: https://docs.oracle.com/database/121/NLSPG/ch2charset.htm#NLSPG002
 .. _section 11: https://docs.oracle.com/database/121/NLSPG/ch11charsetmig.htm#NLSPG011

-All of Django's database backends automatically convert Unicode strings into
+All of Django's database backends automatically convert strings into
 the appropriate encoding for talking to the database. They also automatically
-convert strings retrieved from the database into Python Unicode strings. You
-don't even need to tell Django what encoding your database uses: that is
-handled transparently.
+convert strings retrieved from the database into strings. You don't even need
+to tell Django what encoding your database uses: that is handled transparently.

 For more, see the section "The database API" below.

@@ -48,7 +47,7 @@ General string handling

 Whenever you use strings with Django -- e.g., in database lookups, template
 rendering or anywhere else -- you have two choices for encoding those strings.
-You can use normal Unicode strings or bytestrings (starting with a 'b').
+You can use normal strings or bytestrings (starting with a 'b').

 .. warning::

@@ -74,13 +73,13 @@ using your application -- and if that person chooses a different setting, your
 code must still continue to work. Ergo, it cannot rely on that setting.

 In most cases when Django is dealing with strings, it will convert them to
-Unicode strings before doing anything else. So, as a general rule, if you pass
-in a bytestring, be prepared to receive a Unicode string back in the result.
+strings before doing anything else. So, as a general rule, if you pass
+in a bytestring, be prepared to receive a string back in the result.

 Translated strings
 ------------------

-Aside from Unicode strings and bytestrings, there's a third type of string-like
+Aside from strings and bytestrings, there's a third type of string-like
 object you may encounter when using Django. The framework's
 internationalization features introduce the concept of a "lazy translation" --
 a string that has been marked as translated but whose actual translation result
@@ -93,7 +92,7 @@ Normally, you won't have to worry about lazy translations. Just be aware that
 if you examine an object and it claims to be a
 ``django.utils.functional.__proxy__`` object, it is a lazy translation.
 Calling ``str()`` with the lazy translation as the argument will generate a
-Unicode string in the current locale.
+string in the current locale.

 For more details about lazy translation objects, refer to the
 :doc:`internationalization </topics/i18n/index>` documentation.
@@ -102,17 +101,17 @@ Useful utility functions
 ------------------------

 Because some string operations come up again and again, Django ships with a few
-useful functions that should make working with Unicode and bytestring objects
+useful functions that should make working with string and bytestring objects
 a bit easier.

 Conversion functions
 ~~~~~~~~~~~~~~~~~~~~

 The ``django.utils.encoding`` module contains a few functions that are handy
-for converting back and forth between Unicode and bytestrings.
+for converting back and forth between strings and bytestrings.

 * ``smart_text(s, encoding='utf-8', strings_only=False, errors='strict')``
-  converts its input to a Unicode string. The ``encoding`` parameter
+  converts its input to a string. The ``encoding`` parameter
   specifies the input encoding. (For example, Django uses this internally
   when processing form input data, which might not be UTF-8 encoded.) The
   ``strings_only`` parameter, if set to True, will result in Python
@@ -126,7 +125,7 @@ for converting back and forth between Unicode and bytestrings.
   cases. The difference is when the first argument is a :ref:`lazy
   translation <lazy-translations>` instance. While ``smart_text()``
   preserves lazy translations, ``force_text()`` forces those objects to a
-  Unicode string (causing the translation to occur). Normally, you'll want
+  string (causing the translation to occur). Normally, you'll want
   to use ``smart_text()``. However, ``force_text()`` is useful in
   template tags and filters that absolutely *must* have a string to work
   with, not just something that can be converted to a string.
@@ -139,8 +138,8 @@ for converting back and forth between Unicode and bytestrings.
   but the difference is needed in a few places within Django's internals.

 Normally, you'll only need to use ``force_text()``. Call it as early as
-possible on any input data that might be either Unicode or a bytestring, and
-from then on, you can treat the result as always being Unicode.
+possible on any input data that might be either a string or a bytestring, and
+from then on, you can treat the result as always being a string.

 .. _uri-and-iri-handling:

@@ -225,13 +224,13 @@ double-quoting problems.
 Models
 ======

-Because all strings are returned from the database as Unicode strings, model
+Because all strings are returned from the database as ``str`` objects, model
 fields that are character based (CharField, TextField, URLField, etc.) will
 contain Unicode values when Django retrieves data from the database. This
 is *always* the case, even if the data could fit into an ASCII bytestring.

 You can pass in bytestrings when creating a model or populating a field, and
-Django will convert it to Unicode when it needs to.
+Django will convert it to strings when it needs to.

 Taking care in ``get_absolute_url()``
 -------------------------------------
@@ -263,7 +262,7 @@ non-ASCII characters would have been removed in quoting in the first line.)
 The database API
 ================

-You can pass either Unicode strings or UTF-8 bytestrings as arguments to
+You can pass either strings or UTF-8 bytestrings as arguments to
 ``filter()`` methods and the like in the database API. The following two
 querysets are identical::

@@ -273,11 +272,12 @@ querysets are identical::
 Templates
 =========

-You can use either Unicode or bytestrings when creating templates manually::
+You can use either strings or UTF-8 bytestrings when creating templates
+manually::

     from django.template import Template
     t1 = Template(b'This is a bytestring template.')
-    t2 = Template('This is a Unicode template.')
+    t2 = Template('This is a string template.')

 But the common case is to read templates from the filesystem, and this creates
 a slight complication: not all filesystems store their data encoded as UTF-8.
@@ -294,13 +294,13 @@ Template tags and filters

 A couple of tips to remember when writing your own template tags and filters:

-* Always return Unicode strings from a template tag's ``render()`` method
+* Always return strings from a template tag's ``render()`` method
   and from template filters.

 * Use ``force_text()`` in preference to ``smart_text()`` in these
   places. Tag rendering and filter calls occur as the template is being
   rendered, so there is no advantage to postponing the conversion of lazy
-  translation objects into strings. It's easier to work solely with Unicode
+  translation objects into strings. It's easier to work solely with
   strings at that point.

 .. _unicode-files: