- .. _topics-serialization:
- ==========================
- Serializing Django objects
- ==========================
- Django's serialization framework provides a mechanism for "translating" Django
- objects into other formats. Usually these other formats will be text-based and
- used for sending Django objects over a wire, but it's possible for a
- serializer to handle any format (text-based or not).
- .. seealso::
-     If you just want to get some data from your tables into a serialized
-     form, you could use the :djadmin:`dumpdata` management command.
- Serializing data
- ----------------
- At the highest level, serializing data is a very simple operation::
-     from django.core import serializers
-     data = serializers.serialize("xml", SomeModel.objects.all())
- The arguments to the ``serialize`` function are the format to serialize the data
- to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
- serialize. (Actually, the second argument can be any iterator that yields Django
- objects, but it'll almost always be a QuerySet.)
- You can also use a serializer object directly::
-     XMLSerializer = serializers.get_serializer("xml")
-     xml_serializer = XMLSerializer()
-     xml_serializer.serialize(queryset)
-     data = xml_serializer.getvalue()
- This is useful if you want to serialize data directly to a file-like object
- (which includes an :class:`~django.http.HttpResponse`)::
-     with open("file.xml", "w") as out:
-         xml_serializer.serialize(SomeModel.objects.all(), stream=out)
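The split between ``serialize()`` and ``getvalue()`` can be pictured with a small stand-alone sketch. ``TinyJSONSerializer`` below is hypothetical and is not Django's serializer class; it assumes the input is already a list of plain dicts in the ``model``/``pk``/``fields`` shape and writes them with the standard-library ``json`` module:

```python
import io
import json


class TinyJSONSerializer:
    """Hypothetical stand-in that mimics the serialize()/getvalue() shape.

    It is not Django's implementation: it expects plain dicts that already
    carry "model", "pk" and "fields" keys.
    """

    def serialize(self, objects, stream=None, **options):
        # Write to the caller's file-like object if one is given; otherwise
        # buffer the output in memory so getvalue() can return it later.
        self.stream = stream if stream is not None else io.StringIO()
        json.dump(list(objects), self.stream, **options)
        return self.getvalue()

    def getvalue(self):
        # Only in-memory buffers expose getvalue(); a real file does not,
        # so None is returned in that case.
        if callable(getattr(self.stream, "getvalue", None)):
            return self.stream.getvalue()
        return None


rows = [{"model": "app.somemodel", "pk": 1, "fields": {"name": "a", "size": 3}}]

serializer = TinyJSONSerializer()
serializer.serialize(rows)
text = serializer.getvalue()  # JSON text held in the internal buffer

buffer = io.StringIO()  # any object with a write() method would do
serializer.serialize(rows, stream=buffer)
```

Passing a ``stream`` lets the encoder write its output piece by piece into the target, which is why serializing straight into a file or response can be preferable to collecting one large string first.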
- Subset of fields
- ~~~~~~~~~~~~~~~~
- If you only want a subset of fields to be serialized, you can
- specify a ``fields`` argument to the serializer::
-     from django.core import serializers
-     data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))
- In this example, only the ``name`` and ``size`` attributes of each model will
- be serialized.
- .. note::
-     Depending on your model, you may find that it is not possible to
-     deserialize a model that only serializes a subset of its fields. If a
-     serialized object doesn't specify all the fields that are required by a
-     model, the deserializer will not be able to save deserialized instances.
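The effect of ``fields`` and the caveat in the note above can be sketched in plain Python. Both helpers are hypothetical, and ``REQUIRED_FIELDS`` is an assumed stand-in for whatever fields your model cannot be saved without:

```python
# Hypothetical set of fields the model needs before it can be saved.
REQUIRED_FIELDS = {"name", "size", "color"}


def serialize_subset(records, fields=None):
    """Keep only the requested fields of each record (all if fields is None)."""
    output = []
    for record in records:
        data = record["fields"]
        if fields is not None:
            data = {key: value for key, value in data.items() if key in fields}
        output.append({"model": record["model"], "pk": record["pk"], "fields": data})
    return output


def missing_required(record, required=REQUIRED_FIELDS):
    """Return the required fields that the serialized record does not carry."""
    return sorted(required - set(record["fields"]))


records = [
    {"model": "app.somemodel", "pk": 1,
     "fields": {"name": "widget", "size": 3, "color": "red"}},
]
subset = serialize_subset(records, fields=("name", "size"))
```

A deserializer handed ``subset`` would have no value for ``color``, which is the situation in which saving the deserialized instance can fail.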
- Inherited models
- ~~~~~~~~~~~~~~~~
- If you have a model that is defined using an :ref:`abstract base class
- <abstract-base-classes>`, you don't have to do anything special to serialize
- that model. Just call the serializer on the object (or objects) that you want to
- serialize, and the output will be a complete representation of the serialized
- object.
- However, if you have a model that uses :ref:`multi-table inheritance
- <multi-table-inheritance>`, you also need to serialize all of the base classes
- for the model. This is because only the fields that are locally defined on the
- model will be serialized. For example, consider the following models::
-     class Place(models.Model):
-         name = models.CharField(max_length=50)
-     class Restaurant(Place):
-         serves_hot_dogs = models.BooleanField()
- If you only serialize the ``Restaurant`` model::
-     data = serializers.serialize('xml', Restaurant.objects.all())
- the fields on the serialized output will only contain the ``serves_hot_dogs``
- attribute. The ``name`` attribute of the base class will be ignored.
- In order to fully serialize your ``Restaurant`` instances, you will need to
- serialize the ``Place`` models as well::
-     all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
-     data = serializers.serialize('xml', all_objects)
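Why the ``name`` field goes missing can be seen with a toy model of multi-table storage. The dict-based "tables" and helpers below are hypothetical; they only mimic the fact that each model in the inheritance chain stores its locally defined fields in its own table, with parent and child rows sharing a primary key:

```python
# Hypothetical in-memory tables: each model stores only its local fields,
# and the child row shares its primary key with the parent row.
place_table = {1: {"name": "Bob's Diner"}}
restaurant_table = {1: {"serves_hot_dogs": True}}


def serialize_table(model_label, table):
    """Serialize one table's rows exactly as stored (local fields only)."""
    return [
        {"model": model_label, "pk": pk, "fields": dict(fields)}
        for pk, fields in sorted(table.items())
    ]


def combined_fields(records, pk):
    """Merge the fields of every serialized record that shares a primary key."""
    merged = {}
    for record in records:
        if record["pk"] == pk:
            merged.update(record["fields"])
    return merged


restaurant_only = serialize_table("app.restaurant", restaurant_table)
all_records = restaurant_only + serialize_table("app.place", place_table)
```

Serializing only the child table yields ``serves_hot_dogs`` alone; adding the parent's records is what brings ``name`` back into the picture.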
- Deserializing data
- ------------------
- Deserializing data is also a fairly simple operation::
-     for obj in serializers.deserialize("xml", data):
-         do_something_with(obj)
- As you can see, the ``deserialize`` function takes the same format argument as
- ``serialize``, a string or stream of data, and returns an iterator.
- However, here it gets slightly complicated. The objects returned by the
- ``deserialize`` iterator *aren't* simple Django objects. Instead, they are
- special ``DeserializedObject`` instances that wrap a created -- but unsaved --
- object and any associated relationship data.
- Calling ``DeserializedObject.save()`` saves the object to the database.
- This ensures that deserializing is a non-destructive operation even if the
- data in your serialized representation doesn't match what's currently in the
- database. Usually, working with these ``DeserializedObject`` instances looks
- something like::
-     for deserialized_object in serializers.deserialize("xml", data):
-         if object_should_be_saved(deserialized_object):
-             deserialized_object.save()
- In other words, the usual use is to examine the deserialized objects to make
- sure that they are "appropriate" for saving before doing so. Of course, if you
- trust your data source you could just save the object and move on.
- The Django object itself can be inspected as ``deserialized_object.object``.
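The wrap-then-save pattern can be sketched without Django. ``DeserializedRecord``, ``deserialize_records()`` and the approval rule below are hypothetical stand-ins for ``DeserializedObject``, ``deserialize()`` and ``object_should_be_saved()``; a plain dict plays the role of the database:

```python
import json


class DeserializedRecord:
    """Hypothetical stand-in: wraps an unsaved object until save() is called."""

    def __init__(self, obj, database):
        self.object = obj  # the wrapped, not-yet-saved object
        self._database = database

    def save(self):
        # Nothing touches the "database" until this explicit call.
        self._database[self.object["pk"]] = self.object


def deserialize_records(data, database):
    """Parse the JSON text, then wrap and yield each entry in turn."""
    for entry in json.loads(data):
        yield DeserializedRecord(entry, database)


def object_should_be_saved(wrapped):
    # Hypothetical rule: only accept records that carry a non-empty name.
    return bool(wrapped.object["fields"].get("name"))


database = {}
data = json.dumps([
    {"model": "app.thing", "pk": 1, "fields": {"name": "kept"}},
    {"model": "app.thing", "pk": 2, "fields": {"name": ""}},
])
for wrapped in deserialize_records(data, database):
    if object_should_be_saved(wrapped):
        wrapped.save()
```

Only the record that passes the check reaches the dict; the rejected one is simply never saved, which is the non-destructive behavior described above.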
- .. _serialization-formats:
- Serialization formats
- ---------------------
- Django supports a number of serialization formats, some of which require you
- to install third-party Python modules:
- ========== ==============================================================
- Identifier Information
- ========== ==============================================================
- ``xml``    Serializes to and from a simple XML dialect.
- ``json``   Serializes to and from JSON_ (using a version of simplejson_
-            bundled with Django).
- ``python`` Translates to and from "simple" Python objects (lists, dicts,
-            strings, etc.). Not really all that useful on its own, but
-            used as a base for other serializers.
- ``yaml``   Serializes to YAML (YAML Ain't a Markup Language). This
-            serializer is only available if PyYAML_ is installed.
- ========== ==============================================================
- .. _json: http://json.org/
- .. _simplejson: http://undefined.org/python/#simplejson
- .. _PyYAML: http://www.pyyaml.org/
- Notes for specific serialization formats
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- json
- ^^^^
- If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
- serializer, you must pass ``ensure_ascii=False`` as a parameter to the
- ``serialize()`` call. Otherwise, the output won't be encoded correctly.
- For example::
-     json_serializer = serializers.get_serializer("json")()
-     json_serializer.serialize(queryset, ensure_ascii=False, stream=response)
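The difference is easy to see with Python's standard-library ``json`` module, which the example below calls directly rather than going through Django's serializer. It assumes only that ``ensure_ascii`` has its usual ``json.dumps()`` meaning: with the default, non-ASCII characters are written as ``\uXXXX`` escape sequences; with ``ensure_ascii=False`` they are written as-is:

```python
import json

records = [{"model": "app.city", "pk": 1, "fields": {"name": "Zürich"}}]

escaped = json.dumps(records)                       # default: ensure_ascii=True
readable = json.dumps(records, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)   # the name appears as Z\u00fcrich
print(readable)  # the name appears as Zürich
```

Both strings decode to the same data; the escaped form trades readability for a pure-ASCII payload.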
- The Django source code includes the simplejson_ module. However, if you're
- using Python 2.6 (which includes a built-in version of the module), Django will
- use the built-in ``json`` module automatically. If you have a system-installed
- version that includes the C-based speedup extension, or your system version is
- more recent than the version shipped with Django (currently, 2.0.7), the
- system version will be used instead of the version included with Django.
- Be aware that if you're serializing using that module directly, not all Django
- output can be passed unmodified to simplejson. In particular, :ref:`lazy
- translation objects <lazy-translations>` need a `special encoder`_ written for
- them. Something like this will work::
-     from django.utils import simplejson
-     from django.utils.functional import Promise
-     from django.utils.encoding import force_unicode
-     class LazyEncoder(simplejson.JSONEncoder):
-         def default(self, obj):
-             if isinstance(obj, Promise):
-                 # Force lazy translation proxies into real unicode strings.
-                 return force_unicode(obj)
-             return super(LazyEncoder, self).default(obj)
- .. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html
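To try the encoder pattern outside a Django project, the sketch below substitutes a hypothetical ``Promise`` stand-in and a hypothetical ``LazyText`` class for Django's lazy translation objects, and uses Python 3's ``json`` module and ``str()`` where the example above uses ``simplejson`` and ``force_unicode``:

```python
import json


class Promise:
    """Hypothetical stand-in for django.utils.functional.Promise."""


class LazyText(Promise):
    """Hypothetical lazy string: the text is computed only when str() is called."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        return self._func()


class LazyEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json cannot encode natively.
        if isinstance(obj, Promise):
            return str(obj)
        return super().default(obj)


greeting = LazyText(lambda: "Hello")
encoded = json.dumps({"message": greeting}, cls=LazyEncoder)
```

Without ``cls=LazyEncoder``, the same ``json.dumps()`` call raises ``TypeError`` because the lazy object is not JSON serializable.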
- .. _topics-serialization-natural-keys:
- Natural keys
- ------------
- The default serialization strategy for foreign keys and many-to-many
- relations is to serialize the value of the primary key(s) of the
- objects in the relation. This strategy works well for most types of
- object, but it can cause difficulty in some circumstances.
- Consider the case of a list of objects that have a foreign key to
- :class:`ContentType`. If you're going to serialize an object that
- refers to a content type, you need to have a way to refer to that
- content type. Content types are automatically created by Django as
- part of the database synchronization process, so you don't need to
- include content types in a fixture or other serialized data. As a
- result, the primary key of any given content type isn't easy to
- predict; it will depend on how and when :djadmin:`syncdb` was
- executed to create the content types.
- There is also the matter of convenience. An integer id isn't always
- the most convenient way to refer to an object; sometimes, a
- more natural reference would be helpful.
- Deserialization of natural keys
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- It is for these reasons that Django provides *natural keys*. A natural
- key is a tuple of values that can be used to uniquely identify an
- object instance without using the primary key value.
- Consider the following two models::
-     from django.db import models
-     class Person(models.Model):
-         first_name = models.CharField(max_length=100)
-         last_name = models.CharField(max_length=100)
-         birthdate = models.DateField()
-     class Book(models.Model):
-         name = models.CharField(max_length=100)
-         author = models.ForeignKey(Person)
- Ordinarily, serialized data for ``Book`` would use an integer to refer to
- the author. For example, in JSON, a ``Book`` might be serialized as::
-     ...
-     {
-         "pk": 1,
-         "model": "store.book",
-         "fields": {
-             "name": "Mostly Harmless",
-             "author": 42
-         }
-     }
-     ...
- This isn't a particularly natural way to refer to an author. It
- requires that you know the primary key value for the author; it also
- requires that this primary key value is stable and predictable.
- However, if we add natural key handling to Person, the fixture becomes
- much more humane. To add natural key handling, you define a default
- Manager for Person with a ``get_by_natural_key()`` method. In the case
- of a Person, a good natural key might be the pair of first and last
- name::
-     from django.db import models
-     class PersonManager(models.Manager):
-         def get_by_natural_key(self, first_name, last_name):
-             return self.get(first_name=first_name, last_name=last_name)
-     class Person(models.Model):
-         objects = PersonManager()
-         first_name = models.CharField(max_length=100)
-         last_name = models.CharField(max_length=100)
-         birthdate = models.DateField()
- Now books can use that natural key to refer to ``Person`` objects::
-     ...
-     {
-         "pk": 1,
-         "model": "store.book",
-         "fields": {
-             "name": "Mostly Harmless",
-             "author": ["Douglas", "Adams"]
-         }
-     }
-     ...
- When you try to load this serialized data, Django will use the
- ``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
- into the primary key of an actual ``Person`` object.
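The lookup step can be pictured with a small in-memory sketch. ``InMemoryPersonManager`` and ``resolve_reference()`` are hypothetical; as a simplification they treat a list or tuple in the ``author`` field as a natural key and anything else as a primary key. The natural key must identify exactly one row, or the lookup fails:

```python
class InMemoryPersonManager:
    """Hypothetical manager over a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows

    def get_by_natural_key(self, first_name, last_name):
        matches = [
            row for row in self.rows
            if row["first_name"] == first_name and row["last_name"] == last_name
        ]
        # Like a get() call, demand exactly one match.
        if len(matches) != 1:
            raise LookupError(f"{len(matches)} people named {first_name} {last_name}")
        return matches[0]


def resolve_reference(value, manager):
    """Turn a serialized foreign-key value into a primary key."""
    if isinstance(value, (list, tuple)):
        return manager.get_by_natural_key(*value)["pk"]
    return value  # already a primary key


people = InMemoryPersonManager([
    {"pk": 42, "first_name": "Douglas", "last_name": "Adams"},
])
author_pk = resolve_reference(["Douglas", "Adams"], people)
```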
- Serialization of natural keys
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- So how do you get Django to emit a natural key when serializing an object?
- Firstly, you need to add another method -- this time to the model itself::
-     class Person(models.Model):
-         objects = PersonManager()
-         first_name = models.CharField(max_length=100)
-         last_name = models.CharField(max_length=100)
-         birthdate = models.DateField()
-         def natural_key(self):
-             return (self.first_name, self.last_name)
- Then, when you call ``serializers.serialize()``, you provide a
- ``use_natural_keys=True`` argument::
-     >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)
- When ``use_natural_keys=True`` is specified, Django will use the
- ``natural_key()`` method to serialize any reference to objects of the
- type that defines the method.
- If you are using :djadmin:`dumpdata` to generate serialized data, you
- use the :djadminopt:`--natural` command-line option to generate natural keys.
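How ``use_natural_keys`` changes the output can be sketched with plain classes. ``Person``, ``Book`` and ``serialize_book()`` below are hypothetical and are not Django models; the only convention borrowed from the text is that a model opts in by defining ``natural_key()``:

```python
class Person:
    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name

    def natural_key(self):
        return (self.first_name, self.last_name)


class Book:
    def __init__(self, pk, name, author):
        self.pk = pk
        self.name = name
        self.author = author


def serialize_book(book, use_natural_keys=False):
    """Serialize a book, referring to its author by natural key when asked."""
    author = book.author
    if use_natural_keys and hasattr(author, "natural_key"):
        reference = list(author.natural_key())
    else:
        reference = author.pk
    return {
        "pk": book.pk,
        "model": "store.book",
        "fields": {"name": book.name, "author": reference},
    }


adams = Person(42, "Douglas", "Adams")
book = Book(1, "Mostly Harmless", adams)
```

With the flag off the author is written as ``42``; with it on, as ``["Douglas", "Adams"]``, matching the two JSON fragments above.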
- .. note::
-     You don't need to define both ``natural_key()`` and
-     ``get_by_natural_key()``. If you don't want Django to output
-     natural keys during serialization, but you want to retain the
-     ability to load natural keys, then you can opt to not implement
-     the ``natural_key()`` method.
-     Conversely, if (for some strange reason) you want Django to output
-     natural keys during serialization, but *not* be able to load those
-     key values, just don't define the ``get_by_natural_key()`` method.
- Dependencies during serialization
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Since natural keys rely on database lookups to resolve references, it
- is important that data exists before it is referenced. You can't make
- a *forward reference* with natural keys: the data you are referencing
- must exist before you include a natural key reference to that data.
- To accommodate this limitation, calls to :djadmin:`dumpdata` that use
- the :djadminopt:`--natural` option will serialize any model with a
- ``natural_key()`` method before it serializes normal key objects.
- However, this may not always be enough. If your natural key refers to
- another object (by using a foreign key or natural key to another object
- as part of a natural key), then you need to be able to ensure that
- the objects on which a natural key depends occur in the serialized data
- before the natural key requires them.
- To control this ordering, you can define dependencies on your
- ``natural_key()`` methods. You do this by setting a ``dependencies``
- attribute on the ``natural_key()`` method itself.
- For example, consider the ``Permission`` model in ``contrib.auth``.
- The following is a simplified version of the ``Permission`` model::
-     class Permission(models.Model):
-         name = models.CharField(max_length=50)
-         content_type = models.ForeignKey(ContentType)
-         codename = models.CharField(max_length=100)
-         # ...
-         def natural_key(self):
-             return (self.codename,) + self.content_type.natural_key()
- The natural key for a ``Permission`` is a combination of the codename for the
- ``Permission``, and the ``ContentType`` to which the ``Permission`` applies. This means
- that ``ContentType`` must be serialized before ``Permission``. To define this
- dependency, we add one extra line::
-     class Permission(models.Model):
-         # ...
-         def natural_key(self):
-             return (self.codename,) + self.content_type.natural_key()
-         natural_key.dependencies = ['contenttypes.contenttype']
- This definition ensures that ``ContentType`` models are serialized before
- ``Permission`` models. In turn, any object referencing ``Permission`` will
- be serialized after both ``ContentType`` and ``Permission``.
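The ordering that ``dependencies`` makes possible is a topological sort. The sketch below is a simplified illustration, not the algorithm :djadmin:`dumpdata` actually uses: ``ContentType``, ``Permission`` and ``Group`` are hypothetical stand-in classes carrying a ``label`` attribute, and ``serialization_order()`` visits models that define ``natural_key()`` first, then places each model after everything listed in its ``natural_key.dependencies``:

```python
class ContentType:
    label = "contenttypes.contenttype"

    def natural_key(self):
        return (self.app_label, self.model)  # attributes omitted in this sketch


class Permission:
    label = "auth.permission"

    def natural_key(self):
        return (self.codename,) + self.content_type.natural_key()
    natural_key.dependencies = ["contenttypes.contenttype"]


class Group:
    label = "auth.group"  # no natural key in this sketch


def serialization_order(models):
    """Return model labels so every dependency precedes its dependents."""
    by_label = {model.label: model for model in models}
    ordered, visiting, done = [], set(), set()

    def visit(model):
        if model.label in done:
            return
        if model.label in visiting:
            raise ValueError(f"circular dependency involving {model.label}")
        visiting.add(model.label)
        natural_key = getattr(model, "natural_key", None)
        for dependency in getattr(natural_key, "dependencies", []):
            if dependency in by_label:
                visit(by_label[dependency])
        visiting.discard(model.label)
        done.add(model.label)
        ordered.append(model.label)

    # Models with natural keys are visited first, then the rest.
    with_keys = [m for m in models if hasattr(m, "natural_key")]
    without_keys = [m for m in models if not hasattr(m, "natural_key")]
    for model in with_keys + without_keys:
        visit(model)
    return ordered


order = serialization_order([Group, Permission, ContentType])
```

Even though ``Permission`` is visited before ``ContentType``, the dependency pulls ``ContentType`` ahead of it, and the cycle check turns a mutual dependency into an explicit error instead of an endless loop.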