==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
models into other formats. Usually these other formats will be text-based and
used for sending Django data over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterator that yields Django model instances, but it'll
almost always be a QuerySet).

.. function:: django.core.serializers.get_serializer(format)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    ``django.core.serializers.SerializerDoesNotExist`` exception.

Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the serialized output will contain only the ``serves_hot_dogs`` field. The
``name`` attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to
serialize the Place models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

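To see why both models are needed, the following sketch combines the two kinds
of record in plain Python. It does not call Django. It assumes records in the
``pk``/``model``/``fields`` layout used by the JSON examples in this document,
and that a ``Restaurant`` record shares its primary key with its parent
``Place`` record. The ``example_app`` label, the values, and the
``merge_by_pk`` helper are hypothetical.

.. code-block:: python

    # Hypothetical records; neither one alone describes a full Restaurant.
    records = [
        {"pk": 1, "model": "example_app.restaurant",
         "fields": {"serves_hot_dogs": True}},
        {"pk": 1, "model": "example_app.place",
         "fields": {"name": "Joe's Diner"}},
    ]

    def merge_by_pk(records):
        """Combine the fields of all records that share a primary key."""
        merged = {}
        for record in records:
            merged.setdefault(record["pk"], {}).update(record["fields"])
        return merged

    combined = merge_by_pk(records)

Here ``combined[1]`` holds both ``name`` and ``serves_hot_dogs``. Dropping
either record would leave the picture incomplete.
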
Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

This ensures that deserializing is a non-destructive operation even if the
data in your serialized representation doesn't match what's currently in the
database. Usually, working with these ``DeserializedObject`` instances looks
something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. Of course, if you
trust your data source you could just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.

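The ``object_should_be_saved()`` function used above is not part of Django; you
write it yourself. One possible shape, shown here only as a sketch, is an
allow-list check on the class of the wrapped instance. The ``ALLOWED_MODELS``
set and the stand-in classes are assumptions made so that the example runs
without a database.

.. code-block:: python

    from types import SimpleNamespace

    # Hypothetical allow-list of model class names that may be saved.
    ALLOWED_MODELS = {"Restaurant", "Place"}

    def object_should_be_saved(deserialized_object):
        """Return True if the wrapped instance's class is on the allow-list."""
        instance = deserialized_object.object
        return type(instance).__name__ in ALLOWED_MODELS

    # Stand-ins for model classes and for DeserializedObject wrappers.
    class Restaurant:
        pass

    class Secret:
        pass

    accepted = object_should_be_saved(SimpleNamespace(object=Restaurant()))
    rejected = object_should_be_saved(SimpleNamespace(object=Secret()))

A real check might also inspect field values on ``deserialized_object.object``
before deciding whether to call ``save()``.
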
.. versionadded:: 1.5

If fields in the serialized data do not exist on a model,
a ``DeserializationError`` will be raised unless the ``ignorenonexistent``
argument is passed in as True::

    serializers.deserialize("xml", data, ignorenonexistent=True)

.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_.

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _PyYAML: http://www.pyyaml.org/

Notes for specific serialization formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

json
^^^^

Be aware that not all Django output can be passed unmodified to :mod:`json`.
In particular, :ref:`lazy translation objects <lazy-translations>` need a
`special encoder`_ written for them. Something like this will work::

    import json
    from django.utils.functional import Promise
    from django.utils.encoding import force_text

    class LazyEncoder(json.JSONEncoder):
        def default(self, obj):
            # Lazy translation objects are Promise instances; force them to text.
            if isinstance(obj, Promise):
                return force_text(obj)
            return super(LazyEncoder, self).default(obj)

.. _special encoder: http://docs.python.org/library/json.html#encoders-and-decoders

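An encoder like this is passed to ``json.dumps()`` through its ``cls``
argument. The sketch below shows the call using a stand-in ``Promise`` class so
that it runs without Django; the stand-in and the ``payload`` dictionary are
assumptions, and ``str()`` takes the place of ``force_text()``.

.. code-block:: python

    import json

    class Promise:
        """Stand-in for django.utils.functional.Promise (assumption)."""
        def __init__(self, func):
            self._func = func

        def __str__(self):
            # Evaluate lazily, only when the text is actually needed.
            return self._func()

    class LazyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, Promise):
                return str(obj)  # the Django version calls force_text(obj)
            return super().default(obj)

    payload = {"label": Promise(lambda: "Hello")}
    encoded = json.dumps(payload, cls=LazyEncoder)

Without ``cls=LazyEncoder``, the same call raises ``TypeError`` because
:mod:`json` does not know how to encode the lazy object.
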
.. _topics-serialization-natural-keys:

Natural keys
------------

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the relation.
This strategy works well for most objects, but it can cause difficulty in some
circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`syncdb` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`,
:class:`~django.contrib.auth.models.Group`, and
:class:`~django.contrib.auth.models.User`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.

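The lookup performed at load time can be pictured with a small in-memory
sketch. This is not Django's loader code: the ``PersonManager`` below is a
plain-Python stand-in for the manager shown earlier, and ``resolve_author`` is
a hypothetical helper. It assumes that a list or tuple in the ``author`` field
is a natural key and that anything else is already a primary key.

.. code-block:: python

    from collections import namedtuple

    Person = namedtuple("Person", ["pk", "first_name", "last_name"])

    class PersonManager:
        """In-memory stand-in for a manager with get_by_natural_key()."""
        def __init__(self, people):
            self._people = list(people)

        def get_by_natural_key(self, first_name, last_name):
            for person in self._people:
                if (person.first_name, person.last_name) == (first_name, last_name):
                    return person
            raise LookupError("no Person matches that natural key")

    def resolve_author(manager, value):
        """Return a primary key for a natural key list or a raw pk."""
        if isinstance(value, (list, tuple)):
            # The natural key's items become positional arguments.
            return manager.get_by_natural_key(*value).pk
        return value

    manager = PersonManager([Person(42, "Douglas", "Adams")])
    author_pk = resolve_author(manager, ["Douglas", "Adams"])

The sketch also shows why uniqueness matters: if two people shared the same
first and last name, the lookup could not tell them apart.
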
Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So how do you get Django to emit a natural key when serializing an object?
Firstly, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

        class Meta:
            unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you provide a ``use_natural_keys=True``
argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.

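The tuple returned by ``natural_key()`` appears as a list in JSON output
because JSON has no tuple type. The following standalone sketch shows that
round trip with the standard :mod:`json` module only; the free-standing
``natural_key`` function and the ``fields`` dictionary are illustrative
assumptions, not Django's serializer.

.. code-block:: python

    import json

    def natural_key(first_name, last_name):
        """Mirror the model method above: return the key as a tuple."""
        return (first_name, last_name)

    fields = {
        "name": "Mostly Harmless",
        "author": natural_key("Douglas", "Adams"),
    }

    encoded = json.dumps(fields, sort_keys=True)
    decoded = json.loads(encoded)

After decoding, ``decoded["author"]`` is the list ``["Douglas", "Adams"]``,
which matches the two positional arguments of ``get_by_natural_key()``.
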
If you are using :djadmin:`dumpdata` to generate serialized data, use the
``--natural`` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since natural keys rely on database lookups to resolve references, it
is important that the data exists before it is referenced. You can't make
a "forward reference" with natural keys -- the data you're referencing
must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author. This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line::

    def natural_key(self):
        return (self.name,) + self.author.natural_key()
    natural_key.dependencies = ['example_app.person']

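The ordering that a ``dependencies`` declaration asks for is a topological
sort. The sketch below is not the algorithm Django uses internally; it is a
minimal depth-first version that shows the intended result. The
``sort_by_dependencies`` helper and the ``example_app.review`` label are
hypothetical.

.. code-block:: python

    def sort_by_dependencies(dependencies):
        """Order model labels so each one comes after its dependencies.

        ``dependencies`` maps a model label to the labels it depends on.
        """
        ordered = []
        visiting = set()

        def visit(label):
            if label in ordered:
                return
            if label in visiting:
                raise ValueError("circular dependency involving %s" % label)
            visiting.add(label)
            for dependency in dependencies.get(label, []):
                visit(dependency)
            visiting.discard(label)
            ordered.append(label)

        # Sort the labels so the result is deterministic.
        for label in sorted(dependencies):
            visit(label)
        return ordered

    order = sort_by_dependencies({
        "example_app.book": ["example_app.person"],
        "example_app.person": [],
        "example_app.review": ["example_app.book"],  # hypothetical model
    })

In ``order``, ``example_app.person`` comes first, then ``example_app.book``,
then ``example_app.review``. Two models that list each other as dependencies
raise ``ValueError``, because no valid order exists for them.
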
  289. This definition ensures that all ``Person`` objects are serialized before
  290. any ``Book`` objects. In turn, any object referencing ``Book`` will be
  291. serialized after both ``Person`` and ``Book`` have been serialized.