.. _topics-serialization:

==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
objects into other formats. Usually these other formats will be text-based and
used for sending Django objects over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a :class:`~django.db.models.QuerySet` to
serialize. (Actually, the second argument can be any iterator that yields Django
objects, but it'll almost always be a QuerySet.)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    # The "with" block closes the file once serialization has finished.
    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    from django.db import models

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to
serialize the Place models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

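Because a multi-table child model shares its primary key with its parent row,
the two halves of a ``Restaurant`` can be matched up by ``pk``. The plain-Python
sketch below uses hypothetical records in the ``pk``/``model``/``fields`` layout
(``places`` is a made-up app label, and ``full_restaurant_fields()`` is an
illustrative helper, not part of Django) to show that ``name`` is only
recoverable when the ``Place`` records are present::

    # Hypothetical records in the pk/model/fields layout; "places" is a made-up app.
    restaurant_records = [
        {"pk": 1, "model": "places.restaurant", "fields": {"serves_hot_dogs": True}},
    ]
    place_records = [
        {"pk": 1, "model": "places.place", "fields": {"name": "Bob's Diner"}},
        {"pk": 2, "model": "places.place", "fields": {"name": "Corner Shop"}},
    ]


    def full_restaurant_fields(restaurant_records, place_records):
        """Merge each restaurant's local fields with its parent Place's fields."""
        # A multi-table child shares its pk with the parent row, so pk is the join key.
        places_by_pk = dict((r["pk"], r["fields"]) for r in place_records)
        merged = {}
        for record in restaurant_records:
            fields = dict(places_by_pk.get(record["pk"], {}))  # inherited fields, if present
            fields.update(record["fields"])                    # locally defined fields
            merged[record["pk"]] = fields
        return merged


    print(full_restaurant_fields(restaurant_records, place_records))
    print(full_restaurant_fields(restaurant_records, []))  # no Place data: "name" is missing

The second call shows the failure mode described above: without the ``Place``
records, only ``serves_hot_dogs`` survives.
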
Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.
Because nothing is written until you make that call, deserializing is a
non-destructive operation even if the data in your serialized representation
doesn't match what's currently in the database. Usually, working with these
``DeserializedObject`` instances looks something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. Of course, if you
trust your data source, you can just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.

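``object_should_be_saved()`` above is a placeholder for your own checks. One
possible approach, sketched here as an assumption rather than a Django API, is
to screen raw JSON records before deserializing them. ``screen_records()`` and
``ALLOWED_MODELS`` are hypothetical names, and the records are assumed to follow
the ``pk``/``model``/``fields`` layout used in this document's JSON examples::

    import json

    # Hypothetical allow-list: only these "app_label.model" values may be loaded.
    ALLOWED_MODELS = {"store.book", "store.person"}


    def screen_records(raw_json, allowed_models=ALLOWED_MODELS):
        """Split raw fixture records into (kept, rejected) lists.

        Each record is assumed to be a dict with "pk", "model" and "fields" keys.
        """
        kept, rejected = [], []
        for record in json.loads(raw_json):
            # Reject records for unexpected models or without a fields mapping.
            if record.get("model") in allowed_models and isinstance(record.get("fields"), dict):
                kept.append(record)
            else:
                rejected.append(record)
        return kept, rejected


    raw = json.dumps([
        {"pk": 1, "model": "store.book", "fields": {"name": "Mostly Harmless", "author": 42}},
        {"pk": 7, "model": "auth.user", "fields": {"username": "mallory"}},
    ])
    kept, rejected = screen_records(raw)
    print([r["model"] for r in kept])      # ['store.book']
    print([r["model"] for r in rejected])  # ['auth.user']

The kept records could then be re-encoded with ``json.dumps(kept)`` and passed
to ``serializers.deserialize("json", ...)``, which accepts a string. Inspecting
``deserialized_object.object`` after deserialization is an equally reasonable
place for such rules.
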
.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_ (using a version of simplejson_
            bundled with Django).

``python``  Translates to and from "simple" Python objects (lists, dicts,
            strings, etc.). Not really all that useful on its own, but
            used as a base for other serializers.

``yaml``    Serializes to and from YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _simplejson: http://undefined.org/python/#simplejson
.. _PyYAML: http://www.pyyaml.org/

Notes for specific serialization formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

json
^^^^

If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
serializer, you must pass ``ensure_ascii=False`` as a parameter to the
``serialize()`` call. Otherwise, the output won't be encoded as you might
expect: every non-ASCII character is escaped as a ``\uXXXX`` sequence.

For example::

    json_serializer = serializers.get_serializer("json")()
    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)

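The same ``ensure_ascii`` flag exists on Python's standard-library ``json``
module, which makes the difference easy to see without a Django project. This
standalone sketch is written for Python 3, and the record is made up for
illustration; it only mimics the serializer's ``model``/``fields`` layout::

    import json

    # A made-up record that mimics the serializer's "model"/"fields" layout.
    record = {"model": "store.book", "fields": {"name": "Caf\u00e9 Society"}}

    # Default (ensure_ascii=True): every non-ASCII character becomes a \uXXXX escape.
    escaped = json.dumps(record)

    # ensure_ascii=False: characters are kept as-is; encode the text (for
    # example as UTF-8) when writing it to a file or response.
    readable = json.dumps(record, ensure_ascii=False)

    print(escaped)
    print(readable)

Both strings decode back to the same data; only the on-the-wire representation
of the non-ASCII characters differs.
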
The Django source code includes the simplejson_ module. However, if you're
using Python 2.6 or later (which includes a built-in version of the module),
Django will use the built-in ``json`` module automatically. If you have a
system-installed version that includes the C-based speedup extension, or your
system version is more recent than the version shipped with Django (currently,
2.0.7), the system version will be used instead of the version included with
Django.

Be aware that if you're serializing using that module directly, not all Django
output can be passed unmodified to simplejson. In particular, :ref:`lazy
translation objects <lazy-translations>` need a `special encoder`_ written for
them. Something like this will work::

    from django.utils import simplejson
    from django.utils.encoding import force_unicode
    from django.utils.functional import Promise

    class LazyEncoder(simplejson.JSONEncoder):
        def default(self, obj):
            # Lazy translation strings are Promise instances; force them to unicode.
            if isinstance(obj, Promise):
                return force_unicode(obj)
            return super(LazyEncoder, self).default(obj)

.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html

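To try the ``default()`` hook without a Django project, the sketch below uses
only the standard-library ``json`` module. ``LazyText`` is a hypothetical
stand-in for a lazy translation object (it is *not* Django's ``Promise``), so
treat this as an illustration of the encoder pattern rather than Django's API::

    import json


    class LazyText(object):
        """Hypothetical stand-in for a lazy translation object.

        It is not a str, so the json module cannot encode it without help.
        """

        def __init__(self, func):
            self._func = func

        def __str__(self):
            # Evaluate the deferred value only when it is actually needed.
            return self._func()


    class LazyEncoder(json.JSONEncoder):
        def default(self, obj):
            # Called only for objects that json cannot encode natively.
            if isinstance(obj, LazyText):
                return str(obj)
            return super(LazyEncoder, self).default(obj)


    payload = {"title": LazyText(lambda: "Mostly Harmless")}

    try:
        json.dumps(payload)
    except TypeError:
        print("plain json.dumps() rejects the lazy object")

    print(json.dumps(payload, cls=LazyEncoder))  # {"title": "Mostly Harmless"}

Anything ``default()`` does not recognise is handed to the base class, which
raises ``TypeError`` as usual.
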
.. _topics-serialization-natural-keys:

Natural keys
------------

.. versionadded:: 1.2

    The ability to use natural keys when serializing/deserializing data was
    added in the 1.2 release.

The default serialization strategy for foreign keys and many-to-many
relations is to serialize the value of the primary key(s) of the
objects in the relation. This strategy works well for most types of
object, but it can cause difficulty in some circumstances.

Consider the case of a list of objects that have a foreign key to
:class:`ContentType`. If you're going to serialize an object that
refers to a content type, you need to have a way to refer to that
content type. Content types are automatically created by Django as
part of the database synchronization process, so you don't need to
include content types in a fixture or other serialized data. As a
result, the primary key of any given content type isn't easy to
predict; it will depend on how and when :djadmin:`syncdb` was
executed to create the content types.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

  217. When you try to load this serialized data, Django will use the
  218. ``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
  219. into the primary key of an actual ``Person`` object.
  220. .. note::
  221. Whatever fields you use for a natural key must be able to uniquely
  222. identify an object. This will usually mean that your model will
  223. have a uniqueness clause (either unique=True on a single field, or
  224. ``unique_together`` over multiple fields) for the field or fields
  225. in your natural key. However, uniqueness doesn't need to be
  226. enforced at the database level. If you are certain that a set of
  227. fields will be effectively unique, you can still use those fields
  228. as a natural key.
  229. Serialization of natural keys
  230. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  231. So how do you get Django to emit a natural key when serializing an object?
  232. Firstly, you need to add another method -- this time to the model itself::
  233. class Person(models.Model):
  234. objects = PersonManager()
  235. first_name = models.CharField(max_length=100)
  236. last_name = models.CharField(max_length=100)
  237. birthdate = models.DateField()
  238. def natural_key(self):
  239. return (self.first_name, self.last_name)
  240. class Meta:
  241. unique_together = (('first_name', 'last_name'),)
  242. That method should always return a natural key tuple -- in this
  243. example, ``(first name, last name)``. Then, when you call
  244. ``serializers.serialize()``, you provide a ``use_natural_keys=True``
  245. argument::
  246. >>> serializers.serialize([book1, book2], format='json', indent=2, use_natural_keys=True)
When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.

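That rule can be modelled in a few lines of plain Python. ``Person`` here is a
hypothetical stand-in class, not a Django model, and ``reference_to()`` is an
illustrative helper rather than part of Django::

    class Person(object):
        """Hypothetical plain-Python stand-in for the Person model."""

        def __init__(self, pk, first_name, last_name):
            self.pk = pk
            self.first_name = first_name
            self.last_name = last_name

        def natural_key(self):
            return (self.first_name, self.last_name)


    def reference_to(obj, use_natural_keys=False):
        """Return how a foreign-key reference to obj would be written out."""
        if use_natural_keys and hasattr(obj, "natural_key"):
            # Tuples are written as JSON lists, e.g. ["Douglas", "Adams"].
            return list(obj.natural_key())
        # Otherwise fall back to the primary key, the default strategy.
        return obj.pk


    adams = Person(42, "Douglas", "Adams")
    print(reference_to(adams))                         # 42
    print(reference_to(adams, use_natural_keys=True))  # ['Douglas', 'Adams']

Objects whose class has no ``natural_key()`` method keep their integer
reference even when the flag is set.
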
If you are using :djadmin:`dumpdata` to generate serialized data, you
use the :djadminopt:`--natural` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

  258. Conversely, if (for some strange reason) you want Django to output
  259. natural keys during serialization, but *not* be able to load those
  260. key values, just don't define the ``get_by_natural_key()`` method.
  261. Dependencies during serialization
  262. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  263. Since natural keys rely on database lookups to resolve references, it
  264. is important that data exists before it is referenced. You can't make
  265. a `forward reference` with natural keys - the data you are referencing
  266. must exist before you include a natural key reference to that data.
  267. To accommodate this limitation, calls to :djadmin:`dumpdata` that use
  268. the :djadminopt:`--natural` option will serialize any model with a
  269. ``natural_key()`` method before it serializes normal key objects.
  270. However, this may not always be enough. If your natural key refers to
  271. another object (by using a foreign key or natural key to another object
  272. as part of a natural key), then you need to be able to ensure that
  273. the objects on which a natural key depends occur in the serialized data
  274. before the natural key requires them.
  275. To control this ordering, you can define dependencies on your
  276. ``natural_key()`` methods. You do this by setting a ``dependencies``
  277. attribute on the ``natural_key()`` method itself.
  278. For example, consider the ``Permission`` model in ``contrib.auth``.
  279. The following is a simplified version of the ``Permission`` model::
  280. class Permission(models.Model):
  281. name = models.CharField(max_length=50)
  282. content_type = models.ForeignKey(ContentType)
  283. codename = models.CharField(max_length=100)
  284. # ...
  285. def natural_key(self):
  286. return (self.codename,) + self.content_type.natural_key()
  287. The natural key for a ``Permission`` is a combination of the codename for the
  288. ``Permission``, and the ``ContentType`` to which the ``Permission`` applies. This means
  289. that ``ContentType`` must be serialized before ``Permission``. To define this
  290. dependency, we add one extra line::
  291. class Permission(models.Model):
  292. # ...
  293. def natural_key(self):
  294. return (self.codename,) + self.content_type.natural_key()
  295. natural_key.dependencies = ['contenttypes.contenttype']
  296. This definition ensures that ``ContentType`` models are serialized before
  297. ``Permission`` models. In turn, any object referencing ``Permission`` will
  298. be serialized after both ``ContentType`` and ``Permission``.