==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
models into other formats. Usually these other formats will be text-based and
used for sending Django data over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
================

At the highest level, you can serialize data like this::

    from django.core import serializers

    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterator that yields Django model instances, but it'll
almost always be a QuerySet.)

.. function:: django.core.serializers.get_serializer(format)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    ``django.core.serializers.SerializerDoesNotExist`` exception.

.. _subset-of-fields:

Subset of fields
----------------

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers

    data = serializers.serialize("xml", SomeModel.objects.all(), fields=["name", "size"])

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized. The primary key is always serialized as the ``pk`` element in the
resulting output; it never appears in the ``fields`` part.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

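To make the split between ``pk`` and ``fields`` concrete, here is a stdlib-only sketch that builds one entry in the documented shape when only some fields are requested. It is not Django's implementation: the ``serialize_subset()`` helper, the ``store.item`` label and the sample values are hypothetical.

```python
def serialize_subset(model_label, pk, values, fields=None):
    """Build one serialized entry in the documented shape (a sketch).

    ``values`` maps field names to values for a single object. The primary
    key stays at the top level as ``pk`` and is never copied into ``fields``.
    """
    if fields is None:
        selected = dict(values)
    else:
        # Keep only requested names that the object actually has.
        selected = {name: values[name] for name in fields if name in values}
    return {"model": model_label, "pk": pk, "fields": selected}


entry = serialize_subset(
    "store.item",  # hypothetical "app_label.model_name"
    pk=7,
    values={"name": "widget", "size": 3, "colour": "red"},
    fields=["name", "size"],
)
print(entry)
# {'model': 'store.item', 'pk': 7, 'fields': {'name': 'widget', 'size': 3}}
```

Note that ``colour`` is missing from the output; if it were a required field, loading this entry back would hit the problem described in the note above.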
Inherited models
----------------

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    from django.db import models


    class Place(models.Model):
        name = models.CharField(max_length=50)


    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField(default=False)

If you only serialize the ``Restaurant`` model::

    data = serializers.serialize("xml", Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your ``Restaurant`` instances, you will need to
serialize the ``Place`` models as well::

    all_objects = [*Restaurant.objects.all(), *Place.objects.all()]
    data = serializers.serialize("xml", all_objects)

Deserializing data
==================

Deserializing data is very similar to serializing it::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* regular Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

.. note::

    If the ``pk`` attribute in the serialized data doesn't exist or is
    null, a new instance will be saved to the database.

This ensures that deserializing is a non-destructive operation even if the
data in your serialized representation doesn't match what's currently in the
database. Usually, working with these ``DeserializedObject`` instances looks
something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. If you trust your
data source, you can instead save the object directly and move on.

The Django object itself can be inspected as ``deserialized_object.object``.
If fields in the serialized data do not exist on a model, a
``DeserializationError`` will be raised unless the ``ignorenonexistent``
argument is passed in as ``True``::

    serializers.deserialize("xml", data, ignorenonexistent=True)

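The ``ignorenonexistent`` decision can be modelled in a few lines of plain Python. The ``check_fields()`` helper below is hypothetical and raises ``LookupError`` as a stand-in for Django's ``DeserializationError``; it only illustrates the rule, not how Django's deserializers are written.

```python
def check_fields(model_fields, data_fields, ignorenonexistent=False):
    """Return the entries of ``data_fields`` that exist on the model (a sketch).

    Unknown keys raise an error unless ``ignorenonexistent`` is true, in which
    case they are silently dropped.
    """
    unknown = set(data_fields) - set(model_fields)
    if unknown and not ignorenonexistent:
        # Stand-in for Django's DeserializationError.
        raise LookupError(f"unknown fields: {sorted(unknown)}")
    return {name: value for name, value in data_fields.items() if name in model_fields}


model_fields = {"name", "size"}  # hypothetical model definition
incoming = {"name": "widget", "size": 3, "legacy_code": "X1"}

print(check_fields(model_fields, incoming, ignorenonexistent=True))
# {'name': 'widget', 'size': 3}
```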
.. _serialization-formats:

Serialization formats
=====================

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_.

``jsonl``   Serializes to and from JSONL_.

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: https://json.org/
.. _jsonl: https://jsonlines.org/
.. _PyYAML: https://pyyaml.org/

XML
---

The basic XML serialization format looks like this:

.. code-block:: xml

    <?xml version="1.0" encoding="utf-8"?>
    <django-objects version="1.0">
        <object pk="123" model="sessions.session">
            <field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
            <!-- ... -->
        </object>
    </django-objects>

The whole collection of objects that is either serialized or deserialized is
represented by a ``<django-objects>`` tag which contains multiple
``<object>`` elements. Each such object has two attributes: "pk" and "model",
the latter being represented by the name of the app ("sessions") and the
lowercase name of the model ("session") separated by a dot.

Each field of the object is serialized as a ``<field>`` element with the
attributes "type" and "name". The text content of the element represents the
value that should be stored.

Foreign keys and other relational fields are treated a little bit differently:

.. code-block:: xml

    <object pk="27" model="auth.permission">
        <!-- ... -->
        <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
        <!-- ... -->
    </object>

In this example we specify that the ``auth.Permission`` object with the PK 27
has a foreign key to the ``contenttypes.ContentType`` instance with the PK 9.

Many-to-many relations are exported for the model that binds them. For instance,
the ``auth.User`` model has such a relation to the ``auth.Permission`` model:

.. code-block:: xml

    <object pk="1" model="auth.user">
        <!-- ... -->
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
            <object pk="46"></object>
            <object pk="47"></object>
        </field>
    </object>

This example links the given user with the permission models with PKs 46 and 47.

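Because the format is plain XML, the standard library can read it back. The sketch below uses :mod:`xml.etree.ElementTree` on a trimmed sample modelled on the examples above to show where the ``pk``, ``model``, ``name`` and ``rel`` attributes live. The ``summarize()`` helper is hypothetical, and Django's own deserializer does much more (type conversion, natural keys, saving).

```python
import xml.etree.ElementTree as ET

# A trimmed sample in the documented shape.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
  <object pk="27" model="auth.permission">
    <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
  </object>
  <object pk="1" model="auth.user">
    <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
      <object pk="46"></object>
      <object pk="47"></object>
    </field>
  </object>
</django-objects>"""


def summarize(xml_text):
    """Yield (model, pk, {field name: raw value}) for each top-level <object>."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    for obj in root.findall("object"):  # direct children only
        fields = {}
        for field in obj.findall("field"):
            if field.get("rel") == "ManyToManyRel":
                # Many-to-many: one nested <object pk="..."> per related row.
                fields[field.get("name")] = [o.get("pk") for o in field.findall("object")]
            else:
                # Plain fields and foreign keys: the value is the element text.
                fields[field.get("name")] = field.text
        yield obj.get("model"), obj.get("pk"), fields


for row in summarize(SAMPLE):
    print(row)
# ('auth.permission', '27', {'content_type': '9'})
# ('auth.user', '1', {'user_permissions': ['46', '47']})
```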
.. admonition:: Control characters

    If the content to be serialized contains control characters that are not
    accepted in the XML 1.0 standard, the serialization will fail with a
    :exc:`ValueError` exception. Read also the W3C's explanation of `HTML,
    XHTML, XML and Control Codes
    <https://www.w3.org/International/questions/qa-controls>`_.

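If you want to locate offending characters before serializing, checking text against the ``Char`` production of the XML 1.0 specification is enough. This stdlib-only sketch uses a hypothetical helper name; it is not the check Django performs internally, and it only reports positions rather than cleaning the data.

```python
import re

# Anything outside XML 1.0 "Char": #x9 | #xA | #xD | [#x20-#xD7FF]
# | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def invalid_xml_chars(text):
    """Return (index, hex code point) pairs that XML 1.0 cannot represent."""
    return [(m.start(), hex(ord(m.group()))) for m in _INVALID_XML10.finditer(text)]


print(invalid_xml_chars("ok\tline\x0bbell\x07"))
# [(7, '0xb'), (12, '0x7')]
```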
.. _serialization-formats-json:

JSON
----

Using the same example data as before, it would be serialized as JSON in the
following way::

    [
        {
            "pk": "4b678b301dfd8a4e0dad910de3ae245b",
            "model": "sessions.session",
            "fields": {
                "expire_date": "2013-01-16T08:16:59.844Z",
                # ...
            },
        }
    ]

The formatting here is a bit simpler than with XML. The whole collection
is just represented as an array and the objects are represented by JSON objects
with three properties: "pk", "model" and "fields". "fields" is again an object
containing each field's name and value as property and property-value
respectively.

Foreign keys have the PK of the linked object as property value.
Many-to-many relations are serialized for the model that defines them and are
represented as a list of PKs.

Be aware that not all Django output can be passed unmodified to :mod:`json`.
For example, if you have some custom type in an object to be serialized, you'll
have to write a custom :mod:`json` encoder for it. Something like this will
work::

    from django.core.serializers.json import DjangoJSONEncoder


    class LazyEncoder(DjangoJSONEncoder):
        def default(self, obj):
            if isinstance(obj, YourCustomType):
                return str(obj)
            return super().default(obj)

You can then pass ``cls=LazyEncoder`` to the ``serializers.serialize()``
function::

    from django.core.serializers import serialize

    serialize("json", SomeModel.objects.all(), cls=LazyEncoder)

Also note that GeoDjango provides a :doc:`customized GeoJSON serializer
</ref/contrib/gis/serializers>`.

``DjangoJSONEncoder``
~~~~~~~~~~~~~~~~~~~~~

.. class:: django.core.serializers.json.DjangoJSONEncoder

The JSON serializer uses ``DjangoJSONEncoder`` for encoding. A subclass of
:class:`~json.JSONEncoder`, it handles these additional types:

:class:`~datetime.datetime`
   A string of the form ``YYYY-MM-DDTHH:mm:ss.sssZ`` or
   ``YYYY-MM-DDTHH:mm:ss.sss+HH:MM`` as defined in `ECMA-262`_.

:class:`~datetime.date`
   A string of the form ``YYYY-MM-DD`` as defined in `ECMA-262`_.

:class:`~datetime.time`
   A string of the form ``HH:MM:ss.sss`` as defined in `ECMA-262`_.

:class:`~datetime.timedelta`
   A string representing a duration as defined in ISO-8601. For example,
   ``timedelta(days=1, hours=2, seconds=3.4)`` is represented as
   ``'P1DT02H00M03.400000S'``.

:class:`~decimal.Decimal`, ``Promise`` (``django.utils.functional.lazy()`` objects), :class:`~uuid.UUID`
   A string representation of the object.

.. _ecma-262: https://262.ecma-international.org/5.1/#sec-15.9.1.15

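To see what these conversions look like in practice, here is a stdlib-only approximation of the rules listed above. ``SketchEncoder`` and ``iso_duration()`` are hypothetical names; this is a simplified re-creation for illustration that skips ``Promise`` objects and assumes naive :class:`~datetime.time` values, so use ``DjangoJSONEncoder`` itself in real code.

```python
import datetime
import decimal
import json
import uuid


def iso_duration(td):
    """Format a timedelta as an ISO 8601 duration like 'P1DT02H00M03.400000S'."""
    sign = ""
    if td < datetime.timedelta(0):
        sign = "-"
        td = -td
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    fraction = f".{td.microseconds:06d}" if td.microseconds else ""
    return f"{sign}P{td.days}DT{hours:02d}H{minutes:02d}M{seconds:02d}{fraction}S"


class SketchEncoder(json.JSONEncoder):
    """Approximates the extra types listed for DjangoJSONEncoder (a sketch)."""

    def default(self, o):
        # datetime must be checked before date, since it is a date subclass.
        if isinstance(o, datetime.datetime):
            r = o.isoformat()
            if o.microsecond:
                r = r[:23] + r[26:]  # keep milliseconds only
            if r.endswith("+00:00"):
                r = r[:-6] + "Z"  # UTC offset written as a trailing "Z"
            return r
        if isinstance(o, datetime.date):
            return o.isoformat()
        if isinstance(o, datetime.time):
            r = o.isoformat()
            return r[:12] if o.microsecond else r  # HH:MM:ss.sss
        if isinstance(o, datetime.timedelta):
            return iso_duration(o)
        if isinstance(o, (decimal.Decimal, uuid.UUID)):
            return str(o)
        return super().default(o)


payload = {
    "when": datetime.datetime(2013, 1, 16, 8, 16, 59, 844560, tzinfo=datetime.timezone.utc),
    "span": datetime.timedelta(days=1, hours=2, seconds=3.4),
    "price": decimal.Decimal("9.99"),
}
print(json.dumps(payload, cls=SketchEncoder))
# {"when": "2013-01-16T08:16:59.844Z", "span": "P1DT02H00M03.400000S", "price": "9.99"}
```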
.. _serialization-formats-jsonl:

JSONL
-----

*JSONL* stands for *JSON Lines*. With this format, objects are separated by new
lines, and each line contains a valid JSON object. JSONL serialized data looks
like this::

    {"pk": "4b678b301dfd8a4e0dad910de3ae245b", "model": "sessions.session", "fields": {...}}
    {"pk": "88bea72c02274f3c9bf1cb2bb8cee4fc", "model": "sessions.session", "fields": {...}}
    {"pk": "9cf0e26691b64147a67e2a9f06ad7a53", "model": "sessions.session", "fields": {...}}

JSONL can be useful for populating large databases, since the data can be
processed line by line, rather than being loaded into memory all at once.

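The line-by-line property is easy to see with the standard library alone. This sketch is not how Django's ``jsonl`` deserializer is written; ``iter_jsonl()`` is a hypothetical helper that reads one object per line from any text stream (here an in-memory :class:`io.StringIO` with made-up sample data), so only one record is decoded at a time.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one decoded object per non-blank line of a text stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


data = io.StringIO(
    '{"pk": 1, "model": "store.book", "fields": {"name": "Mostly Harmless"}}\n'
    '{"pk": 2, "model": "store.book", "fields": {"name": "Life, the Universe and Everything"}}\n'
)
for obj in iter_jsonl(data):
    print(obj["model"], obj["pk"], obj["fields"]["name"])
```

Because ``iter_jsonl()`` is a generator, the same loop works unchanged on an open file of any size.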
YAML
----

YAML serialization looks quite similar to JSON. The object list is serialized
as a sequence of mappings with the keys "pk", "model" and "fields". The value of
"fields" is again a mapping, with each key being the name of a field and each
value being that field's value:

.. code-block:: yaml

    - model: sessions.session
      pk: 4b678b301dfd8a4e0dad910de3ae245b
      fields:
        expire_date: 2013-01-16 08:16:59.844560+00:00

Referential fields are again represented by the PK or sequence of PKs.

.. _topics-serialization-natural-keys:

Natural keys
============

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the relation.
This strategy works well for most objects, but it can cause difficulty in some
circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`migrate` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`,
:class:`~django.contrib.auth.models.Group`, and
:class:`~django.contrib.auth.models.User`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
-------------------------------

Consider the following two models::

    from django.db import models


    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["first_name", "last_name"],
                    name="unique_first_last_name",
                ),
            ]


    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {"pk": 1, "model": "store.book", "fields": {"name": "Mostly Harmless", "author": 42}}
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models


    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)


    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["first_name", "last_name"],
                    name="unique_first_last_name",
                ),
            ]

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]},
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or a
    ``UniqueConstraint`` or ``unique_together`` over multiple fields) for the
    field or fields in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of fields
    will be effectively unique, you can still use those fields as a natural
    key.

Deserialization of objects with no primary key will always check whether the
model's manager has a ``get_by_natural_key()`` method and if so, use it to
populate the deserialized object's primary key.

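The two lookups described above can be pictured with a toy in-memory manager. ``PersonRow``, ``InMemoryPersonManager``, ``resolve_author()``, ``fill_missing_pk()`` and the sample rows are all hypothetical; they only model the behaviour in this section: a list-valued foreign key is unpacked into ``get_by_natural_key()`` arguments, and an object without a ``pk`` takes its primary key from the same lookup. A real deserializer also handles databases, saving and errors.

```python
from dataclasses import dataclass


@dataclass
class PersonRow:
    pk: int
    first_name: str
    last_name: str


class InMemoryPersonManager:
    """Hypothetical stand-in for PersonManager, backed by a list."""

    def __init__(self, rows):
        self.rows = rows

    def get_by_natural_key(self, first_name, last_name):
        for row in self.rows:
            if (row.first_name, row.last_name) == (first_name, last_name):
                return row
        raise LookupError(f"no Person {first_name} {last_name}")


def resolve_author(manager, value):
    """Turn a serialized foreign key value into a primary key."""
    if isinstance(value, (list, tuple)):
        # Natural key: unpack ["Douglas", "Adams"] into positional arguments.
        return manager.get_by_natural_key(*value).pk
    return value  # already a plain primary key


def fill_missing_pk(manager, entry):
    """Give a pk-less serialized Person the pk of a matching existing row."""
    if entry.get("pk") is None:
        fields = entry["fields"]
        try:
            row = manager.get_by_natural_key(fields["first_name"], fields["last_name"])
        except LookupError:
            return entry  # no match: it would be saved as a new row
        entry = {**entry, "pk": row.pk}
    return entry


people = InMemoryPersonManager([PersonRow(42, "Douglas", "Adams")])
pk_less = {"model": "store.person", "fields": {"first_name": "Douglas", "last_name": "Adams"}}

print(resolve_author(people, ["Douglas", "Adams"]))  # 42
print(resolve_author(people, 42))  # 42
print(fill_missing_pk(people, pk_less)["pk"])  # 42
```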
Serialization of natural keys
-----------------------------

So how do you get Django to emit a natural key when serializing an object?
First, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["first_name", "last_name"],
                    name="unique_first_last_name",
                ),
            ]

        def natural_key(self):
            return (self.first_name, self.last_name)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you pass the ``use_natural_foreign_keys=True``
or ``use_natural_primary_keys=True`` arguments:

.. code-block:: pycon

    >>> serializers.serialize(
    ...     "json",
    ...     [book1, book2],
    ...     indent=2,
    ...     use_natural_foreign_keys=True,
    ...     use_natural_primary_keys=True,
    ... )

When ``use_natural_foreign_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any foreign key reference to objects
of the type that defines the method.

When ``use_natural_primary_keys=True`` is specified, Django will not provide the
primary key in the serialized data of this object since it can be calculated
during deserialization::

    ...
    {
        "model": "store.person",
        "fields": {
            "first_name": "Douglas",
            "last_name": "Adams",
            "birthdate": "1952-03-11",
        },
    }
    ...

This can be useful when you need to load serialized data into an existing
database, you cannot guarantee that the serialized primary key value is not
already in use, and you do not need to ensure that deserialized objects retain
the same primary keys.

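The effect of the two flags on individual entries can be sketched without Django. In the hypothetical ``serialize_book()`` and ``serialize_person()`` helpers below, ``use_natural_foreign_keys`` replaces the author's pk with ``natural_key()`` and ``use_natural_primary_keys`` drops the person's own ``pk``. The classes are plain dataclasses, not Django models, and the pk values are made up.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pk: int
    first_name: str
    last_name: str

    def natural_key(self):
        return (self.first_name, self.last_name)


@dataclass
class Book:
    pk: int
    name: str
    author: Person


def serialize_book(book, use_natural_foreign_keys=False):
    """One store.book entry; the author becomes a natural key when requested."""
    if use_natural_foreign_keys:
        # Converted to a list here, which is how a tuple ends up looking in JSON.
        author = list(book.author.natural_key())
    else:
        author = book.author.pk
    return {"pk": book.pk, "model": "store.book", "fields": {"name": book.name, "author": author}}


def serialize_person(person, use_natural_primary_keys=False):
    """One store.person entry; the pk is omitted when natural primary keys are used."""
    entry = {
        "model": "store.person",
        "fields": {"first_name": person.first_name, "last_name": person.last_name},
    }
    if not use_natural_primary_keys:
        entry = {"pk": person.pk, **entry}
    return entry


adams = Person(42, "Douglas", "Adams")
book = Book(1, "Mostly Harmless", adams)
print(serialize_book(book))
print(serialize_book(book, use_natural_foreign_keys=True))
print(serialize_person(adams, use_natural_primary_keys=True))
```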
If you are using :djadmin:`dumpdata` to generate serialized data, use the
:option:`dumpdata --natural-foreign` and :option:`dumpdata --natural-primary`
command line flags to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

.. _natural-keys-and-forward-references:

Natural keys and forward references
-----------------------------------

Sometimes when you use :ref:`natural foreign keys
<topics-serialization-natural-keys>` you'll need to deserialize data where
an object has a foreign key referencing another object that hasn't yet been
deserialized. This is called a "forward reference".

For instance, suppose you have the following objects in your fixture::

    ...
    {
        "model": "store.book",
        "fields": {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]},
    },
    ...
    {"model": "store.person", "fields": {"first_name": "Douglas", "last_name": "Adams"}},
    ...

In order to handle this situation, you need to pass
``handle_forward_references=True`` to ``serializers.deserialize()``. This will
set the ``deferred_fields`` attribute on the ``DeserializedObject`` instances.
You'll need to keep track of ``DeserializedObject`` instances where this
attribute isn't ``None`` and later call ``save_deferred_fields()`` on them.

Typical usage looks like this::

    objs_with_deferred_fields = []

    for obj in serializers.deserialize("xml", data, handle_forward_references=True):
        obj.save()
        if obj.deferred_fields is not None:
            objs_with_deferred_fields.append(obj)

    for obj in objs_with_deferred_fields:
        obj.save_deferred_fields()

For this to work, the ``ForeignKey`` on the referencing model must have
``null=True``.

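The two-pass pattern can be modelled in plain Python to show why the foreign key must be nullable: on the first pass the book is stored with ``author=None`` and its unresolved natural key is remembered, and a second pass fills it in once every person exists. The dictionaries standing in for tables, the ``load_fixture()`` helper and the book's ``pk`` are hypothetical; in Django, ``deferred_fields`` and ``save_deferred_fields()`` play this role for real models.

```python
def load_fixture(entries):
    """Load people and books in fixture order, deferring unresolved authors."""
    people = {}  # natural key (first, last) -> pk, standing in for a table
    books = {}  # pk -> {"name": ..., "author": pk or None}
    deferred = []  # (book pk, natural key) pairs to fix up later
    next_person_pk = 1

    for entry in entries:
        fields = entry["fields"]
        if entry["model"] == "store.person":
            people[(fields["first_name"], fields["last_name"])] = next_person_pk
            next_person_pk += 1
        elif entry["model"] == "store.book":
            key = tuple(fields["author"])
            author_pk = people.get(key)  # None if the person isn't loaded yet
            books[entry["pk"]] = {"name": fields["name"], "author": author_pk}
            if author_pk is None:
                deferred.append((entry["pk"], key))  # only possible if the FK is nullable

    # Second pass: every person now exists, so resolve the deferred keys.
    for book_pk, key in deferred:
        books[book_pk]["author"] = people[key]
    return people, books


fixture = [
    {"model": "store.book", "pk": 1, "fields": {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}},
    {"model": "store.person", "fields": {"first_name": "Douglas", "last_name": "Adams"}},
]
people, books = load_fixture(fixture)
print(books)  # {1: {'name': 'Mostly Harmless', 'author': 1}}
```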
Dependencies during serialization
---------------------------------

It's often possible to avoid explicitly having to handle forward references by
taking care with the ordering of objects within a fixture.

To help with this, calls to :djadmin:`dumpdata` that use the :option:`dumpdata
--natural-foreign` option will serialize any model with a ``natural_key()``
method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author. This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line::

    def natural_key(self):
        return (self.name,) + self.author.natural_key()

    natural_key.dependencies = ["example_app.person"]

This definition ensures that all ``Person`` objects are serialized before
any ``Book`` objects. In turn, any object referencing ``Book`` will be
serialized after both ``Person`` and ``Book`` have been serialized.

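Conceptually, the ``dependencies`` attributes describe a graph that must be sorted topologically. The sketch below uses :mod:`graphlib` (Python 3.9+) on plain classes to show the idea and to confirm that a function attribute set in the class body is readable later. The ``Review`` class, the ``example_app.review`` label and the ``serialization_order()`` helper are hypothetical, and Django's own ordering logic differs in detail.

```python
from graphlib import TopologicalSorter


class Person:
    def natural_key(self):
        return (self.first_name, self.last_name)


class Book:
    def natural_key(self):
        return (self.name,) + self.author.natural_key()

    # A function attribute, set inside the class body as in the example above.
    natural_key.dependencies = ["example_app.person"]


class Review:  # hypothetical model whose natural key includes a Book's
    def natural_key(self):
        return (self.reviewer,) + self.book.natural_key()

    natural_key.dependencies = ["example_app.book"]


MODELS = {
    "example_app.review": Review,
    "example_app.book": Book,
    "example_app.person": Person,
}


def serialization_order(models):
    """Return model labels so each one comes after everything it depends on."""
    graph = {
        label: getattr(cls.natural_key, "dependencies", [])
        for label, cls in models.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(serialization_order(MODELS))
# ['example_app.person', 'example_app.book', 'example_app.review']
```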