Rich text internals
===================

At first glance, Wagtail's rich text capabilities appear to give editors direct control over a block of HTML content. In reality, it's necessary to give editors a representation of rich text content that is several steps removed from the final HTML output, for several reasons:

* The editor interface needs to filter out certain kinds of unwanted markup; this includes malicious scripting, font styles pasted from an external word processor, and elements which would break the validity or consistency of the site design (for example, pages will generally reserve the ``<h1>`` element for the page title, and so it would be inappropriate to allow users to insert their own additional ``<h1>`` elements through rich text).
* Rich text fields can specify a ``features`` argument to further restrict the elements permitted in the field - see :ref:`rich_text_features`.
* Enforcing a subset of HTML helps to keep presentational markup out of the database, making the site more maintainable, and making it easier to repurpose site content (including, potentially, producing non-HTML output such as `LaTeX <https://www.latex-project.org/>`_).
* Elements such as page links and images need to preserve metadata such as the page or image ID, which is not present in the final HTML representation.

This requires the rich text content to go through a number of validation and conversion steps, both between the editor interface and the version stored in the database, and from the database representation to the final rendered HTML.

For this reason, extending Wagtail's rich text handling to support a new element is more involved than simply saying (for example) "enable the ``<blockquote>`` element", since various components of Wagtail - both client and server-side - need to agree on how to handle that feature, including how it should be exposed in the editor interface, how it should be represented within the database, and (if appropriate) how it should be translated when rendered on the front-end.

The components involved in Wagtail's rich text handling are described below.

Data format
-----------

Rich text data (as handled by :ref:`RichTextField <rich-text>`, and ``RichTextBlock`` within :doc:`StreamField </topics/streamfield>`) is stored in the database in a format that is similar, but not identical, to HTML. For example, a link to a page might be stored as:

.. code-block:: html

    <p><a linktype="page" id="3">Contact us</a> for more information.</p>

Here, the ``linktype`` attribute identifies the rule used to rewrite the tag. When rendered on a template through the ``|richtext`` filter (see :ref:`rich-text-filter`), this is converted into valid HTML:

.. code-block:: html

    <p><a href="/contact-us/">Contact us</a> for more information.</p>

In the case of ``RichTextBlock``, the block's value is a ``RichText`` object which performs this conversion automatically when rendered as a string, so the ``|richtext`` filter is not necessary.

Likewise, an image inside rich text content might be stored as:

.. code-block:: html

    <embed embedtype="image" id="10" alt="A pied wagtail" format="left" />

which is converted into an ``img`` element when rendered:

.. code-block:: html

    <img alt="A pied wagtail" class="richtext-image left" height="294" src="/media/images/pied-wagtail.width-500_ENyKffb.jpg" width="500">

Again, the ``embedtype`` attribute identifies the rule used to rewrite the tag. All tags other than ``<a linktype="...">`` and ``<embed embedtype="..." />`` are left unchanged in the converted HTML.

A number of additional constraints apply to ``<a linktype="...">`` and ``<embed embedtype="..." />`` tags, to allow the conversion to be performed efficiently via string replacement:

* The tag name and attributes must be lower-case
* Attribute values must be quoted with double-quotes
* ``embed`` elements must use XML self-closing tag syntax (i.e. end in ``/>`` instead of a closing ``</embed>`` tag)
* The only HTML entities permitted in attribute values are ``&lt;``, ``&gt;``, ``&amp;`` and ``&quot;``

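
The constraints above are what make a plain pattern match sufficient for rewriting. The following is a minimal sketch of that idea in standalone Python, not Wagtail's actual implementation: ``PAGE_URLS``, ``expand_page_link`` and ``rewrite_links`` are hypothetical names, and the dictionary stands in for a database lookup.

```python
import re
from html import escape

# Hypothetical stand-in for a page lookup; a real site would query the database.
PAGE_URLS = {3: "/contact-us/"}

# These patterns rely on the constraints listed above:
# lower-case tag and attribute names, double-quoted attribute values.
FIND_A_TAG = re.compile(r'<a(\b[^>]*)>')
FIND_ATTRS = re.compile(r'([a-z-]+)="([^"]*)"')


def expand_page_link(attrs):
    """Build the opening <a> tag for a linktype="page" link."""
    url = PAGE_URLS.get(int(attrs["id"]))
    if url is None:
        # The page no longer exists: emit a bare anchor rather than failing.
        return "<a>"
    return '<a href="%s">' % escape(url)


# Maps each linktype value to the rule that rewrites it.
LINK_RULES = {"page": expand_page_link}


def rewrite_links(html):
    """Rewrite <a linktype="..."> tags and leave all other markup unchanged."""
    def replace_tag(match):
        attrs = dict(FIND_ATTRS.findall(match.group(1)))
        rule = LINK_RULES.get(attrs.get("linktype"))
        if rule is None:
            return match.group(0)  # not a rich text link; keep as-is
        return rule(attrs)

    return FIND_A_TAG.sub(replace_tag, html)
```

Calling ``rewrite_links`` on the stored page link shown earlier yields the rendered form with ``href="/contact-us/"``, while an ordinary ``<a href="...">`` tag passes through untouched.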
The feature registry
--------------------

Any app within your project can define extensions to Wagtail's rich text handling, such as new ``linktype`` and ``embedtype`` rules. An object known as the *feature registry* serves as a central source of truth about how rich text should behave. This object can be accessed through the :ref:`register_rich_text_features` hook, which is called on startup to gather all definitions relating to rich text:

.. code-block:: python

    # my_app/wagtail_hooks.py

    from wagtail.core import hooks


    @hooks.register('register_rich_text_features')
    def register_my_feature(features):
        # add new definitions to 'features' here
        pass

.. _rich_text_rewrite_handlers:

Rewrite handlers
----------------

Rewrite handlers are classes that know how to translate the content of rich text tags like ``<a linktype="...">`` and ``<embed embedtype="..." />`` into front-end HTML. For example, the ``PageLinkHandler`` class knows how to convert the rich text tag ``<a linktype="page" id="123">`` into the HTML tag ``<a href="/path/to/page/123">``.

Rewrite handlers can also provide other useful information about rich text tags. For example, given an appropriate tag, ``PageLinkHandler`` can be used to extract which page is being referred to. This can be useful for downstream code that may want information about objects being referenced in rich text.

You can create custom rewrite handlers to support your own new ``linktype`` and ``embedtype`` tags. New handlers must be Python classes that inherit from either ``wagtail.core.rich_text.LinkHandler`` or ``wagtail.core.rich_text.EmbedHandler``. Your new classes should override at least some of the following methods (listed here for ``LinkHandler``, although ``EmbedHandler`` has an identical signature):

.. class:: LinkHandler

    .. attribute:: identifier

        Required. The ``identifier`` attribute is a string that indicates which rich text tags should be handled by this handler.

        For example, ``PageLinkHandler.identifier`` is set to the string ``"page"``, indicating that any rich text tags with ``<a linktype="page">`` should be handled by it.

    .. method:: expand_db_attributes(attrs)

        Required. The ``expand_db_attributes`` method is expected to take a dictionary of attributes from a database rich text ``<a>`` tag (``<embed>`` for ``EmbedHandler``) and use it to generate valid frontend HTML.

        For example, ``PageLinkHandler.expand_db_attributes`` might receive ``{'id': 123}``, use it to retrieve the Wagtail page with ID 123, and render a link to its URL like ``<a href="/path/to/page/123">``.

    .. method:: get_model()

        Optional. The static ``get_model`` method only applies to those handlers that are used to render content related to Django models. This method allows handlers to expose the type of content that they know how to handle.

        For example, ``PageLinkHandler.get_model`` returns the Wagtail class ``Page``.

        Handlers that aren't related to Django models can leave this method undefined, and calling it will raise ``NotImplementedError``.

    .. method:: get_instance(attrs)

        Optional. The static or classmethod ``get_instance`` method also only applies to those handlers that are used to render content related to Django models. This method is expected to take a dictionary of attributes from a database rich text ``<a>`` tag (``<embed>`` for ``EmbedHandler``) and use it to return the specific Django model instance being referred to.

        For example, ``PageLinkHandler.get_instance`` might receive ``{'id': 123}`` and return the instance of the Wagtail ``Page`` class with ID 123.

        If left undefined, a default implementation of this method will query the ``id`` model field on the class returned by ``get_model`` using the provided ``id`` attribute; this can be overridden in your own handlers should you want to use some other model field.

Below is an example custom rewrite handler that implements these methods to add support for rich text linking to user email addresses. It supports the conversion of rich text tags like ``<a linktype="user" username="wagtail">`` to valid HTML like ``<a href="mailto:hello@wagtail.io">``. This example assumes that equivalent front-end functionality has been added to allow users to insert these kinds of links into their rich text editor.

.. code-block:: python

    from django.contrib.auth import get_user_model
    from django.utils.html import escape
    from wagtail.core.rich_text import LinkHandler


    class UserLinkHandler(LinkHandler):
        identifier = 'user'

        @staticmethod
        def get_model():
            return get_user_model()

        @classmethod
        def get_instance(cls, attrs):
            # Look users up by username rather than the default ``id`` field
            model = cls.get_model()
            return model.objects.get(username=attrs['username'])

        @classmethod
        def expand_db_attributes(cls, attrs):
            user = cls.get_instance(attrs)
            # Escape the address so it is safe inside the attribute value
            return '<a href="mailto:%s">' % escape(user.email)

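
The relationship between ``get_model`` and the default ``get_instance`` described above can be illustrated without Django. The following is a simplified stand-in, not Wagtail's code: ``InMemoryManager``, ``Author``, ``SketchLinkHandler`` and both subclasses are hypothetical names invented for this sketch, and the in-memory manager only imitates a single-field ``objects.get()`` lookup.

```python
from types import SimpleNamespace


class InMemoryManager:
    """Hypothetical stand-in for a Django model manager."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, **lookup):
        # Supports exactly one field=value pair, like objects.get(id=3).
        (field, value), = lookup.items()
        for row in self.rows:
            if getattr(row, field) == value:
                return row
        raise LookupError("no row with %s=%r" % (field, value))


class Author:
    """Hypothetical model-like class holding two records."""
    objects = InMemoryManager([
        SimpleNamespace(id=1, username="wagtail"),
        SimpleNamespace(id=2, username="pied"),
    ])


class SketchLinkHandler:
    """Simplified base class mirroring the documented defaults."""
    identifier = None

    @staticmethod
    def get_model():
        # Handlers unrelated to models leave this undefined.
        raise NotImplementedError

    @classmethod
    def get_instance(cls, attrs):
        # Default behaviour: look up by the ``id`` attribute of the tag.
        return cls.get_model().objects.get(id=int(attrs["id"]))


class AuthorLinkHandler(SketchLinkHandler):
    identifier = "author"

    @staticmethod
    def get_model():
        return Author


class AuthorByUsernameLinkHandler(AuthorLinkHandler):
    identifier = "author-username"

    @classmethod
    def get_instance(cls, attrs):
        # Override the default to query a different field.
        return cls.get_model().objects.get(username=attrs["username"])
```

Here ``AuthorLinkHandler`` only defines ``get_model`` and inherits the ``id`` lookup, whereas ``AuthorByUsernameLinkHandler`` overrides ``get_instance``, which is the same pattern ``UserLinkHandler`` uses.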
Registering rewrite handlers
----------------------------

Rewrite handlers must also be registered with the feature registry via the :ref:`register_rich_text_features` hook. Independent methods for registering both link handlers and embed handlers are provided.

.. method:: FeatureRegistry.register_link_type(handler)

This method allows you to register a custom handler deriving from ``wagtail.core.rich_text.LinkHandler``, and adds it to the list of link handlers available during rich text conversion.

.. code-block:: python

    # my_app/wagtail_hooks.py

    from wagtail.core import hooks
    from my_app.handlers import MyCustomLinkHandler


    @hooks.register('register_rich_text_features')
    def register_link_handler(features):
        features.register_link_type(MyCustomLinkHandler)

It is also possible to define link rewrite handlers for Wagtail’s built-in ``external`` and ``email`` links, even though they do not have a predefined ``linktype``. For example, if you want external links to have a ``rel="nofollow"`` attribute for SEO purposes:

.. code-block:: python

    from django.utils.html import escape
    from wagtail.core import hooks
    from wagtail.core.rich_text import LinkHandler


    class NoFollowExternalLinkHandler(LinkHandler):
        identifier = 'external'

        @classmethod
        def expand_db_attributes(cls, attrs):
            href = attrs["href"]
            return '<a href="%s" rel="nofollow">' % escape(href)


    @hooks.register('register_rich_text_features')
    def register_external_link(features):
        features.register_link_type(NoFollowExternalLinkHandler)

Similarly, you can use the ``email`` linktype to add a custom rewrite handler for email links (for example, to obfuscate email addresses in rich text).

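
As one possible obfuscation approach, every character of the link target can be written as a decimal HTML character reference, which browsers decode transparently. The sketch below is standalone Python with hypothetical function names (``obfuscate``, ``expand_email_href``); it assumes the handler receives the stored ``href`` attribute, as the ``external`` example above does.

```python
def obfuscate(text):
    """Encode every character as a decimal HTML character reference."""
    return "".join("&#%d;" % ord(char) for char in text)


def expand_email_href(attrs):
    """Build the opening <a> tag with an obfuscated href value.

    Because every character is encoded (including quotes and ampersands),
    no separate escaping step is needed.
    """
    return '<a href="%s">' % obfuscate(attrs["href"])
```

In a real project, the body of ``expand_email_href`` would sit inside the ``expand_db_attributes`` classmethod of a ``LinkHandler`` subclass whose ``identifier`` is ``'email'``, registered in the same way as ``NoFollowExternalLinkHandler``.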
.. method:: FeatureRegistry.register_embed_type(handler)

This method allows you to register a custom handler deriving from ``wagtail.core.rich_text.EmbedHandler``, and adds it to the list of embed handlers available during rich text conversion.

.. code-block:: python

    # my_app/wagtail_hooks.py

    from wagtail.core import hooks
    from my_app.handlers import MyCustomEmbedHandler


    @hooks.register('register_rich_text_features')
    def register_embed_handler(features):
        features.register_embed_type(MyCustomEmbedHandler)

Editor widgets
--------------

The editor interface used on rich text fields can be configured with the :ref:`WAGTAILADMIN_RICH_TEXT_EDITORS <WAGTAILADMIN_RICH_TEXT_EDITORS>` setting. Wagtail provides two editor implementations: ``wagtail.admin.rich_text.DraftailRichTextArea`` (the `Draftail <https://www.draftail.org/>`_ editor based on `Draft.js <https://draftjs.org/>`_) and ``wagtail.admin.rich_text.HalloRichTextArea`` (deprecated, based on `Hallo.js <http://hallojs.org/>`_).

It is possible to create your own rich text editor implementation. At minimum, a rich text editor is a Django :class:`~django.forms.Widget` subclass whose constructor accepts an ``options`` keyword argument (a dictionary of editor-specific configuration options sourced from the ``OPTIONS`` field in ``WAGTAILADMIN_RICH_TEXT_EDITORS``), and which consumes and produces string data in the HTML-like format described above.

Typically, a rich text widget also receives a ``features`` list, passed from either ``RichTextField`` / ``RichTextBlock`` or the ``features`` option in ``WAGTAILADMIN_RICH_TEXT_EDITORS``, which defines the features available in that instance of the editor (see :ref:`rich_text_features`). To opt in to supporting features, set the attribute ``accepts_features = True`` on your widget class; the widget constructor will then receive the feature list as a keyword argument ``features``.

There is a standard set of recognised feature identifiers as listed under :ref:`rich_text_features`, but this is not a definitive list; feature identifiers are only defined by convention, and it is up to each editor widget to determine which features it will recognise, and adapt its behaviour accordingly. Individual editor widgets might implement fewer or more features than the default set, either as built-in functionality or through a plugin mechanism if the editor widget has one.

For example, a third-party Wagtail extension might introduce ``table`` as a new rich text feature, and provide implementations for the Draftail and Hallo editors (which both provide a plugin mechanism). In this case, the third-party extension will not be aware of your custom editor widget, and so the widget will not know how to handle the ``table`` feature identifier. Editor widgets should silently ignore any feature identifiers that they do not recognise.

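
Ignoring unrecognised identifiers can be as simple as filtering the incoming list against the set the widget knows about. This is a minimal sketch; ``SUPPORTED_FEATURES`` and ``select_supported`` are hypothetical names, and the set of identifiers is only an example.

```python
# Feature identifiers this hypothetical widget knows how to handle.
SUPPORTED_FEATURES = {"bold", "italic", "link"}


def select_supported(features):
    """Keep recognised features in their original order; drop the rest silently."""
    return [name for name in features if name in SUPPORTED_FEATURES]
```

With this helper, a feature list of ``['bold', 'table', 'link']`` is reduced to ``['bold', 'link']`` without raising an error for ``table``.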
The ``default_features`` attribute of the feature registry is a list of feature identifiers to be used whenever an explicit feature list has not been given in ``RichTextField`` / ``RichTextBlock`` or ``WAGTAILADMIN_RICH_TEXT_EDITORS``. This list can be modified within the ``register_rich_text_features`` hook to make new features enabled by default, and retrieved by calling ``get_default_features()``.

.. code-block:: python

    from wagtail.core import hooks


    @hooks.register('register_rich_text_features')
    def make_h1_default(features):
        features.default_features.append('h1')

Outside of the ``register_rich_text_features`` hook - for example, inside a widget class - the feature registry can be imported as the object ``wagtail.core.rich_text.features``. A possible starting point for a rich text editor with feature support would be:

.. code-block:: python

    from django.forms import widgets
    from wagtail.core.rich_text import features


    class CustomRichTextArea(widgets.Textarea):
        accepts_features = True

        def __init__(self, *args, **kwargs):
            # Remove the editor-specific arguments before calling the Django widget
            self.options = kwargs.pop('options', None)
            self.features = kwargs.pop('features', None)
            if self.features is None:
                # Fall back to the registry's default feature list
                self.features = features.get_default_features()

            super().__init__(*args, **kwargs)

Editor plugins
--------------

.. method:: FeatureRegistry.register_editor_plugin(editor_name, feature_name, plugin_definition)

Rich text editors often provide a plugin mechanism to allow extending the editor with new functionality. The ``register_editor_plugin`` method provides a standardised way for ``register_rich_text_features`` hooks to define plugins to be pulled in to the editor when a given rich text feature is enabled.

``register_editor_plugin`` is passed an editor name (a string uniquely identifying the editor widget - Wagtail uses the identifiers ``draftail`` and ``hallo`` for its built-in editors), a feature identifier, and a plugin definition object. This object is specific to the editor widget and can be any arbitrary value, but will typically include a :doc:`Django form media <django:topics/forms/media>` definition referencing the plugin's JavaScript code - which will then be merged into the editor widget's own media definition - along with any relevant configuration options to be passed when instantiating the editor.

.. method:: FeatureRegistry.get_editor_plugin(editor_name, feature_name)

Within the editor widget, the plugin definition for a given feature can be retrieved via the ``get_editor_plugin`` method, passing the editor's own identifier string and the feature identifier. This will return ``None`` if no matching plugin has been registered.

For details of the plugin formats for Wagtail's built-in editors, see :doc:`./extending_draftail` and :doc:`./extending_hallo`.

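
The register-then-look-up contract described above can be pictured as a nested dictionary keyed by editor name and feature name. The sketch below is a simplified stand-in written for illustration, not Wagtail's ``FeatureRegistry``; ``SketchFeatureRegistry``, ``collect_plugins`` and the ``my-editor`` identifier are hypothetical.

```python
class SketchFeatureRegistry:
    """Simplified stand-in for the plugin part of the feature registry."""

    def __init__(self):
        # {editor_name: {feature_name: plugin_definition}}
        self.plugins_by_editor = {}

    def register_editor_plugin(self, editor_name, feature_name, plugin_definition):
        plugins = self.plugins_by_editor.setdefault(editor_name, {})
        plugins[feature_name] = plugin_definition

    def get_editor_plugin(self, editor_name, feature_name):
        # Returns None when nothing matches, as described above.
        return self.plugins_by_editor.get(editor_name, {}).get(feature_name)


def collect_plugins(registry, editor_name, features):
    """Gather plugin definitions for the enabled features, skipping unknown ones."""
    plugins = []
    for name in features:
        plugin = registry.get_editor_plugin(editor_name, name)
        if plugin is not None:
            plugins.append(plugin)
    return plugins
```

An editor widget could call something like ``collect_plugins(registry, 'my-editor', self.features)`` when building its configuration, so that features without a plugin for that editor are skipped rather than treated as errors.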
.. _rich_text_format_converters:

Format converters
-----------------

Editor widgets will often be unable to work directly with Wagtail's rich text format, and require conversion to their own native format. For Draftail, this is a JSON-based format known as ContentState (see `How Draft.js Represents Rich Text Data <https://medium.com/@rajaraodv/how-draft-js-represents-rich-text-data-eeabb5f25cf2>`_). Hallo.js and other editors based on HTML's ``contentEditable`` mechanism require valid HTML, and so Wagtail uses a convention referred to as "editor HTML", where the additional data required on link and embed elements is stored in ``data-`` attributes, for example: ``<a href="/contact-us/" data-linktype="page" data-id="3">Contact us</a>``.

Wagtail provides two utility classes, ``wagtail.admin.rich_text.converters.contentstate.ContentstateConverter`` and ``wagtail.admin.rich_text.converters.editor_html.EditorHTMLConverter``, to perform conversions between rich text format and the native editor formats. These classes are independent of any editor widget, and distinct from the rewriting process that happens when rendering rich text onto a template.

Both classes accept a ``features`` list as an argument to their constructor, and implement two methods, ``from_database_format(data)`` which converts Wagtail rich text data to the editor's format, and ``to_database_format(data)`` which converts editor data to Wagtail rich text format.

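
To make the two-way conversion concrete, here is a toy converter that handles only page links in the "editor HTML" convention. It is an illustrative sketch under stated assumptions and bears no relation to the internals of ``EditorHTMLConverter``: ``SketchPageLinkConverter`` and ``PAGE_URLS`` are hypothetical, and treating a missing ``link`` feature as "strip all anchors" is just one possible policy.

```python
import re

# Hypothetical stand-in for a page lookup.
PAGE_URLS = {3: "/contact-us/"}

DB_LINK = re.compile(r'<a linktype="page" id="(\d+)">')
EDITOR_LINK = re.compile(r'<a href="[^"]*" data-linktype="page" data-id="(\d+)">')
ANY_A_TAG = re.compile(r'</?a\b[^>]*>')


class SketchPageLinkConverter:
    """Toy converter between database format and editor HTML (page links only)."""

    def __init__(self, features):
        self.features = features

    def from_database_format(self, html):
        def to_editor(match):
            page_id = int(match.group(1))
            url = PAGE_URLS.get(page_id, "")
            # Keep the metadata in data- attributes so it survives editing.
            return '<a href="%s" data-linktype="page" data-id="%d">' % (url, page_id)

        return DB_LINK.sub(to_editor, html)

    def to_database_format(self, html):
        if "link" not in self.features:
            # Feature disabled: drop anchor tags but keep their text.
            return ANY_A_TAG.sub("", html)
        return EDITOR_LINK.sub(r'<a linktype="page" id="\1">', html)
```

With ``features=['link']`` the two methods round-trip the page link example from the data format section; with an empty feature list, ``to_database_format`` returns the paragraph text with the link removed.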
As with editor plugins, the behaviour of a converter class can vary according to the feature list passed to it. In particular, it can apply whitelisting rules to ensure that the output only contains HTML elements corresponding to the currently active feature set. The feature registry provides a ``register_converter_rule`` method to allow ``register_rich_text_features`` hooks to define conversion rules that will be activated when a given feature is enabled.

.. method:: FeatureRegistry.register_converter_rule(converter_name, feature_name, rule_definition)

``register_converter_rule`` is passed a converter name (a string uniquely identifying the converter class - Wagtail uses the identifiers ``contentstate`` and ``editorhtml``), a feature identifier, and a rule definition object. This object is specific to the converter and can be any arbitrary value.

For details of the rule definition format for the ``contentstate`` and ``editorhtml`` converters, see :doc:`./extending_draftail` and :doc:`./extending_hallo` respectively.

.. method:: FeatureRegistry.get_converter_rule(converter_name, feature_name)

Within a converter class, the rule definition for a given feature can be retrieved via the ``get_converter_rule`` method, passing the converter's own identifier string and the feature identifier. This will return ``None`` if no matching rule has been registered.