Add search_index option to control search indexing of StreamField blocks (#11135)

Vedant Pandey 1 year ago
parent
commit
837d733097

+ 1 - 0
CHANGELOG.txt

@@ -4,6 +4,7 @@ Changelog
 6.0 (xx.xx.xxxx) - IN DEVELOPMENT
 ~~~~~~~~~~~~~~~~
 
+ * Added `search_index` option to StreamField blocks to control whether the block is indexed for searching (Vedant Pandey)
  * Fix: Update system check for overwriting storage backends to recognise the `STORAGES` setting introduced in Django 4.2 (phijma-leukeleu)
  * Fix: Prevent password change form from raising a validation error when browser autocomplete fills in the "Old password" field (Chiemezuo Akujobi)
  * Fix: Ensure that the legacy dropdown options, when closed, do not get accidentally clicked by other interactions on wide viewports (CheesyPhoenix, Christer Jensen)

+ 1 - 0
CONTRIBUTORS.md

@@ -756,6 +756,7 @@
 * scott-8
 * phijma-leukeleu
 * CheesyPhoenix
+* Vedant Pandey
 
 ## Translators
 

+ 8 - 0
docs/reference/streamfield/blocks.md

@@ -66,6 +66,7 @@ All block definitions accept the following optional keyword arguments:
     :param max_length: The maximum allowed length of the field.
     :param min_length: The minimum allowed length of the field.
     :param help_text: Help text to display alongside the field.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param validators: A list of validation functions for the field (see `Django Validators <https://docs.djangoproject.com/en/stable/ref/validators/>`__).
     :param form_classname: A value to add to the form field's ``class`` attribute when rendered on the page editing form.
 
@@ -79,6 +80,7 @@ All block definitions accept the following optional keyword arguments:
     :param max_length: The maximum allowed length of the field.
     :param min_length: The minimum allowed length of the field.
     :param help_text: Help text to display alongside the field.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param rows: Number of rows to show on the textarea (defaults to 1).
     :param validators: A list of validation functions for the field (see `Django Validators <https://docs.djangoproject.com/en/stable/ref/validators/>`__).
     :param form_classname: A value to add to the form field's ``class`` attribute when rendered on the page editing form.
@@ -225,6 +227,7 @@ All block definitions accept the following optional keyword arguments:
     :param features: Specifies the set of features allowed (see :ref:`rich_text_features`).
     :param required: If true (the default), the field cannot be left blank.
     :param max_length: The maximum allowed length of the field. Only text is counted; rich text formatting, embedded content and paragraph / line breaks do not count towards the limit.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param help_text: Help text to display alongside the field.
     :param validators: A list of validation functions for the field (see `Django Validators <https://docs.djangoproject.com/en/stable/ref/validators/>`__).
     :param form_classname: A value to add to the form field's ``class`` attribute when rendered on the page editing form.
@@ -267,6 +270,7 @@ All block definitions accept the following optional keyword arguments:
     :param choices: A list of choices, in any format accepted by Django's :attr:`~django.db.models.Field.choices` parameter for model fields, or a callable returning such a list.
     :param required: If true (the default), the field cannot be left blank.
     :param help_text: Help text to display alongside the field.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param widget: The form widget to render the field with (see `Django Widgets <https://docs.djangoproject.com/en/stable/ref/forms/widgets/>`__).
     :param validators: A list of validation functions for the field (see `Django Validators <https://docs.djangoproject.com/en/stable/ref/validators/>`__).
     :param form_classname: A value to add to the form field's ``class`` attribute when rendered on the page editing form.
@@ -311,6 +315,7 @@ All block definitions accept the following optional keyword arguments:
     :param choices: A list of choices, in any format accepted by Django's :attr:`~django.db.models.Field.choices` parameter for model fields, or a callable returning such a list.
     :param required: If true (the default), the field cannot be left blank.
     :param help_text: Help text to display alongside the field.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param widget: The form widget to render the field with (see `Django Widgets <https://docs.djangoproject.com/en/stable/ref/forms/widgets/>`__).
     :param validators: A list of validation functions for the field (see `Django Validators <https://docs.djangoproject.com/en/stable/ref/validators/>`__).
     :param form_classname: A value to add to the form field's ``class`` attribute when rendered on the page editing form.
@@ -446,6 +451,7 @@ All block definitions accept the following optional keyword arguments:
     :param form_classname: An HTML ``class`` attribute to set on the root element of this block as displayed in the editing interface. Defaults to ``struct-block``; note that the admin interface has CSS styles defined on this class, so it is advised to include ``struct-block`` in this value when overriding. See :ref:`custom_editing_interfaces_for_structblock`.
     :param form_template: Path to a Django template to use to render this block's form. See :ref:`custom_editing_interfaces_for_structblock`.
     :param value_class: A subclass of ``wagtail.blocks.StructValue`` to use as the type of returned values for this block. See :ref:`custom_value_class_for_structblock`.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param label_format:
      Determines the label shown when the block is collapsed in the editing interface. By default, the value of the first sub-block in the StructBlock is shown, but this can be customised by setting a string here with block names contained in braces - for example ``label_format = "Profile for {first_name} {surname}"``
 
@@ -482,6 +488,7 @@ All block definitions accept the following optional keyword arguments:
     :param form_classname: An HTML ``class`` attribute to set on the root element of this block as displayed in the editing interface.
     :param min_num: Minimum number of sub-blocks that the list must have.
     :param max_num: Maximum number of sub-blocks that the list may have.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param collapsed: When true, all sub-blocks are initially collapsed.
 
 
@@ -538,6 +545,7 @@ All block definitions accept the following optional keyword arguments:
     :param required: If true (the default), at least one sub-block must be supplied. This is ignored when using the ``StreamBlock`` as the top-level block of a StreamField; in this case the StreamField's ``blank`` property is respected instead.
     :param min_num: Minimum number of sub-blocks that the stream must have.
     :param max_num: Maximum number of sub-blocks that the stream may have.
+    :param search_index: If false (default true), the content of this block will not be indexed for searching.
     :param block_counts: Specifies the minimum and maximum number of each block type, as a dictionary mapping block names to dicts with (optional) ``min_num`` and ``max_num`` fields.
     :param collapsed: When true, all sub-blocks are initially collapsed.
     :param form_classname: An HTML ``class`` attribute to set on the root element of this block as displayed in the editing interface.

+ 1 - 1
docs/releases/6.0.md

@@ -14,7 +14,7 @@ depth: 1
 
 ### Other features
 
- * ...
+ * Added `search_index` option to StreamField blocks to control whether the block is indexed for searching (Vedant Pandey)
 
 ### Bug fixes
 

+ 12 - 0
docs/topics/streamfield.md

@@ -575,6 +575,18 @@ hero_image = my_page.body.first_block_by_name('image')
 <div class="hero-image">{{ page.body.first_block_by_name.image }}</div>
 ```
 
+## Search considerations
+
+Like any other field, content in a StreamField can be made searchable by adding the field to the model's `search_fields` definition - see {ref}`wagtailsearch_indexing_fields`. By default, all text content from the stream is added to the search index. To exclude a particular block from indexing, pass the keyword argument `search_index=False` in that block's definition. For example:
+
+```python
+body = StreamField([
+    ('normal_text', blocks.RichTextBlock()),
+    ('pull_quote', blocks.RichTextBlock(search_index=False)),
+    ('footnotes', blocks.ListBlock(blocks.CharBlock(), search_index=False)),
+], use_json_field=True)
+```
+
 ## Custom validation
 
 Custom validation logic can be added to blocks by overriding the block's `clean` method. For more information, see [](streamfield_validation).

+ 17 - 3
wagtail/blocks/field_block.py

@@ -146,10 +146,12 @@ class CharBlock(FieldBlock):
         max_length=None,
         min_length=None,
         validators=(),
+        search_index=True,
         **kwargs,
     ):
         # CharField's 'label' and 'initial' parameters are not exposed, as Block handles that functionality natively
         # (via 'label' and 'default')
+        self.search_index = search_index
         self.field = forms.CharField(
             required=required,
             help_text=help_text,
@@ -160,7 +162,7 @@ class CharBlock(FieldBlock):
         super().__init__(**kwargs)
 
     def get_searchable_content(self, value):
-        return [force_str(value)]
+        return [force_str(value)] if self.search_index else []
 
 
 class TextBlock(FieldBlock):
@@ -171,6 +173,7 @@ class TextBlock(FieldBlock):
         rows=1,
         max_length=None,
         min_length=None,
+        search_index=True,
         validators=(),
         **kwargs,
     ):
@@ -182,6 +185,7 @@ class TextBlock(FieldBlock):
             "validators": validators,
         }
         self.rows = rows
+        self.search_index = search_index
         super().__init__(**kwargs)
 
     @cached_property
@@ -193,7 +197,7 @@ class TextBlock(FieldBlock):
         return forms.CharField(**field_kwargs)
 
     def get_searchable_content(self, value):
-        return [force_str(value)]
+        return [force_str(value)] if self.search_index else []
 
     class Meta:
         icon = "pilcrow"
@@ -482,6 +486,7 @@ class BaseChoiceBlock(FieldBlock):
         default=None,
         required=True,
         help_text=None,
+        search_index=True,
         widget=None,
         validators=(),
         **kwargs,
@@ -489,6 +494,7 @@ class BaseChoiceBlock(FieldBlock):
 
         self._required = required
         self._default = default
+        self.search_index = search_index
 
         if choices is None:
             # no choices specified, so pick up the choice defined at the class level
@@ -599,6 +605,8 @@ class ChoiceBlock(BaseChoiceBlock):
 
     def get_searchable_content(self, value):
         # Return the display value as the searchable value
+        if not self.search_index:
+            return []
         text_value = force_str(value)
         for k, v in self.field.choices:
             if isinstance(v, (list, tuple)):
@@ -633,6 +641,8 @@ class MultipleChoiceBlock(BaseChoiceBlock):
 
     def get_searchable_content(self, value):
         # Return the display value as the searchable value
+        if not self.search_index:
+            return []
         content = []
         text_value = force_str(value)
         for k, v in self.field.choices:
@@ -657,6 +667,7 @@ class RichTextBlock(FieldBlock):
         features=None,
         max_length=None,
         validators=(),
+        search_index=True,
         **kwargs,
     ):
         if max_length is not None:
@@ -670,6 +681,7 @@ class RichTextBlock(FieldBlock):
         }
         self.editor = editor
         self.features = features
+        self.search_index = search_index
         super().__init__(**kwargs)
 
     def get_default(self):
@@ -707,8 +719,10 @@ class RichTextBlock(FieldBlock):
         return RichText(value)
 
     def get_searchable_content(self, value):
-        # Strip HTML tags to prevent search backend from indexing them
+        if not self.search_index:
+            return []
         source = force_str(value.source)
+        # Strip HTML tags to prevent search backend from indexing them
         return [get_text_for_indexing(source)]
 
     def extract_references(self, value):
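
The field-block hunks above share one pattern: the constructor stores a `search_index` flag, and `get_searchable_content` returns an empty list when the flag is false. The following is a minimal sketch of that pattern without Wagtail installed, not the library's code. `SketchCharBlock` and `SketchChoiceBlock` are hypothetical stand-ins, `str()` replaces Django's `force_str`, and choices are assumed to be flat `(key, label)` pairs with no option groups.

```python
class SketchCharBlock:
    """Hypothetical stand-in for a text-like field block (not Wagtail's class)."""

    def __init__(self, search_index=True):
        self.search_index = search_index

    def get_searchable_content(self, value):
        # Same shape as the diff: a one-item list, or nothing when opted out.
        return [str(value)] if self.search_index else []


class SketchChoiceBlock:
    """Hypothetical stand-in for a choice block; assumes flat (key, label) choices."""

    def __init__(self, choices, search_index=True):
        self.choices = list(choices)
        self.search_index = search_index

    def get_searchable_content(self, value):
        # Opt-out check comes first, before any label lookup.
        if not self.search_index:
            return []
        text_value = str(value)
        # Index the human-readable label rather than the stored key.
        for key, label in self.choices:
            if str(key) == text_value:
                return [str(label)]
        return []


choices = [("choice-1", "Choice 1"), ("choice-2", "Choice 2")]
indexed = SketchChoiceBlock(choices).get_searchable_content("choice-1")
skipped = SketchChoiceBlock(choices, search_index=False).get_searchable_content("choice-1")
```

Returning an empty list rather than `None` keeps the return type uniform, so callers that `extend` a running list of content need no special case.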

+ 4 - 3
wagtail/blocks/list_block.py

@@ -138,9 +138,9 @@ class ListValue(MutableSequence):
 
 
 class ListBlock(Block):
-    def __init__(self, child_block, **kwargs):
+    def __init__(self, child_block, search_index=True, **kwargs):
         super().__init__(**kwargs)
-
+        self.search_index = search_index
         if isinstance(child_block, type):
             # child_block was passed as a class, so convert it to a block instance
             self.child_block = child_block()
@@ -343,8 +343,9 @@ class ListBlock(Block):
         return format_html("<ul>{0}</ul>", children)
 
     def get_searchable_content(self, value):
+        if not self.search_index:
+            return []
         content = []
-
         for child_value in value:
             content.extend(self.child_block.get_searchable_content(child_value))
 

+ 4 - 1
wagtail/blocks/stream_block.py

@@ -75,8 +75,9 @@ class StreamBlockValidationError(ValidationError):
 
 
 class BaseStreamBlock(Block):
-    def __init__(self, local_blocks=None, **kwargs):
+    def __init__(self, local_blocks=None, search_index=True, **kwargs):
         self._constructor_kwargs = kwargs
+        self.search_index = search_index
 
         super().__init__(**kwargs)
 
@@ -340,6 +341,8 @@ class BaseStreamBlock(Block):
         )
 
     def get_searchable_content(self, value):
+        if not self.search_index:
+            return []
         content = []
 
         for child in value:

+ 4 - 1
wagtail/blocks/struct_block.py

@@ -106,8 +106,9 @@ class PlaceholderBoundBlock(BoundBlock):
 
 
 class BaseStructBlock(Block):
-    def __init__(self, local_blocks=None, **kwargs):
+    def __init__(self, local_blocks=None, search_index=True, **kwargs):
         self._constructor_kwargs = kwargs
+        self.search_index = search_index
 
         super().__init__(**kwargs)
 
@@ -253,6 +254,8 @@ class BaseStructBlock(Block):
         }
 
     def get_searchable_content(self, value):
+        if not self.search_index:
+            return []
         content = []
 
         for name, block in self.child_blocks.items():
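
The container hunks (`ListBlock`, `BaseStreamBlock`, `BaseStructBlock`) return early before visiting any children, so `search_index=False` on a container suppresses everything nested inside it, whatever the children's own settings. The sketch below illustrates that interaction without Wagtail. The `Sketch*` classes are hypothetical stand-ins, and the struct value is assumed to be a plain dict containing every child name.

```python
class SketchTextBlock:
    """Hypothetical leaf block: one string of searchable text."""

    def __init__(self, search_index=True):
        self.search_index = search_index

    def get_searchable_content(self, value):
        return [str(value)] if self.search_index else []


class SketchListBlock:
    """Hypothetical list container wrapping a single child block type."""

    def __init__(self, child_block, search_index=True):
        self.child_block = child_block
        self.search_index = search_index

    def get_searchable_content(self, value):
        # Early return: children are never consulted when the list opts out.
        if not self.search_index:
            return []
        content = []
        for child_value in value:
            content.extend(self.child_block.get_searchable_content(child_value))
        return content


class SketchStructBlock:
    """Hypothetical struct container with named child blocks."""

    def __init__(self, child_blocks, search_index=True):
        self.child_blocks = dict(child_blocks)
        self.search_index = search_index

    def get_searchable_content(self, value):
        if not self.search_index:
            return []
        content = []
        for name, block in self.child_blocks.items():
            # Assumes every child name is present in the value dict.
            content.extend(block.get_searchable_content(value[name]))
        return content


person = SketchStructBlock(
    [("name", SketchTextBlock()), ("internal_note", SketchTextBlock(search_index=False))]
)
person_content = person.get_searchable_content({"name": "Ada", "internal_note": "draft"})
footnotes = SketchListBlock(SketchTextBlock(), search_index=False)
footnote_content = footnotes.get_searchable_content(["first", "second"])
```

In this sketch the flag can be set at either level: on a single child to drop one field from an otherwise indexed struct, or on the container to drop the whole subtree.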

+ 1 - 0
wagtail/fields.py

@@ -252,6 +252,7 @@ class StreamField(models.Field):
         return self.get_prep_value(value)
 
     def get_searchable_content(self, value):
+
         return self.stream_block.get_searchable_content(value)
 
     def extract_references(self, value):

+ 38 - 0
wagtail/tests/test_blocks.py

@@ -125,6 +125,12 @@ class TestFieldBlock(WagtailTestUtils, SimpleTestCase):
 
         self.assertEqual(content, ["Hello world!"])
 
+    def test_search_index_searchable_content(self):
+        block = blocks.CharBlock(search_index=False)
+        content = block.get_searchable_content("Hello world!")
+
+        self.assertEqual(content, [])
+
     def test_charfield_with_validator(self):
         def validate_is_foo(value):
             if value != "foo":
@@ -665,6 +671,18 @@ class TestRichTextBlock(TestCase):
             ],
         )
 
+    def test_search_index_get_searchable_content(self):
+        block = blocks.RichTextBlock(search_index=False)
+        value = RichText(
+            '<p>Merry <a linktype="page" id="4">Christmas</a>! &amp; a happy new year</p>\n'
+            "<p>Our Santa pet <b>Wagtail</b> has some cool stuff in store for you all!</p>"
+        )
+        result = block.get_searchable_content(value)
+        self.assertEqual(
+            result,
+            [],
+        )
+
     def test_get_searchable_content_whitespace(self):
         block = blocks.RichTextBlock()
         value = RichText("<p>mashed</p><p>po<i>ta</i>toes</p>")
@@ -928,6 +946,16 @@ class TestChoiceBlock(WagtailTestUtils, SimpleTestCase):
         )
         self.assertEqual(block.get_searchable_content("choice-1"), ["Choice 1"])
 
+    def test_search_index_searchable_content(self):
+        block = blocks.ChoiceBlock(
+            choices=[
+                ("choice-1", "Choice 1"),
+                ("choice-2", "Choice 2"),
+            ],
+            search_index=False,
+        )
+        self.assertEqual(block.get_searchable_content("choice-1"), [])
+
     def test_searchable_content_with_callable_choices(self):
         def callable_choices():
             return [
@@ -1305,6 +1333,16 @@ class TestMultipleChoiceBlock(WagtailTestUtils, SimpleTestCase):
         )
         self.assertEqual(block.get_searchable_content("choice-1"), ["Choice 1"])
 
+    def test_search_index_searchable_content(self):
+        block = blocks.MultipleChoiceBlock(
+            choices=[
+                ("choice-1", "Choice 1"),
+                ("choice-2", "Choice 2"),
+            ],
+            search_index=False,
+        )
+        self.assertEqual(block.get_searchable_content("choice-1"), [])
+
     def test_searchable_content_with_callable_choices(self):
         def callable_choices():
             return [