- ================
- Full text search
- ================
- The database functions in the ``django.contrib.postgres.search`` module ease
- the use of PostgreSQL's `full text search engine
- <https://www.postgresql.org/docs/current/textsearch.html>`_.
- For the examples in this document, we'll use the models defined in
- :doc:`/topics/db/queries`.
- .. seealso::
- For a high-level overview of searching, see the :doc:`topic documentation
- </topics/db/search>`.
- .. currentmodule:: django.contrib.postgres.search
- The ``search`` lookup
- =====================
- .. fieldlookup:: search
- A common way to use full text search is to search a single term against a
- single column in the database. For example:
- .. code-block:: pycon
- >>> Entry.objects.filter(body_text__search="Cheese")
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- This creates a ``to_tsvector`` in the database from the ``body_text`` field
- and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
- default database search configuration. The results are obtained by matching the
- query and the vector.
- To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
- :setting:`INSTALLED_APPS`.
- ``SearchVector``
- ================
- .. class:: SearchVector(*expressions, config=None, weight=None)
- Searching against a single field is great but rather limiting. The ``Entry``
- instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
- To query against both fields, use a ``SearchVector``:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchVector
- >>> Entry.objects.annotate(
- ... search=SearchVector("body_text", "blog__tagline"),
- ... ).filter(search="Cheese")
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- The arguments to ``SearchVector`` can be any
- :class:`~django.db.models.Expression` or the name of a field. Multiple
- arguments will be concatenated together using a space so that the search
- document includes them all.
- ``SearchVector`` objects can be combined together, allowing you to reuse them.
- For example:
- .. code-block:: pycon
- >>> Entry.objects.annotate(
- ... search=SearchVector("body_text") + SearchVector("blog__tagline"),
- ... ).filter(search="Cheese")
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- See :ref:`postgresql-fts-search-configuration` and
- :ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
- and ``weight`` parameters.
- ``SearchQuery``
- ===============
- .. class:: SearchQuery(value, config=None, search_type='plain')
- ``SearchQuery`` translates the terms the user provides into a search query
- object that the database compares to a search vector. By default, all the words
- the user provides are passed through the stemming algorithms, and the database
- then looks for matches for all of the resulting terms.
- If ``search_type`` is ``'plain'``, which is the default, the terms are treated
- as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
- as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
- formatted search query with terms and operators. If ``search_type`` is
- ``'websearch'``, then you can provide a formatted search query, similar to the
- one used by web search engines. ``'websearch'`` requires PostgreSQL ≥ 11. Read
- PostgreSQL's `Full Text Search docs`_ to learn about differences and syntax.
- Examples:
- .. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchQuery
- >>> SearchQuery("red tomato") # two keywords
- >>> SearchQuery("tomato red") # same results as above
- >>> SearchQuery("red tomato", search_type="phrase") # a phrase
- >>> SearchQuery("tomato red", search_type="phrase") # a different phrase
- >>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw") # boolean operators
- >>> SearchQuery(
- ... "'tomato' ('red' OR 'green')", search_type="websearch"
- ... ) # websearch operators
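As a rough illustration of how ``'plain'`` and ``'phrase'`` differ, the sketch below builds the kind of query text PostgreSQL's ``plainto_tsquery`` and ``phraseto_tsquery`` produce. The helper ``to_tsquery_text`` is hypothetical and is not part of Django or PostgreSQL. It only lowercases and splits the input, whereas the database also stems terms and drops stop words.

```python
import re


def to_tsquery_text(value: str, search_type: str = "plain") -> str:
    """Toy approximation of plainto_tsquery / phraseto_tsquery output.

    Hypothetical helper: no stemming and no stop-word removal is done here.
    """
    terms = [f"'{term}'" for term in re.findall(r"\w+", value.lower())]
    if search_type == "plain":
        # Separate keywords: every term must match, in any order.
        return " & ".join(terms)
    if search_type == "phrase":
        # A phrase: each term must be immediately followed by the next one.
        return " <-> ".join(terms)
    raise ValueError(f"unsupported search_type: {search_type!r}")


plain_a = to_tsquery_text("red tomato")
plain_b = to_tsquery_text("tomato red")
phrase_a = to_tsquery_text("red tomato", search_type="phrase")
phrase_b = to_tsquery_text("tomato red", search_type="phrase")

print(plain_a)
print(phrase_a)
```

The two plain queries contain the same set of ANDed terms, which is why they return the same results, while the two phrase queries impose different term orders.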
- ``SearchQuery`` terms can be combined logically to provide more flexibility:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchQuery
- >>> SearchQuery("meat") & SearchQuery("cheese") # AND
- >>> SearchQuery("meat") | SearchQuery("cheese") # OR
- >>> ~SearchQuery("meat") # NOT
- See :ref:`postgresql-fts-search-configuration` for an explanation of the
- ``config`` parameter.
- ``SearchRank``
- ==============
- .. class:: SearchRank(vector, query, weights=None, normalization=None, cover_density=False)
- So far, we've returned every result in which the query matches the vector in any
- way. You will likely want to order the results by some
- sort of relevancy. PostgreSQL provides a ranking function which takes into
- account how often the query terms appear in the document, how close together
- the terms are in the document, and how important the part of the document is
- where they occur. The better the match, the higher the value of the rank. To
- order by relevancy:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
- >>> vector = SearchVector("body_text")
- >>> query = SearchQuery("cheese")
- >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- See :ref:`postgresql-fts-weighting-queries` for an explanation of the
- ``weights`` parameter.
- Set the ``cover_density`` parameter to ``True`` to enable the cover density
- ranking, which means that the proximity of matching query terms is taken into
- account.
- Provide an integer to the ``normalization`` parameter to control rank
- normalization. This integer is a bit mask, so you can combine multiple
- behaviors:
- .. code-block:: pycon
- >>> from django.db.models import Value
- >>> Entry.objects.annotate(
- ... rank=SearchRank(
- ... vector,
- ... query,
- ... normalization=Value(2).bitor(Value(4)),
- ... )
- ... )
- The PostgreSQL documentation has more details about `different rank
- normalization options`_.
- .. _different rank normalization options: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
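To make the bit mask easier to read, the sketch below names the normalization flags listed in the PostgreSQL documentation. The ``RankNormalization`` class and its member names are hypothetical and exist neither in Django nor in PostgreSQL. Only the integer values come from the PostgreSQL documentation.

```python
from enum import IntFlag


class RankNormalization(IntFlag):
    """Hypothetical names for PostgreSQL's rank normalization bits."""

    NONE = 0  # ignore the document length (PostgreSQL's default)
    LOG_LENGTH = 1  # divide the rank by 1 + log(document length)
    LENGTH = 2  # divide the rank by the document length
    MEAN_HARMONIC_DISTANCE = 4  # cover density ranking only
    UNIQUE_WORDS = 8  # divide the rank by the number of unique words
    LOG_UNIQUE_WORDS = 16  # divide by 1 + log(number of unique words)
    SELF_PLUS_ONE = 32  # divide the rank by itself + 1


# Same mask as Value(2).bitor(Value(4)) in the example above.
mask = RankNormalization.LENGTH | RankNormalization.MEAN_HARMONIC_DISTANCE
print(int(mask))
```

Combining flags with ``|`` in Python yields the integer that the ``bitor()`` expression computes in the database, so either form describes the same normalization.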
- ``SearchHeadline``
- ==================
- .. class:: SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)
- Accepts a single text field or an expression, a query, a config, and a set of
- options. Returns highlighted search results.
- Set the ``start_sel`` and ``stop_sel`` parameters to the string values to be
- used to wrap highlighted query terms in the document. PostgreSQL's defaults are
- ``<b>`` and ``</b>``.
- Provide integer values to the ``max_words`` and ``min_words`` parameters to
- determine the longest and shortest headlines. PostgreSQL's defaults are 35 and
- 15.
- Provide an integer value to the ``short_word`` parameter to discard words of
- this length or less in each headline. PostgreSQL's default is 3.
- Set the ``highlight_all`` parameter to ``True`` to use the whole document in
- place of a fragment and ignore ``max_words``, ``min_words``, and ``short_word``
- parameters. That's disabled by default in PostgreSQL.
- Provide a non-zero integer value to the ``max_fragments`` parameter to set the
- maximum number of fragments to display. That's disabled by default in PostgreSQL.
- Set the ``fragment_delimiter`` string parameter to configure the delimiter
- between fragments. PostgreSQL's default is ``" ... "``.
- The PostgreSQL documentation has more details on `highlighting search
- results`_.
- Usage example:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
- >>> query = SearchQuery("red tomato")
- >>> entry = Entry.objects.annotate(
- ... headline=SearchHeadline(
- ... "body_text",
- ... query,
- ... start_sel="<span>",
- ... stop_sel="</span>",
- ... ),
- ... ).get()
- >>> print(entry.headline)
- Sandwich with <span>tomato</span> and <span>red</span> cheese.
- See :ref:`postgresql-fts-search-configuration` for an explanation of the
- ``config`` parameter.
- .. _highlighting search results: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE
- .. _postgresql-fts-search-configuration:
- Changing the search configuration
- =================================
- You can pass the ``config`` argument to a :class:`SearchVector` and
- :class:`SearchQuery` to use a different search configuration. This allows using
- different language parsers and dictionaries as defined by the database:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchQuery, SearchVector
- >>> Entry.objects.annotate(
- ... search=SearchVector("body_text", config="french"),
- ... ).filter(search=SearchQuery("œuf", config="french"))
- [<Entry: Pain perdu>]
- The value of ``config`` could also be stored in another column:
- .. code-block:: pycon
- >>> from django.db.models import F
- >>> Entry.objects.annotate(
- ... search=SearchVector("body_text", config=F("blog__language")),
- ... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
- [<Entry: Pain perdu>]
- .. _postgresql-fts-weighting-queries:
- Weighting queries
- =================
- Not every field has the same relevance in a query, so you can assign weights
- to individual vectors before you combine them:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
- >>> vector = SearchVector("body_text", weight="A") + SearchVector(
- ... "blog__tagline", weight="B"
- ... )
- >>> query = SearchQuery("cheese")
- >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
- ... "-rank"
- ... )
- The weight should be one of the following letters: D, C, B, A. By default,
- these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
- respectively. If you wish to weight them differently, pass a list of four
- floats to :class:`SearchRank` as ``weights``, in the same D, C, B, A order:
- .. code-block:: pycon
- >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
- >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")
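Because the ``weights`` list runs from D to A rather than from A to D, it is easy to pass the values in the wrong order. The sketch below pairs each position with its letter. ``weights_by_label`` is a hypothetical helper for illustration only, not a Django API.

```python
# PostgreSQL's default weights, in the order D, C, B, A.
DEFAULT_WEIGHTS = [0.1, 0.2, 0.4, 1.0]


def weights_by_label(weights):
    """Map a four-element weights list to the letters D, C, B, A."""
    if len(weights) != 4:
        raise ValueError("weights must contain exactly four floats")
    return dict(zip("DCBA", weights))


defaults = weights_by_label(DEFAULT_WEIGHTS)
custom = weights_by_label([0.2, 0.4, 0.6, 0.8])
print(custom)
```

With the custom list from the example above, a vector created with ``weight="A"`` would be scaled by ``0.8`` and one created with ``weight="B"`` by ``0.6``.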
- Performance
- ===========
- Special database configuration isn't necessary to use any of these functions.
- However, if you're searching more than a few hundred records, you're likely to
- run into performance problems. Full text search is a more intensive process
- than comparing the size of an integer, for example.
- If all the fields you're querying belong to a single model, you can create a
- functional
- :class:`GIN <django.contrib.postgres.indexes.GinIndex>` or
- :class:`GiST <django.contrib.postgres.indexes.GistIndex>` index which matches
- the search vector you wish to use. For example::
- GinIndex(
- SearchVector("body_text", "headline", config="english"),
- name="search_vector_idx",
- )
- The PostgreSQL documentation has details on
- `creating indexes for full text search
- <https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.
- ``SearchVectorField``
- ---------------------
- .. class:: SearchVectorField
- If this approach becomes too slow, you can add a ``SearchVectorField`` to your
- model. You'll need to keep it populated with triggers, for example, as
- described in the `PostgreSQL documentation`_. You can then query the field as
- if it were an annotated ``SearchVector``:
- .. code-block:: pycon
- >>> Entry.objects.update(search_vector=SearchVector("body_text"))
- >>> Entry.objects.filter(search_vector="cheese")
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- .. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
- Trigram similarity
- ==================
- Another approach to searching is trigram similarity. A trigram is a group of
- three consecutive characters. In addition to the :lookup:`trigram_similar`,
- :lookup:`trigram_word_similar`, and :lookup:`trigram_strict_word_similar`
- lookups, you can use a couple of other expressions.
- To use them, you need to activate the `pg_trgm extension
- <https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
- install it using the
- :class:`~django.contrib.postgres.operations.TrigramExtension` migration
- operation.
- ``TrigramSimilarity``
- ---------------------
- .. class:: TrigramSimilarity(expression, string, **extra)
- Accepts a field name or expression, and a string or expression. Returns the
- trigram similarity between the two arguments.
- Usage example:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import TrigramSimilarity
- >>> Author.objects.create(name="Katy Stevens")
- >>> Author.objects.create(name="Stephen Keats")
- >>> test = "Katie Stephens"
- >>> Author.objects.annotate(
- ... similarity=TrigramSimilarity("name", test),
- ... ).filter(
- ... similarity__gt=0.3
- ... ).order_by("-similarity")
- [<Author: Katy Stevens>, <Author: Stephen Keats>]
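To show where a similarity score comes from, here is a minimal pure-Python sketch of the calculation. It assumes the padding rule described in the ``pg_trgm`` documentation (each word is lowercased and treated as if prefixed by two spaces and suffixed by one) and scores two strings by the share of trigrams they have in common. The functions ``trigrams`` and ``similarity`` are illustrative stand-ins, and the database's results may differ in edge cases.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, padding each word like pg_trgm."""
    result = set()
    # Words are runs of letters and digits; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


test = "Katie Stephens"
names = ["Stephen Keats", "Katy Stevens"]
ranked = sorted(names, key=lambda name: similarity(name, test), reverse=True)
print(ranked)
```

Under these assumptions both names score above the ``0.3`` threshold used in the query, and "Katy Stevens" ranks first, matching the order of the result shown above.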
- ``TrigramWordSimilarity``
- -------------------------
- .. class:: TrigramWordSimilarity(string, expression, **extra)
- Accepts a string or expression, and a field name or expression. Returns the
- trigram word similarity between the two arguments.
- Usage example:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import TrigramWordSimilarity
- >>> Author.objects.create(name="Katy Stevens")
- >>> Author.objects.create(name="Stephen Keats")
- >>> test = "Kat"
- >>> Author.objects.annotate(
- ... similarity=TrigramWordSimilarity(test, "name"),
- ... ).filter(
- ... similarity__gt=0.3
- ... ).order_by("-similarity")
- [<Author: Katy Stevens>]
- ``TrigramStrictWordSimilarity``
- -------------------------------
- .. class:: TrigramStrictWordSimilarity(string, expression, **extra)
- Accepts a string or expression, and a field name or expression. Returns the
- trigram strict word similarity between the two arguments. Similar to
- :class:`TrigramWordSimilarity() <TrigramWordSimilarity>`, except that it forces
- extent boundaries to match word boundaries.
- ``TrigramDistance``
- -------------------
- .. class:: TrigramDistance(expression, string, **extra)
- Accepts a field name or expression, and a string or expression. Returns the
- trigram distance between the two arguments.
- Usage example:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import TrigramDistance
- >>> Author.objects.create(name="Katy Stevens")
- >>> Author.objects.create(name="Stephen Keats")
- >>> test = "Katie Stephens"
- >>> Author.objects.annotate(
- ... distance=TrigramDistance("name", test),
- ... ).filter(
- ... distance__lte=0.7
- ... ).order_by("distance")
- [<Author: Katy Stevens>, <Author: Stephen Keats>]
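In ``pg_trgm``, the distance between two strings is one minus their similarity, so a smaller distance means a closer match. The sketch below illustrates that relationship. The similarity scores in it are made-up values for illustration, not output from a database.

```python
def trigram_distance(similarity: float) -> float:
    """Distance as defined by pg_trgm: one minus the similarity."""
    return 1.0 - similarity


# Hypothetical similarity scores, chosen only for illustration.
scores = {"Stephen Keats": 0.38, "Katy Stevens": 0.4}

# Ordering by ascending distance is the same as ordering by descending similarity.
by_distance = sorted(scores, key=lambda name: trigram_distance(scores[name]))
print(by_distance)
```

This is why the query above filters with ``distance__lte`` and orders by ``"distance"``, where the similarity example filters with ``similarity__gt`` and orders by ``"-similarity"``.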
- ``TrigramWordDistance``
- -----------------------
- .. class:: TrigramWordDistance(string, expression, **extra)
- Accepts a string or expression, and a field name or expression. Returns the
- trigram word distance between the two arguments.
- Usage example:
- .. code-block:: pycon
- >>> from django.contrib.postgres.search import TrigramWordDistance
- >>> Author.objects.create(name="Katy Stevens")
- >>> Author.objects.create(name="Stephen Keats")
- >>> test = "Kat"
- >>> Author.objects.annotate(
- ... distance=TrigramWordDistance(test, "name"),
- ... ).filter(
- ... distance__lte=0.7
- ... ).order_by("distance")
- [<Author: Katy Stevens>]
- ``TrigramStrictWordDistance``
- -----------------------------
- .. class:: TrigramStrictWordDistance(string, expression, **extra)
- Accepts a string or expression, and a field name or expression. Returns the
- trigram strict word distance between the two arguments.