123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254 |
- ================
- Full text search
- ================
- The database functions in the ``django.contrib.postgres.search`` module ease
- the use of PostgreSQL's `full text search engine
- <https://www.postgresql.org/docs/current/textsearch.html>`_.
- For the examples in this document, we'll use the models defined in
- :doc:`/topics/db/queries`.
- .. seealso::
- For a high-level overview of searching, see the :doc:`topic documentation
- </topics/db/search>`.
- .. currentmodule:: django.contrib.postgres.search
- The ``search`` lookup
- =====================
- .. fieldlookup:: search
- The simplest way to use full text search is to search a single term against a
- single column in the database. For example::
- >>> Entry.objects.filter(body_text__search='Cheese')
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- This creates a ``to_tsvector`` in the database from the ``body_text`` field
- and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
- default database search configuration. The results are obtained by matching the
- query and the vector.
- To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
- :setting:`INSTALLED_APPS`.
- ``SearchVector``
- ================
- .. class:: SearchVector(*expressions, config=None, weight=None)
- Searching against a single field is great but rather limiting. The ``Entry``
- instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
- To query against both fields, use a ``SearchVector``::
- >>> from django.contrib.postgres.search import SearchVector
- >>> Entry.objects.annotate(
- ... search=SearchVector('body_text', 'blog__tagline'),
- ... ).filter(search='Cheese')
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- The arguments to ``SearchVector`` can be any
- :class:`~django.db.models.Expression` or the name of a field. Multiple
- arguments will be concatenated together using a space so that the search
- document includes them all.
- ``SearchVector`` objects can be combined together, allowing you to reuse them.
- For example::
- >>> Entry.objects.annotate(
- ... search=SearchVector('body_text') + SearchVector('blog__tagline'),
- ... ).filter(search='Cheese')
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
- See :ref:`postgresql-fts-search-configuration` and
- :ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
- and ``weight`` parameters.
- ``SearchQuery``
- ===============
- .. class:: SearchQuery(value, config=None, search_type='plain')
- ``SearchQuery`` translates the terms the user provides into a search query
- object that the database compares to a search vector. By default, all the words
- the user provides are passed through the stemming algorithms, and then it
- looks for matches for all of the resulting terms.
- If ``search_type`` is ``'plain'``, which is the default, the terms are treated
- as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
- as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
- formatted search query with terms and operators. Read PostgreSQL's `Full Text
- Search docs`_ to learn about differences and syntax. Examples:
- .. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
- >>> from django.contrib.postgres.search import SearchQuery
- >>> SearchQuery('red tomato') # two keywords
- >>> SearchQuery('tomato red') # same results as above
- >>> SearchQuery('red tomato', search_type='phrase') # a phrase
- >>> SearchQuery('tomato red', search_type='phrase') # a different phrase
- >>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw') # boolean operators
- ``SearchQuery`` terms can be combined logically to provide more flexibility::
- >>> from django.contrib.postgres.search import SearchQuery
- >>> SearchQuery('meat') & SearchQuery('cheese') # AND
- >>> SearchQuery('meat') | SearchQuery('cheese') # OR
- >>> ~SearchQuery('meat') # NOT
- See :ref:`postgresql-fts-search-configuration` for an explanation of the
- ``config`` parameter.
- .. versionadded:: 2.2
- The `search_type` parameter was added.
- ``SearchRank``
- ==============
- .. class:: SearchRank(vector, query, weights=None)
- So far, we've just returned the results for which any match between the vector
- and the query are possible. It's likely you may wish to order the results by
- some sort of relevancy. PostgreSQL provides a ranking function which takes into
- account how often the query terms appear in the document, how close together
- the terms are in the document, and how important the part of the document is
- where they occur. The better the match, the higher the value of the rank. To
- order by relevancy::
- >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
- >>> vector = SearchVector('body_text')
- >>> query = SearchQuery('cheese')
- >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
- See :ref:`postgresql-fts-weighting-queries` for an explanation of the
- ``weights`` parameter.
- .. _postgresql-fts-search-configuration:
- Changing the search configuration
- =================================
- You can specify the ``config`` attribute to a :class:`SearchVector` and
- :class:`SearchQuery` to use a different search configuration. This allows using
- different language parsers and dictionaries as defined by the database::
- >>> from django.contrib.postgres.search import SearchQuery, SearchVector
- >>> Entry.objects.annotate(
- ... search=SearchVector('body_text', config='french'),
- ... ).filter(search=SearchQuery('œuf', config='french'))
- [<Entry: Pain perdu>]
- The value of ``config`` could also be stored in another column::
- >>> from django.db.models import F
- >>> Entry.objects.annotate(
- ... search=SearchVector('body_text', config=F('blog__language')),
- ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
- [<Entry: Pain perdu>]
- .. _postgresql-fts-weighting-queries:
- Weighting queries
- =================
- Every field may not have the same relevance in a query, so you can set weights
- of various vectors before you combine them::
- >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
- >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
- >>> query = SearchQuery('cheese')
- >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')
- The weight should be one of the following letters: D, C, B, A. By default,
- these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
- respectively. If you wish to weight them differently, pass a list of four
- floats to :class:`SearchRank` as ``weights`` in the same order above::
- >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
- >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
- Performance
- ===========
- Special database configuration isn't necessary to use any of these functions,
- however, if you're searching more than a few hundred records, you're likely to
- run into performance problems. Full text search is a more intensive process
- than comparing the size of an integer, for example.
- In the event that all the fields you're querying on are contained within one
- particular model, you can create a functional index which matches the search
- vector you wish to use. The PostgreSQL documentation has details on
- `creating indexes for full text search
- <https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.
- ``SearchVectorField``
- ---------------------
- .. class:: SearchVectorField
- If this approach becomes too slow, you can add a ``SearchVectorField`` to your
- model. You'll need to keep it populated with triggers, for example, as
- described in the `PostgreSQL documentation`_. You can then query the field as
- if it were an annotated ``SearchVector``::
- >>> Entry.objects.update(search_vector=SearchVector('body_text'))
- >>> Entry.objects.filter(search_vector='cheese')
- [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
- .. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
- Trigram similarity
- ==================
- Another approach to searching is trigram similarity. A trigram is a group of
- three consecutive characters. In addition to the :lookup:`trigram_similar`
- lookup, you can use a couple of other expressions.
- To use them, you need to activate the `pg_trgm extension
- <https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
- install it using the
- :class:`~django.contrib.postgres.operations.TrigramExtension` migration
- operation.
- ``TrigramSimilarity``
- ---------------------
- .. class:: TrigramSimilarity(expression, string, **extra)
- Accepts a field name or expression, and a string or expression. Returns the
- trigram similarity between the two arguments.
- Usage example::
- >>> from django.contrib.postgres.search import TrigramSimilarity
- >>> Author.objects.create(name='Katy Stevens')
- >>> Author.objects.create(name='Stephen Keats')
- >>> test = 'Katie Stephens'
- >>> Author.objects.annotate(
- ... similarity=TrigramSimilarity('name', test),
- ... ).filter(similarity__gt=0.3).order_by('-similarity')
- [<Author: Katy Stevens>, <Author: Stephen Keats>]
- ``TrigramDistance``
- -------------------
- .. class:: TrigramDistance(expression, string, **extra)
- Accepts a field name or expression, and a string or expression. Returns the
- trigram distance between the two arguments.
- Usage example::
- >>> from django.contrib.postgres.search import TrigramDistance
- >>> Author.objects.create(name='Katy Stevens')
- >>> Author.objects.create(name='Stephen Keats')
- >>> test = 'Katie Stephens'
- >>> Author.objects.annotate(
- ... distance=TrigramDistance('name', test),
- ... ).filter(distance__lte=0.7).order_by('distance')
- [<Author: Katy Stevens>, <Author: Stephen Keats>]
|