Mapping a multifield
Often, a field must be processed with several core types or in different ways. For example, a string field must be processed as analyzed for search and as not_analyzed for sorting. To do this, we need to define a multifield.
Multifield is a very powerful feature of mapping, because it allows the use of the same field in different ways.
Getting ready
You need a working ElasticSearch cluster.
How to do it...
To define a multifield, we need to do the following:
- Use multi_field as the type.
- Define a dictionary, called fields, containing the subfields. The subfield with the same name as the parent field is the default one.
If we consider the item of our order example, we can index the name as multi_field as shown in the following code:
"name": {
    "type": "multi_field",
    "fields": {
        "name": { "type": "string", "index": "not_analyzed" },
        "tk": { "type": "string", "index": "analyzed" },
        "code": { "type": "string", "index": "analyzed", "analyzer": "code_analyzer" }
    }
},
If we already have a mapping stored in ElasticSearch and we want to upgrade the field to a multifield, it's enough to save a new mapping with the new type; ElasticSearch merges it automatically.
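The upgrade described above can be sketched in Python as a small helper that wraps an existing core-type mapping as the default subfield of a multi_field. This is only an illustration: the helper name to_multi_field and its arguments are hypothetical, and it only builds the mapping body; sending it to the cluster is left out.

```python
import json


def to_multi_field(field_name, existing, extra_subfields):
    """Wrap an existing core-type mapping as the default subfield of a multi_field.

    Hypothetical helper: it only builds the mapping dictionary.
    """
    # The default subfield carries the same name as the parent field.
    fields = {field_name: dict(existing)}
    for sub_name, sub_mapping in extra_subfields.items():
        if sub_name == field_name:
            raise ValueError("subfield name clashes with the default subfield")
        fields[sub_name] = dict(sub_mapping)
    return {field_name: {"type": "multi_field", "fields": fields}}


# The mapping the field had before the upgrade (taken from the example above).
old_name_mapping = {"type": "string", "index": "not_analyzed"}

new_mapping = to_multi_field(
    "name",
    old_name_mapping,
    {
        "tk": {"type": "string", "index": "analyzed"},
        "code": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "code_analyzer",
        },
    },
)

print(json.dumps(new_mapping, indent=2))
```

The printed JSON has the same shape as the name mapping shown earlier, with the old not_analyzed definition preserved as the default subfield.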
How it works...
During indexing, when ElasticSearch processes a field of type multi_field, it reprocesses the same field for every subfield defined in the mapping.
To access the subfields of multi_field, we have a new path built on the base field plus the subfield name. If we consider the preceding example, we have:
- name: This points to the default multifield subfield (the not_analyzed one)
- name.tk: This points to the standard analyzed (tokenized) field
- name.code: This points to a field analyzed with a code extractor analyzer
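The path rule in the list above can be sketched in a few lines of Python. The function subfield_paths is a hypothetical helper written for this illustration, not part of any ElasticSearch client; it only derives the access paths from a mapping dictionary.

```python
def subfield_paths(field_name, field_mapping):
    """Return the paths under which a field's subfields can be accessed.

    Hypothetical helper illustrating the naming rule only.
    """
    if field_mapping.get("type") != "multi_field":
        # A plain core-type field is reachable only by its own name.
        return [field_name]
    paths = []
    for sub_name in field_mapping.get("fields", {}):
        if sub_name == field_name:
            # The default subfield is reached through the base field name.
            paths.append(field_name)
        else:
            # Every other subfield is reached as <base field>.<subfield>.
            paths.append(field_name + "." + sub_name)
    return paths


name_mapping = {
    "type": "multi_field",
    "fields": {
        "name": {"type": "string", "index": "not_analyzed"},
        "tk": {"type": "string", "index": "analyzed"},
        "code": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "code_analyzer",
        },
    },
}

paths = subfield_paths("name", name_mapping)
print(paths)
```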
Note that in the preceding example, we changed the analyzer of the code subfield to a code extractor analyzer, which allows extraction of the item code from a string.
Using the multifield, if we index a string such as "Good item to buy - ABC1234", we'll have:
- name = "Good item to buy - ABC1234" (useful for sorting)
- name.tk = ["good", "item", "to", "buy", "abc1234"] (useful for searching)
- name.code = ["ABC1234"] (useful for searching and faceting)
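To make the three results above concrete, here is a rough Python simulation of what each subfield ends up storing. It does not call ElasticSearch: standard_like only approximates a standard tokenizer plus lowercasing, and code_like assumes, purely for illustration, that an item code is three uppercase letters followed by four digits. The real code_analyzer is not defined in this recipe.

```python
import re


def not_analyzed(text):
    # A not_analyzed field keeps the whole string as a single term.
    return [text]


def standard_like(text):
    # Rough approximation: split on non-alphanumeric characters, lowercase.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def code_like(text):
    # Assumption for this sketch: a code is 3 uppercase letters + 4 digits.
    return re.findall(r"\b[A-Z]{3}\d{4}\b", text)


text = "Good item to buy - ABC1234"

terms = {
    "name": not_analyzed(text),
    "name.tk": standard_like(text),
    "name.code": code_like(text),
}

for path, values in terms.items():
    print(path, "=", values)
```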
There's more...
A multifield is very useful in data processing, because it allows you to define several ways to process a field's data.
For example, if we are working with document content, we can define subfields whose analyzers extract names, places, dates/times, geolocations, and so on. The subfields of a multifield are standard core type fields; we can perform every operation we want on them, such as searching, filtering, faceting, and scripting.
See also
- Mapping different analyzers