ElasticSearch Cookbook

Mapping a document

The document, also referred to as the root object, has special parameters that control its behavior. They are mainly used internally to perform special processing.

In this recipe, we'll look at these special fields and how to use them.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

We can extend the preceding order example, adding some of the special fields. For example:

{
  "order": {
    "_uid": {
      "store": "yes"
    },
    "_id": {
      "path": "order_id"
    },
    "_type": {
      "store": "yes"
    },
    "_source": {
      "store": "yes"
    },
    "_all": {
      "enabled": false
    },
    "_analyzer": {
      "path": "analyzer_field"
    },
    "_boost": {
      "null_value": 1.0
    },
    "_routing": {
      "path": "customer_id",
      "required": true
    },
    "_index": {
      "enabled": true
    },
    "_size": {
      "enabled": true,
      "store": "yes"
    },
    "_timestamp": {
      "enabled": true,
      "store": "yes",
      "path": "date"
    },
    "_ttl": {
      "enabled": true,
      "default": "3y"
    },
    "properties": {
      "order_id": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "date": {
        "type": "date",
        "store": "no",
        "index": "not_analyzed"
      },
      "analyzer_field": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "customer_id": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed"
      },
      "customer_ip": {
        "type": "ip",
        "store": "yes",
        "index": "yes"
      },
      "customer_location": {
        "type": "geo_point",
        "store": "yes"
      },
      "sent": {
        "type": "boolean",
        "store": "no",
        "index": "not_analyzed"
      }
    }
  }
}
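
To make the path parameters in this mapping more concrete, the following minimal Python sketch mimics how a value could be picked from the document source for _id, _analyzer, _routing, and _timestamp. The helper resolve_special_fields, the SPECIAL_FIELD_PATHS table, and all the sample order values are hypothetical and exist only for illustration; ElasticSearch performs this extraction internally at index time.

```python
# Hypothetical helper for illustration; ElasticSearch does this internally.
# Each special field is paired with the source field named by its "path"
# parameter in the mapping above.
SPECIAL_FIELD_PATHS = {
    "_id": "order_id",
    "_analyzer": "analyzer_field",
    "_routing": "customer_id",
    "_timestamp": "date",
}


def resolve_special_fields(source, paths=SPECIAL_FIELD_PATHS):
    """Return the values that the path parameters would pick from a source."""
    return {
        special: source[field]
        for special, field in paths.items()
        if field in source
    }


# Sample order document; every value here is invented.
order = {
    "order_id": "1234",
    "date": "2013-06-07T12:14:54",
    "analyzer_field": "standard",
    "customer_id": "customer1",
    "customer_ip": "10.10.10.10",
    "customer_location": {"lat": 40.12, "lon": -71.34},
    "sent": False,
}

resolved = resolve_special_fields(order)
for special, value in sorted(resolved.items()):
    print(special, "<-", value)
```

The sketch only shows which source field feeds which special field; it does not validate the values or contact a cluster.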

How it works...

Every special field has its own parameters and a special meaning, such as:

  • _uid: This controls the storage of the unique ID, which joins the type and the ID of the document. The _uid value of a document is unique in the whole index.
  • _id (defaults to not indexed and not stored): This allows indexing only the ID part of the document. It can be associated with a path that is used to extract the ID from the source of the document, as shown in the following code:
      "_id" : {
        "path" : "order_id"
      },
  • _type (defaults to indexed and not stored): This allows indexing the type of the document.
  • _index (defaults to enabled=false): This determines whether or not the name of the index the document belongs to should be stored. It can be enabled by setting the enabled parameter to true.
  • _boost (defaults to null_value=1.0): This controls the boost level of the document. It can be overridden in the boost parameter for the field.
  • _size (defaults to enabled=false): This controls whether the size of the source record is stored.
  • _timestamp (defaults to enabled=false): This automatically enables the indexing of the document timestamp. If a path parameter is given, the timestamp is extracted from the source of the document and used. It can be queried as a standard datetime.
  • _ttl (defaults to enabled=false): This sets the expiry time of the document. When a document expires, it is removed from the index. An optional default parameter provides a default value at the type level.
  • _all (defaults to enabled=true): This controls the creation of the _all field (a special field that aggregates all the text of all the document fields). It consumes CPU and storage, so if it is not required, it is better to disable it.
  • _source (defaults to enabled=true): This controls the storage of the document source. Storing the source is very useful, but it adds storage overhead, so if it is not required, it is better to turn it off.
  • _parent: This defines the parent document (refer to the Mapping a child document recipe).
  • _routing: This controls in which shard the document should be stored. It supports additional parameters such as:
    • path: This provides a field to be used for routing (specifically, customer_id in the example)
    • required (true/false): This forces the presence of the routing value, raising an exception if it is not provided
  • _analyzer: This allows defining a document field that contains the name of the analyzer to be used for fields that do not explicitly define an analyzer or an index_analyzer parameter.
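
Two of the fields above, _uid and _routing, can be illustrated with a short Python sketch. It rests on assumptions: the function names make_uid and pick_shard are hypothetical, the "#" separator between type and ID reflects ElasticSearch releases of this era and may differ elsewhere, and CRC32 is only a stand-in for the hash function that ElasticSearch actually uses. The fallback to the document ID when no routing value is given mirrors the default routing behavior.

```python
import zlib


def make_uid(doc_type, doc_id):
    """Join type and ID; the '#' separator is an assumption of this sketch."""
    return "{}#{}".format(doc_type, doc_id)


def pick_shard(doc_id, number_of_shards, routing=None, required=False):
    """Map a document to a shard number in [0, number_of_shards)."""
    if routing is None:
        if required:
            # Mirrors "required": true, which rejects a missing routing value.
            raise ValueError("routing is required but was not provided")
        routing = doc_id  # without explicit routing, the ID is the key
    # CRC32 is a stand-in hash, NOT the function ElasticSearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


print(make_uid("order", "1234"))

# Two orders of the same customer land on the same shard.
first = pick_shard("1234", 5, routing="customer1")
second = pick_shard("9999", 5, routing="customer1")
print(first == second)
```

Because the shard depends only on the routing value, all the orders of one customer end up together, which is why customer_id is used as the routing path in the example.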

The ability to control how a document is indexed and processed is very important and helps resolve issues related to complex data types.

Every special field has parameters to set a particular configuration, and some of their behavior may change between releases of ElasticSearch.

See also

  • Using dynamic templates in document mapping
  • Putting a mapping in an index in Chapter 4, Standard Operations