Elasticsearch

May 16, 2017 (updated: Aug 15, 2017)

1. Terms
   1.1. Mapping
   1.2. Analysis
2. Query

Terms

Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

type	description
`text`	Analyzed into a list of terms. For full-text search; not for sorting or aggregation.
`keyword`	For filtering. Only searchable by exact value.
`date`	"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
`boolean`
`binary`	`Base64`-encoded string. Not searchable.
`long`	numeric
`integer`	numeric
`short`	numeric
`byte`	numeric
`double`	numeric
`float`	numeric
`half_float`	numeric
`scaled_float`	numeric
`integer_range`	range
`long_range`	range
`double_range`	range
`date_range`	range
`ip_range`	range
`geo_point`	Latitude/longitude point; used by bounding box, distance, and polygon queries
`geo_shape`	arbitrary geo shapes such as rectangles and polygons
`ip`	Supports CIDR notation
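
As a rough illustration of how the types above fit together, the sketch below builds the `properties` part of a mapping as a Python dict and prints it as JSON. The field names (`title`, `tag`, `created`, and so on) are made up for this example, and how the `properties` object is nested inside an index-creation request depends on the Elasticsearch version, so treat it as a fragment rather than a complete request.

```python
import json

# Hypothetical field names; only the "type" values come from the table above.
properties = {
    "title":    {"type": "text"},      # analyzed, for full-text search
    "tag":      {"type": "keyword"},   # exact-value filtering
    "created":  {"type": "date",
                 "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
    "active":   {"type": "boolean"},
    "views":    {"type": "long"},
    "client":   {"type": "ip"},
    "location": {"type": "geo_point"},
}

mapping_fragment = {"properties": properties}
print(json.dumps(mapping_fragment, indent=2))
```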

Analysis

The index analysis settings register analyzers, tokenizers, and token filters.

Sample:

index :
  analysis :
    analyzer :
      standard :
        type : standard
        stopwords : [stop1, stop2]
      myAnalyzer1 :
        type : standard
        stopwords : [stop1, stop2, stop3]
        max_token_length : 500
      # configure a custom analyzer which is
      # exactly like the default standard analyzer
      myAnalyzer2 :
        tokenizer : standard
        filter : [standard, lowercase, stop]
    tokenizer :
      myTokenizer1 :
        type : standard
        max_token_length : 900
      myTokenizer2 :
        type : keyword
        buffer_size : 512
    filter :
      myTokenFilter1 :
        type : stop
        stopwords : [stop1, stop2, stop3, stop4]
      myTokenFilter2 :
        type : length
        min : 0
        max : 2000

Analyzer

An analyzer combines exactly one tokenizer with zero or more token filters:

/Tokenizer{1} + TokenFilter*/

Predefined tokenizers, token filters, and character filters can be combined to configure custom analyzers.
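
To make the "one tokenizer, then any number of token filters" shape concrete, here is a toy pipeline in plain Python. It only mimics the structure; the regex tokenizer, the three-word stop list, and the function names are assumptions for illustration and do not reproduce Elasticsearch's built-in components.

```python
import re

STOPWORDS = {"the", "a", "an"}  # tiny illustrative stop list, not Elasticsearch's


def tokenize(text):
    """Tokenizer step: split the text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Token filter: drop tokens that are in the stop list."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, token_filters=(lowercase_filter, stop_filter)):
    """Run exactly one tokenizer, then each token filter in order."""
    tokens = tokenize(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Quick Brown Fox"))
```

The order of the filters matters: running `stop_filter` before `lowercase_filter` would let "The" through, because the stop list is lowercase.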

Built-in analyzers

Standard Analyzer
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Snowball Analyzer
Custom Analyzer

Tokenizer

Token Filter

Standard
ASCII Folding
Length
Lowercase
Uppercase
NGram
Edge NGram
Porter Stem
Shingle
Stop
Word Delimiter
Stemmer
Stemmer Override
Keyword Marker
Keyword Repeat
KStem
Snowball
Phonetic
Synonym
Compound Word
Reverse
Elision
Truncate
Unique
Pattern Capture
Pattern Replace
Trim
Limit Token Count
Hunspell
Common Grams
Normalization
CJK Width
CJK Bigram
Delimited Payload
Keep Words
Keep Types
Classic
Apostrophe
Decimal Digit

Character Filter

Query

Query API

Basic Query String

{endpoint}/_search?q=hello&size=5

Ref:

Full Query API

{endpoint}/_search?source={Query-as-JSON}
curl -XGET {endpoint}/_search -d 'Query-as-JSON'
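
A small sketch of how the two URL forms above could be assembled with the Python standard library. The endpoint `http://localhost:9200/my_index` is an assumption, and whether a given Elasticsearch version accepts the `source` parameter as-is is not checked here; the code only shows the URL encoding.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://localhost:9200/my_index"  # assumed endpoint, adjust as needed


def basic_search_url(endpoint, q, size):
    """Basic query string form: {endpoint}/_search?q=...&size=..."""
    return f"{endpoint}/_search?{urlencode({'q': q, 'size': size})}"


def source_search_url(endpoint, query):
    """Full query form: the JSON query is URL-encoded into the source parameter."""
    return f"{endpoint}/_search?{urlencode({'source': json.dumps(query)})}"


print(basic_search_url(ENDPOINT, "hello", 5))
print(source_search_url(ENDPOINT, {"query": {"match_all": {}}}))
```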

Query Language

{
  size: "number of results to return (defaults to 10)",
  from: "offset into results (defaults to 0)",
  fields: "list of document fields that should be returned - http://elasticsearch.org/guide/reference/api/search/fields.html",
  sort: "define sort order - see http://elasticsearch.org/guide/reference/api/search/sort.html",
  query: {
    
  },
  facets: {
    # facets specifications
    # Facets provide summary information about a particular field or fields in the data
  },
  # special case for situations where you want to apply filter/query to results but *not* to facets
  filter: {
    # filter objects
    # a filter is a simple "filter" (query) on a specific field.
    # Simple means e.g. checking against a specific value or range of values
  }
}

Context

Query Context: How well does this document match this query clause?
Filter Context: Does this document match this query clause?
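
The two contexts can appear side by side in one request. The sketch below, with made-up field names (`title`, `status`), puts one clause in query context and one in filter context inside a `bool` query.

```python
import json

# Hypothetical field names, for illustration only.
search_body = {
    "query": {
        "bool": {
            # Query context: "how well does this document match?" (affects _score)
            "must": [{"match": {"title": "search"}}],
            # Filter context: "does this document match?" (yes/no, no scoring)
            "filter": [{"term": {"status": "published"}}],
        }
    }
}

print(json.dumps(search_body, indent=2))
```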

Query Categories

Leaf query clauses
- match
- term
- range
Compound query clauses
- not
- bool
- dis_max
- constant_score

Query Types

Match All Query

{ "match_all": {} }

{ "match_all": { "boost" : 1.2 }}

Full Text Queries

Match Query

{
  "match" : {
    "field_name" : "the query"
  }
}

{
  "match" : {
    "message" : {
      "query" : "the query",
      "operator" : "and"
    }
  }
}

Supported parameters:

analyzer
boost
operator
minimum_should_match
fuzziness
prefix_length
max_expansions
rewrite
zero_terms_query
cutoff_frequency

boolean (default)

fuzziness
zero_terms_query
cutoff_frequency
- relative: [0..1), absolute: 1.0..infinite
- per-shard-level
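
The relative/absolute distinction for `cutoff_frequency` can be sketched as a small helper. This is a simplification under stated assumptions: the rounding (`ceil`) and the strict `>` comparison are guesses at reasonable behaviour rather than a statement of what Elasticsearch does internally, and `total_docs` stands for the document count of a single shard.

```python
import math


def cutoff_threshold(cutoff_frequency, total_docs):
    """Document-frequency threshold implied by cutoff_frequency."""
    if cutoff_frequency < 0:
        raise ValueError("cutoff_frequency must be non-negative")
    if cutoff_frequency < 1.0:
        # Relative: a fraction of the documents in the shard.
        return math.ceil(cutoff_frequency * total_docs)
    # Absolute: a plain document count.
    return cutoff_frequency


def is_high_frequency(doc_freq, cutoff_frequency, total_docs):
    """Assumed rule: a term is 'high frequency' above the threshold."""
    return doc_freq > cutoff_threshold(cutoff_frequency, total_docs)


print(cutoff_threshold(0.25, 1000))
print(is_high_frequency(900, 0.25, 1000))
```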

phrase

{
  "match_phrase" : {
    "message" : "this is a test"
  }
}

{
  "match" : {
    "message" : {
      "query" : "this is a test",
      "type" : "phrase"
    }
  }
}

phrase_prefix

Same as match_phrase, except that it allows for prefix matches on the last term in the text.
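
The "prefix on the last term" behaviour can be pictured with a toy matcher over already-tokenized text. This is only a sketch of the semantics on whitespace-split tokens; it ignores analysis, slop, and `max_expansions`, and the function name is made up.

```python
def phrase_prefix_match(doc_tokens, query_tokens):
    """True if the query terms appear consecutively in the document,
    with the last query term allowed to match as a prefix."""
    if not query_tokens:
        return False
    n = len(query_tokens)
    for start in range(len(doc_tokens) - n + 1):
        window = doc_tokens[start:start + n]
        exact_part_ok = window[:-1] == query_tokens[:-1]
        if exact_part_ok and window[-1].startswith(query_tokens[-1]):
            return True
    return False


doc = "the quick brown fox".split()
print(phrase_prefix_match(doc, "quick brown f".split()))
print(phrase_prefix_match(doc, "quick fox".split()))
```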

Multi-match Query

Multi-field queries.

{
  "multi_match" : {
    "query":    "this is a test", 
    "fields": [ "subject^3", "message", "*_name" ] 
  }
}

Types:

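best_fields: (default) matches any field and uses the _score of the best-matching field
most_fields: matches any field and combines the _score from each field
cross_fields: treats fields that share the same analyzer as one combined field
phrase: runs a match_phrase query on each field and combines the _score from each field
phrase_prefix: like phrase, but runs a match_phrase_prefix query on each field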
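
A hypothetical helper (the name `multi_match` and its signature are made up here) that assembles the request body, including the `field^boost` notation and the `type` parameter:

```python
import json


def multi_match(query, fields, match_type="best_fields"):
    """Build a multi_match clause.

    fields maps a field name (or pattern) to a boost; None means no boost.
    """
    field_specs = [
        name if boost is None else f"{name}^{boost}"
        for name, boost in fields.items()
    ]
    return {
        "multi_match": {
            "query": query,
            "type": match_type,
            "fields": field_specs,
        }
    }


clause = multi_match(
    "this is a test",
    {"subject": 3, "message": None, "*_name": None},
    match_type="most_fields",
)
print(json.dumps(clause, indent=2))
```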

Common_terms Query

{
  "common": {
    "body": {
      "query": "nelly the elephant as a cartoon",
      "cutoff_frequency": 0.001,
      "minimum_should_match": 2
    }
  }
}

{
  "common": {
    "body": {
      "query": "nelly the elephant not as a cartoon",
      "cutoff_frequency": 0.001,
      "minimum_should_match": {
          "low_freq" : 2,
          "high_freq" : 3
       }
    }
  }
}

Query String Query

Runs a full-text query written in query-string syntax. When default_field is not set, the query searches the _all field.

{
  "query_string" : {
    "default_field" : "content",
    "query" : "this AND that OR thus"
  }
}

{
  "query_string" : {
    "fields" : ["content", "name"],
    "query" : "this AND that"
  }
}

Parameter:

query
default_field (_all)
default_operator (OR)
analyzer
allow_leading_wildcard (true)
lowercase_expanded_terms (wildcard, prefix, fuzzy, and range queries) (true)
enable_position_increments (true)
fuzzy_max_expansions (50)
fuzziness (AUTO)
fuzzy_prefix_length (0)
phrase_slop (0)
boost (1.0)
analyze_wildcard (false)
auto_generate_phrase_queries (false)
max_determinized_states (10000)
minimum_should_match
lenient (false)
locale (ROOT)
time_zone (a Joda time zone ID; see http://www.joda.org/joda-time/apidocs/org/joda/time/DateTimeZone.html)

Simple Query String Query

Never throws an exception; invalid parts of the query are discarded.
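
For comparison with the `query_string` examples above, a request body for `simple_query_string` might look like the following. The field names are borrowed from the earlier examples and the query text is only illustrative.

```python
import json

simple_query = {
    "simple_query_string": {
        # + means AND, | means OR, - negates a term, quotes mark a phrase
        "query": "\"fried eggs\" +(eggplant | potato) -frittata",
        "fields": ["content", "name"],   # assumed field names
        "default_operator": "and",
    }
}

print(json.dumps(simple_query, indent=2))
```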

Term Level Queries

Term Query

Finds documents which contain the exact term Kimchy in the inverted index of the user field.

{
  "term" : { "user" : "Kimchy" } 
}

Optional boost

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "status": {
              "value": "urgent",
              "boost": 2.0 
            }
          }
        },
        {
          "term": {
            "status": "normal" 
          }
        }
      ]
    }
  }
}

Terms Query

{
  "constant_score" : {
    "filter" : {
      "terms" : { "user" : ["kimchy", "elasticsearch"]}
    }
  }
}

Range Query

TermRangeQuery (string)

NumericRangeQuery (number/date)

{
  "range" : {
    "age" : {
      "gte" : 10,
      "lte" : 20,
      "boost" : 2.0
    }
  }
}

Date Math:

{
  "range" : {
    "date" : {
      "gte" : "now-1d/d",
      "lt" :  "now/d"
    }
  }
}
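
Read literally, the range above covers "all of yesterday". The sketch below resolves the two date-math expressions for a fixed `now`, assuming both bounds round down to the start of the day and ignoring time zones; the function name is made up.

```python
from datetime import datetime, timedelta


def yesterday_bounds(now):
    """Resolve gte: now-1d/d and lt: now/d for a given 'now'."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)  # now/d
    start_of_yesterday = start_of_today - timedelta(days=1)                  # now-1d/d
    return start_of_yesterday, start_of_today


now = datetime(2017, 5, 16, 13, 30)
gte, lt = yesterday_bounds(now)
print(gte.isoformat(), lt.isoformat())
```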

Date format & time zone:

{
  "range" : {
    "born" : {
      "gte": "01/01/2012",
      "lte": "2013",
      "format": "dd/MM/yyyy||yyyy",
      "time_zone": "-04:00"
    }
  }
}

Parameters:

gte: Greater-than or equal to
gt: Greater-than
lte: Less-than or equal to
lt: Less-than
boost: Sets the boost value of the query, defaults to 1.0
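
The four comparison parameters can be read as plain comparisons. A minimal sketch, with a made-up helper name, that checks a value against any combination of them:

```python
import operator

# Map each range parameter to the comparison it stands for.
_COMPARISONS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def in_range(value, bounds):
    """True if value satisfies every bound present in bounds."""
    return all(
        compare(value, bounds[name])
        for name, compare in _COMPARISONS.items()
        if name in bounds
    )


print(in_range(10, {"gte": 10, "lte": 20}))
print(in_range(20, {"gte": 10, "lt": 20}))
```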

Exists Query

Returns documents that have at least one non-null value in the original field.

{
  "exists" : { "field" : "user" }
}

Missing Query

"bool": {
  "must_not": {
    "exists": {
      "field": "user"
    }
  }
}

Prefix Query

{
  "prefix" : { "user" : "ki" }
}

Optional boost:

{
  "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
}

Wildcard Query

{
  "wildcard" : { "user" : "ki*y" }
}

Optional boost:

{
  "wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
}

Regexp Query

{
  "regexp" : { "user" : "ki.*y" }
}

Optional boost:

{
  "regexp" : { "user" : { "value" : "ki.*y", "boost" : 2.0 } }
}
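
Prefix, wildcard, and regexp differ mainly in pattern syntax. The sketch below approximates them on a handful of made-up terms using Python's `str.startswith`, `fnmatch`, and `re`. These are stand-ins, not Elasticsearch's matchers: `fnmatch` also understands `[...]`, and Python's regex dialect differs from Lucene's. `re.fullmatch` is used because the pattern is matched against the whole term.

```python
import re
from fnmatch import fnmatchcase

terms = ["kimchy", "kitty", "kiwi", "akimchy"]  # made-up indexed terms

prefix_hits = [t for t in terms if t.startswith("ki")]
wildcard_hits = [t for t in terms if fnmatchcase(t, "ki*y")]
regexp_hits = [t for t in terms if re.fullmatch(r"ki.*y", t)]

print(prefix_hits)
print(wildcard_hits)
print(regexp_hits)
```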

Fuzzy Query

Levenshtein edit distance for string
+/- margin on numeric and date

{
  "fuzzy" : { "user" : "ki" }
}

{
  "fuzzy" : {
    "user" : {
      "value" :         "ki",
      "boost" :         1.0,
      "fuzziness" :     2,
      "prefix_length" : 0,
      "max_expansions": 100
    }
  }
}

{
  "fuzzy" : {
    "price" : {
      "value" : 12,
      "fuzziness" : 2
    }
  }
}

{
  "fuzzy" : {
    "created" : {
      "value" : "2010-02-05T12:05:07",
      "fuzziness" : "1d"
    }
  }
}

Parameters:

fuzziness: The maximum edit distance. Defaults to AUTO.
prefix_length: The number of initial characters which will not be “fuzzified”. This helps to reduce the number of terms which must be examined. Defaults to 0.
max_expansions: The maximum number of terms that the fuzzy query will expand to. Defaults to 50.
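
For strings, `fuzziness` is an edit distance. The following is a textbook Levenshtein implementation to show what is being counted; it is a plain version without transpositions and is not claimed to be the exact variant Elasticsearch uses.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("kimchy", "kimchi"))
print(levenshtein("ki", "kimchy"))
```

With `"fuzziness": 2`, a term such as "kimchi" (distance 1 from "kimchy") would be within range, while "ki" (distance 4) would not.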

Type Query

Matching the provided document / mapping type.

{
  "type" : {
    "value" : "my_type"
  }
}

IDs Query

Returns only the documents whose _uid matches one of the provided IDs.

{
  "ids" : {
    "type" : "my_type",
    "values" : ["1", "4", "100"]
  }
}

Compound Queries

Constant Score query

A query that wraps another query and simply returns a constant score equal to the query boost for every document in the filter. Maps to Lucene ConstantScoreQuery.

{
  "constant_score" : {
    "filter" : {
      "term" : { "user" : "kimchy"}
    },
    "boost" : 1.2
  }
}

Bool Query

Occurrence types:

must: The clause (query) must appear in matching documents and will contribute to the score.
filter: The clause (query) must appear in matching documents. The score of the query will be ignored.
should: The clause (query) should appear in the matching document. In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
must_not: The clause (query) must not appear in the matching documents.

{
  "bool" : {
    "must" : {
      "term" : { "user" : "kimchy" }
    },
    "filter": {
      "term" : { "tag" : "tech" }
    },
    "must_not" : {
      "range" : {
        "age" : { "from" : 10, "to" : 20 }
      }
    },
    "should" : [
      {
        "term" : { "tag" : "wow" }
      },
      {
        "term" : { "tag" : "elasticsearch" }
      }
    ],
    "minimum_should_match" : 1,
    "boost" : 1.0
  }
}
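
The matching side of the four occurrence types can be modelled with predicates. This toy evaluator ignores scoring entirely and assumes that, when `minimum_should_match` is not given, `should` clauses are optional as soon as a `must` or `filter` clause is present. The document and predicates mirror the example above but are otherwise made up.

```python
def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Toy yes/no evaluation of a bool query; each clause is a predicate on doc."""
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Assumed default: at least one should clause only if nothing else is required.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


clauses = dict(
    must=[lambda d: d["user"] == "kimchy"],
    filter_=[lambda d: "tech" in d["tags"]],
    must_not=[lambda d: 10 <= d["age"] <= 20],
    should=[lambda d: "wow" in d["tags"],
            lambda d: "elasticsearch" in d["tags"]],
    minimum_should_match=1,
)

doc = {"user": "kimchy", "tags": ["tech", "wow"], "age": 30}
print(bool_matches(doc, **clauses))
```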

Dis Max Query

Generates the union of documents produced by its subqueries, and scores each document with the maximum score for that document as produced by any subquery, plus a tie-breaking increment for any additional matching subqueries.

{
  "dis_max" : {
    "tie_breaker" : 0.7,
    "boost" : 1.2,
    "queries" : [
      {
        "term" : { "age" : 34 }
      },
      {
        "term" : { "age" : 35 }
      }
    ]
  }
}
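
The scoring rule described for dis_max is a short formula: the best subquery score plus `tie_breaker` times the sum of the other matching subquery scores. A sketch with made-up scores, leaving out `boost`:

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best subquery score plus tie_breaker times the remaining scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie_breaker * others


print(dis_max_score([1.0, 0.5], tie_breaker=0.7))
print(dis_max_score([1.0, 0.5]))
```

With `tie_breaker` at 0 only the best subquery counts; at 1.0 the result is the plain sum of all subquery scores.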

Function Score Query

"function_score": {
  "query": {},
  "boost": "boost for the whole query",
  "FUNCTION": {}, 
  "boost_mode":"(multiply|replace|...)"
}

"function_score": {
  "query": {},
  "boost": "boost for the whole query",
  "functions": [
    {
      "filter": {},
      "FUNCTION": {}, 
      "weight": number
    },
    {
      "FUNCTION": {} 
    },
    {
      "filter": {},
      "weight": number
    }
  ],
  "max_boost": number,
  "score_mode": "(multiply|max|...)",
  "boost_mode": "(multiply|replace|...)",
  "min_score" : number
}

score_mode:

multiply: scores are multiplied (default)
sum: scores are summed
avg: scores are averaged
first: the first function that has a matching filter is applied
max: maximum score is used
min: minimum score is used

boost_mode:

multiply: query score and function score is multiplied (default)
replace: only function score is used, the query score is ignored
sum: query score and function score are added
avg: average
max: max of query score and function score
min: min of query score and function score
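
The two mode settings apply in sequence: `score_mode` first merges the function scores into one value, then `boost_mode` merges that value with the query score. A simplified sketch with made-up numbers; it leaves out weights, filters, `max_boost`, and `min_score`, and treats `avg` as an unweighted mean.

```python
from functools import reduce
from statistics import mean

SCORE_MODES = {
    "multiply": lambda scores: reduce(lambda x, y: x * y, scores, 1.0),
    "sum": sum,
    "avg": mean,
    "first": lambda scores: scores[0],
    "max": max,
    "min": min,
}

BOOST_MODES = {
    "multiply": lambda q, f: q * f,
    "replace": lambda q, f: f,
    "sum": lambda q, f: q + f,
    "avg": lambda q, f: (q + f) / 2,
    "max": max,
    "min": min,
}


def combine(query_score, function_scores, score_mode="multiply", boost_mode="multiply"):
    """Merge function scores, then merge the result with the query score.

    function_scores must be a non-empty list.
    """
    function_score = SCORE_MODES[score_mode](function_scores)
    return BOOST_MODES[boost_mode](query_score, function_score)


print(combine(2.0, [3.0, 4.0]))
print(combine(2.0, [3.0, 4.0], score_mode="sum", boost_mode="replace"))
```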

Boosting Query

{
  "boosting" : {
    "positive" : {
      "term" : {
        "field1" : "value1"
      }
    },
    "negative" : {
      "term" : {
        "field2" : "value2"
      }
    },
    "negative_boost" : 0.2
  }
}
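
The effect of `negative_boost` can be sketched in one line: documents that match the `positive` query keep their score, unless they also match the `negative` query, in which case the score is multiplied by `negative_boost`. The function name and numbers below are made up.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Demote (rather than exclude) documents that match the negative query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


print(boosting_score(2.0, matches_negative=True, negative_boost=0.2))
print(boosting_score(2.0, matches_negative=False, negative_boost=0.2))
```

Unlike `must_not` in a bool query, the document still appears in the results; it is only ranked lower.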

Indices Query

{
  "indices" : {
    "indices" : ["index1", "index2"],
    "query" : {
      "term" : { "tag" : "wow" }
    },
    "no_match_query" : {
      "term" : { "tag" : "kow" }
    }
  }
}

Limit Query

A limit query limits the number of documents (per shard) to execute on.

{
  "bool": {
    "must": {
      "term" : { "name.first" : "shay" }
    },
    "filter" : {
      "limit" : {"value" : 100}
    }
  }
}

Joining Queries

Nested Query

Sample mapping:

{
  "type1" : {
    "properties" : {
      "obj1" : {
        "type" : "nested"
      }
    }
  }
}

Sample nested query:

{
  "nested" : {
    "path" : "obj1",
    "score_mode" : "avg",
    "query" : {
      "bool" : {
        "must" : [
          {
            "match" : {"obj1.name" : "blue"}
          },
          {
            "range" : {"obj1.count" : {"gt" : 5}}
          }
        ]
      }
    }
  }
}

Has Child Query

{
  "has_child" : {
    "type" : "blog_tag",
    "score_mode" : "sum",
    "min_children": 2, 
    "max_children": 10, 
    "query" : {
      "term" : {
        "tag" : "something"
      }
    }
  }
}

Has Parent Query

{
    "has_parent" : {
        "parent_type" : "blog",
        "score_mode" : "score",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}

Geo Queries

GeoShape Query

Requires the geo_shape Mapping.

Given document:

{
  "name": "Wind & Wetter, Berlin, Germany",
  "location": {
    "type": "Point",
    "coordinates": [13.400544, 52.530286]
  }
}

With envelope extension:

{
  "query":{
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "envelope",
              "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}

Geo Bounding Box Query

Given document:

{
  "pin" : {
    "location" : {
      "lat" : 40.12,
      "lon" : -71.34
    }
  }
}

Query:

{
  "bool" : {
    "must" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_bounding_box" : {
        "pin.location" : {
          "top_left" : {
            "lat" : 40.73,
            "lon" : -74.1
          },
          "bottom_right" : {
            "lat" : 40.01,
            "lon" : -71.12
          }
        }
      }
    }
  }
}
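
The bounding box test itself is two interval checks. A minimal sketch using the coordinates from the example above; it assumes the box does not cross the antimeridian (longitude ±180), which would need extra handling.

```python
def in_bounding_box(point, top_left, bottom_right):
    """True if point lies inside the box spanned by the two corners."""
    lat_ok = bottom_right["lat"] <= point["lat"] <= top_left["lat"]
    lon_ok = top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    return lat_ok and lon_ok


pin = {"lat": 40.12, "lon": -71.34}
top_left = {"lat": 40.73, "lon": -74.1}
bottom_right = {"lat": 40.01, "lon": -71.12}

print(in_bounding_box(pin, top_left, bottom_right))
```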

Geo Distance Query

Given document:

{
  "pin" : {
    "location" : {
      "lat" : 40.12,
      "lon" : -71.34
    }
  }
}

Query:

{
  "bool" : {
    "must" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_distance" : {
        "distance" : "200km",
        "pin.location" : {
          "lat" : 40,
          "lon" : -70
        }
      }
    }
  }
}
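
To get a feel for whether the sample document falls inside the 200km circle, the distance can be estimated with the haversine formula. This is a standard great-circle approximation on a spherical Earth (radius assumed to be 6371 km) and is not claimed to be the calculation Elasticsearch performs for any particular `distance_type`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


distance = haversine_km(40, -70, 40.12, -71.34)
print(round(distance, 1), distance <= 200)
```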

Options:

Option	Description
distance	The radius of the circle centred on the specified location. Points which fall into this circle are considered to be matches. The distance can be specified in various units. See the section called "Distance Units".
distance_type	How to compute the distance. Can either be sloppy_arc (default), arc (slightly more precise but significantly slower) or plane (faster, but inaccurate on long distances and close to the poles).
optimize_bbox	Whether to use the optimization of first running a bounding box check before the distance check. Defaults to memory which will do in memory checks. Can also have values of indexed to use indexed value check (make sure the geo_point type index lat lon in this case), or none which disables bounding box optimization.
_name	Optional name field to identify the query
coerce	Set to true to normalize longitude and latitude values to a standard -180:180 / -90:90 coordinate system. (default is false).
ignore_malformed	Set to true to accept geo points with invalid latitude or longitude (default is false).

Geo Distance Range Query

{
  "bool" : {
    "must" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_distance_range" : {
        "from" : "200km",
        "to" : "400km",
        "pin.location" : {
          "lat" : 40,
          "lon" : -70
        }
      }
    }
  }
}

Geo Polygon Query

{
  "bool" : {
    "must" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_polygon" : {
        "person.location" : {
          "points" : [
            {"lat" : 40, "lon" : -70},
            {"lat" : 30, "lon" : -90},
            {"lat" : 20, "lon" : -90},
            {"lat" : 10, "lon" : -60}
          ]
        }
      }
    }
  }
}

Options:

Option	Description
_name	Optional name field to identify the filter
coerce	Set to true to normalize longitude and latitude values to a standard -180:180 / -90:90 coordinate system. (default is false).
ignore_malformed	Set to true to accept geo points with invalid latitude or longitude (default is false).

Geohash Cell Query

The geohash needs to be indexed:

{
  "mappings" : {
    "location": {
      "properties": {
        "pin": {
          "type": "geo_point",
          "geohash": true,
          "geohash_prefix": true,
          "geohash_precision": 10
        }
      }
    }
  }
}

Specialized Queries

More Like This Query

{
  "more_like_this" : {
    "fields" : ["title", "description"],
    "like" : "Once upon a time",
    "min_term_freq" : 1,
    "max_query_terms" : 12
  }
}

Template Query

Based on Mustache.

{
  "query": {
    "template": {
      "inline": { "match": { "text": "{{query_string}}" }},
      "params" : {
        "query_string" : "all about search"
      }
    }
  }
}

Stored template:

{
  "query": {
    "template": {
      "file": "my_template", 
      "params" : {
        "query_string" : "all about search"
      }
    }
  }
}

Or:

PUT /_search/template/my_template
{
  "template": { "match": { "text": "{{query_string}}" }}
}

{
  "query": {
    "template": {
      "id": "my_template", 
      "params" : {
        "query_string" : "all about search"
      }
    }
  }
}

Script Query

"bool" : {
  "must" : {
    ...
  },
  "filter" : {
    "script" : {
      "script" : "doc['num1'].value > 1"
    }
  }
}

Span Query

Span Term Query

{
  "span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
}

Span Multi Term Query

{
  "span_multi":{
    "match":{
      "prefix" : { "user" :  { "value" : "ki", "boost" : 1.08 } }
    }
  }
}

Span First Query

{
  "span_first" : {
    "match" : {
      "span_term" : { "user" : "kimchy" }
    },
    "end" : 3
  }
}

Span Near Query

{
  "span_near" : {
    "clauses" : [
      { "span_term" : { "field" : "value1" } },
      { "span_term" : { "field" : "value2" } },
      { "span_term" : { "field" : "value3" } }
    ],
    "slop" : 12,
    "in_order" : false,
    "collect_payloads" : false
  }
}

Span Or Query

{
  "span_or" : {
    "clauses" : [
      { "span_term" : { "field" : "value1" } },
      { "span_term" : { "field" : "value2" } },
      { "span_term" : { "field" : "value3" } }
    ]
  }
}

Span Not Query

{
  "span_not" : {
    "include" : {
      "span_term" : { "field1" : "hoya" }
    },
    "exclude" : {
      "span_near" : {
        "clauses" : [
          { "span_term" : { "field1" : "la" } },
          { "span_term" : { "field1" : "hoya" } }
        ],
        "slop" : 0,
        "in_order" : true
      }
    }
  }
}

Span Containing Query

{
  "span_containing" : {
    "little" : {
      "span_term" : { "field1" : "foo" }
    },
    "big" : {
      "span_near" : {
        "clauses" : [
          { "span_term" : { "field1" : "bar" } },
          { "span_term" : { "field1" : "baz" } }
        ],
        "slop" : 5,
        "in_order" : true
      }
    }
  }
}

Geo Shape Mapping

Mapping Options:

Option	Description	Default
tree	geohash / quadtree	geohash
precision	in, inch, yd, yard, mi, miles, km, kilometers, m, meters, cm, centimeters, mm, millimeters	meters
tree_levels		50m
strategy	The approach for how to represent shapes at indexing and search time	recursive
distance_error_pct	precise	0.025 (2.5%)
orientation	Optionally define how to interpret vertex order for polygons / multipolygons	ccw
points_only		false

GeoJSON