📜 Download the Shakespeare data, create an index mapping as below and insert the data.
⬇️ https://download.elastic.co/demos/kibana/gettingstarted/shakespeare_6.0.json
(this is also contained in the data folder of this repo)
PUT /shakespeare
{
"mappings": {
"properties": {
"speaker": {"type": "keyword"},
"play_name": {"type": "keyword"},
"line_id": {"type": "integer"},
"speech_number": {"type": "integer"}
}
}
}
$ curl -u "elastic:Password01" -s -H "Content-Type: application/x-ndjson" -XPUT localhost:9200/shakespeare/_bulk --data-binary "@shakespeare_6.0.json"; echo
❓ How many times do the words brothers
, blood
appear in the shakespeare
data?
View Solution (click to reveal)
💡 text_entry
is the field we are querying. match
is the type of query.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-match-query.html
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
The match query is the standard query for performing a full-text search, including options for fuzzy matching.
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": {
"query": "brothers blood"
}
}
}
}
Answer: 761
❓ How many times do the words brothers
and blood
appear in the shakespeare
data?
View Solution (click to reveal)
operator (Optional, string) Boolean logic used to interpret text in the query value. Valid values are:
OR (Default) For example, a query value of capital of Hungary is interpreted as capital OR of OR Hungary. AND For example, a query value of capital of Hungary is interpreted as capital AND of AND Hungary.
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": {
"query": "brothers blood",
"operator": "and"
}
}
}
}
Answer: 4
❓ How many times does the phrase brothers blood
appear in the shakespeare
data?
View Solution (click to reveal)
Like the match query but used for matching exact phrases or word proximity matches.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-match-query-phrase.html
The match_phrase query analyzes the text and creates a phrase query out of the analyzed text.
GET shakespeare/_search
{
"query": {
"match_phrase": {
"text_entry": {
"query": "brothers blood"
}
}
}
}
Answer: 3
❓ Find how many lines there are, where Prince Henry
is mentioned or is a speaker in the shakespeare
data?
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-multi-match-query.html
The multi-field version of the match query.
The multi_match query builds on the match query to allow multi-field queries
GET shakespeare/_search
{
"query": {
"multi_match" : {
"query": "prince henry",
"fields": [ "text_entry", "speaker" ],
"type": "phrase"
}
}
}
Answer: 22
You can use term-level queries to find documents
based on precise values
in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/term-level-queries.html
The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/full-text-queries.html
The match_phrase query analyzes the text and creates a phrase query out of the analyzed text.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-match-query-phrase.html
❓ How many times is Hamlet
mentioned as a play in the shakespeare
index?
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-terms-query.html
Returns documents that contain one or more
exact
terms in a provided field.
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Hamlet"
}
}
}
}
Answer: 4244 times
❓ How many times is Alls well that ends well
mentioned as a play in the shakespeare
index?
View Solution (click to reveal)
GET shakespeare/_search
{
"query": {
"term": {
"play_name": {
"value": "Alls well that ends well"
}
}
}
}
Answer: 3083
❓ How many lines do kings Henry V
and Henry VI
have?
View Solution (click to reveal)
The
terms
query is the same as theterm
query, except you can search for multiple values.
GET shakespeare/_search
{
"query": {
"terms": {
"speaker": ["KING HENRY V", "KING HENRY VI"]
}
}
}
Note: the terms have to match exactly, so have to be in uppercase here as in the index field data.
Answer: 2003
❓ How many lines do all the kings have in the shakespeare
index?
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-wildcard-query.html
Returns documents that contain terms matching a wildcard pattern.
GET shakespeare/_search
{
"query": {
"wildcard": {
"speaker": "KING*"
}
}
}
Answer: 7250
❓ How many times do people called Henry
enter the scene?
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-regexp-query.html
Returns documents that contain terms matching a regular expression.
GET shakespeare/_search
{
"query": {
"regexp": {
"text_entry.keyword": {
"value": "Enter.*HENRY.*",
"flags": "ALL"
}
}
}
}
Answer: 33
❓ How many speeches range in number between 400
and 402
?
View Solution (click to reveal)
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-range-query.html
Returns documents that contain terms within a provided range.
GET shakespeare/_search
{
"query": {
"range": {
"speech_number": {
"gte": 400,
"lte": 402
}
}
}
}
Answer: 10
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/compound-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html
Occur | Description |
---|---|
must | The clause (query) must appear in matching documents and will contribute to the score. |
filter | The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching. |
should | The clause (query) should appear in the matching document. |
💡 It could be viewed that should is more like a logical OR where as must is a logical AND |
|
must_not | The clause (query) must not appear in the matching documents. Clauses are executed in filter context meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned. |
❓ How many times is Desdemona
mentioned in Othello
but not by the leading character Othello
himself?
View Solution (click to reveal)
GET shakespeare/_search
{
"size": 200,
"query": {
"bool": {
"must": [
{
"term": {
"play_name": "Othello"
}
},
{
"wildcard": {
"text_entry.keyword": "*Desdemona*"
}
}
],
"must_not": [
{
"term": {
"speaker": "OTHELLO"
}
}
]
}
}
}
Answer: 26
❓ Filter the previous query results to only show those that also mention the word sweet
View Solution (click to reveal)
GET shakespeare/_search
{
"size": 200,
"query": {
"bool": {
"must": [
{
"term": {
"play_name": "Othello"
}
},
{
"wildcard": {
"text_entry.keyword": "*Desdemona*"
}
}
],
"must_not": [
{
"term": {
"speaker": "OTHELLO"
}
}
],
"filter": {
"term": {
"text_entry": "sweet"
}
}
}
}
}
Answer: 2
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html
The async search API let you asynchronously execute a search request, monitor its progress, and retrieve partial results as they become available.
Executes a search request asynchronously. It accepts the same parameters and request body as the
search API
.Returns an
id
.
POST
to return an id
.
It is better to be aware of the concepts and be able to look this up in the online manuals in the exam.
POST /kibana_sample_data_logs/_async_search?size=0
{
"sort": [
{ "@timestamp": { "order": "asc" } }
],
"aggs": {
"sale_date": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1d"
}
}
}
}
// output
{
"id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
"is_partial" : true,
"is_running" : true,
"start_time_in_millis" : 1583945890986,
"expiration_time_in_millis" : 1584377890986,
"response" : {
"took" : 1122,
"timed_out" : false,
...
💡 Note the long id
string above.
The get async search API retrieves the results of a previously submitted async search request given its
id
.If the Elasticsearch security features are enabled, the access to the results of a specific async search is restricted to the user or API key that submitted it.
GET /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
The get async search status API, without retrieving search results, shows only the status of a previously submitted async search request given its
id
.If the Elasticsearch security features are enabled, the access to the get async search status API is restricted to the monitoring_user role.
GET /_async_search/status/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
You can use the delete async search API to manually delete an async search by
id
.If the search is still running, the search request will be cancelled. Otherwise, the saved search results are deleted.
DELETE /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
We will be using the kibana_sample_data_ecommerce
index data for these examples.
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html
Show only the aggs results, and not all of the matches: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html#return-only-agg-results
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics.html
- Avg Aggregation
- Max Aggregation
- Min Aggregation
- Sum Aggregation
- Stats Aggregation
- Extended Stats Aggregation
- Cardinality Aggregation
- Geo Bounds Aggregation
- Geo Centroid Aggregation
- Median Absolute Deviation Aggregation
- Percentiles Aggregation
- Percentile Ranks Aggregation
- Scripted Metric Aggregation
- Top Hits Aggregation
- Value Count Aggregation
- Weighted Avg Aggregation
❓ Pull the number of sales, Max, Min, Average and total sales for the American customers.
View Solution (click to reveal)
Note: We use
"size": 0
to only return the aggs results and not all the actual matching documents.
Note: We still need to use the
filter_path
to drill down to our aggs results.
GET kibana_sample_data_ecommerce/_search?filter_path=aggregations
{
"size": 0,
"query": {
"match": {
"geoip.country_iso_code": "US"
}
},
"aggs": {
"cart_stats": {
"extended_stats": {
"field": "taxful_total_price"
}
}
}
}
\\ output
{
"aggregations" : {
"cart_stats" : {
"count" : 1206,
"min" : 6.98828125,
"max" : 344.0,
"avg" : 76.16962064676616,
"sum" : 91860.5625,
"sum_of_squares" : 9216415.566009521,
"variance" : 1840.3245174012998,
"std_deviation" : 42.89900368774664,
"std_deviation_bounds" : {
"upper" : 161.96762802225945,
"lower" : -9.628386728727122
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket.html
- Adjacency Matrix Aggregation
- Auto-interval Date Histogram Aggregation
- Children Aggregation
- Composite Aggregation
- Date Histogram Aggregation
- Date Range Aggregation
- Diversified Sampler Aggregation
- Filter Aggregation
- Filters Aggregation
- Geo Distance Aggregation
- GeoHash grid Aggregation
- GeoTile Grid Aggregation
- Global Aggregation
- Histogram Aggregation
- IP Range Aggregation
- Missing Aggregation
- Nested Aggregation
- Parent Aggregation
- Range Aggregation
- Reverse nested Aggregation
- Sampler Aggregation
- Significant Terms Aggregation
- Significant Text Aggregation
- Terms Aggregation
❓ Display number of sales per day
View Solution (click to reveal)
💡 calendar_interval
comes out wrong in the kibana auto-complete. "interval": "month"
should be "calendar_interval": "month"
.
GET kibana_sample_data_ecommerce/_search?filter_path=aggregations
{
"size":0,
"aggs": {
"date_hist": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "1d",
"min_doc_count": 1
}
}
}
}
\\ output
{
"aggregations" : {
"date_hist" : {
"buckets" : [
{
"key_as_string" : "2021-03-25T00:00:00.000Z",
"key" : 1616630400000,
"doc_count" : 146
},
{
"key_as_string" : "2021-03-26T00:00:00.000Z",
"key" : 1616716800000,
"doc_count" : 153
},
{
"key_as_string" : "2021-03-27T00:00:00.000Z",
"key" : 1616803200000,
"doc_count" : 143
},
...
❓ Display all sales per day broken down by sales category
View Solution (click to reveal)
- The
date_histogram
is the Y-axis- Then you need to group by category
- then Sum the sales (X-axis)
Basially here, you create each aggregation separately and then combine in the end.
Create the date_histogram
as in the previous example.
GET kibana_sample_data_ecommerce/_search?filter_path=aggregations
{
"size":0,
"aggs": {
"date_hist_y_axis": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "1d",
"min_doc_count": 1
}
}
}
}
Aggregate by the category
grouping - here you are testing your sub-aggregation.
You should also note the number of categories, so you don't size
the aggregation too small later on.
GET kibana_sample_data_ecommerce/_search?filter_path=aggregations
{
"size": 0,
"aggs": {
"cat_keyword": {
"terms": {
"field": "category.keyword"
}
}
}
}
// output
{
"aggregations" : {
"cat_keyword" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Men's Clothing",
"doc_count" : 2024
},
{
"key" : "Women's Clothing",
"doc_count" : 1903
},
{
"key" : "Women's Shoes",
"doc_count" : 1136
},
{
"key" : "Men's Shoes",
"doc_count" : 944
},
{
"key" : "Women's Accessories",
"doc_count" : 830
},
{
"key" : "Men's Accessories",
"doc_count" : 572
}
]
}
}
}
Put it all together
GET kibana_sample_data_ecommerce/_search?filter_path=aggregations
{
"size":0,
"aggs": {
"date_hist_y_axis": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "1d",
"min_doc_count": 1
},
"aggs": {
"group_by_category": {
"terms": {
"field": "category.keyword",
"size": 8
},
"aggs": {
"total_sales_price": {
"sum": {
"field": "taxful_total_price"
}
}
}
}
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
#TODO: Need a couple of clusters, though the theory is easy
Note: just split the remote clusters with commas
Then append the index to each with a colon
local_index,remote1:index1,remote2:index2
View Solution (click to reveal)
GET local_index,remote_cluster1:that_remote_index,remote_cluster2:that_other_remote_index
{
"query" : {
"match_all" : {}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-search-request.html