Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch Nested Aggregations #2789

Open
shivam1892 opened this issue Sep 6, 2018 · 11 comments
Open

Elasticsearch Nested Aggregations #2789

shivam1892 opened this issue Sep 6, 2018 · 11 comments

Comments

@shivam1892
Copy link

shivam1892 commented Sep 6, 2018

Issue Summary

I am working on creating the table from elasticsearch with nested aggregation query.
Below is my example Query

{
  "index": "data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "agg_abc": {
          "terms": {
            "field": "abc.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

The above Nested aggregated query does not give all the columns in the table as the response of elasticsearch. Also “agg_abc” have more than 5 categories in it but in table it shows only for 2 categories which is also not complete.
According to me the nested aggregated query is not being parsed properly as some of the columns are missing.
But when I hit the above query to elastic search It gives the proper response with all categories.

Technical details:

  • Redash version 4.0.1
  • Ubuntu 16.04
  • Installed with docker setup guide.
@arikfr
Copy link
Member

arikfr commented Sep 7, 2018

Please share an example of the results you do get for this query in Redash and the results you get when querying directly. Thanks.

@shivam1892
Copy link
Author

shivam1892 commented Sep 10, 2018

Hi Arik,

This is my actual query

  "index":"data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "vendor": {
          "terms": {
            "field": "vendorId.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "totalPrice"
              }
            }
          }
        }
      }
    }
  }
}
This is response from Elasticsearch 
{
	"date_aggr": {
		"buckets": [{
				"key_as_string": "2018-01-01T00:00:00.000Z",
				"key": 1514764800000,
				"doc_count": 2540,
				"vendor": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": [{
							"key": "RIL",
							"doc_count": 594,
							"price": {
								"value": 415973.95056152344
							}
						},
						{
							"key": "IOCL",
							"doc_count": 482,
							"price": {
								"value": 332374.34034729004
							}
						},
						{
							"key": "HPCL",
							"doc_count": 448,
							"price": {
								"value": 310616.87955093384
							}
						},
						{
							"key": "BPCL",
							"doc_count": 449,
							"price": {
								"value": 307841.93978500366
							}
						},
						{
							"key": "EssarOil",
							"doc_count": 338,
							"price": {
								"value": 220755.62995529175
							}
						},
						{
							"key": "ONGC",
							"doc_count": 229,
							"price": {
								"value": 156433.39011383057
							}
						}
					]
				}
			},
			{
				"key_as_string": "2018-01-02T00:00:00.000Z",
				"key": 1514851200000,
				"doc_count": 2514,
				"vendor": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": [{
							"key": "RIL",
							"doc_count": 581,
							"price": {
								"value": 394284.0503387451
							}
						},
						{
							"key": "HPCL",
							"doc_count": 450,
							"price": {
								"value": 333518.30029296875
							}
						},
						{
							"key": "IOCL",
							"doc_count": 475,
							"price": {
								"value": 329492.58993911743
							}
						},
						{
							"key": "BPCL",
							"doc_count": 458,
							"price": {
								"value": 320690.6898841858
							}
						},
						{
							"key": "EssarOil",
							"doc_count": 331,
							"price": {
								"value": 221389.1397857666
							}
						},
						{
							"key": "ONGC",
							"doc_count": 219,
							"price": {
								"value": 144096.33992004395
							}
						}
					]
				}
			},
			{
				"key_as_string": "2018-01-03T00:00:00.000Z",
				"key": 1514937600000,
				"doc_count": 2469,
				"vendor": {
					"doc_count_error_upper_bound": 0,
					"sum_other_doc_count": 0,
					"buckets": [{
							"key": "RIL",
							"doc_count": 545,
							"price": {
								"value": 374848.3391456604
							}
						},
						{
							"key": "IOCL",
							"doc_count": 482,
							"price": {
								"value": 356451.5997390747
							}
						},
						{
							"key": "BPCL",
							"doc_count": 447,
							"price": {
								"value": 319834.75061416626
							}
						},
						{
							"key": "HPCL",
							"doc_count": 444,
							"price": {
								"value": 312194.3507118225
							}
						},
						{
							"key": "EssarOil",
							"doc_count": 336,
							"price": {
								"value": 252321.69033050537
							}
						},
						{
							"key": "ONGC",
							"doc_count": 215,
							"price": {
								"value": 152464.62057495117
							}
						}
					]
				}
			}```

And below is the csv format data created in Redash

price,doc_count,vendor,date_aggr
156433.39011383057,2540,ONGC,2018-01-01T00:00:00.000Z
144096.33992004395,2514,ONGC,2018-01-02T00:00:00.000Z
152464.62057495117,2469,ONGC,2018-01-03T00:00:00.000Z
173854.6601638794,2437,ONGC,2018-01-04T00:00:00.000Z
159169.18005752563,2452,ONGC,2018-01-05T00:00:00.000Z
152034.30032348633,2514,ONGC,2018-01-06T00:00:00.000Z
168911.539894104,2489,ONGC,2018-01-07T00:00:00.000Z
152085.08991241455,2492,ONGC,2018-01-08T00:00:00.000Z
165859.48080062866,2517,ONGC,2018-01-09T00:00:00.000Z
164046.0099182129,2459,ONGC,2018-01-10T00:00:00.000Z
153649.18016052246,2438,ONGC,2018-01-11T00:00:00.000Z
177316.99977111816,2533,ONGC,2018-01-12T00:00:00.000Z
167918.51028442383,2512,ONGC,2018-01-13T00:00:00.000Z
140893.3701019287,2453,ONGC,2018-01-14T00:00:00.000Z
157128.23043823242,2455,ONGC,2018-01-15T00:00:00.000Z
153589.3794708252,2497,ONGC,2018-01-16T00:00:00.000Z
167240.75048065186,2392,ONGC,2018-01-17T00:00:00.000Z
152549.56973266602,2446,ONGC,2018-01-18T00:00:00.000Z
145871.3002319336,2494,ONGC,2018-01-19T00:00:00.000Z
159190.40007019043,2522,ONGC,2018-01-20T00:00:00.000Z
165743.09028625488,2391,ONGC,2018-01-21T00:00:00.000Z
172964.93984222412,2511,ONGC,2018-01-22T00:00:00.000Z
150591.2501487732,2468,ONGC,2018-01-23T00:00:00.000Z
155488.8802909851,2565,ONGC,2018-01-24T00:00:00.000Z
159831.6100616455,2513,ONGC,2018-01-25T00:00:00.000Z
145047.91046905518,2455,ONGC,2018-01-26T00:00:00.000Z

As we can see in elasticsearch response we have 6 keys [ongc, ril, bpcl, hpcl...]
But in redash column for only ONGC is created corresponding to Sum of price. 

Can you please let me know where it is getting wrong?

@moghira10
Copy link

@arikfr Any update on this?

@ivanli1990
Copy link

ivanli1990 commented May 16, 2019

@arikfr I met this issue too.
My elasticsearch query:

{
    "index": "master_bug-20190513",
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "state": "已关闭"
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "moduleId": {
            "terms": {
                "field": "moduleId.keyword"
            },
            "aggs": {
                "labels": {
                    "terms": {
                        "field": "labels.keyword",
                        "size": 100
                    },
                    "aggs": {
                        "onesId_num": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

The result from elasticsearch:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "moduleId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "数据",
          "doc_count": 11,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "数据自身问题",
                "doc_count": 10,
                "onesId_num": {
                  "value": 10
                }
              },
              {
                "key": "未填写原因",
                "doc_count": 1,
                "onesId_num": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "多码后端",
          "doc_count": 8,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "未填写原因",
                "doc_count": 8,
                "onesId_num": {
                  "value": 8
                }
              }
            ]
          }
        }
      ]
    }
  }
}

See above result, there has 3 rows in result. But in the redash table, there just 2 rows:
image
The row which moduleId is “数据” and labels is “数据自身问题” does not display in the result table.

@lkm
Copy link

lkm commented May 16, 2019

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

@ivanli1990
Copy link

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

It works fine. Thanks!!!

@ivanli1990
Copy link

@lkm I met another issue may be caused by your pr. When I query es, redash displays 'unicode' object has no attribute 'iteritems'. The full response from redash api:

{"job": {"status": 4, "error": "'unicode' object has no attribute 'iteritems'", "id": "5ffb80b7-d139-494c-b147-4d21b628a325", "query_result_id": null, "updated_at": 0}}

And, the query:

{
    "index": "master_bug-*",
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ],
            "must_not": []
        }
    },
    "size": 0,
    "aggs": {
        "3": {
            "terms": {
                "field": "moduleId.keyword",
                "size": 10,
                "order": {
                    "1": "desc"
                }
            },
            "aggs": {
                "1": {
                    "cardinality": {
                        "field": "onesId.keyword"
                    }
                },
                "4": {
                    "filters": {
                        "filters": {
                            "state:已关闭": {
                                "query_string": {
                                    "query": "state:已关闭",
                                    "analyze_wildcard": true
                                }
                            }
                        }
                    },
                    "aggs": {
                        "1": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        },
                        "5": {
                            "terms": {
                                "field": "labels.keyword",
                                "size": 100,
                                "order": {
                                    "1": "desc"
                                }
                            },
                            "aggs": {
                                "1": {
                                    "cardinality": {
                                        "field": "onesId.keyword"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

@lkm
Copy link

lkm commented May 17, 2019

@ivanli1990 Would you be able to provide a sample result from that query?

@ivanli1990
Copy link

@ivanli1990 Would you be able to provide a sample result from that query?

hi @lkm , the resut from query in elasticsearch as below:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 64,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 28
          },
          "4": {
            "buckets": {
              "state:已关闭": {
                "1": {
                  "value": 26
                },
                "5": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "1": {
                        "value": 26
                      },
                      "key": "未填写原因",
                      "doc_count": 26
                    }
                  ]
                },
                "doc_count": 26
              }
            }
          },
          "key": "多码后端",
          "doc_count": 64
        }
      ]
    }
  }
}

@truebit
Copy link

truebit commented May 29, 2019

Acutally the elasticsearch tabify meant to do the same work.
And I found the python implementation of tabify.

I think we could port it to redash:)

@lkm
Copy link

lkm commented May 31, 2019

Seems good, would probably be an easy job. Not the most maintained project ever (the python implementation). Would you use it as requirement or copy in the code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants