Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support elasticsearch-sql plugin for SQL query #2582

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 25 additions & 7 deletions redash/query_runner/elasticsearch.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
import logging
import re
import sys
import urllib

import requests
import simplejson as json
from simplejson import JSONDecodeError
from requests.auth import HTTPBasicAuth

from redash.query_runner import *
Expand Down Expand Up @@ -43,6 +45,13 @@
}


def get_index_from_sql(sql):
matches = re.search(r"from ([\w\d]+)", sql, flags=re.IGNORECASE)
Copy link

@jean343 jean343 Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not match an index of the type metadata_0_2018-08-29_18:40:21, I would add ([\w\d\-:]+)

if not matches:
return None
return matches.groups()[0]


class BaseElasticSearch(BaseQueryRunner):
DEBUG_ENABLED = False

Expand Down Expand Up @@ -402,25 +411,34 @@ def name(cls):
def run_query(self, query, user):
try:
error = None
index_name = None
result_fields = None

logger.debug(query)
query_dict = json.loads(query)

index_name = query_dict.pop("index", "")
result_fields = query_dict.pop("result_fields", None)

if not self.server_url:
error = "Missing configuration key 'server'"
return None, error

url = "{0}/{1}/_search".format(self.server_url, index_name)
mapping_url = "{0}/{1}/_mapping".format(self.server_url, index_name)
try:
query_dict = json.loads(query)
index_name = query_dict.pop("index", "")
result_fields = query_dict.pop("result_fields", None)
url = "{0}/{1}/_search".format(self.server_url, index_name)
params = {"source": json.dumps(query_dict), "source_content_type": "application/json"}
except JSONDecodeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relaying on JSONDecodeError to detect SQL queries, means that users who just have some issue with their JSON will get an unexplained error when the fallback behavior is triggered.

How about we create a dedicate query runner for ES SQL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. My code may change normal behavior of the query runner. Thanks @arikfr

If I understand correctly, you prefer creating new subclass for ES SQL right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. "ES with SQL plugin" and "Standard ES SQL" (version >=6.3) interfaces are different. For example:

ES with SQL plugin

GET /_sql?sql=SELECT * FROM table LIMIT 5

Standard ES SQL

POST /_xpack/sql?format=txt
{
    "query": "SELECT * FROM table LIMIT 5"
}

If we implement query runner for ES SQL plugin separately (separate into the new data source), it will be the problem because user may use both normal JSON query and SQL.

Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, you prefer creating new subclass for ES SQL right?

Yes.

As for the second question, is there a simple way to check what version of ES we're querying? Or maybe, just add a checkbox configuration option to set whether we're using the plugin or not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I like the concept that we're having the checkbox to select between JSON query and SQL query.

Will take care of this.

sql_query = query.strip() if isinstance(query, str) or isinstance(query, unicode) else ''
if sql_query.lower().startswith("select"):
logger.debug("Query using SQL statement")
index_name = get_index_from_sql(sql_query)
url = "{0}/_sql?sql={1}".format(self.server_url, sql_query)
params = {}

mapping_url = "{0}/{1}/_mapping".format(self.server_url, index_name)
mappings, error = self._get_query_mappings(mapping_url)
if error:
return None, error

params = {"source": json.dumps(query_dict), "source_content_type": "application/json"}
logger.debug("Using URL: %s", url)
logger.debug("Using params : %s", params)
r = requests.get(url, params=params, auth=self.auth)
Expand Down