fix(teradata): LIMIT syntax #18240

Merged
merged 42 commits into from
Feb 1, 2022
9a37eb7
temp not working
mccushjack Jun 9, 2021
873fc7b
Merge branch 'apache:master' into master
mccushjack Jun 17, 2021
c54a606
Update sql_parse.py
mccushjack Jun 17, 2021
f3e808f
fix for apply limit for Teradata connections
mccushjack Jun 17, 2021
8a5ca7b
Merge branch 'apache:master' into master
mccushjack Jun 17, 2021
d9e89b2
Merge branch 'apache:master' into master
mccushjack Jul 28, 2021
0f34db8
fixes to teradata driver and limit methods in teradata.py
mccushjack Jul 28, 2021
9e7def5
Merge branch 'apache:master' into master
mccushjack Aug 30, 2021
252987e
remove logging to tdpy
mccushjack Aug 31, 2021
ed72dd6
Merge branch 'master' of github.com:mccushjack/superset
mccushjack Aug 31, 2021
22030d0
rebase for Teradata fixes
mccushjack Nov 1, 2021
bba1b13
rebase for TD fixes
mccushjack Nov 1, 2021
cf92a6e
Merge remote-tracking branch 'upstream/master'
mccushjack Nov 1, 2021
830ca55
Update teradata.py
mccushjack Nov 1, 2021
c4a4898
fixes to td.py table ref4
mccushjack Nov 5, 2021
21a1de5
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
21db8dd
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
7535758
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
3f887e9
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
2aae450
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
f4ef16a
Update tests/unit_tests/db_engine_specs/test_teradata.py
mccushjack Nov 12, 2021
2fcd27c
Update superset/db_engine_specs/teradata.py
mccushjack Nov 12, 2021
13228c5
Update teradata.py
mccushjack Nov 12, 2021
a4fead1
Update teradata.py
mccushjack Nov 12, 2021
2075f83
Update test_teradata.py
mccushjack Nov 21, 2021
0984e03
Update test_teradata.py
mccushjack Nov 23, 2021
c3dc448
Merge branch 'master' into master
nytai Jan 24, 2022
c6b8e9c
commit updates
Jan 26, 2022
67d89ec
Merge remote-tracking branch 'upstream/master'
mccushjack Jan 26, 2022
ea9c3bd
improve insert, make sure it is right after SEL keyword
Jan 26, 2022
89451c4
simplify, improve readability
Jan 26, 2022
0d3c881
fix spelling of keywords variables
Jan 26, 2022
0cd6c1b
specify param types and return
Jan 26, 2022
eaa7373
use set literals
Jan 26, 2022
d38b552
use enumerate as suggested by pylint
Jan 26, 2022
9f16968
shorten line
Jan 26, 2022
76a94e3
Merge branch 'master' into fix_unit
dmcnulla Jan 26, 2022
3e8de76
update setup
mccushjack Jan 26, 2022
0103186
Merge branch 'fix_unit'
mccushjack Jan 26, 2022
0eadf07
Merge branch 'master' into fix_unit
dmcnulla Jan 27, 2022
3f3b55f
fix ch ItemList to item_list
Jan 27, 2022
3216e6a
commit changes to get black to work in precommit
Feb 1, 2022
17 changes: 9 additions & 8 deletions docs/installation.rst
@@ -569,7 +569,7 @@ Here's a list of some of the recommended packages.
+------------------+-------------------------------------------------------------------+-------------------------------------------------+
| SQL Server | ``"apache-superset[mssql]"`` | ``mssql://`` |
+------------------+-------------------------------------------------------------------+-------------------------------------------------+
| Teradata | ``"apache-superset[teradata]"`` | ``teradata://`` |
| Teradata | ``"apache-superset[teradata]"`` | ``teradatasql://`` |
+------------------+-------------------------------------------------------------------+-------------------------------------------------+
| Vertica | ``"apache-superset[vertica]"`` | ``vertica+vertica_python://`` |
+------------------+-------------------------------------------------------------------+-------------------------------------------------+
@@ -753,16 +753,17 @@ Teradata

The connection string for Teradata looks like this ::

teradata://{user}:{password}@{host}
The recommended connector library is
[teradatasql](https://github.com/Teradata/python-driver).
Also, see the latest on [PyPi](https://pypi.org/project/teradatasql/)

*Note*: Its required to have Teradata ODBC drivers installed and environment variables configured for proper work of sqlalchemy dialect. Teradata ODBC Drivers available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux
The connection string for Teradata looks like this:

Required environment variables: ::

export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini
```
teradatasql://{user}:{password}@{host}
```

See `Teradata SQLAlchemy <https://github.com/Teradata/sqlalchemy-teradata>`_.
See `Teradata SQL Native Python Driver <https://github.com/Teradata/python-driver>`_.

Apache Drill
------------
2 changes: 1 addition & 1 deletion docs/src/pages/docs/Connecting to Databases/index.mdx
@@ -56,7 +56,7 @@ A list of some of the recommended packages.
|[Snowflake](/docs/databases/snowflake)|```pip install snowflake-sqlalchemy```|```snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}```|
|SQLite||```sqlite://```|
|[SQL Server](/docs/databases/sql-server)|```pip install pymssql```|```mssql://```|
|[Teradata](/docs/databases/teradata)|```pip install sqlalchemy-teradata```|```teradata://{user}:{password}@{host}```|
|[Teradata](/docs/databases/teradata)|```pip install teradatasql```|```teradatasql://{user}:{password}@{host}```|
|[Vertica](/docs/databases/vertica)|```pip install sqlalchemy-vertica-python```|```vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name>```|

***
15 changes: 2 additions & 13 deletions docs/src/pages/docs/Connecting to Databases/teradata.mdx
@@ -9,21 +9,10 @@ version: 1
## Teradata

The recommended connector library is
[sqlalchemy-teradata](https://github.com/Teradata/sqlalchemy-teradata).
[teradatasql](https://github.com/Teradata/python-driver).

The connection string for Teradata looks like this:

```
teradata://{user}:{password}@{host}
```

Note: Its required to have Teradata ODBC drivers installed and environment variables configured for
proper work of sqlalchemy dialect. Teradata ODBC Drivers available here:
https://downloads.teradata.com/download/connectivity/odbc-driver/linux

Required environment variables:

```
export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini
teradatasql://{user}:{password}@{host}
```
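
For reference, a minimal sketch of filling in the URI template above. The `build_teradata_uri` helper, the sample credentials, and the host name are hypothetical. Percent-encoding the user and password is an assumption based on general URI conventions, not something this PR specifies.

```python
from urllib.parse import quote_plus


def build_teradata_uri(user: str, password: str, host: str) -> str:
    """Fill in the teradatasql://{user}:{password}@{host} template."""
    # Percent-encode the credentials so characters such as "@" or ":"
    # cannot be mistaken for URI delimiters.
    return f"teradatasql://{quote_plus(user)}:{quote_plus(password)}@{host}"


uri = build_teradata_uri("dbc", "p@ss:word", "td.example.com")
print(uri)  # teradatasql://dbc:p%40ss%3Aword@td.example.com
```
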
2 changes: 1 addition & 1 deletion setup.py
@@ -159,7 +159,7 @@ def get_git_sha() -> str:
         "snowflake": [
             "snowflake-sqlalchemy==1.2.4"
         ],  # PINNED! 1.2.5 introduced breaking changes requiring sqlalchemy>=1.4.0
-        "teradata": ["sqlalchemy-teradata==0.9.0.dev0"],
+        "teradata": ["teradatasql>=16.20.0.23"],
         "thumbnails": ["Pillow>=8.3.2, <10.0.0"],
         "vertica": ["sqlalchemy-vertica-python>=0.5.9, < 0.6"],
         "netezza": ["nzalchemy>=11.0.2"],
240 changes: 238 additions & 2 deletions superset/db_engine_specs/teradata.py
@@ -14,13 +14,223 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from typing import Optional, Set

import sqlparse
from sqlparse.sql import (
    Identifier,
    IdentifierList,
    Parenthesis,
    remove_quotes,
    Token,
    TokenList,
)
from sqlparse.tokens import Keyword, Name, Punctuation, String, Whitespace
from sqlparse.utils import imt

from superset.db_engine_specs.base import BaseEngineSpec, LimitMethod
from superset.sql_parse import Table

PRECEDES_TABLE_NAME = {"FROM", "JOIN", "DESCRIBE", "WITH", "LEFT JOIN", "RIGHT JOIN"}
CTE_PREFIX = "CTE__"
JOIN = " JOIN"


def _extract_limit_from_query_td(statement: TokenList) -> Optional[int]:
    td_limit_keywords = {"TOP", "SAMPLE"}
    str_statement = str(statement)
    str_statement = str_statement.replace("\n", " ").replace("\r", "")
    token = str_statement.rstrip().split(" ")
    token = [part for part in token if part]
    limit = None

    for i, _ in enumerate(token):
        if token[i].upper() in td_limit_keywords and len(token) - 1 > i:
            try:
                limit = int(token[i + 1])
            except ValueError:
                limit = None
            break
    return limit
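
The extraction logic above can be sketched as a standalone function that works on a plain string instead of a `sqlparse` statement. This is an illustrative sketch, not the PR's code: `extract_limit` and `LIMIT_KEYWORDS` are hypothetical names, and it uses `str.split()` (which also splits on tabs) rather than replacing newlines and splitting on single spaces.

```python
from typing import Optional

LIMIT_KEYWORDS = {"TOP", "SAMPLE"}  # hypothetical module-level constant


def extract_limit(sql: str) -> Optional[int]:
    """Return the integer following the first TOP/SAMPLE keyword, if any."""
    words = sql.split()  # splits on any run of whitespace
    # Skip the last word: a keyword there has nothing after it to read.
    for i, word in enumerate(words[:-1]):
        if word.upper() in LIMIT_KEYWORDS:
            try:
                return int(words[i + 1])
            except ValueError:
                # e.g. "SAMPLE 0.25" is a fraction, not a row count
                return None
    return None


print(extract_limit("SEL TOP 10 * FROM t"))          # 10
print(extract_limit("select * from t sample 5"))     # 5
print(extract_limit("SELECT * FROM t"))              # None
print(extract_limit("SELECT * FROM t SAMPLE 0.25"))  # None
```
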


class ParsedQueryTeradata:
    def __init__(
        self, sql_statement: str, strip_comments: bool = False, uri_type: str = "None"
    ):

        if strip_comments:
            sql_statement = sqlparse.format(sql_statement, strip_comments=True)

        self.sql: str = sql_statement
        self._tables: Set[Table] = set()
        self._alias_names: Set[str] = set()
        self._limit: Optional[int] = None
        self.uri_type: str = uri_type

        self._parsed = sqlparse.parse(self.stripped())
        for statement in self._parsed:
            self._limit = _extract_limit_from_query_td(statement)

    @property
    def tables(self) -> Set[Table]:
        if not self._tables:
            for statement in self._parsed:
                self._extract_from_token(statement)

            self._tables = {
                table for table in self._tables if str(table) not in self._alias_names
            }
        return self._tables

    def stripped(self) -> str:
        return self.sql.strip(" \t\n;")

    def _extract_from_token(self, token: Token) -> None:
        """
        <Identifier> stores a list of subtokens and <IdentifierList> stores a list
        of subtoken lists.

        Extracts <IdentifierList> and <Identifier> from :param token: and loops
        through all subtokens recursively. When a keyword that precedes a table
        name is found (table_name_preceding_token), the following <IdentifierList>
        or <Identifier> is passed to self._process_tokenlist to populate
        self._tables.

        :param token: instance of Token or child class, e.g. TokenList, to be processed
        """
        if not hasattr(token, "tokens"):
            return

        table_name_preceding_token = False

        for item in token.tokens:
            if item.is_group and (
                not self._is_identifier(item) or isinstance(item.tokens[0], Parenthesis)
            ):
                self._extract_from_token(item)

            if item.ttype in Keyword and (
                item.normalized in PRECEDES_TABLE_NAME or item.normalized.endswith(JOIN)
            ):
                table_name_preceding_token = True
                continue

            if item.ttype in Keyword:
                table_name_preceding_token = False
                continue
            if table_name_preceding_token:
                if isinstance(item, Identifier):
                    self._process_tokenlist(item)
                elif isinstance(item, IdentifierList):
                    for item_list in item.get_identifiers():
                        if isinstance(item_list, TokenList):
                            self._process_tokenlist(item_list)
            elif isinstance(item, IdentifierList):
                if any(
                    not self._is_identifier(item_list) for item_list in item.tokens
                ):
                    self._extract_from_token(item)

    @staticmethod
    def _get_table(tlist: TokenList) -> Optional[Table]:
        """
        Return the table if valid, i.e., conforms to the [[catalog.]schema.]table
        construct.

        :param tlist: The SQL tokens
        :returns: The table if the name conforms
        """

        # Strip the alias if present.
        idx = len(tlist.tokens)

        if tlist.has_alias():
            ws_idx, _ = tlist.token_next_by(t=Whitespace)

            if ws_idx != -1:
                idx = ws_idx

        tokens = tlist.tokens[:idx]

        odd_token_number = len(tokens) in (1, 3, 5)
        qualified_name_parts = all(
            imt(token, t=[Name, String]) for token in tokens[::2]
        )
        dot_separators = all(imt(token, m=(Punctuation, ".")) for token in tokens[1::2])
        if odd_token_number and qualified_name_parts and dot_separators:
            return Table(*[remove_quotes(token.value) for token in tokens[::-2]])

        return None
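
The name check in `_get_table` can be illustrated without `sqlparse` by working on a plain list of strings. This is a simplified sketch under stated assumptions: `split_qualified_name` is a hypothetical helper, and the tokens are taken to be already split into names and `"."` separators with no alias present.

```python
from typing import List, Optional, Tuple


def split_qualified_name(tokens: List[str]) -> Optional[Tuple[str, ...]]:
    """Validate [[catalog.]schema.]table tokens; return parts, table first."""
    # One, two or three names joined by dots give 1, 3 or 5 tokens.
    odd_token_number = len(tokens) in (1, 3, 5)
    # Even positions must hold names, odd positions must hold dots.
    name_parts = all(tok and tok != "." for tok in tokens[::2])
    dot_separators = all(tok == "." for tok in tokens[1::2])
    if odd_token_number and name_parts and dot_separators:
        # A step of -2 walks the names from the last (table) back to the first.
        return tuple(tokens[::-2])
    return None


print(split_qualified_name(["orders"]))                         # ('orders',)
print(split_qualified_name(["db", ".", "orders"]))              # ('orders', 'db')
print(split_qualified_name(["cat", ".", "db", ".", "orders"]))  # ('orders', 'db', 'cat')
print(split_qualified_name(["db", "orders"]))                   # None
```

The reversed stride puts the table name first, which is the order in which `_get_table` unpacks the parts into `Table(...)`.
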

    @staticmethod
    def _is_identifier(token: Token) -> bool:
        return isinstance(token, (IdentifierList, Identifier))

    def _process_tokenlist(self, token_list: TokenList) -> None:
        """
        Add table names to table set

        :param token_list: TokenList to be processed
        """
        # exclude subselects
        if "(" not in str(token_list):
            table = self._get_table(token_list)
            if table and not table.table.startswith(CTE_PREFIX):
                self._tables.add(table)
            return

        # store aliases
        if token_list.has_alias():
            self._alias_names.add(token_list.get_alias())

        # some aliases are not parsed properly
        if token_list.tokens[0].ttype == Name:
            self._alias_names.add(token_list.tokens[0].value)
        self._extract_from_token(token_list)

    def set_or_update_query_limit_td(self, new_limit: int) -> str:
        td_sel_keywords = {"SELECT", "SEL"}
        td_limit_keywords = {"TOP", "SAMPLE"}
        statement = self._parsed[0]

        if not self._limit:
            final_limit = new_limit
        elif new_limit < self._limit:
            final_limit = new_limit
        else:
            final_limit = self._limit

        str_statement = str(statement)
        str_statement = str_statement.replace("\n", " ").replace("\r", "")

        tokens = str_statement.rstrip().split(" ")
        tokens = [token for token in tokens if token]

        if limit_not_in_sql(str_statement, td_limit_keywords):
            selects = [i for i, word in enumerate(tokens) if word in td_sel_keywords]
            first_select = selects[0]
            tokens.insert(first_select + 1, "TOP")
            tokens.insert(first_select + 2, str(final_limit))

        next_is_limit_token = False
        new_tokens = []

        for token in tokens:
            if token.upper() in td_limit_keywords:
                next_is_limit_token = True
            elif next_is_limit_token:
                if token.isdigit():
                    token = str(final_limit)
                    next_is_limit_token = False
            new_tokens.append(token)

        return " ".join(new_tokens)
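
The insert-or-tighten behaviour of `set_or_update_query_limit_td` can be sketched on plain strings as follows. This is an illustrative sketch rather than the PR's implementation: `apply_top` and its `existing` parameter are hypothetical (the class parses the existing limit in `__init__`), and the sketch matches `SELECT`/`SEL` case-insensitively, which is a deliberate departure from the method above.

```python
from typing import List, Optional

SELECT_KEYWORDS = {"SELECT", "SEL"}  # hypothetical constants
LIMIT_KEYWORDS = {"TOP", "SAMPLE"}


def apply_top(sql: str, new_limit: int, existing: Optional[int] = None) -> str:
    """Insert a TOP clause, or lower an existing TOP/SAMPLE row count."""
    # Keep the smaller of the requested limit and the one already in the query.
    final_limit = min(new_limit, existing) if existing else new_limit
    words = sql.strip(" \t\n;").split()

    if not any(w.upper() in LIMIT_KEYWORDS for w in words):
        # No limit yet: put "TOP n" right after the first SELECT/SEL keyword.
        for i, w in enumerate(words):
            if w.upper() in SELECT_KEYWORDS:
                words[i + 1 : i + 1] = ["TOP", str(final_limit)]
                break

    out: List[str] = []
    after_limit_keyword = False
    for w in words:
        if w.upper() in LIMIT_KEYWORDS:
            after_limit_keyword = True
        elif after_limit_keyword:
            # Only rewrite the word directly after the keyword, and only
            # when it is a plain integer.
            if w.isdigit():
                w = str(final_limit)
            after_limit_keyword = False
        out.append(w)
    return " ".join(out)


print(apply_top("SELECT * FROM t;", 100))                     # SELECT TOP 100 * FROM t
print(apply_top("sel top 500 a from t", 100, existing=500))   # sel top 100 a from t
print(apply_top("SELECT TOP 10 a FROM t", 100, existing=10))  # SELECT TOP 10 a FROM t
```
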


 class TeradataEngineSpec(BaseEngineSpec):
     """Dialect for Teradata DB."""

-    engine = "teradata"
+    engine = "teradatasql"
     engine_name = "Teradata"
     limit_method = LimitMethod.WRAP_SQL
     max_column_name_length = 30  # since 14.10 this is 128
@@ -32,7 +242,7 @@ class TeradataEngineSpec(BaseEngineSpec):
         "P1D": "TRUNC(CAST({col} as DATE), 'DDD')",
         "P1W": "TRUNC(CAST({col} as DATE), 'WW')",
         "P1M": "TRUNC(CAST({col} as DATE), 'MONTH')",
-        "P3M": "TRUNC(CAST({col} as DATE), 'Q')",
+        "P0.25Y": "TRUNC(CAST({col} as DATE), 'Q')",
         "P1Y": "TRUNC(CAST({col} as DATE), 'YEAR')",
     }
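
As an illustration of how these templates read once a column is filled in, here is a minimal sketch. The dictionary name `TIME_GRAIN_EXPRESSIONS` and the column name `order_ts` are hypothetical, and the use of `str.format` for the `{col}` placeholder is an assumption, since the substitution code is not part of this diff.

```python
# Hypothetical excerpt of the grain templates, keyed by ISO 8601 duration.
TIME_GRAIN_EXPRESSIONS = {
    "P1D": "TRUNC(CAST({col} as DATE), 'DDD')",
    "P0.25Y": "TRUNC(CAST({col} as DATE), 'Q')",
}

# Substitute a column name for the {col} placeholder.
expr = TIME_GRAIN_EXPRESSIONS["P0.25Y"].format(col="order_ts")
print(expr)  # TRUNC(CAST(order_ts as DATE), 'Q')
```
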

@@ -43,3 +253,29 @@ def epoch_to_dttm(cls) -> str:
            "AT 0)) AT 0) + (({col} MOD 86400) * INTERVAL '00:00:01' "
            "HOUR TO SECOND) AS TIMESTAMP(0))"
        )

    @classmethod
    def apply_limit_to_sql(
        cls, sql: str, limit: int, database: str = "Database", force: bool = False
    ) -> str:
        """
        Alter the SQL statement to apply a TOP clause.

        This method overrides the one in base.py because Teradata does not
        support the LIMIT syntax.

        :param sql: SQL query
        :param limit: Maximum number of rows to be returned by the query
        :param database: Database instance
        :return: SQL query with the row limit applied
        """

        parsed_query = ParsedQueryTeradata(sql)
        sql = parsed_query.set_or_update_query_limit_td(limit)

        return sql


def limit_not_in_sql(sql: str, limit_words: Set[str]) -> bool:
    for limit_word in limit_words:
        if limit_word in sql:
            return False
    return True
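
The helper above does a case-sensitive substring test, so it treats `TOP` inside an identifier such as `STOP_ID` as a limit keyword and does not see a lowercase `top`. The sketch below contrasts that with a whole-word, case-insensitive check. `limit_not_in_sql_words` is a hypothetical variant, not part of this PR, and it would still match keywords inside string literals or comments.

```python
import re
from typing import Set


def limit_not_in_sql_substring(sql: str, limit_words: Set[str]) -> bool:
    """Substring check, equivalent to the helper above."""
    return not any(word in sql for word in limit_words)


def limit_not_in_sql_words(sql: str, limit_words: Set[str]) -> bool:
    """Hypothetical variant: match whole words only, ignoring case."""
    alternatives = "|".join(re.escape(w) for w in sorted(limit_words))
    pattern = rf"\b(?:{alternatives})\b"
    return re.search(pattern, sql, flags=re.IGNORECASE) is None


words = {"TOP", "SAMPLE"}
print(limit_not_in_sql_substring("SELECT STOP_ID FROM routes", words))  # False
print(limit_not_in_sql_words("SELECT STOP_ID FROM routes", words))      # True
print(limit_not_in_sql_substring("select top 5 * from t", words))       # True
print(limit_not_in_sql_words("select top 5 * from t", words))           # False
```
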