Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to extract columns name along with table name from complex query having subquery #680

Open
cehsonu100 opened this issue Sep 27, 2022 · 3 comments

Comments

@cehsonu100
Copy link

How to extract columns name from complex query having subquery like below:

SELECT "Gender" AS "Gender", "Ethnicity" AS "Ethnicity", count(*) AS "count" FROM (Select distinct avg(t3."Age") As "Age", t1."Ethnicity" As "Ethnicity", t2."Gender" As "Gender" from (SELECT ethnicityid, genderid, age As "Age" FROM schema1.t3 FINAL where (__USER_TABLE_SECURITY_t3_USER_TABLE_SECURITY__) ) as t3 inner join (SELECT name As "Ethnicity", ethnicityid FROM schema1.t1 FINAL where (t1.code = 'AP' and __USER_TABLE_SECURITY_t1_USER_TABLE_SECURITY__) ) as t1 on t3.ethnicityid = t1.ethnicityid inner join (SELECT name As "Gender", genderid FROM schema1.t2 FINAL where (__USER_TABLE_SECURITY_t2_USER_TABLE_SECURITY__) ) as t2 on t3.genderid = t2.genderid where (1 = 1) group by "Ethnicity", "Gender") AS "virtual_table" GROUP BY "Gender", "Ethnicity" ORDER BY "count" DESC LIMIT 1000;

@ChrisJaunes
Copy link

ChrisJaunes commented Sep 28, 2022

You can use "splparse.parse" to parse this complex query, then you can iterate over tokens and get cloumns from "select token" and "from token". Note that it is handled according to different cloumn types.

@ChrisJaunes
Copy link

@ChrisJaunes
Copy link

If you need to extract the outermost columns, some modifications are required

for example:

if __name__ == "__main__":
    sql = """SELECT "Gender" AS "Gender", "Ethnicity" AS "Ethnicity", count(*) AS "count" FROM (Select distinct avg(t3."Age") As "Age", t1."Ethnicity" As "Ethnicity", t2."Gender" As "Gender" from (SELECT ethnicityid, genderid, age As "Age" FROM schema1.t3 FINAL where (__USER_TABLE_SECURITY_t3_USER_TABLE_SECURITY__) ) as t3 inner join (SELECT name As "Ethnicity", ethnicityid FROM schema1.t1 FINAL where (t1.code = 'AP' and __USER_TABLE_SECURITY_t1_USER_TABLE_SECURITY__) ) as t1 on t3.ethnicityid = t1.ethnicityid inner join (SELECT name As "Gender", genderid FROM schema1.t2 FINAL where (__USER_TABLE_SECURITY_t2_USER_TABLE_SECURITY__) ) as t2 on t3.genderid = t2.genderid where (1 = 1) group by "Ethnicity", "Gender") AS "virtual_table" GROUP BY "Gender", "Ethnicity" ORDER BY "count" DESC LIMIT 1000;"""
    import sqlparse
    statement = sqlparse.parse(sql)[0]
    raw_columns = []
    USE_SELECT, USE_FROM = False, False
    for token in statement:
        if token.ttype == sqlparse.tokens.Keyword and token.value == "FROM":
            USE_FROM = True
            break
        if USE_SELECT and not (token.is_whitespace or token.match(sqlparse.tokens.Punctuation, ',')):
            raw_columns.append(token)
        if token.ttype == sqlparse.tokens.DML and token.value == "SELECT":
            USE_SELECT = True

    def extract_table_identifiers(token_stream):
        for item in token_stream:
            if isinstance(item, sqlparse.sql.IdentifierList):
                for identifier in item.get_identifiers():
                    yield identifier.get_name()
            elif isinstance(item, sqlparse.sql.Identifier):
                yield item.get_name()
            elif item.ttype is sqlparse.tokens.Keyword:
                yield item.value

    print(list(extract_table_identifiers(raw_columns)))

output

['Gender', 'Ethnicity', 'count']

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants