Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utf8 column name support for query result data source #2145

Conversation

tonyjiangh
Copy link
Contributor

Problem

When the column name contains non ASCII character, using query result as data source.

The original query result

image

Error when executing query from the query result

image

Use case

For original data source such as Google spread sheet, it is quite propable that sheet column was in non-ASCII language.

Implementation

Change non-ASCII column name into "column_{}".format(column_number)
image

Though this doesn't seem to be a fundamental solution, a better one might bring more influence to other data source.

@tonyjiangh tonyjiangh force-pushed the feature/query_result_utf8_support branch from 1549476 to c788cd5 Compare December 10, 2017 01:36
@arikfr
Copy link
Member

arikfr commented Dec 11, 2017

I'm not thrilled about renaming the columns... did you see which line throws this exception?

@tonyjiangh
Copy link
Contributor Author

tonyjiangh commented Dec 11, 2017

Well, I do understand the feeling.

As for the source of exception.
First, replace doesn't handle non-ASCII characters without the magic comment.

def fix_column_name(name):
    return name.replace(':', '_').replace('.', '_').replace(' ', '_')

And then there's create table in redash/query_runner/query_results.py. sqlite3 does not allow non-ASCII characters as column names.

I guess other solutions might be:

  • Add feature to allow user to define field name for each field at each query, for query result DS.
  • Add feature to define the pool for query result DS.

I personally prefer the 2nd one, cause that seems to be a way to replace query_xxx table name to a more explicit one.

Any ideas?

wireframe for the alternative:
image
image

@arikfr
Copy link
Member

arikfr commented Dec 13, 2017

I prefer this one:

Add feature to allow user to define field name for each field at each query, for query result DS.

And we actually already support that -> in most cases, the user can add aliases to the columns in their original query.

What we should have is a clear error message that says: "Only ASCII characters are supported for column names. Please rename column ... to a supported name.".

I personally prefer the 2nd one, cause that seems to be a way to replace query_xxx table name to a more explicit one.

While the second option has its merits, I prefer to avoid it as it adds another step. What we should do to address the query_xxx issue is to either allow the user to use the query name as the table name or just implement auto complete that will translate query names to query_xxx.

WDYT?

@tonyjiangh
Copy link
Contributor Author

That sounds right. Thanks for the idea.
It's true that we can use the new table setting for this.

So the current plan would be:

  • Use column alias at table setting for query result
  • Show more explicit error message.

As for the table name issue, I'll leave it out of the scope of this PR.

@tonyjiangh
Copy link
Contributor Author

On second thought, it's just better to create a new PR.

@tonyjiangh tonyjiangh closed this Dec 13, 2017
@tonyjiangh tonyjiangh deleted the feature/query_result_utf8_support branch December 13, 2017 23:17
@arikfr
Copy link
Member

arikfr commented Dec 14, 2017

Sorry for the confusion, but I didn't mean the new alias feature of the table visualization... I meant that the user can change it in the query itself, i.e.:

SELECT column AS this_is_the_alias
FROM ...

@tonyjiangh
Copy link
Contributor Author

Ok, now I get what you mean.

But unfortunately, our problem is exactly with data source that does not support this, such as Google Spread Sheet(GSS).
In fact, most of the non-ASCII column names are from GSS. Because we need to import data from sheets operated by local non-engineer members.

I do see a localization card in re:dash's Trello.
If that does mean language localization, I guess we will bump into these unicode problems sooner or later.

So I guess for this one, maybe these would be the options:

  1. use column alias for the primary table visualization as field name
    • CON: force the column alias to be ASCII-only
  2. use column alias for the table visualization named 'query_result' as field name
    • CON: have to create new table visualization, which is one more step plus some confusion.
  3. the query_result source pool idea shown above
    • CON: more steps and labour
  4. do something about GSS query_runner (no idea now)
  5. leave it as it is for now.

My preference would be 3 > 2 > 1.
WDYT?

@arikfr
Copy link
Member

arikfr commented Dec 14, 2017

How about adding another sheet to your spreadsheet that references data from the main one, but with different column names?

Also #4 might be an option, where we introduce a way to map column names in the query. 🤔

@tonyjiangh
Copy link
Contributor Author

OK, I get the direction now.
So the current plan is

  • More explicit error message
  • Try to workaround this on spreadsheet

↑next PR

↓research

  • In the meanwhile, try to enhance gss data source querying.

@arikfr
Copy link
Member

arikfr commented Dec 14, 2017

👍

Re. GSS querying:

I was thinking of adding the capabilities of the query results data source into the GSS one, so you could do:

SELECT ...
FROM [spreadsheet id]

But this doesn't help with aliases... Maybe when referencing the table you could setup the aliases?

@tonyjiangh
Copy link
Contributor Author

👍
Will try to dig into it.
I think there're a lot to enhance for the GSS data source including this issue.
Such as using A1notation to set the area of reference, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants