Export to CSV with cached data is converted to Panda Dataframe without column formats #20919

Closed
3 tasks done
mbcsa opened this issue Jul 29, 2022 · 1 comment
Labels
#bug Bug report

Comments

mbcsa commented Jul 29, 2022

When exporting to CSV, cached data is converted to a pandas DataFrame without column formats.
This breaks the CSV_EXPORT option 'decimal' (the decimal separator).

The first time the data is exported, everything works and the decimal separator is respected.

But when the query is executed again within the Cache Timeout, the data is fetched from "results_backend" and a DataFrame is created dynamically. As a result, the DataFrame has no column format specifications.

It worked correctly until commit e1fd906 (#20760) was merged.

In file superset/core.py, the line "dtype=object," sets the "object" type for every column of the DataFrame.

def csv(...)
    ...
    df = pd.DataFrame(
        data=obj["data"],
        dtype=object,
        columns=[c["name"] for c in obj["columns"]],
    )
    ...

With the line "dtype=object," removed, the CSV is exported correctly:

def csv(...)
    ...
    df = pd.DataFrame(
        data=obj["data"],
        columns=[c["name"] for c in obj["columns"]],
    )
    ...
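The difference between the two variants can be reproduced outside Superset. The following is a minimal sketch, not the Superset code: the `payload` dict and the `payload_to_csv` helper are hypothetical stand-ins for the cached `obj`, and the only assumption is that the export ends in a `DataFrame.to_csv` call that receives the CSV_EXPORT options ('encoding' is left out for brevity).

```python
import pandas as pd

# Hypothetical stand-in for the cached payload ("obj" in the snippets above).
payload = {
    "columns": [{"name": "name"}, {"name": "price"}],
    "data": [{"name": "a", "price": 1.5}, {"name": "b", "price": 2.25}],
}

# Same separators as in the CSV_EXPORT example of this report.
csv_options = {"sep": ";", "decimal": ","}


def payload_to_csv(obj, force_object):
    """Build a DataFrame from the payload and serialize it to CSV text."""
    extra = {"dtype": object} if force_object else {}
    df = pd.DataFrame(
        data=obj["data"],
        columns=[c["name"] for c in obj["columns"]],
        **extra,
    )
    return df.to_csv(index=False, **csv_options)


# Every column is "object": floats are written via str(), so "." is kept.
with_object = payload_to_csv(payload, force_object=True)

# dtypes are inferred: "price" is float64 and "decimal" is applied.
with_inferred = payload_to_csv(payload, force_object=False)

print(with_object)
print(with_inferred)
```

pandas applies `decimal` only to columns with a float dtype, which would explain why forcing `object` on every column makes the option appear to be ignored.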

How to reproduce the bug

  1. In superset_config.py, configure the CSV_EXPORT options:
CSV_EXPORT = {
    'encoding': 'utf-8',
    'sep': ';',
    'decimal': ',',
}
  2. Go to /superset/sqllab/
  3. Create a NEW SQL query that has a Decimal / Float / Real column.
    For example:
    image
  4. Export the results to CSV
  5. Note that the exported CSV file is correctly formed, with the configured decimal separator ","
  6. Execute the SAME SQL again by pressing the Run button
  7. Export the results to CSV AGAIN
  8. Note that the exported CSV file uses a point "." as the decimal separator instead of ","

Expected results

Export to CSV should use the configured decimal separator, whether or not the data comes from the cache.

Actual results

OK - Export to CSV uses the configured decimal separator when the data comes directly from the DB, without caching.
FAIL - Export to CSV does NOT use the configured decimal separator when the data comes from the "results_backend" CACHE.
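If "dtype=object" has to stay for other reasons (this report does not say why it was introduced), one possible way to get the expected result is to let pandas re-infer the column types just before writing. This is a hedged sketch, not the fix that was merged: `cached_payload_to_csv` is a hypothetical helper, and `DataFrame.infer_objects()` is swapped in here as one candidate technique.

```python
import pandas as pd


def cached_payload_to_csv(obj, csv_options):
    """Hypothetical helper: cached payload -> CSV text with numeric dtypes restored."""
    df = pd.DataFrame(
        data=obj["data"],
        dtype=object,  # kept as in the current code
        columns=[c["name"] for c in obj["columns"]],
    )
    # Soft conversion: object columns holding only floats become float64,
    # columns holding strings stay object.
    df = df.infer_objects()
    return df.to_csv(index=False, **csv_options)


# Hypothetical cached payload, shaped like "obj" in the snippets above.
cached = {
    "columns": [{"name": "name"}, {"name": "price"}],
    "data": [{"name": "a", "price": 1.5}, {"name": "b", "price": 2.25}],
}

csv_text = cached_payload_to_csv(cached, {"sep": ";", "decimal": ","})
print(csv_text)
```

Whether this is acceptable depends on why all columns were forced to "object" in the first place; re-inferring types may reintroduce whatever that change was meant to avoid.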

Screenshots

Pandas Dataframe when cached

image

Pandas Dataframe when NOT cached

image

Environment

  • browser type and version: Chrome / Brave / Firefox
  • superset version: Docker Superset 0.0.0dev
  • docker build rusackas Thu Jul 28 17:36:00 UTC 2022
  • any feature flags active:
    "ALERT_REPORTS": True,
    "ENABLE_TEMPLATE_PROCESSING": True

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

rusackas commented Feb 9, 2024

I'm guessing the linked PR should have closed this. If this needs to be reopened, say the word!

rusackas closed this as completed Feb 9, 2024