This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

"Download to wikitable" parallel to "Download to CSV" in sqllab #25037

Closed
fredster33 opened this issue Aug 21, 2023 Discussed in #24455 · 2 comments

Comments

@fredster33

Discussed in #24455

Originally posted by stuartyeates June 20, 2023

"Download to wikitable" parallel to "Download to CSV" in sqllab

Motivation

The Wikimedia Foundation (WMF) currently runs an in-house SQL explorer called Quarry (https://quarry.wmcloud.org/). The plan is to move to Superset; see https://phabricator.wikimedia.org/T169452. A very heavily used feature of Quarry is export as a wikitable (i.e. markup that can be included in a wikimedia wiki page). See, for example, https://quarry.wmcloud.org/query/74483

Since there are many tens of thousands of wikimedia wiki installs around the world, such a feature would likely be of use to others as well as ourselves.

Note: I'm a WMF volunteer, not a WMF staffer. I do not / cannot speak for the WMF.

Proposed Change

Clone everything under /api/v1/sqllab/export/ to /api/v1/sqllab/exportwiki/ (or a similar name) and change the output. Keeping things separate should limit the impact on the existing code. The core of the CSV export is https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html but there is no direct equivalent for a wikitable, so I'll have to write my own, based partly on the logic at https://github.com/pandas-dev/pandas/blob/main/pandas/io/formats/csvs.py

The documentation on the output format is at https://www.mediawiki.org/wiki/Help:Tables
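As a rough illustration of the kind of writer this would need, here is a minimal sketch that emits the basic table markup described at Help:Tables. The names `escape_cell` and `to_wikitable` are hypothetical and not part of Superset or pandas. The escaping is an assumption too: it only covers pipes and newlines, and a real exporter would need to decide how to treat other wikitext such as links and templates.

```python
from typing import Any, Iterable, Sequence


def escape_cell(value: Any) -> str:
    """Render one value as wikitable cell text (hypothetical helper)."""
    if value is None:
        return ""
    text = str(value)
    # A literal pipe would be read as a cell separator, so use the HTML entity.
    text = text.replace("|", "&#124;")
    # Keep each cell on one line; represent embedded newlines as <br />.
    return text.replace("\r\n", "\n").replace("\n", "<br />")


def to_wikitable(columns: Sequence[str], rows: Iterable[Sequence[Any]]) -> str:
    """Build basic wikitable markup from column names and row values."""
    lines = ['{| class="wikitable"', "|-"]
    # Header cells start with "!".
    lines.extend(f"! {escape_cell(c)}" for c in columns)
    for row in rows:
        lines.append("|-")  # row separator
        # The space after "|" keeps values such as "-5" from reading as "|-".
        lines.extend(f"| {escape_cell(v)}" for v in row)
    lines.append("|}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_wikitable(["page", "edits"], [["Main Page", 42], ["A|B", -5]]))
```

With a pandas DataFrame `df`, the same function could presumably be fed with `to_wikitable(list(df.columns), df.itertuples(index=False))`.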

https://quarry.wmcloud.org/ (which has dozens of users working in dozens of natural languages) would be the test server for this. I'll have my own DEV install.

I believe I understand how to do most of this work. I've already had one pull request merged to master (see #24440).

New or Changed Public Interfaces

Addition of /api/v1/sqllab/exportwiki/

Migration Plan and Compatibility

Migration: none.

Compatibility: By using the basic definition at https://www.mediawiki.org/wiki/Help:Tables rather than the richer definition at https://en.wikipedia.org/wiki/Help:Table the plan is to keep the output as portable as possible, both across wikis and going forward in time.

It is likely that binary fields will be dropped (or values swapped with placeholders), because there is no reliable way to represent them in wikitables. UTF-8 will have to be un-escaped (I believe); there are plenty of WMF users querying non-Roman-script wikis to test with.
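One possible way to handle this, sketched under assumptions: try to decode byte values as UTF-8 and fall back to a placeholder when that fails. The function name `cell_from_db_value` and the `"(binary)"` placeholder text are made up for illustration, and whether decoding is the right default for every database driver is an open question.

```python
from typing import Any


def cell_from_db_value(value: Any, placeholder: str = "(binary)") -> Any:
    """Map a raw result value to something a wikitable can show (hypothetical)."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        try:
            # Text stored in binary columns often decodes cleanly as UTF-8.
            return bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            # Genuinely binary data has no reliable wikitable representation.
            return placeholder
    # Non-binary values pass through unchanged.
    return value


if __name__ == "__main__":
    print(cell_from_db_value("caf\u00e9".encode("utf-8")))
    print(cell_from_db_value(b"\xff\xfe\x00"))
```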

@fredster33
Author

@rchard2scout on Jun 21, 2023:

Just FYI, the wikitable formatting code in Quarry is available under an MIT license here. It's not very complicated.

@stuartyeates replying to @rchard2scout on Jun 21, 2023:

You are correct.

However, the context is very different: Quarry has already mapped VARBINARY(M) and other types to UTF-8, whereas Superset hasn't; binary fields aren't handled in that Quarry export as well as they are in Superset; etc.

@rusackas
Member

rusackas commented Mar 8, 2024

Hi @fredster33 - sorry this slipped under the collective radar when you posted it. We're trying to get better about these things.

Are you still interested in contributing to this? I think we would welcome a SIP or a PR to discuss, if so. It does indeed seem useful... we just need to make sure that as we add more and more export formats, they're kept discrete and maintainable - maybe even pluggable!

In any case, normally, I'd close this as stale, and because it's not a bug, but meanwhile I'll move it to a GitHub Discussion as an Ideas thread in case you (or anyone) want to pick it up. Let us know!

@apache apache locked and limited conversation to collaborators Mar 8, 2024
@rusackas rusackas converted this issue into discussion #27445 Mar 8, 2024
