Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] reading excel file from base64 should allow ignoring data uri information #2762

Closed
AbdelrahmanHafez opened this issue Aug 8, 2022 · 4 comments · Fixed by #2763

Comments

@AbdelrahmanHafez
Copy link
Contributor

AbdelrahmanHafez commented Aug 8, 2022

When reading a base64 file, it would be nice if XLSX would ignore the leading data URI parameters.
So it can treat
data:text/csv;base64,TmFtZXMNCkhhZmV6DQpTYW0NCg==
the same way it treats
TmFtZXMNCkhhZmV6DQpTYW0NCg==

const workBook = XLSX.read(sheetInBase64, { type: 'base64' });
// workBook should be the same whether `sheetInBase64` starts with a dataURI scheme or without it.

I would like to create a PR with that fix, but the tests file seem to be automatically generated, and when I run npm test I simply get bits/some_file.js in the log, with no output telling me whether the tests passed or failed.

@SheetJSDev
Copy link
Contributor

https://docs.sheetjs.com/docs/miscellany/contributing#os-specific-setup and https://docs.sheetjs.com/docs/miscellany/testing should cover how to get started. If this does not work, please leave a note and we'll try to redo it.

To run the tests manually, use make test. Under the hood it actually runs mocha -R spec -t 30000. The smaller test can be run with env WTF=1 make test_misc

RFC2397 covers the full set of data URI options. Should the parser handle data that is not base64-encoded?

@AbdelrahmanHafez
Copy link
Contributor Author

AbdelrahmanHafez commented Aug 8, 2022

Thanks. I can see that we have .ts and .js files in the bits directories.
Having both can always bring up the question as to which one we should modify.

Is there a reason to not only have the .ts files and compile the .js ones out of them?

Also, a lot of the code feels to be automatically generated (one letter variables), with unusual styling, which is quite limiting in terms of contribution. Is there a reason behind that?

I can see that there are a lot of compiled files in the git repo which can be quite distracting when developing and makes it difficult to see where the source files are. Perhaps it would help if we can have a single src folder that includes all the source code, and anything else can be .gitignored, and for publishing purposes, the build process can be done on a CI/CD pipeline.

What do you think?

AbdelrahmanHafez added a commit to AbdelrahmanHafez/sheetjs that referenced this issue Aug 8, 2022
@SheetJSDev
Copy link
Contributor

(answering the question posed in the last comment)

This is a very old project. The first commits were in 2012, around the time that TypeScript was publicly released.

One of the primary goals is preserving compatibility with a wide array of JavaScript engines and runtimes. In 2022, PhotoShop and other Adobe products ship with ExtendScript, a runtime based on ES3 that is incompatible with modern JavaScript. In 2022, NetSuite still uses Rhino, a JS engine based on ES3. We still test against IE10 to ensure compatibility.

This is not a goal shared by the rest of the community. Countless projects that worked 6 months ago with modern tooling have broken in subtle ways because of drastic API changes and deep dependencies not properly pinning versions.

With regards to the short variable names, we've more or less had to abandon tooling for Photoshop. There are a number of tracking issues like mishoo/UglifyJS#1144 (comment) and mishoo/UglifyJS#1930 (comment) where other ecosystem developers decided that PhotoShop is not an important target. We were forced to adapt by opting for short variable names.

.

The main scripts are built with a cat step to ensure that we have complete control over the final scripts. Every line in xlsx.extendscript.js maps to a line in the source that we can directly observe and edit.

So when we explored modernizing the codebase, we wanted to incrementally shift over to TypeScript. esbuild presented a pathway to replacing individual files in the bits folder. For base64 the sequence is

  1. run make 04_base64.js from the modules directory. That creates modules/04_base64.js

  2. run make from the base directory. This copies modules/04_base64.js to bits/04_base64.js, inserting the file in the correct spot of the output.

We hope to eventually port everything over, but there is quite a bit of prototype manipulation and other tricks that need to be re-worked.

.

The actual docs site is built from https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/04-testing.md and https://github.com/SheetJS/docs.sheetjs.com/blob/master/docz/docs/09-miscellany/05-contributing.md -- please let us know how to improve those (and you can send a PR)

AbdelrahmanHafez added a commit to AbdelrahmanHafez/sheetjs that referenced this issue Aug 8, 2022
@AbdelrahmanHafez
Copy link
Contributor Author

I understand that having to support older environments can make the development experience a bit challenging.
I'm hoping I'll find some time to fix the challenges I faced while submitting a simple fix, and incrementally put such a nice project in a nicer state.

Thanks a lot for all the hard work. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants