Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize core library filename #33

Closed
wsxiaoys opened this issue Oct 4, 2017 · 20 comments
Closed

Sanitize core library filename #33

wsxiaoys opened this issue Oct 4, 2017 · 20 comments

Comments

@wsxiaoys
Copy link
Contributor

wsxiaoys commented Oct 4, 2017

Hi,

I'm working on support gerbil compiling in my implementation of bazel integration of gambit (https://github.com/wsxiaoys/rules_gambit).

One restriction bazel have, is only allows specific set of filenames as sources: link

@fare
Copy link
Collaborator

fare commented Oct 4, 2017

Hi! I once co-wrote bazel support for Common Lisp. What issue are you having? Is the issue that gambit generates files with # in the name? That's a problem indeed. Maybe talk with Marc Feeley on how much this matters?

@fare
Copy link
Collaborator

fare commented Oct 4, 2017

Oh, I see it might be gerbil introducing the # in some file names... @vyzo can you comment?

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

I actually fixed the "#" issue in bazelbuild/bazel#2006 to make bazel work with gambit. (updated OP)

But gerbil have names like core::<MOP>::<MOP:1>__rt.scm.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

The "#" is used by convention in Gambit for macro definitions that get included from other files.

The <...>'s are coming from nested modules that get compiled into separate module files; <MOP> is the name of the nested module, that has another nested module <MOP:1>.

I guess we could sanitize the generated file names to not include invalid shell characters.

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

That should work (we also need regenerate bootstrap files, right?).

Involved characters are: <, > and :.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

It requires some small changes in the compiler, and then a new bootstrap generation -- but that's something that is easy to do nowadays.
We will also have to pick a different nested module name separator (that's where the :: is coming from)

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

# is safe to use as separator it has been released in bazel since 0.5.4.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

Can we make a list of the problematic characters so that I can have the compiler do generic mangling for all of them?

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

We could reuse bazel's sanitize target name definition: [0-9a-zA-Z_@-+,=~# .()$]+.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

so these are the acceptable characters in bazel?

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

For filenames yes.

It's slightly stricter for package name (e.g. , is not allowed), but it's not affecting our case.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

Can we add : to the list as well? it would be nice to not have to come up with a new separator.

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 4, 2017

: is slightly tricker, bazel used it to represent sub-packages.

For example I could run bazel run //tests/ffi:main in http://github.com/wsxiaoys/rules_gambit, where main is a package declared in tests/ffi/BUILD.

@vyzo
Copy link
Collaborator

vyzo commented Oct 4, 2017

ok, I guess we could use $ java-style :)

@fare
Copy link
Collaborator

fare commented Oct 4, 2017

Also, the : character won't work on macOS, a Lisp Machine, or VMS... not that it really matters... :-)

@wsxiaoys
Copy link
Contributor Author

wsxiaoys commented Oct 9, 2017

Regarding of a generic approach to handle special characters in module name, how about simply replace characters other than [a-zA-Z0-9_] with __asciicode__, just like gambit-scheme's?

entry point are:
compile-scm-file
core-resolve-library-module-path

Am I missing any parts?

@vyzo
Copy link
Collaborator

vyzo commented Oct 10, 2017

yes, that would work.

vyzo added a commit that referenced this issue Mar 11, 2018
@vyzo
Copy link
Collaborator

vyzo commented Mar 11, 2018

So the following characters are restricted and replaced with _ by the compiler for compiled module artifacts:

:#<>&!?*;()[]{}|'`"\

Note that this has implications on library module filenames and package names; they are disallowed (but not enforced) there.

Merged in d9ff94e
Implementation in 42af642

@vyzo
Copy link
Collaborator

vyzo commented Mar 11, 2018

@wsxiaoys does this solve your issue?

@wsxiaoys
Copy link
Contributor Author

@vyzo Thanks for the fix! Yes, the built file is now accepted by bazel :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants