Public API for compilers #1739

Open
kitsonk opened this issue Feb 11, 2019 · 42 comments
Labels
feat new feature (which has been agreed to/accepted) public API related to "Deno" namespace in JS

Comments

@kitsonk
Contributor

kitsonk commented Feb 11, 2019

For tracking purposes, please don't work on this without discussing with Ry or myself.

Having a public API that is similar to how we perform TypeScript compilation is a good idea. It would allow JS->JS transpilation (e.g. those who need Babel custom plugins or Flow) or other languages (e.g. CoffeeScript).

Related to #1738 and some other work in rationalising the compiler APIs internally, but we should be able to support loading a compiler in a web worker and instructing the privileged side what resources should be sent to that runtime compiler.

@ry ry added this to the v0.4 milestone Feb 19, 2019
@daniele-orlando

Dart and Elm are other candidate languages.

@islishude

I think Dartlang is better than TypeScript.

@kitsonk
Contributor Author

kitsonk commented Mar 1, 2019

@islishude then you are probably looking at the wrong project.

@afinch7
Contributor

afinch7 commented Mar 1, 2019

What you really want is an extensible module loading system (not resolution; as already discussed, that is prone to problems). I have quite a few ideas on this one, and even some existing code. I've been working on a similar idea for a while now, and I think the only good way to design this is with a two-stage loading process:

  1. First "load" (what this means really depends on the media type) modules to JS or TS source (possibly with embedded Wasm), since these are the only formats that the TypeScript compiler understands (other than JSON).
  2. Compile using something similar to the existing ts compiler.

The first step would be the only part that is user extensible.

@kitsonk
Contributor Author

kitsonk commented Mar 2, 2019

The public compiler API would need to:

  1. Allow a user to register a compiler. That registration would include extensions and media types that it should compile. For security and consistency reasons, TypeScript media types and extensions should be disallowed, but JSON and JavaScript could potentially be registered. This would be a special type of web worker. I am not certain if this should be a new op or if we add non-standard customisations to the options of new Worker(url, options) to provide the appropriate information.
  2. When the media type is encountered, Rust would post a message on the worker providing the module specifier and referrer.
  3. The userland compiler would be able to fetch resources using the fetch_module_meta_data op. We would enforce the read file/network security on this (unlike the built in compiler).
  4. The userland compiler would post back via the web worker API to Rust the compiled module code, source map and any diagnostics.
  5. The userland compiler would need to be able to be unregistered. (I guess we would just use Worker.terminate())
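
The five steps above can be sketched as a small message protocol. This is only an illustration of the proposal: every interface and function name below is hypothetical, the list of reserved media types is an assumed example, and the "compiler" is a toy that wraps text in a default export. None of it is an existing Deno API.

```typescript
// Hypothetical message shapes for the proposed protocol (not a Deno API).
interface CompilerRegistration {
  extensions: string[]; // e.g. [".coffee"]
  mediaTypes: string[]; // e.g. ["text/coffeescript"]
}

interface CompileRequest {
  specifier: string; // module specifier to compile
  referrer: string; // URL of the importing module
  mediaType: string; // media type that triggered this compiler
  source: string; // raw source text
}

interface CompileResponse {
  specifier: string;
  code: string; // compiled JavaScript, empty on failure
  sourceMap?: string;
  diagnostics: { message: string }[];
}

// Assumed examples of what step 1 would refuse to register.
const RESERVED_MEDIA_TYPES = ["application/typescript", "text/typescript"];
const RESERVED_EXTENSIONS = [".ts", ".tsx"];

// Step 1: reject registrations that claim TypeScript media types or extensions.
function validateRegistration(reg: CompilerRegistration): string[] {
  const errors: string[] = [];
  for (const mediaType of reg.mediaTypes) {
    if (RESERVED_MEDIA_TYPES.includes(mediaType)) {
      errors.push(`media type not allowed: ${mediaType}`);
    }
  }
  for (const extension of reg.extensions) {
    if (RESERVED_EXTENSIONS.includes(extension)) {
      errors.push(`extension not allowed: ${extension}`);
    }
  }
  return errors;
}

// Steps 2 and 4: the worker receives a request and posts back code plus diagnostics.
function compile(req: CompileRequest): CompileResponse {
  if (req.source.trim() === "") {
    return {
      specifier: req.specifier,
      code: "",
      diagnostics: [{ message: `empty module: ${req.specifier}` }],
    };
  }
  return {
    specifier: req.specifier,
    code: `export default ${JSON.stringify(req.source)};`,
    diagnostics: [],
  };
}

// Inside the worker this would be wired up roughly as:
//   self.onmessage = (e) => self.postMessage(compile(e.data));
```

Step 5 (unregistering) would then amount to calling Worker.terminate() on the worker that hosts compile.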

@afinch7
Contributor

afinch7 commented Mar 3, 2019

You specified separating compilers by media type, but I don't think that extension(media type) is really a reliable way to do this(how would you handle no extension). I guess you could decide on a list of supported media types, but I feel that would be very limiting. You could load modules by manifest and have the manifest tell the loader system what sort of loader to use, but that already sounds way too much like package.json if you ask me. That might be fine if some thought was put into it, but I don't think that's really what the Deno community wants.

In general the more complicated your expectations get the more difficult this will be to implement, and the more bugs we will encounter in the process.
It might be much more simple to decide that user compilers should be trusted code, and expect them to handle the retrieval of resources required to complete their tasks thus the nomenclature loader would be more fitting.
The expectations for said loaders could be as simple as:

  1. Loaders will be given a module's fully qualified url(this could be generated using native url parsing I.E. new URL(specifier, referrer ? referrer : defaultUrl)).
  2. Loaders will also be given a module's referrer information: origin URL, source code, source map, media type, source loader (what loader was used to provide this resource?), etc.
  3. Loaders would be expected to take the information from 1 and 2 and, as accurately as possible, return a TS or JS module that represents that module's URL (or error out if not possible).
  4. Loaders should be given the ability to error out on a request for any reason(and should be encouraged to error out as soon as possible).
  5. Loader priority should be determined by their order in the list of configured loaders, and each loader must be given an attempt and error out before trying the next one.
  6. Loaders should be designed to be platform agnostic, so they can be integrated in tooling like a TypeScript language service plugin. This would most likely be achieved by having the implementation pass the loaders platform-specific implementations of a shared resource accessor API.
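
As a rough illustration of points 1, 3, 4 and 5 in the list above, the sketch below resolves a specifier with new URL and then tries each configured loader in order, moving on only when a loader errors out. The Loader type, the LoadedModule shape and the two example loaders are hypothetical and are not taken from the linked gist.

```typescript
// Hypothetical shapes for this sketch only.
interface LoadedModule {
  url: string;
  mediaType: "js" | "ts";
  source: string;
}

// A loader returns a JS/TS module for the URL, or throws to decline.
type Loader = (url: string, referrer?: string) => LoadedModule;

// Point 1: build the fully qualified URL with native URL parsing.
function resolveUrl(
  specifier: string,
  referrer?: string,
  defaultUrl = "file:///",
): string {
  return new URL(specifier, referrer ? referrer : defaultUrl).href;
}

// Point 5: loaders are tried in list order; each must error out before the next runs.
function loadWithChain(
  loaders: Loader[],
  specifier: string,
  referrer?: string,
): LoadedModule {
  const url = resolveUrl(specifier, referrer);
  const errors: string[] = [];
  for (const loader of loaders) {
    try {
      return loader(url, referrer);
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`No loader could handle ${url}: ${errors.join("; ")}`);
}

// Example loaders: one that always declines, one that handles .txt files.
const decliningLoader: Loader = (url) => {
  throw new Error(`declined ${url}`);
};

const textLoader: Loader = (url) => {
  if (!url.endsWith(".txt")) throw new Error(`not a text file: ${url}`);
  // A real loader would fetch the resource; this one uses a fixed string.
  return { url, mediaType: "js", source: `export default ${JSON.stringify("hello")};` };
};
```

Calling loadWithChain([decliningLoader, textLoader], "./b.txt", "https://example.com/a/main.ts") resolves the URL once and returns the module produced by textLoader, since decliningLoader errors out first.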

The new dynamic import could be used to load these "loaders" as modules with a defined structure.
I tried to describe this as best as possible, but I figured I might be able to better represent my ideas in typescript interfaces and I also included a simple example implementation: https://gist.github.com/afinch7/4356a4377ec20dc336456d4639777578.

This would absolve the implementation of a lot of potentially complicated responsibilities, and even allow loaders to make security decisions about the content they are attempting to load(like a browser would if you tried to make a cross origin request).

A simple approach like this could support just about any use case, and could be universal enough to enable parity between the Deno compiler and a TypeScript language service plugin (seamless, accurate integration in the editor is rare for systems like this). It also wouldn't be limited to JavaScript transpilers or JSON processing; you could very easily compile just about anything into a JavaScript or TypeScript module. FlatBuffers definitions could be compiled to a TypeScript module at runtime, or C++, C, or even Rust source code could be compiled to Wasm and embedded into a JS/TS module at runtime. You could have a nearly completely language-agnostic platform, and give the developer the information to actually use it effectively.

@afinch7
Contributor

afinch7 commented Mar 3, 2019

This might require a preflight check of sorts to check for redirects #1742 or give deno control of fetching the module "entry point".

@kitsonk
Contributor Author

kitsonk commented Mar 5, 2019

You specified separating compilers by media type, but I don't think that extension(media type) is really a reliable way to do this

This is exactly how we do it today. media type != extension.

(how would you handle no extension)

Without a media type, that is insecure. We wouldn't want to allow a file like that to be processed... Files without extensions require a media type.

It might be much more simple to decide that user compilers should be trusted code, and expect them to handle the retrieval of resources required to complete their tasks thus the nomenclature loader would be more fitting.

We don't trust our own compiler. It does not do module resolution. That is up to privileged/Rust, as is appropriate.

There is nothing preventing someone from implementing a loader and eval'ing code today with the right Deno permissions. A public API for a userland compiler needs to follow the pattern of the built in compiler.

@afinch7
Contributor

afinch7 commented Mar 5, 2019

Those are valid concerns. You want something that doesn't require any user setup or config, and my approach would require user configuration to work thus it falls outside of the deno philosophy.

It will always be distributed as a single executable - and that executable will be sufficient software to run any deno program. Given a URL to a deno program, you should be able to execute it with nothing more than the 50 megabyte deno executable.

I think that pretty much settles what direction Deno should go, but I still have my concerns with the idea of an untrusted compiler. I think my main concern here is: how can you in any way trust the code a compiler emits if you don't have full trust in the compiler as an end user?

In general I think we are both on completely different pages right now, so I want to do what is needed to put us all on the same page with this one.

@rdeforest

rdeforest commented Apr 21, 2019

[snip]

I still have my concerns with the idea of an untrusted compiler. I think my main concern here is: how can you in any way trust the code a compiler emits if you don't have full trust in the compiler as an end user?

[snip]

Just a lurker here, but I think what @kitsonk means by trusted/untrusted isn't what you think. You're right that one can't delegate code generation without the risk that the output will do something unwanted. Ken Thompson addressed this famously in his "Reflections on Trusting Trust" presentation in 1984.

The objective of treating the compiler as untrusted is to limit the damage it can do. Isolating the compiler prevents it from (for example) invoking /bin/bash in hopes of exploiting that unrelated program. The isolation is a reduction in attack surface as part of a defense-in-depth strategy.

The reason it's acceptable to risk nefarious or broken compiler output is because there is no architectural way to avoid the risk. The risk has to be addressed at a different layer, such as via module signing, code review, webs of trust, insurance mechanisms, etc.

I hope my comment is helpful.

@oldrich-s

I propose to use service worker api to provide the compiler API:

#2676 (comment)

@kitsonk
Contributor Author

kitsonk commented Jul 23, 2019

Service Workers aren't really suitable for a public compiler API. Service Workers are a specific class of Web Workers anyways. The existing compiler is implemented as a web worker, and a specific class of web worker would also be suitable IMO for the public compiler API, which is laid out above.

@brandonkal
Contributor

TypeScript media types and extensions should be disallowed

Strongly disagree with this. Put it behind a permission flag, but we should have the possibility to use the existing babel ecosystem or other tools to preprocess TypeScript files.

@rsp
Contributor

rsp commented Jan 15, 2020

It could be useful to be able to use a custom compiler for TypeScript as well, like ttypescript or reflec-ts. I would generally avoid it for performance reasons, but it might be useful for some experiments, unless it is a problem with bootstrapping, i.e. a chicken-and-egg problem where the main entry file that defines a custom TS compiler is itself compiled by the built-in TS compiler.
Some things could be done with custom transformers if supported by #2089/#2927/#3442.

@JimLynchCodes

I don't know if I'm in the right place, but I would like to write ClojureScript and directly or indirectly run it through deno! 🙃 🙌

@kitsonk
Contributor Author

kitsonk commented Feb 10, 2020

If there is a JavaScript based Clojure compiler, then that would likely be possible to accomplish with this feature.

@bartlomieju bartlomieju modified the milestones: v1.0, future Feb 24, 2020
@Soremwar
Contributor

@kitsonk I assume this isn't a goal for 1.0 is it?

@lucacasonato
Member

No, as manifested by its "future" milestone.

@ghost

ghost commented Dec 4, 2020

@RDambrosio016 Trust me, I fully understand (I once looked at their 8kb+ block).
Even trying to parse plain JS requires a whole lot of code.
TS is still a relatively dynamic language, so it's not easy to type check.
I was using TSC as an example of a poorly performing "compiler" written in JS.

The key point is I believe that Deno should try to support more than just JS compilers, or at least make it easier to use something that isn't JS.

There are also plenty of other compilers that emit JS that I believe are not written in JS, for example, dart2js and the ClosureScript compilers. (I may be wrong)

@shadowtime2000

I guess this is kind of like require hooks in Node.js. I think another use would be for template engines that compile to JS functions, so you don't have to read the file and compile it; you can just import it.

@auvipy

auvipy commented Sep 11, 2021

How far away are we from this happening?

@kitsonk
Contributor Author

kitsonk commented Sep 11, 2021

Quite a lot.

@mimbrown

mimbrown commented Mar 4, 2022

@kitsonk it looks like the module resolution API for your deno_graph module is pretty much there, no? I've used the module a little bit and it seems to work quite smoothly. Can that be reworked back into deno?

@kitsonk
Contributor Author

kitsonk commented Mar 5, 2022

@mimbrown it is already part of Deno, as it is what is used to do module resolution, but directly as a Rust crate.

We are unlikely to expose it as an internal API, because it is available as a JavaScript/WASM API.

@mimbrown

mimbrown commented Mar 5, 2022

Yes I am aware, sorry my comment was not at all clear. What I meant was, the createGraph function exposed by the deno_graph module has a set of options that allow for user-defined module resolution and loading. You're giving users hooks to override the default behavior. I'll copy the options interface here:

interface CreateGraphOptions {
  /**
   * A callback that is called with the URL string of the resource to be loaded
   * and a flag indicating if the module was required dynamically. The callback
   * should resolve with a `LoadResponse` or `undefined` if the module is not
   * found. If there are other errors encountered, a rejected promise should be
   * returned.
   *
   * @param specifier The URL string of the resource to be loaded and resolved
   * @param isDynamic A flag that indicates if the module was being loaded
   *   dynamically
   */
  load?(
    specifier: string,
    isDynamic: boolean,
  ): Promise<LoadResponse | undefined>;
  /** The type of graph to build. `"all"` includes all dependencies of the
   * roots. `"typesOnly"` skips any code only dependencies that do not impact
   * the types of the graph, and `"codeOnly"` only includes dependencies that
   * are runnable code. */
  kind?: "all" | "typesOnly" | "codeOnly";
  /** When identifying a `@jsxImportSource` pragma, what module name will be
   * appended to the import source. This defaults to `jsx-runtime`. */
  jsxImportSourceModule?: string;
  /** An optional callback that will be called with a URL string of the resource
   * to provide additional meta data about the resource to enrich the module
   * graph. */
  cacheInfo?(specifier: string): CacheInfo;
  /** An optional callback that allows the default resolution logic of the
   * module graph to be "overridden". This is intended to allow items like an
   * import map to be used with the module graph. The callback takes the string
   * of the module specifier from the referrer and the string URL of the
   * referrer. The callback then returns a fully qualified resolved URL string
   * specifier or an object which contains the URL string and the module kind.
   * If just the string is returned, the module kind is inferred to be ESM. */
  resolve?(specifier: string, referrer: string): string | ResolveResult;
  /** An optional callback that can allow custom logic of how type dependencies
   * of a module to be provided. This will be called if a module is being added
   * to the graph that is non-typed source code (e.g. JavaScript/JSX) and
   * allow resolution of a type only dependency for the module (e.g. `@types`
   * or a `.d.ts` file). */
  resolveTypes?(specifier: string): TypesDependency | undefined;
  /** An optional callback that returns `true` if the sub-resource integrity of
   * the provided specifier and content is valid, otherwise `false`. This allows
   * for items like lock files to be applied to the module graph. */
  check?(specifier: string, content: Uint8Array): boolean;
  /** An optional callback that returns the sub-resource integrity checksum for
   * a given set of content. */
  getChecksum?(content: Uint8Array): string;
  /** An optional string to be used when generating an error when the integrity
   * check of the module graph fails. */
  lockFilename?: string;
  /** An optional record of "injected" dependencies to the module graph. This
   * allows adding things like TypeScript's `"types"` values into the graph. */
  imports?: Record<string, string[]>;
}

The load, resolve, and resolveTypes hooks look like they were at least somewhat inspired by @GeoffreyBooth's comment above. You can tell me if I'm wrong. Anyway, this API seems like it would work almost as-is for a custom Deno loader. This is what I'm envisioning:

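/** @file myLoader.ts */

// `LoadResponse` and `ResolveResult` are the types from the deno_graph module.

export async function load(specifier: string, isDynamic: boolean): Promise<LoadResponse | undefined> {
  // custom logic; resolve with `undefined` when the module is not found.
  return undefined;
}

export function resolve(specifier: string, referrer: string): string | ResolveResult {
  // custom logic; this placeholder resolves the specifier against the referrer.
  return new URL(specifier, referrer).href;
}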

Used as:

deno run --loader=./myLoader.ts ./myScript.ts

Deno would then look at the loader file and override the default behavior depending on the functions that are exposed. I think it's already been noted that Workers aren't actually a great solution to this, even though they seem like they would be at first glance. This seems to be the simplest solution I can see, and it's already working fine for the deno_graph module.
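
To make the proposal a little more concrete, here is a minimal sketch of what a load hook in such a loader file might do for a custom file type. Note that --loader is only the flag proposed in this comment, not an existing Deno option. The local LoadResponse interface is a stand-in whose field names are an assumption, and the .tmpl extension, the in-memory templates map and compileTemplate are made up for illustration.

```typescript
// Local stand-in for deno_graph's `LoadResponse`; the field names are an
// assumption made for this sketch.
interface LoadResponse {
  kind: "module";
  specifier: string;
  content: string;
  headers?: Record<string, string>;
}

// Hypothetical in-memory sources, keyed by fully qualified specifier.
const templates = new Map<string, string>([
  ["file:///app/greeting.tmpl", "Hello, world!"],
]);

// Toy "compiler": turn the template text into a module with a default export.
function compileTemplate(specifier: string, text: string): LoadResponse {
  return {
    kind: "module",
    specifier,
    content: `export default ${JSON.stringify(text)};`,
    headers: { "content-type": "application/javascript" },
  };
}

// The hook itself. In a loader file this function would be exported.
// Returning `undefined` signals "not found", as the interface comment describes;
// a real loader would fall back to normal loading for other specifiers.
async function load(
  specifier: string,
  _isDynamic: boolean,
): Promise<LoadResponse | undefined> {
  if (!specifier.endsWith(".tmpl")) return undefined;
  const text = templates.get(specifier);
  return text === undefined ? undefined : compileTemplate(specifier, text);
}
```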

@masx200

masx200 commented Jul 17, 2022

https://nodejs.org/dist/latest-v18.x/docs/api/esm.html#loaders

https://rollupjs.org/guide/en/#plugins-overview

@Leo-Mu

Leo-Mu commented Jul 28, 2022

I think ReScript will be the next big thing. Its syntax is very similar to Rust, and many Rust developers like it, and it is powered by Facebook and moves forward with the React ecosystem.

@ejsmith

ejsmith commented Aug 4, 2022

@ry @kitsonk has there been any movement on this? If I wanted to create a language that compiles to JS and is then run like any other file in Deno, is there any hook I can currently use to do the transpile step on the fly, the way you do with TypeScript files?

@kitsonk
Contributor Author

kitsonk commented Aug 4, 2022

This continues to not be a priority. Even if there was a community contributed solution, it may not get accepted. The issue is still open, because it is a potentially desirable feature, but it is not "just around the corner".

The main reason is that we have really hardened the integration of TypeScript into Deno and in a lot of ways we have federated how it is handled. We don't have a straightforward pipeline anymore of a simple "here is a TypeScript file, give me JavaScript". There are very good reasons for this, mostly to do with performance. In a lot of cases, the TypeScript compiler isn't even spun up and we don't use the emit from it any more, meaning its main purpose is to type check code.

We also now only emit code when it is needed and emitted/transformed code is no longer cached. So back when we had a much clearer way of registering an extension and media type to a transpiler, it was feasible; now it would require implementing user APIs to make it work. There are now multiple paths that code can take through things and multiple points where this would need to be plugged in. deno_graph enforces all the content typing and dependency analysis, eszip is used to determine what goes into a deno compile bundle, we still have deno bundle which uses the swc bundler, there is deno_emit which handles the actual emit, and we still have an internal cache in CLI that we manage. It is a fairly complex process to get a file off a web server or read it locally and get it to the point where it is JavaScript and ready to go into V8 to be executed.

We removed Deno.emit() and moved its functionality to a Wasm library under deno_emit. It still needs some work, but that feels like the more likely way forward for these types of things, exposing the "internals" of Deno to users via user loadable modules.

In theory, it could be done now, in userland, without exposing anything. People can transpile their code to JavaScript and load runtime code either via data URLs or object URLs and then dynamically import() them. Making that whole process easier might be a great community idea and might help make the case that there is broad community interest in using Deno for this type of thing.
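
A minimal sketch of that userland route, assuming a toy source language of name = value lines: transpile to JavaScript, wrap the result in a data URL, and hand it to dynamic import(). The function names and the toy language are invented for illustration, and whether the import needs extra permissions depends on the runtime and version.

```typescript
// Toy transpiler: each "name = value" line becomes an exported string constant.
function transpile(source: string): string {
  return source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line) => {
      const eq = line.indexOf("=");
      if (eq < 0) throw new Error(`expected "name = value", got: ${line}`);
      const name = line.slice(0, eq).trim();
      const value = line.slice(eq + 1).trim();
      return `export const ${name} = ${JSON.stringify(value)};`;
    })
    .join("\n");
}

// Wrap JavaScript source text in a data URL that import() can load.
function toDataUrl(js: string): string {
  return `data:text/javascript;charset=utf-8,${encodeURIComponent(js)}`;
}

// Transpile at runtime and load the result as an ES module.
async function importFromSource(
  source: string,
): Promise<Record<string, unknown>> {
  return await import(toDataUrl(transpile(source)));
}
```

With this in place, `const mod = await importFromSource("greeting = hi")` would expose the transpiled exports as properties of mod. Modules loaded this way cannot use relative imports, because a data URL has no base path to resolve them against.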

@ejsmith

ejsmith commented Aug 5, 2022

@kitsonk thank you for the very thorough response. :-) I'm looking to do something similar to EJS templates being transformed into JS and run. I'd like to avoid having a compilation step. I will take a look into your idea of using dynamic imports.

@masx200

masx200 commented Aug 12, 2022

@matthewp

@kitsonk Understand your position here, just wanted to explain the difficulty of pulling this off in userland. There are really two ways you can do it, and both have bad tradeoffs that you don't really want. It's because a custom file type, like .foo, might be depended on by a .js/.ts file too.

That means it's not enough to just transpile the files that Deno doesn't understand. You have to transpile all of them in order to rewrite JS imports. The two ways I have seen this done are:

  1. Transpile the entire codebase ahead-of-time. This can be slow in large codebases. You might as well just run a bundler, as it's the same thing.
  2. Rewrite all files and bypass the module loader entirely. This is what Vite does. It allows you to load things lazily as you are creating your own module loader. But you have to rewrite all imports into non-imports.

(2) is problematic because it doesn't and can't match ESM semantics entirely. And since you only want to do this in dev mode you are risking dev/prod differences.

tldr; you can't write a module loader in userland without bypassing Deno's module loader and that's not something anyone really wants to do.

@mimbrown

@matthewp Just throwing another option in here, if you're wanting a way to transpile any kind of file to JS on the fly when it gets imported. This is easy on a server where the server code can do the transformation before serving the file. To achieve the same behavior in local imports, we can use an import map to map our local code folder to a server running on localhost, something like:

{
  "imports": {
    "./src/": "http://localhost:8000/"
  }
}

Now, we can have a server sitting between the import whatever from "./some-custom-file.custom-ext" and the yet-to-be-transpiled code on our local machines. I did a POC for myself using the svelte compiler. I don't know the performance implications of using a localhost server, and the caching isn't optimal (which is why I filed #15509), but it works.
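
The core of such a server can be sketched as a function from a request path to a response description. This is not the POC mentioned above: the in-memory sourceFiles map, the .custom-ext handling and the toy transpileCustom step are assumptions made for illustration, and a real version would read from disk and call an actual compiler such as Svelte's.

```typescript
// Plain description of an HTTP response, so the logic stays runtime-agnostic.
interface ModuleResponse {
  status: number;
  contentType: string;
  body: string;
}

// Hypothetical stand-in for the local ./src/ folder.
const sourceFiles = new Map<string, string>([
  ["/some-custom-file.custom-ext", "Hello from a custom file"],
  ["/util.js", "export const answer = 42;"],
]);

// Toy transpile step: expose the file's text as a default export.
function transpileCustom(text: string): string {
  return `export default ${JSON.stringify(text)};`;
}

// Map a request path to the JavaScript that should be served for it.
function respond(
  pathname: string,
  files: Map<string, string>,
): ModuleResponse {
  const text = files.get(pathname);
  if (text === undefined) {
    return { status: 404, contentType: "text/plain", body: `Not found: ${pathname}` };
  }
  const body = pathname.endsWith(".custom-ext") ? transpileCustom(text) : text;
  // Serving a JavaScript content type matters: for remote modules the media
  // type, not the file extension, tells Deno how to treat the file.
  return { status: 200, contentType: "application/javascript", body };
}
```

A server listening on localhost:8000 would call respond with the pathname of each incoming request and copy status, contentType and body into the HTTP response it sends back.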

@reggi

reggi commented Oct 25, 2022

@mimbrown yeah I had the same idea, also outlined this here https://dev.to/reggi/proposal-the-as-ts-language-server-52in

@reggi

reggi commented Oct 26, 2022

Today Vercel / Next.js dropped Turbopack, a Rust-based transpiler / bundler designed to be a successor to webpack. Curious if the Deno team has any interest in using this natively in Deno in the future, given it is built in Rust.

@symful

symful commented Jan 30, 2023

@mimbrown Something like this? (Please check the examples, it doesn't have proper readme :P)
