Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite deno_dir.rs #2057

Closed
ry opened this issue Apr 5, 2019 · 28 comments
Closed

Rewrite deno_dir.rs #2057

ry opened this issue Apr 5, 2019 · 28 comments

Comments

@ry
Copy link
Member

ry commented Apr 5, 2019

deno_dir is starting to show its age, it's time for it to be redesigned from scratch. Here are some of the issues I want to address

  • Potentially multiple deno processes may be downloading the same file at the same time. Therefore they should write to "filename.ts.$RANDOM_NUMBER.download" and when they complete rename it to "filename.ts"
  • Module resolution should be done using exactly this API: https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier
  • Directories should be lazily created. Currently if you look at an strace of deno, you will see many unnecessary mkdirs.
  • It should be done in //core and be independent of other modules in core. deno_dir.rs is currently quite tangled up with other modules, which makes it difficult to refactor.
  • Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts. This makes it more inspectable and corresponds to how "tsc" generates code. For local files, the "file" scheme should be used and the path recreated (I'm leaving out details here- but I think people involved will understand)

anything else?

@hayd
Copy link
Contributor

hayd commented Apr 6, 2019

cli/compiler's ModuleMetaData is also closely related, should that be pulled out too?

I propose this be called core/cache.rs. Happy to have a stab at this.

@ry
Copy link
Member Author

ry commented Apr 7, 2019

@hayd I think core/cache.rs is a good name. ModuleMetaData is used in several situations where it shouldn't be (example) and ModuleMetaData has some bits to support TypeScript whereas //core is only JS - so I don't think it should be brought over.
Glad you want to take this on, I'd like to spec this out in a bit more detail before you dive in.

I'm not totally sure about this part "Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts."

@bartlomieju
Copy link
Member

I'm not totally sure about this part "Remove the gen directory completely. Instead generated javascript and source maps should live along side their typescript counterparts."

IMHO gen/ is fine as is, I'd hate to see my project dir explode with generated files after running deno, it'd be nightmare to add entries to .gitignore for each simple project.

I want to suggest a way to purge gen/ dir, it's been mentioned a few times already, especially when upgrading between versions that change compilation process. Although I'm not sure if --reload handles that already.

@ry
Copy link
Member Author

ry commented Apr 7, 2019

I wasn’t suggesting that the generated files would be in the project directory. They’d still live in the cache directory. For example if you had /home/ry/main.ts that would map to $DENO_DIR/file/home/ry/main.js

@bartlomieju
Copy link
Member

They’d still live in the cache directory. For example if you had /home/ry/main.ts that would map to $DENO_DIR/file/home/ry/main.js

I didn't catch that... This approach is super clean 👍

@hayd
Copy link
Contributor

hayd commented Apr 7, 2019

For consistency would you also cp main.ts to $DENO_DIR/file/home/ry/main.ts ? That way "generated javascript and source maps should live along side their typescript counterparts" in the same way as they do for http/https.

@ry
Copy link
Member Author

ry commented Apr 8, 2019

@hayd hmm seems superfluous...

@hayd
Copy link
Contributor

hayd commented Apr 8, 2019

That's true, I agree it's a little weird, but otherwise among http/https/etc. file is going to be a special case in not having a .ts file alongside .js and .js.map...

@bartlomieju
Copy link
Member

I've started working on progress bars for downloads and import, since downloads and compilation is async it would be helpful to store number of downloaded and compiled files on DenoDir so progress bars can be updated as new files are discovered.

@ry
Copy link
Member Author

ry commented Apr 27, 2019

@bartlomieju I'd rather there be a "ProgressBar" object (which should wrap whichever progress bar crate we're using) which can be added to ThreadSafeState. Then you'd have something like ProgressBar::start(action: String) and ProgressBar::end(action: String) ? Not sure about that exact API, but some functionality like that.

@kitsonk
Copy link
Contributor

kitsonk commented May 9, 2019

One consideration, with CheckJS enabled (see #2114) the TypeScript compiler will transform JS->JS. While we could discard the output, it makes more sense to cache both the input and output. This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently. In that case we would have an input and output JS to cache.

@ry
Copy link
Member Author

ry commented May 13, 2019

This would allow support of transforming many CommonJS modules and AMD modules into ES modules that can be run by Deno transparently

@kitsonk Although I'm not against this in the limit, I wouldn't want to tie this requirement to refactoring DenoDir. Does CheckJS output something different than the input if the input is ESM?

@ry
Copy link
Member Author

ry commented May 13, 2019

To summarize my position (using suggest we've discussed here):

//cli/deno_dir.rs needs to be re-written as it's very messy. The rewrite should be called //core/cache.rs.

Goals of the new //core/cache.rs:

  • The new module should ideally be independent of other bits of code in //core. In particular, it should not import //core/isolate.rs and ideally it would not even import //core/modules.rs (but this can be discussed - maybe it's necessary).
  • lazily creating the cache directories.
  • caching files downloaded from the internet into $DENO_DIR/https/example.com/a/b/c.ts
  • caching various meta-data and compilation artifacts:
    • HTTP headers for each request should be stored in $DENO_DIR/https/example.com/a/b/c.ts.headers.json
    • The output of TSC should be stored as a neighbor to the corresponding TS file: $DENO_DIR/https/example.com/a/b/c.js (Yes, this means getting rid of the $DENO_DIR/gen directory)
    • The new module should be general enough to support other files in the future too. We probably will want a "depfile" ($DENO_DIR/https/example.com/a/b/c.ts.d not to be confused with d.ts)
    • Local TS files will have their output cached into a directory corresponding to their local file:/// url. Example: $DENO_DIR/file/Users/rld/src/deno/tests/002_hello.js would be the cached output of /Users/rld/src/deno/tests/002_hello.ts
  • should not depend on TS as //core is JS-only.

If there are no other requirements, I welcome pseudo-code which starts specifying the interface.

@kitsonk
Copy link
Contributor

kitsonk commented May 14, 2019

Does CheckJS output something different than the input if the input is ESM?

Not with our targets as they are. If we weren't supporting esnext as the target, syntactical constructs would be down-emitted (e.g. async iterables), but if we are not doing any down level support it should be like for like (though I think we have things like strip comments, so that would be erased from the output file).

I guess my main point is there is always an input file and usually an output file (for example JSON imports should be transformed to an embedded .js file), so we could ensure that the cache structure will always work (e.g. not be opionated that input files are always .ts, the could be .js, .mjs. .esm, extension-less, etc. and that we can/do process them, conform them as an output file which will always be .js even if it is just a copy paste).

@ry
Copy link
Member Author

ry commented May 14, 2019

It would be cool to support --outFile some day and magically give people the ability to bundle their app. And if they bundle their app, they probably want to control the target.

(This seems like it wouldn't be too difficult to support these days?)

@kitsonk
Copy link
Contributor

kitsonk commented May 14, 2019

--outFile would only work with something like if we were outputting AMD where you can simply concatenate modules. With ESM, you can't simply concatenate modules, you have to rewrite them. One of the big "issues" I have with ESM, but it is the standard.

@kitsonk
Copy link
Contributor

kitsonk commented Jun 9, 2019

At @hayd are you still working on this? If not, I would like to.

@bartlomieju
Copy link
Member

I'd like to add my 2 cents:

  • To add to that, we should use specialized types for passing specifiers between functions - using &str is cumbersome and you never know if specifier is already resolved or not, which leads to resolving resolved specifiers multiple times, see this comment for reference

  • same file shouldn't be downloaded multiple times on the same run - deno run --reload infinite loop #2442, fix: prevent multiple downloads of same file #2477

  • to fully support import maps we should have ability to try multiple URLs for single module and reuse first successful URL for subsequent calls - reference - this can be incorporated into previous point

  • I'd opt to separate code paths for situations with --reload and --no-fetch flags - otherwise we'll end up in big conditionals like in get_source_code_async:

    deno/cli/deno_dir.rs

    Lines 405 to 409 in e2468d8

    if !is_module_remote
    || use_cache
    || no_fetch
    || deno_dir.fetched.has(&module_name)
    {

@hayd
Copy link
Contributor

hayd commented Jun 9, 2019

@kitsonk please take it.

@kitsonk
Copy link
Contributor

kitsonk commented Jun 9, 2019

@ry and I discussed, thus should rest a little bit until bundle and import maps settle down a bit.

@bartlomieju
Copy link
Member

bartlomieju commented Jul 3, 2019

Here's link to high level diagram that I created with my take on rewriting deno_dir:

https://drive.google.com/file/d/1xk8e1nCWSugvbz1yOGlQQJmTDEmNJQh_/view?usp=sharing

Write up with explanation for my choices and pseudo code will follow tomorrow - I need to polish it a bit. There are still some missing bits like:

  • decide if TS file is not stale - my idea is to use metadata.json in which we can store hash of the compilation (eg. result of current cache_path is a good value we could use)
  • concurrent access to DenoDir by multiple processes, @ry suggested lock file for that, I'm not yet sure how to incorporate it

Anyway feel free to take a look and comment

@bartlomieju
Copy link
Member

bartlomieju commented Jul 4, 2019

Proposed implementation

Overview

Main goal of rewrite is to make DenoDir more maintainable, allow for further improvements in future and limit number of IO operations (file and net fetches).

Rewrite requires significant changes to following files:

  • //cli/deno_dir.rs
  • //cli/state.rs
  • //cli/compiler.rs
  • //js/compiler.ts

Data structures

ModuleMetaData

  • module_specifier
  • module_redirect_source_name
  • filename
  • media_type
  • source_code

Module - struct passed directly to V8
SourceCodeInfo - already existing struct in //core/modules.rs

  • module_specifier
  • source_code

DenoDir

Important goal of rewrite is to make DenoDir agnostic of TypeScript - it should operate as cache/fetcher for local and remote modules.

This goal achieved via following public API:

  • fetch_module_meta_data(module_specifier) -> Future<ModuleMetaData, DenoError> Main method for obtaining source files for Deno. It's similar to current implementation but is completely agnostic of type of file fetched - it literally be any filetype; JS, JSON, TS, SourceMap
  • fetch_cached_module_meta_data(module_specifier) -> Option<ModuleMetaData> Returns already loaded ModuleMetaData for given module (that is cached in-process for fast subsequent access) if it was previously loaded or None
  • save_source_code(module_specifier, code)
    Saves source code of module to disk, does conversion from module_specifier to local filepath (using url_to_deps_path).
save_source_code(
  "https://deno.land/std/http/file_server.ts",
  "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts
  • save_auxiliary_file(module_specifier, filetype, contents)
    Saves file to disk, that is somehow related to module - eg. headers file for remote module for compiled modules. Example usage:
save_auxiliary_file(
  "https://deno.land/std/http/file_server.ts",
  "headers.json",
  "<snip>"
);
// saves file to `$DENO_DIR/https/deno.land/std/http/file_server.ts.headers.json

save_auxiliary_file(
  "file:///dev/file_server.js",
  "map",
  "<snip>"
);
// saves file to `$DENO_DIR/file/dev/file_server.js.map

Important part of new DenoDir is ModuleMetaDataCache:

struct ModuleMetaDataCache(Mutex<HashMap<ModuleSpecifier, ModuleMetaData>>);

impl ModuleMetaDataCache {
  pub fn get(self, module_specifier) -> Option<ModuleMetaData>
  
  pub fn cache(self, module_meta_data)
}

It is a simple hash map that stores metadata for modules that have already been requested and fetched - ensuring that no file is read from disk/fetched from net more than once. This is in-process cache.

MISSING BITS:

  • concurrent access to $DENO_DIR
  • trying multiple URLs for module

TS compiler

  • Currently during resolveModuleNames we're issuing a number of op_fetch_module_meta_data ops. Compiler requests only resolved URLs of imports
    but we're downloading them actually. This can be solved in two ways:
    • replace call to op_fetch_module_meta_data with op_resolve_modules which simply resolves all specifiers and returns them to TS compiler
    • replace call to op_fetch_module_meta_data with op_fetch_modules_meta_data and download all requested modules concurrently (as of now modules are downloaded one after another)
  • op_cache should be replaced with op_cache_compiler_output that saves compiled output code and it's source map (using DenoDir::save_source_code and DenoDir::save_auxiliary_code).

State

Currently fetch_module_meta_data_and_maybe_compile_async is main function that is called to load module code into V8. I propose following algorithm for this function:

1. try to get cached `ModuleMetaData` for module:
  a) if requested module is TS file resolve URL for compiled module (ie. replace .ts with .js extension)
  b) call `DenoDir::fetch_cached_module_meta_data` 
  c) if `DenoDir::fetch_cached_module_meta_data` returned something go to 10.
2. if requested module is not TS file call `DenoDir::fetch_module_meta_data` and go to 10.
3. can `use_cache`?
  a) try to get compiled module `ModuleMetaData` from on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with .js extension)
  b) if returned None go to 4.
  c) if cached version is stale (this can be established using similar approach to current `cache_path`, but storing that hash in auxiliary file), go to 4.
  d) return and go to 10.
4. setup TS compiler worker and runtime
5. post message to TS compiler
6. call `ts.createProgram`, and for requested module and its imports:
  a) get source file by calling `op_fetch_module_meta_data`
  b) resolve imports by calling `op_resolve_modules`
  c) write compiled output to disk by calling `op_cache_compiler_output`
7. get message from TS compiler worker
8. handle errors and diagnostic from TS worker and exit or
9. get compiled module `ModuleMetaData` from on-disk cache by issuing `DenoDir::fetch_module_meta_data` (ie. with .js extension) - this call can't fail!
10. return `SourceCodeInfo(module_url, source_code)` constructed from `ModuleMetaData`

With described approach it's possible to encapsulate TS compilation logic in small struct called TSCompiler:

impl TSCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let compile_module_specifier = ...;
    if use_cache {
      let compiled_module_meta_data = await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);
      
      if !self.is_stale(compiled_module_meta_data) {
        compiled_module_meta_data.into_module();
      }
    }
    
    let ts_compiler = setup_compiler_worker();
    let compiler_result = ts_compiler.post_message(module_specifier);
    if compiler_result.error || compiler_result.diagnostics {
      return DenoError::from(compiler_result)
    }
    
    let compiled_module_meta_data = await self.deno_dir.fetch_module_meta_data(compiled_module_specifier);
    
    compiled_module_meta_data.into_module();
  }
}

The great thing is it can be abstracted away to supported other file types:

impl JSONCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let module_meta_data = await self.deno_dir.fetch_module_meta_data(module_specifier);
    
    // wrap JSON source in `export default` and return `SourceCodeInfo`
    {
      source_dode: format!(
        "export default {};",
        str::from_utf8(&self.source_code).unwrap()
      ),
      module_specifier
    }
  }
}

Want to import Rust file in your JavaScript like in Parcel? There you go:

impl RustCompiler {
  pub fn compile(self, module_specifier) -> Future<SourceCodeInfo, DenoError> {
    let compile_module_specifier = ...;
    let module_meta_data = await self.deno_dir.fetch_module_meta_data(module_specifier);
    // compile Rust file to WebAssembly
    // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/assets/RustAsset.js
    
    // wrap output WASM code similar to Parcel:
    // https://github.com/parcel-bundler/parcel/blob/master/packages/core/parcel-bundler/src/builtins/loaders/browser/wasm-loader.js
  }
}

That means we can create something called CompilationManager that allows compilers to register themselves for different extensions/media types.

There are probably a few details that I forgot to describe, I'll update this comment when I remember them.

Please comment and ask questions.

CC @ry @kitsonk @hayd @piscisaureus

EDIT: Also CC @kevinkassimo

@ry
Copy link
Member Author

ry commented Jul 4, 2019

It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.

I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as

struct DenoDir2 {
   location: PathBuf,
}

impl DenoDir2 {
  fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() }
  fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() }
}

I think this could work fine with your design?

@bartlomieju
Copy link
Member

bartlomieju commented Jul 4, 2019

It's very nice to have such a comprehensive analysis of the problem! I think the abstracted compiler sounds good.

Thanks!

I'd like to have a lower-level interface to DENO_DIR which doesn't know anything about downloading/compiling or even modules. I'm not sure exactly what this would look like, but it might be something as simple as

struct DenoDir2 {
   location: PathBuf,
}

impl DenoDir2 {
  fn set(url: Url, data: &[u8]) -> Result<()> { unimplemented!() }
  fn get(url: Url) -> Result<Vec<u8>> { unimplemented!() }
}

I think this could work fine with your design?

Yes, that should be possible 👍 looks like it corresponds to read_file/write_file part in the diagram which is the smallest building block of DenoDir

Actually I believe it'd be possible to structure code in such a way, that allows to skip compiler step completely - then, DenoDir would be able to handle only JS files so deno crate users can build upon that.

EDIT:
FYI I'm working on prototype of proposed implementation. I'll get back at the end of week.

@kevinkassimo
Copy link
Contributor

kevinkassimo commented Jul 5, 2019

@bartlomieju Looks good at first glance, will check more carefully tonight. Also would it be better if we could actually put this proposal in a Google Doc so others could add their suggestions on certain parts of the proposal (and easier to make comment threads)?
(You could create a shareable link for suggestion only)

@bartlomieju
Copy link
Member

@bartlomieju
Copy link
Member

@ry do you think we can close this issue now?

@ry
Copy link
Member Author

ry commented Aug 12, 2019

@bartlomieju Thank you very much for the critical work you've done in this refactor: 2e1ab82, 421cbd3, ac269be, 70de8dd, 8214b68, 72d9045.

I still think the deno dir needs some work, but you've addressed most of the issues outlined here. I will create a new issue if/when we need to do more.

@ry ry closed this as completed Aug 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants