Enhancement ideas from JupyterLite #16

Open
rth opened this issue Oct 15, 2022 · 8 comments

@rth
Member

rth commented Oct 15, 2022

As summarized in pyodide/pyodide#3093 (comment) by @bollwyvl, there are a number of features in piplite/JupyterLite built on top of micropip, some of which could make sense to move upstream now that this package is standalone:

What piplite/jupyterlite hacks are wanted? Happy to shed some stuff!

Even if all our hacks weren't desirable, it would be lovely to have a few more documented ways to customize micropip, rather than trying to solve some things Once And For All... presenting the two together:

  1. handling multiple package sources
    1. we check for a custom Warehouse (or warehouse-like file format) for custom/offline packages
    2. notionally, micropip.add_pypi_json_resolver(key: str, get_pypi_json: Callable[[str, dict], PackageReleases])
  2. disabling package sources
    1. for offline (but still dynamic) installations, it's nice to have some guardrails in place to avoid calling out to Big PyPI
    2. notionally, micropip.remove_pypi_json_resolver(key: str)
  3. specifying no-op packages to be considered installed
    1. not handled in piplite, but rather in separate build packages
    2. notionally, micropip.add_noop_package(package_name: str)
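
A minimal sketch of how the three notional hooks above could fit together. The `ResolverRegistry` class, its `resolve` method, and the shape of the returned dict are all hypothetical; only the three function names and their signatures come from the list above, and none of this is micropip's actual API.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-in for the PyPI JSON payload of one package.
PackageReleases = dict
Resolver = Callable[[str, dict], Optional[PackageReleases]]


class ResolverRegistry:
    """Toy registry of package sources, tried in insertion order."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Resolver] = {}
        self._noop: Set[str] = set()

    def add_pypi_json_resolver(self, key: str, get_pypi_json: Resolver) -> None:
        self._resolvers[key] = get_pypi_json

    def remove_pypi_json_resolver(self, key: str) -> None:
        # Removing an unknown key is silently ignored in this sketch.
        self._resolvers.pop(key, None)

    def add_noop_package(self, package_name: str) -> None:
        self._noop.add(package_name.lower())

    def resolve(self, name: str, fetch_kwargs: Optional[dict] = None) -> Optional[PackageReleases]:
        # No-op packages are reported as satisfied without asking any source.
        if name.lower() in self._noop:
            return {"releases": {}, "noop": True}
        for resolver in self._resolvers.values():
            found = resolver(name, fetch_kwargs or {})
            if found is not None:
                return found
        return None


# Usage: a site-local source, with the default index disabled for offline use.
registry = ResolverRegistry()
registry.add_pypi_json_resolver(
    "local", lambda name, kw: {"releases": {"1.0": []}} if name == "mypkg" else None
)
registry.add_pypi_json_resolver("pypi", lambda name, kw: {"releases": {"9.9": []}})
registry.remove_pypi_json_resolver("pypi")
registry.add_noop_package("ssl")
```

With the `pypi` key removed, a lookup for anything the local source does not ship returns `None` instead of reaching out to the network, which is the guardrail described in item 2.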

Some things that we hack outside of piplite, so maybe they belong in core:

  1. specifying patches that need to occur at import time
    1. we un-lazily load and patch some heavy hitters (matplotlib, oy!)
    2. this is really more of a loader-time thing, so maybe should be a pyodide API
    3. we tried to use some off-the-shelf stuff, but at some point it stopped working
  2. specifying mock packages
    1. we mock some packages

@rth changed the title from "Enhancements ideas from JupyterLite" to "Enhancement ideas from JupyterLite" on Oct 15, 2022

@rth
Member Author

rth commented Oct 15, 2022

1ii. notionally, micropip.add_pypi_json_resolver

This is very much related to #7

  1. disabling package sources

So then the only things that would work are micropip.list and installation from local sources? I thought that you handled the offline case with a service worker so that all the code could continue making the usual fetch requests?

  1. specifying no-op packages to be considered installed
  2. specifying mock packages

I guess these two could go together. Yes, this could indeed be worth considering. So far the alternative is to use deps=False and manually install all the other packages except the problematic one.

  1. specifying patches that need to occur at import time

Import time is after the package is installed, so I'm not sure this would be suitable for matplotlib which only cares about package installation.

@bollwyvl

Thanks for carrying these over.

To add more to an already heavy topic, we just landed wrapping micropip in another layer of complexity to get closer to parity with IPython-under-ipykernel, namely by adding %pip which gets rewritten to piplite.install.

As the goal is pip install CLI parity, we went ahead and added support for pyodide/pyodide#6's -r files with a very naive parser.

I was pleased to find that ; platform_machine != "wasm32" already works; really, only os_name == "posix" is a little weird.
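
For illustration, a very naive `-r` parser might look like the sketch below. This is not the piplite implementation: the function name and the `files` dict (a stand-in for the file system) are assumptions. Environment markers are left attached to each requirement string so that the installer can evaluate them, and cycles between nested files are not detected.

```python
from typing import Dict, List


def parse_requirements(name: str, files: Dict[str, str]) -> List[str]:
    """Flatten a requirements file into a list of requirement strings.

    `files` maps file names to their text, standing in for the file system.
    """
    specs: List[str] = []
    for raw in files[name].splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if line.startswith(("-r ", "--requirement ")):
            # Recurse into the referenced file.
            nested = line.split(None, 1)[1].strip()
            specs.extend(parse_requirements(nested, files))
        elif line.startswith("-"):
            continue  # other pip options are ignored in this sketch
        else:
            specs.append(line)
    return specs


files = {
    "requirements.txt": (
        "-r base.txt\n"
        "# build-only tooling\n"
        "numpy  # also preloaded\n"
        'tornado ; platform_machine != "wasm32"\n'
        "--no-cache-dir\n"
    ),
    "base.txt": "ipywidgets>=8\n\n",
}
specs = parse_requirements("requirements.txt", files)
```

Keeping the marker text intact is what lets a single requirements.txt serve both the build environment and the in-browser kernel.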

I think that these will give us a good way to document for content authors how to have a single requirements.txt for both the build environment (from which the client app inherits some data_files... #18) as well as the runtime pyodide kernel environment, as sometimes it's very important that client and kernel packages agree.

NB: I'll do some separate response below, as it got a little long.

@bollwyvl

1ii. micropip.add_pypi_json_resolver

This is very much related to pyodide/pyodide#7

Yep. It's too bad it would be pretty hard to make this map to --index-urls. And I don't love my file format.

A related one: if this could be done in advance, and could include the importables data from top-level.txt so that it got picked up by installFromImports and users didn't even need to install stuff, I'd be over the moon!
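
As a rough sketch of the "done in advance" part, the importable names can be read out of a wheel with the standard library alone. The function name and the demo wheel are hypothetical, and this assumes the wheel ships a top_level.txt in its .dist-info directory, which setuptools writes but other build backends may not.

```python
import io
import zipfile
from typing import Dict


def import_names_from_wheel(wheel_bytes: bytes, package_name: str) -> Dict[str, str]:
    """Map each importable name listed in top_level.txt to the package name."""
    mapping: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        for member in wheel.namelist():
            if member.endswith(".dist-info/top_level.txt"):
                for line in wheel.read(member).decode("utf-8").splitlines():
                    if line.strip():
                        mapping[line.strip()] = package_name
    return mapping


# Build a tiny in-memory "wheel" containing only the metadata file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as demo:
    demo.writestr("demo_pkg-1.0.dist-info/top_level.txt", "demo_pkg\n_demo_speedups\n")

index = import_names_from_wheel(buf.getvalue(), "demo-pkg")
```

Merging such mappings for every indexed wheel would give an import-name-to-package table that a loader could consult before the user ever calls an install function.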

@bollwyvl

  1. disabling package sources

the only thing that would work is micropip.list and installation from a local sources?

Where "local" means "the content owner shipped it", not "already in emfs", as we still have some local binary issues (including wheels) stemming from our service worker implementation.

This is not a security feature, as:

  • piplite just wraps micropip rather than replacing it
  • still supports URL wheels

But rather, the use case is giving site owners a way to reduce the number of chances for un-reproducible behavior, and would be used in conjunction with a co-deployed pyodide and indexed wheels.

Having it as an API would mean we could drop some patches.

@bollwyvl

    1. specifying no-op packages to be considered installed and mock packages

the alternative is to use deps=False

Yeah, yep, thanks for deps=False! In trying to get jupyterlite self-hosting (e.g. one could build a new site inside the kernel running in another site and download, but not serve, it) I have a gnarly block of code like this:

    import sys
    import types

    # dbm
    dbm = types.ModuleType("dbm")
    dbm.dumb = dbm.whichdb = None
    sys.modules["dbm"] = dbm

    # for ``jupyter_server``
    anyio = types.ModuleType("anyio")
    sys.modules["anyio.to_thread"] = anyio.to_thread = types.ModuleType("to_thread")
    anyio.to_thread.run_sync = lambda: None
    sys.modules["anyio"] = anyio

    # generic stand-in that just echoes its arguments
    noop = lambda *args, **kwargs: dict(args=args, kwargs=kwargs)

    # nbclient
    nbclient = types.ModuleType("nbclient")
    nbclient.NotebookClient = nbclient.execute = noop
    sys.modules["nbclient"] = nbclient
    nbclient_exceptions = types.ModuleType("nbclient.exceptions")
    nbclient_exceptions.CellExecutionError = noop
    sys.modules["nbclient.exceptions"] = nbclient_exceptions

    # prometheus_client
    prometheus_client = sys.modules["prometheus_client"] = types.ModuleType(
        "prometheus_client"
    )
    prometheus_client.Gauge = prometheus_client.Histogram = lambda *x: None

    # requests
    requests = sys.modules["requests"] = types.ModuleType("requests")

    # tornado
    tornado = sys.modules["tornado"] = types.ModuleType("tornado")
    tornado.version_info = (6, 1, 0)
    tornado.concurrent = sys.modules["tornado.concurrent"] = types.ModuleType(
        "concurrent"
    )
    tornado.concurrent.Future = None
    tornado.escape = sys.modules["tornado.escape"] = types.ModuleType("escape")
    tornado.escape.url_escape = (
        tornado.escape.json_decode
    ) = tornado.escape.json_encode = tornado.escape.utf8 = None
    tornado.gen = sys.modules["tornado.gen"] = types.ModuleType("gen")
    tornado.gen.multi = None
    tornado.gen.coroutine = lambda *x: None
    tornado.httputil = sys.modules["tornado.httputil"] = types.ModuleType("httputil")
    tornado.httputil.url_concat = None
    tornado.httpclient = sys.modules["tornado.httpclient"] = types.ModuleType(
        "httpclient"
    )
    tornado.httpserver = sys.modules["tornado.httpserver"] = types.ModuleType(
        "httpserver"
    )
    tornado.httpclient.AsyncHTTPClient = (
        tornado.httpclient.HTTPClient
    ) = (
        tornado.httpclient.HTTPRequest
    ) = tornado.httpclient.HTTPClientError = tornado.httpclient.HTTPResponse = None
    tornado.ioloop = sys.modules["tornado.ioloop"] = types.ModuleType("ioloop")
    tornado.ioloop.PeriodicCallback = tornado.ioloop.IOLoop = None
    tornado.log = sys.modules["tornado.log"] = types.ModuleType("log")
    tornado.log.app_log = (
        tornado.log.access_log
    ) = tornado.log.gen_log = tornado.log.LogFormatter = None
    tornado.netutil = sys.modules["tornado.netutil"] = types.ModuleType("netutil")
    tornado.netutil.Resolver = tornado.netutil.bind_unix_socket = None
    tornado.template = sys.modules["tornado.template"] = types.ModuleType("template")
    tornado.template.Template = lambda *x: None

    tornado.web = sys.modules["tornado.web"] = types.ModuleType("web")

    class Application:
        pass

    tornado.web.Application = Application

    class RequestHandler:
        pass

    tornado.web.StaticFileHandler = (
        tornado.web.RequestHandler
    ) = tornado.web.RedirectHandler = RequestHandler
    tornado.web.authenticated = tornado.web.removeslash = lambda *x: None
    tornado.web.HTTPError = None

    # websocket
    websocket = sys.modules["websocket"] = types.ModuleType("websocket")
    websocket.WebSocket = None

    # zmq
    zmq = sys.modules["zmq"] = types.ModuleType("zmq")
    zmq.REQ = zmq.DEALER = zmq.SUB = zmq.NOBLOCK = zmq.Message = None

    class Context1:
        pass

    zmq.Context = Context1
    zmq.MessageTracker = lambda *x: x
    zmq.asyncio = sys.modules["zmq.asyncio"] = types.ModuleType("asyncio")

    class Context2:
        pass

    zmq.asyncio.Context = Context2
    zmq.asyncio.Socket = None
    zmq.sugar = sys.modules["zmq.sugar"] = types.ModuleType("sugar")
    zmq.sugar.socket = sys.modules["zmq.sugar.socket"] = types.ModuleType("socket")
    zmq.sugar.socket.Socket = None
    zmq.eventloop = sys.modules["zmq.eventloop"] = types.ModuleType("eventloop")
    zmq.eventloop.ioloop = sys.modules["zmq.eventloop.ioloop"] = types.ModuleType(
        "ioloop"
    )
    zmq.eventloop.ioloop.IOLoop = None
    zmq.eventloop.zmqstream = sys.modules["zmq.eventloop.zmqstream"] = types.ModuleType(
        "zmqstream"
    )
    zmq.eventloop.zmqstream.ZMQStream = None

So yeah, anything that would make that a little easier 😍
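
To make the request concrete, most of that block boils down to one small helper. The sketch below is hypothetical (`mock_module` is not an existing micropip or pyodide function), and the demo uses the made-up name `fake_tornado` so that it cannot shadow a real package.

```python
import sys
import types
from typing import Any


def mock_module(dotted_name: str, **attrs: Any) -> types.ModuleType:
    """Register an empty module (and any missing parents) in sys.modules."""
    parent = None
    parts = dotted_name.split(".")
    for depth in range(1, len(parts) + 1):
        name = ".".join(parts[:depth])
        module = sys.modules.get(name)
        if module is None:
            module = types.ModuleType(name)
            sys.modules[name] = module
        if parent is not None:
            # Make `parent.child` attribute access work as well.
            setattr(parent, parts[depth - 1], module)
        parent = module
    # `parent` is now the innermost module; attach the requested attributes.
    for key, value in attrs.items():
        setattr(parent, key, value)
    return parent


web = mock_module("fake_tornado.web", HTTPError=None, authenticated=lambda *x: None)
```

After this call, `import fake_tornado.web` and `from fake_tornado.web import HTTPError` both succeed, because the import system consults `sys.modules` before searching for files.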

@bollwyvl

  1. specifying patches that need to occur at import time

Import time is after the package is installed

Right, it's true. If there were a place in pyodide itself to do this, that would be fine as well. But in general, "if a package gets installed, we want to do these things before/slightly after it imports" would be a lovely thing to be able to signal without forcing it to be installed to do so.
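
One way to get "slightly after it imports" behaviour with the standard library alone is a meta path finder that wraps the real loader. This is only a sketch under stated assumptions: `patch_on_import`, `PatchFinder`, and `_PatchingLoader` are hypothetical names, not pyodide or micropip APIs, and the hook does nothing for modules that were already imported before it was registered.

```python
import importlib.abc
import sys
from typing import Callable, Dict

_patches: Dict[str, Callable] = {}


class _PatchingLoader(importlib.abc.Loader):
    """Delegate to the real loader, then run the patch on the new module."""

    def __init__(self, wrapped, patch):
        self._wrapped = wrapped
        self._patch = patch

    def create_module(self, spec):
        return self._wrapped.create_module(spec)

    def exec_module(self, module):
        self._wrapped.exec_module(module)
        self._patch(module)


class PatchFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        patch = _patches.get(fullname)
        if patch is None:
            return None
        # Ask the remaining finders for the real spec, then wrap its loader.
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, "find_spec"):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = _PatchingLoader(spec.loader, patch)
                return spec
        return None  # not installed: nothing to patch, nothing forced


def patch_on_import(name: str, patch: Callable) -> None:
    _patches[name] = patch
    if not any(isinstance(f, PatchFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, PatchFinder())


# Demo with a small stdlib module; drop any cached copy so the hook fires.
sys.modules.pop("colorsys", None)
patch_on_import("colorsys", lambda module: setattr(module, "patched", True))
import colorsys
```

Because the finder returns `None` when no other finder can locate the module, registering a patch for a package that is not installed has no effect until that package actually shows up.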

@rth
Member Author

rth commented Nov 26, 2022

specifying mock packages

This was implemented in #26

@bollwyvl

bollwyvl commented Feb 14, 2023

Welp, I've gone and started down the road of generating repodata.json and patching the runtime repodata_packages and _import_name_to_package_name. May it be on my head, but with it, we'll be able to get our cold startup download to under 10 MB, which is an enormous improvement over even a few months ago, when we still had to load matplotlib and numpy before giving control back to the user. With this approach, a site owner can avoid having %pip install in pre-written content, as loadPackagesFromImports takes care of most things.

I haven't investigated the solution from #26, as handling mocks as actual wheels (e.g. 3.6 kB of jedi) still seems more manageable for jupyterlite core, but that will be great for users who just need a little something-something inside their own package, for example.
