Network access in the browser #85

Open
adamziel opened this issue Dec 8, 2022 · 20 comments

@adamziel
Collaborator

adamziel commented Dec 8, 2022

Latest status

Curl and tcp over fetch() are now a part of WordPress Playground 🎉 Here's what we still need to close this issue:


Description

WordPress Playground has only partial support for network calls.

Types of network calls in WordPress

wp_safe_remote_get

As of #724, Playground is capable of translating wp_safe_remote_get calls into JavaScript fetch() requests. This has limitations:

  • Only https:// URLs are supported
  • The server must provide valid CORS headers in the response
  • Developers can't control all the headers
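To make the first and third limitations concrete, here is a minimal sketch, not Playground's actual code. `planFetch` and `FetchPlan` are hypothetical names. The header list follows the Fetch specification's forbidden request-header names, which browsers refuse to let scripts set. The CORS limitation cannot be checked up front, since it depends on the server's response.

```typescript
// Hypothetical helper (not part of Playground): decides whether a PHP-side
// HTTP request could be re-issued with fetch(), and which of its headers the
// browser would actually let a script set.

// Forbidden request-header names from the Fetch specification.
const FORBIDDEN_HEADERS: string[] = [
  "accept-charset", "accept-encoding", "access-control-request-headers",
  "access-control-request-method", "connection", "content-length", "cookie",
  "date", "dnt", "expect", "host", "keep-alive", "origin", "referer",
  "set-cookie", "te", "trailer", "transfer-encoding", "upgrade", "via",
];

interface FetchPlan {
  supported: boolean;
  reason?: string;
  headers: Record<string, string>;
  droppedHeaders: string[];
}

function planFetch(url: string, headers: Record<string, string>): FetchPlan {
  // Limitation 1: only https:// URLs can be translated.
  if (url.slice(0, 8).toLowerCase() !== "https://") {
    return {
      supported: false,
      reason: "only https:// URLs are supported",
      headers: {},
      droppedHeaders: [],
    };
  }
  // Limitation 3: some headers are controlled by the browser, not the caller.
  const kept: Record<string, string> = {};
  const dropped: string[] = [];
  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase();
    const forbidden =
      FORBIDDEN_HEADERS.indexOf(lower) !== -1 ||
      lower.indexOf("proxy-") === 0 ||
      lower.indexOf("sec-") === 0;
    if (forbidden) {
      dropped.push(name);
    } else {
      kept[name] = headers[name];
    }
  }
  return { supported: true, headers: kept, droppedHeaders: dropped };
}
```

A caller would pass `plan.headers` to `fetch()` and could log `plan.droppedHeaders` so that developers can see which of their headers were ignored.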

Arbitrary network calls

Other methods of accessing the network, such as libcurl or file_get_contents, are not supported yet.

Web browsers do not allow the WebAssembly code to access the internet directly yet. A native socket API may or may not be released in the future, but there isn't one for now. #1093 would improve the situation.

In Node.js, Playground accesses the network using the following method:

  1. Set up a same-domain API endpoint that accepts network commands from the browser
  2. Capture socket function calls in the WebAssembly binary
  3. Pass them to JavaScript
  4. Pass the requested operation over the API endpoint using fetch() or a WebSocket

This may not be viable on the web, as someone would have to pay for the hardware to run the proxy on, and the proxy's nature means there are security risks related to accessing the local network.

Solution

After 1.5 years of exploring and discussing, this issue finally has a path forward:

For full networking support, we'd also need the following:

  • Expose the Node networking proxy as a separate, runnable script
  • Provide an API to connect it to the in-browser version of Playground
  • Document the workflow

Nice to haves:

  • Ship a version of the network proxy built as a PHP script to enable running a full-featured Playground build in the same environments as WordPress.
  • Provide a Dockerfile to set up the network proxy and a few buttons for quickly spinning up proxy nodes in the cloud, e.g. on Cloudflare, DigitalOcean, etc.

Limitations of the approach

Limitations without the network proxy:

  • Non-CORS URLs wouldn't work
  • Non-HTTPS traffic wouldn't work
  • gethostname and other low-level methods still wouldn't work
  • SSL certificate checks, like the ones done by Composer, wouldn't work

All of the above could be resolved by plugging in a network proxy.

Other Alternatives

@adamziel
Collaborator Author

adamziel commented Dec 16, 2022

For posterity: I tried a custom Request_Transport that tunneled all traffic through the browser's fetch() using the vrzno extension by @seanmorris and that worked well except for sites that didn't allow cross-origin requests – which is most sites.

Interestingly, I remember that the WordPress Plugin Directory did not work in this setup. However, @dd32 pointed out that it exposes the correct access-control headers:

curl -is 'https://api.wordpress.org/plugins/info/1.2/?action=query_plugins' | grep '^access-control'
access-control-allow-origin: *

So perhaps there is a way to support at least the api.wordpress.org requests with the browser's native fetch()? Let's revisit this idea.
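For reference, the check the browser applies to that header can be approximated as below. This is a simplified sketch with a hypothetical helper name (`isReadableCrossOrigin`). It ignores credentialed requests and preflights, which add further rules.

```typescript
// Hypothetical helper: would a script running at `pageOrigin` be allowed to
// read a response carrying these headers? Simplified on purpose: it ignores
// credentials, preflight requests, and the Vary header.
function isReadableCrossOrigin(
  responseHeaders: Record<string, string>,
  pageOrigin: string
): boolean {
  for (const name of Object.keys(responseHeaders)) {
    if (name.toLowerCase() === "access-control-allow-origin") {
      const allowed = responseHeaders[name].trim();
      // "*" admits any origin; otherwise the value must match the page origin.
      return allowed === "*" || allowed === pageOrigin;
    }
  }
  // No access-control-allow-origin header: the browser hides the response.
  return false;
}
```

With the header shown in the curl output above, the function returns `true` for any page origin, which is why api.wordpress.org looks like a good candidate for plain fetch().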

@adamziel
Collaborator Author

adamziel commented Feb 2, 2023

Networking is supported in the Node.js build as of #119 – PHP sends data through a WebSocket to a local TCP proxy that handles the required network calls.

I can think of three ways to implement in-browser support:

  • A server-side TCP proxy – the least handy of all, has terrible security implications.
  • An in-browser TCP proxy – could be implemented as a browser extension, although Google Chrome deprecated the socket.tcp API for extensions.
  • TCP to HTTP rewriting – The WebSocket class could be replaced with one that concatenates all the sent data and then reconstructs a fetch() call from it. Then, PHP can be compiled without OpenSSL support, or made to treat all https requests as http ones, so that the WebSocket shim could read raw data. The proxy itself could work as a same-tab fetch() or as a browser-extension fetch() to work around the CORS limitations. This wouldn't support arbitrary network traffic, but would be perhaps good enough for the most popular use-cases.
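A rough sketch of the third option, assuming the shim has already buffered the complete plain-text request. `parseRawHttpRequest` and `ParsedRequest` are hypothetical names, the body is treated as text, and the request target is assumed to be in origin form (a path, not an absolute URL).

```typescript
interface ParsedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: turns the raw HTTP/1.1 request that PHP wrote to the
// socket into the pieces a fetch() call needs.
function parseRawHttpRequest(raw: string, scheme: string = "https"): ParsedRequest {
  const headerEnd = raw.indexOf("\r\n\r\n");
  if (headerEnd === -1) {
    throw new Error("Incomplete request: missing header terminator");
  }
  const headLines = raw.slice(0, headerEnd).split("\r\n");
  const body = raw.slice(headerEnd + 4);

  // Request line, e.g. "GET /path HTTP/1.1".
  const requestLine = headLines[0].split(" ");
  if (requestLine.length !== 3) {
    throw new Error("Malformed request line");
  }
  const method = requestLine[0];
  const path = requestLine[1];

  const headers: Record<string, string> = {};
  let host = "";
  for (const line of headLines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not "Name: value"
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (name === "host") {
      // Host becomes part of the URL; fetch() does not let scripts set it.
      host = value;
    } else {
      headers[name] = value;
    }
  }
  if (host === "") {
    throw new Error("Missing Host header");
  }
  return { method: method, url: scheme + "://" + host + path, headers: headers, body: body };
}
```

The result maps directly onto `fetch(parsed.url, { method, headers, body })`. A real shim would also need to handle binary bodies and requests that arrive in several writes.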

@adamziel
Collaborator Author

adamziel commented Feb 3, 2023

Also linking to this related discussion.

@adamziel
Collaborator Author

adamziel commented Apr 29, 2023

Libraries like Composer require HTTPS and they verify the peer certificate by default: https://github.com/composer/composer/blob/11879ea737978fabb8127616e703e571ff71b184/src/Composer/Util/StreamContextFactory.php#L183-L197

As a workaround, networking in the browser could:

  • Give PHP a fake wildcard CA cert
  • Implement a fake endpoint for all HTTP requests that would feed PHP the fake certificate
  • Parse the incoming request and re-issue it using fetch()
  • Parse the response, encrypt using the fake certificate, feed it back to PHP

This will only work for endpoints exposing proper CORS headers, but it's a start.

@dmsnell
Member

dmsnell commented May 1, 2023

Give PHP a fake wildcard CA cert

why not use a real chain of trust?
I'm very leery of building a system whose default is to strip away all security from TLS connections and present trust for everything.

particularly if we're trying to make it easy to instantly spool up systems with a blueprint, this could so easily lead to cross-site attacks: "Hey look at the plugin I wrote: [malware link]"

for what it's worth, the default Erlang net library sets verify_peer to false and it's a disaster because nobody remembers to activate it and supply proper certs.

maybe I'm misreading this, but I'd rather us avoid that mistake if it's what I think we're talking about

@adamziel
Collaborator Author

adamziel commented May 3, 2023

why not use a real chain of trust?

We do in Node.js. Browsers can’t open raw TCP sockets so we need to re-issue the request using fetch(). The only way to do it is to MITM the PHP program to parse the encrypted request data.

@adamziel
Collaborator Author

Hosting a websocket proxy on e.g. free CloudFlare tier could solve this for now.

@eliot-akira
Collaborator

eliot-akira commented Jun 25, 2023

Hosting a websocket proxy

Possible candidates:


EDIT: Oh, I see there's already something like this implemented in @php-wasm/node, based on maximegris/node-websockify.

https://github.com/WordPress/wordpress-playground/blob/trunk/packages/php-wasm/node/src/lib/networking/outbound-ws-to-tcp-proxy.ts

@adamziel
Collaborator Author

adamziel commented Jun 25, 2023

Oh, I see there's already something like this implemented in @php-wasm/node, based on maximegris/node-websockify.

Yup, it is used in the @php-wasm/cli, VS Code extension, and wp-now. The same proxy would just work with the web version if it was hosted somewhere. The custom parts were added to support setsockopt().

@fritexvz

I wonder what could be achieved, if anything, by using Cloudflare TCP Sockets and running WP Playground on a Cloudflare Worker / WASM / Node.js?

@geekodour

Just to add more context to @fritexvz 's reply, running the playground on wordpress has been discussed here:
#69

@adamziel adamziel mentioned this issue Oct 4, 2023
10 tasks
@adamziel
Collaborator Author

adamziel commented Mar 4, 2024

#1051 implements an HTTPS termination function. All PHP-initiated network traffic is intercepted by a "fake WebSocket" instance, which offers a self-signed HTTPS certificate, reads the raw HTTP traffic, rewrites it as a fetch() call, and streams the response back to PHP. Note that this may only work for HTTP and HTTPS requests to URLs exposing valid CORS headers. It won't work for arbitrary sockets.

That PR needs a lot of cleaning up, but the concept seems to be solid. It would unblock support for libcurl and stream wrappers like file_get_contents("https://...").

@adamziel
Collaborator Author

adamziel commented Mar 22, 2024

It took 1.5 years but we now have a clear path to resolving this issue 🎉

This would enable requesting all CORS-enabled HTTPS endpoints.

For full networking support, we'd also need the following:

  • Expose the Node networking proxy as a separate, runnable script
  • Provide an API to connect it to the in-browser version of Playground
  • Document the workflow

The proxy wouldn't be hosted on Playground.wordpress.net as it would be a resource drain, but we could make spinning your own proxy instance easy enough.

Nice to haves:

  • Ship a version of the network proxy built as a PHP script to enable running a full-featured Playground build in the same environments as WordPress.
  • Provide a Dockerfile to set up the network proxy and a few buttons for quickly spinning up proxy nodes in the cloud, e.g. on Cloudflare, DigitalOcean, etc.

adamziel added a commit that referenced this issue Apr 29, 2024
Ships the Node.js version of PHP built with `--with-libcurl` option to support the curl extension.

It also changes two nuances in the overall PHP build process:

* It replaces the `select(2)` function using `-Wl,--wrap=select` emcc
option instead of patching PHP source code – this enables supporting
asynchronous `select(2)` in curl without additional patches.
* Brings the `__wrap_select` implementation more in line with
`select(2)` and adds support for `POLLERR`.
* Adds support for polling file descriptors that represent neither child
processes nor streams in `poll(2)` – that's because `libcurl` polls
`/dev/urandom`.

Builds on top of and supersedes
#1133

## Debugging Asyncify problems

The [typical way of resolving Asyncify
crashes](https://wordpress.github.io/wordpress-playground/architecture/wasm-asyncify/)
didn't work during the work on this PR. Functions didn't come up in the
error messages and even raw stack traces. The reasons are unclear.

[The JSPI build of
PHP](#1339) was
more helpful as it enabled logging the current stack trace in all the
asynchronous calls, which quickly revealed all the missing
`ASYNCIFY_ONLY` functions. This is the way to debug any future issues
until we fully migrate to JSPI.

## Testing Instructions

Confirm the CI checks pass. This PR ships a few new tests specifically
targeting networking with curl.


## Related resources

* #85
* #1093

---------

Co-authored-by: Adam Zieliński <adam@adamziel.com>
Co-authored-by: MHO <yannick@chillpills.io>
adamziel pushed a commit that referenced this issue May 1, 2024
## What is this PR doing?
Due to the Content Security Policy, the link to the GitHub issue does
not open within the Playground 'Add new plugin' and 'Add new theme'
pages. To fix this, add the _target_ attribute to load the link in a new
tab.

## What problem is it solving?
The error occurs when someone clicks on the "experimental, opt-in
feature" link within the Playground due to the CSP.

## How is the problem addressed?
Added the **target** attribute to the hyperlink to load the link in a
new tab.

## Testing Instructions

1. Open playground 
2. Navigate to the **Plugins > Add New Plugins** or **Appearance >
Themes > Add New Theme (button)**
3. You can notice the admin error notice mentioning "Network access is
an [experimental, opt-in
feature](#85),
which means you need to enable it to allow Playground to access the
Plugins/Themes directories."
4. Click on the "experimental, opt-in feature" hyperlink, where you can
notice the GitHub issue link is not loaded due to the CSP.
@adamziel adamziel moved this to Future work in Playground Board Jun 30, 2024
@jeffpaul
Member

@adamziel would love to chat about this at WCUS Contributor Day if you'll be around?

@adamziel
Collaborator Author

adamziel commented Sep 13, 2024

Hey @jeffpaul! Unfortunately I won't be around at WCUS :( But let me loop in @dmsnell who I know will be there. Alternatively, we could connect on .org slack or zoom.

adamziel added a commit that referenced this issue Oct 23, 2024
Enables the CURL PHP extension on playground.wordpress.net when
networking is enabled.

The heavy lifting was done in #1926. All this PR does is:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so
  that we can easily learn which PHP.wasm build Playground is running.

Related to #85
Closes #1008

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could
also try installing a CURL-reliant plugin such as Plausible and confirm
it installs without the fatal errors reported in #1008
@adamziel
Collaborator Author

I merged this significant milestone earlier today:

Next up:

adamziel added a commit that referenced this issue Oct 24, 2024
Enables the CURL PHP extension on
[playground.wordpress.net](http://playground.wordpress.net/) when
networking is enabled. This is made possible by the TLS 1.2
implementation merged in #1926.

This PR:

* Enables the curl extension
* Rebuilds PHP.wasm for the web
* Enables curl_exec and curl_multi_exec functions in web browsers
* **Strips the response content-length and switches to
Transfer-Encoding: Chunked**
* Unrelated – adds a JSPI vs Asyncify indication to the SAPI name so
that we can easily learn which PHP.wasm build Playground is running

Related to #85
Closes #1008

## Why use Transfer-Encoding: chunked?

Web servers often respond with a combination of Content-Length
and Content-Encoding. For example, a 16KB text file may be compressed
to 4KB with gzip and served with a Content-Encoding of `gzip` and a
Content-Length of 4KB.

The web browser, however, exposes neither the Content-Encoding header
nor the gzipped data stream. All we have access to is the original
Content-Length value of the gzipped file and a decompressed data stream.

If we just pass that along to the PHP-side request handler, it would
see a 16KB body stream with a Content-Length of 4KB. It would then
truncate the body stream at 4KB and discard the rest of the data.

This is not what we want.

To correct that behavior, we're stripping the Content-Length entirely.
We do that for every single response because we don't have any way
of knowing whether any Content-Encoding was used. Furthermore, we can't
just calculate the correct Content-Length value without consuming the
entire content stream – and we want to pass each data chunk to PHP
as we receive it.

Instead of a fixed Content-Length, this PR uses Transfer-Encoding:
chunked and then prefixes each chunk with its own length.
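As an illustration of the framing described above, here is a minimal sketch of HTTP/1.1 chunked encoding. It is not the code from this PR, and `asciiBytes`, `encodeChunk`, `encodeLastChunk`, and `rewriteResponseHeaders` are hypothetical names. Each chunk is written as its size in hexadecimal, CRLF, the data, and CRLF. A zero-sized chunk ends the body.

```typescript
// Converts an ASCII string to bytes (kept dependency-free on purpose).
function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0x7f;
  }
  return out;
}

// Frames one body chunk as "<size in hex>\r\n<data>\r\n".
function encodeChunk(data: Uint8Array): Uint8Array {
  // An empty chunk would be read as the end of the body, so emit nothing.
  if (data.length === 0) {
    return new Uint8Array(0);
  }
  const head = asciiBytes(data.length.toString(16) + "\r\n");
  const tail = asciiBytes("\r\n");
  const out = new Uint8Array(head.length + data.length + tail.length);
  out.set(head, 0);
  out.set(data, head.length);
  out.set(tail, head.length + data.length);
  return out;
}

// The zero-sized chunk that terminates a chunked body (no trailers).
function encodeLastChunk(): Uint8Array {
  return asciiBytes("0\r\n\r\n");
}

// Drops Content-Length (and any stale Transfer-Encoding) and announces
// chunked transfer instead.
function rewriteResponseHeaders(
  headers: Record<string, string>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase();
    if (lower === "content-length" || lower === "transfer-encoding") continue;
    out[name] = headers[name];
  }
  out["Transfer-Encoding"] = "chunked";
  return out;
}
```

Because every chunk carries its own size, the stream can be forwarded to PHP piece by piece without knowing the total decompressed length in advance.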

## Testing instructions

Confirm the new E2E tests are sound and that they work in CI. You could
also try installing a CURL-reliant plugin such as Plausible and confirm
it installs without the fatal errors reported in #1008
@adamziel
Collaborator Author

adamziel commented Oct 24, 2024

Curl has been available in web browsers since #1935. fetch() is used as the network transport, so the typical CORS limitations apply.

To solve, say, ~80% of the problem, we'd need to open up the CORS Proxy beyond talking to git. This is coming in the short to medium term.

To solve 100% of the problem, we'd need to tunnel the raw TCP traffic coming from Playground over a persistent WebSocket connection. In this scenario, we'd need a https://playground.wordpress.net/tcp-over-ws.php endpoint that would use stream_select to ingest data from Playground, pipe it to the network, and pipe the response bytes back to Playground. Definitely possible, especially with AsyncHttp\Client, but it's also non-trivial and I'm not sure what kind of appetite y'all have for such a feature. For now I'm taking a wild guess this is a very low priority project. If this is something that would help you, please comment on this issue and describe your use-case – if enough people come in, I'm happy to make it happen.

For now, here's what we need to close this issue:
