Skip to content

lightpanda-io/browser

Repository files navigation

Logo

Lightpanda Browser

lightpanda.io


Lightpanda is the open-source browser made for headless usage:

  • Javascript execution
  • Support of Web APIs (partial, WIP)
  • Compatible with Playwright, Puppeteer through CDP (WIP)

Fast scraping and web automation with minimal memory footprint:

  • Ultra-low memory footprint (9x less than Chrome)
  • Exceptionally fast execution (11x faster than Chrome) & instant startup

See benchmark details.

Why?

Javascript execution is mandatory for the modern web

In the good old days, scraping a webpage was as easy as making an HTTP request, cURL-like. It’s not possible anymore, because Javascript is everywhere, like it or not:

  • Ajax, Single Page App, infinite loading, “click to display”, instant search, etc.
  • JS web frameworks: React, Vue, Angular & others

Chrome is not the right tool

If we need Javascript, why not use a real web browser? Take a huge desktop application, hack it, and run it on the server. Hundreds or thousands of instances of Chrome if you use it at scale. Are you sure it’s such a good idea?

  • Heavy on RAM and CPU, expensive to run
  • Hard to package, deploy and maintain at scale
  • Bloated, lots of features are not useful in headless usage

Lightpanda is built for performance

If we want both Javascript and performance in a true headless browser, we need to start from scratch. Not another iteration of Chromium, really from a blank page. Crazy right? But that’s we did:

  • Not based on Chromium, Blink or WebKit
  • Low-level system programming language (Zig) with optimisations in mind
  • Opinionated: without graphical rendering

Status

Lightpanda is still a work in progress and is currently at a Beta stage.

⚠️ You should expect most websites to fail or crash.

Here are the key features we have implemented:

  • HTTP loader
  • HTML parser and DOM tree (based on Netsurf libs)
  • Javascript support (v8)
  • Basic DOM APIs
  • Ajax
    • XHR API
    • Fetch API
  • DOM dump
  • Basic CDP/websockets server

NOTE: There are hundreds of Web APIs. Developing a browser (even just for headless mode) is a huge task. Coverage will increase over time.

You can also follow the progress of our Javascript support in our dedicated zig-js-runtime project.

Quick start

Install from the nightly builds

You can download the last binary from the nightly builds for Linux x86_64 and MacOS aarch64.

# Download the binary
$ wget https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux
$ chmod a+x ./lightpanda-x86_64-linux
$ ./lightpanda-x86_64-linux -h
usage: ./lightpanda-x86_64-linux [options] [URL]

  start Lightpanda browser

  * if an url is provided the browser will fetch the page and exit
  * otherwhise the browser starts a CDP server

  -h, --help      Print this help message and exit.
  --host          Host of the CDP server (default "127.0.0.1")
  --port          Port of the CDP server (default "9222")
  --timeout       Timeout for incoming connections of the CDP server (in seconds, default "3")
  --dump          Dump document in stdout (fetch mode only)

Dump an URL

$ ./lightpanda-x86_64-linux --dump https://lightpanda.io
info(browser): GET https://lightpanda.io/ http.Status.ok
info(browser): fetch script https://api.website.lightpanda.io/js/script.js: http.Status.ok
info(browser): eval remote https://api.website.lightpanda.io/js/script.js: TypeError: Cannot read properties of undefined (reading 'pushState')
<!DOCTYPE html>

Start a CDP server

$ ./lightpanda-x86_64-linux --host 127.0.0.1 --port 9222
info(websocket): starting blocking worker to listen on 127.0.0.1:9222
info(server): accepting new conn...

Once the CDP server started, you can run a Puppeteer script by configuring the browserWSEndpoint.

'use scrict'

import puppeteer from 'puppeteer-core';

// use browserWSEndpoint to pass the Lightpanda's CDP server address.
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

// The rest of your script remains the same.
const context = await browser.createBrowserContext();
const page = await context.newPage();

await page.goto('https://wikipedia.com/');

await page.close();
await context.close();

Build from sources

Prerequisites

Lightpanda is written with Zig 0.13.0. You have to install it with the right version in order to build the project.

Lightpanda also depends on zig-js-runtime (with v8), Netsurf libs and Mimalloc.

To be able to build the v8 engine for zig-js-runtime, you have to install some libs:

For Debian/Ubuntu based Linux:

sudo apt install xz-utils \
    python3 ca-certificates git \
    pkg-config libglib2.0-dev \
    gperf libexpat1-dev \
    cmake clang

For MacOS, you only need cmake:

brew install cmake

Install and build dependencies

All in one build

You can run make install to install deps all in one (or make install-dev if you need the development versions).

Be aware that the build task is very long and cpu consuming, as you will build from sources all dependancies, including the v8 Javascript engine.

Step by step build dependancy

The project uses git submodules for dependencies.

To init or update the submodules in the vendor/ directory:

make install-submodule

Netsurf libs

Netsurf libs are used for HTML parsing and DOM tree generation.

make install-netsurf

For dev env, use make install-netsurf-dev.

Mimalloc

Mimalloc is used as a C memory allocator.

make install-mimalloc

For dev env, use make install-mimalloc-dev.

Note: when Mimalloc is built in dev mode, you can dump memory stats with the env var MIMALLOC_SHOW_STATS=1. See https://microsoft.github.io/mimalloc/environment.html.

zig-js-runtime

Our own Zig/Javascript runtime, which includes the v8 Javascript engine.

This build task is very long and cpu consuming, as you will build v8 from sources.

make install-zig-js-runtime

For dev env, use make iinstall-zig-js-runtime-dev.

Test

Unit Tests

You can test Lightpanda by running make test.

Web Platform Tests

Lightpanda is tested against the standardized Web Platform Tests.

The relevant tests cases are committed in a dedicated repository which is fetched by the make install-submodule command.

All the tests cases executed are located in the tests/wpt sub-directory.

For reference, you can easily execute a WPT test case with your browser via wpt.live.

Run WPT test suite

To run all the tests:

make wpt

Or one specific test:

make wpt Node-childNodes.html

Add a new WPT test case

We add new relevant tests cases files when we implemented changes in Lightpanda.

To add a new test, copy the file you want from the WPT repo into the tests/wpt directory.

⚠️ Please keep the original directory tree structure of tests/wpt.

Contributing

Lightpanda accepts pull requests through GitHub.

You have to sign our CLA during the pull request process otherwise we're not able to accept your contributions.