🏛️ - An archivist browser controller that caches everything you browse, and a library server with full text search to serve your archive.
This project literally makes your web browsing available COMPLETELY OFFLINE. Your browser does not even know the difference. It's literally that amazing. Yes.
Save your browsing, then switch off the net, go to http://localhost:22120, switch the mode to serve, and browse what you browsed before. It all still works.
Warning: if you have Chrome open, 22120 will close it automatically when you start it, and then relaunch it. You may lose any unsaved work.
3 ways to get it:
- Get the binary from the releases page, or
- Install globally via npm:
npm i -g archivist1
- Or clone this repo and run it as a Node.js app:
npm i && npm start
Also, coming soon is a Chrome Extension.
Go to http://localhost:22120 in your browser, and follow the instructions.
Your archive will be located in $your_user_home_directory/22120-arc/public/library
Despite the directory name, it's not public, don't worry!
The archive format is:
22120-arc/public/library/<resource-origin>/<sha1-path-hash>.json
Each JSON file contains a JSON object with the headers, response code, key, and a base64-encoded response body.
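As a rough sketch of reading one archive entry, assuming the object uses the field names `headers`, `responseCode`, `key` and `body` (these names are guesses for illustration; check a real file in your archive for the actual ones):

```javascript
// Hypothetical field names: the real archive files may name these differently.
function decodeArchiveEntry(jsonText) {
  const entry = JSON.parse(jsonText);
  return {
    key: entry.key,
    status: entry.responseCode,
    headers: entry.headers,
    // Turn the stored base64 string back into raw bytes
    body: Buffer.from(entry.body, 'base64'),
  };
}

// A made-up record in the assumed shape, standing in for a file read from disk
const sample = JSON.stringify({
  key: 'GET https://example.com/',
  responseCode: 200,
  headers: { 'content-type': 'text/html' },
  body: Buffer.from('<h1>hello</h1>').toString('base64'),
});

const decoded = decodeArchiveEntry(sample);
console.log(decoded.status, decoded.body.toString('utf8'));
```

Keeping the body as a Buffer rather than a string means binary responses such as images survive the round trip.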
It uses the DevTools protocol to intercept all requests, and caches responses on disk against a key made of the method and URL. It also maintains an in-memory set of keys so it knows what it has on disk.
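The idea can be sketched roughly as below. This is not the project's actual code: the key format (`"METHOD URL"`), the class name `ArchiveIndex`, and the use of a `Map` as a stand-in for the on-disk files are all assumptions made for illustration.

```javascript
// Assumed key format: "<METHOD> <URL>". The real format may differ.
function cacheKey(method, url) {
  return `${method.toUpperCase()} ${url}`;
}

class ArchiveIndex {
  constructor() {
    this.keys = new Set(); // in-memory record of which keys are on disk
    this.disk = new Map(); // stand-in for the JSON files on disk
  }

  // Called when a response has been intercepted while saving
  save(method, url, response) {
    const key = cacheKey(method, url);
    this.disk.set(key, response);
    this.keys.add(key);
  }

  // Called when serving: check the in-memory set first, then read the response
  lookup(method, url) {
    const key = cacheKey(method, url);
    return this.keys.has(key) ? this.disk.get(key) : null;
  }
}

const index = new ArchiveIndex();
index.save('get', 'https://example.com/', { responseCode: 200 });
console.log(index.lookup('GET', 'https://example.com/'));
console.log(index.lookup('POST', 'https://example.com/'));
```

Because the method is part of the key, a GET and a POST to the same URL are stored as separate entries.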
No.
It interacts just fine with ad blockers. Anything an ad blocker stops will not be archived.
Seems pretty secure. It's not exposed to the public internet, and pages you load that try to use it cannot use the protocol for anything (except to open a new tab, which they can do anyway).
Yes, this is totally free to download and use. It's also open source, so do what you want with it.
- Full text search
- Library server to serve archive publicly.
- Distributed p2p web browser on IPFS
The following are probably hard (and I haven't thought much about them):
- Streaming content (audio, video)
- "Impure" request response pairs (such as if you call GET /endpoint once you get "A", if you call it a second time you get "AA", and other examples like this).
- WebSockets (how to capture and replay that faithfully?)
There is probably some way to do this, though.
Yes! Put any domains into $HOME/22120-arc/no.json, e.g.:
[
"*.google.com",
"*.cnn.co?"
]
It will not cache any resource with a host matching those. Wildcards: * (0 or more anything) and ? (0 or 1 anything).
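To see how those wildcards behave, here is a minimal sketch that turns a pattern into a regular expression. The helper names `wildcardToRegExp` and `isBlocked` are made up for this example, and matching against the whole host is an assumption; the project's own matcher may work differently.

```javascript
// Hypothetical helper: "*" becomes ".*" (0 or more of anything),
// "?" becomes ".?" (0 or 1 of anything), everything else is matched literally.
function wildcardToRegExp(pattern) {
  const source = pattern
    .split('')
    .map((ch) => {
      if (ch === '*') return '.*';
      if (ch === '?') return '.?';
      // Escape characters that have a special meaning in a regular expression
      return ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  // Assumption: the pattern must match the entire host
  return new RegExp(`^${source}$`);
}

function isBlocked(host, patterns) {
  return patterns.some((p) => wildcardToRegExp(p).test(host));
}

const patterns = ['*.google.com', '*.cnn.co?'];
console.log(isBlocked('mail.google.com', patterns));
console.log(isBlocked('edition.cnn.com', patterns));
console.log(isBlocked('example.org', patterns));
```

Under these assumptions, `*.cnn.co?` covers hosts ending in both `.cnn.co` and `.cnn.com`, since `?` allows at most one extra character.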
Yes, just make sure you set an environment variable called DEBUG_22120 to anything non-empty.
For example, on POSIX systems:
export DEBUG_22120=True
Yes, there's a control for changing the archive path on the control page: http://localhost:22120
There are a few command-line arguments. Their format is printed as the first line of output when you start the program.
For other things you can examine the source code.