Censorship resistant bootstrapping (e.g. for wikipedia) #3908

Open
ianopolous opened this issue May 8, 2017 · 21 comments
Labels
effort/weeks (Estimated to take multiple weeks)
exp/expert (Having worked on the specific codebase is important)
P1 (High: Likely tackled by core team if no one steps up)
status/in-progress (In progress)
topic/dht (Topic dht)

@ianopolous
Member

Version information:

N/A

Type:

Enhancement

Severity:

Medium (unless you're in Turkey, then High)

Description:

I was thinking about the attack vectors for censoring the Wikipedia copy recently hosted on IPFS for Turkey, and I believe a significant weak point is the bootstrap process. Currently bootstrapping relies on a hardcoded list of domains/IPs in a config file. This list is public and easy for an oppressor to add to a blacklist.

One proposed mitigation would be a fallback bootstrap method that uses Tor. The Tor project has thought a lot more about attacks in this area, and using Tor would be easy. The simplest version would be a Tor client that just contacts one of the bootstrap nodes through Tor and then bootstraps via that node. Clearly this is only as strong as Tor's own bootstrapping mechanism, but as mentioned above that is a well-studied problem.

This would mean that a binary of ipfs that was distributed in Turkey through USB sticks would still work even if ipfs.io and all the public ipfs bootstrap nodes were blacklisted.

@Kubuxu
Member

Kubuxu commented May 8, 2017

One idea we had was to use already-known nodes for launches after the first. We could also ship IPFS with hundreds of nodes, weight-sorted according to some criteria, and use both of those mechanisms at the same time.
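
A minimal sketch of how the two mechanisms could be combined, assuming each candidate carries a numeric weight. The `candidate` type, the weights, and the addresses (taken from documentation IP ranges, without peer IDs) are all hypothetical; the thread does not say which criteria the weights would come from.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a possible bootstrap peer with a hypothetical quality score.
type candidate struct {
	Addr   string
	Weight float64
}

// mergeCandidates combines peers remembered from earlier runs with the list
// shipped in the binary, keeps the higher weight for duplicates, and returns
// the result sorted from most to least preferred.
func mergeCandidates(remembered, shipped []candidate) []candidate {
	best := make(map[string]float64)
	for _, list := range [][]candidate{remembered, shipped} {
		for _, c := range list {
			if w, ok := best[c.Addr]; !ok || c.Weight > w {
				best[c.Addr] = c.Weight
			}
		}
	}
	merged := make([]candidate, 0, len(best))
	for addr, w := range best {
		merged = append(merged, candidate{Addr: addr, Weight: w})
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Weight != merged[j].Weight {
			return merged[i].Weight > merged[j].Weight
		}
		return merged[i].Addr < merged[j].Addr // deterministic tie-break
	})
	return merged
}

func main() {
	// Placeholder data: real entries would also carry a peer ID.
	remembered := []candidate{
		{Addr: "/ip4/192.0.2.10/tcp/4001", Weight: 0.9},
		{Addr: "/ip4/198.51.100.7/tcp/4001", Weight: 0.4},
	}
	shipped := []candidate{
		{Addr: "/ip4/198.51.100.7/tcp/4001", Weight: 0.6},
		{Addr: "/ip4/203.0.113.5/tcp/4001", Weight: 0.5},
	}
	for _, c := range mergeCandidates(remembered, shipped) {
		fmt.Printf("%.1f %s\n", c.Weight, c.Addr)
	}
}
```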

@ianopolous
Member Author

@Kubuxu I thought that once a node had bootstrapped once it didn't need the bootstrap nodes again? Does it use them every time it restarts?

@ianopolous
Member Author

I think if they are going to blacklist your bootstrap node list, then it doesn't matter how long the list is.

@Kubuxu
Member

Kubuxu commented May 8, 2017

Unfortunately, currently yes.

@ianopolous
Member Author

Ok then that's an independent problem to address, but I would say that is a lower priority than a proper Tor based fix (which would solve both problems for many threat models).

@Kubuxu
Member

Kubuxu commented May 8, 2017

How does Tor bootstrap? Wouldn't it face similar problems?

We can certainly run a Tor hidden service with a continually updated list of nodes one could bootstrap from.

@ianopolous
Member Author

Tor does face similar problems, but they have spent a long time trying to solve them, for example using unpublished bridges rather than the public directory servers.

@ianopolous
Member Author

ianopolous commented May 8, 2017

Note I wasn't suggesting running a Tor hidden service in the simplest case, just using Tor to access a public ipfs bootstrap node.

@djdv
Contributor

djdv commented May 9, 2017

@Kubuxu

> Also we could ship IPFS with hundreds of nodes weight sorted according to some criteria and use both of those mechanisms at the same time.

Would something like a dynamic list be useful, where it's populated with nodes sorted by daemon uptime (as well as adding some randomness to the selection process)?
For instance a list of nodes that's generated/published to ipns every few hours. It might help against static blacklists that are manually updated, but I don't know how common that kind of setup is, so it may be pointless.

In addition, there's still the matter of connecting to grab that list for the first time, whether through IPNS or something else. The only peers you could really rely on reaching would be ones that something like mDNS would pick up, i.e. physically close ad-hoc networks used for bootstrapping; from there you could do either peer exchange or maybe message relaying. I've got no idea on that aspect.

@matthewrobertbell

A possible option would be to take inspiration from how botnets work: Use a dynamic set of domains / subdomains (changing over time) to publish a list of bootstrap nodes which change over time, either via HTTP or DNS TXT records.
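
For the DNS TXT variant, a hedged sketch in Go: `net.LookupTXT` is the standard-library call for fetching TXT records, while the `bootstrap=` record prefix and the function names are conventions invented for this example. The demonstration in `main` parses made-up records and sends no DNS query.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// txtPrefix is a hypothetical marker for records that carry a bootstrap address.
const txtPrefix = "bootstrap="

// parseBootstrapTXT keeps the records that start with txtPrefix and whose
// value looks like a multiaddr (leading "/"), and returns the bare addresses.
func parseBootstrapTXT(records []string) []string {
	var addrs []string
	for _, rec := range records {
		rec = strings.TrimSpace(rec)
		if !strings.HasPrefix(rec, txtPrefix) {
			continue
		}
		addr := strings.TrimPrefix(rec, txtPrefix)
		if strings.HasPrefix(addr, "/") {
			addrs = append(addrs, addr)
		}
	}
	return addrs
}

// lookupBootstrapAddrs queries the TXT records of domain and extracts addresses.
func lookupBootstrapAddrs(domain string) ([]string, error) {
	records, err := net.LookupTXT(domain)
	if err != nil {
		return nil, err
	}
	return parseBootstrapTXT(records), nil
}

func main() {
	// Offline demonstration with made-up records.
	sample := []string{
		"v=spf1 -all",
		"bootstrap=/ip4/192.0.2.1/tcp/4001",
		"bootstrap=not-a-multiaddr",
		" bootstrap=/ip6/2001:db8::1/tcp/4001 ",
	}
	for _, a := range parseBootstrapTXT(sample) {
		fmt.Println(a)
	}
}
```

The rotating-domain part of the proposal would sit on top of this: the caller would try `lookupBootstrapAddrs` against each candidate domain in turn until one answers.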

@ghost

ghost commented Jul 18, 2017

Another idea is domain fronting, e.g. with Google Cloud: libp2p/libp2p#18

@elitak

elitak commented Sep 17, 2017

I suggest adding cmdlets for as many circumvention methods as can be imagined, e.g.:

ipfs bootstrap scrape raw file:///mnt/usb/bootstrap-list.txt # provide support for as many URI schemes as possible, including ssh://, magnet:
ipfs bootstrap scrape tor # grabs a list from wellknown1.onion, wellknown2.onion and adds it to bootstrap list, via the http proxy hooked up to tor on localhost port NNNN(changeable in config)
ipfs bootstrap scrape domain-front reputable-domain.com #sends some standardized request over https, requires complicity by the reputable-domain
ipfs bootstrap scrape dns fast-or-double-flux-bootstrapper.com
ipfs bootstrap scrape irc irc.efnet.net ipfs-bootstrap # talks to a bot on that server+channel in a predefined way
ipfs bootstrap scrape twitter @username # reads tweets in rev-chrono order on this account and interprets them as URIs to dial until N successes
ipfs bootstrap scrape bittorrent-dht # use bittorrent DHT to find potential endpoints; i.e., ipfs running on same addresses. That subset is selected by fetching a specific torrent that signifies the host also acts as an ipfs bootstrap node
ipfs bootstrap scrape netscan # last hope; just dials IPv4/6 addresses randomly on port 4001 until it hits something

Some of these are trivial to implement as bash scripts. In Go, it would take (me, at least) a bit more effort, but each could be written conforming to a simple plugin API, then static-linked in along with whatever meta-bootstrap data are needed.

@kpcyrd
Contributor

kpcyrd commented Sep 17, 2017

@elitak that might integrate nicely. Given the command ipfs bootstrap scrape foo arg1 --arg=2, ipfs could try to execute ipfs-bootstrap-scrape-foo-fetch arg1 --arg=2 and read addresses from its output. This way you can write plugins for bootstrapping in any language, similar to how git works. Preferably the command would be shorter.

As an alternative, I think you can always add nodes using the ipfs api with a project separate from ipfs.

@whyrusleeping
Member

ipfs does have a method for allowing git-style pluggable programs as subcommands (ipfs update works this way). It's currently whitelisted, so ipfs has to at least know it should try searching for a given external subcommand before it will work.

@elitak

elitak commented Sep 19, 2017

Much as I'm tempted to hack away at adding all these as bash scripts, I think it'd be foolhardy not to implement them instead in Go, so as to carry forward the static-linked, cross-platform portability that's already afforded, especially so for utilities whose audience may not be very tech-savvy to begin with.

I'll have a stab at adding some basic ones, but no promises on how soon.

@elitak

elitak commented Sep 19, 2017

To expand on using bittorrent's DHT: the hypothetical trick I came up with would be to compute the infohash (.torrent without the header) for a file generated locally, containing something like "0.1.0;ip4;4001" (no newline, could also be JSON), to look up all daemons running the v0.1.0 protocol on IPv4 addresses on port 4001. Using (preferably) a built-in minimal bittorrent client, or an external one, the ipfs daemon would obtain a list of addresses for peers distributing that hash on the bittorrent DHT network, and assume that each such peer was thereby advertising an ipfs daemon matching the criteria, hosted at the same IPv4/6 address. Any outsider could include his daemon in the bootstrap list by simply seeding the appropriate file in a DHT-enabled bittorrent client running on the same machine.

The greatest weaknesses I see with this method are that ports need to be guessed (4001 probably being the only decent candidate) and the high chance that the bittorrent DHT bootstrap nodes are blocked along with the ipfs ones. The latter could be offset by also scraping the hash from common public trackers (http and https, udp), obtaining similar results, in complete absence of DHT connectivity. DHT-type networks from other p2p apps could be utilized in the same fashion, probably.
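
To make the rendezvous-hash idea concrete, here is a small Go sketch that computes a BitTorrent v1 infohash, i.e. the SHA-1 of the bencoded "info" dictionary, for a single-file, single-piece torrent. The file name `ipfs-bootstrap` and the piece length of 16384 are assumptions picked for the example; every participant would have to agree on them exactly, because both values are part of the hashed dictionary. Only the content string "0.1.0;ip4;4001" comes from the comment above.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
)

// bencodeString encodes s as a bencoded byte string: "<length>:<bytes>".
func bencodeString(s string) string {
	return fmt.Sprintf("%d:%s", len(s), s)
}

// bencodeInt encodes n as a bencoded integer: "i<n>e".
func bencodeInt(n int) string {
	return fmt.Sprintf("i%de", n)
}

// rendezvousInfoHash builds the info dictionary of a single-file torrent whose
// content fits in one piece and returns the SHA-1 of that dictionary as hex.
// Dictionary keys must appear in sorted order: length, name, piece length, pieces.
func rendezvousInfoHash(name, content string, pieceLength int) (string, error) {
	if content == "" || len(content) > pieceLength {
		return "", errors.New("content must be non-empty and fit in one piece")
	}
	pieceHash := sha1.Sum([]byte(content)) // the only piece is the whole file
	info := "d" +
		bencodeString("length") + bencodeInt(len(content)) +
		bencodeString("name") + bencodeString(name) +
		bencodeString("piece length") + bencodeInt(pieceLength) +
		bencodeString("pieces") + bencodeString(string(pieceHash[:])) +
		"e"
	sum := sha1.Sum([]byte(info))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Hypothetical agreed-upon parameters; only the content string is from the thread.
	hash, err := rendezvousInfoHash("ipfs-bootstrap", "0.1.0;ip4;4001", 16384)
	if err != nil {
		panic(err)
	}
	fmt.Println("look up peers announcing infohash", hash)
}
```

A daemon would then ask the BitTorrent DHT (or public trackers) for peers announcing this hash and try port 4001 on each returned address, with the weaknesses noted above still applying.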

@raulk
Member

raulk commented Feb 15, 2019

Ideas being discussed in libp2p/go-libp2p-kad-dht#254.

@Stebalien added the status/in-progress, P1, topic/dht, exp/expert, effort/days, and effort/weeks labels and removed the effort/days label on May 29, 2020
@Jorropo
Contributor

Jorropo commented Jul 27, 2023

This was done in #8856

@Jorropo Jorropo closed this as completed Jul 27, 2023
@github-project-automation github-project-automation bot moved this from 🥞 Todo to 🎉 Done in IPFS Shipyard Team Jul 27, 2023
@ianopolous
Member Author

I don't think #8856 solves this. That solves the secondary problem of subsequent restarts. The original problem of initial bootstrap still is unsolved.

@Jorropo Jorropo reopened this Jul 28, 2023
@Jorropo
Contributor

Jorropo commented Jul 28, 2023

I don't think there is a good solution to that which isn't overcomplicated.
If we, let's say, capture 100 nodes for every release and store them in the binary, someone could download each release and ban those 100 nodes every time. It's more work for them, but it does not really solve the problem. I still want to do something like this, but as a protection in case our bootstrappers are down.

Otherwise, "forum-based bootstrapping", where you ask someone to give you 100 random nodes and add them to your bootstrap list, is the only way to solve the initial boot without over-engineering a solution that is just going to put us in the treadmill problem.

@ianopolous
Member Author

Agreed, it is a hard problem.

There are some options that work for varying threat models:

  1. Using another network like Tor or I2P to contact existing bootstrappers
  2. Use domain fronting where it still works
  3. Investigate what Tor's meek-azure mode is doing
