This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Firefox - crashes with excessive thread activity #950

Closed
mitra42 opened this issue Aug 17, 2017 · 33 comments
Assignees
Labels
exp/wizard Extensive knowledge (implications, ramifications) required kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP

Comments

@mitra42

mitra42 commented Aug 17, 2017

Type: Bug

Severity: Severe

Description:

The most serious bug - that makes the system unusable - is the extreme thread usage on Firefox. Firefox normally runs with about 60 threads (according to Mac Activity Monitor)

As soon as I open a page which starts IPFS/IIIF it adds about another 40 threads, BUT the Idle Wake Ups start hitting the tens of thousands (three orders of magnitude more than anything else on the box). Load Average (as reported by “top” or “uptime”) starts rising and quickly exceeds 100, by which time the machine is running too slowly to do much more than kill Firefox.

Is anyone else running js-ipfs in the browser? Are you seeing this, especially with IIIF?

Note - on Chrome the behavior is different - there aren't as many wake ups, but CPU load grows (slower) until it hits a point where Chrome gives the "Aw Snap! Something went wrong while displaying this webpage." error.

Steps to reproduce the error:

@daviddias daviddias added kind/bug A bug in existing code (including security flaws) exp/wizard Extensive knowledge (implications, ramifications) required P1 High: Likely tackled by core team if no one steps up labels Aug 18, 2017
@haadcode
Member

haadcode commented Aug 24, 2017

I've seen this has started happening in Orbit, too (see orbitdb-archive/orbit-web#9 (comment) and note that the bug itself is not related).

"That said, I've noticed Orbit has started randomly crashing (both Chrome and FF) and on Firefox it just kills the browser completely (hangs). This is seemingly random and I haven't been able to debug it yet."

This is indeed the same behaviour as described above by @mitra42.

Important to note here is that Orbit's code (deployed at orbit.chat) hasn't changed since April and as far as I can tell, only the browsers have gotten updated. So that makes me think that a) we always had this problem or b) it's something shared between the browsers (eg. webrtc). Or both.

There's an open bug for Firefox at https://bugzilla.mozilla.org/show_bug.cgi?id=1389812 which also has some interesting details re. webrtc usage, perhaps related.

@daviddias
Member

daviddias commented Aug 24, 2017

Thanks for reporting. This is mostly due to WebRTC being hungry for machine resources and the fact that so many operations are happening on the main thread.

Solutions for this problem are:

@daviddias
Member

Another thing that is especially annoying is Chrome's aggressive resource throttling and the lack of a way to disable it (other than running things in a Service Worker).

@mitra42
Author

mitra42 commented Aug 24, 2017

OK - this sounds pretty bad ... If I understand you.

  • We could run our own server at the Internet Archive following instructions at that address.
  • BUT wouldn't that disconnect us from the rest of IPFS, which somewhat removes the point of using a Content-addressed system in the first place.
  • ALSO - that still has scaling issues, i.e. as soon as any apps scaled they'd hit the same issue.
  • I don't quite get your point "Avoid using WebRTC at all and put IPFS running on a Service or WebWorker". We are using WebRTC because in order to get the ability to post/retrieve/notify on a list one of you suggested I used IIIF which specifies WebRTC.

And moving to a WebWorker seems like a massive re-architecting of the upper layers to get around this bug in lower layers, and I'm not sure it would even work since the problem is that the WebRTC (or how IIIF/IPFS is using it) causes it to consume huge amounts of resources.

I find on Chrome our demos crash within about 5 minutes, just sitting there with no UI or interaction with IPFS other than having started it.
IpfsIiifDb( {"ipfs":{"repo":"/tmp/ipfs_dweb20170820","config":{"Addresses":{"Swarm":["/libp2p-webrtc-star/dns4/star-signal.cloud.ipfs.team/wss"]},"Discovery":{"webRTCStar":{"Enabled":true}}},"EXPERIMENTAL":{"pubsub":true}},"store":"indexeddb","partition":"dweb20170820"} )

@daviddias
Member

@mitra42 seems that my points were missing some context. To clarify:

Ideal - Figure out Connection Closing for IPFS and implement it. This is doable and one of our priorities.

Connection Closing is a feature that any P2P network requires. It is currently a bottleneck in both go-ipfs and js-ipfs, and we've been actively working towards a scalable solution. We have multiple ideas and are discussing how to implement it.

You just happen to notice it faster in js-ipfs because a) the browser has far fewer resources than a go-ipfs node running on a desktop and b) WebRTC is super resource intensive; try any video chat app that uses it nowadays and your laptop fans will start flying :)

You can implement an application level connection closing policy and close connections that no longer interest you.

We could run our own server at the Internet Archive following instructions at that address.
BUT wouldn't that disconnect us from the rest of IPFS, which somewhat removes the point of using a Content-addressed system in the first place.

Yes, I proposed that and the reason is that it would be a way to introduce the nodes that are interested into the same content without having to connect to every other node in the network that is being booted for other apps.

It is a good policy to connect nodes that are interested in a topic faster (i.e orbit with orbit nodes, IIIF with IIIF nodes, etc).

Content Addressing will still work.

I don't quite get your point "Avoid using WebRTC at all and put IPFS running on a Service or WebWorker". We are using WebRTC because in order to get the ability to post/retrieve/notify on a list one of you suggested I used IIIF which specifies WebRTC.

WebRTC is resource intensive and browsers do not like that. You could actually turn on WebRTC transport and disable WebRTC-Star discovery that connects all the nodes automatically, enabling you to pick the ones you are using.

We are developing Relay, which will enable browser nodes to connect to any other node in the network through a WebSockets connection, vastly reducing resource consumption.

And moving to a WebWorker seems like a massive re-architecting of the upper layers to get around this bug in lower layers,

Separating the View layer from the Transport Layer shouldn't come as a foreign idea, that already happens for all the other transports in the Browser. In fact, we are already working on a Browser integration that would put IPFS in a background process so that you don't even have to load it in your app and one of the other solutions is exposing IPFS through an Extension.

There is a lot of work ahead of us, but progress is moving steadily and today we are able to run a full P2P protocol in the browser. Things will just get better moving forward :)

@mitra42 if you and your team would like to work more closely with us, either by creating tests, identifying performance benchmarks and so on, please let us know, we appreciate help from our open source contributors and I'm happy to give you pointers on how to improve certain things :)

@mitra42
Author

mitra42 commented Aug 24, 2017

You can implement an application level connection closing policy and close connections that no longer interest you.

How can we do this at the app level ? We aren't opening and closing connections at the app level, this is happening entirely inside IPFS library code.

We can run our own server, but as I said there will still be a scaling problem when the number of nodes on any app exceeds enough to bring WebRTC to its knees.

Content Addressing will still work.

Maybe I'm misunderstanding something, but if we have the app connect to a separate WebRTC server and store a block, and someone on some other app refers to the same block elsewhere (by its hash), will they still get it?

Re your point on WebRTC, I want to be clear: we aren't explicitly using WebRTC by choice, and we certainly aren't explicitly connecting to other nodes, since everything will be running in the browser. I was told (by Matt) that the best way to do what we needed was to use IIIF, and the IIIF examples used WebRTC (I'm pretty sure that's where the "WebRTC" link came from).

Separating the View layer from the Transport Layer shouldn't come as a foreign idea .

Agreed, and they are separate in our framework; it's the Transport Layer that is causing the problems! It's still a massive job to rearchitect something to use WebWorkers.

@diasdavid - I'd be happy to jump on a call to figure out how to make this move more smoothly, and to be of more help.

@daviddias
Member

daviddias commented Aug 25, 2017

How can we do this at the app level ? We aren't opening and closing connections at the app level, this is happening entirely inside IPFS library code.

You can disable Discovery. It defaults to true (see https://github.com/ipfs/js-ipfs/blob/master/src/core/runtime/config-browser.json#L13-L16) but you can override the config as explained in this section of the README: https://github.com/ipfs/js-ipfs#advanced-options-when-creating-an-ipfs-node
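
As a rough illustration of the override described above, the sketch below builds a set of constructor options with WebRTC-Star discovery switched off. The `Discovery.webRTCStar.Enabled` key and the option layout are taken from the config posted earlier in this thread; the `buildOptions` helper and the repo path are hypothetical, and the README remains the authority on which options a given js-ipfs release accepts.

```javascript
// Hypothetical helper: build js-ipfs constructor options with
// WebRTC-Star discovery disabled, so peers are not dialed automatically.
function buildOptions (repoPath, swarmAddrs) {
  return {
    repo: repoPath,
    config: {
      Addresses: { Swarm: swarmAddrs },
      // Same key as in the config quoted earlier, flipped to false
      Discovery: { webRTCStar: { Enabled: false } }
    },
    EXPERIMENTAL: { pubsub: true }
  }
}

const options = buildOptions('/tmp/ipfs_example', [
  '/libp2p-webrtc-star/dns4/star-signal.cloud.ipfs.team/wss'
])

// Assumption: the `ipfs` package is loaded as IPFS in the page.
// const node = new IPFS(options)
console.log(JSON.stringify(options.config.Discovery))
```

With discovery off, the application has to decide which peers to connect to itself, which is the trade-off being suggested here.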

We can run our own server, but as I said there will still be a scaling problem when the number of nodes on any app exceeds enough to bring WebRTC to its knees.

The number of open WebRTC Connections needs to be under control more strictly than other types of connections.
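
One way to picture the application-level connection closing policy mentioned earlier in the thread is a periodic cap on open connections. The sketch below is only an illustration: `selectPeersToDrop` and `enforceConnectionCap` are hypothetical names, and it assumes the node exposes `ipfs.swarm.peers()` and `ipfs.swarm.disconnect(addr)` in promise form, with each peer entry carrying an `addr` that can be converted to a string.

```javascript
// Hypothetical policy: keep at most `maxPeers` connections. Peers whose
// address contains one of the `keep` substrings are never dropped.
function selectPeersToDrop (peerAddrs, maxPeers, keep = []) {
  const isKept = (addr) => keep.some((k) => addr.includes(k))
  const keptCount = peerAddrs.filter(isKept).length
  const droppable = peerAddrs.filter((addr) => !isKept(addr))
  // Slots left for non-kept peers after the kept ones are counted
  const allowed = Math.max(0, maxPeers - keptCount)
  return droppable.slice(allowed)
}

// Assumption: `ipfs.swarm.peers()` resolves to [{ addr, ... }] and
// `ipfs.swarm.disconnect(addr)` closes the connection to that address.
async function enforceConnectionCap (ipfs, maxPeers, keep = []) {
  const peers = await ipfs.swarm.peers()
  const addrs = peers.map((p) => String(p.addr))
  const toDrop = selectPeersToDrop(addrs, maxPeers, keep)
  for (const addr of toDrop) {
    try {
      await ipfs.swarm.disconnect(addr)
    } catch (err) {
      // The peer may already have gone away; ignore and continue
    }
  }
  return toDrop
}
```

An application could call `enforceConnectionCap(node, 10, [...])` on a timer. Which peers count as interesting is left to the application, and the first-come ordering used here is an arbitrary choice.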

Relay will help this too because we will be able to multiplex multiple connections to multiple peers over one socket.

I was told (by Matt) that the best way to do what we needed was to use IIIF and the IIIF examples used WebRTC (I'm pretty sure that's where the "WebRTC" link came from).

And that was a good suggestion. The IIIF team enabled shared annotations over IPFS using IIIF-DB, which was designed to fit the needs of the project. I believe @flyingzumwalt was suggesting IIIF-DB as an inspiration and a learning tool for how to create a CRDT-powered DB over IPFS, not as an off-the-shelf solution that solves all your problems. Bear in mind that IIIF-DB is still under active development. You can read more about CRDTs and the work being actively developed at https://blog.ipfs.io/30-js-ipfs-crdts.md.

As we discussed over email some months ago, it would be excellent if you could share with us your goal for the application and the architecture that you are building. I feel that there is some shoehorning happening here and that it is creating some obstacles. Happy to chat more next week :)

@mitra42
Author

mitra42 commented Aug 25, 2017

Yes - our goal was, and still is, to build on top of IPFS, however pretty much everything we tried didn't function or didn't function as documented, or wasn't available on JS-IPFS (e.g. IPNS) or depended on something else which didn't work. That's how we ended up with IIIF, because something else (I think Y-connector) wasn't working as documented.

Our docs are very much a work in progress, so they are still on Google Docs, but they don't really cover the key primitive we are trying to get IPFS to do, which works, except that it crashes because of this WebRTC issue. Let's set up a call for next week - I have a lot of flexibility.

@mitra42
Author

mitra42 commented Aug 25, 2017

It's unclear from those links what the implication of disabling discovery is. If we disable discovery as suggested above, will the pubsub feature still work, and will a resource stored at one node be viewable at another?

@daviddias
Member

@mitra42 PubSub will still work. You can learn more about what Discovery is and how things get plugged in from the libp2p tutorials at https://github.com/libp2p/js-libp2p/tree/master/examples

I'll review your Google Doc over the weekend and ping you to chat next week :)

@daviddias
Member

Is this still an issue in latest master (without WebRTC)?

@mitra42
Author

mitra42 commented Sep 12, 2017

It seems to be happening a lot slower now e.g. 30 mins to crash.

To be clear ... we are loading with config options

"yarray":{
    "db":{"name":"indexeddb"},
    "connector":{"name":"ipfs","room":"dweb20170908"},
    "share":{"array":"Array"}},
"ipfs":{
    "repo":"/tmp/ipfs_dweb20170908",
    "config":{"Addresses":{"Swarm":["/dns4/star-signal.cloud.ipfs.team/wss/p2p-webrtc-star"]}},
    "EXPERIMENTAL":{"pubsub":true}},

This looks to me like it's still WebRTC for IPFS, and presumably also for YJS, but I think that is what you gave Kyle on Friday to replace:
Swarm: ["/libp2p-webrtc-star/dns4/star-signal.cloud.ipfs.team/wss"]
Let me know if you want me to try a different config.
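
The two Swarm addresses quoted in this comment differ only in where the WebRTC-Star marker sits: the older one starts with `/libp2p-webrtc-star`, the newer one ends with `/p2p-webrtc-star`. The sketch below rewrites one form into the other based purely on those two strings as they appear in this thread; `toNewStarAddr` is a hypothetical helper, not part of js-ipfs or the multiaddr library.

```javascript
// Hypothetical helper: rewrite "/libp2p-webrtc-star/<rest>" as
// "<rest>/p2p-webrtc-star". Other addresses are returned unchanged.
function toNewStarAddr (addr) {
  const prefix = '/libp2p-webrtc-star'
  if (!addr.startsWith(prefix + '/')) {
    return addr // already new-style, or not a WebRTC-Star address
  }
  return addr.slice(prefix.length) + '/p2p-webrtc-star'
}

const oldAddr = '/libp2p-webrtc-star/dns4/star-signal.cloud.ipfs.team/wss'
console.log(toNewStarAddr(oldAddr))
```

Both forms still select the WebRTC-Star transport, so changing the address format alone would not be expected to remove WebRTC from the picture.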

@daviddias
Member

There is more perf fixing incoming :) See dignifiedquire/pull-block#2 by @Beanow.

The other thing we are working on specifically for your use-case, @mitra42, is libp2p/js-libp2p#122. Once that lands, I'll update you on how to configure your node and things should improve dramatically :)

@daviddias daviddias added the status/ready Ready to be worked label Sep 13, 2017
@daviddias daviddias added P0 Critical: Tackled by core team ASAP and removed P1 High: Likely tackled by core team if no one steps up labels Oct 11, 2017
@mitra42
Author

mitra42 commented Oct 18, 2017

Just a flag that we are still seeing random crashes and overloads on Chrome that may be related to this bug. It's hard to tell, as the most common case is coming back to a web browser and finding the page has crashed. (Note there is no activity in the page during time away from it; it's just IPFS running.)

@Beanow

Beanow commented Oct 19, 2017

@mitra42 that's what I would expect as well. The performance improvements in pull-block will be most noticeable when adding files to IPFS. For example ipfs-inactive/js-ipfs-unixfs-engine#187 (comment)

Note, you may need to check if your NPM gave you pull-block v1.2.1 yet. If not you can test explicitly including the dependency in your root package.json.
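
A quick way to check which pull-block version actually resolves from the project root is sketched below. The `isAtLeast` and `installedVersion` helpers are hypothetical. The check only sees the copy that Node resolves from the current directory, so a different version nested under another dependency would not be detected.

```javascript
// Compare two "major.minor.patch" strings numerically.
// Pre-release tags are not handled in this sketch.
function isAtLeast (version, minimum) {
  const a = version.split('.').map(Number)
  const b = minimum.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0
    const y = b[i] || 0
    if (x !== y) return x > y
  }
  return true
}

// Read the version field of an installed package, or null if it
// cannot be resolved from here.
function installedVersion (name) {
  try {
    return require(name + '/package.json').version
  } catch (err) {
    return null
  }
}

const found = installedVersion('pull-block')
console.log(found === null
  ? 'pull-block not found from this directory'
  : `pull-block ${found}, at least 1.2.1: ${isAtLeast(found, '1.2.1')}`)
```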

You mention this happens when idling. So this is probably still WebRTC related.

@mitra42
Author

mitra42 commented Oct 19, 2017

Yes - NPM updated to pull-block 1.2.1.

Am I understanding you correctly that if it's crashing when idle then the changes to pull-block won't affect it, i.e. we can expect idle pages that do essentially nothing other than load IPFS to crash?

@daviddias daviddias added status/in-progress In progress and removed status/ready Ready to be worked labels Nov 6, 2017
@mitra42
Author

mitra42 commented Nov 16, 2017

Is there any progress on this? I'm surprised to see new versions of IPFS appearing when, with this bug, it's essentially unusable in JavaScript.

@Beanow

Beanow commented Nov 16, 2017

@mitra42 In the meantime pull-block has had another performance update, now on v1.4.0. Though again if your issue is WebRTC related this won't fix that.

Can you confirm if you have the issue using a websockets transport?

@mitra42
Author

mitra42 commented Nov 16, 2017

@Beanow, we aren't explicitly using a transport; we are connecting to IPFS via the only version of the config options we've been given that works. I think it came from David via Kyle.

config: {
    repo: '/tmp/ipfs_dweb20171029',
    Addresses: { Swarm: ['/dns4/star-signal.cloud.ipfs.team/wss/p2p-webrtc-star'] },
    EXPERIMENTAL: { pubsub: true }
}

Note: we are using yjs with the IPFS connector, but this error occurs even before we connect to Y, when our app has done nothing but start IPFS.
Also, we - like most users - are in no position to modify this config to "use websockets", as there is no usable documentation for the config options. We just use something that someone else gives us or that is pulled from one of the demos.

@mitra42
Author

mitra42 commented Nov 16, 2017

Try https://dweb.me/examples/example_block.html as a repeatable example - mean time to crash the browser tab is about 10 minutes of doing nothing.
And, I'm more than happy to try a different set of config options if you can propose a set.

If you want to repeat it,
The repo is at https://github.com/internetarchive/dweb-transport/ and the config options are in the TransportIPFS.js file. "npm run bundle" is used to compile them into a file in the right place to run locally just by opening "example_block.html" in your browser.
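
When reproducing an idle crash like this, it may help to record how many peers the node is connected to over time, to see whether connection growth precedes the crash. The sketch below is a hypothetical diagnostic and not part of the dweb-transport repo; it assumes `ipfs.swarm.peers()` returns a promise resolving to an array.

```javascript
// Hypothetical diagnostic: sample the connected-peer count at a fixed
// interval. Assumes `ipfs.swarm.peers()` resolves to an array.
function startPeerCountLog (ipfs, intervalMs, onSample = console.log) {
  const samples = []
  const timer = setInterval(async () => {
    try {
      const peers = await ipfs.swarm.peers()
      const sample = { time: Date.now(), peers: peers.length }
      samples.push(sample)
      onSample(sample)
    } catch (err) {
      // Report failures instead of throwing inside the timer callback
      onSample({ time: Date.now(), error: err.message })
    }
  }, intervalMs)
  return { samples, stop: () => clearInterval(timer) }
}

// Usage, assuming `node` is a started js-ipfs instance:
// const log = startPeerCountLog(node, 10000)
// ...later: log.stop(); console.table(log.samples)
```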

@Beanow

Beanow commented Nov 16, 2017

Thanks for reproducing and the example code.

Indeed, Firefox Quantum 57.0:
[image]

And Chromium 62.0:
[image]

@Beanow

Beanow commented Nov 16, 2017

I've done some testing to run both your and my own example with libp2p-websocket-star.
It's not for the faint of heart to configure at the moment, though. It required code changes directly to js-ipfs, and the configuration I came up with is undocumented. But I can confirm that this hasn't crashed any browsers for me while having working IPFS bitswap and pubsub.

The required code changes are already in progress here. libp2p/js-libp2p#122

@mitra42
Author

mitra42 commented Nov 16, 2017

Yes - and that would be OK if there was at least one configuration that worked in the browser, but currently there doesn't seem to be ANY, so there is nothing for people to build apps on while waiting for IPFS to come up with improvements. It surprises me that any new JS releases are being pushed when there is no working browser version. It's not like that example does anything complex; all it does is start IPFS, sit around, and crash!

@daviddias
Member

@mitra42 I appreciate the feedback and believe me, I do feel the pain of seeing a browser run out of memory. It is important to note that there are multiple people working really hard towards shipping a solution that will mitigate that issue and also bring lots of performance improvements to js-ipfs.

Today we have identified the problem (WebRTC is an unstoppable memory hog), we fixed some other performance issues (i.e pull-block, browserify-aes and Stream API) and we do have two proposals for solving the WebRTC issue: a) Implement ConnManager or b) Offer websocket-star as a first class transport.

You can get involved in the solutions as they get implemented, or you can be patient and provide support. For example, having examples is definitely very useful.

@AquiGorka
Member

AquiGorka commented Nov 17, 2017

Oh wow! I've been having this problem too and I thought I was not correctly setting up YJS with js-ipfs. I have been trying to run my own signalling server to see if this still happens with a limited number of connected nodes.

@diasdavid could you help me out with that idea? I can't seem to find any documentation on configuring my own signaling server.

I have my own node running (js-ipfs) and a browser node. The two are connected: I set up the ipfs config to Bootstrap only from the ws address provided by my js-ipfs node, and I can see one peer connected from ipfs_node.swarm.peers().

Still I see no activity in the signaling server logs. I am using https://github.com/libp2p/js-libp2p-webrtc-star.

@daviddias
Member

@AquiGorka let's open a new issue to help get you set up. Nevertheless, did you happen to see these notes? https://github.com/libp2p/js-libp2p-webrtc-star#rendezvous-server-aka-signalling-server.

js-ipfs performance and stability are now better: I successfully ran js-ipfs for a while (well over 30 mins) and uploaded more than 750Mb with #1086

Once that PR is merged, the last mile is: #962

Closing this issue, let's track the development of the fixes in the links above.

@ghost ghost removed the status/in-progress In progress label Nov 17, 2017
@mitra42
Author

mitra42 commented Nov 17, 2017

@diasdavid - was there a reason to close this before the problem is actually solved? It's not clear from either #962 or #1086 that they are about fixing the problem of Firefox crashing, and as @AquiGorka shows, there are other people hitting this issue who are not going to realise that either #962 or #1086 is addressing it. Nor, until they work, is it clear that either of those fixes will solve it.

@daviddias
Member

@mitra42 as mentioned in the comment above and also in my other answer at #988 (comment), plus the note from the experiment @Beanow ran at #950 (comment), we know that:

@mitra42
Author

mitra42 commented Nov 17, 2017

@diasdavid What config are you using for the working version? Note - we don't choose to use WebRTC; this was recommended by you and Kyle.

Is there a recommended better alternative than ...

config: {
    repo: '/tmp/ipfs_dweb20171029',
    Addresses: { Swarm: ['/dns4/star-signal.cloud.ipfs.team/wss/p2p-webrtc-star'] },
    EXPERIMENTAL: { pubsub: true }
}

that works in browsers? If so, please let us know.

@Beanow

Beanow commented Nov 18, 2017

@diasdavid unfortunately for most applications not using WebRTC isn't an option, as you lose pubsub support and there's no alternative (websocket-star / circuit-relay) ready for an official release yet.

@Beanow

Beanow commented Nov 18, 2017

As not using WebRTC does not solve the issue, I've opened #1088 to track possible solutions.

@daviddias
Member

Thank you @Beanow! That's perfect

@AquiGorka
Member

Nevertheless, did you happen to see these notes? https://github.com/libp2p/js-libp2p-webrtc-star#rendezvous-server-aka-signalling-server.

Yep. I set up the signal-server accordingly.

Issue for configuration advice/help: #1092


5 participants