Feature flag to disable asset stats #668

bartekn · 2018-09-18T17:13:24Z

Asset stats calculations have become very CPU intensive with the recent growth of operations per ledger and assets in, both, public and test networks. Because it's part of the ingestion session it sometimes caused delays in ingesting new ledgers but also slows down connected stellar-core DB (calculations are done in core DB).

The original plan I discussed with @MonsieurNicolas was to extract asset stats calculations outside ingestion and make it run every X seconds, independently of ingestion. The problem is that it's a huge architectural change and requires much more work than I expected to make it right and test it properly. Given all the arguments above and the fact that it's already causing operational issues for some organizations I think we should design it along with the upcoming big architectural change in Horizon and ingest package and have a quickfix now.

This PR introduces a feature flag that can be set using environment variable or command line param. When DISABLE_ASSET_STATS=true assets stats calculation are turned off during ingestion and /assets endpoint is not present. This is not a breaking change as the default value is false. It can be released in 0.14.1 patch release.

MonsieurNicolas · 2018-09-18T17:49:29Z

Right now it's updating stats every time a ledger is ingested, this PR makes it never update.
While this helps with one off Horizon deployments, it doesn't really help with the public facing endpoints.

What I would suggest then as it's a very similar change (code wise) is to change this from a boolean to an update period (default is 1 = updated every ledger close): that way we would make the SDF public nodes expose the endpoint but only update every 120 ledgers (10 minutes) for example.

What do you think?

bartekn · 2018-09-18T18:19:41Z

What I would suggest then as it's a very similar change (code wise) is to change this from a boolean to an update period (default is 1 = updated every ledger close): that way we would make the SDF public nodes expose the endpoint but only update every 120 ledgers (10 minutes) for example.

This is one of the ideas I had too but it will still trigger high CPU load every X minutes this time possibly causing longer delays as list of assets to check grows bigger. And it still involves a big rearchitecture of the system.

Currently every ingest.Session keeps track of assets that should be recalculated. We have two options: keep track of the asset in the higher layer (possibly ingest.System) or recalculate all assets stats. The former option seems better but there are still some unknowns, ex. the list of assets can grow to max ops per ledger * # ledgers in 10 minutes (currently 600k in testnet correct?). What are memory and DB CPU implications of this? What we can end up with is instead a of short delay every busy ledger we can have, say, 5 minutes of calculations every 10 minutes. Both situations are really bad.

I propose the following solution:

Let's merge this PR and release in 0.14.1 today/tomorrow. This way we can quickly solve delays for organizations that run core+horizon and experience high CPU load on their core instances.
Start testing "calculate asset stats every X ledgers" solution, if it works fine we can release it later this month as another feature flag.
In the long term, let's think about and implement solution to solve this problem completely (make this calculations faster). Possibly connected with ingest package rearchitecture/plugins etc.

MonsieurNicolas · 2018-09-18T19:59:47Z

Oh I see, I thought the cost of running was pretty much fixed cost.

This seems fine then. I'll let other @stellar/platform-committers merge.

MonsieurNicolas

looks good

nikhilsaraf

LGTM.

some thoughts on how to amortize the cost over N ledgers for a future PR:

if you write the list of modified assets to a db table ever ledger, there will be many overwrites because of repeats (which is good). This will effectively reduce the average number of assets needed to be calculated on a per-ledger basis, thereby amortizing the cost (because of the repeats).
you can have a background thread that looks at this table, copies over the first K values to a new table (to unblock insertions from the main ingestion thread), and then recomputes these K values. This will ensure we keep the cost fixed to a max of K values and the insertion table will also contain dates so assets are not "out of sync" for too long.
params N and K should be configurable, as indicated in the comments above.

Feature flag to disable asset stats

c2c4446

bartekn added this to the Horizon v0.14.x - patch releases milestone Sep 18, 2018

bartekn requested review from nullstyle, tomerweller, MonsieurNicolas and nikhilsaraf and removed request for nullstyle September 18, 2018 17:13

MonsieurNicolas reviewed Sep 19, 2018

View reviewed changes

nikhilsaraf approved these changes Sep 19, 2018

View reviewed changes

bartekn merged commit 9eca875 into master Sep 19, 2018

bartekn deleted the disable-asset-stats-flag branch September 19, 2018 14:47

bartekn mentioned this pull request Sep 27, 2018

Make asset stats disabled by default in 0.15.0 #698

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature flag to disable asset stats #668

Feature flag to disable asset stats #668

bartekn commented Sep 18, 2018 •

edited

Loading

MonsieurNicolas commented Sep 18, 2018 •

edited

Loading

bartekn commented Sep 18, 2018 •

edited

Loading

MonsieurNicolas commented Sep 18, 2018

MonsieurNicolas left a comment

nikhilsaraf left a comment

Feature flag to disable asset stats #668

Feature flag to disable asset stats #668

Conversation

bartekn commented Sep 18, 2018 • edited Loading

MonsieurNicolas commented Sep 18, 2018 • edited Loading

bartekn commented Sep 18, 2018 • edited Loading

MonsieurNicolas commented Sep 18, 2018

MonsieurNicolas left a comment

Choose a reason for hiding this comment

nikhilsaraf left a comment

Choose a reason for hiding this comment

bartekn commented Sep 18, 2018 •

edited

Loading

MonsieurNicolas commented Sep 18, 2018 •

edited

Loading

bartekn commented Sep 18, 2018 •

edited

Loading