
chore: update the status.* fleets so that they keep using 1MiB as maximum relay message size #2305

Closed
Ivansete-status opened this issue Dec 18, 2023 · 17 comments
Labels: status-waku-integ (All issues relating to the Status Waku integration)

@Ivansete-status (Collaborator) commented Dec 18, 2023

Background

We are about to update the default maximum message size to 150 KiB in the following PR: #2298
Therefore, we need to change the Status deployment scripts so that the status.* fleets keep using 1 MiB and all nodes apply the same message size constraint.

Thanks @alrevuelta for the heads up on this!

Details

We will need help from the infra team to update the following scripts so that --max-msg-size="1024KiB" is passed to the wakunode2 executable in the status.* fleets:
https://github.com/status-im/infra-role-nim-waku/blob/master/templates/docker-compose.yml.j2
https://github.com/status-im/infra-status/blob/master/ansible/group_vars/status-node.yml
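For illustration only, the kind of change needed in the docker-compose template might look like this (a hypothetical fragment; the real structure and template variables of docker-compose.yml.j2 in infra-role-nim-waku differ):

```yaml
# Hypothetical sketch, NOT the actual infra-role-nim-waku template.
services:
  nim-waku:
    command: >
      wakunode2
      --max-connections=300
      --max-msg-size=1024KiB   # keep the 1 MiB limit instead of the new 150 KiB default
```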

Note that this message size option only becomes available in v0.24.0, so the deployment change needs to be applied simultaneously with the status.* upgrade to v0.24.0.

Acceptance criteria

The status.* fleets run well and all major protocols (relay, light-push, filter, store) work without issues.

@Ivansete-status (Collaborator, Author) commented:

I set this as part of v0.24.0, but it cannot be implemented until v0.24.0 is ready and deployed in the status.* fleets.

@Ivansete-status (Collaborator, Author) commented Jan 9, 2024

@SionoiS - this is the issue that helps us to align the upgrade of status.* fleets to v0.24.0 with the infra team ( cc @jakubgs )

@chair28980 chair28980 added the status-waku-integ All issues relating to the Status Waku integration. label Jan 9, 2024
@chair28980 chair28980 moved this from To Do to Priority in Waku Jan 9, 2024
@jakubgs (Contributor) commented Jan 9, 2024

What do you mean by "align the upgrade"? You mean synchronize config changes with upgrade to new version?

When do we want to do that? And what config changes are necessary?

@Ivansete-status (Collaborator, Author) commented:

Morning @jakubgs

What do you mean by "align the upgrade"?

I mean coordinating the deployment of v0.24.0 with changes in the Ansible scripts so that the new parameter (--max-msg-size="1024KiB") is properly configured.

You mean synchronize config changes with upgrade to new version?

Yes, exactly that

When do we want to do that?

Whenever v0.24.0 gets deployed to any of the status.* fleets. We (the nwaku team) will ping you when the release should happen.

And what config changes are necessary?

--max-msg-size="1024KiB"

( cc @SionoiS @gabrielmer )

@jakubgs (Contributor) commented Jan 15, 2024

And what's the schedule for 0.24.0 release? I see there's an RC?

@SionoiS (Contributor) commented Jan 15, 2024

And what's the schedule for 0.24.0 release? I see there's an RC?

@jakubgs

The plan is to test v0.24.0-rc1 on wakuv2.test and shards.test until 22 January, then deploy to wakuv2.prod.

As for Status, I do not know.

@jakubgs (Contributor) commented Feb 1, 2024

I see we are already running v0.24.0 on shards fleet:

admin@store-01.do-ams3.shards.test:/docker/nim-waku-store % d inspect harbor.status.im/wakuorg/nwaku:deploy-shards-test | grep commit
                "commit": "7fc8e322",
admin@store-01.do-ams3.shards.test:/docker/nim-waku-store % grep max docker-compose.yml                                              
      --max-connections=300

But the flag was not applied yet. I can do it today.

@jakubgs (Contributor) commented Feb 1, 2024

The shards.test fleet was updated:

It appears to work fine.

jakubgs added a commit to status-im/infra-status that referenced this issue Feb 1, 2024
waku-org/nwaku#2305

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit to status-im/infra-status-legacy that referenced this issue Feb 1, 2024
waku-org/nwaku#2305

Signed-off-by: Jakub Sokołowski <jakub@status.im>
@jakubgs (Contributor) commented Feb 1, 2024

Flag deployed to status.test along with v0.24.0 release:

Appears to work.

@jakubgs (Contributor) commented Feb 1, 2024

I have also bumped the status.test DB data volumes from 40 GB to 100 GB while at it, to avoid storage issues in the near future:

Also found a bug in volume size configuration.

@jakubgs (Contributor) commented Feb 1, 2024

The status.prod fleet is also on the same old commit that status.test was on.

I will wait until tomorrow to check the performance of status.test, then perform the same upgrade on status.prod.

@jakubgs (Contributor) commented Feb 2, 2024

It appears these changes have caused the disk usage to increase rapidly:

[screenshot: disk usage graph]

This is 100 GB of disk space used up in ~6 hours. This is not normal or acceptable behavior.
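For a sense of scale, the sustained write rate implied by that growth works out as follows (back-of-envelope arithmetic only, not a measurement from the fleet):

```python
# Rough sustained write rate for 100 GB written in ~6 hours.
gigabytes = 100
hours = 6
mb_per_s = gigabytes * 1000 / (hours * 3600)  # decimal GB -> MB, per second
print(f"~{mb_per_s:.2f} MB/s sustained")  # ~4.63 MB/s
```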

@jakubgs (Contributor) commented Feb 2, 2024

This is db-01.do-ams3.status.test, 60 GB in ~5 hours:

[screenshot: disk usage graph for db-01.do-ams3.status.test]

@jakubgs (Contributor) commented Feb 2, 2024

We can clearly see big spikes in traffic:

[screenshot: network traffic graph]

These correspond with disk usage spikes:

[screenshot: disk usage graph]

@jakubgs (Contributor) commented Feb 5, 2024

Current growth looks fine, so it seems that was an external test or attack rather than a result of the deployed changes:

[screenshot: disk usage graph]

I will upgrade status.prod next.

@jakubgs (Contributor) commented Feb 5, 2024

All nodes have been upgraded on status.prod:

 > a status-node -o -a '/docker/nim-waku/rpc.sh get_waku_v2_debug_v1_version | jq .result'
node-02.do-ams3.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0-rc.0-9-g93427f"
node-01.do-ams3.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0"
node-01.gc-us-central1-a.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0"
node-02.gc-us-central1-a.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0"
node-01.ac-cn-hongkong-c.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0"
node-02.ac-cn-hongkong-c.status.prod | CHANGED | rc=0 | (stdout) "v0.24.0"
 > a status-node -o -a 'grep max-msg-size /docker/nim-waku/docker-compose.yml'
node-01.do-ams3.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB
node-02.do-ams3.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB
node-01.ac-cn-hongkong-c.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB
node-02.ac-cn-hongkong-c.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB
node-01.gc-us-central1-a.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB
node-02.gc-us-central1-a.status.prod | CHANGED | rc=0 | (stdout)       --max-msg-size=1024KiB

I consider this done.

@github-project-automation github-project-automation bot moved this from Priority to Done in Waku Feb 5, 2024
@jakubgs (Contributor) commented Feb 5, 2024

Richard suggested running something like the following next time we see this kind of abuse:

select distinct contentTopic from messages where storedAt >= ____ and storedAt <= ____
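As far as I understand nwaku's message store schema, the storedAt bounds in that query are Unix timestamps in nanoseconds, so the blanks could be computed for a time window like this (illustrative snippet; the table and column names are simply taken from the query above):

```python
from datetime import datetime, timezone

def to_store_timestamp(dt: datetime) -> int:
    """Convert a datetime to a nanosecond Unix timestamp, as assumed for storedAt."""
    return int(dt.timestamp() * 1_000_000_000)

# Hypothetical window: the ~6 hours of spiky traffic on 2024-02-01.
start = to_store_timestamp(datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc))
end = to_store_timestamp(datetime(2024, 2, 1, 18, 0, tzinfo=timezone.utc))

query = ("select distinct contentTopic from messages "
         f"where storedAt >= {start} and storedAt <= {end}")
print(query)
```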
