Continued performance issues after upgrade to 4.1.1
Request Type
Bug
Work Environment
| Question | Answer |
|---|---|
| OS version (server) | Ubuntu |
| OS version (client) | 18.04 |
| TheHive version / git hash | 4.1.1 (docker image 4.1.1-2) |
| Package Type | Docker |
| Browser type & version | Various |
Problem Description
After upgrading from 4.0.5-1 to 4.1.0 and then 4.1.1:
- Audit entries don't show in the application "live stream" view.
- I get the familiar "AuditSrv" error after a while.
- The "Data Index Status" section of the "Platform Status" page does not load (i.e. the user session times out before it loads).
This was consistent behaviour for 4.1.0 and 4.1.1.
The Audit table has 1,265,475 entries.
During initial indexing, there were a number of "org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend" errors. Removing MAX_HEAP_SIZE and HEAP_NEWSIZE settings on cassandra removed these.
In the initial period after the upgrade, there was evidence of memory exhaustion. More RAM was added to the host, and TheHive was given 16g via -e JAVA_OPTS='-Xms16g -Xmx16g'
Without the "Platform Status" page, I have been able to reindex with curl: curl -k "https://<host>:9000/api/v1/admin/index/Case/reindex" -H 'Authorization: Bearer *authwibble*'
I have re-run these for each Index and the logs show that these complete successfully.
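For anyone repeating this, the per-index calls could be scripted roughly as below. This is only a sketch: the URL path and port are copied from the curl command above, while the function names `reindex_url` and `reindex_all` and the `THEHIVE_HOST` / `THEHIVE_TOKEN` variables are made up for illustration. Only the index names Case and Audit appear in this report; any others would need to be taken from the Platform Status page.

```shell
#!/bin/sh
# Build the reindex URL for one index name.
# Path and port are taken from the curl command quoted in this issue.
reindex_url() {
    # $1 = host, $2 = index name
    printf 'https://%s:9000/api/v1/admin/index/%s/reindex\n' "$1" "$2"
}

# Trigger a reindex for each named index, stopping at the first failure.
# -k skips TLS verification (as in the original command), -f makes curl
# return non-zero on HTTP errors, -sS hides progress but still shows errors.
reindex_all() {
    host=$1
    token=$2
    shift 2
    for index in "$@"; do
        echo "Reindexing $index"
        curl -k -sS -f "$(reindex_url "$host" "$index")" \
            -H "Authorization: Bearer $token" || return 1
    done
}

# Example call (hypothetical variables, not run automatically):
# reindex_all "$THEHIVE_HOST" "$THEHIVE_TOKEN" Case Audit
```

The request itself only starts the job; completion still has to be confirmed in the server logs.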
Snippets from the Audit reindex logs:
Mar 25 21:39:52 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 1265475 record(s) indexed
Mar 25 21:39:53 hivehost01 docker[26287]: [info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [AuditRequestidMainaction]
Mar 25 21:39:53 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is finished
Mar 25 21:47:59 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:00 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:01 hivehost01 docker[26287]: [info] o.j.g.o.j.IndexRepairJob [|] Found index Audit
Mar 25 21:48:01 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:02 hivehost01 docker[26287]: [info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [Audit]
Mar 25 21:48:02 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is finished
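To check how many records a reindex run actually processed, the progress lines can be reduced to their highest count. A minimal sketch, assuming the log format matches the lines above exactly; the function name `max_indexed` is hypothetical:

```shell
#!/bin/sh
# Read log lines on stdin and print the largest "record(s) indexed" count
# reported by any "Reindex job is running" line. Prints nothing if no such
# line is present.
max_indexed() {
    sed -n 's/.*Reindex job is running: \([0-9][0-9]*\) record(s) indexed.*/\1/p' \
        | sort -n \
        | tail -n 1
}

# Example: feed it the container logs for one reindex run, e.g.
# journalctl ... | max_indexed
```

For the first run quoted above this would give 1265475, which matches the size of the Audit table, while the second run only ever reports 0.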
Our implementation had been "misusing" tags (per the 4.1.0 release blog) and had some long tags containing links to raw alerts etc. This was evidenced with a 6sec load time on /api/v1/query?name=list-tags. I have deleted these tags from the "Custom Tags" view. Is it possible something in the Audit content could be causing this? Is it possible to truncate / compact the Audit table?
Probably unrelated but I see this on start of the server: Mar 25 21:27:40 hivehost01 docker[26287]: [warn] c.d.d.c.RequestHandler [|] Query '[4 bound values] SELECT column1,value,writetime(value) AS writetime,ttl(value) AS ttl FROM thehive.graphindex WHERE key=:key AND column1>=:sliceStart AND column1<:sliceEnd LIMIT :maxRows;' generated server side warning(s): Read 947 live rows and 5788 tombstone cells for query SELECT * FROM thehive.graphindex WHERE key = 022689a05461e7 AND column1 >= 00 AND column1 < ff LIMIT 5000; token -8419547459570797906 (see tombstone_warn_threshold)
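If these warnings recur, the live-row and tombstone counts can be pulled out of the logs to see how tombstone-heavy the reads are. A small sketch, assuming the warning text keeps the wording shown above; the function name `tombstone_stats` is hypothetical:

```shell
#!/bin/sh
# Read log lines on stdin and print "<live rows> <tombstone cells>" for each
# tombstone warning found. Lines without the warning are ignored.
tombstone_stats() {
    sed -n 's/.*Read \([0-9][0-9]*\) live rows and \([0-9][0-9]*\) tombstone cells.*/\1 \2/p'
}

# Example: journalctl ... | tombstone_stats | sort -k2,2n
```

Sorting on the second column shows which reads scanned the most tombstones.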
I have deleted and reconfigured the index multiple times. After a restart (and before reindexing), the "Platform Status" page loads (all indexes = "ERROR"). After I click "Reindex" on Audit, the indexing completes and the same performance issue is present. I can then no longer refresh or view the Index Status section of the Platform Status page.
The problem is that this issue prevents a production upgrade to 4.1.1 (I didn't mention that this is a UAT instance) and leaves us stranded on 3.x. The UI is just too slow and if I have 2 or 3 analysts logged in, the CPU on the host becomes saturated.