
[BUG]org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus #644

Closed
coredump17 opened this issue Apr 6, 2024 · 19 comments · Fixed by #657
Labels
bug Something isn't working untriaged

Comments

@coredump17

What is the bug?
Since upgrading from 2.12 to 2.13, I see the following WARN message spamming the logs:
Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])

How can one reproduce the bug?
Install 2.13.

What is the expected behavior?
No errors are logged.

What is your host/environment?
OpenSearch 2.13 container

coredump17 added the bug (Something isn't working) and untriaged labels on Apr 6, 2024
@timolow

timolow commented Apr 9, 2024

Having this issue too.

@guldil

guldil commented Apr 10, 2024

Same here.

@sarankup

Same here

@Gradlon

Gradlon commented Apr 11, 2024

Same here.

@ComBin

ComBin commented Apr 11, 2024

Me too

@geckiss

geckiss commented Apr 12, 2024

Same here

@dxturner

I see this too.

@slayerjk

slayerjk commented Apr 17, 2024

Same here (after updating from 2.12 to 2.13).

@rtista

rtista commented Apr 18, 2024

Same here, after updating from 2.12.0 to 2.13.0.

@22charud

I have the same issue on 2.13

@TheHansam

Same here on a fresh install on AlmaLinux 9.3.

@pmarjou22

same here

@tdankers

tdankers commented Apr 23, 2024

Same here :-( Version 2.13.0:

"cluster.name": "opensearch", "node.name": "ubuntu", "message": "Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])", "cluster.uuid": "P6RyCh4KS5SObyb7k05akA", "node.id": "V9D7KQFqRgKoiNsOop8UzQ" }

@merlinz01

merlinz01 commented Apr 26, 2024

Same here. Version 2.13.0 on Debian 12. Is there perhaps a setting that needs to be set?

Downgraded to 2.12.0 to bypass the issue for now.

@merlinz01

merlinz01 commented May 4, 2024

I upgraded back to 2.13.0 and removed the Performance Analyzer plugin, and the errors aren't appearing for me.

Perhaps it is related to the JSON marshaling of a performance metric? This is the field involved, as linked at both the v1.13.0 tag and main:

    @JsonInclude(Include.NON_NULL)
    private final Long cacheMaxSize;
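One plausible explanation for why @JsonInclude(Include.NON_NULL) does not prevent the warning, sketched without Jackson: if CacheMaxSizeStatus exposes this field through a getter declared to return a primitive long (an assumption; the getter is not shown in this thread), the getter auto-unboxes via Long.longValue() and throws before the serializer can decide to omit a null property. The classes below are hypothetical stand-ins, not the plugin's code.

```java
// Hypothetical stand-ins for CacheMaxSizeStatus; not the plugin's actual classes.
public class UnboxingDemo {

    static final class StatusWithPrimitiveGetter {
        private final Long cacheMaxSize;

        StatusWithPrimitiveGetter(Long cacheMaxSize) {
            this.cacheMaxSize = cacheMaxSize;
        }

        // Assumed shape: returning primitive long forces auto-unboxing,
        // i.e. cacheMaxSize.longValue(), which throws NullPointerException on null.
        public long getCacheMaxSize() {
            return cacheMaxSize;
        }
    }

    static final class StatusWithBoxedGetter {
        private final Long cacheMaxSize;

        StatusWithBoxedGetter(Long cacheMaxSize) {
            this.cacheMaxSize = cacheMaxSize;
        }

        // Returning the boxed Long lets null pass through, so a NON_NULL
        // inclusion rule would be able to skip the property instead of failing.
        public Long getCacheMaxSize() {
            return cacheMaxSize;
        }
    }

    public static void main(String[] args) {
        System.out.println(new StatusWithBoxedGetter(null).getCacheMaxSize()); // prints null
        try {
            new StatusWithPrimitiveGetter(null).getCacheMaxSize();
        } catch (NullPointerException e) {
            // With helpful NPE messages (on by default since JDK 15), this names this.cacheMaxSize.
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

On JDK 15 and later, the caught exception's message should have the same "Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null" shape as the logged warning, which is consistent with (but does not prove) this explanation.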

@cinhtau

cinhtau commented May 29, 2024

I disabled the Performance Analyzer on my 2.14 cluster today, as described in https://opensearch.org/docs/latest/monitoring-your-cluster/pa/index/#disable-performance-analyzer. It was making the rolling upgrade troublesome.

@ansjcy
Member

ansjcy commented May 29, 2024

Some exceptions are being raised in the collectMetrics function of CacheConfigMetricsCollector. When an exception is raised, the current logic returns a CacheMaxSizeStatus with a null cacheMaxSize, even though that field is required to be non-null.

@varunsrivathsav, @atharvasharma61, @psychbot, let's investigate this further to understand:

  • What is causing the error to be thrown in 2.13?
  • We should fix the above bug so that the code raises or logs the exception rather than returning an object with a null value.
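A minimal sketch of the direction suggested above, under stated assumptions: when reading a cache's max size fails, the collector logs the cause and returns an empty Optional, so no status object with a null size is ever built. CacheSizeSource, readMaxSize, and collect are hypothetical names, the cache type strings are illustrative, and java.util.logging stands in for whatever logger the plugin actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: names are illustrative, not the plugin's actual API.
public class SafeCollectorSketch {
    private static final Logger LOG = Logger.getLogger(SafeCollectorSketch.class.getName());

    // Hypothetical source of a cache's configured max size; reading it may fail.
    interface CacheSizeSource {
        long readMaxSize() throws Exception;
    }

    // Stand-in for CacheMaxSizeStatus; the primitive field can never be null.
    static final class CacheMaxSizeStatus {
        private final String cacheType;
        private final long cacheMaxSize;

        CacheMaxSizeStatus(String cacheType, long cacheMaxSize) {
            this.cacheType = cacheType;
            this.cacheMaxSize = cacheMaxSize;
        }

        String getCacheType() { return cacheType; }

        long getCacheMaxSize() { return cacheMaxSize; }
    }

    // Builds a status only when the size was read; otherwise logs the real cause.
    static Optional<CacheMaxSizeStatus> collect(String cacheType, CacheSizeSource source) {
        try {
            return Optional.of(new CacheMaxSizeStatus(cacheType, source.readMaxSize()));
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Failed to read max size for cache " + cacheType, e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<CacheMaxSizeStatus> statuses = new ArrayList<>();
        collect("FieldDataCache", () -> 1024L).ifPresent(statuses::add);
        collect("ShardRequestCache", () -> {
            throw new IllegalStateException("cache not initialized");
        }).ifPresent(statuses::add);
        // Only caches whose size was read successfully are reported.
        for (CacheMaxSizeStatus s : statuses) {
            System.out.println(s.getCacheType() + " -> " + s.getCacheMaxSize());
        }
    }
}
```

Returning an Optional keeps the "no value" case explicit at the call site; rethrowing, the other option mentioned above, would also work if the caller is prepared to catch it.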

@caothu159

caothu159 commented May 31, 2024

While waiting for a fixed release, you can apply the following workaround.
On Debian or Ubuntu, make opensearch.service restart automatically when it fails, crashes, or has an unclean exit:

  • Edit the service file, for example /lib/systemd/system/opensearch.service
  • In the [Service] section, before [Install], add 2 lines:
Restart=on-failure
RestartSec=60s
  • Run systemctl daemon-reload to reload the units
  • Run systemctl restart opensearch and check the result

Example result:

...
# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75

Restart=on-failure
RestartSec=60s

[Install]
WantedBy=multi-user.target
...

@borisdenis

borisdenis commented Jun 19, 2024

I have the same issue on 2.14 (Ubuntu 22.04):

{
  "name" : "opensearch1",
  "cluster_name" : "graylog",
  "cluster_uuid" : "F71gNpV-TUSjVbscIkUSTg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.14.0",
    "build_type" : "deb",
    "build_hash" : "aaa555453f4713d652b52436874e11ba258d8f03",
    "build_date" : "2024-05-09T18:50:48.052504416Z",
    "build_snapshot" : false,
    "lucene_version" : "9.10.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
