9.0 test plan #15569
Comments
I tested #15094 (and related #15360). Test scenario: Stack monitoring was displaying APM metrics and …
Tested #15211 with otelgen, adding a RecordError() call to all spans.
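(For reference, a minimal sketch of what adding a RecordError() call to a span looks like with the OpenTelemetry Go SDK; the tracer name and the error below are illustrative stand-ins, not the actual otelgen change.)

```go
package main

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/codes"
)

func main() {
	// Assumes a tracer provider has already been configured elsewhere
	// (otelgen wires this up against the APM Server OTLP endpoint).
	tracer := otel.Tracer("otelgen-sketch") // hypothetical tracer name

	_, span := tracer.Start(context.Background(), "example-operation")
	defer span.End()

	// Record an error on the span so it shows up as a span error in APM;
	// the error itself is a synthetic stand-in for this sketch.
	err := errors.New("synthetic failure for test plan validation")
	span.RecordError(err)
	span.SetStatus(codes.Error, err.Error())
}
```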
#14921 tested in #14921 (comment).
Tested the upgrade scenarios, details in upgrade scenario testing. One blocker issue was found related to the Cloud UI (and potentially the API); it is tracked in https://elasticco.atlassian.net/browse/CP-10318.
I validated the changes in #15524 with a standalone 9.0 APM Server build. I ran 3 scenarios with the following TBS settings:

```yaml
sampling.tail:
  enabled: true
  interval: 1m
  policies:
    - sample_rate: .5
  discard_on_write_failure: true
```

1. Disk capacity at 75%, fresh APM Server deployment, load generated with continuous apmbench. As expected, before the 80% disk threshold was reached TBS worked in normal mode, and TBS continued to make incremental progress as TTL records in the DB expired.

```
$ df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1        2974484 2381332    576768  81% /
$ du -sh data/tail_sampling/
245M    data/tail_sampling/
```

2. Disk capacity at 80%, restart of an APM Server deployment with an existing DB, load generated with continuous apmbench. As with scenario 1, the same expected behavior continued.

3. Disk capacity at 85%, fresh APM Server deployment, load generated with continuous apmbench. In this scenario disk utilization is above the configured threshold right from the start. The result is the same warning logs from the APM Server as in scenario 1, but this time all traces are discarded, since there is no space to store any sampled traces.

```
$ du -sh data/tail_sampling/
32K     data/tail_sampling/
```

One surprising observation: sampling decisions were still being posted to Elasticsearch despite the disk threshold. This doesn't cause any direct problem, but it should be investigated separately later.
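(As a side note on how the 80% figure relates to the df output above, here is a minimal, hypothetical sketch of computing disk usage for a path with golang.org/x/sys/unix; it is not APM Server's actual threshold implementation, only an illustration of the arithmetic behind a Use% value like 81%.)

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// diskUsagePercent returns an approximation of df's Use% column for the
// filesystem containing path (illustrative sketch only).
func diskUsagePercent(path string) (float64, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return 0, err
	}
	total := st.Blocks * uint64(st.Bsize)
	avail := st.Bavail * uint64(st.Bsize)
	used := total - avail
	return 100 * float64(used) / float64(total), nil
}

func main() {
	// Hypothetical local tail-sampling DB path, as used in the test above.
	pct, err := diskUsagePercent("data/tail_sampling")
	if err != nil {
		panic(err)
	}
	// Hypothetical check against the 80% threshold discussed above.
	fmt.Printf("disk usage: %.1f%% (threshold exceeded: %v)\n", pct, pct >= 80)
}
```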
When validating #15235 I used the checklist provided by @carsonip:

1. The upgrade from a previous TBS setup with Badger to a new TBS setup with Pebble. Validated with a local version of APM Server; everything worked as expected and no errors were observed. To validate the new behavior I used a scaled-up version of:

```yaml
policies:
  - sample_rate: 0.1
    trace.name: "foo"
  - sample_rate: 0.25
    trace.name: "bar"
  - sample_rate: 0.05
```

2. Run TBS over 2 * TTL, ensure disk usage is bounded by checking TBS monitoring metrics (and actual local disk usage if running locally), and does not OOM. Validated for both on-prem and on ECH with scaled up …
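(To make item 2 concrete, a small hypothetical sketch of this kind of check: periodically sum the size of the local tail-sampling directory, which is what `du -sh data/tail_sampling/` reports, and verify it stays under a bound for the whole run. The directory path, bound, TTL, and interval are assumptions for illustration, not values from the actual test.)

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// dirSize walks dir and returns the total size of regular files in bytes,
// roughly what `du -s` reports for the tail-sampling database directory.
func dirSize(dir string) (int64, error) {
	var total int64
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	const (
		dir      = "data/tail_sampling" // hypothetical local DB path
		ttl      = 30 * time.Minute     // assumed TTL for illustration
		limit    = 1 << 30              // assumed 1 GiB bound
		interval = time.Minute
	)

	deadline := time.Now().Add(2 * ttl) // observe over 2 * TTL, as in the checklist
	for time.Now().Before(deadline) {
		size, err := dirSize(dir)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		fmt.Printf("%s: tail_sampling size = %d bytes\n", time.Now().Format(time.RFC3339), size)
		if size > limit {
			fmt.Fprintln(os.Stderr, "disk usage exceeded the expected bound")
			os.Exit(1)
		}
		time.Sleep(interval)
	}
}
```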
Manual Test Plan
List of changes: v8.18.0...v9.0.0 (neither tag exists yet)
Smoke Testing ESS setup
Thanks to #8303, further smoke tests now run automatically on ESS.
Consider extending the smoke tests to include more of the test cases we'd like to cover.
go-docappender library
No changes, same dependency version used
apm-data library
No changes, same dependency version used
Test cases from the GitHub board
Add yourself as assignee on the PR before you start testing.
apm-server 9.0.0 test-plan
Tasks
Upgrade from 8.18 to 9.0 in various settings (with TBS enabled, apm-server standalone vs. fleet managed, etc.)
Regressions