Release v0.2.0
Pre-release
This release focuses on scaling up the Global Resource Service (GRS) to support a total of 5 million nodes across different regions.
Highlights include:
- In the regular node status change scenario, GRS delivers 99% of node status changes to consumers (schedulers) within 90 milliseconds
- In the massive node outage scenario (5M nodes change status), GRS delivers all events to consumers within 1 minute
- Latency and throughput data:
| Test Case | Watch Latency P50 (ms) | Watch Latency P90 (ms) | Watch Latency P99 (ms) | List (ms) | Register (ms) | Throughput (events/s) |
|---|---|---|---|---|---|---|
| Regular changes | 32 | 73 | 88 | 1,389 | 64 | N/A |
| Massive outages | 23,606 | 44,900 | 47,810 | 1,381 | 66 | 96,241 |
Detailed test configurations are listed here.
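As a rough sanity check on the massive outage numbers, the reported throughput can be related to the 1-minute delivery claim. The sketch below assumes one status-change event per node and that the measured throughput is sustained for the whole burst. Both are simplifying assumptions, not statements about the test setup, and the function name is illustrative only.

```python
# Back-of-the-envelope check: how long does it take to deliver one event
# per node at the measured throughput?
# Assumptions (not from the test configuration): exactly one event per node,
# and a constant delivery rate for the entire burst.

TOTAL_NODES = 5_000_000            # nodes with status changes in the outage test
THROUGHPUT_EVENTS_PER_S = 96_241   # throughput reported for "Massive outages"


def drain_time_seconds(events: int, events_per_second: float) -> float:
    """Return the time needed to deliver `events` at a constant rate."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return events / events_per_second


drain = drain_time_seconds(TOTAL_NODES, THROUGHPUT_EVENTS_PER_S)
print(f"Estimated time to deliver all events: {drain:.1f} s")
```

Under these assumptions the estimate comes out at roughly 52 seconds, which is in line with the "within 1 minute" highlight.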
Features/Improvements/Bug fixes:
Architectural Changes:
- Replace the pull model between the data aggregator and the Global Resource Service API with a list/watch mechanism to reduce latency in the regular node status change scenario (PR 196)
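The general list/watch idea can be sketched as follows: a consumer lists the full state once, then applies only the changes recorded after the version it last saw, instead of repeatedly pulling everything. This is a minimal in-memory illustration, not the GRS implementation. `NodeStore`, `Consumer`, and all method names are hypothetical, and a real watch would normally be a long-lived stream rather than a function call.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeEvent:
    version: int   # monotonically increasing change counter
    node_id: str
    status: str


class NodeStore:
    """Hypothetical stand-in for the side that serves list and watch."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}
        self._events: List[NodeEvent] = []
        self._version = 0

    def update(self, node_id: str, status: str) -> None:
        """Record a node status change and log it as a versioned event."""
        self._version += 1
        self._nodes[node_id] = status
        self._events.append(NodeEvent(self._version, node_id, status))

    def list_all(self) -> Tuple[Dict[str, str], int]:
        """Return a full snapshot plus the version it corresponds to."""
        return dict(self._nodes), self._version

    def watch(self, since_version: int) -> List[NodeEvent]:
        """Return only the events newer than `since_version`.

        Versions run 1..N in append order, so the event with version v
        sits at index v - 1 and slicing from `since_version` skips
        everything the caller has already seen.
        """
        return self._events[since_version:]


class Consumer:
    """Hypothetical consumer: one initial list, then incremental watches."""

    def __init__(self, store: NodeStore) -> None:
        self.store = store
        self.cache, self.version = store.list_all()

    def sync(self) -> int:
        """Apply events since the last seen version; return how many."""
        events = self.store.watch(self.version)
        for event in events:
            self.cache[event.node_id] = event.status
            self.version = event.version
        return len(events)


store = NodeStore()
store.update("node-1", "ready")
store.update("node-2", "ready")

consumer = Consumer(store)           # initial list: sees both nodes
store.update("node-2", "not-ready")  # a later status change
applied = consumer.sync()            # watch: receives only the new event
```

The point of the design is that the cost of `sync` scales with the number of changes rather than the total number of nodes, which is what makes it attractive when only a small fraction of nodes change status at a time.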
Features and Engineering Improvements:
- Add Admin APIs: support single node status query in Global Resource Service (PR 147, 161)
- Add an integration test framework and test cases for CI/CD pipeline support (PR 195)
Scalability and Performance Tuning:
- Support profiling for Global Resource Service (PR 114)