v0.2.3
v0.2.3 implemented thread attribution Approach 1 as described in #86. Approach 1 is more logical and easier to prove correct than Approach 2. The implementation does not introduce data races and still maintains low overhead.
Demonstrated the approach's effectiveness through simple examples.
Added thread imbalance examples to the paper.
Improved performance, making Scaler faster than all other tools it was benchmarked against.
Improved the benchmark suite to include kernel memory measurement and made results more stable by adding a delay between benchmarks.