
We are also deprecating `ChangeTrackers<T>`, which is the old way of inspecting a component's change ticks. This type will be removed in the next version of Bevy.

## Renderer Optimizations

<div class="release-feature-authors">authors: @danchia, Rob Swain (@superdump), james7132, @kurtkuehnert, @robfm</div>

Bevy's renderer has had plenty of low-hanging fruit ready for optimization.

The biggest bottleneck when rendering anything in Bevy is the final render stage, where we collect all of the data in the render world to issue draw calls to the GPU. The core loops here are extremely hot and any extra overhead is noticeable. In **Bevy 0.10**, we've thrown the kitchen sink at this problem and attacked it from every angle. Overall, the following optimizations should make the render stage **2-3 times faster** than it was in 0.9:

* In [#7639](https://github.com/bevyengine/bevy/pull/7639) by @danchia, we found that even disabled logging carries a noticeable cost in hot loops; removing it from them netted us 20-50% speedups in the stage.
* In [#6944](https://github.com/bevyengine/bevy/pull/6944) by @james7132, we shrank the core data structures involved in the stage, reducing memory fetches and netting us 9% speedups.
* In [#6885](https://github.com/bevyengine/bevy/pull/6885) by @james7132, we rearchitected our `PhaseItem` and `RenderCommand` infrastructure to combine common operations when fetching component data from the `World`, netting us a 7% speedup.
* In [#7053](https://github.com/bevyengine/bevy/pull/7053) by @james7132, we changed `TrackedRenderPass`'s allocation patterns to minimize branching within these loops, netting a 6% speedup.
* In [#7084](https://github.com/bevyengine/bevy/pull/7084) by @james7132, we altered how we're fetching resources from the World to minimize use of atomics in the stage, netting a 2% speedup.
* In [#6988](https://github.com/bevyengine/bevy/pull/6988) by @kurtkuehnert, we changed our internal resource IDs to use atomically incremented counters instead of UUIDs, reducing the comparison cost of some of the branches in the stage (see the sketch after this list).
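
To illustrate that last point, here is a minimal sketch of the counter-based ID pattern (illustrative only, not Bevy's actual implementation). Generating an ID is a single atomic increment, and comparing two IDs is a cheap integer comparison, unlike 128-bit UUIDs:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical resource ID backed by a global atomic counter.
// Cheap to generate and cheap to compare, unlike a 128-bit UUID.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ResourceId(u32);

static NEXT_RESOURCE_ID: AtomicU32 = AtomicU32::new(0);

impl ResourceId {
    fn new() -> Self {
        // Relaxed ordering is enough: we only need uniqueness,
        // not ordering with respect to other memory operations.
        ResourceId(NEXT_RESOURCE_ID.fetch_add(1, Ordering::Relaxed))
    }
}

fn main() {
    let a = ResourceId::new();
    let b = ResourceId::new();
    assert_ne!(a, b); // every call yields a distinct, cheaply comparable ID
}
```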

One other ongoing development is enabling the render stage to properly parallelize command encoding across multiple threads. Following [#7248](https://github.com/bevyengine/bevy/pull/7248) by @james7132, the render graph can now ingest externally created `CommandBuffer`s, which should allow users to encode GPU commands in parallel and feed the results into the graph. Full parallel command encoding is currently blocked by wgpu, which locks the GPU device when encoding render passes, but we should be able to support it as soon as that's addressed.
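
The sketch below shows the rough shape of that workflow, assuming the wgpu blocker is resolved: each render phase records its commands on its own thread, and the resulting buffers are collected in a deterministic order for submission. The `GpuCommandBuffer` type and `encode_phase` function are hypothetical stand-ins, not Bevy or wgpu API:

```rust
use std::thread;

// Hypothetical stand-ins: in a real renderer these would be wgpu's
// CommandBuffer and whatever encoding work each render phase performs.
struct GpuCommandBuffer(Vec<u8>);

fn encode_phase(phase_index: usize) -> GpuCommandBuffer {
    // ... record draw calls for this phase ...
    GpuCommandBuffer(vec![phase_index as u8])
}

fn main() {
    let phases = [0usize, 1, 2, 3];

    // Encode each phase's commands on its own thread...
    let buffers: Vec<GpuCommandBuffer> = thread::scope(|s| {
        let handles: Vec<_> = phases
            .iter()
            .map(|&i| s.spawn(move || encode_phase(i)))
            .collect();
        // ...then join them in submission order, so the final queue
        // submission stays deterministic even though encoding was parallel.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    assert_eq!(buffers.len(), phases.len());
}
```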

On a similar note, we've taken steps to enable higher parallelism in other stages of the rendering pipeline. `PipelineCache` has been a resource that almost every Queue stage system needed to access mutably, even though it only rarely needed to be written to. In [#7205](https://github.com/bevyengine/bevy/pull/7205), @danchia changed it to use internal mutability so these systems can parallelize. This doesn't yet allow every system in the stage to parallelize, as a few common blockers still remain, but it should allow non-conflicting render phases to queue commands at the same time.
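
As a rough illustration of that interior-mutability pattern (a minimal sketch, not the actual `PipelineCache` code): queuing a pipeline only needs `&self`, so multiple systems can hold a shared reference and run in parallel, while the queued work is drained later from a single exclusive point in the frame.

```rust
use std::sync::Mutex;

// Minimal sketch of the pattern; the real PipelineCache is more involved.
#[derive(Default)]
struct PipelineCache {
    // Stand-in for queued pipeline descriptors.
    queued: Mutex<Vec<String>>,
}

impl PipelineCache {
    // Takes `&self`, so systems only need shared access to the resource.
    fn queue_pipeline(&self, descriptor: String) {
        self.queued.lock().unwrap().push(descriptor);
    }

    // Called later from a single, exclusive point in the frame.
    fn process_queue(&mut self) -> usize {
        self.queued.get_mut().unwrap().drain(..).count()
    }
}

fn main() {
    let mut cache = PipelineCache::default();
    std::thread::scope(|s| {
        // Two "systems" queue pipelines concurrently through `&PipelineCache`.
        s.spawn(|| cache.queue_pipeline("shadow pass pipeline".into()));
        s.spawn(|| cache.queue_pipeline("main pass pipeline".into()));
    });
    assert_eq!(cache.process_queue(), 2);
}
```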

Optimization isn't all about CPU time! We've also improved memory usage, compile times, and GPU performance!

* Thanks to @james7132, we've reduced the memory usage of `ComputedVisibility` by 50% by replacing its internal storage of multiple booleans with a set of bitflags (sketched after this list).
* @robfm also used type erasure as a workaround for a [rustc performance regression](https://github.com/rust-lang/rust/issues/99188) to give rendering-related crates better compile times, with some of the crates compiling **up to 60% faster**! Full details can be seen in [#5950](https://github.com/bevyengine/bevy/pull/5950).
* In [#7069](https://github.com/bevyengine/bevy/pull/7069), @superdump reduced the number of active registers used on the GPU to prevent register spilling, significantly improving GPU-side performance.
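
Here is a minimal sketch of the flag-packing idea behind the `ComputedVisibility` change mentioned above (the flag names are illustrative, not necessarily Bevy's actual fields): several booleans fold into the bits of a single byte, halving the component's size.

```rust
// Illustrative only: packs what would otherwise be several `bool` fields
// (one byte each) into the bits of a single `u8`.
#[derive(Clone, Copy, Default)]
struct ComputedVisibilityFlags(u8);

impl ComputedVisibilityFlags {
    const VISIBLE_IN_HIERARCHY: u8 = 1 << 0;
    const VISIBLE_IN_VIEW: u8 = 1 << 1;

    fn set(&mut self, flag: u8, value: bool) {
        if value {
            self.0 |= flag;
        } else {
            self.0 &= !flag;
        }
    }

    fn contains(&self, flag: u8) -> bool {
        self.0 & flag != 0
    }
}

fn main() {
    let mut visibility = ComputedVisibilityFlags::default();
    visibility.set(ComputedVisibilityFlags::VISIBLE_IN_HIERARCHY, true);
    assert!(visibility.contains(ComputedVisibilityFlags::VISIBLE_IN_HIERARCHY));
    assert!(!visibility.contains(ComputedVisibilityFlags::VISIBLE_IN_VIEW));
    // The whole component fits in one byte instead of several bools.
    assert_eq!(std::mem::size_of::<ComputedVisibilityFlags>(), 1);
}
```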

Finally, we have made some improvements to specific usage scenarios:

* In [#6833](https://github.com/bevyengine/bevy/pull/6833), @james7132 improved the extraction of bones for mesh skinning by 40-50% by omitting an unnecessary buffer copy.
* In [#7311](https://github.com/bevyengine/bevy/pull/7311), @james7132 improved UI extraction by 33% by lifting a common computation out of a hot loop (illustrated below).
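
As a generic illustration of that second change (not the actual extraction code), lifting a loop-invariant computation out of a hot loop looks like this:

```rust
// Illustrative only: the loop-invariant reciprocal is hoisted out of the loop.
fn extract_slow(items: &[f32], scale_factor: f32) -> Vec<f32> {
    items
        .iter()
        // Recomputes the reciprocal for every single item.
        .map(|&x| x * scale_factor.recip())
        .collect()
}

fn extract_fast(items: &[f32], scale_factor: f32) -> Vec<f32> {
    // Computed once, before the loop.
    let inverse_scale = scale_factor.recip();
    items.iter().map(|&x| x * inverse_scale).collect()
}

fn main() {
    let items = [1.0_f32, 2.0, 3.0];
    assert_eq!(extract_slow(&items, 2.0), extract_fast(&items, 2.0));
}
```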

## Parallelized Transform Propagation and Animation Kinematics

<div class="release-feature-authors">authors: @james7132</div>

TODO: Parallelize forward kinematics animation systems
TODO: Parallelized transform propagation

## Android Support

<div class="release-feature-authors">authors: @mockersf, @slyedoc</div>