-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Animating more than 30 elements per frame causes frame drops #3854
Comments
Can you confirm the crashes still happen after #3852 – this crash was something we also notices during testing and #3852 is resolving that problem for us. We've actually been getting that crash much sooner than after 5 mins and with the increase number of circles it'd happen within few seconds, and after that patch it'd no longer occur. Regarding BokehExample what needs to be mentioned is that it isn't really implemented optimally, it's never meant to be and in its current form it serves well as a benchmark. It is important for us to know what type of devices can run at what framerate. The most time consuming part in that example is due to the fact that circles manipulate layout properties. Because of that we need to recalculate layout on every single frame and in addition to that, this actually happen as many times as there are circles – reanimated isn't optimized for this type of scenario (yet?) and we don't do any batching of updates that require layouting (note that with other type of updates batching isn't really necessary) |
The below diff does a relatively simple optimization that changes that example to use transforms instead of layout props:
|
@kmagiera Kindly find my replies below.
Cannot confirm your patch is working. There is one thing regarding this. Looks like calling
I've tried this here. Rendering performance is not an issue. Reanimated is. We should optimize better.
I've got some information for OPPO A16. These are the measurements of how long updateProps executes on average (in ms) + FPS. As you can see, I've done 5 measurements in a row after a fresh restart for every build type.
All these measurements were done with 25 elements updating |
Regarding the crash we've identified the actual root cause and the fix is here #3859 |
As for monkey patching rAF I did check this but forgot to mention that it requires #3838 to work that was merged recently and you perhaps don't have that patch included |
Confirm it's no longer crashing. But batching |
@kmagiera Do you think batching JSI calls to updateProps could be efficient? Or maybe it's about data conversion? |
I think we could batch all prop updates in JS and call JSI-based function Another idea for performance improvement on Android might be doing exactly the same but in the native code. When non-layout props like Good news is that on Fabric we already apply all ShadowTree updates at once (done in #3369) in At this point I think it might consider adding |
@tomekzaw Can we discuss this on a Zoom call, possibly? |
I'm not sure about batching updateProps calls, this isn't as straightforward as with rAF calls, where we can save a number of JSI roundtrips. With updateProps we still need to pass the additional data so we'd have fewer calls but still more data to be unmarshalled using JSI. I wouldn't consider adding this unless we have a clear indication that it improves performance. |
Same concern here. Our gains might be insignificant, since marshaling will have to process roughly same amount of data. I'll try this anyways. Keep digging. |
Hey @sergeymorkovkin, there's one more thing which @kmagiera has reminded me of. When @j-piasecki was profiling flash-list on Android, he noticed that the performance depends on the NDK version used. In particular, Hermes built from sources with NDK 24 is way slower than the one build using NDK 21. If you use RN 0.70 app, NDK 24 is used by default on M1 Macs, see here. I know this is not really related to poor performance of Edit: I added some systraces and here are the first results. I'm using a Samsung M23 (not really low-end), app in release build, using profileable instead of debuggable, 100% battery, not plugged in. Using BokehExample with 100 views with the transforms patch applied (i.e. no layout props). First, |
Here's my update on profiling Reanimated on Android. I added some more systraces in Java, C++ and JS code (as JSI binding). First, we noticed that one chunk of code related to color processing was left after shareable rewrite is no longer necessary, see #3872. Further ideas include optimizing JSI/JNI conversions and prop filtering ( |
@tomekzaw Can you please confirm you're purely using Java-based Systrace and exposing same to both C++ and JS? |
@sergeymorkovkin Here's how I added systraces: Javaimport com.facebook.systrace.Systrace;
Systrace.beginSection(Systrace.TRACE_TAG_REACT_JAVA_BRIDGE, "performOperations");
// ...
Systrace.endSection(Systrace.TRACE_TAG_REACT_JAVA_BRIDGE); C++First I've filled the implementation of #include <android/trace.h>
struct DummySystraceSection {
public:
template <typename... ConvertsToStringPiece>
explicit DummySystraceSection(
const char *name,
ConvertsToStringPiece &&...args) {
ATrace_beginSection(name);
}
~DummySystraceSection() {
ATrace_endSection();
}
};
using SystraceSection = DummySystraceSection; Since They are meant to be used in RAII-style: {
SystraceSection s("JNIHelper::ConvertToPropsMap");
// ...
} // ~DummySystraceSection() is automatically called when the scope ends JavaScriptIn auto clb10 = [](
jsi::Runtime &rt,
const jsi::Value &thisValue,
const jsi::Value *args,
const size_t count) -> jsi::Value {
ATrace_beginSection(args[0].asString(rt).utf8(rt).c_str());
return jsi::Value::undefined();
};
jsi::Value fun10 = jsi::Function::createFromHostFunction(
rt, jsi::PropNameID::forAscii(rt, "_beginSection"), 1, clb10);
rt.global().setProperty(rt, "_beginSection", fun10);
auto clb11 = [](
jsi::Runtime &rt,
const jsi::Value &thisValue,
const jsi::Value *args,
const size_t count) -> jsi::Value {
ATrace_endSection();
return jsi::Value::undefined();
};
jsi::Value fun11 = jsi::Function::createFromHostFunction(
rt, jsi::PropNameID::forAscii(rt, "_endSection"), 0, clb11);
rt.global().setProperty(rt, "_endSection", fun11); I've also needed to whitelist these two names in const globals = new Set([
// ...
'_beginSection',
'_endSection',
]); Because some of the worklets are also run on the main JS context, here's how to use them: if (_WORKLET) _beginSection('UpdateProps processColor');
// ...
if (_WORKLET) _endSection(); I did some initial benchmarks and I think I have one idea how to optimize execution time of single |
I've pushed my changes to |
I'm back with the numbers. TL;DR
Note: I'm still using the old terminology for consistency with the code, "UI props" means non-layout props while "native props" are props that require layout recalculation when changed. |
@tomekzaw I'll try to replicate these results on my device. Will also look into weird mappers behavior with react-navigation. |
@sergeymorkovkin The number of const update = () => {
left.value = withTiming(left.value + getRandomPositionDiff(), config);
top.value = withTiming(top.value + getRandomPositionDiff(), config);
hue.value = withTiming(hue.value + getRandomHueDiff(), config);
}; When it comes to the performance of updateProps, one problem with using Finally, I couldn't find a place in our code where we explicitly invoke layout recalculation after we update layout props in |
You're totally right, I confirm your observation. The thing is in my other example I'm not using style-nested animations, and that's why I've missed it completely. As for the transformation matrices, I suppose, those should normally be sent to the GPU, right? Why RN does this double calculation, is not very clear to me. It's clearly calling MatrixMathHelper.decomposeMatrix(). Looks like the bottleneck we were looking for. As for calling dispatchViewUpdates(), I suppose, it's good, right? |
Well, currently React Native runs these matrix calculations on the CPU because it needs to call
Yeah,
Yes, it's correct. The only problem with this chunk of code is that sometimes, in a specific scenario, it can deadlock, see #3879, but this is another story. |
My another observation is that working with WritableMap's inside updateProps is also expensive. Possibly, this is why your approach of splitting it into three separate methods works. Let me try it... |
Yeah, basically Edit: I think that the average execution time of |
@tomekzaw Confirm that it works notably better with splitting updateProps on the JS side. It's animating 100 circles at roughly 45-50 FPS. We're close to resolving it. My further steps are:
Also, there is a reason they calculate transformation matrix. The thing is it's allowed to perform same transformation many times, e.g.:
|
That's great! I will submit a PR with 2b54074 then. These changes also need to be adapted for iOS but the speedup probably won't be this significant.
I was actually thinking of optimizing transforms on my own but honestly I don't have a clear idea yet on how to handle this (should we introduce another group of props, like let's say "direct props"?) so feel free to share your ideas. For the reference, here's a complete list of all non-layout props. I noticed that the list already contains
These are all valid ideas for further research. In particular, it would be great if you could debug mappers when used with react-navigation or even come up with some test cases that we can use for unit tests for
Yes also the order of transforms matters (i.e. |
@tomekzaw You can surely submit a PR, but did you think about batching calls to updateProps? Now we need to batch 3 of them separately. I'd possibly wait. What do you think?
What we could do is allow passing an object for a transform prop:
Of course, in such a case developer code will need to work around specifying multiple same-type transforms. Given this feature is more a convenience mechanism, I think this might be a fair compromise (to avoid matrix calculation). Still, the order of transforms is a concern... |
@sergeymorkovkin After some time spent on profiling, at this point I no longer think that batching
Yeah, first I would check if using |
@tomekzaw I've tested with direct props and we only get 10% of an improvement, not 40%, unfortunately. Looks, like all the CPU losses we experience are coming from JSI/JNI. And yes, I'm using NDK 21.4.7075529, not 24. At least, now we know how profitable avoiding matrix calculation is. |
@tomekzaw Looks like our memory is leaking like crazy. Will look into this tomorrow. It's too early for a PR. memory-is-leaking.movAlso, after observing an app for ~10 minutes, we start seeing stutters and frame drops again. |
Might be, but eventually memory usage substantially drops down. For the record, there were some memory leaks in the old implementation of shared values but after shareable rewrite they all should be gone. It all depends on when GC decides to kick in. Note that you can trigger GC maunally by calling
There might be some problem with long-running animations (which I also observed on BokehExample left running while I was making tea) but @piaskowyk is doing his best fixing it. |
@tomekzaw There is no clarity on what exactly is leaking (JS? Java?). GC is running every 10 seconds, but not clearing much. Likely, this is happening because of...
App starts at around 72 Mb RAM and gradually reaches 512 Mb and even more. Yes, there is a significant GC drop in the end of this video I attached. The problem is we do not control this and this is not guaranteed. As for stutters and frame drops after certain period, this might or might not be related to GC.
Can you please provide a reference? Is there an issue? |
I don't have any code pointers or commits for you but @piaskowyk says that it might be related to the GC.
Exactly. Can you please check if calling |
@piaskowyk @tomekzaw I think it'll be better to split this into a separate issue, so I created it here. I don't think this is related to GC, since triggering GC manually doesn't remove stutters. Also, if I trigger GC manually, it drops RAM consumption down to 300 Mb only, while application has started at around 70 Mb. Looks like we're not cleaning some lookup tables... |
Turns out that App.tsximport { StyleSheet, View } from 'react-native';
import React, { useEffect } from 'react';
import Animated, {
useAnimatedStyle,
useSharedValue,
withRepeat,
withTiming,
} from 'react-native-reanimated';
export default function App() {
const sv = useSharedValue(0);
useEffect(() => {
sv.value = 0;
sv.value = withRepeat(withTiming(1), -1, true);
}, []);
const animatedStyle = useAnimatedStyle(() => {
return {
translateX: sv.value * 50,
translateY: sv.value * 50,
};
});
return (
<View style={styles.container}>
<Animated.View style={[styles.box, animatedStyle]} />
</View>
);
}
const styles = StyleSheet.create({
container: {
flex: 1,
alignItems: 'center',
justifyContent: 'center',
},
box: {
width: 150,
height: 150,
backgroundColor: 'red',
},
}); It looks like So as a workaround for your case, you can have two worklets (one returning an object with A more general solution would be to add support for |
@tomekzaw I've made this test a couple of days ago. Kindly find my message below:
|
@tomekzaw I'm wondering: do we really need to use ReadableMap/WritableMap and depend on JSI? Why not simply use JavaOnlyMap, for instance? As for this solution you're offering, unfortunately, we cannot replace this in all places. E.g. we also use element rotation and anchor point shifting. So, there are multiple trabslateXY used inside a transform prop. |
@sergeymorkovkin Here's the commit I talked about yesterday: a6a1367. It adds batching UI prop updates on the JS side and consequently optimizes the number of JNI calls thanks to using only a single ReadableMap for all component updates. Please check if it helps in your case (note that the patch is still a PoC so probably it needs some polishing before submitting a PR). Thanks to batching, the average execution time of When it comes to MapBuffer, I still don't think this is a viable alternative. Obviously, it saves some CPU time and memory as it uses
In our case, the buffer is neither large nor long-lived (perhaps a better alternative would be to allocate a single DirectByteBuffer, re-use it between animation frames and use BSON instead). There are also some major limitations. First, MapBuffer only supports integer keys. React Native keeps a hard-coded list of constants that represent each property, see here. In our use-case, we need more flexibility. Finally, when we update props, we call methods from React Native that accepts With the batching optimization, the conversion itself takes only about 15% of the execution time which is approximately two times faster than what we've started with. I've discussed it with @lukmccall (who is an Android & JNI expert) and honestly at this point we don't think it's worth trying to optimize it further, especially taking into account that the other 50% of the execution time originates from the JS part of Reanimated (useAnimatedStyle updater, valueSetter step) and most likely this is what we should focus on in the first place. We also observed an increased activity of GC due to a high number of allocations and deallocations which also needs to be taken care of. My next step is to profile the JS part of Reanimated using JSC or Hermes profiler. I really hope to find something interesting there that can be optimized. |
Here are the results of profiling Reanimated runtime using Hermes profiler via Chrome DevTools. Here are the results are from OPPO A16 running Example app in debug mode: Thanks to this, I was able to identify the following bottlenecks:
|
@tomekzaw styleDiff is required to avoid calling _updateProps when there are no actual changes. |
Hey @sergeymorkovkin, just out of curiosity I've implemented another optimization, see a13f33d. This time, along with batching prop updates and calling
The optimization is to have just one map where the keys consist of the view tag and the prop name (like
This significantly optimizes the number of JNI calls since previously there were 2 calls ( Note that currently the PoC implementation only supports values of double type (adding support for integers or strings should be easy) and also doesn't optimize for nested fields (like The results are quite promising, see the attached table. Also, the more animated components, the better speedup.
|
Any update for this ? |
@bolan9999 Yep, first we need to finish #4503 which adds JSI-based batching and then we'll be able to go deeper with JNI-level batching to make Android faster. |
Do you have a Benchmarks for this update totally? In my scene,I'm trying to rewrite a ScrollView and Recyclable list with Reanimated and GestureHandler. But when everything is done, the Benchmarks of it is a little bad. What's worse more is frame drops randomly on Android when scrolling 3 min or longer. But it works well when I stoped scrolling and wait for a few seconds. |
@bolan9999 Do you memoize gestures? |
@tomekzaw What do you mean it is experimental. And I will open source and take some Benchmarks latter. But muti-items animations with Reanimated causes frame drops is real. The only thing I can do is reducing the count of items and worklet calls. |
@bolan9999 If you use Gesture Handler v2 API ( const tap = useMemo(() => {
return Gesture.Tap.onStart(/* ... */);
}, []);
<GestureDetector gesture={tap}> |
@tomekzaw It is not re-rendered in my scene at all. And I don't use |
I'm pretty sure worklet calls or |
Getting the same issue animating const itemSize = 560
const offset = itemSize / 2
const confettiTranslateY = useSharedValue(0);
const imageTranslateY = useSharedValue(0);
...
useEffect(() => {
confettiTranslateY.value = withRepeat(
withSpring(-50, {
damping: 8,
}),
-1,
true,
);
imageTranslateY.value = withRepeat(
withTiming(-10 * itemSize + offset, {
duration: 10_000,
easing: Easing.inOut(Easing.exp),
}),
-1,
true,
);
}, [])
...
const confettiStyle = useAnimatedStyle(
() => ({ transform: [{ translateY: confettiTranslateY.value }] }),
[confettiTranslateY],
);
const imageStyle = useAnimatedStyle(
() => ({ transform: [{ translateY: imageTranslateY.value }] }),
[imageTranslateY],
);
... |
I'm experiencing the same issue. Getting 60 fps on all iOS devices (see this post), but it's constant 16 fps on both JS and UI thread on my low end Android device (Samsung A11) even when there are no re-renders and nothing happening on the JS thread. |
Description
On the lower-end Android devices (worth $100-150) Reanimated 2/3 can only handle maximum 30 simultaneous elements per frame. Animating more than that results in serious frame drops. In my tests I was using OPPO A16.
It's been tested with or without new architecture and Fabric.
It's been tested with or without @3.0.0-rc8 shared variables performance update.
It's been tested with or without requestAnimationFrame optimization.
I'm still researching this issue and it boils down to a few problems:
Inside JSI runtime the requestAnimationFrame function calls are done synchronously via JSI, while every useAnimatedStyle instance does it's own call to rAF. It's been addressed by @kmagiera here.
Every updateProps call is too heavy and takes ~2ms in debug build and ~0.3-0.5ms in release build (fabric/paper respectively). The larger generated style object - the longer it takes, since JSI does argument conversion on a CPU.
THIS IS AN ASSUMPTION. I've noticed react-navigation mappers somehow get involved into a user-defined animation. Need more time to dig into this.
VIDEO CRASHES.md
Steps to reproduce
Actual result: FPS is somewhere between 30-40 FPS for 100 elements and app crashes quickly.
Expected result: should be able to animate 100-150 elements at 60 FPS with no crashes.
Snack or a link to a repository
https://github.com/sergeymorkovkin/test-reanimated
Reanimated version
3.0.0-rc8
React Native version
0.70.6
Platforms
Android
JavaScript runtime
Hermes
Workflow
React Native (without Expo)
Architecture
None
Build type
Release mode
Device
Real device
Device model
OPPO A16
Acknowledgements
Yes
The text was updated successfully, but these errors were encountered: