SwiftFormer meets Android #14
Comments
Update. The results were so discouraging that I had to benchmark |
Thank you for the update. Could you kindly conduct benchmark tests for both (MobileViT or MobileViT2×1) in addition to EfficientFormer_L1? I understand that we may not achieve the exact performance on the S23 (Ultra) as observed on the iPhone 14 (Pro Max) due to variations in hardware. Please note that EfficientFormer_L1 has demonstrated comparable speed to SwiftFormer_L1 on the iPhone 14 (Pro Max). If you manage to replicate EfficientFormer_L1 on the S23 Ultra with a runtime of 2.63 msec, it suggests that the ANE of the iPhone 14 Pro Max is faster than the GPU or ANE on the S23 Ultra. If EfficientFormer_L1 significantly outperforms SwiftFormer_L1, it may indicate that the activations, normalization, and certain layers of SwiftFormer_L1 are not optimized for the S23 Ultra. This could mean that SwiftFormer requires additional optimization for optimal performance on this hardware. I would appreciate your thoughts on this proposed plan. Thank you. |
Agreed, SwiftFormer_L1 (PyTorch implementation) + QNN 2.16 (+ my way of porting) may be leaving room for optimization. 😉 I will leave it to someone else as:
Perhaps add/edit your message with the relevant links for those arch 😊 |
I can do that soon and will update you 😄 I would be grateful if you could provide details on the steps or requirements involved in measuring the inference time on the S23 Ultra. For iOS, Apple has introduced a valuable feature in their IDE (Xcode 14) that allows for the measurement of prediction time, load time, and compilation time. Could you please share this information or update the forked repository with these specific details on Android? I am following your repo and already checked the export file. |
There are multiple ways to port an ML model to Android 😊. Feel free to rename the issue accordingly. I wrote it that way for marketing reasons 😉 My approach is specific to Qualcomm hardware using QNN.
I'm preparing a tutorial for other folks in my org. I will share the slides later in Q1/Q2. |
Attaching the latency results,
The JSON files were generated with an internal/private tool. However, QNN docs provide all the info to parse the binary with the profiling results from step 3. The TXT-file was generated by a tiny wrapper digesting the JSON. report_ops.txt |
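For readers who want to build a similar report, here is a minimal sketch of a wrapper that digests per-op profiling JSON into a text summary. It is not the internal tool mentioned above. The JSON layout (a list of records with `op` and `time_us` keys) and the function name `summarize_ops` are assumptions made for illustration; the real QNN profiling output has to be parsed as described in the QNN docs.

```python
import json
from collections import defaultdict


def summarize_ops(profile_json: str) -> str:
    """Aggregate per-op latency from a JSON string into a plain-text report.

    Assumed (hypothetical) schema: a list of {"op": str, "time_us": number}.
    """
    records = json.loads(profile_json)

    # Sum the time spent in each op type across all records.
    totals = defaultdict(float)
    for rec in records:
        totals[rec["op"]] += rec["time_us"]

    grand_total = sum(totals.values())

    # One line per op type, slowest first, with its share of the total time.
    lines = []
    for op, t in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * t / grand_total if grand_total else 0.0
        lines.append(f"{op:<20}{t:>12.1f} us{share:>7.1f}%")
    return "\n".join(lines)
```

Sorting by total time puts the most expensive op types at the top, which is usually the first thing to look at when hunting for layers that map poorly to the target hardware.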
(Perhaps) good news: the latency of the block that I'm interested in improving got a speed-up of 1.27x by using QNN >= 2.17. With enough ⭐s on my fork, I may be persuaded to benchmark SwiftFormer L1 😊 🤣 |
That's great! 🚀 If you benchmark the SwiftFormer models (let's say L1), we can do a pull request and I will add you as a contributor to the main repo with a special shoutout in the acknowledgments 👀. Isn't it a good deal? 🤣 |
Pushed the latency results of SwiftFormer L1 with QNN 2.17 & 2.18. The improvement is as much as 1.16x.
Done with 80% of my duties. Awaiting instruction for the 20% & collecting the brownie points mentioned earlier 🍪 |
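As a reading aid for the speed-up figures in this thread, the sketch below assumes that "N x" means baseline latency divided by new latency (the thread does not state the definition explicitly). The function names are hypothetical and the numbers in the usage line are illustrative, not measurements.

```python
def speedup(baseline_ms: float, new_ms: float) -> float:
    """Speed-up factor, assuming the convention baseline latency / new latency."""
    if baseline_ms <= 0 or new_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / new_ms


def latency_after_speedup(baseline_ms: float, factor: float) -> float:
    """Latency implied by applying a given speed-up factor to a baseline."""
    if baseline_ms <= 0 or factor <= 0:
        raise ValueError("baseline and factor must be positive")
    return baseline_ms / factor


# Illustrative numbers only: a block going from 4.0 ms to 2.0 ms is a 2.0x speed-up.
print(speedup(4.0, 2.0))
```

Under this convention, a 1.16x speed-up means the new latency is roughly 86% of the baseline.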
You have my word on it 💯. Here we go! Please create a pull request to the readme file of the main repo with the following change: create a new sub-section under "Latency Measurement" named "SwiftFormer meets Android" (I liked the name). In this section, you can add the two tables (SwiftFormer Encoder & SwiftFormer-L1) with the latency measurements for the different QNN versions (feel free to add the scripts as well). Then I will check and merge the pull request, and you will automatically be added as a contributor! 🚀 Following this, I'll update the acknowledgment, earning you a well-deserved second brownie 🍪 |
Community-driven contributions: SwiftFormer meets Android. Qualcomm S8G2 DSP/HTP hardware, via Qualcomm tooling (QNN). Details in Amshaker#14. Work done by @3scorciav. Refer to his fork for details.
Thanks for merging 🥰. Let's keep the issue for 6-12 months in case someone else is interested in improving runtime performance, or exploring other porting avenue for Android 😉 |
As mentioned in #13, I forked the project to bring SwiftFormer to Android (on Qualcomm hardware).
As of today, the performance of a single block is not encouraging: under 2.2 msec, measured on an S23 Ultra (S8G2) with QNN 2.16. Details here