Choosing the Best AI Framework for Real-Time Apps

Ok… you’ve decided to build a real-time AI app. What framework do you choose for it? The answer is not trivial at all. Inference performance matters more than you think for real-time solutions, especially on mobile devices, and it directly shapes your UX. The market is filled with solutions that are not compatible with each other. Modern neural networks can become very convoluted, with hundreds of layers. They are mostly trained on NVIDIA GPUs, where performance is not an issue. On-device, that is not the case: try to deploy these models to your fancy iPhone and you’ll take a performance hit. That’s why we still don’t get a really nice AI experience on mobile, even with all the NPU accelerators in place.

I decided to do my own research… The demo is in my video on YT below, and this article is the TL;DR of it.

We are presented with a few main players. I’ll start with the native Apple solution.

CoreML

CoreML is Apple’s native framework for ML inference. I highlight the word inference because I’ve never seen it used for model training. For training, the CPU or MPS (Metal Performance Shaders) are the more common way forward.
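
The usual workflow is to train in PyTorch and then convert the model with coremltools for on-device inference. Below is a minimal sketch of that conversion; the MobileNetV3 model, the input name "image", and the output path are my own placeholders for illustration, not anything from the video.

```python
import torch
import coremltools as ct
from torchvision.models import mobilenet_v3_small

# Placeholder model: any traced torch.nn.Module works the same way.
model = mobilenet_v3_small(weights="DEFAULT").eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# compute_units only hints where inference may run (CPU, GPU, Neural Engine);
# the final placement is decided by CoreML at runtime.
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("MobileNetV3Small.mlpackage")
```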

OpenCV DNN

DNN stands for Deep Neural Networks, by the way, though the module is mainly used for inference. The API is easy to use and intuitive. Still, performance is not that great for CPU inference, there is no out-of-the-box CoreML acceleration, and the module is rigid, hard to extend, and slow to progress.
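
For comparison, here is roughly what DNN-module inference looks like in Python. The ONNX file, input size, and image path are placeholders I made up; note how acceleration has to be requested explicitly, and CoreML is not among the available targets.

```python
import cv2
import numpy as np

# Placeholder: a classification model exported to ONNX with a 224x224 RGB input.
net = cv2.dnn.readNetFromONNX("model.onnx")

# By default everything runs on the CPU; other backends/targets must be set explicitly.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(
    image, scalefactor=1.0 / 255, size=(224, 224), swapRB=True
)
net.setInput(blob)
scores = net.forward()
print("top class:", int(np.argmax(scores)))
```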

ExecuTorch

The way to go. It is statically linked and offers an easy-to-use API. It supports many on-device backends, including CoreML. It also supports on-device training, though I haven’t tried that yet. There are lots of extension points for extending the API.
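
To give a feel for the ahead-of-time flow, here is a rough sketch of exporting a toy model to a .pte file that the statically linked runtime loads on device. The ExecuTorch Python APIs have shifted between releases, so treat this as an outline rather than exact code; SimpleNet and the file name are placeholders, and backend delegation (e.g. to CoreML) would happen at the edge-lowering step.

```python
import torch
from torch.export import export
from executorch.exir import to_edge

# Placeholder model just to show the export pipeline.
class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = SimpleNet().eval()
example_inputs = (torch.rand(1, 16),)

# 1) capture the graph, 2) lower to the Edge dialect, 3) serialize to .pte.
exported = export(model, example_inputs)
edge = to_edge(exported)
program = edge.to_executorch()

with open("simple_net.pte", "wb") as f:
    f.write(program.buffer)
```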

Onnxruntime

Microsoft’s runtime. Bulky and unintuitive, and performance is mediocre. Though the new G-API, with its more graph-oriented design and support for native acceleration backends, may yet save them.
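
For completeness, a minimal inference sketch. The model path and input name are placeholders, and the CoreML provider is only available in ONNX Runtime builds that include the CoreML execution provider; otherwise the session silently falls back to the CPU provider.

```python
import numpy as np
import onnxruntime as ort

# Acceleration is requested through execution providers, in priority order.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": x})
print(outputs[0].shape)
```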

Architectural Dilemma

It’s strange that no one sees a problem with the modern CPU architecture. Delegating workloads to different compute units doesn’t seem to work that well and should be changed.

Even if you have an NPU on your device, it doesn’t guarantee an instant improvement. Unlike with the CPU or even the GPU, you need to go through special frameworks and APIs to get any acceleration, and the device itself is hidden behind so many layers of abstraction that performance gets lost along the way. If you get 2-5x you are lucky, but for real-time ML inference we need at least 10-100x: an order of magnitude or more to get smooth-as-butter results. To achieve this, the acceleration should be embedded inside the CPU itself, down to the level of simple instructions, so you don’t need special dances to get an instant improvement. It’s unclear why we need NPUs at all: GPUs are more performant and more universal, and can be used for both training and inference.
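
To put that in numbers, here is a quick back-of-envelope sketch. The 200 ms baseline latency is my own assumption for a heavy model on a mobile CPU, not a measured figure; the point is how far a 2-5x speedup falls short of a 60 fps real-time budget.

```python
# Compare speedups against a 60 fps budget (~16.7 ms per frame).
baseline_ms = 200.0  # assumed CPU latency for one frame of a heavy model
budget_ms = 1000.0 / 60

for speedup in (2, 5, 10, 100):
    latency = baseline_ms / speedup
    fps = 1000.0 / latency
    verdict = "fits the budget" if latency <= budget_ms else "too slow"
    print(f"{speedup:>3}x -> {latency:6.1f} ms/frame ({fps:5.1f} fps): {verdict}")
```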

Conclusion

🍎 CoreML is good if you only target Apple’s ecosystem. Performance is good, but there are still too many gaps and it targets fairly simple models. The market is too big to ignore, though. I can give Apple credit for relatively good support of conversion from PyTorch.

🤖 OpenCV feels like a bazaar of different solutions brought together by a skillful sensei. The DNN API is user friendly. Still, performance and support for Apple’s ecosystem could have been better.

👍🏻 ExecuTorch is in fact great, though you need to fine-tune your models to make them work well on Apple devices. Compilation is not trivial: there are no release packages available, so you need to build it by hand, and you also need to get familiar with the code. If you are targeting many platforms, this is your ultimate choice.

👎🏻 Onnxruntime is the outsider, with mediocre performance and vague APIs. I don’t recommend it to anyone.

Each framework performs differently, each has its own pros and cons, and you need to know all of them to be productive. That doesn’t look sustainable. There has to be some universal approach, though ExecuTorch has potential.
