Google’s MediaPipe unveils experimental LLM inference API for developers

MediaPipe, Google's open-source framework for machine learning (ML), launched a new LLM inference API yesterday (7 March).

The new experimental API allows LLMs to run entirely on-device across platforms, overcoming the memory and computing challenges associated with LLMs.

TensorFlow Lite has streamlined on-device ML for web developers since 2017, with MediaPipe supporting complete ML pipelines since 2019, Google said.

The new API supports web, Android and iOS and initially includes four openly available LLMs: Gemma, Phi 2, Falcon and Stable LM.

Android developers can access the MediaPipe LLM inference API for experimental and research use, while production applications using LLMs on Android can use the Gemini API or on-device Gemini Nano.

Google said the new API simplifies integration for web developers, allowing them to prototype and test openly available LLMs on-device. It will also provide metrics like prefill and decode speed to measure an LLM's performance.
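For a sense of what that integration involves, below is a minimal sketch of loading a model and generating text in the browser with the web SDK. The package name, CDN URL, model filename and option values are illustrative assumptions based on MediaPipe's Tasks API conventions, not details from the announcement.

```ts
// Minimal sketch: running an openly available LLM in the browser with
// the MediaPipe LLM inference API. Package name, CDN URL and model
// path are assumptions, not confirmed details from the announcement.
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

async function runOnDeviceLlm(): Promise<void> {
  // Load the WebAssembly runtime backing MediaPipe's GenAI tasks.
  const genAiFileset = await FilesetResolver.forGenAiTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
  );

  // Create the inference task from a locally hosted, converted model.
  const llm = await LlmInference.createFromOptions(genAiFileset, {
    baseOptions: {
      // Hypothetical path to a quantised Gemma checkpoint.
      modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin",
    },
    maxTokens: 512, // combined prompt and response token budget
    topK: 40,
    temperature: 0.8,
  });

  // Generation runs entirely on-device; no server round trip.
  const response = await llm.generateResponse(
    "Explain on-device ML inference in one sentence."
  );
  console.log(response);
}

runOnDeviceLlm();
```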

The API can be tried via a web demo or by building sample demo apps using the provided SDKs for web, Android or iOS.
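As a sketch of what one of those sample apps might do, the web SDK also accepts a progress listener, so a demo can render partial results as tokens are decoded. Again, the callback signature follows MediaPipe's Tasks conventions and the element ID is hypothetical.

```ts
// Sketch: streaming partial results into the page as they decode,
// reusing the `llm` task from the previous example. The element ID
// is hypothetical.
const output = document.getElementById("llm-output")!;

llm.generateResponse(
  "Summarise the benefits of on-device inference.",
  (partialResult: string, done: boolean) => {
    // Each callback delivers newly decoded text; append it as it arrives.
    output.textContent += partialResult;
    if (done) {
      console.log("Generation complete.");
    }
  }
);
```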

MediaPipe plans to expand the LLM inference API, introducing more platforms, models, conversion tools, on-device components, high-level tasks and further optimisations in 2024.

In a survey conducted by GlobalData in Q4 2023, around 54% of businesses said that AI had already begun to tangibly disrupt their industry, while 56% of respondents believed that AI would live up to all of its promises.

GenAI is also predicted to be the fastest-growing segment of AI, according to GlobalData's research, with revenues expected to grow from $1.8bn to $33bn between 2022 and 2027.
