Getting started
The API lets you make the best use of the device's hardware to run local AI in the browser as performantly as possible. It's based on the Chrome built-in AI APIs, but adds support for features such as custom and HuggingFace models, grammar schemas, JSON output, LoRA adapters, embeddings, and a fallback to a self-hosted or public server for lower-powered devices.
Take a look at the Feature comparison table for each implementation.
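For a first taste of the API, the sketch below creates a language model session and prompts it. It assumes a window.aibrow entry point with a LanguageModel interface mirroring the Chrome built-in AI Prompt API; check the API reference for the exact names your version exposes.

```typescript
// Minimal sketch: create an on-device session and prompt it.
// Assumes window.aibrow mirrors the Chrome built-in AI Prompt API shape.
async function askLocally(question: string): Promise<string> {
  const aibrow = (window as any).aibrow
  if (!aibrow) {
    throw new Error('AiBrow is not available in this browser')
  }

  // Create a language model session; the model is downloaded on first use
  const session = await aibrow.LanguageModel.create()

  // Run the prompt entirely on-device and return the text result
  return await session.prompt(question)
}

askLocally('Write a one-sentence summary of WebGPU').then(console.log)
```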
AiBrow extension using llama.cpp natively
Using the AiBrow extension gives the best on-device performance with the broadest feature set. It's a browser extension that leverages the powerful llama.cpp and delivers strong performance on all kinds of desktop computers, using either the GPU or the CPU. Downloaded models are stored in a shared repository, so each model only needs to be downloaded once. You can use models provided by AiBrow, or any GGUF model hosted on HuggingFace.
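As a rough sketch of how a custom GGUF model might be selected through the extension, the snippet below passes a model identifier and a download monitor when creating a session. The `model` option name, the identifier format, and the `monitor`/`downloadprogress` support are assumptions here; consult the models documentation for the exact options.

```typescript
// Sketch: use a specific GGUF model from HuggingFace via the extension.
// The "model" option name and the identifier format are assumptions, not confirmed API.
const aibrow = (window as any).aibrow

const session = await aibrow.LanguageModel.create({
  model: '<huggingface-gguf-model-id>', // hypothetical placeholder; see the models docs for the real format
  monitor(m: EventTarget) {
    // Report download progress the first time the model is fetched
    m.addEventListener('downloadprogress', (e: any) => {
      console.log(`Downloaded ${Math.round(e.loaded * 100)}%`)
    })
  }
})

console.log(await session.prompt('Hello!'))
```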
AiBrow on WebGPU
WebGPU provides a good middle ground between performance and feature set, but it comes with memory-usage restrictions and some performance overhead. If you only need small models, or want a fallback for when the extension isn't installed, it can be a great solution. Under the hood it uses transformers.js from HuggingFace. Models are downloaded through an AiBrow frame, so each model only needs to be downloaded once. You can use models provided by AiBrow, or any ONNX model hosted on HuggingFace.
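One way to combine the two backends is to prefer the extension when it is present and otherwise fall back to the WebGPU build. The sketch below only illustrates that detection pattern; the `@aibrow/web` import is a placeholder assumption, so use whatever package or script the installation docs point at for the WebGPU build.

```typescript
// Sketch of an extension-first / WebGPU-fallback pattern.
// "@aibrow/web" is a placeholder assumption for the WebGPU build.
import * as webAibrow from '@aibrow/web'

type AibrowLike = { LanguageModel: { create(options?: unknown): Promise<any> } }

function pickBackend(): AibrowLike {
  const extension = (window as any).aibrow
  // Prefer the extension (native llama.cpp) when it is installed,
  // otherwise fall back to the in-page WebGPU implementation.
  return extension ?? (webAibrow as unknown as AibrowLike)
}

const session = await pickBackend().LanguageModel.create()
console.log(await session.prompt('Which backend answered this?'))
```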
Chrome built-in AI
The Chrome built-in AI is a great option for simple tasks such as summarization and writing. It has a smaller feature set than the AiBrow extension and WebGPU implementations, but offers reasonable on-device performance.
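For example, a summarization task with the built-in API looks roughly like the sketch below. The Summarizer global is only available in recent Chrome releases with the feature enabled, so treat this as a sketch rather than a guaranteed call path.

```typescript
// Sketch: summarize text with Chrome's built-in Summarizer API.
// Requires a Chrome version that exposes the Summarizer global and has the model available.
async function summarize(text: string): Promise<string> {
  const Summarizer = (self as any).Summarizer
  if (!Summarizer || (await Summarizer.availability()) === 'unavailable') {
    throw new Error('Built-in summarization is not available in this browser')
  }

  const summarizer = await Summarizer.create({
    type: 'key-points',  // other types include 'tldr' and 'headline'
    format: 'plain-text',
    length: 'short'
  })
  return await summarizer.summarize(text)
}
```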