Chrome Extension AI: How to Use Built-in On-Device and Cloud AI APIs in 2026
The AI landscape for developers just shifted. Android introduced a hybrid inference API that lets apps switch seamlessly between on-device and cloud AI depending on context. Chrome has done the same, quietly, and in a way that directly benefits Chrome extension developers.
As of Chrome 138+, Gemini Nano runs locally in the browser. No API key. No server round-trip. No per-token billing. Your extension can call natural language AI directly from a content script or background service worker.
This guide covers everything: what Chrome’s built-in AI APIs are, how they compare to cloud inference, when to use each, and practical code you can ship today.
What Is Chrome’s Built-in On-Device AI?
Chrome ships with Gemini Nano — a small but capable language model that runs entirely on the user’s device using the GPU and NPU. It powers a family of task-specific APIs exposed to extensions and web apps through the window.ai namespace (and equivalent APIs accessible from extension contexts).
The model never leaves the device. Processing happens locally. That means:
- Zero API cost per request
- No data sent to external servers
- Works offline
- Near-zero latency (no network round-trip)
The tradeoff is capability. Gemini Nano is not GPT-4o or Gemini Ultra. It is optimized for common, well-defined tasks — summarizing, translating, rewriting, classifying — not open-ended reasoning or complex multi-step tasks.
Chrome exposes five built-in AI APIs:
| API | Purpose |
|---|---|
| Prompt API | General-purpose language model prompting |
| Summarizer API | Condensing long text into summaries |
| Translator API | Language translation between 12+ languages |
| Language Detection API | Detecting the language of a text string |
| Writer / Rewriter API | Generating or rephrasing content |
Let us walk through each.
The 5 Chrome Built-in AI APIs for Extension Developers
1. Prompt API — General-Purpose On-Device Inference
The Prompt API is the most flexible of the five. It lets you send free-form prompts to Gemini Nano and receive a response — just like calling an LLM API, but entirely on-device.
```javascript
// Check availability before using
const { available } = await window.ai.languageModel.capabilities();

if (available === 'readily') {
  const session = await window.ai.languageModel.create({
    systemPrompt: 'You are a helpful assistant for Chrome extension users.'
  });

  const result = await session.prompt(
    'Summarize the key points from this support ticket: ' + ticketText
  );

  console.log(result);
  session.destroy(); // Free memory when done
}
```

Best for: Classification, intent detection, Q&A, custom instructions that do not fit a specialized API.
Availability check is mandatory. Not all devices have Gemini Nano ready — older hardware or users who have never triggered a download may return 'after-download' or 'no'. Always handle gracefully.
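Handling those three states consistently is easier with a small helper. The sketch below is a hypothetical utility (not part of the API); it only maps the `capabilities().available` values used in this guide — `'readily'`, `'after-download'`, `'no'` — to an action your extension can take.

```javascript
// Map a capabilities().available value to a plan of action.
// States follow the shape used throughout this guide.
function planForAvailability(available) {
  switch (available) {
    case 'readily':
      return { useOnDevice: true, needsDownload: false };
    case 'after-download':
      // Model must be fetched first; creating a session triggers the download.
      return { useOnDevice: true, needsDownload: true };
    default:
      // 'no' (or an unknown value): fall back to cloud or disable the feature.
      return { useOnDevice: false, needsDownload: false };
  }
}
```

Usage sketch (browser-only): `const plan = planForAvailability((await window.ai.languageModel.capabilities()).available);` — if `plan.useOnDevice` is false, route the request to your cloud fallback instead.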
2. Summarizer API — Condense Any Text
The Summarizer API is purpose-built for distillation. Pass it a block of text and get back a summary in the format you choose: `key-points`, `tl;dr`, `teaser`, or `headline`.
```javascript
const summarizerCapabilities = await window.ai.summarizer.capabilities();

if (summarizerCapabilities.available !== 'no') {
  const summarizer = await window.ai.summarizer.create({
    type: 'key-points',
    format: 'markdown',
    length: 'short'
  });

  // Trigger download if needed
  if (summarizerCapabilities.available === 'after-download') {
    await summarizer.ready;
  }

  const summary = await summarizer.summarize(articleText);
  console.log(summary);
  summarizer.destroy();
}
```

Best for: Article digests, email triage, meeting notes, content previews in sidepanels.
3. Translator API — Multilingual Without an API Key
The Translator API handles translation between a growing list of language pairs. Unlike Google Translate or DeepL, there is no cost and no network dependency after the language pack is downloaded.
```javascript
const languagePair = { sourceLanguage: 'en', targetLanguage: 'es' };
const translatorCapabilities = await translation.canTranslate(languagePair);

if (translatorCapabilities !== 'no') {
  const translator = await translation.createTranslator(languagePair);

  if (translatorCapabilities === 'after-download') {
    await translator.ready;
  }

  const translated = await translator.translate('Hello, how can I help you?');
  console.log(translated); // "Hola, ¿cómo puedo ayudarte?"
}
```

Best for: Real-time sidepanel translation, multilingual form helpers, customer support extensions.
4. Language Detection API — Know What Language You Are Working With
Before translating or routing content, you need to know what language it is in. The Language Detection API gives you that with confidence scores — entirely on-device.
```javascript
const detector = await translation.createDetector();
const results = await detector.detect(userInputText);

// results is an array of { detectedLanguage, confidence }
const topResult = results[0];
console.log(topResult.detectedLanguage); // e.g. "fr"
console.log(topResult.confidence);       // e.g. 0.97
```

Best for: Auto-routing to the correct translation pair, tagging content by language, skipping translation when unnecessary.
5. Writer and Rewriter APIs — Generate and Refine Content
The Writer API generates new content from a prompt and context. The Rewriter API takes existing content and transforms it — changing tone, length, or formality — without rewriting intent.
```javascript
// Writer: generate from scratch
const writer = await window.ai.writer.create({
  tone: 'formal',
  length: 'short',
  format: 'plain-text'
});

const reply = await writer.write(
  'Write a professional response to a refund request',
  { context: 'Customer says: "I want a refund for order #1234"' }
);

// Rewriter: transform existing content
const rewriter = await window.ai.rewriter.create({
  tone: 'more-casual',
  length: 'shorter'
});

const simplified = await rewriter.rewrite(technicalDocText);
```

Best for: Smart reply suggestions, tone adjustment for emails, content rephrasing for accessibility.
On-Device vs Cloud AI: The Full Comparison
This is the core decision every extension developer faces. Here is the unvarnished comparison:
| | Built-in AI (On-Device) | Cloud API (OpenAI, Gemini, Claude) |
|---|---|---|
| Latency | ~50–200ms (local GPU) | 500ms–3s+ (network) |
| Privacy | Data never leaves device | Data sent to provider servers |
| Cost | Free | Per-token billing |
| Offline Support | Full (after model download) | None |
| Model Quality | Good for simple tasks | Excellent for complex reasoning |
| Context Window | Limited (~4K tokens typical) | Large (8K–200K+ tokens) |
| Availability | Chrome 138+, compatible hardware | Any browser, any device |
| Streaming | Supported | Supported |
| Fine-tuning | Not available | Available (some providers) |
| Setup Complexity | No keys, no backend | API keys, CORS handling, rate limits |
Neither option is universally better. The right answer depends on what you are building.
When to Use Built-in AI vs External APIs
Use Chrome’s built-in AI when:
- The task is well-defined. Summarization, translation, classification, and rewriting are exactly what these APIs are optimized for. Do not use the cloud for what Gemini Nano does well locally.
- Privacy is a selling point. If your extension handles sensitive documents, emails, or health data, on-device inference is a hard requirement, not a nice-to-have. Users and enterprise buyers will ask.
- Your users include power users with no internet. Researchers, writers, and analysts who use extensions offline need AI that works offline.
- You have zero infrastructure budget. Building a side project or MVP? On-device AI means no API billing, no key management, no backend proxy.
- Latency matters more than accuracy. Real-time sidepanel suggestions, autocomplete, or live translation require sub-second responses that cloud APIs cannot reliably deliver.
Use cloud APIs when:
- You need high-quality reasoning. Complex analysis, multi-step instructions, code generation, or nuanced content creation require a larger model. Gemini Nano will not outperform GPT-4o or Gemini 1.5 Pro on hard tasks.
- You need large context windows. Analyzing a full PDF, summarizing an entire repository, or processing long legal documents requires more than Gemini Nano’s context limit.
- Your users are on older hardware or non-Chrome browsers. Built-in AI requires Chrome 138+ and compatible GPU/NPU. Firefox and Safari users get nothing.
- You need model-specific capabilities. Vision inputs, structured JSON output with complex schemas, or tool use at scale require cloud providers.
- Consistency and auditability matter. Cloud providers offer versioned models, audit logs, and compliance certifications that on-device inference cannot match.
The hybrid approach (recommended)
The smartest architecture mirrors what Android’s hybrid inference API does: attempt on-device first, fall back to cloud when necessary.
```javascript
async function smartSummarize(text) {
  // Try on-device first
  const capabilities = await window.ai.summarizer.capabilities();

  if (capabilities.available === 'readily') {
    const summarizer = await window.ai.summarizer.create({
      type: 'tl;dr',
      length: 'short'
    });
    const result = await summarizer.summarize(text);
    summarizer.destroy();
    return { result, source: 'on-device' };
  }

  // Fall back to cloud. apiKey should come from secure storage or a backend
  // proxy — never hard-code it in the extension bundle.
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Summarize the following text in 2-3 sentences.' },
        { role: 'user', content: text }
      ]
    })
  });

  const data = await response.json();
  return { result: data.choices[0].message.content, source: 'cloud' };
}
```

This pattern gives users the best of both: fast, private, free inference when available, with a reliable fallback when not.
Practical Use Cases for On-Device AI in Extensions
Smart Email Assistant
A Gmail or Outlook extension that reads the current email thread and suggests a reply — entirely on-device. Use the Prompt API for intent detection, Writer API for draft generation, and Rewriter API for tone adjustment.
No email content ever leaves the browser. That is a feature, not a footnote.
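A minimal sketch of that pipeline is below. The `ai` parameter stands in for `window.ai` (injected so the flow is testable outside the browser); the option names and session shapes mirror the API forms shown earlier in this guide, and the intent labels are illustrative.

```javascript
// On-device reply pipeline: classify intent, then draft a reply.
// `ai` is injected (window.ai in the extension) so the flow stays testable.
async function suggestReply(ai, threadText) {
  // 1. Intent detection via the Prompt API.
  const session = await ai.languageModel.create({
    systemPrompt: 'Classify the email intent as one word: question, complaint, or request.'
  });
  const intent = (await session.prompt(threadText)).trim().toLowerCase();
  session.destroy(); // Free memory when done

  // 2. Draft a reply with the Writer API, passing the thread as context.
  const writer = await ai.writer.create({ tone: 'formal', length: 'short' });
  const draft = await writer.write(
    `Reply to this ${intent} email`,
    { context: threadText }
  );

  return { intent, draft };
}
```

In the extension itself you would call `suggestReply(window.ai, threadText)` and hand the draft to the Rewriter API if the user asks for a tone change.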
Content Summarizer Sidepanel
A reading extension that adds a sidepanel to any article page. When opened, it calls the Summarizer API with the page’s main content and displays a key-points list in under 200ms. Works offline. Zero cost at scale.
```javascript
// content script
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === 'summarize') {
    // Do the async work in a separate function: an async listener returns a
    // Promise, not the literal `true` Chrome needs to keep the channel open.
    (async () => {
      const articleText =
        document.querySelector('article')?.innerText || document.body.innerText;
      const capabilities = await window.ai.summarizer.capabilities();

      if (capabilities.available !== 'no') {
        const summarizer = await window.ai.summarizer.create({
          type: 'key-points',
          format: 'markdown',
          length: 'medium'
        });
        if (capabilities.available === 'after-download') {
          await summarizer.ready;
        }
        const summary = await summarizer.summarize(articleText.slice(0, 8000));
        summarizer.destroy();
        sendResponse({ summary });
      } else {
        sendResponse({ summary: null });
      }
    })();
    return true; // Keep message channel open for async response
  }
});
```

Real-Time Translator Sidebar
A sidepanel extension that translates selected text as the user highlights it. The Language Detection API identifies the source language automatically. The Translator API handles conversion. The result appears in the sidebar within milliseconds.
Add a cloud fallback for unsupported language pairs and you have a production-grade translation tool built entirely with browser-native APIs.
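That detect-then-translate flow can be sketched as a single function. The `translation` parameter stands in for Chrome's global `translation` object (injected so the logic is testable); the method names match the Translator and Language Detection examples above, and returning `translated: null` is this sketch's convention for "use the cloud fallback".

```javascript
// Detect the source language of selected text, then translate it on-device.
async function translateSelection(translation, text, targetLanguage) {
  const detector = await translation.createDetector();
  const [top] = await detector.detect(text); // highest-confidence result first

  // Skip translation when the text is already in the target language.
  if (top.detectedLanguage === targetLanguage) {
    return { translated: text, sourceLanguage: top.detectedLanguage, skipped: true };
  }

  const pair = { sourceLanguage: top.detectedLanguage, targetLanguage };
  if (await translation.canTranslate(pair) === 'no') {
    // Unsupported pair: signal the caller to use a cloud fallback.
    return { translated: null, sourceLanguage: top.detectedLanguage, skipped: false };
  }

  const translator = await translation.createTranslator(pair);
  return {
    translated: await translator.translate(text),
    sourceLanguage: top.detectedLanguage,
    skipped: false
  };
}
```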
Manifest V3 Compatibility Notes
All five Chrome built-in AI APIs are compatible with Manifest V3. A few things to keep in mind:
- Service workers cannot use streaming responses the same way as pages. Use `promptStreaming()` in content scripts or sidepanels, not background service workers.
- Session objects are not persistent. Service workers are terminated when idle. Create new sessions on demand; do not try to persist `window.ai.languageModel` sessions across wake cycles.
- No special permissions required for built-in AI APIs. You do not need to declare any new permissions in `manifest.json` to use these APIs.
- Content Security Policy. Built-in AI APIs make no external network requests, so they are fully compatible with strict CSP settings.
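For streaming in a sidepanel, a plain reader loop is enough. This sketch assumes `promptStreaming()` returns a `ReadableStream` of text chunks; note that depending on the Chrome version, each chunk may be a delta or the cumulative text so far, so check before concatenating yourself.

```javascript
// Consume a ReadableStream of text chunks (as returned by promptStreaming())
// and forward each chunk to a UI callback as it arrives.
async function consumeStream(stream, onChunk) {
  const reader = stream.getReader();
  const chunks = [];
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
      onChunk(value); // e.g. append to the sidepanel DOM
    }
  } finally {
    reader.releaseLock();
  }
  return chunks;
}
```

Usage sketch in a sidepanel: `await consumeStream(session.promptStreaming(userPrompt), chunk => output.append(chunk));`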
Getting Started Today
Chrome’s built-in AI APIs are available in Chrome 138+ with the “Prompt API for Chrome Extensions” flag enabled for development. For production, Summarizer, Translator, and Language Detection are stable. The Prompt API, Writer, and Rewriter are in origin trial as of early 2026.
Check the Chrome AI APIs origin trial status for the latest availability.
To test your extension’s AI features across different device capabilities, use ExtensionBooster’s free developer tools — including a Chrome extension AI compatibility checker and automated store listing analyzer that helps you communicate your extension’s AI features clearly to users.
The era of “AI tax” — where every inference request burns API budget — is ending for common browser tasks. Start with on-device. Fall back to cloud only when you must.