OasisHost is the interface to webAI’s local AI runtime. It lets your app run language model inference directly on the user’s hardware — no cloud, no API keys, no network requests. Your app acquires the runtime, sends prompts, and receives streamed token responses.
## Quick start

Here’s the minimal flow to get AI working in your app:

```js
import { getOasisHost } from './webai';

async function askAI(prompt) {
  const host = getOasisHost();
  if (!host) throw new Error('AI not available outside webAI.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    const response = await host.request(prompt, {
      systemPrompt: 'You are a helpful assistant.',
      maxTokens: 2048,
      temperature: 0.7,
      // Stream each token to your UI as it arrives
      onToken: (token) => { /* append token to your output element */ },
    });
    return response;
  } finally {
    if (release) release();
  }
}
```
## Checking runtime status

Before sending requests, check whether a model is loaded and ready. OasisHost exposes a `getStatus()` method that returns the current state of the AI runtime.

```js
function getOasisState() {
  const host = getOasisHost();
  if (!host?.getStatus) return 'waiting';
  const status = host.getStatus();
  if (status?.lastModel) return 'ready';
  if (status?.loadingModel || status?.isGenerating) return 'loading';
  return 'waiting';
}
```
The three states your app should handle:

| State | Meaning | Recommended UX |
|---|---|---|
| `ready` | A model is loaded and idle | Enable AI features, show a green indicator |
| `loading` | A model is loading or actively generating | Disable new requests, show a loading state |
| `waiting` | No model loaded or OasisHost unavailable | Show a “no model” notice, disable AI features |
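The three states map naturally onto a small UI helper. A minimal sketch; only the three state strings come from OasisHost, while the labels and shape of the return value are our own:

```js
// Map a runtime state ('ready' | 'loading' | 'waiting') to UI hints.
// The indicator/notice values here are illustrative, not part of the API.
function uiForState(state) {
  switch (state) {
    case 'ready':
      return { enabled: true, indicator: 'green', notice: null };
    case 'loading':
      return { enabled: false, indicator: 'spinner', notice: null };
    default: // 'waiting'
      return { enabled: false, indicator: 'gray', notice: 'No model loaded' };
  }
}
```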
### Full status object

The `getStatus()` return object includes additional fields for advanced use cases:

| Field | Type | Description |
|---|---|---|
| `hasRuntime` | `boolean` | Whether the inference runtime is initialized |
| `lastModel` | `string \| null` | ID of the currently loaded model |
| `loadingModel` | `string \| null` | ID of a model currently being loaded |
| `isGenerating` | `boolean` | Whether the model is actively generating tokens |
| `refCount` | `number` | Number of active runtime consumers |
| `modulesLoaded` | `boolean` | Whether all required modules are loaded |
| `deviceProfile` | `object \| null` | Hardware capabilities (GPU, memory, backend support) |
| `backendSelection` | `object \| null` | Which inference backend was selected and why |
| `modelSelection` | `object \| null` | Which model was selected for the current backend |
| `bootstrapPhase` | `string` | Current boot phase of the runtime |
| `nativeServer` | `object \| null` | Status of the native inference server (MLX/llama.cpp) |

For most apps, checking `lastModel`, `loadingModel`, and `isGenerating` is sufficient. The additional fields are useful for diagnostic tools or apps that need to display detailed runtime info.
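For a diagnostics panel, the extra fields can be folded into a one-line summary. A sketch that assumes only the field names from the table above; the output format is our own:

```js
// Build a short human-readable summary from a getStatus() result.
function summarizeStatus(s) {
  if (!s) return 'no status';
  const parts = [];
  parts.push(s.hasRuntime ? 'runtime up' : 'runtime down');
  if (s.lastModel) parts.push(`model: ${s.lastModel}`);
  if (s.loadingModel) parts.push(`loading: ${s.loadingModel}`);
  if (s.isGenerating) parts.push('generating');
  parts.push(`consumers: ${s.refCount ?? 0}`);
  return parts.join(', ');
}
```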
### Polling for status changes

The AI runtime state can change at any time — a user might load a new model, or a generation might finish. Poll `getStatus()` on an interval to keep your UI in sync.

React (hooks):

```js
const [oasisState, setOasisState] = useState('waiting');

useEffect(() => {
  const probe = () => {
    const host = window.OasisHost ?? window.parent?.OasisHost;
    if (!host?.getStatus) return 'waiting';
    const s = host.getStatus();
    if (s?.lastModel) return 'ready';
    if (s?.loadingModel || s?.isGenerating) return 'loading';
    return 'waiting';
  };
  setOasisState(probe());
  const id = setInterval(() => setOasisState(probe()), 1200);
  return () => clearInterval(id);
}, []);
```

Vue (Composition API):

```js
const oasisState = ref('waiting');
let oasisInterval = null;

onMounted(() => {
  const probe = () => {
    const host = window.OasisHost ?? window.parent?.OasisHost;
    if (!host?.getStatus) return 'waiting';
    const s = host.getStatus();
    if (s?.lastModel) return 'ready';
    if (s?.loadingModel || s?.isGenerating) return 'loading';
    return 'waiting';
  };
  oasisState.value = probe();
  oasisInterval = setInterval(() => { oasisState.value = probe(); }, 1200);
});

onUnmounted(() => clearInterval(oasisInterval));
```
A 1200ms polling interval strikes a good balance between responsiveness and performance. Avoid polling faster than 500ms.
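If the interval is configurable in your app, it is worth clamping it so a caller can't accidentally poll below the 500ms floor. A minimal sketch; the helper name and the fallback behavior are our own:

```js
const MIN_POLL_MS = 500;     // recommended floor from the note above
const DEFAULT_POLL_MS = 1200; // the suggested default

// Clamp a requested polling interval to the recommended floor,
// falling back to the default for missing or non-numeric input.
function pollInterval(requestedMs) {
  if (!Number.isFinite(requestedMs)) return DEFAULT_POLL_MS;
  return Math.max(MIN_POLL_MS, requestedMs);
}
```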
## Acquiring the runtime

Before sending a request, your app must acquire exclusive access to the AI runtime. This prevents multiple apps from competing for the same GPU resources.

```js
const release = await host.acquire({ warmRuntime: true });
```

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `warmRuntime` | `boolean` | When `true`, pre-warms the inference runtime for faster first response |

Returns: A release function. Call it when you’re done with the runtime to let other apps use it.

Always call the release function when your request completes — even if it fails. Use a `try/finally` block to guarantee cleanup.
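The acquire/release discipline can be packaged once so callers can't forget the `finally`. A sketch of such a wrapper; `withRuntime` is our own helper, not part of OasisHost:

```js
// Run `fn` with the runtime acquired, guaranteeing release even on error.
async function withRuntime(host, fn, options = { warmRuntime: true }) {
  const release = await host.acquire(options);
  try {
    return await fn(host);
  } finally {
    if (release) release();
  }
}
```

Callers then write `await withRuntime(host, (h) => h.request(prompt, opts))` and never touch the release function directly.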
## Streaming completions

The core of OasisHost is the `request()` method. It sends a prompt to the loaded model and streams the response back token by token.

```js
const fullText = await host.request(prompt, {
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 2048,
  temperature: 0.7,
  onToken: (token) => {
    // Called for each token as it's generated.
    // Accumulate tokens yourself for UI updates.
  },
});
```

### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `string` | — | The user’s input prompt |
| `systemPrompt` | `string` | `''` | Instructions that guide the model’s behavior |
| `maxTokens` | `number` | `2048` | Maximum number of tokens to generate |
| `temperature` | `number` | `0.7` | Controls randomness. Lower = more deterministic, higher = more creative |
| `onToken` | `(token: string) => void` | — | Callback invoked with each generated token for real-time streaming |
| `persona` | `string` | — | Optional persona type to use for this request |
| `appId` | `string` | — | Your app’s identifier, used for persona permission checks |
| `onPersonaStart` | `(name: string) => void` | — | Called when a persona begins generating (useful in multi-persona mode) |
| `onPersonaEnd` | `(name: string) => void` | — | Called when a persona finishes generating |
### Return value

`request()` returns a `Promise<string>` that resolves to the full accumulated response text once generation is complete.
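A useful property that follows from this contract: the tokens passed to `onToken` concatenate to the same string the promise resolves with. The mock host below is ours, built only to demonstrate that relationship; real behavior comes from OasisHost:

```js
// A mock host whose request() streams tokens, then resolves the full text,
// mirroring the documented contract of OasisHost.request().
const mockHost = {
  async request(prompt, { onToken } = {}) {
    const tokens = ['Hello', ', ', 'world', '!'];
    let full = '';
    for (const t of tokens) {
      full += t;
      if (onToken) onToken(t);
    }
    return full;
  },
};

async function demo() {
  let streamed = '';
  const full = await mockHost.request('hi', {
    onToken: (t) => { streamed += t; },
  });
  return { streamed, full }; // both hold the complete response text
}
```

In practice this means you can render from `onToken` for live updates and still trust the resolved value as the authoritative final text.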
## Complete example

Here’s a reusable helper module that wraps the full acquire → request → release lifecycle:

```js
// src/webai.js
export const getOasisHost = () =>
  window.OasisHost ?? window.parent?.OasisHost ?? null;

// Exported so UI components can poll the runtime state
// (see "Checking runtime status" above).
export function getOasisState() {
  const host = getOasisHost();
  if (!host?.getStatus) return 'waiting';
  const status = host.getStatus();
  if (status?.lastModel) return 'ready';
  if (status?.loadingModel || status?.isGenerating) return 'loading';
  return 'waiting';
}

export async function streamCompletion(prompt, systemPrompt, onToken) {
  const host = getOasisHost();
  if (!host) throw new Error('Oasis AI is not available in this environment.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    return await host.request(prompt, {
      systemPrompt: systemPrompt ?? '',
      maxTokens: 2048,
      temperature: 0.7,
      onToken,
    });
  } finally {
    if (release) release();
  }
}
```
And a React component that uses it:

```jsx
import { useState, useEffect } from 'react';
import { getOasisState, streamCompletion } from './webai';

function AIChat() {
  const [oasisState, setOasisState] = useState('waiting');
  const [prompt, setPrompt] = useState('');
  const [output, setOutput] = useState('');
  const [isGenerating, setIsGenerating] = useState(false);

  useEffect(() => {
    setOasisState(getOasisState());
    const id = setInterval(() => setOasisState(getOasisState()), 1200);
    return () => clearInterval(id);
  }, []);

  async function handleRun() {
    if (!prompt.trim() || isGenerating) return;
    setIsGenerating(true);
    setOutput('');
    try {
      await streamCompletion(
        prompt,
        'You are a helpful assistant.',
        (token) => setOutput((prev) => prev + token)
      );
    } catch (err) {
      setOutput('Error: ' + err.message);
    } finally {
      setIsGenerating(false);
    }
  }

  return (
    <div>
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Ask something..."
      />
      <button onClick={handleRun} disabled={isGenerating || oasisState !== 'ready'}>
        {isGenerating ? 'Generating...' : 'Run'}
      </button>
      {output && <pre>{output}</pre>}
    </div>
  );
}
```
## Chat memory

OasisHost provides per-app chat memory that persists across sessions. Memory is stored locally and can be injected into prompts automatically.

### loadAppChatHistory(appId)

Load the persisted chat memory for an app. Returns a structured object containing a rolling summary, learned preferences, recent conversation turns, and tool outcomes.

### clearAppChatHistory(appId)

Clear all chat memory for an app.

### Memory context in requests

When calling `request()`, two options control memory behavior:

| Option | Type | Default | Description |
|---|---|---|---|
| `memoryContext` | `boolean \| 'auto'` | `'auto'` | `'auto'` follows the shell’s Memory toggle. `true` always injects memory. `false` skips memory for this request. |
| `chatSession` | `string` | — | Session ID for isolating memory between independent conversations in the same app. |

Chat history is always auto-saved when `appId` is present in the request, regardless of the `memoryContext` setting.

Use `chatSession` when your app has multiple independent chat threads. Without it, memory is filtered by persona ID.
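For a multi-threaded chat UI, one approach is to derive a stable `chatSession` ID per thread and pass it on every request. A sketch; the helper and the ID scheme are our own, only the option names come from the table above:

```js
// Build request options for a given chat thread, isolating its memory
// from other threads in the same app via a deterministic session ID.
function threadRequestOptions(appId, threadId, overrides = {}) {
  return {
    appId,
    chatSession: `${appId}:thread:${threadId}`,
    memoryContext: 'auto', // follow the shell's Memory toggle by default
    ...overrides,
  };
}
```

Spreading `overrides` last lets a caller opt a single request out of memory with `threadRequestOptions(appId, id, { memoryContext: false })`.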
## Personas

OasisHost exposes persona management methods for apps that need to work with custom AI behaviors.

### getPersonas()

Returns all available personas as an array.

### getPersonasWithPermissions(appId?)

Returns personas that the specified app has permission to use, keyed by specialty type.

```js
const personas = host.getPersonasWithPermissions('my-app');
// { "research": { id: "...", name: "Research Assistant", type: "research" } }
```

### getActivePersona(appId?, personaType?)

Returns the currently active persona for a given app and type, or `null`.

### loadPersona(personaType, options?)

Loads a persona’s model into memory. Throws if no persona is found for the given type.

```js
await host.loadPersona('coding', {
  appId: 'my-app',
  onProgress: (p) => console.log(`${p}% loaded`),
});
```

### requestPersonaAccess(appId?, personaType?)

Prompts the user to grant your app permission to use a persona. Returns `true` if granted.

### removePersonaPermission(appId?, personaType?)

Revokes persona access for your app.

### getLoadedPersonas()

Returns a map of persona types currently loaded in memory.

Persona methods are optional and only available when running inside the Apogee shell with the persona manager active. Always check for `null` before calling.
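Since persona methods may be absent outside the Apogee shell, a small guard keeps calling code clean. A sketch; `getPersonasSafe` is our own wrapper around the method documented above:

```js
// Return the persona map if persona management is available, else null.
// Checking the method's existence covers both a missing host and a host
// running without the persona manager.
function getPersonasSafe(host, appId) {
  if (!host || typeof host.getPersonasWithPermissions !== 'function') {
    return null;
  }
  return host.getPersonasWithPermissions(appId);
}
```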
## Runtime constants

These are the default values the AI runtime uses. You can override `maxTokens` and `temperature` per request, but the others are system-level.

| Constant | Value |
|---|---|
| Default max tokens | 2048 |
| Default temperature | 0.7 |
| Request timeout | 150,000ms (2.5 minutes) |
| Tool turn max tokens | 800 |
| Max tool turns per message | 3 |
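If your UI needs to give up before the runtime's 150-second request timeout, you can race the request against a shorter deadline of your own. A sketch; `requestWithDeadline` is our helper, and note it does not cancel the underlying generation, it only stops waiting:

```js
// Resolve with the request result, or reject once `ms` milliseconds pass.
function requestWithDeadline(host, prompt, options, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('request deadline exceeded')),
      ms
    );
  });
  return Promise.race([host.request(prompt, options), deadline])
    .finally(() => clearTimeout(timer)); // don't leak the timer on success
}
```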
## Next steps