Choosing and managing models

webAI runs AI models locally on your device. The right model depends on your hardware — specifically how much memory you have and which inference backend you’re using. This guide helps you pick the best option.

Quick recommendation

Not sure where to start? Here’s the short version:

Your device	Recommended model	Backend
MacBook Air (8 GB)	Gemma 3n E2B or Qwen3 4B	MLX
MacBook Pro (16 GB)	Qwen3 8B	MLX
MacBook Pro (32 GB+)	Qwen3 32B	MLX
Any Mac (fallback)	Qwen3 4B	llama.cpp
Browser only	Qwen3 1.7B or Qwen2.5 0.5B	WebGPU

When in doubt, start with a smaller model. You can always switch to a larger one later — and smaller models load faster and respond more quickly.
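The table above can be read as a simple lookup. The sketch below is illustrative only: `recommend_model` and its parameters are hypothetical names, not part of webAI, and mapping the "Any Mac (fallback)" row to Macs without Apple Silicon is an assumption.

```python
# Hypothetical helper that mirrors the quick-recommendation table.
# Thresholds come from the table rows; nothing here is a webAI API.

def recommend_model(memory_gb: float,
                    apple_silicon: bool = True,
                    browser_only: bool = False) -> tuple[str, str]:
    """Return (model, backend) for a device, following the table."""
    if browser_only:
        # Browser row: small models only.
        return ("Qwen3 1.7B", "WebGPU")
    if not apple_silicon:
        # Assumption: the "Any Mac (fallback)" row covers Macs that cannot use MLX.
        return ("Qwen3 4B", "llama.cpp")
    if memory_gb >= 32:
        return ("Qwen3 32B", "MLX")
    if memory_gb >= 16:
        return ("Qwen3 8B", "MLX")
    # 8 GB tier; the table does not cover devices with less memory.
    return ("Qwen3 4B", "MLX")


if __name__ == "__main__":
    print(recommend_model(16))  # ('Qwen3 8B', 'MLX')
```

Memory sizes that fall between rows (for example 24 GB) resolve to the smaller tier, in line with the advice to start small.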

Downloading a model

1. Open Settings. Navigate to Settings from the launcher or sidebar.

2. Go to the AI section. Find the model management panel, which lists available and downloaded models.

3. Select a model. Choose a model from the list. The UI shows each model’s size, memory requirement, and compatible backends.

4. Download. Click Download. The model is saved to your device and is available offline once the download finishes.

Understanding model sizes

Larger models are generally more capable, but they need more memory and respond more slowly. Here’s what to expect:

Size class (parameters)	Profile	Good for
Small (0.5B - 1.7B)	Fast, lightweight	Quick answers, simple tasks, low-memory devices
Medium (4B - 8B)	Balanced	General use, research, writing, code
Large (12B - 32B)	More capable	Complex reasoning, detailed analysis, nuanced writing
Very large (70B+)	Most capable	Advanced reasoning, demanding tasks (requires 64+ GB of memory)
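As a rough sketch, the size classes can be expressed as thresholds on parameter count. The function name is hypothetical, and because the table leaves gaps between classes (for example between 8B and 12B), placing in-between sizes in the lower class is an assumption.

```python
# Illustrative only: maps a parameter count (in billions) to the size
# classes in the table. Sizes that fall in the gaps between classes are
# assigned to the lower class, which is an assumption, not a webAI rule.

def size_class(params_billions: float) -> str:
    if params_billions >= 70:
        return "Very large"
    if params_billions >= 12:
        return "Large"
    if params_billions >= 4:
        return "Medium"
    return "Small"
```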

Choosing a backend

The backend determines how the model runs on your hardware. See On-Device AI for the full technical breakdown.

Backend	Platform	Speed	Model support
MLX	macOS (Apple Silicon)	Fastest	Up to 235B parameters
llama.cpp	macOS (desktop app)	Moderate	Up to 70B parameters
WebGPU	Any browser	Varies by GPU	Up to 1.7B parameters

The system selects the best backend automatically based on your hardware. You can override this in Settings if you prefer a specific backend.
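The selection logic might look something like the sketch below. This is not webAI’s actual implementation: the function names, parameters, and decision order are assumptions, and only the backend names and parameter limits come from the table above.

```python
from typing import Optional

# Maximum supported model size per backend, in billions of parameters
# (taken from the backend table).
MAX_PARAMS_B = {"MLX": 235.0, "llama.cpp": 70.0, "WebGPU": 1.7}


def select_backend(in_browser: bool,
                   apple_silicon: bool,
                   override: Optional[str] = None) -> str:
    """Pick a backend automatically unless the user has set an override."""
    if override is not None:
        if override not in MAX_PARAMS_B:
            raise ValueError(f"unknown backend: {override}")
        return override
    if in_browser:
        return "WebGPU"
    # Assumed order on the desktop app: prefer MLX, fall back to llama.cpp.
    return "MLX" if apple_silicon else "llama.cpp"


def fits_backend(backend: str, params_billions: float) -> bool:
    """Check a model size against the backend's limit from the table."""
    return params_billions <= MAX_PARAMS_B[backend]
```

For example, `fits_backend("WebGPU", 4)` is false under these limits, which matches the browser recommendation of models at or below 1.7B.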

Managing your model library

Switching models

You can switch between downloaded models at any time from the Oasis settings or the model selector. The current model unloads and the new one loads in its place.
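Conceptually, switching behaves like a single slot that holds one loaded model at a time. The class below is a toy model of that behavior with hypothetical names; it does not load anything and is not how webAI exposes switching.

```python
# Toy illustration of "one model loaded at a time".
# ModelSlot and its methods are hypothetical, not a webAI API.

class ModelSlot:
    def __init__(self, downloaded):
        self.downloaded = set(downloaded)  # models available on disk
        self.current = None                # name of the loaded model, if any

    def switch(self, name):
        """Replace the loaded model and return the one that was unloaded."""
        if name not in self.downloaded:
            raise KeyError(f"{name} has not been downloaded")
        previous = self.current
        self.current = name
        return previous
```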

Storage

Models are stored locally on your device. A small model (0.5B) takes around 500 MB of disk space, while a large model (32B) can take 20+ GB. Check your available disk space before downloading larger models.
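If you want to check free space from a script rather than the system UI, a minimal sketch using Python’s standard library could look like this. The helper name, the default path, and the 2 GB headroom are assumptions; this guide does not state where webAI stores models, so pass a path on the relevant volume.

```python
import shutil


def has_room_for(model_size_gb: float,
                 path: str = ".",
                 headroom_gb: float = 2.0) -> bool:
    """Return True if the volume containing `path` can hold the model.

    `headroom_gb` is an arbitrary safety margin, not a webAI requirement.
    """
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes to decimal GB
    return free_gb >= model_size_gb + headroom_gb


if __name__ == "__main__":
    # 20 GB is the rough figure given above for a 32B model.
    print("Room for a 32B model:", has_room_for(20))
```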

Removing models

If you need to free up space, you can delete downloaded models from the model management panel. They can always be re-downloaded later.

LoRA adapters

After choosing a base model, you can optionally attach a LoRA adapter through your persona configuration. Adapters specialize the model for specific tasks without requiring you to download a separate full model.
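To see why an adapter is so much smaller than a full model, recall that LoRA replaces an update to a d_out × d_in weight matrix with two low-rank factors of shapes d_out × r and r × d_in. The sketch below compares the parameter counts; the dimensions and rank are illustrative values, not settings used by webAI.

```python
# Parameter arithmetic for a single weight matrix under LoRA.
# The layer size (4096 x 4096) and rank (16) are example values only.

def full_param_count(d_out: int, d_in: int) -> int:
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    # Two factors: (d_out x rank) and (rank x d_in).
    return rank * (d_out + d_in)


if __name__ == "__main__":
    full = full_param_count(4096, 4096)      # 16,777,216
    lora = lora_param_count(4096, 4096, 16)  # 131,072
    print(full // lora)                      # 128
```

In this example the adapter holds 128 times fewer parameters than the matrix it adapts, which is why it can ship separately from the base model.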

Learn more

On-Device AI

Oasis
