Use Local Model Runtimes

Local model runtimes let Apprentice call models served from your machine or local network instead of a hosted model API.

Apprentice currently supports local runtime providers such as LM Studio, Docker Model Runner, and Ollama.

What Local Means

With a local runtime, model inference happens through the local service you configure.

This is separate from Apprentice's local-first app model. Apprentice always keeps app data and runtime control local, but model calls follow the provider. A local runtime keeps model calls local only when the runtime itself is local and reachable.

Before You Start

Make sure:

Docker is running for Apprentice agent execution.
The local model service is installed and running.
At least one model is available in that service.
The server URL in Apprentice matches the local service.

LM Studio

LM Studio uses an OpenAI-compatible local API.

In Apprentice, configure:

Server URL, usually http://localhost:1234.
Optional API key, if your LM Studio server requires one.
Temperature and max tokens.

Start the LM Studio server before testing the provider.

Ollama

Ollama uses a local server, commonly:

http://localhost:11434

Make sure the model you want is pulled and available in Ollama before selecting it in Apprentice.

Docker Model Runner

Docker Model Runner uses Docker's local model runtime endpoint.

The default Apprentice setting is based on:

http://localhost:12434/engines

Use this when Docker Model Runner is installed and models are available through Docker.

Cost And Budgets

Local models may not have provider token charges.

If cost tracking is not useful for a local runtime, mark the agent as Free Agent in the Budget tab. Still keep max duration, capabilities, permissions, and folder access tight.

Performance Expectations

Local model quality and speed depend on:

Model size.
Machine CPU, GPU, and memory.
Runtime configuration.
Context size.
Agent tool usage.

If a local model struggles with a complex task, use a narrower prompt, a stronger local model, or a cloud/API provider for that agent.

Troubleshooting

If no models appear, confirm the local runtime is running and has models available.

If the provider test fails, check the server URL and port.

If responses are slow, use a smaller model or reduce the task scope.

If the agent fails while Docker is stopped, start Docker. Apprentice still uses Docker for agent runtime isolation even when the model provider is local.

Next Step

After the local runtime works, create or update an agent to use that provider and run a small manual test.