
Where is the Ollama entry in Silly Tavern?

Can’t find Ollama in new Silly Tavern? Three steps to local models

Silly Tavern (e.g. version 1.16.0) works well as a self-hosted front-end on a PC or home server. With Ollama or LM Studio you can run local LLMs and keep chats closer to home, with more privacy and control. After updates, people often ask: where did Ollama and LM Studio go? They were not removed; the API panel's Main API and API Type layers changed. Here are three steps to the right place.

Why it feels like Ollama is “gone”

1. Two main routes
Under API Connections, the first dropdown is the Main API. Ollama and LM Studio live under Text Completion, not under the default Chat Completion. Picking the wrong branch hides the options you expect.

2. LM Studio is not its own top-level item
Under Text Completion, LM Studio has no top-level entry of its own; open the API Type dropdown and look for Generic (OpenAI-compatible) [LM Studio, LiteLLM, etc.].

3. Ollama is its own type
After Text Completion, choose Ollama under API Type to get the URL field and model list. You will not find that type under Chat Completion by design.


Three steps

Step 1: Main API → Text Completion

Switch the first dropdown from Chat Completion to Text Completion.

[Screenshot: Switch the Main API to Text Completion]

Step 2: API type → Ollama

In the text-completion block, set API Type to Ollama. For LM Studio use Generic (OpenAI-compatible) [LM Studio, LiteLLM, etc.].

[Screenshot: Select the Ollama API type]

Step 3: URL, model, Connect

Set the API URL to http://127.0.0.1:11434 and click Connect. On success, the models installed in Ollama appear in the list.
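Before clicking Connect, you can confirm that the Ollama server is reachable from the machine running Silly Tavern. A minimal sketch using only the Python standard library (the URL matches the API URL field above; the helper name is my own):

```python
import json
import urllib.request
import urllib.error

OLLAMA_URL = "http://127.0.0.1:11434"  # same URL as in the API URL field

def list_ollama_models(base_url: str = OLLAMA_URL) -> list:
    """Return installed model names via Ollama's /api/tags, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return []  # server not running or not reachable

models = list_ollama_models()
print(models if models else "Ollama is not reachable on 11434; is the server running?")
```

If the list comes back empty, start Ollama first; the models it prints are exactly the names Silly Tavern will show in its dropdown.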

[Screenshot: Pick a local model from the list]


Environment

ollama pull llama3.2
ollama run qwen3.5:27b

Fields in Silly Tavern

Main API: Text Completion
API Type: Ollama
API URL: http://127.0.0.1:11434
Ollama model: pick from the dropdown

Why not Generic (OpenAI-compatible) for Ollama? Silly Tavern's Ollama type talks to Ollama's native /api/generate endpoint, not /v1/chat/completions, so use the dedicated Ollama type.
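The difference is easy to see in the request bodies. A sketch of the two payload shapes (model name and prompt are placeholders): Ollama's native endpoint takes a single prompt string, while the OpenAI-style endpoint takes a structured messages array.

```python
import json

# Ollama's native text-completion style: POST /api/generate
ollama_payload = {
    "model": "llama3.2",   # placeholder model name
    "prompt": "Hello!",    # one flat prompt string
    "stream": False,
}

# OpenAI-compatible chat style: POST /v1/chat/completions
openai_payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}],  # structured chat turns
}

print(json.dumps(ollama_payload, indent=2))
print(json.dumps(openai_payload, indent=2))
```

A client built for one shape cannot simply be pointed at the other, which is why picking the matching API Type matters.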


LM Studio (OpenAI-compatible)

Use Generic (OpenAI-compatible), not the Ollama type.

Main API: Text Completion
API Type: Generic (OpenAI-compatible) [LM Studio, LiteLLM, etc.]
Server URL: http://127.0.0.1:1234
API Key: leave empty for a purely local setup

Load a model and start the Local Server in LM Studio first.
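As with Ollama, you can sanity-check LM Studio's server before connecting from Silly Tavern. A sketch against the OpenAI-compatible /v1/models endpoint (port 1234 matches the table above; the helper name is my own):

```python
import json
import urllib.request
import urllib.error

LMSTUDIO_URL = "http://127.0.0.1:1234"  # same URL as in the Server URL field

def lmstudio_server_up(base_url: str = LMSTUDIO_URL) -> bool:
    """True if LM Studio's local server answers on the OpenAI-style /v1/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=3) as resp:
            json.load(resp)  # OpenAI-compatible servers return a JSON model list
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False

print("LM Studio server up:", lmstudio_server_up())
```

If this prints False, the Local Server is not running yet; start it in LM Studio and retry before clicking Connect.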


Quick test: qwen3.5:0.8b for roleplay?

We tried a tiny model for RP; the results were rough:

[Screenshot: Small model leaking tags into replies]

Use at least a 7B model for serious RP; 14B or 27B is steadier. More in the Qwen 3.5-27B hands-on or the Japanese version.



About the author


花(Hana)

An expert in AI tool evaluation, based around Shinjuku-sanchome in Tokyo, who reviews the latest AI applications and tools based on hands-on use.


FAQ

Connect does nothing and no error?

Most often the Main API was left on Chat Completion instead of Text Completion. Check that Ollama is running with ollama list. For LM Studio, confirm the local server is running.

Connected but no reply?

Match the model name exactly. Try disabling streaming or lowering context size.

Garbage and tags in output?

Often the model is too small (e.g. 0.8B). Prefer 7B or larger; 14B / 27B for RP.

Text Completion vs Chat Completion?

They are different protocols. Ollama and many local stacks use Text Completion; cloud services like ChatGPT and Claude use Chat Completion.

Local LLM on a phone?

Impractical on-device. For mobile RP try MiniTavern with OpenRouter, DeepSeek, etc.

Self-hosted vs cloud API?

Self-hosted ST + Ollama keeps more on your LAN/PC. Chat Completion in the cloud goes through the provider. This guide targets Text Completion → Ollama / Generic for local inference.


Published: March 31, 2026
Updated: March 31, 2026

