A self-hosted, multi-user AI companion platform for people who take privacy seriously — not as a selling point, but as a minimum requirement.
Most AI chat platforms ask you to hand over your conversations, your prompts, and your trust to a corporate pipeline — in exchange for convenience.
Shared API keys mean your queries sit alongside everyone else's. Privacy policies change. Your data might leave the building, silently.
Lock-in is dressed up as simplicity. That is a trade you should never have to make.
Chatsune is a privacy-first, self-hosted, multi-user AI companion platform. All inference runs on hardware you choose. Deploys in minutes via Docker Compose. GPLv3 throughout — no vendor lock-in, no telemetry, no surprises.
One FastAPI process with strictly enforced module boundaries. Event-driven throughout. Scales simply — no Kubernetes required.
Works with self-hosted Ollama, Ollama Cloud, or any compatible endpoint. BYOK is not an optional feature — it is the only mode.
MongoDB with local vector search. Redis Streams for real-time events. No external vector database — everything stays on your machine.
One persistent WebSocket connection per user. No polling, ever. Every state change is an event, replayed on reconnect via Redis Streams.
Each user manages their own LLM connections — named, encrypted, personal. The administrator sees nothing of your API keys. Nobody else's credentials share a namespace with yours.
Keys are encrypted with Fernet symmetric encryption, stored per-user in the database. Multiple providers, multiple models, one clean interface.
Your data. Your credentials. Your inference.
Keys are never stored in plaintext. The admin has no visibility into user credentials — by design, not by policy.
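The pattern looks roughly like this, using Fernet from the `cryptography` package; how Chatsune derives and stores each user's Fernet key is not shown here and the function names are illustrative:

```python
from cryptography.fernet import Fernet

def encrypt_api_key(plaintext: str, user_fernet_key: bytes) -> bytes:
    """Encrypt one user's provider API key before it touches the database."""
    return Fernet(user_fernet_key).encrypt(plaintext.encode())

def decrypt_api_key(token: bytes, user_fernet_key: bytes) -> str:
    """Decrypt only when the owning user makes an inference request."""
    return Fernet(user_fernet_key).decrypt(token).decode()
```

Because each user has their own key, a database dump alone reveals nothing, and no admin-facing code path ever holds a decrypted credential.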
Connect any Ollama-compatible endpoint. Self-hosted at home, Ollama Cloud, or a custom backend — the same interface throughout.
Most people with a home GPU face the same wall: dynamic IP, CGNAT, no port-forwarding. The machine sits idle unless you are physically on the local network.
A small sidecar container runs beside your home Ollama. It opens a reverse outbound WebSocket to your Chatsune instance — no inbound ports required. The connection tunnels inference traffic both ways.
One-time pairing keys, shown once at creation, hashed as argon2id. Instantly revocable. Reverse-only transport eliminates spoofing risk. Each pairing gets its own isolated namespace.
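The lifecycle is: generate once, show once, store only a salted hash. Chatsune uses argon2id (a third-party dependency); the sketch below substitutes the standard library's `scrypt` so it runs anywhere, and the parameter choices are illustrative:

```python
import hashlib
import hmac
import secrets

def create_pairing_key() -> tuple[str, bytes, bytes]:
    """Generate a pairing key. The plaintext is shown once, then discarded."""
    plaintext = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(plaintext.encode(), salt=salt,
                            n=2**14, r=8, p=1, maxmem=2**26)
    return plaintext, salt, digest  # store only (salt, digest)

def verify_pairing_key(candidate: str, salt: bytes, stored: bytes) -> bool:
    """Constant-time check when the sidecar presents its key."""
    digest = hashlib.scrypt(candidate.encode(), salt=salt,
                            n=2**14, r=8, p=1, maxmem=2**26)
    return hmac.compare_digest(digest, stored)
```

Revocation is then just deleting the stored hash — the sidecar's next handshake fails and the namespace goes dark.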
The homelab system is not just for solo use. Hosts can provision compute capacity to invited users — directly, without a marketplace, without a platform intermediary cutting in.
Create a pairing, define a model allowlist, set concurrency limits. Your hardware, your rules — down to the model level.
No public marketplace. You invite a specific, trusted user. Revoke at any time. Zero intermediaries. No billing surprises.
The sidecar protocol is engine-agnostic. Connect whatever runtime is running on the box — the abstraction handles translation.
The friend-to-friend compute model. Hardware shared between people who know each other — no cloud middleman required.
Most AI platforms stuff recent chat history into the context window and call it memory. That is not memory — it is short-term recall with an expiry date.
Chatsune runs background consolidation jobs — dreaming — that synthesise episodic journal entries into persistent prose memory. Long-term, versioned, recoverable on failure.
Each persona has its own memory body. It grows over time. It survives sessions, restarts, and upgrades.
Named companions with avatars and a three-layer system prompt hierarchy — global guardrails, user additions, persona specifics. Persistent relationships, not throwaway chat sessions.
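Conceptually, the three layers compose in a fixed order: platform guardrails first, the persona's own voice last. A minimal sketch (the exact ordering and separator are assumptions):

```python
def compose_system_prompt(global_guardrails: str,
                          user_additions: str,
                          persona_prompt: str) -> str:
    """Assemble the three-layer prompt; empty layers are skipped."""
    layers = [global_guardrails, user_additions, persona_prompt]
    return "\n\n".join(layer.strip() for layer in layers if layer.strip())
```

Guardrails set by the instance owner always lead, so a persona definition can specialise behaviour but never override platform policy.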
Upload documents to a persona's private knowledge base. Semantic retrieval via MongoDB Vector Search — your documents stay where they are.
768-dimensional embeddings generated on CPU via ONNX Runtime. No OpenAI embedding calls. No Cohere. No external API. Zero data leaves your server.
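Retrieval then reduces to a single aggregation stage. A hedged sketch of what such a query might look like — index name, field names, and the candidate multiplier are illustrative assumptions, not Chatsune's actual schema:

```python
def knowledge_search_pipeline(query_vector: list[float],
                              persona_id: str,
                              k: int = 5) -> list[dict]:
    """Build a $vectorSearch pipeline over one persona's private documents."""
    assert len(query_vector) == 768  # Arctic Embed M v2.0 output size
    return [
        {"$vectorSearch": {
            "index": "knowledge_vectors",   # hypothetical index name
            "path": "embedding",            # hypothetical field name
            "queryVector": query_vector,
            "numCandidates": k * 20,        # oversample, then cut to k
            "limit": k,
            "filter": {"persona_id": persona_id},
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The per-persona filter keeps one companion's knowledge base invisible to every other.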
Group conversations and knowledge by project. Switch context cleanly. Sanitised Mode keeps sensitive personas out of the main view when needed.
Multi-step tool use with refusal detection. Server orchestrates, browser executes sandboxed code, results returned in the same WebSocket stream.
When a model lacks native CoT, Chatsune injects an analytical reasoning block. Better outputs from any model — no manual prompt engineering.
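The mechanism is simple to sketch: check the model against a list of natively-reasoning models, and append a reasoning block otherwise. The allowlist and the block's wording below are illustrative assumptions:

```python
NATIVE_COT_MODELS = {"deepseek-r1", "qwq"}  # illustrative, not Chatsune's list

REASONING_BLOCK = (
    "Before answering, reason through the problem step by step inside "
    "<analysis>...</analysis> tags, then give your final answer."
)

def maybe_inject_cot(model: str, system_prompt: str) -> str:
    """Append an analytical reasoning block unless the model reasons natively."""
    base = model.split(":")[0].lower()  # e.g. "qwq:32b" -> "qwq"
    if base in NATIVE_COT_MODELS:
        return system_prompt
    return f"{system_prompt}\n\n{REASONING_BLOCK}"
```

The injected block is stripped from what the user sees, so the model gets room to think without cluttering the conversation.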
| Feature | Chatsune | Open WebUI | SillyTavern |
|---|---|---|---|
| BYOK enforced per user | ✓ Always | ~ Shared admin keys common | ~ Varies by setup |
| Homelab reverse sidecar (CGNAT-safe) | ✓ Native | ✗ | ✗ |
| GPU sharing between trusted users | ✓ Invitation-based | ✗ | ✗ |
| Background memory consolidation | ✓ "Dreaming" system | ✗ | ~ Character cards only |
| Local CPU embeddings (no external API) | ✓ Arctic Embed ONNX | ~ Ext. API or vector DB | ✗ |
| Client-side JS tool sandbox | ✓ Isolated Web Worker | ✗ | ✗ |
| Vision fallback (auto-delegate) | ✓ Automatic | ✗ Manual model selection | ✗ Manual |
| Soft chain-of-thought injection | ✓ Auto for all models | ✗ | ✗ |
| Event-driven real-time (no polling) | ✓ Redis Streams | ~ REST-heavy | ~ Varies |
| Copyleft licence | ✓ GPLv3 | ✗ Apache 2.0 | ✓ AGPL-3.0 |
- **React 19 + TypeScript** · Vite 8 · Tailwind CSS 4 · Zustand 5 · pnpm
- **Python 3.12 + FastAPI** · Pydantic v2 · uv · fully async throughout
- **MongoDB 7.0 (RS0)** · Local Vector Search included · single-node replica set
- **Redis 7** · Streams · LRU cache · session store · event replay
- **Arctic Embed M v2.0** · ONNX Runtime · CPU-only · 768-dimensional vectors
- **Docker Compose** · MongoDB Atlas Local · Redis Alpine · single command
We did not build Chatsune to compete with commercial AI platforms. We built it because we believe your relationship with an AI companion should belong to you — not to a company, not to a data pipeline, not to a terms-of-service agreement nobody reads.
The homelab concept is not a technical curiosity. It is a statement: your hardware should work for you, wherever you are. Convenience should never require surrendering control.
GPLv3. Not open-washing. Copyleft by conviction — freedom preserved all the way down the chain.
Chatsune deploys with a single `docker compose up -d`. MongoDB, Redis, and the backend come up together. Add users, connect your first Ollama instance, and you are running.
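A minimal sketch of what such a stack might look like — service names and image tags here are illustrative assumptions, not Chatsune's actual compose file:

```yaml
services:
  backend:                                  # hypothetical service and image names
    image: chatsune/backend:latest
    depends_on: [mongo, redis]
    ports: ["8000:8000"]
  mongo:
    image: mongodb/mongodb-atlas-local:latest  # local vector search included
  redis:
    image: redis:7-alpine
```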
Self-host for yourself, or run a private instance for a small group of people who trust each other with shared compute.
The homelab sidecar is a separate, lightweight container — bring your own GPU to the party from anywhere in the world.
✦ Built by Tidesson Communications · GPLv3 · No surveillance. No compromise.