Privacy-first · Self-hosted · Open Source

Your AI companion.
On your hardware.
Under your control.

A self-hosted, multi-user AI companion platform for people who take privacy seriously — not as a selling point, but as a minimum requirement.

GPLv3 · Self-hosted · Multi-user · Ollama-compatible · BYOK · Docker Compose · No telemetry
The landscape

AI is remarkable.
The status quo is not.

Most AI chat platforms ask you to hand over your conversations, your prompts, and your trust to a corporate pipeline — in exchange for convenience.

Shared API keys mean your queries sit alongside everyone else's. Privacy policies change. Your data might leave the building, silently.

Lock-in is dressed up as simplicity. That is a trade you should never have to make.

  • Conversations used as training data
  • Shared credentials — your prompts, everyone's risk
  • Corporate oversight on every message
  • No ownership of your AI's memory
  • Vendor lock-in disguised as UX
  • Privacy as a premium tier, never a default
Platform overview

What Chatsune is.

A privacy-first, self-hosted, multi-user AI companion platform. All inference runs on hardware you choose. Deployed in minutes via Docker Compose. GPLv3 throughout — no vendor lock-in, no telemetry, no surprises.

Architecture

Modular Monolith

One FastAPI process with strictly enforced module boundaries. Event-driven throughout. Scales simply — no Kubernetes required.

Inference

Ollama-Compatible

Works with self-hosted Ollama, Ollama Cloud, or any compatible endpoint. BYOK is not an optional feature — it is the only mode.

Storage

MongoDB + Redis

MongoDB with local vector search. Redis Streams for real-time events. No external vector database — everything stays on your machine.

Real-time

Event-First WebSocket

One persistent WebSocket connection per user. No polling, ever. Every state change is an event, replayed on reconnect via Redis Streams.
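The replay mechanic can be sketched in a few lines. This is a stand-in, not Chatsune's code: an in-memory list plays the role of a Redis Stream, and a reconnecting client resumes from its last-seen event ID (in production this would be XADD/XREAD against Redis).

```python
# Minimal sketch of event replay: an in-memory list stands in for a
# Redis Stream; real code would use XADD / XREAD with a last-seen ID.
from dataclasses import dataclass, field
from itertools import count

@dataclass
class EventStream:
    _ids: count = field(default_factory=lambda: count(1))
    events: list[tuple[int, dict]] = field(default_factory=list)

    def publish(self, event: dict) -> int:
        eid = next(self._ids)
        self.events.append((eid, event))
        return eid

    def replay_after(self, last_seen: int) -> list[tuple[int, dict]]:
        # On reconnect, the client sends its last-seen event ID and
        # receives every state change it missed, in order.
        return [(eid, e) for eid, e in self.events if eid > last_seen]

stream = EventStream()
stream.publish({"type": "message.created", "text": "hi"})
stream.publish({"type": "persona.updated", "name": "Yuki"})
missed = stream.replay_after(1)   # client disconnected after event 1
```

Because every state change is an event with an ID, "reconnect" and "catch up" are the same operation.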

Credentials & access

Bring Your Own Keys.
Every user. No exceptions.

Each user manages their own LLM connections — named, encrypted, personal. The administrator sees nothing of your API keys. Nobody else's credentials share a namespace with yours.

Keys are encrypted with Fernet symmetric encryption, stored per-user in the database. Multiple providers, multiple models, one clean interface.
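A Fernet round-trip is easy to demonstrate with the `cryptography` package (assuming that is the library in use; the key handling below is illustrative, not Chatsune's actual schema):

```python
from cryptography.fernet import Fernet

# One symmetric key (illustrative; how Chatsune derives or stores
# this key is an assumption, not taken from the source).
master_key = Fernet.generate_key()
f = Fernet(master_key)

# Encrypt a user's API key before it ever touches the database.
token = f.encrypt(b"sk-ollama-cloud-example")
assert b"sk-ollama" not in token          # ciphertext, not plaintext

# Decrypt only at inference time, in memory.
assert f.decrypt(token) == b"sk-ollama-cloud-example"
```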

Your data. Your credentials. Your inference.

Encryption

Fernet symmetric encryption

Keys are never stored in plaintext. The admin has no visibility into user credentials — by design, not by policy.

Multi-provider

Ollama · Cloud · Custom

Connect any Ollama-compatible endpoint. Self-hosted at home, Ollama Cloud, or a custom backend — the same interface throughout.

Homelab integration

Your GPU is at home.
That should not stop you.

Most people with a home GPU face the same wall: dynamic IP, CGNAT, no port-forwarding. The machine sits idle unless you are physically on the local network.

☁ Chatsune ⟵⟶ ✦ Sidecar ⟵⟶ ⬡ Ollama (home GPU)
Reverse WebSocket · No port-forwarding · Works through CGNAT and dynamic IPs
How it works

A small sidecar container runs beside your home Ollama. It opens a reverse outbound WebSocket to your Chatsune instance — no inbound ports required. The connection tunnels inference traffic both ways.
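The reverse-connection idea can be shown with plain asyncio streams standing in for the WebSocket (the actual sidecar protocol is not documented here; TCP and the names below are purely illustrative):

```python
import asyncio

async def demo() -> bytes:
    loop = asyncio.get_running_loop()
    paired = loop.create_future()

    async def on_sidecar(reader, writer):
        # Chatsune's side: the home box just connected to us.
        paired.set_result((reader, writer))

    # "Cloud" listener (port 0 = any free port, for the demo).
    server = await asyncio.start_server(on_sidecar, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def sidecar():
        # "Home" side dials OUT -- no inbound port, no port-forwarding,
        # so CGNAT and dynamic IPs are irrelevant.  Work then arrives
        # back down this same connection.
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        request = await reader.readline()
        writer.write(b"result:" + request)    # e.g. inference output
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    task = asyncio.create_task(sidecar())

    reader, writer = await paired             # wait for the reverse dial-in
    writer.write(b"prompt\n")                 # push work down the tunnel
    await writer.drain()
    reply = await reader.readline()

    await task
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(demo())
```

The key property: only the home machine initiates a connection, yet requests still flow from cloud to home.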

Security

One-time pairing keys, shown once at creation, hashed with argon2id before storage. Instantly revocable. Reverse-only transport eliminates spoofing risk. Each pairing gets its own isolated namespace.
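A pairing flow along these lines is simple to sketch. Loudly hypothetical: the function names are invented, and SHA-256 via `hashlib` stands in for argon2id purely to keep the sketch stdlib-only; a real deployment would use an argon2id implementation such as argon2-cffi.

```python
import hashlib
import secrets

def create_pairing() -> tuple[str, str]:
    # The one-time key is shown to the user exactly once...
    one_time_key = secrets.token_urlsafe(32)
    # ...and only its hash is stored.  NOTE: sha256 is a stdlib
    # stand-in; the source specifies argon2id for real storage.
    stored_hash = hashlib.sha256(one_time_key.encode()).hexdigest()
    return one_time_key, stored_hash

def verify(presented: str, stored_hash: str) -> bool:
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)

key, stored = create_pairing()
assert verify(key, stored)
assert not verify("wrong-key", stored)
# Revocation is just deleting `stored` from the database.
```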

Community compute

Share your GPU
with people you trust.

The homelab system is not just for solo use. Hosts can provision compute capacity to invited users — directly, without a marketplace, without a platform intermediary cutting in.

Host

Provision your compute

Create a pairing, define a model allowlist, set concurrency limits. Your hardware, your rules — down to the model level.

Invitation

Invite, not publish

No public marketplace. You invite a specific, trusted user. Revoke at any time. Zero intermediaries. No billing surprises.

Multi-engine

Ollama, LM Studio, vLLM

The sidecar protocol is engine-agnostic. Connect whatever runtime is running on the box — the abstraction handles translation.

The friend-to-friend compute model. Hardware shared between people who know each other — no cloud middleman required.

Memory system

Memory that
actually accumulates.

Most AI platforms stuff recent chat history into the context window and call it memory. That is not memory — it is short-term recall with an expiry date.

Chatsune runs background consolidation jobs — dreaming — that synthesise episodic journal entries into persistent prose memory. Long-term, versioned, recoverable on failure.

Each persona has its own memory body. It grows over time. It survives sessions, restarts, and upgrades.

  • Journal extraction
    Conversations scanned in the background for significant statements and events.
  • Dreaming (consolidation)
    Episodic entries synthesised into coherent prose by a scheduled background job.
  • Two-tier retrieval
    Fresh episodic entries plus consolidated long-term memory injected at inference time.
  • Versioned with rollback
    Memory body is versioned. Consolidation failures roll back cleanly — no data loss.
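The versioned-consolidation behaviour above can be sketched abstractly. A toy model: Chatsune's real job scheduling, synthesis prompts, and storage layout are not shown here.

```python
class MemoryBody:
    """Toy model of a persona's versioned long-term memory."""

    def __init__(self) -> None:
        self.version = 0
        self.prose = ""

    def consolidate(self, episodic: list[str], synthesise) -> None:
        # "Dreaming": fold fresh journal entries into the prose body.
        # Any failure leaves the previous version untouched.
        snapshot = (self.version, self.prose)
        try:
            self.prose = synthesise(self.prose, episodic)
            self.version += 1
        except Exception:
            self.version, self.prose = snapshot   # clean rollback
            raise

body = MemoryBody()
body.consolidate(["met Alice", "likes tea"],
                 lambda prose, eps: prose + " ".join(eps))
assert body.version == 1

try:
    body.consolidate(["x"], lambda *_: 1 / 0)     # consolidation fails
except ZeroDivisionError:
    pass
assert body.version == 1                          # no data loss
```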
Personas & knowledge

Companions with
real persistence.

Persona System

Distinct personalities

Named companions with avatars and a three-layer system prompt hierarchy — global guardrails, user additions, persona specifics. Persistent relationships, not throwaway chat sessions.
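Layer composition might look like this (the layer names and ordering are assumptions drawn from the description above, not Chatsune's actual code):

```python
def compose_system_prompt(global_guardrails: str,
                          user_additions: str,
                          persona_specifics: str) -> str:
    # Three layers, most general first: platform guardrails,
    # then the user's own rules, then the persona's character.
    layers = [global_guardrails, user_additions, persona_specifics]
    return "\n\n".join(layer.strip() for layer in layers if layer.strip())

prompt = compose_system_prompt(
    "Never reveal other users' data.",
    "Answer in British English.",
    "You are Yuki, a calm and curious companion.",
)
```

Empty layers drop out cleanly, so a persona works even before a user adds anything of their own.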

Knowledge Bases

Per-persona document libraries

Upload documents to a persona's private knowledge base. Semantic retrieval via MongoDB Vector Search — your documents stay where they are.

Local Embeddings

Arctic Embed M v2.0

768-dimensional embeddings generated on CPU via ONNX Runtime. No OpenAI embedding calls. No Cohere. No external API. Zero data leaves your server.
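Semantic retrieval then reduces to nearest-neighbour search over those vectors. A toy version with 3-dimensional vectors (in the real system the vectors are 768-d Arctic Embed outputs, and the search runs inside MongoDB rather than in Python):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy 3-d "embeddings"; the real system stores 768-d vectors per chunk.
library = {
    "cats sleep a lot":  [0.9, 0.1, 0.0],
    "GPUs run hot":      [0.0, 0.2, 0.9],
}
query = [0.1, 0.1, 0.95]                  # embedding of the question
best = max(library, key=lambda doc: cosine(library[doc], query))
```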

Projects

Organised context

Group conversations and knowledge by project. Switch context cleanly. Sanitised Mode keeps sensitive personas out of the main view when needed.

Tools & outputs

Tools that
actually do things.

  • Web search
    Pluggable adapters, per-user BYOK credentials. No shared search keys.
  • Knowledge retrieval
    Semantic search across a persona's document library, inline in conversation.
  • Client-side JS sandbox
    Arbitrary calculation executed in an isolated Web Worker. No DOM access, no network.
  • Vision fallback
    Non-vision models auto-delegate image tasks to a capable model — transparently.
  • Artefacts
    Code, Mermaid diagrams, HTML, SVG — captured, versioned, browsable after the fact.
Tool execution loop

Up to 5 iterations

Multi-step tool use with refusal detection. Server orchestrates, browser executes sandboxed code, results returned in the same WebSocket stream.
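The loop's shape, with a stubbed model (the five-iteration cap comes from the text above; everything else, including the message format and refusal handling, is illustrative):

```python
MAX_ITERATIONS = 5          # cap stated in the text

def run_tool_loop(model, execute_tool) -> str:
    messages: list[dict] = [{"role": "user", "content": "What is 2**10?"}]
    for _ in range(MAX_ITERATIONS):
        reply = model(messages)
        if "tool_call" not in reply:
            return reply["content"]           # final answer: stop early
        # Server orchestrates; the sandboxed client executes the call
        # and the result comes back on the same stream.
        result = execute_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": result})
    return "Stopped after 5 tool iterations."

calls = iter([
    {"tool_call": "js:Math.pow(2, 10)"},      # model asks for the sandbox
    {"content": "2**10 is 1024."},            # then answers with the result
])
answer = run_tool_loop(lambda msgs: next(calls),
                       lambda call: "1024")
```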

Soft chain-of-thought

Auto-injected reasoning

When a model lacks native CoT, Chatsune injects an analytical reasoning block. Better outputs from any model — no manual prompt engineering.

Competitive landscape

How we compare.

Feature                                   Chatsune                Open WebUI                SillyTavern
BYOK enforced per user                    ✓ Always                ~ Admin-shared common     ~ Varies by setup
Homelab reverse sidecar (CGNAT-safe)      ✓ Native                —                         —
GPU sharing between trusted users         ✓ Invitation-based      —                         —
Background memory consolidation           ✓ "Dreaming" system     —                         ~ Character cards only
Local CPU embeddings (no external API)    ✓ Arctic Embed ONNX     ~ Ext. API or vector DB   —
Client-side JS tool sandbox               ✓ Isolated Web Worker   —                         —
Vision fallback (auto-delegate)           ✓ Automatic             ✗ Manual model selection  ✗ Manual
Soft chain-of-thought injection           ✓ Auto for all models   —                         —
Event-driven real-time (no polling)       ✓ Redis Streams         ~ REST-heavy              ~ Varies
Copyleft licence                          ✓ GPLv3                 ✗ Apache 2.0              ✓ AGPL-3.0
Technology

Built on solid ground.

Frontend

React 19 + TypeScript

Vite 8 · Tailwind CSS 4 · Zustand 5 · pnpm

Backend

Python 3.12 + FastAPI

Pydantic v2 · uv · fully async throughout

Database

MongoDB 7.0 (RS0)

Local Vector Search included · Single-node replica set

Cache & Events

Redis 7

Streams · LRU cache · Session store · Event replay

Embeddings

Arctic Embed M v2.0

ONNX Runtime · CPU-only · 768-dimensional vectors

Deployment

Docker Compose

MongoDB Atlas Local · Redis Alpine · Single command

# Deploy Chatsune — everything included
docker compose up -d                  # MongoDB + Redis
uv run uvicorn backend.main:app       # FastAPI backend
cd frontend && pnpm dev               # React frontend
Why we built this

Privacy. Autonomy.
Self-determination.

We did not build Chatsune to compete with commercial AI platforms. We built it because we believe your relationship with an AI companion should belong to you — not to a company, not to a data pipeline, not to a terms-of-service agreement nobody reads.

The homelab concept is not a technical curiosity. It is a statement: your hardware should work for you, wherever you are. Convenience should never require surrendering control.

GPLv3. Not open-washing. Copyleft by conviction — freedom preserved all the way down the chain.

GPLv3 — your freedom, not ours to revoke
  • Your conversations are not training data.
  • Your keys are not the platform's business.
  • Your GPU earns its keep, wherever you are.
  • Your AI's memory belongs to you.
  • No telemetry. No subscription. No lock-in.
  • The source code is the product.
Get started

Your instance.
Your community.

Chatsune deploys with a single docker compose up -d. MongoDB, Redis, and the backend come up together. Add users, connect your first Ollama instance, and you are running.

Self-host for yourself, or run a private instance for a small group of people who trust each other with shared compute.

The homelab sidecar is a separate, lightweight container — bring your own GPU to the party from anywhere in the world.

Source on GitHub Issues welcome PRs considered
113 event topics · 768d local vectors · GPLv3 licence · 0 telemetry calls
# That's all it takes
git clone https://github.com/symphonic-navigator/chatsune.git
docker compose up -d
# Visit http://localhost:5173

✦   Built by Tidesson Communications  ·  GPLv3  ·  No surveillance. No compromise.