
Shipping Mundwerk 1.0 — Local-First Dictation on macOS

05/06/2026 · Created by Björn Kindler

Mundwerk's overlay while transcribing — on-device, in the menu bar.

Mundwerk shipped on May 4. Here is what building a fully on-device dictation tool for macOS actually looks like — the tech choices, the friction points, and what did not make it into v1.

Mundwerk is a dictation app for macOS that turns speech into text right at the cursor, fully on-device, with no cloud round-trips. This is a developer-side note about how it came together: not a sales pitch, more of a build log.

Why this project existed

The proximate reason was Apple's built-in dictation. It works for clean prose and stops working the moment you mix English jargon into a German sentence. I spend half my workday writing about TYPO3, AWS, Swift and a dozen other proper nouns the system has never heard of. The system was correcting “Sparkle” to “sparkle” and “EdDSA” to “et cetera.” That was the breaking point.

The deeper reason is that I wanted dictation I could trust with client material. Not “we anonymize before transmitting” — actually nothing leaves the machine. If you have ever sat in a client call thinking about which pieces of audio you should not let out of the room, you know the feeling. Local-first removes the question entirely.

Tech choices

Mundwerk runs on Swift 6 with strict concurrency, SwiftUI for the UI, GRDB for persistence and AVAudioEngine for capture. Transcription is whisper.cpp with the Metal backend, so it runs on the GPU of any Apple Silicon Mac. Voice activity detection uses Silero VAD v5 via ONNX, with an RMS fallback for the rare case where the ONNX model misbehaves on a fresh install.
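As a rough sketch of what an energy-based fallback looks like (the type name and threshold below are illustrative assumptions, not Mundwerk's actual API): a frame counts as speech when its root-mean-square energy crosses a threshold.

```swift
import Foundation

// Illustrative RMS fallback detector. The name and the 0.02 threshold
// are assumptions for the sketch, not Mundwerk's real code.
struct RMSVoiceDetector {
    /// Frames whose RMS energy meets this value count as speech.
    var threshold: Float = 0.02

    func isSpeech(_ frame: [Float]) -> Bool {
        guard !frame.isEmpty else { return false }
        let meanSquare = frame.reduce(0) { $0 + $1 * $1 } / Float(frame.count)
        return meanSquare.squareRoot() >= threshold
    }
}
```

Silero makes a far better call on quiet or breathy speech; an energy gate like this only has to be good enough to bridge the rare broken install.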

The bridges between Swift and the C++ underneath live in dedicated targets — WhisperBridge and SileroVADBridge — both Objective-C++. Keeping that layer thin and testable mattered more than I expected. Every time I tried to be clever inside the bridge, debugging got worse.
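A minimal sketch of what "thin" means in practice, with a hypothetical `WhisperBridging` protocol standing in for the real WhisperBridge target: the Swift side sees one synchronous call and nothing else.

```swift
// Hypothetical protocol standing in for the Objective-C++ WhisperBridge
// target; the real bridge's interface may differ.
protocol WhisperBridging {
    func transcribe(samples: [Float]) -> String
}

// The Swift side wraps the bridge in exactly one place. In the real app
// this wrapper would sit behind an actor so Swift 6 strict concurrency
// serializes access to the non-Sendable C++ state underneath.
struct Transcriber {
    let bridge: any WhisperBridging

    func transcribe(_ samples: [Float]) -> String {
        // No whisper.cpp types cross this line; all C++ stays in the bridge.
        bridge.transcribe(samples: samples)
    }
}
```

Keeping logic out of the bridge also means everything above this line is testable with a stub, no C++ toolchain involved.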

Trade-offs

Local-first has an honest price: model files are large, and the app ships for Apple Silicon only. There is no Intel build. There is also no fallback to a cloud API when a local model is missing: if you do not have the model, you do not get transcription. I picked that constraint because mixing local and cloud paths is exactly the kind of complexity that erodes the privacy claim over time.

Distribution

Mundwerk is sold through my own checkout, not the App Store. Two reasons. First, the App Store sandbox does not let me write text into other applications via CGEvent.post — and that is the entire point of the app. Second, I wanted fast updates through EdDSA-signed Sparkle deltas instead of week-long review queues. Notarization and Gatekeeper still apply, just without the App Store as the middle layer.
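For the curious, inserting text via CGEvent works by attaching a Unicode string to a synthetic key event with `keyboardSetUnicodeString(stringLength:unicodeString:)` and posting it with `CGEvent.post`. A single event can only carry a handful of UTF-16 code units (20 is the commonly cited limit), so longer transcripts get split first. Here is the splitting logic as plain Swift; the 20-unit limit and the helper name are my assumptions, not Mundwerk's code.

```swift
import Foundation

// Split a string into chunks of at most `limit` UTF-16 code units,
// never splitting a surrogate pair. On macOS, each chunk would then be
// attached to one synthetic keyDown/keyUp pair via
// CGEvent.keyboardSetUnicodeString and posted with
// CGEvent.post(tap: .cghidEventTap).
func utf16Chunks(_ text: String, limit: Int = 20) -> [[UInt16]] {
    var chunks: [[UInt16]] = []
    var current: [UInt16] = []
    for scalar in text.unicodeScalars {
        let units = Array(String(scalar).utf16)
        if current.count + units.count > limit {
            chunks.append(current)
            current = []
        }
        current.append(contentsOf: units)
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```

Posting those events into other applications requires the Accessibility permission, which is exactly what the App Store sandbox does not allow here.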

What did not make it into v1

An English UI did not make it into v1; the app speaks German. Transcription understands both languages and code-switches mid-sentence, but the menu items and onboarding texts are German for now. The English UI is on the roadmap, and so is a hotkey builder that lets you bind dictation to any modifier or chord.

Streaming partial transcripts during a long dictation is also queued. The current model finishes and emits a chunk; the next version will surface partial output as you speak.
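One plausible shape for that, sketched with an AsyncStream; the update type and names here are mine, not a committed design. Partial updates replace the text shown so far, and a final value commits it.

```swift
// Hypothetical streaming shape: .partial replaces the text shown so
// far at the cursor, .final commits it. Not Mundwerk's actual design.
enum TranscriptUpdate: Equatable {
    case partial(String)
    case final(String)
}

func transcriptStream(partials: [String], final: String) -> AsyncStream<TranscriptUpdate> {
    AsyncStream { continuation in
        // In a real decoder these would be yielded as whisper.cpp
        // segments land, not replayed from an array.
        for p in partials { continuation.yield(.partial(p)) }
        continuation.yield(.final(final))
        continuation.finish()
    }
}
```

The consumer side is then a plain `for await` loop, which keeps the UI code independent of how the decoder produces its segments.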

Where to go from here

If this resonates and you want to try it, the app and the documentation live at mundwerkapp.de — and there's a short overview here on the Mundwerk page. If you are interested in the build process or the tooling I used along the way, more posts on the engineering side will follow here on this blog.

Mundwerk is a one-person project. Feedback goes a long way.
