Do I need a GPU to run Whisper.cpp for real-time transcription?

A GPU is recommended for sub-second latency, but Whisper.cpp also runs on a CPU with acceptable performance for meetings under 30 minutes. On an 8-core CPU, the base.en model typically takes roughly 1.5 times the audio length to transcribe, so on CPU the transcript finishes some time after the meeting ends rather than live.

Can I use a cloud-hosted LLM instead of Ollama for summarization?

Yes. The architecture is modular. Replace the Ollama API call with an endpoint from Claude, GPT, Groq, or OpenRouter. Just be aware that sending transcripts to a third-party LLM partially defeats the privacy advantage of the local stack.

How do I handle speaker diarization without a commercial service?

You can add pyannote.audio to the pipeline. It runs best on a separate GPU with at least 4 GB VRAM. For most teams, labeling speakers by time segment (Speaker A, Speaker B) is sufficient. Voice print identification requires a training dataset of each participant.

Build Your Own AI Meeting Notes Assistant: The Open Source Guide for Devs

Every developer has felt the pain of a 45-minute sprint retro that produces exactly zero written action items. The knee-jerk fix is a SaaS meeting bot: Otter, Fireflies, or Granola at $10 to $30 per user per month. For a 20-person team that is $200 to $600 every month for someone else to hold your meeting transcripts hostage.

The better path is an open-source meeting notes assistant that runs entirely on your own hardware. Because audio and transcripts never leave machines you control, GDPR and HIPAA compliance become much easier to demonstrate, although self-hosting alone does not guarantee it. You also get unlimited meeting minutes for the cost of a cloud GPU spot instance, and the freedom to pipe summaries into any tool you already use. In this guide you will build a working stack with Whisper.cpp for real-time transcription, Ollama for local LLM summarization, and an orchestrator that posts results to Slack and Notion. No vendor lock-in. No monthly subscription creep. Just a clean, private automation that you own.

Prerequisites

A Linux or macOS machine with Docker Compose installed (or a cloud VM with at least 8 GB RAM and ideally an NVIDIA GPU)
Basic familiarity with the terminal and REST APIs
A Slack workspace where you can create a bot app
A Notion account with database creation rights

1. Why Go DIY: Privacy, Cost, and Control

Commercial AI meeting assistants have a fundamental tension. They need to send your audio to their cloud to transcribe and summarize it. That means your confidential product roadmap, your sales negotiation tactics, and your internal post-mortems live on servers you do not control. Even if vendors promise encryption, the exposure surface is far larger than that of a self-hosted system.

Cost compounds quickly. Fireflies starts at $10 per user per month; a 50-person team pays $6,000 a year. For that budget you can provision an NVIDIA T4 GPU on a cloud provider for months and still have change left over. A self-hosted deployment handling 30-minute calls on such a GPU runs roughly $0.30 to $0.60 per hour, which drops to near zero if you already own a development workstation with a GPU. Community projects like Meetily show that local transcription with Whisper.cpp is fast enough for real-time use, and local LLMs through Ollama can deliver summaries that rival cloud-hosted models.
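To make the comparison concrete, here is a small back-of-the-envelope sketch using only the figures quoted above. The function names are illustrative, and real cloud pricing varies by provider and region.

```python
def saas_annual_cost(users: int, price_per_user_month: float) -> float:
    """Yearly bill for a per-seat subscription."""
    return users * price_per_user_month * 12


def gpu_hours_for_budget(budget: float, hourly_rate: float) -> float:
    """How many GPU hours a budget buys at a given hourly rate."""
    return budget / hourly_rate


budget = saas_annual_cost(users=50, price_per_user_month=10)
fewest_hours = gpu_hours_for_budget(budget, hourly_rate=0.60)
most_hours = gpu_hours_for_budget(budget, hourly_rate=0.30)
print(f"${budget:,.0f} a year buys {fewest_hours:,.0f} to {most_hours:,.0f} GPU hours")
```

Even at the higher hourly rate, the yearly subscription budget covers more GPU hours than there are hours in a year, and an instance that only runs during meetings uses a small fraction of that.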

Control matters more than price. You choose the LLM backend. You tune the summarization prompt to match your team's communication style. You decide whether transcripts live in SQLite or sync to Notion. And you never have to ask a vendor for an API endpoint that does not exist yet.

2. Core Architecture: Whisper + LLM + Orchestrator

Your assistant will have three layers. Transcription uses Whisper.cpp, a C/C++ port of OpenAI's Whisper model that can run on a CPU or GPU and output text in near real time. Summarization uses Ollama, a lightweight runtime that can serve dozens of open-source LLMs locally. The orchestrator is a small service (FastAPI or Node.js) that receives audio chunks, feeds them to Whisper, collects the transcript, passes it to Ollama with a structured prompt, and then posts the result to Slack and Notion.

Data flows like this:

A desktop agent (or the Meetily app) captures system audio during a meeting and streams it to the orchestrator via WebSocket.
The orchestrator sends each audio chunk to Whisper.cpp’s HTTP endpoint.
Whisper returns text segments which are accumulated into a full transcript.
When the meeting ends, the orchestrator sends the transcript to Ollama’s API with a custom prompt.
Ollama returns structured JSON (key decisions, action items, summary).
The orchestrator writes the full transcript to a local SQLite database and sends the structured summary to Slack and Notion.
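The local storage step at the end of this flow could look roughly like the sketch below, which uses Python's built-in sqlite3 module. The table layout, column names, and the meetings.db filename are assumptions chosen for illustration, not part of any existing tool.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = "meetings.db") -> sqlite3.Connection:
    """Open (or create) the local transcript database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT NOT NULL,
               recorded_at TEXT NOT NULL,
               transcript TEXT NOT NULL
           )"""
    )
    return conn


def save_transcript(conn: sqlite3.Connection, title: str, transcript: str) -> int:
    """Insert one meeting and return its row id."""
    recorded_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO transcripts (title, recorded_at, transcript) VALUES (?, ?, ?)",
            (title, recorded_at, transcript),
        )
    return cur.lastrowid


# Use an in-memory database for a quick trial; pass a file path in production.
conn = open_store(":memory:")
row_id = save_transcript(conn, "Sprint Retro", "Alice: let's ship on Friday.")
```

Note that the standard sqlite3 module does not encrypt the file, so encryption at rest has to come from the disk or from a separate SQLite extension.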

This architecture is modular. You can swap Whisper for Parakeet, swap Ollama for an external Claude API, or add a search layer with vector embeddings later.
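One way to keep the layers swappable is to have the orchestrator depend on plain callables instead of concrete clients. The sketch below is an assumption about how that might be structured; run_meeting and the three type aliases are hypothetical names, and the stand-in backends exist only so the flow can be exercised without Whisper.cpp or Ollama running.

```python
from typing import Callable, Iterable, List

Transcriber = Callable[[bytes], str]    # audio chunk -> text
Summarizer = Callable[[str], dict]      # transcript -> structured summary
Sink = Callable[[dict, str], None]      # (summary, transcript) -> side effect


def run_meeting(chunks: Iterable[bytes], transcribe: Transcriber,
                summarize: Summarizer, sinks: List[Sink]) -> dict:
    """Transcribe each chunk, summarize the full text, then fan out to sinks."""
    segments = [transcribe(chunk) for chunk in chunks]
    transcript = " ".join(s.strip() for s in segments if s.strip())
    summary = summarize(transcript)
    for sink in sinks:
        sink(summary, transcript)
    return summary


# Stand-in backends for a dry run; real ones would call Whisper.cpp and Ollama.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode()


def fake_summarize(text: str) -> dict:
    return {"summary": text, "key_decisions": [], "action_items": []}


stored = []
result = run_meeting([b"ship on ", b"Friday"], fake_transcribe, fake_summarize,
                     [lambda summary, transcript: stored.append(transcript)])
```

Swapping a backend then means passing a different function, and the Slack, Notion, and SQLite writers become entries in the sinks list.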

3. Step 1: Setting Up Real Time Transcription with Whisper.cpp

Start by compiling Whisper.cpp on your machine or cloud VM. The project supports CUDA and ROCm for GPU acceleration, which drops latency from a few seconds to under a second.

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make -j

Download a model. For English meetings the base.en model offers a good balance of speed and accuracy. The larger large-v2 model improves accuracy with accents but uses more memory.

bash models/download-ggml-model.sh base.en

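Start the HTTP server. The --model flag points to your downloaded model file, and --port sets the listening port (8080 is the default). Depending on the release you built, the server binary may be named whisper-server and live under build/bin instead of ./server.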

./server --model models/ggml-base.en.bin --port 8080

Now test the server with a short WAV file. You can record a test audio file with any tool (e.g., arecord on Linux or QuickTime on macOS) and send it via curl.

curl -F "file=@test-meeting.wav" http://localhost:8080/inference

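The response is a JSON object containing the transcribed text. For a 30-second test clip you should see text appear within one or two seconds. If latency feels high, rebuild Whisper.cpp with GPU support; GPU acceleration is chosen at build time, and the exact build flag has changed between releases, so check the project README. A GPU-enabled build uses the GPU by default, and the server's --no-gpu flag turns it off.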

To stream audio from your meeting client, use a lightweight desktop agent. The Meetily desktop app (free and open source) captures system audio from Zoom, Google Meet, or Teams and pipes it to the Whisper server. Alternatively you can use PulseAudio on Linux or Virtual Audio Cable on Windows to route system audio into a simple Python script that sends chunks.
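If you go the script route, the chunking half might look like this minimal sketch built on Python's standard wave module. It assumes the captured audio is already a PCM WAV file (Whisper models work on 16 kHz audio, so 16 kHz mono 16-bit is the safe choice), and iter_wav_chunks is a hypothetical helper name.

```python
import io
import wave
from typing import Iterator


def iter_wav_chunks(path_or_file, seconds: float = 5.0) -> Iterator[bytes]:
    """Yield self-contained WAV byte strings, each up to `seconds` long."""
    with wave.open(path_or_file, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the source format so every chunk is a valid WAV on its own.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            yield buf.getvalue()
```

Each yielded chunk can be posted to the server as a multipart upload. Fixed-length cuts can split a word in half, so a production agent would likely cut on silence or overlap adjacent chunks slightly.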

Whisper.cpp is remarkably robust. It handles multiple speakers and moderate background noise well, especially with the medium or large model. For speaker diarization (labeling who said what) you can add pyannote.audio in a separate pipeline, though this adds complexity. Start without diarization and add it only if your meetings regularly involve six or more participants.
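When you do add diarization, the merge step is mostly interval arithmetic: give each transcript segment the speaker whose turn overlaps it most. The sketch below assumes both tools can be reduced to simple (start, end, value) tuples in seconds; those shapes and the function names are illustrative and are not pyannote.audio's actual output format.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]   # (start_sec, end_sec, text) from transcription
Turn = Tuple[float, float, str]      # (start_sec, end_sec, speaker) from diarization


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Attach to each transcript segment the speaker whose turn overlaps it most."""
    labeled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]), default=None)
        if best is None or overlap(start, end, best[0], best[1]) == 0.0:
            speaker = "Unknown"   # no diarization turn covers this segment
        else:
            speaker = best[2]
        labeled.append((speaker, text))
    return labeled


segments = [(0.0, 4.0, "Let's ship Friday."), (4.0, 7.0, "I need one more day.")]
turns = [(0.0, 4.2, "Speaker A"), (4.2, 8.0, "Speaker B")]
labeled = label_segments(segments, turns)
```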

4. Step 2: Summarization with Ollama and a Local LLM

Install Ollama with one command.

curl -fsSL https://ollama.com/install.sh | sh

Pull a model. Llama 3 (8B) runs well on 8 GB RAM and produces solid summaries. DeepSeek R1 is another strong option, especially for structured output.

ollama pull llama3

Now create a custom prompt template. This is where you control the output format. A good template forces the LLM to produce a JSON object with clear sections. Store this prompt in a text file or as an environment variable in your orchestrator.

You are a meeting minutes assistant. Summarize the following transcript.
Return a JSON object with three keys:
- "key_decisions": a list of strings, each describing a decision and its rationale
- "action_items": a list of objects with keys "owner", "task", and "due_date" (use "N/A" if not specified)
- "summary": a two sentence high level summary

Transcript:
{transcript}
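A hedged sketch of filling this template in code follows. It swaps the {transcript} placeholder for $transcript and uses string.Template, which is a choice made here rather than anything Ollama requires: if you later add a literal JSON example to the prompt, its braces will not collide with the placeholder syntax. build_prompt is a hypothetical helper name.

```python
from string import Template

PROMPT_TEMPLATE = Template(
    "You are a meeting minutes assistant. Summarize the following transcript.\n"
    "Return a JSON object with three keys:\n"
    '- "key_decisions": a list of strings, each describing a decision and its rationale\n'
    '- "action_items": a list of objects with keys "owner", "task", and "due_date" '
    '(use "N/A" if not specified)\n'
    '- "summary": a two sentence high level summary\n'
    "\n"
    "Transcript:\n"
    "$transcript"
)


def build_prompt(transcript: str) -> str:
    """Fill the template; braces and dollar signs in the transcript pass through as-is."""
    return PROMPT_TEMPLATE.substitute(transcript=transcript.strip())


prompt = build_prompt('Alice: the config is {"retries": 3} and the budget is $5k.')
```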

In your orchestrator (Python example below), fetch the full transcript after the meeting ends and send it to Ollama’s API.

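import json

import requests

def summarize(transcript):
    # The full prompt text is the template shown above; it is abbreviated here.
    prompt = f"""You are a meeting minutes assistant...
Transcript:
{transcript}"""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": prompt,
            "format": "json",  # ask Ollama to constrain the output to valid JSON
            "stream": False
        },
        timeout=300,  # long transcripts can take minutes on a CPU
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    result = response.json()
    # "response" holds the model's text, which should itself be a JSON document.
    return json.loads(result["response"])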

Ollama returns a JSON envelope whose response field contains the model's text. Because the prompt asks for JSON, parsing that field gives you structured minutes ready for distribution. Models occasionally emit invalid JSON, so wrap the parsing in a try / except block and retry the request a few times before giving up.
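A minimal sketch of such a retry loop is shown here. It takes the model call as a parameter (generate, a hypothetical callable that maps a prompt to the model's raw text) so the logic can be tested without a running Ollama instance, and it also checks that the three expected keys are present, since syntactically valid JSON can still be the wrong shape.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"key_decisions", "action_items", "summary"}


def parse_summary(raw: str) -> dict:
    """Parse model output and check that the expected keys are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def summarize_with_retry(generate: Callable[[str], str], prompt: str,
                         attempts: int = 3) -> dict:
    """Call `generate` until it returns a valid summary or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_summary(generate(prompt))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise RuntimeError(f"no valid summary after {attempts} attempts") from last_error
```

Retrying with the same prompt only helps because sampling is non-deterministic; if a model fails consistently, tightening the prompt or lowering the temperature is likely to help more than additional attempts.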

A major benefit of summarizing with Ollama is that you can swap models by changing the model name, without touching any other code. Try Mistral for shorter outputs or Llama 3.1 for longer context windows. After the initial download, each model costs nothing to run beyond your own hardware and electricity.

5. Step 3: Integrating with Slack and Notion

A private transcript on disk is useful only to you. To make the assistant a team tool, you need to push summaries where your team already lives: Slack and Notion.

Slack Bot Integration

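Go to api.slack.com/apps and create a new app. Add the chat:write bot token scope, which is all the bot needs to post messages; add channels:history only if you later want the bot to read channel messages. Install the app to your workspace, note the Bot User OAuth Token, and invite the bot to the channel it will post in. In your orchestrator, use the Slack SDK to post a formatted message.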

import os

from slack_sdk import WebClient

# Read the bot token from the environment rather than hardcoding it.
client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def post_to_slack(summary):
    decisions = "\n".join(f"• {d}" for d in summary["key_decisions"])
    actions = "\n".join(
        f"• {a['task']} (owner: {a['owner']}, due: {a['due_date']})"
        for a in summary["action_items"]
    )
    blocks = [
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Meeting Summary*\n{summary['summary']}"}},
        {"type": "divider"},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Key Decisions*\n{decisions}"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Action Items*\n{actions}"}}
    ]
    # `text` is the fallback used in notifications and by clients that cannot render blocks.
    client.chat_postMessage(channel="#meeting-notes", text=summary["summary"], blocks=blocks)

Notion Integration

Create an internal integration in Notion's integration settings and copy its token. In Notion, create a new database with columns like “Meeting Title”, “Date”, “Summary”, and “Action Items”, then share that database with the integration so it has read and write access. Leave the full transcript out of Notion so it stays on your own hardware. Then use the Notion API to create a page for each meeting.

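import os
from datetime import date

import requests

# Read secrets from the environment (the variable names are placeholders).
NOTION_TOKEN = os.environ["NOTION_TOKEN"]
DATABASE_ID = os.environ["NOTION_DATABASE_ID"]

def save_to_notion(summary, transcript, title="Sprint Review", meeting_date=None):
    # The transcript is deliberately not uploaded; it stays in the local database.
    headers = {
        "Authorization": f"Bearer {NOTION_TOKEN}",
        "Content-Type": "application/json",
        "Notion-Version": "2022-06-28"
    }
    actions = "\n".join(
        f"{a['task']} (owner: {a['owner']}, due: {a['due_date']})"
        for a in summary["action_items"]
    )
    data = {
        "parent": {"database_id": DATABASE_ID},
        "properties": {
            "Meeting Title": {"title": [{"text": {"content": title}}]},
            # Notion caps each rich text object at 2,000 characters.
            "Summary": {"rich_text": [{"text": {"content": summary["summary"][:2000]}}]},
            # Assumes "Action Items" is a Text property in the database.
            "Action Items": {"rich_text": [{"text": {"content": actions[:2000]}}]},
            "Date": {"date": {"start": (meeting_date or date.today()).isoformat()}}
        }
    }
    response = requests.post("https://api.notion.com/v1/pages", headers=headers, json=data, timeout=30)
    response.raise_for_status()  # fail loudly if Notion rejects the page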

This Slack and Notion pipeline ensures every meeting produces a searchable document in Notion and an immediate notification in your team channel. The full transcript stays local for privacy; only the summary and action items go to external services.
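One practical detail when writing text properties: the Notion API caps each rich text object at 2,000 characters, so long summaries or action item lists need to be split across several objects if you want to keep all of the text. The helper below is a small sketch of that split; to_rich_text is a hypothetical name, and Notion also limits how many objects one property may hold, so check the API's request limits before sending very long text.

```python
from typing import Dict, List

NOTION_TEXT_LIMIT = 2000  # maximum characters per rich text object


def to_rich_text(text: str, limit: int = NOTION_TEXT_LIMIT) -> List[Dict]:
    """Split text into consecutive rich text objects of at most `limit` characters."""
    pieces = [text[i:i + limit] for i in range(0, len(text), limit)]
    return [{"type": "text", "text": {"content": piece}} for piece in pieces]


# Example property payload for a page creation request.
summary_property = {"rich_text": to_rich_text("Team agreed to ship on Friday.")}
```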

6. Testing, Optimization, and Avoiding Common Pitfalls

Your new assistant will break in predictable ways unless you test it with realistic input. Start by collecting a few recordings of team meetings (with permission) that include multiple speakers, varied accents, and background noise like typing or children. Run each recording through the pipeline and check for three failure modes: missing words, hallucinated names, and malformed summary JSON.
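To put a number on the first failure mode, you can compare Whisper's output against a hand-corrected reference and compute the word error rate. The sketch below is a plain word-level edit distance; it lowercases both texts but does not strip punctuation, which you may want to add for real transcripts.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # `prev` is the previous row of the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # word dropped by the transcriber
                            curr[j - 1] + 1,       # extra word inserted
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("ship the release on friday", "ship release on friday")
```

Tracking this figure per recording makes it easier to tell whether a model or setting change actually improved accuracy.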

Optimize prompt engineering. The prompt you give to Ollama determines output quality. If action items are missing owners, add an explicit instruction: “For each action item, infer the owner from the conversation or output ‘unassigned’.” If decisions lack context, ask for “the reasoning behind each decision in one sentence.” Tune prompts over five or ten meetings until the output feels natural.

Set a human review window. AI summaries will occasionally misrepresent a discussion or miss sarcasm. Configure the orchestrator to hold the Slack post for 24 hours and instead send a preliminary notification to the meeting organizer. The organizer can correct errors before the summary goes to the general channel. This practice, recommended by experienced Whisper users, prevents embarrassing mistakes from becoming permanent records.
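The hold-and-release logic can be quite small. The following sketch keeps pending summaries in memory and releases each one once the organizer has approved it or the 24 hour window has passed; ReviewQueue and its method names are hypothetical, and a real orchestrator would persist the queue so a restart does not lose held summaries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

REVIEW_WINDOW = timedelta(hours=24)


@dataclass
class PendingSummary:
    meeting_id: str
    summary: dict
    created_at: datetime
    approved: bool = False


@dataclass
class ReviewQueue:
    pending: Dict[str, PendingSummary] = field(default_factory=dict)

    def hold(self, meeting_id: str, summary: dict, now: datetime) -> None:
        """Park a summary until it is approved or the window expires."""
        self.pending[meeting_id] = PendingSummary(meeting_id, summary, now)

    def amend(self, meeting_id: str, corrected: dict) -> None:
        """Replace a held summary with the organizer's correction and approve it."""
        item = self.pending[meeting_id]
        item.summary = corrected
        item.approved = True

    def release_due(self, now: datetime) -> List[dict]:
        """Remove and return summaries that are approved or older than the window."""
        due = [m for m, p in self.pending.items()
               if p.approved or now - p.created_at >= REVIEW_WINDOW]
        return [self.pending.pop(m).summary for m in due]
```

A scheduler would call release_due periodically and pass each returned summary to the Slack posting function.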

Secure the stack. Run the Whisper server, Ollama, and your orchestrator as non-root users. Store API tokens in environment variables or a secrets manager like HashiCorp Vault. Encrypt the SQLite database that holds full transcripts. Use TLS if the orchestrator is accessible from the network. Follow the principle of least privilege: your Slack bot token should only have the scopes it needs (chat:write is enough for posting).

Common pitfalls include running out of RAM when using large Whisper models on a 4 GB system, and leaving the Whisper server running indefinitely between meetings, where memory usage can creep up over time. Use Docker Compose to manage all services with restart policies and resource limits. For more advanced optimization, see our guide on local AI assistant setup, which covers performance tuning for edge cases.

Next Steps: Go Further

Once the basic pipeline is stable, you can extend it. Add an MCP server to expose your meeting notes to AI agents, as Granola recently did. Connect the orchestrator to a second brain Notion system that indexes every decision across all meetings. Integrate with your calendar API to automatically fetch meeting titles and invite lists, as demonstrated in the Whisper + DeepSeek tutorial on Medium.

For teams that prefer a zero code approach to automation, consider pairing this assistant with the Claude productivity stack to trigger workflows based on meeting outcomes. And if you aspire to get your own technical content cited by AI assistants, our AI citation playbook offers a parallel strategy.

Building your own open source meeting notes assistant is not about saving a few dollars. It is about taking ownership of your team’s conversational data and shaping it into a tool that serves your exact workflow. The code is free. The models are powerful. The only missing piece is the evening you spend wiring them together.

Cover photo by Pachon in Motion on Pexels.