Why bother
This pipeline isn't really about meeting minutes. The "meeting" is a daily 30-minute slot I hold with myself — a standing recurring event on my calendar with one attendee: me. It's the front end of my personal voice-to-output workflow.
In any given session I'll:
- Summarize the day — what happened, what I learned, what's still open.
- Kick off a project by talking through what I want to build before I write a single line of it.
- Dictate spec material — user stories, requirements, acceptance criteria, the rough shape of an architecture.
- Have a conversation with myself about a new dashboard, document, or HTML artifact I'm sketching out.
- Think out loud on a problem I'm stuck on, which is genuinely faster than typing for me.
The transcript that drops out the other end becomes input for the next stage: my agent reads the .md, picks out the action items, drafts the spec, generates the artifact, files the follow-ups. The recording is just the cheapest possible voice capture device that already integrates with the rest of my Microsoft world.
So the daily standup isn't a meeting. It's the microphone on a personal automation pipeline. The faster and more invisible the path from "I said it" to "it's text in a folder my agent watches", the more useful the whole stack gets.
For the record, my wife is convinced I've lost it. She keeps finding me wandering around the house in headphones, narrating projects to nobody. The pipeline is, in part, a way of telling her "I promise I'm working — look, here's the text file."
The Graph transcripts API (GET /me/onlineMeetings/{id}/transcripts/{tid}/content) is blocked in my tenant. It requires OnlineMeetingTranscript.Read.All admin consent that isn't granted. So the clean API route was off the table, and that's what forced the design below.
The full flow at a glance
Turn on auto-record for the meeting series
Open the recurring meeting in Outlook or Teams Calendar → Meeting options → toggle Record automatically on (and turn on transcription if it isn't already). Without this, nothing else downstream has anything to chew on.
Recording saved to OneDrive
Stream auto-saves every recorded meeting to OneDrive / Documents / Recordings / as an MP4 with a predictable filename: <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4.
Detect the new MP4
Every 30 min, list the Recordings folder via Microsoft Graph and look for files newer than the last seen timestamp whose name starts with a watched series prefix. One cheap API call per cycle; no browser unless there's something new.
Pop a browser, grab the .docx
Only runs when stage 2 found something. Drives signed-in Teams web in Playwright, opens the meeting chat, reads the "Meeting started at H:MM AM" line, clicks the right Transcript button, then Download → "Download as .docx". Closes the browser when done.
Convert and notify
Every 30 min, scan the transcripts folder for new .docx files. For each one, open in Word via COM, save as plain text, and ping Teams with the filename(s).
| Stage | Trigger | What runs | Status |
|---|---|---|---|
| 0 | Once, when you set up the series | Outlook / Teams meeting options → Record automatically | One-time |
| 1 | Recording ends | Microsoft Stream (nothing of mine) | Always on |
| 2 | Every 30 min | /transcript-watcher automation | Enabled |
| 3 | Stage 2 finds a new MP4 | /meeting-transcript via Playwright | On demand |
| 4 | Every 30 min | /doc-watcher automation | Enabled |
Worst-case end-to-end latency: Stream processing (a few min) + up to 30 min for stage 2 + up to 30 min for stage 4 ≈ an hour from "stop recording" to "Teams ping". Average is more like 30 minutes, since a file typically lands partway through each 30-min interval rather than right after a tick.
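The latency arithmetic can be written out as a small sketch. The 5-minute Stream figure is my own stand-in for "a few min", the two timers are assumed to be independent of each other, and the time stage 3 spends in the browser is ignored.

```python
# Back-of-envelope latency for two independent 30-minute pollers.
STREAM_PROCESSING_MIN = 5   # assumption: "a few min" taken as 5
POLL_INTERVAL_MIN = 30      # stage 2 and stage 4 both tick every 30 min


def worst_case_minutes(stream: float, interval: float, pollers: int = 2) -> float:
    """Each poller has just ticked, so each adds a full interval of waiting."""
    return stream + pollers * interval


def average_minutes(stream: float, interval: float, pollers: int = 2) -> float:
    """On average a file lands mid-interval, so each poller adds interval / 2."""
    return stream + pollers * interval / 2


print(worst_case_minutes(STREAM_PROCESSING_MIN, POLL_INTERVAL_MIN))  # 65
print(average_minutes(STREAM_PROCESSING_MIN, POLL_INTERVAL_MIN))     # 35.0
```

Shortening either interval moves both numbers, but the stage 2 poll is the one that costs a Graph call per tick.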
Design decisions, and a few dead ends
Why three skills instead of one
Single-responsibility. The detector doesn't know how to drive Teams. The Teams driver doesn't know about polling. The converter doesn't know about either. Each piece is replaceable: if my tenant ever grants access to the Graph transcripts API, I can swap stage 3 for a single HTTP call without touching the rest.
Why poll OneDrive instead of watching the chat
I tried three alternatives first:
- Graph transcripts API: blocked in my tenant. 403 every time. Dead.
- Teams chat system messages: Microsoft strips the eventDetail payload that would tell me "transcript available". The chat API returns opaque <systemEventMessage/> with the useful bits removed.
- Stream activity feed: no public API surface I could reach.
OneDrive's Recordings folder turned out to be the cheapest reliable signal. Stream always auto-saves the MP4 there for the meeting organizer, with a stable filename pattern. One m365_list_files call with orderBy=lastModifiedDateTime desc tells me everything I need to know.
Why Playwright only on detection
Browser automation is fragile, slow, and visible. Running it every 30 minutes "just in case" would be obnoxious. By gating it behind a cheap Graph poll, the browser only pops up when there's actual work to do — usually once or twice a day.
The recording start time comes from the chat thread's "Meeting started at 8:31 AM" line. The skill explicitly captures that before navigating to the recap, because once you're on the recap surface you've lost easy access to the chat.
Why Word COM for the conversion
I wanted plain text that looks like what Word renders, not an XML parse. Word COM is the only thing on Windows that reliably handles every edge case (.doc, .docx, .docm, .rtf, .odt) and produces clean UTF-8 text. It's headless (Visible=False) and only spins up Word for a few seconds per file.
Why two 30-min timers instead of chaining them
Decoupling. If stage 2 fails or stage 3 crashes mid-flight, the next tick retries. And because doc-watcher watches the folder for any .docx (not just the ones the transcript pipeline drops there), it stays useful even if the transcript pipeline above it changes completely. Loose coupling, each side ignorant of the other.
Crash-safety: never advance state on failure
The transcript-watcher only updates last_seen_modified after /meeting-transcript succeeds for that specific file. If Playwright times out or the transcript isn't processed yet, the timestamp stays where it was, and the next 30-min poll retries the same file. The user gets one Teams DM about the failure; they don't get spammed every poll.
Stage 0 · Turn on auto-record (do this once)
The entire pipeline assumes the meeting actually gets recorded. The cheapest, most reliable way to guarantee that for a recurring series is to flip the auto-record switch on the series itself — then you can stop thinking about it.
How to set it
- Open the recurring meeting in Outlook (or Teams Calendar) → click into the series (not a single occurrence).
- Click Meeting options (usually in the ribbon, or a "More options" link inside the invite).
- Toggle Record automatically on. While you're there, confirm Allow transcription is also on — without it, Stream produces a video with no transcript and the pipeline has nothing to extract.
- Save. From the next occurrence onward, Stream records every instance the moment the meeting starts, regardless of whether anyone remembers to hit the button.
What if you don't own the meeting
Only the organizer can change meeting options. If you're an attendee on a series someone else owns, your two options are: ask them to enable auto-record, or accept that you'll need to hit Record manually each time. The downstream pipeline doesn't care which one started the recording, as long as one of them did.
What if your tenant blocks auto-record
Some tenant policies disable the auto-record toggle entirely. If the option is greyed out, you'll need to record manually each session — and the pipeline still works fine, you just have to remember to click Record. Everything downstream is unchanged.
Stage 1 · OneDrive auto-save
Nothing of mine runs here. When you record a Teams meeting as the organizer, Stream uploads the MP4 to your OneDrive at:
Documents / Recordings / <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4
Two things to know:
- One meeting can produce multiple MP4s if the recording reconnects (e.g. organizer's network blip). Each gets its own file with its own timestamp. The pipeline handles this — each MP4 fires extraction once.
- The folder ID is stable per user. Capture it once during first-run setup and reuse forever.
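Because the filename pattern is predictable, the subject and the UTC timestamp can be pulled back out of a recording's name. A minimal sketch, assuming the pattern is exactly as shown above; `parse_recording_name` is a hypothetical helper, not part of any of the skills.

```python
import re
from datetime import datetime, timezone

# <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4
RECORDING_RE = re.compile(
    r"^(?P<subject>.+)-(?P<stamp>\d{8}_\d{6})UTC-Meeting Recording\.mp4$"
)


def parse_recording_name(filename: str):
    """Return (subject, timestamp_utc), or None if the name doesn't fit the pattern."""
    m = RECORDING_RE.match(filename)
    if m is None:
        return None
    stamp_utc = datetime.strptime(m.group("stamp"), "%Y%m%d_%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return m.group("subject"), stamp_utc


print(parse_recording_name("2026 Daily Sync-20260523_163000UTC-Meeting Recording.mp4"))
```

Since a reconnect produces a second MP4 with its own timestamp, two files with the same subject and different stamps are expected, not an error.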
Stage 2 · transcript-watcher
The detection skill. Reads its config, lists the Recordings folder via Graph, filters by prefix + timestamp, and on a hit hands off to /meeting-transcript.
Config shape
{
  "recordings_folder_id": "<your OneDrive Recordings folder id>",
  "interval_minutes": 30,
  "automation_id": "<populated after first run>",
  "watches": [
    {
      "id": "2026-daily-sync",
      "display_name": "2026 Daily Sync",
      "name_prefix": "2026 Daily Sync",
      "chat_id": "19:meeting_<hex>@thread.v2",
      "last_seen_modified": "2026-05-23T16:36:56Z"
    }
  ]
}
The poll cycle
- For each watch, list the Recordings folder (m365_list_files with orderBy=lastModifiedDateTime desc, limit=20).
- Filter in memory: name starts with name_prefix + "-", MIME is video/mp4, modified > last_seen_modified.
- Sort matches ascending so state advances monotonically.
- For each match, invoke /meeting-transcript with the chat ID.
- Only after the transcript skill returns successfully, write the new last_seen_modified into config and persist immediately.
- If nothing matched, stay completely silent: no Teams ping, no chat output.
The full skill definition (including first-run setup, dry-run mode, reconfigure commands, and failure handling) is in skills/transcript-watcher/SKILL.md.
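The poll cycle above, including the "never advance state on failure" rule, can be sketched in a few lines. This is an illustration under assumptions, not the skill itself: `DriveFile`, `Watch`, `fetch_transcript`, and `persist` are hypothetical stand-ins for the Graph listing, a config entry, the /meeting-transcript hand-off, and the config write.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class DriveFile:            # hypothetical shape of one listed file
    name: str
    mime_type: str
    modified: datetime


@dataclass
class Watch:                # mirrors one entry in "watches"
    name_prefix: str
    chat_id: str
    last_seen_modified: datetime


def poll_watch(
    watch: Watch,
    files: Iterable[DriveFile],
    fetch_transcript: Callable[[str, DriveFile], None],
    persist: Callable[[Watch], None],
) -> int:
    """Process new recordings oldest-first; return how many succeeded."""
    matches = sorted(
        (
            f for f in files
            if f.name.startswith(watch.name_prefix + "-")
            and f.mime_type == "video/mp4"
            and f.modified > watch.last_seen_modified
        ),
        key=lambda f: f.modified,   # ascending, so state only moves forward
    )
    handled = 0
    for f in matches:
        try:
            fetch_transcript(watch.chat_id, f)
        except Exception:
            break   # state untouched for this file; the next poll retries it
        watch.last_seen_modified = f.modified
        persist(watch)              # write immediately after each success
        handled += 1
    return handled
```

Stopping at the first failure keeps the timestamp from jumping past a file that still needs a retry.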
Stage 3 · meeting-transcript
The Playwright-driven downloader. Runs only when stage 2 hands it a chat ID.
The seven steps
1. Resolve the chat from a subject, link, or chat ID.
2. Open the chat in signed-in Teams web (teams.cloud.microsoft/v2/#/l/chat/<encoded-id>).
3. Capture the recording start time from the chat thread's "Meeting started at H:MM AM" line. Not from the recap picker: that's the scheduled time and will lie to you.
4. Open the Transcript pane by clicking the last Transcript button (one button per recorded occurrence; last = most recent).
5. Click Download → "Download as .docx" inside the Stream xplatplugins.aspx iframe. Critical: wire page.waitForEvent('download') before the menu-item click, using Promise.all.
6. Save as <slug>-<YYYY-MM-DD>-<HHMM>.docx using the recording start time from step 3.
7. Verify the file exists and is >1 KB, then close the Playwright browser. On failure, leave the browser open for inspection.
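The naming rule from the steps above can be sketched as a small helper. It assumes the chat line uses an English 12-hour clock, and that the caller already knows the meeting date; `transcript_filename` and its parameters are hypothetical names, not the skill's own.

```python
import re
from datetime import date

# Matches e.g. "Meeting started at 8:31 AM" (any whitespace before AM/PM).
STARTED_RE = re.compile(r"Meeting started at (\d{1,2}):(\d{2})\s*([AP]M)")


def transcript_filename(slug: str, meeting_date: date, chat_line: str) -> str:
    """Build <slug>-<YYYY-MM-DD>-<HHMM>.docx from the chat's start-time line."""
    m = STARTED_RE.search(chat_line)
    if m is None:
        raise ValueError(f"no start time found in: {chat_line!r}")
    hour, minute, ampm = int(m.group(1)), int(m.group(2)), m.group(3)
    hour = hour % 12 + (12 if ampm == "PM" else 0)   # 12 AM -> 0, 12 PM -> 12
    return f"{slug}-{meeting_date:%Y-%m-%d}-{hour:02d}{minute:02d}.docx"


print(transcript_filename("2026-daily-sync", date(2026, 5, 23), "Meeting started at 8:31 AM"))
# 2026-daily-sync-2026-05-23-0831.docx
```

Putting the 24-hour time in the name keeps two recordings from the same day sortable and distinct.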
The one tricky bit of Playwright code
The download menu lives inside an iframe. The event wiring matters:
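async (page) => {
  // The Download menu is rendered inside the Stream transcript iframe, not the top-level page.
  const tFrame = page.frames().find(f => f.url().includes('xplatplugins.aspx'));
  if (!tFrame) throw new Error('Transcript iframe not found');
  await tFrame.getByRole('button', {name: 'Download'}).click();
  await page.waitForTimeout(500); // give the menu a moment to render
  // Start listening for the download before the click, so the event can't be missed.
  const [download] = await Promise.all([
    page.waitForEvent('download', {timeout: 30000}),
    tFrame.getByRole('menuitem', {name: /Download as \.docx/}).click()
  ]);
  await download.saveAs(targetPath); // targetPath is defined outside this snippet
  return download.suggestedFilename();
}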
Why Promise.all: if you set up waitForEvent after the click, the event fires before you start listening and the promise times out. You have to be listening at the moment the click happens.
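The same race is easy to reproduce outside a browser. The sketch below is a toy, not Playwright's API: `OneShotEmitter` is a hypothetical event source that, like a one-off download event, only reaches listeners that are already registered when it fires.

```python
import asyncio


class OneShotEmitter:
    """Toy event source: emit() reaches only the waiters registered at that moment."""

    def __init__(self) -> None:
        self._waiters: list[asyncio.Future] = []

    def wait_for_event(self) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        return fut

    def emit(self, value: str) -> None:
        waiters, self._waiters = self._waiters, []
        for fut in waiters:
            if not fut.done():
                fut.set_result(value)


async def listen_then_click(emitter: OneShotEmitter):
    waiter = emitter.wait_for_event()    # listening first
    emitter.emit("transcript.docx")      # stands in for the menu-item click
    return await asyncio.wait_for(waiter, timeout=0.2)


async def click_then_listen(emitter: OneShotEmitter):
    emitter.emit("transcript.docx")      # fires with nobody listening
    try:
        return await asyncio.wait_for(emitter.wait_for_event(), timeout=0.2)
    except asyncio.TimeoutError:
        return None                      # the event is gone; the wait times out


async def demo():
    return (
        await listen_then_click(OneShotEmitter()),
        await click_then_listen(OneShotEmitter()),
    )


print(asyncio.run(demo()))  # ('transcript.docx', None)
```

The second call is the failure mode described above: the listener is attached too late, so the wait can only end in a timeout.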
Stage 4 · doc-watcher
The generic "watch a folder, turn Word documents into plain text" automation. Was already running for other purposes; the transcript pipeline just drops files into a folder it already watches.
How it works
- Reads config.json (folder, glob pattern, output dir, interval).
- Runs convert.py, which scans the folder for new files (tracked in state.json so each file converts exactly once).
- For each new file, uses a hidden Word instance (COM automation) to open the document and save it as Unicode text, then reads that text back and re-encodes it as UTF-8.
- Writes <name>.md next to the original (or in a separate output folder).
- Prints a JSON summary; if anything was processed or errored, sends one Teams DM listing the basenames.
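The "converts exactly once" bookkeeping can be sketched as follows. The layout of state.json here (a plain JSON list of filenames already handled) is my assumption, and `find_new_files` and `mark_done` are hypothetical helpers; the real convert.py may track state differently.

```python
import json
from pathlib import Path


def _load_seen(state_path: Path) -> set[str]:
    """Read the set of already-converted filenames; empty on first run."""
    if not state_path.exists():
        return set()
    return set(json.loads(state_path.read_text(encoding="utf-8")))


def find_new_files(folder: Path, pattern: str, state_path: Path) -> list[Path]:
    """Files matching the glob that have not been converted yet."""
    seen = _load_seen(state_path)
    return sorted(p for p in folder.glob(pattern) if p.name not in seen)


def mark_done(path: Path, state_path: Path) -> None:
    """Record one file as converted, persisting right away."""
    seen = _load_seen(state_path)
    seen.add(path.name)
    state_path.write_text(json.dumps(sorted(seen), indent=2), encoding="utf-8")
```

Calling `mark_done` only after a successful conversion means a file that errored is picked up again on the next tick.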
The COM call that does the work
doc.SaveAs2(
    FileName=str(out_txt),
    FileFormat=7,            # WD_FORMAT_UNICODE_TEXT
    Encoding=1200,           # UTF-16 LE; Word writes a BOM we strip on read
    LineEnding=0,            # CRLF, normalized to \n in Python
    AddToRecentFiles=False,
)
One Word instance is reused for the whole batch (lazy DispatchEx), then quit at the end. OneDrive cloud-only files get hydrated to a temp dir first, since Word can't open placeholders directly.
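The read-back half of that round trip (strip the BOM, normalize line endings, write UTF-8) needs no Word at all. A minimal sketch, assuming the text file was written as UTF-16 LE with a BOM as in the call above; `unicode_text_to_markdown` is a hypothetical name.

```python
from pathlib import Path


def unicode_text_to_markdown(txt_path: Path, md_path: Path) -> str:
    """Decode Word's UTF-16 LE output and write it back out as UTF-8 with \\n endings."""
    text = txt_path.read_bytes().decode("utf-16-le")
    text = text.lstrip("\ufeff")                            # drop the BOM Word wrote
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # CRLF (or stray CR) -> LF
    md_path.write_bytes(text.encode("utf-8"))               # bytes, so no newline translation
    return text
```

Writing bytes rather than text avoids the platform turning `\n` back into CRLF on Windows.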
Running it manually — the idiot's guide
The autopilot path is "do nothing, get a Teams ping in ~30 min". If you want the file right now:
- /transcript-watcher: detects new MP4s and pulls each transcript into the watched folder.
- /meeting-transcript: pass a meeting subject, link, or chat ID. Skips detection entirely.
- /doc-watcher: don't wait for the 30-min timer; convert any pending .docx files immediately.
The two-command path: /transcript-watcher followed by /doc-watcher takes a freshly-ended meeting all the way to a .md file plus Teams notification in about three minutes. Handy when I just stopped recording and want the text in front of me before the next meeting starts.
What's in this package
Each skill is independently installable. The full prompt body and config schema live in the SkillWorks library — open the matching card to view, copy, or download. Supporting files (convert.py, config.example.json) are linked from each skill's README on the library page.
Adapting this for your own setup
If you have Clawpilot
Drop the three skill folders into ~/.copilot/m-skills/, register each via m_create_skill, then invoke /transcript-watcher — the first-run flow walks you through picking a meeting series, finds the OneDrive Recordings folder and chat ID for you, and creates the disabled automation. Flip it on when you're happy.
If you don't have Clawpilot
The pieces still translate to anything that can run scheduled prompts against an LLM with tool calls — Power Automate + Copilot, a local Node script, a GitHub Actions workflow. Easier still: hand the three skill .md files to your agent of choice (GitHub Copilot, Copilot CoWork, Claude Code, whatever you're running) and ask it to adapt them to your stack. The substance is:
- One scheduled Microsoft Graph query against your Recordings folder.
- One Playwright script signed into Teams with a persisted profile.
- One folder watcher + Word COM converter.
- One Teams notification webhook (or whatever channel you prefer).
If your tenant does have the Graph transcript API
You can collapse stages 2 and 3 into a single GET /me/onlineMeetings/{id}/transcripts/{tid}/content call. No Playwright, no .docx round-trip — just the VTT directly. The rest of the pipeline (Word conversion, Teams ping) stays the same. Lucky you.
Caveats
- Only the meeting organizer gets the recording auto-saved to their OneDrive. If you're an attendee, you won't see the MP4, but you can still pull the transcript via /meeting-transcript as long as you have chat access.
- The Playwright session uses a persistent Edge profile; it has to be signed into Teams once, manually, before the first run.
- Recordings can be configured by your tenant admin to expire after N days. The pipeline doesn't extend retention; pull the transcripts you care about into a folder you control.