Teams Transcript Pipeline

Why bother

This pipeline isn't really about meeting minutes. The "meeting" is a daily 30-minute slot I hold with myself — a standing recurring event on my calendar with one attendee: me. It's the front end of my personal voice-to-output workflow.

In any given session I'll:

Summarize the day — what happened, what I learned, what's still open.
Kick off a project by talking through what I want to build before I write a single line of it.
Dictate spec material — user stories, requirements, acceptance criteria, the rough shape of an architecture.
Have a conversation with myself about a new dashboard, document, or HTML artifact I'm sketching out.
Think out loud on a problem I'm stuck on, which is genuinely faster than typing for me.

The transcript that drops out the other end becomes input for the next stage: my agent reads the .md, picks out the action items, drafts the spec, generates the artifact, files the follow-ups. The recording is just the cheapest possible voice capture device that already integrates with the rest of my Microsoft world.

So the daily standup isn't a meeting. It's the microphone on a personal automation pipeline. The faster and more invisible the path from "I said it" to "it's text in a folder my agent watches", the more useful the whole stack gets.

The seven clicks I'm removing: open Teams → find the meeting in the chat → click into the recap → open the transcript pane → click the three dots / Download / .docx → convert the .docx to plain text so the agent can actually consume it → move the file from Downloads into the watched folder. Seven clicks per recording, every day, forever, and most days more than once. The kind of friction that quietly kills a workflow before it has a chance to compound.

For the record, my wife is convinced I've lost it. She keeps finding me wandering around the house in headphones, narrating projects to nobody. The pipeline is, in part, a way of telling her "I promise I'm working — look, here's the text file."

The constraint that shaped everything: the Microsoft Graph transcript API (/me/onlineMeetings/{id}/transcripts/{tid}/content) is blocked in my tenant. It requires OnlineMeetingTranscript.Read.All admin consent that isn't granted. So the clean API route was off the table — and that's what forced the design below.

The full flow at a glance

0 · You (one-time setup) · Turn on auto-record for the meeting series

Open the recurring meeting in Outlook or Teams Calendar → Meeting options → toggle Record automatically on (and turn on transcription if it isn't already). Without this, nothing else downstream has anything to chew on.

1 · Microsoft (free) · Recording saved to OneDrive

Stream auto-saves every recorded meeting to OneDrive / Documents / Recordings / as an MP4 with a predictable filename: <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4.

2 · /transcript-watcher (automation, every 30 min) · Detect the new MP4

Every 30 min, list the Recordings folder via Microsoft Graph and look for files newer than the last seen timestamp whose name starts with a watched series prefix. One cheap API call per cycle; no browser unless there's something new.

3 · /meeting-transcript (on demand) · Pop a browser, grab the .docx

Only runs when stage 2 found something. Drives signed-in Teams web in Playwright, opens the meeting chat, reads the "Meeting started at H:MM AM" line, clicks the right Transcript button, then Download → "Download as .docx". Closes the browser when done.

4 · /doc-watcher (automation, every 30 min) · Convert and notify

Every 30 min, scan the transcripts folder for new .docx files. For each one, open in Word via COM, save as plain text, and ping Teams with the filename(s).

Stage	Trigger	What runs	Status
0	Once, when you set up the series	Outlook / Teams meeting options → Record automatically	One-time
1	Recording ends	Microsoft Stream (nothing of mine)	Always on
2	Every 30 min	`/transcript-watcher` automation	Enabled
3	Stage 2 finds a new MP4	`/meeting-transcript` via Playwright	On demand
4	Every 30 min	`/doc-watcher` automation	Enabled

Worst-case end-to-end latency: Stream processing (a few min) + up to 30 min for stage 2 + up to 30 min for stage 4, so roughly an hour from "stop recording" to "Teams ping". Average is more like 30 minutes, because a new file typically waits about half an interval at each timer.
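The latency arithmetic can be written down as a tiny calculation. This is a rough sketch, not a measurement: the 5-minute Stream processing figure is an assumed stand-in for "a few min", the two timers are assumed to be independent, and stage 3's own runtime is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyEstimate:
    worst_min: float
    average_min: float


def estimate_latency(stream_processing_min: float,
                     poll_intervals_min: list[float]) -> LatencyEstimate:
    """Worst case waits a full interval at every timer; on average, half of one."""
    worst = stream_processing_min + sum(poll_intervals_min)
    average = stream_processing_min + sum(i / 2 for i in poll_intervals_min)
    return LatencyEstimate(worst, average)


# Assumed input: ~5 min of Stream processing, then the two 30-min timers.
estimate = estimate_latency(5.0, [30.0, 30.0])
print(estimate)  # LatencyEstimate(worst_min=65.0, average_min=35.0)
```

Shortening either interval moves both numbers, which is the main tuning knob if an hour feels too long.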

Design decisions, and a few dead ends

Why three skills instead of one

Single-responsibility. The detector doesn't know how to drive Teams. The Teams driver doesn't know about polling. The converter doesn't know about either. Each piece is replaceable: if my tenant ever grants consent for the Graph transcripts API, I can swap stage 3 for a single HTTP call without touching the rest.

Why poll OneDrive instead of watching the chat

I tried three alternatives first:

Graph transcripts API — blocked in my tenant. 403 every time. Dead.
Teams chat system messages — Microsoft strips the eventDetail payload that would tell me "transcript available". The chat API returns opaque <systemEventMessage/> with the useful bits removed.
Stream activity feed — no public API surface I could reach.

OneDrive's Recordings folder turned out to be the cheapest reliable signal. Stream always auto-saves the MP4 there for the meeting organizer, with a stable filename pattern. One m365_list_files call with orderBy=lastModifiedDateTime desc tells me everything I need to know.

Why Playwright only on detection

Browser automation is fragile, slow, and visible. Running it every 30 minutes "just in case" would be obnoxious. By gating it behind a cheap Graph poll, the browser only pops up when there's actual work to do — usually once or twice a day.

The one gotcha that took the longest to figure out: the Stream recap picker's combobox label shows the scheduled occurrence ("Friday, May 22 6:15 PM"), not the actual recording start time. If the meeting started early, late, or got recorded ad-hoc, the picker label is wrong. The authoritative source is the chat thread's Meeting started at 8:31 AM line. The skill explicitly captures that before navigating to the recap, because once you're on the recap surface you've lost easy access to the chat.

Why Word COM for the conversion

I wanted plain text that looks like what Word renders, not an XML parse. Word COM is the most reliable option I found on Windows for handling every edge case (.doc, .docx, .docm, .rtf, .odt); Word writes Unicode text, which the converter re-encodes as clean UTF-8. It runs hidden (Visible=False) and only keeps Word busy for a few seconds per file.

Why two 30-min timers instead of chaining them

Decoupling. If stage 2 fails or stage 3 crashes mid-flight, the next tick retries. And because doc-watcher watches the folder for any .docx (not just ones I dropped there), it stays useful even if the transcript pipeline above it changes completely. Loose coupling, each side ignorant of the other.

Crash-safety: never advance state on failure

The transcript-watcher only updates last_seen_modified after /meeting-transcript succeeds for that specific file. If Playwright times out or the transcript isn't processed yet, the timestamp stays where it was, and the next 30-min poll retries the same file. The user gets one Teams DM about the failure; they don't get spammed every poll.
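A minimal sketch of that rule, under stated assumptions: `process_in_order`, `extract`, and `persist` are hypothetical names, not the skill's real interface, and timestamps are passed around as ISO 8601 UTC strings. Stopping at the first failure is my reading of "never advance state on failure": if a later file were allowed to move the timestamp forward, the failed one would be skipped on the next poll.

```python
from typing import Callable, Iterable


def process_in_order(
    matches: Iterable[tuple[str, str]],
    extract: Callable[[str], bool],
    last_seen: str,
    persist: Callable[[str], None],
) -> str:
    """Advance the watermark one file at a time, stopping at the first failure.

    `matches` holds (name, modified) pairs sorted oldest first.
    `extract` stands in for the /meeting-transcript hand-off.
    """
    for name, modified in matches:
        try:
            ok = extract(name)
        except Exception:
            ok = False  # a crash counts as a failure, not as progress
        if not ok:
            break  # leave the watermark alone so the next poll retries this file
        last_seen = modified
        persist(last_seen)  # write state immediately after each success
    return last_seen


saved: list[str] = []
result = process_in_order(
    [("a.mp4", "2026-05-23T16:36:56Z"), ("b.mp4", "2026-05-23T17:10:00Z")],
    extract=lambda name: name != "b.mp4",  # pretend b.mp4 is not processed yet
    last_seen="2026-05-22T00:00:00Z",
    persist=saved.append,
)
print(result)  # 2026-05-23T16:36:56Z
```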

Stage 0 · Turn on auto-record (do this once)

The entire pipeline assumes the meeting actually gets recorded. The cheapest, most reliable way to guarantee that for a recurring series is to flip the auto-record switch on the series itself — then you can stop thinking about it.

How to set it

Open the recurring meeting in Outlook (or Teams Calendar) → click into the series (not a single occurrence).
Click Meeting options (usually in the ribbon, or a "More options" link inside the invite).
Toggle Record automatically on. While you're there, confirm Allow transcription is also on — without it, Stream produces a video with no transcript and the pipeline has nothing to extract.
Save. From the next occurrence onward, Stream records every instance the moment the meeting starts, regardless of whether anyone remembers to hit the button.

Why on the series, not the occurrence: setting it on the series means you never have to remember. If you set it on a single occurrence, the next one is back to manual and the watcher silently has nothing to find. Most "the pipeline broke" failures I had during testing traced back to this — the meeting just hadn't been recorded.

What if you don't own the meeting

Only the organizer can change meeting options. If you're an attendee on a series someone else owns, your two options are: ask them to enable auto-record, or accept that you'll need to hit Record manually each time. The downstream pipeline doesn't care which one started the recording, as long as one of them did.

What if your tenant blocks auto-record

Some tenant policies disable the auto-record toggle entirely. If the option is greyed out, you'll need to record manually each session — and the pipeline still works fine, you just have to remember to click Record. Everything downstream is unchanged.

Stage 1 · OneDrive auto-save

Nothing of mine runs here. When you record a Teams meeting as the organizer, Stream uploads the MP4 to your OneDrive at:

Documents / Recordings / <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4

Two things to know:

One meeting can produce multiple MP4s if the recording reconnects (e.g. organizer's network blip). Each gets its own file with its own timestamp. The pipeline handles this — each MP4 fires extraction once.
The folder ID is stable per user. Capture it once during first-run setup and reuse forever.
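Because the filename pattern is predictable, the subject and the embedded UTC timestamp can be pulled out with one regular expression. This is a sketch built only from the pattern quoted above; `parse_recording_name` and `Recording` are illustrative names, and real filenames in your tenant may differ.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# <Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4
RECORDING_RE = re.compile(
    r"^(?P<subject>.+)-(?P<stamp>\d{8}_\d{6})UTC-Meeting Recording\.mp4$"
)


class Recording(NamedTuple):
    subject: str
    stamp_utc: datetime


def parse_recording_name(filename: str) -> Optional[Recording]:
    """Return the subject and embedded timestamp, or None if the name doesn't fit."""
    m = RECORDING_RE.match(filename)
    if m is None:
        return None
    try:
        stamp = datetime.strptime(m["stamp"], "%Y%m%d_%H%M%S")
    except ValueError:
        return None  # digits in the right shape but not a real date or time
    return Recording(m["subject"], stamp.replace(tzinfo=timezone.utc))


rec = parse_recording_name("2026 Daily Sync-20260523_163656UTC-Meeting Recording.mp4")
print(rec.subject)  # 2026 Daily Sync
```

The greedy subject group means a subject that itself contains hyphens still parses, since only the last timestamp-shaped segment is treated as the stamp.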

Stage 2 · transcript-watcher

The detection skill. Reads its config, lists the Recordings folder via Graph, filters by prefix + timestamp, and on a hit hands off to /meeting-transcript.

Config shape

    {
      "recordings_folder_id": "<your OneDrive Recordings folder id>",
      "interval_minutes": 30,
      "automation_id": "<populated after first run>",
      "watches": [
        {
          "id": "2026-daily-sync",
          "display_name": "2026 Daily Sync",
          "name_prefix": "2026 Daily Sync",
          "chat_id": "19:meeting_<hex>@thread.v2",
          "last_seen_modified": "2026-05-23T16:36:56Z"
        }
      ]
    }

The poll cycle

For each watch, list the Recordings folder (m365_list_files with orderBy=lastModifiedDateTime desc, limit=20).
Filter in memory: name starts with name_prefix + "-", MIME is video/mp4, modified > last_seen_modified.
Sort matches ascending so state advances monotonically.
For each match, invoke /meeting-transcript with the chat ID.
Only after the transcript skill returns successfully, write the new last_seen_modified into config and persist immediately.
If nothing matched, stay completely silent — no Teams ping, no chat output.
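The filter-and-sort part of the cycle might look like the sketch below. The flat dict shape (`name`, `mimeType`, `lastModifiedDateTime`) is an assumption for illustration; the source doesn't show what m365_list_files actually returns, so adapt the field access to the real payload.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_new_recordings(files: list[dict], name_prefix: str,
                        last_seen_modified: str) -> list[dict]:
    """Keep MP4s for one watched series that are newer than the watermark."""
    cutoff = parse_utc(last_seen_modified)
    matches = [
        f for f in files
        if f["name"].startswith(name_prefix + "-")
        and f.get("mimeType") == "video/mp4"
        and parse_utc(f["lastModifiedDateTime"]) > cutoff
    ]
    # Oldest first, so the watermark only ever moves forward.
    return sorted(matches, key=lambda f: parse_utc(f["lastModifiedDateTime"]))


listing = [
    {"name": "2026 Daily Sync-20260523_180000UTC-Meeting Recording.mp4",
     "mimeType": "video/mp4", "lastModifiedDateTime": "2026-05-23T18:40:00Z"},
    {"name": "2026 Daily Sync-20260523_170000UTC-Meeting Recording.mp4",
     "mimeType": "video/mp4", "lastModifiedDateTime": "2026-05-23T17:35:00Z"},
    {"name": "Other Series-20260523_170000UTC-Meeting Recording.mp4",
     "mimeType": "video/mp4", "lastModifiedDateTime": "2026-05-23T17:36:00Z"},
]
new = find_new_recordings(listing, "2026 Daily Sync", "2026-05-23T16:36:56Z")
print([f["lastModifiedDateTime"] for f in new])
# ['2026-05-23T17:35:00Z', '2026-05-23T18:40:00Z']
```

Comparing parsed datetimes rather than raw strings avoids surprises if one timestamp carries fractional seconds and the other doesn't.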

The full skill definition (including first-run setup, dry-run mode, reconfigure commands, and failure handling) is in skills/transcript-watcher/SKILL.md.

Stage 3 · meeting-transcript

The Playwright-driven downloader. Runs only when stage 2 hands it a chat ID.

The seven steps

Resolve the chat from a subject, link, or chat ID.
Open the chat in signed-in Teams web (teams.cloud.microsoft/v2/#/l/chat/<encoded-id>).
Capture the recording start time from the chat thread's "Meeting started at H:MM AM" line. Not from the recap picker — that's the scheduled time and will lie to you.
Open the Transcript pane by clicking the last Transcript button (one button per recorded occurrence; last = most recent).
Click Download → "Download as .docx" inside the Stream xplatplugins.aspx iframe. Critical: wire page.waitForEvent('download') before the menu-item click, using Promise.all.
Save as <slug>-<YYYY-MM-DD>-<HHMM>.docx using the recording start time from step 3.
Verify the file exists and is >1 KB, then close the Playwright browser. On failure, leave the browser open for inspection.
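Steps 6 and 7 are easy to get subtly wrong (12-hour clock, the size check), so here is a hedged sketch of both. The function names are hypothetical, the slug rule is a guess at what `<slug>` means, and the meeting date is passed in separately because the "Meeting started at" line only carries a time.

```python
import re
from datetime import date
from pathlib import Path

STARTED_RE = re.compile(
    r"Meeting started at (\d{1,2}):(\d{2})\s*([AP]M)", re.IGNORECASE
)


def slugify(text: str) -> str:
    """Lowercase, and collapse anything that isn't a letter or digit into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def transcript_filename(subject: str, chat_line: str, meeting_date: date) -> str:
    """Build <slug>-<YYYY-MM-DD>-<HHMM>.docx from the chat's start-time line."""
    m = STARTED_RE.search(chat_line)
    if m is None:
        raise ValueError(f"no start time found in: {chat_line!r}")
    hour, minute, meridiem = int(m[1]), int(m[2]), m[3].upper()
    hour = hour % 12 + (12 if meridiem == "PM" else 0)  # 12 AM -> 0, 12 PM -> 12
    return f"{slugify(subject)}-{meeting_date:%Y-%m-%d}-{hour:02d}{minute:02d}.docx"


def looks_complete(path: Path, min_bytes: int = 1024) -> bool:
    """Step 7: the download exists and is larger than 1 KB."""
    return path.is_file() and path.stat().st_size > min_bytes


print(transcript_filename("2026 Daily Sync", "Meeting started at 8:31 AM",
                          date(2026, 5, 23)))
# 2026-daily-sync-2026-05-23-0831.docx
```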

The one tricky bit of Playwright code

The download menu lives inside an iframe. The event wiring matters:

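    async (page) => {
      // targetPath is assumed to be in scope: the destination path for the .docx.
      // The transcript UI is rendered inside Stream's xplatplugins.aspx iframe.
      const tFrame = page.frames().find(f => f.url().includes('xplatplugins.aspx'));
      if (!tFrame) throw new Error('Transcript iframe not found');

      // Open the Download menu and give it a moment to render.
      await tFrame.getByRole('button', {name: 'Download'}).click();
      await page.waitForTimeout(500);

      // Start listening for the download before the menu-item click fires it.
      const [download] = await Promise.all([
        page.waitForEvent('download', {timeout: 30000}),
        tFrame.getByRole('menuitem', {name: /Download as \.docx/}).click()
      ]);

      await download.saveAs(targetPath);
      return download.suggestedFilename();
    }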

Why Promise.all: if you set up waitForEvent after the click, the event can fire before you start listening, and the promise then times out. You have to be listening at the moment the click happens.
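The ordering problem is not specific to Playwright; any event source that doesn't buffer past events behaves this way. The toy below is a stand-in, not Playwright code: `OneShotEvents` is a hypothetical emitter that simply drops events nobody is listening for. (Playwright for Python expresses the same listen-first ordering with its `page.expect_download()` context manager.)

```python
from typing import Callable, List, Optional


class OneShotEvents:
    """Toy event source: an event with no registered listener is simply lost."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def once(self, callback: Callable[[str], None]) -> None:
        self._listeners.append(callback)

    def emit(self, payload: str) -> None:
        listeners, self._listeners = self._listeners, []
        for callback in listeners:
            callback(payload)


def click_then_listen(events: OneShotEvents) -> Optional[str]:
    received: List[str] = []
    events.emit("transcript.docx")   # the click: the download starts right away
    events.once(received.append)     # too late, the event has already gone by
    return received[0] if received else None


def listen_then_click(events: OneShotEvents) -> Optional[str]:
    received: List[str] = []
    events.once(received.append)     # listener is in place first
    events.emit("transcript.docx")   # now the click cannot outrun it
    return received[0] if received else None


print(click_then_listen(OneShotEvents()))  # None
print(listen_then_click(OneShotEvents()))  # transcript.docx
```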

Don't scrape the transcript DOM. The Stream transcript pane uses virtualization. For anything longer than ~3 minutes, scrolling-and-reading silently truncates. Always use the native Download button.

Stage 4 · doc-watcher

The generic "watch a folder, turn Word documents into plain text" automation. It was already running for other purposes; the transcript pipeline just drops files into a folder it already watches.

How it works

Reads config.json (folder, glob pattern, output dir, interval).
Runs convert.py, which scans the folder for new files (tracked in state.json so each file converts exactly once).
For each new file, opens the document in a hidden Word instance (COM automation, shared across the batch), saves it as Unicode text (UTF-16 LE), and reads that back for re-encoding as UTF-8.
Writes <name>.md next to the original (or in a separate output folder).
Prints a JSON summary; if anything was processed or errored, sends one Teams DM listing the basenames.
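The "each file converts exactly once" bookkeeping can be sketched without any Word involvement. The state.json layout here (a single `converted` list of filenames) is an assumption, as are the function names; the real convert.py may track more, such as modification times.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> set[str]:
    """Return the set of filenames already converted (empty on first run)."""
    if not state_path.exists():
        return set()
    data = json.loads(state_path.read_text(encoding="utf-8"))
    return set(data.get("converted", []))


def pending_files(folder: Path, pattern: str, converted: set[str]) -> list[Path]:
    """Files matching the glob that have not been converted yet."""
    return sorted(
        p for p in folder.glob(pattern)
        if p.name not in converted
        and not p.name.startswith("~$")  # skip Word's temporary owner files
    )


def mark_converted(state_path: Path, converted: set[str], path: Path) -> None:
    """Record one success and persist right away, so a crash can't redo it."""
    converted.add(path.name)
    state_path.write_text(
        json.dumps({"converted": sorted(converted)}, indent=2), encoding="utf-8"
    )
```

A typical loop would call `pending_files`, convert each result, and call `mark_converted` only after the output file has been written.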

The COM call that does the work

    doc.SaveAs2(
        FileName=str(out_txt),
        FileFormat=7,          # WD_FORMAT_UNICODE_TEXT
        Encoding=1200,         # UTF-16 LE; Word writes a BOM we strip on read
        LineEnding=0,          # CRLF, normalized to \n in Python
        AddToRecentFiles=False,
    )

One Word instance is reused for the whole batch (lazy DispatchEx), then quit at the end. OneDrive cloud-only files get hydrated to a temp dir first, since Word can't open placeholders directly.
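The read-back step implied by those comments (strip the BOM, decode UTF-16 LE, normalize line endings) could look like this. It is a sketch of the described behavior, not the actual convert.py code, and `word_unicode_text_to_str` is an illustrative name.

```python
import codecs


def word_unicode_text_to_str(raw: bytes) -> str:
    """Decode Word's Unicode-text output and normalize CRLF to \\n."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]  # drop the byte-order mark
    text = raw.decode("utf-16-le")
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = codecs.BOM_UTF16_LE + "Hello\r\nwörld\r\n".encode("utf-16-le")
clean = word_unicode_text_to_str(sample)
print(repr(clean))            # 'Hello\nwörld\n'
utf8_bytes = clean.encode("utf-8")  # what would be written to the .md file
```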

Running it manually — the idiot's guide

The autopilot path is "do nothing, get a Teams ping in ~30 min". If you want the file right now:

For any new recording

/transcript-watcher

Detects new MP4s and pulls each transcript into the watched folder.

For one specific meeting

/meeting-transcript

Pass a meeting subject, link, or chat ID. Skips detection entirely.

To force conversion now

/doc-watcher

Don't wait for the 30-min timer; convert any pending .docx files immediately.

The two-command path: /transcript-watcher followed by /doc-watcher takes a freshly-ended meeting all the way to a .md file plus Teams notification in about three minutes. Handy when I just stopped recording and want the text in front of me before the next meeting starts.

What's in this package

/transcript-watcher · transcript-watcher.md · scheduled detector
/meeting-transcript · meeting-transcript.md · Playwright downloader
/doc-watcher · doc-watcher.md · folder watcher + Word COM

Each skill is independently installable. The full prompt body and config schema live in the SkillWorks library — open the matching card to view, copy, or download. Supporting files (convert.py, config.example.json) are linked from each skill's README on the library page.

Adapting this for your own setup

If you have Clawpilot

Drop the three skill folders into ~/.copilot/m-skills/, register each via m_create_skill, then invoke /transcript-watcher. The first-run flow walks you through picking a meeting series, finds the OneDrive Recordings folder and chat ID for you, and creates the automation in a disabled state. Flip it on when you're happy.

If you don't have Clawpilot

The pieces still translate to anything that can run scheduled prompts against an LLM with tool calls — Power Automate + Copilot, a local Node script, a GitHub Actions workflow. Easier still: hand the three skill .md files to your agent of choice (GitHub Copilot, Copilot CoWork, Claude Code, whatever you're running) and ask it to adapt them to your stack. The substance is:

One scheduled Microsoft Graph query against your Recordings folder.
One Playwright script signed into Teams with a persisted profile.
One folder watcher + Word COM converter.
One Teams notification webhook (or whatever channel you prefer).

If your tenant does have the Graph transcript API

You can collapse stages 2 and 3 into a single GET /me/onlineMeetings/{id}/transcripts/{tid}/content call. No Playwright, no .docx round-trip, just the VTT directly. With no .docx in play, the Word conversion in stage 4 gives way to a VTT-to-text step; the Teams ping stays the same. Lucky you.
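If you do get VTT back, something still has to flatten it into plain text for the agent. The sketch below assumes standard WebVTT cues with `<v Speaker>` voice tags; whether your tenant's transcripts use exactly that shape is an assumption, so check a real file before relying on it.

```python
import re

CUE_TIMING = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>(.*?)(</v>)?$")
ANY_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Turn WebVTT cues into 'Speaker: text' lines, dropping timings and headers."""
    out: list[str] = []
    in_cue = False
    for raw_line in vtt.splitlines():
        line = raw_line.strip()
        if not line:
            in_cue = False  # a blank line ends the current cue
            continue
        if CUE_TIMING.match(line):
            in_cue = True
            continue
        if not in_cue:
            continue  # WEBVTT header, cue identifiers, NOTE blocks
        m = VOICE_TAG.match(line)
        if m:
            speaker = m.group(1).strip()
            text = ANY_TAG.sub("", m.group(2)).strip()
            out.append(f"{speaker}: {text}")
        else:
            out.append(ANY_TAG.sub("", line).strip())
    return "\n".join(out)


sample_vtt = """WEBVTT

00:00:03.120 --> 00:00:07.480
<v Jane Doe>Let's start with the dashboard.</v>

00:00:08.000 --> 00:00:10.500
<v Jane Doe>Then the spec.</v>
"""
print(vtt_to_text(sample_vtt))
# Jane Doe: Let's start with the dashboard.
# Jane Doe: Then the spec.
```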

Caveats

Only the meeting organizer gets the recording auto-saved to their OneDrive. If you're an attendee, you won't see the MP4 — but you can still pull the transcript via /meeting-transcript as long as you have chat access.
The Playwright session uses a persistent Edge profile; it has to be signed into Teams once, manually, before the first run.
Recordings can be configured by your tenant admin to expire after N days. The pipeline doesn't extend retention; pull the transcripts you care about into a folder you control.

From recorded Teams meeting to plain-text file, automatically
