---
name: "transcript-watcher"
description: "Polls OneDrive Recordings/ for new Teams meeting MP4 files matching watched series name prefixes. On detection, invokes /meeting-transcript to download the .docx. Downstream /doc-watcher converts to .md and notifies. Detection-only; cheap polling, Playwright only when a new recording is found. Triggers: '/transcript-watcher', 'watch a meeting series for transcripts', 'detect new recordings'."
---

<!--
CLAWPILOT INSTALL INSTRUCTIONS

If the user drags this file into Clawpilot and asks to install it, register it as a
custom skill by calling m_create_skill with:
  - name: "transcript-watcher"
  - description: (the description from the frontmatter above)
  - instructions: (everything in this file BELOW this HTML comment block)

After install, confirm to the user with: "Installed the /transcript-watcher skill — run it once to do first-run setup (it will create the config.json and the background automation, both disabled until you flip them on)."
-->

# transcript-watcher

Polls the user's OneDrive `Recordings/` folder for new Teams meeting MP4 files matching one or more watched meeting-series name prefixes. When a new MP4 is detected, hands off to the `/meeting-transcript` skill (Playwright) to download the .docx. The downstream `/doc-watcher` automation then converts the .docx to .md and sends the Teams notification.

This skill is intentionally narrow:

- **It detects new recordings.** It does NOT transcribe, summarize, or notify.
- **Polling is cheap** (one Graph call per cycle). Playwright runs only when there is a new MP4 to extract.
- **State** is the highest `lastModifiedDateTime` seen per watched series, so each MP4 fires extraction exactly once.

## Files

- `config.json` — list of watches, recordings folder id, automation id
- `SKILL.md` — this file

## Config shape

`C:\Users\<you>\.copilot\m-skills\transcript-watcher\config.json`

```json
{
  "recordings_folder_id": "<your-onedrive-recordings-folder-id>",
  "interval_minutes": 30,
  "automation_id": "",
  "watches": [
    {
      "id": "my-meeting-series",
      "display_name": "My Meeting Series",
      "name_prefix": "My Meeting Series",
      "chat_id": "19:meeting_<your-chat-id>@thread.v2",
      "last_seen_modified": "2026-01-01T00:00:00Z"
    }
  ]
}
```

- `recordings_folder_id` — OneDrive folder id for `Documents/Recordings`. Stable per user; reuse across watches.
- `interval_minutes` — polling cadence (15 / 30 / 60).
- `automation_id` — populated after `m_create_automation`.
- `watches[].name_prefix` — the literal filename prefix Stream uses. Format is `<Meeting Subject>-YYYYMMDD_HHMMSSUTC-Meeting Recording.mp4`. Match by `name.startswith(name_prefix + "-")`.
- `watches[].chat_id` — meeting chat id; passed to `/meeting-transcript` so it knows which chat to open.
- `watches[].last_seen_modified` — ISO 8601 UTC timestamp. Only files with `lastModifiedDateTime > last_seen_modified` are treated as new.
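
The two matching rules above (prefix plus `-` delimiter, strictly newer timestamp) can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: `parse_utc` and `is_new_recording` are hypothetical helper names, and the sketch assumes timestamps arrive as ISO 8601 strings with a trailing `Z`, as in the config example.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-01-01T00:00:00Z'."""
    # Older Python versions reject a trailing 'Z' in fromisoformat(), so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def is_new_recording(name: str, modified: str, watch: dict) -> bool:
    """True when the file belongs to the watch and is strictly newer than its state."""
    # The trailing "-" stops "My Meeting Series" from claiming
    # "My Meeting Series Extended-...".
    belongs = name.startswith(watch["name_prefix"] + "-")
    return belongs and parse_utc(modified) > parse_utc(watch["last_seen_modified"])


# Hypothetical watch entry mirroring the config shape above.
watch = {
    "name_prefix": "My Meeting Series",
    "last_seen_modified": "2026-01-01T00:00:00Z",
}
```

Comparing parsed datetimes rather than raw strings is a design choice of this sketch; it avoids surprises if one timestamp carries fractional seconds and the other does not.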

## When the user invokes `/transcript-watcher`

Read `config.json` via `view`. If it does not exist → **first-run setup** (section A). Otherwise → **poll cycle** (section B).

If the user explicitly says "dry run", "test", or "what would fire", run section B with the **dry-run flag** (do NOT invoke `/meeting-transcript`, do NOT update state — only report what would be triggered).

## A. First-run setup

Collect via `m_ask_user`:

1. **Meeting display name** for this watch (e.g. "Weekly Standup"). Use this as the `name_prefix`.
2. **Polling interval** — 15 / 30 / 60 (default 30).

Then auto-derive:

1. **Recordings folder id**: list OneDrive root via `m365_list_files(limit=50)` and find the entry named `Recordings`. If not found in the first page, page until found, or ask the user to provide a path.
2. **Chat id**: `m365_search_chats(query=display_name, limit=5)`. If exactly one result, use it. If multiple, surface them via `m_ask_user` and let the user pick.
3. **Baseline `last_seen_modified`**: list the Recordings folder (`m365_list_files(folderId=recordings_folder_id, orderBy="lastModifiedDateTime desc", limit=20)`), filter to items starting with `name_prefix + "-"`, take the max `modified`. If no matching files exist, set baseline to the current UTC time. This prevents history from re-firing.
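
The baseline rule in step 3 can be sketched as follows. This is an illustrative sketch under assumptions: `baseline_last_seen` is a hypothetical helper, and each listed item is assumed to be a dict with `name` and `modified` keys (the field names this document uses for `m365_list_files` results).

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-01-01T00:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def baseline_last_seen(items: list, name_prefix: str, now: datetime) -> str:
    """Return the initial last_seen_modified value for a new watch."""
    # Keep only files that belong to this series (prefix plus "-" delimiter).
    matching = [
        parse_utc(item["modified"])
        for item in items
        if item["name"].startswith(name_prefix + "-")
    ]
    # Newest existing recording, or "now" when the series has no recordings yet.
    baseline = max(matching) if matching else now
    return baseline.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Because the poll cycle only fires on timestamps strictly greater than the baseline, every recording that already exists at setup time is skipped.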

Write `config.json` via `create`.

Then create the automation via `m_create_automation`:

```
name: transcript-watcher
schedule: every <interval_minutes> minutes
prompt: (the section B prompt verbatim — see "Automation prompt" at the bottom)
enabled: false   # important: leave OFF until the user explicitly enables it
```

Save the returned automation id back into `config.json`.

Tell the user: "Watcher configured but NOT activated. Run `/transcript-watcher` to dry-run, or say `enable transcript watcher` to flip it on."

## B. Poll cycle

Run step 1 once per cycle (a single Graph call covers every watch), then apply steps 2-4 to each `watch` in `config.json.watches`:

1. `m365_list_files(folderId=config.recordings_folder_id, orderBy="lastModifiedDateTime desc", limit=20)`
2. Filter the returned items in memory to those where:
   - `name` starts with `watch.name_prefix + "-"`
   - `mimeType` is `video/mp4` (or filename ends with `.mp4`)
   - `modified` (ISO timestamp) is strictly greater than `watch.last_seen_modified`
3. Sort the matches by `modified` ascending so they are processed oldest-first. If extraction crashes mid-batch, `last_seen_modified` has then only advanced past files that were already handled.
4. For each new MP4:
   - **Dry run:** just report `{watch_id, filename, modified}` and continue. Do NOT invoke the transcript skill. Do NOT update state.
   - **Live run:** Invoke `/meeting-transcript` with the watch's `chat_id` and let it look up the recording start time from the chat thread (see meeting-transcript SKILL.md Step 3). That skill already names the output `<slug>-<date>-<HHMM>.docx` and drops it in the configured transcripts folder, which `/doc-watcher` is already watching, so only the chat id needs to be passed, not a filename.
   - **Live run, after a successful invocation:** set `watch.last_seen_modified = file.modified` and persist `config.json` immediately. Do not batch state updates; writing after each file is what makes the state crash-safe.
5. If no watch has matching new files: stay completely silent (no Teams ping, no chat output). The automation runs every `interval_minutes` minutes and we don't want noise.
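
The per-watch part of the cycle (filter, sort oldest-first, advance state only on success) can be sketched in Python. This is a sketch under stated assumptions, not the skill's implementation: `poll_watch`, `is_mp4`, `extract`, and `persist` are hypothetical names, `extract` stands in for invoking `/meeting-transcript` and returns `True` on success, `persist` stands in for writing `config.json`, and items are assumed to be dicts with `name`, `modified`, and optionally `mimeType`.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-01-01T00:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def is_mp4(item: dict) -> bool:
    """Accept either the MIME type or the file extension, as in step 2."""
    return item.get("mimeType") == "video/mp4" or item["name"].lower().endswith(".mp4")


def poll_watch(items, watch, extract, persist, dry_run=False):
    """Process new MP4s for one watch; return filenames that fired (or would fire)."""
    cutoff = parse_utc(watch["last_seen_modified"])
    prefix = watch["name_prefix"] + "-"
    # Step 2 and 3: filter in memory, then sort oldest-first.
    new_files = sorted(
        (i for i in items
         if i["name"].startswith(prefix) and is_mp4(i) and parse_utc(i["modified"]) > cutoff),
        key=lambda i: parse_utc(i["modified"]),
    )
    fired = []
    for item in new_files:
        if not dry_run:
            if not extract(watch["chat_id"]):
                # Assumption of this sketch: stop at the first failure so state
                # never advances past a file that still needs a retry.
                break
            watch["last_seen_modified"] = item["modified"]
            persist(watch)  # write state after every file, never in a batch
        fired.append(item["name"])
    return fired
```

Stopping at the first failure is one way to honour the "do NOT advance `last_seen_modified`" rule: if a later file succeeded and moved the state forward, the failed file would fall below the cutoff and never be retried.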

## C. Reconfigure

- "add watch <name>" → collect the display name (setup step 1), auto-derive the chat id and baseline for that watch, then append it to `watches[]`.
- "remove watch <id>" → drop that entry.
- "enable transcript watcher" → `m_update_automation(id=automation_id, enabled=true)`.
- "disable transcript watcher" → `m_update_automation(id=automation_id, enabled=false)`.
- "change interval to <n>" → update `interval_minutes` in `config.json`, then `m_update_automation(id=automation_id, schedule="every <n> minutes")`.

## D. Failure modes (handle defensively)

- `/meeting-transcript` extraction fails (Stream not ready, Playwright timeout, transcript not yet processed) → do NOT advance `last_seen_modified`. Report the error in chat (or via Teams DM if running from automation) so the user knows. Next poll will retry.
- Two new MP4s in a single meeting (Teams sometimes splits recordings on reconnect) → process each one. The chat thread's "Meeting started at ..." entries identify each occurrence's start time, so filenames won't collide.
- Multiple watches with colliding `name_prefix` values (e.g. "Sync" matches several series) → require name prefixes that are unique. The skill does not de-collide; the user must pick distinct prefixes.
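
Since the skill does not de-collide prefixes itself, a quick check before saving a new watch can catch the problem early. The sketch below is a hypothetical helper (`colliding_prefixes` is not part of the skill); it assumes the matching rule from the config section, where a file belongs to a watch when its name starts with `name_prefix + "-"`.

```python
from itertools import combinations


def colliding_prefixes(watches: list) -> list:
    """Return pairs of watch ids whose prefixes could both claim the same file."""
    clashes = []
    for a, b in combinations(watches, 2):
        pa = a["name_prefix"] + "-"
        pb = b["name_prefix"] + "-"
        # A filename can start with both patterns only if one pattern
        # is itself a prefix of the other (identical prefixes included).
        if pa.startswith(pb) or pb.startswith(pa):
            clashes.append((a["id"], b["id"]))
    return clashes
```

Under this rule "Sync" and "Sync Weekly" do not collide, because the `-` delimiter separates them, while "Sync" and a series literally named "Sync-Up" do.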

## Automation prompt

When creating the automation in section A, pass this as the prompt:

```
Run /transcript-watcher in poll-cycle mode. Do not prompt the user. If any new MP4 is found, invoke /meeting-transcript with the watch's chat_id for each new file, then persist the updated last_seen_modified to C:\Users\<you>\.copilot\m-skills\transcript-watcher\config.json. If no new files, stay completely silent — no Teams message, no chat output. If /meeting-transcript fails for any file, do NOT advance last_seen_modified for that file and send a single Teams DM via m_send_teams_message: "transcript-watcher: extraction failed for <filename> (<error>). Will retry next cycle. Sent on your behalf by Clawpilot 🤖"
```

## Privacy

Meeting recordings and transcripts are sensitive. This skill never sends transcript content to chat or Teams; it only triggers the existing downstream skills, which already follow the user's notification preferences.
