---
name: "doc-watcher"
description: "Watches a folder for new documents (default *.docx) and saves a plain-text .md copy of each one using Word COM (headless). Originals are never modified or deleted. Sends a Teams DM when new files are converted. On first run, prompts for folder/pattern/interval and creates a background automation. Triggers: '/doc-watcher', 'watch this folder for docx', 'convert docx to markdown automatically'."
---

<!--
CLAWPILOT INSTALL INSTRUCTIONS

If the user drags this file into Clawpilot and asks to install it, register it as a
custom skill by calling m_create_skill with:
  - name: "doc-watcher"
  - description: (the description from the frontmatter above)
  - instructions: (everything in this file BELOW this HTML comment block)

After install, also place convert.py next to SKILL.md in the skill folder
(C:\Users\<you>\.copilot\m-skills\doc-watcher\convert.py). Confirm to the user with:
"Installed the /doc-watcher skill — invoke it to run first-run setup. It will create
config.json and a background automation, then convert any new .docx files in the
watched folder on every tick."
-->

# doc-watcher

Watches a folder for new documents (default `*.docx`) and saves a plain-text `.md` copy of each one. Originals are never modified or deleted.

This is **stage 4 of the [Teams Transcript Pipeline](../teams-transcript-pipeline/)** suite — but it's intentionally generic. Point it at any folder with any glob pattern and it'll convert Word documents to clean UTF-8 markdown using Word COM.

When the user invokes `/doc-watcher`, follow this flow.

## 1. Load config

Config path: `C:\Users\<you>\.copilot\m-skills\doc-watcher\config.json`

Shape:
```json
{
  "folder": "C:\\path\\to\\watch",
  "pattern": "*.docx",
  "out_dir": "",
  "interval_minutes": 30,
  "automation_id": ""
}
```

- `folder` — folder to watch (required)
- `pattern` — one or more globs separated by `;` (default `*.docx`)
- `out_dir` — empty string means write `.md` next to each source file
- `interval_minutes` — polling cadence for the background automation
- `automation_id` — populated after the automation is created

Read `config.json` via the `view` tool. If it does not exist, do first-run setup.
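
For reference, the load-or-setup decision can be sketched in Python as follows. This is a minimal illustration, not part of `convert.py`; the helper name `load_config` and the `DEFAULTS` table are hypothetical, with default values taken from the shape above.

```python
import json
from pathlib import Path

# Defaults mirror the documented shape; "folder" has no default because it is required.
DEFAULTS = {
    "pattern": "*.docx",
    "out_dir": "",
    "interval_minutes": 30,
    "automation_id": "",
}


def load_config(path):
    """Return the merged config dict, or None when first-run setup is needed."""
    config_file = Path(path)
    if not config_file.exists():
        return None  # caller should run first-run setup
    with config_file.open(encoding="utf-8") as fh:
        loaded = json.load(fh)
    if not loaded.get("folder"):
        raise ValueError("config.json is missing the required 'folder' value")
    # Values from the file win over the defaults.
    return {**DEFAULTS, **loaded}
```

A `None` result corresponds to the "config does not exist" branch, which leads to first-run setup.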

## 2. First-run setup (only when config.json is missing)

Use `m_ask_user` to collect:

1. **Folder** to watch.
2. **Pattern** — default `*.docx`. Accept multiple globs joined by `;` (e.g. `*.docx;*.rtf`).
3. **Output folder** — default empty (write `.md` next to originals).
4. **Frequency** — default 30 minutes. Offer 15 / 30 / 60.
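
To illustrate how a `;`-joined pattern such as `*.docx;*.rtf` might be interpreted, here is a small Python sketch. The function names are hypothetical, and the case-insensitive matching is an assumption; the actual behavior of `convert.py` may differ.

```python
from fnmatch import fnmatch


def split_patterns(pattern):
    """Split a ';'-joined pattern string into individual globs, dropping blanks."""
    globs = [part.strip() for part in pattern.split(";")]
    # Fall back to the documented default when nothing usable was supplied.
    return [g for g in globs if g] or ["*.docx"]


def matches_any(filename, pattern):
    """True when the file name matches at least one glob (assumed case-insensitive)."""
    name = filename.lower()
    return any(fnmatch(name, g.lower()) for g in split_patterns(pattern))
```

For example, `matches_any("Notes.DOCX", "*.docx;*.rtf")` is true under these assumptions, while a `.pdf` file is not matched.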

Write the config with the `create` tool. Then create the background automation via `m_create_automation` with the prompt shown in section 5.

Save the returned automation id back into `config.json` as `automation_id`.

## 3. Run the converter

Execute (use the values from `config.json`):

```powershell
python "C:\Users\<you>\.copilot\m-skills\doc-watcher\convert.py" `
  --folder "<folder>" `
  --pattern "<pattern>" `
  --state  "C:\Users\<you>\.copilot\m-skills\doc-watcher\state.json"
```

Add `--out "<out_dir>"` only if `out_dir` is non-empty. Add `--all` only if the user explicitly asks to reconvert everything.
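
The flag rules above can be expressed as a short Python sketch that assembles the argument list from the config. The helper `build_command` and its parameters are hypothetical; only the flags themselves (`--folder`, `--pattern`, `--state`, `--out`, `--all`) come from this document.

```python
from pathlib import Path


def build_command(config, skill_dir, reconvert_all=False):
    """Assemble the convert.py argument list from a loaded config dict."""
    skill_dir = Path(skill_dir)
    cmd = [
        "python", str(skill_dir / "convert.py"),
        "--folder", config["folder"],
        "--pattern", config.get("pattern", "*.docx"),
        "--state", str(skill_dir / "state.json"),
    ]
    # --out is passed only when out_dir is non-empty.
    if config.get("out_dir"):
        cmd += ["--out", config["out_dir"]]
    # --all is passed only on an explicit request to reconvert everything.
    if reconvert_all:
        cmd.append("--all")
    return cmd
```

Building a list rather than a single string avoids quoting problems when the folder path contains spaces.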

The script:
- Hydrates OneDrive cloud-only files by copying them to a temp location.
- Uses Word COM (via `pywin32`) to open `.doc`, `.docx`, `.docm`, `.rtf`, `.odt` and save them as Unicode text. Plain text files (`.txt`, `.md`) pass through.
- Skips files already listed in `state.json` so each run only converts new arrivals.
- Prints a JSON summary to stdout: `processed`, `skipped`, `errors`.
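
The skip-already-converted behavior can be pictured with the Python sketch below. The layout of `state.json` is not specified here, so the `"converted"` key and both function names are assumptions made purely for illustration.

```python
import json
from pathlib import Path


def load_seen(state_path):
    """Return the set of already-converted paths (empty if there is no state yet)."""
    state_file = Path(state_path)
    if not state_file.exists():
        return set()
    with state_file.open(encoding="utf-8") as fh:
        # Assumed layout: {"converted": ["path1", "path2", ...]}
        return set(json.load(fh).get("converted", []))


def select_new(candidates, seen, reconvert_all=False):
    """Keep only files not yet converted, unless a full reconvert was requested."""
    if reconvert_all:
        return list(candidates)
    return [path for path in candidates if path not in seen]
```

With `reconvert_all=True` the state is ignored, which matches the intent of the `--all` flag.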

## 4. Notify

After the converter finishes, parse its JSON output.

**If `processed` is non-empty OR `errors` is non-empty**, ALWAYS send a Teams notification via `m_send_teams_message`. Format:

```
doc-watcher converted N new file(s) in <folder>:
- <filename1>
- <filename2>
(errors, if any, with filename + error)
Sent on your behalf by Clawpilot 🤖
```

Use just the basename of each file, not the full path. Do this whether the run was manual (user invoked `/doc-watcher`) or from the scheduled automation.

If both `processed` and `errors` are empty:
- Manual run: tell the user in chat that nothing new was found.
- Automation run: stay completely silent (no Teams message, no chat output).

For manual runs, also summarize the results in chat.
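
As a rough sketch of the notification rules in this section, the Python below turns the converter's JSON summary into message text. The function name is hypothetical, and the shapes of the entries are assumptions: `processed` is treated as a list of file paths and each `errors` entry as an object with `file` and `error` keys.

```python
import json
import ntpath  # handles Windows-style paths on any platform


def build_notification(summary_json, folder):
    """Return the Teams message text, or None when there is nothing to report."""
    summary = json.loads(summary_json)
    processed = summary.get("processed", [])
    errors = summary.get("errors", [])
    if not processed and not errors:
        return None  # caller stays silent (automation) or reports "nothing new" (manual)
    lines = [f"doc-watcher converted {len(processed)} new file(s) in {folder}:"]
    # Use only the basename of each file, not the full path.
    lines += [f"- {ntpath.basename(path)}" for path in processed]
    # Assumed error shape: {"file": "<path>", "error": "<message>"}
    lines += [f"- {ntpath.basename(e['file'])}: {e['error']}" for e in errors]
    lines.append("Sent on your behalf by Clawpilot 🤖")
    return "\n".join(lines)
```

A `None` return covers the case where both lists are empty, so the caller can decide between staying silent and telling the user that nothing new was found.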

## 5. Automation prompt

When creating or updating the automation, use this prompt:

```
Run the doc-watcher converter silently. Read C:\Users\<you>\.copilot\m-skills\doc-watcher\config.json and execute:
python "C:\Users\<you>\.copilot\m-skills\doc-watcher\convert.py" --folder "<folder>" --pattern "<pattern>" --out "<out_dir>" --state "C:\Users\<you>\.copilot\m-skills\doc-watcher\state.json"
(omit --out if out_dir is empty). Parse the JSON output. If the "processed" list is non-empty OR "errors" is non-empty, call m_send_teams_message with:

"doc-watcher converted N new file(s) in <folder>:
- <basename>
- <basename>
(errors, if any, with filename + error)
Sent on your behalf by Clawpilot 🤖"

If both processed and errors are empty, do nothing and stay silent. Never prompt the user.
```

## 6. Reconfigure

If the user asks to change the folder, pattern, frequency, or output folder, or to stop watching:
- Update `config.json` accordingly.
- If `interval_minutes` changed, call `m_update_automation(id=automation_id, schedule="every <new> minutes")`.
- If the user asks to stop watching, call `m_update_automation(id=automation_id, enabled=false)`.

## Dependencies

Requires Python 3.12+ with `pywin32` and an installed copy of Microsoft Word, which the script drives headlessly (`Visible=False`).
