# doc-watcher

A Clawpilot skill that watches a folder for new documents and saves a plain-text `.md` copy of each one. Originals are never modified or deleted. It sends a Teams DM when new files are converted.

This is **stage 4 of the [Teams Transcript Pipeline](../teams-transcript-pipeline/)** suite, but it's intentionally generic — point it at any folder of `.docx`, `.doc`, `.docm`, `.rtf`, `.odt`, `.txt`, or `.md` files and it'll convert them to clean UTF-8 Markdown using Word COM.

## Why this exists

I wanted plain text that *looks like what Word renders*, not an XML parse. Word COM is the only approach I've found on Windows that reliably handles every format here (legacy `.doc`, modern `.docx`, `.docm`, `.rtf`, `.odt`) and produces clean Unicode text. It runs headless (`Visible=False`) and only spins up Word for a few seconds per file.
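
For readers unfamiliar with Word COM, here is a minimal sketch of the headless approach described above. It is not the code in `convert.py`; the function name `word_to_text` and the `word_factory` parameter are hypothetical, added so the logic can be exercised without Word installed. The COM calls themselves (`Documents.Open`, `Content.Text`, `Close`, `Quit`) are standard Word object model members reached through `pywin32`.

```python
import os


def word_to_text(path, word_factory=None):
    """Return the plain text Word renders for the document at `path`.

    `word_factory` is a hypothetical hook: any callable returning an object
    that behaves like a Word.Application COM instance.
    """
    if word_factory is None:
        # Imported lazily so this module still loads without pywin32.
        import win32com.client

        def word_factory():
            return win32com.client.DispatchEx("Word.Application")

    word = word_factory()
    word.Visible = False  # keep Word headless
    try:
        # Word expects an absolute path; open read-only to protect originals.
        doc = word.Documents.Open(os.path.abspath(path), ReadOnly=True)
        try:
            # Word separates paragraphs with carriage returns.
            return doc.Content.Text.replace("\r", "\n")
        finally:
            doc.Close(False)  # False = do not save changes
    finally:
        word.Quit()  # always shut the hidden Word instance down
```

The nested `try`/`finally` blocks are there so that a failed open or read does not leave a hidden `WINWORD.EXE` process behind.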

Pairs with `/transcript-watcher` + `/meeting-transcript` upstream to take Teams meeting recordings all the way from "stop recording" to "Markdown in a folder, Teams ping in chat" with no manual clicks.

## Install

1. Copy `doc-watcher.md` into your Clawpilot skills folder, saving it as `SKILL.md`:
   - Windows: `C:\Users\<you>\.copilot\m-skills\doc-watcher\SKILL.md`
   - Or import via Clawpilot Settings → Skills → Add local skill.
2. **Also drop `convert.py` next to `SKILL.md`** in that same folder. The skill shells out to it via Python.
3. (Optional) Drop a starter `config.json` into the same folder; see `config.example.json` for the shape. If you skip this, first-run setup creates one for you.
4. Restart Clawpilot (or reload skills) so the new skill is picked up.

## Use

In a Clawpilot chat:

```
/doc-watcher
```

- **First invocation**: prompts for folder, pattern, output dir, and polling interval, then creates a background automation (enabled by default — disable from the Automations panel if you want to stage it).
- **Subsequent invocations**: runs the converter once on demand. Sends a Teams DM listing any newly converted files.
- **Reconfigure**: ask in natural language — "watch `<new folder>` instead", "change pattern to `*.docx;*.rtf`", "stop watching", "change interval to 15 minutes".
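
A pattern such as `*.docx;*.rtf` reads as a semicolon-separated list of globs. The sketch below shows one plausible way to interpret it; whether `convert.py` splits and matches patterns exactly this way is an assumption, and the helper names `split_patterns` and `matching_files` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path


def split_patterns(pattern):
    """Split 'a;b' into ['a', 'b'], ignoring blanks and stray spaces."""
    return [p.strip() for p in pattern.split(";") if p.strip()]


def matching_files(folder, pattern):
    """Return files directly inside `folder` whose names match any glob."""
    # Lower-case both sides so matching is case-insensitive on every OS.
    globs = [g.lower() for g in split_patterns(pattern)]
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and any(fnmatch(p.name.lower(), g) for g in globs)
    )
```

For example, `matching_files(r"C:\path\to\watch", "*.docx;*.rtf")` would return the `.docx` and `.rtf` files in that folder and skip everything else.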

## Files

- `doc-watcher.md` — the skill definition Clawpilot loads.
- `convert.py` — the converter. Uses Microsoft Word via COM (`pywin32`) so both legacy `.doc` and modern `.docx` work transparently. Also handles `.rtf`, `.odt`, `.docm`, `.txt`, `.md`.
- `config.example.json` — starter config shape.

## Run the converter manually (no Clawpilot needed)

```powershell
python "C:\Users\<you>\.copilot\m-skills\doc-watcher\convert.py" `
  --folder "C:\path\to\watch" `
  --pattern "*.docx" `
  --state  "C:\Users\<you>\.copilot\m-skills\doc-watcher\state.json"
```

Add `--all` to reconvert previously processed files. Use `--out "<dir>"` to write Markdown to a separate folder.
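
If you would rather launch the converter from another Python script than from PowerShell, the same flags can be assembled programmatically. This is a sketch only: `build_command` and `run_converter` are hypothetical helpers, and the meaning of the converter's exit code is not documented here, so the caller should not assume more than "the process ran".

```python
import subprocess
import sys


def build_command(script, folder, pattern, state, out=None, reconvert_all=False):
    """Assemble the argv list for convert.py from the flags shown above."""
    cmd = [sys.executable, script,
           "--folder", folder,
           "--pattern", pattern,
           "--state", state]
    if out:
        cmd += ["--out", out]      # write Markdown to a separate folder
    if reconvert_all:
        cmd.append("--all")        # reconvert previously processed files
    return cmd


def run_converter(**kwargs):
    """Run convert.py once and return the process exit code."""
    return subprocess.run(build_command(**kwargs), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.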

## Requirements

- Python 3.12+ with `pywin32` installed (`pip install pywin32`).
- Microsoft Word installed on the machine (Word runs hidden via COM automation).
- Clawpilot with Microsoft Teams tools (`m_send_teams_message`) and the automation tools (`m_create_automation`, `m_update_automation`) — only required if you want the polling + notification half.
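
A quick preflight check for the first requirement might look like the sketch below. The function name `missing_requirements` and its parameters are hypothetical, and the check only covers the Python version and whether `pywin32` is importable; it does not verify that Word itself is installed.

```python
import importlib.util
import sys


def missing_requirements(version_info=None, has_pywin32=None):
    """Return a list of human-readable problems; empty means all good."""
    if version_info is None:
        version_info = sys.version_info
    if has_pywin32 is None:
        # pywin32 exposes the COM client as the `win32com` package.
        has_pywin32 = importlib.util.find_spec("win32com") is not None

    problems = []
    if tuple(version_info[:2]) < (3, 12):
        problems.append("Python 3.12 or newer is required")
    if not has_pywin32:
        problems.append("pywin32 is not installed (pip install pywin32)")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```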

## License

Provided as-is. No warranty. Do whatever you like with it.
