diff --git a/docs/superpowers/specs/2026-05-01-engine-design.md b/docs/superpowers/specs/2026-05-01-engine-design.md new file mode 100644 index 0000000..72b7700 --- /dev/null +++ b/docs/superpowers/specs/2026-05-01-engine-design.md @@ -0,0 +1,651 @@ +# Engine — WinDivert + SOCKS5 transparent proxy for Discord + +**Status**: design accepted 2026-05-01. +**Replaces**: stub `StartEngine`/`StopEngine` in `internal/gui/app.go` that just toggle a flag. +**Implements**: Phase 2 from `docs/planning/cuddly-baking-taco.md`. + +## Why + +The checker proves the upstream SOCKS5 proxy works. The engine is what +actually routes Discord's traffic through it. Without the engine, every +diagnostic in the world is theatre — the GUI just sits there saying +"Active" while Discord still talks direct to discord.com. Phase 2 turns +that "Active" state into reality: kernel-level packet capture (WinDivert), +NAT-style TCP redirect to a loopback listener, SOCKS5 UDP ASSOCIATE for +voice, and a polished lifecycle so the user can install once, click +"autostart at login", and forget the thing exists until Discord stops +working — at which point the tray icon turns yellow and explains why. + +## Architecture decisions (locked-in 2026-05-01) + +| # | Decision | Rationale | +|---|---|---| +| **A** | GUI-only single-process; no Windows service | Friends-and-family Windows-PC, Discord runs only when user is logged in. Service mode is overengineering for v1; can be added in v0.4 if a power user asks. | +| **B1** | UAC prompt at every launch; no scheduled-task trampoline | User chose simplicity over polish. Each `drover.exe` invocation re-elevates if not admin. Autostart via `HKCU\...\Run` triggers the same prompt at login. | +| **C1** | No DPI bypass (no fake QUIC injection) | Start with the simplest pipeline that works. If a friend reports voice not working on a DPI-active provider, add C2/C3 in v0.4. 
| +| **D1** | Window X = hide-to-tray + first-time toast; quit only via tray menu | Industry-standard (Steam, Discord, Telegram). One-shot toast prevents the "where did it go?" surprise. | +| **E3** | Contextual recovery: driver-loss → 1 reopen retry → fail-stop; proxy-loss → infinite exp-backoff (Reconnecting state); panic → fail-stop with crash dump; sleep/resume → graceful pause/resume | Different failure classes need different responses. Aggressive auto-restart on every error masks bugs; honest fail-stop on every error annoys the user during transient network blips. | + +## High-level architecture + +``` + ┌─────────────────────────────────────┐ + │ drover.exe (single binary) │ + │ │ + │ ┌──────────────┐ ┌──────────────┐ │ + │ │ Wails GUI │ │ systray │ │ + │ └──────┬───────┘ └──────┬───────┘ │ + │ └───────┬────────┘ │ + │ ┌─────────▼──────────┐ │ + │ │ Engine │ │ + │ │ state machine │ │ + │ │ Idle / Starting / │ │ + │ │ Active / Reconn / │ │ + │ │ Failed │ │ + │ └─────────┬──────────┘ │ + │ ┌─────────┼─────────────┐ │ + │ ▼ ▼ ▼ │ + │ ┌──────┐ ┌────────┐ ┌──────────┐ │ + │ │divert│ │redirect│ │ procscan │ │ + │ │ pkt │ │ TCP+UDP│ │ (2s tick)│ │ + │ └──┬───┘ └───┬────┘ └────┬─────┘ │ + │ ▼ ▼ │ │ + │ WinDivert socks5 │ │ + │ .sys client │ │ + └──────────────────────────────┼──────┘ + │ + ┌────────────┐ ┌─────────────▼───┐ + │ kernel │ │ upstream SOCKS5 │ + │ packet cap │ │ (mihomo) │ + └────────────┘ └─────────────────┘ +``` + +## File layout + +``` +cmd/drover/ + main.go existing — extend with engine startup, single-instance check + uac_windows.go new — IsAdmin, ReElevate + console_windows.go existing + autoupdate_windows.go existing + +internal/engine/ + engine.go new — orchestration, state machine, lifecycle + state.go new — Idle/Starting/Active/Reconnecting/Failed enum + transitions + recovery.go new — failure classifier → action mapper + health.go new — heartbeat timer, traffic detector + power_windows.go new — WM_POWERBROADCAST listener (sleep/resume) + 
+internal/divert/ + divert.go new — WinDivert handle wrapper + filter.go new — filter expression builder + packet.go new — IPv4 + TCP/UDP parse + checksum recompute + installer.go new — extract embedded WinDivert.sys/.dll on first run + divert_arm64.go new — stub returning "ARM64 not supported" + +internal/socks5/ NEW — production client (separate from internal/checker/socks5.go) + client.go new — TCP CONNECT + greet/auth + udp.go new — UDP ASSOCIATE + encapsulate/decapsulate + pool.go new — control-channel pool (deferred to P2.5 if needed) + +internal/redirect/ + tcp.go new — NAT-loopback redirect listener + per-flow pump + udp.go new — per-flow UDP tracker + encap/decap + +internal/procscan/ + procscan.go new — Toolhelp32 snapshot, periodic PID resolver + +internal/tray/ + tray.go new — getlantern/systray icon + menu + icons.go new — embed idle/active/reconnecting/error ICOs + +internal/autostart/ + autostart_windows.go new — HKCU\...\Run registry toggle + +internal/single/ + single_windows.go new — named mutex + activation pipe + +internal/config/ + config.go new — TOML schema + defaults + loader.go new — load/save with file lock + watcher.go new — fsnotify hot-reload + +internal/gui/ + app.go existing — extend with engine bindings + frontend/... 
existing — wire engine controls + autostart checkbox + +third_party/windivert/ existing — WinDivert64.sys, WinDivert.dll, LICENSE-LGPL +third_party/icons/ new — tray/{idle,active,reconnecting,error}.ico +``` + +## Engine state machine + +``` + ┌────────┐ + │ Idle │ ◄────────────────── (initial) + └────┬───┘ + │ user clicks "Start engine" + ▼ + ┌────────────┐ + ┌──────│ Starting │── any error ───┐ + │ └─────┬──────┘ │ + │ │ all checks ok │ + │ ▼ │ + │ ┌────────────┐ │ + │ │ Active │ ◄─── recover ─┐ │ + │ └────┬───────┘ │ + │ │ proxy lost / SOCKS5 │ + │ │ control channels died │ + │ ▼ │ + │ ┌─────────────┐ │ + │ │Reconnecting │── 5 min cap ──┐ │ + │ └────┬────────┘ │ + │ │ recovered │ + │ ▼ │ + │ back to Active │ + │ │ + │ Stop button ─►───────────────────┐│ + │ ▼▼ + │ ┌────────┐ + └──── Stop ───────────────────►│ Failed │ + └────┬───┘ + │ user clicks Retry + ▼ + (back to Starting) +``` + +States visible to GUI as `EngineStatus`: +- `Idle` — engine off, tray icon grey, GUI shows "Start" button +- `Starting` — handle being opened, procscan running, health-check; tray yellow with spin +- `Active` — packets flowing; tray green; live stats updating +- `Reconnecting` — proxy unreachable, exponential backoff in progress; tray yellow; "Reconnecting (3rd attempt)" +- `Failed` — driver lost twice OR panic OR Reconnecting hit 5 min cap. Tray red. GUI shows error message + Retry button. 
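
The transitions above can be sketched as a small table-driven guard. This is only an illustration, not the planned `internal/engine/state.go`: `CanTransition` and `Machine` are hypothetical names, and the Stop edges (Starting/Active/Reconnecting back to Idle) are an assumption, since the diagram does not make the Stop target unambiguous.

```go
package main

import (
	"fmt"
	"sync"
)

// EngineStatus mirrors the five states listed in the spec.
type EngineStatus int

const (
	Idle EngineStatus = iota
	Starting
	Active
	Reconnecting
	Failed
)

func (s EngineStatus) String() string {
	names := [...]string{"Idle", "Starting", "Active", "Reconnecting", "Failed"}
	if s < 0 || int(s) >= len(names) {
		return "Unknown"
	}
	return names[s]
}

// allowed lists the edges described in the state list. The "-> Idle" entries
// model the Stop button and are an ASSUMPTION of this sketch.
var allowed = map[EngineStatus][]EngineStatus{
	Idle:         {Starting},
	Starting:     {Active, Failed, Idle},
	Active:       {Reconnecting, Failed, Idle},
	Reconnecting: {Active, Failed, Idle},
	Failed:       {Starting}, // Retry button
}

// CanTransition reports whether from -> to is one of the allowed edges.
func CanTransition(from, to EngineStatus) bool {
	for _, t := range allowed[from] {
		if t == to {
			return true
		}
	}
	return false
}

// Machine is a mutex-guarded holder for the current state (hypothetical type).
type Machine struct {
	mu  sync.Mutex
	cur EngineStatus // zero value is Idle
}

// Transition moves to the next state or returns an error for an illegal edge.
func (m *Machine) Transition(to EngineStatus) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !CanTransition(m.cur, to) {
		return fmt.Errorf("illegal transition %s -> %s", m.cur, to)
	}
	m.cur = to
	return nil
}

// Status returns the current state.
func (m *Machine) Status() EngineStatus {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cur
}

func main() {
	var m Machine
	for _, next := range []EngineStatus{Starting, Active, Reconnecting, Active} {
		if err := m.Transition(next); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println("now", m.Status())
	}
	// Active cannot jump straight back to Starting, so this reports an error.
	fmt.Println(m.Transition(Starting))
}
```

Keeping the edges in one table means the GUI, the tray, and the recovery code can all ask the same question before acting, and a unit test can enumerate every pair.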
+ +## E3 recovery rules (failure classifier) + +```go +// internal/engine/recovery.go + +type FailureClass int +const ( + ClassDriverLost FailureClass = iota // WinDivert handle invalid, ERROR_INVALID_HANDLE on Recv + ClassDriverGone // WinDivertOpen returns ERROR_FILE_NOT_FOUND or similar + ClassProxyUnreachable // SOCKS5 control TCP connection rejected/timeout + ClassPanic // recover() in goroutine + ClassSleep // WM_POWERBROADCAST suspend + ClassResume // WM_POWERBROADCAST resume + ClassFatal // anything we can't classify +) + +type Action int +const ( + ActionRetryOnce Action = iota // sleep 2s, reopen, if fails again → Failed + ActionExpBackoff // 1s → 5s → 30s cap, infinite, max 5min cumulative + ActionFailStop // straight to Failed, write crash dump + ActionPause // drain in-flight, close sockets, transition to Reconnecting + ActionResume // wait 5s, reopen handle, transition to Active +) + +func ClassifyFailure(err error, class FailureClass) Action +``` + +| Class | Action | UI feedback | +|---|---|---| +| `DriverLost` | RetryOnce | Status="reopening driver" | +| `DriverGone` | FailStop | "Driver missing — reinstall Drover" | +| `ProxyUnreachable` | ExpBackoff | "Reconnecting (Nth attempt)…" | +| `Panic` | FailStop | "Engine crashed — log saved to %PROGRAMDATA%\\Drover\\logs\\crash-*.txt" | +| `Sleep` | Pause | "Paused (system sleep)" | +| `Resume` | Resume | "Resuming…" then back to Active | + +**Health-check before Start engine**: GUI's Start button first runs `internal/checker.Run` with a reduced subset (tcp + greet + udp tests, 2s budget, no voice-quality). If any fails, the engine doesn't start and the GUI shows what failed. Prevents the "I clicked Start but Discord still doesn't work" mystery. + +**Heartbeat timer**: every 5s, sample `(rxBytes_now - rxBytes_5sAgo) > 0`. If false for 30s while Active and procscan reports Discord PIDs > 0, set status=`Active (no traffic)` (informational sub-state, tray green→yellow but state machine stays in Active). 
User sees this and can investigate (Discord might just be idle). + +**Crash dumps**: panic recover in any engine goroutine writes `%PROGRAMDATA%\Drover\logs\crash-YYYYMMDD-HHMMSS.txt` with full stack + goroutine dump + version. Then transitions to Failed. + +## WinDivert layer + +### Filter expression (rebuilt on PID list change) + +``` +outbound and (tcp or udp) and ip + and (processId == 12345 or processId == 67890 or ...) + and processId != own_pid + and ip.DstAddr != upstream_proxy_ip + and not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255) + and not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255) + and not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255) +``` + +Notes: +- `ip` (IPv4) only — no `ipv6` clause. Discord client falls back to v4 in ~150ms via Happy Eyeballs. +- `processId != own_pid` is critical — without it our own SOCKS5 traffic to upstream gets caught and infinite-looped. +- Multicast/loopback/link-local explicitly excluded (Discord never talks to those, but extra safety). + +If the upstream proxy IP cannot be resolved at engine start, we fail-stop with a clear message — we cannot build a correct filter without it. + +### Library choice + +Use `github.com/imgk/divert-go` v0.1.0 (existing dep proposal — verify it is still maintained when implementing P2.1). If unmaintained / broken, write thin syscall bindings directly — WinDivert C API is small (~6 functions used). + +### Driver lifecycle + +1. **First run**: extract embedded `WinDivert64.sys` + `WinDivert.dll` from Go `embed.FS` into `%PROGRAMDATA%\Drover\windivert\`. SHA256-verify against expected hashes (compiled in at build time). +2. **Open handle**: `WinDivertOpen(filter, layer=NETWORK, priority=0, flags=0)`. The driver auto-installs as a Windows service named "WinDivert" on first open. +3. **Driver remains installed across reboots** — we don't uninstall on Stop. Uninstaller (Inno Setup) explicitly does `sc stop WinDivert && sc delete WinDivert` on uninstall. 
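
The SHA256 check in step 1 might look roughly like the following. It is a sketch under stated assumptions: `verifySHA256` and `verifyAll` are hypothetical helper names, the file set is passed in as plain byte slices rather than read from `embed.FS`, and the demo hash is that of the string `abc`, not of any real WinDivert binary.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// verifySHA256 returns nil when data hashes to wantHex (a 64-char hex string).
func verifySHA256(data []byte, wantHex string) error {
	want, err := hex.DecodeString(wantHex)
	if err != nil || len(want) != sha256.Size {
		return fmt.Errorf("expected hash %q is not a 64-char hex SHA-256", wantHex)
	}
	got := sha256.Sum256(data)
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return fmt.Errorf("sha256 mismatch: got %x", got)
	}
	return nil
}

// verifyAll checks every file named in expected; a missing file is an error.
// Both maps are keyed by file name (hypothetical layout for this sketch).
func verifyAll(files map[string][]byte, expected map[string]string) error {
	for name, want := range expected {
		data, ok := files[name]
		if !ok {
			return fmt.Errorf("%s: missing from embedded set", name)
		}
		if err := verifySHA256(data, want); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
	}
	return nil
}

func main() {
	// Demo data only: "abc" stands in for the embedded driver bytes.
	files := map[string][]byte{"demo.bin": []byte("abc")}
	expected := map[string]string{
		"demo.bin": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
	}
	if err := verifyAll(files, expected); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("all files verified")
}
```

Running the same check on the bytes read back from `%PROGRAMDATA%` after extraction, and not only on the embedded copy, would be one way to tell the D-3 and D-4 cases apart.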
+ +### Driver edge cases (D-series in matrix) + +- **D-1: not installed** → embedded copy + auto-install on WinDivertOpen. +- **D-2: old v1.x** (zapret legacy) → `WinDivertOpen` returns `ERROR_DRIVER_FAILED_PRIOR_UNLOAD`. Detect: query service "WinDivert" via `OpenServiceW` + `QueryServiceStatusEx` to read binary path → check version resource. Show "Outdated WinDivert detected from another tool. Stop the other tool and reboot." +- **D-3: corrupted .sys** → SHA256 mismatch on extract. Reinstall path (delete + recopy + retry). +- **D-4: AV quarantine** → embedded bytes don't match expected → show specific error: "Antivirus may have quarantined WinDivert64.sys. Add `%PROGRAMDATA%\Drover\` to your AV exclusions and restart Drover." +- **D-5: reboot pending** → install successful but service not started → show "Reboot required to activate driver" with no retry button. +- **D-7: ARM64** → `runtime.GOARCH` check at startup; on ARM64 show "Drover requires x86-64 Windows. WinDivert does not support ARM64." + +## TCP redirect (NAT-loopback) + +### Mechanism + +1. On engine start, bind a TCP listener on `127.0.0.1:0` (OS picks unused port). Save the port number. +2. WinDivert sees a new SYN from `Discord.exe → real_target_ip:real_target_port`. Engine: + a. Modifies the IP header: `dst_addr = 127.0.0.1`, `dst_port = listener_port`. Stores mapping `(src_port → real_target_ip:port)` in a `sync.Map` with TTL 30 min. + b. Recomputes IP + TCP checksums. + c. Reinjects via `WinDivertSend` with direction=outbound. The kernel routes to loopback because dst is now 127.0.0.1. +3. Listener `accept()` returns a conn from `127.0.0.1:src_port`. Engine looks up mapping by `src_port`, finds real_target. +4. Engine opens fresh SOCKS5 control TCP to upstream, does greet + (auth if config) + CONNECT to real_target_ip:port. +5. Once SOCKS5 returns REP=00, `io.Copy` pumps bytes both directions until EOF on either side. +6. Conn close → drop mapping. 
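
Steps 2a and 2b above can be sketched on a raw IPv4+TCP buffer as follows. This is an illustration only: `rewriteDst`, `samplePacket`, and `checksumsValid` are hypothetical helpers, the sample addresses are documentation values, and a real `internal/divert/packet.go` may well delegate checksums to WinDivert's own helper instead.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sum16 adds b to sum as big-endian 16-bit words (odd trailing byte is padded).
func sum16(b []byte, sum uint32) uint32 {
	for len(b) >= 2 {
		sum += uint32(binary.BigEndian.Uint16(b))
		b = b[2:]
	}
	if len(b) == 1 {
		sum += uint32(b[0]) << 8
	}
	return sum
}

// fold applies end-around carry and returns the one's complement.
func fold(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// rewriteDst points an IPv4+TCP packet at dst:port in place, recomputes both
// checksums, and returns the original destination for the src_port mapping.
func rewriteDst(pkt []byte, dst [4]byte, port uint16) (origIP [4]byte, origPort uint16, err error) {
	if len(pkt) < 40 || pkt[0]>>4 != 4 || pkt[9] != 6 {
		return origIP, 0, errors.New("not an IPv4 TCP packet")
	}
	ihl := int(pkt[0]&0x0f) * 4
	total := int(binary.BigEndian.Uint16(pkt[2:4]))
	if ihl < 20 || total > len(pkt) || total < ihl+20 {
		return origIP, 0, errors.New("inconsistent header lengths")
	}
	tcp := pkt[ihl:total]
	copy(origIP[:], pkt[16:20])
	origPort = binary.BigEndian.Uint16(tcp[2:4])

	copy(pkt[16:20], dst[:])
	binary.BigEndian.PutUint16(tcp[2:4], port)

	// IPv4 header checksum: computed with the checksum field zeroed.
	pkt[10], pkt[11] = 0, 0
	binary.BigEndian.PutUint16(pkt[10:12], fold(sum16(pkt[:ihl], 0)))

	// TCP checksum: pseudo-header (src, dst, protocol 6, TCP length) + segment.
	tcp[16], tcp[17] = 0, 0
	pseudo := sum16(pkt[12:20], 0) + 6 + uint32(len(tcp))
	binary.BigEndian.PutUint16(tcp[16:18], fold(sum16(tcp, pseudo)))
	return origIP, origPort, nil
}

// checksumsValid re-sums header and segment; a valid checksum folds to zero.
func checksumsValid(pkt []byte) bool {
	ihl := int(pkt[0]&0x0f) * 4
	tcp := pkt[ihl:int(binary.BigEndian.Uint16(pkt[2:4]))]
	pseudo := sum16(pkt[12:20], 0) + 6 + uint32(len(tcp))
	return fold(sum16(pkt[:ihl], 0)) == 0 && fold(sum16(tcp, pseudo)) == 0
}

// samplePacket builds a bare SYN from 192.168.1.10:50000 to 203.0.113.7:443.
func samplePacket() []byte {
	pkt := make([]byte, 40)
	pkt[0] = 0x45                                 // version 4, IHL 5
	binary.BigEndian.PutUint16(pkt[2:4], 40)      // total length
	pkt[8], pkt[9] = 64, 6                        // TTL, protocol TCP
	copy(pkt[12:16], []byte{192, 168, 1, 10})     // source address
	copy(pkt[16:20], []byte{203, 0, 113, 7})      // destination address
	binary.BigEndian.PutUint16(pkt[20:22], 50000) // source port
	binary.BigEndian.PutUint16(pkt[22:24], 443)   // destination port
	pkt[32], pkt[33] = 5<<4, 0x02                 // data offset 5, SYN flag
	return pkt
}

func main() {
	pkt := samplePacket()
	ip, port, err := rewriteDst(pkt, [4]byte{127, 0, 0, 1}, 54321)
	if err != nil {
		fmt.Println("rewrite failed:", err)
		return
	}
	fmt.Println("store mapping for src port 50000:", ip, port)
	fmt.Println("checksums valid:", checksumsValid(pkt))
}
```

Returning the original destination from the same call keeps the rewrite and the mapping insert in one place, so a packet is never redirected without its real target being recorded.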
+ +### TCP edge cases + +- **T-1: listener bind fails** → fail-stop "could not bind loopback listener". Should never happen (random unused port). +- **T-2: 100+ concurrent flows** — sync.Map scales fine. Bound only by Discord's TCP usage (typically 50). +- **T-3: TCP retransmits** — handled by OS at both sides of the loopback. +- **T-4: IPv6** — dropped at filter level. Discord falls back to v4. +- **T-5: half-closed** — `io.Copy` returns on EOF in one direction; we close the other side via `defer conn.Close()`. +- **T-6: mapping leak** if conn never properly closes — TTL 30min sweeper goroutine deletes stale entries. + +## UDP redirect (SOCKS5 UDP ASSOCIATE) + +### Mechanism + +1. WinDivert sees outbound UDP from `Discord.exe:src_port → real_target_ip:port`. Engine: + a. Looks up mapping by `(src_ip, src_port, real_target_ip, real_target_port)`. If absent: + b. **Open new SOCKS5 control TCP** to upstream. Greet + (auth) + UDP ASSOCIATE. + c. Receive relay endpoint `(relay_ip, relay_port)` — if BND.ADDR is `0.0.0.0` substitute `upstream_proxy_ip`. + d. Open client-side UDP socket on `127.0.0.1:0`. Save mapping `flow_id → {control_tcp, relay, client_udp}`. +2. **Outbound packet path**: encap with SOCKS5 UDP header `00 00 | 00 | ATYP=01 | DST_IP(4) | DST_PORT(2) | DATA`. Send via `client_udp.WriteTo(packet, relay)`. Don't reinject the original packet — drop it (we sent the encapsulated version through the relay). +3. **Inbound packet path** (separate goroutine per flow): `client_udp.ReadFrom(buf)` → strip 10-byte SOCKS5 header → fabricate an IPv4+UDP packet with `src=real_target_ip:port, dst=Discord_src_ip:src_port`, recompute checksums → `WinDivertSend` direction=inbound. Discord sees a normal reply from real_target. +4. Idle TTL 5 min: any flow with no packets for 5 min → close control_tcp + client_udp + remove mapping. + +### UDP edge cases + +- **U-1**: each flow gets its own control TCP. No pool in v1 (overhead is ~5KB per flow, fine for ~10 active flows). 
+- **U-2: idle leak** → 5min TTL. +- **U-3: Discord changes voice region** mid-call → old flow goes idle (5min TTL), new flow opens. Brief glitch. +- **U-4: UDP fragments** → RFC 1928 makes SOCKS5 fragmentation (the FRAG field) optional and we don't implement it. Drop. Discord packets are typically <1500 bytes; fragmentation rare. +- **U-5: control TCP dies** → next packet detects via `Write` error → close mapping → next-next packet opens fresh control. Audio glitch ~2-3s. + +## Process scanning + +### Mechanism + +`internal/procscan` runs every 2 seconds: +1. `CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)` → enumerate via `Process32First`/`Process32Next`. Microseconds. +2. Filter by `szExeFile` against config `targets.processes` (case-insensitive on Windows). +3. Diff vs previous PID set. If different → notify engine to rebuild filter expression and reopen WinDivert handle. + +### Race: Discord starts up to 2s before procscan catches it + +Mitigation: at engine `Start`, do **synchronous initial scan** before opening WinDivert handle. After that, the periodic 2s tick handles ongoing changes. + +### Process edge cases + +- **P-1: Discord PID changes** → 2s scan + 50ms reopen gap with direct traffic. Acceptable. +- **P-2: multiple Discord variants**: default config includes `Discord.exe`, `DiscordCanary.exe`, `DiscordPTB.exe`, `Update.exe`. Vesktop **opt-in** via config (not default). +- **P-3: Update.exe** (Discord's updater) included in default — it downloads patches via HTTP and we want those proxied too. +- **P-5: PID re-use** (Discord exits, Chrome takes the PID before next scan) → 2s window where Chrome packets get proxied. Cosmetic, low-impact. + +## Self-loop protection + +The engine itself opens TCP/UDP connections to the upstream proxy. Without protection, the WinDivert filter would catch our own packets, encapsulate them in another SOCKS5 layer, and loop infinitely within seconds. + +Three layers of defense: + +1. `processId != own_pid` in the filter expression. +2. 
`ip.DstAddr != upstream_proxy_ip` (resolved once at engine start; if upstream uses DDNS we re-resolve every 30s of failed reconnects). +3. Listener and SOCKS5 client always bind to `127.0.0.1` — even if filter leaks, loopback traffic is excluded by `not (ip.DstAddr >= 127.0.0.0 ...)`. + +## UAC + autostart (B1) + +### Elevation + +`cmd/drover/main.go` startup sequence: + +```go +func main() { + // 1. AttachConsole for CLI compatibility (existing) + attachConsole() + + // 2. Single-instance check (mutex). If second instance, send "show" to first and exit. + if !single.AcquireMutex() { + single.ActivateExistingInstance() + os.Exit(0) + } + + // 3. Parse Cobra commands. CLI sub-commands like `--check` and `--version` don't need admin + // and can run as user. The default GUI mode requires admin for WinDivert. + if cmdNeedsAdmin() && !uac.IsAdmin() { + uac.ReElevate(os.Args[1:]) // ShellExecute("runas", ...) + exit + os.Exit(0) + } + + // 4. Auto-update check (existing). Replace exe + relaunch if needed. + autoUpdateOnStartup() + + // 5. Boot Wails GUI + engine. + gui.Run(Version) +} +``` + +`uac.ReElevate` uses `ShellExecuteW` with `lpVerb="runas"`. If user cancels UAC, `ShellExecute` returns `SE_ERR_ACCESSDENIED` → we exit cleanly without an error dialog (the user already knows they cancelled). + +### Autostart + +Implemented via `HKCU\Software\Microsoft\Windows\CurrentVersion\Run\DroverGo`: +- Value type: REG_SZ, value: full path to `drover.exe` with no args +- Set on toggle ON, deleted on toggle OFF +- GUI Settings tab has a checkbox "Launch at Windows login" that reads/writes this key + +**Edge case A-5**: User disables autostart via Task Manager → Startup Apps. Windows writes a `Disabled` mark in `HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run`. On GUI mount we check both keys; if Disabled → checkbox shown unchecked (user wins). + +**Edge case A-6**: Stale path (drover.exe was moved). 
On every launch we re-write the key value to `os.Executable()` if autostart is enabled. Self-healing. + +## Tray + window (D1) + +### Tray icon (4 ICO files embedded) + +| State | Icon | When shown | +|---|---|---| +| `idle` | grey | Engine not running | +| `active` | green | Engine running, traffic flowing | +| `reconnecting` | yellow | Reconnecting state OR no-traffic-detected | +| `error` | red | Failed state | + +### Tray menu (right-click) + +``` +[●] Active · 2h 14m · ↑ 142 KB/s ↓ 1.2 MB/s [disabled status row, dynamic] +───────────────────────────────────── +[⏸] Stop proxying [primary action, contextual] +[🔍] Run check [opens window + auto-runs check] +───────────────────────────────────── +[🪟] Show window [hidden when window is visible] +[📁] Open log file +───────────────────────────────────── +[🔄] Check for updates +[ℹ] About +───────────────────────────────────── +[✕] Quit +``` + +The status row is updated every 1s while engine is running. + +### Click behaviors + +- Single-click tray icon → toggle window visibility +- Double-click tray icon → open window (no toggle, always show) +- X on window title bar → hide to tray (D1) + - First-time only: toast "Drover is minimized to the tray. The engine keeps running. To close it completely, use the tray menu → Quit." Track via `config.ui.shown_tray_toast = true`. +- Quit from tray menu → graceful engine stop → exit cleanly + +### Library + +`github.com/getlantern/systray`. Stable on Win10/11 modulo the explorer-restart edge case which the library handles internally. + +## Single-instance enforcement + +Mutex name: `Global\DroverGoInstance-` followed by `installID`, where `installID = SHA256(os.Executable())[:16]`. This way: +- Installed copy at `C:\Program Files\Drover\drover.exe` and a portable copy at `D:\portable\drover.exe` get different mutexes — both can run. +- Two simultaneous launches of the same install fight over the mutex; second loses. + +Activation pipe: `\\.\pipe\drover-gui-`. 
Second instance opens it, writes `{"action":"show"}`, closes. First instance's listener goroutine pops the window to foreground. + +If first instance crashes without cleanup → mutex disappears at process death (kernel handle table cleanup). Next launch acquires normally. + +## Sleep/resume handling + +`WM_POWERBROADCAST` listener via Windows message loop in a dedicated goroutine. Uses `RegisterPowerSettingNotification` for fine-grained events. + +| Event | Action | +|---|---| +| `PBT_APMSUSPEND` | Engine: drain in-flight packets (give 200ms), close all SOCKS5 control TCPs, close WinDivert handle, set status="paused (sleep)" | +| `PBT_APMRESUMEAUTOMATIC` or `PBT_APMRESUMESUSPEND` | Wait 5s for network reconnect (poll `GetIpForwardTable2` for default route presence), reopen WinDivert handle, run health-check, transition Active | + +## Stats counters + +Atomic counters in `internal/engine/stats.go`: +- `bytesIn uint64` — bytes received from upstream (decapsulated UDP + TCP `io.Copy` returns) +- `bytesOut uint64` — bytes sent to upstream +- `tcpFlowsActive int32` — current count of open TCP redirects +- `udpFlowsActive int32` — current count of open UDP flows +- `startedAt time.Time` — engine start time (for uptime) + +Per-flow counters discarded on flow close (no aggregation needed for v1). + +Tray status row updates from these every 1s. GUI live stats panel does the same via Wails event `stats:update` (existing path). + +Lifetime totals persisted to `%PROGRAMDATA%\Drover\stats.json` every 60s and on Stop. + +## Config schema (TOML) + +`%APPDATA%\Drover\config.toml`: + +```toml +# Drover-Go config — auto-managed by GUI; manual edits hot-reload via fsnotify. 
+ +version = 1 + +[proxy] +host = "95.165.72.59" +port = 12334 +auth = false +login = "" +password = "" +udp_associate_timeout = "5s" +tcp_connect_timeout = "10s" + +[targets] +processes = ["Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"] +include_vesktop = false + +[skip] +# CIDR ranges to never proxy. Local + link-local always implicitly skipped at filter level. +extra_skip_cidrs = [] +multicast = true + +[ui] +log_level = "info" +log_max_mb = 10 +log_backups = 3 +tray_icon = true +auto_start = false # mirror of HKCU\...\Run +shown_tray_toast = false # one-shot first-close toast tracking +theme = "dark" # dark | light | auto + +[update] +check_on_startup = true +forgejo_repo = "git.okcu.io/root/drover-go" + +[engine] +heartbeat_interval = "5s" +no_traffic_warn_after = "30s" +reconnect_backoff_initial = "1s" +reconnect_backoff_max = "30s" +reconnect_total_cap = "5m" +``` + +Edge cases: +- **M-4 corrupted TOML** → log warning + use defaults + GUI shows banner "Config error line N — running with defaults". +- **M-7 hot-reload** → fsnotify on the file. On change: re-parse → if proxy section changed → engine restart (Stop → wait clean → Start). Other sections apply live. +- **Config migration** v1→v2 handled by `version` field; missing version assumes 1. + +## Edge case matrix (full) + +This is the master list. Every row must have a corresponding test or explicit "verified manually" note in the implementation plan. 
+ +| # | Edge case | Mitigation | Test | +|---|---|---|---| +| **D-1** | WinDivert.sys not installed | Embed binary, copy to %PROGRAMDATA%, WinDivertOpen auto-loads | manual: clean Win11 VM | +| **D-2** | Old WinDivert v1.x present (zapret legacy) | Service version query → "remove old version first" error | manual: install zapret first, verify error | +| **D-3** | Driver corrupted | SHA256 verify on extract → reinstall flow with progress | unit test: SHA256 mismatch path | +| **D-4** | AV quarantines our embedded .sys | Specific AV-friendly error message + README link | manual: Defender enabled + first run | +| **D-5** | Reboot pending after install | Show "Reboot to activate driver" | manual: trigger via DISM | +| **D-7** | ARM64 Windows | Detect at startup, refuse install | unit: GOARCH=arm64 build returns expected error | +| **P-1** | Discord PID changes | 2s procscan + filter rebuild | integration: kill+restart Discord, verify continuity | +| **P-3** | Update.exe traffic | Default list includes it | integration: trigger Discord update, verify Update.exe traffic proxied | +| **P-5** | PID re-use | Cosmetic 2s window | accept | +| **L-1** | Self-loop (drover's own SOCKS5 traffic) | Filter excludes own_pid + upstream IP | unit: filter expression builder verifies own PID in output | +| **T-4** | IPv6 Discord targets | Drop at filter level; Happy Eyeballs falls back | manual: verify with `netsh interface ipv6 set route ::/0 disabled` | +| **T-6** | TCP mapping leak | 30min TTL cleanup | unit: TTL sweeper test | +| **U-2** | Idle UDP flow leak | 5min TTL cleanup | unit: TTL sweeper test | +| **U-4** | UDP fragments | Drop (SOCKS5 doesn't support FRAG) | accept (rare) | +| **A-1** | User non-admin | UAC re-launch on startup | manual: standard user account | +| **A-2** | UAC cancelled | Clean exit, no error dialog | manual: cancel UAC prompt | +| **A-3** | UAC at every login (autostart) | Accepted per B1 | document in README | +| **A-5** | Autostart disabled via Task 
Manager | Detect StartupApproved key, sync GUI checkbox | unit: registry mock | +| **TR-1** | Tray icon disappears on explorer.exe restart | systray library handles re-attach | manual: kill+restart explorer.exe | +| **TR-3** | First-time tray toast | Track `ui.shown_tray_toast` in config | unit: config writer | +| **SI-1** | Mutex collision portable vs installed | installID = SHA256(exe path)[:16] | unit: two paths → two mutexes | +| **SI-3** | First instance crashed without cleanup | Kernel cleans mutex on process death | manual: kill -9 first, launch second | +| **SR-1** | System sleep | WM_POWERBROADCAST listener → graceful pause | manual: trigger sleep on test machine | +| **SR-2** | System resume | Wait 5s network → reopen handle → resume | manual: wake from sleep | +| **UP-1** | Auto-update during active engine | Graceful shutdown → replace exe → relaunch with prior state | manual: stage v0.1 → v0.2 update during voice call | +| **M-1** | VPN concurrent | WinDivert captures packets before VPN encapsulation; SOCKS5 traffic to the upstream IP is expected | manual: with WireGuard + Drover both active | +| **M-4** | Config corrupted | Use defaults + warning banner | unit: malformed TOML → defaults applied | +| **M-5** | Proxy IP changed (DDNS) | Re-resolve hostname every 30s of failed reconnect | unit: hostname resolver retry | +| **M-7** | Hot-reload config | fsnotify → engine restart | integration: edit TOML, observe restart | + +## Out of scope (Phase 3+) + +- DPI bypass / fake QUIC injection (decision **C1**) — add as opt-in toggle in v0.4 if needed +- Windows service mode (decision **A**) — add for power users in v0.4 if requested +- IPv6 SOCKS5 ATYP=04 — add when we hit a v6-only proxy +- ARM64 Windows — add when WinDivert ships ARM64 driver (waiting on basil00 upstream) +- Multi-user PC scenarios — single-user assumption baked in +- Vesktop default-on — stays opt-in via `targets.include_vesktop = true` +- Custom DNS resolver / DNS-over-proxy — out of scope; DNS goes direct, document in 
README + +## Phase 2 milestones + +Each milestone is a separate `writing-plans` invocation followed by `subagent-driven-development` execution. + +### P2.1 — TCP-only MVP (3-4 days) + +**Scope**: WinDivert handle, filter expression, packet parser, TCP NAT-loopback redirect, SOCKS5 client (TCP CONNECT only), procscan, self-loop protection, basic engine state machine (Idle/Starting/Active/Failed without Reconnecting yet). + +**Acceptance**: +- Run drover.exe on Win11 with admin +- Discord chat + Discord API requests routed through SOCKS5 (verify via tcpdump on mihomo: should see TCP CONNECT to discord.com:443 from upstream IP) +- Voice does NOT yet work (UDP path absent) — documented expectation +- Stop button cleanly closes everything in <500ms +- Driver remains installed after exit (verify `sc query WinDivert`) +- No self-loop infinite traffic (verify: bytes in == bytes out, not exponentially growing) + +### P2.2 — UDP voice (3-4 days) + +**Scope**: SOCKS5 UDP ASSOCIATE primitives (production-grade, not the diagnostic-only fork in checker), UDP flow tracker, packet encap/decap, IPv4-fabrication-and-reinject for inbound path. + +**Acceptance**: +- Voice call in Discord through proxy works without audible degradation +- Up to 4 simultaneous voice calls (ish) work without flow leakage +- Idle voice flow cleanup at 5min TTL (verified via debug log) +- Mid-call proxy disconnect → flow drops → re-opens within 2s on next outbound packet → ~2-3s audible glitch +- No memory leak after 1h voice call (RSS stable ±5MB) + +### P2.3 — E3 recovery + sleep/resume (2 days) + +**Scope**: failure classifier, contextual retry policies, Reconnecting state, exponential backoff, WM_POWERBROADCAST listener, heartbeat health-check. 
+ +**Acceptance**: +- Stop mihomo on LXC 102 mid-session → engine transitions Active → Reconnecting → Active when mihomo back up (within 30s of recovery) +- Trigger machine sleep mid-voice-call → engine pauses gracefully → wake → engine resumes within 10s after network up → voice continues (Discord client itself reconnects) +- WinDivert handle externally killed (`sc stop WinDivert && sc start WinDivert`) → engine reopens once → if second kill within 30s → Failed with crash log +- Heartbeat detects "no traffic" while Discord open and idle → tray turns yellow with "no traffic" tooltip → no Failed transition + +### P2.4 — Tray + autostart + engine UI (2-3 days) + +**Scope**: getlantern/systray integration, 4 ICO icons, tray menu (D1 + first-time toast), autostart checkbox in GUI Settings tab, Start/Stop buttons in main window wired to engine, status indicator with state machine awareness, single-instance enforcement. + +**Acceptance**: +- Toggle autostart on → reboot → drover launches at login (after UAC accept) +- X on window → first-time toast → second X → silent hide +- Start button only enabled when checker passed (or in Failed state with Retry) +- Tray icon updates within 200ms of state change +- Two simultaneous launches → second activates first's window and exits silently +- Status row in tray menu updates every 1s while Active + +### P2.5 — Polish (2-3 days) + +**Scope**: crash dumps, config hot-reload via fsnotify, AV-friendly error messages, all remaining edge cases from matrix, README troubleshooting, install/uninstall verification on clean Win11 VM. 
+ +**Acceptance**: +- Every edge case in the matrix has either a passing test or a verified manual reproduction note in `docs/testing/p2-edge-cases.md` +- Install on clean Win11 VM, run for 1 hour without intervention, no errors +- Uninstall via Apps & Features removes everything except optionally-kept config (asked at uninstall) +- README has SmartScreen + AV troubleshooting sections with screenshots + +**Total**: ~12-16 days to v1.0.0. + +## Testing strategy + +### Unit tests (per-package) + +- `divert/filter`: filter expression builder produces expected strings for various PID lists +- `divert/packet`: parse + serialize + checksum recompute is round-trip identity +- `engine/recovery`: failure classifier returns expected Action for each FailureClass +- `socks5/udp`: encap/decap round-trip +- `procscan`: snapshot diffing, mocked toolhelp32 +- `autostart`: registry read/write/disabled-detection (with mock registry) +- `single`: mutex acquire + release lifecycle +- `config`: defaults applied, malformed TOML → defaults + warning, version migration + +### Integration tests (each milestone has its own) + +- `engine_test.go`: mock WinDivert + mock SOCKS5 server in-process, exercise full pipeline +- `redirect_test.go`: spin up TCP listener, fake Discord client, fake SOCKS5 server, verify bytes flow + +### Manual test plan (per milestone, in `docs/testing/p2--manual.md`) + +Each manual test case is a numbered step-by-step with expected outcome. Run on clean Win11 VM snapshot before each milestone tag. + +### End-to-end (manual, before v1.0.0) + +Full user journey in `docs/testing/p2-e2e.md`: +1. Download installer from Forgejo release +2. Install via setup.exe (UAC prompt) +3. First launch: configure proxy, run check, click Start +4. Run Discord, place voice call → verify routing via mihomo logs +5. Toggle autostart on +6. Reboot → verify drover starts at login (UAC accept) +7. Sleep + wake cycle → verify continuity +8. 
Stop mihomo → verify Reconnecting state → restart mihomo → verify recovery +9. Quit via tray menu → verify clean shutdown +10. Uninstall → verify cleanup + +## Open questions / assumptions to validate during P2.1 + +1. **`imgk/divert-go` v0.1.0 still works with WinDivert v2.2.2?** If not, switch to direct syscall bindings. Verify in P2.1 day 1. +2. **Filter expression length limit** — WinDivert filter expressions have a max length. With 4 Discord PIDs + own PID + upstream IP exclusion + multicast we should be well under, but if user adds 10+ Vesktop variants we might hit it. Verify and document limit during P2.1. +3. **`WinDivertSend` for inbound packets we synthesize** — does the kernel correctly route a fabricated `dst=Discord_IP, src=real_target_IP` packet back to Discord's socket? Most divert-based tools do this; verify in P2.2 day 1 with a tracer. +4. **Embedded ICO size on disk** — 4 icons × ~5KB = 20KB. Negligible. + +## Files to read before implementation + +- `imgk/shadow/pkg/divert/` — opens handle + read packets pattern (downloaded already) +- `imgk/divert-go` README + `addr.go` — API surface +- `runetfreedom/force-proxy/proxy.cpp` — correct SOCKS5 UDP ASSOCIATE flow (local at `/tmp/drover-cmp/force-proxy/`) +- `wailsapp/wails/v2/examples/react` — Wails patterns for Engine bindings +- This spec.
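
For open question 2 and the `divert/filter` unit test named in the testing strategy, a builder along these lines may be a useful starting point. It is a sketch that only reproduces the clauses of the filter expression in this spec: `BuildFilter` is a hypothetical name, the output has not been validated against WinDivert itself, and the PIDs and address in `main` are made-up examples.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"sort"
	"strings"
)

// BuildFilter assembles the spec's filter expression for the given Discord
// PIDs, our own PID, and the resolved upstream proxy IPv4 address.
func BuildFilter(targetPIDs []uint32, ownPID uint32, upstream string) (string, error) {
	if len(targetPIDs) == 0 {
		return "", errors.New("no target PIDs")
	}
	addr, err := netip.ParseAddr(upstream)
	if err != nil || !addr.Is4() {
		return "", fmt.Errorf("upstream %q is not an IPv4 address", upstream)
	}

	// Sort a copy so the same PID set always yields the same string, which
	// makes "did the filter change?" a plain string comparison.
	pids := append([]uint32(nil), targetPIDs...)
	sort.Slice(pids, func(i, j int) bool { return pids[i] < pids[j] })
	clauses := make([]string, len(pids))
	for i, p := range pids {
		clauses[i] = fmt.Sprintf("processId == %d", p)
	}

	parts := []string{
		"outbound and (tcp or udp) and ip",
		"(" + strings.Join(clauses, " or ") + ")",
		fmt.Sprintf("processId != %d", ownPID),
		"ip.DstAddr != " + addr.String(),
		"not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)", // multicast
		"not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)", // loopback
		"not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)", // link-local
	}
	return strings.Join(parts, " and "), nil
}

func main() {
	// Example values only; real PIDs come from procscan and os.Getpid().
	f, err := BuildFilter([]uint32{67890, 12345}, 4242, "203.0.113.7")
	if err != nil {
		fmt.Println("filter error:", err)
		return
	}
	fmt.Printf("%d bytes: %s\n", len(f), f)
}
```

Printing the length next to the expression gives a cheap way to watch how close a large `targets.processes` list gets to whatever limit is found during P2.1.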