root 5f107de95d spec: Phase 2 engine — WinDivert + SOCKS5 transparent proxy
Design accepted 2026-05-01. Locks in 5 architectural decisions
(GUI-only, UAC-per-launch, no DPI bypass, hide-to-tray with toast,
contextual recovery) and decomposes Phase 2 into 5 milestones with
explicit acceptance criteria + a 30-row edge case matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:21:16 +03:00

# Engine — WinDivert + SOCKS5 transparent proxy for Discord
**Status**: design accepted 2026-05-01.
**Replaces**: stub `StartEngine`/`StopEngine` in `internal/gui/app.go` that just toggle a flag.
**Implements**: Phase 2 from `docs/planning/cuddly-baking-taco.md`.
## Why
The checker proves the upstream SOCKS5 proxy works. The engine is what
actually routes Discord's traffic through it. Without the engine, every
diagnostic in the world is theatre — the GUI just sits there saying
"Active" while Discord still talks direct to discord.com. Phase 2 turns
that "Active" state into reality: kernel-level packet capture (WinDivert),
NAT-style TCP redirect to a loopback listener, SOCKS5 UDP ASSOCIATE for
voice, and a polished lifecycle so the user can install once, click
"autostart at login", and forget the thing exists until Discord stops
working — at which point the tray icon turns yellow and explains why.
## Architecture decisions (locked-in 2026-05-01)
| # | Decision | Rationale |
|---|---|---|
| **A** | GUI-only single-process; no Windows service | Target is a friends-and-family Windows PC where Discord runs only while the user is logged in. Service mode is overengineering for v1; it can be added in v0.4 if a power user asks. |
| **B1** | UAC prompt at every launch; no scheduled-task trampoline | User chose simplicity over polish. Each `drover.exe` invocation re-elevates if not admin. Autostart via `HKCU\...\Run` triggers the same prompt at login. |
| **C1** | No DPI bypass (no fake QUIC injection) | Start with the simplest pipeline that works. If a friend reports voice not working on a DPI-active provider, add C2/C3 in v0.4. |
| **D1** | Window X = hide-to-tray + first-time toast; quit only via tray menu | Industry-standard (Steam, Discord, Telegram). One-shot toast prevents the "where did it go?" surprise. |
| **E3** | Contextual recovery: driver-loss → 1 reopen retry → fail-stop; proxy-loss → exp-backoff in the Reconnecting state (unlimited attempts, 5 min cumulative cap, then Failed); panic → fail-stop with crash dump; sleep/resume → graceful pause/resume | Different failure classes need different responses. Aggressive auto-restart on every error masks bugs; honest fail-stop on every error annoys the user during transient network blips. |
## High-level architecture
```
┌──────────────────────────────────────────┐
│        drover.exe (single binary)        │
│                                          │
│   ┌──────────────┐    ┌──────────────┐   │
│   │  Wails GUI   │    │   systray    │   │
│   └──────┬───────┘    └──────┬───────┘   │
│          └─────────┬─────────┘           │
│          ┌─────────▼─────────┐           │
│          │      Engine       │           │
│          │   state machine   │           │
│          │ Idle / Starting / │           │
│          │ Active / Reconn / │           │
│          │      Failed       │           │
│          └─────────┬─────────┘           │
│       ┌────────────┼─────────────┐       │
│       ▼            ▼             ▼       │
│   ┌───────┐   ┌─────────┐   ┌─────────┐  │
│   │ divert│   │ redirect│   │ procscan│  │
│   │  pkt  │   │ TCP+UDP │   │(2s tick)│  │
│   └───┬───┘   └────┬────┘   └─────────┘  │
│       ▼            ▼                     │
│   WinDivert     socks5                   │
│      .sys       client                   │
│       │            │                     │
└───────┼────────────┼─────────────────────┘
        ▼            ▼
 ┌────────────┐  ┌─────────────────┐
 │   kernel   │  │ upstream SOCKS5 │
 │ packet cap │  │    (mihomo)     │
 └────────────┘  └─────────────────┘
```
## File layout
```
cmd/drover/
  main.go                 existing — extend with engine startup, single-instance check
  uac_windows.go          new — IsAdmin, ReElevate
  console_windows.go      existing
  autoupdate_windows.go   existing
internal/engine/
  engine.go               new — orchestration, state machine, lifecycle
  state.go                new — Idle/Starting/Active/Reconnecting/Failed enum + transitions
  recovery.go             new — failure classifier → action mapper
  health.go               new — heartbeat timer, traffic detector
  power_windows.go        new — WM_POWERBROADCAST listener (sleep/resume)
internal/divert/
  divert.go               new — WinDivert handle wrapper
  filter.go               new — filter expression builder
  packet.go               new — IPv4 + TCP/UDP parse + checksum recompute
  installer.go            new — extract embedded WinDivert.sys/.dll on first run
  divert_arm64.go         new — stub returning "ARM64 not supported"
internal/socks5/          NEW — production client (separate from internal/checker/socks5.go)
  client.go               new — TCP CONNECT + greet/auth
  udp.go                  new — UDP ASSOCIATE + encapsulate/decapsulate
  pool.go                 new — control-channel pool (deferred to P2.5 if needed)
internal/redirect/
  tcp.go                  new — NAT-loopback redirect listener + per-flow pump
  udp.go                  new — per-flow UDP tracker + encap/decap
internal/procscan/
  procscan.go             new — Toolhelp32 snapshot, periodic PID resolver
internal/tray/
  tray.go                 new — getlantern/systray icon + menu
  icons.go                new — embed idle/active/reconnecting/error ICOs
internal/autostart/
  autostart_windows.go    new — HKCU\...\Run registry toggle
internal/single/
  single_windows.go       new — named mutex + activation pipe
internal/config/
  config.go               new — TOML schema + defaults
  loader.go               new — load/save with file lock
  watcher.go              new — fsnotify hot-reload
internal/gui/
  app.go                  existing — extend with engine bindings
  frontend/...            existing — wire engine controls + autostart checkbox
third_party/windivert/    existing — WinDivert64.sys, WinDivert.dll, LICENSE-LGPL
third_party/icons/        new — tray/{idle,active,reconnecting,error}.ico
```
## Engine state machine
```
      ┌────────────┐
      │    Idle    │ ◄── initial state; Stop from Starting / Active / Reconnecting
      └─────┬──────┘
            │ user clicks "Start engine"
            ▼
      ┌────────────┐
      │  Starting  │── any error ───────────────────┐
      └─────┬──────┘                                │
            │ all checks ok                         │
            ▼                                       │
      ┌────────────┐                                │
      │   Active   │── driver lost twice / panic ───┤
      └──▲──────┬──┘                                │
recovered│      │ proxy lost /                      │
         │      │ SOCKS5 control                    │
         │      │ channels died                     │
         │      ▼                                   ▼
      ┌──┴─────────┐                          ┌────────────┐
      │Reconnecting│── 5 min cap ────────────►│   Failed   │
      └────────────┘                          └─────┬──────┘
                                                    │ user clicks Retry
                                                    ▼
                                           (back to Starting)
```
States visible to GUI as `EngineStatus`:
- `Idle` — engine off, tray icon grey, GUI shows "Start" button
- `Starting` — handle being opened, procscan running, health-check; tray yellow with spin
- `Active` — packets flowing; tray green; live stats updating
- `Reconnecting` — proxy unreachable, exponential backoff in progress; tray yellow; "Reconnecting (3rd attempt)"
- `Failed` — driver lost twice OR panic OR Reconnecting hit 5 min cap. Tray red. GUI shows error message + Retry button.
## E3 recovery rules (failure classifier)
```go
// internal/engine/recovery.go
package engine

type FailureClass int

const (
	ClassDriverLost       FailureClass = iota // WinDivert handle invalid, ERROR_INVALID_HANDLE on Recv
	ClassDriverGone                           // WinDivertOpen returns ERROR_FILE_NOT_FOUND or similar
	ClassProxyUnreachable                     // SOCKS5 control TCP connection rejected/timeout
	ClassPanic                                // recover() in goroutine
	ClassSleep                                // WM_POWERBROADCAST suspend
	ClassResume                               // WM_POWERBROADCAST resume
	ClassFatal                                // anything we can't classify
)

type Action int

const (
	ActionRetryOnce  Action = iota // sleep 2s, reopen; if it fails again → Failed
	ActionExpBackoff               // 1s → 5s → 30s cap, unlimited attempts, Failed after 5 min cumulative
	ActionFailStop                 // straight to Failed, write crash dump
	ActionPause                    // drain in-flight, close sockets, transition to Reconnecting
	ActionResume                   // wait 5s, reopen handle, transition to Active
)

// ClassifyFailure maps a class to its action per the table below; err is kept for logging.
func ClassifyFailure(err error, class FailureClass) Action {
	switch class {
	case ClassDriverLost:
		return ActionRetryOnce
	case ClassProxyUnreachable:
		return ActionExpBackoff
	case ClassSleep:
		return ActionPause
	case ClassResume:
		return ActionResume
	default: // ClassDriverGone, ClassPanic, and (assumed) ClassFatal
		return ActionFailStop
	}
}
```
| Class | Action | UI feedback |
|---|---|---|
| `DriverLost` | RetryOnce | Status="reopening driver" |
| `DriverGone` | FailStop | "Driver missing — reinstall Drover" |
| `ProxyUnreachable` | ExpBackoff | "Reconnecting (Nth attempt)…" |
| `Panic` | FailStop | "Engine crashed — log saved to %PROGRAMDATA%\\Drover\\logs\\crash-*.txt" |
| `Sleep` | Pause | "Paused (system sleep)" |
| `Resume` | Resume | "Resuming…" then back to Active |
**Health-check before Start engine**: GUI's Start button first runs `internal/checker.Run` with a reduced subset (tcp + greet + udp tests, 2s budget, no voice-quality). If any fails, the engine doesn't start and the GUI shows what failed. Prevents the "I clicked Start but Discord still doesn't work" mystery.
**Heartbeat timer**: every 5s, sample `(rxBytes_now - rxBytes_5sAgo) > 0`. If false for 30s while Active and procscan reports Discord PIDs > 0, set status=`Active (no traffic)` (informational sub-state, tray green→yellow but state machine stays in Active). User sees this and can investigate (Discord might just be idle).
**Crash dumps**: panic recover in any engine goroutine writes `%PROGRAMDATA%\Drover\logs\crash-YYYYMMDD-HHMMSS.txt` with full stack + goroutine dump + version. Then transitions to Failed.
## WinDivert layer
### Filter expression (rebuilt on PID list change)
```
outbound and (tcp or udp) and ip
and (processId == 12345 or processId == 67890 or ...)
and processId != <own_pid>
and ip.DstAddr != <upstream_proxy_ip>
and not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)
and not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)
and not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)
```
Notes:
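- `ip` (IPv4) only, no `ipv6` clause. Discord's client falls back to v4 in ~150ms via Happy Eyeballs. Caveat: WinDivert only intercepts packets that match the filter, so IPv6 packets are not dropped here; they pass through direct, and the v4 fallback only happens if the v6 attempt actually fails.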
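- `processId != own_pid` is critical: without it, our own SOCKS5 traffic to upstream gets captured and loops forever. To verify in P2.1: the WinDivert 2.x documentation lists `processId` as a FLOW/SOCKET-layer field, so a NETWORK-layer handle may reject this filter, in which case PIDs have to be mapped to local ports through a separate FLOW-layer handle.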
- Multicast/loopback/link-local explicitly excluded (Discord never talks to those, but extra safety).
If the upstream proxy IP cannot be resolved at engine start, we fail-stop with a clear message — we cannot build a correct filter without it.
### Library choice
Use `github.com/imgk/divert-go` v0.1.0 (existing dependency proposal; verify it is still maintained when implementing P2.1). If it is unmaintained or broken, write thin syscall bindings directly: the part of the WinDivert C API we use is small (~6 functions).
### Driver lifecycle
1. **First run**: extract embedded `WinDivert64.sys` + `WinDivert.dll` from Go `embed.FS` into `%PROGRAMDATA%\Drover\windivert\`. SHA256-verify against expected hashes (compiled in at build time).
2. **Open handle**: `WinDivertOpen(filter, layer=NETWORK, priority=0, flags=0)`. The driver auto-installs as a Windows service named "WinDivert" on first open.
3. **Driver remains installed across reboots** — we don't uninstall on Stop. Uninstaller (Inno Setup) explicitly does `sc stop WinDivert && sc delete WinDivert` on uninstall.
### Driver edge cases (D-series in matrix)
- **D-1: not installed** → embedded copy + auto-install on WinDivertOpen.
- **D-2: old v1.x** (zapret legacy) → `WinDivertOpen` returns `ERROR_DRIVER_FAILED_PRIOR_UNLOAD`. Detect: open the "WinDivert" service via `OpenServiceW`, read its binary path with `QueryServiceConfigW`, then check that file's version resource. Show "Outdated WinDivert detected from another tool. Stop the other tool and reboot."
- **D-3: corrupted .sys** → SHA256 mismatch on extract. Reinstall path (delete + recopy + retry).
- **D-4: AV quarantine** → the extracted `WinDivert64.sys` disappears or fails the SHA256 check after extraction → show a specific error: "Antivirus may have quarantined WinDivert64.sys. Add `%PROGRAMDATA%\Drover\` to your AV exclusions and restart Drover."
- **D-5: reboot pending** → install successful but service not started → show "Reboot required to activate driver" with no retry button.
- **D-7: ARM64** → `runtime.GOARCH` check at startup; on ARM64 show "Drover requires x86-64 Windows. WinDivert does not support ARM64."
## TCP redirect (NAT-loopback)
### Mechanism
1. On engine start, bind a TCP listener on `127.0.0.1:0` (OS picks unused port). Save the port number.
2. WinDivert sees a new SYN from `Discord.exe → real_target_ip:real_target_port`. Engine:
a. Rewrites the destination: IPv4 `dst_addr = 127.0.0.1` and TCP `dst_port = listener_port`. Stores the mapping `(src_port → real_target_ip:port)` in a `sync.Map`; a sweeper expires entries after 30 min (T-6).
b. Recomputes IP + TCP checksums.
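c. Reinjects via `WinDivertSend` with direction=outbound. The kernel routes to loopback because dst is now 127.0.0.1.
d. The reply direction needs the mirror rewrite: packets from the listener (TCP `src_port = listener_port`) must be captured and rewritten to `src = real_target_ip:port` before Discord's socket sees them, or its handshake never completes. WinDivert's `streamdump` sample handles both directions by swapping src/dst and reinjecting as inbound; confirm which variant works in P2.1.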
3. Listener `accept()` returns a conn from `127.0.0.1:src_port`. Engine looks up mapping by `src_port`, finds real_target.
4. Engine opens fresh SOCKS5 control TCP to upstream, does greet + (auth if config) + CONNECT to real_target_ip:port.
5. Once SOCKS5 returns REP=00, `io.Copy` pumps bytes both directions until EOF on either side.
6. Conn close → drop mapping.
### TCP edge cases
- **T-1: listener bind fails** → fail-stop "could not bind loopback listener". Should never happen (random unused port).
- **T-2: 100+ concurrent flows** — sync.Map scales fine. Bound only by Discord's TCP usage (typically 50).
- **T-3: TCP retransmits** — handled by OS at both sides of the loopback.
- **T-4: IPv6** — dropped at filter level. Discord falls back to v4.
- **T-5: half-closed** — `io.Copy` returns on EOF in one direction; we close the other side via `defer conn.Close()`.
- **T-6: mapping leak** if conn never properly closes — TTL 30min sweeper goroutine deletes stale entries.
## UDP redirect (SOCKS5 UDP ASSOCIATE)
### Mechanism
1. WinDivert sees outbound UDP from `Discord.exe:src_port → real_target_ip:port`. Engine:
a. Looks up mapping by `(src_ip, src_port, real_target_ip, real_target_port)`. If absent:
b. **Open new SOCKS5 control TCP** to upstream. Greet + (auth) + UDP ASSOCIATE.
c. Receive relay endpoint `(relay_ip, relay_port)` — if BND.ADDR is `0.0.0.0` substitute `upstream_proxy_ip`.
d. Open a client-side UDP socket on an ephemeral port (`0.0.0.0:0`; a socket bound to `127.0.0.1` cannot send to a remote relay). Save mapping `flow_id → {control_tcp, relay, client_udp}`.
2. **Outbound packet path**: encap with SOCKS5 UDP header `00 00 | 00 | ATYP=01 | DST_IP(4) | DST_PORT(2) | DATA`. Send via `client_udp.WriteTo(packet, relay)`. Don't reinject the original packet — drop it (we sent the encapsulated version through the relay).
3. **Inbound packet path** (separate goroutine per flow): `client_udp.ReadFrom(buf)` → strip 10-byte SOCKS5 header → fabricate an IPv4+UDP packet with `src=real_target_ip:port, dst=Discord_src_ip:src_port`, recompute checksums → `WinDivertSend` direction=inbound. Discord sees a normal reply from real_target.
4. Idle TTL 5 min: any flow with no packets for 5 min → close control_tcp + client_udp + remove mapping.
### UDP edge cases
- **U-1**: each flow gets its own control TCP. No pool in v1 (overhead is ~5KB per flow, fine for ~10 active flows).
- **U-2: idle leak** → 5min TTL.
- **U-3: Discord changes voice region** mid-call → old flow goes idle (5min TTL), new flow opens. Brief glitch.
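- **U-4: UDP fragments** → SOCKS5 fragmentation (the FRAG field) is optional in RFC 1928, and an implementation that doesn't support it must drop datagrams with FRAG ≠ 0. We don't implement it, so we drop. Discord packets are typically <1500 bytes; fragmentation is rare.
- **U-5: control TCP dies** → the next packet detects it via a `Write` error → close mapping → the packet after that opens a fresh control connection. Audio glitch ~2-3s.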
## Process scanning
### Mechanism
`internal/procscan` runs every 2 seconds:
1. `CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)` → enumerate via `Process32First`/`Process32Next`. Microseconds.
2. Filter by `szExeFile` against config `targets.processes` (case-insensitive on Windows).
3. Diff vs previous PID set. If different → notify engine to rebuild filter expression and reopen WinDivert handle.
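A sketch of the match-and-diff part of the scan, with the Toolhelp32 enumeration abstracted into a slice of hypothetical `proc` entries (PID plus `szExeFile`). Sorting the matched PIDs turns the per-tick comparison into a plain slice compare.

```go
package main

import (
	"sort"
	"strings"
)

// proc is one snapshot entry: th32ProcessID and szExeFile.
type proc struct {
	PID uint32
	Exe string
}

// matchTargets returns the sorted PIDs whose exe name is in targets, compared
// case-insensitively like Windows file names.
func matchTargets(snapshot []proc, targets []string) []uint32 {
	want := make(map[string]bool, len(targets))
	for _, t := range targets {
		want[strings.ToLower(t)] = true
	}
	var pids []uint32
	for _, p := range snapshot {
		if want[strings.ToLower(p.Exe)] {
			pids = append(pids, p.PID)
		}
	}
	sort.Slice(pids, func(i, j int) bool { return pids[i] < pids[j] })
	return pids
}

// changed reports whether the filter must be rebuilt and the handle reopened.
func changed(prev, cur []uint32) bool {
	if len(prev) != len(cur) {
		return true
	}
	for i := range prev {
		if prev[i] != cur[i] {
			return true
		}
	}
	return false
}
```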
### Race: Discord starts up to 2s before procscan catches it
Mitigation: at engine `Start`, do **synchronous initial scan** before opening WinDivert handle. After that, the periodic 2s tick handles ongoing changes.
### Process edge cases
- **P-1: Discord PID changes** → 2s scan + 50ms reopen gap with direct traffic. Acceptable.
- **P-2: multiple Discord variants**: default config includes `Discord.exe`, `DiscordCanary.exe`, `DiscordPTB.exe`, `Update.exe`. Vesktop **opt-in** via config (not default).
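- **P-3: Update.exe** (Discord's updater) included in default: it downloads patches via HTTP and we want those proxied too. Note that `Update.exe` is the generic Squirrel updater name shared by other Squirrel-installed Electron apps, so a name-only match also proxies their updaters.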
- **P-5: PID re-use** (Discord exits, Chrome takes the PID before next scan) → 2s window where Chrome packets get proxied. Cosmetic, low-impact.
## Self-loop protection
The engine itself opens TCP/UDP connections to the upstream proxy. Without protection, the WinDivert filter would catch our own packets and wrap them in another SOCKS5 layer, looping forever within seconds.
Three layers of defense:
1. `processId != own_pid` in the filter expression.
2. `ip.DstAddr != <upstream_proxy_ip>` (resolved once at engine start; if the upstream uses DDNS, we re-resolve every 30s while reconnects keep failing).
3. The redirect listener binds to `127.0.0.1`, so even if the filter leaks, traffic to it is excluded by `not (ip.DstAddr >= 127.0.0.0 ...)`. SOCKS5 client traffic to the upstream is covered by rule 2.
## UAC + autostart (B1)
### Elevation
`cmd/drover/main.go` startup sequence:
```go
func main() {
	// 1. AttachConsole for CLI compatibility (existing)
	attachConsole()

	// 2. Single-instance check (mutex). If second instance, send "show" to first and exit.
	if !single.AcquireMutex() {
		single.ActivateExistingInstance()
		os.Exit(0)
	}

	// 3. Parse Cobra commands. CLI sub-commands like `--check` and `--version` don't need admin
	// and can run as user. The default GUI mode requires admin for WinDivert.
	if cmdNeedsAdmin() && !uac.IsAdmin() {
		// Caveat: this non-elevated process still holds the step-2 mutex while the
		// elevated copy starts, so the copy can mistake itself for a second instance.
		// Release the mutex before re-elevating, or move this check before step 2.
		uac.ReElevate(os.Args[1:]) // ShellExecute("runas", ...) + exit
		os.Exit(0)
	}

	// 4. Auto-update check (existing). Replace exe + relaunch if needed.
	autoUpdateOnStartup()

	// 5. Boot Wails GUI + engine.
	gui.Run(Version)
}
```
`uac.ReElevate` uses `ShellExecuteW` with `lpVerb="runas"`. If user cancels UAC, `ShellExecute` returns `SE_ERR_ACCESSDENIED` → we exit cleanly without an error dialog (the user already saw their cancel intent).
### Autostart
Implemented via the `DroverGo` value under `HKCU\Software\Microsoft\Windows\CurrentVersion\Run`:
- Value type: REG_SZ, value: full path to `drover.exe` with no args
- Set on toggle ON, deleted on toggle OFF
- GUI Settings tab has a checkbox "Запускать при входе в Windows" ("Launch at Windows sign-in") that reads/writes this key
**Edge case A-5**: User disables autostart via Task Manager → Startup Apps. Windows writes a `Disabled` mark in `HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run`. On GUI mount we check both keys; if Disabled → checkbox shown unchecked (user wins).
**Edge case A-6**: Stale path (drover.exe was moved). On every launch we re-write the key value to `os.Executable()` if autostart is enabled. Self-healing.
## Tray + window (D1)
### Tray icon (4 ICO files embedded)
| State | Icon | When shown |
|---|---|---|
| `idle` | grey | Engine not running |
| `active` | green | Engine running, traffic flowing |
| `reconnecting` | yellow | Reconnecting state OR no-traffic-detected |
| `error` | red | Failed state |
### Tray menu (right-click)
```
[●] Active · 2h 14m · ↑ 142 KB/s ↓ 1.2 MB/s [disabled status row, dynamic]
─────────────────────────────────────
[⏸] Stop proxying [primary action, contextual]
[🔍] Run check [opens window + auto-runs check]
─────────────────────────────────────
[🪟] Show window [hidden when window is visible]
[📁] Open log file
─────────────────────────────────────
[🔄] Check for updates
[] About
─────────────────────────────────────
[✕] Quit
```
The status row is updated every 1s while engine is running.
### Click behaviors
- Single-click tray icon → toggle window visibility
- Double-click tray icon → open window (no toggle, always show)
- X on window title bar → hide to tray (D1)
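- First-time only: toast "Drover свёрнут в трей. Engine продолжает работать. Закрыть полностью — через меню трея → Quit." ("Drover is minimized to the tray. The engine keeps running. To quit completely, use the tray menu → Quit.") Track via `config.ui.shown_tray_toast = true`.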
- Quit from tray menu → graceful engine stop → exit cleanly
### Library
`github.com/getlantern/systray`. Stable on Win10/11 modulo the explorer-restart edge case which the library handles internally.
## Single-instance enforcement
Mutex name: `Global\DroverGoInstance-<installID>` where `installID = SHA256(os.Executable())[:16]`. This way:
- Installed copy at `C:\Program Files\Drover\drover.exe` and a portable copy at `D:\portable\drover.exe` get different mutexes — both can run.
- Two simultaneous launches of the same install fight over the mutex; second loses.
Activation pipe: `\\.\pipe\drover-gui-<installID>`. Second instance opens it, writes `{"action":"show"}`, closes. First instance's listener goroutine pops the window to foreground.
If first instance crashes without cleanup → mutex disappears at process death (kernel handle table cleanup). Next launch acquires normally.
## Sleep/resume handling
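`WM_POWERBROADCAST` listener via a Windows message loop in a dedicated goroutine, pinned with `runtime.LockOSThread` because window messages are delivered to the thread that created the window. Uses `RegisterPowerSettingNotification` for fine-grained events.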
| Event | Action |
|---|---|
| `PBT_APMSUSPEND` | Engine: drain in-flight packets (give 200ms), close all SOCKS5 control TCPs, close WinDivert handle, set status="paused (sleep)" |
| `PBT_APMRESUMEAUTOMATIC` or `PBT_APMRESUMESUSPEND` | Wait 5s for network reconnect (poll `GetIpForwardTable2` for default route presence), reopen WinDivert handle, run health-check, transition Active |
## Stats counters
Atomic counters in `internal/engine/stats.go`:
- `bytesIn uint64` — bytes received from upstream (decapsulated UDP + TCP `io.Copy` returns)
- `bytesOut uint64` — bytes sent to upstream
- `tcpFlowsActive int32` — current count of open TCP redirects
- `udpFlowsActive int32` — current count of open UDP flows
- `startedAt time.Time` — engine start time (for uptime)
Per-flow counters discarded on flow close (no aggregation needed for v1).
Tray status row updates from these every 1s. GUI live stats panel does the same via Wails event `stats:update` (existing path).
Lifetime totals persisted to `%PROGRAMDATA%\Drover\stats.json` every 60s and on Stop.
## Config schema (TOML)
`%APPDATA%\Drover\config.toml`:
```toml
# Drover-Go config — auto-managed by GUI; manual edits hot-reload via fsnotify.
version = 1
[proxy]
host = "95.165.72.59"
port = 12334
auth = false
login = ""
password = ""
udp_associate_timeout = "5s"
tcp_connect_timeout = "10s"
[targets]
processes = ["Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"]
include_vesktop = false
[skip]
# CIDR ranges to never proxy. Local + link-local always implicitly skipped at filter level.
extra_skip_cidrs = []
multicast = true
[ui]
log_level = "info"
log_max_mb = 10
log_backups = 3
tray_icon = true
auto_start = false # mirror of HKCU\...\Run
shown_tray_toast = false # one-shot first-close toast tracking
theme = "dark" # dark | light | auto
[update]
check_on_startup = true
forgejo_repo = "git.okcu.io/root/drover-go"
[engine]
heartbeat_interval = "5s"
no_traffic_warn_after = "30s"
reconnect_backoff_initial = "1s"
reconnect_backoff_max = "30s"
reconnect_total_cap = "5m"
```
Edge cases:
- **M-4 corrupted TOML** → log warning + use defaults + GUI shows banner "Config error line N — running with defaults".
- **M-7 hot-reload** → fsnotify on the file. On change: re-parse → if proxy section changed → engine restart (Stop → wait clean → Start). Other sections apply live.
- **Config migration** v1→v2 handled by `version` field; missing version assumes 1.
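A sketch of the M-7 decision, assuming the TOML file has already been decoded into structs (a hypothetical, trimmed-down subset of the schema). Any change to `[proxy]` requires a full engine restart, other sections can be applied live, and an identical file (for example one the GUI just rewrote) needs nothing.

```go
package main

// Hypothetical decoded subset of config.toml.
type ProxyConfig struct {
	Host     string
	Port     int
	Auth     bool
	Login    string
	Password string
}

type UIConfig struct {
	LogLevel string
	Theme    string
}

type Config struct {
	Version int
	Proxy   ProxyConfig
	UI      UIConfig
}

// reloadAction decides what an fsnotify change event should trigger.
func reloadAction(old, cur Config) string {
	switch {
	case old.Proxy != cur.Proxy:
		return "restart-engine" // Stop → wait clean → Start
	case old != cur:
		return "apply-live"
	default:
		return "noop"
	}
}
```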
## Edge case matrix (full)
This is the master list. Every row must have a corresponding test or explicit "verified manually" note in the implementation plan.
| # | Edge case | Mitigation | Test |
|---|---|---|---|
| **D-1** | WinDivert.sys not installed | Embed binary, copy to %PROGRAMDATA%, WinDivertOpen auto-loads | manual: clean Win11 VM |
| **D-2** | Old WinDivert v1.x present (zapret legacy) | Service version query → "remove old version first" error | manual: install zapret first, verify error |
| **D-3** | Driver corrupted | SHA256 verify on extract → reinstall flow with progress | unit test: SHA256 mismatch path |
| **D-4** | AV quarantines our embedded .sys | Specific AV-friendly error message + README link | manual: Defender enabled + first run |
| **D-5** | Reboot pending after install | Show "Reboot to activate driver" | manual: trigger via DISM |
| **D-7** | ARM64 Windows | Detect at startup, refuse install | unit: GOARCH=arm64 build returns expected error |
| **P-1** | Discord PID changes | 2s procscan + filter rebuild | integration: kill+restart Discord, verify continuity |
| **P-3** | Update.exe traffic | Default list includes it | integration: trigger Discord update, verify Update.exe traffic proxied |
| **P-5** | PID re-use | Cosmetic 2s window | accept |
| **L-1** | Self-loop (drover's own SOCKS5 traffic) | Filter excludes own_pid + upstream IP | unit: filter expression builder verifies own PID in output |
| **T-4** | IPv6 Discord targets | Drop at filter level; Happy Eyeballs falls back | manual: verify with `netsh interface ipv6 set route ::/0 disabled` |
| **T-6** | TCP mapping leak | 30min TTL cleanup | unit: TTL sweeper test |
| **U-2** | Idle UDP flow leak | 5min TTL cleanup | unit: TTL sweeper test |
| **U-4** | UDP fragments | Drop (SOCKS5 FRAG is optional in RFC 1928; not implemented) | accept (rare) |
| **A-1** | User non-admin | UAC re-launch on startup | manual: standard user account |
| **A-2** | UAC cancelled | Clean exit, no error dialog | manual: cancel UAC prompt |
| **A-3** | UAC at every login (autostart) | Accepted per B1 | document in README |
| **A-5** | Autostart disabled via Task Manager | Detect StartupApproved key, sync GUI checkbox | unit: registry mock |
| **TR-1** | Tray icon disappears on explorer.exe restart | systray library handles re-attach | manual: kill+restart explorer.exe |
| **TR-3** | First-time tray toast | Track `ui.shown_tray_toast` in config | unit: config writer |
| **SI-1** | Mutex collision portable vs installed | installID = SHA256(exe path)[:16] | unit: two paths → two mutexes |
| **SI-3** | First instance crashed without cleanup | Kernel cleans mutex on process death | manual: kill -9 first, launch second |
| **SR-1** | System sleep | WM_POWERBROADCAST listener → graceful pause | manual: trigger sleep on test machine |
| **SR-2** | System resume | Wait 5s network → reopen handle → resume | manual: wake from sleep |
| **UP-1** | Auto-update during active engine | Graceful shutdown → replace exe → relaunch with prior state | manual: stage v0.1 → v0.2 update during voice call |
| **M-1** | VPN concurrent | WinDivert captures before VPN encapsulation; SOCKS5 traffic to the upstream IP is expected | manual: with WireGuard + Drover both active |
| **M-4** | Config corrupted | Use defaults + warning banner | unit: malformed TOML → defaults applied |
| **M-5** | Proxy IP changed (DDNS) | Re-resolve hostname every 30s of failed reconnect | unit: hostname resolver retry |
| **M-7** | Hot-reload config | fsnotify → engine restart | integration: edit TOML, observe restart |
## Out of scope (Phase 3+)
- DPI bypass / fake QUIC injection (decision **C1**) — add as opt-in toggle in v0.4 if needed
- Windows service mode (decision **A**) — add for power users in v0.4 if requested
- IPv6 SOCKS5 ATYP=04 — add when we hit a v6-only proxy
- ARM64 Windows — add when WinDivert ships ARM64 driver (waiting on basil00 upstream)
- Multi-user PC scenarios — single-user assumption baked in
- Vesktop default-on — stays opt-in via `targets.include_vesktop = true`
- Custom DNS resolver / DNS-over-proxy — out of scope; DNS goes direct, document in README
## Phase 2 milestones
Each milestone is a separate `writing-plans` invocation followed by `subagent-driven-development` execution.
### P2.1 — TCP-only MVP (3-4 days)
**Scope**: WinDivert handle, filter expression, packet parser, TCP NAT-loopback redirect, SOCKS5 client (TCP CONNECT only), procscan, self-loop protection, basic engine state machine (Idle/Starting/Active/Failed without Reconnecting yet).
**Acceptance**:
- Run drover.exe on Win11 with admin
- Discord chat + Discord API requests routed through SOCKS5 (verify via tcpdump on mihomo: should see TCP CONNECT to discord.com:443 from upstream IP)
- Voice does NOT yet work (UDP path absent) — documented expectation
- Stop button cleanly closes everything in <500ms
- Driver remains installed after exit (verify `sc query WinDivert`)
- No self-loop traffic storm (verify: upstream byte counters track Discord's own traffic instead of growing exponentially)
### P2.2 — UDP voice (3-4 days)
**Scope**: SOCKS5 UDP ASSOCIATE primitives (production-grade, not the diagnostic-only fork in checker), UDP flow tracker, packet encap/decap, IPv4-fabrication-and-reinject for inbound path.
**Acceptance**:
- Voice call in Discord through proxy works without audible degradation
- Up to ~4 simultaneous voice calls work without flow leakage
- Idle voice flow cleanup at 5min TTL (verified via debug log)
- Mid-call proxy disconnect → flow drops → re-opens within 2s on next outbound packet → ~2-3s audible glitch
- No memory leak after 1h voice call (RSS stable ±5MB)
### P2.3 — E3 recovery + sleep/resume (2 days)
**Scope**: failure classifier, contextual retry policies, Reconnecting state, exponential backoff, WM_POWERBROADCAST listener, heartbeat health-check.
**Acceptance**:
- Stop mihomo on LXC 102 mid-session → engine transitions Active → Reconnecting → Active when mihomo back up (within 30s of recovery)
- Trigger machine sleep mid-voice-call → engine pauses gracefully → wake → engine resumes within 10s after network up → voice continues (Discord client itself reconnects)
- WinDivert handle externally killed (`sc stop WinDivert && sc start WinDivert`) → engine reopens once → if second kill within 30s → Failed with crash log
- Heartbeat detects "no traffic" while Discord open and idle → tray turns yellow with "no traffic" tooltip → no Failed transition
### P2.4 — Tray + autostart + engine UI (2-3 days)
**Scope**: getlantern/systray integration, 4 ICO icons, tray menu (D1 + first-time toast), autostart checkbox in GUI Settings tab, Start/Stop buttons in main window wired to engine, status indicator with state machine awareness, single-instance enforcement.
**Acceptance**:
- Toggle autostart on → reboot → drover launches at login (after UAC accept)
- X on window → first-time toast → second X → silent hide
- Start button only enabled when checker passed (or in Failed state with Retry)
- Tray icon updates within 200ms of state change
- Two simultaneous launches → second activates first's window and exits silently
- Status row in tray menu updates every 1s while Active
### P2.5 — Polish (2-3 days)
**Scope**: crash dumps, config hot-reload via fsnotify, AV-friendly error messages, all remaining edge cases from matrix, README troubleshooting, install/uninstall verification on clean Win11 VM.
**Acceptance**:
- Every edge case in the matrix has either a passing test or a verified manual reproduction note in `docs/testing/p2-edge-cases.md`
- Install on clean Win11 VM, run for 1 hour without intervention, no errors
- Uninstall via Apps & Features removes everything except optionally-kept config (asked at uninstall)
- README has SmartScreen + AV troubleshooting sections with screenshots
**Total**: ~12-16 days to v1.0.0.
## Testing strategy
### Unit tests (per-package)
- `divert/filter`: filter expression builder produces expected strings for various PID lists
- `divert/packet`: parse + serialize + checksum recompute is round-trip identity
- `engine/recovery`: failure classifier returns expected Action for each FailureClass
- `socks5/udp`: encap/decap round-trip
- `procscan`: snapshot diffing, mocked toolhelp32
- `autostart`: registry read/write/disabled-detection (with mock registry)
- `single`: mutex acquire + release lifecycle
- `config`: defaults applied, malformed TOML → defaults + warning, version migration
### Integration tests (each milestone has its own)
- `engine_test.go`: mock WinDivert + mock SOCKS5 server in-process, exercise full pipeline
- `redirect_test.go`: spin up TCP listener, fake Discord client, fake SOCKS5 server, verify bytes flow
### Manual test plan (per milestone, in `docs/testing/p2-<milestone>-manual.md`)
Each manual test case is a numbered step-by-step with expected outcome. Run on clean Win11 VM snapshot before each milestone tag.
### End-to-end (manual, before v1.0.0)
Full user journey in `docs/testing/p2-e2e.md`:
1. Download installer from Forgejo release
2. Install via setup.exe (UAC prompt)
3. First launch: configure proxy, run check, click Start
4. Run Discord, place voice call → verify routing via mihomo logs
5. Toggle autostart on
6. Reboot → verify drover starts at login (UAC accept)
7. Sleep + wake cycle → verify continuity
8. Stop mihomo → verify Reconnecting state → restart mihomo → verify recovery
9. Quit via tray menu → verify clean shutdown
10. Uninstall → verify cleanup
## Open questions / assumptions to validate during P2.1
1. **`imgk/divert-go` v0.1.0 still works with WinDivert v2.2.2?** If not, switch to direct syscall bindings. Verify in P2.1 day 1.
2. **Filter expression length limit** — WinDivert filter expressions have a max length. With 4 Discord PIDs + own PID + upstream IP exclusion + multicast we should be well under, but if user adds 10+ Vesktop variants we might hit it. Verify and document limit during P2.1.
3. **`WinDivertSend` for inbound packets we synthesize** — does the kernel correctly route a fabricated `dst=Discord_IP, src=real_target_IP` packet back to Discord's socket? Most divert-based tools do this; verify in P2.2 day 1 with a tracer.
4. **Embedded ICO size on disk** — 4 icons × ~5KB = 20KB. Negligible.
## Files to read before implementation
- `imgk/shadow/pkg/divert/` — opens handle + read packets pattern (downloaded already)
- `imgk/divert-go` README + `addr.go` — API surface
- `runetfreedom/force-proxy/proxy.cpp` — correct SOCKS5 UDP ASSOCIATE flow (local at `/tmp/drover-cmp/force-proxy/`)
- `wailsapp/wails/v2/examples/react` — Wails patterns for Engine bindings
- This spec.