spec: Phase 2 engine — WinDivert + SOCKS5 transparent proxy

Design accepted 2026-05-01. Locks in 5 architectural decisions
(GUI-only, UAC-per-launch, no DPI bypass, hide-to-tray with toast,
contextual recovery) and decomposes Phase 2 into 5 milestones with
explicit acceptance criteria + a 30-row edge case matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:21:16 +03:00
parent 11c4eb7f4a
commit 5f107de95d
@@ -0,0 +1,651 @@
# Engine — WinDivert + SOCKS5 transparent proxy for Discord
**Status**: design accepted 2026-05-01.
**Replaces**: stub `StartEngine`/`StopEngine` in `internal/gui/app.go` that just toggle a flag.
**Implements**: Phase 2 from `docs/planning/cuddly-baking-taco.md`.
## Why
The checker proves the upstream SOCKS5 proxy works. The engine is what
actually routes Discord's traffic through it. Without the engine, every
diagnostic in the world is theatre — the GUI just sits there saying
"Active" while Discord still talks direct to discord.com. Phase 2 turns
that "Active" state into reality: kernel-level packet capture (WinDivert),
NAT-style TCP redirect to a loopback listener, SOCKS5 UDP ASSOCIATE for
voice, and a polished lifecycle so the user can install once, click
"autostart at login", and forget the thing exists until Discord stops
working — at which point the tray icon turns yellow and explains why.
## Architecture decisions (locked-in 2026-05-01)
| # | Decision | Rationale |
|---|---|---|
| **A** | GUI-only single-process; no Windows service | Friends-and-family Windows-PC, Discord runs only when user is logged in. Service mode is overengineering for v1; can be added in v0.4 if a power user asks. |
| **B1** | UAC prompt at every launch; no scheduled-task trampoline | User chose simplicity over polish. Each `drover.exe` invocation re-elevates if not admin. Autostart via `HKCU\...\Run` triggers the same prompt at login. |
| **C1** | No DPI bypass (no fake QUIC injection) | Start with the simplest pipeline that works. If a friend reports voice not working on a DPI-active provider, add C2/C3 in v0.4. |
| **D1** | Window X = hide-to-tray + first-time toast; quit only via tray menu | Industry-standard (Steam, Discord, Telegram). One-shot toast prevents the "where did it go?" surprise. |
| **E3** | Contextual recovery: driver-loss → 1 reopen retry → fail-stop; proxy-loss → exp-backoff in the Reconnecting state, retried until the 5 min cumulative cap, then fail-stop; panic → fail-stop with crash dump; sleep/resume → graceful pause/resume | Different failure classes need different responses. Aggressive auto-restart on every error masks bugs; honest fail-stop on every error annoys the user during transient network blips. |
## High-level architecture
```
┌─────────────────────────────────────┐
│     drover.exe (single binary)      │
│                                     │
│  ┌──────────────┐ ┌──────────────┐  │
│  │  Wails GUI   │ │   systray    │  │
│  └──────┬───────┘ └──────┬───────┘  │
│         └───────┬────────┘          │
│       ┌─────────▼──────────┐        │
│       │       Engine       │        │
│       │   state machine    │        │
│       │ Idle / Starting /  │        │
│       │ Active / Reconn /  │        │
│       │       Failed       │        │
│       └─────────┬──────────┘        │
│       ┌─────────┼─────────────┐     │
│       ▼         ▼             ▼     │
│    ┌──────┐ ┌────────┐ ┌──────────┐ │
│    │divert│ │redirect│ │ procscan │ │
│    │ pkt  │ │ TCP+UDP│ │ (2s tick)│ │
│    └──┬───┘ └───┬────┘ └────┬─────┘ │
│       ▼         ▼           │       │
│   WinDivert  socks5         │       │
│     .sys     client         │       │
└─────────────────────────────┼───────┘
┌────────────┐  ┌─────────────▼───┐
│   kernel   │  │ upstream SOCKS5 │
│ packet cap │  │    (mihomo)     │
└────────────┘  └─────────────────┘
```
## File layout
```
cmd/drover/
  main.go                 existing — extend with engine startup, single-instance check
  uac_windows.go          new — IsAdmin, ReElevate
  console_windows.go      existing
  autoupdate_windows.go   existing
internal/engine/
  engine.go               new — orchestration, state machine, lifecycle
  state.go                new — Idle/Starting/Active/Reconnecting/Failed enum + transitions
  recovery.go             new — failure classifier → action mapper
  health.go               new — heartbeat timer, traffic detector
  power_windows.go        new — WM_POWERBROADCAST listener (sleep/resume)
internal/divert/
  divert.go               new — WinDivert handle wrapper
  filter.go               new — filter expression builder
  packet.go               new — IPv4 + TCP/UDP parse + checksum recompute
  installer.go            new — extract embedded WinDivert.sys/.dll on first run
  divert_arm64.go         new — stub returning "ARM64 not supported"
internal/socks5/          NEW — production client (separate from internal/checker/socks5.go)
  client.go               new — TCP CONNECT + greet/auth
  udp.go                  new — UDP ASSOCIATE + encapsulate/decapsulate
  pool.go                 new — control-channel pool (deferred to P2.5 if needed)
internal/redirect/
  tcp.go                  new — NAT-loopback redirect listener + per-flow pump
  udp.go                  new — per-flow UDP tracker + encap/decap
internal/procscan/
  procscan.go             new — Toolhelp32 snapshot, periodic PID resolver
internal/tray/
  tray.go                 new — getlantern/systray icon + menu
  icons.go                new — embed idle/active/reconnecting/error ICOs
internal/autostart/
  autostart_windows.go    new — HKCU\...\Run registry toggle
internal/single/
  single_windows.go       new — named mutex + activation pipe
internal/config/
  config.go               new — TOML schema + defaults
  loader.go               new — load/save with file lock
  watcher.go              new — fsnotify hot-reload
internal/gui/
  app.go                  existing — extend with engine bindings
  frontend/...            existing — wire engine controls + autostart checkbox
third_party/windivert/    existing — WinDivert64.sys, WinDivert.dll, LICENSE-LGPL
third_party/icons/        new — tray/{idle,active,reconnecting,error}.ico
```
## Engine state machine
```
┌────────┐
│ Idle │ ◄────────────────── (initial)
└────┬───┘
│ user clicks "Start engine"
┌────────────┐
┌──────│ Starting │── any error ───┐
│ └─────┬──────┘ │
│ │ all checks ok │
│ ▼ │
│ ┌────────────┐ │
│ │ Active │ ◄─── recover ─┐ │
│ └────┬───────┘ │
│ │ proxy lost / SOCKS5 │
│ │ control channels died │
│ ▼ │
│ ┌─────────────┐ │
│ │Reconnecting │── 5 min cap ──┐ │
│ └────┬────────┘ │
│ │ recovered │
│ ▼ │
│ back to Active │
│ │
│ Stop button ─►───────────────────┐│
│ ▼▼
│ ┌────────┐
└──── Stop ───────────────────►│ Failed │
└────┬───┘
│ user clicks Retry
(back to Starting)
```
States visible to GUI as `EngineStatus`:
- `Idle` — engine off, tray icon grey, GUI shows "Start" button
- `Starting` — handle being opened, procscan running, health-check; tray yellow with spin
- `Active` — packets flowing; tray green; live stats updating
- `Reconnecting` — proxy unreachable, exponential backoff in progress; tray yellow; "Reconnecting (3rd attempt)"
- `Failed` — driver lost twice OR panic OR Reconnecting hit 5 min cap. Tray red. GUI shows error message + Retry button.
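
The states above can be captured as a small transition table. The following is a minimal sketch, not the planned `state.go`: the edge set is read off the diagram, and it assumes that Stop returns the engine to `Idle` from any running state. `CanTransition` and `allowed` are illustrative names.

```go
package main

// EngineStatus mirrors the five states exposed to the GUI.
type EngineStatus int

const (
	Idle EngineStatus = iota
	Starting
	Active
	Reconnecting
	Failed
)

// allowed lists the permitted next states for each state.
// Assumption: Stop leads back to Idle from every non-idle state.
var allowed = map[EngineStatus][]EngineStatus{
	Idle:         {Starting},
	Starting:     {Active, Failed, Idle},
	Active:       {Reconnecting, Failed, Idle},
	Reconnecting: {Active, Failed, Idle},
	Failed:       {Starting, Idle},
}

// CanTransition reports whether moving from one state to another is permitted.
func CanTransition(from, to EngineStatus) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

Rejecting undeclared transitions in one place keeps the tray icon and the GUI from ever showing a state the engine could not legitimately have reached.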
## E3 recovery rules (failure classifier)
```go
// internal/engine/recovery.go
type FailureClass int

const (
	ClassDriverLost       FailureClass = iota // WinDivert handle invalid, ERROR_INVALID_HANDLE on Recv
	ClassDriverGone                           // WinDivertOpen returns ERROR_FILE_NOT_FOUND or similar
	ClassProxyUnreachable                     // SOCKS5 control TCP connection rejected/timeout
	ClassPanic                                // recover() in goroutine
	ClassSleep                                // WM_POWERBROADCAST suspend
	ClassResume                               // WM_POWERBROADCAST resume
	ClassFatal                                // anything we can't classify
)

type Action int

const (
	ActionRetryOnce  Action = iota // sleep 2s, reopen, if fails again → Failed
	ActionExpBackoff               // 1s → 5s → 30s cap; retry until 5min cumulative, then → Failed
	ActionFailStop                 // straight to Failed, write crash dump
	ActionPause                    // drain in-flight, close sockets, transition to Reconnecting
	ActionResume                   // wait 5s, reopen handle, transition to Active
)

// Signature only; the body is implemented in P2.3.
func ClassifyFailure(err error, class FailureClass) Action
```
| Class | Action | UI feedback |
|---|---|---|
| `DriverLost` | RetryOnce | Status="reopening driver" |
| `DriverGone` | FailStop | "Driver missing — reinstall Drover" |
| `ProxyUnreachable` | ExpBackoff | "Reconnecting (Nth attempt)…" |
| `Panic` | FailStop | "Engine crashed — log saved to %PROGRAMDATA%\\Drover\\logs\\crash-*.txt" |
| `Sleep` | Pause | "Paused (system sleep)" |
| `Resume` | Resume | "Resuming…" then back to Active |
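
As a rough illustration of the table above, the class-to-action mapping can be a single switch. This is a sketch under assumptions: `actionFor` is a hypothetical helper (the spec's `ClassifyFailure` also takes an `error`), and routing `ClassFatal` to fail-stop is an assumption, since the table does not list that class.

```go
package main

type FailureClass int

const (
	ClassDriverLost FailureClass = iota
	ClassDriverGone
	ClassProxyUnreachable
	ClassPanic
	ClassSleep
	ClassResume
	ClassFatal
)

type Action int

const (
	ActionRetryOnce Action = iota
	ActionExpBackoff
	ActionFailStop
	ActionPause
	ActionResume
)

// actionFor maps a failure class to the recovery action from the table.
func actionFor(class FailureClass) Action {
	switch class {
	case ClassDriverLost:
		return ActionRetryOnce
	case ClassProxyUnreachable:
		return ActionExpBackoff
	case ClassSleep:
		return ActionPause
	case ClassResume:
		return ActionResume
	default:
		// DriverGone and Panic per the table; Fatal and unknown
		// classes are assumed to fail-stop as well.
		return ActionFailStop
	}
}
```

Defaulting to fail-stop means a class added later without a matching case fails loudly instead of silently retrying.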
**Health-check before Start engine**: GUI's Start button first runs `internal/checker.Run` with a reduced subset (tcp + greet + udp tests, 2s budget, no voice-quality). If any fails, the engine doesn't start and the GUI shows what failed. Prevents the "I clicked Start but Discord still doesn't work" mystery.
**Heartbeat timer**: every 5s, sample `(rxBytes_now - rxBytes_5sAgo) > 0`. If false for 30s while Active and procscan reports Discord PIDs > 0, set status=`Active (no traffic)` (informational sub-state, tray green→yellow but state machine stays in Active). User sees this and can investigate (Discord might just be idle).
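
The heartbeat rule can be sketched as a counter of consecutive silent ticks. The code below is illustrative only: `trafficMonitor` and `Tick` are hypothetical names, the two constants mirror the `[engine]` defaults, and resetting the counter while no Discord process exists is an assumption about the intended behaviour.

```go
package main

import "time"

const (
	heartbeatInterval  = 5 * time.Second  // engine.heartbeat_interval
	noTrafficWarnAfter = 30 * time.Second // engine.no_traffic_warn_after
)

// trafficMonitor tracks how long the received-bytes counter has been flat.
type trafficMonitor struct {
	lastBytes   uint64
	silentTicks int
}

// Tick is meant to be called once per heartbeatInterval while the engine is
// Active. It returns true when the "Active (no traffic)" sub-state applies.
func (m *trafficMonitor) Tick(rxBytes uint64, discordPIDs int) bool {
	if rxBytes > m.lastBytes || discordPIDs == 0 {
		// Traffic moved, or there is nothing to proxy: start over.
		m.lastBytes = rxBytes
		m.silentTicks = 0
		return false
	}
	m.silentTicks++
	return time.Duration(m.silentTicks)*heartbeatInterval >= noTrafficWarnAfter
}
```

With the default values the warning appears on the sixth consecutive silent tick and clears on the first tick that sees new bytes.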
**Crash dumps**: panic recover in any engine goroutine writes `%PROGRAMDATA%\Drover\logs\crash-YYYYMMDD-HHMMSS.txt` with full stack + goroutine dump + version. Then transitions to Failed.
## WinDivert layer
### Filter expression (rebuilt on PID list change)
```
outbound and (tcp or udp) and ip
and (processId == 12345 or processId == 67890 or ...)
and processId != <own_pid>
and ip.DstAddr != <upstream_proxy_ip>
and not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)
and not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)
and not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)
```
Notes:
- `ip` (IPv4) only — no `ipv6` clause. Discord client falls back to v4 in ~150ms via Happy Eyeballs.
- `processId != own_pid` is critical — without it our own SOCKS5 traffic to upstream gets caught and infinite-looped.
- Multicast/loopback/link-local explicitly excluded (Discord never talks to those, but extra safety).
If the upstream proxy IP cannot be resolved at engine start, we fail-stop with a clear message — we cannot build a correct filter without it.
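
A filter builder for the expression above might look like the following sketch. `buildFilter` is a hypothetical name, and the clause syntax is copied from this spec rather than validated against WinDivert; in particular, whether `processId` is accepted in a NETWORK-layer filter is an assumption to verify against the WinDivert filter-language documentation during P2.1.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
	"strings"
)

// buildFilter renders the filter expression for the given Discord PIDs.
// It fails when the upstream address is not a literal IPv4 address, which
// matches the fail-stop rule for an unresolvable proxy IP.
func buildFilter(targetPIDs []uint32, ownPID uint32, upstream string) (string, error) {
	addr, err := netip.ParseAddr(upstream)
	if err != nil || !addr.Is4() {
		return "", fmt.Errorf("upstream %q is not an IPv4 address", upstream)
	}
	if len(targetPIDs) == 0 {
		return "", fmt.Errorf("no target PIDs to match")
	}

	// Sort a copy so the same PID set always yields the same string.
	pids := append([]uint32(nil), targetPIDs...)
	sort.Slice(pids, func(i, j int) bool { return pids[i] < pids[j] })
	clauses := make([]string, len(pids))
	for i, pid := range pids {
		clauses[i] = fmt.Sprintf("processId == %d", pid)
	}

	parts := []string{
		"outbound and (tcp or udp) and ip",
		"(" + strings.Join(clauses, " or ") + ")",
		fmt.Sprintf("processId != %d", ownPID),
		fmt.Sprintf("ip.DstAddr != %s", addr),
		"not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)",
		"not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)",
		"not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)",
	}
	return strings.Join(parts, " and "), nil
}
```

Producing a deterministic string also lets procscan compare the new filter with the current one and skip the handle reopen when nothing changed.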
### Library choice
Use `github.com/imgk/divert-go` v0.1.0 (existing dep proposal; verify it is still maintained when implementing P2.1). If it is unmaintained or broken, write thin syscall bindings directly: the WinDivert C API is small (~6 functions used).
### Driver lifecycle
1. **First run**: extract embedded `WinDivert64.sys` + `WinDivert.dll` from Go `embed.FS` into `%PROGRAMDATA%\Drover\windivert\`. SHA256-verify against expected hashes (compiled in at build time).
2. **Open handle**: `WinDivertOpen(filter, layer=NETWORK, priority=0, flags=0)`. The driver auto-installs as a Windows service named "WinDivert" on first open.
3. **Driver remains installed across reboots** — we don't uninstall on Stop. Uninstaller (Inno Setup) explicitly does `sc stop WinDivert && sc delete WinDivert` on uninstall.
### Driver edge cases (D-series in matrix)
- **D-1: not installed** → embedded copy + auto-install on WinDivertOpen.
- **D-2: old v1.x** (zapret legacy) → `WinDivertOpen` returns `ERROR_DRIVER_FAILED_PRIOR_UNLOAD`. Detect: query service "WinDivert" via `OpenServiceW` + `QueryServiceConfigW` to read the binary path → check version resource. Show "Outdated WinDivert detected from another tool. Stop the other tool and reboot."
- **D-3: corrupted .sys** → SHA256 mismatch on extract. Reinstall path (delete + recopy + retry).
- **D-4: AV quarantine** → the extracted `WinDivert64.sys` is missing or no longer matches the expected hash → show specific error: "Antivirus may have quarantined WinDivert64.sys. Add `%PROGRAMDATA%\Drover\` to your AV exclusions and restart Drover."
- **D-5: reboot pending** → install successful but service not started → show "Reboot required to activate driver" with no retry button.
- **D-7: ARM64** → `runtime.GOARCH` check at startup; on ARM64 show "Drover requires x86-64 Windows. WinDivert does not support ARM64."
## TCP redirect (NAT-loopback)
### Mechanism
1. On engine start, bind a TCP listener on `127.0.0.1:0` (OS picks unused port). Save the port number.
2. WinDivert sees a new SYN from `Discord.exe → real_target_ip:real_target_port`. Engine:
a. Modifies the IP and TCP headers: `dst_addr = 127.0.0.1` (IP), `dst_port = listener_port` (TCP). Stores mapping `(src_port → real_target_ip:port)` in a `sync.Map` with TTL 30 min.
b. Recomputes IP + TCP checksums.
c. Reinjects via `WinDivertSend` with direction=outbound. The kernel routes to loopback because dst is now 127.0.0.1.
3. Listener `accept()` returns a conn from `127.0.0.1:src_port`. Engine looks up mapping by `src_port`, finds real_target.
4. Engine opens fresh SOCKS5 control TCP to upstream, does greet + (auth if config) + CONNECT to real_target_ip:port.
5. Once SOCKS5 returns REP=00, `io.Copy` pumps bytes both directions until EOF on either side.
6. Conn close → drop mapping.
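
Step 2b depends on a correct checksum after the destination rewrite. The sketch below shows only the IPv4 header part, using the standard one's-complement sum from RFC 1071; the TCP checksum over the pseudo-header is not shown. `ipv4HeaderChecksum` and `rewriteDst` are illustrative names, not the planned `packet.go` API.

```go
package main

import "encoding/binary"

// ipv4HeaderChecksum computes the header checksum over hdr, treating the
// checksum field itself (bytes 10-11) as zero.
func ipv4HeaderChecksum(hdr []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(hdr); i += 2 {
		if i == 10 {
			continue // skip the checksum field
		}
		sum += uint32(binary.BigEndian.Uint16(hdr[i:]))
	}
	// Fold the carries back into the low 16 bits.
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// rewriteDst points the header at a new destination address and refreshes
// the header checksum in place. hdr must hold at least a 20-byte header.
func rewriteDst(hdr []byte, dst [4]byte) {
	copy(hdr[16:20], dst[:])
	binary.BigEndian.PutUint16(hdr[10:], ipv4HeaderChecksum(hdr))
}
```

A round-trip unit test (parse, rewrite, re-verify) is the cheapest guard here, because a wrong checksum is dropped silently by the stack rather than reported.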
### TCP edge cases
- **T-1: listener bind fails** → fail-stop "could not bind loopback listener". Should never happen (random unused port).
- **T-2: 100+ concurrent flows** — sync.Map scales fine. Bound only by Discord's TCP usage (typically 50).
- **T-3: TCP retransmits** — handled by OS at both sides of the loopback.
- **T-4: IPv6** — dropped at filter level. Discord falls back to v4.
- **T-5: half-closed** — `io.Copy` returns on EOF in one direction; we close the other side via `defer conn.Close()`.
- **T-6: mapping leak** if conn never properly closes — TTL 30min sweeper goroutine deletes stale entries.
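
The source-port mapping and its T-6 sweeper could be sketched as below. `flowTable`, `Put`, `Lookup`, and `Sweep` are hypothetical names; the clock is passed in explicitly so the sweeper can be unit-tested without sleeping, which is a choice of this sketch rather than something the spec prescribes.

```go
package main

import (
	"net/netip"
	"sync"
	"time"
)

// flowEntry remembers where a redirected connection was originally headed.
type flowEntry struct {
	target  netip.AddrPort
	expires time.Time
}

// flowTable maps Discord's source port to the original destination.
type flowTable struct {
	ttl   time.Duration
	flows sync.Map // uint16 source port -> flowEntry
}

// Put records the real target for a redirected SYN.
func (t *flowTable) Put(srcPort uint16, target netip.AddrPort, now time.Time) {
	t.flows.Store(srcPort, flowEntry{target: target, expires: now.Add(t.ttl)})
}

// Lookup is called from the listener after accept().
func (t *flowTable) Lookup(srcPort uint16) (netip.AddrPort, bool) {
	v, ok := t.flows.Load(srcPort)
	if !ok {
		return netip.AddrPort{}, false
	}
	return v.(flowEntry).target, true
}

// Sweep deletes entries whose TTL has passed and returns how many it removed.
func (t *flowTable) Sweep(now time.Time) int {
	removed := 0
	t.flows.Range(func(k, v any) bool {
		if now.After(v.(flowEntry).expires) {
			t.flows.Delete(k)
			removed++
		}
		return true
	})
	return removed
}
```

A sweeper goroutine would call `Sweep(time.Now())` from a ticker; the normal path still deletes the entry as soon as the connection closes.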
## UDP redirect (SOCKS5 UDP ASSOCIATE)
### Mechanism
1. WinDivert sees outbound UDP from `Discord.exe:src_port → real_target_ip:port`. Engine:
a. Looks up mapping by `(src_ip, src_port, real_target_ip, real_target_port)`. If absent:
b. **Open new SOCKS5 control TCP** to upstream. Greet + (auth) + UDP ASSOCIATE.
c. Receive relay endpoint `(relay_ip, relay_port)` — if BND.ADDR is `0.0.0.0` substitute `upstream_proxy_ip`.
d. Open client-side UDP socket on `127.0.0.1:0`. Save mapping `flow_id → {control_tcp, relay, client_udp}`.
2. **Outbound packet path**: encap with SOCKS5 UDP header `00 00 | 00 | ATYP=01 | DST_IP(4) | DST_PORT(2) | DATA`. Send via `client_udp.WriteTo(packet, relay)`. Don't reinject the original packet — drop it (we sent the encapsulated version through the relay).
3. **Inbound packet path** (separate goroutine per flow): `client_udp.ReadFrom(buf)` → strip 10-byte SOCKS5 header → fabricate an IPv4+UDP packet with `src=real_target_ip:port, dst=Discord_src_ip:src_port`, recompute checksums → `WinDivertSend` direction=inbound. Discord sees a normal reply from real_target.
4. Idle TTL 5 min: any flow with no packets for 5 min → close control_tcp + client_udp + remove mapping.
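
The 10-byte header in steps 2 and 3 can be sketched as a pair of pure functions. `encapUDP` and `decapUDP` are illustrative names, and the sketch handles only ATYP=01 (IPv4) with FRAG=0, matching the v1 scope; a relay that answers with a domain or IPv6 address type would need extra cases.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// udpHeaderLen is RSV(2) + FRAG(1) + ATYP(1) + IPv4(4) + PORT(2).
const udpHeaderLen = 10

// encapUDP prepends the SOCKS5 UDP request header for an IPv4 destination.
func encapUDP(dstIP [4]byte, dstPort uint16, payload []byte) []byte {
	pkt := make([]byte, udpHeaderLen+len(payload))
	// pkt[0:3] stay zero: RSV = 0x0000, FRAG = 0x00.
	pkt[3] = 0x01 // ATYP = IPv4
	copy(pkt[4:8], dstIP[:])
	binary.BigEndian.PutUint16(pkt[8:10], dstPort)
	copy(pkt[udpHeaderLen:], payload)
	return pkt
}

// decapUDP strips the header from a relay datagram and returns the remote
// address it came from together with the payload.
func decapUDP(pkt []byte) (srcIP [4]byte, srcPort uint16, payload []byte, err error) {
	if len(pkt) < udpHeaderLen {
		return srcIP, 0, nil, errors.New("socks5 udp: short datagram")
	}
	if pkt[2] != 0x00 {
		return srcIP, 0, nil, errors.New("socks5 udp: fragmented datagram not supported")
	}
	if pkt[3] != 0x01 {
		return srcIP, 0, nil, errors.New("socks5 udp: only ATYP=01 (IPv4) is handled")
	}
	copy(srcIP[:], pkt[4:8])
	return srcIP, binary.BigEndian.Uint16(pkt[8:10]), pkt[udpHeaderLen:], nil
}
```

The returned payload aliases the input buffer, so a per-flow reader that reuses its buffer should copy the payload before the next `ReadFrom`.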
### UDP edge cases
- **U-1**: each flow gets its own control TCP. No pool in v1 (overhead is ~5KB per flow, fine for ~10 active flows).
- **U-2: idle leak** → 5min TTL.
- **U-3: Discord changes voice region** mid-call → old flow goes idle (5min TTL), new flow opens. Brief glitch.
- **U-4: UDP fragments** → Drop. The engine does not reassemble IP fragments, and SOCKS5 fragmentation (the FRAG field in RFC 1928) is optional and rarely implemented by relays. Discord packets are typically <1500 bytes; fragmentation rare.
- **U-5: control TCP dies** → the next packet detects it via a `Write` error → close mapping → the following packet opens a fresh control connection. Audio glitch ~2-3s.
## Process scanning
### Mechanism
`internal/procscan` runs every 2 seconds:
1. `CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)` → enumerate via `Process32First`/`Process32Next`. Microseconds.
2. Filter by `szExeFile` against config `targets.processes` (case-insensitive on Windows).
3. Diff vs previous PID set. If different → notify engine to rebuild filter expression and reopen WinDivert handle.
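
Steps 2 and 3 reduce to a name match followed by a set comparison. The sketch below leaves out the Toolhelp32 snapshot itself and works on an already-enumerated `pid → exe name` map; `matchTargets` and `samePIDs` are hypothetical helper names.

```go
package main

import "strings"

// matchTargets returns the PIDs whose executable name matches one of the
// configured targets, compared case-insensitively.
func matchTargets(procs map[uint32]string, targets []string) map[uint32]struct{} {
	matched := make(map[uint32]struct{})
	for pid, exe := range procs {
		for _, t := range targets {
			if strings.EqualFold(exe, t) {
				matched[pid] = struct{}{}
				break
			}
		}
	}
	return matched
}

// samePIDs reports whether two PID sets are identical. The engine only needs
// to rebuild the filter when this returns false.
func samePIDs(a, b map[uint32]struct{}) bool {
	if len(a) != len(b) {
		return false
	}
	for pid := range a {
		if _, ok := b[pid]; !ok {
			return false
		}
	}
	return true
}
```

Keeping the diff separate from the snapshot call is what makes the `procscan` unit test with a mocked Toolhelp32 straightforward.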
### Race: Discord starts up to 2s before procscan catches it
Mitigation: at engine `Start`, do **synchronous initial scan** before opening WinDivert handle. After that, the periodic 2s tick handles ongoing changes.
### Process edge cases
- **P-1: Discord PID changes** → 2s scan + 50ms reopen gap with direct traffic. Acceptable.
- **P-2: multiple Discord variants**: default config includes `Discord.exe`, `DiscordCanary.exe`, `DiscordPTB.exe`, `Update.exe`. Vesktop **opt-in** via config (not default).
- **P-3: Update.exe** (Discord's updater) included in default — it downloads patches via HTTP and we want those proxied too.
- **P-5: PID re-use** (Discord exits, Chrome takes the PID before next scan) → 2s window where Chrome packets get proxied. Cosmetic, low-impact.
## Self-loop protection
The engine itself opens TCP/UDP connections to the upstream proxy. Without protection, the WinDivert filter would catch our own packets and encapsulate them in another SOCKS5 layer, producing an infinite loop within seconds.
Three layers of defense:
1. `processId != own_pid` in the filter expression.
2. `ip.DstAddr != <upstream_proxy_ip>` (resolved once at engine start; if upstream uses DDNS we re-resolve every 30s of failed reconnects).
3. Listener and SOCKS5 client always bind to `127.0.0.1` — even if filter leaks, loopback traffic is excluded by `not (ip.DstAddr >= 127.0.0.0 ...)`.
## UAC + autostart (B1)
### Elevation
`cmd/drover/main.go` startup sequence:
```go
func main() {
	// 1. AttachConsole for CLI compatibility (existing)
	attachConsole()

	// 2. Single-instance check (mutex). If second instance, send "show" to first and exit.
	if !single.AcquireMutex() {
		single.ActivateExistingInstance()
		os.Exit(0)
	}

	// 3. Parse Cobra commands. CLI sub-commands like `--check` and `--version` don't need admin
	//    and can run as user. The default GUI mode requires admin for WinDivert.
	//    Caveat: this process still holds the mutex while the elevated child starts, so the
	//    child must not mistake its non-elevated parent for a running instance.
	if cmdNeedsAdmin() && !uac.IsAdmin() {
		uac.ReElevate(os.Args[1:]) // ShellExecute("runas", ...) + exit
		os.Exit(0)
	}

	// 4. Auto-update check (existing). Replace exe + relaunch if needed.
	autoUpdateOnStartup()

	// 5. Boot Wails GUI + engine.
	gui.Run(Version)
}
```
`uac.ReElevate` uses `ShellExecuteW` with `lpVerb="runas"`. If the user cancels the UAC prompt, `ShellExecuteW` returns `SE_ERR_ACCESSDENIED` and we exit cleanly without an error dialog, since the cancellation was deliberate.
### Autostart
Implemented via `HKCU\Software\Microsoft\Windows\CurrentVersion\Run\DroverGo`:
- Value type: REG_SZ, value: full path to `drover.exe` with no args
- Set on toggle ON, deleted on toggle OFF
- GUI Settings tab has a checkbox "Launch at Windows login" that reads/writes this key
**Edge case A-5**: User disables autostart via Task Manager → Startup Apps. Windows writes a `Disabled` mark in `HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run`. On GUI mount we check both keys; if Disabled → checkbox shown unchecked (user wins).
**Edge case A-6**: Stale path (drover.exe was moved). On every launch we re-write the key value to `os.Executable()` if autostart is enabled. Self-healing.
## Tray + window (D1)
### Tray icon (4 ICO files embedded)
| State | Icon | When shown |
|---|---|---|
| `idle` | grey | Engine not running |
| `active` | green | Engine running, traffic flowing |
| `reconnecting` | yellow | Reconnecting state OR no-traffic-detected |
| `error` | red | Failed state |
### Tray menu (right-click)
```
[●] Active · 2h 14m · ↑ 142 KB/s ↓ 1.2 MB/s [disabled status row, dynamic]
─────────────────────────────────────
[⏸] Stop proxying [primary action, contextual]
[🔍] Run check [opens window + auto-runs check]
─────────────────────────────────────
[🪟] Show window [hidden when window is visible]
[📁] Open log file
─────────────────────────────────────
[🔄] Check for updates
[ℹ] About
─────────────────────────────────────
[✕] Quit
```
The status row is updated every 1s while engine is running.
### Click behaviors
- Single-click tray icon → toggle window visibility
- Double-click tray icon → open window (no toggle, always show)
- X on window title bar → hide to tray (D1)
- First-time only: toast "Drover was minimized to the tray. The engine keeps running. To close it completely, use the tray menu → Quit." Track via `config.ui.shown_tray_toast = true`.
- Quit from tray menu → graceful engine stop → exit cleanly
### Library
`github.com/getlantern/systray`. Stable on Win10/11 modulo the explorer-restart edge case which the library handles internally.
## Single-instance enforcement
Mutex name: `Global\DroverGoInstance-<installID>` where `installID = SHA256(os.Executable())[:16]`. This way:
- Installed copy at `C:\Program Files\Drover\drover.exe` and a portable copy at `D:\portable\drover.exe` get different mutexes — both can run.
- Two simultaneous launches of the same install fight over the mutex; second loses.
Activation pipe: `\\.\pipe\drover-gui-<installID>`. Second instance opens it, writes `{"action":"show"}`, closes. First instance's listener goroutine pops the window to foreground.
If first instance crashes without cleanup → mutex disappears at process death (kernel handle table cleanup). Next launch acquires normally.
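
Deriving the per-install names could look like this sketch. Two details are assumptions, since the spec does not pin them down: `[:16]` is read as the first 16 hex characters of the digest, and the path is lower-cased first so that differently-cased spellings of the same Windows path share one mutex. `installID`, `mutexName`, and `pipeName` are illustrative helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// installID derives a short, stable identifier from the executable path.
func installID(exePath string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(exePath)))
	return hex.EncodeToString(sum[:])[:16]
}

// mutexName is the named mutex used for single-instance enforcement.
func mutexName(exePath string) string {
	return `Global\DroverGoInstance-` + installID(exePath)
}

// pipeName is the activation pipe the second instance writes to.
func pipeName(exePath string) string {
	return `\\.\pipe\drover-gui-` + installID(exePath)
}
```

In the real binary the argument would come from `os.Executable()`; taking it as a parameter keeps the SI-1 unit test (two paths, two mutexes) free of filesystem access.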
## Sleep/resume handling
`WM_POWERBROADCAST` listener via Windows message loop in a dedicated goroutine. Uses `RegisterPowerSettingNotification` for fine-grained events.
| Event | Action |
|---|---|
| `PBT_APMSUSPEND` | Engine: drain in-flight packets (give 200ms), close all SOCKS5 control TCPs, close WinDivert handle, set status="paused (sleep)" |
| `PBT_APMRESUMEAUTOMATIC` or `PBT_APMRESUMESUSPEND` | Wait 5s for network reconnect (poll `GetIpForwardTable2` for default route presence), reopen WinDivert handle, run health-check, transition Active |
## Stats counters
Atomic counters in `internal/engine/stats.go`:
- `bytesIn uint64` — bytes received from upstream (decapsulated UDP + TCP `io.Copy` returns)
- `bytesOut uint64` — bytes sent to upstream
- `tcpFlowsActive int32` — current count of open TCP redirects
- `udpFlowsActive int32` — current count of open UDP flows
- `startedAt time.Time` — engine start time (for uptime)
Per-flow counters discarded on flow close (no aggregation needed for v1).
Tray status row updates from these every 1s. GUI live stats panel does the same via Wails event `stats:update` (existing path).
Lifetime totals persisted to `%PROGRAMDATA%\Drover\stats.json` every 60s and on Stop.
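
A possible shape for the counters, using the typed atomics from `sync/atomic` (Go 1.19 or later). This is a sketch: the method names and the `Snapshot` struct are assumptions, and persistence to `stats.json` is left out.

```go
package main

import (
	"sync/atomic"
	"time"
)

// Stats holds the engine-wide counters. The zero value is usable;
// startedAt is written once at engine start and only read afterwards.
type Stats struct {
	bytesIn        atomic.Uint64
	bytesOut       atomic.Uint64
	tcpFlowsActive atomic.Int32
	udpFlowsActive atomic.Int32
	startedAt      time.Time
}

func (s *Stats) AddIn(n int)         { s.bytesIn.Add(uint64(n)) }
func (s *Stats) AddOut(n int)        { s.bytesOut.Add(uint64(n)) }
func (s *Stats) TCPFlow(delta int32) { s.tcpFlowsActive.Add(delta) }
func (s *Stats) UDPFlow(delta int32) { s.udpFlowsActive.Add(delta) }

// Snapshot is the plain-value copy handed to the tray row and the GUI.
type Snapshot struct {
	BytesIn, BytesOut  uint64
	TCPFlows, UDPFlows int32
	Uptime             time.Duration
}

// Snapshot reads every counter once. The fields are individually atomic,
// not mutually consistent, which is sufficient for a 1s display refresh.
func (s *Stats) Snapshot(now time.Time) Snapshot {
	return Snapshot{
		BytesIn:  s.bytesIn.Load(),
		BytesOut: s.bytesOut.Load(),
		TCPFlows: s.tcpFlowsActive.Load(),
		UDPFlows: s.udpFlowsActive.Load(),
		Uptime:   now.Sub(s.startedAt),
	}
}
```

The per-second rates in the tray row can be computed by the consumer as the difference between two consecutive snapshots.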
## Config schema (TOML)
`%APPDATA%\Drover\config.toml`:
```toml
# Drover-Go config — auto-managed by GUI; manual edits hot-reload via fsnotify.
version = 1
[proxy]
host = "95.165.72.59"
port = 12334
auth = false
login = ""
password = ""
udp_associate_timeout = "5s"
tcp_connect_timeout = "10s"
[targets]
processes = ["Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"]
include_vesktop = false
[skip]
# CIDR ranges to never proxy. Local + link-local always implicitly skipped at filter level.
extra_skip_cidrs = []
multicast = true
[ui]
log_level = "info"
log_max_mb = 10
log_backups = 3
tray_icon = true
auto_start = false # mirror of HKCU\...\Run
shown_tray_toast = false # one-shot first-close toast tracking
theme = "dark" # dark | light | auto
[update]
check_on_startup = true
forgejo_repo = "git.okcu.io/root/drover-go"
[engine]
heartbeat_interval = "5s"
no_traffic_warn_after = "30s"
reconnect_backoff_initial = "1s"
reconnect_backoff_max = "30s"
reconnect_total_cap = "5m"
```
Edge cases:
- **M-4 corrupted TOML** → log warning + use defaults + GUI shows banner "Config error line N — running with defaults".
- **M-7 hot-reload** → fsnotify on the file. On change: re-parse → if proxy section changed → engine restart (Stop → wait clean → Start). Other sections apply live.
- **Config migration** v1→v2 handled by `version` field; missing version assumes 1.
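
The three `reconnect_*` settings in `[engine]` can drive a schedule like the one sketched here. The ×5 growth factor is an assumption (one reading of "1s → 5s → 30s cap"), the cumulative cap counts only the waits and not the time spent in connection attempts, and `backoff`, `Next`, and `Reset` are hypothetical names.

```go
package main

import "time"

// backoff produces reconnect delays until the cumulative cap is reached.
type backoff struct {
	initial, max, totalCap time.Duration
	next, spent            time.Duration
}

func newBackoff(initial, max, totalCap time.Duration) *backoff {
	return &backoff{initial: initial, max: max, totalCap: totalCap, next: initial}
}

// Next returns the delay to wait before the next attempt. ok is false once
// waiting again would exceed totalCap; the engine then transitions to Failed.
func (b *backoff) Next() (delay time.Duration, ok bool) {
	if b.spent+b.next > b.totalCap {
		return 0, false
	}
	delay = b.next
	b.spent += delay
	b.next *= 5 // assumed growth factor
	if b.next > b.max {
		b.next = b.max
	}
	return delay, true
}

// Reset is called after a successful reconnect.
func (b *backoff) Reset() {
	b.next, b.spent = b.initial, 0
}
```

With the defaults (`1s`, `30s`, `5m`) this yields waits of 1s, 5s, 25s and then 30s repeatedly, stopping before the total wait would pass five minutes.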
## Edge case matrix (full)
This is the master list. Every row must have a corresponding test or explicit "verified manually" note in the implementation plan.
| # | Edge case | Mitigation | Test |
|---|---|---|---|
| **D-1** | WinDivert.sys not installed | Embed binary, copy to %PROGRAMDATA%, WinDivertOpen auto-loads | manual: clean Win11 VM |
| **D-2** | Old WinDivert v1.x present (zapret legacy) | Service version query → "remove old version first" error | manual: install zapret first, verify error |
| **D-3** | Driver corrupted | SHA256 verify on extract → reinstall flow with progress | unit test: SHA256 mismatch path |
| **D-4** | AV quarantines our embedded .sys | Specific AV-friendly error message + README link | manual: Defender enabled + first run |
| **D-5** | Reboot pending after install | Show "Reboot to activate driver" | manual: trigger via DISM |
| **D-7** | ARM64 Windows | Detect at startup, refuse install | unit: GOARCH=arm64 build returns expected error |
| **P-1** | Discord PID changes | 2s procscan + filter rebuild | integration: kill+restart Discord, verify continuity |
| **P-3** | Update.exe traffic | Default list includes it | integration: trigger Discord update, verify Update.exe traffic proxied |
| **P-5** | PID re-use | Cosmetic 2s window | accept |
| **L-1** | Self-loop (drover's own SOCKS5 traffic) | Filter excludes own_pid + upstream IP | unit: filter expression builder verifies own PID in output |
| **T-4** | IPv6 Discord targets | Drop at filter level; Happy Eyeballs falls back | manual: verify with `netsh interface ipv6 set route ::/0 disabled` |
| **T-6** | TCP mapping leak | 30min TTL cleanup | unit: TTL sweeper test |
| **U-2** | Idle UDP flow leak | 5min TTL cleanup | unit: TTL sweeper test |
| **U-4** | UDP fragments | Drop (no reassembly; SOCKS5 FRAG is optional and rarely supported by relays) | accept (rare) |
| **A-1** | User non-admin | UAC re-launch on startup | manual: standard user account |
| **A-2** | UAC cancelled | Clean exit, no error dialog | manual: cancel UAC prompt |
| **A-3** | UAC at every login (autostart) | Accepted per B1 | document in README |
| **A-5** | Autostart disabled via Task Manager | Detect StartupApproved key, sync GUI checkbox | unit: registry mock |
| **TR-1** | Tray icon disappears on explorer.exe restart | systray library handles re-attach | manual: kill+restart explorer.exe |
| **TR-3** | First-time tray toast | Track `ui.shown_tray_toast` in config | unit: config writer |
| **SI-1** | Mutex collision portable vs installed | installID = SHA256(exe path)[:16] | unit: two paths → two mutexes |
| **SI-3** | First instance crashed without cleanup | Kernel cleans mutex on process death | manual: kill -9 first, launch second |
| **SR-1** | System sleep | WM_POWERBROADCAST listener → graceful pause | manual: trigger sleep on test machine |
| **SR-2** | System resume | Wait 5s network → reopen handle → resume | manual: wake from sleep |
| **UP-1** | Auto-update during active engine | Graceful shutdown → replace exe → relaunch with prior state | manual: stage v0.1 → v0.2 update during voice call |
| **M-1** | VPN concurrent | WinDivert captures before VPN encapsulation; SOCKS5 traffic to the upstream IP is expected | manual: with WireGuard + Drover both active |
| **M-4** | Config corrupted | Use defaults + warning banner | unit: malformed TOML → defaults applied |
| **M-5** | Proxy IP changed (DDNS) | Re-resolve hostname every 30s of failed reconnect | unit: hostname resolver retry |
| **M-7** | Hot-reload config | fsnotify → engine restart | integration: edit TOML, observe restart |
## Out of scope (Phase 3+)
- DPI bypass / fake QUIC injection (decision **C1**) — add as opt-in toggle in v0.4 if needed
- Windows service mode (decision **A**) — add for power users in v0.4 if requested
- IPv6 SOCKS5 ATYP=04 — add when we hit a v6-only proxy
- ARM64 Windows — add when WinDivert ships ARM64 driver (waiting on basil00 upstream)
- Multi-user PC scenarios — single-user assumption baked in
- Vesktop default-on — stays opt-in via `targets.include_vesktop = true`
- Custom DNS resolver / DNS-over-proxy — out of scope; DNS goes direct, document in README
## Phase 2 milestones
Each milestone is a separate `writing-plans` invocation followed by `subagent-driven-development` execution.
### P2.1 — TCP-only MVP (3-4 days)
**Scope**: WinDivert handle, filter expression, packet parser, TCP NAT-loopback redirect, SOCKS5 client (TCP CONNECT only), procscan, self-loop protection, basic engine state machine (Idle/Starting/Active/Failed without Reconnecting yet).
**Acceptance**:
- Run drover.exe on Win11 with admin
- Discord chat + Discord API requests routed through SOCKS5 (verify via tcpdump on mihomo: should see TCP CONNECT to discord.com:443 from upstream IP)
- Voice does NOT yet work (UDP path absent) — documented expectation
- Stop button cleanly closes everything in <500ms
- Driver remains installed after exit (verify `sc query WinDivert`)
- No self-loop infinite traffic (verify: bytes in == bytes out, not exponentially growing)
### P2.2 — UDP voice (3-4 days)
**Scope**: SOCKS5 UDP ASSOCIATE primitives (production-grade, not the diagnostic-only fork in checker), UDP flow tracker, packet encap/decap, IPv4-fabrication-and-reinject for inbound path.
**Acceptance**:
- Voice call in Discord through proxy works without audible degradation
- About 4 simultaneous voice calls work without flow leakage
- Idle voice flow cleanup at 5min TTL (verified via debug log)
- Mid-call proxy disconnect → flow drops → re-opens within 2s on next outbound packet → ~2-3s audible glitch
- No memory leak after 1h voice call (RSS stable ±5MB)
### P2.3 — E3 recovery + sleep/resume (2 days)
**Scope**: failure classifier, contextual retry policies, Reconnecting state, exponential backoff, WM_POWERBROADCAST listener, heartbeat health-check.
**Acceptance**:
- Stop mihomo on LXC 102 mid-session → engine transitions Active → Reconnecting → Active when mihomo back up (within 30s of recovery)
- Trigger machine sleep mid-voice-call → engine pauses gracefully → wake → engine resumes within 10s after network up → voice continues (Discord client itself reconnects)
- WinDivert handle externally killed (`sc stop WinDivert && sc start WinDivert`) → engine reopens once → if second kill within 30s → Failed with crash log
- Heartbeat detects "no traffic" while Discord open and idle → tray turns yellow with "no traffic" tooltip → no Failed transition
### P2.4 — Tray + autostart + engine UI (2-3 days)
**Scope**: getlantern/systray integration, 4 ICO icons, tray menu (D1 + first-time toast), autostart checkbox in GUI Settings tab, Start/Stop buttons in main window wired to engine, status indicator with state machine awareness, single-instance enforcement.
**Acceptance**:
- Toggle autostart on → reboot → drover launches at login (after UAC accept)
- X on window → first-time toast → second X → silent hide
- Start button only enabled when checker passed (or in Failed state with Retry)
- Tray icon updates within 200ms of state change
- Two simultaneous launches → second activates first's window and exits silently
- Status row in tray menu updates every 1s while Active
### P2.5 — Polish (2-3 days)
**Scope**: crash dumps, config hot-reload via fsnotify, AV-friendly error messages, all remaining edge cases from matrix, README troubleshooting, install/uninstall verification on clean Win11 VM.
**Acceptance**:
- Every edge case in the matrix has either a passing test or a verified manual reproduction note in `docs/testing/p2-edge-cases.md`
- Install on clean Win11 VM, run for 1 hour without intervention, no errors
- Uninstall via Apps & Features removes everything except optionally-kept config (asked at uninstall)
- README has SmartScreen + AV troubleshooting sections with screenshots
**Total**: ~12-16 days to v1.0.0.
## Testing strategy
### Unit tests (per-package)
- `divert/filter`: filter expression builder produces expected strings for various PID lists
- `divert/packet`: parse + serialize + checksum recompute is round-trip identity
- `engine/recovery`: failure classifier returns expected Action for each FailureClass
- `socks5/udp`: encap/decap round-trip
- `procscan`: snapshot diffing, mocked toolhelp32
- `autostart`: registry read/write/disabled-detection (with mock registry)
- `single`: mutex acquire + release lifecycle
- `config`: defaults applied, malformed TOML → defaults + warning, version migration
### Integration tests (each milestone has its own)
- `engine_test.go`: mock WinDivert + mock SOCKS5 server in-process, exercise full pipeline
- `redirect_test.go`: spin up TCP listener, fake Discord client, fake SOCKS5 server, verify bytes flow
### Manual test plan (per milestone, in `docs/testing/p2-<milestone>-manual.md`)
Each manual test case is a numbered step-by-step with expected outcome. Run on clean Win11 VM snapshot before each milestone tag.
### End-to-end (manual, before v1.0.0)
Full user journey in `docs/testing/p2-e2e.md`:
1. Download installer from Forgejo release
2. Install via setup.exe (UAC prompt)
3. First launch: configure proxy, run check, click Start
4. Run Discord, place voice call → verify routing via mihomo logs
5. Toggle autostart on
6. Reboot → verify drover starts at login (UAC accept)
7. Sleep + wake cycle → verify continuity
8. Stop mihomo → verify Reconnecting state → restart mihomo → verify recovery
9. Quit via tray menu → verify clean shutdown
10. Uninstall → verify cleanup
## Open questions / assumptions to validate during P2.1
1. **`imgk/divert-go` v0.1.0 still works with WinDivert v2.2.2?** If not, switch to direct syscall bindings. Verify in P2.1 day 1.
2. **Filter expression length limit** — WinDivert filter expressions have a max length. With 4 Discord PIDs + own PID + upstream IP exclusion + multicast we should be well under, but if user adds 10+ Vesktop variants we might hit it. Verify and document limit during P2.1.
3. **`WinDivertSend` for inbound packets we synthesize** — does the kernel correctly route a fabricated `dst=Discord_IP, src=real_target_IP` packet back to Discord's socket? Most divert-based tools do this; verify in P2.2 day 1 with a tracer.
4. **Embedded ICO size on disk** — 4 icons × ~5KB = 20KB. Negligible.
## Files to read before implementation
- `imgk/shadow/pkg/divert/` — opens handle + read packets pattern (downloaded already)
- `imgk/divert-go` README + `addr.go` — API surface
- `runetfreedom/force-proxy/proxy.cpp` — correct SOCKS5 UDP ASSOCIATE flow (local at `/tmp/drover-cmp/force-proxy/`)
- `wailsapp/wails/v2/examples/react` — Wails patterns for Engine bindings
- This spec.