drover-go/docs/superpowers/specs/2026-05-01-engine-design.md
root 5f107de95d spec: Phase 2 engine — WinDivert + SOCKS5 transparent proxy
Design accepted 2026-05-01. Locks in 5 architectural decisions
(GUI-only, UAC-per-launch, no DPI bypass, hide-to-tray with toast,
contextual recovery) and decomposes Phase 2 into 5 milestones with
explicit acceptance criteria + a 30-row edge case matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:21:16 +03:00

Engine — WinDivert + SOCKS5 transparent proxy for Discord

Status: design accepted 2026-05-01. Replaces: stub StartEngine/StopEngine in internal/gui/app.go that just toggle a flag. Implements: Phase 2 from docs/planning/cuddly-baking-taco.md.

Why

The checker proves the upstream SOCKS5 proxy works. The engine is what actually routes Discord's traffic through it. Without the engine, every diagnostic in the world is theatre — the GUI just sits there saying "Active" while Discord still talks direct to discord.com. Phase 2 turns that "Active" state into reality: kernel-level packet capture (WinDivert), NAT-style TCP redirect to a loopback listener, SOCKS5 UDP ASSOCIATE for voice, and a polished lifecycle so the user can install once, click "autostart at login", and forget the thing exists until Discord stops working — at which point the tray icon turns yellow and explains why.

Architecture decisions (locked-in 2026-05-01)

| # | Decision | Rationale |
|---|----------|-----------|
| A | GUI-only single-process; no Windows service | Friends-and-family Windows-PC, Discord runs only when user is logged in. Service mode is overengineering for v1; can be added in v0.4 if a power user asks. |
| B1 | UAC prompt at every launch; no scheduled-task trampoline | User chose simplicity over polish. Each drover.exe invocation re-elevates if not admin. Autostart via HKCU\...\Run triggers the same prompt at login. |
| C1 | No DPI bypass (no fake QUIC injection) | Start with the simplest pipeline that works. If a friend reports voice not working on a DPI-active provider, add C2/C3 in v0.4. |
| D1 | Window X = hide-to-tray + first-time toast; quit only via tray menu | Industry-standard (Steam, Discord, Telegram). One-shot toast prevents the "where did it go?" surprise. |
| E3 | Contextual recovery: driver-loss → 1 reopen retry → fail-stop; proxy-loss → infinite exp-backoff (Reconnecting state); panic → fail-stop with crash dump; sleep/resume → graceful pause/resume | Different failure classes need different responses. Aggressive auto-restart on every error masks bugs; honest fail-stop on every error annoys the user during transient network blips. |

High-level architecture

                ┌─────────────────────────────────────┐
                │      drover.exe (single binary)     │
                │                                     │
                │  ┌──────────────┐ ┌──────────────┐  │
                │  │ Wails GUI    │ │  systray     │  │
                │  └──────┬───────┘ └──────┬───────┘  │
                │         └───────┬────────┘          │
                │       ┌─────────▼──────────┐        │
                │       │     Engine         │        │
                │       │  state machine     │        │
                │       │  Idle / Starting / │        │
                │       │  Active / Reconn / │        │
                │       │       Failed       │        │
                │       └─────────┬──────────┘        │
                │       ┌─────────┼─────────────┐     │
                │       ▼         ▼             ▼     │
                │   ┌──────┐ ┌────────┐  ┌──────────┐ │
                │   │divert│ │redirect│  │ procscan │ │
                │   │ pkt  │ │ TCP+UDP│  │ (2s tick)│ │
                │   └──┬───┘ └───┬────┘  └────┬─────┘ │
                │      ▼         ▼            │       │
                │   WinDivert  socks5         │       │
                │   .sys       client         │       │
                └──────────────────────────────┼──────┘
                                               │
                ┌────────────┐    ┌─────────────▼───┐
                │   kernel   │    │ upstream SOCKS5 │
                │ packet cap │    │   (mihomo)      │
                └────────────┘    └─────────────────┘

File layout

cmd/drover/
  main.go                     existing — extend with engine startup, single-instance check
  uac_windows.go              new    — IsAdmin, ReElevate
  console_windows.go          existing
  autoupdate_windows.go       existing

internal/engine/
  engine.go                   new — orchestration, state machine, lifecycle
  state.go                    new — Idle/Starting/Active/Reconnecting/Failed enum + transitions
  recovery.go                 new — failure classifier → action mapper
  health.go                   new — heartbeat timer, traffic detector
  power_windows.go            new — WM_POWERBROADCAST listener (sleep/resume)

internal/divert/
  divert.go                   new — WinDivert handle wrapper
  filter.go                   new — filter expression builder
  packet.go                   new — IPv4 + TCP/UDP parse + checksum recompute
  installer.go                new — extract embedded WinDivert.sys/.dll on first run
  divert_arm64.go             new — stub returning "ARM64 not supported"

internal/socks5/                 NEW — production client (separate from internal/checker/socks5.go)
  client.go                   new — TCP CONNECT + greet/auth
  udp.go                      new — UDP ASSOCIATE + encapsulate/decapsulate
  pool.go                     new — control-channel pool (deferred to P2.5 if needed)

internal/redirect/
  tcp.go                      new — NAT-loopback redirect listener + per-flow pump
  udp.go                      new — per-flow UDP tracker + encap/decap

internal/procscan/
  procscan.go                 new — Toolhelp32 snapshot, periodic PID resolver

internal/tray/
  tray.go                     new — getlantern/systray icon + menu
  icons.go                    new — embed idle/active/reconnecting/error ICOs

internal/autostart/
  autostart_windows.go        new — HKCU\...\Run registry toggle

internal/single/
  single_windows.go           new — named mutex + activation pipe

internal/config/
  config.go                   new — TOML schema + defaults
  loader.go                   new — load/save with file lock
  watcher.go                  new — fsnotify hot-reload

internal/gui/
  app.go                      existing — extend with engine bindings
  frontend/...                existing — wire engine controls + autostart checkbox

third_party/windivert/         existing — WinDivert64.sys, WinDivert.dll, LICENSE-LGPL
third_party/icons/             new      — tray/{idle,active,reconnecting,error}.ico

Engine state machine

                ┌────────┐
                │  Idle  │ ◄────────────────── (initial)
                └────┬───┘
                     │ user clicks "Start engine"
                     ▼
              ┌────────────┐
       ┌──────│  Starting  │── any error ───┐
       │      └─────┬──────┘                │
       │            │ all checks ok         │
       │            ▼                       │
       │     ┌────────────┐                 │
       │     │   Active   │ ◄─── recover ─┐ │
       │     └────┬───────┘                 │
       │          │ proxy lost / SOCKS5    │
       │          │  control channels died  │
       │          ▼                         │
       │   ┌─────────────┐                  │
       │   │Reconnecting │── 5 min cap ──┐  │
       │   └────┬────────┘                  │
       │        │ recovered                 │
       │        ▼                           │
       │  back to Active                    │
       │                                    │
       │  Stop button ─►───────────────────┐│
       │                                   ▼▼
       │                              ┌────────┐
       └──── Stop ───────────────────►│ Failed │
                                      └────┬───┘
                                           │ user clicks Retry
                                           ▼
                                       (back to Starting)

States visible to GUI as EngineStatus:

  • Idle — engine off, tray icon grey, GUI shows "Start" button
  • Starting — handle being opened, procscan running, health-check; tray yellow with spin
  • Active — packets flowing; tray green; live stats updating
  • Reconnecting — proxy unreachable, exponential backoff in progress; tray yellow; "Reconnecting (3rd attempt)"
  • Failed — driver lost twice OR panic OR Reconnecting hit 5 min cap. Tray red. GUI shows error message + Retry button.
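The diagram and state list above can be sketched as a transition table. This is only an illustration: the State names come from the list above, while the Event type, its values, and the Next function are hypothetical names chosen for this sketch. The Stop arrows are left out because the diagram does not make their target state unambiguous.

```go
package main

import "fmt"

// State mirrors the EngineStatus values listed above.
type State int

const (
	Idle State = iota
	Starting
	Active
	Reconnecting
	Failed
)

// Event is a hypothetical label for one arrow in the diagram.
type Event int

const (
	EvStart      Event = iota // user clicks "Start engine"
	EvChecksOK                // all startup checks passed
	EvError                   // driver lost twice, panic, or startup error
	EvProxyLost               // SOCKS5 control channels died
	EvRecovered               // proxy reachable again
	EvBackoffCap              // Reconnecting hit the 5 min cap
	EvRetry                   // user clicks Retry in Failed
)

// transitions lists every allowed (state, event) pair; anything else is rejected.
var transitions = map[State]map[Event]State{
	Idle:         {EvStart: Starting},
	Starting:     {EvChecksOK: Active, EvError: Failed},
	Active:       {EvProxyLost: Reconnecting, EvError: Failed},
	Reconnecting: {EvRecovered: Active, EvBackoffCap: Failed},
	Failed:       {EvRetry: Starting},
}

// Next returns the state after e, or (s, false) if the transition is not allowed.
func Next(s State, e Event) (State, bool) {
	if to, ok := transitions[s][e]; ok {
		return to, true
	}
	return s, false
}

func main() {
	s := Idle
	for _, e := range []Event{EvStart, EvChecksOK, EvProxyLost, EvRecovered} {
		s, _ = Next(s, e)
	}
	fmt.Println("back in Active:", s == Active)
}
```

Keeping the table as data makes the "expected Action/State for each input" unit tests a simple loop over the map.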

E3 recovery rules (failure classifier)

// internal/engine/recovery.go

type FailureClass int
const (
    ClassDriverLost      FailureClass = iota // WinDivert handle invalid, ERROR_INVALID_HANDLE on Recv
    ClassDriverGone                          // WinDivertOpen returns ERROR_FILE_NOT_FOUND or similar
    ClassProxyUnreachable                    // SOCKS5 control TCP connection rejected/timeout
    ClassPanic                               // recover() in goroutine
    ClassSleep                               // WM_POWERBROADCAST suspend
    ClassResume                              // WM_POWERBROADCAST resume
    ClassFatal                               // anything we can't classify
)

type Action int
const (
    ActionRetryOnce  Action = iota  // sleep 2s, reopen, if fails again → Failed
    ActionExpBackoff                // 1s → 5s → 30s cap; keeps retrying until 5min cumulative, then Failed
    ActionFailStop                  // straight to Failed, write crash dump
    ActionPause                     // drain in-flight, close sockets, transition to Reconnecting
    ActionResume                    // wait 5s, reopen handle, transition to Active
)

func ClassifyFailure(err error, class FailureClass) Action

| Class | Action | UI feedback |
|-------|--------|-------------|
| DriverLost | RetryOnce | Status="reopening driver" |
| DriverGone | FailStop | "Driver missing — reinstall Drover" |
| ProxyUnreachable | ExpBackoff | "Reconnecting (Nth attempt)…" |
| Panic | FailStop | "Engine crashed — log saved to %PROGRAMDATA%\Drover\logs\crash-*.txt" |
| Sleep | Pause | "Paused (system sleep)" |
| Resume | Resume | "Resuming…" then back to Active |
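A minimal sketch of a ClassifyFailure body that follows the class-to-action table above. The type and constant names are copied from recovery.go; treating ClassFatal (which the table does not list) as fail-stop is an assumption of this sketch, as is ignoring the err argument.

```go
package main

import (
	"errors"
	"fmt"
)

type FailureClass int

const (
	ClassDriverLost FailureClass = iota
	ClassDriverGone
	ClassProxyUnreachable
	ClassPanic
	ClassSleep
	ClassResume
	ClassFatal
)

type Action int

const (
	ActionRetryOnce Action = iota
	ActionExpBackoff
	ActionFailStop
	ActionPause
	ActionResume
)

// ClassifyFailure maps a failure class to the action in the table above.
// err is accepted to match the spec's signature; this sketch only uses class.
func ClassifyFailure(err error, class FailureClass) Action {
	switch class {
	case ClassDriverLost:
		return ActionRetryOnce
	case ClassProxyUnreachable:
		return ActionExpBackoff
	case ClassSleep:
		return ActionPause
	case ClassResume:
		return ActionResume
	case ClassDriverGone, ClassPanic:
		return ActionFailStop
	default:
		// Assumption: ClassFatal and any unknown class fail-stop.
		return ActionFailStop
	}
}

func main() {
	err := errors.New("socks5: dial timeout")
	fmt.Println(ClassifyFailure(err, ClassProxyUnreachable) == ActionExpBackoff)
}
```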

Health-check before Start engine: GUI's Start button first runs internal/checker.Run with a reduced subset (tcp + greet + udp tests, 2s budget, no voice-quality). If any fails, the engine doesn't start and the GUI shows what failed. Prevents the "I clicked Start but Discord still doesn't work" mystery.

Heartbeat timer: every 5s, check whether the receive counter grew since the previous sample (rxBytes_now - rxBytes_5sAgo > 0). If it has not grown for 30s while the engine is Active and procscan reports at least one Discord PID, set status to "Active (no traffic)". This is an informational sub-state: the tray goes green→yellow, but the state machine stays in Active. The user sees this and can investigate (Discord might just be idle).

Crash dumps: panic recover in any engine goroutine writes %PROGRAMDATA%\Drover\logs\crash-YYYYMMDD-HHMMSS.txt with full stack + goroutine dump + version. Then transitions to Failed.

WinDivert layer

Filter expression (rebuilt on PID list change)

outbound and (tcp or udp) and ip
  and (processId == 12345 or processId == 67890 or ...)
  and processId != <own_pid>
  and ip.DstAddr != <upstream_proxy_ip>
  and not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)
  and not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)
  and not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)

Notes:

  • ip (IPv4) only — no ipv6 clause. Discord client falls back to v4 in ~150ms via Happy Eyeballs.
  • processId != own_pid is critical — without it our own SOCKS5 traffic to upstream gets caught and infinite-looped.
  • Multicast/loopback/link-local explicitly excluded (Discord never talks to those, but extra safety).

If the upstream proxy IP cannot be resolved at engine start, we fail-stop with a clear message — we cannot build a correct filter without it.

Library choice

Use github.com/imgk/divert-go v0.1.0 (existing dep proposal; verify it is still maintained when implementing P2.1). If it is unmaintained or broken, write thin syscall bindings directly: the WinDivert C API is small (~6 functions used).

Driver lifecycle

  1. First run: extract embedded WinDivert64.sys + WinDivert.dll from Go embed.FS into %PROGRAMDATA%\Drover\windivert\. SHA256-verify against expected hashes (compiled in at build time).
  2. Open handle: WinDivertOpen(filter, layer=NETWORK, priority=0, flags=0). The driver auto-installs as a Windows service named "WinDivert" on first open.
  3. Driver remains installed across reboots — we don't uninstall on Stop. Uninstaller (Inno Setup) explicitly does sc stop WinDivert && sc delete WinDivert on uninstall.

Driver edge cases (D-series in matrix)

  • D-1: not installed → embedded copy + auto-install on WinDivertOpen.
  • D-2: old v1.x (zapret legacy) → WinDivertOpen returns ERROR_DRIVER_FAILED_PRIOR_UNLOAD. Detect: query service "WinDivert" via OpenServiceW + QueryServiceStatusEx to read binary path → check version resource. Show "Outdated WinDivert detected from another tool. Stop the other tool and reboot."
  • D-3: corrupted .sys → SHA256 mismatch on extract. Reinstall path (delete + recopy + retry).
  • D-4: AV quarantine → embedded bytes don't match expected → show specific error: "Antivirus may have quarantined WinDivert64.sys. Add %PROGRAMDATA%\Drover\ to your AV exclusions and restart Drover."
  • D-5: reboot pending → install successful but service not started → show "Reboot required to activate driver" with no retry button.
  • D-7: ARM64 → runtime.GOARCH check at startup; on ARM64 show "Drover requires x86-64 Windows. WinDivert does not support ARM64."

TCP redirect (NAT-loopback)

Mechanism

  1. On engine start, bind a TCP listener on 127.0.0.1:0 (OS picks unused port). Save the port number.
  2. WinDivert sees a new SYN from Discord.exe → real_target_ip:real_target_port. Engine:
     a. Rewrites the destination: dst_addr = 127.0.0.1 in the IP header, dst_port = listener_port in the TCP header. Stores mapping (src_port → real_target_ip:port) in a sync.Map with TTL 30 min.
     b. Recomputes IP + TCP checksums.
     c. Reinjects via WinDivertSend with direction=outbound. The kernel routes to loopback because dst is now 127.0.0.1.
  3. Listener accept() returns a conn from 127.0.0.1:src_port. Engine looks up mapping by src_port, finds real_target.
  4. Engine opens fresh SOCKS5 control TCP to upstream, does greet + (auth if config) + CONNECT to real_target_ip:port.
  5. Once SOCKS5 returns REP=00, io.Copy pumps bytes both directions until EOF on either side.
  6. Conn close → drop mapping.

TCP edge cases

  • T-1: listener bind fails → fail-stop "could not bind loopback listener". Should never happen (random unused port).
  • T-2: 100+ concurrent flows — sync.Map scales fine. Bound only by Discord's TCP usage (typically 50).
  • T-3: TCP retransmits — handled by OS at both sides of the loopback.
  • T-4: IPv6 — dropped at filter level. Discord falls back to v4.
  • T-5: half-closed → io.Copy returns on EOF in one direction; we close the other side via defer conn.Close().
  • T-6: mapping leak if conn never properly closes — TTL 30min sweeper goroutine deletes stale entries.

UDP redirect (SOCKS5 UDP ASSOCIATE)

Mechanism

  1. WinDivert sees outbound UDP from Discord.exe:src_port → real_target_ip:port. Engine:
     a. Looks up mapping by (src_ip, src_port, real_target_ip, real_target_port). If absent:
     b. Open new SOCKS5 control TCP to upstream. Greet + (auth) + UDP ASSOCIATE.
     c. Receive relay endpoint (relay_ip, relay_port); if BND.ADDR is 0.0.0.0, substitute upstream_proxy_ip.
     d. Open client-side UDP socket on 127.0.0.1:0. Save mapping flow_id → {control_tcp, relay, client_udp}.
  2. Outbound packet path: encap with SOCKS5 UDP header 00 00 | 00 | ATYP=01 | DST_IP(4) | DST_PORT(2) | DATA. Send via client_udp.WriteTo(packet, relay). Don't reinject the original packet — drop it (we sent the encapsulated version through the relay).
  3. Inbound packet path (separate goroutine per flow): client_udp.ReadFrom(buf) → strip 10-byte SOCKS5 header → fabricate an IPv4+UDP packet with src=real_target_ip:port, dst=Discord_src_ip:src_port, recompute checksums → WinDivertSend direction=inbound. Discord sees a normal reply from real_target.
  4. Idle TTL 5 min: any flow with no packets for 5 min → close control_tcp + client_udp + remove mapping.

UDP edge cases

  • U-1: each flow gets its own control TCP. No pool in v1 (overhead is ~5KB per flow, fine for ~10 active flows).
  • U-2: idle leak → 5min TTL.
  • U-3: Discord changes voice region mid-call → old flow goes idle (5min TTL), new flow opens. Brief glitch.
  • U-4: UDP fragments → drop. Fragment handling via the SOCKS5 FRAG field is optional in RFC 1928 and is not implemented here. Discord packets are typically <1500 bytes; fragmentation is rare.
  • U-5: control TCP dies → next packet detects via Write error → close mapping → next-next packet opens fresh control. Audio glitch ~2-3s.

Process scanning

Mechanism

internal/procscan runs every 2 seconds:

  1. CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0) → enumerate via Process32First/Process32Next. Microseconds.
  2. Filter by szExeFile against config targets.processes (case-insensitive on Windows).
  3. Diff vs previous PID set. If different → notify engine to rebuild filter expression and reopen WinDivert handle.
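Steps 2 and 3 are pure logic once the snapshot has been taken, so they can be sketched without Toolhelp32. The proc struct, matchPIDs, and changed are hypothetical names; the snapshot slice stands in for the Process32First/Process32Next enumeration.

```go
package main

import (
	"fmt"
	"strings"
)

// proc is one row of a process snapshot (PID plus szExeFile).
type proc struct {
	PID uint32
	Exe string
}

// matchPIDs returns the PIDs whose executable name matches a configured
// target, ignoring case as Windows file names do.
func matchPIDs(snapshot []proc, targets []string) map[uint32]bool {
	out := make(map[uint32]bool)
	for _, p := range snapshot {
		for _, t := range targets {
			if strings.EqualFold(p.Exe, t) {
				out[p.PID] = true
				break
			}
		}
	}
	return out
}

// changed reports whether the PID set differs from the previous scan,
// which is the trigger for rebuilding the filter.
func changed(prev, cur map[uint32]bool) bool {
	if len(prev) != len(cur) {
		return true
	}
	for pid := range cur {
		if !prev[pid] {
			return true
		}
	}
	return false
}

func main() {
	targets := []string{"Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"}
	scan1 := matchPIDs([]proc{{100, "discord.exe"}, {200, "chrome.exe"}}, targets)
	scan2 := matchPIDs([]proc{{300, "Discord.exe"}, {200, "chrome.exe"}}, targets)
	fmt.Println(len(scan1), changed(scan1, scan2))
}
```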

Race: Discord starts up to 2s before procscan catches it

Mitigation: at engine Start, do synchronous initial scan before opening WinDivert handle. After that, the periodic 2s tick handles ongoing changes.

Process edge cases

  • P-1: Discord PID changes → 2s scan + 50ms reopen gap with direct traffic. Acceptable.
  • P-2: multiple Discord variants: default config includes Discord.exe, DiscordCanary.exe, DiscordPTB.exe, Update.exe. Vesktop opt-in via config (not default).
  • P-3: Update.exe (Discord's updater) included in default — it downloads patches via HTTP and we want those proxied too.
  • P-5: PID re-use (Discord exits, Chrome takes the PID before next scan) → 2s window where Chrome packets get proxied. Cosmetic, low-impact.

Self-loop protection

The engine itself opens TCP/UDP connections to the upstream proxy. Without protection, the WinDivert filter would catch our own packets and encapsulate them in another SOCKS5 layer, creating an infinite loop within seconds.

Three layers of defense:

  1. processId != own_pid in the filter expression.
  2. ip.DstAddr != <upstream_proxy_ip> (resolved once at engine start; if upstream uses DDNS we re-resolve every 30s of failed reconnects).
  3. Listener and SOCKS5 client always bind to 127.0.0.1 — even if filter leaks, loopback traffic is excluded by not (ip.DstAddr >= 127.0.0.0 ...).

UAC + autostart (B1)

Elevation

cmd/drover/main.go startup sequence:

func main() {
    // 1. AttachConsole for CLI compatibility (existing)
    attachConsole()

    // 2. Parse Cobra commands. CLI sub-commands like `--check` and `--version` don't need admin
    //    and can run as user. The default GUI mode requires admin for WinDivert.
    //    Elevation runs before the mutex check: otherwise the elevated child could start
    //    while the unelevated parent still holds the mutex and exit as a "second instance".
    if cmdNeedsAdmin() && !uac.IsAdmin() {
        uac.ReElevate(os.Args[1:]) // ShellExecute("runas", ...)
        os.Exit(0)
    }

    // 3. Single-instance check (mutex). If second instance, send "show" to first and exit.
    if !single.AcquireMutex() {
        single.ActivateExistingInstance()
        os.Exit(0)
    }

    // 4. Auto-update check (existing). Replace exe + relaunch if needed.
    autoUpdateOnStartup()

    // 5. Boot Wails GUI + engine.
    gui.Run(Version)
}

uac.ReElevate uses ShellExecuteW with lpVerb="runas". If the user cancels the UAC prompt, ShellExecute returns SE_ERR_ACCESSDENIED and we exit cleanly without an error dialog: the user cancelled deliberately, so no further message is needed.

Autostart

Implemented via HKCU\Software\Microsoft\Windows\CurrentVersion\Run\DroverGo:

  • Value type: REG_SZ, value: full path to drover.exe with no args
  • Set on toggle ON, deleted on toggle OFF
  • GUI Settings tab has a checkbox "Launch at Windows sign-in" that reads/writes this key

Edge case A-5: User disables autostart via Task Manager → Startup Apps. Windows writes a Disabled mark in HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run. On GUI mount we check both keys; if Disabled → checkbox shown unchecked (user wins).

Edge case A-6: Stale path (drover.exe was moved). On every launch we re-write the key value to os.Executable() if autostart is enabled. Self-healing.

Tray + window (D1)

Tray icon (4 ICO files embedded)

| State | Icon | When shown |
|-------|------|------------|
| idle | grey | Engine not running |
| active | green | Engine running, traffic flowing |
| reconnecting | yellow | Reconnecting state OR no-traffic-detected |
| error | red | Failed state |

Tray menu (right-click)

[●] Active · 2h 14m · ↑ 142 KB/s ↓ 1.2 MB/s     [disabled status row, dynamic]
─────────────────────────────────────
[⏸] Stop proxying                                [primary action, contextual]
[🔍] Run check                                   [opens window + auto-runs check]
─────────────────────────────────────
[🪟] Show window                                 [hidden when window is visible]
[📁] Open log file
─────────────────────────────────────
[🔄] Check for updates
[ℹ] About
─────────────────────────────────────
[✕] Quit

The status row is updated every 1s while engine is running.

Click behaviors

  • Single-click tray icon → toggle window visibility
  • Double-click tray icon → open window (no toggle, always show)
  • X on window title bar → hide to tray (D1)
    • First-time only: toast "Drover was minimized to the tray. The engine keeps running. To close it completely, use the tray menu → Quit." Track via config.ui.shown_tray_toast = true.
  • Quit from tray menu → graceful engine stop → exit cleanly

Library

github.com/getlantern/systray. Stable on Win10/11 modulo the explorer-restart edge case which the library handles internally.

Single-instance enforcement

Mutex name: Global\DroverGoInstance-<installID> where installID = SHA256(os.Executable())[:16]. This way:

  • Installed copy at C:\Program Files\Drover\drover.exe and a portable copy at D:\portable\drover.exe get different mutexes — both can run.
  • Two simultaneous launches of the same install fight over the mutex; second loses.

Activation pipe: \\.\pipe\drover-gui-<installID>. Second instance opens it, writes {"action":"show"}, closes. First instance's listener goroutine pops the window to foreground.

If first instance crashes without cleanup → mutex disappears at process death (kernel handle table cleanup). Next launch acquires normally.
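A small sketch of how the per-install names might be derived. It assumes "SHA256(...)[:16]" means the first 16 hex characters of the digest of the executable path string, used as-is with no case or separator normalisation; installID, mutexName, and pipeName are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// installID is the first 16 hex characters of SHA256(exePath).
// Assumption: the path string is hashed without any normalisation.
func installID(exePath string) string {
	sum := sha256.Sum256([]byte(exePath))
	return hex.EncodeToString(sum[:])[:16]
}

// mutexName is the named-mutex identifier for this install.
func mutexName(exePath string) string {
	return `Global\DroverGoInstance-` + installID(exePath)
}

// pipeName is the activation pipe a second instance writes to.
func pipeName(exePath string) string {
	return `\\.\pipe\drover-gui-` + installID(exePath)
}

func main() {
	installed := `C:\Program Files\Drover\drover.exe`
	portable := `D:\portable\drover.exe`
	fmt.Println(mutexName(installed))
	fmt.Println(mutexName(portable))
	fmt.Println(pipeName(installed))
}
```

In the real binary the path would come from os.Executable(); it is a parameter here so the two-paths-two-mutexes case (SI-1) can be tested directly.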

Sleep/resume handling

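WM_POWERBROADCAST listener via a Windows message loop in a dedicated goroutine, pinned to its OS thread with runtime.LockOSThread because window messages are delivered to the thread that created the window. Uses RegisterPowerSettingNotification for fine-grained events.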

| Event | Action |
|-------|--------|
| PBT_APMSUSPEND | Engine: drain in-flight packets (give 200ms), close all SOCKS5 control TCPs, close WinDivert handle, set status="paused (sleep)" |
| PBT_APMRESUMEAUTOMATIC or PBT_APMRESUMESUSPEND | Wait 5s for network reconnect (poll GetIpForwardTable2 for default route presence), reopen WinDivert handle, run health-check, transition Active |

Stats counters

Atomic counters in internal/engine/stats.go:

  • bytesIn uint64 — bytes received from upstream (decapsulated UDP + TCP io.Copy returns)
  • bytesOut uint64 — bytes sent to upstream
  • tcpFlowsActive int32 — current count of open TCP redirects
  • udpFlowsActive int32 — current count of open UDP flows
  • startedAt time.Time — engine start time (for uptime)

Per-flow counters discarded on flow close (no aggregation needed for v1).

Tray status row updates from these every 1s. GUI live stats panel does the same via Wails event stats:update (existing path).

Lifetime totals persisted to %PROGRAMDATA%\Drover\stats.json every 60s and on Stop.

Config schema (TOML)

%APPDATA%\Drover\config.toml:

# Drover-Go config — auto-managed by GUI; manual edits hot-reload via fsnotify.

version = 1

[proxy]
host = "95.165.72.59"
port = 12334
auth = false
login = ""
password = ""
udp_associate_timeout = "5s"
tcp_connect_timeout   = "10s"

[targets]
processes = ["Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"]
include_vesktop = false

[skip]
# CIDR ranges to never proxy. Local + link-local always implicitly skipped at filter level.
extra_skip_cidrs = []
multicast = true

[ui]
log_level     = "info"
log_max_mb    = 10
log_backups   = 3
tray_icon     = true
auto_start    = false             # mirror of HKCU\...\Run
shown_tray_toast = false          # one-shot first-close toast tracking
theme = "dark"                    # dark | light | auto

[update]
check_on_startup = true
forgejo_repo = "git.okcu.io/root/drover-go"

[engine]
heartbeat_interval = "5s"
no_traffic_warn_after = "30s"
reconnect_backoff_initial = "1s"
reconnect_backoff_max     = "30s"
reconnect_total_cap       = "5m"

Edge cases:

  • M-4 corrupted TOML → log warning + use defaults + GUI shows banner "Config error line N — running with defaults".
  • M-7 hot-reload → fsnotify on the file. On change: re-parse → if proxy section changed → engine restart (Stop → wait clean → Start). Other sections apply live.
  • Config migration v1→v2 handled by version field; missing version assumes 1.

Edge case matrix (full)

This is the master list. Every row must have a corresponding test or explicit "verified manually" note in the implementation plan.

| # | Edge case | Mitigation | Test |
|---|-----------|------------|------|
| D-1 | WinDivert.sys not installed | Embed binary, copy to %PROGRAMDATA%, WinDivertOpen auto-loads | manual: clean Win11 VM |
| D-2 | Old WinDivert v1.x present (zapret legacy) | Service version query → "remove old version first" error | manual: install zapret first, verify error |
| D-3 | Driver corrupted | SHA256 verify on extract → reinstall flow with progress | unit test: SHA256 mismatch path |
| D-4 | AV quarantines our embedded .sys | Specific AV-friendly error message + README link | manual: Defender enabled + first run |
| D-5 | Reboot pending after install | Show "Reboot to activate driver" | manual: trigger via DISM |
| D-7 | ARM64 Windows | Detect at startup, refuse install | unit: GOARCH=arm64 build returns expected error |
| P-1 | Discord PID changes | 2s procscan + filter rebuild | integration: kill+restart Discord, verify continuity |
| P-3 | Update.exe traffic | Default list includes it | integration: trigger Discord update, verify Update.exe traffic proxied |
| P-5 | PID re-use | Cosmetic 2s window | accept |
| L-1 | Self-loop (drover's own SOCKS5 traffic) | Filter excludes own_pid + upstream IP | unit: filter expression builder verifies own PID in output |
| T-4 | IPv6 Discord targets | Drop at filter level; Happy Eyeballs falls back | manual: verify with netsh interface ipv6 set route ::/0 disabled |
| T-6 | TCP mapping leak | 30min TTL cleanup | unit: TTL sweeper test |
| U-2 | Idle UDP flow leak | 5min TTL cleanup | unit: TTL sweeper test |
| U-4 | UDP fragments | Drop (SOCKS5 FRAG not implemented) | accept (rare) |
| A-1 | User non-admin | UAC re-launch on startup | manual: standard user account |
| A-2 | UAC cancelled | Clean exit, no error dialog | manual: cancel UAC prompt |
| A-3 | UAC at every login (autostart) | Accepted per B1 | document in README |
| A-5 | Autostart disabled via Task Manager | Detect StartupApproved key, sync GUI checkbox | unit: registry mock |
| TR-1 | Tray icon disappears on explorer.exe restart | systray library handles re-attach | manual: kill+restart explorer.exe |
| TR-3 | First-time tray toast | Track ui.shown_tray_toast in config | unit: config writer |
| SI-1 | Mutex collision portable vs installed | installID = SHA256(exe path)[:16] | unit: two paths → two mutexes |
| SI-3 | First instance crashed without cleanup | Kernel cleans mutex on process death | manual: kill -9 first, launch second |
| SR-1 | System sleep | WM_POWERBROADCAST listener → graceful pause | manual: trigger sleep on test machine |
| SR-2 | System resume | Wait 5s network → reopen handle → resume | manual: wake from sleep |
| UP-1 | Auto-update during active engine | Graceful shutdown → replace exe → relaunch with prior state | manual: stage v0.1 → v0.2 update during voice call |
| M-1 | VPN concurrent | WinDivert captures before VPN encapsulation; SOCKS5 traffic to the upstream IP is expected | manual: with WireGuard + Drover both active |
| M-4 | Config corrupted | Use defaults + warning banner | unit: malformed TOML → defaults applied |
| M-5 | Proxy IP changed (DDNS) | Re-resolve hostname every 30s of failed reconnect | unit: hostname resolver retry |
| M-7 | Hot-reload config | fsnotify → engine restart | integration: edit TOML, observe restart |

Out of scope (Phase 3+)

  • DPI bypass / fake QUIC injection (decision C1) — add as opt-in toggle in v0.4 if needed
  • Windows service mode (decision A) — add for power users in v0.4 if requested
  • IPv6 SOCKS5 ATYP=04 — add when we hit a v6-only proxy
  • ARM64 Windows — add when WinDivert ships ARM64 driver (waiting on basil00 upstream)
  • Multi-user PC scenarios — single-user assumption baked in
  • Vesktop default-on — stays opt-in via targets.include_vesktop = true
  • Custom DNS resolver / DNS-over-proxy — out of scope; DNS goes direct, document in README

Phase 2 milestones

Each milestone is a separate writing-plans invocation followed by subagent-driven-development execution.

P2.1 — TCP-only MVP (3-4 days)

Scope: WinDivert handle, filter expression, packet parser, TCP NAT-loopback redirect, SOCKS5 client (TCP CONNECT only), procscan, self-loop protection, basic engine state machine (Idle/Starting/Active/Failed without Reconnecting yet).

Acceptance:

  • Run drover.exe on Win11 with admin
  • Discord chat + Discord API requests routed through SOCKS5 (verify via tcpdump on mihomo: should see TCP CONNECT to discord.com:443 from upstream IP)
  • Voice does NOT yet work (UDP path absent) — documented expectation
  • Stop button cleanly closes everything in <500ms
  • Driver remains installed after exit (verify sc query WinDivert)
  • No self-loop infinite traffic (verify: bytes in == bytes out, not exponentially growing)

P2.2 — UDP voice (3-4 days)

Scope: SOCKS5 UDP ASSOCIATE primitives (production-grade, not the diagnostic-only fork in checker), UDP flow tracker, packet encap/decap, IPv4-fabrication-and-reinject for inbound path.

Acceptance:

  • Voice call in Discord through proxy works without audible degradation
  • Up to about 4 simultaneous voice calls work without flow leakage
  • Idle voice flow cleanup at 5min TTL (verified via debug log)
  • Mid-call proxy disconnect → flow drops → re-opens within 2s on next outbound packet → ~2-3s audible glitch
  • No memory leak after 1h voice call (RSS stable ±5MB)

P2.3 — E3 recovery + sleep/resume (2 days)

Scope: failure classifier, contextual retry policies, Reconnecting state, exponential backoff, WM_POWERBROADCAST listener, heartbeat health-check.

Acceptance:

  • Stop mihomo on LXC 102 mid-session → engine transitions Active → Reconnecting → Active when mihomo back up (within 30s of recovery)
  • Trigger machine sleep mid-voice-call → engine pauses gracefully → wake → engine resumes within 10s after network up → voice continues (Discord client itself reconnects)
  • WinDivert handle externally killed (sc stop WinDivert && sc start WinDivert) → engine reopens once → if second kill within 30s → Failed with crash log
  • Heartbeat detects "no traffic" while Discord open and idle → tray turns yellow with "no traffic" tooltip → no Failed transition

P2.4 — Tray + autostart + engine UI (2-3 days)

Scope: getlantern/systray integration, 4 ICO icons, tray menu (D1 + first-time toast), autostart checkbox in GUI Settings tab, Start/Stop buttons in main window wired to engine, status indicator with state machine awareness, single-instance enforcement.

Acceptance:

  • Toggle autostart on → reboot → drover launches at login (after UAC accept)
  • X on window → first-time toast → second X → silent hide
  • Start button only enabled when checker passed (or in Failed state with Retry)
  • Tray icon updates within 200ms of state change
  • Two simultaneous launches → second activates first's window and exits silently
  • Status row in tray menu updates every 1s while Active

P2.5 — Polish (2-3 days)

Scope: crash dumps, config hot-reload via fsnotify, AV-friendly error messages, all remaining edge cases from matrix, README troubleshooting, install/uninstall verification on clean Win11 VM.

Acceptance:

  • Every edge case in the matrix has either a passing test or a verified manual reproduction note in docs/testing/p2-edge-cases.md
  • Install on clean Win11 VM, run for 1 hour without intervention, no errors
  • Uninstall via Apps & Features removes everything except optionally-kept config (asked at uninstall)
  • README has SmartScreen + AV troubleshooting sections with screenshots

Total: ~12-16 days to v1.0.0.

Testing strategy

Unit tests (per-package)

  • divert/filter: filter expression builder produces expected strings for various PID lists
  • divert/packet: parse + serialize + checksum recompute is round-trip identity
  • engine/recovery: failure classifier returns expected Action for each FailureClass
  • socks5/udp: encap/decap round-trip
  • procscan: snapshot diffing, mocked toolhelp32
  • autostart: registry read/write/disabled-detection (with mock registry)
  • single: mutex acquire + release lifecycle
  • config: defaults applied, malformed TOML → defaults + warning, version migration

Integration tests (each milestone has its own)

  • engine_test.go: mock WinDivert + mock SOCKS5 server in-process, exercise full pipeline
  • redirect_test.go: spin up TCP listener, fake Discord client, fake SOCKS5 server, verify bytes flow

Manual test plan (per milestone, in docs/testing/p2-<milestone>-manual.md)

Each manual test case is a numbered step-by-step with expected outcome. Run on clean Win11 VM snapshot before each milestone tag.

End-to-end (manual, before v1.0.0)

Full user journey in docs/testing/p2-e2e.md:

  1. Download installer from Forgejo release
  2. Install via setup.exe (UAC prompt)
  3. First launch: configure proxy, run check, click Start
  4. Run Discord, place voice call → verify routing via mihomo logs
  5. Toggle autostart on
  6. Reboot → verify drover starts at login (UAC accept)
  7. Sleep + wake cycle → verify continuity
  8. Stop mihomo → verify Reconnecting state → restart mihomo → verify recovery
  9. Quit via tray menu → verify clean shutdown
  10. Uninstall → verify cleanup

Open questions / assumptions to validate during P2.1

  1. imgk/divert-go v0.1.0 still works with WinDivert v2.2.2? If not, switch to direct syscall bindings. Verify in P2.1 day 1.
  2. Filter expression length limit — WinDivert filter expressions have a max length. With 4 Discord PIDs + own PID + upstream IP exclusion + multicast we should be well under, but if user adds 10+ Vesktop variants we might hit it. Verify and document limit during P2.1.
  3. WinDivertSend for inbound packets we synthesize — does the kernel correctly route a fabricated dst=Discord_IP, src=real_target_IP packet back to Discord's socket? Most divert-based tools do this; verify in P2.2 day 1 with a tracer.
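  4. Embedded ICO size on disk: 4 icons × ~5KB = 20KB. Negligible.
  5. processId at layer=NETWORK: the WinDivert 2.x filter language documents processId as a field of the FLOW, SOCKET and REFLECT layers, not NETWORK. Verify on P2.1 day 1 that the filter above compiles for a NETWORK-layer handle; if it does not, the PID match has to come from a separate SOCKET- or FLOW-layer handle that feeds a port/5-tuple list into the NETWORK filter.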

Files to read before implementation

  • imgk/shadow/pkg/divert/ — opens handle + read packets pattern (downloaded already)
  • imgk/divert-go README + addr.go — API surface
  • runetfreedom/force-proxy/proxy.cpp — correct SOCKS5 UDP ASSOCIATE flow (local at /tmp/drover-cmp/force-proxy/)
  • wailsapp/wails/v2/examples/react — Wails patterns for Engine bindings
  • This spec.