spec: Phase 2 engine — WinDivert + SOCKS5 transparent proxy
Design accepted 2026-05-01. Locks in 5 architectural decisions (GUI-only, UAC-per-launch, no DPI bypass, hide-to-tray with toast, contextual recovery) and decomposes Phase 2 into 5 milestones with explicit acceptance criteria + a 30-row edge case matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,651 @@
# Engine — WinDivert + SOCKS5 transparent proxy for Discord

**Status**: design accepted 2026-05-01.
**Replaces**: stub `StartEngine`/`StopEngine` in `internal/gui/app.go` that just toggle a flag.
**Implements**: Phase 2 from `docs/planning/cuddly-baking-taco.md`.

## Why

The checker proves the upstream SOCKS5 proxy works. The engine is what
actually routes Discord's traffic through it. Without the engine, every
diagnostic in the world is theatre — the GUI just sits there saying
"Active" while Discord still talks direct to discord.com. Phase 2 turns
that "Active" state into reality: kernel-level packet capture (WinDivert),
NAT-style TCP redirect to a loopback listener, SOCKS5 UDP ASSOCIATE for
voice, and a polished lifecycle so the user can install once, click
"autostart at login", and forget the thing exists until Discord stops
working — at which point the tray icon turns yellow and explains why.

## Architecture decisions (locked-in 2026-05-01)

| # | Decision | Rationale |
|---|---|---|
| **A** | GUI-only single-process; no Windows service | Friends-and-family Windows-PC, Discord runs only when user is logged in. Service mode is overengineering for v1; can be added in v0.4 if a power user asks. |
| **B1** | UAC prompt at every launch; no scheduled-task trampoline | User chose simplicity over polish. Each `drover.exe` invocation re-elevates if not admin. Autostart via `HKCU\...\Run` triggers the same prompt at login. |
| **C1** | No DPI bypass (no fake QUIC injection) | Start with the simplest pipeline that works. If a friend reports voice not working on a DPI-active provider, add C2/C3 in v0.4. |
| **D1** | Window X = hide-to-tray + first-time toast; quit only via tray menu | Industry-standard (Steam, Discord, Telegram). One-shot toast prevents the "where did it go?" surprise. |
| **E3** | Contextual recovery: driver-loss → 1 reopen retry → fail-stop; proxy-loss → infinite exp-backoff (Reconnecting state); panic → fail-stop with crash dump; sleep/resume → graceful pause/resume | Different failure classes need different responses. Aggressive auto-restart on every error masks bugs; honest fail-stop on every error annoys the user during transient network blips. |

## High-level architecture

```
┌─────────────────────────────────────┐
│  drover.exe (single binary)         │
│                                     │
│  ┌──────────────┐ ┌──────────────┐  │
│  │  Wails GUI   │ │   systray    │  │
│  └──────┬───────┘ └──────┬───────┘  │
│         └───────┬────────┘          │
│       ┌─────────▼──────────┐        │
│       │  Engine            │        │
│       │  state machine     │        │
│       │  Idle / Starting / │        │
│       │  Active / Reconn / │        │
│       │  Failed            │        │
│       └─────────┬──────────┘        │
│       ┌─────────┼─────────────┐     │
│       ▼         ▼             ▼     │
│  ┌──────┐  ┌────────┐  ┌──────────┐ │
│  │divert│  │redirect│  │ procscan │ │
│  │ pkt  │  │ TCP+UDP│  │ (2s tick)│ │
│  └──┬───┘  └───┬────┘  └────┬─────┘ │
│     ▼          ▼            │       │
│  WinDivert  socks5          │       │
│  .sys       client          │       │
└─────────────────────────────┼───────┘
                              │
┌────────────┐  ┌─────────────▼───┐
│ kernel     │  │ upstream SOCKS5 │
│ packet cap │  │ (mihomo)        │
└────────────┘  └─────────────────┘
```

## File layout

```
cmd/drover/
  main.go                 existing — extend with engine startup, single-instance check
  uac_windows.go          new — IsAdmin, ReElevate
  console_windows.go      existing
  autoupdate_windows.go   existing

internal/engine/
  engine.go               new — orchestration, state machine, lifecycle
  state.go                new — Idle/Starting/Active/Reconnecting/Failed enum + transitions
  recovery.go             new — failure classifier → action mapper
  health.go               new — heartbeat timer, traffic detector
  power_windows.go        new — WM_POWERBROADCAST listener (sleep/resume)

internal/divert/
  divert.go               new — WinDivert handle wrapper
  filter.go               new — filter expression builder
  packet.go               new — IPv4 + TCP/UDP parse + checksum recompute
  installer.go            new — extract embedded WinDivert.sys/.dll on first run
  divert_arm64.go         new — stub returning "ARM64 not supported"

internal/socks5/          NEW — production client (separate from internal/checker/socks5.go)
  client.go               new — TCP CONNECT + greet/auth
  udp.go                  new — UDP ASSOCIATE + encapsulate/decapsulate
  pool.go                 new — control-channel pool (deferred to P2.5 if needed)

internal/redirect/
  tcp.go                  new — NAT-loopback redirect listener + per-flow pump
  udp.go                  new — per-flow UDP tracker + encap/decap

internal/procscan/
  procscan.go             new — Toolhelp32 snapshot, periodic PID resolver

internal/tray/
  tray.go                 new — getlantern/systray icon + menu
  icons.go                new — embed idle/active/reconnecting/error ICOs

internal/autostart/
  autostart_windows.go    new — HKCU\...\Run registry toggle

internal/single/
  single_windows.go       new — named mutex + activation pipe

internal/config/
  config.go               new — TOML schema + defaults
  loader.go               new — load/save with file lock
  watcher.go              new — fsnotify hot-reload

internal/gui/
  app.go                  existing — extend with engine bindings
  frontend/...            existing — wire engine controls + autostart checkbox

third_party/windivert/    existing — WinDivert64.sys, WinDivert.dll, LICENSE-LGPL
third_party/icons/        new — tray/{idle,active,reconnecting,error}.ico
```

## Engine state machine

```
        ┌────────┐
        │  Idle  │ ◄────────────────── (initial)
        └────┬───┘
             │ user clicks "Start engine"
             ▼
       ┌────────────┐
┌──────│  Starting  │── any error ─────┐
│      └─────┬──────┘                  │
│            │ all checks ok           │
│            ▼                         │
│      ┌────────────┐                  │
│      │   Active   │ ◄─── recover ──┐ │
│      └────┬───────┘                  │
│           │ proxy lost / SOCKS5      │
│           │ control channels died    │
│           ▼                          │
│      ┌─────────────┐                 │
│      │Reconnecting │── 5 min cap ──┐ │
│      └────┬────────┘                 │
│           │ recovered                │
│           ▼                          │
│      back to Active                  │
│                                      │
│      Stop button ─►─────────────────┐│
│                                     ▼▼
│                                 ┌────────┐
└──── Stop ──────────────────────►│ Failed │
                                  └────┬───┘
                                       │ user clicks Retry
                                       ▼
                              (back to Starting)
```

States visible to GUI as `EngineStatus`:
- `Idle` — engine off, tray icon grey, GUI shows "Start" button
- `Starting` — handle being opened, procscan running, health-check; tray yellow with spin
- `Active` — packets flowing; tray green; live stats updating
- `Reconnecting` — proxy unreachable, exponential backoff in progress; tray yellow; "Reconnecting (3rd attempt)"
- `Failed` — driver lost twice OR panic OR Reconnecting hit 5 min cap. Tray red. GUI shows error message + Retry button.

## E3 recovery rules (failure classifier)

```go
// internal/engine/recovery.go

type FailureClass int

const (
	ClassDriverLost       FailureClass = iota // WinDivert handle invalid, ERROR_INVALID_HANDLE on Recv
	ClassDriverGone                           // WinDivertOpen returns ERROR_FILE_NOT_FOUND or similar
	ClassProxyUnreachable                     // SOCKS5 control TCP connection rejected/timeout
	ClassPanic                                // recover() in goroutine
	ClassSleep                                // WM_POWERBROADCAST suspend
	ClassResume                               // WM_POWERBROADCAST resume
	ClassFatal                                // anything we can't classify
)

type Action int

const (
	ActionRetryOnce  Action = iota // sleep 2s, reopen, if fails again → Failed
	ActionExpBackoff               // 1s → 5s → 30s cap; keeps retrying until 5min cumulative, then Failed
	ActionFailStop                 // straight to Failed, write crash dump
	ActionPause                    // drain in-flight, close sockets, transition to Reconnecting
	ActionResume                   // wait 5s, reopen handle, transition to Active
)

// Signature only; the class → action mapping is given by the table that follows.
func ClassifyFailure(err error, class FailureClass) Action
```

| Class | Action | UI feedback |
|---|---|---|
| `DriverLost` | RetryOnce | Status="reopening driver" |
| `DriverGone` | FailStop | "Driver missing — reinstall Drover" |
| `ProxyUnreachable` | ExpBackoff | "Reconnecting (Nth attempt)…" |
| `Panic` | FailStop | "Engine crashed — log saved to %PROGRAMDATA%\\Drover\\logs\\crash-*.txt" |
| `Sleep` | Pause | "Paused (system sleep)" |
| `Resume` | Resume | "Resuming…" then back to Active |
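
The table above can be read as a plain lookup from failure class to action. The following is a minimal sketch of that lookup, not the spec's implementation: `actionFor` is a hypothetical name, the enums are redeclared only to keep the snippet self-contained, and sending `ClassFatal` to fail-stop is an assumption, since the table has no row for it.

```go
package main

// FailureClass and Action mirror the enums sketched for recovery.go.
type FailureClass int

const (
	ClassDriverLost FailureClass = iota
	ClassDriverGone
	ClassProxyUnreachable
	ClassPanic
	ClassSleep
	ClassResume
	ClassFatal
)

type Action int

const (
	ActionRetryOnce Action = iota
	ActionExpBackoff
	ActionFailStop
	ActionPause
	ActionResume
)

// actionFor (hypothetical helper) maps a failure class to its recovery
// action, row by row from the table. Anything not listed there, including
// ClassFatal, falls through to ActionFailStop; that default is an assumption.
func actionFor(class FailureClass) Action {
	switch class {
	case ClassDriverLost:
		return ActionRetryOnce
	case ClassProxyUnreachable:
		return ActionExpBackoff
	case ClassSleep:
		return ActionPause
	case ClassResume:
		return ActionResume
	default: // ClassDriverGone, ClassPanic, ClassFatal
		return ActionFailStop
	}
}
```

Keeping the mapping in one `switch` means a new failure class that nobody has classified yet ends in the most conservative action rather than in a silent retry.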

**Health-check before Start engine**: GUI's Start button first runs `internal/checker.Run` with a reduced subset (tcp + greet + udp tests, 2s budget, no voice-quality). If any fails, the engine doesn't start and the GUI shows what failed. Prevents the "I clicked Start but Discord still doesn't work" mystery.

**Heartbeat timer**: every 5s, sample `(rxBytes_now - rxBytes_5sAgo) > 0`. If false for 30s while Active and procscan reports Discord PIDs > 0, set status=`Active (no traffic)` (informational sub-state, tray green→yellow but state machine stays in Active). User sees this and can investigate (Discord might just be idle).
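
The heartbeat rule boils down to counting how long the rx counter has stayed flat. Here is one possible sketch of that counter; `trafficMonitor`, `newTrafficMonitor` and `Sample` are hypothetical names, and the 5s / 30s values are simply the defaults quoted in the paragraph above.

```go
package main

import "time"

// trafficMonitor (hypothetical) tracks how long the cumulative rx byte
// counter has been flat across heartbeat ticks.
type trafficMonitor struct {
	interval  time.Duration // time between samples (5s in the spec)
	warnAfter time.Duration // flat period that triggers the warning (30s)
	lastBytes uint64
	flatFor   time.Duration
}

func newTrafficMonitor() *trafficMonitor {
	return &trafficMonitor{interval: 5 * time.Second, warnAfter: 30 * time.Second}
}

// Sample is meant to be called once per heartbeat with the current rx total.
// It returns true when the "Active (no traffic)" sub-state should be shown:
// the engine is Active, procscan sees at least one Discord PID, and the
// counter has not grown for warnAfter.
func (m *trafficMonitor) Sample(rxBytes uint64, active bool, discordPIDs int) bool {
	if rxBytes > m.lastBytes {
		m.flatFor = 0 // traffic seen since the previous tick
	} else {
		m.flatFor += m.interval
	}
	m.lastBytes = rxBytes
	return active && discordPIDs > 0 && m.flatFor >= m.warnAfter
}
```

The monitor only reports; the state machine stays in `Active` either way, which matches the informational nature of the sub-state.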

**Crash dumps**: panic recover in any engine goroutine writes `%PROGRAMDATA%\Drover\logs\crash-YYYYMMDD-HHMMSS.txt` with full stack + goroutine dump + version. Then transitions to Failed.

## WinDivert layer

### Filter expression (rebuilt on PID list change)

```
outbound and (tcp or udp) and ip
  and (processId == 12345 or processId == 67890 or ...)
  and processId != <own_pid>
  and ip.DstAddr != <upstream_proxy_ip>
  and not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)
  and not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)
  and not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)
```

Notes:
- `ip` (IPv4) only — no `ipv6` clause. Discord client falls back to v4 in ~150ms via Happy Eyeballs.
- `processId != own_pid` is critical — without it our own SOCKS5 traffic to upstream gets caught and infinite-looped.
- Multicast/loopback/link-local explicitly excluded (Discord never talks to those, but extra safety).

If the upstream proxy IP cannot be resolved at engine start, we fail-stop with a clear message — we cannot build a correct filter without it.
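
A filter builder for the expression above could look roughly like this. It is a sketch under assumptions: `buildFilter` is a hypothetical name, and returning `false` for an empty PID list is a choice made here, not something the spec states. The string is assembled exactly as the spec writes it; whether WinDivert accepts `processId` on a NETWORK-layer handle should be checked against the WinDivert filter-language documentation, which lists that field for the FLOW, SOCKET and REFLECT layers.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFilter (hypothetical) renders the spec's filter expression for the
// given target PIDs, our own PID, and the resolved upstream proxy IPv4.
func buildFilter(pids []uint32, ownPID uint32, upstreamIP string) string {
	if len(pids) == 0 {
		// Assumption: with no target process running, capture nothing.
		return "false"
	}
	terms := make([]string, len(pids))
	for i, pid := range pids {
		terms[i] = fmt.Sprintf("processId == %d", pid)
	}
	clauses := []string{
		"outbound and (tcp or udp) and ip",
		"(" + strings.Join(terms, " or ") + ")",
		fmt.Sprintf("processId != %d", ownPID),
		"ip.DstAddr != " + upstreamIP,
		"not (ip.DstAddr >= 224.0.0.0 and ip.DstAddr <= 239.255.255.255)", // multicast
		"not (ip.DstAddr >= 127.0.0.0 and ip.DstAddr <= 127.255.255.255)", // loopback
		"not (ip.DstAddr >= 169.254.0.0 and ip.DstAddr <= 169.254.255.255)", // link-local
	}
	return strings.Join(clauses, " and ")
}
```

Because the builder is a pure function of its inputs, the L-1 unit test from the matrix (own PID present in the output) needs no driver and no admin rights.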

### Library choice

Use `github.com/imgk/divert-go` v0.1.0 (existing dep proposal; verify it is still maintained when implementing P2.1). If it is unmaintained or broken, write thin syscall bindings directly: the WinDivert C API is small (~6 functions used).

### Driver lifecycle

1. **First run**: extract embedded `WinDivert64.sys` + `WinDivert.dll` from Go `embed.FS` into `%PROGRAMDATA%\Drover\windivert\`. SHA256-verify against expected hashes (compiled in at build time).
2. **Open handle**: `WinDivertOpen(filter, layer=NETWORK, priority=0, flags=0)`. The driver auto-installs as a Windows service named "WinDivert" on first open.
3. **Driver remains installed across reboots** — we don't uninstall on Stop. Uninstaller (Inno Setup) explicitly does `sc stop WinDivert && sc delete WinDivert` on uninstall.
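
The SHA256 check in step 1 can be sketched as below. `verifySHA256` and `errHashMismatch` are hypothetical names; how the expected digests get compiled in is left open here, so the function just takes the expected hex string as a parameter.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

// errHashMismatch (hypothetical) signals that extracted bytes differ from
// the digest that was baked in at build time.
var errHashMismatch = errors.New("sha256 mismatch")

// verifySHA256 hashes data and compares it with the expected lowercase or
// uppercase hex digest. A mismatch is what D-3 and D-4 would react to.
func verifySHA256(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != strings.ToLower(wantHex) {
		return errHashMismatch
	}
	return nil
}
```

Running the same check on the bytes read back from disk after extraction, and not only on the embedded bytes, is what lets the installer notice a file that was altered or removed after it was written.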

### Driver edge cases (D-series in matrix)

- **D-1: not installed** → embedded copy + auto-install on WinDivertOpen.
- **D-2: old v1.x** (zapret legacy) → `WinDivertOpen` returns `ERROR_DRIVER_FAILED_PRIOR_UNLOAD`. Detect: query service "WinDivert" via `OpenServiceW` + `QueryServiceStatusEx` to read binary path → check version resource. Show "Outdated WinDivert detected from another tool. Stop the other tool and reboot."
- **D-3: corrupted .sys** → SHA256 mismatch on extract. Reinstall path (delete + recopy + retry).
- **D-4: AV quarantine** → embedded bytes don't match expected → show specific error: "Antivirus may have quarantined WinDivert64.sys. Add `%PROGRAMDATA%\Drover\` to your AV exclusions and restart Drover."
- **D-5: reboot pending** → install successful but service not started → show "Reboot required to activate driver" with no retry button.
- **D-7: ARM64** → `runtime.GOARCH` check at startup; on ARM64 show "Drover requires x86-64 Windows. WinDivert does not support ARM64."

## TCP redirect (NAT-loopback)

### Mechanism

1. On engine start, bind a TCP listener on `127.0.0.1:0` (OS picks unused port). Save the port number.
2. WinDivert sees a new SYN from `Discord.exe → real_target_ip:real_target_port`. Engine:
   a. Rewrites the destination: `dst_addr = 127.0.0.1` in the IP header and `dst_port = listener_port` in the TCP header. Stores mapping `(src_port → real_target_ip:port)` in a `sync.Map` with TTL 30 min.
   b. Recomputes IP + TCP checksums.
   c. Reinjects via `WinDivertSend` with direction=outbound. The kernel routes to loopback because dst is now 127.0.0.1.
3. Listener `accept()` returns a conn from `127.0.0.1:src_port`. Engine looks up mapping by `src_port`, finds real_target.
4. Engine opens fresh SOCKS5 control TCP to upstream, does greet + (auth if config) + CONNECT to real_target_ip:port.
5. Once SOCKS5 returns REP=00, `io.Copy` pumps bytes both directions until EOF on either side.
6. Conn close → drop mapping.
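
Steps 2a and 2b hinge on rewriting header fields and then fixing the checksums. The sketch below covers only the IPv4 half: it swaps the destination address and recomputes the header checksum with the RFC 1071 one's-complement sum. `ipv4Checksum` and `rewriteDst` are hypothetical names. The TCP destination port and TCP checksum (whose pseudo-header also contains the destination address) would need the same treatment and are omitted for brevity; WinDivert's own `WinDivertHelperCalcChecksums` helper may be a simpler route if the chosen binding exposes it.

```go
package main

import "encoding/binary"

// ipv4Checksum returns the RFC 1071 checksum over hdr. The checksum field
// (bytes 10-11) must be zero when computing a fresh value; over a header
// with a valid checksum in place the result is 0.
func ipv4Checksum(hdr []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(hdr); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(hdr[i:]))
	}
	for sum>>16 != 0 { // fold carries back into the low 16 bits
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// rewriteDst (hypothetical) sets the IPv4 destination address in place and
// refreshes the header checksum. It reports false for anything that is not
// a well-formed IPv4 header.
func rewriteDst(pkt []byte, dst [4]byte) bool {
	if len(pkt) < 20 || pkt[0]>>4 != 4 {
		return false
	}
	ihl := int(pkt[0]&0x0f) * 4 // header length in bytes
	if ihl < 20 || len(pkt) < ihl {
		return false
	}
	copy(pkt[16:20], dst[:])
	pkt[10], pkt[11] = 0, 0
	binary.BigEndian.PutUint16(pkt[10:12], ipv4Checksum(pkt[:ihl]))
	return true
}
```

A quick self-check for any rewritten packet is to run `ipv4Checksum` over the finished header: a result of 0 means the stored checksum is consistent with the new address.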

### TCP edge cases

- **T-1: listener bind fails** → fail-stop "could not bind loopback listener". Should never happen (random unused port).
- **T-2: 100+ concurrent flows** — sync.Map scales fine. Bound only by Discord's TCP usage (typically 50).
- **T-3: TCP retransmits** — handled by OS at both sides of the loopback.
- **T-4: IPv6** — dropped at filter level. Discord falls back to v4.
- **T-5: half-closed** — `io.Copy` returns on EOF in one direction; we close the other side via `defer conn.Close()`.
- **T-6: mapping leak** if conn never properly closes — TTL 30min sweeper goroutine deletes stale entries.
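
The T-6 sweeper can be illustrated with a small table keyed by source port. This sketch deliberately swaps the spec's `sync.Map` for a plain map behind a mutex, because the sweep has to read a timestamp and delete in one pass; `flowTable` and its methods are hypothetical names, and timestamps are passed in as Unix seconds so the logic can be tested without a real clock.

```go
package main

import "sync"

// flowEntry holds the real destination for one redirected TCP flow.
type flowEntry struct {
	target   string // "ip:port" of the real destination
	lastSeen int64  // Unix seconds of the last activity
}

// flowTable (hypothetical) maps loopback source port → real target.
type flowTable struct {
	mu      sync.Mutex
	entries map[uint16]flowEntry
}

func newFlowTable() *flowTable {
	return &flowTable{entries: make(map[uint16]flowEntry)}
}

// Put records or refreshes the mapping for srcPort.
func (t *flowTable) Put(srcPort uint16, target string, now int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.entries[srcPort] = flowEntry{target: target, lastSeen: now}
}

// Lookup returns the real target for srcPort, if one is recorded.
func (t *flowTable) Lookup(srcPort uint16) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[srcPort]
	return e.target, ok
}

// Sweep deletes entries idle for at least ttl seconds and returns the
// number removed. A ticker goroutine would call it periodically.
func (t *flowTable) Sweep(now, ttl int64) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	removed := 0
	for port, e := range t.entries {
		if now-e.lastSeen >= ttl {
			delete(t.entries, port)
			removed++
		}
	}
	return removed
}
```

With a 30 min TTL the sweeper would be called as `Sweep(now, 1800)`; the same shape works for the 5 min UDP idle TTL with a different key type.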

## UDP redirect (SOCKS5 UDP ASSOCIATE)

### Mechanism

1. WinDivert sees outbound UDP from `Discord.exe:src_port → real_target_ip:port`. Engine:
   a. Looks up mapping by `(src_ip, src_port, real_target_ip, real_target_port)`. If absent:
   b. **Open new SOCKS5 control TCP** to upstream. Greet + (auth) + UDP ASSOCIATE.
   c. Receive relay endpoint `(relay_ip, relay_port)` — if BND.ADDR is `0.0.0.0` substitute `upstream_proxy_ip`.
   d. Open client-side UDP socket on `127.0.0.1:0`. Save mapping `flow_id → {control_tcp, relay, client_udp}`.
2. **Outbound packet path**: encap with SOCKS5 UDP header `00 00 | 00 | ATYP=01 | DST_IP(4) | DST_PORT(2) | DATA`. Send via `client_udp.WriteTo(packet, relay)`. Don't reinject the original packet — drop it (we sent the encapsulated version through the relay).
3. **Inbound packet path** (separate goroutine per flow): `client_udp.ReadFrom(buf)` → strip 10-byte SOCKS5 header → fabricate an IPv4+UDP packet with `src=real_target_ip:port, dst=Discord_src_ip:src_port`, recompute checksums → `WinDivertSend` direction=inbound. Discord sees a normal reply from real_target.
4. Idle TTL 5 min: any flow with no packets for 5 min → close control_tcp + client_udp + remove mapping.
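
The 10-byte header used in steps 2 and 3 is small enough to show in full. The sketch below handles only the IPv4 form (ATYP=01) described above; `encapUDP` and `decapUDP` are hypothetical names, and rejecting non-zero FRAG on the inbound side is an assumption that follows the drop policy for fragments.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// encapUDP prepends the SOCKS5 UDP request header for an IPv4 target:
// RSV(2)=0 | FRAG(1)=0 | ATYP(1)=0x01 | DST.ADDR(4) | DST.PORT(2) | DATA.
func encapUDP(dstIP [4]byte, dstPort uint16, payload []byte) []byte {
	out := make([]byte, 10, 10+len(payload)) // RSV and FRAG stay zero
	out[3] = 0x01
	copy(out[4:8], dstIP[:])
	binary.BigEndian.PutUint16(out[8:10], dstPort)
	return append(out, payload...)
}

// decapUDP strips the header from a datagram received from the relay and
// returns the original sender plus the payload (a sub-slice of pkt).
func decapUDP(pkt []byte) (srcIP [4]byte, srcPort uint16, payload []byte, err error) {
	if len(pkt) < 10 {
		return srcIP, 0, nil, errors.New("socks5 udp: datagram shorter than header")
	}
	if pkt[2] != 0 {
		return srcIP, 0, nil, errors.New("socks5 udp: fragmented datagram not supported")
	}
	if pkt[3] != 0x01 {
		return srcIP, 0, nil, errors.New("socks5 udp: only ATYP=01 (IPv4) handled")
	}
	copy(srcIP[:], pkt[4:8])
	srcPort = binary.BigEndian.Uint16(pkt[8:10])
	return srcIP, srcPort, pkt[10:], nil
}
```

On the inbound path the returned address and port are what go into the fabricated IPv4+UDP packet as its source, so Discord sees the reply as coming from the real target.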

### UDP edge cases

- **U-1**: each flow gets its own control TCP. No pool in v1 (overhead is ~5KB per flow, fine for ~10 active flows).
- **U-2: idle leak** → 5min TTL.
- **U-3: Discord changes voice region** mid-call → old flow goes idle (5min TTL), new flow opens. Brief glitch.
- **U-4: UDP fragments** → RFC 1928 defines a FRAG field, but fragmentation support is optional and relays that lack it must drop such datagrams, so we don't rely on it. Drop. Discord packets are typically <1500 bytes; fragmentation rare.
- **U-5: control TCP dies** → next packet detects via `Write` error → close mapping → next-next packet opens fresh control. Audio glitch ~2-3s.

## Process scanning

### Mechanism

`internal/procscan` runs every 2 seconds:
1. `CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)` → enumerate via `Process32First`/`Process32Next`. Microseconds.
2. Filter by `szExeFile` against config `targets.processes` (case-insensitive on Windows).
3. Diff vs previous PID set. If different → notify engine to rebuild filter expression and reopen WinDivert handle.
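
Steps 2 and 3 are pure logic and can be sketched without touching the Toolhelp32 API. `matchesTarget` and `pidSetChanged` are hypothetical helper names; the snippet assumes each snapshot yields a list of unique PIDs, which holds for a single process enumeration.

```go
package main

import "strings"

// matchesTarget reports whether exe (szExeFile from the snapshot) equals
// one of the configured target names, ignoring case.
func matchesTarget(exe string, targets []string) bool {
	for _, t := range targets {
		if strings.EqualFold(exe, t) {
			return true
		}
	}
	return false
}

// pidSetChanged reports whether cur differs from prev as a set, regardless
// of order. Both slices are assumed to contain no duplicates.
func pidSetChanged(prev, cur []uint32) bool {
	if len(prev) != len(cur) {
		return true
	}
	seen := make(map[uint32]struct{}, len(prev))
	for _, pid := range prev {
		seen[pid] = struct{}{}
	}
	for _, pid := range cur {
		if _, ok := seen[pid]; !ok {
			return true
		}
	}
	return false
}
```

Only a `true` result from `pidSetChanged` should trigger the filter rebuild, so an unchanged Discord process tree costs nothing beyond the snapshot itself.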

### Race: Discord starts up to 2s before procscan catches it

Mitigation: at engine `Start`, do **synchronous initial scan** before opening WinDivert handle. After that, the periodic 2s tick handles ongoing changes.

### Process edge cases

- **P-1: Discord PID changes** → 2s scan + 50ms reopen gap with direct traffic. Acceptable.
- **P-2: multiple Discord variants**: default config includes `Discord.exe`, `DiscordCanary.exe`, `DiscordPTB.exe`, `Update.exe`. Vesktop **opt-in** via config (not default).
- **P-3: Update.exe** (Discord's updater) included in default — it downloads patches via HTTP and we want those proxied too.
- **P-5: PID re-use** (Discord exits, Chrome takes the PID before next scan) → 2s window where Chrome packets get proxied. Cosmetic, low-impact.

## Self-loop protection

The engine itself opens TCP/UDP connections to the upstream proxy. Without protection, the WinDivert filter would catch our own packets and encapsulate them in another SOCKS5 layer, creating an infinite loop within seconds.

Three layers of defense:

1. `processId != own_pid` in the filter expression.
2. `ip.DstAddr != <upstream_proxy_ip>` (resolved once at engine start; if upstream uses DDNS we re-resolve every 30s of failed reconnects).
3. Listener and SOCKS5 client always bind to `127.0.0.1` — even if filter leaks, loopback traffic is excluded by `not (ip.DstAddr >= 127.0.0.0 ...)`.

## UAC + autostart (B1)

### Elevation

`cmd/drover/main.go` startup sequence:

```go
func main() {
	// 1. AttachConsole for CLI compatibility (existing)
	attachConsole()

	// 2. Single-instance check (mutex). If second instance, send "show" to first and exit.
	if !single.AcquireMutex() {
		single.ActivateExistingInstance()
		os.Exit(0)
	}

	// 3. Parse Cobra commands. CLI sub-commands like `--check` and `--version` don't need admin
	//    and can run as user. The default GUI mode requires admin for WinDivert.
	if cmdNeedsAdmin() && !uac.IsAdmin() {
		uac.ReElevate(os.Args[1:]) // ShellExecute("runas", ...) + exit
		os.Exit(0)
	}

	// 4. Auto-update check (existing). Replace exe + relaunch if needed.
	autoUpdateOnStartup()

	// 5. Boot Wails GUI + engine.
	gui.Run(Version)
}
```

`uac.ReElevate` uses `ShellExecuteW` with `lpVerb="runas"`. If the user cancels UAC, `ShellExecuteW` returns `SE_ERR_ACCESSDENIED` and we exit cleanly without an error dialog (the cancellation was the user's own choice, so there is nothing to explain).

### Autostart

Implemented via `HKCU\Software\Microsoft\Windows\CurrentVersion\Run\DroverGo`:
- Value type: REG_SZ, value: full path to `drover.exe` with no args
- Set on toggle ON, deleted on toggle OFF
- GUI Settings tab has a checkbox "Запускать при входе в Windows" ("Launch at Windows sign-in") that reads/writes this key

**Edge case A-5**: User disables autostart via Task Manager → Startup Apps. Windows writes a `Disabled` mark in `HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\StartupApproved\Run`. On GUI mount we check both keys; if Disabled → checkbox shown unchecked (user wins).

**Edge case A-6**: Stale path (drover.exe was moved). On every launch we re-write the key value to `os.Executable()` if autostart is enabled. Self-healing.

## Tray + window (D1)

### Tray icon (4 ICO files embedded)

| State | Icon | When shown |
|---|---|---|
| `idle` | grey | Engine not running |
| `active` | green | Engine running, traffic flowing |
| `reconnecting` | yellow | Reconnecting state OR no-traffic-detected |
| `error` | red | Failed state |

### Tray menu (right-click)

```
[●] Active · 2h 14m · ↑ 142 KB/s ↓ 1.2 MB/s    [disabled status row, dynamic]
─────────────────────────────────────
[⏸] Stop proxying                              [primary action, contextual]
[🔍] Run check                                  [opens window + auto-runs check]
─────────────────────────────────────
[🪟] Show window                                [hidden when window is visible]
[📁] Open log file
─────────────────────────────────────
[🔄] Check for updates
[ℹ] About
─────────────────────────────────────
[✕] Quit
```

The status row is updated every 1s while engine is running.

### Click behaviors

- Single-click tray icon → toggle window visibility
- Double-click tray icon → open window (no toggle, always show)
- X on window title bar → hide to tray (D1)
  - First-time only: toast "Drover свёрнут в трей. Engine продолжает работать. Закрыть полностью — через меню трея → Quit." (English: "Drover has been minimized to the tray. The engine keeps running. To close it completely, use the tray menu → Quit.") Track via `config.ui.shown_tray_toast = true`.
- Quit from tray menu → graceful engine stop → exit cleanly

### Library

`github.com/getlantern/systray`. Stable on Win10/11 modulo the explorer-restart edge case which the library handles internally.

## Single-instance enforcement

Mutex name: `Global\DroverGoInstance-<installID>` where `installID = SHA256(os.Executable())[:16]`. This way:
- Installed copy at `C:\Program Files\Drover\drover.exe` and a portable copy at `D:\portable\drover.exe` get different mutexes — both can run.
- Two simultaneous launches of the same install fight over the mutex; second loses.

Activation pipe: `\\.\pipe\drover-gui-<installID>`. Second instance opens it, writes `{"action":"show"}`, closes. First instance's listener goroutine pops the window to foreground.

If first instance crashes without cleanup → mutex disappears at process death (kernel handle table cleanup). Next launch acquires normally.
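
Deriving the per-install names is a few lines of hashing. The sketch below reads `[:16]` as the first 16 hex characters of the digest, which is an assumption (the spec could equally mean 16 bytes), and it hashes the path exactly as given, without case or separator normalization. `installID`, `mutexName` and `pipeName` are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// installID returns the first 16 hex characters of SHA-256(exePath).
// Assumption: the path is hashed verbatim, so "C:\X" and "c:\x" differ.
func installID(exePath string) string {
	sum := sha256.Sum256([]byte(exePath))
	return hex.EncodeToString(sum[:])[:16]
}

// mutexName builds the named-mutex identifier for this install.
func mutexName(exePath string) string {
	return `Global\DroverGoInstance-` + installID(exePath)
}

// pipeName builds the activation pipe path for this install.
func pipeName(exePath string) string {
	return `\\.\pipe\drover-gui-` + installID(exePath)
}
```

Since Windows paths are case-insensitive, an implementation may want to normalize the result of `os.Executable()` before hashing, so that the same install launched through differently cased paths still maps to one mutex.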

## Sleep/resume handling

`WM_POWERBROADCAST` listener via Windows message loop in a dedicated goroutine. Uses `RegisterPowerSettingNotification` for fine-grained events.

| Event | Action |
|---|---|
| `PBT_APMSUSPEND` | Engine: drain in-flight packets (give 200ms), close all SOCKS5 control TCPs, close WinDivert handle, set status="paused (sleep)" |
| `PBT_APMRESUMEAUTOMATIC` or `PBT_APMRESUMESUSPEND` | Wait 5s for network reconnect (poll `GetIpForwardTable2` for default route presence), reopen WinDivert handle, run health-check, transition Active |

## Stats counters

Atomic counters in `internal/engine/stats.go`:
- `bytesIn uint64` — bytes received from upstream (decapsulated UDP + TCP `io.Copy` returns)
- `bytesOut uint64` — bytes sent to upstream
- `tcpFlowsActive int32` — current count of open TCP redirects
- `udpFlowsActive int32` — current count of open UDP flows
- `startedAt time.Time` — engine start time (for uptime)

Per-flow counters discarded on flow close (no aggregation needed for v1).

Tray status row updates from these every 1s. GUI live stats panel does the same via Wails event `stats:update` (existing path).

Lifetime totals persisted to `%PROGRAMDATA%\Drover\stats.json` every 60s and on Stop.

## Config schema (TOML)

`%APPDATA%\Drover\config.toml`:

```toml
# Drover-Go config — auto-managed by GUI; manual edits hot-reload via fsnotify.

version = 1

[proxy]
host = "95.165.72.59"
port = 12334
auth = false
login = ""
password = ""
udp_associate_timeout = "5s"
tcp_connect_timeout = "10s"

[targets]
processes = ["Discord.exe", "DiscordCanary.exe", "DiscordPTB.exe", "Update.exe"]
include_vesktop = false

[skip]
# CIDR ranges to never proxy. Local + link-local always implicitly skipped at filter level.
extra_skip_cidrs = []
multicast = true

[ui]
log_level = "info"
log_max_mb = 10
log_backups = 3
tray_icon = true
auto_start = false        # mirror of HKCU\...\Run
shown_tray_toast = false  # one-shot first-close toast tracking
theme = "dark"            # dark | light | auto

[update]
check_on_startup = true
forgejo_repo = "git.okcu.io/root/drover-go"

[engine]
heartbeat_interval = "5s"
no_traffic_warn_after = "30s"
reconnect_backoff_initial = "1s"
reconnect_backoff_max = "30s"
reconnect_total_cap = "5m"
```

Edge cases:
- **M-4 corrupted TOML** → log warning + use defaults + GUI shows banner "Config error line N — running with defaults".
- **M-7 hot-reload** → fsnotify on the file. On change: re-parse → if proxy section changed → engine restart (Stop → wait clean → Start). Other sections apply live.
- **Config migration** v1→v2 handled by `version` field; missing version assumes 1.
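
The `[engine]` reconnect settings, together with the `1s → 5s → 30s cap` note on `ActionExpBackoff`, imply a concrete retry schedule. The sketch below spells it out under one reading: the delays follow an explicit step list, stay on the last step, and stop once the next wait would push the cumulative total past the cap. `backoffDelays`, `defaultSteps` and `defaultCap` are hypothetical names, and the intermediate 5s step comes from the code comment, not from the config keys.

```go
package main

import "time"

// defaultSteps and defaultCap mirror reconnect_backoff_initial (1s), the 5s
// middle step from the recovery.go comment, reconnect_backoff_max (30s) and
// reconnect_total_cap (5m).
var defaultSteps = []time.Duration{time.Second, 5 * time.Second, 30 * time.Second}

const defaultCap = 5 * time.Minute

// backoffDelays lists the wait before each reconnect attempt. After the
// step list is exhausted the last step repeats. The list ends when adding
// the next wait would exceed totalCap; at that point the engine would move
// to Failed.
func backoffDelays(steps []time.Duration, totalCap time.Duration) []time.Duration {
	var delays []time.Duration
	var total time.Duration
	for i := 0; len(steps) > 0; i++ {
		d := steps[len(steps)-1]
		if i < len(steps) {
			d = steps[i]
		}
		if d <= 0 || total+d > totalCap {
			break
		}
		total += d
		delays = append(delays, d)
	}
	return delays
}
```

With the defaults this yields 11 attempts (1s, 5s, then nine waits of 30s, 276s in total) before the cap is reached, which gives a sense of how many "Reconnecting (Nth attempt)" updates a user can see before the tray turns red.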

## Edge case matrix (full)

This is the master list. Every row must have a corresponding test or explicit "verified manually" note in the implementation plan.

| # | Edge case | Mitigation | Test |
|---|---|---|---|
| **D-1** | WinDivert.sys not installed | Embed binary, copy to %PROGRAMDATA%, WinDivertOpen auto-loads | manual: clean Win11 VM |
| **D-2** | Old WinDivert v1.x present (zapret legacy) | Service version query → "remove old version first" error | manual: install zapret first, verify error |
| **D-3** | Driver corrupted | SHA256 verify on extract → reinstall flow with progress | unit test: SHA256 mismatch path |
| **D-4** | AV quarantines our embedded .sys | Specific AV-friendly error message + README link | manual: Defender enabled + first run |
| **D-5** | Reboot pending after install | Show "Reboot to activate driver" | manual: trigger via DISM |
| **D-7** | ARM64 Windows | Detect at startup, refuse install | unit: GOARCH=arm64 build returns expected error |
| **P-1** | Discord PID changes | 2s procscan + filter rebuild | integration: kill+restart Discord, verify continuity |
| **P-3** | Update.exe traffic | Default list includes it | integration: trigger Discord update, verify Update.exe traffic proxied |
| **P-5** | PID re-use | Cosmetic 2s window | accept |
| **L-1** | Self-loop (drover's own SOCKS5 traffic) | Filter excludes own_pid + upstream IP | unit: filter expression builder verifies own PID in output |
| **T-4** | IPv6 Discord targets | Drop at filter level; Happy Eyeballs falls back | manual: verify with `netsh interface ipv6 set route ::/0 disabled` |
| **T-6** | TCP mapping leak | 30min TTL cleanup | unit: TTL sweeper test |
| **U-2** | Idle UDP flow leak | 5min TTL cleanup | unit: TTL sweeper test |
| **U-4** | UDP fragments | Drop (SOCKS5 doesn't support FRAG) | accept (rare) |
| **A-1** | User non-admin | UAC re-launch on startup | manual: standard user account |
| **A-2** | UAC cancelled | Clean exit, no error dialog | manual: cancel UAC prompt |
| **A-3** | UAC at every login (autostart) | Accepted per B1 | document in README |
| **A-5** | Autostart disabled via Task Manager | Detect StartupApproved key, sync GUI checkbox | unit: registry mock |
| **TR-1** | Tray icon disappears on explorer.exe restart | systray library handles re-attach | manual: kill+restart explorer.exe |
| **TR-3** | First-time tray toast | Track `ui.shown_tray_toast` in config | unit: config writer |
| **SI-1** | Mutex collision portable vs installed | installID = SHA256(exe path)[:16] | unit: two paths → two mutexes |
| **SI-3** | First instance crashed without cleanup | Kernel cleans mutex on process death | manual: kill -9 first, launch second |
| **SR-1** | System sleep | WM_POWERBROADCAST listener → graceful pause | manual: trigger sleep on test machine |
| **SR-2** | System resume | Wait 5s network → reopen handle → resume | manual: wake from sleep |
| **UP-1** | Auto-update during active engine | Graceful shutdown → replace exe → relaunch with prior state | manual: stage v0.1 → v0.2 update during voice call |
| **M-1** | VPN concurrent | WinDivert captures packets before VPN encapsulation; SOCKS5 traffic to the upstream IP is expected | manual: with WireGuard + Drover both active |
| **M-4** | Config corrupted | Use defaults + warning banner | unit: malformed TOML → defaults applied |
| **M-5** | Proxy IP changed (DDNS) | Re-resolve hostname every 30s of failed reconnect | unit: hostname resolver retry |
| **M-7** | Hot-reload config | fsnotify → engine restart | integration: edit TOML, observe restart |
|
||||||
|
|
||||||
|
## Out of scope (Phase 3+)

- DPI bypass / fake QUIC injection (decision **C1**) — add as opt-in toggle in v0.4 if needed
- Windows service mode (decision **A**) — add for power users in v0.4 if requested
- IPv6 SOCKS5 ATYP=04 — add when we hit a v6-only proxy
- ARM64 Windows — add when WinDivert ships ARM64 driver (waiting on basil00 upstream)
- Multi-user PC scenarios — single-user assumption baked in
- Vesktop default-on — stays opt-in via `targets.include_vesktop = true`
- Custom DNS resolver / DNS-over-proxy — out of scope; DNS goes direct, document in README

## Phase 2 milestones

Each milestone is a separate `writing-plans` invocation followed by `subagent-driven-development` execution.

### P2.1 — TCP-only MVP (3-4 days)

**Scope**: WinDivert handle, filter expression, packet parser, TCP NAT-loopback redirect, SOCKS5 client (TCP CONNECT only), procscan, self-loop protection, basic engine state machine (Idle/Starting/Active/Failed without Reconnecting yet).

**Acceptance**:
- Run drover.exe on Win11 with admin
- Discord chat + Discord API requests routed through SOCKS5 (verify via tcpdump on mihomo: should see TCP CONNECT to discord.com:443 from upstream IP)
- Voice does NOT yet work (UDP path absent) — documented expectation
- Stop button cleanly closes everything in <500ms
- Driver remains installed after exit (verify `sc query WinDivert`)
- No self-loop infinite traffic (verify: bytes in == bytes out, not exponentially growing)

### P2.2 — UDP voice (3-4 days)

**Scope**: SOCKS5 UDP ASSOCIATE primitives (production-grade, not the diagnostic-only fork in checker), UDP flow tracker, packet encap/decap, IPv4-fabrication-and-reinject for inbound path.

**Acceptance**:
- Voice call in Discord through proxy works without audible degradation
- Roughly 4 simultaneous voice calls work without flow leakage
- Idle voice flow cleanup at 5min TTL (verified via debug log)
- Mid-call proxy disconnect → flow drops → re-opens within 2s on next outbound packet → ~2-3s audible glitch
- No memory leak after 1h voice call (RSS stable ±5MB)

### P2.3 — E3 recovery + sleep/resume (2 days)

**Scope**: failure classifier, contextual retry policies, Reconnecting state, exponential backoff, WM_POWERBROADCAST listener, heartbeat health-check.

**Acceptance**:
- Stop mihomo on LXC 102 mid-session → engine transitions Active → Reconnecting → Active when mihomo back up (within 30s of recovery)
- Trigger machine sleep mid-voice-call → engine pauses gracefully → wake → engine resumes within 10s after network up → voice continues (Discord client itself reconnects)
- WinDivert handle externally killed (`sc stop WinDivert && sc start WinDivert`) → engine reopens once → if second kill within 30s → Failed with crash log
- Heartbeat detects "no traffic" while Discord open and idle → tray turns yellow with "no traffic" tooltip → no Failed transition

### P2.4 — Tray + autostart + engine UI (2-3 days)

**Scope**: getlantern/systray integration, 4 ICO icons, tray menu (D1 + first-time toast), autostart checkbox in GUI Settings tab, Start/Stop buttons in main window wired to engine, status indicator with state machine awareness, single-instance enforcement.

**Acceptance**:
- Toggle autostart on → reboot → drover launches at login (after UAC accept)
- X on window → first-time toast → second X → silent hide
- Start button only enabled when checker passed (or in Failed state with Retry)
- Tray icon updates within 200ms of state change
- Two simultaneous launches → second activates first's window and exits silently
- Status row in tray menu updates every 1s while Active

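
Single-instance enforcement depends on the per-install mutex name from row **SI-1** (`installID = SHA256(exe path)[:16]`). The sketch below reads `[:16]` as the first 16 hex characters of the digest. It lowercases the path first, because Windows paths are case-insensitive. The spec states neither choice, and the `Local\Drover-` prefix is likewise a placeholder.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// installID returns the first 16 hex characters of SHA-256 over the
// lowercased executable path (lowercasing is an assumption of this sketch).
func installID(exePath string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(exePath)))
	return hex.EncodeToString(sum[:])[:16]
}

// mutexName builds a session-local named-mutex identifier; the prefix is hypothetical.
func mutexName(exePath string) string {
	return `Local\Drover-` + installID(exePath)
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve executable path:", err)
		os.Exit(1)
	}
	fmt.Println(mutexName(exe))
}
```

Two copies of the binary at different paths (portable vs installed) yield different names, which is the property the SI-1 unit test checks.
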
### P2.5 — Polish (2-3 days)

**Scope**: crash dumps, config hot-reload via fsnotify, AV-friendly error messages, all remaining edge cases from matrix, README troubleshooting, install/uninstall verification on clean Win11 VM.

**Acceptance**:
- Every edge case in the matrix has either a passing test or a verified manual reproduction note in `docs/testing/p2-edge-cases.md`
- Install on clean Win11 VM, run for 1 hour without intervention, no errors
- Uninstall via Apps & Features removes everything except optionally-kept config (asked at uninstall)
- README has SmartScreen + AV troubleshooting sections with screenshots

**Total**: ~12-16 days to v1.0.0.

## Testing strategy

### Unit tests (per-package)

- `divert/filter`: filter expression builder produces expected strings for various PID lists
- `divert/packet`: parse + serialize + checksum recompute is round-trip identity
- `engine/recovery`: failure classifier returns expected Action for each FailureClass
- `socks5/udp`: encap/decap round-trip
- `procscan`: snapshot diffing, mocked toolhelp32
- `autostart`: registry read/write/disabled-detection (with mock registry)
- `single`: mutex acquire + release lifecycle
- `config`: defaults applied, malformed TOML → defaults + warning, version migration

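
For the `divert/packet` item, "checksum recompute" after a NAT-style address rewrite can be illustrated with the IPv4 header checksum (the ones'-complement sum from RFC 1071). This is a sketch of the header checksum only. TCP and UDP checksums also cover a pseudo-header and are not shown. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ipv4Checksum computes the header checksum, treating the checksum
// field itself (bytes 10-11) as zero.
func ipv4Checksum(hdr []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(hdr); i += 2 {
		if i == 10 {
			continue // skip the stored checksum
		}
		sum += uint32(binary.BigEndian.Uint16(hdr[i : i+2]))
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16) // fold carries back in
	}
	return ^uint16(sum)
}

// checksumValid reports whether the stored checksum matches the header contents.
func checksumValid(hdr []byte) bool {
	return len(hdr) >= 20 && binary.BigEndian.Uint16(hdr[10:12]) == ipv4Checksum(hdr)
}

func main() {
	// 20-byte header: 192.168.0.1 -> 192.168.0.199, protocol UDP.
	hdr := []byte{
		0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
		0x40, 0x11, 0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01,
		0xc0, 0xa8, 0x00, 0xc7,
	}
	fmt.Printf("0x%04x\n", ipv4Checksum(hdr)) // prints "0xb861"

	// Rewrite the destination to loopback, as the redirect would, then recompute.
	copy(hdr[16:20], []byte{127, 0, 0, 1})
	binary.BigEndian.PutUint16(hdr[10:12], ipv4Checksum(hdr))
	fmt.Println(checksumValid(hdr)) // prints "true"
}
```

A round-trip test in this style parses a captured packet, rewrites an address, recomputes the checksum, and asserts that validation still passes.
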
### Integration tests (each milestone has its own)

- `engine_test.go`: mock WinDivert + mock SOCKS5 server in-process, exercise full pipeline
- `redirect_test.go`: spin up TCP listener, fake Discord client, fake SOCKS5 server, verify bytes flow

### Manual test plan (per milestone, in `docs/testing/p2-<milestone>-manual.md`)

Each manual test case is a numbered step-by-step procedure with an expected outcome. Run on a clean Win11 VM snapshot before each milestone tag.

### End-to-end (manual, before v1.0.0)

Full user journey in `docs/testing/p2-e2e.md`:
1. Download installer from Forgejo release
2. Install via setup.exe (UAC prompt)
3. First launch: configure proxy, run check, click Start
4. Run Discord, place voice call → verify routing via mihomo logs
5. Toggle autostart on
6. Reboot → verify drover starts at login (UAC accept)
7. Sleep + wake cycle → verify continuity
8. Stop mihomo → verify Reconnecting state → restart mihomo → verify recovery
9. Quit via tray menu → verify clean shutdown
10. Uninstall → verify cleanup

## Open questions / assumptions to validate during P2.1

1. **`imgk/divert-go` v0.1.0 still works with WinDivert v2.2.2?** If not, switch to direct syscall bindings. Verify in P2.1 day 1.
2. **Filter expression length limit** — WinDivert filter expressions have a max length. With 4 Discord PIDs + own PID + upstream IP exclusion + multicast we should be well under, but if user adds 10+ Vesktop variants we might hit it. Verify and document limit during P2.1.
3. **`WinDivertSend` for inbound packets we synthesize** — does the kernel correctly route a fabricated `dst=Discord_IP, src=real_target_IP` packet back to Discord's socket? Most divert-based tools do this; verify in P2.2 day 1 with a tracer.
4. **Embedded ICO size on disk** — 4 icons × ~5KB = 20KB. Negligible.

## Files to read before implementation

- `imgk/shadow/pkg/divert/` — open-handle and read-packets pattern (already downloaded)
- `imgk/divert-go` README + `addr.go` — API surface
- `runetfreedom/force-proxy/proxy.cpp` — correct SOCKS5 UDP ASSOCIATE flow (local at `/tmp/drover-cmp/force-proxy/`)
- `wailsapp/wails/v2/examples/react` — Wails patterns for Engine bindings
- This spec.