drover-go/docs/superpowers/specs/2026-05-01-checker-design.md
root ea4202d4a3 spec: add voice-quality (burst loss/jitter) + voice-srv (Discord media probe)
Old single-shot stun test only proved one UDP packet round-tripped
through the relay. To predict whether voice will actually work the
checker now does two stronger tests:

- voice-quality: 30-packet STUN burst with loss/jitter/p50 metrics,
  with a "warn" tier between hard pass and hard fail.
- voice-srv: concurrent DNS resolve + SOCKS5 TCP probe to a list of
  Discord voice region hostnames; passes if any region is reachable.

Adds StatusWarn ("soft pass — show hint anyway") so the GUI can
distinguish "voice will work but glitchy" from green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:27:06 +03:00


Checker — 7-step SOCKS5 diagnostic

Status: design accepted 2026-05-01. Replaces: stub RunCheck in internal/gui/app.go that emits fake events.

Why

The Wails GUI exposes a "Check connection" button that the user presses before turning the engine on. Today it walks through a hard-coded scenario in Go, returning bogus metrics. The user can't tell whether their proxy is alive, supports UDP, or whether Discord blocks it. We need an honest diagnostic that tells the user exactly which capability of their SOCKS5 proxy works and which doesn't, with hex-level evidence on failure.

API surface

```go
// internal/checker/checker.go
package checker

import (
	"context"
	"time"
)

type Status string

const (
	StatusRunning Status = "running"
	StatusPassed  Status = "passed"
	// StatusWarn is a "soft pass": the test technically succeeded, but the
	// user should know about a degradation (e.g. voice quality at the upper
	// end of acceptable). The frontend renders it like StatusPassed but
	// keeps the Hint visible.
	StatusWarn    Status = "warn"
	StatusFailed  Status = "failed"
	StatusSkipped Status = "skipped"
)

type Result struct {
	ID     string `json:"id"`
	Status Status `json:"status"`
	Metric string `json:"metric,omitempty"`
	Error  string `json:"error,omitempty"`
	Hint   string `json:"hint,omitempty"`
	RawHex string `json:"raw_hex,omitempty"`
	// Wall-clock time in milliseconds. (A time.Duration field would be
	// marshalled as nanoseconds despite the "duration_ms" tag.)
	DurationMs int64 `json:"duration_ms"`
	Attempt    int   `json:"attempt"` // 1 for the first try, 2+ for retries
}

type Config struct {
	ProxyHost     string
	ProxyPort     int
	UseAuth       bool
	ProxyLogin    string
	ProxyPassword string

	PerTestTimeout time.Duration
	MaxRetries     int
	RetryBackoff   time.Duration

	DiscordGateway string // host:port for the connect test
	DiscordAPI     string // full URL for the api test
	StunServer     string // host:port for the voice-quality burst

	// voice-quality burst tuning
	VoiceBurstCount    int           // default 30
	VoiceBurstInterval time.Duration // default 20ms

	// voice-srv probe. An empty list means "use the built-in default":
	// <region>.discord.media for russia, russia2, frankfurt, europe,
	// singapore, japan, us-east, us-west, brazil, india, hongkong,
	// southkorea, sydney, southafrica, dubai and atlanta.
	VoiceServerHostnames []string
}

// Run streams Results to the returned channel and closes it when finished
// or when ctx is cancelled. The first event for each test is Status=running;
// the next is the final state (passed/warn/failed/skipped). A skipped test
// emits only its final skipped event. On retry, another running+final pair
// is emitted with Attempt > 1. (Signature only; the body is not part of
// this spec.)
func Run(ctx context.Context, cfg Config) <-chan Result
```

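Defaults applied when zero values are passed: PerTestTimeout=5s, MaxRetries=1, RetryBackoff=500ms, DiscordGateway="gateway.discord.gg:443", DiscordAPI="https://discord.com/api/v9/gateway", StunServer="stun.l.google.com:19302", VoiceBurstCount=30, VoiceBurstInterval=20ms, and VoiceServerHostnames = the built-in <region>.discord.media list.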

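The tests

Tests run sequentially, in the order below: eight tests when UseAuth=true, seven when it is false (auth is omitted). Where sensible, a test reuses sockets opened by an earlier one: greet, auth and connect run on the tcp connection, and voice-quality sends through the relay opened by udp.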

| ID | What it does | Considered failed when | Skip rule |
|---|---|---|---|
| tcp | Dials host:port with net.Dialer.DialContext, bounded by PerTestTimeout. | dial error | never |
| greet | Sends the SOCKS5 client greeting 05 02 00 02 (or 05 01 00 if UseAuth=false) and reads 2 bytes. Pass = 05 00 (no auth) or 05 02 (username/password; valid only when UseAuth=true, since the server must pick a method the client offered). Fail on 05 FF, anything else, or a short read. | proxy returned non-SOCKS5 data / refused all offered auth methods | skipped if tcp failed |
| auth | Only runs when UseAuth=true. RFC 1929 sub-negotiation: 01 LEN_LOGIN LOGIN LEN_PASS PASS. Reads 2 bytes, expects 01 00. | bad credentials (STATUS byte != 00) / short read | not in the test list when UseAuth=false; skipped if greet failed |
| connect | SOCKS5 CONNECT to DiscordGateway (default gateway.discord.gg:443) with ATYP=03 (domain). Reads the reply: 10 bytes when BND.ADDR is IPv4, longer for a domain or IPv6 BND.ADDR. Pass = REP=0x00. | REP != 0x00 (0x05 = connection refused, etc.) / timeout | skipped if greet or auth failed |
| udp | UDP ASSOCIATE: opens a second TCP control channel (the first is already committed to the CONNECT tunnel), redoes greeting+auth on it, sends 05 03 00 01 00000000 0000 and reads the reply (10 bytes for an IPv4 BND.ADDR). The control channel stays open until voice-quality finishes, because the proxy tears the relay down when it closes (RFC 1928). Pass = REP=0x00 and a valid relay endpoint in BND.ADDR/BND.PORT. | REP=0x07 (command not supported), any other non-zero REP, short read | skipped if greet or auth failed |
| voice-quality | Through the UDP relay: sends VoiceBurstCount (default 30) STUN binding requests to StunServer, spaced VoiceBurstInterval (default 20ms) apart, and listens until last_send + 1.5 × PerTestTimeout. Computes loss %, jitter (mean absolute change in RTT between consecutive replies, a simplified form of the RFC 3550 interarrival jitter) and p50 RTT. Metric = "loss=2% jitter=14ms p50=42ms". Pass = loss ≤ 5% AND jitter ≤ 30ms AND p50 ≤ 250ms. Warn (Status=warn, Hint set) = not a pass, but loss ≤ 15% AND jitter ≤ 60ms AND p50 ≤ 400ms: voice will work with audible glitches. Fail = anything worse. | no replies at all, or loss > 15% OR jitter > 60ms OR p50 > 400ms | skipped if udp failed |
| voice-srv | Probes Discord voice servers. Concurrently DNS-resolves the <region>.discord.media hostnames (default list: russia, russia2, frankfurt, europe, singapore, japan, us-east, us-west, brazil, india, hongkong, southkorea, sydney, southafrica, dubai, atlanta) with the OS resolver, within a 2s budget. For every resolved hostname, runs a SOCKS5 CONNECT through the proxy to host:443 with a 1s dial timeout, using a worker pool of 8. Metric = "<N> regions reachable: russia, frankfurt, europe" (at most 3 names listed). Pass = ≥ 1 region reachable. Warn = 0 reachable but ≥ 1 resolved (the proxy filters Discord media IPs even though DNS works); the Hint warns that voice may not work even though the tcp to udp checks passed. | 0 hostnames resolved (DNS broken or Discord changed its naming) | skipped if connect failed |
| api | SOCKS5 CONNECT through the proxy to discord.com:443, then a minimal HTTPS GET of DiscordAPI (default https://discord.com/api/v9/gateway). Pass = HTTP 200 or 401 (an unauthenticated 401 still proves reachability). | any status other than 200/401 / TLS handshake failure / connect refused | skipped if connect failed |

For every failure (and every warn), the Hint field carries a Russian-language explanation (the GUI is localized to Russian), and RawHex carries the first 32 bytes of any unexpected response, hex-encoded, for the UI's expandable debug section.

Cancel & retry

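  • ctx is honoured at every blocking call: dials use DialContext, and reads use a SetDeadline derived from PerTestTimeout. Because a deadline alone does not react to cancellation, a ctx.Done() hook also moves the deadline to "now" so a blocked read returns immediately. On cancel, the current test emits a final failed result with Error="cancelled", each remaining test gets a single skipped event, and only then does the channel close.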
  • Auto-retry on transient errors, up to MaxRetries extra attempts (default 1, i.e. one retry):
    • timeout (net.Error.Timeout())
    • "connection reset by peer"
    • DNS temporary failure
  • NOT retried (likely user-config error or hard failure):
    • connection refused
    • bad credentials (RFC 1929 STATUS != 0x00) and REP=0x02 (connection not allowed by ruleset)
    • REP=0x07 (cmd unsupported)
    • HTTP 4xx/5xx other than 401 on api
  • Between attempts: sleep RetryBackoff.

Wails integration

internal/gui/app.go::RunCheck(cfg Config) becomes:

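```go
import (
	"context"

	"github.com/wailsapp/wails/v2/pkg/runtime"
	// plus this module's internal/checker package
)

func (a *App) RunCheck(cfg Config) {
	ctx, cancel := context.WithCancel(a.ctx)

	a.muCheck.Lock()
	if a.cancelCheck != nil {
		a.cancelCheck() // stop a previous run that may still be in flight
	}
	a.cancelCheck = cancel
	a.muCheck.Unlock()

	go func() {
		defer cancel() // release the context once Run has closed its channel

		// Keep only the latest final status per test ID, so a failed first
		// attempt followed by a passing retry is counted once, as passed.
		final := make(map[string]checker.Status)
		for r := range checker.Run(ctx, mapToCheckerConfig(cfg)) {
			runtime.EventsEmit(a.ctx, "check:result", r)
			if r.Status != checker.StatusRunning {
				final[r.ID] = r.Status
			}
		}

		counts := map[string]int{"total": len(final), "passed": 0, "warn": 0, "failed": 0, "skipped": 0}
		for _, s := range final {
			counts[string(s)]++
		}
		runtime.EventsEmit(a.ctx, "check:done", counts)
	}()
}

func (a *App) CancelCheck() {
	a.muCheck.Lock()
	defer a.muCheck.Unlock()
	if a.cancelCheck != nil {
		a.cancelCheck()
	}
}
```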

A new CancelCheck binding lets the GUI's Cancel button stop a running diagnostic. The frontend's useDrover hook gets a cancelCheck() callback that calls it.

Testing

  • Unit tests for each test function with a fake SOCKS5 server (net.Listen, hand-rolled byte responses) — covers happy path, every documented failure mode, malformed responses (truncated, wrong protocol, garbage).
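  • The voice-quality test uses a real pion/stun server in-process on a UDP socket from net.ListenPacket("udp", "127.0.0.1:0") (net.Listen only accepts stream networks such as "tcp" and "unix"), with the fake SOCKS5 server relaying the datagrams.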
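  • The connect and api tests use the same fake SOCKS5 server, tunneling to net.Listen("tcp") and httptest.NewTLSServer respectively. The api test needs a test-only hook into the checker's tls.Config (trusting srv.Certificate() and accepting its hostname), since the default config rejects the test server's self-signed certificate.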
  • One end-to-end test against a real mihomo instance is documented in docs/testing/checker-e2e.md but not part of go test ./... (requires network).

Files

internal/checker/
  checker.go            ─ public API: Run, Result, Config
  socks5.go             ─ greeting, auth, CONNECT, UDP ASSOCIATE primitives
  stun.go               ─ STUN binding-request encode/decode (no library —
                          we already vendor enough; ~80 LOC)
  retry.go              ─ classify(err) -> transient | permanent
  hints.go              ─ map test failure → user hint (RU)
  checker_test.go       ─ Run-level integration with fake server
  socks5_test.go        ─ per-primitive table tests
  stun_test.go          ─ encode/decode + RTT mock

internal/gui/app.go gets RunCheck rewritten and a new CancelCheck method. The fake SCENARIOS path in app.go is removed.

Out of scope (future work)

  • IPv6 SOCKS5 ATYP=04. Discord today is IPv4; we'll add when we hit a proxy that's v6-only.
  • Parallel test execution (e.g. running connect and udp simultaneously on separate sessions). Sequential is clearer for the UI; we'll revisit if total runtime exceeds 10s on common networks.
  • TLS certificate pinning on api. The tls.Config is default — fine for reachability check.