drover-go/docs/superpowers/specs/2026-05-01-checker-design.md
root c83f942716 design: checker — 7-step SOCKS5 diagnostic spec
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:40:43 +03:00

# Checker — 7-step SOCKS5 diagnostic
**Status**: design accepted 2026-05-01.
**Replaces**: stub `RunCheck` in `internal/gui/app.go` that emits fake events.
## Why
The Wails GUI exposes a "Check connection" button that the user presses
before turning the engine on. Today it walks through a hard-coded scenario
in Go, returning bogus metrics. The user can't tell whether their proxy
is alive, supports UDP, or whether Discord blocks it. We need an honest
diagnostic that tells the user exactly which capability of their SOCKS5
proxy works and which doesn't, with hex-level evidence on failure.
## API surface
```go
// internal/checker/checker.go
package checker

import (
	"context"
	"time"
)

type Status string

const (
	StatusRunning Status = "running"
	StatusPassed  Status = "passed"
	StatusFailed  Status = "failed"
	StatusSkipped Status = "skipped"
)

type Result struct {
	ID     string `json:"id"`
	Status Status `json:"status"`
	Metric string `json:"metric,omitempty"`
	Error  string `json:"error,omitempty"`
	Hint   string `json:"hint,omitempty"`
	RawHex string `json:"raw_hex,omitempty"`
	// encoding/json writes a time.Duration as integer nanoseconds; convert it
	// (or add a custom MarshalJSON) if the frontend expects milliseconds.
	Duration time.Duration `json:"duration_ms"`
	Attempt  int           `json:"attempt"`
}

type Config struct {
	ProxyHost      string
	ProxyPort      int
	UseAuth        bool
	ProxyLogin     string
	ProxyPassword  string
	PerTestTimeout time.Duration
	MaxRetries     int
	RetryBackoff   time.Duration
	DiscordGateway string
	DiscordAPI     string
	StunServer     string
}

// Run streams Results to the returned channel and closes it when finished
// or when ctx is cancelled. The first event for each test is Status=running;
// the next is the final state (passed/failed/skipped). On retry, another
// running+final pair is emitted with Attempt > 1.
func Run(ctx context.Context, cfg Config) <-chan Result
```
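The event contract in the `Run` comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `step`, `runSteps`, `allPassed`, and `demoEvents` are hypothetical names, `Result` is trimmed to three fields, and retries are left out. Each step lists the IDs it depends on and is skipped unless all of them passed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Status string

const (
	StatusRunning Status = "running"
	StatusPassed  Status = "passed"
	StatusFailed  Status = "failed"
	StatusSkipped Status = "skipped"
)

// Result is a trimmed copy of the spec's Result, enough for the sketch.
type Result struct {
	ID     string
	Status Status
	Error  string
}

// step is a hypothetical unit of work: an ID, the IDs that must have passed
// before it may run, and the function that performs the check.
type step struct {
	id    string
	needs []string
	fn    func(ctx context.Context) error
}

// runSteps emits running + final for every step that runs, a single skipped
// event for every step that does not, and closes the channel at the end.
func runSteps(ctx context.Context, steps []step) <-chan Result {
	// Buffered for the worst case so the producer never blocks on a consumer
	// that stopped reading after cancelling.
	out := make(chan Result, 2*len(steps))
	go func() {
		defer close(out)
		passed := make(map[string]bool)
		for _, s := range steps {
			if ctx.Err() != nil || !allPassed(passed, s.needs) {
				out <- Result{ID: s.id, Status: StatusSkipped}
				continue
			}
			out <- Result{ID: s.id, Status: StatusRunning}
			err := s.fn(ctx)
			switch {
			case ctx.Err() != nil:
				out <- Result{ID: s.id, Status: StatusFailed, Error: "cancelled"}
			case err != nil:
				out <- Result{ID: s.id, Status: StatusFailed, Error: err.Error()}
			default:
				passed[s.id] = true
				out <- Result{ID: s.id, Status: StatusPassed}
			}
		}
	}()
	return out
}

func allPassed(passed map[string]bool, needs []string) bool {
	for _, id := range needs {
		if !passed[id] {
			return false
		}
	}
	return true
}

// demoEvents runs a canned scenario in which udp fails, so stun is skipped.
func demoEvents() []string {
	ok := func(context.Context) error { return nil }
	bad := func(context.Context) error { return errors.New("REP=0x07") }
	steps := []step{
		{id: "tcp", fn: ok},
		{id: "greet", needs: []string{"tcp"}, fn: ok},
		{id: "udp", needs: []string{"greet"}, fn: bad},
		{id: "stun", needs: []string{"udp"}, fn: ok},
	}
	var events []string
	for r := range runSteps(context.Background(), steps) {
		events = append(events, r.ID+":"+string(r.Status))
	}
	return events
}

func main() {
	fmt.Println(demoEvents())
}
```

Treating "dependency did not pass" as the skip condition also covers transitive skips: if `greet` fails, `udp` is skipped and `stun` follows without a separate rule.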
Defaults applied when zero values are passed: PerTestTimeout=5s, MaxRetries=1,
RetryBackoff=500ms, DiscordGateway="gateway.discord.gg:443",
DiscordAPI="https://discord.com/api/v9/gateway",
StunServer="stun.l.google.com:19302".
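One way to apply these defaults is a small value-returning helper. The sketch below assumes a hypothetical `withDefaults` function and a `Config` trimmed to the fields that have defaults; the values are the ones listed above.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a trimmed copy of the spec's Config: only defaulted fields.
type Config struct {
	PerTestTimeout time.Duration
	MaxRetries     int
	RetryBackoff   time.Duration
	DiscordGateway string
	DiscordAPI     string
	StunServer     string
}

// withDefaults (hypothetical name) returns a copy of cfg in which every
// zero-valued field is replaced by its documented default.
func withDefaults(cfg Config) Config {
	if cfg.PerTestTimeout == 0 {
		cfg.PerTestTimeout = 5 * time.Second
	}
	if cfg.MaxRetries == 0 {
		cfg.MaxRetries = 1
	}
	if cfg.RetryBackoff == 0 {
		cfg.RetryBackoff = 500 * time.Millisecond
	}
	if cfg.DiscordGateway == "" {
		cfg.DiscordGateway = "gateway.discord.gg:443"
	}
	if cfg.DiscordAPI == "" {
		cfg.DiscordAPI = "https://discord.com/api/v9/gateway"
	}
	if cfg.StunServer == "" {
		cfg.StunServer = "stun.l.google.com:19302"
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", withDefaults(Config{MaxRetries: 3}))
}
```

A consequence of zero-value defaulting is that `MaxRetries: 0` cannot mean "never retry"; if that needs to be expressible, a negative sentinel or a pointer field would be one option.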
## The seven tests
The tests run sequentially. Each test reuses sockets opened by previous tests when sensible.
| ID | What it does | Considered failed when | Skip rule |
|----|--------------|------------------------|-----------|
| `tcp` | `net.DialTimeout("tcp", host:port)` | dial error | never |
| `greet` | Sends SOCKS5 client greeting `05 02 00 02` (or `05 01 00` if UseAuth=false). Reads 2 bytes. Pass = `05 00` (no auth) or `05 02` (auth required). Fail on `05 FF`, anything else, or short read | proxy returned non-SOCKS5 / refused all auth methods | skipped if `tcp` failed |
| `auth` | Only emitted when UseAuth=true. RFC 1929 sub-negotiation: `01 LEN_LOGIN LOGIN LEN_PASS PASS`. Reads 2 bytes, expects `01 00`. | bad credentials (status byte != `00`) / short read | not in test list when UseAuth=false; skipped if `greet` failed |
| `connect` | SOCKS5 CONNECT to `gateway.discord.gg:443` (ATYP=03 domain). Reads 10 bytes. Pass = REP=0x00. | REP != 0 (0x05 = connection refused, etc) / timeout | skipped if `greet`/`auth` failed |
| `udp` | UDP ASSOCIATE: opens **second** TCP control channel, redoes greeting+auth there, sends `05 03 00 01 00000000 0000`, reads 10-byte reply. Pass = REP=0x00 + valid relay endpoint in BND.ADDR/BND.PORT. | REP=0x07 (cmd unsupported), other REP, short read | skipped if `greet` failed |
| `stun` | Through the relay endpoint from the previous step: send STUN binding request (20-byte header, magic cookie 0x2112A442, random transaction ID), wait up to PerTestTimeout for XOR-MAPPED-ADDRESS reply. Metric = round-trip ms. | timeout / malformed response / no XOR-MAPPED-ADDRESS attribute | skipped if `udp` failed |
| `api` | TCP CONNECT through the proxy to `discord.com:443`, do a tiny HTTPS GET `/api/v9/gateway`. Pass = HTTP 200 or 401 (Discord returns 401 unauthenticated, that still proves reachability). | non-200/401 / TLS handshake failed / connect refused | skipped if `connect` failed |
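The byte layouts in the `greet`, `auth`, and `connect` rows can be written out as small builders. This is a sketch only: the function names (`greeting`, `checkGreetReply`, `authRequest`, `connectRequest`) are hypothetical and not taken from `socks5.go`, and the error strings are placeholders.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// greeting returns the SOCKS5 client greeting: VER, NMETHODS, METHODS.
func greeting(useAuth bool) []byte {
	if useAuth {
		return []byte{0x05, 0x02, 0x00, 0x02} // offer "no auth" and "username/password"
	}
	return []byte{0x05, 0x01, 0x00} // offer "no auth" only
}

// checkGreetReply validates the 2-byte method selection and returns the
// chosen method (0x00 or 0x02).
func checkGreetReply(reply []byte) (byte, error) {
	if len(reply) != 2 {
		return 0, errors.New("short read")
	}
	if reply[0] != 0x05 {
		return 0, errors.New("not a SOCKS5 reply")
	}
	switch reply[1] {
	case 0x00, 0x02:
		return reply[1], nil
	case 0xFF:
		return 0, errors.New("proxy refused all offered auth methods")
	default:
		return 0, errors.New("proxy selected an unsupported method")
	}
}

// authRequest builds the RFC 1929 message: VER=01, ULEN, UNAME, PLEN, PASSWD.
func authRequest(login, password string) ([]byte, error) {
	if len(login) == 0 || len(login) > 255 || len(password) == 0 || len(password) > 255 {
		return nil, errors.New("login and password must each be 1..255 bytes")
	}
	req := []byte{0x01, byte(len(login))}
	req = append(req, login...)
	req = append(req, byte(len(password)))
	req = append(req, password...)
	return req, nil
}

// connectRequest builds VER=05, CMD=01 (CONNECT), RSV=00, ATYP=03 (domain),
// followed by a length-prefixed host name and a big-endian port.
func connectRequest(host string, port uint16) ([]byte, error) {
	if len(host) == 0 || len(host) > 255 {
		return nil, errors.New("host must be 1..255 bytes")
	}
	req := []byte{0x05, 0x01, 0x00, 0x03, byte(len(host))}
	req = append(req, host...)
	req = append(req, byte(port>>8), byte(port))
	return req, nil
}

func main() {
	req, err := connectRequest("gateway.discord.gg", 443)
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(greeting(true)))
	fmt.Println(hex.EncodeToString(req))
}
```

Keeping the builders free of I/O makes them easy to cover with table tests; the socket reads and deadlines can then live in a thin layer on top.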
For each failure, the `Hint` field carries a Russian explanation (the GUI is
RU-localized) and `RawHex` carries the first 32 bytes of any unexpected
response (for the expand-debug section in the UI).
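Filling `RawHex` could be as small as the helper below. The name `rawHex` and the unseparated lowercase hex format are assumptions; the spec only fixes the 32-byte cap.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// maxRawBytes is the cap stated in the spec for RawHex evidence.
const maxRawBytes = 32

// rawHex (hypothetical helper) hex-encodes at most the first 32 bytes of an
// unexpected response.
func rawHex(resp []byte) string {
	if len(resp) > maxRawBytes {
		resp = resp[:maxRawBytes]
	}
	return hex.EncodeToString(resp)
}

func main() {
	// A proxy port that actually speaks HTTP would answer with text like this.
	fmt.Println(rawHex([]byte("HTTP/1.1 400 Bad Request\r\n")))
}
```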
## Cancel & retry
- `ctx` is honoured at every blocking call (dials use `DialContext`; reads
  use `SetDeadline` derived from PerTestTimeout). On cancel, the current test
  emits a final `failed` result with Error="cancelled", each remaining test
  gets a single `skipped` event, and then the channel closes.
- Auto-retry once on transient errors:
  - timeout (`net.Error.Timeout()`)
  - "connection reset by peer"
  - DNS temporary failure
- NOT retried (likely user-config error or hard failure):
  - connection refused
  - bad credentials (RFC 1929 status != 0x00, e.g. 0x01)
  - REP=0x02 (connection not allowed by ruleset)
  - REP=0x07 (cmd unsupported)
  - HTTP 4xx/5xx other than 401 on `api`
- Between attempts: sleep `RetryBackoff`.
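The retry rules above might be sketched like this. `isTransient`, `withRetry`, `timeoutErr`, and `demoRetry` are hypothetical names; the protocol-level cases (REP codes, auth status, HTTP status) are omitted because they are decided by the test itself rather than by inspecting an `error`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// isTransient reports whether err belongs to the "retry once" group.
func isTransient(err error) bool {
	if err == nil || errors.Is(err, syscall.ECONNREFUSED) {
		return false
	}
	if errors.Is(err, syscall.ECONNRESET) {
		return true
	}
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return dnsErr.IsTemporary || dnsErr.IsTimeout
	}
	var netErr net.Error
	if errors.As(err, &netErr) {
		return netErr.Timeout()
	}
	return false
}

// withRetry calls fn until it succeeds, fails permanently, or maxRetries
// extra attempts are used up. It returns the number of attempts made.
func withRetry(ctx context.Context, maxRetries int, backoff time.Duration,
	fn func(attempt int) error) (int, error) {
	for attempt := 1; ; attempt++ {
		err := fn(attempt)
		if err == nil || attempt > maxRetries || !isTransient(err) {
			return attempt, err
		}
		select {
		case <-ctx.Done(): // cancelled while backing off
			return attempt, ctx.Err()
		case <-time.After(backoff):
		}
	}
}

// timeoutErr is a stand-in for a read deadline expiring.
type timeoutErr struct{}

func (timeoutErr) Error() string   { return "i/o timeout" }
func (timeoutErr) Timeout() bool   { return true }
func (timeoutErr) Temporary() bool { return true }

// demoRetry returns the attempt counts for a timeout that clears on the
// second try and for a refused connection.
func demoRetry() (afterTimeout, afterRefused int) {
	ctx := context.Background()
	afterTimeout, _ = withRetry(ctx, 1, time.Millisecond, func(attempt int) error {
		if attempt == 1 {
			return timeoutErr{}
		}
		return nil
	})
	afterRefused, _ = withRetry(ctx, 1, time.Millisecond, func(int) error {
		return fmt.Errorf("dial: %w", syscall.ECONNREFUSED)
	})
	return afterTimeout, afterRefused
}

func main() {
	fmt.Println(demoRetry())
}
```

On Windows the socket layer reports Winsock codes such as `syscall.WSAECONNRESET`, so the errno comparisons would likely need a platform-specific variant there.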
## Wails integration
`internal/gui/app.go::RunCheck(cfg Config)` becomes:
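```go
func (a *App) RunCheck(cfg Config) {
	ctx, cancel := context.WithCancel(a.ctx)
	a.muCheck.Lock()
	a.cancelCheck = cancel
	a.muCheck.Unlock()

	go func() {
		defer cancel() // release the context once the run has finished
		ck := mapToCheckerConfig(cfg)
		var passed, failed int
		// A retried test emits one final result per attempt, so these
		// counters count attempts rather than distinct tests.
		for r := range checker.Run(ctx, ck) {
			runtime.EventsEmit(a.ctx, "check:result", r)
			switch r.Status {
			case checker.StatusPassed:
				passed++
			case checker.StatusFailed:
				failed++
			}
		}
		// "total" covers passed and failed results only; skipped ones are excluded.
		runtime.EventsEmit(a.ctx, "check:done", map[string]int{
			"total": passed + failed, "passed": passed, "failed": failed,
		})
	}()
}

// CancelCheck stops the running diagnostic, if there is one.
func (a *App) CancelCheck() {
	a.muCheck.Lock()
	defer a.muCheck.Unlock()
	if a.cancelCheck != nil {
		a.cancelCheck()
	}
}
```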
A new `CancelCheck` binding lets the GUI's Cancel button stop a running
diagnostic. The frontend's `useDrover` hook gets a `cancelCheck()`
callback that calls it.
## Testing
- Unit tests for each test function with a fake SOCKS5 server (`net.Listen`,
hand-rolled byte responses) — covers happy path, every documented failure
mode, malformed responses (truncated, wrong protocol, garbage).
- STUN test uses a real `pion/stun` server in-process, bound to a loopback UDP port with `net.ListenPacket("udp", "127.0.0.1:0")`.
- Discord-API and `connect` tests use the same fake SOCKS5 server tunneling
to `httptest.NewTLSServer` and `net.Listen("tcp")`.
- One end-to-end test against a real `mihomo` instance is documented in
`docs/testing/checker-e2e.md` but not part of `go test ./...` (requires
network).
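A hand-rolled fake proxy of the kind described in the first bullet could look like the sketch below. `startFakeProxy`, `greetOnce`, and `demoGreet` are hypothetical names; the server reads a fixed number of bytes and answers with a canned reply, which is enough to exercise refusals and truncated responses.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// startFakeProxy listens on a loopback port. For every connection it reads
// wantLen bytes, writes reply, and closes the connection.
func startFakeProxy(wantLen int, reply []byte) (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				buf := make([]byte, wantLen)
				if _, err := io.ReadFull(c, buf); err != nil {
					return
				}
				c.Write(reply)
			}(conn)
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// greetOnce sends the no-auth greeting and returns the 2-byte reply.
func greetOnce(addr string) ([]byte, error) {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(time.Second))
	if _, err := conn.Write([]byte{0x05, 0x01, 0x00}); err != nil {
		return nil, err
	}
	reply := make([]byte, 2)
	if _, err := io.ReadFull(conn, reply); err != nil {
		return nil, err // includes truncated replies
	}
	return reply, nil
}

// demoGreet runs one greeting exchange against a proxy with a canned reply.
func demoGreet(reply []byte) ([]byte, error) {
	addr, stop, err := startFakeProxy(3, reply)
	if err != nil {
		return nil, err
	}
	defer stop()
	return greetOnce(addr)
}

func main() {
	reply, err := demoGreet([]byte{0x05, 0xFF})
	fmt.Printf("reply=% x err=%v\n", reply, err)
}
```

Binding to port 0 lets the OS pick a free port, so tests can run in parallel without colliding.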
## Files
```
internal/checker/
  checker.go       ─ public API: Run, Result, Config
  socks5.go        ─ greeting, auth, CONNECT, UDP ASSOCIATE primitives
  stun.go          ─ STUN binding-request encode/decode (no library:
                     we already vendor enough; ~80 LOC)
  retry.go         ─ classify(err) -> transient | permanent
  hints.go         ─ map test failure → user hint (RU)
  checker_test.go  ─ Run-level integration with fake server
  socks5_test.go   ─ per-primitive table tests
  stun_test.go     ─ encode/decode + RTT mock
```
`internal/gui/app.go` gets `RunCheck` rewritten and a new `CancelCheck`
method. The fake SCENARIOS path in app.go is removed.
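As a rough idea of what the library-free `stun.go` involves, the sketch below encodes a binding request and decodes an IPv4 XOR-MAPPED-ADDRESS. The function names and `demoRoundTrip` are hypothetical, IPv6 is not handled, and the SOCKS5 UDP request header that has to precede the datagram on the relay path is not shown.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

const (
	magicCookie    = 0x2112A442
	bindingRequest = 0x0001
	bindingSuccess = 0x0101
	attrXorMapped  = 0x0020
)

// newBindingRequest builds the 20-byte header: type, zero length, magic
// cookie, and a random 12-byte transaction ID in bytes 8..19.
func newBindingRequest() ([]byte, error) {
	req := make([]byte, 20)
	binary.BigEndian.PutUint16(req[0:2], bindingRequest)
	binary.BigEndian.PutUint32(req[4:8], magicCookie)
	if _, err := rand.Read(req[8:20]); err != nil {
		return nil, err
	}
	return req, nil
}

// parseXorMappedAddr checks the response header against txID and returns the
// IPv4 address and port from the first XOR-MAPPED-ADDRESS attribute.
func parseXorMappedAddr(resp, txID []byte) (net.IP, int, error) {
	if len(resp) < 20 {
		return nil, 0, errors.New("short STUN header")
	}
	if binary.BigEndian.Uint16(resp[0:2]) != bindingSuccess ||
		binary.BigEndian.Uint32(resp[4:8]) != magicCookie ||
		!bytes.Equal(resp[8:20], txID) {
		return nil, 0, errors.New("not a matching binding success response")
	}
	bodyLen := int(binary.BigEndian.Uint16(resp[2:4]))
	if len(resp) < 20+bodyLen {
		return nil, 0, errors.New("truncated STUN body")
	}
	attrs := resp[20 : 20+bodyLen]
	for len(attrs) >= 4 {
		typ := binary.BigEndian.Uint16(attrs[0:2])
		n := int(binary.BigEndian.Uint16(attrs[2:4]))
		if len(attrs) < 4+n {
			return nil, 0, errors.New("malformed attribute")
		}
		val := attrs[4 : 4+n]
		if typ == attrXorMapped {
			if n < 8 || val[1] != 0x01 { // family 0x01 = IPv4
				return nil, 0, errors.New("unsupported address family")
			}
			// Port is XORed with the top 16 bits of the cookie, address with all 32.
			port := int(binary.BigEndian.Uint16(val[2:4]) ^ uint16(magicCookie>>16))
			ip := make(net.IP, 4)
			binary.BigEndian.PutUint32(ip, binary.BigEndian.Uint32(val[4:8])^magicCookie)
			return ip, port, nil
		}
		next := 4 + ((n + 3) &^ 3) // attribute values are padded to 4 bytes
		if next > len(attrs) {
			break
		}
		attrs = attrs[next:]
	}
	return nil, 0, errors.New("no XOR-MAPPED-ADDRESS attribute")
}

// demoRoundTrip hand-builds a response for 203.0.113.7:54321 and decodes it.
func demoRoundTrip() (string, error) {
	req, err := newBindingRequest()
	if err != nil {
		return "", err
	}
	resp := make([]byte, 32)
	binary.BigEndian.PutUint16(resp[0:2], bindingSuccess)
	binary.BigEndian.PutUint16(resp[2:4], 12) // one 12-byte attribute
	binary.BigEndian.PutUint32(resp[4:8], magicCookie)
	copy(resp[8:20], req[8:20])
	binary.BigEndian.PutUint16(resp[20:22], attrXorMapped)
	binary.BigEndian.PutUint16(resp[22:24], 8)
	resp[25] = 0x01
	binary.BigEndian.PutUint16(resp[26:28], 54321^0x2112)
	binary.BigEndian.PutUint32(resp[28:32], 0xCB007107^magicCookie)
	ip, port, err := parseXorMappedAddr(resp, req[8:20])
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%d", ip, port), nil
}

func main() {
	fmt.Println(demoRoundTrip())
}
```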
## Out of scope (future work)
- IPv6 SOCKS5 ATYP=04. Discord today is IPv4; we'll add it when we hit a
  proxy that's v6-only.
- Parallel test execution (e.g. running `connect` and `udp` simultaneously
on separate sessions). Sequential is clearer for the UI; we'll revisit
if total runtime exceeds 10s on common networks.
- TLS certificate pinning on `api`. The default `tls.Config` is used, which is
  fine for a reachability check.