SkillBench — Architecture Overview

The labor market is moving from binary to float.

Float requires measurement. SkillBench is the measurement layer — the only calibrated benchmark for human-AI collaboration, validated across controlled problems and real-world work.

Developers own their data. We own the distribution that makes it meaningful.

Four Compounding Layers

Talent Signals

The revenue engine.

Verified capability search · growth trajectory · continuous validation · marketplace integration

▲ builds on

Coaching Model

The flywheel. Prototype running.

Agent skills distilled from interaction patterns · causal A/B paired data · work becomes transferable knowledge

▲ builds on

Calibrated Benchmark

The lock. Grows with every user.

Percentile rankings by problem type, difficulty, language, AI tool — validated across controlled + real-world settings

▲ builds on

Individual Metrics

Free, portable. The growth engine.

Cognitive efficiency · authorship attribution · revision patterns · AI collaboration quality · developer-owned
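
A minimal sketch of how the Calibrated Benchmark layer could turn an individual metric into a percentile, assuming one reference distribution per stratum of problem type, difficulty, language, and AI tool. The `Benchmark` class, the `Stratum` key, and the mid-rank tie handling are illustrative assumptions, not SkillBench's actual implementation.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical stratum key: (problem_type, difficulty, language, ai_tool).
Stratum = tuple[str, str, str, str]


class Benchmark:
    """Reference distribution of scores, kept separately per stratum."""

    def __init__(self) -> None:
        self._scores: dict[Stratum, list[float]] = defaultdict(list)

    def add(self, stratum: Stratum, score: float) -> None:
        scores = self._scores[stratum]
        scores.insert(bisect_right(scores, score), score)  # keep the list sorted

    def percentile(self, stratum: Stratum, score: float) -> float | None:
        """Mid-rank percentile of `score` in its stratum, or None if no data."""
        scores = self._scores.get(stratum)
        if not scores:
            return None
        below = bisect_left(scores, score)
        ties = bisect_right(scores, score) - below
        return 100.0 * (below + 0.5 * ties) / len(scores)


bench = Benchmark()
key = ("debugging", "medium", "python", "copilot")  # made-up example stratum
for s in [0.9, 0.2, 0.4, 0.7, 0.4]:
    bench.add(key, s)

print(bench.percentile(key, 0.4))  # 40.0
print(bench.percentile(("debugging", "hard", "go", "claude"), 0.4))  # None
```

The same score maps to no percentile at all in a stratum with no reference data, which is the sense in which an individual metric depends on the distribution behind it.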

Dual Validation Architecture

Controlled Problems

Coding challenge platforms (e.g. Codewars) — 3-5M developers
Known difficulty + verified test suites
Assessment platforms (e.g. HackerRank, Qualified)
SkillBench diagnostic instruments

→ instrument validation

Real-World Work

Enterprise deployments with partner organizations
Production telemetry at scale
Complex, proprietary workflows
Ecological validity across industries

→ ecological validation

Why This Can't Be Replicated

⚡

Structural Conflict

Model companies can't be neutral arbiters of "how well are you using AI?" when the answer might be "use a different model"

◈

Dual Dataset

Controlled + real-world. The gap between them is diagnostic insight only we can see

◷

Temporal Depth

Longitudinal behavioral patterns can't be fast-forwarded. Two years of data >> day-one snapshot

⇄

Causal Data

Paired A/B structure from coaching interventions. The only causally structured human-AI optimization dataset
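
As a rough illustration of why the paired structure matters, the sketch below estimates a coaching effect from matched pairs. It assumes each pair is the same developer measured on comparable tasks without and with a coaching intervention; the source does not specify the pairing, so that design, the function name `paired_effect`, and the sample numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_effect(control, coached):
    """Return (mean within-pair difference, standard error of that mean)."""
    if len(control) != len(coached) or len(control) < 2:
        raise ValueError("need at least two matched pairs")
    # Differencing within each pair cancels stable per-developer variation.
    diffs = [b - a for a, b in zip(control, coached)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Made-up scores for four developers, without and with coaching.
control = [10, 12, 9, 11]
coached = [12, 15, 10, 13]
effect, std_err = paired_effect(control, coached)
print(effect, round(std_err, 3))  # 2 0.408
```

Because each developer serves as their own baseline, differences between developers drop out of the estimate, which an unpaired before/after comparison across different people cannot offer.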

Partnership Value

What SkillBench Enables for Partners

Training & Development

Instrument learners during real work — not synthetic assessments
Measure impact: leverage gains × capability growth
Scale coaching via transferable agent skills
Model-agnostic: works across Claude, Codex, Copilot, Gemini

Matching & Assessment

Skill-in-context signals (Python for software engineering ≠ Python for analysis)
Evidence-based capability, not self-reported keywords
Continuous validation — not point-in-time testing
Authorship attribution: who did the work?

Marketplace & Credentials

Portable, verified proof-of-work records that travel with the developer
Calibrated benchmark makes individual signals meaningful
Network effect: more users → better benchmarks → more value

Privacy Architecture

Developer trust is a design constraint, not a feature.

All telemetry stays local until the developer explicitly reviews and pushes it. No silent collection. No employer surveillance. No raw data leaves the device without consent. Organizations see group-level insights; individual-level data is shared only with the developer's opt-in. Developers can export or delete their data at any time.
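
One way the local-first flow above could look in code, as a sketch only. The `LocalTelemetry` class, the `send` callback, and the `min_group_size` suppression threshold are assumptions introduced for illustration; the source states the policy, not a mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LocalTelemetry:
    """Events stay in `pending` on the device until explicitly approved."""

    pending: list[dict] = field(default_factory=list)

    def record(self, event: dict) -> None:
        self.pending.append(event)  # stored locally; nothing is sent here

    def review(self) -> list[dict]:
        return [dict(e) for e in self.pending]  # copies for inspection

    def push(self, approved: list[int], send: Callable[[dict], None]) -> int:
        """Send only the events whose indices the developer approved."""
        chosen = sorted(i for i in set(approved) if 0 <= i < len(self.pending))
        for i in chosen:
            send(self.pending[i])
        self.pending = [e for i, e in enumerate(self.pending) if i not in chosen]
        return len(chosen)

    def delete_all(self) -> None:
        self.pending.clear()


def group_summary(scores: list[float], min_group_size: int = 5) -> dict | None:
    """Aggregate view for organizations; suppressed for small groups.

    The threshold is an assumed safeguard so that a group-level number
    cannot be traced back to one or two individuals.
    """
    if len(scores) < min_group_size:
        return None
    return {"n": len(scores), "mean": sum(scores) / len(scores)}


sent: list[dict] = []  # stands in for a network upload
telemetry = LocalTelemetry()
telemetry.record({"task": "refactor", "revisions": 3})
telemetry.record({"task": "bugfix", "revisions": 1})
telemetry.push(approved=[1], send=sent.append)
print(sent)               # [{'task': 'bugfix', 'revisions': 1}]
print(telemetry.pending)  # [{'task': 'refactor', 'revisions': 3}]
```

Only the approved event reaches `send`; the unapproved one stays on the device and can still be removed with `delete_all`.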

You can export your numerator. You can't export the distribution.