The Calibration Layer for Human-AI Work
Architecture Overview
March 2026

The labor market is moving from binary to float.

Float requires measurement. SkillBench is the measurement layer — the only calibrated benchmark for human-AI collaboration, validated across controlled problems and real-world work.

Developers own their data. We own the distribution that makes it meaningful.

Four Compounding Layers

L4 · Talent Signals
The revenue engine.
Verified capability search · growth trajectory · continuous validation · marketplace integration
▲ builds on
L3 · Coaching Model
The flywheel. Prototype running.
Agent skills distilled from interaction patterns · causal A/B paired data · work becomes transferable knowledge
▲ builds on
L2 · Calibrated Benchmark
The lock. Grows with every user.
Percentile rankings by problem type, difficulty, language, AI tool, validated across controlled + real-world settings
▲ builds on
L1 · Individual Metrics
Free, portable. The growth engine.
Cognitive efficiency · authorship attribution · revision patterns · AI collaboration quality · developer-owned
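
To make the relationship between individual metrics and the calibrated benchmark concrete, here is a minimal sketch of percentile ranking within a stratum. It is an illustration under stated assumptions, not SkillBench's implementation: the `Benchmark` class, the stratum key layout, and the mid-rank tie rule are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stratum key: (problem_type, difficulty, language, ai_tool).
Stratum = Tuple[str, str, str, str]


class Benchmark:
    """Illustrative pooled score distribution, one sorted list per stratum."""

    def __init__(self) -> None:
        self._scores: Dict[Stratum, List[float]] = defaultdict(list)

    def add(self, stratum: Stratum, score: float) -> None:
        # Insert in sorted position so rank lookups stay O(log n).
        scores = self._scores[stratum]
        scores.insert(bisect_right(scores, score), score)

    def percentile(self, stratum: Stratum, score: float) -> Optional[float]:
        scores = self._scores.get(stratum)
        if not scores:
            return None  # no distribution yet, so the raw score cannot be ranked
        below = bisect_left(scores, score)
        ties = bisect_right(scores, score) - below
        # Mid-rank convention (an assumption): ties count as half below.
        return 100.0 * (below + 0.5 * ties) / len(scores)


bench = Benchmark()
key = ("algorithms", "medium", "python", "none")
for s in [10.0, 20.0, 30.0, 40.0]:
    bench.add(key, s)
print(bench.percentile(key, 30.0))  # 62.5
```

The raw score is the part a developer holds and can carry elsewhere. The percentile exists only relative to the pooled scores in the same stratum, which is why the benchmark grows in value with every user.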

Dual Validation Architecture

Controlled Problems

  • Coding challenge platforms (e.g. Codewars) — 3-5M developers
  • Known difficulty + verified test suites
  • Assessment platforms (HackerRank, Qualified, etc.)
  • SkillBench diagnostic instruments
→ instrument validation

Real-World Work

  • Enterprise deployments with partner organizations
  • Production telemetry at scale
  • Complex, proprietary workflows
  • Ecological validity across industries
→ ecological validation

Why This Can't Be Replicated

Structural Conflict

Model companies can't be neutral arbiters of "how well are you using AI?" when the answer might be "use a different model."

Dual Dataset

Controlled + real-world. The gap between a developer's controlled and real-world results is a diagnostic signal that only a platform holding both datasets can see.

Temporal Depth

Longitudinal behavioral patterns can't be fast-forwarded. Two years of data is worth far more than a day-one snapshot.

Causal Data

Paired A/B structure from coaching interventions. The only causally structured human-AI optimization dataset.
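
As a rough illustration of why paired structure matters, the sketch below estimates a coaching effect from paired observations. It assumes each developer contributes one score without the intervention and one with it on comparable tasks. The function name and the sample numbers are hypothetical, not SkillBench data.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_effect(baseline: Sequence[float],
                  coached: Sequence[float]) -> Tuple[float, float]:
    """Return (mean paired difference, standard error of that mean)."""
    if len(baseline) != len(coached):
        raise ValueError("each baseline score needs a matching coached score")
    if len(baseline) < 2:
        raise ValueError("need at least two pairs to estimate spread")
    # Differencing within each pair cancels stable per-developer skill,
    # leaving the change associated with the intervention.
    diffs = [c - b for b, c in zip(baseline, coached)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Made-up scores for four developers, without and with coaching.
effect, std_err = paired_effect([10, 12, 14, 16], [11, 14, 15, 18])
print(effect)  # 1.5
```

An unpaired comparison of the same scores would mix the coaching effect with differences between developers. The pairing is what removes that variation.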

Partnership Value

What SkillBench Enables for Partners

Training & Development

  • Instrument learners during real work — not synthetic assessments
  • Measure impact: leverage gains × capability growth
  • Scale coaching via transferable agent skills
  • Model-agnostic: works across Claude, Codex, Copilot, Gemini

Matching & Assessment

  • Skill-in-context signals (Python for SWE ≠ Python for analysis)
  • Evidence-based capability, not self-reported keywords
  • Continuous validation — not point-in-time testing
  • Authorship attribution: who did the work?

Marketplace & Credentials

  • Portable, verified proof-of-work record that travels with the developer
  • Calibrated benchmark makes individual signals meaningful
  • Network effect: more users → better benchmarks → more value

Privacy Architecture

Developer trust is a design constraint, not a feature.

All telemetry stays local until the developer explicitly reviews and pushes it. No silent collection. No employer surveillance. No raw data leaves the device without consent. Group-level insights are available to organizations — never individual-level data without the developer's opt-in. Developers can export or delete their data at any time.
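
The consent rules above can be sketched as a local buffer with an explicit review gate, plus a group-level aggregate. This is a minimal sketch, not the actual client: `LocalTelemetry`, `group_insight`, and in particular the `MIN_GROUP_SIZE` threshold are assumptions added for illustration and are not stated in this document.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class LocalTelemetry:
    """Events stay in `pending` on the device until reviewed and pushed."""
    pending: List[dict] = field(default_factory=list)
    reviewed: bool = False

    def record(self, event: dict) -> None:
        self.pending.append(event)
        self.reviewed = False  # new events require a fresh review

    def review(self) -> List[dict]:
        # The developer inspects a copy of exactly what would be sent.
        self.reviewed = True
        return list(self.pending)

    def push(self) -> List[dict]:
        if not self.reviewed:
            raise PermissionError("developer has not reviewed pending events")
        sent, self.pending = self.pending, []
        self.reviewed = False
        return sent  # a real client would upload these

    def delete_all(self) -> None:
        self.pending.clear()
        self.reviewed = False


MIN_GROUP_SIZE = 5  # hypothetical threshold, not specified in this document


def group_insight(scores_by_developer: Dict[str, float]) -> Optional[float]:
    """Return a group mean only; suppress it for groups that are too small."""
    if len(scores_by_developer) < MIN_GROUP_SIZE:
        return None
    return mean(scores_by_developer.values())
```

Calling `push()` before `review()` raises `PermissionError`, which models "no silent collection." Suppressing small groups is one common way to keep a group-level number from revealing an individual's score.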

You can export your numerator. You can't export the distribution.