SkillBench
Internal — Platform Strategy

Three Tiers, One Flywheel

How SkillBench grows from free profiles to the calibrated benchmark that makes developer skill measurable, portable, and trustworthy — for individuals, marketplaces, and enterprises.

The Core Dynamic

The free tier generates the data and the network. The paid tier generates the insights. Developers join because the free offering is genuinely valuable on its own — community, ladder levels, verified profiles. Their participation enriches the calibrated benchmark that makes paid insights possible. The benchmark is the moat: you can export your profile, but you can't export the distribution that makes it meaningful.

◇ Tier 0

Boot Block

Work-product profile derived from public GitHub data. The entry point.
Requires: GitHub account
  • Character sheet from public repos, commits, PRs, contribution patterns
  • Language & domain presence mapping (not proficiency — presence)
  • Archetype classification from work products
  • AI authorship confidence signal — detected from commit patterns, not claimed
  • Basic community profile
Purpose: Creates the "wait, this doesn't reflect what I actually know" moment. Drives hunger for telemetry-derived insight.
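One way the language and domain presence map could be derived — a minimal sketch assuming the JSON shape of GitHub's public `GET /users/{user}/repos` endpoint; `language_presence` is an illustrative helper, not a shipped API:

```python
from collections import Counter

def language_presence(repos: list[dict]) -> dict[str, float]:
    """Map each language to its share of a user's public repos.

    `repos` is the JSON list returned by GitHub's public
    GET /users/{user}/repos endpoint; only the `language` and
    `fork` fields are used. Forks are skipped so the map reflects
    authored work. This measures presence, not proficiency.
    """
    langs = Counter(
        r["language"]
        for r in repos
        if r.get("language") and not r.get("fork")
    )
    total = sum(langs.values())
    return {lang: n / total for lang, n in langs.items()} if total else {}
```

A profile built this way is deliberately shallow — which is exactly what triggers the "this doesn't reflect what I actually know" moment.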

◆ Tier 1 — Free

Community Member

Telemetry-enriched profile with social features. The network effect engine.
Requires: SkillMeter extension + opt-in data push
  • Agentic Engineering Ladder level (L1 Dabbler → L5 Maestro)
  • Level-up guide — specific gaps + examples from users ahead of you
  • Weekly progression trends on your own data
  • Community percentile rank (anonymized by default)
  • Mentor matching (opt-in, bidirectional consent)
  • Leaderboards with privacy controls
  • Portable achievements — export to LinkedIn, Wasteland, marketplaces
  • Proof-of-work summaries — portable evidence of how work got done, not just what shipped
Purpose: Every dev who joins enriches the calibrated benchmark. Social features drive retention. Free because the data they contribute is worth more than what they consume.
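The weekly progression trend on a developer's own data could be as simple as a least-squares slope over their weekly metric — a sketch of the idea; `weekly_trend` is illustrative, not the production signal:

```python
def weekly_trend(weekly_scores: list[float]) -> float:
    """Least-squares slope of a developer's own weekly metric.

    Positive means the signal is improving week over week;
    the magnitude is the average change per week.
    """
    n = len(weekly_scores)
    if n < 2:
        return 0.0
    xs = range(n)
    mx = sum(xs) / n
    my = sum(weekly_scores) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, weekly_scores))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```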

★ Tier 2 — Paid

Enterprise / Professional

Calibrated insights for individuals and organizations. The revenue layer.
Individual ~$10-20/mo  |  Enterprise per-seat
  • Calibrated benchmark comparison ("75th percentile among engineers at your level")
  • AI delegation pattern analysis — learning vs. just shipping
  • Skill-in-context mapping (Python for pipelines ≠ Python for APIs)
  • Coaching model integration — lessons embedded in your agent's behavior
  • CXO Dashboards — org-wide rollups, scenario modeling, board-ready reporting (enterprise)
  • Director Playbooks — group-level detail, operational coaching (enterprise)
  • Impact measurement — behavioral evidence of leverage gains and capability growth (enterprise)
  • Enhanced matching signals for marketplaces (enterprise)
Purpose: Calibrated leverage × capability signals at every altitude — individual, team, org. The same optimization, the same data flywheel, different views.
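A claim like "75th percentile among engineers at your level" implies ranking a score within a same-level cohort. A minimal sketch of that computation — `calibrated_percentile` is illustrative, and the midpoint convention for ties is one choice among several:

```python
from bisect import bisect_left, bisect_right

def calibrated_percentile(score: float, cohort: list[float]) -> int:
    """Percentile of `score` within a same-level cohort.

    Uses the midpoint convention (average of strict-below and
    at-or-below ranks) so ties neither inflate nor deflate the rank.
    """
    if not cohort:
        return 0
    s = sorted(cohort)
    lo = bisect_left(s, score)
    hi = bisect_right(s, score)
    return round(100 * (lo + hi) / (2 * len(s)))
```

The calibration itself — making scores comparable across tools and contexts — is the hard part; the ranking is the easy last step.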

The Learning Flywheel

Every developer who joins makes the community smarter. Their work becomes agent skills. Agent skills become training at scale.
  • 👤 Boot Block: free profile, the on-ramp
  • Work + Learn: dev builds with AI tools
  • 🔬 Insights Captured: telemetry → skill signals
  • 🧩 Agent Skills: shareable, plug-and-play
  • 🌱 Community Grows: next dev starts smarter
  • 🏔️ More Devs Join: network effects compound

Andela Product Alignment

SkillBench enriches all three Andela product lines — Matching, Assessment, and Learning.

Matching

Tier 1 (free) + Tier 2 (paid)

Verified developers are more placeable. Tier 1 gives basic skill signals to the marketplace. Tier 2 adds calibrated, context-specific capability data — "writes production Python independently, delegates tests to AI, reviews thoroughly."

Assessment

Tier 2 (paid)

Proof-of-work summaries replace or supplement assessment pipelines. Skill-in-context eliminates the "Python without context" problem. AI authorship detection separates verified human skill from AI-generated output.

Learning / Training

Tier 2 (paid)

Training ROI becomes measurable: "After your bootcamp, these 40 devs increased independent coding by 25%. These 15 didn't." Coaching model embeds lessons directly into developer workflows. Scales training beyond train-the-trainer limits.

Wasteland — Proof of Work

SkillBench solves the stamping problem: "Am I just giving stamps to Claude with different accounts?"

Stamp Summaries

Tier 1 (free)

For each work unit, SkillBench produces a crystallized paragraph — what the human prompted, where they pivoted, what the outcome was. Structured like driving directions: junctures, not a flat trace. Scannable in seconds so stamping velocity isn't impeded.
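"Junctures, not a flat trace" suggests a small structured record per work unit. A sketch of what that could look like — `Juncture`, `StampSummary`, and the `PR-1` identifier below are all hypothetical, not a defined schema:

```python
from dataclasses import dataclass

@dataclass
class Juncture:
    timestamp: str  # ISO 8601
    kind: str       # e.g. "prompt", "pivot", "outcome"
    note: str       # one scannable line

@dataclass
class StampSummary:
    work_unit: str           # e.g. a PR identifier
    junctures: list[Juncture]

    def render(self) -> str:
        """Driving-directions view: one line per juncture,
        scannable in seconds so stamping velocity isn't impeded."""
        return "\n".join(f"[{j.kind}] {j.note}" for j in self.junctures)
```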

Human-vs-Agent Signal

Tier 1 (free)

Every stamp carries a coarse signal distinguishing human contribution from agent output. Conversation-to-artifact linkage via timestamps on both sides (chat turns and commits/PRs). The stamper sees the summary — never the raw logs. Hard architectural constraint, not policy.
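The timestamp linkage could work by counting chat turns in a window preceding each commit — a coarse proxy, not attribution. A sketch under that assumption; the 30-minute window and `link_turns_to_commits` helper are illustrative:

```python
from datetime import datetime, timedelta

def link_turns_to_commits(
    turn_times: list[datetime],
    commit_times: list[datetime],
    window_minutes: int = 30,
) -> dict[datetime, int]:
    """For each commit, count chat turns in the preceding window.

    A coarse human-contribution signal: many turns before a commit
    suggests the human steered the work; zero turns suggests the
    artifact arrived without conversation. Only the count is exposed
    to the stamper — never the raw turns.
    """
    window = timedelta(minutes=window_minutes)
    return {
        c: sum(1 for t in turn_times if c - window <= t <= c)
        for c in commit_times
    }
```

Keeping the raw logs out of the stamper's view is then a matter of only ever exporting the counts — the architectural constraint the section describes.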

Ground Truth Archive

Tier 2 (paid)

Full redacted conversation logs stored for future reprocessing by smarter models. The stamp summary is the interface; the ground truth is the archive. Proof of work ≠ proof of skill — the stamp says "this human engaged meaningfully." Skill assessment is built separately on aggregated data.

Model-Agnostic by Design

The Neutral Measurement Layer

SkillBench captures developer behavior across Claude, Codex, Copilot, Gemini, and Cursor — simultaneously. No model company can be the neutral arbiter of training effectiveness across competing platforms. As Andela scales training partnerships with OpenAI, Anthropic, NVIDIA, and GitHub, SkillBench is the only measurement infrastructure that works across all of them.

Why This Matters Now

Every model company wants developer training at scale but can only measure their own tool. Andela needs to prove training ROI regardless of which AI platform the learner uses. SkillBench provides the cross-platform skill signal that makes multi-partner training programs measurable — and accountable.

The Privacy-First Advantage

Developer-Curated Data Sharing

Session data never leaves the machine until the developer explicitly reviews and pushes it. The boot block tool auto-classifies projects by visibility and license — proprietary code is excluded by default. This is user-curated sharing, not surveillance. The free tier can afford to be generous because the curated data developers choose to push is itself what they trade for value.
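The exclude-by-default posture is just a strict gate: anything not provably public and permissively licensed stays local. A sketch assuming a flattened project record (`visibility` and `license` as plain strings — the real classifier would need richer inputs); `shareable` and the `PERMISSIVE` set are illustrative:

```python
# Licenses treated as safe to share; anything else is excluded.
PERMISSIVE = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause"}

def shareable(project: dict) -> bool:
    """Opt-in gate: only public projects with a known permissive
    license pass. Missing or unknown fields fail closed, so
    proprietary code is excluded by default."""
    return (
        project.get("visibility") == "public"
        and (project.get("license") or "").lower() in PERMISSIVE
    )
```

Failing closed is the design choice that makes the privacy claim credible: the burden of proof is on sharing, not on withholding.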

Why This Is Defensible

The calibrated benchmark gets richer with scale. At 1M users, it's granular enough to be diagnostic — "developers who score like you break through to 90th percentile by changing how they prompt on decomposition steps." No competitor can replicate without building both sides (controlled difficulty + real-world telemetry) plus years of longitudinal data.