The free tier generates the data and the network. The paid tier generates the insights. Developers join because the free offering is genuinely valuable on its own — community, ladder levels, verified profiles. Their participation enriches the calibrated benchmark that makes paid insights possible. The benchmark is the moat: you can export your profile, but you can't export the distribution that makes it meaningful.
Verified developers are more placeable. Tier 1 gives basic skill signals to the marketplace. Tier 2 adds calibrated, context-specific capability data — "writes production Python independently, delegates tests to AI, reviews thoroughly."
Proof-of-work summaries replace or supplement assessment pipelines. Skill-in-context eliminates the "Python without context" problem. AI authorship detection separates verified human skill from AI-generated output.
Training ROI becomes measurable: "After your bootcamp, these 40 devs increased independent coding by 25%. These 15 didn't." The coaching model embeds lessons directly into developer workflows and scales training beyond train-the-trainer limits.
For each work unit, SkillBench produces a crystallized paragraph — what the human prompted, where they pivoted, what the outcome was. Structured like driving directions: junctures, not a flat trace. Scannable in seconds so stamping velocity isn't impeded.
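A minimal sketch of what such a juncture-structured summary record might look like, assuming illustrative class and field names (not the actual SkillBench schema):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Juncture:
    """One decision point in the work unit, like a turn in driving directions."""
    prompt_gist: str   # what the human asked for at this point
    pivot: str         # how the approach changed, if it did
    outcome: str       # what resulted (commit, revert, handoff to the agent, ...)


@dataclass
class WorkUnitSummary:
    """Crystallized paragraph plus its junctures; meant to be scannable in seconds."""
    work_unit_id: str
    paragraph: str                       # the one-paragraph narrative a stamper reads
    junctures: List[Juncture] = field(default_factory=list)

    def render(self) -> str:
        """Flatten to the short text a stamper would actually see."""
        steps = "\n".join(
            f"  {i + 1}. {j.prompt_gist} -> {j.pivot or 'no pivot'} -> {j.outcome}"
            for i, j in enumerate(self.junctures)
        )
        return f"{self.paragraph}\n{steps}"
```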
Every stamp carries a coarse signal distinguishing human contribution from agent output. Conversation-to-artifact linkage via timestamps on both sides (chat turns and commits/PRs). The stamper sees the summary — never the raw logs. Hard architectural constraint, not policy.
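A minimal sketch of timestamp-based conversation-to-artifact linkage, assuming both sides have been reduced to (id, UTC timestamp) pairs; the window size and pairing rule are illustrative assumptions, not SkillBench's actual heuristics:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def link_turns_to_commits(
    chat_turns: List[Tuple[str, datetime]],   # (turn_id, timestamp)
    commits: List[Tuple[str, datetime]],      # (commit_sha, timestamp)
    window: timedelta = timedelta(minutes=30),
) -> Dict[str, List[str]]:
    """Attribute each commit/PR to the chat turns that preceded it within `window`."""
    linkage: Dict[str, List[str]] = {}
    for sha, committed_at in commits:
        linkage[sha] = [
            turn_id
            for turn_id, turn_at in chat_turns
            if committed_at - window <= turn_at <= committed_at
        ]
    return linkage
```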
Full redacted conversation logs stored for future reprocessing by smarter models. The stamp summary is the interface; the ground truth is the archive. Proof of work ≠ proof of skill — the stamp says "this human engaged meaningfully." Skill assessment is built separately on aggregated data.
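A sketch of that interface/archive split, assuming illustrative names: the stamp record carries only the summary, the coarse human/agent signal, and an opaque pointer into the redacted-log archive, and nothing the stamper-facing view exposes returns raw logs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Stamp:
    work_unit_id: str
    summary: str        # the crystallized paragraph the stamper sees
    human_share: str    # coarse signal, e.g. "mostly-human" / "mostly-agent"
    archive_ref: str    # opaque key into the redacted-log archive


class StamperView:
    """What the stamping UI may read. No method returns raw logs; the archive
    is reachable only from a separate reprocessing service."""

    def __init__(self, stamps: Dict[str, Stamp]):
        self._stamps = stamps

    def get_summary(self, work_unit_id: str) -> str:
        return self._stamps[work_unit_id].summary
```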
SkillBench captures developer behavior across Claude, Codex, Copilot, Gemini, and Cursor — simultaneously. No model company can be the neutral arbiter of training effectiveness across competing platforms. As Andela scales training partnerships with OpenAI, Anthropic, NVIDIA, and GitHub, SkillBench is the only measurement infrastructure that works across all of them.
Every model company wants developer training at scale but can only measure their own tool. Andela needs to prove training ROI regardless of which AI platform the learner uses. SkillBench provides the cross-platform skill signal that makes multi-partner training programs measurable — and accountable.
Session data never leaves the machine until the developer explicitly reviews and pushes it. The boot block tool auto-classifies projects by visibility and license — proprietary code is excluded by default. This is user-curated sharing, not surveillance. It keeps the free tier genuinely generous: the act of opting in is itself what developers trade for the value they get.
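A minimal sketch of that default-exclude classification, assuming a local checkout with an optional license file; the marker list and decision rule are illustrative assumptions, not the boot block tool's actual policy:

```python
from pathlib import Path

# Substrings that signal a recognizable open-source license (illustrative list).
OPEN_SOURCE_MARKERS = (
    "MIT License",
    "Apache License",
    "GNU GENERAL PUBLIC LICENSE",
    "BSD 2-Clause",
    "BSD 3-Clause",
    "Mozilla Public License",
)


def classify_project(repo_root: Path) -> str:
    """Return 'shareable' only when an open-source license is detected;
    everything else defaults to 'excluded' (treated as proprietary)."""
    for name in ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"):
        license_path = repo_root / name
        if license_path.is_file():
            text = license_path.read_text(errors="ignore").lower()
            if any(marker.lower() in text for marker in OPEN_SOURCE_MARKERS):
                return "shareable"
    return "excluded"  # proprietary by default: no recognizable license, no sharing
```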
The calibrated benchmark gets richer with scale. At 1M users, it's granular enough to be diagnostic — "developers who score like you break through to 90th percentile by changing how they prompt on decomposition steps." No competitor can replicate it without building both sides (controlled difficulty + real-world telemetry) plus years of longitudinal data.