Developer Archetypes

Unsupervised clustering of 9,896 top GitHub developers by 22 behavioral features — SkillBench Research Lab, UCSB

9,896 developers analyzed
22 behavioral features
4 primary clusters
7 total archetypes

Finding the Natural Groupings

We extracted 22 features from each developer's public GitHub profile (value dimensions, activity patterns, skill breadth, AI-era signals, and identity metadata), then ran k-means clustering to find natural groupings. The silhouette score peaks at k=4, which gives the four primary clusters described below.

[Figure: Silhouette analysis]
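As a rough illustration of how a silhouette peak selects k, here is a dependency-free sketch on synthetic data. Everything in it is an assumption made for illustration: the four 2D blobs stand in for the real 22-feature matrix, the farthest-first seeding is a simplification chosen for determinism, and the helper names `kmeans` and `silhouette` are hypothetical. It is not the lab's pipeline; in practice a library such as scikit-learn (`KMeans`, `silhouette_score`) would normally be used.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (a simplification)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of p's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data: four well-separated 2D blobs of 30 points each.
rng = random.Random(0)
blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

On this toy data the score should be highest at the true number of blobs, because any smaller k merges blobs and any larger k splits one. Real profile data is far less cleanly separated, so the peak there would be shallower.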

9,896 Developers in 2D

PCA projection of the 22-dimensional feature space. Clusters overlap in 2D because the real separation lives in higher dimensions — but the structure is visible.

[Figure: PCA scatter]
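A 2D PCA projection of this kind can be sketched without any libraries by running power iteration on the covariance matrix. The data below is synthetic (22 columns, with most of the variance placed in the first two by construction), and the function name `pca_2d` and its parameters are hypothetical. This is only a sketch of the general technique, not the code behind the figure; a library routine such as scikit-learn's `PCA` would be the usual choice.

```python
import math
import random
from statistics import pvariance


def pca_2d(rows, iters=200, seed=0):
    """Project rows onto their top two principal components via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered data
    # Sample covariance matrix (d x d).
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]  # random start vector
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the runner-up.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    coords = [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]
    return coords, components


# Synthetic stand-in for a 22-feature matrix: two high-variance columns, twenty noisy ones.
rng = random.Random(1)
rows = [[rng.gauss(0, 5), rng.gauss(0, 2)] + [rng.gauss(0, 0.3) for _ in range(20)]
        for _ in range(200)]

coords, components = pca_2d(rows)
var_pc1 = pvariance([c[0] for c in coords])
var_pc2 = pvariance([c[1] for c in coords])
print(round(var_pc1, 2), round(var_pc2, 2))
```

The two printed variances show how much spread each axis of the scatter plot captures. Whatever variance remains in the other 20 directions is invisible in 2D, which is one reason clusters can overlap in the projection while still being separable in the full space.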

Cluster Value Profiles

Radar charts showing each cluster's average score across six value dimensions, compared to the global mean.

[Figure: Radar charts]
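The numbers behind a radar chart like this are per-cluster means set against the global mean. A minimal sketch follows, assuming each developer is a row of six value scores plus a cluster label. The toy rows, the labels "A" and "B", and the function name `value_profiles` are all hypothetical, and the dimensions are left unnamed because this section does not list them.

```python
from statistics import fmean


def value_profiles(rows, labels):
    """Return the global mean vector and one mean vector per cluster label."""
    dims = range(len(rows[0]))
    global_mean = [fmean(r[j] for r in rows) for j in dims]
    members = {}
    for row, lab in zip(rows, labels):
        members.setdefault(lab, []).append(row)
    cluster_means = {lab: [fmean(r[j] for r in grp) for j in dims]
                     for lab, grp in members.items()}
    return global_mean, cluster_means


# Toy data: four developers, six value scores each, two clusters.
rows = [
    (0.8, 0.6, 0.4, 0.2, 0.6, 0.4),
    (0.6, 0.4, 0.2, 0.4, 0.8, 0.6),
    (0.2, 0.2, 0.0, 0.0, 0.2, 0.2),
    (0.4, 0.0, 0.2, 0.2, 0.4, 0.0),
]
labels = ["A", "A", "B", "B"]

global_mean, cluster_means = value_profiles(rows, labels)
# Difference from the global mean: positive values bulge outward on the radar.
deltas = {lab: [m - g for m, g in zip(means, global_mean)]
          for lab, means in cluster_means.items()}
print(cluster_means["A"], global_mean)
```

Each cluster's vector would be drawn as one polygon and the global mean as a reference polygon, so `deltas` shows where a cluster sits above or below the average.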
Seasoned Builders
4,118 (41.6%)

High pre-AI history (78%), active cadence, broad collaboration. The backbone of open source.

Login | Name | Followers | Annual Contribs | Pre-AI | Reviews
torvalds | Linus Torvalds | 287,983 | 3,264 | 87% | 2
karpathy | Andrej | 135,647 | 859 | 69% | 6
yyx990803 | Evan You | 107,039 | 1,902 | 94% | 17
gaearon | dan | 90,422 | 836 | 78% | 42
ruanyf | Ruan YiFeng | 85,507 | 685 | 97% | 0
Pre-AI Veterans (Quiet)
4,037 (40.8%)

Deep pre-AI foundations (91%) but sporadic current activity. Built things, then stepped back.

LoginNameFollowersAnnual ContribsPre-AIReviews
gustavoguanabaraGustavo Guanabara109,928857%0
peng-zhihui空心86,188097%0
rafaballeriniRafaella Ballerini59,052094%0
tjTJ51,5712100%0
IDoubleAlp ₿📈🚀🌕50,2762495%0
AI-Era Arrivals
1,268 (12.8%)

Low pre-AI ratio (23%), newer accounts (~5yr), moderate activity. Became visible after AI coding tools emerged.

LoginNameFollowersAnnual ContribsPre-AIReviews
dmalanDavid J. Malan35,793710%16
elyxdevElyx26,763630%0
lucast1574Lucas Santillan21,1203770%0
OracleBrainAashis Jha15,368327%0
george0stjist13,5081,1090%0
Minimal Evidence Profiles
473 (4.8%)

High follower counts but near-zero activity, languages, and domains. Famous but not building publicly.

LoginNameFollowersAnnual ContribsPre-AIReviews
claudeClaude46,5970100%0
cumsoftcumsoft18,4100100%0
metatimeofficialMetatime18,02300%0
PremChapagainPrem Chapagain14,157433%0
ghostDeleted user12,14500%0

Sub-Archetypes: AI-Era Arrivals

The AI-Era Arrivals cluster contains developers who joined or became active primarily after AI coding tools. Sub-clustering reveals three distinct sub-types.

[Figure: AI-Era radar]
[Figure: AI-Era PCA]
Active AI-Native Builders
514 (41%)

Weekly cadence, highest quality in AI-Era group, some collaboration. Genuinely productive — but started in the AI era.

LoginNameFollowersAnnual ContribsPre-AIReviews
george0stjist13,5081,1090%0
jrohitofficialRohit Jha12,72164835%0
XiaomingXY1111,4653,7020%0
LaurieWiredLaurieWired11,0062080%0
elder-pliniuspliny10,9224770%0
Minimal Evidence Newcomers
415 (33%)

Accounts under 3 years, near-zero pre-AI commits, no code reviews. High social signal, low work-product signal.

LoginNameFollowersAnnual ContribsPre-AIReviews
elyxdevElyx26,763630%0
lucast1574Lucas Santillan21,1203770%0
pewdiepie-archdaemonPewDiePie13,2781300%0
elidianaandradeEli12,4232535%0
meliksahyorulmazlarmeliksahyorulmazlar10,711690%0
Faded Practitioners
339 (27%)

Some pre-AI history (62%) but activity declined. Had foundations, went quiet.

LoginNameFollowersAnnual ContribsPre-AIReviews
dmalanDavid J. Malan35,793710%16
OracleBrainAashis Jha15,368327%0
996icu996icu8,5664100%0
JCSIVOJCSIVO7,5122673%0
U7P4L-INU7P4L x C0D3R7,4552100%0

Sub-Archetypes: Seasoned Builders

The largest cluster splits into two recognizable types: individual contributors vs. community hubs.

[Figure: Seasoned Builder radar]
[Figure: Seasoned Builder PCA]
Solo Veterans
~2,500 (61%)

Deep individual contributors, strong foundations, low review/collaboration activity. Domain experts in their lane.

Login | Name | Followers | Annual Contribs | Pre-AI | Reviews
torvalds | Linus Torvalds | 287,983 | 3,264 | 87% | 2
karpathy | Andrej | 135,647 | 859 | 69% | 6
yyx990803 | Evan You | 107,039 | 1,902 | 94% | 17
gaearon | dan | 90,422 | 836 | 78% | 42
ruanyf | Ruan YiFeng | 85,507 | 685 | 97% | 0
Ecosystem Leaders
~1,600 (39%)

Highest collaboration breadth (30+), active code reviewers, daily cadence. Framework authors and open source maintainers.

Login | Name | Followers | Annual Contribs | Pre-AI | Reviews
alesanchezr | Alejandro Sanchez | 2,068 | 12,605 | 86% | 122
punkpeye | Frank Fiegel | 1,709 | 7,995 | 2% | 11

Complete Taxonomy

Summary
Archetype | N | % | Quality | Cadence | Pre-AI | Age | Top Exemplar
Solo Veterans | 4,116 | 41.6% | 0.76 | 3.1 | 78% | 13y | torvalds
Ecosystem Leaders | 2 | 0.0% | 0.61 | 3.5 | 44% | 10y | alesanchezr
Pre-AI Veterans (Quiet) | 4,037 | 40.8% | 0.68 | 1.1 | 91% | 12y | gustavoguanabara
Active AI-Native Builders | 514 | 5.2% | 0.51 | 2.8 | 14% | 5y | george0st
Minimal Evidence Newcomers | 415 | 4.2% | 0.36 | 1.7 | 2% | 3y | elyxdev
Faded Practitioners | 339 | 3.4% | 0.40 | 1.1 | 62% | 6y | dmalan
Minimal Evidence Profiles | 473 | 4.8% | 0.17 | 0.5 | 31% | 5y | claude

Key finding: Among the top 10,000 GitHub users by follower count, 12.8% became prominent primarily in the AI era. Within that group, a third show minimal evidence of engineering depth — high social signal with low work-product signal. From public artifacts alone, these profiles are indistinguishable from legitimate developers who simply work in private repos.
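The counts in the summary table and the shares quoted in the key finding can be cross-checked with a little arithmetic. The sketch below uses the N column as printed, with one assumption: the flattened Ecosystem Leaders row ("20.0%") is read as N = 2 and 0.0%, which is the only reading under which the seven counts sum to 9,896. Variable names are illustrative.

```python
TOTAL = 9_896  # developers analyzed

counts = {
    "Solo Veterans": 4_116,
    "Ecosystem Leaders": 2,  # assumption: "20.0%" read as N = 2, share 0.0%
    "Pre-AI Veterans (Quiet)": 4_037,
    "Active AI-Native Builders": 514,
    "Minimal Evidence Newcomers": 415,
    "Faded Practitioners": 339,
    "Minimal Evidence Profiles": 473,
}

# Share of all analyzed developers, rounded to one decimal as in the table.
shares = {name: round(100 * n / TOTAL, 1) for name, n in counts.items()}

# The three AI-Era Arrivals sub-archetypes together form the AI-era cluster.
ai_era_names = ("Active AI-Native Builders", "Minimal Evidence Newcomers", "Faded Practitioners")
ai_era_total = sum(counts[name] for name in ai_era_names)
ai_era_share = round(100 * ai_era_total / TOTAL, 1)

# Share of the AI-era cluster that falls in Minimal Evidence Newcomers.
newcomer_share = round(100 * counts["Minimal Evidence Newcomers"] / ai_era_total)

print(sum(counts.values()), ai_era_total, ai_era_share, newcomer_share)
# prints: 9896 1268 12.8 33
```

Under that reading, the seven archetypes account for every analyzed developer, the AI-era cluster comes to 1,268 (12.8%), and Minimal Evidence Newcomers make up about a third of it.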

The Telemetry Gap

This analysis shows what work products can tell you — and where they hit a wall.

Behavioral archetypes from public signal

Distinct patterns of quality, consistency, collaboration, and AI-era adaptation cluster reliably across ~10,000 profiles.

Decoupled social proof from evidence

High follower counts don't guarantee engineering depth. The data shows where these signals diverge.

Who actually wrote the code

Commits show who pushed, not who authored. A single commit may be 90% AI-generated or 100% handwritten — the git log looks identical.

Learning velocity

A developer who went from zero to competent in 6 months looks the same as one who plateaued 5 years ago. Static snapshots miss trajectories.

AI delegation patterns

Did they use AI for boilerplate and write the hard parts themselves? Or prompt-engineer the entire feature? GitHub doesn't know.

Skill in context

Knowing Python as a software engineer is different from knowing Python as a business analyst. Repos show the what, not the how or why.

SkillBench closes this gap by instrumenting the development environment itself — turning the process of coding into a legible signal. These archetypes are the boot block. Telemetry turns them into living profiles.