Unsupervised clustering of 9,896 top GitHub developers by 22 behavioral features — SkillBench Research Lab, UCSB
We extracted 22 features from each developer's public GitHub profile — value dimensions, activity patterns, skill breadth, AI-era signals, and identity metadata — then ran k-means clustering to find natural groupings. Silhouette analysis peaks at k=4.
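As a rough illustration of the procedure described above, the following stdlib-only Python sketch standardizes a feature matrix, runs k-means for several values of k, and keeps the k with the highest mean silhouette score. The toy data, the function names, the z-score scaling, and the deterministic farthest-point seeding are all assumptions made for illustration; the report does not say which implementation, scaling, or initialization was used.

```python
import math
import random


def standardize(rows):
    """Z-score each column so large-valued features do not dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_point_init(rows, k):
    """Deterministic seeding (an assumption): add the row farthest from all chosen centers."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    return centers


def kmeans(rows, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per row."""
    centers = farthest_point_init(rows, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def silhouette(rows, labels):
    """Mean silhouette coefficient, (b - a) / max(a, b), averaged over all rows."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for row, lab in zip(rows, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the row's own cluster
        a = sum(math.dist(row, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(row, other) for other in members) / len(members)
                for key, members in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(rows)


def toy_profiles(seed=42, per_blob=15):
    """Four synthetic, well-separated groups in two made-up features."""
    rng = random.Random(seed)
    corners = [(0, 0), (10, 0), (0, 10), (10, 10)]
    return [[cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)]
            for cx, cy in corners for _ in range(per_blob)]


rows = standardize(toy_profiles())
scores = {k: silhouette(rows, kmeans(rows, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On this toy data the silhouette peak lands at k = 4 only because the four groups were constructed that way; it says nothing about the real 22-feature dataset.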
PCA projection of the 22-dimensional feature space onto two components. Clusters overlap in 2D because much of the separation lies along dimensions the projection discards, but the broad structure is still visible.
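One way to see why a 2D view can hide separation is to check how much variance the first two principal components retain. The sketch below is a minimal stdlib-only PCA using power iteration with deflation; the function name `pca_2d`, the toy data, and the choice of power iteration are assumptions for illustration, not the lab's method.

```python
import math
import random


def pca_2d(rows, iters=200, seed=0):
    """Return (2D projection, explained-variance ratios) for the top two components."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d)) or 1.0

    components, ratios = [], []
    for _component in range(2):
        # Random unit start vector, so it is not orthogonal to the target direction.
        vec = [rng.gauss(0, 1) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / start_norm for x in vec]
        for _step in range(iters):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0:
                break  # no variance left to find
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: variance captured along vec.
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        ratios.append(eigval / total_var)
        # Deflate so the next pass finds the runner-up direction.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]

    projected = [[sum(c * v for c, v in zip(comp, row)) for comp in components]
                 for row in centered]
    return projected, ratios


# Toy data: six made-up features with different spreads (not the real feature set).
rng = random.Random(1)
spreads = [3, 2, 1, 1, 1, 1]
toy = [[rng.gauss(0, s) for s in spreads] for _ in range(50)]
points, ratios = pca_2d(toy)
print(len(points), [round(r, 2) for r in ratios])
```

Whatever share of variance the two ratios leave unaccounted for is spread that the scatter plot cannot show, which is one reason clusters that are distinct in the full feature space can appear to overlap in the projection.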
Radar charts showing each cluster's average score across six value dimensions, compared to the global mean.
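The comparison behind such a chart reduces to a per-cluster mean for each dimension, set against the global mean. A minimal sketch follows, with hypothetical scores and an illustrative function name `cluster_profiles`; the six actual value dimensions are not listed in the text, so three made-up ones stand in.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Global mean per dimension, plus each cluster's mean minus that global mean."""
    dims = len(rows[0])
    global_mean = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    deltas = {}
    for label, members in groups.items():
        mean = [sum(r[d] for r in members) / len(members) for d in range(dims)]
        deltas[label] = [m - g for m, g in zip(mean, global_mean)]
    return global_mean, deltas


# Hypothetical scores on three made-up dimensions for two clusters.
rows = [[0.8, 0.6, 0.2], [0.6, 0.8, 0.4], [0.2, 0.2, 0.6], [0.4, 0.0, 0.8]]
labels = ["a", "a", "b", "b"]
global_mean, deltas = cluster_profiles(rows, labels)
for label, delta in deltas.items():
    print(label, [round(x, 2) for x in delta])
```

A positive delta means the cluster sits above the global mean on that dimension, which corresponds to its radar polygon extending beyond the reference outline.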
Largest cluster: High pre-AI history (78%), active cadence, broad collaboration. The backbone of open source.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| torvalds | Linus Torvalds | 287,983 | 3,264 | 87% | 2 |
| karpathy | Andrej | 135,647 | 859 | 69% | 6 |
| yyx990803 | Evan You | 107,039 | 1,902 | 94% | 17 |
| gaearon | dan | 90,422 | 836 | 78% | 42 |
| ruanyf | Ruan YiFeng | 85,507 | 685 | 97% | 0 |
Pre-AI Veterans (Quiet): Deep pre-AI foundations (91%) but sporadic current activity. Built things, then stepped back.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| gustavoguanabara | Gustavo Guanabara | 109,928 | 8 | 57% | 0 |
| peng-zhihui | 空心 | 86,188 | 0 | 97% | 0 |
| rafaballerini | Rafaella Ballerini | 59,052 | 0 | 94% | 0 |
| tj | TJ | 51,571 | 2 | 100% | 0 |
| IDouble | Alp ₿📈🚀🌕 | 50,276 | 24 | 95% | 0 |
AI-Era Arrivals: Low pre-AI ratio (23%), newer accounts (about 5 years old), moderate activity. Became visible after AI coding tools emerged.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| dmalan | David J. Malan | 35,793 | 71 | 0% | 16 |
| elyxdev | Elyx | 26,763 | 63 | 0% | 0 |
| lucast1574 | Lucas Santillan | 21,120 | 377 | 0% | 0 |
| OracleBrain | Aashis Jha | 15,368 | 32 | 7% | 0 |
| george0st | jist | 13,508 | 1,109 | 0% | 0 |
Minimal Evidence Profiles: High follower counts but near-zero activity, languages, and domains. Famous but not building publicly.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| claude | Claude | 46,597 | 0 | 100% | 0 |
| cumsoft | cumsoft | 18,410 | 0 | 100% | 0 |
| metatimeofficial | Metatime | 18,023 | 0 | 0% | 0 |
| PremChapagain | Prem Chapagain | 14,157 | 4 | 33% | 0 |
| ghost | Deleted user | 12,145 | 0 | 0% | 0 |
The AI-Era Arrivals cluster contains developers who joined or became active primarily after AI coding tools. Sub-clustering reveals three distinct sub-types.
Active AI-Native Builders: Weekly cadence, highest quality in the AI-Era group, some collaboration. Genuinely productive, but started in the AI era.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| george0st | jist | 13,508 | 1,109 | 0% | 0 |
| jrohitofficial | Rohit Jha | 12,721 | 648 | 35% | 0 |
| XiaomingX | Y11 | 11,465 | 3,702 | 0% | 0 |
| LaurieWired | LaurieWired | 11,006 | 208 | 0% | 0 |
| elder-plinius | pliny | 10,922 | 477 | 0% | 0 |
Minimal Evidence Newcomers: Accounts under 3 years old, near-zero pre-AI commits, no code reviews. High social signal, low work-product signal.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| elyxdev | Elyx | 26,763 | 63 | 0% | 0 |
| lucast1574 | Lucas Santillan | 21,120 | 377 | 0% | 0 |
| pewdiepie-archdaemon | PewDiePie | 13,278 | 130 | 0% | 0 |
| elidianaandrade | Eli | 12,423 | 253 | 5% | 0 |
| meliksahyorulmazlar | meliksahyorulmazlar | 10,711 | 69 | 0% | 0 |
Faded Practitioners: Some pre-AI history (62%), but activity has declined. Had foundations, went quiet.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| dmalan | David J. Malan | 35,793 | 71 | 0% | 16 |
| OracleBrain | Aashis Jha | 15,368 | 32 | 7% | 0 |
| 996icu | 996icu | 8,566 | 4 | 100% | 0 |
| JCSIVO | JCSIVO | 7,512 | 26 | 73% | 0 |
| U7P4L-IN | U7P4L x C0D3R | 7,455 | 2 | 100% | 0 |
The largest cluster splits into two recognizable types: individual contributors vs. community hubs.
Solo Veterans: Deep individual contributors, strong foundations, low review and collaboration activity. Domain experts in their lane.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| torvalds | Linus Torvalds | 287,983 | 3,264 | 87% | 2 |
| karpathy | Andrej | 135,647 | 859 | 69% | 6 |
| yyx990803 | Evan You | 107,039 | 1,902 | 94% | 17 |
| gaearon | dan | 90,422 | 836 | 78% | 42 |
| ruanyf | Ruan YiFeng | 85,507 | 685 | 97% | 0 |
Ecosystem Leaders: Highest collaboration breadth (30+), active code reviewers, daily cadence. Framework authors and open source maintainers.
| Login | Name | Followers | Annual Contribs | Pre-AI | Reviews |
|---|---|---|---|---|---|
| alesanchezr | Alejandro Sanchez | 2,068 | 12,605 | 86% | 122 |
| punkpeye | Frank Fiegel | 1,709 | 7,995 | 2% | 11 |
| Archetype | N | Share | Quality | Cadence | Pre-AI | Account Age | Top Exemplar |
|---|---|---|---|---|---|---|---|
| Solo Veterans | 4,116 | 41.6% | 0.76 | 3.1 | 78% | 13y | torvalds |
| Ecosystem Leaders | 2 | 0.0% | 0.61 | 3.5 | 44% | 10y | alesanchezr |
| Pre-AI Veterans (Quiet) | 4,037 | 40.8% | 0.68 | 1.1 | 91% | 12y | gustavoguanabara |
| Active AI-Native Builders | 514 | 5.2% | 0.51 | 2.8 | 14% | 5y | george0st |
| Minimal Evidence Newcomers | 415 | 4.2% | 0.36 | 1.7 | 2% | 3y | elyxdev |
| Faded Practitioners | 339 | 3.4% | 0.40 | 1.1 | 62% | 6y | dmalan |
| Minimal Evidence Profiles | 473 | 4.8% | 0.17 | 0.5 | 31% | 5y | claude |
Key finding: Among the top 10,000 GitHub users by follower count, 12.8% became prominent primarily in the AI era. Within that group, a third show minimal evidence of engineering depth — high social signal with low work-product signal. From public artifacts alone, these profiles are indistinguishable from legitimate developers who simply work in private repos.
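The shares quoted in the key finding can be reproduced from the archetype counts in the summary table. The short check below assumes that the AI-era group is the sum of the three AI-Era sub-types and that "a third" refers to Minimal Evidence Newcomers within that group; both are readings of the table rather than statements from the authors.

```python
# Archetype counts copied from the summary table.
counts = {
    "Solo Veterans": 4116,
    "Ecosystem Leaders": 2,
    "Pre-AI Veterans (Quiet)": 4037,
    "Active AI-Native Builders": 514,
    "Minimal Evidence Newcomers": 415,
    "Faded Practitioners": 339,
    "Minimal Evidence Profiles": 473,
}

# Assumption: these three sub-types make up the AI-Era Arrivals cluster.
AI_ERA = ("Active AI-Native Builders", "Minimal Evidence Newcomers", "Faded Practitioners")

total = sum(counts.values())
ai_era = sum(counts[name] for name in AI_ERA)
ai_share = ai_era / total
newcomer_share = counts["Minimal Evidence Newcomers"] / ai_era

print(f"{total} profiles, AI-era share {ai_share:.1%}, "
      f"minimal-evidence share of that group {newcomer_share:.1%}")
# prints: 9896 profiles, AI-era share 12.8%, minimal-evidence share of that group 32.7%
```

Under these assumptions the counts sum to the 9,896 profiles analyzed, and both quoted figures follow directly from the table.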
This analysis shows what work products can tell you — and where they hit a wall.
Distinct patterns of quality, consistency, collaboration, and AI-era adaptation cluster reliably across ~10,000 profiles.
High follower counts don't guarantee engineering depth. The data shows where these signals diverge.
Commits show who pushed, not who authored. A single commit may be 90% AI-generated or 100% handwritten — the git log looks identical.
A developer who went from zero to competent in 6 months looks the same as one who plateaued 5 years ago. Static snapshots miss trajectories.
Did they use AI for boilerplate and write the hard parts themselves? Or prompt-engineer the entire feature? GitHub doesn't know.
Knowing Python as a software engineer is different from knowing Python as a business analyst. Repos show the what, not the how or why.
SkillBench closes this gap by instrumenting the development environment itself — turning the process of coding into a legible signal. These archetypes are the boot block. Telemetry turns them into living profiles.