Open Source Claude Code Plugin

Test usability with 9 AI personas

Each persona is grounded in cognitive science research, traces your actual code paths, and reports what real users would experience. In parallel. In 2 minutes.

9 Behavioral Personas
5 Scoring Dimensions
30+ Research Citations
~2 min Full Test Run

The Problem

Usability testing shouldn't cost $15K

Traditional studies take weeks, cost thousands, and arrive after the code has shipped. UTA runs in your editor, on your code, in minutes.

Traditional Usability Study

Cost: $5,000 - $15,000
Timeline: 2 - 4 weeks
Participants: 5 - 8 recruited
Consistency: Varies per person
Specificity: "Users had trouble with X"
Integration: Separate process

User Testing Agents

Cost: $0
Timeline: 2 - 3 minutes
Participants: 9 behavioral archetypes
Consistency: Deterministic signatures
Specificity: file.tsx:142 - handler returns early
Integration: Runs in your editor

Get Started

Two ways to run UTA

Use as a Claude Code plugin or connect via MCP to any AI editor.

# Clone the repo
git clone https://github.com/Dekic648/user-testing-agents.git
cd user-testing-agents

# Install commands + skills
mkdir -p ~/.claude/commands/uta ~/.claude/uta
cp commands/*.md ~/.claude/commands/uta/
cp -r skills agents scoring ~/.claude/uta/

# Open any project in Claude Code and run the slash command
/uta:test

Connect UTA to any MCP-compatible AI tool. Your code stays local. Your AI subscription does the work.

Supported clients: Claude Desktop, Cursor, VS Code, Windsurf, or any MCP client.

# Clone and build the MCP server
git clone https://github.com/Dekic648/user-testing-agents.git
cd user-testing-agents/mcp-server
npm install && npm run build
Add to your AI tool's MCP config:
{
  "mcpServers": {
    "uta": {
      "command": "node",
      "args": ["/path/to/user-testing-agents/mcp-server/dist/index.js"]
    }
  }
}

4 Tools

uta_test — full 9-persona test
uta_test_quick — 3-persona quick check
uta_test_persona — single persona deep dive
uta_list_personas — browse all personas

12 Resources

uta://personas — overview of all 9
uta://persona/{id} — full persona definitions
uta://scoring-model — scoring framework
uta://domain-template — config template

Your Code Stays Local

The MCP server only provides instructions. Your AI tool reads your code locally. UTA itself sends nothing off your machine; your code goes only where the AI tool you already use sends it.

Works Offline

Once cloned and built, the MCP server has no external dependencies. No API keys to configure. No accounts to create.

The Agents

9 behavioral personas, each grounded in research

Not demographic personas. Behavioral archetypes derived from cognitive science — how people actually interact with interfaces.

S
Scanner
Satisficer
Skims, clicks first CTA, never reads body text. 79% of users behave this way (NNGroup). Catches dead-end flows, misleading hierarchy, broken happy paths.
Simon 1956, Krug 2000, Nielsen F-pattern
D
Deliberator
Maximizer
Reads everything, compares all options, needs undo before committing. Catches missing tooltips, irreversible actions, inconsistent labels, information gaps.
Schwartz 2004, Pask 1976, Hick's Law
R
Rushing Pragmatist
Time-Pressured
3 minutes. Zero friction tolerance. Abandons at first roadblock. Catches excessive steps, slow paths, input format barriers, unprofessional output.
Maule & Hockey 1993, CLT, Fitts' Law
N
Novice
First-Timer
No mental model. Scared by jargon. 87% of product returns are "couldn't figure it out" (Norman). Catches missing onboarding, jargon, empty states.
Dreyfus 1980, Norman 2013, Sweller CLT
P
Power User
Expert
Daily user. Keyboard first. Wants shortcuts, bulk ops, customization. 10-15% of users drive 50%+ of usage (NNGroup). Catches missing shortcuts, broken edge cases.
Shneiderman 1986, Dreyfus Expert, Zipf's Law
M
Distracted Multitasker
Interrupted
12 tabs open, leaves mid-task, returns 10 min later. 25 min average to resume (Iqbal & Horvitz). Catches lost state, no autosave, missing re-orientation.
Altmann & Trafton 2002, Mark et al. 2008
A
Accessibility User
Keyboard / Screen Reader
Keyboard-only. Screen reader dependent. 96.3% of home pages have detectable WCAG failures (WebAIM 2023). Catches missing ARIA, broken focus, contrast failures, inaccessible controls.
WCAG 2.1, WebAIM Million, Microsoft Inclusive
E
Skeptical Evaluator
Adoption Judge
Evaluating whether to adopt. 46% judge credibility by design (Fogg). Catches broken error states, missing loading states, stub pages, unprofessional edges.
Fogg 2002, Stanford Credibility, McKnight 2002
U
UI Purist
Visual Critic
Judges every pixel. 8px grid. Type scale. Color system. Users judge aesthetics in 17-50ms (Google). Catches spacing violations, typography issues, alignment breaks.
Tractinsky 2000, Gestalt, Apple HIG

Coverage

What each persona hunts for

Every problem type is covered by multiple personas. The matrix shows primary focus and secondary coverage.

Columns (personas): Scanner, Deliberator, Rushing Pragmatist, Novice, Power User, Distracted Multitasker, Accessibility User, Skeptical Evaluator, UI Purist

Rows (problem types):
Dead-end flows / no path forward
Broken handlers / silent failures
Jargon / technical language
Missing empty states
Missing keyboard shortcuts
Lost form state / no autosave
Accessibility failures (ARIA, focus, contrast)
Irreversible actions / no undo
Excessive steps / slow paths
Spacing / typography / alignment issues
Missing loading / error states
Credibility / trust signals missing
Edge case failures (extreme inputs)
No onboarding / guidance

Legend: Primary focus, Secondary coverage, Not in scope

Output

What a UTA report looks like

Every test run produces a unified report with scores, ranked issues, and actionable solutions. Here is the structure of an example report.

uta-report-2026-03-22.md
UTA Composite Usability Report: Chartsmither
Test date: 2026-03-22 | Personas: 9/9 | Domain: loaded
Overall Score: 72/100 — Acceptable
Composite Scorecard
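Persona            | Effect. | Effic. | Learn. | Satis. | Craft | Composite
Scanner            | 85      | 78     | 70     | 80     | 65    | 77
Deliberator        | 80      | 65     | 82     | 78     | 72    | 76
Rushing Pragmatist | 70      | 60     | 55     | 68     | 60    | 63
Novice             | 55      | 50     | 45     | 52     | 60    | 50
Power User         | 88      | 82     | 75     | 85     | 70    | 81
Distracted Multi.  | 45      | 40     | 50     | 42     | 60    | 44
Accessibility User | 62      | 58     | 65     | 60     | 55    | 60
Skeptical Eval.    | 78      | 70     | 75     | 72     | 68    | 73
UI Purist          | 80      | 75     | 78     | 82     | 62    | 73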
Dimension Summary
Dimension     | Avg | Weakest           | Strongest   | Systemic?
Effectiveness | 71  | Distracted Multi. | Power User  | Yes
Efficiency    | 64  | Distracted Multi. | Power User  | Yes
Learnability  | 66  | Novice            | Deliberator | Yes
Satisfaction  | 69  | Distracted Multi. | Power User  | Partial
Craft         | 64  | Accessibility     | Deliberator | Partial
Ranked Priorities with Potential Solutions
P0 CRITICAL CRT-001: No autosave — form state lost on tab switch
What happens: User fills chart configuration (title, data, axis settings) across 3 panels. Switches to another tab for 10 minutes. Returns — all inputs reset to defaults. ConfigPanel.tsx:89 stores state in component useState only, no localStorage or sessionStorage persistence. The Distracted Multitasker loses 100% of work.
Where in code: ConfigPanel.tsx:89, ChartEditor.tsx:142
Impact: Found by 5/9 personas. Red route: "create chart -> export." Distracted Multitasker FAILS. Deliberator flags anxiety. Skeptical Evaluator: trust destroyed.
Potential solution: Add debounced localStorage persistence on every state change. Restore on mount. Show "draft saved" indicator.
Why #1: Userfocus Critical — show-stopper on red route, no workaround, persistent across all multi-step flows.
P1 SERIOUS FRC-003: 7 jargon terms on chart creation page
What happens: Chart creation page uses "Y-Axis Domain", "Ordinal Scale", "Datum", "Interpolation", "Categorical", "Discrete", "Aggregate" without explanation. Novice freezes at step 3 — decision paralysis. No tooltips on any label.
Where in code: ChartOptions.tsx:45-78, AxisConfig.tsx:23
Impact: Found by 3/9 personas (Novice, Deliberator, Skeptical Evaluator). Red route: yes. Workaround: trial-and-error.
Potential solution: Replace jargon labels with plain language ("Value Range" not "Y-Axis Domain"). Add info-icon tooltips for technical terms power users expect.
P1 SERIOUS FRC-005: No keyboard shortcuts for frequent actions
What happens: Power User must mouse-click through 4 menus to change chart type — an action done 10+ times daily. No Cmd+K command palette, no keyboard shortcuts registered. useHotkeys not imported anywhere in the codebase.
Where in code: App.tsx (no keyboard listener), ChartSelector.tsx:12
Impact: Found by 2/9 (Power User, Accessibility User). Not a show-stopper but persistent across all flows.
Potential solution: Add useHotkeys with Cmd+K command palette for chart type, theme, export. Register 5-10 shortcuts for frequent actions.
Flow Coverage Matrix
12 flows discovered | 10/12 covered | 2 untested
Flow               | Personas covering (of Scan, Del, Prag, Nov, Pow, Mul, A11y, Eval, Pur)
Chart creation     | 9/9
Data input (paste) | 6/9
Export PDF         | 7/9
Theme switching    | 4/9
Report builder     | 4/9
Settings           | 3/9
Admin panel        | 0/9
User preferences   | 0/9
2 untested flows: Admin panel, User preferences. Consider adding these to the next test run.
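
The fix suggested for CRT-001 in the sample report (debounced localStorage persistence, restore on mount) could look roughly like the sketch below. This is an illustration only, not code from UTA or Chartsmither: `createDraftStore`, `KeyValueStore`, the `chart-draft` key, and the 500 ms delay are all hypothetical. In a browser you would pass `window.localStorage`, which already satisfies the small interface used here. The in-memory store is only there so the sketch runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical helper: debounced save, explicit flush, and restore for one draft key.
function createDraftStore<T>(store: KeyValueStore, key: string, delayMs = 500) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: T | undefined;
  let hasPending = false;

  // Write the most recent pending state immediately and cancel the timer.
  const flush = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (hasPending) {
      store.setItem(key, JSON.stringify(pending));
      hasPending = false;
    }
  };

  // Remember the latest state and (re)start the debounce timer.
  const save = (state: T): void => {
    pending = state;
    hasPending = true;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  };

  // Read the stored draft back; returns null if nothing usable is stored.
  const restore = (): T | null => {
    const raw = store.getItem(key);
    if (raw === null) return null;
    try {
      return JSON.parse(raw) as T;
    } catch {
      return null;
    }
  };

  return { save, restore, flush };
}

// In-memory stand-in for localStorage so the example runs anywhere.
const memory = new Map<string, string>();
const memoryStore: KeyValueStore = {
  getItem: (k) => memory.get(k) ?? null,
  setItem: (k, v) => {
    memory.set(k, v);
  },
};

const draft = createDraftStore<{ title: string }>(memoryStore, "chart-draft");
draft.save({ title: "Rev" });
draft.save({ title: "Revenue" }); // replaces the pending write
draft.flush(); // e.g. call this when the page is hidden or the panel unmounts
console.log(draft.restore());
```

Calling `save` on every state change keeps writes cheap because only the last state within the delay window is persisted. Calling `restore` on mount brings the draft back.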
Methodology

Three layers of evidence per persona

Every persona is backed by evidence, not just theory. Each one rests on three layers that you can check for yourself.

1. Academic Citations
Published research with specific papers, years, and findings. Each persona traces back to foundational work in cognitive science, behavioral economics, or HCI.
Simon (1956), Schwartz (2004), Dreyfus & Dreyfus (1980), Norman (2013), Fogg (2002)
2. Empirical Behavioral Markers
Real statistics from large-scale research. Not opinions — measured facts that prove each behavioral pattern exists at scale.
"79% of users scan" (NNGroup) | "96.3% of pages fail WCAG" (WebAIM) | "25 min to resume after interruption" (Iqbal & Horvitz)
3. Deterministic Behavioral Signatures
IF/THEN rules derived from research that define exactly how each persona interacts. Not guidelines — deterministic behavior that makes every test reproducible.
IF page_has_multiple_CTAs -> pick largest | IF no_autosave_found -> flag Critical | IF heading_skips_level -> WCAG 1.3.1 failure
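
One way to picture the IF/THEN signatures quoted above is as pure functions over a set of observed facts. The sketch below is an assumption about how such rules could be encoded, not UTA's actual format: `PageFacts`, `Finding`, and `evaluate` are hypothetical names, and the fields are invented for illustration. Only the three rule names and their outcomes come from the text.

```typescript
// Hypothetical facts a persona might collect while tracing one page.
interface PageFacts {
  ctaSizes: number[]; // relative size of each call-to-action (assumed measure)
  hasAutosave: boolean;
  headingLevels: number[]; // heading levels in document order, e.g. [1, 2, 4]
}

interface Finding {
  rule: string;
  outcome: string;
}

type Rule = (facts: PageFacts) => Finding | null;

// Index of the largest CTA, or -1 when there are none.
function largestIndex(sizes: number[]): number {
  let best = -1;
  let bestSize = -Infinity;
  sizes.forEach((size, i) => {
    if (size > bestSize) {
      best = i;
      bestSize = size;
    }
  });
  return best;
}

// True when a heading jumps more than one level deeper than its predecessor.
function skipsHeadingLevel(levels: number[]): boolean {
  let prev: number | undefined;
  for (const lvl of levels) {
    if (prev !== undefined && lvl - prev > 1) return true;
    prev = lvl;
  }
  return false;
}

const rules: Rule[] = [
  // IF page_has_multiple_CTAs -> pick largest
  (f) =>
    f.ctaSizes.length > 1
      ? { rule: "page_has_multiple_CTAs", outcome: `pick CTA at index ${largestIndex(f.ctaSizes)}` }
      : null,
  // IF no_autosave_found -> flag Critical
  (f) => (!f.hasAutosave ? { rule: "no_autosave_found", outcome: "flag Critical" } : null),
  // IF heading_skips_level -> WCAG 1.3.1 failure
  (f) =>
    skipsHeadingLevel(f.headingLevels)
      ? { rule: "heading_skips_level", outcome: "WCAG 1.3.1 failure" }
      : null,
];

// Same facts in, same findings out: no randomness and no hidden state.
function evaluate(facts: PageFacts): Finding[] {
  return rules.map((rule) => rule(facts)).filter((f): f is Finding => f !== null);
}

const findings = evaluate({ ctaSizes: [1200, 4800, 900], hasAutosave: false, headingLevels: [1, 2, 4] });
console.log(findings.map((f) => `${f.rule} -> ${f.outcome}`));
```

Because each rule depends only on its input, running the same facts twice yields the same findings, which is the reproducibility property the signatures are meant to provide.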
Scoring

5-dimension composite usability model

Each persona scores across 5 dimensions with persona-specific weight adjustments. The Novice weights Learnability at 35%. The UI Purist weights Craft at 50%.

Effectiveness (25%): Can the user finish? Completion rate, errors, dead-ends
Efficiency (20%): How much effort? Steps vs. minimum, wasted effort
Learnability (20%): First-use success? Jargon, empty states, disclosure
Satisfaction (20%): How does it feel? Clarity, confidence, trust
Craft (15%): Visual quality. Spacing, typography, alignment
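
As a rough illustration of how a composite could be derived, the sketch below assumes the five percentages listed above are the default weights and that the composite is a rounded weighted mean. Both are assumptions, and `compositeScore` is a hypothetical helper rather than UTA's scoring code. Under those assumptions it reproduces the Scanner and Power User composites from the sample report. Personas with adjusted weights, such as the Novice with Learnability at 35%, would pass their own `weights` object, whose remaining values this page does not specify.

```typescript
type Dimension = "effectiveness" | "efficiency" | "learnability" | "satisfaction" | "craft";
type Scores = Record<Dimension, number>;
type Weights = Record<Dimension, number>;

// Assumed default weights in percent, read off the list above.
const DEFAULT_WEIGHTS: Weights = {
  effectiveness: 25,
  efficiency: 20,
  learnability: 20,
  satisfaction: 20,
  craft: 15,
};

// Hypothetical helper: weighted mean of the five dimension scores, rounded to an integer.
function compositeScore(scores: Scores, weights: Weights = DEFAULT_WEIGHTS): number {
  const dims = Object.keys(weights) as Dimension[];
  const total = dims.reduce((sum, d) => sum + weights[d], 0);
  if (total !== 100) {
    throw new Error(`weights must sum to 100, got ${total}`);
  }
  const weighted = dims.reduce((sum, d) => sum + scores[d] * weights[d], 0);
  return Math.round(weighted / 100);
}

// Scanner row from the sample report.
const scanner: Scores = { effectiveness: 85, efficiency: 78, learnability: 70, satisfaction: 80, craft: 65 };
console.log(compositeScore(scanner)); // 77
```

The weight check guards against a persona-specific override that forgets to rebalance the other dimensions.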
Product Fit

Built for web products people use every day

UTA personas are calibrated for products where users navigate pages, fill forms, click buttons, read content, and expect professional output.

SaaS / Web Applications (Strong Fit)

Dashboards, admin panels, analytics tools, CRM, project management. All 9 personas are calibrated for this. Every checklist item applies.

Notion, Linear, Airtable, Sensor Tower, HubSpot

Internal Tools / Enterprise (Strong Fit)

Built without design resources, rarely usability tested. Highest ROI — Novice and Scanner catch the most issues here.

Admin dashboards, operations tools, reporting systems

E-commerce / Marketplaces (Strong Fit)

Checkout flows, product browsing, search, filtering. Rushing Pragmatist catches cart friction. Skeptical Evaluator catches trust gaps.

Shopify storefronts, booking platforms, marketplace apps

Developer Tools (Strong Fit)

CLIs with web UIs, API dashboards, documentation platforms. Power User catches missing shortcuts. Novice catches bad onboarding.

Stripe Dashboard, Vercel, Supabase, docs sites

Marketing Sites / Landing Pages (Good Fit)

Scanner + Skeptical Evaluator + UI Purist are highly relevant for conversion optimization and credibility assessment.

Product landing pages, campaign sites, portfolios

Content Platforms (Good Fit)

Navigation, readability, accessibility, and content management flows. Deliberator and Accessibility User add the most value here.

Blogs, news sites, knowledge bases, CMS platforms

Ship products users actually want to use

9 personas. 5 dimensions. 14 problem types. One command.
