User Testing Agents

Get Started

Two ways to run UTA

Use as a Claude Code plugin or connect via MCP to any AI editor.

# Clone the repo
git clone https://github.com/Dekic648/user-testing-agents.git
cd user-testing-agents

# Install commands + skills
mkdir -p ~/.claude/commands/uta ~/.claude/uta
cp commands/*.md ~/.claude/commands/uta/
cp -r skills agents scoring ~/.claude/uta/

# Open any project in Claude Code and run
/uta:test

Connect UTA to any MCP-compatible AI tool. Your code stays local. Your AI subscription does the work.

✓ Claude Desktop

✓ Cursor

✓ VS Code

✓ Windsurf

✓ Any MCP Client

# Clone and build the MCP server
git clone https://github.com/Dekic648/user-testing-agents.git
cd user-testing-agents/mcp-server
npm install && npm run build

Add to your AI tool's MCP config:

{
  "mcpServers": {
    "uta": {
      "command": "node",
      "args": ["/path/to/user-testing-agents/mcp-server/dist/index.js"]
    }
  }
}
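If your tool's config file already lists other servers, the entry above needs to be merged in rather than pasted over the file. Here is a minimal Python sketch of that merge, assuming the config is a JSON object with an `mcpServers` key as shown. The helper name `add_uta_server` and the example clone path are hypothetical, and where each tool stores its config file is not covered here.

```python
import json
from pathlib import PurePosixPath


def add_uta_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of an MCP config with the "uta" server entry added.

    repo_dir is the absolute path of the cloned user-testing-agents repo.
    Existing entries under "mcpServers" are preserved.
    """
    entry_point = PurePosixPath(repo_dir) / "mcp-server" / "dist" / "index.js"
    servers = dict(config.get("mcpServers", {}))
    servers["uta"] = {"command": "node", "args": [str(entry_point)]}
    merged = dict(config)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_uta_server(existing, "/home/me/user-testing-agents")
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary instead of mutating its input, so the original config can be kept as a backup before the file is written.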

4 Tools

uta_test — full 9-persona test
uta_test_quick — 3-persona quick check
uta_test_persona — single persona deep dive
uta_list_personas — browse all personas

12 Resources

uta://personas — overview of all 9
uta://persona/{id} — full persona definitions
uta://scoring-model — scoring framework
uta://domain-template — config template

Your Code Stays Local

The MCP server provides instructions. Your AI tool reads your code locally. Nothing leaves your machine.

Works Offline

Once cloned and built, no external dependencies. No API keys to configure. No accounts to create.

The Agents

9 behavioral personas, each grounded in research

Not demographic personas. Behavioral archetypes derived from cognitive science — how people actually interact with interfaces.

Scanner

Satisficer

Skims, clicks first CTA, never reads body text. 79% of users behave this way (NNGroup). Catches dead-end flows, misleading hierarchy, broken happy paths.

Simon 1956 | Krug 2000 | Nielsen F-pattern

Deliberator

Maximizer

Reads everything, compares all options, needs undo before committing. Catches missing tooltips, irreversible actions, inconsistent labels, information gaps.

Schwartz 2004 | Pask 1976 | Hick's Law

Rushing Pragmatist

Time-Pressured

3 minutes. Zero friction tolerance. Abandons at first roadblock. Catches excessive steps, slow paths, input format barriers, unprofessional output.

Maule & Hockey 1993 | CLT | Fitts' Law

Novice

First-Timer

No mental model. Scared by jargon. 87% of product returns are "couldn't figure it out" (Norman). Catches missing onboarding, jargon, empty states.

Dreyfus 1980 | Norman 2013 | Sweller CLT

Power User

Expert

Daily user. Keyboard first. Wants shortcuts, bulk ops, customization. 10-15% of users drive 50%+ of usage (NNGroup). Catches missing shortcuts, broken edge cases.

Shneiderman 1986 | Dreyfus Expert | Zipf's Law

Distracted Multitasker

Interrupted

12 tabs open, leaves mid-task, returns 10 min later. 25 min average to resume (Iqbal & Horvitz). Catches lost state, no autosave, missing re-orientation.

Altmann & Trafton 2002 | Mark et al. 2008

Accessibility User

Keyboard / Screen Reader

Keyboard-only. Screen reader dependent. 96.3% of home pages have detectable WCAG failures (WebAIM Million 2023). Catches missing ARIA, broken focus, contrast failures, inaccessible controls.

WCAG 2.1 | WebAIM Million | Microsoft Inclusive

Skeptical Evaluator

Adoption Judge

Evaluating whether to adopt. 46% judge credibility by design (Fogg). Catches broken error states, missing loading states, stub pages, unprofessional edges.

Fogg 2002 | Stanford Credibility | McKnight 2002

UI Purist

Visual Critic

Judges every pixel. 8px grid. Type scale. Color system. Users judge aesthetics in 17-50ms (Google). Catches spacing violations, typography issues, alignment breaks.

Tractinsky 2000 | Gestalt | Apple HIG

Coverage

What each persona hunts for

Every problem type is covered by multiple personas. The matrix shows primary focus and secondary coverage.

Problem Type	Scanner	Deliber.	Pragm.	Novice	Power	Multi.	A11y	Eval.	Purist
Dead-end flows / no path forward
Broken handlers / silent failures
Jargon / technical language
Missing empty states
Missing keyboard shortcuts
Lost form state / no autosave
Accessibility failures (ARIA, focus, contrast)
Irreversible actions / no undo
Excessive steps / slow paths
Spacing / typography / alignment issues
Missing loading / error states
Credibility / trust signals missing
Edge case failures (extreme inputs)
No onboarding / guidance

Primary focus

Secondary coverage

Not in scope

Output

What a UTA report looks like

Every test run produces a unified report with scores, ranked issues, and actionable solutions. Here's a real example structure.

UTA Composite Usability Report: Chartsmither

Test date: 2026-03-22 | Personas: 9/9 | Domain: loaded

Overall Score: 72/100 — Acceptable

Composite Scorecard

Persona	Effect.	Effic.	Learn.	Satis.	Craft	Composite
Scanner	85	78	70	80	65	77
Deliberator	80	65	82	78	72	76
Rushing Pragmatist	70	60	55	68	60	63
Novice	55	50	45	52	60	50
Power User	88	82	75	85	70	81
Distracted Multi.	45	40	50	42	60	44
Accessibility User	62	58	65	60	55	60
Skeptical Eval.	78	70	75	72	68	73
UI Purist	80	75	78	82	62	73

Dimension Summary

Dimension	Avg	Weakest	Strongest	Systemic?
Effectiveness	71	Distracted Multi.	Power User	Yes
Efficiency	64	Distracted Multi.	Power User	Yes
Learnability	66	Novice	Deliberator	Yes
Satisfaction	69	Distracted Multi.	Power User	Partial
Craft	64	Accessibility	Deliberator	Partial
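The Avg, Weakest, and Strongest columns can be reproduced directly from the Composite Scorecard. The sketch below does so in Python, assuming Avg is the unweighted mean across the nine personas rounded to the nearest integer, which matches the figures shown. The Composite column and the "Systemic?" flag are not computed, because the report does not state their weighting or rule.

```python
DIMENSIONS = ["Effectiveness", "Efficiency", "Learnability", "Satisfaction", "Craft"]

# Per-persona scores copied from the Composite Scorecard, in DIMENSIONS order.
SCORECARD = {
    "Scanner": [85, 78, 70, 80, 65],
    "Deliberator": [80, 65, 82, 78, 72],
    "Rushing Pragmatist": [70, 60, 55, 68, 60],
    "Novice": [55, 50, 45, 52, 60],
    "Power User": [88, 82, 75, 85, 70],
    "Distracted Multi.": [45, 40, 50, 42, 60],
    "Accessibility User": [62, 58, 65, 60, 55],
    "Skeptical Eval.": [78, 70, 75, 72, 68],
    "UI Purist": [80, 75, 78, 82, 62],
}


def summarize(scorecard):
    """Map each dimension to (rounded mean, weakest persona, strongest persona)."""
    summary = {}
    for i, dim in enumerate(DIMENSIONS):
        by_persona = {persona: scores[i] for persona, scores in scorecard.items()}
        avg = round(sum(by_persona.values()) / len(by_persona))
        weakest = min(by_persona, key=by_persona.get)
        strongest = max(by_persona, key=by_persona.get)
        summary[dim] = (avg, weakest, strongest)
    return summary


for dim, row in summarize(SCORECARD).items():
    print(dim, *row, sep="\t")
```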

Ranked Priorities with Potential Solutions

P0 CRITICAL CRT-001: No autosave — form state lost on tab switch

What happens: User fills chart configuration (title, data, axis settings) across 3 panels. Switches to another tab for 10 minutes. Returns — all inputs reset to defaults. ConfigPanel.tsx:89 stores state in component useState only, no localStorage or sessionStorage persistence. The Distracted Multitasker loses 100% of work.

Where in code: ConfigPanel.tsx:89, ChartEditor.tsx:142

Impact: Found by 5/9 personas. Red route: "create chart -> export." Distracted Multitasker FAILS. Deliberator flags anxiety. Skeptical Evaluator: trust destroyed.

Potential solution: Add debounced localStorage persistence on every state change. Restore on mount. Show "draft saved" indicator.

Why #1: Userfocus Critical — show-stopper on red route, no workaround, persistent across all multi-step flows.

P1 SERIOUS FRC-003: 7 jargon terms on chart creation page

What happens: Chart creation page uses "Y-Axis Domain", "Ordinal Scale", "Datum", "Interpolation", "Categorical", "Discrete", "Aggregate" without explanation. Novice freezes at step 3 — decision paralysis. No tooltips on any label.

Where in code: ChartOptions.tsx:45-78, AxisConfig.tsx:23

Impact: Found by 3/9 personas (Novice, Deliberator, Skeptical Evaluator). Red route: yes. Workaround: trial-and-error.

Potential solution: Replace jargon labels with plain language ("Value Range" not "Y-Axis Domain"). Add info-icon tooltips for technical terms power users expect.

P1 SERIOUS FRC-005: No keyboard shortcuts for frequent actions

What happens: Power User must mouse-click through 4 menus to change chart type — an action done 10+ times daily. No Cmd+K command palette, no keyboard shortcuts registered. useHotkeys not imported anywhere in the codebase.

Where in code: App.tsx (no keyboard listener), ChartSelector.tsx:12

Impact: Found by 2/9 (Power User, Accessibility User). Not a show-stopper but persistent across all flows.

Potential solution: Add useHotkeys with Cmd+K command palette for chart type, theme, export. Register 5-10 shortcuts for frequent actions.

Flow Coverage Matrix

12 flows discovered | 10/12 covered | 2 untested

Flow	Scan	Del	Prag	Nov	Pow	Mul	A11y	Eval	Pur	Cov.
Chart creation	x	x	x	x	x	x	x	x	x	9/9
Data input (paste)	x	x	x	x	x		x			6/9
Export PDF	x	x	x		x	x	x	x		7/9
Theme switching		x			x			x	x	4/9
Report builder		x			x	x		x		4/9
Settings		x			x			x		3/9
Admin panel										0/9
User preferences										0/9

2 untested flows: Admin panel, User preferences. Consider adding these to the next test run.
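The Cov. column and the list of untested flows follow mechanically from the matrix. Here is a small Python sketch using only the eight rows shown above, so its totals cover those rows and not the 12 flows mentioned in the header line. The data structure and the function names are illustrative assumptions, not UTA's internal format.

```python
PERSONAS = ["Scan", "Del", "Prag", "Nov", "Pow", "Mul", "A11y", "Eval", "Pur"]

# Which personas exercised each flow, transcribed from the matrix rows.
FLOWS = {
    "Chart creation": set(PERSONAS),
    "Data input (paste)": {"Scan", "Del", "Prag", "Nov", "Pow", "A11y"},
    "Export PDF": {"Scan", "Del", "Prag", "Pow", "Mul", "A11y", "Eval"},
    "Theme switching": {"Del", "Pow", "Eval", "Pur"},
    "Report builder": {"Del", "Pow", "Mul", "Eval"},
    "Settings": {"Del", "Pow", "Eval"},
    "Admin panel": set(),
    "User preferences": set(),
}


def coverage(flows):
    """Number of personas that covered each flow."""
    return {name: len(who) for name, who in flows.items()}


def untested(flows):
    """Flows that no persona covered, in table order."""
    return [name for name, who in flows.items() if not who]


for name, count in coverage(FLOWS).items():
    print(f"{name}: {count}/{len(PERSONAS)}")
print("Untested:", ", ".join(untested(FLOWS)))
```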

Methodology

Three layers of evidence per persona

Every persona's existence is provable, not just theoretically motivated. Challenge any persona — the data backs it up.

Academic Citations

Published research with specific papers, years, and findings. Each persona traces back to foundational work in cognitive science, behavioral economics, or HCI.

Simon (1956), Schwartz (2004), Dreyfus & Dreyfus (1980), Norman (2013), Fogg (2002)

Empirical Behavioral Markers

Real statistics from large-scale research. Not opinions — measured facts that prove each behavioral pattern exists at scale.

"79% of users scan" (NNGroup) | "96.3% of pages fail WCAG" (WebAIM) | "25 min to resume after interruption" (Iqbal & Horvitz)

Deterministic Behavioral Signatures

IF/THEN rules derived from research that define exactly how each persona interacts. Not guidelines — deterministic behavior that makes every test reproducible.

IF page_has_multiple_CTAs -> pick largest | IF no_autosave_found -> flag Critical | IF heading_skips_level -> WCAG 1.3.1 failure
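To illustrate why IF/THEN rules of this kind give reproducible results, here is a minimal Python sketch that encodes the three example rules as predicates over a dictionary of page facts. The rule names and actions come from the line above. The fact keys (`cta_sizes`, `has_autosave`, `heading_levels`) and the overall structure are hypothetical, since how UTA represents its rules internally is not shown here.

```python
def heading_skips_level(levels):
    """True if any heading is more than one level deeper than the one before it."""
    return any(b - a > 1 for a, b in zip(levels, levels[1:]))


# Each rule: (name, predicate over page facts, resulting action).
RULES = [
    ("page_has_multiple_CTAs", lambda f: len(f["cta_sizes"]) > 1, "pick largest"),
    ("no_autosave_found", lambda f: not f["has_autosave"], "flag Critical"),
    ("heading_skips_level",
     lambda f: heading_skips_level(f["heading_levels"]),
     "WCAG 1.3.1 failure"),
]


def evaluate(facts):
    """Return (rule, action) pairs for every rule whose condition holds."""
    return [(name, action) for name, predicate, action in RULES if predicate(facts)]


# Hypothetical page: two CTAs, no autosave, and an h2 followed directly by an h4.
page = {"cta_sizes": [120, 48], "has_autosave": False, "heading_levels": [1, 2, 4]}
for name, action in evaluate(page):
    print(f"IF {name} -> {action}")
```

Because the rules are pure functions of the page facts, evaluating the same facts twice yields the same list in the same order.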

Product Fit

Built for web products people use every day

UTA personas are calibrated for products where users navigate pages, fill forms, click buttons, read content, and expect professional output.

Strong Fit

SaaS / Web Applications

Dashboards, admin panels, analytics tools, CRM, project management. All 9 personas are calibrated for this. Every checklist item applies.

Notion, Linear, Airtable, Sensor Tower, HubSpot

Strong Fit

Internal Tools / Enterprise

Built without design resources, rarely usability tested. Highest ROI — Novice and Scanner catch the most issues here.

Admin dashboards, operations tools, reporting systems

Strong Fit

E-commerce / Marketplaces

Checkout flows, product browsing, search, filtering. Rushing Pragmatist catches cart friction. Skeptical Evaluator catches trust gaps.

Shopify storefronts, booking platforms, marketplace apps

Strong Fit

Developer Tools

CLIs with web UIs, API dashboards, documentation platforms. Power User catches missing shortcuts. Novice catches bad onboarding.

Stripe Dashboard, Vercel, Supabase, docs sites

Good Fit

Marketing Sites / Landing Pages

Scanner + Skeptical Evaluator + UI Purist are highly relevant for conversion optimization and credibility assessment.

Product landing pages, campaign sites, portfolios

Good Fit

Content Platforms

Navigation, readability, accessibility, and content management flows. Deliberator and Accessibility User add the most value here.

Blogs, news sites, knowledge bases, CMS platforms

Test usability with 9 AI personas

Usability testing shouldn't cost $15K

Traditional Usability Study
