Erik Treviño
Creator of behavioral contract testing for AI-powered applications. IEEE AITest 2026 author. I build the testing tools that don't exist yet — then formalize them so the industry can use them.
The Thread
Every role, the same move: assess the landscape, find the gap, build what doesn't exist.
Twenty years ago I started my career on COBOL mainframes, migrating a county appraisal system — one million property records — from legacy architecture to modern code. It took five years. I learned what it means to build something that matters, under constraints that demand precision.
Since then, I've built QA departments from scratch, led a 15-person distributed team shipping mobile banking for 12 financial institutions with zero critical production issues, and created an AI-powered investigation platform that prevented $2M+ in annual fraud exposure at a major entertainment company.
Behavioral Contract Testing
At a cybersecurity company, I created behavioral contract testing — a novel approach to validating AI features at the browser layer through declarative YAML contracts with five validation layers: state machine transitions, timing constraints, structural schema, safety invariants, and semantic similarity.
The framework is formalized as a mathematical tuple C = (S, T, Σ, Φ, Ψ) and implements 8 custom Playwright matchers: toTransitionThrough, toMeetTimingContract, toMatchAiSchema, toNotContainPii, toNotLeakSystemPrompt, toRespectTenantBoundary, toNotExceedTokenBudget, and toBeSemanticallySimilar. The key innovation is AiStateObserver, a DOM mutation observer that combines requestAnimationFrame with configuration-order priority scanning. This handles sub-frame state collisions, where an AI response begins streaming in under 16 ms (less than one frame at 60 Hz).
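The sketch below shows how configuration-order priority scanning and a toTransitionThrough-style check might fit together. The state names and the scanFrame and transitionsThrough helpers are hypothetical and are not taken from SECUR-T. In a browser, scanFrame would be called from a requestAnimationFrame callback with the states currently detected in the DOM. That wiring is omitted so the logic runs standalone.

```typescript
// Hypothetical sketch: not SECUR-T's AiStateObserver or its matcher API.
type AiState = string;

/**
 * Record the states visible in one animation frame. Scanning in the
 * contract's configuration order means states that collided within a
 * single frame are still logged in their declared sequence.
 */
function scanFrame(
  visible: readonly AiState[],
  configOrder: readonly AiState[],
  timeline: AiState[],
): void {
  for (const state of configOrder) {
    const last = timeline[timeline.length - 1];
    // Skip states that are not visible, and avoid re-logging a state
    // that is still on screen from the previous frame.
    if (visible.indexOf(state) !== -1 && last !== state) {
      timeline.push(state);
    }
  }
}

/** True if `expected` occurs in `timeline` as an ordered subsequence. */
function transitionsThrough(
  timeline: readonly AiState[],
  expected: readonly AiState[],
): boolean {
  let matched = 0;
  for (const state of timeline) {
    if (matched < expected.length && state === expected[matched]) {
      matched++;
    }
  }
  return matched === expected.length;
}

// Usage: "loading" and "streaming" collide inside a single frame.
const configOrder: AiState[] = ["idle", "loading", "streaming", "complete", "error"];
const timeline: AiState[] = [];
scanFrame(["idle"], configOrder, timeline);
scanFrame(["streaming", "loading"], configOrder, timeline); // sub-frame collision
scanFrame(["streaming"], configOrder, timeline); // still visible, not re-logged
scanFrame(["complete"], configOrder, timeline);
// timeline is now ["idle", "loading", "streaming", "complete"]
```

Deduplicating against the last recorded state stops a state that stays visible across several frames from being logged more than once. A real observer would also need to record timestamps so the timing layer can check its constraints.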
Industrial validation against a deployed cybersecurity platform with live Amazon Bedrock AI demonstrated 100% contract compliance across 10 contract executions. The validation also identified two previously undetected security issues that 174 existing E2E tests had missed. First, PII patterns (Social Security numbers and credit card formats) surfaced unmasked in AI-generated breach analysis output. Second, system prompt fragments leaked into AI error-state responses.
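A toNotContainPii matcher performs this kind of check. The sketch below flags SSN-formatted strings and card-like digit runs that pass the Luhn checksum. The regex patterns, the findPii and passesLuhn helpers, and the Luhn filtering step are all assumptions for illustration, not the framework's implementation.

```typescript
// Hypothetical sketch of a PII scan over AI output; not SECUR-T's toNotContainPii.
type PiiKind = "ssn" | "credit_card";

interface PiiFinding {
  kind: PiiKind;
  match: string;
}

// US SSN format: 3-2-4 digits separated by hyphens.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;
// 13 to 19 digits, each optionally followed by a single space or hyphen.
const CARD_CANDIDATE_PATTERN = /\b(?:\d[ -]?){13,19}\b/;

/** Luhn checksum, used to drop digit runs that cannot be card numbers. */
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' has char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return digits.length > 0 && sum % 10 === 0;
}

/** Collect every match of `pattern` in `text` using a fresh global regex. */
function allMatches(pattern: RegExp, text: string): string[] {
  const re = new RegExp(pattern.source, "g");
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m[0]);
  return out;
}

function findPii(text: string): PiiFinding[] {
  const findings: PiiFinding[] = [];
  for (const match of allMatches(SSN_PATTERN, text)) {
    findings.push({ kind: "ssn", match });
  }
  for (const candidate of allMatches(CARD_CANDIDATE_PATTERN, text)) {
    const digits = candidate.replace(/\D/g, ""); // strip spaces and hyphens
    if (passesLuhn(digits)) {
      findings.push({ kind: "credit_card", match: candidate.trim() });
    }
  }
  return findings;
}
```

The Luhn filter cuts false positives from long order or tracking numbers. Any 13 to 19 digit run that happens to pass the checksum will still be flagged.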
The behavioral contract format serves as a compliance-as-code artifact — readable by product managers, executable by test automation, and auditable by compliance teams. Each contract layer maps to specific regulatory requirements: EU AI Act articles, NIST AI RMF measures, and SOC 2 controls. The same test execution that validates AI behavior in CI/CD also generates the documentation required for regulatory audits.
Publications & Research
- "Behavioral Contract Testing for AI Features at the Browser Layer," submitted to IEEE AITest 2026 (AI Testing in Practice Track) for double-blind review. Introduces the formal framework, the AiStateObserver for sub-frame temporal precision, and the industrial validation results.
White Papers
- Behavioral Contract Testing for Non-Deterministic AI Outputs — The comprehensive technical white paper on the methodology.
- Multi-Outcome Assessment at Scale — Architecture for competency-based education (Casiana).
- AI-Native Test Intelligence — From natural language to executable tests (CasianaAI).
The Directive Platform
The full system is a three-component, spec-driven test engineering platform:
- SECUR-T — The behavioral contract testing framework. 8 custom Playwright matchers, 118 tests against live AI endpoints, dual-mode execution (mocked for CI/CD, live for nightly validation against real AI).
- CurioEVE — AI-powered CLI with 60+ commands. Generates test specs from acceptance criteria, produces Playwright code from specs, scores sprints for automation feasibility.
- M-O — Full-stack test management platform (Go/React). 950+ tests, flaky detection, error clustering, Slack/GitHub/Jira/Confluence integrations. Built in ~3 days.
Closed-loop pipeline: Ticket → spec → test → CI → results → gap analysis → next sprint.
Side Projects
Casiana — A full K-12 Learning Management System serving 63 schools and 22,000+ students. Built in ~100 days as a gift to a national education network. The only LMS with native multi-outcome assessment — scoring a single submission against 5 independent learning outcomes. No commercial LMS (Canvas, Schoology, Google Classroom) supports this because their data model is student × assignment → single_score. Casiana's 4-dimensional gradebook required a fundamentally different schema. 136K lines of code. 1,289 tests. 155+ Row Level Security policies. FERPA-compliant with SHA-256 audit chain verification. 9.85/10 CTO assessment. $0.003/student/month infrastructure cost — 300x cheaper than the incumbent.
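The sketch below makes the data-model difference concrete. It uses hypothetical field names (studentId, assignmentId, outcomeId), not Casiana's actual schema. A single-score gradebook keys each score by student and assignment. A multi-outcome gradebook adds the outcome to the key, so one submission yields several independent scores that can be rolled up per outcome. This illustrates only the added outcome key and is not a reconstruction of the full 4-dimensional gradebook.

```typescript
// Hypothetical sketch; field names are illustrative, not Casiana's schema.

// Conventional LMS shape: one score per (student, assignment).
interface SingleScore {
  studentId: string;
  assignmentId: string;
  score: number;
}

// Multi-outcome shape: one score per (student, assignment, outcome),
// so a single submission can carry several independent scores.
interface OutcomeScore {
  studentId: string;
  assignmentId: string;
  outcomeId: string;
  score: number;
}

/** Mean score per learning outcome for one student, across assignments. */
function outcomeAverages(
  rows: readonly OutcomeScore[],
  studentId: string,
): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const row of rows) {
    if (row.studentId !== studentId) continue;
    const t = totals[row.outcomeId] ?? { sum: 0, count: 0 };
    t.sum += row.score;
    t.count += 1;
    totals[row.outcomeId] = t;
  }
  const averages: Record<string, number> = {};
  for (const outcomeId of Object.keys(totals)) {
    const t = totals[outcomeId]!;
    averages[outcomeId] = t.sum / t.count;
  }
  return averages;
}

// One submission (s1, a1) scored against five outcomes, plus a later
// assignment (a2) that assesses only outcome o1.
const rows: OutcomeScore[] = [
  { studentId: "s1", assignmentId: "a1", outcomeId: "o1", score: 3 },
  { studentId: "s1", assignmentId: "a1", outcomeId: "o2", score: 4 },
  { studentId: "s1", assignmentId: "a1", outcomeId: "o3", score: 2 },
  { studentId: "s1", assignmentId: "a1", outcomeId: "o4", score: 4 },
  { studentId: "s1", assignmentId: "a1", outcomeId: "o5", score: 3 },
  { studentId: "s1", assignmentId: "a2", outcomeId: "o1", score: 4 },
  { studentId: "s2", assignmentId: "a1", outcomeId: "o1", score: 1 },
];
// outcomeAverages(rows, "s1").o1 is 3.5, the mean of the a1 and a2 scores
```

A SingleScore row can hold only one of those five a1 scores. That is why per-outcome reporting needs the extra key rather than extra columns bolted onto the assignment.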
CasianaAI — AI-native test intelligence platform. NLP engine converts plain English requirements to executable Playwright tests (70% automation rate, 95% parsing accuracy, <5 seconds). Self-healing tests with 6 recovery strategies and institutional memory learning. Predictive failure prevention forecasting failures 3-5 commits ahead (90% accuracy). Ports & Adapters architecture for clean IP separation. 5 patent-pending innovations. Named after my daughter, Casiana.
Speaking
I speak at conferences on behavioral contract testing, AI-augmented engineering, flaky test elimination, and CI/CD optimization. Available for keynotes, breakout sessions, workshops, and podcasts.
- Zero Trust for AI — Behavioral contracts for the AI era (keynote, 40-45 min)
- The Brain vs. The Body — AI evaluation and AI testing are different problems (35 min)
- 680 Runs, Zero Retries — Statistical confidence in test engineering (30 min)
- CI/CD Is Lying to You — Pipeline failures nobody noticed (35 min)
- Your AI Wrote the Code, But Who Audited the Dependencies? — Supply chain security + AI (30 min)
48 Named Engineering Patterns
Over 11 weeks of production work, I documented 48 original, named engineering patterns, each extracted from a real engineering problem rather than a theoretical exercise. They include Zero Trust for AI, The Testing Trophy Gap, 680-Run Stress Validation, The Fluff Audit, Brain-Body Testing Separation, The Calendar Time Bomb, Dual-Locator Resilience, Predictive Hardening, and 40 more. Each pattern is a blog post, a potential conference talk, and a reusable methodology.
Personal
I live in Austin, TX with my wife, daughter Casiana, cat, and pup. I care about compounding systems, honest self-assessment, and work that matters. I named my AI platform after my daughter because the best things I build are the ones that outlast me.
Let's build something together.
Have a testing challenge? Let's talk.
