Erik Treviño
About

Creator of behavioral contract testing for AI-powered applications. IEEE AITest 2026 author. I build the testing tools that don't exist yet — then formalize them so the industry can use them.

The Thread

Every role, the same move: assess the landscape, find the gap, build what doesn't exist.

Twenty years ago I started my career on COBOL mainframes, migrating a county appraisal system — one million property records — from legacy architecture to modern code. It took five years. I learned what it means to build something that matters, under constraints that demand precision.

Since then, I've built QA departments from scratch, led a 15-person distributed team shipping mobile banking for 12 financial institutions with zero critical production issues, and created an AI-powered investigation platform that prevented $2M+ in annual fraud exposure at a major entertainment company.

Behavioral Contract Testing

At a cybersecurity company, I created behavioral contract testing — a novel approach to validating AI features at the browser layer through declarative YAML contracts with five validation layers: state machine transitions, timing constraints, structural schema, safety invariants, and semantic similarity.
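As an illustration of two of those layers, a contract's state-machine and timing rules could be checked against an observed trace roughly as follows. This is a minimal sketch, not the framework's actual format: the real contracts are YAML, and every field name here (states, transitions, timingMs) and the checkTrace helper are hypothetical.

```typescript
// Hypothetical in-memory shape for two of the five layers (state machine + timing).
// The published contract format is YAML; these field names are assumptions.
interface BehavioralContract {
  states: string[];                          // known UI states
  transitions: Array<[string, string]>;      // allowed from -> to pairs
  timingMs: Partial<Record<string, number>>; // max time allowed in a state
}

interface ObservedStep {
  state: string;
  enteredAtMs: number; // timestamp at which the state was entered
}

// Returns a list of human-readable violations; an empty list means compliant.
function checkTrace(contract: BehavioralContract, trace: ObservedStep[]): string[] {
  const violations: string[] = [];
  const allowed = new Set(contract.transitions.map(([from, to]) => `${from}->${to}`));
  for (let i = 1; i < trace.length; i++) {
    const prev = trace[i - 1];
    const curr = trace[i];
    if (!allowed.has(`${prev.state}->${curr.state}`)) {
      violations.push(`illegal transition ${prev.state} -> ${curr.state}`);
    }
    const budget = contract.timingMs[prev.state];
    const spent = curr.enteredAtMs - prev.enteredAtMs;
    if (budget !== undefined && spent > budget) {
      violations.push(`${prev.state} took ${spent}ms, budget ${budget}ms`);
    }
  }
  return violations;
}

// Illustrative contract: idle -> loading -> streaming -> complete.
const exampleContract: BehavioralContract = {
  states: ["idle", "loading", "streaming", "complete"],
  transitions: [["idle", "loading"], ["loading", "streaming"], ["streaming", "complete"]],
  timingMs: { loading: 1000, streaming: 5000 },
};
```

Returning a list of violations rather than a boolean keeps the result readable in a test report, which fits the goal of contracts that non-engineers can audit.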

The framework is formalized as a mathematical tuple C = (S, T, Σ, Φ, Ψ) and implements 8 custom Playwright matchers: toTransitionThrough, toMeetTimingContract, toMatchAiSchema, toNotContainPii, toNotLeakSystemPrompt, toRespectTenantBoundary, toNotExceedTokenBudget, and toBeSemanticallySimilar. The key innovation is the AiStateObserver, a DOM mutation observer that combines requestAnimationFrame with configuration-order priority scanning to handle sub-frame state collisions, where AI responses begin streaming in under 16 ms (less than one frame at 60 Hz).
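One way to read "configuration-order priority scanning" is that, on each frame, state rules are checked in the order they were configured, so that two states appearing within the same frame are still recorded in lifecycle order. The sketch below shows only that interpretation as a pure function. It is an assumption about the idea, not the AiStateObserver's real code: StateRule, scanFrame, and the marker names are hypothetical, and the DOM wiring (MutationObserver plus requestAnimationFrame) is left out.

```typescript
// Hypothetical rule: a state name plus a predicate over the markers visible in one frame.
interface StateRule {
  name: string;
  matches: (markers: ReadonlySet<string>) => boolean;
}

// Rules are listed in lifecycle order; array order is the priority order.
const stateRules: StateRule[] = [
  { name: "loading", matches: (m) => m.has("spinner") },
  { name: "streaming", matches: (m) => m.has("token-stream") },
  { name: "complete", matches: (m) => m.has("done-badge") },
];

// Returns the states first seen in this frame, in configuration order.
// `seen` is mutated so each state is reported only once per run.
function scanFrame(
  rules: StateRule[],
  markers: ReadonlySet<string>,
  seen: Set<string>
): string[] {
  const entered: string[] = [];
  for (const rule of rules) {
    if (!seen.has(rule.name) && rule.matches(markers)) {
      seen.add(rule.name);
      entered.push(rule.name);
    }
  }
  return entered;
}

// Frame 1: spinner and first tokens landed inside the same frame (a collision).
const seenStates = new Set<string>();
const frame1 = scanFrame(stateRules, new Set(["spinner", "token-stream"]), seenStates);
// Frame 2: streaming is still visible and the completion badge has appeared.
const frame2 = scanFrame(stateRules, new Set(["token-stream", "done-badge"]), seenStates);
```

Under this reading, frame1 records loading before streaming even though both became visible at once, and frame2 records only complete.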

Industrial validation against a deployed cybersecurity platform with live Amazon Bedrock AI demonstrated 100% contract compliance across 10 contract executions — while identifying two previously undetected security issues that 174 existing E2E tests had missed: PII patterns (Social Security Numbers and credit card formats) surfacing unmasked in AI-generated breach analysis output, and system prompt fragment leakage in AI error state responses.
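The PII finding can be illustrated with a small detector for the two formats named above. This is a minimal sketch under stated assumptions, not the logic behind toNotContainPii: the regular expressions, the Luhn filter for card numbers, and the findPii name are all illustrative choices.

```typescript
// Assumed patterns: US SSN as ddd-dd-dddd, cards as 13-19 digits with optional separators.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum: filters out digit runs that cannot be valid card numbers.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

interface PiiFinding {
  kind: "ssn" | "card";
  match: string;
}

// Scans AI output text and returns every unmasked SSN or card-like number found.
function findPii(text: string): PiiFinding[] {
  const findings: PiiFinding[] = [];
  for (const match of text.match(SSN_PATTERN) || []) {
    findings.push({ kind: "ssn", match });
  }
  for (const match of text.match(CARD_CANDIDATE) || []) {
    const digits = match.replace(/[ -]/g, "");
    if (passesLuhn(digits)) {
      findings.push({ kind: "card", match });
    }
  }
  return findings;
}
```

A matcher built on a function like this would fail the test whenever findPii returns a non-empty list for the rendered AI output. The Luhn step is there to reduce false positives on order numbers and other long digit runs.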

The behavioral contract format serves as a compliance-as-code artifact — readable by product managers, executable by test automation, and auditable by compliance teams. Each contract layer maps to specific regulatory requirements: EU AI Act articles, NIST AI RMF measures, and SOC 2 controls. The same test execution that validates AI behavior in CI/CD also generates the documentation required for regulatory audits.

Publications & Research

  • "Behavioral Contract Testing for AI Features at the Browser Layer" — Double-blind review submission to IEEE AITest 2026, AI Testing in Practice Track. Introduces the formal framework, the AiStateObserver for sub-frame temporal precision, and industrial validation results.

The Directive Platform

The full system is a three-component, spec-driven test engineering platform:

  • SECUR-T — The behavioral contract testing framework. 8 custom Playwright matchers, 118 tests against live AI endpoints, dual-mode execution (mocked for CI/CD, live for nightly validation against real AI).
  • CurioEVE — AI-powered CLI with 60+ commands. Generates test specs from acceptance criteria, produces Playwright code from specs, scores sprints for automation feasibility.
  • M-O — Full-stack test management platform (Go/React). 950+ tests, flaky detection, error clustering, Slack/GitHub/Jira/Confluence integrations. Built in ~3 days.

Closed-loop pipeline: Ticket → spec → test → CI → results → gap analysis → next sprint.

Side Projects

Casiana — A full K-12 Learning Management System serving 63 schools and 22,000+ students. Built in ~100 days as a gift to a national education network. The only LMS with native multi-outcome assessment — scoring a single submission against 5 independent learning outcomes. No commercial LMS (Canvas, Schoology, Google Classroom) supports this because their data model is student × assignment → single_score. Casiana's 4-dimensional gradebook required a fundamentally different schema. 136K lines of code. 1,289 tests. 155+ Row Level Security policies. FERPA-compliant with SHA-256 audit chain verification. 9.85/10 CTO assessment. $0.003/student/month infrastructure cost — 300x cheaper than the incumbent.
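The data-model difference can be sketched as follows. This is an illustration under assumptions, not Casiana's schema: it models three dimensions (student, assignment, outcome) because the text does not name the fourth, and the type names, outcome names, and helper functions are hypothetical.

```typescript
// Conventional model: one score per (student, assignment).
// Key format `${studentId}:${assignmentId}` is an illustrative choice.
type SingleScoreGradebook = Map<string, number>;

// Multi-outcome model: one row per (student, assignment, outcome).
interface OutcomeScore {
  studentId: string;
  assignmentId: string;
  outcomeId: string;
  score: number;
}

// Scores a single submission against several independent outcomes at once.
function scoreSubmission(
  studentId: string,
  assignmentId: string,
  scores: Record<string, number>
): OutcomeScore[] {
  return Object.keys(scores).map((outcomeId) => ({
    studentId,
    assignmentId,
    outcomeId,
    score: scores[outcomeId],
  }));
}

// Averages one student's scores for one outcome across all assignments.
// Returns undefined when the student has no rows for that outcome.
function outcomeAverage(
  rows: OutcomeScore[],
  studentId: string,
  outcomeId: string
): number | undefined {
  const relevant = rows.filter(
    (r) => r.studentId === studentId && r.outcomeId === outcomeId
  );
  if (relevant.length === 0) return undefined;
  const total = relevant.reduce((sum, r) => sum + r.score, 0);
  return total / relevant.length;
}
```

Because each outcome is its own row, a report can follow one outcome across many assignments, which a single-score gradebook cannot express without flattening the detail away.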

CasianaAI — AI-native test intelligence platform. NLP engine converts plain English requirements to executable Playwright tests (70% automation rate, 95% parsing accuracy, <5 seconds). Self-healing tests with 6 recovery strategies and institutional memory learning. Predictive failure prevention forecasting failures 3-5 commits ahead (90% accuracy). Ports & Adapters architecture for clean IP separation. 5 patent-pending innovations. Named after my daughter, Casiana.

Speaking

I speak at conferences on behavioral contract testing, AI-augmented engineering, flaky test elimination, and CI/CD optimization. Available for keynotes, breakout sessions, workshops, and podcasts.

  • Zero Trust for AI — Behavioral contracts for the AI era (keynote, 40-45 min)
  • The Brain vs. The Body — AI evaluation and AI testing are different problems (35 min)
  • 680 Runs, Zero Retries — Statistical confidence in test engineering (30 min)
  • CI/CD Is Lying to You — Pipeline failures nobody noticed (35 min)
  • Your AI Wrote the Code, But Who Audited the Dependencies? — Supply chain security + AI (30 min)

48 Named Engineering Patterns

Over 11 weeks of production work, I documented 48 original, pattern-named methodologies — each extracted from real engineering problems, not theoretical exercises. These include Zero Trust for AI, The Testing Trophy Gap, 680-Run Stress Validation, The Fluff Audit, Brain-Body Testing Separation, The Calendar Time Bomb, Dual-Locator Resilience, Predictive Hardening, and 40 more. Each pattern is a blog post, a potential conference talk, and a reusable methodology.

Personal

I live in Austin, TX with my wife, daughter Casiana, cat, and pup. I care about compounding systems, honest self-assessment, and work that matters. I named my AI platform after my daughter because the best things I build are the ones that outlast me.

Let's build something together.

Have a testing challenge? Let's talk.