CasianaAI

Most test automation is still manual. A human reads a requirement, translates it into code, and maintains that code when the product changes. Playwright made the execution layer excellent — but the creation, maintenance, and intelligence layers are still bottlenecked by human throughput.
What if you could describe what to test in plain English and get executable Playwright code?
CasianaAI is my answer. Named after my daughter, Casiana.
The Problem
The testing industry has a supply problem. There aren’t enough SDETs to write the tests that need to exist. Copilot-style code completion helps with boilerplate, but it generates code scaffolding — not behavioral specifications. Record-and-replay tools like Testim capture interactions but can’t reason about what should be tested. Self-healing wrappers like Healenium fix broken selectors but don’t generate new tests.
None of these take an NLP-first approach where plain English behavioral descriptions are the source of truth that generates executable test code. That’s the gap CasianaAI fills: from natural language requirements to structured test scenarios to runnable Playwright code, with intelligence at every layer.
Architecture Overview
CasianaAI uses a Ports & Adapters (Hexagonal) architecture with dependency injection to maintain clean IP separation. Four port interfaces define the intelligence contracts — PerformanceIntelligencePort, TestIntelligencePort, AdaptiveExecutionPort, and SelfHealingPort — with basic implementations providing safe defaults and enhanced implementations providing the full AI-powered pipeline. An environment variable switches between them.
This architecture was a deliberate choice: the intelligence is in the port implementations, not in the core domain logic. The core is pure TypeScript with no AI dependencies. The ports are contracts. The adapters are swappable.
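To make the swap concrete, here is a minimal sketch of the port/adapter switch. The `SelfHealingPort` name comes from the article; the method shape, class names, and the `CASIANA_ENHANCED` environment variable are my assumptions, not the actual implementation.

```typescript
// One of the four port interfaces (method shape assumed for illustration).
interface SelfHealingPort {
  heal(failure: string): string;
}

// Basic adapter: safe deterministic default, no AI dependencies.
class BasicSelfHealing implements SelfHealingPort {
  heal(failure: string): string {
    return `retry:${failure}`;
  }
}

// Enhanced adapter: stands in for the AI-powered implementation.
class EnhancedSelfHealing implements SelfHealingPort {
  heal(failure: string): string {
    return `ai-repair:${failure}`;
  }
}

// Dependency injection driven by an environment variable (name assumed).
function createSelfHealingPort(
  env: Record<string, string | undefined>,
): SelfHealingPort {
  return env.CASIANA_ENHANCED === "true"
    ? new EnhancedSelfHealing()
    : new BasicSelfHealing();
}
```

The core never imports an adapter directly; it only sees the port interface, which is what keeps the domain logic free of AI dependencies.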
Tech Stack
- Language: TypeScript (strict mode), 18,204 lines across 151 test suites
- NLP: HuggingFace models (facebook/bart-large-mnli for zero-shot classification, all-MiniLM-L6-v2 for semantic embeddings)
- AI Reasoning: Claude API (Opus for complex reasoning, Sonnet for simple tasks)
- Test Framework: Playwright (generated output target)
- Architecture: Ports & Adapters with dependency injection, triple-redundant AI systems
Technical Deep Dive: The NLP Pipeline
The pipeline transforms natural language into executable Playwright code through four stages:
Stage 1: RequirementsParser — Combines pattern matching with NLP to extract testable elements from requirements text. It parses user stories, acceptance criteria, and feature descriptions into structured TestableElement objects with identified intents, actions, and expected outcomes.
Stage 2: TransformerEngine — Semantic analysis using HuggingFace models. The facebook/bart-large-mnli model performs zero-shot text classification, framing each requirement as an NLI premise and scoring it against candidate test categories (functional, accessibility, performance, security) without task-specific training data. The all-MiniLM-L6-v2 model generates 384-dimensional semantic embeddings for similarity comparison, grouping related requirements and detecting duplicate test coverage.
Stage 3: TestCaseGenerator — Transforms analyzed requirements into structured test scenarios, including positive paths, negative paths, and edge cases. Each scenario includes preconditions, steps, expected outcomes, and priority scoring.
Stage 4: PlaywrightCodeGenerator — Converts structured test scenarios into executable Playwright code with proper page objects, assertions, and fixture usage.
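A template-driven emitter for Stage 4 might look like the following sketch. The scenario shape and helper names are hypothetical; the real generator also emits page objects and fixtures.

```typescript
// Hypothetical minimal scenario shape (the real TestScenario is richer).
interface TestScenario {
  name: string;
  steps: { action: "goto" | "click" | "fill"; target: string; value?: string }[];
  expect: { selector: string; text: string };
}

// Convert a structured scenario into Playwright test source text.
function generatePlaywrightTest(s: TestScenario): string {
  const body = s.steps
    .map((step) => {
      switch (step.action) {
        case "goto":
          return `  await page.goto('${step.target}');`;
        case "click":
          return `  await page.click('${step.target}');`;
        case "fill":
          return `  await page.fill('${step.target}', '${step.value ?? ""}');`;
      }
    })
    .join("\n");
  return [
    `test('${s.name}', async ({ page }) => {`,
    body,
    `  await expect(page.locator('${s.expect.selector}')).toHaveText('${s.expect.text}');`,
    `});`,
  ].join("\n");
}
```

The output is plain source text, which is what lets the generated tests live in the customer's repository like hand-written ones.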
```typescript
// Simplified view of the pipeline orchestration
export class NLPEngine {
  private transformerEngine: TransformerEngine;

  async analyzeRequirements(text: string): Promise<RequirementAnalysis> {
    // Stage 1: Parse intents from natural language
    const intents = this.extractIntents(text);

    // Stage 2: Classify and embed with transformer models
    const classification = await this.transformerEngine.classifyIntent(text);
    const semanticContext = await this.transformerEngine.analyzeSemantics(text);

    // Stage 3-4: Generate scenarios and code (downstream)
    return {
      intents,
      testableElements: this.identifyTestableElements(text),
      acceptanceCriteria: this.extractAcceptanceCriteria(text),
      confidence: classification.confidence,
      semanticContext,
    };
  }
}
```

The pipeline achieves a 70% automation rate — meaning 70% of natural language requirements successfully generate executable test code without manual intervention. NLP parsing accuracy is 95%, with generation speed under 5 seconds.
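Stage 2's duplicate-coverage detection can be sketched with cosine similarity over embedding vectors. The threshold and helper names below are assumptions; in the real pipeline the vectors would be 384-dimensional all-MiniLM-L6-v2 outputs.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Flag requirement pairs whose embeddings exceed a similarity threshold
// (the 0.9 default is an assumption, not CasianaAI's actual setting).
function findDuplicatePairs(
  embeddings: number[][],
  threshold = 0.9,
): [number, number][] {
  const pairs: [number, number][] = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```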
Technical Deep Dive: Intelligent Model Routing
Not every task requires the most expensive model. CasianaAI routes tasks to the appropriate Claude model based on complexity detection:
```typescript
enum TaskComplexity {
  SIMPLE = 'simple',     // Intent classification, basic parsing
  MODERATE = 'moderate', // Template-based generation, standard analysis
  COMPLEX = 'complex',   // Test generation, user story mapping, failure reasoning
}
```

Claude Opus handles complex reasoning — test generation from ambiguous requirements, multi-step user story mapping, root cause analysis of cascading failures, and architectural recommendations. These tasks require the model to hold multiple contexts simultaneously and reason across them.
Claude Sonnet handles simpler tasks — intent classification, selector suggestions, basic parsing, and template filling. These tasks have clear patterns and bounded output.
The routing is automatic. The system analyzes the task, classifies its complexity, and sends it to the appropriate model. The result: 60% cost reduction compared to routing everything through Opus, with no measurable quality loss on simple tasks.
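A heuristic router along these lines illustrates the idea. The keywords and model identifiers are placeholders, not CasianaAI's actual detector or exact Anthropic model names.

```typescript
type Complexity = "simple" | "moderate" | "complex";

// Keyword heuristics stand in for the real complexity detector.
function classifyTask(description: string): Complexity {
  const d = description.toLowerCase();
  if (/generate|root cause|user story|architect/.test(d)) return "complex";
  if (/template|standard analysis/.test(d)) return "moderate";
  return "simple";
}

// Complex reasoning goes to Opus; everything else to the cheaper Sonnet.
function routeModel(description: string): string {
  return classifyTask(description) === "complex" ? "claude-opus" : "claude-sonnet";
}
```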
Five Patent-Pending Innovations
CasianaAI includes five innovations documented as patent-pending:
1. Natural Language Test Generation
The full NLP-to-Playwright pipeline described above. No other tool on the market takes plain English requirements and produces structured, framework-aware test code through a semantic analysis pipeline. Copilot generates code completions. CasianaAI generates test strategies.
2. Predictive Failure Prevention
Predicts test failures 3-5 commits in advance by combining code change analysis with time-series failure data and business impact assessment. The system identifies five code smell categories — fragile selectors, timing dependencies, flaky assertions, environment coupling, and data dependencies — and calculates technical debt with interest rate and compounding factors.
```typescript
interface PredictedFailure {
  testId: string;
  probability: number;  // 0-1 failure likelihood
  timeFrame: TimeFrame; // When the failure will manifest
  rootCause: RootCauseAnalysis;
  preventionStrategies: PreventionStrategy[];
  businessImpact: BusinessImpactAssessment;
}
```

Prediction accuracy: 90%.
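One way the interest-and-compounding debt calculation could work, purely as an assumed formula since the article does not give one:

```typescript
// Compound-growth model of test technical debt: unaddressed debt grows
// by a fixed rate each commit (the specific formula is an assumption).
function projectedDebt(
  principal: number,
  ratePerCommit: number,
  commits: number,
): number {
  return principal * Math.pow(1 + ratePerCommit, commits);
}

// Combine failure probability with business impact into a priority score.
function preventionPriority(probability: number, businessImpact: number): number {
  return probability * businessImpact;
}
```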
3. Autonomous Self-Healing
When a test fails, the TestHealingEngine doesn’t just retry — it diagnoses the failure type and applies the appropriate healing strategy. Nine failure types are detected, and six healing strategies are available:
- Intelligent Selector Evolution — Analyzes DOM changes and finds new selectors
- Dynamic Wait Optimization — Adjusts timing for changed page load patterns
- Assertion Adaptation — Updates assertions for intentional UI changes
- Navigation Healing — Fixes broken navigation paths
- Pattern Matching Fix — Applies known healing patterns from institutional memory
- Claude Reasoning Fix — For novel failures, sends the full context to Claude Opus for reasoning-based repair
The engine maintains a healing history and learns from every attempt. Recovery rate: 60% for automated healing without human intervention.
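The dispatch from diagnosed failure type to healing strategy might be sketched as follows. The type names here are a simplified subset of the nine detected categories, invented for illustration.

```typescript
// Simplified subset of failure types (the real engine detects nine).
type FailureType = "selector-not-found" | "timeout" | "assertion-mismatch" | "unknown";

// Map each diagnosed failure type to a healing strategy.
const healingStrategy: Record<FailureType, string> = {
  "selector-not-found": "intelligent-selector-evolution",
  "timeout": "dynamic-wait-optimization",
  "assertion-mismatch": "assertion-adaptation",
  "unknown": "claude-reasoning-fix", // novel failures escalate to Opus
};

function chooseStrategy(failure: FailureType): string {
  return healingStrategy[failure];
}
```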
4. Institutional Memory System
Every healing attempt, every failure pattern, every successful fix feeds into the institutional memory system. Ten pattern categories are tracked. The system learns which healing strategies work for which failure types across the entire customer base.
This creates a compound advantage: the more tests CasianaAI heals, the better it gets at healing. New customers benefit from patterns learned across all previous deployments. The institutional memory is the moat — competitors starting from scratch have zero learned patterns.
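A toy model of the memory illustrates the mechanism: record outcomes per (failure type, strategy) pair and recommend the strategy with the best observed success rate. The class and method names are my own, not the production system's.

```typescript
// Minimal sketch of an institutional memory keyed on failure type + strategy.
class InstitutionalMemory {
  private stats = new Map<string, { success: number; total: number }>();

  // Record whether a healing attempt with this strategy worked.
  record(failureType: string, strategy: string, healed: boolean): void {
    const key = `${failureType}:${strategy}`;
    const s = this.stats.get(key) ?? { success: 0, total: 0 };
    s.total += 1;
    if (healed) s.success += 1;
    this.stats.set(key, s);
  }

  // Recommend the strategy with the highest observed success rate.
  recommend(failureType: string): string | undefined {
    let best: string | undefined;
    let bestRate = -1;
    for (const [key, s] of this.stats.entries()) {
      const [type, strategy] = key.split(":");
      const rate = s.success / s.total;
      if (type === failureType && rate > bestRate) {
        best = strategy;
        bestRate = rate;
      }
    }
    return best;
  }
}
```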
5. Quantum-Inspired Test Optimization
Probabilistic optimization of test execution order based on predicted failure likelihood and business impact. Tests most likely to fail run first. Tests with the highest business impact get priority. Feedback arrives sooner because the most informative tests run earliest.
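The prioritization principle can be sketched with a simple product score; the actual weighting CasianaAI uses is not disclosed, so the formula below is an assumption.

```typescript
interface TestCandidate {
  id: string;
  failureProbability: number; // 0-1 predicted likelihood of failure
  businessImpact: number;     // relative impact weight
}

// Order tests so the most informative (likely-to-fail, high-impact) run first.
function prioritize(tests: TestCandidate[]): string[] {
  return [...tests]
    .sort(
      (a, b) =>
        b.failureProbability * b.businessImpact -
        a.failureProbability * a.businessImpact,
    )
    .map((t) => t.id);
}
```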
The Quality Gate Philosophy
CasianaAI bridges the gap between AI evaluation and AI testing — the distinction explored in The Brain vs. The Body. The NLP pipeline evaluates whether the AI generates good test code (the brain). The self-healing engine tests whether those tests continue to work in production (the body). Both layers are necessary; neither is sufficient alone.
The same philosophy applies to CasianaAI’s own development: AI-generated code passes through quality gates before shipping. When the AI code reviewer is confidently wrong, the human overrides. The human judgment layer is the system’s integrity check.
Results & Metrics
| Metric | Value |
|---|---|
| NLP → Test Automation Rate | 70% |
| NLP Parsing Accuracy | 95% |
| Failure Prediction Accuracy | 90% |
| Self-Healing Recovery Rate | 60% |
| Generation Speed | < 5 seconds |
| Codebase Size | 18,204 lines TypeScript |
| Test Suites | 151 |
| Model Cost Reduction | 60% via intelligent routing |
What I Learned
Ports & Adapters was the right call for IP protection. When building AI-powered tooling on personal time while employed, clean architectural separation isn’t optional — it’s legal protection. The intelligence lives behind port interfaces. The basic implementations use no proprietary data. The enhanced implementations can be activated independently. An environment variable switches between them.
Triple redundancy beats single-point AI. Every critical AI operation has three paths: primary model call, fallback model call, and deterministic fallback. If Claude is down, the system degrades gracefully — it doesn’t fail. Most AI-powered tools treat the model as a hard dependency. CasianaAI treats it as the preferred path with guaranteed alternatives.
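The triple-redundancy pattern might be sketched as below; the function names are illustrative, not the actual API.

```typescript
// Try the primary model, then the fallback model, then a deterministic
// default — so a model outage degrades quality instead of causing failure.
async function withTripleRedundancy<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  deterministic: () => T,
): Promise<T> {
  try {
    return await primary();
  } catch {
    try {
      return await fallback();
    } catch {
      return deterministic();
    }
  }
}
```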
NLP accuracy matters more than generation speed. Early iterations optimized for fast generation. But a test generated from a misunderstood requirement is worse than no test — it creates false confidence. Shifting investment to the NLP parsing stage (the TransformerEngine with HuggingFace models) improved end-to-end quality more than any downstream optimization.
Self-healing needs institutional memory to scale. A self-healing engine that starts fresh on every failure will always be slower than one that remembers what worked before. The healing history — 10 pattern categories, success rates per strategy, failure type distributions — turns individual fixes into system-wide intelligence.
What’s Next
CasianaAI is being developed as a commercial SaaS platform targeting engineering teams that maintain large Playwright test suites:
- Phase 6: Visual Intelligence — Screenshot-based test generation. Point at a UI, describe the behavior, get executable tests. Combines the NLP pipeline with visual understanding.
- Phase 7: Multi-Framework Support — Extending code generation beyond Playwright to Cypress, Selenium, and native mobile frameworks.
- Open-source core — The basic port implementations (non-AI-powered deterministic logic) will be released as open-source. The enhanced implementations remain commercial.
The vision: democratize test intelligence. Every team deserves the compound advantage of institutional memory, predictive prevention, and autonomous healing — not just teams that can afford to hire a dedicated SDET for every 5 developers.
Named after Casiana. Built for everyone.