E2E Tests with AI: Technical Hurdles and Why It's More Complicated Than You Think

Georg Dörgeloh May 20, 2025

“Can’t we just use ChatGPT to write our E2E tests?” You probably hear this question frequently as a developer. The answer is more complicated than a simple yes or no. In this article, I’ll show you the technical reality behind AI-powered test automation – and why the problem is more interesting (and difficult) than it initially appears.

The Problem with Simple Approaches

Approach 1: Code Generation with LLMs

The most obvious approach is to have an LLM generate test code:

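// AI-generated E2E test
import { test } from '@playwright/test';

test('User can login and update profile', async ({ page }) => {
  await page.goto('https://example.com');
  await page.fill('input[name="username"]', 'testuser');
  await page.fill('input[name="password"]', 'password123');
  await page.click('button[type="submit"]');
  // Wait for the page that follows the login to finish loading.
  // waitForNavigation() is deprecated and races with the click that triggers it.
  await page.waitForLoadState();
  // and so on...
});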

Technical Problems:

Approach 2: Browser Automation with LLMs

The next step: Let the LLM control the browser directly. The LLM receives screenshots and DOM, analyzes the current state, and decides what to do next.
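
A minimal sketch of such a control loop follows. It assumes a hypothetical `agent.nextAction()` that wraps the LLM call and a `browser` object exposing `observe()` and `perform()`; none of these names come from a real library. The step budget is an added assumption that keeps a confused model from looping forever.

```javascript
// Observe-decide-act loop for LLM-driven browsing (illustrative sketch).
// `browser` and `agent` are hypothetical interfaces, not a real library API.
async function runAgent(browser, agent, goal, maxSteps = 20) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    // 1. Observe: capture the current page state (screenshot and DOM).
    const state = await browser.observe();
    // 2. Decide: ask the model for the next action toward the goal.
    const action = await agent.nextAction({ goal, state, history });
    if (action.type === 'done') {
      return { success: true, history };
    }
    // 3. Act: execute the action and remember it for the next decision.
    await browser.perform(action);
    history.push(action);
  }
  // Step budget exhausted without the model declaring completion.
  return { success: false, history };
}
```

Every iteration costs one model call, which is why the number of steps dominates both runtime and price in this approach.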

Why this works technically:

Why it fails in practice:

Hybrid Approach: Record & Replay

The most promising approach combines AI generation with traditional test execution:

  1. Recording Phase: AI executes the test once and records actions
  2. Code Generation: Actions are translated into stable test code
  3. Replay Phase: Tests run without AI involvement
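  4. Regeneration: AI can re-record the test when errors occur

class HybridTestRecorder {
  // aiAgent decides the next step. The browser helpers used below
  // (isTestComplete, takeScreenshot, executeAction, ...) are assumed to be
  // implemented elsewhere on the class.
  constructor(aiAgent) {
    this.aiAgent = aiAgent;
  }

  async recordTest(testDescription) {
    const actions = [];

    // AI-controlled execution with recording
    while (!this.isTestComplete()) {
      const screenshot = await this.takeScreenshot();
      const action = await this.aiAgent.nextAction(screenshot, testDescription);

      // Store a stable selector instead of the raw element so the
      // recording can be replayed later without the AI.
      actions.push({
        type: action.type,
        selector: this.generateStableSelector(action.element),
        data: action.data,
        screenshot,
      });

      await this.executeAction(action);
    }

    // Translate the recorded actions into a standalone Playwright test
    return this.generatePlaywrightCode(actions);
  }
}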
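
The replay and regeneration phases from the list above can be sketched in the same spirit. `runRecorded` and `reRecord` are hypothetical callbacks standing in for the test runner and the AI recorder, and the single-retry policy is an assumption rather than something the approach prescribes.

```javascript
// Replay a recorded test; fall back to AI re-recording only on failure.
// `runRecorded` and `reRecord` are hypothetical callbacks (assumptions).
async function replayWithRegeneration(test, { runRecorded, reRecord }) {
  try {
    // Fast path: deterministic replay, no AI involved.
    await runRecorded(test.code);
    return { status: 'passed', regenerated: false, code: test.code };
  } catch {
    // Slow path: let the AI re-record from the original description.
    const newCode = await reRecord(test.description);
    try {
      await runRecorded(newCode);
      return { status: 'passed', regenerated: true, code: newCode };
    } catch (error) {
      // Still failing after regeneration: report instead of retrying again.
      return { status: 'failed', regenerated: true, code: newCode, error };
    }
  }
}
```

A test that still fails after regeneration is more likely to point at a genuine regression than at a stale selector, which is why this sketch reports it rather than retrying a second time.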

Why Existing Tools Aren’t Sufficient

Problem with Generic AI Assistants

ChatGPT, Claude & Co. aren’t optimized for browser automation:

Problem with Simple Browser Automation Tools

Tools like Claude Desktop with Playwright MCP only solve surface problems:

The Three Core Technical Problems

1. The Context Problem

The DOM of a modern web app can quickly grow to 2-10 MB. LLMs have limited context windows: even GPT-4 Turbo, with its 128k-token window, cannot take in the complete DOM of a modern single-page application.
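
A back-of-the-envelope check makes the gap concrete. The sketch below assumes roughly four characters per token, a common rule of thumb for English text; HTML markup often tokenizes less efficiently, so real numbers are likely worse. It also treats one byte as one character. The function name and default values are illustrative.

```javascript
// Rough estimate of whether a serialized DOM fits in a context window.
// ASSUMPTION: ~4 characters per token (rule of thumb, not a real tokenizer).
function estimateDomFit(domSizeBytes, contextWindowTokens = 128_000, charsPerToken = 4) {
  const estimatedTokens = Math.ceil(domSizeBytes / charsPerToken);
  return {
    estimatedTokens,
    fits: estimatedTokens <= contextWindowTokens,
    // How many context windows the DOM would fill.
    overflowFactor: estimatedTokens / contextWindowTokens,
  };
}

const twoMegabytes = 2 * 1024 * 1024;
console.log(estimateDomFit(twoMegabytes));
// { estimatedTokens: 524288, fits: false, overflowFactor: 4.096 }
```

Even at the low end of the range, the page would need about four full context windows before the prompt, the screenshot, or the action history are counted.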

Solution Approaches:

2. The Visual-DOM Mapping Problem

The LLM sees a “Search” button in the screenshot, but the DOM only contains:

<button class="btn-primary">
  <svg viewBox="0 0 24 24">
    <path
      d="M15.5 14h-.79l-.28-.27C15.41 12.59 16 11.11 16 9.5 16 5.91 13.09 3 9.5 3S3 5.91 3 9.5 5.91 16 9.5 16c1.61 0 3.09-.59 4.23-1.57l.27.28v.79l5 4.99L20.49 19l-4.99-5zm-6 0C7.01 14 5 11.99 5 9.5S7.01 5 9.5 5 14 7.01 14 9.5 11.99 14 9.5 14z"
    />
  </svg>
</button>

How should the LLM know that this SVG icon is the “Search” button it is looking for?
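
One possible way to bridge the gap, offered here as an assumption and not as the method of any particular tool, is to let the model point at pixel coordinates in the screenshot and resolve those coordinates to an interactive element. The `elements` list and its `{ selector, box }` shape are hypothetical; in a live page, `document.elementFromPoint(x, y)` plays a similar role.

```javascript
// Map a point in the screenshot to the smallest interactive element containing it.
// ASSUMPTION: `elements` is a hypothetical list of { selector, box } records for
// interactive elements only (buttons, links, inputs), with boxes measured in the
// same pixel coordinates as the screenshot.
function elementAtPoint(elements, x, y) {
  const hits = elements.filter(({ box }) =>
    x >= box.x && x < box.x + box.width &&
    y >= box.y && y < box.y + box.height
  );
  if (hits.length === 0) return null;
  // Prefer the smallest enclosing box: usually the innermost control.
  return hits.reduce((best, el) =>
    el.box.width * el.box.height < best.box.width * best.box.height ? el : best
  );
}

const elements = [
  { selector: 'a.header-banner', box: { x: 0, y: 0, width: 1280, height: 64 } },
  { selector: 'button.btn-primary', box: { x: 1200, y: 16, width: 32, height: 32 } },
];
console.log(elementAtPoint(elements, 1210, 30).selector);
// button.btn-primary
```

Because the list contains only interactive elements, a click on the icon resolves to the button itself and not to the nested `svg` or `path`.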

Technical Solution Approaches:

3. The Selector Stability Problem

A reliable E2E test needs stable selectors. The LLM must not only find an element but also generate a robust selector for it.

Selector Hierarchy (from stable to fragile):

  1. data-testid attributes (ideal, but rarely available)
  2. Semantic selectors (role, aria-label)
  3. Relative positioning to known elements
  4. CSS classes and IDs
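  5. XPath with absolute positions (fragile)

// Robust selector algorithm
// findNearestLandmark, getRelativeSelector and generateCSSPath are helpers
// assumed to be defined elsewhere.
function generateSelector(element, dom) {
  // Escape backslashes and quotes so attribute values cannot break the selector
  const quote = (value) => `"${value.replace(/["\\]/g, '\\$&')}"`;

  // 1. Dedicated test hook
  if (element.dataset.testid) {
    return `[data-testid=${quote(element.dataset.testid)}]`;
  }

  // 2. Semantic attribute
  const ariaLabel = element.getAttribute('aria-label');
  if (ariaLabel) {
    return `[aria-label=${quote(ariaLabel)}]`;
  }

  // 3. Fallback: relative positioning (">>" is Playwright's selector chaining)
  const nearbyLandmark = findNearestLandmark(element, dom);
  if (nearbyLandmark) {
    return `${nearbyLandmark.selector} >> ${getRelativeSelector(element)}`;
  }

  // 4. Last resort: CSS path
  return generateCSSPath(element);
}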
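
The same hierarchy can also be applied after the fact, for example to flag fragile selectors in generated tests. The sketch below classifies a selector string into the five tiers of the list using plain string heuristics; the function names are illustrative, and treating every XPath expression as tier 5 is a simplifying assumption.

```javascript
// Classify a selector into the stability tiers of the hierarchy (1 = most stable).
// ASSUMPTION: string heuristics only; tier numbers mirror the order of the list.
function selectorTier(selector) {
  if (/^(xpath=|\/)/.test(selector)) return 5;          // XPath
  if (selector.includes(' >> ')) return 3;               // chained, relative to a landmark
  if (/\[data-testid=/.test(selector)) return 1;         // dedicated test hook
  if (/\[(role|aria-label)=/.test(selector)) return 2;   // semantic attributes
  return 4;                                              // plain CSS classes and IDs
}

// Pick the most stable candidate among several selectors for the same element.
function mostStable(candidates) {
  return [...candidates].sort((a, b) => selectorTier(a) - selectorTier(b))[0];
}

console.log(mostStable(['.btn-primary', '[aria-label="Search"]', '//div[3]/button[1]']));
// [aria-label="Search"]
```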

The Value of Specialized Solutions

A well-thought-out AI test automation solution should provide:

Technical Excellence:

Developer Experience:

Business Value:

Conclusion: Why You Should Suggest a Specialized Solution to Your PO

AI test automation is a fascinating problem that goes far beyond “LLM + Browser.” The technical challenges are real and complex:

As a developer, you understand this complexity. When your product owner asks: “Can’t we just use ChatGPT?”, you can explain why a specialized solution is necessary.

The value for you:

The future of E2E test automation doesn’t lie in generic AI tools, but in specialized systems that thoughtfully solve these technical challenges.
