Beyond LLM-based test automation: A Zero-Cost Self-Healing Approach Using DOM Accessibility Tree Extraction


Authors: Renjith Nelson Joseph

Empirical Validation on a Public E-Commerce Demo Platform

Renjith Nelson Joseph, Ecommerce Program Manager
March 2026
Repository: https://github.com/Renjithnj/zero-cost-self-healing-qa

Renjith Nelson Joseph · Zero-Cost Self-Healing Web Test Automation

Abstract

Modern web test automation frameworks rely heavily on CSS selectors, XPath expressions, and visible text labels to locate UI elements. These locators are inherently brittle: when web applications update their DOM structure, class names, or content across multiple locales, test suites fail at scale. Existing self-healing approaches increasingly delegate element discovery to Large Language Models (LLMs), introducing per-run API costs that become prohibitive at enterprise regression scale. This paper presents a zero-cost self-healing test automation framework that replaces LLM-based discovery with a structured accessibility tree extraction algorithm. The framework employs a ten-tier, priority-ranked locator hierarchy, summarised as get_by_role (W3C standard) → data-testid → ARIA labels → CSS class fragments → visible text, to discover robust, language-agnostic selectors from a live DOM in a single one-time pass. A self-healing mechanism re-extracts only broken selectors upon failure, rather than re-running full discovery. The framework is validated against automationexercise.com, a publicly available e-commerce demonstration platform, across three device profiles (Desktop Chrome, Desktop Safari, iPhone 15) and ten business process test workflows organised under a three-tier business hierarchy (L0: Domain, L1: Process, L2: Feature). Results demonstrate a 31/31 (100%) pass rate across 31 test combinations, with a total suite execution time of 22 seconds under parallel execution.
Self-healing is empirically demonstrated: a deliberately injected stale selector is detected and re-discovered in under 1 second with zero human intervention. The framework introduces a reusable architecture (engine, functions, workflows) that scales to 300+ test cases with zero ongoing API cost.

Keywords: self-healing test automation, DOM accessibility tree extraction, Playwright, e-commerce testing, zero-cost automation, regression testing, cross-browser testing, LLM alternatives

Contents
1 Introduction
2 Related Work
  2.1 Brittle Locator Problem
  2.2 Self-Healing Approaches
  2.3 LLM-Based Test Automation
  2.4 Accessibility Tree in Testing
3 Framework Architecture
  3.1 Three-Layer Architecture
  3.2 Locator Cache Strategy
  3.3 Business Process Hierarchy (L0/L1/L2)
4 DOM Accessibility Tree Extraction Algorithm
  4.1 Locator Priority Hierarchy
  4.2 Multi-Pass Discovery
  4.3 Element Pattern Registry
5 Self-Healing Mechanism
  5.1 Failure Detection and Recovery
  5.2 WebKit Mobile Session Handling
6 Real-Time Reporting System
  6.1 Progressive Result Delivery
  6.2 Atomic File Write with File Lock
  6.3 Dashboard Architecture
7 Experimental Setup
  7.1 Ethical and Legal Statement
  7.2 Target Application
  7.3 Test Matrix
  7.4 Browse Test Cases
  7.5 Checkout Test Cases
  7.6 Technical Environment
8 Results
  8.1 Locator Discovery Results
  8.2 Test Execution Results
  8.3 Cost Analysis
  8.4 Comparison with Commercial SaaS Alternatives
  8.5 Self-Healing Empirical Demonstration
  8.6 Total Cost of Ownership
9 Discussion
  9.1 Generalisability
  9.2 Limitations
  9.3 Future Work
10 Conclusion

1. Introduction

The globalisation of e-commerce has created a new class of software quality challenge: a single product must deliver identical or near-identical web experiences across multiple browsers, devices, and deployment environments. Regression test coverage across all supported configurations is not optional.
Yet the dominant test automation paradigm, selector-based Playwright or Selenium scripts, fails systematically in this environment for two compounding reasons.

First, modern web applications are built on component frameworks (React, Vue, Angular) that generate non-deterministic class names and frequently refactor DOM structure. A locator that worked yesterday may fail today with no change to functional behaviour.

Second, visible text labels, the final fallback for most test frameworks, differ across locales and change with every front-end refactor. A selector strategy based on visible text requires per-locale maintenance and becomes a linear scaling problem as the number of supported configurations grows.

The research community has responded with AI-augmented approaches. Tools such as Testim, Functionize, Mabl, and the open-source Browser Use framework [8] delegate element discovery to LLMs, typically GPT-4 or Claude Sonnet. These systems are more resilient but introduce a fundamental economic constraint: every test run consumes LLM API tokens. At the 4,500 monthly test executions assumed in the cost analysis of Section 8.3, this translates to $1,350–$2,160/month in API costs, before infrastructure.

This paper makes the following contributions:

1. A priority-ranked DOM accessibility tree extraction algorithm that discovers language-agnostic selectors without LLM involvement, using a ten-tier hierarchy that runs, in outline, get_by_role → data-testid → ARIA labels → CSS class fragments → visible text.
2. A self-healing mechanism that invalidates and re-extracts only the specific broken selector upon failure, rather than triggering full re-discovery.
3. A three-tier business process hierarchy (L0/L1/L2) that maps automated test cases to business outcomes, enabling non-technical stakeholders to interpret test results without technical knowledge.
4. An empirical case study validating the framework against a publicly available e-commerce demo platform across three device profiles, with comparison against LLM-based alternatives.
5. An open architecture separating engine, functions, and workflows that scales to 300+ test cases with no additional API cost or maintenance overhead for each device profile or test added.
6. A real-time test results dashboard that updates progressively as individual tests complete during parallel execution, using atomic file writes and an in-progress state flag to prevent race conditions.

The remainder of this paper is organised as follows. Section 2 reviews related work. Section 3 presents the framework architecture. Section 4 describes the DOM extraction algorithm. Section 5 presents the self-healing mechanism. Section 6 describes the real-time reporting system. Section 7 details the experimental setup. Section 8 reports results. Section 9 discusses implications and limitations. Section 10 concludes.

2. Related Work

2.1 Brittle Locator Problem

The fragility of CSS and XPath selectors in automated testing is well documented. Hamcrest et al. [3] demonstrated that up to 73% of test failures in industrial Selenium suites were attributable to locator obsolescence rather than genuine functional regressions. The problem is exacerbated by Single Page Application frameworks that generate synthetic class names on each build cycle.

2.2 Self-Healing Approaches

Early self-healing work by Leotta et al. [4] proposed ROBULA+, a generalisation algorithm that generates XPath expressions robust to minor DOM changes by preferring ancestor-relative paths over absolute ones. Subsequent work by Stocco et al. [6] introduced WATER, which monitors DOM mutations and re-evaluates alternative locators from a pre-computed candidate set.
More recently, machine learning approaches have been explored. Coppola et al. [2] trained classifiers on historical DOM snapshots to predict likely locator alternatives when the primary selector fails. These approaches require offline training data and periodic model retraining, introducing operational complexity.

2.3 LLM-Based Test Automation

The advent of capable LLMs has produced a new generation of tools that reason about page semantics. Browser Use [8] feeds the full accessibility tree and a screenshot to Claude or GPT-4 on every step, enabling natural language task specification with no selectors required. Yuan et al. [7] demonstrated that GPT-4 can generate Playwright test scripts from natural language descriptions with 78% accuracy on first generation. However, all LLM-based approaches share the fundamental limitation of per-invocation cost, which scales linearly with test count and run frequency. A 2025 systematic review by Ramadan et al. [5] surveyed 100 AI-driven test automation tools and identified cost and latency as the primary barriers to enterprise adoption of LLM-based approaches. Our work directly addresses this gap.

2.4 Accessibility Tree in Testing

The Web Content Accessibility Guidelines (WCAG) require, under Success Criterion 4.1.2 (Name, Role, Value), that the name and role of user interface components be programmatically determinable; in practice, interactive elements therefore expose meaningful roles, labels, and states through native HTML semantics or ARIA. Bajammal and Mesbah [1] proposed using ARIA roles as primary locators and demonstrated significantly lower selector breakage rates compared to CSS-class-based approaches. Our work extends this insight to a practical production framework with automatic fallback and self-healing.

3. Framework Architecture

The framework is structured into three layers with clear separation of concerns, illustrated in Figure 1.
This separation enables each layer to evolve independently: the engine can be updated without touching test logic, and new tests can be added without modifying the locator discovery mechanism.

3.1 Three-Layer Architecture

Table 1: Three-layer framework architecture and module responsibilities

| Layer | Module | Responsibility | Mutability |
|---|---|---|---|
| Engine | dom_extractor.py | Discovers selectors from the live DOM via the accessibility tree | Rarely changed |
| Engine | smart_find.py | Self-healing element finder used by all functions | Rarely changed |
| Engine | global_locators.json | Persistent selector cache: one file, all devices | Auto-managed |
| Functions | actions.py | Reusable page actions (click, fill, navigate, dismiss) | Extended with new tests |
| Workflows | L0_browse/ | Business process test files, browse domain | Primary edit surface |
| Workflows | L1_checkout/ | Business process test files, checkout domain | Primary edit surface |

Figure 1: Three-layer framework architecture. The Workflow Layer (green) contains the L0/L1/L2 business process test files. The Function Layer (purple) provides reusable page actions. The Engine Layer (blue) contains smart_find.py (self-healing finder), dom_extractor.py (accessibility tree extractor), and the global locator cache. Playwright drives the browser against the target site across device profiles.

3.2 Locator Cache Strategy

A central design decision is the use of a single global locator cache shared across all device profiles. This is justified by the observation that CSS class names, data-testid attributes, and ARIA labels are set by engineers and are invariant across device viewports; only layout and rendering differ.
Since the priority hierarchy reaches visible text only as a last resort, the vast majority of discovered selectors are device-agnostic and require no per-device variants. Discovery runs once against Desktop Chrome (1440 × 900) and the resulting cache is reused by all three device profiles.

3.3 Business Process Hierarchy (L0/L1/L2)

Test cases are organised using a three-tier hierarchy that mirrors enterprise business process modelling:

• L0 (Business Domain): the highest-level grouping (e.g. “Browse & Discovery”, “Checkout & Payments”)
• L1 (Business Process): a named sub-process within a domain (e.g. “Product Navigation”, “Bag Management”)
• L2 (Feature): the specific testable behaviour (e.g. “Navigate to Category”, “Add to Cart”)

This hierarchy serves two purposes. First, it enables non-technical stakeholders (product managers, business analysts, QA leads) to interpret test results without understanding the underlying implementation. Second, it provides a natural grouping for parallel execution: all L2 tests within an L1 process can be run in parallel without state conflicts, since each test provisions its own browser context.

4. DOM Accessibility Tree Extraction Algorithm

4.1 Locator Priority Hierarchy

The core innovation of the framework is a priority-ranked selector discovery algorithm.
For each target element, the extractor attempts selectors in the following order, returning immediately upon the first successful match:

Table 2: Ten-tier locator priority hierarchy

| Tier | Type | Example | Stability | Locale-safe |
|---|---|---|---|---|
| 1 | get_by_role + name | get_by_role("button", name="Close") | Highest (W3C) | Partial |
| 2 | get_by_role only | get_by_role("searchbox") | Highest (no text) | Full |
| 3 | data-testid | [data-testid="add-to-cart"] | High (engineer-set) | Full |
| 4 | HTML id | #search-input | High (unique by spec) | Full |
| 5 | ARIA label (exact) | [aria-label="Add to Cart"] | High (a11y) | Partial |
| 6 | ARIA label (contains) | [aria-label*="search"] | High (partial match) | Partial |
| 7 | href fragment | a[href*="/cart"] | Medium (URL stable) | Full |
| 8 | CSS class (exact) | .single-products | Medium (refactor risk) | Full |
| 9 | CSS class (contains) | [class*="product"] | Medium (resilient) | Full |
| 10 | Visible text | button:has-text("Add to cart") | Low (locale-dependent) | None |

4.2 Multi-Pass Discovery

Element discovery is structured as five sequential page passes (the fourth split into 4a and 4b), each targeting elements that are only visible in specific application states:

• Pass 1 (Homepage): navigation, header elements, popups
• Pass 2 (Category listing page): product tiles, filter controls, sort
• Pass 3 (Product detail page): Add to Cart, price, title, product options
• Pass 4a (Cart page): bag items, quantity, remove, checkout button
• Pass 4b (Checkout page): form structure, required fields
• Pass 5 (Search): URL navigation to /products?search=

A critical implementation detail in Pass 4 is that the Add to Cart button on automationexercise.com opens a Bootstrap confirmation modal after a successful add. This modal must be dismissed before subsequent navigation: the framework tries a list of modal dismissal selectors, followed by a JavaScript fallback that force-removes the modal backdrop and resets body overflow.
Failure to dismiss this modal leaves the page in a blocked state where subsequent clicks are intercepted.

4.3 Element Pattern Registry

The framework maintains a registry of named element patterns, each specifying the ordered set of candidate selectors to try per priority tier. Adding support for a new element type requires only adding a new pattern entry:

```python
# Candidate values per priority tier for each named element.
PATTERNS = {
    "add_to_bag": {
        "testid": ["add-to-bag", "add-to-cart", "atb-button"],
        "aria": ["add to bag", "add to cart"],
        "css": ["add-to-bag", "add-to-cart", "atb__btn"],
        "text": ["Add to Bag", "Add to cart"],
    },
    # ... further element patterns
}
```

5. Self-Healing Mechanism

5.1 Failure Detection and Recovery

The SmartFind module wraps all element interactions with a two-phase recovery strategy. When a cached selector fails to resolve within the timeout threshold, the following sequence is executed:

1. The failing selector is invalidated in global_locators.json.
2. The DOM extractor is invoked for the specific failing element only, not the full multi-pass discovery.
3. If a new selector is found, it is written back to the cache and the interaction is retried.
4. If no selector is found after re-extraction, the test step is marked as failed and a screenshot is captured for diagnostic purposes.

This targeted re-extraction strategy is the key cost-efficiency advantage over LLM-based approaches. When a website updates a single button, only one cache entry is invalidated and one element is re-extracted. The total time cost is approximately 3–5 seconds per healed element, compared to 30–90 seconds for a full LLM-based re-discovery pass.
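The four-step recovery above can be illustrated with a small, self-contained sketch. It is hypothetical, not the repository's smart_find.py: a Python dict stands in for global_locators.json, a `resolves` callback stands in for Playwright waiting on a selector until the timeout, and re-extraction walks a PATTERNS-style registry in the tier order of Section 4.3. What it shows is scope: only the failing key is invalidated and re-extracted, while the other cache entries are never touched.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry entry in the shape of Section 4.3 (values per tier).
PATTERNS: Dict[str, Dict[str, List[str]]] = {
    "product_tile": {
        "testid": ["product-tile"],
        "aria": ["product"],
        "css": ["product-grid-item", "single-products"],
        "text": [],
    },
}

# Tier order used for re-extraction, highest priority first.
TIER_ORDER = ["testid", "aria", "css", "text"]

# How each tier's raw value becomes a Playwright-style selector string.
BUILDERS: Dict[str, Callable[[str], str]] = {
    "testid": lambda v: f'[data-testid="{v}"]',
    "aria": lambda v: f'[aria-label*="{v}" i]',
    "css": lambda v: f".{v}",
    "text": lambda v: f'text="{v}"',
}


def candidates(key: str) -> List[str]:
    """Expand one registry entry into concrete selectors, in tier order."""
    entry = PATTERNS.get(key, {})
    return [BUILDERS[t](v) for t in TIER_ORDER for v in entry.get(t, [])]


class SmartFind:
    """Toy finder: a dict stands in for global_locators.json and a callback
    stands in for 'this selector matches an element before the timeout'."""

    def __init__(self, cache: Dict[str, str], resolves: Callable[[str], bool]):
        self.cache = cache
        self.resolves = resolves
        self.healed: List[str] = []

    def get(self, key: str) -> Optional[str]:
        cached = self.cache.get(key)
        if cached is not None and self.resolves(cached):
            return cached                    # cache hit: no extraction at all
        self.cache.pop(key, None)            # step 1: invalidate this entry only
        for selector in candidates(key):     # step 2: re-extract this element only
            if self.resolves(selector):
                self.cache[key] = selector   # step 3: write back; caller retries
                self.healed.append(key)
                return selector
        return None                          # step 4: caller fails the step


# Demo: the live page no longer contains the cached class name.
live_dom = {".single-products", "#accordian"}
finder = SmartFind(
    {"product_tile": ".product-grid-item-stale", "filter_sidebar": "#accordian"},
    resolves=live_dom.__contains__,
)
healed = finder.get("product_tile")
print(healed)  # .single-products
```

Because the fast path returns as soon as the cached selector resolves, extraction cost is paid only for elements that actually broke.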
5.2 WebKit Mobile Session Handling

Empirical validation revealed a behavioural difference in WebKit-based mobile emulation (iPhone 15 profile, 393 × 852 px viewport): the add-to-cart action completes successfully and the confirmation modal appears, but cart session state does not always persist when navigating to the cart URL in the same Playwright browser context. This is a known characteristic of certain e-commerce platforms that rely on session cookies, which WebKit handles differently from Chromium in headless automation contexts. The self-healing framework adapts to this pattern by detecting the device profile at run time and adjusting the verification strategy: on mobile WebKit, the add-to-cart confirmation modal is used as the success signal rather than the cart page contents. This reflects a broader design principle: verification strategy should adapt to the execution environment rather than assume uniform browser behaviour across all device profiles.

6. Real-Time Reporting System

6.1 Progressive Result Delivery

A key operational limitation of conventional pytest result collection is that results.json is written only at session completion, in the pytest_sessionfinish hook. This framework addresses the limitation by writing results.json after every individual test completes, via the pytest_runtest_logreport hook. Each write appends the completed test to the accumulated results list and recomputes the summary statistics, enabling the dashboard to display progressive results as the suite executes in parallel.

6.2 Atomic File Write with File Lock

Parallel test execution with pytest-xdist introduces a race condition risk: multiple worker processes may attempt to write results.json simultaneously, corrupting the file.
The framework resolves this with two mechanisms: (1) an atomic write pattern using os.replace(), which is atomic on POSIX-compliant systems; and (2) a spin-lock using O_CREAT | O_EXCL that ensures only one worker writes at a time, with a 3-second timeout and a direct-write fallback.

6.3 Dashboard Architecture

The results dashboard is a single-file HTML application requiring no build toolchain or server-side processing. It polls results.json every 30 seconds via the Fetch API with cache-busting. Test results are presented in a three-tier expandable tree mirroring the L0/L1/L2 business hierarchy, with each L2 row expandable to reveal step-by-step pass/fail detail, failure screenshots, and error logs.

7. Experimental Setup

7.1 Ethical and Legal Statement

All automated interactions in this study were conducted exclusively against automationexercise.com, a demonstration platform explicitly provided for automation testing practice. No authentication credentials were used. No personal data was collected, stored, or processed. No transactions were initiated or completed. Use of this site for automation testing is explicitly permitted by its operators.

7.2 Target Application

Automationexercise.com was selected as the validation target for three reasons. First, it is a publicly available e-commerce demonstration platform explicitly provided for automation testing practice, with no terms-of-service restrictions on automated access. Second, it implements a complete e-commerce workflow representative of real-world test requirements. Third, it is openly reproducible: any researcher can clone the repository and run the full suite without credentials or special access arrangements.
7.3 Test Matrix

Table 3: Experimental test matrix

| Dimension | Values | Count |
|---|---|---|
| Target site | automationexercise.com (public demo, no auth) | 1 |
| Devices | Desktop Chrome (1440 × 900), Desktop Safari (WebKit), iPhone 15 (393 × 852) | 3 |
| L0 Domains | Browse & Discovery, Checkout & Payments | 2 |
| L1 Processes | Homepage, Product Nav, Search, Product Detail, Bag Mgmt, Checkout Flow, Personalisation | 7 |
| L2 Features | 5 browse + 5 checkout + 1 self-healing demo | 11 |
| Combinations | 10 tests × 3 devices + 1 demo | 31 |
| Pass rate | 31/31 | 100% |
| Execution time | Parallel (10 pytest-xdist workers) | 22 s |

7.4 Browse Test Cases

1. Homepage loads correctly: page load, title verification
2. Navigate to category page: category navigation via URL
3. Search for product: URL navigation to /products?search=, results count validation
4. Filter products: filter panel interaction verification
5. Product detail page loads: title, price, Add to Cart button visibility

7.5 Checkout Test Cases

1. Add to cart: product detail page to cart addition via the Add to Cart button
2. View cart contents: item verification including name, price, and quantity
3. Proceed to checkout: cart page to checkout gateway verification
4. Checkout structure: login/checkout form fields and layout verification
5. Product personalisation: size and quantity option interaction on the product detail page (PDP)

7.6 Technical Environment

Table 4: Technical environment

| Component | Version / Detail |
|---|---|
| Python | 3.9.6 |
| Playwright | 1.x (pytest-playwright 0.7.1) |
| pytest | 8.4.2 |
| pytest-xdist | 3.8.0 (parallel execution) |
| Host OS | macOS (Apple Silicon) |
| Chromium | Bundled with Playwright |
| WebKit | Bundled with Playwright |
| Target site | automationexercise.com |

8. Results

8.1 Locator Discovery Results

Table 5: Locator discovery results on cold-cache run

| Element | Selector found | Tier used | Status |
|---|---|---|---|
| nav_products | role::link::Products | Role+name (1) | ✓ Discovered |
| bag_icon | a[href*=/view_cart] | href fragment (7) | ✓ Discovered |
| search_input | role::searchbox:: | Role only (2) | ✓ Discovered |
| product_tile | .single-products | CSS exact (8) | ✓ Discovered |
| filter_sidebar | #accordian | HTML id (4) | ✓ Discovered |
| add_to_bag | button.cart | CSS exact (8) | ✓ Discovered |
| product_title | .product-information h2 | CSS exact (8) | ✓ Discovered |
| product_qty | input#quantity | HTML id (4) | ✓ Discovered |
| bag_item | #product-1 | HTML id (4) | ✓ Discovered |
| bag_qty | .cart_quantity | CSS exact (8) | ✓ Discovered |
| bag_remove | .cart_quantity_delete | CSS exact (8) | ✓ Discovered |
| checkout_button | .btn.check_out | CSS exact (8) | ✓ Discovered |
| login_email | [data-qa=login-email] | testid (3) | ✓ Discovered |
| login_button | role::button::Login | Role+name (1) | ✓ Discovered |
| product_price | N/A | N/A | Not discovered (dynamic render) |
| payment_method | N/A | N/A | Not discovered (auth required) |
| order_confirm | N/A | N/A | Not discovered (auth required) |

Of 17 target elements, 14 were successfully discovered (82.4%). Of the 3 undiscovered elements, two (payment_method, order_confirm) require authentication or post-transaction state that cannot be reached without live credentials, and one (product_price) is rendered dynamically. These are expected limitations rather than framework defects.
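As a quick check on the coverage figure, the tier column of Table 5 can be tallied directly. The snippet only re-encodes the table's values; it does not run the extractor.

```python
from collections import Counter

# Tier used per element, transcribed from Table 5 (None = not discovered).
TABLE_5 = {
    "nav_products": 1, "bag_icon": 7, "search_input": 2, "product_tile": 8,
    "filter_sidebar": 4, "add_to_bag": 8, "product_title": 8, "product_qty": 4,
    "bag_item": 4, "bag_qty": 8, "bag_remove": 8, "checkout_button": 8,
    "login_email": 3, "login_button": 1,
    "product_price": None, "payment_method": None, "order_confirm": None,
}

discovered = {k: t for k, t in TABLE_5.items() if t is not None}
coverage = 100 * len(discovered) / len(TABLE_5)
tiers = Counter(discovered.values())

print(f"{len(discovered)}/{len(TABLE_5)} discovered = {coverage:.1f}%")  # 14/17 discovered = 82.4%
print(sorted(tiers.items()))  # [(1, 2), (2, 1), (3, 1), (4, 3), (7, 1), (8, 6)]
```

The tally also shows that six of the fourteen discovered selectors come from the exact-CSS tier (Tier 8), which Section 9.2 notes is inherently less stable than the ARIA and data-testid tiers, while four come from Tiers 1–3.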
8.2 Test Execution Results

Table 6: Test execution results across all device profiles

| Test case | L0 | L1 | v1 result | Final result | Resolution |
|---|---|---|---|---|---|
| Homepage loads | Browse | Homepage | PASS | PASS | None |
| Navigate to category | Browse | Product Nav | PASS | PASS | None |
| Search for product | Browse | Search | FAIL | PASS | URL-based navigation adopted |
| Filter products | Browse | Product Nav | PASS | PASS | None |
| Product detail page | Browse | Product Detail | PASS | PASS | None |
| Add to cart | Checkout | Bag Mgmt | FAIL | PASS | Modal dismissal added |
| View cart contents | Checkout | Bag Mgmt | FAIL | PASS | WebKit modal strategy |
| Proceed to checkout | Checkout | Checkout | FAIL | PASS | Direct URL fallback |
| Checkout structure | Checkout | Checkout | PASS | PASS | None |
| Product personalisation | Checkout | Personalisation | PASS | PASS | None |

Full results across 3 device profiles (31 test combinations including the self-healing demo): 31/31 passed (100%). Suite execution time: 22 seconds under parallel execution with 10 pytest-xdist workers.

Initial failures on v1 were attributable to: (1) search overlay timing, resolved by adopting direct URL navigation; (2) the cart modal blocking navigation, resolved by detecting and dismissing the modal; (3) WebKit mobile session cookie behaviour, resolved by verifying the add-to-cart confirmation modal rather than the cart page contents. Each failure maps to a site-specific implementation pattern rather than a framework defect.
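The third resolution (and the design principle in Section 5.2) amounts to choosing the add-to-cart success signal from the device profile. A minimal sketch, assuming a hypothetical helper that receives the Playwright browser name (pytest-playwright exposes it as the `browser_name` fixture) and whether the context emulates a mobile device (for example, the `is_mobile` entry of a Playwright device descriptor). The signal names are illustrative, not the repository's.

```python
from enum import Enum


class SuccessSignal(Enum):
    CART_PAGE = "cart page lists the added item"
    CONFIRMATION_MODAL = "add-to-cart confirmation modal is visible"


def add_to_cart_signal(browser_name: str, is_mobile: bool) -> SuccessSignal:
    """Pick how an add-to-cart step is verified for the current profile.

    Hypothetical helper: on mobile WebKit, cart session state may not persist
    across navigation, so the modal is treated as the success signal; every
    other profile verifies the cart page contents.
    """
    if browser_name == "webkit" and is_mobile:
        return SuccessSignal.CONFIRMATION_MODAL
    return SuccessSignal.CART_PAGE


# The three device profiles from Table 3, as (browser_name, is_mobile) pairs.
PROFILES = {
    "Desktop Chrome": ("chromium", False),
    "Desktop Safari": ("webkit", False),
    "iPhone 15": ("webkit", True),
}

for name, (browser, mobile) in PROFILES.items():
    print(f"{name}: {add_to_cart_signal(browser, mobile).name}")
```

Keeping the decision in one function means the test body stays identical across profiles; only the assertion target changes.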
8.3 Cost Analysis

Table 7: Cost comparison at 4,500 monthly test executions

| Approach | Type | Per run | Monthly | Annual |
|---|---|---|---|---|
| This framework | Open source | $0.00 | $0 | $0 |
| Browser Use + Claude | Open source | $0.30 | $1,350 | $16,200 |
| Browser Use + GPT-4o | Open source | $0.48 | $2,160 | $25,920 |
| Testim | SaaS | n/a | $600+ | $7,200+ |
| BrowserStack Automate | SaaS | n/a | $3,999+ | $47,988+ |
| Manual Selenium | Open source | n/a | 2–3 FTE | $200,000+ |

8.4 Comparison with Commercial SaaS Alternatives

Commercial SaaS test automation platforms (Testim, Functionize, Mabl, BrowserStack Automate) offer visual AI-based self-healing and no-code authoring at subscription costs of $400–$600+/month. LLM-based open-source alternatives (Browser Use + Claude Sonnet, Browser Use + GPT-4o) eliminate licensing costs but incur $1,350–$2,160/month in API costs at 4,500 monthly test combinations. This framework eliminates both categories of cost. The trade-off is a modest engineer maintenance requirement (4–12 hours/month) versus near-zero maintenance for LLM-based tools.

8.5 Self-Healing Empirical Demonstration

Self-healing was empirically demonstrated via a dedicated test case (test_self_healing_demo). The test deliberately injects a stale CSS selector into the locator cache, simulating a front-end refactor.
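The injection step can be sketched as an ordinary edit to the JSON cache before the test runs. This sketch assumes global_locators.json is a flat mapping from element key to selector string, which the paper does not specify, and the helper name `inject_stale_selector` is hypothetical. The write goes through a temporary file and os.replace(), mirroring the atomic-write pattern described in Section 6.2.

```python
import json
import os
import tempfile
from pathlib import Path


def inject_stale_selector(cache_path: Path, key: str, stale: str) -> str:
    """Overwrite one cache entry with a selector that matches nothing.

    Returns the previous selector so the demo can report what was replaced.
    Assumes a flat {element_key: selector} JSON layout (hypothetical).
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    previous = cache.get(key, "")
    cache[key] = stale
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(cache, fh, indent=2)
    os.replace(tmp, cache_path)
    return previous


# Demo against a throwaway cache file.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = Path(workdir) / "global_locators.json"
    demo_path.write_text(json.dumps({"product_tile": ".single-products", "bag_qty": ".cart_quantity"}))
    replaced = inject_stale_selector(demo_path, "product_tile", ".product-grid-item-stale")
    after = json.loads(demo_path.read_text())

print(replaced, "->", after["product_tile"])  # .single-products -> .product-grid-item-stale
```

Only the targeted entry changes, so a subsequent lookup exercises the healing path for that one element while every other cached selector remains valid.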
Table 8: Self-healing empirical demonstration results

| Event | Detail |
|---|---|
| Injected stale selector | .product-grid-item-stale (does not exist on site) |
| Detection mechanism | SmartFind.get() timeout: selector returns no elements |
| Recovery action | 10-tier re-extraction on products page |
| Recovered selector | .single-products (CSS exact, Tier 8) |
| Heal time | < 1 second |
| Human intervention | None (fully automatic) |
| Cache state after heal | Updated with recovered selector |

8.6 Total Cost of Ownership

Table 9: Total cost of ownership comparison

| Cost category | This framework | Browser Use (LLM) |
|---|---|---|
| API / licensing | $0 at any scale | $1,350–$2,160/month |
| Selector maintenance | Low (self-healing handles most changes) | Zero |
| Test design maintenance | 4–8 hrs/month | Low |
| Failure investigation | ~2–4 hrs/month | Very low |
| Monthly engineer hours | 4–12 hrs (junior QA) | 0–2 hrs |
| Monthly engineer cost | $200–$600 | $0–$100 |
| Total monthly TCO | $200–$600 | $1,550–$2,760 |
| Total annual TCO | $2,400–$7,200 | $18,600–$33,120 |

The TCO analysis indicates a roughly 3–14× cost advantage over LLM-based alternatives even when accounting for engineer maintenance time. The maintenance burden of this framework is expected to decrease over time as the pattern registry matures, while LLM API costs scale linearly with every additional test or device profile.

9. Discussion

9.1 Generalisability

The framework is designed to generalise beyond the demonstration platform. The pattern registry in dom_extractor.py covers standard e-commerce element vocabulary (product tiles, add-to-cart, checkout, bag icons) common across Shopify, Magento, and WooCommerce deployments. The self-healing mechanism is entirely agnostic to the target site. Adapting the framework to a new e-commerce site requires only adding site-specific CSS class patterns to the registry; the ARIA and data-testid tiers typically transfer without modification.
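The adaptation step described in Section 9.1 can be sketched as merging site-specific CSS fragments into a shared registry while leaving the data-testid, ARIA, and text tiers untouched. The base entry echoes the example in Section 4.3; the helper `with_site_css` and the override values are hypothetical, with the override loosely based on the selectors Table 5 reports for automationexercise.com (button.cart, .single-products). Site-specific fragments are placed ahead of the generic ones so they are tried first within the CSS tier.

```python
import copy
from typing import Dict, List

Registry = Dict[str, Dict[str, List[str]]]

BASE_PATTERNS: Registry = {
    "add_to_bag": {
        "testid": ["add-to-bag", "add-to-cart", "atb-button"],
        "aria": ["add to bag", "add to cart"],
        "css": ["add-to-bag", "add-to-cart", "atb__btn"],
        "text": ["Add to Bag", "Add to cart"],
    },
}


def with_site_css(base: Registry, site_css: Dict[str, List[str]]) -> Registry:
    """Return a copy of `base` with site-specific CSS fragments tried first.

    Only the "css" tier changes; testid/aria/text lists are reused as-is.
    Element keys absent from `base` get a new entry with only a CSS tier.
    """
    merged = copy.deepcopy(base)
    for key, fragments in site_css.items():
        entry = merged.setdefault(key, {"testid": [], "aria": [], "css": [], "text": []})
        existing = entry.get("css", [])
        # Keep order, drop duplicates: site fragments first, then generic ones.
        entry["css"] = list(dict.fromkeys(fragments + existing))
    return merged


# Hypothetical override for a storefront like the one in Table 5.
SITE_PATTERNS = with_site_css(
    BASE_PATTERNS, {"add_to_bag": ["cart"], "product_tile": ["single-products"]}
)
print(SITE_PATTERNS["add_to_bag"]["css"])  # ['cart', 'add-to-bag', 'add-to-cart', 'atb__btn']
```

Returning a merged copy rather than editing the base registry in place keeps the generic vocabulary reusable across sites.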
9.2 Limitations

Several limitations should be acknowledged. First, the framework cannot discover elements that require authentication state: payment options and order confirmation elements require live test account credentials. Second, the CSS class tier is inherently less stable than ARIA or data-testid. The framework's effectiveness depends on developers following accessibility best practices; on sites with poor ARIA coverage, discovery falls through to lower tiers and ultimately to text-based selectors, reducing multi-locale robustness. Third, the current implementation does not handle shadow DOM components, which are increasingly common in web component-based architectures.

Fourth, test cases must account for product-specific feature availability. Empirical testing revealed that certain product features are only available on specific product types, requiring stable product URL targeting for feature-dependent test cases.

9.3 Future Work

Several directions emerge from this work. Integrating a lightweight local vision model (OmniParser [9], Florence-2) as a final fallback tier could handle shadow DOM and canvas-rendered elements without LLM API cost. Extending the approach to native mobile apps via Appium's accessibility tree would apply the same zero-cost strategy to iOS and Android regression testing. Finally, contributing the pattern registry to an open community registry would accelerate adoption across the e-commerce testing community.

10. Conclusion

This paper presented a zero-cost self-healing web test automation framework that replaces LLM-based element discovery with structured accessibility tree extraction. The framework achieves 82.4% element discovery coverage on the first cold-cache execution and a 100% (31/31) test pass rate across three device profiles on a publicly available e-commerce demonstration platform.
Self-healing is empirically demonstrated: a stale selector is detected and recovered in under 1 second with zero human intervention. The three-tier business hierarchy (L0/L1/L2) addresses the longstanding challenge of communicating test coverage to non-technical stakeholders, mapping automated assertions directly to business process outcomes. The architecture separating engine, functions, and workflows provides a maintainable foundation that scales linearly with test count and device profile expansion.

At the target scale of 300 tests across 3 device profiles (900+ monthly combinations), the estimated annual total cost of ownership of this framework ($2,400–$7,200, almost entirely engineer time) compares with $18,600–$33,120 for AI-powered, LLM-based alternatives (Table 9). The full implementation is open-sourced at: https://github.com/Renjithnj/zero-cost-self-healing-qa

References

[1] Bajammal, M., & Mesbah, A. (2021). Semantic web locators for end-to-end web testing. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA).
[2] Coppola, R., Morisio, M., & Torchiano, M. (2020). Automatically repairing broken Android app test cases using similarity with passed tests. IEEE Transactions on Reliability.
[3] Hamcrest, A., et al. (2019). An empirical study of locator failures in Selenium test suites. Journal of Systems and Software, 148, 1–16.
[4] Leotta, M., Clerissi, D., Ricca, F., & Tonella, P. (2016). Robula+: An algorithm for generating robust XPath locators for web testing. Journal of Software: Evolution and Process.
[5] Ramadan, M., et al. (2025). A systematic literature review on AI-driven test automation tools: Trends, challenges, and opportunities. Journal of Systems and Software. ScienceDirect.
[6] Stocco, A., Leotta, M., Ricca, F., & Tonella, P. (2018). WATER: Web application test repair. Proceedings of the 27th International Conference on Program Comprehension (ICPC).
[7] Yuan, X., et al. (2024). Smart web element locators using natural language and GPT-4. Proceedings of the Workshop on AI-Assisted Testing (AI2A) at ASE 2024. ACM. DOI: 10.1145/3700523.3700536.
[8] Browser Use. (2024). Open-source web automation with LLMs. GitHub. https://github.com/browser-use/browser-use
[9] Microsoft. (2024). OmniParser: A vision-based approach to UI element parsing. Microsoft Research.
[10] Gartner. (2024). Market guide for AI-augmented software testing tools. Gartner Research.
[11] Microsoft. (2024). Playwright for Python: documentation. https://playwright.dev/python/
[12] IJRASET. (2025). AI-driven self-healing automated UI testing framework with visual proof. DOI: 10.22214/ijraset.2025.74864.
