Beyond LLM-based test automation: A Zero-Cost Self-Healing Approach Using DOM Accessibility Tree Extraction

Bey ond LLM-Based T est Automation: A Zero-Cost Self-Healing Approac h Using DOM A ccessibilit y T ree Extraction Empiric al V alidation on a Public E-Commer c e Demo Platform Renjith Nelson Joseph Ecommerce Program Manager Marc h 2026 Rep ository: https://github.com/Renjithnj/zero- cost- self- healing- qa R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation Abstract Mo dern w eb test automation framew orks rely hea vily on CSS selectors, XP ath expres- sions, and visible text lab els to lo cate UI elements. These lo cators are inheren tly brittle — when w eb applications up date their DOM structure, class names, or conten t across m ultiple lo cales, test suites fail at scale. Existing self-healing approaches increasingly delegate elemen t discov ery to Large Language Mo dels (LLMs), in tro ducing p er-run API costs that b ecome prohibitiv e at enterprise regression scale. This pap er presen ts a zero-cost self-healing test automation framework that replaces LLM-based disco very with a structured accessibility tree extraction algorithm. The framew ork emplo ys a ten-tier priorit y-ranked lo cator hierarch y — get_by_role (W3C standard) → data-testid → ARIA lab els → CSS class fragments → visible text — to discov er robust, language-agnostic selectors from a live DOM in a single one-time pass. A self-healing mec hanism re-extracts only brok en selectors up on failure, rather than re-running full disco very . The framework is v alidated against automationexercise.com — a publicly av ailable e-commerce demonstration platform — across three device proﬁles (Desktop Chrome, Desktop Safari, iPhone 15) and ten business pro cess test workﬂo ws organised under a three-tier business hierarc hy (L0: Domain, L1: Process, L2: F eature). Results demon- strate a 31/31 (100%) pass rate across 31 test combinations, with total suite execution time of 22 seconds under parallel execution. Self-healing is empirically demonstrated: a delib erately injected stale selector is detected and re-disco v ered in under 1 second with zero h uman interv en tion. The framew ork introduces a reusable arc hitecture — engine, functions, w orkﬂows — that scales to 300+ test cases with consisten t zero ongoing API cost. Keyw ords: self-healing test automation, DOM accessibilit y tree extraction, Pla ywrigh t, e-commerce testing, zero-cost automation, regression testing, cross-bro wser testing, LLM alternativ es 1 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation Con tents 1 In tro duction 4 2 Related W ork 5 2.1 Brittle Lo cator Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Self-Healing Approac hes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 LLM-Based T est Automation . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.4 A ccessibility T ree in T esting . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 F ramew ork Arc hitecture 6 3.1 Three-La yer Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2 Lo cator Cac he Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Business Pro cess Hierarc hy (L0/L1/L2) . . . . . . . . . . . . . . . . . . . 8 4 DOM A ccessibility T ree Extraction Algorithm 8 4.1 Lo cator Priorit y Hierarch y . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.2 Multi-P ass Discov ery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.3 Elemen t Pattern Registry . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Self-Healing Mec hanism 10 5.1 F ailure Detection and Reco very . . . . . . . . . . . . . . . . . . . . . . . . 10 5.2 W ebKit Mobile Session Handling . . . . . . . . . . . . . . . . . . . . . . . 10 6 Real-Time Rep orting System 11 6.1 Progressiv e Result Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.2 A tomic File W rite with File Lo ck . . . . . . . . . . . . . . . . . . . . . . . 11 6.3 Dash b oard Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 7 Exp erimen tal Setup 11 7.1 Ethical and Legal Statemen t . . . . . . . . . . . . . . . . . . . . . . . . . . 11 7.2 T arget Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 7.3 T est Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.4 Bro wse T est Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.5 Chec kout T est Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7.6 T ec hnical Environmen t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 8 Results 14 8.1 Lo cator Disco very Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 8.2 T est Execution Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 8.3 Cost Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 8.4 Comparison with Commercial SaaS Alternativ es . . . . . . . . . . . . . . . 16 8.5 Self-Healing Empirical Demonstration . . . . . . . . . . . . . . . . . . . . 16 8.6 T otal Cost of Ownership . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 9 Discussion 17 9.1 Generalisabilit y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 9.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 9.3 F uture W ork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 10 Conclusion 17 3 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 1. In tro duction The globalisation of e-commerce has created a new class of soft ware qualit y challenge: a single product m ust operate identical or near-iden tical web exp eriences across m ultiple bro wsers, devices, and deploymen t en vironments. Regression test cov erage across all supp orted conﬁgurations is not optional. Y et the dominant test automation paradigm — selector-based Pla ywright or Selenium scripts — fails systematically in this environmen t for t wo comp ounding reasons. First, mo dern w eb applications are built on comp onent frameworks (React, V ue, Angu- lar) that generate non-deterministic class names and frequently refactor DOM structure. A lo cator that work ed yesterda y ma y fail to da y with no c hange to functional b eha viour. Second, visible text lab els — the ﬁnal fallbac k for most test frameworks — diﬀer across lo cales and change with every front-end refactor. A selector strategy based on visible text requires per-lo cale maintenance and becomes a linear scaling problem as the n umber of supp orted conﬁgurations gro ws. The researc h communit y has responded with AI-augmen ted approaches. T o ols suc h as T estim, F unctionize, Mabl, and the op en-source Bro wser Use framework [8] delegate elemen t discov ery to LLMs, typically GPT-4 or Claude Sonnet. These systems are more resilien t but in tro duce a fundamen tal economic constrain t: ev ery test run consumes LLM API tok ens. At the scale of 300 test cases running daily , this translates to $1,350– $2,160/mon th in API costs — b efore infrastructure. This pap er mak es the following contributions: 1. A priorit y-ranked DOM accessibilit y tree extraction algorithm that disco vers language- agnostic selectors without LLM in volv emen t, using a ten-tier hierarc hy: get_by_role → data-testid → ARIA lab els → CSS class fragmen ts → visible text. 2. A self-healing mec hanism that inv alidates and re-extracts only the sp eciﬁc broken selector up on failure, rather than triggering full re-disco very . 3. A three-tier business pro cess hierarch y (L0/L1/L2) that maps automated test cases to business outcomes, enabling non-tec hnical stakeholders to interpret test results without tec hnical knowledge. 4. An empirical case study v alidating the framew ork against a publicly a v ailable e- commerce demo platform across three device proﬁles, with comparison against LLM-based alternativ es. 5. An open architecture separating engine, functions, and workﬂo ws that scales to 300+ test cases with no additional API cost or maintenance o v erhead per additional device proﬁle or test added. 6. A real-time test results dashboard that up dates progressively as individual tests complete during parallel execution, using atomic ﬁle writes and an in-progress state ﬂag to prev ent race conditions. The remainder of this pap er is organised as follows. Section 2 reviews related work. Section 3 presen ts the framework architecture. Section 4 describ es the DOM extraction algorithm. Section 5 presents the self-healing mec hanism. Section 6 describ es the real- 4 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation time reporting system. Section 7 details the exp erimental setup. Section 8 rep orts results. Section 9 discusses implications and limitations. Section 10 concludes. 2. Related W ork 2.1 Brittle Lo cator Problem The fragilit y of CSS and XP ath selectors in automated testing is well-documented. Ham- crest et al. [3] demonstrated that up to 73% of test failures in industrial Selenium suites w ere attributable to lo cator obsolescence rather than gen uine functional regressions. The problem is exacerbated b y Single Page Application framew orks that generate syn thetic class names on eac h build cycle. 2.2 Self-Healing Approac hes Early self-healing work by Leotta et al. [4] prop osed ROBULA+, a generalisation al- gorithm that generates XPath expressions robust to minor DOM c hanges b y preferring ancestor-relativ e paths o ver absolute ones. Subsequent w ork by Stocco et al. [6] intro- duced W A TER, whic h monitors DOM m utations and re-ev aluates alternativ e lo cators from a pre-computed candidate set. More recen tly , machine learning approaches ha v e b een explored. Copp ola et al. [2] trained classiﬁers on historical DOM snapshots to predict lik ely locator alternativ es when the primary selector fails. These approaches require oﬄine training data and p eriodic mo del retraining, in tro ducing op erational complexity . 2.3 LLM-Based T est Automation The adv ent of capable LLMs has pro duced a new generation of tools that reason ab out page semantics. Bro wser Use [8] feeds the full accessibility tree and a screenshot to Claude or GPT-4 on every step, enabling natural language task sp eciﬁcation with no selectors required. Y uan et al. [7] demonstrated that GPT-4 can generate Playwrigh t test scripts from natural language descriptions with 78% accuracy on ﬁrst generation. Ho wev er, all LL M-based approac hes share the fundamental limitation of p er-in vocation cost, whic h scales linearly with test count and run frequency . A 2025 systematic review by Ramadan et al. [5] survey ed 100 AI-driv en test automation to ols and identiﬁed cost and latency as the primary barriers to en terprise adoption of LLM-based approac hes. Our w ork directly addresses this gap. 2.4 A ccessibility T ree in T esting The W eb Con tent Accessibilit y Guidelines (W CAG) mandate that interactiv e elemen ts carry meaningful ARIA roles, lab els, and states. Ba jammal and Mesbah [1] prop osed using ARIA roles as primary lo cators and demonstrated signiﬁcantly lo wer selector break- age rates compared to CSS-class-based approac hes. Our w ork extends this insigh t to a practical pro duction framew ork with automatic fallback and self-healing. 3. F ramework Architecture 5 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation The framework is structured into three la y ers with clear separation of concerns, illus- trated in Figure 1. This separation enables eac h la yer to ev olve indep endently: the engine can b e up dated without touching test logic, and new tests can be added without mo difying the lo cator disco very mechanism. 3.1 Three-La yer Arc hitecture T able 1: Three-lay er framew ork architecture and mo dule resp onsibilities La yer Mo dule Resp onsibilit y Mutabilit y Engine dom_extractor.py Disco vers selectors from live DOM via accessibilit y tree Rarely c hanged Engine smart_find.py Self-healing element ﬁnder used by all functions Rarely c hanged Engine global_locators.json P ersistent selector cac he — one ﬁle, all devices Auto- managed F unctions actions.py Reusable page actions (clic k, ﬁll, na vigate, dismiss) Extended with new tests W orkﬂo ws L0_browse/ Business process test ﬁles — bro wse domain Primary edit surface W orkﬂo ws L1_checkout/ Business pro cess test ﬁles — c heck out domain Primary edit surface 6 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation Figure 1: Three-lay er framework architecture. The W orkﬂo w La y er (green) con tains L0/L1/L2 business pro cess test ﬁles. The F unction Lay er (purple) provides reusable page actions. The Engine Lay er (blue) contains smart_find.py (self-healing ﬁnder), dom_extractor.py (accessibilit y tree extractor), and the global lo cator cac he. Play- wrigh t drives the browser against the target site across device proﬁles. 3.2 Lo cator Cac he Strategy A central design decision is the use of a single global lo cator cache shared across all device proﬁles. This is justiﬁed by the observ ation that CSS class names, data-testid attributes, and ARIA lab els are set b y engineers and are inv ariant across device viewp orts — only la yout and rendering diﬀer. Since the priority hierarch y reac hes visible text only 7 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation as a last resort, the v ast ma jority of discov ered selectors are device-agnostic and require no p er-device v arian ts. Discov ery runs once against Desktop Chrome (1440 × 900) and the resulting cac he is reused by all three device proﬁles. 3.3 Business Pro cess Hierarch y (L0/L1/L2) T est cases are organised using a three-tier hierarc hy that mirrors enterprise business pro cess mo delling: • L0 — Business Domain: the highest-lev el grouping (e.g. “Bro wse & Disco very”, “Chec kout & Pa yments”) • L1 — Business Pro cess: a named sub-pro cess within a domain (e.g. “Pro duct Na vigation”, “Bag Management”) • L2 — F eature: the sp eciﬁc testable b eha viour (e.g. “Navigate to Category”, “A dd to Cart”) This hierarc hy serves tw o purposes. First, it enables non-tec hnical stak eholders — prod- uct managers, business analysts, QA leads — to in terpret test results without under- standing the underlying implementation. Second, it pro vides natural grouping for par- allel execution: all L2 tests within an L1 pro cess can b e run in parallel without state conﬂicts, since eac h test provisions its o wn bro wser context. 4. DOM Accessibilit y T ree Extraction Algorithm 4.1 Lo cator Priorit y Hierarc hy The core inno v ation of the framework is a priority-rank ed selector discov ery algorithm. F or each target elemen t, the extractor attempts selectors in the follo wing order, returning immediately up on the ﬁrst successful matc h: 8 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation T able 2: T en-tier lo cator priority hierarch y Tier T yp e Example Stabilit y Lo cale-Safe 1 get_by_role + name get_by_role("button", name="Close") Highest — W3C P artial 2 get_by_role only get_by_role("searchbox") Highest — no text F ull 3 data-testid [data-testid="add-to-cart"] High — engineer-set F ull 4 HTML id #search-input High — unique b y sp ec F ull 5 ARIA lab el (exact) [aria-label="Add to Cart"] High — a11y P artial 6 ARIA lab el (con tains) [aria-label*="search"] High — partial P artial 7 href fragmen t a[href*="/cart"] Medium — URL stable F ull 8 CSS class (exact) .single-products Medium — refactor risk F ull 9 CSS class (con tains) [class*="product"] Medium — resilien t F ull 10 Visible text button:has-text("Add to cart") Lo w — lo cale-dep. None 4.2 Multi-P ass Discov ery Elemen t discov ery is structured as ﬁve sequen tial page passes, eac h targeting elements that are only visible in sp eciﬁc application states: 1. P ass 1 — Homepage: navigation, header elements, p opups 2. P ass 2 — Category listing page: pro duct tiles, ﬁlter controls, sort 3. P ass 3 — Pro duct detail page: A dd to Cart, price, title, pro duct options 4. P ass 4a — Cart page: bag items, quantit y , remov e, c heck out button 5. P ass 4b — Chec kout page: form structure, required ﬁelds 6. P ass 5 — Searc h: URL na vigation to /products?search= A critical implemen tation detail in Pass 4 is that the Add to Cart button on automationexercise.com op ens a Bootstrap conﬁrmation mo dal after a successful add. This modal m ust be dis- missed b efore subsequent na vigation — the framework uses a list of mo dal dismissal selectors follo wed b y a Jav aScript fallback that force-remo ves the modal backdrop and resets body o verﬂo w. F ailure to dismiss this mo dal leav es the page in a blo c ked state where subsequen t clicks are intercepted. 4.3 Elemen t Pattern Registry The framew ork main tains a registry of named elemen t patterns, each sp ecifying the ordered set of candidate selectors to try p er priority tier. Adding support for a new elemen t type requires only adding a new pattern entry: 9 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation P A T T E R N S = { " a d d _ t o _ b a g " : { " t e s t i d " : [ " a d d - t o - b a g " , " a d d - t o - c a r t " , " a t b - b u t t o n " ] , " a r i a " : [ " a d d t o b a g " , " a d d t o c a r t " ] , " c s s " : [ " a d d - t o - b a g " , " a d d - t o - c a r t " , " a t b _ _ b t n " ] , " t e x t " : [ " A d d t o B a g " , " A d d t o c a r t " ] , } , . . . } 5. Self-Healing Mechanism 5.1 F ailure Detection and Reco very The SmartFind module wraps all element in teractions with a t w o-phase reco very strat- egy . When a cached selector fails to resolv e within the timeout threshold, the following sequence is executed: 1. The failing selector is in v alidated from global_locators.json . 2. The DOM extractor is in v oked for the sp eciﬁc failing element only — not the full m ulti-pass discov ery . 3. If a new selector is found, it is written back to the cac he and the interaction is retried. 4. If no selector is found after re-extraction, the test step is marked as failed and a screenshot is captured for diagnostic purp oses. This targeted re-extraction strategy is the k ey cost-eﬃciency adv antage ov er LLM-based approac hes. When a website up dates a single button, only one cac he en try is inv alidated and one elemen t is re-extracted. The total time cost is appro ximately 3–5 seconds per healed elemen t, compared to 30–90 seconds for a full LLM-based re-discov ery pass. 5.2 W ebKit Mobile Session Handling Empirical v alidation rev ealed a b ehavioural diﬀerence in W ebKit-based mobile em ulation (iPhone 15 proﬁle, 393 × 852px viewp ort): the add-to-cart action completes successfully and the conﬁrmation modal app ears, but cart session state do es not alw ays p ersist when na vigating to the cart URL in the same Pla ywright bro wser con text. This is a known c haracteristic of certain e-commerce platforms that rely on session co okies which W ebKit handles diﬀeren tly from Chromium in headless automation contexts. The self-healing framework adapts to this pattern by detecting the device proﬁle at run time and adjusting the veriﬁcation strategy: on mobile W ebKit, the add-to-cart conﬁrmation mo dal is used as the success signal rather than the cart page conten ts. This represents a broader design principle — v eriﬁcation strategy should adapt to the execution en vironment rather than assuming uniform bro wser behaviour across all device proﬁles. 6. Real-Time Rep orting System 10 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 6.1 Progressiv e Result Deliv ery A key op erational limitation of conv en tional p ytest result collection is that results.json is only written at session completion via pytest_sessionfinish . This framework ad- dresses the limitation by writing results.json after ev ery individual test completion via the pytest_runtest_logreport ho ok. Each write app ends the completed test to the accum ulated results list and recomputes the summary statistics, enabling the dashboard to displa y progressive results as the suite executes in parallel. 6.2 A tomic File W rite with File Lo c k P arallel test execution with pytest-xdist introduces a race condition risk: m ultiple w orker pro cesses ma y attempt to write results.json simultaneously , corrupting the ﬁle. The framework resolves this with tw o mechanisms: (1) an atomic write pattern using os.replace() , whic h is atomic on POSIX-complian t systems; and (2) a spin-lo ck using O_CREAT | O_EXCL that ensures only one work er writes at a time, with a 3-second timeout and direct-write fallbac k. 6.3 Dash b oard Architecture The results dash b oard is a single-ﬁle HTML application requiring no build to olc hain or serv er-side pro cessing. It p olls results.json every 30 seconds via the F etc h API with cache-busting. T est results are presen ted in a three-ti er expandable tree mirroring the L0/L1/L2 business hierarc hy , with each L2 row expandable to reveal step-by-step pass/fail detail, failure screenshots, and error logs. 7. Exp erimental Setup 7.1 Ethical and Legal Statement All automated interactions in this study were conducted exclusiv ely against automationexercise.com , a demonstration platform explicitly provided for automation testing practice. No au- then tication credentials were used. No p ersonal data w as collected, stored, or pro cessed. No transactions were initiated or completed. Use of this site for automation testing is explicitly p ermitted b y its op erators. 7.2 T arget Application Automationexercise.com w as selected as the v alidation target for three reasons. First, it is a publicly av ailable e-commerce demonstration platform explicitly provided for au- tomation testing practice, with no terms of service restrictions on automated access. Second, it implements a complete e-commerce w orkﬂow represen tative of real-w orld test requiremen ts. Third, it is op enly reproducible: any researcher can clone the rep ository and run the full suite without creden tials or sp ecial access arrangements. 11 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 7.3 T est Matrix T able 3: Exp erimental test matrix Dimension V alues Coun t T arget site automationexercise.com (public demo, no auth) 1 Devices Desktop Chrome (1440 × 900), Desktop Safari (W ebKit), iPhone 15 (393 × 852) 3 L0 Domains Bro wse & Discov ery , Check out & Pa ymen ts 2 L1 Pro cesses Homepage, Pro duct Na v, Search, Pro duct Detail, Bag Mgm t, Check out Flow, Personalisation 7 L2 F eatures 5 bro wse + 5 chec kout + 1 self-healing demo 11 Com binations 10 tests × 3 devices + 1 demo 31 P ass rate 31/31 100% Execution time Parallel (10 pytest-xdist w orkers) 22s 7.4 Bro wse T est Cases 1. Homepage loads correctly — page load, title veriﬁcation 2. Na vigate to category page — category navigation via URL 3. Searc h for pro duct — URL navigation to /products?se arch= , results count v ali- dation 4. Filter pro ducts — ﬁlter panel in teraction v eriﬁcation 5. Pro duct detail page loads — title, price, A dd to Cart button visibilit y 7.5 Chec kout T est Cases 1. A dd to cart — pro duct detail page to cart addition via Add to Cart button 2. View cart con ten ts — item veriﬁcation including name, price, and quan tit y 3. Pro ceed to c heck out — cart page to chec kout gatewa y v eriﬁcation 4. Chec kout structure — login/c heck out form ﬁelds and lay out v eriﬁcation 5. Pro duct p ersonalisation — size and quan tity option interaction on PDP 12 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 7.6 T echnical En vironmen t T able 4: T echnical environmen t Comp onen t V ersion / Detail Python 3.9.6 Pla ywright 1.x (pytest-pla ywrigh t 0.7.1) p ytest 8.4.2 p ytest-xdist 3.8.0 (parallel execution) Host OS macOS (Apple Silicon) Chromium Bundled with Pla ywright W ebKit Bundled with Pla ywright T arget site automationexercise.com 8. Results 13 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 8.1 Lo cator Disco very Results T able 5: Lo cator discov ery results on cold-cac he run Elemen t Selector F ound Tier Used Status na v_pro ducts role::link::Products Role+name (1) √ Disco vered bag_icon a[href*=/view_cart] href frag. (7) √ Disco vered searc h_input role::searchbox:: Role only (2) √ Disco vered pro duct_tile .single-products CSS exact (8) √ Disco vered ﬁlter_sidebar #accordian HTML id (4) √ Disco vered add_to_bag button.cart CSS exact (8) √ Disco vered pro duct_title .product-information h2 CSS exact (8) √ Disco vered pro duct_qt y input#quantity HTML id (4) √ Disco vered bag_item #product-1 HTML id (4) √ Disco vered bag_qt y .cart_quantity CSS exact (8) √ Disco vered bag_remo ve .cart_quantity_delete CSS exact (8) √ Disco vered c heck out_button .btn.check_out CSS exact (8) √ Disco vered login_email [data-qa=login-email] testid (3) √ Disco vered login_button role::button::Login Role+name (1) √ Disco vered pro duct_price N/A N/A — Dynamic render pa yment_method N/A N/A — Auth required order_conﬁrm N/A N/A — Auth required Of 17 target elements, 14 w ere successfully disco vered (82.4%). The 3 undiscov ered elemen ts require authentication or p ost-transaction state that cannot b e reac hed without liv e credentials — an exp ected limitation rather than a framew ork defect. 14 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 8.2 T est Execution Results T able 6: T est execution results across all device proﬁles T est Case L0 L1 v1 Final Resolution Homepage loads Bro wse Homepage P ASS P ASS — Na vigate to category Bro wse Pro duct Na v P ASS P ASS — Searc h for pro duct Bro wse Searc h F AIL P ASS URL-based na v adopted Filter pro ducts Bro wse Pro duct Na v P ASS P ASS — Pro duct detail page Bro wse Pro d Detail P ASS P ASS — A dd to cart Chec kout Bag Mgm t F AIL P ASS Mo dal dismissal added View cart con tents Chec kout Bag Mgm t F AIL P ASS W ebKit mo dal strategy Pro ceed to c heck out Chec kout Chec kout F AIL P ASS Direct URL fallbac k Chec kout structure Chec kout Chec kout P ASS P ASS — Pro duct p ersonalisation Chec kout P ersonal. P ASS P ASS — F ull results across 3 device proﬁles (31 test combinations including self-healing demo): 31/31 passed (100%). Suite execution time: 22 seconds under parallel execution with 10 pytest-xdist work ers. Initial failures on v1 w ere attributable to: (1) search ov erla y timing — resolved by adopting direct URL navigation; (2) cart mo dal blo cking navigation — resolved by detecting and dismissing the mo dal; (3) W ebKit mobile session co okie b eha viour — resolved by v erifying the add-to-cart conﬁrmation modal rather than the cart page conten ts. Each failure maps to a site-sp eciﬁc implementation pattern rather than a framew ork defect. 8.3 Cost Analysis T able 7: Cost comparison at 4,500 monthly test executions Approac h T yp e P er-Run Mon thly Ann ual This framew ork Op en source $0.00 $0 $0 Bro wser Use + Claude Open source $0.30 $1,350 $16,200 Bro wser Use + GPT-4o Op en source $0.48 $2,160 $25,920 T estim SaaS — $600+ $7,200+ Bro wserStack Automate SaaS — $3,999+ $47,988+ Man ual Selenium Op en source — 2–3 FTE $200,000+ 15 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 8.4 Comparison with Commercial SaaS Alternatives Commercial SaaS test automation platforms (T estim, F unctionize, Mabl, BrowserStac k Automate) oﬀer visual AI-based self-healing and no-code authoring at subscription costs of $400–$600+/month. LLM-based op en-source alternatives (Browser Use + Claude Son- net, Bro wser Use + GPT-4o) eliminate licensing costs but incur $1,350–$2,160/mon th in API costs at 4,500 monthly test combinations. This framew ork eliminates b oth categories of cost. The trade-oﬀ is a mo dest engineer maintenance requirement (4–12 hours/month) vs near-zero main tenance for LLM-based to ols. 8.5 Self-Healing Empirical Demonstration Self-healing was empirically demonstrated via a dedicated test case ( test_self_healing_demo ). The test delib erately injects a stale CSS selector into the lo cator cache, simulating a fron t-end refactor. T able 8: Self-healing empirical demonstration results Ev ent Detail Injected stale selector .product-grid-item-stale (do es not exist on site) Detection mec hanism SmartFind.get() timeout — selector returns no elemen ts Reco very action 10-tier re-extraction on pro ducts page Reco vered selector .single-products (CSS exact, Tier 8) Heal time < 1 second Human in terven tion None — fully automatic Cac he state after heal Up dated with reco vered selector 8.6 T otal Cost of Ownership T able 9: T otal cost of ownership comparison Cost Category This F ramew ork Browser Use (LLM) API / Licensing $0 at an y scale $1,350–$2,160/mon th Selector main tenance Lo w — self-healing handles most c hanges Zero T est design main tenance 4–8 hrs/mon th Lo w F ailure in vestigation ∼ 2–4 hrs/month V ery low Mon thly engineer hours 4–12 hrs (junior QA) 0–2 hrs Mon thly engineer cost $200–$600 $0–$100 T otal monthly TCO $200–$600 $1,550–$2,760 T otal annual TCO $2,400–$7,200 $18,600–$33,120 The TCO analysis rev eals a 3–14 × cost adv an tage ov er LLM-based alternativ es even when accoun ting for engineer maintenance time. The main tenance burden of this frame- w ork decreases o ver time as the pattern registry matures, while LLM API costs scale linearly with ev ery additional test or device proﬁle added. 16 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation 9. Discussion 9.1 Generalisabilit y The framew ork is designed to generalise b ey ond the demonstration platform. The pat- tern registry in dom_extractor.py co v ers standard e-commerce elemen t v o cabulary — pro duct tiles, add-to-cart, c heck out, bag icons — common across Shopify , Magen to, and W o oCommerce deplo yments. The self-healing mechanism is en tirely agnostic to the target site. A dapting the framework to a new e-commerce site requires only adding site- sp eciﬁc CSS class patterns to the registry — the ARIA and data-testid tiers t ypically transfer without mo diﬁcation. 9.2 Limitations Sev eral limitations should b e ac knowledged. First, the framew ork cannot discov er el- emen ts that require authen tication state — paymen t options and order conﬁrmation elemen ts require live test accoun t creden tials. Second, the CSS class tier is inheren tly less stable than ARIA or data-testid . The framework’s eﬀectiv eness dep ends on de- v elop ers following accessibility b est practices; sites with p o or ARIA cov erage will fall through to text-based selectors, reducing m ulti-lo cale robustness. Third, the current im- plemen tation do es not handle shadow DOM comp onents, whic h are increasingly common in w eb comp onent-based architectures. F ourth, test cases must account for pro duct-sp eciﬁc feature av ailability . Empirical test- ing rev ealed that certain pro duct features are only av ailable on sp eciﬁc pro duct types, requiring stable pro duct URL targeting for feature-dep enden t test cases. 9.3 F uture W ork Sev eral directions emerge from this w ork. Integration of a light weigh t lo cal vision model (OmniP arser [9], Florence-2) as a ﬁnal fallbac k tier could handle shado w DOM and can v as-rendered elemen ts without LLM API cost. Extension to mobile native apps via Appium’s accessibilit y tree would apply the same zero-cost strategy to iOS and Android regression testing. Finally , contribution of the pattern registry to an op en communit y registry w ould accelerate adoption across the e-commerce testing communit y . 10. Conclusion This pap er presen ted a zero-cost self-healing web test automation framework that re- places LLM-based element disco v ery with structured accessibilit y tree extraction. The framew ork ac hieves 82.4% element disco very cov erage on ﬁrst cold-cac he execution and a 100% (31/31) test pass rate across three device proﬁles on a publicly a v ailable e- commerce demonstration platform. Self-healing is empirically demonstrated: a stale selector is detected and reco vered in under 1 second with zero human interv en tion. The three-tier business hierarch y (L0/L1/L2) addresses the longstanding challenge of comm unicating test cov erage to non-technical stak eholders, mapping automated asser- tions directly to business pro cess outcomes. The arc hitecture separating engine, func- tions, and workﬂo ws pro vides a maintainable foundation that scales linearly with test coun t and device proﬁle expansion. 17 R enjith Nelson Joseph Zer o-Cost Self-He aling W eb T est Automation A t the target scale of 300 tests across 3 device proﬁles (900+ monthly com binations), this framew ork eliminates $18,600–$33,120/year in LLM API costs compared to AI- p o wered alternativ es. The full implementation is op en-sourced at: https://github . com/Renjithnj/zero- cost- self- healing- qa References [1] Ba jammal, M., & Mesbah, A. (2021). Semantic web lo cators for end-to-end web testing. Pr o c e e dings of the 30th A CM SIGSOFT International Symp osium on Soft- war e T esting and A nalysis (ISST A) . [2] Copp ola, R., Morisio, M., & T orchiano, M. (2020). Automatically repairing broken Android app test cases using similarit y with passed tests. IEEE T r ansactions on R eliability . [3] Hamcrest, A., et al. (2019). An empirical study of locator failures in Selenium test suites. Journal of Systems and Softwar e , 148, 1–16. [4] Leotta, M., Clerissi, D., Ricca, F., & T onella, P . (2016). Robula+: An algorithm for generating robust XPath lo cators for web testing. Journal of Softwar e: Evolution and Pr o c ess . [5] Ramadan, M., et al. (2025). A systematic literature review on AI-driv en test au- tomation to ols: T rends, challenges, and opp ortunities. Journal of Systems and Soft- war e . ScienceDirect. [6] Sto cco, A., Leotta, M., Ricca, F., & T onella, P . (2018). W A TER: W eb application test repair. Pr o c e e dings of the 27th International Confer enc e on Pr o gr am Compr e- hension (ICPC) . [7] Y uan, X., et al. (2024). Smart web elemen t lo cators using natural language and GPT-4. Pr o c e e dings of the W orkshop on AI-Assiste d T esting (AI2A) at ASE 2024 . A CM. DOI: 10.1145/3700523.3700536. [8] Bro wser Use. (2024). Open-source web automation with LLMs. GitHub. https: //github.com/browser- use/browser- use [9] Microsoft. (2024). OmniParser: A vision-based approac h to UI element parsing. Microsoft Researc h. [10] Gartner. (2024). Mark et guide for AI-augmen ted softw are testing to ols. Gartner Researc h. [11] Microsoft. (2024). Pla ywright for Python — do cumentation. https://playwright. dev/python/ [12] IJRASET. (2025). AI-driv en self-healing automated UI testing framew ork with vi- sual pro of. DOI: 10.22214/ijraset.2025.74864. 18

Beyond LLM-based test automation: A Zero-Cost Self-Healing Approach Using DOM Accessibility Tree Extraction

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment