Despite the growing popularity of AI coding assistants, over 80% of machine learning (ML) projects fail to deliver real business value. This study creates and tests a Machine Learning Canvas, a practical framework that combines business strategy, software engineering, and data science in order to determine the factors that lead to the success of ML projects. We surveyed 150 data scientists and analyzed their responses using statistical modeling. We identified four key success factors: Strategy (clear goals and planning), Process (how work gets done), Ecosystem (tools and infrastructure), and Support (organizational backing and resources). Our results show that these factors are interconnected: each one affects the next. For instance, strong organizational support results in a clearer strategy (β = 0.432, p < 0.001), which improves work processes (β = 0.428, p < 0.001) and builds better infrastructure (β = 0.547, p < 0.001). Together, these elements determine whether a project succeeds. The surprising finding? Although AI assistants make coding faster, they don't guarantee project success. AI assists with the "how" of coding but cannot replace the "why" and "what" of strategic thinking.
Empirical studies suggest that large language models (LLMs) can boost developer productivity by automating certain coding tasks. For example, developers using Copilot finish tasks up to 55% faster and experience a lighter cognitive workload [1]. This is particularly beneficial for less experienced programmers. Anthropic's analysis of approximately 500,000 coding sessions revealed that approximately 79% of interactions with Claude Code involve automation [2]. An empirical study demonstrated increased productivity in Python programming tasks; however, the researchers cautioned that effectiveness depends on task complexity and user expertise [3]. Furthermore, a large-scale field experiment conducted by the Bank for International Settlements revealed a 55% increase in productivity, with LLMs generating approximately one-third of the total lines of code [4]. While these findings affirm productivity improvements, they also imply an ongoing need for human oversight. Code generated by artificial intelligence (AI) addresses repetitive or well-defined programming activities rather than complex architectural decisions [5].
Although LLMs can accelerate code generation and reduce the cognitive burden of routine programming tasks, these micro-level efficiency gains do not translate into macro-level project success. In a study of 65 experienced data scientists, it was found that over 80% of AI projects fail, twice the rate of traditional IT projects [6]. The authors identified five primary failure modes: (1) misalignment between technical objectives and business problems, (2) poor data quality and infrastructure, (3) attempting to solve problems that are beyond the current capabilities of AI, (4) insufficient organizational readiness, and (5) inadequate governance structures. These findings confirm earlier research indicating high failure rates in machine learning (ML) deployments [7] [8].
In response to these challenges, machine learning operations (MLOps) have emerged as a discipline that combines software engineering principles with machine learning-specific requirements [9]. A systematic literature review identifies MLOps as encompassing continuous integration and deployment, automated testing and validation, model versioning and governance, and production monitoring and maintenance [10]. Similarly, in another study the authors indicate that adoption success rates vary substantially depending on MLOps maturity. More mature firms demonstrate significantly better capabilities in terms of data management, automated deployments, and continuous integration and delivery [11].
These multifaceted challenges highlight the necessity of structured frameworks for planning, communicating, and executing AI and ML projects. The Business Model Canvas (BMC) approach, popularized by Osterwalder, has proven effective in creating shared mental models among diverse stakeholders [12]. Recent applications to AI contexts include the AI Model Canvas, which adapts Canvas principles to the specific requirements of machine learning [13]. However, existing Canvas frameworks often fail to address the full complexity of ML projects. They typically focus on either technical aspects, such as data pipelines and model architecture, or business considerations, such as value propositions and cost structures, without effectively integrating both perspectives. Additionally, these frameworks inadequately address the dynamic nature of ML projects, which require flexible yet structured approaches to iterative experimentation [14].
These limitations are addressed by developing a Machine Learning Canvas (MLC), which integrates organizational theory, software engineering, and data science. An empirical study using a Structural Equation Modeling (SEM) approach was conducted to identify the success determinants of ML projects in development contexts where large language models serve as coding assistants.
A business model is a blueprint that aligns a company's strategic objectives with its operational execution [15]. It explains how an organization creates, delivers, and captures value [16]. The hierarchical taxonomy identifies four fundamental dimensions of business models:
1. Value Proposition: What value is created, and for whom?
2. Value Architecture: How is value created through organizational capabilities?
3. Value Network: The ecosystem of relationships that enable value creation.
4. Value Finance: The economic model for capturing created value.
The Business Model Canvas operationalizes these dimensions through nine interconnected building blocks arranged on a visual canvas. This approach facilitates iterative development and enables stakeholders to maintain a holistic view while addressing specific components. ML projects share structural similarities with business models in that they focus on creating value from data by aligning multiple perspectives: business strategy, technical implementation, and operational integration. These characteristics necessitate a specialized framework that addresses the technical and organizational aspects of ML initiatives.
To examine the determinants of success for an ML project and develop a theoretically based model, we mapped the BMC dimensions onto the corresponding ML constructs. These are strategy, process, ecosystem, and support. Next, we identified independent measures for each dimension from literature.
The dependent variable is IT project success, which is measured using the Iron Triangle framework, consisting of performance goals, business objectives, and stakeholder satisfaction [17]. Although data scientists frequently focus on technical performance measures, such as AUC scores or model accuracy, the Iron Triangle uses a language that business stakeholders understand: time, cost, and quality. Despite their technical complexity, ML projects are fundamentally business investments that must deliver value within constraints. A model that achieves 99% accuracy becomes irrelevant if it is delivered six months late and at double the budget. The Iron Triangle captures this reality by measuring what stakeholders care about: whether the project delivers its promised functionality on time and within budget. The independent variables consist of the different dimensions of the ML Canvas, which are measured on a 5-point Likert scale ranging from "strongly agree" to "strongly disagree." The SEM approach enables examination of how the four ML Canvas dimensions (Strategy, Process, Ecosystem, and Support) work together to influence project success. This methodology acknowledges that these dimensions are not directly observable, but rather latent constructs measured through multiple survey items.
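To make the latent-variable logic concrete, the following sketch shows how such a measurement and structural model could be specified in Python with the semopy package. Only ST1_TARGET, SP1_WELL, SP2_PRIO, and SP3_FIN are item codes taken from this paper; all other item codes, the file name, and the exact item assignment are hypothetical placeholders, so this illustrates the modeling approach rather than the exact model estimated in the study.

```python
# Illustrative SEM specification in lavaan-style syntax with semopy.
# Item codes other than ST1_TARGET, SP1_WELL, SP2_PRIO, SP3_FIN are hypothetical.
import pandas as pd
from semopy import Model

MODEL_DESC = """
# Measurement model: latent constructs measured by Likert-scale items
Support   =~ SP1_WELL + SP2_PRIO + SP3_FIN
Strategy  =~ ST1_TARGET + ST2_TASK + ST3_REQ
Process   =~ PR1_PRE + PR2_POST
Ecosystem =~ EC1_INFRA + EC2_TOOLS + EC3_PROD
Success   =~ SU1_TIME + SU2_COST + SU3_QUALITY

# Structural model: the hypothesized causal chain
Strategy  ~ Support
Process   ~ Strategy
Ecosystem ~ Process
Success   ~ Strategy + Process + Ecosystem
"""

survey = pd.read_csv("ml_canvas_survey.csv")   # hypothetical file of item responses
model = Model(MODEL_DESC)
model.fit(survey)
print(model.inspect())   # path coefficients, standard errors, p-values
```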
The ML Strategy dimension translates business objectives into machine learning problems [18]. First, the task definition element requires mapping business problems to the appropriate learning paradigms to identify tasks suitable for machine learning [19]. This includes 1) distinguishing between prediction, exploration, and optimization problems, 2) assessing data availability, and 3) defining explainability requirements for the outcome [20].
Before starting any machine learning project, data specification is a necessary task. It specifies the data sources and modalities and distinguishes between structured and unstructured data and how to store and process them [21]. Quality considerations now extend beyond traditional database integrity to include representativeness, temporal stability, and potential biases that could compromise the model’s validity [22]. Recent work emphasizes that data cascades, or compounding errors from poor data quality, are a primary cause of ML project failures [23].
The target definition connects technical performance metrics to business value creation. This addresses the hidden technical debt in machine learning systems [7]. Quantitative metrics should encompass not only accuracy or F1 scores, but also computational efficiency, latency constraints, and robustness to distribution shifts [24]. Qualitative considerations, such as interpretability requirements driven by regulatory frameworks like the GDPR and constraints that prevent discriminatory outcomes, have gained equal importance [25] [26]. The emergence of responsible AI frameworks requires explicit consideration of transparency, accountability, and auditability throughout the ML lifecycle [27].
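As a simple illustration of how such a target definition might be operationalized, the sketch below checks a fitted classifier against both a quality target and a latency constraint. The thresholds, model object, and validation data are assumptions for illustration, not values from the study.

```python
import time
from sklearn.metrics import f1_score

def evaluate_against_targets(model, X_val, y_val, min_f1=0.80, max_latency_ms=10.0):
    """Check a fitted model against one statistical and one operational target.
    Thresholds are illustrative placeholders, not study values."""
    start = time.perf_counter()
    predictions = model.predict(X_val)
    latency_ms = (time.perf_counter() - start) / len(X_val) * 1000  # per-row latency

    f1 = f1_score(y_val, predictions)
    return {
        "f1": f1,
        "f1_target_met": f1 >= min_f1,
        "latency_ms_per_row": latency_ms,
        "latency_target_met": latency_ms <= max_latency_ms,
    }
```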
These conceptual elements operationalize the strategy dimension by turning it into measurable items for our empirical analysis.
Strategy Dimension
ST1_TARGET  The success measures of the ML project are clearly defined.
The tasks in the ML project are clearly defined.
The requirements of the ML project, such as data and model constraints (explainability, space, inference time, ...), are clearly defined.
The ML Process dimension uses an adapted CRISP-DM framework to transform raw data into deployable models. This framework acknowledges the iterative and nondeterministic nature of machine learning development [28].
Extraction-transform-load (ETL) processes are required for data collection to address big data's "5 V's": volume, variety, velocity, veracity, and value [29]. Volume encompasses storage and governance architectures that balance accessibility and security, as well as distributed systems that ensure scalability without sacrificing consistency [30]. Data lakes must accommodate both structured and unstructured data while maintaining traceability and version control. Variety requires integration pipelines that reconcile heterogeneity, such as relational schemas, unstructured text, and temporal streams, through semantic mapping and schema matching [31]. Robust encoding strategies are required for this: categorical variables need one-hot, target, or embedding approaches, which affect interpretability and dimensionality; text requires vectorization and embedding; and temporal features need cyclical encoding for seasonal patterns. Velocity introduces temporal complexity, where batch and stream processing coexist with different constraints on feature computation and model updating [32]. Standardization and normalization impact model convergence and offer different trade-offs in terms of outlier sensitivity and information preservation. Quality assessment encompasses ML-specific considerations, such as label noise tolerance, feature drift detection, and representation sufficiency [33]. Multidimensional quality, including completeness, consistency, timeliness, and relevance, requires composite metrics that reflect downstream performance rather than isolated characteristics [34].
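A minimal sketch of these encoding choices with scikit-learn is shown below, assuming a pandas DataFrame with hypothetical columns (channel, region, amount, tenure, month); a production pipeline would add imputation, validation, and versioned feature definitions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

def cyclical_month(x):
    """Sin/cos encoding of month so that December and January end up close together."""
    radians = 2 * np.pi * np.asarray(x, dtype=float) / 12.0
    return np.hstack([np.sin(radians), np.cos(radians)])

preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["channel", "region"]),  # one-hot encoding
    ("numeric", StandardScaler(), ["amount", "tenure"]),                             # standardization
    ("seasonal", FunctionTransformer(cyclical_month), ["month"]),                    # cyclical encoding
])

# X = preprocess.fit_transform(df)   # df: hypothetical pandas DataFrame with the columns above
```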
Data preparation consumes up to 80% of project effort [35]. The cleansing process must balance information preservation with noise reduction. Outlier removal, for example, may eliminate rare but legitimate patterns that are crucial for predicting rare events [36]. Modern approaches use weak supervision and programmatic labeling to expand the reach of human expertise. These approaches encode domain knowledge as labeling functions rather than manual annotations [35]. Feature engineering is an important interface between domain knowledge and statistical learning. Practitioners must balance the automatic feature learning capabilities of automated machine learning (AutoML) or deep learning architectures with the interpretability and efficiency of handcrafted features using statistical methods or visual exploration of feature importance [37]. The feature selection and transformation pipeline embodies increasing levels of abstraction. Selection methods (filter, wrapper, and embedded) optimize existing feature subsets. Extraction techniques (e.g., dimensionality reduction) discover latent representations. Transformations (e.g., normalization, discretization, and encoding) ensure algorithmic compatibility [38].
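The sketch below illustrates the three selection families and an extraction step using scikit-learn; the choice of estimators and feature counts is arbitrary and serves only to distinguish the approaches.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Filter: rank features by mutual information with the label, independent of any model.
filter_selector = SelectKBest(mutual_info_classif, k=20)

# Wrapper: recursive feature elimination, repeatedly refitting an estimator.
wrapper_selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)

# Embedded: an L1-regularized model performs selection as part of training.
embedded_selector = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear"))

# Extraction: discover latent representations instead of keeping raw features.
extractor = PCA(n_components=10)
```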
Algorithm selection is more than just finding a good algorithm that hits the performance targets. This is formalized in the "No Free Lunch" theorem: no single algorithm dominates across all problem domains [39]. This fundamental constraint requires a portfolio approach, in which the choice of algorithms reflects the intersection of the characteristics of the problem (e.g., data distribution, noise levels, and feature types), operational constraints (e.g., latency, memory, and explainability), and organizational capabilities (e.g., expertise and maintenance capacity).
The rise of neural architecture search and AutoML has shifted the challenge from selecting among fixed architectures to navigating large configuration spaces, albeit sometimes at the cost of interpretability [40]. Transfer learning introduces a paradigm shift from learning from scratch to adapting knowledge. In this approach, pre-trained representations from large-scale datasets serve as initialization points, reducing sample complexity for subsequent tasks [41]. Ensemble methods operationalize the “wisdom of crowds” principle by achieving robustness through diversity via bagging’s variance reduction, boosting’s bias reduction, or stacking’s hierarchical combination [42].
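As a brief illustration of these three combination strategies, the scikit-learn sketch below builds a bagging, a boosting, and a stacking classifier; the hyperparameters are placeholders rather than recommendations.

```python
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Bagging: variance reduction by averaging trees fit on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)

# Boosting: bias reduction by fitting each new tree to the errors of its predecessors.
boosting = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)

# Stacking: a meta-model learns how to combine heterogeneous base learners.
stacking = StackingClassifier(
    estimators=[("bagging", bagging), ("boosting", boosting)],
    final_estimator=LogisticRegression(),
)
```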
The training and evaluation processes must balance multiple competing objectives, including statistical validity, computational efficiency, and business relevance. The strategy for splitting the training data includes random partitioning, as well as temporal splits for time-series data, stratified sampling for imbalanced classes, and grouped splits for hierarchical data structures. Each of these strategies preserves different aspects of the data-generating process. Designing the objective function is an important step in translating business goals into mathematical optimization. Surrogate losses approximate true business metrics while maintaining differentiability and computational tractability [43]. Modern evaluation frameworks emphasize robustness across multiple dimensions, such as performance stability across data subgroups (slice-based evaluation), degradation under distribution shift (stress testing), and consistency across random seeds (variance analysis).
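The scikit-learn sketch below contrasts the splitting strategies mentioned above; the synthetic data exists only to keep the example self-contained.

```python
import numpy as np
from sklearn.model_selection import (GroupKFold, StratifiedKFold, TimeSeriesSplit,
                                     train_test_split)

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))       # synthetic features for illustration
y = rng.integers(0, 2, size=500)    # synthetic binary labels

# Random split for i.i.d. data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stratified folds preserve class proportions for imbalanced targets.
stratified = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Grouped folds keep every record of an entity (e.g., one customer) in a single fold
# when a groups array is passed to split().
grouped = GroupKFold(n_splits=5)

# Temporal folds always train on the past and validate on the future.
temporal = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in temporal.split(X):
    pass  # fit on X[train_idx], evaluate on X[val_idx]
```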
Performance optimization operates within the constraint triangle of model capacity, training data, and computational resources. Improvements in one area often require sacrifices in the others. Error analysis has evolved from merely inspecting a confusion matrix to using sophisticated frameworks such as error analysis loops and counterfactual debugging. These frameworks systematically probe model failures to identify fixable patterns [44]. The bias-variance decomposition theory provides a foundation for optimization strategies, such as addressing high bias by increasing model capacity or feature engineering and combating high variance by using regularization, data augmentation, or ensemble methods [45]. Hyperparameter optimization has advanced from grid search to Bayesian optimization methods that model the response surface. Recent multi-fidelity optimization advances have dramatically reduced computational costs [46].
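A minimal sketch of such sequential model-based hyperparameter search is given below, using Optuna as one possible tool; the search space, metric, and synthetic data are assumptions made for illustration.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)  # synthetic data

def objective(trial):
    # The sampler models past trial results and proposes promising configurations next.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
    }
    model = GradientBoostingClassifier(**params)
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```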
Risk assessments in ML projects focus on specific failure modes that are unique to learning approaches [47]. These failure modes include data and algorithm bias, as well as model vulnerability and security. Data bias can manifest through historical prejudices encoded in the training data, biases in the selection of the data, and biases in the definition of the features. Each of these requires different mitigation strategies, ranging from reweighting to causal debiasing [48]. Algorithmic bias emerges from the inductive biases inherent in learning algorithms, such as the preference for simple hypotheses in regularized models, locality bias in nearest neighbor methods, and hierarchical bias in deep networks. These biases can amplify or counteract data biases in complex ways [49] [50]. Adversarial vulnerabilities expose ML systems’ susceptibility to malicious inputs; imperceptible perturbations can cause catastrophic misclassifications. This calls for defensive strategies ranging from adversarial training to certified robustness [51]. The intersection of these risks creates compound vulnerabilities, such as biased training data enabling targeted adversarial attacks or algorithmic preferences creating predictable failure modes. This requires holistic risk frameworks that consider interactions rather than isolated threats [52].
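To make one of these mitigations concrete, the sketch below computes per-sample weights in the spirit of the reweighing approach of Kamiran and Calders, so that a protected group and the label become statistically independent in the weighted data. The column names are hypothetical, and this is a generic illustration rather than a technique evaluated in the study.

```python
import numpy as np

def reweighing_weights(df, group_col, label_col):
    """Per-row weights that make group membership and label independent in the
    weighted sample (pre-processing bias mitigation sketch)."""
    weights = np.ones(len(df))
    for g in df[group_col].unique():
        for y in df[label_col].unique():
            mask = ((df[group_col] == g) & (df[label_col] == y)).to_numpy()
            expected = (df[group_col] == g).mean() * (df[label_col] == y).mean()
            observed = mask.mean()
            if observed > 0:
                weights[mask] = expected / observed
    return weights

# w = reweighing_weights(train_df, "gender", "label")   # hypothetical DataFrame and columns
# model.fit(X_train, y_train, sample_weight=w)          # most sklearn estimators accept sample_weight
```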
The process dimension is operationalized by key machine learning activities. We distinguish between preprocessing activities, such as data collection, cleaning, and feature engineering, and postprocessing activities, such as algorithm selection, training, evaluation, optimization, and risk assessment. We recognize that these phases require different skill sets and organizational competencies.
The ML Ecosystem dimension represents the sociotechnical infrastructure that transforms ML experiments into production systems. Although ML algorithms comprise only 5% of real-world ML systems, the supporting infrastructure determines whether models provide value or remain costly experiments [53]. This ecosystem must reconcile competing demands: the exploratory nature of ML development, which requires flexibility and rapid iteration, and production requirements, which demand reliability, reproducibility, and governance.
The infrastructure architecture for ML projects has evolved from ad hoc assemblies to sophisticated platforms. The foundational layer integrates various data sources, including operational databases, data warehouses, streaming platforms, and application programming interfaces [54]. The compute layer's specialized hardware, ranging from GPUs to tensor-optimized TPUs, poses challenges for workload scheduling. Security layers address ML-specific concerns, such as differential privacy for training data, defenses against model extraction, and regulatory audit trails [55]. Although cloud platforms offer elastic scaling, preconfigured services, and access to cutting-edge hardware, data gravity, regulatory constraints, and latency requirements require hybrid architectures or even on-premises resources [56] [57]. Cost optimization requires nuanced strategies, such as using spot instances for training and reserved capacity for inference. Multi-cloud strategies are a way to avoid vendor lock-in and balance risk mitigation with operational complexity [58].
The ecosystem of development tools for ML requires integration across statistical computing environments, software engineering toolchains, and domain-specific platforms. Programming language choices embody fundamental trade-offs. For example, there is a trade-off between Python's ecosystem dominance and ease of use and the performance advantages of compiled languages. This often leads to polyglot architectures where Python serves as an orchestration layer [59]. ML frameworks themselves represent different philosophies: TensorFlow's production-first approach with static graphs; PyTorch's research-friendly, dynamic computation; and JAX's functional programming paradigm, which enables novel optimization strategies. Experiment tracking and versioning tools address the reproducibility crisis in ML, capturing not just code, but also hyperparameters, random seeds, and the computational environment [60]. Collaboration platforms bridge the gap between the notebook-based exploration favored by data scientists and the IDE-based development preferred by engineers while enabling peer review [61]. Monitoring and debugging tools face unique ML challenges. For example, detecting data drift, model degradation, and fairness violations requires statistical sophistication that goes beyond traditional application performance monitoring [62].
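As one illustration of such experiment tracking, the sketch below logs parameters, a metric, and an artifact with MLflow, one widely used tool; the experiment name, values, and artifact file are placeholders.

```python
import mlflow

mlflow.set_experiment("churn-model")   # hypothetical experiment name

with mlflow.start_run():
    # Capture what reproducibility otherwise loses: configuration, seeds, and results.
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("random_seed", 42)

    # ... model training would happen here ...

    mlflow.log_metric("val_auc", 0.87)
    mlflow.log_artifact("feature_importance.png")   # hypothetical output file
```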
Production integration is the “almost last mile” problem in the ML project, where promising prototypes often fail to deliver value due to engineering challenges. Technical integration requires careful API design to handle the inherent uncertainty of ML predictions, versioning strategies to enable gradual rollout and rollback, and containerization approaches [63]. Scalability challenges manifest differently throughout the ML lifecycle. Training scalability is achieved through data and model parallelism. Serving scalability is achieved through batching and caching strategies. Feature computation scalability is achieved through incremental processing [64]. Model monitoring involves tracking key metrics and incorporating behavioral drift detection via statistical tests, adversarial input detection via uncertainty quantification, and fairness monitoring across demographic groups [65]. The democratization of ML via low-code platforms and AutoML services promises to increase accessibility while raising concerns about the trade-off between abstraction and control and the potential for creating “black box” infrastructure that mirrors the complexity of the models themselves [66].
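A minimal sketch of statistical drift monitoring follows, assuming two pandas DataFrames of numeric features: a training-time reference set and a recent production window.

```python
from scipy.stats import ks_2samp

def detect_feature_drift(reference, live, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test per feature; flags features whose live
    distribution differs significantly from the training-time reference."""
    drifted = {}
    for column in reference.columns:
        statistic, p_value = ks_2samp(reference[column], live[column])
        if p_value < alpha:
            drifted[column] = {"ks_statistic": statistic, "p_value": p_value}
    return drifted

# alerts = detect_feature_drift(train_features, last_week_features)  # hypothetical DataFrames
```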
The support dimension addresses the "organizational learning" imperative in AI adoption: the recognition that technical excellence alone cannot guarantee ML project success without corresponding organizational capabilities and governance structures [67]. It encompasses managerial and leadership practices that bridge the "deployment gap" between ML potential and realized business value. This gap claims up to 87% of data science projects before production deployment [68].
Project management for ML initiatives requires balancing the exploratory nature of data science with organizational demands for predictability and control. ML projects require hybrid teams that combine domain expertise, statistical knowledge, engineering capabilities, and business acumen [69]. To accommodate ML's inherent unpredictability, development methodologies must adapt agile principles, leading to approaches like "ML Sprint," which embeds experimentation cycles within broader delivery frameworks [70].
For ML project success, organizational leadership demands transformation across multiple dimensions of corporate capability. Stakeholder engagement must address the “AI trust deficit,” which is the intersection of public skepticism about AI’s fairness and transparency and employee fears about automation and job displacement. This requires communication strategies that emphasize augmentation over replacement and transparency over secrecy [71]. Performance measurement must also capture business value. This requires “AI transformation playbooks,” which define success through operational improvements, customer satisfaction, and strategic positioning rather than model accuracy and quantitative performance alone [72]. Organizations must develop “AI fluency,” or the organizational capacity to identify ML opportunities, manage ML projects, and integrate ML outputs into decision-making processes [73].
The intersection of project management and organizational leadership is evident in governance structures that balance innovation with responsibility. ML governance frameworks must address unique challenges that are not present in traditional IT governance. These challenges include model lifecycle management, versioning and deprecation policies, fairness and bias monitoring across protected attributes, explainability requirements for high-stakes decisions, and continuous monitoring for performance degradation [74]. Regulatory frameworks such as the EU’s AI Act introduce external governance requirements. These requirements include risk assessments, transparency measures, and human oversight for high-risk AI applications. This transforms compliance from a technical checklist into a strategic imperative [75].
These business support elements reduce to three fundamental indicators of organizational commitment: whether ML projects receive company-wide recognition (SP1_WELL), strategic prioritization (SP2_PRIO), and adequate financial resources (SP3_FIN). These observable markers reflect the degree to which an organization has internalized and acted upon the complex support requirements described.
Support Dimension
SP1_WELL  ML projects are well recognized across the company.
SP2_PRIO  ML projects have a high priority in the company.
SP3_FIN  The financial budget for the ML project is sufficient.
The following model’s structure reflects the fundamental principle that machine learning projects exhibit unique characteristics requiring specific organizational capabilities and sequential dependencies [7][14]. The model’s structure addresses the fundamental tension in ML projects between exploration and exploitation [76]. Organizational support enables exploration through resource provision, strategy channels this exploration toward specific objectives, processes exploit chosen approaches through technical implementation, and ecosystems institutionalize successful patterns. This progression from exploration to exploitation through increasing specification explains why alternative causal structures would violate the inherent logic of ML development. The positioning of organizational support as the initial causal driver aligns with resource-based view theory, which posits that organizational resources and capabilities determine strategic options [77]. This causal relationship is not merely correlational; experimental evidence shows that variations in organizational support directly influence the quality and scope of data-driven strategic initiatives [78]. The absence of direct paths from support to downstream elements reflects "the analytics gap": resources alone cannot generate value without strategic translation [79].
The central role of strategy as a mediator between support and process implementation finds strong theoretical grounding in the strategic alignment literature. IT strategy serves as a critical bridge between organizational capabilities and technical execution [80]. For ML projects, strategic clarity, operationalized through task definition, data specification, and target setting, determines the entire trajectory of technical development [81].
The causal flow from strategy to process reflects the inherent logic of ML development, where strategic decisions precede technical implementation. ML projects following a strategy-first approach achieve significantly higher success rates than those beginning with technical experimentation [82].
In ML contexts, deployment infrastructure needs to emerge from, rather than determine, technical process decisions [8]. This causal direction is empirically supported by Google’s ML engineering practices, where teams first establish data pipelines and model architectures before designing deployment systems [14].
To test whether this structured approach leads to successful ML projects, we surveyed data scientists. The survey underwent two rounds of pilot testing before being distributed to 3,000 data scientists, yielding 161 responses (a 5% response rate). After cleaning the data, we deemed 150 responses suitable for analysis; these respondents use AI coding assistants on a daily basis (1-2 hours on average). Most of the respondents were employed professionals (67%), followed by students (20%), freelancers (11%), and business owners (9%). Forty percent worked in large enterprises with more than 500 employees, while the rest represented companies with fewer than 20 employees (15%), 20-50 employees (6%), 50-100 employees (8%), or 100-500 employees (13%). Nineteen percent selected N/A.
The participants came from a variety of sectors, including fintech, education, artificial intelligence (AI), online retail, financial services, analytics consulting, and telecommunications. Of these participants, only 14% reported ML projects with real business impact. Twenty-two percent leaned toward business impact, 25% were neutral, and 39% characterized their work as experimental (23% leaned toward experimental and 16% were purely pilot projects). This distribution reflects the range of perspectives on the ML adoption spectrum, from experimental initiatives to production deployments.
Prior to conducting the structural equation modeling analysis, we verified that all fundamental assumptions were satisfied. First, multivariate normality was assessed using Mahalanobis distance with chi-square distribution testing. No outliers were detected, as all p-values exceeded the 0.001 threshold, confirming the multivariate normality assumption. Multicollinearity diagnostics revealed no concerns: all tolerance values exceeded 0.30 and all Variance Inflation Factor (VIF) scores were below 5.0, comfortably within the conventional limits (tolerance above 0.1, VIF below 10), confirming that the SEM assumption of linear independence among the predictor variables is met.
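For reference, checks of this kind can be reproduced roughly as follows in Python; this is a generic sketch of Mahalanobis-distance screening and VIF computation, not the study's analysis script, and the input is assumed to be a numeric matrix or DataFrame of survey item scores.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2
from statsmodels.stats.outliers_influence import variance_inflation_factor

def mahalanobis_outliers(X, alpha=0.001):
    """Flag multivariate outliers whose squared Mahalanobis distance is extreme
    under a chi-square distribution with p degrees of freedom."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distances
    p_values = 1 - chi2.cdf(d2, df=X.shape[1])
    return p_values < alpha                              # True marks a potential outlier

def vif_table(items: pd.DataFrame) -> pd.Series:
    """Variance Inflation Factor per survey item; values well below 5-10 indicate
    acceptable multicollinearity."""
    return pd.Series(
        [variance_inflation_factor(items.values, i) for i in range(items.shape[1])],
        index=items.columns,
    )
```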
Heteroskedasticity was examined through residual plots with loess smoothing, which displayed a straight-line pattern, confirming homoscedasticity (Fig. 2). The correlation matrix demonstrated positive definiteness with a determinant greater than zero, satisfying this critical assumption for SEM estimation. Initial factor loadings revealed that the financial support item (SP3_FIN) exhibited excessive cross-loadings, compromising discriminant validity. Following established procedures, this item was removed from further analysis, resulting in improved model specification. Reliability analysis demonstrated strong internal consistency across all constructs, with Cronbach's alpha values exceeding the 0.7 threshold for all dimensions. For the refined model, the evaluation of model fit assesses how well the theoretical model aligns with the observed data. The Normed Fit Index (NFI) of 0.867 and Relative Fit Index (RFI) of 0.841, slightly below the recommended threshold (≥ 0.90), suggest minor improvements could be made regarding the relative comparison to a baseline model. However, strong incremental fit measures, such as the Incremental Fit Index (IFI = 0.960), Tucker-Lewis Index (TLI = 0.951), and Comparative Fit Index (CFI = 0.959), demonstrate that the hypothesized structural model provides a substantial improvement over a baseline independence model. Moreover, all hypothesized mediator relationships demonstrated statistical significance, with z-values exceeding 1.96. Strategy mediates the relationship between Support and Process, while Process mediates between Strategy and Ecosystem. These serial mediation effects reveal the cascading nature of ML project success factors, where organizational support initiates a chain of effects through strategic clarity, structured processes, and implementation quality, ultimately determining project success.
A. Support -> Strategy (highly significant)
Organizational support significantly influences strategy formulation, aligning with established information systems literature on the importance of organizational backing for IT project success. The significant relationship between organizational support and strategy formulation takes on new dimensions in the era of LLM-assisted development. As GitHub Copilot studies demonstrate, developers using AI coding assistants can achieve 55% faster task completion, but this acceleration requires clear organizational guidelines and strategic frameworks [5]. The support dimension becomes critical for establishing AI usage policies, ethical guidelines, and resource allocation for LLM tools. This aligns with evidence that organizational support structures determine whether AI tools amplify productivity or create confusion [83]. Organizations must now provide support not just for traditional ML projects but also for meta-level decisions about when and how to leverage LLMs in the development process.
The path from strategy to structured preprocessing and postprocessing reveals a paradox in LLM-assisted development. While LLMs can generate preprocessing code rapidly, our findings suggest that strategic clarity remains essential for effective implementation [84]. This supports the observation that developers using Copilot still require clear problem formulation and data understanding, tasks that LLMs cannot fully automate [85].
While LLMs excel at generating boilerplate preprocessing code, AI-generated code often lacks the robustness necessary for production systems [86]. Our findings suggest that structured processes remain crucial even when leveraging LLMs, supporting the view that LLM-assisted development requires new forms of quality assurance and testing [87]. The ecosystem perspective becomes essential as teams must integrate LLM-generated components with existing infrastructure, monitoring systems, and governance frameworks.
The relationship between ecosystem and project success aligns with recent findings on LLM-assisted development outcomes. While LLMs accelerate individual coding tasks, project success depends on systematic integration practices [88]. The ecosystem approach becomes critical when integrating models into specific organizational contexts [89]. Success requires not just using these tools but building comprehensive ecosystems that handle versioning, monitoring, and continuous improvement of both human-written and AI-generated code.
The direct effect of strategy on success, beyond its mediated effects, supports the argument that, in AI-augmented environments, strategic clarity becomes more, not less, important [90]. The direct path suggests that strategy provides value beyond operational efficiency: it shapes how teams conceptualize problems, evaluate LLM suggestions, and make architectural decisions that no current AI system can fully automate.
The negative coefficient likely represents a suppressor effect, where Process only contributes positively to Success through its influence on Ecosystem implementation [91]. This statistical phenomenon occurs when the direct effect has a sign opposite to the total effect, indicating that Process activities consume resources and create complexity that, without proper deployment infrastructure, actually hinders success.
In summary, these findings reveal that LLMs fundamentally alter the “how” but not the “why” of ML project success. While coding assistants can accelerate technical implementation, our model demonstrates that success still flows through organizational support, strategic clarity, and structured processes. This supports the “AI pair programming” paradigm, where LLMs augment rather than replace human expertise [92].
Several limitations of this study should be considered when interpreting the results. First, while the sample size of 150 respondents is adequate for SEM analysis, it may limit the generalizability of the findings to different organizational contexts and industries. The 5% response rate raises concerns about non-response bias because those who completed the survey may be data scientists with specific characteristics or experiences. Second, despite the theoretical grounding of our model, the cross-sectional design prevents causal inference; longitudinal studies would better capture the dynamic nature of ML project evolution. Third, the self-reported nature of the data may introduce common method bias, though our multi-factor structure and discriminant validity tests mitigate this concern to some extent. Fourth, our focus on data scientists may not represent all roles in machine learning project contexts. Finally, removing the financial support item (SP3_FIN) due to cross-loadings suggests that our operationalization of the support dimension may not capture all relevant organizational factors, especially those related to resource allocation.
This research addresses the persistent challenge of ML project failure by developing and validating a comprehensive framework that acknowledges the technical and organizational dimensions of success. The MLC adapts business model thinking to the specific context of ML development. It provides a structured approach that remains relevant despite the acceleration enabled by LLM coding assistants. Our empirical findings show that LLMs may revolutionize the "how" of ML development through rapid code generation. However, the fundamental success factors (strategic alignment, structured processes, robust ecosystems, and organizational support) remain unchanged. Our SEM analysis revealed cascading relationships that offer practical insights: organizations cannot simply adopt AI coding tools and expect improved outcomes. Instead, they must build comprehensive capabilities across all four dimensions. Thus, the MLC framework serves both as a diagnostic tool for identifying weaknesses in current ML initiatives and as a planning instrument for future projects, helping organizations navigate the intersection of human expertise and AI assistance in modern ML development.