OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video

Selim Gilon 1*, Emily Y. Miller 1, Scott D. Uhlrich 1,2

1 Department of Mechanical Engineering, University of Utah, Salt Lake City, 84112, United States
2 Department of Orthopaedic Surgery, University of Utah, Salt Lake City, 84112, United States

* selim.gilon@utah.edu

Abstract

Quantifying human movement (kinematics) and musculoskeletal forces (kinetics) at scale, such as estimating quadriceps force during a sit-to-stand movement, could transform the prediction, treatment, and monitoring of mobility-related conditions. However, quantifying kinematics and kinetics traditionally requires costly, time-intensive analysis in a specialized laboratory, limiting clinical translation. Scalable, accurate tools for biomechanical assessment are critically needed. We introduce OpenCap Monocular, an algorithm that estimates 3D skeletal kinematics and kinetics from a single static smartphone video. The method refines 3D human pose estimates from a monocular pose estimation model from computer vision (WHAM) via optimization, computes the kinematics of a biomechanically constrained skeletal model, and estimates kinetics via physics-based simulation and machine learning. We validated OpenCap Monocular against marker-based motion capture and force plate data for walking, squatting, and sit-to-stand tasks. OpenCap Monocular achieved low kinematic error (4.8° mean absolute error [MAE] for rotational degrees of freedom; 3.4 cm MAE for pelvis translations), outperforming a regression-only computer vision baseline by 48% in rotational accuracy (p = 0.036) and 69% in translational accuracy (p < 0.001). OpenCap Monocular also estimated ground reaction forces during walking with accuracy comparable to, or better than, that of our prior two-camera OpenCap system. We demonstrate that the algorithm estimates important kinetic outcomes with a clinically meaningful level of accuracy in applications related to frailty and knee osteoarthritis, including estimating the knee extension moment during sit-to-stand transitions and the knee adduction moment during walking. OpenCap Monocular is deployed via a smartphone app, a web app, and secure cloud computing (https://opencap.ai), enabling free, accessible single-smartphone biomechanical assessments. Such accessibility enables large-scale remote studies and, ultimately, routine evaluations of mobility and function in the clinic or at home. Our code is available at github.com/utahmobl/opencap-monocular.

Author Summary

The ability to easily measure human movement and musculoskeletal forces has the potential to improve the treatment of movement-related disorders. However, precise biomechanical analysis has traditionally required costly, time-consuming laboratory analyses, limiting its impact on clinical practice. To address this gap, we developed OpenCap Monocular, an open-source tool that estimates 3D skeletal motion and musculoskeletal forces using video recorded with a single smartphone. Our algorithm refines motion estimates from computer vision models, which often contain physically implausible artifacts such as foot sliding, by integrating physics-based modeling with machine learning to generate physically and biomechanically consistent motion and force estimates.
The system estimates 3D human motion more accurately than a single-camera computer vision model alone and achieves accuracy comparable to more cumbersome two-camera setups. Its force estimates achieve clinically meaningful accuracy, capturing knee loading metrics related to osteoarthritis and muscle force patterns linked to age-related declines in physical function. By packaging this complex pipeline into a free, automated cloud application, OpenCap Monocular enables clinicians and researchers to conduct precise, large-scale movement studies in clinics and homes using equipment they already carry in their pockets.

1 Introduction

Quantitative analysis of human movement provides critical information across fields such as rehabilitation, sports science, ergonomics, and the treatment of musculoskeletal and neuromuscular disorders. Measures of movement kinematics (e.g., joint angles, velocities) and kinetics (e.g., joint moments, ground reaction forces, muscle forces) can predict the risk of injury and disease progression, track functional recovery, and evaluate the efficacy of interventions [1-7].

Traditionally, the gold standard for high-fidelity movement analysis has been laboratory-based motion capture [8]. This approach uses specialized cameras to track reflective markers on the body and force plates to measure ground reaction forces. While accurate, this method requires expensive equipment (often >$150,000), dedicated laboratory space, specialized expertise for data collection and processing, and considerable time investment (often hours to days per participant) [9]. Thus, it is rarely used clinically. Although widely used in research, the limited scalability of motion capture has constrained our ability to study movement in large cohorts in clinical, community, or home settings (Fig 1). Most biomechanics studies are limited to laboratory settings and include a median of only 12 to 21 participants [10, 11].

Mobile sensing techniques for measuring human movement can address some of these challenges, but barriers to scalability and accuracy remain. Inertial measurement units can estimate kinematics outside the lab, but they require donning and doffing up to 15 sensors to estimate whole-body kinematics and kinetics [12-15]. Wearable sensors are well suited for long-term monitoring of specific biomechanical variables [13, 16, 17], but they are impractical for rapid, routine assessments of whole-body movement in large-scale, decentralized studies or in clinical practice (Fig 1).

Fig 1. OpenCap Monocular Enables Scalable Evaluation of 3D Human Motion and Musculoskeletal Dynamics. (A) Traditional lab-based motion capture provides valuable, high-fidelity biomechanical assessments, but it is costly and time-consuming. Clinical assessments of function, such as timed functional tests, fail to capture the nuances of full-body biomechanics. OpenCap Monocular addresses the need for fast, scalable, and accurate tools to quantify whole-body motion. This software enables 3D biomechanical assessments in large-scale, ecologically valid studies and supports integration into routine clinical practice. (B) OpenCap Monocular enables 3D assessment of kinematics and kinetics with a single smartphone. The pipeline is freely available through our mobile and web applications and secure cloud processing infrastructure.
Video-based approaches, leveraging advancements in computer vision and deep learning for human pose estimation [18-21], are a promising avenue for large-scale, rapid measurement of whole-body motion due to the ubiquity of smartphone cameras [22]. We previously developed OpenCap, an open-source platform that quantifies 3D kinematics and musculoskeletal dynamics from two or more smartphone videos [9]. Multi-camera OpenCap enables motion to be measured in approximately 10 minutes with equipment that costs <$1,000. We deployed the open-source software using cloud computing and a freely available web application. Deploying these algorithms in an easy-to-use application has enabled 14,000 researchers to collect 400,000 motion trials in the 3 years since its release. Multi-camera video-based systems like OpenCap have enabled large-scale studies of movement in more ecologically valid settings (e.g., in campus gymnasiums or at patient advocacy conferences) that would have been infeasible with the lab-based approach [23, 24]. However, even the requirement of multiple calibrated, tripod-mounted smartphones plus a laptop is a barrier in specific contexts, particularly for regular clinical or in-home assessments. The ability to perform accurate 3D motion capture and dynamic analysis from a single smartphone video would represent a significant improvement in accessibility, potentially empowering billions of smartphone users worldwide with tools for quantitative movement assessment.

Here, we developed OpenCap Monocular, an open-source, cloud-deployed algorithm for estimating 3D skeletal kinematics and kinetics from a single static smartphone video (Fig 1). We present validation against gold-standard marker-based motion capture and force plate measurements. We hypothesized that OpenCap Monocular would result in lower kinematic and kinetic errors compared to directly applying inverse kinematics to computer vision model outputs (i.e., CV+IK). We also compare accuracy to the original two-camera OpenCap platform [9]. We then evaluate OpenCap Monocular's utility for two downstream clinical tasks. First, we analyze the kinetics of a sit-to-stand transition, an activity that reflects age-related reductions in quadriceps strength [25-27]. We hypothesize that OpenCap Monocular can detect a redistribution of lower-extremity joint moments from the knee to the ankle and the hip during chair rise with a quadriceps-avoidance strategy. We further test whether errors in OpenCap Monocular's knee extension moment estimates fall below a clinically meaningful threshold of 11 Nm, the difference observed between individuals with and without early signs of frailty (pre-frailty) [28]. In a second clinical use case, we evaluate OpenCap Monocular's accuracy in estimating the knee adduction moment during walking, a key dynamic metric of knee loading associated with medial compartment knee osteoarthritis progression [29, 30]. We hypothesize that OpenCap Monocular can estimate the knee adduction moment with errors below a clinically meaningful threshold of 0.5% bodyweight (BW) · height (ht) [29, 30], supporting its use for predicting progression and evaluating joint-offloading interventions.

2 Methods

2.1 Data Collection with OpenCap Monocular

Setting up an OpenCap Monocular recording takes less than one minute and requires minimal equipment: one iPhone or iPad and a tripod. No markers, calibration frames, or force plates are needed.
After logging in to our HIPAA-compliant web application, users can either (1) control data collection directly on the tripod-mounted iOS device that is recording video or (2) use a separate internet-connected device (e.g., a laptop) to control the recording iOS device. The captured video is automatically uploaded to the cloud for processing, where kinematics are computed in under two minutes for a 10-second video. Results can be visualized directly in the web application or downloaded through the application programming interface for further analysis (e.g., kinetics). The iOS app and web platform are freely available at https://opencap.ai/.

2.2 OpenCap Monocular Pipeline: Static Video to Kinetics

Fig 2 illustrates the five core processing steps: initial 3D pose estimation, pose refinement optimization, marker extraction from SMPL vertices, inverse kinematics, and estimation of musculoskeletal dynamics. The first four steps are automated in the cloud. For walking trials, the fifth step, estimation of ground reaction forces, is available via a machine learning model [31, 32]. For all trial types, we provide code that performs this step and the musculoskeletal dynamics pipeline offline in post-processing [9, 33].

Fig 2. OpenCap Monocular Algorithm. OpenCap Monocular estimates 3D global kinematics and kinetics from a single, static smartphone video. (1) Computer vision models, ViTPose [34] and WHAM [35], estimate 2D keypoints and an initial 3D human global pose, represented by a sequence of SMPL model parameters [36]. (2) This initial pose sequence (top, red skeleton) often contains physical inaccuracies like translational drift and foot-floor penetration. To correct this, we apply a pose-refinement optimization that minimizes reprojection error, foot sliding/penetration, and excessive joint velocity. The output is a more physically plausible, optimized pose sequence (bottom, green skeleton). (3) A set of virtual skin markers is extracted from the vertices of the refined SMPL mesh and (4) tracked with OpenSim Inverse Kinematics [37] to obtain 3D joint kinematics. (5) Physics-based and machine learning algorithms are used to estimate kinetics (e.g., ground reaction and muscle forces) from the monocular kinematics, without the need for force plates [9, 31, 32].

2.2.1 Initial 3D Pose Estimation

We first use WHAM [35] to estimate the global 3D human pose, which is a sequence of SMPL [36] model parameters, including body shape (β₀), body pose (θ₀), global translation (τ₀), and global orientation (Γ₀). WHAM also provides an estimate of the camera's extrinsic parameters, ξ. Additionally, we use ViTPose [34] to estimate 2D keypoint locations and confidence scores for both the WHAM pipeline and subsequent optimization steps. WHAM also provides ground contact probabilities for the heel and toe, which are used to guide our subsequent refinement step. While WHAM provides a strong initial estimate of the 3D motion, it can suffer from inaccuracies such as translational drift and physically implausible foot-floor interactions (e.g., sliding, penetration), which motivate our pose refinement step.
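For orientation, the initial estimate can be thought of as a small set of arrays over the T video frames. The sketch below is an illustrative PyTorch container for these quantities; the names and shapes follow standard SMPL conventions and are an assumption for exposition, not the deployed code's internal layout.

```python
import torch

T = 300  # number of video frames (illustrative)

# Quantities produced by the initial estimation step and refined later.
initial_estimate = {
    "beta": torch.zeros(10),         # body shape, constant over the trial
    "theta": torch.zeros(T, 23, 3),  # SMPL body pose (axis-angle per joint)
    "tau": torch.zeros(T, 3),        # global root translation
    "gamma": torch.zeros(T, 3),      # global root orientation (axis-angle)
    "extrinsics": torch.eye(4),      # camera pose estimate from WHAM
    "contact": torch.zeros(T, 4),    # heel/toe ground-contact probabilities
}
```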
2.2.2 Pose Refinement Optimization

To improve the physical plausibility of the initial pose estimates, we implemented a two-stage optimization procedure that refines camera extrinsic parameters and SMPL pose and shape parameters, ensuring consistency with observed 2D keypoints and the physical constraints of human movement. The optimization problem is formulated in PyTorch, enabling automatic differentiation and GPU acceleration, and both stages optimize all design variables simultaneously over the whole sequence. We assume that the camera is not moving, that the body shape is constant over time, and that the individual's height is known (it is queried during recording in the web application). For videos captured with our iOS application, camera intrinsics are known a priori via a database of intrinsic parameters for all iOS devices released since 2018.

In the first optimization stage, we solve for the body shape (β) and camera extrinsic (ξ) parameters while holding the global pose (θ₀, τ₀, Γ₀) constant. The objective function is described in Eq (1):

$$J_{\text{stage1}} = w_r L_{\text{repr}} + w_h L_{\text{height}} + w_\beta L_\beta \tag{1}$$

where L_repr is the confidence-weighted 2D reprojection error:

$$L_{\text{repr}}(\xi, K) = \sum_{i=1}^{N} w_i \left\lVert \Pi(X^{3D}_i; K, \xi) - x^{2D}_i \right\rVert_2^2 \tag{2}$$

where N is the number of keypoints, w_i is the confidence weight of keypoint i, X³ᴰᵢ is the 3D position of keypoint i, Π(X³ᴰᵢ; K, ξ) denotes the projection of the 3D point into image coordinates using the camera intrinsics matrix K and extrinsic parameters ξ (rotation and translation), and x²ᴰᵢ is the observed 2D keypoint in the image. $L_{\text{height}} = (\hat{h} - h)^2$ penalizes deviations from the individual's known height ĥ, and L_β penalizes deviations of the body-shape parameters from the initial WHAM estimate β₀:

$$L_\beta = \sum_{j=1}^{D} \left(\beta_j - \beta_{0,j}\right)^2 \tag{3}$$

where D is the dimensionality of the body-shape parameter vector (10), β_j is the j-th body-shape parameter, and β₀,ⱼ is the corresponding WHAM estimate for parameter j.

In the second optimization stage, the global pose (θ, τ, Γ) and camera extrinsics are refined while the body shape (β) is held constant. The objective function is described in Eq (4):

$$J_{\text{stage2}} = w_r L_{\text{repr}} + w_c L_{\text{cam}} + w_v L_{\text{foot vel}} + w_s L_{\text{foot slide}} + w_f L_{\text{flat}} + w_{sm} L_{\text{smooth}} \tag{4}$$

where:

• L_cam penalizes deviations in camera extrinsics from stage 1,
• L_foot vel penalizes non-zero velocities of heel and toe markers during contact,
• L_foot slide penalizes movement (variance in position) of the heel and toe during bouts of continuous contact,
• L_flat enforces a consistent vertical position of heel and toe markers during contact events across the sequence,
• L_smooth penalizes joint linear velocity, encouraging smoothness.

Mathematical expressions for each term in the second-stage objective function are provided in our open-source implementation for full reproducibility. We tuned the weights of the second optimization stage using previously published OpenCap datasets [9], based on 3D marker errors relative to marker-based motion capture and qualitative inspection of movement plausibility. At the time of publication, the deployed pipeline uses separate, activity-specific weight sets for gait, squatting, sit-to-stand, and other movements.
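To make the structure of these objectives concrete, the sketch below shows how the reprojection term (Eq 2) and soft versions of the stage-2 foot-contact penalties could be written in PyTorch. It is a minimal illustration under simplifying assumptions, not the deployed code: the pinhole projection and tensor names are illustrative, contact probabilities are used as soft weights, and the real foot-sliding term operates per bout of continuous contact rather than over the whole sequence.

```python
import torch

def reprojection_loss(X3d, x2d, conf, K, R, t):
    """Confidence-weighted 2D reprojection error (Eq 2).

    X3d: (T, N, 3) model keypoints in world coordinates; x2d: (T, N, 2)
    detected 2D keypoints; conf: (T, N) ViTPose confidences; K: (3, 3)
    camera intrinsics; R: (3, 3) and t: (3,) camera extrinsics.
    """
    X_cam = X3d @ R.T + t                    # world -> camera coordinates
    uvw = X_cam @ K.T                        # pinhole projection
    uv = uvw[..., :2] / uvw[..., 2:3]        # perspective divide
    return (conf * ((uv - x2d) ** 2).sum(dim=-1)).sum()

def foot_contact_losses(foot_pos, contact):
    """Soft stand-ins for the stage-2 foot-velocity and foot-sliding terms.

    foot_pos: (T, M, 3) heel/toe positions; contact: (T, M) ground-contact
    probabilities, used here as weights instead of hard contact bouts.
    """
    vel = foot_pos[1:] - foot_pos[:-1]                   # finite differences
    w = torch.minimum(contact[1:], contact[:-1])
    l_foot_vel = (w * vel.pow(2).sum(dim=-1)).sum()      # still feet in contact
    mean_pos = (contact[..., None] * foot_pos).sum(0) / (
        contact.sum(0)[..., None] + 1e-8)                # contact-weighted mean
    l_foot_slide = (contact * (foot_pos - mean_pos)
                    .pow(2).sum(dim=-1)).sum()           # position variance
    return l_foot_vel, l_foot_slide
```

In both stages, the weighted terms are summed into a scalar objective and minimized with a gradient-based optimizer such as torch.optim.Adam, with stage 1 updating (β, ξ) and stage 2 updating (θ, τ, Γ) and ξ over the full sequence simultaneously.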
To automatically select the appropriate parameter set at run time, we employ a video-understanding foundation model (VideoLLaMA 3 [38]) to classify the activity being performed. This design improves robustness across diverse recording environments and enables extensible activity classification without manual intervention or additional optimization. The parameters used are listed in Supplementary Table S1.

2.2.3 Marker Extraction

For downstream kinematic and kinetic processing using a biomechanical model in OpenSim, we extract 38 virtual surface marker positions as vertices of the SMPL mesh. These markers characterize the motion of the forearm, upper arm, torso, pelvis, thigh, shank, and foot segments. We extract a 'static' set of markers using a default standing SMPL pose to scale a biomechanical model. Then, we extract marker trajectories during the motion sequence using the optimized global pose from the second stage (θ_opt2, τ_opt2, Γ_opt2).

2.2.4 Inverse Kinematics of a Musculoskeletal Model

We use a musculoskeletal model [9, 37, 39, 40] with 33 degrees of freedom (6 for the pelvis in the ground, 3 for the lumbar, 3 for each hip, 1 for each knee, 2 for each ankle, 3 for each shoulder, 2 for each elbow, and 1 metatarsophalangeal joint per foot [unlocked for physics simulation only]). Importantly, unlike the SMPL model, the joints in this constrained OpenSim model can only move in biomechanically plausible ways. For example, whereas the SMPL model represents the knee with three rotational degrees of freedom, our biomechanical model includes six axes of motion, with five constrained as spline functions of the single flexion-extension degree of freedom. For each motion trial, we scale the musculoskeletal model to the individual's anthropometry using the OpenSim Scale tool [37] with markers extracted from the static-posed SMPL model. We then solve for the kinematics of this scaled model using the marker trajectories during the motion and the Inverse Kinematics (IK) tool. The resulting monocular kinematics are trajectories of biomechanically plausible pelvis translations and joint kinematics.
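For users running these steps offline, the scaling and IK operations correspond to the standard OpenSim 4.x Python bindings. The following is a minimal sketch of that correspondence; the setup XML and marker file names are placeholders standing in for the configuration the pipeline generates, not its actual file names.

```python
import opensim as osim

# Scale the generic musculoskeletal model to the participant's anthropometry
# using the virtual markers extracted from the static-posed SMPL mesh.
# "scale_setup.xml" is a placeholder for a generated setup file.
osim.ScaleTool("scale_setup.xml").run()

# Track the virtual marker trajectories from the motion sequence to obtain
# biomechanically constrained pelvis translations and joint angles.
ik_tool = osim.InverseKinematicsTool("ik_setup.xml")
ik_tool.setMarkerDataFileName("smpl_virtual_markers.trc")
ik_tool.setOutputMotionFileName("monocular_kinematics.mot")
ik_tool.run()
```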
2.2.5 Musculoskeletal Dynamics from Monocular Kinematics

We employ two different approaches to estimate musculoskeletal dynamics (i.e., kinetics) from the monocular kinematics: physics-based simulation [9, 33] and machine learning [31]. The physics-based approach uses the simulation methods described in Uhlrich, Falisse, and Kidziński et al. (2023) [9]. Briefly, we estimate musculoskeletal dynamics (ground reaction forces, joint moments, and muscle forces) using a torque- or muscle-driven dynamic simulation that tracks the monocular kinematics, without the need for experimental force plate data. We model foot-floor contact using six smooth Hunt-Crossley contact spheres per foot [41, 42]. The simulation is posed as an optimal control problem, formulated using direct collocation with CasADi [43], and solved using IPOPT with algorithmic differentiation (OpenSimAD) for gradient computation [33, 44]. The optimization solves for kinematics, muscle excitations, and torque actuator controls that track the monocular kinematics while minimizing the sum-squared muscle activations, subject to constraints on muscle and skeletal dynamics. The resulting kinematics and kinetics closely track the input motion, but are dynamically consistent (i.e., no residual forces or moments at the ground-pelvis joint) and could plausibly be generated by muscles. Here, we use muscle-driven simulations to obtain dynamics for the sit-to-stand activity.

Since the release of the original multi-camera OpenCap platform [9], a machine learning model that predicts ground reaction forces during gait from kinematics has been developed (GaitDynamics) [31]. This model improves the speed and accuracy of ground force prediction from kinematics during walking compared to physics-based simulations alone. To estimate walking kinetics in this study, we use GaitDynamics to predict ground forces, and we track these predictions in a physics simulation to estimate joint moments [32].
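The contact model underlying the simulated ground reaction forces follows a Hunt-Crossley law. The function below is a textbook form of that law for a single sphere, shown only for intuition; the deployed pipeline uses the smoothed variant implemented in OpenSimAD [33, 41, 42], with a different parameterization, and the stiffness and dissipation values here are illustrative.

```python
def hunt_crossley_normal_force(depth, depth_rate,
                               stiffness=1.0e6, dissipation=2.0):
    """Textbook Hunt-Crossley normal force for one contact sphere.

    depth: penetration of the sphere into the floor (m); depth_rate: rate
    of penetration (m/s). Parameters are illustrative placeholders.
    """
    if depth <= 0.0:
        return 0.0                                  # sphere not in contact
    elastic = stiffness * depth ** 1.5              # Hertzian elastic term
    force = elastic * (1.0 + 1.5 * dissipation * depth_rate)
    return max(0.0, force)                          # contact cannot pull down
```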
2.3 Secure Cloud Deployment

The OpenCap Monocular pipeline is integrated into the OpenCap web and iOS applications (Fig. S2). Videos uploaded to the cloud are queued and processed by GPU servers. A 10-second video takes less than two minutes to process using an NVIDIA RTX 4090 GPU. Once processed, the results can be visualized, further analyzed (e.g., automated gait analysis), and downloaded from the web application. Alternatively, data can be downloaded programmatically for further processing (e.g., estimating kinetics) using our application programming interface and post-processing software. The codebase is open source, and cloud processing is provided free of charge to the research community.

2.4 Validation Protocol

To validate our method, we used the publicly available dataset of synchronized video, marker-based motion capture, and force plate recordings from the multi-camera OpenCap study [9]; full experimental details are provided in the prior publication. Briefly, ten healthy adults (5 female; age 26 ± 4 years; mass 74 ± 8 kg) performed several activities, including level walking, five bodyweight squats, and five sit-to-stand transitions. To simulate movement patterns relevant to clinical populations, participants were also instructed to perform modified versions of these tasks, such as squats while offloading one foot and sit-to-stand transitions with increased trunk flexion and angular velocity to simulate a strategy commonly observed in older adults with quadriceps weakness [25, 26]. The dataset also includes variations of gait, in which participants were instructed to walk with a trunk-sway modification to emulate compensatory movement patterns. Thus, our kinematic comparison for each individual comprises 10 squats, 10 sit-to-stand transitions, and 6 walking trials, with varied kinematic patterns for each activity. Participants whose faces are visible in images and videos in this article (e.g., Supplementary Video and Fig. S2) have provided informed consent to the sharing of identifiable video data, through a protocol approved by the University of Utah Institutional Review Board.

For the OpenCap Monocular analysis, we used the 45° anterolateral camera view from the original multi-camera OpenCap recordings. The frontal-only and sagittal-only camera views produced visually worse 3D monocular kinematics due to prolonged segment occlusion and a lack of information in an entire plane of movement.

We compared OpenCap Monocular kinematics with marker-based motion capture, a computer vision baseline (CV+IK), and the previously published two-camera OpenCap algorithm. The CV+IK baseline used the SMPL model prediction directly from WHAM, bypassed the pose refinement optimization step (Section 2.2.2), and proceeded to the marker extraction and IK steps (Sections 2.2.3 and 2.2.4). We computed kinematic accuracy as a mean absolute error (MAE) across three translational degrees of freedom (pelvis translations) and 18 rotational degrees of freedom: ankles (4), knees (2), hips (6), pelvis global orientation (3), and lumbar (3). We compared all-activity kinematic errors between the computer vision baseline and OpenCap Monocular using paired t-tests (α = 0.05, n = 10). Ground reaction force accuracy was quantified as the MAE during the stance phase for each directional component, normalized by body weight. Walking ground reaction force errors were compared between methods using paired t-tests (α = 0.05, n = 10). All statistical analyses were conducted in Python (v3.9.21) using SciPy (v1.11.4).
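As a concrete illustration of this comparison, the snippet below computes a per-participant rotational MAE and runs the paired t-test with SciPy. The arrays are randomly generated stand-ins for the study's per-participant errors, not the reported data.

```python
import numpy as np
from scipy import stats

def rotational_mae(estimated, reference):
    """MAE over time and the 18 rotational degrees of freedom (degrees).

    estimated, reference: (T, 18) joint-angle trajectories.
    """
    return np.mean(np.abs(estimated - reference))

# Stand-in per-participant MAEs (n = 10); real values come from the trials.
rng = np.random.default_rng(0)
mae_cv_ik = rng.normal(9.3, 2.0, size=10)
mae_monocular = rng.normal(4.8, 1.0, size=10)

# Paired t-test across participants (alpha = 0.05).
t_stat, p_value = stats.ttest_rel(mae_cv_ik, mae_monocular)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```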
2.5 Clinical Use Case 1: Joint Moments during Chair Rise

To demonstrate the clinical utility of OpenCap Monocular, we analyzed the sit-to-stand (STS) movement, a fundamental functional assessment in clinical practice. Rising strategies vary with age and are associated with distinct muscle force requirements [45]. Older adults often increase trunk flexion when rising from a chair, shifting muscular demand from the knee extensors to the hip extensors and ankle plantarflexors [27]. This compensatory strategy is associated with reduced functional strength [25] and an increased risk of falls [26]. We used the OpenCap validation dataset [9], in which 10 healthy individuals performed sit-to-stand transitions under two conditions: 5 repetitions using their natural strategy, followed by 5 repetitions with deliberately increased trunk flexion. Joint moment values were averaged over the rising phases of the three central repetitions. We first evaluated the accuracy of the knee extension moment, averaged over the rising phase, a proxy for quadriceps force. We compared this accuracy to an 11 Nm clinically relevant accuracy threshold, as this is the average difference in moment between adults with and without early frailty [28]. We also performed one-sample t-tests on the changes in knee, hip, and ankle moments between the natural and trunk-flexion conditions to test whether OpenCap Monocular could detect group changes in joint moments similarly to motion capture and force plates.

2.6 Clinical Use Case 2: Knee Loading during Walking

To further demonstrate the clinical relevance of OpenCap Monocular, we evaluated the accuracy of the knee adduction moment (KAM), a loading metric associated with the onset and progression of medial compartment knee osteoarthritis [29, 30]. Elevated KAM values during walking indicate increased medial compartment loading, which accelerates cartilage degeneration [29, 46]. Accurate, scalable estimation of the KAM from smartphone videos could therefore enable remote assessment of knee joint loading in large populations and facilitate early detection, monitoring, and personalized rehabilitation for medial knee osteoarthritis [7]. We used walking trials from the OpenCap validation dataset [9]. The KAM was computed from monocular kinematics using a hybrid pipeline that combines physics-based and machine-learning approaches [31, 32]. We compared these estimates to gold-standard inverse dynamics derived from motion capture and force plates. We focused on the first peak of the KAM during the stance phase, due to its relationship to disease progression [29]. We computed the MAE of the first-peak KAM, normalized to body weight and height, and compared it against a clinically meaningful threshold of 0.5% BW·ht. This threshold has been shown to differentiate individuals with slow versus rapid medial knee osteoarthritis progression [29, 30, 47-49] and represents the lower bound of the clinically relevant range (0.5-2.2% BW·ht) used for OA diagnosis and progression risk assessment.
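To make the outcome metric concrete, the sketch below extracts a first-peak KAM normalized to %BW·ht and compares a video-based peak against a lab-based peak relative to the 0.5% BW·ht threshold. Taking the maximum over early stance is a simplification of the study's peak detection, and the waveform is synthetic.

```python
import numpy as np

def first_peak_kam(kam_stance, mass_kg, height_m):
    """First-peak knee adduction moment in %BW*ht.

    kam_stance: (T,) KAM in N*m over the stance phase. The first peak is
    approximated as the maximum over the first half of stance.
    """
    bw_ht = mass_kg * 9.81 * height_m               # normalization, N*m
    return 100.0 * np.max(kam_stance[: len(kam_stance) // 2]) / bw_ht

# Synthetic two-peaked stance waveform for demonstration only.
t = np.linspace(0.0, 1.0, 101)
kam = 35.0 * np.sin(np.pi * t) * (1.0 + 0.3 * np.cos(2.0 * np.pi * t))

peak_video = first_peak_kam(kam, mass_kg=74.0, height_m=1.75)
peak_lab = first_peak_kam(0.95 * kam, mass_kg=74.0, height_m=1.75)
print(f"first-peak KAM error: {abs(peak_video - peak_lab):.2f} %BW*ht "
      f"(clinical threshold: 0.5)")
```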
3 Results

3.1 Kinematic Accuracy

Across all activities, OpenCap Monocular had an MAE of 4.8° for rotational kinematics and 3.4 cm for translational kinematics, compared to marker-based motion capture. These errors are 4.5° (48%, p = 0.036) and 7.6 cm (69%, p < 0.001) lower than the computer vision baseline (CV+IK), highlighting the importance of the pose refinement step. OpenCap Monocular's rotational accuracy was within 1° of two-camera OpenCap, and translational accuracy within 2 cm (Fig 3). Rotational accuracy by degree of freedom is provided in Fig. S1.

Fig 3. Kinematic Accuracy. The mean (bar) and standard deviation (error bar) of mean absolute errors (MAE) in kinematics across activities (STS stands for sit-to-stand), compared to marker-based motion capture. * indicates p < 0.05. Compared to the computer vision baseline model, OpenCap Monocular demonstrated (A) 48% lower errors across 18 rotational degrees of freedom (p = 0.036) and (B) 69% lower errors across three pelvic translational degrees of freedom (p < 0.001), averaged across activities.

In addition to improving accuracy (Fig 3), the pose refinement optimization step in OpenCap Monocular reduced translational drift (Fig 4). Translations from CV+IK often drifted over time. For example, after five repetitions of the sit-to-stand, the CV+IK pelvis drifted by an average of 56.9 cm, whereas our refined pose remained more stable, with an average pelvis translational error after five repetitions of 4.9 cm (Fig 4).

Fig 4. Impact of Pose Refinement on Translational Drift. The mean (line) and standard deviation (shading) of pelvis translational drift (Euclidean distance between the estimated pelvis position and marker-based motion capture) over five sit-to-stand repetitions. All pelvis origins were aligned at the initial time point. OpenCap Monocular drifted an order of magnitude less than the computer vision plus inverse kinematics baseline (CV+IK) but still more than the two-camera OpenCap approach, which can compute depth analytically. Representative skeletal kinematics are shown during the first and fifth repetitions for marker-based motion capture (white), OpenCap Monocular (blue), and CV+IK (red).

3.2 Kinetic Accuracy

OpenCap Monocular with the GaitDynamics model estimated ground reaction forces with an MAE of 9.7% BW compared to force plate measurements. This outperforms the CV+IK baseline of 13.6% BW (a 58% improvement in the vertical direction, p = 0.002; Fig 5). Although not statistically tested, OpenCap Monocular yields slightly lower vertical ground reaction force errors than estimates obtained from two-camera OpenCap kinematics, using either physics-based simulation or the GaitDynamics model (12.2-13.5% BW).

Fig 5. Ground Reaction Force Accuracy. The mean (bar) and standard deviation (error bar) of mean absolute errors (MAE) in ground reaction forces during walking compared to force plates. OpenCap Monocular kinematics coupled with the GaitDynamics [31] machine learning (ML) model estimated ground reaction forces more accurately (vertical: p = 0.002; mediolateral: p = 0.002; anteroposterior: p = 0.065) than the baseline computer vision model (CV+IK) and GaitDynamics (* indicates p < 0.05). We also compare to forces derived from two-camera OpenCap kinematics with either physics-based simulation [9] or GaitDynamics (ML). Interestingly, OpenCap Monocular + ML yielded slightly lower vertical force errors than either two-camera approach, despite using only one camera, potentially due to improved vertical center-of-mass kinematics from OpenCap Monocular's pose refinement step.

3.3 Clinical Use Case 1: Joint Moments during Chair Rise

OpenCap Monocular detected the redistribution of lower-extremity joint moments from a normal to an exaggerated trunk-lean sit-to-stand movement. It detected a reduction in knee extension moment (p = 0.015), an increase in hip extension moment (p = 0.003), and an increase in ankle plantarflexion moment (p = 0.044). Importantly, the direction of these changes matched inverse dynamics analysis using gold-standard motion capture and force-plate systems (p = 0.027, p = 0.013, and p = 0.004, respectively). Across all sit-to-stand trials, the rising phase-averaged knee extension moment estimated by OpenCap Monocular showed strong agreement with motion capture (r² = 0.64), with an MAE of 5.8 Nm, which is below the 11 Nm clinically meaningful threshold related to pre-frailty [28].

Fig 6. Clinical Use Case 1: Detecting Joint Moment Differences during a Quadriceps-Avoidance Sit-to-Stand Transition. Ten participants completed the Five Times Sit-to-Stand test naturally and with instruction to increase their trunk flexion angle and angular velocity during lift-off, a compensatory strategy often used by individuals with quadriceps weakness to shift demand from the knee extensors (quadriceps) to the hip extensors and ankle plantarflexors. (A) Changes (mean ± standard deviation) in lower-extremity joint moments, averaged over the standing phase, from the natural to the trunk flexion condition, normalized to bodyweight (BW) and height (ht). OpenCap Monocular detected the expected reduction in knee extension moment and increase in hip and ankle moments (p = 0.015-0.044), similar to motion capture and force plates (p = 0.004-0.027). * indicates p < 0.05. (B) OpenCap Monocular estimated the rising phase-averaged knee extension moment with 5.8 Nm of mean absolute error (MAE), compared to motion capture and force plates. This falls below an 11 Nm clinically meaningful threshold that differentiates older adults with and without early signs of frailty [28].
3.4 Clinical Use Case 2: Knee Loading during Walking

OpenCap Monocular accurately estimated the knee adduction moment (KAM) during walking, demonstrating close agreement with motion capture and force plate-derived inverse dynamics (Fig 7). The estimated KAM waveform captured both characteristic KAM peaks during the stance phase, with timing and magnitudes similar to those of the gold standard. The MAE of the first-peak KAM from OpenCap Monocular's hybrid kinetics pipeline (0.36% BW·ht) was comparable to the two-camera OpenCap approach (0.41% BW·ht) and was below the clinically meaningful threshold of 0.5% BW·ht [29, 30, 47, 48].

Fig 7. Clinical Use Case 2: Knee Loading during Walking. The knee adduction moment (KAM) predicts the progression of medial compartment knee osteoarthritis but is traditionally difficult to measure clinically. (A) The mean (line) and standard deviation (shading) of the KAM, normalized to bodyweight (BW) and height (ht), over the stance phase. (B) Mean absolute error (MAE) of the first-peak KAM, which is a target for biomechanical interventions. OpenCap Monocular's errors are below a clinically meaningful threshold of 0.5% BW·ht [29, 30, 47, 48].

4 Discussion

In this study, we developed OpenCap Monocular, a pipeline for analyzing 3D human musculoskeletal kinematics and kinetics from a single smartphone video. We validated it against laboratory-based motion capture and force plates, and we demonstrated sufficient accuracy to support downstream clinical tasks related to osteoarthritis and frailty. Our key contribution is the integration of a 3D human pose estimation model, physics-inspired pose optimization, and a musculoskeletal simulation framework. Together, these components produce 3D kinematics and kinetics that are physically realistic and biomechanically plausible. The pipeline bridges the gap between the SMPL-based pose estimates commonly produced by computer vision models and the skeletal kinematics and musculoskeletal dynamics that are integral to biomechanics research. By deploying the workflow in the cloud, OpenCap Monocular makes 3D biomechanical analysis freely available to researchers across movement-related fields in minutes, without requiring software development expertise or high-performance computing resources. We anticipate that OpenCap Monocular's accessibility will enable novel studies of human movement in real-world settings that were previously infeasible.

A central finding of this work is the importance of the pose-refinement optimization step for both kinematic and kinetic accuracy. Directly applying inverse kinematics to the monocular computer vision pose estimates (CV+IK) resulted in large kinematic errors, particularly in global translation, due to drift and inaccurate foot-floor interaction. Our pose refinement optimization, which enforces physical constraints such as foot-floor contact, substantially improved kinematic accuracy, reducing rotational and translational error by 48% and 69%, respectively, compared to CV+IK. These improvements also enhanced the accuracy of kinetics. Our physics-based simulation computes ground reaction forces based on the motion of foot-mounted contact spheres relative to a ground plane [41, 42], a process that is highly sensitive to unrealistic foot-floor penetration and foot sliding. The foot-floor kinematics of the CV+IK method were not sufficiently accurate to track in simulations (Fig 4). In contrast, the refined OpenCap Monocular kinematics yielded dynamically consistent simulations of the sit-to-stand activity (Fig 6).
For walking, we used the GaitDynamics machine learning model to estimate ground reaction forces from whole-body kinematics, and OpenCap Monocular produced the most accurate predictions in the vertical direction, outperforming CV+IK and the two-camera OpenCap system, likely due to improved center-of-mass kinematics resulting from the pose refinement optimization step.

In addition to our pose refinement algorithm, OpenCap Monocular's improved kinematic and kinetic accuracy relative to direct monocular computer vision outputs also stems from simplifying assumptions enabled by the OpenCap web and mobile applications. Whereas WHAM estimates global motion using a potentially moving camera with unknown intrinsic parameters and unknown participant scale, OpenCap Monocular assumes a static smartphone, known camera intrinsic parameters, and known participant height. Within the existing OpenCap workflow, these assumptions introduce minimal additional burden, as we maintain a database of camera intrinsics for all iOS devices released since 2018, and participant height is collected through the web application as part of the standard workflow. Prioritizing accuracy over flexibility was intentional to support the rapid, large-scale collection of reliable biomechanics data. However, these assumptions currently limit applications such as analyzing movements from large online video databases. Future work can leverage our open-source codebase to relax these constraints and broaden applicability beyond the OpenCap acquisition pipeline.

OpenCap Monocular produces biomechanically realistic joint kinematics and musculoskeletal dynamics, advancing the utility of monocular pose estimation for biomechanical research and clinical practice. Our accuracy evaluation using these quantities from a state-of-the-art musculoskeletal model is more informative for biomedical applications than mean per-joint position errors, which are typically used to benchmark computer vision models. Importantly, going beyond motion and estimating measures of musculoskeletal dynamics, such as muscle and joint forces, is essential for studying human performance and movement-related conditions. These dynamic quantities more directly reflect neural control and the mechanical stimuli experienced by tissues, making them more relevant to injury [50-53] and neuro-musculoskeletal pathology [28, 29, 54] than kinematics alone. We showed that OpenCap Monocular can estimate a key measure of knee loading with an accuracy sufficient to identify individuals at risk for rapid progression of medial compartment knee osteoarthritis [29, 30, 47-49]. We also demonstrated the ability to compute the knee extension moment and changes in lower-extremity joint moments during chair rise with sufficient accuracy to distinguish individuals with and without early signs of frailty [25-28].

Cloud deployment makes advanced algorithms, previously confined to biomechanics and computer vision experts, accessible to a broad community of researchers studying human movement. Several recent developments have made this possible. Computer vision models now estimate human pose with increasing accuracy, with methods such as WHAM providing fast predictions on long videos. In parallel, muscle-driven physics simulations have become fast enough for routine use [33]; for example, a single sit-to-stand repetition can now be simulated in minutes.
Large, high-quality ground-reaction-force datasets [55] have also enabled machine learning models that estimate kinetics with accuracy comparable to or exceeding physics-based methods [31]. Deploying these algorithms in an easy-to-use workflow democratizes access to these cutting-edge advancements in biomechanics and computer vision. For instance, using the two-camera OpenCap system, we recently partnered with neurology clinicians to collect data from individuals with rare neuromuscular diseases across fourteen states at large-scale data collection events [23]. Others have used similar approaches to link joint loading to cartilage outcomes outside the MRI suite [56]. With its more straightforward setup, OpenCap Monocular enables motion assessment in even more ecologically valid environments, such as the clinic and the home.

The single-phone approach addresses the time, expertise, and equipment barriers that have limited the adoption of motion capture in clinical practice and population-scale studies. While applications requiring extremely high-precision kinematics, such as pre-surgical planning for cerebral palsy, will likely continue to justify laboratory-based motion capture [57], OpenCap Monocular can augment or replace existing low-fidelity functional outcomes that are common in clinical practice and research. Time constraints and patients' inability to independently complete assessments are among the most frequently cited reasons why physical therapists do not adopt digital health technology [58, 59]. Even our relatively simple multi-camera OpenCap system requires setting up multiple devices, calibrating them, and completing a static pose for model scaling before collecting movement data [9]. With OpenCap Monocular, users can begin collecting data in less than one minute from the moment they log in to the application, without any calibration or scaling steps. Data collection could be easily automated for independent patient completion in the clinic. This convenience enables clinicians to quantify informative biomechanical outcomes, such as joint angles or muscle forces, during a functional activity with little more burden than existing low-fidelity outcomes, like time to walk 10 meters. Furthermore, single-smartphone movement assessments are feasible in the home [22], but prior work has measured lower-fidelity outcomes, like task completion time and 2D-projected kinematics. OpenCap Monocular can quantify 3D kinematics and kinetics in the home, enabling large-scale decentralized studies of movement health, data-driven telerehabilitation, and regular monitoring of physical function in high-risk populations. The research implication of this accessibility is a shift from sparse, laboratory-based snapshots of function to regular, real-world measurements in large cohorts.

It is important to acknowledge several limitations. First, the foot-contact probabilities from WHAM influence both its initial pose estimate and our refinement step, and they perform poorly during activities involving prolonged flight phases. As a result, OpenCap Monocular does not currently perform well for jumping tasks, although alternative contact-probability models could mitigate this limitation [60]. Nevertheless, the method performs well for the activities of daily living frequently studied in mobility research.
Second, our validation cohort consisted of young, healthy adults; additional studies in diverse populations, including those with pathological gait or movement disorders, are needed. Third, we evaluated a single-camera configuration (45° anterolateral view), which we found qualitatively superior to frontal or sagittal placements, though different activities may benefit from other viewpoints. Future work should examine performance across viewpoints and sensitivity to slight variations in camera placement. As with all monocular video-based biomechanics tools, a consistent camera setup remains important due to inherent challenges in estimating out-of-plane motion. These limitations highlight the need for evaluation across more diverse populations, activities, and environments to assess generalizability.

Finally, OpenCap Monocular was designed with a modular architecture, enabling continuous improvement as computer vision and biomechanics technologies evolve. The initial 3D pose estimation module (currently WHAM) can be readily upgraded as new monocular pose estimation or foundation models become available. Because the downstream optimization, inverse kinematics, and musculoskeletal simulation components are independent of the pose estimation algorithm, OpenCap Monocular provides a flexible and extensible framework that will continue to evolve with advances in computer vision while maintaining biomechanically grounded outputs suitable for clinical and research applications.

5 Conclusion

We developed and validated OpenCap Monocular, a method for quantifying 3D human motion and musculoskeletal dynamics from a single smartphone video. The pipeline combines monocular pose estimation with physics-inspired optimization to estimate accurate kinematics from video, and uses musculoskeletal simulation and machine-learning models to infer dynamics from those kinematics. OpenCap Monocular demonstrated improved accuracy compared to direct computer-vision outputs while providing a more accessible alternative to both laboratory-based motion capture and multi-camera video systems. This approach has the potential to substantially broaden the use of quantitative movement analysis in real-world settings, including telemedicine and decentralized clinical trials. By making the algorithm freely available and hosting computation in the cloud, OpenCap Monocular advances the accessibility of biomechanically grounded 3D human movement analysis.

6 CRediT Author Contributions

Selim Gilon: Writing - original draft, Validation, Software, Methodology, Visualization, Formal analysis. Emily Y. Miller: Writing - review & editing, Software. Scott D. Uhlrich: Writing - review & editing, Conceptualization, Software, Methodology, Supervision.

7 Acknowledgments

This study was funded by grants from the Myotonic Dystrophy Foundation, the Wu Tsai Human Performance Alliance Agility Project Program, and the NIH Restore Center Pilot Project Program.

8 Declaration of competing interest

SDU is a co-founder of Model Health, Inc., which provides markerless motion-capture technology for commercial, non-academic use. All software presented in this work is open source, integrated into the OpenCap codebase, and incorporated into the cloud-deployed OpenCap platform, which is freely available for academic research.
Model Health, Inc. had no role in the study design; data collection, analysis, or interpretation; the decision to publish; or the preparation of the manuscript.

9 Code and Data Availability

The OpenCap Monocular source code is openly available under a permissive Apache 2.0 License at https://github.com/utahmobl/opencap-monocular. The OpenCap web and iOS applications used for data collection are accessible at https://app.opencap.ai. All validation experiments used the previously published, publicly available OpenCap dataset described in Uhlrich, Falisse, Kidziński, et al., 2023 (https://simtk.org/opencap). This dataset includes synchronized multi-camera videos, marker-based motion capture, and force plate recordings. The OpenCap Monocular outputs are available at https://simtk.org/opencap-monoc.

References

1. Paterno M, Schmitt L, Ford K, Rauh M, Myer G, Huang B, et al. Biomechanical Measures During Landing and Postural Stability Predict Second Anterior Cruciate Ligament Injury After Anterior Cruciate Ligament Reconstruction and Return to Sport. The American Journal of Sports Medicine. 2010;38:1968-78. doi:10.1177/0363546510376053.

2. Clark S, Rowe N, Adnan M, Brown S, Mulcahey M. Effective Interventions for Improving Functional Movement Screen Scores Among "High-Risk" Athletes: A Systematic Review. International Journal of Sports Physical Therapy. 2022;17. doi:10.26603/001c.31001.

3. Buckley C, Alcock L, McArdle R, Del Din S, Mazzà C, Yarnall A, et al. The Role of Movement Analysis in Diagnosing and Monitoring Neurodegenerative Conditions: Insights from Gait and Postural Control. Brain Sciences. 2019;9(2):34. doi:10.3390/brainsci9020034.

4. Sacco ICN, Trombini-Souza F, Suda EY. Impact of biomechanics on therapeutic interventions and rehabilitation for major chronic musculoskeletal conditions: A 50-year perspective. Journal of Biomechanics. 2023;154:111604. doi:10.1016/j.jbiomech.2023.111604.

5. Alderink G, Õunpuu S. Biomechanics of Human Motion and Its Clinical Applications: Instrumented Gait Analysis. Bioengineering. 2025;12(10):1076. doi:10.3390/bioengineering12101076.

6. Collins KC, Kennedy NC, Clark A, Pomeroy VM. Kinematic Components of the Reach-to-Target Movement After Stroke for Focused Rehabilitation Interventions: Systematic Review and Meta-Analysis. Frontiers in Neurology. 2018;9:472. doi:10.3389/fneur.2018.00472.

7. Uhlrich SD, Mazzoli V, Silder A, Finlay AK, Kogan F, Gold GE, et al. Personalised gait retraining for medial compartment knee osteoarthritis: a randomised controlled trial. The Lancet Rheumatology. 2025;7(10):e708-18. doi:10.1016/S2665-9913(25)00151-1.

8. Stebbins J, Harrington M, Stewart C. Clinical gait analysis 1973-2023: Evaluating progress to guide the future. Journal of Biomechanics. 2023;160:111827. doi:10.1016/j.jbiomech.2023.111827.
Op enCap: Human mov ement dynamics from smartphone videos. PLOS Computational Biology . 2023 10;19(10):1-26. Av ailable from: https://doi.org/10.1371/journal.pcbi.1011462 . doi:10.1371/journal.p cbi.1011462. Marc h 27, 2026 18/23 10. Oliv eira AS, Pirscov eanu CI. Implications of sample size and acquired num b er of steps to in vestigate running biomechanics. Scientific Rep orts. 2021;11(1):3083. Av ailable from: https://doi.org/10.1038/s41598- 021- 82876- z . doi:10.1038/s41598-021-82876-z. 11. Kn udson DV. Authorship and Sampling Practice in Selected Biomec hanics and Sp orts Science Journals. Perceptual and Motor Skills. 2011;112(3):838-44. PMID: 21853773. Av ailable from: https://doi.org/10.2466/17.PMS.112.3.838- 844 . doi:10.2466/17.PMS.112.3.838-844. 12. W eygers I, Kok M, Konings M, Hallez H, De V ro ey H, Claeys K. Inertial Sensor-Based Lo wer Limb Joint Kinematics: A Metho dological Systematic Review. Sensors. 2020;20(3). Av ailable from: https://www.mdpi.com/1424- 8220/20/3/673 . doi:10.3390/s20030673. 13. Al Borno M, O’Day J, Ibarra V, Dunne J, Seth A, Habib A, et al. Op enSense: An op en-source to olbox for inertial-measurement-unit-based measurement of lo wer extremity kinematics ov er long durations. Journal of NeuroEngineering and Rehabilitation. 2022;19(1):22. Av ailable from: https://doi.org/10.1186/s12984- 022- 01001- x . doi:10.1186/s12984-022-01001-x. 14. Konrath JM, Karatsidis A, Schepers HM, Bellusci G, de Zee M, Andersen MS. Estimation of the Knee Adduction Moment and Joint Contact F orce during Daily Living Activities Using Inertial Motion Capture. Sensors. 2019;19(7). Av ailable from: https://www.mdpi.com/1424- 8220/19/7/1681 . doi:10.3390/s19071681. 15. Karatsidis A, Bellusci G, Schepers HM, De Zee M, Andersen MS, V eltink PH. Estimation of Ground Reaction F orces and Momen ts During Gait Using Only Inertial Motion Capture. Sensors. 2017;17(1). Av ailable from: https://www.mdpi.com/1424- 8220/17/1/75 . doi:10.3390/s17010075. 16. Serv ais L, Y en K, Guridi M, Luk awy J, Vissiere D, Strijb os P . Stride V elo cit y 95th Cen tile: Insights into Gaining Regulatory Qualification of the First W earable-Deriv ed Digital Endp oin t for use in Duchenne Muscular Dystrophy T rials. Journal of Neuromuscular Diseases. 2021 12;9:1-12. doi:10.3233/JND-210743. 17. W ang C, Chan PPK, Lam BMF, W ang S, Zhang JH, Chan ZYS, et al. Real-Time Estimation of Knee Adduction Momen t for Gait Retraining in P atients With Knee Osteoarthritis. IEEE T ransactions on Neural Systems and Rehabilitation Engineering. 2020;28(4):888-94. doi:10.1109/TNSRE.2020.2978537. 18. Desmarais Y, Mottet D, Slangen P , Mon tesinos P . A review of 3D human p ose estimation algorithms for mark erless motion capture. Computer Vision and Image Understanding. 2021;212:103275. Av ailable from: https: //www.sciencedirect.com/science/article/pii/S1077314221001193 . doi:h ttps://doi.org/10.1016/j.cviu.2021.103275. 19. Cao Z, Hidalgo G, Simon T, W ei SE, Sheikh Y. OpenPose: Realtime Multi-P erson 2D Pose Estimation Using Part Affinity Fields. IEEE T ransactions on P attern Analysis and Machine Intelligence. 2018;43:172-86. Av ailable from: https://api.semanticscholar.org/CorpusID:198169848 . 20. W ang J, k e S, Cheng T, Borui J, Deng C, Zhao Y, et al. Deep High-Resolution Represen tation Learning for Visual Recognition. IEEE T ransactions on Pattern Marc h 27, 2026 19/23 Analysis and Mac hine Intelligence. 2020 04;PP:1-1. doi:10.1109/TP AMI.2020.2983686. 21. Kank o RM, Laende EK, Davis EM, Selbie WS, Deluzio KJ. 
21. Kanko RM, Laende EK, Davis EM, Selbie WS, Deluzio KJ. Concurrent assessment of gait kinematics using marker-based and markerless motion capture. Journal of Biomechanics. 2021;127:110665. doi:10.1016/j.jbiomech.2021.110665.

22. Boswell MA, Kidziński Ł, Hicks JL, Uhlrich SD, Falisse A, Delp SL. Smartphone videos of the sit-to-stand test predict osteoarthritis and health outcomes in a nationwide study. npj Digital Medicine. 2023;6(1):32. doi:10.1038/s41746-023-00775-1.

23. Ruth PS, Uhlrich SD, de Monts C, Falisse A, Muccini J, Covitz S, et al. Video-based biomechanical analysis captures disease-specific movement signatures of different neuromuscular diseases. bioRxiv. 2024. doi:10.1101/2024.09.26.613967.

24. Gurchiek R, Teplin Z, Falisse A, Delp S. Hamstrings Are Stretched More and Faster during Accelerative Running Compared to Speed-Matched Constant-Speed Running. Medicine & Science in Sports & Exercise. 2024;57. doi:10.1249/MSS.0000000000003577.

25. van der Heijden M, Meijer K, Willems P, Savelberg H. Muscles limiting the sit-to-stand movement: an experimental simulation of muscle weakness. Gait & Posture. 2009;30:110-4. doi:10.1016/j.gaitpost.2009.04.002.

26. Moreland J, Richardson J, Goldsmith C, Clase C. Muscle Weakness and Falls in Older Adults: A Systematic Review and Meta-Analysis. Journal of the American Geriatrics Society. 2004;52:1121-9. doi:10.1111/j.1532-5415.2004.52310.x.

27. Van der Kruk E, Silverman A, Reilly P, Bull A. Compensation due to age-related decline in sit-to-stand and sit-to-walk. Journal of Biomechanics. 2021;122:110411. doi:10.1016/j.jbiomech.2021.110411.

28. Seko T, Akasaka H, Koyama M, Himuro N, Saitoh S, Ogawa S, et al. The Contributions of Knee Extension Strength and Hand Grip Strength to Factors Relevant to Physical Frailty: The Tanno-Sobetsu Study. Geriatrics. 2024;9(1):9. doi:10.3390/geriatrics9010009.

29. Miyazaki T, Wada M, Kawahara H, Sato M, Baba H, Shimada S. Dynamic load at baseline can predict radiographic disease progression in medial compartment knee osteoarthritis. Annals of the Rheumatic Diseases. 2002;61(7):617-22. doi:10.1136/ard.61.7.617.

30. Amin S, Luepongsak N, McGibbon C, LaValley M, Krebs D, Felson D. Knee adduction moment and development of chronic knee pain in elders. Arthritis Care and Research. 2004;51(3):371-6. doi:10.1002/art.20396.

31. Tan T, Van Wouwe T, Werling KF, Liu CK, Delp SL, Hicks JL, et al. GaitDynamics: a generative foundation model for analyzing human walking and running. Nature Biomedical Engineering. 2026. doi:10.1038/s41551-025-01565-8.

32. Miller EY, Tan T, Falisse A, Uhlrich SD. Integrating Machine Learning with Musculoskeletal Simulation Improves OpenCap Video-Based Dynamics Estimation. bioRxiv. 2025. doi:10.64898/2025.12.19.695562.
Rapid predictiv e simulations with complex musculosk eletal mo dels suggest that diverse health y and pathological human gaits can emerge from similar con trol strategies. Journal of The Ro yal So ciet y Interface. 2019 08;16. doi:10.1098/rsif.2019.0402. 34. Xu Y, Zhang J, Zhang Q, T ao D. ViTP ose: simple vision transformer baselines for human p ose estimation. In: Pro ceedings of the 36th International Conference on Neural Information Pro cessing Systems. NIPS ’22. Red Ho ok, NY, USA: Curran Asso ciates Inc.; 2022. . 35. Shin S, Kim J, Halila j E, Black M. WHAM: Reconstructing W orld-Grounded Humans with Accurate 3D Motion; 2024. p. 2070-80. doi:10.1109/CVPR52733.2024.00202. 36. Lop er M, Mahmo o d N, Romero J, P ons-Moll G, Black MJ. In: SMPL: A Skinned Multi-P erson Linear Mo del. 1st ed. New Y ork, NY, USA: Asso ciation for Computing Mac hinery; 2023. Av ailable from: https://doi.org/10.1145/3596711.3596800 . 37. Seth A, Hicks JL, Uchida TK, Habib A, Dembia CL, Dunne JJ, et al. Op enSim: Sim ulating musculosk eletal dynamics and neuromuscular control to study human and animal mo vemen t. PLOS Computational Biology . 2018 07;14(7):1-20. Av ailable from: https://doi.org/10.1371/journal.pcbi.1006223 . doi:10.1371/journal.p cbi.1006223. 38. Zhang B, Li K, Cheng Z, Hu Z, Y uan Y, Chen G, et al. VideoLLaMA 3: F rontier Multimo dal F oundation Mo dels for Image and Video Understanding. ArXiv. 2025;abs/2501.13106. Av ailable from: https://api.semanticscholar.org/CorpusID:275789161 . 39. Lai AKM, Arnold AS, W ak eling JM. Why are Antagonist Muscles Co-activ ated in My Sim ulation? A Musculoskeletal Mo del for Analysing Human Lo comotor T asks. Annals of Biomedical Engineering. 2017;45(12):2762-74. doi:10.1007/s10439-017-1920-7. 40. Ra jagopal A, Dembia CL, DeMers MS, Delp DD, Hicks JL, Delp SL. F ull-Bo dy Musculosk eletal Mo del for Muscle-Driv en Simulation of Human Gait. IEEE T ransactions on Biomedical Engineering. 2016;63(10):2068-79. Epub 2016 Jul 7. PMID: 27392337; PMCID: PMC5507211. Av ailable from: https://doi.org/10.1109/TBME.2016.2586891 . doi:10.1109/TBME.2016.2586891. 41. Serrancol ´ ı G, F alisse A, Dem bia C, V antilt J, T anghe K, Lefeb er D, et al. Sub ject-Exoskeleton Contact Mo del Calibration Leads to Accurate Interaction Marc h 27, 2026 21/23 F orce Predictions. IEEE T ransactions on Neural Systems and Rehabilitation Engineering. 2019 06;PP:1-1. doi:10.1109/TNSRE.2019.2924536. 42. Sherman MA, Seth A, Delp SL. Sim b o dy: multibo dy dynamics for biomedical researc h. Pro cedia IUT AM. 2011;2:241-61. IUT AM Symp osium on Human Bo dy Dynamics. Av ailable from: https: //www.sciencedirect.com/science/article/pii/S2210983811000241 . doi:h ttps://doi.org/10.1016/j.piutam.2011.04.023. 43. Andersson J, Gillis J, Horn G, Rawlings J, Diehl M. CasADi: a softw are framew ork for nonlinear optimization and optimal control. Mathematical Programming Computation. 2018 07;11. doi:10.1007/s12532-018-0139-4. 44. W¨ ach ter A, Biegler L T. On the implementation of an interior-point filter line-searc h algorithm for large-scale nonlinear programming. Mathematical Programming. 2006;106(1):25-57. Av ailable from: https://doi.org/10.1007/s10107- 004- 0559- y . doi:10.1007/s10107-004-0559-y. 45. Smith S, Reilly P , Bull A. A musculosk eletal mo delling approach to explain sit-to-stand difficulties in older p eople due to changes in muscle recruitment and mo vemen t strategies. Journal of Biomec hanics. 2019 10;98:109451. doi:10.1016/j.jbiomec h.2019.109451. 46. 
Bennell KL, Bowles KA, W ang Y, Cicuttini F, Da vies-T uck M, Hinman RS. Higher dynamic medial knee load predicts greater cartilage loss ov er 12 months in medial knee osteoarthritis. Annals of the Rheumatic Diseases. 2011;70(10):1770-4. Av ailable from: https: //www.sciencedirect.com/science/article/pii/S000349672419125X . doi:h ttps://doi.org/10.1136/ard.2010.147082. 47. M ¨ undermann A, Dyrby C, Hurwitz D, Sharma L, Andriacchi T. Poten tial Strategies to Reduce Medial Compartmen t Loading in Patien ts With Knee Osteoarthritis of V arying Severit y: Reduced W alking Sp eed. Arthritis and rheumatism. 2004 05;50:1172-8. doi:10.1002/art.20132. 48. M ¨ undermann A, Dyrby CO, Andriacchi TP . Secondary gait changes in patients with medial compartmen t knee osteoarthritis: increased load at the ankle, knee, and hip during w alking. Arthritis and rheumatism. 2005;52 9:2835-44. Av ailable from: https://api.semanticscholar.org/CorpusID:13218842 . 49. Bosw ell MA, Uhlrich SD, Kidzi ´ nski L , Thomas K, Kolesar JA, Gold GE, et al. A neural net work to predict the knee adduction moment in patients with osteoarthritis using anatomical landmarks obtainable from 2D video analysis. Osteoarthritis and Cartilage. 2021;29(3):346-56. 50. Hew ett TE, Myer GD, F ord KR, Heidt Jr RS, Colosimo AJ, McLean SG, et al. Biomec hanical measures of neuromuscular control and v algus loading of the knee predict an terior cruciate ligament injury risk in female athletes: a prosp ectiv e study . The American journal of sp orts medicine. 2005;33(4):492-501. 51. Y ang NP , Hsu NW, Lin CH, Chen HC, Tsao HM, Lo SS, et al. Relationship b et w een muscle strength and fall episo des among the elderly: The Yilan study , T aiw an. BMC Geriatrics. 2018 04;18. doi:10.1186/s12877-018-0779-2. Marc h 27, 2026 22/23 52. Kadono N, Pa v ol MJ. Effects of aging-related losses in strength on the ability to reco ver from a backw ard balance loss. Journal of Biomechanics. 2013;46(1):13-8. Av ailable from: https: //www.sciencedirect.com/science/article/pii/S0021929012005209 . doi:h ttps://doi.org/10.1016/j.jbiomech.2012.08.046. 53. Pijnapp els M, Burg P , Reeves N, V an Dieen J. Identification of elderly fallers by m uscle strength measures. Europ ean journal of applied physiology . 2008 04;102:585-92. doi:10.1007/s00421-007-0613-6. 54. Nec kel ND, Nichols D, Hidler JM. Joint Moments Exhibited by Chronic Stroke Sub jects While W alking with a Prescrib ed Physiological Gait Pattern. In: 2007 IEEE 10th In ternational Conference on Rehabilitation Rob otics; 2007. p. 771-5. doi:10.1109/ICORR.2007.4428512. 55. W erling K, Kaneda J, T an T, Agarwal R, Sko v S, V an W ouw e T, et al. AddBiomec hanics Dataset: Capturing the Physics of Human Motion at Scale. In: Leonardis A, Ricci E, Roth S, Russak ovsky O, Sattler T, V arol G, editors. Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland; 2025. p. 490-508. 56. Miller EY, Low e T, Zhu H, Lee W, Argote PF, Dresdner D, et al. Evolving cartilage strain with pain progression and gait: a longitudinal study p ost-A CL reconstruction at six and tw elve months. medRxiv. 2024. Av ailable from: https: //www.medrxiv.org/content/early/2024/09/09/2024.09.08.24313289 . doi:10.1101/2024.09.08.24313289. 57. Ra jagopal A, Kidzi ´ nski L, McGlaughlin AS, Hicks JL, Delp SL, Sch wartz MH. Pre-op erativ e gastro cnemius lengths in gait predict outcomes following gastro cnemius lengthening surgery in children with cerebral palsy . PLOS ONE. 2020 06;15(6):1-17. 
Av ailable from: https://doi.org/10.1371/journal.pone.0233706 . doi:10.1371/journal.p one.0233706. 58. Ma yer KP , Norris TL, Kumble S, Morelli N, Gorman SL, Ohtak e PJ. Acute Care Ph ysical Therapy Practice Analysis Identifies the Need for a Core Outcome Measuremen t Set. Journal of Acute Care Physical Therapy . 2021 10;12:150-7. Av ailable from: https://journals.lww.com/jacpt/fulltext/2021/10000/ acute_care_physical_therapy_practice_analysis.4.aspx . doi:10.1097/JA T.0000000000000161. 59. Jette DU, Halb ert J, Iverson C, Miceli E, Shah P . Use of Standardized Outcome Measures in Physical Therapist Practice: Perceptions and Applications. P hysical Therap y . 2009 02;89(2):125-35. Av ailable from: https://doi.org/10.2522/ptj.20080234 . doi:10.2522/ptj.20080234. 60. Remp e D, Birdal T, Hertzmann A, Y ang J, Sridhar S, Guibas LJ. HuMoR: 3D Human Motion Mo del for Robust Pose Estimation. In: International Conference on Computer Vision (ICCV); 2021. . Marc h 27, 2026 23/23 Supplemen tary App endix Rotational accuracy for eac h degree of freedom is rep orted in Fig S1. Optimization parameters for eac h activit y are listed in T able S1. The p ose refinement optimization pro cedure is describ ed in Section 2.2.2. Example screenshots of the Op enCap mobile and web applications are shown in Fig S2. Fig S1. Rotational kinematic accuracy by degree of freedom. Mean absolute errors (MAE) for each degree of freedom, compared to mark er-based motion capture. Errors are av eraged across all participants. P ose refinement in Op enCap Mono cular substantially reduces errors relative to the CV+IK baseline, ac hieving accuracy comparable to the tw o-camera system. T able S1. Optimization parameters for each activit y . P arameter W alking Squats STS Other Filter frequency (Hz) 6 4 4 8 Repro jection weigh t 50 50 250 50 Con tact velocity weigh t 1 1 1 1 Con tact p osition w eight 100 100 100 100 Smo othness weigh t 10 10 10 10 Flat flo or weigh t 100 10 50 – Marc h 27, 2026 1/2 Fig S2. Screenshots of the Op enCap mobile/w eb app with a mono cular recording. Marc h 27, 2026 2/2
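To make the role of the Table S1 parameters concrete, the following is a minimal sketch, in Python, of how activity-specific weights could scale the squared residuals of a pose-refinement objective of the kind described in Section 2.2.2. This is an illustrative reading under stated assumptions, not the released OpenCap Monocular implementation; the function and key names (refinement_cost, residuals, flat_floor, and so on) are hypothetical placeholders.

```python
import numpy as np

# Activity-specific parameters transcribed from Table S1. `filter_hz` is the
# low-pass filter cutoff applied to the kinematics; the remaining entries are
# assumed here to weight the corresponding residual terms in the objective.
WEIGHTS = {
    "walking": dict(filter_hz=6, reprojection=50, contact_vel=1,
                    contact_pos=100, smoothness=10, flat_floor=100),
    "squats": dict(filter_hz=4, reprojection=50, contact_vel=1,
                   contact_pos=100, smoothness=10, flat_floor=10),
    "sts": dict(filter_hz=4, reprojection=250, contact_vel=1,
                contact_pos=100, smoothness=10, flat_floor=50),
    "other": dict(filter_hz=8, reprojection=50, contact_vel=1,
                  contact_pos=100, smoothness=10, flat_floor=None),
}


def refinement_cost(residuals: dict, activity: str) -> float:
    """Weighted sum of squared residuals for one activity (hypothetical).

    `residuals` maps term names ("reprojection", "contact_vel",
    "contact_pos", "smoothness", "flat_floor") to residual arrays.
    """
    w = WEIGHTS[activity]
    cost = 0.0
    for term, arr in residuals.items():
        weight = w.get(term)
        if weight is None:  # no flat-floor term for "other" (listed as "-" in Table S1)
            continue
        cost += weight * float(np.sum(np.asarray(arr) ** 2))
    return cost


# Example: evaluate the cost for a sit-to-stand trial with dummy residuals.
rng = np.random.default_rng(0)
dummy = {term: rng.normal(size=100) for term in
         ("reprojection", "contact_vel", "contact_pos", "smoothness", "flat_floor")}
print(refinement_cost(dummy, "sts"))
```

Under this reading, the larger reprojection weight for sit-to-stand trials would bias the refinement toward the image evidence, while the flat-floor term is dropped entirely for uncategorized activities, consistent with the "–" entry in Table S1.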
