A Longitudinal Study of Usability in Identity-Based Software Signing

Kelechi G. Kalu, Purdue University, kalu@purdue.edu
Hieu Tran, Purdue University, tran335@purdue.edu
Santiago Torres-Arias, Purdue University, santiagotorres@purdue.edu
Sooyeon Jeong, Purdue University, sooyeonj@purdue.edu
James C. Davis, Purdue University, davisjam@purdue.edu

Abstract

Identity-based software signing tools aim to make software artifact provenance verifiable while reducing the operational burden of long-lived key management. However, there is limited cross-tool, longitudinal evidence about which usability problems arise in practice and how those problems evolve as tools mature. This gap matters because unusable signing and verification workflows can lead to incomplete adoption, misconfiguration, or skipped verification, undermining intended integrity guarantees.

We conducted the first mining-software-repositories study of five open-source identity-based signing ecosystems: Sigstore, OpenPubKey, HashiCorp Vault, Keyfactor, and Notary v2. We analyzed ~3,900 GitHub issues from Nov. 2021 to Nov. 2025. We coded each issue for the reported usability concern and the implicated architectural component, and compared patterns across tools and over time. Across ecosystems, reported concerns concentrate in verification workflows, policy and configuration surfaces, and integration boundaries. Longitudinal Poisson trend analysis shows substantial declines in reported issues for most ecosystems. However, across usability themes, workflow and documentation-related concerns decline unevenly across tools and concern types, and verification workflows and configuration surfaces remain persistent friction points. These results indicate that identity-based signing reduces some usability burdens while relocating complexity to verification clarity, policy configuration, and deployment integration.
Designing future signing ecosystems therefore requires treating verification semantics and release workflows as first-class usability targets rather than peripheral integration concerns.

1 Introduction

Modern software supply chains depend on organizations consuming code produced by others [8, 74]. This reuse introduces attack vectors such as unauthorized or malicious code injection [6, 22, 75, 83, 84]. To mitigate this risk, artifacts must carry verifiable provenance, that is, evidence of integrity and authentic origin, across organizational boundaries [22, 23, 37]. Software signing provides the strongest formally grounded mechanism for establishing provenance by binding artifacts to issuer identities and enabling downstream verification [37]. Signing involves two coupled workflows: upstream signature creation by producers, and downstream signature verification by consumers. If either workflow is hard to complete correctly, cryptographic guarantees degrade as producers mis-sign or consumers fail to verify [76].

Usability challenges in software signing undermine its guarantees. Early usable-security research showed that even technically proficient users struggled to correctly generate, manage, and verify cryptographic keys [81]. Engineers adopt insecure workarounds when workflows are complex [29]. Empirical studies of software signing ecosystems further document low adoption and persistent key-management failures in practice [76]. In response to these burdens, modern ecosystems have increasingly adopted identity-based signing, which replaces developer-managed long-lived keys with short-lived credentials issued by external identity and certificate services. Empirical evidence on the usability of identity-based signing remains limited and is based primarily on interview studies [35, 38].
These studies suggest that identity-based designs shift usability burdens by introducing new configuration and coordination challenges across tool components. However, we lack systematic evidence about how usability issues manifest across tools and evolve over time in modern signing ecosystems. Time-series quantitative analysis can reveal whether problems decrease, persist, or shift over time, and it helps avoid repeating the trajectory of key-managed signing, where persistent usability burdens coincided with low signing adoption outside mandatory settings [21, 76, 86].

Our study addresses this gap through the first mining-software-repositories investigation of usability issues in software signing tools. We examine five open-source identity-based signing ecosystems and analyze developer-reported usability concerns as reflected in GitHub issue discussions. We code these discussions for the type of usability concern raised and the associated component(s) of the identity-based architecture, and we compare these patterns across tools and over time. This approach allows us to observe how usability challenges surface during adoption, integration, and maintenance, and how those challenges evolve as ecosystems mature.

Our results provide a cross-tool map of developer-reported usability problems in identity-based software signing and a component-level view of where these problems arise. Reported problems concentrate in CLI/API-facing integration surfaces (e.g., Build/CI and policy/configuration), with verification workflows also showing strong cross-tool differences. Using Poisson trend analyses over the 48-month window, we find that issue frequency decreases for some tools and themes, but trajectories are mixed, indicating that maturity signals are not uniform across identity-based signing tools.
Together, these results show that identity-based designs shift usability challenges across components rather than eliminating them, and they identify concrete engineering surfaces where improvements are likely to strengthen real-world signing and verification guarantees.

To summarize our contributions:
• We present the first mining-software-repositories study of usability in software signing tools, focusing on modern identity-based signing ecosystems.
• We provide an architecture-grounded characterization of developer-reported usability concerns, mapping recurring themes to components and integration boundaries.
• We identify persistent usability hot spots and their trajectories over time, yielding practical guidance for maintainers and adopters evaluating deployment readiness.

2 Background and Related Work

This section introduces our usability framing (§ 2.1) and summarizes identity-based signing (§ 2.2).

2.1 Usability is a Dynamic Property

Usability concerns the extent to which a tool enables users to achieve its intended purpose [47, 48]. Usability influences technology and tool adoption and acceptance in general [13, 14, 46]. In light of the economic impact of rapid software development [26], the usability of tools has therefore been a special focus in software engineering [4, 45, 48, 60].

The usability profile of a tool typically evolves during its lifecycle. Usability is expected to improve as a tool matures [51] as a result of two factors. First, the user base expands beyond early adopters, broadening interaction patterns and leading to increased emphasis on user-centered design practices [49, 50, 62]. Second, iterative development incorporates this user feedback to stabilize interfaces and workflows [49, 50, 62].
However, maturity does not guarantee uniform improvement across workflows, because new deployment contexts and scaling pressures can surface new usability problems even as earlier ones are resolved. Table 1 summarizes this lifecycle lens, illustrating how usability signals may rise, shift, and eventually stabilize as (signing) tools evolve.

Table 1: Typical tool usability lifecycle.

| Lifecycle Stage | Signals | Mechanisms |
| Early Adoption | Rising reports of friction and configuration issues | Increased exposure; incomplete documentation; unstable workflows |
| Growth / Integration | Persistent or shifting friction | Integration into CI/CD; policy refinement; identity management complexity |
| Mature Ecosystem | Stabilization or decline in issue frequency | Improved defaults; community knowledge; workflow automation |

2.2 Identity-Based Software Signing

Software signing uses public-key cryptography to establish artifact integrity and authorship [77]. Traditional key-managed approaches (e.g., PGP/GPG) proved difficult to use in practice due to key lifecycle management [27, 81]. Recently, identity-based signing systems were developed; these reduce key-management friction via short-lived certificates bound to federated identities [21, 90]. We review the concepts of identity-based signing (§ 2.2.1) and then situate our study by summarizing prior empirical findings on signing usability (§ 2.2.2).

2.2.1 Concepts and Architecture

Software signing is the primary cryptographic mechanism used to establish artifact integrity and provenance in modern software supply chains [12, 25, 27, 37, 38, 67]. It builds on public-key cryptography, where a signer holds a private key and distributes a corresponding public key, and a verifier verifies the identity of the signer using the distributed public key [15, 23, 40].
When verification succeeds, the verifier gains evidence that the artifact has not been modified since signing and that the signer controlled the corresponding private key at signing time [40]. In practice, these guarantees depend not only on cryptographic primitives but also on how signing keys or credentials are created, protected, rotated, revoked, discovered, and bound to identities [35, 38, 76, 81]. Conventional key-managed signing workflows, described in § B, require developers to create and manage long-lived keys directly, a responsibility that has historically posed persistent usability and operational challenges [81].

Identity-based signing is a recent response to this challenge. This approach replaces developer-managed long-lived keys with short-lived credentials issued by external identity and certificate services [18, 23]. Instead of requiring developers to create and distribute long-lived keys, identity-based ecosystems authenticate a human or workload through an identity provider and bind that identity to signing material through a credential-issuing service. This approach commonly issues short-lived certificates for ephemeral public keys, enabling verification to rely on time-scoped and auditable identity bindings rather than indefinitely trusted keys. As a result, identity binding and credential lifecycle management are shifted from individual developers to shared ecosystem infrastructure.

Figure 1: Identity-based signing uses short-lived certificates instead of long-lived keys. Typical architectures decompose signing and verification across four components (Table 7).

Identity-based signing ecosystems are typically architected using four primary components (Figure 1). Appendix Table 7 elaborates on each component, but succinctly:
• Orchestrator: Primary developer interface that creates or selects artifacts, obtains identity tokens, generates ephemeral key pairs, and produces signatures over artifacts.
• Identity Provider (IdP): Authenticates a human or workload and issues an identity token for other services.
• Credential Issuer / Certificate Authority: Binds authenticated identity to signing material, commonly by issuing a short-lived certificate for an ephemeral public key.
• Certificate and Signature Logs: Append-only logs that record certificate issuance and signing events to support transparency, monitoring, and audit.

These components collectively implement the workflow illustrated in Figure 1. The typical architectural decomposition allows customization and reuse across ecosystems, yet also expands the configuration surface and integration boundaries that shape signing and verification semantics, and with them usability.

2.2.2 Empirical Evidence on Signing Usability

Usability challenges in software signing were first documented in studies of key-managed cryptographic tools, which showed that public-key workflows impose substantial cognitive and operational burden on users. Early usable-security research demonstrated that even technically proficient users like "Johnny" struggled to correctly generate, manage, and verify cryptographic keys, often misunderstanding trust models, misconfiguring tools, and failing to execute key lifecycle tasks correctly [7, 17, 63, 64, 81]. Even twenty years later, empirical measurements of software package registries showed low signing rates outside mandatory settings, and that public-key management failures are common in maintainer-managed workflows [21, 76]. In key-based signing, signing failures are due to usability issues more than faulty cryptography.

Recent work has begun to examine the usability of identity-based signing. To date, only interview-based data is available about the usability of identity-based signing [35, 38]. Interviews indicate that, in practice, identity-based approaches alter but do not eliminate usability problems.
Identity-based signing introduces new configuration complexity and coordination challenges, and redistributes workflow roles across orchestrators, identity providers, and credential issuers. These studies suggest that identity-based designs change where usability problems emerge rather than eliminating them.

Our Contribution. Our study contributes to this literature by examining how signing usability manifests in real-world development artifacts over time. To the best of our knowledge, ours is the first study to take a mining-software-repositories approach to study usability issues in software signing tools. For identity-based signing, prior usability studies used an interview approach [35, 38]. While this method provides contextual insight into organizational adoption and developer experience, it is scoped to particular deployments and cannot easily reveal cross-ecosystem patterns or longitudinal trends.

Our repository-based analysis allows us to compare identity-based signing tools using a uniform observational lens. By examining reported friction across ecosystems and over time, we identify recurring usability surfaces, characterize the architectural distribution of concerns, and model how usability signals evolve as tools and components mature. Our approach complements prior qualitative work and provides evidence that can inform future tool design, ecosystem coordination, and signing adoption strategies.

3 Research Questions

We lack evidence about the usability problems engineers experience when using tools that implement identity-based signing. We address this gap by analyzing usability concerns in public issue discussions across multiple identity-based software signing tools over time. We ask three research questions:
• RQ1: What usability problems are reported for identity-based software signing tools?
• RQ2: Which functionalities of identity-based software signing tools are most frequently involved in usability problems? Are some components or boundaries high-friction?
• RQ3: How do reported usability problems change as identity-based software signing tools mature? Do reported usability concerns decline, persist, or change character over time, and do trajectories differ across tools?

4 Methodology

This section describes our methodology¹ (Figure 2) for answering our research questions. We first select tools and collect issue data (§ 4.1). We then process and code issues to identify usability concerns and affected components (§ 4.2). Finally, we analyze how these concerns distribute across tools and how they change over time (§ 4.3).

¹ Our study artifacts are available at https://github.com/nextgenusability/longitudinal_usability_id_signing.

4.1 Tool Selection & Data Collection

Here we explain our choice of identity-based signing tools (§ 4.1.1), the suitability of our data source (§ 4.1.2), and our data collection process (§ 4.1.3).

4.1.1 Identity-Based Signing Tool Selection

To be included in our study, a tool (and its components) must fully or partially follow the identity-based workflow described in § 2.2.1. We further prioritize tools with substantial public activity in their GitHub repositories (as reflected by stars and issue volume) and evidence of use in practice [18, 35, 38, 77].

We select five identity-based signing ecosystems based on their presence in the literature [18, 35, 38, 77] and use in practice. They are: Sigstore [68] (used by PyPI [33], Yahoo [88], Kubernetes [79], etc.), Notary v2 [53] (used by AWS and Microsoft [52]), OpenPubKey [56] (used by Docker [73], BastionZero [3]), Keyfactor [43] (users include Siemens and Schneider Electric [2]), and IBM's HashiCorp Vault [1, 32], used by Mantech [30] and others [31].

We only select components of these tools corresponding to the architecture described in Figure 1. For example, for Sigstore, we include the orchestrator (Cosign), the certificate authority (Fulcio), and the log (Rekor). Our selection was based on analysis of official documentation, repository structure, and workflow descriptions for each ecosystem, and was validated through author consensus. Table 2 summarizes the repositories, their component roles, and the number of issues analyzed per component.

Table 2: The analyzed projects. The projects range in age from 4 years (Keyfactor–SignServer) to 11 years (HashiCorp–Vault). They are sponsored by a mix of companies (Keyfactor, HashiCorp) and foundations (Sigstore, Notary, OpenPubKey).

| Tool | Component | Role | Stars (Forks) | # Issues |
| Sigstore | Cosign | Orchestrator | 5.6k (687) | 872 |
| Sigstore | Fulcio | CA | 796 (168) | 194 |
| Sigstore | Rekor | Certificate/signature log | 1.1k (197) | 251 |
| Keyfactor | SignServer | Orchestrator | 407 (51) | 23 |
| Keyfactor | EJBCA-CE | CA | 869 (155) | 189 |
| Notary v2 | Notation | Orchestrator | 460 (91) | 333 |
| OpenPubKey | OpenPubKey | Orchestrator | 881 (68) | 147 |
| HashiCorp | Vault | Main interface / credential issuer | 35k (4.6k) | 1891 |

4.1.2 Data Source: GitHub Issues as Usability Evidence

We use GitHub issues and their associated comment threads to study developer-reported usability problems in identity-based software signing tools. Issue reports are a natural data source for this study because they capture problems encountered during real adoption, integration, and maintenance work, and they preserve the context of troubleshooting and resolution through discussion [11]. This source is also aligned with our goal of characterizing usability problems that arise in practice, rather than usability judgments elicited through controlled tasks.
As is standard in mining software repositories research, we treat issue discussions as situated evidence of developer experience rather than as a complete record of all problems encountered [36]. Much prior work has also used GitHub issues and other similar clarification platforms, such as Stack Overflow, as a source of usability evidence [11, 58, 65, 89].

4.1.3 Issue Collection

We collect GitHub issues from the selected repositories over a 48-month window ending on November 5, 2025. We choose this window to enable longitudinal analysis while keeping the dataset within a fixed time horizon in which tool workflows and developer platforms are broadly comparable. Because most of these tools are relatively young, we focus on a shorter window rather than sampling across the full repository history.

Figure 2: Overview of our empirical pipeline for analyzing usability problems in identity-based software signing tools using GitHub issues. We show the stages of data collection and filtering, coding and LLM-assisted scaling, and downstream analysis used to answer RQ1 & RQ2 (problem categories and affected components) and RQ3 (changes over time).

For each issue, we retain issue metadata and link the issue to its comment thread to preserve troubleshooting and resolution context. We exclude bot-generated issues to focus our analysis on developer-reported usability problems. In total, our corpus contains approximately 3.9k issues spanning the eight selected component repositories across the five tools, with per-repository counts summarized in Table 2.

4.2 Data Processing and Coding

This section describes how we annotated issues and scaled coding with LLMs [80]. We first establish an annotation scheme through human pair annotation. We then apply LLM assistance to transfer this scheme across the full corpus.
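The collection filter from § 4.1.3 (a fixed window, with bot-generated issues excluded) might look like the following minimal sketch. It assumes issue records shaped like GitHub REST API responses, with `created_at` and `user` fields. The window start date, the bot-detection rule, and the function names are illustrative assumptions, not the study's actual pipeline.

```python
from datetime import datetime, timezone

# Assumed window bounds: the paper gives the end date (Nov 5, 2025) and a
# 48-month length; the exact start timestamp here is our assumption.
WINDOW_START = datetime(2021, 11, 5, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 11, 5, 23, 59, 59, tzinfo=timezone.utc)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2023-01-10T08:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_bot(issue):
    """Hypothetical rule: treat 'Bot' accounts and '[bot]' logins as bots."""
    user = issue.get("user") or {}
    return user.get("type") == "Bot" or user.get("login", "").endswith("[bot]")


def keep_issue(issue, start=WINDOW_START, end=WINDOW_END):
    """Keep human-authored issues created inside the study window."""
    if is_bot(issue):
        return False
    return start <= parse_timestamp(issue["created_at"]) <= end


sample = [
    {"created_at": "2023-01-10T08:00:00Z", "user": {"login": "alice", "type": "User"}},
    {"created_at": "2023-01-11T08:00:00Z", "user": {"login": "dependabot[bot]", "type": "Bot"}},
    {"created_at": "2020-01-01T00:00:00Z", "user": {"login": "bob", "type": "User"}},
]
kept = [issue for issue in sample if keep_issue(issue)]
```

In this toy sample only the first record survives: the second is filtered as a bot and the third predates the window.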
4.2.1 Establishing An Issue Annotation Scheme: Human Pair Annotation

We began issue processing by developing a characterization framework for issue content, usability relevance, and usability themes. Two analysts (A and B) conducted this phase.² To develop the framework, we selected a stratified sample of 180 issues (approximately 4.6% of the 3.9k-issue corpus). We allocated samples proportionally across repositories and further stratified by time (early, middle, late thirds of the window), discussion intensity (short versus long comment threads), and issue labels (e.g., bug, enhancement, question, documentation) to avoid over-sampling a single issue type. We used labels as a guide rather than as ground truth.

² Only Analyst A had prior domain knowledge of identity-based signing.

Calibrating Issue Understanding Between Analysts. To establish a shared understanding of identity-based signing and issue context, both analysts independently memoed a subset of 50 issues from the 180-issue sample. For each issue, we used the title, labels, body, and comments to memo the affected component functionality and a plain-language description of what the issue reported. We computed the percentage agreement to assess semantic alignment between analysts.³ Across the 50 issues, analysts reached 90% agreement on affected component functionality and 88% agreement on issue understanding. These results indicate that the memo fields and heuristics provided sufficient context for subsequent coding.

Calibrating Usability Versus Non-Usability Issues. Next, we distinguished usability issues from non-usability issues. We define usability through two lenses: a summative lens, which assesses whether users can achieve specified goals effectively, efficiently, and satisfactorily, and a formative lens, which diagnoses difficulties users encounter and identifies areas for improvement to support product evolution [9, 34, 47, 78].
We treat an issue as usability-relevant when the tool is not blocked primarily by a functional defect, but the user cannot achieve goals efficiently, effectively, or satisfactorily due to workflow design, interface behavior, feedback, or documentation [34, 47]. When issue reports were ambiguous, we relied on follow-on discussion and the apparent resolution to infer the barrier elements rather than relying only on the initial report. For example, a user may initially report a bug that later discussion reveals to be confusion about configuration or tool behavior. Both analysts independently labeled a subset of 100 issues as usability or non-usability based on these criteria.⁴ Initial agreement was 70%. After refining our definitions and identification guidelines in the codebook, all remaining disagreements were resolved through discussion.

³ Percent agreement was used since this was an open coding task for calibrating expectations [24].
⁴ We selected 100/180 following recommendations to code qualitative data in batches to enable disagreement resolution and iterative refinement [57].

Generating Usability Themes. Next, we grouped usability-labelled issues into themes that reflect recurring problems in terms of affected component functionality and the nature of the usability breakdown. We applied a two-stage theming approach. First, we inductively grouped issues into common themes based on functional and implicational similarities in the reported concerns. Analyst A generated these themes inductively, leveraging domain knowledge to identify meaningful similarity conditions. A single issue could be assigned to multiple themes depending on context; for example, an issue may begin as user confusion and later reveal a missing feature, in which case it is assigned to both a confusion-related theme and a missing-feature theme [54].
We further organized these themes into higher-order secondary themes to present results at a coarser level of abstraction. All generated themes are listed in § C. Second, we mapped the usability issues onto Nielsen's usability heuristics to situate our findings within a widely recognized usability framework [39, 65]. We use this heuristic mapping as an organizing lens rather than as a substitute for our inductive themes.

Nielsen Usability Heuristics. Nielsen's usability heuristics provide a widely used set of usability principles for evaluating interactive systems [50]. Although originally proposed for user interface evaluation, these heuristics have been extended to complex, domain-specific applications [39], which include identity-based software signing tooling. This framework has been used in prior usability research [28, 65, 82]. We use the heuristics as an additional analysis lens to organize and compare developer-reported concerns across tools using a stable, recognizable usability vocabulary. Specifically, we map each usability-relevant issue to one or more heuristics: N1 (Visibility of System Status), N2 (Match Between System and the Real World), N3 (User Control and Freedom), N4 (Consistency and Standards), N5 (Error Prevention), N6 (Recognition Rather than Recall), N7 (Flexibility and Efficiency of Use), N8 (Aesthetic and Minimalist Design), N9 (Help Users Recognize, Diagnose, and Recover from Errors), and N10 (Help and Documentation). This mapping complements our inductively generated themes by supporting comparison with prior usable-security studies that report results in heuristic terms.

4.2.2 LLM-Assisted Coding At Scale

To scale coding beyond the human-annotated sample, we used an LLM as a co-annotator, following prior work on LLM-assisted qualitative coding [5, 59, 80]. We used the 180 human-coded issues described above as both a calibration set and as in-context training material for the models.
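The multi-label heuristic mapping from § 4.2.1 can be represented compactly, which also gives a place to validate labels that a human or LLM annotator emits. The sketch below is one possible representation. The heuristic names follow the list above, while the `Nielsen` enum and `parse_heuristics` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Nielsen(Enum):
    """The ten heuristic codes used for the mapping (names as listed above)."""
    N1 = "Visibility of System Status"
    N2 = "Match Between System and the Real World"
    N3 = "User Control and Freedom"
    N4 = "Consistency and Standards"
    N5 = "Error Prevention"
    N6 = "Recognition Rather than Recall"
    N7 = "Flexibility and Efficiency of Use"
    N8 = "Aesthetic and Minimalist Design"
    N9 = "Help Users Recognize, Diagnose, and Recover from Errors"
    N10 = "Help and Documentation"


def parse_heuristics(codes):
    """Validate a multi-label assignment such as ["N9", "N1"].

    Duplicates are collapsed; an unknown code raises KeyError and an empty
    assignment raises ValueError, since each usability-relevant issue maps
    to one or more heuristics.
    """
    if not codes:
        raise ValueError("a usability issue needs at least one heuristic")
    unique = {Nielsen[code] for code in codes}
    return sorted(unique, key=lambda h: int(h.name[1:]))


labels = parse_heuristics(["N9", "N1", "N9"])
```

Rejecting unknown codes at parse time is a cheap guard against free-text drift in annotator output.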
Instruction Compilation From Human Coding. After completing our human annotation steps, we manually produced an instruction guideline and codebook that operationalize our coding process across all phases of the scheme. This prompt document specifies code definitions, decision rules, and common edge cases for labeling usability versus non-usability, assigning usability-problem types, identifying implicated tool components, and applying theme-mapping rules.

Iterative Calibration On The 180-Issue Set. We calibrated GPT 5.1 (API model) using the 180 human-coded issues. Using the initial 100 human-labeled issues (§ 4.2), we refined our prompt document. We supplied the full issue text (title, labels, body, and comments) and instructed the model to generate labels in three stages: (1) issue understanding and affected components, (2) usability relevance, and (3) thematic and Nielsen-heuristic mappings. Analyst A reviewed the outputs after each stage, calculated agreement, documented failure modes, and refined the instruction prompt document to clarify ambiguous cases and reinforce decision rules [44]. To assess reliability, we compared LLM outputs to human annotations on the held-out 80-issue batch for each phase.

Phase 1. For issue understanding and implicated components, we report percent agreement because this phase primarily validated the model's interpretive accuracy. Agreement was 66.7% for affected components and 91.9% for issue understanding.

Phase 2. We compared human and LLM labels for usability relevance. Because this binary task was highly imbalanced (usability ≫ non-usability), we report Gwet's AC1 [85], which is more stable than Cohen's κ under skewed prevalence. We obtained AC1 = 0.957 (near-perfect agreement) and Cohen's κ = 0.856.
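To illustrate why AC1 and κ diverge under skewed prevalence, the sketch below computes both from two label sequences using the standard two-rater formulas. The confusion counts (51, 1, 1, 7) are a hypothetical example that we chose because it is consistent with the Phase 2 figures. They are an assumption for illustration, not data taken from the study, and `agreement_stats` is an illustrative name.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa, Gwet's AC1) for two raters."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)  # must be at least 2
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)

    # Cohen's kappa: chance agreement from the product of rater marginals.
    chance_kappa = sum((count_a[k] / n) * (count_b[k] / n) for k in categories)

    # Gwet's AC1: chance agreement from the average marginal pi_k.
    pi = {k: (count_a[k] + count_b[k]) / (2 * n) for k in categories}
    chance_ac1 = sum(pi[k] * (1 - pi[k]) for k in categories) / (q - 1)

    kappa = (observed - chance_kappa) / (1 - chance_kappa)
    ac1 = (observed - chance_ac1) / (1 - chance_ac1)
    return observed, kappa, ac1


# Hypothetical counts: both say usability (51), human-only (1),
# model-only (1), both say non-usability (7).
human = ["U"] * 51 + ["U"] * 1 + ["N"] * 1 + ["N"] * 7
model = ["U"] * 51 + ["N"] * 1 + ["U"] * 1 + ["N"] * 7
observed, kappa, ac1 = agreement_stats(human, model)
print(round(observed, 3), round(kappa, 3), round(ac1, 3))  # 0.967 0.856 0.957
```

With most labels in one class, κ's chance term is large (about 0.77 here), so κ is pulled down even though raw agreement is high. AC1's chance term shrinks as prevalence becomes more extreme, which is the stability property the text refers to.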
The classifier achieved precision = 98.08%, recall = 98.08%, and F1 = 98.08%, with specificity = 87.50%, indicating strong performance on both the majority usability class and the minority non-usability class.

Phase 3. We evaluated reliability for multi-label theme assignments. Percent agreement was 83.9% for Nielsen themes, 75.8% for Associated Component Themes, and 62.5% for inductive themes. For chance-corrected estimates, we report Gwet's AC1: Nielsen theme AC1 = 0.84; Associated Component Theme AC1 = 0.86; and inductive theme AC1 = 0.91.

4.3 Data Analysis

The thematic mapping described in § 4.2 forms the first step of our analysis. After classifying 3,900 issues, we identified 2,965 as usability-related. We organize these issues into two complementary theme sets: (1) inductively derived themes at two levels (a primary theme with associated lower-level themes), and (2) mappings to Nielsen's usability heuristics to situate our findings within a widely recognized usability framework. We follow an iterative grouping process to consolidate semantically overlapping codes and to form themes that summarize recurrent usability problems across tools and over time [19]. These themes serve as the foundation for describing how usability problems vary across ecosystems and throughout the study window.

To answer RQ1 and RQ2, we combine thematic synthesis with descriptive summaries. We use the themes to characterize recurring categories of developer-reported usability problems and to summarize the tool functionality implicated by these issues, yielding a consistent vocabulary for downstream analysis. We then quantify and visualize theme frequencies using descriptive statistics that show how usability themes distribute across tools and component roles (e.g., orchestrator, credential issuer, and log). Together, these analyses identify what usability problems are reported and where they most often arise in identity-based signing tools.

Table 3: Usability theme distribution by tool, normalized to show the percentage of theme assignments per tool. Columns are grouped under four higher-order themes: Operational Friction, Cognitive Friction, Functional Reliability, and Functional Gap. Notation: Config = configuration friction; Auth = authentication friction; Build/Rel = build/CI/install/distribution issues; Integr = integration; Workflow = tedious workflows; Docs = user confusion/unclear docs; Notif/Logs = notification/logging; Unexp = unexpected behavior; Perf = performance; Sec = security; Missing = missing feature/enhancement.

| Tool | Config | Auth | Build/Rel | Integr | Workflow | Docs | Notif/Logs | Unexp | Perf | Sec | Missing |
| Sigstore-Cosign | 4.4 | 2.6 | 4.7 | 13.9 | 6.5 | 18.5 | 4.4 | 9.9 | 1.0 | 4.8 | 29.4 |
| Sigstore-Fulcio | 3.9 | 1.8 | 4.3 | 13.6 | 2.9 | 11.4 | 4.3 | 6.8 | 1.1 | 11.1 | 38.9 |
| Sigstore-Rekor | 2.2 | 0.0 | 4.4 | 10.2 | 4.0 | 16.8 | 5.0 | 10.2 | 3.7 | 6.5 | 37.0 |
| Keyfactor-SignServer | 11.6 | 9.3 | 11.6 | 9.3 | 2.3 | 30.2 | 7.0 | 4.6 | 2.3 | 0.0 | 11.6 |
| Keyfactor-EJBCA | 14.7 | 3.9 | 12.4 | 10.5 | 1.3 | 27.4 | 8.5 | 11.1 | 0.6 | 2.0 | 7.5 |
| Notaryv2-Notation | 4.2 | 1.5 | 5.0 | 7.4 | 10.1 | 8.8 | 12.6 | 6.3 | 1.7 | 3.6 | 38.7 |
| OpenPubKey | 1.3 | 3.3 | 5.3 | 7.3 | 2.7 | 10.0 | 4.0 | 6.0 | 0.7 | 10.7 | 48.7 |
| HashiCorp-Vault | 7.2 | 4.4 | 3.6 | 10.6 | 7.9 | 16.4 | 11.1 | 15.1 | 1.9 | 3.0 | 18.9 |

To answer RQ3, we examine trends in developer-reported usability issues. Guided by § 2.1, we structure our inquiry around two working hypotheses:
• H1: As identity-based software signing tools (and their components) mature, the frequency of reported usability issues decreases over time.
• H2: The rate of change in reported usability issues differs across tools, with more mature tools exhibiting stronger downward trends.

We construct monthly time series of usability-issue counts and examine trends across the aggregated corpus and within each tool-specific repository.
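As a concrete illustration of this step, the sketch below bins issue creation dates into monthly counts and fits a log-linear Poisson trend, $\log(\mathbb{E}[Y_t]) = \beta_0 + \beta_1 t$, by Newton-Raphson. It is a minimal standard-library sketch under our own assumptions (function names, calendar-month binning, a flat-trend starting point), not the study's analysis code, which may well rely on a statistics package.

```python
import math
from datetime import date


def monthly_counts(created, start, n_months):
    """Count issues per calendar month, indexed from the month of `start`."""
    counts = [0] * n_months
    for d in created:
        t = (d.year - start.year) * 12 + (d.month - start.month)
        if 0 <= t < n_months:  # drop issues outside the window
            counts[t] += 1
    return counts


def fit_poisson_trend(y, max_iter=100, tol=1e-10):
    """Fit log(E[Y_t]) = b0 + b1 * t by Newton-Raphson; return (b0, b1)."""
    n = len(y)
    mean_y = sum(y) / n
    if n < 2 or mean_y <= 0:
        raise ValueError("need at least two periods and one nonzero count")
    b0, b1 = math.log(mean_y), 0.0  # start from a flat trend
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * t) for t in range(n)]
        # Score vector X^T (y - mu) for the columns [1, t].
        g0 = sum(y[t] - mu[t] for t in range(n))
        g1 = sum((y[t] - mu[t]) * t for t in range(n))
        # Fisher information X^T diag(mu) X, a symmetric 2x2 matrix.
        s0 = sum(mu)
        s1 = sum(mu[t] * t for t in range(n))
        s2 = sum(mu[t] * t * t for t in range(n))
        det = s0 * s2 - s1 * s1
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (s0 * g1 - s1 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy usage: three issues fall in the window, one falls outside it.
dates = [date(2021, 11, 10), date(2021, 11, 20), date(2022, 1, 3), date(2026, 1, 1)]
series = monthly_counts(dates, start=date(2021, 11, 1), n_months=48)
```

Given a fitted slope `b1`, `math.exp(b1)` is the per-month rate ratio: values below 1 indicate declining issue counts and values above 1 indicate growth.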
Because the outcome is a non-negative integer count per time unit, we model temporal trends using Poisson regression. Poisson models are appropriate for count data because they (i) directly model event counts, (ii) ensure nonnegative fitted values, and (iii) allow multiplicative interpretation of time effects via the log link (e.g., a per-month rate ratio exp(β₁)) [10]. This formulation provides a concise estimate of the direction and magnitude of change over time and supports comparisons of trend strength across tools.

\[
\log\left(\mathbb{E}[Y_t]\right) = \beta_0 + \beta_1 t, \tag{1}
\]

where Y_t is the number of developer-reported usability issues in time period t (t indexes months over the study window). We fit Eq. (1) to the aggregated corpus and to each tool-specific series to evaluate whether issue frequency decreases over time (H1) and whether rates of change differ across tools (H2).

4.4 Threats to Validity

Construct Validity describes errors due to operationalizations of constructs. We study two primary constructs: usability and architectural surface. We operationalize usability using standard frameworks (e.g., Nielsen), and track it via developer-reported GitHub issues (which reflect only visible friction). Issues were classified using LLM-assisted coding with human adjudication; although calibration and agreement checks mitigate bias, some misclassification remains possible. Mapping issues to components in our architectural model necessarily includes some simplification of complex systems.

Internal Validity concerns causal interpretation of observed patterns. We interpret issue frequency and temporal trends as indicators of usability. We acknowledge that temporal trends may also be influenced by factors such as adoption dynamics and reporting practices. Differences across tools may be confounded by ecosystem size, maturity, or organizational backing.
However, our findings describe patterns consistent with prior research on longitudinal usability.

Statistical Conclusion Validity addresses the reliability of quantitative inferences. Our longitudinal analyses rely on Poisson regression and assume independence of issues within time intervals, and clustering or overdispersion could affect estimates. We interpret statistical results as directional evidence rather than precise effect magnitudes. Pearson dispersion statistics across the eight repositories (5 tools) ranged from 0.892 (Rekor) to 7.156 (OpenPubKey), with five of eight repositories exceeding the overdispersion threshold of 1.5. Residual autocorrelation was significant for Notation and OpenPubKey, indicating temporal clustering not fully captured by a single linear slope. We addressed these violations in two ways: (1) re-estimating all models with HC0 heteroskedasticity-robust standard errors, which did not change any significance conclusion; and (2) fitting negative-binomial sensitivity models, which preserved the direction and significance of all trends. The one substantive caveat is OpenPubKey, where the estimated magnitude of the positive trend is sensitive to model choice (β_Poisson = 0.040 vs. β_NB = 0.104), though the positive direction remains consistent across both specifications. Full diagnostics and sensitivity results are reported in § C.2 and Tables 9 to 12.

External Validity concerns generalization beyond the studied tools. Our dataset comprises public usability issues reported in five prominent open-source identity-based signing ecosystems. Our results may not generalize to proprietary signing systems, nor to silent usage contexts (e.g., Sigstore is used by the US DoD [66]).

5 Results

In this section, we present our findings by research question.

5.1 RQ1: Reported Usability Problems

Finding 1: Overall, RQ1 shows that reported usability problems are broad but patterned.
Unmet functional expectations and operational workflow friction jointly dominate, with information clarity concerns as a consistent secondary burden. These categories recur across all tools rather than being isolated to a single implementation.

We answered RQ1 through two complementary views. In § 5.1.1 we summarize the inductively derived usability themes from thematic synthesis. In § 5.1.2 we map issues to Nielsen's usability heuristics as a deductive lens.

5.1.1 Inductively-Derived Usability Themes

Table 3 shows the result of our inductive thematic analysis. At the primary-theme level, the largest categories across the corpus are Missing feature/enhancement request, User confusion/unclear documentation, Unexpected behavior, Integration failure/issues, Notification/Logging issues, Tedious Workflows, Configuration friction, and Build/CI/installation/distribution/release issues. These themes recur across all tools, though their relative proportions vary by project.⁵ For example, Missing feature friction is comparatively high for Notation, OpenPubKey, Fulcio, and Cosign, while Notification/Logging issues are especially prominent for Vault and Notation. OpenPubKey shows a relatively higher Missing feature share than other tools, consistent with its status as an emerging project where core functionality gaps remain salient to users. Taken together, these patterns suggest that usability burden in identity-based signing tools is driven primarily by unmet functional expectations and setup/integration workflows, rather than isolated, tool-specific edge cases, with user-facing clarity concerns as a consistent secondary burden across all tools.

⁵ Because issue volume is uneven across repositories (e.g., Vault contributes substantially more issues), we report normalized percentages in Table 3 to support cross-tool comparison of theme composition.

Table 4: Distribution of Nielsen usability themes (N1–N10, cf. § 4.2.1) by tool (percent of theme assignments per tool).

| Tool | N1 | N2 | N3 | N4 | N5 | N6 | N7 | N8 | N9 | N10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Cosign | 3.5 | 6.6 | 2.6 | 12.3 | 9.2 | 2.6 | 27.4 | 1.8 | 12.9 | 21.1 |
| Fulcio | 4.1 | 5.1 | 0.7 | 12.0 | 12.0 | 2.4 | 33.9 | 2.4 | 6.5 | 20.9 |
| Rekor | 4.6 | 7.2 | 0.3 | 12.7 | 8.9 | 2.0 | 32.3 | 1.2 | 11.0 | 19.9 |
| SignServer | 2.2 | 0.0 | 0.0 | 6.7 | 11.1 | 0.0 | 15.6 | 2.2 | 24.4 | 37.8 |
| EJBCA | 4.8 | 4.8 | 1.1 | 10.5 | 14.0 | 0.3 | 12.5 | 0.3 | 21.1 | 30.5 |
| Notation | 8.5 | 5.7 | 1.6 | 12.7 | 8.5 | 4.8 | 26.9 | 2.8 | 10.3 | 18.2 |
| OpenPubKey | 0.6 | 4.2 | 3.0 | 9.7 | 17.6 | 1.8 | 38.8 | 3.6 | 6.1 | 14.5 |
| Vault | 7.3 | 6.2 | 2.5 | 12.5 | 11.4 | 1.9 | 20.3 | 1.3 | 15.9 | 20.7 |
| Average | 4.4 | 5.0 | 1.5 | 11.2 | 11.3 | 2.0 | 26.5 | 2.0 | 12.7 | 23.3 |

Abstracting these into secondary themes, Operational Friction and Functional Gap jointly dominate the corpus for every tool. Averaged across tools, Functional Gap leads at 32.7%, followed by Operational Friction (30.9%), Cognitive Friction (23.4%), and Functional Reliability (13.0%). A volume-weighted summary shows a similar distribution: Operational Friction 32.0%, Functional Gap 29.8%, Cognitive Friction 22.7%, and Functional Reliability 15.4%. This indicates that usability burden is driven primarily by setup/integration/workflow barriers and unmet functional expectations, with cognitive clarity concerns also significant. Tool-level variations are notable: EJBCA and SignServer show the highest Operational and Cognitive Friction shares (Operational friction: 35.1% each; Cognitive friction: 34.4% and 35.1% respectively).
Functional Reliability concerns are generally small overall; however, some variation across tools exists: Vault and EJBCA show comparatively higher Functional Reliability shares (19.5% and 17.4% respectively).

The distributions are similar by tool. We performed omnibus chi-square tests of independence on the tool × theme-family contingency tables for the induced primary and secondary themes (N = 5,924 theme assignments). At the primary level the result was significant, χ²(70) = 693.08, p < 0.001, with a small-to-moderate effect size (Cramér's V = 0.129). At the level of secondary themes, the association remained significant, χ²(21) = 321.01, p < 0.001, with a similarly small-to-moderate effect size (Cramér's V = 0.134). This indicates statistically reliable associations between themes and tools, with effect sizes that are meaningful but modest in practical terms. Follow-up binary tests confirm that the largest per-theme effects are concentrated in Missing feature/enhancement request (V = 0.215) and the workflow components Signing Workflow (V = 0.307) and Verification Workflow (V = 0.306), the latter two reflecting architectural specificity rather than differential usability burden. However, from Table 3, no tool shows a systematically different theme profile across the majority of categories. Hence, while tool-specific variation exists, usability challenges are broadly shared across ecosystems.

5.1.2 Nielsen Heuristic Distribution

To situate these findings within a widely used usability framework, we map issues to Nielsen's usability heuristics. Table 4 shows the distribution. In this framework, issue distributions are concentrated in N7 (Flexibility and efficiency of use) and N10 (Help and documentation), with some occurrence of N9 (Help users recognize, diagnose, and recover from errors).

Synthesizing the two views. The two analyses produce aligned results.
The prominence of N7 and N10 corresponds to the workflow overhead (too many steps, weak automation ergonomics, poor integration affordances) and the documentation gaps (missing/outdated/unclear guidance) identified in our inductive taxonomy.

5.2 RQ2: Functional Components Implicated in Usability Reports

Finding 2: In RQ2, we learn that reported usability concerns are concentrated in (1) CLI tooling and API interaction surfaces, (2) policy and configuration boundaries, and (3) signing and verification workflow components. These results indicate the components and boundaries where usability friction most often appears in identity-based signing tools.

To answer RQ2, Table 5 gives a functionality view, and Appendix Table 13 shows a component-level view.

Functionality view (Table 5): Across the corpus, the most frequently implicated functionalities are CLI Tooling (1,110; 19.4%), Policy/Configuration (847; 14.8%), API (654; 11.4%), Authentication/Authorization (495; 8.7%), and Signing Workflow (407; 7.1%), followed by Verification Workflow (390; 6.8%) and Web Client (372; 6.5%). This indicates that usability burden is concentrated in interface/workflow surfaces and integration paths rather than in a single narrow subsystem. Verification Workflow (390; 6.8%) and Signing Workflow (407; 7.1%) together account for 13.9% of assignments (797 of 5,722), confirming that the core cryptographic operations of identity-based signing are themselves a meaningful friction surface, not merely a background concern relative to infrastructure and tooling issues. Thus, reported usability concerns extend beyond signing/verification semantics into broader developer interaction surfaces (CLI, API, configuration, and authentication boundaries).

Table 5: Top affected functionalities across all tools. Functionalities reflect interface-related components (†) and core signing workflows (∗). Counts denote the number of functionality–theme assignments, and shares represent the percentage of all assignments (N = 5,722). Because issues may implicate multiple functionalities, a single issue can have more than one assignment. Secrets Backend (Key Mgmt Core) reflects Vault-specific secrets-management infrastructure not required to achieve identity-based signing (80.3% Vault-origin) and is retained in the ranking for completeness.

| Functionality | Count | Share (%) |
|---|---|---|
| CLI Tooling † | 1,110 | 19.4 |
| Policy/Configuration † | 847 | 14.8 |
| API † | 654 | 11.4 |
| Authentication/Authorization † | 495 | 8.7 |
| Signing Workflow ∗ | 407 | 7.1 |
| Verification Workflow ∗ | 390 | 6.8 |
| Web Client † | 372 | 6.5 |
| Build/CI/Installation † | 340 | 5.9 |
| Notification/Logging † | 254 | 4.4 |
| Secrets Backend (Key Mgmt Core) † | 563 | 9.8 |

Component-level view (Table 13): Examining these functionalities on a per-component basis, usability profiles are broadly shared but not identical across repositories. Per Table 13, Vault is led by Policy/Configuration (18.4%) and Secrets Backend (16.9%; not a core identity-based signing feature), with substantial API burden (15.8%) and CLI tooling and Authentication/Authorization burden (12.6% each). Notation and Cosign both emphasize CLI tooling (36.2% and 30.3% respectively), while Cosign additionally shows relatively large Verification Workflow (17.5%) and Signing Workflow (17.5%) shares. Rekor is more weighted toward API (39.1%) and CLI tooling (20.2%). Fulcio emphasizes Policy/Configuration (22.2%) and API (15.4%). OpenPubKey differs by showing Authentication/Authorization as its top component (23.4%). Keyfactor EJBCA is led by Policy/Configuration (25.6%) and Web Client (24.0%); SignServer shows a similar pattern with Web Client leading (25.0%). Taken together, these results suggest recurring cross-tool hot spots at CLI/API/configuration interfaces, with tool-specific emphasis in verification, authentication, and web-client areas.
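The shares reported in Table 5 follow directly from the assignment counts and the total N = 5,722 given in its caption. As a small arithmetic check, the sketch below recomputes them in Python. The counts are copied from Table 5; the variable and function names are illustrative and not part of the study's tooling.

```python
# Counts copied from Table 5; N is the total number of
# functionality-theme assignments stated in the table caption.
N = 5722
counts = {
    "CLI Tooling": 1110,
    "Policy/Configuration": 847,
    "API": 654,
    "Authentication/Authorization": 495,
    "Signing Workflow": 407,
    "Verification Workflow": 390,
    "Web Client": 372,
    "Build/CI/Installation": 340,
    "Notification/Logging": 254,
    "Secrets Backend (Key Mgmt Core)": 563,
}


def share(count, total=N):
    """Return count as a percentage of all assignments."""
    return 100.0 * count / total


# Per-functionality shares, rounded to one decimal as in the table.
shares = {name: round(share(c), 1) for name, c in counts.items()}

# Combined share of the two core signing workflows.
core_workflows = share(counts["Signing Workflow"] + counts["Verification Workflow"])

for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:35s} {pct:5.1f}%")
print(f"Signing + Verification workflows: {core_workflows:.1f}%")
```

Only the top functionalities are listed, so the listed counts do not sum to N; the shares are still computed against the full total.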
We note an important construct-validity caveat: the high share of CLI tooling issues may partly reflect tool composition in our sample, since several selected systems are CLI-first or expose substantial CLI surfaces. Therefore, CLI prominence should be interpreted as both a substantive usability signal and a sampling-sensitive effect.

Table 6: Poisson time-trend estimates for monthly counts of usability-related issues (all themes combined). Negative β₁ indicates decreasing counts over time. Statistically significant trends (p < 0.05) are bolded.

| Repository | β₁ | RR/Month | %/Month | p | Total |
|---|---|---|---|---|---|
| sigstore/cosign | **-0.0382** | 0.963 | -3.75 | 2.43 × 10^-40 | 716 |
| sigstore/fulcio | **-0.0479** | 0.953 | -4.67 | 4.14 × 10^-13 | 148 |
| sigstore/rekor | **-0.0505** | 0.951 | -4.92 | 2.63 × 10^-17 | 186 |
| keyfactor/signserver-ce | 0.0126 | 1.013 | 1.27 | 0.408 | 22 |
| keyfactor/ejbca-ce | -0.0001 | 1.000 | -0.01 | 0.991 | 146 |
| notaryproject/notation | **-0.0322** | 0.968 | -3.17 | 1.20 × 10^-11 | 250 |
| openpubkey/openpubkey | **0.0404** | 1.041 | 4.12 | 3.39 × 10^-7 | 96 |
| hashicorp/vault | **-0.0250** | 0.975 | -2.47 | 3.00 × 10^-37 | 1,401 |

5.3 RQ3: How do reported usability problems change over time?

Finding 3: In RQ3, trends indicate non-uniform maturity across identity-based signing tools and concern types. Repository-level Poisson models show significant declines in developer-reported usability problems for Sigstore (Cosign, Fulcio, Rekor), Vault, and Notation, while Keyfactor exhibits weak or mixed trends and OpenPubKey shows increasing counts over the study window. Theme- and component-level analyses further reveal that declines are not uniform: some usability surfaces stabilize earlier, while others persist or increase. Together, these results provide partial support for H1 (decline with maturity) and support H2 (heterogeneous trajectories across tools and concern types).
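The RR/Month and %/Month columns of Table 6 are transformations of the slope β₁ under the log link: the per-month rate ratio is exp(β₁), and the percent change per month is (exp(β₁) − 1) × 100. The sketch below applies these formulas to the β₁ values copied from Table 6. The 12-month compounding helper is an added illustration, not a quantity reported in the paper, and all names are hypothetical.

```python
import math

# Per-month slopes (beta_1) copied from Table 6.
slopes = {
    "sigstore/cosign": -0.0382,
    "sigstore/fulcio": -0.0479,
    "sigstore/rekor": -0.0505,
    "keyfactor/signserver-ce": 0.0126,
    "keyfactor/ejbca-ce": -0.0001,
    "notaryproject/notation": -0.0322,
    "openpubkey/openpubkey": 0.0404,
    "hashicorp/vault": -0.0250,
}


def rate_ratio(beta1):
    """Multiplicative change in the expected count per month."""
    return math.exp(beta1)


def percent_change(beta1, months=1):
    """Percent change in the expected count after `months` months.

    Compounding over several months multiplies the per-month rate
    ratio, which is the same as exponentiating months * beta1.
    """
    return (math.exp(beta1 * months) - 1.0) * 100.0


for repo, b1 in slopes.items():
    print(
        f"{repo:26s} RR/month={rate_ratio(b1):.3f} "
        f"%/month={percent_change(b1):+.2f} "
        f"%/12 months={percent_change(b1, 12):+.1f}"
    )
```

Because Table 6 prints β₁ to four decimals, recomputed values can differ from the published columns in the last digit.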
We answered RQ3 first by considering aggregate temporal patterns across components (§ 5.3.1), and second by considering temporal trends at the tool- and theme-level (§ 5.3.2).

5.3.1 Aggregate temporal patterns across components

At the aggregate level, we structured our inquiry with two hypotheses (§ 4): (H1) frequency of reported usability issues decreases over time, and (H2) rate of decrease differs by tools.

H1 Is Supported For Some Tools, But Not All. At the repository level (Table 6), several tools exhibit statistically significant downward trajectories in monthly usability-issue counts. All three Sigstore repositories show strong, statistically significant declines: Cosign (β₁ = −0.0382, p < 0.001), Fulcio (β₁ = −0.0479, p < 0.001), and Rekor (β₁ = −0.0505, p < 0.001). Notation (β₁ = −0.0322, p < 0.001) and Vault (β₁ = −0.0250, p < 0.001) also decline significantly. In contrast, OpenPubKey increases significantly over this window (β₁ = 0.0404, p < 0.001), while Keyfactor EJBCA (β₁ = −0.0001, p = 0.991) and SignServer (β₁ = 0.0126, p = 0.408) show no statistically significant overall trend.

At the theme level, per-tool primary-theme slopes reinforce this non-uniform pattern (Figure 3). Vault shows the most consistent improvement, with all 11 primary themes declining significantly (11/11). Rekor and Cosign are nearly as consistent (9/10 and 9/11 significant negative slopes respectively), and Fulcio shows broad but slightly less uniform decline (7/11). Notation is more selective, with 5 of 11 themes declining significantly: Build/CI, Configuration friction, Missing feature, Notification/Logging, and Tedious Workflows. OpenPubKey is the only tool with significant positive slopes, with 4 themes increasing significantly. Both Keyfactor tools show limited evidence of systematic change (0 significant slopes).
Taken together, these results indicate that H1 is supported for Sigstore and Vault, partially supported for Notation, and not supported for OpenPubKey or the Keyfactor tools.

H2 Is Supported By Substantial Differences In Trend Magnitude And Consistency Across Tools. The rate of change differs both in direction and in stability across tool families. Sigstore and Vault exhibit the most consistent evidence of decreasing usability-problem reports, with significant negative trends at the repository level and broadly negative slopes across nearly all themes (Sigstore) and all themes (Vault), but Vault shows shallower slopes than the Sigstore tools. Notation trends downward overall but with theme-dependent significance, suggesting partial rather than uniform improvement. OpenPubKey trends upward overall, and its theme-level behavior confirms that this increase is concentrated in a specific subset of themes rather than a uniform rise across all concern types. Keyfactor shows no evidence of systematic change in either direction during the study window, indicating that maturity signals inferred from issue frequency are not uniform across identity-based signing tools.

5.3.2 Theme- and Tool-Level Temporal Trends

Those repository-level trends summarize net change in reported usability over time, but they do not indicate which classes of usability problems are improving (or worsening), nor whether those shifts are consistent across tools.

Theme-level and heuristic views. Across tools that trend downward, the decline is not uniform across categories. Figure 3 shows the time-trend slopes by tool, broken down by the primary usability theme from our inductive analysis (§ 5.1). Under our inductively generated primary themes, Sigstore, Vault, and Notation show broadly negative average slopes, consistent with a reduction in several classes of reported friction as tools stabilize.
Under Nielsen heuristics, trends are directionally similar but more heterogeneous (Appendix Figure 6). At the repository level, Vault shows 8 significant negative Nielsen slopes, Notation shows 5, and each Sigstore repository shows 7–8. OpenPubKey exhibits 4 significant positive Nielsen slopes, and Keyfactor shows limited evidence (EJBCA: one significant positive; SignServer: none significant). Maturity is not a single trajectory, and some classes of usability problems diminish faster than others.

Figure 3: Heatmap of Poisson time-trend slopes (β₁) by tool and inductively generated primary usability theme. Negative values indicate decreasing expected monthly issue counts over time; positive values indicate increasing counts. Only statistically significant cells (p < 0.05) are colorized and annotated with the estimated β₁; non-significant cells are masked.

Figure 4: Aggregate Poisson regression curves of expected monthly issue counts across all tools over calendar time. (a) Expected monthly issue counts by usability theme. (b) Expected monthly issue counts by affected component. Curves represent fitted Poisson means. Downward trajectories indicate decreasing expected counts (negative time slope), while upward trajectories indicate increasing expected counts (positive time slope).

Pooled theme- and component-level trajectories. To complement repository-level models, we pool issue counts across tools and fit Poisson regressions by theme and by affected component, plotting fitted means over calendar time. Figure 4(a) shows the aggregate trends by usability theme (what), and Figure 4(b) shows the aggregate trends by affected component (where). Because the Poisson mean is modeled as exp(β₀ + β₁t), negative slopes indicate exponential declines in expected monthly issue counts, while positive slopes indicate exponential increases.
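To make the fitted mean exp(β₀ + β₁t) concrete, the following is a minimal sketch of estimating the two coefficients from a monthly count series by Newton-Raphson on the Poisson log-likelihood. It is a plain-Python illustration under stated assumptions, not the authors' analysis code: the function name `fit_poisson_trend` is hypothetical, and the input is a synthetic, noise-free series rather than data from the study.

```python
import math


def fit_poisson_trend(counts, max_iter=50, tol=1e-10):
    """Fit log(E[Y_t]) = b0 + b1 * t by Newton-Raphson.

    counts[t] is the number of issues in month t (t = 0, 1, 2, ...).
    Returns the estimated (b0, b1).
    """
    n = len(counts)
    months = list(range(n))
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("series must contain at least one issue")
    # Start from the intercept-only model.
    b0, b1 = math.log(mean), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * t) for t in months]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(t * (y - m) for t, y, m in zip(months, counts, mu))
        # Fisher information for the design columns [1, t].
        h00 = sum(mu)
        h01 = sum(t * m for t, m in zip(months, mu))
        h11 = sum(t * t * m for t, m in zip(months, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Synthetic series (an assumption for illustration, not study data):
# an exponential decline with a known slope, rounded to whole counts.
true_b0, true_b1 = math.log(30.0), -0.0382
synthetic = [round(math.exp(true_b0 + true_b1 * t)) for t in range(48)]

b0_hat, b1_hat = fit_poisson_trend(synthetic)
print(f"estimated slope: {b1_hat:.4f} (generating slope: {true_b1})")
print(f"estimated rate ratio per month: {math.exp(b1_hat):.3f}")
```

In practice a statistics library would also report standard errors and dispersion diagnostics; this sketch only recovers the point estimates.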
At the inductively generated primary-theme level, fitted aggregate curves decline across all modeled themes. Estimated slopes are consistently negative and statistically significant, ranging from about −0.0193 to −0.0326 (Appendix Table 14). The steepest declines are observed for Missing feature/enhancement request (β₁ ≈ −0.0326), Security concerns (β₁ ≈ −0.0326), Authentication friction (β₁ ≈ −0.0304), and Tedious Workflows (β₁ ≈ −0.0284). Other themes also decline but more gradually (e.g., Build/CI/installation release issues and Configuration friction, both β₁ ≈ −0.0193), suggesting broad improvement with differential rates across problem classes.

At the affected-component level, aggregate fitted curves are also uniformly downward in the current dataset. All modeled component slopes are negative and statistically significant (Appendix Table 15), with the steepest declines in Verification Workflow (β₁ ≈ −0.0338) and CLI Tooling (β₁ ≈ −0.0308), followed by API and Policy/Configuration. Core interaction surfaces such as Authentication/Authorization, Web Client, and Build/CI/Installation also decline, indicating broad downward pressure in usability-problem reporting across both workflow and interface layers. Notably, this dataset does not show an increasing release-pipeline trajectory.

These aggregate declines should be interpreted alongside tool-level heterogeneity. In particular, aggregate downward movement can coexist with repository-specific increases (e.g., OpenPubKey in repository-level models), because pooled aggregate trends are weighted by cross-tool composition, volume, and differing maturation trajectories. Thus, aggregate models describe corpus-level direction, while per-tool models reveal where that direction is not uniform.
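The coexistence of an aggregate decline with a repository-level increase can be illustrated from Table 6 alone. The sketch below rebuilds approximate per-repository curves from each β₁ and issue total, then sums them. It rests on two assumptions that are not stated in this section: that the study window is 48 monthly periods indexed t = 0..47, and that each intercept can be recovered from the fact that a Poisson model with an intercept has fitted counts summing to the observed total. The summed curve is an illustration only and is not the pooled model fitted in the paper.

```python
import math

MONTHS = 48  # assumption: 48 monthly periods, indexed t = 0..47

# (beta_1, total usability issues) copied from Table 6.
table6 = {
    "sigstore/cosign": (-0.0382, 716),
    "sigstore/fulcio": (-0.0479, 148),
    "sigstore/rekor": (-0.0505, 186),
    "keyfactor/signserver-ce": (0.0126, 22),
    "keyfactor/ejbca-ce": (-0.0001, 146),
    "notaryproject/notation": (-0.0322, 250),
    "openpubkey/openpubkey": (0.0404, 96),
    "hashicorp/vault": (-0.0250, 1401),
}


def fitted_curve(beta1, total, months=MONTHS):
    """Monthly means exp(b0 + b1 * t), with b0 set so they sum to `total`."""
    weights = [math.exp(beta1 * t) for t in range(months)]
    scale = total / sum(weights)  # this factor equals exp(b0)
    return [scale * w for w in weights]


curves = {repo: fitted_curve(b1, n) for repo, (b1, n) in table6.items()}

# Sum of per-repository curves: high-volume repositories dominate it.
pooled = [sum(curve[t] for curve in curves.values()) for t in range(MONTHS)]
opk = curves["openpubkey/openpubkey"]

print(f"summed curve, first vs last month: {pooled[0]:.1f} -> {pooled[-1]:.1f}")
print(f"OpenPubKey,   first vs last month: {opk[0]:.2f} -> {opk[-1]:.2f}")
```

Under these assumptions the summed curve falls over the window even though the OpenPubKey curve rises, because Vault and Cosign contribute most of the volume.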
Together, these pooled analyses indicate changes in overall volume and shifts in the composition of reported usability concerns over time.

6 Discussion and Future Work

6.1 Data Validity and Interpretation Limits

Any empirical study must reflect on the nature of the data. As discussed in § 4.4, we observe publicly reported GitHub issues. These represent visible friction experienced by some users, not the total usability burden. Other feedback channels (e.g., enterprise support, telemetry, internal monitoring, mailing lists, or internal team discussions) may exist and could absorb or explain discrepancies in issue volume across ecosystems.

Reporting practices may also vary across projects depending on governance structure, organizational backing, and user population norms. For example, security-conscious communities may resist telemetry, and some enterprise or government deployments may triage issues through private channels. Our dataset should therefore be understood as reflecting a lower bound: reported friction within a specific public channel rather than comprehensive ecosystem usability.

A notable feature of our dataset is that approximately 76% of the LLM-analyzed issues (2,965 of 3,900) were categorized as usability-related. We manually reviewed a substantial subset of these issues and concur with the classification. Many issues are feature requests, workflow discussions, or clarification requests that directly affect usability. We believe the high proportion of usability-labeled issues reflects the types of problems that surface in well-engineered projects: these ecosystems are often sponsored by companies and maintained by expert engineering teams, which may reduce the prevalence of low-level crash reports relative to higher-level interaction and workflow concerns. We do not believe that "everything is usability", but in these projects it appears to be the dominant concern.
The label of "usability" may mask a user's underlying concern, a phenomenon known as the "XY Problem" [87]. Repository artifacts make it difficult to distinguish interface-level confusion from deeper architectural constraints or design trade-offs. Our findings should therefore be interpreted as patterns in reported usability framing rather than definitive diagnoses of root technical causes.

6.2 Usability Lessons for ID-based Signing

We distill three lessons from our study.

Lesson 1: Usability problems concentrate at architectural boundaries, esp. CLI/API interaction surfaces, configuration layers, and verification workflows.

In the studied ecosystems, the reported usability problems are not evenly distributed across architectural components. CLI tooling, configuration, and verification workflow surfaces are among the most frequently implicated components in issue reports (Table 5). These components sit at architectural boundaries, where identity binding, policy specification, logging infrastructure, and workflow orchestration intersect.

Sigstore illustrates this pattern clearly. Despite broad decreases in developer-reported usability problems over time, substantial volume remains in verification workflow and configuration-related reports. Many of these reports concern interpreting verification failures, diagnosing mismatches between identity assertions and policy expectations, or understanding interactions between Cosign, Fulcio, and Rekor. Even when individual components mature, the coordination among them remains a salient usability surface.

Vault provides a complementary illustration. While Vault shows statistically significant declines in several usability themes, its issues concentrate heavily in policy and configuration components.
Because Vault operates as infrastructure embedded in organizational workflows, usability depends on correct policy design, backend configuration, and integration into CI/CD systems rather than on a single signing command. In this setting, usability problems often reflect ambiguity in authorization logic, policy intent, or deployment assumptions.

These examples reinforce a pattern across the tools: usability problems cluster at integration boundaries. Architectural modularity enables reuse and separation of concerns, but also expands the coordination surfaces that engineers must understand and configure correctly. As identity-based signing decomposes responsibility across orchestrators, identity providers, certificate authorities, and logging systems, the interfaces among these components become primary usability domains. Issues in verification in particular are asymmetrically consequential, as it is the mechanism by which downstream users determine whether to trust an artifact.

Lesson 2: Usability improvements are component-dependent rather than uniform across tools.

Temporal Poisson trends indicate heterogeneous trajectories across tools (Table 6, Figure 7). In several ecosystems (Sigstore, Vault, Notation), verification and identity-integration components trend downward, whereas Keyfactor remains largely flat and OpenPubKey shows increases across multiple components. Maturity therefore does not correspond to a uniform reduction in friction; rather, it reflects a redistribution of usability pressure across architectural surfaces and integration boundaries.

Lesson 3: Identity-based signing reduces core usability burdens and shifts the focus to other historical hurdles in software signing.

Our dataset contains 2,965 usability-related issues across five ecosystems (§ 5), indicating that developer friction remains substantial even in mature identity-based designs.
While identity-based signing removes much of the direct key lifecycle burden documented in earlier usable-security studies [81], it does not produce a commensurate reduction in overall usability challenges. Instead, improvements in one domain appear to expose other coordination and integration difficulties.

This pattern is consistent with Amdahl's Law [61]: removing a dominant source of friction makes previously-secondary challenges more visible. Identity-based signing simplifies key management, but usability remains constrained by the semantic and operational complexity of trust evaluation in real-world deployment contexts. These secondary challenges were also described with respect to the usability of key-based approaches to signing (e.g., [86]), and now manifest as primary challenges. In the identity-based signing paradigm, usability is still a central concern.

7 Conclusion

Identity-based signing does not eliminate usability burden; it redistributes and reshapes it across architectural boundaries and lifecycle phases. Identity-based signing was introduced in part to alleviate the long-standing usability challenges of key-managed cryptographic workflows. Through a cross-tool, longitudinal analysis of 3,900 GitHub issues across five identity-based signing ecosystems, we examine how usability manifests in practice as these systems mature. Identity-based signing represents a meaningful architectural advance over traditional key management. Yet our results indicate that usability remains a central and evolving constraint in secure software supply chains. Usability problems concentrate at verification and configuration surfaces. Improvements over time are component-dependent rather than uniform, and configuration and workflow integration remain dominant usability concerns.
For tool designers and maintainers , this suggests that veriﬁcation clarity and deployment inte gration must be treated as ﬁrst-class design targets rather than peripheral inte- gration details. For r esearchers , our ﬁndings underscore the importance of ecosystem-le vel and longitudinal analyses of security tool usability , as architectural ev olution can e xpose ne w coordination challenges e ven as earlier friction points de- cline. Future signing systems must therefore pair strong trust models with deliberate attention to the coordination surfaces through which dev elopers operationalize them. References [1] Ibm completes acquisition of hashicorp, creates com- prehensiv e, end-to-end hybrid cloud platform. Press release. [2] Ke yfactor customers. https://www.keyfactor.com/ customers/ . Accessed: 2026-02-19. [3] Zero trust infrastructure access platform. BastionZero joined Cloudﬂare in May 2024 (per site content accessed on urldate). [4] Asma J Abdulwareth and Asma A Al-Shar gabi. T ow ard a multi-criteria frame work for selecting softw are testing tools. IEEE Access , 9:158872–158891, 2021. [5] Meysam Alizadeh, Maël Kubli, Ze ynab Samei, Shirin Dehghani, Mohammadmasiha Zahediv afa, Juan D. Bermeo, Maria K orobe ynikov a, and Fabrizio Gilardi. Open-source llms for text annotation: a practical guide for model setting and ﬁne-tuning. Journal of Computa- tional Social Science , 8(1):17, December 2024. [6] S. Benthall. Assessing software supply chain risk using public data. In IEEE Softwar e T echnolo gy Confer ence (STC) , 2017. [7] C. Braz and J. Robert. Security and usability: The case of the user authentication methods. In Confer ence on l’Interaction Homme-Mac hine . A CM, April 2006. [8] C. Okafor et al. SoK: Analysis of software supply chain security by establishing secure design properties. In A CM W orkshop on Softwar e Supply Chain Offensive Resear c h and Ecosystem Defenses , 2022. [9] C. Rusu et al. User experience ev aluations: Challenges for newcomers. 
In Aaron Marcus, editor, Design, User Experience, and Usability: Design Discourse. Springer, Cham, 2015.
[10] A. Colin Cameron and Pravin K. Trivedi. Regression analysis of count data. In Regression Analysis of Count Data. Cambridge University Press, 2 edition, 2013.
[11] Jinghui Cheng and Jin L.C. Guo. How do the open source communities address usability and UX issues? An exploratory study. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, CHI EA ’18, page 1–6, New York, NY, USA, 2018. Association for Computing Machinery.
[12] D. Cooper et al. Security considerations for code signing. NIST Cybersecurity White Paper, January 2018.
[13] Fred D Davis. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, pages 319–340, 1989.
[14] Fred D Davis, Richard P Bagozzi, and Paul R Warshaw. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.
[15] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. In IEEE Transactions on Information Theory, volume 22, pages 644–654. IEEE, 1976.
[16] E. Heilman et al. OpenPubkey: Augmenting OpenID connect with user held signing keys, 2023. https://eprint.iacr.org/2023/296.
[17] A. Reuter et al. Secure email - a usability study. In Financial Cryptography and Data Security, Lecture Notes in Computer Science, pages 36–46, Cham, 2020. Springer International Publishing.
[18] C. L. Okafor et al. Diverify: Diversifying identity verification in next-generation software signing. arXiv preprint arXiv:2406.15596, 2024.
[19] G. Terry et al. Thematic analysis. The SAGE handbook of qualitative research in psychology, 2, 2017.
[20] J. C. Davis et al. A guide to stakeholder analysis for cybersecurity researchers. arXiv preprint arXiv:2508.14796, 2025.
[21] K. Merrill et al.
Speranza: Usable, privacy-friendly software signing. In ACM SIGSAC Conference on Computer and Communications Security, 2023.
[22] P. Ladisa et al. SoK: Taxonomy of attacks on open-source software supply chains. In 2023 IEEE Symposium on Security and Privacy (SP), 2023.
[23] T. R. Schorlemmer et al. Establishing provenance before coding: Traditional and next-gen signing, July 2024. arXiv:2407.03949 [cs].
[24] G. C. Feng. Intercoder reliability indices: disuse, misuse, and abuse. Quality & Quantity, 2014.
[25] Hal Finney, Lutz Donnerhacke, Jon Callas, Rodney L. Thayer, and Daphne Shaw. OpenPGP Message Format, 2007.
[26] Nicole Forsgren, Jez Humble, and Gene Kim. Accelerate: The science of lean software and devops: Building and scaling high performing technology organizations. IT Revolution, 2018.
[27] Simson Garfinkel and Gene Spafford. Practical UNIX and Internet Security. O’Reilly Media, Sebastopol, CA, 2003.
[28] Emily Gonzalez-Holland, Daphne Whitmer, Larry Moralez, and Mustapha Mouloua. Examination of the use of Nielsen’s 10 usability heuristics & outlooks for the future. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 61, pages 1472–1475. SAGE Publications Sage CA: Los Angeles, CA, 2017.
[29] Matthew Green and Matthew Smith. Developers are not the enemy!: The need for usable security APIs. IEEE Security & Privacy, 14(5), 2016.
[30] HashiCorp. Defending those who defend us: ManTech customer case study. PDF.
[31] HashiCorp. HashiCorp Vault: Identity-based secrets management. Web page.
[32] HashiCorp. Vault. https://developer.hashicorp.com/vault, 2025. Accessed: 2025-12-14.
[33] Dustin Ingram. PyPI now supports digital attestations. https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-digital-attestations/, Nov 14 2024.
[34] International Organization for Standardization.
ISO 9241-11: Ergonomic requirements for office work with visual display terminals (VDTs) - part 11: Guidance on usability specification and measures, 1997.
[35] K. Kalu et al. An industry interview study of software signing for supply chain security. In USENIX Security Symposium, 2025.
[36] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. The promises and perils of mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories, pages 92–101, 2014.
[37] Kelechi G. Kalu and James C. Davis. Why software signing (still) matters: Trust boundaries in the software supply chain, 2025.
[38] Kelechi G. Kalu, Sofia Okorafor, Tanmay Singla, Sophie Chen, Santiago Torres-Arias, and James C. Davis. Why Johnny adopts identity-based software signing: A usability case study of Sigstore. In Proceedings of the 35th USENIX Security Symposium (USENIX Security ’26), Baltimore, MD, USA, 2026. USENIX Association. https://arxiv.org/abs/2503.00271.
[39] Kate Kaplan. 10 usability heuristics applied to complex applications. https://www.nngroup.com/articles/usability-heuristics-complex-applications/, August 2021. Nielsen Norman Group. Accessed: 2026-02-10.
[40] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography. CRC Press, 2 edition, 2014.
[41] Keyfactor. EJBCA community edition (ejbca-ce). https://github.com/Keyfactor/ejbca-ce, 2026. Accessed: 2026-02-05.
[42] Keyfactor. EJBCA: Open-source certificate authority software. https://www.ejbca.org/, 2026. Accessed: 2026-02-05.
[43] Keyfactor. Keyfactor. https://www.keyfactor.com/, 2026. Accessed: 2026-02-10.
[44] Muhammad Talal Khalid and Ann-Perry Witmer. Prompt engineering for large language model-assisted inductive thematic analysis. Social Science Computer Review, page 08944393251388098, October 2025.
[45] Martin F. Krafft, Klaas-Jan Stol, and Brian Fitzgerald.
How do free/open source developers pick their tools?: a Delphi study of the Debian project. In Proceedings of the 38th International Conference on Software Engineering Companion, Austin Texas, May 2016. ACM.
[46] Urška Lah, James R Lewis, and Boštjan Šumak. Perceived usability and the modified technology acceptance model. International Journal of Human–Computer Interaction, 36(13):1216–1230, 2020.
[47] J. R. Lewis. Usability: Lessons learned...and yet to be learned. International Journal of Human-Computer Interaction, 30(9), 2014.
[48] M. A. Sasse and I. Flechais. Usable security: Why do we need it? How do we get it? O’Reilly, 2005.
[49] Geoffrey A Moore. Crossing the chasm pdf. 1991.
[50] Jakob Nielsen. Usability engineering. Academic Press, Boston, 1993.
[51] William L. Nolte. Did I Ever Tell You about the Whale? or Measuring Technology Maturity. IAP, October 2008. Google-Books-ID: vPsnDwAAQBAJ.
[52] Notary Project. Notary project. https://notaryproject.dev/, 2026. Accessed: 2026-02-05.
[53] Notary Project. notation. https://github.com/notaryproject/notation, 2026. Accessed: 2026-02-05.
[54] notaryproject/notation contributors. Issue #1254. https://github.com/notaryproject/notation/issues/1254, 2026. GitHub issue, accessed 20 February 2026.
[55] OpenPGP.org. OpenPGP. https://www.openpgp.org/, 2023.
[56] OpenPubkey Project. OpenPubkey. https://github.com/openpubkey/openpubkey, 2026. Accessed: 2026-02-05.
[57] C. O’Connor and H. Joffe. Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 2020.
[58] Nikhil Patnaik, Joseph Hallett, and Awais Rashid. Usability smells: An analysis of developers’ struggle with crypto libraries.
[59] Huiyun Peng, Antonio Zhong, Ricardo Andrés Calvo Méndez, Kelechi G. Kalu, and James C. Davis. How do agents perform code optimization? An empirical study, 2025.
[MSR-Challenge’26] Proceedings of the 23rd International Conference on Mining Software Repositories – Mining Challenge.
[60] Jason Robbins. Adopting Open Source Software Engineering (OSSE) Practices by Adopting OSSE Tools, page 245–264. The MIT Press, May 2005.
[61] David P Rodgers. Improvements in multiprocessor system design. ACM SIGARCH Computer Architecture News, 13(3):225–231, 1985.
[62] Everett M Rogers, Arvind Singhal, and Margaret M Quinlan. Diffusion of innovations. In An integrated approach to communication theory and research, pages 432–448. Routledge, 2014.
[63] S. Ruoti et al. Confused Johnny: When automatic encryption leads to confusion and mistakes. In Ninth Symposium on Usable Privacy and Security, SOUPS ’13, pages 1–12, July 2013.
[64] S. Sheng et al. Why Johnny still can’t encrypt: Evaluating the usability of email encryption software. In Symposium on Usable Privacy and Security. ACM, 2006.
[65] Arghavan Sanei and Jinghui Cheng. Characterizing usability issue discussions in open source software projects. Proc. ACM Hum.-Comput. Interact., 8(CSCW1):30:1–30:26, April 2024.
[66] Tim Seagren. LinkedIn post on sops, git, and sigstore adoption. https://web.archive.org/web/20220324224232/https://www.linkedin.com/posts/tim-seagren-7876aa112_sops-git-sigstore-activity-6910723090570240000-sDXg/, 2022. LinkedIn post, archived March 24, 2022.
[67] R. Shirey. Internet security glossary, version 2. Technical report, RFC Editor, August 2007. https://doi.org/10.17487/rfc4949.
[68] Sigstore. Sigstore: A new standard for signing, verifying, and protecting software. https://www.sigstore.dev/.
[69] Sigstore Project. cosign: Code signing for containers and other artifacts. https://github.com/sigstore/cosign, 2026. Accessed: 2026-02-05.
[70] Sigstore Project. Fulcio: Sigstore OIDC PKI. https://github.com/sigstore/fulcio, 2026. Accessed: 2026-02-05.
[71] Sigstore Project.
Rekor: Software supply chain transparency log. https://github.com/sigstore/rekor, 2026. Accessed: 2026-02-05.
[72] Sigstore Project. Sigstore. https://www.sigstore.dev/, 2026. Accessed: 2026-02-05.
[73] Jonny Stoten. Signing Docker official images using OpenPubkey, October 2023.
[74] Synopsys. 2020 Open Source Security and Risk Analysis (OSSRA) Report, 2020. https://www.synopsys.com/software-integrity/resources/analyst-reports/2020-open-source-security-risk-analysis.html.
[75] Synopsys. 2024 Open Source Security and Risk Analysis (OSSRA) Report, 2024. https://www.synopsys.com/software-integrity/resources/analyst-reports/open-source-security-risk-analysis.html#introMenu.
[76] T. R. Schorlemmer et al. Signing in four public software package registries: Quantity, quality, and influencing factors. In 2024 IEEE Symposium on Security and Privacy (SP), 2024.
[77] T. R. Schorlemmer et al. Establishing provenance before coding: Traditional and next-generation software signing. IEEE Security & Privacy, IEEE S&P Magazine ’25, (01), 2025.
[78] Mary Theofanos and Whitney Quesenbery. Towards the design of effective formative test reports. J. Usability Studies, 1(1):27–45, November 2005.
[79] Steven J. Vaughan-Nichols. Kubernetes adopts Sigstore for supply chain security. https://thenewstack.io/kubernetes-adopts-sigstore-for-supply-chain-security/, May 2022.
[80] Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao. Human-LLM collaborative annotation through effective verification of LLM labels. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–21, 2024.
[81] A. Whitten and J. D. Tygar. Why Johnny can’t encrypt: A usability evaluation of PGP 5.0. In USENIX Security Symposium, volume 348, 1999.
[82] Nagyan Yosse Wibisono and Viany Utami Tjhin. Evaluating user experience in a public sector digital system through Nielsen’s heuristic approach.
International Journal of Advanced Computer Science & Applications, 16(9):285, 2025.
[83] M. Willett. Lessons of the SolarWinds hack. In Survival April–May 2021: Facing Russia, pages 7–25. Routledge, 2023.
[84] Laurie Williams, Giacomo Benedetti, Sivana Hamer, Ranindya Paramitha, Imranur Rahman, Mahzabin Tamanna, Greg Tystahl, Nusrat Zahan, Patrick Morrison, Yasemin Acar, et al. Research directions in software supply chain security. ACM Transactions on Software Engineering and Methodology, 34(5):1–38, 2025.
[85] Nahathai Wongpakaran, Tinakon Wongpakaran, Danny Wedding, and Kilem Li Gwet. A comparison of Cohen’s kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: A study conducted with personality disorder samples. BMC Medical Research Methodology, 13:61, 2013.
[86] William Woodruff. PGP signatures on PyPI: worse than useless, May 2023. Accessed: 2026-02-19.
[87] Wooledge Wiki. The XY problem. https://mywiki.wooledge.org/XyProblem, n.d. Accessed 19 February 2026.
[88] Yahoo Inc. Scaling up supply chain security: Implementing Sigstore for seamless container image signing, 2023.
[89] Zhou Yang, Chenyu Wang, Jieke Shi, Thong Hoang, Pavneet Kochhar, Qinghua Lu, Zhenchang Xing, and David Lo. What do users ask in open-source AI repositories? An empirical study of GitHub issues. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), page 79–91, May 2023. ISSN: 2574-3864.
[90] Z. Newman et al. Sigstore: Software signing for everybody. In ACM SIGSAC Conference on Computer and Communications Security, 2022.

Outline of Appendices

The appendix contains the following material:
• § A: Ethics Statement.
• § B: Extended Background.
• § C: Additional methodological details (codebook).
• § D: Additional results.

A Ethics Statement

We describe our stakeholder-based ethics analysis, conducted following the guidance of Davis et al. [20].
Stakeholders: The direct stakeholders in this study are maintainers and contributors of the open-source signing tools whose issue trackers and repositories we analyzed. Indirect stakeholders include downstream developers and organizations that adopt these tools, and the cybersecurity community that may interpret or act upon our findings. The research team is also a stakeholder: we bear responsibility for accurate representation and careful interpretation of public artifacts, and for the consequences of dissemination.

Potential Harms: The primary potential harm of our work is reputational. We analyzed only publicly available artifacts from open-source software repositories and issue trackers. These projects intentionally operate in the public domain to promote transparency, collaboration, and external scrutiny. All engineers whose artifacts appear in our dataset contributed those materials in public forums. No private communications, non-public data, or personally sensitive information were accessed. Nevertheless, our findings could be interpreted as reflecting negatively on specific organizations or engineers. To mitigate this risk, we report aggregated results at the ecosystem and component level.

Potential Benefits: This work aims to improve understanding of usability challenges in identity-based signing systems. By identifying recurring friction points across multiple ecosystems, we seek to inform tool designers, maintainers, and adopters about areas where usability improvements may strengthen security outcomes. Adopters and the broader public may benefit if improved usability increases the adoption and correct use of signing technologies.

Decision to Proceed and Publish: We judge that the societal benefits of understanding usability barriers in security tooling outweigh the limited reputational risks of aggregate reporting. We therefore proceeded with dissemination.
B Extended Background

Here we elaborate on the distinction between “traditional” key-based signing and identity-based signing. We also provide a more detailed description of the architectural components of typical identity-based signing.

B.1 Traditional Key-Based Signing

Key-managed (legacy) software signing tools (illustrated in Figure 5), which Schorlemmer et al. [76] describe as traditional signing tools, rely on long-lived public-private key pairs that are created and maintained by the signer [55]; OpenPGP-style workflows are a typical example. A typical workflow has three phases. First, the signer generates a key pair and protects the private key. Second, the signer distributes the public key (or certificate) so that verifiers can discover it. Third, verifiers retrieve the artifact, retrieve the signer’s public information, and validate the signature before trusting the artifact.

This key-managed design makes usability hinge on key lifecycle tasks that are external to the signing act itself. Examples include key generation choices, backups, rotation, revocation, publication to key servers, and establishing trust relationships. Usable security research shows that these steps are difficult to execute correctly, and that this difficulty can reduce adoption or lead to incorrect verification decisions [29, 81].

Figure 5: Traditional key-managed signing centers on a long-lived key pair that the signer creates and protects and that verifiers must obtain to validate signatures. The workflow highlights two recurring usability burdens: public-key distribution (often via a key server) and establishing confidence that the retrieved public key corresponds to the intended issuer identity. Compare this flow to that of identity-based signing, depicted in Figure 1.

B.2 Identity-Based Software Signing

Software signing is a formally guaranteed method of establishing provenance by ensuring authorship and artifact integrity [12, 25, 27, 37, 38, 67].
It builds on public key cryptography, where two complementary functions exist: a signer holds a private key and distributes a corresponding public key, and a verifier verifies the identity of the signer using the distributed public key [15, 23, 40]. When verification succeeds, the verifier gains evidence that the artifact has not been modified since signing, and that the signer controlled the corresponding private key at signing time [40]. This process is illustrated in Figure 5. In practice, these guarantees depend not only on cryptographic primitives but also on the surrounding workflow that governs how signing keys or credentials are created, how they are protected, how they are rotated and revoked, how verifiers discover the correct verification material, and how identities are bound to verification material [35, 38, 76, 81].

Software signing tools [23, 37] operationalize these tasks by providing user interfaces, automation, and infrastructure that bridge between the abstract cryptographic model and developer workflows. As a result, the evolution of signing tools largely reflects changes in how tool ecosystems allocate responsibility for key management and identity binding. We next describe two tool families, key-managed (legacy) and identity-based signing tools, and the workflows they impose on signers and verifiers.

Identity-based signing tools aim to reduce direct user key management by shifting responsibility for identity binding and credential issuance to an ecosystem of components [18, 23]. Instead of requiring developers to create and distribute long-lived keys, identity-based ecosystems typically rely on: an orchestrator that performs signing and verification, an identity provider that authenticates the signer, a credential-issuing service such as a Certificate Authority (CA) that binds identity to a signing credential, and an optional logging infrastructure that supports transparency and audit. Examples of such tools include Sigstore, Notary v2, OpenPubKey, and Keyfactor.

Table 7: Core components of identity-based signing ecosystems and their responsibilities.

| Component | Primary Responsibilities and Typical Implementations |
|---|---|
| Orchestrator | Primary developer interface for signing and verification. Creates or selects artifacts, obtains identity tokens, generates ephemeral key pairs, creates signatures, and associates signatures with credential material. Examples include Sigstore cosign [69, 72], Notary v2 notation [52, 53], and OpenPubKey tooling [16, 56]. |
| Identity Provider (IdP) | Authenticates a human or workload and issues an identity token presented to other services. Often implemented using standards-based systems such as OpenID Connect providers. Examples include enterprise IdPs and developer-ecosystem identities (e.g., Google or GitHub-based authentication), depending on configuration and deployment model. |
| Credential Issuer / CA | Binds authenticated identity to signing material, commonly by issuing a short-lived certificate for an ephemeral public key. Enables time-scoped and auditable identity binding. Examples include Sigstore deployments using Fulcio [70] and enterprise PKI systems such as EJBCA [41, 42]. |
| Certificate and Signature Logs | Append-only logs that record certificate issuance and signing events to support transparency, monitoring, and audit. Shape verification semantics and operational practices. Example: Sigstore’s Rekor [71]. |

A key strength of identity-based tool design is this decoupling of responsibilities into components. The same signing CLI can often be configured to work with different credential issuers, and organizations can reuse existing PKI services or logging infrastructure rather than adopting a single monolithic system.
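To make the division of responsibilities concrete, the following is a minimal sketch of the checks a verifier might combine once the components are decoupled: an identity and issuer policy, a short-lived certificate, and a log record of the signing event. Every name here (`Certificate`, `LogEntry`, `issue_certificate`, `check_policy`, the ten-minute lifetime, the example identities) is hypothetical and chosen only for illustration; none of it is the API of Sigstore, Notary v2, OpenPubKey, or Keyfactor. Cryptographic signature verification is deliberately left out, so this models only the surrounding workflow, not the signature math.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import hashlib


@dataclass(frozen=True)
class Certificate:
    """Short-lived credential binding an identity to an ephemeral key (hypothetical)."""
    identity: str        # identity asserted by the identity provider
    issuer: str          # identity provider that authenticated the signer
    key_id: str          # stand-in for the ephemeral public key
    not_before: datetime
    not_after: datetime


@dataclass(frozen=True)
class LogEntry:
    """Record appended to a transparency log at signing time (hypothetical)."""
    artifact_digest: str
    key_id: str
    logged_at: datetime


def digest(artifact: bytes) -> str:
    """SHA-256 digest used to identify the artifact in the log."""
    return hashlib.sha256(artifact).hexdigest()


def issue_certificate(identity, issuer, key_id, now, lifetime=timedelta(minutes=10)):
    """Credential issuer: bind the authenticated identity to a key for a short window."""
    return Certificate(identity, issuer, key_id, now, now + lifetime)


def check_policy(artifact, cert, log, expected_identity, expected_issuer):
    """Return (ok, reason). Signature verification itself is intentionally elided."""
    # 1. Policy: does the certificate name the identity and issuer we trust?
    if (cert.identity, cert.issuer) != (expected_identity, expected_issuer):
        return False, "identity or issuer does not match policy"
    # 2. Log: was this artifact recorded as signed with the certified key?
    wanted = digest(artifact)
    for entry in log:
        if entry.artifact_digest == wanted and entry.key_id == cert.key_id:
            # 3. Time scope: the signing event must fall inside the validity window.
            if cert.not_before <= entry.logged_at <= cert.not_after:
                return True, "ok"
            return False, "signing time outside certificate validity"
    return False, "no matching log entry"
```

Each failure reason in this sketch corresponds to a different component (policy configuration, logging infrastructure, credential issuer), which is one way to see why a single failed verification can be hard for a user to attribute.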
For example, a team might use a Sigstore-style CLI experience for signing while relying on an enterprise CA implementation for credential issuance, or they may adopt a Keyfactor-style issuer while integrating verification through other downstream tooling. This modularity increases flexibility, but it also expands the configuration surface and introduces cross-component usability problems at integration boundaries.

Orchestrator. This is the primary interface that developers use to sign and verify artifacts and to integrate signing into CI/CD pipelines. In the workflow shown in Figure 1, the CLI creates or selects the artifact to be signed (A), obtains an identity token from an IdP (B–C), initiates the process to generate an ephemeral key pair (D–E), creates the artifact signature (G), and associates the resulting signature with the credential material needed for verification. Examples of orchestrators in popular identity-based software signing ecosystems include Sigstore’s cosign [69, 72], Notary v2’s notation [52, 53], and OpenPubKey tooling [16, 56]; some ecosystems may instead provide alternative entry points or embed signing into platform tooling rather than a standalone CLI.

Identity provider. The identity provider authenticates a human or workload and issues an identity token that the CLI can present to other services. In practice, these providers are often standards-based systems such as OpenID Connect providers, and are not implemented by the signing tools themselves. Examples include enterprise IdPs (e.g., Okta) and developer-ecosystem identities (e.g., Google or GitHub-based authentication), depending on tool configuration and deployment model.

Certificate authority or credential issuer. A credential-issuing service binds the authenticated identity to signing material, commonly by issuing a short-lived certificate for an ephemeral public key (E–F).
This step enables verification to rely on an identity binding that is time-scoped and auditable, rather than on a long-lived key that must be distributed and trusted indefinitely. Examples include Sigstore deployments that use a dedicated issuing service, Fulcio [70], for identity certificates and enterprise deployments that use organizational PKI components such as EJBCA [41, 42] as the issuer.

Certificate and signature logs. Many identity-based ecosystems include append-only logs that record issuance events and signing events to support transparency, monitoring, and audit, e.g., Sigstore’s Rekor [71]. In Figure 1, the workflow records certificate-related information (G–I) and signature-related information (H–J) so that verifiers can consult these records as part of verification. These logs are not always mandatory for every deployment, but they often shape verification semantics, debugging practices, and operational requirements.

C Methodological Details

C.1 Codebook

Table 8 presents our codebook.

Table 8: Phase 3 theme codebook with heuristic definitions.

| Theme Family | Theme | Definition | Example Issues |
|---|---|---|---|
| Primary theme (inductively generated) | Missing feature / enhancement request | User requests functionality that does not currently exist, or enhancement of existing capability. | EJBCA-473 |
| Primary theme (inductively generated) | Unexpected behavior | Tool behaves incorrectly despite user following expected instructions or documentation. | Fulcio-1858 |
| Primary theme (inductively generated) | Authentication friction | User cannot authenticate/authorize smoothly due to token, flow, timeout, or identity setup problems. | Vault-31568 |
| Primary theme (inductively generated) | Configuration friction | User struggles to configure the tool due to complexity, rigid settings, or unclear setup requirements. | Vault-31627 |
| Primary theme (inductively generated) | Integration failure/issues | Interoperation with external systems fails or requires additional support/enhancement. | Notation-1254 |
| Primary theme (inductively generated) | User confusion / unclear documentation | User is confused, asks usage questions, or documentation is unclear/outdated/insufficient. | Notation-1332 |
| Primary theme (inductively generated) | Build/CI/installation/distribution release issues | Problems running/installing/building/distributing the tool in local or CI/CD environments. | EJBCA-824 |
| Primary theme (inductively generated) | Performance issue | Tool functions but is too slow or resource intensive (or needs performance improvements). | EJBCA-211 |
| Primary theme (inductively generated) | Security concerns | Reported vulnerability, unsafe default, leakage risk, or request to harden security posture. | EJBCA-854 |
| Primary theme (inductively generated) | Notification/Logging Issues | Logs, errors, status output, or web/client feedback are unclear, noisy, or not actionable. | EJBCA-430 |
| Primary theme (inductively generated) | Tedious Workflows | Workflow is overly manual or requires too many steps to achieve a routine task. | EJBCA-275 |
| Secondary theme (inductively generated) | Operational Friction | Barriers in setup, environment, integration, or workflow execution where users know what to do but the system resists. | Rekor-2575 |
| Secondary theme (inductively generated) | Cognitive Friction | Barriers in understanding or mental model clarity, where users do not know how to proceed. | Cosign-3671 |
| Secondary theme (inductively generated) | Functional Reliability | Friction when expected function is undermined by failures, slowness, or security risk. | EJBCA-952 |
| Secondary theme (inductively generated) | Functional Gap | Tool currently lacks capability needed for the user’s intended task. | EJBCA-943 |
| Nielsen theme (deductive) | 1. Visibility of system status | System should keep users informed with timely, clear feedback about ongoing operations. | Cosign-4438 |
| Nielsen theme (deductive) | 2. Match between system and the real world | Use familiar language and concepts; avoid internal jargon that conflicts with user expectations. | Notation-1170 |
| Nielsen theme (deductive) | 3. User control and freedom | Provide clear exits/undo paths so users can recover from unwanted states. | Rekor-2575 |
| Nielsen theme (deductive) | 4. Consistency and standards | Use consistent terminology/behavior and follow ecosystem conventions. | Vault-13192 |
| Nielsen theme (deductive) | 5. Error prevention | Prevent avoidable mistakes with checks, guardrails, and confirmations before commitment. | Vault-31586 |
| Nielsen theme (deductive) | 6. Recognition rather than recall | Reduce memory burden by making required actions/options visible in the interface/help. | Vault-31612 |
| Nielsen theme (deductive) | 7. Flexibility and efficiency of use | Support efficient workflows for experts while remaining usable for less-experienced users. | Cosign-4507 |
| Nielsen theme (deductive) | 8. Aesthetic and minimalist design | Avoid irrelevant/noisy information that hides key outputs or decisions. | Cosign-3911 |
| Nielsen theme (deductive) | 9. Help users recognize, diagnose, and recover from errors | Error messages should be clear, specific, and provide actionable recovery guidance. | Vault-31582 |
| Nielsen theme (deductive) | 10. Help and documentation | Documentation and help text should be discoverable, accurate, and aligned with current behavior. | Cosign-3671 |
| Affected component theme | Authentication/Authorization tools | Identity verification and permission logic, including OIDC flows, token handling, MFA, and RBAC behavior. | Cosign-4438 |
| Affected component theme | CLI tooling | Command-line UX including command/flag behavior, prompts, and output formatting. | Notation-1259 |
| Affected component theme | Signing workflow | End-to-end client-side path used to produce cryptographic signatures for artifacts. | Notation-1254 |
| Affected component theme | Verification workflow | End-to-end client-side path used to validate signatures and trust decisions. | Cosign-3671 |
| Affected component theme | Policy/configuration | Configuration and policy expression/interpretation (e.g., YAML/JSON, OPA/Rego, policy enforcement). | Cosign-2570 |
| Affected component theme | Build/CI/Installation | Friction in installation/build and automated execution in CI/CD or scripted environments. | Notation-1288 |
| Affected component theme | Release pipeline | Maintainer-side release engineering: building, signing, and publishing official binaries/artifacts. | EJBCA-824 |
| Affected component theme | Notification/Logging | Clarity/actionability of logs, warnings, status output, and debug traces. | Notation-1259 |
| Affected component theme | Core | Issue affects the overall tool/core behavior when a specific component boundary is unclear. | Notation-1239 |
| Affected component theme | API | Issues in API endpoints, request/response behavior, and API-facing interactions. | Vault-31582 |
| Affected component theme | Web Client | Issues in web UI/client/admin dashboard interactions and feedback. | Cosign-1108 |
| Affected component theme | Key Management Core / Secrets Backend | Key/secret lifecycle and secure storage infrastructure, including KMS/HSM/keychain integrations. | Vault-31601 |

C.2 Poisson Assumption Checks and Robustness Analyses

Poisson Assumption Checks. We assessed the adequacy of Poisson trend models using two dispersion diagnostics and a residual dependence check: Pearson dispersion ($\chi^2_P/\mathrm{df}$), deviance dispersion, and a lag-1 Ljung–Box test on Pearson residuals (Tables 9 and 10). Pearson dispersion values near 1 indicate approximate Poisson variance, while values substantially greater than 1 indicate overdispersion (variance exceeding the mean). Using a conservative flag of $\chi^2_P/\mathrm{df} > 1.5$, overdispersion was common across model sets: 5/8 overall-by-tool models, 6/11 aggregate models by L1 theme, 7/12 aggregate models by affected component, and 21/90 tool-by-component models (Table 9). In the overall-by-tool fits, overdispersion was especially large for openpubkey/openpubkey ($\chi^2_P/\mathrm{df} = 7.16$) and notaryproject/notation ($4.51$), while sigstore/rekor was close to Poisson variance ($0.89$) (Table 10). Lag-1 residual autocorrelation was significant for notaryproject/notation and openpubkey/openpubkey, indicating temporal dependence not fully captured by a single linear time slope.

These diagnostics suggest that Poisson mean–variance assumptions are frequently violated in this corpus. Accordingly, we interpret Poisson significance tests cautiously and focus primarily on trend direction and effect magnitude; we report robustness checks using heteroskedasticity-robust standard errors and negative-binomial sensitivity models.

Robust Standard Errors. As a first robustness check, we re-estimated the overall-by-tool Poisson models using HC0 robust covariance estimators (Table 11). Inference was directionally stable: all six tools that were significant under classical Poisson standard errors remained significant under robust standard errors, and both non-significant tools (Keyfactor/ejbca-ce and Keyfactor/signserver-ce) remained non-significant. For the most overdispersed tools, robust SEs were notably larger than classical SEs (openpubkey/openpubkey’s SE increased from 0.0079 to 0.0150 and notaryproject/notation’s from 0.0047 to 0.0088), yet robust $p$-values remained strongly significant, consistent with genuine underlying trends.

Negative-Binomial Sensitivity. As a second robustness check, we fit negative-binomial trend models for the overall-by-tool series (Table 12), which relax the Poisson variance restriction by estimating an overdispersion parameter $\hat{\alpha}$.
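For reference, the quantities behind these checks can be written out explicitly. This is a sketch under stated assumptions: it uses the standard definition of Pearson dispersion and the common NB2 parameterization of the negative binomial, which the text does not name, and the symbols $y_t$ (observed issue count in month $t$), $\hat{\mu}_t$ (fitted mean), $n$ (number of months), and $k$ (number of fitted coefficients) are introduced here only for illustration.

```latex
\begin{align}
\text{Pearson dispersion:}\quad
  & \frac{\chi^2_P}{\mathrm{df}}
    = \frac{1}{n-k}\sum_{t=1}^{n}\frac{(y_t-\hat{\mu}_t)^2}{\hat{\mu}_t}, \\
\text{Poisson:}\quad
  & \operatorname{Var}(y_t) = \mu_t, \\
\text{Negative binomial (NB2):}\quad
  & \operatorname{Var}(y_t) = \mu_t + \alpha\,\mu_t^{2}, \qquad \alpha \ge 0 .
\end{align}
```

Under this parameterization, setting $\alpha = 0$ recovers the Poisson variance, so a fitted $\hat{\alpha}$ near zero indicates that the Poisson restriction costs little, while a large $\hat{\alpha}$ indicates variance well above the mean.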
The sign pattern was preserved across all tools, and significance conclusions were unchanged at $\alpha = 0.05$. For the five well-behaved tools, $\hat{\alpha}$ was near zero (0.005–0.054), confirming that the Poisson model was already appropriate. For the most overdispersed tools, $\hat{\alpha}$ was substantially larger: 0.636 for notaryproject/notation, 1.143 for Keyfactor/signserver-ce, and 4.542 for openpubkey/openpubkey. For openpubkey/openpubkey, the estimated magnitude of the positive trend was sensitive to model choice ($\beta_{\text{Poisson}} = 0.040$ vs. $\beta_{\text{NB}} = 0.104$, $p = 0.011$; see Table 12), though the direction and significance were preserved. Sigstore repositories, Vault, and Notation remained negative and significant; Keyfactor repositories remained non-significant. Overall, these results indicate that our main substantive conclusions are robust to variance misspecification, while reinforcing that classical Poisson $p$-values should be interpreted cautiously in the presence of overdispersion and residual dependence.

D Additional Results

D.1 Results for RQ2

RQ2 asks which functionalities are most frequently involved in usability problems. Table 13 shows the top-three trouble areas for each studied tool. This table is discussed in §5.2.

D.2 Results for RQ3

RQ3 asks how the reported usability problems change over time. We include several additional figures and tables here, which are discussed in §5.3.

• Figure 6 shows Nielsen heuristic time-trend slopes by tool, indicating that declines in reported issues vary across heuristic categories and tools. This figure complements Figure 3.
• Table 14 reports aggregate Poisson time-trend estimates by inductive primary theme, showing that all themes decline over time but at different rates.
• Table 15 presents aggregate Poisson trends by affected component, illustrating that some architectural components decline quickly while others decline more slowly and persist.
• Figure 7 visualizes tool-level Poisson slopes by theme/component, highlighting heterogeneity in maturity trajectories across tools and usability surfaces.

Table 9: Poisson model diagnostic checks by model set (dispersion and residual autocorrelation).

| Model Set | N | Median Disp. | P90 Disp. | N > 1.5 | N < 0.8 | N autocorr | % > 1.5 |
|---|---|---|---|---|---|---|---|
| aggregate_by_component_theme | 12 | 1.54 | 2.11 | 7 | 0 | 0 | 58.33 |
| aggregate_by_l1_theme | 11 | 1.66 | 2.06 | 6 | 0 | 0 | 54.55 |
| overall_usability_by_tool | 8 | 1.87 | 5.30 | 5 | 0 | 2 | 62.50 |
| tool_by_Associated_Component_Theme | 90 | 1.13 | 2.12 | 21 | 5 | 8 | 23.33 |
| tool_by_L1_Theme | 86 | 1.17 | 1.86 | 22 | 8 | 9 | 25.58 |
| tool_by_L1_Theme_Secondary | 32 | 1.50 | 2.99 | 16 | 2 | 6 | 50.00 |
| tool_by_Nielsen_theme | 77 | 1.27 | 2.24 | 22 | 11 | 10 | 28.57 |

Table 10: Poisson diagnostic checks for overall monthly usability-count trends by tool.

| Tool | Disp. (Pearson) | Disp. (Deviance) | Overdispersed > 1.5 | Underdispersed < 0.8 | Ljung-Box p | Autocorr. |
|---|---|---|---|---|---|---|
| Keyfactor/ejbca-ce | 1.106 | 1.288 | no | no | 0.453 | no |
| Keyfactor/signserver-ce | 1.544 | 1.229 | yes | no | 0.684 | no |
| hashicorp/vault | 2.468 | 2.681 | yes | no | 0.780 | no |
| notaryproject/notation | 4.505 | 4.358 | yes | no | 0.000 | yes |
| openpubkey/openpubkey | 7.156 | 5.324 | yes | no | 0.000 | yes |
| sigstore/cosign | 2.202 | 1.936 | yes | no | 0.658 | no |
| sigstore/fulcio | 1.027 | 1.049 | no | no | 0.165 | no |
| sigstore/rekor | 0.892 | 1.039 | no | no | 0.654 | no |

Table 11: Sensitivity check for overall usability trends: Poisson standard errors vs. HC0 robust standard errors by tool. Rate ratio (RR) is per month.

| Tool | $\beta_1$ | SE | p | Robust SE | Robust p | RR/month |
|---|---|---|---|---|---|---|
| sigstore/rekor | -0.0505 | 0.0060 | $2.63 \times 10^{-17}$ | 0.0056 | $1.69 \times 10^{-19}$ | 0.9508 |
| sigstore/fulcio | -0.0479 | 0.0066 | $4.14 \times 10^{-13}$ | 0.0072 | $3.46 \times 10^{-11}$ | 0.9533 |
| sigstore/cosign | -0.0382 | 0.0029 | $2.43 \times 10^{-40}$ | 0.0048 | $1.36 \times 10^{-15}$ | 0.9625 |
| notaryproject/notation | -0.0322 | 0.0047 | $1.20 \times 10^{-11}$ | 0.0088 | $2.65 \times 10^{-4}$ | 0.9683 |
| hashicorp/vault | -0.0250 | 0.0020 | $3.00 \times 10^{-37}$ | 0.0032 | $1.07 \times 10^{-14}$ | 0.9753 |
| Keyfactor/ejbca-ce | -0.0001 | 0.0059 | 0.991 | 0.0059 | 0.991 | 0.9999 |
| Keyfactor/signserver-ce | 0.0126 | 0.0152 | 0.408 | 0.0141 | 0.373 | 1.0127 |
| openpubkey/openpubkey | 0.0404 | 0.0079 | $3.39 \times 10^{-7}$ | 0.0150 | $7.20 \times 10^{-3}$ | 1.0412 |

Table 12: Negative-binomial sensitivity check for overall monthly usability-count trends by tool.

| Tool | $\beta_1$ | p | RR/mo | NB $\beta$ | NB SE | NB p | NB RR/mo | $\hat{\alpha}$ |
|---|---|---|---|---|---|---|---|---|
| sigstore/rekor | -0.0505 | $2.63 \times 10^{-17}$ | 0.9508 | -0.0506 | 0.0062 | $2.04 \times 10^{-16}$ | 0.9506 | 0.0049 |
| sigstore/fulcio | -0.0479 | $4.14 \times 10^{-13}$ | 0.9533 | -0.0477 | 0.0068 | $2.03 \times 10^{-12}$ | 0.9535 | 0.0145 |
| sigstore/cosign | -0.0382 | $2.43 \times 10^{-40}$ | 0.9625 | -0.0373 | 0.0038 | $4.17 \times 10^{-23}$ | 0.9634 | 0.0532 |
| notaryproject/notation | -0.0322 | $1.20 \times 10^{-11}$ | 0.9683 | -0.0446 | 0.0112 | $6.77 \times 10^{-5}$ | 0.9564 | 0.6356 |
| hashicorp/vault | -0.0250 | $3.00 \times 10^{-37}$ | 0.9753 | -0.0272 | 0.0033 | $6.81 \times 10^{-17}$ | 0.9731 | 0.0542 |
| Keyfactor/ejbca-ce | -0.0001 | 0.991 | 0.9999 | -0.0001 | 0.0061 | 0.991 | 0.9999 | 0.0233 |
| Keyfactor/signserver-ce | 0.0126 | 0.408 | 1.0127 | 0.0151 | 0.0206 | 0.464 | 1.0152 | 1.1425 |
| openpubkey/openpubkey | 0.0404 | $3.39 \times 10^{-7}$ | 1.0412 | 0.1041 | 0.0412 | $1.14 \times 10^{-2}$ | 1.1097 | 4.5421 |

Table 13: Top three affected components per repository (usability issues only). Values in parentheses are within-repository percentages. Core is excluded from the ranking as it is a cross-cutting label applied when no specific sub-component boundary can be identified, rather than a discrete architectural surface. Where Core would have appeared in the top three, the next specific component is shown instead and its displaced rank is noted.
| Tool/Repository | #1 (Most common) | #2 (Second-most) | #3 (Third-most) |
|---|---|---|---|
| sigstore/cosign | CLI Tooling (30.3) | Verification Workflow (17.5) | Signing Workflow (17.5) |
| sigstore/fulcio | Policy/Configuration (22.2) | Authentication/Authorization (15.4) | API (15.4) |
| sigstore/rekor | API (39.1) | CLI Tooling (20.2) | Verification Workflow (8.5) † |
| Keyfactor/signserver-ce | Web Client (25.0) | Build/CI/Installation (17.5) | Policy/Configuration (15.0) |
| Keyfactor/ejbca-ce | Policy/Configuration (25.6) | Web Client (24.0) | Build/CI/Installation (13.6) |
| notaryproject/notation | CLI Tooling (36.2) | Signing Workflow (15.6) | Verification Workflow (12.4) |
| openpubkey/openpubkey | Authentication/Authorization (23.4) | API (14.5) | CLI Tooling (13.1) |
| hashicorp/vault | Policy/Configuration (18.4) | Key Mgmt Core / Secrets Back. (16.9) | API (15.8) |

† For sigstore/rekor, Core would rank #3 (9.3%); Verification Workflow is shown as the next specific component. For all other tools, Core ranked 4th or lower and did not affect the top-three listing.

Figure 6: Heatmap of Poisson time-trend slopes ($\beta_1$) by tool and Nielsen's Ten Usability Heuristics theme. Negative values indicate decreasing expected monthly issue counts over time; positive values indicate increasing counts. Only statistically significant cells ($p < 0.05$) are colorized and annotated with the estimated $\beta_1$; non-significant cells are masked.

Table 14: Aggregate Poisson regression slopes for expected monthly usability issue counts by L1 theme across all tools. All themes show statistically significant negative trends ($p < 0.05$). Sorted by $\beta_1$ (steepest decline first).

| L1 Theme | $\beta_1$ | RR/Month | %/Month | p | Total |
|---|---|---|---|---|---|
| Missing feature / enhancement request | -0.0326 | 0.968 | -3.21 | $1.21 \times 10^{-62}$ | 1,481 |
| Security concerns | -0.0326 | 0.968 | -3.20 | $9.21 \times 10^{-12}$ | 248 |
| Authentication friction | -0.0304 | 0.970 | -2.99 | $9.42 \times 10^{-9}$ | 199 |
| Tedious Workflows | -0.0284 | 0.972 | -2.80 | $1.70 \times 10^{-14}$ | 401 |
| User confusion / unclear documentation | -0.0270 | 0.973 | -2.66 | $2.35 \times 10^{-30}$ | 982 |
| Integration failure/issues | -0.0218 | 0.978 | -2.16 | $1.30 \times 10^{-14}$ | 659 |
| Unexpected behavior | -0.0218 | 0.978 | -2.16 | $1.35 \times 10^{-15}$ | 709 |
| Performance issue | -0.0218 | 0.978 | -2.16 | $3.16 \times 10^{-3}$ | 97 |
| Notification/Logging / Web UI Issues | -0.0217 | 0.979 | -2.14 | $1.55 \times 10^{-11}$ | 512 |
| Build/CI/Installation release issues | -0.0193 | 0.981 | -1.91 | $1.04 \times 10^{-5}$ | 272 |
| Configuration friction | -0.0193 | 0.981 | -1.91 | $3.55 \times 10^{-7}$ | 364 |

Table 15: Aggregate Poisson regression slopes for expected monthly usability issue counts by affected component across all tools. Monthly counts are modeled in inclusive calendar-month bins over November 2021–November 2025 (49 month bins; 48 months elapsed). All components show statistically significant negative trends ($p < 0.05$). Sorted by $\beta_1$ (steepest decline first).

| Component | $\beta_1$ | RR/Month | %/Month | p | Total |
|---|---|---|---|---|---|
| Verification Workflow | -0.0338 | 0.967 | -3.32 | $1.08 \times 10^{-18}$ | 390 |
| CLI Tooling | -0.0308 | 0.970 | -3.03 | $6.99 \times 10^{-43}$ | 1,110 |
| API | -0.0306 | 0.970 | -3.01 | $1.15 \times 10^{-25}$ | 654 |
| Policy/Configuration | -0.0289 | 0.971 | -2.85 | $8.13 \times 10^{-30}$ | 847 |
| Signing Workflow | -0.0253 | 0.975 | -2.50 | $3.56 \times 10^{-12}$ | 407 |
| Authentication/Authorization | -0.0246 | 0.976 | -2.43 | $8.11 \times 10^{-14}$ | 495 |
| Key Mgmt Core / Secrets Backend | -0.0241 | 0.976 | -2.38 | $5.43 \times 10^{-15}$ | 563 |
| Core † | -0.0227 | 0.978 | -2.24 | $4.27 \times 10^{-5}$ | 173 |
| Build/CI/Installation | -0.0221 | 0.978 | -2.19 | $2.07 \times 10^{-8}$ | 340 |
| Notification/Logging | -0.0221 | 0.978 | -2.19 | $1.28 \times 10^{-6}$ | 254 |
| Release Pipeline | -0.0191 | 0.981 | -1.89 | $4.29 \times 10^{-3}$ | 117 |
| Web Client | -0.0161 | 0.984 | -1.60 | $1.47 \times 10^{-5}$ | 372 |

† Core represents issues where no specific sub-component boundary could be identified (solo-only assignments, $N = 173$); co-occurring Core labels were redistributed to their specific co-labels in the pipeline.

Figure 7: Poisson time-trend slopes ($\beta_1$) for monthly usability-issue counts by tool and affected component theme. Cells show statistically significant estimates ($p < 0.05$); negative slopes indicate declining expected issue counts over time, positive slopes indicate increasing counts, and color intensity reflects slope magnitude.
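For readers who want to reproduce the kind of checks summarized in Appendix C.2 and Tables 10 and 11, the following is a minimal standard-library sketch, not the authors' analysis code. It assumes a log-linear trend $\log E[y_t] = \beta_0 + \beta_1 t$ on monthly counts, and it computes the slope, the per-month rate ratio $e^{\beta_1}$, the classical and HC0 standard errors of $\beta_1$, the Pearson dispersion $\chi^2_P/\mathrm{df}$, and a lag-1 Ljung–Box p-value on Pearson residuals. The function names and the synthetic `counts` series are hypothetical and chosen only for illustration.

```python
import math

def _info(t, mu):
    """Entries of the 2x2 Fisher information X'WX with W = diag(mu)."""
    a = sum(mu)
    b = sum(ti * m for ti, m in zip(t, mu))
    c = sum(ti * ti * m for ti, m in zip(t, mu))
    return a, b, c, a * c - b * b

def fit_poisson_trend(counts):
    """Fit log E[y_t] = b0 + b1 * (t - mean(t)) by Newton-Raphson."""
    n = len(counts)
    t = [i - (n - 1) / 2 for i in range(n)]      # centered month index
    b0, b1 = math.log(sum(counts) / n), 0.0      # start from a flat trend
    for _ in range(100):
        mu = [math.exp(b0 + b1 * ti) for ti in t]
        g0 = sum(y - m for y, m in zip(counts, mu))               # score wrt b0
        g1 = sum(ti * (y - m) for ti, y, m in zip(t, counts, mu))  # score wrt b1
        a, b, c, det = _info(t, mu)
        d0 = (c * g0 - b * g1) / det             # Newton step = inverse info * score
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    mu = [math.exp(b0 + b1 * ti) for ti in t]
    return b1, t, mu

def trend_diagnostics(counts):
    """Slope, rate ratio, classical/HC0 SEs, Pearson dispersion, lag-1 Ljung-Box p."""
    b1, t, mu = fit_poisson_trend(counts)
    n = len(counts)
    resid = [y - m for y, m in zip(counts, mu)]
    a, b, c, det = _info(t, mu)
    se = math.sqrt(a / det)                      # classical SE of the slope
    # HC0 sandwich variance of the slope: sum r^2 (a*t - b)^2 / det^2
    meat = sum(r * r * (a * ti - b) ** 2 for ti, r in zip(t, resid))
    robust_se = math.sqrt(meat) / det
    pearson = [r / math.sqrt(m) for r, m in zip(resid, mu)]
    dispersion = sum(e * e for e in pearson) / (n - 2)   # chi^2_P / df, two parameters
    e_bar = sum(pearson) / n
    num = sum((pearson[i] - e_bar) * (pearson[i - 1] - e_bar) for i in range(1, n))
    den = sum((e - e_bar) ** 2 for e in pearson)
    r1 = num / den                               # lag-1 autocorrelation
    q = n * (n + 2) * r1 * r1 / (n - 1)          # Ljung-Box statistic at lag 1
    lb_p = math.erfc(math.sqrt(q / 2))           # upper tail of chi-square with 1 df
    return {
        "beta1": b1,
        "rate_ratio": math.exp(b1),
        "pct_per_month": (math.exp(b1) - 1) * 100,
        "se": se,
        "robust_se": robust_se,
        "pearson_dispersion": dispersion,
        "ljung_box_p": lb_p,
    }

# Synthetic, hypothetical series: 49 monthly bins with a ~4% monthly decline.
counts = [round(40 * math.exp(-0.04 * m)) for m in range(49)]
for name, value in trend_diagnostics(counts).items():
    print(f"{name}: {value:.4f}")
```

Under this sketch, a series would be flagged as overdispersed when `pearson_dispersion` exceeds 1.5, mirroring the threshold used in Table 9. Because the synthetic series is a rounded deterministic curve rather than real issue counts, its dispersion and autocorrelation values carry no empirical meaning.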
