Towards an OSF-based Registered Report Template for Software Engineering Controlled Experiments


Context: The empirical software engineering (ESE) community has contributed to improving experimentation over the years. However, controlled experiments are still often described without sufficient rigor, hindering reproducibility and transparency. Registered Reports (RR) have been discussed in the ESE community as a way to address these issues. An RR registers a study's hypotheses, methods, and/or analyses before execution, with peer review and potential acceptance before data collection. This helps mitigate problematic practices such as p-hacking, publication bias, and inappropriate post hoc analysis.

Objective: This paper presents initial results toward establishing an RR template for software engineering controlled experiments using the Open Science Framework (OSF).

Method: We analyzed the templates of selected OSF RR types in light of documentation guidelines for controlled experiments.

Results: The observed lack of rigor motivated our investigation of OSF-based RR types. Our analysis showed that, although one of the RR types aligned with many of the documentation suggestions contained in the guidelines, none of them covered the guidelines comprehensively. The study also highlights limitations in customizing OSF RR templates.

Conclusion: Despite progress in ESE, the planning and documentation of experiments still lack rigor, compromising reproducibility. We propose adopting OSF-based RRs; however, no currently available RR type fully satisfies the guidelines, and establishing RR-specific guidelines for SE is deemed essential.


💡 Research Summary

The paper addresses the persistent problem of insufficient rigor in reporting controlled experiments within software engineering (SE), which hampers reproducibility and transparency. To mitigate these issues, the authors explore the use of Registered Reports (RR) – a pre‑registration mechanism that requires researchers to submit hypotheses, methods, and analysis plans before data collection, followed by peer review and conditional acceptance. The study focuses on leveraging the Open Science Framework (OSF), a widely used platform that offers a variety of RR templates, and evaluates how well these templates align with a comprehensive set of SE experiment documentation guidelines.

The authors adopt the guideline framework proposed by Jedlitschka et al., which enumerates 37 detailed items (G1‑G37) covering every stage of a controlled experiment: title, authorship, structured abstract (background, objectives, methods, results, limitations, conclusions), keywords, problem statement, research objectives, context, related work, experimental planning (goals, units, materials, tasks, hypotheses, variables, design, procedures, analysis), execution (preparation, deviations), analysis (descriptive statistics, data reduction, hypothesis testing), interpretation (evaluation, threats to validity, inferences, lessons learned), conclusions (summary, impact, future work), acknowledgments, references, and appendices. These guidelines are intended to ensure that SE experiments are fully documented, enabling replication and critical assessment.
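The 37-item checklist described above can be sketched as a simple data structure. The stage groupings and number ranges below are only partially stated in the summary (execution deviations at G22–G24, interpretation at G28–G31), so the remaining splits are illustrative assumptions, not the exact mapping from Jedlitschka et al.

```python
# Coarse sketch of the 37-item guideline checklist (G1-G37), grouped by the
# reporting stages named in the summary. Only the G22-G24 and G28-G31
# ranges are stated explicitly in the text; the other splits are assumed.
STAGES = {
    "front matter & introduction": range(1, 9),    # title .. related work
    "planning": range(9, 22),                      # goals, units, ..., analysis plan
    "execution": range(22, 25),                    # preparation, deviations (G22-G24)
    "analysis": range(25, 28),                     # descriptive stats, reduction, testing
    "interpretation": range(28, 32),               # evaluation .. lessons learned (G28-G31)
    "conclusions & back matter": range(32, 38),    # summary .. appendices
}

def all_items():
    """Flatten the stage groupings into the ordered item list G1..G37."""
    return [f"G{i}" for stage in STAGES.values() for i in stage]

assert len(all_items()) == 37
```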

The authors then systematically examine the eleven RR types available on OSF (e.g., Preregistration (RR.1), Open‑Ended Registration (RR.2), and Qualitative Preregistration (RR.3)). They first discard types unsuitable for controlled experiments (qualitative, exploratory, etc.) and map the remaining templates against each of the 37 guideline items. The analysis reveals that the most comprehensive OSF template, RR.1, covers many early‑stage items (G1‑G21) but fails to address crucial later‑stage components such as execution deviations (G22‑G24) and interpretation/lessons learned (G28‑G31). Moreover, OSF's built‑in customization options are limited; adding fields for complex SE designs (e.g., factorial or crossover designs) or for hierarchical experimental units is cumbersome. Consequently, none of the existing OSF RR types fully satisfy the SE documentation guidelines.
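The template-to-guideline mapping amounts to a set comparison: the items a template covers versus the full checklist. The sketch below encodes only the RR.1 coverage the text states (G1–G21 covered, later items not); it is an illustration of the analysis, not the authors' actual mapping table.

```python
# Sketch of the coverage analysis: which guideline items does a template
# leave unaddressed? RR.1's coverage set reflects only what the summary
# states (early-stage items G1-G21); treat it as illustrative.
ALL_ITEMS = {f"G{i}" for i in range(1, 38)}
RR1_COVERED = {f"G{i}" for i in range(1, 22)}

def coverage_gaps(covered, required=ALL_ITEMS):
    """Return the required items not covered, sorted by item number."""
    return sorted(required - covered, key=lambda g: int(g[1:]))

print(coverage_gaps(RR1_COVERED))  # G22 through G37 remain unaddressed
```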

In response, the authors propose an initial OSF‑based RR template specifically tailored for SE controlled experiments. The proposed template augments the standard OSF form with additional sections and checklists that map directly to each guideline item. Key features include:

1. mandatory inclusion of "‑ A controlled experiment" in the title;
2. detailed author role and contact information;
3. a structured abstract split into eight predefined sub‑fields;
4. explicit sections for problem statement, objectives, context, and related work;
5. separate components for experimental units, materials, tasks, hypotheses, variables, design, procedures, and analysis plan;
6. dedicated fields for documenting preparation activities and any deviations from the original plan;
7. systematic capture of descriptive statistics, data‑set reduction decisions, and hypothesis‑testing results;
8. sections for interpretation, threats to validity, inferences, lessons learned, and future work.

The template leverages OSF's "Components" feature to keep each part in a separate, version‑controlled file, facilitating collaboration and traceability.

The authors discuss lessons learned from this exercise. First, the lack of automated validation in OSF means that researchers must manually ensure completeness, which can be error‑prone. Second, SE experiments often involve multiple dependent variables and complex designs that are not easily expressed in the generic OSF schema; thus, additional metadata standards are needed. Third, while the proposed template improves coverage, it still requires researchers to maintain auxiliary documents (e.g., README files) for items that OSF cannot natively host.
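Because OSF performs no automated completeness checks, researchers could run a local pre-submission validation of a registration draft. The helper below is hypothetical (not an OSF feature), and its section names are paraphrased from the proposed template's structure.

```python
# Hypothetical pre-submission check: flag template sections left empty.
# OSF itself does not run such a validation; section names are paraphrased
# from the proposed template described above.
MANDATORY_SECTIONS = [
    "title", "authors", "structured_abstract", "problem_statement",
    "objectives", "context", "related_work", "experimental_units",
    "materials", "tasks", "hypotheses", "variables", "design",
    "procedures", "analysis_plan",
]

def missing_sections(draft: dict) -> list:
    """Return mandatory sections that are absent or blank in the draft."""
    return [s for s in MANDATORY_SECTIONS
            if not str(draft.get(s, "")).strip()]

draft = {
    "title": "Effect of pair programming on defect density - A controlled experiment",
    "hypotheses": "H1: ...",
}
print(missing_sections(draft))  # most sections are still empty
```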

Finally, the paper outlines future work. The authors suggest developing custom OSF plugins via its API to provide real‑time checklist validation and to enforce mandatory fields. They also propose establishing an SE‑specific RR registry that could host community‑maintained templates and best‑practice examples. Pilot testing the template at upcoming SE conferences (ICSE, ESEM, etc.) is recommended to gather feedback and refine the design. Long‑term, the authors advocate empirical studies to assess the impact of RR adoption on research quality, bias reduction, and reproducibility in SE.
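The plugin idea could be prototyped against the public OSF API (v2, which serves JSON:API responses for registrations). The sketch below is an assumption-laden illustration: the endpoint shape follows the OSF API v2 documentation, but the required-attribute list and sample payload are invented for the example, and nothing here is an actual OSF plugin.

```python
# Sketch of the checklist-validation plugin idea against the OSF API v2.
# The registrations endpoint is from the public OSF API docs; the required
# attributes and sample payload below are illustrative stand-ins.
import json
from urllib.request import urlopen

OSF_API = "https://api.osf.io/v2/registrations/{id}/"
REQUIRED_ATTRS = ["title", "description"]  # stand-ins for guideline-mapped fields

def fetch_registration(reg_id: str) -> dict:
    """Fetch a registration's JSON:API payload (network call, not run here)."""
    with urlopen(OSF_API.format(id=reg_id)) as resp:
        return json.load(resp)

def check_attrs(payload: dict) -> list:
    """Return required attributes missing or blank in a JSON:API payload."""
    attrs = payload.get("data", {}).get("attributes", {})
    return [a for a in REQUIRED_ATTRS if not str(attrs.get(a, "")).strip()]

# Offline example with a minimal JSON:API-shaped payload:
sample = {"data": {"attributes": {"title": "An RR-based experiment", "description": ""}}}
print(check_attrs(sample))  # ['description']
```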

In conclusion, while OSF currently offers valuable infrastructure for pre‑registration, its existing RR templates do not fully meet the rigorous documentation needs of SE controlled experiments. The authors’ initial OSF‑based RR template bridges many of these gaps and serves as a foundation for further customization and community adoption, ultimately aiming to raise the standard of experimental rigor and reproducibility in software engineering research.

