Analyzing families of experiments in Software Engineering

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

Remote connection:

Topic of the dissertation

Analyzing families of experiments in Software Engineering

Doctoral candidate

Master of Science Adrian Santos

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, M3S

Subject of study

Information Processing Science


Professor Robert Feldt, Chalmers University


Professor Natalia Juristo, Technical University of Madrid

Analyzing Families of Experiments in Software Engineering

Experimentation is common across the sciences, and in Software Engineering (SE) in particular. SE experiments make it possible to assess the performance of new treatments (e.g., technologies, processes, or tools) and to verify whether hypotheses about the performance of such treatments hold (e.g., does Java outperform C++ in terms of quality?). Unfortunately, isolated SE experiments suffer from two notorious threats to validity: (1) only a small number of subjects typically participate in SE experiments (i.e., sample sizes are small), which makes results unreliable; and (2) results cannot readily be generalized to contexts other than those of the experimental settings. To overcome these weaknesses, SE researchers from different groups and institutions are collaborating to build groups of experiments by means of replication (i.e., conducting families of experiments). Disparate aggregation techniques are being applied to aggregate experiments' results within families.

Applying unsuitable aggregation techniques may undermine the potential of families to yield in-depth insights from experiments' results. This dissertation aims to help SE researchers apply up-to-date aggregation techniques. We first conducted a systematic mapping study to identify the aggregation techniques that had been used to analyze SE families. In parallel, we conducted a literature review in mature experimental disciplines such as medicine and pharmacology to learn about the advantages and disadvantages of each aggregation technique. We then identified some differences between SE families and their closest counterparts in medicine: multicenter clinical trials. In view of these differences, we tailored an analysis procedure with a set of embedded guidelines to overcome the most common limitations of SE families in terms of joint data analysis. We applied the proposed procedure to analyze a stereotypical SE family of experiments.

Finally, we analyzed a family of experiments on Test-Driven Development with the aggregation techniques commonly recommended in mature experimental disciplines. Families of experiments grant access to the raw data and to the characteristics of the experimental settings and the participants. SE families usually comprise a small number of experiments with small and dissimilar sample sizes and heterogeneous results. Narrative synthesis, Aggregated Data (AD), Individual Participant Data (IPD) in its mega-trial or stratified variants, and aggregation of $p$-values have been used in SE to analyze families. AD and IPD stratified, when used in tandem, seem suitable for analyzing SE families. In sum, applying unsuitable aggregation techniques to analyze SE families may translate into wasted resources and unwarranted conclusions. The aggregation techniques used to analyze families should be justified in research articles to increase the reliability of the findings. The use of guidelines may ease this endeavour.
Last updated: 21.4.2020