Heterogeneity between studies was evaluated using the I2 test

Thus, even if all studies had an infinitely large sample size, the observed study effects would still vary because of the real differences in treatment effects. Such heterogeneity in treatment effects is caused by differences in study populations (such as age of patients), interventions received (such as dose of drug), follow up length, and other factors. In the random effects example in figure 1, I2 is 71%, suggesting that 71% of the variability in treatment effect estimates is due to real study differences (heterogeneity) and only 29% is due to chance [3]. This is visually evident from the wide scatter of effect estimates with little overlap in their confidence intervals, in contrast to the fixed effect example (fig 1). The random effects model summary result of −0.33 (95% confidence interval −0.48 to −0.18) provides an estimate of the average treatment effect, and the confidence interval depicts the uncertainty around this estimate.
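The interpretation of I2 given above can be made concrete with a short calculation. The following is a minimal sketch, assuming the usual definition of I2 from Cochran's Q with inverse variance weights; the function name `i_squared` and the example numbers are hypothetical and are not taken from figure 1.

```python
def i_squared(effects, std_errors):
    """Return (Q, I2 in percent) for study effect estimates and their standard errors."""
    # Inverse variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    # Fixed effect (inverse variance weighted) summary estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of Q in excess of its degrees of freedom, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical effect estimates and standard errors (illustration only).
effects = [-0.10, -0.25, -0.40, -0.55, -0.30]
std_errors = [0.06, 0.08, 0.07, 0.09, 0.05]
q, i2 = i_squared(effects, std_errors)
print(f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

Under this definition, I2 is 0% whenever Q does not exceed its degrees of freedom, that is, when the observed scatter is no larger than chance alone would produce.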

This method is recommended by Higgins et al [30] to avoid excessive weightings from "double counts" originating from the control group shared by the multiple treatment arms.

A further complication arose because our meta analytical packages required that intervention effects be entered as mean differences, their standard deviations, and sample sizes. As the meta analyses returned point estimates and their standard errors instead, we arbitrarily fixed the effective sample size for the combined intervention effect to the sum of the sample sizes of all K treatment groups in each study, and then back calculated the standard deviation required for the combined intervention effect to have the right level of precision. As the forest plot depends on the standard deviation and the reported sample size only through the standard error they jointly imply, the arbitrariness in these calculations did not affect the validity of the forest plot itself.

Heterogeneity between studies was evaluated using the I2 test and Galbraith plots [31], and the 95% confidence interval for I2 was calculated using the method of Higgins et al [32, 33]. Small study effects were appraised by visual inspection of funnel plots of effect size against the standard error, with asymmetry assessed formally with Egger's test, chosen over Begg's test for its greater specificity and power [34]; a P value less than 0.1 was considered significant [35].

To explore potential sources of heterogeneity, we conducted moderation testing using intervention type (diet and diet and/or exercise), intervention length (6 months and >6 months), age (

As a secondary analysis, we recoded FTO genotypes using a dominant model, in which participants with one (AT) or two (AA) copies of the minor allele were grouped together and compared with those with no copies of the minor allele (TT).

Given that only one study [20] included drug based interventions, we ran a further sensitivity analysis in which only the lifestyle intervention treatment arm was included.
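The back calculation of the standard deviation described above can be illustrated with a few lines of code. This is only a sketch, assuming the package derives the standard error as SD divided by the square root of N; the source does not state the exact formula used, and the function name `back_calculated_sd` and the example numbers are hypothetical.

```python
import math


def back_calculated_sd(standard_error, group_sizes):
    """Return (SD, N) that reproduce a given standard error under SE = SD / sqrt(N)."""
    # Effective sample size fixed (arbitrarily) to the sum of the K treatment groups.
    n_total = sum(group_sizes)
    # Assumed relation SE = SD / sqrt(N), solved for SD.
    sd = standard_error * math.sqrt(n_total)
    return sd, n_total


# Hypothetical combined effect with standard error 0.5 from two treatment groups.
sd, n_total = back_calculated_sd(0.5, [40, 60])
print(sd, n_total)
# The implied standard error is recovered whatever N was chosen.
print(sd / math.sqrt(n_total))
```

Because any other choice of N would be offset by a matching change in the back calculated standard deviation, the standard error entered into the analysis, and hence the weight and confidence interval shown for the study, is unchanged under this assumption.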

