aph wrote: There are many articles and a few books that use this particular Physicians' Health Study as an example of how to calculate the effect size. The 0.77% is calculated as the proportion of people who had an MI in the control group (0.0171, or 1.71%) minus the proportion of people who had an MI in the experimental group (0.0094, or 0.94%), which is 0.0077, or 0.77%. I'll have to get back to you with the formula for calculating the sigma; this article only cites the number from a different article I can't access.

I edited my post above because I got home, looked it up, and did the math. It's a simple Bernoulli trial, so the standard deviation is $\sigma = \sqrt{p(1-p)}$, and that version of the effect size is either $d = (p_1 - p_2)/\sigma_1$ or $d = (p_1 - p_2)/\sigma_2$, where 1 and 2 refer to the two groups. Getting d = 0.06 comes from choosing the larger sigma ($\sigma_1$, the control group), whereas it's closer to 0.08 with the smaller one.
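For anyone who wants to check the arithmetic, here's a quick Python sketch that plugs in the two MI proportions quoted above (0.0171 and 0.0094). The function names are my own, and it simply applies $\sqrt{p(1-p)}$ as each group's standard deviation, as described in the post; it isn't taken from the articles being discussed.

```python
import math


def bernoulli_sd(p: float) -> float:
    """Standard deviation of a single Bernoulli(p) outcome."""
    return math.sqrt(p * (1 - p))


def effect_sizes(p1: float, p2: float) -> tuple[float, float, float]:
    """Return the risk difference and (p1 - p2) standardized by each group's SD."""
    diff = p1 - p2
    return diff, diff / bernoulli_sd(p1), diff / bernoulli_sd(p2)


if __name__ == "__main__":
    # p1 = control group, p2 = experimental group (proportions with MI)
    diff, d1, d2 = effect_sizes(0.0171, 0.0094)
    print(f"risk difference = {diff:.4f}")  # 0.0077
    print(f"d using sigma_1 = {d1:.3f}")    # ~0.059 (larger sigma)
    print(f"d using sigma_2 = {d2:.3f}")    # ~0.080 (smaller sigma)
```

Since $p_1$ is closer to 0.5 than $p_2$, $\sigma_1$ is the larger of the two, which is why dividing by it gives the smaller d.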

I'm starting to forget why there is an issue with not reporting studies that don't find statistically significant effects. If a result is not statistically significant, then there is just no big effect. A larger study might find some effect, but it is still going to be a small effect. Sure, it would be better to report them all, but it seems like a minor problem.

If there are 100 studies, then even from totally random data we would expect about one of them to find statistical significance at the p=0.01 level. If that's the only study that gets reported, then it looks like a strong result, when in fact there is no effect at all. (This is true regardless of effect sizes or sample sizes.)
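A rough Python sketch of the 100-studies arithmetic, assuming the studies are independent and every null hypothesis is true, so each study's p-value is uniform on (0, 1) (which holds for a continuous test statistic). The helper names are hypothetical.

```python
import random


def expected_false_positives(n_studies: int, alpha: float) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_studies * alpha


def prob_at_least_one(n_studies: int, alpha: float) -> float:
    """Chance that at least one of n independent null studies crosses the threshold."""
    return 1 - (1 - alpha) ** n_studies


def simulate(n_studies: int, alpha: float, n_trials: int, seed: int = 0) -> float:
    """Monte Carlo check, drawing each null p-value from Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One "literature" of n_studies null studies; did any come out significant?
        if any(rng.random() < alpha for _ in range(n_studies)):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(expected_false_positives(100, 0.01))     # 1.0
    print(round(prob_at_least_one(100, 0.01), 3))  # 0.634
    print(simulate(100, 0.01, 10_000))             # should land near 0.634
```

So "one in 100" is the average; in roughly 63% of such batches at least one null study clears p=0.01.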

Reporting all studies is the only way we can judge whether the apparent statistical significance of one study is actually significant.

Edit: let's consider a toy example with flipping fair coins. Suppose there's a lot of interest in whether pennies are more likely to come up heads than dimes, and so people run experiments by flipping five randomly selected pennies and then flipping five randomly selected dimes (they, like you, figure that if there's a large effect size, it'll show up even in this very small sample). Suppose too that everyone has agreed that p=0.001 should be the threshold for significance in thinking there's anything interesting about coins.

Then you see a study published, finding that pennies came up heads 100% of the time and dimes came up heads 0% of the time. This is an incredibly large effect (infinite by all three methods described in your link), and is significant at p<0.001.
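To see where p<0.001 comes from: with fair coins, each of the 2^10 = 1024 head/tail sequences for the ten flips is equally likely, and this result is the single most extreme outcome in the pennies-favoured direction. The Python sketch below enumerates every sequence; it assumes a one-sided test that counts only that exact outcome (a two-sided test, or a different test such as Fisher's exact test, would give a larger p-value). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_all_pennies_heads_all_dimes_tails(n_each: int = 5) -> Fraction:
    """Exact chance, with fair coins, that every penny is heads and every dime is tails."""
    # Every sequence of 2 * n_each fair flips is equally likely.
    outcomes = list(product("HT", repeat=2 * n_each))
    target = ("H",) * n_each + ("T",) * n_each  # pennies first, then dimes
    hits = sum(1 for o in outcomes if o == target)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    p = prob_all_pennies_heads_all_dimes_tails()
    print(p, float(p))  # 1/1024 0.0009765625
```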

Should you put stock in this study?

If you know that people don't bother to publish negative results, then you should do nothing of the kind, because you don't know how many other studies were done that found nothing significant. If there were about 1000 of them, then it shouldn't be surprising, even with completely fair coins, that one study would have the results in question.
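A small Python simulation of that file-drawer scenario, under the toy example's assumptions: fair coins, five of each type per study, and a study counts as "significant" (and so gets published) only for the all-heads-pennies, all-tails-dimes outcome. The function names are hypothetical.

```python
import random

N_EACH = 5  # coins of each type per study, as in the toy example


def run_study(rng: random.Random) -> bool:
    """One study with fair coins; 'significant' only if all pennies are heads and all dimes tails."""
    pennies = [rng.random() < 0.5 for _ in range(N_EACH)]  # True = heads
    dimes = [rng.random() < 0.5 for _ in range(N_EACH)]
    return all(pennies) and not any(dimes)


def file_drawer(n_studies: int, seed: int = 0) -> int:
    """How many of n_studies would be published if only significant ones are reported."""
    rng = random.Random(seed)
    return sum(run_study(rng) for _ in range(n_studies))


def prob_some_published(n_studies: int, p_extreme: float = 1 / 1024) -> float:
    """Chance that at least one of n_studies independent studies hits the extreme outcome."""
    return 1 - (1 - p_extreme) ** n_studies


if __name__ == "__main__":
    print(f"published out of 1000 null studies: {file_drawer(1000)}")
    print(f"P(at least one published) = {prob_some_published(1000):.2f}")  # 0.62
```

With 1000 null studies, the chance that at least one gets published works out to about 0.62, so a single published "infinite effect" is roughly what you'd expect from fair coins alone.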