I was thinking about the phenomenon of p-hacking (or data dredging), and how it's much more likely to give apparently significant results than proper hypothesis testing would.
In particular, I'm considering the case where a researcher keeps increasing the sample size until statistical significance is reached. (As in, "Oh we ran the test with 20 participants, and the data are suggestive but not significant, so let's test another person," repeated until significance is reached.)
My question is, if this process is allowed to continue indefinitely, what's the probability of eventually hitting a statistically significant result (at some predefined level of significance)?
For example, when I modeled it as a thousand runs of 100 coin flips, 54 had few enough total heads for p<0.05, while 194 reached "too few heads" significance at some point during the run (i.e. 140 random walks stumbled into and then back out of significance). When I did a thousand runs of a thousand coin flips, 47 had "too few" heads overall while 346 reached significance at some point, meaning an additional 150ish of the random walks that never stumbled into significance in the first 100 steps managed to do so at least once in the subsequent 900.
Is "eventually stumbling into significance" the kind of tail event that will almost surely happen as the runs are allowed to get arbitrarily long, or is there some limit strictly less than 1?
Also, is there a known expression for the probability of stumbling into significance at some point on a walk of length N (i.e. an expression which would give something near 19.4% for N=100 and something near 34.6% for N=1000)?
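For anyone who wants to reproduce the experiment described above, here is a minimal sketch. It assumes a one-sided exact binomial test for "too few heads" that is checked after every single flip starting from the first; the original setup may have differed (for instance by only starting to test after some minimum sample size), and the function names are made up for illustration.

```python
import random
from math import comb


def critical_values(max_n, alpha=0.05):
    """crit[n] is the largest head count h with P(heads <= h) < alpha
    for n fair flips, or -1 if no head count is significant at that n."""
    crit = [-1] * (max_n + 1)
    for n in range(1, max_n + 1):
        total = 2 ** n
        cumulative = 0
        for k in range(n + 1):
            cumulative += comb(n, k)
            if cumulative / total >= alpha:
                break
            crit[n] = k
    return crit


def simulate(runs, flips, alpha=0.05, seed=0):
    """Return (significant at the end, significant at some point) counts."""
    rng = random.Random(seed)
    crit = critical_values(flips, alpha)
    at_end = ever = 0
    for _ in range(runs):
        heads = 0
        hit = False
        for n in range(1, flips + 1):
            heads += rng.getrandbits(1)  # 1 = heads, 0 = tails
            if heads <= crit[n]:
                hit = True  # "too few heads" significance after n flips
        ever += hit
        at_end += heads <= crit[flips]
    return at_end, ever


if __name__ == "__main__":
    at_end, ever = simulate(runs=1000, flips=100)
    print(f"significant at the end: {at_end}, at some point: {ever}")
```

The critical values are precomputed once per sample size, so each flip only costs a comparison. The exact counts depend on the seed and on the testing convention, so they should only roughly match the figures quoted above.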
Mathematics of p-hacking: random walks and significance
 Posts: 224
 Joined: Tue Jun 17, 2008 11:04 pm UTC
Re: Mathematics of p-hacking: random walks and significance
If the underlying process is exactly zero-mean, i.e. fair with respect to what you're testing (e.g. fair coin flips), then stumbling into significance will almost surely happen if you are allowed to keep going arbitrarily long. However, a multiplicative decrease in the number of runs that have not yet hit significance requires a multiplicative increase in how long you keep going. I.e. the inverse of the proportion of runs that have not yet hit significance should be asymptotically polynomial-ish in the run length, with the exponent of the polynomial depending on your threshold on p.
Basic intuition:
 Do 100 flips. Did you ever hit significance?
 No? Okay, do 10000 more flips, which is so much more data that it should completely wash out and make negligible the result of the 100 flips and give you another almost independent "chance" to find p < 0.05. Did you ever hit significance?
 No? Okay, do 1000000 more flips, which is so much more data that it should completely wash out and make negligible the result of the 10000 flips and give you another almost independent "chance" to find p < 0.05. Did you ever hit significance?
 ... etc
I don't know an exact expression though.
On the other hand, if the underlying process is slightly biased rather than exactly fair, such as coins biased in favor of heads, then you don't get this sort of long asymptotic tail. Roughly, stumbling into significance on the "correct" side will happen with rapidly increasing chance as you start reaching the amount of data needed to distinguish the bias from noise, whereas doing so on the "wrong" side will only happen with some total probability strictly less than 1.
As a result, I think the "increase your sample size until significance" issue alone, unlike things like publication bias and most other experimenter degrees of freedom, can be dealt with if you just pay attention to confidence intervals on effect magnitudes rather than only the sign of the result. Suppose someone really has no other degrees of freedom, is obligated to publish everything, and the process underneath is truly zero-mean. Then doing repeated studies that each continue until significance eventually forces the researcher to publish results where they had to collect arbitrarily much data before hitting significance, and therefore to publish confidence intervals on effect magnitudes that are arbitrarily small and close to zero. And if the process wasn't truly zero-mean, then you can only make the "wrong" conclusion with bounded probability, and eventually you will have to publish studies that, taken together, make you strongly confident in the effect in the correct direction with the correct magnitude.
Alternatively, you can also consider things like SPRT.
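To make the SPRT suggestion concrete, here is a minimal sketch of Wald's sequential probability ratio test for coin flips. The alternative hypothesis p1 = 0.6, the error rates, and the function name are assumptions chosen for illustration; nothing in the thread specifies them. The stopping thresholds are the usual Wald approximations.

```python
import math


def sprt_coin(flips, p0=0.5, p1=0.6, alpha=0.05, beta=0.05):
    """Sequentially test H0: P(heads) = p0 against H1: P(heads) = p1.

    flips is an iterable of 0/1 outcomes (1 = heads). Returns a pair
    (decision, number of flips used).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    step_heads = math.log(p1 / p0)
    step_tails = math.log((1 - p1) / (1 - p0))

    log_likelihood_ratio = 0.0
    n = 0
    for n, heads in enumerate(flips, start=1):
        log_likelihood_ratio += step_heads if heads else step_tails
        if log_likelihood_ratio >= upper:
            return "accept H1", n
        if log_likelihood_ratio <= lower:
            return "accept H0", n
    return "undecided", n
```

Unlike "keep testing until p < 0.05", the stopping rule here is fixed in advance and has a boundary on both sides, so looking at the data after every flip is part of the design rather than a way of inflating the false positive rate.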
 gmalivuk
 GNU Terry Pratchett
 Posts: 26596
 Joined: Wed Feb 28, 2007 6:02 pm UTC
 Location: Here and There
 Contact:
Re: Mathematics of p-hacking: random walks and significance
Yeah, I know there are ways to deal with sequential testing, just as there are ways to adjust for multiple comparisons. I was just curious about the probabilities when things are being done improperly, either through naivete or dishonesty. (For example, as you increase the number of separate ("sufficiently" independent, whatever that may turn out to mean) questions you ask about totally random data, the likelihood of hitting p<0.05 for one of them goes to 1.)
Re: Mathematics of p-hacking: random walks and significance
I recently had the same question, and after a lot of searching, dug up these two links:
https://mathoverflow.net/questions/6444 ... ceedsqrtt
https://math.stackexchange.com/question ... 152#210152
So from what I understand, stumbling into significance will almost surely happen, but the number of flips needed is infinite in expectation.
The only part I'm still shaky on is the interpretation of the statement
P(lim sup_{k -> inf} S_k/sqrt(k) = inf) = 1
(where S_k is the sum of k iid random variables with expectation 0 and variance 1)
So by expanding out the definitions for lim/sup, my understanding is that this is equivalent to saying:
For every real number w, there is an integer n, such that there exists an integer k >= n such that S_k/sqrt(k) > w, with probability 1.
To me, this seems to be equivalent to: For every real number w, there is an integer k such that S_k/sqrt(k) > w, with probability 1.
However, this would be written as P(sup S_k/sqrt(k) = inf) = 1, which seems like it definitely has a different interpretation from the original equation, and isn't the same. So I'm not sure if I've messed up something here.
In other words, it is true that for any expression E(n), lim sup E = inf implies sup E = inf, right? (where the limit is n -> inf and sup is over natural numbers or natural numbers > n).
Re: Mathematics of p-hacking: random walks and significance
>) wrote: For every real number w, there is an integer n, such that there exists an integer k >= n such that S_k/sqrt(k) > w, with probability 1.
Not quite: limsup means it keeps happening out to infinity. In other words, for *every* integer n there is a k>n with f(k) > w.
wee free kings
Re: Mathematics of p-hacking: random walks and significance
I'm a bit confused by that.
I'm under the impression if lim n -> inf of f(n) = infinity, then it means for every w, there exists n such that f(n) > w.
according to wikipedia, lim sup is defined:
lim sup_{n -> inf} x_n = lim_{n -> inf} (sup_{m >= n} x_m)
so this would mean for every w, there exists n such that sup_{m >=n} x_m > w, which means there exists n such that there exists m such that x_m > w. right?
 Eebster the Great
 Posts: 3252
 Joined: Mon Nov 10, 2008 12:58 am UTC
 Location: Cleveland, Ohio
Re: Mathematics of p-hacking: random walks and significance
lim sup S_{k}/√k = ∞ means that for any M and N, the supremum of S_{k}/√k over k>N is greater than M. This means that no matter how far you go into the sequence (i.e. no matter how big N is), you can always find an even greater k at which the value is arbitrarily large (i.e. bigger than M). In other words, there is no point in the sequence after which it remains bounded above.
By contrast, consider e^{-k} cos k. Here, the farther you go into the sequence, the smaller its bounds. For instance, after k=1, we know the sequence will never again get larger than 1/e. The limit of the sequence as k→∞ is 0, and therefore so is the limit superior (and the limit inferior).
Or consider e^{-k} + cos k. Here, there is no limit at infinity, because as k gets large, the function approaches cos k, which oscillates between -1 and 1. However, the limit superior equals 1, because no matter how deep we get into the sequence, we will never reach a point where we can't get arbitrarily close to 1 just by waiting until k gets really close to a multiple of 2π. The limit inferior is -1 for the same reason.
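A quick numerical check of these two examples, as a rough sketch only. It takes the examples to be e^{-k} cos k and e^{-k} + cos k (the reading consistent with the 1/e bound), evaluates them at integer k, and approximates a tail supremum or infimum by a maximum or minimum over a finite window, which can only suggest the limiting behaviour and not prove it. The helper names are made up for illustration.

```python
import math


def tail_sup(f, start, stop):
    """Largest value of f(k) for integers start <= k < stop."""
    return max(f(k) for k in range(start, stop))


def tail_inf(f, start, stop):
    """Smallest value of f(k) for integers start <= k < stop."""
    return min(f(k) for k in range(start, stop))


def damped(k):
    """e^{-k} cos k: the bounds shrink, so lim sup = lim inf = 0."""
    return math.exp(-k) * math.cos(k)


def shifted(k):
    """e^{-k} + cos k: keeps oscillating, so lim sup = 1, lim inf = -1."""
    return math.exp(-k) + math.cos(k)


if __name__ == "__main__":
    for start in (1, 10, 100, 1000):
        window = (start, start + 2000)
        print(start,
              tail_sup(damped, *window), tail_inf(damped, *window),
              tail_sup(shifted, *window), tail_inf(shifted, *window))
```

As the starting index grows, the window extremes of the first sequence collapse toward 0, while those of the second stay near 1 and -1.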
Re: Mathematics of p-hacking: random walks and significance
I agree with the examples you gave.
I've figured out where I was confused now. I had the wrong definition of limit in my head. So e^k * cos k is an example of a function with no limit as k -> inf, but the lim sup would be infinity.
 Eebster the Great
 Posts: 3252
 Joined: Mon Nov 10, 2008 12:58 am UTC
 Location: Cleveland, Ohio
Re: Mathematics of p-hacking: random walks and significance
>) wrote: I'm under the impression if lim n -> inf of f(n) = infinity, then it means for every w, there exists n such that f(n) > w.
Not quite: that limit means for every w, eventually f(x) will never drop below w again. In other words, there exists n such that f(x) > w for *every* x>n.
wee free kings
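For reference, the three statements being compared in this exchange can be written side by side with explicit quantifiers. This is a summary in standard notation for a real-valued sequence x_n, not something taken from the linked answers:

```latex
\begin{align*}
\limsup_{n\to\infty} x_n = \infty
  &\iff \forall w\ \forall n\ \exists k \ge n:\ x_k > w \\
\sup_{n} x_n = \infty
  &\iff \forall w\ \exists k:\ x_k > w \\
\lim_{n\to\infty} x_n = \infty
  &\iff \forall w\ \exists n\ \forall k \ge n:\ x_k > w
\end{align*}
```

The first line clearly implies the second. For a real-valued sequence the second also implies the first, because any finite stretch of terms is bounded, so the terms exceeding ever larger w must occur at ever larger indices. The third line is strictly stronger than both: e^k cos k satisfies the first two but not the third.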