Problems in medical research:Tracking outcome switching


p1t1o
Posts: 857
Joined: Wed Nov 10, 2010 4:32 pm UTC
Location: London, UK

Re: Problems in medical research:Tracking outcome switching

Postby p1t1o » Thu Feb 25, 2016 5:00 pm UTC

lorb wrote:
p1t1o wrote:I don't like the term myself, GSK are not the only pharma company and most have a much less chequered history. Tarring a whole group with the same brush is a pet peeve of mine.


Still, there is a reason why this exists: Wikipedia's list of largest pharmaceutical settlements

I am not saying all pharmaceutical companies are evil, but there are definitely some things that are systematically going wrong in their business. Leading back to the topic: researchers feeling pressured to come up with significant results is one of those things.


Oh for sure you are correct!

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Fri Feb 26, 2016 5:44 pm UTC

ijuin wrote:Yes--Stage III clinical trials generally involve hundreds of test subjects in order to get a statistically significant sample. It is NOT simply a grad student running around taking monthly surveys of a group of people--it is hundreds of people taking a drug or using a medical device for several months to a year.

Isn't having large sample sizes a way of increasing the chance of getting a statistically significant result, even if the effect size is rather small?

If a drug "works", you'll get a significant result even from a relatively small group of test subjects, and you'll get a large effect size. Having a large sample and a statistically significant small effect practically means "we are very certain that this drug has a very small effect".
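A minimal simulation sketch of this point (illustrative numbers, not taken from any real trial): assume a drug that shifts a continuous outcome by only 0.1 standard deviations. Small trials almost never reach p < 0.05, while very large trials almost always do, even though the effect stays tiny.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.1            # assumed effect: 0.1 standard deviations (small)

for n in (30, 300, 3000):    # participants per arm (illustrative sizes)
    significant = 0
    for _ in range(1000):    # 1000 simulated trials per sample size
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            significant += 1
    print(f"n per arm = {n:4d}: {significant / 10:.0f}% of simulated trials reach p < 0.05")
```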

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Fri Feb 26, 2016 6:25 pm UTC

Large effect sizes are more useful than small ones, and large sample sizes are more useful than small ones.

Plus you're not just looking for effectiveness, you're also looking for rare side effects, which won't show up in small studies.

User avatar
Izawwlgood
WINNING
Posts: 18686
Joined: Mon Nov 19, 2007 3:55 pm UTC
Location: There may be lovelier lovelies...

Re: Problems in medical research:Tracking outcome switching

Postby Izawwlgood » Fri Feb 26, 2016 6:57 pm UTC

gmalivuk wrote:Plus you're not just looking for effectiveness, you're also looking for rare side effects, which won't show up in small studies.

Though, to be fair, clinical trials must report ALL complications and effects participants experience over the course of the study. I recently read a paper on a psoriasis treatment in which, over the course of therapy, one of the patients died of a gunshot wound, and that was reported as the 'reason patient dropped from trial'.

Literally anything physical that happens to a patient while in the trial gets reported.
aph wrote:Isn't having large sample sizes a way of increasing the chance of getting a statistically significant result, even if the effect size is rather small?
Not necessarily, at all - it's a way to increase your confidence in your reporting. Smaller sample sizes can easily produce spurious significance.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Fri Feb 26, 2016 7:03 pm UTC

Rare side effects and interactions are definitely important, so it is good that the samples are larger in late stage trials.

On the other hand, the larger the sample, the less practical (clinical) significance a statistically significant result implies, and effect size becomes a much more important measure of the effectiveness of the drug. Here is a real example:

In more than 22 000 subjects over an average of 5 years, aspirin was associated with a reduction in MI (although not in overall cardiovascular mortality) that was highly statistically significant: P < .00001. The study was terminated early due to the conclusive evidence, and aspirin was recommended for general prevention. However, the effect size was very small: a risk difference of 0.77% with r² = .001 - an extremely small effect size. As a result of that study, many people were advised to take aspirin who would not experience benefit yet were also at risk for adverse effects. Further studies found even smaller effects, and the recommendation to use aspirin has since been modified.

Their explanation:
... if a sample size is 10 000, a significant P value is likely to be found even when the difference in outcomes between groups is negligible and may not justify an expensive or time-consuming intervention over another. The level of significance by itself does not predict effect size. Unlike significance tests, effect size is independent of sample size. Statistical significance, on the other hand, depends upon both sample size and effect size. For this reason, P values are considered to be confounded because of their dependence on sample size. Sometimes a statistically significant result means only that a huge sample size was used.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/ (2012, Journal of graduate medical education)
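A rough sketch of the arithmetic behind "highly significant but tiny": the counts below are illustrative, back-calculated from the roughly 1.71% vs 0.94% MI rates and ~11,000-per-arm sizes that come up later in the thread, not copied from the published study tables.

```python
from scipy import stats

n_aspirin, n_placebo = 11037, 11034      # assumed arm sizes (illustrative)
mi_aspirin, mi_placebo = 104, 189        # ~0.94% and ~1.71% event counts (illustrative)

table = [[mi_aspirin, n_aspirin - mi_aspirin],
         [mi_placebo, n_placebo - mi_placebo]]
chi2, p, dof, expected = stats.chi2_contingency(table)

risk_difference = mi_placebo / n_placebo - mi_aspirin / n_aspirin
print(f"p value ~ {p:.1e}")                                 # well below .00001
print(f"absolute risk difference ~ {risk_difference:.2%}")  # ~ 0.77%
```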

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Fri Feb 26, 2016 7:34 pm UTC

That is a point in favor of looking for large effect sizes.

The problem is you are treating it like a point against using large sample sizes.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Fri Feb 26, 2016 7:43 pm UTC

It is a point against using p-values as an important measure of the effectiveness of a drug when it is tested on large samples. If a drug trial only reports the p-value, that result may very well be practically useless or worse. This is especially important for large samples.

It is also a point for the validity of small sample studies. Having a statistically significant effect on a small study usually means that the effect size is large. Though, of course, there are many problems with small sample sizes, and it is necessary to conduct larger scale trials.

User avatar
Izawwlgood
WINNING
Posts: 18686
Joined: Mon Nov 19, 2007 3:55 pm UTC
Location: There may be lovelier lovelies...

Re: Problems in medical research:Tracking outcome switching

Postby Izawwlgood » Fri Feb 26, 2016 10:22 pm UTC

aph wrote:It is also a point for the validity of small sample studies. Having a statistically significant effect on a small study usually means that the effect size is large.
No.

Tyndmyr
Posts: 11028
Joined: Wed Jul 25, 2012 8:38 pm UTC

Re: Problems in medical research:Tracking outcome switching

Postby Tyndmyr » Fri Feb 26, 2016 10:50 pm UTC

Uuuuh, no. That's not how ANY of this works.

A statistically significant effect is only easier to get on a small study if you're relying on error. The fact that the study was small doesn't inherently mean that the effect size is larger. That's kind of ridiculous.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sat Feb 27, 2016 1:17 am UTC

Tyndmyr wrote:Uuuuh, no. That's not how ANY of this works.

A statistically significant effect is only easier to get on a small study if you're relying on error. The fact that the study was small doesn't inherently mean that the effect size is larger. That's kind of ridiculous.

No, you correct for the error on small samples: you use a different t table. If a study on a small sample finds a *statistically significant* result, then that very likely means the effect size is relatively large. You cannot get a statistically significant result on a small sample without a large effect size, because for small samples you need to correct for the effects of random factors. For 30 participants, you need t > 2.75 to be significant at p < 0.01. For 10 participants, you need t > 3.17. For large samples, t > 2.58 is enough at p < 0.01. On a small sample, a small effect size won't give you a statistically significant result.

There are certainly caveats for small samples - the normality of the distribution, the handling of extreme values and so on - but if a certain drug is effective, you *will* get statistical significance on small samples. Meaning: the drug has been successful in treating the condition with high certainty and with a relatively large effect. The effect size does not depend on the sample size, while the p-value does depend on it, as the formulas for calculating them demonstrate.
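A minimal sketch of the same point, assuming a two-sample, two-tailed test at p < 0.01 (so the exact critical values differ slightly from the one-sample figures above): the smaller the sample, the larger Cohen's d has to be before a result can come out significant.

```python
from scipy import stats

alpha = 0.01
for n_per_group in (10, 30, 1000):
    df = 2 * n_per_group - 2                    # two-sample t-test
    t_crit = stats.t.ppf(1 - alpha / 2, df)     # two-tailed critical value
    d_min = t_crit / (n_per_group / 2) ** 0.5   # from t = d * sqrt(n/2) for equal groups
    print(f"n per group = {n_per_group:4d}: need |t| > {t_crit:.2f}, i.e. |d| > {d_min:.2f}")
```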

If the effect of a certain drug is small, you can just get a large number of participants, and get a statistically significant result (as in the example with aspirin) and not report on the d (the effect size). If the readers/evaluators/regulators only care about the p-values and confuse them with the size of the effect, you can get your drug going. That is why there is increased awareness of the importance of the effect size vs p-values.


... the p-value depends essentially on two things: the size of the effect and the size of the sample. One would get a 'significant' result either if the effect were very big (despite having only a small sample) or if the sample were very big (even if the actual effect size were tiny). It is important to know the statistical significance of a result, since without it there is a danger of drawing firm conclusions from studies where the sample is too small to justify such confidence. However, statistical significance does not tell you the most important thing: the size of the effect. One way to overcome this confusion is to report the effect size, together with an estimate of its likely 'margin for error' or 'confidence interval'.


and


An important consequence of the capacity of meta-analysis to combine results is that even small studies can make a significant contribution to knowledge. The kind of experiment that can be done by a single teacher in a school might involve a total of fewer than 30 students. Unless the effect is huge, a study of this size is most unlikely to get a statistically significant result. According to conventional statistical wisdom, therefore, the experiment is not worth doing. However, if the results of several such experiments are combined using meta-analysis, the overall result is likely to be highly statistically significant. Moreover, it will have the important strengths of being derived from a range of contexts (thus increasing confidence in its generality) and from real-life working practice (thereby making it more likely that the policy is feasible and can be implemented authentically).


http://www.leeds.ac.uk/educol/documents/00002182.htm

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Sat Feb 27, 2016 2:47 am UTC

Right, small sample sizes are weaker so you need to adjust your math accordingly. Also, small sample sizes are far more prone to other experimental errors beyond pure random probability.

You still haven't actually demonstrated any points against large samples, you've just "reminded" us of what people already know if they've been paying attention: effect size is also really important in clinical trials.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sat Feb 27, 2016 9:22 am UTC

gmalivuk wrote:You still haven't actually demonstrated any points against large samples.

I never made one. If a drug has a small effect, then the right way of getting statistically significant results is to have a large sample study. The main finding of that study is still that the drug has a very small effect.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Sat Feb 27, 2016 12:46 pm UTC

The right way to get statistically significant results for *any* effect size is to have a large study, because large studies have tons of other advantages. Then just make sure you also report the effect size so it's clearer what your results actually mean.

You seem to be arguing that because large studies *can* give significance for small effect sizes, therefore large studies are suspect in general. But there are so many other reasons why large sample sizes are desirable that your position looks frankly ridiculous.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sat Feb 27, 2016 1:30 pm UTC

gmalivuk wrote:The right way to get statistically significant results for *any* effect size is to have a large study, because large studies have tons of other advantages. Then just make sure you also report the effect size so it's clearer what your results actually mean.

Sure, large studies have a lot of practical advantages, but statistically, the effect size is just as meaningful for a small sample as for a large one. A large sample is N > 30 (or N > 50 if you're nitpicky) - not necessarily something you would call a "large study", and quite enough to establish the effectiveness of one treatment over another, or the harmfulness of a drug, if properly conducted. Early stage trials can be, and often are, conducted on small samples. If a drug has some nasty side effects in a small sample, you just don't go on to further trials; the evidence can be quite conclusive.

gmalivuk wrote:You seem to be arguing that because large studies *can* give significance for small effect sizes, therefore large studies are suspect in general. But there are so many other reasons why large sample sizes are desirable that your position looks frankly ridiculous.

You seem to be interpreting my posts as me being against large studies. Not at all: if a study on a large sample is properly conducted and properly interpreted, it has greater statistical power and practical significance than a study on a small sample. What is suspect in large studies is reporting or interpreting only statistical significance and ignoring the size of the effect, which was not uncommon in the past and is less common today.

User avatar
Izawwlgood
WINNING
Posts: 18686
Joined: Mon Nov 19, 2007 3:55 pm UTC
Location: There may be lovelier lovelies...

Re: Problems in medical research:Tracking outcome switching

Postby Izawwlgood » Sat Feb 27, 2016 2:06 pm UTC

aph wrote:A large sample is N > 30 (or N > 50 if you're nitpicky).
No. N is entirely based on the data, and the confidence you are seeking in your statistical analysis. Some trials can reach high confidence (90%, 95%) with 20 patients. Some require 200.

aph wrote:If a drug has some nasty side effects on a small sample, you just don't go to further trials, the evidence can be quite conclusive.
Yes. They thought of this already.

aph wrote: What is suspect in large studies is just reporting or just interpreting statistical significance and ignoring the size of the effect, which was not uncommon in the past, and it is less common today.
It sounds like you're under the impression that the issue here is trials using larger sample sizes and not reporting the... magnitude of change observed over smaller trials? Or that you don't think clinical trials, when reported, report the observed difference between their control and experimental groups?

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sat Feb 27, 2016 4:33 pm UTC

Izawwlgood wrote:No. N is entirely based on the data, and the confidence you are seeking in your statistical analysis. Some trials can reach high confidence (90%, 95%) with 20 patients. Some require 200.

N is just the number of participants. The t and z distributions are not much different after N > 30, and almost identical after N > 50. And yes, some trials can reach high confidence (95%, 99%) with 20 patients; that is because of the effect size. The larger the effect of the drug, either beneficial or harmful, the smaller the sample can be and still give statistically significant results. The reverse also holds.
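A quick check of those critical values (two-tailed, p < 0.01), comparing the t distribution at various degrees of freedom with the normal (z) distribution:

```python
from scipy import stats

z_crit = stats.norm.ppf(0.995)            # two-tailed p < 0.01 -> z ~ 2.576
for df in (9, 29, 49, 99, 1000):
    t_crit = stats.t.ppf(0.995, df)
    print(f"df = {df:4d}: critical t = {t_crit:.3f}  (z = {z_crit:.3f})")
```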


Izawwlgood wrote:Yes. They thought of this already.

Yes, obviously - I'm not saying anything new. Testing drugs on small samples is a valuable and useful procedure, especially in early trials.

Izawwlgood wrote:It sounds like you're under the impression that the issue here is trials using larger sample sizes and not reporting the... magnitude of change observed over smaller trials? Or that you don't think clinical trials, when reported, report the observed difference between their control and experimental groups?

It is a bit off topic, but yes: as in the example with aspirin, the statistical significance of the difference between the control and experimental groups was interpreted to mean clinical significance, which was a wrong interpretation. The important data point was the extremely low effect size (the magnitude of the difference). Doctors advised patients to take aspirin to lower the risk of MI, but the data actually did not support that advice, since the risk reduction was very small and patients might have been exposed to harmful side effects.

There are similar criticisms of reported research on psychoactive drugs (for depression, anxiety...) where the effect sizes were.. not as high as interpreted.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Sat Feb 27, 2016 8:34 pm UTC

aph, you're being fairly dishonest with your reporting of the effect size.

The 0.77% reduction was in absolute risk, but it's for something that only has a couple percent chance of happening in the first place. Since the risk of MI is low to start with, that's a relative reduction of 44% (assuming you're talking about the same 22k-member Physicians' Health Study I am).

It's important to understand the difference between relative risk and absolute risk, but when you neglect to tell us both of those numbers, it's pretty similar to someone who reports a p-value but no effect size when doing a large study. (Imagine if the absolute risk was only 0.77% to start with. Then that same "extremely small effect size" would mean reducing risk to zero.)

Prophylactic use of anything will tend to have a fairly small effect on absolute risk, because most absolute risks under consideration are pretty small to begin with. It's important to be aware of both absolute and relative risks when assessing the usefulness of an intervention.
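A small worked sketch of how the same absolute risk difference maps onto very different relative measures; the 1.71%/0.94% pair matches the aspirin figures discussed later in the thread, while the other two pairs are hypothetical.

```python
def risk_measures(p_control, p_treated):
    arr = p_control - p_treated       # absolute risk reduction (risk difference)
    rrr = arr / p_control             # relative risk reduction
    rr = p_control / p_treated        # risk ratio, control vs treated (as used in this thread)
    nnt = 1 / arr                     # number needed to treat
    return arr, rrr, rr, nnt

# (control risk, treated risk): the aspirin-like pair, then two hypothetical pairs
for p_c, p_t in [(0.0171, 0.0094), (0.5077, 0.5000), (0.0077, 0.00001)]:
    arr, rrr, rr, nnt = risk_measures(p_c, p_t)
    print(f"{p_c:.2%} -> {p_t:.2%}: ARR = {arr:.2%}, RRR = {rrr:.0%}, RR = {rr:.2f}, NNT ~ {nnt:.0f}")
```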

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sun Feb 28, 2016 12:16 pm UTC

The paper quoted referred to this study: http://www.ncbi.nlm.nih.gov/pubmed/21481826/
Aspirin has had a larger and significant effect in preventing MI for patients with a prior history of cardiovascular disease or who were otherwise at high risk of developing it. The low effect was observed for patients who were in the low-risk group for developing coronary disease. For them, the risk of harmful side effects is nearly the same as, or outweighs, the benefit of using aspirin for prevention. I should have reported both findings, sorry.

Here is a more recent review article if someone is interested: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4669607/

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Sun Feb 28, 2016 1:21 pm UTC

My point still stands, though. In theory, an absolute risk reduction of 0.77% could mean risk is reduced to zero, which is an infinite effect size by all three of the measures listed in Table 1 of the article where you got that number. It is dishonest to report that as an "extremely small" effect without bothering to tell us the effect size as measured in any of those ways.

Sure, absolute risk is important when comparing the desired outcome with adverse side effects, and if the risk of side effects worse than MI is higher than 0.77%, then that's an important thing to consider when deciding whether to take aspirin. But then just say that directly and stop pretending the Physicians' Health Study was somehow dishonest in its reporting, when in fact you (and that article) were the ones leaving out the relative risk reduction figure.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Sun Feb 28, 2016 7:00 pm UTC

I'm not sure I understand what you mean. The 0.77% is a measure of the difference in risk of stroke between the aspirin and control groups. It is extremely small - standardized, it is 0.06 sigma. It was not reported in the original study, but no one is accusing the researchers of dishonesty.

The study was apparently terminated because of conclusive evidence that aspirin lowers the risk of first stroke, on the grounds that it would be unethical to continue giving placebo to half of the participants.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Sun Feb 28, 2016 8:01 pm UTC

aph wrote:I'm not sure I understand what you mean. The 0.77% is a measure of the difference of risks for stroke between the aspirin and control groups.
Yes, but my point is that this number could result from a reduction of risk from 0.77% to 0.00%, which is small in one sense but in another is huge. The article you yourself linked describes different ways of measuring effect size. As an odds ratio or a risk ratio, reducing from some risk to 0 is an "infinite" effect size, and the article says 3 or 4, respectively, for those numbers is "large".

aph wrote:It is extremely small - standardized, is 0.06 sigma.
The standard deviation seems more useful for an effect size calculation when you're looking at a continuous variable, but whether or not someone has a MI is binary, so one of the ratios would make more sense. (The PHS had a reduction of 44%, for a RR of 1.8, which is "small" according to the article, but not negligible.)

Edit: And again, you only know it's 0.06 sigma because you also know the size of the absolute risk to start with. Just the 0.77% by itself could be anywhere from 0.0154 sigma up to infinity, depending on what the starting risk is. So as I've been saying all along, reporting only the "extremely small" value of 0.77% doesn't actually give the kind of information you seem to think it does.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Mon Feb 29, 2016 12:14 am UTC

gmalivuk wrote:Yes, but my point is that this number could result from a reduction of risk from 0.77% to 0.00%, which is small in one sense but in another is huge. The article you yourself linked describes different ways of measuring effect size. As an odds ratio or a risk ratio, reducing from some risk to 0 is an "infinite" effect size, and the article says 3 or 4, respectively, for those numbers is "large".

For some people, who are at some risk of developing MI and some other conditions, yes, the aspirin therapy definitely helps to reduce the risk. There are formulas that attempt to approximate this risk for individual patients, based on significant factors such as age, sex, cholesterol levels and others. So I suppose the statisticians' general recommendation is to calculate the risk for each patient.

gmalivuk wrote:Standardized how? Sigma from what? I don't think you're doing math the same way the article describes.

Edit: The standard deviation is useful for an effect size calculation when you're looking at a continuous variable, but whether or not someone has a MI is binary, so one of the ratios would make more sense. (The PHS had a reduction of 44%, for a RR of 1.8, which is "small" according to the article, but not negligible.)

There are many articles and a few books referencing this particular Physicians Health Study as an example of how to calculate the effect size. The 0.77% is calculated as the proportion of people who got MI in the control group (0.0171 or 1.71%) minus the proportion of people who got MI in the experimental group (0.0094 or 0.94%), which is 0.0077 or 0.77%. I'll have to get back with the formula for calculating the sigma; this article only references the number from a different article I can't access. There is some criticism of using RR, as statisticians say it is easy to misinterpret - a risk ratio of 1.8 does not mean that people who take aspirin are 1.8 times safer than people who don't take it.

I remember now why I abstained from statistics for a while.

They also started using lots of Greek symbols in their formulas, and I'm pretty sure I've seen a hieroglyph. As if we can type a snake eye in R.

I should also add something about small sample studies - they are not always clinically (or practically) significant because, while they can show a statistically significant effect of a drug, they can't prove its safety, as you have also pointed out - that is why we need large studies. Small sample studies point to drugs with potential while not exposing a lot of participants to the risk of side effects. If they find side effects at particular doses, they become clinically important.

I'm starting to forget why there is an issue with not reporting studies that don't find statistically significant effects. If it is not stat. significant, then there is just no big effect. A larger study might find some effect, but it is still going to be a small effect. Sure, it would be better to report all of them, but it seems like a minor problem.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Mon Feb 29, 2016 12:33 am UTC

aph wrote:There are many articles and a few books referencing this particular Physicians Health Study with an example of how to calculate the effect size. The 0.77% is calculated as the proportion of people that got MI in the control group (0.0171 or 1.71%) minus the proportion of people who got MI in the experimental group (0.0094 or 0.94%), which is 0.0077 or 0.77%. I'll have to get back with the formula for calculating the sigma, this article only references the number from a different article I can't access.
I edited my post above because I got home and looked up and did the math. It's a simple Bernoulli trial, so the standard deviation is sqrt(p(1-p)), and that version of the effect size is either (p1-p2)/sigma1 or (p1-p2)/sigma2, where 1 and 2 refer to the two groups. Getting d=0.06 results from choosing the larger sigma, whereas it's closer to 0.08 for the smaller one.
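Written out as code, the arithmetic just described, using the 1.71% and 0.94% proportions quoted above:

```python
p_control, p_aspirin = 0.0171, 0.0094                   # proportions with MI in each group

diff = p_control - p_aspirin                            # 0.0077
sigma_control = (p_control * (1 - p_control)) ** 0.5    # ~0.130 (larger sigma)
sigma_aspirin = (p_aspirin * (1 - p_aspirin)) ** 0.5    # ~0.096 (smaller sigma)

print(f"d = {diff / sigma_control:.2f} using the larger sigma")   # ~0.06
print(f"d = {diff / sigma_aspirin:.2f} using the smaller sigma")  # ~0.08
```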

aph wrote:I'm starting to forget why there is an issue with not reporting studies that don't find statistically significant effects. If it is not stat. significant, then there is just no big effect. A larger study might find some effect, but it is still going to be a small effect. Sure, it would be better to report all of them, but it seems like a minor problem.
If there are 100 studies, then we would expect, even with totally random data, that one of them will find statistical significance at the p=0.01 level. If that's the only study that gets reported, then it looks like a strong result, when in fact there is no effect at all. (This is true regardless of effect sizes or sample sizes.)

Reporting all studies is the only way we can judge whether the apparent statistical significance of one study is actually significant.

Edit: let's consider a toy example with flipping fair coins. Suppose there's a lot of interest in whether pennies are more likely to come up heads than dimes, and so people run experiments by flipping five randomly selected pennies and then flipping five randomly selected dimes (they, like you, figure that if there's a large effect size, it'll show up even in this very small sample). Suppose too that everyone's decided p=0.001 should be the threshold for significance in thinking there's anything interesting about coins.

Then you see a study published, finding that pennies came up heads 100% of the time and dimes came up heads 0% of the time. This is an incredibly large effect (infinite by all three methods described in your link), and is significant at p<0.001.

Should you put stock in this study?

If you know that people don't bother to publish negative results, then you should do nothing of the kind, because you don't know how many other studies were done that found nothing significant. If there were about 1000 of them, then it shouldn't be surprising even with completely fair coins that one study would have the results in question.
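A quick simulation sketch of this toy example (the seed and the study count of 1000 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
n_studies = 1000

penny_heads = rng.binomial(5, 0.5, n_studies)   # heads out of 5 penny flips per study
dime_heads = rng.binomial(5, 0.5, n_studies)    # heads out of 5 dime flips per study

extreme = int(np.sum((penny_heads == 5) & (dime_heads == 0)))
print(f"{extreme} of {n_studies} fair-coin studies got the headline-worthy 5-0 split")
print(f"expected by chance alone: {n_studies * 0.5**5 * 0.5**5:.2f}")
```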

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Tue Mar 01, 2016 12:25 am UTC

gmalivuk wrote:Reporting all studies is the only way we can judge whether the apparent statistical significance of one study is actually significant.

I understand that that is the rationale for reporting all the studies - and it is certainly more important in fields such as medicine where even small effects can be clinically important, for reducing the risk of life threatening events. But then statisticians seem to be increasingly warning about the importance of effect sizes, and some journals began requiring calculations of effect sizes for all submissions. So, calculating the effect size seems to be another way of judging whether the apparent statistical significance is actually / practically / clinically significant.

Well, at least in some cases.
gmalivuk wrote:Should you put stock in this study?

No, I suppose I wouldn't... Thanks for the illustration. If all the studies were reported, that would certainly change the perception of importance of the published studies that found a statistically significant effect.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Tue Mar 01, 2016 1:50 am UTC

aph wrote:So, calculating the effect size seems to be another way of judging whether the apparent statistical significance is actually / practically / clinically significant.

Those other types of significance are different, though. There's no "another" about it.

Publishing all studies is the only way to ensure that reported statistical significance is actually significant. Reporting effect size is relevant to a bunch of other types of significance, but from a purely mathematical perspective is completely independent from statistical significance.

(Also, my original objection to your posts still stands: In reporting about an "extremely small" effect size, you neglected to actually report the effect size in any of the standardized ways the article mentioned, and the 0.77% number on its own tells us literally nothing about those measures of effect size.)

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Tue Mar 01, 2016 2:51 pm UTC

gmalivuk wrote:
aph wrote:So, calculating the effect size seems to be another way of judging whether the apparent statistical significance is actually / practically / clinically significant.

Those other types of significance are different, though. There's no "another" about it.

Your statement was "Reporting all studies is the only way we can judge whether the apparent statistical significance of one study is actually significant." My mistake as well, I used the term 'apparent statistical significance' - there is no apparent statistical significance - the difference between samples either is or isn't statistically significant. When it is statistically significant, then we are interested whether this statistical significance is 'actually important' or 'practically significant' or if we are talking about medical treatments - 'clinically important'. If the studies are not all reported, then that reduces our ability to judge the 'actual importance' of the smaller number of reported statistically significant results.

And yes, effect sizes are important for practical significance, but they are not even interpreted when the results aren't statistically significant.
gmalivuk wrote:(Also, my original objection to your posts still stands: In reporting about an "extremely small" effect size, you neglected to actually report the effect size in any of the standardized ways the article mentioned, and the 0.77% number on its own tells us literally nothing about those measures of effect size.)

It tells you a lot - of the 10k people in the aspirin group, a bit less than 1% had a stroke. In the control group, the number was 1.7%. So, for people who were apparently at low risk of stroke, the aspirin therapy reduced the risk by 0.77%, which is small by absolute measures, just standing by itself as a percentage (though it might not be practically negligible). I thought you were referring to me not reporting the reduction in the risk of stroke for people in the higher risk group - aspirin is a very effective treatment for secondary prevention.

The article also does mention in the same sentence that r^2 is 0.001, which is the coefficient of determination, for which the table lists < 0.04 to be 'small', and which is also an effect size measure.

The article was about the importance of reporting effects sizes, published in a medical education journal, not about aspirin effectiveness, and they didn't go into too much detail on this particular study (it is mentioned in a lot of other places). Also, weirdly, I see now that that article does not list the formula for the risk difference used and instead lists several risk ratios.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Tue Mar 01, 2016 7:57 pm UTC

Yes, when you report an absolute risk, 0.77% tells you something useful.

But you initially didn't, so 0.77% gave no information about effect size in the sense used by the article.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Tue Mar 01, 2016 8:14 pm UTC

Yeah, I suppose a person interested in the effects needs to analyse the numbers closely and go through the tedium of statistical logic and all the different measures; the interpretation is not apparent just from the few lines of results.

The rationale for using risk differences is that they would be the same if the risks for developing a condition were, say, 50% in the experimental group vs 50.77% in the control group, which would mean that the experimental treatment reduced the risk by 0.77%, and that would still be a very small effect. In comparison, the risk ratio would then be close to 1. Risk differences might be more closely connected to an intuitive understanding of risk, while risk ratios run into trouble with division by zero or by very small numbers.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Tue Mar 01, 2016 8:26 pm UTC

aph wrote:Yeah, I suppose a person interested in the effects needs to analyse the numbers closely and go trough the tedium of statistical logic and all the different measures, and the interpretation is not apparent just from the few lines of results.
Analyze what numbers closely? Just include in those few lines any *one* of 0.94% (risk in the experimental group) or 1.71% (risk in the control group) or 45% (risk reduction relative to control) or 0.55 (risk ratio), and then together with the 0.77% number readers now have all the information they need.

aph wrote:The rationale for using risk differences is that they would be the same if the risks for developing a condition were, say, 50% in the experimental group vs 50.77% in the control group, which would mean that the experimental treatment reduced the risk by 0.77%, and that would still be a very small effect.
That is the rationale for why the absolute risk needs to be included in the reporting alongside the risk difference. Which has been my point literally this whole time.

A risk difference of 0.77% is tiny when it's a decrease from 50.77% to 50% (RR≈1.015, d=0.0154), but huge when it approaches a decrease from 0.77% to 0% (where both RR and d approach infinity).

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Tue Mar 01, 2016 9:08 pm UTC

The risk difference is the same 0.77% whether it comes from 1.71% - 0.94% or from 50.77% - 50.00%. It can stand by itself if the issue is the magnitude of the effect.

The risk ratio in this study would be 1.71% / 0.94%, which is 1.82, meaning the participants in the control group were almost twice as likely to develop a heart condition - but this is very easy to misinterpret to mean that aspirin has had a large effect and should be used for primary prevention. Which is what happened in 1989 when the original study was terminated due to conclusive evidence and statistically significant results. If we are reporting risk ratios, then yes, by all means, both risks should be reported, but it is a question whether it makes sense to report and interpret the risk ratio (edit: in cases such as this one).

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Tue Mar 01, 2016 9:35 pm UTC

It always makes sense to report and interpret the risk ratio. It also makes sense to report and interpret the absolute risk in the general population or control group. What doesn't make any sense is your continued insistence that reporting a risk difference gives useful information by itself. In what situation can that stand by itself?

What if a proposed safety measure resulted in a decrease of 0.77% in the likelihood of dying in a car accident per million miles driven. Would you say that was a useful safety measure? Would you be in favor of it if the implementation and regulation would have the hefty price tag of a billion US dollars per year?

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Tue Mar 01, 2016 10:00 pm UTC

gmalivuk wrote:It always makes sense to report and interpret the risk ratio. It also makes sense to report and interpret the absolute risk in the general population or control group. What doesn't make any sense is your continued insistence that reporting a risk difference gives useful information by itself. In what situation can that stand by itself?

I thought I illustrated that, but sure, the more data the better.
gmalivuk wrote:What if a proposed safety measure resulted in a decrease of 0.77% in the likelihood of dying in a car accident per million miles driven. Would you say that was a useful safety measure? Would you be in favor of it if the implementation and regulation would have the hefty price tag of a billion US dollars per year?

The risk of dying in a car accident per million miles is pretty low to start with, so I'm pretty sure that proponents of that measure would proclaim something like "a 50% reduction!", while the opponents would stick to the risk difference divided by cost or some such measure. I'm not sure; I guess I would, depending on what the alternative suggestions are for where to spend the billions. I'd be in favor of daily aspirin too, if it didn't also cause stomach bleeds.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Tue Mar 01, 2016 11:07 pm UTC

aph wrote:
gmalivuk wrote:It always makes sense to report and interpret the risk ratio. It also makes sense to report and interpret the absolute risk in the general population or control group. What doesn't make any sense is your continued insistence that reporting a risk difference gives useful information by itself. In what situation can that stand by itself?

I thought I illustrated that, but sure, the more data the better.
When did you think you illustrated that? I haven't seen any example where just reporting that single number is usefully informative.

What if a proposed safety measure resulted in a decrease of 0.77% in the likelihood of dying in a car accident per million miles driven. Would you say that was a useful safety measure? Would you be in favor of it if the implementation and regulation would have the hefty price tag of a billion US dollars per year?

The risk of dying in a car accident per million miles is pretty low to start with, so I'm pretty sure that proponents of that measure would proclaim something like "a 50% reduction!", while the opponents would keep to risk difference divided by cost or somesuch measure. I'm not sure, I guess I would, depending on what are the alternative suggestions of where to spend the billions. I'd be in favor of daily aspirin too, if didn't also cause stomach bleeds.
It's actually an 80% reduction, or nearly 28 thousand lives a year. But the point was that without also being told (or guesstimating, as you did) the initial risk, that single 0.77% number that sort of seems small gives basically no information whatsoever.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Wed Mar 02, 2016 12:51 am UTC

Well, the whole situation with the changed doctors' recommendation on aspirin in primary prevention illustrates that the effect of aspirin is more meaningfully interpreted as being extremely small than as being large in any sense. Doctors just don't recommend aspirin anymore for primary prevention to people who are in the low risk group to begin with. And that is nicely summarized in that single number showing the effect of the drug for the low risk sample. If we are just interested in the size of the effect, that is the number we are looking for. Specifically, I thought that the example with 50.00% vs 50.77% illustrated that this measure stays the same for different initial risks, and that is why it can stand by itself (if we are ONLY interested in the size of the effect). I'll get a more authoritative source with a nice illustration...

Of course, we are interested in much more than just the size of the effect - the p level to start with, the number of participants, possible side effects and interactions, costs, dosages and so on. The more data the better. Risk ratios if you want, and if you know how to interpret them.

You said it was dishonest not to report the other ratios - it isn't; they are just very prone to misinterpretation, as the aspirin study demonstrates.

Measuring effectiveness (2015), http://www.sciencedirect.com/science/ar ... 8615000837

(RD is the risk difference, RR risk ratio, and RRR is the relative risk reduction)

Effectiveness always should be measured and reported in absolute terms (using measures such as ‘absolute risk reduction’), and only sometimes should effectiveness also be measured and reported in relative terms (using measures such as ‘relative risk reduction’). [...]
Employment of relative measures, such as RR or RRR, promotes the base-rate fallacy (Worrall, 2010). Both physicians and patients overestimate the effectiveness of medical interventions when presented with only relative measures, and their estimates are more accurate when they are presented with both relative and absolute measures or with absolute measures alone. This finding has been replicated many times in different contexts. [...]
... as suggested above, to address the patient’s central question articulated above, we should have a measure that represents the capacity of an intervention to change the probability of the beneficial outcome in question. RR does not do this. RD does."

Following this is an example of a utility calculation for a heart attack prevention using RD, as well as a full page of examples of fallacies using risk ratios.

Continuing...
One might object that a medical intervention with a low absolute effect size could nevertheless be considered ‘effective’, because if the medical intervention were used by a large number of people, then a significant absolute number of those people would experience the beneficial outcome of the intervention. This is especially the case with those medical interventions that are widely used today as preventive medications, such as cholesterol-lowering drugs and blood pressure-lowering drugs.[...]
An individual patient and her physician want to know that if they employ a particular medical intervention then there is a reasonably good chance that the intervention will be effective for this particular patient. For drugs with low absolute effect sizes like the ones I have been discussing above, that is almost never the case.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Wed Mar 02, 2016 1:05 am UTC

aph wrote:If we are just interested in the size of the effect, that is the number we are looking for. Specifically, I thought that the example with 50.00% vs 50.77% illustrated that this measure stays the same for the different initial risks, and that is why it can stand by itself (if we are ONLY interested in the size of the effect).
RD is not the only way to measure "the size of the effect", and pointing out other examples with the same RD as though that proves your point (and as though I wasn't doing the same thing from the start) makes me wonder if you still don't get what I'm trying to say.

An Ebola treatment that reduces mortality of some strain from 50.77% to 50% is basically ineffective. A safety measure that eliminates 80% of driving fatalities is hugely effective. Both of those have the same "size of the effect" as you're describing it above, but I can't imagine why that would ever be the only number we're interested in.

An individual patient and her physician want to know that if they employ a particular medical intervention then there is a reasonably good chance that the intervention will be effective for this particular patient. For drugs with low absolute effect sizes like the ones I have been discussing above, that is almost never the case.
Yes, okay, this is a specific application of research results where RD might be the most salient number. And RD is obviously the most useful figure to compare to side effect risks (which btw can often only be ascertained with large-scale studies, contrary to your initial claim).

I still maintain that reporting only RD when talking about more general things like a potential public health measure can be at least as misleading as reporting only RR or RRR.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Wed Mar 02, 2016 1:37 am UTC

If the car safety example weren't an issue of public safety, but rather you buying a car, and you were being sold an 80% reduction in the risk of, say, the brakes locking, for the mere cost of 20k, wouldn't you need to see the numbers for the baseline risk to assess whether the difference is worth the cost?

Do you think there is a specific difference when considering public health / safety issues vs individual decisions? I don't know, maybe there is. There also might be side effects.

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26413
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Problems in medical research:Tracking outcome switching

Postby gmalivuk » Wed Mar 02, 2016 1:59 am UTC

aph wrote:If the car safety example weren't an issue of public safety, but rather you buying a car, and you were being sold an 80% reduction in the risk of, say, the brakes locking, for the mere cost of 20k, wouldn't you need to see the numbers for the baseline risk to assess whether the difference is worth the cost?
Yes, I would want to see those numbers. I have never advocated only reporting the RRR.

aph wrote:Do you think there is a specific difference when considering public health / safety issues vs individual decisions? I don't know, maybe there is.
Of course there is. The decisions are about vastly different levels, so of course the information needed to make them could reasonably be different.

qetzal
Posts: 855
Joined: Thu May 01, 2008 12:54 pm UTC

Re: Problems in medical research:Tracking outcome switching

Postby qetzal » Wed Mar 02, 2016 4:31 am UTC

aph wrote:Your statement was "Reporting all studies is the only way we can judge whether the apparent statistical significance of one study is actually significant." My mistake as well, I used the term 'apparent statistical significance' - there is no apparent statistical significance - the difference between samples either is or isn't statistically significant. When it is statistically significant, then we are interested whether this statistical significance is 'actually important' or 'practically significant' or if we are talking about medical treatments - 'clinically important'. If the studies are not all reported, then that reduces our ability to judge the 'actual importance' of the smaller number of reported statistically significant results.


It sounds like you're conflating two distinct issues here. Publication bias, where negative results don't get published, can make us think an effect is real even when it's not. That's because we might only see the results of the few studies that randomly produce a false positive result, but we don't see the many more studies that generate a true negative.

That's quite different from the situation where a positive result is real, but not large enough to matter. This is where effect size and clinical significance become relevant.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Wed Mar 02, 2016 5:58 am UTC

gmalivuk wrote:What if a proposed safety measure resulted in a decrease of 0.77% in the likelihood of dying in a car accident per million miles driven. Would you say that was a useful safety measure? Would you be in favor of it if the implementation and regulation would have the hefty price tag of a billion US dollars per year?

It's actually an 80% reduction, or nearly 28 thousand lives a year. But the point was that without also being told (or guesstimating, as you did) the initial risk, that single 0.77% number that sort of seems small gives basically no information whatsoever.

How did you get the 80% and the 28k?

(edit: deleted my calculations, need to recalculate, is it per million miles or per year? )
Last edited by aph on Wed Mar 02, 2016 6:29 am UTC, edited 2 times in total.

aph
Posts: 296
Joined: Tue Nov 12, 2013 7:48 am UTC

Re: Problems in medical research:Tracking outcome switching

Postby aph » Wed Mar 02, 2016 6:13 am UTC

qetzal wrote:It sounds like you're conflating two distinct issues here. Publication bias, where negative results don't get published, can make us think an effect is real even when it's not. That's because we might only see the results of the few studies that randomly produce a false positive result, but we don't see the many more studies that generate a true negative.

Found a relevant xkcd: https://xkcd.com/882/
Well, if the news were published studies.
qetzal wrote:That's quite different from the situation where a positive result is real, but not large enough to matter. This is where effect size and clinical significance become relevant.

Yes, that would be another situation in which the effect size measures would be relevant - we usually don't calculate or interpret the effect size if the study didn't find statistically significant results, but if we have a good reason to suspect that there should be some stat. sign. effect, we can approximate the needed size of the sample that would resolve the issue using measures of effect size.
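A sketch of that sample-size planning step, using the standard normal-approximation formula n ≈ 2((z_{1-α/2} + z_{power})/d)² per group for a two-group comparison of means (an approximation, not any particular trial's calculation):

```python
from scipy import stats

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-group comparison of means."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * ((z_a + z_b) / d) ** 2

for d in (0.2, 0.5, 0.8):   # Cohen's conventional small / medium / large effect sizes
    print(f"d = {d}: roughly {n_per_group(d):.0f} participants per group")
```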

I was referring to the importance of calculating the size of the effect for studies that already found stat. sign. results. The p-value tells us the effect was likely not due to chance, but it doesn't tell us the more important information of just how large the effect was.

