On Drug Design

"About the blog"

Personal reflections on drug design. Research interests include combining new technology, informatics and science in innovative ways to tackle the challenging tasks in drug discovery...as well as trying to distinguish science fact from science fiction using the power of computers...something I'll post a text on now and then...usually after having read an interesting book/paper.

Luck, Molecular Modeling and the Sports Illustrated Curse

Statistics | Posted by Jonas Boström, Wed, November 18, 2015 23:13:18
This was first posted elsewhere on 10/07/2013.

Accomplish something extraordinary in sports and you may appear on the prestigious cover of Sports Illustrated magazine. There's a downside to that, however. An in-depth analysis revealed that a "curse" (i.e. poorer performance in the near future) followed a cover appearance 37.2 percent of the time. The largest effect was seen for golfers, who were jinxed almost 70 percent of the time. The jinx effect may be attributed to an often-forgotten statistical rule termed “regression to the mean”.

Wikipedia helps us with the definition: “in statistics, regression to the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement — and, paradoxically, if it is extreme on its second measurement, it will tend to have been closer to the average on its first”. Hence, after a peak performance, it is likely to go downhill from there. Lindsey Vonn effectively illustrates the effect on the cover in the middle below.

Regression effects occur whenever the correlation between two measures is imperfect, and since regression to the mean can predict both the future and the past, it effectively puts causality out of play. The more extreme the accomplishment, the more regression we should expect. This can be difficult to grasp, and it is dissatisfying to us humans. Daniel Kahneman, Nobel Prize winner in economic sciences, states in his marvelous book “Thinking, Fast and Slow” that we always want to associate a cause with an effect, and that we have difficulty handling statistical facts. On top of that, it is a mathematically inevitable consequence that luck influences almost everything we do. Events happen over which we have no control, and these can carry more weight than our own actions. When asked for his favorite equation, Kahneman shared this:

Success = talent + luck

Great success = a little more talent + a lot of luck

If, for example, you play golf and pull off an exceptionally stunning score, regression to the mean predicts that the next round will be closer to the mean (in either time direction!), in the absence of other information. The same effect obviously also applies if you perform a virtual screen and obtain an extraordinary hit-rate. The next time you perform a virtual screen the results are likely not to be as good. You might be jinxed!? One way to make sure that you can draw any conclusions is to compare with a control, for example by testing against a NULL hypothesis.
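Kahneman's "success = talent + luck" can be turned into a quick simulation. What follows is a minimal sketch, not an analysis of real data: it assumes talent and luck are independent, normally distributed and equally important, and that a higher score is better. The function name and all parameters are made up for illustration.

```python
import random


def simulate_regression_to_mean(n_players=10_000, top_fraction=0.01, seed=0):
    """Return the mean score of the round-1 top performers in round 1 and round 2.

    Assumption (hypothetical model): score = talent + luck, both drawn from
    a standard normal distribution; luck is redrawn every round.
    """
    rng = random.Random(seed)
    talent = [rng.gauss(0, 1) for _ in range(n_players)]
    round1 = [t + rng.gauss(0, 1) for t in talent]
    round2 = [t + rng.gauss(0, 1) for t in talent]

    # Pick the "cover athletes": the best performers in round 1.
    n_top = max(1, int(n_players * top_fraction))
    ranked = sorted(range(n_players), key=lambda i: round1[i], reverse=True)
    top = ranked[:n_top]

    mean_round1 = sum(round1[i] for i in top) / n_top
    mean_round2 = sum(round2[i] for i in top) / n_top
    return mean_round1, mean_round2


first, second = simulate_regression_to_mean()
print(f"Top performers, round 1: {first:.2f}")
print(f"Same players, round 2:  {second:.2f}")
```

The round-1 stars are still better than average in round 2 (their talent is real), but clearly worse than their own peak, even though nothing about them changed. Swapping the roles of the two rounds gives the same picture, which is the "either time direction" point.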

Molecular modeling and virtual screening reports are notoriously bad at accounting for statistical effects. One reason for this is that testing compounds in biological assays is often very expensive, and modelers/chemists/biologists generally do not want to ‘waste’ precious slots on ‘some random compounds’. This is extra problematic for low-throughput assays, since chance produces more extreme results in small samples. But without randomization it is difficult to rule out that an extreme hit-rate was just a twist of fate.
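To make the point about random controls concrete, here is a hedged sketch of one possible NULL model: a one-sided Fisher exact test that asks how often a selected set would collect at least as many hits as observed if the selection method were no better than random picking. The function name and the hit counts are invented for illustration and do not come from any real screen.

```python
from math import comb


def fisher_one_sided(hits_selected, n_selected, hits_control, n_control):
    """P-value for 'the selected set is enriched in hits' versus a random control.

    Under the NULL hypothesis the hits are spread at random over both sets,
    so the number of hits in the selected set follows a hypergeometric
    distribution. Returns P(hits in selected set >= hits_selected).
    """
    total = n_selected + n_control
    total_hits = hits_selected + hits_control
    max_hits = min(n_selected, total_hits)
    favourable = sum(
        comb(n_selected, k) * comb(n_control, total_hits - k)
        for k in range(hits_selected, max_hits + 1)
    )
    return favourable / comb(total, total_hits)


# Hypothetical numbers: 25% hit-rate for the model versus 5% for random picks.
p_small = fisher_one_sided(5, 20, 1, 20)      # 20 compounds per arm
p_large = fisher_one_sided(50, 200, 10, 200)  # same hit-rates, 200 per arm
print(f"20 per arm:  p = {p_small:.3f}")
print(f"200 per arm: p = {p_large:.2e}")
```

With only 20 compounds per arm, a fivefold better hit-rate is not enough to reject luck at the usual 5% level, while the same hit-rates with 200 compounds per arm leave little doubt. This is the small-sample problem in numbers.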

There is also a danger that, without proper validation, new modeling software with one or two early accidental successes gets hyped, and vendors can play on this without having to worry about people validating their software and coming away unimpressed. As scientists we need to be more mindful of such things. The modeling community is progressing, however; general statistics and the quest for reproducibility are being taken more seriously. In fact, my friend Anthony Nicholls arranged a different conference devoted entirely to statistics in molecular modeling this summer (slide decks).

To be fair, disciplines other than molecular modeling can also be sloppy. The Sports Illustrated analysis mentioned above did not, for example, include a NULL model. A colleague of mine hinted that it could just as well be a blessing instead of a curse! Here, I need to come clean and admit that I have misbehaved myself. We recently published a virtual screen where shape and electrostatics were used to select a few (68) compounds from a large database (1M). The remarkable result was a hit compound, a fibrinolysis inhibitor four times as potent as the reference compound (an existing drug!). No NULL model was used. My bad. In my defense, the method we used had previously proven useful on numerous occasions. There was also a significant amount of luck. The hit compound was acquired and added to the AstraZeneca corporate database only a month before the virtual screen was made.

Finally, might there be a Journal of Medicinal Chemistry curse as well? The cover to the right (above) was made by yours truly to “highlight outstanding research” performed by AZ colleagues and myself (Note: words in italics from the J. Med. Chem. editors, not me). Was that the kiss of death for me? I hope not. Another classical way to judge whether you have an extreme value is to increase the sample size. That is, if you can repeat an extreme occurrence multiple times, it can be both outstanding and expected behavior. Golfer Arnold "the more I practice the luckier I get" Palmer has appeared on as many as thirteen Sports Illustrated covers, which is both outstanding and expected for him. I also play golf. Is there a causal connection?



