the importance of being self-critical

In the laboratory sciences, we are accustomed to performing research according to the principles of the scientific method. To summarize, we form a hypothesis, conduct an experiment, collect data, analyze and interpret the data, and then draw conclusions that may or may not align with the initial hypothesis. We then cycle back with a revised hypothesis and repeat the process. For this process to work, the experiments being performed must be designed to limit the number of possible conclusions, hence the need for appropriate controls. Finally, we try to apply Occam’s Razor when drawing conclusions; the simplest, most assumption-free interpretation of the data should be explored first, before moving on to more complex explanations.

Recently I have been reading literature from various disciplines associated with obesity and heart disease, and with the relationship between diet and those maladies. To be blunt, I have been amazed at what folks can get away with in the supposedly peer-reviewed literature. In my opinion, coming at it as an experimentalist, some of what I present below crosses the line from poorly done to just plain irresponsible. For the sake of brevity, I will present these examples by linking to writers who have already gone through the process of dissecting the primary literature; I don’t think it is necessary for me to re-do analyses that have already been ably done. However, I want to make it clear that some of these writers have an economic interest in the whole argument – they are selling books, are on lecture circuits, etc. By linking to those articles, I am not endorsing their products – I do, however, find some of what they have to say quite useful.

Let’s start with a recent study of Swedish women that attempted to draw connections between diet and heart disease. This is what is called a prospective cohort study, in which (typically) a small number of groups of individuals who preferably differ by only a single factor (e.g., alcohol consumers vs. teetotalers) are followed over a period of time for a specific outcome (e.g., liver disease). For a cohort study to be effective, the differences between the segments of the cohort (beyond the control variable) must be as few as possible. Furthermore, the outcomes must be prevalent enough that differences between the two groups can reasonably be considered to lie outside the realm of random statistical fluctuation (chance). For example, if a study follows 200 individuals divided into two groups of 100, and two individuals from the first group develop a specific outcome (a 2% incidence rate) while only one person in the second group does (a 1% incidence rate), we can say that the first group had twice the likelihood of developing that outcome. However, that result is nowhere near statistically significant, and it is questionable whether it can or should be extrapolated from such a small sample to the entire population. You should now see the challenges of doing a prospective cohort study – many individuals need to be tracked over a long enough period of time for outcomes to be observed. Note my emphasis on the word individuals – it is fundamentally impossible to find two homogeneous groups of any complex organism that differ in only one respect. We cannot do this with supposedly simpler systems like lab mice or even cultured mammalian cell lines. Why should we believe it is possible with human beings? Our uniqueness is what begat the need for the word “individual” in our language in the first place.
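To put rough numbers on that toy example, here is a minimal sketch (in Python, using scipy; the 100-per-group split is my assumption for illustration) showing that a two-versus-one split in outcomes carries essentially no statistical weight:

```python
# Hypothetical example: two groups of 100, with 2 outcomes in one group
# and 1 outcome in the other. Fisher's exact test asks how likely such a
# split is under pure chance.
from scipy.stats import fisher_exact

group_a = [2, 98]   # (outcome, no outcome) -> 2% incidence
group_b = [1, 99]   # (outcome, no outcome) -> 1% incidence

odds_ratio, p_value = fisher_exact([group_a, group_b])
print(f"odds ratio ~ {odds_ratio:.1f}")   # ~2.0 ("twice the likelihood")
print(f"p-value ~ {p_value:.2f}")         # ~1.0, i.e., indistinguishable from chance
```

The doubling of risk sounds dramatic; the p-value says the split is entirely consistent with chance alone.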

Getting back to the Swedish heart disease study, the control variables here were the macronutrient (carbohydrate, protein, fat) ratios in the subjects’ diets (43,396 Swedish women were enrolled). To obtain that information, the subjects filled out a questionnaire regarding their dietary intake over the previous 6 months (yes, that seems sketchy to me). They were then followed for ~15 years, and health outcomes related to cardiovascular disease were recorded. The supposed outcomes of this were trumpeted in the press. For example, the Daily Mail (UK) headline was “Can Atkins-style diets raise heart attack risk for women? Eating high levels of protein can increase chance by a quarter”. Sounds terrifying, right? The fact of the matter is, if you dig into their statistics, you find that the increased chances are vanishingly small when you consider them at face value – the authors are describing small differences in a small total number of outcomes. As I briefly described above, this is very dangerous, and it is questionable whether such numbers should be extrapolated to larger populations. Zoe Harcombe wrote a very nice, highly detailed article analyzing this study, in which she calls into question the statistical analyses, the study design, the investigators’ understanding of the complexities of food and diet, and, most importantly, the validity of the chosen control variable. In the laboratory sciences, if we obtained such data, I think I can confidently say that we would be very cautious in publishing the results – in fact, I doubt they would ever see the light of day. Would you ascribe much importance to a 0.09% increase in the intensity of a spectroscopic signal? Probably not, yet that is essentially what the investigators are reporting here, as they find the incidence of cardiovascular disease increases from 0.14% to 0.23% in women who reported diets the investigators call “low carbohydrate-high protein” (Harcombe points out that this is probably a misnomer). Anyway, if we observed such small differences in a well-controlled laboratory chemistry experiment using highly precise and accurate instrumentation, those data would not be considered anything to write home about. Why then do such data warrant international media attention?
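For a sense of scale, here is a rough sketch (in Python; only the 0.14% and 0.23% incidence figures quoted above come from the study, and the calculation is mine, not the authors’) contrasting the relative risk that makes headlines with the absolute risk that actually changed:

```python
# Relative vs. absolute risk using the incidence figures quoted above.
# The percentages come from the text; everything else is illustrative.
baseline = 0.0014   # incidence in the reference diet group
elevated = 0.0023   # incidence in the "low carbohydrate-high protein" group

absolute_increase = elevated - baseline           # 0.0009 -> 0.09 percentage points
relative_increase = absolute_increase / baseline  # ~0.64 -> "64% higher risk"
number_needed_to_harm = 1 / absolute_increase     # women on the "risky" diet per extra event

print(f"absolute increase:     {absolute_increase:.2%}")      # 0.09%
print(f"relative increase:     {relative_increase:.0%}")      # ~64%
print(f"number needed to harm: {number_needed_to_harm:.0f}")  # ~1100 women over ~15 years
```

A headline built on the relative figure sounds alarming; the absolute figure is the kind of difference a spectroscopist would shrug at.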

The final point I will make on this is related to the validity of diet as the important control parameter. The investigators seem to recognize that there are other variables that might also contribute to the observed results: age, level of education, height, body mass index, smoking, and exercise are all presented in the context of cardiovascular disease incidence. Differences in the cardiovascular disease rate are observed for ALL of these variables, yet those correlations are not considered in the final conclusions of the study, which relate only to macronutrient ratios. To a scientist with all of the data spread out in front of them, it would have been obvious to treat all of these variables as important, since differences in each of them induced differences in the outcomes to roughly similar degrees. Why then would you choose a single variable to harp on? Let’s think about this in the context of laboratory science. A graduate student collects data on the degree of metastasis in a mouse metastatic tumor model being used to test an immunotherapy. At that student’s lab meeting presentation, a small correlation between administration of the therapy and a decrease in metastasis is noted, causing some optimism on the part of the advisor. However, during the Q&A, it becomes clear that metastasis also has similar inverse correlations with tumor size and initial (pre-administration) tumor growth rate. The advisor rightly asks why those variables were not considered important, to which the student answers, “because that wasn’t the focus of the study”. The rest of the group cringes as the verbal flogging of the student commences.
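To see how easily a single-variable story can mislead, here is a purely synthetic sketch (Python with numpy and statsmodels; the variable names, effect sizes, and data are assumptions for illustration only, not figures from the study) in which a “diet effect” shows up in a crude analysis and collapses once a confounder is included:

```python
# Toy illustration (entirely synthetic data) of confounding: the
# "exposure" has no effect on the outcome, but appears to because a
# confounder drives both. All names and numbers are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000

smoking = rng.binomial(1, 0.3, n)                    # confounder
low_carb = rng.binomial(1, 0.2 + 0.3 * smoking, n)   # "exposure": smokers eat low-carb more often here
p_disease = 0.002 + 0.004 * smoking                  # risk depends on smoking only, not diet
disease = rng.binomial(1, p_disease, n)

# Crude model: diet alone appears to "predict" disease
crude = sm.Logit(disease, sm.add_constant(low_carb)).fit(disp=0)

# Adjusted model: once smoking is included, the apparent diet effect vanishes
X = sm.add_constant(np.column_stack([low_carb, smoking]))
adjusted = sm.Logit(disease, X).fit(disp=0)

print("crude diet odds ratio:   ", round(float(np.exp(crude.params[1])), 2))     # ~1.4
print("adjusted diet odds ratio:", round(float(np.exp(adjusted.params[1])), 2))  # ~1.0
```

The point is not that diet was confounded by smoking in the Swedish study; it is that when several variables track the outcome about equally well, a conclusion built on just one of them demands exactly the kind of self-critique described above.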

Studies like the Swedish one are the bread and butter of observational epidemiology. The general approach is to collect data on outcomes (diseases, for example) and “inputs” (diet, lifestyle, income, etc.) and then try to correlate the different bits of data. The blogosphere is full of outstanding, detailed dissections of such studies:

– Denise Minger on Ancel Keys and her excellent collection on The China Study

– Zoe Harcombe on this Archives of Internal Medicine article

– Gary Taubes on observational epidemiology … just to name a few.

The important thing to note here is that in so many of these cases, the authors of the observational studies draw conclusions that then become media headlines. Weak statistical correlations become life-and-death instances of causation when presented in the media. It is important to realize that the starkest difference between these observational disciplines and experimental science is the impossibility of identifying causation in the former. For example, the very plausible causative connection between smoking and lung cancer is the result of BOTH observational studies (lung cancer rates are higher in the smoking population) and experimental studies (cigarette smoke contains chemicals determined to be carcinogenic in a variety of in vitro and in vivo models). However, many other observational studies are not supported by corresponding experimental evidence. In those cases, it is critical for the investigators to remove bias from their analyses, something that can be difficult to do. We sometimes find that the authors of observational studies are quite comfortable letting the intent of their study drive their interpretations of the data and their resultant conclusions, instead of looking at the entire data set with an open mind.

It is interesting to note that experimental biomedical scientists face this challenge every day. When working on complex living systems, it is nearly impossible to control for every variable, especially since not all of the variables may be known, and the interconnections among variables are often obscured. Thus, better experimental design is needed, more self-critique is required, and analysis of the data from all angles is warranted. Hopefully, taking a step back and looking at the absurdity of the examples described above will recalibrate us as to the importance of being self-critical. Unfortunately, it is not uncommon to see flecks of personal bias coloring the conclusions of supposed “hard science” papers these days; the corresponding media attention can then unwittingly promote that shoddy work. As experimentalists, we are all going to make errors in our work – hopefully those errors arise from honest mistakes and not bias.

(edit – 7/17/12 – changed the Taubes link above to a much more relevant article – thanks to GT for the suggestion)

