Drawing conclusions from empirical science
A recent paper in PLOS One made some noise in my twittersphere over the Christmas days. It compares the productivity of writing scientific documents using Microsoft Word and using LaTeX, and concludes that Microsoft Word is so clearly superior that, in the interest of saving taxpayers’ money, scientific publishers should abandon LaTeX to allow authors to become more productive.
The noise in my twittersphere is about the technical shortcomings of the study, whose findings are in clear contradiction to the personal experience of everyone who has used both LaTeX and Microsoft Word in preparing real-life scientific articles for publication. This is well discussed in the comments on the paper. In short, the situations explored in the study are limited to the reproduction of a given piece of text with some typical “scientific” elements such as tables or formulas, but without the complexity of real-life documents: references, citations, revisions, collaborative editing, etc.
The topic of this post is a more fundamental problem illustrated by the study cited above, and which is shared by a large number of scientific explorations of much more important subjects, in particular concerning health and medicine. It is the problem of drawing practical conclusions from the results of a scientific study, such as the conclusion cited above that abandoning LaTeX would lead to significant savings in the field of scientific publishing. In the following, I will concentrate on this issue and leave aside everything else: let’s assume for a few minutes that published scientific studies are 100% reliable and described clearly enough that no misunderstandings or erroneous interpretations ever occur.
The feature that the Word vs. LaTeX study shares with much of modern research is that it is purely empirical. It starts from the question if science writers are more productive using Word or using LaTeX, taking into account a few obvious parameters such as prior experience with one or the other system. To answer that question, a specific experiment is designed, performed, and analyzed. Importantly, there is no underlying model that is used to interpret the results, which is what makes the model purely empirical.
Empirical studies are characteristic of relatively young domains of scientific exploration. It’s what every new field starts out with: the search for systematic relations between observable facts and quantities. As our understanding of some aspect of nature improves, we move on to the next level of scientific inquiry: the construction of models. A model makes assumptions about the mechanisms underlying the observed behavior, and allows the prediction of results that some not-yet-performed experiment should produce. The introduction of models is an enormous boost to the power and efficiency of scientific research. First of all, predictions can be tested, and therefore the models can be tested. Of course, an isolated hypothesis (“Word makes scientists more productive than LaTeX”) can also be tested, but a model produces a whole family of related hypotheses that can be tested as a whole. In particular, one can search for corner cases that may be untypical from a real-world point of view, but provide a particularly precise way to test a model. Second, a model allows scientists to develop an intuitive understanding of the phenomena they are looking at, which again makes their work more efficient and more reliable. But perhaps most importantly, a model that has been exposed to several rounds of serious testing comes with a list of scenarios in which it works or doesn’t work, which is a very important element in generating trust in its predictions.
As an example of a successful model, consider Newtonian mechanics as taught in high-school physics classes. It has been around for a few centuries, and its strengths and limitations are well known. Contrary to what people believed initially, it is not universally true. It breaks down for objects moving at extremely high speed, and for objects of atomic size. But it works very well for many practically relevant situations. Thanks to this and other well-tested models, engineers and architects can design engines and buildings that work as expected.
In contrast, purely empirical science provides only provisional answers to the questions asked, because it is impossible to know, or even test, that all relevant aspects of the situation have been taken into account. In the Word vs. LaTeX study, prior knowledge of either system was taken into account as a parameter, but many other factors weren’t. It is conceivable, for example, that a person’s native language may make them “better tuned” to one or the other system. Or their work experience, or their education. And why not genetic factors or dietary habits – this sounds far-fetched, but it can’t be excluded. As long as there is no model explaining where productivity differences come from, it is not even clear what one would have to study in order to improve our understanding of the situation.
This uncertainty stemming from the existence of many unexplored potential factors makes it very risky to draw practical conclusions from purely empirical studies, no matter how well they were designed and executed. And this is a very real problem in many aspects of today’s life. Suppose you are determined to adopt the “healthiest” dietary regime possible, and turn to the scientific literature for guidance. You will find a bewildering collection of partially contradicting findings. Does eating eggs expose you to a higher risk of cardiovascular diseases? Do oranges protect you against the flu? You will find studies that claim to provide the answers to such questions, but they are purely empirical and based on a small number of observations. They may even be based on experiments on mice that were extrapolated to humans. And they definitely have not explored all imaginable aspects of the question. What it vitamin C is beneficial to everyone except people with some rare blood group? What if a specific gene variant decides how your body reacts to high sugar intake? Most probably no one has ever looked into these possibilities. Not to mention the much more fundamental question if a “healthiest” diet exists at all. Perhaps the best you can do is choose between a higher risk of a stroke and a higher risk of cancer.
To end with some practical advice: the next time you see some recommendation made on a “scientific basis”, check what that basis is. If it’s a single recent study, it’s safe to assume that the recommendation is premature. But even if it’s a larger body of scientific evidence, check if there is a model behind it, and if it has been tested. If it isn’t, be prepared to get a contradictory recommendation in a few years.