A critical view of altmetrics

Altmetrics is one of the hotly debated topics in the Open Science movement today. In summary, the idea is that traditional bibliometric measures (citation counts, impact factors, h-indices, …) are too limited because they miss all the scientific activity that happens outside of the traditional journals. That includes the production of scientific contributions that are not traditional papers (e.g. datasets, software, blog posts) and the references to scientific contributions that are not in the citation list of a traditional paper (blogs, social networks, etc.). Note that the altmetrics manifesto describes altmetrics as a tool to help scientists find publications worth reading. I find it hard to believe that its authors have not thought of applications in the evaluation of researchers and institutions, which will inevitably happen if altmetrics ever takes off.

At first sight, altmetrics appear as an evident “update” to traditional bibliometry. It sounds pretty obvious that, as scientific communication moves on to new media and finds new forms of expression, bibliometry should adapt. On the other hand, bibliometry is considered a more or less necessary evil by most scientists. Many deplore today’s “publish or perish” culture and correctly observe that it is harmful to science in the long term, giving more importance to the marketing of research studies than to their careful design and meticulous execution. I haven’t yet seen any discussion of this aspect in the context of altmetrics, so I’d like to start such a discussion with this post.

First of all, why is bibliometry so popular, and why is it harmful in the long run? Second, how will this change if and when altmetrics are adopted by the scientific community?

Bibliometry provides measures of scientific activity that have two important advantages: they are objective, based on data that anyone can check in principle, and they can be evaluated by anyone, even by a computer, without any need to understand the contents of scientific papers. On the downside, those measures can only indirectly represent scientific quality precisely because they ignore the contents. Bibliometry makes the fundamental assumption that the way specific articles are received by the scientific community can be used as a proxy for quality. That assumption is, of course, wrong, and that’s how bibliometry ultimately harms the progress of science.

The techniques that people use to improve their bibliometric scores without contributing to scientific progress are well known: dilution of content (more articles with less content per article), dilution of authorship (agreements between scientists to add each other’s names to their works), marketing campaigns for getting more citations, and application of a single technique to lots of very similar problems even if that adds no insight whatsoever. Altmetrics will cause the same techniques to be applied to datasets and software. For example, I expect scientific software developers to take Open Source libraries and re-publish them with small modifications under a new name, in order to have their name attached to them. Unless we come up with better techniques for software installation and deployment, this will probably make the management of scientific software a bit more complicated because we will have to deal with lots of small libraries. That’s a technical problem that can and should be solved with a technical solution.

However, these most direct and most discussed negative consequences of bibliometry are not the only ones and perhaps not the worst. The replacement of expert judgement by majority vote, which is the basis of bibliometry, also in its altmetrics incarnation, leads to a phenomenon which I will call “scientific bubbles” in analogy to market bubbles in economics. A market bubble occurs if the price of a good is determined not by the people who buy it to satisfy some need, but by traders and speculators who try to estimate the future price of the good and make a profit from a rise or fall relative to the current price. In science, the “client” whose “need” is fulfilled by a scientific study is mainly future science, plus, in the case of applied research, engineering and product development. The role of traders and speculators is taken by referees and journal editors. A scientific bubble is a fashionable topic that many people work on not because of its scientific interest but because of the chance it provides to get a highly visible publication. Like market bubbles, scientific bubbles eventually explode when people realize that the once fashionable topic was a dead end. But before exploding, a bubble has wasted much money and intellectual energy. It may also have blocked alternative and ultimately more fruitful research projects that were refused funding because they were in contradiction with the dominating fashionable point of view.

My prediction is that altmetrics will make bubbles more numerous and more severe. One reason is the wider basis of sources from which references are counted. In today’s citation-based bibliometry, citations come from articles that went through some journal’s peer-reviewing process. No matter how imperfect peer review is, it does sort out most of the unfounded and obviously wrong contributions. To get a paper published in a journal whose citations count, you need a minimum of scientific competence. In contrast, anyone can publish an opinion on Twitter or Facebook. Since for any given topic the number of experts is much smaller than the number of people with just some interest, a wider basis for judgement automatically means less competence on average. As a consequence, high altmetrics scores are best obtained by writing articles that appeal to the masses who can understand what the work is about but not judge if it is well-founded. Another reason why altmetrics will contribute to bubbles is the positive feedback loop created by people reading and citing publications because they are already widely read and cited. That effect is dampened in traditional bibliometry because of the slowness of the publishing and citation mechanism.
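
The feedback loop can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not taken from any real altmetrics data: all papers are of identical quality, and each new reader picks a paper with a probability proportional to the number of reads it already has (plus one). The function name `simulate_reads` and all parameters are hypothetical.

```python
import random


def simulate_reads(n_papers, n_readers, feedback, seed=0):
    """Distribute readers over papers of identical quality.

    With feedback=False every reader picks a paper uniformly at random.
    With feedback=True the chance of picking a paper grows with the
    number of reads it already has (a simple "rich get richer" rule).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    reads = [0] * n_papers
    for _ in range(n_readers):
        if feedback:
            # Assumption: attention is proportional to (current reads + 1).
            weights = [count + 1 for count in reads]
        else:
            weights = [1] * n_papers
        chosen = rng.choices(range(n_papers), weights=weights, k=1)[0]
        reads[chosen] += 1
    return reads


independent = simulate_reads(10, 5000, feedback=False)
with_feedback = simulate_reads(10, 5000, feedback=True)
print("independent readers:", sorted(independent, reverse=True))
print("with feedback loop: ", sorted(with_feedback, reverse=True))
```

In this model the read counts without feedback stay close to each other, whereas with feedback a few papers typically end up with a much larger share, although nothing distinguishes them in content. Whether real readers behave like this is of course an open question.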

My main argument ends here, but I will try to anticipate some criticisms and reply to them immediately.

One objection I expect is that the analysis of citation graphs can be used to assign a kind of reputation to each source and weight references by this reputation. That is the principle of Google’s famous PageRank algorithm. However, any analysis of the citation graph suffers from the same fundamental problem as bibliometry itself: a method that only looks at relations between publications but not at their contents can’t distinguish a gem from a shiny bubble. There will be reputation bubbles just like there are topic bubbles. No purely quantitative analysis can ever make a statement about quality. The situation is similar to mathematical formalisms, with citation graph analysis taking the role of formal proof and scientific quality the role of truth in Gödel’s incompleteness theorem.
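
To make the point concrete, here is a minimal sketch of a PageRank-style computation by power iteration. It is a simplified textbook variant, not Google’s actual implementation, and the citation graph is entirely made up for illustration. Note that the only input is the graph of who cites whom; the contents of the publications never enter the computation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each node to the list of nodes it cites.
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small base score independent of citations.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node passes its score on to the nodes it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node citing nothing spreads its score over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: three sources citing each other in a ring,
# two followers citing the ring, and one careful study nobody has cited.
citations = {
    "ring_a": ["ring_b"],
    "ring_b": ["ring_c"],
    "ring_c": ["ring_a"],
    "fan_1": ["ring_a"],
    "fan_2": ["ring_a"],
    "gem": [],
}
scores = pagerank(citations)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {score:.3f}")
```

In this made-up graph, the mutually citing ring ends up with the highest scores and the uncited study with one of the lowest, whatever the quality of the work behind each node.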

Another likely criticism is that the concept of the scientific bubble is dubious. Many paths of scientific exploration have turned out to be failures, but no one could possibly have predicted this in the beginning. In fact, many ultimately successful strategies have initially been criticized as hopeless. Moreover, exploration of a wrong path can still lead to scientific progress, once the mistake has been understood. How can one distinguish promising but ultimately wrong ideas from bubbles? The borderline is indeed fuzzy, but that doesn’t mean that the concept of a bubble is useless. It’s the same for market bubbles, which exist but are less severe when a good is traded both for consumption and for speculation. My point is that the bubble phenomenon exists and is detrimental to scientific progress.

  Sibele Says:

    But just as in economics, where market bubbles are forged by those who control the means of financial speculation, “scientific bubbles” in traditional bibliometrics can be (and in fact are) forged by those who control the bibliometric indices, usually large publishers who control the indexing of scientific journals and indices such as the Impact Factor.

    If altmetrics favor scientific bubbles more, at least they do so in a freer way, without the ties imposed by large economic groups linked to the scientific publishing industry. And here the opinions of “anyone”, whether on Twitter, Facebook, or even blogs (which I see as free and legitimate forms of expressing opinions; you must agree, since you have this excellent blog of yours to make your own opinions public…) do have their importance, even if they are statements and opinions of non-specialists. Or should science stay in its ivory tower, far from the public, discussed only among distinguished peers? Given that it is the public that funds a good part of the research, most of it paid for with public money?

    Moreover, this greater freedom that altmetrics provide, by making public interest in science visible, favors another aspect that I see as very positive: these alternative metrics allow scientific output that has always been marginalized (mainly that coming from emerging countries and countries peripheral to the “scientific mainstream”, but which is interesting and necessary in its context) to finally be highlighted, gaining its due relevance (J. C. Alperin discussed this issue here: http://asis.org/Bulletin/Apr-13/AprMay13_Alperin.html).

    But congratulations on the post and on keeping the discussion about altmetrics alive. The more discussion, the better! :)

    khinsen Says:

      Opening science to a wider audience and listening to their feedback is all fine with me. A separate metric measuring public interest in some field of research might indeed even be useful. The problem is the use of a metric in evaluation and funding decisions. It’s great if science is accessible, understandable, and popular, but the most basic criteria useful science has to satisfy are to be reliable, verifiable, and innovative. I doubt that the general public can make useful comments on these aspects.

  Sibele Says:

    Ok, these are the basic criteria to determine useful science. But what determines that science is reliable, verifiable, and innovative? Traditional citations? It seems not, judging by this joint report by the International Mathematical Union (IMU), the International Council of Industrial and Applied Mathematics (ICIAM) and the Institute of Mathematical Statistics (IMS), which seriously criticizes evaluating the quality of science through citations: http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.ss/1255009002&page=record.

    And so? I have doubts too.

    khinsen Says:

      Traditional citations are not the right tool either, as I already said in my post. But whereas altmetrics improves on traditional bibliometry in some respects, I believe it is worse in others.

  Sibele Says:

    These many gaps only show that science evaluation really is a field with great potential for study, because it still needs to improve a lot. But I think we have already come a long way when we see that certain misleading measures once considered important, such as the IF, which was uncontested for decades, are finally becoming irrelevant, and that other alternatives are beginning to emerge, though still in need of more theoretical grounding.

    Another measure that promises to be a good parameter for evaluation is the social impact of science, not restricted only to public interest, but covering the actual benefits that science returns to society (see Bornmann & Marx: http://onlinelibrary.wiley.com/doi/10.1002/asi.22803/abstract). But this issue also has its caveats, related to basic research that does not present an immediate or short- to medium-term return, despite being highly necessary for future development.

  khinsen Says:

    Much work has indeed been done on evaluating science, but I haven’t seen much evaluation of all of these science evaluation methods. It isn’t even clear what the ideal criteria of evaluation should be.

    For me it is obvious that the most important criterion is correctness, for which I haven’t seen any attempt at systematic evaluation. We work with the tacit hypothesis that peer review weeds out faulty work. That hasn’t worked as well as we believed, but at least peer review provides a filter. I don’t see any quality filter in altmetrics, for example. A study that hasn’t been checked and/or replicated by someone competent shouldn’t even be eligible for any other evaluation criterion.

    An interesting idea is the use of digital badges, see http://software-carpentry.org/blog/2013/05/translucent-badges.html for the details. This shifts the problem to “who hands out the badges and how”, but at least it is clear that badges are personal and about competence and experience.

