Fact or Fiction? Hyphens in an article title negatively impact citation counts
A recent article in the journal IEEE Transactions on Software Engineering by Zhi Quan Zhou, T.H. Tse, and Matt Witheridge, doi: 10.1109/TSE.2019.2915065, has drawn attention due to its claim that the number of hyphens in article titles negatively impacts the citation counts of such articles in abstract and citation databases such as Scopus and Web of Science.
Hyphens do not play a role in Scopus reference linking. Scopus matching algorithms (e.g., those that match references to articles) cope with many kinds of punctuation variation, because punctuation in an article title is where most discrepancies occur when the article is cited. Missed reference matches are continually analyzed by the Scopus team, and the occurrence of hyphens in article titles is not a reason for missing citation links. We therefore reject the claim made in the article.
In addition, we observe:
- The dataset used in the article is very small: it contains 140,000 articles, which is 0.19% of Scopus.
- The dataset consists of the 20,000 most cited articles, at the time of collection, for each year between 2007 and 2013 (seven years, hence 140,000 articles). Hyphens in titles are actually more prevalent in this dataset than in Scopus overall. Because these are top-cited articles, this contradicts the claim that hyphens negatively impact citation counts.
- The Scopus reference matching algorithm draws on a variety of details from an article’s metadata and from the metadata in references pointing to that article. Critical data points include (1) the name of the first author, and of the second author when present; (2) volume, issue, page number, and article number; (3) publication year; (4) journal name; (5) other metadata such as the DOI.
- The algorithm takes the article title itself into account only as an extra confirmation step, in case a match has not been made via the aforementioned metadata fields. For this extra confirmation step, text is ‘flattened’, i.e., hyphenation and punctuation are ignored.
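
To illustrate what ‘flattening’ a title can mean in practice, here is a minimal Python sketch. It is not the Scopus implementation, whose details are not public in this text. The function names `flatten_title` and `titles_agree` are hypothetical, and the choices to lowercase the text and to drop whitespace along with punctuation are assumptions made for the example.

```python
import unicodedata


def flatten_title(title: str) -> str:
    """Reduce a title to its lowercase letters and digits only.

    Illustrative assumption: hyphens, other punctuation, and whitespace
    are all dropped, so spelling variants of the same title coincide.
    """
    # Normalize Unicode so compatibility variants of characters compare equal.
    text = unicodedata.normalize("NFKC", title).lower()
    # Keep only alphanumeric characters; hyphens and dashes are discarded.
    return "".join(ch for ch in text if ch.isalnum())


def titles_agree(cited_title: str, article_title: str) -> bool:
    """Confirmation check: do two titles match once both are flattened?"""
    return flatten_title(cited_title) == flatten_title(article_title)


# A hyphenated title and an unhyphenated citation of it flatten identically.
print(titles_agree("A State-of-the-Art Review", "A state of the art review."))
```

Under these assumptions, a reference that omits, adds, or alters hyphens in the title still agrees with the article title, which is the behavior the last bullet describes.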