Wikipedia:Spot checking sources
This is an essay. It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
Verifiability is one of the five pillars of Wikipedia. In practice, this means supporting statements with inline citations to reliable sources. Various processes on Wikipedia (DYK, GA, and FA) require editors to check citations, to ensure that the cited material actually supports the statements cited to it and that the article text does not plagiarize the source. These checks are called spot checks.
Spot-checks are often used as a compromise between checking every citation (best quality, but prohibitively expensive) and doing no checking at all (easiest, but an unacceptable level of quality control). This type of statistical sampling is ubiquitous in industry, and there is no shortage of scholarship exploring how to perform the sampling to achieve a desired confidence level. Such treatment is outside of the scope of this essay, which instead gives some broad suggestions for how spot checks can be performed.
Things to consider before you start
Spot checking is one of the most time-intensive parts of a review, often second only to reading the article in detail, even if very few sources are checked. Time-consuming steps include locating the source, locating the cited passage within the source, and comparing the passage with the article text. In some cases, your spot check may become irrelevant in light of other issues and never be acted on, especially at WP:GAN. Below is some advice to minimize frustration due to wasted effort:
- Start with a check against the quickfail criteria (WP:QF). Importantly, remember to check for plagiarism (use Earwig's Copyvio Detector), but be aware that if the source itself is under a free licence, the content may legitimately have been copied, and no quick-fail is warranted for this reason. Check whether there is an earlier GAN review with unaddressed comments. Also do a first quick check against the other four GAN criteria to make sure that you do not have to quick-fail the article for other reasons after you have already completed the sources review.
- Do the easier bits of the sources review first. Look at the references list to check for any unreliable sources (obvious examples include personal websites, YouTube videos, and self-references to Wikipedia). Check if the article has any passages that lack inline citations (be aware, though, that unreferenced paragraphs can easily be created accidentally by adding line-breaks, in which case they are quick to fix and do not necessarily indicate deeper problems).
- Do the spot-checks before reading the article in greater detail, especially when it is a long article. It can be highly frustrating for both reviewer and nominator if a nomination has to be failed because of the sources review after much effort has already gone into a prose review.
How to access the sources
Communicate with the nominator
Locating the full text of sources can take a lot of time. In many cases, the nominator will be happy to explain how some of the sources can be accessed, or to send you copies of the sources you want to see for your spot check via wikimail (e.g., photographs of the relevant pages of a book). While it is usually worthwhile to ask politely, note that not all nominators can or want to send sources to other editors, and such a decision has to be accepted. Also consider that the nominator did not necessarily insert a particular source themselves, and consequently cannot be expected to have that source available. At WP:GAN, it is even possible to nominate an article without having made significant prior contributions to it. There are many other reasons why a nominator might decline to send a source: for example, because doing so would disclose their email address, because they don't have an email address, because they no longer have access to the sources or lack the knowledge to digitize them, or because the relevant pages contain personal notes. In all cases, reviewers should assume good faith.
Article authors may also balk at sending reviewers copies of reference material because they are concerned that doing so may violate copyright. While this is a laudable concern, copyright laws in most countries treat copies made for criticism, research, scholarship, or review as fair use. However, make sure that no source material is made publicly accessible (e.g., uploaded to Wikipedia or Wikimedia Commons directly), even if only temporarily, as this would likely be a copyright infringement.
Find the sources yourself
You might also prefer to locate sources yourself so that you do not have to wait for the nominator's response and can finish a review in one go. Since you are only spot-checking a very limited number of sources, it might be possible to simply avoid those sources that are difficult to access. Academic search services like Google Scholar often highlight full-text PDFs of scholarly sources where they are available (simply copy and paste the source title). Note that although publicly available full texts are often linked in the Wikipedia article directly, this is not always the case or even possible. For example, at WP:FAC, it is discouraged to link to papers uploaded to the social network ResearchGate because of the risk that they were uploaded without the consent of the copyright holders.
Many editors perform research through The Wikipedia Library. If you meet the activity threshold, you may be able to retrieve sources at no cost. Wikipedia is unable to link directly to the Wikipedia Library's hosting service in articles or talk pages, so the nominator may have replaced the link with one hosted on a different online repository. Many papers do not show up in the regular search, and it may be necessary to open the collection (e.g., "ScienceDirect") that includes the required journal to get the paper. For example, the journal Science is included in the collection "JSTOR". Some online tools, such as Zotero, have automatic proxy features which can redirect websites through the Wikipedia Library and give access.
For offline sources, many print books are available for free at the Internet Archive or via HathiTrust. Phrases can be cross-referenced with a service like Google Books to validate the passage's material. The Unpaywall browser extension may be able to find freely accessible versions of materials which are behind paywalls. Finally, requests for sources can be made at Wikipedia's Resource Request.
Choosing claims to spot check
There are various possible strategies for choosing claims to check, with different advantages and disadvantages.
- Check the most accessible sources. The easiest solution is just to check those sources which are freely available online. The disadvantage to this strategy is that it skews which sources get spot-checked towards those where readers are most likely to find issues naturally.
- Check the most difficult to access sources. If you have access to an obscure source, or are able to check sources in a foreign language, it might be a good idea to prioritise these as the ones other people are least likely to check.
- Choose claims at random. For instance, if there are 10 numbered footnotes in an article, use a random number generator to pick three numbers from 1 to 10 and check the corresponding sources. This helps ensure that you get a good spread of different sources, and stops you from lazily checking just the ones which are easiest.
- Check the most important sources. If an article is primarily based on a single source, prioritise spot checking that source. For example, when our article on Alice Kober was promoted to GA ([1]), 34 of the 41 footnotes referenced the book The Riddle of the Labyrinth. As the article is so heavily dependent on that source, it is arguably the most important one to check for correct use. If you can get hold of it, you can then check a lot of the claims with relatively little effort compared to having to search around lots of different sources.
- Check the most extraordinary claims. As a reviewer, you might read something in an article and think that it doesn't sound right. Maybe it doesn't mesh with what you already know about the subject. Maybe it just sounds implausible. If you think a fact is wrong, or sounds unlikely, it's probably worth checking to see if the source supports it.
- Check the claims which would have the worst impact if they were wrong. For example, contentious or negative statements in a WP:BLP.
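The random strategy in the list above can be sketched in a few lines of Python. This is only an illustration, not an established tool: the function name `pick_footnotes` and its `seed` parameter are made up for this example, and it assumes the footnotes are numbered consecutively from 1.

```python
import random


def pick_footnotes(total, sample_size, seed=None):
    """Return a sorted list of distinct footnote numbers chosen at random.

    Hypothetical helper for illustration; assumes footnotes are
    numbered 1..total with no gaps.
    """
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and total")
    # A fixed seed makes the draw repeatable, so the same selection
    # can be reproduced later; seed=None gives a fresh draw each time.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no footnote repeats.
    return sorted(rng.sample(range(1, total + 1), sample_size))


# Pick three of ten footnotes, as in the example above.
print(pick_footnotes(10, 3))
```

Drawing without replacement matters here: picking three independent random numbers could select the same footnote twice and leave you with fewer distinct sources than intended.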