User:Dicklyon/Why I care about over-capitalization
This is an in-process personal essay in user space. Reactions are invited on the corresponding talk page.
tl;dr: Wikipedia is unreasonably effective at influencing usage by outside authors. Over-capitalization in Wikipedia leads to increasing capitalization in sources, and I see this as harmful to the idea that capitalization is primarily for proper names. I'm not trying to right great wrongs, but to ameliorate an ongoing small degradation of our language. And I'm trying to be transparent about my motivation and approach.
Why our P&Gs say what they do about capitalization
The decision to use sentence-case, rather than title-case, article titles in Wikipedia was originally made for technical and convenience reasons. It was decided (by whom?) that wikilinks would be case-sensitive except for the first letter. This meant that if article titles were in title case, then links to them from sentences, which lack title-case capitalization, would not work unless sentence-case redirects were added for all multi-word titles. To avoid that need, it was decided that article titles should be in sentence case. This was represented in WP:NCCAPS in 2002 as: "Unless the term you wish to create a page for is a proper noun, do not capitalize second and subsequent words. ... one should leave the second and subsequent words in lowercase unless the title phrase would always occur capitalized, even in the middle of a sentence" (with "always" italicized).
So the test for capitalization naturally became "is it a proper noun?" (or proper name).
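As a rough illustration of the link-matching rule described above, here is a minimal Python sketch. It assumes MediaWiki's default behavior of uppercasing only the first character of a link target and comparing the rest of the title exactly. The function names are hypothetical, and real MediaWiki title normalization does more than this (for example, treating underscores as spaces and handling namespace prefixes).

```python
def normalize_title(link_target: str) -> str:
    """Hypothetical sketch: uppercase only the first character,
    keep the rest of the title exactly as typed (case-sensitive)."""
    if not link_target:
        return link_target
    return link_target[0].upper() + link_target[1:]


def link_resolves(link_target: str, existing_titles: set) -> bool:
    """True if a wikilink typed as link_target would reach an existing
    title, assuming no redirects exist."""
    return normalize_title(link_target) in existing_titles


# Sentence-case title: a lowercase link in running text resolves.
print(link_resolves("banana split", {"Banana split"}))  # True
# Title-case title: the same lowercase link would be a redlink.
print(link_resolves("banana split", {"Banana Split"}))  # False
```

With a sentence-case title, a link written in running text matches without any redirect; with a title-case title, every multi-word title would need a lowercase redirect, which is the burden sentence-case titles were meant to avoid.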
When MOS:CAPS was split off as its own page in Feb 2006, it was still pretty much a collection of random advice, but with the principle "Initial capitals and all capitals should not be used for emphasis" (this is about article text more than titles). In December 2007, the MOS:CAPS lead was updated to where it's closer to the modern one, with "Wikipedia follows a conservative usage style for capitalization (unnecessary capitalization is avoided). The main use of capitalization is for proper names, acronyms, and initialisms."
By the end of 2011, we had been through a lot of turmoil at MOS, especially around PMAnderson's "follow the sources" mantra that tried to diminish the idea that Wikipedia should respect its own style. See particularly the last section at the talk page of 17 December 2011, where we discussed and accepted what's pretty close to the modern source-based consistency principle:
- "Wikipedia relies on usage in sources to determine what is a proper noun; words and phrases that are consistently capitalized in sources are treated as proper nouns and capitalized in Wikipedia."
which was derived from User:DGG's thoughtful suggestion from a few months earlier (see it at that same talk page version):
- "I think the best rule is that something has to be consistently referred to as a proper name by third party discriminating sources, for us to do so. That most of the available sources may use capitals is not conclusive, because most of them are likely to just copy the press release. Capitals in this context are promotional. I rewrite quite a number of articles of products and the like, and I start off by changing most of the mentions of full names to "it" or the equivalent, and then changing the capitals to lower case. Those two steps alone are often enough to make something look like an encyclopedic article." — User:DGG
(Sadly, DGG is no longer with us.)
Alternative views of what should be capitalized in Wikipedia
We occasionally see editors who know about our title policy yet object to using sentence-case titles.
And there are a huge number of editors who just never noticed that we have such a policy at all. For example, arguing in an RM discussion: "We don't capitalize 'civilization' everywhere. We do, ideally, only in the page title or section titles." Editors ignorant of the policy provide a constant stream of article titles to fix, but are not much of a real problem as long as there's enough style gnoming going on to keep up. But sometimes people disagree on whether the name of a topic is a proper name, or have other divergent ideas about where capitalization is needed.
User:DGG's view that "something has to be consistently referred to as a proper name by third party discriminating sources, for us to do so" has been MOS:CAPS' core principle for deciding proper name status for many years now. While it has generally received wide support in RM and RFC discussions, and is reasonably well aligned with the "always" criterion in the title policy, it is not the only approach that editors like to use to decide what is (or should be treated as) a proper name.
The most common alternative is what we might call, following DGG, the "Press Release Style", in which the phrases describing the topic being discussed are capitalized for emphasis or promotion, or for an "official" look. This may be OK for names of products that are consistently capitalized in sources, but for phrases that are also commonly used in lowercase to refer to the same entities, as with military equipment such as M6 bomb service truck and M40 gun motor carriage, it conflicts with the way we usually decide.
Others resort to simple dictionary definitions (e.g. "A proper noun is a noun that serves as the name for a specific place, person, or thing.") or theoretical grammatical concepts (e.g. as described in Proper name (philosophy)) to decide what's a proper name. By this logic they can typically claim that any descriptive phrase that is specific enough to denote a particular entity is a proper name; or they can argue that if the name is descriptive, it's not a proper name. Debates over this are hard to settle in any objective way.
Examples of arguments of these sorts, which may sound plausible but don't explain why Wikipedia would cap something that sources mostly do not:
- "The Syrian Revolution is proper name as it refers to a specific revolution in Syria and not a general phenomenon of revolution in Syria."
- "it is the Galactic Center (capital G capital C) because it is the centre of a specific galaxy, namely the Milky Way, so it is a proper noun."
- "Revolutions become proper names once long-established and in the light of history."
- "It is a proper noun, the name of a specific tree."
In addition, some topic areas managed to get special exceptions into MOS:CAPS many years ago. In the old days (from 2007 at least), official common names of bird species were capitalized, but that was removed some years ago after long discussion arrived at a consensus to not follow that style. Also from the beginning there has been an exception for astronomical names (modern version at MOS:CELESTIALBODIES), which results in capitalization in Halley's Comet ([1]) and Andromeda Galaxy ([2]), even though "comet" and "galaxy" are only about half capped in sources about those, and Solar System ([3]) (since 2008) and other such terms even when their capitalization in sources is very much in the minority. The International Astronomical Union's style guide has been cited as a reason, but explicit reliance on that has not been accepted in the MOS.
Also note that MOS:MILTERMS (and its fork WP:MILCAPS) seems to be an exception in how proper name status is decided for terms related to military history and military equipment. There is not much agreement on whether it is actually specifying exceptions, versus clarifications, to the usual criteria of MOS:CAPS, but most related RM discussions have closed with a consensus to lowercase even when some milhist editors interpreted it as saying to capitalize (see User:Dicklyon/MIL precedents).
Wikipedia's unreasonable effectiveness
I have found a good number of topics on Wikipedia where the relative rate of capitalization of the topic name in sources turned upward shortly after Wikipedia created an article with the title capitalized. It's hard to do an unbiased statistical study, but it certainly appears that Wikipedia's treatment of a topic name as (or as if) a proper name influences authors of books to also treat it as a proper name (and then, to complete the circularity, that usage trend gets some WP editors to insist on treating it as a proper name, even when we note that it's still not consistently capitalized in sources). This is a form of "citogenesis", and it matters when Wikipedia has a capitalization standard that is grounded in capitalization rates in sources.
A few examples:
- National Signing Day most often appeared in books as "national signing day" up until a couple of years after our article was created with title case in 2004. The book n-gram stats show a strong trend toward increasing capitalization over the following decade, probably (but not provably) triggered by our choice.
- The ACC Championship Game started off (in 2006) with lead sentence "The Atlantic Coast Conference, or ACC, hosts a conference championship game in football each year." It was soon changed, copying the over-capitalized title into the text, to "The ACC Championship Game is a football game held by the Atlantic Coast Conference each year." Over the following years, capitalization in books moved from mostly lowercase to mostly capped; book n-gram stats show a takeoff in capping starting around 2010.
- The NFL Combine article was created in 2005, when sources capped combine only about half the time. By 5 years later, and since, it's close to 2/3 capped.
- The Apple Campus article, created in 2010, capped campus, as if it were part of a proper name, before any book sources did, according to book n-gram stats; then the caps shot up. The words "Apple Campus" had been on the sign in front of their buildings where I used to work at One Infinite Loop, Cupertino, for a long time before that, but words on signs of course don't signal proper name the way a Wikipedia article title does.
- The M1918 Browning Automatic Rifle article, created in late 2003, started a trend to use that exact title in books, increasing the capitalization of "Browning Automatic Rifle", which had been running about equal to "Browning automatic rifle" before then.
- The Cambodian Civil War article, created in 2003, didn't get the title into the lead until 2006, at which time it was still mostly "Cambodian civil war" in book sources. By a decade later, it was mostly capped.
- The North Yemen Civil War article, moved from Yemen Civil War in 2006, resulted in a lot of capitalization where before 2006 there was none; the use of "North" was relatively rare before WP added it to the title. For the form without "North", there was a big uptick in "Yemen Civil War" in 2004 due to this book, before Wikipedia made that capped article in 2005; the capping continued afterward (likely partly influenced by the book and partly by WP).
- NBA Eastern Conference finals and NBA Western Conference finals (in section titles) had capitalized "Finals" from 2005, and capping in sources increased a lot from about that time. This was fixed in May 2024, broken again in June 2025.
- Our Seven Days Battles article has been over-capitalized since 2005. Book stats show increasing capitalization over the years since then.
- The Indus Valley Civilisation article has used capped "Civilisation" since 2001, even though sources were more often lowercase; recently sources are more often capitalizing. We fixed that one in 2022, briefly, but then the close with consensus to lowercase was overturned at move review.
My approach
Titles first
Over-capitalization is common both in article titles and in article text, so I work on fixing both. The presumption is generally that a topic-naming phrase will appear in text the same way it appears in titles, except maybe for the first letter, which would be lowercase in text when it's not a proper name (e.g. banana split, not Banana split). Edits to text will last better if they agree with titles, so titles get priority for fixing.
In many cases, over-capitalized titles are obvious errors, and fixing them is obviously not controversial, so a WP:BOLD move is in order, with no discussion necessary. Even articles that have been around a long time with over-capitalized titles, unnoticed by gnomes for years, are typically fixable this way with no reaction, no controversy. More on this below.
Controversial cases
When a move is potentially controversial, e.g. because it's capitalized often enough in sources that some editors might argue it's a proper name, then a WP:RM discussion is in order. If there's a cluster of similar article titles needing a similar fix, a multi-RM is more appropriate than many separate ones. For example, the multi-RM discussion at Talk:Minsk district#Requested move 25 December 2024 resulted in unanimous consensus to lowercase 116 titles; it drew no opposition even though it was advertised at 116 articles and at a project's article alerts, at Wikipedia:WikiProject Belarus#Move discusions, and even though the over-capitalized titles were over 15 years old.
Sometimes a single or multi RM is "hotly contested", or has about as many opposing as supporting responses. I typically try to clarify the evidence from sources, and remind opposers of the guideline statements, in the hope that either they'll change their minds or the closer will judge that the supporters have better arguments, better rooted in guidelines and data. For example, see the discussion at Talk:North Yemen civil war#Requested move 28 November 2024, in which the closer, sadly, simply said "moved" instead of explaining how he came to that as consensus in light of the opposing votes numbering about the same as the supports. There was a lot of back-and-forth about the interpretation of book n-grams and scholar articles, so it was a fairly contentious discussion. Usually everyone stays polite and civil and focused on their different interpretations, as in that one. Sometimes it gets worse; I try to keep it calm.
As can be reviewed at Wikipedia talk:Manual of Style/Capital letters/Concluded RM archive, we had about 600 RM discussions over a 4-year period, with the great majority of those closing with a consensus to use lowercase, in many cases establishing clear patterns that seem to clarify how editors interpret MOS:CAPS, and that should lead to a reduction in uncertainty and contention in similar issues in the future. But it doesn't always work that way. A few editors cling to approaches that have been repeatedly shot down in RM discussions – and they sometimes get their way, or prevent getting to a consensus. So discussions/arguments continue.
Bold moves
Besides those discussions, a fair fraction of which I originated, I have also been doing on the order of a thousand WP:BOLD (aka unilateral or undiscussed) moves per year. Sometimes I try to do a move that I think is obvious and uncontroversial, and it doesn't work because there's already a redirect that has been edited at the target title. So I look at those to see the history of the redirect and whether there has been any case discussion or previous moves. Most often, the redirect was created long ago by a user who wanted to use the topic phrase in a sentence, but got a redlink because the use in the sentence didn't match the over-capitalized title; instead of fixing it by moving the page, he simply added a lowercase redirect. Then a user or bot came along and tagged it as "Redirect from other capitalization". I typically fix these via a request at WP:RMTR (requested move, technical request). If nobody objects and the functionaries working that page don't see an issue with it, it gets moved, so it is essentially equivalent to a WP:BOLD undiscussed move.
Bold moves that are questioned
Now and then, a bold or technical move will be objected to, either explicitly, by asking me or WP:RMTR to have the move reverted, or implicitly, by asking me why I did the move. In the latter case, a quick explanation and pointer to MOS:CAPS is often enough to satisfy the inquiry and there's no move back; or we work out a mutually satisfactory alternative, as we did here. When a move is reverted, I typically do an RM to see if we can reach consensus. Most often, the consensus does support the move that I thought was uncontroversial (not uncommonly, the RM is even unopposed), but sometimes discussion proves how wrong I was about that.
Sometimes, even if my move is not questioned, I'll reach out to the person who did the over-capitalization to let them know about MOS:CAPS, especially if I see they have a pattern of such things. For example, I did so here, with positive effect.
Making mistakes – and trying to avoid them
Sometimes my bold moves are big mistakes. Like one I moved in Feb. 2024, a year after a failed RM that I had forgotten about; I don't recall how we managed to not even have a redirect in the way, and apparently I didn't look at the talk page. We immediately discussed that misstep, reverted it, and followed up with an RM, which finally arrived at consensus to lowercase, as expected. Not a good process, but a good outcome, quickly settled with no drama, except from one editor who noted "This is not uncontroversial and is exactly the sort of behavior that pisses people off about Dick's decapitalization crusade." True that, which is why I've tried not to repeat such mistakes.
A more subtle mistake is moving a page where I think the case fix will be uncontroversial, and some editor shows I was wrong about that. I can't call them all correctly all the time, but I try to keep the proportion of unchallenged/unquestioned bold moves to 90% or better, and try to discuss politely and quickly when questioned.
I have also gone overboard in a few cases, lecturing other editors at length when there's no way they'd care what I have to say. I've come to realize that's pointless, and might just turn up the temperature on a discussion when what's needed is a cooling off. Here is an example where saying so much was probably a mistake.
Post-move cleanup
Most of my case-fixing edits are in fixing over-capitalization in text, in links, etc., after moving a title. It is useful to tag the redirect from an old title with Template:R from miscapitalisation, which puts it in a maintenance category involved in forming the report Wikipedia:Database reports/Linked miscapitalizations. I usually try to look for and fix such things by checking "what links here", without waiting for the report; especially looking in Template space, since links in templates will often cause large numbers of articles to show up on that report even though they don't link the over-capitalized title directly.
Keeping up with the reported linked miscapitalizations has proved to be pretty much impossible, especially without semi-automated editing tools such as AWB. Sometimes I can get others to help by making a specific enough AWB task request or bot task request, but that tends to be a slow process, too.