Wikipedia:Village pump (miscellaneous)
For questions about a wiki that is not the English Wikipedia, please post at m:Wikimedia Forum instead.
Discussions are automatically archived after remaining inactive for 8 days.
Meaningful intervals for edit size histogram
With T236087, XTools is going to get a histogram of a user's edit sizes soon. This will be a bar chart. For screen real estate reasons, it's limited to about 12 bars. The idea is that each bar gives the number of edits in a certain size interval. My question is: which intervals do you think we should use? The current code uses 200-width intervals (0-200, 200-400, etc.), up to 1800-2000, and lumps the rest into >2000.
The issue with fixed-width intervals is they don't allow much granularity for smaller edits (e.g., separating the +1 typo fix from the +120 paragraph addition). I was thinking also of perhaps something exponential like 0-20, 20-40, 40-80, 80-160, 160-320, 320-640, 640-1280, 1280-2560, >2560. What do you think could be more meaningful to users, and why? Welcoming suggestions. Thanks, — Alien 3
3 3 16:33, 28 April 2025 (UTC)
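The exponential intervals proposed above can be sketched roughly as follows. This is only an illustration, not the XTools code: the names `EDGES`, `LABELS`, `bin_index` and `histogram` are made up here, and the rule for edits that land exactly on an edge is an assumption.

```python
from bisect import bisect_right

# Upper edges of the exponential bins proposed in the post; anything beyond
# the last edge falls into the open-ended ">2560" bin.
EDGES = [20, 40, 80, 160, 320, 640, 1280, 2560]
LABELS = ["0-20", "20-40", "40-80", "80-160", "160-320",
          "320-640", "640-1280", "1280-2560", ">2560"]


def bin_index(size_change):
    """Return the bin index for one edit, using the absolute size change in bytes."""
    # Assumption: an edit exactly on an edge (e.g. 20 bytes) goes into the upper bin.
    return bisect_right(EDGES, abs(size_change))


def histogram(size_changes):
    """Count edits per bin and return a label -> count mapping."""
    counts = [0] * len(LABELS)
    for change in size_changes:
        counts[bin_index(change)] += 1
    return dict(zip(LABELS, counts))
```

With these bins, a +1 typo fix and a +120 paragraph addition end up in different bars ("0-20" and "80-160"), which the fixed 200-wide intervals cannot separate.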
- Just looking at my most recent mainspace contributions, the <10 typo fix or minor c/e shows up, then from 10-100 there's larger copyedits, adding categories, and formatting tweaks. The adding text+adding source seems to start from perhaps 200. I have a small number of +2000 edits which seem meaningfully distinct from say reverting page blanking vandalism, so I'd put the final bin a bit higher. CMD (talk) 02:32, 29 April 2025 (UTC)
- Thanks for the answer! When you say "higher", where would that be? 3K? 4K? 10K? Just asking for a general order of magnitude. — Alien 3
3 3 09:09, 29 April 2025 (UTC)
- Probably something like 5K or 10K? Maybe someone has an existing histogram this could be based on. CMD (talk) 12:15, 29 April 2025 (UTC)
- What about negatives? A few years ago I looked at my edits (in mainspace) and found that my median change was −3 bytes. —Tamfang (talk) 23:33, 29 April 2025 (UTC)
- This would be in absolute value, i.e. putting -1 with +1. Else it takes twice as much width. We could do both positive and negative, but then we'd have pretty low granularity (could only have about 6 bars on either side). — Alien 3
3 3 05:43, 30 April 2025 (UTC)
- Could you split the bars in two? Top colour is positive and bottom colour is negative. 80.76.122.163 (talk) 08:45, 1 May 2025 (UTC)
- We could, I think. Question would be, what do we do with 0? is it positive or negative? — Alien 3
3 3 09:25, 1 May 2025 (UTC)
- Centered/split? I agree that positive/negative above/below the horizontal axis was also where my mind went immediately. -- Avocado (talk) 22:27, 4 May 2025 (UTC)
- Yup, that's done (see discussion below). Currently the zero is put between the additions and the x-axis in the 0-10 interval, in a separate colour.
- Splitting the zero bar (as in half-above and half-below) is not doable with our library without some meh hacks I'd really like to avoid. — Alien 3
3 3 09:49, 5 May 2025 (UTC)
- I like the exponential (or semi-log?) better than a straight division. Most of our edits are actually small.
- What I really wish is that we could get numbers for changes to readable prose (e.g., not fiddling with whitespace and template formatting). WhatamIdoing (talk) 03:25, 1 May 2025 (UTC)
- Sadly, that's just not doable on a statistical scale. The best possible in reasonable time would be a bit below 100 edits, which is not a lot.
- If you're ready to wait something like at least 30 seconds for it, we could make a separate tool that does this.
Update: now looks like this. Other suggestions? — Alien 3
3 3 13:29, 1 May 2025 (UTC)
- The link doesn't work.
- Instead of a separate tool (I greedily want all the tools, but would I use it often enough to justify your efforts? I'm not sure, in this case), I wonder if it would be possible to add Special:Tags to non-prose changes. Something like the "Undo" tag, which is calculated later? WhatamIdoing (talk) 03:53, 2 May 2025 (UTC)
- Well, my bad for the link. This one should work.
- Adding tags is beyond our capacity (should ask the mw people), but I get the use of it. I'm wondering, though: is a non-prose change a change that changes no prose, or that also changes something that isn't prose? — Alien 3
3 3 05:36, 2 May 2025 (UTC)
- The red/green color choice in the diagram probably needs to be checked for Wikipedia:Manual of Style/Accessibility purposes. Could the red/minus items hang down below the 0 line?
- About non-prose changes: I don't want to be bothered with edits like these: [1][2][3][4][5][6]. I do want to see edits like this one: [7] WhatamIdoing (talk) 20:27, 2 May 2025 (UTC)
- Current histogram, after some color tweaking and putting the neg below the 0 line. (Actually, it was the grey that was really problematic for accessibility). — Alien 3
3 3 08:16, 3 May 2025 (UTC)
- That shape is a little easier for me to understand at a glance.
- Does the new color scheme work for someone with Red–green color blindness? WhatamIdoing (talk) 22:41, 3 May 2025 (UTC)
- Yes; I checked. Still clearly distinguishable. — Alien 3
3 3 22:55, 3 May 2025 (UTC)
- Thanks. WhatamIdoing (talk) 17:08, 8 May 2025 (UTC)
Many thanks to everyone for all the input! Will probably go out in the next deployment or two. — Alien 3
3 3 12:29, 9 May 2025 (UTC)
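For readers who want to picture the layout the thread converged on (additions above the x-axis, removals below it, zero-size edits counted separately), here is a rough sketch. It is not the XTools implementation: the bin edges are hypothetical (only the first 0-10 interval is mentioned in the discussion), and `Bar`, `signed_histogram` and `plot_heights` are names invented for this example.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical upper edges; the thread only confirms that the first bin is 0-10.
EDGES = [10, 100, 1000, 10000]

Bar = namedtuple("Bar", "added removed zero")


def signed_histogram(size_changes):
    """Bin edits by absolute size, keeping additions, removals and zeros apart."""
    bars = [[0, 0, 0] for _ in range(len(EDGES) + 1)]
    for change in size_changes:
        i = bisect_right(EDGES, abs(change))
        if change > 0:
            bars[i][0] += 1
        elif change < 0:
            bars[i][1] += 1
        else:
            bars[i][2] += 1  # zero-size edits always land in the first bin
    return [Bar(*b) for b in bars]


def plot_heights(bar):
    """Heights for one stacked bar: zero and added go up, removed hangs down."""
    return {"zero": bar.zero, "added": bar.added, "removed": -bar.removed}
```

Keeping the zero-size count as its own series sidesteps the question of whether 0 is positive or negative: it is simply drawn in a separate colour next to the axis.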
File:Syrian Petroleum Company Logo.png
Hi, how can this logo (File:Syrian Petroleum Company Logo.png) be deleted? It is not the official logo: on the company's website (https://spc.sy/), the official logo at the top is blue. (Google Translate) AbchyZa22 (talk) 08:42, 30 April 2025 (UTC)
- @~Berilo Linea~ and Yedaman54, it looks like the logo at the top of Syrian Petroleum Company might be outdated (or maybe they use different colors for their website vs other places?). Could you look into it? WhatamIdoing (talk) 03:30, 1 May 2025 (UTC)
- @Freedoxm and @Abo Yemen any opinion?? AbchyZa22 (talk) 20:19, 3 May 2025 (UTC)
- Not as of right now. Freedoxm (talk · contribs) 23:00, 3 May 2025 (UTC)
AI tool to fact-check articles (proof of concept)
I have created a proof of concept tool for automating fact-checking of articles against sources using AI. GitHub repository. An OpenAI API key or compatible provider is required (I use BotHub). It is cost-effective; when using gpt-4.1-nano, verification of one 100-word block against a single source (approximately 12,000 characters) costs about 0.1 cent. Functionality:
- The program loads the article text from a file, along with all available sources (text files: source1.txt, source2.txt, etc.).
- It divides the article into blocks of approximately 100 words, preserving sentences.
- For each block and each source, it:
  - Sends a request to the OpenAI API for correspondence analysis
  - Receives credibility probabilities for each word
- It combines the results for all blocks and sources.
- It visualizes the text with color coding based on the obtained probabilities (text mode with all sources combined, or a GUI that allows selecting individual sources).
Installation and usage instructions, along with example screenshots, are available in the README. Bugs are certainly present (almost all code was generated using Anthropic Claude 3.7).
It is also possible to use locally hosted models by installing an OpenAI API-compatible LLM server (such as LLaMA.cpp HTTP Server) and directing the script to use it with the --base_url and --model parameters.
Suggestions and proposals are welcome, but unless submitted as pull requests, they will be reviewed at an indeterminate time. The creation of new tools based on this idea and code is strongly encouraged. Kotik Polosatij (talk) 13:40, 5 May 2025 (UTC)
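The block-splitting step described in the post (blocks of roughly 100 words that never cut a sentence) might look something like the sketch below. It is not taken from the repository: `split_into_blocks` and `estimate_cost_cents` are hypothetical names, the sentence splitter is deliberately naive, and the 0.1 cent default is just the figure quoted above for gpt-4.1-nano.

```python
import re


def split_into_blocks(text, target_words=100):
    """Group whole sentences into blocks of at most about target_words words."""
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    blocks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current block if this sentence would push it past the target.
        if current and count + n > target_words:
            blocks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        blocks.append(" ".join(current))
    return blocks


def estimate_cost_cents(n_blocks, n_sources, cents_per_check=0.1):
    """Rough cost estimate: one API check per (block, source) pair."""
    return n_blocks * n_sources * cents_per_check
```

Under these assumptions, a 1,000-word article (about 10 blocks) checked against 3 sources would need about 30 requests, or roughly 3 cents. A single sentence longer than the target still becomes its own block rather than being cut.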
- Interesting, thanks! -- GreenC 00:56, 9 May 2025 (UTC)
Papal traffic - one of our busiest hours?
In case anyone is curious, I did a bit of digging on yesterday's traffic:
- On 8 May, the Pope Leo XIV article here was read 13.2 million times; the Spanish, Italian, German, French and Portuguese articles made up another 10.9 million. This was 4.5% of all pageviews in the day for English, and as high as 12.9% for the Spanish Wikipedia. (These figures include all traffic from redirect pages.)
- Absolute totals for all Wikipedias are a little trickier. The count for pageviews of the "main article title" was around 15 million on all 93 Wikipedias with articles; the six biggest ones above made up 88.5% of that. So assuming the breakdown between main articles + redirects is in proportion, maybe something like 27 million pageviews overall, including redirects.
- We went from 23 WPs having an article on him before the announcement, to 93 by midnight UTC, and 113 now. 20 Wikipedias managed to rename their article in the first three minutes (17:14 to 17:17 UTC) and two other projects had created new articles on him by that time.
- In the hour after the announcement (17:00 to 18:00 UTC), English Wikipedia had around 8.4 million hits on Pope Leo XIV and the redirect titles - around half of those were to Robert Francis Prevost - which represented one third of all pageviews during the hour.
- It probably represented over 40% of all pageviews, over 3000/second, from 17:14 to 18:00 (assuming that the other traffic was evenly distributed) and while the public data doesn't go lower than hourly, I would be happy betting money that in the first fifteen minutes, it was well over half of our traffic.
I don't know if this was our one-time traffic record, but it must certainly be well up there. Congratulations to everyone who worked on it. Andrew Gray (talk) 21:12, 9 May 2025 (UTC)
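The back-of-the-envelope figures in the post can be reproduced from the rounded numbers it quotes. The sketch below is only a check of the arithmetic, assuming that essentially all of the hour's hits arrived after the 17:14 UTC announcement and that other traffic was evenly spread over the hour; the variable names are made up for this example.

```python
# Rounded figures quoted in the post.
leo_hits_hour = 8.4e6      # enwiki hits on the article and its redirects, 17:00-18:00 UTC
share_of_hour = 1 / 3      # "one third of all pageviews during the hour"
active_minutes = 46        # 17:14 to 18:00 UTC

# Assumption: all of those hits fall inside the 46-minute window.
hits_per_second = leo_hits_hour / (active_minutes * 60)

# Assumption: the remaining traffic is evenly distributed across the hour.
total_hour = leo_hits_hour / share_of_hour
other_in_window = (total_hour - leo_hits_hour) * active_minutes / 60
share_in_window = leo_hits_hour / (leo_hits_hour + other_in_window)

# Global estimate: the six biggest Wikipedias (with redirects) scaled up by
# their 88.5% share of the main-title count.
six_biggest = 13.2e6 + 10.9e6
global_estimate = six_biggest / 0.885

print(f"{hits_per_second:.0f} hits/second")
print(f"{share_in_window:.1%} of pageviews in the window")
print(f"{global_estimate / 1e6:.1f} million pageviews globally")
```

With these rounded inputs the rate comes out a little above 3,000 per second, the window share lands right around 40%, and the global total is about 27 million, which lines up with the estimates in the post.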
- Other contenders: Death and funeral of Pope John Paul II; Death of Michael Jackson. I think the Michael Jackson one maxed out our servers. --Redrose64 🌹 (talk) 22:26, 9 May 2025 (UTC)
- Looks like the death of Michael Jackson in 2009 and the views it generated caused wikitech:Michael Jackson effect, which was solved by our software engineers writing the software mw:PoolCounter, which is now installed on our servers to prevent it from happening again. An interesting bit of technical history. –Novem Linguae (talk) 22:40, 9 May 2025 (UTC)
- Interesting, thank you - I had somehow forgotten the Jackson case!
- That page points to Wikipedia:Article traffic jumps which identifies a handful pushing towards 10m in a day (Kobe Bryant, Matthew Perry, Elizabeth II). Some of these do not include redirects in the count and so are ahead of Leo XIV on purely "single title" data, but I think none are likely to beat the one-day (or one-hour) figure for Leo once redirects are included (and IMO they should be).
- I'll see if I can work out what any of these were like as a percentage of traffic - in particular it seems plausible that Steve Jobs might be higher than Leo XIV, with 7.4m views in 2011. Andrew Gray (talk) 22:54, 9 May 2025 (UTC)
Looking at some recent high-traffic deaths, with a little rounding up added to the global data for redirects (which are relatively rare for stable articles like these ones):
- Matthew Perry got ~8.8m enwiki hits on 29/10/23, and ~11.8m globally, which would put him at 3.7% of enwiki traffic and 2.1% of global traffic. (Death was reported about midnight UTC)
- Kobe Bryant got ~9.5m enwiki hits on 26/01/20, and ~15.1m globally, which would put him at 3.4% of enwiki traffic and 2.6% of global traffic. (Death was reported about 1930 UTC)
- Elizabeth II got ~8.5m enwiki hits on 8/9/22, and ~20m globally, which would put her on 3.2% of enwiki traffic and 3.5% of global traffic. (Death was reported about 1730 UTC)
My rough estimate for the Pope had 4.5% of enwiki and (more tentatively) 4.4% of global traffic in the day, so I think that puts him ahead of all three. Interesting to see, though, the difference between Elizabeth/Leo and Perry/Bryant in terms of English vs global traffic. Peak hour was I think around 3.5m/21% for Bryant, 2.2m/13% for Elizabeth II, and 1.3m/11% for Perry, so again all a bit behind what we saw this week.
- For Jobs in 2011, we have the problem that a new and more reliable pagecount system came in about a month after his death. From what we do have (which may have errors/omissions), I get ~7.8m enwiki hits over the full day 6/10/11 (counting Steve Jobs & the main redirect at Steve jobs). Total hits for the day were 231.5m for enwiki, so this suggests Jobs was ~3.3% of English Wikipedia traffic that day, maybe a shade higher to account for the other redirects. Jobs's death seems to have been announced about midnight UTC so the affected period covers the full day; for the peak hour (1-2am) it was 10% of all traffic.
- For Jackson in 2009, with the same caveats, there were ~1.5m hits over the full day 25/6/09 (Michael Jackson + Michael jackson), or 0.6% of total enwiki traffic, but his death was announced only in the last couple of hours of the day so it's not a great comparison. The last two hours of the day had ~7.1% of all enwiki traffic go to the two Jackson page titles, and the last hour had ~12%.
Again, I think the data for the Pope this time around is ahead of both in terms of the share of traffic and the one-hour spike.
In terms of overall sitewide impact, 8 May was a relatively normal day for English Wikipedia in absolute traffic terms - it was busier than usual, especially for a Thursday, but only the fifth busiest this year. However, for Wikimedia as a whole, it was quite a leap, with 613m pageviews - this is the most it has been since 28/1/2024, and the sixth highest since the start of 2021. — Preceding unsigned comment added by Andrew Gray (talk • contribs)
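The percentage shares in this comparison are simple ratios of article hits to total hits. As a rough cross-check using only figures quoted above (the helper names are invented here, and treating the 613m Wikimedia-wide figure as the denominator for the global share is an assumption):

```python
def share_percent(article_hits, total_hits):
    """Article hits as a percentage of all pageviews."""
    return 100 * article_hits / total_hits


def implied_total(article_hits, share_pct):
    """Total pageviews implied by an article's hits and its quoted share."""
    return article_hits / (share_pct / 100)


# Steve Jobs, 6/10/11: ~7.8m hits out of 231.5m enwiki pageviews.
jobs_share = share_percent(7.8e6, 231.5e6)

# Leo XIV, 8 May 2025: ~27m estimated global hits out of 613m pageviews.
leo_global_share = share_percent(27e6, 613e6)

# Enwiki daily totals implied by the quoted hits and shares.
implied_enwiki_totals = {
    "Matthew Perry": implied_total(8.8e6, 3.7),
    "Kobe Bryant": implied_total(9.5e6, 3.4),
    "Elizabeth II": implied_total(8.5e6, 3.2),
}

print(f"Jobs: {jobs_share:.1f}% of enwiki traffic")
print(f"Leo XIV: {leo_global_share:.1f}% of global traffic")
```

The implied enwiki totals all fall in the range of roughly 240 to 280 million pageviews a day, so the quoted shares are at least mutually consistent.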
How many left?
At this writing, there were 6,991,903 articles in the encyclopedia, and as you are reading, there are now 6,992,175. There are 7825 left to go to hit the big 7M! Who will be the lucky one to make the seven millionth edit article?? Mathglot (talk) 07:09, 10 May 2025 (UTC)
- P.S. If you are sitting here hitting reload to see the number change, you might need to [...] the page instead. While you do that, you can listen to the calming sound of Wikipedia being edited. Mathglot (talk) 08:41, 10 May 2025 (UTC)
- Surely we've hit our 7th million edit! I have a list of notable article topics and I might get to some of them, so I'll try and chip away at a quarter of a percent. CMD (talk) 09:03, 10 May 2025 (UTC)
- Yes, we're up into the region of 1.2 thousand million edits now (specifically, 1,285,082,165). I suspect that Mathglot meant "seven millionth article" when they wrote "seven millionth edit". --Redrose64 🌹 (talk) 13:31, 10 May 2025 (UTC)
- Big 'oops!' on my part. Of course I meant article, thanks for the correction. Someone trout me! Mathglot (talk) 18:36, 10 May 2025 (UTC)
CMD (talk) 02:31, 11 May 2025 (UTC)
- Gawrsh, thanks; I needed that! [wipes trout juice and a few silvery scales off chin...] Mathglot (talk) 02:38, 11 May 2025 (UTC)
- I wonder what % of those articles don't meet the WP:Notability guidelines... Some1 (talk) 14:17, 10 May 2025 (UTC)
- Probably a smaller number than the number of articles that could meet the notability guidelines that don't yet exist, so it should all balance out in some way. CMD (talk) 17:35, 10 May 2025 (UTC)
Any predictions?
Anyone want to take a guess at when it will happen? You'll probably at least qualify for the Barnstar of Arbitrary Achievement, and bragging rights (at least, until we get to 8 million). Cast your bets... Mathglot (talk) 03:56, 11 May 2025 (UTC)
- I'll start. 12:53, 5 June 2025 (UTC) – that's my guess! Mathglot (talk) 04:07, 11 May 2025 (UTC)