Wikipedia talk:WikiProject AI Cleanup


    Good faith use of Gemini


    [1] Not sure what to do. Doug Weller talk 16:50, 19 June 2025 (UTC)[reply]

    I've told the editor. Doug Weller talk 16:51, 19 June 2025 (UTC)[reply]
    Hi Doug,
    FWIW, here are the three prompts I used from Gemini 2.5 Flash:
    1) Can you generate an updated economic summary using 2024 data for Guyana using the format below, and providing referenced sources for each data point that could be integrated into the Wikipedia page for it located at
    https://en.wikipedia.org/wiki/Guyana
    2) Can you also provide in Wikipedia format the list of references in your prior answer, also including verified working http links to webpages for each one?
    3) Can you
    1) find an alternative source than the website tradingeconomics.com for that reference, and if you cannot, remove that data and reference as it is blacklisted by Wikipedia
    2) and then provide a combination of the last two answers as a single body of Wikipedia text markup , modeled on the format below, but integrating the data you have just collated in the past two answers. Please double check that both the data and coding for Wikipedia markup are accurate.
    And then I made hand-tweaks of a few things that weren't perfect.
    Is there a Wikipedia good-faith-AI crew collating efforts like this?
    It makes no sense to have the world's data centers regenerating the same kinds of outputs afresh when efforts could be strategically coordinated to flow the data to Wikipedia (among those inclined to do so).... Vikramsurya (talk) 17:02, 19 June 2025 (UTC)[reply]
    The problem is this, from your edit summary Data needs full verification but preliminary suggests it's accurate. You should only make edits that you have already fully verified are borne out by the sources, not just a vague suggestion that they're probably accurate. There are also three random inline citations on a line by themselves after the Imports bullet, and there's something wrong with the formatting of ref 57. Cheers, SunloungerFrog (talk) 17:25, 19 June 2025 (UTC)[reply]
    PPP sources are broken, the sites list the data as being both for Guyana and Chad. Under "arable land" the hectare claim is not found in the source. Under "labor force participation" the rate in the source is 49.6%, not 56.4%. Under "industrial production" neither source mentions crude petroleum, gold, timber, or textiles.
    The model's output can be characterized as "subtly wrong"; this is par for the course. fifteen thousand two hundred twenty four (talk) 19:15, 19 June 2025 (UTC)[reply]
    AI hallucinating? Doug Weller talk 19:39, 19 June 2025 (UTC)[reply]
    Possibly some hallucination, but sourcing misattribution has certainly occurred, which can be viewed as better or worse. The arable land claim of 420,000 hectares (but not "more than") is the exact figure in Wolfram's database, but the prompt requested "working http links to webpages", so the model's pattern contained a link, even if wrong. fifteen thousand two hundred twenty four (talk) 04:39, 20 June 2025 (UTC)[reply]
    Misattribution and hallucination are really the same issue, the AI is finding words and numbers that fit the pattern it develops. CMD (talk) 05:31, 20 June 2025 (UTC)[reply]
    I have a question - when did you think the verification by other editors would occur? If I was watching the page and started to check and found more than a couple of errors, I would just revert the whole edit with a request not to submit error-strewn material. Why? Because I would judge that the edit overall could not be trusted if there were already this many faults and I wasn't going to waste my time looking further. This is something that happens all the time: we are all volunteers who shouldn't be making work for each other like this. That doesn't mean using an LLM is bad. It's saved you time doing some of the formatting. That frees you up to do what the LLMs are bad at, which is fine-grained fact-checking of reliable sources. OsFish (talk) 05:44, 20 June 2025 (UTC)[reply]

    Royal Gardens of Monza


    I'm not super familiar with the process here, but Royal Gardens of Monza seems like it might be AI generated to me - two of the books it cites have ISBNs with invalid checksums, the third doesn't seem to resolve to an actual book anyways, it cites dead URLs despite an access date of yesterday, and uses some invalid formatting in the "Design and features" heading. The author has also had a draft declined at AFC for being LLM-generated before. ScalarFactor (talk) 23:07, 21 June 2025 (UTC)[reply]

    You are correct. I've draftified and tagged the article, left notices on the draft and creator's talk pages, and notified the editor who accepted the draft at AfC. I think Fazzoacqua100's other articles should be reviewed for similar issues. fifteen thousand two hundred twenty four (talk) 01:22, 22 June 2025 (UTC)[reply]
    Their other submissions and drafts have now been reviewed, draftified, and had notices posted where appropriate. Thank you @ScalarFactor for posting here. fifteen thousand two hundred twenty four (talk) 04:40, 22 June 2025 (UTC)[reply]
    No problem - thanks for dealing with the cleanup. ScalarFactor (talk) 05:15, 22 June 2025 (UTC)[reply]

    More signs of LLM use from my recent AfC patrolling


    For the past month I've been participating in the WP:AFCJUN25 backlog drive, and oh man, I've been finding a LOT of AI slop in the submission queue. I've found a few more telltale signs of LLM use that should probably be added to WP:AICATCH:

    (oh god, these bulleted lists are exactly the sort of thing ChatGPT does...)

    • Red links in the See also section — often these are for generic terms that sound like they could be articles. Makes me wonder if an actually practical use of ChatGPT would be to suggest new article titles... as long as you write the article in your own words. I'm just spitballing here.
    • Fake categories, i.e. red links that sound plausible, but don't currently exist in our category system.
    • Thin spaces? Maybe? I've been encountering a surprisingly high number of Unicode thin space characters, and I'm wondering if there's some chatbot that tends to use them in their output, because I don't know of any common keyboard layouts that let you type them (aside from custom layouts like the one I use, but it seems vanishingly unlikely that some random user with 2 edits is using one of those).

    Anyone got any more insights on any of these? pythoncoder (talk | contribs) 21:05, 30 June 2025 (UTC)[reply]

    Forgot to link a thin space example: Draft:Independent National Electoral and Boundaries Commission (Somalia)
    Another sign I just found: Draft:Opaleak has a bunch of text like :contentReference[oaicite:3]{index=3} in place of references. pythoncoder (talk | contribs) 21:11, 30 June 2025 (UTC)[reply]
    @Pythoncoder Could you note where the thin spaces are in that example? CMD (talk) 02:35, 1 July 2025 (UTC)[reply]
    Just double-checked and it looks like they're actually narrow nonbreaking spaces (U+202F) — copy and paste into your find-and-replace dialog: > <
    They appear twice here: "On 15 April 2025, INEBC rolled out..." and "unanimously adopted Law No. 26 on 16 November 2024." pythoncoder (talk | contribs) 02:50, 1 July 2025 (UTC)[reply]
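    For anyone hunting these by hand, here is a minimal sketch in Python. The character list comes from the examples above, the oaicite pattern from the Draft:Opaleak example, and the script itself is illustrative, not an existing tool:

    import re
    import sys

    # Characters discussed above: U+2009 (thin space) and U+202F (narrow no-break space).
    SUSPECT_CHARS = {
        "\u2009": "THIN SPACE (U+2009)",
        "\u202f": "NARROW NO-BREAK SPACE (U+202F)",
    }
    # Leftover chatbot citation artifacts such as :contentReference[oaicite:3]{index=3}
    OAICITE = re.compile(r":contentReference\[oaicite:\d+\]\{index=\d+\}")

    def scan(wikitext: str) -> None:
        """Print line/column positions of suspect characters and oaicite leftovers."""
        for lineno, line in enumerate(wikitext.splitlines(), start=1):
            for col, ch in enumerate(line, start=1):
                if ch in SUSPECT_CHARS:
                    print(f"line {lineno}, col {col}: {SUSPECT_CHARS[ch]}")
            for match in OAICITE.finditer(line):
                print(f"line {lineno}, col {match.start() + 1}: {match.group()}")

    if __name__ == "__main__":
        scan(sys.stdin.read())

    Pasting the wikitext of the draft above through something like this would flag the two U+202F spots mentioned.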
    Another one: excessive use of parentheses any time a term with an acronym shows up, even if the acronym in the parentheses is never used again in the article. Sometimes it even does it twice: Draft:Saetbyol-4 pythoncoder (talk | contribs) 19:15, 8 July 2025 (UTC)[reply]
    ChatGPT likes to generate malformed AfC templates (which breaks the submission and automatically creates a broken Decline template).
    An example of this:
    {{Draft topics|biography|south-asia}} :{{AfC topic|other}} :{{AfC submission|||ts=20250708193354|u=RsnirobKhan|ns=2}} :{{AFC submission|d|ts=2025-06-07T00:00:00Z}} :{{AFC submission|d|ts=19:32, 8 July 2025 (UTC)}} qcne (talk) 19:40, 8 July 2025 (UTC)[reply]

    LLM-translated articles in need of review


    By https://oka.wiki - an organisation that is open and working in good faith, but also extremely into its LLMs. List here - David Gerard (talk) 21:48, 30 June 2025 (UTC)[reply]

    Can you point to an example? It's a lot of articles. Sohom (talk) 03:02, 1 July 2025 (UTC)[reply]
    These are, as far as I am aware, translated by editors with dual fluency. All go through AfC and are tagged as necessary by AfC reviewers. @David Gerard, do you have any specific problems with any of them? If so, please do raise them (and maybe also with the AfC reviewer), but in general I believe these aren't any more of an issue than any other translated article. -- asilvering (talk) 03:45, 1 July 2025 (UTC)[reply]

    User:Jessephu consistently creating LLM articles


    Hello, Jessephu has already made articles flagged as AI, which is how I spotted this; see Childbirth in Nigeria and Draft:Olanrewaju Samuel. However, this exact same unusual bullet-point style is seen in many of the articles he created, including but not limited to Cancer care in Nigeria (this revision) and Neglected tropical diseases in Nigeria (this revision). He's been doing this for a while now for a lot of articles. ceruleanwarbler2 (talk) 13:33, 1 July 2025 (UTC)[reply]

    For the sake of transparency, this editor asked me on Tumblr what should be done about this situation, and I told her that she could report it to this noticeboard (and clarified that the report would not be seen as casting aspersions). Chaotic Enby (talk · contribs) 13:52, 1 July 2025 (UTC)[reply]
    @Ceruleanwarbler2 Jessephu (talk) 17:54, 1 July 2025 (UTC)[reply]
    Alright, duly noted, and thanks for bringing this up.
    I understand the concern regarding the formatting style and the tagged AI-related article. I acknowledge that in some of my previous articles I used the bullet-point format as a way of organising my article clearly, but after this review I will surely work on that.
    If there is any area where my edits have fallen short, I sincerely apologise and will make the necessary corrections. I appreciate your feedback. Jessephu (talk) 18:01, 1 July 2025 (UTC)[reply]
    The bullet-point format, while not ideal, is not the main issue at hand – your response doesn't answer the question of whether you were using AI or not. While that is not explicitly disallowed either, it is something that you should ideally be transparent about, especially given the editorializing and verifiability issues in some of your articles. Chaotic Enby (talk · contribs) 18:30, 1 July 2025 (UTC)[reply]
    Thank you for the feedback. Yes, I use AI to sometimes assist with drafting, but I do make sure to review and edit the content to ensure accuracy. Jessephu (talk) 03:07, 2 July 2025 (UTC)[reply]
    You created National Association of Kwara State Students on 21 April. The "Voice of Nigeria" source 404s, the "KSSB:::History" source is cited twice for separate claims and fails to support either, and the "Ibrahim Wakeel Lekan 'Hon. Minister' Emerges as NAKSS President" source also does not support the accompanying text. Neither of the two provided sources supports the subject's notability. The article is unencyclopedic in tone and substance, and is written like an essay. I have serious doubts concerning your claim that you review content for accuracy and have draftified that article. fifteen thousand two hundred twenty four (talk) 07:57, 2 July 2025 (UTC)[reply]
    I do make sure to review... But the ones mentioned here could be a mistake from my end; I am currently going through the articles listed here to correct errors. I will do well to strictly cross-check thoroughly. Jessephu (talk) 08:10, 2 July 2025 (UTC)[reply]
    I admit I might have done some things wrongly... I sincerely apologise and will work on them now. Jessephu (talk) 08:12, 2 July 2025 (UTC)[reply]
    I checked now, and one of the reasons I used the "KSSB:::History" source is to cite the association's role in advocating for Kwara State student affairs.
    Regardless, I am sorry; I am still going through the other articles to make the necessary adjustments. Jessephu (talk) 08:31, 2 July 2025 (UTC)[reply]

    Discussion about CzechJournal at RSN


    There's a discussion about the reliability of CzechJournal at RSN that could use additional opinions from editors with LLM knowledge. See WP:RSN#CzechJournal in articles about AI (or in general). -- LCU ActivelyDisinterested «@» °∆t° 20:10, 3 July 2025 (UTC)[reply]

    Yaswanthgadu.21 - stub expansion using LLM


    I came across a supposed stub expansion to an article on my watchlist, Formby Lighthouse. It seemed to be largely generated by LLM, with all its accompanying problems (flowery text, content not supported by sources, etc.), so I reverted it.

    It seems that the user in question, Yaswanthgadu.21, may have done this for other stub articles, as part of Wikipedia:The World Destubathon. I don't have the time at present to look into this further, but if others had the opportunity, that would be helpful. On the face of it, their additions to Three Cups, Harwich look similarly dubious, and they have destubbed a bunch of other articles. Cheers, SunloungerFrog (talk) 05:59, 6 July 2025 (UTC)[reply]

    Hey SunloungerFrog,
    Just wanted to quickly explain the process I've been following: I usually start by Googling for sources based on the requirement. I read through them once, pick out key points or keywords, and then rewrite the content in my own words. After that, I use ChatGPT or another LLM to help refine what I've written and organize it the way I want. I also provide the source links at that stage. Once the content is cleaned up, I move it over to Wikipedia.
    Since everything was based on the links I gave, I assumed nothing unrelated or unsourced was getting in. But after your observation, I decided to test it. I asked GPT, “Where did this particular sentence come from? Is it from the data I gave you?” and it replied, “No, it’s not from the data you provided.” So clearly, GPT can sometimes introduce its own info beyond what I input.
    Thanks again for pointing this out. I’ll go back and review the articles I’ve worked on. If I find anything that doesn’t have a solid source, I’ll either add one or remove the sentence. I’d appreciate it if I could have two weeks to go through everything properly. Yaswanthgadu.21 (talk) 07:52, 6 July 2025 (UTC)[reply]
    I'll be blunt: it would be far preferable if you self-reverted all the edits you've made in this way, and started from scratch, because then you know you can be confident in the content, language and sourcing. Please do that instead. Cheers, SunloungerFrog (talk) 08:47, 6 July 2025 (UTC)[reply]
    I agree. Reverting all of the edits you made in this way and redoing them by hand would be preferable on every level. If you want to organize your writing the way you want, organize it yourself. Stepwise Continuous Dysfunction (talk) 16:35, 6 July 2025 (UTC)[reply]

    ISBN checksum


    I just found what appears to be an LLM-falsified reference which came to my attention because it raised the citation error "Check |isbn= value: checksum", added in Special:Diff/1298078281. Searching shows some 300 instances of this error string; it may be worth checking whether others are equally bogus. —David Eppstein (talk) 06:43, 6 July 2025 (UTC)[reply]

    Could be added to Wikipedia:WikiProject AI Cleanup/AI catchphrases. CX Zoom[he/him] (let's talk • {CX}) 19:27, 10 July 2025 (UTC)[reply]
    I've added it. Ca talk to me! 02:42, 17 July 2025 (UTC)[reply]
    Looks good, made some minor changes for ce and to swap links for projectspace articles like WP:ISBN and WP:DOI since they have more relevant information for editors. Feel free to switch them back if you like. fifteen thousand two hundred twenty four (talk) 03:12, 17 July 2025 (UTC)[reply]
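    For reference, the arithmetic behind that checksum error is simple enough to reproduce by hand; here is a rough Python sketch (illustrative only, and a passing checksum of course says nothing about whether the book actually exists):

    def isbn_checksum_ok(raw: str) -> bool:
        """Return True if the string carries a valid ISBN-10 or ISBN-13 check digit."""
        chars = [c for c in raw.upper() if c.isdigit() or c == "X"]
        if len(chars) == 10:
            # ISBN-10: weighted sum with weights 10..1 must be divisible by 11;
            # "X" counts as 10 (only legal as the final check digit, not enforced here).
            total = sum(w * (10 if c == "X" else int(c)) for w, c in zip(range(10, 0, -1), chars))
            return total % 11 == 0
        if len(chars) == 13 and "X" not in chars:
            # ISBN-13: alternating weights 1 and 3; the total must be divisible by 10.
            total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(chars))
            return total % 10 == 0
        return False  # wrong length or otherwise malformed

    print(isbn_checksum_ok("978-0-306-40615-7"))  # True: valid check digit
    print(isbn_checksum_ok("978-0-306-40615-2"))  # False: the kind of ISBN that trips the CS1 error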

     You are invited to join the discussion at WP:Village pump (idea lab) § Finding sources fabricated by AI, which is within the scope of this WikiProject. SunloungerFrog (talk) 16:58, 6 July 2025 (UTC)[reply]

    User:Yunus_Abdullatif has been expanding dozens of stub articles for the last few weeks obviously using AI. For example, their edits include capitalization and quoting that does not follow the style guideline, duplicate references, and invalid syntax. 2001:4DD4:17D:0:DA74:25C:8189:4830 (talk) 07:35, 7 July 2025 (UTC)[reply]

    Possible new idea for WP:AITELLS: non-breaking spaces in dates


    Over the past few weeks, I've been noticing a ton of pages showing up in Category:CS1 errors: invisible characters with non-breaking spaces in reference dates (also causing CS1 date errors). I've been trying to figure out where these are coming from, and I'm leaning towards it being another AI thing -- see this draft, which has various other AI hallmarks. Jay8g [VTE] 20:36, 7 July 2025 (UTC)[reply]
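    A rough way to spot these before they land in the error category, sketched in Python; the parameter names and the blunt normalisation step are illustrative assumptions, not how the CS1 module itself works:

    import re

    BAD_SPACES = {"\u00a0": "NO-BREAK SPACE (U+00A0)", "\u202f": "NARROW NO-BREAK SPACE (U+202F)"}
    # Date-like parameters in citation templates, e.g. |date=, |access-date=, |archive-date=
    DATE_PARAM = re.compile(r"\|\s*((?:access-|archive-)?date)\s*=\s*([^|}\n]*)")

    def report_invisible_date_chars(wikitext: str) -> None:
        """Print any date parameter values that contain non-breaking space characters."""
        for match in DATE_PARAM.finditer(wikitext):
            name, value = match.groups()
            for ch, label in BAD_SPACES.items():
                if ch in value:
                    print(f"|{name}= contains {label}: {value.strip()!r}")

    def strip_invisible_date_chars(wikitext: str) -> str:
        """Replace the offending characters with ordinary spaces (blunt; review the diff)."""
        for ch in BAD_SPACES:
            wikitext = wikitext.replace(ch, " ")
        return wikitext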

    For the interested


    A German newspaper [2] had an AI/human team check articles on German WP, and found that many WP articles contain errors and outdated information, and that the number of editors is not that many. Apparently this didn't use to be the case, unclear when it changed.[sarcasm]

    Anyway, this was interesting:

    "Can artificial intelligence replace the online encyclopedia? Not at the moment. The FAS study also shows this: When Wikipedia and artificial intelligence disagreed, the AI ​​wasn't more often right than Wikipedia. Sometimes, the AI ​​even correctly criticized a sentence, but also provided false facts itself. That's why human review was so important. At the same time, most AI models are also trained on Wikipedia articles. The AI ​​has therefore very likely overlooked some errors because it learned inaccurate information from Wikipedia." Gråbergs Gråa Sång (talk) 09:47, 8 July 2025 (UTC)[reply]

    This discussion wasn't very conclusive, but it seems clear this page is the closest to an LLM noticeboard we have atm. So, I made a couple of redirects, WP:LLMN and Wikipedia:Large language models/Noticeboard, and added this page to Template:Noticeboard links. We'll see what happens. Gråbergs Gråa Sång (talk) 14:18, 8 July 2025 (UTC)[reply]

    Looks good to me. Thanks for adding the link and redirects. — Newslinger talk 15:14, 8 July 2025 (UTC)[reply]

    Possible disruptive LLM usage by User:Pseudopolybius


    I'm not sure if this is the right place to report this kind of thing.

    I started working on a section of Long Peace until I realized the whole article has been totally transformed in the last few months, mostly by one extremely fast editor, User:Pseudopolybius. Their contributions to the article include the following nonsense: "The Coming War with Japan will be followed by The Coming Conflict with China who are locked in the Thucydides Trap and The Jungle Grows Back, While America Sleeps."

    Looks like the work of an LLM to me. Also, this user has been warned three times for using copyrighted content. Apfelmaische (talk) 19:42, 8 July 2025 (UTC)[reply]

    I've just reverted the article. Apfelmaische (talk) 20:13, 8 July 2025 (UTC)[reply]

    "Nonsense" makes perfect sense, see the Talk:Long Peace For this misunderstanding Apfelmaische reverted the article.--Pseudopolybius (talk) 22:40, 8 July 2025 (UTC)[reply]

    I was mistaken. Sorry! Apfelmaische (talk) 23:44, 8 July 2025 (UTC)[reply]

    I filled in all the incomplete entries, added some new ones, and expanded explanations. After a year and a half, I marked our core guidance page Wikipedia:WikiProject AI Cleanup/AI catchphrases as complete. Feel free to expand it with new entries if you notice new characteristics of AI writing. Ca talk to me! 13:00, 10 July 2025 (UTC)[reply]

    @Ca I've added a couple of examples I've come across in my AfC work. A thought: the drafts linked as examples will be deleted under G13 in six months; should we take a copy as a subpage under this project? qcne (talk) 15:32, 12 July 2025 (UTC)[reply]
    I think that's a good idea! It would be useful to have a corpus of LLM text examples. Ca talk to me! 15:46, 12 July 2025 (UTC)[reply]

    Move proposal

    The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.

    The result of the move request was: Moved. It's WP:SNOWing. (non-admin closure) TarnishedPathtalk 15:45, 17 July 2025 (UTC)[reply]


    – The word "Catchphrases" insinuates that the page contains specific phrases or words that can catch AI writing, which was true at the essay's inception but is no longer true in its current form; the entries are too broad and wide-reaching to fit the definition. Ca talk to me! 13:11, 10 July 2025 (UTC)[reply]

    Support. I prefer "LLM" over "AI", but with a project name of "AI Cleanup" it's not something I'm going to get hung up on. If the move is accepted I suggest that the displayed shortcut WP:AICATCH be switched for a new WP:LLMSIGNS or WP:LLMTELLS shortcut, and WP:AIC/C should be switched for WP:AIC/S as well. fifteen thousand two hundred twenty four (talk) 13:26, 10 July 2025 (UTC)[reply]
    Support. I also prefer LLM but the AfC template already uses "AI" and I think it's the more common phrasing. qcne (talk) 13:30, 10 July 2025 (UTC)[reply]
    Support, and thanks a lot for your work on it! Chaotic Enby (talk · contribs) 15:41, 10 July 2025 (UTC)[reply]
    You're welcome! I want to also credit User:MrPersonHumanGuy and User:Newslinger, who have done tremendous work expanding the initial essay. Ca talk to me! 17:18, 10 July 2025 (UTC)[reply]
    Support and thanks. -- LWG talk 15:47, 10 July 2025 (UTC)[reply]
    Support as the page also lists punctuation and broken formatting. The current title presumably intends catchphrase as "a signature phrase spoken regularly by an individual", though, rather than "a phrase with which to catch someone". Belbury (talk) 16:01, 10 July 2025 (UTC)[reply]
    Support. I'm glad to see this essay graduate from the development stage. I have a weak preference for "LLM" in the title, as it would be more specific than "AI". — Newslinger talk 17:29, 10 July 2025 (UTC)[reply]
    Support per nom. Paprikaiser (talk) 20:18, 10 July 2025 (UTC)[reply]
    Support - I don't know that we need to specify "LLM", since "AI writing" is ubiquitous with LLMs and probably more recognizable to editors who are not familiar with technical terminology surrounding generative AI. - ZLEA T\C 20:24, 10 July 2025 (UTC)[reply]
    I hate to be contrarian, because obviously moving the page is correct, but I am opposing over the "AI" vs "LLM" split. While referring to them as AI is indeed commonplace in journalism, scholarly sources tend to prefer referring to generative tools by the underlying technology,[1][2][3] meaning in a technical discussion of their behavior it's perhaps better to use the latter phrase.
    This has less to do with any Wikipedian rationale, but I want to point out that we are unfortunately colluding with the marketing of these things by referring to them with such a high-prestige term. People come to this site every day and in good faith make use of LLMs on the understanding that they are intelligent and potentially smarter than them, when they are not. The language we use on the site should reflect the fact that we address these things as tools, and agree with the scholarly (and Wikipedian) consensus that these things are generally unreliable when not deeply scrutinized.
    Obviously the fate of the universe doesn't rest on the name of this one Wikipedia page. I just want everyone who feels apathetic about the name change to understand the subtext and how we're deviating from academic terminology and replacing it with a trendier term born out of a speculative market, which may in time become seen ubiquitously as inaccurate. Altoids0 (talk) 04:24, 12 July 2025 (UTC)[reply]
    Although I agree with changing the page's title to something else, I also think Wikipedia:Signs of LLM use would be a better title than Wikipedia:Signs of AI writing. – MrPersonHumanGuy (talk) 10:57, 12 July 2025 (UTC)[reply]

    References (move proposal)



    1. ^ "Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks". arXiv. doi:10.48550/arXiv.2506.20548.
    2. ^ "LLM-based NLG Evaluation: Current Status and Challenges". Computational Linguistics. doi:10.1162/coli_a_00561.
    3. ^ "A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions". Computational Linguistics. doi:10.1162/coli_a_00549.
    The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

     You are invited to join the discussion at Wikipedia:Edit filter noticeboard § Edit filters related to logging and blocking AI edits. –Novem Linguae (talk) 05:34, 11 July 2025 (UTC)[reply]

    New edit filters


    After a few days of discussion at Wikipedia:Edit filter noticeboard and Wikipedia:Edit filter/Requested, we now have two new AI-related edit filters, and a big update to an existing one!

    Chaotic Enby (talk · contribs) 22:48, 16 July 2025 (UTC)[reply]

    Thanks for doing the groundwork to get these filters up and running. With limited volunteer time, we need automated tools like these to help address an automated problem. — Newslinger talk 12:47, 17 July 2025 (UTC)[reply]

    Idea lab: New CSD criteria for LLM content


    There have been multiple proposals for a new CSD criterion for patently LLM-generated articles [3], but they failed to gain much traction due to understandable concerns about enforceability and redundancy with WP:G3.

    This time, I'm thinking of limiting the scope to LLM-generated text that was obviously not reviewed by a human. The criterion could include some of the more surefire WP:AITELLS such as collaborative communication and non-existent references, which would have been weeded out if reviewed by a human. I think it would help to reduce the high bar set by the WP:G3 (hoax) criterion and provide guidance on valid ways of detecting LLM generations and what is and is not valid use of LLMs.

    Here is my rough draft of the above idea; feedback is welcome.

    A12. LLM-generated without human review

    This applies to any article that obviously indicates that it was generated by a large language model (LLM) and that no human review was done on the output. Indicators of such content include collaborative communication (e.g. "I hope this helps!"), non-existent references, and implausible citations (e.g. a source from 2020 being cited for a 2022 event). The criterion should not be invoked merely because the article was written with LLM assistance or because it has reparable tone issues.

    Ca talk to me! 00:50, 18 July 2025 (UTC) Update: I have posted a revised version below based on feedback. 15:59, 19 July 2025 (UTC)[reply]


    Oppose. This is very vague and would see a lot of disagreement based on differing subjective opinions about what is and isn't LLM-generated, what constitutes a "human review" and what "tone issues" are repairable. Secondly, what about repairable issues that are not related to tone?
    I could perhaps support focused, objective criteria that cover specific, identifiable issues, e.g. "non-existent or implausible citations" rather than being based on nebulous guesses about the origin (which will be used to assume bad faith of the contributor, even if the guess was wrong). Thryduulf (talk) 01:21, 18 July 2025 (UTC)[reply]
    If it's limited to only cases where there is obvious WP:AITELLS#Accidental disclosure or implausible sources it could be fine. Otherwise I agree with Thryduulf about the vagueness; an editor skimming through the content but not checking any of the sources counts as a "human review". And sources that may seem non-existent at first glance might in fact exist. I think the "because it has reparable tone issues" clause should go as well, since if it's pure LLM output, we don't want it even if the tone is fine. Jumpytoo Talk 04:33, 18 July 2025 (UTC)[reply]
    Ca, I am very supportive of anything that helps reduce precious editor time wasted on content generated by LLMs that cannot be trusted. For a speedy deletion criterion, I think that we would need a specific list of obvious signs of bad LLM generation, something like:
    • collaborative communication
      • for example, "I hope this helps!"
    • knowledge-cutoff disclaimers
      • for example, "Up to my last training update"
    • prompt refusal
      • for example, "As a large language model, I can't..."
    • non-existent / invented references
      • for example, books whose ISBNs raise a checksum error, unlisted DOIs
    • implausible citations
      • for example, a source from 2020 being cited for a 2022 event
    And only those signs may be used to nominate for speedy deletion. Are there others? Maybe those very obvious criteria that are to be used could be listed at the top of WP:AISIGNS rather than within the CSD documentation, to allow for future updating.
    The other thing that comes to mind with made-up sources or implausible citations is, how many of them must there be to qualify for speedy deletion? What if only one out of ten sources was made up? Cheers, SunloungerFrog (talk) 09:48, 18 July 2025 (UTC)[reply]
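    On the non-existent references point, a crude existence probe is at least possible for DOIs. The Python sketch below assumes the usual behaviour of the doi.org resolver (a redirect for registered DOIs, an HTTP 404 otherwise) and is a heuristic for manual spot checks, not a definitive test:

    import urllib.error
    import urllib.request

    def doi_seems_registered(doi: str) -> bool:
        """Heuristic: True if doi.org appears to know the DOI, False if it returns 404 or the lookup fails."""
        req = urllib.request.Request(
            f"https://doi.org/{doi}",
            method="HEAD",
            headers={"User-Agent": "citation-checker-sketch/0.1 (manual spot checks)"},
        )
        try:
            with urllib.request.urlopen(req, timeout=10):
                return True  # the resolver redirected somewhere, so the DOI is registered
        except urllib.error.HTTPError as err:
            # doi.org answers 404 for unregistered DOIs; publishers sometimes reject
            # HEAD requests after the redirect, which still implies registration.
            return err.code != 404
        except urllib.error.URLError:
            return False  # network problem: could not verify either way

    A passing lookup still says nothing about whether the source supports the text it is attached to.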
    Regarding the number of sources, I don't think it matters – editors are expected to have checked all the sources they cite, and using AI shouldn't be an excuse to make up sources. If even one source is made up, we can't guarantee that the other sources, even if they do exist, support all the claims they are used for. Chaotic Enby (talk · contribs) 10:06, 18 July 2025 (UTC)[reply]
    I'd be very happy with that. I only mentioned it because I imagine there might be a school of thought that would prefer more than one source to be made up, to cement the supposition that the article is an untrustworthy LLM generation. Cheers, SunloungerFrog (talk) 11:21, 18 July 2025 (UTC)[reply]
    If someone deliberately makes up an entire source, that's just as much of an issue in my opinion. In both cases, all the sources will need to be double-checked as there's no guarantee anymore that the content is in any way consistent with the sources. I wouldn't be opposed to expanding G3 (or the new proposed criterion) to include all cases of clear source fabrication by the author, AI or not. Chaotic Enby (talk · contribs) 11:42, 18 July 2025 (UTC)[reply]
    I would also support it, but only for issues that can only plausibly be generated by LLMs and would have been removed by any reasonable human review. So, stylistic tells (em-dashes, word choices, curly apostrophes, Markdown) shouldn't be included.
    It is reasonably plausible that an editor unfamiliar with the MOS would try to type Markdown syntax or curly apostrophes, or keep them in an AI output they double-checked. It is implausible that they would keep "Up to my last training update".
    I would also tend to exclude ISBN issues from the list of valid reasons, as it is possible that an ISBN might be mistyped by an honest editor, or refer to a different edition. However, if the source plainly doesn't exist at all, it should count. Editors should cross-check any AI-generated output to the sources it claims to have used. Chaotic Enby (talk · contribs) 10:04, 18 July 2025 (UTC)[reply]
    The main issue with strict tells is that they may change over time as LLMs update. They'll probably change at a slow enough rate, and within other factors, that editors would be able to stay mostly abreast of them, but I'm not sure CSD criteria could keep up. What may help, with or without a CSD, is perhaps a bit of expansion at the WP:TNT essay on why LLM-generated articles often need to be TNTed, which helps make clear the rationale behind any PROD, CSD, or normal MFD. CMD (talk) 10:20, 18 July 2025 (UTC)[reply]
    • I think lot of the WP:TNT-worthy AI issues (dead on arrival citations, generic truthy content attached to unrelated citations, malformed markup, etc) can be addressed by just removing the AI content, then seeing if the remaining content is enough to save the article from WP:A3/WP:A7/etc. -- LWG talk 16:16, 18 July 2025 (UTC)[reply]
      If the article is generated by AI, then it is all AI content. Removing the AI content would be TNT. CMD (talk) 16:57, 18 July 2025 (UTC)[reply]
      The ideal procedure on discovering something like this is:
      1. Remove all the actively problematic content that can only be fixed by removal (e.g. non-existent and/or irrelevant citations)
      2. Fix and/or remove any non-MediaWiki markup
      3. Evaluate what remains:
        • If it is speedily deletable under an existing criterion (A1, A3, A7/A9, A11 and G3 are likely to be the most common), then tag it for speedy deletion under the relevant criterion
        • If it would be of benefit to the project if cleaned up, then either clean it up or mark it for someone else to clean up.
        • If it isn't speedily deletable but would have no value to the project even if cleaned up, or TNT is required then PROD or AfD.
      If there are a lot of articles going to PROD or AfD despite this then propose one or more new or expanded CSD criteria at WT:CSD that meets all four of the requirements at WP:NEWCSD. In all of this it is important to remember that whether it was written by AI or not is irrelevant - what matters is whether it is encyclopaedic content or not. Thryduulf (talk) 18:58, 18 July 2025 (UTC)[reply]
      But I think that whether it's written by AI is relevant. On an article written by a human, it's reasonable to assume good faith. On an article written by an AI, one cannot assume good faith, because they are so good at writing convincing sounding rubbish, and so, e.g., the job of an NPP reviewer is hugely disproportionately more work, to winkle out the lies, than it took the creating editor in the first place to type a prompt into their LLM of choice. And that's the insidious bit, and why we need a less burdensome way to deal with such articles. Cheers, SunloungerFrog (talk) 19:16, 18 July 2025 (UTC)[reply]
      If you are assuming anything other than good faith then you shouldn't be editing Wikipedia. If the user is writing in bad faith there will be evidence of that (and using an LLM is not evidence of any faith, good or bad) and so no assumptions are needed. Once text has been submitted there are exactly three possibilities:
      1. The text is good and encyclopaedic how it is. In this situation it's irrelevant who or what wrote it because it's good and encyclopaedic.
      2. The text needs some cleanup or other improvement but it is fundamentally encyclopaedic. In this situation it's irrelevant who or what wrote it because, when the cleanup is done (by you or someone else, it doesn't matter) it is good and encyclopaedic.
      3. The text, even if it were cleaned up, would not be encyclopaedic. In this situation it's irrelevant who wrote it because it isn't suitable for Wikipedia either way. Thryduulf (talk) 19:38, 18 July 2025 (UTC)[reply]
      I agree with your core point that content problems, not content sources, are what we should be concerned about, and my general approach to LLM content is what you described as the ideal approach above, but I would point out that assumption of good faith can only be applied to a human. In the context of content that appears to be LLM-generated, AGF means assuming that the human editor who used the LLM reviewed the LLM content for accuracy (including actually reading the cited sources) before inserting it in the article. If the LLM text has problems that any human satisfying WP:CIR would reasonably be expected to notice (such as the cited sources not existing or being irrelevant to the claims), then the fact that those problems weren't noticed tells me that the human didn't actually review the LLM content. Once I no longer have reason to believe that a human has reviewed a particular piece of LLM content, I have no reason to apply AGF to that content, and my presumption is that such content fails WP:V, especially if I am seeing this as a pattern across multiple edits for a given article or user. -- LWG talk 20:05, 18 July 2025 (UTC)[reply]
      assumption of good faith can only be applied to a human - exactly, and I'm always delighted to apply AGF to fellow human editors. But not to ChatGPT or Copilot, etc. Cheers, SunloungerFrog (talk) 20:18, 18 July 2025 (UTC)[reply]
      We have seen plenty of instances of good faith users generating extremely poor content. Good faith isn't relevant to the content, it's relevant to how the content creator (behind the llm, not the llm itself) is addressed. CMD (talk) 14:41, 19 July 2025 (UTC)[reply]
      You should not be applying faith of any sort (good, bad, indifferent it doesn't matter) to LLMs because they are incapable of contributing in any faith. The human who prompts the LLM and the human who copies the output to Wikipedia (which doesn't have to be the same human) have faith, but that faith can be good or bad. Good content can be added in good or bad faith, bad content can be added in good or bad faith. Thryduulf (talk) 18:36, 19 July 2025 (UTC)[reply]
    • Support for articles composed of edits with indicators that are very strongly associated with LLM-generated content, such as the ones listed in WP:AISIGNS § Accidental disclosure and WP:AISIGNS § Markup. I would also apply the criterion to less obvious hoax articles that cite nonexistent sources or sources that do not support the article content, if the articles also contain indicators that are at least moderately associated with LLM-generated content, such as the ones listed in WP:AISIGNS § Style. — Newslinger talk 21:34, 18 July 2025 (UTC)[reply]
    • Support: Using a model to generate articles is fast, reviewing and cleaning it up is slow. This asymmetry in effort is a genuine problem which this proposal would help address. There is also a policy hole of sorts: An unreviewed generated edit with fatal flaws made to an existing article can be reverted, placing the burden to carefully review and fix the content back on the original editor. An unreviewed generated edit with fatal flaws made to a new page cannot. Promo gets G11; I don't see why this shouldn't get a criterion also.
    I also support the distinction that Chaotic Enby has made that candidate edits should be ones "that can only plausibly be generated by LLMs and would have been removed by any reasonable human review". fifteen thousand two hundred twenty four (talk) 23:21, 18 July 2025 (UTC)[reply]
    Also adding that assessing whether an article's prose is repairable or not, in the context of G11, is also a judgement call to some extent. So I don't believe that deciding whether issues are repairable should be a complete hurdle to a new criterion, although I still prefer to play it safe and restrict it to my stricter distinction above. Chaotic Enby (talk · contribs) 23:36, 18 July 2025 (UTC)[reply]
    Agreed, and it's not just G11 that requires judgement: G1, G3, G4 and G10 all do to differing extents. Good luck to anybody who tries to rigorously define what "sufficiently identical" means for G4. fifteen thousand two hundred twenty four (talk) 23:51, 18 July 2025 (UTC)[reply]

    RfC workshop


    Thanks for all the feedback! I have created a revised criterion with the areas of vagueness ironed out, incorporating wordings proposed by User:Chaotic Enby and User:SunloungerFrog. I hope to finalize the criterion wording before I launch a formal RfC.


    A12. LLM-generated without human review

    This applies to any article that exhibits one or more of the following signs, which indicate that the article could only plausibly have been generated by a large language model (LLM)[1] and which would have been removed by any reasonable human review:[2]

    • Communication intended for the user: This may include collaborative communication (e.g., "Here is your Wikipedia article on..."), knowledge-cutoff disclaimers (e.g., "Up to my last training update ..."), self-insertion (e.g., "as a large language model"), and phrasal templates (e.g., "Smith was born on [Birth Date].")
    • Implausible non-existent references: This may include external links that are dead on arrival, ISBNs with invalid checksums, and unresolvable DOIs. Since humans can make typos and links may suffer from link rot, a single example should not be considered definitive. Editors should use additional methods to verify whether a reference truly does not exist.
    • Nonsensical citations: This may include citations of incorrect temporality (e.g., a source from 2020 being cited for a 2022 event), DOIs that resolve to completely unrelated content (e.g., a paper on a beetle species being cited for a computer science article), and citations that attribute the wrong author or publication.

    In addition to the clear-cut signs listed above, there are other signs of LLM writing that are more subjective and may also plausibly result from human error or unfamiliarity with Wikipedia's policies and guidelines. While these indicators can be used in conjunction with more clear-cut indicators listed above, they should not, on their own, serve as the sole basis for applying this criterion.

    This criterion only applies to articles that would need to be fundamentally rewritten to remove the issues associated with unreviewed LLM-generated content. If only a small portion of the article exhibits the above indicators, it is preferable to delete the offending portion only.

    References

    1. ^ The technology behind AI chatbots like ChatGPT and Google Gemini
    2. ^ Here, "reasonable human review" means that a human editor has 1) thoroughly read and edited the LLM-generated text and 2) verified that the generated citations exist and verify corresponding content. For example, even a brand new editor would recognize that a user-aimed message like "I hope this helps!" is wholly inappropriate for inclusion if they had read the article carefully. See also Wikipedia:Large language models.

    To notify: WP:NPP, WP:AFC, WP:LLMN, T:CENT, WP:VPP, WT:LLM

    — Preceding unsigned comment added by Ca (talkcontribs) 16:05, 19 July 2025 (UTC)[reply]

    Discussion


    I don't agree with the last section requiring that articles need to be "fundamentally rewritten to remove the issues associated with unreviewed LLM-generated content"; it largely negates the utility of the criterion. If there are strong signs that the edits which introduced content were not reviewed, that should be enough; otherwise it is again shifting the burden to other editors to perform review and fixes on what is raw LLM output. A rough alternate suggestion:

    "This criterion only applies to articles where, according to the above indicators, a supermajority of their content is unreviewed LLM-generated output. If only a small portion of the article indicates it was unreviewed, it is preferable to delete the offending portion only." (struck as redundant and possibly confusing) fifteen thousand two hundred twenty four (talk) 16:46, 19 July 2025 (UTC)[reply]

    I agree that if content shows the fatal signs of unreviewed LLM use listed above then we shouldn't put the onus on human editors to wade through it to see if any of the content is potentially salvageable. If the content is that bad, it's likely more efficient to delete the offending content and rewrite quality content from scratch. So we lose nothing by immediate deletion, and by requiring a larger burden of work prior to nomination we increase the amount of time this bad content is online, potentially being mirrored and contributing to citogenesis. LLM content is already much easier to create and insert than it is to review, and that asymmetry threatens to overwhelm our human review capacity. As one recent example, it took me hours to examine and reverse the damage done by this now-blocked LLM-using editor even after I stopped making any effort to salvage text from them that had LLM indicators. Even though that user wasn't creating articles and therefore wouldn't be touched by this RFC, that situation illustrates the asymmetry of effort between LLM damage and LLM damage control that necessitates this kind of policy action. -- LWG talk 17:21, 19 July 2025 (UTC)[reply]

    I would also like to suggest an indicator for usage of references that, when read, clearly do not support their accompanying text. I've often found model output can contain references to real sources that are broadly relevant to the topic, but which obviously do not support the information given. An article making pervasive use of these, along with other common signs, is a very strong indicator of unreviewed model-generated text. Review requires reading sources after all. fifteen thousand two hundred twenty four (talk) 17:27, 19 July 2025 (UTC)[reply]

    I agree with the idea of the criterion, although I agree with User:fifteen thousand two hundred twenty four that the burden shouldn't be on the editor tagging the article. It's a question of equivalent effort: if little effort was involved in creating (and not reviewing) the article, then little effort should be expected in cleaning it up before tagging it for deletion. Or, in other words, what can be asserted without evidence can also be dismissed without evidence.

    However, I also have an issue with the proposal of only deleting the blatantly unreviewed portions. If the whole article was written at once, and some parts show clear signs of not having been reviewed, there isn't any reason to believe that the rest of the article saw a thorough review. In that case, the most plausible option is that the indicators aren't uniformly distributed, instead of the more convoluted scenario where part of the AI output was well-reviewed and the rest was left completely unreviewed. Chaotic Enby (talk · contribs) 19:06, 19 July 2025 (UTC)[reply]

    "I also have an issue with the proposal of only deleting the blatantly unreviewed portions ... " – Agree with this completely. I attempted to address this with my suggestion that "This criterion only applies to articles where, according to the above indicators, a supermajority of their content is unreviewed LLM-generated output." (I've now struck the second maladapted sentence as redundant and possibly confusing.)
    It deliberately doesn't ask that indicators be thoroughly distributed or have wide coverage, just that they exist and indicate a majority of the article is unreviewed, aka "the most plausible option" you mention. But the clarity is absolutely lacking and I'm not happy with the wording. Hopefully other editors can find better ways to phrase it. fifteen thousand two hundred twenty four (talk) 19:37, 19 July 2025 (UTC)[reply]

    Jimbo Wales' idea on improving AFCH using AI


    See User talk:Jimbo Wales#An AI-related idea, if anyone wants to give feedback. qcne (talk) 11:25, 18 July 2025 (UTC)[reply]

    "Am I so out of touch? No, it's the children who are wrong." Apocheir (talk) 19:07, 18 July 2025 (UTC)[reply]

    New(?) weirdness in {{Help me}} requests


    For those not familiar with it, the {{Help me}} template can be used to request (sort of) real-time help from editors who watch Category:Wikipedians looking for help or monitor the #wikipedia-en-help IRC channel. In the past 24 hours I've found two requests with a new-to-me pattern that includes what look like variable placeholders $2 and $1 at the start and end of the request: Special:Permalink/1301173503, Special:Permalink/1301272334.

    Has anyone else seen this kind of pattern elsewhere? ClaudineChionh (she/her · talk · email · global) 00:56, 19 July 2025 (UTC)[reply]

    Interesting, found one more with an opening "$2", but not a closing "$1" at Special:PermanentLink/1301015012#Help me! 2.
    Here is the search I used which also finds the two you linked. Unsure what would cause this. fifteen thousand two hundred twenty four (talk) 01:44, 19 July 2025 (UTC)[reply]
    Looks like this was an error introduced by @Awesome Aasim in Special:Diff/1300926998 and fixed in Special:Diff/1301447495 – so not an LLM at all. ClaudineChionh (she/her · talk · email · global) 03:36, 20 July 2025 (UTC)[reply]
    I was working on the unblock wizard and on the preloads as fallbacks in case the unblock wizard does not work. If I knew all the links that use the help me preloads I could reinstate my change and update them all to the new format. Alternatively I could create a second preload template with parameters that can be filled in. Aasim (話すはなす) 03:58, 20 July 2025 (UTC)[reply]