Wikipedia talk:WikiProject Classical Greece and Rome

From Wikipedia, the free encyclopedia

Fabricated citations, possibly AI-generated, in content by User:Vineviz

I found several citations by Vineviz (talk · contribs) in Old Latium, both in the article and within talk page discussion, which failed my attempted verification. I suspect they were AI-generated, although they were problematic regardless of how they were created. I raised these concerns at Talk:Old Latium two weeks ago, which Vineviz, who had become inactive, did not reply to. I reverted Vineviz's edits to that article.

I fear that serious issues are in other material created by the same editor, who in March–April edited a lot of content related to Italy, primarily ancient Italy. (Example: Ancient settlements of the Liri Valley, where I am unable to find some sources cited.)

Do you think this assessment is correct? If so, please help clean up as much as possible.

Cross-posted from WikiProject Italy – please leave comments there, Wikipedia talk:WikiProject Italy#Fabricated_citations,_possibly_AI-generated,_in_content_by_User:Vineviz. Adumbrativus (talk) 09:14, 10 May 2025 (UTC)

Recommend posting this to WP:AIC – they'll be the folks you need! — ImaginesTigers (talk) 11:25, 10 May 2025 (UTC)
I PRODed Ancient settlements of the Liri Valley and will nominate it for AfD if it's challenged. I don't know what the AI Cleanup project's policy is on these things, but personally I think it's perverse to expect editors to spend hours fact-checking something bogus that someone generated in a couple of minutes. Psychastes (talk) 15:50, 10 May 2025 (UTC)
In fact, they seem to have done a *remarkable* amount of research with cited sources during approximately 5 days in late March and early April; I'm reverting everything they added in that time under the same assumption. Psychastes (talk) 16:01, 10 May 2025 (UTC)
@UndercoverClassicist: Any thoughts on Ancient settlements of the Liri Valley before it's deleted? — ImaginesTigers (talk) 17:49, 10 May 2025 (UTC)
Warned the user and did a manual rollback of any page where they significantly changed the content (and no one had manually cleaned it up/reverted yet), going back to November 2024, which is where I found the earliest fabricated sources. They seem to have been mostly inactive before then, back to pre-2023. Psychastes (talk) 16:36, 10 May 2025 (UTC)
The article still "smells" a bit AI to me -- it has a few features that ChatGPT etc really like, such as bullet-pointed lists with bolded first words, tables, and fluffy, vaguely promotional language:
  • These projects exemplify current methodological approaches in Mediterranean archaeology and continue to yield new insights into the development of ancient settlements in relation to their environmental and cultural contexts.
  • These settlements collectively demonstrate how frontier regions between different cultural spheres developed during periods of political transformation, making them relevant comparative material for understanding similar processes throughout the ancient Mediterranean world.
  • These settlements, which include important Volscian centers later incorporated into Roman territory, offer archaeologists and historians an exceptional case study of pre-Roman indigenous development, Roman colonization strategies, and the process of cultural integration in ancient Italy
I'm uneasy about this one. On one level, the topic's clearly notable and has plenty written about it (even if it doesn't necessarily have to be a stand-alone article), there's a lot of information in the article, and on a casual scan nothing sticks out as glaringly wrong. On the other hand, the article does have evident deficiencies (particularly layout, as well as the tone issues alluded to above) and, most seriously, uncited material. Given that there's a suspicion of AI use, I can see a case for WP:TNT -- since, as Psychastes notes above, that concern requires (at minimum) the verification of every single thing in a reliable source. I'm not sure which way I'd vote in an AfD, but I don't think I'll object to the PROD either. UndercoverClassicist T·C 19:10, 10 May 2025 (UTC)
Yeah, to be clear I'm mostly concerned about volume here - if it was a start-class length article on a notable topic with a dozen potentially verifiable claims, I'd think it might be worth salvaging, but there's just too much, and deleting the page rather than blanking and redirecting means all the slop isn't sitting there in the page history for someone to rediscover. Looks like the PROD was contested, so I've nominated it for AfD here. I'm happy to withdraw it if people make improvements to the article and there's consensus here that all the problems have been addressed, but in my experience if we let it sit in the expectation that someone *could* fix it in the future it will probably just stay the way it is. Psychastes (talk) 19:35, 10 May 2025 (UTC)
  • I'm alarmed by the dependent proposed deletions of Cereatae Marianae and Praefectura (Roman settlement) that arose out of this nomination. I assume they were proposed for deletion because of their association with Vineviz, but the reasons given, and seconded by Bearian, were that they relied on fictitious sources and (in the former case) "real place, fake facts".
Even a cursory investigation showed that neither article cited any fictitious sources, and that their contents did not deviate from what those or other sources had to say about them. This should never happen. Either article could, of course, be cleaned up or improved; but both appear to be fully verifiable and were in fact verified with a minimum of checking. The sources cited were all real, and all the ones I was able to access (and a couple of others I had available) said what the articles claimed. While I have no way of determining whether AI was used to generate any of the text, I didn't see anything that a human editor couldn't have written.
Please do not propose articles for deletion, claiming that their sources or contents are fictitious, without first attempting to determine whether the sources exist and whether they support the basic facts stated in the article. They may, in some cases, be badly cited, or in the wrong places, some may not be available to review over the internet (but that's not required by Wikipedia), others not cited may be available. But articles should not be deleted, or proposed or nominated for deletion, without at least a basic review of the sources that are available; with short articles like these, that should only have taken a few minutes. P Aculeius (talk) 15:17, 11 May 2025 (UTC)
My default assumption is that anything AI-generated and unreviewed by a human should be removed from Wikipedia. I think you will find a broad consensus for this view across the project, even if you personally disagree. It may only take "a few minutes" per article, but this very quickly adds up into several hours; it is simply not possible to keep up with the rate at which a large language model can generate text. In this case, you reviewed the sources and determined there were no issues, and rightfully deprodded the article. If you or any other editors review any other content that was added by Vineviz and determine that it in fact contains useful information, I certainly won't fight anybody on it. If it is actually reviewed. But we cannot build a process on this. Our time cannot entirely be spent fact-checking and cleaning up computer-generated hallucinations. Psychastes (talk) 15:38, 11 May 2025 (UTC)
But I didn't fact-check or clean up computer-generated hallucinations: these were perfectly good articles, in the sense that there was nothing the least bit suspicious about them, and they were based on obviously real sources. There is no evidence that they were AI-generated. There is some reason to believe that the article you originally brought here might include AI-generated language, and it would indeed take a considerable amount of time to fix it, though as another editor has said, that might be worth doing given that it's a notable topic about which a great deal has been written.
But it's not a good idea to assume that if an editor has ever used AI to generate article text, or somehow copied AI-generated text into an article, that all of the articles that said editor has written or contributed to must be AI-generated or fictitious. At the very least some attempt should be made to determine whether the editor's work is all suspect or just some of it, because under PROD they would have been deleted automatically if nobody had taken the time to check the facts or sources in the next week. The fact that two different editors reviewed two short articles and concluded they were written using fictitious sources that did not support the facts that they contained, when in reality they were both fully verifiable in a matter of minutes, is a cause for concern. P Aculeius (talk) 15:49, 11 May 2025 (UTC)
I didn't review the articles at all; I applied Fruit of the poison tree. I and other editors spot-checked several of this editor's contributions, and identified fictitious sources. This, combined with the sheer volume of content added by Vineviz between March 25th and April 7th, and the fact that they seem to have gone inactive almost as soon as people started questioning their edits, makes it pretty clear what is going on. But this really has nothing to do with WP:CGR and we shouldn't clog this talk page with it anymore. If you want to get consensus for a clearer process on what to do in situations where an editor has introduced a massive amount of computer-generated text without reviewing it, I'd recommend drafting up a proposal and posting it at WP:AICU, given that there doesn't seem to be a clearly defined process yet for this sort of thing, and it will certainly happen again. Psychastes (talk) 16:16, 11 May 2025 (UTC)
I very much appreciate the clean up you're doing and the time you're spending to fix this. I agree with P Aculeius in the sense that both articles contain footnotes with page numbers, etc. It would take seconds—not minutes—to exclude some articles on this basis (likely adding up to minutes, not hours; certainly less than PROD and AFD). Above, you said you would AFD the valley article if PROD failed, so it's not unreasonable for someone to bring it up here to you. Additionally, both articles mentioned by P Aculeius are within scope of this project, although sadly it is almost as dead as the Greco-Roman religions. — ImaginesTigers (talk) 16:54, 11 May 2025 (UTC)
I can see what Psychastes is getting at with the volume of edits. Most of the ones on the first page of results do seem to involve this project, however. And I will say that CGR is far from dead. If we're not creating vast numbers of new articles, it's because the basics have already been well-covered, leaving us to monitor and make incremental improvements. There are certainly areas that are underdeveloped, or which could stand to be completely revised. But while gathering steam for that, we still have a number of editors who keep an eye on things and occasionally add new articles or rehabilitate old ones. At least compared with many other projects, we still have a lot of discussion here, along with debates that occasionally become heated and may lead some of us to comment less out of a desire to avoid unnecessary conflict on unresolvable issues... but we're certainly not inactive! P Aculeius (talk) 17:25, 11 May 2025 (UTC)
Agreed that CGR is not dead or inactive! And for my part, I won't PROD any newly created articles next time if it happens again, I'll just tag them for AI review, given that, for new pages, an immediate cleanup is more permanent and time-sensitive than restoring dubious content that was added to existing pages; I certainly don't want to create a sense of urgency and end up wasting people's time in the other direction, that pages marked for deletion need to be reviewed for useful content that might get erased. Psychastes (talk) 17:37, 11 May 2025 (UTC)
I agree that I was too heavy-handed here and there are other options I'll take in the future, but one minor thing - be careful with just validating the footnotes! I may have contributed to the confusion here by saying "fictitious sources", but another very real possibility is that the source is real and the content isn't. Sometimes the page cited doesn't discuss the topic, but sometimes it does and the LLM garbled the meaning. I've seen Google's "AI Overview" cite sources where it ascribed something to another author listed on the same page, or lost a "not" somewhere, or paraphrased the word "description" to mean "Commentary". Psychastes (talk) 17:46, 11 May 2025 (UTC)
I've tried a little spot-checking of sources; it's a bit frustrating. In Cereatae Marianae, Thomas Ashby, The Classical Topography of the Roman Campagna does not seem to have been published by the Clarendon Press in 1910 and what I have found doesn't support the text. The Pliny and Smith references were wrong until you corrected them. This is not atypical of AI generations; the named sources - or something very like the named sources - often exist but the text has not been generated by examining them; instead, plausible text has been generated and plausible sources associated with it. Two of Praefectura (Roman settlement)'s cites are "Abbott, Frank Frost. Municipal Administration in the Roman Empire. Read Books, 2007, p. 35". I can't find a Read Books edition or a 2007 one. Archive.org has the 1926 Princeton edition but the print page 35 of that doesn't support the text or mention praefectura(e) at all; page 10 does mention the praefectura but pp. 10-11 don't support the description of a prefect as serving an annual term or as "selected from among Roman citizens or officials of a nearby municipium". I do notice that Praefectura (Roman settlement) originally had no citations and a citation-needed tag[1]; the texts were listed as "Further reading", and later attached to parts of the text as refs.[2] It's all rather disquieting. NebY (talk) 17:44, 11 May 2025 (UTC)
I'm not sure we have an established process for dealing with users where suspicion of misuse of LLMs exists, but it would be on the table to mass-AFD articles created by a user with suspicion of copyvio (for the same reason: that keeping them around without a prohibitively laborious check opens up Wikipedia to issues). Given that one of the major problems of LLM use is that it can lead to copyrighted material being placed into articles, I think the same approach is at least justified here. I share NebY's disquiet with the two articles mentioned above. UndercoverClassicist T·C 17:51, 11 May 2025 (UTC)
In this case, I'll retract and support a nuclear approach. I'll admit that I didn't consider the copyright component either. Please accept my apologies, Psychastes! — ImaginesTigers (talk) 17:55, 11 May 2025 (UTC)
As I said before, I couldn't tell which edition of Ashby was being cited, because the work was published in several parts, some of which I located on Archive.org, while others I could not find in any readable format. The ones I located didn't discuss the area in question, as far as I could tell, but since I can't see them all I'm just going to say that we shouldn't assume the source was incorrectly cited because we can't verify it. The general description given in Praefectura matches what Abbott says, and also what I read in the shorter article in Harper's Dictionary of Classical Antiquities, a citation I could have added if I'd taken the time to locate each item in the article that it verifies. As far as prefects being annual, that's something we take for granted with most Roman officials; very few of them were appointed for terms longer than a year.
The pagination in Abbott is wrong if it's citing the original editions, but if I recall it looked as if a recent reprint was being cited, and for that it may be correct. Since recent books aren't always available online, and didn't turn up in a brief search, I looked at the older one to see if it verified what the article said, without intending to change the edition or pages cited. I was focused on determining whether anything in the article was a hoax, AI-generated nonsense, or not supported by the sources—not on cleaning up the references. That can be done in the course of ordinary editing. All of the books cited are actual sources, not fictional ones, as the PROD had suggested; and most or all of the contents of the article seemed to match what I was reading in the sources I located. So the article should certainly have been kept. P Aculeius (talk) 18:32, 11 May 2025 (UTC)
Hoaxes and nonsense are comparatively easy to detect. We're talking about seemingly plausible material that's at best intermittently supported by the seemingly plausible sources provided, and often not - as I described above. We're going to see a lot of this and we have to be ready to deal with it robustly, otherwise Wikipedia will fill up with plausible content that looks RS-based and verifiable, but isn't. NebY (talk) 18:49, 11 May 2025 (UTC)
That's fair, but the two articles I'm talking about appear verifiable. I read the sources I mentioned specifically, as well as checking whether the others were real sources, though I didn't search the recent ones that seemed likely to be hard to access. Everything I saw in those articles looked like it matched what I was reading, but if you want I can go over them more carefully and try to attribute everything separately. That should be easier with Cereatae, which is shorter and doesn't look like it says anything that isn't in the DGRG article it cites. My complaint was that two editors proposed the articles for deletion without having checked any of the sources, or other sources that might have verified or refuted the contents. "Real location, fake facts" was not an accurate description. P Aculeius (talk) 19:00, 11 May 2025 (UTC)
"All of the books cited are actual sources, not fictional ones, as the PROD had suggested; and most or all of the contents of the article seemed to match what I was reading in the sources I located. So the article should certainly have been kept": P Aculeius, I think you're approaching this as a WP:V problem -- which it is -- but it's more importantly a WP:COPYVIO problem. LLMs, by their nature, take in and sometimes reproduce copyrighted material; to establish that there is no issue here, it's not enough simply to find a source that verifies each cited claim; we would also need to verify that the expression was not a close paraphrase of any extant copyrighted source. Even if the articles were entirely written by (up-to-date) LLMs, it wouldn't be a surprise for at least some of the citations to check out. UndercoverClassicist T·C 18:58, 11 May 2025 (UTC)
I think that's backward: we don't assume that the contents of an article are copyright violations unless and until some copyrighted text is detected, or the editor who wrote them clearly has a habit of lifting unattributed text from other sources. I didn't see anything that was obviously borrowed here, but it's easy enough to revise text to ensure there aren't any copyright violations. Most of the sources I relied on when verifying the facts are out of copyright, but the articles still should identify any quotes clearly, or rephrase to avoid quoting them. P Aculeius (talk) 19:04, 11 May 2025 (UTC)
If the "editor" who wrote them is an LLM, then the editor does clearly have a habit of lifting unattributed text -- so we're then on the question of whether it's reasonable to suspect that an LLM might have been used to write the articles. Here, the human editor's previous form is also important evidence. UndercoverClassicist T·C 19:50, 11 May 2025 (UTC)
We don't know that this—or three other articles besides the one this discussion was originally about—were AI generated, though the reasons given for PRODding each of them were "probably AI-generated using fictitious sources". None of them appear to contain fictitious sources, and the only text that looked suspicious to me was a rather opaque description of the archaeological/paleontological remains on one of them. However, I've gone over "Cereatae" thoroughly and fixed the issues I could find, replacing substantially all of the language and re-citing from what I could find.
The only remaining issue is that the Ashby citations are defective—not because I think they don't exist, or say what they're supposed to, but because I don't know which part they're citing. The work was published in several parts in different issues of a periodical, with different and sometimes overlapping pagination. I don't have a clear picture of how many parts there were; three or four were on Internet Archive, a couple on Google Books (but without the ability to preview or see snippet views). If anyone else wants to help find these, I'd be grateful. I'll try to work on the other articles later this week. P Aculeius (talk) 23:15, 11 May 2025 (UTC)

Ancient Greek/s

Ancient Greek and Ancient Greece are separate articles. Ancient Greeks redirects to the latter. I have always used Ancient Greek language without even checking where it goes. It never occurred to me that we would treat "Ancient Greek" as unambiguous for the language. We don't treat "Greek" this way. Thoughts? Srnec (talk) 22:58, 11 May 2025 (UTC)

It seems natural to me; the language is normally called "Ancient Greek", which seems like the most natural title. Even though I can see someone referring to "an ancient Greek", I wouldn't think of searching for (or linking to) "ancient Greek" on the assumption that it would go to an article about ancient Greeks. P Aculeius (talk) 23:22, 11 May 2025 (UTC)
I think it's fine; I would write "It was composed in Ancient Greek" over "It was composed in the Ancient Greek language". Modern Greek doesn't feel like a good comparison – Latin is more appropriate, and I think we'd agree "Latin language" would be a bit odd. — ImaginesTigers (talk) 23:31, 11 May 2025 (UTC)
Greek seems like the more ambiguous case to me, so it makes sense that it's disambiguated; in my experience the alphabet, the modern language, and the ancient language are all WP:NOUNs that are frequently referred to as "Greek" without further qualification. But all of the uses of "Ancient Greek" that I encounter referring to the place, people, culture, etc. are adjectives. Psychastes (talk) 23:50, 11 May 2025 (UTC)

Discussion on the direct citation of ancient Greek and Roman sources

I've opened a discussion at RSN about the direct citation of classical writers in articles, which is a common occurrence in this topic area: Wikipedia:Reliable_sources/Noticeboard#Classical_sources_(Herodotus,_Plutarch_etc). The aim is to develop guidance on when/if direct citation of ancient authors is appropriate. As this seems likely to be of interest to members of this WikiProject, I thought I'd make a post here noting the discussion. Please participate if interested. Thanks. Hemiauchenia (talk) 01:13, 12 May 2025 (UTC)

Good article reassessment for Ancient Greek literature

Ancient Greek literature has been nominated for a good article reassessment. If you are interested in the discussion, please participate by adding your comments to the reassessment page. If concerns are not addressed during the review period, the good article status may be removed from the article. Psychastes (talk) 20:57, 14 May 2025 (UTC)

Good article reassessment for Aristotle

Aristotle has been nominated for a good article reassessment. If you are interested in the discussion, please participate by adding your comments to the reassessment page. If concerns are not addressed during the review period, the good article status may be removed from the article. Psychastes (talk) 16:22, 18 May 2025 (UTC)

Good article reassessment for Tiberius

Tiberius has been nominated for a good article reassessment. If you are interested in the discussion, please participate by adding your comments to the reassessment page. If concerns are not addressed during the review period, the good article status may be removed from the article. Z1720 (talk) 19:01, 19 May 2025 (UTC)