
Wikipedia talk:Writing articles with large language models


Why is this guideline only restricted to "new articles"? Shouldn't this apply to all articles? (and talk pages and so on...)


Under my own reading of this rule, it seems like it only applies to new articles, and that pre-existing articles are somehow allowed to have AI-generated text inserted into them. GarethBaloney (talk) 13:46, 24 November 2025 (UTC)[reply]

I think because it's a badly written sentence and was erroneously promoted to Guideline. qcne (talk) 13:48, 24 November 2025 (UTC)[reply]
Well if people are saying it's a badly written guideline then we should make a new discussion on changing it! GarethBaloney (talk) 14:05, 24 November 2025 (UTC)[reply]
Yes! Let's have all our guidelines be padded out with twelve-thousand word essays defending and justifying them and providing supplementary information such that no-one will ever read them and newbies have no freaking idea what it's actually telling them. Cremastra (talk · contribs) 02:05, 25 November 2025 (UTC)[reply]
The guideline and RFC were probably written minimalistically to increase its chances of passing an RFC, with the intent to flesh it out in follow-up discussions. –Novem Linguae (talk) 21:26, 24 November 2025 (UTC)[reply]
This one. Cremastra (talk · contribs) 01:08, 25 November 2025 (UTC)[reply]

Further amendment proposal #1: Festucalex


Well, habemus guideline. Now, how is it going to be enforced, given the fact that the guideline is donut-shaped? We might as well address the "from scratch" loophole and preempt the thousands of man-hours that are going to be wasted debating it with LLM users. How should we define "from scratch"? In an ideal situation, the guideline would be this:

Current text: Large language models (or LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.

Proposed text: Large language models should not be used to edit Wikipedia.

This will close the loophole. Any improvements are welcome. Festucalextalk 14:05, 24 November 2025 (UTC)[reply]

Strong support. The usage of LLMs to directly edit, add to, or create articles should not be accepted in any way. The high likelihood of poor-quality sourcing inherent to LLMs makes them ill-suited for use on Wikipedia, and genuine human writing and research should be the standard. Stickymatch 02:55, 25 November 2025 (UTC)[reply]
LLM sourcing can be 100% controlled (editor selects sources, uploads them, and explicitly prohibits using anything else). So the poor choice of sources is a human factor, evident in many human-written articles here. Викидим (talk) 05:49, 25 November 2025 (UTC)[reply]
I am not sure if we can amend the proposal after all these !votes have been made, but could you make an exclusion for grammar checkers? Mikeycdiamond (talk) 15:52, 26 November 2025 (UTC)[reply]
These are not !votes. This is a WP:RFCBEFORE discussion. voorts (talk/contributions) 16:33, 26 November 2025 (UTC)[reply]

P.S. I just finished writing an essay against one of the proposed "accepted uses" for LLMs on Wikipedia. I welcome your feedback on the essay's talkpage. User:Festucalex/Don't use LLMs as search engines Festucalextalk 16:54, 24 November 2025 (UTC)[reply]

I support this wholeheartedly. GarethBaloney (talk) 14:07, 24 November 2025 (UTC)[reply]
Support. "From scratch" is way too generous. TheBritinator (talk) 14:25, 24 November 2025 (UTC)[reply]
Any policy or guideline that says "ban all uses of LLMs" is bound to get significant opposition. SuperPianoMan9167 (talk) 14:31, 24 November 2025 (UTC)[reply]
And all policies and guidelines have a built-in loophole anyway. SuperPianoMan9167 (talk) 14:36, 24 November 2025 (UTC)[reply]
The fact that WP:IAR exists doesn't mean that we ought to actively introduce crippling loopholes into guidelines. Imagine if we banned vandalism only on new articles, or only on articles that begin with the letter P. Festucalextalk 14:57, 24 November 2025 (UTC)[reply]
If you look at the RfC you can see a significant number of users who disagree with the assertion that "all LLM use is bad", which is why I have doubts that a proposal to ban LLMs entirely will ever pass. SuperPianoMan9167 (talk) 15:00, 24 November 2025 (UTC)[reply]
It's WP:NOTVOTE and it should never be. As I said before, anyone who wants to open up uses for LLMs on Wikipedia should explain precisely, minutely, down to the atomic level how and why LLMs can be used on Wikipedia and how these uses are legitimate and minimally disruptive as opposed to all other uses. The case against LLMs has been made practically thousands of times, while the pro-LLM case consists of nothing more than handwaving towards vague say-so assertions and AI company marketing buzzwords. Festucalextalk 15:09, 24 November 2025 (UTC)[reply]
WikiProject AI Tools was formed to coordinate legitimate uses of LLMs. SuperPianoMan9167 (talk) 22:31, 24 November 2025 (UTC)[reply]
Also, the rules are principles. The general idea of this guideline is that using LLMs to generate new articles is bad. It is not and should not be a blanket ban on LLMs. LLMs are tools. Like all tools, they have valid use cases but can be misused. Yes, their outputs may be inherently unreliable, but it is incorrect to say they have no use cases. SuperPianoMan9167 (talk) 22:39, 24 November 2025 (UTC)[reply]
Support but with the caveat that I think it's too broad for what this policy has already been approved for. This edit implies any use of LLMs is unacceptable, even if it's not LLM-generated content being included in an article. Given that there's still arguably a carveout for using LLMs to assist with idea generation etc., my counterproposal, if people find it more appealing, can be found at #Further amendment proposal #3: Athanelar. Athanelar (talk) 14:43, 24 November 2025 (UTC)[reply]
I think we ought to actively discourage other non-submission uses, even if we can't detect them. At least we'd be making it clear that the community disapproves. This will only stop the honest ones, but hey, that's something. Festucalextalk 14:55, 24 November 2025 (UTC)[reply]
I agree, that's why my initial statement is support, I just wanted to present a counterproposal in case the majority would prefer something that doesn't widen the scope so much. Athanelar (talk) 14:59, 24 November 2025 (UTC)[reply]
Can you put the counterproposal in a different section to avoid confusion? NicheSports (talk) 15:10, 24 November 2025 (UTC)[reply]
Done. Athanelar (talk) 15:19, 24 November 2025 (UTC)[reply]
Support. We should probably add clarifying language to this (I have some ready I can propose), but definitely agree and think the community is ready to support a complete LLM ban. NicheSports (talk) 15:09, 24 November 2025 (UTC) Now that I understand what is meant by this proposal, I don't support it. I would support a ban on using LLMs to generate article content (per Kowal2701) NicheSports (talk) 00:11, 25 November 2025 (UTC)[reply]
Similar to my comment below, this completely changes the purpose of this guideline (expanding its scope from new articles to all edits) and would require a new RfC. Toadspike [Talk] 15:48, 24 November 2025 (UTC)[reply]
Definitely – I interpreted this as workshopping something that will be brought to another RFC. Is that fine to do here or should we move it to WP:AIC? NicheSports (talk) 15:50, 24 November 2025 (UTC)[reply]
Yes, what we're doing here is the WP:RFCBEFORE that the original proposal never got. There are already 3 wordings on the table: mine, qcne's, and Athanelar's, and I hope this eventually crystallizes (after more refining) into a community-wide RFC. As the closing note pointed out, this issue requires a lot more work and discussion, and a lot of people agreed to Cremastra's proposal because they wanted anything to be instituted to stem the bleeding while the community deliberated on a wider policy. Festucalextalk 16:14, 24 November 2025 (UTC)[reply]
Oppose. AI is a tool. For example, I routinely use AI to generate {{cite journal}} templates from loose text (like the references in other publications) or to check my grammar. This is IMHO no more dangerous than using https://citer.toolforge.org/ for the same purpose (or Grammarly to check the grammar). We should encourage disclosure, not start an unenforceable prohibition. Викидим (talk) 21:29, 24 November 2025 (UTC)[reply]
@Викидим What are your thoughts on my proposal #2, below, which has a specific carve-out for limited LLM use? qcne (talk) 21:31, 24 November 2025 (UTC)[reply]
Does creating the journal template count as generating text for articles? GarethBaloney (talk) 21:44, 24 November 2025 (UTC)[reply]
The sources are certainly part of the text. According to views expressed in the discussion, AI can hallucinate the citation. For the avoidance of doubt, in my opinion – and experience – this is not the case with this use, but then there are many other safe uses of AI – like translation – and all of these IMHO shall be explicitly allowed (yes, I also happen to like m-dashes). Викидим (talk) 22:10, 24 November 2025 (UTC)[reply]
"This is IMHO no more dangerous than using [...]" – I strongly disagree that using the hallucination machine specifically designed to create natural-sounding but not-necessarily-accurate language output is 'no more dangerous' for these purposes than using tools specifically designed for the tasks at hand. Athanelar (talk) 21:54, 24 November 2025 (UTC)[reply]
The AI is not made to manufacture lies any more than a keyboard is. The difference is in performance and intent of the user – these are the ones we might want to address. Blaming tools is IMHO a dead end; the Luddites, ostensibly also fighting for quality, quickly lost their battle. Викидим (talk) 22:13, 24 November 2025 (UTC)[reply]
Are unscrupulous editors not more likely to use something like ChatGPT to try and sound professional even when they aren't? Besides, Grammarly is not the same as asking an LLM to generate a Wikipedia article, complete with possibly fake sources. GarethBaloney (talk) 22:59, 24 November 2025 (UTC)[reply]
(1) "try and sound professional even when they aren't" – We are (almost) all amateurs here, so a tool that makes non-professionals sound better is not necessarily bad. (2) The proposal reads "should not be used to edit Wikipedia", leaving no exceptions for grammar checking. Викидим (talk) 23:23, 24 November 2025 (UTC)[reply]
Grammar checking can be done (and has been done for decades) using non-LLM artificial intelligence models and programs. Festucalextalk 23:35, 24 November 2025 (UTC)[reply]
I was going to point this out, haha. There's been automatic grammar checking and spellcheck since what- Word 97? No LLM required. Stickymatch 02:58, 25 November 2025 (UTC)[reply]
All modern translation and grammar checking tools use AI, as it produces superior results. Google for obvious reasons was heavily invested into both for almost 20 years. According to my source, they at first were trying to go the non-AI way (studying and parsing individual grammars, etc.) only to discover that direct mapping between texts does a better job at a lower cost. Everyone else of any importance followed their approach many years ago. It was just not a generic AI that we know now, but an AI nonetheless. Some detail can be found, for example, on p. 19 of the 2008 thesis [1] (there should be better written sources, naturally, but the fact is very well known). Викидим (talk) 06:03, 25 November 2025 (UTC)[reply]
Strong support: removes all ambiguity. Z E T AC 21:34, 24 November 2025 (UTC)[reply]
Oppose, people often use stuff like Grammarly. The ban needs to be on generating content Kowal2701 (talk) 21:38, 24 November 2025 (UTC)[reply]
Grammarly is not an LLM. Festucalextalk 23:34, 24 November 2025 (UTC)[reply]
It's powered by LLMs: In April 2023, Grammarly launched a product using generative AI built on the GPT-3 large language models. (from the article) SuperPianoMan9167 (talk) 23:35, 24 November 2025 (UTC)[reply]
Generative AI tools like Grammarly are powered by a large language model, or LLM - from the Grammarly website [2] GreenLipstickLesbian💌🧸 23:37, 24 November 2025 (UTC)[reply]
Then users can use a grammar checker other than Grammarly. Festucalextalk 23:40, 24 November 2025 (UTC)[reply]
Wow. voorts (talk/contributions) 23:45, 24 November 2025 (UTC)[reply]
I think what users on both sides of this ideological divide are running up against is a common thing that happens whenever there is such a divide between two groups; both groups assume that members of the other group are operating on the same fundamental value system that they are, and that their arguments are built from that same value system.
I.e., the 'less restrictive' party here (voorts, qcne et al) is beginning from the core value that 'the reason LLMs are problematic is that their output is generally not compatible with Wikipedia's standards,' and the argument that stems from that is 'any LLM policy we make should be designed around bringing the result of LLM usage in line with Wikipedia's standards, whether that be directly LLM-generated text, or simply users utilising LLMs in their creative process.'
The 'more restrictive' party here (myself, Festucalex et al) is beginning from the core value that 'LLMs and their output are inherently undesirable and detrimental (for some of us to the internet as a whole, for others perhaps specifically only to Wikipedia)' and the argument that stems from that is 'any LLM policy we make should be designed around minimising the influence of LLMs on the content of Wikipedia.'
That's why Festucalex pivoted here and said people should use something other than Grammarly. We simply believe that it's imperative that we purge LLM output from Wikipedia, regardless of whether it's reviewed or policy compliant or anything else. It's also important to keep in mind that NEWLLM as it stands is a product of the latter ideology, not the former, and I think that's why it appears to be so flawed to people like qcne; because it's solving a completely different problem than the one they're trying to solve. Athanelar (talk) 01:03, 25 November 2025 (UTC)[reply]
I understand your views. What I don't see is evidence. voorts (talk/contributions) 01:11, 25 November 2025 (UTC)[reply]
Exactly. I made an identical point about this fundamental divide in the RfC. (I have discovered I am pivoting more towards the "less restrictive" side in my comments here.) SuperPianoMan9167 (talk) 01:59, 25 November 2025 (UTC)[reply]
Yes, I think people understand the divide is between this idea of fundamentalism (the intrinsic nature of LLMs is that they are bad) and those who don't subscribe to it. But what many of us who oppose this fundamentalism think is that rather than being based on evidence (voorts), it's an article of faith. Katzrockso (talk) 02:56, 25 November 2025 (UTC)[reply]
Not workable – if somebody comes up to me and says "Hey, you've made a mistake in Hanako (elephant)" or shows up on BLP saying "You have my birthdate wrong", then I don't care if they use a LLM to write their post, and I don't care if they use an LLM to translate it from their native language. I'm not even sure I care if they use the LLM to make the edit/explain themselves in the edit summary (but I'd rather they disclose it, for obvious reasons), assuming they do it right.
Ultimately, somebody who repeatedly introduces hoax material/fictitious references to articles should be blocked quickly, whether they're using AI or not. Somebody who repeatedly introduces spammy text should be blocked, whether they have a COI or not. Somebody who repeatedly introduces unsourced negative BLP information should be blocked, whether or not they're a vandal/have a COI. Somebody who repeatedly inserts copyright violations should be blocked, whether they're acting in good faith or not. The LLM is a red herring – once we've established that the content somebody writes is seriously flawed in a way that's not just accidental, we need to block the contributor. If they say "but it's not my fault, ChatGPT told me to" then unblocking admins can take that into consideration & we can tban that editor from using automated or semi-automated tools as an unblock condition. GreenLipstickLesbian💌🧸 23:00, 24 November 2025 (UTC)[reply]
+1 This whole guideline is everyone just sticking their heads in the sand and hoping LLM usage will go away. We should be thinking about how LLMs can be used well, not outright banning their use. voorts (talk/contributions) 23:08, 24 November 2025 (UTC)[reply]
It's also yet another example of why PAGmaking on the fly and without advanced deliberation is a terrible idea. voorts (talk/contributions) 23:10, 24 November 2025 (UTC)[reply]
There are no legitimate uses for LLMs, just like there are no legitimate uses for chemical weapons. They're both technically a tool, and anyone can argue that sarin gas can technically be used against rodents, but is it really worth the risk of having it around the kitchen? Festucalextalk 23:46, 24 November 2025 (UTC)[reply]
Are you seriously comparing LLMs to chemical weapons? voorts (talk/contributions) 23:48, 24 November 2025 (UTC)[reply]
Yep. Festucalextalk 23:49, 24 November 2025 (UTC)[reply]
65k bytes to get to Godwin's Law, nice! GreenLipstickLesbian💌🧸 00:03, 25 November 2025 (UTC)[reply]
Festucalex please lol. Also, idk if this is written down anywhere, there's probably an essay, but the fastest way to nuke support for a plausible idea here is to start saying stuff like "X is like sarin gas" NicheSports (talk) 00:06, 25 November 2025 (UTC)[reply]
I think the analogy I'm making is clear: it's a technology whose risks override any potential benefits, at least in this context. Forget sarin gas, let's say it's like a pogo stick in a porcelain museum. Festucalextalk 00:09, 25 November 2025 (UTC)[reply]
"There are no legitimate uses for LLMs" – What about this, and this, and this, and this, and this, and this, and this, and...
You get the point. SuperPianoMan9167 (talk) 23:53, 24 November 2025 (UTC)[reply]
There are no legitimate uses of LLMs on Wikipedia. I have said it before and I will say it again. Even if it is impossible to stop all LLM usage, guidelines like this one can serve as a statement of principle. Yours, &c. RGloucester 00:00, 25 November 2025 (UTC)[reply]
So everyone in WikiProject AI Tools is editing in bad faith? SuperPianoMan9167 (talk) 00:02, 25 November 2025 (UTC)[reply]
They're using bad tools in good faith because we don't have a comprehensive guideline yet. Festucalextalk 00:04, 25 November 2025 (UTC)[reply]
Why can't LLMs ever be legitimately used on Wikipedia? voorts (talk/contributions) 00:06, 25 November 2025 (UTC)[reply]
What is the philosophical mission of Wikipedia? WP:ABOUT begins with the Jimbo quote: "Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing."
LLMs don't produce human knowledge. They produce realistic-sounding human language, because that's what they're designed to do, it's all they've ever been designed to do, it's all they can ever be designed to do – it's literally in their fundamental structure. Not only that, but the output they produce is explicitly biased by their programming and their training data, which are both determined by a private company with no transparency or oversight.
Would you be content if the entirety of Wikipedia's article content were created and maintained by a single editor? Let's assume that single editor is flawless in their work; all of their work is rigorous and meets the standards set by the community (who are still active in a non-article capacity), it's perfectly sourced etc; it's just that it's all coming from a single individual.
What about 90%? 80%? 50%? What percentage of the encyclopedia could be written and managed by a single individual before it would compromise the collaborative nature of Wikipedia?
Thesis 1: The output of an LLM is, effectively, the work of a single individual. Obviously it's more complex than that, but LLM output all has the same tone because it's all the product of the same algorithms from the same privatised training data.
Thesis 2: Given the opportunity, LLM output will comprise an increasingly large percentage of Wikipedia, because it is far faster to copyedit, rewrite and create with LLMs than it is to do so manually. This will only increase the more advanced LLMs get, because their output will require less and less human oversight to comply with Wikipedia's standards.
The conclusion, then, is how much of Wikipedia's total content you're willing to accept being authored by what is essentially a single individual with inscrutable biases and motivations. There must be some cutoff in your mind; and our contention is that if you allow them to get their foot in the door, then the result is going to end up going beyond whatever percentage cutoff you've decided as acceptable. Athanelar (talk) 02:48, 25 November 2025 (UTC)[reply]
"The output of an LLM is, effectively, the work of a single individual. Obviously it's more complex than that" is putting it lightly. Notwithstanding the fact that more than one LLM exists, editors who opposite anti-LLM fundamentalism here have consistently advocated for the necessity of human review and editing when evaluating LLM output. Katzrockso (talk) 02:58, 25 November 2025 (UTC)[reply]
"editors who oppose anti-LLM fundamentalism here have consistently advocated for the necessity of human review and editing when evaluating LLM output."
Well, okay, take my initial example again, then. Let's say John Wikipedia is still producing 50 or 80 or 100% or whatever of Wikipedia's output, but it's first being checked by somebody else to make sure it meets standards. Would it now be acceptable that John Wikipedia is the sole author of the majority (or a plurality or simply a large percentage) of Wikipedia's content, simply because his work has been double-checked? Athanelar (talk) 03:09, 25 November 2025 (UTC)[reply]
Yes, if John Wikipedia's contributions all accurately represent the sources as evaluated by other editors and meet our content standards, why would that be a problem? Katzrockso (talk) 03:43, 25 November 2025 (UTC)[reply]
Well, that's just one of those fundamental value differences we'll never overcome, then. I don't think John Wikipedia should be the primary author of content on Wikipedia because that would undermine the point of Wikipedia being a communal project, and for that same reason I don't think we should allow AI-generated content to steadily overtake Wikipedia either, whether or not it's been reviewed or verified or what have you. Athanelar (talk) 03:47, 25 November 2025 (UTC)[reply]
This happens all the time at smaller Wikipedias. There just aren't enough people who speak some languages + can afford to spend hours/days/years editing + actually want to do this for fun to have "a communal project" the way that you're thinking of it. WhatamIdoing (talk) 06:09, 27 November 2025 (UTC)[reply]
What about uses of LLMs that aren't generating new content (which is what most of the tools at WikiProject AI Tools are about)? SuperPianoMan9167 (talk) 03:03, 25 November 2025 (UTC)[reply]
I don't have any issue with that, because it's functionally impossible to identify and police. That's why my proposal is worded differently to Festucalex's, because I think it's only sensible and possible to prohibit the inclusion of AI-generated text, not the use of AI in one's editing process at all. Athanelar (talk) 03:11, 25 November 2025 (UTC)[reply]
I asked why they can't ever be used. I have several FAs and GAs, but I'm terrible at spelling. If, as seems to be the direction the world is heading, most browsers replaced their original spellcheckers with LLM-powered ones, are you suggesting I'd need to install an obscure browser created by anti-AI people to avoid running afoul of this proposed dogmatism? voorts (talk/contributions) 13:38, 25 November 2025 (UTC)[reply]
No, my proposal is to ban adding AI-generated content to Wikipedia, not to ban people using AI as part of their human editing workflow, that would be unenforceable. Athanelar (talk) 14:13, 25 November 2025 (UTC)[reply]
None of these are legitimate, and I hope that our new guideline puts an end to them before they become standard practice. No use designing and marketing kitchen canisters for sarin gas. Festucalextalk 00:02, 25 November 2025 (UTC)[reply]
This reads more like a moral panic than a logically & evidentially supported proposal Katzrockso (talk) 00:52, 25 November 2025 (UTC)[reply]
It's not a moral issue. LLMs undermine the whole foundation of this project. They were developed by companies that are in direct competition with Wikipedia. These companies have used our content with the aim of monetarising it through LLM chatbots, and now plot to replace Wikipedia altogether, à la Grokipedia. Promoting LLM use will rot the project from within, and ultimately result in its collapse. Yours, &c. RGloucester 06:12, 25 November 2025 (UTC)[reply]
Slippery slope Katzrockso (talk) 14:09, 25 November 2025 (UTC)[reply]
Yes, it is a 'slippery slope' argument, if anything, a better term is 'death by a thousand cuts'. It is a common misconception that a slippery slope argument is an inherent fallacy. I find it very interesting that some editors here prefer to place emphasis on the quality of the content produced, rather than on the actual mission of the project. Let us take this kind of argument to its logical conclusion. If some form of LLM were to advance, and were able to produce content of equivalent quality to the best Wikipedia editors, would we wind up the project, our mission complete? I'd like to hope that the answer would be no, because Wikipedia is meant to be a free encyclopaedia that any human can edit.
When one outsources some function to these 'tools', whether it be spellchecking or article writing, it will inevitably result in the decline of one's own copyediting and writing skills. As our editors lose the skills they have gained by working on this encyclopaedia over these past two decades, they will become more and more reliant on the LLMs. What happens then, when the corporations that own these LLMs decide to cease providing their 'tools' to the masses gratis? Editors, with their own skills weakened, will become helpless. Perhaps only those with the ability to pay to access LLMs will be able to produce content that meets new quality standards that have shifted to align with LLM output. Wikipedia's quality will decline as the pool of skilled editors dwindles, and our audience will shift toward alternatives, like the LLMs themselves. The whole mission of the project will be called into question, as Wikipedia loses its competitive advantage in the marketplace of knowledge. Yours, &c. RGloucester 00:20, 26 November 2025 (UTC)[reply]
But we shouldn't sacrifice newcomers in the name of preserving the project by blocking them for using LLMs right after they join when they have no clue why or how LLMs are unreliable. SuperPianoMan9167 (talk) 00:25, 26 November 2025 (UTC)[reply]
My hope for this guideline is that it will prevent that kind of blocking, since good faith newcomers who show up using LLMs will get reverted and linked to this page, instead of the previous situation where they get asked politely to stop, then when they don't, they eventually get dragged to ANI and TBanned from using LLMs, which is frustrating and much more difficult to understand than a simple page that says "Wikipedia doesn't accept LLM-generated articles because that's one of the things that makes Wikipedia different from Grokipedia". -- LWG talk 00:57, 26 November 2025 (UTC)[reply]
Assuming we adopt this proposal, and assuming that good faith newcomers abide, there will still be editors who get asked politely to stop (i.e., they will be warned), then when they don't, they eventually [will] get dragged to ANI and blocked, not TBANNED (by my count, only 3 editors are topic banned from LLM use per Wikipedia:Editing restrictions). I've blocked/revoked TPA of many accounts for repeated LLM use and I can assure you that almost none of those editors knew or cared about what any of our guidelines said. In no universe would a no-LLM rule result in any change to the process of having to drag people to ANI to get them blocked. voorts (talk/contributions) 01:11, 26 November 2025 (UTC)[reply]
^this.
To use a real example, every single time anybody makes a post, they agree not to copy-paste content from other sites and to attribute it if they copy from within Wikipedia, and there are sooooooooooooooooooooooo many copyright blocks given out every year. Most of these people unambiguously acted in good faith. And each and every one got dragged to a noticeboard, often multiple times, before they were blocked. I'm sorry, but this won't be any different - and Wikipedia naturally draws the type of people who like to ask "why", so we're still going to have to point them to WP:LLM; they won't be swayed by a simple page saying "no, because I said so". GreenLipstickLesbian💌🧸 08:05, 26 November 2025 (UTC)[reply]
SuperPianoMan, I agree with you, and I also agree with LWG. The problem until now was that Wikipedia has failed to clearly explain its stance on LLMs, blocking myriad editors without any obvious policy or guideline-based rationale. This ad hoc form of justice has gone on too long, and is unfair to newcomers, and is one reason why I supported the adoption of this guideline, despite its shortcomings. The community needs to clearly explain Wikipedia's purpose, and why LLMs are not suited for use on Wikipedia, to both new editors and our readership. Wikipedia should aim to promote the value of a project that is free, that anyone can edit, and that is made by independent men and women from right across the world. If anything, our position as a human encyclopaedia should be a merit in a competitive information marketplace. Yours, &c. RGloucester 01:11, 26 November 2025 (UTC)[reply]
"they eventually get dragged to ANI and TBanned from using LLMs, which is frustrating and much more difficult to understand than a simple page"
Yes exactly. People were regularly being sanctioned for breaking a rule that they could not have known about, because no such rule existed. Even if not a single newbie ends up reading this guideline, its existence is still beneficial, because it means we are no longer punishing people for breaking unwritten rules. Gnomingstuff (talk) 09:50, 26 November 2025 (UTC)[reply]
I don't think it's ever been practice to sanction somebody for just AI use, though? It's always been fictitious references, violating mass create, copyright issues, WP:V failures, UPE/COI, NPOV violations, etc. I'm not saying no admin has ever blocked a user for only using LLMs (admins do act outside of policy, sometimes!), though I'd be interested to see any examples. Thanks, GreenLipstickLesbian💌🧸 10:23, 26 November 2025 (UTC)[reply]
Usually it's more than just AI use if it ends up at ANI but I doubt the distinction is really getting through to people, and a lot of !votes to block, CBAN, TBAN, etc. are made with the rationale of "AI has no place on Wikipedia ever." Sometimes the bulk of the thing is that (example: Wikipedia:Administrators'_noticeboard/IncidentArchive1185#User:_BishalNepal323)
There's also the uw-ai to uw-ai4 series of templates, which implies a four-strikes rule; I don't use them but others do. Gnomingstuff (talk) 10:51, 26 November 2025 (UTC)[reply]
In your example, Ivanvector blocked for disruptive editing, not solely for AI use. voorts (talk/contributions) 14:08, 26 November 2025 (UTC)[reply]
What are we arguing about here? Obviously people are getting blocked for LLM misuse, not LLM use. And I agree with Gnomingstuff and LWG etc. I believe in AGF and have dozens of examples of editors who have stopped using LLMs after I alert them to the difficulty of using them in compliance with content policies. NicheSports (talk) 14:22, 26 November 2025 (UTC)[reply]
We're arguing about the assertion that we need a no AI rule because we've been blocking people solely for AI use without any attendant disruption. That is not true and therefore not a good reason to impose a no AI rule. voorts (talk/contributions) 14:23, 26 November 2025 (UTC)[reply]
To be more clear, when I said "the bulk of the thing" I meant the tenor of the responses in an average ANI posting. Several regulars at ANI generally seem to be under the impression that we do not allow AI, so most !votes are going to have largely unchallenged comments like "CIR block now. This LLM shit needs to be stopped by any means necessary." or "LLM use should warrant an immediate block, only lifted when a user can demonstrate a clear understanding that they can't use LLMs in any situation." Or if someone gets hit with a uw-ai2, they are told "Please refrain from making edits generated using a large language model (an 'AI chatbot' or other application using such technology) to Wikipedia pages." Gnomingstuff (talk) 00:39, 27 November 2025 (UTC)[reply]
People say a lot of incorrect things at ANI. We don't usually amend the PAGs to accommodate those people. voorts (talk/contributions) 01:05, 27 November 2025 (UTC)[reply]
On the contrary, that's exactly what we do. PAGs are meant to reflect the actual practice of editors. The process of updating old PAGs or creating new ones to reflect changes in editorial practice is the foundation that has built all of our policies and guidelines. Yours, &c. RGloucester 03:10, 27 November 2025 (UTC)[reply]
We're not tho. Nobody as far as I can tell has ever been blocked solely for using AI/LLMs. This is a red herring. voorts (talk/contributions) 13:54, 26 November 2025 (UTC)[reply]
If tomorrow, a LLM came out that could produce a FA-quality article on a given topic in 2 minutes, would you still suggest that LLMs have no place on Wikipedia?
Histrionic comparisons about scenarios that won't happen go both ways. Katzrockso (talk) 07:59, 27 November 2025 (UTC)[reply]
Yes, I would, because using such a technology to produce articles is contrary to the purpose and mission of Wikipedia. Wikipedia's defining principles are that it is free, that any human can edit it, and that its content is produced collaboratively by divers volunteers. Others and I have already explained why machine-produced content contravenes these principles. I care less whether an article is 'FA-quality', whatever that means, and more about how it was made. Yours, &c. RGloucester 08:46, 27 November 2025 (UTC)[reply]
"Wikipedia's defining principles are that it is free, that any human can edit it, and that its content is produced collaboratively by divers volunteers. Others and I have already explained why machine-produced content contravenes these principles." I am certainly not a fan of LLMs for generating content. However, I don't see how a human editor who chooses to use an LLM to generate some content, checks the content to make sure that it accurately reflects its sources and is otherwise PAG-compliant, and finally adds the sourced content to an article contravenes these principles. Wikipedia is no less free, any human can still edit it, and divers volunteers are still able to collaboratively work on the article. Even though that particular content happened to have been produced by a machine. Cheers, SunloungerFrog (talk) 09:12, 27 November 2025 (UTC)[reply]
Yes, in this hypothetical thought experiment. We don't live in a thought experiment. LLM output is getting better in that it is less obviously bad, but the nature of this kind of text generation means it is not suited well and may never be suited well to producing verifiable nonfiction articles. Gnomingstuff (talk) 14:18, 27 November 2025 (UTC)[reply]
Why? What's wrong with an LLM spellchecker other than that you don't like it? voorts (talk/contributions) 13:47, 25 November 2025 (UTC)[reply]
+1 Even the autocorrect on my iPhone uses a transformer, which is the same kind of neural network as that which powers LLMs. The major difference is in size (they're called large language models for a reason). SuperPianoMan9167 (talk) 14:19, 25 November 2025 (UTC)[reply]
  • Support. This guideline is a good start and I am glad it was approved, but it should be expanded. LLMs are not an acceptable way to edit the wiki as they cause lots of issues like hallucinations. Changing to oppose as I just realised this goes beyond creating content and would include things like Grammarly. GothicGolem29 (Talk) 18:35, 28 November 2025 (UTC)[reply]
    @GothicGolem29: The Grammarly thing isn't necessarily included. As long as it doesn't generate its own output, it's not really a large language model, even if it claims to use one. The important thing here is that de novo output doesn't make it to the encyclopedia. Festucalextalk 23:46, 3 December 2025 (UTC)[reply]

Further amendment proposal #2: qcne


Why the current version of the guideline is bad: A single sentence that clunkily prohibits all LLM use on new articles. How do we define that? Does "from scratch" cover the lead section only? the whole article? a stub? a list? Dunno! It doesn't bother to say! This is banning a method without actually defining where it begins or ends. Since no one can reliably tell if an LLM was used, enforcement would be impossible. LLM detection is unreliable, and we already have CSD G15 to handle unreviewed LLM slop.

I wrote this up a while ago and am now posting it for community consensus. I did just replace the Guideline with my version, but was sadly reverted.

Version 1


Version 2


Version 3


I still believe this Guideline is grossly short and needs to be expanded a little bit, but am also taking into account the feedback given.

Would my much shorter Version 3 guideline here be at all acceptable to the more hard-line anti-LLM editors? I have:

  • made it more concise.
  • removed the limited use carve-out, with the idea that experienced editors can be trusted to use LLMs, and this Guideline is more focused towards new editors.

Hidden ping to users who have participated. qcne (talk) 22:37, 3 December 2025 (UTC)[reply]

I predict that hard-line anti-LLM editors will still want the word "unreviewed" removed from "do not add unreviewed LLM-generated content to new or existing articles". SuperPianoMan9167 (talk) 22:40, 3 December 2025 (UTC)[reply]
Yes, potentially, but I would like to have some sort of compromise! qcne (talk) 22:41, 3 December 2025 (UTC)[reply]
Agreed. SuperPianoMan9167 (talk) 22:42, 3 December 2025 (UTC)[reply]
The compromise on the reviewed language is to only allow it for experienced editors with an llm-user right. A few editors have suggested this. There is a vast amount of evidence (AfC, NPP, 1346 (hist · log), any WikiEd class, etc.), that inexperienced editors essentially never sufficiently review LLM-generated prose or citations. NicheSports (talk) 23:31, 3 December 2025 (UTC)[reply]
I think that'd have to be a separate RfC, would support Kowal2701 (talk) 23:32, 3 December 2025 (UTC)[reply]
Given my experience with CCIs of autopatrolled and NPR editors, and even the odd admin, would you be offended if I scream "NO!" really loudly at the idea of tying LLM use to a user right?
Sorry, but I've had too much trouble with older users being grandfathered into the autopatrolled system to be comfortable with the idea of giving somebody the right to say "Oh, but my use of ChatGPT is fine - I have autopatrolled!" GreenLipstickLesbian💌🧸 23:48, 3 December 2025 (UTC)[reply]
Valid point. There's been at least one editor who had their autopatrolled right revoked for creating unreviewed LLM-generated articles. SuperPianoMan9167 (talk) 23:53, 3 December 2025 (UTC)[reply]
far from being offended, I actually laughed 😅 but I would still much much rather deal with that problem than continuing the fantasy that inexperienced editors should be allowed to use these tools with review that is never performed! NicheSports (talk) 23:56, 3 December 2025 (UTC)[reply]
Disagree with adding an LLM-user right, but either way I think that is best workshopped elsewhere. fifteen thousand two hundred twenty four (talk) 23:55, 3 December 2025 (UTC)[reply]
The issue with "unreviewed" is that it is at risk of being wikilawyered, even a bad review would be kosher. Otherwise it's great. I worry that by having a nuanced approach, it'd struggle to communicate a clear message, especially since people dispositioned to use LLMs likely already have CIR issues that LLM-use is compensating for. I'd remove "unreviewed", and especially where the content is unverifiable, fabricated, or otherwise non-compliant with existing Wikipedia policies can support people's IAR "not what the policy was intended for" (if they so want) in the fringe cases LLM-use is not practically problematic, subject to consensus. Kowal2701 (talk) 23:08, 3 December 2025 (UTC)[reply]
"insufficiently reviewed" has more wiggle room while still allowing for the edge cases; once any problem is identified, it puts the responsibility on the person adding the content rather than other editors. GreenLipstickLesbian💌🧸 23:43, 3 December 2025 (UTC)[reply]
That'd be good too Kowal2701 (talk) 01:30, 4 December 2025 (UTC)[reply]
Honestly, "unreviewed" has been my main point of disagreement in every proposal that includes it -- thank you for articulating it. There are two fundamental problems:
First, if it's hard to know whether someone used AI, it's even harder to know how much they reviewed it.
Second, and more problematic: Properly "reviewing" LLM content means that every single word, fact, and claim needs to be verified against every single source. You essentially need to reconstruct the writing process in reverse, after the fact. But most good-faith editors who use AI seem to think "reviewing" means one of two things:
  • Quickly skimming it and going "yeah that looks OK."
  • Using AI to "review" the text.
This results, and will continue to result, in the following situation: Editor 1 finds some bad AI text. Editor 1 says that the AI text wasn't reviewed, and they aren't wrong. Editor 2 says that they did review the AI text, and they aren't lying. Meanwhile, the text remains bad. Gnomingstuff (talk) 01:31, 5 December 2025 (UTC)[reply]
Enthusiastic support. I think this is the best we're going to get for a compromise option between the two LLM ideologies here.
You don't leave any room for 'acceptable' carve-outs, you've included the very direct "Editors should not use an LLM to add content to Wikipedia, whether creating a new article or editing an existing one." which, although it uses 'should' and not 'must,' serves to discourage LLM use in general, which is very desirable for me. You've preserved the spirit of NEWLLM by categorically saying "Do not" use an LLM to author an article or major expansion, you've codified LLMCOMM by saying "Do not" use LLMs for discussions.
My only suggested change would be to drop the "Why LLM content is problematic" section. We already have that covered at WP:LLM, there's no need to bloat this guideline by including it here. Other than that, I think this is exactly the kind of AI guideline we should have right now. Athanelar (talk) 23:11, 3 December 2025 (UTC)[reply]
If we do that we should probably make WP:LLM an information page. SuperPianoMan9167 (talk) 00:13, 4 December 2025 (UTC)[reply]
I think that's totally fine. We can link to it from qcne's proposal (and even promote it to supplement if necessary). It's better than adding unnecessary bloat to the guideline. The main target for this guideline, after all, is going to be people who are already using AI for something and need to be told to stop, who probably aren't going to be interested in the finer points of why LLM use is problematic. If they want to do the further reading, they can. Athanelar (talk) 03:10, 4 December 2025 (UTC)[reply]
I did it. SuperPianoMan9167 (talk) 03:16, 4 December 2025 (UTC)[reply]
Awesome, thank you @SuperPianoMan9167. qcne (talk) 11:17, 4 December 2025 (UTC)[reply]
I was reverted. I did say people could do that when I made the change. SuperPianoMan9167 (talk) 16:46, 4 December 2025 (UTC)[reply]
I appreciate your work here. I do think what you have makes sense and also is realistic in how editors work. As for "unreviewed," could a footnote work to explain what "reviewed" means? - Enos733 (talk) 23:14, 3 December 2025 (UTC)[reply]
I'd like that. FTR I'd still support this regardless as it's a massive improvement Kowal2701 (talk) 23:22, 3 December 2025 (UTC)[reply]
  • Your ping missed me, but I really like the version 3 proposal. I agree with GreenLipstickLesbian that "insufficiently reviewed" would be better verbiage, but it's not a blocker. This would have my support as-is. Adding raw or lightly edited LLM output degrades the quality of the encyclopedia, and frequently wastes the time of other editors who must then cleanup after it. This proposed guideline would explicitly prohibit such nonconstructive model use in a clear manner, and would serve as a useful tool for addressing and preventing instances of misuse. fifteen thousand two hundred twenty four (talk) 00:11, 4 December 2025 (UTC)[reply]
Support I like it. Since that's not an argument, I also think this is finally a version Randy in Boise can understand and follow. ~ Argenti Aertheri(Chat?) 01:51, 4 December 2025 (UTC)[reply]
  • Serious concern: isn't this proposal contradictory? How can both of these statements be in the same guideline?
  1. "Do not use an LLM as the primary author of a new article or a major expansion of an existing article, even if you plan to edit the output later." (Emphasis my own)
  2. "Editors should not... Paste raw or lightly edited LLM output into existing articles as new or expanded prose." #2 strongly implies it is fine to add reviewed LLM content. But this directly contradicts #1. NicheSports (talk) 02:06, 4 December 2025 (UTC)[reply]
    These do not read as contradictory to me. Nowhere in #1 does it prohibit LLM use.
    "even if you plan to edit the output later" means editors cannot immediately add LLM output to the project with an excuse of "I'll fix it later"; they must fix it first before it can be added at all. fifteen thousand two hundred twenty four (talk) 02:26, 4 December 2025 (UTC)[reply]
    I'm not sure about that interpretation... what about the first part of that sentence: "Do not use an LLM as the primary author..."? Still pretty contradictory. Either you can use an LLM to generate a bunch of text and then edit it, or you can't. This guideline, as written, plays both sides. NicheSports (talk) 02:47, 4 December 2025 (UTC)[reply]
    I don't follow. #1 applies to edits which create new articles or are major expansions, situations where majority-LLM authorship would be especially undesirable, and so that is explicitly disallowed. #2 applies to editing in general, where raw or lightly-edited LLM content is disallowed. Maybe you could pose a hypothetical editing scenario where you believe a contradiction would occur, and that would help me understand your point better. fifteen thousand two hundred twenty four (talk) 03:15, 4 December 2025 (UTC)[reply]
    Oh. With this interpretation, I would support! But if I don't understand this I guarantee you a lot of the non-native English speakers who are using LLMs would miss the distinction. Can we clarify the wording? NicheSports (talk) 03:19, 4 December 2025 (UTC)[reply]
    It reads well to me, so I'm not sure what changes could be made, @Qcne may have some suggestions? fifteen thousand two hundred twenty four (talk) 03:30, 4 December 2025 (UTC)[reply]
    I mean the header needs to be changed but it could just be changed to "Rules for using LLMs to assist with article content" or something neutral. We should specify that #1 above are rules for "major content additions" while #2 is rules for "minor content additions". NicheSports (talk) 03:31, 4 December 2025 (UTC)[reply]
    I do prefer the current "Do not use an LLM to add unreviewed content" header; it communicates up-front what the most basic requirement is before providing more detail below.
    #1 does already specify that it concerns new articles or a major expansions, and #2 already applies to all editing, narrowing its scope would introduce another point of argumentation (define "minor" vs "major"). The grammatical clarity could maybe be improved, but right now it's in good enough condition for adoption, and as said prior, I'm wary of bikeshedding. fifteen thousand two hundred twenty four (talk) 03:51, 4 December 2025 (UTC)[reply]
    I also think we need to be wary of any headline like the suggested "Rules for including LLM content" for fear of implying permission. I do think the "do not" header is the best way to go about it, and the way it's currently written is fine for a compromise guideline which isn't aiming to be a total ban. Athanelar (talk) 04:00, 4 December 2025 (UTC)[reply]
    The categories could just be "New articles or major expansions" and "General considerations". Could just be a bolded title before each section. That would be enough to make it clear (I support your interpretation but completely missed it when I first read). I disagree with the "unreviewed content" header, because it does contradict the guideline's language for new articles and major edits, and is going to confuse the heck out of newer editors, but I guess I can live with it for now. NicheSports (talk) 04:05, 4 December 2025 (UTC)[reply]
3rd time is truly a charm. I really like this one. Викидим (talk) 02:54, 4 December 2025 (UTC)[reply]
  1. Remove the entire "Why LLM-written content is problematic" section. As I've said before, guidelines aren't information pages. Remove unnecessary words.
  2. Change to: "Do not use an LLM to add unreviewed content"
  3. "Handling existing LLM-generated content" – good section. Thumbs up from me on this one.
Cremastra (talk · contribs) 03:06, 4 December 2025 (UTC)[reply]
If guidelines aren't information pages, then shouldn't WP:LLM be tagged as an information page? SuperPianoMan9167 (talk) 03:08, 4 December 2025 (UTC)[reply]
IMO, yes, because that's what it is – it provides useful information on why LLMs are problematic and factual tips to handle and identify them. Cremastra (talk · contribs) 03:10, 4 December 2025 (UTC)[reply]
 Done in Special:Diff/1325613952. WP:LLM is now an information page. SuperPianoMan9167 (talk) 03:16, 4 December 2025 (UTC)[reply]
When/if qcne's guideline goes live, we must remember to add it to the information page template there as a page that is interpreted by it. Athanelar (talk) 03:31, 4 December 2025 (UTC)[reply]
I got reverted. SuperPianoMan9167 (talk) 16:44, 4 December 2025 (UTC)[reply]
Guidelines aren't information pages, true, but you do need to explain to people why the guideline exists; Wikipedia attracts far too many free-thinking, contrarian, and libertarian types who like asking "why?" and will resist a nameless figure telling them what to do unless they're provided a reason to do otherwise. GreenLipstickLesbian💌🧸 03:10, 4 December 2025 (UTC)[reply]
Guidelines should absolutely link – prominently! – to pertinent information pages, and give a one or two-sentence explanation of why the guideline is necessary. But whole sections dedicated to justifying its existence mean that the important parts are covered by clouds of factual information rather than principled guidance, which is confusing for new editors, who need the guidelines most. Cremastra (talk · contribs) 03:12, 4 December 2025 (UTC)[reply]
Change to: "Do not use an LLM to add unreviewed content" – I don't think this is going to shape up to be that kind of full-ban proposal (unlike #1 and #3 on this page are). That said, the core text as-is would be straightforward improvement while also posing no impediment to adopting more restrictions in the future. WP:NEWLLM was a small step, this would be a larger one, I'd suggest not letting perfect be the enemy of better. fifteen thousand two hundred twenty four (talk) 03:27, 4 December 2025 (UTC)[reply]
Thanks for all the comments. I have formally opened an RfC: User talk:Qcne/LLMGuideline#RfC: Replace text of Wikipedia:Writing articles with large language models. qcne (talk) 11:28, 4 December 2025 (UTC)[reply]

Further amendment proposal #3: Athanelar


Throwing my hat in the ring, essentially the same as Festucalex's proposal but just with slightly narrower scope that doesn't imply we're trying to police people using AI for idea generation or the likes.

Current text: Large language models (or LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.

Proposed text: Large language models (or LLMs) are not good at creating article content which is suitable for Wikipedia, and therefore should not be used to generate content to add to Wikipedia, whether for new articles or when editing existing ones.

Athanelar (talk) 15:17, 24 November 2025 (UTC)[reply]

This completely changes the purpose of this guideline (expanding its scope from new articles to all edits) and would require a new RfC. Toadspike [Talk] 15:48, 24 November 2025 (UTC)[reply]
That's sort of the intention, yes. I assume Festucalex is doing the same, and the intention is to gauge support before a formal RfC to expand the guideline. Athanelar (talk) 15:52, 24 November 2025 (UTC)[reply]
@Qcne and Athanelar: May I have your permission to change the headers from their present titles to this:
  • Further amendment proposal #1: Festucalex
  • Further amendment proposal #2: qcne
  • Further amendment proposal #3: Athanelar
Just to make it clearer to other editors? I'll also change the section link that Athanelar put above. Festucalextalk 16:18, 24 November 2025 (UTC)[reply]
Of course, thank you. qcne (talk) 16:19, 24 November 2025 (UTC)[reply]
Go ahead, thanks. Athanelar (talk) 16:31, 24 November 2025 (UTC)[reply]
Done, thank you both. I took the liberty of adding an explanatory hatnote. Festucalextalk 16:34, 24 November 2025 (UTC)[reply]
This is all to see if people support a new guideline as opposed to a proper change. GarethBaloney (talk) 16:37, 24 November 2025 (UTC)[reply]
I suggest dropping the Large language models (or LLMs) can be useful tools part. It's not necessary and will cause an awkward divide if taken to RfC where editors who more broadly oppose LLM use would have to endorse that they are useful tools. fifteen thousand two hundred twenty four (talk) 16:27, 24 November 2025 (UTC)[reply]
I've modified my wording somewhat. I agree that part is unnecessary. Athanelar (talk) 16:35, 24 November 2025 (UTC)[reply]
As I've discussed previously, personally I would prefer any guidance not to refer to specific technology, as this changes and is not always evident to those using tools written by others, and focus on purpose. Along the lines of my previous comment in the RfC, I suggest something like "Programs must not be used to generate text for inclusion in Wikipedia, where the text has content that goes beyond any human input used to trigger its creation." (Guidance for generated images is already covered by Wikipedia:Image use policy § AI-generated images.) isaacl (talk) 18:22, 24 November 2025 (UTC)[reply]
How would Text generation software such as large language models (LLMs) should not [...] sound? Athanelar (talk) 18:26, 24 November 2025 (UTC)[reply]
Personally, I prefer using a phrase such as "Programs must not be used to generate text" as I think it better reflects what many editors want: text written by a person, not a program. I think whether it's in a footnote or a clause, text generation should be defined, so using programs to help with copy-editing, or to fill in the blanks of a skeleton outline is still allowed. Also, I prefer "must" to "should". isaacl (talk) 19:16, 24 November 2025 (UTC)[reply]
"Programs" is too nonspecific I think; a word processor is arguably a "program used to generate text" for example. We need to be somewhat specific about what sort of technology we're forbidding here. Athanelar (talk) 19:30, 24 November 2025 (UTC)[reply]
That's why I said the meaning of text generation should be defined, and as I suggested, the generated text should not have content that goes beyond any human input used to trigger its creation. Accordingly, word processors do not fall within the definition. isaacl (talk) 23:44, 24 November 2025 (UTC)[reply]
Honestly, I like this as the lead for Qcne's proposal above. Specifying it's about both creating articles and editing existing ones is good clarity Kowal2701 (talk) 21:41, 24 November 2025 (UTC)[reply]
Oppose. I would argue that the current text is already too restrictive (yes, AI can be abused, but so can WP:AWB) and needs to be handled in another way altogether (like AWB is handled). Викидим (talk) 22:04, 24 November 2025 (UTC)[reply]
This proposal is more restrictive than proposal #2, so it can't serve as a lead for it. isaacl (talk) 23:50, 24 November 2025 (UTC)[reply]
Support. I'm still going to try making incremental changes to improve the current version, but this closes the biggest loophole (inserting content into existing articles) while eliminating "from scratch". You're going to need to tighten your definitions, though, or you'll be hearing "but it's only one sentence and I reviewed it". ~ Argenti Aertheri(Chat?) 21:13, 26 November 2025 (UTC)[reply]
How would you know whether one sentence was AI-generated? Is it practical to prohibit an undetectable use? Unenforceable "laws" can lead to a general disregard for rules ("Oh, yes, driving that fast is illegal here, but everybody does it, and the police don't care" becomes "Nobody cares about speeding, and reckless driving is basically the same thing"). WhatamIdoing (talk) 06:28, 27 November 2025 (UTC)[reply]
Is it practical to prohibit an undetectable use? – Banning all use bans all use. All vandalism is prohibited, not just detectable vandalism, same for NPOV violations, promotion, undisclosed paid editing, sockpuppetry, etc. What can be detected will be, what can not will not. I do not understand your point. fifteen thousand two hundred twenty four (talk) 06:43, 27 November 2025 (UTC)[reply]
Yes, banning bans all use. But if you can't tell whether the use happened, or prove that it didn't, then we might end up with drama instead of an LLM-free wiki. WhatamIdoing (talk) 02:20, 28 November 2025 (UTC)[reply]
We can't prove COI or undisclosed paid editing either, we still don't allow them. ~ Argenti Aertheri(Chat?) 19:39, 28 November 2025 (UTC)[reply]
And we end up with drama about that regularly, when an editor issues an accusation, and the targeted editor denies it, and how do you prove who's correct? WhatamIdoing (talk) 02:47, 2 December 2025 (UTC)[reply]
Since that's all par for the course for COI, I think you may have misunderstood my !vote. I'm sorry if it sounded like I was trying to say one reviewed sentence should (not) be allowed. I meant to say: this will come up if this goes to RfC, so address it before the RfC. Personally, I think one reasonable-length sentence is my comfort level, if only because of how much GPTs like to ramble. ~ Argenti Aertheri(Chat?) 18:31, 2 December 2025 (UTC)[reply]

Expanding CSD G15 to align with this guideline

[edit]

Those participating in this discussion might also be interested in my discussion about potentially expanding CSD G15 to apply to all AI-generated articles per this guideline. Athanelar (talk) 16:53, 24 November 2025 (UTC)[reply]

Discussion withdrawn within six hours by the OP due to opposition. WhatamIdoing (talk) 06:29, 27 November 2025 (UTC)[reply]

Not a proposal, just some stray ideas

[edit]

I didn't participate in the original RfC and I haven't fully read the new proposals and discussions here, but I'll table the rough notes I've been compiling at User:ClaudineChionh/Guides/New editors and AI in case there are any useful ideas there. (There might be nothing useful there; I'm still slowly working my way through the discussions on this page.) ClaudineChionh (she/her · talk · email · global) 23:04, 24 November 2025 (UTC)[reply]

After reflecting on the common refrain in these discussions that AI is just a tool, we should judge LLM text by the same standards we judge human text, I also finally put some of my thoughts on this matter into essay form (complete with clickbaity title!) at User:LWG/10 Wikipedia Policies, Guidelines, and Expectations That Your ChatBot Use Probably Violates. There's also a little "spot the LLM" easter egg if anyone wants a small diversion. -- LWG talk 03:03, 25 November 2025 (UTC)[reply]

Further amendment proposal #4: Mikeycdiamond

[edit]

During the initial discussion of this guideline, I noticed that people were complaining that others would use it as a blanket reason to attack stuff at XFD because it might be by an AI. My proposal would fix that problem. I also noticed some slight overlap between the third sentence of my proposal and Qcne's proposal, but I would appreciate input on whether I should delete it. If my proposal were to be enacted, I believe it should be its own paragraph.

"When nominating an AI article for deletion, don't just point at it and say, "That's AI!" Please point out the policies or guidelines that the AI-generated article violated. WP:HOAX and WP:NPOV are examples of policies and guidelines that AIs commonly violate." Mikeycdiamond (talk) 00:55, 25 November 2025 (UTC)[reply]

Oppose. I would compare the situation to WP:BURDEN - deleting AI slop should be easy at the slightest suspicion, keeping it should require disclosures / proofs of veracity, etc. (like BURDEN does in the case of unsourced text). This proposal goes in the opposite direction: another editor should be able to tell me that "this article looks like AI slop. Explain to me how you created this text", in the same way they can point to BURDEN and tell me "show me your sources or this paragraph will be gone". Викидим (talk) 01:17, 25 November 2025 (UTC)[reply]
@Викидим, I have "the slightest suspicion" that the new articles you created at Attribute (art) and Christoph Ehrlich used AI tools. Exactly how easy should it be for me to get your new articles deleted? WhatamIdoing (talk) 06:35, 27 November 2025 (UTC)[reply]
The key word in my remark is "slop". I do not think that everything that AI produces is sloppy. Incidentally, I already provide full disclosures on the talk pages. I hope this will convince other editors of the veracity of the article content, so the hypothetical AfD would not happen. So, (1) I firmly believe that using AI should be allowed and (2) acknowledge the need to restrict the cost of absorbing the AI-generated text into the encyclopedia.
My personal preference would be to have a special "generative AI" flag that allows the editor to use generative AI. For some reason this idea is not popular. An alternative would be to shift the burden of proof of quality onto the users of generative AI. For an article showing the telltale signs of AI use, absence of published prompts, or prompts indicating that the AI was involved in the search for RS, can be grounds for deletion IMHO. Викидим (talk) 06:58, 27 November 2025 (UTC)[reply]
I think some editors believe "AI slop" is redundant (i.e., all generative AI is automatically slop), so your articles would be at risk of AFD.
Other editors believe that "deleting slop should be easy", even if it's not AI-related. WhatamIdoing (talk) 02:22, 28 November 2025 (UTC)[reply]
Regarding the quality of AI output: based on what I have witnessed firsthand, the modern AI models, when properly used, can provide correct software code of quite non-trivial size. I will happily admit that the uncertainties inherent in any human language make operations with it harder than with programming languages, but the fact that AI (as of late 2025) in principle can generate demonstrably correct text is undeniable. Same thing apparently happens when AI is asked to produce, say, a summary of facts relating to X from a few-hundred-page book that references back to the pages in the original book. Here, based on personal experience, I have yet to encounter major issues, too. Writing of a Wikipedia article is very close to this latter job, so I see no reason why modern AI, properly prompted, should produce slop. Unlike in the former case, where the proof of correctness is definite, I can be wrong, and will happily acknowledge it if somebody provides me with an example of, say, Gemini 3.0 summarizing text on a "soft" topic wildly incorrectly after adequate prompts (which in this case are simple: "here is the file with text X, create summary of what it says about Y for use in an English Wikipedia article"). Викидим (talk) 04:39, 28 November 2025 (UTC)[reply]
Even if you think that modern AI can produce good content, other editors appear to be dead-set against it.
Additionally, you are opposing a request for editors to say more than "That's AI" when trying to get something deleted. Surely you at least mean for them to say "That's AI slop"? Because if "modern AI, properly prompted" is a reason for deletion, then your AI-generated articles will disappear soon. WhatamIdoing (talk) 02:50, 2 December 2025 (UTC)[reply]
I understand the internal contradiction in my posture. It stems from the fact that I look at AI from two angles: as an editor who actually likes to create articles using AI and feels good about the need to wash hands before cooking the text, and as a WP:NPP member who occasionally faces the slop. Викидим (talk) 06:46, 2 December 2025 (UTC)[reply]
My experience has been the opposite -- AI-generated text in my experience tends to represent sources so poorly that when I spot check some obviously-modern-AI text, there is a >50% chance that it's going to be the same old slop just with a citation tacked on.
Recent and characteristic example: Talk:Burn (Papa Roach song), generated a few days ago most likely with ChatGPT (based on utm_source params in the editor's other contributions). I don't know what LLM or prompt was used, but it took me only ~10 minutes to find several instances of AI-generated claims that sources say things that they simply don't. This isn't an especially noteworthy example either, it got it wrong in the exact same ways it usually does.
And if the article were to go to AfD -- note, I am not saying that it should -- that is actually relevant, because the AI text is presenting one source as multiple, and in one case inventing fictitious WP:SIGCOV literally just from a song's inclusion in a tracklisting. This becomes obvious when you read the cited sources, but many at AfD don't. Gnomingstuff (talk) 20:55, 2 December 2025 (UTC)[reply]
Oppose in its current form. Generally I think AI usage falls under WP:NOTCLEANUP -- a lot of AI-generated articles are about notable subjects, especially the ones where there's a language gap. But I do think there are legitimate reasons to bring AI usage up at AfD, because AI can misrepresent sources, and in particular often misrepresents them by making a huge deal out of a passing mention, making coverage seem significant that actually isn't. I also think that for certain topics -- POV forks, BLPs, etc. -- AI generation is a legitimate reason to just delete the thing. Gnomingstuff (talk) 01:23, 25 November 2025 (UTC)[reply]
Support. Explaining how WP:AfD is not cleanup is very important to clarifying the scope of this guideline Katzrockso (talk) 01:31, 25 November 2025 (UTC)[reply]
Promote WP:LLM to guideline We cite it and treat it as if it were a guideline and not an essay. For Pete's sake, just promote it already! It has everything necessary for a comprehensive LLM usage guideline. SuperPianoMan9167 (talk) 02:01, 25 November 2025 (UTC)[reply]
We've already gone through a month-long RFC to promote this to a guideline. Could you imagine how large the debate would be if we tried to promote that essay? It might be quicker to work on this guideline. Mikeycdiamond (talk) 02:05, 25 November 2025 (UTC)[reply]
That essay is comprehensive and well-written. In my opinion, it would be quicker to just promote it to guideline instead. Besides, it already contains guidance in the spirit of this guideline in the form of WP:LLMWRITE. It also contains WP:LLMDISCLOSE, which I think should be policy (and I am honestly baffled that it isn't). SuperPianoMan9167 (talk) 02:09, 25 November 2025 (UTC)[reply]
No one is stopping you from making an RFC. I don't disagree with you, but I am not sure if it would pass. Mikeycdiamond (talk) 02:12, 25 November 2025 (UTC)[reply]
I was looking through LLM's talk page archives; there was an RFC in 2023. The RFC showed large consensus against promoting it, but a lot has changed since then. Mikeycdiamond (talk) 02:22, 25 November 2025 (UTC)[reply]
Oppose; misses the point of NEWLLM, which is specifically to forbid AI-generated articles simply because they are AI-generated, and not because of AI-related policy violation. Athanelar (talk) 02:56, 25 November 2025 (UTC)[reply]
That's your interpretation of the guideline. Other editors will interpret it in different ways. SuperPianoMan9167 (talk) 02:57, 25 November 2025 (UTC)[reply]
The text of the guideline is pretty clear on what it forbids. It says that LLMs are not good at generating articles, and should not be used to generate articles from scratch. We can argue all day about what 'from scratch' means (which is what these amendment proposals are meant to solve) but the fact that the guideline forbids AI writing in itself is not I think ambiguous in any sense; there is no room in the proposal to argue that it's saying AI-generated articles are only bad if they violate other policies. Athanelar (talk) 03:06, 25 November 2025 (UTC)[reply]
If they don't violate other policies/guidelines, what is the point of deleting them? Isn't the sole reason for banning AIs that they violate our other policies/guidelines? Mikeycdiamond (talk) 03:11, 25 November 2025 (UTC)[reply]
Because they violate this guideline, which says you shouldn't generate articles using AI. Athanelar (talk) 03:15, 25 November 2025 (UTC)[reply]
WP:IMPERFECT and WP:ATD-E are core Wikipedia policies that collectively suggest WP:SURMOUNTABLE problems that can be resolved with editing should not be deleted. Katzrockso (talk) 03:45, 25 November 2025 (UTC)[reply]
In my eyes, a guideline which says "Articles should not be generated from scratch using an LLM" logically means the same thing as "An article generated from scratch using an LLM should not exist." It would be kind of odd to me to argue that this guideline doesn't support deletion; because what, you're saying that you shouldn't generate articles using AI, but if you happen to do so, then it's fine as long as it doesn't violate other policies/guidelines? That would mean that this guideline really does nothing at all.
And anyway, your argument also arguably applies to an AI-generated article which violates other policies/guidelines, too. I mean, those problems might also be surmountable, so what's the problem there? Should we disregard CSD G15 and say that unreviewed AI-generated articles are fine as long as the article subject is notable and the article is theoretically fixable with human intervention?
Basically, I think adding a paragraph to this guideline saying that you can't use it to support deletion would mean there's no point in this guideline existing at all, and you might as well just propose that the guideline be demoted again. Athanelar (talk) 03:57, 25 November 2025 (UTC)[reply]
Say Mary Jane generates an LLM-written article that has some major, but surmountable, issues. For example, two of her citations are to fake links, but other sources are readily available to support the claims, three of the claims are improperly in wikivoice when they should be attributed, and there is a section of the article that is irrelevant/undue. Would you suggest this article be deleted in whole, despite being otherwise a notable topic, or should editors be allowed to remedy the problems generated by the LLM usage? Katzrockso (talk) 04:04, 25 November 2025 (UTC)[reply]
I think in the given example it would essentially be the same amount of effort to TNT the article and start from scratch as to try to rework it from the flawed foundation; so yes, I'd say deletion would still be fine in that case.
Besides, what exactly would we be fighting to keep in the other case? It's not as if we'd be doing so out of a desire to respect Mary Jane's effort in creating the article. We'd be trying to hammer a square peg into a round hole for no reason other than 'well, the subject's notable and the article's here now, so...' Athanelar (talk) 04:11, 25 November 2025 (UTC)[reply]
It's my (and other editors') belief that TNT is not a policy-based remedy (WP:TNTTNT), but one that violates fundamental Wikipedia PAG. In my given example, I don't see how "it would essentially be the same amount of effort to TNT the article and start from scratch as to try to rework it from the flawed foundation". The remedy in my scenario would be:
1) Replace the fake-link citations with the readily available real sources that support the claims
2) Change the three sentences that are improperly in wikivoice to attributed claims
3) Remove the off-topic/irrelevant section
If you think that is more difficult than starting from scratch, I don't know what to express other than shock and disbelief. Katzrockso (talk) 06:02, 25 November 2025 (UTC)[reply]
About TNT: Has it ever occurred to you that the actual admin delete button isn't necessary? You can follow the process you're thinking of (AFD, red link, start new article), or you could open the article, blank the contents, and replace them with the new article right there, without needing to spend time at AFD or anything else first. WhatamIdoing (talk) 06:37, 27 November 2025 (UTC)[reply]
(also, the article you've given as your example here would already be suitable for deletion under CSD G15 whether or not WP:NEWLLM existed, so if you don't think that article would be suitable for deletion, you're also arguing we shouldn't have CSD G15) Athanelar (talk) 04:13, 25 November 2025 (UTC)[reply]
Things are only as good as the parts that make them up. If it weren't for HOAX or NPOV violations--among many others--this guideline wouldn't exist. We already have policies and guidelines for the subjects AIs violate; why shouldn't we use them? It is much clearer to point out the specific thing the text violates than to blindly say it is AI. I know AI text is relatively easy to spot now, but it will get progressively better at hiding from detection. What if people use anti-AI-detection software? This guideline is meant to back up stronger claims using other policies/guidelines, not be the sole argument in an XFD. Mikeycdiamond (talk) 03:09, 25 November 2025 (UTC)[reply]
The text of this guideline literally says 'LLMs should not be used to generate articles from scratch.' Your proposed amendment to that guideline is to tell people that when deleting AI-generated articles, they cannot reference the guideline that specifically says 'Don't generate articles with AI' and must instead reference other policies/guidelines that the article violates.
That would seem to defeat the whole point of passing a guideline that says 'Don't generate articles with AI,' wouldn't it? Athanelar (talk) 03:14, 25 November 2025 (UTC)[reply]
Deletion policy wasn't really discussed all too much in the RfC or the nonexistent RFCBEFORE, so whether it defeats the purpose is not established. Many editors expressed positive attitudes towards the guideline because it provided somewhere to point to explain to people why their LLM contributions aren't beneficial. Katzrockso (talk) 03:47, 25 November 2025 (UTC)[reply]
Oppose as defeating the purpose of having a guideline. We just passed a guideline saying "don't create articles with LLMs", this would effectively negate that by turning around and saying "actually, it's fine if it doesn't violate anything else". It doesn't work that way with any other guideline and for good reason: imagine nominating something for deletion due to serious COI issues and being told "nah, prove it violates NPOV". No, the burden of proof is on the editor with the conflict because they're already violating one guideline. This is one guideline, violating one guideline is enough. ~ Argenti Aertheri(Chat?) 21:27, 25 November 2025 (UTC)[reply]
I agree completely with the objections raised by Викидим and Gnomingstuff and Athanelar and Argenti Aertheri. AFD is about what an article is lacking (sourcing establishing notability), not about what bad content it has - just remove the bad content and AFD whatever is left if warranted. So there is no reason to treat NEWLLM differently from any other guideline there. -- LWG talk 01:10, 26 November 2025 (UTC)[reply]
Oppose — This reminds me of when people tried to undercut the ban on AI slop images as soon as it passed. The guideline needs to be made stronger, not weaker. pythoncoder (talk | contribs) 15:39, 26 November 2025 (UTC)[reply]
Oppose per all above. A guideline is a guideline and a statement of principle, and should be used directly, not through proxies. If there is overwhelming evidence an article is wholly AI-generated such that it falls afoul of this guideline, the article should be deleted at AfD. Cremastra (talk · contribs) 19:01, 26 November 2025 (UTC)[reply]
Oppose. Not topical in this guideline as this guideline is not about deletion in the first place.—Alalch E. 23:52, 27 November 2025 (UTC)[reply]
Some people think it is, see #Expanding CSD G15 to align with this guideline. SuperPianoMan9167 (talk) 00:04, 28 November 2025 (UTC)[reply]

community consensus on how to identify LLM-generated writing

[edit]

Not sure how I feel about this one.

On the one hand, there is some research suggesting that consensus helps: specifically, when multiple people familiar with signs of AI writing agree on whether a given piece of text is AI, they can achieve up to 99% accuracy. Individual editors were topping out at around 90% accuracy (which is still very good obviously).

On the other hand, "we have to treat an edit as human-generated until there's consensus otherwise" seems like a massive restriction that came out of nowhere -- it doesn't have consensus in the RfC and I'm not sure more than a handful of people even said anything close. Like, just think about how that would work in practice. Do we have to convene a whole AI Tribunal before reverting text that is very clearly AI-generated? Is individual informed judgment not enough?

This stuff is really not hard to identify. WP:AISIGNS exists, and is relatively up to date with existing research on common characteristics of LLM-generated text -- and specifically, things it does that text prior to ~2022 just... didn't do very often. This is also the case with Wikipedia text prior to mid-2022. I've been running similar, if less rigorous, text crunching on Wikipedia articles from before mid-2022, and the same tells have since skyrocketed. The problem is actually convincing people of this: that AI text consistently displays various patterns far more often than human text does (or for that matter, than LLM base models do), that people have actually studied those patterns, and that the individual edit they are looking at fits the pattern almost exactly. Is the page just not clear enough? Does it need additional citations? Gnomingstuff (talk) 01:11, 25 November 2025 (UTC)[reply]

I think this caveat was added to the RfC only because the closer didn't believe there was enough consensus for the promotion to guideline, and adding the requirement for consensus to determine that an article is in fact AI generated helps to soothe those who think the guideline is over-restrictive.
I also think it's really a non-issue; since there's no support currently to expand CSD G15 to apply to all AI-generated articles, any article suspected of being AI-generated in violation of NEWLLM will have to go to AfD anyway, which automatically will end up determining consensus about whether the article is AI generated and should be deleted under NEWLLM. Athanelar (talk) 03:02, 25 November 2025 (UTC)[reply]
"We have to treat an edit as human-generated until there's consensus otherwise" -- where did this come from?
As for your number crunching, I'm not sure if I understand the results, but if we are going to start taking phrases like "pivotal role in" and "significant contributions to" as evidence of LLM contributions, then I think this starts to pose problems. Katzrockso (talk) 03:03, 25 November 2025 (UTC)[reply]
It's from the RFC closing note. Athanelar (talk) 03:07, 25 November 2025 (UTC)[reply]
That sentence in the closing note is strange to me as well, and only makes sense in the context of an AFD or community sanctions on a problem user. In terms of reversion/restoration of individual suspected LLM-edits, the WP:BURDEN is clearly on the user who added the content to explain and justify the addition, not on a reverting editor to explain and justify their reversion. In the context of LLM use, that means that if someone asks an editor "did you use an LLM to generate this content, and if so what did that process look like?" they should get a clear and accurate answer, and if they don't get a clear and accurate answer the content should be removed until they do. -- LWG talk 03:22, 25 November 2025 (UTC)[reply]
I think ultimately it's just an effort by the closer to avoid 'taking a side' on what they perceived as a pretty tight consensus, and to preempt a controversy about the nature of the guideline; which of course is occurring anyway. Athanelar (talk) 03:32, 25 November 2025 (UTC)[reply]
No, it's not any of those things. It's me knowing this argument was going to be made and pre-empting it. Where there's no rule or guideline, Wikipedia makes content decisions by consensus; so an edit isn't to be treated as AI-generated until either we've got consensus for a test that it's AI-generated or else we've analysed the edit and reached consensus that it's AI-generated.
I know this limits the applicability of the guideline but that's not because I'm unclear or unsure about the RFC outcome or worried about taking sides. It's because of how long-established Wikipedia custom and practice works.
A test of what actually identifies AI-generated writing should really be the next step, folks.—S Marshall T/C 08:57, 25 November 2025 (UTC)[reply]
The issue is that requiring consensus before tagging content as problematic (instead of tagging the content and then following WP:BRD) imposes an unnecessary restriction, even on current practices, which wasn't brought up in the discussion. This close would mean, for example, that we can't tag a page as {{AI-generated}} anymore without first requiring an explicit consensus. This isn't Wikipedia custom and practice for tagging and has never been. Chaotic Enby (talk · contribs) 09:10, 25 November 2025 (UTC)[reply]
The best solution to these problems is to reach consensus on a test. But obviously, tagging doesn't need consensus and never has. What's not allowed is to revert or delete content for being AI-generated unless there's consensus to do so. Just to be clear: all our normal rules apply. You can still revert for all the usual reasons. BRD still applies. ONUS still applies. You can still tag stuff you suspect might be problematic.—S Marshall T/C 09:43, 25 November 2025 (UTC)[reply]
But obviously, tagging doesn't need consensus and never has. This is certainly not obvious from your close, which says that this means that we have to treat an edit as human-generated until there's consensus otherwise. A closure should only summarize the given discussion, not add new policies that need to rely on the word of the closer for later clarification, even if they would be a logical development from previous practice. Chaotic Enby (talk · contribs) 10:44, 25 November 2025 (UTC)[reply]
Summarize and clarify. A close should summarize the community's decision and clarify its relationship to existing policy and procedure. What we don't want looks like this: I think this user is adding AI-generated content so I'm going to quick-fail all their AfC submissions and then follow them round reverting and prodding.S Marshall T/C 12:22, 25 November 2025 (UTC)[reply]
What's not allowed is to revert or delete content for being AI-generated unless there's consensus to do so --
I'm not aware of anything in policy stating this -- certainly not AI policy, because we don't have any. Based on the consensus of this RfC, and on the fact that people are already reverting and deleting content for being AI to relatively little outcry, I don't think there would be consensus for such a prohibition, and I think most people in the RfC would be surprised to learn they were !voting for one. Gnomingstuff (talk) 10:16, 26 November 2025 (UTC)[reply]
As far as a test being the next step... I mean I'm trying Jennifer. We have WP:AISIGNS and are trying to make it as research-backed as possible. It is an evolving document, and I'm sure most contributors to it have their own list of personal tells they've noticed. (For example I trust @Pythoncoder's judgment implicitly on detecting AI but they see stuff I have no idea about. Apologies if you don't want the ping, I figured the outcome here is relevant to you.) But there are several problems:
Problem 1: Getting people to actually believe that these are signs of AI use. There seems to be no amount of evidence that is enough.
Problem 2: Getting people to interpret things correctly. This stuff gets very in-the-weeds, and AISIGNS leaves out a lot for that reason. For instance, one "personal tell" I have noticed is that "Additionally," opening a sentence (capitalized, with the comma) is a strong indicator of possible AI use, but the word "additionally" as an infix isn't necessarily a sign. Other tells I have are still kind of in the oven until I can hammer out a version with as few false positives as possible and as little potential for confusion.
Problem 3: We are doomed to remain in the world of evidence, not proof. It is impossible to prove whether AI was used in an edit unless you are the editor who made it. Since we have had AI text incoming since 2023, many of those editors aren't around anymore. Other editors are not forthcoming with the information. Some dodge the question, some trickle-truth it, and a small handful of editors lie. Gnomingstuff (talk) 10:34, 26 November 2025 (UTC)[reply]
This is exactly the shit I mean. When:
  • A word is identified in multiple academic studies as very over-represented in LLM-generated text compared to human text
  • The most obvious phrase containing that word is roughly 1605% more common in one admittedly less rigorous sample of AI-generated edits compared to human-generated -- a substantial portion of which are human-generated articles tagged as promotional
...then yes, it would seem to be empirical evidence? No one can prove how a user produced an edit besides that user, but when patterns start showing up that happen to be similar patterns to ones cited in external sources as characteristic of AI use, that is telling. Gnomingstuff (talk) 03:31, 25 November 2025 (UTC)[reply]
Empirical evidence of what? I have humanly generated both those phrases before (not on Wikipedia, I don't think, but elsewhere), are you going to suggest deleting my contributions on these types of grounds, because your model suggests that LLMs use these phrases at higher rates? Keep in mind that human language is changing as a result of LLMs ([5]), for better or worse. Katzrockso (talk) 03:53, 25 November 2025 (UTC)[reply]
Empirical evidence that these words and phrases appear more frequently in the aggregate of AI-generated text -- in this case, on Wikipedia -- compared to the aggregate of human-generated text on Wikipedia. They also tend to occur together, and occur in the same ways, in the same places in sentences, the same forms, etc. So if an edit shows up with a whole bunch of this crammed into 500 words, that's a very strong indication that the text is probably AI. Not a perfect indication -- for instance, this version of Julia's Kitchen Wisdom is way too early for AI but sounds just like it -- but a very strong one.
I am aware of the studies that human language is changing as a result of LLMs -- one study suggests that this particular set of words is really just a supercharge to increases in those words that were naturally happening already. That particular study is less convincing because it seems to think podcasts are never scripted or pre-written, which is... not true. But anecdotally I do see it happening. (It's a bit weird to hear this stuff out of human mouths in the wild, although that's probably just the frequency illusion given how much AI text I am seeing all day.) Not sure how much that affects Wikipedia, especially the last few years of AI stuff to deal with, given that the changes in human language feel like a lagging indicator. Gnomingstuff (talk) 10:08, 26 November 2025 (UTC)[reply]
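As a rough illustration of the comparison being described here -- how much more frequent a phrase is, per million words, in a suspected-AI sample versus a human baseline -- something like the following Python sketch would do. The file names and the example phrase are placeholders; this is not the actual analysis behind the figures quoted above, only the shape of the calculation.

import re

def rate_per_million(phrase: str, text: str) -> float:
    # Occurrences of the phrase per million words of text.
    words = len(text.split())
    hits = len(re.findall(re.escape(phrase.lower()), text.lower()))
    return 1e6 * hits / max(words, 1)

# Hypothetical input files: one sample of suspected AI-generated article text,
# one sample of pre-2022 human-written article text.
suspected_ai = open("suspected_ai_sample.txt", encoding="utf-8").read()
human_baseline = open("human_baseline_sample.txt", encoding="utf-8").read()

phrase = "pivotal role"  # example phrase only
ratio = rate_per_million(phrase, suspected_ai) / max(rate_per_million(phrase, human_baseline), 1e-9)
print(f"'{phrase}' is {ratio:.1f}x as frequent ({(ratio - 1) * 100:.0f}% more common) in the suspected-AI sample")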
Incidentally GPTZero scans that revision of Julia's Kitchen Wisdom as 98% human, highlighting the pivotal role of illustrating the benefit of using multiple channels of evidence to assess content. -- LWG talk 17:47, 26 November 2025 (UTC)[reply]
I have the opposite reaction to Individual editors were topping out at around 90% accuracy (which is still very good obviously): I look at that and say even the best of the best were making false accusations at least 10% of the time.
Imagine the uproar if someone wanted to work in Wikipedia:Copyright problems, but they made false accusations of copyvios 10% of the time. We would not be talking about how good they are.
If anything, this information has convinced me that unilateral declarations of improper LLM use should be discouraged. Maybe tags such as Template:AI-generated should be re-written to suggest something like "This article needs to be checked for suspected AI use". WhatamIdoing (talk) 07:01, 27 November 2025 (UTC)[reply]
The template already says the article may contain them. There is a separate parameter, certain=y, that is added for cases where the AI use is unambiguous. Gnomingstuff (talk) 04:21, 28 November 2025 (UTC)[reply]
There does not need to be a community consensus on how to identify LLM-generated writing. It's a technical question. Different editors will apply different methods. Disputes will be resolved in the normal way. —Alalch E. 23:49, 27 November 2025 (UTC)[reply]
Tell that to the closing admin who specifically said in the RfC close In particular we need community consensus on (a) How to identify LLM-generated writing [...] Athanelar (talk) 00:13, 28 November 2025 (UTC)[reply]
That statement is true because most signs of AI writing, except for the limited criteria of G15, are largely subjective. SuperPianoMan9167 (talk) 00:17, 28 November 2025 (UTC)[reply]
A closer does not need to be an admin and the closer wasn't in this case. GothicGolem29 (Talk) 18:48, 28 November 2025 (UTC)[reply]

Further amendment proposal #5: Argenti Aertheri

[edit]
Large language models (or LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.
+
Artificial intelligence, including GPTs and Large language models (or LLMs), is not good at creating entirely new Wikipedia articles, and should not be used to generate new Wikipedia articles from scratch.

We barely got the thing passed, so I propose we make small, incremental, changes. Changing LLMs to all AI seems as good a place to start as any other, and probably less controversial than some. ~ Argenti Aertheri(Chat?) 03:53, 25 November 2025 (UTC)[reply]

Oppose. One of the primary criticisms the first amendment proposals were trying to address was the prominent criticism during RfC that the term 'from scratch' has no agreed-upon definition and thus the scope of which articles this guideline applies to isn't clearly defined; your proposal doesn't address that, and in the process introduces a whole host of new ambiguity as to what tools are and aren't allowed, and in what capacity one might be allowed to use them. Athanelar (talk) 04:00, 25 November 2025 (UTC)[reply]
There's a definition at wikt:from scratch. Merriam-Webster offers a similar definition.
There were 37 uses of "from scratch" in the RFC; most of them were entirely favorable. There were 117 editors in the discussion; I see four who complained about the "from scratch" wording, and some of them (example) would still be valid no matter what words were used. WhatamIdoing (talk) 07:10, 27 November 2025 (UTC)[reply]
GPT is a type of LLM, not something that can be contrasted with it. What other forms of "artificial intelligence" (a dubious + nebulous concept) are creating Wikipedia articles other than LLMs? Katzrockso (talk) 04:00, 25 November 2025 (UTC)[reply]
The point isn't to address all the problems in the guideline that passed, just one: what technologies does this include. I know AI is a nebulous concept, that's actually why I chose it, so that WP:Randy from Boise can tell in seconds if his use of his software is included. Porn is a nebulous concept too, but we all know it when we see it. ~ Argenti Aertheri(Chat?) 04:20, 25 November 2025 (UTC)[reply]
What is not covered by the existing guidelines that your change would include? Katzrockso (talk) 05:56, 25 November 2025 (UTC)[reply]
1) Remove the unnecessary "can be useful tools", it's not relevant here.
2) Replace the technical term "LLM" with a more readily accessible definition that clarifies that we want human intelligence, not artificial intelligence, regardless of the exact technology being used. Ergo explicitly stating GPTs despite them being a subset of LLMs: people know what a GPT is and whether they're using one. ~ Argenti Aertheri(Chat?) 06:33, 25 November 2025 (UTC)[reply]
The "can be useful tools" part was just implemented as a part of the RfC on the two-sentence guideline, removing half of the approved text from the RfC is not a good start.
"clarifies that we want human intelligence, not artificial intelligence" makes no sense, is less clear than the current version and if anything muddies the scope and applicability of this guideline. Katzrockso (talk) 09:34, 25 November 2025 (UTC)[reply]
Would you find it acceptable to change the current wording from "LLMs" to "LLMs, including GPTs" if no other changes were made? ~ Argenti Aertheri(Chat?) 19:02, 25 November 2025 (UTC)[reply]
I would find it acceptable/unobjectionable, I just think it's superfluous Katzrockso (talk) 00:14, 26 November 2025 (UTC)[reply]
It's redundant if you know that GPTs are LLMs, but not if you're just Randy from Boise asking ChatGPT about the Peloponnesian War. Randy would likely have an easier time understanding the guideline with that explicitly spelled out. ~ Argenti Aertheri(Chat?) 01:35, 26 November 2025 (UTC)[reply]
Maybe a footnote like the one in WP:G15 would work, which says The technology behind AI chatbots such as ChatGPT and Google Gemini. SuperPianoMan9167 (talk) 02:07, 26 November 2025 (UTC)[reply]
Works for me, hopefully it works for Randy too. Should I reword this proposal or WP:BRD? ~ Argenti Aertheri(Chat?) 07:22, 26 November 2025 (UTC)[reply]
I went ahead and added the footnote. SuperPianoMan9167 (talk) 22:47, 26 November 2025 (UTC)[reply]
This is much clearer/explanatory than the term "GPTs" or "artificial intelligence". Support this change Katzrockso (talk) 07:47, 27 November 2025 (UTC)[reply]
I think that @Katzrockso and @Argenti Aertheri make a good point, and it's one that could be solved by making a list. Imagine something that says "This bans article creation with AI-based tools such as ChatGPT, Gemini, and that paragraph at the top of Google search results. This does not ban the use of AI-using tools such as Grammarly, the AI grammar tools inside Google Docs, or spellcheck tools."
These lists don't need to be in this guideline, but it might help if they were long. It should be possible to get a list of the notable AI tools in Template:Artificial intelligence navbox. WhatamIdoing (talk) 07:17, 27 November 2025 (UTC)[reply]
So this begs the question why is Grammarly spell check allowed but not ChatGPT spellchecking? I'm not saying that people should plop "Write me a Wikipedia article" into a LLM and paste that into Wikipedia, but these LLMs have other use cases too. What use cases people want to prohibit/permit really need to be laid out more explicitly for this to be workable. Katzrockso (talk) 07:46, 27 November 2025 (UTC)[reply]
Here (as someone who admittedly has not used Grammarly since their adoption of LLM tech) it would (potentially) be that Grammarly uses a narrow and specific LLM model that has additional guardrails that prevent it from acting in the generative manner that ChatGPT does. Or at least that would have been the smart way of rolling out LLM tech for Grammarly, as said I've not used it so I don't know where they have implemented rails. -- Cdjp1 (talk) 16:54, 27 November 2025 (UTC)[reply]
In my experience reading Grammarly-edited text, it doesn't always use those guardrails well. It also tends to push a lot of more expansive AI features on people. Gnomingstuff (talk) 17:07, 29 November 2025 (UTC)[reply]
In re this begs the question why is Grammarly spell check allowed but not ChatGPT spellchecking? Yes, well, that is a question, isn't it? And I think it's a question that editors won't be able to answer if they don't realize that ChatGPT can do spellchecking.
https://arxiv.org/html/2501.15654v2 (which someone linked above) gave 300 articles to a bunch of humans, and asked them to decide whether each article was AI-generated or human-written. They learned that an individual who doesn't use LLMs missed 43% of the LLM-written articles and falsely flagged 52% of the human-written articles as LLM-written. This is in the range of a coin-flip; it is almost random chance.
I'm reminded of this because those non-users (e.g., me) are also going to be unaware of the various features or tools in the LLMs. A list might inform people of what's available, and therefore let us use a bit more common sense when we say "This tool is acceptable for checking your spelling, but that tool is prohibited." WhatamIdoing (talk) 02:30, 28 November 2025 (UTC)[reply]
It's spellcheck, no one cares how you figure out how to spell a word as long as you knew which word you were trying to spell. I'd be wary of grammarly unless they put guardrails as Cdjp1 suggests though, and if they have guardrails then that's what needs to be specified: which built-in guardrails make it ok? ~ Argenti Aertheri(Chat?) 04:50, 28 November 2025 (UTC)[reply]
Nobody should care how you figure out how to spell a word, but it sounds like some editors aren't operating with that level of nuance. WhatamIdoing (talk) 02:52, 2 December 2025 (UTC)[reply]
LLMs can't do spellchecking in the sense we are used to. They can do something that can be similar in output, but the underlying process used won't be the same, due to the fundamental way LLMs work. In terms of tools, any LLM use will have this underlying generative framework because everything is converted into mathematics and then reconverted in some way. As Cdjp1 and Gnomingstuff note, refining any LLM use is about building the right guardrails, but these don't change the way the underlying program works. The complication with Grammarly is that it has its original software and new LLM-based tools, and I'm not sure how much control or even knowledge the user has. Same possibly with Microsoft these days. CMD (talk) 07:24, 2 December 2025 (UTC)[reply]
In a couple of years, will the average person realistically have a way to use ordinary word processing software (e.g., MS Word or Google Docs) without an LLM being used somewhere in the background? I don't know. Maybe it just looks inevitable because of where we are in the Gartner hype cycle right now, but the inadvertent use of LLMs feels like it will only get bigger over time. WhatamIdoing (talk) 19:59, 4 December 2025 (UTC)[reply]

Since copying over the footnote seems pretty non-controversial, version 2:

Large language models (or LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.
+
Large language models (or LLMs) are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.

While true, it's not relevant and only makes this mess messier. If it's a guideline about content creation, then it doesn't really matter how well LLMs can do other tasks. ~ Argenti Aertheri(Chat?) — Preceding undated comment added without a datestamp.

Since you didn't get any direct replies to this, here's a late comment:
We're trying to present this as a guideline that involved reasonable people making a reasonable choice about reasonable things, rather than a bunch of ill-informed AI haters. The guideline is less likely to seem unreasonable or to be challenged by pro-AI folks if it acknowledges reality before taking away their tools. Therefore the guideline acknowledges and agrees with their POV ("can be useful"), names the community's concern ("not good at creating entirely new Wikipedia articles"), and then states the rule ("should not be used to generate new Wikipedia articles from scratch"). WhatamIdoing (talk) 20:07, 4 December 2025 (UTC)[reply]
Agreed. The rules are principles, not lists of things that editors should and should not do. SuperPianoMan9167 (talk) 20:10, 4 December 2025 (UTC)[reply]
Agreed. Alaexis¿question? 21:04, 5 December 2025 (UTC)[reply]

Supplemental essay proposal on identifying AI-generated text

[edit]

Seeing as it has been noted (particularly by the RfC closer) that the existence of a guideline which prohibits AI-generated articles necessitates the existence of a consensus standard on identifying AI-generated articles, I've drafted a proposal which aims to codify ways that AI text can be identified for the purpose of enforcing this guideline (and any other future AI-restricting guideline).

The essay content is largely redundant to WP:AISIGNS but rather than just a list of AI indicators it specifically aims to be a standard by which content can be labelled as AI-generated.

Your feedback and proposed changes/additions are most welcome at User:Athanelar/Identifying AI-generated text. If reception is positive I will submit an RFC.

Pinging some editors who were active in this discussion: @Qcne @Voorts @Gnomingstuff @Festucalex @Mikeycdiamond @Argenti Aertheri @LWG Athanelar (talk) 17:55, 26 November 2025 (UTC)[reply]

I agree a consensus standard is implied, but I would guess any rate of false positives or negatives will render either a guideline or tools controversial. I have a few suggestions: 1) I prefer a 'weak' or humble standard: using various criteria or methods may suggest, but not prove, AI use. 2) Checking the volume of changes, either as a single submission or in terms of bytes/second from a given IP or account, may occasionally serve as a cheaper, semi-accurate proxy for AI detection, although once again there will be false positives and negatives. 3) Given the rapid development and diversity of AI tools, and the resources involved, I do not think developing uncontroversial tools for AI detection is a feasible goal in the near future. Deploying automatic tools sitewide or on demand would likely be ruled out by cost, but if individual users wish to run them, I think their findings could contribute evidence towards a determination - so long as we guard against bias and overconfidence in the use of these tools. --Edwin Herdman (talk) 19:32, 26 November 2025 (UTC)[reply]
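For what the bytes/second idea in point 2 might look like in practice, here is a minimal Python sketch using the MediaWiki API's usercontribs list to estimate how many bytes an account has added per minute across its recent edits. The account name is a placeholder, and a high number is only a weak signal, never proof, for exactly the false-positive reasons given above.

from datetime import datetime
import requests

API = "https://en.wikipedia.org/w/api.php"

def bytes_added_per_minute(user: str, limit: int = 50) -> float:
    # Fetch the user's most recent contributions with timestamps and size deltas.
    params = {
        "action": "query", "list": "usercontribs", "ucuser": user,
        "ucprop": "timestamp|sizediff", "uclimit": limit, "format": "json",
    }
    contribs = requests.get(API, params=params, timeout=30).json()["query"]["usercontribs"]
    if len(contribs) < 2:
        return 0.0
    added = sum(max(c.get("sizediff", 0), 0) for c in contribs)  # count only additions
    times = [datetime.fromisoformat(c["timestamp"].replace("Z", "+00:00")) for c in contribs]
    minutes = (max(times) - min(times)).total_seconds() / 60
    return added / max(minutes, 1)

print(bytes_added_per_minute("ExampleUser"))  # hypothetical account name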
The "suggest" wording is a good idea. For those who worry it may not be workable, our entire concept of notability rests on similar wording (e.g. "presumed to be suitable", "typically presumed to be notable"). If we're going down this road, I'd support wording like this and judgement by consensus in case of dispute. Toadspike [Talk] 21:08, 26 November 2025 (UTC)[reply]
Regarding AI tools changing quickly, I did some very very very rough analysis of text pre- and post-GPT-5 if anyone is interested. Will revisit once I have more data. Gnomingstuff (talk) 03:57, 27 November 2025 (UTC)[reply]
I made one small tweak -- adding the bit about edits having to be post-2022 for AI use to even be possible. "Strongly suggest" is the best we can do, unfortunately. If the burden of proof is on the person tagging/identifying AI-generated text, then that is almost literally impossible to provide because no one knows how someone made an edit but that person.
As far as automated tools, you could do worse than just scraping all articles containing >5 instances (or whatever) of the listed "AI vocabulary" words, and then manually checking those to see what's up. (This is basically what I've been doing, minus the tools.) The elephant in the room, though, is that LLMs are changing right now -- GPT-5.1 came out just 2 weeks ago. We also almost never know which tools people are using, let alone the version or prompt or provided sources. And all that is compounded by the fact that even researchers don't know why AI sounds the way it does. The whole thing is largely a black box, and it's honestly kind of surprising we (as in we-the-public) have figured anything out at all. Gnomingstuff (talk) 00:11, 27 November 2025 (UTC)[reply]
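A first pass along these lines -- count flagged vocabulary, then hand everything over to a human reviewer -- could be as simple as the sketch below. The word list, threshold, and input directory are placeholders rather than any agreed standard, and a hit here is only a prompt for human review, not a finding.

import re
from pathlib import Path

# Placeholder list of over-represented words/phrases; see WP:AISIGNS for real examples.
FLAGGED = ["pivotal role", "rich tapestry", "underscores", "delve into", "vibrant"]
THRESHOLD = 5  # ">5 instances (or whatever)", per the comment above

def flag_candidates(article_dir: str) -> list[tuple[str, int]]:
    # Return (filename, hit count) for plain-text article dumps above the threshold.
    results = []
    for path in Path(article_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8").lower()
        hits = sum(len(re.findall(re.escape(term), text)) for term in FLAGGED)
        if hits > THRESHOLD:
            results.append((path.name, hits))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

for name, hits in flag_candidates("article_dumps"):  # hypothetical directory of .txt dumps
    print(f"{name}: {hits} flagged terms -- needs a human look")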
Thanks for your tweak. I haven't had any adverse reaction to this essay yet, so I'll give it until the 24-hour mark, and if nobody's raised any major objections I'll put it up for RfC. Provided that passes, we can link to my essay from the NEWLLM page, and that'll at least solve one of the RfC close's two problems. Then it'll just be a matter of codifying what we do if something breaches NEWLLM; but people seem to be generally on board with 'send it to AfD' as a solution for that already.
My fingers are crossed we can move onto RfC for a proposal to expand NEWLLM to include all AI-generated contributions and not just new articles. Athanelar (talk) 00:16, 27 November 2025 (UTC)[reply]
This is redundant to WP:AISIGNS. Perhaps some content can be merged with AISIGNS. —Alalch E. 23:47, 27 November 2025 (UTC)[reply]
Note for everyone subscribed to this discussion; I have raised an RfC at the essay's talk page. Athanelar (talk) 00:20, 28 November 2025 (UTC)[reply]

A hypothetical scenario

[edit]

Here's a hypothetical scenario to consider. Say you have an editor writing an article. It's a well-written, comprehensive article. They publish their draft and it gets approved at AfC and moved to mainspace. If that editor then says "I used AI to write the first draft of this article", does this guideline require the article be deleted, even though the content is perfectly acceptable? SuperPianoMan9167 (talk) 00:52, 27 November 2025 (UTC)[reply]

Personally I believe that if the article has been comprehensively rewritten and checked line by line for accuracy prior to asking other editors to spend time on it at AfC, the tools used for the initial draft don't matter. -- LWG talk 01:04, 27 November 2025 (UTC)[reply]
To me, "from scratch" implies a lack of rigorous review or corrections from a human editor. I attempted to clarify this in [6], but it got reverted. No reasonable person would require a perfectly-written and verified article to be deleted merely because an early draft was written with software assistance. Anne drew (talk · contribs) 01:05, 27 November 2025 (UTC)[reply]
This has possibly already happened, and AI has certainly been used for edits. One temporary account recently asked about it at the help desk. I wrote my questions 0 and 1 for this case. Reasons I think are good for disallowing it are: 1) We don't like the 'moral hazard' of letting a part of the process not have human input, and the larger the change without human input and oversight, the greater the potential problem. 2) Openly allowing AI use might cause human reviewers to be overwhelmed. 3) The copyright status of Wikipedia content could be challenged, especially if 'substantive' AI edits are allowed to stand, a concern I think may be decisive for the Wikimedia Foundation and ArbCom given the potential for losses. I think a lot of the rest of it is similar to the risks we accept in ordinary editing - bias and errors may propagate for a long time, but we hope that eventually somebody spots the problem. --Edwin Herdman (talk) 02:34, 27 November 2025 (UTC)[reply]
It has absolutely already happened, to the tune of thousands of articles that we know about. And the ones we know about, we know about because there were enough signs in the text to be identifiable as AI. Gnomingstuff (talk) 02:35, 27 November 2025 (UTC)[reply]
@Edwin Herdman, I don't think I understand The copyright status of Wikipedia content could be challenged, especially if 'substantive' AI edits are allowed to stand, a concern I think may be decisive for Wikimedia Foundation and ArbCom given the potential for losses.
Does this mean that:
  • some of Wikipedia's contents will not be eligible for copyright protection? In that case, the WMF isn't going to care (they're willing to host public domain/CC-0 content, though they would prefer that it was properly labeled), and protecting editors' copyrights is none of ArbCom's business. (ArbCom cares about editors' behavior on wiki. They are not a general-purpose governance group.)
  • someone might (correctly) claim that they own the copyright for the AI-generated/AI-plagiarized contents of an article? In that case, the WMF will point them to the WP:DMCA process to have the material removed. If the copyright holder wishes to sue someone over this copyvio, they will need to sue the editor who posted it (not the WMF or ArbCom). This is in the foundation:Policy:Terms of Use; look for sentences like "Responsibility — You take responsibility for your edits (since we only host your content)" (emphasis in the original) and "You are responsible for your own actions: You are legally responsible for your edits and contributions" (ditto).
WhatamIdoing (talk) 05:48, 27 November 2025 (UTC)[reply]
I wrote that badly, but you've clarified the issue. I can't assume Wikipedia will always benefit from the Safe Harbor provision - the DMCA might be amended again or even repealed, or Wikipedia might be found to fail the Safe Harbor criteria. Even without a suit seeking damages, the DMCA process imposes at least some administrative burdens which I would consider worth a rough worst-case scenario estimate. I'll be happy if wrong; AI risks on copyright aren't totally unlike what any editor can do without AI, what's different is mainly spam potential and the changing legal landscape. My final thought is that LLMs don't inherently bring copyright issues - it's possible an LLM with a clear legal status might be developed. --Edwin Herdman (talk) 08:38, 27 November 2025 (UTC)[reply]
Based purely on the plain meaning of 'from scratch,' I would say that if the majority of the article's text is AI generated, then this guideline would suggest that the article should be deleted.
If a 'first draft' was written with AI and then substantially rewritten by a human, it would essentially be the same as doing it from scratch by the human, so it gets a pass.
'From scratch' to me implies you had nothing before, now you have an article. If that article was written with AI, then it falls afoul of this guideline. Athanelar (talk) 15:07, 27 November 2025 (UTC)[reply]
I would argue that there are actually two ways to parse how the “from scratch” guideline applies:
1. (as intended) You may not use an LLM to write a wholly new article that does not exist on Wikipedia as of yet.
2. You may not write an article by asking an LLM to generate it "from scratch" -- i.e., without putting in any information. (Implied: you may use an LLM if you provide it with raw data.)
In other words, it is entirely possible to read the “from scratch” clause as referring to the LLM generation process, and not the Wikipedia article process. ~2025-36891-99 (talk) 20:09, 27 November 2025 (UTC)[reply]
The answer is: No. To delete an article, it must be done in accordance with the wp:Deletion policy. —Alalch E. 23:37, 27 November 2025 (UTC)[reply]
IMO this misses the point. We don't set policy based on what is possible, but based on the overall impact on the project. For example, I am sure there are users who could constructively edit within WP:PIA from their first edit, but we don't let them, because on average letting inexperienced users edit in that topic area was leading to huge problems. Same logic applies here. We need to set LLM policy based on overall impact to the project. NicheSports (talk) 23:57, 27 November 2025 (UTC)[reply]
We don't let new editors edit in the PIA topic area because ArbCom remedies are binding and cannot be overturned by fiat. This guideline is not like that. Reasonable exceptions should still be allowed. SuperPianoMan9167 (talk) 00:13, 28 November 2025 (UTC)[reply]
I was speaking more generally about how our LLM PAGs should develop in the future. This guideline is far from ideal and clearly is going to change. I don't know the right first step, I just know what I want it to get to. NicheSports (talk) 00:16, 28 November 2025 (UTC)[reply]
Is your ideal LLM guideline something like WP:LLM? SuperPianoMan9167 (talk) 00:20, 28 November 2025 (UTC)[reply]
WP:LLM covers a lot, so there are parts I'd probably agree with, but as it relates to usage of LLMs, no. My ideal policies would be
  • LLMs cannot be used to generate article prose or citations, regardless of the amount of review that is subsequently performed, unless the editor is experienced and possesses the llm-user right
  • Experienced editors could apply for the llm-user right, with the same requirements as autopatrolled
  • Users without the llm-user right could use LLMs for non-prose-generating tasks. A few examples of this could be generating tables, doing proofreading, etc. We would need to draft an approved list of uses
  • I want to add a G15 criterion for machine-generated articles with multiple material verification failures. This would efficiently handle problematic LLM-generated articles
  • Content-policy-compliant LLM-generated articles would not need to be deleted, although if they were discovered to be created by a user without the llm-user right, we would warn the user not to do so in the future.
NicheSports (talk) 00:38, 28 November 2025 (UTC)[reply]
So kinda like how AutoWikiBrowser (LLMs, like AWB, could be considered automated editing tools that assist a human editor) requires special approval? SuperPianoMan9167 (talk) 00:41, 28 November 2025 (UTC)[reply]
Yes, but with more restrictive criteria than AWB. I think the autopatrolled requirements are a nice fit (and kind of spiritually related) NicheSports (talk) 00:46, 28 November 2025 (UTC)[reply]
Please drop tables from the list of approved uses. It does generate them, and at face value seems to do it well, but under the hood is a different story. Maybe there's some version that does it well, or we could put guardrails on it, but GPTs format tables with overlapping column and row spans that are barely human-readable. They're great with templates in general, though, if you check that they haven't done more than copy and paste: "put this text in this template following these rules" usually works beautifully, but not tables; the wiki table formatting is just too weird, I guess. ~ Argenti Aertheri(Chat?) 02:10, 28 November 2025 (UTC)[reply]
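A minimal wikitext sketch, with placeholder values rather than real data, of the kind of table being discussed; the failure mode described above is output where extra colspan/rowspan attributes keep the rows from lining up:
{| class="wikitable"
|+ Placeholder caption
! Column A !! Column B
|-
| value 1 || value 2
|-
| colspan="2" | a spanning cell of the sort GPT output tends to scatter where it does not belong
|}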
This is a very nice proposal, reflecting both the current situation (AI is simply as good as most humans on many technical tasks, so banning its use makes no sense) and concerns about a flood of disastrous content generated with AI due to ignorance, greed, or malice. Викидим (talk) 18:24, 2 December 2025 (UTC)[reply]

Content self feedback

[edit]

I would like to suggest that the concept of a closed-loop system be considered and somehow discussed in the guideline. The LLM nightmare is when other sources pick up half-baked content from AI-generated material, and those sources are then picked up again in turn. The feedback can continue until eventually many sources affirm each other. The term to use then is: jambalaya knowledge. Yesterday, all my dreams... (talk) 16:17, 29 November 2025 (UTC)[reply]

We do have WP:CITOGENESIS, which describes this regarding Wikipedia. Not quite the same, but Wikipedia is a big feeder for AI training sets. Gnomingstuff (talk) 17:06, 29 November 2025 (UTC)[reply]
I did not know about that page, so thank you. The LLM problem is in fact a super turbocharged version of that. Yesterday, all my dreams... (talk) 20:59, 29 November 2025 (UTC)[reply]
We do have a mainspace article on model collapse which is the term for this phenomenon in large language models. It's not really relevant to this guideline specifically, though. Athanelar (talk) 14:29, 30 November 2025 (UTC)[reply]

Nutshell

[edit]

@Novem Linguae: Nothing personal, but I challenge your assertion that this page is too short to have a nutshell. Having a modicum of humor helps keep this project from drowning in bureaucracy.  — Hex talk 14:38, 30 November 2025 (UTC)[reply]

 You are invited to join the discussion at Wikipedia:Village pump (policy) § RfC: Replace text of Wikipedia:Writing articles with large language models. –Novem Linguae (talk) 23:40, 5 December 2025 (UTC)[reply]

so... anyone want to RFCBEFORE

[edit]

this is the official statement that an RFCBEFORE was attempted on December 8, 2025, so anyone saying that there wasn't one can be referred to this timestamp Gnomingstuff (talk) 15:58, 8 December 2025 (UTC)[reply]

I put my ideas for what a comprehensive LLM guideline should have in User:SuperPianoMan9167/LLM guideline ideas. SuperPianoMan9167 (talk) 16:59, 8 December 2025 (UTC)[reply]
The proposals are also considered RFCBEFORE attempts. Mikeycdiamond (talk) 19:37, 8 December 2025 (UTC)[reply]
They sure are. Maybe if the magic word "RFCBEFORE" is intoned, people will actually acknowledge that fact. Gnomingstuff (talk) 21:09, 8 December 2025 (UTC)[reply]
I think people are mostly complaining about the short time period between when Qcne posted v3 of the proposal and when the RfC was opened. SuperPianoMan9167 (talk) 22:04, 8 December 2025 (UTC)[reply]
In support of this idea, I hereby invite all interested parties to leave a short comment on my talk page, especially about what they most want to see in the guideline. Of course I am just a random editor who has had only a little bit of involvement in all this, but I'm here for RFCBEFORE to get this moving to the next stage. --Edwin Herdman (talk) 21:53, 8 December 2025 (UTC)[reply]
Imo, this will be easier to keep track of if it's all in one place. Join us on the talk page for SuperPianoMan's ideas instead? ~ Argenti Aertheri(Chat?) 23:45, 8 December 2025 (UTC)[reply]
Agreed but I think this should be the one place, more visible than a userpage Gnomingstuff (talk) 23:52, 8 December 2025 (UTC)[reply]
The list of things I would like to see in a comprehensive LLM guideline is too long to put here unless I wrap it up in a {{cot}}. Besides, Qcne did the same thing (write it on a userpage) with his proposed guideline. I can transclude the text on this page so that it is more visible if that's more helpful. SuperPianoMan9167 (talk) 23:55, 8 December 2025 (UTC)[reply]
I think Gnomingstuff is talking about discussing it here. Mikeycdiamond (talk) 11:52, 9 December 2025 (UTC)[reply]
Also, Qcne is currently hosting a RFC on this guideline at the Village Pump. Mikeycdiamond (talk) 13:25, 9 December 2025 (UTC)[reply]
I am aware of the RfC and have contributed to it multiple times. The entire point of this is because people are dismissing the RfC out of hand because "discussion did not take place," when we're coming up on 3 years of discussion of AI policy as well as 4 months of concentrated recent discussion, I'm curious what made you think I didn't know about it?
I would prefer all discussion happen here, in one consolidated place, rather than several scattered places including userpages and the like. That way it's impossible for anyone to say it didn't happen. Or rather, it's very possible for people to say that, and they probably will, but at least there will be a solid timestamp to point to. Gnomingstuff (talk) 13:45, 9 December 2025 (UTC)[reply]
That way it's impossible for anyone to say it didn't happen. People definitely still will, but you're right that having it directly here would enable pointing to the "you had your chance" diff. SuperPianoMan, {{cot}} it I guess? ~ Argenti Aertheri(Chat?) 13:54, 9 December 2025 (UTC)[reply]
I wasn't. I was just referring to the fact that a RFCBEFORE after a RFC started is redundant. Also, I am losing track of who contributed where between the AI and temp account discussions. I currently have 24 notifications, and I'm exhausted. Mikeycdiamond (talk) 14:17, 9 December 2025 (UTC)[reply]
It's probably not going to pass because of the contradictory language in the proposal, which is feedback I raised in the (24 hour) RFCBEFORE that Qcne ignored for some reason. Anyways, you know that I'm likely to support anything you propose, but what do you want to discuss here G? I think the best way forward would be to stick with Qcne's proposal but adjust it based on the RFC feedback. NicheSports (talk) 14:25, 9 December 2025 (UTC)[reply]
Yes, I agree that editing Qcne's proposal might be the best way forward. Also, my talk page has moved at a glacial pace over the last two decades, but point taken, I'm striking the recommendation to use my talk page. --Edwin Herdman (talk) 16:48, 9 December 2025 (UTC)[reply]
RFCBEFORE isn't some checkbox or hoop to jump through. Simply saying "this is an RFCBEFORE" doesn't somehow inoculate the RFC that comes out of this. The problem with the current RFC is that the RFC question--the specific proposal--had only been up for 12 hrs or so before the RFC. There is no specific proposal here, so, if and when somebody drafts an RFC question or RFC proposal, they should still wait days after posting the proposed RFC question/proposal here for feedback, before launching the RFC. That was the RFCBEFORE mistake made in the current RFC.
More substantively, before trying to write a guideline that summarizes consensus, consider first trying to figure out what the consensus is. So, a potential RFC question might be something like "Should all LLM use on Wikipedia be banned?" or, "What LLM use should be allowed and what LLM use should not be allowed?" When we have the answers to those questions, then we'll have a better understanding of what the consensus is, and then we'll have a better chance of writing a guideline that documents that consensus. Avoid a "cart before the horse" situation. Levivich (talk) 20:08, 9 December 2025 (UTC)[reply]
Maybe we need to clarify WP:RFCBEFORE. The main point of the RFCBEFORE section (I know, I know, Wikipedia:Nobody reads the directions...) is:
  • Before you start an RFC, see if you should be using some other, non-RFC method (e.g., an ordinary discussion on the talk page, or a peer review or whatever) instead of an RFC.
In other words, RFCBEFORE is primarily about ways you can avoid having an RFC at all. That's not viable for a WP:PROPOSAL.
RFCBEFORE is not:
  • a requirement,
  • a delaying tactic,
  • a discussion about whether to have an RFC, or
  • a discussion about how to word an RFC question.
In this case, I think that the proposal would have been more successful if the OP had engaged in more discussion before exposing the proposal to friendly fire (see also "It is crucial to improve a proposal in response to feedback received from outside editors. Consensus is built through a process of listening to and discussing the proposal with many other editors" in WP:PROPOSAL), but the problem there isn't a supposed violation of RFCBEFORE "rules". The problem there is that the proposal wasn't strong enough to get accepted. Those "but there wasn't any RFCBEFORE" comments should probably be understood as meaning something like "you jumped the gun with a weak proposal, and now we've lost one of our best chances to get these ideas approved". WhatamIdoing (talk) 22:00, 9 December 2025 (UTC)[reply]
I think the main benefit of RFCBEFORE for this particular topic is to hash out the wordings of everything. Once you launch the big RFC, it is difficult to change any wordings. So for example, in Wikipedia:Village pump (policy)#RfC: Replace text of Wikipedia:Writing articles with large language models, it's currently at approximately 22 support, 16 oppose. But many of the opposes cite that the proposed wordings contradict each other, with a couple sentences completely forbidding LLM use, then a couple sentences right after saying only raw or lightly edited LLM output is forbidden. This is the kind of thing to hash out in an RFCBEFORE. If this had been properly hashed out, perhaps there'd be less opposes in the current RFC. –Novem Linguae (talk) 05:06, 10 December 2025 (UTC)[reply]
I agree with you that a discussion of the proposal would have been helpful in this instance.
Can you agree with me that nothing related to that is found in WP:RFCBEFORE, and that advice on discussing RFC questions before starting an RFC is instead found in Wikipedia:Requests for comment#Statement should be neutral and brief? WhatamIdoing (talk) 05:34, 10 December 2025 (UTC)[reply]
Perhaps the defined meaning of WP:RFCBEFORE and the commonly used meaning have diverged. I think of RFCBEFORE as improving RFC questions and options so that the future RFC isn't bogged down by those issues and can focus more on substance. –Novem Linguae (talk) 06:28, 10 December 2025 (UTC)[reply]
I think that might be the case. People sometimes guess from the WP:UPPERCASE what it "ought" to mean, without checking to see what it actually says. WhatamIdoing (talk) 19:16, 10 December 2025 (UTC)[reply]
This sort of thing often happens. The use of "Keep per WP:CHEAP" at RfD has now more or less completely diverged from what WP:CHEAP actually says. Cremastra (talk · contribs) 19:54, 10 December 2025 (UTC)[reply]
Maybe you'd like to add both of those to WP:UPPERCASE. WhatamIdoing (talk) 23:42, 10 December 2025 (UTC)[reply]

Ideas from SuperPianoMan9167

[edit]

Here are my ideas (not proposals yet) for what a comprehensive LLM guideline should have (transcluded from one of my user subpages). This is NOT a draft guideline. This is a list of ideas for what a hypothetical all-inclusive LLM guideline would include.

Feedback is welcome. SuperPianoMan9167 (talk) 14:38, 9 December 2025 (UTC)[reply]

I think this is far too involved. 1. Our guideline doesn't need to define what an LLM is; that's what our articles are for. 2. This needlessly duplicates a lot of existing policies and guidelines, which do not stop applying once LLMs are involved. 3. I don't think it should contain exhaustive lists of what uses are okay, because editors who screw up may then point to it as a kind of "get out of jail free" card and we may end up misleading folks into getting themselves blocked. 4. Codifying a lot of this as a guideline, including AISIGNS, risks lagging behind the fast-changing technology that LLMs are and eliminating the very editor discretion and common sense that distinguishes us from machines. Toadspike [Talk] 14:57, 9 December 2025 (UTC)[reply]
My responses to these points, in order:
  1. We still need to define LLMs because very often, editors are accused of using LLMs, they say "I didn't, I used [some grammar tool]!", and then other editors point out that many of those tools (like Grammarly) do in fact use LLMs. So I think some guidance is necessary to avoid this confusion on what an LLM is for the purposes of the guideline.
  2. I think it is useful to define how exactly existing policies apply to LLMs so that we have solid justifications for the guideline and so editors don't just think that Wikipedia is run by AI-haters when they get blocked for misusing LLMs. The point of a policy or guideline is to instruct; pointing to other policies and guidelines and saying "you can't use LLMs because they violate [X policy]" is not at all helpful to new editors who have no idea what [X policy] says. For this reason, we should explain both what [X policy] says and how LLMs violate it.
  3. This argument has been made many, many, many times before. I believe it is invalid because guidelines are not supposed to be lists of rules that you must follow or face consequences, despite many users endorsing such rules for LLMs (i.e. they want a guideline consisting only of enforceable rules like "if you use LLMs to write articles they will be deleted. If you repeatedly do this you will be blocked.") Again, the point of a policy or guideline is to instruct, which means giving both dos and don'ts. For example, the sockpuppetry policy includes both illegitimate and legitimate uses of alternative accounts.
  4. LLMs are not a fast-changing technology; they are a stagnating technology. ChatGPT is three years old. The transformer, the type of neural network powering LLMs, is eight years old. The AI industry has basically forced itself into a dead end. The problems with LLM-generated articles, like hallucination, are due to the inherent limitations of the models themselves, and these limitations cannot be overcome just by making the models larger.
One last point: the main reason why I think this guideline would be helpful, despite it potentially being redundant to other policies and guidelines, is that it's more helpful to point to one page and say "that's our AI guidelines" than to say "well actually, we have no AI guideline per se, but you can't write articles with AI, you can't write comments with AI, you can't use AI-generated images..." etc. SuperPianoMan9167 (talk) 16:51, 9 December 2025 (UTC)[reply]
LLMs might not be a fast-changing technology, but the characteristics of LLM output are fast-changing, in unpredictable (or at least heavily guarded) ways. AI-generated text from January 2024 reads way differently than AI-generated text from December 2025. Claude text probably reads differently than GPT-5 text than Gemini text than GPT-4o text. (Grok text absolutely does.) OpenAI is apparently rushing out a new ChatGPT update in a couple of days.
This means that yes, AISIGNS is always going to be a lagging indicator (as is the research it cites). The whole thing is like trying to map a black box. I don't think that's a huge problem, though, because A) even the outdated stuff, like the promotional "stands as a testament" crap, is still useful for finding undetected AI edits from 2023-2024, B) it's not meant to be a tool to sanction editors, and C) any AI guideline is useless if it doesn't contain guidance on how to actually find the AI text. Gnomingstuff (talk) 20:38, 10 December 2025 (UTC)[reply]
In my mind it's less a policy draft and more a list of requirements for that policy, so we can at least all agree on what exactly we're trying to write here. Thus, since your points 1-3 have been repeatedly requested during various (before) RfCs, they should probably be touched on at least in the final policy. No policy is ever going to make everyone happy, but hopefully, if we can mostly agree on what that policy should include, we can stop getting sidetracked by minutiae. ~ Argenti Aertheri(Chat?) 18:46, 9 December 2025 (UTC)[reply]
In my mind it's less a policy draft and more a list of requirements for that policy, so we can at least all agree on what exactly we're trying to write here Yes, exactly. SuperPianoMan9167 (talk) 19:24, 9 December 2025 (UTC)[reply]
Regarding point 3: I'm not the only one who disagrees with the idea that listing acceptable uses will encourage the unacceptable uses:

I disagree with "anyone who gets accused of using them unacceptably is just going to claim they were doing one of the acceptable things". This is an old argument on Wikipedia that I've seen raised many times, and I think it's bad advice, contrary to the fundamental purpose of guidelines, which is to teach. The notion that we shouldn't outline what is acceptable because people who do unacceptable things will claim it's acceptable is nonsensical to me.
— User:Levivich

Quoted from this comment. SuperPianoMan9167 (talk) 19:34, 9 December 2025 (UTC)[reply]
I strongly agree with An explanation of what an LLM is and how it differs from other applications of AI and simple automation (e.g. spellchecking). Let's develop a good, shared understanding of what "this" is before we declare that "this" is banned. WhatamIdoing (talk) 22:03, 9 December 2025 (UTC)[reply]
On that note, since the real issue for a lot of us is the generative parts specifically, should we actually be using "GPT" instead of "LLM"? ~ Argenti Aertheri(Chat?) 23:23, 9 December 2025 (UTC)[reply]
A GPT is a specific type of LLM. GPT stands for "generative pre-trained transformer". The problem with using GPT instead of LLM is people are guaranteed to conflate "GPT" with "ChatGPT" (this has led to brand issues with OpenAI's models). SuperPianoMan9167 (talk) 23:41, 9 December 2025 (UTC)[reply]
I know GPTs are a subset of LLMs, but they're the one that's actually causing headaches. Sucks for OpenAI, but for us perhaps something like "GPTs, including but not limited to ChatGPT"? Someone must have market data we can use to build a list of the popular ones. ~ Argenti Aertheri(Chat?) 02:00, 10 December 2025 (UTC)[reply]
Using the long name of Generative pre-trained transformer might interrupt the GPT = ChatGPT assumption. WhatamIdoing (talk) 05:36, 10 December 2025 (UTC)[reply]
I don't know whether this is better suited to a guideline or an essay, but I really would like to see a section on "what to do if someone says you used AI." Rough idea below:
When someone asks whether you used AI, they're not trying to ban you or accuse you of editing in bad faith. They are trying to gather information that they currently don't have, and then use that information to improve the article. As the editor, you and you alone know whether you used AI (and if you truly don't know, that's a little concerning). The most productive way to respond, then, is to provide that information:
If you did use AI: Say that you used AI, including the version and prompt, the workflow, and the review you did. (Reminder about LLMDISCLOSE here if it ends up being policy, which it should).
Dodging the question is unlikely to go the way you want it to. In particular, the following are common responses to accusations of AI that usually don't go well:
  • Saying things like "if there is AI," "there shouldn't be AI," etc. We don't want to know whether there should be AI, or whether there could be AI -- we want to know whether there is.
  • Saying that "an AI detector said it wasn't AI." We don't want to hear from an AI detector, we want to hear from you.
  • Asking them to prove that you used AI. No one can prove whether you used AI but yourself; if you want proof, provide it yourself.
  • Responding to the question with AI (reminder here about HATGPT).
  • Asking for the AI cleanup tag to be removed from the article before the cleanup is done. If you put AI-generated text into an article, then a template stating that the article contains AI-generated text is a true statement.
  • Lying (obviously)
  • (am I missing anything? I probably am)
If you didn't use AI: Say that you didn't use AI. There is no reason to dance around the truth here. Also, consider the possibility that you used AI without knowing it. Tools like Grammarly use the same language models as other AI and often produce the same problems as them. An increasing amount of software, including Microsoft Word, incorporates AI into certain features. Most of them are not especially transparent about this; this is not your fault.
Also, keep in mind:
  • If someone says that an article is AI, they are talking about the article, not criticizing you personally.
  • No one owns an article. This is a wiki and anything can be edited or deleted at any time, regardless of how much work you put into it or how much you want it to stay the same.
  • (am I missing anything?)
Gnomingstuff (talk) 21:08, 10 December 2025 (UTC)[reply]
Since that keeps coming up in these discussions, your last section should probably include that questions about your AI use are not accusations of bad faith, but might reflect a competence issue and WP:LLMCIR. ~ Argenti Aertheri(Chat?) 22:25, 10 December 2025 (UTC)[reply]

Where do we go from here?

[edit]

Sorry for fucking up the RfC. I am fairly certain it will end up as no-consensus after well-reasoned Oppose votes. I rushed into it, my first RfC, after feeling pretty good about my Version 3 posted above.

So, where do we go from here?

I have been avoiding commenting in the RfC so as not to prejudice the result, but have been thinking about the feedback and the clear split in the community.

Would an RfC with four clearly defined options be better?

Option 1: Status Quo

[edit]

Retain the current text of WP:NEWLLM.

  • Summary: Large language models should not be used to generate new Wikipedia articles from scratch.
  • Pros: Short, simple, current consensus.
  • Cons: Vague definition of "from scratch"; does not address LLM text added to existing articles; does not address talk page use.

Option 2: Prohibition on Unreviewed LLM Content

[edit]

User:Qcne/LLMGuidelineOption2

  • Summary: Prohibits adding unreviewed LLM content; permits use only with rigorous verification.
  • Pros: Focuses on editor responsibility; permits some non-disruptive LLM assisted edits; hopefully resolves the contradictory language from the RfC.
  • Cons: Not acceptable to anti-LLM editors; does not define acceptable LLM use; does not have an LLM use disclosure section.

Option 3: Prohibition on LLM Content

[edit]

User:Qcne/LLMGuidelineOption3

  • Summary: Prohibits LLM content.
  • Pros: Clear and enforceable; unambiguous.
  • Cons: Not acceptable to pro or neutral-LLM editors; may be overly restrictive for constructive tools; enforcement relies on unreliable detection.

Option 4: Limited LLM Use with Disclosure

[edit]

User:Qcne/LLMGuidelineOption4

  • Summary: Permits limited LLM assistance with mandatory disclosure; prohibits generation from scratch.
  • Pros: Promotes transparent LLM use; codifies some best practices.
  • Cons: Not acceptable to anti-LLM editors; limited-use boundaries may be pushed.

I feel slightly at a loss on the appropriate next steps. qcne (talk) 21:58, 10 December 2025 (UTC)[reply]

Differences in a table

[edit]

This is an AI-generated comparison (see the details in the edit comment), but I am the one responsible for errors and omissions. Feel free to edit or object. --Викидим (talk) 23:22, 10 December 2025 (UTC)[reply]

Comparison of LLM Guideline Options

Title
  • Option 2: Prohibition on Unreviewed LLM Content
  • Option 3: Prohibition on LLM Content
  • Option 4: Limited LLM Use with Disclosure
Nutshell
  • Option 2: Focuses on prohibiting unreviewed content.
  • Option 3: Focuses on prohibiting content generation entirely.
  • Option 4: Focuses on prohibiting unreviewed content (same as Option 2).
Scope
  • Identical across all three versions: defines LLMs, applies to all models/outputs, and links to the main information page.
Primary Restriction Policy
  • Option 2: "Do not use an LLM to add unreviewed content". Permits LLM use only if thoroughly reviewed. Defines "unreviewed" as output not checked line-by-line against reliable sources.
  • Option 3: "Do not use an LLM to add content". Strict prohibition. States that using an LLM to generate new articles, drafts, or expand existing articles is "not permitted".
  • Option 4: "Do not use an LLM to add unreviewed content" (same text as Option 2). Permits LLM use only if thoroughly reviewed.
Specific Prohibitions (Bulleted List)
  • Option 2: Prohibits pasting "raw or unreviewed" LLM output.
  • Option 3: Prohibits pasting "LLM output" (removes the "raw or unreviewed" qualifier).
  • Option 4: Prohibits pasting "raw or unreviewed" LLM output.
Limited Use & Disclosure
  • Option 2: Section not present.
  • Option 3: Section not present.
  • Option 4: Limited use: strongly discourages use; suggests use only for narrow tasks (e.g., copyediting) by experienced editors. Disclosure and responsibility: requires editors to disclose LLM assistance in the edit summary; reaffirms editor responsibility for content.
Handling Existing Content
  • Identical across all three versions: allows removal, replacement, tagging, or deletion (including speedy deletion G15) of problematic LLM content.
See Also
  • Identical across all three versions.

Discussion of the options suggested by Qcne

[edit]

Quick question about "It does not cover spellcheckers, grammar checkers": are you aware that Grammarly is now Grammarly AI? People don't even realize they are using AI; they think it's just a normal spellchecker/grammar checker. Polygnotus (talk) 22:11, 10 December 2025 (UTC)[reply]

I believe that our next proposal should still be based on Qcne's drafts. The proposal at the open RFC is roughly the right length and has a lot of support - I want to avoid ping ponging back and forth between totally different approaches. Many decisions and modifications to be made but let's stay on the Qcne path! NicheSports (talk) 22:24, 10 December 2025 (UTC)[reply]

The discussion around SuperPianoMan's ideas directly above this isn't an attempt to write a guideline, but rather to reach some consensus as to what the final guideline should include. I'd encourage you to join in regardless of which option you prefer here, as "do we define LLM in the guideline or not" is policy-agnostic. ~ Argenti Aertheri(Chat?) 22:37, 10 December 2025 (UTC)[reply]

I tried striking my comments but I kept hitting an edit conflict as Polygnotus has already collapsed it. Thank you for that. There's too much nuance to make such a comparison. SuperPianoMan9167 (talk) 22:26, 10 December 2025 (UTC)[reply]

@SuperPianoMan9167 Thanks! Yeah its complicated stuff. Polygnotus (talk) 22:27, 10 December 2025 (UTC)[reply]

It may be a good idea to explain that LLMs cannot and do not differentiate between training data and input, and cannot summarize text. Polygnotus (talk) 22:34, 10 December 2025 (UTC)[reply]

I agree that it may be a good idea to (briefly) explain how LLMs work in general, as that will demonstrate why the output is not policy-compliant. The text at the start of this section of AISIGNS seems like a good foundation to modify. SuperPianoMan9167 (talk) 22:37, 10 December 2025 (UTC)[reply]
Yeah it would be nice to have a place you can point people to when they don't understand why LLMs are a problem/which of their shortcomings can harm Wikipedia. Polygnotus (talk) 22:44, 10 December 2025 (UTC)[reply]
Which is why I think a short guideline that doesn't explain itself is not the policy path we should be pursuing here. Otherwise, newbies will see the guideline and think that all Wikipedians are AI-haters. The current system, where WP:LLM provides the actual background information, doesn't seem to be working as WP:LLM is liable to be dismissed as "just an essay". SuperPianoMan9167 (talk) 22:55, 10 December 2025 (UTC)[reply]
I mean when wording this we should also be concerned about the flipside, newbies seeing the guideline and thinking that all Wikipedians are AI supporters. I don't know the proportions of those two groups, but given the recent wave of WMF advertising about how this is the human encyclopedia in the age of AI and that's why you should donate money... Gnomingstuff (talk) 23:05, 10 December 2025 (UTC)[reply]

The proposals 2-4 share a lot of text. It would be nice to have the textual differences in table form here, so that the eye-straining "visual" diffs are not required. --Викидим (talk) 22:48, 10 December 2025 (UTC)[reply]

 Done using AI. Викидим (talk) 23:17, 10 December 2025 (UTC)[reply]
Per the edit summary it was Gemini, and it handled colspans properly. Maybe it's just that rowspans are already barely human-readable? Could you please plop an example of Gemini doing a more complicated table on my talk page? This is the first "correct" AI-generated table I've seen and I'm curious how you did it since I've seen so many bad ones. ~ Argenti Aertheri(Chat?) 23:27, 10 December 2025 (UTC)[reply]

You did not fuck up the RfC. Gnomingstuff (talk) 23:01, 10 December 2025 (UTC)[reply]

I think the main problem with the proposed draft is that it needed a few folks with wikilawyering skills to find and fix all the loopholes and similar problems before it was proposed.
I suggest that we pause efforts to expand this guideline. There is nothing that we can do that will improve the behavior of editors in the next few months anyway. The next few months are worth thinking about, because January usually brings an uptick in new editors (or new UPEs, perhaps), and that probably means an uptick in AI-generated contributions. We can come back to it when it's clearer, and we can propose bits and pieces piecemeal.
In the meantime, if Wikipedia:Village pump (idea lab)#Wikipedia as a human-written encyclopedia reaches agreement, then we might be able to turn that agreement into a MediaWiki: message that discourages AI-generated content ("Wikipedia is written by humans. Please don't copy/paste content from AI or chatbots here"?). In the short term, that might be more protective than a guideline that nobody reads. WhatamIdoing (talk) 00:00, 11 December 2025 (UTC)[reply]
That "Wikipedia as a human-written encyclopedia" discussion is very very unlikely to lead to anything, people don't even agree on if that is a good claim to make and certainly not what to do with that idea (banner? policy? something else?).
we might be able to turn that agreement into a MediaWiki: message that discourages AI-generated content People who don't read guidelines are unlikely to read (or care about) such messages. Polygnotus (talk) 00:14, 11 December 2025 (UTC)[reply]
I don't think that wikilegalese would help at all. Common-law-like ambiguity is useful: it allows the community to set the boundaries later without going through any formal processes. This is especially important in a seminal situation like this, when rules that are too lax will cause humans to drown in a sea of slop, while rules that are too tight will cause us to lose out to the grokipedias of the forthcoming world - scary, but inescapable IMHO. Викидим (talk) 01:06, 11 December 2025 (UTC)[reply]
Yeah the normal procedure is to iron out the details as we go. Polygnotus (talk) 01:21, 11 December 2025 (UTC)[reply]
This AI thing is new and big and scary-looking. So my right hand is stretched out for a handshake, while left has brass knuckles ready just in case. Викидим (talk) 01:29, 11 December 2025 (UTC)[reply]
@Викидим Are you right or left handed? Polygnotus (talk) 10:06, 11 December 2025 (UTC)[reply]
Good question. Physically, my left is stronger. Otherwise, I am a righty. Викидим (talk) 10:29, 11 December 2025 (UTC)[reply]


I love option three, but could you make a minor change?

Editors should not use an LLM to generate content for Wikipedia
+
Editors should not use an LLM to generate content for Wikipedia, even if you edit the result

Also, is this an RFCBEFORE? Mikeycdiamond (talk) 02:36, 11 December 2025 (UTC)[reply]

This is not going to work. It is akin to requiring editors to write manuscripts by hand and accepting only the handwritten images, not typed text. This cat is out of the bag, and there is no way to convince people who have tried AI to abandon the 10x improvement in performance on the most tedious and annoying tasks, like creating {{cite book}}s or tables. The fact is: on these tasks, AI is not only faster, it is better than many of us (definitely better than me). Pretty soon it will be better than most of us at actually writing the texts. We need to learn how to use it to our advantage, not prohibit it. So option 3 will not have any staying power. I personally prefer option 4 with its explicit endorsements of some use. Викидим (talk) 08:20, 11 December 2025 (UTC)[reply]
After testing ChatGPT with citation templates, I agree with you that AI is good at menial tasks, but I disagree that AI will ever get better at writing than us. Nonetheless, option 4 has gotten popular support, so I would like to recommend some changes to it.
Check the output they intend to use against suitable reliable sources.
+
Check the output they intend to use against the suitable reliable sources that the AI cited, which they must cite as the source of the text. If the AI didn't cite reliable sources, get the AI to generate text with information from reliable sources.
Editors should disclose LLM assistance in the edit summary (e.g. "copyedited with the help of ChatGPT 5.1 Thinking"). This helps other editors understand and review the edit.
+
Editors must disclose LLM assistance in the edit summary (e.g. "copyedited with the help of ChatGPT 5.1 Thinking"). This helps other editors understand and review the edit.
Mikeycdiamond (talk) 19:32, 11 December 2025 (UTC)[reply]
I like this in principle, although I'm afraid that your last addition to the first paragraph might be interpreted by some editors as "get AI to add sources to the text it wrote", which usually won't be great from a text-source integrity perspective (or, for that matter, from a writing perspective). Chaotic Enby (talk · contribs) 19:38, 11 December 2025 (UTC)[reply]
Is this better?
Check the output they intend to use against suitable reliable sources.
+
Check the output they intend to use against the suitable reliable sources that the AI cited, which they must cite as the source of the text. Use AIs that automatically cite sources, such as ChatGPT's search program, and ask the AI to only use reliable sources. Make sure to also manually check the reliability of the sources, either by yourself or through the [[Wikipedia:Reliable sources/Noticeboard|Reliable Source Noticeboard]]
Mikeycdiamond (talk) 20:39, 11 December 2025 (UTC)[reply]
That works much better, thanks! Not sure if "by yourself" and "through RSN" are exclusive here – for many sources, there is a consensus at RSN, and we don't want editors to sidestep it by either going "looks trustworthy enough" or starting a new repetitive discussion. Maybe:
Check the output they intend to use against suitable reliable sources.
+
Check the output they intend to use against the suitable reliable sources that the AI cited, which they must cite as the source of the text. Use AIs that automatically cite sources, such as ChatGPT's search program, and ask the AI to only use reliable sources. Make sure to also manually check the reliability of the sources, and look for previous discussions of these sources at the [[Wikipedia:Reliable sources/Noticeboard|Reliable Source Noticeboard]].
Chaotic Enby (talk · contribs) 20:56, 11 December 2025 (UTC)[reply]
This is great! Should we implement it, or wait for more input? Mikeycdiamond (talk) 21:03, 11 December 2025 (UTC)[reply]
LLMs are not good at being search engines, despite being able to cite sources. I think a better approach would be to recommend finding sources manually first and then including them in the prompt. SuperPianoMan9167 (talk) 21:05, 11 December 2025 (UTC)[reply]
Well, the AI gets the source from somewhere, and it is better to get the sources it used than working backward. Mikeycdiamond (talk) 21:10, 11 December 2025 (UTC)[reply]
I agree, working backward makes it <s>easier</s> harder. SuperPianoMan9167 (talk) 21:18, 11 December 2025 (UTC)[reply]
"This is much harder than doing it forward. Experienced editors would say at least 20 times as hard. Step 2 is of course the difficult bit." The essay you cited seems to disagree with you. Also, take a look at my revision of the edit. Mikeycdiamond (talk) 21:21, 11 December 2025 (UTC)[reply]
Oops, I fell victim to WP:UPPERCASE again :) Sorry! SuperPianoMan9167 (talk) 21:40, 11 December 2025 (UTC)[reply]
I think I was under the impression that "backwards" meant "sources first", which is not at all what the essay says. trout Self-trout SuperPianoMan9167 (talk) 21:49, 11 December 2025 (UTC)[reply]
It is fine. Could you take a look at the revision to the edit I made in response to your criticism? Mikeycdiamond (talk) 21:51, 11 December 2025 (UTC)[reply]
I just did. I was also politely asked to be more thoughtful of how often I comment on LLM PAG discussions, so I might not reply here for a bit. (I bludgeoned Qcne's RfC and I apologize.) SuperPianoMan9167 (talk) 21:56, 11 December 2025 (UTC)[reply]
How about this?
Check the output they intend to use against suitable reliable sources.
+
Check the output they intend to use against the suitable reliable sources that the AI cited, which they must cite as the source of the text. They should find sources beforehand and prompt the AI to create an article from those sources. Make sure to also manually check the reliability of the sources and look for previous discussions of these sources at the [[Wikipedia:Reliable sources/Noticeboard|Reliable Source Noticeboard]].
Mikeycdiamond (talk) 21:16, 11 December 2025 (UTC)[reply]
Yes, I think this is an improvement. SuperPianoMan9167 (talk) 21:52, 11 December 2025 (UTC)[reply]
Ok, should I implement the edits? I don't have much experience with RFCBEFOREs, or RFCs in general. Mikeycdiamond (talk) 21:58, 11 December 2025 (UTC)[reply]
IMHO, the core of our activity is WP:N and WP:UNDUE, which require human understanding. Therefore we should not allow AI to find sources as it goes. Whatever we do, the guidelines should reflect that the sources are always selected by a human. I see the search for sources and their use as two distinct activities that require a human firewall in between. Викидим (talk) 21:15, 11 December 2025 (UTC)[reply]
Is any of that what you meant? Or were you referring to our talk page discussion that resulted in wording along the lines of automation of filling in templates? If so, here's that discussion: User talk:Argenti Aertheri#Gemini 3 and tables, and an example from my own editing:
<ref>Animal Bones, Carcasses Found At Closed School Archived 2012-03-23 at the Wayback Machine - Dozens Of Surviving Animals Rescued By Upstate Group, WYFF4, September 10, 2010</ref>
<ref>"Animal Bones, Carcasses Found At Closed School". WYFF4. September 10, 2010. Archived from the original on 2012-03-23. Retrieved 2025-07-27.</ref>
I already had all the information for the citation, just in plain text, all ChatGPT did was "put it in this template". ~ Argenti Aertheri(Chat?) 22:40, 11 December 2025 (UTC)[reply]
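A minimal sketch, assuming the standard {{cite web}} parameters, of what the converted citation's underlying wikitext presumably looks like; the URL and archive URL are left as comments because they are not shown in the example above:
<ref>{{cite web |title=Animal Bones, Carcasses Found At Closed School |website=WYFF4 |date=September 10, 2010 |url=<!-- original URL --> |archive-url=<!-- Wayback Machine copy --> |archive-date=2012-03-23 |access-date=2025-07-27}}</ref>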
Anything that happens before an RfC is part of the RFCBEFORE. RFCBEFORE is not a requirement to point at one discussion and say "this satisfies RFCBEFORE!" SuperPianoMan9167 (talk) 15:31, 11 December 2025 (UTC)[reply]

I've said it before and I'm going to say it again since you asked again: the best next step is to figure out what the consensus is regarding LLM use before trying to document that consensus in a guideline. I would oppose Option 4 because of the line "Editors are strongly discouraged from using LLMs," among other lines. You might save yourself a lot of time by first asking the community if it agrees that editors should be strongly discouraged from using LLMs. Levivich (talk) 03:47, 11 December 2025 (UTC)[reply]

Well, regarding your last sentence, I believe that is exactly what this RfC would be for? Usually, guidelines are established through a consensus-making process, instead of having a separate undocumented consensus-making and then writing a guideline out of it. For that matter, I broadly agree with the text of Option 4. Chaotic Enby (talk · contribs) 11:07, 11 December 2025 (UTC)[reply]
That's not accurate, CE (the part about "usually..."). Look into the history of how various guidelines/policies are written. Look at how WP:RECALL was developed: it came out of WP:RFA2024. We didn't start with a fully drafted guideline and adopt it, or even a choice among several fully drafted guidelines. It started with first figuring out what the consensus was, in multiple phases of RFCs, and only after the consensus was determined was the WP:RECALL page written. Same with WP:AELECT, which was also only written after RFA2024. The problem is in trying to write the guideline first, and then figuring out what the community wants after. It's a backwards approach. In fact, I can't think of a time that somebody drafted an entirely new guideline, and it was adopted. Can you? (The current RfC, if it passes, might be the first.) Levivich (talk) 14:31, 11 December 2025 (UTC)[reply]
The "entirely new" part is where I object (although WP:NEWLLM might fall under this). I'm not saying that this is always a one-round voting on an existing guideline proposal, and we do have tools like WP:RFCBEFORE to workshop these guidelines from early drafts. The thing is, that is exactly what is being done right now, so saying that we should first ask the community as a prerequisite to the RFCBEFORE isn't helpful. Chaotic Enby (talk · contribs) 14:35, 11 December 2025 (UTC)[reply]
Not as a prerequisite to an RFCBEFORE. I'm not sure if you don't understand what I'm saying or what, but it's about what the RfC question is. You can have RfC questions that ask "what should the rule be," or you can have RfC questions that propose written text and ask to adopt that text. The former should be done first, then the latter. Often, the latter (the written text) doesn't need an RfC at all. Levivich (talk) 14:39, 11 December 2025 (UTC)[reply]
WP:RECALL and WP:AELECT were both multi-part RfCs to establish entire new processes. Shorter guidelines, such as WP:NEWLLM, have been proposed directly, and asking for a RfC on everyone's opinion before a second RfC on the specific wording (which will be debated in the case of AI policy) mostly adds bureaucracy. I'm not opposed to making this a part-by-part RfC (where each aspect of the guideline is voted on separately instead of giving the voters a single choice of full policies), but we shouldn't dismiss it just on the basis that we have well-defined options for each.
In fact, there have been many cases (like Wikipedia:Requests for comment/2024 Wikipedia blackout) where a RfC was opposed specifically because it asked a broad question and didn't present a full, detailed proposal. Chaotic Enby (talk · contribs) 14:55, 11 December 2025 (UTC)[reply]
Ok, well, hey, maybe the current RfC will end up supporting the proposed guideline and that'll be the end of the issue. Or maybe the next fully-written guideline (or one of several proposed guidelines) will gain consensus.
Your diagnosis, though, is incorrect. If it's true that "Shorter guidelines, such as WP:NEWLLM, have been proposed directly," then can you name one, besides NEWLLM? I can't think of any.
Both RECALL and AELECT came out of the same RFC. It wasn't, as you wrote, an RfC that proposed new processes--you got that backwards. The RfC asked what the consensus was, both as to what "the problem" was and what proposed solutions to those problems were. What came out of that was some new processes as well as reforms to existing processes. Various solutions were tried; some stuck, and others (like the three-day discussion period) were abandoned after trials. The whole point is that RFA2024 asked what the problem was before proposing any solutions. RFA2024 was more extensive than I think an LLM RFC needs to be, but the lesson is to first figure out what the community thinks, and only second to try and craft solutions to those problems (and only third, to try and document it).
FWIW, I don't think your blackout proposal failed because it asked a broad question and didn't present a full, detailed proposal. It didn't ask a broad question, it proposed a specific course of action, for which there was not consensus. I think it failed because of the same issue I'm raising here: proposing a solution without first figuring out what "the problem" is, in the eyes of the community.
But you don't have to listen to me, maybe your approach will work this time, or the next time. Levivich (talk) 15:19, 11 December 2025 (UTC)[reply]

Your diagnosis, though, is incorrect. If it's true that "Shorter guidelines, such as WP:NEWLLM, have been proposed directly," then can you name one, besides NEWLLM? I can't think of any.

Yes, many. Just this year alone, I can think of Wikipedia talk:Please do not bite the newcomers#RfC: Rewriting specific sections (a full guideline rewrite) or Wikipedia talk:Speedy deletion/Archive 93#RfC: Replacing U5 with a primarily procedural mechanism (where editors !voted on fully written speedy deletion criteria).

Both RECALL and AELECT came out of the same RFC. It wasn't, as you wrote, an RfC that proposed new processes--you got that backwards.

WP:RECALL had a standalone, point-by-point RfC, that came after a broader RfC that proposed new processes, of which RECALL was a successful one. A big criticism of how RECALL was handled was exactly that long, multi-part RfC process, and it is not a model to follow in all future RfCs.

FWIW, I don't think your blackout proposal failed because it asked a broad question and didn't present a full, detailed proposal. It didn't ask a broad question, it proposed a specific course of action, for which there was not consensus.

Much of the early opposition (and of the discussion around the proposal) centered around the fact that implementation details weren't provided at first. The #Specifics section was only added later to account for that feedback. Chaotic Enby (talk · contribs) 15:36, 11 December 2025 (UTC)[reply]
When there is an existing best practice that editors commonly follow, I have seen guidance written to codify it gain community consensus approval. I'll agree, though, that for matters where significant numbers of the editors who like to discuss them hold contradictory opinions, it's hard to craft a detailed guideline proposal that will gain enough support to be approved. The inescapable reality is that although many people want to minimize the time they spend considering a matter and so prefer to see one RfC with a proposal, the difficulty in getting such a proposal approved means that it may be better to have multiple phases: determine broad parameters for guidance in an initial phase, and get a proposal approved in a later phase. (This is a pretty common approach for organizations that need to consult a broad population.) Unfortunately, the shifting nature of who participates in any given discussion means it can take a while, possibly with resets happening as different concerns come to the forefront. isaacl (talk) 15:54, 11 December 2025 (UTC)[reply]
undocumented consensus-making is the consensus-making process. SuperPianoMan9167 (talk) 14:33, 11 December 2025 (UTC)[reply]
Yes, but that's what we're doing here as an RFCBEFORE. There's no point in asking for a "before" to the RFCBEFORE. Chaotic Enby (talk · contribs) 14:37, 11 December 2025 (UTC)[reply]
Asking what the problem is is still part of the RFCBEFORE. RFCBEFORE is not just one discussion that you point to and say "that's our RFCBEFORE!" WP:RFCBEFORE actually says If you can reach a consensus or have your questions answered through discussion, then there is no need to start an RfC. Anything that happens before the RfC is part of the RFCBEFORE. SuperPianoMan9167 (talk) 15:29, 11 December 2025 (UTC)[reply]
Does every fucking discussion on-Wiki devolve into arguments over semantics qcne (talk) 15:31, 11 December 2025 (UTC)[reply]
Yes SuperPianoMan9167 (talk) 15:33, 11 December 2025 (UTC)[reply]
Agreed that these meta discussions about an rfcbefore are largely unproductive, and could become disruptive. If they need to happen, can they happen on a different thread? We should focus on Qcne's proposals here. NicheSports (talk) 15:36, 11 December 2025 (UTC)[reply]
I agree that arguing about labels isn't helpful. As has been pointed out by others, Wikipedia:Requests for comment § Before starting the process is actually about alternatives to having a request for comments discussion. The actual concern is more about how much work has gone into developing an RfC question that is likely to result in progress moving forward. So... no matter what it's called, let's just continue the development work. isaacl (talk) 16:00, 11 December 2025 (UTC)[reply]
Option 4 is just as contradictory as the current proposal at the RFC :( is there any chance you could create a "CE version" of option 4 that tries to resolve those contradictions? I would love to see what your preferred policy is. NicheSports (talk) 14:56, 11 December 2025 (UTC)[reply]
@NicheSports What do you find contradictory about it? I've tried hard to resolve the contradiction. qcne (talk) 14:57, 11 December 2025 (UTC)[reply]
Sure thing Qcne!
  • The summary says: Permits limited LLM assistance with mandatory disclosure; prohibits generation from scratch. What is in the scope of "limited LLM assistance"? Does "from scratch" mean "regardless of subsequent review"? Or does it still allow reviewed LLM-generated content?
  • But then the section header in the actual proposed guideline is still titled Do not use an LLM to add unreviewed content. Ok so I guess reviewed content is still fine. But what about the limits mentioned in the summary?
  • And the bulk of the guideline is identical to Option 2 (which permits reviewed LLM-generated content). Editors should not use an LLM to generate content for Wikipedia unless they have thoroughly reviewed and verified the output... Editors should not: Paste raw or unreviewed LLM output as a new article or as a draft intended to become an article... (etc.) Ok no discussion of limits here either, this section is basically identical to Option 2.
  • But then this section is added (not in Option 2). Editors are strongly discouraged from using LLMs. LLMs, if used at all, should assist with narrow, well-understood tasks such as copyediting. How do I square this with the previous section? This seems more consistent with the "limited LLM assistance" language from the summary at least!
Frankly this might be more contradictory than the current proposal, to the extent that I don't know what you are actually suggesting here. It is possible this could be my preferred guideline, if it is banning the use of LLMs to generate text for new articles or significant article expansions/rewrites (regardless of subsequent human review), but permits the use of LLMs, with review, on a limited basis for copyediting small sections of prose (a few sentences here or there). NicheSports (talk) 15:18, 11 December 2025 (UTC)[reply]
Does "from scratch" mean "regardless of subsquent review"? No. "From scratch" means that reviewed content is okay. Proposal 4 is the same thing as proposal 2 except with a disclosure requirement. It bans the unreviewed use of LLMs to generate text for new articles or article expansions but allows rigorously reviewed content. I think the contradiction is resolved. It makes perfect sense to say both "do not add LLM content unless it is reviewed" and "despite that, LLM use is strongly discouraged, especially for newer editors". SuperPianoMan9167 (talk) 15:26, 11 December 2025 (UTC)[reply]
I almost think the correct question right now to ask is:
What LLM use is tolerated BESIDE the actual addition of text to a Wikipedia article?
We know the actual addition of text to the article space is the main sticking point for a wide variety of reasons, which will be the real bear to tackle long-term.
So define what if anything IS tolerated first, encode that as custom, and then you've got the first half of a "policy". — Very Polite Person (talk/contribs) 22:49, 11 December 2025 (UTC)[reply]

Option 2 seems best, but as others have highlighted, I would also ideally want a disclosure clause. While I have been somewhat stringent against LLM usage as I've found it, I am not against it completely. Others have suggested that really positive example cases are needed for guidance on usage, and this is something I could potentially help with, as I have used locally run LLMs in some writing exercises outside of Wikipedia to see what benefits or pitfalls there may be, in topics where I am able to discern where it gets the information not obviously wrong, but wrong nonetheless. -- Cdjp1 (talk) 18:16, 11 December 2025 (UTC)[reply]

I prefer something similar to Option 4 with even stricter disclosure requirements. Any text that was generated and/or substantively edited by AI (rule of thumb: words replaced with other words that mean different things; typos, punctuation, and markup don't count), whether reviewed or not, must be disclosed not just in the edit summary but on the article itself, in a prominent place where readers will see it. (The equivalent of our current cleanup tag, except that the disclaimer remains as long as there is AI-generated text present.) This is consistent with what we do when we incorporate text from elsewhere (e.g. Catholic Encyclopedia as mentioned above). How that disclaimer looks and what it says can be discussed -- it does not necessarily have to resemble a cleanup tag or talk about cleanup -- as long as it is not hidden away in some dark pattern dungeon. (If it were entirely up to me, I would even go so far as to require any AI-generated text to be highlighted in a different color.)

I find this more realistic; if we can't stop people from using AI, we can at least let readers know where their articles came from. Does this mean that we will accumulate a great deal of these disclaimers? Yes, them's the breaks. The text would be there either way. And if AI-generated text is there, readers deserve to know about that, especially if those readers are on Wikipedia expecting to read non-AI-generated text, which many are. Gnomingstuff (talk) 19:29, 11 December 2025 (UTC)[reply]

I would support this in theory, although the one worry I do have is that this might discourage editors from being transparent about their use of AI. However, that isn't a deal-breaker, and we shouldn't be writing our policies from a standpoint of appeasing potentially disruptive editors. Chaotic Enby (talk · contribs) 19:41, 11 December 2025 (UTC)[reply]
I mean, only a handful of people are proactively transparent about their use of AI right now, so there's nowhere to go but up. Gnomingstuff (talk) 19:53, 11 December 2025 (UTC)[reply]
When would such a tag be removed? SuperPianoMan9167 (talk) 20:02, 11 December 2025 (UTC)[reply]
It wouldn't (unless the AI-generated text is removed). Basically, it would be the equivalent of the disclaimers in research studies and news articles. Gnomingstuff (talk) 22:03, 11 December 2025 (UTC)[reply]

Do all the prohibitions specifically come down to what gets "into the edit box"?

Because we literally can't tell people not to use it, or police its use, for things like grammar, searching, dissecting documents, and other tasks. It's already culturally ingrained in too many workplaces that work with computers daily. Many places now require it for some things. More of the tools on our computers come pre-loaded with the stuff. Grammarly was mentioned above here.

It would be impossible to police anyone before it gets to the edit box. Is that the intention, to dissuade people from using it at all, or just for what actually gets saved as a revision?

The former is impossible (and beyond any presumed authority or right we have to police). The latter is not. — Very Polite Person (talk/contribs) 22:44, 11 December 2025 (UTC)[reply]

I want to add some suggestions to the "unreviewed" section:

  • "LLMs are prone to hallucinating facts and citing non-existent sources." Not wrong, but a little obsolete (this happens less often with newer AI), and can lead people to think that if the URL isn't broken then everything's fine and their review is done.
  • It should mention how newer AI tends not to outright hallucinate as much, but instead generates its own interpretations and then claims a source said them, which is WP:OR (if AI slop can be called "research"). I just went through some examples at Talk:Us_(Gracie_Abrams_song)#AI
  • It should mention close paraphrasing as AI is prone to do this. I see this a lot in music articles -- if a reviewer says "heavy pulsating beats" or "smooth honeyed harmonies" or whatever, AI will often just regurgitate that in wikivoice. This is WP:CLOP bordering on plagiarism. Gnomingstuff (talk) 15:47, 12 December 2025 (UTC)[reply]

What LLM use is tolerated BESIDES the actual addition of text to a Wikipedia article?


We know the actual addition of text to the article space is the main sticking point for a wide variety of reasons, which will be the real bear to tackle long-term. Remember too that non-technically savvy people may be using LLMs without even knowing it.

A couple of what-ifs and examples:

  1. I sometimes use Google Translate to, unsurprisingly, translate. Is that AI/LLM? If it is, can I use such translations at all on-wiki? If someone whose first language isn't English uses Google Translate to sound coherent here, is that OK? What if they use GPT to the same end?
  2. What if MS 365 turns Word's spell and grammar check into AI/LLM instead of dictionary/file/code type backing? Do I need to now edit in Notepad instead? What if I have no idea what Word is doing under the hood?
  3. What if I set a tool like this loose to find me every web page that mentions a certain text string published between 2005 and 2018, as a reference source? Can I not use the links if an AI tool found them, versus my Googling them down by hand?
  4. What if someone wants to check if their article is MOS compliant, but isn't familiar with them all, and asks a GPT tool? The tool says something like, "These lines are problems for these reasons," gives you MOS pages, and suggested language edits. There's only so many permutations for certain language problems that both make sense, fit MOS, and don't suck. Is it ok for them to audit their content like that to learn how to fix it?
  5. What if someone stuffs a PDF of a book into one of these to find references to XYZ?

The above examples are all "before saving a page version".

So define what if anything IS tolerated first, encode that as custom, and then you've got the first half of a "policy".

As for the addition of text: that's the easiest example and also the hardest nut to crack. Put it aside for now. What IS tolerated? Define what no reasonable person objects to, or what the majority is willing to not care about. A total hard "no way" prohibition is going to be impossible to police, especially as integration of these tools becomes more endemic. — Very Polite Person (talk/contribs) 22:53, 11 December 2025 (UTC)[reply]

Regarding #1, the consensus is explicitly against using unedited machine translations. #2 doesn't have any regulations against it as far as I know, and I don't think any reasonable person would object to AI-assisted spellchecking. However, if it starts changing word choices or sentence structure (similar to your "suggested language edits" in #4), that can be more problematic. LLMs usually have a poor grasp of MOS (and especially of NPOV/editorializing/weasel words), and asking one to "copyedit for neutrality" usually makes things worse, while giving the user the impression that they helped. Chaotic Enby (talk · contribs) 23:18, 11 December 2025 (UTC)[reply]
Right, that's the sort of answer I was talking about! Every time I read one of these (no offense to anyone in particular), I often get "harrumph" vibes as opposed to, and I hate the business terminology, "actionable things".
This makes perfect sense: Wikipedia consensus is that an unedited machine translation, left as a Wikipedia article, is worse than nothing, especially with the connective tissue there explaining why. That's basically a section/passage we can airlift straight into an LLM policy.
For your points on #2, again, I agree. But can we ever actually police that? What's the line between man and machine there? I just wrote, freehand: "Very Polite Person is a Wikipedia editor who is noted for sometimes being terribly pedantic." I then asked my GPT five times in "temp chats" (to not pollute it with any data about me; it sandboxes things; I normally use it for sports stuff/recipes more than anything): "I just wrote that freehand (I am that person). I need an experiment -- without changing the meaning of that message at ALL, only the terminology and structure, how many unique ways do you think you can reword it in English? Don't actually show them. Just give me an integer number. Best effort. If you can clear 100+, estimate the percentage likelihood of that." Every way I ask, it says it can get that passage to 100+ variants with 99% likelihood. I honestly wasn't expecting numbers or odds that high just now.
I honestly can't think of how to police that. Any combo a person will come up with, an AI can come up with more easily. If some editor is writing that about a BLP or something, and changes "terribly" to "famously" and the sourcing fits, we'd never be capable of knowing. — Very Polite Person (talk/contribs) 23:31, 11 December 2025 (UTC)[reply]
"Any combo a person will come up with, an AI can come up with more easily." That's assuming a lot about current AI models. Yes, they could theoretically produce any possible sentence, but there are clear known patterns that make them more problematic most of the time. Chaotic Enby (talk · contribs) 23:39, 11 December 2025 (UTC)[reply]
Yeah, like shark eyes, dead. There's rarely creative spark. We're supposed to write dry though, and half of these things are trained on us. That's why it's so hard to tell unless people are stupid and just copy/paste walls of AI into here.
But am I making sense? Any policy that isn't a hard unenforceable "not one single letter or character from GPT et al is allowed" must inherently be somewhere on the spectrum of what can be policed in a realistic context. Not what we may prefer. What we can do.
All these that I've read are variants of "How I want us to handle LLM usage," but no one's asking the actually important first question: "What LLM usage can we actually police?" — Very Polite Person (talk/contribs) 23:43, 11 December 2025 (UTC)[reply]
In general I feel like your focus is too much on "will someone punish me for this" when it should be "is this text problematic."
I sometimes use Google Translate to, unsurprisingly, translate. Is that AI/LLM? If it is, can I use such translations at all on-wiki? If someone whose first language isn't English uses Google Translate to sound coherent here, is that OK? What if they use GPT to the same end?
As far as I know, Google has not yet shoved Gemini into Google Translate; Google Translate does not use generative AI as it is currently understood. In my experience it tends to err on the opposite side, being too awkwardly literal a translation rather than turning meaning into slop.
What if MS 365 turns Word's spell and grammar check into AI/LLM instead of dictionary/file/code type backing? Do I need to now edit in Notepad instead? What if I have no idea what Word is doing under the hood?
It already has. (So has Notepad.) Again, if you have no idea Word is doing this under the hood, that doesn't mean you're editing in bad faith, but it does mean that the content you are adding is problematic.
What if I set a tool like this loose to find me every web page that mentions a certain text string published between 2005 and 2018, as a reference source? Can I not use the links if an AI tool found them, versus my Googling them down by hand?
If you have to, although AI is not great at distinguishing reliable sources and may contain gaps/biases in what it surfaces. Humans can obviously do this too, but they at least sometimes have a framework in their brain to tell the difference. AI does not.
What if someone wants to check if their article is MOS compliant, but isn't familiar with them all, and asks a GPT tool? The tool says something like, "These lines are problems for these reasons," gives you MOS pages, and suggested language edits. There's only so many permutations for certain language problems that both make sense, fit MOS, and don't suck. Is it ok for them to audit their content like that to learn how to fix it?
They should cut out the middleman and RTFM: read the MOS. AI's suggestions are frequently wrong (introducing promotional tone while saying it's making it neutral), superficial (pointlessly twiddling words), or impossible (pointing out gaps in an article when the gaps are there because that information simply does not exist in reliable sources).
What if someone stuffs a PDF of a book into one of these to find references to XYZ?
I think this is OK as long as it's used as a tool to go back and check the actual source, not as the source of truth. (Full disclosure, I do this sometimes with AI-generated transcripts of videos that I'm fact-checking, to find out what timestamp I need to listen to.) Gnomingstuff (talk) 23:51, 11 December 2025 (UTC)[reply]
They should cut out the middleman and RTFM: read the MOS.
I agree, but the point I'm trying to make is we can't expect this. The real question is what we can know. We keep circling back to what we ideally or ideologically prefer, versus what is really achievable here. If every discussion always turns into variants of "don't do it, do it this way instead", we'll be having these discussions through 2030 with no progress, as these tools get ever more present. This is why I'm saying: proscriptions against it as a hard prohibition already don't work, so writing it more formally won't make it work any better. — Very Polite Person (talk/contribs) 00:41, 12 December 2025 (UTC)[reply]
The risk of an inexperienced user asking an LLM if an article meets MOS is that LLMs probably don't understand MOS, and will give a falsely confident answer. There's nothing to stop any user from doing it, but at some point they will be misled. CMD (talk) 01:37, 12 December 2025 (UTC)[reply]

What if someone wants to check if their article is MOS compliant, but isn't familiar with them all, and asks a GPT tool? The tool says something like, "These lines are problems for these reasons," gives you MOS pages, and suggested language edits. There's only so many permutations for certain language problems that both make sense, fit MOS, and don't suck. Is it ok for them to audit their content like that to learn how to fix it?

Checking for things like curly quotes is a simple regex, and for more complicated MOS fixes there is Wikipedia:User scripts/List, including a words to watch highlighter: User:Danski454/w2wFinder. Editors have already written good, Wikipedia-specific programs for this shit; why reinvent the wheel as a many sided polygon when we've already got a circle? ~ Argenti Aertheri(Chat?) 00:32, 12 December 2025 (UTC)[reply]
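For illustration only, a minimal sketch in Python of the kind of curly-quote check being described. This is not one of the linked user scripts (those run inside the MediaWiki editor rather than standalone); it just shows how small such a check is.

import re

# Curly ("smart") quotation marks that MOS:CURLY says to replace with straight quotes.
CURLY = re.compile("[\u2018\u2019\u201c\u201d]")

def find_curly_quotes(wikitext: str):
    """Return (offset, character) pairs for every curly quote found."""
    return [(m.start(), m.group()) for m in CURLY.finditer(wikitext)]

print(find_curly_quotes("He said \u201chello\u201d and left."))
# -> [(8, '“'), (14, '”')]

A real user script would do the same kind of match client-side and highlight the offending characters in the edit window rather than printing offsets.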
why reinvent the wheel as a many sided polygon when we've already got a circle?
Like I said in my last reply, I agree, but writing how we want users to do things ever more formally won't really do anything. The real question is: what can we do with the tools and capabilities available to us? Unless we have a way to somehow police AI/LLM usage that happens BEFORE the user saves, we have no way to make a policy that can actually act against it or detail how it should be done.
The outcome we want is the least important question (yet). The most important question is what can we actually detect. That informs everything after. Everyone just immediately transitions into variants of "nope" to AI, as if dislike or disfavor has actionable authority. It does not. — Very Polite Person (talk/contribs) 00:44, 12 December 2025 (UTC)[reply]
"The most important question is what can we actually detect." This keeps coming up and frankly I don't get it. It's virtually impossible to detect undisclosed paid editing or CoI; we still don't allow them. ~ Argenti Aertheri(Chat?) 02:04, 12 December 2025 (UTC)[reply]
Exactly, the COI thing is functionally the exact same thing. If the whole thing is just the "message" as with that policy, then that's reasonable. But with that, it's a binary. You are or you kinda aren't.
With AI, there's that entire spectrum from "GPT, give me an article that is about this species of bat for Wikipedia please" suddenly producing an Awesome bat article, to people who use it as a fact-checker, research tool, grammar helper, or "does this sound like shit?" assistant.
If we say NO usage is allowed, we have to actually explain what that means: nothing; it's supposed to be what YOU, by HAND, found online, to your brain, to your fingers, to the text entry box. That is, if we take it to the far extreme end of ideology, or of maximizing licensing compliance and minimizing the risk of copyright issues. And we need to explain, "that includes this, and this, and all these tools and products like this", so unaware users aren't caught out.
Or, we say X is allowed if you Y, and then we need to spell it out the same. — Very Polite Person (talk/contribs) 02:45, 12 December 2025 (UTC)[reply]
there's that entire spectrum with CoI too though. No one is really going to get upset if someone corrects a typo in the article about themselves, versus, idk, politicians erasing scandals. ~ Argenti Aertheri(Chat?) 02:59, 12 December 2025 (UTC)[reply]
Hit enter too soon on mobile, sorry. I agree we need to reach consensus on what exactly is and isn't allowed, but my opinion remains to favor built-in tools and user scripts written by fellow editors over what an LLM can piece together from its training. ~ Argenti Aertheri(Chat?) 03:04, 12 December 2025 (UTC)[reply]
I'm agnostic, because I believe it's going to be increasingly hard to catch as the technology improves. Even if the entire "industry" implodes, the technology won't go away, and like every tech ever, it'll get cheaper, easier, and dumber to run as years go by. Things rednecks do today would have been deity hijinks an eon ago. We just need it to be VERY clear and particular, given the nuance. Whether it's all-out or graduated, we gotta explain EXACTLY what that means at a level anyone can understand on day zero of their Wikipedia experience. — Very Polite Person (talk/contribs) 03:11, 12 December 2025 (UTC)[reply]
AI detectors are reasonably reliable in my experience, so a pattern of quasi-robotic changes should not be hard to spot. The approach should be more one of self-declaration, patterned on WP:UPE. Викидим (talk) 02:05, 12 December 2025 (UTC)[reply]