Wikipedia talk:Writing articles with large language models
This is the talk page for discussing improvements to the Writing articles with large language models page.
Archives: 1
Q1: What is the purpose of this guideline?
A1: To establish a ground rule against using AI tools to create articles from scratch.
Q2: This guideline covers so little! What's the point?
A2: The point is to have something. Instead of trying to get consensus for the perfect guideline on AI, which doesn't exist, we have in practice been pursuing a piecemeal approach: restrictions on AI images, AI-generated comments, etc. This is the next step. Eventually, we might merge them all into a single guideline on AI use.
Q3: Why doesn't this guideline explain or justify itself?
A3: Guidelines aren't information pages. We already have plenty of information about why using LLMs is usually a bad idea at WP:LLM.
Q4: Why is this guideline restricted to new articles?
A4: This guideline, originally a proposal, was intentionally designed to be simple and narrow so consensus could easily be gained and it could become a guideline, with the intent to flesh it out in later discussions.
Why is this guideline only restricted to "new articles"? Shouldn't this apply to all articles? (and talk pages and so on...)
Under my own reading of this rule, it seems like it only applies to new articles, and that pre-existing articles are somehow allowed to have AI-generated text inserted into them. GarethBaloney (talk) 13:46, 24 November 2025 (UTC)
- I think because it's a badly written sentence and was erroneously promoted to Guideline. qcne (talk) 13:48, 24 November 2025 (UTC)
- Well if people are saying it's a badly written guideline then we should make a new discussion on changing it! GarethBaloney (talk) 14:05, 24 November 2025 (UTC)
- Yes! Let's have all our guidelines be padded out with twelve-thousand word essays defending and justifying them and providing supplementary information such that no-one will ever read them and newbies have no freaking idea what it's actually telling them. Cremastra (talk · contribs) 02:05, 25 November 2025 (UTC)
- The guideline and RFC were probably written minimalistically to increase its chances of passing an RFC, with the intent to flesh it out in follow up discussions. –Novem Linguae (talk) 21:26, 24 November 2025 (UTC)
- This one. Cremastra (talk · contribs) 01:08, 25 November 2025 (UTC)
Further amendment proposal #1: Festucalex
Well, habemus guideline. Now, how is it going to be enforced, given that the guideline is donut-shaped? We might as well address the "from scratch" loophole and preempt the thousands of man-hours that are going to be wasted debating it with LLM users. How should we define "from scratch"? In an ideal situation, the guideline would be this:
− Large language models
+ Large language models should not be used to edit Wikipedia.
This will close the loophole. Any improvements are welcome. 〜 Festucalex • talk 14:05, 24 November 2025 (UTC)
- Strong support. The usage of LLMs to directly edit, add to, or create articles should not be accepted in any way. The high likelihood of poor-quality sourcing inherent to LLMs makes them ill-suited for use on Wikipedia, and genuine human writing and research should be the standard. Stickymatch 02:55, 25 November 2025 (UTC)
- LLM sourcing can be 100% controlled (editor selects sources, uploads them, and explicitly prohibits using anything else). So the poor choice of sources is a human factor, evident in many human-written articles here. Викидим (talk) 05:49, 25 November 2025 (UTC)
- I am not sure if we can amend the proposal after all these !votes have been made, but could you make an exclusion for grammar checkers? Mikeycdiamond (talk) 15:52, 26 November 2025 (UTC)
- These are not !votes. This is an WP:RFCBEFORE discussion. voorts (talk/contributions) 16:33, 26 November 2025 (UTC)
P.S. I just finished writing an essay against one of the proposed "accepted uses" for LLMs on Wikipedia. I welcome your feedback on the essay's talkpage. User:Festucalex/Don't use LLMs as search engines 〜 Festucalex • talk 16:54, 24 November 2025 (UTC)
- I support this wholeheartedly. GarethBaloney (talk) 14:07, 24 November 2025 (UTC)
- Support. "From scratch" is way too generous. TheBritinator (talk) 14:25, 24 November 2025 (UTC)
- Any policy or guideline that says "ban all uses of LLMs" is bound to get significant opposition. SuperPianoMan9167 (talk) 14:31, 24 November 2025 (UTC)
- And all policies and guidelines have a built-in loophole anyway. SuperPianoMan9167 (talk) 14:36, 24 November 2025 (UTC)
- The fact that WP:IAR exists doesn't mean that we ought to actively introduce crippling loopholes into guidelines. Imagine if we banned vandalism only on new articles, or only on articles that begin with the letter P. 〜 Festucalex • talk 14:57, 24 November 2025 (UTC)
- If you look at the RfC you can see a significant number of users who disagree with the assertion that "all LLM use is bad", which is why I have doubts that a proposal to ban LLMs entirely will ever pass. SuperPianoMan9167 (talk) 15:00, 24 November 2025 (UTC)
- It's WP:NOTVOTE and it should never be. As I said before, anyone who wants to open up uses for LLMs on Wikipedia should explain precisely, minutely, down to the atomic level how and why LLMs can be used on Wikipedia and how these uses are legitimate and minimally disruptive as opposed to all other uses. The case against LLMs has been made practically thousands of times, while the pro-LLM case consists of nothing more than handwaving towards vague say-so assertions and AI company marketing buzzwords. 〜 Festucalex • talk 15:09, 24 November 2025 (UTC)
- WikiProject AI Tools was formed to coordinate legitimate uses of LLMs. SuperPianoMan9167 (talk) 22:31, 24 November 2025 (UTC)
- Also, the rules are principles. The general idea of this guideline is that using LLMs to generate new articles is bad. It is not and should not be a blanket ban on LLMs. LLMs are tools. Like all tools, they have valid use cases but can be misused. Yes, their outputs may be inherently unreliable, but it is incorrect to say they have no use cases. SuperPianoMan9167 (talk) 22:39, 24 November 2025 (UTC)
- Support but with the caveat that I think it's too broad for what this policy has already been approved for. This edit implies any use of LLMs is unacceptable, even if it's not LLM-generated content being included in an article. Given that there's still arguably a carveout for using LLMs to assist with idea generation etc, my counterproposal, if people find it more appealing, can be found at #Further amendment proposal #3: Athanelar. Athanelar (talk) 14:43, 24 November 2025 (UTC)
- I think we ought to actively discourage other non-submission uses, even if we can't detect them. At least we'd be making it clear that the community disapproves. This will only stop the honest ones, but hey, that's something. 〜 Festucalex • talk 14:55, 24 November 2025 (UTC)
- I agree, that's why my initial statement is support, I just wanted to present a counterproposal in case the majority would prefer something that doesn't widen the scope so much. Athanelar (talk) 14:59, 24 November 2025 (UTC)
- Can you put the counterproposal in a different section to avoid confusion? NicheSports (talk) 15:10, 24 November 2025 (UTC)
Support. We should probably add clarifying language to this (I have some ready I can propose), but definitely agree and think the community is ready to support a complete LLM ban. NicheSports (talk) 15:09, 24 November 2025 (UTC) Now that I understand what is meant by this proposal, I don't support it. I would support a ban on using LLMs to generate article content (per Kowal2701). NicheSports (talk) 00:11, 25 November 2025 (UTC)
- Similar to my comment below, this completely changes the purpose of this guideline (expanding its scope from new articles to all edits) and would require a new RfC. Toadspike [Talk] 15:48, 24 November 2025 (UTC)
- Definitely – I interpreted this as workshopping something that will be brought to another RFC. Is that fine to do here or should we move it to WP:AIC? NicheSports (talk) 15:50, 24 November 2025 (UTC)
- Yes, what we're doing here is the WP:RFCBEFORE that the original proposal never got. There are already 3 wordings on the table: mine, qcne's, and Athanelar's, and I hope this eventually crystallizes (after more refining) into a community-wide RFC. As the closing note pointed out, this issue requires a lot more work and discussion, and a lot of people agreed to Cremastra's proposal because they wanted anything to be instituted to stem the bleeding while the community deliberated on a wider policy. 〜 Festucalex • talk 16:14, 24 November 2025 (UTC)
- Oppose. AI is a tool. For example, I routinely use AI to generate {{cite journal}} templates from loose text (like the references in other publications) or to check my grammar. This is IMHO no more dangerous than using https://citer.toolforge.org/ for the same purpose (or Grammarly to check the grammar). We should encourage disclosure, not start an unenforceable Prohibition. Викидим (talk) 21:29, 24 November 2025 (UTC)
- @Викидим What are your thoughts on my proposal #2, below, which has a specific carve-out for limited LLM use? qcne (talk) 21:31, 24 November 2025 (UTC)
- Does creating the journal template count as generating text for articles? GarethBaloney (talk) 21:44, 24 November 2025 (UTC)
- The sources are certainly part of the text. According to views expressed in the discussion, AI can hallucinate the citation. For the avoidance of doubt, in my opinion – and experience – this is not the case with this use, but then there are many other safe uses of AI – like translation – and all of these IMHO shall be explicitly allowed (yes, I also happen to like m-dashes). Викидим (talk) 22:10, 24 November 2025 (UTC)
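For readers unfamiliar with the template being discussed: the output of such a citation-formatting workflow is an ordinary {{cite journal}} call. A filled-in example might look like the line below; the parameter names are standard {{cite journal}} fields, while the values are placeholders for illustration only, not a real reference.
{{cite journal |last=Doe |first=Jane |title=Example article title |journal=Example Journal |volume=12 |issue=3 |pages=45–67 |year=2020 |doi=10.1000/example}}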
This is IMHO no more dangerous than using [...]
I strongly disagree that using the hallucination machine specifically designed to create natural-sounding but not-necessarily-accurate language output is 'no more dangerous' for these purposes than using tools specifically designed for the tasks at hand. Athanelar (talk) 21:54, 24 November 2025 (UTC)
- The AI is not made to manufacture lies any more than a keyboard is. The difference is in the performance and intent of the user – these are the things we might want to address. Blaming tools is IMHO a dead end; the Luddites, ostensibly also fighting for quality, quickly lost their battle. Викидим (talk) 22:13, 24 November 2025 (UTC)
- Are unscrupulous editors not more likely to use something like ChatGPT to try and sound professional even when they aren't? Besides, Grammarly is not the same as asking an LLM to generate a Wikipedia article, complete with possibly fake sources. GarethBaloney (talk) 22:59, 24 November 2025 (UTC)
- (1) "try and sound professional even when they aren't" – We are (almost) all amateurs here, so a tool that makes non-professionals sound better is not necessarily bad. (2) The proposal reads "should not be used to edit Wikipedia", leaving no exceptions for grammar checking. Викидим (talk) 23:23, 24 November 2025 (UTC)
- Grammar checking can be done (and has been done for decades) using non-LLM artificial intelligence models and programs. 〜 Festucalex • talk 23:35, 24 November 2025 (UTC)
- I was going to point this out, haha. There's been automatic grammar checking and spellcheck since what- Word 97? No LLM required. Stickymatch 02:58, 25 November 2025 (UTC)
- All modern translation and grammar checking tools use AI, as it produces superior results. Google, for obvious reasons, was heavily invested in both for almost 20 years. According to my source, they at first were trying to go the non-AI way (studying and parsing individual grammars, etc.) only to discover that direct mapping between texts does a better job at a lower cost. Everyone else of any importance followed their approach many years ago. It was just not the generic AI that we know now, but an AI nonetheless. Some detail can be found, for example, on p. 19 of the 2008 thesis [1] (there should be better written sources, naturally, but the fact is very well known). Викидим (talk) 06:03, 25 November 2025 (UTC)
- Strong support: removes all ambiguity. Z E T AC 21:34, 24 November 2025 (UTC)
- Oppose, people often use stuff like Grammarly. The ban needs to be on generating content Kowal2701 (talk) 21:38, 24 November 2025 (UTC)
- Grammarly is not an LLM. 〜 Festucalex • talk 23:34, 24 November 2025 (UTC)
- It's powered by LLMs: "In April 2023, Grammarly launched a product using generative AI built on the GPT-3 large language models." (from the article) SuperPianoMan9167 (talk) 23:35, 24 November 2025 (UTC)
- "Generative AI tools like Grammarly are powered by a large language model, or LLM" – from the Grammarly website [2] GreenLipstickLesbian💌🧸 23:37, 24 November 2025 (UTC)
- Then users can use a grammar checker other than Grammarly. 〜 Festucalex • talk 23:40, 24 November 2025 (UTC)
- Wow. voorts (talk/contributions) 23:45, 24 November 2025 (UTC)
- I think what users on both sides of this ideological divide are running up against is a common thing that happens whenever there is such a divide between two groups; both groups assume that members of the other group are operating on the same fundamental value system that they are, and that their arguments are built from that same value system.
- I.e., the 'less restrictive' party here (voorts, qcne et al) is beginning from the core value that 'the reason LLMs are problematic is that their output is generally not compatible with Wikipedia's standards,' and the argument that stems from that is 'any LLM policy we make should be designed around bringing the result of LLM usage in line with Wikipedia's standards, whether that be directly LLM-generated text, or simply users utilising LLMs in their creative process.'
- The 'more restrictive' party here (myself, Festucalex et al) is beginning from the core value that 'LLMs and their output are inherently undesirable and detrimental (for some of us to the internet as a whole, for others perhaps specifically only to Wikipedia)' and the argument that stems from that is 'any LLM policy we make should be designed around minimising the influence of LLMs on the content of Wikipedia.'
- That's why Festucalex pivoted here and said people should use something other than Grammarly. We simply believe that it's imperative that we purge LLM output from Wikipedia, regardless of whether it's reviewed or policy compliant or anything else. It's also important to keep in mind that NEWLLM as it stands is a product of the latter ideology, not the former, and I think that's why it appears to be so flawed to people like qcne; because it's solving a completely different problem than the one they're trying to solve. Athanelar (talk) 01:03, 25 November 2025 (UTC)
- I understand your views. What I don't see is evidence. voorts (talk/contributions) 01:11, 25 November 2025 (UTC)
- Exactly. I made an identical point about this fundamental divide in the RfC. (I have discovered I am pivoting more towards the "less restrictive" side in my comments here.) SuperPianoMan9167 (talk) 01:59, 25 November 2025 (UTC)
- Yes, I think people understand the divide is between this idea of fundamentalism (the intrinsic nature of LLMs is that they are bad) and those who don't subscribe to it. But what many of us who oppose this fundamentalism think is that rather than being based on evidence (voorts), it's an article of faith. Katzrockso (talk) 02:56, 25 November 2025 (UTC)
- Not workable – if somebody comes up to me and says "Hey, you've made a mistake in Hanako (elephant)" or shows up on BLP saying "You have my birthdate wrong", then I don't care if they use an LLM to write their post, and I don't care if they use an LLM to translate it from their native language. I'm not even sure I care if they use the LLM to make the edit/explain themselves in the edit summary (but I'd rather they disclose it, for obvious reasons), assuming they do it right.
- Ultimately, somebody who repeatedly introduces hoax material/fictitious references to articles should be blocked quickly, whether they're using AI or not. Somebody who repeatedly introduces spammy text should be blocked, whether they have a COI or not. Somebody who repeatedly introduces unsourced negative BLP information should be blocked, whether or not they're a vandal/have a COI. Somebody who repeatedly inserts copyright violations should be blocked, whether they're acting in good faith or not. The LLM is a red herring – once we've established that the content somebody writes is seriously flawed in a way that's not just accidental, we need to block the contributor. If they say "but it's not my fault, ChatGPT told me to" then admins reviewing unblocks can take that into consideration & we can tban that editor from using automated or semi-automated tools as an unblock condition. GreenLipstickLesbian💌🧸 23:00, 24 November 2025 (UTC)
- +1 This whole guideline is everyone just sticking their heads in the sand and hoping LLM usage will go away. We should be thinking about how LLMs can be used well, not outright banning their use. voorts (talk/contributions) 23:08, 24 November 2025 (UTC)
- It's also yet another example of why PAGmaking on the fly and without advance deliberation is a terrible idea. voorts (talk/contributions) 23:10, 24 November 2025 (UTC)
- There are no legitimate uses for LLMs, just like there are no legitimate uses for chemical weapons. They're both technically a tool, and anyone can argue that sarin gas can technically be used against rodents, but is it really worth the risk of having it around the kitchen? 〜 Festucalex • talk 23:46, 24 November 2025 (UTC)
- Are you seriously comparing LLMs to chemical weapons? voorts (talk/contributions) 23:48, 24 November 2025 (UTC)
- Yep. 〜 Festucalex • talk 23:49, 24 November 2025 (UTC)
- 65k bytes to get to Godwin's Law, nice! GreenLipstickLesbian💌🧸 00:03, 25 November 2025 (UTC)
- Festucalex please lol. Also, idk if this is written down anywhere, there's probably an essay, but the fastest way to nuke support for a plausible idea here is to start saying stuff like "X is like sarin gas" NicheSports (talk) 00:06, 25 November 2025 (UTC)
- I think the analogy I'm making is clear: it's a technology whose risks override any potential benefits, at least in this context. Forget sarin gas, let's say it's like a pogo stick in a porcelain museum. 〜 Festucalex • talk 00:09, 25 November 2025 (UTC)
There are no legitimate uses for LLMs
What about this, and this, and this, and this, and this, and this, and this, and... You get the point. SuperPianoMan9167 (talk) 23:53, 24 November 2025 (UTC)
- There are no legitimate uses of LLMs on Wikipedia. I have said it before and I will say it again. Even if it is impossible to stop all LLM usage, guidelines like this one can serve as a statement of principle. Yours, &c. RGloucester — ☎ 00:00, 25 November 2025 (UTC)
- So everyone in WikiProject AI Tools is editing in bad faith? SuperPianoMan9167 (talk) 00:02, 25 November 2025 (UTC)
- They're using bad tools in good faith because we don't have a comprehensive guideline yet. 〜 Festucalex • talk 00:04, 25 November 2025 (UTC)
- Why can't LLMs ever be legitimately used on Wikipedia? voorts (talk/contributions) 00:06, 25 November 2025 (UTC)
- What is the philosophical mission of Wikipedia? WP:ABOUT begins with the Jimbo quote
Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing.
- LLMs don't produce human knowledge. They produce realistic-sounding human language, because that's what they're designed to do, it's all they've ever been designed to do, it's all they can ever be designed to do – it's literally in their fundamental structure. Not only that, but the output they produce is explicitly biased by their programming and their training data, which are both determined by a private company with no transparency or oversight.
- Would you be content if the entirety of Wikipedia's article content were created and maintained by a single editor? Let's assume that single editor is flawless in their work; all of their work is rigorous and meets the standards set by the community (who are still active in a non-article capacity), it's perfectly sourced etc; it's just that it's all coming from a single individual.
- What about 90%? 80%? 50%? What percentage of the encyclopedia could be written and managed by a single individual before it would compromise the collaborative nature of Wikipedia?
- Thesis 1: The output of an LLM is, effectively, the work of a single individual. Obviously it's more complex than that, but LLM output all has the same tone because it's all the product of the same algorithms from the same privatised training data.
- Thesis 2: Given the opportunity, LLM output will comprise an increasingly large percentage of Wikipedia, because it is far faster to copyedit, rewrite and create with LLMs than it is to do so manually. This will only increase the more advanced LLMs get, because their output will require less and less human oversight to comply with Wikipedia's standards.
- The question, then, is how much of Wikipedia's total content you're willing to accept being authored by what is essentially a single individual with inscrutable biases and motivations. There must be some cutoff in your mind; and our contention is that if you allow them to get their foot in the door, then the result is going to end up going beyond whatever percentage cutoff you've decided is acceptable. Athanelar (talk) 02:48, 25 November 2025 (UTC)
- "The output of an LLM is, effectively, the work of a single individual. Obviously it's more complex than that" is putting it lightly. Notwithstanding the fact that more than one LLM exists, editors who opposite anti-LLM fundamentalism here have consistently advocated for the necessity of human review and editing when evaluating LLM output. Katzrockso (talk) 02:58, 25 November 2025 (UTC)
editors who opposite anti-LLM fundamentalism here have consistently advocated for the necessity of human review and editing when evaluating LLM output.
- Well, okay, take my initial example again, then. Let's say John Wikipedia is still producing 50 or 80 or 100% or whatever of Wikipedia's output, but it's first being checked by somebody else to make sure it meets standards. Would it now be acceptable that John Wikipedia is the sole author of the majority (or a plurality or simply a large percentage) of Wikipedia's content, simply because his work has been double-checked? Athanelar (talk) 03:09, 25 November 2025 (UTC)
- Yes, if John Wikipedia's contributions all accurately represent the sources as evaluated by other editors and meet our content standards, why would that be a problem? Katzrockso (talk) 03:43, 25 November 2025 (UTC)
- Well, that's just one of those fundamental value differences we'll never overcome, then. I don't think John Wikipedia should be the primary author of content on Wikipedia because that would undermine the point of Wikipedia being a communal project, and for that same reason I don't think we should allow AI-generated content to steadily overtake Wikipedia either, whether or not it's been reviewed or verified or what have you. Athanelar (talk) 03:47, 25 November 2025 (UTC)
- This happens all the time at smaller Wikipedias. There just aren't enough people who speak some languages + can afford to spend hours/days/years editing + actually want to do this for fun to have "a communal project" the way that you're thinking of it. WhatamIdoing (talk) 06:09, 27 November 2025 (UTC)
- What about uses of LLMs that aren't generating new content (which is what most of the tools at WikiProject AI Tools are about)? SuperPianoMan9167 (talk) 03:03, 25 November 2025 (UTC)
- I don't have any issue with that, because it's functionally impossible to identify and police. That's why my proposal is worded differently to Festucalex's, because I think it's only sensible and possible to prohibit the inclusion of AI-generated text, not the use of AI in one's editing process at all. Athanelar (talk) 03:11, 25 November 2025 (UTC)
- I asked why they can't ever be used. I have several FAs and GAs, but I'm terrible at spelling. If, as seems to be the direction the world is heading, most browsers replaced their original spellcheckers with LLM-powered ones, are you suggesting I'd need to install an obscure browser created by anti-AI people to avoid running afoul of this proposed dogmatism? voorts (talk/contributions) 13:38, 25 November 2025 (UTC)
- No, my proposal is to ban adding AI-generated content to Wikipedia, not to ban people using AI as part of their human editing workflow, that would be unenforceable. Athanelar (talk) 14:13, 25 November 2025 (UTC)
- "The output of an LLM is, effectively, the work of a single individual. Obviously it's more complex than that" is putting it lightly. Notwithstanding the fact that more than one LLM exists, editors who opposite anti-LLM fundamentalism here have consistently advocated for the necessity of human review and editing when evaluating LLM output. Katzrockso (talk) 02:58, 25 November 2025 (UTC)
- What is the philosophical mission of Wikipedia? WP:ABOUT begins with the Jimbo quote
- So everyone in WikiProject AI Tools is editing in bad faith? SuperPianoMan9167 (talk) 00:02, 25 November 2025 (UTC)
- None of these are legitimate, and I hope that our new guideline puts an end to them before they become standard practice. No use designing and marketing kitchen canisters for sarin gas. 〜 Festucalex • talk 00:02, 25 November 2025 (UTC)
- This reads more like a moral panic than a logically & evidentially supported proposal Katzrockso (talk) 00:52, 25 November 2025 (UTC)
- It's not a moral issue. LLMs undermine the whole foundation of this project. They were developed by companies that are in direct competition with Wikipedia. These companies have used our content with the aim of monetising it through LLM chatbots, and now plot to replace Wikipedia altogether, à la Grokipedia. Promoting LLM use will rot the project from within, and ultimately result in its collapse. Yours, &c. RGloucester — ☎ 06:12, 25 November 2025 (UTC)
- Slippery slope Katzrockso (talk) 14:09, 25 November 2025 (UTC)
- Yes, it is a 'slippery slope' argument, if anything, a better term is 'death by a thousand cuts'. It is a common misconception that a slippery slope argument is an inherent fallacy. I find it very interesting that some editors here prefer to place emphasis on the quality of the content produced, rather than on the actual mission of the project. Let us take this kind of argument to its logical conclusion. If some form of LLM were to advance, and were able to produce content of equivalent quality to the best Wikipedia editors, would we wind up the project, our mission complete? I'd like to hope that the answer would be no, because Wikipedia is meant to be a free encyclopaedia that any human can edit.
- When one outsources some function to these 'tools', whether it be spellchecking or article writing, it will inevitably result in the decline of one's own copyediting and writing skills. As our editors lose the skills they have gained by working on this encyclopaedia over these past two decades, they will become more and more reliant on the LLMs. What happens then, when the corporations that own these LLMs decide to cease providing their 'tools' to the masses gratis? Editors, with their own skills weakened, will become helpless. Perhaps only those with the ability to pay to access LLMs will be able to produce content that meets new quality standards that have shifted to align with LLM output. Wikipedia's quality will decline as the pool of skilled editors dwindles, and our audience will shift toward alternatives, like the LLMs themselves. The whole mission of the project will be called into question, as Wikipedia loses its competitive advantage in the marketplace of knowledge. Yours, &c. RGloucester — ☎ 00:20, 26 November 2025 (UTC)
- But we shouldn't sacrifice newcomers in the name of preserving the project by blocking them for using LLMs right after they join when they have no clue why or how LLMs are unreliable. SuperPianoMan9167 (talk) 00:25, 26 November 2025 (UTC)
- My hope for this guideline is that it will prevent that kind of blocking, since good faith newcomers who show up using LLMs will get reverted and linked to this page, instead of the previous situation where they get asked politely to stop, then when they don't, they eventually get dragged to ANI and TBanned from using LLMs, which is frustrating and much more difficult to understand than a simple page that says "Wikipedia doesn't accept LLM-generated articles because that's one of the things that makes Wikipedia different from Grokipedia". -- LWG talk 00:57, 26 November 2025 (UTC)
- Assuming we adopt this proposal, and assuming that "good faith newcomers" abide, there will still be editors who "get asked politely to stop" (i.e., they will be warned), "then when they don't, they eventually [will] get dragged to ANI" and blocked, not TBANNED (by my count, only 3 editors are topic banned from LLM use per Wikipedia:Editing restrictions). I've blocked/revoked TPA of many accounts for repeated LLM use and I can assure you that almost none of those editors knew or cared about what any of our guidelines said. In no universe would a no-LLM rule result in any change to the process of having to drag people to ANI to get them blocked. voorts (talk/contributions) 01:11, 26 November 2025 (UTC)
- ^this. To use a real example, every single time anybody makes a post, they agree not to copy-paste content from other sites and to attribute it if they copy from within Wikipedia, and there are sooooooooooooooooooooooo many copyright blocks given out every year. Most of these people unambiguously acted in good faith. And each and every one got dragged to a noticeboard, often multiple times, before they were blocked. I'm sorry, but this won't be any different – and Wikipedia naturally draws the type of people who like to ask "why", so we're still going to have to point them to WP:LLM, and they won't be swayed by a simple page saying "no, because I said so". GreenLipstickLesbian💌🧸 08:05, 26 November 2025 (UTC)
- SuperPianoMan, I agree with you, and I also agree with LWG. The problem until now was that Wikipedia has failed to clearly explain its stance on LLMs, blocking myriad editors without any obvious policy or guideline-based rationale. This ad hoc form of justice has gone on too long, and is unfair to newcomers, and is one reason why I supported the adoption of this guideline, despite its shortcomings. The community needs to clearly explain Wikipedia's purpose, and why LLMs are not suited for use on Wikipedia, to both new editors and our readership. Wikipedia should aim to promote the value of a project that is free, that anyone can edit, and that is made by independent men and women from right across the world. If anything, our position as a human encyclopaedia should be a merit in a competitive information marketplace. Yours, &c. RGloucester — ☎ 01:11, 26 November 2025 (UTC)
they eventually get dragged to ANI and TBanned from using LLMs, which is frustrating and much more difficult to understand than a simple page
- Yes exactly. People were regularly being sanctioned for a rule that they could not have known about because no such rule existed. Even if not a single newbie ends up reading this guideline, its existence is still beneficial, because it means we are no longer punishing people for breaking unwritten rules. Gnomingstuff (talk) 09:50, 26 November 2025 (UTC)
- I don't think it's ever been practice to sanction somebody for just AI use, though? It's always been fictitious references, violating mass create, copyright issues, WP:V failures, UPE/COI, NPOV violations, etc. I'm not saying no admin has ever blocked a user for only using LLMs (admins do act outside of policy, sometimes!), though I'd be interested to see any examples. Thanks, GreenLipstickLesbian💌🧸 10:23, 26 November 2025 (UTC)
- Usually it's more than just AI use if it ends up at ANI but I doubt the distinction is really getting through to people, and a lot of !votes to block, CBAN, TBAN, etc. are made with the rationale of "AI has no place on Wikipedia ever." Sometimes the bulk of the thing is that (example: Wikipedia:Administrators'_noticeboard/IncidentArchive1185#User:_BishalNepal323)
- There's also the uw-ai to uw-ai4 series of templates, which implies a four-strikes rule; I don't use them but others do. Gnomingstuff (talk) 10:51, 26 November 2025 (UTC)
- In your example, Ivanvector blocked for disruptive editing, not solely for AI use. voorts (talk/contributions) 14:08, 26 November 2025 (UTC)
- What are we arguing about here? Obviously people are getting blocked for LLM misuse, not LLM use. And I agree with Gnomingstuff and LWG etc. I believe in AGF and have dozens of examples of editors who have stopped using LLMs after I alert them to the difficulty of using them in compliance with content policies. NicheSports (talk) 14:22, 26 November 2025 (UTC)
- We're arguing about the assertion that we need a no AI rule because we've been blocking people solely for AI use without any attendant disruption. That is not true and therefore not a good reason to impose a no AI rule. voorts (talk/contributions) 14:23, 26 November 2025 (UTC)
- To be more clear, when I said "the bulk of the thing" I meant the tenor of the responses in an average ANI posting. Several regulars at ANI generally seem to be under the impression that we do not allow AI, so most !votes are going to have largely unchallenged comments like "CIR block now. This LLM shit needs to be stopped by any means necessary." or "LLM use should warrant an immediate block, only lifted when a user can demonstrate a clear understanding that they can't use LLMs in any situation." Or if someone gets hit with a uw-ai2, they are told "Please refrain from making edits generated using a large language model (an 'AI chatbot' or other application using such technology) to Wikipedia pages." Gnomingstuff (talk) 00:39, 27 November 2025 (UTC)
- People say a lot of incorrect things at ANI. We don't usually amend the PAGs to accommodate those people. voorts (talk/contributions) 01:05, 27 November 2025 (UTC)
- On the contrary, that's exactly what we do. PAGs are meant to reflect the actual practice of editors. The process of updating old PAGs or creating new ones to reflect changes in editorial practice is the foundation that has built all of our policies and guidelines. Yours, &c. RGloucester — ☎ 03:10, 27 November 2025 (UTC)
- We're not tho. Nobody as far as I can tell has ever been blocked solely for using AI/LLMs. This is a red herring. voorts (talk/contributions) 13:54, 26 November 2025 (UTC)
- If, tomorrow, an LLM came out that could produce an FA-quality article on a given topic in 2 minutes, would you still suggest that LLMs have no place on Wikipedia?
- Histrionic comparisons about scenarios that won't happen go both ways. Katzrockso (talk) 07:59, 27 November 2025 (UTC)
- Yes, I would, because using such a technology to produce articles is contrary to the purpose and mission of Wikipedia. Wikipedia's defining principles are that it is free, that any human can edit it, and that its content is produced collaboratively by divers volunteers. Others and I have already explained why machine-produced content contravenes these principles. I care less whether an article is 'FA-quality', whatever that means, and more about how it was made. Yours, &c. RGloucester — ☎ 08:46, 27 November 2025 (UTC)
Wikipedia's defining principles are that it is free, that any human can edit it, and that its content is produced collaboratively by divers volunteers. Others and I have already explained why machine-produced content contravenes these principles.
I am certainly not a fan of LLMs for generating content. However, I don't see how it contravenes these principles when a human editor chooses to use an LLM to generate some content, checks the content to make sure that it accurately reflects its sources and is otherwise PAG-compliant, and finally adds the sourced content to an article. Wikipedia is no less free, any human can still edit it, and divers volunteers are still able to collaboratively work on the article. Even though that particular content happened to have been produced by a machine. Cheers, SunloungerFrog (talk) 09:12, 27 November 2025 (UTC)
- Yes, in this hypothetical thought experiment. We don't live in a thought experiment. LLM output is getting better in that it is less obviously bad, but the nature of this kind of text generation means it is not well suited, and may never be well suited, to producing verifiable nonfiction articles. Gnomingstuff (talk) 14:18, 27 November 2025 (UTC)
- Why? What's wrong with an LLM spellchecker other than that you don't like it? voorts (talk/contributions) 13:47, 25 November 2025 (UTC)
- +1 Even the autocorrect on my iPhone uses a transformer, which is the same kind of neural network as that which powers LLMs. The major difference is in size (they're called large language models for a reason). SuperPianoMan9167 (talk) 14:19, 25 November 2025 (UTC)
Support. This guideline is a good start and I am glad it was approved, but it should be expanded. LLMs are not an acceptable way to edit the wiki, as they cause lots of issues like hallucinations. Changing to oppose as I just realised this goes beyond creating content and would include things like Grammarly. GothicGolem29 (Talk) 18:35, 28 November 2025 (UTC)
- @GothicGolem29: The Grammarly thing isn't necessarily included. As long as it doesn't generate its own output, it's not really a large language model, even if it claims to use one. The important thing here is that de novo output doesn't make it to the encyclopedia. 〜 Festucalex • talk 23:46, 3 December 2025 (UTC)
Further amendment proposal #2: qcne
Why the current version of the guideline is bad: A single sentence that clunkily prohibits all LLM use on new articles. How do we define that? Does "from scratch" cover the lead section only? the whole article? a stub? a list? Dunno! It doesn't bother to say! This is banning a method without actually defining where it begins or ends. Since no one can reliably tell if an LLM was used, enforcement would be impossible. LLM detection is unreliable, and we already have CSD G15 to handle unreviewed LLM slop.
I wrote this up a while ago and am now posting it for community consensus. I did just replace the Guideline with my version, but was sadly reverted.
Version 1
See version 3 posted below
Happy for feedback, but we really do need to do something quickly to fix the current version of this new Guideline. qcne (talk) 14:27, 24 November 2025 (UTC)
Version 2
See version 3 posted below
Version 3
I still believe this Guideline is grossly short and needs to be expanded a little bit, but am also taking into account the feedback given.
Would my much shorter Version 3 guideline here be at all acceptable to the more hard-line anti-LLM editors? I have:
- made it more concise.
- removed the limited use carve-out, with the idea that experienced editors can be trusted to use LLMs, and this Guideline is more focused towards new editors.
Hidden ping to users who have participated. qcne (talk) 22:37, 3 December 2025 (UTC)
- I predict that hard-line anti-LLM editors will still want the word "unreviewed" removed from "do not add unreviewed LLM-generated content to new or existing articles". SuperPianoMan9167 (talk) 22:40, 3 December 2025 (UTC)
- Yes, potentially, but I would like to have some sort of compromise! qcne (talk) 22:41, 3 December 2025 (UTC)
- Agreed. SuperPianoMan9167 (talk) 22:42, 3 December 2025 (UTC)
- The compromise on the reviewed language is to only allow it for experienced editors with an llm-user right. A few editors have suggested this. There is a vast amount of evidence (AfC, NPP, 1346 (hist · log), any WikiEd class, etc.), that inexperienced editors essentially never sufficiently review LLM-generated prose or citations. NicheSports (talk) 23:31, 3 December 2025 (UTC)
- I think that'd have to be a separate RfC, would support Kowal2701 (talk) 23:32, 3 December 2025 (UTC)
- Given my experience with CCIs of autopatrolled and NPR editors, and even the odd admin, would you be offended if I scream "NO!" really loudly at the idea of tying LLM use to a user right?
- Sorry, but I've had too much trouble with older users being grandfathered into the autopatrolled system to be comfortable with the idea of giving somebody the right to say "Oh, but my use of ChatGPT is fine - I have autopatrolled!" GreenLipstickLesbian💌🧸 23:48, 3 December 2025 (UTC)
- Valid point. There's been at least one editor who had their autopatrolled right revoked for creating unreviewed LLM-generated articles. SuperPianoMan9167 (talk) 23:53, 3 December 2025 (UTC)
- far from being offended, I actually laughed 😅 but I would still much much rather deal with that problem than continuing the fantasy that inexperienced editors should be allowed to use these tools with review that is never performed! NicheSports (talk) 23:56, 3 December 2025 (UTC)
- Disagree with adding an LLM-user right, but either way I think that is best workshopped elsewhere. fifteen thousand two hundred twenty four (talk) 23:55, 3 December 2025 (UTC)
- Yes, potentially, but I would like to have some sort of compromise! qcne (talk) 22:41, 3 December 2025 (UTC)
- The issue with "unreviewed" is that it is at risk of being wikilawyered: even a bad review would be kosher. Otherwise it's great. I worry that by having a nuanced approach, it'd struggle to communicate a clear message, especially since people disposed to use LLMs likely already have CIR issues that LLM use is compensating for. I'd remove "unreviewed", and "especially where the content is unverifiable, fabricated, or otherwise non-compliant with existing Wikipedia policies" can support people's IAR "not what the policy was intended for" arguments (if they so want) in the fringe cases where LLM use is not practically problematic, subject to consensus. Kowal2701 (talk) 23:08, 3 December 2025 (UTC)
- "insufficiently reviewed" has more wiggle room while still allowing for the edge cases; once any problem is identified, it puts the responsibility on the person adding the content rather than other editors. GreenLipstickLesbian💌🧸 23:43, 3 December 2025 (UTC)
- That'd be good too Kowal2701 (talk) 01:30, 4 December 2025 (UTC)
- Honestly, "unreviewed" has been my main point of disagreement in every proposal that includes it – thank you for articulating it. There are two fundamental problems:
- First, if it's hard to know whether someone used AI, it's even harder to know how much they reviewed it.
- Second, and more problematic: Properly "reviewing" LLM content means that every single word, fact, and claim needs to be verified against every single source. You essentially need to reconstruct the writing process in reverse, after the fact. But most good-faith editors who use AI seem to think "reviewing" means one of two things:
- Quickly skimming it and going "yeah that looks OK."
- Using AI to "review" the text.
- This results, and will continue to result, in the following situation: Editor 1 finds some bad AI text. Editor 1 says that the AI text wasn't reviewed, and they aren't wrong. Editor 2 says that they did review the AI text, and they aren't lying. Meanwhile, the text remains bad. Gnomingstuff (talk) 01:31, 5 December 2025 (UTC)
- "insufficiently reviewed" has more wiggle room while still allowing for the edge cases; once any problem is identified, it puts the responsibility on the person adding the content rather than other editors. GreenLipstickLesbian💌🧸 23:43, 3 December 2025 (UTC)
- Enthusiastic support. I think this is the best we're going to get for a compromise option between the two LLM ideologies here.
- You don't leave any room for 'acceptable' carve-outs; you've included the very direct "Editors should not use an LLM to add content to Wikipedia, whether creating a new article or editing an existing one", which, although it uses 'should' and not 'must,' serves to discourage LLM use in general, which is very desirable for me. You've preserved the spirit of NEWLLM by categorically saying "Do not" use an LLM to author an article or major expansion, and you've codified LLMCOMM by saying "Do not" use LLMs for discussions.
- My only suggested change would be to drop the "Why LLM content is problematic" section. We already have that covered at WP:LLM, there's no need to bloat this guideline by including it here. Other than that, I think this is exactly the kind of AI guideline we should have right now. Athanelar (talk) 23:11, 3 December 2025 (UTC)
- If we do that we should probably make WP:LLM an information page. SuperPianoMan9167 (talk) 00:13, 4 December 2025 (UTC)
- I think that's totally fine. We can link to it from qcne's proposal (and even promote it to supplement if necessary). It's better than adding unnecessary bloat to the guideline. The main target for this guideline, after all, is going to be people who are already using AI for something and need to be told to stop, who probably aren't going to be interested in the finer points of why LLM use is problematic. If they want to do the further reading, they can. Athanelar (talk) 03:10, 4 December 2025 (UTC)
- I did it. SuperPianoMan9167 (talk) 03:16, 4 December 2025 (UTC)
- Awesome, thank you @SuperPianoMan9167. qcne (talk) 11:17, 4 December 2025 (UTC)
- I was reverted. I did say people could do that when I made the change. SuperPianoMan9167 (talk) 16:46, 4 December 2025 (UTC)
- I appreciate your work here. I do think what you have makes sense and also is realistic in how editors work. As for "unreviewed," could a footnote work to explain what "reviewed" means? - Enos733 (talk) 23:14, 3 December 2025 (UTC)
- I'd like that. FTR I'd still support this regardless as it's a massive improvement Kowal2701 (talk) 23:22, 3 December 2025 (UTC)
- Your ping missed me, but I really like the version 3 proposal. I agree with GreenLipstickLesbian that "insufficiently reviewed" would be better verbiage, but it's not a blocker. This would have my support as-is. Adding raw or lightly edited LLM output degrades the quality of the encyclopedia, and frequently wastes the time of other editors who must then cleanup after it. This proposed guideline would explicitly prohibit such nonconstructive model use in a clear manner, and would serve as a useful tool for addressing and preventing instances of misuse. fifteen thousand two hundred twenty four (talk) 00:11, 4 December 2025 (UTC)
- Support. I like it. Since that's not an argument, I also think this is finally a version Randy in Boise can understand and follow. ~ Argenti Aertheri(Chat?) 01:51, 4 December 2025 (UTC)
- Serious concern: isn't this proposal contradictory? How can both of these statements be in the same guideline? #1: "Do not use an LLM as the primary author of a new article or a major expansion of an existing article, even if you plan to edit the output later." (Emphasis my own.) #2: "Editors should not... Paste raw or lightly edited LLM output into existing articles as new or expanded prose." #2 strongly implies it is fine to add reviewed LLM content. But this directly contradicts #1. NicheSports (talk) 02:06, 4 December 2025 (UTC)
- These do not read as contradictory to me. Nowhere in #1 does it prohibit LLM use. "even if you plan to edit the output later" means editors cannot immediately add LLM output to the project with an excuse of "I'll fix it later"; they must fix it first before it can be added at all. fifteen thousand two hundred twenty four (talk) 02:26, 4 December 2025 (UTC)
- I'm not sure about that interpretation... what about the first part of that sentence: "Do not use an LLM as the primary author...". Still pretty contradictory. Either you can use an LLM to generate a bunch of text and then edit it, or you can't. This guideline, as written, plays both sides. NicheSports (talk) 02:47, 4 December 2025 (UTC)
- I don't follow. #1 applies to edits which create "new articles" or are "major expansions", situations where majority-LLM authorship would be especially undesirable, and so that is explicitly disallowed. #2 applies to editing in general, where raw or lightly-edited LLM content is disallowed. Maybe you could pose a hypothetical editing scenario where you believe a contradiction would occur, and that would help me understand your point better. fifteen thousand two hundred twenty four (talk) 03:15, 4 December 2025 (UTC)
- Oh. With this interpretation, I would support! But if I don't understand this I guarantee you a lot of the non-native English speakers who are using LLMs would miss the distinction. Can we clarify the wording? NicheSports (talk) 03:19, 4 December 2025 (UTC)
- It reads well to me, so I'm not sure what changes could be made, @Qcne may have some suggestions? fifteen thousand two hundred twenty four (talk) 03:30, 4 December 2025 (UTC)
- I mean the header needs to be changed but it could just be changed to "Rules for using LLMs to assist with article content" or something neutral. We should specify that #1 above are rules for "major content additions" while #2 is rules for "minor content additions". NicheSports (talk) 03:31, 4 December 2025 (UTC)
- I do prefer the current "Do not use an LLM to add unreviewed content" header; it communicates up-front what the most basic requirement is before providing more detail below. #1 does already specify that it concerns "new articles or major expansions", and #2 already applies to all editing; narrowing its scope would introduce another point of argumentation (define "minor" vs "major"). The grammatical clarity could maybe be improved, but right now it's in good enough condition for adoption, and as said prior, I'm wary of bikeshedding. fifteen thousand two hundred twenty four (talk) 03:51, 4 December 2025 (UTC)
- I also think we need to be wary of any headline like the suggested "Rules for including LLM content" for fear of implying permission. I do think the "do not" header is the best way to go about it, and the way it's currently written is fine for a compromise guideline which isn't aiming to be a total ban. Athanelar (talk) 04:00, 4 December 2025 (UTC)
- The categories could just be "New articles or major expansions" and "General considerations". Could just be a bolded title before each section. That would be enough to make it clear (I support your interpretation but completely missed it when I first read). I disagree with the "unreviewed content" header, because it does contradict the guideline's language for new articles and major edits, and is going to confuse the heck out of newer editors, but I guess I can live with it for now. NicheSports (talk) 04:05, 4 December 2025 (UTC)
- Comment: could you remove a word from the second heading -- "Do not use an LLM to add unreviewed content" -> "Do not use an LLM to add content"? Using AI to add content to Wikipedia goes against the spirit of the consensus developed in the RFC. Mikeycdiamond (talk) 02:46, 4 December 2025 (UTC)
- 3rd time is truly a charm. I really like this one. Викидим (talk) 02:54, 4 December 2025 (UTC)
- Remove the entire "Why LLM-written content is problematic" section. As I've said before, guidelines aren't information pages. Remove unnecessary words.
- Change to: "Do not use an LLM to add
unreviewedcontent" - "Handling existing LLM-generated content" – good section. Thumbs up from me on this one.
- Cremastra (talk · contribs) 03:06, 4 December 2025 (UTC)
- If guidelines aren't information pages, then shouldn't WP:LLM be tagged as an information page? SuperPianoMan9167 (talk) 03:08, 4 December 2025 (UTC)
- IMO, yes, because that's what it is – it provides useful information on why LLMs are problematic and factual tips to handle and identify them. Cremastra (talk · contribs) 03:10, 4 December 2025 (UTC)
Done in Special:Diff/1325613952. WP:LLM is now an information page. SuperPianoMan9167 (talk) 03:16, 4 December 2025 (UTC)
- When/if qcne's guideline goes live, we must remember to add it to the information page template there as a page that is interpreted by it. Athanelar (talk) 03:31, 4 December 2025 (UTC)
- Guidelines aren't information pages, true, but you do need to explain to people why the guideline exists; Wikipedia attracts far too many free-thinking, contrarian, and libertarian types who like asking "why?" and will resist a nameless figure telling them what to do unless they're provided a reason to do otherwise. GreenLipstickLesbian💌🧸 03:10, 4 December 2025 (UTC)
- Guidelines should absolutely link – prominently! – to pertinent information pages, and give a one or two-sentence explanation of why the guideline is necessary. But whole sections dedicated to justifying its existence mean that the important parts are covered by clouds of factual information rather than principled guidance, which is confusing for new editors, who need the guidelines most. Cremastra (talk · contribs) 03:12, 4 December 2025 (UTC)
Change to: "Do not use an LLM to add
– I don't think this is going to shape up to be that kind of full-ban proposal (unlike #1 and #3 on this page are). That said, the core text as-is would be straightforward improvement while also posing no impediment to adopting more restrictions in the future. WP:NEWLLM was a small step, this would be a larger one, I'd suggest not letting perfect be the enemy of better. fifteen thousand two hundred twenty four (talk) 03:27, 4 December 2025 (UTC)unreviewedcontent"
- Thanks for all the comments. I have formally opened an RfC: User talk:Qcne/LLMGuideline#RfC: Replace text of Wikipedia:Writing articles with large language models. qcne (talk) 11:28, 4 December 2025 (UTC)
Further amendment proposal #3: Athanelar
[edit]Throwing my hat in the ring, essentially the same as Festucalex's proposal but just with slightly narrower scope that doesn't imply we're trying to police people using AI for idea generation or the likes.
| − | Large language models (or LLMs) | + | Large language models (or LLMs) are not good at creating article content which is suitable for Wikipedia, and therefore should not be used to generate content to add to Wikipedia, whether for new articles or when editing existing ones. |
Athanelar (talk) 15:17, 24 November 2025 (UTC)
- This completely changes the purpose of this guideline (expanding its scope from new articles to all edits) and would require a new RfC. Toadspike [Talk] 15:48, 24 November 2025 (UTC)
- That's sort of the intention, yes. I assume Festucalex is doing the same, and the intention is to gauge support before a formal RfC to expand the guideline. Athanelar (talk) 15:52, 24 November 2025 (UTC)
- @Qcne and Athanelar: May I have your permission to change the headers from their present titles to this:
- Further amendment proposal #1: Festucalex
- Further amendment proposal #2: qcne
- Further amendment proposal #3: Athanelar
- Just to make it clearer to other editors? I'll also change the section link that Athanelar put above. 〜 Festucalex • talk 16:18, 24 November 2025 (UTC)
- Of course, thank you. qcne (talk) 16:19, 24 November 2025 (UTC)
- Go ahead, thanks. Athanelar (talk) 16:31, 24 November 2025 (UTC)
- Done, thank you both. I took the liberty of adding an explanatory hatnote. 〜 Festucalex • talk 16:34, 24 November 2025 (UTC)
- This is all to see if people support a new guideline as opposed to a proper change. GarethBaloney (talk) 16:37, 24 November 2025 (UTC)
- I suggest dropping the "Large language models (or LLMs) can be useful tools" part. It's not necessary and will cause an awkward divide if taken to RfC where editors who more broadly oppose LLM use would have to endorse that they are useful tools. fifteen thousand two hundred twenty four (talk) 16:27, 24 November 2025 (UTC)
- I've modified my wording somewhat. I agree that part is unnecessary. Athanelar (talk) 16:35, 24 November 2025 (UTC)
- As I've discussed previously, personally I would prefer any guidance not to refer to specific technology, as this changes and is not always evident to those using tools written by others, and focus on purpose. Along the lines of my previous comment in the RfC, I suggest something like "Programs must not be used to generate text for inclusion in Wikipedia, where the text has content that goes beyond any human input used to trigger its creation." (Guidance for generated images is already covered by Wikipedia:Image use policy § AI-generated images.) isaacl (talk) 18:22, 24 November 2025 (UTC)
- How would "Text generation software such as large language models (LLMs) should not [...]" sound? Athanelar (talk) 18:26, 24 November 2025 (UTC)
- Personally, I prefer using a phrase such as "Programs must not be used to generate text" as I think it better reflects what many editors want: text written by a person, not a program. I think whether it's in a footnote or a clause, text generation should be defined, so using programs to help with copy-editing, or to fill in the blanks of a skeleton outline is still allowed. Also, I prefer "must" to "should". isaacl (talk) 19:16, 24 November 2025 (UTC)
- "Programs" is too nonspecific I think; a word processor is arguably a "program used to generate text" for example. We need to be somewhat specific about what sort of technology we're forbidding here. Athanelar (talk) 19:30, 24 November 2025 (UTC)
- Thus why I said the meaning of text generation should be defined, and as I suggested, the generated text should not have content that goes beyond any human input used to trigger its creation. Accordingly, word processors do not fall within the definition. isaacl (talk) 23:44, 24 November 2025 (UTC)
- "Programs" is too nonspecific I think; a word processor is arguably a "program used to generate text" for example. We need to be somewhat specific about what sort of technology we're forbidding here. Athanelar (talk) 19:30, 24 November 2025 (UTC)
- Personally, I prefer using a phrase such as "Programs must not be used to generate text" as I think it better reflects what many editors want: text written by a person, not a program. I think whether it's in a footnote or a clause, text generation should be defined, so using programs to help with copy-editing, or to fill in the blanks of a skeleton outline is still allowed. Also, I prefer "must" to "should". isaacl (talk) 19:16, 24 November 2025 (UTC)
- How would
- Honestly, I like this as the lead for Qcne's proposal above. Specifying it's about both creating articles and editing existing ones is good clarity Kowal2701 (talk) 21:41, 24 November 2025 (UTC)
- Oppose. I would argue that the current text is already too restrictive (yes, AI can be abused, but so can WP:AWB) and needs to be handled in another way altogether (like the AWB is handled). Викидим (talk) 22:04, 24 November 2025 (UTC)
- This proposal is more restrictive than proposal #2, so it can't serve as a lead for it. isaacl (talk) 23:50, 24 November 2025 (UTC)
- Support. I'm still going to try making incremental changes to improve the current version, but this closes the biggest loophole (inserting content into existing articles) while eliminating "from scratch". You're going to need to tighten your definitions though, or we'll get "but it's only one sentence and I reviewed it". ~ Argenti Aertheri(Chat?) 21:13, 26 November 2025 (UTC)
- How would you know whether one sentence was AI-generated? Is it practical to prohibit an undetectable use? Unenforceable "laws" can lead to a general disregard for rules ("Oh, yes, driving that fast is illegal here, but everybody does it, and the police don't care" becomes "Nobody cares about speeding, and reckless driving is basically the same thing"). WhatamIdoing (talk) 06:28, 27 November 2025 (UTC)
- "Is it practical to prohibit an undetectable use?" – Banning all use bans all use. All vandalism is prohibited, not just detectable vandalism; same for NPOV violations, promotion, undisclosed paid editing, sockpuppetry, etc. What can be detected will be, what cannot will not. I do not understand your point. fifteen thousand two hundred twenty four (talk) 06:43, 27 November 2025 (UTC)
- Yes, banning bans all use. But if you can't tell whether the use happened, or prove that it didn't, then we might end up with drama instead of an LLM-free wiki. WhatamIdoing (talk) 02:20, 28 November 2025 (UTC)
- We can't prove COI or undisclosed paid editing either, we still don't allow them. ~ Argenti Aertheri(Chat?) 19:39, 28 November 2025 (UTC)
- And we end up with drama about that regularly, when an editor issues an accusation, and the targeted editor denies it, and how do you prove who's correct? WhatamIdoing (talk) 02:47, 2 December 2025 (UTC)
- Since that's all par for the course for COI, I think you may have misunderstood my !vote. I'm sorry if it sounded like I was trying to say one reviewed sentence should (not) be allowed. I meant to say: this will come up if this goes to RfC, so address it before RfC. Personally I think one reasonable-length sentence is my comfort level, if only because of how much GPTs like to ramble. ~ Argenti Aertheri(Chat?) 18:31, 2 December 2025 (UTC)
- Oppose. Instead of this approach, which I do not think would make for a useful guideline, I support adopting WP:LLMCIR as a guideline.—Alalch E. 00:04, 28 November 2025 (UTC)
- Support. AI causes Wikipedia numerous issues, like hallucinations, text that does not make sense, unsourced content, etc. I believe the guideline prohibiting the use of AI to generate article content is the best way forward. GothicGolem29 (Talk) 18:58, 28 November 2025 (UTC)
- Oppose. LLMs are useful tools when used carefully. Anne drew (talk · contribs) 19:52, 3 December 2025 (UTC)
Expanding CSD G15 to align with this guideline
[edit]Those participating in this discussion might also be interested in my discussion about potentially expanding CSD G15 to apply to all AI-generated articles per this guideline. Athanelar (talk) 16:53, 24 November 2025 (UTC)
- Discussion withdrawn within six hours by the OP due to opposition. WhatamIdoing (talk) 06:29, 27 November 2025 (UTC)
Not a proposal, just some stray ideas
[edit]I didn't participate in the original RfC and I haven't fully read the new proposals and discussions here, but I'll table the rough notes I've been compiling at User:ClaudineChionh/Guides/New editors and AI in case there are any useful ideas there. (There might be nothing useful there; I'm still slowly working my way through the discussions on this page.) ClaudineChionh (she/her · talk · email · global) 23:04, 24 November 2025 (UTC)
- After reflecting on the common refrain in these discussions that AI is just a tool, we should judge LLM text by the same standards we judge human text, I also finally put some of my thoughts on this matter into essay form (complete with clickbaity title!) at User:LWG/10 Wikipedia Policies, Guidelines, and Expectations That Your ChatBot Use Probably Violates. There's also a little "spot the LLM" easter egg if anyone wants a small diversion. -- LWG talk 03:03, 25 November 2025 (UTC)
Further amendment proposal #4: Mikeycdiamond
[edit]During the initial discussion of this guideline, I noticed that people were complaining that others would use it to indiscriminately attack stuff at XFD because it might be by an AI. My proposal would fix that problem. I also noticed some slight overlap between the third sentence of my proposal and Qcne's proposal, but I would appreciate input on whether I should delete it. If my proposal were to be enacted, I believe it should be its own paragraph.
"When nominating an AI article for deletion, don't just point at it and say, "That's AI!" Please point out the policies or guidelines that the AI-generated article violated. WP:HOAX and WP:NPOV are examples of policies and guidelines that AIs commonly violate." Mikeycdiamond (talk) 00:55, 25 November 2025 (UTC)
- Oppose. I would compare the situation to WP:BURDEN - deleting AI slop should be easy at the slightest suspicion, keeping it should require disclosures / proofs of veracity, etc. (like BURDEN does in the case of unsourced text). This proposal goes in the opposite direction: another editor should be able to tell me that "this article looks like AI slop. Explain to me how you created this text", in the same way they can point to BURDEN and tell me "show me your sources or this paragraph will be gone". Викидим (talk) 01:17, 25 November 2025 (UTC)
- @Викидим, I have "the slightest suspicion" that the new articles you created at Attribute (art) and Christoph Ehrlich used AI tools. Exactly how easy should it be for me to get your new articles deleted? WhatamIdoing (talk) 06:35, 27 November 2025 (UTC)
- The key word in my remark is "slop". I do not think that everything that AI produces is sloppy. Incidentally, I already provide full disclosures on the talk pages. I hope this would convince other editors of the veracity of the article content, so the hypothetical AfD would not happen. So, (1) I firmly believe that using AI should be allowed and (2) acknowledge the need to restrict the cost of absorbing the AI-generated text into the encyclopedia.
- My personal preference would be to have a special "generative AI" flag that allows the editor to use generative AI. For some reason this idea is not popular. An alternative would be to shift the burden of proof of quality onto the users of generative AI. For an article showing the telltale signs of AI use, absence of published prompts, or prompts indicating that the AI was involved in the search for RS, can be grounds for deletion IMHO. Викидим (talk) 06:58, 27 November 2025 (UTC)
- I think some editors believe "AI slop" is redundant (i.e., all generative AI is automatically slop), so your articles would be at risk of AFD.
- Other editors believe that "deleting slop should be easy", even if it's not AI-related. WhatamIdoing (talk) 02:22, 28 November 2025 (UTC)
- Regarding the quality of AI output: based on what I have witnessed firsthand, the modern AI models, when properly used, can provide correct software code of quite non-trivial size. I will happily admit that the uncertainties inherent in any human language make operations with it harder than with programming languages, but the fact that AI (as of late 2025) in principle can generate demonstrably correct text is undeniable. Same thing apparently happens when AI is asked to produce, say, a summary of facts relating to X from a few-hundred-page book that references back to the pages in the original book. Here, based on personal experience, I am yet to encounter major issues, too. Writing a Wikipedia article is very close to this latter job, so I see no reason why modern AI, properly prompted, should produce slop. Unlike in the former case, where the proof of correctness is definite, I can be wrong, and will happily acknowledge it if somebody provides me with an example of, say, Gemini 3.0 summarizing text on a "soft" topic wildly incorrectly after adequate prompts (which in this case are simple: "here is the file with text X, create summary of what it says about Y for use in an English Wikipedia article"). Викидим (talk) 04:39, 28 November 2025 (UTC)
- Even if you think that modern AI can produce good content, other editors appear to be dead-set against it.
- Additionally, you are opposing a request for editors to say more than "That's AI" when trying to get something deleted. Surely you at least mean for them to say "That's AI slop"? Because if "modern AI, properly prompted" is a reason for deletion, then your AI-generated articles will disappear soon. WhatamIdoing (talk) 02:50, 2 December 2025 (UTC)
- I understand the internal contradiction in my posture. It stems from the fact that I look at AI from two angles: as an editor who actually likes to create articles using AI and feels good about the need to wash hands prior to cooking the text, and as a WP:NPP member where I occasionally face the slop. Викидим (talk) 06:46, 2 December 2025 (UTC)
- My experience has been the opposite -- AI-generated text in my experience tends to represent sources so poorly that when I spot check some obviously-modern-AI text, there is a >50% chance that it's going to be the same old slop just with a citation tacked on.
- Recent and characteristic example: Talk:Burn (Papa Roach song), generated a few days ago most likely with ChatGPT (based on utm_source params in the editor's other contributions). I don't know what LLM or prompt was used, but it took me only ~10 minutes to find several instances of AI-generated claims that sources say things that they simply don't. This isn't an especially noteworthy example either, it got it wrong in the exact same ways it usually does.
- And if the article were to go to AfD -- note, I am not saying that it should -- that is actually relevant, because the AI text is presenting one source as multiple, and in one case inventing fictitious WP:SIGCOV literally just from a song's inclusion in a tracklisting. This becomes obvious when you read the cited sources, but many at AfD don't. Gnomingstuff (talk) 20:55, 2 December 2025 (UTC)
- Oppose in its current form. Generally I think AI usage falls under WP:NOTCLEANUP -- a lot of AI-generated articles are about notable subjects, especially the ones where there's a language gap. But I do think there are legitimate reasons to bring AI usage up at AfD, because AI can misrepresent sources, and in particular often misrepresents them by making a huge deal out of a passing mention, making coverage seem significant that actually isn't. I also think that for certain topics -- POV forks, BLPs, etc. -- AI generation is a legitimate reason to just delete the thing. Gnomingstuff (talk) 01:23, 25 November 2025 (UTC)
- Support. Explaining how WP:AfD is not cleanup is very important to clarifying the scope of this guideline Katzrockso (talk) 01:31, 25 November 2025 (UTC)
- Promote WP:LLM to guideline We cite it and treat it as if it were a guideline and not an essay. For Pete's sake, just promote it already! It has everything necessary for a comprehensive LLM usage guideline. SuperPianoMan9167 (talk) 02:01, 25 November 2025 (UTC)
- We've already gone through a month-long RFC to promote this to a guideline. Could you imagine how large the debate would be if we tried to promote that essay? It might be quicker to work on this guideline. Mikeycdiamond (talk) 02:05, 25 November 2025 (UTC)
- That essay is comprehensive and well-written. In my opinion, it would be quicker to just promote it to guideline instead. Besides, it already contains guidance in the spirit of this guideline in the form of WP:LLMWRITE. It also contains WP:LLMDISCLOSE, which I think should be policy (and I am honestly baffled that it isn't). SuperPianoMan9167 (talk) 02:09, 25 November 2025 (UTC)
- No one is stopping you from making an RFC. I don't disagree with you, but I am not sure if it would pass. Mikeycdiamond (talk) 02:12, 25 November 2025 (UTC)
- I was looking through LLM's talk page archives; there was an RFC in 2023. The RFC showed large consensus against promoting it, but a lot has changed since then. Mikeycdiamond (talk) 02:22, 25 November 2025 (UTC)
- Oppose; misses the point of NEWLLM, which is specifically to forbid AI-generated articles simply because they are AI-generated, and not because of AI-related policy violation. Athanelar (talk) 02:56, 25 November 2025 (UTC)
- That's your interpretation of the guideline. Other editors will interpret it in different ways. SuperPianoMan9167 (talk) 02:57, 25 November 2025 (UTC)
- The text of the guideline is pretty clear on what it forbids. It says that LLMs are not good at generating articles, and should not be used to generate articles from scratch. We can argue all day about what 'from scratch' means (which is what these amendment proposals are meant to solve) but the fact that the guideline forbids AI writing in itself is not I think ambiguous in any sense; there is no room in the proposal to argue that it's saying AI-generated articles are only bad if they violate other policies. Athanelar (talk) 03:06, 25 November 2025 (UTC)
- If they don't violate other policies/guidelines, what is the point of deleting them? Isn't the sole reason for banning AIs that they violate our other policies/guidelines? Mikeycdiamond (talk) 03:11, 25 November 2025 (UTC)
- Because they violate this guideline, which says you shouldn't generate articles using AI. Athanelar (talk) 03:15, 25 November 2025 (UTC)
- WP:IMPERFECT and WP:ATD-E are core Wikipedia policies that collectively suggest WP:SURMOUNTABLE problems that can be resolved with editing should not be deleted. Katzrockso (talk) 03:45, 25 November 2025 (UTC)
- In my eyes, a guideline which says "Articles should not be generated from scratch using an LLM" logically means the same thing as "An article generated from scratch using an LLM should not exist." It would be kind of odd to me to argue that this guideline doesn't support deletion; because what, you're saying that you shouldn't generate articles using AI, but if you happen to do so, then it's fine as long as it doesn't violate other policies/guidelines? That would mean that this guideline really does nothing at all.
- And anyway, your argument also arguably applies to an AI-generated article which violates other policies/guidelines, too. I mean, those problems might also be surmountable, so what's the problem there? Should we disregard CSD G15 and say that unreviewed AI-generated articles are fine as long as the article subject is notable and the article is theoretically fixable with human intervention?
- Basically, I think adding a paragraph to this guideline saying that you can't use it to support deletion would mean there's no point in this guideline existing at all, and you might as well just propose that the guideline be demoted again. Athanelar (talk) 03:57, 25 November 2025 (UTC)
- Say Mary Jane generates an LLM-written article that has some major, but surmountable, issues. For example, two of her citations are to fake links, but other sources are readily available to support the claims, three of the claims are improperly in wikivoice when they should be attributed, and there is a section of the article that is irrelevant/undue. Would you suggest this article be deleted in whole, despite being otherwise a notable topic, or should editors be allowed to remedy the problems generated by the LLM usage? Katzrockso (talk) 04:04, 25 November 2025 (UTC)
- I think in the given example it would essentially be the same amount of effort to TNT the article and start from scratch as to try to rework it from the flawed foundation; so yes, I'd say deletion would still be fine in that case.
- Besides, what exactly would we be fighting to keep in the other case? It's not as if we'd be doing so out of a desire to respect Mary Jane's effort in creating the article. We'd be trying to hammer a square peg into a round hole for no reason other than 'well, the subject's notable and the article's here now, so...' Athanelar (talk) 04:11, 25 November 2025 (UTC)
- It's my (and other editors') belief that TNT is not a policy-based remedy (WP:TNTTNT), but one that violates fundamental Wikipedia PAG. In my given example, I don't see how "it would essentially be the same amount of effort to TNT the article and start from scratch as to try to rework it from the flawed foundation". The remedy in my scenario would be:
- 1) Replace the fake link citations to the readily available real sources that support the claim
- 2) Change the three sentences that are improperly in wikivoice to attributed claims
- 3) Remove the off-topic/irrelevant section
- If you think that is more difficult than starting from scratch, I don't know what to express other than shock and disbelief. Katzrockso (talk) 06:02, 25 November 2025 (UTC)
- About TNT: Has it ever occurred to you that the actual admin delete button isn't necessary? You can follow the process you're thinking of (AFD, red link, start new article) or you could open the article, blank the contents, and replace it with the new article right there, without needing to spend time at AFD or anything else first. WhatamIdoing (talk) 06:37, 27 November 2025 (UTC)
- (also, the article you've given as your example here would already be suitable for deletion under CSD G15 whether or not WP:NEWLLM existed, so if you don't think that article would be suitable for deletion, you're also arguing we shouldn't have CSD G15) Athanelar (talk) 04:13, 25 November 2025 (UTC)
- Things are only as good as the parts that make them up. If it wasn't for HOAX or NPOV -- among many other -- violations, this guideline wouldn't exist. We already have policies and guidelines for the subjects AIs violate; why shouldn't we use them? It is much clearer to point out the specific thing the text violates than blindly saying it is AI. I know AI text is relatively easy to spot now, but it will get progressively better at hiding from detection. What if people use anti-AI detection software? This guideline is meant to back up stronger claims using other policies/guidelines, not be the sole argument in an XFD. Mikeycdiamond (talk) 03:09, 25 November 2025 (UTC)
- The text of this guideline literally says 'LLMs should not be used to generate articles from scratch.' Your proposed amendment to that guideline is to tell people that when deleting AI-generated articles, they cannot reference the guideline that specifically says 'Don't generate articles with AI' and must instead reference other policies/guidelines that the article violates.
- That would seem to defeat the whole point of passing a guideline that says 'Don't generate articles with AI,' wouldn't it? Athanelar (talk) 03:14, 25 November 2025 (UTC)
- Deletion policy wasn't really discussed all too much in the RfC or the nonexistent RFCBEFORE, so whether it defeats the purpose is not established. Many editors expressed positive attitudes towards the guideline because it provided somewhere to point to explain to people why their LLM contributions aren't beneficial. Katzrockso (talk) 03:47, 25 November 2025 (UTC)
- Oppose as defeating the purpose of having a guideline. We just passed a guideline saying "don't create articles with LLMs", this would effectively negate that by turning around and saying "actually, it's fine if it doesn't violate anything else". It doesn't work that way with any other guideline and for good reason: imagine nominating something for deletion due to serious COI issues and being told "nah, prove it violates NPOV". No, the burden of proof is on the editor with the conflict because they're already violating one guideline. This is one guideline, violating one guideline is enough. ~ Argenti Aertheri(Chat?) 21:27, 25 November 2025 (UTC)
- I agree completely with the objections raised by Викидим and Gnomingstuff and Athanelar and Argenti Aertheri. AFD is about what an article is lacking (sourcing establishing notability), not about what bad content it has - just remove the bad content and AFD whatever is left if warranted. So there is no reason to treat NEWLLM differently from any other guideline there. -- LWG talk 01:10, 26 November 2025 (UTC)
- Oppose — This reminds me of when people tried to undercut the ban on AI slop images as soon as it passed. The guideline needs to made stronger, not weaker. —pythoncoder (talk | contribs) 15:39, 26 November 2025 (UTC)
- Oppose per all above. A guideline is a guideline and a statement of principle, and should be used directly, not as through proxies. If there is overwhelming evidence an article is wholly AI-generated such that it falls afoul of this guideline, the article should be deleted at AfD. Cremastra (talk · contribs) 19:01, 26 November 2025 (UTC)
- Oppose. Not topical in this guideline as this guideline is not about deletion in the first place.—Alalch E. 23:52, 27 November 2025 (UTC)
- Some people think it is, see #Expanding CSD G15 to align with this guideline. SuperPianoMan9167 (talk) 00:04, 28 November 2025 (UTC)
community consensus on how to identify LLM-generated writing
[edit]Not sure how I feel about this one.
On the one hand, there is some research suggesting that consensus helps: specifically, when multiple people familiar with signs of AI writing agree on whether a given piece of text is AI, they can achieve up to 99% accuracy. Individual editors were topping out at around 90% accuracy (which is still very good obviously).
- On the other hand, "we have to treat an edit as human-generated until there's consensus otherwise" seems like a massive restriction that came out of nowhere -- it doesn't have consensus in the RfC and I'm not sure more than a handful of people even said anything close. Like, just think about how that would work in practice. Do we have to convene a whole AI Tribunal before reverting text that is very clearly AI-generated? Is individual informed judgment not enough?
- This stuff is really not hard to identify. WP:AISIGNS exists, and is relatively up to date with existing research on common characteristics of LLM-generated text -- and specifically, things it does that text prior to ~2022 just... didn't do very often. This is also the case with Wikipedia text prior to mid-2022. I've been running similar, if less rigorous, text crunching on Wikipedia articles, and compared with text from before mid-2022 the same tells have just skyrocketed. The problem is actually convincing people of this: that AI text consistently displays various patterns far more often than human text does (or for that matter, than LLM base models do), that people have actually studied those patterns, and that the individual edit they are looking at fits the pattern almost exactly. Is the page just not clear enough? Does it need additional citations? Gnomingstuff (talk) 01:11, 25 November 2025 (UTC)
- I think this caveat was added to the RfC only because the closer didn't believe there was enough consensus for the promotion to guideline, and adding the requirement for consensus to determine that an article is in fact AI generated helps to soothe those who think the guideline is over-restrictive.
- I also think it's really a non-issue; since there's no support currently to expand CSD G15 to apply to all AI-generated articles, any article suspected of being AI-generated in violation of NEWLLM will have to go to AfD anyway, which automatically will end up determining consensus about whether the article is AI generated and should be deleted under NEWLLM. Athanelar (talk) 03:02, 25 November 2025 (UTC)
- "we have to treat an edit as human-generated until there's consensus otherwise" Where did this come from?
- As for your number crunching, I'm not sure if I understand the results, but if we are going to start taking phrases like "pivotal role in" and "significant contributions to" as evidence of LLM contributions, then I think this starts to pose problems. Katzrockso (talk) 03:03, 25 November 2025 (UTC)
- It's from the RFC closing note. Athanelar (talk) 03:07, 25 November 2025 (UTC)
- That sentence in the closing note is strange to me as well, and only makes sense in the context of an AFD or community sanctions on a problem user. In terms of reversion/restoration of individual suspected LLM-edits, the WP:BURDEN is clearly on the user who added the content to explain and justify the addition, not on a reverting editor to explain and justify their reversion. In the context of LLM use, that means that if someone asks an editor "did you use an LLM to generate this content, and if so what did that process look like?" they should get a clear and accurate answer, and if they don't get a clear and accurate answer the content should be removed until they do. -- LWG talk 03:22, 25 November 2025 (UTC)
- I think ultimately it's just an effort by the closer to avoid 'taking a side' on what they perceived as a pretty tight consensus, and to preempt a controversy about the nature of the guideline; which of course is occurring anyway. Athanelar (talk) 03:32, 25 November 2025 (UTC)
- No, it's not any of those things. It's me knowing this argument was going to be made and pre-empting it. Where there's no rule or guideline, Wikipedia makes content decisions by consensus; so an edit isn't to be treated as AI-generated until either we've got consensus for a test that it's AI-generated or else we've analysed the edit and reached consensus that it's AI-generated. I know this limits the applicability of the guideline but that's not because I'm unclear or unsure about the RFC outcome or worried about taking sides. It's because of how long-established Wikipedia custom and practice works. A test of what actually identifies AI-generated writing should really be the next step, folks. —S Marshall T/C 08:57, 25 November 2025 (UTC)
- The issue is that requiring consensus before tagging content as problematic (instead of tagging the content and then following WP:BRD) imposes an unnecessary restriction, even on current practices, which wasn't brought up in the discussion. This close would mean, for example, that we can't tag a page as {{AI-generated}} anymore without first requiring an explicit consensus. This isn't "Wikipedia custom and practice" for tagging and has never been. Chaotic Enby (talk · contribs) 09:10, 25 November 2025 (UTC)
- The best solution to these problems is to reach consensus on a test. But obviously, tagging doesn't need consensus and never has. What's not allowed is to revert or delete content for being AI-generated unless there's consensus to do so. Just to be clear: all our normal rules apply. You can still revert for all the usual reasons. BRD still applies. ONUS still applies. You can still tag stuff you suspect might be problematic. —S Marshall T/C 09:43, 25 November 2025 (UTC)
- "But obviously, tagging doesn't need consensus and never has." This is certainly not obvious from your close, which says that "this means that we have to treat an edit as human-generated until there's consensus otherwise". A closure should only summarize the given discussion, not add new policies that need to rely on the word of the closer for later clarification, even if they would be a logical development from previous practice. Chaotic Enby (talk · contribs) 10:44, 25 November 2025 (UTC)
- Summarize and clarify. A close should summarize the community's decision and clarify its relationship to existing policy and procedure. What we don't want looks like this: "I think this user is adding AI-generated content so I'm going to quick-fail all their AfC submissions and then follow them round reverting and prodding." —S Marshall T/C 12:22, 25 November 2025 (UTC)
- "What's not allowed is to revert or delete content for being AI-generated unless there's consensus to do so" -- I'm not aware of anything in policy stating this -- certainly not AI policy, because we don't have any. Based on the consensus of this RfC, and on the fact that people are already reverting and deleting content for being AI to relatively little outcry, I don't think there would be consensus for such a prohibition, and I think most people in the RfC would be surprised to learn they were !voting for one. Gnomingstuff (talk) 10:16, 26 November 2025 (UTC)
- As far as a test being the next step... I mean I'm trying Jennifer. We have WP:AISIGNS and are trying to make it as research-backed as possible. It is an evolving document, and I'm sure most contributors to it have their own list of personal tells they've noticed. (For example I trust @Pythoncoder's judgment implicitly on detecting AI but they see stuff I have no idea about. Apologies if you don't want the ping, I figured the outcome here is relevant to you.) But there are several problems:
- Problem 1: Getting people to actually believe that these are signs of AI use. There seems to be no amount of evidence that is enough.
- Problem 2: Getting people to interpret things correctly. This stuff gets very in-the-weeds, and AISIGNS leaves out a lot for that reason. For instance, one "personal tell" I have noticed is that "Additionally," starting a sentence, with capitals and punctuation, is a strong indicator of possible AI use, but the word "additionally" as an infix isn't necessarily a sign. Other tells I have are still kind of in the oven until I can hammer out a version with as few false positives as possible, with as little potential for confusion.
- Problem 3: We are doomed to remain in the world of evidence, not proof. It is impossible to prove whether AI was used in an edit unless you are the editor who made it. Since we have had AI text incoming since 2023, many of those editors aren't around anymore. Other editors are not forthcoming with the information. Some dodge the question, some trickle-truth it, small handful of editors lie. Gnomingstuff (talk) 10:34, 26 November 2025 (UTC)
- This is exactly the shit I mean. When:
- A word is identified in multiple academic studies as very over-represented in LLM-generated text compared to human text
- The most obvious phrase containing that word is roughly 1605% more common in one admittedly less rigorous sample of AI-generated edits compared to human-generated -- a substantial portion of which are human-generated articles tagged as promotional
- ...then yes, it would seem to be empirical evidence? No one can prove how a user produced an edit besides that user, but when patterns start showing up that happen to be similar patterns to ones cited in external sources as characteristic of AI use, that is telling. Gnomingstuff (talk) 03:31, 25 November 2025 (UTC)
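As a purely illustrative sketch of the kind of frequency comparison described in the comment above (this is not the tooling any editor here used; the phrase list and file names are hypothetical placeholders), the gist is just counting how often candidate "tell" phrases appear per 1,000 words in two samples and comparing the rates:

import re

# Hypothetical "tell" phrases; a real list would come from published research or WP:AISIGNS.
TELL_PHRASES = ["pivotal role in", "significant contributions to"]

def phrase_rate_per_1000_words(text, phrases):
    """Occurrences of each phrase per 1,000 words of the given text."""
    total_words = max(len(re.findall(r"\w+", text)), 1)
    lowered = text.lower()
    return {p: lowered.count(p) * 1000 / total_words for p in phrases}

# Hypothetical input files: one sample of ordinary (pre-2022) article text,
# one sample of suspected LLM-generated additions.
human_text = open("human_sample.txt", encoding="utf-8").read()
suspect_text = open("suspect_sample.txt", encoding="utf-8").read()

human_rates = phrase_rate_per_1000_words(human_text, TELL_PHRASES)
suspect_rates = phrase_rate_per_1000_words(suspect_text, TELL_PHRASES)

for phrase in TELL_PHRASES:
    ratio = (suspect_rates[phrase] / human_rates[phrase]) if human_rates[phrase] else float("inf")
    print(f"{phrase!r}: {human_rates[phrase]:.2f} vs {suspect_rates[phrase]:.2f} per 1,000 words (x{ratio:.0f})")

A comparison like this only ever produces evidence of a pattern, not proof about any individual edit, which is the caveat the discussion itself keeps returning to.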
- Empirical evidence of what? I have humanly generated both those phrases before (not on Wikipedia, I don't think, but elsewhere), are you going to suggest deleting my contributions on these types of grounds, because your model suggests that LLMs use these phrases at higher rates? Keep in mind that human language is changing as a result of LLMs ([5]), for better or worse. Katzrockso (talk) 03:53, 25 November 2025 (UTC)
- Empirical evidence that these words and phrases appear more frequently in the aggregate of AI-generated text -- in this case, on Wikipedia -- compared to the aggregate of human-generated text on Wikipedia. They also tend to occur together, and occur in the same ways, in the same places in sentences, the same forms, etc. So if an edit shows up with a whole bunch of this crammed into 500 words, that's a very strong indication that the text is probably AI. Not a perfect indication -- for instance, this version of Julia's Kitchen Wisdom is way too early for AI but sounds just like it -- but a very strong one.
- I am aware of the studies that human language is changing as a result of LLMs -- one study suggests that this particular set of words is really just a supercharge to increases in those words that were naturally happening already. That particular study is less convincing because it seems to think podcasts are never scripted or pre-written, which is... not true. But anecdotally I do see it happening. (It's a bit weird to hear this stuff out of human mouths in the wild, although that's probably just the frequency illusion given how much AI text I am seeing all day.) Not sure how much that affects Wikipedia, especially the last few years of AI stuff to deal with, given that the changes in human language feel like a lagging indicator. Gnomingstuff (talk) 10:08, 26 November 2025 (UTC)
- Incidentally GPTZero scans that revision of Julia's Kitchen Wisdom as 98% human, <s>highlighting the pivotal role of</s> illustrating the benefit of using multiple channels of evidence to assess content. -- LWG talk 17:47, 26 November 2025 (UTC)
- I have the opposite reaction to "Individual editors were topping out at around 90% accuracy (which is still very good obviously)": I look at that and say even the best of the best were making false accusations at least 10% of the time.
- Imagine the uproar if someone wanted to work in Wikipedia:Copyright problems, but they made false accusations of copyvios 10% of the time. We would not be talking about how good they are.
- If anything, this information has convinced me that unilateral declarations of improper LLM use should be discouraged. Maybe tags such as Template:AI-generated should be re-written to suggest something like "This article needs to be checked for suspected AI use". WhatamIdoing (talk) 07:01, 27 November 2025 (UTC)
- The template already says the article may contain them. There is a separate parameter, certain=y, that is added for cases where the AI use is unambiguous. Gnomingstuff (talk) 04:21, 28 November 2025 (UTC)
- There does not need to be a community consensus on how to identify LLM-generated writing. It's a technical question. Different editors will apply different methods. Disputes will be resolved in the normal way. —Alalch E. 23:49, 27 November 2025 (UTC)
- Tell that to the closing admin who specifically said in the RfC close "In particular we need community consensus on (a) How to identify LLM-generated writing [...]" Athanelar (talk) 00:13, 28 November 2025 (UTC)
- That statement is true because most signs of AI writing, except for the limited criteria of G15, are largely subjective. SuperPianoMan9167 (talk) 00:17, 28 November 2025 (UTC)
- A closer does not need to be an admin and the closer wasn't in this case. GothicGolem29 (Talk) 18:48, 28 November 2025 (UTC)
Further amendment proposal #5: Argenti Aertheri
[edit]| − | + | Artificial intelligence, including GPTs and Large language models (or LLMs), is not good at creating entirely new Wikipedia articles, and should not be used to generate new Wikipedia articles from scratch. |
We barely got the thing passed, so I propose we make small, incremental changes. Changing LLMs to all AI seems as good a place to start as any other, and probably less controversial than some. ~ Argenti Aertheri(Chat?) 03:53, 25 November 2025 (UTC)
- Oppose. One of the primary points the first amendment proposals were trying to address was the prominent criticism during the RfC that the term 'from scratch' has no agreed-upon definition, and thus the scope of which articles this guideline applies to isn't clearly defined; your proposal doesn't address that, and in the process introduces a whole host of new ambiguity as to what tools are and aren't allowed, and in what capacity one might be allowed to use them. Athanelar (talk) 04:00, 25 November 2025 (UTC)
- There's a definition at wikt:from scratch. Merriam-Webster offers a similar definition.
- There were 37 uses of "from scratch" in the RFC; most of them were entirely favorable. There were 117 editors in the discussion; I see four who complained about the "from scratch" wording, and some of them (example) would still be valid no matter what words were used. WhatamIdoing (talk) 07:10, 27 November 2025 (UTC)
- GPT is a type of LLM, not something that can be contrasted with it. What other forms of "artificial intelligence" (a dubious + nebulous concept) are creating Wikipedia articles other than LLMs? Katzrockso (talk) 04:00, 25 November 2025 (UTC)
- The point isn't to address all the problems in the guideline that passed, just one: what technologies does this include. I know AI is a nebulous concept, that's actually why I chose it, so that WP:Randy from Boise can tell in seconds if his use of his software is included. Porn is a nebulous concept too, but we all know it when we see it. ~ Argenti Aertheri(Chat?) 04:20, 25 November 2025 (UTC)
- What is not covered by the existing guidelines that your change would include? Katzrockso (talk) 05:56, 25 November 2025 (UTC)
- 1) Remove the unnecessary "can be useful tools", it's not relevant here.
- 2) Replace the technical term "LLM" with a more readily accessible definition that clarifies that we want human intelligence, not artificial intelligence, regardless of the exact technology being used. Ergo explicitly stating GPTs despite them being a subset of LLMs: people know what a GPT is and whether they're using one. ~ Argenti Aertheri(Chat?) 06:33, 25 November 2025 (UTC)
- The "can be useful tools" part was just implemented as a part of the RfC on the two-sentence guideline, removing half of the approved text from the RfC is not a good start.
- "clarifies that we want human intelligence, not artificial intelligence" makes no sense, is less clear than the current version and if anything muddies the scope and applicability of this guideline. Katzrockso (talk) 09:34, 25 November 2025 (UTC)
- Would you find it acceptable to change the current wording from "LLMs" to "LLMs, including GPTs" if no other changes were made? ~ Argenti Aertheri(Chat?) 19:02, 25 November 2025 (UTC)
- I would find it acceptable/unobjectionable, I just think it's superfluous Katzrockso (talk) 00:14, 26 November 2025 (UTC)
- It's redundant if you know that GPTs are LLMs, but not if you're just Randy from Boise asking ChatGPT about the Peloponnesian War. Randy would likely have an easier time understanding the guideline with that explicitly spelled out. ~ Argenti Aertheri(Chat?) 01:35, 26 November 2025 (UTC)
- Maybe a footnote like the one in WP:G15 would work, which says
The technology behind AI chatbots such as ChatGPT and Google Gemini.
SuperPianoMan9167 (talk) 02:07, 26 November 2025 (UTC)
- Works for me, hopefully it works for Randy too. Should I reword this proposal or WP:BRD? ~ Argenti Aertheri(Chat?) 07:22, 26 November 2025 (UTC)
- I went ahead and added the footnote. SuperPianoMan9167 (talk) 22:47, 26 November 2025 (UTC)
- This is much clearer/explanatory than the term "GPTs" or "artificial intelligence". Support this change Katzrockso (talk) 07:47, 27 November 2025 (UTC)
- I think that @Katzrockso and @Argenti Aertheri make a good point, and it's one that could be solved by making a list. Imagine something that says "This bans article creation with AI-based tools such as ChatGPT, Gemini, and that paragraph at the top of Google search results. This does not ban the use of AI-using tools such as Grammarly, the AI grammar tools inside Google Docs, or spellcheck tools."
- These lists don't need to be in this guideline, but it might help if they were long. It should be possible to get a list of the notable AI tools in Template:Artificial intelligence navbox. WhatamIdoing (talk) 07:17, 27 November 2025 (UTC)
- So this begs the question why is Grammarly spell check allowed but not ChatGPT spellchecking? I'm not saying that people should plop "Write me a Wikipedia article" into a LLM and paste that into Wikipedia, but these LLMs have other use cases too. What use cases people want to prohibit/permit really need to be laid out more explicitly for this to be workable. Katzrockso (talk) 07:46, 27 November 2025 (UTC)
- Here (as someone who admittedly has not used Grammarly since their adoption of LLM tech) it would (potentially) be that Grammarly uses a narrow and specific LLM model that has additional guardrails that prevent it from acting in the generative manner that ChatGPT does. Or at least that would have been the smart way of rolling out LLM tech for Grammarly, as said I've not used it so I don't know where they have implemented rails. -- Cdjp1 (talk) 16:54, 27 November 2025 (UTC)
- In my experience reading Grammarly-edited text, it doesn't always use those guardrails well. It also tends to push a lot of more expansive AI features on people. Gnomingstuff (talk) 17:07, 29 November 2025 (UTC)
- In re "this begs the question why is Grammarly spell check allowed but not ChatGPT spellchecking?": Yes, well, that is a question, isn't it? And I think it's a question that editors won't be able to answer if they don't realize that ChatGPT can do spellchecking.
- https://arxiv.org/html/2501.15654v2 (which someone linked above) gave 300 articles to a bunch of humans, and asked them to decide whether each article was AI-generated or human-written. They learned that an individual who doesn't use LLMs missed 43% of the LLM-written articles and falsely flagged 52% of the human-written articles as LLM-written. This is in the range of a coin flip; it is almost random chance.
- I'm reminded of this because those non-users (e.g., me) are also going to be unaware of the various features or tools in the LLMs. A list might inform people of what's available, and therefore let us use a bit more common sense when we say "This tool is acceptable for checking your spelling, but that tool is prohibited." WhatamIdoing (talk) 02:30, 28 November 2025 (UTC)
- It's spellcheck; no one cares how you figure out how to spell a word as long as you knew which word you were trying to spell. I'd be wary of Grammarly unless they put in guardrails as Cdjp1 suggests, though, and if they have guardrails then that's what needs to be specified: which built-in guardrails make it OK? ~ Argenti Aertheri(Chat?) 04:50, 28 November 2025 (UTC)
- Nobody should care how you figure out how to spell a word, but it sounds like some editors aren't operating with that level of nuance. WhatamIdoing (talk) 02:52, 2 December 2025 (UTC)
- LLMs can't do spellchecking in the sense we are used to. They can do something that can be similar in output, but the underlying process used won't be the same, due to the fundamental way llms work. In terms of tools, any llm-use will have this underlying generative framework because everything is converted into mathematics and then reconverted in some way. As Cdjp1 and Gnomingstuff note, refining any llm-use is about building the right guardrails, but these don't change the way the underlying program works. The complication with Grammarly is that it has its original software and new llm-based tools, and I'm not sure how much control or even knowledge the user has. Same possibly with Microsoft these days. CMD (talk) 07:24, 2 December 2025 (UTC)
- In a couple of years, will the average person realistically have a way to use ordinary word processing software (e.g., MS Word or Google Docs) without an LLM being used somewhere in the background? I don't know. Maybe it just looks inevitable because of where we are in the Gartner hype cycle right now, but the inadvertent use of LLMs feels like it will only get bigger over time. WhatamIdoing (talk) 19:59, 4 December 2025 (UTC)
Since copying over the footnote seems pretty non-controversial, version 2:
| − | Large language models (or LLMs) | + | Large language models (or LLMs) are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch. |
While true, it's not relevant and only makes this mess messier. If it's a guideline about content creation then it doesn't really matter how well LLMs can do other tasks. ~ Argenti Aertheri(Chat?) — Preceding undated comment; no timestamp given.
- Since you didn't get any direct replies to this, here's a late comment:
- We're trying to present this as a guideline that involved reasonable people making a reasonable choice about reasonable things, rather than a bunch of ill-informed AI haters. The guideline is less likely to seem unreasonable or to be challenged by pro-AI folks if it acknowledges reality before taking away their tools. Therefore the guideline acknowledges and agrees with their POV ("can be useful"), names the community's concern ("not good at creating entirely new Wikipedia articles"), and then states the rule ("should not be used to generate new Wikipedia articles from scratch"). WhatamIdoing (talk) 20:07, 4 December 2025 (UTC)
- Agreed. The rules are principles, not lists of things that editors should and should not do. SuperPianoMan9167 (talk) 20:10, 4 December 2025 (UTC)
- Agreed. Alaexis¿question? 21:04, 5 December 2025 (UTC)
Supplemental essay proposal on identifying AI-generated text
[edit]Seeing as it has been noted (particularly by the RfC closer) that the existence of a guideline which prohibits AI-generated articles necessitates the existence of a consensus standard on identifying AI-generated articles, I've drafted a proposal which aims to codify ways that AI text can be identified for the purpose of enforcing this guideline (and any other future AI-restricting guideline).
The essay content is largely redundant to WP:AISIGNS but rather than just a list of AI indicators it specifically aims to be a standard by which content can be labelled as AI-generated.
Your feedback and proposed changes/additions are most welcome at User:Athanelar/Identifying AI-generated text. If reception is positive I will submit an RFC.
Pinging some editors who were active in this discussion: @Qcne @Voorts @Gnomingstuff @Festucalex @Mikeycdiamond @Argenti Aertheri @LWG Athanelar (talk) 17:55, 26 November 2025 (UTC)
- I agree a consensus standard is implied, but I would guess any rate of false positives or negatives will render either a guideline or tools controversial. I have a few suggestions: 1) I prefer a 'weak' or humble standard, using various criteria or methods may suggest but not prove AI use. 2) Checking the volume of changes, either as a single submission or in terms of bytes/second from a given IP or account, may occasionally serve as a cheaper semi-accurate proxy for AI detection, although once again there will be false positives and negatives. 3) Given the rapid development and diversity of AI tools, and the resources involved, I do not think developing uncontroversial tools for AI detection is a feasible goal in the near future. Deploying automatic tools sitewide or on-demand would likely be prohibited by cost, but if individual users wish to run them, I think their findings could contribute evidence towards a finding - so long as we guard against bias and overconfidence in the use of these tools. --Edwin Herdman (talk) 19:32, 26 November 2025 (UTC)
- The "suggest" wording is a good idea. For those who worry it may not be workable, our entire concept of notability rests on similar wording (e.g. "presumed to be suitable", "typically presumed to be notable"). If we're going down this road, I'd support wording like this and judgement by consensus in case of dispute. Toadspike [Talk] 21:08, 26 November 2025 (UTC)
- Regarding AI tools changing quickly, I did some very very very rough analysis of text pre- and post-GPT-5 if anyone is interested. Will revisit once I have more data. Gnomingstuff (talk) 03:57, 27 November 2025 (UTC)
- I made one small tweak -- adding the bit about edits having to be post-2022 for AI use to even be possible. "Strongly suggest" is the best we can do, unfortunately. If the burden of proof is on the person tagging/identifying AI-generated text, then that is almost literally impossible to provide because no one knows how someone made an edit but that person.
- As far as automated tools, you could do worse than just scraping all articles containing >5 instances (or whatever) of the listed "AI vocabulary" words, and then manually checking those to see what's up. (This is basically what I've been doing, minus the tools.) The elephant in the room, though, is that LLMs are changing right now -- GPT-5.1 came out just 2 weeks ago. We also almost never know which tools people are using, let alone the version or prompt or provided sources. And all that is compounded by the fact that even researchers don't know why AI sounds the way it does. The whole thing is largely a black box, and it's honestly kind of surprising we (as in we-the-public) have figured anything out at all. Gnomingstuff (talk) 00:11, 27 November 2025 (UTC)
- Thanks for your tweak. I haven't had any adverse reaction to this essay yet, so I'll give it until the 24 hour mark and if nobody's raised any major objections I'll put it up for RfC, and providing that passes then we can link to my essay from the NEWLLM page and that'll at least solve one of the RfC close's two problems. Then it'll just be a matter of codifying what we do if something breaches NEWLLM; but people seem to be generally on board with 'send it to AfD' as a solution for that already.
- My fingers are crossed we can move onto RfC for a proposal to expand NEWLLM to include all AI-generated contributions and not just new articles. Athanelar (talk) 00:16, 27 November 2025 (UTC)
- This is redundant to WP:AISIGNS. Perhaps some content can be merged with AISIGNS. —Alalch E. 23:47, 27 November 2025 (UTC)
- Note for everyone subscribed to this discussion; I have raised an RfC at the essay's talk page. Athanelar (talk) 00:20, 28 November 2025 (UTC)
A hypothetical scenario
[edit]Here's a hypothetical scenario to consider. Say you have an editor writing an article. It's a well-written, comprehensive article. They publish their draft and it gets approved at AfC and moved to mainspace. If that editor then says "I used AI to write the first draft of this article", does this guideline require the article be deleted, even though the content is perfectly acceptable? SuperPianoMan9167 (talk) 00:52, 27 November 2025 (UTC)
- Personally I believe that if the article has been comprehensively rewritten and checked line by line for accuracy prior to asking other editors to spend time on it at AfC, the tools used for the initial draft don't matter. -- LWG talk 01:04, 27 November 2025 (UTC)
- To me, "from scratch" implies a lack of rigorous review or corrections from a human editor. I attempted to clarify this in [6], but it got reverted. No reasonable person would require a perfectly-written and verified article to be deleted merely because an early draft was written with software assistance. Anne drew (talk · contribs) 01:05, 27 November 2025 (UTC)
- It's possibly already happened, and certainly has been used for edits. One temporary account recently asked about it at the help desk. I wrote my questions 0 and 1 for this case. Reasons I think are good for disallowing it are: 1) We don't like the 'moral hazard' of letting a part of the process not have human input, and the larger the change without human input and oversight, the greater the potential problem. 2) Openly allowing AI use might cause human reviewers to be overwhelmed. 3) The copyright status of Wikipedia content could be challenged, especially if 'substantive' AI edits are allowed to stand, a concern I think may be decisive for Wikimedia Foundation and ArbCom given the potential for losses. I think a lot of the rest of it is similar to the risks we accept in ordinary editing - bias and errors may propagate for a long time, but we hope that eventually somebody spots the problem. --Edwin Herdman (talk) 02:34, 27 November 2025 (UTC)
- It has absolutely already happened, to the tune of thousands of articles that we know about. And the ones we know about, we know about because there were enough signs in the text to be identifiable as AI. Gnomingstuff (talk) 02:35, 27 November 2025 (UTC)
- @Edwin Herdman, I don't think I understand The copyright status of Wikipedia content could be challenged, especially if 'substantive' AI edits are allowed to stand, a concern I think may be decisive for Wikimedia Foundation and ArbCom given the potential for losses.
- Does this mean that:
- some of Wikipedia's contents will not be eligible for copyright protection? In that case, the WMF isn't going to care (they're willing to host public domain/CC-0 content, though they would prefer that it was properly labeled), and protecting editors' copyrights is none of ArbCom's business. (ArbCom cares about editors' behavior on wiki. They are not a general-purpose governance group.)
- someone might (correctly) claim that they own the copyright for the AI-generated/AI-plagiarized contents of an article? In that case, the WMF will point them to the WP:DMCA process to have the material removed. If the copyright holder wishes to sue someone over this copyvio, they will need to sue the editor who posted it (not the WMF or ArbCom). This is in the foundation:Policy:Terms of Use; look for sentences like "Responsibility — You take responsibility for your edits (since we only host your content)" (emphasis in the original) and "You are responsible for your own actions: You are legally responsible for your edits and contributions" (ditto).
- WhatamIdoing (talk) 05:48, 27 November 2025 (UTC)
- I wrote that badly, but you've clarified the issue. I can't assume Wikipedia will always benefit from the Safe Harbor provision - the DMCA might be amended again or even repealed, or Wikipedia might be found to fail the Safe Harbor criteria. Even without a suit seeking damages, the DMCA process imposes at least some administrative burdens which I would consider worth a rough worst-case scenario estimate. I'll be happy if wrong; AI risks on copyright aren't totally unlike what any editor can do without AI, what's different is mainly spam potential and the changing legal landscape. My final thought is that LLMs don't inherently bring copyright issues - it's possible an LLM with a clear legal status might be developed. --Edwin Herdman (talk) 08:38, 27 November 2025 (UTC)
- Based purely on the plain meaning of 'from scratch,' I would say that if the majority of the article's text is AI generated, then this guideline would suggest that the article should be deleted.
- If a 'first draft' was written with AI and then substantially rewritten by a human, it would essentially be the same as doing it from scratch by the human, so it gets a pass.
- 'From scratch' to me implies you had nothing before, now you have an article. If that article was written with AI, then it falls afoul of this guideline. Athanelar (talk) 15:07, 27 November 2025 (UTC)
- I would argue that there are actually two ways to parse how the “from scratch” guideline applies:
- 1. (as intended) You may not use an LLM to write a wholly new article that does not exist on Wikipedia as of yet.
- 2. You may not write an article by asking an LLM to generate it “from scratch”, i.e. without putting in any information. (Implied: you may use an LLM if you provide it with raw data.)
- In other words, it is entirely possible to read the “from scratch” clause as referring to the LLM generation process, and not the Wikipedia article process. ~2025-36891-99 (talk) 20:09, 27 November 2025 (UTC)
- The answer is: No. To delete an article, it must be done in accordance with the wp:Deletion policy. —Alalch E. 23:37, 27 November 2025 (UTC)
- IMO this misses the point. We don't set policy based on what is possible, but based on the overall impact on the project. For example, I am sure there are users who could constructively edit within WP:PIA from their first edit, but we don't let them, because on average letting inexperienced users edit in that topic area was leading to huge problems. Same logic applies here. We need to set LLM policy based on overall impact to the project. NicheSports (talk) 23:57, 27 November 2025 (UTC)
- We don't let new editors edit in the PIA topic area because ArbCom remedies are binding and cannot be overturned by fiat. This guideline is not like that. Reasonable exceptions should still be allowed. SuperPianoMan9167 (talk) 00:13, 28 November 2025 (UTC)
- I was speaking more generally about how our LLM PAGs should develop in the future. This guideline is far from ideal and clearly is going to change. I don't know the right first step, I just know what I want it to get to. NicheSports (talk) 00:16, 28 November 2025 (UTC)
- Is your ideal LLM guideline something like WP:LLM? SuperPianoMan9167 (talk) 00:20, 28 November 2025 (UTC)
- WP:LLM covers a lot, so there are parts I'd probably agree with, but as it relates to usage of LLMs, no. My ideal policies would be
- LLMs cannot be used to generate article prose or citations, regardless of the amount of review that is subsequently performed, unless the editor is experienced and possesses the llm-user right
- Experienced editors could apply for the llm-user right, with the same requirements as autopatrolled
- Users without the llm-user right could use LLMs for non prose-generating tasks. A few examples of this could be generating tables, doing proofreading, etc. We would need to draft an approved list of uses
- I want to add a G15 criterion for machine-generated articles with multiple material verification failures. This would efficiently handle problematic LLM-generated articles
- Content policy compliant LLM-generated articles would not need to be deleted. Although if they were discovered to be created by a user without the llm-user user right, we would warn the user about not doing so in the future.
- NicheSports (talk) 00:38, 28 November 2025 (UTC)
- So kinda like how AutoWikiBrowser (LLMs, like AWB, could be considered automated editing tools that assist a human editor) requires special approval? SuperPianoMan9167 (talk) 00:41, 28 November 2025 (UTC)
- Yes, but with more restrictive criteria than AWB. I think the autopatrolled requirements are a nice fit (and kind of spiritually related) NicheSports (talk) 00:46, 28 November 2025 (UTC)
- Please drop tables from the list of approved uses: an LLM will produce them, and on face value seems to do it well, but under the hood it's a different story. Maybe there's some version that does it well, or we could put guide rails on it, but GPTs format tables with overlapping column and row spans that are barely human readable. They're great with templates in general, though, if you check they haven't done more than copy and paste. "Put this text in this template following these rules" usually works beautifully, but not tables; the wiki table formatting is just too weird, I guess. ~ Argenti Aertheri(Chat?) 02:10, 28 November 2025 (UTC)
- This is a very nice proposal, reflecting both the current situation (AI is simply as good as most humans on many technical tasks, so banning its use makes no sense) and concerns about a flood of disastrous content generated with AI due to ignorance, greed, or malice. Викидим (talk) 18:24, 2 December 2025 (UTC)
Content self feedback
[edit]I would like to suggest that the concept of a closed-loop system be considered and somehow discussed in the guideline. The LLM nightmare is when other sources pick up half-baked content from AI-generated material, and said sources pick it up again themselves. The feedback can continue and eventually many sources will affirm each other. The term to use then is: jambalaya knowledge. Yesterday, all my dreams... (talk) 16:17, 29 November 2025 (UTC)
- We do have WP:CITOGENESIS which describes this regarding Wikipedia, not quite the same but Wikipedia is a big feeder for AI training sets. Gnomingstuff (talk) 17:06, 29 November 2025 (UTC)
- I did not know about that page, so thank you. The LLM problem is in fact a super turbocharged version of that. Yesterday, all my dreams... (talk) 20:59, 29 November 2025 (UTC)
- We do have a mainspace article on model collapse which is the term for this phenomenon in large language models. It's not really relevant to this guideline specifically, though. Athanelar (talk) 14:29, 30 November 2025 (UTC)
Nutshell
[edit]@Novem Linguae: Nothing personal, but I challenge your assertion that this page is too short to have a nutshell. Having a modicum of humor helps keep this project from drowning in bureaucracy. — Hex • talk 14:38, 30 November 2025 (UTC)
Discussion at Wikipedia:Village pump (policy) § RfC: Replace text of Wikipedia:Writing articles with large language models
[edit]
You are invited to join the discussion at Wikipedia:Village pump (policy) § RfC: Replace text of Wikipedia:Writing articles with large language models. –Novem Linguae (talk) 23:40, 5 December 2025 (UTC)