User:Festucalex/Don't use LLMs as search engines

You should never use LLMs such as ChatGPT, Gemini, Claude, or DeepSeek to research topics and find sources. AI models positively suck at being search engines. There are six major problems with using them as such.

The problems

Problem 1: Knowledge cutoff

LLMs are trained on data created up to a certain cutoff date. Retraining a model on newer data is extremely expensive and time-consuming, so these models are guaranteed to lag behind current events and recent sources.

Problem 2: Hallucination

Hallucination is intrinsic to the very way LLMs are designed: they will frequently make up sources, arguments, and conclusions out of thin air. It takes more time to check the veracity of the output by hand than to just, you know, do your research properly from the start. Thus, you may eventually be tempted to skip checking altogether, and that bodes ill for the encyclopedia.

Problem 3: Incompleteness

A meticulous human researcher develops the ability, nay, the intuition, to find relevant information and sniff out the unknown unknowns of a topic. LLMs do not do that. They will do the bare minimum, and when pushed, will start hallucinating. Thus, LLMs will not provide you with complete research, because they lack the ability to spot obscure threads and make the lateral connections that drive the direction of research. The result is glaring gaps that can only be spotted by someone already familiar with the topic.

Problem 4: Training data quality

LLM training data is indiscriminate, obtained by AI companies through aggressive scraping of any and all available sources, regardless of quality. This introduces biases into the data in a way that a human would usually be better able to avoid (or at least detect). An LLM will not be able to tell you why it took a certain path or why it highlighted one source instead of another. It's like drinking a mysterious stew at a witch's cottage in the jungle: who knows what went into it. As a researcher, your job is not just to identify which sources to include, but which sources to exclude, whether due to their low quality, their obsolescence, or simply to save time and space. As of the time of writing (and probably for a long, long time), this is best left to the human mind. Normal search engines like Google and DuckDuckGo will also throw up results indiscriminately, but a human researcher can recognize that and keep looking, while an LLM won't.

Problem 5: Improper synthesis

LLMs will attempt to combine the data they have into new formulations and conclusions not stated by the original sources, and these will be difficult to trace. On Wikipedia, this is expressly forbidden, even for human editors. LLMs don't "understand" this; they will jump to conclusions based on an unstated (and unstatable) set of assumptions and correlations, combining data in opaque ways that will, in turn, shape your own thinking and lead you astray.

Problem 6: Atrophy

This is the most moralizing of these problems, but I'll bloviate about it anyway. Using an LLM for this (and anything else, really) is bound to rob you of training time. You can only develop skills by doing things, and research is no exception. It's like paying someone else at the gym to do the lifting for you: you're not getting ripped any time soon, bro. When you do research for a Wikipedia article, you learn how to look for information, judge source quality, coax obscure sources out of link-rotting websites, and so on. If you're an aspiring academic, this will make it easier for you when you begin to write and publish papers. If you're not, it will help you get to the truth the next time someone on TV tells you to eat magic beans for weight loss. Quite a useful skill for these trying times.

Conclusion

LLMs are not search engines and should not be used as if they were. They can introduce almost invisible faults into your writing, amplify human biases, and misrepresent the totality of knowledge on any topic. Research and source-finding are acquired skills that Wikipedia is uniquely positioned to help you develop, and both you and the encyclopedia will be better off for it.

See also

  • Chatbot psychosis, wherein LLMs generate psychological feedback loops that lead to aberrant behavior. While not directly relevant, this is an example of how LLMs can change your behavior for the worse.