Yep, LLMs are most definitely not perfect and make mistakes (pretty much every AI chatbot warns about that on its first page).
But as imperfect as they are, they still perform certain tasks better than any prior non-AI solution. A great example is translation between languages. We have had computer programs that translate text into other languages for decades, but anyone who has used them knows they all produce sentences that sound broken, or translate outright wrong because a word looks similar. Yet even the first ChatGPT, running on GPT-3.5, produced translations that seem as good as a human translator's. Not only that, it produced good translations into niche languages (Slovenian, for example, is spoken by only 2 million people and is a difficult language, yet ChatGPT has no issues with it, despite it not even being an officially supported language). And this comes from a general-purpose chatbot that was not specifically trained to do translation; it just happened to have a mishmash of languages in its training data and figured out the connections.
What these AIs excel at is interpreting fuzzy input. Classically programmed computers work great at interpreting very rigid languages; that is what programming languages are, where everything follows a strict set of well-defined rules. Human languages, which have organically evolved over thousands of years, are very messy, which is why translating human-language text has always been so difficult for computers to get right.
When it comes to things like summarizing large piles of human text, or extracting a certain bit of information from them, we never really had a way of doing it with classical programming that follows strict rules. So even a poorly performing LLM still performs far better than nothing at all. Similarly, classical computer vision algorithms only really work in ideal conditions: in an industrial machine, the camera is rigidly mounted to look exactly straight onto the scene, the target is the same object every time, it is perfectly illuminated, etc. If you are looking for traffic signs out of a car windshield, you won't have those ideal conditions, so the classical machine vision approaches perform even worse: tuned strict, they completely miss most signs; tuned loose, they misidentify them. It is a very difficult task to solve.
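To make that strict-vs-loose tradeoff concrete, here is a minimal sketch of classical template matching with OpenCV. The file names and the 0.8 threshold are made up for illustration; a real sign detector would need far more (scale and rotation handling, colour, lighting normalization):

```python
# Sketch of classical template matching (OpenCV), illustrating the
# strict-vs-loose threshold tradeoff. File names are hypothetical.
import cv2
import numpy as np

scene = cv2.imread("dashcam_frame.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("speed_limit_sign.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation: 1.0 would be a pixel-perfect match.
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)

THRESHOLD = 0.8  # strict -> misses tilted/shadowed signs; loose -> false hits
ys, xs = np.where(scores >= THRESHOLD)
for x, y in zip(xs, ys):
    print(f"possible sign at ({x}, {y}), score={scores[y, x]:.2f}")
```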
I dislike the unpredictable nature of large AI models just as much. There is no way to 100% guarantee a model won't generate the wrong output; you can only put more 9s on the 99.9...% through more testing. If you can solve the problem with classical programming, then you should NOT involve AI at all. But for tasks that can't be solved that way, something that works 99.9% of the time is still far better than something that does not work at all, as long as you are aware that it will make a mistake at some point and have something in place that can handle the mistake.
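To put rough numbers on those 9s (the rates here are illustrative, not measured): even a 99.9% per-output success rate all but guarantees mistakes once a system produces thousands of outputs, which is why you still need something downstream to handle the failures.

```python
# Back-of-the-envelope: probability of at least one wrong output over
# n outputs, given an illustrative per-output success rate p.
for p in (0.99, 0.999, 0.9999):
    for n in (100, 10_000):
        print(f"p={p}, n={n}: P(>=1 mistake) = {1 - p**n:.4f}")
```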
There's some validity in that; e.g. translating a restaurant menu is very convenient, and I expect LLMs would avoid the historic machine-translation blunders "out of sight, out of mind" -> "invisible idiot" and "the spirit is willing but the flesh is weak" -> "the vodka is strong but the meat is rotten". It is also possible to validate (to some extent) a translation by getting the machine (preferably a different LLM) to do the reverse translation.
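A minimal sketch of that round-trip check, assuming a hypothetical translate() wrapper around whichever LLM(s) you use (the function, the language codes, and the 0.7 cut-off are placeholders, not a real API):

```python
# Sketch of round-trip (back-translation) validation. translate() is a
# hypothetical wrapper around an LLM call; ideally the reverse step
# uses a different model than the forward step.
from difflib import SequenceMatcher

def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("call your LLM of choice here")

def round_trip_ok(text: str, target: str, min_ratio: float = 0.7) -> bool:
    forward = translate(text, source="en", target=target)
    back = translate(forward, source=target, target="en")
    # Crude textual similarity between original and back-translation;
    # a low score flags the translation for human review.
    return SequenceMatcher(None, text.lower(), back.lower()).ratio() >= min_ratio
```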
However, any theoretical benefits can be overwhelmed by the way LLM systems are used out in the wild...
You have no way of knowing whether retraining to fix one new test input will recalculate the weights so that previously acceptable outputs become dangerous. That is inherent in the way LLMs work, and cannot be avoided. "You can't test quality into a product".
LLMs aren't good at digesting large amounts of fuzzy input either:
https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive

The "99% is better than nothing" argument is valid in some circumstances, but not in others. The 99% falls into an "uncanny valley": the user has to be aware that the output must be corrected/ignored 1% of the time, but probably isn't able to judge which 1%. Add to that the lazy/unscrupulous user (of which there are many) and it is a recipe for problems.
Whether or not detecting speed limit signs is possible with an LLM or a conventional system is a red herring. No manufacturer, unscrupulous or otherwise, has claimed a non-LLM system could do that; LLM manufacturers, on the other hand, do. A ray of hope: US transportation authorities are finally catching onto that, and have started demanding manufacturers justify their claims; I wonder how that will pan out.
LLMs are being used for life-changing applications without an understanding of how they reach safe/dangerous conclusions. In some cases the decision making is even hidden behind "commercial confidentiality" clauses; in others, users actively want not to think about the conclusions; in others, legal liability for their illegal and false outputs is denied:
https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know
The medical problems I've previously mentioned.
The "lock them up and let God sort them out" applications that I've previously mentioned.
Any of the yoootoob vids showing Tesla cars trying to drive on the wrong side of the road, or drive down railway tracks, etc, etc.
Cruise and Waymo in the US.
Etc.