Never try to get ChatGPT to do something original. It simply can't.
You say that like the human creative process doesn't involve large amounts of copying, mashing up, and minor variations.
LLMs assemble plausible sequences of words into grammatically correct sentences. "Plausible" is some vague form of "average" of the words that have previously been found in similar relationship to each other and the question.
It's funny that you use a negative tone to describe these systems... yet hedge it in such a way that precisely the same definition applies to all human speech.
You allude to this in the case of certain groups, but that seems rather unfair when the net you've cast is far more inclusive.
The real insight seems to be a matter of degree, for which we'd need some way to measure bullshit; suppose such a method exists. Then we find a different truth: humans in general bullshit when it's expedient (or indeed necessary for rhetorical purposes, as is often the case for the highlighted groups, but also more broadly; relatedly, narcissism finds great value in gaslighting, and by extension so do the domains where narcissists feature prominently), while LLMs do it less and less as time goes on -- or at least, less egregiously.
A different perspective I've had: compare a proper PC to an embedded CPU of comparable processing power, but with far fewer peripherals and tight memory constraints -- worse still, one that lacks hardware virtualization / memory mapping. There are plenty of tasks at which they're comparable, but anything that needs enough memory will at the very least be much harder to compose (e.g. more code, and execution time, to page in external storage, rather than everything being memory-mapped to begin with), and at some point will have to turn into a VM (or several!) at considerable cost in execution speed -- or, if you strip down a lot of the code, at a compromise in feature set or accuracy (or both). Suppose we have an application that must compute some function within a given maximum time; to the extent that the computation can be approximated, and for a sufficient degree of restriction, the limited system must fall back on approximation. In terms of linguistic output, we might call that bullshitting: an approximation that may or may not be faithful to the underlying information (which would be available, given enough time or access).
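To make the deadline idea concrete, here's a toy sketch (entirely illustrative; the budgets are made up and nothing here is specific to LLMs): estimate pi with a slowly converging series, stop whenever the time budget runs out, and compare the result to the true value.

    import math
    import time

    def estimate_pi(deadline_s: float) -> float:
        """Leibniz series for pi, truncated when the time budget runs out.

        The tighter the deadline (the more "restricted" the system),
        the cruder the approximation it is forced to return.
        """
        start = time.monotonic()
        total = 0.0
        k = 0
        while time.monotonic() - start < deadline_s:
            total += (-1) ** k / (2 * k + 1)
            k += 1
        return 4 * total

    for budget in (0.0001, 0.001, 0.01):
        approx = estimate_pi(budget)
        print(f"budget={budget}s  pi~{approx:.6f}  error={abs(approx - math.pi):.2e}")

The answer always arrives on time; what varies with the budget is how wrong it is.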
Comparing the average human's vocabulary to an LLM's, the LLM clearly wins, having been trained on basically every language (programming and markup included). It's "wider" at this level. Maybe not wider than ultimate human ability, but the average human is barely trained in one or two languages (depending on how strictly or technically one defines "well trained"). Perhaps, to the extent we can map vocabulary onto processing power, this is the case of the embedded system being faster (e.g. compare a typical STM32F4 to a 486 or early Pentium).
Comparing the average human to an LLM on coherence, semantics, and synthesis, I would argue LLMs are already superior on several fronts. A wide variety of simple, shallow, space-filling, or formatting tasks (e.g. writing to a standard news format) are essentially solved problems, at least to the mediocre quality of copy that passes for the average case in industry.
The present limitation on buffer (context) length, and on depth of training at that length (I would assume?), limits semantics in much the same way that low memory limits the performance of a computer: without external resources, or the time to access them, or under some similar restriction, some approximation is required, and hence error -- in this case unexpected, inconsistent, or incoherent text. The progression of LLM development makes it seem clear that this is the hardest limitation, and it's the one analogous to memory constraints.
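A crude way to picture the buffer-length problem (the window size and the toy "conversation" below are invented for illustration): whatever falls outside the window simply isn't available to condition on, so anything the system says about it can only be a guess.

    def fit_to_window(tokens: list[str], window: int) -> list[str]:
        # A fixed-size context keeps only the most recent `window` tokens.
        return tokens[-window:]

    conversation = (
        "the meeting is on Tuesday at noon . "            # the detail that matters...
        + "then a very long digression follows . " * 40   # ...buried under filler
    ).split()

    visible = fit_to_window(conversation, window=64)
    print("Tuesday" in visible)  # False: the fact has scrolled out of view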
And, not that it's an insurmountable problem: you could probe an LLM repeatedly and stitch together a more complete and knowledgeable overall response (see the sketch below), but that takes more effort and quickly stops being worthwhile, just as running a VM does versus getting a better CPU/RAM/system overall. Conversely, we could liken the coherence length to various human conditions, pathological or otherwise: brains with amnesia, for one, but even simple lack of attention (literally, as in ADHD) might produce similar experiences. And there too, one can augment the brain with external tools (e.g. keeping notes to retain long-term information), to varying degrees of success.
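By way of illustration, the stitching approach looks something like this (ask_llm is a stand-in for whatever API you'd actually call; the chunk size and prompts are invented):

    def ask_llm(prompt: str) -> str:
        # Stand-in for a real model call (e.g. an HTTP request to a hosted API).
        raise NotImplementedError("wire this up to your model of choice")

    def chunks(text: str, size: int = 2000) -> list[str]:
        # Split a long document into pieces that each fit the model's window.
        return [text[i:i + size] for i in range(0, len(text), size)]

    def stitched_summary(document: str) -> str:
        # Map-reduce style: summarize each piece separately, then merge the
        # partial answers (the "keeping notes" analogy, with the same caveat
        # that detail can be lost at every hop).
        partials = [ask_llm("Summarize this passage:\n" + piece)
                    for piece in chunks(document)]
        return ask_llm("Combine these partial summaries into one coherent summary:\n"
                       + "\n".join(partials))

It works, but every extra hop costs calls, latency, and fidelity, which is roughly the VM-versus-better-hardware trade-off again.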
I suspect the amount of memory being used for LLMs, let alone the processing power, is already more than enough for the task; the problem is that we don't know how to solve it algorithmically, so we're brute-forcing tensors and training data into doing a merely poor job of it. And that doesn't scale well; or at least, AFAIK a model can't be directly sized up but has to be fully retrained (not that that matters much: if you're increasing model size/scope by an order of magnitude, you're hardly saving much training effort anyway).
That, and the lack of ways to integrate semantic information -- and, on the output side, to interface with the wider world (with good reason, but they're already being put into places they shouldn't be, and people will simply try more as time goes on...).
Tim