If you are not aware of the hungry judge issue, look it up, and see if you can come back and repeat what you wrote with a straight face.
I am well aware that people often exhibit less than commendable behaviour.
ML systems work by copying existing, often suboptimal, behaviour.
You cannot instruct an ML to "do the same as is always done, except in special case X do Y", because X and Y are not stored anywhere you can point at; they are smeared across millions of trimpots.
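To make that concrete, here is a toy sketch of my own (Python plus numpy, no relation to any production system): train the smallest network that can do XOR, then try to "edit" it one trimpot at a time. There is no single weight whose adjustment cleanly changes one case and leaves the rest alone.

# Toy illustration: the learned behaviour is smeared across all the
# parameters, even in a network this small.
import numpy as np

rng = np.random.default_rng(0)

# XOR: the simplest function a single linear unit cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer, sigmoid activations, plain gradient descent.
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backpropagate squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print("trained outputs:", out.ravel().round(2))

# Now "edit" the network: zero one weight at a time and see what happens.
# No weight's removal cleanly deletes one case and leaves the rest intact;
# the rule lives in all of them at once.
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        saved = W1[i, j]
        W1[i, j] = 0.0
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        print(f"zeroing W1[{i},{j}]: outputs {out.ravel().round(2)}")
        W1[i, j] = saved

Scale that up by eight or nine orders of magnitude and "just patch the special case" stops being a meaningful instruction.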
The hungry judge study is really interesting. Judges adjudicating parole hearings were monitored for a wide range of things that might influence their decision making. The one factor that stood head and shoulders above all others was how long it had been since the judge last ate. I think the researchers were looking beyond actual case factors, for issues like how fresh or tired the judge was, but they finally realised the big deal was stomach contents. So, you could ask the judge how they came to their decision, and they could give you a nice balanced view of the factors they took into account, yet they would miss the tank in the night and day... I mean, just how sated they were at the time.
Back in the 80s there were many similar examples found while eliciting knowledge from medical experts. There were many reasons, e.g. experts not realising all the clues they were using when making rules, forgetting edge cases, and symptoms with causes that might lie inside or outside the medical sub-domain being investigated.
Fundamentally, knowledge elicitation is an art, an imperfect art.
I don't know what mechanisms they use, but when current AI systems don't give the results people want, whether being inaccurate or being politically unacceptable, their operators seem to be able to cook the system very quickly to heavily nudge the outcomes. We saw this with Google's recent mess, where every famous historical white figure was rendered as some weird slightly Asian, slightly African, slightly Middle Eastern, slightly Native American unisex figure. They responded to that quickly, if incompletely.
"Responded" is, of course, insufficient. There were famous problems with gorillas being included in the results; oops. Updates were confidently issued - and new failures were almost instantly discovered. Rinse and repeat several times.
That is exactly what you would expect from suck-it-and-see "fixes", where nobody can know what the changes really do, nor what results will spring from them.
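For flavour, a purely speculative sketch (mine; nobody outside Google knows the real mechanism, and every name below is invented) of the cheapest kind of "fix": suppress the embarrassing output instead of correcting the model. The one visible failure disappears; the behaviour that caused it does not, so the next failure arrives as a surprise.

# Hypothetical output-level patch; model_label stands in for an opaque
# classifier we cannot inspect or retrain.
BLOCKLIST = {"gorilla", "chimpanzee"}  # hypothetical suppressed labels

def model_label(image_id: str) -> str:
    # Fake outputs standing in for the real model's predictions.
    fake_outputs = {"img1": "cat", "img2": "gorilla", "img3": "dog"}
    return fake_outputs.get(image_id, "unknown")

def patched_label(image_id: str) -> str:
    label = model_label(image_id)
    # The "fix": hide the label rather than correct the model.
    return "unknown" if label in BLOCKLIST else label

for img in ("img1", "img2", "img3"):
    print(img, "->", patched_label(img))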
On the other hand, it's several years since Google was ridiculed for searches like "white couples" producing only results for couples who were not white. They fixed that for the obvious cases, but I still see searches like "white plastic sheeting" giving me results for people selling every kind of plastic except white. Google's search engine is really badly broken, and we don't really know what it consists of in 2024. Google have made a lot of TPUs, though, so I guess there is a big ML aspect to their current system.
If you can't predict how something will behave, then any time it gives the expected result, that has to be chance.
I noticed some videos responding to a recent paper from Apple about GSM8K (a grade-school maths benchmark) results from current AI systems. These systems have been achieving high marks, but Apple showed the systems have no analytical ability. If you take the standard GSM8K questions and modify them a little, major things happen to the accuracy of the results. Simply changing the names in questions of the "Jane does X and Fred does Y" type substantially reduces the accuracy of all the AI results. Presumably these questions, in some form, plus lots of commentary about the GSM8K tests, are among the training data.

When the researchers changed questions to add some irrelevant detail, which even a low-ability human would easily ignore, the AIs took that detail into account and produced strange results. So, they found no sense of any understanding, merely pattern matching. The researchers seemed shocked to find this. An elaborate pattern matching machine pattern matches quite well, but isn't sophisticated enough to fully generalise the patterns? Whoda thought?
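To show the flavour of the perturbation, here is my own toy version (not Apple's actual test harness): take a GSM-style template, swap the names and numbers, and optionally splice in an irrelevant clause. A solver that actually understood the arithmetic would be unaffected by any of it.

# Toy GSM-style question perturbation; the template, names and "no-op"
# clauses below are all invented for illustration.
import random

TEMPLATE = ("{name1} picks {n1} apples. {name2} picks {n2} more apples "
            "than {name1}.{noop} How many apples do they have together?")

NAMES = ["Jane", "Fred", "Priya", "Carlos", "Mei", "Olu"]
# "No-op" clauses: true-sounding but irrelevant to the arithmetic.
NOOPS = ["", " Five of the apples are slightly smaller than the rest.",
         " It was a cold Tuesday in October."]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return a perturbed question and its correct answer."""
    name1, name2 = rng.sample(NAMES, 2)
    n1 = rng.randint(2, 50)
    n2 = rng.randint(2, 50)
    question = TEMPLATE.format(name1=name1, name2=name2,
                               n1=n1, n2=n2, noop=rng.choice(NOOPS))
    answer = n1 + (n1 + n2)  # name1's apples plus name2's (n1 + n2)
    return question, answer

rng = random.Random(42)
for _ in range(3):
    q, a = make_variant(rng)
    print(q, "->", a)

If accuracy collapses when only the names or the padding change, the system was matching the surface form of the question, not doing the sum.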
Do read comp.risks; it is one of the very few sources I recommend unreservedly to everybody.
http://catless.ncl.ac.uk/risks
Since it comes out infrequently (<1/week), I use the RSS feed
http://catless.ncl.ac.uk/risksrss2.xml
to catch the new infelicities.
So, I guess I have to apologise to tggzzz. I thought people had got past debacles like the "find the tanks among the trees" and similar messes of the 1980s. I guess a new generation of researchers and developers need to learn the old lessons again.
Unfortunately that is the case.
People don't change. Young people continue to think their forebears know nothing. The far-too-old quote: "When I was 14 I thought my father was a fool. When I was 21 I was amazed at how much he had learned in the past 7 years".
My simple attitude:
- technology X has fundamental limitation Y
- later on, technology X' is promoted as being better
- ask how it avoids Y
- salesman will normally assert X' doesn't have Y (for software, typically respond with "oh excellent, the Byzantine Generals' problem / split-brain problem / etc. has been solved at last; where can I see the published results?")
- ask how and why not
- the lack of an answer almost always reveals that X->Y has been ignored or forgotten
Works with hardware, works with programming languages, software products, software environments; works with politics, ...