AI comes in all shapes and sizes. Some are subservient, like WALL-E or R2-D2. Others, like Skynet, are more dominating (if not annihilating). But what about the middle ground? Can AI be more than our servants, yet less than our masters? Can AI collaborate with humans? ISI researchers Andres Abeliuk, Daniel Benjamin, and Fred Morstatter have been studying the converse - and arguably more interesting - question: can humans collaborate with AI?

Andres brings up the example of self-driving cars: "The dialogue around them acts as if it is an all-or-nothing proposition. But we've slowly been acclimated to automation in cars for years with automatic transmissions, cruise control, anti-lock brakes, etc." This more realistic scenario of collaborative AI is at the center of their newly published Nature article, "Quantifying machine influence over human forecasters." They set about answering some important questions: Do humans trust AI assistants? If so, when? If not, why not? Do human-AI collaborations fare better than AI or human intuition alone?

To measure trust in a human-machine relationship, we need lots of humans willing to collaborate with AI in a controlled environment. That is exactly what the ISI team built with SAGE, short for Synergistic Anticipation of Geopolitical Events: a platform where laypeople collaborate with AI tools to predict the future. We reported on the success of SAGE earlier this year, when non-experts accurately predicted last April that North Korea would launch its missile test before July, which it did.

But how does SAGE help in studying this more general idea of human-AI collaboration? Daniel answers: "SAGE aims to develop a system that leverages human and machine capabilities to improve upon the accuracy of either type on its own. This Hybrid Forecasting Competition (HFC) provided a unique setting to study how people interact with computer models. Other studies typically involve one-off or short-term participation. The HFC recruited participants to provide forecasts for many months. These users participated week in and week out on questions that were open for weeks."
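To make that idea concrete, here is a minimal sketch of one simple way a system could blend a crowd of human forecasts with a model's prediction. To be clear, this is an illustrative assumption, not SAGE's published aggregation method; the function name and the fixed model_weight are hypothetical:

```python
import statistics

def hybrid_forecast(human_probs, model_prob, model_weight=0.3):
    """Blend human probability forecasts with one statistical-model
    forecast via a weighted average. `model_weight` is a hypothetical
    tuning knob, not a value taken from the SAGE system."""
    crowd = statistics.mean(human_probs)  # simple unweighted crowd average
    return (1 - model_weight) * crowd + model_weight * model_prob

# Five forecasters on "Will event X happen by date Y?", plus a model at 0.4:
print(hybrid_forecast([0.6, 0.7, 0.55, 0.8, 0.65], model_prob=0.4))  # ~0.58
```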

Hundreds of participants of varying demographics voluntarily signed up to answer predictive questions such as: How many earthquakes of magnitude 5 or stronger will occur worldwide in a given month? What will be the daily closing price of gold on a given date? Some participants were shown AI predictions while others were not, and those who saw them were free to choose whether or not to rely on them.

So what's the verdict? Do human-AI collaborations beat humans alone? Yes, they do - and this hybrid team also beats an AI working alone! Fred explains: "At the start of the HFC, some of our teammates thought it was a foregone conclusion that the machine models would outperform the human forecasters - a hypothesis proven false."
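How is forecast accuracy measured? Forecasting tournaments like the HFC typically score probability forecasts with the Brier score: the mean squared error between the predicted probability and what actually happened, where lower is better. Here is a minimal illustration; the numbers are made up for demonstration, not results from the study, and the HFC's exact scoring rules may differ:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary
    outcomes (0 = did not happen, 1 = happened). Lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Toy numbers, purely for illustration -- not data from the HFC.
outcomes     = [1,   0,   1,   1,   0  ]
human_only   = [0.7, 0.4, 0.6, 0.5, 0.3]
machine_only = [0.6, 0.2, 0.5, 0.6, 0.4]
hybrid       = [0.8, 0.2, 0.7, 0.6, 0.3]

for name, fc in [("human", human_only), ("machine", machine_only), ("hybrid", hybrid)]:
    print(f"{name:8s} Brier = {brier_score(fc, outcomes):.3f}")
# In this toy example the hybrid forecasts score best (lowest).
```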

It turns out that the ISI team was in for quite a few surprises. "Our key finding was that users consulted the statistical models more rarely than we anticipated, in a pattern that resembled how people use human advice. We expected many instances where forecasters over-relied on the models. Instead, we found people over-relied on their personal information. Forecasters readily dismissed the model prediction when it disagreed with their pre-existing beliefs (known as confirmation bias)."
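One standard way to quantify how much a forecaster leans on advice - human or machine - is the "weight of advice" measure from the judge-advisor literature: how far the forecaster moves from their initial estimate toward the advisor's. The sketch below illustrates the idea; the paper's actual influence measure may be more sophisticated:

```python
def weight_of_advice(initial, final, advice):
    """Weight of Advice (WOA): 0 means the forecaster ignored the
    advice entirely, 1 means they adopted it fully. Undefined when
    the advice matches the initial estimate."""
    if advice == initial:
        return None
    return (final - initial) / (advice - initial)

# A forecaster at 0.70 sees a model prediction of 0.40 and only
# nudges their forecast to 0.64 -- a weight of just 0.2:
print(weight_of_advice(initial=0.70, final=0.64, advice=0.40))  # 0.2
```

A pattern of near-zero weights whenever the model disagrees with a forecaster's prior belief is exactly the confirmation-bias signature described above.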

This has huge implications: trust must be considered from the start when building AI applications. "It helps point to the importance of considering how tools will be used. It is not enough to design a tool that succeeds at a task if the tool is not used well."

Notice that despite evidence that listening to the AI helped overall, people could not optimally heed its suggestions. "Overall, the addition of statistical models into a forecasting system did improve accuracy. However, it should not be a foregone conclusion that humans will use the tools well, or at all. To optimize a human-computer system, trust in machines must be earned. Trust in machines, much like trust in other humans, is easily lost."

The implications are not merely for the engineers who build such AI tools, but also for the users who rely on them. "The average person should learn to be more deliberate in how they interact with new technology. The better forecasters in our study were able to determine when to trust the model and when to trust their own research. The average forecaster was not."

J.A.R.V.I.S.: Sir, please may I request just a few hours to calibrate-
Tony Stark: Nope! Micro-repeater implanting sequence complete!
J.A.R.V.I.S.: As you wish, sir. I've also prepared a safety briefing for you to entirely ignore.
Tony Stark: Which I will.

