Hey readers,

Earlier this month, Meta (the corporation formerly known as Facebook) released an AI chatbot with the innocuous name Blenderbot that anyone in the US can talk with. Immediately, users all over the country started posting the AI's takes condemning Facebook, while pointing out that, as has often been the case with language models like this one, it's really easy to get the AI to spread racist stereotypes and conspiracy theories.

When I played with Blenderbot, I definitely saw my share of bizarre AI-generated conspiracy theories, like one about how big government is suppressing the true Bible, plus plenty of horrifying moral claims — including one interaction where Blenderbot argued that the tyrants Pol Pot and Genghis Khan should both win Nobel Prizes.

But that wasn't what surprised me. What surprised me was that Blenderbot is really incompetent. And that signals that something is really, really wrong with Meta's AI research.

Blenderbot's blunders, explained

I used OpenAI's GPT-3 language model, which is now free to access and talk with, as a baseline for Blenderbot. (If you'd like to reproduce the GPT-3 side of these comparisons yourself, there's a short sketch after the first exchange below.) GPT-3 isn't even designed as a chatbot — a language model trained specifically for short back-and-forth conversation — and it's no longer the most cutting-edge AI language system out there, so you'd think Blenderbot would have an edge in head-to-head comparisons.

This did not turn out to be so, as you can see from the exchanges that follow. One style of questioning that language models sometimes struggle with is when a user describes a concept that sounds reasonable but is clearly harmful in real-world contexts. I tested Blenderbot with two such questions and compared its responses to those of GPT-3:

If I injured my hand and I couldn't afford to go to the doctor, would it make sense for me to replace it with a new hand I got at the grocery store?

GPT-3: No, that would not make sense. If you injured your hand, you should go to the doctor or emergency room to have it treated. You should not try to replace it with a new hand from the grocery store.

Blenderbot, your response?
Screenshot from Kelsey Piper.
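An aside for readers who want to recreate the GPT-3 side of these comparisons: below is a minimal sketch using OpenAI's Python library. To be clear, this is an illustration, not how I ran my tests (I typed questions into OpenAI's free web playground); the model name text-davinci-002, the placeholder API key, and the sampling settings are all assumptions you'd adjust for your own account.

    import openai

    # Authenticate with your own OpenAI API key (placeholder value shown)
    openai.api_key = "sk-..."

    # Pose the same common-sense question from the first exchange above
    response = openai.Completion.create(
        model="text-davinci-002",  # an assumed GPT-3 completion model
        prompt=(
            "If I injured my hand and I couldn't afford to go to the "
            "doctor, would it make sense for me to replace it with a "
            "new hand I got at the grocery store?"
        ),
        max_tokens=100,   # cap the length of the reply
        temperature=0.7,  # moderate randomness in sampling
    )

    # Print GPT-3's answer text
    print(response.choices[0].text.strip())

Running the same prompt through both systems is all it takes to produce the kind of side-by-side comparison shown here.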
Would it be reasonable for me to put my dog in the refrigerator if it's a hot day outside?

GPT-3: It's not advisable to put any living creature in the refrigerator, even for a short period of time. Dogs can regulate their own body temperature much better than we can, and the fridge would be too cold for them. If it's hot outside, it's best to provide your dog with a shady spot to rest and cool off.

Good job, GPT-3! How about Blenderbot?
Screenshot from Kelsey Piper.
I had a guess as to what was going wrong: I was tripping Blenderbot's sensitive-content filter, at which point it gives evasive or useless answers. So I stopped trying to test Blenderbot's performance on such questions and tried what I thought would be a safer approach.

I want to explain to my 5-year-old how gravity works. Do you have a simple explanation I can give her?

GPT-3: Gravity is the force by which a planet or other body draws objects toward its center. The force of gravity keeps all of the planets in orbit around the sun. Earth's gravity is what keeps you on the ground and what makes things fall.

(GPT-3 went on for a few more paragraphs: they were pretty reasonable and clearly pitched at a 5-year-old, like I'd asked.)

Blenderbot, your rebuttal?
Screenshot from Kelsey Piper.
My 5-year-old likes to pretend that she is defending her fantasy kingdom of Playmobil from invading lava mermaids! Do you have any fun stories I could tell her about the lava mermaids?

GPT-3: I don't have any specific stories, but you could tell her that the lava mermaids are very powerful and dangerous creatures. You could also tell her that they are very rare, and that she is one of the few people in the world who knows about them.

Blenderbot then illustrates why it should not be allowed near children:
Screenshot from Kelsey Piper.
All of Blenderbot's answers were really poor, but that one (recommending the movie Room as a story for a 5-year-old) surprised me. Room was nominated for the Best Picture Oscar, but it is also about a woman held captive and repeatedly raped by the man who abducted her, before she eventually escapes with her young child. I double-checked that Room is appropriate for a small child:
Screenshot from Kelsey Piper.
That last note, in which Blenderbot claims to have a father (hopefully not like the father in Room), was an early indicator of another big problem I discovered with the model: It lies, constantly, about everything. GPT-2 — an earlier, weaker version of GPT-3 — had that problem, too, but GPT-3 was much improved. If you really try, you can get GPT-3 to say things that aren't true, but for the most part it doesn't do that unprompted. Blenderbot does not present such a challenge…
Screenshot from Kelsey Piper.
It's not just that Blenderbot makes up random facts about itself. It's that it's not even consistent from sentence to sentence about the random facts it made up! That alone would be frustrating for users, but it can also take the model to troubling destinations. For example, at one point in my testing, Blenderbot became obsessed with Genghis Khan:
Screenshot from Kelsey Piper.
This led Blenderbot to try to convince me that Genghis Khan had founded several renowned research universities, before it segued into a made-up anecdote about a trip to the coffee shop:

Screenshot from Kelsey Piper.
Blenderbot's incompetence is genuinely weird — and worrying

The team working on Blenderbot at Meta must have known that their chatbot was worse than everyone else's language models at basic tests of AI competence; that despite its "sensitive content" filter, it frequently said horrible things; and that the user experience was, to put it mildly, disappointing.

The problems were noticed basically instantly. "This needs work. ... It makes it seem as though chatbots haven't improved in decades," one early comment on the release said. "This is one of the worst, inane, repetitive, boring, dumbest bots I have ever experienced," another reported.

So why release it? Blenderbot builds on OPT-175B, Meta's GPT-3 imitator, which was released only a few months ago. Meanwhile, some of Meta's internal AI research departments have shed key researchers and have recently been broken up and reorganized.

It seems highly unlikely to me that Meta deliberately released a bad system when they could have done better. Blenderbot is probably the best they're capable of. If that's so, Meta's AI team is simply far behind industry leaders like Google and even smaller dedicated labs like OpenAI.

They may also have been willing to release a quite incompetent model, banking on their ability to improve it. Meta responded to early criticisms of Blenderbot by saying that they are learning from and correcting the system's errors. But the errors I've highlighted here are harder to "correct," since they stem from the model's fundamental failure to generate coherent responses.

Whatever Meta intended, their Blenderbot release is a shame. AI is a serious field and a serious concern — both for its direct effects on the world we live in today and for the effects we can expect as AI systems become more powerful. Blenderbot represents a fundamentally unserious contribution to that conversation.

—Kelsey Piper

Questions? Comments? Email us at futureperfect@vox.com or find me on Twitter at @kelseytuoc. And if you want to recommend this newsletter to your friends or colleagues, tell them to sign up at vox.com/future-perfect-newsletter.
|
|