Artificial intelligence seems to have become ubiquitous in the technology industry. AIs, we’re told, are replying to our emails on Gmail, learning how to drive our cars, and sorting our holiday photos. Mark Zuckerberg is even building one to help out around the house. The problem is that the concept of "artificial intelligence" is way too potent for its own good, conjuring images of supercomputers that operate spaceships, rather than particularly clever spam filters. The next thing you know, people are worrying about exactly how and when AI is going to doom humanity.
Tech companies have partly encouraged this conflation of artificial intelligence and sci-fi AI (especially with their anthropomorphic digital assistants), but it’s not useful when it comes to understanding what our computers are doing that's new and exciting. With that in mind, this primer aims to explain some of the most commonly used terms in consumer applications of artificial intelligence, and to look at the limitations of our current technology and why we shouldn’t be worrying about the robot uprising just yet.
What do ‘neural network,’ ‘machine learning,’ and ‘deep learning’ actually mean?
These are the three terms you’re most likely to have heard lately, and, to be as simple as possible, we can think of them in layers. Neural networks are at the bottom — they're a type of computer architecture onto which artificial intelligence is built. Machine learning is next — it’s a program you might run on a neural network, training computers to look for certain answers in pots of data; and deep learning is on top — it’s a particular type of machine learning that’s only become popular over the past decade, largely thanks to two new resources: cheap processing power and abundant data (otherwise known as the internet).
The concept of neural networks goes all the way back to the ’50s and the beginning of AI as a field of research. In a nutshell, these networks are a way of structuring a computer so that it looks like a cartoon of the brain, composed of neuron-like nodes connected together in a web. Individually these nodes are dumb, answering extremely basic questions, but collectively they can tackle difficult problems. More importantly, with the right algorithms, they can be taught.
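To make the "dumb nodes, clever web" idea concrete, here is a minimal sketch in Python. The weights and the `xor_network` wiring are hand-picked for illustration and are assumptions, not something taken from a real system; in practice the weights would be learned. Each node only answers a yes-or-no question, yet three of them wired together answer a question that no single node of this kind can: "is exactly one of the two inputs switched on?"

```python
def neuron(inputs, weights, bias):
    """A single node: weigh the inputs, add a bias, answer 1 (yes) or 0 (no)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def xor_network(a, b):
    """Three simple nodes wired together (weights chosen by hand for this sketch)."""
    either = neuron([a, b], [1, 1], -0.5)          # yes if a OR b is on
    both = neuron([a, b], [1, 1], -1.5)            # yes only if a AND b are on
    return neuron([either, both], [1, -1], -0.5)   # yes if "either" but not "both"


results = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(results)
```

The point of the sketch is the division of labor: the first two nodes answer very basic questions, and the third combines their answers into something more useful.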
WITH CONVENTIONAL PROGRAMMING, YOU TELL A COMPUTER WHAT TO DO; WITH MACHINE LEARNING, YOU SHOW IT HOW
Say you want a computer to know how to cross a road, says Ernest Davis, a professor of computer science at New York University. With conventional programming you would give it a very precise set of rules, telling it how to look left and right, wait for cars, use pedestrian crossings, etc., and then let it go. With machine learning, you’d instead show it 10,000 videos of someone crossing the road safely (and 10,000 videos of someone getting hit by a car), and then let it do its thing.
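The contrast between the two approaches can be sketched in a few lines of Python. Everything here is a simplifying assumption made for illustration: the road-crossing problem is reduced to a single number (the gap in seconds before the next car arrives), the eight labeled examples are invented, and `safe_to_cross_rule` and `learn_threshold` are hypothetical names. In the first function a person writes the rule; in the second, the program works out its own cutoff from examples.

```python
# Conventional programming: a person writes the rule down.
def safe_to_cross_rule(gap_seconds):
    return gap_seconds >= 6.0  # cutoff chosen by the programmer


# Machine learning: the program picks the cutoff from labeled examples.
def learn_threshold(examples):
    """examples: list of (gap_seconds, crossed_safely) pairs.

    Returns the candidate cutoff that makes the fewest mistakes on the examples.
    """
    candidates = sorted(gap for gap, _ in examples)

    def mistakes(cutoff):
        return sum((gap >= cutoff) != safe for gap, safe in examples)

    return min(candidates, key=mistakes)


# Invented training data: (seconds until the next car, did the crossing go well?)
examples = [
    (1.0, False), (2.5, False), (3.0, False), (4.0, False),
    (5.5, True), (7.0, True), (9.0, True), (12.0, True),
]

cutoff = learn_threshold(examples)
print("learned cutoff:", cutoff)
```

A real system would work from video rather than a single number, but the shape is the same: nobody tells the program where the line is, and it finds one that fits the examples it was shown.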
The tricky part is getting the computer to absorb the information from all these videos in the first place. Over the past couple of decades, people have tried all sorts of methods to teach computers. These include, for example, reinforcement learning, where you give a computer a "reward" when it does the thing you want, so that it gradually converges on the best solution; and genetic algorithms, where competing methods for solving a problem are pitted against one another in a manner comparable to natural selection.
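As a rough illustration of the second method, here is a toy genetic algorithm in Python. The problem (evolving a list of 20 ones), the population size, the mutation rate, and all function names are assumptions chosen to keep the sketch small. Candidates compete on a fitness score, the better half survives unchanged, and the next generation is bred from the survivors with a little random mutation.

```python
import random

TARGET_LENGTH = 20  # hypothetical problem: evolve a list of 20 ones


def fitness(candidate):
    """Score a candidate: the more ones, the better."""
    return sum(candidate)


def mutate(candidate, rng, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in candidate]


def crossover(mother, father, rng):
    """Build a child from the front of one parent and the back of the other."""
    cut = rng.randrange(1, TARGET_LENGTH)
    return mother[:cut] + father[cut:]


def evolve(generations=60, population_size=30, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(TARGET_LENGTH)]
        for _ in range(population_size)
    ]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]  # the fitter half lives on
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best score:", fitness(best), "out of", TARGET_LENGTH)
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next; whether it reaches a perfect score depends on the random draws.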
In today’s classrooms-for-computers, there’s one teaching method that's become particularly useful: deep learning, a type of machine learning that uses lots of layers in a neural network to analyze data at different levels of abstraction. So, if a deep learning system is looking at a picture, each layer is essentially tackling a different magnification. The bottom layer might look at just a 5 x 5 grid of pixels, answering simply "yes" or "no" as to whether something shows up in that grid. If it answers yes, then the layer above looks to see how this grid fits into a larger pattern. Is this the beginning of a line, for example, or a corner? This process gradually builds up, allowing the software to understand even the most complicated data by breaking it down into constituent parts.
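The layered, yes-or-no structure described above can be sketched in Python. This is only a cartoon under stated assumptions: the 10 x 10 test image is invented, and the detectors (`has_horizontal`, `has_vertical`) are written by hand, whereas a real deep learning system would learn its detectors from data. The bottom layer inspects each 5 x 5 grid of pixels, the middle layer combines those answers to spot a corner, and the top layer gives a single answer for the whole picture.

```python
PATCH = 5  # the bottom layer looks at 5 x 5 grids of pixels


def make_image():
    """An invented 10 x 10 picture: one vertical and one horizontal stroke."""
    image = [[0] * 10 for _ in range(10)]
    for row in range(10):
        image[row][1] = 1  # vertical stroke in column 1
    for col in range(10):
        image[8][col] = 1  # horizontal stroke in row 8
    return image


def patches(image):
    """Cut the image into non-overlapping 5 x 5 grids, keyed by (top, left)."""
    for top in range(0, len(image), PATCH):
        for left in range(0, len(image[0]), PATCH):
            yield (top, left), [row[left:left + PATCH] for row in image[top:top + PATCH]]


def has_horizontal(patch):
    return any(all(row) for row in patch)


def has_vertical(patch):
    return any(all(column) for column in zip(*patch))


def bottom_layer(image):
    """Yes/no per grid: does a line show up here?"""
    return {
        pos: {"horizontal": has_horizontal(p), "vertical": has_vertical(p)}
        for pos, p in patches(image)
    }


def middle_layer(features):
    """Yes/no per grid: do the two kinds of line meet here, forming a corner?"""
    return {pos: f["horizontal"] and f["vertical"] for pos, f in features.items()}


def top_layer(corners):
    """One answer for the whole picture: is there a corner anywhere?"""
    return any(corners.values())


features = bottom_layer(make_image())
corners = middle_layer(features)
print(corners, top_layer(corners))
```

Each layer only ever sees the answers from the layer below it, which is what lets the questions get more global and more abstract on the way up.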
"As you go up these layers the things that are detected are more and more global," Yann LeCun, the head of Facebook’s artificial intelligence research team, tells The Verge. "More and more abstract. And then, at the very top layer you have detectors that can tell you whether you’re looking at a person or a dog or a sailplane or whatever it is."
DEEP LEARNING SYSTEMS NEED A LOT OF DATA AND A LOT OF TIME TO WORK
Next, let’s imagine that we want to teach a computer what a cat looks like using deep learning. First, we’d take a neural network and program different layers to identify different elements of a cat: claws, paws, whiskers, etc. (Each layer would itself be built on layers that allow it to recognize that particular element, which is why this is called deep learning.) Then, the network is shown a lot of images of cats and other animals and told which is which. "This is a cat," we tell the computer, showing it a picture of a cat. "This is also a cat. This is not a cat." As the neural network sees different images, different layers and nodes within it light up as they recognize claws, paws, whiskers, and so on. Over time, it remembers which of these layers are important and which aren’t, strengthening some connections and disregarding others. It might discover that paws, for example, are strongly correlated with cats, but that they also appear on things that are not cats, so it learns to look for paws that also appear alongside whiskers.
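The "strengthening some connections" step can be sketched with one of the oldest learning rules, the perceptron, in Python. This is a stand-in chosen for brevity, not the method any particular company uses, and the setup is an assumption: each animal is reduced to two yes-or-no features (has paws, has whiskers), and the four labeled examples are invented. Every time the program guesses wrong, it nudges its weights toward the right answer.

```python
def predict(weights, bias, features):
    """Answer 1 ("cat") if the weighted evidence is positive, else 0 ("not a cat")."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=20):
    """Perceptron rule: after each wrong guess, shift the weights toward the label."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias


# Invented labeled data: ((has_paws, has_whiskers), is_cat)
examples = [
    ((1, 1), 1),  # "This is a cat."
    ((1, 0), 0),  # paws but no whiskers: "This is not a cat."
    ((0, 1), 0),  # whiskers but no paws: not a cat either
    ((0, 0), 0),  # neither
]

weights, bias = train(examples)
guesses = [predict(weights, bias, features) for features, _ in examples]
print("weights:", weights, "bias:", bias, "guesses:", guesses)
```

After training, neither feature is enough on its own to tip the answer to "cat"; the program only says yes when paws and whiskers show up together, which mirrors the discovery described above.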
This is a long, iterative process, with the system slowly getting better based on feedback. Either a human will correct the computer, nudging it in the right direction, or, if the network has a large enough pot of labeled data, it can test itself, seeing which weightings of all its layers produce the most accurate answers. Now, you can imagine how many steps are needed just to say whether something is or is not a cat, so think how complex these systems have to be to recognize, well, everything else that exists in the world. That’s why Microsoft was proud to launch an app the other week that identifies different breeds of dogs. The difference between a doberman and a schnauzer might seem obvious to us, but there are a lot of fine distinctions that need to be defined before a computer can tell the difference.
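The self-testing idea can also be sketched in Python. The three candidate weightings, the feature pair (has paws, has whiskers), and the six labeled examples are all invented for illustration. The program scores each weighting against data where the right answer is already known and keeps whichever one is right most often.

```python
def predict(weights, bias, features):
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def accuracy(weights, bias, labeled):
    """Fraction of labeled examples that this weighting gets right."""
    correct = sum(predict(weights, bias, f) == label for f, label in labeled)
    return correct / len(labeled)


# Hypothetical weightings to compare: name -> (weights, bias)
candidates = {
    "paws only": ([1, 0], -0.5),
    "whiskers only": ([0, 1], -0.5),
    "paws and whiskers": ([1, 1], -1.5),
}

# Invented labeled data: ((has_paws, has_whiskers), is_cat)
labeled = [
    ((1, 1), 1), ((1, 0), 0), ((0, 1), 0),
    ((0, 0), 0), ((1, 1), 1), ((1, 0), 0),
]

scores = {name: accuracy(w, b, labeled) for name, (w, b) in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

A real network has vastly more weightings to choose from than three, which is part of why the process takes so much data and time.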
So this is what Google, Facebook, and the rest are using?
For the most part, yes.
Deep learning techniques are now being employed for all sorts of everyday tasks. Many of the big tech companies have their own AI divisions, and both Facebook and Google have launched efforts to open up their research by open-sourcing some of their software. Google even launched a free three-month online course in deep learning last month. And while academic researchers might work in relative obscurity, these corporate institutions are churning out novel applications for this technology every week: everything from Microsoft’s "emotional recognition" web app to Google’s surreal Deep Dream images. This is another reason why we’re hearing a lot about deep learning lately: big, consumer-facing companies are playing with it, and they’re sharing some of the weirder stuff they’re making.
INTELLIGENCE IS ONE THING, COMMON SENSE IS ANOTHER
However, while deep learning has proved adept at tasks involving speech and image recognition — stuff that has lots of commercial applications — it also has plenty of limitations. Not only do deep-learning techniques require a lot of data and fine-tuning to work, but their intelligence is narrow and brittle. As cognitive psychologist Gary Marcus writes at the New Yorker, the methods that are currently popular "lack ways of representing causal relationships (such as between diseases and their symptoms), and are likely to face challenges in acquiring abstract ideas like ‘sibling’ or ‘identical to.’ They have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used." In other words, they don’t have any common sense.
For example, in a research project from Google, a neural network was used to generate a picture of a dumbbell after being trained on sample images. The pictures of dumbbells it produced were pretty good: two gray circles connected by a horizontal tube. But in the middle of each weight was the muscular outline of a bodybuilder’s arm. The scientists involved suggest this might be because the pictures the network had been trained on showed a bodybuilder holding the dumbbell. Deep learning might be able to work out what the common visual properties of tens of thousands of pictures of dumbbells are, but it would never make the cognitive leap to say that dumbbells don’t have arms. These sorts of problems aren’t limited to a lack of common sense, either. Because of the way they examine data, deep-learning networks can also be fooled by random patterns of pixels. You might see static, but a computer is 95 percent certain that's a cheetah.
These sorts of limitations can be artfully hidden though. Take the new wave of digital assistants like Siri, for example, which often seem like they can understand us, answering questions, setting alarms, and telling a few preprogrammed jokes and quips along the way. But as the computer scientist Hector Levesque points out, these quirks just show how big the gap between AI and real intelligence is. Levesque uses the example of the Turing Test, and points out that the machines that do best at this challenge rely on tricks to make people think they’re talking to a human. They use jokes, quotations, emotional outbursts, misdirection, and all manner of verbal dodges to confuse and distract questioners. And indeed, the machine that was said by some publications to have beaten the Turing test last year did so by claiming to be a 13-year-old Ukrainian boy, a cover story that excused its occasional ignorance, clunky phrasing, and conversational non sequiturs.