Something began in the 1970s that has been described as “the AI winter”, but to call it that is to miss the point, because the social illness it represents involves much more than artificial intelligence (AI). AI research was one of many casualties that came about as anti-intellectualism revived itself and society fell into a diseased state.
One might call the “AI winter” (which is still going on) an “interesting work winter” and it pertains to much more of technology than AI alone, because it represented a sea change in what it meant to be a programmer. Before the disaster, technology jobs had an R&D flavor, like academia but with better pay and less of the vicious politics. After the calamitous 1980s and the replacement of R&D by M&A, work in interesting fields (e.g. machine learning, information retrieval, language design) became scarce and over 90% of software development became mindless, line-of-business makework. At some point, technologists stopped being autonomous researchers and started being business subordinates and everything went to hell. What little interesting work remained was only available in geographic “super-hubs” (such as Silicon Valley) where housing prices are astronomical compared to the rest of the country. Due to the emasculation of technology research in the U.S., economic growth slowed to a crawl, and the focus of the nation’s brightest minds turned to creation of asset bubbles (seen in 1999, 2007, and 2014) rather than generating long-lasting value.
Why did this happen? Why did the entrenched public- and private-sector bureaucrats who run the world (with the locus of power, even among them, increasingly shifting to private-sector bureaucrats, who can’t be voted out of office) lose faith in the research being done by people who are much smarter than they are, and who work much harder? The answer is simple. It’s not even controversial. End of the Cold War? Nah, it began before that. At fault is the lowly perceptron.
Interlude: a geometric puzzle
This is a simple geometry puzzle. Below are four points at the corners of the square, colored (and numbered) like so:
0 1
1 0
Is it possible to draw a line that separates the red points (0′s) from the green points (1′s)?
The answer is that it’s not possible. Any separating line would have to separate two points from the other two. Now draw a circle passing through all four points. Any line can intersect that circle at no more than two points, so it cuts the circle into at most two arcs, and the points on either side of the line are consecutive along the circle. Therefore, a line separating two points from the other two would have to group two adjacent points together, and adjacent points are of opposing colors. It’s not possible. Another way to say this is that the classes (colors) aren’t linearly separable.
What is a perceptron?
“Perceptron” is a fancy name given to a mathematical function with a simple description. Let w be a known “weight” vector (if that’s an unfamiliar term, a list of numbers) and x be an input “data” vector of the same size, with the caveat that x[0] = 1 (a “bias” term) always. The perceptron, given w, is a virtual “machine” that computes, for any given input x, the following:
- 1, if w[0]*x[0] + … + w[n]*x[n] > 0,
- 0, if w[0]*x[0] + … + w[n]*x[n] ≤ 0.
In machine learning terms, it’s a linear classifier. If there’s a linear function that cleanly separates the “Yes” class (the 1 values) from the “No” class (the 0 values) it can be expressed as a perceptron. There’s an elegant algorithm for, in that linearly separable case, finding a working weight vector. It always converges.
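To make the definition and the training procedure concrete, here is a minimal sketch in Python. The function names (`predict`, `train`), the epoch cap, and the choice of the OR gate as a linearly separable example are mine, not part of any standard library. A sum of exactly zero is sent to 0 here, which is a convention assumed for this sketch. The update rule is the classic perceptron learning rule: on each mistake, nudge the weights toward (or away from) the misclassified input.

```python
from typing import List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if the dot product w . x is positive, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if activation > 0 else 0


def train(samples: Sequence[Sequence[float]],
          labels: Sequence[int],
          epochs: int = 100) -> List[float]:
    """Perceptron learning rule. Each sample must start with the bias input 1."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(w, x)  # +1, 0, or -1
            if error != 0:
                mistakes += 1
                # Move the weights toward x on a missed 1, away from x on a missed 0.
                w = [wi + error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:  # a full pass with no errors: a working weight vector
            break
    return w


# The OR gate is linearly separable, so training settles on a working w.
# Each input is (bias, in_1, in_2) with the bias fixed at 1.
or_inputs = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
or_labels = [0, 1, 1, 1]

w_or = train(or_inputs, or_labels)
or_predictions = [predict(w_or, x) for x in or_inputs]
print(w_or, or_predictions)
```

The epoch cap is only there so that the loop also terminates on data that isn’t linearly separable, where no pass will ever be mistake-free.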
A mathematician might say, “What’s so interesting about that? It’s just a dot product being passed through a step function.” That’s true. Perceptrons are very simple. A single perceptron can solve more decision problems than one might initially think, but it can’t solve all of them. It’s too simple a model.
Limitations
Let’s say that you want to model an XOR (“exclusive or”) gate, corresponding to the following function:
+------+------+-----+
| in_1 | in_2 | out |
+------+------+-----+
|  0   |  0   |  0  |
|  0   |  1   |  1  |
|  1   |  0   |  1  |
|  1   |  1   |  0  |
+------+------+-----+
One might recognize that this is identical to the “brainteaser” above, with in_1 and in_2 corresponding to the x- and y-dimensions in the coordinate plane. This is the same problem. This function is nonlinear; it could be expressed as f(x, y) = x + y - 2xy, and that’s arguably the simplest representation of it that works. A separating “plane” in the 2-dimensional space of the inputs would be a line, and there’s no line separating the two classes. It’s mathematically obvious that the perceptron can’t do it. I showed this, above, using high-school geometry.
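For readers who prefer algebra to geometry, the same impossibility can be sketched in a few lines. The symbols here are my own labels: w_0 is the bias weight (the one multiplied by the constant input 1), and w_1, w_2 are the weights on in_1 and in_2. Suppose some perceptron reproduced the XOR table, outputting 1 exactly when w_0 + w_1 x + w_2 y > 0. The four rows would then require:

```latex
\begin{align*}
f(0,0) = 0 &\implies w_0 \le 0 \\
f(1,1) = 0 &\implies w_0 + w_1 + w_2 \le 0 \\
f(0,1) = 1 &\implies w_0 + w_2 > 0 \\
f(1,0) = 1 &\implies w_0 + w_1 > 0
\end{align*}
% Adding the first two inequalities and the last two inequalities:
\begin{align*}
2w_0 + w_1 + w_2 \le 0
\qquad\text{and}\qquad
2w_0 + w_1 + w_2 > 0
\end{align*}
```

The two sums contradict each other, so no such weights exist. The argument goes through unchanged if the “0” rows are required to be strictly negative instead.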
To a mathematician, this isn’t surprising. Marvin Minsky pointed out the mathematically evident limitations of a single perceptron. One can model intricate mathematical functions with more complex networks of perceptrons and perceptron-like units, called artificial neural networks. They work well. One can also, using what are called “basis expansions”, generate further dimensions from existing data in order to create a higher-dimensional space in which linear classifiers still work. (That’s what people usually do with support vector machines, which provide the machinery to do so efficiently.) For example, adding xy as a third “derived” input dimension would make the classes (0′s and 1′s) linearly separable. There’s nothing mathematically wrong with doing that; it’s something that statisticians do when they want to build complex models but still have some of the analytic properties of simpler ones, like linear regression or nearest-neighbor modeling.
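Both escape routes fit in a few lines of Python. This is an illustrative sketch only: the function names and the weights are hand-picked by me rather than learned, and many other choices would work. The first function adds xy as a derived input and applies a single linear threshold. The second wires three perceptron-like step units into a tiny two-layer network (OR and NAND feeding an AND).

```python
def step(z: float) -> int:
    """Threshold unit: 1 if z is positive, else 0."""
    return 1 if z > 0 else 0


XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def xor_via_basis_expansion(x: int, y: int) -> int:
    """One linear threshold over the expanded inputs (1, x, y, x*y)."""
    features = (1, x, y, x * y)          # bias, the two raw inputs, derived xy
    weights = (-0.5, 1.0, 1.0, -2.0)     # hand-picked: x + y - 2xy - 0.5 > 0
    return step(sum(w * f for w, f in zip(weights, features)))


def xor_via_two_layer_network(x: int, y: int) -> int:
    """Three step units: XOR = AND(OR(x, y), NAND(x, y))."""
    h_or = step(-0.5 + x + y)            # fires unless both inputs are 0
    h_nand = step(1.5 - x - y)           # fires unless both inputs are 1
    return step(-1.5 + h_or + h_nand)    # fires only if both hidden units fire


for (x, y), expected in XOR_TABLE.items():
    print((x, y), expected,
          xor_via_basis_expansion(x, y),
          xor_via_two_layer_network(x, y))
```

Note that the expanded weights are just f(x, y) = x + y - 2xy with a threshold of 0.5, so the “derived” dimension is doing exactly the work that the nonlinear term did in the formula.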
The limitations of the single perceptron do not invalidate AI. At least, they don’t if you’re a smart person. Everyone in the AI community could see the geometrically obvious limitation of a single perceptron, and not one of them believed that it came close to invalidating their work. It only proved that more complex models were needed for some problems, which surprised no one. Single-perceptron models might still be useful for computational efficiency (in the 1960s, computational power was about a billion times as expensive as now) or because the data don’t support a more complex model; they just couldn’t learn or model every pattern.
In the AI community, there was no scandal or surprise. That some problems aren’t linearly separable is not surprising. However, some nerd-hating non-scientists (especially in business upper management) took this finding to represent more than it actually did.
They fooled us! A brain with one neuron can’t have general intelligence!
The problem is that the world is not run, and most of the wealth in it is not controlled, by intelligent people. It’s run by social-climbing empty-suits who are itching for a fight and would love to take some “eggheads” down a notch. Insofar as an artificial neural network models a brain, a perceptron models a single neuron, which can’t be expected to “think” at all. Yet the fully admitted limitations of a single perceptron were taken, by the mouth-breathing muscleheads who run the world, as an excuse to shit on technology and pull research funding because “AI didn’t deliver”. That produced an academic job market that can only be described as a pogrom, but it didn’t stop there. Private-sector funding dried up as short-term, short-tempered management came into vogue.
To make it clear, no one ever said that a single perceptron can solve every decision problem. It’s a linear model. That means it’s restricted, intentionally, to a small subspace of possible models. Why would people work with a restricted model? Traditionally, it was for a lack of data. (We’re in the 1960s and ’70s, when data was contained on physical punch cards and a megabyte weighed something and a disk drive cost more than a car.) If you don’t have a lot of data, you can’t build complex models. For many decision problems, the humble perceptron (like its cousins, logistic regression and support vector machines) did well and, unlike more computationally intensive linear classification methods (such as logistic regression, which requires gradient descent, or a variant thereof, over the log-likelihood surface; or such as support vector machines, whose training is a quadratic programming problem that we didn’t know how to solve efficiently until the 1990s), it could be trained with minimal computational expense, in a bounded amount of time. Even today, linear models are surprisingly effective for a large number of problems. For example, the first spam classifiers (Naive Bayes) operated using a linear model, and they worked well. No one was claiming that a single perceptron was the pinnacle of AI. It was something that we could build cheaply on 1970s-era hardware and that could build a working model on many important datasets.
Winter war
Personally, I don’t think that the AI Winter was an impersonal, passive event like the change of seasons. Rather, I think it was part of a deliberate resurgence of anti-intellectualism in a major cultural war, one which the smart people lost. The admitted limitations of one approach to automated decision-making gave the former high school bullies, now corporate fat cats, all the ammo they needed in order to argue that those “eggheads” weren’t as smart as they thought they were. None of them knew exactly what a perceptron or an “XOR gate” was, but the limitation that I’ve described was morphed into “neural networks can’t solve general mathematical problems” (arguably untrue) and that turned into “AI will never deliver”. In the mean-spirited and anti-liberal political climate of the 1980s, this was all that anyone needed as an excuse to cut public funding. The private sector not only followed suit, but amplified the trend. The public cuts were a mix of reasonable fiscal conservatism and mean-spirited anti-research sentiment, but the business elites responded strongly to (and took to a whole new level) the mean-spirited aspect, flexing their muscles as elitism (thought vanquished in the 1930s to ’50s) became “sexy” again in the Reagan Era. Basic research, which gave far too much autonomy and power to “eggheads”, was slashed, marginalized, and denigrated.
The claim that “AI didn’t deliver” was never true. What actually happened is that we solved a number of problems, once thought to require human intelligence, with a variety of advanced statistical means as well as some insights from fields like physics, linguistics, ecology and economics. Solving problems demystified them. Automated mail sorting, once called “artificial intelligence”, became optical character recognition. This, perhaps, was part of the problem. Successes in “AI” were quickly put into a new discipline. Even modern practitioners of statistical methods are quick to say that they do machine learning, not AI. What was actually happening is that, while we were solving specific computational problems once thought to require “intelligence”, we found that our highly specialized solutions did well on the problems they were designed for, and could be adapted to similar problems, but with very slow progress toward general intelligence. As it were, we’ve learned in recent decades that our brains are even more complicated than we thought, with a multitude of specialized modules. That no specific statistical algorithm can replicate all of them, working together in real time, shouldn’t surprise anyone. Is this an issue? Does it invalidate “AI” research? No, because most of those victories, while they fell short of replicating a human brain, still delivered immense economic value. Google, although it eventually succumbed to the sociological fragility and failure that inexorably follow closed allocation, began as an AI company. It’s now worth over $360 billion.
Also mixed in with the anti-AI sentiment is the religious aspect. It’s still an open and subjective question what human intelligence really is. The idea that human cognition could be replicated by a computer offended religious sentiments, even though few would consider automated mail sorting to bear on unanswerable questions about the soul. I’m not going to go deep into this philosophical rabbit hole, because I think it’s a waste of time to debate why people believe AI research (or, for a more popular example, evolution by natural selection) to offend their religious beliefs. We don’t know what qualia are or where they come from. I’ll just leave it at this. If we can use advanced computational techniques to solve problems that were expensive, painful, or impossible given the limitations of human cognition, we should absolutely do it. Those who object to AI on religious grounds fear that advanced computational research will demystify cognition and bring about the end of religion. Ignoring the question of whether an “end of religion” is a bad thing, or what “religion” is, there are two problems with this. First, if there is something to us that is non-material, we won’t be able to replicate it mechanically and there is no harm, to the sacred, in any of this work. Second, computational victories in “AI” tend to demystify themselves and the subfield is no longer considered “AI”. Instead, it’s “optical character recognition” or “computer game-playing”. Most of what we use on a daily basis (often behind the scenes, such as in databases) comes from research that was originally considered “artificial intelligence”.
Artificial intelligence research has never told us, and will never tell us, whether it is more reasonable to believe in gods and religion or not to believe. Religion is often used by corrupt, anti-intellectual politicians and clerics to rouse sentiment against scientific progress, as if automation of human grunt work were a modern-day Tower of Babel. Yet, to show what I mean by AI victories demystifying themselves, almost none of them would hesitate to use Google, a web-search service powered by AI-inspired algorithms.
Why do the anti-intellectuals in politics and business wish to scare the public with threats of AI-fueled irreligion and secularism (as if those were bad things)? Most of them are intelligent enough to realize that they’re making junk arguments. The answer, I think, is about raw political dominance. As they see it, the “nerds” with their “cushy” research jobs can’t be allowed to (gasp!) have good working conditions.
The sad news is that the anti-intellectuals are likely to take the economy and society down with them. In the 1960s, when we were putting billions of dollars into “wasteful” research spending, the economy grew at a record pace. The world economy was growing at 5.7 percent per year, and the U.S. economy was the envy of the world. Now, in our spartan time of anti-intellectualism, anti-science sentiment, and corporate elitism, the economy is sluggish and the society is stagnant– all because the people in charge can’t stand to see “eggheads” win.
Has AI “delivered”?
If you’re looking to rouse religious fear and fury, you might make a certain species of fantastic argument against “artificial intelligence”. The truth of the matter, however, is that while we’ve seen domain-specific superiority of machines over human intelligence in rote processes, we’re still far from creating an artificial general intelligence, i.e. a computational entity that can exhibit the general learning capability of a human. We might never do it. We might not need to and, I would argue, we should not if it is not useful.
In a way, “artificial intelligence” is a defined-by-exclusion category of “computational problems we haven’t solved yet”. Once we figure out how to make computers better at something than humans are, it becomes “just computation” and is taken for granted. Few believe they’re using “an AI” when they use Google for web search, because we’re now able to conceive of the computational work it does as mechanical rather than “intelligent”.
If you’re a business guy just looking to bully some nerds, however, you aren’t going to appeal to religion. You’re going to make the claim that all this work on “artificial intelligence” hasn’t “delivered”. (Side note: if someone uses “deliver” intransitively, as business bullies are wont to do, you should punch that person in the face.) Saying someone or something isn’t “delivering” is a way to put false objectivity behind a claim that means nothing other than “I don’t like that person”. As for AI, it’s true that artificial general intelligence has eluded us thus far, and continues to do so. It’s an extremely hard problem: far harder than the optimists among us thought it would be, fifty years ago. However, the CS research community has generated a hell of a lot of value along the way.
The disenchantment might be similar to the question about “flying cars”. We actually have them. They’re called small airplanes. In the developed world, a person of average means can learn how to fly one. They’re not even that much more expensive than cars. The reason so few people use airplanes for commuting is that it just doesn’t make economic sense for them: the savings of time don’t justify increased fuel and maintenance costs. But a middle-class American or European can, if she wants, have a “flying car” right now. It’s there. It’s just not as cheap or easy to use as we’d like. With artificial intelligence, that research has brought forth a ridiculous number of victories and massive economic growth. It just hasn’t brought forth an artificial general intelligence. That’s fine; it’s not clear that we need to build one in order to get the immense progress that technologists create when given the autonomy and support.
Back to the perceptron
One hard truth I’ve learned is that any industrial effort will have builders and politicians. It’s very rare that someone is good at both. In the business world, those unelected private-sector politicians are called “executives”. They tend, for a variety of reasons, to put themselves into pissing contests with the builders (“eggheads”) who are actually making stuff. One time-tested way to show up the builders is to take something that is obviously true (leading the builders to agree with the presentation) but present it out of context in a way that is misleading.
The incapacity of the single perceptron at general mathematical modeling is a prime example of this. Not one AI researcher was surprised that such a simple model couldn’t describe all patterns or equational relationships. That fact can be proven (as I did) with high school geometry. That a single perceptron can’t model a key logical operation is, as above, obviously true. The builders knew it, and agreed. Unfortunately, what the builders failed to see was that the anti-intellectual politicians were taking this fact way out of context, using the known limitations of a computational building block to ascribe limitations (that did not exist) to general structures. This led to the general dismantling of public, academic, and private support for technological research, an anti-intellectual and mean-spirited campaign that continues to this day.
That’s why there are so few AI jobs.