Originally, I had intended the 9th part to be the one in which I solve this damn thing for good. Unfortunately, as that “9th post” crossed into 12 kiloword territory, I realized that I needed to break it up a bit. Even for me, that’s long. So I had to tear some stuff out and split this “final post” yet again. So here’s the 9th chapter in this ongoing lurid saga. (See: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8). I am really trying to wrap this fucker up so I can get my life back, but I’m not going to solve it just yet… there are a few more core concepts I must address. Today’s topic, however, is convexity.
What is convexity?
Convexity pertains to the question: which is more important, the difference between excellent work and mediocre work, or that between mediocre work and noncompliance (zero)? For concave work, the latter is more important: just getting the job done is what matters, and qualitative concerns are minimal. For convex work, the difference between excellence and mediocrity is substantial and that between mediocrity and failure is small.
The theoretical basis for this is the logistic function, or “S-curve”, which is convex on its left side (looking roughly exponential where p << 0.5) and concave on the right, as it approaches a horizontal asymptote (saturation point). Model input as a numerical variable i pertaining to resources, skill, talent, or effort. Then model task output as having a maximal yield Y, and take the function Y * p(i), where p is a logistic function with range (0, 1) representing the proportion of the maximum possible yield that is captured. The inflection point (the switch-over from convexity to concavity) is exactly where p(i) = 0.5. Taken in full, this logistic function is neither concave nor convex. Yet, for most economic problems, the relevant band of the input range is narrow and sits mostly on one side of the inflection point or the other. We can classify tasks as convex or concave based on where the average performer falls relative to that inflection point.
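For the mathematically inclined, here’s a minimal Python sketch of that model. The function name, parameters, and specific numbers are my own illustrative choices, nothing canonical:

```python
import math

def task_yield(i, Y=100.0, midpoint=0.0, steepness=1.0):
    """Output of a task with maximum yield Y, modeled as Y * p(i), where
    p is a logistic ("S-curve") function of input i (skill, effort, etc.)."""
    p = 1.0 / (1.0 + math.exp(-steepness * (i - midpoint)))
    return Y * p

# Below the inflection point (p < 0.5) the curve is convex: each unit of
# input buys a bigger gain than the last one did.
print(task_yield(-2), task_yield(-1), task_yield(0))  # ~11.9, ~26.9, 50.0
# Above it (p > 0.5) the curve is concave: returns diminish.
print(task_yield(0), task_yield(1), task_yield(2))    # 50.0, ~73.1, ~88.1
```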
To get concrete about it, consider exams in most schools. A failing student might be able to answer 60% of the questions right; an average one gets 80%, and the good ones get 90%. That’s a concave world. The questions are easy, and one needs to get almost all of them right to distinguish oneself. On the other hand, a math researcher would be thrilled to solve 50% of the (much harder) problems she confronts. With concave work, the success or completion rates tend to be high because the tasks are easy. With convex work, they’re low because the tasks are hard. What makes convex work worth doing is that, often, the potential yield is much higher. If the task is concave, it’s been commoditized, and it’ll be hard to make a profit by doing it. Even 100% completion will yield only marginal profits. If the work is convex, the most successful players can generate outsized returns. It may not be clear even what the upper limit (the definition of “100%”) is.
Convexity and management
Consider the following payoff structures for two tasks, A and B. A is concave; B is convex, and 0 to 4 represent degrees of success.
| Performance | A payoff | B payoff | Q distribution | R distribution |
|-------------|---------:|---------:|---------------:|---------------:|
| 4 (Superb)  | 125      | 500      | 20%            | 0%             |
| 3 (Good)    | 120      | 250      | 20%            | 20%            |
| 2 (Fair)    | 100      | 100      | 20%            | 60%            |
| 1 (Poor)    | 60       | 25       | 20%            | 20%            |
| 0 (Awful)   | 0        | 0        | 20%            | 0%             |
Further, let’s assume there are two management strategies, Q and R. Under Q, the workforce will be uniformly distributed among the five tiers of performance: 20% in each. Under R, 20% each of the workforce will fall into Good and Poor, 60% into the Fair tier, and none into the Superb or Awful tiers. R is a variance-reducing managerial strategy. It brings people in toward the middle. The goal here is to maximize bulk productivity, and we assume we have enough workers that we can use the expected payoff as a proxy for that.
For Job A, which is concave, management strategy Q produces an output-per-worker of 81, while R yields 96. The variance-reducing strategy, R, is the right one, yielding 15 points more. For example, bringing up the worst slackers (from 0 to 60) delivers more benefit than pulling down the top players (from 125 to 120).
For Job B, which is convex, strategy Q gives us an average yield of 175, while R delivers only 115– 60 points less. The variance-reducing strategy fails. We lose more by pulling down the best people (from 500 to 250) than we gain by hauling up the laggards (from 0 to 25).
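If you want to check the arithmetic, here’s a quick sketch of the expected-payoff computation, using the numbers straight from the table above:

```python
# Payoffs by performance tier (0..4) for concave task A and convex task B.
payoff_A = [0, 60, 100, 120, 125]
payoff_B = [0, 25, 100, 250, 500]

# Workforce distributions over tiers 0..4 under the two strategies.
dist_Q = [0.20, 0.20, 0.20, 0.20, 0.20]   # high-variance
dist_R = [0.00, 0.20, 0.60, 0.20, 0.00]   # variance-reducing

def expected_payoff(payoffs, dist):
    return sum(p * w for p, w in zip(payoffs, dist))

print(expected_payoff(payoff_A, dist_Q))  # 81.0
print(expected_payoff(payoff_A, dist_R))  # 96.0
print(expected_payoff(payoff_B, dist_Q))  # 175.0
print(expected_payoff(payoff_B, dist_R))  # 115.0
```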
In short, when the work is concave, variance is your enemy and reducing it increases expected yield. When the work is convex, variance is your friend; more risk means more yield.
The above may seem disconnected from the problems of the MacLeod organization, but it’s not. MacLeod organizations are based on variance-reduction management strategies, which have worked overwhelmingly well over the past 200 years of concave, industrial-era labor. MacLeod Losers naturally desire familiarity, uniformity, and stability. They want variance to be reduced and will give up autonomy to have that. MacLeod Clueless (middle managers) take on the job of reducing variance in conditions for the Losers below them, and reducing volatility in performance for the Sociopaths above. Their job is to homogenize and control, and they do it well. It doesn’t require vision or strategy. MacLeod Sociopaths start out as “heroic” risk-takers (entrepreneurs) but that caste often evolves (especially as transplant executives come in) into a cushy rent-seeking class as the organization matures (necessitating the obfuscation enabled by the Clueless and the Effort Thermocline). The Sociopath category itself becomes risk-averse, out of each established individual’s desire to protect organizational position. The result is an organization that is very good at reducing variance and stifling individuality, but incapable of innovation. How do we come back from that?
In the concave world, the failures of the MacLeod organization were tolerable. Businesses didn’t need to generate new ideas. They needed to turn existing, semi-fresh ones into repeatable processes, and motivate large groups of people to carry out difficult but mostly simple functions. Variance-reduction was desired and encouraged. Only in the past few decades, with the industrial era fading and the technological one coming on, has there been a need for business to have in-house creative capacity.
Old-style organizations: the optimization model
The convex/concave discussion above assumes one dimension of input (pertaining to how good an individual is at a job) and one of output (observed productivity). In truth, a more accurate model of an organization’s performance would have an interconnected network of such “S-curve” functions for the relationships between various variables, many of which are hidden. There’d be a few input variables (“business variables”) and the things the company cares about (profit, reputation, organizational health) would be outputs, but most of the cause-and-effect relationships are hidden. Wages affect morale, which affects performance, which affects productivity, which affects the firm’s profits, which is its performance function. With all of the dimensions that could be considered, this function might be very convoluted, and while it is held to exist “platonically”, it is not known in its entirety. The actual function relating controllable business variables to performance is illegible (due to hidden variables) and certainly not perfectly concave or convex.
So how does the firm find an optimal solution for a problem it faces?
This gets into an area of math called optimization, and I’m not going to be able to do it justice, so I’ll just address it in a hand-wavy way. First, imagine a two-dimensional space (if only because it’s hard to visualize more) where each point has an associated value, creating a 3-dimensional graph surface. We want to find the “highest point”. If that surface is globally concave, like an inverted bowl, that’s very easy, because there can only be one maximum. We can start from any point and “hill climb”: assess the local gradient and step in the most favorable direction. We’ll end up at the highest point. However, the more convoluted our surface is, the harder the optimization problem becomes. If we pick a bad starting point on a convoluted surface, we might end up somewhere sub-optimal. Thinking of it in topographical terms, a “hill climb” from most places won’t lead to the top of Mount Everest, but to the neighborhood’s highest hill. In other words, the “starting point” matters if the surface is convoluted.
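Here’s a toy hill climb in Python. The surface (one tall peak, one short one) and the step size are made-up assumptions for illustration, not a model of any real firm:

```python
import math

# A toy "performance surface" with one tall peak and one short one.
def performance(x, y):
    everest  = 10.0 * math.exp(-((x - 4)**2 + (y - 4)**2))  # global maximum
    backyard = 3.0 * math.exp(-(x**2 + y**2))               # local maximum
    return everest + backyard

def hill_climb(x, y, step=0.05, iters=5000):
    """Greedy local search: repeatedly move to the best neighboring point."""
    for _ in range(iters):
        neighbors = [(x + dx, y + dy) for dx in (-step, 0, step)
                                      for dy in (-step, 0, step)]
        best = max(neighbors, key=lambda p: performance(*p))
        if performance(*best) <= performance(x, y):
            break  # no uphill neighbor: a (possibly local) maximum
        x, y = best
    return x, y

# The starting point determines where the climb ends.
print(hill_climb(0.5, 0.5))  # stalls near (0, 0): the neighborhood's hill
print(hill_climb(3.0, 3.0))  # reaches (4, 4): the "Everest" peak
```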
Real optimization problems usually involve more than two dimensions. It is obviously not the case that organizations perform optimizations over all possible values of all possible business variables (of which there are an infinite number). Additionally, the performance function changes over time. As a metaphor, however, for the profit-maximizing industrial corporation, it’s surprisingly useful. One part of what it must do is pick a good “starting point” for the state of the business, a question of “What should this company be?” That requires non-local insight. Another part is the iterative process of refinement and hill-climbing once that initial point is selected.
This leads to the three-tier organization. People who are needed for commodity labor but not trusted, at all, to affect business variables are mere workers. Managers perform the iterative hill-climb and find the highest point in the neighborhood. In startup terms, they “iterate”. Executives, whose job is to choose starting points, also have the right to make non-local “jumps” in business state if needed. In startup terminology, they “pivot”.
The MacLeod organization gets along well with this computational model of the organization. MacLeod Losers are mediocre in dedication, but that’s fine. That aspect of them is treated as a hidden variable that can be modulated via compensation (carrots) or managerial attention (sticks). In the optimization model, they’re just infrastructure– human resources in the true sense of the word. The fault of MacLeod Clueless is that they aren’t strategic, but they don’t need to be. Since their job is just to climb a hill, they don’t need to worry about non-local “vision” concerns such as whether they’re climbing the right hill. That’s for someone else to worry about. They just assess the local gradient and move in the steepest upward direction. Finally, there are the MacLeod Sociopaths, whose goal is to be strategic and have non-local insight. Being successful at that usually requires a high quality of information, and people don’t get that stuff by following the rules. The source could be illegal (industrial espionage) or chaotic (experimental approaches to social interaction) or merely insubordinate (a programmer learning new technologies on “company time”) but it’s almost always transgressive. The MacLeod Sociopath’s ability to get information confers more benefit, in an executive position, than the negatives associated with that category.
Why the optimization model breaks down
In the model above, there is some finite and well-specified set of business variables. The real world is much more unruly. In truth, there are an infinite number of dimensions. Two things make this more tractable. The first is sparsity. Most dimensions don’t matter. For example, model “product concern” as a vector over the products a company might make (1.0 meaning “it’s the only thing we care about”, 0 meaning “not interested at all”). Assume there are 387 trillion conceivable products that a firm could create. That’s 387 trillion business variables; 386.99999…+ trillion of those entries are always going to be zero (excepting a major pivot) and can be thrown out of the analysis. Second is aggregation. For personnel, one could have a variable for each of the world’s 7.1 billion people (again, most being zero for ‘not working here’) but most companies just care about a few things, like how many people they employ and how much they cost. Headcount and budget are the important business variables. Whether John A. Smith, 35, of Flint, Michigan is employed at the company (i.e. one of those 7.1 billion personnel variables) isn’t that relevant for most values of John A. Smith, so executives need not concern themselves with it.
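Here’s a quick sketch of both tricks– the product names and salaries are made up for illustration:

```python
# Sparsity: store only the nonzero entries of an (astronomically long)
# "product concern" vector as a dict, instead of trillions of zeros.
product_concern = {"widget_a": 0.7, "widget_b": 0.3}  # everything else is 0.0

def concern(product_id):
    return product_concern.get(product_id, 0.0)  # absent means zero

# Aggregation: collapse billions of per-person variables into the few
# summary business variables that executives actually look at.
employees = [("J. Smith", 95_000), ("A. Jones", 120_000), ("B. Lee", 88_000)]
headcount = len(employees)
budget = sum(salary for _, salary in employees)
print(headcount, budget)  # 3 303000
```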
Even still, modern companies have thousands to millions of business variables that matter to them. That’s more information than a single person can process. Then there is the matter of what variables might matter (unknown unknowns). If the optimization problem were simple, the company would only need one executive to call out starting points, but these information issues mandate a larger team. The computation that the organization exists to perform must be distributed. It can’t fit on one “node” (i.e. one person’s head). That also mandates that this massively high-dimensional optimization problem be broken down as well. (I’m ignoring the reality, which is that most people in business don’t “compute” at all, and that many decisions are made on hunches rather than data.)
Insofar as dimensions are separable (that is, they aren’t expected to interact, so the best values for each can be found in isolation), the problem can be decomposed by splitting it into subproblems and solving each one separately, as in the sketch below. Executives take the most important business variables, where it is most likely that non-local jumps will be needed, such as whether to lay off 15% of the workforce. The less important ones (like whether to fire John A. Smith) are tackled by managers. Workers don’t participate in the problem-solving; they’re just machines.
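A toy illustration of that decomposition, with objective functions assumed purely for demonstration: when the objective is a sum of one-variable terms, each subproblem can be handed to a different agent and solved on its own, and the joint optimum falls out:

```python
# Two separable terms of a larger objective f(x, y) = f1(x) + f2(y).
def f1(x): return -(x - 2)**2   # best at x = 2
def f2(y): return -(y + 1)**2   # best at y = -1

candidates = [i / 10 for i in range(-50, 51)]  # coarse grid from -5.0 to 5.0
best_x = max(candidates, key=f1)  # one "agent" solves for x alone
best_y = max(candidates, key=f2)  # another solves for y alone
# Solving the subproblems separately recovers the joint optimum (2, -1).
print(best_x, best_y)
```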
This evolves from an optimization model where business variables and performance functions are presumed to exist platonically, to a distributed agent-based model operated by local problem-solving agents. This is a more accurate model of what actually happens in the corporation, made further amusing by the fact that the agents often have diverging personal objective functions. Centralized computation is no longer the most important force in the company; it’s communication between the nodes (people).
Here’s where MacLeod fits into the agent-based model. MacLeod Losers consume information and turn it into work. That’s all they’re expected to do, and ideally the only thing that should be coming back is one word: done. MacLeod Clueless furnish information up and down the food chain, non-editorially, because they aren’t strategic enough to turn it into power. They tell the Losers what to do, and the Sociopaths what was done, and they aren’t much of a filter. The only information they take in, in general, is what information others want from them. The MacLeod Sociopaths are strategic givers and takers of information, and (having their own agendas) they are selective in what they transmit. Organizations actually need such people in order to protect the top decision-makers from “information overload”. It’s largely the bottom-tier Sociopaths who perform the dimensionality reduction and aggregation, so they’re absolutely vital; however, they make sure to use whatever editorial sway they have toward their own benefit.
Optimization and convexity
The actual performance function of a company, in terms of its business variables, is quite convoluted. It’s generally concave in a neighborhood (enabling managers to find the “local hill”) but its global structure is not, necessitating the non-local jumping afforded to executives. The underlying structure, as I said earlier, is driven by an inordinate number of hidden variables. It might be best thought of as a neural network of S-curve functions (“perceptrons”) wherein there are elements of concavity and convexity, often interacting in strange ways. It’s not possible for anyone to ascertain what a specific organization’s underlying network looks like exactly. The overall relationship between business variables (inputs) and performance (output) is not going to be purely concave or convex. The best one can hope for is a well-chosen initial point in which the neighborhood is concave.
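To make that metaphor concrete, here’s a tiny network of logistic units in Python. The weights are random– in the metaphor, nobody knows the real ones– and the point is only that composing S-curves gives you a surface that is neither cleanly concave nor cleanly convex:

```python
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A toy two-layer network of logistic ("S-curve") units standing in for a
# firm's hidden cause-and-effect structure: 2 business variables in,
# 3 hidden variables, 1 performance number out.
W1 = [[random.uniform(-4, 4) for _ in range(2)] for _ in range(3)]
W2 = [random.uniform(-4, 4) for _ in range(3)]

def firm_performance(x1, x2):
    hidden = [sigmoid(w[0] * x1 + w[1] * x2) for w in W1]
    return sum(v * h for v, h in zip(W2, hidden))

# The resulting input/output surface mixes convex and concave regions and
# need not be concave (or convex) overall.
for x in (-2, -1, 0, 1, 2):
    print(x, round(firm_performance(x, 0.5), 3))
```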
For typical organizations and most people in them, concavity has a lot of nice properties. For one thing, it tends toward fairness. If the input/output relationships are concave, allocation of resources to varying parties is likely to favor an “interior” solution– everyone gets some– because marginal returns diminish as any one party accumulates more of the resource. If the input/output relationship is convex, resource allocation can favor an “edge” solution where one party gets everything and the rest get none (see the sketch below), which tends to exacerbate the “power law” distributions associated with social inequalities: a few (who seem the most capable of turning those resources into value) get much, most get little or nothing. Another benefit of concavity is that performance relative to a standard can be measured. At concave work, the maximum sustainable output is typically well-studied and known, and acceptable error rates can be defined. With convex work, no one knows what’s possible. Once a maximum is established and can be reliably attained, the task is likely to become concave (as people develop the skills to perform it successfully more than 50% of the time). Research is inherently convex: most things explored don’t pan out, but those that do deliver major benefit. When those explorations lead to repeatable processes that can be carried out by people of average motivation and talent, that’s concave, commodity work.
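Here’s a small sketch of that allocation effect, with functional forms chosen (as an assumption) to be cleanly concave and cleanly convex:

```python
def best_split(f, budget=10.0, steps=100):
    """Return the allocation (a, budget - a) between two parties that
    maximizes total output f(a) + f(budget - a), via grid search."""
    k = max(range(steps + 1), key=lambda k: f(budget * k / steps)
                                          + f(budget - budget * k / steps))
    a = budget * k / steps
    return a, budget - a

concave = lambda x: x ** 0.5   # diminishing marginal returns
convex  = lambda x: x ** 2     # accelerating returns

print(best_split(concave))  # (5.0, 5.0): interior solution, everyone gets some
print(best_split(convex))   # (0.0, 10.0): edge solution, one party gets it all
```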
MacLeod organizations exist as a risk trade between those who want to be protected from the vagaries of the market and those who sell that protection, so creating islands of concavity and easiness is kind of what they do. The Big World Out There is a place with many pockets of convexity, plenty of bad local maxima, and a difficult and mostly unexplored landscape. The MacLeod organization provides its lower-level workers with access to an already-explored, safely concave neighborhood. Executives read maps and pick starting points. Managers just follow the steepest gradient up to the top, and workers just have to follow and carry the manager’s stuff.
Technology and change
There’s a problem with concave work. It tends to be commoditized, which makes substantial profits hard to come by. If “100% performance” can truly be defined and specified, then it can also be achieved mechanically. Not only are the margins low, but machines are just better at this kind of work than humans: they’re cheaper, don’t need breaks, and don’t make as many mistakes. Robots are taking over the concave work, leaving us with convexity.
We cannot compete with robots on concave work. We’ll need to let them have it.
Software engineering is notoriously convex. First of all, excellent software engineers are 5 to 100 times more productive than average ones, a phenomenon known as the “10X engineer”. As is typical of convex projects, most software projects fail– probably over 80 percent deliver negative net value– but the payoffs of the successes are massive. This is even more the case in the VC-funded startup ecosystem, where companies that seem to show potential for runaway success are sped along, the laggards being shot and killed. In a convex world, that’s how you allocate resources and attention: focus on the winners, starve the losers.
Convexity actually makes for a very frustrating ecosystem. While convex work is a lot more “fun” because of the upside potential and the challenge, it’s not a great way to make a living. Most software engineers get to age 30 with no successful projects under their belt (most of this being not their fault), no credibility on account of that series of ill-fated projects, and mediocre financial results (even if they had some successes). Managing convex work, and compensating it fairly, are not things that we as a society have figured out how to do. For the past 200 years– the industrial era, as opposed to the technological one that is coming on– we haven’t needed to do it. Almost all human labor was concave. What little convex work existed was generally oriented toward standardizing processes so as to make 99.9% of the organization’s labor pool concave. We are now moving toward an economy where enormous amounts of work are done by machines, practically for free, leaving only convex, creatively taxing work.
The fate of the MacLeod organization
MacLeod organizations, over the past 200 years, could perform well. They weren’t great at innovation; they didn’t need it. They got the job done. One of the virtues of the corporation was its ability to function as a machine made out of people. It would render services and products not only at greater scale, but much more reliably, than individual people could. The industrial corporation co-evolved with the failings of each tier of the MacLeod organization, hence converging on the “optimization model” above, which uses the traits of each to its benefit. Of course, I do not mean to suggest that these “computations” are performed in reality, but the metaphor works quite well.
The modern technological economy has created problems for that style of organization, however. Microeconomic models tend to focus on a small number of business variables– price points, quantity produced, wages. Current-day challenges involve thousands of variables, many of them ill-defined. What product is one going to build? What kind of people should be hired? What kind of culture should the company strive toward, and how will it enforce those ideals? Those things matter a lot more in the technological economy. Hidden variables that could once be ignored are now crucial, and industrial-era management is falling flat. Combine this with the convexity of input/output relationships regarding individual talent, effort, and motivation, and we have a dramatically different world.
The islands of concavity that MacLeod organizations can create for their Losers and Clueless are getting smaller by the year. The ability to protect the risk-averse from the Big World Out There is diminishing. MacLeod Sociopaths were never especially scrupulous about keeping that implicit promise, but now they can’t.
Even individuals now have to make non-local (formerly executive-level) decisions. For a concrete example, consider education. The generalist education implicitly assumes that most people will have a concave relationship between their amount of knowledge in an area and the utility they get from it. It’s vitally important, as most educational institutions see it, for one to first get a mediocre amount of knowledge about a lot of subjects. (I agree, for non-economic reasons. How can a person know what is interesting without a wide survey of knowledge? A mediocre knowledge gives you enough to determine whether you want to know more; with no knowledge, you have no clue.) However, there’s no such thing as a Generalist job description. The market doesn’t reward a breadth of mediocre knowledge. People need to specialize. In 1950, having a college degree bought a person credibility as someone capable of learning quickly, and thus entry into the managerial, professional, or academic ranks. (Specialization could begin on the job.) By 1985, one needed a marketable major: preferably math, CS, or a physical science. In 2013, what classes a person took (compilers? machine learning?) is highly relevant. The convex valuation of a knowledge base makes deep knowledge in one area more valuable than broader, shallower knowledge. Choosing and changing specialties is also a non-local process. A well-rounded generalist can move about the interior by gradually shifting attention. The changing specialist must jump from one “pointy” position to another– and hope it’s a good place to be.
In technology especially, we’re seeing an explosion of dimensionality. General competence doesn’t cut it anymore. Firms aren’t willing to hire “overall good” people who might take 6 months to learn their technology stacks, and the most credible job candidates don’t want to pin their careers on companies that don’t strongly correspond with their (sometimes idiosyncratic) preferences. When there’s a bilateral matching problem (e.g. dating) it usually has something to do with dimensionality. Both sides of the market are “purple squirrel hunting”.
This proliferation of dimensionality isn’t sustainable, of course. One thing I’ve come to believe is that it has an onerous effect on real estate prices. That might seem bizarre, but the “star cities” are the only places that tolerate purple squirrel hunting. If you’re a startup that wants a Python/Scala/C++ expert with production experience in 4 NoSQL products and two PhDs, you can find her in the Bay Area. For some price, she’s out there. That’s not because the people in the Bay Area are better; it’s that, with more of them, you get a continuous market for talent (it’s available at some price) rather than a discrete one (you might wait intolerably long and not find it)– and also for jobs, from a candidate’s perspective– even if you’re trying to fill some ridiculous purple squirrel specification. That’s what makes “tech hubs” (e.g. Bay Area, New York, Boston) so attractive– to candidates and companies both– and a major part of what keeps them so expensive. The continuous markets make possible the high-risk businesses and job-hopping careers that aren’t viable in smaller cities, unless one wants to move or tolerate remote work. Since real estate in these areas is reaching the point of being unaffordable for technology workers, I think it’s a fair call to say that this dimensionality explosion in technology won’t continue forever. However, convexity and high dimensionality in general are here to stay, and about to become the norm for the greater economy. The convexity introduced by an economic arrangement where an increasing bulk of commodity labor is dumped directly on machines has incredible upsides, and is very attractive. Now, in the late-industrial era, global economic growth is about 4-5 percent per year. In the thick of the technological era– a few decades from now– it could be over 10% per year.
If MacLeod rank cultures are going to become obsolete, what will replace them? That I do not know for sure, but I have some thoughts. The “optimization model” paints a world where the relevant business variables are known. Executives call out initial values (based on non-local knowledge) for a gradient ascent performed by managers. As the business world becomes high-dimensional– too many dimensions for any one person to handle them all– the firm begins to break down the problem and distribute the “computation” (again, solely in metaphor). High-ranking executives handle important dimensions (sub-problems) where tricky non-local jumps might be in order. Managers handle less-important ones where continuous modulation will do. Getting the communication topology right is tricky. Often the conceptual hierarchy that is created will look suspiciously like the organizational hierarchy (Conway’s Law?). This leads to an interesting question: is this hierarchy of people– which will limit the firm’s capacity to form proper conceptual hierarchies and solve its own problems– even necessary? Or is it better to have all eyes open on non-local, “visionary” questions? Is that a good idea? Organizations claim to want their employees to “act like owners”. Is that really true? With the immense complexity of the technological economy, and the increasing inability of centralized management to tackle convexity (one cannot force creative excellence or innovation by managerial fiat), it might have to be true.
Enter the self-executive. Self-executive people don’t think of themselves as subordinate employees, but as free agents. They don’t want to be told what to do. They want to excel. A manager who will guide them (mentorship) gets loyalty. However, typical exploitative managers get ignored, sabotaged, or humiliated. Self-executive employees are the ones who can handle convexity, and they enjoy the risk and challenge of hard problems. They lean strongly toward chaos on the civil alignment spectrum. These are the people one will need in order to navigate a convex technological economy, and the self-executive culture is the one that will unleash their capabilities.
That said, the guild culture has a lot to add as well and should not be ignored. There’s a lot of wasted work in exploration that can be eliminated by advice from a wise mentor (although if things change, as they do more rapidly these days, that “don’t go there” advice might sometimes be best discarded). The valuation of knowledge and skill is so strongly convex that there’s immense value generation in teaching. Not only should that not be ignored, but it’s going to become a critical component of the working culture. Companies that want loyalty are going to have to start teaching people again. Self-executives don’t work hard unless they believe they’re learning more on the work given to them than they would on their own– and these people tend to be fiercely autodidactic.
This brings us to the old quip. A VP tells his CEO that the company should invest more in its people, and the CEO says, “What if we spend all that money training them and they leave?” The VP’s response: “What happens if we don’t and they stay?” What happens, over time, is the MacLeod rank culture. There’s a lot to be learned from guild culture, and when I finally Solve This Fucking Thing (Part 11? 12? 5764+23i?) I won’t be able to afford to overlook it.