Computing From the Middle Out, Part 1: Why Turing Machines Matter


While you’re here: my novel, Farisa’s Crossing, will come out on April 26, 2019.

Computers have an undeserved reputation for being unpredictable, complicated beasts. I’m going to argue that, to the contrary, they’re quite simple at their core. In order to establish this, I’ll work through some models of computation, as well as some programming models that correspond well to real-world computation (with indications of where they don’t).

There’s a lot of complexity in real-world computing. Some of it’s desirable and some of it’s not. For example, today’s cell phones, laptops, and servers use electronic circuitry far more complex than, say, a Turing machine. That isn’t a problem because the payoff is immense and the cost to the user is minimal. If the complicated adder or multiplier is a thousand times faster, most people are happy to have it that way. So, even though real-world integrated circuits are complicated in ways we won’t even begin to discuss here, it’s not a problem. Doing simple things, better, is a worthy expense of complexity.

On the other hand, bloated buggy software ruins lives– this problem is largely preventable, but unlikely to improve because of conditions in the software industry (e.g., a culture that encourages piss-poor management) that are beyond the scope of the analysis here. If ever there were a machine for producing unusable crapware, it would be the American corporation. But again, that’s a topic for another time.

I’d prefer to motivate the claim that computers can be simple. They can be.

What Is Computation?

Computability theory is quite deep, but there’s a relatively simple, rule-based definition of what it means for a (partial) function to be mathematically computable. Our domain here is functions N^n → N; that is, from lists of natural numbers to natural numbers.

  • The n-ary zero functions z1(x) = 0, z2(x, y) = 0, … , are computable for all n.
  • The successor function s(x) = x + 1 is computable.
  • For any n and k ≤ n, the projection function pn,k(x1, … , xn) = xk is computable.
    • p1,1(x) = x, the identity function, and p2,1(x, y) = x, p2,2(x, y) = y are the most-used examples.
  • Composition: compositions of computable functions are computable.
    • For example, h(x, y) = f(g1(x, y), g2(x, y), g3(x, y), g4(x, y)) is computable if f and all the gi are.
    • This means that a computable function can use as many computable functions as it wants as subroutines.
  • Primitive Recursion: if g and h are computable, then so is f, defined like so:
    • f(0, x1, … , xn) = g(x1, … , xn), and
    • f(n + 1, x1, … , xn) = h(n, f(n, x1, … , xn), x1, … , xn);
    • this is the recursive analogue of a for loop; the number of calls is bounded.
  • Search (a.k.a. General Recursion): if f is computable, then so is mf, defined as:
    • mf(x1, … , xn) = k, where k is the least integer such that f(k, x1, … , xn) = 0.
    • We say mf(x1, … , xn) ↑ (pronounced “diverges”) if there is no such k. The function is not defined at that point.
    • this is analogous to a while loop. If the function diverges, an implementation would not terminate– unless the programmer could predict the divergence in advance and guard against it, but that is not always possible.

Functions that don’t use search are called primitive recursive. Those are total– they have values for all inputs, and more importantly, these values can be computed in a finite number of steps. If one uses general recursion, though, all bets are off. The function may not be defined for some inputs.
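
To make the last two rules concrete, here’s a minimal Python sketch (my own illustration, not part of the formal definition): a primitive_recursion combinator that builds f from g and h, and a mu combinator that performs the unbounded search.

```python
def primitive_recursion(g, h):
    """Build f from g and h per the schema above:
    f(0, *xs) = g(*xs); f(n + 1, *xs) = h(n, f(n, *xs), *xs)."""
    def f(n, *xs):
        acc = g(*xs)
        for i in range(n):          # a bounded loop: at most n calls to h
            acc = h(i, acc, *xs)
        return acc
    return f

def mu(f):
    """Unbounded search: return the least k with f(k, *xs) == 0.
    Loops forever (diverges) if no such k exists."""
    def searcher(*xs):
        k = 0
        while f(k, *xs) != 0:
            k += 1
        return k
    return searcher
```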

For example, addition is primitive recursive. It’s defined like so:

add(0, x) = x

add(n + 1, x) = s(add(n, x))

In the language above, g(x) = x and h(n, a, x) = s(a).

Multiplication is a primitive recursion using addition rather than the successor function. One can also show that limited subtraction, sub(x, y) = max(x - y, 0), is primitive recursive.
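
Continuing the Python sketch from above (the combinator names are mine), addition, multiplication, and limited subtraction come out as:

```python
def s(x):
    return x + 1          # successor: the only arithmetic primitive we allow

# add(n, x): g(x) = x, h(n, a, x) = s(a)
add = primitive_recursion(lambda x: x, lambda n, a, x: s(a))

# mult(n, x): g(x) = 0, h(n, a, x) = add(a, x), i.e. recursion on add rather than on s
mult = primitive_recursion(lambda x: 0, lambda n, a, x: add(a, x))

# pred(n): predecessor, with pred(0) = 0
pred = primitive_recursion(lambda: 0, lambda n, a: n)

# sub(n, x) = max(x - n, 0): apply pred n times
sub = primitive_recursion(lambda x: x, lambda n, a, x: pred(a))

assert add(3, 4) == 7
assert mult(3, 4) == 12
assert sub(2, 5) == 3    # 5 - 2
assert sub(7, 5) == 0    # bottoms out at zero
```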

Furthermore, any bounded search problem is primitive recursive. If you have an upper bound on how far you’re willing to search, you can use a primitive recursive function.

Sometimes, it’s a judgment call how one wants to implement it.

For example, the division function can be represented as:

div(n, d) is the first q such that q * d ≤ n < (q + 1) * d.

We perform an unbounded search for such a q; when d = 0, this diverges. However, in this case we know when the function’s badly behaved and can rectify it:

idiv(n, d) is 1 + div(n, d) if d > 0, and 0 if d = 0.

It returns a positive integer on success– a successful return of 0 becomes a 1– and a 0 on failure. The enclosing routine can decide how to handle the error case.
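
In the Python sketch, using the mu combinator from earlier (again, the names and the exact search predicate are my own illustrative choices):

```python
# div(n, d): the least q with n < (q + 1) * d.
# When d == 0 the search never succeeds, so div diverges there.
div = mu(lambda q, n, d: 0 if n < (q + 1) * d else 1)

# idiv guards the bad case: 1 + quotient on success, 0 on failure.
def idiv(n, d):
    return 1 + div(n, d) if d > 0 else 0

assert div(17, 5) == 3
assert idiv(17, 5) == 4   # quotient 3, shifted up by one
assert idiv(17, 0) == 0   # the error case, handled without diverging
```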

Divisibility checks (nothing but 0 is divisible by 0) and primality are primitive recursive, and therefore total and computable in finite time. Most importantly, prime factorization is primitive recursive. This is something we’ll come back to.

Turing Machines

Most people have heard of Turing machines, but unless they have taken a course in graduate-level logic or the theory of computation, they’ve probably never worked with one– and may not know what it is.

They have the reputation of being complicated beasts. They’re brain-dead simple, actually; doing anything with them is the part that can be painful. The ones that we inspect and analyze as computers tend to have massive state spaces– which may or may not be a problem– while the most aggressively minimalistic ones (I won’t prove it, but there are machines with under 20 states and two symbols that can compute any computable function) tend to be inscrutable in practice.

Formally, an (n, s) Turing machine is a device that:

  • recognizes a pre-programmed alphabet of n ≥ 2 symbols. That set could be {0, 1}, or {A, B, C}, or the 100,000 most common English language words. One of these symbols is blank.
  • is in one of s distinct internal states, including one called Start and one called Halt. This set must be finite and is pre-programmed into the machine.
  • has n * (s – 1) pre-programmed rules, written as (s_old, a_in, s_new, a_out, ±1), one for each (s_old, a_in) pair except for those where s_old = Halt.
  • reads and writes to a tape– each cell holding exactly one symbol– that never runs out in either direction.

And here is how it works:

  • Input: a finite number of cells may be set to any non-blank values. (The rest of the tape is all blank, in both directions.)
  • Initialization: the machine is put in state Start.
  • Runtime: Over and over, the machine does the same thing:
    • read the symbol (a_in) at the cell where the machine is, and consult its internal state (s_old);
    • fetch the matching rule (s_old, a_in, s_new, a_out, ±1);
    • write a_out to the tape, and transition to state s_new;
    • move right if the matching rule’s last column had a +1; left, if -1;
    • repeat this cycle unless s_new is Halt, in which case the machine terminates. Whatever is on the tape is the program’s output.
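
As a sketch of how mechanical this cycle is, here it is in Python. The dictionary representation of the ruleset and tape, the max_steps safety valve, and the toy ruleset are my own illustrative choices; the point is that the control loop is only a few lines:

```python
BLANK = '0'

def run(rules, tape, max_steps=None):
    """rules: dict mapping (state, symbol) -> (new_state, new_symbol, move),
    with move in {+1, -1}. tape: dict from cell index to symbol (sparse;
    every unlisted cell is blank). Runs until the machine enters Halt."""
    state, head, steps = 'Start', 0, 0
    while state != 'Halt':
        if max_steps is not None and steps >= max_steps:
            raise RuntimeError("gave up; the machine may never halt")
        symbol = tape.get(head, BLANK)                          # read
        new_state, new_symbol, move = rules[(state, symbol)]    # fetch
        tape[head] = new_symbol                                 # write
        state, head, steps = new_state, head + move, steps + 1  # transition, move
    return tape

# A toy 2-symbol machine: scan right over a block of 1s and append one more 1.
rules = {
    ('Start', '1'): ('Start', '1', +1),   # skip existing 1s
    ('Start', '0'): ('Halt',  '1', +1),   # write a 1 on the first blank, then halt
}
final = run(rules, {0: '1', 1: '1', 2: '1'})                    # unary 3 in ...
assert sum(1 for v in final.values() if v == '1') == 4          # ... unary 4 out
```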

What happens if the Turing machine never goes into the Halt state? It runs forever. This is generally considered undesirable. The computation doesn’t complete.

This is probably the biggest disconnect between Turing machines and the computers we actually use. Turing machines are supposed to halt. If one doesn’t, that’s considered pathological; its work isn’t done and as far as we’re concerned, it hasn’t computed anything. Meanwhile, the cell phones and laptops we use on a daily basis run in an infinite loop and that’s what we expect them to do. We expect them to be available (and I’ll formalize that much later, but not in this installment) but they never halt.

A Turing machine is all-or-nothing. Its job is to compute one function and then indicate that it’s done by going into the Halt state. For contrast, a real-world computer, at the minimum, has to respond to real-world inputs like the user’s keystrokes, its own temperature sensors (so it doesn’t run too hot), and power supply disruptions. Later on, I’ll show how to close this gap.

What’s neat about Turing machines is that, in principle, one could have been built in the late 19th century. (My work on Farisa has had me on a steampunk kick.) We were close: we had programmable looms, player pianos, and electricity. We had record players and magnetic storage. Today, a Turing machine good enough to emulate a 1980s video game console could be built with about $100 of commodity electronics. Rather than get into the details– it’s not my expertise– I’ll point the reader to Ben Eater’s excellent series of videos on the 8-bit computer he built on a breadboard. As he’s building an actual circuit, his model gives a much better representation of what computers actually do, in the physical world, than do Turing machines.

Anyway, an automaton is only as good as its ruleset. Most rulesets will have the machine pinging about at random– sound and fury, signifying nothing. A few, though, do useful things. A Turing machine can add two numbers supplied on the tape, whether they’re specified in binary or decimal. These machines can multiply, or check regular expressions, or… well, literally anything computable. In fact, that’s one definition of what it means for something to be computable– such definitions are legion, and they’re all equivalent.

It’s counterintuitive to most people, but the slowest computers from the 1960s can do anything a modern machine can– they would merely take longer. In terms of what computers can do, nothing has changed. If we allow computers to generate probabilistic bits, then even quantum computing does not add capabilities– quantum computers are merely faster.

From a practical perspective, computers and programming languages are not remotely equivalent. In theory, they are.

Now, Turing machines would be nearly useless as a real-world concept, say, if they required 2^(2^10,000) states in order to do useful computation. It would be annoying if there were computations that couldn’t be done with fewer states, because we have no way to store that much information. In fact, one can find fairly small n and s, and specific rulesets, that can emulate any Turing machine (any size, any ruleset) on any input at all. These are called universal Turing machines. I’m not going to go through the details of building one and proving it universal, but I’ll walk through the basic concepts, along two different paths.

We are not concerned with how efficiently the machines run– as long as they terminate, except on problems where no machine terminates. Real world computers are sufficiently different from Turing machines that the (heavy) performance implications here are irrelevant.

  • First, a Turing machine’s read-fetch-write-transition-move cycle is mechanical. We can implement it over all (n, s) Turing machines with a machine using s · f(s) states, where f is a slow-growing function. We include the ruleset we want as an input– a lookup table– and our machine implements the read-fetch-write-transition-move cycle against that table instead.
  • Operating on k-grams of symbols allows us to use an n-symbol Turing machine to emulate an n^k-symbol machine. We can in practice do any of this work with a 2-symbol machine.
  • An (n, s) Turing machine can emulate a Turing machine with a larger state space (say, s^2 states) by writing state information to the tape. The details of this are ugly, and the machine may take much longer, but it will emulate the more powerful machine– by which, I mean that it will come to the same conclusions and that it will halt if the emulated machine does.

This approach isn’t the most attractive, and it has a lot of technical details that I’m handwaving away, but using those techniques, we can emulate, say, all the (n^2, s^2) Turing machines using an (n · f(n, s), s · g(n, s)) machine, where f and g are asymptotically sub-linear (I believe, logarithmic) in their inputs. The result is that, for sufficiently large n and s, machines can be built that emulate all machines at some larger size– and, of course, a machine at that size can emulate an even larger one. The cost in efficiency may be extreme– one could be emulating the emulation of another emulator emulating another emulator… ad nauseam– but we don’t care about speed.
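
To make the k-gram trick from the list above concrete, here’s a small Python sketch of the symbol encoding alone (the bookkeeping, not the full emulation; the function names are mine):

```python
def to_kgram(symbol, n, k):
    """Write one symbol of an n**k-sized alphabet as k base-n digits:
    the k-gram that the smaller, n-symbol machine actually reads and writes."""
    digits = []
    for _ in range(k):
        digits.append(symbol % n)
        symbol //= n
    return digits[::-1]

def from_kgram(digits, n):
    """Recover the emulated symbol from its k-gram."""
    value = 0
    for d in digits:
        value = value * n + d
    return value

# With n = 2 and k = 8, a 256-symbol machine is emulated eight tape cells at a time.
assert to_kgram(173, 2, 8) == [1, 0, 1, 0, 1, 1, 0, 1]
assert from_kgram([1, 0, 1, 0, 1, 1, 0, 1], 2) == 173
```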

If that approach is unappealing, here’s a different one. It uses the symbols {0, 1, Z, R, E, +, <, _, ~, [, ], and ?}– in two colors: black and red; 1, Z, E, and R will never be red. This gives us 20 symbols. The blank symbol is the black 0.

Here’s a series of steps that, if one goes into enough detail (I’ll confess that I haven’t, and the machines involved are likely wholly impractical) can be used to construct a universal Turing machine.

Step 1: establish that copying and equality checking on strings of arbitrary length can be done by a specific, small Turing machine.

Step 2: use a symbol Z and put it between two regions of tape at (without loss of generality) tape position 0. Use it nowhere else. Use a symbol R to separate the right side of the tape into registers. These will hold numbers, e.g. R 1 0 1 R 1 0 0 0 1 R 0 R means that 5, 17, and 0 are in the registers. Resizing the registers is tedious (everything to the right must be resized, too) but it’s relatively straightforward for a Turing machine to do. There will be an E at the rightward edge of the data.

Step 3: The right side of the Z stores a stack of nonnegative integers: 1s and 0s (representing binary numbers) separated by register symbol R. The left side stores code, which consists of the symbols {0, +, <, _, ~, [, ], ?}. Only code symbols can be red.

  • A possible tape state is: E0+++++0+0+?0+++Z 101 R 1 R 0 R 1 E. (Spaces added for convenience.) The left region is code in a language (to be defined); the red zero indicates where in execution the program is; on the stack we have [5, 1, 0, 1] with TOS being the righthand 1.

Step 4: A Turing machine with a finite number of states can be an interpreter for StackMan, which is the following programming language:

  • At initialization, the stack is empty. The stack will only ever consist of nonnegative integers. We’ll write stack left-to-right with the top-of-stack (TOS) at the right.
  • 0 (“zero”) is an instruction (not a value!) that puts a 0 on top of the stack, e.g. ... X -> ... X 0.
  • + (“plus”) increments TOS, e.g. ... X 5 -> ... X 6.
  • _ (“drop”) pops TOS, e.g. ... X Y -> ... X.
  • ~ (“dupe”) duplicates TOS, e.g. ... X -> ... X X.
  • < (“rotate”) pops TOS, calls it n, and then rotates the top n elements left. This may be the most tedious to implement. Examples:
    • ... X Y 2 -> ... Y X
    • ... X Y Z 3 -> ... Y Z X
    • ... X Y Z W 4 -> ... Y Z W X
  • ? (“test”): if TOS is nonzero, it decrements TOS and then pushes a 1 on the stack; otherwise, it pushes a 0, e.g.:
    • ... 6 -> ... 5 1.
    • ... 0 -> ... 0 0.
  • This is a concatenative language, so instructions are executed in sequence one after the other. For example, +++ adds 3 to TOS, 0+++0+++ pushes two threes on it, _0 drops TOS and replaces it with a zero (constant function), and ?_?_?_ subtracts 3 from TOS (leaving a 0 if TOS < 3).
  • Code inside [] brackets is executed repeatedly while TOS is nonzero and skipped over once TOS is zero or if the stack is empty.
    • For example, 0+[] will loop forever because TOS is always 1.
    • The code [?_0++<+0++<]_ has behavior ... x y -> ... x + y. It’s an adder. For example, if the stack’s state is ... 6 2, it does the following:
      • The code in the brackets is executed. ? tests the 2, so we have 6 1 1, and we immediately drop the 1. The 0++< (“fish”) is a swap, so we have 1 6, and the + gives us 1 7. We do another 0++< and are back at 7 1.
      • The next cycle, we end up at 8 0; after that, TOS is zero so we exit our loop. With a _, we are left with ... 8.
  • Any instruction demanding more elements than are on the stack does nothing.

The interpreter for this language can be built on a Turing machine using a finite number of states. To keep track of the code pointer (i.e., one’s place in the stored program) while operating on the stack, color a symbol red. Make sure to color it black when you have moved on.
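
For concreteness, here’s a small StackMan interpreter in ordinary Python (my own sketch, running on ordinary hardware rather than on a Turing machine; the bracket handling and the treat-underflow-as-a-no-op rule are my reading of the description above):

```python
def run_stackman(code, stack=None):
    """Interpret a StackMan string. stack is a list with TOS at the end."""
    stack = [] if stack is None else stack
    match, opens = {}, []                 # pre-compute matching bracket positions
    for i, c in enumerate(code):
        if c == '[':
            opens.append(i)
        elif c == ']':
            j = opens.pop()
            match[i], match[j] = j, i
    pc = 0
    while pc < len(code):
        c = code[pc]
        if c == '0':                                  # zero: push a 0
            stack.append(0)
        elif c == '+' and stack:                      # plus: increment TOS
            stack[-1] += 1
        elif c == '_' and stack:                      # drop
            stack.pop()
        elif c == '~' and stack:                      # dupe
            stack.append(stack[-1])
        elif c == '?' and stack:                      # test
            if stack[-1] > 0:
                stack[-1] -= 1
                stack.append(1)
            else:
                stack.append(0)
        elif c == '<' and stack:                      # rotate
            n = stack[-1]
            if len(stack) - 1 >= n:                   # enough elements below TOS?
                stack.pop()
                if n > 0:
                    chunk = stack[-n:]
                    stack[-n:] = chunk[1:] + chunk[:1]   # rotate top n left by one
        elif c == '[' and (not stack or stack[-1] == 0):
            pc = match[pc]                            # skip the loop body
        elif c == ']' and stack and stack[-1] != 0:
            pc = match[pc]                            # loop again
        pc += 1
    return stack

# The adder from the text: ... x y -> ... (x + y).
assert run_stackman("[?_0++<+0++<]_", [6, 2]) == [8]
```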

Step 5: show that any primitive recursive function N^n → N can be computed as a fragment of StackMan, taking the arguments from the stack; e.g.,

  • f(x, y, z) = x + y * z could be implemented as a fragment with behavior ... x y z -> ... (x + y * z).

This isn’t hard. The zero functions and successor come for free (0, +) and the projection functions (data movement) can be built using _, ~, and <. Composition is merely concatenation– we get that for free by nature of the language. We can get primitive recursion from ? and principled use of [] blocks, and general recursion from arbitrary [] blocks.

Thus, a StackMan interpreter is a Turing machine that can compute any primitive recursive function.

Next, show that any computable function N^n → N can be computed as a fragment of StackMan that will terminate if the function is defined. (It may loop indefinitely where it is not.)

Step 6: since prime factorization is primitive recursive, we can go from lists of nonnegative integers to a single nonnegative integer, using multiplication (one way) and prime factorization (the other way): e.g. (1, 2, 0, 1) ↔ 2^1 * 3^2 * 5^0 * 7^1 = 126. This means that we can coalesce an entire list of numbers into a single number, and recover the list whenever we need it.
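
Here’s that encoding as a Python sketch (the helper names are mine; since trailing zero exponents vanish, the decoder is told how many entries to recover):

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (slow, but speed doesn't matter here)."""
    candidate = 2
    while True:
        if all(candidate % p for p in range(2, int(candidate ** 0.5) + 1)):
            yield candidate
        candidate += 1

def encode(xs):
    """(x1, ..., xn) -> 2**x1 * 3**x2 * ... * (nth prime)**xn."""
    result, gen = 1, primes()
    for x in xs:
        result *= next(gen) ** x
    return result

def decode(n, length):
    """Recover the exponent list by repeated division, i.e. prime factorization."""
    xs, gen = [], primes()
    for _ in range(length):
        p, e = next(gen), 0
        while n % p == 0:
            n //= p
            e += 1
        xs.append(e)
    return xs

assert encode([1, 2, 0, 1]) == 126          # 2^1 * 3^2 * 5^0 * 7^1
assert decode(126, 4) == [1, 2, 0, 1]
```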

Step 7: show that all (ruleset, state, tape) configurations can be encoded as a single integer. Then show that the Turing step (read-fetch-transition-write-move) and the halting check are both primitive recursive. These capabilities can be encoded as StackMan routines. (They’ll be obnoxiously inefficient but, again, we don’t care about speed here.)
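
A sketch of Step 7, reusing encode and decode from above: treat a configuration as a triple (state, left, right), where right holds the cell under the head in its least-significant base-n digit and left holds the tape to the left of the head (nearest cell least significant). Packing the triple, and in principle the ruleset too, into one integer uses the same prime-power trick; the state numbering (Start = 0, Halt = 1) and the binary alphabet are my own choices for the example:

```python
N = 2   # alphabet size for this example: symbols are 0 and 1

def step(config, rules):
    """One read-fetch-write-transition-move cycle on an integer-encoded configuration."""
    state, left, right = decode(config, 3)
    symbol = right % N                                       # read
    new_state, new_symbol, move = rules[(state, symbol)]     # fetch
    right = right - symbol + new_symbol                      # write
    if move == +1:
        left, right = left * N + (right % N), right // N     # head moves right
    else:
        left, right = left // N, right * N + (left % N)      # head moves left
    return encode([new_state, left, right])

def halted(config, halt_state=1):
    return decode(config, 3)[0] == halt_state

# The unary-successor machine from earlier, with Start = 0 and Halt = 1:
rules = {(0, 1): (0, 1, +1), (0, 0): (1, 1, +1)}
config = encode([0, 0, 0b111])         # state Start, empty left tape, "111" at and right of the head
while not halted(config):
    config = step(config, rules)
assert decode(config, 3)[1] == 0b1111  # the left span now reads "1111": unary 4
```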

Step 8: then, a Turing machine can be built with a finite number of states that:

  • takes a Turing machine ruleset, tape, and state configuration and translates it into a StackMan program that repeatedly checks whether the machine has halted and, if not, computes the next step. The read-fetch-transition-write-move cycle will be performed in bounded time. The only source of unbounded looping is that the emulated machine may not halt.
  • and, therefore, can write and run a StackMan program that will halt if and only if the emulated configuration also halts.

Neither of these approaches leads to a practical universal Turing machine. We don’t actually want to be doing number theory one increment (+, in StackMan) at a time. Though StackMan can perform sufficient number theory to emulate any machine or run any program– it is, after all, Turing complete– it is unlikely that the requisite programs would complete in a human life. But, in principle, this shows one way to construct a Turing machine that is provably universal.

Human Computation

This installment is part of what was a larger work. I’ve decided to put it out in pieces. I titled it, “Why Turing Machines Matter”, but I had to start with a bunch of stuff that most people would think doesn’t matter– a stack-based esoteric language, some number theory review, et cetera. I haven’t yet shown why this concept actually does matter. So, let me get on that, just briefly.

Mathematicians and logicians like Turing machines because they’re one of the simplest representations of all computers, and the state space and alphabet size don’t need to be unusually large to get a machine that can compute anything– although it might be slow. Alan Turing’s establishment of the first universal Turing machine led to John von Neumann’s architecture for the first actual computers.

Is it reasonable to assume that Turing machines perform all computations? Well, that’s one way that computability is defined, but it’s a bit cheap to fall back on a definition. It’s more accurate to look at the shortcomings of Turing machines and decide whether it’s reasonable to believe a computer can be built that overcomes them.

For example, some electronic devices are analog, and Turing machines don’t allow real-numbered inputs. Everything they do is in a finite world. But, in practice, machines can only differentiate a finite number of different states. There’s no such thing as a zero error bar. Not only that, but quantum mechanics suggests that this will always be the case. For example, there are an infinite number of colors in theory, but humans can only differentiate a few million under best-case circumstances, and we can only reliably name about a hundred. It’s the same for machines: measurements have error. Of course, an infinite state space isn’t allowable either: that would be analogous to infinite RAM.

So, those shortcomings of Turing machines apply to all computers that we know– including (in a different way) the quantum computers humans know how to build.

Turing machines, as theoretical objects, can’t do I/O. The input exists all at once on the tape, and output is produced– and until that output occurs, no computation has been completed. One alteration to account for this is to allow the Turing Machine an input register that other agents (e.g., keyboards, temperature sensors, the camera) can write to. When the computer is in a Ready state, it scans for input and reacts appropriately. If the machine reaches Ready within a finite time interval, that is analogous to successfully halting– the software itself may be broken, but the machine is doing its job.
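
Here’s a tiny Python sketch of that alteration (the structure and names are mine): events land in an input register, and each pass through the loop is an ordinary halting computation that returns the machine to Ready.

```python
from collections import deque

def reactive_machine(handle, events):
    """Each iteration reads one event from the input register, runs a halting
    sub-computation on it, and returns to Ready. Reaching Ready in finite time
    plays the role that halting plays for a classical Turing machine."""
    register = deque(events)
    outputs = []
    while register:                      # a real device would loop forever
        event = register.popleft()       # scan the input register
        outputs.append(handle(event))    # an ordinary, halting computation
        # the machine is now back in Ready, available for the next event
    return outputs

# Hypothetical example: each keystroke is acknowledged with its uppercase form.
assert reactive_machine(str.upper, ["a", "b", "c"]) == ["A", "B", "C"]
```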

In truth, modern computers are more accurately modeled as systems of interacting Turing-like machines than single machines– especially with all the multitasking they have to do to support users’ demands.

There is one thing Turing machines don’t do that we take for granted, although it’s a bit of a philosophical mess: random number generation. Turing machines don’t model it: everything they do is deterministic, and “random” is not a computable function (or a function at all). Real computers most often use pseudorandom number generators (PRNGs)– which are predictably (but ideally without pattern) “random”– and Turing machines can implement any of those. Truly random? Well, we don’t fully know what that is. We can get “random enough” with a PRNG or from some input that we expect to be uncorrelated to anything we care about (e.g. atmospheric noise, radioactive decay).

Turing machines give a poor model of performance as described here. To access data at cell 5,305, from cell 0, the machine has to go through every cell in between. That’s O(N) memory access, which is terrible. Luckily, real computers have O(1) memory access, right? That’s why it’s called random access memory, eh? Well, not quite. Caching is too much of a beast for me to take on here, but I would argue this far: a Turing machine with a 3-dimensional tape– I haven’t gotten into this, but a Turing machine can have any dimensionality and be computationally equivalent– is a more faithful model for performance. Why? Well, our best case for random access is O(N^(1/3)). We can call random access into a finite machine O(1), but that’s moving the goalposts. Asymptotic behavior is only about the infinite, and the real world is constrained by the speed of light. If we have a robot moving around a 3-dimensional cubic lattice where each cell is 100 microns on a side (no diagonal movement) and we want each round trip to complete in one nanosecond (30 cm), then we are limited to 125 trillion cells. Going up to 1 quadrillion would double our latency. Of course, we’re ignoring the absurdity of a robot zipping around at relativistic speeds.

Happily, most computers don’t have the moving part of a robotic tape head (although a traditional hard drive may be analogous). Rather than the computation going to the data (as in the model of a classical Turing machine), they instead bring the data to the chip. Electrical signals travel faster than a mechanical robot, as on a literal Turing machine, ever could (without catastrophic heat dissipation). So, in this way, modern computers and Turing machines are quite different.

If anything, I’d make a different claim altogether. Turing machines aren’t a perfect model of what computers do– although they’re good enough to explain what computers can (and can’t) do. They are, perhaps surprisingly, a great representation of what we do when we compute.

Before “a computer” was a machine, it was a person whose job was to perform rote operations– addition, subtraction, multiplication, division, elementary functions, and moving data around– which is, as it were, all today’s computers really do as well. And how does a human compute, say, 157,393 * 648,203? Most of us would have to reach for paper– a two-dimensional Turing tape– and start going through rote operations. To transliterate schoolbook multiplication to be done by a Turing machine is tedious but not hard– there are a couple thousand states.

The plodding Turing machine isn’t “about” computers. It’s about us, moving around a sheet of paper with a pencil and eraser, as we do– at least, when we know we’re computing. Most of what we do, we don’t think of as computation at all. We’re not even aware of computation happening.

It’s an open question whether there’s a non-computational element to human experience. I tend to be unusual– by the standards of, say, Silicon Valley, I’m downright mystical– and I think that there is. I can’t prove it, though. No one can.

The difference between intuition and computation is that the latter happens by rote, from a precisely-understood, finitely-describable state, following a series of rules that require no judgment. Intuition can’t be checked; computation can.

Most mathematicians use informal proofs– verbal arguments that convince intelligent, skeptical people that a conclusion is valid. This is a social rather than algorithmic process, and it is not devoid of error. Informal proofs can be unrolled into formal proofs from ZFC, it is generally believed, but it would typically be impractical to check. An informal proof is an argument (using other informal proofs) that a formal proof exists, and although the informal proof is imperfect– of course, 100-percent perfection in computation is not physically possible, either– it usually gives more insight into the mathematical structure than a formal one would.

Do humans have non-computational capabilities or elements to our existence? I believe so. But, in terms of what we can communicate to each other with proof– that is, checkable computation– we are limited to finite strings of finite symbols, an agreed-upon initial state, and a finite set of rules. At least in this life, that’s the best we can prove.

Next Up

In the next installment, I’m going to show how to build a Turing machine that’s practical.

Aggressively minimal universal Turing machines– with, say, only 10 states and 5 symbols– tend to be next-to-impossible to understand. I’m going to work with a large-ish state space and alphabet: 512 symbols and 2^48 possible states (even though we’ll only use about a million). Those numbers sound beastly, and to implement the Turing machine as a lookup table would require 1,884,160 terabytes. At such a size, storing the entire ruleset is cost-prohibitive. Most rulesets for those parameters are patternless and unmanageable, but a ruleset that we’d actually want to use is likely to be highly patterned– allowing rules to be computed on the fly. In fact, that’s what we’ll have to do.

In the second installment, we’ll build a Turing machine about as capable as a 1980s video game console (e.g. Atari, Nintendo) that’ll be much easier to program against. That’s up next.

