What the $!+@.* is REGEX?

A Regex History of Modern Computing

“Perl Problems” | Credit: XKCD

REGEX, a shortening of “regular expressions,” is a syntax for defining elaborate search terms. Rather than matching characters one-to-one, a REGEX describes a pattern, and a regex engine finds the sequences of characters in a string of text that fit that pattern. REGEX has powerful applications in text search, including find and find-and-replace operations, and in input validation.
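To make those three uses concrete, here is a minimal sketch using Python's standard `re` module. The sample sentence, the patterns, and the `is_year` helper are all made up for illustration; they are not taken from any particular tool discussed in this post.

```python
import re

# Sample text invented for this illustration.
text = "Kleene published in 1956; Spencer released his library in 1986."

# Search: \d{4} matches four consecutive digits.
years = re.findall(r"\d{4}", text)

# Find-and-replace: wrap each four-digit match in square brackets.
bracketed = re.sub(r"(\d{4})", r"[\1]", text)

# Input validation: fullmatch succeeds only if the WHOLE string fits the pattern.
# is_year is a hypothetical helper that accepts years from 1900 to 2099.
def is_year(candidate: str) -> bool:
    return re.fullmatch(r"(19|20)\d{2}", candidate) is not None

print(years)
print(bracketed)
print(is_year("1956"), is_year("56"))
```

The same pattern language drives all three operations; only the function wrapped around the pattern changes.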

The history of REGEX begins in the 1950s, near the advent of modern computing. In fact, the history of REGEX percolates throughout the history of modern computing, popping up time and again at many of its key moments.

This blog traces that history — from its heady origins at Princeton to adoption worldwide — by way of three critical players: Stephen Kleene, who, in the 1950s, invented REGEX; Ken Thompson, who, in the 1970s, promoted REGEX via the UNIX operating system; and Henry Spencer, who, in the 1980s, unified REGEX variations into the modern form we know today.

While this blog will not help fix your (or my) regex problems, at least now we know who to blame.

Stephen Kleene: The Inventor

Stephen Kleene (of the eponymous Kleene star) studied at Princeton in the early 1930s under the estimable mathematician Alonzo Church (known for the Church-Turing thesis). Returning to Princeton in 1939 as a visiting scholar at the Institute for Advanced Study, Kleene developed recursion theory alongside Church, Turing, and the incendiary logician Kurt Gödel.

In a 1956 paper entitled Representation of Events in Nerve Nets and Finite Automata, Kleene introduced the idea of “regular expressions” as a notation equivalent to finite automata for defining regular languages. Finite automata, also known as finite-state machines, are computational models with a fixed, finite number of states and no memory beyond the state they are currently in; regular languages are exactly the formal languages that such machines can recognize. This equivalency is known as Kleene’s theorem.
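As a rough illustration of that equivalence, the sketch below takes one small regular expression, `(ab)*` (the Kleene star applied to "ab"), and a hand-built two-state automaton for the same language, then checks that they agree on every string over the letters a and b up to length six. The transition table and function names are invented for this example, and the agreement check is a spot test, not a proof of the theorem. Note also that Python's `re` module is a backtracking matcher rather than a literal finite automaton; it is used here only as a reference for which strings the expression accepts.

```python
import re
from itertools import product

# The regular expression: zero or more repetitions of "ab".
PATTERN = re.compile(r"(ab)*")

# A hypothetical hand-built automaton for the same language.
# State 0: start state, and accepting (only complete "ab" pairs seen so far).
# State 1: an "a" has been read and a "b" is expected next.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 0}
ACCEPTING = {0}

def dfa_accepts(s: str) -> bool:
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # missing transition acts as an implicit dead state
            return False
    return state in ACCEPTING

def regex_accepts(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None

# Every string over {a, b} of length 0 through 6.
all_strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
agree = all(dfa_accepts(s) == regex_accepts(s) for s in all_strings)
print(agree)
```

The automaton never needs to remember how many "ab" pairs it has seen, only which of its two states it is in, which is the sense in which finite automata have no memory beyond their current state.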

Ken Thompson: The Popularizer

Ken Thompson studied at Berkeley in the early 1960s under the mathematician and game theorist Elwyn Berlekamp.

In 1966, Thompson began working for Bell Labs on Multics (for Multiplexed Information and Computing Service), a precursor to Unix, which Thompson, alongside Dennis Ritchie (creator of the C programming language), would begin developing in 1969 and release in its first edition in 1971. In a truly fateful turn of events, Thompson actually developed Unix in order to continue playing a Multics-based game, Space Travel, after Bell Labs abandoned the Multics project (the name “Uni-x” is itself a riff on “Multi-cs”).

Thompson’s developments for Unix include such computing mainstays as the command line interface, processes, and hierarchical file systems. In addition, Thompson updated a contemporaneous text editor, QED (for “Quick Editor”), to incorporate regular expressions. QED then formed the basis for ‘ed’, Thompson’s standard text editor in Unix, disseminating REGEX across the burgeoning computer industry.

Henry Spencer: The Unifier

Henry Spencer worked as a UNIX systems programmer while studying at the University of Saskatchewan and, later, the University of Toronto. There, in 1981, Spencer launched the first Usenet site outside the United States (a precursor to the World Wide Web, Usenet is one of the oldest digital communication networks still seeing widespread use). In 1983, while still at Toronto, Spencer became a founding member of the Canadian Space Society.

By the 1980s, computer systems and software libraries were adopting REGEX en masse, and incompatibilities between different REGEX versions began to arise. In 1986, Spencer developed a non-proprietary REGEX library that he released to near-universal adoption (Perl, Tcl, MySQL, PostgreSQL, and C++ all implemented Spencer’s library), and it became the standard REGEX much as we know it today.

“When you measure include the measurer” –MC Hammer