Tuesday, June 20, 2017

Every once in a while, imagine starting over

Here’s a suggestion for when you’re working on a long, complicated project. Every once in a while, work through the thought experiment of starting your project over. Re-examine every aspect on a fundamental level, relating the complicated abstractions and mental shortcuts to your basic guiding principles. Given everything you learned in the course of the project so far, would you do it the same way again if you started from the very beginning?

This thought experiment could have many different outcomes. When you started the project, you guessed at the right direction, but the benefit of hindsight may suggest a better way. Parts of the project may be kept around only because of the sunk cost fallacy. Maybe you’ve learned more about what the requirements really are. Maybe it’s time to kill this project and start a new one. Maybe the project is too long or too complicated and should be broken down into several smaller projects. Maybe your plans for the rest of the project need to change. Or maybe you can have more complete confidence in your current approach. Most likely you will find at least a few small things to tweak.

Let me give you a hypothetical example. Suppose that you have a software system with two threads: one that produces data objects and one that consumes them. You do extensive performance profiling and find that both threads run too slowly, so you spend some time speeding up both threads. Eventually, you speed up the consumer thread until it only takes 12 CPU instructions per incoming data object. If you were under time pressure to deliver a critical patch, then you’d be happy with your optimizations and you would ship what you had. But if you thought through what you would do if you were to start over, then you would probably decide that using a consumer-producer multithreading architecture is the wrong strategy entirely because a single thread is good enough.

Around the start of 2013, when I was working on my PhD, I was wandering in circles around a particular problem: I wanted to discover the conditions required for a pair of Q-learning agents to learn to take turns in a simulated context. I parameterized the agents’ reward functions and tried to discover a pattern that would compactly describe how to incentivize turn taking. I went down a few blind alleys: genetic algorithms, linear decompositions and staring blankly at a simple visualization of the problem. Eventually, I realized that game theory is the correct approach for analyzing this kind of multi-agent scenario. At that point, I revised the entire plan for my PhD so that the next part was focused almost entirely on game theory.

Most of your time must go to detail work, so when is it time to step back and look at the big picture? Unfortunately, you probably won’t reach an epiphany moment where you know it’s time to rethink the project, like I did with my PhD. No tree will scream “this is the wrong part of the forest.” At the start of this essay, I suggested that you imagine starting over “every once in a while,” but a definite schedule may be helpful. How about 1 January each year? Or the first day of each month? What projects are you working on now? Do you have one where you’ve focused exclusively on details for a long time? Maybe now is the time to imagine starting that project over.

Tuesday, April 25, 2017

The Anti-Jefferson Bible

"The Life and Morals of Jesus of Nazareth" by Thomas Jefferson, also known as the "Jefferson Bible," is a cut-and-paste work derived from the New Testament gospels, where the life and teaching of Jesus is (sort of) preserved but the miracles are taken out. In this essay, I consider how the Jefferson Bible could be compressed using the canonical gospels as a codebook, and I present a new sacrilegious work, jointly derived from the canonical gospels and the Jefferson one: The Anti-Jefferson Bible, a satirical book with all the gospel verses that Jefferson excluded and only those verses.
Painting of Thomas Jefferson by Rembrandt Peale
I first encountered the Jefferson Bible when my grandpa sent me a print copy of the book, along with a few other pseudo-Christian humanist titles. I found the Jefferson Bible rather boring, because my familiarity with the canonical gospels caused the book to have a low per-page entropy. That is, I already knew the story; the only interesting part was seeing what Jefferson chose to leave out. But just how much information is there in the Jefferson Bible?

The Jefferson Bible is a redaction of the King James Bible, taking verses exclusively from the four gospels: Matthew, Mark, Luke and John. These books are in the public domain and you can download them from Wikisource.
When I transmit a message, I can compress that message more if the recipient of my message already shares some information with me. For example, if I know that my grandpa has a copy of WinZip, then I could send him a file that uses the zip compression format, which uses fewer bits. Frequently, when we encode something, we just assume that the other end knows the format because we chose ubiquitous file formats: we have pre-shared knowledge of the encoding. I will estimate an upper bound on how much information Jefferson contributed to the world based on the information required to encode the Jefferson Bible, given the King James Bible as pre-shared knowledge. Copies of the King James Bible are much more common than copies of the Jefferson Bible, which half-justifies this approach.
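To make the idea of pre-shared knowledge concrete, here is a minimal sketch using the preset-dictionary feature of Python's standard zlib module. It is not the encoding used in this essay; the helper names and the placeholder texts are assumptions chosen for illustration. The sender and the receiver both hold `shared`, so a message that repeats part of it compresses to far fewer bytes.

```python
import zlib

def compress(message: bytes, shared: bytes = b"") -> bytes:
    """Deflate `message`, optionally priming the compressor with shared text."""
    if shared:
        compressor = zlib.compressobj(level=9, zdict=shared)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(message) + compressor.flush()

def decompress(blob: bytes, shared: bytes = b"") -> bytes:
    """Inverse of compress(); the receiver must supply the same shared text."""
    if shared:
        decompressor = zlib.decompressobj(zdict=shared)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()

# Placeholder "codebook" that both parties are assumed to hold already.
shared = (b"the quick brown fox jumps over the lazy dog; "
          b"pack my box with five dozen liquor jugs; "
          b"how vexingly quick daft zebras jump")
# The message is an excerpt of the shared text, like a redaction of a known book.
message = b"pack my box with five dozen liquor jugs; how vexingly quick daft zebras jump"

plain_size = len(compress(message))
primed_size = len(compress(message, shared))
print(plain_size, primed_size)
```

The primed output is smaller because the compressor can point back into the shared text instead of spelling the message out.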

Matthew, Mark, Luke and John comprise 3778 verses in the King James Bible. For each verse that Jefferson copied in whole, I will consider that he has added a full 12 bits of information: enough to choose a verse in the canonical gospels. The Jefferson Bible includes some verses only in part; I will encode those verses as straight ASCII, at 7 bits per character. The Jefferson Bible has 1028 verses: 450 from Matthew, 94 from Mark, 338 from Luke and 146 from John. Actually, some verses in the Bible are exact copies of other verses, so alternative counts are possible (see note 1). Thirty-eight verses are partial matches. With this encoding the Jefferson Bible requires 29520 bits or 3.6 kilobytes. If we encode all 116037 characters with ASCII, then we require 812259 bits or 99 kilobytes. My scheme achieves a 3.6% compression ratio. All this is to say: Thomas Jefferson did not add much information to the corpus of human knowledge in his humanistic redaction!
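The arithmetic in this paragraph can be checked with a short script. The verse counts, character count and bit totals are the figures quoted above. The split into whole and partial verses is an assumption: it treats the 38 partial matches as part of the 1028 verses and infers how many characters they must contain from the reported total.

```python
import math

GOSPEL_VERSES = 3778      # verses in Matthew, Mark, Luke and John (KJV)
JEFFERSON_VERSES = 1028   # verses in the Jefferson Bible
PARTIAL_VERSES = 38       # verses included only in part
TOTAL_CHARS = 116037      # characters in the Jefferson Bible
REPORTED_BITS = 29520     # size reported for the verse-index encoding

# Bits needed to pick one verse out of the canonical gospels.
bits_per_verse = math.ceil(math.log2(GOSPEL_VERSES))

# Assumption: every verse that is not a partial match costs one verse index.
whole_verse_bits = (JEFFERSON_VERSES - PARTIAL_VERSES) * bits_per_verse

# Inferred: whatever remains of the reported total is 7-bit ASCII text.
partial_chars = (REPORTED_BITS - whole_verse_bits) / 7

ascii_bits = TOTAL_CHARS * 7
ratio = REPORTED_BITS / ascii_bits

print(f"{bits_per_verse} bits per verse index")
print(f"{whole_verse_bits} bits for whole verses")
print(f"{partial_chars:.0f} characters implied for partial verses")
print(f"{REPORTED_BITS / 8 / 1024:.1f} KiB encoded, {ascii_bits / 8 / 1024:.0f} KiB as ASCII")
print(f"compression ratio {ratio:.1%}")
```

Under that assumption the numbers are self-consistent: the leftover bits divide evenly by 7. Note that a real decoder would also need some way to tell a verse index from a run of ASCII text, which this count ignores.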

The figure below graphs the relationship between verses in the Jefferson Bible and where those verses are found in the canonical Gospels. Observe that much of the graph gently slopes upward: most verses in the Jefferson Bible are followed by their successor in the canonical Gospel. I could have compressed the Jefferson Bible even further if I had chosen a cleverer encoding scheme that made use of this knowledge!
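One way such a scheme might look is run-length encoding: store the index of the first verse in each consecutive run plus the run's length, instead of one index per verse. The sketch below is only an illustration. The function names, the fixed-width length field and the toy index list are assumptions, not data from the Jefferson Bible.

```python
import math

def to_runs(indices):
    """Collapse consecutive verse indices into (start, length) pairs."""
    runs = []
    for index in indices:
        if runs and index == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extends the current run
        else:
            runs.append((index, 1))          # starts a new run
    return runs

def run_encoded_bits(runs, total_verses=3778):
    """Cost of the runs with a fixed-width start index and length field."""
    index_bits = math.ceil(math.log2(total_verses))
    longest = max(length for _, length in runs)
    length_bits = max(1, math.ceil(math.log2(longest + 1)))
    return len(runs) * (index_bits + length_bits)

# Toy selection: three consecutive verses, two more, then one out of order.
toy_indices = [5, 6, 7, 20, 21, 3]
toy_runs = to_runs(toy_indices)
print(toy_runs)
print(run_encoded_bits(toy_runs), "bits versus", len(toy_indices) * 12, "bits")
```

The saving grows with the average run length, so it depends entirely on how often Jefferson kept consecutive verses together. A decoder would also need to know the width of the length field, which this sketch leaves out.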


Wikipedia tells us that "Jefferson's condensed composition is especially notable for its exclusion of all miracles by Jesus and most mentions of the supernatural, including sections of the four gospels that contain the Resurrection and most other miracles, and passages that portray Jesus as divine." In an attempt to quantify Jefferson’s tendencies, I computed the relative frequency of each word, both in the canonical gospels and in the Jefferson Bible. Some words are much more likely to occur in the Jefferson Bible than in the canonical gospels, while others are much less likely to occur in the Jefferson Bible. Below is a table listing 140 words:
  • the 60 words that have the greatest increase in frequency in the Jefferson Bible (relative to the canonical Gospels),
  • the 20 words that occur with almost equal frequency in both, and
  • the 60 words that have the greatest decrease in frequency in the Jefferson Bible (relative to the canonical Gospels).

More likely in Jefferson Bible   | Almost equal | More likely in canonical gospels
ye         or         would      | little       | prophet    name    the
for        servant    likewise   | our          | know       word    of
one        you        good       | commandment  | multitude  up      was
his        sheep      so         | cup          | believed   now     they
not        went       other      | seeth        | healed     in      son
your       shalt      more       | feeding      | her        hath    and
thy        Pharisees  thine      | candlestick  | christ     saying  me
be         evil       eye        | vipers       | devil      out     he
thou       are        talents    | murder       | disciples  spirit  Jesus
to         into       put        | powers       | sent       people  God
thee       then       first      | doors        | might      come    were
a          say        judge      | Alphaeus     | seen       saw     we
give       house      hypocrites | forsook      | behold     ship    that
therefore  servants   head       | tempted      | fulfilled  devils  father
all        neither    pray       | fields       | written    from    believe
if         feet       on         | prophesy     | my         sea     see
will       certain    scribes    | beheaded     | which      while   them
also       nor        faithful   | trust        | eyes       begat   John
have       called     but        | physician    | many       said    saith
an         two        himself    | forsaken     | I          speak   world

We see that ‘believed,’ ‘fulfilled,’ and ‘healed’ are more likely in the canonical Gospels, while most of the words that are more likely in the Jefferson Bible are plain and devoid of spirituality-related meaning, like ‘one,’ ‘thy,’ and ‘give.’ Interestingly, ‘prophet’ is more likely in the canonical gospels, but ‘prophesy’ occurs with roughly equal frequency in both.
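A comparison like the one in the table could be sketched as follows. This is a guess at the method, not a record of it: the essay does not say how the frequency change was measured, so the sketch assumes a simple difference of relative frequencies, lowercases every word, and runs on two placeholder strings rather than the real texts.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all word occurrences in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def frequency_shift(subset_text, full_text):
    """Sort words by (frequency in subset) minus (frequency in full text)."""
    subset = relative_frequencies(subset_text)
    full = relative_frequencies(full_text)
    vocabulary = set(subset) | set(full)
    shifts = [(subset.get(w, 0.0) - full.get(w, 0.0), w) for w in vocabulary]
    return sorted(shifts, reverse=True)

# Placeholder texts standing in for the canonical gospels and the redaction.
full_text = "jesus healed the sick and said give to the poor"
subset_text = "jesus said give to the poor"

shifts = frequency_shift(subset_text, full_text)
print("most over-represented in the subset:", shifts[0])
print("most under-represented in the subset:", shifts[-1])
```

The head of the sorted list corresponds to the left-hand columns of the table, the tail to the right-hand columns, and the entries nearest zero to the middle column. Using a ratio instead of a difference would favour rare words over common ones.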

We have a window to Jefferson's heart in what he left out. Presumably, he could have written a short essay describing how the gospels are useful for learning morals but that the miracles can’t be trusted. Instead we have the Jefferson Bible, which, in some sense, is a polemic against the supernatural in Christianity that pretends to replace the gospels. Regardless of whether or not Jefferson is right to reject the supernatural on a factual level, his approach is ridiculous. To satirize the Jefferson Bible, I produced the “Anti-Jefferson Bible”, which includes all the verses in the canonical gospels that are not present in the Jefferson Bible. Where the Jefferson Bible included part of a verse, the Anti-Jefferson Bible includes the remainder, even when that remainder is not a complete sentence.
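The construction amounts to taking a complement, verse by verse. Here is a minimal sketch under stated assumptions: the function name, the reference labels and the placeholder verse texts are all hypothetical, and a partially used verse is assumed to appear as one contiguous, verbatim fragment.

```python
def anti_selection(canonical, selected):
    """Return what `selected` leaves out of `canonical`, in canonical order.

    canonical: dict mapping verse reference -> full verse text
    selected:  dict mapping verse reference -> the text that was kept
    """
    remainder = {}
    for reference, text in canonical.items():
        kept = selected.get(reference)
        if kept is None:
            remainder[reference] = text          # verse was excluded entirely
        elif kept != text:
            # Verse was used in part: keep only what was cut away.
            leftover = text.replace(kept, "", 1).strip()
            if leftover:
                remainder[reference] = leftover
    return remainder

# Placeholder verses, not quotations from any Bible.
canonical = {
    "X 1:1": "first verse",
    "X 1:2": "second verse, with a miracle",
    "X 1:3": "third verse",
}
selected = {
    "X 1:1": "first verse",
    "X 1:2": "second verse,",
}

print(anti_selection(canonical, selected))
```

If the kept fragment does not occur verbatim in the verse, this sketch falls back to emitting the whole verse, which is one place where real data would need more careful alignment.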

From a technical literary perspective, the Jefferson Bible is a gospel harmony. An early gospel harmony was Tatian’s Diatessaron; modern attempts include various Bibles that are reordered ‘chronologically.’ Jefferson’s work is most akin in form to that of the heretic Marcion, who introduced his own canon list, which included a version of Luke’s gospel that was edited to fit his gnostic theology. There are no new heresies! The Anti-Jefferson Bible is not a thoughtful gospel harmony, and given copies of the Jefferson Bible and the King James Bible, the entropy of the Anti-Jefferson Bible is tiny indeed. I make no claim that the Anti-Jefferson Bible has novel information. My work’s highest possible hope is to highlight the absurdity of Jefferson’s redaction.

Notes

1. Because the gospels have redundancy, we can compress them, too. Matthew and Luke seem to draw heavily from Mark, as well as from a hypothesized source, Q. I’m curious as to the entropy of the gospels themselves, which would be a useful point of reference here.