Thursday, November 22, 2012

My PhD dissertation summary


Today I submitted my PhD dissertation for examination. The title of my dissertation is: ‘Measuring and Influencing Sequential Joint Agent Behaviours.’

The essential thesis of my research is that:

Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. 

The thesis is demonstrated with research explaining how to measure a particular sequential joint behaviour (turn-taking), how to identify rewards that are conducive (or prohibitive) to turn-taking by learning agents in a simulated context, and how to design rewards that incentivise arbitrary sequential joint behaviours in multi-agent stochastic games.

Informally, the thesis is about activities performed together through time by a group of agents that figure out how to do things better as they go. An agent could be a person, a robot or a computer program. We mathematically explain how to get the overall outcomes we want by telling the agents what they should individually want. Because we do this mathematically, we need to measure the things we want our group of agents to do. This dissertation explains some new ideas about how we can measure how well a group of agents is taking turns, how we can predict whether or not pairs of a certain kind of robot-like computer program will take turns, and how we can tell individual agents what they should want so that, in some situations, they collectively end up doing something that we want.

My dissertation includes most of two journal papers that I published, plus other material that I’m planning to submit as another journal paper.

One of the things I studied was simulated agents communicating and learning from rewards.




Friday, November 16, 2012

Nash and Bourne

Violent intrigue can be analysed mathematically
Recently, I finished reading "The Bourne Ultimatum" by Robert Ludlum, an energetic, intriguing tale of violent, manipulative men head-to-head in a struggle of life and death. My definite impression was that Robert Ludlum's Bourne series is a realistic portrayal of the total opposition that can be mathematically modelled as a 'Nash equilibrium.' Read on to get the details.

Many of you will know of Jason Bourne from the series of movies where Matt Damon plays the amnesic protagonist. However, the book series has a different plot and a different mood. In both cases, Jason Bourne is an intelligent man of violence who outguns and outwits his opponents in deadly struggles.

Matt Damon plays Jason Bourne
At the same time as I read the Bourne books, I was studying mathematical ways to understand groups of opposing agents. The concept of a 'Nash equilibrium' is one mathematical way to understand the theory behind violent strategies. A Nash equilibrium is a situation where each person is acting so as to maximise his or her own good, given the actions of everyone else. A Nash equilibrium can be a good thing, where everyone is helping each other, or a Nash equilibrium can be a bad thing, where each person is at the others' throats. In the case of Jason Bourne, the Nash equilibrium is always one where the two masters of intrigue are trying to kill each other. There can only be one winner in the Bourne series.

Without giving too much of the books away, Jason Bourne opposes Carlos the Jackal, the leading international assassin. One will set a trap for the other, the other will 'reverse' the trap, and the first narrowly escapes. Neither man's appearance is clearly known to the other and they both enlist pawns to fight against each other. Jason Bourne threatens, bribes and takes every possible extreme measure in order to defeat the Jackal. If you have only seen the movies, then consider how Bourne tricks and outwits his opponents for his own ends.
John Nash was a revolutionary mathematician
The overall impression I got from the books was 'Oh, this is what a purely competitive Nash equilibrium really looks like.' The winner is not the strongest, the fastest or the man with the best weapons, but the man who thinks the extra step ahead. If you can foresee what your opponent will do, then you can defeat him. But your opponent will try to foresee what you will do. So you must think N+1 steps ahead. In fact, to win, you must be unpredictable. The mathematical solution of a purely competitive game is to randomise over all of your possible actions. In practice, you become an unstable psychopath who commits apparently arbitrary acts of violence without any discernible pattern. I read a lot of action books, but most of the bad guys are stupid or have some other drastic failings. Jason Bourne's opponents seem much more closely matched.
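To make the 'randomise to be unpredictable' idea concrete, here is a minimal sketch (not from the original post) of computing the equilibrium mixed strategy for a two-action zero-sum game, using the standard indifference condition. The function name and the matching-pennies payoffs are illustrative assumptions.

```python
def mixed_equilibrium_2x2(a, b, c, d):
    """Row player's equilibrium probability of playing row 1 in a
    zero-sum game with row-payoff matrix [[a, b], [c, d]].
    Derived from the column player's indifference condition; assumes
    the game has no pure-strategy saddle point."""
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: no unique interior mixed equilibrium")
    return (d - c) / denom

# Matching pennies: the only equilibrium is to randomise 50/50,
# because any predictable pattern can be exploited by the opponent.
p = mixed_equilibrium_2x2(1, -1, -1, 1)
```

For matching pennies the result is 0.5: exactly the 'be unpredictable' prescription, and the reason a closely matched opponent cannot simply be out-thought.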

Life can approach the theoretical abstraction of a Nash equilibrium, but game theoretic methods often provide exact answers to slightly the wrong questions. This can make game theory blindingly addictive to some, as Venkatesh Rao observes. In the areas of game theory that I've studied, models often assume that each player has a finite number of actions available. However, in practice, the number of devious things that you can do to someone else is limited only by your imagination. Game theory can give us some intuition, but it's probably best to let Robert Ludlum fill in the details. (Actually, stop that thought process right there, just in case you think of a new way of inflicting harm.)

In "How the Mind Works," Steven Pinker gives some good explanations of why people are emotional and do crazy, violent things that seem irrational from some perspectives. In "The Better Angels of Our Nature: Why Violence Has Declined," Pinker explains why people have become much less violent with time. Contrary to the constant whine of the alarmists, not all our morals are bad and getting worse. Part of the reason for the decline in violence is a change in the prevailing Nash equilibrium. We are now more incentivised to be peaceful and non-violent, which is good for all of us. Centrally administered justice helps us all be more charitable. The real life Carlos the Jackal is serving time in a French prison and Jason Bourne is best approximated by Matt Damon, who co-founded Water.org to help the poor get better access to water.

Sunday, November 4, 2012

How to be an Awesome Postgrad Student

Here are a few tips on getting through a PhD or a Master’s, based on my experience doing a PhD in Electrical Engineering at the University of Canterbury. I’ve had a good time so far, and I hope that you can have the best experience possible. This document is based on ideas that I gathered from other written works on how to be an awesome postgrad, my supervisors, friends, parents, wife and various talks that I attended as a postgrad. I hope you can translate any discipline-specific details into your own field. Your PhD experience will probably be unlike mine, especially if you are not doing electrical engineering. Get advice from wise scholars in your discipline on how to be an awesome postgrad student in your context.

Saturday, October 27, 2012

When can I expect dinner?

Imagine that I am driving from my in-laws' house in Ashburton to my own house in Christchurch and the afternoon is waning closer and closer to dinner time. (Recently, this really happened to me.) I start to wonder: how much time do I expect between now and dinner? At first glance, this is an innocent question that can be mentally approximated with enough accuracy to be useful. I can formalise the process happening in my head by expressing the expected time until dinner as the sum of the delays caused by events that may occur between now and dinner (Eq. 1):$$t = \sum_{e \in E} p(e)d(e)$$where \(t\) is the time until dinner, \(E\) is the set of all possible events between now and dinner, \(p(e)\) is the probability that event \(e\) will occur and \(d(e)\) is the additional delay caused by event \(e\). The probability of each event depends on my own choices so I can use Eq. 1 to run 'what if' scenarios in my head. If I choose to stop my car by the side of the road, then I expect dinner to be later than if I continue driving at the speed limit until I reach home. On the other hand, if I stop at a cafe along the way, then I am likely to get tea sooner. In general, I only need approximations that are accurate enough for making decisions in life, and I need not worry about the precise philosophy of mathematics that stands behind the approximation. But there are hidden catches in Eq. 1.
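Eq. 1 is simple enough to sketch in a few lines of code. The event names, probabilities and delays below are made up purely for illustration:

```python
# Expected time until dinner as a probability-weighted sum of delays (Eq. 1).
# Events, probabilities and delays (in minutes) are hypothetical.
events = {
    "normal drive home": (0.90, 60.0),   # (probability, delay in minutes)
    "stop at a cafe":    (0.08, 30.0),
    "roadworks hold-up": (0.02, 15.0),
}

def expected_delay(events):
    """Sum of p(e) * d(e) over all events, following Eq. 1."""
    return sum(p * d for p, d in events.values())

t = expected_delay(events)  # expected minutes until dinner
```

Changing the probabilities is exactly the 'what if' scenario: stopping at a cafe shifts weight from the long drive-home delay to a shorter one, bringing tea closer.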

Suppose a drunk driver going south swerves onto the wrong side of the road and kills me in a head-on collision. (Keeping to one's side of the road is a life-and-death argument for multi-agent coordination!) This event has non-zero probability of occurring before my dinner time. But for the purposes of Eq. 1, how long would the event of my death delay my dinner? Because I never get dinner, arguably the delay \(d(\textrm{my death})\) is infinite. But even one infinite delay paired with any non-zero probability means that the expected time until my dinner is also infinite. If I claim that the 'real answer' is always infinity, then I lose the ability to make judgements based on my calculations; the expected time until dinner is the same if I drive home at 10 kilometers per hour over the speed limit or if I stop to take a nap in a nearby paddock.
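The infinity problem is easy to reproduce numerically: a single infinite delay dominates the sum no matter how small its probability is. A sketch with made-up numbers:

```python
import math

# Hypothetical events: one has an infinite delay with tiny probability.
events = {
    "normal drive home": (0.9999999, 60.0),
    "fatal collision":   (0.0000001, math.inf),  # d(my death) = infinity
}

# One infinite term with non-zero probability makes the whole sum infinite,
# so the expectation can no longer distinguish between driving choices.
t = sum(p * d for p, d in events.values())
```

Here `t` is `inf` regardless of what the finite terms contribute, which is precisely why the naive expectation stops being useful for decision-making.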

Instead of assigning \(d(\textrm{my death}) = \infty\), maybe I could say that dinner is delayed only up until the point of my death; that would mean that I am effectively computing the expected time until dinner or my death, whichever comes first. That seems like a reasonable way to resolve the problem of an infinity in the equation, and it will produce finite answers because I am sure to die eventually, notwithstanding events that could prevent my death, such as the return of Christ (The Bible, Matthew 24:31) or a technology singularity. However, the technology singularity is another thing to ponder in conjunction with the mathematical difficulties surrounding infinity, and the probability of Christ's return before dinner, or before my otherwise certain death, is theologically guaranteed to be uncomputable: 'concerning that day and hour no one knows' (Matthew 24:36, ESV).

Alternatively, consider another situation that is slightly less unfortunate: I doze off behind the wheel and crash my car into a power pole on the side of the road, but I survive. I might miss dinner entirely while the doctors put my bones back in their original places, but get fed breakfast in hospital the next day. Does \(d(\textrm{I miss dinner that day})\) count as infinity if I get dinner, or say breakfast, the next day? Do I actually mean to compute the expected time until my next meal? If I keep the original question, should I count the time until dinner the next day as the 'time until dinner', or should I count situations like that as also being an infinite time until tonight's dinner? Perhaps a better way to resolve these dilemmas is to use Bayes' theorem and ask a more restricted question: 'Assuming that I will eat dinner tonight, when do I expect to eat dinner?' Again, this will avoid the infinities, but the question has been changed.
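The conditioning trick can be sketched directly: restrict the sum to events in which dinner actually occurs, then renormalise by their total probability. The events and numbers are again hypothetical:

```python
# Hypothetical events: (probability, delay in minutes, does dinner happen?).
events = {
    "normal drive home": (0.90, 60.0, True),
    "stop at a cafe":    (0.08, 30.0, True),
    "fatal collision":   (0.02, None, False),  # no dinner, no finite delay
}

def expected_delay_given_dinner(events):
    """E[t | dinner occurs]: drop the no-dinner events and renormalise
    so that the remaining probabilities sum to one."""
    total_p = sum(p for p, d, dinner in events.values() if dinner)
    return sum(p * d for p, d, dinner in events.values() if dinner) / total_p

t = expected_delay_given_dinner(events)
```

The infinite term never enters the sum, at the cost of quietly answering a different question than the one originally posed.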

I could look at my question in other ways, too. Perhaps the informal use of the word 'expect' should actually be interpreted as referring to the mode of the distribution of times that I might wait until I get dinner. Or maybe the median. Maybe I should exclude all outliers that have \(p(e) < \phi\) for some arbitrary, small event probability \(\phi\), replacing the original question with 'How long until dinner, assuming nothing too unexpected happens?' Because \(\phi\) can take any value, I now have an infinite number of possibilities. Before I can start computing an answer, I face the difficult problem of how to refine the question.
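Mean, median and mode can disagree sharply once the distribution has a heavy tail, which is what makes the choice of summary statistic part of the question. A sketch with a made-up sample of delays:

```python
from statistics import median, mode

# Hypothetical sample of times until dinner (minutes), with one extreme
# outlier standing in for a low-probability disaster.
delays = [55, 60, 60, 60, 65, 70, 600]

mean_t = sum(delays) / len(delays)   # dragged far upward by the outlier
median_t = median(delays)            # robust to the single extreme value
mode_t = mode(delays)                # the most typical answer
```

The mean lands well above two hours while the median and mode both say about an hour; excluding low-probability outliers (the \(\phi\) threshold) is effectively a way of choosing which of these answers the informal word 'expect' should mean.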

While this discussion may seem excessively philosophical, in fact this kind of problem must be solved every day by professional computer programmers. A program's requirements may state that a subroutine computes the 'time until my dinner.' How does one deal with unusual cases? Many programmers will simply ignore their existence out of laziness or error. Others will write code that explicitly identifies odd cases but still fails to function in a meaningful way. Maybe the limitations of floating point numbers and finite sets in a computer will shine serendipitously on the problem, but maybe not. Maybe the rest of the program has no way of considering the possibility of my death and therefore the subroutine will not encounter that case, but maybe that is a flaw in the rest of the program. How does one meaningfully substitute answerable questions for unanswerable ones? Is that even possible in general?
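One defensible pattern (a sketch, not the only answer, and with a hypothetical function name) is to make the unusual case explicit in the subroutine's return type, so the caller is forced to decide what 'never' means rather than receiving a silently meaningless number:

```python
import math
from typing import Optional

def time_until_dinner(events) -> Optional[float]:
    """Return the expected minutes until dinner, or None when the
    expectation is undefined because some possible event makes dinner
    unreachable. `events` maps names to (probability, delay) pairs;
    a delay may be math.inf."""
    if any(p > 0 and math.isinf(d) for p, d in events.values()):
        return None  # the caller must decide how to handle 'never'
    return sum(p * d for p, d in events.values())

# The caller sees an explicit 'no meaningful answer' case instead of inf.
t = time_until_dinner({"drive home": (1.0, 60.0)})
```

Returning `None` (or an `Optional`/`Result` type in other languages) does not answer the philosophical question, but it at least stops the unanswerable case from masquerading as an answer.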

Tuesday, July 10, 2012

New open source projects

My research has generated two new open source projects recently:
Wavechild 670 (GPL license)
A wave digital filter-based emulation of a famous stereo tube limiter from the 1950s.

Turn-taking measurement tools (BSD-style license)
Tools for measuring the quantity of turn-taking present in the behaviour of a group of agents.

The links also give the references for the associated research. Part of the idea is that academic research, open source and Wikipedia are all on a massive collision course for Knowledge 2.0. Or maybe more accurately, Knowledge N+1, because open source and Wikipedia are constantly changing. Soon the academic topics will be as open and fluid as, say, the Linux kernel. Some smart person will probably be the spearhead of each topic, but there will be many contributors and the development of ideas will be out in the open.

Peter