Friday, 26 April 2013

Invitation to play a game

So I've blogged about the two thirds of the average game quite a few times now. The latest post, which is kind of a summary of the other posts, can be found here.

This post is, however, a bit different. I'm teaching a new game theory course next year and am busy preparing that. I'm planning on using various interactive games to help with my teaching. As a result I've been figuring out Google's App Engine so that I can host some of these games online.

The result of this is that I've put together an online open version of the two thirds of the average game that I'd really appreciate you taking the time to play.

The website is: twothirdsoftheaveragegame.appspot.com/ and it will take you 3 minutes to make a guess (you're welcome to guess a bunch of times, only your last guess will count).

Thanks to +Leanne Smith, +Zoe Prytherch, +Izabela Komenda, +Penny Holborn and +Angelico Fetta for testing it for me. Hopefully the bugs are all gone :)

I'll let this run for a week and pick the winner(s) on the 3rd of May at 1200 GMT. I would really appreciate you taking the time to play and can't offer much to the winners apart from being named (if you would like me to) in the blog post I write next week :)

So please do take the time to guess; more details about the game itself can be found at the site:

twothirdsoftheaveragegame.appspot.com/

All the code for the site can be found at this github repo.

Sunday, 21 April 2013

Making animated gifs

This is a short post in my series of "write it up in a blog post so that I don't forget how to do it":

In this post I'm going to briefly show how to create animated "gifs" easily on a *nix system. I've recently stumbled upon how easy it is to do this and here's one from this blog post:

This is all done very easily using the convert command from the ImageMagick command line tools.
In this post I showed how to carry out a simple linear regression on a date stamped data set in python. Here's a slightly modified version of the code from that post:

#!/usr/bin/env python
from scipy import stats
import csv
import datetime
import matplotlib.pyplot as plt
import os


# Import data
print "Reading data"
outfile = open("Data.csv", "rb")
data = csv.reader(outfile)
data = [row for row in data]
outfile.close()

dates = []
numbers = {}
for e in data[1:]:
    dates.append(e[0])
    numbers[e[0]] = int(e[1])  # the counts are plain integers, so int() is safer than eval()


# Function to fit line
def fit_line(dates, numbers):
    x = [(e - min(dates)).days for e in dates]  # Convert dates to numbers for linear regression
    gradient, intercept, r_value, p_value, std_err = stats.linregress(x, numbers)
    return gradient, intercept


# Function for projection
def project_line(dates, numbers, extra_days):
    extra_date = min(dates) + datetime.timedelta(days=extra_days)
    gradient, intercept = fit_line(dates, numbers)
    projection = gradient * (extra_date - min(dates)).days + intercept
    return extra_date, projection, gradient


# Function to plot data and projection
def plot_projection(dates, numbers, extra_days):
    extra_date, projection, gradient = project_line(dates, numbers, extra_days)
    plt.figure(1, figsize=(15, 6))
    plt.plot_date(dates, numbers, label="Data")
    plt.plot_date(dates + [extra_date], numbers + [projection], '-', label="Linear fit (%.2f join per day)" % gradient)
    plt.legend(loc="upper left")
    plt.grid(True)
    plt.title("Rugby Union Google Plus Community Member Numbers")
    now = max(dates)
    plt.savefig('Rugby_Union_Community_Numbers-%d-%.02d-%.02d.png' % (now.year, now.month, now.day))
    plt.close()

print "Projecting and plotting for all data points"
for i in range(3, len(dates) + 1):
    dates_to_plot = [datetime.datetime.strptime(e, '%Y-%m-%d') for e in dates[-i:]]
    numbers_to_plot = [numbers[e] for e in dates[-i:]]
    plot_projection(dates_to_plot, numbers_to_plot, 365)

The output of this script (if anything is unclear in there just ask) is a series of png files, each of which shows a line fitted to the data available up to a given date (made using matplotlib). Here are a few of them:
1st, 2nd and last

Here's the ImageMagick convert command to turn all these png files into a gif:

convert *.png animation.gif

Which gives:

If we want to delay the transition slightly we write:

convert -delay 50 *.png animation.gif

Which gives:

Here's an extremely short screencast demonstrating this:
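As a small aside, here's a sketch of how the same convert call could be put together from Python. The helper names (`frame_name`, `build_convert_command`) are made up for illustration and the command is only built, not run. It relies on two things: the zero-padded dates in the file names mean that alphabetical order is also chronological order, and the value given to -delay is in hundredths of a second.

```python
import datetime


def frame_name(date):
    """Hypothetical helper: zero-padded name, so sorting by name sorts by date."""
    return "Rugby_Union_Community_Numbers-%d-%02d-%02d.png" % (
        date.year, date.month, date.day)


def build_convert_command(png_names, delay=50, output="animation.gif"):
    """Return the argument list for ImageMagick's convert (not executed here)."""
    # -delay is in hundredths of a second, so 50 is half a second per frame
    return ["convert", "-delay", str(delay)] + sorted(png_names) + [output]


frames = [frame_name(datetime.date(2013, 3, 3)),
          frame_name(datetime.date(2013, 2, 19)),
          frame_name(datetime.date(2012, 12, 17))]
command = build_convert_command(frames)
print(" ".join(command))
```

Passing `command` to `subprocess.check_call` should then do the same job as typing the command in a terminal (assuming ImageMagick is installed).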

Wednesday, 10 April 2013

Two thirds of the average game

The two thirds of the average game is a great way of introducing game theory to students.

I've used this game on outreach events with +Paul Harper, at conferences with +Louise Orpin (when describing my outreach activities with other academics) and also in my classroom. I've blogged about the results before:

School kids
Postgraduate students
OR 54 conference delegates
MSc students (Note that one point of this post is to show better graphs than the ones on those posts)

The definition of the game from the wiki page is given here:

"In game theory, Guess 2/3 of the average is a game where several people guess what 2/3 of the average of their guesses will be, and where the numbers are restricted to the real numbers between 0 and 100, inclusive. The winner is the one closest to the 2/3 average."
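To make the rule concrete, here's a minimal sketch of how a winner could be picked from a list of guesses. The function name and the guesses are made up for illustration; this is not the code from the repo.

```python
def two_thirds_winners(guesses):
    """Return the target (2/3 of the mean) and the guesses closest to it."""
    target = 2.0 / 3.0 * sum(guesses) / len(guesses)
    best = min(abs(g - target) for g in guesses)
    winners = [g for g in guesses if abs(g - target) == best]
    return target, winners


# Made-up guesses, purely for illustration
target, winners = two_thirds_winners([0, 20, 33, 50, 67])
print(target, winners)
```

Here the average is 34, two thirds of that is about 22.67, and so the guess of 20 wins.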

I run this game once without any instruction. I just explain the rules and let them fill in the form in front of them (contained in this repo).

I then bring up slides discussing how iterated elimination of weakly dominated strategies leaves the "rational" strategy to be that everyone guesses 0.
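The argument on those slides can be sketched in a few lines: if nobody guesses above some upper bound then two thirds of the average cannot exceed two thirds of that bound, so the bound shrinks every round. The function name and the number of rounds below are just for illustration.

```python
def elimination_bounds(upper=100.0, rounds=10):
    """Upper bound on undominated guesses after each round of elimination."""
    bounds = [upper]
    for _ in range(rounds):
        # If no guess exceeds `upper`, 2/3 of the average is at most 2/3 * upper
        upper = 2.0 / 3.0 * upper
        bounds.append(upper)
    return bounds


bounds = elimination_bounds()
for k, b in enumerate(bounds):
    print("after %d rounds: guesses above %.2f are eliminated" % (k, b))
```

The bound is multiplied by 2/3 each time, so it heads to 0, which is the "rational" guess.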

After this I invite the participants to have a second guess. In general the results are pretty cool as you see the initial shift towards equilibrium. The fact that the winning guess in the second play of the game is in fact not 0 gives an opportunity to discuss irrational behaviour and also what would happen if we played again (and again...).

I've run this game a couple of times now and here's a histogram showing the results for all the guesses I've recorded.

You certainly see that the guesses move after rationalising the strategies. Funnily enough though no one has guessed 100 during the first guess but a couple of people have in the second guess. I assume this is either some students trying to tell me that I'm boring them, trying to help a colleague win or I'm perhaps not doing a great job of explaining things :)

Here are some graphs from the individual events that I've collected data for:

A talk I gave during a high school revision week:

An outreach event with the MSI:

MSc students from the 2012-2013 cohort:

Academics at a talk on outreach at OR54 (the OR society annual conference):

The OR society student conference SCOR2012:

Academics at a talk on outreach at YOR18 (The OR society conference for early career academics):

Despite the low number of participants in some of the events they all show a similar expected trend: the second guesses are lower. Surprisingly on one occasion the second winning guess was actually 1! This is a very quick move towards the theoretical equilibrium.

These experiments have been done on a much wider scale and in a much more rigorous way than my humble "bits of fun". For example, the coursera game theory MOOC collected a bunch of data on this game and there's also a pretty big experiment mentioned in this talk:

Having said that I can't recall if other experiments look at successive guesses following an explanation of rational behaviour...

Github repo

https://github.com/drvinceknight/two_thirds_of_the_average_game

I've put all the code used to analyse a game on github. It's a simple python file that can be pointed at a directory or a file and will produce the graph (using matplotlib) as well as spit out some other information. The data set for all these events (the first graph on this post) is also on there and if anyone would like to add to it please let me know :)

(There's also the handout that I use to get the student answers in the repo)

(If you found your way here because of interests in game theory this previous post of mine listing free OSS software for game theory might be of interest)

Monday, 8 April 2013

Using sed to make sure hyper links refer to the correct format of file when using pandoc

In a previous post (by far the most popular thing on this blog, mainly due, I believe, to a kind retweet by +John Cook and SciPyTip) I describe the makefiles I use to write large documents containing Tikz diagrams and +Sage Mathematical Software System plots in multiple formats (pdf, html and docx) using markdown and pandoc.

This is a short follow up post to fix one minor thing. In the previous post if you created hyper links in your markdown file (say pointing to another markdown document, which for my purposes was multiple chapters in some teaching notes) then the links in every output format (pdf, html and docx) would all point to the same format.

Here's a slight modification of the pandocipy_given_file.py script (from the previous post) which uses sed to replace .md with the relevant format in all links.

#!/usr/bin/env python
from sys import argv
from os import system

e = argv[1][:-3]
print e

system("sed 's/\\.md/.html/g' %s > tmp" % (e + ".md"))
system("pandoc -s tmp -N -o " + e + ".html --mathjax")
system("sed 's/\\.md/.docx/g' %s > tmp" % (e + ".md"))
system("pandoc tmp -o " + e + ".docx")
system("sed 's/\\.md/.pdf/g' %s > tmp" % (e + ".md"))
system("pandoc tmp -N -o " + e + ".pdf --latex-engine=xelatex")
system("rm tmp")

This script creates a temporary file (tmp; this is actually important, as it makes sure that the makefile doesn't think that you're repeatedly changing the md files) in which sed has put the correct format for each link and which pandoc then uses to produce the required files. For example the sed 's/\.md/.html/g' command will replace all instances of ".md" with ".html" (the backslash makes the dot literal; an unescaped dot matches any character, so something like "cmd" would be rewritten too). If you're not familiar with sed it's a *nix program that allows you to do find and replace in files "easily" (it's still a bit of voodoo to me). The script can be used by simply typing:

python pandocipy_given_file.py file.md

(This assumes that all the links in file.md point to other md files.)
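If you'd rather stay in python, here's a sketch of the same find and replace using the re module instead of sed. This is an alternative I'm suggesting for illustration, not what the script above does, and `retarget_links` is a made-up name. It only touches ".md" when it sits right before the closing bracket of a link, and the comparison at the end shows what a pattern with an unescaped dot does to ordinary words.

```python
import re


def retarget_links(markdown_text, extension):
    """Swap the .md extension at the end of markdown link targets."""
    # The dot is escaped so only a literal ".md" matches, and the lookahead
    # restricts the match to a ".md" that is followed by ")"
    return re.sub(r"\.md(?=\))", "." + extension, markdown_text)


text = "See [Chapter 2](Chapter_02.md) for the cmd line details."
careful = retarget_links(text, "html")
greedy = re.sub(r".md", ".html", text)  # unescaped dot: also rewrites "cmd"
print(careful)
print(greedy)
```

The first line printed keeps "cmd" intact; the second turns it into ".html", which is the kind of surprise the escaped dot avoids.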

I use this script in the makefile as I did in the previous post:

md = $(wildcard *.md)
htmls = $(md:%.md=%.html)

all: $(htmls)

%.html: %.md
    ./pandocipy_given_file.py $<
    ../Scripts/generate_website.py

The final line of that file is not relevant to this blog post: it just runs a python script that generates the website for the course (which is still a work in progress!). That python script doesn't do anything fancy; it goes into each Chapter document, for example, and reads the Chapter title, which it uses to write the index.html file. All this allows me to modify a given md file and simply run make to update the website.

If any of this is of interest the github repo is here.

Note that in this particular case the links being "format loyal" is mainly useful in the html files for the general audience. The loyal links in pdf and docx format are mainly useful to me as all the paths are relative to my setup. I'm posting this again so that I don't forget it but also in case it's useful to others (you can obviously use sed to replace other things).

Sunday, 7 April 2013

Makefiles for tikz sagemath and teaching notes written in markdown

So I've posted before on multiple occasions about PCUTL which is a higher education certification process I'm currently undergoing. One of the things I paid particular attention to in my latest portfolio (a copy of which can be found at the previous link) is accessibility of teaching notes.

Generally this implies making sure students have notes in a format that they can easily manipulate (for example change font size and color). This is often taken to mean that notes should be written in .doc format. As a mathematician this isn't really an option as all our teaching material really needs to be written in LaTeX.

I have spent some time being amazed by pandoc which is a really smart piece of software that allows you to take documents in various languages and output to a variety of other languages. Here's a video I made showing how I like to use pandoc to get notes in html, pdf and .docx format (I wrote a blog post here about it):

My entire SAS/R programming course I'm currently teaching was written this way (you can find the notes etc here).

I've recently started writing a new course: a game theory course.

This course needs a bunch of diagrams and plots (as opposed to the previous one, which basically had a bunch of screenshots). I wanted to use the same overall system but planned on using tikz for the diagrams (tikz lets you draw nice diagrams using code) and thought about using sagetex for the plots (I use +Sage Mathematical Software System a lot but have never used sagetex, which lets you have sage code inside a LaTeX document).
After searching around for a bit I didn't see how I was going to be able to use the awesome pandoc to create teaching notes in multiple formats using the above approach. There was no way for pandoc to take the tikz code and output a separate image file for the html (and whatever it needs for .docx).

This blog post is a summary of my approach, which basically uses a couple of things:

Some tikz code that creates a standalone pdf image;
A script that converts a pdf to a png (so that it can be used in html);
A makefile so that I can recompile only the tikz code that needs to be recompiled;

That is how I get my tikz diagrams.

I also have a similar system to create plots from sage and I'll also show the makefile I use to get pandoc to sort out the files I want.

Creating standalone pngs with tikz and makefiles

To create a standalone tikz image you need to use the standalone document class. Once you've done that you can just use normal tikz code:

\documentclass[tikz]{standalone}
\usepackage{tikz,amsmath}
\tikzstyle{end} = [circle, minimum width=3pt, fill, inner sep=0pt, right]
\begin{document}
\begin{tikzpicture}
    \draw (0,0) -- (4,0) node [below] {$u_1$};
    \draw (0,0) -- (0,4) node [left] {$u_2$};
    \draw (3,0) node[end] (A) {} node [above=.3cm,right] {$(3,0)$};
    \draw (0,3) node[end] (B) {} node [above=.3cm,right] {$(0,3)$};
    \draw (1,1) node[end] (C) {} node [below,left] {$(1,1)$};
    \draw (2,2) node[end] (D) {} node [above=.3cm,right] {$(2,2)$};
    \draw (A) -- (D);
    \draw (D) -- (B);
    \draw (B) -- (C);
    \draw (C) -- (A);
    \draw [dashed,thick] (C) -- ++(0,1.5);
    \draw [dashed,thick] (C) -- ++(1.5,0);
    \draw [->] (2,3.5) node[above] {\tiny{Feasible average payoffs}} -- (.75,2);
    \draw [->] (4,1.5) node[above] {\tiny{Individually rational payoffs}} -- (1.5,1.25);
\end{tikzpicture}
\end{document}

The above for example creates the following standalone image (note that if you pdflatex it you obviously get a pdf which would be great if you were just pdflatex'ing a bigger document but I'll get to creating a png next):

I've then written a short python script (I'm sure this could be done as easily in bash but I'm just more comfortable in python) that can act on any given tex file to convert it to png:

#!/usr/bin/env python
from sys import argv
from os import system

system("pdflatex %s" % (argv[1]))
system("convert -density 300 %s.pdf %s.png" % (argv[1][:-4], argv[1][:-4]))

This makes use of convert (part of ImageMagick), which is a program that can convert pdf to png (more info here; it's commonly available on *nix systems and, I believe, can be installed on Mac OS).

I call that python file convert_tex_to_png.py and so I can convert any tex file by doing:

python convert_tex_to_png.py file.tex
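A slightly more defensive variant of that script is sketched below. It is my own illustration rather than the script above: `tex_to_png_commands` is a hypothetical name, it uses os.path.splitext instead of slicing off the last four characters, and it only builds the two commands so that they can be inspected before anything is run. It assumes the tex file sits in the current directory.

```python
import os


def tex_to_png_commands(tex_path, density=300):
    """Build the pdflatex and convert commands for one tex file."""
    base, extension = os.path.splitext(tex_path)
    if extension != ".tex":
        raise ValueError("expected a .tex file, got %r" % tex_path)
    return [
        ["pdflatex", tex_path],
        ["convert", "-density", str(density), base + ".pdf", base + ".png"],
    ]


commands = tex_to_png_commands("file.tex")
for command in commands:
    print(" ".join(command))
```

Each list can be handed to `subprocess.check_call`, which has the small advantage of stopping if pdflatex fails instead of carrying on to convert.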

I'm going to put all these image files (tex and png) in a folder called images. There are (now that I'm done) about 70 image files in there, so every time I change or write a new tex file I don't want to recompile all the files, and I also don't want to waste time manually compiling just the particular file I've written.

So I've used makefiles.

Now I've tried to use makefiles before but in all honesty they confuse the heck out of me. I spent a while going through a bunch of tutorials but most of them look at C code. For those who don't know, a makefile is basically a short program that gives instructions to compile certain files. When written well these are great as by simply typing make you can recompile any file that has been modified (and not necessarily all of them!).

So here's the makefile I've converged to using (after heuristically trying a bunch of stuff from stackoverflow):

tex  = $(wildcard *.tex)
pngs = $(tex:%.tex=%.png)

all: $(pngs)

%.png: %.tex
    ./convert_tex_to_png.py $<;

clean:
    rm *.aux
    rm *.log
    rm *.pdf

I save the above file calling it `makefile` so that it is run by default when calling `make`. This will take all the tex files (that have been modified) and output the corresponding png files as required. Note I can also type make clean to remove all auxiliary files (aux, log and pdf).

I find this really useful as I can modify a bunch of tex files and once I'm ready simply run make to get all the pngs as required.
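For what it's worth, the basic rule that make applies here can be sketched in python: a target is rebuilt if it is missing or older than the file it depends on. This is only my simplified picture of what make does (`needs_rebuild` is a made-up name), tried out on two empty files in a throwaway directory.

```python
import os
import tempfile
import time


def needs_rebuild(source, target):
    """True when target is missing or older than source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


workdir = tempfile.mkdtemp()
tex = os.path.join(workdir, "diagram.tex")
png = os.path.join(workdir, "diagram.png")
with open(tex, "w"):
    pass
missing = needs_rebuild(tex, png)       # no png yet, so it must be built

with open(png, "w"):
    pass
now = time.time()
os.utime(png, (now, now))
os.utime(tex, (now - 10, now - 10))     # tex older than png: nothing to do
fresh = needs_rebuild(tex, png)
os.utime(tex, (now + 10, now + 10))     # tex edited after the png was made
stale = needs_rebuild(tex, png)
print(missing, fresh, stale)
```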

I'll briefly show the (basically same) code I use to get plots from sage as standalone pictures.

Creating standalone pngs with sage and makefiles

I put all the plots created via sage in a plots directory. Here's an example graph created with sage:

The sage code to do that is here:

f(a,b)=a^3/3+3/4*b^2+(1-a-b)^2/2
p = contour_plot(f,(0,.5),(0,.5),colorbar=True,axes_labels=["$\\alpha$","$\\beta$"],contours=100, cmap='coolwarm',fill=True)
p.axes_labels(['$\\alpha$','$\\beta$'])
p.save("L18-plot01.png")

The makefile (which will keep all png files up to date based on the sage files in the directory):

sage  = $(wildcard *.sage)
pngs = $(sage:%.sage=%.png)

all: $(pngs)

%.png: %.sage
    sage $<;

clean:
    rm *.py

(I can similarly use make clean to remove the .py files that get created when running sage in cli).

The final thing I'll show is how to write a makefile to get pandoc to output the required file formats.

Using a makefile with pandoc

I write all of my notes in markdown, for various reasons which include:

Pandoc likes markdown as input (it can also handle tex and html though);
I like markdown;
I can incorporate latex in markdown when needed.
If a student ever wanted to modify the markdown but didn't know LaTeX they'd have an honest chance of knowing what was going on (if you don't know markdown it really is worth taking a look at this short video)

Thus my makefile must be able to compile all modified markdown files. I use a similar approach to the approach I had for tikz diagrams which is to write a short python script:

#!/usr/bin/env python
from sys import argv
from os import system

e = argv[1][:-3]
print e

system("pandoc -s " + e + ".md -N -o " + e + ".html --mathjax")
system("pandoc " + e + ".md -o " + e + ".docx")
system("pandoc " + e + ".md -N -o " + e + ".pdf --latex-engine=xelatex")

(pandocipy_given_file.py is the name of the above python script).

There are basically three pandoc commands going on there. The first creates the html with the LaTeX math being rendered by mathjax, the second creates the .docx format and the last one creates the pdf (using the xelatex engine).

The final piece of the puzzle is to use the following makefile:

md = $(wildcard *.md)
htmls = $(md:%.md=%.html)

all: $(htmls)

%.html: %.md
    ./pandocipy_given_file.py $<;

The way that makefiles work (I think) is to indicate what files you want as an output (and what those files are dependent on). So for example my previous makefiles wanted to output pngs with tex and sage files as dependencies. Here I use the required output as html and the dependencies are the md files.

Using the above I can change any md file and simply type make to get all the pdf, html and docx files updated.
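To check my own understanding of the first two lines of that makefile, here's a rough python equivalent of what the wildcard and the `%.md=%.html` substitution produce. It's only an illustration (`pattern_targets` and the file listing are made up) and it says nothing about how make is really implemented.

```python
import fnmatch
import os


def pattern_targets(filenames, source_ext=".md", target_ext=".html"):
    """Mimic $(wildcard *.md) followed by $(md:%.md=%.html)."""
    sources = sorted(fnmatch.filter(filenames, "*" + source_ext))
    return [(os.path.splitext(name)[0] + target_ext, name) for name in sources]


listing = ["Chapter_01.md", "Chapter_02.md", "makefile", "pandocipy_given_file.py"]
pairs = pattern_targets(listing)
for target, source in pairs:
    print("%s depends on %s" % (target, source))
```

Each pair is a target and its dependency, which is exactly what the `%.html: %.md` rule then acts on.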

Summary and a couple of things I don't want to forget

I have 3 directories here. The first: Course_notes contains the md files and corresponding python script and makefile. Course_notes also contains 2 other directories: images and plots which both contain the images and plots as I've described above.

So in the md files I refer to an image as so:

![A pic for C1](images/C01-img03.png)

I'm using pdf as the prescribed format of the notes for the course (students are welcome to use html and docx if they wish but I'm recommending pdf) and in general xelatex will sometimes move images around to better format the text.

To make sure things don't get confusing I want to be able to \ref and \label images. The easiest way I've found to do this without messing around with the html (who knows what happens with docx, life is too short to worry about everything) is to use \text:

In the excellently drawn image shown\text{ in Figure \ref{C01-img03}} we see that...

![A pic for C1\label{C01-img03}](images/C01-img03.png)

Pandoc is clever enough to simply ignore what is in the \text when converting to html and docx, so the html and docx documents won't have a reference to the figure (but in general they won't reorder the images), while the pdf will have the figures referenced, as you can see in these screenshots (showing pdf and html):

pdf:

html:

I've mainly posted this to make sure I remember how to do it but it might be helpful for others. I'm sure it could be improved by getting rid of my python scripts and having everything in the makefile. I tried that but had a hard time getting wildcards to work as I did so I didn't try anything more complicated. Finally there are ways to get makefiles to run makefiles in subdirectories so it would be nice I guess to also do that so that a single `make` in the highest directory would sort out everything.

I'll be putting all the teaching notes up on github so will link to that repository when I've done it.

EDIT: Here's the github repo and here's a website with all the content (which will eventually become the class website).

ANOTHER EDIT: Here's a follow up post which uses sed to keep the links in the right format.

Tuesday, 2 April 2013

Handling dates and carrying out linear regression to model number of members of a Google Plus community in Python

I've blogged about rugby before (using some game theory to disagree with a rugby commentator) but here's another one.

I'm a member of the Rugby Union community on Google plus and it's a great place to chat about rugby and share stuff. As I said on this post about G+, I don't post publicly about rugby as I assume that most people who circle me won't be too interested in my opinions on rugby so it's nice to have a place to go. Anyway I really recommend the community it's a nice place if you like the game.

+Davide Coppola, who's the owner of the community, has been posting an announcement every time we go up by 100 members. We're currently at about 960 people and we've been wondering when we'll go past the 1000 member mark.

I thought I'd code a very simple bit of python to have an educated guess but also to see what member numbers have looked like.

Here's the data:

Date,Number
2013-04-02,961
2013-03-26,900
2013-03-13,800
2013-03-03,700
2013-02-19,600
2013-02-10,500
2013-02-08,400
2013-01-31,300
2013-01-06,200
2012-12-17,100
2012-12-07,1

First of all I needed to import all the data (I've posted a short screencast about handling csv data in python and here's an old blog post with the code):

import csv
outfile = open("Data.csv", "rb")
data = csv.reader(outfile)
data = [row for row in data]
outfile.close()

Once that is done I use the datetime python module to convert the dates read as strings to actual dates but also to create a list containing the member numbers:

import datetime
dates = [datetime.datetime.strptime(e[0], '%Y-%m-%d') for e in data[1:]]
numbers = [int(e[1]) for e in data[1:]]  # int() rather than eval(): the counts are plain integers

I will use the dates list later on to plot things nicely but for now I need to consider numeric data (to fit a linear model). I convert the dates list to a set of numbers counting the number of days having passed from the first day of the community:

x = [(e - min(dates)).days for e in dates]

Once I've done that I use the stats sub package from the scipy library to carry out a simple linear regression:

from scipy import stats
gradient, intercept, r_value, p_value, std_err = stats.linregress(x, numbers)

(I in fact only need the gradient and intercept from the above but it's all there in case I wanted it.)
To find out when we can expect to go past 1000 members (assuming a linear model of growth):

projected_date = min(dates) + datetime.timedelta(days=(1000 - intercept) / gradient)
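For anyone without scipy to hand, the same calculation can be written out with nothing but the standard library. This is a sketch of ordinary least squares done by hand on the data from this post, not the code I actually ran, although for a straight line fit it should agree with linregress.

```python
import datetime

rows = [("2013-04-02", 961), ("2013-03-26", 900), ("2013-03-13", 800),
        ("2013-03-03", 700), ("2013-02-19", 600), ("2013-02-10", 500),
        ("2013-02-08", 400), ("2013-01-31", 300), ("2013-01-06", 200),
        ("2012-12-17", 100), ("2012-12-07", 1)]
dates = [datetime.datetime.strptime(d, "%Y-%m-%d") for d, _ in rows]
numbers = [n for _, n in rows]
first_day = min(dates)
x = [(d - first_day).days for d in dates]  # days since the community started

# Ordinary least squares for numbers = gradient * x + intercept
mean_x = sum(x) / float(len(x))
mean_y = sum(numbers) / float(len(numbers))
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, numbers))
sxx = sum((xi - mean_x) ** 2 for xi in x)
gradient = sxy / sxx
intercept = mean_y - gradient * mean_x

# Solve 1000 = gradient * days + intercept for days
days_to_target = (1000 - intercept) / gradient
projected_date = first_day + datetime.timedelta(days=days_to_target)
print(gradient, intercept, projected_date)
```

The gradient comes out at a little over 8.4 members per day and the projected date falls on the 9th of April 2013.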

To project the linear fit to see what number of members we could hope to have after a year I do the following:

extra_date = min(dates) + datetime.timedelta(days=365)
projection = gradient * (extra_date - min(dates)).days + intercept

Finally to plot all of the above I use pyplot:

import matplotlib.pyplot as plt
plt.figure(1, figsize=(15, 6))
plt.plot_date(dates, numbers, label="Data")
plt.plot_date(dates + [extra_date], numbers + [projection], '-', label="Linear fit (%.2f join per day)" % gradient)
plt.legend(loc="upper left")
plt.grid(True)
plt.title("Rugby Union Google Plus Community Member Numbers")
plt.savefig('Rugby_Union_Community_Numbers.png')

The output is given here (there's something not great with the graph: I don't know why a line is being plotted between the data points but I haven't been able to fix that easily):
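A likely cause of that stray line: the series drawn with '-' contains every data point plus the projection, so matplotlib joins them all up in order. Plotting just two points that lie on the fitted line should give a straight line instead. Here's a sketch of how those two points could be computed; `fit_line_endpoints` is a made-up name and the gradient and intercept below are illustrative values, not output copied from the script.

```python
import datetime


def fit_line_endpoints(first_day, extra_days, gradient, intercept):
    """Two points on the fitted line: day 0 and day `extra_days`."""
    extra_date = first_day + datetime.timedelta(days=extra_days)
    projection = gradient * extra_days + intercept
    return [first_day, extra_date], [intercept, projection]


# Illustrative values only (assumed, roughly the size of the fit in this post)
line_dates, line_values = fit_line_endpoints(
    datetime.datetime(2012, 12, 7), 365, gradient=8.4, intercept=-41.4)
print(line_dates, line_values)
# With matplotlib these could replace the '-' series, for example:
# plt.plot_date(line_dates, line_values, '-', label="Linear fit")
```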

We should have about 3000 members by the end of 2013 and most importantly we can expect to go past the 1000 member mark on the 9th of April 2013 at 21:38 (+Davide Coppola and +Andrew Byrne: set your alarm clocks).

Note that you would in fact want to fit a much more complex model than a simple linear fit to actually try and forecast any of what I've done above. I was in fact surprised at how linear the fit was...

It's been a quick bit of fun and perhaps could prove useful to some as to how to handle dates (and do a simple linear regression) in python.

I've actually got a couple of screencasts that show how to handle dates in R and SAS which I'll put here in case they're of use to anyone.

Un peu de math...