I'm a member of the Rugby Union community on Google Plus, and it's a great place to chat about rugby and share stuff. As I said on this post about G+, I don't post publicly about rugby, as I assume that most people who circle me won't be too interested in my opinions on it, so it's nice to have a place to go. Anyway, I really recommend the community: it's a nice place if you like the game.

+Davide Coppola, who's the owner of the community, has been posting an announcement every time we go up by 100 members. We're currently at about 960 people and we've been wondering when we'll go past the 1000 member mark.

I thought I'd code a very simple bit of Python to make an educated guess, and also to see what member numbers have looked like.

Here's the data:

```
Date,Number
2013-04-02,961
2013-03-26,900
2013-03-13,800
2013-03-03,700
2013-02-19,600
2013-02-10,500
2013-02-08,400
2013-01-31,300
2013-01-06,200
2012-12-17,100
2012-12-07,1
```

First of all I needed to import all the data (I've posted a short screencast about handling csv data in python and here's an old blog post with the code):

```
import csv

# Read every row of the csv file, header included (text mode for Python 3)
with open("Data.csv", newline="") as infile:
    data = [row for row in csv.reader(infile)]
```


Once that is done I use the `datetime` Python module to convert the dates read as strings to actual dates, and also create a list containing the member numbers:

```
import datetime

# Skip the header row; parse the dates and convert the counts to integers
dates = [datetime.datetime.strptime(e[0], '%Y-%m-%d') for e in data[1:]]
numbers = [int(e[1]) for e in data[1:]]
```
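For anyone who wants to try the import and conversion steps without a `Data.csv` on disk, here's a minimal sketch that reads from an in-memory string instead. `SAMPLE` and `load_members` are names I've made up for illustration, and `SAMPLE` holds just three rows of the data above; `csv.DictReader` is used so the columns can be looked up by their header names.

```python
import csv
import datetime
import io

# Three rows copied from the data above; SAMPLE is an illustrative name.
SAMPLE = """Date,Number
2013-04-02,961
2013-03-26,900
2012-12-07,1
"""

def load_members(text):
    """Return (dates, numbers) parsed from CSV text with Date,Number columns."""
    reader = csv.DictReader(io.StringIO(text))
    dates, numbers = [], []
    for row in reader:
        dates.append(datetime.datetime.strptime(row["Date"], "%Y-%m-%d"))
        numbers.append(int(row["Number"]))
    return dates, numbers

dates, numbers = load_members(SAMPLE)
print(dates[0].date(), numbers[0])
```

Looking columns up by name means the code keeps working if the column order in the file ever changes.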


I will use the `dates` list later on to plot things nicely, but for now I need to consider numeric data (to fit a linear model). I convert the dates list to a list of numbers counting the days that have passed since the first day of the community:

`x = [(e - min(dates)).days for e in dates]`
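To see what that conversion produces, here's a small sketch using three of the dates from the data above. Subtracting one `datetime` from another gives a `timedelta`, and its `.days` attribute is a whole number of days. The `start` variable is my own addition so that `min(dates)` is only worked out once.

```python
import datetime

# Three of the dates from the data above, newest first as in the file.
dates = [
    datetime.datetime(2013, 4, 2),
    datetime.datetime(2013, 3, 26),
    datetime.datetime(2012, 12, 7),
]

start = min(dates)  # first day of the community
x = [(d - start).days for d in dates]
print(x)  # [116, 109, 0]
```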


Once I've done that I use the `stats` sub package from the `scipy` library to carry out a simple linear regression:

```
from scipy import stats
gradient, intercept, r_value, p_value, std_err = stats.linregress(x, numbers)
```

(I in fact only need the `gradient` and `intercept` from the above, but it's all there in case I wanted it.)

To find out when we can expect to go past 1000 members (assuming a linear model of growth):

`projected_date = min(dates) + datetime.timedelta(days=(1000 - intercept) / gradient)`
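The date comes from solving `gradient * days + intercept = 1000` for `days`. As a check that doesn't need `scipy`, here's a sketch that works out the least squares line by hand and then finds the crossing date. `fit_line` and `days_to_1000` are names I've made up, and the `x` values are the day counts I get from the dates in the data above.

```python
import datetime

# Days since 2012-12-07 and member counts, taken from the data above.
x = [116, 109, 96, 86, 74, 65, 63, 55, 30, 10, 0]
numbers = [961, 900, 800, 700, 600, 500, 400, 300, 200, 100, 1]

def fit_line(xs, ys):
    """Ordinary least squares for y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

gradient, intercept = fit_line(x, numbers)

# Solve gradient * days + intercept = 1000 for days.
days_to_1000 = (1000 - intercept) / gradient
start = datetime.datetime(2012, 12, 7)
projected_date = start + datetime.timedelta(days=days_to_1000)
print(round(gradient, 2), projected_date.date())
```

The gradient works out at a little over 8 new members a day, so 1000 members is reached about 124 days after the community started.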

To project the linear fit to see what number of members we could hope to have after a year I do the following:


```
extra_date = min(dates) + datetime.timedelta(days=365)
projection = gradient * (extra_date - min(dates)).days + intercept
```

Finally, to plot all of the above I use `pyplot`:

```
import matplotlib.pyplot as plt
plt.figure(1, figsize=(15, 6))
plt.plot_date(dates, numbers, label="Data")
plt.plot_date(dates + [extra_date], numbers + [projection], '-', label="Linear fit (%.2f join per day)" % gradient)
plt.legend(loc="upper left")
plt.grid(True)
plt.title("Rugby Union Google Plus Community Member Numbers")
plt.savefig('Rugby_Union_Community_Numbers.png')
```

The output is given here (there's something not great with the graph: I don't know why a line is being plotted between the data points but I haven't been able to fix that easily):

We should have about 3000 members by the end of 2013 **and most importantly** we can expect to go past the 1000 member mark on the 9th of April 2013 at 21:38 (+Davide Coppola and +Andrew Byrne: set your alarm clocks).
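On the stray line in the graph: one likely explanation is that the `'-'` series in the plotting code is given the observed `numbers` plus a single projected value, so the line joins up the observations rather than following the fit. Here's a minimal sketch of one way round that, drawing the line through two points that sit on the fitted line. It assumes `numpy` is installed and uses `np.polyfit` as a stand-in for `stats.linregress`. The names `fit_days`, `fit_dates`, `fit_values` and `out_path` are my own, and the image is saved to the temp directory purely for illustration.

```python
import datetime
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np

# Days since 2012-12-07 and member counts, taken from the data above.
x = [116, 109, 96, 86, 74, 65, 63, 55, 30, 10, 0]
numbers = [961, 900, 800, 700, 600, 500, 400, 300, 200, 100, 1]
start = datetime.datetime(2012, 12, 7)
dates = [start + datetime.timedelta(days=d) for d in x]

# A degree-1 polyfit is an ordinary least squares straight line.
gradient, intercept = np.polyfit(x, numbers, 1)

# Two points that lie on the fitted line: day 0 and day 365.
fit_days = [0, 365]
fit_dates = [start + datetime.timedelta(days=d) for d in fit_days]
fit_values = [gradient * d + intercept for d in fit_days]

fig, ax = plt.subplots(figsize=(15, 6))
ax.plot(dates, numbers, "o", label="Data")
ax.plot(fit_dates, fit_values, "-",
        label="Linear fit (%.2f join per day)" % gradient)
ax.legend(loc="upper left")
ax.grid(True)
ax.set_title("Rugby Union Google Plus Community Member Numbers")
out_path = os.path.join(tempfile.gettempdir(), "fit_line_sketch.png")
fig.savefig(out_path)
```

With only two points on the line, the data markers and the fit are drawn as separate series, so nothing joins the observations together.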

**Note** that you would in fact want to fit a much more complex model than a simple linear fit to actually try and forecast any of what I've done above. I was in fact surprised at how linear the fit was...

It's been a quick bit of fun and could perhaps prove useful to some as an example of how to handle dates (and do a simple linear regression) in Python.

I've actually got a couple of screencasts that show how to handle dates in R and SAS which I'll put here in case they're of use to anyone.
