I'm a member of the Rugby Union community on Google Plus and it's a great place to chat about rugby and share stuff. As I said in this post about G+, I don't post publicly about rugby, as I assume that most people who circle me won't be too interested in my opinions on it, so it's nice to have a place to go. Anyway, I really recommend the community: it's a nice place if you like the game.
+Davide Coppola, who's the owner of the community, has been posting an announcement every time we go up by 100 members. We're currently at about 960 people and we've been wondering when we'll go past the 1000 member mark.
I thought I'd code a very simple bit of Python to make an educated guess, but also to see what member numbers have looked like.
Here's the data:
Date,Number
2013-04-02,961
2013-03-26,900
2013-03-13,800
2013-03-03,700
2013-02-19,600
2013-02-10,500
2013-02-08,400
2013-01-31,300
2013-01-06,200
2012-12-17,100
2012-12-07,1
First of all I needed to import all the data (I've posted a short screencast about handling CSV data in Python and here's an old blog post with the code):
import csv

# newline="" is what the csv module recommends for text-mode files in Python 3
with open("Data.csv", newline="") as infile:
    data = [row for row in csv.reader(infile)]
Once that is done I use the datetime Python module to convert the dates, read as strings, to actual dates, and also to create a list containing the member numbers:
import datetime

# Each row is [date_string, number_string]; data[0] is the header row
dates = [datetime.datetime.strptime(row[0], '%Y-%m-%d') for row in data[1:]]
numbers = [int(row[1]) for row in data[1:]]
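To try the parsing step without a Data.csv file on disk, here is a minimal sketch that reads the same rows from a string. The names RAW and parse_members are made up for this illustration and are not part of the original script.

```python
import csv
import datetime
import io

# The data from the post, inlined so the sketch runs without Data.csv.
RAW = """Date,Number
2013-04-02,961
2013-03-26,900
2013-03-13,800
2013-03-03,700
2013-02-19,600
2013-02-10,500
2013-02-08,400
2013-01-31,300
2013-01-06,200
2012-12-17,100
2012-12-07,1
"""


def parse_members(text):
    """Return (dates, numbers) parsed from CSV text that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    body = rows[1:]  # skip the "Date,Number" header
    dates = [datetime.datetime.strptime(row[0], "%Y-%m-%d") for row in body]
    numbers = [int(row[1]) for row in body]
    return dates, numbers


dates, numbers = parse_members(RAW)
print(len(dates), min(dates).date(), max(numbers))
```

Using int rather than eval means a malformed cell raises a ValueError instead of being executed as code.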
I will use the dates list later on to plot things nicely, but for now I need to consider numeric data (to fit a linear model). I convert the dates list to a list of numbers counting the days that have passed since the first day of the community:
x = [(e - min(dates)).days for e in dates]
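As a small worked example of that conversion, here are three of the dates from the data (kept newest first, as in the CSV). Subtracting two datetime objects gives a timedelta, and its days attribute is the whole number of days between them. The name first_day is only used here for illustration.

```python
import datetime

# Three of the dates from the data above, newest first as in the CSV.
dates = [
    datetime.datetime(2013, 4, 2),
    datetime.datetime(2012, 12, 17),
    datetime.datetime(2012, 12, 7),
]

first_day = min(dates)  # computed once rather than once per element
x = [(d - first_day).days for d in dates]
print(x)  # [116, 10, 0]
```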
Once I've done that I use the stats subpackage from the scipy library to carry out a simple linear regression:
from scipy import stats

gradient, intercept, r_value, p_value, std_err = stats.linregress(x, numbers)
(I in fact only need the gradient and the intercept from the above, but the rest is all there in case I want it.)
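For anyone curious about what the gradient and intercept are, here is a rough pure Python sketch of an ordinary least-squares line fit. It is not how scipy implements linregress, just the textbook formulas; the function name least_squares is made up for this example, and the x values are the day offsets worked out from the data above.

```python
def least_squares(xs, ys):
    """Return (gradient, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y co-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Day offsets from 2012-12-07 and member numbers, in the order of the CSV.
x = [116, 109, 96, 86, 74, 65, 63, 55, 30, 10, 0]
numbers = [961, 900, 800, 700, 600, 500, 400, 300, 200, 100, 1]

gradient, intercept = least_squares(x, numbers)
print(round(gradient, 2), round(intercept, 2))
```

The gradient is the number of new members per day and the intercept is the fitted number of members on day zero.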
To find out when we can expect to go past 1000 members (assuming a linear model of growth):
projected_date = min(dates) + datetime.timedelta(days=(1000 - intercept) / gradient)
To project the linear fit to see what number of members we could hope to have after a year I do the following:
extra_date = min(dates) + datetime.timedelta(days=365)
projection = gradient * (extra_date - min(dates)).days + intercept
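Both projections come from the same line, members = gradient * days + intercept. Solving for days gives days = (target - intercept) / gradient, which is what the 1000 member calculation does. Here is a small sketch of the two steps as functions; crossing_date and projected_members are hypothetical names, and the gradient and intercept used are round illustrative numbers, not the fitted values.

```python
import datetime


def crossing_date(first_day, gradient, intercept, target):
    """Date at which the line gradient * days + intercept reaches target."""
    days = (target - intercept) / gradient
    return first_day + datetime.timedelta(days=days)


def projected_members(gradient, intercept, days):
    """Value of the fitted line a given number of days after the first day."""
    return gradient * days + intercept


first_day = datetime.datetime(2012, 12, 7)
# Illustrative round numbers (10 joins per day, starting from 0), NOT the fit.
gradient, intercept = 10.0, 0.0

print(crossing_date(first_day, gradient, intercept, 1000))  # 2013-03-17 00:00:00
print(projected_members(gradient, intercept, 365))  # 3650.0
```

timedelta accepts a fractional number of days, which is why the projected date can come with a time of day attached.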
Finally, to plot all of the above I use:
import matplotlib.pyplot as plt

plt.figure(1, figsize=(15, 6))
plt.plot_date(dates, numbers, label="Data")
plt.plot_date(dates + [extra_date], numbers + [projection], '-', label="Linear fit (%.2f join per day)" % gradient)
plt.legend(loc="upper left")
plt.grid(True)
plt.title("Rugby Union Google Plus Community Member Numbers")
plt.savefig('Rugby_Union_Community_Numbers.png')
The output is given here (there's something not great with the graph: I don't know why a line is being plotted between the data points but I haven't been able to fix that easily):
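A likely cause of the extra line: the "fit" series is drawn through dates + [extra_date] and numbers + [projection], so the '-' style joins up the observed data points before heading off to the projection. One possible fix, sketched below, is to draw the fitted line from its own two endpoints instead. The helper names fit_line_endpoints and plot_members are made up for this sketch, and it uses plt.plot (which accepts datetime values) rather than plot_date.

```python
import datetime


def fit_line_endpoints(first_day, gradient, intercept, horizon_days):
    """Return ([start, end], [value_at_start, value_at_end]) for the fitted line."""
    end = first_day + datetime.timedelta(days=horizon_days)
    return [first_day, end], [intercept, gradient * horizon_days + intercept]


def plot_members(dates, numbers, gradient, intercept, horizon_days=365,
                 filename="Rugby_Union_Community_Numbers.png"):
    """Plot the data as points and the fit as a single straight line."""
    # Imported here so fit_line_endpoints can be used without matplotlib.
    import matplotlib.pyplot as plt

    line_dates, line_values = fit_line_endpoints(
        min(dates), gradient, intercept, horizon_days)
    plt.figure(1, figsize=(15, 6))
    plt.plot(dates, numbers, "o", label="Data")
    plt.plot(line_dates, line_values, "-",
             label="Linear fit (%.2f join per day)" % gradient)
    plt.legend(loc="upper left")
    plt.grid(True)
    plt.title("Rugby Union Google Plus Community Member Numbers")
    plt.savefig(filename)
```

Calling plot_members(dates, numbers, gradient, intercept) with the values computed earlier should then give the data as markers with one straight line for the fit.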
We should have about 3000 members by the end of 2013 and, most importantly, we can expect to go past the 1000 member mark on the 9th of April 2013 at 21:38 (+Davide Coppola and +Andrew Byrne: set your alarm clocks).
Note that you would in fact want to fit a much more complex model than a simple linear fit to actually try and forecast any of this. I was in fact surprised at how linear the growth was...
It's been a quick bit of fun and could perhaps prove useful to some as an example of how to handle dates (and do a simple linear regression) in Python.
I've actually got a couple of screencasts that show how to handle dates in R and SAS which I'll put here in case they're of use to anyone.