Sunday, 16 June 2013

Why and how: open education resources.

This is the fourth post in a series reflecting on the teaching and learning in a recent course I've taught on SAS and R. This post will be quite different in nature to the previous posts, which looked at students' choices between SAS and/or R in their assessment.
Here I would like to talk about how I deliver teaching materials (notes, exercises, videos etc) to my students.

All of the teaching materials for the course can be found here: drvinceknight.github.io/MAT013/.

How they got there and why I think it's a great place for them to be will be what I hope to discuss...

A Virtual Learning Environment that was not as good as alternatives.

At +Cardiff University we have a VLE that is automatically available to all our students and that all lecturers are encouraged to use. So when I started teaching I diligently started using the service but it had various aspects that did not fit well with my workflow (having to upload files on every change, a clunky interface that actually seemed optimised for IE and various other things). It was also awkward (at the time, I believe that this has been addressed now) for students to use the environment on smart phones etc...

As an alternative, I set up a very simple website using Google Sites and would use +Dropbox's public links to link pdfs and other resources for my students. An example of such a delivery is these basic Game Theoretical materials. This gave me great control: I no longer had to mess around with uploading versions of files, every change I made was immediately online and, as the site was pretty simple (links and pdfs), it was easily accessible to students on all platforms (I could also include some YouTube videos).

An immediate consequence of this approach is that my materials are all publicly available online.

To anyone, our students or not. The first thing I did was check with +Paul Harper (the director of the MSc course, which was the only course I was teaching on at the time) that this was ok. We chatted about it a bit and were both happy to carry on. My main train of thought was that there are far better resources already available online so mine might as well be. (I've subsequently checked with our School's director of learning and teaching and there are no internal regulations against it, which is nice to know about +Cardiff University.)

There is a huge amount of talk about open access in research (I won't go in to that here) but less so to some extent in teaching. I did find this interesting newspaper article that ponders as to "Why don't more academics use open educational resources?". This offers a good general discussion about open education resources.

I would feel very very humbled if anyone chose to actually use my resources. I'm at the early part of my career and am still learning so I don't think that will happen anytime soon but there is another more important benefit to having my teaching stuff online for everyone.

I always post about any new courses I'm working on, on G+ and am grateful to get a fair bit of feedback from other academics around the world. This in itself gives me a certain level of confidence in front of my students who know that what I'm teaching them is verifiable by anyone in the world. I've often changed a couple of things based on feedback by other academics and I think that's brilliant.

To some extent my teaching resources are not just reviewed by a couple of peers in my university but also by anyone online who might be interested in them.

(It would be great if research worked this way too)

Through G+ (I've posted about how awesome a tool G+ is as an academic personal development tool) I learnt about git and github. If you don't know about git, this video by +Zoë Blade is very helpful:


After a while I jumped in and started using it. After a little while longer I found out that you can use github to host a website:


Using this it is really easy to put together a very basic website that has all the teaching materials. The added benefit is that the materials are now all in a github repo which opens them up even more: using Dropbox, in general only the pdf files were in view, whereas now everything is (md, tex source files etc...) and theoretically if anyone wanted to they could pull etc...

I'm certainly not the first person to put teaching stuff up on github (watching people like +Dana Ernst, +Theron Hitchman and various others do it is what made me jump in).

The github repo for my R and SAS course can be found here and here are some other teaching things I have up on github (with the corresponding webpage if I've gotten around to setting it up):
To finish off here are the various reasons I have for putting my teaching stuff up on github:
  • Openness:
    • my students know that this is viewable by everyone which hopefully gives the resources a level of confidence;
    • people on G+ and elsewhere are able to point out improvements and fixes (if and when they have time);
  • Access: the sites are really simple (basic html with links) so they can be viewed on more or less anything;
  • Ease of use: I don't have to struggle to use whatever system is being used. If it's an option I kind of refuse to use stuff that makes me less efficient (an example of this is our email system: I use gmail). At the moment the system I like is github + git.
I wrote a blog post (which is the most read thing I've ever written - online or offline - I think) showing how to combine various things like tikz, makefiles, +Sage Mathematical Software System etc to automate the process of creating a course site so I'll put a link to that here.

Sunday, 9 June 2013

Comparing Recursive and Iterative Algorithms: Binary Search and Factorial

I'm in the middle of putting together a new course for our undergraduates at Cardiff University. The course is called 'Computing for Mathematics' and will introduce our first year students to programming in general (using python) as well as how a mathematics package can help them during their degree (we'll be using +Sage Mathematical Software System which is a natural extension from python and is also super awesome).

I was prepping some stuff on recursion (which I'm really looking forward to teaching to our mathematics students given the connection to induction) and came across a bunch of posts stating the lack of speed generally associated to recursion:
I thought I'd write some (python) code to see how much slower recursion was. All the code (and data) is in this github repo.


Binary search


The first algorithm I thought I'd take a look at was binary search. I tried to write each algorithm in as basic a way as possible so as to allow for the best possible comparison.

Iterative

Here's the algorithm written iteratively:

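def iterativebinarysearch(target):
    """
    Code that carries out a binary search on the sorted global list `data`.
    Returns the index of target, or False if target is not in data.
    """
    first = 0
    last = len(data) - 1  # last valid index (len(data) would overrun the list)
    found = False
    while first <= last and not found:
        index = (first + last) // 2  # integer midpoint
        if target == data[index]:
            found = True
        elif target < data[index]:
            last = index - 1
        else:
            first = index + 1
    return index if found else False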

Recursive

And here's the algorithm written recursively:

def recursivebinarysearch(target, first, last):
    """
    Code that carries out a recursive binary search on the sorted global
    list `data`, looking between indices first and last (inclusive).
    """
    if first > last:
        return False  # target is not in data
    index = (first + last) // 2  # integer midpoint
    if target == data[index]:
        return index
    if target < data[index]:
        return recursivebinarysearch(target, first, index - 1)
    else:
        return recursivebinarysearch(target, index + 1, last)

The experiment

I timed 10 runs of each of these algorithms on data sets of varying size, for each size choosing a random 1000 points to search. The data is all available in this github repo.
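A minimal sketch of how a timing experiment like this could be set up, assuming the standard library's `timeit` and `random` modules. This is not the harness from the repo: the names `binary_search_iterative`, `binary_search_recursive` and `time_searches`, the data sizes and the use of a sorted list of even integers are all assumptions made for illustration. The sketch also passes `data` explicitly and returns `None` for a missing target so that index 0 cannot be confused with "not found".

```python
import random
import timeit


def binary_search_iterative(data, target):
    """Return an index of target in sorted data, or None if absent."""
    first, last = 0, len(data) - 1
    while first <= last:
        index = (first + last) // 2
        if data[index] == target:
            return index
        if target < data[index]:
            last = index - 1
        else:
            first = index + 1
    return None


def binary_search_recursive(data, target, first, last):
    """Recursive version of the same search."""
    if first > last:
        return None
    index = (first + last) // 2
    if data[index] == target:
        return index
    if target < data[index]:
        return binary_search_recursive(data, target, first, index - 1)
    return binary_search_recursive(data, target, index + 1, last)


def time_searches(size, searches=1000, runs=10, seed=0):
    """Mean time per search (in seconds) for both versions on one data set."""
    rng = random.Random(seed)
    data = list(range(0, 2 * size, 2))  # sorted, hypothetical data set
    targets = [rng.choice(data) for _ in range(searches)]

    def run_iterative():
        for t in targets:
            binary_search_iterative(data, t)

    def run_recursive():
        for t in targets:
            binary_search_recursive(data, t, 0, len(data) - 1)

    total = searches * runs
    return (
        timeit.timeit(run_iterative, number=runs) / total,
        timeit.timeit(run_recursive, number=runs) / total,
    )


if __name__ == "__main__":
    for size in (10, 1000, 100000):
        iterative, recursive = time_searches(size)
        print(size, iterative, recursive)
```

Both versions search for the same random targets in the same data, so any difference in the two means comes from the implementation and not from the points chosen.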

Here's a scatter plot (with fitted lines) for all the data points:



Apart from the fact that binary search seems very good indeed, there's not that much going on here apart from perhaps a slight tendency for the iterative approach to be a bit slower.

I decided to take a look at the mean time (over the 1000 searches done for each data set):



This seems to show that the iterative approach is slower but again it's not very clear. This is mainly due to the fact that I haven't done any clever analysis. The data sets are pretty big (10,001,000 data points plotted in the 1st graph and 10,001 in the 2nd) so to do anything really useful I'd have to take a look at the data a bit more carefully (the two csv files: 'recursivebinarysearch.csv' and 'iterativebinarysearch.csv' are both on github).

I thought I'd try a 'simpler' algorithm as there are perhaps a bunch of things going on with the binary search (size of data set, randomness of points chosen etc...).

Computing Factorial


The other algorithm I decided to look at was the very simple calculation of $n!$.

Iteration

Here's the simple algorithm written iteratively:

def iterativefactorial(n):
    r = 1
    i = 1
    while i <= n:
        r *= i
        i += 1
    return r

Recursion

Here's the algorithm written recursively:

def recursivefactorial(n):
    if n <= 1:  # base case, also covers 0! = 1
        return 1
    return n * recursivefactorial(n - 1)

The experiment

This was a much easier experiment to analyse. However, as the timings increased I thought it would also be interesting to look at the ratio of the timings:



We see that first of all the iterative algorithm seems to perform better but as the size of $n$ increases this improvement is not as noticeable. My computer maxed out its stack limit so I won't be checking anything further but I wonder if the ratio would ever get bigger than 1... (This data set: 'factorial.csv' is also on github).
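A rough sketch of how such a ratio could be measured, and of where the stack limit comes from. CPython caps the recursion depth (the value reported by `sys.getrecursionlimit()`, 1000 by default), so a recursive factorial raises `RecursionError` once $n$ gets too large. The helper `timing_ratio` and the values of $n$ are assumptions for illustration, and the ratio is taken here as iterative time over recursive time, which is only my reading of the text.

```python
import sys
import timeit


def iterative_factorial(n):
    r = 1
    for i in range(2, n + 1):
        r *= i
    return r


def recursive_factorial(n):
    if n <= 1:
        return 1
    return n * recursive_factorial(n - 1)


def timing_ratio(n, number=200):
    """Iterative time divided by recursive time for computing n!."""
    iterative = timeit.timeit(lambda: iterative_factorial(n), number=number)
    recursive = timeit.timeit(lambda: recursive_factorial(n), number=number)
    return iterative / recursive


if __name__ == "__main__":
    print("recursion limit:", sys.getrecursionlimit())
    for n in (10, 100, 500):
        print(n, timing_ratio(n))
    try:
        # Far deeper than the interpreter allows, so this cannot succeed.
        recursive_factorial(10 * sys.getrecursionlimit())
    except RecursionError:
        print("recursive version hit the stack limit")
```

The limit can be raised with `sys.setrecursionlimit`, but only up to what the underlying C stack tolerates, so the recursive version always has a ceiling that the iterative one does not.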

I'm sure that there's nothing interesting in all this from a computer scientist's point of view but I found it a fun little exercise :)

Saturday, 8 June 2013

Student choices between SAS and R in Individual Coursework

This is the third post in a small series of posts reflecting on my teaching in a new class (all teaching materials can be found here) introducing students to SAS and R on our MSc course. The two previous posts have so far just been a reflection on students' attitudes towards each piece of software.
In the first of those posts I said how I was slightly surprised at how students had chosen to use SAS for 1 particular question where they had the option of the language. In my opinion it was a problem much easier to tackle with R (all the course works, class tests etc can be found on this site). I also mentioned in that post how I asked students which language they preferred. Almost all students answered that it depends on the task (which is a great answer) but after pushing them for a particular decision a strong majority seemed to prefer R. I posted on G+ recently about a particular interaction I've had subsequently with a student which seemed to confirm this attitude of needing to find the correct tool for the correct job.

In the second post I described how students in their group presentations (asking them to teach me something I had not taught them) mostly evaluated SAS v R for certain tasks. It was great to see them identify strengths and weaknesses for each language.

This post is about their choices in their individual coursework.

Similar to their class test (which I discuss in the first post of this series) there was a question which allowed students to choose a language in this individual coursework component of the class (which can be found here).

In my opinion this question made use of quite a big data set (generated by some research I'm currently doing) and I thought it would probably be simpler to approach in SAS. About 52% of the class agreed with me whilst 48% seemed to still prefer R. First of all, I could be wrong and R could indeed be better suited for this question, secondly it might also be a reflection of the personal preferences that the students seemed to indicate when I asked: most students seemed to prefer R. If the latter is the case then I suppose it's nice to see that students not just realise that there's a better tool for a given job but also a better tool for a particular person doing a given job. I'll be keeping a track of this over the years and see how (if) it changes.

In my next post in this series I'll start to reflect on some of the teaching methodologies I used.

Monday, 3 June 2013

An online list of mathematics books

A couple of days ago +James Noble posted a link on G+ to a publicly editable spreadsheet, inviting people to contribute mathematics books.


The link to the sheet is here and so far it has 65 books on it (there is also a meta list of lists that has 8 lists of books, to which I've added these 2 previous posts: about mathed and about game theory). A website I've put together that uses it as an underlying database (which I'll talk about below) can be found here.

I think this is a great idea!

It's completely open and anyone can add to it and also benefit from it. I've used various keywords to search and have found 1 or 2 books that I didn't know about.

Some initial analysis of the list

I decided I'd throw some python at this spreadsheet and first of all took a look at the contributors, in particular how many books individuals are contributing (a link to all the python code is available here).



As you can see about 50% of people contribute more than 1 book (including yours truly).
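A small sketch of how this kind of count could be done with `collections.Counter`. The rows below are made up for illustration and are not taken from the spreadsheet, and the helper names `books_per_contributor` and `share_with_multiple` are hypothetical (the actual analysis code is in the repo linked above).

```python
from collections import Counter

# Hypothetical rows: (contributor, book title). Not taken from the spreadsheet.
rows = [
    ("contributor_a", "Book 1"),
    ("contributor_a", "Book 2"),
    ("contributor_b", "Book 3"),
    ("contributor_c", "Book 4"),
    ("contributor_c", "Book 5"),
    ("contributor_c", "Book 6"),
    ("contributor_d", "Book 7"),
]


def books_per_contributor(rows):
    """Count how many books each contributor has added."""
    return Counter(contributor for contributor, _ in rows)


def share_with_multiple(counts):
    """Fraction of contributors who have added more than one book."""
    return sum(1 for c in counts.values() if c > 1) / len(counts)


counts = books_per_contributor(rows)
print(counts.most_common())
print(share_with_multiple(counts))
```

The same two helpers would work for the author column by swapping which field is counted.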

I was also curious as to how many authors were listed:



Here however it seems rare for an author to have more than 1 book in the list. The most prolific author is J. SCOTT CARTER with 5 books.

Some very basic natural language analysis

Two fields in the data set that I have not yet mentioned are the 'Overview' field and the 'Target' field. These two allow for some free text describing what the book is about and who it is aimed at. I used the nltk python library (python really is crazy, anything and everything has a library) to take a look at the frequency of "uncommon" words in these two fields.
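To give an idea of what such a frequency count involves, here is a sketch that swaps nltk for the standard library, so it is not the analysis used for the plots below. It assumes a tiny hand-written list of common words in place of a proper stop word corpus, and the example 'Overview' entries and the name `uncommon_word_counts` are invented for illustration.

```python
import re
from collections import Counter

# A tiny hand-written stop list, standing in for a proper stop word corpus.
COMMON_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def uncommon_word_counts(texts):
    """Count lowercase words across texts, skipping the common ones."""
    words = []
    for text in texts:
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(w for w in words if w not in COMMON_WORDS)


# Hypothetical 'Overview' entries, not copied from the spreadsheet.
overviews = [
    "An introduction to the history of mathematics",
    "The philosophy and history of teaching",
]
print(uncommon_word_counts(overviews).most_common(3))
```

A real stop word list is much longer, which is one way words that say little about a book could be kept out of the counts.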

First of all the description of the books:



Nothing too surprising here (I should perhaps remove certain words from this frequency analysis) but it's cool to see that "History" is pretty high up, as well as "Teaching" and "Philosophy".

The final piece of analysis is the frequency of the words in the description of who the book is for:



An initial glance at this seems to indicate that the books on the list are mainly aimed at students and/or teachers. It would be nice I suppose to see more books for research...

A very basic static website

The other thing that the code I've put together does is write a website that is hosted using github pages and lists the various books (as well as giving an up to date analysis of what you see above). The website can be found here.


The website is very very basic and all static but I've scripted everything (including the download of the spreadsheet although I haven't included that on github yet) so if it's helpful and anyone wants me to update the site just let me know (give me a nudge on G+ and I even check twitter sometimes).

The github repo for all this is here. If you have any ideas for further stuff that could be done to this data set then please improve my analysis or just let me know :) It would be nice to do something slightly smarter with the nltk and also do some analysis of the links between the books (although it would probably need a bunch more books... for that to be insightful)...

A really great job by +James Noble 

I think stuff like this is really great and it's been nice chatting to James about it.


It was cool to watch the list grow (I saw James's post pretty early so got to watch people make a bunch of contributions) and it'd be nice to see it grow even more.


Saturday, 25 May 2013

Student choices between SAS and R in teaching presentations.

This is my second post in a series of posts (the first one is here) about a SAS/R course that I've recently finished teaching to MSc students at +Cardiff University.

I taught this course using a hybrid of flipped classrooms / IBL (although I'm cautious when using the term IBL as I'm not entirely sure my approach fits with any variation of Moore's methods). I gave students access to all the content of the class beforehand (including notes, exercises and a series of screencasts - all the materials are here if they're of interest). The students were then given "challenges" and had to deliver their solutions as presentations to the other students in the class. The aim of this was to get the students to teach themselves/each other and quite often I would not actually have to say much at all during a class (this allowed for a better use of 'me' by the students during the lab sessions).

(Here's a previous post about flipping the classroom and here's one about IBL)

In the previous post I described how students chose to use SAS and/or R in their class test. Most students chose SAS despite displaying a preference for R when asked.

The above is 1 of 3 assessments that the students have had to go through. This post is about the second assessment: a group presentation. In this presentation I asked students to teach an aspect of SAS or R that had not been covered in class (you can see the brief here).

I believe that this is a particularly important thing to assess as I in no way can pretend to teach them everything. It's important that they know how to learn new things that they might need in their career.

I was expecting groups to select a particular language and then a particular topic but interestingly most groups chose to look at both languages and compare strengths and merits.
I had 6 groups and here are the subjects that they looked at:
  • Time series forecasting: both in SAS and R;
  • Principal component analysis: both in SAS and R;
  • Random sampling: both in SAS and R;
  • Survey sampling (in SAS) and creating a gif of the Mandelbrot set (in R);
  • Scorecard building in SAS;
  • Mapping and spatial analysis: in R.
The 3 groups who carried out a single thing in both SAS and R did a good job of describing strengths and weaknesses of both languages. It was a pleasure to see and again reassures me that an important message has gotten across to the students which is that there is not 1 best tool but an appropriate tool for a particular job.

I'm planning on putting the code/slides up for these talks so that they can serve as resources for students doing the course next year but I want to wait till I've marked their final piece of work and asked the students if they mind. In the mean time I'll repost this gif of the Mandelbrot set made by 1 of the groups (I thought this was cool!):



I still want to write generally about the teaching/learning methods in this class and will do that later but if it's of interest my PCUTL portfolio is available here and in there I describe and justify a lot of what I'm doing.

Monday, 20 May 2013

Probability of saying 'yes' to academic responsibilities

I've just read a great post by +Adriana Salerno: Learning to say no.

In that post Adriana discusses how in mathematics (and I'm sure a bunch of other/most fields) one needs a long period of uninterrupted time to work on research. She links to this Big Bang Theory clip:



She also however talks about how as an early career researcher it's important to take opportunities for responsibilities as and when they come. This is something that rings very true to me. Growing up I played a lot of rugby and basically had a "Say yes to coach" attitude ("Vince, you're slow, run sprints" - "Yes coach", "Vince, you're going to sit on the bench this week" - "Yes coach" etc... - Although I actually said "Oui Monsieur" as all my rugby was played in France, but I digress...).

I've kind of taken that attitude in to the early days of my career (I'm still a 'young pup' academia wise) but I also am very grateful for every opportunity that gets sent my way (I'm very lucky to be sitting on various committees, the editorial boards for a couple of journals and am in the middle of preparing not 1 but 2 brand new courses which is a great opportunity as opposed to being given other people's courses!).

Having said that, as Adriana points out in her blog it's important to find a balance so that I can also do some research.

The point of this post is not to say that I've figured out how to do that but to post this xkcd style graph that I made using this package on github: XKCDify.

If this was done by Randall Munroe the Alt Text would be far better...


I'm about at the point where the solid line meets the dashed line (ie the "unknown" for me). I suspect that I'm still being quite optimistic as to how low the probability of saying yes will go for me as I still generally do as I'm told and appreciate the opportunities greatly :)

In Adriana's post she talks about a "research day", I might try to be strict on that...

PS Here's another similar kind of graph that +Paul Harper (my head of research group) put together when he was actually looking back a bit on his 10 years in Academia.

(If anyone's interested here's the repo with the code I used to get that plot, I actually used +Sage Mathematical Software System 's find_fit command to fit a quintic to the few points I wanted to have on there... There might be a better way to do that though...)

Saturday, 18 May 2013

Student choices between SAS and R

I'm going to be writing a couple of posts looking back at a class I've taught that's just coming to an end (at the time of writing this I've got one more group presentation to see).

The course teaches SAS and R in parallel on our MSc course (if it's of interest all the teaching materials are here).

I'll be blogging about this class as I taught it a bit differently to the usual "Students Listen - Teachers Lecture" style. I'll get back to that more in future posts (although a lot of what I've done is in my PCUTL module 2 portfolio).

The purpose of this post will be to briefly discuss two questions that were on my class test that I feel give some (very shallow) information as to how the students experienced the course. (The class test was made of 4 questions: q1 - a simple task to be performed in both SAS and R, q2 - a task in SAS, q3 - a task in R and finally q4 - a task in either language.)

The class is taught over 5 weeks:
  1. Week 1: Introduction and basic statistics
  2. Week 2: Data manipulation
  3. Week 3: Programming
  4. Week 4: Extras (for example we take a look at proc optmodel and ggplot2).
  5. Week 5: A 2 hour class test
The first question on the class test (you can see it here) asked the students to rank their enjoyment of each week (the purpose of this question was to give them a nice easy starting point). So a ranking of 1 implied a favorite week while a ranking of 5 implied the least favorite.

This plot shows the mean ranking given to each week:


This data and the following discussion should be taken with no implied rigour: I'm not analysing this too closely and also students might just have written down any sequence of rankings without thinking about things too much (this was after all a test).


First of all it does seem that the students enjoyed the class test the least (which I guess is to be expected).

Secondly, it looks like the first week was perhaps less enjoyed in general.

I think this is also to be expected: I taught the class in a way that I don't believe would have been familiar to the students (I tried to encourage them to teach each other and themselves) so perhaps that first week was just a bit too unfamiliar. I'll try and rectify that in future years (if only by pointing students to what students from previous years did).

Now for the second point.



By design I hope that the students learn how to carry out various programming tasks in SAS and R seeing the strengths and weaknesses of each language as they go. The last question of the class test (again here) involved a bit of data manipulation on small data sets and I believe that the main difficulty was that this question did not force a language on the students. In essence choosing the language was the most important point of the question.

My personal approach would have been to use R for this particular question but interestingly most students chose SAS. Some used a combination (often starting in SAS before realising that perhaps R was better suited which led to a bit of a clumsy hybrid). On average SAS was used by 77% of the students for question 4 (some of which managed the task very well!).

During the group presentations I've been asking students afterwards a "this does not count" question (ie making it clear that there's no wrong or right answer to this and that I'm simply interested/curious):
'If you were starting a consultancy company tomorrow but could use only one package, either SAS or R, which would you pick?'
The really pleasing thing is that almost all of the students miss the constraint in my question and immediately reply something like:
"It depends on the kind of consultancy we'll be doing."
When I re-iterate the constraint (after telling them that that's the actual right answer :)), I'd say that a majority of students seem to prefer R. Perhaps the bias towards SAS in the class test was due to the conditions (time was short) but overall it's been nice to see that most students realise that it's about finding the right tool for a particular job.

I'm yet to see the individual course work that they'll be handing in this week, which is of a very similar format. I wonder which language they'll have picked for Q4...

EDIT: Here's the next post in this series (looking at choices between SAS and R during teaching presentations).