Monday, 3 June 2013

An online list of mathematics books

A couple of days ago +James Noble posted a link on G+ to a publicly editable spreadsheet, inviting people to contribute mathematics books.


The link to the sheet is here and so far it has 65 books on it. There is also a meta list of lists that has 8 lists of books, to which I've added these 2 previous posts: about mathed and about game theory. A website I've put together that uses the sheet as an underlying database (which I'll talk about below) can be found here.

I think this is a great idea!

It's completely open and anyone can add to it and also benefit from it. I've used various keywords to search and have found 1 or 2 books that I didn't know about.

Some initial analysis of the list

I decided I'd throw some python at this spreadsheet and first off took a look at the contributors, in particular how many books each individual is contributing (a link to all the python code is available here).



As you can see about 50% of people contribute more than 1 book (including yours truly).
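For anyone wanting to try this sort of count themselves, here is a minimal sketch of how it could be done (this is not the code from the repo). It assumes the spreadsheet has been exported as CSV and that the columns are called "Title" and "Contributor"; both the column names and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV export of the spreadsheet.
# The column names "Title" and "Contributor" are assumptions.
SAMPLE_CSV = """Title,Contributor
Book A,Alice
Book B,Bob
Book C,Alice
Book D,Carol
"""


def books_per_person(csv_text, column="Contributor"):
    """Count how many rows (books) are listed against each name in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader if row[column].strip())


def share_with_more_than_one(counts):
    """Fraction of people who have more than one book against their name."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)


counts = books_per_person(SAMPLE_CSV)
share = share_with_more_than_one(counts)
print(counts.most_common())
print("Share with more than 1 book: {:.0%}".format(share))
```

Pointing the same function at a different column (say an "Author" column, if the sheet has one under that name) would give the equivalent count for authors.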

I was also curious as to how many authors were listed:



Here however it seems rare for an author to have more than 1 book in the list. The most prolific author is J. SCOTT CARTER with 5 books.

Some very basic natural language analysis

Two fields in the data set that I have not yet mentioned are the 'Overview' field and the 'Target' field. These two allow for some free text describing what the book is about and who it is aimed at. I used the nltk python library (python really is crazy, anything and everything has a library) to take a look at the frequency of "uncommon" words in these two fields.
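To give an idea of what this kind of frequency count involves, here is a rough sketch using only the standard library, with `collections.Counter` standing in for nltk's frequency distribution. The stopword list is a tiny hand-written one and the sample overviews are invented, so treat it as an illustration of the idea and not as the analysis I actually ran.

```python
import re
from collections import Counter

# A tiny hand-written stopword list, for illustration only;
# nltk ships a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "is", "with"}


def uncommon_word_counts(texts, stopwords=STOPWORDS):
    """Count words across all texts, ignoring case and skipping stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


# Invented free-text entries standing in for the 'Overview' field.
overviews = [
    "A history of the calculus",
    "Teaching and the philosophy of mathematics",
    "The history of mathematics for students",
]

word_counts = uncommon_word_counts(overviews)
top = word_counts.most_common(2)
print(top)
```

The same function could be run over the 'Target' field by passing in that column's text instead.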

First of all the description of the books:



Nothing too surprising here (I should perhaps remove certain words from this frequency analysis) but it's cool to see that "History" is pretty high up, as well as "Teaching" and "Philosophy".

The final piece of analysis is the frequency of the words in the description of who the book is for:



An initial glance at this seems to indicate that the books on the list are mainly aimed at students and/or teachers. It would be nice I suppose to see more books for research...

A very basic static website

The other thing that the code I've put together does is write a website that is hosted using github pages and lists the various books (as well as giving an up-to-date analysis of what you see above). The website can be found here.


The website is very very basic and all static but I've scripted everything (including the download of the spreadsheet although I haven't included that on github yet) so if it's helpful and anyone wants me to update the site just let me know (give me a nudge on G+ and I even check twitter sometimes).
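As a flavour of what "scripted and static" means here, the following is a minimal sketch of turning a list of books into a single HTML page. It is not the code in the repo: the function name, the "Title" and "Author" keys and the sample entry are all assumptions made for the example.

```python
import html


def render_book_list(books, title="Mathematics books"):
    """Return a complete static HTML page listing each book's title and author."""
    items = "\n".join(
        "    <li><em>{}</em> by {}</li>".format(
            html.escape(book["Title"]), html.escape(book["Author"])
        )
        for book in books
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>{t}</title></head>\n"
        "<body>\n  <h1>{t}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    ).format(t=html.escape(title), items=items)


# Invented entry; the keys "Title" and "Author" are assumed column names.
books = [{"Title": "Example Book <Vol. 1>", "Author": "A. N. Other"}]
page = render_book_list(books)
print(page)
```

Writing `page` out to an `index.html` file and pushing it to the repo is then all that a static host needs. Escaping the fields matters because the spreadsheet is publicly editable, so anything could end up in it.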

The github repo for all this is here. If you have any ideas for further stuff that could be done to this data set then please improve my analysis or just let me know :) It would be nice to do something slightly smarter with the nltk and also do some analysis of the links between the books (although it would probably need a bunch more books... for that to be insightful)...

A really great job by +James Noble 

I think stuff like this is really great and it's been nice chatting to James about it.


It was cool to watch the list grow (I saw James's post pretty early so got to watch people make a bunch of contributions) and it'd be nice to see it grow even more.