My name is Vince Knight and I'm a lecturer in Operational Research at Cardiff University with interests in game theory and queueing theory. I'll be using this blog to post about various things mainly including math and software...
Friday, 11 October 2013
Revisiting the relationship between word counts and code word counts in LaTeX documents
In this previous post I posted some Python code that would recursively search through all directories in a directory and find all .tex files. Using texcount and wc, the script would return a scatter plot of the number of words against the number of code words with a regression line fitted.
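For anyone who hasn't seen that post, here's a rough sketch of the idea. It is not the original script: the function names are made up for illustration, the code word count is done in pure Python as a stand-in for `wc -w`, and the texcount call assumes that `texcount` is on your PATH and that the `-1 -sum` flags print a single total (check this against your version). The regression is a plain least squares fit, with the plotting left out.

```python
import os
import subprocess


def find_tex_files(root):
    """Recursively collect the paths of all .tex files under root."""
    tex_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".tex"):
                tex_files.append(os.path.join(dirpath, name))
    return sorted(tex_files)


def count_code_words(path):
    """Count whitespace-separated tokens in the raw file.

    This approximates `wc -w` without shelling out (an assumption of this
    sketch, not necessarily what the original script did).
    """
    with open(path, errors="ignore") as f:
        return len(f.read().split())


def count_words(path):
    """Ask texcount for the number of actual words in the document.

    Assumes texcount is installed and that `-1 -sum` prints one total.
    """
    result = subprocess.run(
        ["texcount", "-1", "-sum", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.split()[0])


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

To use it you would call `find_tex_files` on a directory, build one list of `count_code_words` values and one of `count_words` values, and pass them to `fit_line` as `xs` and `ys`. The slope is then the ratio of words to code words that the plots are showing.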
Here's the plot from all the .tex files on my machine:
That post got quite a few views, and Robert Jacobson was kind enough not only to fix and run the script on his machine but also to send over his data. I subsequently tweaked the code slightly so that it also returns a histogram. So here are some more graphs:
Robert's teaching tex files:
Robert's research files:
It looks like my .6 ratio between code words and words isn't quite the same for Robert...
BUT if we combine all our files together we get:
So I'm still sticking to the rule of thumb for words in a LaTeX file: multiply your number of code words by .65 to get in the right ball park. (But more data would be cool so please do run the script on your files :)).
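If you want the rule of thumb as code, a minimal sketch could look like this. The function name and the example figure of 10,000 code words are just for illustration, and the .65 default is only the rough ratio from the combined data above.

```python
def estimate_words(code_words, ratio=0.65):
    """Rough estimate of the word count from the number of code words."""
    return round(code_words * ratio)


# A file with 10,000 code words would be roughly 6,500 words.
print(estimate_words(10000))
```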