Investing in maintaining a code library

I have always thought that maintaining a code library is a great way to keep up your chops in a language you do not use every day. A lot of people use listservs and coding fora for this, but a code library has the advantage that, over time, you learn a great deal about one particular subject.

Recently my interest has turned to nonparametric statistical techniques, and for a surprising reason: nonparametric estimators can be visualized. Something about a smooth effects curve flanked by 95% confidence intervals has stuck in my imagination for a few months now. I have been planning a new code library, but cannot make up my mind about which programming language to write it in.
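To make the "smooth curve flanked by 95% confidence intervals" concrete, here is a minimal, hypothetical sketch in Python (one of the candidate languages) of the kind of estimator such a library might contain. It is not code from any existing package, and the function names are my own. It assumes a Nadaraya-Watson (local constant) kernel regression, $\hat m(x) = \sum_i K\big((x - X_i)/h\big) Y_i \,/\, \sum_i K\big((x - X_i)/h\big)$, with a Gaussian kernel $K$, a hand-picked bandwidth `h`, and the textbook pointwise asymptotic 95% interval, which ignores smoothing bias.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_fit(x0, xs, ys, h):
    """Nadaraya-Watson (local constant) estimate of E[y | x = x0], bandwidth h."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


def nw_with_band(grid, xs, ys, h, z=1.96):
    """Return (x, fit, lower, upper) tuples for each point in grid.

    Hypothetical helper. The band is the pointwise asymptotic interval
    fit +/- z * sqrt(R(K) * sigma2(x) / (n * h * f(x))), which ignores
    smoothing bias. For the Gaussian kernel, R(K) = 1 / (2 * sqrt(pi)).
    """
    n = len(xs)
    roughness = 1.0 / (2.0 * math.sqrt(math.pi))
    # Squared residuals at the sample points; smoothing them gives sigma2(x).
    resid_sq = [(yi - nw_fit(xi, xs, ys, h)) ** 2 for xi, yi in zip(xs, ys)]
    out = []
    for x0 in grid:
        fit = nw_fit(x0, xs, ys, h)
        sigma2 = nw_fit(x0, xs, resid_sq, h)
        # Kernel density estimate of the design density f at x0.
        dens = sum(gaussian_kernel((x0 - xi) / h) for xi in xs) / (n * h)
        se = math.sqrt(roughness * sigma2 / (n * h * dens))
        out.append((x0, fit, fit - z * se, fit + z * se))
    return out


if __name__ == "__main__":
    # Simulated data: a sine curve plus Gaussian noise.
    random.seed(0)
    xs = [random.uniform(0.0, 1.0) for _ in range(200)]
    ys = [math.sin(2.0 * math.pi * x) + random.gauss(0.0, 0.3) for x in xs]
    grid = [i / 20 for i in range(21)]
    for x0, fit, lo, hi in nw_with_band(grid, xs, ys, h=0.05):
        print(f"{x0:.2f}  {fit:+.3f}  [{lo:+.3f}, {hi:+.3f}]")
```

Plotting `fit`, `lower` and `upper` against `x` gives exactly that kind of picture. The sketch is deliberately naive: it is O(n²) and uses a fixed bandwidth, whereas a real library would choose `h` from the data (for example by cross-validation, which R's np package offers) and would likely prefer a local linear fit to reduce bias near the edges of the data.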

Here are the options I considered, along with their relative pros and cons.

  1. Matlab/Octave: Matlab is proprietary, and given that I will not be in academia in the near future, this could be a short-lived choice. But coming from a language that is an also-ran by the number-of-users metric, I find Matlab's large reach enticing. Plus, Matlab ages slowly, so current versions stay good for at least a few years. Matlab comes with excellent optimization libraries, and if I write the library in Matlab I aim to explore them carefully. Matlab also has great graphics, which matters for nonparametric statistics. The main drawback is that Matlab is slow, and I am not sure how well it scales to large datasets.
    My plan is to first write a pure-Matlab version of the toolbox (as libraries are called in Matlab) and then fold in MEX code as I, hopefully, get better at it.
    Octave is a nice implementation and also has a MEX-like system. The problem with Octave, I think, is that the quality of its toolboxes drops off fairly quickly.
  2. Python: I have only recently begun using Python. Computer scientists love it, and it is open source. Some economists have recently begun to write econometrics code in Python, most notably John Stachurski at ANU, who provides a fully featured advanced undergraduate textbook in econometrics with Python code examples. The Python code I have read, and the code I have tried to write myself, has turned out very neat, and this is important to me. The problem is that a project very much like the one I had in mind is already being written.
    Another point in favor of Python is that it has great IDEs, including hooks for Visual Studio, which I discovered recently.
  3. R: Now you’d think this would be my first choice for writing a statistics package. But I don’t like R (yet). I find the syntax unreadable, and the scattering of functions across a vast number of packages impossible to keep track of. R is also (very) slow, although apparently it can be made much faster by folding in C++ code. I can’t vouch for that.
    I have an ongoing project translating the empirical examples from Wooldridge’s book into R, but I think R is best suited as a scripting language that leverages the work of others who provide prepackaged routines.
    I have access to RevoR, Revolution Analytics’ repackaging of R, and I like the IDE, but I remain unconvinced. R does have a fantastic library for nonparametric econometrics, the np package.

It is worth spending some time making sure that the chosen language suits the project, because switching, while possible, becomes less likely the longer a project runs, which makes it easy to get locked into a dynamically inefficient equilibrium.

