The Software Sustainability Institute, MathWorks, and the Software Carpentry group recently collaborated to run a course at Manchester University. The event was designed to teach best practices in software engineering to young researchers and mainly focused on three points:
- the command line and shell scripting (mainly in Bash).
- version control, and in particular Git.
- data manipulation, unit testing, and performance considerations in MATLAB.
In this post I’ll highlight what I took away from the course and give links to some useful information.
Working with the shell
Learning to work with the command line is great for productivity. Once you know a few commands, for example, it becomes much faster to move all the files in the current directory into a new folder with
mkdir NewFolder && mv * NewFolder
than by clicking and dragging with a mouse.
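One wrinkle with the one-liner above is that `*` also matches NewFolder itself, so `mv` moves the other files but complains that it cannot move the folder into itself. A minimal sketch of a quieter variant follows; the function name `move_into` is hypothetical, and like `*` it ignores hidden files.

```shell
# Hypothetical helper: move every non-hidden entry in the current
# directory into a target folder, skipping the target folder itself.
move_into() {
    local target=$1
    mkdir -p -- "$target" || return 1
    for entry in *; do
        # Skip the target folder so mv never tries to move it into itself.
        [ "$entry" = "$target" ] && continue
        # Skip the literal "*" left behind when the glob matches nothing.
        [ -e "$entry" ] || continue
        mv -- "$entry" "$target"/
    done
}
```

Calling `move_into NewFolder` from the directory you want to tidy up has the same effect as the one-liner, just without the error message.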
It’s also great for automating repetitive tasks. Recently I was sent some matrices on which to test a new algorithm, but unfortunately they came in a plain text file with a row number printed at the beginning of each line, which I needed to remove. I could have spent hours removing the row numbers by hand, but instead I used the Perl regex substitution
$row =~ s/^\d+\s+//;
to do it all for me.
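The same clean-up can be done without leaving the shell. The sketch below uses `sed` rather than Perl, so it is an equivalent of the substitution above and not the command I actually ran; the function name `strip_row_numbers` and the sample matrix are made up for illustration.

```shell
# Hypothetical helper: delete a leading row number and the whitespace
# after it from every line. Reads the named files, or stdin if none given.
strip_row_numbers() {
    sed -E 's/^[0-9]+[[:space:]]+//' "$@"
}

# Made-up sample: a 2x2 matrix with a row number in front of each row.
printf '1 4.0 1.0\n2 1.0 3.0\n' | strip_row_numbers
# prints:
# 4.0 1.0
# 1.0 3.0
```

Note that this assumes every line really does start with a row number: a row that begins with an integer entry and has no row number would lose that entry.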
The Software Carpentry course notes can be found here. An alternative basic introduction to the shell is available from Lifehacker, and an advanced scripting tutorial from the University of Birmingham can be found here.
On Windows you can install Git and Bash together with Git Bash.
Version control in Git
In my experience a lot of people don’t take version control seriously enough, even though it should be a vital part of any project. Being able to track changes, merge work from multiple collaborators, and revert to a previous version where everything worked (for when things really go wrong) is indispensable. For code projects it’s also nice to have a stable branch where everything works and experimental branches for adding new features.
Personally I use Git not only for code but also for LaTeX documents and other things. I also use the Git commit messages and log as a diary of what I’ve been working on, which is pretty useful a few months later!
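As a rough illustration of the stable-plus-experimental layout and the commit log as a diary, here is a sketch that builds a throwaway repository. The function name, the file `solver.m`, the branch names, and the commit messages are all hypothetical, and the user identity is set locally only so the sketch runs without any global Git configuration.

```shell
# Hypothetical demo: build a throwaway repository with a stable branch
# and an experimental branch, then merge and show the log.
demo_branch_workflow() {
    local repo=$1
    mkdir -p -- "$repo" && cd -- "$repo" || return 1
    git init --quiet
    # Local identity so commits work without global configuration.
    git config user.name "Example Author"
    git config user.email "author@example.com"

    echo 'stable code' > solver.m
    git add solver.m
    git commit --quiet -m "Add working solver"
    git branch -M stable                  # call the current branch "stable"

    git checkout --quiet -b experiment    # new features get their own branch
    echo 'new feature' >> solver.m
    git commit --quiet -am "Try new preconditioner"

    git checkout --quiet stable           # stable is untouched until we merge
    git merge --quiet experiment          # merge once the feature works
    git log --oneline                     # one line per commit: the diary
}
```

Because the function changes directory, it is best run in a subshell, for example `( demo_branch_workflow /tmp/demo-repo )`. On a real project the merge would only happen after the experimental branch had been tested.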
A great introduction to Git, covering local and remote repositories, branching, and other tips, can be found here. I’ve also found the SciPy Development Workflow very helpful: it shows how to contribute to SciPy, but it also serves as a good example of how to use Git on a major project.
Software Engineering in MATLAB
The final part of the course dealt with software engineering practice, using MATLAB as the development environment. The most interesting parts for me were the new unit testing framework (introduced in MATLAB R2013a, but much improved in R2013b) and the performance tips.
The newly introduced function-based unit tests make it really easy to write a lot of tests and analyze the results with minimal effort. The official MATLAB documentation for this is here. I’d also never seen the validateattributes function, which easily checks common properties of an array such as its size, its sparsity, and whether it contains real or complex elements. This condenses a lot of the error checking in a function into just one line!
In terms of performance, we discussed things like preallocating memory and vectorizing loops; the relevant documentation is here and here. Using these techniques as often as possible can lead to enormous gains in performance, so it’s worthwhile reading through them.
Personally I found this course very useful and I’d recommend attending if you get the chance. A list of the currently planned courses is available on this page.