Statistics with Julia

Thu, 31 May 2012 05:38:30 GMT

I first heard about the Julia programming language a little over a month ago, in the middle of February, when its creators published their first blog post: "Why We Created Julia". This was an exciting turn of events.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

This is music to my ears. This is what I want too! The thing that's really exciting is that it actually looks like the language may deliver on these things. And after a very short period of time, it is gaining a significant amount of traction on the Julia developers list (an important indicator for whether a language will succeed). [I was also interested to see that one of the language creators, Stefan Karpinski, was a high school classmate.]

I was recently reading the "Steve Jobs" biography, in which Jobs discusses one major realization after selling the Apple I: to go beyond the geeks, it would be necessary to include the full package, such as a monitor, keyboard, and power supply. Julia is still at this early stage: it must be built from source off GitHub, and it only supports Linux and OS X. But the documentation is already extensive.

I use R and Python for all my research (with Rcpp or Cython as needed), but I would rather not write in C or C++ if I can avoid it. R is a wonderful language, in large part because of the incredible community of users. It was created by statisticians, which means that data analysis lies at the very heart of the language; I consider this to be a major feature of the language and a big reason why it won't get replaced any time soon. Python is generally a better overall language, especially when you consider its blend of functional programming with object orientation. Together with SciPy/NumPy, pandas, and statsmodels, it makes a powerful combination. But Python is still lacking a serious community of statisticians/mathematicians.

There are always other languages to consider. I've tried OCaml, Haskell, J, K, Q, along with Matlab and Mathematica. These are all great languages and platforms. But they are generally lacking something, by either being expensive and closed source or simply lacking features and community support. It wasn't too long ago that people were considering Clojure with Incanter as an alternative. But while Clojure is a nice language (i.e. Lisp is a nice language), Incanter is not a serious option for replacing R. For starters, its performance was worse for very basic operations. And it doesn't have anywhere near the number of libraries for analysis.

Julia and R

My interest has continued to grow with the active involvement of Douglas Bates and Harlan Harris on the Julia discussion list. Bates also wrote a nice blog post showing a performance comparison vs. R and Rcpp. Some of the discussion has been taking place on the Julia developers list:

The addition of a real data frame, and appropriate handling of NA/NaN values, will be a serious addition to Julia.

There has also been some discussion taking place on the R developers list.

The question remains: Is Julia a viable option for statistics and machine learning at this stage? Over the next few weeks, I'm going to start a short blog series working through some simple analysis with the language. My hope is to learn a little about the language and draw some attention to interesting new developments.

[Note: I should also draw attention to Vince Buffalo's post on the same topic.]
