
## Statistics with Julia: The Basics

Thu, 31 May 2012 05:38:33 GMT

Before running any statistical analysis with the Julia programming language, I thought it would be fruitful to start by giving a (very) brief introduction to the syntax and basic language features.

The Julia manual is already very detailed and should be considered the primary source; here I am only going to scratch the surface and put things in perspective (relative to R and Python/Pandas). Julia's syntax mostly resembles MATLAB, so users of that language will be immediately comfortable.

But first, I would also be remiss if I didn't highlight two exciting recent developments:

### Installation

The installation instructions are documented on the GitHub page, and different builds are available for download here, including a Windows build which was just made available thanks to Keno Fischer and Jameson Nash.

Julia is most readily available on Linux or Mac OS X. I am running Julia on Ubuntu. If you want full control over the language (i.e. use the source, Luke), then it may be easier to switch from Windows to Ubuntu (with Wubi) or Debian (with http://goodbye-microsoft.com/). Or there's always the virtual machine approach using VMware and a Bagside application.

### Data Types

Julia is "typeless" like other dynamic languages, but it comes equipped with a really powerful type system. This means that you don't have to declare a variable's type, but you can do so, and you can easily create your own types. Although Julia is dynamically typed, it achieves good performance through type inference combined with JIT compilation via LLVM.
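As a minimal sketch of what optional type declarations and user-defined types can look like, the snippet below defines a small composite type and a method restricted to `Float64` arguments. It assumes a current Julia 1.x release rather than the 2012 build used in this post, and `Observation` and `double` are hypothetical names chosen only for illustration.

```julia
# Assumes Julia 1.x syntax; Observation and double are hypothetical names.

# A user-defined composite type with typed fields.
struct Observation
    label::String
    value::Float64
end

# The ::Float64 annotation is optional; it restricts this method to Float64 input.
double(v::Float64) = 2v

obs = Observation("height", 1.82)
println(obs isa Observation)   # true
println(double(obs.value))     # 3.64
```

Leaving the annotation off (`double(v) = 2v`) would work just as well here; the declaration only narrows which arguments the method accepts.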

Just like in R and Python, a value entered at the Julia REPL gets its type immediately, without any explicit declaration.

```
julia> typeof(1)
Int64

julia> typeof(1.0)
Float64

julia> typeof("hello world")
ASCIIString
Methods for generic function ASCIIString
ASCIIString(Array{Uint8,1},)
```

Julia also has the all-important special values for infinite and undefined results:

```
julia> 1/0
Inf

julia> Inf
Inf

julia> NaN
NaN
```

These are themselves numeric values, so arithmetic on them works, while combining them with a string does not:

```
julia> NaN + 1
NaN

julia> NaN + "a"
+(Float64, ASCIIString) no method +(Float64,ASCIIString)
```
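To make the behaviour of these special values a little more concrete, here is a short sketch, assuming a current Julia 1.x release. It only uses the standard `isnan` and `isinf` predicates and shows that both values are ordinary `Float64` numbers.

```julia
# Assumes Julia 1.x; Inf and NaN are ordinary Float64 values.
println(1 / 0)            # Inf
println(-1 / 0)           # -Inf
println(0 / 0)            # NaN
println(typeof(NaN))      # Float64
println(NaN == NaN)       # false: NaN never compares equal, even to itself
println(isnan(NaN + 1))   # true: arithmetic on NaN propagates NaN
println(isinf(1 / 0))     # true
```

Because `NaN == NaN` is false, `isnan` is the reliable way to test for it.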

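The mathematical focus of Julia is also immediately apparent in the ability to write mathematical formulas without excess notation: a numeric literal placed directly before a variable implies multiplication. Assuming `x` has already been set to 3, for example: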

```
julia> 1.5x^2 - .5x + 1
13.0
```

Also, similar to some Lisps, Julia supports complex and rational numbers (the latter using the // operator). This is a big deal because, beyond everything else, it means that you can avoid some floating-point errors.

```
julia> 2//3 + 1//3 + 2 == 3
true
```
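The floating-point point is easiest to see side by side. The sketch below, assuming a current Julia 1.x release, compares the same sum in binary floating point and in exact rational arithmetic, and adds a small complex-number example using the built-in `im` constant.

```julia
# Assumes Julia 1.x.

# Binary floating point cannot represent 0.1, 0.2 and 0.3 exactly.
println(0.1 + 0.2 == 0.3)          # false

# The same sum with rationals is exact.
println(1//10 + 2//10 == 3//10)    # true
println(2//3 + 1//3)               # 1//1

# Complex numbers are written with the imaginary unit im.
z = 1 + 2im
println(z * conj(z))               # 5 + 0im
println(abs(3 + 4im))              # 5.0
```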

### Arrays

Julia comes with a flexible array type, which can have any number of dimensions.

Most simple arithmetic and logical operators can be used in an element-wise (vectorized) form by prefixing them with a ".".

```
julia> x = randn(10)
[0.120646, 0.857561, 0.819921 ... -1.80995, -0.466323, -0.111218]

julia> 2x
[0.241292, 1.71512, 1.63984, -0.328591 ... -3.61991, -0.932645, -0.222436]

julia> x * x
*(Array, Array) no method *(Array{Float64,1},Array{Float64,1})

julia> x .* x
[0.0145555, 0.735411, 0.67227, 0.026993 ... 3.27594, 0.217457, 0.0123695]
```
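For a reproducible version of the same idea, here is a small sketch with a fixed vector instead of random numbers. It assumes a current Julia 1.x release, where the dot also applies to ordinary function calls (`sqrt.(x)`), which goes beyond the operators shown in the transcript above.

```julia
# Assumes Julia 1.x; a fixed vector keeps the results reproducible.
x = [1.0, 2.0, 3.0]

println(2x)               # [2.0, 4.0, 6.0]  scalar times vector needs no dot
println(x .* x)           # [1.0, 4.0, 9.0]  element-wise product
println(x .^ 2)           # [1.0, 4.0, 9.0]  element-wise power
println(sqrt.(x .* x))    # [1.0, 2.0, 3.0]  dot syntax on a function call
println(sum(x .> 1.5))    # 2                element-wise comparison, then count
```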

Another very useful feature is comprehensions. These are based on set-builder notation in mathematics: you write an expression together with the range of values it should be evaluated over, and the results are collected into an array.

```
julia> [ x^2 | x=1:10 ]
{1, 4, 9, 16, 25, 36, 49, 64, 81, 100}
```

Python users will find this very similar to list comprehensions.
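The comprehension in this post uses the `|` separator of the build I am running. As a sketch of the same idea in a current Julia 1.x release (an assumption, since the syntax there uses `for ... in` instead), including a filter and a two-dimensional case:

```julia
# Assumes Julia 1.x comprehension syntax (for ... in).
squares = [x^2 for x in 1:10]
println(squares)          # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# An optional condition filters the values.
evens = [x for x in 1:10 if iseven(x)]
println(evens)            # [2, 4, 6, 8, 10]

# Two ranges produce a matrix (here a small multiplication table).
table = [i * j for i in 1:3, j in 1:3]
println(size(table))      # (3, 3)
println(table[2, 3])      # 6
```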

As in NumPy, a Matrix in Julia is just a 2-D array:

```
julia> Matrix
Array{T,2}
```

### Functions

Functions in Julia are very similar to functions in R and Python. They can be declared in long form or on one line:

```
function f(x,y)
    x + y
end

f(x,y) = (z = x + y; 2z)

julia> f(3, 4)
14
```

By default a function returns the value of its last expression. You can also return values explicitly with the `return` keyword, which is particularly useful if there are multiple routes through a function.
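A small sketch of a function with several exit points, assuming a current Julia 1.x release; `sign_label` is a hypothetical name used only for illustration.

```julia
# Assumes Julia 1.x; sign_label is a hypothetical helper.
function sign_label(x)
    if x > 0
        return "positive"   # early exit for positive input
    elseif x < 0
        return "negative"   # early exit for negative input
    end
    return "zero"           # reached only when x == 0
end

println(sign_label(3))      # positive
println(sign_label(-2.5))   # negative
println(sign_label(0))      # zero
```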

Julia also supports anonymous functions (equivalent to a lambda function in Python).

```
julia> map(x -> x/2.0, [1,3,-1])
[0.5, 1.5, -0.5]
```

Functions also support multiple return values and varargs (using the ellipsis ...). Functions currently do not support default parameter values, although that's in the works.
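Multiple return values and varargs combine naturally. The sketch below assumes a current Julia 1.x release; `min_and_max` is a hypothetical helper that collects any number of arguments into a tuple and returns two values at once.

```julia
# Assumes Julia 1.x; min_and_max is a hypothetical helper.

# values... gathers all positional arguments into a tuple.
function min_and_max(values...)
    return minimum(values), maximum(values)   # returned together as a tuple
end

# The returned tuple can be unpacked into separate variables.
lo, hi = min_and_max(3, 1, 4, 1, 5)
println(lo)   # 1
println(hi)   # 5

# The same ellipsis splats an existing collection into separate arguments.
println(min_and_max([2.5, -1.0, 7.0]...))   # (-1.0, 7.0)
```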

Julia also includes methods, which allow for multiple dispatch:

Although it seems a simple concept, multiple dispatch on the types of values is perhaps the single most powerful and central feature of the Julia language.

This allows for different behavior depending on the types of the arguments passed to the function. This can readily be seen by typing + at the console, which lists the many methods defined for addition.

```
same_type_numeric{T<:Number}(x::T, y::T) = true
same_type_numeric(x::Number, y::Number) = false

julia> same_type_numeric(1, 2)
true

julia> same_type_numeric(1, 2.0)
false
```
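For readers on a newer release, here is a sketch of the same two methods, assuming current Julia 1.x syntax, where the type parameter is written with a trailing `where` clause instead of braces after the function name. The `describe` function is a hypothetical extra example showing dispatch on a single argument.

```julia
# Assumes Julia 1.x, where type parameters use a `where` clause.
same_type_numeric(x::T, y::T) where {T<:Number} = true   # both arguments share one type
same_type_numeric(x::Number, y::Number) = false          # fallback for mixed numeric types

println(same_type_numeric(1, 2))      # true
println(same_type_numeric(1, 2.0))    # false

# describe is a hypothetical function; the most specific matching method wins.
describe(x::Integer) = "integer"
describe(x::AbstractFloat) = "float"
describe(x) = "something else"

println(describe(1))        # integer
println(describe(1.0))      # float
println(describe("a"))      # something else

# methods() lists the methods attached to a generic function.
println(length(methods(same_type_numeric)))   # 2
```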

I have really glossed over many details (e.g. flow control and loops) as well as advanced language features (such as metaprogramming and parallel computing). But this should give a taste. All else is readily available in the manual.

Next up, I will give a short review of some linear algebra in Julia, before starting to look at basic statistical analysis in the language.