I just finished reading "Dark Pools" by Scott Patterson. The book documents the founding of the major electronic exchanges (from Island through BATS) and high-frequency trading firms, along with the related regulations. Ironically, the book says virtually nothing about actual dark pools (such as Goldman's Sigma X), a point Patterson makes clear in his first note.


Linear regression, and ordinary least squares in particular, is one of the most popular tools for data analysis. Continuing my series on using the Julia language for basic statistical analysis, this post reviews the most well-known direct solutions to the least squares problem.

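As a preview, here is a minimal sketch of two common direct approaches: solving the normal equations explicitly, and using a QR factorization via Julia's backslash operator. The small data set is invented purely for illustration.

```julia
using LinearAlgebra

# Invented toy data: y = X * beta_true exactly (no noise, for a clean check)
X = [ones(5) collect(1.0:5.0)]   # design matrix: intercept column plus one feature
beta_true = [2.0, 0.5]
y = X * beta_true

# Direct solution 1: the normal equations, beta = (X'X)^(-1) X'y
beta_ne = (X' * X) \ (X' * y)

# Direct solution 2: QR factorization, which backslash uses for rectangular X
beta_qr = X \ y
```

The QR route is generally preferred numerically, since forming X'X squares the condition number of the problem.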

Linear algebra lies at the heart of many modern statistical methods. As such, continuing my short series on using the Julia language for basic statistical analysis, I want to give a short review of some basic matrix and vector operations, which we will use subsequently when constructing some simple optimization routines.

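A few of the operations in question, as exposed by Julia's LinearAlgebra standard library; the vectors and matrices here are arbitrary examples chosen for illustration:

```julia
using LinearAlgebra

v = [1.0, 2.0, 3.0]
A = [1.0 2.0; 3.0 4.0]

ip   = dot(v, v)          # inner product: 1 + 4 + 9 = 14.0
nv   = norm(v)            # Euclidean norm: sqrt(14)
At   = A'                 # transpose (adjoint)
Av   = A * [1.0, 1.0]     # matrix-vector product: [3.0, 7.0]
Ainv = inv(A)             # inverse (only for square, nonsingular A)
```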

Before running any statistical analysis with the Julia programming language, I thought it would be fruitful to start by giving a (very) brief introduction to the syntax and basic language features.

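To give a flavor of that syntax, here is a minimal sketch of a few basic constructs (the function names and values are my own examples):

```julia
# One-line function definition
square(x) = x^2

# Arrays and comprehensions
squares = [square(i) for i in 1:5]   # [1, 4, 9, 16, 25]

# Control flow inside a function
function count_even(xs)
    n = 0
    for x in xs
        if iseven(x)
            n += 1
        end
    end
    return n
end
```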

I first heard about the **Julia** programming language a little over a month ago, in the middle of February, when its creators published their first blog post: "Why We Created Julia". This was an exciting turn of events.


Readers of this blog will no doubt be eagerly following along with the continuing developments of open education at Coursera and Udacity. As a professed autodidact, I have been a long-time consumer of online education, especially through iTunes U, and I'm really excited to see where everything is going.


We considered the problem of overfitting as model complexity increases in the prior post. Now we look at one way to control for this problem: regularization. The basic idea is to penalize the complexity of the model, essentially saying that we don't entirely believe the fit that falls out of our optimization. Since we are fitting to a sample of the data, overfitting means the resulting model doesn't generalize well: it won't fit new datasets well, since they are unlikely to match the training data exactly.

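In the least-squares setting, the most common such penalty is ridge (L2) regularization, which still admits a closed-form solution. A minimal sketch, assuming the standard closed form; the function name and toy data below are mine, not from the post:

```julia
using LinearAlgebra

# Ridge regression: minimize ||y - X*beta||^2 + lambda * ||beta||^2
# Closed form: beta = (X'X + lambda*I)^(-1) X'y
# I is the UniformScaling identity from LinearAlgebra.
function ridge(X, y, lambda)
    (X' * X + lambda * I) \ (X' * y)
end
```

With `lambda = 0` this reduces to ordinary least squares; as `lambda` grows, the coefficients shrink toward zero, trading a little bias for less variance.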

Data analysis is part science, part art. It is part algorithm and part heuristic. Of the various approaches to data analysis, machine learning falls more on the side of purely algorithmic, but even here we have many decisions to make which don't have well-defined answers (e.g. which learning algorithm to use, how to divide the data into training/test/validation). Learning theory provides some guidance for how to build a model that is generalizable and can be used for prediction, which is the primary goal of machine learning.


The initial lectures in Stanford CS229a were concerned with regression problems, where the predicted value is a continuous number. Another class of problems is classification, where values are divided into discrete groups (e.g. on or off; red, green, or blue). This builds on all the material from the previous linear regression lectures.

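The standard hypothesis for the binary case is logistic regression: pass the linear combination of the features through the sigmoid function and threshold the result. A minimal sketch (the function names are my own):

```julia
using LinearAlgebra

# Sigmoid (logistic) function: maps any real number into (0, 1)
sigmoid(z) = 1 / (1 + exp(-z))

# Classify as 1 when the estimated probability is at least 0.5,
# which is equivalent to dot(theta, x) >= 0 (the decision boundary)
predict(theta, x) = sigmoid(dot(theta, x)) >= 0.5 ? 1 : 0
```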

The next set of lectures in CS229 covers "Linear Regression with Multiple Variables", also known as Multivariate Regression. This builds on the univariate linear regression material and results in a more general procedure.

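The workhorse of that procedure is batch gradient descent on the squared-error cost. A minimal sketch in Julia; the learning rate and iteration count below are illustrative choices, not values from the lectures:

```julia
# Batch gradient descent for linear regression with multiple features.
# X is the m-by-p design matrix, y the vector of m responses.
function gradient_descent(X, y, alpha, iters)
    m, p = size(X)
    theta = zeros(p)
    for _ in 1:iters
        grad = (X' * (X * theta - y)) / m   # gradient of the (halved) mean squared error
        theta -= alpha * grad
    end
    return theta
end
```

For this to converge, `alpha` must be small enough relative to the scale of the features, which is why the lectures pair gradient descent with feature normalization.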