## StatAlgo's Blog

### Market Microstructure and High-Frequency Trading: Part 1

I just finished reading "Dark Pools" by Scott Patterson. The book documents the foundation of the major electronic exchanges (from Island through BATS) and high-frequency trading firms, along with the related regulations. Ironically, the book says virtually nothing about actual dark pools (such as Goldman's Sigma X). Patterson makes this clear in the first note: The title of this book doesn't entirely refer to what is technically known in the financial industry as a "dark pool". Narrowly...

### Statistics with Julia: Least Squares Regression with Direct Methods

Linear regression, and ordinary least squares in particular, is one of the most popular tools for data analysis. Continuing my series on using the Julia language for basic statistical analysis, this post reviews the most well-known direct solutions to the least squares problem. The least squares approach to linear regression was discovered by Gauss and first published by Legendre, although there has been some historical controversy over this point (a translation of Legendre's original...

### Statistics with Julia: Linear Algebra with LAPACK

Linear algebra lies at the heart of many modern statistical methods. As such, continuing on my short series on using the Julia language for basic statistical analysis, I want to give a short review of some basic matrix and vector procedures which we will use subsequently when constructing some simple optimization routines. [Note: The relevant Julia manual section lists all the relevant functions and should be considered the primary source.] Linear algebra provides a mechanism for efficiently...

### Statistics with Julia: The Basics

Before running any statistical analysis with the Julia programming language, I thought it would be fruitful to start by giving a (very) brief introduction to the syntax and basic language features. The Julia manual is already very detailed, so that should be considered the first source; I am here only going to scratch the surface, and put things in perspective (relative to R and Python/Pandas). Julia's syntax mostly resembles Matlab, so users of that language will be immediately comfortable....

### Statistics with Julia

I first heard about the Julia programming language a little over a month ago, in the middle of February with their first blog post: "Why We Created Julia". This was an exciting turn of events. We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python,...

### The Open Education Movement Continues

Readers of this blog will no doubt be eagerly following along with the continuing developments of open education at Coursera and Udacity. As a professed autodidact, I have been a long-time consumer of online education, especially through iTunes university, and I'm really excited to see where everything is going. I regretfully won't have time to fully explore the offerings over the next few months. I am signed up for Natural Language Processing, Probabilistic Graphical Models, Game Theory,...

### Stanford ML 5.2: Regularization

We considered the problem of overfitting as model complexity increases in the prior post. Now we look at one way to control for this problem: regularization. The basic idea is to penalize the model, essentially saying that we don't entirely believe the fit that falls out of our optimization. Since we are fitting to a sample of the data, overfitting will mean that the resulting model doesn't generalize well: it won't fit well to new datasets since they are unlikely to match the training...

### Stanford ML 5.1: Learning Theory and the Bias/Variance Trade-off

Data analysis is part science, part art. It is part algorithm and part heuristic. Of the various approaches to data analysis, machine learning falls more on the side of purely algorithmic, but even here we have many decisions to make which don't have well-defined answers (e.g. which learning algorithm to use, how to divide the data into training/test/validation). Learning theory provides some guidance for how to build a model that is generalizable and can be used for prediction, which is the...

### Stanford ML 4: Logistic Regression and Classification

The initial lectures in Stanford CS229a were concerned with regression problems where the predicted value was a continuous number. Another class of problems is concerned with discrete outcomes, where values are divided into groups (e.g. on or off; red, green, or blue). This builds on all the material from the previous linear regression lectures. The first classification model introduced is known as logistic regression (even though it is not technically a regression model since it is used for...