The Practical Quant's Blog

Hedge Fund Performance - August/2011

January 22, 2012 Comments (0)

It's been a while since I've posted anything about Hedge Fund performance. Given how the markets struggled last August, I figured I should do a brief update. First off, here are the year-to-date performance numbers along with returns for the most recent month: [Chart: hedge fund year-to-date and August 2011 returns] The S&P 500 was down close to 6% in August, so it's not surprising that the Shorts did well last month. In fact, Shorts are right up there with Global Macro managers through the first eight months of 2011....

Popular Links during Obama Jobs Speech & Republican Debate

January 22, 2012 Comments (0)

Here is a list of popular links shared on Twitter during the recent Republican Presidential debate at the Reagan Library, and during President Obama's speech to Congress. Links are ordered by popularity (most tweeted first), and were gathered using the Twitter Search and Untiny APIs. I had code lying around and it took just a few minutes to generate these lists. Links tweeted around the time of the Obama speech: judging from this partial list, Obama supporters weren't as active sharing...
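The ranking step described above (ordering links by how often they were tweeted) can be sketched in a few lines. This is only an illustration, not the code used for the post: the function name `rank_links`, the URL pattern, and the sample tweets are all hypothetical, and the Twitter Search and Untiny calls (fetching tweets, expanding short URLs) are left out entirely.

```python
import re
from collections import Counter

# Hypothetical pattern: treat any http(s) token up to the next whitespace as a link.
URL_PATTERN = re.compile(r"https?://\S+")

def rank_links(tweets):
    """Return (link, tweet_count) pairs, most tweeted first."""
    counts = Counter()
    for text in tweets:
        # Count each link at most once per tweet.
        counts.update(set(URL_PATTERN.findall(text)))
    return counts.most_common()

# Made-up sample data standing in for search results.
sample_tweets = [
    "Watching the debate http://example.com/a",
    "RT this http://example.com/a and http://example.com/b",
    "Speech transcript http://example.com/c",
]

ranked = rank_links(sample_tweets)
for link, count in ranked:
    print(count, link)
```

In practice, short URLs would need to be expanded before counting so that different shorteners pointing at the same page are tallied together.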

The Best of @BigData Aug/2011

January 22, 2012 Comments (0)

Here are the most popular tweets/links shared through my BigData twitter stream:
- Hadoop prototype for incremental realtime analytics with MapReduce: a hash-based analytics framework from @UMassAmherst
- Probability & Graphs: Estimating Sizes of Social Networks, with a Biased Sampling method that dramatically reduces required samples
- Extracting common sense facts (causal reasoning): Simple, scalable methods for tapping Web-scale data
- Mining Massive Data Sets: Free Stanford CS lecture notes & outline, from a...

Bits from Scifoo 2011

January 22, 2012 Comments (0)

Just got back from Scifoo 2011 and wanted to share a few observations:
- Economics and AI: I had an interesting discussion with Alex Wissner-Gross, who has been using mathematics and machine-learning techniques in finance and other settings. I was particularly interested in what he and his colleagues have been doing in helping us understand (the limits of) high frequency trading. Alex isn't just a researcher, he also has experience founding startups.
- Citizen Science: Chris Lintott of Zooniverse...

Large-scale Named Entity Recognition in the Cloud

January 22, 2012 Comments (0)

Below is an amazing factoid shared by Carlos Guestrin during his GraphLab overview: CoEM is an entity-extraction algorithm introduced by Rosie Jones in 2005. In the slide above, the data set was a graph with 2M vertices and 200M edges. A team at CMU applied CoEM using Hadoop and found it took about 7.5 hours. GraphLab on far fewer cores (16 instead of 95) took 30 minutes: 6X fewer cores, but still 15X faster. The same problem on 32 EC2 machines (with 256 processors) took 80 seconds, or 0.3%...
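The ratios can be recomputed from the raw figures quoted above (7.5 hours on 95 cores with Hadoop, 30 minutes on 16 cores with GraphLab, 80 seconds on EC2). The snippet below is just that arithmetic; the variable names are my own and nothing here comes from the GraphLab code.

```python
# Running times quoted in the post, converted to seconds.
HADOOP_SECONDS = 7.5 * 3600       # Hadoop on 95 cores
GRAPHLAB_16_SECONDS = 30 * 60     # GraphLab on 16 cores
GRAPHLAB_EC2_SECONDS = 80         # GraphLab on 32 EC2 machines

core_ratio = 95 / 16                                  # how many times fewer cores
speedup_16 = HADOOP_SECONDS / GRAPHLAB_16_SECONDS     # speedup on 16 cores
ec2_fraction = GRAPHLAB_EC2_SECONDS / HADOOP_SECONDS  # EC2 time as a share of Hadoop time

print(f"cores: {core_ratio:.1f}X fewer")
print(f"16-core speedup: {speedup_16:.0f}X")
print(f"EC2 run: {ec2_fraction:.1%} of the Hadoop running time")
```

With these inputs, the 16-core run works out to roughly 6X fewer cores and a 15X speedup, and the 80-second EC2 run is about 0.3% of the Hadoop time.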

Rebranding Data Scientists as Data Alchemists

January 22, 2012 Comments (0)

The more I think about it, maybe the term Data Alchemists is more accurate than Data Scientists. For one thing, Alchemy itself is regarded as a protoscience, which aptly describes the body of skills under the heading of Data Science. Secondly, Alchemy describes what good Data Scientists do: they turn data into insights ("a power or process of transforming something common into something special"). The title Data Scientist itself is a misnomer: most Data Science practitioners are actually either...

Computer Vision & Compressed Sensing: Recovery of signals, images, & video from meager data

January 22, 2012 Comments (0)

An interesting new paper (The Split Bregman Method for L1-Regularized Problems) proposes a general set of tools applicable to a broad range of problems for which compressed sensing makes sense: The class of L1-regularized optimization problems has received much attention recently because of the introduction of “compressed sensing,” which allows images and signals to be reconstructed from small amounts of data. Despite this recent attention, many L1-regularized problems still remain difficult to...

Writing code that makes your data sing: Auditory Data Display Tools and Resources

January 22, 2012 Comments (0)

I was listening to Dr. Joshua A. Miele being interviewed by my local public radio station when I heard him mention tools for the auditory display of data. Dr. Miele, who happens to be blind, was trained in Psychoacoustics. His educational training and present work require data analysis, which for those of us who can see usually involves data visualization. In order to understand his data sets in graduate school, Dr. Miele ended up building a set of tools for the auditory display of data...

Factoids from Twitter's Engineering Open House

January 22, 2012 Comments (0)

I just got back from the first-ever (?) Twitter Engineering open house: three good talks, and a chance to mingle with Twitter engineers over lots of free food and drinks. Like any scrappy startup that has gone on to amass millions of users, Twitter has had to overhaul key portions of its technical infrastructure. The event was a chance to hear (publicly for the first time) about some of those changes. Twitter, like many companies, relies heavily on open source technologies. But as with most...

Text Mining and Twitter III- LDA Code on Hadoop

January 22, 2012 Comments (0)

Yahoo Research just released LDA code that runs on Hadoop. They claim that this is the fastest implementation of LDA to date: It’s seriously fast and scales very well to 1000 machines or more (don’t worry, it runs on a single machine, too). We believe that at present this is the fastest implementation you can find, in particular if you want to have a) 1000s of topics, b) a large dictionary, c) a large number of documents, and d) Gibbs sampling. It handles quite comfortably a billion documents....