Quant Pythonista's Blog

A blog about scientific Python, quant finance, statistics, data analysis, and hacking of all kinds

Mastering high performance data algorithms I: Group By

May 13, 2012

I’m on my way back from R/Finance 2012. Those guys did a nice job of organizing the conference, and it was great to meet everyone there. As part of pandas development, I have had to develop a suite of high performance …
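The post is truncated here, so the sketch below only illustrates one common group-by strategy, not necessarily the one it describes. Each key is mapped to an integer label, and then all values are aggregated in a single pass over those labels. The helper name `group_sum` is hypothetical. `np.unique` sorts the keys, which costs O(n log n); a hash-table factorization would avoid that sort.

```python
import numpy as np

def group_sum(keys, values):
    """Sum `values` per distinct key (hypothetical helper, not pandas internals)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Factorize: map each key to an integer label 0..k-1 (uniques come back sorted)
    uniques, labels = np.unique(keys, return_inverse=True)
    # Accumulate each value into its label's bucket in one O(n) pass
    sums = np.bincount(labels, weights=values, minlength=len(uniques))
    return uniques, sums

uniques, sums = group_sum(["b", "a", "b", "c", "a"], [1.0, 2.0, 3.0, 4.0, 5.0])
```

Once the labels exist, other reductions such as counts (`np.bincount(labels)`) or means (sums divided by counts) reuse the same labels. This reuse is the main appeal of factorizing the keys first.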

An O(n log n) NA-friendly “as of” array operation

May 4, 2012

In time series data, it’s fairly common to need to compute the last known value “as of” a particular date. However, missing data is the norm, so it’s a touch more complicated than doing a simple binary search. Here is …
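Here is a minimal sketch of the idea, not the implementation from the post. It assumes the dates are sorted ascending and that missing values are stored as NaN. First, the index of the last valid observation is forward-filled. Then each query date is binary-searched with `np.searchsorted`. The function name `asof` and its signature are hypothetical. Preprocessing is O(n), and each query costs O(log n).

```python
import numpy as np

def asof(dates, values, query_dates):
    """Last non-NaN value at or before each query date (hypothetical helper).

    Assumes `dates` is sorted ascending and non-empty; NaN marks missing data.
    """
    dates = np.asarray(dates)
    values = np.asarray(values, dtype=float)
    query_dates = np.asarray(query_dates)

    # Forward-fill indices: last_valid[i] is the index of the most recent
    # non-NaN value at or before position i, or -1 if there is none yet.
    idx = np.where(~np.isnan(values), np.arange(len(values)), -1)
    last_valid = np.maximum.accumulate(idx)

    # Binary search: position of the last date <= each query date (-1 if none).
    pos = np.searchsorted(dates, query_dates, side="right") - 1

    # Map each query to its source index; clip only to keep indexing in bounds.
    src = np.where(pos >= 0, last_valid[np.clip(pos, 0, None)], -1)
    return np.where(src >= 0, values[np.clip(src, 0, None)], np.nan)

result = asof([1, 2, 3, 4, 5], [10.0, np.nan, 30.0, np.nan, np.nan], [0, 2, 4, 6])
```

Integer dates keep the example short. Sorted `datetime64` arrays work with `np.searchsorted` in the same way.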

Why I’m not on the Julia bandwagon (yet)

May 3, 2012

I apologize in advance for writing what might be the first neutral-to-negative article about Julia that you’ve read. For those of you who don’t know, Julia is a new dynamic language with an LLVM-based JIT compiler designed for …

The need for an embedded array expression compiler for NumPy

April 15, 2012

Yesterday I found myself adding some additional Cython methods for doing fast grouped aggregations in pandas. To my disappointment, I found myself duplicating a lot of code and not having much alternative beyond cooking up some kind of ad hoc …

vbench Lightning Talk Slides from PyCon 2012

March 19, 2012

I prepared this lightning talk for PyCon on vbench, my performance monitoring tool, but didn’t have a chance to give it. If I get energetic I might make a screencast of it, but here are the slides: vbench: lightweight performance …

Contingency tables and cross-tabulations in pandas

March 19, 2012

Someone recently asked me about creating cross-tabulations and contingency tables using pandas. I wrote a bit about this in October after implementing the pivot_table function for DataFrame. So I thought I would give a few more examples and show R …
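For reference, here is a small contingency table built with today's pandas API (`pd.crosstab`, plus an equivalent `groupby`). This API may differ from what was available when the post was written, and the data is made up for illustration.

```python
import pandas as pd

# Made-up categorical data for illustration
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F"],
    "smoker": ["yes", "no", "yes", "no", "no"],
})

# Count each (gender, smoker) combination; margins=True adds "All" totals
table = pd.crosstab(df["gender"], df["smoker"], margins=True)

# The same counts (without margins) via groupby, pivoting smoker into columns
counts = df.groupby(["gender", "smoker"]).size().unstack(fill_value=0)
```

`pd.crosstab` is convenient for quick frequency tables. The `groupby` form generalizes to other aggregations over a value column.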

NYCPython 1/10/2012: A look inside pandas design and development

March 19, 2012

I had the privilege of speaking last night at the NYCPython meetup group. I’ve given tons of “use pandas!” talks so I thought I would take a slightly different angle and talk about some of the design and implementation work …

High performance database joins with pandas DataFrame, more benchmarks

March 19, 2012

I posted a brief article with some preliminary benchmarks for the new merge/join infrastructure that I’ve built in pandas. I compared the performance with base::merge in R which, as various folks in the R community have pointed out, is fairly …

Some pandas Database Join (merge) Benchmarks vs. R base::merge

March 19, 2012

Over the last week I have completely retooled pandas’s “database” join infrastructure / algorithms in order to support the full gamut of SQL-style many-to-many merges (pandas has had one-to-one and many-to-one joins for a long time). I was curious about …
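To illustrate what "many-to-many" means, here is a minimal example using today's `pd.merge` API with made-up data. Every left row with a given key is paired with every right row with that key. So two `a` rows on each side produce four output rows.

```python
import pandas as pd

# Made-up tables where the key "a" is duplicated on both sides
left = pd.DataFrame({"key": ["a", "a", "b"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "a", "c"], "rval": [4, 5, 6]})

# Inner join keeps only matching keys: the 2 x 2 Cartesian product for "a"
inner = pd.merge(left, right, on="key", how="inner")

# Outer join also keeps unmatched "b" and "c", filling the missing side with NaN
outer = pd.merge(left, right, on="key", how="outer")
```

The output size of a many-to-many join grows with the product of the duplicate counts per key. This is one reason join performance is worth benchmarking.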

Introducing vbench, new code performance analysis and monitoring tool

March 19, 2012

Do you know how fast your code is? Is it faster than it was last week? Or a month ago? How do you know if you accidentally made a function slower by changes elsewhere? Unintentional performance regressions are extremely common …
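The core idea behind catching a regression can be sketched with the standard library's `timeit`: take a best-of-N timing and compare it against a stored baseline. This is not vbench's API. The helpers `best_time` and `is_regression` and the 10% tolerance are assumptions chosen for illustration.

```python
import timeit

def best_time(func, number=100, repeat=5):
    """Best-of-`repeat` average seconds per call of `func` (hypothetical helper)."""
    # timeit.repeat accepts a callable and returns total seconds for each repeat
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number

def is_regression(baseline, current, tolerance=0.10):
    """True if `current` is more than `tolerance` (a fraction) slower than `baseline`."""
    return current > baseline * (1.0 + tolerance)

# Time a small workload; in practice the baseline would come from an earlier revision
current = best_time(lambda: sorted(range(1000), reverse=True))
```

Taking the minimum over repeats is a common choice because interference from other processes can only make a run slower, never faster.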