A recent trend in computer science and related fields is General-Purpose computation on Graphics Processing Units (GPGPU), which can yield impressive performance: required processing times can be reduced to a great extent.

The Compute Unified Device Architecture (CUDA) is a programming approach for performing scientific calculations on a Graphics Processing Unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores in combination with high memory bandwidth, a recent GPU offers formidable resources for general-purpose computing.

**Abstract**

A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today’s GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples. In particular, computationally expensive analyses employed in a financial-market context are coded on a graphics card architecture, which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics – the Ising model – is ported to a graphics card architecture as well, resulting in large speedup values.

- computational finance
- scientific computing on GPUs
- random number generation
- finite difference applications
- current developments

**Abstract**

The pricing of options has been a very important problem in financial engineering since the creation of organized option trading in 1973. This sample shows an implementation of the Black-Scholes model in CUDA for European options.


**Abstract**

The pricing of options has been a very important problem encountered in financial engineering since the advent of organized option trading in 1973. As more computation has been applied to finance-related problems, finding efficient implementations of option pricing models on modern architectures has become more important. This white paper describes an implementation of the Monte Carlo approach to option pricing in CUDA. For complete implementation details, please see the “MonteCarlo” example in the NVIDIA CUDA SDK.

Attendees discovered why many of the world's leading financial institutions rely on Mathematica as their computational tool of choice and learned how using Mathematica in combination with NVIDIA's groundbreaking GPU technology can give you the edge you need in an increasingly competitive environment.

In this video, NVIDIA's Senior CUDA Consultant John Ashley explains how CUDA programming is changing financial computation.


**Benedikt Wilbertz**, *PMA - Laboratoire de Probabilités et Modèles Aléatoires*

**Abstract**

The pricing of American-style and multiple-exercise options is a very challenging problem in mathematical finance. One usually employs a Least-Squares Monte Carlo approach (the Longstaff-Schwartz method) for the evaluation of the conditional expectations which arise in the Backward Dynamic Programming principle for such optimal stopping or stochastic control problems in a Markovian framework. Unfortunately, these Least-Squares Monte Carlo approaches are rather slow and, due to the dependency structure in the Backward Dynamic Programming principle, admit no parallel implementation, neither on the Monte Carlo level nor on the time-layer level of this problem.

We therefore present in this paper a quantization method for the computation of the conditional expectations that allows a straightforward parallelization on the Monte Carlo level. Moreover, for AR(1) processes we are able to develop a further parallelization in the time domain, which makes use of faster memory structures and therefore maximizes parallel execution.

Finally, we present numerical results for a CUDA implementation of these methods. It turns out that such an implementation leads to an impressive speed-up compared to a serial CPU implementation.

**Abstract**

The Compute Unified Device Architecture (CUDA) is an almost conventional programming approach for managing computations on a graphics processing unit (GPU) as a data-parallel computing device. With a maximum of 240 cores in combination with a high memory bandwidth, a recent GPU offers substantial resources for computational physics. We apply this technology to methods of fluctuation analysis, which include determination of the scaling behavior of a stochastic process and of the equilibrium autocorrelation function. Additionally, the recently introduced pattern formation conformity (Preis T et al 2008 Europhys. Lett. 82 68005), which quantifies pattern-based complex short-time correlations of a time series, is calculated on a GPU and analyzed in detail. Results are obtained up to 84 times faster than on a current central processing unit core. When we apply this method to high-frequency time series of the German BUND future, we find significant pattern-based correlations on short time scales. Furthermore, an anti-persistent behavior can be found on short time scales. Additionally, we compare the recent GPU generation, which provides a theoretical peak performance of up to roughly 10^12 floating-point operations per second, with the previous one.
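The equilibrium autocorrelation function mentioned above is a standard estimator. A minimal serial sketch in plain C might look as follows (function and variable names are ours; a GPU version would parallelize the sums over lags and time indices):

```c
/* Sample autocorrelation of x[0..n-1] at lag k:
 * rho(k) = sum_t (x[t]-mean)(x[t+k]-mean) / sum_t (x[t]-mean)^2 */
double autocorr(const double *x, int n, int k) {
    double mean = 0.0;
    for (int t = 0; t < n; t++) mean += x[t];
    mean /= n;
    double num = 0.0, den = 0.0;
    for (int t = 0; t < n; t++) {
        double d = x[t] - mean;
        den += d * d;                          /* variance term  */
        if (t + k < n) num += d * (x[t + k] - mean); /* lag-k covariance */
    }
    return num / den;
}
```

By construction rho(0) = 1, and an anti-persistent series (one that tends to reverse direction) shows negative values at short lags.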
