My wife Caroline is a teacher in a primary school near where we live in Buckinghamshire. She teaches year 6, that is to say, ten to eleven year old children.

Now, year 6 is a rather important year in British schools. Children following the National Curriculum in state-funded schools are subject to SATs (standard assessment tests) before they move on to secondary school, and the results of these tests are used to create the dreaded league tables which are supposed to help parents make an informed choice about which school to send their children to. Importantly, the number of pupils can affect the amount of funding a school gets.

Year 6 teachers are therefore under a lot of pressure to help their pupils achieve good SATs results, so extra sessions and booster groups are common. This year Caroline asked if I could go into school to spend a few hours with her top maths set - ones that it had been decided were good enough to be entered for the harder (Level 6) papers.

I'm used to teaching adults on NAG training courses, but this was the first time I'd been into a primary school to teach. It turned out to be good fun! I had four students, all of whom had already passed their "eleven plus" and would be going to grammar school. Topics we covered included straight line graphs, probability and algebra. The kids were enthusiastic, and not too shy. Many of the problems tackled turned out to involve rearrangement of equations, so drilling into them that if you do the same thing to both sides of an equation (with some caveats of course) then the equality still holds seemed quite important.

Whether the short time I spent with them actually made any difference is hard to know, but I really enjoyed the interaction and the children seemed to also. The SATs results are not yet out, and scoring a Level 6 in Mathematics or English is quite difficult (the average 11 year old is expected to achieve Level 4) - so there will be no dishonour in not achieving it, but I'm looking forward to hearing how they get on. And I'm hoping that next year, as part of NAG's Community Outreach Programme, I'll be able to go into school again, and maybe for more hours this time around.

## Thursday, 25 June 2015

## Thursday, 18 June 2015

### Wilkinson Prize for Numerical Software 2015

by Mike Dewar, Chief Technical Officer, Numerical Algorithms Group (NAG)

NAG, along with Argonne
National Laboratory and the National Physical Laboratory (NPL), awards the
Wilkinson Prize every four years to the authors of an outstanding piece of
numerical software. The prize honours the pioneering work of Dr James Hardy Wilkinson in the field of numerical algorithms, work which had a
fundamental impact on NAG in our early years and which still resonates
today. The Wilkinson Prize is unique in that it is awarded on the
basis of the quality of a piece of software, not just for the mathematics
behind it, and the judges consider the documentation and test suites as well as
the software itself.

This year the winners are P.E. Farrell (University of Oxford), S.W. Funke (Simula Research Laboratory), D.A. Ham (Imperial College London), and M.E. Rognes (Simula Research laboratory) for the development of dolfin-adjoint, a package which automatically derives and solves adjoint and tangent linear equations from high-level mathematical specifications of finite element discretisations of partial differential equations. The prize will be presented at ICIAM in Beijing in August, during the SIAM Awards Lunch.

More details about the Wilkinson Prize and past winners can be found here.

James Hardy Wilkinson |

This year the winners are P.E. Farrell (University of Oxford), S.W. Funke (Simula Research Laboratory), D.A. Ham (Imperial College London), and M.E. Rognes (Simula Research laboratory) for the development of dolfin-adjoint, a package which automatically derives and solves adjoint and tangent linear equations from high-level mathematical specifications of finite element discretisations of partial differential equations. The prize will be presented at ICIAM in Beijing in August, during the SIAM Awards Lunch.

More details about the Wilkinson Prize and past winners can be found here.

## Thursday, 21 May 2015

### Introducing the team: Gail Austin, Customer Support Coordinator, Team Leader

**Gail, describe your role at NAG?**

I manage the Technical Support Service, the Sales
Operations Team, NAG Oxford Reception and I am website editor for the marketing
area of the website.

**Can you give examples of the customer problems you help solve via NAG Technical Support?**

Rather than solve users technical issues my role is
connected to the direction of incoming support queries and the issuing and
renewal of software licences. Over time I have come to understand the
installation of our software and so can help users with some of their licensing
and software applicability issues.

**Tell us something special/unusual about NAG?**

I believe that NAG truly cares about giving our customers
the best possible service. This comes across again and again in the
positive feedback we receive following the closure of technical support
queries.

**Tell us something special/unusual about yourself?**

I’m just a little bit proud of the fact that when helping
Mick Pont with some tests he was carrying out for a presentation of NAG’s
Random Number Generator, I was found to be more random!

**Are you attending any exhibitions or conferences this year?**

I’m not, I’m too busy in the office, helping our customers
to get the best from NAG.

**What’s been the highlight of your time working at NAG so far?**

That’s a difficult one, I don’t think I can put my finger
on any one thing. I think it’s enough to say that I hope to stay at NAG
for the remainder of my career, I love my job here and I consider myself really
lucky to have finally found my niche, I’m as at home here as I am ……… at home :-)

Labels:
customer support,
installation,
licence management

## Wednesday, 29 April 2015

### Professor Mike Powell F.R.S

We are pained to pass on the sad news of the death on 19th
April of Prof Mike Powell F.R.S, a brilliant numerical analyst specialising in
numerical optimisation and approximation theory.

Mike touched many lives with his work. NAG and NAG's
community benefitted greatly because he gave freely his code to NAG from the very beginning.
The more mature may remember the routine E04DCF, an implementation of his hybrid method for
unconstrained optimisation. More recently his work on derivative-free
optimisation led to E04JCF/e04jcc entering our Libraries as implementations of
BOBYQA.

As a founder member of the IMA, Mike took an interest in the
IMANA Newsletter and encouraged NAG to contribute regularly to that until the
Newsletter's role was subsumed by the internet.

The Wikipedia page http://en.wikipedia.org/wiki/Michael_J._D._Powell
gives more detail of Mike's many achievements.

He will be greatly missed by NAG, especially past and present colleagues who had the pleasure of knowing him personally. We pass on our love, condolences and very best wishes to his family.

Labels:
life numerical analyst,
Mike Powell

## Wednesday, 15 April 2015

### Introducing the team: Craig Lucas, Senior Technical Consultant

*Craig, describe your role at NAG?*I am a Senior Technical Consultant based in NAG’s Manchester Office. Here I manage part of NAG’s High Performance Computing group. I come from a numerical linear algebra background. For my MSc dissertation I looked at nearest correlation matrix problems, something I still work on today, and this year I have an MSc student looking at more new algorithms. Another big interest of mine is shared memory parallelism with OpenMP, and in particular helping users to better exploit it.

**Can you give examples of the customer problems you help solve via NAG Technical Support?**

Technical Support is a great way to interact with our users. Queries can range from something very simple, helping people link their program to the NAG Library for the first time, to an in-depth discussion about the accuracy of an optimization routine. It is often challenging and always an opportunity to learn something new about the mathematics of the Library. Sometimes I need to read our documentation to help me get standard!

**Tell us something special/unusual about NAG?**

Working at NAG is very collaborative, whether that is working with universities or with other NAG colleagues. One day I might be listening to a Manchester student talking about their new research, another I might be at the whiteboard in the office discussing the nuances of a new NAG algorithm.

**Tell us something special/unusual about yourself?**

I came to Mathematics relatively late and was 27 when I went to university. I had somehow drifted into Civil Engineering as a teenager, working in the area of land reclamation. Much of my work was dealing with closed coal mines; this was Staffordshire in the late 1980’s. Perhaps I had some sort of epiphany, I certainly realised that a childhood interest needed some proper exploration. I will always be grateful to Graham Bowtell at City University who decided to take a risk on a mature student without any proper mathematics qualifications.

**Are you attending any exhibitions or conferences this year?**

My next conference is The University of Manchester SIAM Student Chapter Conference 2015. This annual event is another great opportunity to hear what young researchers are working on. I co-judge the Best talk and Best Poster prizes. It is a nice way to give back to a community that NAG really relies on.

Labels:
algorithms,
hpc,
NAG Library,
nearest correlation matrix,
OpenMP,
SIAM

## Thursday, 26 March 2015

### Introducing the team: Mick Pont, NAG Principal Technical Consultant

Over the next few months, scattered in between our technical blog posts, we are going to publish interviews with NAG colleagues. Our first interview is with Mick Pont.

**Mick, what is your role at NAG?**
I'm involved in the development and peer review of new NAG Library software and documentation, and in the scheduling of software production in line with company targets.

In "project management" speak, I'm named as "Executive" for the NAG Mark 26 Library (the NAG Library, Mark 25 is due for launch in April 2015) project, which means that I'm supposed to "provide support and advice to the project manager, and ensure that the project outcome is good and provides NAG with increased value".

I also produce the NAG Technical Support rotas (most of the NAG development team are on one or other of these rotas regularly). Handling support calls is a good way to get to know our customers, understand the issues they face and help solve them.

I build and test most of the Windows-based libraries that NAG produces, but I'm an operating system agnostic - I like to use Linux and other systems too.

I often go on customer site visits with my sales colleagues, as technical backup.

I also teach various training courses for NAG software, including the NAG Toolbox

*for MATLAB®*and the NAG Library; these training days usually take place at customer sites.**Can you give examples of the customer problems you help solve when you’re in your support role?**

Helping users get their NAG software and licence keys installed correctly; helping them migrate from one version of the NAG Library to another; advising on the best NAG routine to solve a particular problem.

**Tell us something special/unusual about life at NAG?**

We try to celebrate all of the world's festivals where appropriate. Icelandic cream bun day is a big favourite of mine.

**Tell us something special/unusual about yourself?**

I'm a solipsist, and I choose not to believe in this question.

**What industry events are you going to be attending this year?**

Lots of training days at customer sites around the UK, and later in the year I will be attending SC15 in Austin, Texas.

You can read some of Mick’s blog posts here:

Labels:
NAG Library,
NAG Toolbox for Matlab,
Project Management,
SC15,
solipsist

## Thursday, 5 March 2015

### Optimization, NAG and Me - 5 Years and Counting

*Authored by Jan Fiala, NAG Numerical Software Developer*

It feels almost like yesterday since I joined NAG so it is hard to believe that it has been 5 years already. Looking back, I see a long path I would like to tell you a bit about, just in case you were curious about one of the people answering your support queries.

I was always a blend of a computer scientist and a mathematician. Computers and programming were my hobby but it did not feel quite right to choose either as the main subject at university so I picked mathematics. Thereafter began my hunt to get as close to computers as possible. Numerical analysis and mathematical programming/optimization were my lucky answer.

I really enjoyed my university years, however an academic career is not for everyone. The necessity to write papers ("publish or perish"), going through a couple of postdocs before getting a tenure, fund raising, teaching and going ever deeper and deeper into a very narrow research field are some of the disadvantages. I love to learn, play and to hack things. To make them work. So, when I got the opportunity, I joined NAG.

NAG is a great place to work if you find scientific computing interesting. There is a very long history, in fact, the first NAG Library was distributed to the users even before I was born! NAG today is not confined to numerical software (NAG Libraries). The organisation has expanded to numerical and HPC services, compilers and more. However my job still consists mainly of work on the Libraries, developing new optimization solvers and maintaining the existing ones.

The best thing is the challenge -- you need get your hands dirty in various fields: from specialized solvers for derivative free optimization, through classical convex optimization to nonlinear semidefinite programming. I went through small pet projects (have you read my AMPL tutorial) through bespoke versions of our solvers tailored to special customers' data to a 50000 lines beast, a new semidefinite programming solver which will be released soon.

In addition, you mustn't forget to help the customers via our friendly support desk, write documentation and examples and keep an eye on the up-to-date algorithmic developments. I can still attend conferences as before, only the focus has changed from an academic to a more pragmatic approach. The relative freedom of the focus is almost limitless, as my ideas contribute to the development roadmap and thence to the future development. I won't lie to you. The time was not all green and great -- writing test programs or bug fixes can be tedious but they have to be done. At least it makes you look forward to the next new challenge which is waiting just around the corner.

If my story sounds familiar, you might be pleased to know that NAG is expanding our optimization team. So ladies and gentlemen, why are you waiting? Apply now and help us make the next big project happen.

Labels:
careers,
computer scientist,
mathematician,
NAG Library,
optimization,
programming

## Wednesday, 25 February 2015

### Advanced Analytics on Apache Spark

Developed in AMPLab at UC Berkeley, Apache Spark has become an increasingly popular platform to perform large scale analysis on Big Data. With run-times up to 100x faster than MapReduce, Spark is well suited for machine learning applications.

Spark is written in Scala but has APIs for Java and Python. As the NAG Library is accessible from both Java and Python, this allows Spark users access to over 1600 high quality mathematical routines. The NAG Library covers areas such as:

####

The fundamental datatype used in Spark is the Resilient Distributed Dataset (RDD). A RDD acts as a pointer to your distributed data on the filesystem. This object has intuitive methods (count, sample, filter, map/reduce, etc) and lazy evaluation that allow for fast and easy manipulation of distributed data.

Below is a simple Python example of using the NAG Library in Spark to calculate the cumulative Normal distribution function on a set of numbers (the message passing output from Spark has been omitted):

For more information on using the NAG Library in Spark or any of the other topics touched upon in this article please contact NAG at support@nag.com.

Spark is written in Scala but has APIs for Java and Python. As the NAG Library is accessible from both Java and Python, this allows Spark users access to over 1600 high quality mathematical routines. The NAG Library covers areas such as:

- Machine Learning including
- Linear regression (with constraints)
- Logistic regression (with constraints)
- Principal Component Analysis (A good article relating Machine Learning and PCA can be found here)
- Hierarchical cluster analysis
- K-means
- Statistics including
- Summary information (mean, variance, etc)
- Correlation
- Probabilities and deviates for normal, student-t, chi-squared, beta, and many more distributions
- Random number generation
- Quantiles
- Optimization including
- Linear, nonlinear, quadratic, and sum of squares for the objective function
- Constraints can be simple bounds, linear, or even nonlinear
- Matrix functions
- Inversion
- Nearest correlation
- Eigenvalues + eigenvectors

####
__Calling the NAG Library on Spark__

The fundamental datatype used in Spark is the Resilient Distributed Dataset (RDD). A RDD acts as a pointer to your distributed data on the filesystem. This object has intuitive methods (count, sample, filter, map/reduce, etc) and lazy evaluation that allow for fast and easy manipulation of distributed data.__Calling the NAG Library on Spark__

Below is a simple Python example of using the NAG Library in Spark to calculate the cumulative Normal distribution function on a set of numbers (the message passing output from Spark has been omitted):

**SparkContext available as sc**

**>>> from nag4py.s import nag_cumul_normal**

**>>> myNumbers = sc.parallelize( [-2.0, -1.0, 0.0, 1.0, 2.0] )**

**>>> myNumbers.takeSample(False, 5, 0)**

**[ 0.0, -2.0, -1.0, 1.0, 2.0]**

**>>> myNumbers.map( nag_cumul_normal ).takeSample(False, 5, 0)**

**[0.5, 0.02275, 0.15865, 0.84134, .97725]**

It should be noted that the vast majority of the algorithms employed in
the NAG library require all relevant data to be held in memory. This may seem to
deviate from the Spark ecosystem, however when working with large datasets, two
usage scenarios are commonly seen:

- The full dataset is split into subsets, for example a dataset covering the whole world may be split by country, county and city and an independent analysis carried out on each subset. In such cases all the relevant data for a given analysis may be held on a single node and therefore can be processed directly by NAG library routines.
- A single analysis is required that utilizes the full dataset. In this case it is sometimes possible to reformulate the problem. For example many statistical techniques can be reformulated as a maximum likelihood optimization problem. The objective function of such an optimization (the likelihood) can then be evaluated using the standard Spark map/reduce functions and the results fed back to one of the robust optimization routines available in the NAG library.

For more information on using the NAG Library in Spark or any of the other topics touched upon in this article please contact NAG at support@nag.com.

Labels:
apache spark,
java,
NAG Library,
Python

## Thursday, 22 January 2015

### Adding a Slider Widget to Implied Volatility

In the last post on Implied Volatility, we downloaded real options data from the CBOE and calculated the volatility curves/surface. We saw the calculations of 30,000 implied volatilities in roughly 10 seconds.

In this post we concentrate on the speed of calculating implied volatility via a variety of different methods. We look at the volatility curve/surface using Python's Scipy, the NAG Library for Python, and the NAG C Library. In addition, we've added a slider widget to the Python graphs from before to see the real-time effects of changing the interest and dividend rates (see the video below). All the code can be downloaded to produce the graphs, and a NAG license is not required for the case using scipy.optimize.fsolve.

The script and utility methods can be downloaded from here. The script begins by generating sample option prices. These are fed through different root finding methods (chosen by the user) to back out the implied volatilities.

The methods tested include:

1) scipy (scipy.optimize.fsolve) - A wrapper around the hybrd and hybrj algorithms in the MINPACK Fortran library, followed by a Python Black-Scholes formula.

2) nag4py (The NAG Library for Python) - A wrapper (nag_zero_cont_funct_brent) around Brent's method for the root-finding, followed by nag_bsm_price for the Black-Scholes formula.

3) ctypes (The NAG C Library, Mark 23) - The same NAG functions as (2), but the looping and calculations are done directly in C, rather than through the nag4py wrapper layer. This requires building a shared library on your own machine, which is then loaded from the main script.

###
**Running the Script**

You can run the script using the following command, where

*a_method*is one of {*scipy, nag4py, ctypes*}:*$ python implied_volatility.py --method a_method*

Note that option (3) (ctypes) requires you to build a shared library. The build command for Windows, Linux and Mac can be found below. NAGCDIR should be the directory of your NAG C Library installation. The C code (

*nag_imp_vol.c)*is included in the download.
Linux

gcc -Wl,--no-undefined -fPIC -shared nag_imp_vol.c -INAGCDIR/include NAGCDIR/lib/libnagc_nag.so -o nag_imp_vol.so

Mac

gcc -fPIC -shared nag_imp_vol.c -INAGCDIR/include NAGCDIR/lib/libnagc_nag.dylib -o nag_imp_vol.dylib

Windows

cl /LD /DLL /MD -I"NAGCDIR\include" nag_imp_vol.c /link /LIBPATH:"NAGCDIR\lib" "NAGCDIR\lib\nagc_nag_MD.lib"

Running the script with

*--method ctypes*produces the following result:###
**Timing the methods**

We've added a timer around the first call when calculating the implied volatilities. You can also change n_strikes in

*implied_volatility.py*to alter the total number of calculations. The base case uses 50 strikes, 5 expirations, and 2 option types (Call and Put) for a total of N = 50 * 5 * 2 = 500 implied volatilities.
The approximate times in seconds of each method are displayed below. (N is the total number of implied volatilities calculated.)

N | scipy | nag4py | NAG C Library | |

500 | 1.40 | 0.26 | 0.008 | |

1000 | 2.84 | 0.50 | 0.016 | |

2000 | 5.74 | 1.00 | 0.020 | |

4000 | 11.40 | 1.99 | 0.033 |

- While scipy.fsolve looks considerable slower than nag4py, this is not the case. The differences in speed are a result of calling a pure python Black-Scholes formula vs. using nag4py's
*nag_bsm_price*function. - The times for fsolve and nag4py scale somewhat linearly, while the NAG C shared library doesn't. I suspect this is due to the overhead of preparing the data into numpy arrays before calling the shared library.
- If you encounter dependencies issues with Scipy or Matplotlib, we recommend switching to a Python distribution such as Anaconda or Canopy.
- The code uses serial implementations in NAG and Python. You could look at a more advanced version that uses multi-threading or call a function that solves a system of equations (
*scipy.optimize.root*or*nag_zero_nonlin_eqns_easy*). - The above analysis is a
*very*oversimplified model of implied volatility. In practice, one should use a more complex model and look at other root-finding methods (such as the rational approximation). - You can also increase the number of calculations past 4000, but Python seems to have trouble plotting and updating the graphs.

Labels:
c library,
implied volatility,
Mathematical Finance,
NAG C Library,
nag4py,
Python

## Thursday, 2 October 2014

### Problem needed for research on Bermudan Option Pricing Algorithms

### Introduction

NAG together with Prof. Oosterlee and an MSc student from TU Delft are investigating the recent Stochastic Grid Bundling Method (SGBM) [1,2]. The objective is to compare the performance of SGBM to the well-known Longstaff-Schwartz (least squares method or LSM) in a non-academic setting, i.e. on the pricing of a Bermudan option, with underlying asset(s) driven by a realistic process such as Heston or LMM. We are looking for an interesting case to test these two methods. This includes the type of option, the underlying processes and any other important features or details.### Outline

The well known LSM by Longstaff-Schwartz[3] is the industry standard for pricing multi-dimensional Bermudan options by*simulation and regression*. LSM is based on the

*regression now*principle, whereas the Stochastic Grid Bundling Method (SGBM) by Jain and Oosterlee applies

*regression later*in order to get more accurate approximations. However, this limits us to apply SGBM to processes where an analytical or approximate expression of the discounted moments are available.

Another advantage of SGBM is its regression on

*bundles*instead of the whole data set in a further attempt to decrease the regression error, and SGBM allows the computation of the Greeks at almost no extra cost.

Numerical results show that, compared to LSM, a higher accuracy can be obtained at comparable computational time. However, these tests were performed on

*academic*problems, using geometric Brownian motion for the underlying assets.

Very recent research (Feng and Oosterlee, Sept 2014) extended SGBM for stochastic volatility and stochastic interest rate dynamics (Heston-Hull-White). This led to the possibility of comparing the performance of SGBM in a non-academic setting and is the cause for our research. We therefore wish to compare LSM and SGBM on a problem which is interesting and relevant to industry. Realistically, handling all the complexities of traded products will probably require more time than we have (around two months), so ideally we seek a problem which captures all the salient features and allowing us to see whether SGBM outperforms LSM.

### Industry Involvement

We would like some advice in defining the product to be valued, especially regarding dimensionality, process, correlation structure, payoff and exercise features. Any additional industry involvement will be light: perhaps a few emails to clarify details, a conf call or face to face meeting.For correspondence or further details please email support@nag.co.uk

#### References

[1] S. Jain and C. W. Oosterlee, "The Stochastic Grid Bundling Method: Efficient Pricing of Bermudan Options and their Greeks,'' papers, SSRN, Sept. 2013.[2] S. Jain and C. W. Oosterlee, "Pricing high-dimensional Bermudan options using the stochastic grid method,'' International Journal of Computer Mathematics, 89(9):1186-1211, 2012.

[3] F. Longstaff and E. Schwartz, "Valuing American Options by Simulation: a Simple Least-squares Approach,'' Review of Financial Studies, vol. 14, no. 1, pp. 113-147, 2001.

## Thursday, 21 August 2014

### Gaussian Mixture Model

With the release of Mark 24 of the NAG C Library comes a plethora of new functionality including matrix functions, pricing Heston options w/term structure, best subset selection, and element-wise weightings for the nearest correlation matrix.

Among the new routines I was excited to test out was the Gaussian mixture model (g03ga). This routine will take a set of data points and fit a mixture of Gaussians for a given (co)variance structure by maximizing the log-likelihood function. The user inputs the (co)variance structure, number of groups, and (optionally) the initial membership probabilities.

I decided to test out this new functionality, which is also in Mark 24 of the NAG Toolbox for MATLAB. Often I will use MATLAB with the NAG Toolbox before switching to C++ and the NAG C Library for my production code. So I generated some data and tried the routine to see if it could find the covariance structure. You can download the script and try it out for yourself here. The example will generate the test data, run the NAG Gaussian mixture routine and plot the results. An example of the output is given below:

The blue points are the generated data, while the red and yellow ovals show the covariance structure output from NAG Gaussian mixture model (the ovals are contours of ~0.60 density for their respective groups).

While running the example a couple times and re-sampling through the starting values for the initial membership probabilities, I noticed what I thought to be unusual behavior for the routine. Namely, the Gaussian mixture model algorithm isn't able to identify the Gaussian mixtures. The function would occasionally converge to the below structure (run the above script and click 'Resample' 3 times):

It appears the routines has converged to local extrema of the likelihood function. This happens as a result of randomizing the initial membership probabilities.

Since we have the power of the NAG Library at our disposal, I've added a K-means clustering option in the above script to initialize the membership probabilities to a particular cluster before being input into the Gaussian mixture model.

My colleagues tell me that k-means can also get stuck in a local minima and exhibit this 'wrong' behavior as well, thus one should always be careful with initial allocations - luckily the NAG Library provides a generally acceptable default allocation as an option! Many thanks to Martyn Byng and Stephen Langdell for comments on this post.

Among the new routines I was excited to test out was the Gaussian mixture model (g03ga). This routine will take a set of data points and fit a mixture of Gaussians for a given (co)variance structure by maximizing the log-likelihood function. The user inputs the (co)variance structure, number of groups, and (optionally) the initial membership probabilities.

I decided to test out this new functionality, which is also in Mark 24 of the NAG Toolbox for MATLAB. Often I will use MATLAB with the NAG Toolbox before switching to C++ and the NAG C Library for my production code. So I generated some data and tried the routine to see if it could find the covariance structure. You can download the script and try it out for yourself here. The example will generate the test data, run the NAG Gaussian mixture routine and plot the results. An example of the output is given below:

The blue points are the generated data, while the red and yellow ovals show the covariance structure output from NAG Gaussian mixture model (the ovals are contours of ~0.60 density for their respective groups).

While running the example a couple times and re-sampling through the starting values for the initial membership probabilities, I noticed what I thought to be unusual behavior for the routine. Namely, the Gaussian mixture model algorithm isn't able to identify the Gaussian mixtures. The function would occasionally converge to the below structure (run the above script and click 'Resample' 3 times):

It appears the routines has converged to local extrema of the likelihood function. This happens as a result of randomizing the initial membership probabilities.

Since we have the power of the NAG Library at our disposal, I've added a K-means clustering option in the above script to initialize the membership probabilities to a particular cluster before being input into the Gaussian mixture model.

My colleagues tell me that k-means can also get stuck in a local minima and exhibit this 'wrong' behavior as well, thus one should always be careful with initial allocations - luckily the NAG Library provides a generally acceptable default allocation as an option! Many thanks to Martyn Byng and Stephen Langdell for comments on this post.

Labels:
algorithms,
c library,
MATLAB,
NAG Toolbox for Matlab

Subscribe to:
Posts (Atom)