### What's new at Mark 23? Linear quantile regression

This is the first in an occasional series of posts highlighting new functionality in the latest release (Mark 23) of the NAG Library.  Here, we describe linear quantile regression, which has just been added to the collection of regression techniques that is already available in the Library.

Regression techniques are concerned with modelling and analysing the relationship between a dependent (or response) variable and one or more independent (or explanatory) variables.  More specifically, they enable the user to understand how the typical value of the response variable changes when one or more of the explanatory variables are varied.  A common example is linear least-squares regression, which models the behaviour of the conditional mean of the response variable and which, as its name implies, employs the well-known method of least squares.  By contrast, linear quantile regression models one of the conditional quantiles of the response variable (see this note for more information on its formal definition).  Its objective function is piecewise linear rather than quadratic, so least-squares methods cannot be used; instead, the solution is obtained using linear programming techniques.
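To make the idea concrete, here is a minimal Python sketch of linear quantile regression with a single explanatory variable.  It is not the algorithm used by the NAG routines, and the function names (`check_loss`, `total_loss`, `fit_quantile_line`) and the data are made up for illustration.  The fit for quantile tau minimizes the sum of "check" losses of the residuals, where a residual r costs tau * r if it is non-negative and (tau - 1) * r otherwise.  Because this is a linear programming problem, with an intercept and one explanatory variable some optimal line passes through at least two of the observations (provided the x values are not all equal), so for a tiny data set we can simply try every such line.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Check (pinball) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(x, y, intercept, slope, tau):
    """Sum of check losses of the line's residuals over all observations."""
    return sum(check_loss(yi - (intercept + slope * xi), tau)
               for xi, yi in zip(x, y))


def fit_quantile_line(x, y, tau):
    """Brute-force fit of y ~ intercept + slope * x at quantile tau.

    Tries every line through two observations with distinct x values and
    keeps the one with the smallest total check loss.  The cost grows as
    O(n^3), so this is only sensible for very small data sets.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = total_loss(x, y, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best[1], best[2]


# Made-up data whose spread grows with x, so the quantile lines fan out.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.6, 3.1, 6.4, 4.2, 9.5, 5.0]
for tau in (0.1, 0.5, 0.9):
    intercept, slope = fit_quantile_line(x, y, tau)
    print(f"tau={tau}: y = {intercept:.3f} + {slope:.3f} * x")
```

Enumerating pairs of points keeps the sketch free of any solver dependency; a practical implementation would instead pass the equivalent linear program to a dedicated solver.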

Because quantile regression allows multiple quantiles to be modelled, it can provide a more comprehensive analysis of the relationship between variables than least-squares regression, which considers only the behaviour of the mean.  In particular, the results of least-squares regression will not be as robust as those of quantile regression if the conditional distribution of the response variable has heavy tails, is asymmetric, or is not unimodal.  Quantile regression is also less sensitive than least-squares regression to outliers in the response variable and, by concentrating on specific quantiles, the user is able to investigate the behaviour of different parts of the distribution.  Applications of quantile regression include estimating so-called Value at Risk in finance (see, for example, here, here and here) and, in ecology, discovering predictive relationships between variables in cases where the relationship between their means is either non-existent or only weak.
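The sensitivity to outliers can be seen in the simplest possible setting.  With no explanatory variables at all, the least-squares fit reduces to the sample mean and the median (tau = 0.5) regression fit reduces to the sample median.  The short Python sketch below, which uses made-up numbers, replaces one response value with an outlier and compares how far each estimate moves.

```python
import statistics

# Made-up response values; the second sample replaces one value by an outlier.
clean = [10.0, 11.0, 12.0, 13.0, 14.0]
contaminated = [10.0, 11.0, 12.0, 13.0, 140.0]

# Least squares with an intercept only gives the mean;
# median regression with an intercept only gives the median.
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)

print(f"shift in mean:   {mean_shift:.1f}")
print(f"shift in median: {median_shift:.1f}")
```

The mean is dragged towards the outlier while the median does not move; the same effect carries over to fitted lines when explanatory variables are present.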

In Mark 23 of the NAG Fortran Library, quantile regression is performed by the G02QGF routine and by G02QFF, which provides a simplified interface to G02QGF.  (The corresponding routines in the latest versions of the NAG C Library are nag_regsn_quant_linear and nag_regsn_quant_linear_iid, and of the NAG Toolbox for MATLAB are nag_correg_quantile_linreg and nag_correg_quantile_linreg_easy.)  The figure below shows the results from the example for these routines, which investigates the relationship between household food expenditure and income, using data from an 1857 study.

Figure: Plot of household food expenditure versus income, along with the results of fitting a quantile regression model to the data for the five quantiles indicated in the key.

I'm grateful to my colleagues Martyn Byng and Jeremy Walton for their help in preparing this post, and their interest in this work.