Quadratic classifier

This article is about machine learning and statistical classification. For other uses of the word "quadratic" in mathematics, see Quadratic (disambiguation).

A quadratic classifier is used in machine learning and statistical classification to separate measurements of two or more classes of objects or events by a quadric surface. It is a more general version of the linear classifier.

The classification problem

Statistical classification considers a set of vectors of observations x of an object or event, each of which has a known type y. This set is referred to as the training set. The problem is then to determine for a given new observation vector, what the best class should be. For a quadratic classifier, the correct solution is assumed to be quadratic in the measurements, so y will be decided based on

\mathbf{x^T A x} + \mathbf{b^T x} + c

In the special case where each observation consists of two measurements, this means that the surfaces separating the classes will be conic sections (i.e. either a line, a circle or ellipse, a parabola or a hyperbola). In this sense we can state that a quadratic model is a generalization of the linear model, and its use is justified by the desire to extend the classifier's ability to represent more complex separating surfaces.

Quadratic discriminant analysis

Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements from each class are normally distributed. Unlike LDA however, in QDA there is no assumption that the covariance of each of the classes is identical. When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test. Suppose there are only two groups, (so $y \in \{0,1 \}$ ), and the means of each class are defined to be $\mu_{y=0},\mu_{y=1}$ and the covariances are defined as $\Sigma_{y=0}, \Sigma_{y=1}$ . Then the likelihood ratio will be given by

Likelihood ratio =

{\frac {{\sqrt {|2\pi \Sigma _{y=1}|}}^{-1}\exp \left(-{\frac {1}{2}}(x-\mu _{y=1})^{T}\Sigma _{y=1}^{-1}(x-\mu _{y=1})\right)}{{\sqrt {|2\pi \Sigma _{y=0}|}}^{-1}\exp \left(-{\frac {1}{2}}(x-\mu _{y=0})^{T}\Sigma _{y=0}^{-1}(x-\mu _{y=0})\right)}}<t

for some threshold $t$ . After some rearrangement, it can be shown that the resulting separating surface between the classes is a quadratic. The sample estimates of the mean vector and variance-covariance matrices will substitute the population quantities in this formula.

Other

While QDA is the most commonly used method for obtaining a classifier, other methods are also possible. One such method is to create a longer measurement vector from the old one by adding all pairwise products of individual measurements. For instance, the vector

[x_1, \; x_2, \; x_3]

would become

[x_1, \; x_2, \; x_3, \; x_1^2, \; x_1x_2, \; x_1 x_3, \; x_2^2, \; x_2x_3, \; x_3^2]

Finding a quadratic classifier for the original measurements would then become the same as finding a linear classifier based on the expanded measurement vector. This observation has been used in extending neural network models;^[1] the "circular" case, which corresponds to introducing only the sum of pure quadratic terms $\; x_1^2 + x_2^2 + x_3^2 \;\ldots \;$ with no mixed products ( $\;x_{1}x_{2},\;x_{1}x_{3}\;\ldots \;$ ), has been proven to be the optimal compromise between extending the classifier's representation power and controlling the risk of overfitting (Vapnik-Chervonenkis dimension).^[2]

For linear classifiers based only on dot products, these expanded measurements do not have to be actually computed, since the dot product in the higher-dimensional space is simply related to that in the original space. This is an example of the so-called kernel trick, which can be applied to linear discriminant analysis, as well as the support vector machine.

References

↑ Cover TM (1965). "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition". IEEE Transactions on Electronic Computers. EC-14 (3): 326–334. doi:10.1109/pgec.1965.264137.
↑ Ridella S, Rovetta S, Zunino R (1997). "Circular backpropagation networks for classification". IEEE Transactions on Neural Networks. 8 (1): 84–97. doi:10.1109/72.554194. PMID 18255613. href IEEE: .

Sources:

Sathyanarayana, Shashi (2010). "Pattern Recognition Primer II". Wolfram Demonstrations Project.

This article is issued from Wikipedia - version of the 10/21/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.