UBE 521 - Machine Learning Kaya Oguz
Support Vector Machines How do we divide up the space with decision boundaries? SVMs date from the 1990s, so they are new compared to other methods. How do we build a decision rule to use with this boundary? Image reference: Neural Networks and Learning Machines, Haykin, p298
Support Vector Machines Imagine we have a vector w, of any length we like, constrained to be perpendicular to the decision boundary. Consider an unknown point x: what we are interested in is whether x lies on one side of the boundary or the other. We project its vector onto w, so that we have a distance in the direction of w. Image reference: Neural Networks and Learning Machines, Haykin, p299
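The projection idea above can be sketched in a few lines of numpy (the course demos use MATLAB; this is an equivalent sketch, and the particular w and b below are made-up illustrative values, not values from the slides):

```python
import numpy as np

# Hypothetical normal vector w (perpendicular to the boundary) and offset b.
w = np.array([2.0, 1.0])
b = -4.0

def side(x):
    """Project x onto w and add the offset: a positive result means x is on
    one side of the boundary, a negative result means the other side."""
    return np.dot(w, x) + b

print(side(np.array([3.0, 2.0])))   # 2*3 + 1*2 - 4 = 4.0  -> positive side
print(side(np.array([1.0, 0.0])))   # 2*1 + 0   - 4 = -2.0 -> negative side
```

Only the sign of the projection matters for classification; the magnitude is a (scaled) distance from the boundary in the direction of w.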
Support Vector Machines Decision rule for positive samples: w · x + b ≥ 0. We don't know what constant b to use, or which w to use. Right now there are not enough constraints to compute these values, so we add more constraints.
Support Vector Machines Define y_i as +1 for positive samples and -1 for negative samples, and the two equations become one: y_i (w · x_i + b) - 1 ≥ 0. For the samples in the gutter, this value is 0: y_i (w · x_i + b) - 1 = 0.
Support Vector Machines To maximize the width of the gutter, we have to minimize ||w||. For a positive sample x_p and a negative sample x_n, the width of the gutter is (x_p - x_n) · w / ||w||. If we write the x's in terms of w, using the previous equations, we get 2 / ||w||. Therefore, to maximize the gutter, we have to minimize ||w||. For mathematical convenience, it is just as good to minimize (1/2) ||w||².
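The substitution step can be written out explicitly; a short derivation consistent with the gutter equations on the previous slide:

```latex
\text{width} = (\mathbf{x}_p - \mathbf{x}_n)\cdot\frac{\mathbf{w}}{\lVert\mathbf{w}\rVert},
\qquad
\mathbf{w}\cdot\mathbf{x}_p + b = 1,\quad
\mathbf{w}\cdot\mathbf{x}_n + b = -1
\;\Rightarrow\;
\mathbf{w}\cdot(\mathbf{x}_p - \mathbf{x}_n) = 2
\;\Rightarrow\;
\text{width} = \frac{2}{\lVert\mathbf{w}\rVert}.
```

Subtracting the two gutter equations eliminates b, which is why only ||w|| appears in the width.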
Support Vector Machines If we want to find the extremum of a function subject to constraints, we use Lagrange multipliers, which give us a new expression that we can maximize or minimize without worrying about the constraints.
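For the objective above, this gives the standard SVM Lagrangian (written here to match the constraint y_i (w · x_i + b) - 1 ≥ 0 from the earlier slide):

```latex
L(\mathbf{w}, b, \alpha) = \frac{1}{2}\lVert\mathbf{w}\rVert^2
  - \sum_i \alpha_i \left[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \right],
\qquad \alpha_i \ge 0.
```

Setting the derivative with respect to w to zero gives w = Σ_i α_i y_i x_i, and the derivative with respect to b gives Σ_i α_i y_i = 0. Substituting these back, the optimization depends on the samples only through the dot products x_i · x_j, which is exactly the property the kernel trick exploits.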
Support Vector Machines What if the samples can't be separated linearly? We apply a transformation φ(x) to x, adding another dimension. Actually, we don't need to know the transformation itself; we just need to know the dot product in the higher-dimensional space. This is done with the kernel trick, using kernel functions. Let's see how it works in MATLAB.
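The kernel trick can be demonstrated in a few lines (a numpy sketch, not the MATLAB demo from class): for the degree-2 polynomial kernel in 2D, the kernel value (a · b)² equals an ordinary dot product after an explicit map φ into 3D — so we get the higher-dimensional dot product without ever computing φ.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2D."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def kernel(a, b):
    """The same dot product, computed directly in the original space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 1.0])

print(np.dot(phi(a), phi(b)))  # dot product in the 3D feature space
print(kernel(a, b))            # identical value: (1*3 + 2*1)^2 = 25.0
```

The same idea scales up: kernels like the RBF kernel correspond to feature spaces of very high (even infinite) dimension, yet evaluating them costs only one expression in the original space.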
Principal Component Analysis First, some statistics. Mean (average). The standard deviation of a data set is a measure of how spread out the data is: roughly, the average distance of each point in the distribution from the mean value.
Principal Component Analysis Variance: σ². Covariance: shows the relationship between two dimensions, how they change with respect to each other.
Principal Component Analysis If the covariance value is positive, the two dimensions behave similarly (they increase together, or decrease together). If it is negative, one dimension increases while the other decreases. If it is zero, the two dimensions are independent of each other. If there are more than two dimensions, such as x, y, z, we must compute cov(x,y), cov(x,z) and cov(y,z).
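The sign interpretation above is easy to check in numpy (illustrative made-up data; the course uses MATLAB):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up   =  2.0 * x + 1.0    # increases with x  -> positive covariance
y_down = -3.0 * x + 10.0   # decreases as x grows -> negative covariance

# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(x, y).
print(np.cov(x, y_up)[0, 1])    # positive
print(np.cov(x, y_down)[0, 1])  # negative
```

With more dimensions, np.cov(np.vstack([x, y_up, y_down])) returns the full 3×3 matrix containing all the pairwise covariances mentioned on the slide.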
Eigenvectors Transformation matrix: what happens if we rotate the vector (3,4) by 53 degrees clockwise? Image reference: https://en.wikipedia.org/wiki/file:2d_affine_transformation_matrix.svg
Eigenvectors We can think of the matrix [2 3; 2 1] on the left as a transformation matrix. In that case the vector [6; 4] is mapped back onto itself, only scaled by a factor of 4. Here [6; 4], or as a unit vector [0.8321; 0.5547], is an eigenvector of the matrix on the left.
Eigenvectors On the right, we see the eigenvectors and eigenvalues computed with MATLAB's help. Now let's look at their formal definitions.
Eigenvectors In linear algebra, an eigenvector or characteristic vector of a linear transformation is a non-zero vector that does not change its direction when that linear transformation is applied to it (from Wikipedia). Only square matrices have eigenvectors, and not every square matrix has them. There is one eigenvector and one eigenvalue per dimension: if we have a 3x3 matrix, there are 3 eigenvalues and 3 eigenvectors. MATLAB? Each column of V is an eigenvector, of unit length. The eigenvectors of a symmetric matrix (such as a covariance matrix) are orthogonal to each other.
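The matrix from the earlier slide can be checked in numpy, where np.linalg.eig plays the role of MATLAB's eig:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

vals, vecs = np.linalg.eig(A)   # columns of vecs are unit eigenvectors
i = int(np.argmax(vals))        # index of the largest eigenvalue
v = vecs[:, i]

print(vals[i])    # largest eigenvalue: 4.0
print(v)          # matches the slide's [0.8321; 0.5547] up to sign
print(A @ v)      # equals 4 * v: the vector is only scaled, not rotated
```

Note that this particular A is not symmetric, so its two eigenvectors are not orthogonal; the orthogonality stated on the slide applies to symmetric matrices like the covariance matrices used later.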
Principal Component Analysis PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. The other advantage is that once you have found the patterns, you can compress the data by reducing the number of dimensions, without much loss of information.
PCA
Subtract the mean Subtract the mean from each of the data dimensions:
data2(:,1) = data(:,1) - mean(data(:,1));
data2(:,2) = data(:,2) - mean(data(:,2));
PCA Calculate the covariance matrix. MATLAB? Calculate the eigenvectors and eigenvalues of the covariance matrix. MATLAB?
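In numpy the same two steps look like this (the slides do this in MATLAB; the data set below is the illustrative one from Lindsay Smith's well-known PCA tutorial, whose principal eigenvector matches the [0.6779; 0.7352] quoted later in these slides — np.linalg.eigh is the eigen-solver for symmetric matrices such as covariance matrices):

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])

adjusted = data - data.mean(axis=0)      # subtract the mean (previous slide)
C = np.cov(adjusted, rowvar=False)       # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(C)           # eigh: symmetric input, eigenvalues
                                         # returned in ascending order
principal = vecs[:, -1]                  # eigenvector of the largest eigenvalue
print(C)
print(principal)                         # ~[0.6779, 0.7352] up to sign
```

np.cov normalizes by N-1 (sample covariance), matching the usual statistics convention.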
PCA
PCA The eigenvector with the highest eigenvalue is the principal component of the data set. Once the eigenvectors are found, the next step is to order them by eigenvalue, highest to lowest, which gives us the components in order of significance. We can ignore the components of lesser significance, if we want to. We lose information by doing so, but if their eigenvalues are small, we don't lose much. The selected eigenvectors form a feature vector. In this particular example, we choose only the eigenvector [0.6779; 0.7352].
PCA In the final step, we take the transpose of the feature vector and multiply it by the mean-adjusted data set, also transposed: FinalData = RowFeatureVector × RowDataAdjust
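The whole recipe, end to end, can be sketched in numpy (the slides use MATLAB); the variable names mirror the slides' RowFeatureVector and RowDataAdjust, and the data is synthetic, chosen so that one component carries almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
# Two strongly correlated dimensions: the second is 2x plus small noise.
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=50)])

row_data_adjust = (data - data.mean(axis=0)).T    # dimensions x samples
C = np.cov(row_data_adjust)                       # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(C)                    # ascending eigenvalues
order = np.argsort(vals)[::-1]                    # highest eigenvalue first
row_feature_vector = vecs[:, order[:1]].T         # keep only the top component

final_data = row_feature_vector @ row_data_adjust # 1 x samples: compressed data
print(final_data.shape)
print(vals[order[0]] / vals.sum())                # fraction of variance kept
```

Because the two dimensions are nearly collinear, the single retained component preserves almost all of the variance — the compression the slide describes.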
Eigenfaces One of the best-known applications of PCA is eigenfaces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Final Project Using the methods you have learned in this course, build an application that recognizes music tracks by the following attributes: Genre (Rock, Metal, Pop, Rap, Classical, Arabesque, etc.); Decade (the 50s, the 60s, ...). At least 2 learning methods must be tried and compared. http://ube.ege.edu.tr/~oguz/dosyalar/music.zip (~306MB)