Monday, July 21, 2008

Verifying the Cascade Classifier

On Thursday and Friday, I concentrated on verifying the cascade machine learning algorithm and the features used in the images.

Verification of Algorithm: To verify the algorithm, a simpler dataset was used: a 10D Gaussian dataset with 1,000 positive examples and 10,000 negative examples. Generating the 10D dataset was relatively easy with the following MATLAB commands:

posx = 0.25 * randn(10, 1000) + 1;   % positives: mean 1, S.D. 0.25
negx = 0.25 * randn(10, 10000);      % negatives: mean 0, S.D. 0.25
x = [posx negx]';                    % 11000 x 10, one example per row
y = [ones(1000,1); zeros(10000,1)];  % labels: 1 = positive, 0 = negative
The dataset has:
  1. Standard Deviation = 0.25 and mean = 1 for positive examples
  2. Standard Deviation = 0.25 and mean = 0 for negative examples
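For readers without MATLAB, the same dataset can be generated with Python's standard library. This is a minimal sketch: make_gaussian_dataset is a hypothetical name, and the 0/1 label convention is an assumption, since the classifier's expected label format is not stated in this post.

```python
import random

def make_gaussian_dataset(n_pos=1000, n_neg=10000, dim=10, sd=0.25,
                          pos_mean=1.0, neg_mean=0.0, seed=0):
    """Two-class Gaussian dataset with one example per row.

    Positives ~ N(pos_mean, sd^2) and negatives ~ N(neg_mean, sd^2),
    independently in every dimension. Labels are 1 (positive) and
    0 (negative); this label convention is an assumption.
    """
    rng = random.Random(seed)
    pos = [[rng.gauss(pos_mean, sd) for _ in range(dim)] for _ in range(n_pos)]
    neg = [[rng.gauss(neg_mean, sd) for _ in range(dim)] for _ in range(n_neg)]
    x = pos + neg
    y = [1] * n_pos + [0] * n_neg
    return x, y

x, y = make_gaussian_dataset()
print(len(x), len(x[0]), sum(y))  # 11000 10 1000
```

Passing dim=2 and sd=0.5 gives a 2D dataset like the one in the second figure.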
The figure below shows a 2D dataset with the same distribution parameters.
The figure below shows a 2D dataset with S.D. = 0.5 and means of 0 and 1.
The 10D Gaussian dataset was fed to the classifier, and the classification results are shown in the plots below.


The target detection rate (true positive rate) was set to 0.95.
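How each stage meets the 0.95 target is not described here. A common approach in Viola-Jones-style cascades, sketched below as an assumption rather than the method used in this work, is to lower a stage's threshold until the required fraction of positives passes, then measure the false positive rate that threshold lets through. The function names are hypothetical, and higher stage scores are assumed to mean "more likely positive".

```python
import math

def stage_threshold(pos_scores, target_tpr=0.95):
    """Return the largest threshold t such that at least target_tpr of the
    positive examples satisfy score >= t."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must pass. The small epsilon guards against
    # a product such as 0.95 * n landing just above an integer.
    k = math.ceil(target_tpr * len(ranked) - 1e-9)
    return ranked[k - 1]

def stage_rates(pos_scores, neg_scores, t):
    """True positive rate and false positive rate of the rule score >= t."""
    tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
    return tpr, fpr

pos = list(range(1, 21))   # 20 toy positive scores: 1, 2, ..., 20
neg = [0, 1, 2, 3]         # 4 toy negative scores
t = stage_threshold(pos)
print(t, stage_rates(pos, neg, t))  # 2 (0.95, 0.5)
```

Because a positive must pass every stage, the detection rate of a cascade whose stages are each tuned this way is roughly 0.95^K for K stages (about 0.77 for five stages), while the false positive rates also multiply, which is what drives the overall false positive rate down.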

Apart from this, possible explanations for the very low false positive rate in Dr. Tolga's cascade setup, which is based on a linear classifier, are the following:
  1. A decision-stump-based classifier uses a single attribute at a time, so its decision boundary is always axis-aligned: a hyperplane perpendicular to the axis of one dimension. A linear classifier, by contrast, can find a better separating hyperplane at an arbitrary angle to each of the dimensions.
  2. Dr. Tolga's dataset used randomly picked non-synapse locations, which could lie well inside the lighter regions of the neurons, whereas the examples in our dataset are confined to darker regions of the image. This makes the task more challenging for the classifier and would explain our higher false positive rate.
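Point 1 can be illustrated on the 10D Gaussian data described above. The plain-Python sketch below trains a single decision stump by exhaustive search and compares it with a fixed oblique hyperplane, sum(x) = 5, which lies midway between the two class means. That hyperplane is hand-picked for illustration; it is neither Dr. Tolga's classifier nor a trained linear model, and all function names are hypothetical.

```python
import random

def make_data(n_pos=1000, n_neg=10000, dim=10, sd=0.25, seed=0):
    """Positives ~ N(1, sd^2), negatives ~ N(0, sd^2) in every dimension."""
    rng = random.Random(seed)
    x = [[rng.gauss(1.0, sd) for _ in range(dim)] for _ in range(n_pos)]
    x += [[rng.gauss(0.0, sd) for _ in range(dim)] for _ in range(n_neg)]
    return x, [1] * n_pos + [0] * n_neg

def train_stump(x, y):
    """Exhaustively search for the single-feature rule 'x[j] > t' with the
    fewest training errors. Returns (errors, j, t). Assumes no tied feature
    values, which holds almost surely for Gaussian data."""
    n = len(y)
    best = (n + 1, 0, float("-inf"))
    for j in range(len(x[0])):
        order = sorted(range(n), key=lambda i: x[i][j])
        # With t = -inf every sample is predicted positive,
        # so every negative is an error.
        errors = n - sum(y)
        if errors < best[0]:
            best = (errors, j, float("-inf"))
        for i in order:
            # Raising t to x[i][j] moves sample i to the negative side:
            # a positive becomes an error, a negative stops being one.
            errors += 1 if y[i] == 1 else -1
            if errors < best[0]:
                best = (errors, j, x[i][j])
    return best

def error_rate(pred, y):
    return sum(p != label for p, label in zip(pred, y)) / len(y)

x_train, y_train = make_data(seed=1)
x_test, y_test = make_data(seed=2)

# Axis-aligned boundary: one feature, one threshold.
_, j, t = train_stump(x_train, y_train)
stump_err = error_rate([int(xi[j] > t) for xi in x_test], y_test)

# Oblique boundary: sum(x) = dim / 2 is perpendicular to the difference
# of the class means, (1, 1, ..., 1), and halfway between them.
dim = len(x_test[0])
linear_err = error_rate([int(sum(xi) > dim / 2) for xi in x_test], y_test)

print(f"decision stump (feature {j}, t = {t:.3f}): test error {stump_err:.4f}")
print(f"oblique hyperplane sum(x) > {dim / 2}: test error {linear_err:.4f}")
```

On this data the single stump's test error should be on the order of 1%, while the oblique hyperplane makes essentially no errors, because it uses all ten dimensions at once. Boosting many stumps, as a cascade does, yields a staircase-shaped boundary that can approximate the oblique one, but only with many weak classifiers.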
