Friday, October 17, 2008

kNN with clustered test points

The setup for this experiment is as follows:
  1. The data is divided into 4 folds (4 quadrants of the image)
  2. Positive training examples are all the SIFT key points that lie within 10 pixels of the converged synapse markup points.
  3. Negative training examples are all the SIFT key points that lie farther than 27.2 pixels from the converged synapse markup points. These points are clustered so that the training data is balanced.
  4. Test points are all the converged SIFT points in that quadrant. The test points are clustered.
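
The distance thresholds in steps 2 and 3 can be sketched roughly as follows. This is only an illustration: the function names, the array layout (one `(x, y)` row per point) and the toy coordinates are assumptions, and the clustering used to balance the negatives is left out because the method is not specified here.

```python
import numpy as np

def nearest_distance(points, references):
    """Euclidean distance from each point to its nearest reference point."""
    diffs = points[:, None, :] - references[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def label_keypoints(keypoints, markups, pos_radius=10.0, neg_radius=27.2):
    """Return boolean masks for positive and negative training key points.

    pos_radius and neg_radius are the thresholds from the setup above.
    Key points between the two radii get neither label.
    """
    d = nearest_distance(keypoints, markups)
    positive = d <= pos_radius
    negative = d > neg_radius
    return positive, negative

# Toy data (hypothetical coordinates, not from the experiment).
markups = np.array([[100.0, 100.0]])
keypoints = np.array([[103.0, 104.0], [120.0, 100.0], [100.0, 140.0]])
positive, negative = label_keypoints(keypoints, markups)
```

In the toy data the three key points are 5, 20 and 40 pixels from the markup, so the first is positive, the second is unlabeled and the third is negative.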
Observations on why this experiment could fail:
  1. Certain markups are placed very close together, but because of clustering the entire region might have only one representative point, which could be closer to either markup. Many markups (151/468) are closer than 27.5 pixels (the disk radius) to their next closest markup. The histogram below shows the distance between each markup and its next closest markup point. (Note: the histogram shows only the markup points separated by less than 50 pixels from their next closest markup [214/468], not all markup points.)
  2. After a representative point is chosen from each cluster, the histogram of distances between each markup point and its nearest test point is as shown below. It is very clear that with representative points chosen by this method, we don't even have test points in the neighborhood of the actual markups.
Thus this method of clustering is not suitable for generating representative test points.
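
The two distance checks behind these observations could be computed along the following lines. This is a rough sketch, not the code used for the histograms above: the helper names and the toy coordinates are assumptions, and only the 27.5 and 50 pixel cutoffs come from the text.

```python
import numpy as np

def pairwise_distances(a, b):
    """Matrix of Euclidean distances between rows of a and rows of b."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

def next_closest_markup_distance(markups):
    """Distance from each markup to its next closest markup."""
    d = pairwise_distances(markups, markups)
    np.fill_diagonal(d, np.inf)  # ignore each markup's distance to itself
    return d.min(axis=1)

def nearest_test_point_distance(markups, test_points):
    """Distance from each markup to the nearest clustered test point."""
    return pairwise_distances(markups, test_points).min(axis=1)

# Toy data (hypothetical coordinates, not from the experiment).
markups = np.array([[0.0, 0.0], [20.0, 0.0], [200.0, 0.0]])
test_points = np.array([[10.0, 0.0], [200.0, 3.0]])

markup_gaps = next_closest_markup_distance(markups)
close_count = int((markup_gaps < 27.5).sum())  # markups within the disk radius
# Histogram restricted to separations under 50 pixels, as in observation 1.
counts, edges = np.histogram(markup_gaps[markup_gaps < 50], bins=10, range=(0, 50))
test_gaps = nearest_test_point_distance(markups, test_points)
```

In the toy data the first two markups are 20 pixels apart and share a single representative test point midway between them, which is the situation described in observation 1.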
