Thursday, September 25, 2008

kNN Various Stages Explained - Training Phase

Training Phase:

The training image is /usr/sci/crcnsdata/CRCNS/Synapses/data/Refined2_marked_RM2_with_fake/Layer1_0_0_card_resize_p25.tif
The image size is 4590 x 2869 pixels (downsampled 4x4 from the original).

Step 1: Generating SIFT key points
We generate the SIFT key points for this image using /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/sift/sift.m
Time taken: 6 minutes
Number of SIFT key points = 359265

Step 2: Converging the SIFT key points
Key points in the brighter parts of the image and near the image borders are filtered out, and the remaining points are converged. These operations are done using /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kNN/CentroidCalc2.m
Time taken: 5 minutes
After this filtering the number of points = 69627
Unique number of SIFT points = 56521
After converging the points the number of unique points = 56467
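The filtering part of Step 2 can be sketched as below. This is a minimal Python sketch, not the code in CentroidCalc2.m: the brightness threshold, the border margin, and the idea that "unique" means unique after rounding to whole pixels are all assumptions, and `filter_keypoints` / `unique_points` are hypothetical names.

```python
def filter_keypoints(points, image, brightness_threshold, border_margin):
    """Drop key points that sit on bright pixels or within `border_margin` of the edge.

    points: list of (row, col) positions; image: 2-D list of grayscale intensities.
    brightness_threshold and border_margin are hypothetical parameters.
    """
    rows, cols = len(image), len(image[0])
    kept = []
    for r, c in points:
        ri, ci = int(round(r)), int(round(c))
        near_border = (ri < border_margin or ci < border_margin or
                       ri >= rows - border_margin or ci >= cols - border_margin)
        if near_border:
            continue  # border points are removed
        if image[ri][ci] > brightness_threshold:
            continue  # points in bright regions are removed
        kept.append((r, c))
    return kept


def unique_points(points):
    """Distinct locations after rounding to whole pixels (assumed meaning of 'unique')."""
    return sorted(set((int(round(r)), int(round(c))) for r, c in points))
```

SIFT can return several key points at the same location (for example with different orientations), which is one plausible reason the filtered count (69627) is larger than the unique count (56521).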

Step 3: Clustering the points
A new cluster-center initialization method is used. The algorithm picks the first cluster center at random and then repeatedly chooses, as the next center, the point farthest from all the cluster centers identified so far. It stops when the largest distance from any point to its nearest cluster center is less than the specified separation distance. This guarantees that no point is farther than the specified separation distance from a cluster center. The algorithm is in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kmeans/kmeans.m
Time taken: 87 seconds
Separation distance = 17.5 pixels
Number of clusters identified = 8395
The 8395 identified cluster points are shown below.
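The initialization described in Step 3 is a farthest-first traversal. A minimal Python sketch of it follows, assuming points are (row, col) tuples and Euclidean distance; `farthest_first_centers` is a hypothetical name, not the function in kmeans.m.

```python
import math
import random


def farthest_first_centers(points, separation, seed=None):
    """Farthest-first cluster-center selection.

    Start from a random point, then repeatedly add the point farthest from all
    centers chosen so far, stopping once every point is closer than `separation`
    to some center (the post uses 17.5 pixels).
    """
    if separation <= 0:
        raise ValueError("separation must be positive")
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] = distance from points[i] to its closest center so far
    nearest = [math.dist(p, centers[0]) for p in points]
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] < separation:
            break  # the greatest remaining separation is below the threshold
        centers.append(points[far])
        # Adding a center can only shrink each point's nearest-center distance.
        nearest = [min(d, math.dist(p, points[far])) for d, p in zip(nearest, points)]
    return centers
```

Each pass updates the nearest-center distances in O(n), so the total cost is roughly O(n x number of centers), which is consistent with a run measured in seconds rather than minutes.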

Wednesday, September 17, 2008

kNN Results

Experiment Setup:
Ground truth:
  • Dr. Marc's dataset
  • Positive examples were converged SIFT points less than 45 pixels from the converged Synapse points.
  • Negative examples were converged SIFT points more than 45 pixels from the converged Synapse markup points (all such points were taken, so the dataset is skewed).
Test Data:
  • First set of markup images
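The ground-truth labeling rule above can be sketched as follows. This is a minimal Python sketch assuming (row, col) tuples and Euclidean distance in pixels; `label_points` is a hypothetical name. The post does not say what happens to a point at exactly 45 pixels, so this sketch leaves such points unlabeled.

```python
import math


def label_points(sift_points, synapse_points, radius=45.0):
    """Split converged SIFT points into positives (< radius from the nearest
    converged synapse) and negatives (> radius)."""
    positives, negatives = [], []
    for p in sift_points:
        d = min(math.dist(p, s) for s in synapse_points)
        if d < radius:
            positives.append(p)
        elif d > radius:
            negatives.append(p)
        # d == radius is covered by neither rule in the post; skipped here.
    return positives, negatives
```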

Results:
52 synapses were identified. A few of the results are shown below. These pictures are screenshots from the rudimentary Qt-based Synapse Viewer on Linux. The yellow markups are the predicted synapses and the red ones are the user-marked (non-converged) ones.



P.S.: Finally some results after a few bad days!! A Kraken reboot killed an experiment!!

Monday, September 8, 2008

kNN analysis

Yesterday's training run was killed because training the kNN classifier with 5-fold cross-validation would have taken 10+ days. Now all the positive and negative examples are used as training examples. The 5-NN classifier has started running, but it looks like it will take a looong time to finish (~6 days). Here I checked whether the region matching works. The figures below show the 5 nearest neighbors for various test patches.
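A 5-NN prediction of the kind described above can be sketched as a brute-force majority vote. This is a minimal Python sketch assuming feature vectors are numeric tuples and Euclidean distance; `knn_predict` is a hypothetical name, and the post's own feature and distance code is not shown. It also returns the neighbor indices, which is what the region-matching figures inspect.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Return (label, neighbor_indices) for `query` using a k-nearest-neighbor vote."""
    # Rank every training example by Euclidean distance to the query (brute force).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = order[:k]
    # Majority vote; with two classes and odd k there can be no tie.
    label = Counter(train_y[i] for i in neighbors).most_common(1)[0][0]
    return label, neighbors
```

Every query is compared against every training example, so the cost grows with the training-set size times the number of test patches, which is consistent with the multi-day estimate above.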

Sunday, September 7, 2008

The modified experiment starts...

The modifications suggested here have been implemented, except for the C implementation of the distance calculation. The dataset generation code is at /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/featureGeneration/generateFeatures.m

The experiment is being run for k = 1, 3, 5, 7, 9, 11, 15, 19. As in the previous experiment, 5-fold cross-validation is used. I think this is going to run for a really long time because of the large number of negative regions and their rotations. In the long run, the negative examples should probably be clustered so that only a few of them remain at test time.
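The k sweep with 5-fold cross-validation can be sketched as below. This is a minimal Python sketch under stated assumptions: folds are formed by a seeded random shuffle, accuracy is pooled over all held-out examples, and `k_fold_indices`, `knn_label` and `cross_validated_accuracy` are hypothetical names rather than the post's MATLAB code.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, folds=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into `folds` roughly equal parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def knn_label(train, query, k):
    """Majority label among the k training pairs (x, y) closest to `query`."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, k, folds=5, seed=0):
    """Fraction of examples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), folds, seed):
        held_out = set(test_idx)
        train = [(xs[i], ys[i]) for i in range(len(xs)) if i not in held_out]
        correct += sum(knn_label(train, xs[i], k) == ys[i] for i in test_idx)
    return correct / len(xs)
```

For the sweep in the post this would be called as `for k in (1, 3, 5, 7, 9, 11, 15, 19): print(k, cross_validated_accuracy(xs, ys, k))`. Every one of the five folds repeats the full brute-force search, which is why cross-validation multiplies the running time.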

Friday, September 5, 2008

Verifying the kNN classifier - Crescent dataset

The kNN classifier was trained and tested on the crescent dataset. This dataset was chosen because it is not linearly separable. The dataset is generated by the /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/utilities/generateCresentDataset.m MATLAB script. The dataset looks like the one below. The following plot shows the kNN training and testing accuracies versus k on this dataset.
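The post does not describe how generateCresentDataset.m builds its data, so the following is only an assumed construction: two interleaving noisy half-rings (the common "two moons" layout), which no straight line can separate. `crescent_dataset` and its parameters are hypothetical.

```python
import math
import random


def crescent_dataset(n_per_class=100, noise=0.1, seed=0):
    """Two interleaving crescents (half-rings) labeled 0 and 1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_per_class):
        # Upper crescent: half of the unit circle centered at (0, 0).
        t = rng.uniform(0, math.pi)
        xs.append((math.cos(t) + rng.gauss(0, noise), math.sin(t) + rng.gauss(0, noise)))
        ys.append(0)
        # Lower crescent: flipped half circle centered at (1, 0.5), so the two interlock.
        t = rng.uniform(0, math.pi)
        xs.append((1 - math.cos(t) + rng.gauss(0, noise), 0.5 - math.sin(t) + rng.gauss(0, noise)))
        ys.append(1)
    return xs, ys
```

Because each crescent wraps around the tip of the other, a linear classifier must misclassify part of one arc, while kNN only needs locally consistent neighborhoods.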

Thursday, September 4, 2008

Verifying the kNN classifier

As done for our previous classifiers, we verify our kNN classifier with the 10D Gaussian dataset (dataset explained here). The dataset is separable, as shown in the blog. The graph below shows the training and test accuracies for various k (k = 1, 5, 10, 20). The classifier behaves as expected on this dataset, which suggests it is implemented correctly.

Clustering algorithm - Complete Linkage

This is the Type 3 clustering explained in the previous post. The points were clustered at maxLinkage = 35 and maxLinkage = 35 / 2 (17.5), which reduced the SIFT points to 3338 and 5025 clusters respectively. The clusters are stored in structure arrays in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/completeLinkage35Clustering.mat
and /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/completeLinkage17.5Clustering.mat respectively.
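Complete-linkage agglomerative clustering with a maxLinkage cutoff can be sketched as follows. This is a naive O(n^3) Python sketch for illustration only, assuming Euclidean distance; `complete_linkage_clusters` is a hypothetical name. For point sets of this size a library routine such as SciPy's hierarchical `linkage` with method "complete" would be the practical choice.

```python
import math


def complete_linkage_clusters(points, max_linkage):
    """Merge clusters while the merged cluster's largest pairwise distance
    stays <= max_linkage (35 or 17.5 pixels in the post)."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Complete linkage: distance between the farthest pair across the two clusters.
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_linkage:
            break  # even the closest pair of clusters would exceed the cutoff
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Because each merge only happens when the cross-cluster farthest pair is within the cutoff, every resulting cluster has a diameter of at most `max_linkage`, matching the "not larger than the disk size" rule.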

The next step is to decide how to represent each cluster's point set by a single point that is as close as possible to the Synapse location. The strategy we are going to use is to choose the spot with the darkest neighborhood. After this operation, the histogram below shows the distribution of distances from each synapse to its nearest cluster center.

The histogram of the distances from the converged Synapse points to the nearest cluster centers is shown below.
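One way to read "choose the spot with the darkest neighborhood" is: for each cluster, keep the member point whose disk neighborhood has the lowest mean intensity. The Python sketch below assumes that reading, a disk of hypothetical `radius` pixels, and (row, col) coordinates; all function names are hypothetical. The last function computes the distances that the histograms above summarize.

```python
import math


def neighborhood_mean(image, center, radius):
    """Mean intensity of pixels within `radius` of `center` (row, col)."""
    r0, c0 = center
    total, count = 0.0, 0
    for r in range(max(0, math.floor(r0 - radius)), min(len(image), math.floor(r0 + radius) + 1)):
        for c in range(max(0, math.floor(c0 - radius)), min(len(image[0]), math.floor(c0 + radius) + 1)):
            if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2:
                total += image[r][c]
                count += 1
    return total / count if count else float("inf")


def darkest_representatives(clusters, image, radius):
    """Pick, per cluster, the member point with the darkest disk neighborhood."""
    return [min(cluster, key=lambda p: neighborhood_mean(image, p, radius)) for cluster in clusters]


def nearest_center_distances(synapses, centers):
    """Distance from each synapse point to its nearest cluster center (histogram input)."""
    return [min(math.dist(s, c) for c in centers) for s in synapses]
```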

The image below is overlaid with the original Synapse points (red '*'), the converged Synapse points (blue 'O') and the reduced cluster centers (green '*').

Monday, September 1, 2008

Clustering algorithm

To reduce the number of data points, we will cluster the SIFT points so that the kNN classifier runs faster. We will do agglomerative clustering with complete linkage. The clustering will be done so that the distance between any two key points within a cluster is less than the diameter of the disk that will be used to generate the patch.

Clustering Results:
Type 1: In this method, the full distance matrix is calculated and the closest pair of points is merged into a single point (their simple mean). Such nearest pairs are merged repeatedly until every point's nearest neighbor is at least the diameter of the disk that will be used to generate the patch. This reduced the 10114 unique converged SIFT points to 2676 points. The figure below is the histogram of the distances from each synapse point to the nearest reduced point. The set of reduced points is stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/meanClustering.mat
Type 2: The clustering mechanism is the same as above, but a weighted mean is used instead of a simple mean. Initially every point has a weight of one. When a pair of points is merged, the merged point's weight is set to the sum of the weights of the two points. This keeps merged points from drifting too far from the original points. This method resulted in 1465 points. The points are stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/weightedmeanClustering.mat
Type 3: This method uses complete linkage. The points are not merged. A new point is added to a cluster if and only if its distance from every point in the cluster is not larger than the disk size.
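Types 1 and 2 differ only in how a merged pair is averaged, so one sketch can cover both. This is a minimal Python sketch assuming Euclidean distance and tuple coordinates; `merge_nearest` and its `weighted` flag are hypothetical, not the code that produced the .mat files.

```python
import math


def merge_nearest(points, diameter, weighted=True):
    """Repeatedly merge the closest pair until all pairs are >= diameter apart.

    weighted=False: the pair is replaced by its simple mean (Type 1).
    weighted=True: each point carries a weight (initially 1); the pair is replaced
    by the weight-averaged position with the summed weight (Type 2).
    """
    pts = [(tuple(p), 1) for p in points]
    while len(pts) > 1:
        d, i, j = min((math.dist(pts[a][0], pts[b][0]), a, b)
                      for a in range(len(pts)) for b in range(a + 1, len(pts)))
        if d >= diameter:
            break  # every nearest neighbor is at least one disk diameter away
        (p, wp), (q, wq) = pts[i], pts[j]
        if weighted:
            merged = tuple((wp * x + wq * y) / (wp + wq) for x, y in zip(p, q))
        else:
            merged = tuple((x + y) / 2 for x, y in zip(p, q))
        pts[i] = (merged, wp + wq)
        del pts[j]  # j > i, so index i is unaffected
    return [p for p, _ in pts]
```

For the points (0, 0), (1, 0), (2, 0) and a large diameter, the weighted version ends at (1.0, 0.0), the true centroid of the three originals, while the simple mean drifts to (1.25, 0.0); this is the drift that the Type 2 weighting avoids.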

Convergence of SIFT points

The convergence of the key points to the converged Synapses can be understood from the graphs below. The first set of graphs shows histograms of the distances between SIFT points and ground-truth synapse points when the weighting function is a Gaussian (weighted most heavily at the centroid).


The next graph shows the distance histogram when the weighting function is uniform over all pixels of the circular patch.
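The post compares a Gaussian and a uniform weighting function but does not spell out the convergence update, so the following Python sketch rests on an assumption: each point moves iteratively to the weighted centroid of its circular patch, where a pixel's weight is the spatial weighting function times its darkness (peak intensity minus pixel intensity). `converge_point`, `sigma`, `max_iter` and `tol` are hypothetical.

```python
import math


def converge_point(image, start, radius, kernel="gaussian", sigma=None, max_iter=50, tol=0.01):
    """Move `start` (row, col) to the darkness-weighted centroid of its circular patch
    until it stops moving. kernel="gaussian" weights the patch center most heavily;
    any other value gives uniform spatial weights."""
    r0, c0 = start
    sigma = sigma or radius / 2.0
    peak = max(max(row) for row in image)
    for _ in range(max_iter):
        sw = sr = sc = 0.0
        for r in range(max(0, math.floor(r0 - radius)), min(len(image), math.floor(r0 + radius) + 1)):
            for c in range(max(0, math.floor(c0 - radius)), min(len(image[0]), math.floor(c0 + radius) + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                if d2 > radius ** 2:
                    continue  # outside the circular patch
                spatial = math.exp(-d2 / (2 * sigma ** 2)) if kernel == "gaussian" else 1.0
                w = spatial * (peak - image[r][c])  # darker pixels pull harder (assumption)
                sw += w
                sr += w * r
                sc += w * c
        if sw == 0:
            break  # uniformly bright patch: nothing to converge toward
        nr, nc = sr / sw, sc / sw
        moved = math.hypot(nr - r0, nc - c0)
        r0, c0 = nr, nc
        if moved < tol:
            break
    return r0, c0
```

Under this assumption the Gaussian kernel keeps a point anchored to dark structure near its current position, while the uniform kernel lets any dark pixel in the patch pull equally, which is one way the two histograms could differ.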

The plot below shows the SIFT key points generated on an image smoothed with the Perona-Malik method.

The points are really close to one another. We can probably run a clustering algorithm (here and here) to decrease the number of SIFT key points to be analyzed. This will also make the kNN classification faster.