Wednesday, November 26, 2008

GentleBoost Classifier on Membrane detection

The feature set is a set of Gabor filter responses (nFeatures = 1080). The membrane dataset is /usr/sci/crcnsdata/CRCNS/Synapses/data/CellMembraneDetection/stom-002.png. The membrane markup is done in the stom-002_clahe_diff_thresh_concomp_thin_edit.png file. The membranes are marked black and non-membranes are marked white. The membrane is used as the positive example; the markup is then eroded (thickening the black membrane) and the area that remains white is used as the negative example. The band between the membranes and the non-membrane area is not considered during training. The ground truth data looks like the following image.
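
A minimal MATLAB sketch of how such a training mask could be built (the structuring-element radius is a guess, since the actual margin width is not stated):

    % Hypothetical sketch: membranes are black (0) in the markup image.
    markup   = imread('stom-002_clahe_diff_thresh_concomp_thin_edit.png');
    membrane = markup == 0;                          % positive examples
    % Eroding the white area (equivalently, dilating the membrane) creates
    % the buffer band that is excluded from training.
    buffer   = imdilate(membrane, strel('disk', 5)); % radius 5 is a guess
    negative = ~buffer;                              % remaining white area
    ignored  = buffer & ~membrane;                   % band left untrained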

The Gabor filter response generator is run on the image and 1080 features are generated. One of the feature images is shown below.



The ROC curves for various stages of boosting on one of the folds are shown below.


Sunday, October 26, 2008

The new gentle boost classifier

After the kNN classifier experiment, we will be trying a new classifier inspired by one of the posters at MIABB 2008 - Mitochondria detection in electron microscopy images. The salient features of this classification mechanism are the following:
  1. Pre-processing: The image is histogram equalized and median filtered. Histogram equalization will help reduce intensity variations across the image and median filtering will decrease the salt & pepper noise in the image.
  2. Features: Histograms and Gabor filter responses for multiple window sizes and multiple Gabor filter frequencies are used as features for the classifier.
  3. Ground truth markup: The entire synapse region is marked up for the experiment and all the marked-up pixels will be used as positive examples for training the classifier.
  4. Learning Algorithm: The learning algorithm uses GentleBoost (a variation of AdaBoost) to train the classifier. The weak classifier used is not specified in the poster; a sketch of the boosting loop follows this list.
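
Since the poster does not specify the weak classifier, the following is a minimal sketch of a GentleBoost training loop, assuming regression stumps as the weak learners; the single-threshold-per-feature shortcut and all names are illustrative.

    function F = gentleBoostTrain(X, y, T)
    % Minimal GentleBoost sketch. X: nExamples x nFeatures, y: labels in
    % {-1,+1}, T: number of boosting rounds. Classify with sign(F).
    [n, d] = size(X);
    w = ones(n, 1) / n;                      % example weights
    F = zeros(n, 1);                         % additive model output
    for t = 1:T
        bestErr = inf;
        for j = 1:d
            thr = median(X(:, j));           % one candidate threshold (toy)
            idx = X(:, j) > thr;
            % weighted least-squares fit of a two-level stump
            a = sum(w(idx)  .* y(idx))  / max(sum(w(idx)),  eps);
            b = sum(w(~idx) .* y(~idx)) / max(sum(w(~idx)), eps);
            f = b * ones(n, 1);  f(idx) = a;
            err = sum(w .* (y - f).^2);
            if err < bestErr, bestErr = err; bestF = f; end
        end
        F = F + bestF;                       % add the best weak learner
        w = w .* exp(-y .* bestF);           % GentleBoost weight update
        w = w / sum(w);
    end
    end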

Friday, October 17, 2008

kNN with clustered test points

The setup for this experiment is as follows:
  1. The data is divided into 4 folds (the 4 quadrants of the image)
  2. Positive training examples are all the SIFT key points that are within 10 pixels of the converged synapse markup points.
  3. Negative training examples are all the SIFT key points that are farther than 27.2 pixels from the converged synapse markup points. These points are clustered so that the training data is balanced. (A sketch of this selection follows the list.)
  4. Test points are all the converged SIFT points in that quadrant. The test points are clustered.
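
A sketch of this selection, assuming siftPts and markupPts are N x 2 coordinate arrays (names are illustrative; pdist2 computes all pairwise distances):

    D = pdist2(siftPts, markupPts);          % distances to all markup points
    dMin = min(D, [], 2);                    % distance to the nearest markup
    positives = siftPts(dMin <= 10,   :);    % within 10 pixels
    negatives = siftPts(dMin >  27.2, :);    % farther than 27.2 pixels
    % the negatives are then clustered to balance the training data
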
Observations on why this experiment could fail:
  1. There are certain markups that are very closely placed, but because of clustering the entire region might have only one representative point, which could be closer to either markup. There are multiple markups (151/468) that are closer than 27.5 pixels (the disk radius) to each other. The histogram below shows the distance between a markup and the next closest markup point. (Note: The histogram shows only the distances for markups that are separated by less than 50 pixels [214/468], not all of them.)
  2. After choosing a representative point from each cluster, the histogram of distances between the markup points and the nearest test points is shown below. It is very clear that after choosing representative points by this method, we don't even have test points near the actual markup neighborhoods.
Thus this method of clustering will not generate appropriate representative test points.

Monday, October 13, 2008

kNN with Unclustered Test points

In this experiment the image is divided into 4 quadrants.
Test Data: The SIFT key points in each quadrant are used as test points.
Training Data: The SIFT key points from the other quadrants are used to train the classifier. The negative examples are clustered so that the dataset is more balanced.
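
A sketch of the quadrant-based folds, assuming pts holds (x, y) key point coordinates and [h, w] is the image size (all names are illustrative):

    quad = 1 + (pts(:,1) > w/2) + 2*(pts(:,2) > h/2);  % quadrant index 1..4
    for q = 1:4
        testPts  = pts(quad == q, :);        % held-out quadrant
        trainPts = pts(quad ~= q, :);        % remaining three quadrants
        % ... cluster the negatives in trainPts, then train and test kNN ...
    end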

The statistics for the points in each fold are:
Fold 1
Training points Total = 5027, Positive points = 2119
Test points Total = 28216, Positive points = 848
Test Result: Positive fold

Wednesday, October 8, 2008

Another kNN setup

In this experiment the image is divided into 4 quadrants; the SIFT key points in each quadrant are used as test points and the SIFT key points from the other quadrants are used to train the classifier.

The statistics for the points in each fold are:
Fold 1
Total points = 1797, Positive points = 848
Fold 2
Total points = 1831, Positive points = 877
Fold 3
Total points = 1499, Positive points = 486
Fold 4
Total points = 1732, Positive points = 756

The definition of FP_rate has been changed; the other definitions remain the same:
#TP = number of ground truth positives (synapses marked by the Mark lab) with at least one marking done by the classifier within some radius (10 pixels).
#GTP = number of ground truth points (synapses marked by the Mark lab)
=> TP_rate = #TP / #GTP
#FP = number of positions marked by the classifier - #TP.
=> FP_rate = #FP / (#nSIFTpoints - #GTP)
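
A sketch of how these rates could be computed (detections, gtPts, and nSIFTpoints are hypothetical names for the classifier markings, ground truth points, and SIFT point count):

    radius = 10;
    D = pdist2(gtPts, detections);           % ground-truth-to-marking distances
    TP = sum(any(D <= radius, 2));           % ground truth points with >= 1 hit
    TP_rate = TP / size(gtPts, 1);           % #TP / #GTP
    FP = size(detections, 1) - TP;
    FP_rate = FP / (nSIFTpoints - size(gtPts, 1));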

Monday, October 6, 2008

ROC Curve

On visual inspection the results actually looked better than the confusion matrix values suggested. The reason was that there were multiple SIFT points near each expert markup, and of those multiple points only a few were detected (like 2 of 10). That is why the true positive rate was so low. So instead of taking the true positive count of the kNN classifier directly, we use the definition described below.

#TP = number of ground truth positives (synapses marked by the Mark lab) with at least one marking done by the classifier within some radius (10 pixels).
#GTP = number of ground truth points (synapses marked by the Mark lab)
=> TP_rate = #TP / #GTP
#FP = number of positions marked by the classifier - #TP.
=> FP_rate = #FP / #{markings by classifier}This was the definition for the true positive rate and the false negative rate. Using this definition the ROC curve was done for different value of k (1,3,5,..51) in KNN classifier and different weights for the positive ones(1.0, 1.1,1.2...2.0). The figure below is one such example.The weird looking graph plotted for all ks :)

Wednesday, October 1, 2008

New kNN setup

In the new kNN experiment setup we will be using the old dataset, since it has many more examples than the dataset marked by Dr. Marc. There will be a 5-fold validation of the kNN classifier. In this classifier the disk size used is 55 rather than the 35 used in the previous experiment.

Step 1: Generating SIFT key points
Number of SIFT key points = 447947

Step 2: Converging the SIFT key points
After this filtering the number of points = 24944
After converging the points the number of unique points = 21844

Step 3: Clustering the points
Separation distance = 27.5 pixels
Number of clusters identified = 8395

Step 4: Positive & Negative examples : Separate Clustering
Separate clustering of points was done for positive and negative examples
Positive SIFT points are the ones less than 10 pixels away from the converged Synapses. (Positives = 608, Clusters = 336, ClustersFromConvergedSynapses = 377)
Negative SIFT points are the ones more than 55 pixels away from the converged Synapses. (Negatives = 17688, Clusters = 6198)
One observation is that the number of clusters identified depends on the first point chosen.
For the negative data set the points are clustered and representative points are taken, but for the positives we will use the entire set, because clustering them halves the number of positive examples and would worsen the skew of the data set from (1:10) to (1:20).
The data points are stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kmeans/PositiveNegSIFTPoints.mat
To add more twist to the tale, the clusterToPoint2 reduction for the negative examples ends up with 5784 points. On repeating the same procedure 3 times the reduction stops and the number of points is 4625.

Step 5: Set up for 5-fold validation
Positive points (uniqueConvergedSIFTpointsLT10) = 608 and negative points (NegativesClusterCenters4) = 4625, from the PositiveNegSIFTPoints.mat file.

The function /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kmeans/createFolds.m creates the folds for the data.
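
The createFolds.m implementation itself is not reproduced here; a minimal sketch of such a split (without class stratification, which the real code may well do) could look like:

    k = 5;
    perm = randperm(nExamples);                       % shuffle the examples
    foldId = mod(0:nExamples-1, k) + 1;               % round-robin assignment
    folds = arrayfun(@(f) perm(foldId == f), 1:k, 'UniformOutput', false);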

Step 6: Circular Region Extraction
The entire setup for the experiment is done by the function "generateFeatures('synapse1-5fold')". The testing for the individual folds can be done by running the scripts foldXXRun (XX = 1,2,3,4,5). The results of the individual fold runs are found in foldXXResults.

Test Results:
A quantitative examination of the results is shown in the bar graph of confusion matrix entries for all folds.
Qualitative analysis:
For the qualitative analysis the test patches for the 5 folds have been extracted and stored in an image.

Thursday, September 25, 2008

kNN Various Stages Explained - Training Phase

Training Phase:

Training Image is /usr/sci/crcnsdata/CRCNS/Synapses/data/Refined2_marked_RM2_with_fake/Layer1_0_0_card_resize_p25.tif
Size of the image is 4590 x 2869 (downSampled 4x4 from original)

Step 1: Generating SIFT key points
We generate the SIFT key points for this image using /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/sift/sift.m
Time taken : 6 minutes
Number of SIFT key points = 359265

Step 2: Converging the SIFT key points
Key points in the brighter part of the image and at the borders are filtered out, and the rest are converged. These operations are done using /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kNN/CentroidCalc2.m
Time taken : 5 minutes
After this filtering the number of points = 69627
Unique number of SIFT points = 56521
After converging the points the number of unique points = 56467

Step 3: Clustering the points
The new cluster-center initialization method is used. The algorithm picks the first cluster center randomly and then chooses each next center as the point farthest from the already identified cluster centers. The algorithm stops when the farthest remaining point is closer than the specified separation to one of the identified cluster centers. This ensures that no point is farther than the specified separation distance from a cluster center. The algorithm is in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kmeans/kmeans.m
Time taken : 87 seconds
Separation distance = 17.5 pixels
Number of clusters identified = 8395
The 8395 identified cluster points are shown below.
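
A sketch of the farthest-point procedure described above (pts is an N x 2 coordinate array, sep the separation distance; names are illustrative):

    function centers = farthestPointClusters(pts, sep)
    centers = pts(randi(size(pts, 1)), :);   % first center chosen at random
    dMin = pdist2(pts, centers);             % distance to the nearest center
    while true
        [dMax, idx] = max(dMin);
        if dMax < sep, break; end            % all points within sep of a center
        centers = [centers; pts(idx, :)];    % farthest point becomes a center
        dMin = min(dMin, pdist2(pts, pts(idx, :)));
    end
    end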

Wednesday, September 17, 2008

kNN Results

Experiment Setup:
Ground truth:
  • Dr. Marc's dataset
  • Positive examples were converged SIFT points less than 45 pixels from the converged Synapse points.
  • Negative examples were converged SIFT points > 45 pixels from the converged Synapse markup points (all points were taken; the dataset is skewed)
Test Data:
  • First set of markup images

Results:
52 synapses were identified in the experiment. A few of the results are shown below. These pictures are screenshots from the QT-based rudimentary Synapse Viewer in Linux. The yellow markups are the predicted ones and the red ones are the user-marked ones (non-converged).



P.S: Finally some results after a few bad days!! A Kraken reboot killed an experiment!!

Monday, September 8, 2008

kNN analysis

Yesterday's learning was killed because it would have taken 10+ days for the kNN classifier to be trained using 5-fold validation. Now all the positive examples and negative examples are used as training examples. The 5-NN classifier has started running but it looks like it will take a looong time (~6 days). Here I tried to see if the region matching was done. The figures below show the various 5 nearest neighbors for test patches.

Sunday, September 7, 2008

The modified experiment starts...

The modifications suggested here have been implemented except for the C implementation for distance calculation. The dataset generation code is in the following location /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/featureGeneration/generateFeatures.m

The experiment is being run for k = 1,3,5,7,9,11,15,19. As in the previous experiment, 5-fold validation is being done. I think this is going to run for a real long time because of the large number of negative regions & their rotations. I guess in the long run the negative examples should be clustered together so that only a few are left at test time.

Friday, September 5, 2008

Verifying the kNN classifier - Crescent dataset

The kNN classifier was trained and tested on the crescent dataset. This dataset was chosen because it is not separable by a linear classifier. The dataset is generated by the /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/utilities/generateCresentDataset.m MATLAB script. The dataset looks like the one below. The following shows the plot of kNN training & testing accuracies by k on such a dataset.
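
The exact parameters of generateCresentDataset.m were not recorded, but a two-crescent (half-moon) dataset of this kind can be sketched as:

    n = 200; r = 1; noise = 0.1;                     % illustrative values
    t = pi * rand(n, 1);                             % angles on a half circle
    pos = [r*cos(t),      r*sin(t)       ] + noise*randn(n, 2);
    neg = [r*cos(t) + r, -r*sin(t) - r/2 ] + noise*randn(n, 2);
    X = [pos; neg];  y = [ones(n, 1); -ones(n, 1)];  % interleaved crescents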

Thursday, September 4, 2008

Verifying the kNN classifier

As done for our previous classifiers, we will verify our kNN with the 10D Gaussian dataset (dataset explained here). The dataset is separable, as shown in the blog. The plots of training and test accuracies for various k (k = 1, 5, 10, 20) are shown in the graph below. Thus the classifier is correct.

Clustering algorithm - Complete Linkage

This is the type 3 clustering explained in the previous blog. The points were clustered at maxLinkage = 35 and maxLinkage = 35 / 2 respectively, which reduced the SIFT points to 3338 and 5025 respectively. The clusters are stored in structure arrays in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/completeLinkage35Clustering.mat
and /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/completeLinkage17.5Clustering.mat respectively.

The next step is to formulate how to represent this point set so that it is closest to the synapse location. The strategy that we are going to use is to choose the spot with the darkest neighborhood. After such an operation, the histogram below shows the distribution of the distance from each synapse to its nearest cluster center.
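
A sketch of the darkest-neighborhood choice, assuming img is the grayscale image and members is an M x 2 list of (row, col) cluster points (the 11 x 11 window size is a guess):

    localMean = imfilter(double(img), fspecial('average', 11), 'replicate');
    vals = localMean(sub2ind(size(img), members(:, 1), members(:, 2)));
    [~, best] = min(vals);                   % darkest local neighborhood wins
    representative = members(best, :);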

The histogram of the distances of the Converged Synapse points to the nearest cluster centers is shown below

The image below is overlaid with the original Synapse points (red '*'), converged Synapse points (blue 'O') and reduced cluster centers (green '*')

Monday, September 1, 2008

Clustering algorithm

In the interest of data point reduction we will cluster the SIFT points so that we can make the kNN classifier run faster. We will do an agglomerative clustering with complete linkage. The clustering will be done so that the distance between the key points within a cluster is less than the disk diameter that is going to be used to generate the patch.

Clustering Results:
Type 1: In this method of clustering the full distance matrix is calculated and the pair of points with the least distance is merged into a single point. Such nearest pairs are merged until every point's nearest neighbor is at least the diameter of the disk that is going to be used to generate the patch. After such a reduction, the 10114 unique converged SIFT points were reduced to 2676 points. The figure below is the histogram of the distances between each synapse point and the nearest such SIFT point. The set of reduced points is stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/meanClustering.mat
Type 2: The clustering mechanism is the same as the above method, but instead of a simple mean a weighted mean is calculated. Initially all points start with an equal weight of one. Once a pair of points is merged, the weight of the new point is the sum of the weights of the merged points. This avoids merged points drifting too far from the original points. This method resulted in 1465 points. The points are stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/clustering/weightedmeanClustering.mat
Type 3: In this method the linkage is complete. The points are not merged; a new point is added to a cluster if and only if its distance from all the points in the cluster is not larger than the disk size.
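
A minimal sketch of the Type 2 merge loop (holding all weights at one reduces it to Type 1's simple-mean merging); pts is N x 2, diskDiam the stopping distance:

    w = ones(size(pts, 1), 1);
    while true
        D = pdist2(pts, pts);
        D(1:size(D, 1) + 1:end) = inf;       % ignore self-distances
        [dMin, k] = min(D(:));
        if dMin >= diskDiam, break; end      % nearest pair far enough apart
        [i, j] = ind2sub(size(D), k);
        pts(i, :) = (w(i)*pts(i, :) + w(j)*pts(j, :)) / (w(i) + w(j));
        w(i) = w(i) + w(j);                  % merged point carries both weights
        pts(j, :) = [];  w(j) = [];          % drop the merged-away point
    end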

Convergence of SIFT points

The convergence of the key points to the converged synapses can be understood from the graphs below. The first set of graphs shows the histograms of distances between SIFT points and ground truth synapse points when the weighting function is a Gaussian (weighted most at the centroid).


The next graph shows the distance histogram when the weighting function is equal on all pixels of the circular patch.

The plot of SIFT key points generated over a Perona-Malik smoothed image is shown below.

The points are really close to one another. Probably we can run a clustering algorithm (here and here) that will decrease the number of SIFT key points to be analyzed. This will also help in making the kNN classification algorithm faster.

Friday, August 29, 2008

kNN Results

Experiment Setup: The 400-odd examples were used as positive examples and the classifier was run on the image marked up by Dr. Marc. 13248 examples, generated by rotating the converged synapses, were taken as positive examples. Randomly selected other patches (14400) were taken as negative examples. A k = 5 nearest neighbor classifier was used. The test data was the drifted SIFT points generated by the SIFT algorithm.

Results: In total 98.94% of the SIFT points were identified as synapses. That is a whopping 49040 (of 49567) points in place of the 65 expected :(.

Modifications for the next experiment:
  1. Ground truth data: Dr. Marc's dataset will be used as the training dataset and the first dataset as the testing dataset, as per Antonio's recommendation.
  2. Convergence of centroids: Need to understand whether the centroid is converging to the darkest or the largest dark patch.
  3. Pick tougher negative examples: The negative examples selected last time were random locations. This time they will be the drifted SIFT points farther than (2 * diskSize) away.
  4. Performance enhancement: We need C code for distance calculation, interfaced with Matlab.

Thursday, August 28, 2008

kNN Results

The kNN experiment was run for values of k = 1,2,3,4,5,10,15,20 and surprisingly all of them got an accuracy of 1. Weird!!! The result files are stored in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kNN/knnsynapse-resultsX.mat where X = the k value of the classifier. I am skeptical about the results. I will review the code again, generate SIFT points for image 2 marked up by Dr. Marc, and test the results on it.

Tuesday, August 26, 2008

Gabor Filter Banks

Gabor Filter: The Gabor filter is a linear filter. It is a harmonic (sinusoidal) function multiplied by a Gaussian envelope function.

Gabor = Harmonic function X 2D Gaussian envelope.

The harmonic function has two attributes, the wavelength and the phase; the Gaussian envelope has sigma_x, sigma_y, and theta, the orientation of the ellipsoidal Gaussian. When constructing a filter bank we need to construct many Gabor filters of various wavelengths and orientations. The Gabor filters can be generated by the code found in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/textureClassification/gabor_fn.m
The image below shows a sample of these filters.

The Gabor filter response looks like that of an edge detection filter. The images below are the original & the Gabor filter bank response. The filter bank is constructed for 20 different orientations. The max response of a pixel across the filter bank is chosen as the filter bank response.
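
A sketch of a Gabor kernel in the spirit of gabor_fn.m (the actual code is not reproduced here), followed by the max-response pooling just described; all parameter values are illustrative:

    function g = gaborKernel(sigmaX, sigmaY, theta, lambda, psi)
    % Harmonic function times an oriented 2D Gaussian envelope.
    r = ceil(3 * max(sigmaX, sigmaY));
    [x, y] = meshgrid(-r:r);
    xr =  x*cos(theta) + y*sin(theta);       % rotate into the filter frame
    yr = -x*sin(theta) + y*cos(theta);
    g = exp(-(xr.^2/(2*sigmaX^2) + yr.^2/(2*sigmaY^2))) ...
        .* cos(2*pi*xr/lambda + psi);
    end

    % Bank of 20 orientations; the per-pixel max is the bank response.
    resp = -inf(size(img));
    for theta = (0:19) * pi / 20
        g = gaborKernel(3, 3, theta, 8, 0);  % example parameters
        resp = max(resp, imfilter(double(img), g, 'replicate'));
    end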



Synapse Status: As the kNN classifier continues to run, I am going through the various filtering approaches to texture classification as described here. This blog post talks about Gabor filters in general.

Thursday, August 21, 2008

Porting code to C++

Since the crash of the hex, I have been porting my Matlab code to C++. I have been experimenting with Qt, Xerces and ITK.

Monday, August 18, 2008

Texture Classification: Signal Processing: Heuristically designed filters

Laws Filter Masks
The Laws filters try to identify level, edge, spot, wave, and ripple features in the image. They provide 25 features, which are combinations of filters that detect the above features in the image. The code to generate the Laws features can be found in the following file
/usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/textureClassification/generateLawsFeatures.m
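
For reference, the 25 masks are the outer products of the classic 1-D Laws vectors (L = level, E = edge, S = spot, W = wave, R = ripple); a sketch:

    V = [ 1  4  6  4  1;    % L5: level
         -1 -2  0  2  1;    % E5: edge
         -1  0  2  0 -1;    % S5: spot
         -1  2  0 -2  1;    % W5: wave
          1 -4  6 -4  1];   % R5: ripple
    masks = cell(5);
    for i = 1:5
        for j = 1:5
            masks{i, j} = V(i, :)' * V(j, :);   % outer product -> 5x5 mask
        end
    end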


References:
  1. http://www.c3.lanl.gov/~kelly/notebook/laws.shtml

Texture Classification

To add features that could be used for synapse classification, we will be looking into methods used for texture classification. Texture classification is one of the important components of texture analysis. The other components of texture analysis are texture segmentation, texture synthesis, and shape identification from texture.

Texture classification is done by different types of techniques. The major types are the following:
  1. Statistical
    1. Co-occurrence
    2. Angular Second Moment
    3. Contrast
    4. Correlation
    5. Entropy
  2. Geometrical
  3. Structural
  4. Model-based
    1. Multi-resolution autoregressive
  5. Signal processing features
    1. Heuristically designed filter banks
      1. Laws Filter Masks
      2. Ring and Wedge Filters
      3. Dyadic Gabor Filter bank
      4. Wavelet Transforms, Packets and Frames
      5. Discrete Cosine Transform
      6. Quadrature Mirror Filters
      7. Tree structured Gabor Filter bank
    2. Optimized filter and filter banks
      1. Eigenfilter
      2. Prediction error filter
      3. Optimized representation Gabor Filter bank
      4. Optimized Two-Class Gabor Filter
      5. Optimized Multi-Class Gabor Filter
      6. Optimized Two-Texture FIR Filters
      7. Optimized FIR Filter bank
      8. Back Propagation Designed Mask
References:
  1. Filtering for Texture Classification: A Comparative Study
  2. http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OJALA1/texclas.htm

Friday, August 15, 2008

Animated GIFs showing rotated Synapses regions

I created code that generates animated GIFs in MATLAB. The code can be found in the following location
/usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/kNN/CreateRotatedRegionGIFs.m

The animated GIFs of the rotating synapses were created successfully, but it looks like animated GIFs don't work in the blog :( .

Thursday, August 14, 2008

Orientation of Synapse: From Image moments

In this method we will calculate the orientation of the synapse from the image moments. Here the moments are calculated from the gray scale image. The method is described at the following location in Wikipedia. The code for this method and the previous method is in /usr/sci/crcnsdata/CRCNS/Synapses/Code/Matlab/orientation/getOrientation.m.
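
The computation follows the standard central-moment formula; a sketch (I is the grayscale patch, names are illustrative):

    [X, Y] = meshgrid(1:size(I, 2), 1:size(I, 1));
    I = double(I);  m00 = sum(I(:));
    xc = sum(X(:) .* I(:)) / m00;            % intensity-weighted centroid
    yc = sum(Y(:) .* I(:)) / m00;
    mu20 = sum((X(:) - xc).^2 .* I(:)) / m00;
    mu02 = sum((Y(:) - yc).^2 .* I(:)) / m00;
    mu11 = sum((X(:) - xc) .* (Y(:) - yc) .* I(:)) / m00;
    theta = 0.5 * atan2(2 * mu11, mu20 - mu02);   % orientation in radians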

The result for this is almost the same as for the previous method.

Orientation of Synapse

Apart from predicting the location of the synapse we will also predict the orientation of the synapse. We will be trying out the following methods to determine the orientation:
  1. Fitting an ellipse
  2. Image moments
  3. Gradient image analysis
  4. Image processing toolbox
As per the previous month's meeting discussions, apart from determining the orientation of the synapse, the direction of the synapse also has to be determined. The clue that we have is that the pre-synaptic densities bend the membrane into a concave shape (viewed from the receptor neuron). Another visual clue is that the pre-synaptic density of vesicles makes the transmitter part of the synapse appear blobbier.

The easiest one to try quickly is the one in the image processing toolbox. The algorithm for determining the angle is the following (a sketch follows the list).
  1. For the circular patch, threshold values below the gray value that is at the 50th percentile point.
  2. For the resulting binary image, choose the largest patch and determine the orientation of the patch from the "regionprops" command.
  3. The orientation of the synapse is perpendicular to the orientation of the region.
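
A sketch of these steps, assuming patch is the grayscale circular patch (the follow-up paragraph binarizes with the opposite polarity, so which half is kept is an assumption here):

    bw = patch < median(double(patch(:)));            % keep the darker half
    cc = bwconncomp(bw);
    [~, big] = max(cellfun(@numel, cc.PixelIdxList)); % largest patch
    mask = false(size(bw));  mask(cc.PixelIdxList{big}) = true;
    stats = regionprops(mask, 'Orientation');         % degrees, CCW from x-axis
    synapseAngle = stats.Orientation + 90;            % perpendicular to membrane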


The results for this method are shown below. The input image is shown to the left. The input image has been downsampled 25% and smoothed by Perona-Malik smoothing for 25 iterations. The grey values of this image have been ordered, the 50th percentile grey value is calculated, and all grey values above it are set to "1" while the rest are set to "0".

The result of the above thresholding is shown to the left here. The region property of orientation is determined, which is likely the orientation of the membrane. The orientation of the synapse is perpendicular to that of the membrane. Hence the orientation of the synapse is +15.7 degrees from the positive X-axis.