The below graph shows the speed up results for a convolution neural network, The network has 3 hidden layers and 4 nodes per hidden layer. The output layer is 1 pixel/voxel in size.
This implementation executes spacial convolutions. It doesn't use shared memory or texture memory hence there are large latencies involved. The speed up is not significant for the smaller kernels because there are not enough operations to parallelize because of the high modularity of the CUDA code.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment