As an exercise, I've written a fully connected neural network in C++ and am testing it on MNIST. The network is 784x512x512x10.
I'm looking for the fastest reasonable CPU I can run this on.
I've tried an i9-9900K with 4x 16GB RAM:
1 thread: 3.8k training samples/second
15 threads: 24k training samples/second
I've also tried 2x Xeon Gold 6130 (16-core) CPUs with 4x 32GB RAM:
1 thread: 1.8k training samples/second
63 threads: 26k training samples/second
My feeling is that memory bandwidth is the bottleneck, given how far from linear the throughput scales with thread count.
What else should I try this on to reach better performance than the i9?