
Clojure: Going Faster Than TensorFlow on the GPU (GTX 1080Ti)

You can adopt a pet function!
Support my work on my Patreon page, and access my dedicated discussion server. Can’t afford to donate? Ask for a free invite.

November 2, 2020


New books are available for subscription.




A few weeks ago, I showed you how simple Clojure's
Deep Diamond() is, even compared to Keras. I also mentioned
that it's superfast. Here's how fast it is on the GPU!

TL;DR Much faster than Keras+TensorFlow on the GPU, too!

In the previous article, we compared the libraries only on the CPU.
Deep Diamond was considerably faster: 368 seconds vs. 509 seconds. Most readers were intrigued
but, being skeptical as they should be, complained that CPU performance doesn't matter
anyway, since everybody uses GPUs for training convolutional networks.
Let's do the GPU comparison, then.

Both Deep Diamond and Keras with TensorFlow use Nvidia's cuDNN low-level performance
library under the hood, so any difference is due to the higher-level implementation.

Deep Diamond completes this training in 21 seconds, while Keras + TensorFlow takes 35 seconds.
The gap has even widened in favor of Deep Diamond: the ratio is now 1.67 (35/21), up from 1.38 (509/368) on the CPU.

Keras CNN in Python

I repeat the relevant model code for reference. We're
interested in the running time of model.fit, with minimal verbosity,
for 12 epochs, on an Nvidia GTX 1080Ti GPU. The Keras code is taken from the official Keras examples.

# Assumed imports for this listing (model, layers, optimizer, and timing):
import time
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

# num_classes, x_train, and y_train come from the MNIST preparation (see the sketch below).
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=Adam(learning_rate=0.01),
              metrics=['accuracy'])

# Time model.fit and convert nanoseconds to seconds.
s = time.time_ns()
model.fit(x_train, y_train,
          batch_size=128,
          verbose=2,
          epochs=12)
e = time.time_ns()
print((e - s) / (10 ** 9), " seconds")
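The listing above references num_classes, x_train, and y_train without defining them. Here is a minimal preparation sketch in the spirit of the official Keras MNIST example; these exact lines are my assumption and are not part of the measured code.

# Hypothetical MNIST preparation, along the lines of the official Keras example.
# Only model.fit above is timed, so this part does not affect the benchmark.
from keras.datasets import mnist
from keras.utils import to_categorical

num_classes = 10
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255  # NHWC, scaled to [0, 1]
y_train = to_categorical(y_train, num_classes)                    # one-hot labels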

Deep Diamond CNN in Clojure

In Clojure, we’re measuring the runtime of the train function.

;; Assumed requires (from Deep Diamond; not shown in the original listing):
(require '[uncomplicate.diamond.tensor :refer [desc]]
         '[uncomplicate.diamond.dnn :refer [network convo pooling dropout dense init! train]])

;; Network blueprint: 128x1x28x28 (NCHW) input, then the same layers as the Keras model.
(defonce net-bp
  (network (desc [128 1 28 28] :float :nchw)
           [(convo [32] [3 3] :relu)
            (convo [64] [3 3] :relu)
            (pooling [2 2] :max)
            (dropout)
            (dense [128] :relu)
            (dropout)
            (dense [10] :softmax)]))

;; Initialize the network with the Adam optimizer, then time the training.
(defonce net (init! (net-bp :adam)))

(time (train net train-images y-train :crossentropy 12 []))

The books

The book Deep Learning for Programmers: An Interactive Tutorial with
CUDA, OpenCL, DNNL, Java, and Clojure
teaches the nuts and bolts of neural networks and deep learning
by showing you how Deep Diamond is built, from scratch, in interactive sessions. Each line of code
can be executed and the results inspected in the plain Clojure REPL. The best way to master something is to build
it yourself!

It's simple, but fast and powerful!

Please subscribe, read the drafts, get the full book soon, and support my work on this free open source library.
