Arvados 2.4 now supports using GPUs, specifically the NVIDIA CUDA platform. In this blog post, we will show an example of running a TensorFlow machine learning demo on Arvados.
About the GPU Demo
We will be running a demo that uses TensorFlow. TensorFlow is a
free and open-source software platform used for machine learning and
artificial intelligence with a focus on neural networks. This
TensorFlow demo trains a neural network to create a classifier that
identifies handwritten digits. It uses data from a famous dataset,
the MNIST handwritten digit dataset,
which is a large database of images of handwritten digits. It is a
classic “real-world” dataset used by those wishing to learn more about
machine learning. Each image is stored as 28x28 pixels (a 28x28 matrix
of grayscale values).
The demo consists of four steps:
Load and Preprocess a Training Set: This step loads the handwritten digit data, normalizes the pixel values by dividing by 255, and converts the integers to double-precision floating-point numbers. The loaded data is already divided into a training set and a test set.
Build a Neural Network: A sequential model is used to create the neural network. The model takes an image (28 x 28) as input and returns a score for each of the ten classes (the digits 0-9), from which the probability that the image belongs to each class can be derived.
Train the Neural Network: The model is trained (i.e. the model parameters are adjusted to minimize the chosen loss function) using the training data.
Evaluate Accuracy of the Neural Network: The model is then tested on the test set. This means the model is run to classify images that were not used to train it. The accuracy is then calculated from how often the model predicts the actual digit in these new images. The classifier is found to have ~98% accuracy on the test set.
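The arithmetic behind the first and last steps can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not the demo's actual code: the helper names `normalize_image` and `accuracy` and the tiny sample values are hypothetical, and it assumes 8-bit grayscale pixels in the range 0-255.

```python
def normalize_image(image):
    """Scale 8-bit grayscale pixel values (0-255) to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical 2x2 "image" standing in for a 28x28 MNIST digit.
tiny_image = [[0, 255], [51, 102]]
normalized = normalize_image(tiny_image)

# Hypothetical predicted and true digits for five held-out test images.
test_accuracy = accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])

print(normalized)
print(test_accuracy)
```

In the real demo, TensorFlow performs both operations on whole arrays at once; the sketch only shows what is being computed.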
The final output is a trained and tested neural network model that can identify digits in new, unclassified images. Applying the model to novel input produces the scores used to make a classification decision:
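predictions = model(input)
This gives the log-odds (logits) for each possible digit classification.
Once the log-odds are converted to an array of probabilities (for example, with a softmax function), one can determine a classification by taking the digit with the highest probability, provided that probability exceeds a chosen threshold.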
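As a rough illustration of that last step, the sketch below converts log-odds to probabilities with a softmax and then picks the most likely digit. It is plain Python rather than the demo's TensorFlow code; the function names, the 0.5 threshold, and the example scores are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert log-odds scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (digit, probability), or (None, probability) if below threshold."""
    probabilities = softmax(logits)
    best_digit = max(range(len(probabilities)), key=probabilities.__getitem__)
    best_probability = probabilities[best_digit]
    if best_probability < threshold:
        return None, best_probability
    return best_digit, best_probability


# Hypothetical log-odds for the digits 0-9; index 3 has the largest score.
example_logits = [-1.2, 0.3, 0.1, 4.5, -0.7, 0.0, -2.1, 1.1, 0.4, -0.3]
digit, probability = classify(example_logits)
print(digit, round(probability, 3))
```

Returning `None` when no class clears the threshold lets the caller treat ambiguous images separately instead of forcing a guess.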
Running the Demo on Arvados
Now we wrap the TensorFlow Python script with a Common Workflow Language (CWL) CommandLineTool that specifies the Docker image providing the TensorFlow software environment, the amount of RAM needed, and the GPU requirement. Arvados supports the CWL extension cwltool:CUDARequirement to request nodes with NVIDIA GPUs. This requirement also declares the minimum CUDA version and minimum compute capability needed to run your tool, which Arvados uses to select the correct compute environment for the job. That is all that is needed to run your GPU-ready code on Arvados!
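To give a feel for what such a tool description might contain, here is a hedged sketch that builds one as a Python dictionary and prints it as JSON (CWL documents are usually written in YAML, and JSON is also accepted). This is not the file used in the demo: the Docker image tag, RAM amount, CUDA version, compute capability, and input layout are illustrative assumptions, and listing the extension under `hints` rather than `requirements` is a choice made by this sketch.

```python
import json

# All concrete values below (image tag, RAM, versions, input layout) are
# illustrative assumptions, not the settings used in the original demo.
tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    # Namespace prefix needed to use the cwltool extension below.
    "$namespaces": {"cwltool": "http://commonwl.org/cwltool#"},
    "requirements": {
        # Docker image providing the TensorFlow software environment.
        "DockerRequirement": {"dockerPull": "tensorflow/tensorflow:latest-gpu"},
        # Minimum RAM, in mebibytes.
        "ResourceRequirement": {"ramMin": 4000},
    },
    "hints": {
        # GPU request: minimum CUDA version and compute capability.
        "cwltool:CUDARequirement": {
            "cudaVersionMin": "11.0",
            "cudaComputeCapability": "3.0",
        },
    },
    "baseCommand": "python",
    "inputs": {
        # The TensorFlow Python script, passed as the first argument.
        "script": {"type": "File", "inputBinding": {"position": 1}},
    },
    "outputs": [],
}

cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

The printed JSON could be saved to a `.cwl` file and adjusted to match the versions your script and cluster actually need.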