Deep Learning: Exploring High Level APIs of Knet.jl and Flux.jl in comparison to TensorFlow-Keras
When it comes to complex modeling, specifically in the field of deep learning, the go-to tool for most researchers is Google’s TensorFlow. There are a number of good reasons for this, one of which is that it provides both high and low level APIs that suit the needs of beginners and advanced users, respectively. I have used it in some of my projects, and it was indeed powerful enough for the task. TensorFlow is also one of the most actively developed deep learning frameworks, with Bayesian inference or probabilistic reasoning as a recent extension (see TensorFlow Probability; another extension is TensorFlow.js). While the library is written mostly in C++ for performance, the main API is served in Python for ease of use. This design is built around a static computational graph that needs to be defined declaratively before it is executed. The static nature of this graph, however, makes models difficult to debug, since the code is itself data for defining the computational graph. Hence, you cannot use a debugger to check the results of the model line by line. Thankfully, it’s 2019 already and we have a stable Eager Execution that allows users to immediately check the results of any TensorFlow operation. Indeed, this is more intuitive and more pythonic. In this article, however, we’ll explore what else we have in 2019. In particular, let’s take a look at Julia’s deep learning libraries and compare them to the high level APIs of TensorFlow, i.e. Keras’ model specification.
As a language that leans towards numerical computation, it’s no surprise that Julia offers a number of choices for doing deep learning. Here are the stable libraries:
- Flux.jl - The Elegant Machine Learning Stack.
- Knet.jl - Koç University deep learning framework.
- MLJ.jl - Julia machine learning framework by Alan Turing Institute.
- MXNet.jl - Apache MXNet Julia package.
- TensorFlow.jl - A Julia wrapper for TensorFlow.
Other related packages are maintained in JuliaML. For this article, we are going to focus on the usage of Flux.jl and Knet.jl, and we are going to use the Iris dataset for a classification task using a Multilayer Perceptron. To start with, we need to install the following packages. I’m using Julia 1.1.0 and Python 3.7.3.
Loading and Partitioning the Data
The random seed set above is meant for reproducibility, as it will give us the same random initial values for model training. The iris variable in line 11 (referring to the Julia code) contains the data, and is a data frame with 150 × 5 dimensions, where the columns are: Sepal Length, Sepal Width, Petal Length, Petal Width, and Species. There are several ways to partition this data into training and testing datasets. One procedure is stratified sampling, with simple random sampling without replacement as the selection method within each stratum, i.e. each species. The following code defines the function for partitioning the data with the mentioned sampling design:
Extract the training and testing datasets using the function above as follows:
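As a framework-free illustration of the sampling design described above, here is a minimal Python sketch. It is not the article's listing: the function name stratified_split, the 70% training fraction (chosen because it yields the 105/45 split reported in this article) and the seed value are all assumptions, and the labels are a stand-in with the same class layout as Iris.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=123):
    """Return (train_idx, test_idx), sampling without replacement within each class.

    Hypothetical helper: the name, fraction and seed are illustrative choices.
    """
    rng = random.Random(seed)

    # Group row indices by class label (the strata).
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)  # simple random sampling without replacement
        n_train = round(train_fraction * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return train_idx, test_idx


# Stand-in labels with the same layout as Iris: 3 species, 50 rows each.
species = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train_idx, test_idx = stratified_split(species)
print(len(train_idx), len(test_idx))  # 105 45
```

Because the split is done per species, each class contributes 35 training rows and 15 testing rows, so the class balance of the full dataset is preserved in both partitions.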
All three code blocks above extract:
- xtrn, the training data (feature) matrix with 105 × 4 dimensions (105 observations by 4 features);
- ytrn, the corresponding training target variable with 105 × 1 dimensions;
- xtst, the feature matrix for the testing dataset with 45 × 4 dimensions; and
- ytst, the target variable for the testing dataset with 45 × 1 dimensions.
Moreover, contrary to TensorFlow-Keras, Knet.jl and Flux.jl need further data preparation on the above partitions. In particular, Knet.jl takes a minibatch object as input data for model training, while Flux.jl needs one-hot encoding for the target variables ytst. Further, unlike Knet.jl, which ships with a minibatch function, Flux.jl gives the user the flexibility to create their own.
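To make the two preparation steps concrete, the following is a minimal sketch in plain Python. The helpers one_hot and minibatches are hypothetical names written for this illustration and are not the Knet.jl or Flux.jl functions; the five rows are illustrative values only.

```python
def one_hot(label, classes):
    """Encode a label as a 0/1 vector with a single 1 at the label's position."""
    return [1 if label == c else 0 for c in classes]


def minibatches(features, targets, batch_size):
    """Yield consecutive (features, targets) chunks of at most batch_size rows."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], targets[start:stop]


classes = ["setosa", "versicolor", "virginica"]

# Illustrative rows only: four measurements per flower plus a species label.
x = [
    [5.1, 3.5, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [7.0, 3.2, 4.7, 1.4],
    [4.9, 3.0, 1.4, 0.2],
    [5.8, 2.7, 5.1, 1.9],
]
y = ["setosa", "virginica", "versicolor", "setosa", "virginica"]

encoded = [one_hot(label, classes) for label in y]
batches = list(minibatches(x, encoded, batch_size=2))

print(encoded[1])    # [0, 0, 1]
print(len(batches))  # 3
```

With five rows and a batch size of 2, the last batch holds a single row; whether to keep or drop such a partial batch is a design choice left to the user.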
Specify the Model
The model that we are going to use is a Multilayer Perceptron with the following architecture: 4 neurons for the input layer, 10 neurons for the hidden layer, and 3 neurons for the output layer. The first two layers contain a bias, and the neurons of the last two layers are activated with the Rectified Linear Unit (ReLU) and softmax functions, respectively. The diagram below illustrates the architecture described. The code below specifies the model:
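Independently of any framework, the forward pass of this 4-10-3 architecture can be sketched in plain Python as follows. This is an illustration of the math rather than the Flux.jl, Knet.jl or Keras listing; the helper names, the uniform weight initialisation and the seed are assumptions.

```python
import math
import random


def dense(x, weights, bias):
    """Affine map: one output per weight row, computed as w . x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Hypothetical initialiser: small uniform weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(123)
w1, b1 = init_layer(10, 4, rng)  # input (4) -> hidden (10)
w2, b2 = init_layer(3, 10, rng)  # hidden (10) -> output (3)


def predict(x):
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))


probs = predict([5.1, 3.5, 1.4, 0.2])
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

The output is a vector of three class probabilities that sum to one, which is what the softmax activation on the last layer guarantees.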
For users coming from TensorFlow-Keras, Flux.jl provides a Keras-like API for model specification, with Flux.Chain as the counterpart of Keras’ Sequential. This is different from Knet.jl, where the highest level API you get is the nuts and bolts for constructing the layers. That said, Flux.Dense is defined almost exactly like the Dense struct of the Knet.jl code above (check the source code here). In addition, since both Flux.jl and Knet.jl are written purely in Julia, the source code under the hood is accessible to beginners, giving the user a full understanding of not just the code, but also the math. Check the screenshots below for the distribution of the file types in the GitHub repos of the three frameworks:
From the above figure, it’s clear that Flux.jl is 100% Julia. Knet.jl, on the other hand, is actually 100% Julia as well, though this is not apparent from the figure. The 41.4% share of Jupyter Notebooks and the other small percentages account for tutorials, tests and examples, not the source code.
Train the Model
Finally, train the model as follows for 100 epochs:
The Julia code above saves both the loss and the accuracy for every epoch into a data frame and then into a CSV file. These will be used for visualization. Moreover, unlike Flux.jl and Knet.jl, which require minibatch preparation prior to training, TensorFlow-Keras specifies this in the fit method as shown above. Further, it is also possible to train the model in Knet.jl using a single function without saving the metrics. This is done as follows:
The Flux.jl code above simply illustrates the use of the Flux.@epochs macro for looping instead of the for loop. The loss of the model for 100 epochs is visualized below across frameworks:
From the above figure, one can observe that Flux.jl had bad starting values set by the random seed earlier; fortunately, Adam rapidly drives the loss down towards a minimum. The figure was plotted using Gadfly.jl. Install this package using Pkg as described in the first code block, along with Cairo.jl and Fontconfig.jl. The latter two packages are used to save the plot in PNG format. See the code below to reproduce:
Evaluate the Model
The output of the model ends with a vector of three neurons. The index or location of the neurons in this vector defines the corresponding integer encoding, with the 1st index as setosa, the 2nd as versicolor, and the 3rd as virginica. Thus, the code below takes the argmax of the vector to get the integer encoding for evaluation.
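The argmax-based evaluation can be sketched in a few lines of plain Python. The helper names and the output vectors below are made up for illustration, and note that Python indexes from 0, so setosa, versicolor and virginica map to 0, 1 and 2 here rather than 1, 2 and 3 as in Julia.

```python
def argmax(v):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(v)), key=lambda i: v[i])


def accuracy(outputs, targets):
    """Fraction of rows whose predicted class index equals the target index."""
    correct = sum(argmax(o) == t for o, t in zip(outputs, targets))
    return correct / len(targets)


# Made-up softmax outputs for four observations (0-based class indices).
outputs = [
    [0.8, 0.1, 0.1],  # predicted 0 (setosa)
    [0.2, 0.5, 0.3],  # predicted 1 (versicolor)
    [0.1, 0.3, 0.6],  # predicted 2 (virginica)
    [0.3, 0.4, 0.3],  # predicted 1, but the target is 2
]
targets = [0, 1, 2, 2]

print(accuracy(outputs, targets))  # 0.75
```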
The figure below shows the traces of the accuracy during training: TensorFlow took 25 epochs before surpassing 50% again. To reproduce the figure, run the following code (make sure to load Gadfly.jl and the other related libraries mentioned earlier for generating the loss plots):
At this point, we are going to record the training time of each framework.
The benchmark was done by running the above code about 10 times for each framework; I then took the lowest elapsed time out of the results. In addition, before running the code for each framework, I restarted my machine to begin from a fresh state. The code for the above figure is given below (make sure to load Gadfly.jl and the other related libraries mentioned earlier for generating the loss plots):
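Separately from the plotting code, the timing procedure itself (repeat the run several times and keep the lowest elapsed time) can be sketched as follows. This is an assumption about how such a measurement might be set up, not the script used for the figure; best_time is a hypothetical helper and workload is a stand-in for a full training run.

```python
import time


def best_time(fn, repeats=10):
    """Run fn several times; return the lowest elapsed time and all timings (seconds)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), timings


def workload():
    # Stand-in for "train the model for 100 epochs".
    return sum(i * i for i in range(10_000))


best, timings = best_time(workload)
print(f"best of {len(timings)} runs: {best:.6f} s")
```

Taking the minimum rather than the mean reduces the influence of background activity on the machine, since such noise can only make a run slower, not faster.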
In conclusion, I would say Julia is worth investing in even for deep learning, as illustrated in this article. The two frameworks, Flux.jl and Knet.jl, provide a clean API that introduces a new way of defining models, as opposed to the object-oriented approach of TensorFlow-Keras. One thing to emphasize is the for loop, which I plainly added in training the model just to save the accuracy and loss metrics. The for loop did not compromise the speed (though Knet.jl is much faster without it). This is crucial since it lets the user spend more time on solving the problem and less on optimizing the code. Further, between the two Julia frameworks, I find Knet.jl to be Julia + little-else, as described by Professor Deniz Yuret (the main developer), since there are no special APIs for Dense, Chains, etc.; you have to code them yourself. Although this is also possible in Flux.jl, Knet.jl doesn’t have these out of the box: it ships only with the nuts and bolts, and that’s the highest level API the user gets. That said, I think Flux.jl is a better recommendation for beginners coming from TensorFlow-Keras. This is not to say that Knet.jl is hard; it’s not if you know Julia already. In addition, I do love the extent of flexibility in Knet.jl by default, which I think is best for advanced users. Lastly, just like the different extensions of TensorFlow, Flux.jl is flexible enough that it works well with Turing.jl for doing Bayesian deep learning, which is a good alternative to TensorFlow Probability. For Neural Differential Equations, Flux.jl works well with DifferentialEquations.jl; check out DiffEqFlux.jl.
In my next article, we will explore the low level APIs of Flux.jl and Knet.jl in comparison to the low level APIs of TensorFlow. One thing that’s also missing from the above exercise is the use of a GPU for model training, and I hope to tackle this in future articles. Finally, I plan to test these Julia libraries on real deep learning problems, such as computer vision and natural language processing (check out the workshop on these from JuliaCon 2018).
If you are impatient, here is the complete code, excluding the benchmarks and the plots. It should work after installing the required libraries shown above:
- Yuret, Deniz (2016). Knet: beginning deep learning with 100 lines of Julia. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
- Innes, Mike (2018). Flux: Elegant machine learning with Julia. Journal of Open Source Software, 3(25), 602, https://doi.org/10.21105/joss.00602
- Abadi, Martin et al. (2016). TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.