Fast.ai is a great deep learning course for those who prefer to learn by doing. Unlike other courses, here you build a deep neural network that achieves good results on an image recognition problem within the first hour. You start from working code and then dig deeper into the theory and ways to improve your algorithm. The course authors are Jeremy Howard and Rachel Thomas.
I had prior knowledge of machine learning in general and neural networks in particular. I completed Andrew Ng’s Machine Learning course on Coursera, read a lot of articles, papers, and books, including parts of the Deep Learning book, and I’ve been building machine learning applications at work for more than a year. Nevertheless, I learned a lot from the fast.ai lessons. When you read books or papers, you learn a lot about the mathematics of deep learning and little about how to apply it in practice. Fast.ai is very helpful for learning the practical aspects: how to split the dataset, how to choose good hyperparameter values, and how to prevent overfitting.
So far, I have gone through the first three lessons. Lesson 1 is mostly about setting up a development environment: creating an AWS account, launching a virtual machine on Amazon EC2, running Jupyter Notebook there, and executing the code that trains a convolutional neural network. The code sets up a VGG16 CNN, loads weights pretrained on the ImageNet dataset, fine-tunes the model on a dataset from the Dogs vs. Cats Kaggle competition, and then makes predictions on the test data to submit to Kaggle. I had done something like that before, but it still surprises me how easy it is to do transfer learning with CNNs and get great results. Very inspiring.
They use Keras with the Theano backend (you can use the TensorFlow backend too; all the code still works).
The gaming industry has a significant impact on deep learning and self-driving cars. GPUs fueled the progress in deep learning, as they enabled training much larger neural networks on large datasets.
For about 20 years, GPUs were mostly used by gamers. Every year NVIDIA poured billions of dollars into R&D to make better and faster GPUs for a better gaming experience. In the early 2000s, they started considering scientific computing and machine learning acceleration or, more generally, general-purpose computing on GPUs (GPGPU). They released the first version of CUDA, a parallel computing platform, in 2007. CUDA made programming GPUs much easier, which led to more experimentation by researchers.
In 2012, the SuperVision group (Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever) used CUDA to develop the AlexNet model, which won the ImageNet Large Scale Visual Recognition Challenge. It was perhaps the event that drew the machine learning community’s attention to the power of deep neural networks and GPU computation. Now NVIDIA dominates the market for hardware that trains deep neural networks and is moving into the market for inference hardware. It’s interesting to think that the gamers of the ’90s and 2000s paid for the R&D behind deep learning hardware.
In September, Udacity announced a Flying Car Nanodegree Program. They want to teach students the skills needed to build the future of smart transportation. Students will develop the software skills and conceptual understanding needed to build an autonomous flight system for quadrotor and fixed-wing drones.
Broadcom announced a new GPS chip that will give 30-centimeter accuracy instead of today’s 5 meters.
The chip achieves better accuracy by using the new GPS L5 signal. L5 is a civilian “safety of life” signal, designed to provide secure and robust navigation for life-critical applications such as aircraft precision approach guidance. GPS, QZSS (the Japanese satellite system), and Galileo (the European satellite system) started deploying satellites with L5 support in 2011, and now there are enough of them in orbit to start using this functionality.
This should have an impact on self-driving cars. Autonomous vehicles must know precisely where they are relative to a high-definition map. Using L5 together with the older L1 and L2 signals gives an order-of-magnitude improvement in navigation accuracy.
Jeff Dean is a Google Senior Fellow who leads the Google Brain project. He spoke at Y Combinator in August 2017. The video is available on YouTube, and the slides are on Scribd.
If you personalize the user experience of a website or an app, contextual bandits can help. With contextual bandits, you can choose which content to display to a user, rank advertisements, optimize search results, select the best image to show on a page, and much more.
There are many names for this class of algorithms: contextual bandits, multi-world testing, associative bandits, learning with partial feedback, learning with bandit feedback, bandits with side information, multi-class classification with bandit feedback, associative reinforcement learning, one-step reinforcement learning.
Researchers approach the problem from two different angles. You can think about contextual bandits as an extension of multi-armed bandits, or as a simplified version of reinforcement learning.
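To make the “extension of multi-armed bandits” view concrete, here is a minimal sketch of one of the simplest strategies: epsilon-greedy, run as a separate multi-armed bandit for each context. It assumes a small, discrete set of contexts (a real system would usually learn a model over context features instead). The class name, the device-type contexts, and the click-through rates in the simulation are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Epsilon-greedy contextual bandit over discrete contexts (illustrative sketch)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running reward sums and pull counts, keyed by (context, arm).
        self.reward_sums = defaultdict(float)
        self.counts = defaultdict(int)

    def estimate(self, context, arm):
        # Average observed reward of this arm in this context (0.0 if never tried).
        n = self.counts[(context, arm)]
        return self.reward_sums[(context, arm)] / n if n else 0.0

    def choose(self, context):
        # Explore a random arm with probability epsilon; otherwise exploit
        # the arm with the best average reward seen so far for this context.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda arm: self.estimate(context, arm))

    def update(self, context, arm, reward):
        # Bandit feedback: only the reward of the arm actually shown is observed.
        self.reward_sums[(context, arm)] += reward
        self.counts[(context, arm)] += 1


# Hypothetical simulation: the best of three images differs by device type.
true_ctr = {"mobile": [0.05, 0.20, 0.10], "desktop": [0.25, 0.05, 0.10]}
bandit = EpsilonGreedyContextualBandit(n_arms=3, epsilon=0.1, seed=0)
env = random.Random(1)
for _ in range(20000):
    context = env.choice(["mobile", "desktop"])
    arm = bandit.choose(context)
    reward = 1.0 if env.random() < true_ctr[context][arm] else 0.0
    bandit.update(context, arm, reward)

for context in true_ctr:
    best = max(range(3), key=lambda a: bandit.estimate(context, a))
    print(context, "-> best arm", best)
```

The simulation only ever reveals the reward of the chosen arm, which is the “partial feedback” in the names above. Epsilon-greedy is a simple baseline; algorithms such as LinUCB or Thompson sampling explore more efficiently.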
OpenAI’s bot is the first ever to defeat the world’s best players in DotA 2, which it did at The International 2017. It is a major step for AI in eSports. The bot was trained through self-play, though some tactics were hardcoded.
The bot doesn’t play DotA in the regular 5v5 setup; it can only beat humans in 1v1 play. Teamwork will be harder to learn. Also, the bot plays only one character and has a few unfair advantages compared to human players; for example, it likely has access to exact information such as distances to other players on the map and their health.
The field of deep learning is very active; arguably, there are one or two breakthroughs every week, along with a steady stream of research papers, industry news, startups, and investments. How do you keep up with the news?
There are a few newsletters with well-curated links and summaries:
Papers and code:
- The AI section of arXiv.org is useful if you are looking for the latest research papers.
- GitXiv is a collection of links to source code for deep learning papers on arXiv.
Good regular podcasts about deep learning:
Data is the most important component for building a machine learning model. Recently, researchers from Google trained a CNN image classification model on 300 million images and demonstrated that, even at the scale of hundreds of millions of examples, adding more data improves model performance. Apparently, more data is better. But where can you get large datasets if you are doing research on text classification?
I found nice references to a few large text classification datasets in the paper “Text Understanding from Scratch” by Xiang Zhang and Yann LeCun. The paper describes a character-level CNN model for text classification. The authors provide benchmarks of different CNN architectures and a few simple models on several datasets. A more recent version of this paper, “Character-level Convolutional Networks for Text Classification”, contains more experimental results but omits some details on dataset usage: which fields to use, how to truncate long texts, etc. If you are looking for information about the datasets, read the older paper. If you want to learn more about character-level CNN models, read the newer one.
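The character-level models in these papers read raw characters instead of words. Below is a minimal sketch of the kind of one-hot “character quantization” they use as input: each text becomes a fixed-length sequence of one-hot vectors, truncated or zero-padded. The alphabet here is simplified, and the maximum length of 1014 characters reflects my reading of the paper; treat both as assumptions, not as the authors’ exact preprocessing.

```python
import string

# Simplified alphabet (an assumption, not the paper's exact character list).
ALPHABET = string.ascii_lowercase + string.digits + "-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}\n"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
MAX_LEN = 1014  # maximum number of characters kept per text (assumed value)


def quantize(text, max_len=MAX_LEN):
    """One-hot encode text as a max_len x len(ALPHABET) matrix of 0/1 values.

    Characters beyond max_len are dropped; characters outside the alphabet
    (including spaces) and padding positions become all-zero rows.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


encoded = quantize("Hi!")
print(len(encoded), len(encoded[0]))  # sequence length x alphabet size
```

Each row is one character position, so the model can apply 1-D (temporal) convolutions along the text, much as an image CNN convolves over pixels.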
Somebody uploaded the datasets to Google Drive, so you can download them here.
If you have other large text classification datasets, please share in comments to this post.
I started looking at Kaggle competitions to practice my machine learning skills. One of the currently running competitions is framed as an image classification problem: Intel partnered with MobileODT to launch a Kaggle competition to develop an algorithm that identifies a woman’s cervix type from images.
The training set contains 1481 images split into three types. Kagglers can also use 6734 additional images, some of which come from duplicate patients and some of which are of lower quality. Test sets for the two stages of the competition are available. Kagglers have to submit a set of predicted probabilities, one for each of the 3 classes, for each image in the test set. The total prize pool is $100,000.
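Because submissions are class probabilities rather than hard labels, the score depends on how confident each prediction is. Here is a minimal sketch, assuming the competition uses multi-class logarithmic loss (check the competition’s evaluation page); the function name and the clipping constant are my own choices.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to each image's true class.

    y_true: list of class indices (0, 1, or 2 for the three cervix types).
    y_prob: list of per-class probability lists, one per image.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Rescale the row so it sums to 1, then clip away from 0 and 1 to avoid log(0).
        p = probs[label] / sum(probs)
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


uniform = [[1 / 3, 1 / 3, 1 / 3]] * 2
print(multiclass_log_loss([0, 2], uniform))         # ln(3), about 1.0986
print(multiclass_log_loss([0], [[0.0, 1.0, 0.0]]))  # one confident mistake, about 34.54
```

Under this metric, a single confidently wrong prediction costs about 34.5 on its own, so clipping predictions away from 0 and 1 before submitting is a cheap safeguard.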
I tried to approach the problem in a naïve way: just get a pre-trained Inception V3 image classification model and fine-tune it on this dataset.
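For reference, here is a minimal sketch of that setup in today’s tf.keras. It is not the exact code I used, and it is not the Theano-era Keras API from the fast.ai lessons: it loads Inception V3 without its ImageNet classifier, freezes it, and adds a new three-way softmax head. The function name, input size, optimizer, and learning rate are illustrative assumptions.

```python
import tensorflow as tf


def build_finetune_model(n_classes=3, weights="imagenet", input_shape=(299, 299, 3)):
    """Inception V3 backbone with a new softmax head for n_classes (sketch)."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape
    )
    # Stage 1: freeze the pretrained convolutional layers and train only the new head.
    base.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    # Scale pixels from [0, 255] to [-1, 1], the input range Inception V3 expects.
    x = tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(inputs)
    # training=False keeps the frozen BatchNorm layers in inference mode.
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# weights=None avoids downloading ImageNet weights for a quick structural check.
model = build_finetune_model(weights=None, input_shape=(139, 139, 3))
```

Labels must be one-hot encoded to match `categorical_crossentropy`. Once the head has converged, a common second step is to unfreeze the top Inception blocks and continue training with a much smaller learning rate.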