Google announced the Google Assistant bot in May 2016 at the Google I/O conference. The bot is integrated into a new messaging application, Google Allo, which was released on September 21, 2016.
Google Assistant can show weather, news, travel ideas, and restaurants, put events on your calendar, and make restaurant reservations. I guess it can do most of the things that Google Now could handle.
Google Assistant is going to be an “uber-bot”: a bot that serves as an entry point for any user request. The bot can recognize what you are asking and route the request to an appropriate specialized bot. On October 3, 2016, Google announced the “Actions on Google” program, which will allow developers to build “actions” for Google Assistant.
Chatbots are on the rise. Startups are building chatbots, platforms, APIs, tools, and analytics. Microsoft, Google, and Facebook introduce tools and frameworks, and build smart assistants on top of these frameworks. Multiple blogs, magazines, and podcasts report on news in this industry, and chatbot developers gather at meetups and conferences.
I have been working on chatbot software for a while, and I have been looking at what is going on in the industry. See my previous posts:
In this article, I will dive into the architecture of chatbots.
One of the common natural language understanding problems is text classification. Over the last few decades, machine learning researchers have been moving from the simplest “bag of words” model to more sophisticated models for text classification.
The bag of words model uses only information about which words are used in the text. Adding TF-IDF to the bag of words helps to track the relevance of each word to the document. A bag of n-grams makes it possible to use partial information about the structure of the text. Recurrent neural networks, like LSTM, can capture dependencies between words even if they are far from each other. An LSTM learns the structure of sentences from the raw data, but we still have to provide a list of words. The word2vec algorithm adds knowledge about word similarity, which helps a lot. Convolutional neural networks can also help to process word-based datasets.
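To make the first three models concrete, here is a minimal sketch in plain Python. The function names, the whitespace tokenizer, and the sample sentences are my own assumptions for illustration, and the TF-IDF weighting is the simplest textbook variant (raw count times log(N / document frequency)); real libraries usually add smoothing and normalization.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()

def bag_of_words(text):
    """Count how often each word occurs, ignoring word order."""
    return Counter(tokenize(text))

def bag_of_ngrams(text, n=2):
    """Count runs of n consecutive words, which keeps some local word order."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def tf_idf(docs):
    """Weight each word count by log(N / number of documents containing the word)."""
    bags = [bag_of_words(d) for d in docs]
    doc_freq = Counter(word for bag in bags for word in bag)
    n_docs = len(docs)
    return [
        {word: count * math.log(n_docs / doc_freq[word]) for word, count in bag.items()}
        for bag in bags
    ]

# Hypothetical sample documents.
docs = ["the dog bites the man", "the man bites the dog", "the cat sleeps"]

# Bag of words cannot tell the first two sentences apart, bigrams can.
same_words = bag_of_words(docs[0]) == bag_of_words(docs[1])
same_bigrams = bag_of_ngrams(docs[0]) == bag_of_ngrams(docs[1])

# "the" occurs in every document, so its TF-IDF weight drops to zero.
weights = tf_idf(docs)
```

Here same_words is True and same_bigrams is False: the two sentences use the same words in a different order, which only the n-gram model notices.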
The trend is to learn from raw data and to give machine learning models access to more information about text structure. A logical next step would be to feed a stream of characters to the model and let it learn all about the words. What can be cruder than a stream of characters? An additional benefit is that the model can learn misspellings and emoticons. Also, the same model can be used for different languages, even those where segmentation into words is not possible.
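As a rough illustration of what “a stream of characters” means as model input, the sketch below turns raw text into fixed-length lists of character ids. The function names, the sample texts, and the choice to use id 0 for both padding and unknown characters are assumptions made for this example, not part of any particular model.

```python
def build_char_vocab(texts):
    """Map every distinct character in the corpus to a positive integer id."""
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(text, vocab, max_len=16):
    """Turn text into a fixed-length list of character ids.

    Unknown characters and padding positions both get id 0.
    """
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Hypothetical corpus: a misspelling, emoticons, and a language without spaces.
texts = ["great movie :)", "graet movie :(", "いい映画"]
vocab = build_char_vocab(texts)

encoded = [encode(t, vocab) for t in texts]
```

No tokenizer or word list is involved: the misspelled “graet” is built from the same character ids as “great”, and the Japanese sample is encoded the same way as the English ones.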
When Apple introduced the App Store in 2008, developers’ attention moved from web-based to native mobile apps.
A few years later the app market stabilized. Facebook, Amazon, and Google apps dominate in their verticals. Consumers don’t want to install new apps anymore. According to comScore’s mobile app report, most US smartphone owners download zero apps in a typical month, and a “staggering 42% of all app time spent on smartphones occurs on the individual’s single most used app”.
More than half of the time we spend on our phones goes to talking, texting, or email, according to Experian’s report.
At the end of January, RE-WORK organized a Virtual Assistant Summit, which took place in San Francisco at the same time as the RE-WORK Deep Learning Summit.
Craig Villamor wrote a nice overview of the key things discussed at the summit.
I didn’t attend these conferences, but I watched a few presentations that RE-WORK kindly uploaded to YouTube. I would like to share the notes I took while watching these videos. I could have misinterpreted something, so please keep that in mind, and watch the original videos for more details.
Deep learning is computationally intensive. Model training and model querying have very different computational costs. The query phase is fast: you apply a function to a vector of input parameters (a forward pass) and get the results.
Model training is much more intensive. Deep learning requires large training datasets in order to produce good results. Datasets with millions of samples are common now, e.g. the ImageNet dataset contains over 1 million images. Training is an iterative process: you do a forward pass on each sample of the training set, do a backward pass to adjust the model parameters, and repeat the process a few times (epochs). Thus training requires millions, or even billions, of times more computation than one forward pass, and a model can include billions of parameters to adjust.
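A back-of-the-envelope calculation shows where the “millions of times” comes from. The sketch below expresses training cost in units of one forward pass. The function name, the 10 epochs, and the rule of thumb that a backward pass costs about twice as much as a forward pass are assumptions for illustration only.

```python
def training_cost_in_forward_passes(num_samples, num_epochs,
                                    backward_to_forward_ratio=2.0):
    """Estimate training cost, measured in units of one forward pass.

    Every sample gets one forward pass and one backward pass per epoch;
    the backward pass is assumed to cost `backward_to_forward_ratio`
    times as much as the forward pass.
    """
    cost_per_sample = 1.0 + backward_to_forward_ratio
    return num_samples * num_epochs * cost_per_sample

# Roughly ImageNet-sized dataset, with an assumed 10 epochs.
cost = training_cost_in_forward_passes(num_samples=1_000_000, num_epochs=10)
```

Under these assumptions, training costs about 30 million times as much as answering a single query, before counting any repeated runs for tuning.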
It’s interesting that The New York Times published an article about brain cryonics, immortality, connectomics, transhumanism, and uploading. Kim Suozzi, who died of cancer at age 23, chose to have her brain preserved in the hope of being revived sometime in the future. One of the options is to scan the brain and map the connections between individual neurons.
“I can see within, say, 40 years that we would have a method to generate a digital replica of a person’s mind,” said Winfried Denk, a director at the Max Planck Institute of Neurobiology in Germany, who has invented one of several mapping techniques.
“The mapping technique pioneered by Dr. Denk and others involves scanning brains in impossibly thin sheets with an electron microscope. Stacked together on a computer, the scans reveal a three-dimensional map of the connections between each neuron in the tissue, the critical brain anatomy known as the connectome.”
The author doesn’t dive into the details of reconstructing a map of neuron connections, though. As Yann LeCun points out, “connectomics efforts use 3D convolutional nets to analyze the volumetric brain images and to reconstruct the neural circuits.”
As strange as it may sound, neuroscientists use artificial neural networks to reconstruct models of human neural networks. Yet another good use of deep learning techniques.
I’m trying to solve one of the recent Kaggle competitions: “ICDM 2015: Drawbridge Cross-Device Connections”. That competition provides data on device/browser usage and asks you to determine which cookies belong to an individual using a device.
The data for this competition is available in two formats: CSV files and a SQLite database. A relational database looks more suitable for ad-hoc queries because SQL is quite a powerful tool: you can easily join tables and filter and group data. It lacks, though, some of the statistical analysis capabilities that you get in R or other tools specialized in statistics.
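The kind of ad-hoc query meant here can be sketched with Python’s built-in sqlite3 module. The table and column names below are hypothetical and are not the actual competition schema; the example only shows a join, a filter, and a grouping in one statement.

```python
import sqlite3

# In-memory database with a made-up schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, user_id TEXT, device_type TEXT);
    CREATE TABLE cookies (cookie_id TEXT PRIMARY KEY, user_id TEXT, browser TEXT);
    INSERT INTO devices VALUES ('d1', 'u1', 'phone'), ('d2', 'u2', 'tablet');
    INSERT INTO cookies VALUES
        ('c1', 'u1', 'chrome'), ('c2', 'u1', 'firefox'), ('c3', 'u2', 'chrome');
""")

# Join devices to cookies, keep only phones, and count cookies per device.
rows = conn.execute("""
    SELECT d.device_id, COUNT(*) AS cookie_count
    FROM devices AS d
    JOIN cookies AS c ON c.user_id = d.user_id
    WHERE d.device_type = 'phone'
    GROUP BY d.device_id
    ORDER BY d.device_id
""").fetchall()
```

With the sample rows above, the query returns a single row: device d1 with two cookies.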
SQLite is an awesome technology for small embedded databases, but there are certainly no good GUI applications for querying SQLite databases. I was also afraid that the SQLite query execution engine is not very smart, and that the SQLite dialect of SQL is not rich enough in comparison to SQL Server or Oracle, so I decided to import the SQLite database into Microsoft SQL Server 2012.
A huge part of database-related work is making sure that the data is consistent. In the real world, data is never ideal, and whenever you need to use data from existing data sources, you have to understand what is right and what is wrong there, and know how to work around data quality issues. The two most frequent data integrity issues in relational databases are missing data and duplicate data. A record/document is missing if it was not written to the database by an application, or was mistakenly deleted. A record/document is duplicated if it was recorded more than once.
Why does an application write the same record more than once? A user or upstream code could send the same document twice, and the application doesn’t handle this case. Or a user could send an incorrect record the first time and a corrected one later: the application could be designed to save all records instead of modifying existing ones.
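Both duplicate scenarios can be detected with a simple grouping query. The sketch below uses Python’s built-in sqlite3 module with a hypothetical records table. The column names, the sample rows, and the rule that the most recently inserted row is the valid one are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (row_id INTEGER PRIMARY KEY, doc_key TEXT, amount INTEGER);
    INSERT INTO records (doc_key, amount) VALUES
        ('A', 100),   -- sent once
        ('B', 200),   -- sent twice by mistake
        ('B', 200),
        ('C', 300),   -- sent with an error, then corrected
        ('C', 350);
""")

# Find business keys that were recorded more than once.
duplicates = conn.execute("""
    SELECT doc_key, COUNT(*) AS copies
    FROM records
    GROUP BY doc_key
    HAVING COUNT(*) > 1
    ORDER BY doc_key
""").fetchall()

# Keep one row per key, assuming the latest inserted row is the valid one.
latest = conn.execute("""
    SELECT doc_key, amount
    FROM records
    WHERE row_id IN (SELECT MAX(row_id) FROM records GROUP BY doc_key)
    ORDER BY doc_key
""").fetchall()
```

The first query flags keys B and C. The second keeps a single row per key and picks the corrected amount of 350 for C. Whether “latest wins” is the right rule depends on how the application writes corrections.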