Comments on: Natural Language Pipeline for Chatbots

By: surmenok

surmenok — Tue, 16 Jan 2018 03:03:00 +0000

Another idea. Entity recognition can be done by assigning a tag to each word (e.g. https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging) )
If you perform this kind of entity tagging first, you can feed a sequence of (word, tag) pairs into the classification model instead of a sequence of words.
This way you don’t increase the complexity of the model too much.
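A rough sketch of what that input could look like, in Python. The BIO tags here are written by hand for illustration and are not the output of a real tagger; the helper names (pair_words_with_tags, build_vocab) and the TIME tag are made up for this example.

```python
# Hypothetical example: tags are hand-written, standing in for the output of an entity tagger.
tokens = ["Book", "a", "table", "for", "2", "at", "7pm", "tomorrow"]
bio_tags = ["O", "O", "O", "O", "O", "O", "B-TIME", "I-TIME"]


def pair_words_with_tags(tokens, tags):
    """Return a list of (word, tag) pairs, one per token."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return list(zip(tokens, tags))


def build_vocab(items):
    """Map each distinct item to an integer id; id 0 is left free for padding."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab) + 1
    return vocab


pairs = pair_words_with_tags(tokens, bio_tags)

# Two parallel id sequences: one for words, one for tags.
# A classifier could embed each separately and combine them per position.
word_vocab = build_vocab(t.lower() for t in tokens)
tag_vocab = build_vocab(bio_tags)
word_ids = [word_vocab[t.lower()] for t in tokens]
tag_ids = [tag_vocab[t] for t in bio_tags]
```

The tag vocabulary stays small (three distinct tags in this sentence), so the extra input adds little to the model compared with the word vocabulary.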

By: Kaushik Govindarajan

Kaushik Govindarajan — Tue, 16 Jan 2018 02:52:00 +0000

That sounds like a good approach. So while converting my words to vectors I need to add additional features for the entities. At prediction time, NER has to run first, and I can then predict the intent based on its output. The only thing that concerns me is that the number of features will keep growing as I add entity types, so the classification algorithm may not perform well.
Thanks

By: surmenok

surmenok — Sun, 14 Jan 2018 19:23:00 +0000

Yes. One way to do this is to perform entity recognition first, and then add features like HasTime (0 or 1) and HasDate (0 or 1). Using such features together with the raw text of the message can be very helpful.
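A minimal sketch of such features in Python. The two regular expressions are toy stand-ins for a real entity recognizer and only cover a few patterns; the function name entity_flags is made up for this example.

```python
import re

# Toy patterns, assumed for illustration only. A real system would use
# a trained entity recognizer or a dedicated datetime parser instead.
TIME_RE = re.compile(r"\b\d{1,2}(:\d{2})?\s?(am|pm)\b", re.IGNORECASE)
DATE_RE = re.compile(
    r"\b(today|tomorrow|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)


def entity_flags(message):
    """Return binary HasTime / HasDate features for one message."""
    return {
        "HasTime": int(bool(TIME_RE.search(message))),
        "HasDate": int(bool(DATE_RE.search(message))),
    }


message = "Book a table for 2 at 7pm tomorrow"
flags = entity_flags(message)

# The flags would be passed to the classifier alongside the raw text features.
example = (message, flags["HasTime"], flags["HasDate"])
```

Each entity type adds one extra feature, independent of the concrete value the user typed.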

By: Kaushik Govindarajan

Kaushik Govindarajan — Sun, 14 Jan 2018 15:52:00 +0000

Thanks, I am aware of the entity extraction part but I was wondering if there is a way to tell the classifier that a time expression is expected in this part of the sentence. This might significantly improve the performance of the classifier.

By: surmenok

surmenok — Sat, 13 Jan 2018 18:17:00 +0000

The intent is restaurant_booking. “7pm tomorrow” is an entity, so you should treat it as an entity recognition problem. Parsing dates and times can be quite hard because there are many ways people can express them. This article about the x.ai architecture for natural language understanding may help: https://x.ai/blog/a-peek-at-x-ais-data-science-architecture/ They deal with datetime parsing a lot.

By: Kaushik Govindarajan

Kaushik Govindarajan — Sat, 13 Jan 2018 14:35:00 +0000

Hi Pavel, the article serves as a good intro for a newbie. In response to the last part of your post, I am currently building an intent classifier, and I have a rough idea of how to train on static questions that contain hardcoded text. I am having problems training on sentences with temporal data. For example, consider this sentence and its intent:
Book a table for 2 at 7pm tomorrow – restaurant_booking
Now in this case “7pm tomorrow” is an arbitrary value and a user can enter any time expression. So in this case how should I structure my training data in order to train a more accurate model?