Abstract(s)
Chatbots are becoming increasingly popular and require the ability to interpret natural language to communicate clearly with humans. To achieve this, intent detection is crucial. However, current applications typically need a significant amount of annotated data, which is time-consuming and expensive to acquire. This article assesses the effectiveness of different text representations for annotating unlabeled dialog data through a pipeline that examines both classical approaches and pre-trained transformer models for word embedding. The resulting embeddings were then used to create sentence embeddings through pooling, followed by dimensionality reduction, before being fed into a clustering algorithm to determine the user's intents. To this end, various pooling, dimension reduction, and clustering algorithms were evaluated to determine the most appropriate approach. The evaluation dataset contains a variety of user intents across different domains, with varying intent taxonomies within the same domain. Results demonstrate that transformer-based models provide better text representations than classical approaches. However, combining several clustering algorithms and embeddings from dissimilar origins through ensemble clustering considerably improves the final clustering solution. Additionally, applying the Uniform Manifold Approximation and Projection (UMAP) algorithm for dimension reduction can substantially improve performance (by up to 20%) while using a much smaller representation.
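
The record does not include the authors' code; the following is a minimal sketch of the kind of pipeline the abstract describes (transformer word embeddings, mean pooling into sentence embeddings, UMAP dimension reduction, then clustering), assuming the Hugging Face transformers, umap-learn, and scikit-learn packages. The choice of "roberta-base", the UMAP and K-means settings, and the toy utterances are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative pipeline sketch (not the authors' code):
# transformer word embeddings -> mean pooling -> UMAP reduction -> clustering.
import torch
from transformers import AutoTokenizer, AutoModel
from umap import UMAP
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical unlabeled utterances; the paper evaluates on multi-domain dialog data.
utterances = [
    "I want to book a flight to Lisbon",
    "Cancel my hotel reservation",
    "What is the weather like tomorrow?",
    "Reserve a table for two tonight",
    "Will it rain this weekend?",
    "Change my flight to next Monday",
    "Find me a cheap hotel downtown",
    "Is it sunny in Funchal right now?",
]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Word embeddings from the transformer, mean-pooled into sentence embeddings.
with torch.no_grad():
    enc = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
    token_emb = model(**enc).last_hidden_state           # (batch, tokens, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # ignore padding tokens
    sent_emb = (token_emb * mask).sum(1) / mask.sum(1)   # mean pooling

# Dimensionality reduction with UMAP before clustering.
reduced = UMAP(n_components=2, n_neighbors=3, random_state=42).fit_transform(sent_emb.numpy())

# Cluster the reduced embeddings; each cluster is treated as a candidate intent.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(reduced)
print("cluster assignments:", labels)
print("silhouette:", silhouette_score(reduced, labels))
```

In the paper's setup, several such clustering solutions built from different algorithms and embedding sources are further combined through ensemble clustering, which the abstract reports as the largest improvement over any single configuration.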
Keywords
BERT; chatbots; embedding clustering; intent detection; natural language processing; natural language understanding; RoBERTa; word and sentence embedding
Citation
Moura, A.; Lima, P.; Mendonça, F.; Mostafa, S.S.; Dias, F. M. On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms. Appl. Sci. 2023, 13, 5178. https://doi.org/10.3390/app13085178
Publisher
MDPI