Text analysis with deep learning and data augmentation

Karimi, Akbar

Please use this identifier to cite or link to this item: https://hdl.handle.net/1889/4787

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Prati, Andrea	-
dc.contributor.author	Karimi, Akbar	-
dc.date.accessioned	2022-06-15T08:26:05Z	-
dc.date.available	2022-06-15T08:26:05Z	-
dc.date.issued	2022	-
dc.identifier.uri	https://hdl.handle.net/1889/4787	-
dc.description.abstract	With the vast amount of textual data available on the Web, it is becoming increasingly difficult to analyze it manually. Therefore, there is a growing need to process it automatically for applications such as opinion mining, sentiment classification, and question answering, to name but a few. While traditional text analysis techniques such as N-gram language models can perform reasonably well, they still rely on manual feature engineering. Deep neural networks do away with manually designed features and allow us to create systems capable of end-to-end data processing. To do this effectively, however, they depend heavily on the amount of training data, which can be scarce for newly explored applications or domains. In these cases, data augmentation techniques can be used to expand the input data and help networks perform better. In this dissertation, we make several contributions to text analysis by addressing some of its problems, including Sentiment Analysis (SA), Toxic Language Detection (TLD), and Text Classification (TC). First, we introduce a novel deep architecture for Aspect-Based Sentiment Analysis (ABSA) that combines adversarial training, a form of data augmentation in the embedding space, with BERT, a state-of-the-art pre-trained language model. Then, we propose two additive modules that are attached on top of BERT and help improve the model's performance. Furthermore, we introduce a bag-of-words model that performs reasonably well in detecting toxic language despite its simplicity. Moreover, we put forward a novel data augmentation technique in the input space and show that it benefits neural network models applied to various text classification data sets. Finally, by collecting product images and comments from social media, we build an annotated multimodal dataset that can be used to address Aspect-Based Emotion Analysis (ABEA).	en_US
dc.language.iso	English	en_US
dc.publisher	Università degli studi di Parma. Dipartimento di Ingegneria e architettura	en_US
dc.relation.ispartofseries	Dottorato di ricerca in Tecnologie dell'informazione	en_US
dc.rights	© Akbar Karimi, 2022	en_US
dc.rights	Attribution 4.0 International	en_US
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Aspect-Based Sentiment Analysis	en_US
dc.subject	Text Classification	en_US
dc.subject	Toxic Language Detection	en_US
dc.subject	Data Augmentation	en_US
dc.subject	Aspect-Based Emotion Analysis	en_US
dc.subject	Text Analysis	en_US
dc.title	Text analysis with deep learning and data augmentation	en_US
dc.type	Doctoral thesis	en_US
dc.subject.miur	INF/01	en_US
Appears in Collections:	Tecnologie dell'informazione. Tesi di dottorato

Files in This Item:

File	Description	Size	Format
Akbar_Karimi_PhD_Thesis_V2_a.pdf		5.94 MB	Adobe PDF
phd_final_report_a.pdf Restricted Access		50.75 kB	Adobe PDF

This item is licensed under a Creative Commons License
