A comparison of different classification methods

Morelli, Gianluca

Please use this identifier to cite or link to this item: https://hdl.handle.net/1889/2296

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Cerioli, Andrea	-
dc.contributor.author	Morelli, Gianluca	-
dc.date.accessioned	2013-07-17T09:47:52Z	-
dc.date.available	2013-07-17T09:47:52Z	-
dc.date.issued	2013-04-11	-
dc.identifier.uri	http://hdl.handle.net/1889/2296	-
dc.description.abstract	Cluster analysis is the generic name for all those techniques that aggregate n units into k groups, where k is usually much smaller than n. Classification is useful in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics and market research. More generally, cluster analysis is relevant whenever we need to identify groups of units with similar behaviour. The main objective of this work is to find an effective cluster analysis method that can be applied in different frameworks, and in particular in market research. To this end, the work compares different methods in order to identify, if it exists, the strongest classification method for a given data structure, that is, the one yielding an optimal allocation for each dataset. We compare existing methods with new ones based on robust approaches, which have shown high efficiency in many of the simulations performed so far. The computational part of the work was carried out in MATLAB. The structure of the thesis is as follows. The first chapter focuses on the problem of identifying outliers and on how outliers affect the different classification techniques. In particular, we consider: a) the k-means method, which is the reference benchmark given its widespread use in the economic sciences; b) the trimmed k-means method, a robustification of k-means developed in the late 1990s; c) the TCLUST method, one of the robust methods attracting the main research efforts in the statistical literature; d) the Forward Search, a robust method developed largely within the Department of Economics of the University of Parma and at the London School of Economics, whose potential for classification purposes is still largely unexplored. 
The second chapter tests these methods on simulated data sets generated from various types of distributions with different degrees of overlap between groups. The purpose is to understand which method, and which calibration of its parameters, gives the best classification. The resulting classifications are then assessed through performance indices of correct allocation, which make it possible to compare the different methods. The third chapter tests the methods on a real data set of marketing interest. Finally, the thesis concludes with an appendix that describes the contributions of the work in the field of computing.	it
dc.language.iso	English	it
dc.publisher	Università di Parma. Dipartimento di Economia	it
dc.relation.ispartofseries	Dottorato di ricerca in Economia	it
dc.rights	© Gianluca Morelli, 2013	it
dc.subject	Classification	it
dc.subject	Graphical dynamic clustering	it
dc.subject	Robustness	it
dc.title	A comparison of different classification methods	it
dc.type	Doctoral thesis	it
dc.subject.soggettario	Cluster sampling	it
dc.subject.soggettario	Classification - Mathematical methods	it
dc.subject.soggettario	Multivariate analysis	it
dc.subject.miur	SECS-S/01	it
Appears in Collections:	Economia. Tesi di dottorato
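The abstract describes trimmed k-means as a robustification of k-means and evaluates methods through indices of correct allocation. The sketch below illustrates both ideas on a toy data set. It is a minimal illustration under stated assumptions, not the MATLAB code of the thesis: it assumes Euclidean distance, a fixed trimming proportion alpha, concentration steps from random starts, and a simple index (share of genuine units correctly allocated under the best relabelling of clusters), which may differ from the indices actually used in the thesis. TCLUST and the Forward Search are not shown. All function names (concentrate, trimmed_kmeans, correct_allocation_rate) and the data are hypothetical.

```python
import math
import random
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def concentrate(data, centers, alpha, n_iter=100):
    """Concentration steps from one start (hypothetical helper).

    alpha is the share of units trimmed; alpha=0 reduces to ordinary
    k-means (Lloyd) iterations. Returns (centers, labels, objective);
    trimmed units get label -1.
    """
    n_keep = len(data) - math.floor(alpha * len(data))
    centers = list(centers)
    for _ in range(n_iter):
        # Nearest center of every unit, as (squared distance, unit, cluster).
        nearest = sorted(
            min((sq_dist(x, c), i, j) for j, c in enumerate(centers))
            for i, x in enumerate(data)
        )
        # Retain the n_keep units closest to their centers; trim the rest.
        labels = [-1] * len(data)
        for _, i, j in nearest[:n_keep]:
            labels[i] = j
        # Recompute each center from its retained members only.
        new_centers = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    objective = sum(sq_dist(x, centers[lab])
                    for x, lab in zip(data, labels) if lab != -1)
    return centers, labels, objective


def trimmed_kmeans(data, k, alpha, n_init=20, seed=0):
    """Best of n_init random starts by trimmed objective (smaller is better)."""
    rng = random.Random(seed)
    runs = (concentrate(data, rng.sample(data, k), alpha) for _ in range(n_init))
    return min(runs, key=lambda run: run[2])


def correct_allocation_rate(true_labels, labels, k):
    """Share of genuine units (true label != -1) allocated to the right group,
    maximised over all relabellings of the k clusters; trimmed units count
    as misallocated."""
    genuine = [(t, lab) for t, lab in zip(true_labels, labels) if t != -1]
    best = max(
        sum(1 for t, lab in genuine if lab != -1 and perm[lab] == t)
        for perm in permutations(range(k))
    )
    return best / len(genuine)


# Toy data: two compact groups plus two planted outliers (-1 in truth).
group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
outliers = [(50, -50), (-40, 60)]
data = group_a + group_b + outliers
truth = [0] * 5 + [1] * 5 + [-1, -1]

start = [(0, 0), (10, 10)]
km_centers, _, _ = concentrate(data, start, alpha=0.0)
tk_centers, tk_labels, _ = concentrate(data, start, alpha=0.2)
print(km_centers)  # each center is pulled towards one of the outliers
print(tk_centers)  # [(0.5, 0.5), (10.5, 10.5)]
print(correct_allocation_rate(truth, tk_labels, k=2))  # 1.0
```

With alpha=0 every unit enters the mean, so each outlier drags a center away from its group; with alpha=0.2 the two most distant units are discarded at each step and the centers settle on the groups. In practice trimmed_kmeans would be used with many random starts, since a single start can end in a local optimum (for example, a center sitting on an outlier).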

Files in This Item:

File	Description	Size	Format
G_Morelli_phd_thesis.pdf (embargoed until 2101-01-01)	Doctoral thesis	6.09 MB	Adobe PDF


DSpaceUnipr

DSpaceUnipr is the institutional repository of the University of Parma. Its aim is to give visibility to the University's scholarly content and learning material.