Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/13962
Title: | Comparison of Different Graph Distance Metrics for Semantic Text Based Classification |
Authors: | Das, Nibaran; Gosh, Swarnendu; Gonçalves, Teresa; Quaresma, Paulo |
Issue Date: | 2014 |
Abstract: | Semantic information is now widely used for text
classification in place of bag-of-words approaches, because
bag-of-words representations cannot adequately represent certain
kinds of documents. Semantic information can be represented
through feature vectors or graphs. Of the two, a graph is normally
the better choice than a traditional feature vector because it is a
more expressive data structure. However, very few methodologies
exist in the literature for graph-based semantic representation.
Error-tolerant graph matching techniques such as graph similarity
measures can be used for text classification. However, computing
the Maximum Common Subgraph (mcs) and the Minimum Common
Supergraph (MCS), on which such similarity measures rely, is
NP-hard. In the present paper, summarized texts are used when
extracting semantic information to make the computation faster.
The semantic information of the texts is represented through
discourse representation structures, which are then transformed
into graphs. Five different graph distance measures based on the
Maximum Common Subgraph (mcs) and the Minimum Common
Supergraph (MCS) are used with a k-NN classifier to evaluate the
text classification task. The text documents are taken from the
Reuters21578 text database and are distributed over 20 classes.
Ten documents of each class are used for both training and testing
in the present work. The results show that the techniques perform
roughly equivalently on text classification and are as good as
traditional bag-of-words approaches. |
URI: | http://hdl.handle.net/10174/13962 |
Type: | article |
Appears in Collections: | INF - Publicações - Artigos em Revistas Internacionais Com Arbitragem Científica
|
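To make the pipeline in the abstract concrete, here is a minimal sketch of a graph distance based on the maximum common subgraph, a second one that also uses the minimum common supergraph, and a k-NN vote on top of them. Everything in it is an assumption rather than the authors' method. The record does not name the five measures, so the two formulas are common choices from the graph matching literature. Graph size is taken as nodes plus edges. Nodes are assumed to carry unique labels, which reduces mcs and MCS to set intersection and union and avoids the NP-hard general case. All function names and the toy graphs are hypothetical.

```python
from collections import Counter


def make_graph(edges):
    """Build a graph with uniquely labeled nodes from (source, target) label pairs."""
    edge_set = frozenset(edges)
    node_set = frozenset(label for edge in edge_set for label in edge)
    return node_set, edge_set


def size(graph):
    """Graph size, assumed here to be node count plus edge count."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_size(g1, g2):
    """Size of the maximum common subgraph under the unique-label assumption."""
    return len(g1[0] & g2[0]) + len(g1[1] & g2[1])


def supergraph_size(g1, g2):
    """Size of the minimum common supergraph under the unique-label assumption."""
    return len(g1[0] | g2[0]) + len(g1[1] | g2[1])


def dist_mcs(g1, g2):
    """1 - |mcs| / max(|g1|, |g2|); assumed measure, not confirmed by the record."""
    denom = max(size(g1), size(g2))
    return 1.0 - mcs_size(g1, g2) / denom if denom else 0.0


def dist_mcs_over_supergraph(g1, g2):
    """1 - |mcs| / |MCS|; assumed measure, not confirmed by the record."""
    denom = supergraph_size(g1, g2)
    return 1.0 - mcs_size(g1, g2) / denom if denom else 0.0


def knn_predict(query, training, k=3, dist=dist_mcs):
    """Majority vote among the k training graphs closest to the query."""
    nearest = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with made-up graphs and class labels.
training = [
    (make_graph([("oil", "price"), ("price", "rise")]), "crude"),
    (make_graph([("bank", "rate"), ("rate", "cut")]), "interest"),
]
query = make_graph([("oil", "price"), ("price", "fall")])
print(knn_predict(query, training, k=1))
```

Passing `dist=dist_mcs_over_supergraph` to `knn_predict` swaps the measure without changing the classifier, which is one simple way to compare several distances on the same data.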