Sethi, Prerna; Alagiriswamy, Sathya

Association Rule Based Similarity Measures for the Clustering of Gene Expression Data

Prerna Sethi¹,*, Sathya Alagiriswamy²

¹ Department of Health Informatics and Information Management and Biological Sciences, Louisiana Tech University, Ruston, LA 71272, USA

² Department of Biomedical Engineering, Louisiana Tech University, Ruston, LA 71272, USA

Article Information

Identifiers and Pagination:

Year: 2010
Volume: 4
First Page: 63
Last Page: 73
Publisher Id: TOMINFOJ-4-63
DOI: 10.2174/1874431101004010063

Article History:

Received Date: 10/10/2009
Revision Received Date: 5/11/2009
Acceptance Date: 5/11/2009
Electronic publication date: 28/5/2010
Collection year: 2010

© Sethi and Alagiriswamy; Licensee Bentham Open.

Open-access license: This is an open-access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the work is properly cited.

* Address correspondence to this author at the Department of Health Informatics and Information Management and Biological Sciences, Louisiana Tech University, Ruston, LA 71272, USA; Tel: 318-257-2862; Fax: 318-257-4896; E-mail: prerna@latech.edu

In life-threatening diseases such as cancer, where effective diagnosis involves annotation, early detection, distinction, and prediction, data mining and statistical approaches offer the promise of precise, accurate, and functionally robust analysis of gene expression data. Computationally extracting patterns from microarray gene expression data is a non-trivial task that requires sophisticated algorithm design and analysis for domain-specific discovery. In this paper, we propose a formal approach to feature extraction. We first apply feature selection heuristics based on three statistical impurity measures, the Gini Index, Max Minority, and the Twoing Rule, to obtain the top 100-400 genes. We then analyze the associative dependencies between these genes and assign each gene a weight based on its degree of participation in the discovered association rules. Next, we present weighted Jaccard and vector cosine similarity measures to compute the similarity between the discovered rules. Finally, we group the rules by applying hierarchical clustering. To demonstrate the usability and efficiency of our technique, we applied it to three publicly available multiclass cancer gene expression datasets and performed a biomedical literature search to support the effectiveness of our results.
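The abstract names each stage of the pipeline but gives no formulas, so the following Python sketch illustrates the general techniques under stated assumptions; it is not the authors' implementation. It assumes the textbook forms of the Gini Index, Max Minority, and the Twoing Rule (Breiman et al.) applied to single-threshold splits of one gene's expression values; it ranks genes by their best Gini split only; it reduces each association rule to the set of genes it mentions; it reads "degree of participation" as the fraction of rules containing a gene (a hypothetical weighting); and it uses naive average-linkage clustering on 1 - similarity. All function names and the gene names G1 to G6 are hypothetical.

```python
# Sketch only: formulas and weighting are assumptions, not the paper's code.
from collections import Counter
from math import sqrt


def gini(counts):
    """Gini impurity 1 - sum_i p_i^2 of a {class: count} mapping."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def split_scores(left, right):
    """Score one split of class labels into non-empty (left, right) lists.

    Returns (weighted Gini, Max Minority, Twoing). Lower is better for the
    first two; higher is better for Twoing.
    """
    cl, cr = Counter(left), Counter(right)
    nl, nr = len(left), len(right)
    n = nl + nr
    weighted_gini = (nl * gini(cl) + nr * gini(cr)) / n
    # Max Minority: the larger of the two sides' non-majority-class counts.
    max_minority = max(nl - max(cl.values()), nr - max(cr.values()))
    # Twoing rule (Breiman et al.): (pL * pR / 4) * (sum_j |p(j|L) - p(j|R)|)^2
    diff = sum(abs(cl[c] / nl - cr[c] / nr) for c in set(cl) | set(cr))
    twoing = (nl / n) * (nr / n) / 4 * diff ** 2
    return weighted_gini, max_minority, twoing


def best_gini(values, labels):
    """Lowest weighted Gini over all thresholds on one gene's expression values."""
    pairs = sorted(zip(values, labels))
    best = 1.0  # worst case, kept if no threshold separates the values
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        best = min(best, split_scores(left, right)[0])
    return best


def top_genes(expr, labels, k):
    """Rank genes ({gene: [value per sample]}) by best Gini; keep the k best."""
    return sorted(expr, key=lambda g: best_gini(expr[g], labels))[:k]


def participation_weights(rules):
    """Assumed weighting: fraction of rules whose gene set contains each gene."""
    counts = Counter(g for rule in rules for g in set(rule))
    return {g: c / len(rules) for g, c in counts.items()}


def weighted_jaccard(a, b, w):
    """Total weight of shared genes divided by total weight of all genes."""
    a, b = set(a), set(b)
    union = sum(w[g] for g in a | b)
    return sum(w[g] for g in a & b) / union if union else 0.0


def weighted_cosine(a, b, w):
    """Cosine of binary gene-membership vectors scaled component-wise by w."""
    a, b = set(a), set(b)
    dot = sum(w[g] ** 2 for g in a & b)
    norm = sqrt(sum(w[g] ** 2 for g in a)) * sqrt(sum(w[g] ** 2 for g in b))
    return dot / norm if norm else 0.0


def average_linkage(rules, w, k, sim=weighted_jaccard):
    """Naive agglomerative clustering of rule indices on 1 - sim (k >= 1)."""
    def dist(i, j):
        return 1.0 - sim(rules[i], rules[j], w)

    clusters = [[i] for i in range(len(rules))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist(i, j) for i in clusters[x] for j in clusters[y]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    return clusters


if __name__ == "__main__":
    # Hypothetical rules, each reduced to the set of genes it mentions.
    rules = [{"G1", "G2"}, {"G1", "G2", "G3"}, {"G4", "G5"}, {"G4", "G5", "G6"}]
    w = participation_weights(rules)
    print(weighted_jaccard(rules[0], rules[1], w))  # 0.8
    print(average_linkage(rules, w, k=2))           # [[0, 1], [2, 3]]
```

Average linkage is only one choice here; the abstract says only that hierarchical clustering is applied, and passing `sim=weighted_cosine` switches the sketch to the cosine variant.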

Keywords: Microarray gene expression, association rules, similarity measure, clustering.

RESEARCH ARTICLE

The Open Medical Informatics Journal