

Journal of Applied Sciences and Environmental Management
World Bank assisted National Agricultural Research Project (NARP) - University of Port Harcourt
ISSN: 1119-8362
Vol. 17, No. 4, 2013, pp. 537-548
Bioline Code: ja13060
Full paper language: English
Document type: Research Article
Document available free of charge

On an Improved Fuzzy C-Means Clustering Algorithm
UKPAI OGBAN, FELIX; ASAGBA, PRINCE OGHENEKARO & OWOLABI, OLUMIDE

Abstract

A cluster is a collection of objects that are similar to one another and dissimilar to the objects of other clusters. Clustering algorithms may be classified as exclusive, overlapping, hierarchical, or probabilistic, and several have been formulated and found useful in different areas of application. K-means, Fuzzy C-means, hierarchical clustering, and mixture of Gaussians are the most prominent of them. Our interest in this work is in web search engines. In this paper, we examined the fuzzy c-means clustering algorithm with a view to improving its applicability. On the Web, classification of page content is essential to focused crawling. Focused crawling supports the development of web directories, topic-specific web link analysis, and analysis of the topical structure of the Web. Web page classification can also help improve the quality of web search. Page classification is the process of assigning a page to one or more predefined category labels. A web page may well exhibit the qualities of two or more clusters. Exclusive clustering would therefore not be very useful in our case, hence the need for overlapping clustering using Fuzzy C-means. It is worth noting that Fuzzy C-means, being an optimization problem, converges to a local minimum or a saddle point. In some cases the iteration becomes recurring. At such a point one would assume that the saddle point has been reached; if the iteration is not terminated, the loop may continue into a stack-grab that can impair the algorithm (for example, by increasing its running time). In this work, we developed a modified fuzzy C-means clustering algorithm with a sharp stopping condition, which was tested on demo data to ascertain its convergence and to compare its efficiency. Corel Q-pro optimizer was used on a timing macro. Our results are interesting and challenging: they clearly show the presence of overlapping documents along the spectrum of two different clusters.
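
To make the role of a stopping condition concrete, the following is a minimal sketch of the textbook fuzzy c-means iteration in Python. It is not the authors' modified algorithm: the abstract does not state their stopping rule, so the criterion used here (stop when the largest change in any membership value falls below a tolerance, with a hard cap on iterations) is an assumption, as are the function name fuzzy_c_means and the parameters m, eps, max_iter, and seed. The sketch also assumes the data contains more distinct points than clusters.

```python
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships, iterations_used). The stopping rule
    (max membership change < eps, or max_iter reached) is an assumed,
    standard choice, not the rule from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    exponent = 2.0 / (m - 1.0)
    for iteration in range(1, max_iter + 1):
        # Update each cluster center as a membership-weighted mean.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ))

        # Update memberships from Euclidean distances to the new centers.
        new_u = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
                     for ctr in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])

        # Stopping condition: largest change in any membership value.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < eps:
            break

    return centers, u, iteration
```

A call such as fuzzy_c_means(data, c=2) returns one membership row per point. A point lying between two groups receives comparable membership in both clusters, which is the overlapping behaviour the abstract describes for web pages. The max_iter cap guarantees that the loop ends even if the membership values keep recurring without meeting the tolerance.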

Keywords
Fuzzy clusters, unsupervised learning, classification, similarity measures, Page classification

 
© Journal of Applied Sciences and Environmental Management
