On an Improved Fuzzy C-Means Clustering Algorithm
UKPAI OGBAN, FELIX; ASAGBA, PRINCE OGHENEKARO & OWOLABI, OLUMIDE
A cluster is a collection of similar objects that are dissimilar to the objects in other clusters. Clustering algorithms may be classified as exclusive, overlapping, hierarchical, or probabilistic, and many algorithms have been formulated and found useful in different application areas. K-means, fuzzy c-means, hierarchical clustering, and mixture of Gaussians are the most prominent of them. This work focuses on web search engines. In this paper, we examine the fuzzy c-means clustering algorithm with a view to improving its use in this application area. On the Web, classification of page content is essential to focused crawling. Focused crawling supports the development of web directories, topic-specific web link analysis, and analysis of the topical structure of the Web. Web page classification can also help improve the quality of web search. Page classification is the process of assigning a page to one or more predefined category labels. A web page may, however, exhibit the characteristics of two or more clusters. Exclusive clustering is therefore of limited use in our case, which motivates overlapping clustering with fuzzy c-means. Because fuzzy c-means is an optimization procedure, it converges to a local minimum or a saddle point. In some cases the iterations begin to recur. At that point one may assume that the saddle point has been reached; if the iteration is not terminated, the loop may continue as a "stack-grab" that degrades the algorithm (increased running time, etc.). In this work, we developed a modified fuzzy c-means clustering algorithm with a sharp stopping condition and tested it on demonstration data to confirm its convergence and to compare its efficiency. The Corel Q-pro optimizer was used with a timing macro.
Our results are notable in that they clearly show the presence of overlapping documents along the spectrum between two different clusters.
Fuzzy clusters, unsupervised learning, classification, similarity measures, page classification
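The abstract does not state the exact form of the sharp stopping condition, so what follows is only a minimal Python sketch under our own assumptions, not the authors' implementation (their experiments used a Corel Q-pro optimizer with a timing macro). It runs the standard fuzzy c-means updates (fuzzifier `m`, Euclidean distance) and adds a hypothetical "recurring objective" test: iteration halts when the objective value, rounded to `decimals` places, repeats, which is one way to detect the recurring iterations described above. The function name `fuzzy_c_means` and the parameters `eps`, `max_iter`, `decimals`, and `seed` are illustrative choices.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-6, max_iter=300, decimals=10, seed=0):
    """Standard fuzzy c-means plus an assumed 'recurring objective' stop test.

    points: list of equal-length numeric tuples; c: number of clusters;
    m: fuzzifier (> 1). Returns (centers, memberships, iterations, reason).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    exp = 2.0 / (m - 1.0)
    seen = set()  # rounded objective values visited so far
    centers = []
    for it in range(1, max_iter + 1):
        # Centre update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        new_u, obj = [], 0.0
        for i in range(n):
            d = [math.dist(points[i], v) for v in centers]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                # Point coincides with centre(s): split membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** exp for k in range(c))
                       for j in range(c)]
            new_u.append(row)
            # Objective J_m = sum_i sum_j u_ij^m * d_ij^2
            obj += sum(row[j] ** m * d[j] ** 2 for j in range(c))
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        # Conventional test: memberships have effectively stopped changing.
        if delta < eps:
            return centers, u, it, "membership change below eps"
        # Assumed "sharp" test: the rounded objective value has recurred.
        key = round(obj, decimals)
        if key in seen:
            return centers, u, it, "objective value recurred"
        seen.add(key)
    return centers, u, max_iter, "max_iter reached"


# Illustrative data: two groups plus one point midway between them.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2), (3.0, 3.0)]
centers, u, iters, reason = fuzzy_c_means(data, c=2)
print(iters, reason)
print("memberships of (3, 3):", [round(x, 3) for x in u[6]])
```

In this toy data the point (3, 3) receives membership close to 0.5 in both clusters, which is the kind of overlapping assignment the abstract argues exclusive clustering cannot express. The iteration cap `max_iter` is kept as a last-resort guard in case neither stopping test fires.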