A Framework for Mining Emerging Trends in Web Clickstreams with Particle Swarm Optimization.
The growth of the World Wide Web (WWW) in size, together with the exponential growth of its user base, has made the web a powerful and dynamic medium for information dissemination, storage, and retrieval. Improvements in data storage technology have also made it possible to capture large volumes of user interactions (clickstreams) with websites. The availability of these clickstreams challenges researchers to explore weblogs for hidden knowledge. Over the last decade, web usage mining has played a crucial role in identifying trends from user clickstreams, supporting tasks such as web personalization, user profiling, and user behavior analysis. These trends benefit information retrieval, website administration and improvement, customer relationship management, e-marketing, and recommender systems. Many techniques are available in the literature; however, the accuracy, correctness, and validity of the generated trends depend on the proper choice of web mining steps, in particular web sessionization, which is the foundation for the later web usage mining stages. Reliable and optimized results therefore depend on sound weblog sessionization. Extracting proper, accurate, and noise-free sessions is a demanding task in the presence of huge web clickstreams. Existing sessionization approaches may fail to identify focused and visualized groups from clickstream records with high coverage and precision, even though well-known web session similarity measures such as Euclidean, Cosine, and Jaccard are prevalent in the literature at the early learning stages of the mining process. Web sessionization must therefore account for the validity of the generated trends, which depends entirely on the correctness and credibility of the web sessions.
To overcome the limitations of existing web sessionization techniques, we propose a Framework for Mining Trends (F MET) that gauges user activities on a website through evolutionary hierarchical sessionization. Hierarchical sessionization enhances the visualization of user click data to improve business logic and mines focused groups for scalable tracking of user activities. The foundation of the proposed framework is a swarm-based optimized clustering technique combined with a proposed web session similarity measure, ST Index, to address the hierarchical sessionization problem. ST Index overcomes the limitations of the Euclidean, Cosine, and Jaccard measures, which may fail to find proper and accurate trends: Euclidean measures are numerical in nature, whereas weblog data is of mixed nature, and existing measures are best suited to independent and isolated clustering groups. ST Index computes the similarity among user sessions from the common features (pages) they share, while assigning a weight to the features they do not share, together with a time ratio based on the minimum time shared by the sessions. We validated and verified the proposed framework on three different datasets. Compared with common web session similarity measures, ST Index produced accurate and valid relationships among sessions, and the framework produced correct, accurate, and valid trends. The performance of the framework was evaluated using well-known data analysis metrics: visitor coherence (VC), accuracy, coverage, and F1 measure. The results show significant improvements over existing hierarchical sessionization techniques.
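The abstract does not give the ST Index formula, so the following is only a minimal sketch of the kind of measure it describes: shared pages count toward similarity, unshared pages are penalized by a weight, and the result is scaled by a ratio built from the minimum time the sessions share. The session representation (page mapped to seconds spent), the function name `session_similarity`, the `uncommon_weight` parameter, and the way the terms are combined are all assumptions made for illustration, not the authors' definition.

```python
from typing import Dict

# Assumed representation: page URL -> seconds spent on that page.
Session = Dict[str, float]


def session_similarity(a: Session, b: Session, uncommon_weight: float = 0.5) -> float:
    """Illustrative page-and-time similarity in [0, 1]; not the paper's ST Index."""
    common = a.keys() & b.keys()      # pages visited in both sessions
    if not common:
        return 0.0
    uncommon = a.keys() ^ b.keys()    # pages visited in only one session

    # Page term: shared pages, discounted by a weighted count of unshared pages.
    page_score = len(common) / (len(common) + uncommon_weight * len(uncommon))

    # Time term: minimum time spent on each shared page, relative to the maximum.
    shared_time = sum(min(a[p], b[p]) for p in common)
    longest_time = sum(max(a[p], b[p]) for p in common)
    time_ratio = shared_time / longest_time if longest_time > 0 else 0.0

    return page_score * time_ratio


s1 = {"/home": 30.0, "/cart": 60.0}
s2 = {"/home": 15.0, "/cart": 60.0, "/help": 10.0}
similarity = session_similarity(s1, s2)
```

Unlike a plain Jaccard score over page sets, a measure of this shape also reflects how long the two visitors spent on the pages they have in common, which is the property the abstract attributes to ST Index.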