IMPROVING XML DOCUMENT CLASSIFICATION THROUGH COLLABORATIVE CLUSTERING

Camilla Marino, Dept. of Electronics, Computer and Systems Sciences (DEIS), University of Calabria, Arcavacata di Rende (CS), Italy

Abstract

This study presents a novel approach to improving XML document classification through collaborative clustering. XML (eXtensible Markup Language) documents are widely used for data representation and interchange, but their hierarchical, semi-structured nature makes effective classification and retrieval difficult, and traditional classification methods often fall short in handling the complexity and variability of XML data. This research introduces a collaborative clustering framework that leverages the relationships among documents to enhance classification accuracy. By employing clustering algorithms that incorporate user feedback and document similarity measures, the proposed method groups similar XML documents together, which makes relevant data easier to identify and classify. The approach is evaluated through a series of experiments on benchmark XML datasets, which show significant improvements in classification performance compared to traditional methods. The findings indicate that collaborative clustering can enhance the organization and retrieval of XML documents, making it a valuable tool for information management in various applications.
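The abstract does not specify the similarity measure or the clustering algorithm. As a purely illustrative sketch of what "grouping similar XML documents by a document similarity measure" can look like, the Python below compares documents by the Jaccard overlap of their root-to-node tag paths and groups them greedily. The function names, the threshold value, and the choice of Jaccard similarity are all assumptions made for this example; the sketch covers neither the collaborative nor the user-feedback component of the proposed framework.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_structure(docs, threshold=0.5):
    """Greedy single-pass grouping (hypothetical helper, not the paper's method).

    Each document joins the first group whose representative (the group's
    first document) is at least `threshold` similar; otherwise it starts
    a new group. Returns lists of document indices.
    """
    groups = []  # list of (representative_paths, member_indices)
    for i, text in enumerate(docs):
        paths = tag_paths(text)
        for rep, members in groups:
            if jaccard(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((paths, [i]))
    return [members for _, members in groups]


docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2001</year></book>",
    "<article><heading>C</heading><body><p>text</p></body></article>",
]
print(group_by_structure(docs))  # [[0, 1], [2]]
```

The two `book` documents share three of four distinct tag paths (similarity 0.75) and end up together, while the `article` document shares none and forms its own group.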

Keywords

XML Document Classification, Collaborative Clustering, Data Organization

How to Cite

Camilla Marino. (2024). IMPROVING XML DOCUMENT CLASSIFICATION THROUGH COLLABORATIVE CLUSTERING. International Journal of Computer Science & Information System, 9(10), 1–5. Retrieved from https://scientiamreearch.org/index.php/ijcsis/article/view/130