
 

Open Access

Review Article

JCR. 2020; 7(1): 135-140


AN EXTRACTIVE BASED MULTI-DOCUMENT SUMMARIZATION USING WEIGHTED TF-IDF AND CENTROID BASED K-MEANS CLUSTERING (TF-IDF: CBC) FOR LARGE TEXT DATA

JEBAMALAI ROBINSON, V. SARAVANAN

Abstract
In text mining and classification problems, many factors must be considered in deciding the basis on which classification is performed. These factors are termed features. The difficulty of visualizing the training data grows directly with the number of features. Most of the time, the features are highly correlated and redundant. Dimensionality reduction reduces the number of features for the task by obtaining a set of principal variables. In previous work, an automated feature extraction technique using weighted TF-IDF was proposed. Although that method performed well, some of the generated features were correlated with each other, which led to high dimensionality and, in turn, greater time complexity and memory usage. This paper proposes an automatic text summarization method that uses the weighted TF-IDF model and K-means clustering to reduce the dimensionality of the extracted features. Various similarity measures are used to identify the similarity between the sentences of a document, and the sentences are then grouped into clusters on the basis of the term frequency and inverse document frequency (TF-IDF) values of their words. The experiments were carried out on student text data from the US educational data hub, and the results were compared with other dimensionality reduction methods in terms of co-selection, content-based, weight-based and term significance parameters. The proposed method was found to be efficient in terms of memory usage and time complexity.

Key words: Text Mining, Classification, Dimension Reduction, Text Summarization, Weighted TF-IDF, K-Means Clustering.
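
The pipeline summarized in the abstract (TF-IDF sentence vectors, a similarity measure, K-means clustering, and selection of sentences near each centroid) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain smoothed TF-IDF rather than the paper's weighted variant (whose weighting scheme is not given in the abstract), cosine similarity as the similarity measure, and a deterministic initialization that uses the first k sentences as starting centroids. All function names (`tokenize`, `tfidf_vectors`, `cosine`, `kmeans`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: term -> weight) per sentence.

    Assumption: each sentence is treated as a 'document' for the IDF count,
    and plain smoothed TF-IDF stands in for the paper's weighted TF-IDF.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=20):
    """K-means with cosine similarity; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each sentence to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def summarize(sentences, k=2):
    """Pick, per cluster, the sentence closest to the centroid (document order)."""
    vectors = tfidf_vectors(sentences)
    k = min(k, len(sentences))
    labels, centroids = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.add(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=3)` on a list of sentences pooled from several documents would return at most three representative sentences, one per cluster. Keeping one sentence per cluster is what limits redundancy in this sketch: correlated sentences tend to fall into the same cluster, and only the one nearest the centroid is retained.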












The articles in Bibliomed are open access articles licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.