bguvenc/LexRank

LexRank

LexRank is an extractive, generic text summarization system proposed by Erkan and Radev. The algorithm is a stochastic graph-based method that finds the most important sentences in a text in order to build meaningful summaries, and it is used for multi-document summarization in Natural Language Processing. The main idea is that sentences which are similar to many other sentences in a cluster are more central to the topic. To find the most important sentences, the method relies on eigenvector centrality in a graph representation of the sentences. There are different ways to define the similarity between two sentences; this algorithm uses one of the most popular measures, cosine similarity.
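As an illustration of the similarity measure, here is a minimal sketch of cosine similarity between two tokenized sentences. It uses raw term counts and whitespace tokenization, which are simplifying assumptions; the function name `cosine_similarity` and the sample sentences are hypothetical and not taken from this repository.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the terms the two sentences share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty sentence is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, tokenized on whitespace.
s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the rug".split()
s3 = "stock prices fell sharply".split()

sim_related = cosine_similarity(s1, s2)
sim_unrelated = cosine_similarity(s1, s3)
```

Sentences that share many terms score close to 1, while sentences with no terms in common score 0.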

Documents are modeled as TF-IDF weighted vectors in a vector space, and the similarity between them is captured in a cosine similarity matrix. In the graph built from this matrix, every node represents one sentence, and cosine similarity determines the edges between nodes. The PageRank algorithm is then used to compute the centrality of each sentence over the whole text. Sentences with a high rank are more central to the topic.
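The steps above can be sketched in a few lines of plain Python. This is not the repository's implementation; it is a minimal sketch under stated assumptions: whitespace tokenization, a smoothed IDF formula, similarity-weighted edges without a threshold, and a damping factor of 0.85. All function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per tokenized sentence."""
    n = len(sentences)
    doc_freq = Counter(term for sent in sentences for term in set(sent))
    # Smoothed IDF (an assumption) so no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(sent).items()} for sent in sentences]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexrank_scores(sentences, damping=0.85, tol=1e-8, max_iter=200):
    """Score tokenized sentences with PageRank over a cosine similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    # Weighted adjacency matrix: edge weight = cosine similarity, no self-loops.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize into a stochastic matrix; isolated nodes jump uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    # Power iteration for the PageRank stationary distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(scores[j] * trans[j][i] for j in range(n))
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical input: three related sentences and one off-topic sentence.
docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "a cat sat on a rug",
    "stock prices fell sharply today",
]
scores = lexrank_scores([d.split() for d in docs])
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` lists sentence indices from most to least central, so a summary could take the first few entries. The off-topic sentence shares no terms with the others and therefore ends up with the lowest score.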

About

Text summarization using PageRank
