Retext Lexrank

A retext plugin for unsupervised, extractive text summarization using the LexRank algorithm: it scores each sentence by how central it is to the text as a whole.
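The general LexRank idea can be sketched in a few dozen lines. This is NOT retext-lexrank's source. It is a standalone illustration that assumes a naive word tokenizer, an idf-modified cosine similarity, an unweighted sentence graph with a similarity threshold of 0.1, and PageRank-style power iteration with a damping factor of 0.85. The names `lexRank`, `idfCosine`, `tokenize`, and `termFrequencies`, and all default parameter values, are hypothetical choices for this sketch.

```javascript
// Generic LexRank sketch (hypothetical, not retext-lexrank's implementation):
// sentences are graph nodes, edges join sentences whose idf-modified cosine
// similarity meets a threshold, and power iteration scores every node.

// Naive tokenizer: lowercase runs of letters and apostrophes.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) || []
}

// Term-frequency map for one tokenized sentence.
function termFrequencies(tokens) {
  const tf = new Map()
  for (const token of tokens) tf.set(token, (tf.get(token) || 0) + 1)
  return tf
}

// idf-modified cosine similarity between two term-frequency maps.
function idfCosine(a, b, idf) {
  let dot = 0
  for (const [word, count] of a) {
    if (b.has(word)) dot += count * b.get(word) * idf.get(word) ** 2
  }
  const norm = tf =>
    Math.sqrt([...tf].reduce((sum, [word, count]) => sum + (count * idf.get(word)) ** 2, 0))
  const denominator = norm(a) * norm(b)
  return denominator === 0 ? 0 : dot / denominator
}

function lexRank(sentences, { threshold = 0.1, damping = 0.85, iterations = 100, tolerance = 1e-8 } = {}) {
  const n = sentences.length
  const tfs = sentences.map(s => termFrequencies(tokenize(s)))

  // idf(w) = log(N / number of sentences containing w)
  const df = new Map()
  for (const tf of tfs) for (const word of tf.keys()) df.set(word, (df.get(word) || 0) + 1)
  const idf = new Map([...df].map(([word, count]) => [word, Math.log(n / count)]))

  // Unweighted adjacency: link two sentences if their similarity clears the threshold
  const adjacent = tfs.map((a, i) => tfs.map((b, j) => i !== j && idfCosine(a, b, idf) >= threshold))
  const degree = adjacent.map(row => row.filter(Boolean).length)

  let scores = new Array(n).fill(1 / n)
  for (let step = 0; step < iterations; step++) {
    // Sentences with no links spread their score evenly over all sentences
    const dangling = scores.reduce((sum, s, j) => (degree[j] === 0 ? sum + s : sum), 0)
    const next = scores.map((_, i) => {
      let incoming = dangling / n
      for (let j = 0; j < n; j++) {
        if (adjacent[j][i]) incoming += scores[j] / degree[j]
      }
      return (1 - damping) / n + damping * incoming
    })
    // Stop once the scores have effectively stopped changing
    const delta = next.reduce((sum, s, i) => sum + Math.abs(s - scores[i]), 0)
    scores = next
    if (delta < tolerance) break
  }
  return scores
}
```

For example, `lexRank(['Cats chase mice.', 'Cats chase birds.', 'Cats chase mice and birds.', 'Stocks fell sharply today.'])` links the three cat sentences to one another and leaves the unrelated fourth sentence isolated, so it receives the lowest score. The scores always sum to 1 because unlinked sentences redistribute their share rather than losing it.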

Install

npm i --save retext-lexrank

Use

import { unified } from 'unified'
import latin from 'retext-latin'
import lexrank from 'retext-lexrank'

const processor = unified()
  .use(latin)
  .use(lexrank)

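const file = '...' // vfile or text string
const tree = processor.parse(file)

// run() returns a Promise when no callback is given; await it so every
// SentenceNode has its score (data.lexrank) before the tree is inspected
await processor.run(tree, file)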

Adding the part-of-speech and keywords plugins to the pipeline yields more polarized results.

import { unified } from 'unified'
import latin from 'retext-latin'
import pos from 'retext-pos'
import keywords from 'retext-keywords'
import lexrank from 'retext-lexrank'

const processor = unified()
  .use(latin)
  .use(pos)
  .use(keywords)
  .use(lexrank)

// then parse and run the text as in the previous example

Example

Note: retext-lexrank works best on medium-to-long texts, such as web articles, blog posts, and essays. The following example is deliberately short.

Using the classic write-music sample from the unifiedjs use-cases:

Write Music (by Gary Provost)

This sentence has five words. Here are five more words.
Five word sentences are fine. But several together
become monotonous. Listen to what is happening. The
writing is getting boring. The sound of it drones. It's
like a stuck record. The ear demands some variety.

Now listen. I vary the sentence length, and I create
music. Music. The writing sings. It has a pleasant
rhythm, a lilt, a harmony. I use short sentences. And I
use sentences of medium length. And sometimes when I am
certain the reader is rested, I will engage him with a
sentence of considerable length, a sentence that burns
with energy and builds with all the impetus of a
crescendo, the roll of the drums, the crash of the
cymbals—sounds that say listen to this, it is important.

So write with a combination of short, medium, and long
sentences. Create a sound that pleases the reader's ear.
Don't just write words. Write music.

Once the processor has run on the text above, the top-ranked sentences can be selected from the tree:

import { selectAll } from 'unist-util-select'
import { toString } from 'nlcst-to-string'

// Collect every sentence, sort by LexRank score (highest first), print the top three
selectAll('SentenceNode', tree)
  .sort(({ data: { lexrank: a } }, { data: { lexrank: b } }) => b - a)
  .slice(0, 3)
  .forEach(sentence => {
    const score = sentence.data.lexrank.toFixed(2)
    console.log(`[${score}]: ${toString(sentence)}`)
  })

Running the above yields:

[1.00]: I vary the sentence length, and I create music.
[0.85]: And I use sentences of medium length.
[0.71]: So write with a combination of short, medium, and long sentences.
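The ranked sentences above are listed by score, not in reading order. To turn them into a readable extractive summary, one option is to keep the top few and restore their original order. The sketch below is a hypothetical helper, not part of retext-lexrank's API. It assumes the sentences have already been reduced to plain `{ text, score }` objects in document order.

```javascript
// Hypothetical helper (not part of retext-lexrank): build an extractive summary
// from scored sentences given as { text, score } objects in document order.
function summarize(scored, count = 3) {
  return scored
    .map((sentence, index) => ({ ...sentence, index })) // remember original position
    .sort((a, b) => b.score - a.score)                  // highest score first
    .slice(0, count)                                    // keep the top `count`
    .sort((a, b) => a.index - b.index)                  // restore reading order
    .map(sentence => sentence.text)
    .join(' ')
}
```

With the tree from the example, the input could be produced with `selectAll('SentenceNode', tree).map(node => ({ text: toString(node), score: node.data.lexrank }))`.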

Tests

Run npm test to run tests.

Run npm run coverage to produce a test coverage report.

License

MIT © Goran Spasojevic

Issues

Fix `NaN` results in `normalize` fn

retext-lexrank/index.js, lines 99 to 106 at commit cb2f777:

function normalize(arr) {
  const ratio = Math.max(...arr) / 100
  return arr.map((num) => {
    return num < ratio ? num : num / ratio / 100
    // if num < ratio, it's largely inconsequential to the top scores
    // however, applying the formula can produce numbers that fall out of `sort` range
  })
}
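One way the `NaN` arises: `num / ratio / 100` simplifies to `num / max`, so when every score is 0 the maximum is 0, `0 < 0` is false, and the function computes `0 / 0`. The following is a hypothetical guarded variant, not the project's actual fix. It assumes an all-zero, empty, or non-finite input should map to zeros, and it otherwise keeps the original behaviour of leaving scores below 1% of the maximum untouched.

```javascript
// Hypothetical NaN-safe variant of normalize (not the project's actual fix).
function normalize(arr) {
  const max = Math.max(...arr)
  // Empty input gives -Infinity, NaN input gives NaN, and an all-zero input gives 0;
  // none of these is a usable divisor, so return zeros instead of NaN
  if (!Number.isFinite(max) || max <= 0) return arr.map(() => 0)
  const ratio = max / 100
  // Scores below 1% of the maximum are left as-is, as in the original;
  // num / ratio / 100 from the original is written here as num / max
  return arr.map(num => (num < ratio ? num : num / max))
}
```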
