
excerpt-html's Introduction


excerpt-html

Parses a given HTML string and returns a good excerpt.

Install

$ npm i excerpt-html --save

API usage

var htmlCode = '<p>Hello world</p>';
var excerptHtml = require('excerpt-html');
var excerpt = excerptHtml(htmlCode);

It uses either the first paragraph it finds or everything up to a <!-- more --> marker.
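To make the marker rule concrete, here is a minimal sketch of cutting a string at the first <!-- more --> marker. The helper name beforeMoreMarker is hypothetical and is not part of excerpt-html; the pattern is the one shown for moreRegExp under Options, and the library's own handling may differ.

```javascript
// Hypothetical helper, not part of excerpt-html: returns the HTML that
// precedes the first "more" marker, or null when no marker is present.
function beforeMoreMarker(html, moreRegExp) {
  var match = moreRegExp.exec(html);
  return match ? html.slice(0, match.index) : null;
}

// Pattern copied from the moreRegExp option shown in this README.
var marker = /\s*<!--\s*more\s*-->/i;
var post = '<p>Intro</p>\n<!-- more -->\n<p>Rest</p>';

console.log(beforeMoreMarker(post, marker)); // <p>Intro</p>
console.log(beforeMoreMarker('<p>No marker</p>', marker)); // null
```

Because the pattern starts with \s*, the whitespace directly before the marker is treated as part of the marker and does not end up in the excerpt.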

Options

You can specify a few options that modify the way the excerpt is parsed:

excerptHtml(htmlCode, {
    moreRegExp:  /\s*<!--\s*more\s*-->/i,  // Pattern used to find the "more" marker
    stripTags:   true, // Set to false to get HTML code
    pruneLength: 140, // Number of characters that the excerpt should contain
    pruneString: '…', // String that will be appended to the pruned excerpt
    pruneSeparator: ' ', // Separator used to split words
});

Note: pruneLength and pruneString only take effect when stripTags is set to true (the default).
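As an illustration of how pruneLength, pruneString and pruneSeparator could interact, the sketch below prunes plain text at a word boundary. The function pruneText is a hypothetical stand-in written for this example; excerpt-html's actual pruning may differ in its details.

```javascript
// Hypothetical helper illustrating word-boundary pruning; this is an
// assumption about the general technique, not excerpt-html's own code.
function pruneText(text, length, pruneString, separator) {
  // Text that already fits is returned unchanged.
  if (text.length <= length) return text;

  var cut = text.slice(0, length);
  // If the cut lands in the middle of a word, back up to the last separator.
  if (text.charAt(length) !== separator) {
    var lastSep = cut.lastIndexOf(separator);
    if (lastSep > 0) cut = cut.slice(0, lastSep);
  }
  return cut + pruneString;
}

console.log(pruneText('The quick brown fox jumps', 12, '…', ' ')); // The quick…
console.log(pruneText('short', 140, '…', ' ')); // short
```

In this sketch the pruneString is added on top of the pruned text, so the result can be longer than the requested length by the length of pruneString.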

History

To make this project we detached the code of metalsmith-better-excerpts from metalsmith.

excerpt-html's People

Contributors

martinheidegger, pixelastic


excerpt-html's Issues

Consider trimming spaces in parsed html

If the HTML contains inline images (or other elements with non-text content) separated by spaces, the parsed text keeps those spaces.

<p><img></img> <img></img> <img></img> text</p>

would produce <space><space><space>text after parsing.

These spaces count as characters in the pruneLength.

A simple regex to trim this could be

parsed = parsed.replace(/^\s+|\s+$|\s+(?=\s)/g, "")
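A quick check of what that pattern does may help: it removes leading whitespace, trailing whitespace, and every whitespace run that is followed by more whitespace, so inner runs collapse to a single character. The wrapper name trimSpaces below is only for illustration.

```javascript
// The regex from the issue, wrapped in a hypothetical helper for testing.
function trimSpaces(parsed) {
  return parsed.replace(/^\s+|\s+$|\s+(?=\s)/g, '');
}

console.log(JSON.stringify(trimSpaces('   text'))); // "text"
console.log(JSON.stringify(trimSpaces('a   b  '))); // "a b"
```

Note that when a run mixes different whitespace characters, only the last one survives, so 'a \n b' becomes 'a b'.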

Option to ignore H1-H6 headings

Say I have some HTML like this:

<h1>Title</h1>
<p>Hello, world</p>
<p>Hello again, world</p>
<hr>
<p>This is where the content really gets going, world</p>

I want to use <hr> as the excerpt cutoff, like so:

excerpt(myString, {moreRegExp: /<hr>/i})

Unfortunately the excerpt also includes the <h1> content. Would you be open to adding an option to omit h1-h6 tags from the excerpt?

A real-world example of this can be found on the Electron blog, where Jekyll's excerpt creation ignores headings by default:

[Screenshot, 2017-06-26]
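Until such an option exists, one possible workaround is to remove headings before the string reaches the excerpt function. The sketch below is an assumption about how that could be done with a regular expression; stripHeadings is a hypothetical helper, and a regex is not a full HTML parser, so nested or malformed headings may slip through.

```javascript
// Hypothetical pre-processing step: drop <h1>..<h6> elements (and the
// whitespace that follows them) before building the excerpt.
function stripHeadings(html) {
  // \1 ensures the closing tag matches the heading level that was opened.
  return html.replace(/<(h[1-6])\b[^>]*>[\s\S]*?<\/\1>\s*/gi, '');
}

var myString = '<h1>Title</h1>\n<p>Hello, world</p>\n<hr>\n<p>More</p>';
console.log(stripHeadings(myString));
```

The result could then be passed on as excerpt(stripHeadings(myString), {moreRegExp: /<hr>/i}).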

First paragraph is not being captured by default.

Hi @martinheidegger! 👋

The docs say:

It will either use the first found paragraph or everything up to a <!-- more -->

But I'm not seeing that behavior:

excerpt_html('first line\n\nsecond line')
// => 'first line\nsecond line'

excerpt_html('#heading\n\nfirst line\n\nsecond line')
//=> '#heading\nfirst line\nsecond line'

Am I missing something?

Failing to extract excerpt if first paragraph only contains an image

Hello,

I have markdown content (blog posts) where the first paragraph(s) usually only contains an image and the actual textual content starts after. This module fails to correctly extract the excerpt in that case, returning an empty string instead.

test('ignore empty first paragraphs', function (t) {
  // Paragraphs that contain only an image should be skipped.
  var html = '<p><img src="cat.png" /></p><p><img src="dog.png" /></p><p>test</p>'
  t.equals(excerptHtml(html, {}), 'test')
  t.end()
})

Would you be interested in a PR fixing this behavior?
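As a stopgap on the caller's side, image-only paragraphs could be dropped before the HTML is handed to the excerpt function. This is a minimal sketch under that assumption; dropImageOnlyParagraphs is a hypothetical helper and not part of excerpt-html, and the regex assumes attribute values do not contain a ">" character.

```javascript
// Hypothetical pre-processing step: remove paragraphs whose content is
// nothing but whitespace and <img> tags.
function dropImageOnlyParagraphs(html) {
  return html.replace(/<p\b[^>]*>(?:\s|<img\b[^>]*>)*<\/p>\s*/gi, '');
}

var input = '<p><img src="cat.png" /></p><p><img src="dog.png" /></p><p>test</p>';
console.log(dropImageOnlyParagraphs(input)); // <p>test</p>
```

Paragraphs that mix an image with text are left untouched, so only the leading image-only blocks disappear.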
