

@metalsmith/excerpts

A Metalsmith plugin to extract an excerpt from HTML files.


Installation

NPM:

npm install @metalsmith/excerpts

Yarn:

yarn add @metalsmith/excerpts

Usage

The excerpt is scraped from the first paragraph (<p> tag) of the rendered HTML contents of a file and added to its metadata excerpt key.

const excerpts = require('@metalsmith/excerpts')

metalsmith.use(excerpts()) // default -> file.excerpt
metalsmith.use(excerpts({ multipleFormats: true })) // -> file.excerpt.html & file.excerpt.text
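The dependency-free sketch below shows the default behaviour. It illustrates the idea under stated assumptions and is not the plugin's source. The name `excerptsSketch`, the regex used to find the first `<p>` element, the restriction to `.html` paths, and the empty-string fallback are all hypothetical. A real implementation would parse the HTML rather than pattern-match it. The sketch also leaves a non-empty front-matter `excerpt` in place, which mirrors the custom-excerpt behaviour documented for the plugin.

```javascript
// Hypothetical sketch: regex-based, NOT the actual @metalsmith/excerpts implementation
function excerptsSketch() {
  // First <p ...>...</p> element; \b keeps it from matching <pre>
  const firstParagraph = /<p\b[^>]*>[\s\S]*?<\/p>/i

  return function plugin(files, metalsmith, done) {
    for (const [path, file] of Object.entries(files)) {
      // Assumption: only rendered HTML files are considered
      if (!path.endsWith('.html')) continue
      // Keep a custom excerpt that was set in front-matter
      if (typeof file.excerpt === 'string' && file.excerpt.length > 0) continue

      const match = file.contents.toString().match(firstParagraph)
      // Assumption: fall back to an empty string when there is no <p>
      file.excerpt = match ? match[0] : ''
    }
    done()
  }
}

// Usage with a plain object standing in for Metalsmith's files map
const files = {
  'post.html': { contents: Buffer.from('<h1>Title</h1>\n<p>First <em>para</em>.</p>\n<p>Second.</p>') },
  'custom.html': { excerpt: 'This will be the excerpt', contents: Buffer.from('<p>Body</p>') }
}
excerptsSketch()(files, null, () => {
  console.log(files['post.html'].excerpt)   // <p>First <em>para</em>.</p>
  console.log(files['custom.html'].excerpt) // This will be the excerpt
})
```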

Custom excerpts

You can define a custom excerpt in the front-matter of specific files:

---
excerpt: This will be the excerpt
---

This would be the excerpt if none was specified in the front-matter

Excerpts with tags stripped

Sometimes you may need access to the text content of the excerpt without HTML tags. Pass the multipleFormats: true option to store an excerpt object with both HTML and text excerpts { html: '...', text: '...' }:

metalsmith.use(excerpts({ multipleFormats: true }))
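The sketch below shows one way to derive the `{ html, text }` object from an HTML excerpt. `toMultipleFormats` is a hypothetical helper. The regex tag stripping is a simplifying assumption: it does not decode entities such as `&amp;`, and it is not a real HTML parser.

```javascript
// Hypothetical helper producing the { html, text } shape used with multipleFormats: true.
// Regex tag stripping is an illustrative assumption, not the plugin's method.
function toMultipleFormats(excerptHtml) {
  const text = excerptHtml
    .replace(/<[^>]*>/g, '') // drop every tag, keep the text between tags
    .trim()
  return { html: excerptHtml, text }
}

const excerpt = toMultipleFormats('<p>Read <a href="/more">more</a> here.</p>')
console.log(excerpt.html) // <p>Read <a href="/more">more</a> here.</p>
console.log(excerpt.text) // Read more here.
```

With this shape, a template would use `excerpt.html` wherever markup is wanted. It would use `excerpt.text` wherever markup is not allowed, for example in a `<meta name="description">` tag.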

CLI usage

Add the @metalsmith/excerpts key to your metalsmith.json plugins key:

{
  "plugins": [{ "@metalsmith/excerpts": { "multipleFormats": false } }]
}

License

MIT

Contributors

calvinfo, davidosomething, doup, ericlathrop, ianstormtaylor, lambtron, matterweave, reinpk, srcreigh, webketje, woodyrew

Issues

More precise excerpt selection

What do you think about adding a way to specify either the number of paragraphs to take or a minimum number of characters? (If the first paragraph has fewer than the minimum number of characters, the next one would be added as well, and so on.)

I can prepare a PR if you are interested.
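Here is a minimal sketch of the two proposed selection modes. It operates on the plain text of paragraphs that have already been extracted. The function `selectParagraphs` and its `count` and `minChars` options are hypothetical names for this proposal. The plugin does not offer them.

```javascript
// Hypothetical sketch of the proposal: not an existing @metalsmith/excerpts option.
function selectParagraphs(paragraphs, options = {}) {
  // Mode 1: a fixed number of paragraphs
  if (options.count !== undefined) return paragraphs.slice(0, options.count)

  // Mode 2: keep adding paragraphs until the combined text reaches minChars
  // (minChars defaults to 0, which reproduces the current first-paragraph behaviour)
  const minChars = options.minChars ?? 0
  const selected = []
  let length = 0
  for (const p of paragraphs) {
    selected.push(p)
    length += p.length
    if (length >= minChars) break
  }
  return selected
}

const paragraphs = ['Short.', 'A somewhat longer second paragraph.', 'Third.']
console.log(selectParagraphs(paragraphs, { minChars: 20 })) // first two paragraphs
console.log(selectParagraphs(paragraphs, { count: 1 }))     // [ 'Short.' ]
```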

Adding new options

Hi, I also wanted to extract n characters as in issue #11, and then saw that there are other issues related to customization. So far:

  • extract n paragraphs #11
  • extract n characters #11
  • extract custom element #10
  • add custom extractor #3
  • strip HTML tags and get only the text (in line with #8, but not the same)

Can we add new options to the plugin?
Or is it better to create a brand-new plugin, e.g. metalsmith-extract?

I can work on a PR if you find this interesting. Otherwise I'll go the separate-plugin route.

Wordpress-style <!--more--> tag?

Would it be useful to let users manually mark the end of the excerpt by writing a <!--more--> comment in the file contents? Leave a thumbs up or down to give feedback.
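One way the proposed marker could work is sketched below. Everything before the first `<!--more-->` comment in the rendered HTML becomes the excerpt. The function name `excerptBeforeMore` is an assumption for illustration. So is the choice to return `null`, which lets a caller fall back to the first paragraph. The sketch also assumes the Markdown renderer passes the HTML comment through to its output.

```javascript
// Hypothetical sketch of the <!--more--> proposal; not part of @metalsmith/excerpts.
function excerptBeforeMore(html) {
  // Tolerate optional whitespace inside the comment, e.g. <!-- more -->
  const marker = /<!--\s*more\s*-->/i
  const match = marker.exec(html)
  // null signals "no marker", so a caller could fall back to the first <p>
  return match ? html.slice(0, match.index).trim() : null
}

console.log(excerptBeforeMore('<p>Intro one.</p>\n<p>Intro two.</p>\n<!--more-->\n<p>Rest.</p>'))
console.log(excerptBeforeMore('<p>No marker.</p>')) // null
```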

support custom excerpt

Instead of taking the first paragraph, it would be great to be able to supply a custom excerpt.

Indented first paragraphs are transformed incorrectly

If the first paragraph of a Markdown file is indented, indicating that it's a code block, like this:


---
title: Test code blocks as first paragraphs

---

    This is code

This is not code

markdown-excerpts goes through each line and removes the leading whitespace, which turns the code block into a normal paragraph.

High Dependency Vulnerability: cheerio

From npm audit:

┌───────────────┬──────────────────────────────────────────────────────────────┐
│ High          │ Prototype Pollution                                          │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ lodash.merge                                                 │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ metalsmith-excerpts                                          │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ metalsmith-excerpts > cheerio > lodash.merge                 │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/1066                            │
└───────────────┴──────────────────────────────────────────────────────────────┘

Reference-style links aren't transformed into anchor tags

Given a Markdown file like the following:


---
title: Reference-style link demo

---

This includes some [links][] and other [things that should turn into anchor tags][a].

[links]: http://example.com
[a]: http://example.com

metalsmith-excerpts just grabs the text up to the first blank line (two consecutive newline characters) and passes it to marked. This is why marked doesn't turn the reference-style links into anchor tags: the reference definitions aren't passed in along with the first paragraph.

Ideally, the way to fix this would be to excerpt from the HTML after converting the entire document. That way, documents that put all of the references at the bottom of the document would still work. After that, you would need to find the first tag (be it <p>, <pre>, <blockquote>, etc.) in the HTML. Using cheerio, you can accomplish that with:

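var cheerio = require('cheerio');
// Load the fully rendered HTML as a fragment (third argument isDocument = false)
// so cheerio does not wrap it in <html>/<head>/<body>
var $ = cheerio.load(file.contents.toString(), null, false);
// First top-level element, be it <p>, <pre>, <blockquote>, etc.
var firstTag = $.root().children().first();
// $.html(selection) returns the selection's outer HTML
file.excerpt = $.html(firstTag);

Passing the selection to $.html() returns its outer HTML, so clone() and replaceWith() aren't needed.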

Roadmap 2.0

This issue regroups and provides an overview of the proposals for 2.0.
Leave a thumbs up if you like all, thumbs down if you don't like any, and a comment if it's somewhere in between.

  • #11 Provide a truncate option that takes an instruction string as input (x chars, words, or sentences, or a comment) and adds an ellipsis (...) at the end.
  • #42 Remove the multipleFormats option and make it the default. Add a third format, stripped, which only removes the outer HTML element.
  • #41 Allow manually marking the end of the excerpt with a comment tag. Variants are <!-- excerpt --> or <!-- excerpt:end -->, possibly with an <!-- excerpt:start --> as well.
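The sketch below shows what the proposed instruction string for a `truncate` helper could look like in practice. The `'<n> chars'` / `'<n> words'` syntax, the function name, and the default `...` ellipsis are all assumptions. The sentence and comment modes from the proposal are left out to keep the sketch short.

```javascript
// Hypothetical sketch of the proposed truncate option (#11); not an existing plugin option.
function truncate(text, instruction, ellipsis = '...') {
  const match = /^(\d+)\s*(chars|words)$/.exec(instruction.trim())
  if (!match) throw new Error(`Unsupported truncate instruction: ${instruction}`)
  const limit = Number(match[1])

  if (match[2] === 'chars') {
    const chars = Array.from(text) // count code points, not UTF-16 units
    if (chars.length <= limit) return text
    return chars.slice(0, limit).join('').trimEnd() + ellipsis
  }

  const words = text.trim().split(/\s+/)
  if (words.length <= limit) return text
  return words.slice(0, limit).join(' ') + ellipsis
}

const sample = 'The quick brown fox jumps over the lazy dog.'
console.log(truncate(sample, '4 words'))  // The quick brown fox...
console.log(truncate(sample, '10 chars')) // The quick...
```

The ellipsis is appended only when something was actually cut, so short excerpts pass through unchanged.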

Add option to strip surrounding paragraph element

Sometimes you want to present the excerpt differently from the rest of the content, for instance with a different class name on that element, without having to wrap it in another element.

I'd be happy to open a PR for this if you think it's a good idea.
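Here is a sketch of the proposed option, assuming the excerpt is exactly one `<p>` element. The wrapping tag is removed and only its inner HTML is kept. `stripOuterParagraph` is a hypothetical helper. The regex approach is a simplification, so the helper returns the input unchanged whenever it cannot confirm a single wrapping paragraph.

```javascript
// Hypothetical helper for the proposed "strip surrounding <p>" option.
function stripOuterParagraph(html) {
  const match = /^\s*<p\b[^>]*>([\s\S]*)<\/p>\s*$/i.exec(html)
  // Bail out if there is no wrapping <p> or the inner HTML contains another <p>/</p>,
  // i.e. the excerpt is not a single paragraph
  if (!match || /<\/?p\b/i.test(match[1])) return html
  return match[1]
}

console.log(stripOuterParagraph('<p class="lead">Some <strong>bold</strong> text.</p>'))
// Some <strong>bold</strong> text.
```

A template could then wrap the result in whatever element and class it needs.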

Simplify API by only providing multiple formats

Would normalize the excerpt file property to always be { html: '...', text: '...' }

Pros:

  • no need for a multipleFormats option (=simpler API)
  • could add support for a third format: stripped, removing only the outer <p> tags

Cons:

  • not backwards-compatible
  • can't just put {{ excerpt }} in a template

Excerpt from Markdown

It looks like the excerpt is only extracted when the file is HTML. If I write some code to extract it from Markdown too, would you accept a pull request? Or am I doing something wrong and it should already work?

Excerpt html does not get interpreted

This is probably my own mistake, but I'm posting it here anyway because I don't know what else to try.
I'm building my blog with excerpts, so I'm iterating over all my posts like this:

{{#each collections.blog}}
  <h3><a href="/{{ this.path }}">{{ this.title }}</a></h3>
  <p class="meta">{{formatDate this.date 'Do MMM YYYY'}}</p>
  {{ this.excerpt }}
{{/each}}

The result I'm getting looks something like this:

[Screenshot, 2014-08-19]

So the HTML somehow isn't being interpreted. When I view one of the blog posts on its own, the HTML is correct.
Could you point me in the right direction?

The source code of my metalsmith script can be found here:
https://bitbucket.org/FlorianSchrofner/flosch.at/src
