metalsmith / excerpts

A Metalsmith plugin to extract an excerpt from HTML files.

License: MIT License

JavaScript 100.00%

metalsmith metalsmith-plugin excerpt

excerpts's Introduction

@metalsmith/excerpts

A Metalsmith plugin to extract an excerpt from HTML files.

Installation

NPM:

npm install @metalsmith/excerpts

Yarn:

yarn add @metalsmith/excerpts

Usage

The excerpt is scraped from the first paragraph (<p> tag) of a file's rendered HTML contents and stored in the file's metadata under the excerpt key.

const excerpts = require('@metalsmith/excerpts')

metalsmith.use(excerpts()) // default -> file.excerpt
metalsmith.use(excerpts({ multipleFormats: true })) // -> file.excerpt.html & file.excerpt.text
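To make the default behaviour concrete, here is a minimal, dependency-free sketch of a plugin with the same shape. It is not the plugin's actual source. It assumes the standard Metalsmith plugin signature `(files, metalsmith, done)`, only looks at `.html` files, leaves a front-matter excerpt alone (see Custom excerpts below), and uses a regular expression instead of an HTML parser, so nested or malformed markup is out of scope. The name `excerptsSketch` is hypothetical.

```javascript
// Hypothetical, simplified stand-in for @metalsmith/excerpts (not its real source).
// A regex replaces a real HTML parser, so only simple markup is handled.
function excerptsSketch() {
  return function (files, metalsmith, done) {
    for (const path of Object.keys(files)) {
      if (!path.endsWith('.html')) continue; // only rendered HTML files
      const file = files[path];
      // A non-empty front-matter excerpt takes precedence over the scraped one
      if (typeof file.excerpt === 'string' && file.excerpt.length > 0) continue;
      // First <p> or <p ...> element; <pre> does not match because of [\s>]
      const match = file.contents.toString().match(/<p[\s>][\s\S]*?<\/p>/i);
      file.excerpt = match ? match[0] : '';
    }
    done();
  };
}

// Usage with a plain object standing in for Metalsmith's files map
const files = {
  'post.html': { contents: Buffer.from('<h1>Title</h1>\n<p>First <em>para</em>.</p>\n<p>Second.</p>') },
  'custom.html': { excerpt: 'Custom excerpt', contents: Buffer.from('<p>Ignored.</p>') },
  'style.css': { contents: Buffer.from('p { color: red }') }
};
excerptsSketch()(files, null, () => {});
// files['post.html'].excerpt is now '<p>First <em>para</em>.</p>'
```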

Custom excerpts

You can define a custom excerpt in the front-matter of specific files:

---
excerpt: This will be the excerpt
---

This would be the excerpt if none was specified in the front-matter

Excerpts with tags stripped

Sometimes you may need access to the text content of the excerpt without HTML tags. Pass the multipleFormats: true option to store an excerpt object with both HTML and text excerpts { html: '...', text: '...' }:

metalsmith.use(excerpts({ multipleFormats: true }))
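The `{ html, text }` shape can be illustrated with a naive sketch. The helper name `toExcerptFormats` is hypothetical, and stripping tags with a regular expression is a simplification (HTML entities such as `&amp;` are not decoded); the plugin itself may derive the text differently.

```javascript
// Hypothetical helper: builds a { html, text } object from an HTML excerpt.
// Regex-based, so entities are left undecoded.
function toExcerptFormats(html) {
  const text = html
    .replace(/<[^>]*>/g, '') // drop every tag, keep its inner text
    .replace(/\s+/g, ' ')    // collapse whitespace left behind by line breaks
    .trim();
  return { html, text };
}

const formats = toExcerptFormats('<p>Read the <a href="/docs">docs</a>\nfirst.</p>');
// formats.text === 'Read the docs first.'
```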

CLI usage

Add the @metalsmith/excerpts key to your metalsmith.json plugins key:

{
  "plugins": [{ "@metalsmith/excerpts": { "multipleFormats": false } }]
}

License

MIT

excerpts's Issues

More precise excerpt selection

What do you think about adding an option to specify either the number of paragraphs to take or a minimum number of characters? (If the first paragraph has fewer than the minimum number of characters, take the next one too, and so on.)

I can prepare a PR if you are interested.
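The proposed selection rule can be sketched as follows, assuming the paragraphs have already been extracted as HTML strings. The function name `selectParagraphs` and its `count`/`minChars` options are hypothetical, not existing plugin options.

```javascript
// Hypothetical sketch of the proposal: take at least `count` paragraphs,
// then keep adding paragraphs until their text reaches `minChars` characters.
function selectParagraphs(paragraphs, { count = 1, minChars = 0 } = {}) {
  const textLength = (html) => html.replace(/<[^>]*>/g, '').length;
  const selected = [];
  let length = 0;
  for (const p of paragraphs) {
    if (selected.length >= count && length >= minChars) break;
    selected.push(p);
    length += textLength(p);
  }
  return selected.join('\n');
}

const paragraphs = ['<p>Short.</p>', '<p>A somewhat longer second paragraph.</p>', '<p>Third.</p>'];
selectParagraphs(paragraphs, { minChars: 20 });
// -> '<p>Short.</p>\n<p>A somewhat longer second paragraph.</p>'
```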

Adding new options

Hi, I also wanted to extract n characters, as in issue #11, and then saw that there are other issues related to customization. So far:

extract n paragraphs #11
extract n characters #11
extract custom element #10
add custom extractor #3
strip HTML tags and get only text (in line with #8, but not the same)

Can we add new options to the plugin?
Or would it be better to create a brand-new plugin, e.g. metalsmith-extract?

I can work on a PR if you find this interesting. Otherwise I'll go the separate-plugin route.

WordPress-style "more" tag?

Would it be useful to be able to manually mark the end of the excerpt by writing a "more" comment in the file contents? Leave a thumbs up or down to give feedback.
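A manual end-of-excerpt marker could work by cutting the rendered contents at the first occurrence of a comment. The marker string `<!-- more -->` and the function name `excerptUntilMarker` are assumptions for illustration, not a settled syntax.

```javascript
// Hypothetical sketch: everything before the marker becomes the excerpt.
// Returns null when the marker is absent, so a caller can fall back to the
// first-paragraph behaviour.
function excerptUntilMarker(html, marker = '<!-- more -->') {
  const index = html.indexOf(marker);
  return index === -1 ? null : html.slice(0, index).trim();
}

excerptUntilMarker('<p>Intro.</p>\n<p>Still intro.</p>\n<!-- more -->\n<p>Rest.</p>');
// -> '<p>Intro.</p>\n<p>Still intro.</p>'
```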

support custom excerpt

Instead of taking the first paragraph, it would be great to be able to supply a custom excerpt.

Indented first paragraphs are transformed incorrectly

If the first paragraph of a Markdown file is indented, indicating that it's a code block, like this:


---
title: Test code blocks as first paragraphs

---

    This is code

This is not code

markdown-excerpts will go through each line and remove leading whitespace, making the code block a normal paragraph block.
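The failure mode comes from trimming every line, which erases the four-space indentation that marks an indented code block in Markdown. The sketch below, using a hypothetical helper `firstMarkdownBlock`, takes the first block after the front matter while preserving its indentation, so a Markdown renderer would still see a code block. It assumes front matter delimited by `---` lines at the very top of the file.

```javascript
// Hypothetical helper: returns the first blank-line-separated block of a
// Markdown file, skipping `---` front matter and keeping leading whitespace.
function firstMarkdownBlock(source) {
  const body = source
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '') // drop front matter
    .replace(/^(?:[ \t]*\r?\n)+/, '');             // drop leading blank lines only
  // Split on one or more blank lines without consuming the next line's indentation
  const blocks = body.split(/\r?\n(?:[ \t]*\r?\n)+/).filter((b) => b.trim() !== '');
  return blocks.length > 0 ? blocks[0] : '';
}

const source = '---\ntitle: Test code blocks as first paragraphs\n\n---\n\n    This is code\n\nThis is not code\n';
firstMarkdownBlock(source); // -> '    This is code' (indentation intact)
```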

rename to "excerpts"

High Dependency Vulnerability: cheerio

From npm audit:

┌───────────────┬──────────────────────────────────────────────────────────────┐
│ High          │ Prototype Pollution                                          │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ lodash.merge                                                 │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ metalsmith-excerpts                                          │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ metalsmith-excerpts > cheerio > lodash.merge                 │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/1066                            │
└───────────────┴──────────────────────────────────────────────────────────────┘

Reference-style links aren't transformed into anchor tags

Given a Markdown file like the following:


---
title: Reference-style link demo

---

This includes some [links][] and other [things that should turn into anchor tags][a].

[links]: http://example.com
[a]: http://example.com

metalsmith-excerpts just grabs the first portion of text up to the first two newline characters and passes that into marked. This is why marked doesn't turn the reference-style links into actual anchor tags -- the references for them aren't passed in with the first paragraph.

Ideally, the way to fix this would be to excerpt from the HTML after converting the entire document. That way, documents that put all of the references at the bottom of the document would still work. After that, you would need to find the first tag (be it <p>, <pre>, <blockquote>, etc.) in the HTML. Using cheerio, you can accomplish that with:

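var cheerio = require('cheerio');

// Load the fully rendered HTML, so reference-style links are already anchors
var $ = cheerio.load(file.contents.toString());
// cheerio.load() wraps the input in <html><body>, so select the body's first child
var firstTag = $('body').children().first();
// $.html(selection) renders the selection's outer HTML
file.excerpt = $.html(firstTag);

Cheerio's $.html(selection) returns the outer HTML of the selected element, so clone() and replaceWith() are not needed to get it.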

Roadmap 2.0

This issue regroups and provides an overview of the proposals for 2.0.
Leave a thumbs up if you like all, thumbs down if you don't like any, and a comment if it's somewhere in between.

#11 Provide a truncate option that takes an instruction string as input (x chars / words / sentences, or a comment) and adds an ellipsis at the end.
#42 Remove the multipleFormats option and make multiple formats the default. Add a third format, stripped, which only removes the outer HTML element.
#41 Enable manually marking where the excerpt ends with a comment tag, possibly in a few variant spellings.
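The truncate option from #11 could look roughly like this for plain-text excerpts. The instruction format (`'<n> chars'` or `'<n> words'`), the function name `truncateExcerpt`, and the default `'...'` ellipsis are assumptions; sentence- and comment-based truncation are left out.

```javascript
// Hypothetical sketch of the proposed `truncate` option for plain-text
// excerpts. Accepts instruction strings such as '20 chars' or '5 words'.
function truncateExcerpt(text, instruction, ellipsis = '...') {
  const match = /^(\d+)\s*(chars?|words?)$/.exec(instruction.trim());
  if (!match) throw new Error(`Unsupported truncate instruction: ${instruction}`);
  const limit = Number(match[1]);
  if (match[2].startsWith('word')) {
    const words = text.split(/\s+/).filter(Boolean);
    return words.length <= limit ? text : words.slice(0, limit).join(' ') + ellipsis;
  }
  // Character limit: cut, then drop trailing whitespace before the ellipsis
  return text.length <= limit ? text : text.slice(0, limit).trimEnd() + ellipsis;
}

truncateExcerpt('The quick brown fox jumps over the lazy dog', '4 words'); // -> 'The quick brown fox...'
truncateExcerpt('The quick brown fox', '10 chars'); // -> 'The quick...'
```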

Add option to strip surrounding paragraph element

Sometimes you may want to present the excerpt differently from the rest of the content, for instance with a different class name on that element, without having to wrap it in another element.

I'd be happy to open a PR for this if you think it's a good idea.
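Stripping only the surrounding paragraph could be as simple as the sketch below, which corresponds to the "stripped" format mentioned in the roadmap. The name `stripOuterParagraph` is hypothetical; it removes a single wrapping `<p>` element, keeps inline markup, and returns anything that is not exactly one paragraph unchanged.

```javascript
// Hypothetical helper: removes one wrapping <p> element but keeps inline tags,
// so a template can supply its own wrapper element and class name.
function stripOuterParagraph(html) {
  const match = /^\s*<p(?:\s[^>]*)?>([\s\S]*)<\/p>\s*$/i.exec(html);
  // Leave the input alone unless it is a single paragraph
  if (!match || /<\/p>/i.test(match[1])) return html;
  return match[1];
}

stripOuterParagraph('<p class="lead">Hello <strong>world</strong></p>');
// -> 'Hello <strong>world</strong>'
```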

Simplify API by only providing multiple formats

Would normalize the excerpt file property to always be { html: '...', text: '...' }

Pros:

no need for a multipleFormats option (=simpler API)
could add support for a third format: stripped, removing only the outer <p> tags

Cons:

~~not backwards-compatible~~
~~can't just put {{ excerpt }} in a template~~

Excerpt from Markdown

It looks like the excerpt is only extracted from HTML files. If I build in some code to handle Markdown as well, would you accept a pull request? Or am I doing something wrong, and it should already work?

Please update dependency on debug

The version of "debug" ("~2.2.0") has a known vulnerability.

Changing it to "^2.2.0" (or "^2.6.9") should do the trick.

See https://nodesecurity.io/advisories/534
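Why the range change matters can be shown with a simplified range check: for a major version of at least 1, a tilde range allows only patch updates, while a caret range also allows minor updates, so "^2.2.0" lets npm resolve a newer 2.x release such as 2.6.9. This sketch only handles exact `~X.Y.Z` and `^X.Y.Z` ranges (no prerelease tags, no 0.x special cases); it is not a substitute for the `semver` package, and the function names are hypothetical.

```javascript
// Simplified tilde/caret range check for versions with major >= 1.
function parseVersion(version) {
  return version.split('.').map(Number);
}

function satisfies(version, range) {
  const op = range[0];
  const base = parseVersion(range.slice(1));
  const v = parseVersion(version);
  const cmp = (a, b) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
  if (cmp(v, base) < 0) return false; // below the lower bound
  if (op === '~') return v[0] === base[0] && v[1] === base[1]; // patch updates only
  if (op === '^') return v[0] === base[0]; // minor and patch updates
  throw new Error(`Unsupported range: ${range}`);
}

satisfies('2.6.9', '~2.2.0'); // -> false: ~2.2.0 stays within 2.2.x
satisfies('2.6.9', '^2.2.0'); // -> true: ^2.2.0 allows any 2.x.y >= 2.2.0
```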

Excerpt html does not get interpreted

This is probably my mistake, but I'm posting it here anyway because I don't know what else to try.
I'm implementing my blog with excerpts, so I iterate over all my posts like this:

{{#each collections.blog}}
  <h3><a href="/{{ this.path }}">{{ this.title }}</a></h3>
  <p class="meta">{{formatDate this.date 'Do MMM YYYY'}}</p>
  {{ this.excerpt }}
{{/each}}

The result I'm getting looks something like this:

So the HTML is somehow not being rendered. When I view one of the blog posts on its own, the HTML is correct.
Could you point me in the right direction?

The source code of my metalsmith script can be found here:
https://bitbucket.org/FlorianSchrofner/flosch.at/src
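For context, assuming the template engine here is Handlebars (the `{{#each}}` block suggests it), double-brace expressions such as `{{ this.excerpt }}` are HTML-escaped before output, while triple-brace expressions such as `{{{ this.excerpt }}}` are inserted as-is. The sketch below imitates that escaping with a hypothetical `escapeHtml` function to show why the tags appear as literal text; it is a simplification, not Handlebars' own implementation.

```javascript
// Simplified imitation of what a double-brace {{ }} expression does to a
// string: HTML-significant characters are replaced by entities.
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#x27;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

escapeHtml('<p>My first post</p>');
// -> '&lt;p&gt;My first post&lt;/p&gt;', which a browser shows as literal tags
```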