pelican-plugins / seo

48.0 4.0 7.0 341 KB

Pelican plugin to improve search engine optimization (SEO)

Python 96.74% CSS 0.98% HTML 2.27%

pelican plugin seo search-engine optimization seo-optimization seo-plugin

seo's People

micland76 wphuocom jwierzbi ysard sogolumbo heavythumper gaybro8777

seo's Issues

Adding :figure: together with :image:

Regarding structured data (the Article schema), could it be useful to support the :figure: field for content that uses :figure: instead of :image:?

Ensure all plugin settings are SEO_* prefixed

For example, the following settings should probably be prefixed with SEO_[…] in order to prevent potential collisions with other plugins.

ARTICLES_LIMIT = 10
PAGES_LIMIT = 10
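A minimal sketch of how prefixed names could be introduced without breaking existing configs, assuming the plugin reads its options from Pelican's settings dict (for example `pelican.settings` in an `initialized` signal handler). The names `SEO_ARTICLES_LIMIT`, `SEO_PAGES_LIMIT`, `LEGACY_NAMES` and `get_seo_setting` are hypothetical:

```python
import warnings
from typing import Any, Mapping

# Hypothetical mapping from new prefixed names to the old unprefixed ones.
LEGACY_NAMES = {
    "SEO_ARTICLES_LIMIT": "ARTICLES_LIMIT",
    "SEO_PAGES_LIMIT": "PAGES_LIMIT",
}


def get_seo_setting(settings: Mapping[str, Any], name: str, default: Any = None) -> Any:
    """Look up an SEO_* setting, falling back to its legacy unprefixed name."""
    if name in settings:
        return settings[name]
    legacy = LEGACY_NAMES.get(name)
    if legacy is not None and legacy in settings:
        # Still honour the old name, but tell the user to migrate.
        warnings.warn(
            f"{legacy} is deprecated; use {name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return settings[legacy]
    return default
```

The prefixed name always wins, so a site that sets both keeps working while the warning nudges it to drop the old name.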

Usage guidance needs to be improved!

I really love the project, but when I first started using it I had trouble getting it to work. I installed it with pip and nothing seemed to happen, since I couldn't import the plugin (maybe I used the wrong alias?). I ended up cloning the repo into my plugins folder with git, and then it finally worked. A better step-by-step guide on how to use it would be very helpful for future new users.
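Since Pelican 4.5, pip-installed plugins are discovered as namespace packages under `pelican.plugins`. A quick way to check whether a pip install actually made the plugin importable is to ask `importlib` for its module spec. This is a generic sketch; `is_importable` is a hypothetical helper name:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be found on the current sys.path.

    For dotted names, find_spec() imports the parent packages first, so a
    missing parent (e.g. Pelican not installed in this environment) raises
    ModuleNotFoundError instead of returning None.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Run with the same interpreter/virtualenv that runs `pelican`.
    print(is_importable("pelican.plugins.seo"))
```

If this prints `False`, pip most likely installed into a different environment than the one running `pelican`. If it prints `True` and `PLUGINS` is set in `pelicanconf.py`, the list also needs to contain `"seo"`.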

Is the "description" field a duplicate of "summary"?

Is there a need for a separate description metadata field when Pelican already provides summary?

In Pelican's docs, summary is described as a "Brief description of content for index pages."
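One way to avoid forcing authors to write both fields would be to fall back from `description` to `summary`. A minimal sketch, assuming the content's metadata is available as a dict with lowercase keys and that `summary` may contain HTML; `page_description` and the 160-character cap are hypothetical choices, and the regex tag stripping is deliberately crude:

```python
import re
from html import unescape
from typing import Mapping, Optional

_TAG_RE = re.compile(r"<[^>]+>")


def page_description(metadata: Mapping[str, str], max_length: int = 160) -> Optional[str]:
    """Prefer an explicit description; otherwise derive one from summary."""
    text = metadata.get("description") or metadata.get("summary")
    if not text:
        return None
    # summary is usually rendered HTML: drop tags, decode entities,
    # and collapse whitespace.
    plain = unescape(_TAG_RE.sub("", text))
    plain = " ".join(plain.split())
    if len(plain) > max_length:
        plain = plain[: max_length - 1].rstrip() + "…"
    return plain
```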

Refactor: external_canonical should not call CanonicalURLCreator

See #33

https://github.com/pelican-plugins/seo/blob/master/pelican/plugins/seo/seo_enhancer/__init__.py#L37 should not call the CanonicalURLCreator.create_url() method when external_canonical is set, since that URL is already built.
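The intended control flow can be sketched as an early return. This is not the plugin's code: `canonical_url` is a hypothetical stand-in, and the simple join stands in for whatever `create_url()` does:

```python
from typing import Optional


def canonical_url(siteurl: str, url: str, external_canonical: Optional[str] = None) -> str:
    # An external canonical is already a complete URL: return it untouched
    # rather than passing it through the URL-building step.
    if external_canonical:
        return external_canonical
    # Otherwise build the URL from SITEURL and the content's relative URL.
    return f"{siteurl.rstrip('/')}/{url.lstrip('/')}"
```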

open_graph.py's _create_absolute_fileurl method does not honor a SITEURL with a subdirectory

Given

SITEURL = 'mysite.com/blog' in the Pelican settings file,
and a fileurl of 'posts/my_article/index.html',

the _create_absolute_fileurl method in open_graph.py returns a file_url that is missing the SITEURL's subdirectory.
This is due to the way urllib.parse.urljoin works: when the base URL does not end with a slash, its last path segment (here blog) is treated as a file name and replaced by the relative URL.

My quick workaround for now is to append a slash to the siteurl. If the slash is redundant, urljoin strips it out.

def __init__(
    self, siteurl, fileurl, file_type, title, description, image, locale
) -> None:
    # The trailing slash stops urljoin() from dropping SITEURL's subdirectory
    self.siteurl = siteurl + "/"
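The urljoin behaviour, and a workaround that only adds the slash when it is missing, can be checked in isolation. A minimal sketch; `absolute_file_url` is a hypothetical helper and an `https://` scheme is assumed for the example URL:

```python
from urllib.parse import urljoin


def absolute_file_url(siteurl: str, fileurl: str) -> str:
    # urljoin() drops the last path segment of a base URL that does not end
    # with "/", so "https://mysite.com/blog" would lose "blog".
    base = siteurl if siteurl.endswith("/") else siteurl + "/"
    # A leading "/" in fileurl would make urljoin() replace the whole base
    # path, so treat fileurl as relative to SITEURL.
    return urljoin(base, fileurl.lstrip("/"))


print(urljoin("https://mysite.com/blog", "posts/my_article/index.html"))
# https://mysite.com/posts/my_article/index.html
print(absolute_file_url("https://mysite.com/blog", "posts/my_article/index.html"))
# https://mysite.com/blog/posts/my_article/index.html
```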

Test canonical feature in case of HTML generation

See #33

Location: https://github.com/pelican-plugins/seo/blob/master/pelican/plugins/seo/tests/test_seo_enhancer.py

The test should be parametrized and test the HTML generation for canonical feature for the following cases:

external_canonical
save_as
external_canonical AND save_as
nothing set
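The four cases map naturally onto a table-driven test. The sketch below uses `unittest`'s `subTest` (a `pytest.mark.parametrize` version would have the same shape). `canonical_link_tag` is a hypothetical stand-in for the plugin's HTML generation, and deriving the URL from `save_as` by stripping a trailing `index.html` is an assumption for illustration only:

```python
import unittest
from html import escape
from typing import Optional


def canonical_link_tag(
    siteurl: str, url: str, save_as: Optional[str] = None, external_canonical: Optional[str] = None
) -> str:
    if external_canonical:
        href = external_canonical  # already absolute, used as-is
    else:
        path = save_as if save_as else url
        if path.endswith("index.html"):
            path = path[: -len("index.html")]  # "index.html" -> site root
        href = f"{siteurl.rstrip('/')}/{path.lstrip('/')}"
    return f'<link rel="canonical" href="{escape(href)}">'


class CanonicalLinkTest(unittest.TestCase):
    # (case name, save_as, external_canonical, expected href)
    CASES = [
        ("external_canonical", None, "https://other.org/a", "https://other.org/a"),
        ("save_as", "index.html", None, "https://example.com/"),
        ("external_canonical AND save_as", "index.html", "https://other.org/a", "https://other.org/a"),
        ("nothing set", None, None, "https://example.com/pages/home.html"),
    ]

    def test_canonical_link(self):
        for name, save_as, external, expected in self.CASES:
            with self.subTest(name):
                tag = canonical_link_tag("https://example.com", "pages/home.html", save_as, external)
                self.assertEqual(tag, f'<link rel="canonical" href="{expected}">')
```

Run it with `python -m unittest`; each case is reported separately if it fails.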

Add social meta tags?

Hi @MaevaBrunelles!

I can see I caught this while it's still in the works, but here is a proposal.

Could we add <meta> tags to improve search engine optimization? See examples below!

For further reference, I think Jekyll does this pretty well.

I'm rebuilding my website and writing some Jinja logic to get this working. I can share some code 😄

IMO these are essential for great SEO results!

Thanks for the great work!

Some examples:

Twitter:

<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@site_account">
<meta name="twitter:creator" content="@individual_account">
<meta name="twitter:url" content="https://example.com/page.html">
<meta name="twitter:title" content="Content Title">
<meta name="twitter:description" content="Content description less than 200 characters">
<meta name="twitter:image" content="https://example.com/image.jpg">
<meta name="twitter:image:alt" content="A text description of the image conveying the essential nature of an image to users who are visually impaired. Maximum 420 characters.">

Facebook:

<meta property="fb:app_id" content="123456789">
<meta property="og:url" content="https://example.com/page.html">
<meta property="og:type" content="website">
<meta property="og:title" content="Content Title">
<meta property="og:image" content="https://example.com/image.jpg">
<meta property="og:image:alt" content="A description of what is in the image (not a caption)">
<meta property="og:description" content="Description Here">
<meta property="og:site_name" content="Site Name">
<meta property="og:locale" content="en_US">
<meta property="article:author" content="">

Perhaps Google site verification as well?

<meta name="google-site-verification" content="your verification string">
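The tags above can be rendered from plain dicts. Note that the Twitter examples use the `name` attribute while the Open Graph ones use `property`. A minimal Python sketch of the same idea (the issue proposes Jinja in the theme instead); `meta_tags` is a hypothetical helper, not part of the plugin:

```python
from html import escape
from typing import Mapping


def meta_tags(attribute: str, values: Mapping[str, str]) -> str:
    """Render <meta> tags; attribute is "name" (Twitter) or "property" (Open Graph)."""
    lines = []
    for key, content in values.items():
        if not content:  # skip empty values rather than emitting content=""
            continue
        # escape() also escapes quotes, so titles containing " stay valid HTML
        lines.append(f'<meta {attribute}="{escape(key)}" content="{escape(content)}">')
    return "\n".join(lines)


print(meta_tags("name", {"twitter:card": "summary", "twitter:title": "Content Title"}))
# <meta name="twitter:card" content="summary">
# <meta name="twitter:title" content="Content Title">
```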

Structured data is inserted in another existing script tags

Hi!
I am using this plugin on one of my websites, and the Google Console reported an error related to the structured data based on Schema.org. At first I thought it might be related to this note in the README:

Note that schemas generated by default are compliant with Schema.org but not (by default) Google-compliant.

However, after checking the console and then the generated HTML files, I discovered that the structured data is being added inside existing <script> tags, while the two new <script type="application/ld+json"> tags the plugin creates are left empty. So in my <head> I have:

  <script src="/theme/navbar.js">
   {"@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": [{"@type": "ListItem", "position": 1, "name": "Coruja Digital", "item": "https://corujadigital.tech"}, {"@type": "ListItem", "position": 2, "name": "Blog", "item": "https://corujadigital.tech/blog"}, {"@type": "ListItem", "position": 3, "name": "Rediseno sitio web", "item": "https://corujadigital.tech/blog/rediseno-sitio-web.html"}]}
  </script>
  <script src="/theme/js/fontawesome-all.min.js">
   {"@context": "https://schema.org", "@type": "Article", "author": {"@type": "Person", "name": "Iván Hernández Cazorla"}, "publisher": {"@type": "Organization", "name": "Coruja Digital", "logo": {"@type": "ImageObject", "url": "https://corujadigital.tech/theme/logo_coruja_digital.png"}}, "headline": "Rediseño del sitio web de Coruja Digital", "about": "corujadigital.tech", "datePublished": "2020-10-29 00:00"}
  </script>
  <script type="application/ld+json">
  </script>
  <script type="application/ld+json">
  </script>

I upgraded to the latest version (1.0.1), but it did not fix this. Do you have any idea what could be happening?

Thanks in advance!
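When a `<script>` element has a `src` attribute, browsers ignore its inline content, so JSON placed inside the theme's script tags is never read as structured data. A minimal sketch of giving each schema its own tag, assuming the schema is available as a dict; `ld_json_script` is a hypothetical helper, and inserting the resulting string into the page is left out:

```python
import json
from typing import Any, Mapping


def ld_json_script(schema: Mapping[str, Any]) -> str:
    """Wrap a Schema.org dict in its own <script type="application/ld+json"> tag."""
    payload = json.dumps(schema, ensure_ascii=False)
    # Escape "</" so a value such as "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```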

LD+JSON created but no meta property tags

I'm using pelican-seo. The resulting LD+JSON is good: only the canonical URL is missing, and all other tags work (image, description, etc.).

However, it is not generating any meta property tags.

My configuration:

SEO_REPORT = True
SEO_ENHANCER = True
SEO_ENHANCER_OPEN_GRAPH = True
SEO_ENHANCER_TWITTER_CARDS = True

I have all this tags in my articles (I use .rst):

:category: 
:tags:
:image:
:description:
:og_description:
:og_image:
:date:
:summary:

I don't set tw_author, but the documentation says it's optional.

What am I missing? Is this a bug or user error?
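To tell a bug from a configuration problem, it can help to scan the generated HTML for the tags directly. A minimal sketch using the standard library's `HTMLParser`; `SocialMetaCollector` and `social_meta` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class SocialMetaCollector(HTMLParser):
    """Collect og:* and twitter:* <meta> tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Dict[str, Optional[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph uses property=..., Twitter cards use name=...
        key = attributes.get("property") or attributes.get("name") or ""
        if key.startswith(("og:", "twitter:")):
            self.tags[key] = attributes.get("content")


def social_meta(html: str) -> Dict[str, Optional[str]]:
    collector = SocialMetaCollector()
    collector.feed(html)
    collector.close()
    return collector.tags
```

Feeding it the contents of a generated article (read with `encoding="utf-8"`) shows at once whether any Open Graph or Twitter tags were written.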

'SitemapGenerator' object has no attribute 'output_path'

I am attempting to enable this plugin with the Flex theme. Unfortunately, as soon as I enable the plugin, I get the following error when generating my output.

$ pelican -s pelicanconf.py -lrv --debug
[08:49:05] DEBUG    Pelican version: 4.8.0                                                                                                    __init__.py:531
           DEBUG    Python version: 3.10.8                                                                                                    __init__.py:532
           DEBUG    Adding current directory to system path                                                                                    __init__.py:66
           DEBUG    Finding namespace plugins                                                                                                    _utils.py:81
           DEBUG    Namespace plugins found:                                                                                                     _utils.py:84
                    pelican.plugins.seo
                    pelican.plugins.series
           DEBUG    Loading plugin `series`                                                                                                      _utils.py:90
           DEBUG    Loading plugin `extract_toc`                                                                                                 _utils.py:90
           DEBUG    Loading plugin `neighbors`                                                                                                   _utils.py:90
           DEBUG    Loading plugin `extended_sitemap`                                                                                            _utils.py:90
           DEBUG    Loading plugin `seo`                                                                                                         _utils.py:90
           DEBUG    Registering plugin `pelican.plugins.series`                                                                                __init__.py:73
           DEBUG    Registering plugin `extract_toc`                                                                                           __init__.py:73
           DEBUG    Registering plugin `neighbors`                                                                                             __init__.py:73
           DEBUG    Registering plugin `extended_sitemap`                                                                                      __init__.py:73
           DEBUG    Registering plugin `pelican.plugins.seo`                                                                                   __init__.py:73
           INFO     SEO plugin initialized                                                                                                          seo.py:43
           DEBUG    Found generator: ArticlesGenerator (internal)                                                                             __init__.py:209
           DEBUG    Found generator: PagesGenerator (internal)                                                                                __init__.py:209
           DEBUG    Found generator: SitemapGenerator (extended_sitemap)                                                                      __init__.py:209
           DEBUG    Found generator: StaticGenerator (internal)                                                                               __init__.py:209

          [reading files that should be generated]
           INFO     SEO plugin - SEO Report: seo_report.html file created                                                                     __init__.py:273
           CRITICAL AttributeError: 'SitemapGenerator' object has no attribute 'output_path'                                                  __init__.py:552

The seo_report.html is generated. Nothing else is generated.

When I disable the plugin, site generation works as expected.

My config file has the relevant entries:

OUTPUT_PATH = "output/"

PLUGINS = [
    ...
    "seo",
    ...
]

SEO_REPORT = False  # Odd that the seo_report is generated with this false
SEO_ENHANCER = True  
SEO_ENHANCER_OPEN_GRAPH = False 
SEO_ENHANCER_TWITTER_CARDS = False

How do I resolve the CRITICAL error in the logs? My goal is to add structured data to my output.
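The traceback suggests, but does not prove, that the plugin reads `output_path` from every generator Pelican reports, including third-party ones such as extended_sitemap's `SitemapGenerator`. Assuming that is the cause, a defensive lookup could fall back to the `OUTPUT_PATH` setting. `generator_output_path` is a hypothetical helper:

```python
from typing import Any, Mapping, Optional


def generator_output_path(generator: Any, settings: Mapping[str, Any]) -> Optional[str]:
    """Return a generator's output_path, falling back to OUTPUT_PATH.

    Third-party generators are not guaranteed to define output_path, so
    getattr() with a default avoids the AttributeError.
    """
    return getattr(generator, "output_path", None) or settings.get("OUTPUT_PATH")
```

An alternative design is to skip generators the plugin does not handle instead of guessing a path for them.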

SEO Plugin not running and no errors

I followed the installation steps, but I don't see any of the output you show when running pelican content --verbose.

If I run pelican --print-settings, it shows that my settings are present.

Any further tips to debug?

Canonical URL generation ignores save_as metadata

One of my pages (home.md) has save_as: index.html in its metadata. On that page, the canonical URL generated is /pages/.html.

The plugin should check what the page's correct URL is.

Allow settings to be in pelicanconf.py?

Would it be useful if the various settings (e.g., SEO_REPORT, etc.) were allowed to be set in the pelicanconf.py file? Currently to change these, you have to edit the settings.py file (see #51). I can try to work on this if you think it would be a good idea!

License for this plugin

Hello, the plugin looks good.
Would you please place the project under a specific open source license?
Thanks.

Twitter cards not functional?

I set everything to True

SEO_REPORT = True
SEO_ENHANCER = True
SEO_ENHANCER_OPEN_GRAPH = True
SEO_ENHANCER_TWITTER_CARDS = True

But I'm not seeing any twitter:card tags in the html. Using Python 3.8.5 on Mac. I can work up a simple example if necessary (are there examples somewhere already?)

CRITICAL: 'charmap' codec can't decode byte 0x9d in position 5425

Installing the plugin via pip and enabling it in pelicanconf.py results in this error:

  CRITICAL: 'charmap' codec can't decode byte 0x9d in position 5425: character maps to <undefined>

I'm sure it's this plugin, since it started when I added the plugin to a site which generated fine, and it stops if I remove the plugin.

I'm running Windows 10 and Python 3.8.5; this was the only plugin added when the error occurred. My guess is that Beautiful Soup is reading a file with an unexpected encoding. To be sure, I tried this on an empty project created with pelican-quickstart and got the same error.
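The byte 0x9d is the last byte of the UTF-8 encoding of ” (U+201D), and cp1252, the default locale encoding on many Windows setups, has no character at 0x9D, which matches the 'charmap' error. Assuming the plugin opens generated files without an explicit encoding, passing `encoding="utf-8"` would avoid it. A minimal sketch; `read_html` is a hypothetical helper:

```python
from pathlib import Path


def read_html(path) -> str:
    """Read a generated HTML file as UTF-8 regardless of the OS locale.

    Without an explicit encoding, open() and Path.read_text() use the
    locale's preferred encoding, which is cp1252 on many Windows machines.
    """
    return Path(path).read_text(encoding="utf-8")
```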

UnboundLocalError: local variable 'max_index' referenced before assignment

Python version: 3.8.3
Pelican version: 4.5.0
pelican-seo version: 1.0.0
pelicanconf.py: https://gist.github.com/jwodder/35d570ca8710779af6138786b78f64da

Attempting to use this plugin on my site produces the following error:

-> Writing /Users/jwodder/work/GITHUB/kbits/site/build/posts/pypkg-mistakes/index.html
CRITICAL: local variable 'max_index' referenced before assignment
Traceback (most recent call last):
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/bin/pelican", line 8, in <module>
    sys.exit(main())
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/__init__.py", line 512, in main
    pelican.run()
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/__init__.py", line 121, in run
    p.generate_output(writer)
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/generators.py", line 686, in generate_output
    self.generate_pages(writer)
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/generators.py", line 595, in generate_pages
    self.generate_articles(write)
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/generators.py", line 466, in generate_articles
    write(article.save_as, self.get_template(article.template),
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/writers.py", line 269, in write_file
    _write_file(template, localcontext, self.output_path, name,
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/writers.py", line 216, in _write_file
    signals.content_written.send(path, context=localcontext)
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/blinker/base.py", line 266, in send
    return [(receiver, receiver(sender, **kwargs))
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/blinker/base.py", line 266, in <listcomp>
    return [(receiver, receiver(sender, **kwargs))
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/plugins/seo/seo.py", line 104, in run_html_enhancer
    html_enhancements = seo_enhancer.launch_html_enhancer(
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/plugins/seo/seo_enhancer/__init__.py", line 30, in launch_html_enhancer
    "breadcrumb_schema": html_enhancer.breadcrumb_schema.create_schema(),
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/plugins/seo/seo_enhancer/html_enhancer/breadcrumb_schema_creator.py", line 100, in create_schema
    breadcrumb_items = self._create_paths()
  File "/Users/jwodder/work/GITHUB/kbits/site/.nox/publish/lib/python3.8/site-packages/pelican/plugins/seo/seo_enhancer/html_enhancer/breadcrumb_schema_creator.py", line 51, in _create_paths
    del split_path[0:max_index]
UnboundLocalError: local variable 'max_index' referenced before assignment

It seems the condition guarding the assignment of max_index was never true, so max_index was never assigned, yet the code used it anyway. I'm not sure what that piece of code is meant to do, however, so I can't recommend a specific fix.
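What `_create_paths` actually computes isn't visible here, so the sketch below only illustrates the failure pattern (a variable assigned solely inside a conditional) and the usual fix of initializing it before the loop. The function name and its prefix-stripping logic are hypothetical:

```python
from typing import List, Sequence


def strip_site_prefix(split_path: List[str], site_segments: Sequence[str]) -> List[str]:
    """Drop the leading segments of split_path that match site_segments.

    max_index is bound before the loop, so it has a value (0, strip
    nothing) even when no segment matches, which is the case that raised
    UnboundLocalError.
    """
    max_index = 0
    for index, segment in enumerate(split_path):
        if index < len(site_segments) and segment == site_segments[index]:
            max_index = index + 1
        else:
            break
    del split_path[0:max_index]
    return split_path
```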

Report says I am missing 'Page description' & 'Content title' in all articles

Example link:

https://www.kentoseth.com/posts/2022/apr/17/python-pelican-how-to-view-draft-content-on-a-website/

Metadata from article:

Title: Python Pelican: How to view draft content on a website
Tags: pelican, blog
Author: Mohamed H.
Summary: Finding draft content on Pelican, a static site generator written in the Python programming language.

The Pelican content documentation (https://docs.getpelican.com/en/stable/content.html) has no metadata fields for the items the report asks for.

How do we add these things to the article metadata? How much do they impact SEO with/without having them?

Directory separator is os-dependent ('/' vs. '\', css_file)

The path to the CSS file may have to be interpreted by a browser on a Windows machine. The hardcoded / characters here cause an error:
the CSS file cannot be found.

Replacing the line with the following code works on Windows:

css_file = "file:///" + os.path.join(plugin_path, "static", "seo_report.css")
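An alternative that avoids building the URI by hand is `pathlib`'s `as_uri()`, which emits forward slashes and percent-encodes special characters on every OS (it requires an absolute path, hence `resolve()`). A minimal sketch; `css_file_uri` is a hypothetical helper and `plugin_path` is assumed to be the plugin's directory as in the line above:

```python
from pathlib import Path


def css_file_uri(plugin_path) -> str:
    """Build a file:// URI for the report stylesheet on any OS.

    On Windows this yields e.g. file:///C:/.../static/seo_report.css.
    """
    return (Path(plugin_path) / "static" / "seo_report.css").resolve().as_uri()
```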

Asking for SITEURL in pelicanconf.py

When adding settings for pelican-seo to pelicanconf.py, it gives an error:

Exception: You must fill in SITEURL variable in pelicanconf.py       __init__.py:566
           to use SEO plugin.

But this will break the site for local testing, which uses localhost. If you add a SITEURL, localhost stops working for rendering the content locally.
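Pelican's quickstart creates a `pelicanconf.py` for local development and a `publishconf.py` that imports it and overrides deployment settings. One possible workaround, assuming the plugin only requires SITEURL to be non-empty, is a localhost SITEURL in the development config (8000 is the default port of `pelican --listen`); `https://example.com` is a placeholder:

```python
# pelicanconf.py (development): satisfies the SEO plugin's SITEURL check
# while keeping links working under `pelican --listen`.
SITEURL = "http://localhost:8000"

# publishconf.py (deployment) typically starts with
#   from pelicanconf import *
# and then overrides the value:
#   SITEURL = "https://example.com"
```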

About SEO report and h1 tags

Problem

I see in the following line that the content of the article is used to count the
h1 tags:

seo/pelican/plugins/seo/seo_report/seo_analyzer/__init__.py

Line 22 in 5ab4c64

self.content_title_analysis = ContentTitleAnalyzer(content=self._content)

You suggest (like everyone) the following Markdown structure in your README:

Title: Page Title
Description: Page Description

# Heading Content

Nevertheless, most (all?) templates already encapsulate the "Title" metadata in an h1 tag.

This happens independently of the rest of the article, which is written in Markdown and stored in the content attribute of the objects; that content is inserted as is into the HTML template.

Therefore such an example poses 2 problems:
- duplication of the h1 tag (that of the template + that of the article content)
- duplication not detected by the current plugin

Currently, as far as I know, the only simple way to get a compliant HTML page is to start article headings at the h2 level (##),
although this is semantically wrong in Markdown and can disturb some plugins (table of contents rendering, etc.).

I personally use a homemade plugin to modify the final HTML without modifying the original Markdown.

In this case, the SEO report plugin misleads the user by not detecting any h1 title.

I would like to mention that your plugin is very welcome because it allowed me to highlight a problem that I had totally missed.

Proposal

The plugin should be refactored to read finalized pages, like the SEOEnhancer part
called after the content_written signal.
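Counting on the finished page catches both the template's h1 and any h1 coming from the article body. A minimal sketch of that check, assuming it would be called with the HTML of each written file (Pelican's `content_written` signal provides the output path after each file is written); `H1Counter` and `count_h1` are hypothetical names:

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Count <h1> elements in a rendered page (template + article content)."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> is counted too.
        if tag == "h1":
            self.count += 1


def count_h1(html: str) -> int:
    counter = H1Counter()
    counter.feed(html)
    counter.close()
    return counter.count
```

A result above 1 for the README's suggested structure would flag exactly the duplication described above.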

Notes

Pandoc has implemented an option to automatically shift the heading level:
jgm/pandoc#5615
HTML5 allows nesting independent units in tags like <section>, <article>, etc., which lets multiple h1 titles coexist in a page (the outline algorithm). However, Mozilla is very clear about this: it is a non-standard practice and not recommended.
Cf.
https://developer.mozilla.org/fr/docs/orphaned/Web/Guide/HTML/Using_HTML_sections_and_outlines#lalgorithme_outline_html5
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements#multiple_h1_elements_on_one_page
Discussion on Hugo's side:
https://discourse.gohugo.io/t/option-to-shift-headings/6136

Error: Authors can be empty!

I get the following error:

[...]
File "...seo\seo_enhancer\html_enhancer\article_schema_creator.py", line 17, in __init__
    self._author = author.name
AttributeError: 'NoneType' object has no attribute 'name'

The reason for the error is that the code implicitly assumes the author object is not empty, which is wrong because an article can have no authors.
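A minimal sketch of a guard, assuming the author is either `None` or an object with a `name` attribute (as `author.name` in the traceback suggests). `author_name` and `article_schema` are hypothetical stand-ins for the schema creator, which would then simply omit the `author` key:

```python
from typing import Any, Optional


def author_name(author: Any) -> Optional[str]:
    """Return the author's name, or None when the article has no author."""
    # getattr(None, "name", None) is None, so no explicit None check is needed
    return getattr(author, "name", None)


def article_schema(headline: str, author: Any = None) -> dict:
    schema = {"@context": "https://schema.org", "@type": "Article", "headline": headline}
    name = author_name(author)
    if name:
        schema["author"] = {"@type": "Person", "name": name}
    return schema
```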