frnmst / md-toc

Automatically generate and add an accurate table of contents to markdown files.

Home Page: https://docs.franco.net.eu.org/md-toc/

License: GNU General Public License v3.0

Makefile 0.10% Python 90.53% Shell 9.37%

cmark cmark-gfm commonmark github gitlab inplace markdown python python3 redcarpet table-of-contents toc

md-toc's Issues

Consider Mentioning Existence of GitHub's Built-In Markdown Table of Contents

Slightly over a year ago, GitHub added TOC as a first-class feature, so it may be worth explicitly clarifying md-toc's value proposition in its README. Some examples that come to mind for me:

Markdown can be rendered in many contexts, such as a blog, regardless of whether the source is hosted on GitHub.
Even when using Markdown purely for READMEs and community health files, there may be other cloud-hosting platforms that do not offer first-class TOC support, and even if you use GitHub, a mirror may not.
Even on GitHub, many users may not spot the small TOC icon.
Some people prefer to render and view Markdown locally.
It's convenient to have a TOC while editing the Markdown source since editors can correctly navigate the links locally.

Some users may only use md-toc because they are unaware of GitHub's analogous built-in feature, but personally I don't see many folks abandoning the tool for this reason alone in light of the examples above as well as its ease-of-use, simplicity, reliability, and performance.

multiple hyphens in anchor links for GitLab Flavored Markdown

Two or more hyphens in a row are converted to one.

See https://docs.gitlab.com/ee/user/markdown.html#header-ids-and-links

This does not happen in GitHub Flavored Markdown.
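The rule described above can be sketched as a small post-processing step on an already built anchor. This is only an illustration: `collapse_hyphens` is a hypothetical helper, not part of md-toc, and it covers just the hyphen rule, not the rest of GitLab's header ID algorithm.

```python
import re


def collapse_hyphens(anchor: str) -> str:
    """Replace every run of two or more hyphens with a single hyphen."""
    return re.sub(r"-{2,}", "-", anchor)


print(collapse_hyphens("foo---bar--baz"))  # foo-bar-baz
```

A GitHub-flavored anchor would skip this step and keep the repeated hyphens.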

Fix header detection rules

As explained here, header detection needs to be fixed. For example:

# Some heading

    # this_is_a_root_shell_command
    $ this_is_another_command

leads to:

- [Some header](some-header)
- [this_is_a_root_shell_command](this_is_a_root_shell_command)

and not to:

- [Some header](some-header)
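One way to tell the two cases apart, sketched under the CommonMark rule that an ATX heading may be indented by at most three spaces (four or more spaces at the top level start an indented code block or continue a paragraph). `is_atx_heading` is a hypothetical name, and the check ignores container blocks such as list items, where indentation is relative.

```python
import re

# Up to three leading spaces, one to six '#', then whitespace or end of line.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]+|$)")


def is_atx_heading(line: str) -> bool:
    """Return True if the line opens an ATX heading at the top level."""
    return ATX_HEADING.match(line.rstrip("\n")) is not None


print(is_atx_heading("# Some heading"))                     # True
print(is_atx_heading("    # this_is_a_root_shell_command"))  # False
```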

Exception on comments with # prefix in code blocks

Hi!

Thanks for making this package! I just tried to use it but unfortunately ran into a bug when a line contains a # symbol in a code block. Here's a minimal example:

# Header

```python
# look a comment!
#
```

(I ran into an issue with markdown-in-markdown when posting this, so the python block was left unclosed in the original, but I hope you get the idea.)

The first line in the code block is interpreted as a header and the second line causes the GithubEmptyLinkLabel exception to be raised. Both are undesired behavior I think. Thanks!

Invalid Cross-device Link

The pre-commit hook sometimes crashes when run from the Windows file system within Windows Subsystem for Linux (WSL):

Update markdown table-of-contents.............................................Failed
 - hook id: md-toc
 - exit code: 1
Traceback (most recent call last):
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/__main__.py", line 35, in main
    result = args.func(args)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/cli.py", line 77, in write_toc
    write_strings_on_files_between_markers(
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/api.py", line 141, in write_strings_on_files_between_markers
    write_string_on_file_between_markers(f, strings[file_id], marker, newline_string)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/api.py", line 83, in write_string_on_file_between_markers
    fpyutils.filelines.remove_line_interval(
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/fpyutils/filelines.py", line 242, in remove_line_interval
    with atomic_write(output_file, overwrite=True) as f:
  File "~/.asdf/installs/python/3.10.4/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 169, in _open
    self.commit(f)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 202, in commit
    replace_atomic(f.name, self._path)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 99, in replace_atomic
    return _replace_atomic(src, dst)
  File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 55, in _replace_atomic
    os.rename(src, dst)
OSError: [Errno 18] Invalid cross-device link: '/mnt/c/Users/<username>/source/repos/scribemd-uwp/tmpkpv7awhg' -> 'README.md'

I am not yet certain under what circumstances this issue occurs; if it's happening to others I suggest starting by re-running the hook.

IndexError: list index out of range

First of all, I love the work you have put into this package.

Now to the issue: I am getting errors when trying to render the TOC.

  File ".../site-packages/md_toc/cmark/inlines_c.py", line 2087, in _cmark_subject_find_special_char
    if SPECIAL_CHARS[ord(subj.input.data[n])] == 1:
IndexError: list index out of range

Of course I dug in and found that certain special characters trigger this error. In my case there was an en-dash character in the header, which is most commonly used to denote a range of values, e.g.

September–October
2:00–3:00 pm
Pages 113–117

So I checked the cmark/inlines_c.py file and saw that the SPECIAL_CHARS list just isn't long enough to support these characters. Below are all the dash-like characters and their ord() numbers:

key	name	ord()
‐	hyphen	8208
−	minus	8722
–	en-dash	8211
—	em-dash	8212
-	hyphen-minus	45

I see there is a fixme in the code, checking for this problem, but it only seems to work if it is the first character of the header.

These characters are more common when text is copied from external text editors like Microsoft Word, which converts the hyphen-minus automatically to the other characters if it finds the need for it.
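The failure mode can be reproduced, and guarded against, with a small stand-in table. The `SPECIAL_CHARS` list below is a hypothetical 256-entry table built for illustration only; it is not the actual table from `inlines_c.py`, and `is_special` is an assumed helper name. The point is that indexing by `ord()` needs a bounds check for code points beyond the table.

```python
# Hypothetical stand-in for a lookup table indexed by code point (0..255).
SPECIAL_CHARS = [0] * 256
for ch in "*_`[]!<&\\":
    SPECIAL_CHARS[ord(ch)] = 1


def is_special(ch: str) -> bool:
    """Bounds-checked lookup: code points outside the table are never special."""
    code = ord(ch)
    return code < len(SPECIAL_CHARS) and SPECIAL_CHARS[code] == 1


print(is_special("*"))       # True
print(is_special("\u2013"))  # False (en-dash, ord 8211, would overflow the table)
```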

Is it possible to ignore certain headers?

I'm unable to find anything related to that in the documentation. Is it possible to ignore certain headers?

And thanks for this nice tool

Incorrect Type Annotation for `build_anchor_link` Function

Issue
md_toc.api.build_anchor_link's header_duplicate_counter parameter is correctly stated as a dict in the function's docstring, but incorrectly stated as a str in the parameter's type annotation.

Suggested Fix
Change the header_duplicate_counter parameter's type annotation to dict.

Allow "1." only ordered TOC lists output

The first feature that is blocking my implementation of the pre-commit hook is that it would be nice if we didn't have to increment the number in the markdown.

In other words instead of

1. FIRST ITEM
2. SECOND ITEM
3. THIRD ITEM

it would be nice if we could use the standard listing of

1. FIRST ITEM
1. SECOND ITEM
1. THIRD ITEM

so that GitHub or other viewers render the README Markdown with the right numbers, and adding a new section doesn't pollute the git blame for the other sections / line numbers.
If there is a way to generate these lists with the current args, please let me know.
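A minimal sketch of the requested output style, assuming the TOC is available as `(depth, title)` pairs. `render_ordered_toc` and its `constant_marker` flag are hypothetical names, not md-toc options. Nested items are indented by three spaces, the width of the `1. ` marker; with incrementing numbers of two or more digits a wider indent would be needed, which this sketch ignores.

```python
def render_ordered_toc(entries, constant_marker=True):
    """Render (depth, title) pairs as an ordered list; depth 0 is the top level."""
    lines = []
    counters = {}
    for depth, title in entries:
        # Returning to a shallower level restarts numbering in deeper levels.
        for deeper in [d for d in counters if d > depth]:
            del counters[deeper]
        counters[depth] = counters.get(depth, 0) + 1
        number = 1 if constant_marker else counters[depth]
        lines.append(f"{'   ' * depth}{number}. {title}")
    return "\n".join(lines)


print(render_ordered_toc([(0, "FIRST ITEM"), (0, "SECOND ITEM"), (0, "THIRD ITEM")]))
```

With `constant_marker=True`, inserting a new entry changes only one line of the generated list, which is what keeps the git blame clean.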

I get this error after `pip install md-toc`

λ md_toc --help
Traceback (most recent call last):
  File "d:\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\Scripts\md_toc.exe\__main__.py", line 5, in <module>
  File "d:\anaconda3\lib\site-packages\md_toc\__init__.py", line 23, in <module>
    from .api import (
  File "d:\anaconda3\lib\site-packages\md_toc\api.py", line 25, in <module>
    import curses.ascii
  File "d:\anaconda3\lib\curses\__init__.py", line 13, in <module>
    from _curses import *
ModuleNotFoundError: No module named '_curses'

Line Endings Are Always CRLF on Windows

According to the documentation, --newline-string is:

the string used to separate the lines of the TOC. Use quotes to delimit the string. Defaults to '\n'

However, on Windows, --newline-string actually defaults to '\r\n', which is desirable and simply needs to be documented. The larger problem is that setting --newline-string '\n' has no effect.

Consequently, if you run the mixed-line-ending hook with --fix=lf after md-toc, it will fail if md-toc was fed any Markdown files containing a TOC marker. This remains true no matter how many times you run the hooks, which can be disorienting since mixed-line-ending outputs "fixed mixed line endings." To facilitate debugging, pre-commit hooks should fail whenever they modify any file, even if only to change the line endings.

I am on version 8.0.1 of md-toc.
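One plausible cause, not confirmed by the issue, is Python's text-mode newline translation: a file opened for writing with the default `newline=None` turns every `'\n'` into `os.linesep`, which is `'\r\n'` on Windows. The sketch below shows how passing `newline=""` to `open` writes the requested separator unchanged; `write_lines` is a hypothetical helper and says nothing about how md-toc itself writes files.

```python
def write_lines(path, lines, newline_string="\n"):
    """Write lines joined by newline_string without platform translation."""
    # newline="" disables the '\n' -> os.linesep translation on write.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(newline_string.join(lines) + newline_string)
```

Reading the file back in binary mode is a quick way to confirm which line endings actually reached the disk.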

Fix links.

I forgot to add a # character in the build_toc_line method. Without this, local links are broken.

File corruption (in place) if the write toc operation gets interrupted.

File might get corrupted when writing in place and the operation gets interrupted.

Implement atomic writing to avoid file corruption.
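A common way to do this with only the standard library is to write to a temporary file and then rename it over the target. The sketch below is an illustration, not md-toc's implementation; `atomic_write_text` is a hypothetical name. The temporary file is created in the destination's own directory so that the final rename stays on one filesystem.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Replace path with text so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is interrupted before `os.replace`, the original file is untouched. Note that this sketch does not preserve the original file's permissions.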

Feature request: bring back `build_toc`

Building the TOC for a single file used to be really easy:

index[filename] = md_toc.build_toc(filename, keep_header_levels=6, parser='github')

Now you have to awkwardly wrap everything in lists:

index[filename] = md_toc.build_multiple_tocs([filename], keep_header_levels=6, parser='github')[0]

There is no diff unless we specify `-p`

For some reason, when I run md-toc as a pre-commit hook in some CI platforms, -p does not work (because those platforms don't allow hooks to change files).

But if I take -p out, the tool passes with status 0 (it should have been 1).

Is it possible to implement a "checking" mode:

It fails with status code 1 if the markdown ToC would be different
It passes with status code 0 otherwise
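The requested behavior could look roughly like the following, which compares the text between the first two markers with a freshly generated TOC and never touches the file. Both function names are hypothetical, and the marker is passed in explicitly rather than assumed.

```python
def toc_is_up_to_date(document: str, expected_toc: str, marker: str) -> bool:
    """Return True if the text between the first two markers equals expected_toc."""
    parts = document.split(marker)
    if len(parts) < 3:
        # Fewer than two markers: there is no existing TOC to compare against.
        return False
    return parts[1].strip() == expected_toc.strip()


def check_exit_code(document: str, expected_toc: str, marker: str) -> int:
    """0 if the TOC is current, 1 if it would change."""
    return 0 if toc_is_up_to_date(document, expected_toc, marker) else 1
```

A CI job could then pass the result to `sys.exit` without ever writing to the repository.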

Links not working when some styling is used in the header on GitHub

Here I've used <small> and _ styling: https://github.com/flyte/pi-mqtt-gpio/blob/d49d599c6f2f04ba0c6c31c214fd0f49746cc170/config-doc.md

Here I've only used the _ styling: https://github.com/flyte/pi-mqtt-gpio/blob/e9be6b2056a2bedca8189729abd15e7ed645b839/config-doc.md

And here I've removed the _ styling and it works: https://github.com/flyte/pi-mqtt-gpio/blob/c9227fc9bc3abc2e87cb02a034acaa7b972fd60f/config-doc.md

Feature request: Support MD in argument instead of file

First thank you for creating and maintaining this library.

We use it in Ansible, so we have written an ansible role around this library.

Today we generate the MD and write to disk in one Ansible task. We then run md_toc on the file in a second task (in python).
It works fine, but we would like to speed up the process by collapsing the two tasks in python and only write the final output to disk.

If you support the idea, I can provide a PR. I would move the parsing code to a separate function and call that from existing and new APIs. New API could use StringIO to create a stream.
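The proposed split might be shaped like this: one function that consumes any text stream, which both a file-based and a string-based entry point can call. `iter_headings` is a hypothetical name and its heading detection is deliberately naive (no code-block handling, no setext headings); it only illustrates the stream-based design.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_headings(stream: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (level, title) for each simple ATX heading found in the stream."""
    for line in stream:
        stripped = line.strip()
        level = len(stripped) - len(stripped.lstrip("#"))
        title = stripped[level:].strip()
        # Require 1-6 '#' characters followed by a space and a non-empty title.
        if 1 <= level <= 6 and stripped[level:level + 1] == " " and title:
            yield level, title


# In-memory Markdown is wrapped in StringIO; a file object works the same way.
print(list(iter_headings(io.StringIO("# A\ntext\n## B\n"))))  # [(1, 'A'), (2, 'B')]
```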

Irregular jumps in heading size not being recognized

Md-toc doesn't seem to parse correctly if you have headings that go from h1 to h3, h2 to h4, h3 to h5, etc. For example, md-toc will omit the line that says compatibility in the following markdown:

## Known Issues

#### Compatibility

We could probably agree that people shouldn't write markdown like this, but in my testing, it seems other websites / TOC extensions are handling this situation.
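One way to handle such jumps is to derive the nesting depth from the chain of enclosing headings rather than from the raw level number. The sketch below is an assumption about how other tools might do it, not md-toc's behavior; `nesting_depths` is a hypothetical name.

```python
def nesting_depths(levels):
    """Map heading levels (1-6) to list nesting depths, tolerating skipped levels."""
    stack = []   # levels of the currently open ancestor headings
    depths = []
    for level in levels:
        # Drop every open heading that is not a strict ancestor of this one.
        while stack and stack[-1] >= level:
            stack.pop()
        depths.append(len(stack))
        stack.append(level)
    return depths


print(nesting_depths([2, 4]))  # [0, 1]
```

With this mapping, `## Known Issues` followed by `#### Compatibility` yields a TOC entry nested one step under its parent instead of being dropped.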

Redcarpet v3.5.1 is out

Check if md-toc is affected by the new changes: https://github.com/vmg/redcarpet/releases/tag/v3.5.1

First line of code fence is incorrectly parsed as header for purposes of coherence

In 5.0.0, the first line of a code fence is being parsed for purposes of header coherence. This results in a TocDoesNotRenderAsCoherentList error when using the api, and a truncated TOC when using the CLI.

API:

[INFO] Generating table of contents for /mnt/infrastructure/docs/ingressroute.md
Traceback (most recent call last):
  File "src/generate.py", line 325, in <module>
    main()
  File "src/generate.py", line 321, in main
    _generate_table_of_contents()
  File "src/generate.py", line 230, in _generate_table_of_contents
    index[filename] = md_toc.build_toc(filename, keep_header_levels=6, parser='github')
  File "/usr/local/lib/python3.7/site-packages/md_toc/api.py", line 213, in build_toc
    raise TocDoesNotRenderAsCoherentList
md_toc.exceptions.TocDoesNotRenderAsCoherentList

CLI, with minimal test case:

[dharmab@n7 md-toc]$ cat test1.md 
# Foo1

## Foo2

### Foo3

```
#!/bin/bash
code
```

### Foo4
[dharmab@n7 md-toc]$ md_toc test1.md github
- [Foo1](#foo1)
  - [Foo2](#foo2)
    - [Foo3](#foo3)
[dharmab@n7 md-toc]$ cat test2.md 
# Foo1

## Foo2

### Foo3

```
code
#!/bin/bash
code
```

### Foo4
[dharmab@n7 md-toc]$ md_toc test2.md github
- [Foo1](#foo1)
  - [Foo2](#foo2)
    - [Foo3](#foo3)
    - [Foo4](#foo4)
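The behavior the report expects, where nothing inside a fence is ever treated as a heading, can be sketched with a small state machine. This is a simplified reading of the CommonMark fenced code block rules (same fence character, closing fence at least as long as the opening one), and `headings_outside_fences` is a hypothetical helper, not md-toc code.

```python
import re

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+\S")


def headings_outside_fences(text):
    """Collect ATX heading lines, ignoring everything inside fenced code blocks."""
    headings = []
    open_fence = None  # (fence character, fence length) while inside a block
    for line in text.splitlines():
        m = FENCE.match(line)
        if open_fence is None:
            if m:
                open_fence = (m.group(1)[0], len(m.group(1)))
            elif HEADING.match(line):
                headings.append(line.strip())
        elif (m and m.group(1)[0] == open_fence[0]
              and len(m.group(1)) >= open_fence[1]
              and line.strip() == m.group(1)):
            open_fence = None  # a bare closing fence ends the block
    return headings


fence = "`" * 3  # built indirectly so the sample does not contain a literal fence
sample = f"# Foo1\n\n{fence}\n# not a heading\n{fence}\n\n### Foo4\n"
print(headings_outside_fences(sample))  # ['# Foo1', '### Foo4']
```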

API change for versions 9.x

Discussed in #43

Originally posted by frnmst April 10, 2024
Note: ⚠️ starting from version 9.x all the functions are only accessible via the full module path. ⚠️

⚠️ For example: md_toc.build_toc(...) is now md_toc.api.build_toc(...) ⚠️

Problem with specific titles.

In some cases links are not working.

See https://github.com/frnmst/ideas/blob/master/user_interfaces/voice/kalliope-project/neurons.md#0-1-name for an example

Find a source to check for all these special cases.

Feature Request: Automatically skip lines to first or (N) TOC

A common use case seems to be skipping to the first TOC marker. Instead of requiring the end user to increment the SKIP_LINE number, it would be nice to be able to specify a special value (like -1) to automatically skip to just after the first TOC marker. This could even be parameterized so that -2 represents the second marker, and so on. It would be a useful utility that saves people from editing the args for the pre-commit hook any time the text above the TOC (badges, project descriptions, etc.) is updated.
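A sketch of how such a value could be resolved into a concrete number of lines to skip. `lines_to_skip` is a hypothetical helper, and the mapping from the proposed special values (-1, -2, ...) to `occurrence` (1, 2, ...) is an assumption based on the request above.

```python
def lines_to_skip(lines, marker, occurrence=1):
    """Return how many leading lines to skip so parsing starts after the Nth marker.

    Returns None when the document has fewer than `occurrence` markers.
    """
    seen = 0
    for number, line in enumerate(lines, start=1):
        if marker in line:
            seen += 1
            if seen == occurrence:
                return number  # skip everything up to and including this line
    return None


doc = ["badge", "description", "<!--TOC-->", "- toc", "<!--TOC-->", "# Heading"]
print(lines_to_skip(doc, "<!--TOC-->"))  # 3
```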

Title with trailing spaces

Hello again! Just noticed that when the last character of a title is an (admittedly useless) space, the generated anchor is wrong (tested on GitHub).

import md_toc

open("test.md", "w").write("""
# title
## title with a trailing space 
""")

print(md_toc.build_toc('test.md'))

Result:

- [title](#title)
  - [title with a trailing space ](#title-with-a-trailing-space-)

Maybe a .strip() somewhere would fix the issue?
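The suggested fix can be illustrated with a rough approximation of GitHub-style anchors. `simple_github_anchor` is a hypothetical helper that ignores duplicate headings, inline markup, and most special cases; it only shows the effect of stripping the title before building the anchor.

```python
import re


def simple_github_anchor(title: str) -> str:
    """Approximate a GitHub anchor: strip, lowercase, drop punctuation, hyphenate."""
    text = title.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, spaces
    return "#" + text.replace(" ", "-")


print(simple_github_anchor("title with a trailing space "))  # #title-with-a-trailing-space
```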

Ordered TOC doesn't work

The generation of an ordered TOC seems broken. Take for instance the example of the README:

import md_toc

open("test.md", "w").write("""
# this
## is
## a
### foo
#### booo
### foo
## file

## bye

# bye
""")

print(md_toc.build_toc('test.md', ordered=True))

Result:

AssertionError                            Traceback (most recent call last)
<ipython-input-11-e23380fd2495> in <module>
     15 """)
     16 
---> 17 print(md_toc.build_toc('test.md', ordered=True))

/anaconda3/lib/python3.7/site-packages/md_toc/api.py in build_toc(filename, ordered, no_links, no_indentation, no_list_coherence, keep_header_levels, parser, list_marker, skip_lines)
    228                     compute_toc_line_indentation_spaces(
    229                         header_type_curr, header_type_prev, parser, ordered,
--> 230                         list_marker, indentation_log, index)
    231                     no_of_indentation_spaces_curr = indentation_log[
    232                         header_type_curr]['indentation spaces']

/anaconda3/lib/python3.7/site-packages/md_toc/api.py in compute_toc_line_indentation_spaces(header_type_curr, header_type_prev, parser, ordered, list_marker, indentation_log, index)
    420         if ordered:
    421             assert list_marker in md_parser[parser]['list']['ordered'][
--> 422                 'closing markers']
    423         else:
    424             assert list_marker in md_parser[parser]['list']['unordered'][

AssertionError:

Passing True gives the expected result.

Thanks for this work!
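The traceback shows an assertion that `list_marker` belongs to the parser's ordered 'closing markers'. Assuming the default `list_marker` is an unordered bullet such as `-` (the issue does not show the default), one possible remedy is to choose the default according to the list type. The names below are hypothetical; the marker sets are the ones CommonMark defines for ordered and bullet lists.

```python
from typing import Optional

ORDERED_CLOSING_MARKERS = (".", ")")
UNORDERED_BULLET_MARKERS = ("-", "+", "*")


def resolve_list_marker(ordered: bool, list_marker: Optional[str] = None) -> str:
    """Pick a list-type-appropriate default and reject mismatched markers."""
    allowed = ORDERED_CLOSING_MARKERS if ordered else UNORDERED_BULLET_MARKERS
    if list_marker is None:
        return allowed[0]
    if list_marker not in allowed:
        raise ValueError(f"{list_marker!r} is not valid for this list type: {allowed}")
    return list_marker


print(resolve_list_marker(ordered=True))   # .
print(resolve_list_marker(ordered=False))  # -
```

Raising a `ValueError` with a message also gives callers more to go on than a bare `AssertionError`.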

Commonmark 0.30 is out

Check if md-toc is affected by the new changes: https://spec.commonmark.org/0.30/changes.html

API: `write_string_on_file_between_markers` destructively fails if marker is at end of file

If the marker passed to write_string_on_file_between_markers is at the end of the file, a LineOutOfFileBoundsError is raised and the marker is erased from the file.

[INFO] Generating README table of contents
Traceback (most recent call last):
  File "src/generate.py", line 321, in <module>
    main()
  File "src/generate.py", line 317, in main
    _generate_table_of_contents()
  File "src/generate.py", line 261, in _generate_table_of_contents
    md_toc.write_string_on_file_between_markers('/mnt/infrastructure/README.md', readme_toc, marker='[](TOC)')
  File "/usr/local/lib/python3.7/site-packages/md_toc/api.py", line 70, in write_string_on_file_between_markers
    append=False)
  File "/usr/local/lib/python3.7/site-packages/fpyutils/filelines.py", line 148, in insert_string_at_line
    raise LineOutOfFileBoundsError
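A non-destructive alternative is to build the new contents in memory first and write the file only once that succeeds. The sketch below assumes the intended result for a single marker is marker, TOC, marker, which the issue does not spell out; `insert_after_first_marker` is a hypothetical helper, not the fpyutils or md-toc API.

```python
def insert_after_first_marker(text: str, marker: str, toc: str, newline: str = "\n") -> str:
    """Return text with the TOC and a closing marker inserted after the first marker."""
    lines = text.split(newline)
    for i, line in enumerate(lines):
        if marker in line:
            block = ["", toc, "", marker]
            # List slicing never goes out of bounds, even if the marker is the last line.
            return newline.join(lines[:i + 1] + block + lines[i + 1:])
    raise ValueError("marker not found")


print(insert_after_first_marker("intro\n[](TOC)", "[](TOC)", "- [a](#a)"))
```

Because the original text is never modified in place, a failure leaves the file, and its marker, exactly as they were.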

GitLab migrated to commonmarker

See https://docs.gitlab.com/ee/user/markdown.html and https://github.com/gjtorikian/commonmarker
This means that GitLab will no longer be an alias of redcarpet, but most probably of cmark instead.

CommonMark Spec 0.29 is out

Check if md-toc is affected by the new changes: https://spec.commonmark.org/0.29/changes.html

Probably wrong license declaration

Hello

I have packaged md-toc for Debian, but the package was rejected because some license declarations aren't correct.

I assume that you wanted to use GPLv3+ for these files instead of GPLv5+ which doesn't exist:
asciinema/md_toc_asciinema_1_0_0_demo.sh
asciinema/md_toc_asciinema_2_0_0_demo.sh
asciinema/md_toc_asciinema_3_0_0_demo.sh
asciinema/md_toc_asciinema_3_1_0_demo.sh
asciinema/md_toc_asciinema_5_0_0_demo.sh
asciinema/md_toc_asciinema_6_0_0_demo.sh
asciinema/md_toc_asciinema_7_0_0_demo.sh
asciinema/md_toc_asciinema_7_1_0_demo.sh

If that is the case, you can apply the attached patch.
Otherwise please change it to the license which you had in mind.

Thanks and greetings
Sakirnth

0001-Updating-copyright.patch.txt

wrong indentation spaces

The api.compute_toc_line_indentation_spaces function returns the wrong number of spaces in certain situations when using commonmark (github) as the parser:

For example with $ cat test.md:

# hi
## ho
### hw
# h1

$ md_toc test.md github returns:

- [hi](#hi)
  - [ho](#ho)
    - [hw](#hw)
  - [h1](#h1)

instead of:

- [hi](#hi)
  - [ho](#ho)
    - [hw](#hw)
- [h1](#h1)