frnmst / md-toc
Automatically generate and add an accurate table of contents to markdown files.
Home Page: https://docs.franco.net.eu.org/md-toc/
License: GNU General Public License v3.0
Slightly over a year ago, GitHub added TOC as a first-class feature, so it may be worth explicitly clarifying md-toc's value proposition in its README. Some examples that come to mind for me:
Some users may only use md-toc because they are unaware of GitHub's analogous built-in feature, but personally I don't see many folks abandoning the tool for this reason alone in light of the examples above as well as its ease-of-use, simplicity, reliability, and performance.
- Two or more hyphens in a row are converted to one.
See https://docs.gitlab.com/ee/user/markdown.html#header-ids-and-links
This does not happen in GitHub Flavored Markdown.
As explained here, header detection needs to be fixed. For example:
# Some heading
# this_is_a_root_shell_command
$ this_is_another_command
leads to:
- [Some header](some-header)
- [this_is_a_root_shell_command](this_is_a_root_shell_command)
and not to:
- [Some header](some-header)
Hi!
Thanks for making this package! I just tried to use it but unfortunately ran into a bug when a line contains a `#` symbol in a code block. Here's a minimal example:
````
# Header
```python
# look a comment!
#
```
````
The first line in the code block is interpreted as a header and the second line causes the GithubEmptyLinkLabel exception to be raised. Both are undesired behavior I think. Thanks!
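The behaviour asked for here can be sketched as a header scan that tracks whether it is inside a fenced code block. This is only a minimal illustration, not md-toc's actual parser: the names `find_headers`, `FENCE` and `ATX_HEADER` are hypothetical, and the sketch ignores closing `#` sequences, empty headers, indented code blocks and setext headers.

```python
import re

# Hypothetical helpers for illustration; not part of md-toc's API.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
ATX_HEADER = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*\S)")


def find_headers(lines):
    """Return (level, text) pairs, skipping lines inside fenced code blocks."""
    headers = []
    open_fence = None  # the fence string that opened the current block, if any
    for line in lines:
        fence = FENCE.match(line)
        if open_fence is None:
            if fence:
                open_fence = fence.group(1)
                continue
            header = ATX_HEADER.match(line)
            if header:
                headers.append((len(header.group(1)), header.group(2)))
        elif (
            fence
            and fence.group(1)[0] == open_fence[0]
            and len(fence.group(1)) >= len(open_fence)
            and line[fence.end():].strip() == ""
        ):
            # A closing fence uses the same character, is at least as long
            # as the opening one, and carries no info string.
            open_fence = None
    return headers
```

With the example above, only `# Header` would be reported, and the comment lines inside the Python block would be left alone.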
The pre-commit hook sometimes crashes when run from the Windows file system within Windows Subsystem for Linux (WSL):
Update markdown table-of-contents.............................................Failed
- hook id: md-toc
- exit code: 1
Traceback (most recent call last):
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/__main__.py", line 35, in main
result = args.func(args)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/cli.py", line 77, in write_toc
write_strings_on_files_between_markers(
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/api.py", line 141, in write_strings_on_files_between_markers
write_string_on_file_between_markers(f, strings[file_id], marker, newline_string)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/md_toc/api.py", line 83, in write_string_on_file_between_markers
fpyutils.filelines.remove_line_interval(
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/fpyutils/filelines.py", line 242, in remove_line_interval
with atomic_write(output_file, overwrite=True) as f:
File "~/.asdf/installs/python/3.10.4/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 169, in _open
self.commit(f)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 202, in commit
replace_atomic(f.name, self._path)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 99, in replace_atomic
return _replace_atomic(src, dst)
File "~/.cache/pre-commit/repogo0t_l7x/py_env-python3.10.4/lib/python3.10/site-packages/atomicwrites/__init__.py", line 55, in _replace_atomic
os.rename(src, dst)
OSError: [Errno 18] Invalid cross-device link: '/mnt/c/Users/<username>/source/repos/scribemd-uwp/tmpkpv7awhg' -> 'README.md'
I am not yet certain under what circumstances this issue occurs; if it's happening to others I suggest starting by re-running the hook.
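Errno 18 (`EXDEV`) means `os.rename` was asked to move a file across what the operating system treats as different filesystems. One common mitigation, shown here as an untested sketch for the WSL case, is to fall back to a copy when the rename fails that way. The function name `replace_with_fallback` is hypothetical and this is not how atomicwrites or fpyutils behave.

```python
import errno
import os
import shutil


def replace_with_fallback(src, dst):
    """Move src over dst, copying instead when the rename fails with EXDEV."""
    try:
        # Atomic when both paths live on the same filesystem.
        os.replace(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Fallback is NOT atomic: copy the contents, then drop the temp file.
        shutil.copyfile(src, dst)
        os.remove(src)
```

The trade-off is that the fallback path gives up the atomicity guarantee, so an interruption during the copy could still leave a partial file.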
First of all, I love the work you have put into this package.
Now to the issue: I seem to be getting some errors when trying to render a TOC.
File ".../site-packages/md_toc/cmark/inlines_c.py", line 2087, in _cmark_subject_find_special_char
if SPECIAL_CHARS[ord(subj.input.data[n])] == 1:
IndexError: list index out of range
Of course I dug in, and found out that special characters throw this error. In my case there was an en-dash character in the header, which is most commonly used to denote a range of values.
So I checked the cmark/inlines_c.py file and saw a list of SPECIAL_CHARS that just isn't long enough to support these characters. See below all the dash-like characters and their ord() numbers:
| key | name | ord() |
|---|---|---|
| ‐ | hyphen | 8208 |
| − | minus | 8722 |
| – | en-dash | 8211 |
| — | em-dash | 8212 |
| - | hyphen-minus | 45 |
I see there is a fixme in the code, checking for this problem, but it only seems to work if it is the first character of the header.
These characters are more common when text is copied from external text editors like Microsoft Word, which converts the hyphen-minus automatically to the other characters if it finds the need for it.
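The failure mode can be illustrated with a bounds-checked lookup. The table below is a stand-in with illustrative contents, not the real `SPECIAL_CHARS` from `cmark/inlines_c.py`, and treating every code point beyond the table as non-special is an assumption about the intended behaviour.

```python
# Hypothetical 256-entry table: 1 marks a character that needs special handling.
SPECIAL_CHARS = [0] * 256
for _ch in "*_`&[]<!\\\n\r":
    SPECIAL_CHARS[ord(_ch)] = 1


def is_special_char(ch):
    """Look up ch in the table without indexing past its end."""
    code_point = ord(ch)
    # Code points such as 8211 (en-dash) fall outside the table; they are
    # treated as ordinary characters instead of raising IndexError.
    return code_point < len(SPECIAL_CHARS) and SPECIAL_CHARS[code_point] == 1
```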
I'm unable to find anything related to that in the documentation. Is it possible to ignore certain headers?
And thanks for this nice tool
Issue
`md_toc.api.build_anchor_link`'s `header_duplicate_counter` parameter is correctly stated as a `dict` in the function's docstring, but incorrectly stated as a `str` in the parameter's type annotation.
Suggested Fix
Change the `header_duplicate_counter` parameter's type annotation to `dict`.
The first feature that is blocking my implementation of the pre-commit hook is that it would be nice if we didn't have to increment the number in the markdown.
In other words instead of
1. FIRST ITEM
2. SECOND ITEM
3. THIRD ITEM
it would be nice if we could use the standard listing of
1. FIRST ITEM
1. SECOND ITEM
1. THIRD ITEM
so that GitHub or other viewers generate the numbering when rendering the README Markdown, and adding a new section won't pollute the git blame for the other sections / line numbers.
If there is a way to generate these lists with the current args, please let me know.
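The requested output could be produced along these lines. This is only a sketch of the idea; `ordered_toc_lines` and its `constant_marker` parameter are hypothetical names, not existing md-toc options.

```python
def ordered_toc_lines(titles, constant_marker=True):
    """Build ordered-list lines, optionally numbering every item as 1."""
    lines = []
    for index, title in enumerate(titles, start=1):
        # Markdown renderers renumber ordered lists, so a constant "1."
        # keeps diffs small when an item is inserted.
        number = 1 if constant_marker else index
        lines.append(f"{number}. {title}")
    return lines
```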
λ md_toc --help
Traceback (most recent call last):
File "d:\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\Anaconda3\Scripts\md_toc.exe\__main__.py", line 5, in <module>
File "d:\anaconda3\lib\site-packages\md_toc\__init__.py", line 23, in <module>
from .api import (
File "d:\anaconda3\lib\site-packages\md_toc\api.py", line 25, in <module>
import curses.ascii
File "d:\anaconda3\lib\curses\__init__.py", line 13, in <module>
from _curses import *
ModuleNotFoundError: No module named '_curses'
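The `_curses` extension is not shipped with CPython on Windows, which is why the import fails. Assuming the module is only needed for ASCII character classification such as `curses.ascii.ispunct` (an assumption about md-toc's usage, not something the traceback shows), a portable replacement could look like this sketch; `is_ascii_punctuation` is a hypothetical name.

```python
import string


def is_ascii_punctuation(ch):
    """Portable stand-in for curses.ascii.ispunct on a one-character string."""
    # string.punctuation holds the 32 ASCII punctuation characters.
    return len(ch) == 1 and ch in string.punctuation
```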
According to the documentation, `--newline-string` is:
the string used to separate the lines of the TOC. Use quotes to delimit the string. Defaults to '\n'
However, on Windows, `--newline-string` actually defaults to `'\r\n'`, which is desirable and simply needs to be documented. The larger problem is that setting `--newline-string '\n'` has no effect.
Consequently, if you run the mixed-line-ending hook with `--fix=lf` after md-toc, it will fail if md-toc was fed any Markdown files containing `<!--TOC-->`. This remains true no matter how many times you run the hooks, which can be disorienting since mixed-line-ending outputs "fixed mixed line endings." To facilitate debugging, pre-commit hooks should fail whenever they modify any file, even if only to change the line endings.
I am on version 8.0.1 of md-toc.
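One possible cause, which is an assumption and has not been checked against md-toc's code, is Python's newline translation: a file opened in text mode with the default `newline=None` rewrites every `'\n'` as `os.linesep` on write, which is `'\r\n'` on Windows. The sketch below shows how passing `newline=""` disables that translation; `write_toc_lines` is a hypothetical helper.

```python
def write_toc_lines(path, lines, newline_string="\n"):
    """Write lines joined by newline_string, with no platform translation."""
    # newline="" tells open() to write line endings exactly as given.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(newline_string.join(lines) + newline_string)
```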
I forgot to add a `#` character in the `build_toc_line` method. Without this, local links are broken.
Files might get corrupted if an in-place write is interrupted.
Implement atomic writing to avoid file corruption.
Building the TOC for a single file used to be really easy:
index[filename] = md_toc.build_toc(filename, keep_header_levels=6, parser='github')
Now you have to awkwardly wrap everything in lists:
index[filename] = md_toc.build_multiple_tocs([filename], keep_header_levels=6, parser='github')[0]
For some reason, when I run md-toc as a pre-commit hook in some CI platforms, `-p` does not work (because those platforms don't allow hooks to change files).
But if I take `-p` out, the tool passes with status 0 (it should have been 1).
Is it possible to implement a "checking" mode?
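A checking mode could compare the TOC already in the file with a freshly generated one and report the result through the exit status, without writing anything. The sketch below assumes the TOC sits between two `<!--TOC-->` markers; `toc_is_up_to_date` and `check_exit_code` are hypothetical names, not md-toc functions.

```python
def toc_is_up_to_date(document, expected_toc, marker="<!--TOC-->"):
    """Return True when the text between the first two markers matches."""
    parts = document.split(marker)
    if len(parts) < 3:
        # Fewer than two markers: there is no TOC to compare against.
        return False
    return parts[1].strip() == expected_toc.strip()


def check_exit_code(document, expected_toc, marker="<!--TOC-->"):
    """Map the comparison onto a process exit status (0 = unchanged)."""
    return 0 if toc_is_up_to_date(document, expected_toc, marker) else 1
```

A CI job could then pass the returned value to `sys.exit` so that an outdated TOC fails the build without modifying the file.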
Here I've used `<small>` and `_` styling: https://github.com/flyte/pi-mqtt-gpio/blob/d49d599c6f2f04ba0c6c31c214fd0f49746cc170/config-doc.md
Here I've only used the `_` styling: https://github.com/flyte/pi-mqtt-gpio/blob/e9be6b2056a2bedca8189729abd15e7ed645b839/config-doc.md
And here I've removed the `_` styling and it works: https://github.com/flyte/pi-mqtt-gpio/blob/c9227fc9bc3abc2e87cb02a034acaa7b972fd60f/config-doc.md
First, thank you for creating and maintaining this library.
We use it in Ansible, so we have written an Ansible role around this library.
Today we generate the MD and write to disk in one Ansible task. We then run `md_toc` on the file in a second task (in Python).
It works fine, but we would like to speed up the process by collapsing the two tasks in python and only write the final output to disk.
If you support the idea, I can provide a PR. I would move the parsing code to a separate function and call that from existing and new APIs. New API could use StringIO to create a stream.
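The proposed refactor might take roughly this shape: one core function that accepts any iterable of lines, plus thin wrappers for files and in-memory strings. All three function names are hypothetical, and the header detection is deliberately simplistic (ATX headers only, no code-fence handling).

```python
import io
import re

ATX_HEADER = re.compile(r"^(#{1,6})[ \t]+(.*\S)")


def headers_from_stream(stream):
    """Core parser: accepts any iterable of lines (file, StringIO, list)."""
    matches = (ATX_HEADER.match(line) for line in stream)
    return [(len(m.group(1)), m.group(2)) for m in matches if m]


def headers_from_file(path):
    """Existing-style entry point: read from disk, delegate to the core."""
    with open(path, encoding="utf-8") as f:
        return headers_from_stream(f)


def headers_from_string(text):
    """New-style entry point: wrap the text in a stream, no disk access."""
    return headers_from_stream(io.StringIO(text))
```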
Md-toc doesn't seem to parse correctly if you have headings that go from h1 to h3, h2 to h4, h3 to h5, etc. For example, md-toc will omit the line that says compatibility in the following markdown:
## Known Issues
#### Compatibility
We could probably agree that people shouldn't write markdown like this, but in my testing, it seems other websites / TOC extensions are handling this situation.
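One way to tolerate skipped levels is to derive indentation from how many ancestor headers are currently open rather than from the raw header level. This is a sketch of that idea, not md-toc's algorithm; `nesting_depths` is a hypothetical helper.

```python
def nesting_depths(levels):
    """Map header levels to nesting depths that never jump by more than one."""
    stack = []   # levels of the headers that are currently "open" ancestors
    depths = []
    for level in levels:
        # Close every ancestor that is at the same level or deeper.
        while stack and stack[-1] >= level:
            stack.pop()
        depths.append(len(stack))
        stack.append(level)
    return depths
```

For the example above, levels `[2, 4]` would give depths `[0, 1]`, so "Compatibility" would be listed as a direct child of "Known Issues" instead of being dropped.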
Check if md-toc is affected by the new changes: https://github.com/vmg/redcarpet/releases/tag/v3.5.1
In 5.0.0, the first line of a code fence is being parsed for purposes of header coherence. This results in a `TocDoesNotRenderAsCoherentList` error when using the API, and a truncated TOC when using the CLI.
API:
[INFO] Generating table of contents for /mnt/infrastructure/docs/ingressroute.md
Traceback (most recent call last):
File "src/generate.py", line 325, in <module>
main()
File "src/generate.py", line 321, in main
_generate_table_of_contents()
File "src/generate.py", line 230, in _generate_table_of_contents
index[filename] = md_toc.build_toc(filename, keep_header_levels=6, parser='github')
File "/usr/local/lib/python3.7/site-packages/md_toc/api.py", line 213, in build_toc
raise TocDoesNotRenderAsCoherentList
md_toc.exceptions.TocDoesNotRenderAsCoherentList
CLI, with minimal test case:
[dharmab@n7 md-toc]$ cat test1.md
# Foo1
## Foo2
### Foo3
```
#!/bin/bash
code
```
### Foo4
[dharmab@n7 md-toc]$ md_toc test1.md github
- [Foo1](#foo1)
- [Foo2](#foo2)
- [Foo3](#foo3)
[dharmab@n7 md-toc]$ cat test2.md
# Foo1
## Foo2
### Foo3
```
code
#!/bin/bash
code
```
### Foo4
[dharmab@n7 md-toc]$ md_toc test2.md github
- [Foo1](#foo1)
- [Foo2](#foo2)
- [Foo3](#foo3)
- [Foo4](#foo4)
Originally posted by frnmst April 10, 2024
Note: `md_toc.build_toc(...)` is now `md_toc.api.build_toc(...)`
In some cases links are not working.
See https://github.com/frnmst/ideas/blob/master/user_interfaces/voice/kalliope-project/neurons.md#0-1-name for an example
Find a source to check for all these special cases.
A common use case seems to be skipping to the first TOC marker. Instead of making the end user increment the SKIP_LINE number, it would be nice if we could specify a special value (like -1) to automatically skip to after the first TOC marker. You could even parameterize this so that -2 represents the second marker, and so on. It seems like a useful utility that would save people from editing the args for the pre-commit hook any time the text above the TOC (badges, project descriptions, etc.) is updated.
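The negative-value convention described here could be resolved into a concrete line count before parsing starts. A minimal sketch, assuming a `<!--TOC-->` marker; `resolve_skip_lines` is a hypothetical helper, not an md-toc function.

```python
def resolve_skip_lines(lines, skip_lines, marker="<!--TOC-->"):
    """Turn a negative skip value into the line number of the nth marker."""
    if skip_lines >= 0:
        return skip_lines  # plain values keep their current meaning
    wanted = -skip_lines   # -1 -> first marker, -2 -> second marker, ...
    seen = 0
    for number, line in enumerate(lines, start=1):
        if marker in line:
            seen += 1
            if seen == wanted:
                return number  # skip everything up to and including it
    raise ValueError(f"marker occurrence {wanted} not found")
```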
Hello again! Just noticed that when the last character of a title is an (admittedly useless) space, the generated anchor is wrong (tested on GitHub).
import md_toc
open("test.md", "w").write("""
# title
## title with a trailing space
""")
print(md_toc.build_toc('test.md'))
Result:
- [title](#title)
- [title with a trailing space ](#title-with-a-trailing-space-)
Maybe a `.strip()` somewhere would fix the issue?
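The effect of the suggested `.strip()` can be seen in a simplified anchor builder. This is a rough approximation for illustration only; it is neither GitHub's nor md-toc's actual anchor algorithm, and `simple_anchor` is a hypothetical name.

```python
import re


def simple_anchor(title):
    """Approximate a header anchor: trim, lowercase, drop punctuation."""
    text = title.strip().lower()          # the strip() removes the stray space
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, spaces
    return text.replace(" ", "-")
```

Without the `strip()` call, the trailing space would survive and become the trailing hyphen reported above.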
The generation of an ordered TOC seems broken. Take for instance the example of the README:
import md_toc
open("test.md", "w").write("""
# this
## is
## a
### foo
#### booo
### foo
## file
## bye
# bye
""")
print(md_toc.build_toc('test.md', ordered=True))
Result:
AssertionError Traceback (most recent call last)
<ipython-input-11-e23380fd2495> in <module>
15 """)
16
---> 17 print(md_toc.build_toc('test.md', ordered=True))
/anaconda3/lib/python3.7/site-packages/md_toc/api.py in build_toc(filename, ordered, no_links, no_indentation, no_list_coherence, keep_header_levels, parser, list_marker, skip_lines)
228 compute_toc_line_indentation_spaces(
229 header_type_curr, header_type_prev, parser, ordered,
--> 230 list_marker, indentation_log, index)
231 no_of_indentation_spaces_curr = indentation_log[
232 header_type_curr]['indentation spaces']
/anaconda3/lib/python3.7/site-packages/md_toc/api.py in compute_toc_line_indentation_spaces(header_type_curr, header_type_prev, parser, ordered, list_marker, indentation_log, index)
420 if ordered:
421 assert list_marker in md_parser[parser]['list']['ordered'][
--> 422 'closing markers']
423 else:
424 assert list_marker in md_parser[parser]['list']['unordered'][
AssertionError:
Passing `True` gives the expected result.
Thanks for this work!
Check if md-toc is affected by the new changes: https://spec.commonmark.org/0.30/changes.html
If the marker passed to `write_string_on_file_between_markers` is at the end of the file, a `LineOutOfFileBoundsError` is raised and the marker is erased from the file.
[INFO] Generating README table of contents
Traceback (most recent call last):
File "src/generate.py", line 321, in <module>
main()
File "src/generate.py", line 317, in main
_generate_table_of_contents()
File "src/generate.py", line 261, in _generate_table_of_contents
md_toc.write_string_on_file_between_markers('/mnt/infrastructure/README.md', readme_toc, marker='[](TOC)')
File "/usr/local/lib/python3.7/site-packages/md_toc/api.py", line 70, in write_string_on_file_between_markers
append=False)
File "/usr/local/lib/python3.7/site-packages/fpyutils/filelines.py", line 148, in insert_string_at_line
raise LineOutOfFileBoundsError
See https://docs.gitlab.com/ee/user/markdown.html and https://github.com/gjtorikian/commonmarker
This means that `gitlab` will not be an alias of `redcarpet` but most probably of `cmark` instead.
Check if md-toc is affected by the new changes: https://spec.commonmark.org/0.29/changes.html
Hello
I have packaged md-toc for Debian, but the package was rejected because some license declarations aren't correct.
I assume that you wanted to use GPLv3+ for these files instead of GPLv5+ which doesn't exist:
asciinema/md_toc_asciinema_1_0_0_demo.sh
asciinema/md_toc_asciinema_2_0_0_demo.sh
asciinema/md_toc_asciinema_3_0_0_demo.sh
asciinema/md_toc_asciinema_3_1_0_demo.sh
asciinema/md_toc_asciinema_5_0_0_demo.sh
asciinema/md_toc_asciinema_6_0_0_demo.sh
asciinema/md_toc_asciinema_7_0_0_demo.sh
asciinema/md_toc_asciinema_7_1_0_demo.sh
If that is the case, you can apply the attached patch.
Otherwise please change it to the license which you had in mind.
Thanks and greetings
Sakirnth
The `api.compute_toc_line_indentation_spaces` function returns the wrong number of spaces in certain situations, using commonmark (github) as the parser.
For example, with `$ cat test.md`:
# hi
## ho
### hw
# h1
`$ md_toc test.md github` returns:
- [hi](#hi)
- [ho](#ho)
- [hw](#hw)
- [h1](#h1)
instead of:
- [hi](#hi)
- [ho](#ho)
- [hw](#hw)
- [h1](#h1)