Git Product home page Git Product logo

Comments (6)

pawamoy avatar pawamoy commented on June 18, 2024 1

I'll need a bit more time to answer on mkdocstrings/autorefs#37, but the crux of it, so you know where I'm headed to, is the following:

  • I want to hook autorefs and handlers together. The most elegant way I see to do that is by accepting a new parameter in mkdocstrings' convert_markdown Jinja filter: metadata (or similar name).
  • The convert_markdown filter will pass this object down to the Markdown extension registered by autorefs.
  • This metadata object must implement a specific interface, so that autorefs' extension knows how to use it.
  • Its interface will require methods like resolve(identifier) or lineno(), etc., allowing to get additional metadata for a given docstring, and inject it in the spans generated by autorefs' extension.
  • The object will be passed from Jinja templates, which have access to the collected objects.
  • Therefore, no particular binding required between handlers and autorefs, like we already have for ids (by the way this new solution could replace this binding too).
jinja template with access to the object being rendered
    convert_markdown(object.docstring, object.metadata, **other_options)
        md.processors["autorefs"].metadata = metadata
        md.convert(docstring)
            autorefs extension working on unresolved links
                metadata.resolve(relative_id)
                metadata.lineno()
                inject metadata in span with HTML attributes

in autorefs, when transforming spans to HTML links
    use metadata accordingly

What this means is that the handler/extractor could then decide how to compute metadata, and what inputs it supports, meaning that your suggestion could actually be supported 🙂 For Python specifically, Griffe would probably be responsible for this metadata object, and since it supports extension, I can imagine an extension for your syntax suggestion, and other ones for the other suggested syntaxes.

Additionally, autorefs itself could propose a generic syntax, that handlers/extractors could choose to support or not, paving the way to a unified way of writing (relative/cross-language) references.

from griffe.

the-13th-letter avatar the-13th-letter commented on June 18, 2024 1

Before answering, let me try and explain how cross-references work.

  • The autorefs plugin registers a Markdown extension in the on_config hook.
  • When the mkdocstrings-python handler converts docstrings from Markdown to HTML, the extension registered by autorefs kicks in and transforms Markdown links [title][ref] into HTML spans with a data-autorefs-identifier attribute. It only transforms links that were left untouched by Python-Markdown itself because they didn't have their URL set.
  • Once all pages have been converted to HTML, the autorefs plugin searches for autorefs spans and replaces them with actual links, using an internal URL map that was built through previous events, and querying existing handlers for equivalent identifiers when it doesn't know one.

OK, got it.

Now for my answer.

If we wanted to implement your suggestion, we would have to either:

  • hook the autorefs Markdown extension and the Python handler together somehow, so that the extension can know what object is being rendered, and use it to resolve object references.
  • or as you suggest, have Griffe somehow inject additional information in Markdown links in docstrings

The second option doesn't look good to me, because:

  • modifying (not just extending) docstrings is dirty: we can tolerate it but only in last recourse
  • while the regular expression is not complex (we can reuse Python-Markdown's regex), it feels like we're short-circuiting the Markdown-to-HTML converter, when really all Markdown-to-HTML logic (even if just pre-processing) should live there and not in Griffe. Griffe is supposed to be both markup agnostic (Markdown/reStructuredText/Asciidoc/whatever) and SSG agnostic (MkDocs/Quarto/etc.). We do have a tiny bit of Markdown logic in our docstrings parsers, to avoid interpreting a few things in code blocks delimited by triple-backticks fences, but I would very much like to avoid adding more.

I'm not particularly hung up on implementing this in Griffe instead of in mkdocstrings-python; sorry if it seemed that way. I did feel however that implementing this entirely in autorefs, in a language-independent manner, leads to the aforementioned unaesthetic syntaxes and would require Python parsing knowledge (or at least scoping knowledge) in a system that is supposed to be language agnostic. Since Griffe already parses this information it made sense to me to place this related functionality there as well. Of course, given the coupling of mkdocstrings-python with Griffe, I imagine the effort to implement this in mkdocstrings-python instead is probably of similar magnitude.

So that leaves us with option 1. I have read mkdocstrings/autorefs#37 again and came to the conclusion that it's my preferred solution going forward. I'll expand there on the general debate over generic vs handler-specific handling of references, and stay focused here on the syntax you propose.

While I agree that using the current object's scope to resolve partial paths (like configuration.get_config mentioned above) looks good, and is consistent with what we do for types in docstrings, I worry about a few things.

First, the restrictions it brings. If we are to rely on the object's scope to resolve a partial path, that means the object we reference must actually exist in the given scope (either at runtime or just in the sources, guarded by if TYPE_CHECKING or similar conditions). For example, if you want to reference a function in a sibling module, you have to actually import this function, or its parent module, to be able to cross-ref it using its path. That will definitely not play well with linters/formatters that remove unused imports, since they wouldn't be able to detect that these imports are made to allow cross-refs in docstrings.

I'm inclined to say that having to actually import an object (or its containing module) to reference it in a short manner is a feature of this system. Or rather, if your actual code uses this name so little that you don't care enough to import it, then it's also very probably not important enough for you to want to document it using only a short local name.

And of course, there's always the option of overriding your linter/formatter in this case to keep the name. Or file this as a bug/wishlist feature for the linter/formatter to detect. I understand that it is currently treated as a bug on the author's side and not on the tool's side… but I don't think that that's actually true, or that it would necessarily stay this way in the long run.

Also, I'm totally fine with this being available only as an extension/a plugin that users need to explicitly opt in to.

Second, and as you mentioned, there could be cases where the reference becomes ambiguous, because it's a complete path but this path also exists in the current scope. For example:

from official import thing
import third_party as official

def func():
    """Link to [official][] package.""""

Here it's unclear whether official refers to the official external package, or to the third_party external package. It would still be unclear if we removed the line from official import thing. I don't want to bake such uncertainty into the design.

Yeah, well, that's just asking for trouble, and You Really Shouldn't Do That™. And if you actually Really Do That, then woe be to you for writing such horribly readable code. In this particular case I would probably link to the renamed third_party module, and maybe warn but otherwise accept that it is now impossible to refer to the official module even in absolute package names. You get what you deserve asked for.

There's only one scenario I can think of where I would tolerate this kind of thing: six and similar monkey patching of renamed, removed or otherwise maybe-non-existing functionality. But even there, tracing the value of the local name is unreliable enough that I don't feel bad for imposing the usage of full path names for cases like this.

(Also, during my last attempt at this type of monkey patching otherwise missing functionality—providing a poor substitute for sortedcontainers.SortedDict if sortedcontainers was unavailable—I spent almost all of my time fighting with the type checker instead. I don't think that fight would have been any less miserable if only my docstring formatter had allowed me to use short-form relative cross-references…)

On a related note, I would be fine with this being available only as an extension/a plugin that users need to explicitly opt in to.

The point of readability is important, but IMO the most readable references will always be the fully typed ones. Relative references, whether they use the current scope (not visible when displaying help(function)) or dots as prefix (requiring some mental gymnastics), will always be less readable. I would argue that at least the "dot-prefix" case allows to understand what is referred without printing and reading actual source code, contrary to the "scope" case.

Clearest, yes. Most readable, I dunno; I think that depends a lot on how unwieldy the full path is, and even if the package is known under a standard alternate name such as np for NumPy. I also believe it is permissible in writing to rely on context to disambiguate things that would on paper be slightly ambiguous. We do this all the time, particularly in technical writing and speaking.1 It also requires some discipline to use responsibly.

In the abovementioned case, yes, the (full) scope is not visible to the reader of the docstring, and it would be prudent of the author of this docstring to not relative-reference names that can't be clearly matched to enclosing scopes or parent modules. But I'm a strong believer of treating the user as an adult competent and not second-guessing or belittling them,23 and I therefore really, really think that there should be some way for authors to tell mkdocs+autorefs+mkdocstrings-python+griffe “Yes, when I don't specify a full path, I absolutely 100% mean the Python object currently visible at that path. And I accept full responsibility for any messup because of wrong or non-existing objects at that path, and for unaddressable object paths.“.

And concerning the “dot-prefix“ syntax, I don't believe that the ability to be able to decode what the prefix refers to is that much consolation. For both systems, if you use them where you ought to better have used an absolute cross-reference, you'll probably be bogged down by trying to decode just what in the world that crossref is referencing. If they are both used where they make sense, then the “scope“ version probably needs less lexical parsing than the “dot-prefix“ version. I imagine that you could get used to both of them, given enough exposure. That said, from my own experience with (not) using Python's relative import syntax, I'm pretty sure that I would need to do calculations and double-checks to make sure my identifiers are correct in the “dot-prefix“ version. I really can't imagine anyone needing that for the “scoped” version but not needing it for the “dot-prefix” version.

Again, I realize that I am asking for a certain implementation that only works for some kinds of relative cross-references, that only works correctly4 for some kinds of relative crossrefs, and that there is a lot of potential both for ambiguity in what the user actually meant and for the reference texts to be non-sensical to readers consulting the raw docstring without having access to the rest of the source code (e.g. help(...)). If this only ever makes it to a non-default, configurable option, or perhaps an extension/a plugin, I'd be happy with that too.

Also, btw, did I perhaps ever mention that I'm completely fine with this being available only as an extension/a plugin that users need to explicitly opt in to…?

Footnotes

  1. Git the DVCS vs. git the slang term for other people. Python the language vs. python the animal. Keys in database systems vs. keys in cryptography. Et cetera.

  2. I'm told this is part of the UNIX philosophy, and supposedly the main reason why commands like rm do not by default prompt you if you really want to delete this file you just said you want to delete.

  3. While it is OK to include safeguards against potentially destructive behavior (such as the aforementioned deleting of files), it turns into belittling the user if it is hard or impossible to bypass this safeguard.

  4. …as in, only sometimes resolves the reference the way the user intended it to be resolved.

from griffe.

pawamoy avatar pawamoy commented on June 18, 2024 1

If I understand correctly, you're fine with this being available only as an extension... 😀

Good then, this is likely what will happen 🙂

treating the user as an adult competent

Just a quick maintainer perspective: there can be a lot of less-advanced users that will trip over anything that can be tripped over. That generally means more user support falling down onto maintainers (helping them, turning down feature requests, closing invalid bugs, writing non-confusing documentation, putting more efforts into warning messages and error handling). That's something to take into account.

from griffe.

pawamoy avatar pawamoy commented on June 18, 2024 1

Also, thanks again for your suggestion. Using the object's scope is something I didn't think about before and it's a nice idea 😄

from griffe.

pawamoy avatar pawamoy commented on June 18, 2024

Thanks for this detailed feature request @the-13th-letter 🙂 I appreciate the thoroughness and the honest opinions.

Before answering, let me try and explain how cross-references work.

  • The autorefs plugin registers a Markdown extension in the on_config hook.
  • When the mkdocstrings-python handler converts docstrings from Markdown to HTML, the extension registered by autorefs kicks in and transforms Markdown links [title][ref] into HTML spans with a data-autorefs-identifier attribute. It only transforms links that were left untouched by Python-Markdown itself because they didn't have their URL set.
  • Once all pages have been converted to HTML, the autorefs plugin searches for autorefs spans and replaces them with actual links, using an internal URL map that was built through previous events, and querying existing handlers for equivalent identifiers when it doesn't know one.

Now for my answer.

If we wanted to implement your suggestion, we would have to either:

  • hook the autorefs Markdown extension and the Python handler together somehow, so that the extension can know what object is being rendered, and use it to resolve object references.
  • or as you suggest, have Griffe somehow inject additional information in Markdown links in docstrings

The second option doesn't look good to me, because:

  • modifying (not just extending) docstrings is dirty: we can tolerate it but only in last recourse
  • while the regular expression is not complex (we can reuse Python-Markdown's regex), it feels like we're short-circuiting the Markdown-to-HTML converter, when really all Markdown-to-HTML logic (even if just pre-processing) should live there and not in Griffe. Griffe is supposed to be both markup agnostic (Markdown/reStructuredText/Asciidoc/whatever) and SSG agnostic (MkDocs/Quarto/etc.). We do have a tiny bit of Markdown logic in our docstrings parsers, to avoid interpreting a few things in code blocks delimited by triple-backticks fences, but I would very much like to avoid adding more.

So that leaves us with option 1. I have read mkdocstrings/autorefs#37 again and came to the conclusion that it's my preferred solution going forward. I'll expand there on the general debate over generic vs handler-specific handling of references, and stay focused here on the syntax you propose.

While I agree that using the current object's scope to resolve partial paths (like configuration.get_config mentioned above) looks good, and is consistent with what we do for types in docstrings, I worry about a few things.

First, the restrictions it brings. If we are to rely on the object's scope to resolve a partial path, that means the object we reference must actually exist in the given scope (either at runtime or just in the sources, guarded by if TYPE_CHECKING or similar conditions). For example, if you want to reference a function in a sibling module, you have to actually import this function, or its parent module, to be able to cross-ref it using its path. That will definitely not play well with linters/formatters that remove unused imports, since they wouldn't be able to detect that these imports are made to allow cross-refs in docstrings.

Second, and as you mentioned, there could be cases where the reference becomes ambiguous, because it's a complete path but this path also exists in the current scope. For example:

from official import thing
import third_party as official

def func():
    """Link to [official][] package.""""

Here it's unclear whether official refers to the official external package, or to the third_party external package. It would still be unclear if we removed the line from official import thing. I don't want to bake such uncertainty into the design.

The point of readability is important, but IMO the most readable references will always be the fully typed ones. Relative references, whether they use the current scope (not visible when displaying help(function)) or dots as prefix (requiring some mental gymnastics), will always be less readable. I would argue that at least the "dot-prefix" case allows to understand what is referred without printing and reading actual source code, contrary to the "scope" case.

from griffe.

the-13th-letter avatar the-13th-letter commented on June 18, 2024

If I understand correctly, you're fine with this being available only as an extension... 😀

Good then, this is likely what will happen 🙂

Thanks :)

treating the user as an adult competent

Just a quick maintainer perspective: there can be a lot of less-advanced users that will trip over anything that can be tripped over. That generally means more user support falling down onto maintainers (helping them, turning down feature requests, closing invalid bugs, writing non-confusing documentation, putting more efforts into warning messages and error handling). That's something to take into account.

Surely. And maybe my view on this will change once I have done some (serious) software maintenance work. But as of now, I am shaped much more by my repeated experiences at $WORK with software that never lets you outgrow the “you're probably new to this, let me hide any dangerous settings that would only confuse you” phase, because it is programmed that way or because the administrative policy demands it that way. Which may or may not show in what I feature-request, and how.

from griffe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.