Comments (8)

picnixz commented on June 12, 2024

Thank you for the very thorough investigation (more than I asked for, actually). Personally, I'd prefer not to change anything on the inventory side and to fix this on the resolver side instead, as you suggested:

"I'd be happy to do an alternative PR to modify _resolve_reference_in_domain_by_target if that solution is preferable to normalizing the inventory on read."

Maybe we will change that later again if there are still issues.

from sphinx.

electric-coder commented on June 12, 2024

For reference:

Identifier Normalization

Docutils adds a normalization by downcasing

I'm not really sure what you're proposing here? That Sphinx should internally postpone the normalization step to process objects.inv that might not be normalized? But isn't the point of objects.inv files to have links that are URL normalized already?

goerz commented on June 12, 2024

I don't know what Docutils has to do with this or what the context of the "Identifier Normalization" is. Certainly, Sphinx does not generally use lowercase names or lowercase anchors in its URLs. For example, a random line from Sphinx' inventory is

sphinx.builders.dirhtml.DirectoryHTMLBuilder.format py:attribute 1 usage/builders/index.html#$ -

which describes the attribute at https://www.sphinx-doc.org/en/master/usage/builders/index.html#sphinx.builders.dirhtml.DirectoryHTMLBuilder.format (note the uppercase anchor name).
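To make the mapping from that inventory line to the URL concrete, here is a minimal sketch of how one record could be parsed. It assumes the v2 record layout (name, domain:role, priority, URI, display name) and the convention that a trailing `$` in the URI stands for the name. The function name `parse_inventory_line` and the regular expression are illustrative choices here, not necessarily what Sphinx itself uses.

```python
import re

# Assumed layout of one record: name, "domain:role", priority, URI, display name.
# The name may contain spaces, so it is matched lazily up to the objtype field.
LINE_RE = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


def parse_inventory_line(line, base_url):
    """Parse one inventory record (hypothetical helper, for illustration)."""
    match = LINE_RE.match(line.rstrip())
    if match is None:
        raise ValueError(f"not a valid inventory record: {line!r}")
    name, objtype, priority, location, dispname = match.groups()
    if location.endswith("$"):
        # A trailing "$" abbreviates the object name; note that no case
        # folding happens here, so the anchor keeps the name's case.
        location = location[:-1] + name
    if dispname == "-":
        # "-" means the display name is identical to the name.
        dispname = name
    return name, objtype, int(priority), base_url + location, dispname


record = (
    "sphinx.builders.dirhtml.DirectoryHTMLBuilder.format "
    "py:attribute 1 usage/builders/index.html#$ -"
)
print(parse_inventory_line(record, "https://www.sphinx-doc.org/en/master/")[3])
```

The point of the sketch is only that the anchor is taken from the record verbatim, including its uppercase characters.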

Just for section headers in particular, Sphinx happens to "sluggify" them to lowercase names, and that's how they get written to the inventory. There's nothing in particular that specifies that choice of sluggification method, and other systems (like Documenter) have a different sluggification that preserves case. There is no "normalization" happening when writing inventories.

Sphinx chooses to "normalize" e.g., :ref:`Syntax` to :ref:`syntax` , due to 'ref': XRefRole(lowercase=True, …). The ref and numref roles are the only two that do this. That gets handled correctly for local references, but with Intersphinx, the problem is that it then tries to look up the lowercase "syntax" (for example) directly in the inventory:

    if target in inventory[objtype]:
        # Case sensitive match, use it
        data = inventory[objtype][target]
    elif objtype == 'std:term':
        # Check for potential case insensitive matches for terms only
        target_lower = target.lower()
        insensitive_matches = list(filter(lambda k: k.lower() == target_lower,
                                          inventory[objtype].keys()))
        if insensitive_matches:
            data = inventory[objtype][insensitive_matches[0]]
        else:
            # No case insensitive match either, continue to the next candidate
            continue
    else:
        # Could reach here if we're not a term but have a case insensitive match.
        # This is a fix for terms specifically, but potentially should apply to
        # other types.
        continue
    return _create_element_from_result(domain, inv_name, data, node, contnode)

(line 336). What I'm saying is that the normalization needs to happen in both places: the target, and the inventory[objtype] we're looking it up in.

So you either normalize the keys in inventory["std:label"] when you load the inventory (which this PR does), or you implement a case-insensitive lookup, just like for std:term objtype in the above code (which has a similar problem for other reasons).

I'd be happy to do an alternative PR to modify _resolve_reference_in_domain_by_target if that solution is preferable to normalizing the inventory on read.
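As a rough illustration of the resolver-side option, the sketch below extends the case-insensitive fallback from std:term to std:label. It is a standalone approximation, not the actual Sphinx code: `lookup_target` and `CASE_INSENSITIVE_OBJTYPES` are hypothetical names, the inventory is modeled as a plain nested dict, and the entry tuple is placeholder data.

```python
# Assumption: these are the object types whose targets may already have been
# lowercased by the role before the inventory is consulted.
CASE_INSENSITIVE_OBJTYPES = frozenset({"std:term", "std:label"})


def lookup_target(inventory, objtype, target):
    """Return the inventory entry for target, or None if nothing matches."""
    entries = inventory.get(objtype, {})
    if target in entries:
        # Exact (case-sensitive) match always wins.
        return entries[target]
    if objtype in CASE_INSENSITIVE_OBJTYPES:
        # Fall back to the first key that matches when case is ignored.
        target_lower = target.lower()
        for key, data in entries.items():
            if key.lower() == target_lower:
                return data
    return None


# Placeholder inventory with a mixed-case section label, as a non-Sphinx
# generator might write it.
inventory = {
    "std:label": {
        "Syntax": ("Project", "1.0", "https://example.org/manual/#Syntax", "Syntax"),
    },
}
print(lookup_target(inventory, "std:label", "syntax"))
```

Trying the exact match first keeps existing behavior unchanged for inventories whose labels are already lowercase; the linear scan only runs when that lookup misses.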

"But isn't the point of objects.inv files to have links that are URL normalized already?"

I'm not sure what you mean. But no, objects.inv files do not have any kind of inherent normalization.

picnixz commented on June 12, 2024

Before anything else, I want to understand why the ref and numref roles use lowercasing in the first place... Could you investigate this one?

One reason that I can think of is because of labels that are auto-generated for sections with autosectionlabel but I'm not sure about it. Also, you could perhaps find the commit or related issues concerning capitalization of section titles + references.

goerz commented on June 12, 2024

XRefRole(lowercase=True, …) was introduced in c02b714

Before that: 'ref': make_xref_role(lowercase_link_func, None, nodes.emphasis), introduced in 957be3b

Before that (f82a4a4):

    elif typ == 'ref':
        # reST label names are always lowercased
        target = ws_re.sub('', target).lower()

Ultimately, it comes down to

commit 2e698fcb0962fc42aae233e4b4495b74cbe0b9b6
Author: Georg Brandl <[email protected]>
Date:   Fri Jul 4 14:27:25 2008 +0000

    Merged revisions 64642-64643,64698 via svnmerge from
    svn+ssh://[email protected]/doctools/branches/0.4.x

    ........
      r64642 | georg.brandl | 2008-07-01 23:02:35 +0200 (Tue, 01 Jul 2008) | 2 lines

      #3251: label names are case insensitive.
    ........
      r64643 | georg.brandl | 2008-07-01 23:24:55 +0200 (Tue, 01 Jul 2008) | 2 lines

      Add a note about decorated functions.
    ........
      r64698 | georg.brandl | 2008-07-04 12:21:09 +0200 (Fri, 04 Jul 2008) | 2 lines

      Allow setting current module to None.
    ........

 doc/ext/autodoc.rst        | 11 +++++++++++
 sphinx/directives/other.py |  5 ++++-
 sphinx/roles.py            |  3 +++
 3 files changed, 18 insertions(+), 1 deletion(-)

See also https://mail.python.org/pipermail/python-checkins/2008-July.txt

  • Label names in references are now case-insensitive, since reST label
    names are always lowercased.

At that point, we're entering the pre-git era, so I don't think I can track this much further. The https://svn.python.org/projects/doctools/ site is still around if someone wants to dig deeper.

I'm pretty sure all of this predates any serious Intersphinx capabilities, definitely the v2 inventory format.

It's perfectly fine for label names in references to be case-insensitive "since reST label names are always lowercased."

But label names in inventory files that don't originate from .rst files are not necessarily lowercased, which is why I'm normalizing them in this PR.

goerz commented on June 12, 2024

Maybe I should clarify the core of the issue more concisely, since it's quite narrow and specific to intersphinx:

  • Sphinx chooses to normalize (lowercase) the label in :ref:`label` before resolving it (for whatever reason, I suppose people wanted to write references without worrying about case)
  • An inventory is basically a dict of labels to URLs. Note that inventories are external (user-supplied) data, so we can't make too many assumptions about them, apart from that they parse according to the specification. In particular, if they're not originally generated by Sphinx, they might use mixed-case anchors for section titles.
  • Very generally, if we're looking up label in a dict, and we've normalized label to lowercase, we should also normalize the keys in the dict (or tweak the lookup). That's all this PR does.
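The "normalize the keys" variant from the last bullet could be sketched as follows. This is a simplified model, not the PR's actual code: the inventory is a plain nested dict, `normalize_label_keys` is a hypothetical helper, and keeping the first entry when two labels differ only by case is an arbitrary choice made for this example.

```python
def normalize_label_keys(inventory, objtype="std:label"):
    """Return a copy of inventory with the keys of one objtype lowercased."""
    normalized = dict(inventory)
    entries = inventory.get(objtype)
    if entries is not None:
        lowered = {}
        for key, data in entries.items():
            # setdefault keeps the first entry if two keys collide
            # after lowercasing (an assumption of this sketch).
            lowered.setdefault(key.lower(), data)
        normalized[objtype] = lowered
    return normalized


# Placeholder data: a mixed-case section label from a non-Sphinx generator.
raw = {
    "std:label": {"Syntax": "https://example.org/manual/#Syntax"},
    "py:class": {"Parser": "https://example.org/api/#Parser"},
}
inv = normalize_label_keys(raw)

target = "Syntax".lower()  # what :ref:`Syntax` is turned into before lookup
print(target in raw["std:label"], target in inv["std:label"])
```

Only the dict keys are lowercased; the stored URLs keep their original mixed-case anchors, and other object types such as py:class are left untouched.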

goerz commented on June 12, 2024

That’s fine! I’ll make a second PR for that within the next couple of days.

goerz commented on June 12, 2024

OK, opened the alternative PR #12033

I agree that this is a better solution, as the problem is really the same one that was resolved earlier for :term: references.
