
Comments (13)

abravalheri commented on August 23, 2024

Hi @osamuaoki, have you tried setting readme.content-type to text/plain?

Sorry, wrong hint: this does not seem to solve the problem...

It is probably a "folding" issue with the syntax of the METADATA file. The effect of license.file is to embed all the text from the file into METADATA. Omitting it relies on Setuptools using its standard rules to add the license file to the wheel as a separate file.
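A minimal sketch of what embedding the license text looks like, assuming an 8-space continuation indent (as in the METADATA dump further down this thread). `fold_header_value` and `build_metadata` are hypothetical helpers written for illustration, not Setuptools APIs; parsing uses the stdlib `email` module. Because every continuation line is indented, even blank lines inside the license stay part of the `License:` header.

```python
from email import message_from_string

INDENT = " " * 8  # assumed continuation indent, as seen in the METADATA dump


def fold_header_value(value: str) -> str:
    """Hypothetical helper: indent each continuation line so the email
    parser keeps it inside the same header."""
    return value.replace("\n", "\n" + INDENT)


def build_metadata(name: str, license_text: str, readme: str) -> str:
    """Hypothetical helper: headers, one blank line, then the README body."""
    return (
        "Metadata-Version: 2.1\n"
        f"Name: {name}\n"
        f"License: {fold_header_value(license_text)}\n"
        "Description-Content-Type: text/markdown\n"
        "\n"
        f"{readme}"
    )


license_text = "MIT License\n\nCopyright (c) 2024 Someone"
msg = message_from_string(build_metadata("demo", license_text, "# Demo\n"))

# Undo the folding to recover the original license text.
restored = msg["License"].replace("\n" + INDENT, "\n")
print(restored == license_text)   # True
print(repr(msg.get_payload()))    # '# Demo\n' (the README stays in the body)
```

The blank line inside the license becomes an indented, whitespace-only continuation line, so it does not end the header section.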

from sampleproject.

osamuaoki commented on August 23, 2024

Yeah, I was thinking along the same lines. Although the upload ERROR message initially gave me a different impression, the real issue seems to be the inclusion of LICENSE and the conversion of its data to fit into METADATA.

setuptools seems to accept only a reST-compatible file. (Just an impression; no serious testing has been done yet.)

flit seems to ignore this license entry in pyproject.toml.

From what I vaguely remember, I tried something along these lines:

license = { file="LICENSE.txt", content-type="text/plain" }

It didn't fix this issue.

I don't know whether this is a bug on the build-tool side. The important thing for now is to inform users of a practical workaround, and for someone who understands the situation to ask the other packages to handle it.


jeanas commented on August 23, 2024

I couldn't reproduce it. After replacing the contents of LICENSE.txt with the GPL text, I still get:

$ twine check *
Checking sampleproject-3.0.0-py3-none-any.whl: PASSED
Checking sampleproject-3.0.0.tar.gz: PASSED


abravalheri commented on August 23, 2024

I cannot reproduce it either... This is what I did in detail:

# docker run --rm -it python:3.12-bookworm /bin/bash
cd /tmp
git clone https://github.com/pypa/sampleproject
cd sampleproject
wget https://www.gnu.org/licenses/gpl-3.0.txt -O LICENSE.txt
sed -i 's/name = "sampleproject"/name = "sampleproject-abc"/g' pyproject.toml

python -m pip install -U pip build twine

python -m build
twine check dist/*
# Checking dist/sampleproject-3.0.0-py3-none-any.whl: PASSED
# Checking dist/sampleproject-3.0.0.tar.gz: PASSED

twine upload --verbose --repository testpypi dist/*
# ...
# INFO     Response from https://test.pypi.org/legacy/:
#         200 OK
# ...
# View at:
# https://test.pypi.org/project/sampleproject-abc/3.0.0/


unzip -p dist/sampleproject_abc-3.0.0-py3-none-any.whl sampleproject_abc-3.0.0.dist-info/METADATA > /tmp/METADATA.orig
head -n 20 /tmp/METADATA.orig
# Metadata-Version: 2.1
# Name: sampleproject-abc
# Version: 3.0.0
# Summary: A sample Python project
# Author-email: "A. Random Developer" <[email protected]>
# Maintainer-email: "A. Great Maintainer" <[email protected]>
# License: GNU GENERAL PUBLIC LICENSE
#                                Version 3, 29 June 2007
#
#          Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
#          Everyone is permitted to copy and distribute verbatim copies
#          of this license document, but changing it is not allowed.
#
#                                     Preamble
#
#           The GNU General Public License is a free, copyleft license for
#         software and other kinds of works.
#
#           The licenses for most software and other practical works are designed
#         to take away your freedom to share and change the works.  By contrast,


wget https://test-files.pythonhosted.org/packages/c0/c0/7b07efedcdcfab3c2700ce7c23d7a737b4b2419f5917f43993c8635f320e/sampleproject_abc-3.0.0-py3-none-any.whl -O /tmp/sampleproject_abc-3.0.0-py3-none-any.whl
unzip -p /tmp/sampleproject_abc-3.0.0-py3-none-any.whl sampleproject_abc-3.0.0.dist-info/METADATA > /tmp/METADATA.download
diff /tmp/METADATA.*
# => no difference

# Further investigate if the escaping works well
# for the "email message" format used in METADATA:
cat <<EOF > /tmp/test-metadata.py
import sys
import email

with open(sys.argv[1], "rb") as fp:
    message = email.message_from_binary_file(fp)

for field, text in (
    ("license", message["License"]),
    ("readme", message.get_payload()),
):
    print(f"------- {field} --------")
    lines = text.splitlines(keepends=True)
    print("".join(lines[:3]))
    print("...")
    print("".join(lines[-3:]))
EOF

python /tmp/test-metadata.py /tmp/METADATA.orig
# ------- license --------
# GNU GENERAL PUBLIC LICENSE
#                                Version 3, 29 June 2007
#
#
# ...
#         Public License instead of this License.  But first, please read
#         <https://www.gnu.org/licenses/why-not-lgpl.html>.
#
# ------- readme --------
# # A sample Python project
#
# ![Python Logo](https://www.python.org/static/community_logos/python-logo.png "Sample inline image")
#
# ...
# [rst]: http://docutils.sourceforge.net/rst.html
# [md]: https://tools.ietf.org/html/rfc7764#section-3.5 "CommonMark variant"
# [md use]: https://packaging.python.org/specifications/core-metadata/#description-content-type-optional

# ---- Everything seems to work well in the METADATA file ----

Anyway, I think this repository is not the best place to be discussing this issue.
Once the author has a complete minimal reproducer, the following possibilities might be investigated:

  1. Escaping problem, to be reported/investigated in the pypa/setuptools project (later, after the investigation is complete, it might also involve pypa/distutils and/or pypa/wheel).
  2. Validation problem, to be reported/investigated in the pypi/warehouse project.
  3. Difference between twine check and the checks in pypi, to be discussed in the pypa/twine and/or pypi/warehouse projects.

However, before opening any new issue, it is important to have a clear idea of which item(s) in the list above are actually happening, and to verify them with solid reproducers.

Please feel free to use the quick script in the example above to check whether the METADATA file is malformed. It is by no means an exhaustive check, but maybe it would be enough if added to twine check?
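Along the same lines, the `unzip -p` step and the script could be folded into one Python helper that reads METADATA straight from the wheel and applies two heuristics: a missing `Description-Content-Type`, and a README body whose first line is indented (a hint that a header was cut short by a stray blank line). `read_wheel_metadata` and `suspicious_metadata` are hypothetical names; this is a sketch, not part of `twine check`.

```python
import zipfile
from email import message_from_bytes
from email.message import Message


def read_wheel_metadata(wheel_path: str) -> Message:
    """Return the parsed *.dist-info/METADATA of a wheel (like `unzip -p`)."""
    with zipfile.ZipFile(wheel_path) as zf:
        names = [n for n in zf.namelist() if n.endswith(".dist-info/METADATA")]
        if len(names) != 1:
            raise ValueError(f"expected exactly one METADATA file, found {names!r}")
        return message_from_bytes(zf.read(names[0]))


def suspicious_metadata(msg: Message) -> list[str]:
    """Hypothetical heuristics for a header section that ended too early."""
    problems = []
    payload = msg.get_payload()
    if not isinstance(payload, str):
        return problems  # multipart METADATA is unexpected; nothing to check
    if payload.strip() and msg.get("Description-Content-Type") is None:
        problems.append("Description-Content-Type is missing")
    # First non-blank line of the README body.
    first = next((line for line in payload.splitlines() if line.strip()), "")
    if first[:1] in (" ", "\t"):
        problems.append(
            "README body starts with an indented line; "
            "a header may have been cut short by a blank line"
        )
    return problems
```

For the healthy wheel built above, `suspicious_metadata(read_wheel_metadata("dist/sampleproject_abc-3.0.0-py3-none-any.whl"))` would be expected to return an empty list.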


osamuaoki commented on August 23, 2024

I should have been more explicit about which LICENSE text I was using. I am using the good old GPL2 text from https://github.com/osamuaoki/imediff/blob/main/LICENSE, renamed to LICENSE.txt. This file is causing the problem.

With your GPL3 test case, I also see no problem on Debian stable:

$ wget https://www.gnu.org/licenses/gpl-3.0.txt -O LICENSE.txt
$ python3 -m build
$ twine check dist/*
Checking dist/sampleproject-3.0.0-py3-none-any.whl: PASSED
Checking dist/sampleproject-3.0.0.tar.gz: PASSED

So the problem is specific to the GPL2 text. The quotation styles differ:

My GPL2 file uses `foo'
Your GPL3 file uses "foo"

FYI, I am using Debian 12 (stable) as the base, which means Python 3.11. (I also installed the latest pip packages in a venv to stay current.)


abravalheri commented on August 23, 2024

Thank you for the reproducer.
This seems to be the same as pypa/setuptools#4033.
I would say the root of the problem is the \x0c characters in the license text, isn't it?

Quoting Paul Moore in the thread I linked above:

This is because neither the metadata spec nor the email standards really say what to do with control characters in header values, and the stdlib email package appears to handle it inconsistently.

The escaping goes wrong when pypa/wheel tries to regenerate the metadata file by ingesting the file that pypa/setuptools generates. By itself, the file generated by pypa/setuptools parses correctly, but after being regenerated by pypa/wheel, it does not.

Recently I submitted a PR to pypa/distutils that will make life easier for pypa/wheel and prevent this round-trip problem from happening altogether. However, pypa/distutils has not been merged into pypa/setuptools yet.

If the problem is indeed \x0c, I recommend closing this issue since the problem is already being tracked in pypa/setuptools#4033.

As a workaround, you can replace all the \x0c characters in your license file with \n, or simply remove the license = ... line from your pyproject.toml, as you previously suggested (the license file will still be included in the wheel as a standalone file, so you are still distributing the license).
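A small sketch of the first workaround, assuming the license is UTF-8 text and that turning each problematic control character into a newline is acceptable (in the GPL2 text, the form feeds only mark page breaks). The character set is the one reported in pypa/setuptools#4033 (VT, FF, FS, GS, RS); `clean_license_text` and `clean_license_file` are hypothetical helpers.

```python
# Control characters reported in pypa/setuptools#4033: VT, FF, FS, GS, RS.
PROBLEM_CHARS = "\x0b\x0c\x1c\x1d\x1e"

# Map each problematic character to a plain newline.
_TRANSLATION = str.maketrans({ch: "\n" for ch in PROBLEM_CHARS})


def clean_license_text(text: str) -> str:
    """Replace VT/FF/FS/GS/RS with newlines (hypothetical helper)."""
    return text.translate(_TRANSLATION)


def clean_license_file(path: str) -> bool:
    """Rewrite *path* in place (UTF-8 assumed); return True if it changed."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as fp:
        original = fp.read()
    cleaned = clean_license_text(original)
    if cleaned != original:
        with open(path, "w", encoding="utf-8", newline="") as fp:
            fp.write(cleaned)
    return cleaned != original


print(repr(clean_license_text("Section 1\n\x0c\nSection 2")))
# 'Section 1\n\n\nSection 2'
```

Running `clean_license_file("LICENSE.txt")` before `python -m build` would apply it to the file referenced by `license = {file = "LICENSE.txt"}`.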


pfmoore commented on August 23, 2024

I'm confused. The reported error is:

ERROR    `long_description` has syntax errors in markup and would not be rendered on PyPI.
         line 2: Warning: Block quote ends without a blank line; unexpected unindent.
WARNING  `long_description_content_type` missing. defaulting to `text/x-rst`.

That's about the README, not the license, and it's saying that the README isn't valid reStructuredText. It looks like whatever changes the OP made to the README (or to its filename, because if it were still a .md file it would be detected as Markdown) are the problem here.


osamuaoki commented on August 23, 2024

I can confirm the problematic content is in the LICENSE text, not in the README.

The good old form feed, a.k.a. FF (0x0c), in LICENSE was the real root cause. After removing those characters, I get a clean result with neither ERROR nor WARNING. So the problem is in setuptools.

As for the unhelpful ERROR message, which seems to imply that the problem resides in README.md or the content-type setting: I don't know exactly how it ends up saying this, but the error message has room for improvement to lead people to the root cause.


abravalheri commented on August 23, 2024

> I'm confused. The reported error is:

Hi @pfmoore, I'll try to summarise my thought process below. Hopefully it will make things clearer 🤞.

When the stdlib's email functions are used to read METADATA files, they don't seem to care if there is a \x0c character; they deal with it very well [1]. That is why the PKG-INFO inside the tar.gz sdist parses correctly.

However, when pypa/wheel regenerates the METADATA file from PKG-INFO, it replaces the \x0c character with a \n. If your license file happens to contain a \n\x0c or \x0c\n sequence, pypa/wheel will generate a METADATA file containing \n\n.

Now, in the email message format, the \n\n sequence is used to mark the end of the headers section and the beginning of the body/payload. In the packaging standard, we use the body/payload of the email message for the README/long_description.

So the error message is complaining about the README/long_description, because part of the LICENSE file is incorrectly being considered as the email message body/payload.
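The effect can be reproduced with the stdlib parser alone. The sketch below assumes a simplified model of the regeneration step described above (the already-folded PKG-INFO text has each \x0c turned into \n); it is not pypa/wheel's actual code. A form feed on a line of its own, as in the GPL2 text, is enough to leave an empty line inside the `License:` header.

```python
from email import message_from_string

# License text with a form feed on its own line, as in the GPL2 text.
license_text = "GNU GENERAL PUBLIC LICENSE\n\x0c\nEND OF TERMS AND CONDITIONS"

# PKG-INFO as written by the build backend: continuation lines of the
# License header are indented by 8 spaces, so every line stays in the header.
folded = license_text.replace("\n", "\n" + " " * 8)
pkg_info = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    f"License: {folded}\n"
    "Description-Content-Type: text/markdown\n"
    "\n"
    "# Demo README\n"
)

# Simplified model (assumption) of the regeneration step: every \x0c becomes
# \n. Because the form feed was followed by \n, this leaves an empty line
# ("\n\n") inside the License header, which ends the header section early.
regenerated = pkg_info.replace("\x0c", "\n")
msg = message_from_string(regenerated)

print(msg["Description-Content-Type"])      # None: it is now in the body
print(msg.get_payload().splitlines()[0])    # the tail of the license text
```

The body then starts with an indented license line followed by an unindented `Description-Content-Type:` line, which reStructuredText would read as a block quote ending without a blank line; together with the missing `Description-Content-Type` header, this looks consistent with the twine ERROR and WARNING quoted above.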

This is the same error discussed in pypa/setuptools#4033, for which pypa/distutils#213 should provide a good workaround (we have to wait until it lands on setuptools).

Footnotes

  1. And/or at least the functions used for parsing in twine/PyPI.


pfmoore commented on August 23, 2024

Ah, I see. Thanks - that's a lot more subtle than I realised.


osamuaoki commented on August 23, 2024

Hi, looking at pypa/setuptools#4033, the problematic control codes are:

 '0x0b': ('\x0b', ['', '']), 
 '0x0c': ('\x0c', ['', '']),
 '0x1c': ('\x1c', ['', '']),
 '0x1d': ('\x1d', ['', '']),
 '0x1e': ('\x1e', ['', '']),

In other words

Oct   Dec   Hex   Char                     Key
───────────────────────────────────────────────
013   11    0B    VT  '\v' (vertical tab)  Ctrl-K
014   12    0C    FF  '\f' (form feed)     Ctrl-L
034   28    1C    FS  (file separator)     Ctrl-\
035   29    1D    GS  (group separator)    Ctrl-]
036   30    1E    RS  (record separator)   Ctrl-^
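These five characters are exactly the ASCII control characters, other than \n and \r, that Python's `str.splitlines()` treats as line boundaries (the `str.splitlines` documentation also lists \x85, \u2028 and \u2029 outside ASCII). A quick check:

```python
# Which ASCII characters does str.splitlines() treat as line boundaries,
# apart from the usual "\n" and "\r"?
extra_breaks = [
    f"0x{code:02x}"
    for code in range(128)
    if chr(code) not in "\r\n" and len(f"a{chr(code)}b".splitlines()) == 2
]
print(extra_breaks)  # ['0x0b', '0x0c', '0x1c', '0x1d', '0x1e']

# str.split("\n") only splits on the newline itself:
print("a\x0cb".split("\n"))   # ['a\x0cb']
print("a\x0cb".splitlines())  # ['a', 'b']
```

This suggests, though it is not confirmed here, that the trouble comes from some step that splits header values with `splitlines()` rather than on \n alone.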

These need to be removed from the original LICENSE file to make the source robust and portable to older platforms.

As for the following text in pyproject.toml:

# This is either text indicating the license for the distribution, or a file
# that contains the license
# https://packaging.python.org/en/latest/specifications/core-metadata/#license
license = {file = "LICENSE.txt"}

I suggest updating this section with:

# This is either text indicating the license for the distribution, or a file
# that contains the license
# For better compatibility with older toolchains, please remove the control
# characters (0x0b=^K, 0x0c=^L, 0x1c=^\, 0x1d=^], 0x1e=^^) from the `LICENSE.txt`
# file if they exist.  See https://github.com/pypa/setuptools/issues/4033
# https://packaging.python.org/en/latest/specifications/core-metadata/#license
license = {file = "LICENSE.txt"}


jeanas commented on August 23, 2024

IMHO, we should just fix that bug rather than document the workaround. pip and build use build isolation by default, and most people never change that setting, so once a new release of wheel is out, the fix should reach them without their needing to upgrade their toolchain.

from sampleproject.

osamuaoki commented on August 23, 2024

I see your point and agree.

Maybe leave this issue open until the new toolchain release is out.

