
Comments (4)

kyclark commented on July 25, 2024

Well, first off, my apologies for taking so long to address the original bug @safeisrisky. Yes, I can see there is a discrepancy between how I'm counting bytes using the length of a string and the actual size of the string. At this point, I can't fix the book, so I think it will just have to remain "wrong" where here "wrong" means "not the same as wc." It's unfortunate, but the point of the exercise is to get the reader to think about lines, words, and characters. I hate that I missed the distinction between characters and memory! Thanks for pointing this out.

from tiny_python_projects.

AlbertUlysses commented on July 25, 2024

Yes, I can confirm this as well. I think the method the Python script uses misses some bytes. I double-checked using Linux's wc command.
@kyclark, do you have any input on this?


AlbertUlysses commented on July 25, 2024

I just saw that the solution in the book is correct: on page 111, where it uses Linux's wc command, it shows the total of 714. The test.py in the repo, however, is wrong.
Even so, when we run the repo's test.py against solution.py, it passes all the tests.
When we compare the two:

>>> import os
>>> len(open('../inputs/sonnet-29.txt').read())
661
>>> os.path.getsize('../inputs/sonnet-29.txt')
669
>>> 

We can see that the len of the file's string isn't the same as its size in bytes.
Basically, the difference between the two is explained here:

A. To count the number of characters in a str object, you can use the len() function:

>>> print(len('please anwser my question'))
25

B. To get the memory size in bytes allocated to store a str object, you can use the sys.getsizeof() function:

>>> from sys import getsizeof
>>> print(getsizeof('please anwser my question'))
50

source: https://stackoverflow.com/questions/4967580/how-to-get-the-size-of-a-string-in-python
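
For anyone following along, here is a minimal sketch that puts three different measurements side by side. The sample string is made up for illustration (it is not a line from sonnet-29.txt), and it assumes the text is stored as UTF-8. Note that sys.getsizeof() reports the size of the Python object in memory, which varies by Python version and is not what wc reports for a file.

```python
import sys

# Hypothetical sample text; \u2019 is a right single quotation mark,
# a non-ASCII character that takes 3 bytes in UTF-8.
text = "Heaven\u2019s gate"

num_chars = len(text)                  # number of Unicode code points
num_bytes = len(text.encode("utf-8"))  # number of bytes when stored as UTF-8
mem_size = sys.getsizeof(text)         # in-memory size of the str object

print(num_chars, num_bytes, mem_size)
```

The first two numbers differ by 2 because the one curly apostrophe counts as a single character but occupies three bytes once encoded.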


AlbertUlysses commented on July 25, 2024

Hi!
I wanted to say one last thing, mostly for anyone curious about this. There isn't anything "wrong" with the code. The "issue" occurs because len() on a Python string counts characters (Unicode code points), not bytes, and the text file (sonnet-29.txt) contains non-ASCII characters (the apostrophes) that each take more than one byte in UTF-8. So when we take the length of the string we read, we get a slightly smaller number than the file's size in bytes. The easiest way to "fix" this is to change one line in the solution code to:

num_bytes += len(line.encode('UTF8'))

With this change, the expected values in test.py would need to change as well.
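
To show where that line could sit, here is a rough sketch of a wc-style counter. It is not the book's solution.py: the function name count_file, the sample text, and the temporary file are all made up for illustration, and it assumes the input is UTF-8.

```python
import os
import tempfile


def count_file(path, encoding="utf-8"):
    """Return (lines, words, chars, bytes) for a text file (hypothetical helper)."""
    num_lines = num_words = num_chars = num_bytes = 0
    # newline="" turns off newline translation, so the bytes we count
    # match what is actually stored on disk.
    with open(path, encoding=encoding, newline="") as fh:
        for line in fh:
            num_lines += 1
            num_words += len(line.split())
            num_chars += len(line)                   # characters
            num_bytes += len(line.encode(encoding))  # bytes
    return num_lines, num_words, num_chars, num_bytes


# Made-up sample with two curly apostrophes (\u2019), written as UTF-8.
sample = "Heaven\u2019s gate\nFortune and men\u2019s eyes\n"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

result = count_file(tmp.name)
file_size = os.path.getsize(tmp.name)
os.remove(tmp.name)

print(result, file_size)
```

In this sketch the byte count agrees with os.path.getsize(), while the character count is smaller by two for each curly apostrophe.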

Also, thank you @kyclark for all the work you did on this book. I'm really enjoying it.
