Comments (4)
Well, first off, my apologies for taking so long to address the original bug @safeisrisky. Yes, I can see there is a discrepancy between how I'm counting bytes using the length of a string and the actual size of the string. At this point, I can't fix the book, so I think it will just have to remain "wrong" where here "wrong" means "not the same as wc." It's unfortunate, but the point of the exercise is to get the reader to think about lines, words, and characters. I hate that I missed the distinction between characters and memory! Thanks for pointing this out.
from tiny_python_projects.
Yes, I can also confirm this. I think the method the Python script uses misses some bytes. I double-checked using Linux's wc command.
@kyclark, do you have any input on this?
I just saw that the book's solution is correct on page 111: when it uses Linux's wc command, it shows the total of 714, but the test.py in the repo is wrong.
However, when we run solution.py against the test.py in the repo, it passes all the tests.
When we compare the two:
>>> import os
>>> len(open('../inputs/sonnet-29.txt').read())
661
>>> os.path.getsize('../inputs/sonnet-29.txt')
669
>>>
We can see that the len() of the file's contents as a string isn't the same as the file's size in bytes.
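The same gap can be reproduced without the file. This is only a minimal sketch: it assumes the extra bytes come from non-ASCII characters such as the right single quotation mark (U+2019), and the sample string is made up for illustration.

```python
# Hypothetical sample text; "\u2019" is the curly apostrophe (U+2019).
text = "heaven\u2019s gate"

num_chars = len(text)                  # counts Unicode code points
num_bytes = len(text.encode("utf-8"))  # counts bytes in the UTF-8 encoding

# The curly apostrophe is one character but three bytes in UTF-8,
# so the byte count is two higher than the character count.
print(num_chars, num_bytes)
```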
Basically, the difference between the two is explained here:
A. To count number of characters in str object, you can use len() function:
>>> print(len('please anwser my question'))
25
B. To get memory size in bytes allocated to store str object, you can use sys.getsizeof() function
>>> from sys import getsizeof
>>> print(getsizeof('please anwser my question'))
50
source: https://stackoverflow.com/questions/4967580/how-to-get-the-size-of-a-string-in-python
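One caveat on the quoted answer, as far as I can tell: sys.getsizeof() reports the memory used by the Python object, including interpreter overhead, so it is not the number wc prints for bytes. The sketch below compares the three measurements for the same string; the exact getsizeof() value depends on the Python version and build, so no particular number is assumed.

```python
import sys

text = "please anwser my question"

num_chars = len(text)                    # characters (code points)
num_encoded = len(text.encode("utf-8"))  # bytes the text would occupy in a UTF-8 file
object_size = sys.getsizeof(text)        # in-memory size of the str object, overhead included

# For pure-ASCII text the first two agree; the third is larger.
print(num_chars, num_encoded, object_size)
```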
Hi!
I wanted to say one last thing, which is mostly for anyone curious about this. There isn't anything "wrong" with the code. The "issue" occurs because len() on a Python string counts characters (Unicode code points), not bytes, and the text file (sonnet-29.txt) has UTF-8 characters that take more than one byte each (the apostrophes). So when we take the length of the string we read, we get a slightly smaller number than the file's size in bytes. The easiest way to "fix" this is to change one line in the solution code to:
num_bytes += len(line.encode('UTF8'))
With this change, the expected values in test.py would need to change as well.
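To show where that line would sit, here is a rough sketch of a counting loop. It is not the book's solution.py: the function name count and the sample text are hypothetical, and it assumes the input is UTF-8 text with Unix newlines.

```python
import io


def count(fh):
    """Return (lines, words, chars, bytes) for an open text file handle."""
    num_lines = num_words = num_chars = num_bytes = 0
    for line in fh:
        num_lines += 1
        num_words += len(line.split())
        num_chars += len(line)                  # characters, as len() sees them
        num_bytes += len(line.encode("utf-8"))  # bytes, closer to what wc -c reports
    return num_lines, num_words, num_chars, num_bytes


# Hypothetical two-line sample containing one curly apostrophe (U+2019).
sample = io.StringIO("When, in disgrace\nat heaven\u2019s gate\n")
result = count(sample)
print(result)
```

Keeping the character and byte counts as separate variables makes it possible to report either one, the way wc distinguishes -m from -c.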
Also, thank you @kyclark for all the work you did on this book. I'm really enjoying it.