Comments (8)
You should extract all href links from this page and keep only those whose suffix matches an element of the names parameter. See the doctest for an example.
Note that the return type is a SortedSet.
Self-links should be returned by the get_links method, but you should then filter them out in the read_links method. The read_links method will call get_links.
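A minimal sketch of the extraction step described above, for illustration only. The function name extract_links and its signature are hypothetical (the assignment's get_links may take different arguments, such as a file name), and it assumes the "suffix" of an href is the text after its last '/'. It returns a sorted list instead of a SortedSet to avoid a third-party dependency; if SortedSet here is sortedcontainers.SortedSet, wrapping the result in SortedSet(...) iterates in the same order.

```python
from html.parser import HTMLParser


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, names):
    """Hypothetical stand-in for get_links.

    Returns the sorted, de-duplicated names that appear as the final path
    segment of some href. Self-links are kept on purpose; read_links is
    expected to drop them.
    """
    wanted = set(names)
    parser = _HrefCollector()
    parser.feed(html)
    parser.close()
    found = set()
    for href in parser.hrefs:
        suffix = href.rsplit("/", 1)[-1]  # text after the last '/'
        if suffix in wanted:
            found.add(suffix)
    return sorted(found)
```

On a page linking to /wiki/Ada_Lovelace, /wiki/Alan_Turing and an absolute URL ending in Charles_Babbage, this returns ['Ada_Lovelace', 'Alan_Turing', 'Charles_Babbage'], self-link included.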
I am getting 5035 inlinks and outlinks instead of 5047. The ranking order is correct, but the values differ.
I was getting a charmap error while reading the HTML file into a string. To solve this I used encoding="utf8". Do I have to use a different encoding to get the correct results?
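For reference, a sketch of reading a page with an explicit encoding. It assumes the saved pages are UTF-8, which the thread does not state. The 'charmap' codec in the error message is the codec Python uses for legacy single-byte code pages such as cp1252, which is often the default encoding for open() on Windows. The helper name read_html is hypothetical.

```python
def read_html(path):
    """Read an HTML file as text, decoding it explicitly as UTF-8."""
    # Without encoding=..., open() falls back to the platform default,
    # which on Windows is often a legacy code page (decoded via the
    # 'charmap' codec) that cannot handle every UTF-8 byte sequence.
    with open(path, encoding="utf-8") as f:
        return f.read()
```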
Hmm... can you confirm that read_names returns 509 names? Some of the file names have strange characters, which are perhaps handled differently by different operating systems.
For reference, I've included here the number of outlinks found for each page.
outlinks.txt
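One way to spot names that may not survive extraction on every operating system, as a hypothetical helper. It flags names ending in '.' or ' ', which Windows strips from file names, and names containing characters Windows does not allow in file names. Which of the 509 names are affected depends on the data, so no result for the real data set is claimed here.

```python
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def risky_names(names):
    """Return, sorted, the names that may be altered or rejected as
    Windows file names: a trailing '.' or ' ' is silently stripped,
    and reserved characters are not allowed at all."""
    return sorted(
        name
        for name in names
        if name.endswith((".", " ")) or WINDOWS_RESERVED & set(name)
    )
```

For example, risky_names(['Ada_Lovelace', 'Guy_L._Steele,_Jr.', 'Foo?']) returns ['Foo?', 'Guy_L._Steele,_Jr.'].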
read_names returns 509 names. There seems to be a difference of one outlink for most (490) of the names.
I have attached my output for outlinks:
myout.txt
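To locate per-page differences like these, a small comparison helper can be useful. This hypothetical sketch works on dicts mapping page name to outlink count, since the format of outlinks.txt and myout.txt is not shown here; parsing them is left out. When most pages differ by exactly one, as reported above, that points to a systematic difference (for example, in how self-links are counted) rather than a few bad pages.

```python
def count_mismatches(expected, actual):
    """Return (name, expected_count, actual_count) for every page whose
    outlink count differs, sorted by name. A page missing from either
    dict is treated as having 0 outlinks."""
    names = sorted(set(expected) | set(actual))
    return [
        (name, expected.get(name, 0), actual.get(name, 0))
        for name in names
        if expected.get(name, 0) != actual.get(name, 0)
    ]
```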
Perhaps you should not assume the /wiki/ prefix.
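A sketch of prefix-independent matching using only urllib.parse from the standard library: it takes the last path segment of each href, so a relative link (/wiki/X) and an absolute one (https://en.wikipedia.org/wiki/X) both map to X. It also drops any #fragment or ?query and decodes percent-escapes; whether the reference solution does either is not stated in this thread, and both choices can change the counts. The function name link_target is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def link_target(href):
    """Return the last path segment of href, ignoring scheme, host,
    query string and fragment, with percent-escapes decoded."""
    path = urlsplit(href).path
    return unquote(path.rstrip("/").rsplit("/", 1)[-1])
```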
Sir,
I tried three variations for finding the outlinks:
- Removing '/wiki/' from the search criteria: 435 names have different counts.
- Keeping self-links (i.e., Ada_Lovelace in Ada_Lovelace): 94 names have different counts.
- Keeping '/wiki/' and keeping self-links: 33 names have different counts.
None of these variations brought the total number of outlinks close to 5047, though.
In the description of read_links(), outlinks['Ada_Lovelace'] has 2 outlinks, but the outlinks.txt you provided for reference lists 3.
Here are the three links get_links should return for Ada_Lovelace:
['Ada_Lovelace', 'Alan_Turing', 'Charles_Babbage']
Inside the read_links function, the self-link should be removed, leaving Turing and Babbage.
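A sketch of how read_links might turn the results of get_links into the two link maps. It assumes read_links has a dict from page name to the links get_links returned (self-links included) and produces (inlinks, outlinks) as dicts of sets; build_link_maps and that layout are hypothetical. The self-link is dropped here, not in get_links. Because every outlink A -> B is also recorded as an inlink of B, the total number of inlinks always equals the total number of outlinks, which is why this thread reports a single total (5047) for both.

```python
def build_link_maps(page_links):
    """Hypothetical sketch of the read_links step.

    page_links maps each page name to the links get_links returned for
    it, self-links included. Returns (inlinks, outlinks), dicts of sets.
    """
    outlinks = {}
    inlinks = {name: set() for name in page_links}
    for name, links in page_links.items():
        # Remove the self-link here, as described above.
        targets = set(links) - {name}
        outlinks[name] = targets
        for target in targets:
            inlinks.setdefault(target, set()).add(name)
    return inlinks, outlinks
```

With the Ada_Lovelace example, outlinks['Ada_Lovelace'] is {'Alan_Turing', 'Charles_Babbage'}.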
(Reply to dakshaau's comment of Wed, Apr 20, 2016, 12:21 PM.)
Sir,
I think I found the issue. Your outlinks contain the name 'Guy_L._Steele,_Jr.', but in my data folder the file is named 'Guy_L._Steele,_Jr', without the final '.', because of Windows. The name is correct in the archive, but the trailing '.' disappears when the file is extracted.
I have 12 fewer links, and since this name is read incorrectly, it is probably the cause of the problem.
What should be done in this case?
EDIT: Forcibly appending '.' to the name 'Guy_L._Steele,_Jr' fixed the issue.
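Rather than hard-coding the fix for one name, a hypothetical helper can restore any trailing '.' that Windows dropped during extraction. It assumes you have the set of link targets seen in the pages before filtering by names, and it appends '.' to a file name only when the name itself is not a link target but the name plus '.' is.

```python
def repair_trailing_dots(file_names, link_targets):
    """Map file names whose trailing '.' was stripped (for example by
    Windows during extraction) back to the page name used in links."""
    targets = set(link_targets)
    repaired = []
    for name in file_names:
        if name not in targets and name + "." in targets:
            name += "."
        repaired.append(name)
    return repaired
```

For example, repair_trailing_dots(['Ada_Lovelace', 'Guy_L._Steele,_Jr'], ['Ada_Lovelace', 'Guy_L._Steele,_Jr.']) returns ['Ada_Lovelace', 'Guy_L._Steele,_Jr.'].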