Comments (8)

aronwc commented on September 18, 2024

You should extract all href links from this page and keep only those whose suffix matches an element of the names parameter. See the doctest for an example.
Note that the return type is a SortedSet.

Self-links should be returned by the get_links method, but you should then filter them out in the read_links method. The read_links method will call get_links.
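
A minimal sketch of the extraction step described above, not the assignment's reference solution. The argument order, the `HrefCollector` helper, and the choice to match on the last path segment of each href are all assumptions; a sorted list also stands in for the SortedSet the assignment asks for, so the example needs only the standard library.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def get_links(names, html):
    """Return, in sorted order, the names that appear as an href suffix.

    Assumption: "suffix" means the text after the last "/" in the href,
    so no particular prefix is required. Self-links are kept here.
    """
    parser = HrefCollector()
    parser.feed(html)
    wanted = set(names)
    found = set()
    for href in parser.hrefs:
        suffix = href.rsplit("/", 1)[-1]
        if suffix in wanted:
            found.add(suffix)
    return sorted(found)
```

Matching on the last path segment means an href with a fragment or query string after the name would not match; whether that matters depends on the data.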

dakshaau commented on September 18, 2024

I am getting 5035 inlinks and outlinks instead of 5047. The ranking order is correct, though the values differ.

I was getting a charmap error while building a string from the HTML file. To solve this I used encoding="utf8". Do I have to use a different encoding to get the correct results?
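
For context, a "charmap" decode error usually appears on Windows when a file is opened without an explicit encoding and Python falls back to the locale code page. A small sketch of reading a page with the encoding stated explicitly follows; it assumes the HTML files are UTF-8, and `read_html` is a hypothetical helper name.

```python
from pathlib import Path


def read_html(path):
    """Read one HTML file as text, decoding it as UTF-8.

    Assumption: the data files are UTF-8 encoded. Passing the encoding
    explicitly avoids depending on the platform's default code page.
    """
    return Path(path).read_text(encoding="utf-8")
```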

aronwc commented on September 18, 2024

Hmm... can you confirm that read_names returns 509 names? Some of the file names have strange characters, which are perhaps handled differently by different operating systems.
For reference, I've included here the number of outlinks found for each page.
outlinks.txt

dakshaau commented on September 18, 2024

read_names is returning 509 names. There seems to be a difference of 1 outlink for most (490) of the names.

I have attached my output for outlinks
myout.txt
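
One way to make this kind of comparison is to load both sets of counts into dictionaries and list the pages that disagree. The sketch below assumes each file has already been parsed into a name-to-count mapping; the file format is not shown in this thread, and `differing_counts` is a hypothetical helper.

```python
def differing_counts(expected, actual):
    """Return {name: (expected_count, actual_count)} for pages that differ.

    Both arguments map a page name to its number of outlinks. A page
    missing from one mapping is reported with None on that side.
    """
    diffs = {}
    for name in sorted(set(expected) | set(actual)):
        want = expected.get(name)
        got = actual.get(name)
        if want != got:
            diffs[name] = (want, got)
    return diffs
```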

aronwc commented on September 18, 2024

Perhaps you should not assume the /wiki/ prefix.

dakshaau commented on September 18, 2024

Sir,

I tried three variations for finding the outlinks.

  1. I removed '/wiki/' from the search criteria: 435 names have different counts
  2. I retained self-links, i.e., Ada_Lovelace in Ada_Lovelace: 94 names have different counts
  3. I kept '/wiki/' and retained self-links: 33 names have different outlink counts

None of these variations gave a total outlink count near 5047, though.

In the description of read_links(), outlinks['Ada_Lovelace'] has 2 outlinks, but the outlinks.txt you provided for reference lists 3.

aronwc commented on September 18, 2024

Here are the three links get_links should return for Ada_Lovelace:

['Ada_Lovelace', 'Alan_Turing', 'Charles_Babbage']
Inside the read_links function, the self-link should be removed, leaving Turing & Babbage.
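
The self-link filtering step can be sketched as follows, assuming the per-page results of get_links have already been gathered into a dictionary. The `drop_self_links` name and the dictionary shape are illustrative assumptions, not part of the assignment's API.

```python
def drop_self_links(links_by_page):
    """Return a copy of the mapping with each page's link to itself removed.

    links_by_page maps a page name to the list of names it links to,
    as get_links would return them (self-link included).
    """
    return {
        page: [link for link in links if link != page]
        for page, links in links_by_page.items()
    }


raw = {"Ada_Lovelace": ["Ada_Lovelace", "Alan_Turing", "Charles_Babbage"]}
outlinks = drop_self_links(raw)
# outlinks["Ada_Lovelace"] is now ["Alan_Turing", "Charles_Babbage"]
```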

dakshaau commented on September 18, 2024

Sir,

I think I found the issue. There is a name in your outlinks, 'Guy_L._Steele,_Jr.', but in my data folder the file is named 'Guy_L._Steele,_Jr' without the trailing '.' because of Windows. The file name is correct in the archive, but when it is extracted the final '.' disappears.

I have 12 fewer links, and since this name is read incorrectly, it is probably the one causing the problem.

What should be done in this case?

EDIT: Forcibly adding '.' to the name 'Guy_L._Steele,_Jr' fixed the issue.
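
A rough sketch of one way to apply that fix without hard-coding a single name: map names read from disk back to a reference list when they differ only by trailing dots. Both the `restore_trailing_dots` helper and the availability of a reference name list (for example, names taken from the instructor's outlinks file) are assumptions.

```python
def restore_trailing_dots(disk_names, reference_names):
    """Map names read from disk back to their reference spelling.

    A disk name that already appears in the reference is left alone.
    Otherwise, if it equals a reference name with its trailing dots
    stripped, the reference spelling is used instead.
    """
    reference = set(reference_names)
    stripped_to_ref = {
        ref.rstrip("."): ref for ref in reference if ref.endswith(".")
    }
    return [
        name if name in reference else stripped_to_ref.get(name, name)
        for name in disk_names
    ]


fixed = restore_trailing_dots(
    ["Ada_Lovelace", "Guy_L._Steele,_Jr"],
    ["Ada_Lovelace", "Guy_L._Steele,_Jr."],
)
# fixed is ["Ada_Lovelace", "Guy_L._Steele,_Jr."]
```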