
Comments (5)

sjdirect commented on June 7, 2024

Have you tried creating your own IHyperLinkParser or extending the AnglesharpHyperlinkParser to implement this logic? It wouldn't be hard to do. You would also need to change the following so that the crawler downloads the content of the sitemap URL:

        config.DownloadableContentTypes = "text/html, application/xml";
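
As a rough illustration of the sitemap side of this, here is a minimal sketch that pulls the URLs out of a downloaded sitemap document. It assumes the sitemap follows the sitemaps.org 0.9 schema and uses only `System.Xml.Linq`. The class name `SitemapUrlExtractor` and the method `ExtractUrls` are hypothetical and are not part of Abot. Wiring the result into a custom IHyperLinkParser is left out because the thread does not show that interface's members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Abot.
public static class SitemapUrlExtractor
{
    // XML namespace defined by the sitemaps.org protocol.
    private static readonly XNamespace Ns =
        "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns the distinct absolute URLs found in <loc> elements.
    // Works for both <urlset> files and <sitemapindex> files, since
    // both put their URLs in <loc>. Throws XmlException if the input
    // is not well-formed XML.
    public static IReadOnlyList<Uri> ExtractUrls(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);

        return doc.Descendants(Ns + "loc")
            .Select(loc => loc.Value.Trim())
            // Skip entries that are not valid absolute URLs.
            .Where(text => Uri.IsWellFormedUriString(text, UriKind.Absolute))
            .Select(text => new Uri(text))
            .Distinct()
            .ToList();
    }
}
```

A custom link parser could call something like this when the crawled page is the sitemap itself and fall back to the normal HTML link extraction otherwise. Because the sitemap can be out of sync with the real site, the extracted URLs are best treated as extra candidates, not as a replacement for the links found on the pages.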


JoshTango commented on June 7, 2024

I might one day.
But sitemap.xml is such a widespread, standard thing these days that I thought you might want to build it into Abot.


winzig commented on June 7, 2024

Abot doesn't use sitemaps to help discover pages to crawl?


sjdirect commented on June 7, 2024

Its default behavior is to crawl the site based on real, navigable links. The sitemap can be completely out of sync with the real site, so it was never part of the original design. However, as mentioned above, you can implement your own IHyperLinkParser that uses the sitemap.


winzig commented on June 7, 2024

In my experience, we have used sitemaps extensively to help search engines index pages of our sites that they might otherwise have trouble finding. So yeah, we'll have to implement this internally, I guess.

