
Comments (3)

garyillyes avatar garyillyes commented on May 5, 2024

Hi Mojmir, and thanks for opening this issue. Custom lines such as Crawl-delay and Sitemap should indeed not affect the parsing of other lines; in fact, they should be ignored, as also stipulated in the REP internet draft. For example, from the perspective of Googlebot these two robots.txt snippets are equivalent:

User-agent: *
Crawl-delay: 10

User-agent: badbot
Disallow: /

VS

User-agent: *
Some other unsupported line in plain text

User-agent: badbot
Disallow: /

This means that User-agent: * is merged together with User-agent: badbot, essentially disallowing everything for the global (*) user-agent.
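To make the merging concrete, here is a minimal Python sketch of the grouping behaviour described above. It is not the robotstxt library's code (that library is written in C++); `parse_groups` and its return shape are hypothetical names chosen for illustration, and the sketch assumes that only `User-agent`, `Allow` and `Disallow` lines are recognised and that blank lines do not close a group.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agents, rules) groups.

    Illustrative sketch only: unsupported lines (Crawl-delay, plain text,
    blank lines) are skipped and never start or end a group.
    """
    groups = []
    agents, rules = [], []
    seen_rule = False  # True once the current group has an Allow/Disallow
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or plain-text line: ignored
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:
                # A User-agent line after rules starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
                seen_rule = False
            agents.append(value.lower())
        elif key in ("allow", "disallow"):
            if agents:
                rules.append((key, value))
                seen_rule = True
        # Any other key (e.g. Crawl-delay) is ignored.
    if agents:
        groups.append((agents, rules))
    return groups


with_crawl_delay = """User-agent: *
Crawl-delay: 10

User-agent: badbot
Disallow: /
"""

with_plain_text = """User-agent: *
Some other unsupported line in plain text

User-agent: badbot
Disallow: /
"""

# Both snippets collapse into one group shared by "*" and "badbot".
print(parse_groups(with_crawl_delay))
print(parse_groups(with_plain_text))
```

Under these assumptions both inputs produce a single group whose user-agents are `*` and `badbot`, which is why the `Disallow: /` rule ends up applying to the global user-agent as well.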

Not ignoring custom lines such as Crawl-delay was a bug, and has been fixed in the following commit: c8ac4b1

Unfortunately the testing tool in Google Search Console, unlike Googlebot, is not using this library so we haven't gotten to fixing this obscure bug there, too.


mojmirdurik avatar mojmirdurik commented on May 5, 2024

Hi Gary, thank you for your answer.

I didn't know that; the syntax of robots.txt really is a bit tricky. Unofficial rules (e.g. Crawl-delay) can result in different evaluations by different bots: if a bot ignores an unofficial rule, the two groups are merged into one, and the meaning of the robots.txt can change dramatically.

With this in mind, it is better to put unofficial rules at the end of the file, especially if they are used with User-agent: *. So Googlebot will be blocked by a robots.txt like this (ignoring Crawl-delay):

User-agent: *
Crawl-delay: 10

User-agent: badbot
Disallow: /

...but not by robots.txt like this:

User-agent: badbot
Disallow: /

User-agent: *
Crawl-delay: 10

...even though both seem to do the same thing.
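The effect of ordering can be checked with a small Python sketch. It only approximates the behaviour discussed in this thread and is not the robotstxt library's API: `rules_for_wildcard` is a hypothetical helper, and it assumes that lines other than `User-agent`, `Allow` and `Disallow` are ignored without ending a group.

```python
def rules_for_wildcard(text):
    """Collect the Allow/Disallow rules that end up applying to "*".

    Illustrative sketch only: unsupported lines such as Crawl-delay are
    skipped, so they never separate two User-agent lines.
    """
    rules = []
    agents = []
    in_rules = False  # True once the current group has a rule line
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # no "key: value" shape (e.g. blank line): ignored
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:
                # Rules were seen, so this User-agent opens a new group.
                agents = []
                in_rules = False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if "*" in agents:
                rules.append((key, value))
    return rules


unofficial_first = """User-agent: *
Crawl-delay: 10

User-agent: badbot
Disallow: /
"""

unofficial_last = """User-agent: badbot
Disallow: /

User-agent: *
Crawl-delay: 10
"""

print(rules_for_wildcard(unofficial_first))  # the Disallow leaks into "*"
print(rules_for_wildcard(unofficial_last))   # no rules apply to "*"
```

With the unofficial line first, the wildcard group picks up `Disallow: /`; with it last, the `badbot` group is already closed by its own rule, so nothing is disallowed for `*`.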


garyillyes avatar garyillyes commented on May 5, 2024

You're correct: lines that are not supported by Googlebot but otherwise belong to a group, like Crawl-delay in your examples, should ideally be at the end of the file.

