Comments (3)
Hi Mojmir, and thanks for opening this issue. Custom lines such as `Crawl-delay` and `Sitemap` should indeed not affect the parsing of other lines; in fact, they should be ignored, as also stipulated in the REP internet draft. For example, from Googlebot's perspective these two robots.txt snippets are equivalent:
```
User-agent: *
Crawl-delay: 10
User-agent: badbot
Disallow: /
```

vs.

```
User-agent: *
Some other unsupported line in plain text
User-agent: badbot
Disallow: /
```
This means that `User-agent: *` is merged together with `User-agent: badbot`, essentially disallowing everything for the global (`*`) user-agent.

Not ignoring custom lines such as `Crawl-delay` was a bug, and it has been fixed in the following commit: c8ac4b1

Unfortunately the testing tool in Google Search Console, unlike Googlebot, does not use this library, so we haven't gotten around to fixing this obscure bug there, too.
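The merging behavior described above can be sketched with a toy group parser (a simplified illustration, not this library's actual API): an ignored line does not terminate a run of `User-agent` lines, so the agents on both sides of it end up in one group sharing the rules that follow.

```python
def disallowed_paths(robots_txt: str, agent: str) -> set[str]:
    """Toy robots.txt group parser that, like Googlebot, skips
    unrecognized lines entirely."""
    rules: dict[str, set[str]] = {}
    agents: list[str] = []       # user-agents of the group being built
    in_header = True             # still collecting the group's User-agent lines?
    for raw in robots_txt.splitlines():
        key, _, value = raw.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if not in_header:    # a rule was already seen: start a new group
                agents = []
                in_header = True
            agents.append(value.lower())
            rules.setdefault(value.lower(), set())
        elif key in ("allow", "disallow"):
            in_header = False    # the group's rule section has begun
            if key == "disallow" and value:
                for a in agents:
                    rules[a].add(value)
        # Any other line (Crawl-delay, Sitemap, plain text, ...) is ignored
        # and, crucially, does NOT end the run of User-agent lines.
    return rules.get(agent.lower(), set())

snippet = """User-agent: *
Crawl-delay: 10
User-agent: badbot
Disallow: /"""

print(disallowed_paths(snippet, "*"))       # {'/'} -- '*' merged with badbot
print(disallowed_paths(snippet, "badbot"))  # {'/'}
```

Because the `Crawl-delay` line is skipped, the header run `User-agent: *` / `User-agent: badbot` is never interrupted, and `Disallow: /` applies to both agents.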
from robotstxt.
Hi Gary, thank you for your answer.

I didn't know that; the syntax of robots.txt really is a bit tricky. Unofficial rules (e.g. `Crawl-delay`) can lead to different evaluations by different bots, because if a bot ignores an unofficial rule, the two adjacent groups are merged into one and the meaning of the robots.txt can change dramatically.

With this in mind, it is better to put unofficial rules at the end of the file, especially if they are used with `User-agent: *`. So Googlebot (which ignores `Crawl-delay`) will be blocked by a robots.txt like this:
```
User-agent: *
Crawl-delay: 10
User-agent: badbot
Disallow: /
```
...but not by a robots.txt like this:

```
User-agent: badbot
Disallow: /
User-agent: *
Crawl-delay: 10
```

...even though both seem to do the same thing.
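The "different bots, different evaluations" point can be observed with Python's standard-library `urllib.robotparser` (behavior as of current CPython; this is an illustration, not a claim about any particular crawler). It treats `Crawl-delay` as a group rule, so the `Crawl-delay` line closes the first group, the two `User-agent` groups stay separate, and an agent matching `*` is not blocked by the first snippet:

```python
from urllib.robotparser import RobotFileParser

# Same record as the first snippet above: an unofficial Crawl-delay line
# sits between the two User-agent lines.
lines = [
    "User-agent: *",
    "Crawl-delay: 10",
    "User-agent: badbot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(lines)

# urllib.robotparser honors Crawl-delay, so that line ends the first group's
# header; '*' and 'badbot' are NOT merged here, unlike in Googlebot's parser.
print(rp.can_fetch("Googlebot", "/page"))  # True  -- falls back to the '*' group
print(rp.can_fetch("badbot", "/page"))     # False -- badbot's own group disallows /
print(rp.crawl_delay("*"))                 # 10
```

So the very same file blocks Googlebot (which merges the groups) but not a hypothetical crawler built on `urllib.robotparser` (which keeps them apart) — which is exactly why putting unofficial rules at the end of the file is the safer layout.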
You're correct: lines that are not supported by Googlebot but that sit in an otherwise supported group, like `Crawl-delay` in your examples, should ideally be placed at the end of the file.
Related Issues (20)
- [redacted]
- [redacted]
- [redacted]
- [redacted]
- [redacted]
- [redacted] HOT 1
- Consider a WASM build HOT 2
- googletest.git has tag main HOT 2
- Google's robots.txt parser and matcher
- Allow wider range of chars for valid user-agent identifier / 'product' token HOT 13
- bazel test failed with `bazelisk`: Repository '@bazel_skylib' is not defined
- CMAKE_CXX_STANDARD 14 HOT 1
- Update README build requirements HOT 1
- User-agent names in test ID_UserAgentValueCaseInsensitive to follow the standard HOT 5
- Special characters * and $ not matched in URI
- Issues with Bazel build HOT 1
- gpyrobotstxt, a Python Native port of this repo HOT 1
- SEO site
- An encoding test does not appear to match the RFC?