Human Power

Human Power lets you write your robots.txt in plain Ruby and force the robots into submission!

Installation

Add this line to your Gemfile:

gem 'human_power'

Then run:

$ bundle

Or install it yourself:

$ gem install human_power

If you are using Rails, you can add a sample config/robots.rb configuration file and a route for /robots.txt by running:

$ rails g human_power:install

By default, the sample configuration allows crawlers to access the whole site.

Now you can restart your server and visit /robots.txt to see what's generated from the new configuration file.

Usage

Using in Ruby on Rails

In config/robots.rb:

# Disallow everything in /admin for all user agents
disallow_tree admin_path

# Googlebot
googlebot do
  disallow reviews_path # Disallow a path
  allow product_path("one") # Allow a path
  disallow new_product_path, new_category_path # Disallow multiple paths in one line
end

# Bingbot
bingbot do
  disallow :all # There you go, Bingbot! (Disallow everything)
end

# Identical settings for multiple user agents
user_agent [:bingbot, :googlebot] do
  disallow login_path
end

# Custom user agent
user_agent "My Custom User Agent String" do
  disallow some_path
end

# You have access to everything from your ApplicationController
if request.subdomains.first == "api"
  disallow :all
end

# Add one or more sitemaps
sitemap sitemap_url
sitemap one_url, two_url

Then visit /robots.txt in your browser.
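To illustrate how a block-based DSL like this can work in plain Ruby, here is a minimal sketch. It is hypothetical and is not Human Power's implementation: the TinyRobots class and its methods are invented for this example, it only handles literal path strings (no route helpers, :all, or sitemaps), and it uses user agent strings as given instead of looking them up in a list. The config block is evaluated with instance_eval, rules outside a user agent block are filed under *, and agents are sorted alphabetically when rendering, mirroring the behavior described under Caveats.

```ruby
# Hypothetical mini robots.txt DSL for illustration; NOT Human Power's code.
class TinyRobots
  def initialize
    # Maps each user agent string to an ordered list of [directive, path] pairs.
    @groups = Hash.new { |hash, agent| hash[agent] = [] }
    # Rules written outside any user_agent block apply to every agent.
    @current_agents = ["*"]
  end

  # Evaluates the DSL block against a fresh collector and returns robots.txt text.
  def self.render(&config)
    robots = new
    robots.instance_eval(&config)
    robots.to_s
  end

  def disallow(*paths)
    add("Disallow", paths)
  end

  def allow(*paths)
    add("Allow", paths)
  end

  # Scopes the rules inside the block to one or more user agents.
  def user_agent(agents, &block)
    previous = @current_agents
    @current_agents = Array(agents).map(&:to_s)
    instance_eval(&block)
  ensure
    @current_agents = previous
  end

  # Renders one group per user agent, sorted alphabetically.
  def to_s
    groups = @groups.keys.sort.map do |agent|
      rules = @groups[agent].map { |directive, path| "#{directive}: #{path}" }
      (["User-agent: #{agent}"] + rules).join("\n")
    end
    groups.join("\n\n") + "\n"
  end

  private

  # Records each path for every agent currently in scope.
  def add(directive, paths)
    @current_agents.each do |agent|
      paths.each { |path| @groups[agent] << [directive, path] }
    end
  end
end

robots_txt = TinyRobots.render do
  disallow "/admin"
  user_agent ["Googlebot", "Bingbot"] do
    disallow "/login"
  end
end

puts robots_txt
```

Evaluating the block with instance_eval is what lets bare calls like disallow reach the collector object without an explicit receiver.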

Crawlers

Please see user_agents.yml for a list of 170+ built-in user agents/crawlers that you can use as shown above. The list is from UserAgentString.com.

Bot detection

You can use the HumanPower.is_bot? method to check if a user agent is a known bot / crawler:

# Googlebot
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
HumanPower.is_bot?(ua) # => true

# Chrome
ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36"
HumanPower.is_bot?(ua) # => false

# in Rails
HumanPower.is_bot?(request.user_agent) # => performs check on current user agent
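A check like HumanPower.is_bot? can be thought of as matching the user agent against a list of known crawler names. The following sketch shows that idea using only the Ruby standard library; it is an illustration under assumptions, not the gem's code: BOT_TOKENS is a small hypothetical sample rather than the contents of user_agents.yml, and bot? is an invented helper name.

```ruby
# Hypothetical sketch of a known-bot check; NOT Human Power's implementation.
# The token list is a tiny illustrative sample, not the gem's user_agents.yml.
BOT_TOKENS = ["Googlebot", "bingbot", "Slurp", "DuckDuckBot"].freeze

# Regexp.union escapes each token. Rebuilding from its source lets us add
# case-insensitive matching; interpolating the union into /.../i would not
# work, because the embedded group carries its own (?-mix:...) flags.
BOT_PATTERN = Regexp.new(Regexp.union(BOT_TOKENS).source, Regexp::IGNORECASE)

# Returns true when the user agent contains any known bot token.
# nil (a request without a User-Agent header) is treated as not a bot.
def bot?(user_agent)
  user_agent.to_s.match?(BOT_PATTERN)
end

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36"

puts bot?(googlebot) # true
puts bot?(chrome)    # false
puts bot?(nil)       # false
```

Treating a missing user agent as a non-bot is a choice made by this sketch; in Rails, request.user_agent is nil when the client sends no User-Agent header.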

Regular expression

If you need to get a regular expression for bot detection, you can use:

HumanPower.bot_regex # => regular expression that matches all known bots / crawlers
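A regular expression is handy when you want to classify many user agents at once, for example when scanning access logs. The sketch below uses a small stand-in pattern so it runs without the gem installed; in an application that loads Human Power you would substitute HumanPower.bot_regex. The sample user agent strings are illustrative.

```ruby
# Stand-in for HumanPower.bot_regex so this sketch runs without the gem;
# with the gem loaded you would use HumanPower.bot_regex instead.
bot_regex = /googlebot|bingbot/i

user_agents = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36"
]

# Enumerable#grep keeps elements matching the pattern; #grep_v keeps the rest.
bots   = user_agents.grep(bot_regex)
humans = user_agents.grep_v(bot_regex)

puts "#{bots.size} bot(s), #{humans.size} human(s)"
```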

Caveats

Human Power is great for adding rules to your robots.txt. Note, however, that user agents are sorted alphabetically when the file is rendered. This is fine for most use cases, but if you add more advanced rules that rely on user agent order, please check that the generated robots.txt meets your needs. If you need more advanced features and rely heavily on ordering, please submit an issue or pull request. Thanks.
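One way to check the ordering is to extract the User-agent lines from the rendered file, for example from the response body of a request to /robots.txt in an integration test, and compare them with the order you expect. Here is a small standalone sketch; user_agent_order is a hypothetical helper that strips # comments and otherwise does only simple line-based parsing.

```ruby
# Hypothetical helper: returns the User-agent values in the order they
# appear in a robots.txt body.
def user_agent_order(robots_txt)
  robots_txt.each_line.map { |line|
    content = line.sub(/#.*/, "")          # drop comments
    field, value = content.split(":", 2)   # "User-agent: *" -> ["User-agent", " *"]
    # Field names are compared case-insensitively as a defensive choice.
    value.strip if value && field.strip.casecmp?("user-agent")
  }.compact
end

sample = <<~ROBOTS
  User-agent: *
  Disallow: /admin

  User-agent: bingbot
  Disallow: /
ROBOTS

p user_agent_order(sample) # ["*", "bingbot"]
```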

Versioning

Follows semantic versioning.
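Under semantic versioning, breaking changes are reserved for major version bumps, so a pessimistic version constraint in your Gemfile (a line of the form gem 'human_power', '~> X.Y', where X.Y is whichever version you have tested against) accepts new minor and patch releases but not the next major version. The snippet below uses RubyGems' own Gem::Requirement and Gem::Version classes to show what such a constraint accepts; the version numbers are hypothetical and do not refer to actual Human Power releases.

```ruby
require "rubygems"

# "~> 1.2" means ">= 1.2 and < 2.0" (version numbers are hypothetical).
requirement = Gem::Requirement.new("~> 1.2")

%w[1.1.9 1.2.0 1.9.3 2.0.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```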

Contributing

You are very welcome to contribute to this project if you have a feature that you think others can use. I'd appreciate it.

  1. Fork the project
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new pull request

Contributors

lassebunk, pikachuexe


human_power's Issues

Rails Environment-specific Configuration

Is there any way to have different configurations for different Rails environments?

if Rails.env.production?
  disallow :none
else
  disallow :all
end

Error:

NoMethodError (undefined method `env' for HumanPower::Rails:Module):
  config/robots.rb:1:in `block in robots'
  lib/silverpop_middleware.rb:13:in `call'
  lib/health_check_middleware.rb:10:in `call'
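The error message suggests that, inside config/robots.rb, the bare constant Rails resolves to HumanPower::Rails rather than to the application's top-level Rails module, which is how Ruby's lexical constant lookup behaves when code is evaluated inside the HumanPower namespace. The standalone sketch below reproduces that lookup with stand-in modules (both are hypothetical minimal stand-ins, not the real Rails or Human Power); prefixing the constant with :: forces a lookup from the top level.

```ruby
# Stand-in for the application's top-level Rails module (hypothetical).
module Rails
  def self.env
    "production"
  end
end

# Stand-in for the gem's namespace, which (judging by the error message)
# contains its own HumanPower::Rails module.
module HumanPower
  module Rails
  end

  # Lexical lookup finds the nested HumanPower::Rails first.
  def self.bare_rails
    Rails
  end

  # A leading :: starts the lookup at the top level (Object).
  def self.top_level_rails
    ::Rails
  end
end

p HumanPower.bare_rails                   # HumanPower::Rails
p HumanPower.bare_rails.respond_to?(:env) # false
p HumanPower.top_level_rails.env          # "production"
```

Assuming the gem evaluates the file in that namespace, writing ::Rails.env.production? in config/robots.rb would likely avoid the NoMethodError; this has not been verified against the gem itself.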

rules for all user agents

Is there an easy way in the DSL to write rules for all user agents instead? So instead of...

googlebot do
 ...
end

...or...

user_agent [:googlebot, :bingbot] do
... 
end

...is there an equivalent of the hard-coded line User-agent: * that would apply to all user agents at the top of robots.txt?
