
prerender_rails's Introduction

Prerender

Prerender is a Node server that uses Headless Chrome to render HTML, screenshots, PDFs, and HAR files from any web page. The Prerender server listens for an HTTP request, takes the URL, loads it in Headless Chrome, waits for the page to finish loading (by waiting for the network to be idle), and then returns your content.

The quickest way to run your own Prerender server:

$ npm install prerender

server.js:

const prerender = require('prerender');
const server = prerender();
server.start();

Test it:

$ curl http://localhost:3000/render?url=https://www.example.com/

Use Cases

The Prerender server can be used in conjunction with our Prerender.io middleware in order to serve the prerendered HTML of your javascript website to search engines (Google, Bing, etc) and social networks (Facebook, Twitter, etc) for SEO. We run the Prerender server at scale for SEO needs at https://prerender.io/.

The Prerender server can be used on its own to crawl any web page and pull down the content for your own parsing needs. We host the Prerender server for your own crawling needs at https://prerender.com/.

Prerender differs from Google's Puppeteer in that Prerender is a web server that takes in URLs and loads them in parallel, each in a new tab in Headless Chrome. Puppeteer is an API for interacting with Chrome, but you still have to write that interaction yourself. With Prerender, you don't have to write any code to launch Chrome, load pages, wait for the page to load, or pull the content off of the page. The Prerender server handles all of that for you so you can focus on more important things!

Below you will find documentation for our Prerender.io service (website SEO) and our Prerender.com service (web crawling).

Click here to jump to Prerender.io documentation

Click here to jump to Prerender.com documentation

Prerender.io

For serving your prerendered HTML to crawlers for SEO

Prerender solves SEO by serving prerendered HTML to Google and other search engines. It's easy:

  • Just install the appropriate middleware for your app (or check out the source code and build your own)
  • Make sure search engines have a way of discovering your pages (e.g. sitemap.xml and links from other parts of your site or from around the web)
  • That's it! Perfect SEO on JavaScript pages.

Middleware

This is a list of middleware available to use with the prerender service:

Official middleware

Javascript
Ruby
Apache
Nginx

Community middleware

PHP
Java
Go
Grails
Nginx
Apache

Request more middleware for a different framework in this issue.

How it works

This is a simple service that takes a URL and returns the rendered HTML (with all script tags removed).

Note: you should proxy the request through your server (using middleware) so that any relative links to CSS/images/etc still work.

GET https://service.prerender.io/https://www.google.com

GET https://service.prerender.io/https://www.google.com/search?q=angular
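
The request format above is simply the service origin with the target URL appended. A minimal sketch of building such a URL (the prerenderUrl helper name is ours, for illustration only):

```javascript
// Build a Prerender request URL by appending the target URL to the
// service origin. `prerenderUrl` is a hypothetical helper name; the
// URL shape matches the GET examples above.
function prerenderUrl(serviceOrigin, targetUrl) {
  // Trim a trailing slash from the origin so we don't emit "//".
  return serviceOrigin.replace(/\/$/, '') + '/' + targetUrl;
}

console.log(prerenderUrl('https://service.prerender.io/', 'https://www.google.com'));
// https://service.prerender.io/https://www.google.com
```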

Running locally

If you are trying to test Prerender with your website on localhost, you'll have to run the Prerender server locally so that Prerender can access your local dev website.

If you are running the Prerender service locally, make sure your middleware points to your local Prerender server with:

export PRERENDER_SERVICE_URL=http://localhost:3000

$ git clone https://github.com/prerender/prerender.git
$ cd prerender
$ npm install
$ node server.js

Prerender will now be running on http://localhost:3000. If you start a web app on, say, http://localhost:8000, you can visit http://localhost:3000/http://localhost:8000 to see how your app renders in Prerender.

To test how your website will render through Prerender using the middleware, you'll want to visit the URL http://localhost:8000?_escaped_fragment_=

That should send a request to the Prerender server and display the prerendered page through your website. If you View Source of that page, you should see the HTML with all of the <script> tags removed.

Keep in mind you will see 504s for relative URLs when accessing http://localhost:3000/http://localhost:8000, because the actual domain on that request is your Prerender server. This isn't really an issue: once you proxy the request through the middleware, the domain will be your website's, and those requests won't be sent to the Prerender server. To see your relative URLs working, visit http://localhost:8000?_escaped_fragment_=

Customization

You can clone this repo and run server.js, OR include Prerender in your project with npm install prerender --save to create an Express-like server with custom plugins.

Options

chromeLocation

var prerender = require('./lib');

var server = prerender({
    chromeLocation: '/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary'
});

server.start();

Uses a Chrome install at the given location. Prerender does not download Chrome, so you will want to make sure Chrome is already installed on your server. The Prerender server checks a few known locations for Chrome, but this option lets you override that.

Default: null

logRequests

var prerender = require('./lib');

var server = prerender({
    logRequests: true
});

server.start();

Causes the Prerender server to print a + for every request made and a - for every response received, letting you analyze page load times.

Default: false

captureConsoleLog

var prerender = require('./lib');

var server = prerender({
    captureConsoleLog: true
});

server.start();

The Prerender server will store all console logs in pageLoadInfo.logEntries for further analysis.

Default: false

pageDoneCheckInterval

var prerender = require('./lib');

var server = prerender({
    pageDoneCheckInterval: 1000
});

server.start();

How often, in milliseconds, to check whether the page has finished loading. You can also set the PAGE_DONE_CHECK_INTERVAL environment variable instead of passing in the pageDoneCheckInterval parameter.

Default: 500

pageLoadTimeout

var prerender = require('./lib');

var server = prerender({
    pageLoadTimeout: 20 * 1000
});

server.start();

Maximum number of milliseconds to wait while loading the page, including waiting for all pending requests/AJAX calls to complete, before timing out and continuing on. Timing out does not cause an error; Prerender just returns the HTML on the page at that moment. You can also set the PAGE_LOAD_TIMEOUT environment variable instead of passing in the pageLoadTimeout parameter.

Default: 20000

waitAfterLastRequest

var prerender = require('./lib');

var server = prerender({
    waitAfterLastRequest: 500
});

server.start();

Number of milliseconds to wait after the number of in-flight requests/AJAX calls reaches zero before the HTML is pulled off of the page. You can also set the WAIT_AFTER_LAST_REQUEST environment variable instead of passing in the waitAfterLastRequest parameter.

Default: 500
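
Taken together, pageDoneCheckInterval, pageLoadTimeout, and waitAfterLastRequest describe a polling loop: every check interval, the server asks whether the page is done. A simplified sketch of that decision (an illustration of the documented behavior, not the actual implementation):

```javascript
// Simplified "is the page done?" check, evaluated every
// pageDoneCheckInterval milliseconds. Illustrative only.
function isPageDone(state, now, opts) {
  // Hard stop: after pageLoadTimeout, return whatever HTML is present.
  if (now - state.navigationStart >= opts.pageLoadTimeout) return true;
  // Otherwise: done once no requests are in flight AND
  // waitAfterLastRequest ms have passed since the last one finished.
  return state.inFlight === 0 &&
    now - state.lastRequestFinished >= opts.waitAfterLastRequest;
}

const opts = { pageLoadTimeout: 20000, waitAfterLastRequest: 500 };
const state = { navigationStart: 0, inFlight: 0, lastRequestFinished: 1000 };
console.log(isPageDone(state, 1200, opts)); // false: only 200ms of network quiet
console.log(isPageDone(state, 1600, opts)); // true: 600ms of network quiet
```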

followRedirects

var prerender = require('./lib');

var server = prerender({
    followRedirects: false
});

server.start();

Whether Chrome follows a redirect on the first request if one is encountered. Normally, for SEO purposes, you do not want to follow redirects; instead, you want the Prerender server to return the redirect to the crawlers so they can update their index. Don't set this to true unless you know what you are doing. You can also set the FOLLOW_REDIRECTS environment variable instead of passing in the followRedirects parameter.

Default: false

Plugins

We use a plugin system in the same way that Connect and Express use middleware. Our plugins are a little different and we don't want to confuse the prerender plugins with the prerender middleware, so we opted to call them "plugins".

Plugins are in the lib/plugins directory, and add functionality to the prerender service.

Each plugin can implement any of the plugin methods:

init()

requestReceived(req, res, next)

tabCreated(req, res, next)

pageLoaded(req, res, next)

beforeSend(req, res, next)
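
A plugin is just an object implementing any of those hooks; each hook receives (req, res, next) and must call next() to continue the chain. A self-contained sketch, with a tiny driver that mimics how a server might invoke the hooks (the timing plugin and the driver are ours, for illustration only):

```javascript
// Sketch of a Prerender-style plugin: an object implementing any of
// the documented hook methods. This timing plugin is made up for
// illustration; it stashes data on req.prerender between hooks.
const timingPlugin = {
  requestReceived(req, res, next) {
    req.prerender.startedAt = Date.now();
    next();
  },
  beforeSend(req, res, next) {
    req.prerender.elapsed = Date.now() - req.prerender.startedAt;
    next();
  }
};

// Minimal driver mimicking how a hook could be invoked on each plugin
// in turn, continuing only when the plugin calls next().
function runHook(plugins, hook, req, res, done) {
  const chain = plugins.filter(p => typeof p[hook] === 'function');
  (function step(i) {
    if (i >= chain.length) return done();
    chain[i][hook](req, res, () => step(i + 1));
  })(0);
}

const req = { prerender: {} };
runHook([timingPlugin], 'requestReceived', req, {}, () => {});
runHook([timingPlugin], 'beforeSend', req, {}, () => {
  console.log(typeof req.prerender.elapsed); // number
});
```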

Available plugins

You can use any of these plugins by modifying the server.js file.

basicAuth

If you want to only allow access to your Prerender server from authorized parties, enable the basic auth plugin.

You will need to add the BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD environment variables.

export BASIC_AUTH_USERNAME=prerender
export BASIC_AUTH_PASSWORD=test

Then make sure to pass the basic authentication header (curl's -u flag base64-encodes the username and password for you).

curl -u prerender:wrong http://localhost:3000/http://example.com -> 401
curl -u prerender:test http://localhost:3000/http://example.com -> 200

removeScriptTags

We remove script tags because we don't want any framework specific routing/rendering to happen on the rendered HTML once it's executed by the crawler. The crawlers may not execute javascript, but we'd rather be safe than have something get screwed up.

For example, if you rendered the HTML of an Angular page but left the Angular scripts in there, the browser would try to execute the Angular routing and possibly end up clearing out the HTML of the page.

This plugin implements the pageLoaded function, so make sure any caching plugins run after this plugin is run to ensure you are caching pages with javascript removed.

httpHeaders

If your JavaScript routing has a catch-all for things like 404s, you can tell the Prerender service to serve a 404 to Google instead of a 200 so that Google won't index your 404 pages.

Add these tags in the <head> of your page if you want to serve soft http headers. Note: Prerender will still send the HTML of the page. This just modifies the status code and headers being sent.

Example: telling Prerender to serve this page as a 404

<meta name="prerender-status-code" content="404" />

Example: telling prerender to serve this page as a 302 redirect

<meta name="prerender-status-code" content="302" />
<meta name="prerender-header" content="Location: https://www.google.com" />
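
Your app only needs to emit those tags into the rendered <head>. A small helper for generating them (the prerenderStatusTags name is ours; the tag format is the one documented above):

```javascript
// Build the <meta> tags the httpHeaders plugin looks for. The helper
// name `prerenderStatusTags` is hypothetical; the tag format matches
// the examples above.
function prerenderStatusTags(statusCode, headers = {}) {
  const tags = [`<meta name="prerender-status-code" content="${statusCode}" />`];
  for (const [name, value] of Object.entries(headers)) {
    tags.push(`<meta name="prerender-header" content="${name}: ${value}" />`);
  }
  return tags.join('\n');
}

console.log(prerenderStatusTags(302, { Location: 'https://www.google.com' }));
```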

whitelist

If you only want to allow requests to a certain domain, use this plugin to cause a 404 for any other domains.

You can add the whitelisted domains to the plugin itself, or use the ALLOWED_DOMAINS environment variable.

export ALLOWED_DOMAINS=www.prerender.io,prerender.io
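
The idea behind the check is straightforward: extract the host of the requested URL and 404 anything not in the allowed list. A sketch under that assumption (not the plugin's actual code):

```javascript
// Illustrative whitelist check: is the target URL's host in the
// allowed list? Parsing and fallback list are ours, for the sketch.
function isAllowedDomain(targetUrl, allowedDomains) {
  const host = new URL(targetUrl).hostname;
  return allowedDomains.includes(host);
}

const allowed = (process.env.ALLOWED_DOMAINS || 'www.prerender.io,prerender.io').split(',');
console.log(isAllowedDomain('https://www.prerender.io/page', allowed)); // true
console.log(isAllowedDomain('https://evil.example.com/', allowed));     // false
```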

blacklist

If you want to disallow requests to certain domains, use this plugin to cause a 404 for those domains.

You can add the blacklisted domains to the plugin itself, or use the BLACKLISTED_DOMAINS environment variable.

export BLACKLISTED_DOMAINS=yahoo.com,www.google.com

in-memory-cache

Caches pages in memory. Available at prerender-memory-cache

s3-html-cache

Caches pages in S3. (Coming soon.)


Prerender.com

For doing your own web crawling

When running your Prerender server in the web crawling context, the server exposes a separate, richer API that lets you do different things like:

  • get HTML from a web page
  • get screenshots (viewport or full page) from a web page
  • get PDFs from a web page
  • get HAR files from a web page
  • execute your own JavaScript and return JSON along with the HTML

If you make an HTTP request to the /render endpoint, you can pass any of the following options, either as query parameters on a GET request or as JSON properties on a POST request. We recommend using a POST request, but we will show GET requests here for brevity. See the Get vs Post section below for how to send the POST request.

These examples assume you have the server running locally on port 3000 but you can also use our hosted service at https://prerender.com/.

url

The URL you want to load. Returns HTML by default.

http://localhost:3000/render?url=https://www.example.com/

renderType

The type of content you want to pull off the page.

http://localhost:3000/render?renderType=html&url=https://www.example.com/

Options are html, jpeg, png, pdf, har.

userAgent

Send your own custom user agent when Chrome loads the page.

http://localhost:3000/render?userAgent=ExampleCrawlerUserAgent&url=https://www.example.com/

fullpage

Whether you want your screenshot to be the entire height of the document or just the viewport.

http://localhost:3000/render?fullpage=true&renderType=jpeg&url=https://www.example.com/

Omit fullpage to screenshot just the normal browser viewport; include fullpage=true for a full-page screenshot.

width

Screen width. Lets you emulate different screen sizes.

http://localhost:3000/render?width=990&url=https://www.example.com/

Default is 1440.

height

Screen height. Lets you emulate different screen sizes.

http://localhost:3000/render?height=100&url=https://www.example.com/

Default is 718.

followRedirects

By default, we don't follow 301 redirects on the initial request, so you can be alerted of any changes in URLs and update your crawling data. If you want us to follow redirects instead, pass this parameter.

http://localhost:3000/render?followRedirects=true&url=https://www.example.com/

Default is false.

javascript

Execute javascript to modify the page before we snapshot your content. If you set window.prerenderData to an object, we will pull the object off the page and return it to you. Great for parsing extra data from a page in javascript.

http://localhost:3000/render?javascript=window.prerenderData=window.angular.version&url=https://www.example.com/

When using this parameter and window.prerenderData, the response from Prerender will look like:

{
	prerenderData: { example: 'data' },
	content: '<html><body></body></html>'
}

If you don't set window.prerenderData, the response won't be JSON. The response will just be the normal HTML.
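
A client therefore has to handle both shapes. One way to do it, assuming the JSON shape shown above (a sketch, not an official client):

```javascript
// Handle both response shapes from the `javascript` option: JSON when
// the page set window.prerenderData, plain HTML otherwise. The
// function name and try/catch approach are ours, for illustration.
function parseRenderResponse(body) {
  try {
    const parsed = JSON.parse(body);
    // JSON response: { prerenderData, content }
    return { data: parsed.prerenderData, html: parsed.content };
  } catch (e) {
    // Not JSON: the body is the rendered HTML itself.
    return { data: null, html: body };
  }
}
```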

Get vs Post

You can send all options as query parameters on a GET request or as JSON properties on a POST request. We recommend using the POST request when possible to avoid any issues with URL encoding of GET query strings. Here are a few pseudo examples:

POST http://localhost:3000/render
{
	renderType: 'html',
	javascript: 'window.prerenderData = window.angular.version',
	url: 'https://www.example.com/'
}
POST http://localhost:3000/render
{
	renderType: 'jpeg',
	fullpage: 'true',
	url: 'https://www.example.com/'
}
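
The pseudo examples above can be turned into a concrete request by JSON-encoding the options and POSTing them with a Content-Type of application/json. A small builder sketch (the buildRenderRequest name is ours):

```javascript
// Build the parameters for a POST to the /render endpoint. Using POST
// sidesteps query-string encoding issues (note the unencoded spaces
// in the javascript option below). `buildRenderRequest` is a
// hypothetical helper for illustration.
function buildRenderRequest(options) {
  return {
    method: 'POST',
    url: 'http://localhost:3000/render',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(options)
  };
}

const renderReq = buildRenderRequest({
  renderType: 'html',
  javascript: 'window.prerenderData = window.angular.version',
  url: 'https://www.example.com/'
});
console.log(renderReq.body);
```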

Check out our full documentation

License

The MIT License (MIT)

Copyright (c) 2013 Todd Hooper <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


prerender_rails's Issues

Improvement

Why are the user agents defined in mixed case and then downcased on every request?
It would be more efficient to define them downcased (and as symbols) in advance, and only downcase and symbolize the incoming user agent.

Performance issues

Hello, I spotted some performance issues in the middleware that can be really easily addressed. I am submitting the improvements as PRs, so I would really appreciate it if they could be merged if they look OK. Any code change or improvement is welcome.

Thanks.

Prerendering my homepage takes forever

My app is built with Angular, and when I try to crawl it with the Facebook URL Debugger I get a timeout. I use a local server to test the middleware, and it never finishes. I also tried to cache the page manually by adding the URL in prerender.io; that also takes forever, and I'm not sure the caching ever completes.

The website is in development and hosted in heroku http://lataberna.herokuapp.com/#/

Skype support

Skype now supports thumbnails and titles for links, but doesn't support Angular sites.

Googlebot, Yahoo, and Bingbot are not in the allowed user agents

Does this mean those bots won't get a response? The comment looks confusing, and it's not clear why they are removed from the list. Could you expand the comment, please?
If the bots are not in the allowed list, does that mean Google will not index the site?

Calling prerender from the controller

This gem would probably be better called prerender_rack, as it can be used with any Rack-compatible framework, not just Rails.

I'd like to be able to return Prerender responses from inside my Rails controllers. Some of our URLs are impossible to whitelist at the Rack level. I'm thinking of forking this gem and opening a pull request. Any guidance or thoughts before I do that?

Also how does the hosted service treat requests for prerendered pages that return redirects and non-2xx status?

Thanks, Robert

Setting and using a different prerender service using PRERENDER_SERVICE_URL

I recently discovered this gem and it's pretty amazing.
The thing is, I have tried to point it at a different Prerender service that I deployed myself on Heroku, and I don't know how to configure it.
You mentioned 'export PRERENDER_SERVICE_URL=' but I don't understand how to use it. I have set it in my terminal and on my remote server's console, and I see no results, no differences.

I am trying this mainly because your server is experiencing some issues, so I wanted to run my own.

Heroku h27 errors galore

I just tried the Prerender service and was soon faced with almost constant H27 (Client Request Interrupted) errors from Heroku. I removed the prerender_rails gem and they ceased almost immediately.

Is this a known issue? Possibly related to the fact that I wasn't yet paying for Prerender.io or a fault in my use of it?

As always, very sorry if this is a known issue, it was just unexpected for me.

Update: I re-added Prerender to my app and signed up for a paid plan. Still getting loads of H27s. Anyone have an idea as to why?

Update 2: A console error that now comes up when testing locally is really puzzling. It also went away when I removed Prerender.

Error: [$compile:tpload] Failed to load template: /assets/user/confirmation.html (HTTP status: -1 )

http://errors.angularjs.org/1.5.7/$compile/tpload?p0=%2Fassets%2Flistings%2Fsaa_apartment_item.html&p1=-1&p2=
    at angular.self-09d2205….js?body=1:69
    at handleError (angular.self-09d2205….js?body=1:19196)
    at processQueue (angular.self-09d2205….js?body=1:16171)
    at angular.self-09d2205….js?body=1:16187
    at Scope.$eval (angular.self-09d2205….js?body=1:17445)
    at Scope.$digest (angular.self-09d2205….js?body=1:17258)
    at Scope.scopePrototype.$digest (hint.js:1364)
    at Scope.$apply (angular.self-09d2205….js?body=1:17553)
    at Scope.scopePrototype.$apply (hint.js:1427)
    at done (angular.self-09d2205….js?body=1:11698)

No idea what an HTTP status of -1 is, but I suspect it's the return value of an indexOf. Really puzzling, and I'm not sure whether I should continue with this prerender thing.

I'm on Rails 4.2.1, Ruby 2.2.6, and Angular 1.5.7, using the latest version of the gem. I followed the instructions in the documentation to a T, or so I believe.

Any help or documentation to clarify what I may have screwed up is appreciated.

JS not rendering on prerender.io

Hi,
I've been trying to implement Prerender with my Rails app, but when I add a URL to prerender.io and then view the raw HTML, it just shows a blank white page with some social media share buttons. These buttons are in my application.html page, so they don't depend on JavaScript.

After inspecting, I found that my minified application.js file is not present at all. Looking at the log files, there is no GET request for application.js, though I can see the GET for application.css.

I have no idea where to begin looking for what might be the cause of this.
Please let me know of any files/configs or log files that might help solve this.

Option to ignore query parameters

Prerender currently holds a cache entry for every query parameter combination of a page, which, for our app, is not really necessary. I think it would be a great enhancement to add an option to ignore query parameters.

Thoughts?

Is this working?

I tried installing this in my Rails 4.2 app, and it doesn't appear to be doing anything. I've tried on a self-hosted version of Prerender, and your hosted version.

I'm appending ?_escaped_fragment_= to my route, but it's just passing right on to the controller instead of handing off to Prerender.

Production.rb:
config.middleware.use 'Rack::Prerender', protocol: 'https', whitelist: '^/app.*'

301 redirect loop when crawler tries to access website

Hi,

We've been using the gem for a pretty long time, but we recently switched our web server from Apache to Nginx. I'm not sure what is going on, but accessing the website using a browser or via curl works fine.
However, Googlebot and other search engines face an infinite 301 redirect loop.

What I've been able to figure out so far:

  • Nginx redirects once from HTTP (port 80) to HTTPS (port 443).
  • The website is publicly accessible through a "regular" client.
  • There is a 301 redirect loop when the client is a bot.
  • There is no other handling or special treatment for search engine crawlers beyond what prerender_rails does.

Example with HTTPie and facebookexternalhit crawler:

This results in a 301 Redirect to the exact same URL

http GET 'https://www.domain.com/' 'User-Agent':'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'

This works fine
http GET 'https://www.domain.com/'

Any idea on why this is happening?

Strange problem with cached pages

Every time I make a new deploy of my app (Rails + Angular), all of my cached pages (all of them cached before the deploy, 11 days ago) get damaged.

All cached pages will have the following error:
Error compiling CSS Asset
Sprockets::FileNotFound: couldn't find file 'bootstrat-sass-official')


I could understand this error if the pages had been cached at the same time as the deploy, but that's not the case: the deploy was yesterday, and all the pages were cached 11 to 16 days ago.

Have you seen something like this before? Do I need to delete all cached pages after a deploy?
This is not the first time this has happened.

Using cache does not set page headers

When using the Redis cache with after_render/before_render as described in the README, the raw text response is returned for cached responses, and Chrome interprets it as a text file instead of parsing the HTML.

It seems like some headers are missing when returning a cached response.

Doesn't seem to work for root page

I can successfully get example.com/users?_escaped_fragment_= to work, but example.com/?_escaped_fragment_= does not.

I have not configured any blacklist or whitelist settings.

I'm using Rails 4.2 and prerender_rails 1.3

Confusion about the meta fragment content

In the testing section:

If your URLs use push-state:
If you want to see `http://localhost:5000/profiles/1234`
Then go to `http://localhost:5000/profiles/1234?_escaped_fragment_=`

Earlier, the README says to just add <meta name="fragment" content="!"> to the <head> of all of your pages. Do we change content="!" to something else? Googlebot doesn't fetch the prerendered version.

Filter the caching feature by host or domain?

Is there any way to filter the pages eligible for caching by host or domain? Unless I've misunderstood, today we can only whitelist or blacklist by path.

I can see something like this in the Node middleware with the host option.

problems with prerender_service_url

Hey there...

I'm running the server on http://localhost:3000 and it works just fine (except for the broken CSS...) when I view the crawling result of my local Rails app via http://localhost:3000/http://mylocalrailsapp.dev.

But unfortunately I can't get my Rails app to deliver the result via the middleware.

I added the following lines to my development.rb:

config.middleware.use Rack::Prerender
config.middleware.use Rack::Prerender, prerender_service_url: 'http://localhost:3000/'

When I open http://mylocalrailsapp.dev/?_escaped_fragment_=/ I just receive

<html><head></head><body></body></html>

Am I missing something?

PS: Thanks for that great project and making it open source!

Support Rack 3.x

This gem is currently locked to rack version ~> 2.2.2; however, rack 3.x is now released. Can we add support for rack 3.x?

Prerender doesn't work for Heroku

https://www.app_name.com/?_escaped_fragment_= works, but https://app_name.herokuapp.com/?_escaped_fragment_= doesn't work.

We're using Heroku, Rails 5.0, and Prerender.io. With ?_escaped_fragment_= appended, Prerender works for our production app on the main domain, but not on our Heroku domain, even though it's the same app.

Thanks!

Using `_escaped_fragment_` instead of user agent

I was wondering why you use the user agent instead of _escaped_fragment_ as in the specification. As far as I know, the difference between serving the prerendered website and the bot testing a modified URL is that in the second case the response time is not counted toward page ranking. An additional bonus would be the possibility of simply redirecting the bot instead of using Net::HTTP, so the server would be loaded for a shorter time.

Config for adding crawler bot user agents?

I realized that redditbot, which polls for metadata on URLs submitted to Reddit, is not listed in @crawler_user_agents in the prerender_rails.rb initializer.

Now, I could fork the gem, customize it, use that, and then issue a pull request for others. And I see that I can pass the whole list in as an option to the initializer.

It'd be nice if there were an "additional_agents" option that would append (and de-dup) my list onto the ones built into the gem. It's nice to get automatic updates for new bots when they are added to the distribution, while still being able to extend the list on my own.

Any thoughts? Is there another way to accomplish this cleanly?
