
ghost-town's People

Contributors

tkazec


ghost-town's Issues

Tests are failing.

I created a small project using ghost-town, but none of the items I queued ever reached my worker, so I decided to run the project's test suite to check that everything works as expected on my system.

However, when I run the test suite (after running npm install --dev), all I get is multiple instances of the error below:

Error: channel closed
    at process.target.send (internal/child_process.js:509:16)
    at sendHelper (cluster.js:696:8)
    at send (cluster.js:683:5)
    at EventEmitter.cluster._getServer (cluster.js:552:5)
    at listen (net.js:1271:11)
    at net.js:1376:9
    at GetAddrInfoReqWrap.asyncCallback [as callback] (dns.js:62:16)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:81:10)

I am using node 4.1.0 on OSX 10.11. All dependencies installed with no errors.

Ghost-Town on Nodejs 0.10 on Windows Server 2012 R2

Would you happen to know if there is a difference between the way Node.js works on a Windows server as opposed to a Linux box? In the ghost-town example, the "town.on" handler never fires in the worker.

I added console log lines to get some information and found the following: in "Master.prototype._process = function ()", "this._workerQueue.length" is always 0, so execution never enters the while loop. As a result, "town.on('queue')..." never fires, and I have not been able to get Ghost-Town to work on Windows Server 2012 R2. I am running NodeJS v0.10.0 for Windows.

Update to new version

Hi, nice solution. Do you plan to update this package's dependencies to support the new version of PhantomJS (2.1.0)?

Workers stop processing jobs after a while

I'm using GhostTown to render pages via PhantomJS then generate a diff using ResembleJS. On a small scale or on my dev Mac OSX environment, this works swimmingly; but I'm running into a very confusing behavior after leaving the script running for several weeks.

The GhostTown master process adds a log message every time it checks for new jobs and adds a job to the queue. If there are more than a couple jobs in the queue, master should skip this queue-up process and log a different message. I've pasted the entire script below; see the function check_for_new_jobs() for the code that handles the queue-up.

What I'm seeing is that after a while, the queuing process appears to break down. Master adds to the queue continually, but the queue doesn't grow in size and those jobs appear to get silently dropped:

2015-06-22 12:40:26 2139 - MASTER - took 1 job from queue (1 total so far; 0 in queue now)
2015-06-22 12:40:27 2139 - MASTER - took 1 job from queue (2 total so far; 0 in queue now)
2015-06-22 12:40:28 2139 - MASTER - took 1 job from queue (3 total so far; 0 in queue now)
2015-06-22 12:40:29 2139 - MASTER - took 1 job from queue (4 total so far; 0 in queue now)
2015-06-22 12:40:30 2139 - MASTER - took 1 job from queue (5 total so far; 0 in queue now)
2015-06-22 12:40:31 2139 - MASTER - took 1 job from queue (6 total so far; 0 in queue now)
2015-06-22 12:40:32 2139 - MASTER - took 1 job from queue (7 total so far; 0 in queue now)
2015-06-22 12:40:33 2139 - MASTER - took 1 job from queue (8 total so far; 0 in queue now)
2015-06-22 12:40:34 2139 - MASTER - took 1 job from queue (9 total so far; 0 in queue now)
(... and so on until the queue is empty)

When I reboot the server and start the GhostTown script again, it processes around a hundred jobs or so, then it starts dropping them in the above pattern again. My main question right now is: how would GhostTown's _itemQueue.length stay at zero (see full script code below) when I add a new job? For reference, in the past I've been able to leave the server running for days on end and it reliably processed several thousand jobs per hour.

It's possible that this isn't an issue with the GhostTown package itself, but I'm at a loss for where else to turn and any ideas or advice would be super appreciated.

Earlier in the logs, I also saw a couple error messages that appear related but I've been unable to figure out how, and I've been unable to reproduce them on-demand, making me think they must only occur when the Node process hits some resource limit:

Documentation can be found at http://nodejs.org/
Debug port must be in range 1024 to 65535.
Usage: node [options] [ -e script | script.js ] [arguments] 
       node debug script.js [arguments] 

Options:
  -v, --version        print node's version
  -e, --eval script    evaluate script
(...and so on)

and:

Documentation can be found at http://nodejs.org/
events.js:85
      throw er; // Unhandled 'error' event
            ^
Error: spawn /usr/local/bin/node EMFILE
    at exports._errnoException (util.js:746:11)
    at Process.ChildProcess._handle.onexit (child_process.js:1053:32)
    at child_process.js:1144:20
    at process._tickCallback (node.js:355:11)

The EMFILE error makes me think that perhaps too many files are getting opened, but this system's ulimit is set to unlimited.

The command I use to start the GhostTown script is:

node lib/ghost_town.js redis_server redis_port >> log/ghost_town.log 2>&1 &

And the full GhostTown script is pasted here:

#!/usr/bin/env node
//
// GhostTown server for managing a PhantomJS render farm
//
// Usage: node ghost_town.js [host] [port]
// To run it locally (eg. in development; output appears in console):
//   node ghost_town.js 2>&1
// To connect to remote server, log all output to file, and send to background:
//   node ghost_town.js [host] [port] >> ghost_town.log 2>&1 &

var os   = require("os");
var town = require("ghost-town")({
  // Keep a couple spare CPUs as a buffer. This appears to be the difference
  // between "tolerable" and "molasses"
  workerCount: Math.round(os.cpus().length * 0.75),
  // If workers are left to run forever, they often get stuck somewhere in the code (apparently the `page.evaluate()` callback) and never move on to another job. To prevent the worker pool from dying out in this way, we use `workerShift` to force-kill each worker after a reasonable interval. The GhostTown author advises against this; see https://github.com/Buzzvil/ghost-town/issues/12.
  // This by itself would make it likely that the worker times out in the middle of a legitimate job, so we also set `workerDeath` to a low number to ensure that the timeout deadline will rarely be reached in the normal course of non-buggy jobs.
  // Another solution would be to set a timer when the worker begins each job so that the worker kills itself if the job timeout interval passes, but then we have to worry about how to cancel the timer if the job succeeds (so the worker doesn't kill himself in the middle of the NEXT job).
  workerShift: 90000,
  workerDeath: 2,
  pageDeath: 30000,
  pageTries: 0 // If a job times out, give up. No second chance.
});

function log(message){
  var date_string = new Date().toISOString().
    replace(/T/, ' ').
    replace(/\..+/, '');
  console.log(date_string + ' ' + process.pid + ' - ' + message);
}

if (town.isMaster) { // MASTER: poll Redis for new jobs and queue them locally
  log("MASTER - starting up.");
  var num_queued = 0;
  var num_completed = 0;
  var time_started = new Date();

  var redis = require("redis");
  var redis_host = process.argv[2] || "127.0.0.1";
  var redis_port = process.argv[3] || "6379";
  var redis_client = redis.createClient(redis_port, redis_host);

  function check_for_new_jobs(){
    if (can_handle_new_jobs()) {
      redis_client.lpop("ghost_town:new_jobs", function(err, url){
        if (url){
          num_queued ++;
          log("MASTER - took 1 job from queue ("+num_queued+" total so far; "+town._itemQueue.length+" in queue now)");
          prepare_status_and_queue(url);
        } else {
          log("MASTER - no jobs in queue.");
        }
      });
    } else {
      log("MASTER - skipping queue-up.");
    }

    setTimeout(check_for_new_jobs, 1000);
  }

  function can_handle_new_jobs(){
    // Simple load balancer: we only accept jobs at the rate we can handle them
    return town._workerCount > town._itemQueue.length;
  }

  function prepare_status_and_queue(url){
    // The current job status may contain the reference image base64 for diffing.
    get_status(url, function(raw_status){
      if (raw_status) {
        queue_job(url, raw_status);
      } else { // This likely means an error with queuing.
        set_status(url, { error: "Initial job status not found on Redis." });
      }
    });
  }

  function queue_job(url, raw_status){
    var old_status = JSON.parse(raw_status);
    var job_data = { url: url, reference: old_status.reference };

    set_status(url, { status: "working" });

    town.queue(job_data, function(error, data){ // job_is_complete handler
      if (! data) { var data = {}; }
      data.status = "complete";

      if (error) {
        log("WORKER - TIMED OUT on url " + url);
        data.error = "GhostTown worker timed out on this page"
      } else {
        num_completed ++;
        log("WORKER - COMPLETED url "+url);
      }

      set_status(url, data);
    });
  }

  function report_jobs_per_hour(){
    var seconds_passed = (new Date() - time_started) / 1000;
    var completed_per_hour = Math.round(num_completed / seconds_passed * 60 * 60);
    var num_failed = (num_queued - num_completed - town._itemQueue.length);
    var failed_per_hour = Math.round(num_failed / seconds_passed * 60 * 60);

    log("MASTER - THROUGHPUT: " +
      completed_per_hour + " successful / " +
      failed_per_hour + " failed per hour");

    setTimeout(report_jobs_per_hour, 10000);
  }

  function get_status(url, callback){
    var redis_key  = "ghost_town:" + url;
    redis_client.get(redis_key, function(error, value){ callback(value); });
  }

  function set_status(url, status){
    var redis_key = "ghost_town:" + url;
    redis_client.set(redis_key, JSON.stringify(status), "EX", 60*60);
  }

  setTimeout(check_for_new_jobs,   1000);
  setTimeout(report_jobs_per_hour, 10000);
} else { // WORKER: accept a job from the queue, process it and return the result
  log("WORKER - starting up.");

  // `resemble` package is great but doesn't have an option for
  var Resemble = require("resemble");

  function configure_phantom_page(page){
    // $page is from `phantom` package so syntax varies from native PhantomJS.
    // See https://www.npmjs.com/package/phantom for a rundown
    page.set('viewportSize', { width: 1000, height: 1 }); // height autoadjusts
    page.set('settings.resourceTimeout', 8000);
    page.set('onError', function(){ }); // Ignore JS errors on page load
  }

  function generate_diff(reference, image, callback){
    // Resemble can't decode base64; it needs raw binary data.
    Resemble.
      resemble(decode_base64(reference)).
      compareTo(decode_base64(image)).
      ignoreAntialiasing().
      onComplete(function(diff_data){ callback(diff_data); });
  }

  function decode_base64(string){
    return new Buffer(string, 'base64');
  }

  function prepare_diff_results(diff_data, output){
    output.pct_changed = diff_data.misMatchPercentage;

    if (output.pct_changed >= 0.1){
      output.is_changed = "true";
      // getImageDataUrl generates full base64 string incl. tags
      output.diff = diff_data.getImageDataUrl();
    } else {
      output.is_changed = "false";
      output.pct_changed = 0;
    }
  }

  function seconds_since(start_time){
    var end_time = new Date();
    return (end_time - start_time) / 1000;
  }

  town.on("queue", function(page, job_data, job_is_complete){
    var start_time = new Date();
    var url        = job_data.url;
    var reference  = job_data.reference;
    log("WORKER - starting on url " + url);

    configure_phantom_page(page);

    page.open(url, function(status){
      if (status == "success") {
        // log("WORKER - Loaded url " + url);
        setTimeout(function(){ // Wait a couple seconds for the JS to settle
          page.evaluate(
            function(){ return document.body.offsetHeight; },
            function(page_height){
              page.set('clipRect', {
                top: 0,
                left: 0,
                width: 1000,
                height: Math.min(page_height, 1500)
              });

              page.renderBase64("jpeg", function(image){
                // renderBase64 doesn't generate starting tags, so add them manually
                var output = { image: "data:image/jpg;base64," + image };

                // Generate and include diff if available
                if (reference){
                  generate_diff(reference, image, function(diff_data){
                    prepare_diff_results(diff_data, output);
                    output.seconds = seconds_since(start_time);
                    job_is_complete(null, output);
                  });
                } else {
                  output.seconds = seconds_since(start_time);
                  job_is_complete(null, output);
                }
              });
            }
          );
        }, 2000);
      } else {
        log("WORKER - failed to load url " + url);
        var output = {
          error: "The url failed to load.",
          seconds: seconds_since(start_time)
        };
        job_is_complete(null, output);
      }
    });
  })
}
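
The timer alternative mentioned in the configuration comments above (a per-job deadline that is cancelled when the job succeeds) could look roughly like the sketch below. It is only an illustration: `withDeadline` is a hypothetical helper, not part of ghost-town, and the sketch assumes the job callback may be wrapped before it is handed to the rest of the worker code.

```javascript
// Hypothetical helper (not part of ghost-town): wraps a job callback so that
// either the job finishes in time or onTimeout runs, but never both.
function withDeadline(done, ms, onTimeout) {
  var finished = false;

  var timer = setTimeout(function () {
    if (finished) { return; }
    finished = true;
    onTimeout();
  }, ms);

  return function () {
    if (finished) { return; } // ignore late or duplicate completions
    finished = true;
    clearTimeout(timer);      // cancel the deadline so it cannot hit the NEXT job
    done.apply(null, arguments);
  };
}

// Usage with stand-in callbacks: the job completes before the deadline.
var calls = [];
var wrapped = withDeadline(
  function (error, output) { calls.push(output); },
  50,
  function () { calls.push("timeout"); }
);
wrapped(null, "ok");
wrapped(null, "again"); // ignored: the job was already reported as complete
console.log(calls);
```

Because the timer is cleared inside the wrapped callback, a successful job leaves no pending deadline behind, which addresses the cancellation concern raised in the comment.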

Thanks in advance!

page.evaluate always undefined

No matter how I try it, I cannot get a page.evaluate to return me any info about the page. I've tried setting wait timeouts and intervals and nothing gets it to work. page.render works without issue, what am I missing from page.evaluate()?

Thank you!

town.on("queue", function (page, data, next) {
    page.open(data.Url, function () {
        page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
            //setTimeout(function () {
                var btnCount = page.evaluate(function () {
                    return $("button").length;
                });
                console.log(btnCount);
            //}, 2000);
        });
    });
});
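
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the longer script in the "Workers stop processing jobs after a while" issue calls `page.evaluate` with a second callback argument, which suggests that the bridged page object hands the result to a callback instead of returning it. The sketch below uses a stand-in `fakePage` object (hypothetical, not the real bridge) to show why reading the return value yields `undefined` under that assumption.

```javascript
// Hypothetical stand-in for the bridged page object. The real bridge would
// deliver the result later; this stub calls back immediately to stay short.
var fakePage = {
  evaluate: function (fn, callback) {
    var result = fn();
    if (typeof callback === "function") { callback(result); }
    // Nothing is returned, so the caller sees undefined.
  }
};

// Pattern from the snippet above: relies on the return value.
var returned = fakePage.evaluate(function () { return 3; });

// Callback pattern: the result arrives as the callback's argument.
var received;
fakePage.evaluate(
  function () { return 3; },
  function (count) { received = count; }
);

console.log(returned, received);
```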

RPC GhostTown

Hi, I'm attempting to run GhostTown as an RPC using the Request/Reply socket types. Everything is working great, until it is time to respond back with a message. I've tried responding directly back to the replyTo, but I'm getting:

Error: Channel closed by server: 403 (ACCESS-REFUSED) with message "ACCESS_REFUSED - queue name 'amq.gen-sMgjS-5c4qvKKR2ArMhKTA' contains reserved prefix 'amq.*'

That makes sense, since it's a reserved queue name. But if I can't write to it, how do I reply?

Request:

context.on("ready", function () {
  var request = context.socket("REQ", {expiration: TIMEOUT});
  request.setEncoding('utf8');
  request.on("data", function (message) {
    process.nextTick(function () {
      request.close();
      res.send(message, {});
    });
  });
  request.connect("RPCQueueName", function () {
    var message = JSON.stringify(req.body);
    console.log(request);
    request.write(message, "utf8");
  });
});

Reply:

town.on("queue", function (page, json, next) {
  reply.connect(json.ReplyTo, function () {
    reply.setEncoding('utf8');
    reply.write(JSON.stringify({ Response: 'response' }), "utf8");
  });
});

I've tried messing with the replyTo and correlationId to no avail. Do you know of any way to reply directly to an AMQP queue, or another method?

Memory is leaking

Hello,

When a trivial expressjs app is running with ghost-town, my app's memory is leaking. Do you see the same issue in production?

[Screenshot: 2015-02-21 at 10:12:47 PM]

Workers get "frozen" after page times out

We're using GhostTown to manage PhantomJS renders and image processing tasks that are likely to fail often on slow-loading pages etc. Recently when queuing up a large number of items I noticed that the GhostTown server eventually stops processing all jobs; the Master continues to check the queue, but the four Workers appear to have frozen or died.

Here's a highly simplistic script in which the worker never returns the job (presumably because of some slow-loading page or intricate bug):

var town = require("ghost-town")({ pageDeath: 1000, pageTries: 0 });

if (town.isMaster) { // MASTER: poll Redis for new jobs and queue them locally
  console.log("MASTER starting up.");

  function check_for_new_jobs(){
    console.log("MASTER queuing up job.");

    town.queue({}, function(error, data){
      if (error) {
        console.log("WORKER timed out!");
      } else {
        console.log("WORKER job complete.");
      }
    });

    setTimeout(check_for_new_jobs, 1000);
  }

  setTimeout(check_for_new_jobs, 1000);
} else { // WORKER: accept a job from the queue, process it and return the result
  console.log("WORKER starting up.");

  town.on("queue", function(page, job_data, next){
    console.log("WORKER starting job.");
    // Some failure state means that this worker never calls next()
  });
}

Note the pageDeath setting. When I run this script, I would expect the following behavior:

  • Master queues up jobs once per second
  • The worker starts a job
  • 1 second later, the job times out, and the worker starts on the next one in the queue
  • Workers continue starting jobs and timing out indefinitely
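
The expected behavior listed above can be modelled with a small simulation. This is only a sketch of the expectation, not of ghost-town's internals: `simulateExpected` is a hypothetical function, time advances in whole seconds, and a timed-out job is assumed to free its worker immediately.

```javascript
// Hypothetical model of the EXPECTED behavior: one job queued per second,
// each job times out after one second, and its worker becomes idle again.
function simulateExpected(workerCount, seconds) {
  var idle = workerCount; // workers free to take a job
  var running = [];       // start times of jobs currently in flight
  var queued = 0;
  var started = 0;
  var timedOut = 0;

  for (var now = 1; now <= seconds; now++) {
    // Jobs that have run for a full second time out and release their worker.
    running = running.filter(function (startedAt) {
      if (now - startedAt >= 1) {
        timedOut++;
        idle++;
        return false;
      }
      return true;
    });

    queued++; // master queues one job per second

    // Idle workers pick up waiting jobs.
    while (queued > 0 && idle > 0) {
      queued--;
      idle--;
      started++;
      running.push(now);
    }
  }

  return { started: started, timedOut: timedOut, waiting: queued };
}

console.log(simulateExpected(4, 10));
```

In this model no job is ever left waiting, because every timeout returns a worker to the idle pool before the next job is queued.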

Instead, here's an output of the behavior I see:

MASTER starting up.
WORKER starting up.
WORKER starting up.
WORKER starting up.
WORKER starting up.
MASTER queuing up job.
WORKER starting job.
MASTER queuing up job.
WORKER starting job.
WORKER timed out!
WORKER timed out!
MASTER queuing up job.
WORKER starting job.
WORKER timed out!
MASTER queuing up job.
WORKER starting job.
WORKER timed out!
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.
MASTER queuing up job.

As you can see, the four workers start on exactly 4 jobs. Each job times out after 1 second (as expected), but the worker never moves on to another job; it just hangs indefinitely.

The above script is obviously built to fail, but my much more complex script boils down to the same thing: when a job times out, its worker doesn't move on to the next item in the queue for some reason.

Any idea why this is happening? Any workaround you can think of? Thanks in advance!

Screen Capturing

Hi

How can I capture a screenshot using ghost-town?
I tried:

town.on('queue', function (page, data, next) {
  page.open('http://google.ca').then(status => {
    console.log(status);
    page.render("google.png");
  });
});

Add onStdout and onStderr to options for phantomjs-node

Can onStdout and onStderr be added to the options list passed to phantomjs-node, so that it is possible to catch the output sent to /dev/stdout from a PhantomJS script?

A use case would be rendering a PDF file with phantomJS to stdout, then catching the rendered output and passing it along to other components concerned about PDF content.
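
To illustrate the use case, here is a minimal sketch of capturing a child process's stdout and stderr with Node's built-in `child_process` module. It does not use phantomjs-node or the requested `onStdout`/`onStderr` options; the child is a plain Node process standing in for a PhantomJS script, and `captureOutput` and the fake PDF bytes are hypothetical.

```javascript
var spawnSync = require("child_process").spawnSync;

// Hypothetical stand-in for a PhantomJS script that renders to /dev/stdout:
// a child Node process that writes bytes to stdout and a note to stderr.
var childSource =
  "process.stdout.write('%PDF-1.4 fake body');" +
  "process.stderr.write('render finished');";

function captureOutput(command, args) {
  var result = spawnSync(command, args);
  if (result.error) { throw result.error; }
  return {
    stdout: result.stdout,                 // Buffer: suitable for binary data such as a PDF
    stderr: result.stderr.toString("utf8"),
    status: result.status
  };
}

var captured = captureOutput(process.execPath, ["-e", childSource]);
console.log(captured.stdout.length + " bytes captured; stderr: " + captured.stderr);
```

Keeping stdout as a Buffer avoids corrupting binary output, which matters if the captured bytes are passed on to other components that process the PDF.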
