
zartan's People

Contributors

awong-raybeam, cmihm31, davidh-raybeam, wjduenow, zeo210

Forkers

ilieash

zartan's Issues

Authentication and Authorization

At present there is no authentication or authorization for zartan. Implement an access-control mechanism, such as OAuth or local accounts, so that not just anyone can request a proxy.

Protection against IP address re-use

If we decommission a proxy and then provision a new one that receives the same IP address, the new proxy will probably underperform. Find some way to keep that IP address from being used for scraping without falling into a tight decommission/provision loop in which we keep getting the same IP address back.
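One possible shape for this, as a minimal sketch rather than zartan's actual design: remember when each IP address was decommissioned, treat it as quarantined for a cooldown period, and cap the number of provisioning attempts so the loop cannot spin forever. The names `IpQuarantine` and `provision_fresh_ip` are hypothetical, and the block passed to `provision_fresh_ip` stands in for whatever call provisions a machine and returns its IP.

```ruby
# Hypothetical sketch: none of these names come from zartan itself.
class IpQuarantine
  def initialize(cooldown_seconds)
    @cooldown = cooldown_seconds
    @released_at = {} # ip => Time at which the proxy with that IP was decommissioned
  end

  # Record that a proxy with this IP was just decommissioned.
  def release(ip, now = Time.now)
    @released_at[ip] = now
  end

  # True while the IP is still inside its cooldown window.
  def quarantined?(ip, now = Time.now)
    released = @released_at[ip]
    !released.nil? && (now - released) < @cooldown
  end
end

# Ask for a proxy at most max_attempts times and return nil instead of
# looping forever when every attempt yields a quarantined IP.
def provision_fresh_ip(quarantine, max_attempts, now = Time.now)
  max_attempts.times do
    ip = yield
    return ip unless quarantine.quarantined?(ip, now)
  end
  nil
end
```

A caller that gets `nil` back could wait and retry later. Whether a rejected machine should be destroyed immediately or held idle until a replacement arrives is left open here, since destroying it may simply free the same address again.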

Add another Source class

A second Source class would let us switch to proxies from a different IP address pool when one pool starts to perform poorly.

Proxy view page does not indicate the proxy is deleted

The proxy page should indicate whether the proxy has been deleted and whether it has been removed from any sites.

2.2.0 :001 > proxy = Proxy.find(13)
  Proxy Load (1.5ms)  SELECT  "proxies".* FROM "proxies" WHERE "proxies"."id" = ? LIMIT 1  [["id", 13]]
 => #<Proxy id: 13, host: "104.131.93.67", port: 8888, source_id: 1, deleted_at: "2015-03-10 19:12:16", created_at: "2015-03-10 16:05:10", updated_at: "2015-03-10 19:12:16"> 
2.2.0 :002 > proxy.proxy_performances.each {|p| puts p.inspect}
  ProxyPerformance Load (1.2ms)  SELECT "proxy_performances".* FROM "proxy_performances" WHERE "proxy_performances"."proxy_id" = ?  [["proxy_id", 13]]
#<ProxyPerformance id: 23, proxy_id: 13, site_id: 2, reset_at: nil, deleted_at: "2015-03-10 19:12:11", created_at: "2015-03-10 16:11:36", updated_at: "2015-03-10 19:12:11", times_succeeded: 0, times_failed: 2>
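The console output above shows that both `proxies` and `proxy_performances` rows carry a `deleted_at` column, so the page could derive its indicator from those values. The following is only a sketch: `ProxyRow`, `PerformanceRow`, and `proxy_status` are hypothetical stand-ins, not zartan's models or helpers, and the status wording is illustrative.

```ruby
# Stand-ins for the ActiveRecord models seen in the console session;
# only the columns this sketch reads are included.
ProxyRow = Struct.new(:id, :host, :deleted_at)
PerformanceRow = Struct.new(:site_id, :deleted_at)

# Build a one-line status string for the proxy page.
def proxy_status(proxy, performances)
  return "active" if proxy.deleted_at.nil?

  # Sites whose performance row has been soft-deleted for this proxy.
  removed_sites = performances.reject { |p| p.deleted_at.nil? }.map(&:site_id).sort
  status = "deleted at #{proxy.deleted_at}"
  status += "; removed from site(s) #{removed_sites.join(', ')}" unless removed_sites.empty?
  status
end
```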

[Screenshot: zartan_error]

Sites view page lists deleted proxies

2.2.0 :001 > proxy = Proxy.find(13)
  Proxy Load (1.5ms)  SELECT  "proxies".* FROM "proxies" WHERE "proxies"."id" = ? LIMIT 1  [["id", 13]]
 => #<Proxy id: 13, host: "104.131.93.67", port: 8888, source_id: 1, deleted_at: "2015-03-10 19:12:16", created_at: "2015-03-10 16:05:10", updated_at: "2015-03-10 19:12:16"> 

[Screenshot: zartan_error]

Provisioning new machines to the correct sites

If there is more than one site, newly provisioned machines can be added to the wrong site. For example, if site 1 provisions 1 new proxy and site 2 provisions 100, there is a race condition in which site 1 detects the new proxies first and adds them all to itself. Site 2 then gets none of those proxies unless it is still under its minimum number of proxies.
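One way to avoid the race, sketched under the assumption that allocation can run in a single place (one worker, or under a lock): record how many proxies each site has asked for, and hand newly detected proxies out against those outstanding requests instead of letting whichever site notices them first claim them all. `ProvisionLedger` is a hypothetical name, not part of zartan.

```ruby
# Hypothetical sketch: tracks how many proxies each site is still waiting
# for and assigns newly detected proxies against those requests.
class ProvisionLedger
  def initialize
    @pending = Hash.new(0) # site_id => proxies requested but not yet received
  end

  def request(site_id, count)
    @pending[site_id] += count
  end

  # Assign each new proxy to the site with the largest outstanding request.
  # Returns { site_id => [proxy, ...] }; proxies nobody asked for go under nil.
  def allocate(new_proxies)
    assignments = Hash.new { |h, k| h[k] = [] }
    new_proxies.each do |proxy|
      site_id, outstanding = @pending.max_by { |_, n| n }
      if outstanding.nil? || outstanding <= 0
        assignments[nil] << proxy
      else
        @pending[site_id] -= 1
        assignments[site_id] << proxy
      end
    end
    assignments
  end
end
```

With the numbers from the example, site 1 would end up with exactly 1 of the 101 new proxies and site 2 with the other 100, regardless of which site polls first.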

Zartan's proxy list can become out of sync

On one of the production installations, I've observed that there are some live (not soft_deleted) proxies in Zartan that do not correspond to any in DigitalOcean. Since the actual proxy server backing the Proxy instance does not exist, this is slowing down client processes and causing a number of errors.
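A periodic reconciliation pass could surface this kind of drift. The sketch below assumes both sides can be reduced to plain lists of IP strings; how the provider's list is fetched is deliberately left out, and `reconcile` is a hypothetical helper rather than existing zartan code.

```ruby
require "set"

# Compare the hosts zartan believes are live with the hosts the provider
# reports. Both inputs are plain lists of IP strings in this sketch.
def reconcile(tracked_hosts, provider_hosts)
  tracked = tracked_hosts.to_set
  remote = provider_hosts.to_set
  {
    orphaned: (tracked - remote).to_a.sort,  # tracked locally, gone at the provider
    untracked: (remote - tracked).to_a.sort  # running at the provider, unknown locally
  }
end
```

Hosts in `orphaned` are the ones described in this issue: candidates for soft deletion so that clients stop being handed proxies that no longer exist.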

Add an activity view

Add a global activity view to Zartan that displays recent activity. Suggested events:

  • Requesting a new proxy from a source and receiving one
  • Requesting a new proxy from a source but timing out waiting for it
  • Identifying a previously provisioned but untracked proxy
  • Decommissioning an existing proxy
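The events listed above could be captured by a small bounded log, sketched here in memory. `ActivityLog` and the four event-type symbols are hypothetical names chosen to mirror the suggested events; a real implementation would more likely persist events in the database.

```ruby
# Hypothetical event log: keeps only the most recent `capacity` events.
class ActivityLog
  EVENT_TYPES = [:proxy_provisioned, :provision_timed_out,
                 :untracked_proxy_found, :proxy_decommissioned].freeze

  Event = Struct.new(:type, :detail, :at)

  def initialize(capacity)
    @capacity = capacity
    @events = []
  end

  # Append an event, discarding the oldest ones once capacity is exceeded.
  def record(type, detail, at = Time.now)
    raise ArgumentError, "unknown event type: #{type}" unless EVENT_TYPES.include?(type)
    @events << Event.new(type, detail, at)
    @events.shift while @events.size > @capacity
    self
  end

  # Newest first, as an activity page would probably list them.
  def recent(limit = @capacity)
    @events.last(limit).reverse
  end
end
```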

Purge Joyent Datacenter

Goal: given a Joyent datacenter, delete all of the proxies within it. This should only be runnable from the console.

Add error type tracking

Allow users to submit an error string when reporting proxy failures.

Zartan should treat these strings as opaque identifiers and present, for each site, each proxy, and each (site, proxy) pair, the number of times each error type has been reported.

Clients may choose any string to represent an error. The string should identify the type of error encountered, not the specific details. For example, if the client is a scraper reporting an HTTP 500 response, any of "http500", "500", or "InternalServerError" would be appropriate error strings, whereas the entire backtrace of the exception thrown would not be appropriate.
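The counting described here can be sketched with three nested counters, one per grouping. `ErrorTally` is a hypothetical in-memory illustration; it does not reflect how zartan stores or would store these counts.

```ruby
# Hypothetical sketch: counts opaque error strings per site, per proxy,
# and per (site, proxy) pair.
class ErrorTally
  def initialize
    @by_site  = Hash.new { |h, k| h[k] = Hash.new(0) }
    @by_proxy = Hash.new { |h, k| h[k] = Hash.new(0) }
    @by_pair  = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  # The error string is never parsed, only used as a counter key.
  def report(site_id, proxy_id, error)
    key = error.to_s
    @by_site[site_id][key] += 1
    @by_proxy[proxy_id][key] += 1
    @by_pair[[site_id, proxy_id]][key] += 1
  end

  # Readers use fetch so that looking up an unknown id creates no entry.
  def for_site(site_id)
    @by_site.fetch(site_id, {}).dup
  end

  def for_proxy(proxy_id)
    @by_proxy.fetch(proxy_id, {}).dup
  end

  def for_pair(site_id, proxy_id)
    @by_pair.fetch([site_id, proxy_id], {}).dup
  end
end
```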

Readme

Now that zartan is relatively stable, we need to document what it does in Readme.md.

API Key

Add API keys, and API key verification, to the API controller(s).
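As a rough illustration of the verification half, the sketch below issues random keys, stores only their SHA-256 digests, and compares digests without exiting early on a mismatch. `ApiKeyStore` is a hypothetical class using only the Ruby standard library; how a controller would extract the key from a request is not covered.

```ruby
require "digest"
require "securerandom"

# Hypothetical sketch of API key issuing and checking.
class ApiKeyStore
  def initialize
    @digests = [] # SHA-256 hex digests; the raw keys are never stored
  end

  # Create a key, remember only its digest, and return the raw key once.
  def issue
    key = SecureRandom.hex(32)
    @digests << Digest::SHA256.hexdigest(key)
    key
  end

  def valid?(candidate)
    return false unless candidate.is_a?(String)
    digest = Digest::SHA256.hexdigest(candidate)
    @digests.any? { |stored| secure_equal?(stored, digest) }
  end

  private

  # Compare two strings byte by byte without stopping at the first difference.
  def secure_equal?(a, b)
    return false unless a.bytesize == b.bytesize
    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end
```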

Delete redis objects when deleting site

If we delete a site from the console, we should delete its Redis objects as well, both to save space in Redis and in case we later create a new site with the same ID.
