raybeam / zartan
Web UI to create and manage HTTP proxies from one or more sources.
License: BSD 3-Clause "New" or "Revised" License
At present there is no authentication or authorization for zartan. Implement some form of access control, such as OAuth or local accounts, to prevent just anyone from getting a proxy.
If we decommission a proxy and then provision a new proxy with the same IP address, that proxy will probably underperform. Find some way to prevent that IP address from being used for scraping without entering a tight decommission/provision loop where we keep getting the same IP address.
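One possible approach to the IP reuse problem above is a cool-down list combined with a bounded number of provisioning attempts. The sketch below is only an illustration: the `IpQuarantine` class, its method names, and the block that stands in for "ask the source for a new machine" are all hypothetical and not part of zartan.

```ruby
# Hypothetical helper (not part of zartan): remembers recently
# decommissioned IPs and refuses to reuse them during a cool-down period.
class IpQuarantine
  def initialize(cooldown_seconds:, max_attempts: 5)
    @cooldown_seconds = cooldown_seconds
    @max_attempts = max_attempts
    @released_at = {} # ip => Time the proxy was decommissioned
  end

  # Call this whenever a proxy is decommissioned.
  def record_decommission(ip, at: Time.now)
    @released_at[ip] = at
  end

  # True while the IP is still inside its cool-down window.
  def quarantined?(ip, now: Time.now)
    released = @released_at[ip]
    !released.nil? && (now - released) < @cooldown_seconds
  end

  # The block stands in for "provision one machine and return its IP".
  # Returns the first IP that is not quarantined, or nil once
  # max_attempts candidates have been rejected, so the caller never
  # spins in an unbounded decommission/provision loop.
  def provision(now: Time.now)
    @max_attempts.times do
      ip = yield
      return ip unless quarantined?(ip, now: now)
    end
    nil
  end
end
```

When `provision` returns nil, the caller could wait before retrying or fall back to another source. If the provider tends to hand back an address as soon as it is released, it may also help to hold the rejected machine until a replacement has been obtained; that is an assumption about provider behavior and is not shown here.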
Another source class would allow us to start using proxies from a different IP address pool when one or the other starts to get bad.
When viewing a proxy page, there should be some indication whether the proxy has been deleted and if it was removed from any sites.
2.2.0 :001 > proxy = Proxy.find(13)
Proxy Load (1.5ms) SELECT "proxies".* FROM "proxies" WHERE "proxies"."id" = ? LIMIT 1 [["id", 13]]
=> #<Proxy id: 13, host: "104.131.93.67", port: 8888, source_id: 1, deleted_at: "2015-03-10 19:12:16", created_at: "2015-03-10 16:05:10", updated_at: "2015-03-10 19:12:16">
2.2.0 :002 > proxy.proxy_performances.each {|p| puts p.inspect}
ProxyPerformance Load (1.2ms) SELECT "proxy_performances".* FROM "proxy_performances" WHERE "proxy_performances"."proxy_id" = ? [["proxy_id", 13]]
#<ProxyPerformance id: 23, proxy_id: 13, site_id: 2, reset_at: nil, deleted_at: "2015-03-10 19:12:11", created_at: "2015-03-10 16:11:36", updated_at: "2015-03-10 19:12:11", times_succeeded: 0, times_failed: 2>
If there's more than one site then we can run into a situation where newly provisioned machines get added to the wrong site. For example, if site 1 provisions 1 new proxy, but site 2 provisions 100 new proxies then there exists a race condition where site 1 can detect the new proxies first and add them all to site 1. Site 2 won't get any of those proxies unless it's still under the minimum number of proxies.
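A minimal sketch of one way to avoid this race: hand out newly detected proxies in a single place, according to how many each site asked for, rather than letting each site claim whatever it sees first. The function name and the `deficits` hash are assumptions made for illustration; zartan's actual provisioning code may be structured differently.

```ruby
# Hypothetical allocator (not zartan's code).
# new_proxies: array of newly provisioned proxies (any objects)
# deficits:    { site_id => number of proxies that site still needs }
# Returns { site_id => [proxy, ...] }. No site receives more than its
# deficit, and proxies are dealt round-robin so every site makes progress.
def allocate_new_proxies(new_proxies, deficits)
  remaining = new_proxies.dup
  allocation = Hash.new { |hash, site_id| hash[site_id] = [] }
  pending = deficits.select { |_site_id, needed| needed > 0 }

  until remaining.empty? || pending.empty?
    # One pass gives each site that still needs proxies a single proxy.
    pending.keys.each do |site_id|
      break if remaining.empty?
      allocation[site_id] << remaining.shift
      pending[site_id] -= 1
    end
    pending.reject! { |_site_id, needed| needed <= 0 }
  end

  allocation
end
```

With the numbers from the example above (site 1 needs 1 proxy, site 2 needs 100), site 1 ends up with exactly one of the new proxies and site 2 receives the rest, regardless of which site notices them first.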
On one of the production installations, I've observed that there are some live (not soft-deleted) proxies in Zartan that do not correspond to any in DigitalOcean. Since the actual proxy server backing the Proxy instance does not exist, this is slowing down client processes and causing a number of errors.
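A reconciliation pass could detect such orphans by comparing the live proxy records against the hosts the provider currently reports. The sketch below assumes a list of provider IP addresses has already been fetched by some other means; `ProxyRecord` is a stand-in struct that mirrors the `host` and `deleted_at` columns seen in the console output, not the real `Proxy` model.

```ruby
require "set"

# Stand-in for the Proxy model, with only the columns this check needs.
ProxyRecord = Struct.new(:id, :host, :deleted_at)

# Returns the records that are still live locally (deleted_at is nil)
# but whose host does not appear in the provider's list.
# provider_hosts is assumed to be an array of IP address strings.
def orphaned_proxies(local_proxies, provider_hosts)
  live_hosts = provider_hosts.to_set
  local_proxies.select do |proxy|
    proxy.deleted_at.nil? && !live_hosts.include?(proxy.host)
  end
end
```

The returned records could then be soft-deleted so that clients stop being handed proxies with no server behind them.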
Add a global activity view to Zartan that displays recent activity. Suggested events:
Goal: Given a Joyent datacenter, delete all of the proxies within that datacenter. Make this available only from the console.
Allow users to submit an error string when reporting proxy failures.
Zartan should treat these strings as opaque identifiers and present, for each site, each proxy, and each (site, proxy) pair, the number of reports of each error type.
Clients may choose any string to represent an error. The string should identify the type of error encountered, not the specific details. For example, if the client is a scraper reporting an HTTP 500 response, any of "http500", "500", or "InternalServerError" would be appropriate error strings, whereas the entire backtrace of the exception thrown would not be appropriate.
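The counting described above can be sketched with a single in-memory tally keyed by (site, proxy, error string). The `ErrorTally` class and its method names are hypothetical; a real implementation would presumably persist the counts in the database or in Redis rather than in a Ruby hash.

```ruby
# Hypothetical in-memory tally (not zartan's code). Error strings are
# treated as opaque identifiers: they are counted, never parsed.
class ErrorTally
  def initialize
    @counts = Hash.new(0) # [site_id, proxy_id, error] => count
  end

  def report(site_id:, proxy_id:, error:)
    @counts[[site_id, proxy_id, error]] += 1
  end

  # Each query returns { error_string => count }.
  def by_site(site_id)
    tally { |site, _proxy| site == site_id }
  end

  def by_proxy(proxy_id)
    tally { |_site, proxy| proxy == proxy_id }
  end

  def by_site_and_proxy(site_id, proxy_id)
    tally { |site, proxy| site == site_id && proxy == proxy_id }
  end

  private

  # Sums counts per error string over the entries the block accepts.
  def tally
    @counts.each_with_object(Hash.new(0)) do |((site, proxy, error), count), out|
      out[error] += count if yield(site, proxy)
    end
  end
end
```

Because the strings are opaque, "http500" and "500" are counted as different error types; it is up to clients to report a consistent identifier.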
Now that zartan is relatively stable, we need to document what it does in Readme.md.
Add API keys and API key verification in the API controller(s).
If we delete a site from the console, then we should delete its Redis objects as well, both to save space in Redis and in case we happen to create a new site in the future with the same ID.
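As a rough illustration of what API key verification might involve, the sketch below generates random keys and checks a presented key against the known ones without short-circuiting on the first mismatching character. The function names are hypothetical and the plain array of keys is an assumption; nothing here reflects how zartan stores or checks keys.

```ruby
require "digest"
require "securerandom"

# Generates a random 64-character hexadecimal API key.
def generate_api_key
  SecureRandom.hex(32)
end

# Compares two strings by their SHA-256 digests, visiting every byte
# so the running time does not depend on where the strings differ.
def secure_equal?(a, b)
  digest_a = Digest::SHA256.hexdigest(a)
  digest_b = Digest::SHA256.hexdigest(b)
  diff = 0
  digest_a.bytes.zip(digest_b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# True if the presented key matches any known key. Every known key is
# checked, even after a match has been found.
def valid_api_key?(presented, known_keys)
  return false if presented.nil? || presented.empty?
  known_keys.reduce(false) { |ok, key| secure_equal?(presented, key) | ok }
end
```

In a Rails controller, a check like this would typically run in a `before_action` that rejects the request when the key is missing or invalid. Rails also provides `ActiveSupport::SecurityUtils.secure_compare`, which could replace the hand-written comparison.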