toypaj / browsercrawler
This project forked from spullara/browsercrawler
Crawl websites from your browser and save them in S3
Home Page: http://browsercrawl.com
WHAT

The BrowserCrawler plugin for Safari pulls all the linked pages under a particular root on a website and uploads them directly to your S3 bucket. There is no limit to the depth it will crawl.

WHY?

Sometimes you want a permanent copy of something you find on the web. For example, I used it to grab a copy of the Mozilla JavaScript documentation to have offline. You can also use it to back up your own data on sites that require authentication but don't have a great data portability solution.

HOW

1. Install the plugin, either by downloading it or by building it in Safari with the built-in Extension Builder.
2. Configure your AWS S3 settings in the preferences.
3. When you are on the page where you want to start the crawl, either click the spider button or right-click and start the crawl.

The crawl continues as long as the page is open, updating a progress box with the URL it is currently crawling. It only crawls pages at the same level as or below the seed page. You can cancel at any time by clicking "Cancel".

WARNING

BrowserCrawler crawls the site as YOU, using your cookies, so it captures any data that you would normally have access to. It does not run JavaScript, but it does download images (though it doesn't upload them), which can use a tremendous amount of bandwidth if you happen to crawl a big website. Also, be careful with small sites: they might not have the capacity to endure a full-speed crawl, even from a single machine.

CREDITS

Thanks to l.m.orchard at pobox.com for the S3 library and to Paul Johnston, Greg Holt, Andrew Kepert, Ydnar, and Lostinet for the SHA1 implementation that it uses. jQuery 1.5.2 isn't included, but the plugin uses it to do the actual crawl.
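EXAMPLE: CRAWL SCOPE

The rule that the crawl stays "at the same level or below" the seed page can be sketched roughly as follows. This is only an illustration and is not taken from the plugin's source: the function names (crawlRoot, isInScope, newLinksToCrawl) are hypothetical, and it assumes the standard URL API rather than jQuery 1.5.2, which the plugin itself uses for crawling.

```javascript
// Hypothetical sketch; none of these names come from the BrowserCrawler source.

// Root of the crawl: the seed's origin plus the directory containing the seed page.
function crawlRoot(seedUrl) {
  const seed = new URL(seedUrl);
  // Keep everything up to and including the last "/" of the path.
  const dir = seed.pathname.slice(0, seed.pathname.lastIndexOf("/") + 1);
  return seed.origin + dir;
}

// True if href (resolved against the page it was found on) lies at the
// same level as, or below, the seed page.
function isInScope(seedUrl, href, pageUrl = seedUrl) {
  let candidate;
  try {
    candidate = new URL(href, pageUrl); // resolves relative links
  } catch (e) {
    return false; // unparseable link: skip it
  }
  const seed = new URL(seedUrl);
  if (candidate.origin !== seed.origin) return false;
  return (candidate.origin + candidate.pathname).startsWith(crawlRoot(seedUrl));
}

// Filter the links found on one page down to in-scope URLs not yet seen.
// `visited` is a Set of absolute URLs and is updated in place.
function newLinksToCrawl(seedUrl, pageUrl, hrefs, visited) {
  const fresh = [];
  for (const href of hrefs) {
    if (!isInScope(seedUrl, href, pageUrl)) continue;
    const url = new URL(href, pageUrl);
    url.hash = ""; // "#fragment" variants point at the same page
    if (!visited.has(url.href)) {
      visited.add(url.href);
      fresh.push(url.href);
    }
  }
  return fresh;
}
```

With a seed of https://example.com/docs/js/index.html, a link to guide/intro.html would be in scope under this sketch, while ../css/index.html or any link to another host would not. The plugin's actual rule may differ in edge cases, such as query strings or directory URLs without a trailing slash.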