utsusemi

utsusemi = "空蝉"

A tool to generate a static website by crawling the original site.

Framework used

  • Serverless Framework ⚡

How to deploy

STEP 1. Clone

$ git clone https://github.com/k1LoW/utsusemi.git
$ cd utsusemi
$ npm install

📝 STEP 2. Edit config

Copy config.example.yml to config.yml, then edit it.

🚀 STEP 3. Deploy to AWS

$ AWS_PROFILE=XXxxXXX npm run deploy

Then note the endpoint URLs and the UtsusemiWebsiteURL.

💣 Destroy utsusemi

  1. Call the API /delete?path=/
  2. Run the following command:
$ AWS_PROFILE=XXxxXXX npm run destroy

Usage

Start crawling /in?path={startPath}&depth={crawlDepth}

Start crawling targetHost, beginning at path and following links up to depth.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3"

Then access UtsusemiWebsiteURL to view the generated site.

force option

Add force=1 to disable the cache.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3&force=1"
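
The query string contains `&`, which an unquoted shell command treats as a background operator, so building the URL programmatically can be less error-prone. The following is a minimal sketch, not part of utsusemi itself: the helper name `build_in_url` is hypothetical, and the endpoint is the placeholder used in the examples above.

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the examples above; substitute your own.
ENDPOINT = "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0"


def build_in_url(path="/", depth=3, force=False, endpoint=ENDPOINT):
    """Build the /in URL with an encoded query string (hypothetical helper)."""
    params = {"path": path, "depth": depth}
    if force:
        # force=1 disables the cache, as described above.
        params["force"] = 1
    # safe="/" keeps slashes in the path readable instead of encoding them as %2F.
    return f"{endpoint}/in?{urlencode(params, safe='/')}"


if __name__ == "__main__":
    print(build_in_url("/", 3))
    print(build_in_url("/", 3, force=True))
```

The resulting string can be passed to curl inside quotes, or to any HTTP client.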

Purge crawling queue /purge

Cancel crawling.

$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/purge

Delete object of utsusemi content /delete?prefix={objectPrefix}

Delete S3 object.

$ curl "https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/delete?path=/"

Show crawling queue status /status

$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/status

Set N crawling actions POST /nin

Start crawling targetHost with N crawling actions in a single request.

$ curl -X POST -H "Content-Type: application/json" -d @nin-sample.json https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/nin

Architecture

[Figure: architecture diagram]

Crawling rule

  • HTML -> depth = depth - 1
  • CSS -> Requests for resources referenced in the CSS do not consume depth.
  • Other contents -> End ( depth = 0 )
  • 403, 404, 410 -> Delete S3 object
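
The rules above can be read as a small decision function. The sketch below is one interpretation, not utsusemi's actual implementation: the function name `apply_rule`, the action labels `"store"` and `"delete"`, and the use of the Content-Type prefix to classify responses are all assumptions made for illustration.

```python
# Status codes that cause the stored S3 object to be deleted (from the rules above).
DELETE_STATUSES = {403, 404, 410}


def apply_rule(status, content_type, depth):
    """Return (action, remaining_depth) for one crawled response.

    Hypothetical helper: action is "delete" or "store", and remaining_depth
    is the depth left for any links discovered in the response.
    """
    if status in DELETE_STATUSES:
        # 403, 404, 410 -> delete the S3 object and stop.
        return ("delete", 0)
    if content_type.startswith("text/html"):
        # HTML -> depth = depth - 1 (never below zero).
        return ("store", max(depth - 1, 0))
    if content_type.startswith("text/css"):
        # CSS -> resources referenced from the CSS do not consume depth.
        return ("store", depth)
    # Other contents -> end of crawl for this branch (depth = 0).
    return ("store", 0)


if __name__ == "__main__":
    for args in [(200, "text/html", 3), (200, "text/css", 3),
                 (200, "image/png", 3), (404, "text/html", 3)]:
        print(args, "->", apply_rule(*args))
```

Under this reading, a crawl started with depth=3 follows HTML links three levels deep, while stylesheets and the assets they reference are fetched without shortening that chain.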
