Puppeteer2 Server

Puppeteer2 is a Node.js server that uses Puppeteer to provide a web scraping API. It allows users to send POST requests with JSON payloads to interact with web pages and retrieve data in various formats.

Features

  • Accepts POST requests with application/json content type.
  • Supports navigation to URLs with optional custom HTTP methods, post data, content type, and headers.
  • Returns data in the format selected by the responseType property: text, html, links, images, text+links, text+images, links+images, text+links+images, text+links-inline, text+images-inline, text+links+images-inline.
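
The list of response types above lends itself to a small client-side check before a request is sent. The following is a minimal sketch, not part of Puppeteer2 itself: buildPayload and RESPONSE_TYPES are hypothetical names, and the URL validation is an assumption about what a careful client might do, not documented server behavior. Only the url and responseType fields are used, since those are the only field names this README confirms.

```javascript
// Response types as listed in the Features section of this README.
const RESPONSE_TYPES = [
  "text", "html", "links", "images",
  "text+links", "text+images", "links+images", "text+links+images",
  "text+links-inline", "text+images-inline", "text+links+images-inline",
];

// Hypothetical helper: returns the JSON body for a request.
// Only `url` and `responseType` are field names confirmed by the README.
function buildPayload(url, responseType) {
  // Assumption: reject malformed URLs on the client; new URL() throws a TypeError.
  new URL(url);
  if (!RESPONSE_TYPES.includes(responseType)) {
    throw new Error(`Unknown responseType: ${responseType}`);
  }
  return JSON.stringify({ url, responseType });
}
```

Calling buildPayload("https://example.com", "text") produces the same JSON body used in the curl example under Usage, while an unlisted type such as "pdf" throws before any network traffic happens.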

Usage

Send a POST request to http://<server-ip>:5469 with a JSON payload containing the url property and optional parameters. The server will process the request and return the extracted data from the web page.

Example request using curl:

curl -X POST http://<server-ip>:5469 -H "Content-Type: application/json" -d '{"url": "https://example.com", "responseType": "text"}'
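
The same request can be sent from Node.js. The sketch below assumes Node.js 18 or later, where fetch is available globally. The function names buildScrapeRequest and scrape are hypothetical, and because this README does not document the shape of the response, the body is simply read as text.

```javascript
// Builds the fetch() arguments; mirrors the curl example above.
function buildScrapeRequest(serverBase, payload) {
  return {
    endpoint: serverBase,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Sends the request and resolves with the raw response body.
// Assumption: the response format is undocumented here, so it is read as text.
async function scrape(serverBase, payload) {
  const { endpoint, init } = buildScrapeRequest(serverBase, payload);
  const response = await fetch(endpoint, init);
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

With the server address substituted for <server-ip>, a call such as scrape("http://<server-ip>:5469", { url: "https://example.com", responseType: "text" }) returns a promise for the extracted data. Keeping request construction separate from the network call makes the request easy to inspect without a running server.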

Installation

Clone the repository and navigate to the puppeteer2 directory. Run npm install to install dependencies. Set up the puppeteer2.service file in /etc/systemd/system/ to run the server as a service.

Security

For security, the server runs as the puppeteer user, which has limited privileges. It uses a headless Chromium browser to minimize exposure.

Contributing

Contributions are welcome. Please submit pull requests to the master branch.

License

Puppeteer2 is open-source software licensed under the MIT license.

Contributors

rpurinton