webscraper-bot

Discord web scraping bot used to scrape websites with dynamic content and send notifications when there is a new item.

Instructions

1. Adding the bot to your server

Simply click the link: https://discord.com/api/oauth2/authorize?client_id=1007981622005596171&permissions=2147485696&scope=bot%20applications.commands

2. Adding a new scraper job

Use the /create-job command and fill in the form:

  • name - name of the job
  • url - URL that the bot will scrape
  • selector - selector for an a tag that links to an item (examples below)
  • interval - interval of the job in minutes (minimum 1 minute)
  • active? - default true; whether the job should be active right away (you can always enable or disable a job with the [enable/disable]-job commands)
  • channel? - defaults to the channel where the command was run; the channel that should be messaged when a new item appears
  • clean? - default true; whether query params should be ignored (essential for some sites like eBay)
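
As a rough illustration of the form fields above, the sketch below models them as a TypeScript type and applies the documented defaults. The names `CreateJobInput`, `Job`, `resolveJob`, and `cleanUrl` are hypothetical and not taken from the bot's source. Dropping the whole query string is only one plausible reading of what clean does.

```typescript
// Hypothetical shape of the /create-job form; field names mirror the list above.
interface CreateJobInput {
  name: string;
  url: string;
  selector: string;
  interval: number; // minutes
  active?: boolean;
  channel?: string;
  clean?: boolean;
}

// A job with every optional field filled in.
interface Job {
  name: string;
  url: string;
  selector: string;
  interval: number;
  active: boolean;
  channel: string;
  clean: boolean;
}

// Applies the documented defaults. `currentChannel` stands for the channel
// where the command was run (an assumption about how the bot resolves it).
function resolveJob(input: CreateJobInput, currentChannel: string): Job {
  // The form requires at least 1 minute; this check also rejects NaN.
  if (!(input.interval >= 1)) {
    throw new Error("interval must be at least 1 minute");
  }
  return {
    name: input.name,
    url: input.url,
    selector: input.selector,
    interval: input.interval,
    active: input.active ?? true,
    channel: input.channel ?? currentChannel,
    clean: input.clean ?? true,
  };
}

// One possible reading of `clean`: drop the query string so that the same
// item reached through different tracking parameters compares as equal.
function cleanUrl(link: string): string {
  const parsed = new URL(link);
  parsed.search = "";
  return parsed.toString();
}
```

With this reading, two links that differ only in their query parameters would be treated as the same item.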

After that you can do basic CRUD operations on jobs with commands like /list-jobs, /update-job, /delete-job...

3. Waiting...

After that, simply wait for the job to run; the bot sends a message whenever new items are found.

You can also run the job manually with /run-job. Note that the first run returns every element currently on the website.
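
The behaviour described here, where the first run reports everything and later runs report only new items, can be sketched as a comparison against a set of links seen so far. This is an assumption about the mechanism, not the bot's actual implementation. `findNewLinks` and `seen` are hypothetical names.

```typescript
// Returns the links that were not seen on earlier runs and records them in
// `seen`. With an empty `seen` set (the first run) every link is returned.
function findNewLinks(scraped: string[], seen: Set<string>): string[] {
  const fresh: string[] = [];
  for (const link of scraped) {
    if (!seen.has(link)) {
      seen.add(link); // also collapses duplicates within a single run
      fresh.push(link);
    }
  }
  return fresh;
}
```

Calling it twice with the same `seen` set returns all links the first time and only the additions the second time.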

How to get selector

The selector is a querySelectorAll string that the bot uses to get the unique a tags linking to the items you want to scrape. To verify that your selector works, run document.querySelectorAll(<your selector>) in the browser console and check that it returns the items you want to scrape.
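
To make the check a little more concrete, the sketch below collects the unique link targets from whatever a selector matched. `uniqueLinks` is a hypothetical helper, not part of the bot. It accepts any array-like list of objects with an `href`, which is what `document.querySelectorAll` yields for a tags in the browser.

```typescript
// Collects unique, non-empty href values from the matched elements,
// keeping the order in which they appear on the page.
function uniqueLinks(matches: ArrayLike<{ href: string }>): string[] {
  const links: string[] = [];
  for (let i = 0; i < matches.length; i++) {
    const href = matches[i].href;
    if (href && links.indexOf(href) === -1) {
      links.push(href);
    }
  }
  return links;
}

// In a browser context the call might look like this (selector is a placeholder):
// uniqueLinks(document.querySelectorAll<HTMLAnchorElement>("<your selector>"));
```

If the returned list is empty or contains links you did not expect, the selector probably needs adjusting.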

Examples of selectors

  • eBay: .srp-river-results .s-item__image a
  • Otodom: [data-cy=listing-item-link]
  • OLX: [data-cy=l-card] > a

Feel free to contribute, there are a lot of things to improve :)
