Git Product home page Git Product logo

amazon-scraper's Introduction

Amazon Product Scraper

NPM npm

Useful tool to scrape product information from the amazon

If you like this tool then please Star it


Buy Me A Coffee


Features

  • Scrape products from the search result
  • Scrape product data by asin
  • Scrape product reviews
  • Sort result by sponsored products only
  • Sorts result by discounted products only
  • Result can be saved to the JSON/CSV files
  • You can scrape up to 500 produtcs and 1000 reviews

Product List alt text Review List alt text

Note:

  • Empty parameter = empty value

Possible errors

  • If there will be let me know

Installation

Install from NPM

$ npm i -g amazon-buddy

Install from YARN

$ yarn global add amazon-buddy

USAGE

Terminal

$ amazon-buddy --help

Usage: amazon-buddy <command> [options]

Commands:
  amazon-buddy products   scrape for a products from the provided key word
  amazon-buddy reviews    scrape reviews from a product by using ASIN
  amazon-buddy asin [id]  scrape data from a single product by using ASIN

Options:
  --help, -h     help                                                  [boolean]
  --version      Show version number                                   [boolean]
  --keyword, -k  Amazon search keyword ex. 'Xbox one'     [string] [default: ""]
  --number, -n   Number of products to scrape. Maximum 100 products or 300 reviews        [default: 10]
  --filetype      Type of the output file where data will be saved. 'all' - save
                  datat to the ` 'json' and 'csv' files
                            [choices: "csv", "json", "all", ""] [default: "csv"]
  --sort         If searching for a products then list will be sorted by a higher
                 score(reviews*rating). If searching for a reviews then they will
                 be sorted by rating.                 [boolean] [default: false]
  --discount, -d Scrape only products with the discount
                                                      [boolean] [default: false]
  --sponsored     Scrape only sponsored products      [boolean] [default: false]
  --min-rating    Minimum allowed rating                            [default: 1]
  --max-rating    Maximum allowed rating                            [default: 5]
  --host, -H      The custom amazon host (can be www.amazon.fr, www.amazon.de, etc.)
                                            [string] [default: "www.amazon.com"]
  --random-ua     Randomize user agent version. This helps to prevent request
                  blocking from the amazon side       [boolean] [default: false]
  --timeout, -t   Timeout between requests. Timeout is set in mls: 1000 mls = 1
                  second                                   [number] [default: 0]


Examples:
  amazon-buddy products -k 'Xbox one'
  amazon-buddy products -k 'Xbox one' --host 'www.amazon.fr'
  amazon-buddy reviews B01GW3H3U8
  amazon-buddy asin B01GW3H3U8

Example 1

Scrape 40 producs from amazon search result by using keyword "vacume cleaner" and save result to the CSV file

$ amazon-buddy products -k 'vacume cleaner' -n 40 --filetype csv

The file will be saved in a folder from which you run the script: products(vacume cleaner)_1589470796380

Example 2

Scrape 100 reviews from a product by using ASIN. NOTE: ASIN is a uniq amazon product ID, it can be found in product URL or if you have scraped product list with our tool you will find it in a CSV/JSON files

$ amazon-buddy reviews B01GW3H3U8 -n 100

The file will be saved in a folder from which you run the script: reviews(B01GW3H3U8)_1589470878252

Example 3

Scrape 300 producs from the "xbox one" keyword with rating minimum rating 3 and maximum rating 4 and save everything to a CSV file

$ amazon-buddy products -k 'xbox one' -n 300 --min-rating 3 --max-rating 4

The file will be saved in a folder from which you run the script: 1552945544582_products.csv

Module

Promise

const amazonScraper = require('amazon-buddy');

(async () => {
    try {
        // Collect 50 products from a keyword 'xbox one'
        const products = await amazonScraper.products({ keyword: 'Xbox One', number: 50, save: true });
        // Collect products that are located on page number 2
        const reviews = await amazonScraper.products({ keyword: 'Xbox One', bulk: false, page: 2 });
        // Collect 50 products from a keyword 'xbox one' with rating between 3-5 stars
        const products_rank = await amazonScraper.products({ keyword: 'Xbox One', number: 50, rating: [3, 5] });

        // Collect 50 reviews from a product ID B01GW3H3U8
        const reviews = await amazonScraper.reviews({ asin: 'B01GW3H3U8', number: 50, save: true });
        // Collect 50 reviews from a product ID B01GW3H3U8  with rating between 1-2 stars
        const reviews_rank = await amazonScraper.reviews({ asin: 'B01GW3H3U8', number: 50, rating: [1, 2] });

        const product_by_asin = await amazonScraper.asin({ asin: 'B01GW3H3U8' });
    } catch (error) {
        console.log(error);
    }
})();

Event

  • You won't be able to use promises.
  • {sort} and {save} will be ignored
const amazonScraper = require('amazon-buddy');

let products = amazonScraper.products({
    keyword: 'xbox',
    number: 50,
    event: true,
});

products.on('error message', (error) => {
    console.log(error);
});

products.on('item', (item) => {
    console.log(item);
});

products.on('completed', () => {
    console.log('completed');
});
products._startScraper();

JSON/CSV output(products):

[{
    asin: 'B01N6HLV9L',
    discounted: false,  // is true if product is with the discount
    sponsored: false,  // is true if product is sponsored
    amazonChoice: true,// if amazon choice badge is present
    price: '$32.99',
    before_discount: '$42.99', // displayed only if price is discounted
    title:'product title',
    url:'long amazon url'
}...]

JSON/CSV output(reviews):

[{
    id: 'R335O5YFEWQUNE',
    review_data: '6-Apr-17',
    name: 'Bob',
    title: 'Happy Gamer',
    rating: 5,
    review: 'blah blah blah'
}...]

JSON/CSV output(asin):

{
        title: 'Apple iPhone 6S, 64GB, Rose Gold - For AT&T / T-Mobile (Renewed)',
        url: 'https://www.amazon.com/dp/B01CR1FQMG',
        reviews: { total_reviews: 2406, rating: '3.8', answered_questions: 677 },
        price: { current_price: 14.98, discounted: false, before_price: 14.98, savings_amount: 0, savings_percent: 0 },
        images: [
            'https://images-na.ssl-images-amazon.com/images/I/412jWjEIzKL._AC_SY879_.jpg',
            'https://images-na.ssl-images-amazon.com/images/I/41XdO4T0xvL._AC_SY879_.jpg',
            'https://images-na.ssl-images-amazon.com/images/I/31qHuwnKOkL._AC_SY879_.jpg',
            'https://images-na.ssl-images-amazon.com/images/I/21CAx9aDlfL._AC_SY879_.jpg',
        ],
        storeID: 'wireless',
        brand: 'Amazon Renewed',
        badges: { amazonChoice: false, amazonPrime: false },
    }

Options

let options = {
    //Search keyword: {string default: ""}
    keyword: "",

    //Number of products to scrape: {int default: 10}
    number: 10,

    // If {bulk} is set to {false} then you can only scrape products by page. Note that {number} will be ignored
    // Very usefull if you need to scrape products from a specific page
    bulk: true,

    // Search result {page} number
    // You can set this value to 5 and scraper will collect all products starting from the {page} number 5
    page: 0,

    // Enable/disabled EventEmitter: {boolean default: false}
    // If enabled then you won't be able to use promises
    event: false,

    // Save result to a file: {boolean default: ''}
    // You can set ['json', 'csv', 'all', '']
    // 'all' - save result to JSON and CSV files
    filetype: '',

    //Set proxy: {string default: ""}
    proxy: "",

    //Sort by rating. [minRating, maxRating]: {array default: [1,5]}
    rating:[1,5],

    //Sorting. If searching for a products then list will be sorted by a higher score(number of reviews*rating). If searching for a reviews then they will be sorted by rating.: {boolean default: false}
    sort: false,

    //Scrape only products with the discount: {boolean default: false}
    discount: false,

    //Scrape only sponsored products: {boolean default: false}
    sponsored: false,

    //Search on custom amazon host to list products in specific language
    host: "www.amazon.de",

    //Randomize user agent version. This helps to prevent request blocking from the amazon side
    randomUa: false

    //Timeout between requests. Timeout is set in mls: 1000 mls = 1 second
    timeout: 0
};

License

MIT

Free Software

amazon-scraper's People

Contributors

drawrowfly avatar gpenverne avatar nilportugues avatar shaunlwm avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.