amazon-order-history's Introduction

Amazon Order History Web Scraper

Uses Selenium to simulate a login and walk through all of the user's orders. The scraped data is saved in a JSON file for later evaluation.

Currently only works for the German version of Amazon (amazon.de). For amazon.com users there is already a built-in feature to export order data to a CSV file.

Install

  1. Clone the repo: https://github.com/MaX-Lo/Amazon-Order-History.git

  2. Install the requirements: pip install -r requirements.txt

  3. Make sure Geckodriver is installed and on your PATH. For convenience, there is a bash script in the project root directory that does this. It downloads the latest version of geckodriver, makes it executable, and puts it in /usr/share/bin, which is already in the PATH by default. It needs sudo permission to do so. To run the script:

chmod +x geckodriver_installer.sh

./geckodriver_installer.sh

Usage

If you are using a device on which you have never logged in before, Amazon might require a confirmation code from an email it has sent to you. It can therefore be necessary to log into Amazon with your browser before using the script on a new device. After the first login, no further email confirmations should be necessary. The same applies if you have two-factor authentication activated.

Scraping

python -m scraping scrape --email [email protected] --password 123

If you don't want your password to appear in the bash history or in the terminal output, create a pw.txt file containing your password in the project root directory (Amazon-Order-History) and omit the --password parameter.

If you get import errors, make sure you start the script from the main folder (Scraping) and not from inside Scraping/scraping.
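The fallback from the command-line parameter to pw.txt could look roughly like the sketch below. The function name `resolve_password` and its signature are hypothetical and not taken from the project; only the file name pw.txt comes from the description above.

```python
from pathlib import Path
from typing import Optional


def resolve_password(cli_password: Optional[str],
                     pw_file: Path = Path("pw.txt")) -> str:
    """Return the CLI password if given, otherwise the content of pw_file.

    Hypothetical helper: the real project may resolve the password differently.
    """
    if cli_password:
        return cli_password
    if pw_file.is_file():
        # Strip surrounding whitespace, e.g. the newline most editors append.
        return pw_file.read_text(encoding="utf-8").strip()
    raise ValueError(f"no password given and {pw_file} not found")
```

Keeping the password in a file avoids exposing it in the shell history, but the file itself is stored in plain text, so it should not be committed or shared.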

Evaluation

python -m scraping dash starts a Flask server (it should be reachable at http://127.0.0.1:8050/).

Help

There are some optional parameters available; python -m scraping --help shows a description for each of them.

amazon-order-history's People

Contributors

frauelster, henne90gen

amazon-order-history's Issues

scrape with orders.json and start and end flag

If the last scraped order in orders.json is more recent than the end flag, then the start date ends up more recent than the end date and the scraper crashes.

Example:
The last date in orders.json is 2019.

python -m scraping scrape --email [email protected] --password password --start 2015 --end 2017

--> start becomes 2019 and is therefore greater than end
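One way to avoid the crash is to check the range after the start year has been adjusted. The sketch below assumes that years are handled as integers and that scraping resumes at the last year found in orders.json; the function `effective_range` is hypothetical and not part of the project.

```python
from typing import Optional, Tuple


def effective_range(start: int, end: int,
                    last_scraped: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) years still to scrape, or None if nothing is left.

    Hypothetical helper illustrating the check described in this issue.
    """
    if start > end:
        raise ValueError(f"start ({start}) must not be greater than end ({end})")
    if last_scraped is not None:
        # Resume at the last year already present in orders.json.
        start = max(start, last_scraped)
    if start > end:
        # The requested range is already covered by orders.json.
        return None
    return start, end
```

With the values from the example, `effective_range(2015, 2017, 2019)` returns `None`, so the caller can skip scraping instead of crashing.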

Calculating item price via details page is incorrect

If an item has no price, we try to get the price from the details page by simply looking for the string behind "Summe:". This has several problems:

  • discounts, which are subtracted afterwards, are not taken into account
  • for orders with multiple items, the full order price is assigned to a single item, which is wrong
  • the price is stored as a string instead of a float, which is inconsistent with regular prices being floats
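The string-versus-float inconsistency from the last point could be addressed by converting the extracted text before storing it. The following is a minimal sketch that assumes the text uses German number formatting (dot as thousands separator, comma as decimal separator); `parse_german_price` is a hypothetical helper, and it does not address the discount or multi-item problems.

```python
import re


def parse_german_price(text: str) -> float:
    """Extract a price such as '1.234,56' from text and return it as a float.

    Assumes German formatting: '.' separates thousands, ',' separates decimals.
    """
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    number = match.group(0)
    # Drop thousands separators, then switch to a decimal point.
    return float(number.replace(".", "").replace(",", "."))
```

For example, `parse_german_price("Summe: EUR 12,99")` yields the float `12.99`, while a string without digits raises a `ValueError` that the caller can handle.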

Not all items in order detected

If an order contains multiple items, they are grouped by seller. When an order involves several sellers, currently only the first item of each seller is detected.

Support for special price tags

Sometimes the price field of a single item doesn't contain the price but a string such as "Blitzangebote". Possible solutions:

  1. set the price to 0.0 if no reconstruction is possible
  2. set the price to the total order price for orders with only one item
  3. reconstruct the price by subtracting the prices of the other items from the total order price; the remainder is this item's price
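The three options above can be combined into a single fallback chain. The sketch below is only an illustration: `reconstruct_price` is a hypothetical function, unknown prices are assumed to be represented as `None`, and shipping costs or discounts included in the order total are ignored.

```python
from typing import List, Optional


def reconstruct_price(order_total: float,
                      item_prices: List[Optional[float]],
                      index: int) -> float:
    """Estimate the missing price of item_prices[index].

    Hypothetical helper; None marks a price that could not be scraped.
    """
    if len(item_prices) == 1:
        # Option 2: a single item carries the whole order price.
        return order_total
    others = [p for i, p in enumerate(item_prices) if i != index]
    if any(p is None for p in others):
        # Option 1: more than one unknown price, no reconstruction possible.
        return 0.0
    # Option 3: the remainder after subtracting all known prices.
    remainder = round(order_total - sum(others), 2)
    return remainder if remainder >= 0 else 0.0
```

Rounding to two decimals keeps floating-point noise out of the result; a negative remainder is treated as "not reconstructible" and mapped to 0.0.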

Extended scraping option

Scraping the category of each item would be useful for later grouping and evaluation. This requires visiting each item individually, so it is probably best offered as an optional feature.
