CraigslistScraper

Note: CraigslistScraper is for personal use and data science only.

CraigslistScraper is a web scraper for Craigslist. Users define a search, and CraigslistScraper pulls the ad data for that search and places it in a JSON file. Users can also filter posts by keyword and extract the specs of the listed items.

Usage

Example:

Write a configuration similar to the example configuration at config/config.json. Pass in the configuration path to the Scraper.

{
    "City": "chicago",
    "Item": "mac",
    "SearchFilter": {
        "hasPic": 1,
        "min_price": 30,
        "max_price": 500,
        "postedToday": 1
    },
    "PostContentFilter": {
        "TitleMustHaveList" : ["macbook", "mac book", "macbook pro", "mac book pro"],
        "TitleBlackList" : ["case", "cover", "sleeve", "bag", "charger", "adapter", "screen"],
        "DescriptionBlackList" : ["dead", "broken", "cracked", "damaged", "faulty", "not working"],
        "DescriptionMustHaveList" : ["pro"]
    },
    "KeywordExtraction" : {
        "NumberedSpecs" : [
            "gb", "inch", "Ghz", "MHz"
        ],
        "Specs": [
            "retina", "touch bar"
        ]
    }
}
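
Before handing a configuration to the scraper, it can help to check its top-level layout. The following is a minimal sketch, not part of CraigslistScraper: `check_config` and `load_config` are hypothetical helpers, and treating `City` and `Item` as required while the other three sections are optional is an assumption drawn only from the example above. The authoritative schema is the one in config/options.json.

```python
import json

# Assumption: "City" and "Item" are required; the three sections are optional.
REQUIRED_KEYS = ("City", "Item")
OPTIONAL_KEYS = ("SearchFilter", "PostContentFilter", "KeywordExtraction")


def check_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    problems += [f"unknown key: {key}" for key in config if key not in known]
    return problems


def load_config(path):
    """Read a JSON configuration file and raise ValueError if the layout looks wrong."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config('config/config.json')` would return the parsed dictionary, or raise a `ValueError` that lists every problem at once.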

The Scraper takes the path to the user configuration file and performs the scraping. It currently pretty-prints the results.

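from craigslistscraper import CraigslistScraper

def main():
    # Load the user configuration and run the scrape; results are pretty-printed.
    scraper = CraigslistScraper('config/config.json')
    scraper.scrape()

if __name__ == "__main__":
    main()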

Here is an example of the results:

{
    "title": "Macbook Pro Retina 13 inch laptop",
    "price": 175,
    "link": "https://chicago.craigslist.org/sys/d/xxxx/yyyy.html",
    "detail": [
        [
            " (xxxxxx)"
        ],
        [
            "condition",
            "good"
        ],
        [
            "make / manufacturer",
            "apple"
        ]
    ],
    "description": "i have a macbook pro retina early 2013 for sale."
}

Users can also extract specs (the KeywordExtraction section of the configuration), in which case the results take this format:

{
    "title": "2011 MacBook Pro 13 $200 OBO",
    "price": 200,
    "link": "https://chicago.craigslist.org/sys/d/xxxx/yyyy.html",
    "keywords": [
        "16.0 gb",
        "13.3 inch",
        "2.8 ghz",
        "1333.0 mhz"
    ]
}
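
One way to picture how `NumberedSpecs` could produce keywords such as "16.0 gb" is a regular expression that looks for a number followed by each unit. This is an illustrative sketch only, not the project's implementation: `extract_numbered_specs` is a hypothetical function, and the normalisation (a float value plus a lower-cased unit) is inferred from the sample output above.

```python
import re


def extract_numbered_specs(text, units):
    """Hypothetical sketch: find '<number><unit>' pairs and normalise them."""
    found = []
    for unit in units:
        # A number (optionally with decimals), an optional space or hyphen, then the unit.
        pattern = re.compile(
            r"(\d+(?:\.\d+)?)[\s-]?" + re.escape(unit) + r"\b",
            re.IGNORECASE,
        )
        for match in pattern.finditer(text):
            spec = f"{float(match.group(1))} {unit.lower()}"
            if spec not in found:  # keep each spec once, in first-seen order
                found.append(spec)
    return found


post = "MacBook Pro 13.3 inch, 16GB RAM, 2.8 GHz i7, 1333MHz DDR3"
print(extract_numbered_specs(post, ["gb", "inch", "Ghz", "MHz"]))
# ['16.0 gb', '13.3 inch', '2.8 ghz', '1333.0 mhz']
```

Results follow the order of the unit list, not the order in which the specs appear in the post.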

Note #1: Filters are user-defined. Check the supported schema at config/options.json and an example configuration at config/config.json. The current schema supports all search filter options available on Craigslist and allows strict keyword filtering on title and description.

Note #2: For a list of cities, see the craigslistscraper/data/cities.csv file.
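
As a rough illustration of what strict keyword filtering on title and description might look like, here is a minimal sketch. It is not the project's code: `passes_filter` is a hypothetical function, and two behaviours are assumptions, namely that a post needs at least one keyword from each must-have list (an empty list imposes no requirement) and that matching is a case-insensitive substring test.

```python
def passes_filter(title, description, content_filter):
    """Hypothetical sketch of applying a PostContentFilter to one post."""
    title = title.lower()
    description = description.lower()

    def contains_any(text, keywords):
        # Case-insensitive substring match (assumption).
        return any(keyword.lower() in text for keyword in keywords)

    must_title = content_filter.get("TitleMustHaveList", [])
    must_description = content_filter.get("DescriptionMustHaveList", [])

    # Assumption: at least one must-have keyword has to appear.
    if must_title and not contains_any(title, must_title):
        return False
    if must_description and not contains_any(description, must_description):
        return False
    # Any blacklisted keyword rejects the post.
    if contains_any(title, content_filter.get("TitleBlackList", [])):
        return False
    if contains_any(description, content_filter.get("DescriptionBlackList", [])):
        return False
    return True
```

Substring matching is simple but blunt: a blacklist entry such as "case" would also reject a title containing "showcase", so word-boundary matching may be preferable in practice.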

License

Distributed under the MIT License. See LICENSE for more information.

Contributors

ryanirl, abracax
