
web-crawler-challenge's Introduction

Web Crawler Challenge

PLEASE NOTE: I did not create this challenge, so the challenge description in this README is not mine. This challenge was part of my interview for Datafiniti. Please see the details of the challenge below.

To run the program: node crawler.js

Please read EXTENSIONS.txt to see how I would have built a more efficient crawler given more time and more knowledge of building web scrapers.

Purpose

This exercise is designed to test your ability to use object-oriented design principles, data structures and standard algorithms to create a web crawler. We will be looking not only at the data points you have collected from the web, but also at the style of your code, its modularity, its extensibility, and the ease with which the app can be built and tested. As a small team, we believe these principles are a key element of our continued success.

Problem Description

You will need to design and implement a basic web crawler that has two primary functions. The crawler must navigate from a “starting url” to a “listing page” on the website. It must also collect a small number of specified attributes on the page. Usually we collect all products available on a site, but for this coding challenge we would like you to start on Amazon’s home page and navigate to the book category. Collect at least 10 books and grab the following information (if it is available on the page).
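The navigation step described above (from a starting URL to a listing page) could be sketched roughly as follows. This is only an illustration, not the method used in crawler.js: the helper name `findLinkByText`, the sample HTML, and the `example.com` URLs are all assumptions made for the example. A real crawler would normally use a proper HTML parser instead of a regular expression, and would also decode HTML entities in the `href` value.

```javascript
// Hypothetical helper (not part of the original challenge): find the first
// <a> element whose visible text equals `label` and return its href as an
// absolute URL resolved against `baseUrl`. Returns null if nothing matches.
function findLinkByText(html, baseUrl, label) {
  // Capture group 1 is the href value, group 2 is the anchor's inner HTML.
  const anchorPattern =
    /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const wanted = label.trim().toLowerCase();

  for (const match of html.matchAll(anchorPattern)) {
    const href = match[1];
    // Strip nested tags such as <span> before comparing the link text.
    const text = match[2].replace(/<[^>]*>/g, "").trim().toLowerCase();
    if (text === wanted) {
      // The URL constructor resolves relative links against the base URL.
      return new URL(href, baseUrl).href;
    }
  }
  return null;
}

// Made-up markup standing in for a downloaded home page.
const sampleHtml = `
  <nav>
    <a href="/electronics">Electronics</a>
    <a class="nav" href="/books"><span>Books</span></a>
  </nav>`;

console.log(findLinkByText(sampleHtml, "https://www.example.com/", "Books"));
```

Keeping this step as a pure function over an HTML string means it can be tested without making any network requests.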

You will need to design and implement a fully functioning web crawler that can locate the following data points:

  • Name
  • List Price
  • Description
  • Product Dimensions
  • Image URLs
  • Weight

Your application should output its results in a valid and well structured JSON document like the example below:

{
  "product": {
    "id": 1,
    "name": "Sushi at Home: a Mat-To-Table Sushi Cookbook",
    "listPrice": 17.99,
    "description": "Eating Sushi is Easy. Making Sushi is Even Easier.Let your love of sushi inspire you to prepare and enjoy it in your home. This beautiful guide and cookbook opens a window to everything that's so fascinating--and intimidating--about sushi, while laying out easy-to-follow tips and techniques to help sushi lovers become confident sushi chefs.",
    "product_dimension": "8 X 0.6 X 8 inches",
    "imageURLs": [
      "https://images-na.ssl-images-amazon.com/images/I/611AZDSUHvL._SY496_BO1,204,203,200_.jpg",
      "https://images-na.ssl-images-amazon.com/images/I/81ECOQVXVGL.jpg"
    ],
    "weight": "13.9 oz",
    "sourceURL": "https://www.amazon.com/gp/product/1623155975/ref=s9_acsd_simh_bw_c_x_1_w?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=merchandised-search-3&pf_rd_r=5S54Z6125KJDKW8DEBTV&pf_rd_r=5S54Z6125KJDKW8DEBTV&pf_rd_t=101&pf_rd_p=fe185ec9-c8f5-44c0-897e-4c0bde93268c&pf_rd_p=fe185ec9-c8f5-44c0-897e-4c0bde93268c&pf_rd_i=283155"
  }
}
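One possible way to produce a document of this shape is a small class whose `toJSON` method controls the serialized field names. This is a minimal sketch under stated assumptions: the `Product` class, the `parsePrice` helper, and the choice to emit `null` for attributes that are missing from the page are all hypothetical and are not taken from the challenge or from crawler.js. The sample values are shortened and the `example.com` URLs are placeholders.

```javascript
// Hypothetical helper: turn a scraped price string such as "$17.99" into a
// number. Returns null when no number can be found in the text.
function parsePrice(text) {
  const match = /(\d+(?:\.\d+)?)/.exec(String(text).replace(/,/g, ""));
  return match ? Number(match[1]) : null;
}

// Hypothetical record type mirroring the field names in the example document.
class Product {
  constructor(fields) {
    this.id = fields.id;
    this.name = fields.name ?? null;
    this.listPrice = fields.listPrice ?? null;
    this.description = fields.description ?? null;
    this.productDimension = fields.productDimension ?? null;
    this.imageURLs = fields.imageURLs ?? [];
    this.weight = fields.weight ?? null;
    this.sourceURL = fields.sourceURL ?? null;
  }

  // JSON.stringify calls toJSON automatically, so the output keys
  // (including "product_dimension") are defined in one place.
  toJSON() {
    return {
      product: {
        id: this.id,
        name: this.name,
        listPrice: this.listPrice,
        description: this.description,
        product_dimension: this.productDimension,
        imageURLs: this.imageURLs,
        weight: this.weight,
        sourceURL: this.sourceURL,
      },
    };
  }
}

const book = new Product({
  id: 1,
  name: "Sushi at Home: a Mat-To-Table Sushi Cookbook",
  listPrice: parsePrice("$17.99"),
  productDimension: "8 X 0.6 X 8 inches",
  imageURLs: ["https://www.example.com/cover.jpg"], // placeholder URL
  sourceURL: "https://www.example.com/product/1", // placeholder URL
});

// Pretty-print with two-space indentation.
console.log(JSON.stringify(book, null, 2));
```

Attributes that were not found on the page (here `description` and `weight`) come out as `null`, so every record has the same set of keys.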

Once your solution is completed please add an EXTENSIONS.txt file to your solution that notes how your application could be extended to handle the following:

  1. Domains beyond Amazon.com
  2. Products beyond books.

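One common way to approach both extensions is to keep site-specific extraction logic behind a shared interface and select an implementation by hostname. The sketch below is an assumption about how that could look, not the content of EXTENSIONS.txt: `ExtractorRegistry`, `TitleOnlyExtractor`, and the `example.com` host are hypothetical names made up for the example.

```javascript
// Hypothetical registry mapping a hostname to an extractor object.
// An "extractor" here is any object with an extract(html) method.
class ExtractorRegistry {
  constructor() {
    this.extractors = new Map();
  }

  // Register an extractor for one hostname; returns `this` for chaining.
  register(hostname, extractor) {
    this.extractors.set(hostname.toLowerCase(), extractor);
    return this;
  }

  // Look up the extractor responsible for the given page URL.
  forUrl(url) {
    const host = new URL(url).hostname.toLowerCase();
    const extractor = this.extractors.get(host);
    if (!extractor) {
      throw new Error(`No extractor registered for ${host}`);
    }
    return extractor;
  }
}

// Hypothetical extractor that only reads the page <title>. A real one would
// hold the selectors for each attribute of one site or product category.
class TitleOnlyExtractor {
  extract(html) {
    const match = /<title>([\s\S]*?)<\/title>/i.exec(html);
    return { name: match ? match[1].trim() : null };
  }
}

const registry = new ExtractorRegistry().register(
  "www.example.com",
  new TitleOnlyExtractor()
);

const page = "<html><head><title> Sample Book </title></head></html>";
console.log(registry.forUrl("https://www.example.com/item/1").extract(page));
```

With this layout, supporting a new domain or a new product category means adding one extractor and registering it, while the crawling and output code stays unchanged.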
To begin, fork this repository to your personal GitHub account. We ask that you submit your solution within 1 week of forking the repo.

Submission Requirements

  • Use JavaScript.
  • You may use any third-party libraries you wish. Any dependencies must be fully managed by a standard build tool for the language used.
  • You must follow standard Object Oriented Design principles and techniques (e.g., submissions with only a single class are not worthy).
  • Email us when you have finished your submission.

What We'll Be Looking For

  • Code readability and reusability.
  • Testing is not required, but we'd love to see it.
