dotnet-webcrawler-starter's Introduction

Please note that you have 5 days to complete the exercise, counted from the day it was sent out.

Here are the instructions for the Buildit - Wipro Digital Platform Engineer - Cloud exercise:

What we are looking for

There are no tricks or hidden agendas. The purpose of this exercise is for you to demonstrate your technical knowledge, reasoning, and engineering practices using current software development technologies and methods. Please make sure your code is clear and demonstrates your best practices. The exercise should be done as if you were building software to hand off to someone else. Refrain from using this as an opportunity to learn a new framework, library or paradigm besides what you feel would be essential to completing this task.

Your solution will form the basis for discussion in subsequent interviews.

What you need to do

Please write a simple web crawler in C#.

The crawler should be limited to one domain. Given a starting URL – say http://wiprodigital.com – it should visit all pages within the domain, but not follow the links to external sites such as Google or Twitter. None of the links in your output should end with a slash (/).
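
The two rules above (stay within the starting domain, and emit links without a trailing slash) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the `LinkRules` class and its method names are hypothetical, and treating "within the domain" as an exact, case-insensitive host match is an assumption (a real solution might also decide to accept `www.` or other subdomains).

```csharp
using System;

// Hypothetical helper; the names are illustrative and not part of the exercise.
public static class LinkRules
{
    // Assumption: "within the domain" means the link uses http(s) and its host
    // equals the starting URL's host. Subdomains and mailto: links are external.
    public static bool IsInternal(Uri start, Uri link) =>
        (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps)
        && string.Equals(start.Host, link.Host, StringComparison.OrdinalIgnoreCase);

    // Resolves an href against the page it was found on and strips any
    // trailing slash, so "https://test.com/" becomes "https://test.com".
    // Returns false when the href cannot be parsed as a URI.
    public static bool TryNormalize(Uri page, string href, out string normalized)
    {
        normalized = string.Empty;
        if (!Uri.TryCreate(page, href, out var resolved))
            return false;

        normalized = resolved.AbsoluteUri.TrimEnd('/');
        return true;
    }
}
```

Trimming the whole string is the simplest way to meet the "no trailing slash" rule, but it would also trim a slash that ends a query string; whether that matters is one of the tradeoffs worth noting in the README.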

The expected output format:

{
    "uri": "https://test.com/about.html",
    "internalLinks": [
        "https://test.com",
        "https://test.com/about.html#",
        "https://test.com/search.html",
        "https://test.com/categories.html",
        "https://test.com/articles/2015-04-23-forum",
        "https://test.com/feed.xml"
    ],
    "externalLinks": [
        "https://groups.google.com/forum/#!forum/test",
        "https://test.tumblr.com",
        "https://www.agilementoring.com",
        "mailto:[email protected]",
        "https://github.com/test",
        "https://twitter.com/test"
    ],
    "images": [
        "/assets/test.svg"
    ]
}
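
One possible way to produce this shape is a small result type serialized with `System.Text.Json`. The sketch below assumes .NET 5 or later (for records); `PageResult` and `PageResultJson` are hypothetical names chosen for illustration, and only the JSON keys come from the sample above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model; property names mirror the keys in the sample output.
public sealed record PageResult(
    string Uri,
    IReadOnlyList<string> InternalLinks,
    IReadOnlyList<string> ExternalLinks,
    IReadOnlyList<string> Images);

public static class PageResultJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        // Maps "InternalLinks" to "internalLinks", "Uri" to "uri", and so on.
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = true
    };

    // Serializes one crawled page to indented JSON with camelCase keys.
    public static string Serialize(PageResult page) =>
        JsonSerializer.Serialize(page, Options);
}
```

Calling `PageResultJson.Serialize(new PageResult("https://test.com/about.html", internalLinks, externalLinks, images))` would yield an object with the four keys shown in the sample. Note that the default encoder escapes some characters (for example `&` in query strings) as `\uXXXX` sequences, which is still valid JSON.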

Please update this README.md describing your thought process and the tradeoffs made. Also, detail anything further that you would like to achieve with more time.

Once done, please make your solution available on GitHub and forward the link. Where possible, please include your commit history to provide visibility of your thinking and working practice.

What you need to share with us

  • A working crawler as per the requirements above
  • An updated README.md explaining:
      • Your reasoning and any tradeoffs made
      • What could be done with more time
  • A project that builds, runs, and tests as per the instructions

Good luck and thank you for your time - we look forward to seeing your creation.

Running the app

  1. Test the endpoint with curl "http://localhost:8080/crawl?url=<URL to be crawled>"

Contributors

christopherdecker
