fyle-interview-de-intern's Introduction

Fyle Data Extraction Challenge

Who is this for?

This challenge is meant for candidates who wish to intern at Fyle and work with the ML team.

  • You should be available to start by Sept 7, 2021
  • You should be able to commit to at least 3 months (we strongly prefer 6 months)

Why intern at Fyle?

Fyle is a fast-growing Expense Management SaaS product. We are a ~40-strong engineering team at the moment. About 60% of our engineers started off as interns. Interns at Fyle do extremely challenging and impactful work.

People love working at Fyle. Check out our Glassdoor reviews here. You can read stories from our teammates here.

Challenge outline

Under the data directory, you will find 20 receipt directories. Each directory has the following files:

  • An image file that corresponds to a receipt (e.g. data/receipt1/recpu6in7u.jpeg)
  • OCR output that was obtained by running the receipt through AWS Textract (e.g. data/receipt1/ocr.json). You can learn about this file's structure in this document by AWS (link).
  • An expected.json file that contains the receipt amount that should've been extracted
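To make the directory layout above concrete, here is a minimal sketch of loading the three files for one receipt. The helper name `load_receipt_files` is hypothetical, and the set of image extensions is an assumption (only a `.jpeg` example appears above). The sketch does not assume any particular keys inside `expected.json`; it simply returns the parsed JSON.

```python
import json
from pathlib import Path


def load_receipt_files(dirpath):
    """Return (image_path, ocr, expected) for one receipt directory.

    Hypothetical helper: not part of the challenge codebase.
    """
    receipt_dir = Path(dirpath)

    # The image filename differs per receipt, so locate it by extension.
    # The extension set is an assumption; adjust it to the actual data.
    images = sorted(
        p for p in receipt_dir.iterdir()
        if p.suffix.lower() in {".jpeg", ".jpg", ".png"}
    )

    with open(receipt_dir / "ocr.json", encoding="utf-8") as f:
        ocr = json.load(f)
    with open(receipt_dir / "expected.json", encoding="utf-8") as f:
        expected = json.load(f)

    return (images[0] if images else None), ocr, expected
```

For example, `load_receipt_files("data/receipt1")` would return the image path together with the two parsed JSON documents, which is a convenient way to inspect the data before writing any extraction logic.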

You'll need to fill in a stub function in extract.py called extract_amount that extracts the amount, given the receipt directory. You can choose to extract from the receipt image, the ocr.json, or a combination of both.

Please don't use specific markers in the given receipts in your submission - you need to write a generic solution that works across the test data. You will be disqualified if we see hacks like this.
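As a starting point only, the sketch below shows one generic way to approach the OCR route. It assumes the stub takes the receipt directory path and returns a float (the parameter name `dirpath` is an assumption), and that `ocr.json` follows the usual Textract response shape: a top-level `Blocks` list whose `LINE` blocks carry a `Text` field. The helper names, the two-decimal amount pattern, and the "total" keyword heuristic are all illustrative assumptions, not the expected solution, and this naive heuristic will not handle every receipt.

```python
import json
import re
from pathlib import Path

# Assumed amount format: digits with optional thousands separators
# and exactly two decimals, e.g. 12.50 or 1,234.56.
AMOUNT_RE = re.compile(r"(?<![\d.,])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}(?!\d)")


def ocr_lines(ocr):
    """Collect the text of every LINE block in a Textract-style response."""
    return [
        block["Text"]
        for block in ocr.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def parse_amounts(text):
    """Return every amount-like number found in a piece of text."""
    return [float(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]


def guess_amount(lines):
    """Prefer amounts on or right after a 'total' line, else the largest amount."""
    near_total = []
    all_amounts = []
    for i, line in enumerate(lines):
        all_amounts.extend(parse_amounts(line))
        lowered = line.lower()
        if "total" in lowered and "subtotal" not in lowered and "sub total" not in lowered:
            # OCR often splits the label and its value into separate lines,
            # so also look at the line that follows.
            following = lines[i + 1] if i + 1 < len(lines) else ""
            near_total.extend(parse_amounts(line) + parse_amounts(following))
    if near_total:
        return max(near_total)
    return max(all_amounts) if all_amounts else 0.0


def extract_amount(dirpath):
    """Sketch of the stub: read ocr.json from the directory and guess the amount."""
    with open(Path(dirpath) / "ocr.json", encoding="utf-8") as f:
        ocr = json.load(f)
    return guess_amount(ocr_lines(ocr))
```

The keyword check uses a generic word rather than anything tied to a particular receipt, in keeping with the rule above. A more robust solution would likely also weigh the position and confidence information in the OCR output, or fall back to the image.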

Local setup

First, fork this repo to your GitHub account (keep it public so it is easy for us to check the submission later).

Then, clone the repo to your laptop.

This codebase requires Python 3.7+. It is recommended to use virtualenv.

Then install all the dependencies.

    pip install -r requirements.txt

You're ready to begin your task.

Your task

Your task is to fix up the extract_amount function so that all the tests pass, i.e. the amounts in all 20 receipts are extracted correctly. You are free to use the receipt image or the AWS Textract output for this purpose; please do not ask us which one to use.

Once all the tests pass locally, take a screenshot of the successful run with 100% tests passing. Commit and push your code to your repository.

Please do not spend more than 3 hours on this task.

Running tests

Run the tests to check whether your extract_amount works correctly against the test data. You can run all the tests using:

    python -m pytest

You will initially see failures. This is expected since the stub function returns a constant 0.0. The output should look like this.

    collected 20 items

    test_extract.py::test_extract[./data/receipt8] FAILED    [  5%]
    test_extract.py::test_extract[./data/receipt1] FAILED    [ 10%]
    test_extract.py::test_extract[./data/receipt6] FAILED    [ 15%]
    test_extract.py::test_extract[./data/receipt7] FAILED    [ 20%]
    test_extract.py::test_extract[./data/receipt9] FAILED    [ 25%]
    ...

If you'd like to run the test against a single directory, run it like this (in shells such as zsh that treat square brackets as glob characters, wrap the argument in quotes):

    python -m pytest test_extract.py::test_extract[./data/receipt1]

Once you finish your task successfully, all tests should pass.

Fixing styling

Please run this command to check for any linting errors:

    pylint extract.py

If this shows any warnings or errors, please fix them and commit your changes.

Submission

Once you are done with your task, please use this form to complete your submission.

What happens next?

You will hear back from us via email within 48 hours. We may request some changes after reviewing your code.

Subsequently, we will schedule a phone interview with a Fyle Engineer.

If that goes well, we'll make an offer.

fyle-interview-de-intern's People

Contributors

snarayanank2, chitrabhat04, madhavmansuriya40
