
nbtechinterview's Introduction

Tech Interview Checklist and Approach

1. WebUI

  • Sign in - ✅ WITH AUTH0
  • Sign up - ✅ WITH AUTH0
  • Upload a keyword file - ✅ DONE WITH PAPAPARSE
  • View list of keywords - ✅ REACT-TABLE
  • View the search result information for each keyword - ✅ SHOWN WITH REACT-TABLE; the HTML code of each result page is stored behind a jsonblob link via the jsonblob API
  • Search across all reports - ✅ DATABASE ACTION WITH pg

2. API

Mostly leveraged the NextJS API routes, plus serverless functions on Lambda

  • For uploading - POST /api/search - integrated with Lambda behind the scenes
  • Lambda Logic Repository: https://github.com/php1301/lambda-google
  • For the sake of quick development - leveraged the Serverless Framework with a free tier AWS account
  • Searching in Database - POST /api/user-search

3. Technical Requirements

  • ✅ Use a web framework of your choice - NextJS
  • ✅ Use PostgreSQL.
  • ✅ For the interface, front-end frameworks such as Bootstrap, Tailwind or Foundation can be used. Use SASS as the CSS preprocessor - Used Tailwind
  • ✅ Extra points are provided to the neatness and user-friendliness of the frontend.
  • ✅ Use Git during the development process. Push to a public repository on Github or Gitlab. Make regular commits and merge code using pull requests - Integrated linter, unit testing with Github Actions
  • ✅ Write tests using your framework of choice.
  • Optional: deploy the application to a cloud provider e.g. Heroku, AWS, Google Cloud or Digital Ocean -> Working on this, or may cut it due to free tier eligibility limits

4. Approach

  • Overview of ER Diagram for database: image
  • ⛔⛔ 429 Too Many Requests - the core problem is keeping Google from blocking our requests
  • Request factors: blocking can be triggered by the User-Agent, remote address, IP, headers, cookies, fingerprint, headless browser detection, request timing (random delays help)...
  • The most viable and easiest approaches all rotate the factors above, mostly the User-Agent and the IP address, using a paid proxy (costly) or our own SOCKS proxy server (Tor) -> this would lead to the optional requirement of deploying to a cloud provider
  • => Goals: low cost and speed. A headless browser like Puppeteer is not optimized for this, and a premium proxy is probably overkill for this technical assignment
  • => ✅✅✅ The Lambda free tier approach is suitable for this workload, thanks to AWS's generous IP pool -> beyond the free tier, EventBridge could be considered for a daily scraping cron job
  • => Rotating the Lambda IP -> the best trick here: updating the Lambda configuration, such as its environment variables, yields a new IP (image)
  • => Not guaranteed 100% of the time (roughly 80-90% of requests avoid a 429) -> implemented axios-retry combined with the trick above to get a new IP (image)
  • => Still not guaranteed -> redeploy the Lambda function via aws-sdk or a serverless script -> tried and worked
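
The retry-and-rotate idea from the list above can be sketched without any dependencies. This is a minimal illustration, not the code from the repository: `HttpResult`, `backoffDelayMs`, `requestWithRotation` and the `rotateIp` callback are hypothetical names, and the callback stands in for whatever actually forces a new Lambda IP (for example a configuration update).

```typescript
// Hypothetical types and names, used only for this sketch.
type HttpResult = { status: number; body: string };
type RequestFn = () => Promise<HttpResult>;
type RotateFn = () => Promise<void>;

// Exponential backoff without jitter: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry a request while it returns 429, rotating the outbound IP before each retry.
async function requestWithRotation(
  request: RequestFn,
  rotateIp: RotateFn,
  maxRetries = 3,
  baseMs = 500,
  capMs = 8000,
): Promise<HttpResult> {
  let result = await request();
  for (let attempt = 0; attempt < maxRetries && result.status === 429; attempt++) {
    await rotateIp(); // assumption: this triggers the "new IP" trick
    await sleep(backoffDelayMs(attempt, baseMs, capMs));
    result = await request();
  }
  return result; // may still be a 429 if every retry was blocked
}
```

Because the final result can still be a 429, the caller would decide whether to fall back to a full redeploy, as the last bullet describes.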

5. Screenshot

  • Homepage image
  • When uploaded keyword image
  • 94 keywords scraped image
  • Request's time image
  • Database image
  • Searching keyword image

6. Limitations

  • Due to the time limit, the source code may not follow best practices everywhere (working on it by actively pushing commits)

7. Reproduction

  • Run yarn and add the necessary env variables listed in .example.env
  • Available routes:
    • Homepage: /
    • Keyword searching in database: /my-keywords
    • Upload CSV of keywords: /search
  • Create the database with the SQL script
  • A sample keywords.csv is in the src/mocks folder

Available Scripts

Running the development server.

    yarn dev

Building for production.

    yarn build

Running the production server.

    yarn start

TailwindCSS

A utility-first CSS framework packed with classes like flex, pt-4, text-center and rotate-90 that can be composed to build any design, directly in your markup.

Go To Documentation

SASS/SCSS

Sass is a stylesheet language that’s compiled to CSS. It allows you to use variables, nested rules, mixins, functions, and more, all with a fully CSS-compatible syntax.

Go To Documentation

Axios

Promise based HTTP client for the browser and node.js.

Go To Documentation

Environment Variables

Use environment variables in your next.js project for server side, client or both.

Go To Documentation

Reverse Proxy

Proxying some URLs can be useful when you have a separate API backend development server and you want to send API requests on the same domain.

Go To Documentation

React Query

Hooks for fetching, caching and updating asynchronous data in React.

Go To Documentation

react-use

A Collection of useful React hooks.

Go To Documentation

Zustand

A small, fast and scalable bearbones state-management solution using simplified flux principles.

Go To Documentation

ESLint

A pluggable and configurable linter tool for identifying and reporting on patterns in JavaScript. Maintain your code quality with ease.

Go To Documentation

Prettier

An opinionated code formatter; Supports many languages; Integrates with most editors.

Go To Documentation

lint-staged

The concept of lint-staged is to run configured linter (or other) tasks on files that are staged in git.

Go To Documentation

Testing Library

The React Testing Library is a very light-weight solution for testing React components. It provides light utility functions on top of react-dom and react-dom/test-utils.

Go To Documentation

Cypress

Fast, easy and reliable testing for anything that runs in a browser.

Go To Documentation

Docker

Docker simplifies and accelerates your workflow, while giving developers the freedom to innovate with their choice of tools, application stacks, and deployment environments for each project.

Go To Documentation

Github Actions

GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub.

Go To Documentation

License

MIT

nbtechinterview's People

Contributors

php1301

nbtechinterview's Issues

[Question] Absence of ORM usage

Issue

The current implementation relies solely on the pg npm package for Postgres. Hence, all database queries are hard-coded and unsafe (some are open to SQL injection).

    const sql1 =
      "SELECT *" +
      " FROM keyword.user_keyword uskw" +
      " INNER JOIN keyword.keywords kw" +
      " on uskw.keyword_name = kw.keyword_name" +
      " where uskw.user_email LIKE $1 and uskw.keyword_name LIKE $2";
    const sql2 =
      "SELECT *" +
      " FROM keyword.user_keyword uskw" +
      " INNER JOIN keyword.keywords kw" +
      " on uskw.keyword_name = kw.keyword_name";

Why not use battle-tested solutions such as Prisma or TypeORM?
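
Short of adopting an ORM, one way to keep hand-written SQL safe is to build the query text and the parameter list together, so user input only ever travels as a bound value. The sketch below is an assumption-laden illustration: `SqlQuery`, `escapeLike` and `buildKeywordSearch` are hypothetical names, and matching the email exactly with `=` instead of `LIKE` is a choice made here, not something taken from the repository.

```typescript
// Hypothetical shape: query text plus its bound values.
type SqlQuery = { text: string; values: string[] };

// Escape LIKE wildcards so user input is matched literally
// (backslash is PostgreSQL's default LIKE escape character).
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Build the search query; the keyword filter is added only when one is given.
function buildKeywordSearch(userEmail: string, keyword?: string): SqlQuery {
  const values: string[] = [userEmail];
  let text =
    "SELECT *" +
    " FROM keyword.user_keyword uskw" +
    " INNER JOIN keyword.keywords kw ON uskw.keyword_name = kw.keyword_name" +
    " WHERE uskw.user_email = $1"; // assumption: exact match on email
  if (keyword !== undefined && keyword.trim() !== "") {
    values.push(`%${escapeLike(keyword.trim())}%`);
    text += ` AND uskw.keyword_name LIKE $${values.length}`;
  }
  return { text, values };
}
```

With pg, the result could then be passed as `client.query(q.text, q.values)`, so no user-supplied string is ever concatenated into the SQL itself.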

[Feature] Processing scraping of keywords asynchronously

Issue

Upon uploading the file with keywords, the keywords are processed synchronously in a loop, i.e., the code loops through each keyword, scrapes the Google page for the content, then inserts records into the database:

    for (const kw of keywords) {
      const { result }: { result: SearchResponseType } =
        await getSearchResults(kw.keywords);
      const insertKeywordValues = [
        result.keyword,
        result.totalAds,
        result.totalLinks,
        `${result.totalResults} - ${result.searchTime}`,
        result.htmlLink,
      ];
      const kwId = await (
        await client.query(query2, insertKeywordValues)
      )?.rows[0]?.keyword_name;
      await client.query(query3, [user?.email, kwId]);
      results.push(result);
    }

However, scraping Google is time-consuming and error-prone. In the current implementation, even though the scraping is offloaded to a Lambda function, keywords are scraped one at a time and the request waits for each of them, which can lead to a long wait for users. In addition, if any error happens, the whole list of keywords fails.

Expected

The keywords should be inserted into the database right after the file is uploaded. There should be a status attribute to track the scraping status. Then, there should be a process to trigger the scraping of each keyword separately. The scraping outcome of one keyword should not affect the outcome of the others. As a result, some keywords could be processed successfully while others could fail. That is okay :-)
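
A minimal sketch of the per-keyword isolation described above, assuming the status lives on an in-memory object rather than in a database column. `ScrapeStatus`, `KeywordJob` and `scrapeAll` are hypothetical names, and the `scrape` callback stands in for the real Lambda call.

```typescript
// Hypothetical status values and job shape for this sketch.
type ScrapeStatus = "pending" | "done" | "failed";

interface KeywordJob {
  keyword: string;
  status: ScrapeStatus;
  error?: string;
}

// Scrape every keyword independently; one failure never aborts the others.
async function scrapeAll(
  keywords: string[],
  scrape: (keyword: string) => Promise<void>,
): Promise<KeywordJob[]> {
  // Every keyword starts as "pending", mirroring the proposed status attribute.
  const jobs = keywords.map(
    (keyword): KeywordJob => ({ keyword, status: "pending" }),
  );
  await Promise.all(
    jobs.map(async (job) => {
      try {
        await scrape(job.keyword);
        job.status = "done";
      } catch (err) {
        // The error is recorded on this job only.
        job.status = "failed";
        job.error = err instanceof Error ? err.message : String(err);
      }
    }),
  );
  return jobs;
}
```

This version starts all scrapes at once, which would likely make 429 responses more frequent; a real implementation would probably limit concurrency or hand each keyword to a queue.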

Start of the code review process 👋

Hello Potter 👋 , thank you for your effort on the code submission. I am Olivier from Nimble, and I am happy to be the reviewer for our code review session.

During the review process, I would like to know more about your decisions, and thus I will create issues where I think there could be more improvements regarding your performance here.

At the same time, please keep in mind that this is a bi-directional process, and I would also love to hear back from you. Therefore, do not hesitate to raise your questions or share your opinions about the implementation (if any) during the process.

We expect the code review process to be completed within 2-3 days at most. As a result, ensure you are responsive during this process. If you need additional time, please inform us immediately so we can plan accordingly.

In the end, I do hope that you find the process enjoyable. Good luck and happy coding. 🤘

[Chore] Unit test for core business logic

Issue

While there are unit tests for most of the React components, there are no tests for the API routes where the core business logic is. As a result, there is no guarantee that the code is robust.

Expected

Normally, all critical paths, such as authentication, file upload, and parsing of keywords, should be tested. However, since Auth0 is used for authentication, there is no need to add tests for authentication. There should be tests for the file upload (+ CSV parsing) and the scraping logic, as these are the core functionality.
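
To make the CSV parsing testable in isolation, it helps to have it as a pure function. The sketch below is only an illustration: `parseKeywords` is a hypothetical helper (the project itself uses PapaParse), and it assumes one keyword per line with an optional `keywords` header row.

```typescript
// Hypothetical helper: one keyword per line, optional "keywords" header row.
function parseKeywords(csv: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const rawLine of csv.split(/\r?\n/)) {
    // Trim whitespace and strip one pair of surrounding double quotes.
    const keyword = rawLine.trim().replace(/^"(.*)"$/, "$1").trim();
    if (keyword === "" || keyword.toLowerCase() === "keywords") continue;
    if (seen.has(keyword)) continue; // drop duplicates, keep first occurrence
    seen.add(keyword);
    result.push(keyword);
  }
  return result;
}

// Header, quoted value, duplicate and trailing blank line in one input.
const parsed = parseKeywords('keywords\nnike\n"adidas"\nnike\n');
console.log(parsed); // [ 'nike', 'adidas' ]
```

A unit test in the project's test framework could assert on exactly this kind of input and output, without touching the upload route or the database.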

[Bug] SQL script cannot be imported

Issue

Upon following the installation steps in the README, I face the issue that the SQL script is not runnable:

image

Expected

The provided SQL script should be runnable, or additional instructions should be provided in the setup guide.
