ogpreview's Introduction

OpenGraph Previewer

This small application demonstrates some aspects of the web development work done by Espartaco Palma.

See the Executing OgPreview section for instructions on running this app.

Goals

The goal of the application is to obtain the images offered by a given URL, using the OpenGraph tags when available. The app exposes an API that can be used as a microservice to resolve website "thumbnails", and also provides a standalone app for review.

OgPreview in action

The process for the Previewer application includes:

  • Validate input
  • Process the request
  • Execute callbacks for polling

Validate input

The web is a weird and wild place, which is why we should validate any user input. For minimal needs we could roll our own validation by nesting if statements while receiving the parameters, but here I'm using a gem created specifically for this purpose: Dry::Validation. With it we save a lot of boilerplate and can even enforce some business rules.

I'm applying minimal rules to the user input, but they can be expanded:

  • The input should be at least 4 characters long
  • The input should be less than 300 characters long
  • The input should have a valid scheme: http or https

This small piece returns errors by default, which we can simply send back to the caller at the controller level if needed.

We don't process the request unless the parameters comply with the rules.
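The three rules can be sketched in plain Ruby using only the standard library. This is not the Dry::Validation contract the app uses; the module name UrlInputCheck, its methods, and the error messages are hypothetical and only illustrate the checks:

```ruby
require 'uri'

# Hypothetical stand-in for the validation step; all names are illustrative.
module UrlInputCheck
  MIN_LENGTH = 4
  MAX_LENGTH = 300 # exclusive upper bound: input must be shorter than this
  ALLOWED_SCHEMES = %w[http https].freeze

  # Returns an array of error messages; an empty array means the input is valid.
  def self.errors_for(input)
    value = input.to_s.strip
    errors = []
    errors << "must be at least #{MIN_LENGTH} characters long" if value.length < MIN_LENGTH
    errors << "must be shorter than #{MAX_LENGTH} characters" if value.length >= MAX_LENGTH
    errors << 'must use the http or https scheme' unless http_url?(value)
    errors
  end

  # True only for parseable URLs with an http/https scheme and a host.
  def self.http_url?(value)
    uri = URI.parse(value)
    ALLOWED_SCHEMES.include?(uri.scheme) && !uri.host.to_s.empty?
  rescue URI::InvalidURIError
    false
  end
end
```

A caller would continue only when UrlInputCheck.errors_for(input) is empty, and otherwise send the messages back.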

Processing the request

This step is also broken into small, testable pieces:

  • Verify reachability
  • Persisting the requested URL input
  • Obtaining the Opengraph
  • Persisting the images

Verify reachability

Before trying to parse, I issue a HEAD request to the already validated input. The HTTP gem is used here instead of the Net::HTTP library included in Ruby's standard library because of its flexibility and security.

It can be configured with timeouts, which may be used in the future.
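As a rough illustration of a HEAD check with timeouts, here is a sketch built on the standard library's Net::HTTP rather than the HTTP gem the app uses. The method name reachable?, the injectable head: parameter, the timeout values, and the choice to treat 2xx and 3xx as reachable are all assumptions made for the example:

```ruby
require 'net/http'
require 'uri'

# Default HEAD probe built on Net::HTTP (the app itself uses the HTTP gem).
# The timeout values are illustrative assumptions.
DEFAULT_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: 2, read_timeout: 5) do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Returns true when the HEAD request answers with a 2xx or 3xx status.
# Network failures and timeouts count as "not reachable".
def reachable?(url, head: DEFAULT_HEAD)
  status = head.call(URI.parse(url))
  (200..399).cover?(status)
rescue StandardError
  false
end
```

Passing the probe as a parameter keeps the check testable without touching the network.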

Persisting the requested URL input

We persist the requested URLs; these records are used later to update their status. Initially the status is enqueued, meaning we received the request and will process it soon.

If there's a problem persisting the data in the database, the status is changed to error and the problem is logged.

Obtaining the Opengraph

The URL was already validated, we know we can reach it, and we persisted it as the reference. Next we try to obtain the OpenGraph metadata. At this point the status is changed to parsing.

I'm using the gem called OpenGraph Parser. I've used this gem in the past instead of the one recommended on Facebook's OpenGraph website, because it doesn't depend on an old, pinned version of the nokogiri gem that has not been patched in a long time and has a serious security problem.

If there's any issue getting the metadata, the status of the request is changed to error and the problem is logged.

Persisting the images

Once the metadata is obtained, we proceed to persist any images it declares.

In this step the status is changed to downloading. I'm using the gem called Down to download the images specified in the metadata. The rationale is how flexible this gem is. With Down we can configure multiple performance and security aspects, such as:

  • Timeouts
  • Size of the file to download
  • Chunk reading
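To show why a size cap and chunk reading work well together, here is a small standard-library sketch. It is not how Down is implemented or configured; read_with_limit, TooLargeError, and the default chunk size are hypothetical names and values chosen for the example:

```ruby
require 'stringio'

# Raised when the source turns out to be bigger than allowed.
class TooLargeError < StandardError; end

# Reads `source` (any IO-like object) in fixed-size chunks and aborts as soon
# as the running total exceeds max_size. Returns the bytes read.
def read_with_limit(source, max_size:, chunk_size: 16 * 1024)
  buffer = String.new
  while (chunk = source.read(chunk_size))
    buffer << chunk
    raise TooLargeError, "exceeds #{max_size} bytes" if buffer.bytesize > max_size
  end
  buffer
end
```

Because the check runs after every chunk, an oversized file is rejected early instead of after it has been fully downloaded.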

Every downloaded image is attached to its database record using Rails' own engine, ActiveStorage, which can also be configured to perform other tasks such as resizing (which I'm not using right now). ActiveStorage can also be configured to use different backends, such as Amazon S3, Azure, or GCP at this time.

Once all the images are downloaded, the status changes to ready.

Creating Step process and Modules

To process the request I'm using a gem called Dry::Monads, which gives us the ability to create modules that follow the monad principles of computing: once you execute a given function, it returns either a Success or a Failure. For example:

d = Downloader.get('http://example.com/image.jpg')
if d.success?
  # continue working
else
  # Fallback
end

The above looks simple enough that nil checks or exceptions would do. But when combined with Do notation, the power of monads becomes apparent. Instead of nested conditions and try..catch blocks, we write small monads and chain them in a notation that looks similar to what functional programming languages offer:

with(url)
 |> create_tracking
 |> verify_website
 |> parse_opengraph
 |> extract_images
 |> persist_as_done

No if statements or nil checks are needed, and there is no need to catch exceptions. This approach is also called Railway Oriented Programming. The process is implemented in the Transaction class:

  def call(uri, user_id, job_id)
    url = yield create_url(uri, user_id, job_id)
    parsed = yield opengraph(url)
    images = yield extract_images(parsed)
    attach_images(images)
  end

Each step tries to do its job, and execution continues if and only if the previous step was successful (returned a Success object); otherwise (it returned a Failure), execution halts.
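The halting behavior can be illustrated with a plain-Ruby sketch that does not depend on Dry::Monads. The Success and Failure structs, their bind methods, and the three step lambdas below are hypothetical stand-ins written for this example, not the app's code:

```ruby
# Minimal stand-ins for the Success/Failure result objects.
Success = Struct.new(:value) do
  def success?
    true
  end

  # Run the next step with the wrapped value.
  def bind
    yield value
  end
end

Failure = Struct.new(:error) do
  def success?
    false
  end

  # Skip the next step and keep carrying the original failure.
  def bind
    self
  end
end

# Hypothetical steps; each one returns a Success or a Failure.
VERIFY  = ->(url) { url.start_with?('http') ? Success.new(url) : Failure.new(:unreachable) }
PARSE   = ->(url) { Success.new({ url: url, images: ["#{url}/cover.png"] }) }
EXTRACT = ->(meta) { meta[:images].empty? ? Failure.new(:no_images) : Success.new(meta[:images]) }
PIPELINE = [VERIFY, PARSE, EXTRACT].freeze

# Threads the value through every step; the first Failure short-circuits the rest.
def run_pipeline(url, steps = PIPELINE)
  steps.reduce(Success.new(url)) { |result, step| result.bind(&step) }
end
```

Once a step returns a Failure, every later bind simply passes it along, so the caller only has to inspect the final result.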

In the end, the whole process contains no business logic beyond this:

  transaction = Transaction.call(...)
  Rails.logger.error(transaction.failure) if transaction.failure?

The above can be read as "Do the Transaction; if you were not able to, just log the error".

The benefit is a process that is better focused, composable enough for each piece to operate on its own, and highly testable in isolation. When using the mixin, the job gets done and the caller, in this case the ActiveJob, has less knowledge of what is happening and how.

Callback for polling

When the user requests a URL for preview, we immediately respond with an acknowledge token, which prevents any blocking and lets the caller (in this case the standalone web page) ask about this specific token on a polling basis. The website is configured to asynchronously request updates for the given token and make decisions based on the response and status.

If any other client requests the same website, we check whether we already have it in our records and issue the same token, letting them call back as if they were the user who first requested it. This may add some round trips, but it also simplifies the logic at the controller level.
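The token reuse idea can be sketched with an in-memory registry. The app keeps these records in its database; the TokenRegistry class, its token_for method, and the token format below are hypothetical and only show the lookup-or-create behavior:

```ruby
require 'securerandom'

# Hypothetical in-memory registry; the app stores these records in its database.
class TokenRegistry
  def initialize
    @tokens = {}
  end

  # Returns the existing token for a URL, or issues a new one on first request.
  def token_for(url)
    @tokens[url] ||= SecureRandom.hex(16)
  end
end
```

Two clients asking for the same URL receive the same token, so both can poll the same record.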

The poller does a GET request to /status using the acknowledge parameter. Since the processing is also done asynchronously, we will always have an answer for any given user. As mentioned earlier, we answer with one of these statuses:

  • enqueued
  • parsing
  • downloading
  • ready
  • error

Once the given URL is processed, the API also provides a list of image URLs when the status is ready:

{ "status": "ready",
  "images": ["http://localhost:3000/path_image1",
             "http://localhost:3000/path_image2",
             "http://localhost:3000/path_imagen"
  ]
}
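A response like the one above could be assembled along these lines. This is only a sketch: status_payload and the record hash with :status and :images keys are hypothetical stand-ins for whatever the controller and the persisted record actually look like:

```ruby
require 'json'

STATUSES = %w[enqueued parsing downloading ready error].freeze

# Builds the JSON body for a status poll. `record` is a hypothetical hash with
# :status and :images keys standing in for the persisted URL record.
def status_payload(record)
  status = record.fetch(:status)
  raise ArgumentError, "unknown status: #{status}" unless STATUSES.include?(status)

  body = { status: status }
  # Image URLs are only included once processing has finished.
  body[:images] = record.fetch(:images, []) if status == 'ready'
  JSON.generate(body)
end
```

For any status other than ready the body carries only the status, which is all the poller needs to decide whether to keep asking.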

Since the client will be using the images from our application, the app's role is now that of a proxy between the images on the requested website and the client.

Next Steps

If this were a functional product, it could easily be transformed to a two-way communication approach using WebSockets. Rails has Action Cable available, which would allow more real-time notifications.

Executing OgPreview

How to run on local machine

You need Node.js and Yarn installed.

Run yarn && bundle && rails s to make the magic happen.

How to run with Docker

You need Docker and docker-compose installed.

Provisioning

Run the following commands to prepare your Docker dev env:

docker-compose build
docker-compose run runner yarn install
docker-compose run runner ./bin/setup

The above commands build the Docker image, install the Ruby and Node.js dependencies, create the database, and run the migrations and seeds.

Commands

You can bring the Rails app up using the following command:

docker-compose up rails

If you want to run Webpack Dev server as well:

docker-compose up rails webpacker

Once your Rails server is running, just point your browser to the local URL: http://localhost:3000

Tests

For the local environment, use the usual command:

bundle exec rspec
# or
# bundle exec rake

You can execute the tests on Docker using this command:

docker-compose run runner rspec
# or
# docker-compose run runner rake

ogpreview's Issues

Fix local test suite

At some point in the history of this project the test suite started failing when running locally.

And yes, I've been too lazy and busy to fix it, since the test suite was running just fine on CircleCI.
