Mindfire Quest

Unveil the obscure network of company and location data, using smart algorithms and data wrangling

Mission Statement

By transforming Swiss Re into a truly tech- and data-led risk knowledge company, we aim to make an incremental change in how risks are understood and qualified. Reflecting real-world evidence in a systematic, digital way, for example understanding where companies have their offices, facilities, factories, and warehouses, is essential to performing risk assessment and resilience services in a data-driven way. While meta information on companies does exist, an extended view of their locations has not yet been solved at scale. This quest aims to bridge this major information gap by applying advanced artificial intelligence techniques.

The quest's mission is to build relationships in company and location data. In particular, it will focus on:

  • Defining the many ways in which a company and a location (for example a factory, warehouse, or sales point) can be interlinked
  • Identifying free or paid sources of information whose data would support applying the ontology to actual companies and buildings
  • Populating a first applied ontology with information retrieved from commercially-free data
  • Potentially retrieving unique identifiers when and where possible, to allow Swiss Re to map the data to its internal data at a later stage
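
As an illustration of the first goal, the links between companies and locations could be stored as simple triples. The sketch below is only one possible shape, not the project's actual schema: the class names, the `LinkType` values, and the `external_id` field are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LinkType(Enum):
    """Hypothetical relation types between a company and a location."""
    OFFICE = "office"
    FACTORY = "factory"
    WAREHOUSE = "warehouse"
    SALES_POINT = "sales_point"


@dataclass(frozen=True)
class Company:
    name: str
    external_id: Optional[str] = None  # e.g. a DUNS number, when available


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float
    address: Optional[str] = None


@dataclass
class Ontology:
    """Stores (company, link type, location) triples."""
    links: list = field(default_factory=list)

    def add_link(self, company, link_type, location):
        self.links.append((company, link_type, location))

    def locations_of(self, company, link_type=None):
        """Return all locations of a company, optionally filtered by link type."""
        return [loc for comp, kind, loc in self.links
                if comp == company and (link_type is None or kind == link_type)]


# Usage with made-up example data.
acme = Company("Acme Cement")
onto = Ontology()
onto.add_link(acme, LinkType.FACTORY, Location(47.3656, 8.5250))
onto.add_link(acme, LinkType.OFFICE, Location(47.3700, 8.5400, "Example Street 1"))
factories = onto.locations_of(acme, LinkType.FACTORY)
```

Keeping the unique identifier optional reflects the last goal above: it can be filled in later, when and where an identifier is found.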

Methods

The methods we have applied are listed below:

  1. NLP Ontology: DNB & DUNS; Panama, Paradise and Pandora papers for offshore and virtual addresses; or the Global Cement Directory report

  2. Top View: Semantic image search using, e.g., CLIP (from OpenAI) applied to satellite and terrain view repositories of North America, Africa, etc.
     a) Library used: CLIP Notebook
     b) Library used: Google MUM Multi-Modal T5
     c) Library used: CV models on TF or SageMaker...

  3. Sat View: Text extraction of company names from building or building-entrance signs, potentially using inclined satellite views. (Cement factory company_detection from image)

  4. Google PlusCodes, SearchOnL coordinate numbering system for buildings.
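
To make the Plus Codes method concrete, here is a minimal sketch of how a latitude/longitude pair can be turned into a full 10-digit Plus Code (Open Location Code). It is written from the published encoding scheme using only the standard library and is not the project's own code; the function name `encode_plus_code` is an assumption, and a production pipeline would more likely rely on Google's reference library.

```python
import math

# Open Location Code alphabet: 20 symbols, one per base-20 digit.
ALPHABET = "23456789CFGHJMPQRVWX"
PAIR_PRECISION = 8000  # 20**3 steps per degree at 10-digit resolution


def encode_plus_code(latitude: float, longitude: float) -> str:
    """Encode coordinates as a full 10-digit Plus Code (cells of 1/8000 degree)."""
    # Clip latitude to [-90, 90] and wrap longitude into [-180, 180).
    latitude = min(90.0, max(-90.0, latitude))
    longitude = (longitude + 180.0) % 360.0 - 180.0

    # Shift both values to positive ranges and convert to integer steps.
    lat_val = math.floor((latitude + 90.0) * PAIR_PRECISION)
    lng_val = math.floor((longitude + 180.0) * PAIR_PRECISION)
    # The north pole has no cell of its own; fold it into the top row.
    lat_val = min(lat_val, 180 * PAIR_PRECISION - 1)

    # Emit five (latitude, longitude) digit pairs, least significant first.
    digits = []
    for _ in range(5):
        digits.append(ALPHABET[lng_val % 20])
        digits.append(ALPHABET[lat_val % 20])
        lat_val //= 20
        lng_val //= 20
    code = "".join(reversed(digits))
    # The "+" separator always follows the eighth digit.
    return code[:8] + "+" + code[8:]


print(encode_plus_code(47.365590, 8.524997))  # prints 8FVC9G8F+6X
```

Because the code is derived from coordinates alone, it can serve as a building-level key even where no street address exists.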

In sprint 2, we have developed a hybrid of all the above methods.

Here's a workflow of what the solution is envisioned to look like:

🎯 Ways to Implement

White-Label Path

This is the path that we think is possible at the moment. The following are the steps involved:

  1. Use YOLOR or a similar algorithm to identify buildings and sites in satellite visual and terrain images for higher accuracy, or use transfer learning by feeding the bounding boxes to CLIP. This has already been done and uploaded for over 1bn buildings.

  2. Top view: CLIP notebook to search through the satellite images that contain buildings. (Cross-check and subtract the ones already listed in the NLP Ontology.)

  3. Ground view: Text and/or logo detection

  4. Geolocation: Google Earth Engine (GEE) script for coordinates to street address or to Plus Codes

  5. Optional addition: Address normalization fuzzy logic tool.
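
The ranking and cross-check in step 2 can be sketched as follows. This is a simplified illustration, not the CLIP notebook itself: it assumes image and text embeddings have already been computed elsewhere, and the three-dimensional vectors, tile IDs, and function names are toy placeholders invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_embedding, tile_embeddings, known_tile_ids, top_k=3):
    """Rank image tiles by similarity to the query, skipping known ones."""
    scored = [
        (tile_id, cosine_similarity(query_embedding, emb))
        for tile_id, emb in tile_embeddings.items()
        if tile_id not in known_tile_ids  # subtract entries already in the ontology
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real CLIP embeddings.
query = [1.0, 0.0, 0.0]  # e.g. the embedding of the text "cement factory"
tiles = {
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.0, 1.0, 0.0],
    "tile_c": [0.7, 0.7, 0.0],
    "tile_d": [1.0, 0.0, 0.0],
}
already_in_ontology = {"tile_d"}
results = rank_candidates(query, tiles, already_in_ontology, top_k=2)
print([tile_id for tile_id, _ in results])  # prints ['tile_a', 'tile_c']
```

Filtering out known tiles before ranking keeps the search focused on locations that the NLP Ontology has not yet covered.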

How to Implement

  • Step 1: "T2 notebook" for implementing any object detection algorithm from TensorFlow Hub. The "Open buildings region notebook" serves as an example of building detection and coordinate extraction for two thirds of the African continent, alongside GitHub repos for the USA and Canada.

  • Step 2: "CLIP Cement Factory Beyond Tags - Semantic Search on images with OpenAI notebook": notebook for identifying potential company candidates

  • Step 3: "Company_detection_from_image notebook": AWS cement factory detection notebook for uniquely identifying language and company from text on satellite or street-view images of company buildings, entrances, or outdoor objects, such as branded trucks or containers on ships or in harbors.

  • Step 4: GEE link (Open Buildings)

  • Step 5: Address normalisation fuzzy logic tool.
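
For step 5, one simple way to approximate address normalisation is to canonicalise the text and then compare strings with a similarity ratio. The sketch below uses fuzzy string matching from Python's standard `difflib` module, which is an assumption about what the tool does rather than a description of it; the abbreviation table, the function names, and the 0.85 threshold are illustrative only.

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation table; a real tool would need a far larger one.
ABBREVIATIONS = {"st": "street", "str": "street", "rd": "road", "ave": "avenue"}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def address_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two addresses after normalisation."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


def best_match(address, candidates, threshold=0.85):
    """Return the closest candidate, or None if nothing clears the threshold."""
    scored = [(address_similarity(address, c), c) for c in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None


known = ["12 Main Street", "98 Oak Road"]
print(best_match("12 Main St.", known))     # prints 12 Main Street
print(best_match("1 Nowhere Lane", known))  # prints None
```

Normalising before comparing means that differences in case, punctuation, and common abbreviations do not lower the score.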

Happy Path

Given the availability of software and resources, this is the path we think can make our tasks easier.

Step 1: Google MUM (to be released soon)

Step 2: "CLIP Cement Factory Beyond Tags - Semantic Search on images with OpenAI notebook": notebook for identifying potential company candidates

Step 3: "Company_detection_from_image notebook": AWS cement factory detection notebook for uniquely identifying language and company from text on satellite or street-view images of company buildings, entrances, or outdoor objects, such as branded trucks or containers on ships or in harbors.

Step 4: GEE link (Open Buildings)

Step 5: Address normalisation fuzzy logic tool

Data

The data was collected from the DnB website using a web scraper and can be found in the folder data > dnb-single-page.csv. The file is a snapshot of a single page of the DnB database. Overall, the database has over 3000 entries.
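
A quick way to inspect such a file is to read its header and count its rows. The helper below is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical name, the file is assumed to have a header row, and the UTF-8 encoding in the usage comment is an assumption as well.

```python
import csv


def summarize_csv(file_obj):
    """Return (column names, row count) for a CSV file with a header row."""
    reader = csv.DictReader(file_obj)
    row_count = sum(1 for _ in reader)
    return reader.fieldnames, row_count


# Example usage against the scraped file (path taken from the text above):
# with open("data/dnb-single-page.csv", newline="", encoding="utf-8") as f:
#     columns, n_rows = summarize_csv(f)
#     print(columns, n_rows)
```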

We collected data on a variety of companies:

⚒️ Contributors

The team members undertaking this project are:


Susanne Kühne


Marco Fernandez


Stephanie Boyle


Ojasvi Gupta


Prakhar Rathi
