Companies DB

This is a companies DB that we use in AdGuard Home and AdGuard DNS. It is basically the Whotracks.me database converted to a simple JSON format with some additions from us.

In addition, there's also a file with companies metadata that we use in AdGuard VPN.

Workflow

  • create a fork of the repository on GitHub.
  • create a branch from the current main branch.
  • add a tracker.
  • create a Pull Request.

Naming of branches and commits

  • the branch name format: fix/issueNumber_domain
    Example: fix/34_showrss.info
  • the commit message format: Fix #issueNumber domain
    Example: Fix #34 showrss.info
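
As a minimal sketch of the naming convention above, the two formats can be generated from an issue number and a domain. The helper names `branch_name` and `commit_message` are hypothetical and are not part of the repository:

```python
def branch_name(issue_number: int, domain: str) -> str:
    """Build a branch name in the fix/issueNumber_domain format."""
    return f"fix/{issue_number}_{domain}"


def commit_message(issue_number: int, domain: str) -> str:
    """Build a commit message in the 'Fix #issueNumber domain' format."""
    return f"Fix #{issue_number} {domain}"


print(branch_name(34, "showrss.info"))     # fix/34_showrss.info
print(commit_message(34, "showrss.info"))  # Fix #34 showrss.info
```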

Assignment of files

The list of trackers and companies is generated from the database whotracks.me.

Trackers:

Companies:

VPN Services:

How to add new or rewrite whotracks.me data

If you need to add new data or to rewrite whotracks.me data:

Warning

Add companies and tracker names in alphabetical order. Add tracker domains alphabetically by value.
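
A quick way to check the ordering rule before opening a Pull Request is sketched below. It assumes plain case-sensitive string comparison, which may differ from the ordering the project's own build applies. The helper `out_of_order` and the sample data are hypothetical:

```python
def out_of_order(items):
    """Return adjacent pairs (previous, current) that break ascending order."""
    items = list(items)
    return [(prev, cur) for prev, cur in zip(items, items[1:]) if prev > cur]


# Company and tracker names are checked by key.
companies = {"acme": {}, "zeta": {}, "beta": {}}
print(out_of_order(companies))  # [('zeta', 'beta')]

# Tracker domains are checked by value (the tracker name).
tracker_domains = {
    "collect.company.org": "company_trackername",
    "stats.another.org": "another_trackername",
}
print(out_of_order(tracker_domains.values()))
# [('company_trackername', 'another_trackername')]
```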

How to add a new company or overwrite whotracks.me data

Company data is added to the source/companies.json file under a JSON key whose name defines the companyId. This companyId is then used when adding trackers. Each entry has the following fields:

  • name - the official name of the company, will be displayed in the filter log.
  • websiteUrl - the address of the company website, also used to define the company icon.
  • description - company description, not displayed anywhere.
"companyincID": {
    "name": "Company inc.",
    "websiteUrl": "https://www.company.org/",
    "description": "Description of Company inc."
}
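
The following sketch checks that a company entry carries the three fields listed above. Treating all three keys as required (with null allowed as a value) is an assumption made for illustration, and `missing_fields` is a hypothetical helper:

```python
import json

# Fields described in this section; treating them all as required is an assumption.
EXPECTED_FIELDS = ("name", "websiteUrl", "description")


def missing_fields(entry: dict) -> list:
    """Return the expected field names that are absent from a company entry."""
    return [field for field in EXPECTED_FIELDS if field not in entry]


snippet = """
{
    "companyincID": {
        "name": "Company inc.",
        "websiteUrl": "https://www.company.org/",
        "description": "Description of Company inc."
    }
}
"""

companies = json.loads(snippet)
for company_id, entry in companies.items():
    print(company_id, missing_fields(entry))  # companyincID []
```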

How to add a new tracker or overwrite whotracks.me data

Tracker data is added to the source/trackers.json file, inside the trackers section, under a nested JSON key whose name defines the tracker name of the company. This tracker name is then used when adding domains to the trackerDomains section:

"trackers": {
        "company_trackername": {
            "name": "Company inc. Analytics",
            "categoryId": 6,
            "url": "https://analytics.company.org/",
            "companyId": "companyincID"
        }
}

Add tracker domains to the trackerDomains section:

  • key - tracker domain.
  • value - the tracker name of the company.
"trackerDomains": {
        "collect.company.org": "company_trackername"
}
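
Because trackers point to companies by companyId and domains point to trackers by tracker name, a simple cross-reference check can catch typos. This is only a sketch: `dangling_references` is a hypothetical helper, and it checks only the data passed to it, whereas the real build also draws on whotracks.me data:

```python
def dangling_references(companies: dict, trackers_file: dict) -> list:
    """List references to company IDs or tracker names that are not defined."""
    trackers = trackers_file.get("trackers", {})
    domains = trackers_file.get("trackerDomains", {})
    problems = []

    for tracker_name, tracker in trackers.items():
        company_id = tracker.get("companyId")
        # A null companyId is skipped; only non-null IDs are looked up.
        if company_id is not None and company_id not in companies:
            problems.append(f"tracker {tracker_name}: unknown companyId {company_id}")

    for domain, tracker_name in domains.items():
        if tracker_name not in trackers:
            problems.append(f"domain {domain}: unknown tracker {tracker_name}")

    return problems


companies = {"companyincID": {"name": "Company inc."}}
trackers_file = {
    "trackers": {
        "company_trackername": {
            "name": "Company inc. Analytics",
            "categoryId": 6,
            "url": "https://analytics.company.org/",
            "companyId": "companyincID",
        }
    },
    "trackerDomains": {"collect.company.org": "company_trackername"},
}
print(dangling_references(companies, trackers_file))  # []
```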

Warning

If the value does not exist - enter null:

"url": null

Tracker categories

#   Name                  Purpose
0   audio_video_player    Enables websites to publish, distribute, and optimize video and audio content
1   comments              Enables comments sections for articles and product reviews
2   customer_interaction  Includes chat, email messaging, customer support, and other interaction tools
3   pornvertising         Delivers advertisements that generally appear on sites with adult content
4   advertising           Provides advertising or advertising-related services such as data collection, behavioral analysis or re-targeting
5   essential             Includes tag managers, privacy notices, and technologies that are critical to the functionality of a website
6   site_analytics        Collects and analyzes data related to site usage and performance
7   social_media          Integrates features related to social media sites
8   misc                  This tracker does not fit in other categories
9   cdn                   Content delivery network that delivers resources for different site utilities and usually for many different customers
10  hosting               This is a service used by the content provider or site owner
11  unknown               This tracker has either not been labelled yet, or we do not have enough information to label it
12  extensions            -
13  email                 Includes webmail and email clients
14  consent               -
15  telemetry             -
16  mobile_analytics      Collects and analyzes data related to mobile app usage and performance
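
When writing a tracker entry it can help to look up the numeric categoryId by name. The mapping below simply restates the table above. The variable names `CATEGORY_NAMES` and `CATEGORY_IDS` are hypothetical:

```python
# Category IDs and names exactly as listed in the table above.
CATEGORY_NAMES = {
    0: "audio_video_player",
    1: "comments",
    2: "customer_interaction",
    3: "pornvertising",
    4: "advertising",
    5: "essential",
    6: "site_analytics",
    7: "social_media",
    8: "misc",
    9: "cdn",
    10: "hosting",
    11: "unknown",
    12: "extensions",
    13: "email",
    14: "consent",
    15: "telemetry",
    16: "mobile_analytics",
}

# Reverse lookup: category name -> numeric categoryId.
CATEGORY_IDS = {name: category_id for category_id, name in CATEGORY_NAMES.items()}

print(CATEGORY_IDS["site_analytics"])  # 6
print(CATEGORY_NAMES[9])               # cdn
```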

How to build trackers data

yarn install
yarn convert

The result is:

  • dist/companies.json - companies data JSON file. This file contains the companies list from whotracks.me merged with AdGuard companies from source/companies.json.

  • dist/trackers.json - trackers data JSON file. Combined data from two files:

    • source/trackers.json
    • dist/whotracksme.json.

    An additional key is added to the information from AdGuard files: "source": "AdGuard"

  • dist/trackers.csv - trackers data CSV file. This file is used by the ETL process of AdGuard DNS, so be very careful when changing its structure.

  • dist/whotracksme.json - current whotracks.me trackers data JSON file, compiled from trackerdb.sql.

During the build process, a list of warnings and errors is displayed that should be fixed.
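
To illustrate how the combined trackers data might be assembled, here is a sketch, not the actual build script. It assumes that an AdGuard entry replaces a whotracks.me entry with the same key as a whole, which the text above does not confirm. The function `merge_trackers` is hypothetical:

```python
def merge_trackers(whotracksme: dict, adguard: dict) -> dict:
    """Combine tracker entries, tagging AdGuard entries with "source": "AdGuard"."""
    merged = dict(whotracksme)
    for tracker_name, tracker in adguard.items():
        # Assumption: the AdGuard entry overrides the whole whotracks.me entry.
        merged[tracker_name] = {**tracker, "source": "AdGuard"}
    return merged


whotracksme = {
    "some_tracker": {"name": "Some Tracker", "categoryId": 4},
    "company_trackername": {"name": "Old name", "categoryId": 11},
}
adguard = {
    "company_trackername": {"name": "Company inc. Analytics", "categoryId": 6},
}

merged = merge_trackers(whotracksme, adguard)
print(merged["company_trackername"]["source"])  # AdGuard
print("source" in merged["some_tracker"])       # False
```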

Company icons

The favicon of the company website is used as the company icon. It can be checked using our icon service:

https://icons.adguard.org/icon?domain=adguard.com
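
The check URL can be built from a company's websiteUrl as sketched below. The helper `icon_url` is hypothetical, and passing the host name unchanged (including any www. prefix) is an assumption about what the icon service expects:

```python
from urllib.parse import urlencode, urlsplit

ICON_SERVICE = "https://icons.adguard.org/icon"


def icon_url(website_url: str) -> str:
    """Build an icon service URL for the host of the given website URL."""
    # Fall back to the raw input when it is a bare domain without a scheme.
    host = urlsplit(website_url).hostname or website_url
    return f"{ICON_SERVICE}?{urlencode({'domain': host})}"


print(icon_url("https://adguard.com/"))
# https://icons.adguard.org/icon?domain=adguard.com
```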

Policy

The detailed policy is currently under development. The decision to add a company is at the discretion of the maintainers, and each request will be reviewed on a case-by-case basis. Factors such as the company's industry, reputation, and relevance will be taken into account during the evaluation process.

Currently, we are avoiding adding personal websites/blogs or services that do not seem to have sufficient popularity.

Acknowledgements

We would like to thank the team at whotracks.me for their work. Initially, our database was built on top of the whotracks.me database, using their extensive data collection. However, we would like to emphasise that our current database is now independent and updated separately from whotracks.me.

companiesdb's People

Contributors

abezhovets, adguard, alex-302, ameshkov, atropnikov, aydinv13, denis9229, dependabot[bot], jellizaveta, mizzick, scripthunter7, zhelvis, zloyden

companiesdb's Issues

hbbtv.oztam.com.au

OzTAM is the official source of television audience measurement (TAM) covering Australia’s five mainland metropolitan markets and nationally for subscription television.
OzTAM manages and markets television ratings data for total television viewing in Sydney, Melbourne, Brisbane, Adelaide and Perth across all television households and nationally for all subscription television households.

Whois data
Homepage

Add mdp-appconf-sg.heytapdl.com to companies db

mdp-appconf-sg.heytapdl.com is a subdomain of heytapdl.com. DNS resolution of mdp-appconf-sg.heytapdl.com points to 163.181.49.226 with a location in Madrid, Madrid, Comunidad de ES. The server responds with an SSL certificate issued by DigiCert Inc to 深圳市欢太科技有限公司 under the common name nearme.com.cn.

Whois data

Add allawnos.com to companies database

whois data: https://domain.glass/allawnos.com

allawnos.com doesn't resolve to an IP, but seems to be a domain used by phone companies such as OnePlus and Oppo which belong to BBK Electronics.

Reading a few posts on xda developers, there is a general assumption that it's an analytics service for tracking usage data.
It popped up on one of my devices which is an Oppo phone running ColorOS 12.

Domains appearing in my query log
classify-app-sg.allawnos.com
component-ota-sg.allawnos.com
icosa-service-sg.allawnos.com
opex-service-sg.allawnos.com
sau-server-sg.allawnos.com
rus-service-sg.allawnos.com

Add pki.goog to Google (Company)

pki.goog belongs to Google Trust Services: https://who.is/whois/pki.goog
"Encryption is an important building block for a safer internet. Google Trust Services provides Transport Layer Security (TLS) certificates for Google services and users helping to authenticate and encrypt internet traffic. The service is built on Google’s geographically distributed infrastructure and backed by security and compliance audits helping to provide a transparent, trusted, and reliable Certificate Authority."

Fix Snapchat icon svg

Currently the Snapchat icon svg appears like this on AdGuard DNS
Screenshot 2023-02-03 at 6 32 51 am

Here is what it should look like
snapchat-logo-svgrepo-com

Add 3gppnetwork.org to companies db

Whois data

This domain does not resolve to any IP address (via DNS resolution) and is thus offline. Check the DNS records to see any associated non-IP records. Domain name registration belongs to GSM Association, registered through Network Solutions, LLC.

Add ntp.org to companies db

DNS resolution of ntp.org points to 204.93.207.22 with a location in Chicago, Illinois US. Domain name registration belongs to Network Time Foundation, registered through Tucows, Inc. The server responds with an SSL certificate issued by Let's Encrypt under the common name ntp.org.

Whois data

Add AdGuard domains to companies data

There are a lot of AdGuard domains in the query logs. It would be good to see them added to the companies data.

adguard-dns.io
adguard-dns.com
adguard.org

Add riot.im to Element companies data

The Riot app was renamed to Element in July 2020, but it still uses many of the Riot domains, including riot.im.
riot.im now redirects to element.io.

Add kik.com to companies database

whois data: https://domain.glass/kik.com

Domains associated with kik.com

  • clientmetrics.kik.com (metrics)
  • clientmetrics-augmentum.kik.com (metrics)
  • bots-api.kik.com (bot api)
  • profilepics.cf.kik.com (domain for loading profile pics)
  • talk1600ip.kik.com
  • engine.apikik.com
  • platform.kik.com
  • kik-live.com (live streaming platform within Kik app)

Add showrss.info to showRSS company data

Website: https://showrss.info/

"showRSS is a tool that makes it easier for people to track ongoing TV shows. Create a free account, pick your preferred shows and generate a feed link that you can then subscribe to. And because some torrent clients support subscriptions, you can easily automate your setup by just plugging your personal feed."

Add vscode-unpkg.net to companies db

ms-vscode.vscode-unpkg.net is a subdomain of vscode-unpkg.net. DNS resolution of ms-vscode.vscode-unpkg.net points to 13.107.246.70 with a location in Amsterdam, Noord-Holland NL. Parent domain registration belongs to Microsoft Corporation, registered through CSC Corporate Domains, Inc. The server responds with an SSL certificate issued by Microsoft Corporation to Microsoft Corporation under the common name *.vscode-unpkg.net.

This domain appeared when using the VSCode plugin or GitHub. It's essentially an online text editor.

whois data

Add akaquill.net to companies db

q1.au-aest.gh-g.v1.akaquill.net is a subdomain of akaquill.net. This subdomain does not resolve to any IP address and is thus offline. Check the DNS records to see any associated non-IP records. Parent domain registration belongs to Akamai Technologies, Inc., registered through Akamai Technologies, Inc.

Whois data

sectigo.com

Request: zerossl.ocsp.sectigo.com

zerossl.ocsp.sectigo.com is a subdomain of sectigo.com. DNS resolution of zerossl.ocsp.sectigo.com points to 151.139.128.14 with a location in Dallas, Texas US. Network services are provided by Highwinds Network Group. Parent domain registration belongs to Sectigo Limited, registered through CSC Corporate Domains, Inc. The server responds with an SSL certificate issued by Sectigo Limited under the common name *.ssl.hwcdn.net.

Whois data

Calls to *.userapi.com are incorrectly identified as Megafon

Calls to addresses in the userapi.com domain are erroneously identified as Megafon addresses.
Alisher Usmanov (co-owner of Megafon) is no longer a shareholder. Officially, the company is called "VK Company Limited" according to https://vk.company/ru/investors/corpgov/ and is part of Mail.ru Group: https://ru.wikipedia.org/wiki/VK_(компания)
I propose correcting the ownership of *.userapi.com from Megafon to VK Company Limited (most accurate) or Mail.ru Group in the AdGuard DNS logs.

Add whatsapp.net to companies db

dit.whatsapp.net is a subdomain of whatsapp.net. This subdomain does not resolve to any IP address and is thus offline. Check the DNS records to see any associated non-IP records. Parent domain registration belongs to WhatsApp Inc., registered through RegistrarSafe, LLC.

whois data

updates.cdn-apple.com

updates.cdn-apple.com is a subdomain of cdn-apple.com. DNS resolution of updates.cdn-apple.com points to 17.253.31.201 with a location in Seattle, Washington US. Network services are provided by Apple. Parent domain registration belongs to Apple Inc., registered through CSC Corporate Domains, Inc. The server responds with an SSL certificate issued by Apple Inc. to Apple Inc. under the common name updates.cdn-apple.com.

Whois data

Add vscode-cdn.net to companies db

v--12s0ddraiu48ap5kmmvijle8b05dqgun6k938bhf2pqvs7l0lbka.vscode-cdn.net is a subdomain of vscode-cdn.net. DNS resolution of v--12s0ddraiu48ap5kmmvijle8b05dqgun6k938bhf2pqvs7l0lbka.vscode-cdn.net points to 13.107.237.69 with a location in Redmond, Washington US.

whois data

Similar to #100

Add mb2.messagebank.telstra.com to companies db

Whois data

mb2.messagebank.telstra.com is a subdomain of telstra.com. DNS resolution of mb2.messagebank.telstra.com points to 149.135.224.36 with a location in Sydney, New South Wales AU. Parent domain registration belongs to Telstra Corporation Ltd, registered through CSC Corporate Domains, Inc.

Find a way to handle local device entries in companies db

I have multiple entries originating from my router for various tasks (I do not know what they are). Perhaps these could appear under a non-specific label indicating that there is no company of origin.

Here are some examples:
7407731B.wlan0
7407731B.#

Add v0cdn.net to companies db

cs22.wpc.v0cdn.net is a subdomain of v0cdn.net. DNS resolution of cs22.wpc.v0cdn.net points to 152.199.4.33 with a location in Ashburn, Virginia US. Hosting or network services are provided on Verizon Internet Services networks via Verizon Business. Parent domain registration belongs to Verizon Trademark Services LLC, registered through MarkMonitor, Inc. The server responds with an SSL certificate issued by DigiCert Inc to Microsoft Corporation under the common name *.vo.msecnd.net.

This hostname may be used by Microsoft products for the purposes of Windows Telemetry (HTTP).

whois data

Add e-msedge.net to companies db

e-0014.e-msedge.net is a subdomain of e-msedge.net. DNS resolution of e-0014.e-msedge.net points to 13.107.5.93 with a location in Amsterdam, Noord-Holland NL. Parent domain registration belongs to Microsoft Corporation, registered through MarkMonitor, Inc. The server responds with an SSL certificate issued by Microsoft Corporation to Microsoft Corporation under the common name *.azureedge.net.
The following endpoints are used to connect to the Microsoft 365 admin center's shared infrastructure, including Office in a browser. For more info, see Office 365 URLs and IP address ranges. You can turn this off by removing all Microsoft Office apps and the Mail and Calendar apps. If you turn off traffic for these endpoints, users won't be able to save documents to the cloud or see their recently used documents.
This hostname may be used by Microsoft OfficeHub to get the metadata of Microsoft Office apps (HTTPS).

Whois data

Add matrix.org to Matrix companies data

matrix.org is the domain used for the Matrix open standard.
Matrix is an open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history.
