
adex-market's People

Contributors

dependabot[bot], elpiel, ivopaunov, ivshti, rori4, samparsky, shteryana, simzzz


adex-market's Issues

API to add more validators to scrape (or to submit campaigns)

if you create a campaign with validators that the market doesn't know about, it won't be crawled

so, introduce one of the two:

  • an API to submit campaigns; this means that any new validators will be "discovered" and the market will continue crawling them
  • an API to add validators to crawl

integration tests

Integration tests for the routes (e.g. getting channels by earner) and the scrape loop (e.g. getting USD estimation)

tests for /units-for-slot

test:

  • returns units with prices
  • applies targeting rules properly (see the tests in the JS validator, integration.js)
  • applies min CPM
  • applies limit for number of campaigns earning from

adUnit (advertiser) auto-categorization

Definitions

  • adUnit object is the MongoDB object for an ad unit
  • unit.categories is set for each unit in campaignSpec.units
  • adUnitCategories refers to the targeting variable which is normally read from unit.categories

adUnit/advertiser auto-categorization based on AIP31:

Add a new route to the market, POST /unit-categorize, that simply categorizes a unit by its URL (targetUrl) and image and saves this to a collection. It should categorize using:

  • Google Vision/Google Natural Language
  • Webshrinker

For security, it should either compute the ipfs ID separately or store its results separately for targetUrl/mediaUrl

this should be used in a few separate ways:

  • when the advertiser is uploading an ad unit, it should be categorized, with the results saved directly in the adUnit object
  • when the advertiser is creating a campaign, this can be used for suggesting default categories
  • when the advertiser is creating a campaign, we should set the unit.categories variable (in the units in campaignSpec) so that the publisher can exclude certain types of content (via adSlot.rules)

Basically, add everything in the adUnit mongo object, but govern it through the platform, which will use the market API beforehand to help auto categorization when creating a campaign, and to set unit.categories.

Further remarks/concerns:

  • adUnitCategories can be shimmed through the AdView getter (targetingInputGetter) from the old targeting tags if it's not set
  • if we need overrides, they can be stored directly in the adUnit Mongo object and used when unit.categories is generated on the platform when generating the campaignSpec

Market Health

each campaign/channel should have a "market health"

which is an aggregate of the

so a healthy channel would satisfy:

  • recent heartbeat for all validators
  • there is a recent NewState
  • last ApproveState is recent and reports healthy
  • recent NewState and ApproveState have the same stateRoot value

if the state is unhealthy, there should be good information on why it happens, for example "Channel is unhealthy cause validator A is offline" or "Channel is unhealthy cause validator A reports unhealthy"
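
A minimal sketch of how the checks above could produce both a verdict and the reasons behind it. The message shapes (`timestamp`, `stateRoot`, `isHealthy`), the `RECENCY_MS` threshold and the function names are assumptions for illustration, not the market's actual code.

```javascript
// Hypothetical threshold: how old a message may be and still count as "recent".
const RECENCY_MS = 4 * 60 * 1000

const isRecent = (msg, now) => Boolean(msg) && now - new Date(msg.timestamp).getTime() <= RECENCY_MS

// heartbeats: { [validatorId]: latest heartbeat msg }; newState/approveState: latest messages (assumed shapes)
function getChannelHealth ({ heartbeats, newState, approveState }, now = Date.now()) {
  const reasons = []
  for (const [validatorId, hb] of Object.entries(heartbeats)) {
    if (!isRecent(hb, now)) reasons.push(`validator ${validatorId} is offline`)
  }
  if (!isRecent(newState, now)) reasons.push('no recent NewState')
  if (!isRecent(approveState, now)) reasons.push('no recent ApproveState')
  else if (!approveState.isHealthy) reasons.push('last ApproveState reports unhealthy')
  if (newState && approveState && newState.stateRoot !== approveState.stateRoot) {
    reasons.push('NewState and ApproveState have different stateRoot values')
  }
  // An empty list of reasons means every check passed.
  return { healthy: reasons.length === 0, reasons }
}
```

Returning the reasons as a list, rather than a single boolean, is what would let the UI say "Channel is unhealthy because validator A is offline".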

Fix: isDisconnected Implementation

There is an issue with the way isDisconnected is implemented.
https://github.com/AdExNetwork/adex-market/blob/master/lib/getStatus.js#L54
it does a util.isDeepStrictEqual(h1, h2) but the signatures & timestamp on the messages would be different

hence the channel would return disconnected

An approach would be to get the leader propagated heartbeat messages from the follower and then do length comparison and check if the difference is within the allowed difference. And the same for the follower from the leader.
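
The length comparison proposed above could look roughly like this. The input shape, the `ALLOWED_HEARTBEAT_DIFF` tolerance and the helper names are assumptions, since the issue does not specify them.

```javascript
// Hypothetical tolerance: how many heartbeats may still be "in flight" between validators.
const ALLOWED_HEARTBEAT_DIFF = 2

// sent: heartbeats a validator produced; propagated: that validator's heartbeats as stored by the other one.
const isPropagated = (sent, propagated) =>
  Math.abs(sent.length - propagated.length) <= ALLOWED_HEARTBEAT_DIFF

// Each argument is an array of heartbeat messages for the same recent time window (assumed input shape).
function isDisconnected ({ leaderOwn, leaderOnFollower, followerOwn, followerOnLeader }) {
  // Disconnected if either direction of propagation lags by more than the tolerance.
  return !isPropagated(leaderOwn, leaderOnFollower) || !isPropagated(followerOwn, followerOnLeader)
}
```

Comparing counts avoids the deep-equality problem entirely, because signatures and timestamps never enter the comparison.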

enforce publisherAddr limits

The market should take a publisherAddr query parameter to enforce two limits:

  1. Max number of channels the publisher is earning from; can be implemented via counting the Active channels the publisher is earning from, and if they’re >= maxChannels, add a filter to the query that only returns them (with .limit(maxChannels), sorted by earnings)

  2. Max allowed earnings for limited accounts - we will stop returning results entirely if the account is limited and the user's total balance (on chain plus outstanding) is over that
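
A sketch of the two limits as a pure function that decides which channels the query may return. The constants, the input shapes and the `restrictTo` return convention are all assumptions. Plain numbers are used for brevity, whereas real balances would be big integers.

```javascript
// Hypothetical limits; the real values would come from the market's config.
const MAX_CHANNELS = 3
const MAX_EARNINGS = 1000

// activeChannels: [{ id, earned }] where `earned` is what publisherAddr earned on that channel (assumed shape).
// account: { isLimited, onChainBalance, outstanding } (assumed shape).
function applyPublisherLimits (activeChannels, account) {
  // Limit 2: limited accounts over the earnings cap get no results at all.
  if (account.isLimited && account.onChainBalance + account.outstanding > MAX_EARNINGS) {
    return { restrictTo: [] }
  }
  // Limit 1: once the publisher earns from MAX_CHANNELS channels, only return the top-earning ones.
  const earningFrom = activeChannels.filter(c => c.earned > 0)
  if (earningFrom.length >= MAX_CHANNELS) {
    const top = [...earningFrom].sort((a, b) => b.earned - a.earned).slice(0, MAX_CHANNELS)
    return { restrictTo: top.map(c => c.id) }
  }
  return { restrictTo: null } // null: no extra filter needed
}
```

In the real route, the `restrictTo` list would translate into an extra filter on the DB query instead of being applied in memory.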

API to get all channels by earner

get all channels where a certain address has earned funds

this requires

  • storing lastApproved in the DB
  • querying lastApproved.newState.balances with $exists and projecting only that key

This API will merely return the latest channel balance, not the outstanding (non-withdrawn) amount. The reason is that the market must be blockchain-agnostic, and outstanding is defined as balances[addr] - onChainWithdrawn[addr], which requires on-chain data
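
A sketch of the query and of the outstanding calculation that is left to callers. The field path follows the bullet points above. The function names are hypothetical, and storing `lastApproved` in exactly this shape is an assumption.

```javascript
// Builds the MongoDB filter and projection for "channels where `earner` has a balance".
function channelsByEarnerQuery (earner) {
  const key = `lastApproved.newState.balances.${earner}`
  return { filter: { [key]: { $exists: true } }, projection: { [key]: 1 } }
}

// Outstanding amount as defined above; computed by a caller that knows the chain state.
// BigInt is used on the assumption that amounts are large integers encoded as strings.
function outstanding (balances, onChainWithdrawn, addr) {
  return BigInt(balances[addr] || 0) - BigInt(onChainWithdrawn[addr] || 0)
}
```

With the official Node.js driver this could be used as `collection.find(filter, { projection })`.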

State backups

the market should save the full state tree for some (all?) channels it cares about, as a failsafe against failing nodes; it should, ofc, also save the signature that goes with it

an easy way to do this would be to always save the full NewState/ApproveState validator msg pairs - both of them, together, contain a full state tree (required to build proofs) and the two required signatures

auto-categorization of adSlots/websites

use the Alexa API to automatically categorize adSlots into relevant categories on POST; see lib/publisherVerification.js for an example on how to use the Alexa API

Map the results to relevant tags from the options on the Platform; if they do not match directly, just add extra targeting tags for each Alexa categorization.

@simzzz I am not familiar with the Alexa API and what it returns exactly, so you'll need to research and come up with the best way to implement this

NOTE: this will replace the tags entered by the user (override them); we'll make relevant changes in the Platform (we won't ask the user to provide tags for the slot)

NOTE: once this is implemented and a PR is opened, we'll test it and if it's not working sufficiently well (due to Alexa's API) we'll restructure into a different design: we'll still allow users to submit tags for the adSlot, we'll keep auto-categorizations from Alexa in the websites collection and merge them with the user-provided tags when doing GET /slot/

NEW IDEA: use webshrinker - turns out it's a much more adequate API

benchmarking tool

depends on #77 and #76

Benchmark the market server, against a pre-filled DB of campaigns (copied from production), using a set of ~10 requests on /campaigns with various real-world parameters (one with just ?status, the others with ?status&limitForPublisher= for different publishers - some that hit the limit, others who don't)

see how many requests per second the market can do on /campaigns

should be accessible with npm run benchmark

adSlot GET: return recommended earning limit

The /slot GET route should return a recommendedEarningLimitUSD (as a float that represents a USD value) based on the relevant entry in websites.

For now, use the following parameters:

  • lower than 10000 rank: 10k lifetime earning
  • lower than 100k: 5k lifetime earnings
  • lower than 300k: 1k lifetime earning
  • higher than 300k or none: 100 USD
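
The tiers above map directly onto a small lookup function. The `rank` field name on the websites entry is an assumption, and so is the handling of a rank of exactly 300000, which the list leaves open.

```javascript
// website: the matching entry from the `websites` collection, or undefined if there is none.
function getRecommendedEarningLimitUSD (website) {
  const rank = website && website.rank
  if (typeof rank !== 'number' || rank <= 0) return 100.0 // no rank known
  if (rank < 10000) return 10000.0
  if (rank < 100000) return 5000.0
  if (rank < 300000) return 1000.0
  return 100.0 // 300k and above (exactly 300k is treated as "higher" here)
}
```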

implement clustering

like the validator, an ability to run in clustered mode + take MAX_WORKERS from the environment (services/cluster can just be copy-pasted)

however, this depends on #76, since the scraper should not run in multiple processes

Bad test assertion text for is_unhealthy

bug: Rejected messages are ignored

When there is RejectState, the campaign state should be changed to something that reflects that

When there's no recent NewState/ApproveState pair, it can either mean that the channel is genuinely not updated (no new events) or that there is a new NewState but no new ApproveState (but a RejectState instead, or the follower is offline)

publisher verification: require a minimum alexa rank

The query will be as follows:

one of the two

  • verifiedForce
  • min alexa rank AND (verifiedIntegration or verifiedOwnership)

Or alternatively one of three

  • verifiedForce
  • verifiedOwnership
  • min rank AND verifiedIntegration

This query has to be changed in two places

  • routes/adSlot
  • scripts/get-waf

So this should be unified via lib/publisherVerification
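
Both candidate rules, written as predicates over a websites record so they can be compared side by side. `MIN_ALEXA_RANK`, the `rank` field and reading "min rank" as "rank number at or below a threshold" are assumptions. The real version would more likely be a MongoDB query exported from lib/publisherVerification.

```javascript
// Hypothetical threshold; the issue does not fix a value.
const MIN_ALEXA_RANK = 500000

// A lower rank number means a more popular site, so "min rank" is read as rank <= threshold.
const hasMinRank = w => typeof w.rank === 'number' && w.rank <= MIN_ALEXA_RANK

// Variant 1: verifiedForce OR (min rank AND (verifiedIntegration OR verifiedOwnership))
const isVerifiedV1 = w =>
  Boolean(w.verifiedForce || (hasMinRank(w) && (w.verifiedIntegration || w.verifiedOwnership)))

// Variant 2: verifiedForce OR verifiedOwnership OR (min rank AND verifiedIntegration)
const isVerifiedV2 = w =>
  Boolean(w.verifiedForce || w.verifiedOwnership || (hasMinRank(w) && w.verifiedIntegration))
```

The two variants differ only for sites with verified ownership but no sufficient rank: the first rejects them, the second accepts them.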

CPC/CPA pricing for /units-for-slot

Problem

It's currently possible to set CLICK pricing for a campaign with zero IMPRESSION pricing, essentially implementing a CPC campaign.

However, the price returned from /units-for-slot (see AIP31) is respected by the AdView as the final price, and it's used to sort the available bids.

Research

https://support.google.com/google-ads/thread/1452036?hl=en - using historic CTR
https://blog.rontar.com/behind-the-scenes-how-advertising-auctions-and-cost-per-click-work - same

Solution

Use the slot average CTR to calculate a per-impression price to shim the price value.

For example, if price.CLICK is 100 and the CTR is 0.01, the returned price will be 1

It's important to use a default value in case the slot average is not available, or there's not enough data to gather it (e.g. less than 2000 impressions).
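
A sketch of the shim, including the fallback to a default CTR. `DEFAULT_CTR`, the `slotStats` shape and adding the shimmed CLICK price on top of any IMPRESSION price are assumptions. Plain numbers are used for readability, whereas real prices would be big integers.

```javascript
// Hypothetical defaults; the issue only says a default is needed when data is missing or thin.
const DEFAULT_CTR = 0.002
const MIN_IMPRESSIONS = 2000

// slotStats: { impressions, clicks } for the slot (assumed shape), or undefined when unknown.
function getSlotCtr (slotStats) {
  if (!slotStats || slotStats.impressions < MIN_IMPRESSIONS) return DEFAULT_CTR
  return slotStats.clicks / slotStats.impressions
}

// Per-impression price used for sorting bids: IMPRESSION price plus CLICK price scaled by CTR.
function getShimmedPrice (price, slotStats) {
  return (price.IMPRESSION || 0) + (price.CLICK || 0) * getSlotCtr(slotStats)
}
```

With `price.CLICK` of 100 and 100 clicks over 10000 impressions (a CTR of 0.01), this reproduces the example above and yields a per-impression price of 1.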

The same can be applied for custom acquisition events in the future, except it will require the rate between that event and impressions.

*The challenge is that the Market doesn't know the CTR. We can store an expectedCtr in the targetingRules by using { set: ['expectedCtr', 0.002] } and update it via a script on the validator

GET /units-for-slot/{slotId}: the "supermarket"

Problem

There are a few things about the current market that can be easily optimized/improved:

  • the adview manager needs to request info about the slot first, and then all campaigns; this can be done in one request
  • we can do the targeting server-side and only return relevant campaigns; some targeting rules can only be applied on the client side (e.g. AdEx Profile, frequency capping) - so those rules will simply be ignored (see AIP31)
  • because of those limitations, we have to set cache times high, which leads to this issue, which happens because campaigns are still returned for some time after they've exhausted/expired

Solution

A new route that returns all units matching a certain ad slot. We will build it as a separate component called the "supermarket" and we'll run it separately, and route at the NGINX level

This gives us 2 advantages:

  • 🚄 Speed: by doing 1 request instead of 2, and using an in-memory data structure, and Rust, we'd be able to cut cache times down to only a few seconds and deliver fresher data; furthermore, AdEx ads will load faster, and fewer KB will be sent over the network
  • 🧐 Traceability: if there are no viable ad units, it will return the precise reasons, which means easier debugging of "why are my ads not showing"; also allows better internal stats

Functionality:

  • implements a route /units-for-slot, which returns all ad units which match this slot; it will apply targeting as per AIP31 AmbireTech/adex-supermarket#9
  • pulls up-to-date campaign data often and directly from validators to avoid trailing impressions Supermarket impl
  • applies targeting logic server-side using the adview manager rust code
  • [] (separated in own issue #170) returns issues and/or stats: a list of possible reasons why there are no returned units: NO_ACTIVE_CAMPAIGNS, CAMPAIGNS_NOT_SOUND, NO_DEPOSITASSET_CAMPAIGNS, NO_UNITS_FOR_SIZE, NO_UNITS_FOR_TARGETING, NO_UNITS_FOR_ADSLOTRULES, SLOT_NOT_VERIFIED (if acceptedReferrers.length == 0)

Tech design

The supermarket will function in-memory, without a database. It will pull all data it needs from the validators.

Recommendations for internal data structures:

  • active: HashMap<CampaignId, CampaignWithInfo> where CampaignWithInfo holds the channel, latest balance, latest status, etc.
  • finalized: Set<CampaignId> - a set of finalized campaigns (exhausted/expired, whatever)

That way only active (non-finalized) campaigns are kept in memory and updated, and once a campaign becomes inactive (which is irreversible), it will be flagged in finalized. Note that unsound (Unhealthy, Invalid, etc.) campaigns are not finalized - only Exhausted/Closed/Withdraw/Expired are finalized.
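
The supermarket itself is planned in Rust, but the same bookkeeping can be sketched in JavaScript with a Map and a Set. The class name, the method name and the use of plain strings for statuses are assumptions.

```javascript
// Statuses that finalize a campaign, per the note above.
const FINALIZED_STATUSES = new Set(['Exhausted', 'Closed', 'Withdraw', 'Expired'])

class CampaignStore {
  constructor () {
    this.active = new Map() // campaignId -> { channel, balances, status, ... }
    this.finalized = new Set() // campaignIds that will never become active again
  }

  // Called on every refresh from the validators.
  update (campaignId, info) {
    if (this.finalized.has(campaignId)) return // finalization is irreversible
    if (FINALIZED_STATUSES.has(info.status)) {
      this.active.delete(campaignId)
      this.finalized.add(campaignId)
    } else {
      this.active.set(campaignId, info) // includes unsound campaigns (Unhealthy, Invalid, ...)
    }
  }
}
```

Keeping only the IDs of finalized campaigns bounds memory use while still letting the crawler skip them cheaply.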

On start-up and every few minutes, we will get all known campaigns from a configurable list of validators. Every few seconds (10-20), we will update our active campaigns from the validators (update their latest balance tree/messages/status).

Ad slots can be retrieved from the market on demand without a cache: this means every request to /units-for-slot will first request the slot from the market. The market endpoint will be configurable. Later on, we can cache that too if needed.

Applying earning limits

There are a few concepts of earning limits within AdEx: the limits recommended by the Market per slot. Those are based on the Alexa rank of the website in question and may include quick account limits too in the future.

We cannot apply earning limits in the supermarket itself: computing all lifetime earnings is impossible because finalized campaigns are dropped from memory (and only active ones are crawled from validators). But the Market can apply limits at an ad slot level, by sneaking in { onlyShowIf: false } within adSlot.rules.

Prerequisites

https://github.com/AdExNetwork/aips/issues/31 and its matching engine - although that's mostly implemented

optimize and security audit the market health function

  1. perform a detailed (security focused) review on the market health functions and its tests - @samparsky and @elpiel - you should each do that
  2. rewrite it in rust in a more efficient way and use that in the Supermarket - @elpiel you should do that

it can be optimized by:

  • not querying the validators if not needed (e.g. channel is expired)
  • by using the heartbeats returned in last-approved
  • only retrieving latest NewState/RejectState individually when the ones in last-approved are not recent (to distinguish between Invalid and just not having a new New/Approve pair)

In terms of types, this can be represented much more cleanly as an enum of:

  • Initializing
  • Waiting
  • Active
  • Finalized - should contain another enum containing { Expired, Exhausted }
  • Unsound - should contain another enum with a struct { disconnected, offline, rejectedState, unhealthy } - where all of those are booleans

This will be implemented in the supermarket first, then the logic should be backported to the JS implementation - we'll figure out how to translate the type considering JS does not have sum types
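
One possible JS translation is a tagged object per variant, which approximates a sum type. This is only a sketch: the `kind` tag, the constructor helpers and `describe` are hypothetical names, not an agreed design.

```javascript
// Each status is a plain object tagged by `kind`; variants with data carry extra fields.
const Status = {
  Initializing: () => ({ kind: 'Initializing' }),
  Waiting: () => ({ kind: 'Waiting' }),
  Active: () => ({ kind: 'Active' }),
  Finalized: reason => ({ kind: 'Finalized', reason }), // reason: 'Expired' | 'Exhausted'
  Unsound: flags => ({
    kind: 'Unsound',
    disconnected: false,
    offline: false,
    rejectedState: false,
    unhealthy: false,
    ...flags // only the flags that are set need to be passed
  })
}

// Example consumer: turn a status into a short human-readable description.
function describe (status) {
  switch (status.kind) {
    case 'Finalized': return `Finalized (${status.reason})`
    case 'Unsound': {
      const set = ['disconnected', 'offline', 'rejectedState', 'unhealthy'].filter(k => status[k])
      return `Unsound (${set.join(', ')})`
    }
    default: return status.kind
  }
}
```

A `switch` on `kind` is the closest JS equivalent of a Rust `match`, although nothing enforces exhaustiveness.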

/units-for-slot: return basic issues

Return issues from the /units-for-slot route, as they're specified in #101

  • if campaignsActive.length is 0, do a .count() on campaigns with the same query but w/o depositAsset, and depending on that return NO_ACTIVE_CAMPAIGNS or NO_DEPOSITASSET_CAMPAIGNS
  • if all of the campaigns don't have units with the proper type, add NO_UNITS_FOR_SIZE
  • if the slot has no acceptedReferrers.length, add SLOT_NOT_VERIFIED

Tagging this with ux because using that, we'd be able to inform publishers why they don't get any ads.
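
The three checks above can be sketched as one function that collects issue codes. The input shapes (`campaign.units[].type`, `slot.type`, `slot.acceptedReferrers`) and the precomputed `countWithoutDepositAsset` argument are assumptions.

```javascript
// campaignsActive: campaigns matching the query, depositAsset included.
// countWithoutDepositAsset: result of the .count() with depositAsset dropped from the query.
function getIssues ({ campaignsActive, countWithoutDepositAsset, slot }) {
  const issues = []
  if (campaignsActive.length === 0) {
    issues.push(countWithoutDepositAsset === 0 ? 'NO_ACTIVE_CAMPAIGNS' : 'NO_DEPOSITASSET_CAMPAIGNS')
  } else if (!campaignsActive.some(c => c.units.some(u => u.type === slot.type))) {
    // There are campaigns, but none has a unit of the slot's type.
    issues.push('NO_UNITS_FOR_SIZE')
  }
  if (!slot.acceptedReferrers || slot.acceptedReferrers.length === 0) issues.push('SLOT_NOT_VERIFIED')
  return issues
}
```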

/units-for-slot implementation (JS)

Implement /units-for-slot in JS before the supermarket is ready

This route should:

  • get all active campaigns
  • apply a global system-wide min CPM
  • get the slot
  • get all units from those campaigns which match the slot ad type
  • apply targeting
  • return the units, plus their price, and the targeting input variables
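
The steps above could be wired together roughly as follows, with data fetching left out. The campaign shape, the per-impression minimum and the `matchesTargeting` callback (a stand-in for the AIP31 rule evaluation) are assumptions.

```javascript
// Hypothetical system-wide minimum, expressed per impression for simplicity.
const GLOBAL_MIN_PER_IMPRESSION = 0.0001

// campaigns: active campaigns as [{ id, pricePerImpression, units: [{ id, type }] }] (assumed shape).
function unitsForSlot ({ campaigns, slot, matchesTargeting }) {
  const results = []
  for (const campaign of campaigns) {
    // Skip campaigns below the global minimum.
    if (campaign.pricePerImpression < GLOBAL_MIN_PER_IMPRESSION) continue
    for (const unit of campaign.units) {
      // Only units of the slot's ad type are eligible.
      if (unit.type !== slot.type) continue
      const input = { campaignId: campaign.id, adUnitId: unit.id, adSlotType: slot.type }
      if (!matchesTargeting(input)) continue
      results.push({ unit, price: campaign.pricePerImpression, targetingInput: input })
    }
  }
  return results
}
```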

default depositAsset

if a depositAsset is not provided, default to DAI (either mainnet or testnet depending on the config)
related to #37

split campaign scraper and server in separate binaries

make a directory bin/ where there would be two separate things: server and scraper which can be started individually

the server would only handle requests, and the scraper would only scrape campaigns

npm start will default to the server

verify publishers on POST adSlot

  • call verifyPublisher and save the result every time a new adSlot is created; first check if the record exists - if it does, then it's an error (can do that with a single insert)
  • a convenience method that takes input, checks blacklist, and saves the successful result or returns issues: an array of strings with problems that the publisher must address (e.g. no DNS record); use this by scripts and POST; the existence of duplicates should also be an issue, but we will still save the record
  • if there is a verification error, send it back to the user with the message; return this in the form of issues array of err message strings
  • in the platform, show the error and explain how to add DNS TXT record: AmbireTech/adex-platform#394

/campaigns: fast filter by ?byEarner

query parameter that returns campaigns where there's a certain earner

do that by doing something like (not sure if correct)

```
// computed key built with a template literal, so the filtering happens in the DB query
{ [`status.lastApprovedBalances.${req.params.earner}`]: { $exists: true } }
```

this is required here: https://github.com/AdExNetwork/adex-relayer/blob/master/routineAuthsLoop.js#L42

this is basically the same as #17 but I'm reopening it cause the previous implementation appears to be querying everything that has `balances` and filtering it server side; we want to filter it on a DB query level, for performance

"Waiting" status

if a campaign status would be Ready but the activeFrom has not commenced yet, consider that a Waiting status
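
As a sketch, this can be a thin wrapper around the existing status computation. The function name and the assumption that `activeFrom` is a millisecond timestamp are illustrative.

```javascript
// status: the name computed by the existing status logic; activeFrom: campaign start time in ms.
function withWaiting (status, activeFrom, now = Date.now()) {
  // Only a would-be Ready campaign that has not started yet becomes Waiting.
  return status === 'Ready' && activeFrom > now ? 'Waiting' : status
}
```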
