ambiretech / adex-market
AdEx Market: a scraper that aggregates ad campaign information from the validator network
if you create a campaign with validators that the market doesn't know about, it won't be crawled
so, introduce one of the two:
Integration tests for the routes (e.g. getting channels by earner) and the scrape loop (e.g. getting USD estimation)
e.g. ?depositAsset=addr
Impose a limit on the maximum number of slots and maximum number of hostnames that a publisher can use
This is useful to prevent various types of abuse
Default values
test:
save the full lastApprove object
do not return it from /campaigns, but save it
rather than using confident from webshrinker, use score: see if it's over a certain threshold which we'll adjust
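A minimal sketch of the score check, assuming each categorization result is an object with id, score and confident fields (score may arrive as a string or a number). The pickCategories name and the 0.5 threshold are hypothetical; the text only says the threshold will be adjusted.

```javascript
// Hypothetical starting threshold; expected to be tuned later.
const SCORE_THRESHOLD = 0.5;

// Keeps the ids of categories whose score reaches the threshold.
// The `confident` flag is deliberately ignored.
function pickCategories(categories, threshold = SCORE_THRESHOLD) {
  return categories
    .filter((category) => Number(category.score) >= threshold)
    .map((category) => category.id);
}
```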
unit.categories is set for each unit in campaignSpec.units
adUnitCategories refers to the targeting variable which is normally read from unit.categories
Add a new route to the market, POST /unit-categorize that simply categorizes a unit by its URL (targetUrl) and image and saves this to a collection. It should categorize using:
For security, it should either compute the ipfs ID separately or store its results separately for targetUrl/mediaUrl
this should be used in a few separate ways:
unit.categories variable (in the units in campaignSpec) so that the publisher can exclude certain types of content (via adSlot.rules)
Basically, add everything in the adUnit mongo object, but govern it through the platform, which will use the market API beforehand to help auto categorization when creating a campaign, and to set unit.categories.
Further remarks/concerns:
adUnitCategories can be shimmed through the AdView getter (targetingInputGetter) from the old targeting tags if it's not set
unit.categories is generated on the platform when generating the campaignSpec
Part of #101 and AmbireTech/adex-supermarket#9
We need a type query parameter for the route /units to optimize for the Supermarket route /units-for-slot/${slotId}.
it shouldn't be needed; should use the relayer to check all identity-related things
HTTP Routes
states like Expired, Exhausted (and possibly others?) are permanent; once they're reached, stop updating the campaign
each campaign/channel should have a "market health" which is an aggregate of the
so a healthy channel would satisfy:
stateRoot value
if the state is unhealthy, there should be good information on why it happens, for example "Channel is unhealthy cause validator A is offline" or "Channel is unhealthy cause validator A reports unhealthy"
solves #12
described here: https://github.com/AdExNetwork/aips/issues/32
Use the Relayer to verify EWT authentication tokens.
This will ensure we don't have to have blockchain-specific code in the Market, and we can use the same authentication method as the validator uses, so that tokens can be reused and the user will have to sign fewer messages on login
There is an issue with the way isDisconnected is implemented.
https://github.com/AdExNetwork/adex-market/blob/master/lib/getStatus.js#L54
it does a util.isDeepStrictEqual(h1, h2) but the signatures & timestamp on the messages would be different
hence the channel would return disconnected
An approach would be to get the leader's heartbeat messages as propagated to the follower, compare the number of messages on each side, and check that the difference is within an allowed margin. Then do the same for the follower's heartbeats as propagated to the leader.
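The length comparison could be sketched as below. The function names, the shape of the argument and the MAX_HEARTBEAT_DIFF value are all assumptions made for illustration; only the idea of comparing heartbeat counts in both directions comes from the description above.

```javascript
// Hypothetical tolerance: how many heartbeats the two views may differ by.
const MAX_HEARTBEAT_DIFF = 2;

// True when a validator's own heartbeats and the copies propagated to the
// other validator differ in count by at most maxDiff.
function countsMatch(ownHeartbeats, propagatedHeartbeats, maxDiff = MAX_HEARTBEAT_DIFF) {
  return Math.abs(ownHeartbeats.length - propagatedHeartbeats.length) <= maxDiff;
}

// Disconnected when propagation is lagging in either direction.
// Signatures and timestamps are never compared, only message counts.
function isDisconnected(heartbeats, maxDiff) {
  const { leaderOnLeader, leaderOnFollower, followerOnFollower, followerOnLeader } = heartbeats;
  const leaderPropagated = countsMatch(leaderOnLeader, leaderOnFollower, maxDiff);
  const followerPropagated = countsMatch(followerOnFollower, followerOnLeader, maxDiff);
  return !(leaderPropagated && followerPropagated);
}
```

Because only counts are compared, differing signatures and timestamps on otherwise equivalent heartbeats no longer mark the channel as disconnected.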
The market should take a publisherAddr query parameter to enforce two limits:
Max number of channels the publisher is earning from; can be implemented via counting the Active channels the publisher is earning from, and if they're >= maxChannels, add a filter to the query that only returns them (with .limit(maxChannels), sorted by earnings)
Max allowed earnings for limited accounts - we will stop returning results entirely if the account is limited and the user's total balance (on chain plus outstanding) is over that
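The two limits can be sketched in memory as follows. This is an illustration of the logic, not the Mongo query itself: the function names are hypothetical, balances are assumed to be decimal strings keyed by address, and "sorted by earnings" is assumed to mean highest first.

```javascript
// Returns the channels a publisher may see, given a cap on earning channels.
function limitChannelsForPublisher(activeChannels, publisherAddr, maxChannels) {
  const earningFrom = activeChannels.filter((ch) => publisherAddr in ch.balances);
  // Under the cap: no restriction, the publisher can still pick up new channels.
  if (earningFrom.length < maxChannels) return activeChannels;
  const earned = (ch) => BigInt(ch.balances[publisherAddr]);
  // At or over the cap: only return channels already earned from, highest first.
  return earningFrom
    .sort((a, b) => (earned(b) > earned(a) ? 1 : earned(b) < earned(a) ? -1 : 0))
    .slice(0, maxChannels);
}

// Limited accounts whose total balance exceeds the cap get no results at all.
function isOverEarningsCap(isLimited, onChainBalance, outstanding, maxEarnings) {
  return isLimited && BigInt(onChainBalance) + BigInt(outstanding) > BigInt(maxEarnings);
}
```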
It needs to be an ISODate (meaning you have to pass a Date when saving)
see https://github.com/AdExNetwork/adex-validator/blob/master/bin/validatorWorker.js#L62
we need to crawl all pages from the validator
NOTE: total might change to totalPages soon: AmbireTech/adex-validator#180
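A sketch of crawling every page, under stated assumptions: fetchPage is a hypothetical async function that returns the parsed JSON for one page, pages are zero-indexed, the response carries the items in channels, and total (or totalPages after the rename) is the number of pages.

```javascript
// Collects the channels from every page the validator reports.
async function crawlAllChannels(fetchPage) {
  const channels = [];
  let page = 0;
  let pageCount = 1; // always fetch at least the first page
  while (page < pageCount) {
    const res = await fetchPage(page);
    channels.push(...res.channels);
    // Accept either field name, since `total` might be renamed to `totalPages`.
    pageCount = res.totalPages ?? res.total ?? 1;
    page += 1;
  }
  return channels;
}
```

Reading both field names keeps the crawler working across the rename.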
get all channels where a certain address has earned funds
this requires $exists and projecting only that key
This API will merely return the latest channel balance, not the outstanding (non-withdrawn) amount; the reason for this decision is that the market must be blockchain-agnostic, and outstanding is defined as balances[addr] - onChainWithdrawn[addr]
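A sketch of the filter and projection this describes. The status.lastApprovedBalances path is an assumption about how balances are stored, and buildEarnerQuery is a hypothetical helper name.

```javascript
// Builds a MongoDB filter and projection for channels where `earner` has a balance.
function buildEarnerQuery(earner) {
  // Assumed document path; adjust to wherever balances actually live.
  const key = `status.lastApprovedBalances.${earner}`;
  return {
    filter: { [key]: { $exists: true } }, // only channels with a balance entry for the earner
    projection: { [key]: 1 }              // return just that entry (plus _id)
  };
}
```

With the official MongoDB Node.js driver the result could be passed as collection.find(filter, { projection }). Since the address ends up inside a field path, it should be validated before it reaches this function.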
the market should save the full state tree for some (all?) channels it cares about, as a failsafe against failing nodes; it should, ofc, also save the signature that goes with it
an easy way to do this would be to always save the full NewState/ApproveState validator msg pairs - both of them, together, contain a full state tree (required to build proofs) and the two required signatures
use the Alexa API to automatically categorize adSlots into relevant categories on POST; see lib/publisherVerification.js for an example on how to use the Alexa API
Map the results to relevant tags from the options on the Platform; if they do not match directly, just add extra targeting tags for each Alexa categorization.
@simzzz I am not familiar with the Alexa API and what it returns exactly, so you'll need to research and come up with the best way to implement this
NOTE: this will replace the tags entered by the user (override them); we'll make relevant changes in the Platform (we won't ask the user to provide tags for the slot)
NOTE: once this is implemented and a PR is opened, we'll test it and if it's not working sufficiently well (due to Alexa's API) we'll restructure into a different design: we'll still allow users to submit tags for the adSlot, we'll keep auto-categorizations from Alexa in the websites collection and merge them with the user-provided tags when doing GET /slot/
NEW IDEA: use webshrinker - turns out it's a much more adequate API
as an alternative to the DNS TXT record, support .well-known/adex.txt, with content equivalent to the DNS TXT record (adex-publisher=)
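A minimal parser sketch for the file body, assuming one adex-publisher= record per line and that the file may contain other lines. The parseAdexRecords name is hypothetical; the same function would work on the string value of a TXT record.

```javascript
// Extracts the values of `adex-publisher=<value>` records from a text body.
function parseAdexRecords(text) {
  const prefix = 'adex-publisher=';
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip unrelated lines and records with an empty value.
    .filter((line) => line.startsWith(prefix) && line.length > prefix.length)
    .map((line) => line.slice(prefix.length));
}
```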
Benchmark the market server, against a pre-filled DB of campaigns (copied from production), using a set of ~10 requests on /campaigns with various real-world parameters (one with just ?status, the others with ?status&limitForPublisher= for different publishers - some that hit the limit, others who don't)
see how many requests per second the market can do on /campaigns
should be accessible with npm run benchmark
instead of returning lastApproved, return lastApprovedSigs: [sig1, sig2] and lastApprovedBalances: {...}
change the way data is stored in mongo
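A sketch of deriving the two returned fields from the stored object. The shape of lastApproved (a newState and an approveState message, each with msg.signature, and balances on the NewState) and the order of the signatures are assumptions; toPublicStatus is a hypothetical name.

```javascript
// Maps the stored lastApproved pair to the trimmed-down public fields.
function toPublicStatus(lastApproved) {
  // No approved state yet: return empty values rather than failing.
  if (!lastApproved) return { lastApprovedSigs: [], lastApprovedBalances: {} };
  const { newState, approveState } = lastApproved;
  return {
    // Assumed order: NewState signature first, ApproveState signature second.
    lastApprovedSigs: [newState.msg.signature, approveState.msg.signature],
    lastApprovedBalances: newState.msg.balances
  };
}
```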
The /slot GET route should return a recommendedEarningLimitUSD (as a float that represents a USD value) based on the relevant entry in websites.
For now, use the following parameters:
See scripts/get-waf
Use the cloudflare npm module to automate updating of the rule
see https://api.cloudflare.com/#account-level-firewall-access-rule-update-access-rule, namely PATCH accounts/:account_identifier/firewall/access_rules/rules/:identifier
There should be request body validation for /user/
https://github.com/AdExNetwork/adex-market/blob/master/routes/users.js#L11
like the validator, an ability to run in clustered mode + take MAX_WORKERS from the environment (services/cluster can just be copy-pasted)
however, this depends on #76, since the scraper should not run in multiple processes
When there is RejectState, the campaign state should be changed to something that reflects that
When there's no recent NewState/ApproveState pair, it can either mean that the channel is genuinely not updated (no new events) or that there is a new NewState but no new ApproveState (but a RejectState instead, or the follower is offline)
could be another field in the status that's calculated by the status-loop, using the same algorithm as this: https://github.com/SpankChain/uniprice
essentially, we get how much the token is in DAI by calling the uniswap contract
The query will be as follows:
one of the two
Or alternatively one of three
This query has to be changed in two places
So this should be unified via lib/publisherVerification
It's currently possible to set CLICK pricing for a campaign with zero IMPRESSION pricing, essentially implementing a CPC campaign.
However, the price returned from /units-for-slot (see AIP31) is respected by the AdView as the final price, and it's used to sort the available bids.
https://support.google.com/google-ads/thread/1452036?hl=en - using historic CTR
https://blog.rontar.com/behind-the-scenes-how-advertising-auctions-and-cost-per-click-work - same
Use the slot average CTR to calculate a per-impression price to shim the price value.
For example, if price.CLICK is 100 and the CTR is 0.01, the returned price will be 1
It's important to use a default value in case the slot average is not available, or there's not enough data to gather it (e.g. less than 2000 impressions).
The same can be applied for custom acquisition events in the future, except it will require the rate between that event and impressions.
*The challenge is that the Market doesn't know the CTR. We can store an expectedCtr in the targetingRules by using { set: ['expectedCtr', 0.002] } and update it via a script on the validator
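The shim can be sketched as below, using plain numbers for readability (real prices would likely be big-number strings in token units). The function names and the stats shape are hypothetical; DEFAULT_CTR borrows the 0.002 from the expectedCtr example and is an assumption, while the 2000-impression minimum comes from the text.

```javascript
const MIN_IMPRESSIONS = 2000; // below this there is not enough data for a slot average
const DEFAULT_CTR = 0.002;    // assumed fallback, taken from the expectedCtr example

// Returns the slot's average CTR, or the default when data is missing or too thin.
function slotCtr(stats, defaultCtr = DEFAULT_CTR) {
  if (!stats || stats.impressions < MIN_IMPRESSIONS) return defaultCtr;
  return stats.clicks / stats.impressions;
}

// Converts a per-click price into an equivalent per-impression price.
function impressionPriceFromClickPrice(clickPrice, stats, defaultCtr) {
  return clickPrice * slotCtr(stats, defaultCtr);
}
```

With a click price of 100 and 20 clicks over 2000 impressions (a CTR of 0.01), the per-impression price comes out at 1, matching the example above.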
There's a few things about the current market that can be easily optimized/improved:
A new route that returns all units matching a certain ad slot. We will build it as a separate component called the "supermarket" and we'll run it separately, and route on an NGINX level
This gives us 2 advantages:
Functionality:
/units-for-slot, which returns all ad units which match this slot; it will apply targeting as per AIP31 AmbireTech/adex-supermarket#9
issues and/or stats: a list of possible reasons why there are no returned units: NO_ACTIVE_CAMPAIGNS, CAMPAIGNS_NOT_SOUND, NO_DEPOSITASSET_CAMPAIGNS, NO_UNITS_FOR_SIZE, NO_UNITS_FOR_TARGETING, NO_UNITS_FOR_ADSLOTRULES, SLOT_NOT_VERIFIED (if acceptedReferrers.length == 0)
The supermarket will function in-memory, without a database. It will pull all data it needs from the validators.
Recommendations for internal data structures:
active: HashMap<CampaignId, CampaignWithInfo> where CampaignWithInfo holds the channel, latest balance, latest status, etc.
finalized: Set<CampaignId> - a set of finalized campaigns (exhausted/expired, whatever)
That way only active (non-finalized) campaigns are kept in memory and updated, and once a campaign becomes inactive (which is irreversible), it will be flagged in finalized. Note that unsound (Unhealthy, Invalid, etc.) campaigns are not finalized - only Exhausted/Closed/Withdraw/Expired are finalized.
On start-up and every few minutes, we will get all known campaigns from a configurable list of validators. Every few seconds (10-20), we will update our active campaigns from the validators (update their latest balance tree/messages/status).
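The recommended structures could look like this in JS, as a sketch only: CampaignCache and its update method are hypothetical names, and the info object is assumed to carry a status string alongside the channel and balances.

```javascript
// Statuses after which a campaign never changes again (from the note above).
const FINAL_STATUSES = new Set(['Exhausted', 'Closed', 'Withdraw', 'Expired']);

class CampaignCache {
  constructor() {
    this.active = new Map();    // CampaignId -> { channel, balances, status, ... }
    this.finalized = new Set(); // CampaignId
  }

  // Records the latest info for a campaign, or finalizes it for good.
  update(id, info) {
    if (this.finalized.has(id)) return; // finalization is irreversible
    if (FINAL_STATUSES.has(info.status)) {
      this.active.delete(id);
      this.finalized.add(id);
      return;
    }
    // Unsound campaigns (Unhealthy, Invalid, ...) stay active and keep updating.
    this.active.set(id, info);
  }
}
```

The periodic update loop would call update for each campaign it gets back from the validators, so memory only ever holds campaigns that can still change.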
Ad slots can be retrieved from the market on demand without a cache: this means every request to /units-for-slot will first request the slot from the market. The market endpoint will be configurable. Later on, we can cache that too if needed.
There are a few concepts of earning limits within AdEx: the limits recommended by the Market per slot. Those are based on the Alexa rank of the website in question and may include quick account limits too in the future.
We cannot apply earning limits in the supermarket, since we can't compute lifetime earnings: we drop finalized campaigns from memory (and we only crawl active ones from validators). But the Market can apply limits at an ad slot level, by sneaking in { onlyShowIf: false } within adSlot.rules.
https://github.com/AdExNetwork/aips/issues/31 and its matching engine - although that's mostly implemented
Add units and slots to ipfs
it can be optimized by:
In terms of types, this can be represented much more cleanly as an enum of:
{ Expired, Exhausted }
{ disconnected, offline, rejectedState, unhealthy } - where all of those are booleans
This will be implemented in the supermarket first, then the logic should be backported to the JS implementation - we'll figure out how to translate the type considering JS does not have sum types
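One possible translation to JS is a tagged object with a kind discriminator, sketched below. The variant names Finalized and Unsound are assumptions, and only the two variants listed above are covered.

```javascript
const UNSOUND_FLAGS = ['disconnected', 'offline', 'rejectedState', 'unhealthy'];

// Variant 1: a permanent end state; reason is 'Expired' or 'Exhausted'.
const finalized = (reason) => ({ kind: 'Finalized', reason });

// Variant 2: a set of boolean flags, all false unless overridden.
const unsound = (flags = {}) => ({
  kind: 'Unsound',
  disconnected: false,
  offline: false,
  rejectedState: false,
  unhealthy: false,
  ...flags
});

// Consumers switch on `kind`, which stands in for matching on a sum type.
function describeStatus(status) {
  switch (status.kind) {
    case 'Finalized':
      return `finalized: ${status.reason}`;
    case 'Unsound':
      return `unsound: ${UNSOUND_FLAGS.filter((flag) => status[flag]).join(', ')}`;
    default:
      throw new Error(`unknown status kind: ${status.kind}`);
  }
}
```

The default branch throws, so an unhandled variant fails loudly instead of being silently ignored.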
Return issues from the /units-for-slot route, as they're specified in #101
campaignsActive.length is 0, do a .count() on campaigns with the same query but w/o depositAsset, and depending on that return NO_ACTIVE_CAMPAIGNS or NO_DEPOSITASSET_CAMPAIGNS
type, add NO_UNITS_FOR_SIZE
acceptedReferrers.length, add SLOT_NOT_VERIFIED
Tagging this with ux because using that, we'd be able to inform publishers why they don't get any ads.
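A sketch of how the issues array could be assembled from counts gathered while serving the request. The input names are hypothetical, and the exact conditions for NO_UNITS_FOR_SIZE and SLOT_NOT_VERIFIED are assumptions (no unit matches the slot type; acceptedReferrers is empty).

```javascript
// Builds the list of reasons why a slot gets no units.
function collectIssues({ activeCount, activeCountAnyAsset, unitsOfTypeCount, acceptedReferrers }) {
  const issues = [];
  if (activeCount === 0) {
    // Same query without depositAsset tells the two cases apart.
    issues.push(activeCountAnyAsset === 0 ? 'NO_ACTIVE_CAMPAIGNS' : 'NO_DEPOSITASSET_CAMPAIGNS');
  } else if (unitsOfTypeCount === 0) {
    // Only meaningful when there are campaigns to take units from.
    issues.push('NO_UNITS_FOR_SIZE');
  }
  if (acceptedReferrers.length === 0) issues.push('SLOT_NOT_VERIFIED');
  return issues;
}
```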
Implement /units-for-slot in JS before the supermarket is ready
This route should:
if a depositAsset is not provided, default to DAI (either mainnet or testnet depending on the config)
related to #37
make a directory bin/ where there would be two separate things: server and scraper which can be started individually
the server would only handle requests, and the scraper would only scrape campaigns
npm start will default to the server
verifyPublisher and save the result every time a new adSlot is created; first check if the record exists - if it does, then it's an error (can do that with a single insert)
issues: an array of strings with problems that the publisher must address (e.g. no DNS record); use this by scripts and POST; the existence of duplicates should also be an issue, but we will still save the record
issues array of err message strings
query parameter that returns campaigns where there's a certain earner
do that by doing something like (not sure if correct)
```
{ [`status.lastApprovedBalances.${req.params.earner}`]: { $exists: true } }
```
this is required here: https://github.com/AdExNetwork/adex-relayer/blob/master/routineAuthsLoop.js#L42
this is basically the same as #17 but I'm reopening it cause the previous implementation appears to be querying everything that has `balances` and filtering it server side; we want to filter it on a DB query level, for performance
if a campaign status would be Ready but the activeFrom has not commenced yet, consider that a Waiting status
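As a sketch, assuming activeFrom is a millisecond timestamp and the status is a plain string (adjustStatus is a hypothetical name):

```javascript
// Downgrades Ready to Waiting while the campaign has not started yet.
function adjustStatus(status, activeFrom, now = Date.now()) {
  if (status === 'Ready' && activeFrom > now) return 'Waiting';
  // Every other status, and campaigns without activeFrom, pass through unchanged.
  return status;
}
```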