patcg-individual-drafts / topics
The Topics API
Home Page: https://patcg-individual-drafts.github.io/topics/
License: Other
This issue was spawned from an earlier Twitter discussion.
As currently proposed, Topics are available based on where specific 3Ps are present. This gating creates several issues:
I am deliberately not proposing a technical mechanism at this point, but rather focusing on the desirable outcomes first.
The idea is that instead of gating the shared elaboration of Topics through whoever happens to have the same 3P, it is done through explicit groups (leagues) of first-parties. Behaviour on any member of a given league contributes to the elaboration of topics that can be used for IG targeting on any other. This has several consequences:
In order for this to work, a league would have to be something that you can't join at high speed or frequency. This is significantly more complex than gating on origin and a method call; my assumption is that whatever technical mechanism underlies topic leagues would be shared with other approaches that also benefit from grouping first parties for whatever reason.
I realise this is more hand waving than the usual fare (though maybe by waving hands fast enough one may… take flight) but I'm trying to hold off jumping into solution space while we discuss the potential value of the overall approach.
These are some questions related to #31 that would help users of the Topics API understand when it could be called. (This is not just about compliance -- understanding these answers will help sites and third-party services manage the interactions of scripts on a page that call Topics API with scripts that handle consent and/or opt-outs/objections from the user. Covering this material at an early stage will help to evaluate how practical this proposal is to implement.)
Who is the controller? (Is any caller of `document.browsingTopics()` a controller?)
What is the basis for processing?
Is the Topics API data "obtained from the data subject" because it is provided by the browser, which operates as the agent of the user (data subject)?
Because Topics API is an exchange of value for value (a trade of information about the user's activity on the current site for information about the user's activity on previous sites) is it considered a "sale" of personal information in California?
A caller may not get the same signal from every topic for selecting an ad, for instance "Auto insurance" may be more useful than "Vegan Cuisine".
Would it be possible for callers to provide a ranked priority list of topics, for example at a .well-known location, and for the API to return topics, if eligible, according to this priority list?
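To make the suggestion concrete, here is a minimal sketch of how the browser could apply such a caller-published priority list. The `.well-known` location and the list shape are the commenter's proposal, not part of the current API; the function name and limit are assumptions for illustration.

```javascript
// Topics the browser would otherwise be willing to return to this caller.
const eligibleTopics = ['Vegan Cuisine', 'Auto Insurance', 'Travel'];

// Hypothetical ranked priority list the caller publishes at a .well-known path.
const callerPriority = ['Auto Insurance', 'Mortgages', 'Travel', 'Vegan Cuisine'];

// Return up to `limit` eligible topics, ordered by the caller's priority.
function selectByPriority(eligible, priority, limit = 3) {
  const eligibleSet = new Set(eligible);
  return priority.filter((t) => eligibleSet.has(t)).slice(0, limit);
}

const chosen = selectByPriority(eligibleTopics, callerPriority, 2);
console.log(chosen); // ['Auto Insurance', 'Travel']
```

Topics the caller never listed (or that the user is not eligible for) simply fall out, so a caller could prefer "Auto Insurance" over "Vegan Cuisine" without learning anything extra.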
Why is the current API on Document? It has nothing to do with the current document; it should instead live on the Navigator object.
The Topics API provides one zero-argument function, `document.browsingTopics()`, which serves three logically distinct purposes:
It would be useful to provide a little more control over these three different aspects of the API. In particular, there is some tension between the first two and the last use case. For the first two use cases, there is no downside (aside potentially from some latency) to calling the API. Each ad tech is incentivized to call the API whenever possible, either to get useful signals or to enable nonempty responses for future calls to the API.
On the other hand, there are potential downsides to calling the API when it comes to the third point. For example, consider a very large publisher site whose domain/subdomain-level topics are generic and not commercially relevant. The ad tech might like to call the API to get useful signals, but with the current API it may not be worth the risk of contaminating the user's future top 5 topics with those generic, not commercially relevant topics.
It would be beneficial to provide an argument that controls this behavior, something like `browsingTopics(add_current_topics=true)`. Since eligibility is determined per API caller, there should be no ecosystem concern about "freeloaders" getting other callers' topics without contributing. There also does not seem to be any detrimental effect on user privacy. While the concern mentioned above might be partially mitigated by improved topics ranking and commercially focused taxonomy changes, it seems best to give API callers this flexibility in how they use the API.
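The proposed semantics can be sketched with a toy model. The flag name is adapted from the issue text (`addCurrentTopics`); the per-caller state and everything else here are assumptions for illustration, not the explainer's design.

```javascript
// Toy model of the proposed observe-only flag: a caller can read the topics
// it has observed without contributing the current page's topic.
class TopicsModel {
  constructor() {
    this.observedByCaller = new Map(); // caller -> Set of observed topics
  }
  browsingTopics(caller, currentPageTopic, { addCurrentTopics = true } = {}) {
    const seen = this.observedByCaller.get(caller) ?? new Set();
    const result = [...seen]; // caller only receives topics it has observed
    if (addCurrentTopics) {
      seen.add(currentPageTopic); // contribute the page's topic to future epochs
      this.observedByCaller.set(caller, seen);
    }
    return result;
  }
}

const model = new TopicsModel();
model.browsingTopics('adtech.example', 'Sports'); // observes Sports
// Read-only call on a generic page: does not pollute future topics.
model.browsingTopics('adtech.example', 'News', { addCurrentTopics: false });
console.log(model.browsingTopics('adtech.example', 'Autos')); // ['Sports']
```

The generic "News" page never enters the caller's observed set, which is exactly the contamination the large-publisher scenario above wants to avoid.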
Topics may be useful for retail, travel, and other sites to identify more or less price-sensitive users.
Existing sources of data for dynamic or personalized pricing create a risk for the seller that they are inadvertently selecting members of protected groups for higher prices. However, Topics are intended to be non-sensitive (#4) so could be practical to use for dynamic pricing in cases where other data sources are not.
Personalised Pricing: The Demise of the Fixed Price?, by Joost Poort and Frederik Zuiderveen Borgesius, covers some of the incentives for retailers to adopt personalized pricing systems.
Price discrimination can benefit both buyers and sellers, leading to an increase of both consumer and producer welfare. Price discrimination can help the seller to recoup his fixed costs without losing many potential customers and make a good or service accessible to buyers with a smaller purse, even if it will lead to higher prices for other customers.
(Based on a previous issue from a previous proposal: WICG/floc#105 )
With FLoC, it was possible to measure ad performance because the FLoC ID was the same on the publisher and the advertiser website. It was less efficient and less precise than with cookies, but it was at least possible to do something at the cohort level.
How do topics interact with the different measurement proposals? Indeed, the topics will not be the same across the publisher and advertiser websites.
Did you think about how measurement is supposed to work with topics, especially in the discovery phase (determining which topics are best suited for a given ad)?
If a publisher is not allowed to see all the topics returned, it seems a new entity will arise with the responsibility of reflecting topics back to the publisher, or SSPs will do so as a condition of integration.
There are not a lot of obvious ways to restrict the topics from being shared among callers. One way seems to be to isolate the Topics call inside an iframe. However, that would appear to break the bid request: how would the SSP correlate the topics to it?
Another is to append the topics to a header in the bid request. In either case, the SSP has the opportunity to reflect the information back to the publisher, and the publisher can accumulate the topics and deliver all of them into OpenRTB.
This seems to defeat the purpose of limiting the information available to each caller, as the publisher is able to easily determine far more information about the user than they have now in the world of 3PC.
Finally, the publisher is now exposed to an enormous incremental security and performance risk if all Topics callers must run third-party JS on the page. Publishers typically limit the number of parties that are allowed to do this prior to an auction; SSPs and advertisers are not able to run code unless they win. Typically only a video player, a header-bidding wrapper, and an ad server have the privilege. In the world of Topics, publishers will be incentivized to run as many third parties as possible to try to get every conceivable topic, or to ensure they send out a bid request that includes all possible topics.
Apart from security, it seems this will also contribute to or entrench the existing bid-jamming problem in adtech. If a user is a travel or finance enthusiast, the publisher is encouraged to spam the Topics API with different third-party callers, and consequently the bid stream, until they can be reasonably assured they would have gotten that topic and its high-value bid back.
Hi there, thanks for sharing this awesome proposal. I have one quick follow-up question regarding how to do aggregation using this API.
In the original FLoC proposal, each website would be able to view the cohort id of a user. One benefit that I can think of is that the website can learn which groups of users visited its site most frequently. With this aggregate data, the advertiser can adjust its bidding strategy for users from different groups.
For example, it could bid higher in the auction for users from a group that visited its site most frequently.
In this API, is there any plan for making such aggregation above possible as well?
One option that I can think of is that the advertisers also fetch the topics when the browser visits their own website, and use the topics as an identifier of that user instead of the cohort id.
Why not just allow contextual targeting as it exists today? Once a user exits a webpage, the topic they browsed is no longer relevant, so there is no need to track the user. What added advantage does the Topics API provide?
The Topics API restricts learning about topics to those callers that have observed the user on pages about those topics.
It sounds like if a publisher issues an ad request to an ad-exchange server then DSPs participating in the exchange can only receive topics known to the ad-exchange?
I wonder if `browsingTopics()` can be extended to take an array of "reader" domains and respond with a mapping from each reader domain to the set of topics known to that reader domain, with each reader's set of topics encrypted with a public key published by that reader at a /.well-known/ path?
With this approach could the x-origin iframe also be avoided?
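The filtering half of this proposal (ignoring the encryption step) can be sketched as follows. The per-reader observation data, the function name, and the domains are all hypothetical; the key property is that each reader only receives topics it has itself observed.

```javascript
// Which topics each reader domain has previously observed for this user
// (hypothetical state the browser would maintain per caller).
const observations = {
  'ssp.example': ['Travel', 'Sports'],
  'dsp-a.example': ['Travel'],
  'dsp-b.example': [],
};
const userTopics = ['Travel', 'Sports', 'Cameras'];

// Sketch of the proposed extension: map each reader domain to the subset of
// the user's topics that reader is eligible to learn. In the full proposal,
// each subset would then be encrypted with that reader's published public key.
function topicsForReaders(topics, readers) {
  const out = {};
  for (const reader of readers) {
    const seen = new Set(observations[reader] ?? []);
    out[reader] = topics.filter((t) => seen.has(t));
  }
  return out;
}

const perReader = topicsForReaders(userTopics, ['ssp.example', 'dsp-a.example', 'dsp-b.example']);
console.log(perReader); // dsp-a.example only gets 'Travel'; dsp-b.example gets []
```

Because the browser does the per-reader partitioning itself, the ad exchange never needs to see (or forward) topics it did not observe, which is what would make the cross-origin iframe unnecessary.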
It seems quite unfair to limit the list of topics returned to a caller based on that caller's presence on sites mapped to those same topics. This will create a large entry barrier for smaller actors, who will have a hard time accessing less common topics, versus large actors, such as Google, who already benefit from a very large footprint and won't be limited at all.
It also seems weird to defend that mechanism on the base of not providing more data than what cookies would provide, when same principle is not enforced on other proposals. For example, conversion measurement API will provide cross device functionality, which wasn't possible using third party cookies. One could argue that this is more privacy invasive as well.
Third-party cookies shouldn't be used as the benchmark for privacy. Rather, we should consider whether such a feature follows users' privacy expectations, or regulatory principles such as the GDPR's.
It seems more confusing for a user to know which topics they belong to, but not which callers can access them, versus simply providing all callers with the same level of access.
Right now it appears anyone who can drop an iframe on a page is able to become a caller on my domain. This would include anyone who currently drops user syncs or potentially even advertisers.
As a publisher, will we be able to limit domains that are allowed to be callers and get access to browsingTopics on our site?
Suppose there are P third parties {p1, p2, p3, …} who each have their code available to call the Topics API on many of the sites users visit. And suppose they either share information, or are even the same higher-level party, i.e. ad-tech company A has servers p1, p2, p3, ... all calling the Topics API on each of these sites. Each p calling the API on a given site sees a random 3 of the 5 top topics, with a 5% chance of a random topic. With enough simultaneous calls from p1, p2, p3, …, they can learn the top 5 topics for that user by what is probabilistically returned.
For each site they might then create a pseudo-identifier that concatenates the top topics. E.g. If they learn the top topics for the user returned by that site are t1,t2,t3,t4,t5, then they might construct a string "t1-t2-t3-t4-t5".
Assuming these third parties are well distributed across the sites with the various topics a user visits, they might have full access to the user's topics. And they might then be able to gather a consistent pseudo-identifier for a user across the sites they visit ("t1-t2-t3-t4-t5").
Such pseudo-identifiers will likely be shared across many users who share the same interests, in any given week. Yet with 350 topics, there are 350**5 top topic combinations. Could some be unique?
Even if this kind of unicity is rare, the top topics change each week/epoch, and after a sufficient number of weeks, the sequence of pseudo-identifiers collected across these weeks might uniquely identify users.
i.e. this could allow the kind of cross-site tracking that removing third-party cookies is meant to do away with.
An assumption here is that the separate calls by p1,p2,p3,... for when a user visits a domain can be connected by the caller. This might be done by combining the timing of the calls with fingerprinting data.
This is a bit convoluted and makes a lot of assumptions. Are any of the assumptions and privacy concerns valid?
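The frequency-pooling part of the attack is easy to simulate. The sketch below uses the numbers from the issue (random 3 of the top 5, 5% noise, a 350-topic taxonomy) and a tiny seeded PRNG for reproducibility; the call count and noise model are assumptions, not the spec's exact mechanism.

```javascript
// Simulation: colluding callers pool many responses and recover the top 5
// by frequency, then count how many distinct top-5 pseudo-identifiers exist.
const TAXONOMY = 350;
const trueTop5 = [0, 1, 2, 3, 4];

// Tiny seeded LCG so the simulation is reproducible.
let seed = 42;
function rand() {
  seed = (seed * 1103515245 + 12345) % 2147483648;
  return seed / 2147483648;
}

function oneCall() {
  // Pick a random 3 of the user's top 5 topics.
  const pool = [...trueTop5];
  const picked = [];
  for (let i = 0; i < 3; i++) {
    picked.push(pool.splice(Math.floor(rand() * pool.length), 1)[0]);
  }
  // With 5% probability, one slot is replaced by a uniformly random topic.
  if (rand() < 0.05) picked[0] = Math.floor(rand() * TAXONOMY);
  return picked;
}

// Pool 200 calls across the colluding parties and count topic frequencies.
const counts = new Map();
for (let i = 0; i < 200; i++) {
  for (const t of oneCall()) counts.set(t, (counts.get(t) ?? 0) + 1);
}
const recovered = [...counts.entries()]
  .sort((a, b) => b[1] - a[1])
  .slice(0, 5)
  .map(([t]) => t)
  .sort((a, b) => a - b);

console.log(recovered); // the true top 5 dominate the pooled counts
console.log(`distinct pseudo-identifiers: ${TAXONOMY ** 5}`); // 350^5 ≈ 5.25e12
```

Each true topic is returned in roughly 60% of calls while noise topics appear only a handful of times in total, so the pooled counts separate cleanly; and 350^5 ≈ 5.25 trillion possible "t1-t2-t3-t4-t5" strings illustrates why a sequence of such pseudo-identifiers across epochs could become identifying.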
Assuming this should go into user.data in OpenRTB and conform to the SDA standard, we need a segtax pull request on the OpenRTB repo similar to https://github.com/InteractiveAdvertisingBureau/openrtb/pull/81/files
Not every API caller will receive a topic. Only callers that observed the user visit a site about the topic in question within the past three weeks can receive the topic. If the caller (specifically the site of the calling context) did not call the API in the past for that user on a site about that topic, then the topic will not be included in the array returned by the API.
While I appreciate the spirit of what you're trying to achieve here, I think in practice this restriction won't amount to much, other than making the proposal more difficult to read and, for browsers, to implement. Here's why I think that:
Again, I appreciate the spirit of wanting to limit what could be known about a given site/user/Topic to what is readily observable by an API caller, but the realities of the data flows and systems in the ecosystem mean that the restriction doesn't hold up well. Thus I am suggesting this piece of the proposal be revisited.
Bigger picture, companies landing on sites/apps via the ads themselves are one of the more complicating factors of privacy and data protection. Restrictions here probably are better delivered via things like Fenced Frames.
If the user is known to be the same across colluding sites (e.g., because they’re logged into each with a persistent identifier), then it is possible for those sites to join their topics for the user together. This could also be achieved via adding topics to URLs when navigating between cooperating sites.
This analysis needs to be clarified.
Is the idea that the two sites already have the same identity for the user, e.g. the same registered email address? If so, they can join topics on the backend.
However, if this analysis is just pointing out that two sites can collude to join topics, then the following should be reflected in the text:
It’s possible that you cover the above elsewhere and really intend to talk about the shared PII case here. If so, that needs to be clarified.
As far as I can envision, there are three high-level sets of data (corpus) for a given site that could be used by the classifier. In all cases, the output of the classifier might produce multiple strong signals about what the content of the site is (and "strong" will also need to be defined)
While the first and second options might be appealing methods as they are simple, they probably will give a very inaccurate view of the content of many sites. I think that the third would give the most accurate view of what a site is actually about.
As a browser user, I might choose to install an extension that will
It looks like the extension API should prevent extensions from seeing the user's topics or deducing any information about them, to limit incentives to submit malicious extensions.
Related: #78
Should there be a process to alter the assignment? What should the process be?
An alternative is to allow for sites to set their own topics via response header, as in #1.
Hi folks!
I'm the invited expert of PING (Privacy Interest Group) of W3C.
I've heard that this Topics proposal is the replacement for FLoC.
Is it ready for privacy review now, or is it too early?
Thanks!
One major drawback of the FLoC Origin Trial was the limited volume observed, to the point that drawing any meaningful conclusion was a challenge. What will be the expected volume of the future Topics API OT, and particularly:
This topic came up in the Web-Adv W3C group and I do not see an issue addressing it here. What is the performance impact of this system in terms of delaying the time to first ad call, if the ad system is dependent on calling this API?
I'm concerned about allowing the browser JavaScript context direct access to the topics. There would be no way to ensure that the topics are delivered to the parties the origin really intends, and they could be modified to disrupt behavior. For instance, if I have a CSP that allows a CDN, that CDN could scrape the topics without the origin knowing about it. It might make sense to create a blob that is opaque to the JavaScript context and can be decrypted by an origin server given the key in a restricted header. Example below:
1. The page calls `await document.browsingTopics();` and retrieves an opaque encrypted blob that is signed for the origin server only.
2. The page calls `fetch('https://anotherorigin.example.com', { includingTopics: true, body: JSON.stringify({ topics }) })`; because the blob is signed for the origin only, the browser could reject that `fetch` call outright.
3. The page calls `fetch('https://origin.example.com', { includingTopics: true, body: JSON.stringify({ topics }) })`; the browser attaches a `Topics-Key` header to the request, and its value can be used by the receiving server to decrypt the topics.
The Topics API involves ranking the "top topics" for a user's browsing activity in one epoch. How would the API aggregate and rank a user's browsing activity to find the top 5 topics described in the explainer?
Currently the explainer says https://github.com/jkarlin/topics#specific-details: "at the end of an epoch, the browser calculates the list of eligible pages visited by the user in the previous week" and "the topics are accumulated".
More specifically, can you clarify:
If a user visits the same site multiple times, does that increase the ranking for the site's topics? If so,
Will the classifier output be one-hot (unit-weight) or some fractional weight? Having weights may make the classification seem less arbitrary.
Will the classifier output take the taxonomy hierarchy into account? For example, if the site is classified as `/Computers & Electronics/Consumer Electronics/Cameras & Camcorders`, we can also include `/Computers & Electronics/Consumer Electronics` and `/Computers & Electronics`. This should also apply to per-caller eligibility.
How will the weights (unit or otherwise) be aggregated?
If `/Computers & Electronics/Consumer Electronics/Cameras & Camcorders` is in the top 5, then it implies the more general `/Computers & Electronics`. Having a more diverse top 5 should improve the long-term utility.
One concern with the Topics API is that users' top topics will be dominated by broad, signal-poor topics. Choosing the taxonomy and weighting algorithm thoughtfully may help.
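One possible answer to the questions above can be sketched in code: fractional classifier weights, propagated up the taxonomy hierarchy with a decay factor, then summed across the week's page visits. The decay value and the whole scheme are assumptions for discussion, not the explainer's algorithm.

```javascript
// '/A/B/C' -> ['/A/B/C', '/A/B', '/A']
function ancestors(topic) {
  const parts = topic.split('/').filter(Boolean);
  return parts.map((_, i) => '/' + parts.slice(0, parts.length - i).join('/'));
}

// Sum hierarchy-aware weights across visits; weight decays as it propagates
// up from the leaf classification toward the taxonomy root.
function aggregate(visits, decay = 0.5) {
  const scores = new Map();
  for (const { topic, weight } of visits) {
    ancestors(topic).forEach((node, depth) => {
      const w = weight * decay ** depth; // full weight at the leaf, decayed above
      scores.set(node, (scores.get(node) ?? 0) + w);
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const ranked = aggregate([
  { topic: '/Computers & Electronics/Consumer Electronics/Cameras & Camcorders', weight: 1 },
  { topic: '/Computers & Electronics/Consumer Electronics', weight: 0.8 },
]);
console.log(ranked[0][0]); // '/Computers & Electronics/Consumer Electronics'
```

Note how the mid-level category wins (0.8 direct + 0.5 propagated from the camera page), so repeated visits within a subtree naturally lift the more general topic into the top 5 without double-counting the leaf.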
The explainer sets as first privacy goal (https://github.com/jkarlin/topics#privacy-goals):
It must be difficult to reidentify significant numbers of users across sites using just the API.
What are the values (orders of magnitude) that Google would put behind "difficult" and "significant numbers of users"?
Since a user can have up to 15 topics computed by the browser, and only 3 are returned on a given site, doesn't that mean that advertisers interested in a specific topic will see their reach divided by 5 compared to today's cookie-based approach?
For example, a user has the "sport" topic. As per current design, only 1 site out of 5 (on average) will access the "sport" topic from the user.
If an advertiser targets the sport topic, it means it can target that user only on that 1 site that got access to the sport topic.
Isn't it a risk of severely impacting the advertiser's reach?
The classifier is likely to be wrong from time to time, and sites might wish to adjust the topics returned for their site. One way to accomplish that is to allow sites to set their own topics via response headers.
The concern with this is if sites decide that some topics are more valuable than others, and decide to only list valuable topics, polluting the input to the API. How real is this risk?
We propose choosing topics of interest based only on website hostnames, rather than additional information like the full URL or contents of visited websites.
This is a difficult trade-off: topics based on more specific browsing activity might be more useful in picking relevant ads, but also might unintentionally pick up data with heightened privacy expectations.
Let's assume subdomains in this context are separate hostnames and we're using the terms interchangeably.
If publishers want to serve more relevant advertising, they're incentivized to send narrower topics. And if they are rewarded for sending narrower topics, then they're compelled to granularize their site into as many subdomains as possible without suffering traffic losses.
Making subdomain-heavy site architecture monetarily advantageous for publishers raises some concerns. @gui-poa voiced this in another issue. A few of mine:
Most mainstream CMSs sit on a single subdomain and allow for content creation/organization via directories only. "Multisite" options exist, but are typically an enterprise solution. In this way, large publishers who can engineer their own hostname-first CMS and afford devOps to manage multiple name systems may be afforded an unfair advantage over small/independent publishers
Subdomains have their taxonomical role in isolating use-cases (support.example.com) and localization (es.example.com, de.example.com)… but in my experience, using them for subcategorization (i.e. arts.example.com) complicates breadcrumbs, wrecks sitemaps, dirties or breaks analytics, and causes cross-origin problems
The SEO debate around subdomains vs subfolders is tired and isn't worth rehashing here (just search "subdomains SEO")
Nightmare scenario: Publisher sites link internally to pages hosted on granularly categorized subdomains to appease the Topics API, all of which are cross-canonicalized to a single subdirectory-first site that the publisher feels is better optimized for search engines. Users only ever see this subdirectory site as a landing page... it exists for bots.
What alternatives are there to the proposed "one topic per hostname" rule? Can one respect heightened privacy expectations (avoiding hyper-targeted, mappable Topic API sends) without nudging publishers toward subdomains?
Hi,
I can read "The Topics API will have a user opt-out mechanism". I would strongly advise to go with opt in instead of opt out to go together with the stated privacy goals.
Just a note that opt out is very much not compatible with the GDPR:
Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject's agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject's acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent.
https://eur-lex.europa.eu/eli/reg/2016/679/oj
FLoC was opt-out (and used the ad-blocking EasyList to track people for ads...), so it couldn't be enabled in Europe.
Using your example where the top five topics for the first week are:
| Top Topic  | Parties That Can Learn About the Topic |
| ---------- | -------------------------------------- |
| Apples     | T, R, S                                |
| Bananas    | S                                      |
| Cantaloupe | T, S                                   |
| Emblica    | S                                      |
| Grapes     | T, R, S                                |
and assuming that the user browses primarily the same (types of) sites in subsequent weeks, so that the 5 topics (and the parties that can learn about them) are identical for weeks 2 and 3, is the following correct (ignoring the 5% random topic)?
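The example table can be encoded directly to check what each party is eligible to learn. This is a sketch of the per-caller filtering rule only (a party can only receive topics it appears next to in the table), ignoring the one-topic-per-epoch sampling and the 5% noise.

```javascript
// The week-1 example: topic -> parties that observed the user on a site
// about that topic.
const observers = {
  Apples: ['T', 'R', 'S'],
  Bananas: ['S'],
  Cantaloupe: ['T', 'S'],
  Emblica: ['S'],
  Grapes: ['T', 'R', 'S'],
};

// A party can only ever be shown topics it has itself observed.
function learnableBy(party) {
  return Object.keys(observers).filter((topic) => observers[topic].includes(party));
}

console.log(learnableBy('R')); // ['Apples', 'Grapes']
console.log(learnableBy('S')); // all five topics
```

If the table really is identical for weeks 2 and 3, these per-party sets don't change either; only which single topic from each set gets sampled per epoch varies.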
The topics will be inferred by the browser. The browser will leverage a classifier model to map site hostnames to topics. The classifier weights will be public, perhaps built by an external partner, and will improve over time.
As others have already pointed out, this poses a challenge for sites that may not have descriptive hostnames, or span a wide array of topics under the same hostname (e.g. a publisher covering sports, business, entertainment, etc), a merchant with a large catalog of items (home goods, clothing, etc), and so on. Given that the current proposal considers hostname, not just domain name, this might create pressure for sites to adopt more subdomains to help with classification (e.g. sports.pub.com, homeware.shop.com, ...), but that's a costly undertaking with its own side effects.
Separately, there are open questions on misclassification (#2) and the ability to set (#1) topics.
My hunch is they're all semi-related and we could, perhaps, try to address them by enabling sites to "seed" a set of suggested topics. Going down this route would effectively translate the current proposal into a weakly-supervised classifier model: it doesn't make strict guarantees about the outcome of the classification but allows the site to influence and provide input signals.
More concretely, the rough model here could be...
By restricting suggested topics to the predefined list, we're not introducing any new labels/segments, etc. At the same time, enabling sites to provide page-level scoped topics would, I think, address the challenge for multi-topic sites. For example, a publisher or merchant could advertise relevant topics for each section of their site (which paths and pages get which topics is controlled by the site owner). Downstream, the browser can introspect the page-level browsing history of the visitor, build an aggregate count of observed topics, apply its own filters/validation, and feed the resulting set as input into the classifier model.
As noted above, this makes no strict guarantees about the final output of the classification, but it enables the site to make suggestions, the browser to audit/filter those suggestions, and the classifier to act on them. The net result is that a reader who spends most of their time on the sports section of pub.com, or a buyer in the homewares section of a large merchant, might then receive a relevant classification for the {site, user} tuple.
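The seeding pipeline described above can be sketched as follows. The declaration mechanism, the field names, and the taxonomy entries are all hypothetical; the point is the flow: pages suggest topics from the predefined list, the browser tallies what the visitor actually saw and rejects anything outside the taxonomy, and the counts become classifier input.

```javascript
// Hypothetical subset of the predefined taxonomy the browser accepts.
const TAXONOMY_SET = new Set(['/Sports', '/Business', '/Home & Garden']);

// Aggregate the visitor's page-level suggested topics into classifier input.
function seedSignals(visitedPages) {
  const counts = new Map();
  for (const page of visitedPages) {
    for (const t of page.suggestedTopics) {
      if (!TAXONOMY_SET.has(t)) continue; // browser rejects made-up labels
      counts.set(t, (counts.get(t) ?? 0) + 1);
    }
  }
  return counts; // an input signal for the classifier, not a guaranteed output
}

const signals = seedSignals([
  { url: 'https://pub.example/sports/a', suggestedTopics: ['/Sports'] },
  { url: 'https://pub.example/sports/b', suggestedTopics: ['/Sports'] },
  { url: 'https://pub.example/biz/c', suggestedTopics: ['/Business', '/NotARealTopic'] },
]);
console.log([...signals.entries()]); // [['/Sports', 2], ['/Business', 1]]
```

The out-of-taxonomy label is dropped silently, which is what keeps the scheme weakly supervised: sites can influence the signal but cannot mint new segments.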
The taxonomy_v1 file proposes interests that seem to work just fine for product sales. Do you have any thoughts on how Topics may work with job advertising? The browsing pattern that emerges when a user is in the middle of a career/job change is not based on a categorization of interests alone.
If the user opts out of the Topics API, or is in incognito mode, or the user has cleared all of their history, the list of topics returned will be empty...
Seeing how sites now try to detect incognito mode and/or ad blockers and give users of those modes a substandard experience, consider making empty responses more normal by returning them in perhaps 0.5%–1% of cases. That might be high enough to discourage the provision of substandard experiences.
It's great that this proposal is incorporating public feedback.
It would be even better if this proposal was published with a set of tools and datasets for external researchers and the web community to better evaluate the proposal with empirical tests.
For example, a colleague and I recently did a post-mortem analysis of FLoC:
https://arxiv.org/pdf/2201.13402.pdf
Our analysis required us to re-implement FLoC and to leverage a proprietary dataset of browsing histories. These hurdles make analysis by and for the public inaccessible to many external researchers and community members.
It would be helpful for the Chrome developers to publish tools, example datasets, and code so that their proposals can be more easily interrogated by researchers. Will such be made available?
Assuming that an aggregate conversion reporting mechanism is also supported by the browser, when a user converts on an advertiser's site, for each ad viewed and/or clicked, the aggregation report should include the:
This will allow the advertiser to understand the value of each topic or at least topics that are common for people who later convert on their site.
It sounds to me like this will make it easier to track households, and through that: individuals.
I'll give an example:
Companies A, B, and C have access to the topics API on different sites (along with passive "PII" like IP address, browser version, etc.). They then send this data to the open auction (RTB stream).
At this point, companies M and N bidding on the auction can aggregate the topics from A, B, and C. They also group it with other topics from the user's home IP address (for example, from another device, roommate's device, etc.). Side note: all this info together should make it relatively "easy" to keep tracking the household through daily IP changes: multiple devices + browsers + top topics.
Now, another party, Company X wants to know whether they should bid on an ad placement from a specific IP. They ask companies M and N what that IP's top topics are, for a price. The result is that they get a list of the top 10+ topics of all users at that IP. For a larger price they can filter by device info.
I'll make that last point clearer: any advertiser/third-party can have a pretty good understanding of any residential address' topics ("browsing history"). Not just the top 5.
Also, a household's aggregate data may be "unique enough" compared to other households that even a user in the household who opts out of tracking can be classified/targeted based on passive data.
A domain should have to be viewed by some minimum number of unique users in order to be used for classification, particularly if the associated Topic is very low volume. For example:
if a user visits bluesmusic.com, but that domain only has 100 unique visitors in a week, that domain should not be used to classify into the Blues Music topic.
Very low usage domains may not be strong enough signals to the classifier to accurately represent a topic, and therefore should be disqualified.
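The proposed qualification rule is simple to express. The 100-visitor figure above is the issue's failing example; the threshold used here is an assumption, since the issue does not state what the minimum should be.

```javascript
// A domain only contributes to classification if it saw at least
// `minWeeklyVisitors` unique users that week (threshold is illustrative).
function qualifyingDomains(domainStats, minWeeklyVisitors = 1000) {
  return domainStats
    .filter((d) => d.weeklyUniqueVisitors >= minWeeklyVisitors)
    .map((d) => d.domain);
}

const qualified = qualifyingDomains([
  { domain: 'bluesmusic.com', weeklyUniqueVisitors: 100 }, // too few: excluded
  { domain: 'bignews.example', weeklyUniqueVisitors: 50000 },
]);
console.log(qualified); // ['bignews.example']
```

A refinement suggested by the issue would make the threshold depend on the topic's overall volume, so that low-volume topics require a larger evidence base.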
Allow callers to specify a section name that the classifier can use to develop a topics list, to improve personalization for users of large, multi-topic sites. Callers could populate the section name in Topics API calls using the existing schema.org articleSection property already in use.
If the topic list is per-hostname, a user of a large general-interest site may receive inadequate personalization compared to a user of multiple niche sites with only a few topics per site.
A section can be any subdivision of a site, including a "channel," "group," or "space."
This is separate from the question of allowing publishers to specify individual topics. The publisher-provided "section" is just an identifier applied to a subset of pages on that site, and the actual topics for pages in that section would still have to be determined by the classifier.
The random response for an epoch is drawn uniformly at random from the full taxonomy (though we should probably remove the 5 topics for the epoch). Should that random response have been seen before by the caller in order to be returned?
If yes: Then the plausible deniability is limited to whether or not this was a top topic for the caller.
If no: Then the plausible deniability is increased, as the caller cannot know with 100% certainty that they actually observed the topic for the user. On the other hand, utility drops somewhat, since there is more noise in the system.
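The two variants can be sketched as follows. The 5% noise rate and uniform draw follow the explainer; `seenByCaller`, `restrictToSeen`, and the injectable `rand` parameter are hypothetical names for this sketch only.

```javascript
const NOISE_RATE = 0.05; // per the explainer: 5% chance of a random topic

function topicForCaller(topTopic, taxonomy, seenByCaller, restrictToSeen, rand = Math.random) {
  if (rand() >= NOISE_RATE) return topTopic; // 95%: the real top topic
  // 5%: a uniform random topic from the taxonomy. The "yes" variant above
  // restricts the pool to topics the caller has already observed, which
  // weakens plausible deniability.
  const pool = restrictToSeen
    ? taxonomy.filter((t) => seenByCaller.has(t))
    : taxonomy;
  return pool[Math.floor(rand() * pool.length)];
}
```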
This would reduce the need for expensive (and slow) x-origin iframes to be created.
As documented, the Topics API has no particular minimums set about how many page visits/unique topic visits a user needs to have before they are classified. To prevent users from being classified into Topics with too little data, minimums should be put in place. For example:
In a week, if the user has not had at least 50 page visits, they cannot be classified into any topics.
In a week, if the user has not had at least 10 page visits to sites associated with a particular topic, they should not be classified into that topic.
Those might not be exactly right, but some minimum usage component should be built into the system.
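The two minimums above can be sketched as one eligibility function. The thresholds (50 total visits, 10 per topic) are the illustrative numbers from this comment, and the `visitsByTopic` input shape is a hypothetical representation.

```javascript
const MIN_WEEKLY_VISITS = 50;     // illustrative floor on total weekly visits
const MIN_VISITS_PER_TOPIC = 10;  // illustrative floor per topic

// visitsByTopic: Map of topic name → page visits this week
function eligibleTopics(visitsByTopic) {
  const total = [...visitsByTopic.values()].reduce((a, b) => a + b, 0);
  if (total < MIN_WEEKLY_VISITS) return []; // too little data overall
  return [...visitsByTopic.entries()]
    .filter(([, visits]) => visits >= MIN_VISITS_PER_TOPIC)
    .map(([topic]) => topic);
}
```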
The explainer says that the Topics API is meant to support interest-based advertising that displays more relevant ads, "helping to fund the sites that the user visits."
Compared to the same kind of advertising with third-party cookies, what is the goal in terms of site funding levels?
As a component of the testing of the Topics system, some reconciliation should be done to ensure that each topic is of material size to be useful AND that there are enough domains/usage to qualify a large enough group of users. This would be above and beyond any privacy requirements.
For example, there appear to be very few domains that could cleanly map to the "Blues Music" topic. If origin trials and other testing prove that fewer than 50,000 users might be qualified into that segment, it should be removed from the taxonomy.
50,000 users might be enough to guarantee privacy, but is likely not enough to be useful from an advertising perspective. Since the number of segments is going to be low (in the hundreds, in theory), each one should be of maximal benefit from an advertising perspective, and very small segments will be of limited value overall. So in the example, Blues Music would be removed and replaced with another topic that might achieve higher scale.
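The proposed reconciliation step can be sketched as a pruning pass over the taxonomy. The 50,000-user floor is the figure from this comment; `qualifiedUsers` is a hypothetical input holding per-topic qualification estimates from testing.

```javascript
const MIN_SEGMENT_USERS = 50000; // illustrative floor from the comment above

// qualifiedUsers: Map of topic → estimated users qualified during testing
function pruneTaxonomy(taxonomy, qualifiedUsers) {
  // Keep only topics that reached a commercially useful segment size.
  return taxonomy.filter(
    (topic) => (qualifiedUsers.get(topic) || 0) >= MIN_SEGMENT_USERS
  );
}

// "Blues Music" at 20,000 qualified users would be dropped; "News" kept.
```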
The explainer says:
Whatever topic is returned, will continue to be returned for any caller on that site for the remainder of the three weeks.
Could you please tell me whether the following example is correct:
Hey,
I think this spec is quite a big step in the right direction regarding privacy, but there are a few items I'd like clarification on, specifically around the economics created by such a spec.
Today it's possible to receive a variety of topics, as well as to buy using both third-party and first-party data. In the near future all of this will go away, replaced by the Privacy Sandbox, at which point a lot of demand will be concentrated in specs like this one. Having just hundreds of categories means that all buyers will have a limited set of topics to buy from. That wouldn't be too bad if each buyer could see a different set of categories, as the spec implies is possible. However, since buying is still done through exchanges or SSPs, in practice all buyers will see the exchange's view of the topics, and everyone on the same exchange will receive the same 3 topics.
At the scale we're all operating at, hundreds of possible categories and everyone receiving the same 3 topics could end up causing significant price inflation, which typically favors bigger advertisers with bigger budgets capable of sustaining higher CPMs.
Second, and relatedly: a buy-side adtech should be able to see topics while the browser is visiting the advertiser's site, and could use this data to determine which topics are correlated with certain actions. However, this set of topics will not match what is available from the exchange/SSP, because of the different install base. It's possible, for example, that a big exchange like Google's will see a lot of generic news or shopping sites that capture the top sites for a given browser, while a small adtech vendor will instead see a lot of nuance in their topics. This nuance will be erased by differences in footprint, further compressing demand toward the most generic topics and increasing their price, not just by reducing the number of topics but also by reducing their variability.
I don't know how someone would build a solution that would work for you as well.