
documents' People

Contributors: cunchem, nbielova, vincent-grenoble


documents' Issues

Background limitation and impact?

TraceTogether/BlueTrace highlight in their whitepaper the major limitation that BLE cannot run in the background on iPhones (when the screen is off), requiring users to keep their phone's screen on at all times (see excerpt below). It seems that these background restrictions have not been considered in your design, since you focus on privacy. Nonetheless:

  1. Did you evaluate the impact of this limitation on contact discovery, and hence on contact tracing?
  2. How do you plan to work around it, other than having to use the Apple-Google framework?

[Excerpt from the BlueTrace whitepaper on iOS background restrictions]

I've highlighted this limitation in my post here: https://medium.com/@legfranck/the-good-and-the-bad-of-apple-googles-privacy-preserving-contact-tracing-744806450be9

No at-least-once guarantee of delivery of ESR_REPLY_{A,i} message

Users can no longer submit ESR once they have been notified they are at risk (§ 7 server processing step 6).

Yet there is no user acknowledgement of the reception of an ESR_REPLY_{A,i} set to "1" (i.e. confirming an at-risk status); see §7, "Server Processing", "If the ESR_REQUEST_{A,i} is valid".

Should the end user fail to receive this message (because of a network failure, a compromised device, or interception of the message), they would get no information for this ESR, and all further ESRs from them would fail silently.

Inject false "at risk" status

If a malicious user A who hates B wants B to get an "at risk" status at B's next check, A can capture a Hello_B from B, find someone who got infected (C), and bribe C into inserting Hello_B into C's LocalProximityList before C reports.

TPM?

K_A could be stolen by malware or through coercion attacks.
If K_A is stolen, A can be impersonated both in HELLO_A messages and in communication with the server (status checks, retrieval of new EBIDs).
A TPM could help protect K_A.

Bruno Sportisse Paper

Instead of patting yourselves on the back about:

  • INRIA: the inventor of digital technology, no less!
  • The French exception... the others are not as rigorous as we are on digital law!
  • The exceptional work of its researchers, even though I have found no paper that steps back and synthesizes the two approaches, ROBERT and DP-3T
  • A so-called hyper-reactive task force, even though to this day there is not the beginning of a line of Python code on GitHub
  • Ignoring how far DP-3T's work has progressed, when it already offers working implementations. I have an Android phone exchanging with an iPhone without any problem.
  • The virtues of open source... come on, DIY, pull your fingers out and contribute, collaborate with DP-3T since they are the furthest along.

The most remarkable thing is to insist on consent and on the non-mandatory nature of this app. Because if, in a few months, some country finds that contact tracing helps bring down the Rt, then our dear politicians and INRIA directors will backtrack and declare use of the application mandatory, for public-health reasons.

DP-3T's contribution is surely not perfect, but it has the merit of existing and of not making glowing public statements, whereas for ROBERT it is quite the opposite: we want CODE if you want independent developers to help you.

More proximity for more efficiency

Since a trusted server is accepted in this project, why not let each user (A) "draw" a circle (of 2 m radius) around themselves and monitor each entry into and exit from this circle by other users (X)? If a user stays more than 60 s inside the circle, record that user's temporary ID.

If A moves to another area without anyone entering their circle, that position is not recorded.

This generates more traffic between users and the trusted server, but it is more efficient because only real contacts are recorded.

With Bluetooth there are many false contacts, because there is no reliable way to assess distances (or am I wrong?).
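A minimal sketch of this idea, assuming the trusted server could receive periodic, time-aligned position samples per user (which ROBERT deliberately does not collect); all names and thresholds below are illustrative, not part of the protocol:

```python
# Illustrative only: the "circle" idea above, assuming the server receives
# periodic, time-aligned position samples (t, x, y) for every user -- data
# that ROBERT deliberately never collects. Names and thresholds are made up.
import math
from collections import defaultdict

RADIUS_M = 2.0       # radius of A's circle
MIN_DWELL_S = 60.0   # minimum stay before a contact is recorded

def contacts_of(a_samples, others_samples):
    """a_samples: list of (t, x, y) for user A, sorted by time.
    others_samples: dict user_id -> list of (t, x, y), sampled at the same times."""
    dwell = defaultdict(float)  # seconds each user has spent inside A's circle
    recorded = set()
    for i in range(1, len(a_samples)):
        t_prev, ax, ay = a_samples[i - 1]
        dt = a_samples[i][0] - t_prev
        for uid, samples in others_samples.items():
            _, ox, oy = samples[i - 1]
            inside = math.hypot(ax - ox, ay - oy) <= RADIUS_M
            dwell[uid] = dwell[uid] + dt if inside else 0.0
            if dwell[uid] >= MIN_DWELL_S:
                recorded.add(uid)  # record this user's temporary ID
    return recorded
```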

Author names are missing

The responsible authors should be named, as has been common practice in the scientific community for centuries.

Clarify 'securely erase'

The server operations in the infected-user declaration include a "securely erase" step:

  1. securely erases (HELLO_A, Time_A)
    Could you detail the procedure for this secure erasure and its benefits for the above record? (A generic sketch of what this usually means follows below.)
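For context, a generic sketch of what "securely erase" usually means in practice (overwrite a mutable buffer before releasing it), assuming the record lives in memory the process controls; a real back-end would also have to consider logs, swap, and database replicas, which is exactly why the question matters:

```python
import secrets

def secure_erase(buf: bytearray) -> None:
    """Overwrite a sensitive in-memory record in place before releasing it.
    This is only a sketch: copies in logs, swap or replicas are not covered."""
    buf[:] = secrets.token_bytes(len(buf))  # overwrite with random bytes
    buf[:] = bytes(len(buf))                # then zero the buffer
```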

Risk comparison with DP-3T

It would be nice to provide an evaluation of DP-3T.
Their threat model is not based on honest-but-curious entities operating the system; instead, the EphIDs of all infected users are sent to all users, which may or may not be better.

At first sight ROBERT seems bad for privacy, because the contact graph can be discovered by the server operator, but it may still be better than sharing the infected users' anonymous cryptographic IDs?

Is there any serious evaluation of the de-anonymization or infection-status disclosure risks of DP-3T?

Misleading: 'B and D do not get exposed' in Specification

In Figure 2, it is clearly illustrated that user B is exposed, but the corresponding text states that B and D do not get exposed.

Figure 2 illustrates the effect (and benefit) of proximity tracing. C is infected and diagnosed COVID-positive. User A gets notified and become "at risk" users. A gets tested and is diagnosed COVID-positive. B is in turn notified, becomes "at risk" and is confined before meeting D. Consequently, B and D do not get exposed, and therefore infected, anymore.

What scenarios could be used to assess risk scores centrally, but not decentrally?

From the technical spec:

Although ROBERT is proposed as a ”proximity-tracing” protocol, ROBERT is actually a framework to assess the risk exposure of its users in order to fight pandemics. In our proposal, and as opposed to decentralized schemes, users do not get any information about the status of their contacts. In particular, they do not learn how many of their contacts are infected, nor which of them are. Instead, users get informed about their exposure level only upon the computation of a risk score by the server. The risk score may be based on proximity information, but also on other parameters that epidemiologists will define and adapt according to the evolution of the pandemic.

I would like to know which risk-score assessment methodologies would rely on a central operator. Most risk assessments that rely on proximity, locality, or exposure through your job (e.g. in a hospital) can be processed far more efficiently on the device itself. So which scenarios did you have in mind for this additional centralized risk assessment?
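For discussion, a minimal sketch of the kind of server-side scoring the specification alludes to, assuming exposure records carrying a duration and a crude RSSI-based proximity estimate; the fields, weights and threshold are placeholders, not ROBERT's actual algorithm:

```python
# Placeholder scoring only: the specification leaves the actual algorithm to
# epidemiologists, so the fields, weights and threshold below are invented.
from dataclasses import dataclass

@dataclass
class Exposure:
    duration_min: float    # contact duration in minutes
    mean_rssi_dbm: float   # signal strength, a crude proximity proxy

def risk_score(exposures: list[Exposure]) -> float:
    score = 0.0
    for e in exposures:
        proximity = max(0.0, (e.mean_rssi_dbm + 90.0) / 30.0)  # ~0 far, ~1 very close
        score += e.duration_min * proximity
    return score

def at_risk(exposures: list[Exposure], threshold: float = 15.0) -> bool:
    return risk_score(exposures) >= threshold
```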

details of the upload authorization procedure

hi

Thank you for all this good work. May I suggest adding details on the protocol used to authorize a user to upload their data upon being confirmed positive? Three types of privacy threats come to mind that are related to this part of the protocol.

On the one hand, it will be very important to prevent people from falsely reporting that they have become positive and thus causing anxiety, embarrassment and other inconvenience to possibly a large part of the user base. A trusted authority (some sort of 'notary' actor) will be needed, along with accountability and oversight to ensure this authority is not abused.

It is also possible that this verification step could create a link between data records that were meant to stay separate. The 'notary' (the person who authorizes the user to release the information after verifying the test results) will have to know the identity of the person making the report, and will probably need to record the time and location of the authorization (for accountability purposes). There is also a technical need to tell the user's mobile device to release the contact-tracing information; this is likely to be based on an authorization code generated for the ID of the user, hence creating another link.

This mechanism will also be a possible vector for attacking the mobile app.

thanks - Arnaud

Difference to Singapore's BlueTrace/TraceTogether

How does the described protocol differ from the BlueTrace/TraceTogether Whitepaper ( https://bluetrace.io/static/bluetrace_whitepaper-938063656596c104632def383eb33b3c.pdf ) which was released some time ago?

Parts that are the same in BlueTrace and ROBERT:

  • Relies on a mobile app communicating with a trusted back-end server
  • Contacts of non-infected users are stored locally
  • Contacts of infected users are communicated to the health authority / server
  • Pseudonyms are generated by the server and are linkable by the server
  • Pseudonyms are ephemeral and distributed via Bluetooth
  • Contact tracing is done on the server, allowing the server to determine who interacted with whom and thereby leaking the social graph. It also leaks to the server who in particular is at risk. This information could be used to enforce quarantine and goes against the voluntary nature of contact-tracing approaches in Europe
  • Federation between countries and different organizations was also proposed and implemented by BlueTrace

The only difference I can spot so far is that BlueTrace/TraceTogether requires people to provide a phone number. But this is not essential to their design.

ROBERT also does not explain how an infected user can upload the ProximityList to the server without leaking their identity in the form of their permanent identifier or of a verification token. BlueTrace uses authorization codes to determine whether an upload is legitimate. This helps the health authority to deanonymize other users.

Threat model is unrealistic

In the specification v1.0 document, under section 1.4 Adversarial Model :

The authority running the system [...] is ”honest-but-curious”.

I understand that it might be an interesting academic exercise to operate under such an assumption; however, if the goal is to apply the described protocol in practice, then history has shown that governments absolutely cannot be trusted to use the information at their disposal strictly for its original intended purposes and never to extend it to new ones. Examples of misuse abound.

It is also virtually impossible for individuals to audit code run by government bodies. Even if the source code were made available, it would not be possible to be sure that it is actually what the servers are running. So the threat model must be adapted accordingly.

Furthermore, the actual assumptions made in the document indicate that the authority is in fact trusted, which is a more stringent model. See section 6.2. Server Operations :

Upon reception of a h_A = (HELLO_A, Time_A), the server:
[...]
10. securely erases (HELLO_A, Time_A).

A curious entity would keep that information to be able to infer more about the contact graph of the users. Other design decisions are the result of that threat model.

I feel that this is a serious issue; basing further development on flawed assumptions would be misleading and would undermine the adoption of the protocol and apps by the population.

Add a license

The current set of documents is made public but does not seem to be accompanied by any license, either as an independent document or as a notice in the binary files.
As such, they are apparently under copyright, and cannot be modified nor even redistributed.

In order to follow an "open source approach", it would be important to add a license to this repository so that one can understand what the expectations and legal rights are for redistributing and building upon this content.

Allow the user to choose which EBIDs to send, and which not

Hi,
For users confirmed as infected, it should be possible to select which elements of their local proximity list are sent to the server.
This would be an improvement in two respects (a sketch of such a filter follows the list):

  1. Privacy and security. Alice might choose to send the EBIDs received when she went to the coffee shop, but not the ones received during a job interview, for example.
  2. Reducing false positives. Alice could choose not to send the EBIDs received while her phone was charging near the window with no one in the room, eliminating the false positive of someone on the other side of a wall.
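A minimal sketch of such a user-controlled filter, with an invented entry format (the real LocalProximityList layout is defined in the specification):

```python
# Hypothetical sketch of a user-controlled filter applied before upload.
# The entry fields are invented; the real list format is in the specification.
from dataclasses import dataclass

@dataclass
class ProximityEntry:
    ebid: bytes   # Ephemeral Bluetooth Identifier received over BLE
    time: int     # coarse reception time (Unix seconds)
    rssi: int     # received signal strength

def filter_before_upload(local_proximity_list, keep):
    """keep: user-supplied predicate deciding which entries may be uploaded."""
    return [entry for entry in local_proximity_list if keep(entry)]

# Example: Alice drops everything received during her job interview slot.
entries: list[ProximityEntry] = []           # the locally stored records
interview = range(1_588_000_000, 1_588_003_600)
to_upload = filter_before_upload(entries, lambda e: e.time not in interview)
```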

ROBERT status after the first few days of review?

Just curious: what is the status of the ROBERT protocol's development after a few days of review (with pretty strong negative remarks about its actual properties)?

Is it the chosen/preferred/shortlisted protocol for the French government app that may be in development for May 11th? (There is not much official information about this.)

Device type not known?

Hello,

It does not seem intended that the device type be broadcast in the HELLO message.

There are hundreds of smartphone models that emit Bluetooth at different signal levels.
Without knowing the type of device that emitted the HELLO message, how will it be possible to estimate the distance with sufficient precision?
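For illustration, a sketch of the usual log-distance path-loss estimate, which shows why this matters: the reference power at 1 m differs per device model, so without knowing the emitter's model the calibration constant is a guess (the values below are made-up placeholders, not real measurements):

```python
# Illustrative only: the standard log-distance path-loss estimate for BLE
# ranging. The reference power at 1 m (tx_power_1m_dbm) is exactly the
# per-model calibration the receiver cannot know without the device type.
PLACEHOLDER_TX_POWER_1M = {"model_x": -59.0, "model_y": -65.0}

def estimate_distance_m(rssi_dbm: float, tx_power_1m_dbm: float,
                        path_loss_exponent: float = 2.0) -> float:
    """d = 10 ** ((P_1m - RSSI) / (10 * n)); n is ~2 in free space."""
    return 10 ** ((tx_power_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))

# The same measurement gives very different answers under the two calibrations:
# estimate_distance_m(-75, -59) ~= 6.3 m, estimate_distance_m(-75, -65) ~= 3.2 m.
```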

Thanks.

Implementation concerns.

I am afraid StopCovid is going to cost loads of money for nothing.
The list of involved companies is massive:

  • Inria: coordination and transmission protocol, privacy-by-design;
  • ANSSI: cybersecurity;
  • Capgemini: back-end architecture and development;
  • Dassault Systèmes: SecNumCloud qualified sovereign data infrastructure;
  • Inserm: health models;
  • Lunabee Studio: development of mobile applications;
  • Orange: application distribution and interoperability;
  • Santé Publique France: integration and coordination of the application in the global strategy of contact tracing;
  • Withings: connected objects.

They have 13 days left. How can it even be ready? How did the government choose these companies? Why not use smaller companies that desperately need the money and know how to deal with uncertainty and short deadlines? With this many huge companies, communication between all these entities is going to kill the project and make it obsolete.

To my government: we are not building a new airplane and do not need to make every big company happy with a piece of the cake (and a bag of coins). People are dying.

To INRIA: all the concerns that people took time to explain here remain unanswered. Why make this public then? Why was INRIA seen as the best entity to design the protocol, knowing that there are plenty of security laboratories in France with talented people?

Anyway, all of this, as a Frenchman, makes me sick.

ROBERT's desired property is not clearly stated, proposing: "k-anonymity"

ROBERT's authors refer often (three times) to @vaudenay's security analysis of DP-3T in their paper. Indeed, they position their paper as an alternative to "decentralized" applications like DP-3T:

Although it might seem attractive in term of privacy to adopt a fully decentralized solution, such
approaches face important challenges in terms of security and robustness against malicious users [6].

The authors clarify their point later:

Other, qualified as “decentralized”, schemes broadcast to each App an aggregate information containing the pseudonyms of all the infected users [1]. This information allows each App to decode the identifiers of infected users and verify if any of them are part of its contact list. Our scheme does not follow this principle because we believe that sending information about all infected users reveals too much information. In fact, it has been shown that this information can be easily used by malicious users to re-identify infected users at scale [6].

It seems that the main point they identify is exposure deanonymization.
However, defining exactly what attacks could lead to such deanonymization is not that simple.
It has been discussed at length in #46, and it appears that without a proper authentication mechanism, ROBERT is vulnerable to the same attacks as DP-3T. At the same time, it has been proven that such deanonymization attacks cannot be conducted at scale on DP-3T. Finally, and following from problems inherent to tracing applications, if you met only one person during the last 15 days and receive a notification, you will be able to identify the infected person. Please refer to aboutet's message for the authentication part, veale's message for DP-3T mitigations, and risques-tracage.fr for attacks inherent to tracing apps.

We define "authenticated ROBERT" as a patched version of ROBERT that is not vulnerable to Sybil attacks.

It seems that the only goal of authenticated ROBERT is, contrary to DP-3T, to prevent a notified user from learning who may have infected her, provided she met more than one app user in the last 15 days.

It looks like we could phrase this desired property as "k-exposure-anonymity" ("exposure" referring to the fact that two users were in contact long enough and close enough to be recorded by the app).
k would be the degree of anonymity that an infected person A has against another user of the app named B.
k equals the number of people that B has seen during the last 15 days.
A has probably met many people during the last 15 days, so we call the people she was exposed to B_i (B_0, B_1, B_2, etc.).
Each app user B_i has met a different number of users during those 15 days.
This means that A has a different anonymity degree against each user B_i.
The anonymity degree between A and B_i is denoted k_i.
If one user B_j has seen only one app user in the last 15 days, A is 1-exposure-anonymous to B_j, so no anonymity is provided to A against B_j.
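A minimal sketch of this metric, assuming one can count the distinct app users each B_i recorded over the last 15 days; identifiers are illustrative:

```python
# Sketch of the proposed metric: the anonymity degree k_i that an infected
# user A has against a contact B_i is the number of distinct app users B_i
# recorded during the last 15 days.
def exposure_anonymity_degrees(contacts_of_a, contacts_seen_by):
    """contacts_of_a: users B_i that A was exposed to.
    contacts_seen_by: user -> set of distinct users recorded in 15 days."""
    return {b: len(contacts_seen_by[b]) for b in contacts_of_a}

degrees = exposure_anonymity_degrees(
    contacts_of_a={"B0", "B1"},
    contacts_seen_by={"B0": {"A"}, "B1": {"A", "C", "D"}},
)
# degrees == {"B0": 1, "B1": 3}: A has no anonymity towards B0 (k_0 = 1)
# and hides among 3 possible contacts towards B1 (k_1 = 3).
```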

Introducing this k-exposure-anonymity logic could help us improve our exchange and evaluation of ROBERT:

  • At what value of k-exposure-anonymity do we consider that infected users' privacy is protected? 2? 5? 100?
  • Should we refrain from notifying users B_i who have met fewer people than the desired k value, in order to protect infected persons' privacy?
  • Does the average user meet enough people in a 15-day timeframe to justify k-exposure-anonymity?
  • How can we model a more accurate k value that accounts for a user's environment? (For example, if I live with 5 people, I will in theory provide at least k = 5 anonymity to infected persons, but the real value is k' = k - 5.)

To conclude, I think that ROBERT tries to provide a property that I call "k-exposure-anonymity" without explicitly defining and naming it. The authors should clearly state the property they seek and critically evaluate their proposal in light of it.

Note: this post is not about the usefulness of this property or about the trade-offs it involves (such as requiring authentication and trust in an authority); however, a cost/benefit analysis of introducing such a property would be very interesting.

A possible DDoS escalation in case of federation key compromise

As described in §6.2 "Server operations", any server executes two successive steps (numbered resp. 2, and 3-4) in the treatment of incoming h_A messages:

  • it deciphers the country code in the message h_A using the federation key and, if the code is valid yet not its own, forwards the message to a server of the appropriate country,
  • it then decrypts any non-forwarded message (which therefore bears its own code) to check whether it corresponds to a valid, known user.

Note that this second step is the first full validation performed on any message h_A; as a consequence, the server routes some messages before fully checking their authenticity. We will see below how this is a necessity of the protocol as designed.
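A minimal sketch of this ordering as I read §6.2, with toy stand-ins for the cryptography (a real server would decrypt the country code with K_G and would need K_S plus IDTable for full validation); only the forward-before-validate ordering is the point:

```python
# Toy stand-ins for the cryptography: a real server would decrypt the
# Encrypted Country Code with K_G here, and would need K_S plus IDTable for
# the full per-user validation. Only the ordering matters for this issue.
def handle_hello(h_a: dict, own_cc: str, known_ebids: set, forward_queue: list) -> str:
    cc = h_a["ecc"]                       # stand-in for "decrypt ECC with K_G"
    if cc != own_cc:
        forward_queue.append((cc, h_a))   # routed abroad before full validation
        return "forwarded"
    if h_a["ebid"] not in known_ebids:    # stand-in for K_S decryption + IDTable lookup
        return "rejected"                 # bogus EBIDs are only caught here
    return "accepted"

# A flood of messages bearing a valid foreign country code but garbage EBIDs
# is therefore forwarded to that country's back-end before anyone can reject it.
```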

This could allow any attacker who has compromised the federation key to subvert the forwarding mechanism and mount a Distributed Denial of Service (DDoS) against a particular country, amplified by the other federation members themselves. This would probably succeed in spite of denial-of-service protections.

But what about denial of service protections?

The ROBERT protocol, by virtue of being a service with pseudonymous users deployed on a continental scale, is susceptible to DoS attacks. One usual remediation would be to proxy access to back-end servers at the network level, whereby many proxy servers are the (load-balanced) initial recipients of user traffic and rate-limit any single network source of incoming packets. External users can't address back-end servers directly.

A counter-attack is to orchestrate a distributed denial of service (DDoS), by which many sources of packets emit an overwhelming number of seemingly valid messages overall, without any single one producing so much traffic as to be rate-limited. The proxies are not overwhelmed, but route their data to the back-end services, which are.

The answer to this is an application-level proxy, which can authorize further processing of incoming messages by validating (or invalidating) their contents rather than just their network metadata.

Yet there is precious little that such an application-level proxy can do without access to a server's secret key K_S or to the identifiers id_A tied to its users. At most, the time validation listed under steps 5-6 of §6.2 can be executed earlier in the process, yet valid timestamps would be easy to forge for an attacker.

In particular, if those servers do not have the K_S and id_A used for foreign users (which is an explicit design goal and a lynchpin of data sovereignty), they must make a forwarding decision without analyzing the validity of the data. From the point of view of international forwarding, there is no such thing as message validation.

The Escalation

Assume a malicious attacker has obtained the federation key K_G. As the federation key is shared with more and more endpoints as the deployment grows, and no key-rotation protocol has been specified for this secret, the probability of this event only grows over time.

The attacker can then generate a colossal number of distinct messages h_A with an invalid EBID_A component but a valid country code, since they created EBID_A themselves, know K_G, and the country codes are public (§3.1). They craft this torrent of bogus messages so that all of them bear the code CC_T of the target country T they want to attack, and send it through a variety of network routes to the public endpoints of other, non-T countries.

These countries will forward the messages in question to the servers of the target country T. Furthermore, it is probable that this redirection will be privileged, meaning that the initial message recipient can address the servers of the target country T without going through any public-facing proxies.

Indeed, if redirects coming from inside the federation were not privileged, message forwarding would offer little to no benefit compared with user redirection, whereby the initial recipient server simply tells the user to contact another country's public endpoint and does no further processing.

As the DDoS's internationally forwarded messages are sent directly to back-end servers without validation, and it is probably possible to find a target that has far fewer back-end servers of its own than there are relaying servers in the rest of the federation, those server-powered redirections overwhelm the target country's ability to process messages, while helping hide the source of the attack.¹

Possible Remediations

It seems wise to minimize the chance of compromise of the federation key, at least through a strong design for its custody, including rotation. A fall-back mode that switches to user redirection rather than forwarding whenever a DDoS is observed on any country's back-end, from any source, also seems useful. Caution should be used in enabling heavy-handed DoS remediation at the final back-end, such as lengthy source-address bans. Finally, we note that a decentralized protocol presents no similar DDoS escalation path.

¹: Note that even in the absence of key compromise, any federation member with the resources to DDoS another can mount the same attack and obtain plausible deniability by claiming it is the unwitting victim of the attack described here. Other federation members are therefore trusted in that sense as well.

Naive question about the continuous mobile phone connectivity assumption

If I read correctly, the proposed protocol assumes that the user's phone has unfettered, continuous Internet connectivity. This connectivity is required to change ephemeral identifiers every epoch, which seems to be something around 15 minutes from what one can read here and there.

What happens if a phone is disconnected from the Internet for a longer period of time? Does ROBERT keep using the same identifier for proximity tracing, even if it becomes less and less ephemeral? What are the security/privacy implications?

Notice that long periods of Internet disconnection are the norm, not the exception. In Paris, Internet access in the underground is sporadic at best; in theatres and cinemas the Internet is often jammed (on purpose). And yet these are the places where we will be most exposed to contact with strangers, which is the use case where a tracing application is most useful (I may be delusional, but I do expect to learn by myself when friends, colleagues or relatives get sick).

Source IP address

What measures can be taken to protect the source IP address of the mobile device (without using Tor)? All of this seems a bit disingenuous if the server can associate the pseudonym with the IP address of the user running the application. Any thoughts on overcoming this challenge? Clearly there will be a NAT on the way, but in many cases this information can still be resolved, as is done routinely by law enforcement today.

Why are EPIDs only generated by server?

What is the advantage of generating all the IDs and storing them at the server? What if these IDs were generated by the App and communicated to the server when needed?

For example, the app and the server agree on a server-generated ID_A during the initialization/registration phase. The server then holds a list of unique IDs, one for each registered app. The app can generate EPIDs at the required interval (as described in the specification, but generated by the app).
The proximity-discovery phase remains the same, with the app distributing its app-generated EPIDs. In the infected-user declaration, the infected user's app sends its LocalProximityList to the server, which stores it in a list of exposed EPIDs. In the exposure-status request phase, the server can collect the user's generated EPIDs, match them against its list of exposed EPIDs, and compute the risk score.
This is just a strawman protocol in which the generation is done by the app instead of the server. Potentially, there will be very many registered users per server, so this type of EPID generation may reduce the computation and storage load on the server, while the computation load on the app increases only minimally (generating the EPIDs) and its storage requirements stay the same.
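A sketch of this strawman, assuming the app and server share a per-user secret agreed at registration and derive each epoch's EPID with an HMAC; this is the commenter's alternative, not ROBERT's specified construction:

```python
# Sketch of the strawman: app and server share a per-user secret k_a agreed
# at registration, and both derive the EPID of any epoch on demand, so the
# server need not pre-generate and store every identifier.
import hashlib
import hmac

EPOCH_SECONDS = 15 * 60

def epoch_of(unix_time: int) -> int:
    return unix_time // EPOCH_SECONDS

def epid_for_epoch(k_a: bytes, epoch: int, length: int = 8) -> bytes:
    mac = hmac.new(k_a, epoch.to_bytes(8, "big"), hashlib.sha256)
    return mac.digest()[:length]

# The app broadcasts epid_for_epoch(k_a, epoch_of(now)); the server, holding
# the same k_a for each registered ID, recomputes it when matching an uploaded
# LocalProximityList or an exposure status request.
```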

DDOS Prevention

Since you consider the authority under an HbC (honest-but-curious) model:

Have you considered mitigations in case of a DDoS attack?

Recent examples from the French government, such as the COVID "autorisation de sortie" website, have shown it was completely outsourced to a company in Virginia... (not just a tracking pixel or a gateway).

Considering that ROBERT would likely generate much more traffic and consume more resources, are you assuming the anti-DDoS server(s) would still be located on national soil, and thus possibly be part of the HbC authority, or would this once again be outsourced to the US? And are you considering them, too, to be merely honest-but-curious?

Have you estimated the number of simultaneous requests it is likely to have to handle?

Purpose of account deactivation

ROBERT specification 1.0, p. 5:

If this score is larger than a given threshold, the bit "1" ("at risk of exposure") is sent back to the App and her account is deactivated, otherwise the bit "0" is sent back. Upon reception of this message, a notification is displayed to the user that indicates the instructions to follow (e.g., go the hospital for a test, call a specific phone number, stay in quarantine, etc.).

What is the purpose of deactivating the account on a suspicion of infection?

Pseudonyms Generated and Managed by Authority, Communicated to Users

The ROBERT summary document contains the following diagram, showing authorities generating pseudonyms and transmitting them directly to users:

[Diagram from the ROBERT summary document: the authority generates pseudonyms and transmits them to users]

However, Section 1.3 of version 1.0 of the ROBERT specification states that, as a security and privacy requirement, ROBERT mandates the following:

Anonymity of users from a central authority. The central authority should not be able to learn information about the identities or locations of the participating users, whether diagnosed as COVID-positive
or not.

And yet, this assumption is only meant to hold under an honest authority:

The authority running the system, in turn, is ”honest-but-curious”. Specifically, it will not deploy spying devices or will not modify the protocols and the messages. However, it might use collected
information for other purposes such as to re-identify users or to infer their contact graphs. We assume the back-end system is secure, and regularly audited and controlled by external trusted and neutral authorities (such as Data Protection Authorities and National Cybersecurity Agencies).

Furthermore, Section 2.2 states the following:

When a user wants to use the service, she installs the application, App, from an official App store (Apple or Google). App then registers to the server that generates a permanent identifier (ID) and several Ephemeral Bluetooth Identifiers (EBIDs). The back-end maintains a table, IDTable, that keeps an entry for each registered ID. The stored information are “anonymous” and, by no mean, associated to a particular user (no personal information is stored in IDTable).

In short, all of ROBERT is built on trust in central authorities and on the assumption that they will behave honestly and be impervious to third-party compromise. I am unable to determine how this is a strong, or even serious and realistic, approach to real user privacy. Could you please justify how this protocol achieves any privacy from authorities, and how the current model of assuming that all authorities are:

  • Completely honest,
  • Impervious to server/back-end compromise,
  • Impervious to any transport-layer compromise or impersonation,

...is in any way realistic or something that can be taken seriously as a privacy-preserving protocol? Given the level of trust you are attributing to authorities, and given that authorities are responsible for generating, storing and communicating all pseudonyms directly to users' devices, what security property is actually achieved in ROBERT in terms of pseudonymity between authorities and users?

Furthermore, it appears that the trust model for ROBERT is such that the server allocates pseudonyms and is thereafter trusted never to examine users' social graph or any network relationship graph, ever. How could this possibly be a reasonable assumption for a privacy-preserving protocol?

Is this part of PEPP-PT?

Is this the same protocol as the one described yesterday by a colleague from @FraunhoferAISEC in a presentation organized by PEPP-PT?

Can the malicious user Alice raise the risk score of another user by editing her LocalProximityList before uploading?

Important:

I may have misunderstood the document; if so, my apologies in advance.

[Screenshot of the specification, page 5]

Alice finds out she is positive for COVID-19, so she decides to upload her LocalProximityList, but at some earlier time she edited this list (the list is on her phone, so she has the root privileges needed to edit it). She doesn't like Bob, and at least one of Bob's EBIDs is in her LocalProximityList; she is sure she knows which EBID is his, because she remembers a moment in the previous days when they were together and alone (so no other EBID could have been received at that time). She adds many copies of Bob's EBIDs to her LocalProximityList, inventing the timestamps, and uploads this fake LocalProximityList to the server. Later, Bob sends his EBIDs (without a timestamp) to the server and the server computes the "risk score". Since the server's database contains many fake Bob EBIDs uploaded by Alice, Bob receives an alert from the server.

I'm not sure, so my question is: is this scenario plausible? Does the server check the data it receives?

I really agree with this point.
[Screenshot of the quoted passage from the specification]

Best regards, Luigi
(I am sorry if it is a duplicate.)

Clarify "Honest-but-curious" terminology

Section 1.4 states that

The authority running the system, in turn, is ”honest-but-curious”. Specifically, it will not deploy
spying devices or will not modify the protocols and the messages.

Then, later:

The server does not learn the identifiers of the infected user’s App but only the EBIDs contained in
its LocalProximityList (list of Ephemeral Bluetooth Identifiers she was in proximity with).

and

Given any two random identifiers of IDTable that are flagged as “exposed”, the server Srv can not
tell whether they appeared in the same or in different LocalProximityList lists (the proximity links
between identifiers are not kept and, therefore, no proximity graph can be built).

However, section 6.1 acknowledges that the opposite is true:

A LocalProximityList contains the EBIDs of the devices that the infected user has
encountered in the last CT days. This information together with the timing information associated with each HELLO message could be used to build the de-identified social/proximity graph of the infected user. The aggregation of many such social/proximity graphs may lead, under some conditions, to the de-anonymization of its nodes, which results in the social graphs of the users.
Would that be a concern, it is necessary to "break" the link between any two EBIDs contained in the LocalProximityList to prevent the server from getting this information.

and then goes on to list several possible ways to prevent the honest-but-curious server from linking EBIDs and constructing the proximity graph, including:

  1. mixnets,
  2. crossing fingers and hoping that NAT translation will unlink the data,
  3. using trusted servers,
  4. trusted hardware.

Solution 1 is vague and thus cannot be analyzed. Solutions 3-4 are based on trust (in the network operator, in the hospital server, in the hardware) and thus do not fit an honest-but-curious model. Furthermore, as the upload authorization procedure is not specified, it is impossible to tell whether the server can or cannot link EBIDs.

I strongly suggest you replace "honest-but-curious" with "trusted" as long as these components are not fully specified.

"authority... will need to deploy sniffing devices"

In https://github.com/ROBERT-proximity-tracing/documents/blob/master/ROBERT-summary-EN.pdf it says:

"If the authority wants to do physical tracking, it will need to deploy sniffing devices"

This is false. Indeed, many such sniffing devices are already deployed (for adtech, among other things), and they can be repurposed or even remotely reprogrammed.

To be fair to ROBERT, this is not a fault of the protocol itself, and it is also a flaw in DP-3T. (I point out in their issue tracker, in more detail, what kinds of adversaries exist which could turn out to be useful allies to snooping authorities.)

Instead, I would say that both ROBERT and DP-3T operate in a tough surveillance landscape, one that has been allowed to persist primarily because data-protection enforcement has been centralized in the hands of a few incompetent or unwilling actors over several decades.

MAC verification at server end is performed last, should be first

According to the Specification, MAC is added to HELLO messages to prevent integrity attacks.

MAC_{A,i}: HMAC-SHA256(K_A, c1 | M_{A,i}) truncated to 40 bits (c1 is the 8-bit prefix "01"). This field is used to prevent integrity attacks on the HELLO messages.

In ROBERT, the HELLO message MAC verification by the server is performed last: in the infected-user declaration phase, MAC verification is step 8, and in the exposure-status request phase it is step 5.

Shouldn't the MAC verification be the first step after the message is parsed and its fields are extracted? Ideally, to efficiently reject modified or corrupted messages, the MAC should be verified before any other step; this is how it is typically done in other protocols.
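A sketch of the truncated-MAC check itself, following the definition quoted above; note that the server must first recover the per-user key K_A (i.e. decrypt the EBID and look up the user) before it can verify, which may be why the specification orders this step later:

```python
# Truncated-MAC check per the definition quoted above. This only illustrates
# the "reject corrupted messages early" ordering the issue asks for.
import hashlib
import hmac

C1 = b"\x01"  # the 8-bit prefix "01"

def verify_hello_mac(k_a: bytes, m_a_i: bytes, received_mac_40bit: bytes) -> bool:
    full = hmac.new(k_a, C1 + m_a_i, hashlib.sha256).digest()
    return hmac.compare_digest(full[:5], received_mac_40bit)  # 40-bit truncation

# Suggested ordering: parse -> recover K_A -> verify_hello_mac -> only then run
# the remaining (more expensive) time checks and risk processing.
```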

Not hiding disease status

Hi...

Notwithstanding other concerns about the pertinence of the threat model,
it seems that ROBERT does not achieve its goals even under the honest-but-curious model...

  • Users are stored in the (back-end) table as permanent IDs. Those IDs are sent through a TLS session.
    Unless you plan on embedding a Tor-like feature, TLS does not hide user IP addresses. So at this step, the authority has identifying (in a CNIL/GDPR sense) user information.
    This means that the assertion "The central authority should not be able to learn information about the identities or locations of the participating users" is simply wrong...

Even if this data is not "stored" in the database, it is still present in the server logs, and those logs need to be kept for security purposes...
-> The server has personal information

  • When the app sends the proximity list in case of infection, it does not get an answer.
    However, when the app contacts the server to get a risk assessment, it does get feedback.

In this case, any outsider can check whether the server answers or not, and thus learn whether it was a positive COVID declaration...
-> An outsider can detect positive COVID declarations (in particular, Internet providers can do this on the fly with pre-existing infrastructure)

(Just to clarify, it is not a packet size issue, it is purely the absence of flow.)

  • The doc still mentions that an account can be deactivated:
    What does that entail? The server deactivates an ID? -> With the logs, the server learns the disease status.
    The user's app stops emitting? -> Network listeners can detect that no traffic is happening.

About users' willingness to activate Bluetooth

Dear author,

Thank you for sharing this great project! Reading the spec, I am asking myself how willing people will be to keep their Bluetooth on 24/7. In my daily usage, I rarely activate Bluetooth, since Bluetooth communication is not as fast as WiFi and I don't own a Bluetooth accessory, not to mention the extra power consumption that Bluetooth communication brings.

I understand the idea is to have an application that does not trace people's personal data, within the framework of the GDPR. But would it be unrealistic to rely so heavily on people's willingness to (1) download the application and (2) keep their Bluetooth activated all the time?

That said, the spec is great, and I am wondering about the possibility of using QR codes instead of Bluetooth, based on the same anonymization strategy: we ask people to scan a QR code before entering public places, such as a bus/tram, a restaurant, a hospital, a park, a school, a workplace, etc. People who don't scan the QR code are not allowed to enter (just as we check for people who don't buy a transport ticket), knowing that scanning the QR code sends no personal information, only a pseudonymous code, as in the protocol.

I haven't thought through the technical details, but I think this kind of mandatory strategy is necessary in the interest of public health. Surely people can say no to it, just as there will always be people who don't buy transport tickets, which is fine for the functioning of a society, but there would be far fewer of them than people saying no to the Bluetooth strategy (and less cheating).

All above are just my personal opinions. Thank you again for the great work!

Social/contact/proximity graph - why are they needed? Are they all same?

A few questions related to social graphs:

  1. The specification mentions how a social/proximity graph can be constructed or avoided, but it does not mention what such graphs would be used for.

  2. Consider using one consistent name for this graph: "contact graph", "social graph" and "proximity graph" are used interchangeably within and across the documents.

  3. Contradictory statements are made with respect to the central authority's ability to construct these graphs:

Specification:

The authority running the system, in turn, is "honest-but-curious". Specifically, it will not deploy
spying devices or will not modify the protocols and the messages. However, it might use collected
information for other purposes such as to re-identify users or to infer their contact graphs.

Given any two random identifiers of IDTable that are flagged as "exposed", the server Srv can not tell whether they appeared in the same or in different LocalProximityList lists (the proximity links between identifiers are not kept and, therefore, no proximity graph can be built)

A LocalProximityList contains the EBIDs of the devices that the infected user has encountered in the last CT days. This information together with the timing information associated with each HELLO message could be used to build the de-identified social/proximity graph of the infected user. The aggregation of many such social/proximity graphs may lead, under some conditions, to the de-anonymization of its nodes, which results in the social graphs of the users.

Summary:

The authority does not learn the real identities of any user, whether diagnosed with COVID-19 (i.e., tested positive), such as Alice above, or exposed, such as Bernard and Charles. Also, the authority cannot infer the “proximity graph” of Alice, Bernard or Charles.

Information leaked through the size/frequency of client-to-server queries

If I understand correctly, there are three types of requests from client to server:
- registration
- exposure status query
- voluntary infection status upload

I may have missed it, but without special care about the frequency/size of those requests, I believe that simple analysis of request size/frequency may leak information that could be captured (or perhaps must be captured by some legally required logs). E.g., a voluntary infection status upload will carry much more information and its size will be larger; it will also be a single, unique 'asynchronous' message.
I'm not sure the considered adversarial assumptions address this. If not, I believe they are too weak.

The 'Exposure Status Request' mechanism exposes the full social graph of all users (infected or not) to the authority.

Many thanks for putting this proposal up for review by the community. I have provided some advice to the TCN coalition and have reviewed a few designs that, like yours, propose reporting/checking 'seen' beacons, as compared to designs that report emitted beacons (like TCN, Google/Apple and DP-3T). They all share an issue that, I believe, may also affect ROBERT.

The problem is that the service implementing the Exposure Status Request can reconstruct a very good approximation of the full social graph of the users querying it. This can be done within the honest-but-curious model that the ROBERT scheme should be protecting against (and does not assume leakage of K_S, which would be a separate, but devastating, attack).

The social graph reconstruction attack

The server end of the Exposure Status Request protocol regularly observes a tuple (user_a, time, [EBID_i]) from each user. The service needs to know who the user is in order to route back a response. (This is not a place where a mixnet is contemplated, and the scale would make it impractical.) The vector of EBID_i represents the beacons seen by a user within a period of time, say a few days.

Those beacons contain the encrypted IDs of all the other users that user_a encountered. Even without the key K_S, the size of the intersection between user_a's list of EBIDs and the list belonging to another user_b is a very strong proxy for the strength of their social tie. The size of the intersection is therefore a measure of social adjacency (as well as proximity). This is because social graphs and location graphs have a very large number of triangles, so user_a and user_b seeing the same set of users in common indicates that they have a strong relationship. Of course, they are also likely to see each other's EBIDs, and the authority can infer the EBID of each user by looking for the most frequent EBID in 'adjacent' lists of EBIDs.
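A minimal sketch of this adjacency proxy: even without K_S, the overlap between the EBID lists submitted in Exposure Status Requests ranks pairs of pseudonymous users by how much of their environment they share. The Jaccard normalization is my choice, not something the issue or the specification prescribes:

```python
# Sketch of the adjacency proxy: without K_S, the overlap between the EBID
# lists submitted in Exposure Status Requests ranks pseudonymous user pairs
# by how much of their environment they share.
def social_adjacency(ebids_a: set, ebids_b: set) -> float:
    union = ebids_a | ebids_b
    return len(ebids_a & ebids_b) / len(union) if union else 0.0

def adjacency_scores(esr_lists: dict) -> dict:
    """esr_lists: pseudonymous user -> set of EBIDs reported over a window."""
    users = sorted(esr_lists)
    return {(a, b): social_adjacency(esr_lists[a], esr_lists[b])
            for i, a in enumerate(users) for b in users[i + 1:]}
```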

Of course, the above is made worse by the fact that the server is assumed to know K_S and can therefore decrypt the long-term identities of the users behind the EBIDs provided, and simply read the full social graph over time. Since this information leaks at the Exposure Status Request stage, the leakage does not just affect infected or exposed users, but all users in the system, all the time. But I guess you are already aware of this issue. However, it is not clear why you do not, at least, allow for random EBIDs to prevent this trivial de-anonymization.

So the leakage happens both with knowledge of the key K_S and without it. This makes pure key-management solutions (keeping K_S in an HSM) difficult, since the bulk EBID lists are sensitive in themselves.

Is this a big deal?

In effect, this scheme gives the server, and anyone else who can get lists of EBIDs from the server, the capability provided by the NSA co-traveller program (https://www.washingtonpost.com/world/national-security/nsa-tracking-cellphone-locations-worldwide-snowden-documents-show/2013/12/04/5492873a-5cf2-11e3-bc56-c6ca94801fac_story.html), which used co-proximity to do contact tracing in an intelligence/national-security setting. This capability, if sought by national or foreign intelligence agencies, would not be prevented by the GDPR, since personal data processed for the purposes of safeguarding national security or defence is outside the GDPR's scope. Helpfully, the UK IP Act even provides a framework for processing such 'bulk personal datasets' for intelligence, and a code of practice that explains how such data can be used: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/715478/Bulk_Personal_Datasets_Code_of_Practice.pdf
Needless to say, since this information has been a target of foreign signals-intelligence agencies, any protections in national law are irrelevant anyway.

ROBERT is no better than DP-3T to protect users against "Deanonymizing Known Reported Users" attack

In many places in the ROBERT whitepaper, a comparison is made with a security analysis of DP-3T:

Although it might seem attractive in term of privacy to adopt a fully decentralized solution, such
approaches face important challenges in term of security and robustness against malicious users. [6]

and

Other, qualified as “decentralised”, schemes broadcast to each App an aggregate information containing the pseudonyms of all the infected users [1]. This information allow each App to decode the identifiers of infected users and verify if any of them are part of its contact list. Our scheme does not follow this principle because we believe that sending information about all infected users reveals too much information. In fact, it has been shown that this information can be easily used by malicious users to re-identify infected users at scale [6].

and

In our design, scores are computed on a trusted server and are used to notify users. While this offers more flexibility to adapt the scoring algorithms as needed and leads to more effective systems, it also increases the resilience of the systems against attackers that aim at identifying infected users: In order to be notified, an attacker must inject his own HELLO messages into a victim’s App in such a way that the risk scoring algorithm in the back-end selects him for notification. This makes such an attack more difficult as it requires an attacker to use invasive tools or put himself at risk, consequently reducing the scalability of such an attack. We consider especially the latter property to be a rather strong deterrent. In contrast, processing the risk of a contact on the phone upon reception of a notification inherently reduces the system to tracing, even for very brief encounters, between users and infected. This has major implications on privacy as using contextual metadata makes it trivial to identify infected users [6]. The system’s design would be based on the fact that all users which at any point ever saw an infected user’s HELLO can now use contextual metadata (such as a meeting date and time) to identify the infected users.


Reading the DP-3T analysis, it seems they always (and only) refer to section 5.2, "Deanonymizing Known Reported Users":

If an adversary A encounters a user B, A can listen to the EphID_i broadcast and then associate this EphID_i as belonging to B. If later B has its SK_t disclosed, A can deanonymize this key and learn that B was infected.

One example of this attack

Occasional disclosure. When a user A has his app raising an alert, he may be stressed and behave randomly. He could be curious to inspect his phone to figure out why it is raising an alert. If he knows DP-3T well enough, or if he finds a tool to do it for him, he would realize the alert is raised because of a series of EphID_i which were collected at the same coarse time on a given date. A could assume that these EphID_i come from the same user and that their number indicates the duration of the encounter. It may be enough for A to remember B and therefore deanonymize B.


The same attack can easily be rewritten to fit ROBERT's architectural specifics:

Occasional disclosure. User A can modify her application to register a new identity with the back-end server each time she meets a new user, and log when each identity was created/used. The user now has as many identities as people seen, and can independently probe the back-end server to learn whether one of her profiles has been in contact with an infected person. As each profile is mapped to only one encountered user and is associated with a timestamp, it may be enough for A to remember B and therefore deanonymize B.
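A sketch of the rewritten attack, with the client calls abstracted as callables since nothing here is real ROBERT client code:

```python
# Sketch of the rewritten attack: one registered identity per encountered
# person, so a later "at risk" reply pinpoints who triggered it.
identity_for_contact: dict = {}  # person label -> dedicated app identity

def on_new_contact(person_label: str, register_identity) -> None:
    """register_identity: stands in for a fresh (Sybil) app registration."""
    identity_for_contact[person_label] = register_identity()
    # ...broadcast HELLOs only under this identity while near this person...

def who_exposed_me(exposure_status) -> list:
    """exposure_status: stands in for an ESR made under a given identity."""
    return [person for person, ident in identity_for_contact.items()
            if exposure_status(ident) == "at risk"]
```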

The same rationale can be applied to all the other attacks in that section (the Paparazzi, Nerd and Militia attacks).


Limiting profile registration (i.e. preventing Sybil attacks) would be needed to prevent an attacker from deanonymizing users. However, the mechanism presented is too weak to protect against even basic Sybil attacks:

A proof-of-work (PoW) system (like a CAPTCHA) is implemented in order to avoid automatic registrations by bots or denial of service attacks (the details of this PoW system are out of scope of this document).

Indeed, it is cheap and fast to hire micro-workers via platforms like Amazon Mechanical Turk to solve CAPTCHAs. Even more simply, it does not take long to solve ~10 CAPTCHAs per day (just try browsing Google websites behind Tor to be convinced). Furthermore, many CAPTCHAs today work by collecting a lot of data on the user's behaviour and by relying on the fact that the user is logged in to the CAPTCHA provider's services.

I claim that the only option would be to require users to connect to the service via the FranceConnect portal. But that would completely de-anonymize users to the authorities, and would thus break the threat model defined by ROBERT:

Anonymity of users from a central authority. The central authority should not be able to learn information about the identities or locations of the participating users, whether diagnosed as COVID-positive or not.

To conclude, I don't understand why ROBERT would protect users' privacy better than DP-3T, since the same attacks (with a slight variant) can be applied to both protocols.

Abuse of the word "anonymous"

It seems that the ROBERT document uses "anonymous" quite liberally. The worst instance is "anonymous pseudonym" (an oxymoron) in the summary document. Anonymity requires the absence of traceability: if identifiers are permanent, they cannot be called "anonymous". This sloppy use of "anonymous" is common throughout the paper.

K_S rotation?

If K_S leaks, it is a disaster. It would be wise to be able to rotate it:
by changing K_S, only the next few epochs that have already been prepared would be compromised.

If K_S can evolve, the HELLO messages would need to indicate which K_S version they are based on.
This could be done in the clear, or hidden in the same way as the country code.
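A hypothetical sketch of the suggestion: tag each HELLO (or each batch of pre-computed EBIDs) with a key-version number so the server knows which K_S to use; the message layout below is invented for illustration, ROBERT's actual format has no such field:

```python
# Hypothetical key-versioning sketch: each HELLO carries the version of the
# K_S it was minted under, so a leaked key can be retired.
from typing import NamedTuple

class VersionedHello(NamedTuple):
    key_version: int
    ecc: bytes
    ebid: bytes
    time: int
    mac: bytes

K_S_BY_VERSION: dict = {}  # version -> key; compromised versions get removed

def rotate_k_s(new_key: bytes) -> int:
    version = max(K_S_BY_VERSION, default=0) + 1
    K_S_BY_VERSION[version] = new_key
    return version

def key_for(hello: VersionedHello) -> bytes:
    # Raises KeyError for retired versions: EBIDs minted under a leaked and
    # revoked K_S simply stop verifying.
    return K_S_BY_VERSION[hello.key_version]
```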

Maybe the same holds for K_G (although it may be less sensitive).

Provide documents in a format that allows for feedback and discussion

The documents are currently provided only as binary files in the PDF format.
This choice makes it hard to collaborate and prevents suggestions for improvements through the standard open-source practices of pull (or "merge") requests, forking to create derived protocols, and line-level discussion.

These documents look like they were created with LaTeX. The LaTeX sources should be provided in this repository instead of the PDF files. The generated PDF files could be provided for consumption through another means, such as a public website, or at least in a dedicated folder such as dist.

At-risk status directly correlated to identity (and/or an even weaker model)

Looking at the "specification" document:

It is said (on p. 11) that the server explicitly computes the risk score:

  1. computes a ”risk score” value, derived in part from the list LEE_A.

before sending it (properly encrypted) to the user.

So at this step, the server explicitly correlates the fact that the user is at risk with the user's IP address.

Do you consider it outside your model to give this information to the server for free?
(No obfuscated computation is done there; the server knows that it needs to send an encryption of 0/1 to a given IP.)

Of course, you might assume various proxies / NATs / outsourced computation of the risk score, but this would mean that, in addition to the weak HbC model, you assume that none of the parties can ever collude. You would also need to add authentication between those sub-parties in the back-end.

===

As a side note... this is not "computer security" related, but more of a political consideration if the app ends up being adopted.

Given that once considered "at risk", the user is barred from the app until proven not sick by a healthcare professional, does adopting this solution imply being ready to deploy a wide range of tests, even (and especially) for asymptomatic patients?

Deactivating 'at risk' accounts - why deactivate? what features are deactivated?

The specification states that user accounts that receive an "at risk of exposure" message will be deactivated.

Exposure Status Request: App queries (pull mechanism) the "exposure status" of its user by probing regularly the server with its EBIDs. The server then checks how many times the App's EBIDs were flagged as "exposed" and computes a risk score from this information (and possibly other parameters, such the exposure duration or the user's speed/acceleration during the contact). If this score is larger than a given threshold, the bit "1" ("at risk of exposure") is sent back to the App and her account is deactivated, otherwise the bit "0" is sent back. Upon reception of this message, a notification is displayed to the user that indicates the instructions to follow (e.g., go the hospital for a test, call a specific phone number, stay in quarantine, etc.)

UNA ("User A Notified"): this flag indicates if the associated user has already been notified to be at risk of exposure ("true") or not ("false"). It is initialized with value "false". Once set to "true", the user is not allowed to make any additional status request. The flag can be reset if the user can prove that she is not at risk anymore (for example by proving that she got a test and the result was negative).

Scenario 1. Does deactivating also prevent the app from sending EBIDs collected in the future (after receiving the 'at risk' message)? An 'at risk' app user may continue to meet people (i.e. receive HELLO messages from other users), knowingly or unknowingly, while they are at risk. How are these HELLO messages collected? It is not clear how an 'at risk' user interacts with other users.

Scenario 2. Does deactivating at-risk accounts delete their data? For example, if an 'at risk' user A is diagnosed as infected, is there provision for user A (whose account is deactivated) to send their LocalProximityList to Srv, assuming the list has valid IDs (within the advised 14-day period)?

What does deactivation of an account imply? Does the 'at risk' user's app lose functionality in all its phases: proximity discovery, declaration of the contact pseudonyms of a user diagnosed with COVID-19, and exposure status request?

Based on the above two scenarios, I feel that phases 1 and 2 of the protocol should remain active for 'at risk' users, for accurate proximity tracing using ROBERT.

Non-traceability of nuisance alarms

In #7 we established that under some conditions a malicious user may provoke a "false alarm", whereby a user is notified of an at-risk status without having had a genuine contact with a diagnosed patient.

The attack (adjusted to make the timing valid and remove the need for bribes) seems to work as follows: the attacker places radio receivers in a target-rich environment (e.g. a grocery store) and captures HELLO messages from targets. Using a means of rapid WAN communication, they send those messages near-instantaneously to a radio beacon that replays them exactly, in a contamination-rich environment (e.g. an elderly care home in which an outbreak has been detected, or a hospital). Eventually, a person will be diagnosed positive in the contamination-rich environment and subsequently trigger the "at risk" status of the targets.

Note the attack is passive in the target environment, and active only in the contamination environment.

Traceability

How could we detect this attack, in the honest-but-curious model?

  • all the measures that aim at preventing de-anonymization upon upload of a LocalProximityList in §6.1 also go in the direction of preventing the server from recognizing that this upload is "stuffed" with maliciously introduced HELLOs,
  • the attacker may inject a HELLO message corresponding to an ID under their control to be notified upon firing of the "at risk" event, which gives them warning to retrieve and hide their emitter beacon,
  • the eventually-diagnosed patients may never realize their report triggered a large amount of false alarms,

User fatigue

A few false alarms are to be expected in the normal course of events. For example, medical professionals are at high contamination risk, but some may practice such good mask-wearing, hygiene and distancing that they would not pass on the illness despite being first a carrier and then diagnosed. Such a case would create a "normal" false positive.

But should this repeat too often, this attack can create user fatigue, and lower motivation to get tested swiftly.

Moreover, the "at-risk" user is then excluded from emitting ESRs (see #16) until they get tested. Should they delay their testing, they might miss what would otherwise have been a second (and genuine this time) risk signal.

Contact with an infected patient, in an illness with a high attack rate, might happen frequently to everybody, whereas contact with a number n>1 of infected patients would be a more acute (and presumably less noisy) signal. Yet the current protocol is by design not delivering that signal.

In a decentralized protocol

In several decentralized protocols,

  • the attack above exists, but is reversed: the malicious attacker reads the beacons in a contamination rich environment and instantly replays them in a target-rich environment.
  • the signals of risk (pseudonymous events) are common knowledge between at risk individuals. As such, they can voluntarily disclose to their social circle (message neighbors, family) the nature and number of the signal(s) which is(/are) prompting them to get tested.

Should most of this community not test positive upon acting on the same signal, the context helps pin it on the low risk represented by that particular signal.

Should there be an unexpected pattern to these nuisance alarms (e.g. a grocery store), communities more easily become aware of the issue, and can be leveraged to investigate the context of these bad signals.

Identification of diagnosed people by their contacts, or an active malicious agent

How would the system prevent someone from obtaining many different pseudonyms, so that they could use a distinct one for every contact?
If I broadcast a unique pseudonym, say Charles1 when I meet Alice and Charles2 when I meet Bernard, and I get a warning when I check Charles1, I know Alice has been diagnosed.

On a larger scale, if a video-surveillance camera broadcasts a unique pseudonym for each person walking by, then it can obtain pictures of diagnosed people.
