
openclimate-demo's People

Contributors

blidd, martinwainstein, pavelkrolevets, talleyamir, tianguistengo, varunram


openclimate-demo's Issues

Closed and open data validation oracles

Closed and Open Data Oracles

As openclimate is structured right now, we have oracles that perform multiple data validation tasks on submissions from companies, individuals, and other entities. On a broad scale, such data can be divided into two categories:

  1. Open Data - data that the submitter intends to be public
  2. Closed Data - data that the submitter intends to be private (in parts)

Open Data

In an open data ecosystem, the submitter's data is published to IPFS and the hash is logged by the platform. The committer then creates either

  1. A transaction with an OP_RETURN pointing to the stored IPFS hash, or
  2. An adaptor signature that commits to the point pertaining to the IPFS hash

An oracle can either

  1. Take the hash from the platform and proceed for verification, or
  2. Observe the blockchain for confirmed hashes and proceed to validate them.

A submitter can choose to withdraw submitted data within the block interval by double spending the funds; data is not deemed final until it is confirmed in the blockchain.

Once an oracle observes such a hash, it can fetch the data from IPFS, analyse it, and decide whether the submitted data is correct. If it is, the oracle can either:

  1. Submit its own commitment transaction attesting to the validity of data, or
  2. Submit a commitment to the smart contract, which publishes a final transaction attesting to the data
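The open-data flow above can be sketched end to end. This is a minimal illustration under assumptions, not the platform's implementation: `sha256` stands in for the IPFS content hash, and the oracle's validation model is a hypothetical stand-in.

```python
import hashlib

def commit(data: bytes) -> str:
    # Submitter: hash the report before publishing it to IPFS.
    # (In practice the IPFS content hash itself plays this role.)
    return hashlib.sha256(data).hexdigest()

def oracle_validate(published_hash: str, fetched_data: bytes, model) -> bool:
    # Oracle: confirm the fetched blob matches the committed hash,
    # then run its own validation model over the contents.
    if hashlib.sha256(fetched_data).hexdigest() != published_hash:
        return False  # wrong blob, or data was tampered with
    return model(fetched_data)

report = b'{"entity": "ACME", "co2_tonnes": 120}'
h = commit(report)
# stand-in model: any non-empty blob passes
assert oracle_validate(h, report, lambda d: len(d) > 0)
assert not oracle_validate(h, b'{"co2_tonnes": 0}', lambda d: True)
```

Whether the oracle later attests on-chain or via the smart contract, the hash check above is the step that ties its verdict to the exact data the submitter committed.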

Oracles can define their own models for verification or follow standard ones defined by appropriate bodies.

Closed Data

In a closed data system, some parts of the data are deemed too sensitive to be released to the public, and as a result the submitter will not submit their data to the platform. Instead, the submitter must either

  1. Provide proofs (technical or legal) that their submitted data conforms to given standards, or
  2. Provide access to a random oracle against which certain queries on the data can be made (the random oracle here is analogous to the random oracles used in analysing cryptosystems)

Random Oracle

In a random oracle model, the querier can make a certain (in cryptosystems, unbounded) number of queries to the oracle, and the oracle gives them a set of responses. In our application, the oracle might have to answer specific questions about emissions, tonnes of CO2 emissions prevented, etc. that would lead the querier to believe that the underlying data is correct.
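A bounded query interface of this kind can be sketched as follows. The query names, the cap check, and the query budget are all hypothetical; they only illustrate the shape of the interaction.

```python
class EmissionsOracle:
    """Answers a limited set of queries over private records without
    revealing the records themselves."""

    def __init__(self, records, cap, max_queries=10):
        self._records = records          # private, never exposed
        self._cap = cap
        self._budget = max_queries       # bounded number of queries

    def query(self, question):
        if self._budget == 0:
            raise RuntimeError("query budget exhausted")
        self._budget -= 1
        total = sum(r["co2_tonnes"] for r in self._records)
        if question == "total_emissions":
            return total
        if question == "within_cap":
            return total <= self._cap
        raise ValueError("unsupported query")

oracle = EmissionsOracle([{"co2_tonnes": 40}, {"co2_tonnes": 55}], cap=100)
assert oracle.query("total_emissions") == 95
assert oracle.query("within_cap") is True
```

The point of the budget is that the querier learns only a bounded set of facts about the data, never the records themselves.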

As with the oracles in the open data case, these models can be formulated either by the oracles themselves or by third parties. Once the verifier is convinced that the submitter isn't lying, they can submit a commitment to the blockchain attesting that the submitted data is indeed correct. This works similarly to the open oracle case.

Consensus among Oracles

The system we want to design is composed of multiple oracles, each of which should be able to independently verify data. In the event that two oracles come to opposing conclusions, we must have a mechanism to determine which oracle is correct. This could be done by verifying the different models, seeking external opinions, etc., but the simplest approach would be a consensus mechanism among the oracles themselves.

Since the main purpose of oracles is to validate data, we don't assume that they run on powerful machines (so a scheme like Proof of Work is ruled out). Instead, we can rely on BFT or Proof of Stake schemes to ensure consistency between oracles.
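As a sketch of the Proof of Stake direction, the tally below weighs each oracle's verdict by its stake; the tuple format and the tie-breaking rule are assumptions for illustration, not a worked-out protocol.

```python
def stake_weighted_verdict(votes):
    # votes: list of (oracle_id, stake, verdict) tuples.
    # Returns the verdict backed by the larger total stake;
    # ties favour rejection (a conservative assumed default).
    approve = sum(stake for _, stake, verdict in votes if verdict)
    reject = sum(stake for _, stake, verdict in votes if not verdict)
    return approve > reject

votes = [("oracle1", 50, True), ("oracle2", 30, False), ("oracle3", 40, True)]
assert stake_weighted_verdict(votes) is True
```

A BFT variant would instead require a supermajority of oracles to agree before the verdict is finalized.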

In the event that we do have oracle validation pools, these pools could internally have their own consensus mechanism for ensuring consistency which would make for an interesting application.

Note that the scheme described above does not specify how the models around the oracles are designed (the platform's data, oracle models, etc). It instead describes how an oracle scheme can be built on top of any base layer (i.e. the oracle layer is independent of the base layer).

improve ring signatures

right now, the ring sigs use only Pedersen commitments, which are computationally binding. By committing to another external point using ElGamal, we can make them perfectly binding. Not strictly necessary, since computational binding is fine in most cases, but it would certainly be a cool idea to explore.
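To make the binding trade-off concrete, here is a toy sketch of both commitment schemes over a small multiplicative group. The modulus and generators are for illustration only; a real implementation uses an elliptic-curve group with a provably unknown discrete-log relation between the generators.

```python
P = 2**127 - 1   # Mersenne prime; toy group, NOT production-safe
g, h = 3, 5      # generators; assumes log_g(h) is unknown

def pedersen_commit(m, r):
    # C = g^m * h^r mod P: perfectly hiding, computationally binding
    # (binding breaks if log_g(h) is ever learned).
    return (pow(g, m, P) * pow(h, r, P)) % P

def elgamal_commit(m, r):
    # (g^r, g^m * h^r): perfectly binding, computationally hiding.
    return (pow(g, r, P), (pow(g, m, P) * pow(h, r, P)) % P)

c = pedersen_commit(42, 1234)
assert c == pedersen_commit(42, 1234)   # deterministic in (m, r)
assert c != pedersen_commit(42, 999)    # blinding factor hides m
assert elgamal_commit(42, 1234)[0] == pow(g, 1234, P)
```

The extra `g^r` component in the ElGamal commitment is what pins down `m` information-theoretically, at the cost of only computational hiding.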

explore Pay to Contract

after ElGamal commitments, it would be nice to explore pay to endpoint for improved privacy on Bitcoin.

Switch to auth token auth for RPC

right now, we need the username and pwhash for authenticating multiple RPC endpoints. We must shift to the new auth token system to keep things consistent for the frontend.
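A minimal sketch of what the token flow could look like, assuming an HMAC-signed token; the actual token scheme in the codebase may differ.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # held server-side only

def issue_token(username: str) -> str:
    # Issued once after a username/pwhash login; later RPC calls
    # present only this token instead of the credentials.
    mac = hmac.new(SERVER_SECRET, username.encode(), hashlib.sha256)
    return f"{username}:{mac.hexdigest()}"

def check_token(token: str) -> bool:
    username, _, mac = token.partition(":")
    expected = hmac.new(SERVER_SECRET, username.encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)

token = issue_token("alice")
assert check_token(token)
assert not check_token("alice:" + "0" * 64)
```

With this shape, RPC handlers only ever see the token, so the frontend never has to hold the pwhash past login.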

Automatic Data Inferring

Data inferences

A problem that arose while consuming data from public-facing sites was that data fields were given different names, and it was non-trivial to identify which names corresponded to standard measurable values. This problem compounds when multiple providers upload data and the platform cannot figure out where said data belongs. One approach would be to maintain a standard list and ask uploaders to transform their data into the format we define. But, as past efforts have shown, this is unsustainable: companies and countries are not incentivised to do this and as a result will not do it.

Assume there are three inputs - Input1, Input2, and Input3 - each with three fields to report

  • Input1 defines them to be Field1, Field2, Field3
  • Input2 defines them to be F1, F2, F3
  • Input3 defines them to be f1, f2, f3

Assume that the platform expects these fields to be named field1, field2, field3. The platform must have a way to infer the correct mapping by parsing the field names. This model could be powered by a simple text parser, an ML-based learning algorithm, etc. The idea is that this parsing layer is a black box: everything put into it must come out cleanly formatted.
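A simple text parser along these lines could look like the sketch below; the regex covers only the toy naming patterns from the example above, and a real normalizer would need a far broader vocabulary.

```python
import re

def normalize(name: str) -> str:
    # Map provider-specific names (Field1, F1, f1, field_1, ...) onto
    # the platform's canonical field<N> form by extracting the index.
    match = re.fullmatch(r"(?i)f(?:ield)?[\s_]*(\d+)", name.strip())
    if match is None:
        raise ValueError(f"cannot infer a mapping for {name!r}")
    return f"field{match.group(1)}"

assert [normalize(n) for n in ("Field1", "F2", "f3")] == \
    ["field1", "field2", "field3"]
```

Raising on unknown names keeps the black box honest: anything it cannot confidently map is flagged rather than silently guessed.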

This black box could also potentially be used in other places where we might need inferential analysis (API endpoints, names, etc). It would be a nice side project that can be easily plugged into the platform and does not require the platform to make any changes (one could write a parser that works on 100 examples and then run it on the platform).

Decide and add blockchain handlers

right now, we have two globals dedicated to committing values to the blockchain. We need to decide which blockchain to use and add the relevant handlers.

Data storage and retrieval

Data Storage and Retrieval

When a platform primarily relies on data (as openclimate does), multiple problems related to data storage and retrieval come into the picture. The platform in this case must have access to a certain subset of the data, but oracles must have access to all of it in order to verify that the subset is accurate. There are three approaches to giving another party access to data:

  1. Direct access to data
  2. Access to an Oracle which allows some queries to be made against data
  3. Not allowing access

There is a fourth category - third-party audits (parties don't permit direct access to data but allow certain third parties to audit it) - but that is not a technical solution.

The first approach is relatively simple: the party copies a portion of the data into another directory and gives people access to that directory. The party could also store encrypted data on IPFS, publish the encryption key, and allow people to retrieve the data from there.
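The encrypted-IPFS variant can be sketched with a toy stream cipher; this is an illustration only, a real deployment would use an authenticated cipher such as AES-GCM, and the key here is a stand-in for a properly generated one.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode, XORed with the data.
    # Symmetric, so the same call encrypts and decrypts.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

key = b"key-published-later-to-grant-access"
blob = keystream_xor(key, b"private emissions report")
# the encrypted blob is what gets stored on IPFS; anyone holding
# the later-published key can recover the plaintext
assert keystream_xor(key, blob) == b"private emissions report"
```

Publishing the key is thus the access-granting event: the ciphertext can sit on IPFS indefinitely without revealing anything until the party chooses to release it.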

The second approach resonates with #30, and its construction would be quite similar as well.

The third approach of denying access to data might sound weird, but there are some categories within this class:

  1. Publishing zero-knowledge proofs of the data that allow independent third-party verifiers to verify that the data is accurate
  2. Publishing proofs that the party is in line with its promised commitments
  3. No data

The first two categories above are interesting to explore, since most companies will refuse to share data publicly and will not agree to an oracle that allows queries to be made.

The data itself can be stored in multiple places: on IPFS, in a traditional database, on a blockchain, etc. It is important that access to it falls in line with the methods of data retrieval discussed above.

Another parameter to tweak is the maximum amount of data a particular entity can store. Since data will be encrypted and the platform itself would have no idea what is being stored, a malicious party could upload arbitrary data, inflating storage costs. The limit must not be so small that committed reporters find it difficult to report their emissions, yet not so big that bad actors can take advantage of the platform. The best approach would be an adjustable file size limit (people who report more get more storage) with a default of 10MB.
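The adjustable limit could follow a schedule like the sketch below; the per-report bonus and its cap are hypothetical numbers, only the 10MB default comes from the text above.

```python
DEFAULT_LIMIT_MB = 10

def storage_limit_mb(reports_filed: int) -> int:
    # Entities with a reporting history earn more space; everyone
    # starts at the 10MB default. +5MB per report, capped at +100MB
    # (both the bonus and the cap are illustrative assumptions).
    bonus = min(reports_filed, 20) * 5
    return DEFAULT_LIMIT_MB + bonus

assert storage_limit_mb(0) == 10
assert storage_limit_mb(4) == 30
assert storage_limit_mb(100) == 110
```

Capping the bonus keeps even the most prolific reporter from becoming an unbounded storage liability.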

Setup sandbox for test data

We need sample data to be populated on the frontend for the demo. This could either be in the form of a YAML file or static data set on the backend. YAML is probably easier, since re-parsing is simpler.
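A sandbox fixture could look something like this; the field names and entities are hypothetical, not the platform's actual schema.

```yaml
# hypothetical sandbox fixture for the frontend demo
entities:
  - name: ACME Corp
    type: company
    co2_tonnes: 120
  - name: Exampleland
    type: country
    co2_tonnes: 45000
```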

Implement 2 party ECDSA

One of the advantages of Schnorr signatures is the ability to add signatures together and build threshold schemes. With ECDSA it's a bit tougher, since we need to multiply signatures instead of adding them, but it is possible, as shown by several papers on the topic.

Explore oblivious Transfers

At Scaling Bitcoin in Tel Aviv, there was an idea about oblivious transfers (trustless transfers between two parties) using a model similar to AOS signatures. This could be used for lotteries or similar and may prove useful in certain applications.
