
Comments (5)

FiloSottile avatar FiloSottile commented on May 24, 2024

We're still ramping up a maintenance framework for the project, so I think what comes across as a lack of transparency is really work in progress. For context, until the end of March we worked on getting the handoff finalized, and we're still working on getting the generators published, for example.

What I can tell you is that most if not every major cryptographic implementation is interested in consuming the vectors, and some are interested in contributing. I have not heard requests for specific primitives besides ML-KEM and ML-DSA, for which we have some in-progress contributions. There is definitely interest in defining a new reusable format for test vectors, although I don't expect that to be something that works out in the span of a few weeks.

We had a session at OSCW to talk about what implementers want from test vector libraries. I am not sure it's super easy to follow in video, since there's a lot of audience participation, but you might be interested in the recording https://archive.org/details/oscw-2024-fillippo-valsorda-cryptographic-test-vectors or I can send you a transcript.

There is consensus around making the Wycheproof project not just a source of test vectors, but a repository where different sources/people/projects can pool vectors, so that downstreams can use them all at once. We'll work over the next few weeks to make it easier to contribute vectors and to consume them.

You're very welcome to send any new vectors. If you're worried about duplicating work, maybe open an issue to announce what you are working on, and then close it with the PR that submits those vectors?

Note that since we intend to accept vectors from multiple sources, we can't rely on regenerating them all when changing formats or adding new ones, but we will have to port the old ones, and iteratively add new ones.

from wycheproof.

bleichenbacher-daniel avatar bleichenbacher-daniel commented on May 24, 2024

Maybe I wasn't clear enough. When I left Google a year ago, two managers independently asked me if I was willing to continue the project. I also received some non-committal promises that I might get access to my generator code. Hence I've continued working on the project. Now we have two parallel projects. This is obviously not ideal. Hence it would make sense to have a meeting to clear things up. What worries me most is that I have worked on the project for over a decade. Hence I don't want to lose the project a second time.

Thanks for the link to the video. I have a few comments:

  • There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they would not fall out of sync. I don't know how to set this up on GitHub, however.
  • You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had there was in colabs. If this kind of stuff is of interest, then maybe it would be possible to recover these colabs (or rewrite them; they are probably less than 1000 lines of code).
  • You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states. An example is test vectors with a modified private key. Here it is unclear whether a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key are just shooting themselves in the foot. However, if private keys can be uploaded to an HSM, then crashes do matter. For such tests it is important to have a way to reach consensus on whether libraries need strong private key validation or not. A big question here is how to decide what checks a library should perform when importing keys, or performing similar functions that are difficult to attack. I have generated a relatively large number of faulty keys. They have not been published, precisely because I don't know the expectations.
  • Test vector format: if we want to change the format of the test vectors then it would make sense to tackle this now before making big announcements.
  • Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.


FiloSottile avatar FiloSottile commented on May 24, 2024

> I also received some non-committal promises that I might get access to my generator code.

I'm trying to enable that!

> Hence it would make sense to have a meeting to clear things up.

Sure! I still don't have an email address for you, but you can reach out at [email protected] and we can set up a call.

I want to be upfront: the goal of C2SP, my own intention, and the community's interest is in growing Wycheproof into a repository for (properly attributed!) test vectors from multiple sources. I think what you worked on can fit perfectly, but I want to be clear it's not the same single-source design of Wycheproof-at-Google.

> There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they would not fall out of sync. I don't know how to set this up on GitHub, however.

Happy to do the GitHub Actions setup.

I'm not sure I see the autogenerated documentation, where is it?

> You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had there was in colabs. If this kind of stuff is of interest, then maybe it would be possible to recover these colabs (or rewrite them; they are probably less than 1000 lines of code).

Intermediate values are useful while developing an implementation, so I am not sure they make sense in the same format/place as the rest, but they would definitely be useful. Maybe they fit in the more "free-form" part of the repository we talked about at OSCW.

> You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states.

When I talk about testing the test vectors I just mean making sure they were generated correctly given their intention, so we can just write implementations that pass/fail based on the "acceptable" state.
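A minimal sketch of that pass/fail rule, assuming each vector carries a `result` field with the values `valid`, `invalid`, or `acceptable`. The function name `outcome_matches` is hypothetical and only for illustration.

```python
def outcome_matches(expected_result: str, accepted: bool) -> bool:
    """Return True if an implementation's behaviour is consistent with a vector.

    expected_result: the vector's "result" value (assumed v1 values).
    accepted: whether the implementation under test accepted the input.
    """
    if expected_result == "valid":
        # A valid vector must be accepted.
        return accepted
    if expected_result == "invalid":
        # An invalid vector must be rejected.
        return not accepted
    if expected_result == "acceptable":
        # Either behaviour is tolerated for "acceptable" vectors.
        return True
    raise ValueError(f"unknown result value: {expected_result!r}")
```

Under this reading, a reference implementation that disagrees with a `valid` or `invalid` vector points at a generation bug, while `acceptable` vectors can never fail.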

> Here it is unclear if a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key means that they are just shooting themselves in the foot.

Heh, this is a whole topic among implementers and different libraries take different views. I think it would make sense to have them, maybe with a specific flag/in specific files, to let libraries decide if they fit the threat model.
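One way a consumer might act on such a flag, sketched under assumptions: tests are taken to carry a `flags` list inside `testGroups[].tests[]`, and the flag name `ModifiedPrivateKey` is made up for this example, not an existing Wycheproof flag.

```python
from typing import Iterable, Iterator


def select_tests(test_groups: Iterable[dict], excluded_flags: set[str]) -> Iterator[dict]:
    """Yield every test whose flags do not intersect excluded_flags."""
    for group in test_groups:
        for test in group.get("tests", []):
            if excluded_flags.isdisjoint(test.get("flags", [])):
                yield test


# Hypothetical data: the flag name below is an assumption for illustration.
groups = [{"tests": [
    {"tcId": 1, "result": "valid", "flags": []},
    {"tcId": 2, "result": "invalid", "flags": ["ModifiedPrivateKey"]},
]}]

# A library whose threat model excludes tampered private keys skips tcId 2.
selected = [t["tcId"] for t in select_tests(groups, {"ModifiedPrivateKey"})]
```

A library that does care about key import into an HSM would pass an empty exclusion set and run everything.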

> Test vector format: if we want to change the format of the test vectors then it would make sense to tackle this now before making big announcements.

I would rather take our time to gather community feedback on the new format. For now, I want to get us set up with refreshed docs, and the tooling to smoothly add and consume vectors in the current v1 format.

> Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.

Generating them is indeed easy, but knowing what the right data structure is requires language-specific knowledge that I don't have across all languages. For now I think just making the JSON available is a good first step.
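To make the trade-off concrete, here is one possible Python shape for a single test case. It is a sketch, not a proposed API: the class `VectorCase`, the helper `parse_test`, and the choice to keep algorithm-specific fields in an untyped `params` dict are all assumptions, and other languages would likely want different structures.

```python
from dataclasses import dataclass, field


@dataclass
class VectorCase:
    tc_id: int
    comment: str
    result: str
    flags: list[str] = field(default_factory=list)
    # Algorithm-specific fields (key, msg, sig, ...) are kept untyped here.
    params: dict = field(default_factory=dict)


def parse_test(raw: dict) -> VectorCase:
    """Build a VectorCase from one JSON test object."""
    common = {"tcId", "comment", "result", "flags"}
    return VectorCase(
        tc_id=raw["tcId"],
        comment=raw.get("comment", ""),
        result=raw["result"],
        flags=list(raw.get("flags", [])),
        params={k: v for k, v in raw.items() if k not in common},
    )
```

Whether `params` should instead be a typed structure per algorithm is exactly the kind of language-specific decision that is hard to make centrally.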


bleichenbacher-daniel avatar bleichenbacher-daniel commented on May 24, 2024

The auto generated file I talked about is
https://github.com/C2SP/wycheproof/blob/master/doc/types.md
Unfortunately, this is an old version, which is sad because the main goal was to generate doc and schemas from the same source, then test the schemas against the test vectors, which should ensure that the documentation reflects the test vector files. Of course if that gets out of sync, then nothing is gained.
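As a rough illustration of testing vector files for consistency: a real setup would validate against the generated JSON schemas with a schema validator, whereas this standard-library sketch only checks a few structural invariants. The field names (`algorithm`, `numberOfTests`, `testGroups`, `tests`, `tcId`) are assumed from the v1 format, and `check_vector_file` is a hypothetical helper.

```python
import json


def check_vector_file(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    for key in ("algorithm", "numberOfTests", "testGroups"):
        if key not in doc:
            errors.append(f"missing top-level field: {key}")
    # Flatten all tests across groups.
    tests = [t for g in doc.get("testGroups", []) for t in g.get("tests", [])]
    if "numberOfTests" in doc and doc["numberOfTests"] != len(tests):
        errors.append(f"numberOfTests is {doc['numberOfTests']}, found {len(tests)} tests")
    ids = [t.get("tcId") for t in tests]
    if len(ids) != len(set(ids)):
        errors.append("duplicate tcId values")
    return errors


# Made-up sample document, not a real Wycheproof file.
sample = json.loads("""
{"algorithm": "EXAMPLE", "numberOfTests": 2,
 "testGroups": [{"tests": [{"tcId": 1, "result": "valid"},
                           {"tcId": 2, "result": "invalid"}]}]}
""")
problems = check_vector_file(sample)
```

A check like this could run in CI on every pull request, so that contributed files cannot drift from the documented structure unnoticed.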

Yes, feedback would be nice. One thing I could do is generate some sample test vector files with a new format, just for discussion. An issue about the test vector format is open: #106. So far there are no comments.


bleichenbacher-daniel avatar bleichenbacher-daniel commented on May 24, 2024

I'm wondering if the upcoming Eurocrypt would be an opportunity for some small meeting.

For example discussing the following topics would be helpful for the project:

  • Organization: So far I've rewritten about 80'000 lines of code for the test vector generation. A significant fraction might not have been necessary with a clearer organization of the project. Hence, it might be useful to consider how duplicate work can be avoided in the future.

  • Priorities: To have some impact it would be very useful to determine if there are important algorithms or primitives that are missing.

  • Tools: One thing I noticed is that the test vectors are often converted to other formats. It might be helpful to provide some tools to support such conversions.

  • Annoyances that make the project harder to use than necessary.
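On the tools point, a conversion helper could be as small as the following sketch, which flattens the fields shared by all tests into CSV. The column choice, the `;` separator for flags, and the function name `vectors_to_csv` are assumptions for illustration; algorithm-specific fields are deliberately left out.

```python
import csv
import io


def vectors_to_csv(doc: dict) -> str:
    """Flatten the common per-test fields of a vector document into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tcId", "result", "flags", "comment"])
    for group in doc.get("testGroups", []):
        for test in group.get("tests", []):
            writer.writerow([
                test["tcId"],
                test["result"],
                ";".join(test.get("flags", [])),
                test.get("comment", ""),
            ])
    return out.getvalue()


# Made-up input, not a real Wycheproof file.
doc = {"testGroups": [{"tests": [
    {"tcId": 1, "result": "valid", "flags": [], "comment": "empty message"},
]}]}
csv_text = vectors_to_csv(doc)
```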

