Comments (16)
Oh I missed that issue multiformats/multicodec#199 - but I think we had a similar thought process :) Regarding your conclusion:
So it seems like the general recommendation should be to just expose a series of named constants for each of the codes. For example, in Go, we could simply code generate: [ ... ]
This is exactly what I've done in dennis-tra/go-multicodec/const.go. The other two files dennis-tra/go-multicodec/codecs.go and dennis-tra/go-multicodec/tags.go are just there to show other possible things one could do with such a generator approach.
While I think codecs.go
makes sense to keep, the tags.go
file may be superfluous. That's also speculated in multiformats/multicodec#199:
Similarly, categorizing the names or codes is not a good idea. The table has a tag field, but it doesn't mean that each of the codes fits neatly into a single category which will remain stable over time
If the tags.go
file wasn't generated, the API surface would consist of the list of constants and two maps string -> codec
and codec -> string
.
I'm curious how you approached this and looking forward to see your solution :) I'd appreciate if you left a note here or tagged me so it won't slip through.
Edit: I just removed the tags.go
generation -> https://github.com/dennis-tra/go-multicodec/tree/no-tags.
from go-multihash.
unreproducable for older commits. Having the submodule clearly links the dependency between these repos.
You can't have truly reproducible unless you simply vendor/copy the table into the git repo. A git clone can fail in the future.
If you want to know what commit was used in a past go generate
call, you could always include a commit hash in a comment in the generated code, though I'm not sure in what specific case this would be useful.
My understanding was that the name column of the table should be the corresponding string value for each constant.
See stringer -linecomment
.
This would result in "blindly" converting the
name
to camel case without considering acronyms/initialisms.
Sounds good to me.
The generator is already generating deprecated constants and prefixes the doc string
Ah, gotcha, that sounds good. I misread the generator source.
some constants got names like
Skein1024616
andPoseidonBls12381A2Fc1Sc
which is hardly readable compared toPOSEIDON_BLS12_381_A2_FC1_SC
andSKEIN1024_616
.
Here's a suggested heuristic: leave underscores untouched, and any dash between two digits must turn into an underscore. That way, bls12_381
turns into Bls122_381
, and skein1024-616
turns into Skein1024_616
. It won't be perfect, but it will be automatic and hopefully not require special cases.
from go-multihash.
Thanks! Looks good. I think we can leave out .gitignore
, because it's not really needed for a vanilla Go repo.
I'm also thinking that it would be best to leave the github action out of the first version, because I feel like it's a potential security concern - a third party action has access to write to our repo, and since we follow a tag/branch the upstream could run arbitrary code with those credentials. And it's really not all that necessary; the table should only update every few weeks, and the cost of updating is a simple go generate && git commit -m '...' && git push
.
I'll get to reviving the repo this week, thanks for your help!
from go-multihash.
My first attempt is at multiformats/go-multicodec#37, after unarchiving the repository and closing all previous issues and PRs. Let me know what you think. We should probably close this issue and continue over there.
from go-multihash.
The above has been merged, thanks @dennis-tra! I'll continue with PRs over there.
from go-multihash.
this is something @mvdan was considering doing and maybe has thoughts on this implementation
from go-multihash.
Indeed, I've had an implementation in mind for a few weeks :) I started implementing it over the weekend, and I plan to wrap it up and push it later this week. The summary is that the API would be smaller and there would overall be less code; see multiformats/multicodec#199.
from go-multihash.
Would you be up for adapting your implementation to the leaner approach I was thinking, so we can reuse your work? The bigger changes would be:
-
Remove the git submodule, since it's a bit overkill. An HTTPS fetch of the latest CSV is plenty.
-
You can simply use
stringer
to stringify constants, no need to roll out your own implementation. Plus,stringer
generates better code. -
The constants look like JAVA_NAMES. For example,
TORRENT_FILE
should just beTorrentFile
. This will likely require teaching the generator what words are acronyms or initialisms, like CID or DAG, which should not be spelled like Cid or Dag. -
I would remove the string->Codec mapping for now, because it's unclear if it's truly needed. If a package wants to understand some multicodec flag or argument, they should probably implement them too, at which point writing a string comparison is trivial. And if we want to add this "reverse String method" in the future, we should just do a Set, to implement
flag.Value
. -
I don't think we should avoid generating deprecated constants. Go has had a standard way to mark declarations as deprecated, so we could just do that instead.
I also wonder if the code generator is a tad over-engineered. You have two directories and five files, between Go code and templates. After the changes above, I bet you could end up with a rather simple generator in under 100 lines of Go in a single file.
from go-multihash.
like CID or DAG, which should not be spelled like Cid or Dag
This might not be necessary, there's a ton of prior art of just using upper camel case regardless of acronyms, look no further than https://godoc.org/github.com/ipfs/go-cid for lots of examples. Up to you both but I doubt people will mind if it's not strictly correct. If you end up trying to be correct you could be up for a lot of work figuring out if each new case should be capitalised or not -- are you going to go with BLAKE2b
rather than what an upper camel might give you since that would be technically correct? You'll end up doing a lot of teaching.
from go-multihash.
Maybe. Standard CamelCase would be better than JAVA_CASE anyway :)
from go-multihash.
Would you be up for adapting your implementation to the leaner approach I was thinking, so we can reuse your work? The bigger changes would be:
Sure :)
- Remove the git submodule, since it's a bit overkill. An HTTPS fetch of the latest CSV is plenty.
While I'm not the biggest fan of submodules either I see some benefits in using it. Imho there is some added value in knowing which commit in the multicodecs repo induced the change to the list of constants. Just using GET to fetch the table makes the generator result unreproducable for older commits. Having the submodule clearly links the dependency between these repos.
- You can simply use
stringer
to stringify constants, no need to roll out your own implementation. Plus,stringer
generates better code.
Initially I thought this is good idea, but Stringer
will generate the literal constant variable names. Depending on whether we'll use CamelCase
or SCREAMING_SNAKE_CASE
the String()
method will return just that. My understanding was that the name
column of the table should be the corresponding string value for each constant.
- The constants look like JAVA_NAMES. For example,
TORRENT_FILE
should just beTorrentFile
. This will likely require teaching the generator what words are acronyms or initialisms, like CID or DAG, which should not be spelled like Cid or Dag.
At first I haven't had the Java like names but the CamelCase ones (see this commit). Afaik this is also the recommended/idiomatic way of naming constants in Go (found the source). That's why I went for this approach first. For the camel case conversion from the name
values I used the iancoleman/strcase package. However I ran into exactly the problems you @mvdan and @rvagg mentioned. The generator would need to learn acronyms to satisfy the initialisms guideline.
Having the additional dependency of strcase
(I actually just copied the toCamel
method), a potentially always out of date list of acronyms (auto-update wouldn't necessarily work if a new acronym were introduced) and the multihash package also using JAVA_CASE were three arguments for using the Java like names.
However I just saw that the initialisms guideline also says:
Code generated by the protocol buffer compiler is exempt from this rule. Human-written code is held to a higher standard than machine-written code.
So I'd suggest doing it similarly to the go-cid package and reverting my aforementioned commit. This would result in "blindly" converting the name
to camel case without considering acronyms/initialisms.
- I would remove the string->Codec mapping for now, because it's unclear if it's truly needed. If a package wants to understand some multicodec flag or argument, they should probably implement them too, at which point writing a string comparison is trivial. And if we want to add this "reverse String method" in the future, we should just do a Set, to implement
flag.Value
.
👍
- I don't think we should avoid generating deprecated constants. Go has had a standard way to mark declarations as deprecated, so we could just do that instead.
The generator is already generating deprecated constants and prefixes the doc string with // Deprecated:
- is this the standard way you thought of?
I also wonder if the code generator is a tad over-engineered. You have two directories and five files, between Go code and templates. After the changes above, I bet you could end up with a rather simple generator in under 100 lines of Go in a single file.
That will likely be true :) thanks for your input.
EDIT: I just updated the no-tags
branch and now some constants got names like Skein1024616
and PoseidonBls12381A2Fc1Sc
which is hardly readable compared to POSEIDON_BLS12_381_A2_FC1_SC
and SKEIN1024_616
.
from go-multihash.
You can't have truly reproducible unless you simply vendor/copy the table into the git repo. A git clone can fail in the future.
If you want to know what commit was used in a past
go generate
call, you could always include a commit hash in a comment in the generated code, though I'm not sure in what specific case this would be useful.
I just updated the repository and removed the submodule.
See
stringer -linecomment
.
Great, I didn't know this was possible 👍
Here's a suggested heuristic: leave underscores untouched, and any dash between two digits must turn into an underscore. That way,
bls12_381
turns intoBls122_381
, andskein1024-616
turns intoSkein1024_616
. It won't be perfect, but it will be automatic and hopefully not require special cases.
Just tried that and it LGTM 👍
That's the current state:
https://github.com/dennis-tra/go-multicodec
from go-multihash.
Hi @mvdan and @rvagg, have you had a chance to look at the changes?
from go-multihash.
Yes; it's just not at the top of my TODO list :)
Are you still okay with transferring this code over to https://github.com/multiformats/go-multicodec? My plan is to remove all existing code there, keep the LICENSE and some parts of the README, add the new implementation, and tag v0.2.0 when done. I'd like the new implementation to be a single squashed commit, just for the sake of keeping the history clean.
You should remain the author of that single commit, since you wrote the software anyway. It would be best if you could squash all your work in a single commit with a good commit message explaining all that you've done, so that I can then cherry-pick that.
The only other detail that comes to mind is having me listed as the maintainer of the repository/module, which I think is fair given that I'm employed to take that kind of responsibility :) You would of course be welcome to open issues and contribute PRs, and I would review them.
As for the implementation, here are some more nits to polish the code a little more:
- Use
gofmt -w codec.go
instead ofgo fmt ./...
, since it's simpler and does less work. - I still think it would be best to join all of
gen/
intogen.go
, with the template being a multiline raw string literal. You can add a// +build ignore
to ensure it's not built as part of the package, and run it viago run gen.go
. - The constant godocs should comply with the standard (start with the name and end with a period) and be human-readable. For example, instead of
// multihash: raw binary
, you could do// Identity is a codec with tag "multihash" and described by: raw binary.
. The description part can be omitted if the csv entry has none. - I still see some constant names which could lose their underscores; for example,
Bitcoin_Tx
could beBitcoinTx
, andHolochain_Key_V1
could beHolochainKeyV1
.
This is all stuff I could do via a PR after we've moved the code to the main repository, though, so it's up to you if you want to do this yourself now or leave it to me to do later.
from go-multihash.
Yes; it's just not at the top of my TODO list :)
Sure thing, I had your fast replies from last week as a reference point ;)
Are you still okay with transferring this code over to https://github.com/multiformats/go-multicodec? My plan is to remove all existing code there, keep the LICENSE and some parts of the README, add the new implementation, and tag v0.2.0 when done. I'd like the new implementation to be a single squashed commit, just for the sake of keeping the history clean.
You should remain the author of that single commit, since you wrote the software anyway. It would be best if you could squash all your work in a single commit with a good commit message explaining all that you've done, so that I can then cherry-pick that.
The only other detail that comes to mind is having me listed as the maintainer of the repository/module, which I think is fair given that I'm employed to take that kind of responsibility :) You would of course be welcome to open issues and contribute PRs, and I would review them.
The work I've done here is no rocket science but still having my name somewhere would be nice. Your plan captures that :) 👍
As for the implementation, here are some more nits to polish the code a little more:
- Use
gofmt -w codec.go
instead ofgo fmt ./...
, since it's simpler and does less work.- I still think it would be best to join all of
gen/
intogen.go
, with the template being a multiline raw string literal. You can add a// +build ignore
to ensure it's not built as part of the package, and run it viago run gen.go
.- The constant godocs should comply with the standard (start with the name and end with a period) and be human-readable. For example, instead of
// multihash: raw binary
, you could do// Identity is a codec with tag "multihash" and described by: raw binary.
. The description part can be omitted if the csv entry has none.- I still see some constant names which could lose their underscores; for example,
Bitcoin_Tx
could beBitcoinTx
, andHolochain_Key_V1
could beHolochainKeyV1
.This is all stuff I could do via a PR after we've moved the code to the main repository, though, so it's up to you if you want to do this yourself now or leave it to me to do later.
I've covered the first three points and left the last one to you, because of the challenges with having special cases mentioned earlier.
You can find the commit in the transfer
branch or here: dennis-tra/go-multicodec@5b6d55a
from go-multihash.
Thanks @mvdan, I may have chimed, if I had seen the conversation there - great to see it merged :)
from go-multihash.
Related Issues (20)
- Possibly erroneous tag pushed: v1.0.2
- Possibly erroneous tag pushed: v1.0.2
- Possibly erroneous tag pushed: v1.0.2
- Version tags gone missing? HOT 3
- Doesn't work on Go playground HOT 2
- Convert hash to hex hash and vice versa HOT 2
- Please choose between identity / id and stick to it across all projects HOT 1
- Harden varint decoding
- Some algorithms don't implement Sum HOT 2
- [enhancement] adding BLAKE3 support HOT 21
- Split core types into a core subpackage HOT 1
- Go get multihash fails compilation due to sha3.NewLegacyKeccak512
- What's going on with murmur3? HOT 4
- approach to constants HOT 6
- Decode allocates
- Create a new tag to allow using recent master HOT 3
- hasher.Write error is not handled
- Interview partners for research about communication in GitHub projects wanted
- Faster blake3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from go-multihash.