Comments (19)

tromp commented on August 29, 2024

> This paper gives an explanation for why memory hard PoW is not possible.

No, it doesn't.
It only points out that an ethash-style PoW, in which memory is used read-only, suffers from being sharable by many processors.

In Cuckoo Cycle, every instance (i.e. random bipartite graph) needs to completely rewrite memory.
So every instance needs its own memory.
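
A toy sketch of the distinction (my own illustration, not code from this repo; the hash, sizes, and trimming rule are simplified stand-ins for siphash and real edge trimming). An ethash-style dataset is only ever read, so any number of workers can share one copy; each Cuckoo instance, by contrast, repeatedly rewrites its own private state:

```python
import hashlib

def node(nonce, edge, side, n):
    # pseudo-random endpoint of an edge; a stand-in for siphash
    h = hashlib.blake2b(f"{nonce}:{edge}:{side}".encode(), digest_size=4)
    return int.from_bytes(h.digest(), "little") % n

def trim_instance(nonce, n_edges=1 << 12, rounds=4):
    # Each instance owns `alive` and the degree counts and rewrites them
    # every round -- unlike a read-only ethash-style dataset, this memory
    # cannot be shared between instances.
    alive = set(range(n_edges))
    for rnd in range(rounds):
        side = rnd & 1
        degree = [0] * n_edges
        for e in alive:
            degree[node(nonce, e, side, n_edges)] += 1
        # edges whose endpoint has degree 1 cannot lie on a cycle
        alive = {e for e in alive
                 if degree[node(nonce, e, side, n_edges)] > 1}
    return alive  # surviving edges; a real miner would now look for 42-cycles

print(len(trim_instance(nonce=1)))
```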

tromp commented on August 29, 2024

You cannot always parallelize more. You're limited by die space and memory bandwidth.
C32 computes 4 billion siphashes in every round, but you wouldn't build a chip with millions of siphash circuits. Similarly, you cannot do millions of simultaneous memory accesses; your on-chip memory buses cannot handle that. Once your ASIC reaches many TB/s of internal memory bandwidth, the cost of doubling it again more than doubles, and it makes more sense to just build more, smaller, simpler units.
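
To make the scale concrete, a back-of-envelope calculation (the bytes-per-hash and rounds-per-second figures are assumptions for illustration, not measurements):

```python
siphashes_per_round = 4e9        # 4 billion siphashes per round (C32, per above)
bytes_per_hash = 8               # assume ~one 8-byte memory access per hash
rounds_per_second = 100          # hypothetical target trimming rate

bw = siphashes_per_round * bytes_per_hash * rounds_per_second
print(f"{bw / 1e12:.1f} TB/s")   # 3.2 TB/s of internal bandwidth
# Past some point, doubling such bandwidth more than doubles cost, so
# several smaller, simpler units beat one monolithic chip.
```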

burdges commented on August 29, 2024

I doubt it. You might ask whether memory-bandwidth hardness is possible. I doubt that too, because all those different CPUs could pull data from different memory locations in the same memory array faster than one CPU could, but at least there you're asking for a bespoke memory controller.

It sounds plausible that an isogeny-based verifiable delay function could be made memory-bandwidth hard, but only proof-of-stake systems can exploit a verifiable delay function, not proof-of-work systems.

zack-bitcoin commented on August 29, 2024

How much time is spent analyzing a random bipartite graph before you realize it doesn't have a solution and move on to the next?
I guess you have to look at every byte at least once, and it is like 3.5 gigabytes, right?
So I guess this takes at least 0.5 seconds to scan that much RAM with a single CPU.

It seems like if you had 10 CPUs sharing the same memory, then they could each scan different bytes at the same time and finish in about 0.05 seconds.
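
A quick check of these guesses, with assumed bandwidth figures (the 7 GB/s and 25 GB/s numbers are illustrative round numbers, not measurements):

```python
GB = 1e9
data = 3.5 * GB            # the size guessed above
single_cpu_bw = 7 * GB     # assumed single-core read bandwidth
print(data / single_cpu_bw)        # 0.5 s, matching the estimate

# Ten CPUs sharing one memory bus do not get a 10x speedup: they split
# the bus's total bandwidth (assume ~25 GB/s for one DRAM channel).
shared_bw = 25 * GB
print(data / shared_bw)            # ~0.14 s, not 0.05 s
```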

zack-bitcoin commented on August 29, 2024

Is it expected that just loading the data into RAM is the expensive step, and that finding whether it has a solution is relatively cheap?

tromp commented on August 29, 2024

A really fast GPU needs about 1/3 of a second to analyze a Cuckatoo31 graph, which involves writing 8-10.5 GB of DRAM multiple times.
CPUs have way less memory bandwidth and are an order of magnitude slower.

ASICs can use a more efficient algorithm that only needs 512 MB of SRAM, such as in this one:
https://obelisk.tech/products/grn1.html
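
The implied memory traffic, as a sketch: the pass count below is my assumption; the other figures come from the comment above.

```python
GB = 1e9
bytes_per_pass = 9 * GB    # midpoint of the 8-10.5 GB quoted above
passes = 8                 # assumed number of write+read passes
seconds = 1 / 3            # one Cuckatoo31 graph per 1/3 s

print(bytes_per_pass * passes / seconds / GB, "GB/s")
# ~216 GB/s sustained -- high-end-GPU territory, an order of magnitude
# beyond typical CPU memory bandwidth, consistent with the comment above.
```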

zack-bitcoin commented on August 29, 2024

So what portion of the time is spent on writing data to memory vs reading/processing that data?

tromp commented on August 29, 2024

Everything that is written is also read, so writing and reading are pretty balanced. Hash computation is a much smaller fraction, maybe around 10-20%.

zack-bitcoin commented on August 29, 2024

There is no way to parallelize the acts of reading/writing this data to a shared memory?

zack-bitcoin commented on August 29, 2024

If I have a giant mining farm and spend thousands of dollars on electricity every day, would I need to store more than one copy of the Cuckoo Cycle work in different memory at the same time?

tromp commented on August 29, 2024

You would create/process an instance every 1/3 second on a fast GPU, taking up pretty much all of its memory.

zack-bitcoin commented on August 29, 2024

Yes, I know people are using GPUs to mine for now.
I'm trying to imagine what the ideal Cuckoo mining hardware would look like, and whether it would be significantly more efficient than a GPU.

zack-bitcoin commented on August 29, 2024

More importantly, I want to know whether ASIC mining of Cuckoo Cycle scales linearly with the amount of money the owner invests in mining hardware and electricity.

tromp commented on August 29, 2024

The ideal Cuckoo mining ASIC has hundreds of MB of SRAM on a chip for the node bitmaps.
The total graphrate is linear in the number of ASICs and the power used.
Of course, a more expensive ASIC on a smaller process node (7 nm instead of 12 nm) would use less power, so there's always a tradeoff between hardware cost and electricity cost.
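
One plausible accounting for those bitmap sizes, assuming a lean Cuckatoo31 miner with 2^31 edges and one bit per edge plus one bit per node on one side of the bipartite graph (the layout is my assumption, but the total matches the 512 MB GRN1 figure quoted earlier):

```python
EDGEBITS = 31
MB = 2**20
edge_bits = 2**EDGEBITS          # one bit per edge
node_bits = 2**EDGEBITS          # one bit per node on one side
print(edge_bits / 8 / MB)        # 256 MB for the edge bitmap
print(node_bits / 8 / MB)        # 256 MB for the node bitmap
print((edge_bits + node_bits) / 8 / MB, "MB total")   # 512 MB
```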

zack-bitcoin commented on August 29, 2024

Thanks for the quick answers to my questions.
You are doing a great job on Cuckoo Cycle.
The documentation is clear.

tromp commented on August 29, 2024

> There is no way to parallelize the acts of reading/writing this data to a shared memory?

This is already extremely parallelized in existing GPU miners, with thousands of threads active at every stage...

tromp commented on August 29, 2024

> Thanks for the quick answers to my questions.

You're welcome. And thanks for the praise.
Feel free to close the issue if you have no further questions...

zack-bitcoin commented on August 29, 2024

OK.
I have a link pointing here: https://twitter.com/zack_bitcoin/status/1163737778791862273?s=20
So if you have more ideas on this topic, you might want to add them to the closed thread.

zack-bitcoin commented on August 29, 2024

It seems like if it is possible to parallelize any aspect of the computation of the PoW algorithm, then we can still make a better ASIC.
And if it is possible to make a better ASIC, there will always be some people getting better-than-linear returns on their investment, which would be a centralizing force on the network.

No matter how much we parallelize Cuckoo Cycle, it is still possible to make it faster by parallelizing even more.

I feel like we need some sort of proof that one version of the hardware is optimal, and that the open-source version is within some small bound, say 5%, of optimal.
