Comments (7)
Keep thinking about it: is there a reason to support anything other than NumPy for CPU and CuPy for GPU?
TensorFlow is much more complex than CuPy, and more dissimilar to NumPy.
Numba for CPU should not provide a significant advantage over NumPy if array operations are used (even though there is an advantage, since, according to the QiboJIT paper, it can simulate 5-6 more qubits, i.e. a factor of ~50... I really wonder where it is coming from...), and cuQuantum mostly duplicates CuPy.
If we ever go through this exercise, I'd really consider trimming down the number of backends, so as to support all platforms while shaving off as much overhead as possible.
(In principle, we could swap CuPy with TensorFlow, relegating the existing backends to qibojit, but in practice this would mean keeping the whole backends' mechanism as it is, whereas if we only have to support NumPy and NumPy-compatible libraries for simulation + Qibolab, we might be able to refactor more.)
from qibojit.
@alecandido I agree with your point about `cuquantum` and `tensorflow`. However, there has been some demand for a `pytorch` backend, which could be a replacement for the `tensorflow` backend.
from qibojit.
In the spirit of the first message, PyTorch would definitely be better than TensorFlow.
However, considering the potential simplification, even PyTorch is still one more backend.
@renatomello are you aware of the benefit of a PyTorch backend?
If it's only to use PyTorch somewhere (possibly external to Qibo, or at least outside the circuit simulation), we could use DLPack and friends to cast arrays from one library to another (zero-copy), without the need for a full backend.
But if there is a need deeply connected to the circuit simulation, of course it's much better to plan on including a PyTorch backend from the beginning (if we ever start a refactor; this issue was mostly investigation until now - I just wanted to check if there is room for improvements and simplifications).
from qibojit.
@alecandido I personally haven't used `pytorch` (just haven't needed it yet). But what I have heard from multiple people using Qibo for optimization is that these tensor-based backends allow for automatic differentiation. If one is simulating the circuits instead of sending them to actual hardware, AD becomes a basic necessity. After that, it's a matter of preferring `pytorch` over `tensorflow` in general. But the main point of having at least one tensor-based backend is AD.
from qibojit.
I like the suggestions of the first post, I need to read it in more detail later, but I agree that the methods of Qibo's `AbstractBackend` could be simplified.
Other than that, regarding the existing backends:
> Numba for CPU should not provide a significant advantage over NumPy if array operations are used (even though there is an advantage, since, according to the QiboJIT paper, it can simulate 5-6 qubits more, i.e. a factor 50... I really wonder where it is coming from...),
The advantage only appears when the custom kernels are used, which exist only for applying gates to states and some state initialization. All other operations are delegated to numpy. I would say (without real proof) that the advantage comes from the following points, ordered by decreasing importance:

- In-place updates. In numba we modify the state vector in-place, while `np.einsum` creates a copy. An easy way to test this:

```python
import numpy as np
import qibo
from qibo import Circuit, gates

qibo.set_backend("qibojit")  # or "numpy"

c = Circuit(2)
c.add(gates.H(0))
c.add(gates.H(1))

state = np.random.random(4).astype(complex)
state2 = c(state)
print(state)
print(state2)
```

With numpy `state2 != state`, while with numba `state2 == state`.
- Numpy is single-threaded, while our numba kernels use parallelization (`prange`) to take advantage of multi-threaded CPUs. That being said, maybe there are simpler ways to make numpy (in particular `np.einsum`) compatible with multi-threading.
- We are using some binary operations to find the indices during gate multiplications, which are fast, but we never really proved how much advantage we get from this. I am guessing the low-level implementation of `np.einsum("ec,abcd->abed", gate, state)`, which applies a single-qubit gate to the 3rd qubit of a 4-qubit state, uses similar tricks, but I have never checked the actual code.
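For reference, a numpy-only sketch of what that `einsum` contraction does, with the state reshaped to one axis per qubit and a Pauli-X chosen only to make the check easy (all names here are illustrative):

```python
import numpy as np

nqubits = 4
state = np.random.random(2**nqubits) + 1j * np.random.random(2**nqubits)
state /= np.linalg.norm(state)

gate = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli-X

# One axis per qubit; contract the gate on the third axis (qubit 2).
# Note einsum allocates a fresh output array, i.e. this is not in-place.
tensor = state.reshape((2,) * nqubits)
new_state = np.einsum("ec,abcd->abed", gate, tensor).reshape(-1)

# Sanity check: X on qubit 2 just flips the state along that axis.
assert np.allclose(new_state, tensor[:, :, ::-1, :].reshape(-1))
```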
> and CuQuantum is mostly duplicating CuPy.
That's true, CuQuantum is there only for supporting an additional backend, which is backed by NVIDIA, and for allowing easy benchmarking (CuPy vs CuQuantum). It does not offer any additional features.
> TensorFlow is much more complex than CuPy, and more dissimilar to NumPy.
As @renatomello said, the main motivation for using TensorFlow is automatic differentiation. Compared to numpy, it also supports multi-threading and GPUs, but it is still slower than qibojit, primarily due to creating copies (point 1 above), which are needed for automatic differentiation. Indeed, there are alternative backends we could add for this purpose (PyTorch, JAX, etc.); I think we only have TensorFlow for historical reasons, as we started with it and with qibotf, the predecessor of qibojit.
from qibojit.
Thanks @stavros11 for the summary; I believe everything should be clear enough now.
My current understanding is that we'll need:
- basic and parallel CPU support
- hardware accelerators support (mostly GPU, but if possible any)
- automatic differentiation
So, I'm not sure that point 3 is strictly required for simulation, because strict simulation cannot differentiate a circuit (otherwise the same code would not run on hardware out of the box).
However, we could assume that we want it; as long as it's not blocking greater improvements, it would also be fine like that.
On the one hand, I have always been tempted to add a further requirement: go beyond Python. However, this, together with the three above, would be incredibly time-consuming, and I'm pretty sure it's not worth it in the current state of the project.
In Python many array libraries are available, with a NumPy-like API and broad hardware support, while to move to C the only strategy I can think of would be to make direct use of XLA, with all the niceties of Bazel...
Speaking of XLA, it seems like all the major ML frameworks are using it (in particular TensorFlow, JAX, and even PyTorch), and it should satisfy all the conditions above on its own.
Thinking twice, I actually wonder whether it would be worth investigating CuPy vs XLA-based libraries more deeply. Because if JAX or PyTorch are good enough (maybe not TF, since it's the least interoperable one, and it already somehow "failed"), and they support all the use cases, why should we dedicate effort to developing/maintaining multiple simulation backends ourselves?
Eventually, if we really needed something more fine-grained than what these libraries provide, making a trip into XLA itself might even be worth it (but I really hope not, at least for a long while... also because we would lose all/most of the autodiff...).
In particular: is there anything to be executed on GPU or differentiated that cannot be implemented with TensorFlow(/...)?
P.S.: about the copies, I was worried the problem could persist with the others, but there is room in JAX and PyTorch (all the trailing-`_` methods) for in-place operations. However, since it could even be an outer product, there is no way that `einsum` could work in-place in general (the output may require more memory than the input); we'd need explicit contractions (as I believe you implemented in `qibojit`).
from qibojit.
> So, I'm not sure that point 3. is strictly required for simulation, because strict simulation can not derive a circuit (otherwise the same code would not run on hardware out of the box).
> However, we could assume that we want it, unless it's blocking greater improvements it would also be fine like that.
Yes, AD is much better for gradient simulation than any other method that is hardware-compatible, so it is very necessary to keep.
Matching the computational complexity of AD on hardware is actually a hot topic right now in QML circles, and there are some theoretical results showing that it may even be impossible for a general circuit without violating complexity bounds. Of course, it can still be possible for specific circuits.
But the point is that AD is indispensable.
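For context, the standard hardware-compatible alternative is the parameter-shift rule, which recovers an exact gradient from two extra circuit executions per parameter. A numpy-only sketch for the single-parameter case, the expectation of Z after RY(theta) on |0> (all names here are illustrative):

```python
import numpy as np

def expectation_z(theta):
    # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so <Z> = cos(theta).
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return state[0] ** 2 - state[1] ** 2

theta = 0.7
shift = np.pi / 2  # exact for gates generated by a Pauli operator

# Parameter-shift rule: gradient from two shifted circuit executions.
grad = (expectation_z(theta + shift) - expectation_z(theta - shift)) / 2

assert np.isclose(grad, -np.sin(theta))  # analytic d/dtheta of cos(theta)
```

With n parameters this costs 2n circuit executions, whereas AD in a tensor backend obtains all gradients in a single backward pass; that gap is what the complexity results above are about.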
from qibojit.