Comments (10)
@kurtamohler
Thank you for the responses!!! You're truly incredible!
This does not repro for me on Linux.
@janeyx99 Thank you for your quick response! After testing the code, I observed that it sometimes returns True and sometimes False. To investigate this behaviour further, I executed the following code snippet:
import torch
from tqdm import tqdm

torch.use_deterministic_algorithms(True)
for i in tqdm(range(2, 1000)):
    a = torch.rand(i + 1, i)
    b = torch.rand(i, i)
    # torch.equal requires the two results to be bit-for-bit identical
    if not torch.equal((a @ b)[0:1, :], (a[0:1, :]) @ b):
        print(i)
        break
In my environment, the code prints 16.
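As an aside, a tolerance-based comparison is usually the right tool for results like these. A minimal sketch (my own example, not from the thread), assuming default float32 tensors:

import torch

a = torch.rand(17, 16)
b = torch.rand(16, 16)

# torch.equal demands bit-for-bit equality, so it is sensitive to the
# last-bit differences caused by a different summation order.
print(torch.equal((a @ b)[0:1, :], a[0:1, :] @ b))     # may be False

# torch.allclose compares within rtol/atol and tolerates such rounding noise.
print(torch.allclose((a @ b)[0:1, :], a[0:1, :] @ b))  # expected True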
@michaelshekasta, while the two expressions `(a @ b)[0:1, :]` and `(a[0:1, :]) @ b` may be mathematically identical, they cannot be guaranteed to produce the same result because summations may occur in a different order. The same goes for the two expressions `linear(a)[0, :]` and `linear(a[0, :])`.
`torch.use_deterministic_algorithms` is only meant to provide determinism for multiple calls to the exact same operation given the exact same numerical arguments.
From the documentation here:
That is, algorithms which, given the same input, and when run on the same software and hardware, always produce the same output.
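The underlying effect is visible even with plain Python floats, because floating-point addition is not associative. A minimal illustration (my own, not from the thread):

# float64 addition: grouping the same three numbers differently
# changes the rounding, so the two sums are not bit-identical.
print((0.1 + 0.2) + 0.3)                        # 0.6000000000000001
print(0.1 + (0.2 + 0.3))                        # 0.6
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False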
Hey @kurtamohler, I want to make sure I fully understand you. Can the same thing happen in linear layers as well (instead of using `@`)?
Yes. With `linear(a)` and `linear(a[0, :])`, you are giving two different inputs, so `torch.use_deterministic_algorithms` does not guarantee that they give the same results.
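To make this concrete, here is a hedged sketch comparing the two call patterns (the shapes and seed are my own choices, not from the thread):

import torch

torch.manual_seed(0)
linear = torch.nn.Linear(512, 512)
a = torch.rand(64, 512)

row_of_batch = linear(a)[0]  # batched kernel: one summation order
single_row = linear(a[0])    # vector kernel: possibly another order
print(torch.equal(row_of_batch, single_row))    # may be False
print((row_of_batch - single_row).abs().max())  # any difference is tiny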
@kurtamohler does PyTorch have documentation on how `Linear` works? I mean, why does it not give the same result?
The documentation for `Linear` doesn't explain this--aside from saying that `Linear` performs a vector-matrix multiplication, which implicitly requires summation.
A general fact about floating point numbers is that when two of them are added together, the result has a small amount of error that depends on the difference between the two numbers. So if a set of floating point numbers is summed in two different orders, the errors can accumulate differently and give two slightly different summation results.
Any operator in PyTorch that sums elements of a tensor together may perform the summation in a different order depending on the size of the input. There are multiple possible reasons for this (like performance)--it depends on the implementation of the operator.
With `torch.use_deterministic_algorithms`, we can guarantee that a given operator will perform summations in the same order each time that it is given the same exact input. But we don't (and probably can't) enforce the same order of summation for two different inputs of different sizes.
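A hedged sketch of the order effect on a plain reduction (my own example, not from the thread): summing the same values in reverse order pairs different elements together and can land on a slightly different float32 result.

import torch

torch.manual_seed(0)
x = torch.rand(100_000)

s_forward = x.sum()          # reduce in one element order
s_reverse = x.flip(0).sum()  # same values, opposite order
print(s_forward.item(), s_reverse.item())
print(torch.equal(s_forward, s_reverse))  # may be False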
Happy to help!
Closing this, since different inputs are expected to give different results.