Comments (3)
For bfloat16 AMP, the gradient scaler shouldn't be needed, since bfloat16's dynamic range is equivalent to fp32's.
Could you try the script without grad scaler?
from pytorch.
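To illustrate the range argument above, here is a minimal, standard-library-only sketch (not taken from the original script). The helper names `to_fp16` and `to_bf16` and the value `tiny_grad` are hypothetical, and bfloat16 is emulated by truncating a float32 encoding rather than by the rounding real hardware typically uses. It shows a small gradient-like value flushing to zero in fp16, which is the case a grad scaler exists to handle, while surviving in bfloat16.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision (5 exponent bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range; report them as inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Assumption: plain truncation, chosen for simplicity. bfloat16 keeps
    float32's 8 exponent bits and only shortens the mantissa.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Largest finite bfloat16 value: float32 bit pattern 0x7F7F0000.
BF16_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7F0000))[0]
FP16_MAX = 65504.0

tiny_grad = 1e-8  # hypothetical small gradient magnitude

print("fp16 :", to_fp16(tiny_grad), "max", FP16_MAX)
print("bf16 :", to_bf16(tiny_grad), "max", BF16_MAX)
```

In a training script this corresponds to wrapping the forward pass in `torch.autocast(device_type="cuda", dtype=torch.bfloat16)` and calling `loss.backward()` and `optimizer.step()` directly, without the `scaler.scale(...)`, `scaler.step(...)` and `scaler.update()` calls of `torch.cuda.amp.GradScaler`.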
Here is my CUDA/cuDNN information:
(llama) D:\codes\llm_about\self-llm>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
(llama) D:\codes\llm_about\self-llm>nvidia-smi
Sun May 26 00:17:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090      WDDM  |   00000000:01:00.0  On |                  N/A |
| 50%   38C    P8             19W /  350W |    1076MiB /  24576MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
Here is a sample of my dataset:
[{
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "南孚 7号 电池 16粒/盒",
"output": "电池"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "三星 SAMSUNG Galaxy Z Flip3 5G 8GB+256GB 折叠屏5G手机 立式交互体验 IPX8防水 梦境极光",
"output": "电池"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "达伯埃 Double A 复印纸 80g-A4一箱5包-2500张",
"output": "复印纸"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "苏宁超市自营",
"output": "复印纸"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "戴尔DELL 成就3710 商用办公电脑整机 23.8英寸",
"output": "商用办公电脑整机"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "华硕 灵耀14 2022 超轻薄商务办公笔记本电脑 14英寸",
"output": "超轻薄商务办公笔记本电脑"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "国产 扎壶 1500ml",
"output": "扎壶"
}, {
"instruction": "请针对以下商品名称生成商品简称,并直接给出答案",
"input": "白云 尘推布 60cm",
"output": "尘推布"
}]