Comments (5)
You can wrap part of the model structure in the torch_scope interface. Everything inside the torch scope is trained in fp32, as in the MoE example:
PatrickStar/examples/moe/moe_bert.py
Lines 53 to 64 in 0731c6e
Note, however, that if you only want to set a single layer to fp32, do_allreduce here should be set to True.
from patrickstar.
Nice! So this part is managed by torch, and PatrickStar (ps) does not need to be involved?
from patrickstar.
I believe so. torch_scope makes a temporary change to the config:
PatrickStar/patrickstar/core/preprocess.py
Lines 80 to 86 in d2a5e1d
After the Module's init, the parameters are registered as torch-managed, and the inputs and outputs are kept as float:
PatrickStar/patrickstar/core/preprocess.py
Lines 366 to 375 in d2a5e1d
from patrickstar.
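@Jack47 @liaojianjin
We are currently doing a full refactor of PatrickStar, so these features may change somewhat. For example, we may later reuse PyTorch autocast directly instead of implementing our own version of mixed-precision training. In that case, the problem raised in this issue of setting layernorm to fp32 might simply go away, and there would be no need to re-align precision after migration. So the interfaces currently exposed may be rather crude. Our sincere apologies...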
from patrickstar.
OK, sounds good.
from patrickstar.