Comments (5)

zhuzilin commented on June 12, 2024

You can wrap part of the model with the `torch_scope` interface; everything inside the torch scope will be trained in fp32. For example, in the MoE example:

# The MoE modules are mainly model parallel, so we need to use `torch_scope`
# to separate them from the other chunk-based data parallel modules.
# MoE modules also take care of their own communication, which is why
# we need to disable allreduce in the torch scope.
with torch_scope(do_allreduce=False):
    self.output = fmoe.FMoETransformerMLP(
        num_expert=2,
        world_size=get_world_size(),
        d_model=config.hidden_size,
        d_hidden=config.intermediate_size,
        gate=fmoe.gates.NaiveGate,
    )

Note, however, that if you only want to keep a single layer in fp32, `do_allreduce` here should be set to True.

from patrickstar.

Jack47 commented on June 12, 2024

Nice, so this means this part is managed by torch itself, without ps (PatrickStar) getting involved?

liaojianjin commented on June 12, 2024

That should be right. `torch_scope` just makes a temporary modification to the config:

@contextmanager  # from contextlib; needed since torch_scope is used as `with torch_scope(...)`
def torch_scope(do_allreduce=True):
    r"""All parameters initialized in this scope will not be managed in chunks."""
    _runtime_config.push()
    _runtime_config.config["use_chunk"] = False
    _runtime_config.config["do_allreduce"] = do_allreduce
    yield
    _runtime_config.pop()

Then, after the module's init, the parameters are registered as torch-managed, and the inputs/outputs are kept as float:
if not _runtime_config.use_chunk:
    for name, param in module.named_parameters(recurse=False):
        name = f"{module.__class__.__name__}.{name}_{self.param_idx}"
        register_param(param, ParamType.TORCH_BASED, torch.float, name)
        if _runtime_config.do_allreduce:
            self.client.torch_param_allreduce_list.append(param)
    # We need to cast the inputs to fp32 for the unmanaged modules.
    cast_forward(module, torch.float)
    return
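As a rough, torch-free sketch of what `cast_forward` is doing conceptually (the real helper works on torch modules and dtypes; this only illustrates the wrap-the-forward idea, with all names hypothetical):

```python
def cast_forward(module, cast):
    """Sketch: wrap `module.forward` so every positional input is converted
    with `cast` before the original forward runs. Illustration only, not
    patrickstar's actual implementation."""
    original = module.forward
    def wrapped(*inputs):
        return original(*(cast(x) for x in inputs))
    module.forward = wrapped

class Dummy:
    # Toy "module": forward just reports the type of its input.
    def forward(self, x):
        return type(x).__name__

m = Dummy()
cast_forward(m, float)  # Python float stands in for torch.float here
assert m.forward(3) == "float"  # an int input arrives as float
```

The real version additionally has to cast the outputs back, so that fp16 chunk-managed modules downstream still receive the dtype they expect.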

zhuzilin commented on June 12, 2024

@Jack47 @liaojianjin
We are currently doing a full refactor of PatrickStar... so these features may well change later. For example, we will probably reuse PyTorch's autocast directly instead of implementing our own mixed-precision training. In that case, the problem raised in this issue of keeping layernorm in fp32 should be solved automatically, and there would be no need to re-align precision after migrating. So the currently exposed interface may be rather crude, sorry...

Jack47 commented on June 12, 2024

Got it, 👍
