Comments (4)
Quick solution is to modify the following file:
<path-to-anaconda-environment>/lib/python3.8/site-packages/deepspeed/elasticity/elastic_agent.py
-
Comment out the following
from torch.distributed.elastic.agent.server.api import _get_socket_with_port
-
Then add the following which I just lifted from old version of pytorch
import socket
def _get_socket_with_port() -> socket.socket:
"""Return a free port on localhost.
The free port is "reserved" by binding a temporary socket on it.
Close the socket before passing the port to the entity that
requires it. Usage example::
sock = _get_socket_with_port()
with closing(sock):
port = sock.getsockname()[1]
sock.close()
# there is still a race-condition that some other process
# may grab this port before func() runs
func(port)
"""
addrs = socket.getaddrinfo(
host="localhost", port=None, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
)
for addr in addrs:
family, type, proto, _, _ = addr
s = socket.socket(family, type, proto)
try:
s.bind(("localhost", 0))
s.listen(0)
return s
except OSError as e:
s.close()
log.info("Socket creation attempt failed.", exc_info=e)
raise RuntimeError("Failed to create a socket")
from deepspeed.
Got same error when run unit tests on cross encoder in sentence transformer.
from deepspeed.
facing the same issue
from deepspeed.
Hi @fahadh4ilyas - can you test with the linked PR?
from deepspeed.
Related Issues (20)
- [BUG] File not found in autotuner cache in multi-node setting on SLURM HOT 1
- Inference with the MoE based GPT model trained by ds_pretrain_gpt_345M_MoE128.sh [BUG]
- RuntimeError: still have inflight params[BUG]
- Install issue with setuptools 70 HOT 2
- [BUG] oneapi/ccl.hpp: No such file or directory. HOT 1
- [BUG]模型卡在trainer.train()一直不训练
- [BUG] Running llama2-7b step3 with tensor parallel and HE fails due to incompatible shapes
- RuntimeError: Error building extension 'cpu_adam', because /usr/bin/ld: can not find -lcurand,help! HOT 1
- Fail to use zero_init to construct llama2 with deepspeed zero3 and bnb!
- does DeepSpeed support AMSP (a new DP shard strategy)
- [BUG] 'Invalidate trace cache' with Seq2SeqTrainer+predict_with_generate+Zero3
- AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops. HOT 2
- How to set different learning rates for different parameters of LLMs
- Getting parameters of embeddings (safe_get_local_fp32_param)and setting the weight of embeddings (safe_set_local_fp32_param) does not work (bug?).
- [BUG] DeepSpeed on pypi not compatible with latest `numpy` HOT 5
- [BUG] GPU memory leaking after deleting deepspeed engine
- [BUG] Using and Building DeepSpeedCPUAdam HOT 4
- Bug Report: Issues Building DeepSpeed on Windows HOT 4
- [BUG] Logs full of FutureWarning when training with nightly PyTorch HOT 1
- [BUG] inference ValueError
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from deepspeed.