Comments (4)
Hi Rachel,
thanks for reporting this. I'm not entirely sure what could be causing this issue.
How many cells does your `mdata` have? And do you have the same problem when you use a small example dataset (e.g. `scirpy.datasets.wu2020_3k()`)?
from scirpy.
My `mdata` has 71706 cells in the `airr` modality, and the IEDB database has 34509 TCRs.
I don't have the same problem with a small dataset.
I think something goes wrong at the end of the VJ distance matrix calculation: a thread fails to shut down, and the main process then blocks indefinitely in `Thread.join()`, waiting on that thread's lock. This traceback suggests as much:
```
File ~/mambaforge/envs/compbio/lib/python3.10/threading.py:1096, in Thread.join(self, timeout)
1093 raise RuntimeError("cannot join current thread")
1095 if timeout is None:
-> 1096 self._wait_for_tstate_lock()
1097 else:
1098 # the behavior of a negative timeout isn't documented, but
1099 # historically .join(timeout=x) for x<0 has acted as if timeout=0
1100 self._wait_for_tstate_lock(timeout=max(timeout, 0))
File ~/mambaforge/envs/compbio/lib/python3.10/threading.py:1116, in Thread._wait_for_tstate_lock(self, block, timeout)
1113 return
1115 try:
-> 1116 if lock.acquire(block, timeout):
1117 lock.release()
1118 self._stop()
KeyboardInterrupt:
```
Could this be a generic threading problem? See open-telemetry/opentelemetry-python#2284.
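The traceback shows the caller waiting inside `Thread.join()` with no timeout, which blocks until the thread exits, so a worker that never finishes hangs the caller forever. The following is a minimal, stdlib-only diagnostic sketch, not scirpy code. `stuck_worker`, `join_or_report`, and the timeouts are hypothetical. It shows how `join(timeout=...)` combined with `is_alive()` detects such a thread instead of hanging. A plain `t.join()` on the stuck thread would block just like in the traceback.

```python
import threading


def stuck_worker(stop: threading.Event) -> None:
    # Hypothetical worker that never finishes on its own:
    # it blocks until another thread sets the event.
    stop.wait()


def join_or_report(thread: threading.Thread, timeout: float) -> bool:
    """Join with a timeout and return True only if the thread has exited.

    join(timeout=...) does not raise when the timeout expires, so
    is_alive() is the only way to tell "finished" from "still running".
    """
    thread.join(timeout=timeout)
    return not thread.is_alive()


stop = threading.Event()
# daemon=True: a thread that is still stuck will not keep the interpreter alive at exit
t = threading.Thread(target=stuck_worker, args=(stop,), daemon=True)
t.start()

finished = join_or_report(t, timeout=0.2)
print("finished:", finished)  # False: the worker is still blocked

stop.set()  # release the worker so it can exit
finished_after_release = join_or_report(t, timeout=5.0)
print("finished after release:", finished_after_release)  # True
```

This pattern only makes a hang visible. It does not fix whatever leaves the worker stuck in the first place.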
You could try increasing the block size. That splits the work into fewer blocks, so fewer threads could get stuck :D The only downside is that the progress bar updates less often:
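```python
import scirpy as ir
# larger block_size -> fewer, larger tasks; `cutoff` and any other options as before
dist_calc = ir.ir_dist.metrics.AlignmentDistanceCalculator(cutoff, block_size=5000, n_jobs=32)
ir.pp.ir_dist(adata, metric=dist_calc)
```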
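The block-size trade-off can be sketched with the standard library alone. This is not scirpy's implementation. `split_into_blocks`, `process_block`, `run`, and the toy workload are hypothetical, and the blocks here are simple 1-D slices rather than however `AlignmentDistanceCalculator` actually tiles the distance matrix. The sketch shows that the number of dispatched tasks, and therefore the number of progress updates, is roughly `ceil(n / block_size)`:

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_into_blocks(items: list, block_size: int) -> list:
    """Split items into consecutive blocks of at most block_size elements."""
    return [items[i : i + block_size] for i in range(0, len(items), block_size)]


def process_block(block: list) -> int:
    # Hypothetical stand-in for computing one block of the distance matrix.
    return sum(x * x for x in block)


def run(items: list, block_size: int, n_jobs: int) -> tuple:
    """Process all blocks on n_jobs worker threads; return (result, n_updates)."""
    blocks = split_into_blocks(items, block_size)
    total, updates = 0, 0
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(process_block, b) for b in blocks]
        for fut in as_completed(futures):
            total += fut.result()
            updates += 1  # a progress bar can only advance once per finished block
    return total, updates


items = list(range(10_000))
for block_size in (50, 5000):
    total, updates = run(items, block_size, n_jobs=4)
    expected = math.ceil(len(items) / block_size)
    print(f"block_size={block_size}: {updates} tasks (expected {expected})")
```

The worker count (`n_jobs`) is the same in both runs. Only the number of tasks changes, and with it how finely the progress can be reported.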
Hi @racng,
I believe I found a solution to this in #473.
`joblib.Parallel` is reportedly more robust than the `multiprocessing` module from the standard library.
An added benefit is that you can also use the dask backend via `joblib.parallel_config` to distribute the workload across multiple machines. The version from that PR also includes a faster implementation of the alignment distance.
It would be great if you could give it a try! You can install it with
```shell
pip install git+https://github.com/icbi-lab/scirpy.git@ir-dist-parallelism
```