Comments (19)
from cheui.
Hi,
Just a follow-up: the preprocessing step can be run in parallel using the -n flag with the number of cores you want to use. However, the storage and compute requirements are relatively high, so I would recommend running CHEUI on a cloud or HPC system.
from cheui.
Hi,
I use 20 CPUs for the preprocessing step with the -n flag on a cluster, but it takes much longer than I expected.
from cheui.
Hi,
Sorry about the issue. We are working on making the preprocessing faster.
In the meantime, could you try a much larger value for -n, say -n 400? The -n flag also sets how many small files the input file is split into. The number of processes actually running in parallel is still limited by the number of CPUs you have, but because each small file finishes faster, the overall time may be reduced.
Thanks,
Akanksha
from cheui.
Hi
The nanopolish result file I generated is about 4.2 TB, and I use 20 CPUs for the preprocessing step, but it seems to use only a single core. It takes too long to wait. Any suggestions?
from cheui.
Hi,
Yes, I used the C++ version. It generates so many folders that I cannot open the output directory; it takes too much time to open.
Thanks a lot
from cheui.
Hi,
The preprocessing step first creates a new folder and generates some temp files. The number of temp files equals the number of CPUs you specify on the command line. This step uses only one CPU. After all the temp files are generated, the C++ program runs in parallel with multiple threads, and the temp files are removed at the end.
Upgrading your GCC compiler to a later version can speed things up. For the C++ version, I recommend setting -n CPU, --cpu CPU to your actual number of physical CPUs.
Hope this helps,
Eileen
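As a rough aid for choosing the value passed to -n, the sketch below reports how many CPUs the current process is allowed to use. Note the assumptions: it counts logical CPUs, which can exceed the number of physical cores when hyper-threading is enabled, and the function name usable_cpu_count is only illustrative and not part of CHEUI.

```python
import os


def usable_cpu_count():
    """Return the number of logical CPUs this process may run on."""
    try:
        # Linux only: honours the CPU affinity mask, which some cluster
        # schedulers set to the cores allocated to the job.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total logical CPU count.
        return os.cpu_count() or 1


n_cpus = usable_cpu_count()
print(f"Logical CPUs available to this job: {n_cpus}")
```

On a cluster, running this inside the job script gives an upper bound for -n; the physical core count may be lower.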
from cheui.
Hi there,
A faster preprocessing solution:
- Split the huge nanopolish file into smaller files.
- Run the preprocessing step on each smaller file.
- Combine all the preprocessing outputs into one file and run the next step.
Hope this helps,
Eileen
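The splitting step suggested above could be sketched as follows. This is not a CHEUI script. It assumes the nanopolish output is a tab-separated text file with a single header line and that the fourth column (index 3) identifies the read, so that chunks are only cut where that value changes. The function name split_by_read, the chunk file names, and the parameters are hypothetical.

```python
import os


def split_by_read(in_path, out_dir, lines_per_chunk, key_col=3):
    """Split a tab-separated file with a header into chunks of roughly
    lines_per_chunk rows, cutting only where the value in key_col changes."""
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    out = None
    rows_in_chunk = 0
    last_key = None
    with open(in_path) as src:
        header = src.readline()
        for line in src:
            key = line.split("\t", key_col + 1)[key_col]
            # Open a new chunk at the start, or at a read boundary once
            # the current chunk has reached its target size.
            if out is None or (rows_in_chunk >= lines_per_chunk and key != last_key):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"chunk_{len(out_paths):05d}.txt")
                out = open(path, "w")
                out.write(header)  # every chunk keeps the original header
                out_paths.append(path)
                rows_in_chunk = 0
            out.write(line)
            rows_in_chunk += 1
            last_key = key
    if out is not None:
        out.close()
    return out_paths
```

Each resulting chunk can then be preprocessed on its own. Chunks may be somewhat larger than lines_per_chunk because a read is never cut in two.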
from cheui.
Hi,
Great! I will try it.
Thanks a lot
Bai
from cheui.
Hi Eileen,
I followed your suggestions, but I am unfamiliar with the preprocessing output file because it is a binary file. Could you please provide a script for combining all the preprocessing outputs into one file?
Thanks a lot,
Bai
from cheui.
Hi Bai,
To combine the split files, please run combine_binary_file.py from the scripts folder as below:
python3 ../scripts/combine_binary_file.py -i <folder with split binary files> -o <combined output file name>
Here you need to provide the path of the folder containing all the split files and the name of the output file.
Thanks,
Akanksha
from cheui.
Hi Akanksha,
Big thanks for your help.
I still have a question. The split preprocessed files and the combined file have the same file format, so why is the combined file smaller than the split files?
Thanks a lot,
Bai
from cheui.
Hi Bai,
Yes, we noticed that as well in our test case for the script. It could be because it is a binary file. The number of processed signals in the combined file is equal to the sum of the processed signals in the individual files, so it should be fine for the next step. However, I would recommend not deleting the individual split files until you have the final results.
Thanks,
Akanksha
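The check described above (the combined file holds as many processed signals as the split files together) could be sketched like this. It rests on an assumption that is not confirmed in this thread: that each preprocessing output is a sequence of pickled dictionaries written one after another. The helper names are hypothetical, and pickle files should only be loaded if you produced them yourself.

```python
import pickle


def iter_pickled_objects(path):
    """Yield every object stored back to back in a pickle file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # reached the end of the file


def count_entries(path):
    """Total number of entries over all dictionaries stored in the file."""
    return sum(len(obj) for obj in iter_pickled_objects(path))


def counts_match(split_paths, combined_path):
    """True if the combined file has as many entries as the split files."""
    return sum(count_entries(p) for p in split_paths) == count_entries(combined_path)
```

Matching counts show only that no entries were dropped; they do not show that the values were carried over, so a size difference is still worth investigating.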
from cheui.
Hi Akanksha,
The combined file does not seem to work in the next step. It throws the following error:
2023-05-16 15:09:28.362177: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-05-16 15:09:29.863226: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-05-16 15:09:29.863463: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-05-16 15:09:29.863479: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2023-05-16 15:09:29.863506: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (retox): /proc/driver/nvidia/version does not exist
2023-05-16 15:09:29.863697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-16 15:09:29.867528: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
dictionary update sequence element #0 has length 152; 2 is required
All signals have been processed 1
It worked with my individual split files.
Thanks a lot
Bai
from cheui.
Hi Bai,
Sorry, there was a bug in the code. The code was only combining the keys and not the values.
I have updated it now. Could you please give it a try now?
Thanks,
Akanksha
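For readers wondering how a file containing only keys can lead to the message reported earlier in this thread, the sketch below shows one way Python produces it. It is an illustration only, not the code of combine_binary_file.py; the 152-character key and the values are made up.

```python
# Hypothetical entry: a 152-character ID mapped to some signal values.
split_part = {"k" * 152: [0.1, 0.2, 0.3]}

# Iterating over a dict yields only its keys, so the values are lost here.
keys_only = list(split_part)

combined = {}
try:
    # dict.update expects key/value pairs; a bare 152-character string is
    # treated as a sequence of length 152 instead of a pair of length 2.
    combined.update(keys_only)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Merging the dictionary itself keeps both keys and values.
combined.update(split_part)
```

In CPython the caught message reads "dictionary update sequence element #0 has length 152; 2 is required", the same wording as in the log above.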
from cheui.
Hi Akanksha,
Yes, it works, but it consumes too much memory.
Thanks,
Bai
from cheui.
Hi Bai,
Sorry, could you please try the latest updated version of the script? It should solve the memory issue.
Thanks,
Akanksha
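One general way to keep memory low when combining many files is to copy one stored object at a time instead of loading everything first. The sketch below illustrates that idea under the same unconfirmed assumption that each split file is a sequence of pickled dictionaries. It is not the updated combine_binary_file.py, and the function name, the "*.p" pattern, and the parameters are hypothetical.

```python
import glob
import os
import pickle


def combine_streaming(in_dir, out_path, pattern="*.p"):
    """Append every pickled object found in in_dir to out_path, one object
    at a time, and return the total number of dictionary entries copied."""
    n_entries = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, "wb") as out:
        for path in sorted(glob.glob(os.path.join(in_dir, pattern))):
            if os.path.abspath(path) == out_abs:
                continue  # never read the output file back into itself
            with open(path, "rb") as src:
                while True:
                    try:
                        obj = pickle.load(src)
                    except EOFError:
                        break  # end of this split file
                    # Only one object is held in memory at any time.
                    pickle.dump(obj, out, protocol=pickle.HIGHEST_PROTOCOL)
                    n_entries += len(obj)
    return n_entries
```

Peak memory then depends on the largest single stored object rather than on the total size of all split files.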
from cheui.
Hi Akanksha,
Many thanks, the memory issue is solved.
Thanks,
Bai
from cheui.