Comments (14)

one-matrix avatar one-matrix commented on July 28, 2024 1

@pgrosu I very much agree with you. There are currently too many genomic data formats, such as SAM/BAM/VCF/BCF, and a lot of time goes into data preprocessing. TFRecords, as an intermediate data storage format, make it convenient to develop artificial intelligence models later. Thank you for your contributions.

from nucleus.

pgrosu avatar pgrosu commented on July 28, 2024 1

Hi @Tharindu-Nirmal,

It's possible, but complex to install and configure properly in the current CoLab environment. Google CoLab now runs Ubuntu 22.04 with Python 3.10.12, whereas Nucleus 0.6.0 was built on Ubuntu 20.04 with Python 3.8. A new version of Nucleus would need to be built with that Ubuntu/Python environment in mind. The easier alternative is to install Docker within CoLab and then pull the Nucleus image, using the CoLab commands shown below:

!apt-get update
!apt-get install ca-certificates curl
!install -m 0755 -d /etc/apt/keyrings
!curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
!chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
!echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
!apt-get update

!apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
!dockerd -b none --iptables=0 -l warn &

!docker pull yijun/nucleus:py3

I have not tried this specific image, and I think it is an older Dockerized version of Nucleus, but it at least gives you something to try. If you find a newer one, feel free to post it here so others can try it.

Hope it helps,
Paul

danielecook avatar danielecook commented on July 28, 2024

@one-matrix which version of pip are you using?

pip --version

one-matrix avatar one-matrix commented on July 28, 2024

@danielecook It is pip 23.1.2. I tried Python 3.7, but it did not work. I don't know how to solve it.

pgrosu avatar pgrosu commented on July 28, 2024

@one-matrix I think you need Python 3.8 based on the following:

Classifier: Programming Language :: Python :: 3.8
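
A quick way to confirm which interpreter a notebook cell is actually using is to compare it against that classifier. This is only a minimal sketch: the `REQUIRED` tuple and the `matches_required` helper are hypothetical names for illustration, and the 3.8 target is taken from the classifier line above.

```python
import sys

# Assumption: google-nucleus 0.6.0 targets Python 3.8, per the classifier above.
REQUIRED = (3, 8)


def matches_required(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == required


if not matches_required():
    # Report the mismatch instead of failing later during the wheel build.
    print(
        f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
        f"the classifier lists {REQUIRED[0]}.{REQUIRED[1]}."
    )
```

If the check reports a mismatch, the install is likely to be attempted against an interpreter the package was not built for.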

one-matrix avatar one-matrix commented on July 28, 2024

Snipaste_2023-05-11_13-44-39
Snipaste_2023-05-11_13-46-05
I used Python 3.8 locally and got the same error. You can also reproduce it in Colab, as in the screenshots.

pgrosu avatar pgrosu commented on July 28, 2024

@one-matrix I see you're trying to install google-nucleus 0.5.6, but now the current version is 0.6.0:

https://pypi.org/project/google-nucleus/

Also, the error you're seeing says that there are newer versions of tensorflow than the one you specified, which you can pick from: 2.8.0rc0, 2.8.0rc1, 2.8.0, etc.

Just to isolate the errors, I would first try to pip install version 0.6.0 of google-nucleus and then see what error you get next.

one-matrix avatar one-matrix commented on July 28, 2024

@pgrosu Hi,
Maybe you can open the link below and try it out with "pip install google-nucleus". I want to fix this error:
"ERROR: Could not build wheels for google-nucleus, which is required to install pyproject.toml-based projects"
Version 0.6.0 has the same error as 0.5.6. The link below is the official notebook.
https://colab.research.google.com/github/google/nucleus/blob/master/nucleus/examples/dna_sequencing_error_correction.ipynb

pgrosu avatar pgrosu commented on July 28, 2024

@one-matrix If you replace it with the following it should work:

!pip download google-nucleus 
!tar xzf google_nucleus-0.6.0.tar.gz
%cd google_nucleus-0.6.0/
%rm -rf /usr/local/lib/python3.10/dist-packages/google_nucleus.egg-info
!python setup.py clean
!python setup.py install 
!pip install -q tensorflow==2.9.1

In the Python code you would need to also add the following after the import numpy as np:

import collections.abc
collections.MutableMapping = collections.abc.MutableMapping
collections.MutableSequence = collections.abc.MutableSequence
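
The same shim can be written so that it only patches what is missing, which keeps it harmless on older interpreters where the aliases still exist. This is a minimal sketch, not part of Nucleus itself. It relies on the fact that Python 3.10 removed the ABC aliases from the top-level `collections` module.

```python
import collections
import collections.abc

# Python 3.10 removed these aliases from `collections`; restore them only
# when they are absent so that older interpreters are left untouched.
for name in ("MutableMapping", "MutableSequence"):
    if not hasattr(collections, name):
        setattr(collections, name, getattr(collections.abc, name))
```

The shim has to run before any module that still refers to `collections.MutableMapping` is imported.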

But there are other issues with the code. A bit swamped now, but will try to look at it when I get a bit of free time -- though happy if others jump in as well.

~p

pgrosu avatar pgrosu commented on July 28, 2024

Hi @one-matrix,

So here's the simplest way I was able to get it to run.

  1. Replace the whole !pip install ... with the following code and execute it:
!wget https://bootstrap.pypa.io/get-pip.py
!python3.8 get-pip.py
!python3.8 -m pip install --upgrade pip
!python3.8 -m pip download google-nucleus 
!tar xzf google_nucleus-0.6.0.tar.gz
%cd google_nucleus-0.6.0/
!python3.8 setup.py clean
!python3.8 setup.py install 
!python3.8 -m pip install -r ./nucleus/pip_package/egg_files/requires.txt
!python3.8 -m pip install -q tensorflow==2.8.0
!python3.8 -m pip install protobuf==3.20.1

You will see as shown below that it installs Google Nucleus successfully:

image

  2. Now run the code. Since the install above targets Python 3.8, wrap each section by adding a couple of extra lines at its beginning and end so that it runs under python3.8:
import subprocess

nucleus_code = """

  ...YOUR NUCLEUS CODE...

"""

nucleus_result = subprocess.run(['python3.8'], input=nucleus_code, capture_output=True, encoding='UTF-8')
print(nucleus_result.stdout)

Basically leave the code as is and add the top and bottom part. Below is a screenshot showing that it ran successfully:

image
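
The wrapper above can also be packaged as a small helper so that every section reports its exit code and standard error, not only standard output. This is a minimal sketch: `run_under` and `report` are hypothetical helper names, and it assumes a `python3.8` executable is on the PATH, as set up in step 1.

```python
import subprocess
import sys


def run_under(code, interpreter="python3.8"):
    """Feed `code` to `interpreter` on stdin and return the CompletedProcess."""
    return subprocess.run(
        [interpreter], input=code, capture_output=True, encoding="UTF-8"
    )


def report(result):
    """Print stdout, and print stderr plus the exit code when the run failed."""
    print(result.stdout, end="")
    if result.returncode != 0:
        print(f"exit code {result.returncode}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)


# Usage, with the section's code held in a string named nucleus_code:
# report(run_under(nucleus_code))
```

Checking `returncode` is useful here because a failing section otherwise just prints an empty stdout.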

Hope it makes sense, and feel free to let me know if you have any other questions.

Hope it helps,
Paul

pgrosu avatar pgrosu commented on July 28, 2024

@one-matrix One more small thing: you will need to move the tensorflow imports above the nucleus imports to avoid an error. Below is the code:

import subprocess

nucleus_code = """

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import random

import numpy as np

# Import TensorFlow before Nucleus (moved up from its original position below).
import tensorflow as tf
from tensorflow.keras import layers

from nucleus.io import fasta
from nucleus.io import sam
from nucleus.io import vcf
from nucleus.io.genomics_writer import TFRecordWriter
from nucleus.protos import reads_pb2
from nucleus.util import cigar
from nucleus.util import ranges
from nucleus.util import utils

# Original position of the TensorFlow imports (now commented out):
#import tensorflow as tf
#from tensorflow.keras import layers

"""

nucleus_result = subprocess.run(['python3.8'], input=nucleus_code, capture_output=True, encoding='UTF-8')
print(nucleus_result.stderr)

Below is the standard error output showing a clean run:

image

one-matrix avatar one-matrix commented on July 28, 2024

@pgrosu Thank you very much, Paul. It is a little strange to use, but it works.
Nucleus is important and useful. What is the plan for Nucleus in the future? I notice that it has not been updated for a long time. It has many similarities with pysam: https://github.com/pysam-developers/pysam

pgrosu avatar pgrosu commented on July 28, 2024

@one-matrix Well, with one difference from pysam: Nucleus can create TensorFlow records of your data, opening the world of machine learning to genomics. Basically, think of different sets of genomic variations as collections of varying language dialects. You can then transform those into "dialect" models, such as "disease" dialects in a clinical setting. You can even ask a larger question: given a library of books written in different dialects, what might have been the original book that started it all? That would be your consensus dialect. These models can also help you fill in missing data. But that's only the beginning, as there are so many ways to go from there. Regarding the roadmap for Nucleus, that would be something the Google folks would know more about than me, as I'm just dropping by at times, helping out here and there.

Tharindu-Nirmal avatar Tharindu-Nirmal commented on July 28, 2024

run_error
@pgrosu I followed your advice and tried to run the official notebook from Google. However, trying with Python 3.8 gave distutils errors, so I continued with Python 3.10 (the Colab default). The run() call in the final cell then throws an error:
RuntimeError: PythonNext() argument read is not valid: Dynamic cast failed

Hope someone could help me run the notebook.
