
bpnet's People

Contributors

albertjstanley, an1lam, avsecz, mlweilert


bpnet's Issues

Computation of hypothetical contributions

I would like to ask about the computation of the "hypothetical" contribution scores (which TF-MoDISco requires).

I understand from the paper and the code (particularly seqmodel._contrib_deeplift_fn) that the contribution is computed using DeepLIFT as implemented in the DeepExplain package, with an all-zeros sequence as the baseline.

Then, judging from the code in the class ContribFile in contrib.py, specifically functions get_contrib and get_hyp_contrib, the actual and hypothetical contributions are computed as follows:

  1. actual contribution = the DeepLIFT result * the one-hot input (so we zero everything except the signal for the base that was actually present)
  2. hypothetical contribution = the DeepLIFT result (averaged over strands?)

Is this correct?

As I understand it, this works because of this modification to DeepExplain: kundajelab/DeepExplain@2348bc8
(by default, DeepExplain already returns signal * (input - baseline), which makes recovering the hypothetical contributions impossible when a zero baseline is used).
Is this correct?
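To make sure I'm stating this precisely, here is a minimal numpy sketch of what I believe steps 1 and 2 amount to (variable names are mine, not bpnet's):

import numpy as np

seqlen = 5
deeplift_scores = np.random.randn(seqlen, 4)           # raw DeepLIFT output w.r.t. a zero baseline
one_hot = np.eye(4)[np.random.randint(0, 4, seqlen)]   # the actual input sequence

hyp_contrib = deeplift_scores            # step 2: hypothetical contribution
contrib = deeplift_scores * one_hot      # step 1: zero out all non-observed bases
per_base = contrib.sum(axis=-1)          # one actual score per position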

I am sorry if all of this is obvious, but I can't find this information in the paper, and I would like to do a similar analysis using Deeplift (from DeepExplain) + Modisco, so I want to make sure my understanding is correct.

Best,
Alex

`pip install bpnet` is broken due to `pprint` requirement

Running pip install bpnet currently fails with the following error:

ERROR: Could not find a version that satisfies the requirement pprint (from bpnet) (from versions: none)
ERROR: No matching distribution found for pprint (from bpnet)

This is likely related to the issue described in this SO post: including pprint in a requirements list, as this repository does in setup.py, used to be a harmless mistake but has now become a breaking one.
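If I understand the cause correctly, the fix is simply to drop pprint from install_requires, since pprint ships with the Python standard library. A sketch of what I mean (not the repository's actual setup.py):

from setuptools import setup, find_packages

setup(
    name="bpnet",
    packages=find_packages(),
    install_requires=[
        # "pprint",  # removed: a stdlib module, not a PyPI package
        # ... the remaining real dependencies stay as they are
    ],
)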

BPNet fails with wrong cudnn version, but tensorflow doesn't.

Hoo boy. Got a rough one.
I'm trying to run BPNet on chemical mapping data, and it gets to epoch one before it crashes. It doesn't even crash cleanly. There's a segfault, and control returns to the terminal, but several bpnet processes continue to exist though they don't seem to be doing anything. A killall bpnet is necessary to stop it. Logs are attached, with tensorflow complaining about driver versions.
But it's worse than just the wrong driver version, because the simple TensorFlow tutorial succeeds: included in the zip is a testTensorflow.py file that executes correctly. This leads me to think the problem is not a TensorFlow configuration issue, but rather an insidious bug in BPNet itself.
problemRun.zip

Good luck! Let me know if you need me to test anything.

BED file with strands

Hi,

I am trying to run BPNet with ChIP-seq data that is stranded. I have peaks specified in a BED file with a column indicating whether each peak belongs to the positive or negative strand. Is there a way to train BPNet specifically on stranded data? The BED file used in the BPNet tutorial does not specify the strand for each peak.

Thanks!

what is the expected input and targets for BPnet profile prediction

Hi,
first of all, thanks for the great method. I am trying to reimplement the BPNet tool using most of the code from this repository, while adjusting a few things for my needs. In my case I want only the profile predictor, and I wish to know the exact expected inputs for BPNet so I can implement a data generator using keras.utils.Sequence. To be precise, for every sample I guess the x will be a one-hot encoded sequence, but what is the target (y)? Is it the bigwig signal for the corresponding one-hot sequence? A minimal sketch of what I have in mind follows.
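For concreteness, here is the generator I have in mind (the shapes are my assumptions about what BPNet expects, which is exactly what I'd like confirmed):

import numpy as np
from keras.utils import Sequence

class ProfileBatchGenerator(Sequence):
    # x: one-hot DNA, shape (batch, seqlen, 4)
    # y: base-resolution bigwig coverage over the same windows,
    #    shape (batch, seqlen, n_strands), e.g. n_strands=2 for pos/neg tracks
    def __init__(self, onehot_seqs, profiles, batch_size=32):
        self.x, self.y = onehot_seqs, profiles
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.x[sl], self.y[sl]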

Thanks

input BED file needs to be BED3, not BED6

The TsvReader breaks when the dataspec.yaml file is provided with BED files that contain 6 columns because the naming convention does not match the expectations of that object. When converted to a BED3, the TsvReader works fine. Thoughts?
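For anyone hitting this in the meantime, a trivial workaround sketch (my own, not part of bpnet): truncate the BED6 file to its first three columns before pointing the dataspec at it.

import pandas as pd

bed6 = pd.read_csv("peaks.bed", sep="\t", header=None)
bed6.iloc[:, :3].to_csv("peaks.bed3", sep="\t", header=False, index=False)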

current pysam version installs incorrect libssl

In the current conda-env.yml, the pinned pysam version pulls the wrong libssl from conda. This causes an error during training indicating that libcrypto0.1.0 does not exist. The issue can be fixed by pinning a later version; pysam==0.15.3 seems to do the trick.

bpnet installation problem

Hi,
I was trying to follow the end-to-end bpnet pipeline on google colab and had several installation problems I was hoping you could shed light on.
ERROR: Cannot install bpnet==0.0.1, bpnet==0.0.10, bpnet==0.0.11, bpnet==0.0.12, bpnet==0.0.13, bpnet==0.0.14, bpnet==0.0.15, bpnet==0.0.16, bpnet==0.0.17, bpnet==0.0.18, bpnet==0.0.19, bpnet==0.0.2, bpnet==0.0.20, bpnet==0.0.21, bpnet==0.0.22, bpnet==0.0.23, bpnet==0.0.3, bpnet==0.0.4, bpnet==0.0.5, bpnet==0.0.6, bpnet==0.0.7, bpnet==0.0.8 and bpnet==0.0.9 because these package versions have conflicting dependencies.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.5.0 requires h5py~=3.1.0, but you have h5py 3.3.0 which is incompatible.

Upon trying to install bpnet via pip, I receive this list of errors:

The conflict is caused by:
bpnet 0.0.23 depends on pprint
bpnet 0.0.22 depends on pprint
bpnet 0.0.21 depends on pprint
bpnet 0.0.20 depends on pprint
bpnet 0.0.19 depends on pprint
bpnet 0.0.18 depends on pprint
bpnet 0.0.17 depends on pprint
bpnet 0.0.16 depends on pprint
bpnet 0.0.15 depends on pprint
bpnet 0.0.14 depends on pprint
bpnet 0.0.13 depends on pprint
bpnet 0.0.12 depends on pprint
bpnet 0.0.11 depends on pprint
bpnet 0.0.10 depends on pprint
bpnet 0.0.9 depends on pprint
bpnet 0.0.8 depends on pprint
bpnet 0.0.7 depends on pprint
bpnet 0.0.6 depends on pprint
bpnet 0.0.5 depends on pprint
bpnet 0.0.4 depends on pprint
bpnet 0.0.3 depends on pprint
bpnet 0.0.2 depends on pprint
bpnet 0.0.1 depends on pprint.

I found an issue where it said that the pprint requirement was removed from the setup file, but it seems that it's still causing trouble.

Thanks!

DeepLIFT contribution scores for one-hot encoded nucleotides

Given that ACGT are represented by one-hot encoding [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1] respectively, each element in the encoding is assigned a DeepLIFT score (each nucleotide gets 4 scores). For example, given input ACGT represented by [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1] and hypothetical contribution scores [0.1, 0.07, -0.3, 0.01, 0.2, 0.09, -0.01, 0.8, 0.07, 0.02, 0.4, -0.1, -0.025, -1.0, 0.4, 0.35], how does BPNet assign one contribution score for each nucleotide?

Are the scores averaged, such that the DeepLIFT contribution scores for ACGT are [0.1 + 0.07 - 0.3 + 0.01]/4, [0.2 + 0.09 - 0.01 + 0.8]/4, [0.07 + 0.02 + 0.4 - 0.1]/4, and [-0.025 - 1.0 + 0.4 + 0.35]/4? By looking at kundajelab/deeplift#106 (comment), it looks like the sum method is used in the DeepLIFT paper, but it's not very clear to me whether that was the case for BPNet. Thanks.
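To make the candidates concrete, here is my own numpy illustration of the example above (not bpnet code):

import numpy as np

scores = np.array([[ 0.1,   0.07, -0.3,   0.01],   # position 1 (A observed)
                   [ 0.2,   0.09, -0.01,  0.8 ],   # position 2 (C observed)
                   [ 0.07,  0.02,  0.4,  -0.1 ],   # position 3 (G observed)
                   [-0.025, -1.0,  0.4,   0.35]])  # position 4 (T observed)
one_hot = np.eye(4)  # the input is ACGT

averaged = scores.mean(axis=-1)              # the averaging I describe above
summed = scores.sum(axis=-1)                 # the sum method from the deeplift thread
observed = (scores * one_hot).sum(axis=-1)   # keep only the observed base's score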

examples file snakefile

rule chip_seq:
    input:
        [os.path.normpath(os.path.join('data', f))
         for f in DataSpec.load('chip-nexus/dataspec.yml').list_all_files(include_peaks=True)]

should be:

rule chip_seq:
    input:
        [os.path.normpath(os.path.join('data', f))
         for f in DataSpec.load('chip-seq/dataspec.yml').list_all_files(include_peaks=True)]

Empty batch error loading training data into memory

Hi there,

I am currently trying to train BPNet on some human ChIP-seq data we have for an RNA Pol III transcription factor. We have multiple cell lines that I would like to use as tasks for the model. I guess it's important to note that Pol III has very few targets in the human genome (and so does this TF we are working with, most of which overlap with Pol III targets), which mostly include tRNA genes, 5S rRNA and a few others. In total we get around 400 peaks in our stem cell model (similar to what others have seen), and we see great resolution and enrichment at expected sites, so we are sure the data is good.

I have set up BPNet in a conda environment and am running it from a notebook using that environment as the kernel. I am able to train on the test chip-seq data packaged with BPNet using this setup. However, when I try to use our data I get the following error:

0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/drew/anaconda3/envs/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/train.py", line 698, in bpnet_train
    gpu=gpu)
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/train.py", line 410, in train
    num_workers=num_workers))
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/kipoi/data.py", line 408, in load_all
    return numpy_collate_concat([x for x in tqdm(self.batch_iter(batch_size, **kwargs))])
  File "/home/drew/anaconda3/envs/bpnet/lib/python3.6/site-packages/kipoi_utils/data_utils.py", line 21, in numpy_collate_fn
    if type(batch[0]).__module__ == 'numpy':
IndexError: list index out of range

I manually edited that data_utils.py script from kipoi to print the batch variable. Turns out it's an empty list, hence the IndexError, while with the BPNet test data it is correctly filled with one-hot sequence data and count data. I have checked the input data and can't find anything obviously wrong; it all looks quite similar to the BPNet data. I generated the stranded bigwig files as per the FAQ section of the example notebook, and the summit files come from macs2.
Any help would be much appreciated!

Drew

Bpnet Added Channels Leads to Profile Predictions of 0

Hi,

I am running BPNet with 3 added input channels (in addition to the one-hot input). These channels are the following bigwig files: NT2 RNA-seq rep 1, NT2 RNA-seq rep 2, and NT2 Methyl-C-seq. In addition to adding these input channels, the only other change I made is batchnorming these channels (code provided below). The three added channels are not sparse.

After running BPNet with these changes, the training results are completely normal, showing a drop in both training and validation loss. But when I then extract the model from the generated seqmodel.pkl file in the result directory and check its predictions on particular inputs (selected from the training set), I notice that all of BPNet's profile predictions contain 0s in their output, as shown in one example output below:


{'rep1/profile': array([[[4.6234442e-41, 1.6620867e-31],
         [1.2753405e-38, 1.9987589e-30],
         [1.3018063e-41, 2.2856440e-29],
         ...,

        [[0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         ...,
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00],
         [0.0000000e+00, 0.0000000e+00]]], dtype=float32),

This doesn't make sense: a predicted probability of 0 on an observed base should make the profile loss blow up to infinity, yet during training the profile loss stayed finite and showed general improvement. Is there a reason that those three added input channels, from that particular experiment that generated them, caused this issue?

I've checked whether there is any divide-by-zero or NaN involved during batchnorm, and found no such occurrence. I've tested BPNet without batchnorming its inputs, and I still see the same thing. I've also tested with other added channels (from files not produced by the experiment that generated the three troublesome files), and BPNet gives expected predictions after training.

Here is the added batch norm code:

    # Normalize one channel of a minibatch
    # (np_row has shape (batch, seqlen, channels); col selects the channel)
    def normalize_column(self, np_row, col):
        row = np_row[:, :, col]
        mean = np.mean(row, axis=0)  # per-position mean across the batch
        var = np.var(row, axis=0)    # per-position variance across the batch
        row = np.subtract(row, mean)
        return np.divide(row, np.sqrt(var) + 1e-6)

    # Normalize the configured channel range of a minibatch
    def normalize(self, data):
        # Turn the sequence input into a numpy array
        np_row = np.array(data["seq"])

        # Normalize each specified channel in place
        for i in range(self.batchnorm_begin, self.batchnorm_end + 1):
            data["seq"][:, :, i] = self.normalize_column(np_row, i)

        return data

Run BPnet with one input bw track

Hi,

I am trying to use bpnet on ATAC-seq data, where there is one track for each task.
When I run bpnet chip-nexus-analysis, I get the following error:

0%|          | 0/11 [00:00<?, ?it/s]/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py:130: UserWarning: region counts not present. Returning the default contribution scores
  warnings.warn("region counts not present. Returning the default contribution scores")
100%|██████████| 11/11 [00:35<00:00,  3.20s/it]
  0%|          | 0/11 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 905, in chip_nexus_analysis
    footprint_width=footprint_width)
  File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 722, in modisco_table
    df = modisco_table(data)
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in modisco_table
    for pattern in tqdm(data.mf.pattern_names())])
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in <listcomp>
    for pattern in tqdm(data.mf.pattern_names())])
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in pattern_features
    for res in pattern_task_features(pattern, task, data)] +
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in <listcomp>
    for res in pattern_task_features(pattern, task, data)] +
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 278, in pattern_task_features
    ("footprint", task_footprint(pattern, task, data)),
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 322, in task_footprint
    ("strandcor", correlate(agg_profile_norm[:, 0], agg_profile_norm[::-1, 1]).max()),
IndexError: index 1 is out of bounds for axis 1 with size 1

I think this is because there is only one track in agg_profile_norm. To get the program to run, I changed line 322 in bpnet/modisco/table.py to ("strandcor", correlate(agg_profile_norm[:], agg_profile_norm[::-1]).max()). I know this changes the "strandcor" value, so I am wondering whether changes like this are okay, and how much they will affect motif discovery and the subsequent CWM scan step. Or is there another way to run the program with only one bw track?
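For reference, here is the guarded version I'm effectively proposing, as a standalone sketch (assuming correlate is scipy.signal.correlate, and with a toy array standing in for the real agg_profile_norm):

import numpy as np
from scipy.signal import correlate

agg_profile_norm = np.random.rand(200, 1)  # toy stand-in with shape (width, n_tracks)

if agg_profile_norm.shape[1] > 1:
    # two strands: correlate the forward track with the reversed second track
    strandcor = correlate(agg_profile_norm[:, 0], agg_profile_norm[::-1, 1]).max()
else:
    # single track: correlate the track with its own reversal instead
    strandcor = correlate(agg_profile_norm[:, 0], agg_profile_norm[::-1, 0]).max()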

Thanks in advance for your help!

Best,
Yan

Error building Leidenalg wheel in Colab environment

Hello,

I am a graduate student from the Luca/Pique-Regi lab and I'm trying to install leidenalg to ultimately run BPNet. I am using a Google Colab environment with Python 3.6. Please see below:

!sudo apt-get install build-essential autoconf automake flex bison
!sudo update-alternatives --config python3
!sudo apt install python3-pip
!python -m pip install --upgrade pip
!apt-get install -y bedtools > /dev/null
!pip install pprint -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
!pip install git+https://github.com/kundajelab/DeepExplain.git --quiet
!pip install -U "cloudpickle<1.7.0" h5py tqdm --quiet
!pip install -U pyyaml --quiet
!pip install igraph
!pip install leidenalg --no-use-pep517
!export DISABLE_BCOLZ_AVX2=true
!pip install bpnet --quiet --quiet

I get the following error: ERROR: Failed building wheel for leidenalg. Do you know what might cause this? I do not think I will be able to use a different version of Python due to errors with other dependencies. Thank you!

bpnet contrib function error

Hi,

When running "bpnet contrib", there is a Type error:

DeepExplain: running "deeplift" explanation method (5)
Model with multiple inputs: True
DeepExplain: running "deeplift" explanation method (5)
Model with multiple inputs: True
DeepExplain: running "deeplift" explanation method (5)
Model with multiple inputs: True
DeepExplain: running "deeplift" explanation method (5)
Model with multiple inputs: True
0%| | 0/165.0 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/data15/guang/anaconda3/bin/bpnet", line 33, in <module>
    sys.exit(load_entry_point('bpnet==0.0.23', 'console_scripts', 'bpnet')())
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/bpnet-0.0.23-py3.7.egg/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/bpnet-0.0.23-py3.7.egg/bpnet/cli/contrib.py", line 241, in bpnet_contrib
    batch_size=None)  # don't second-batch
  File "/data15/guang/anaconda3/lib/python3.7/site-packages/bpnet-0.0.23-py3.7.egg/bpnet/seqmodel.py", line 244, in contrib_score
    return fn(input_to_list(input_names, x))[0]
TypeError: 'list' object is not callable

It seems "fn" here is already a list. Could you please help with this? Any help would be much appreciated!

ERROR: No matching distribution found for deepexplain (Installation of BPnet)

Hi,

I created a new conda environment and tried to install bpnet following the instructions you provided, as well as the solutions from previous issues, but I ran into the following error when running:

pip install git+https://github.com/kundajelab/bpnet.git

the error is:

ERROR: Could not find a version that satisfies the requirement deepexplain (from bpnet) (from versions: none)
ERROR: No matching distribution found for deepexplain

Could you please help me to solve this issue, thank you very much!
btw, I have tried different versions of DeepExplain:

Installing collected packages: deepexplain
  Attempting uninstall: deepexplain
    Found existing installation: deepexplain 0.1
    Uninstalling deepexplain-0.1:
      Successfully uninstalled deepexplain-0.1
  Running setup.py develop for deepexplain
Successfully installed deepexplain-0.3

Best regards,
Min

examples/recommendations for ATAC/DNase-seq data

Dear developers,

This message comes from a very confused graduate student. Briefly, I'm trying to adopt the bpnet framework for my ATAC and DNase-seq data (to avoid eyeballing motif scan results). However, I'm not sure what the recommended practice is for doing so. From Anshul's early response in this issue (#18 (comment)), it seems that the basepairmodels repo should contain such info; however, the tutorial offered in that repo is based on ChIP-seq. I found another repo called kerasAC, which seems to be developed for ATAC/DNase-seq data, as implied by the "AC" in its name, but I had some issues getting it to work, and the examples offered are not as detailed as in this repo. In summary, I'm not entirely sure which is the "right" repo to use for ATAC/DNase-based analyses, and whether there is a guideline for parameters. I would highly appreciate it if you could point me in the right direction!

Thank you so much,
Changxu Fan

preds.neg.bw and preds.pos.bw

Dear developers,

I want to predict all the potential CTCF-binding sites in the human genome. The input is ChIP-seq data from one cell line.

Using "bpnet export-bw", I got two bw files: preds.neg.bw and preds.pos.bw.

Using "bpnet cwm-scan", I got one file, motif-instances.tsv.

I'm not sure which of these files is the one I need. I would highly appreciate it if you could point me in the right direction!

Thank you so much,

kangli zhu

pprint dependency prevents conda env create

Hi,

I get the error below when trying to create the conda env for bpnet; I get a similar error using either method in the git README. I can install the dependencies manually and then install bpnet with pip --no-deps, but will I be missing something?

> conda env create -f conda-env.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - pprint

export-bw when training a model without negs errors out.

I trained a model on some unstranded data but got the following error during bw export:

line 408, in export_bw
    add_entry(bws[task]['preds.neg'], preds[:, 1],
IndexError: index 1 is out of bounds for axis 1 with size 1

Looking at the line it's:

add_entry(bws[task]['preds.neg'], preds[:, 1],

which exports the negative preds. Since my data don't have negative preds, my guess is that axis is empty. To test whether this was the issue, I commented out Line 408 and Line 335:

output_feats = ['preds.pos', 'preds.neg', 'contrib.profile', 'contrib.counts']

I replaced line 335 with a list lacking the 'preds.neg' entry, rebuilt, and that works fine.

So it seems there needs to be a check somewhere to determine whether the model has negative-strand tracks or not; something like the sketch below.
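This is my own sketch of the kind of guard I mean (the function and names are hypothetical, not bpnet's API):

import numpy as np

def export_strand_tracks(preds, write):
    # only emit a neg-strand track when the prediction array
    # actually has a second channel
    write('preds.pos', preds[:, 0])
    if preds.shape[1] > 1:
        write('preds.neg', preds[:, 1])

# unstranded model: preds has a single channel, so no neg track is written
export_strand_tracks(np.random.rand(100, 1),
                     lambda name, arr: print(name, arr.shape))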

Datasets not loading correctly unless `--in-memory` specified.

When training BPNet, if you (1) do not specify --in-memory and (2) pass a --config input that differs from the bpnet9 premade config file, the data does not load properly and the process freezes right before training the model. Everything loads correctly up to that point; CPU and GPU usage also collapse.

AttributeError: 'str' object has no attribute 'decode' error in bpnet contrib

Hello, I've been having some issues trying to get bpnet to run on an aws ec2 instance.

After installing and setting up bpnet using the conda-env.yml file, I started following the google colab tutorial using the example chip-nexus data.

I ran bpnet using the following command:

bpnet train /home/ec2-user/bpnet/examples/chip-nexus/dataspec.yml --premade=bpnet9 --config=/home/ec2-user/bpnet/examples/chip-nexus/config.gin . --override='train.epochs=10' --in-memory

This worked fine so I attempted to calculate contributions scores with the following command:

bpnet contrib /home/ec2-user/bpnet_output_dir/ --method=deeplift --contrib-wildcard='*/profile/wn' /home/ec2-user/bpnet_output_dir/contrib.deeplift.h5 --overwrite

However, I got the following error:

Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/contrib.py", line 241, in bpnet_contrib
    batch_size=None)  # don't second-batch
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/seqmodel.py", line 225, in contrib_score
    fn = self._contrib_deeplift_fn(x, name, preact_only=preact_only)
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/seqmodel.py", line 145, in _contrib_deeplift_fn
    self.model = load_model(temp.name)
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/ec2-user/miniconda3/envs/bpnet/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

To troubleshoot, I tried adding print statements in a few of the traceback files but I wasn't able to figure out what the initial cause of the error was.
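One guess, which is untested and so purely an assumption on my part: with h5py >= 3, HDF5 string attributes come back as str rather than bytes, so the .decode('utf-8') call in keras fails. A tolerant version of the failing parse would look like:

import json

def parse_model_config(raw):
    # h5py < 3 returns bytes, h5py >= 3 already returns str
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    return json.loads(raw)

If that is indeed the cause, pinning h5py below 3 (pip install 'h5py<3') might also sidestep it.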

Any help or guidance would be greatly appreciated!!

Thank you!

running bpnet cwm-scan modisco

Dear developers,

I want to discover motifs with TF-MoDISco. When I run "bpnet modisco-run contrib.scores.h5 --premade=modisco --override='TfModiscoWorkflow.max_seqlets_per_metacluster=20000' modisco/", I got a warning message

H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  grp = h5py.File(output_path)

and an error message

[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:  1.7min
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:  2.5min finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.9s
[Parallel(n_jobs=10)]: Done 340 tasks      | elapsed:    6.2s
[Parallel(n_jobs=10)]: Done 840 tasks      | elapsed:   14.9s
[Parallel(n_jobs=10)]: Done 892 out of 911 | elapsed:   15.9s remaining:    0.3s
[Parallel(n_jobs=10)]: Done 911 out of 911 | elapsed:   16.1s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   30.9s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   45.7s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.3s
[Parallel(n_jobs=10)]: Done 307 out of 307 | elapsed:    1.9s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   14.6s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   22.1s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.2s
[Parallel(n_jobs=10)]: Done 114 out of 114 | elapsed:    0.4s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.8s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   18.0s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.2s
[Parallel(n_jobs=10)]: Done  92 out of 111 | elapsed:    0.3s remaining:    0.1s
[Parallel(n_jobs=10)]: Done 111 out of 111 | elapsed:    0.4s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.8s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   18.0s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  40 tasks      | elapsed:    0.1s
[Parallel(n_jobs=10)]: Done  54 out of  73 | elapsed:    0.2s remaining:    0.1s
[Parallel(n_jobs=10)]: Done  73 out of  73 | elapsed:    0.2s finished
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.6s
[Parallel(n_jobs=10)]: Done  50 out of  50 | elapsed:   17.5s finished
/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py:112: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  grp = h5py.File(output_path)

Executing:   0%|          | 0/13 [00:00<?, ?cell/s]
Executing:   8%|▊         | 1/13 [00:08<01:40,  8.40s/cell]
Executing:  31%|███       | 4/13 [00:38<01:28,  9.82s/cell]
Executing:  54%|█████▍    | 7/13 [00:40<00:29,  4.88s/cell]
Executing:  69%|██████▉   | 9/13 [00:44<00:15,  3.82s/cell]
Executing:  69%|██████▉   | 9/13 [00:45<00:20,  5.00s/cell]
Traceback (most recent call last):
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 342, in bpnet_modisco_run
    null_per_pos_scores=null_per_pos_scores)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 129, in modisco_run
    modisco_dir=os.path.dirname(output_path)))
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/utils.py", line 51, in render_ipynb
    parameters=params
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 122, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/home/zhenyingLab/zhukangli/.conda/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [6]":
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-fd816d05060b> in <module>
----> 1 mf.plot_all_patterns(trim_frac=0, letter_width=0.14, height=0.5, ylim=[0, 2], no_axis=True)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in plot_all_patterns(self, kind, trim_frac, n_min_seqlets, ylim, no_axis, **kwargs)
    525                               kind=kind,
    526                               trim_frac=trim_frac,
--> 527                               **kwargs)
    528             if ylim is not None:
    529                 plt.ylim(ylim)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in plot_pattern(self, pattern_name, kind, rc, trim_frac, letter_width, height, rotate_y, ylab)
    494                      rotate_y=0,
    495                      ylab=True):
--> 496         pattern = self.get_pattern(pattern_name)
    497         pattern = pattern.trim_seq_ic(trim_frac)
    498         ns = self.n_seqlets(pattern_name)

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/files.py in get_pattern(self, pattern_name)
    126         from bpnet.modisco.core import Pattern
    127         # TODO - add number of seqlets?
--> 128         return Pattern.from_hdf5_grp(self._get_pattern_grp(*pattern_name.split("/")), pattern_name)
    129 
    130     def metaclusters(self):

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/core.py in from_hdf5_grp(cls, grp, name)
    170                 return Pattern(name,
    171                                seq=grp['sequence']['fwd'][:],
--> 172                                contrib={t: grp[t][grp_1][contrib_name]['fwd'][:] for t in tasks},
    173                                hyp_contrib={t: grp[t][grp_1][hyp_contrib_name]['fwd'][:] for t in tasks})
    174 

~/.conda/envs/bpnet/lib/python3.6/site-packages/bpnet/modisco/core.py in <dictcomp>(.0)
    170                 return Pattern(name,
    171                                seq=grp['sequence']['fwd'][:],
--> 172                                contrib={t: grp[t][grp_1][contrib_name]['fwd'][:] for t in tasks},
    173                                hyp_contrib={t: grp[t][grp_1][hyp_contrib_name]['fwd'][:] for t in tasks})
    174 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/.conda/envs/bpnet/lib/python3.6/site-packages/h5py/_hl/group.py in __getitem__(self, name)
    262                 raise ValueError("Invalid HDF5 object reference")
    263         else:
--> 264             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    265 
    266         otype = h5i.get_type(oid)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'profile' doesn't exist)"

  In call to configurable 'modisco_run' (<function modisco_run at 0x149578342f28>)

Any help or guidance would be greatly appreciated!

Thank you!

kangli zhu

Double-checking: full training data included in chip-nexus example?

Hi,

I was hoping to run an experiment in which I want to train the bpnet model on the full set of chip-nexus data. I know in the example notebook it limits to a subset of the chromosomes used, so I just want to verify that if I remove this line:

exclude_chr=["chrX","chrY","chr5","chr6","chr7","chr10","chr14","chr11","chr13","chr12","chr15"]

from the config, I'll be using the full dataset.

Related to this, I want to try something related to the sequence region on which the CRISPR experiment was done. This means I intend to use chromosome 10 as my hold-out chromosome. Just wanted to double-check that this should be fine.

Normalizing by SeqLen in Profile Loss

Dear authors,

First off, amazing work - I really enjoyed your paper!

I have a question regarding profile loss normalization by sequence length. In the implementation of multinomial_nll in losses.py, the sum-reduced profile loss is normalized by seqlen; however, seqlen is defined as seqlen = tf.to_float(tf.shape(true_counts)[0]). Wouldn't this normalize the loss by the batch size, since the shape of true_counts is (batch, seqlen, channels)?
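A tiny shape check of what I mean:

import numpy as np

true_counts = np.zeros((32, 1000, 2))  # (batch, seqlen, channels)
print(true_counts.shape[0])  # 32   -> the batch size
print(true_counts.shape[1])  # 1000 -> the actual seqlen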

Thanks in advance for clarifying!

Cannot import name `AsyncKernelManager` in bpnet training step of Colab notebook

When I originally ran the Colab notebook, I got an error that prevented the evaluation step of training from completing successfully:

Error: ImportError: cannot import name 'AsyncKernelManager' from 'jupyter_client'

I resolved this by adding !pip install jupyter-client==6.1.2 to the end of the pip install cell, as suggested here, but this is admittedly a bit of a hack.

That said, I figured creating an issue so that others could find the solution easily was better than just fixing it silently.

Error when traning ATAC-seq with one track

Hi,
When training on ATAC-seq data without specifying positive and negative tracks, there is an IndexError:

File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/train.py", line 469, in train
    num_workers=num_workers))
  File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/utils.py", line 51, in render_ipynb
    parameters=params
  File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 122, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/storage/chen/home/zz4/anaconda3/envs/bpnet/lib/python3.6/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [26]":
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-26-36ba7ef40acb> in <module>
     30         # Predicted
     31         (f"\nPred", to_neg(pred[f'{task}/profile'][0] * np.exp(pred[f'{task}/counts'][0]))),
---> 32     ] for task_idx, task in enumerate(tasks)])
     33 
     34     sl = slice(*xlim)

<ipython-input-26-36ba7ef40acb> in <listcomp>(.0)
     30         # Predicted
     31         (f"\nPred", to_neg(pred[f'{task}/profile'][0] * np.exp(pred[f'{task}/counts'][0]))),
---> 32     ] for task_idx, task in enumerate(tasks)])
     33 
     34     sl = slice(*xlim)

<ipython-input-25-040d466e1211> in to_neg(track)
      3     """
      4     track = track.copy()
----> 5     track[:, 1] = - track[:, 1]
      6     return track

IndexError: index 1 is out of bounds for axis 1 with size 1

  In call to configurable 'train' (<function train at 0x7f93dcb01b70>)

Could you please provide an example of how to run BPNet on unstranded data?
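In the meantime, would a guard like the following in the notebook's to_neg helper be a reasonable workaround (my own sketch, flipping the second channel only when it exists)?

import numpy as np

def to_neg(track):
    track = track.copy()
    if track.shape[1] > 1:  # only negate a neg-strand channel if present
        track[:, 1] = -track[:, 1]
    return track

to_neg(np.random.rand(1000, 1))  # single-track input no longer raises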

Viewing the hypothetical impscores

I'm interested in viewing the hyp_impscores of the BPNet outputs, as seen in line 5, row 2 of this document.

First, I'm able to view the standard contribution scores using the dictionary produced via:
viz_dict, seq, imp_scores = interval_predict(bpnet, ds, interval, tasks, smooth_obs_n=0, neg_rev=False, incl_pred=True)

E.g., via viz_dict['Nanog Imp profile'], which I understand to represent the absolute amount of the contribution score for task {task=Nanog} at the motif instance position; i.e., these are the "contribution of each base within the input sequence to the entire predicted ChIP-nexus profile of the TF {task=Nanog} output by DeepLIFT".

As I understand, these are derived by taking the dot product of hyp_impscores with the given one-hot encoding. Is it true that imp_scores['Nanog/profile/wn'][0] (for this example) stores the hyp_impscores?
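In other words, my reading as a sketch (array names and shapes are my assumptions):

import numpy as np

seqlen = 10
hyp_impscores = np.random.randn(seqlen, 4)             # what I believe imp_scores['Nanog/profile/wn'][0] holds
one_hot = np.eye(4)[np.random.randint(0, 4, seqlen)]   # the input sequence

# projecting the hypothetical scores onto the observed bases
imp_profile = (hyp_impscores * one_hot).sum(axis=-1)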

Installation error, likely crashing due to dependency DeepExplain

Hi,

I have been trying to install bpnet but I am unable to do that. I tried both pip and conda env create. With pip install bpnet I get the following error:

ERROR: Cannot install bpnet==0.0.1, bpnet==0.0.10, bpnet==0.0.11, bpnet==0.0.12, bpnet==0.0.13, bpnet==0.0.14, bpnet==0.0.15, bpnet==0.0.16, bpnet==0.0.17, bpnet==0.0.18, bpnet==0.0.19, bpnet==0.0.2, bpnet==0.0.20, bpnet==0.0.21, bpnet==0.0.22, bpnet==0.0.23, bpnet==0.0.3, bpnet==0.0.4, bpnet==0.0.5, bpnet==0.0.6, bpnet==0.0.7, bpnet==0.0.8 and bpnet==0.0.9 because these package versions have conflicting dependencies.

The conflict is caused by:
    bpnet 0.0.23 depends on deepexplain
    bpnet 0.0.22 depends on deepexplain
    bpnet 0.0.21 depends on deepexplain
    bpnet 0.0.20 depends on deepexplain
    bpnet 0.0.19 depends on deepexplain
    bpnet 0.0.18 depends on deepexplain
    bpnet 0.0.17 depends on deepexplain
    bpnet 0.0.16 depends on deepexplain
    bpnet 0.0.15 depends on deepexplain
    bpnet 0.0.14 depends on deepexplain
    bpnet 0.0.13 depends on deepexplain
    bpnet 0.0.12 depends on deepexplain
    bpnet 0.0.11 depends on deepexplain
    bpnet 0.0.10 depends on deepexplain
    bpnet 0.0.9 depends on deepexplain
    bpnet 0.0.8 depends on deepexplain
    bpnet 0.0.7 depends on deepexplain
    bpnet 0.0.6 depends on deepexplain
    bpnet 0.0.5 depends on deepexplain
    bpnet 0.0.4 depends on deepexplain
    bpnet 0.0.3 depends on deepexplain
    bpnet 0.0.2 depends on deepexplain
    bpnet 0.0.1 depends on deepexplain

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

With conda env create -f conda-env.yml , I get the following error:

Pip subprocess error:
  Running command git clone -q https://github.com/kundajelab/DeepExplain.git /tmp/pip-req-build-fnit2dgn
  ERROR: Command errored out with exit status 1:
   command: /root/anaconda3/envs/bpnet/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/abardet/bpnet/setup.py'"'"'; __file__='"'"'/data/abardet/bpnet/setup.py'"'"'; f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"'); code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"'); f.close(); exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-tbox0he9
     cwd: /data/bpnet/
   Complete output (6 lines):
   Traceback (most recent call last):
     File "<string>", line 1, in <module>
     File "/data/bpnet/setup.py", line 8
       <!DOCTYPE html>
       ^

Note: I tried the yml file in the repo as well as the yml file in the comment below (#23 (comment)); the error above is for the latter case.

Type error in modisco cwm-scan

On line 550 of cli/modisco.py (commit db8908), there's a check on the output file name's suffix. I get an error that PosixPath doesn't have an .endswith method. Can you confirm that output_file is the expected type here?
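For what it's worth, a minimal illustration of the mismatch and the obvious workaround (my own sketch):

from pathlib import Path

output_file = Path("motif-instances.tsv.gz")
# output_file.endswith(".tsv.gz")       # AttributeError: 'PosixPath' object has no attribute 'endswith'
str(output_file).endswith(".tsv.gz")    # True: cast to str first (or check output_file.suffixes)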

Cannot make documentation

(I'm using the bpnet conda environment, git commit de99967)
In bpnet/docs,

make build                             
mkdir -p theme_dir/img/ipynb/
./render_ipynb.bash
pydocmd build
make: pydocmd: Command not found
make: *** [Makefile:13: build] Error 127

I then installed pydoc-markdown (pip install pydoc-markdown), and now I get (home directory replaced with ~):

make build                                           
mkdir -p theme_dir/img/ipynb/
./render_ipynb.bash
pydocmd build
~/anaconda3/install/envs/bpnet/lib/python3.6/site-packages/pydocmd/__main__.py:47: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(fp)
Traceback (most recent call last):
  File "~/anaconda3/install/envs/bpnet/bin/pydocmd", line 10, in <module>
    sys.exit(main())
  File "~/anaconda3/install/envs/bpnet/lib/python3.6/site-packages/pydocmd/__main__.py", line 177, in main
    preproc = import_object(config['preprocessor'])(config)
  File "~/anaconda3/install/envs/bpnet/lib/python3.6/site-packages/pydocmd/imp.py", line 44,  in import_object
    return import_object_with_scope(name)[0]
  File "~/anaconda3/install/envs/bpnet/lib/python3.6/site-packages/pydocmd/imp.py", line 73, in import_object_with_scope
    obj = scope = import_module(current_name)
  File "~/anaconda3/install/envs/bpnet/lib/python3.6/site-packages/pydocmd/imp.py", line 36, in import_module
    return __import__(name, fromlist=[''])
ModuleNotFoundError: No module named 'pydocmd.preprocessor.DataLoaderYamlPreprocessor'; 'pydocmd.preprocessor' is not a package
make: *** [Makefile:13: build] Error 1

OverflowError: can't convert negative value to CHRPOS

Dear developers,

Thanks for developing the package!

When I run "bpnet train --premade=bpnet9 dataspec.yml --override='seq_width=200;n_dil_layers=6' .", I got the error message below:

[screenshot: OverflowError: can't convert negative value to CHRPOS]

I wonder whether there is something wrong with my bw files. I used bowtie2 to map my ChIP-seq data to the reference genome and "samtools view -q 30" to remove reads with poor mapping quality. Then I used picard MarkDuplicates to remove redundant reads and MergeSamFiles to merge the bam files from different replicates. Finally, I converted the bam files to bw files as you describe in this notebook: https://colab.research.google.com/drive/1VNsNBfugPJfJ02LBgvPwj-gPK0L_djsD#scrollTo=J3Ck7qGJ1tSJ

installation problem

We're having a problem with permissions during installation. I was wondering if the code is being blocked for download right now ("failed" message on README page). If so, do you know when it will be up? If not, we can continue to see if the issue is on our side.

contrib.counts vs contrib.profile

Dear developers,

Thanks for developing the package! A quick question: bpnet export-bw generates 2 types of bw files, with suffixes contrib.counts and contrib.profile, respectively. I was wondering what the suffixes mean and what's the difference between them?

Thanks!!

"No object to concatenate" error

Hi,

I'm trying to test bpnet on C. elegans ChIP-seq data from ENCODE. I made bw files from the ENCODE bam files. I then made the following modifications to bpnet9 (because of the C. elegans genome):

exclude_chr=["chrM"]
valid_chr = ['chrI']
test_chr = ['chrII', 'chrIII', 'chrIV',
            'chrX', 'chrV']

I made the following yaml file:

fasta_file: /Users/termivac/Documents/10xThreeSpecies/ERC_prep/ce11/ce11.fa  # reference genome fasta file
task_specs:  # specifies multiple tasks (e.g. Oct4, Sox2 Nanog)

  CEH83:  # CEH83 is the task name
    tracks:
      - /Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_test.pos.bw
      - /Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_test.neg.bw

bias_specs:  # specifies multiple bias tracks
  input:  # first bias track
    tracks:  # can specify multiple tracks
      - /Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_input.pos.bw
      - /Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_input.neg.bw
    tasks:  # applies to Oct4, Sox2, Nanog tasks
      - CEH83

I'm running bpnet using the following command:

bpnet train --premade=bpnet9 --vmtouch CEH83.yml CEH83_output

I'm getting what looks like a parsing error:

Using TensorFlow backend.
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2021-03-20 15:51:24,395 [WARNING] From /Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2021-03-20 15:51:25,029 [INFO] Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-03-20 15:51:25,029 [INFO] NumExpr defaulting to 8 threads.
/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/plot/heatmaps.py:6: MatplotlibDeprecationWarning:
The mpl_toolkits.axes_grid1.colorbar module was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use matplotlib.colorbar instead.
  from mpl_toolkits.axes_grid1.colorbar import colorbar
/Users/termivac/Documents/10xThreeSpecies/ERC_prep/ce11/ce11.fa
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 24974/24974

           Files: 1
     Directories: 0
   Touched Pages: 24974 (97M)
         Elapsed: 0.036978 seconds
/Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_test.pos.bw
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 27285/27285

           Files: 1
     Directories: 0
   Touched Pages: 27285 (106M)
         Elapsed: 0.046646 seconds
/Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_test.neg.bw
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 27285/27285

           Files: 1
     Directories: 0
   Touched Pages: 27285 (106M)
         Elapsed: 0.048614 seconds
/Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_input.pos.bw
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 3533/3533

           Files: 1
     Directories: 0
   Touched Pages: 3533 (13M)
         Elapsed: 0.006081 seconds
/Users/termivac/Documents/10xThreeSpecies/ERC_prep/tf_tracks/ceh-83_input.neg.bw
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 3533/3533

           Files: 1
     Directories: 0
   Touched Pages: 3533 (13M)
         Elapsed: 0.006723 seconds
INFO [03-20 15:51:26] Using gpu: 0, memory fraction: 0.45
2021-03-20 15:51:26.372802: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO [03-20 15:51:26] Using the following premade configuration: bpnet9
TF-MoDISco is using the TensorFlow backend.
Traceback (most recent call last):
  File "/Users/termivac/anaconda3/envs/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/cli/train.py", line 697, in bpnet_train
    gpu=gpu)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1009, in gin_wrapper
    new_kwargs = copy.deepcopy(new_kwargs)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 381, in __deepcopy__
    return self._scoped_configurable_fn()
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/datasets.py", line 477, in bpnet_data
    interval_transformer=interval_transformer),
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/bpnet/datasets.py", line 278, in __init__
    for task, task_spec in self.ds.task_specs.items()
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
    sort=sort,
  File "/Users/termivac/anaconda3/envs/bpnet/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
  In call to configurable 'StrandedProfile' (<class 'bpnet.datasets.StrandedProfile'>)
  In call to configurable 'bpnet_data' (<function bpnet_data at 0x7fe77139e488>)

I'd appreciate any help in getting the software to work.

Best,

Eyal

Notebook not working

Hi, when running the first cell (install dependencies) of the tutorial notebook I encountered the following error message:

ERROR: Cannot install bpnet==0.0.1, bpnet==0.0.10, bpnet==0.0.11, bpnet==0.0.12, bpnet==0.0.13, bpnet==0.0.14, bpnet==0.0.15, bpnet==0.0.16, bpnet==0.0.17, bpnet==0.0.18, bpnet==0.0.19, bpnet==0.0.2, bpnet==0.0.20, bpnet==0.0.21, bpnet==0.0.22, bpnet==0.0.23, bpnet==0.0.3, bpnet==0.0.4, bpnet==0.0.5, bpnet==0.0.6, bpnet==0.0.7, bpnet==0.0.8 and bpnet==0.0.9 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires tornado~=5.1.0, but you have tornado 6.2 which is incompatible.

and

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires tornado~=5.1.0, but you have tornado 6.2 which is incompatible.
flask 1.1.4 requires Jinja2<3.0,>=2.10.1, but you have jinja2 3.1.2 which is incompatible.

I would really appreciate it if you could look into this!
