emanuelgoncalves / crispy



Python module to analyse CRISPR-based libraries

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 99.71%, R 0.29%

Topics: python, crispr, sklearn, gaussian

crispy's Introduction

Hi, I'm Emanuel 👋

crispy's People

Contributors

Stargazers

Watchers

Forkers

hugosimasalmeida zhaoxiangsimoncai jasenme

crispy's Issues

It does not work on Windows

Hi, EmanuelGoncalves:
I cannot set up the package on Windows, and I don't know why.
Could you tell me?
Thank you very much

/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data/crispr_manifests//project_score_all_qc_failed_samples.csv does not exist

Dear Dr. Gonçalves,

I am trying to run the following second test in the README.md file:

```python
from crispy.CRISPRData import Library

# Master Library, standardised assembly of KosukeYusa V1.1, Avana, Brunello and TKOv3
# CRISPR-Cas9 libraries.
master_lib = Library.load_library("MasterLib_v1.csv.gz")

# Genome-wide minimal CRISPR-Cas9 library.
minimal_lib = Library.load_library("MinLibCas9.csv.gz")

# Some of the most broadly adopted CRISPR-Cas9 libraries:
# 'Avana_v1.csv.gz', 'Brunello_v1.csv.gz', 'GeCKO_v2.csv.gz', 'Manjunath_Wu_v1.csv.gz',
# 'TKOv3.csv.gz', 'Yusa_v1.1.csv.gz'
brunello_lib = Library.load_library("Brunello_v1.csv.gz")

from crispy.GuideSelection import GuideSelection

# sgRNA selection class
gselection = GuideSelection()

# Select 5 optimal sgRNAs for MCL1 across multiple libraries
gene_guides = gselection.select_sgrnas(
    "MCL1", n_guides=5, offtarget=[1, 0], jacks_thres=1, ruleset2_thres=.4
)

# Perform different rounds of sgRNA selection with increasingly relaxed efficiency thresholds
gene_guides = gselection.selection_rounds("TRIM49", n_guides=5, do_amber_round=True, do_red_round=True)

import crispy as cy
import matplotlib.pyplot as plt

# Import data
rawcounts, copynumber = cy.Utils.get_example_data()

# Import CRISPR-Cas9 library
lib = cy.Utils.get_crispr_lib()

# Instantiate Crispy
crispy = cy.Crispy(
    raw_counts=rawcounts, copy_number=copynumber, library=lib
)

# Fold-changes and correction integrated function.
# Output is a modified/expanded BED-formatted data-frame with sgRNA and segment information
bed_df = crispy.correct(x_features='ratio', y_feature='fold_change')
print(bed_df.head())

# Gaussian Process Regression is stored
crispy.gpr.plot(x_feature='ratio', y_feature='fold_change')
plt.show()
```
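The traceback in the `copy_x_train` issue further down shows the steps inside `Crispy.correct`: segments with at least `n_sgrna` sgRNAs are used to fit a Gaussian Process of `y_feature` on `x_features`, its mean is predicted as `gp_mean`, and the fold-change is corrected by subtracting that mean. The following is a minimal NumPy sketch of that idea, assuming a zero-mean GP with a fixed RBF kernel and hand-picked hyperparameters. `gp_correct` and its arguments are hypothetical; crispy's `CrispyGaussian` builds on a scikit-learn estimator (its `fit` calls `super().fit`) and is not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_correct(seg_x, seg_y, seg_n_sgrna, x_pred, y_pred,
               n_sgrna=10, length_scale=1.0, variance=1.0, noise=1e-4):
    """Fit a zero-mean GP on well-supported segments, then subtract its mean.

    seg_x, seg_y   : per-segment copy-number feature (e.g. ratio) and fold-change
    seg_n_sgrna    : number of sgRNAs in each segment
    x_pred, y_pred : per-sgRNA feature and fold-change to correct
    Hyperparameters are fixed (hypothetical values), not optimised.
    """
    seg_x = np.asarray(seg_x, dtype=float)
    seg_y = np.asarray(seg_y, dtype=float)
    keep = np.asarray(seg_n_sgrna) >= n_sgrna  # train only on segments with enough sgRNAs
    x_tr, y_tr = seg_x[keep], seg_y[keep]

    # GP posterior mean: K(x*, X) @ (K(X, X) + noise * I)^-1 @ y
    k_tr = rbf_kernel(x_tr, x_tr, length_scale, variance) + noise * np.eye(len(x_tr))
    alpha = np.linalg.solve(k_tr, y_tr)
    gp_mean = rbf_kernel(np.asarray(x_pred, dtype=float), x_tr, length_scale, variance) @ alpha

    # Corrected fold-change = observed fold-change minus the copy-number trend
    return np.asarray(y_pred, dtype=float) - gp_mean, gp_mean
```

For example, `corrected, gp_mean = gp_correct(seg_ratio, seg_fc, seg_counts, sgrna_ratio, sgrna_fc)` removes a smooth copy-number-dependent trend from the sgRNA fold-changes. A fixed kernel keeps the sketch short; a real analysis would fit the hyperparameters to the data.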

When I run it:

```
python -m pdb test.py
> /mnt/efs/home/stb3865/software/crispy/test/test.py(2)<module>()
-> from crispy.CRISPRData import Library
(Pdb) n
/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.metrics.ranking module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
FileNotFoundError: [Errno 2] File /mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data/crispr_manifests//project_score_all_qc_failed_samples.csv does not exist: '/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data/crispr_manifests//project_score_all_qc_failed_samples.csv'
> /mnt/efs/home/stb3865/software/crispy/test/test.py(2)<module>()
-> from crispy.CRISPRData import Library
```

It says "FileNotFoundError: [Errno 2] File /mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data/crispr_manifests//project_score_all_qc_failed_samples.csv does not exist".

Could you please upload the missing files to GitHub?

Thanks!
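The pdb session shows the error is raised on the import line itself, so one way to check whether an installed package lacks data files is to locate its install directory without importing it and test the expected paths. This is a minimal sketch; `missing_package_files` is a hypothetical helper, and the relative paths to pass for crispy are taken from the error messages in these issues, not from any documented manifest.

```python
import importlib.util
from pathlib import Path
from typing import List


def missing_package_files(package: str, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that do not exist inside an installed package.

    For a top-level package, find_spec only locates it and does not run its
    code, so data files read at import time are not touched.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    root = Path(list(spec.submodule_search_locations)[0])  # package directory
    return [p for p in relative_paths if not (root / p).exists()]
```

For this report, `missing_package_files("crispy", ["data/crispr_manifests/project_score_all_qc_failed_samples.csv", "data/example_rawcounts.csv"])` would list whichever of those files the installed copy does not contain.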

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported

I could run your sample inputs. However, even though our in-house datasets have exactly the same format for count_table, copy_number_table and library_table, I still get this error message:

```
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

> /mnt/efs/home/stb3865/bin/run_crispy.py(36)main()
-> sgrna_fc=sgrna_fc.mean(1), copy_number=copynumber, library=lib.loc[sgrna_fc.index]
```

I would appreciate any help. Thank you!
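Since pandas 1.0, `.loc` with a list of labels raises this `KeyError` when any label is absent, so the failure in `lib.loc[sgrna_fc.index]` suggests that some sgRNA IDs in the fold-change table are not in the library index (for example, because of a different ID convention or stray whitespace). A minimal sketch for finding and dropping them, assuming both tables are indexed by sgRNA ID; `align_library` is a hypothetical helper, not part of crispy.

```python
from typing import List, Tuple

import pandas as pd


def align_library(lib: pd.DataFrame, sgrna_fc: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, List[str]]:
    """Restrict the library and fold-change tables to their shared sgRNA IDs."""
    shared = sgrna_fc.index.intersection(lib.index)          # IDs present in both tables
    missing = sgrna_fc.index.difference(lib.index).tolist()  # IDs the library lacks
    return lib.loc[shared], sgrna_fc.loc[shared], missing
```

Inspecting `missing` first shows whether the IDs differ systematically; if only a few are absent, `fc_aligned.mean(1)` and `lib_aligned` can be passed in place of `sgrna_fc.mean(1)` and `lib.loc[sgrna_fc.index]`.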

AttributeError: 'CrispyGaussian' object has no attribute 'copy_x_train'

Hello! I'm trying to use crispy to perform copy-number (CNV) correction on my CRISPR/Cas9 knockout screen data, but first I wanted to run the example script to get a feel for the package and check that everything works. When I reach this step:

```python
bed_df = crispy.correct(n_sgrna=10)
print(bed_df.head())
```

I get this error:

```
AttributeError Traceback (most recent call last)
/home/antonio_gt/TFM/BAGEL/Intento_2/PatuT_control/day7/bagel_script_PatuT_control_day7.ipynb Cell 17 line 1
----> 1 bed_df_ex = crispy_ex.correct(n_sgrna=10)
2 print(bed_df_ex.head())

File ~/anaconda3/envs/mamba/envs/crispy/lib/python3.9/site-packages/crispy/CopyNumberCorrection.py:94, in Crispy.correct(self, x_features, y_feature, n_sgrna)
92 # - Fit Gaussian Process on segment fold-changes
93 self.gpr = CrispyGaussian(bed_df, n_sgrna=n_sgrna)
---> 94 self.gpr = self.gpr.fit(x=x_features, y=y_feature)
96 bed_df["gp_mean"] = self.gpr.predict(bed_df[x_features])
98 # - Correct fold-change by subtracting the estimated mean

File ~/anaconda3/envs/mamba/envs/crispy/lib/python3.9/site-packages/crispy/CopyNumberCorrection.py:270, in CrispyGaussian.fit(self, x, y, train_idx)
267 x = self.bed_seg.query(f"sgRNA_ID >= {self.n_sgrna}")[x]
268 y = self.bed_seg.query(f"sgRNA_ID >= {self.n_sgrna}")[y]
--> 270 return super().fit(x, y)

File ~/anaconda3/envs/mamba/envs/crispy/lib/python3.9/site-packages/sklearn/base.py:1145, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1140 partial_fit_and_fitted = (
1141 fit_method.__name__ == "partial_fit" and _is_fitted(estimator)
1142 )
1144 if not global_skip_validation and not partial_fit_and_fitted:
-> 1145 estimator._validate_params()
1147 with config_context(
...
--> 195 value = getattr(self, key)
196 if deep and hasattr(value, "get_params") and not isinstance(value, type):
197 deep_items = value.get_params().items()

AttributeError: 'CrispyGaussian' object has no attribute 'copy_x_train'
```

As I understand it, the CrispyGaussian class seems to lack a 'copy_x_train' attribute. I'm still fairly new to Python, so I don't know how to fix this. Could I get some assistance? I can provide more information or data if needed.
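The traceback shows `fit` now running `_validate_params`, which ends in scikit-learn's `get_params` (the `getattr(self, key)` line). By scikit-learn convention, every argument named in an estimator's `__init__` signature is read back as an attribute of the same name, so a subclass that accepts an argument under one name but stores it under another fails in exactly this way. The toy classes below reproduce that failure mode without scikit-learn; they are hypothetical stand-ins, and it is an assumption, not confirmed here, that `CrispyGaussian` is written like `RenamedParam`.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator (not the real API)."""

    def __init__(self, alpha=1e-10, copy_X_train=True):
        self.alpha = alpha
        self.copy_X_train = copy_X_train

    @classmethod
    def _param_names(cls):
        # Like scikit-learn, derive parameter names from the __init__ signature.
        params = inspect.signature(cls.__init__).parameters.values()
        return sorted(p.name for p in params
                      if p.name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD))

    def get_params(self):
        # Each __init__ argument must exist as an attribute with the same name.
        return {name: getattr(self, name) for name in self._param_names()}


class RenamedParam(ToyEstimator):
    def __init__(self, n_sgrna=10, copy_x_train=False):
        self.n_sgrna = n_sgrna
        # Forwarded under a different name, so self.copy_x_train never exists.
        super().__init__(copy_X_train=copy_x_train)


class StoredParam(ToyEstimator):
    def __init__(self, n_sgrna=10, copy_x_train=False):
        self.n_sgrna = n_sgrna
        self.copy_x_train = copy_x_train  # also stored under its own name
        super().__init__(copy_X_train=copy_x_train)
```

`RenamedParam().get_params()` raises `AttributeError: 'RenamedParam' object has no attribute 'copy_x_train'`, while `StoredParam().get_params()` succeeds. If this is the cause, storing the argument under its own name, or using an older scikit-learn release that does not validate parameters in `fit`, are possible workarounds.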

Crispy copy number input file and CRISPR screen input file format question

Dear Dr. Gonçalves,

I really like your Crispy paper, and I am trying to use Crispy's gene copy-number ratio to correct our in-house CRISPR screen data.

I have downloaded the CCLE copy number segment file:

```
more CCLE_segment_cn.csv
DepMap_ID,Chromosome,Start,End,Segment_Mean,Num_Probes,Status,Source
ACH-000001,1,0,6275858,1.7621356648395683,5148,+,Broad WGS
ACH-000001,1,6275859,7079442,2.4909654303888336,792,+,Broad WGS
ACH-000001,1,7079443,7502035,1.8494047510652951,417,+,Broad WGS
ACH-000001,1,7502035,14866457,1.7784014133107096,6664,+,Broad WGS
ACH-000001,1,14866458,15395313,2.450037786788354,514,+,Broad WGS
ACH-000001,1,15395314,16193821,1.896562611077248,669,+,Broad WGS
ACH-000001,1,16193822,16540624,0.6786586410933554,320,-,Broad WGS
ACH-000001,1,16540625,16714625,1.1495569977055504,112,+,Broad WGS
ACH-000001,1,16714625,16824125,0.21027000944192087,51,-,Broad WGS
ACH-000001,1,16824125,16952624,1.1637054997560865,96,+,Broad WGS
ACH-000001,1,16952625,30006371,0.6013967597557067,12311,-,Broad WGS
ACH-000001,1,30006372,30016464,1.2008082209857775e-08,8,-,Broad WGS
ACH-000001,1,30016465,31640468,0.611072304220861,1481,-,Broad WGS
ACH-000001,1,31640469,34408919,0.9093181156794582,2705,0,Broad WGS
ACH-000001,1,34408920,35181710,2.0345348600584243,739,+,Broad WGS
ACH-000001,1,35181711,44564369,0.8826583328748431,9133,-,Broad WGS
ACH-000001,1,44564370,45025720,1.1983281253956508,436,+,Broad WGS
ACH-000001,1,45025721,52779891,0.9128286640381652,7201,0,Broad WGS
```
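This file gives copy number per genomic segment, so using it for sgRNA-level correction requires looking up the segment that covers each sgRNA's cut site. A minimal pandas/NumPy sketch, assuming the column names shown above, one cell line at a time, and sgRNA coordinates taken from a library annotation (the read-count file below has none). `select_cell_line` and `segment_value_at` are hypothetical helpers; this is not crispy's own input format or overlap code.

```python
import numpy as np
import pandas as pd


def select_cell_line(segments, depmap_id):
    """Keep the segments of one cell line from a CCLE segment table."""
    seg = segments[segments["DepMap_ID"] == depmap_id]
    seg = seg.assign(Chromosome=seg["Chromosome"].astype(str))  # "1", ..., "X" as strings
    return seg[["Chromosome", "Start", "End", "Segment_Mean"]].reset_index(drop=True)


def segment_value_at(segments, chrom, pos, value_col="Segment_Mean"):
    """Return value_col of the segment covering (chrom, pos), or NaN if none does."""
    seg = segments[segments["Chromosome"] == str(chrom)].sort_values("Start")
    if seg.empty:
        return float("nan")
    # Index of the last segment that starts at or before pos
    i = int(np.searchsorted(seg["Start"].to_numpy(), pos, side="right")) - 1
    if i < 0 or pos > seg["End"].iloc[i]:
        return float("nan")  # pos falls in a gap between segments
    return float(seg[value_col].iloc[i])
```

Usage: `seg = select_cell_line(pd.read_csv("CCLE_segment_cn.csv"), "ACH-000001")`, then `segment_value_at(seg, 1, 7000000)`. Where one segment ends at the same coordinate the next one starts (e.g. 7502035 above), this lookup returns the later segment.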

… …

Our CRISPR screen file has the following format (an example of read-count.txt):

```
[sgRNA tag] \t [GENE] \t [sample 1] \t [sample 2] ...
ATAGATGTCCTGTGGCCCCG-P53 [tab] TP53 [tab] 403 [tab] 362 ....
ACTCACTTCCTGTGGCCCCG-MDM2 [tab] MDM2 [tab] 45 [tab] 64 ....
```
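A tab-separated table like this can be loaded with pandas and turned into per-sgRNA log2 fold-changes. The sketch uses a common recipe (counts-per-million normalisation, a pseudocount, log2, then subtraction of a reference column such as the plasmid); it is an assumption for illustration, not necessarily the normalisation crispy applies, and `split_counts`, `log2_fold_changes` and the `plasmid` column name are hypothetical.

```python
import numpy as np
import pandas as pd


def split_counts(table):
    """Split a read-count table (sgRNA index, gene column, sample columns)."""
    genes = table.iloc[:, 0]                   # first data column: gene symbol
    counts = table.iloc[:, 1:].astype(float)   # remaining columns: raw read counts
    return genes, counts


def log2_fold_changes(counts, control, pseudocount=0.5):
    """log2 fold-change of every sample versus the control column (sgRNAs x samples)."""
    cpm = counts / counts.sum() * 1e6          # normalise each sample to counts per million
    log_cpm = np.log2(cpm + pseudocount)
    return log_cpm.drop(columns=control).sub(log_cpm[control], axis=0)
```

Usage: `table = pd.read_csv("read-count.txt", sep="\t", index_col=0)`, then `genes, counts = split_counts(table)` and `fc = log2_fold_changes(counts, control="plasmid")`. Normalising per sample first removes differences in sequencing depth before samples are compared.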

When I try to run example.py from the Crispy package:

```
python example.py
......
FutureWarning: The sklearn.metrics.ranking module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
Traceback (most recent call last):
  File "example.py", line 8, in <module>
    rawcounts, copynumber = cy.Utils.get_example_data()
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/Utils.py", line 195, in get_example_data
    raw_counts = pd.read_csv(
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File /mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data//example_rawcounts.csv does not exist: '/mnt/efs/home/stb3865/software/anaconda3/lib/python3.8/site-packages/crispy/data//example_rawcounts.csv'
```

The error occurs because there is no example_rawcounts.csv in the package.

I do not know the expected formats of the Crispy copy number input file and the CRISPR screen input file. Could you tell me what they are, so that I can run the example.py program?

Thanks,