Ah, sorry, PR #40 was never quite finished. I'll try to follow up on it to get this fixed. Once that PR is merged, I can release a new version of the workflow, and you should simply be able to run it with the new version number.
from dna-seq-gatk-variant-calling.
The Traceback and the TypeError actually point to the problem: the function get_contigs() in line 44 of common.smk uses the pandas function read_table() with the argument squeeze=True.
As the TypeError points out, this is not a known keyword argument for that read_table() function. Previously, this keyword argument existed for this function, which is why it was used here. But the squeeze keyword argument to read_table() was deprecated in pandas 1.4 and removed in pandas 2.0, and snakemake pulls in a newer pandas version (for example, I get pandas 2.0.2 for snakemake 7.26.0 with the conda installation).
I think the fix here would be to use the equivalent .squeeze("columns") on the data frame resulting from the read_table() call. Would you feel comfortable providing a pull request with that change?
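A minimal sketch of that change, assuming get_contigs() reads the first column of a samtools .fai index. The function name read_contig_names and the index content below are hypothetical, not the workflow's actual code; the point is only the swap from squeeze=True to .squeeze("columns"):

```python
import io

import pandas as pd

# Hypothetical .fai (samtools faidx) content; names and numbers are made up.
# Columns: name, length, offset, linebases, linewidth.
FAI_TEXT = "chr1\t1000\t6\t60\t61\nchr2\t500\t1029\t60\t61\n"


def read_contig_names(handle):
    """Return the contig names (first .fai column) as a pandas Series.

    Instead of passing squeeze=True to read_table() (removed in pandas 2.0),
    read a one-column DataFrame and call .squeeze("columns") on it.
    """
    df = pd.read_table(handle, header=None, usecols=[0], dtype=str)
    # squeeze("columns") turns a single-column DataFrame into a Series;
    # with more than one column it would return the DataFrame unchanged.
    return df.squeeze("columns")


contigs = read_contig_names(io.StringIO(FAI_TEXT))
print(contigs.tolist())  # ['chr1', 'chr2']
```

Because .squeeze("columns") is a regular DataFrame method, this form should work on both pandas 1.x and 2.x.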
And also two side-notes:
- We don't actively maintain this workflow. For variant calling, I would instead recommend the dna-seq-varlociraptor workflow.
- For pasting code lines, GitHub provides Markdown syntax, which is really easy to use. You can also use it to paste the content of .txt files in-line.
Hello @dlaehnemann, thank you very much for your answer, and sorry about the formatting of my question. I don't feel comfortable making the change myself, because I am too new to this and don't really feel like I know what I am talking about. Would you be able to do it?
Thank you for your suggestion. I would like to use the varlociraptor workflow for dealing with pedigrees; I didn't realise I could also use it for general variant-calling scenarios.
No worries about the formatting; it's not always clear where and how to find info such as the docs for the GitHub Markdown syntax. And you definitely provided all the info that was needed to figure out the problem, and we need such error reports to fix and improve the workflows. So thanks to you!
The issue you report here should be addressed by pull request #40. Once that is merged, feel free to retest your workflow setup.
And as for varlociraptor, you can create all kinds of calling scenarios that include different kinds of sample dependencies. These can be Mendelian inheritance relationships (pedigrees), clonal inheritance (e.g. tumors and metastases), and all kinds of allele frequency setups. There is a little catalog of example scenarios in the docs:
https://varlociraptor.github.io/varlociraptor-scenarios/landing/
More detailed docs on how to write such a scenario are here:
https://varlociraptor.github.io/docs/calling/#generic-variant-calling
And feel free to ask questions in the varlociraptor repository or in the repository of the Snakemake workflow that uses varlociraptor.
Thank you so much for submitting the pull request and for all this info on varlociraptor. I will look it up!
Hello dlaehnemann and ClaraApicella,
Thank you for this issue; I have the same problem while testing this GATK workflow.
I saw in pull request #40 that we have to replace squeeze=True with .squeeze("columns"), which is perfectly clear.
But as I am new to Snakemake, which file do I have to modify, and how can I rerun the modified workflow to fix this, please?
Thanks for your help.
The example in #40 is to replace
y = pd.read_csv(path, index_col=0, squeeze=True, dtype={1: float})
with
y = pd.read_csv(path, index_col=0, dtype={1: float}).squeeze("columns")
Regards
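The replacement quoted from #40 can be checked in isolation. This is a small sketch with made-up CSV content standing in for the file at path; it keys dtype by column name ("value") rather than by position, which is a deliberate deviation from the quoted line:

```python
import io

import pandas as pd

# Hypothetical two-column CSV standing in for the file at `path`.
CSV_TEXT = "sample,value\na,1\nb,2.5\n"

# Before (fails on pandas >= 2.0 with a TypeError):
#   y = pd.read_csv(path, index_col=0, squeeze=True, dtype={1: float})
# After: read a DataFrame whose only data column is "value", then squeeze it.
y = pd.read_csv(
    io.StringIO(CSV_TEXT), index_col=0, dtype={"value": float}
).squeeze("columns")

print(type(y).__name__)  # Series
print(y.to_dict())  # {'a': 1.0, 'b': 2.5}
```

Since index_col=0 moves the first column into the index, only one data column remains, so squeeze("columns") returns a Series, just as squeeze=True used to.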
Thanks David, you rock!