stephenturner / annotables
R data package for annotating/converting Gene IDs
Home Page: http://www.gettinggeneticsdone.blogspot.com/2015/11/annotables-convert-gene-ids.html
Hi,
thanks for the useful package.
The Entrez gene IDs are missing in most cases:
> length(is.na(grch38$entrez))
[1] 66531
> length(is.na(grch38$entrez))
[1] 67416
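One caveat when reproducing this count: length(is.na(x)) returns the total length of the vector, not the number of missing values. A minimal sketch of counting missing IDs, using a small made-up vector as a stand-in for grch38$entrez:

```r
# Toy stand-in for grch38$entrez (values are made up for illustration).
entrez <- c(801L, NA, 805L, NA, NA)

# is.na() returns a logical vector as long as its input,
# so length() of it is simply the total number of entries.
n_total <- length(is.na(entrez))

# sum() counts the TRUE values, i.e. the number of missing IDs.
n_missing <- sum(is.na(entrez))

# mean() gives the fraction of entries that are missing.
frac_missing <- mean(is.na(entrez))

print(c(total = n_total, missing = n_missing))
print(frac_missing)
```

The same three calls can be run directly on grch38$entrez once the package is loaded.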
See http://bioinf.wehi.edu.au/software/MSigDB/
This could be useful in all sorts of ways.
If you don't mind, I'd like to share some tips on how to improve the pander calls and make the code more readable and easier to maintain by using global options. First of all, there is no real need to call pandoc.table; it's totally fine to use the pander general S3 method on anything.
I've seen that you do not want to split the tables. To this end, you can define a global option that will be used in all future pander (or pandoc.table) calls. E.g. you can set it to 100 like you did, or even disable this feature if needed:
panderOptions('table.split.table', Inf)
Similarly, you can set the table style as well:
panderOptions('table.style', 'rmarkdown')
And the cell alignment can be specified too via the table.alignment.default option, which can also be a function, e.g. to justify numbers to the left and everything else to the right. See e.g. this thread on SO: http://stackoverflow.com/a/27014481/564164
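Put together, the options above might be set once at the top of a script, roughly as follows. This is only a sketch: the helper name align_by_type and the example table are made up, and the rule shown right-aligns numeric columns (swap the two strings for the opposite behaviour). The pander calls are guarded so the block still runs where pander is not installed.

```r
# Alignment rule: right-align numeric columns, left-align everything else.
# It receives the table being printed and returns one alignment per column.
align_by_type <- function(df) {
  ifelse(vapply(df, is.numeric, logical(1)), "right", "left")
}

# Small example table (made-up content for illustration).
genes <- data.frame(symbol = c("CALM1", "CALM2"),
                    entrez = c(801L, 805L),
                    stringsAsFactors = FALSE)
print(align_by_type(genes))

# Set the global options only if pander is available.
if (requireNamespace("pander", quietly = TRUE)) {
  pander::panderOptions("table.split.table", Inf)    # never split wide tables
  pander::panderOptions("table.style", "rmarkdown")  # pipe-table style
  pander::panderOptions("table.alignment.default", align_by_type)
  pander::pander(genes)  # uses the global options set above
}
```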
Please feel free to close this ticket, I just wanted to share these (hopefully) useful tricks :)
thelovelab/tximport@8f31541
thelovelab/tximport#13
The grch38_gt data table is useful, but could we maybe have a grch38_tg as well?
               tx            gene
1 ENST00000387314 ENSG00000210049
2 ENST00000389680 ENSG00000211459
3 ENST00000387342 ENSG00000210077
4 ENST00000387347 ENSG00000210082
5 ENST00000386347 ENSG00000209082
6 ENST00000361390 ENSG00000198888
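Until such a table exists, a transcript-to-gene table can be derived from a gene-to-transcript one by reordering the columns. A minimal sketch on a toy table; the column names ensgene and enstxp are an assumption about how grch38_gt is laid out, and tg is a hypothetical name:

```r
# Toy gene-to-transcript table. The column names `ensgene` and `enstxp`
# are assumptions; check names(grch38_gt) for the real layout.
gt <- data.frame(
  ensgene = c("ENSG00000210049", "ENSG00000211459", "ENSG00000210077"),
  enstxp  = c("ENST00000387314", "ENST00000389680", "ENST00000387342"),
  stringsAsFactors = FALSE
)

# Transcript first, gene second, with the tx / gene column names shown above.
tg <- data.frame(tx = gt$enstxp, gene = gt$ensgene, stringsAsFactors = FALSE)
print(tg)
```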
I suspect mistakes in gene symbols.
I was wrongly assuming that there is a one-to-one correspondence between the rows in grch38/grch37 and the different ENSGs. It turned out that there are repeated ensgene values:
> sum(data.frame(table(grch38$ensgene))$Freq > 1)
[1] 361
I checked several of these 361 duplicated genes, and it seems that the Entrez gene IDs are the only difference:
> grch38[ grch38$ensgene == "ENSG00000198668", ]
# A tibble: 3 x 9
  ensgene         entrez symbol chr    start    end strand biotype description
  <chr>            <int> <chr>  <chr>  <int>  <int>  <int> <chr>   <chr>
1 ENSG00000198668    801 CALM1  14    9.04e7 9.04e7      1 protei… calmodulin…
2 ENSG00000198668    805 CALM1  14    9.04e7 9.04e7      1 protei… calmodulin…
3 ENSG00000198668    808 CALM1  14    9.04e7 9.04e7      1 protei… calmodulin…
I also looked up the different Entrez gene IDs on the NCBI website, and they point to different CALM genes (not only CALM1).
CALM1: https://www.ncbi.nlm.nih.gov/gene/?term=801
CALM2: https://www.ncbi.nlm.nih.gov/gene/?term=805
CALM3: https://www.ncbi.nlm.nih.gov/gene/?term=808
Version:
> packageVersion("annotables")
[1] ‘0.1.91’
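For anyone who needs one row per Ensembl gene in the meantime, a minimal sketch of finding the repeated IDs and collapsing their Entrez IDs into one field. The toy table mirrors the CALM1 rows above; the second gene (ENSG_EXAMPLE, GENE_X, Entrez ID 1) is made up, and the names dup_ids, one_per_gene and entrez_all are hypothetical:

```r
# Toy table: three rows for one Ensembl gene plus one made-up unique gene.
genes <- data.frame(
  ensgene = c("ENSG00000198668", "ENSG00000198668", "ENSG00000198668",
              "ENSG_EXAMPLE"),
  entrez  = c(801L, 805L, 808L, 1L),
  symbol  = c("CALM1", "CALM1", "CALM1", "GENE_X"),
  stringsAsFactors = FALSE
)

# Ensembl IDs that occur on more than one row.
dup_ids <- unique(genes$ensgene[duplicated(genes$ensgene)])
print(dup_ids)

# Keep the first row per gene, then attach all Entrez IDs as one string.
one_per_gene <- genes[!duplicated(genes$ensgene), c("ensgene", "symbol")]
one_per_gene$entrez_all <- vapply(
  one_per_gene$ensgene,
  function(id) paste(genes$entrez[genes$ensgene == id], collapse = ";"),
  character(1),
  USE.NAMES = FALSE
)
print(one_per_gene)
```

Whether to collapse, keep the first Entrez ID, or keep all rows depends on the downstream analysis; this only shows the collapsing option.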
Hi Steven,
Just one more thing that I noticed. When I click on the webpage link on this repo (www.gettinggeneticsdone.com/2015/11/annotables-convert-gene-ids.html), it redirects me to this weird Japanese webpage:
https://earthgekinka.com/creditcardgenkinka/jibundedekiru.html
Could you please add Zebrafish to the list?
need to update documentation with changes made in #6 by @aaronwolen to document automated creation of new datasets based on YAML files
hi~
I used R (3.5.0) to install the annotables package, but I got an error.
Commands:
install.packages("devtools")
devtools::install_github("stephenturner/annotables")
Errors:
devtools::install_github("stephenturner/annotables")
Error in curl::new_handle() : An unknown option was passed in to libcurl
Could you do me a favour and tell me how to resolve this error? I have tried my best to fix it myself.
Thank you!
Thank you for creating this package.
I tried to add grch37 and failed.
When I run the code from your README.Rmd:
fix_genes <- . %>%
  tbl_df %>%
  distinct %>%
  rename(ensgene=ensembl_gene_id,
         entrez=entrezgene,
         symbol=external_gene_name,
         chr=chromosome_name,
         start=start_position,
         end=end_position,
         biotype=gene_biotype)
myattributes <- c("ensembl_gene_id",
                  "entrezgene",
                  "external_gene_name",
                  "chromosome_name",
                  "start_position",
                  "end_position",
                  "strand",
                  "gene_biotype",
                  "description")
and adding grch37 following your code, I get:
Error in rename(., ensgene = ensembl_gene_id, entrez = entrezgene, symbol = external_gene_name, :
object 'ensembl_gene_id' not found
By just removing the rename function and last pipe, everything seems to work.
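The cause of the error is not clear from the report alone. One way to narrow it down is to check which of the expected columns are actually present before renaming. The sketch below does this in base R on a toy data frame; rename_present and new_names are hypothetical names, and the mapping simply mirrors the rename() call above:

```r
# New name -> biomaRt attribute name, mirroring the rename() call above.
new_names <- c(ensgene = "ensembl_gene_id",
               entrez  = "entrezgene",
               symbol  = "external_gene_name",
               chr     = "chromosome_name",
               start   = "start_position",
               end     = "end_position",
               biotype = "gene_biotype")

# Rename the columns that exist and warn about the ones that do not.
rename_present <- function(df, mapping) {
  missing_cols <- setdiff(mapping, names(df))
  if (length(missing_cols) > 0) {
    warning("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  idx <- match(mapping, names(df))   # position of each old name, NA if absent
  found <- !is.na(idx)
  names(df)[idx[found]] <- names(mapping)[found]
  df
}

# Toy query result containing only some of the expected columns.
res <- data.frame(ensembl_gene_id = "ENSG00000198668",
                  entrezgene = 801L,
                  strand = 1L,
                  stringsAsFactors = FALSE)
out <- suppressWarnings(rename_present(res, new_names))
print(names(out))
```

If the warning lists every column, the object reaching the renaming step is probably not the query result that was expected.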
I am quite new to bioinformatics, R and GitHub. I hope 'Issues' is the right place to ask my question.
If I want to update annotables for a build more recent than the one in your GitHub repo, is it simply a matter of cloning and building the annotables package? In other words, does building the package automatically use the latest Ensembl build? If yes, where do I change the code to reflect the current version, so that ensembl_version returns the correct value?
I did clone and build the package successfully, but ensembl_version still reports Ensembl 91, and it would take some work to compare against a known older version. I'm not so familiar with these data packages, and I'm having trouble dissecting the package to find the source code that queries Ensembl, so I can't answer the versioning question myself.
Thanks,
John Thompson
Hi, I am using the version 0.1.91 of the annotables package. I have noticed that the grch37 and grch38 objects are exactly the same.
library(annotables)
identical(grch37, grch38)
I am surprised that even the genomic positions are exactly the same, given that these are two different versions of the human genome. Am I being stupid and missing something?
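To see where two builds differ rather than only whether they do, the tables can be joined on the gene ID and the coordinates compared. A minimal sketch with made-up IDs and coordinates standing in for grch37 and grch38 (b37, b38 and moved are hypothetical names):

```r
# Made-up coordinates standing in for the two builds.
b37 <- data.frame(ensgene = c("ENSG_A", "ENSG_B"),
                  start = c(100L, 500L),
                  stringsAsFactors = FALSE)
b38 <- data.frame(ensgene = c("ENSG_A", "ENSG_B"),
                  start = c(120L, 500L),
                  stringsAsFactors = FALSE)

# Whole-object comparison: TRUE only if every value matches.
print(identical(b37, b38))

# Per-gene comparison: join on the gene ID and flag changed start positions.
both <- merge(b37, b38, by = "ensgene", suffixes = c("_37", "_38"))
both$moved <- both$start_37 != both$start_38
print(both)
```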
Can you make tables with ensembl versions (on genes & transcripts)?
Hi,
I find your package very useful, but I'm not very R savvy.
Is it possible to add new organisms to annotables?
I'm interested in Mmul10 (Macaca mulatta).
Thanks for your help.
with the changes in #6 it's much easier to recreate annotation tables. the files are named e.g. galgal5, but which version/build is actually used depends on what's current in ensembl. e.g., when I first built this package, chicken was on galgal4. I had to manually update the filenames, and I probably did the wrong thing by just deleting (rather than deprecating) the old datasets. maybe that's okay since it's still versioned in a release. not sure how best to handle these issues.
Hi Stephen,
Hope everything is going well. Ensembl released version 90 last monthish, is there a plan to update the annotations? I'm not sure what your vision was involving keeping up. Thanks so much! It is surprising how useful it is to just have the annotations on hand and not have to re-look them up every time.
Hello, I am rather new to R and am not able to install the annotables package. Here is what I get:
` install.packages("devtools")
Error in install.packages : Updating loaded packages
Restarting R session...
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
Use devtools to install the package
devtools::install_github("stephenturner/annotables")
Downloading GitHub repo stephenturner/annotables@master
tar: Failed to set default locale
tar: Failed to set default locale
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
v checking for file '/private/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T/RtmphObqL7/remotes107e2dec0e0d/stephenturner-annotables-805a247/DESCRIPTION' ...
Error: (converted from warning) Setting LC_CTYPE failed, using "C"
Execution halted
Error in i.p(...) :
(converted from warning) installation of package '/var/folders/0m/_xp6xtk96v3c7nztw552_zzh0000gn/T//RtmphObqL7/file107e1486950e/annotables_0.1.91.tar.gz' had non-zero exit status
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(annotables)
Error in library(annotables) : there is no package called 'annotables'`
I also tried other installing methods but in the end there was never a package called 'annotables'.
Thank you for your help!
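The log suggests that the installation stops because a locale warning is converted into an error. A hedged sketch of one thing to try, assuming that really is the cause: request a UTF-8 locale for the R processes started during installation, and use the remotes package's R_REMOTES_NO_ERRORS_FROM_WARNINGS environment variable so that warnings are no longer treated as errors. The locale name en_US.UTF-8 is an assumption and may need adjusting for your system.

```r
# Assumption: the failure is caused by locale warnings promoted to errors.
# Request a UTF-8 locale for child R processes started during installation.
Sys.setenv(LANG = "en_US.UTF-8")

# Environment variable read by the remotes package: when "true",
# warnings during installation are not converted to errors.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

print(Sys.getenv(c("LANG", "R_REMOTES_NO_ERRORS_FROM_WARNINGS")))

# Then retry the installation (commented out here, as it needs network access):
# devtools::install_github("stephenturner/annotables")
```

On macOS, the R for macOS FAQ also describes setting the locale permanently from Terminal with `defaults write org.R-project.R force.LANG en_US.UTF-8`, followed by a restart of R.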