What does this package do? (explain in 50 words or less)<br

Thanks for the submission <a class="user-mention notranslate" data-hovercard-type="use

Reviewers: <a class="user-mention notranslate" data-hovercard-type="user" data-hoverca

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

snake_case is just a suggestion - consistency is more important, camelCase when

genbankr about software-review HOT 14 CLOSED

ropensci commented on July 24, 2024

genbankr

from software-review.

Comments (14)

sckott commented on July 24, 2024

Thanks for the submission @gmbecker

from software-review.

sckott commented on July 24, 2024

Seeking reviewers

from software-review.

sckott commented on July 24, 2024

Reviewers: @dwinter
Due date: 2016-06-15

from software-review.

sckott commented on July 24, 2024

@dwinter
Due date: 2016-06-15 - hey there, it's been 21 days, please get your review in soon, thanks 😺 (ropensci-bot)

from software-review.

dwinter commented on July 24, 2024

Hi @sckott / bot / @gmbecker -- sorry about the delay, I've been working on this but it will take a little longer. Will aim for this week though.

from software-review.

sckott commented on July 24, 2024

thanks!

from software-review.

dwinter commented on July 24, 2024

This is a really nice package, which fills an important gap in the tools available to work on sequence data in R. Genbank is a very large sequence database, and genbank-formatted records provide rich information about the sequences it contains. This package provides tools to fetch and parse genbank records, and a set of functions to extract information from the parsed records. Importantly, the paclage integrates well with Bioconductor allowing the information within sequence files to be represented as flexible and useful objects like those provided by GenomicRanges.

Clashes between ropensci and Bioconductor styles

One of the difficulties in reviewing the package is that genbankr is part of the Bioconductor project, and therefore meets different coding standards and practices than those suggested for ropensci. Obviously it's up to the ropensci leaders and @gmbecker to decide how this can be negotiated, I will just list those I find here.

ropensci suggests using testthat for unit tests, Bioconductor has its own testing framework that uses RUnit.
ropensci suggest snake_case for most variables, Bioconductor uses camelCase.
As I understand it, the github repo for this package is a fork of a mirror of the Bioconductor source? I don't know what role the github repo will play for the package and how it can be kept in synch with the "official" source. (I'm sure it can be done, I just wanted to flag it as something that needs to be thought about)

Readme

This is perhaps another example of the different between each project. In ropensci the github repo is a "landing page" for the package, whereas as Bioconductor presents much of the same information here. Even so, the current README for genbankr source code could be improved by adding

badges for release, continuous integration and test coverage
Some small usage example
A clear statement about where to go for help or to file bugs (the Bioconductor QA site?)

I have to say I also find the the installation instructions a little daunting. Perhaps start by emphasizing that almost all users should use the current Bioconductor release, which can be installed easily with biocLite. Can the right version of biocLite be accessed via library(BiocInstaller)? If so, these seems preferable to telling users to source a URL to install packages.

Documentation

I found package-level function to be very good, both clear and complete. Though perhaps not technically documentation, the print function for genbank records is also well thought-out, providing the user wih a clear indication of what each record contains.

The package vignette is also clear and does a good job of demonstrating basic usages and integrations for the package. I think it would also be very helpful to show how the features in a genbank records can be used in an analysis.

For instance, an example demonstrating the best way to use features in a given file and getSeq to retriece only the CDS from an mRNA sequence would document a possible use-case and demonstrate how integration with GenomicRanges can be helpful.

For the example of reading data into and R sessions, I think it's better to store the data in inst/exdata and use system.file to specify the path in vignette. That way a user can copy and paste the example and see it work. (Hadley's book has a section on this)

I appreciate the vignette section of the "limitations" fo the package. The lack of standards as to exactly what goes into a genbank file means such limitations are inevitable, but this clear statement about how those problems are dealt with is helpful.

The code

I have very little to comment on the code itself -- it's well written, clear and appropriately modular. I have not stress tested the package on very large files, but performance on typically-sized files has been good. I don't think there are an missing functions -- the package does a good job of "doing job well", so I like the focus of the current packages is good.

@aahowel3 and I have been using the package in research this week, and we found a couple of records that break the parsing function. These are filed under issue gmbecker/genbankr#1

Everything else

It would be good to have a code of conduct -- even if you envisage mostly working on the project yourself it's useful to have an indication that that would-be collaborators can expect to be treated well.

As mentioned above, it would be useful to have a clear indication of where bugs and/or questions about the package should be sent.

I think the the VignetteBuilder entry in the DESCRIPTION should be changed to rmarkdown, and knitr should be replaced by rmarkdown in the Suggests section too. With these changes I can compile the vignette, run R CMD check and have all tests pass.

Finally, the reviewer recommendations ask for an estimate of the time taken to review the package.
I guess I spend about 2hrs using the package in research (i.e. doing stuff I would otherwise be working on, probably less efficiently 😄), another 2 closely reviewing the code, design and documentation of the package and one hour to prepare this comment.

from software-review.

gmbecker commented on July 24, 2024

@dwinter

Thank you for the the thorough review. I have addressed and closed the issues you kindly filed in my genbankr repository. Please find my initial responses to the other points of your review below:

Style

Unfortunately, in many cases my use of camelCase are not cases of me adopting the Bioconductor style, but rather my package providing methods to existing Bioconductor generics (e.g., getSeq, cdsBy, etc). As such I'm unable to change the function naming style of my package. In the interest of full disclosure, I would have chosen to use camelCase rather than snake_case anyway, as it is my preference, but that is moot as consistency is paramount and my hands are tied regarding the names of numerous methods my package provides.

Readme

The BiocInstaller package is not itself available on CRAN. The mechanism for initially installing it is the sourcing of the file you mention. Once that is installed, you are correct that library(BiocInstaller) is sufficient, and the sourcing the file is no longer needed. I attempted to communicate that via the readme but I can revisit that if it was unclear.

With respect to badges, these are avaiable at genbankr's Bioconductor splashpage here: https://bioconductor.org/packages/release/bioc/html/genbankr.html. I figured that was sufficient. It looks like I might be able to jury rig the Bioc badges to show up on the github page. I am somewhat loath to do that, but I can look into it if it is a sticking point.

I will add some mention of the Bioconductor help site to the Readme, I agree that that isn't super clear now.

Documentation

I have changed the vignette so that it uses system file to retrieve the example genbank file (this is in devel only, not backported to the current release).

I don't really have the applied subject material expertise to determine what a compelling, more realistic example would be, so I have not added one now. I will talk to people I know are using it and try to work one in for the next release. I will add the mini-applications the reviewer mentions as well for that release.

Code

Thank you for your kind words. As mentioned, I have fixed the two bugs you identified in your issue. Please bring any future problems you find to my attention as well, and the turn-around time should be shorter when I'm not in the middle of a week of conferences.

Everything else

I would have thought that once genbankr is accepted into ROpenSci, the ROpenSci code of conduct would apply. Is that not the case? A package having a code of conduct strikes me as a bit odd (it's not really a community...) but I am of course for all the things that codes of conduct protect and enforce, and would do so within my package's development environment regardless.

@sckott (or @noamross ?) Please let me know if these responses are acceptable, and if not where more work on my part is required.

from software-review.

sckott commented on July 24, 2024

snake_case is just a suggestion - consistency is more important, camelCase when interfacing with lots of other camelCase is good
If the main location for badges, etc, is on Bioconductor, this should be noted and clearly linked. - Since this is a read-only mirror, and presumably contributions should be provided in a manner different than PRs to this repository, this should be noted, as well, in the README.md or a CONTRIBUTING.md; I imagine issues shouldn't be opened here either, or do you want that?
We prefer a CoC in the package repository - the idea is to make it very clear that we expect contributors to follow the CoC. The people contributing and using the package are a community though

from software-review.

gmbecker commented on July 24, 2024

I have modified the README.md file to contain a link at the top to the official Bioconductor splashpage for genbankr.

I've also added CONDUCT.md.

PRs to this github repository should actually be fine, so I haven't made any changes there.

Please let me know if there's anything else outstanding that I need to address.

from software-review.

sckott commented on July 24, 2024

as mentioned above:

Some small usage example
A clear statement about where to go for help or to file bugs (the Bioconductor QA site?)

Can those be added to readme, or some reason why not?

p.s. @dwinter - thanks for the review hrs estimate

from software-review.

gmbecker commented on July 24, 2024

Hmm, I missed this message somehow. My bad.

I'd prefer any example usage code that has any meat to it live somewhere where it's compiled (and thus automatically tested), e.g. the vignette or examples in the documentation. I suppose I can add a non-run example call to readGenBank to the readme. Going beyond that seems like it is approaching a vignette that never gets built.

I'll add the link to bioc support, I had meant to do that.

from software-review.

gmbecker commented on July 24, 2024

Please see the latest commit of README.md, which now includes very basic usage, a link to the vignette for more complete usage examples, and a section on getting help and filing bugs and feature requests.

from software-review.

sckott commented on July 24, 2024

@gmbecker Looks good to me

approved

from software-review.

genbankr about software-review HOT 14 CLOSED

Comments (14)

Clashes between ropensci and Bioconductor styles

Readme

Documentation

The code

Everything else

Style

Readme

Documentation

Code

Everything else

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent