qiime2 / keemei
Validate tabular bioinformatics file formats in Google Sheets
Home Page: https://keemei.qiime2.org
License: BSD 3-Clause "New" or "Revised" License
Currently only says whether there are invalid characters in a cell, but not what the offending characters are.
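One way to report the offending characters, sketched in plain JavaScript. The helper name `findInvalidCharacters` and the sample ID rule (alphanumerics and periods only) are assumptions for illustration, not Keemei's actual code.

```javascript
// Hypothetical helper: returns the distinct characters in `value` that are
// not matched by `validCharPattern`, a regex that matches ONE valid character.
function findInvalidCharacters(value, validCharPattern) {
  const offending = [];
  for (const ch of String(value)) {
    if (!validCharPattern.test(ch) && !offending.includes(ch)) {
      offending.push(ch);
    }
  }
  return offending;
}

// Assumed rule for illustration: alphanumerics and periods are valid.
const SAMPLE_ID_CHAR = /^[A-Za-z0-9.]$/;
const bad = findInvalidCharacters('sample_1 #2', SAMPLE_ID_CHAR);

// Quote each character so that whitespace is visible in the note text.
const message = 'Invalid character(s): ' +
  bad.map(ch => JSON.stringify(ch)).join(', ');
```

Quoting each character with `JSON.stringify` keeps spaces and tabs visible in the message.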
Current docs are on the repo's wiki. It'd be better to create a dedicated website as the add-on's Help menu will direct users to this site.
Required if add-on is accepted for publication in the web store.
Add instructions that show how to install the plugin. This requires cutting a release first and making it a real Google Sheets "add-on".
Validation should mark cells with leading/trailing whitespace as a warning.
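A minimal sketch of such a check. The function name is hypothetical, and treating an all-whitespace cell as a whitespace warning (rather than as an empty cell) is an assumption.

```javascript
// Hypothetical check: true when a non-empty cell value starts or ends with
// whitespace (spaces, tabs, newlines).
function hasSurroundingWhitespace(value) {
  const text = String(value);
  return text.length > 0 && text !== text.trim();
}
```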
Header.gs is pretty messy, and a lot of the code could be refactored to use pieces in Base.gs and Column.gs.
A negative number formatted as a currency is marked as an empty cell. Example: (42.45)
Since we're using a third-party library (SheetConverter) for these conversions, this issue may need to be raised on their issue tracker.
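For reference, the accounting convention in the example writes a negative amount in parentheses. The sketch below shows how such text could be read as a number; `parseAccountingNumber` is a hypothetical helper and is not part of SheetConverter.

```javascript
// Hypothetical helper: parses accounting-style negatives such as "(42.45)"
// or "($1,234.50)". Returns null when the text is not in that form.
function parseAccountingNumber(text) {
  const match = /^\(\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*\)$/
    .exec(String(text).trim());
  if (match === null) {
    return null;
  }
  // Drop thousands separators, then negate.
  return -Number(match[1].replace(/,/g, ''));
}
```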
This is basically QIIME's mapping file format with some additional required columns.
Improvement Description
A user may have a column with locations of the form 1846 19th street, 9500 Gilman Drive, etc. and may want to create latitude and longitude columns from this information.
This should be able to be re-factored to just return the state that represents that column/row and even per cell. This can then be batched as appropriate.
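A rough sketch of that refactoring: validators return plain state objects, and a separate step turns them into two-dimensional arrays that can be written in one batch (for example with Apps Script's `Range.setBackgrounds` and `Range.setNotes`). The state shape, the colors, and the name `buildGrids` are assumptions.

```javascript
// Assumed state shape: { row, column, level: 'error' | 'warning', message },
// with zero-based row and column indices relative to the validated range.
const COLORS = { error: '#ff9999', warning: '#ffff99' }; // assumed colors
const DEFAULT_COLOR = '#ffffff'; // assumed "no problem" background

function buildGrids(numRows, numColumns, states) {
  const backgrounds = [];
  const notes = [];
  for (let r = 0; r < numRows; r++) {
    backgrounds.push(new Array(numColumns).fill(DEFAULT_COLOR));
    notes.push(new Array(numColumns).fill(''));
  }
  for (const state of states) {
    backgrounds[state.row][state.column] = COLORS[state.level];
    notes[state.row][state.column] = state.message;
  }
  // Each grid can then be written with a single call instead of one per cell.
  return { backgrounds, notes };
}
```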
Improvement Description
@gregcaporaso suggested displaying what the valid characters are for a given cell.
Proposed Behavior
For example, listing valid characters for sample IDs.
It'd be really great to see some screenshots, and ultimately demos (e.g., like the examples on the bottom of the Emperor website).
Required if add-on is accepted for publication in the web store.
Keemei doesn't state whether a message (displayed in a note) is an error or a warning. Instead it relies on the color of the cell to indicate this state. The notes should also include this info to improve accessibility.
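A small sketch of note text that carries the severity in words. The message shape and the choice to list errors before warnings are assumptions.

```javascript
// Hypothetical formatter: takes [{ level: 'error' | 'warning', text }] and
// returns note text with an explicit "Error:" or "Warning:" prefix per line.
function formatNote(messages) {
  const order = { error: 0, warning: 1 };
  return messages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => order[a.level] - order[b.level])
    .map(m => (m.level === 'error' ? 'Error: ' : 'Warning: ') + m.text)
    .join('\n');
}
```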
@gregcaporaso suggested adding a link to the lab website in the About dialog, along with info about who developed the tool. This info should also be added somewhere on the "website" (currently GitHub wiki) and readme.
There are a number of standards that need to be adhered to for this to become an add-on. These need to be met before submitting the add-on proposal.
Improvement Description
Would be ideal to integrate with Ontomaton if possible.
References
Comments
Suggested by @rob-knight and @ackermag.
Required if add-on is accepted for publication in the web store.
Validation is pretty slow for larger sheets. Look into cutting down the number of Google API calls -- these are what's slowing it down. Profile via View -> Execution transcript.
Initially this will be done by either coloring the column header cell green, or perhaps every valid cell.
Add an About dialog that contains info about the add-on's functionality and the file formats it validates. This feature was requested in the review performed by Google.
Empty cells are currently marked as invalid for the wrong reasons. Specific checks should be put in place for empty cells.
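A sketch of checking for emptiness before any other rule runs, so that an empty cell gets its own message. The function names, the message text, and the choice to report emptiness as an error are assumptions.

```javascript
// Hypothetical check: null, undefined, and whitespace-only values are empty.
// Note that the number 0 is NOT empty.
function isEmptyCell(value) {
  return value === null || value === undefined || String(value).trim() === '';
}

// Run the emptiness check first, then the character check.
function validateCell(value, validPattern) {
  if (isEmptyCell(value)) {
    return { level: 'error', text: 'Empty cell' };
  }
  if (!validPattern.test(String(value))) {
    return { level: 'error', text: 'Invalid characters' };
  }
  return null; // no problem found
}
```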
Support comments below header, before data starts.
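A sketch of skipping comment rows, assuming that a comment row is one whose first cell starts with `#` (the convention in QIIME 1 mapping files) and that row 0 holds the header. The function name is hypothetical.

```javascript
// Hypothetical helper: returns the index of the first data row, skipping the
// header (row 0) and any comment rows that immediately follow it.
function firstDataRowIndex(rows) {
  let index = 1;
  while (index < rows.length && String(rows[index][0]).startsWith('#')) {
    index++;
  }
  return index;
}
```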
If we can get this to work, Keemei will be less intrusive because it won't have to modify cells' notes.
Required if add-on is accepted for publication in the web store.
Not sure what the Google Sheets add-ons API permits, but @rob-knight pointed out that a frequent problem is that people sort the rows of their mapping files without selecting all the columns, resulting in "scrambled" metadata files.
Ensure barcodes are all the same length unless the user has specified variable-length barcodes.
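A possible sketch of the length check. Using the most common length as the expected one is an assumption (the first barcode's length would be another option), as are the function name and the `allowVariableLength` flag.

```javascript
// Hypothetical check: returns the indices of barcodes whose length differs
// from the most common length. Ties go to the length seen first.
function findBarcodeLengthMismatches(barcodes, allowVariableLength) {
  if (allowVariableLength) {
    return [];
  }
  const counts = new Map();
  for (const barcode of barcodes) {
    counts.set(barcode.length, (counts.get(barcode.length) || 0) + 1);
  }
  let expected = null;
  let best = 0;
  for (const [length, count] of counts) {
    if (count > best) {
      expected = length;
      best = count;
    }
  }
  const mismatches = [];
  barcodes.forEach((barcode, index) => {
    if (barcode.length !== expected) {
      mismatches.push(index);
    }
  });
  return mismatches;
}
```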
Numeric cells currently throw an error during validation. They need to be cast to strings first.
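The failure can be reproduced in plain JavaScript: `Range.getValues()` returns numbers for numeric cells, and a number has no string methods such as `trim`. A minimal sketch of the cast, with a hypothetical helper name:

```javascript
// Hypothetical helper: converts every cell in a 2D array of values to a string.
function toStringGrid(values) {
  return values.map(row => row.map(cell => String(cell)));
}

// Calling a string method on a number throws a TypeError.
let threw = false;
try {
  (42).trim();
} catch (e) {
  threw = e instanceof TypeError;
}

const normalized = toStringGrid([['s1', 42], ['s2', 3.5]]);
```

Apps Script also provides `Range.getDisplayValues()`, which returns strings directly; whether the displayed form is the right thing to validate is a separate question.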
It will be important for the set of rules that are applied to be customizable. For example, Qiita has different requirements for its sample template and prep template (qiita-spots/qiita#933) than QIIME 1.x has for its mapping files.
What probably makes the most sense for this would be if there were multiple validate functions, which were tool/version-specific (e.g., QIIME 1.9.x, Qiita 0.1.x, ...), and if the user could choose to validate with one or more of these from within the Google Spreadsheets interface.
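A sketch of that idea as a registry keyed by tool and version. The registry, the helper names, and the column lists are placeholders for illustration; in particular, the Qiita entry is an assumed stand-in and not Qiita's real requirements.

```javascript
// Builds a validator that reports required columns missing from the header.
function requireColumns(required) {
  return function (header) {
    return required
      .filter(name => !header.includes(name))
      .map(name => 'Missing required column: ' + name);
  };
}

// Hypothetical registry: one validate function per tool/version.
const VALIDATORS = {
  'QIIME 1.9.x': requireColumns(
    ['#SampleID', 'BarcodeSequence', 'LinkerPrimerSequence', 'Description']),
  'Qiita 0.1.x': requireColumns(['sample_name']) // placeholder rule
};

// Runs each format the user selected and groups the messages by format.
function validateWith(formatNames, header) {
  const results = {};
  for (const name of formatNames) {
    results[name] = VALIDATORS[name](header);
  }
  return results;
}
```

The keys of `VALIDATORS` could feed a menu or dialog so that users pick one or more formats.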
Improvement Description
The particular value might be user-defined, or pre-defined (e.g., NaN)
Improvement Description
Would it be possible to abstract out the rules, so that if (for example) we wanted to build a QIIME2-based (i.e., Python) validator that could work without Google access (e.g., while working on a plane, or somewhere where access to Google is blocked, such as parts of China or DoD/USDA facilities), we wouldn't have to define these rules twice? Ideally, this could be used for auto-generating documentation as well.
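One possible shape for this: keep the rules as plain data (JSON) and interpret them with a small engine, so that a Python validator could load the same file. The rule schema, the example rule, and the function names below are all assumptions.

```javascript
// Hypothetical rule file contents. Simple regular expressions like this one
// behave the same in JavaScript and Python.
const RULES_JSON = JSON.stringify([
  {
    column: 'BarcodeSequence',
    pattern: '^[ACGT]+$',
    level: 'error',
    description: 'Barcodes may contain only A, C, G, and T.'
  }
]);

// Generic engine: returns the rules that a value in a given column violates.
function applyRules(rules, columnName, value) {
  return rules
    .filter(rule => rule.column === columnName &&
                    !new RegExp(rule.pattern).test(String(value)))
    .map(rule => ({ level: rule.level, message: rule.description }));
}

// The same data can generate documentation.
function describeRules(rules) {
  return rules
    .map(rule => rule.column + ' (' + rule.level + '): ' + rule.description)
    .join('\n');
}

const rules = JSON.parse(RULES_JSON);
```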
LinkerPrimerSequence and ReversePrimer can have multiple primers separated by commas.
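A sketch of validating each primer separately after splitting on commas. The function name is hypothetical, and the accepted alphabet (uppercase IUPAC DNA codes) is an assumption about what the rule should allow.

```javascript
// Hypothetical check: splits a cell on commas and returns the primers that
// contain characters outside the assumed IUPAC DNA alphabet. Empty entries
// (e.g. from a trailing comma) and padded entries are reported as invalid.
function findInvalidPrimers(cellValue) {
  const iupacDna = /^[ACGTRYMKSWHBVDN]+$/;
  return String(cellValue)
    .split(',')
    .filter(primer => !iupacDna.test(primer));
}
```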
Current Behavior
Right now validation is performed manually via a menu item.
It'd be helpful to list the location of duplicate cells in each error/warning message. Right now it just states that the cell isn't unique and the user has to search for the duplicates.
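A sketch of building such messages for one column. The function name, the message wording, and the parameters (`columnLetter`, and `firstRow` as the sheet row of the first value) are assumptions.

```javascript
// Hypothetical helper: maps each duplicated cell (in A1 notation) to a
// message naming the other cells that hold the same value.
function duplicateMessages(values, columnLetter, firstRow) {
  const indicesByValue = new Map();
  values.forEach((value, index) => {
    const key = String(value);
    if (!indicesByValue.has(key)) {
      indicesByValue.set(key, []);
    }
    indicesByValue.get(key).push(index);
  });

  const messages = {};
  for (const indices of indicesByValue.values()) {
    if (indices.length < 2) {
      continue; // value is unique
    }
    for (const index of indices) {
      const others = indices
        .filter(i => i !== index)
        .map(i => columnLetter + (firstRow + i));
      messages[columnLetter + (firstRow + index)] =
        'Duplicate value; also found in ' + others.join(', ');
    }
  }
  return messages;
}
```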
This will perform better than the current approach of coloring/annotating each cell separately.
Adding a sidebar that lists the various errors and warnings would be really helpful for larger spreadsheets where it's hard to find all problems simply by looking at the cells. See the sidebar documentation. Would be awesome if clicking an error/warning focused on the offending cell in the spreadsheet.
Optional if add-on is accepted for publication in the web store.