Comments (15)
I found another issue with this one - when the numbers are entered European-style (with , instead of .) they get converted to 0. Would you be able to get those entries read properly?
library_info <- get_df("TAB_Library", con)
library_info <- library_info %>% convert_all_ids_to_values()
library_info %>%
  select(library.Full_Library_Id, library.Batch, `library.Quantification_post-Indexing_total`) %>%
  filter(str_detect(library.Full_Library_Id, "MID02"))
# A tibble: 10 x 3
   library.Full_Library_Id library.Batch          `library.Quantification_post-Indexing_total`
   <chr>                   <chr>                                                        <int64>
 1 MID020.A0101            Li01_SA_2016-11-17                                      332000000000
 2 MID021.A0101            Li01_SA_2016-11-17                                      518000000000
 3 MID022.A0101            Li02_LS_2019-12-09_MID                                            NA
 4 MID023.A0101            Li02_LS_2019-12-09_MID                                            NA
 5 MID024.A0101            Li02_LS_2019-12-09_MID                                            NA
 6 MID025.A0101            Li02_LS_2019-12-09_MID                                            NA
 7 MID026.A0101            Li02_LS_2019-12-09_MID                                            NA
 8 MID027.A0101            Li02_LS_2019-12-09_MID                                            NA
 9 MID028.A0101            Li02_LS_2019-12-09_MID                                            NA
10 MID029.A0101            Li02_LS_2019-12-09_MID                                            NA
Where the values for MID022-MID029 are entered in Pandora as, for example: 4,05E+18 rather than 4.05E+18
from sidora.core.
These particular ones are locked because they're more than 6 months old, and would need to be adjusted by someone with Pandora editing permission. Given all the work people have right now, that fix probably isn't going to happen; it's a small issue for them. I don't know how common it is beyond these batches.
This is all from one tech, and I can point out that she needs to change her settings in Excel/LibreOffice for the future, but I can't be sure that will actually happen. And since a lot of details can slip by or be missed during Pandora entry training, it may happen again with new HiWis or new techs.
Which is to say it would definitely be helpful to have this caught somewhere, whether that means Pandora giving an error during the batch upload/manual entry stage, or a check in sidora.
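For illustration, a check on the sidora side could normalise decimal commas before converting to a number. This is only a sketch: `parse_decimal_comma()` is a hypothetical helper, not a sidora.core function, and it assumes the entries never contain thousands separators.

```r
# Hypothetical helper (not part of sidora.core): parse numbers that may have
# been entered with a decimal comma, e.g. "4,05E+18".
parse_decimal_comma <- function(x) {
  x <- trimws(as.character(x))
  # Replace the first comma with a point; assumes no thousands separators.
  x <- sub(",", ".", x, fixed = TRUE)
  # Anything that still cannot be parsed becomes NA instead of a wrong number.
  suppressWarnings(as.numeric(x))
}

parsed <- parse_decimal_comma(c("4,05E+18", "4.05E+18", "332000000000", "n/a"))
parsed
```

Here the first two entries parse to the same value and the unparseable entry becomes NA, so bad input stays visible downstream.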
Also sometimes the values are missing in library.Quantification_pre-Indexing_total
sites_df %>%
  filter(library.Full_Library_Id == "CDM020.A0203") %>%
  select(contains("Quantification"))
# A tibble: 1 x 2
  `library.Quantification_pre-Indexing_total` `library.Quantification_post-Indexing_total`
                                         <int>                                        <int>
1                                           NA                                           NA
Online the values are:
pre - 11600000000
post - 5130000000000
I'm pretty sure that has something to do with R integer size limits: https://stackoverflow.com/questions/14589354/struggling-with-integers-maximum-integer-size
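A quick way to see the limit in base R, using the pre-indexing value quoted above (a small sketch; the variable names are just for illustration):

```r
# R's integer type is 32-bit, so the largest value it can hold is 2^31 - 1.
int_max <- .Machine$integer.max

# Pre-indexing total shown online for CDM020.A0203, as a string.
pre_total <- "11600000000"

# Out of integer range: R returns NA (and normally emits a warning).
as_int <- suppressWarnings(as.integer(pre_total))

# A double can hold this particular value without loss.
as_dbl <- as.numeric(pre_total)

int_max
as_int
as_dbl
```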
Phew. We could probably think about enabling 64-bit integers (e.g. https://cran.r-project.org/web/packages/bit64/bit64.pdf), but I have the feeling that might become hell soon, with all sorts of unexpected effects. So maybe we downscale or store these values as strings?
Ah poop... that's sucky.
I don't like the idea of storing them as strings as then you can't actually use the data for computations (which is ultimately what sidora should allow). The 64bit integers might not be a bad idea if we consider we don't actually have much numeric data. I guess the worry is whether it'll break other packages etc..., is that correct?
Yes - that, and weird effects if you want to handle these columns together with other integers/doubles. It's basically its own data type, different from "normal" numbers.
Could you quickly explain to me why these numbers are so big? What does this number mean? Couldn't we just divide by 10^6?
If we want to introduce this data type, that should be as easy as adding bit64 to the dependencies in the DESCRIPTION file, changing the type in the relevant fields in data-raw/pandora_column_types.tsv, and adding the as.integer64 command here:
sidora.core/R/dataprep_enforce_types.R
Lines 52 to 64 in 9f66a05
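As a rough sketch of what the conversion step could look like, assuming the bit64 package is installed. This is not the actual code in dataprep_enforce_types.R, and the variable names are made up for illustration:

```r
# Requires the bit64 package: install.packages("bit64")
library(bit64)

# Totals quoted in this thread, kept as strings so nothing is lost on input.
raw_totals <- c("11600000000", "5130000000000")

# 32-bit integers cannot hold these values; integer64 can.
totals_int64 <- as.integer64(raw_totals)

class(totals_int64)
as.character(totals_int64)

# Convert explicitly before passing the column to code that does not
# understand integer64.
totals_dbl <- as.numeric(totals_int64)
```

The explicit `as.numeric()` step is there because integer64 is a separate class, so functions that only expect base integers or doubles may not handle it.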
These are the estimated total numbers of adapter-ligated (pre-indexing) and indexed (post-indexing) DNA fragments in the library, based on the qPCR results. The numbers are pasted from the calculation sheets into Pandora upload sheets. I worry that changing them in any way (such as dividing by 10^6) would lead to confusion, so I would prefer to leave them as they are, if possible.
I like the suggestion of as.integer64!
Alright - thank you for the explanation. Then integer64 it is.
Not sure if you need to know this, but something you might want to be aware of - ggplot doesn't know how to handle these numbers. I ended up with a blank plot, with the error:
Don't know how to automatically pick scale for object of type integer64. Defaulting to continuous.
Removed 115 rows containing missing values (geom_point).
I had to convert them with as.numeric to have my points show up.
Yes - that's one of the effects I predicted above. Good that you found a workaround, but be careful with it: representing big integers as numerics can have unexpected side effects as well.
I suggest dividing by 10^X for analysis and plotting.
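To make the caveat concrete: doubles only hold integers exactly up to 2^53, so very large counts can silently lose their last digits. A small base R sketch using two of the post-indexing totals from this thread (the variable names and the choice of 10^6 are just for illustration):

```r
# Doubles represent every integer exactly only up to 2^53 (about 9.007e15).
exact_limit <- 2^53
bumped <- exact_limit + 1
bumped == exact_limit  # the +1 is lost to rounding

# Post-indexing totals from the thread, rescaled to millions for plotting.
post_total <- c(332000000000, 518000000000)
post_total_millions <- post_total / 1e6
post_total_millions
```

Values like these are well below 2^53, so the conversion is safe for them; an entry such as 4.05E+18 is above that limit.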
Thanks for finding the source of the issue already!
This is a classic shit-in-shit-out-problem. We could probably catch this, but seriously: These entries in the DB are simply wrong. Can't they just be fixed there? Or is this prevented by the almost non-existent user-management of Pandora?
Please check if sidora.core v0.1.1 fixes the problem at hand for you.
The input check should clearly be done by Pandora, which should only accept numeric values in a certain format. I would say report the issue, but afaik there is no place to submit tickets.
Yes it works now, thanks!
Maybe @jfy133 can tell me who to talk to about having that format check added to Pandora?
@ivelsko Robert Färber!
Related Issues (20)
- convert datetime-columns automatically HOT 9
- Automatic lookup of capture probe set and other easy-to-lookup columns HOT 16
- Add analysis_result_string table loading HOT 3
- Request convinence function to provide flat table of non-sequential tables.
- Add input validation checks for all functions HOT 2
- Analysis tab format is inconsistent with the other tabs HOT 4
- Allow data-entry updating utility functions? HOT 4
- Convert blank cells to NA for consistency?
- Update documentation clarification and error message for new release.
- Add support for new coredb ID entries HOT 2
- convert_all_ids_to_values() requires con
- Make function to call `TAB_Field_Comment` HOT 1
- Fake columns HOT 2
- Update to support new columns HOT 1
- Standardise logical columns
- List of Release names
- Erroneous entries for library index IDs HOT 10
- Soften unnecessarily strict check for table download HOT 1
- Provided environment not working HOT 1