The value in the Library tab 'Quantification post-Indexing' box isn't being read, and

missing data in library.Quantification_post-Indexing_total about sidora.core HOT 15 CLOSED

sidora-tools commented on July 17, 2024

missing data in library.Quantification_post-Indexing_total

from sidora.core.

Comments (15)

ivelsko commented on July 17, 2024

I found another issue with this one - when the numbers are entered European-style (with , instead of .) they get converted to 0. Would you be able to get those entries read properly?

library_info <- get_df("TAB_Library", con)
library_info <- library_info %>% convert_all_ids_to_values()
library_info %>%
  select(library.Full_Library_Id, library.Batch, `library.Quantification_post-Indexing_total`) %>%
  filter(str_detect(library.Full_Library_Id, "MID02"))

# A tibble: 10 x 3
   library.Full_Library_Id library.Batch          `library.Quantification_post-Indexing_total`
   <chr>                   <chr>                                                       <int64>
 1 MID020.A0101            Li01_SA_2016-11-17                                     332000000000
 2 MID021.A0101            Li01_SA_2016-11-17                                     518000000000
 3 MID022.A0101            Li02_LS_2019-12-09_MID                                           NA
 4 MID023.A0101            Li02_LS_2019-12-09_MID                                           NA
 5 MID024.A0101            Li02_LS_2019-12-09_MID                                           NA
 6 MID025.A0101            Li02_LS_2019-12-09_MID                                           NA
 7 MID026.A0101            Li02_LS_2019-12-09_MID                                           NA
 8 MID027.A0101            Li02_LS_2019-12-09_MID                                           NA
 9 MID028.A0101            Li02_LS_2019-12-09_MID                                           NA
10 MID029.A0101            Li02_LS_2019-12-09_MID                                           NA

Where the values for MID022-MID029 are entered in Pandora as, for example: 4,05E+18 rather than 4.05E+18
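
A minimal base R sketch of how such decimal-comma entries could be normalised before numeric conversion. The helper name `parse_decimal_comma` is hypothetical and is not part of sidora.core. It assumes the values arrive as character strings and that a comma is only ever a decimal mark, never a thousands separator.

```r
# Hypothetical helper (not from sidora.core): treat a comma as the decimal
# mark, then convert to double. Assumes commas are never thousands separators.
parse_decimal_comma <- function(x) {
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

raw <- c("4,05E+18", "4.05E+18", "332000000000", NA)
parsed <- parse_decimal_comma(raw)
print(parsed)
```

Both spellings of 4.05E+18 end up as the same number, and a missing entry stays NA instead of silently becoming 0.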

ivelsko commented on July 17, 2024

These particular ones are locked because they're more than 6 months old, and would need to be adjusted by someone with Pandora editing permission. Given everyone's current workload, that fix probably isn't going to happen; it's a small issue for them. I don't know how common it is beyond these batches.

This is all from one tech, and I can point out that she needs to change her settings in Excel/LibreOffice for the future, but I can't be sure that will actually happen. And since a lot of details can slip by or be missed during Pandora entry training, it may happen again with new HiWis or new techs.

Which is to say it would definitely be helpful to have this caught somewhere, whether by having Pandora give an error during the batch upload/manual entry stage, or in sidora.

ivelsko commented on July 17, 2024

Also sometimes the values are missing in library.Quantification_pre-Indexing_total

sites_df %>% filter(library.Full_Library_Id == "CDM020.A0203") %>% select(contains("Quantification"))
# A tibble: 1 x 2
  `library.Quantification_pre-Indexing_total` `library.Quantification_post-Indexing_total`
                                        <int>                                        <int>
1                                          NA                                           NA

Online the values are:
pre - 11600000000
post - 5130000000000

nevrome commented on July 17, 2024

I'm pretty sure that has something to do with R integer size limits: https://stackoverflow.com/questions/14589354/struggling-with-integers-maximum-integer-size

Phew. We could probably think about enabling 64-bit integers (e.g. https://cran.r-project.org/web/packages/bit64/bit64.pdf), but I have the feeling that might become hell soon, with all sorts of unexpected effects. So maybe we downscale or store these values as strings?
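
To illustrate the size limit mentioned above, here is a small base R sketch. It uses the post-indexing value reported online for CDM020.A0203 and shows that it does not fit into R's native 32-bit integer type. The variable names are only for illustration.

```r
# R's native integer type is a 32-bit signed integer.
max_int <- .Machine$integer.max

# Value reported online for CDM020.A0203 (written here as a double literal).
post_total <- 5130000000000

# Coercing a double that is out of integer range yields NA (with a warning).
as_int <- suppressWarnings(as.integer(post_total))

print(max_int)
print(post_total > max_int)
print(is.na(as_int))
```

The pre-indexing value 11600000000 is also above this limit, which is consistent with both columns showing NA.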

jfy133 commented on July 17, 2024

Ah poop... that's sucky.

I don't like the idea of storing them as strings as then you can't actually use the data for computations (which is ultimately what sidora should allow). The 64bit integers might not be a bad idea if we consider we don't actually have much numeric data. I guess the worry is whether it'll break other packages etc..., is that correct?

nevrome commented on July 17, 2024

Yes - that, and weird effects if you want to handle these columns together with other integers/doubles. It's basically its own data type, different from "normal" numbers.

Could you quickly explain to me why these numbers are so big? What does this number mean? Couldn't we just divide by 10^6? 😄

If we want to introduce this data type that should be as easy as adding bit64 to the dependencies in the DESCRIPTION file, changing the type in the relevant fields in data-raw/pandora_column_types.tsv and adding the as.integer64 command here:

sidora.core/R/dataprep_enforce_types.R

Lines 52 to 64 in 9f66a05

string_to_as <- function(x) {
  switch(
    x,
    "integer" = as.integer,
    "double" = as.numeric,
    "factor" = as.factor,
    "logical" = as.logical,
    "character" = as.character,
    "yesno_logical" = yesno_logical_to_logical,
    "datetime" = as.POSIXct,
    NA
  )
}
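
A sketch of what the proposed change to this function might look like, assuming the type label used in data-raw/pandora_column_types.tsv would be "integer64" (the label is a guess, not confirmed by the thread). `bit64::as.integer64` is the real conversion function from the bit64 package. The `yesno_logical_to_logical` defined here is only a stand-in with assumed behaviour so that the snippet runs outside sidora.core.

```r
# Stand-in for the sidora.core helper so this snippet runs on its own;
# the behaviour shown here is an assumption.
yesno_logical_to_logical <- function(x) {
  ifelse(x == "Yes", TRUE, ifelse(x == "No", FALSE, NA))
}

string_to_as <- function(x) {
  switch(
    x,
    "integer" = as.integer,
    "integer64" = bit64::as.integer64, # proposed addition, needs bit64
    "double" = as.numeric,
    "factor" = as.factor,
    "logical" = as.logical,
    "character" = as.character,
    "yesno_logical" = yesno_logical_to_logical,
    "datetime" = as.POSIXct,
    NA
  )
}

# Only try the conversion when bit64 is installed.
if (requireNamespace("bit64", quietly = TRUE)) {
  to_int64 <- string_to_as("integer64")
  big <- to_int64("5130000000000")
  print(big)
}
```

Because `switch` only evaluates the matching branch, the bit64 namespace is touched only when an "integer64" column is actually requested.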

ivelsko commented on July 17, 2024

These are the estimated total number of adapter-ligated (pre-indexing) and indexed (post-indexing) DNA fragments in the library, based on the qPCR results. The numbers are pasted from the calculation sheets into Pandora upload sheets. I worry changing them in any way (such as dividing by 10^6) would lead to confusion, so I would prefer to leave them, if possible

jfy133 commented on July 17, 2024

☝️ I think this is important: these are exact counts but can range massively, so a divided value wouldn't always be easy to interpret. I agree with Irina that we don't want to change this, as reporting in that way is basically never done (as far as I know) and would very likely lead to confusion.

I like the suggestion of as.integer64!

nevrome commented on July 17, 2024

Alright - thank you for the explanation. Then integer64 it is 🤷‍♂️

ivelsko commented on July 17, 2024

Not sure if you need to know this, but something you might want to be aware of - ggplot doesn't know how to handle these numbers. I ended up with a blank plot, with the error:

Don't know how to automatically pick scale for object of type integer64. Defaulting to continuous. 
Removed 115 rows containing missing values (geom_point).

I had to convert them with as.numeric to have my points show up.

nevrome commented on July 17, 2024

Yes - that's one of the effects I predicted above. Good that you found a workaround, but be careful with that: representing big integers as numerics can have unexpected side effects as well.

I suggest dividing by 10^X for analysis and plotting.
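
A short base R sketch of both points. Doubles carry 53 bits of integer precision, so very large whole numbers are no longer exact to the unit, and scaling before plotting keeps the values readable. The scale factor of 1e9 and the variable names are arbitrary choices for illustration.

```r
# Above 2^53, adjacent whole numbers can collapse onto the same double.
limit <- 2^53
print(limit + 1 == limit)

# Scale counts before plotting; the original column stays untouched.
post_total <- c(332000000000, 518000000000, 5130000000000)
post_total_billions <- post_total / 1e9
print(post_total_billions)
```

Values such as 4.05E+18 are well above 2^53, so after `as.numeric` they should be treated as approximate rather than exact counts.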

nevrome commented on July 17, 2024

Thanks for finding the source of the issue already!

This is a classic shit-in-shit-out-problem. We could probably catch this, but seriously: These entries in the DB are simply wrong. Can't they just be fixed there? Or is this prevented by the almost non-existent user-management of Pandora?

nevrome commented on July 17, 2024

Please check if sidora.core v0.1.1 fixes the problem at hand for you.

The input check should clearly be done by Pandora, which should only accept numeric values in a certain format. I would say report the issue, but afaik there is no place to submit tickets 🤷‍♂️

ivelsko commented on July 17, 2024

Yes it works now, thanks!

Maybe @jfy133 can tell me who to talk to about having that format check added to Pandora?

jfy133 commented on July 17, 2024

@ivelsko Robert Färber!
