Comments (15)

ivelsko commented on July 17, 2024

I found another issue with this one - when the numbers are entered European-style (with , instead of .) they get converted to 0. Would you be able to get those entries read properly?

library_info <- get_df("TAB_Library", con)
library_info <- library_info %>% convert_all_ids_to_values()
library_info %>%
  select(library.Full_Library_Id, library.Batch, `library.Quantification_post-Indexing_total`) %>%
  filter(str_detect(library.Full_Library_Id, "MID02"))

# A tibble: 10 x 3
   library.Full_Library_Id library.Batch          `library.Quantification_post-Indexing_total`
   <chr>                   <chr>                                                       <int64>
 1 MID020.A0101            Li01_SA_2016-11-17                                     332000000000
 2 MID021.A0101            Li01_SA_2016-11-17                                     518000000000
 3 MID022.A0101            Li02_LS_2019-12-09_MID                                           NA
 4 MID023.A0101            Li02_LS_2019-12-09_MID                                           NA
 5 MID024.A0101            Li02_LS_2019-12-09_MID                                           NA
 6 MID025.A0101            Li02_LS_2019-12-09_MID                                           NA
 7 MID026.A0101            Li02_LS_2019-12-09_MID                                           NA
 8 MID027.A0101            Li02_LS_2019-12-09_MID                                           NA
 9 MID028.A0101            Li02_LS_2019-12-09_MID                                           NA
10 MID029.A0101            Li02_LS_2019-12-09_MID                                           NA

Where the values for MID022-MID029 are entered in Pandora as, for example: 4,05E+18 rather than 4.05E+18
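
A minimal sketch of how such entries could be normalized before conversion, assuming the raw values arrive as character strings. `parse_decimal_comma` is a hypothetical helper for illustration and not part of sidora.core:

```r
# Hypothetical helper (not part of sidora.core): treat a decimal comma
# as a decimal point before converting the string to a number.
parse_decimal_comma <- function(x) {
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# Works for both notations; missing values stay missing.
parse_decimal_comma(c("4,05E+18", "4.05E+18", NA))
```

This only replaces the first comma in each string, so it would mangle values that use commas as thousands separators.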

from sidora.core.

ivelsko commented on July 17, 2024

These particular ones are locked because they're more than 6 months old, and would need to be adjusted by someone with Pandora editing permission (so that fixing's probably not going to happen with all the work people have right now, it's a small issue for them). I don't know how common it is beyond these batches.

This is all from 1 tech, and I can point out that she needs to change her settings in Excel/LibreOffice for the future, but I can't be sure that will actually happen. And since a lot of details can slip by or be missed during Pandora entry training, it may happen again through new HiWis or new techs.

Which is to say it would definitely be helpful to have this caught somewhere, whether it's having Pandora give an error during the batch upload/manual entry stage, or in sidora.

ivelsko commented on July 17, 2024

Also sometimes the values are missing in library.Quantification_pre-Indexing_total

sites_df %>% filter(library.Full_Library_Id == "CDM020.A0203") %>% select(contains("Quantification"))
# A tibble: 1 x 2
  `library.Quantification_pre-Indexing_total` `library.Quantification_post-Indexing_total`
                                        <int>                                        <int>
1                                          NA                                           NA

Online the values are:
pre - 11600000000
post - 5130000000000

nevrome commented on July 17, 2024

I'm pretty sure that has something to do with R integer size limits: https://stackoverflow.com/questions/14589354/struggling-with-integers-maximum-integer-size
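
The limit can be checked directly in base R. This is a small illustration using the pre-indexing value quoted above, not the actual sidora.core import code:

```r
# R's built-in integer type is 32-bit.
max_int <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# A value above that limit cannot be stored as an integer:
# the conversion yields NA (with a warning, suppressed here).
too_big <- suppressWarnings(as.integer("11600000000"))
is.na(too_big)                    # TRUE
```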

Phew. We could probably think about enabling 64bit integers (e.g. https://cran.r-project.org/web/packages/bit64/bit64.pdf), but I have the feeling that might become hell soon with all sorts of unexpected effects. So maybe we downscale or store these values as strings?

jfy133 commented on July 17, 2024

Ah poop... that's sucky.

I don't like the idea of storing them as strings as then you can't actually use the data for computations (which is ultimately what sidora should allow). The 64bit integers might not be a bad idea if we consider we don't actually have much numeric data. I guess the worry is whether it'll break other packages etc..., is that correct?

nevrome commented on July 17, 2024

Yes - that and weird effects if you want to handle these columns together with other integers/doubles. It's basically its own data type, different from "normal" numbers.

Could you quickly explain to me why these numbers are so big? What does this number mean? Couldn't we just divide by 10^6? 😄

If we want to introduce this data type that should be as easy as adding bit64 to the dependencies in the DESCRIPTION file, changing the type in the relevant fields in data-raw/pandora_column_types.tsv and adding the as.integer64 command here:

string_to_as <- function(x) {
  switch(
    x,
    "integer" = as.integer,
    "double" = as.numeric,
    "factor" = as.factor,
    "logical" = as.logical,
    "character" = as.character,
    "yesno_logical" = yesno_logical_to_logical,
    "datetime" = as.POSIXct,
    NA
  )
}
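
A rough sketch of what the extended lookup might look like, assuming the bit64 package is installed. The `"integer64"` type label is an assumption about how the column would be named in data-raw/pandora_column_types.tsv, and the mapping is trimmed to base R converters so that it runs without the sidora.core helper `yesno_logical_to_logical`:

```r
library(bit64)

# Sketch only: the "integer64" label is an assumed name for the new type.
string_to_as <- function(x) {
  switch(
    x,
    "integer" = as.integer,
    "integer64" = bit64::as.integer64,  # new: 64-bit integers
    "double" = as.numeric,
    "character" = as.character,
    NA
  )
}

# The post-indexing value that overflowed a 32-bit integer now converts.
post_total <- string_to_as("integer64")("5130000000000")
as.character(post_total)
```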

ivelsko commented on July 17, 2024

These are the estimated total number of adapter-ligated (pre-indexing) and indexed (post-indexing) DNA fragments in the library, based on the qPCR results. The numbers are pasted from the calculation sheets into Pandora upload sheets. I worry that changing them in any way (such as dividing by 10^6) would lead to confusion, so I would prefer to leave them, if possible.

jfy133 commented on July 17, 2024

☝️ I think this is important: these are exact counts but can range massively, so division wouldn't always be easily interpretable. I agree with Irina that we don't want to change this, as reporting in that way is basically never done (as far as I know) and will very likely lead to confusion.

I like the suggestion of as.integer64!

nevrome commented on July 17, 2024

Alright - thank you for the explanation. Then integer64 it is 🤷‍♂️

ivelsko commented on July 17, 2024

Not sure if you need to know this, but something you might want to be aware of - ggplot doesn't know how to handle these numbers. I ended up with a blank plot, with the error:

Don't know how to automatically pick scale for object of type integer64. Defaulting to continuous. 
Removed 115 rows containing missing values (geom_point).

I had to convert them with as.numeric to have my points show up.

nevrome commented on July 17, 2024

Yes - that's one of the effects I predicted above. Good that you found a workaround, but be careful with that: representing big integers as numerics can have unexpected side effects as well.

I suggest dividing by 10^X for analysis and plotting.
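
To illustrate the side effect and the rescaling idea in base R: doubles represent integers exactly only up to 2^53, so very large counts can silently lose their last digits once converted with as.numeric. The scaling factor of 10^9 below is just an example value for X:

```r
# Above 2^53, a double can no longer distinguish neighbouring integers.
exact_limit <- 2^53
exact_limit + 1 == exact_limit   # TRUE: the +1 is lost

# Rescaling for plotting, using the two counts quoted earlier in the thread.
counts <- c(11600000000, 5130000000000)
counts_scaled <- counts / 1e9    # in units of 10^9 fragments
counts_scaled
```

The axis label then has to state the unit (here "x 10^9 fragments") so the plotted values are not mistaken for raw counts.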

nevrome commented on July 17, 2024

Thanks for finding the source of the issue already!

This is a classic shit-in-shit-out-problem. We could probably catch this, but seriously: These entries in the DB are simply wrong. Can't they just be fixed there? Or is this prevented by the almost non-existent user-management of Pandora?

nevrome commented on July 17, 2024

Please check if sidora.core v0.1.1 fixes the problem at hand for you.

The input check should clearly be done by Pandora, which should only accept numeric values in a certain format. I would say report the issue, but afaik there is no place to submit tickets 🤷‍♂️

ivelsko commented on July 17, 2024

Yes it works now, thanks!

Maybe @jfy133 can tell me who to talk to about having that format check added to Pandora?

jfy133 commented on July 17, 2024

@ivelsko Robert Färber!
