usepa / comptox-toxcast-tcpl

US EPA's Toxicity Forecaster (ToxCast) Pipeline

Home Page: https://cran.r-project.org/package=tcpl

License: Other

R 99.99% Rez 0.01%

ord comptox ccte

comptox-toxcast-tcpl's Issues

tcplLite overwriting mc0 with replicate plates

I am running into a problem where I have run a replicate plate in a later experiment and want to add that data to my existing database. The plate is an exact replicate of a plate already loaded, but it has a new apid and a new scrf. However, tcplWriteLvl0 overwrites the plate that already exists in the database (in mc0.csv) with the new data instead of adding it as expected. Any ideas?

Thanks!
Amanda

Add compare parameter in tcplPlot

Add "compare" parameter to allow concentration response curves to be easily compared within the same plot. Default would remain as compare=FALSE.

Previously: tcplMakeAeidPlts( c(aeid1, aeid2), compare = T, lvl = 5L) created an output that showed two aeids on same plots for same chemical. Ex:

New functionality could also compare different chemicals within same endpoint, or user could select the m5ids they want to compare (axis scaled)

blineShift bug when # of unique concentrations < 4

I think I've run into an issue with the way the blineShift() function works in cases where fewer than 4 unique concentrations are provided.

Minimal reproducible example:

logc in order (Correct Result)

logc = c(1, 2, 3)
resp = c(1, 2, 3)
rslt = tcpl:::blineShift(resp , logc, 50)
all(resp == rslt)

Result: TRUE

logc in non-ascending order (Incorrect Result)

logc = c(3, 2, 1)
resp = c(3, 2, 1)
rslt = tcpl:::blineShift(resp , logc, 50)
print(all(resp == rslt))
message(sprintf("blineShift result should be %s but is %s", paste0(resp, collapse = ","), paste0(rslt, collapse = ",")))

Result: FALSE
blineShift result should be 3,2,1 but is 1,2,3

Example in v3_2 of the database

Here is an example from the current version of the database where there are only 3 unique concentrations at each replicate index. I've only included replicate index 1 for the sake of a clearer example. The resp.pc column is the percentage of positive control before the baseline shift occurs and the resp column is the result after the baseline shift as calculated by TCPL.

SELECT 
    mc4.m4id,
    mc0.m0id,
    mc0.rval,
    mc3.bval,
    mc3.pval,
    mc3.logc,
    ((rval - bval) / (pval - bval)) * 100 as "resp.pc",
    mc3.resp
FROM mc4
INNER JOIN mc4_agg ON mc4_agg.m4id = mc4.m4id
INNER JOIN mc3 ON mc3.m3id = mc4_agg.m3id
INNER JOIN mc1 ON mc1.m1id = mc4_agg.m1id
INNER JOIN mc0 ON mc0.m0id = mc4_agg.m0id
WHERE mc4.spid = 'RO16-6491' 
AND mc4.aeid = 605 
AND repi = 1

m4id	m0id	rval	bval	pval	logc	resp.pc	resp
7674087	94599642	4175.56	7805.2	440.92	-1.30102999566398	49.2870993498346	49.2870993498346
7674087	94599637	429.66	7805.2	440.92	0.698970004336019	100.15290021563547	93.9414307983944
7674087	94599639	887.09	7805.2	440.92	-0.301029995663981	93.9414307983944	100.152900215635

Because there are fewer than 4 unique concentrations, the blineShift() function should just return the responses without modification. If you take the logc and resp.pc columns from the table above and call blineShift(), the result should just be the resp.pc values in the same order, but that does not happen.

logc = c(-1.30103, 0.69897, -0.30103)
resp = c(49.287,   100.153,   93.941)
rslt = tcpl:::blineShift(resp , logc, 50)
print(all(resp == rslt))

Result: FALSE

If, however, you first order the logc/resp pairs, the function returns the expected result:

logc = c(-1.30103, -0.30103, 0.69897)
resp = c(49.287,     93.941,   100.153)
rslt = tcpl:::blineShift(resp , logc, 50)
print(all(resp == rslt))

Result: TRUE

The underlying issue:

Here's a snippet from the blineShift.R file that demonstrates the issue.

blineShift.R

blineShift <- function(resp, logc, wndw) {
  
  wndw <- unique(wndw)[1]
  
  ordr <- order(logc)
  resp <- resp[ordr]
  logc <- logc[ordr]
  
  uconc <- unique(logc)
  nconc <- length(uconc)
  
  if (any(is.na(resp))) return(resp)
  if (nconc < 4) return(resp)

...

The bug happens because the provided logc and resp parameters are sorted by logc at the top of the function, and the sorted vectors overwrite the originals.

ordr <- order(logc)
resp <- resp[ordr]
logc <- logc[ordr]

When we hit the check:

if(nconc < 4) return(resp)

we exit the function returning the reordered responses. This doesn't happen in the case of 4 or more unique concentrations as the shifted responses are placed back into their original order before returning.

resp[order(ordr)]

A simple fix for this issue would be to do the 2 if checks (NA and unique concentrations) before reordering the logc and resp variables. Something like:

blineShift <- function(resp, logc, wndw) {
  wndw <- unique(wndw)[1]
  uconc <- unique(logc)
  nconc <- length(uconc)

  if (any(is.na(resp))) return(resp)
  if (nconc < 4) return(resp)

  ordr <- order(logc)
  resp <- resp[ordr]
  logc <- logc[ordr]

  ...
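
The pattern behind the proposed fix can be sketched in isolation. The function below is a hypothetical stand-in, not the tcpl implementation: `shift_sketch`, its `min_conc` argument, and the placeholder transform are assumptions made only to show that early returns placed before the sort preserve input order, while the full path restores it with `order(ordr)`.

```r
# Hypothetical helper, not tcpl's blineShift(): it only illustrates the
# "guard, sort, transform, restore original order" pattern.
shift_sketch <- function(resp, logc, min_conc = 4) {
  # Guards run before any reordering, so early returns keep the input order.
  if (any(is.na(resp))) return(resp)
  if (length(unique(logc)) < min_conc) return(resp)

  ordr <- order(logc)
  sorted <- resp[ordr]

  # Placeholder transform: subtract the response at the lowest concentration.
  shifted <- sorted - sorted[1]

  # order(ordr) is the inverse permutation, mapping back to the input order.
  shifted[order(ordr)]
}

few  <- shift_sketch(c(3, 2, 1), c(3, 2, 1))           # fewer than 4 concentrations
many <- shift_sketch(c(40, 10, 30, 20), c(4, 1, 3, 2)) # full path
print(few)
print(many)
```

With descending input and only three concentrations, the sketch returns the responses untouched, which is the behavior the issue expects from blineShift().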

Add ToxCast pipeline overview diagram to vignettes

Insert diagram (.jpg created for manuscript) to the processing and introductory vignettes

bug building v2 vignette

Due to new constraints on lvl 0, conc values of 0 can no longer be input, which breaks the vignette build. Change the value in the vignette to something other than 0.

Quitting from lines 189-191 (Data_processing-Archive_tcpl_v2.Rmd)
Error: processing vignette 'Data_processing-Archive_tcpl_v2.Rmd' failed with diagnostics:
Cannot process a concentration value of 0. Confirm source document lists 0 as concentration value and rerun with dummy value of .01
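
A minimal sketch of the workaround the error message suggests, assuming the vignette data holds concentrations in a plain numeric vector. The variable names are hypothetical, and the 0.01 dummy value is taken from the message itself. A likely reason for the constraint is that `log10(0)` is `-Inf`, though the source does not state this.

```r
# Hypothetical concentration vector containing a zero, as in the old vignette.
conc <- c(0, 0.1, 1, 10)

# Dummy value suggested by the error message.
dummy <- 0.01

# Replace exact zeros before any log transform is applied.
conc_fixed <- ifelse(conc == 0, dummy, conc)
logc <- log10(conc_fixed)
print(logc)
```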

Data retrieval vignette update: Annotations retrieval

Add a new annotations section between the data and chemical information sections, with example query statements. The following text is adapted from the annotations SOP.

Assay Annotations
Assay source, assay, assay component, and assay endpoint are registered via tcpl scripting into a
collection of tables. The database structure takes the annotations and organizes them as attributes
of the assay conductors, the assays (i.e., experiments), the assay components (i.e., raw readouts),
or the assay endpoints (i.e., normalized component data) enabling aggregation and differentiation
of the data generated through ToxCast and Tox21. The annotations capture four types of information:
i. Identification information
ii. Design information such as the technology, format, and objective aspects that decompress the assay’s innovations,
iii. Target information such as the target of technological measurement and the
biologically intended target, and
iv. Analysis information about how the data were processed and analyzed.

# load libraries and open a connection
library(RMySQL)
library(tcpl)
con <- dbConnect(drv = RMySQL::MySQL(), username = "user", password = "pass", dbname = "InvitroDB", host = "host")
# use the assay source table to identify which ids are needed in subsequent queries
tcplLoadAsid()
source <- tcplLoadAeid(fld = "asid", val = 1, add.fld = c("aid", "anm", "acid", "acnm"))
# query the annotation tables with RMySQL, then subset by id or name
assay <- dbGetQuery(con, "SELECT * FROM invitrodb.assay WHERE aid = 1;")
component <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component;")
component <- subset(component, acid %in% source$acid)
endpoint <- dbGetQuery(con, "SELECT * FROM invitrodb.assay_component_endpoint;")
endpoint <- endpoint[grepl("ATG", endpoint$assay_component_endpoint_name), ]

Make tcplCytoPt backwards compatible

Once functionality is confirmed for 4.1 release we need to ensure we can use the same function to recalculate 3.5 and earlier with the same methods

tcplCytoPt() update

When connected to prod_internal_invitrodb_v4_0, tcplCytoPt() fails because it seems to look for modl_ga in the wrong table, so the function could not be run for invitrodb v4.0. I think it could be updated to work with the new data schema (even without any updates to the burst assay annotation). Thank you for investigating.

tcplConf(drvr='MySQL', db='prod_internal_invitrodb_v4_0', pass='', user='', host='ccte-mysql-res.epa.gov')
cyto4 <- tcplCytoPt()
[1] 1
[1] 2
[1] 3
[1] 3.1
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
Error in setorderv(dat, c("aeid", "chid", "fitc.ordr", "nflg", "modl_ga", :
some columns are not in the data.table: modl_ga

Trying to use R with MySQL: The used command is not allowed with this MySQL version

If you are using tcpl with MySQL (v8.0+) installed locally, you may run into this issue. This will be true for any package that relies on the RMySQL dependency. Please see the suggested resolution here:
https://stackoverflow.com/questions/50745431/trying-to-use-r-with-mysql-the-used-command-is-not-allowed-with-this-mysql-vers?rq=1

Add medresp.3bmad flag

Add mc6 flag to encompass tcpl v2 fit_all parameter logic. This will flag concentration series where at least one median resp value is not greater than 3bmad

Reprocess mc6 for all endpoints with flag added

NULL chid in tcplCytoPt output

1 cytotox record failed to load in datamart because the chid value is null. We calculate cytotox based on a subset of values where chid should always be present. Investigate why NULL is present and correct

chid	cytotox_median_raw	cytotox_mad	global_mad	cytotox_median_log	cytotox_median_um	cytotox_lower_bound_um	ntested	nhit	cytotox_lower_bound_log	created_date
NULL	-1.628288418	0.21821334	0.159706964	-1.628288418	0.023534858	0.007808915	65	25	-2.107409311	5/2/2023

bmad calculation

Why doesn't "bmad" calculation use "constant=1" within R's "mad" function?

For example, instead of

e1 <- bquote(dat[ , bmad := mad(resp[cndx %in% 1:2 & wllt == "t"], na.rm = TRUE)])

why not

e1 <- bquote(dat[ , bmad := mad(resp[cndx %in% 1:2 & wllt == "t"], constant = 1, na.rm = TRUE)]) ?

I think that the default value should be used for larger samples. In the "bmad" calculation, we only consider 2 responses for the two lowest concentrations.

Below I show the difference between default mad function and with constant=1:

mad(c(6,7,7,8,12,14,15,16,16,19,22,24,26,26,29,46))
[1] 11.8608

mad(c(6,7,7,8,12,14,15,16,16,19,22,24,26,26,29,46),constant=1)
[1] 8
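
For context, the two results above differ by exactly the default scaling factor. R's `mad()` multiplies the raw median absolute deviation by `constant = 1.4826` (approximately `1 / qnorm(0.75)`), which makes it a consistent estimator of the standard deviation for normally distributed data. A short check with the same vector:

```r
x <- c(6, 7, 7, 8, 12, 14, 15, 16, 16, 19, 22, 24, 26, 26, 29, 46)

raw_mad    <- mad(x, constant = 1)  # unscaled median absolute deviation
scaled_mad <- mad(x)                # default constant = 1.4826

# The ratio recovers the default scaling constant.
ratio <- scaled_mad / raw_mad
print(c(raw = raw_mad, scaled = scaled_mad, ratio = ratio))

# The constant is approximately the reciprocal of the 75th normal percentile.
print(1 / qnorm(0.75))
```

Whether the scaled or unscaled value is the better choice for bmad is the open question of this issue; the sketch only shows where the difference comes from.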

Update License to 2023

@cthunes please add your name to contributors on tcpl

Update authorship?

Hi, it seems we either misunderstood or behavior is changed, but there are packages on CRAN which specify a creator in Authors@R and a separate maintainer. This should allow me to be designated as c("cre", "aut") with Richard listed under a separate Maintainer field. I greatly appreciate being included as the sole author since Richard had to be made creator, but with this seemingly new behavior, it would be most fair to list those who have contributed significantly as authors. At minimum, Matt, Woody, and Richard contributed to the core functionality/design.

Create Unit tests with example data

Remove "<br />" from vignette index titles

Remove breaks from vignette links on CRAN:

Titles were copied as vignette index entries

Level 6 processing error

Error in eval(jsub, SDenv, parent.frame()) : object 'bmd' not found

Investigate how we can get full test coverage for tcplDefine

For example, investigate how to alter tcplQuery to be used with the example .rda files

Single Conc Bidirectional = FALSE method

Some assays do not make sense bidirectionally. For those situations we need a single conc method that will fit in only the up direction.

Add mc5 fit categories diagram and text to processing vignette

The mc5 fit categories text and diagram had been removed from processing vignette while updated fitc were still under development. Text and a new diagram can be reintroduced given the updated fit categories.

Create tcplDefine function

Create a function that will query the tcpl database and return information from the new data dictionary table. This should include the ability to query via parameters by field and value. It will be helpful to review and adapt existing tcplLoad(Data, Chem, Acid, etc) functions.

tcplPrepOtpt returns SQL error when a data.table with 0 rows is input

Should instead return input table with 0 rows and the prepotpt columns
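
One way to get the requested behavior is a guard clause that runs before any query is built. The sketch below uses a plain data.frame. The function name `prep_sketch` and the column names in `new_cols` are hypothetical placeholders rather than the actual tcplPrepOtpt columns.

```r
# Hypothetical stand-in for the annotation step; no database access happens here.
prep_sketch <- function(dat, new_cols = c("chnm", "casn")) {
  if (nrow(dat) == 0) {
    # Nothing to look up: add empty columns and return early, skipping the query.
    for (col in new_cols) dat[[col]] <- character(0)
    return(dat)
  }
  # Placeholder for the real lookup, which would fill these from the database.
  for (col in new_cols) dat[[col]] <- NA_character_
  dat
}

empty <- data.frame(spid = character(0), aeid = integer(0))
out <- prep_sketch(empty)
str(out)
```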

TOX-485 Chemical not populated in cytotox table even though there is data at mc4

Cytotox table should populate for every chemical that has data in invitrodb even if there are no datapoints in burst assays (should default to a value). Need to investigate why galaxolide was not populated in the cytotox table:

use prod_internal_invitrodb_v3_5;
select * from chemical where dsstox_substance_id = "DTXSID8027373";
select * from cytotox where chid = 27373;
select * from mc4 where spid in (select spid from sample where chid = 2737

Updating to version 3.2 of the database

I'm trying to update to version 3.2 of the database, using the .bat file as described in the README (from the 3.2 download). After a long time, I get:

I was hoping that wouldn't matter, but when I do this:

library(tcpl)

tcplConf(drvr = 'MySQL', 
         user = 'xxx',
         pass = 'xxx',
         host = 'xxx', 
         db = 'tox32')

cas <- "1912-24-9"
chem_info <- tcplLoadChem(field = 'casn', val = cas)
Warning message:
  In tcplLoadChem(field = "casn", val = cas) :
  The given casn(s) are not in the tcpl database.

"1912-24-9" is on the Comptox dashboard.

Do you happen to have any suggestions specific to the 3.2 version?

Dose response curves

Is this the simplest way to create a single dose response curve?

library(tcpl)

tcplConf(drvr = 'MySQL',
         user = 'xxx',
         pass = 'xxx',
         host = 'localhost',
         db = 'tox31')

ep <- "APR_Hepat_MitoFxnI_48hr_dn"
cas <- "64902-72-3"

chem_info <- tcplLoadChem(field = 'casn', val = cas)
assay_info <- tcplLoadAcid(fld = "acnm", val = ep)

mc3 <- tcplLoadData(lvl = 3, type = "mc", 
                    fld = c("acid","spid"), 
                    val = list(assay_info$acid,
                               chem_info$spid))

mc4 <- tcplLoadData(lvl = 4, type = "mc", 
                    fld = c("spid", "aeid"), 
                    val = list(chem_info$spid,
                               unique(mc3$aeid)))

tcplPlotM4ID(mc4[1,], lvl = 5)

If so......is there a subset of the sql files in the prod_internal_invitrodb_v3_2 folder that could be run individually? I only use the tcpl package to create dose response curves in specific orders. It's taking me days to re-install the database. Admittedly most of that time is me trying to remember how to use MySQL/Workbench (the documentation from Oracle is not as user friendly as I'm used to coming from an R background)... But maybe I don't need to run any of the enormous mc5 and above sql files, for instance...or I could ignore the "sc" type files?

add option for export_ready status in tcplsubsetchid

Optional parameter in tcplsubsetchid should take into account export_ready status and set chid_rep to 0 for those spids that are export_ready 0. This will allow us to generate representative sample before filtering.
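
The requested adjustment can be sketched on a toy table. The data values and the `respect_export_ready` switch are assumptions for illustration; only the `chid_rep` and `export_ready` column names come from the issue text.

```r
# Toy sample table; the values are made up for illustration.
samples <- data.frame(
  spid         = c("S1", "S2", "S3"),
  chid_rep     = c(1, 0, 1),
  export_ready = c(1, 1, 0)
)

# Hypothetical optional switch, mirroring the proposed parameter.
respect_export_ready <- TRUE

if (respect_export_ready) {
  # Samples that are not export ready can no longer be representative.
  samples$chid_rep[samples$export_ready == 0] <- 0
}
print(samples)
```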

TOX-444 tcplCytoPt and use_global_mad=FALSE

Jason Brown

This function may need some updates. It works fine for calculating the invitrodb.cytotox table, but it doesn't work for calculating a single chemical because for some reason the default seems to be use_global_mad=FALSE, which means no lower bound can be computed from the median cytotox value.

This will still be important in tcpl_v3+ because we'd like to get estimates of cytotoxicity regardless of what the curve-fitting looks like.

It may help to do some testing with 1, 2, and small groups of chemicals?

See example:

tcplCytoPt(chid=20182)
[1] 1
[1] 2
[1] 3
[1] 3.1
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
    chid   code        chnm    casn      med       mad ntst nhit use_global_mad global_mad  cyto_pt cyto_pt_um lower_bnd_um
1: 20182 C80057 Bisphenol A 80-05-7 1.728439 0.1873701   88   17          FALSE         NA 1.728439   53.51055           NA
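
The NA in lower_bnd_um follows directly from the missing global_mad. As a rough sketch, assuming the lower bound is the log-scale cytotoxicity point minus three global MADs (the multiplier is inferred from a cytotox record quoted elsewhere on this page, where -1.628288418 - 3 * 0.159706964 gives the reported -2.107409311), the function below is hypothetical and not the tcplCytoPt code:

```r
# Hypothetical reconstruction of the lower-bound arithmetic; nmad = 3 is an
# inference from reported values, not a documented default.
lower_bound_sketch <- function(cyto_pt, global_mad, nmad = 3) {
  lower_log <- cyto_pt - nmad * global_mad
  list(log = lower_log, um = 10^lower_log)
}

# Values from the quoted cytotox record: the bound is reproduced.
with_mad <- lower_bound_sketch(-1.628288418, 0.159706964)

# Values from the single-chemical call above: NA propagates to the bound.
without_mad <- lower_bound_sketch(1.728439, NA_real_)

print(with_mad)
print(without_mad)
```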

Issues with the next version of ggplot2

We are preparing the next release of ggplot2 and our reverse dependency checks show an issue with tcpl. Specifically this call

CompTox-ToxCast-tcpl/vignettes/Data_processing.Rmd

Lines 1461 to 1464 in ba7d24d

  override.aes = list(
    colour = c(potency_cols),
    alpha = 1
  )

is problematic as it passes multiple values to override.aes which is not permitted (the argument is used to override the parameters with a single value). In the next version this will result in an error.

We plan to release ggplot2 by the end of October and hope you are able to provide a fix before that.

kind regards
Thomas

Investigate testing without the use of expect_snapshot

expect_snapshot does not seem to be capable of handling inconsistent appearance of warnings, such as a warning about package/R version. Look into solving this problem, or more likely deciding on a different testing solution.

Testing github projects

Remove hard coded mc5 and mc6 methods

For newly registered endpoints, mc5 and mc6 methods appear prepopulated. This isn’t happening at other levels, so I am wondering if that’s intentional or hard coded somewhere. User needs to clear methods to assign new ones. For level 6, some of the flags have been combined (border flag) or are not yet implemented (overfit.hit)

Current auto-assigned methods:

tcplMthdLoad(lvl=5, id = 3032)
   aeid  mthd mthd_id
1: 3032 bmad3       1
2: 3032  pc20       2
3: 3032 bmad5       5
tcplMthdLoad(lvl=6, id = 3032)
   aeid              mthd mthd_id nddr
1: 3032 singlept.hit.high       6    0
2: 3032  singlept.hit.mid       7    0
3: 3032    multipoint.neg       8    0
4: 3032             noise      10    0
5: 3032        border.hit      11    0
6: 3032       border.miss      12    0
7: 3032      gnls.lowconc      15    0
8: 3032       overfit.hit      16    0

Update tcplPlot and processing for control data inclusion

Some assays screen positive or negative control chemicals, and we indicate these wells with wllt designations

c: Gain-of-signal control in multiple concentrations
p: Gain-of-signal control in single concentration
m: Loss-of-signal control in multiple concentrations
o: Loss-of-signal control in single concentration

tcplPlot only created plots for DMSO (wllt=n) and test wells (wllt=t), and it appears only these samples are passed to mc5. mc1.R appears to filter out certain well types. These may or may not have spids

Investigate where exactly filtering occurs and see if additional wllt can be added, particularly wllt=="c" or "m"

Investigate snapshot issue where svg file is not consistent between environments.

Investigate/resolve warnings during plotting

When plotting many plots (an entire aeid, for example), there are generally multiple warnings due to missing data. Add checks to the tcplPlot function so that it does not attempt to add the missing data. This will resolve the warnings.

tcplPlot(lvl=5, fld = "aeid",val = 2237, output = "pdf",verbose = FALSE)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: Removed 101 rows containing missing values (`geom_function()`).
2: Removed 101 rows containing missing values (`geom_function()`).
3: Removed 101 rows containing missing values (`geom_function()`).
4: Removed 101 rows containing missing values (`geom_function()`).
5: Removed 101 rows containing missing values (`geom_function()`).
6: Removed 101 rows containing missing values (`geom_function()`).
7: Removed 101 rows containing missing values (`geom_function()`).
8: Removed 101 rows containing missing values (`geom_function()`).
9: Removed 101 rows containing missing values (`geom_function()`).
10: Removed 34 rows containing missing values (`geom_vline()`).

Interactive plotly plot loads very slow

Probably a bug within the resolution setting causing too many points to be plotted

Tiny bug: Typo in installation instructions

In the Readme you suggest:

devtools::install_github("USEPA/ComptTox-ToxCast-tcpl")

there seems to be one "t" too many (it should be CompTox not ComptTox)...

tcplloaddata not working for old versions of db

tcplLoadData(5, fld = "aeid", val = 2)
Error in tidyr::pivot_wider() at tcpl/R/tcplLoadData.R:659:4:
! Must select at least one item.
Run rlang::last_trace() to see where the error occurred.

TOX-96 tcplPlotPlate not working with sc data

tcplPlotPlate works when sc apids are provided. However, tcplPlotPlate by default looks for mc ids rather than sc ids. A type parameter could be added so the user can specify whether they want 'mc' or 'sc' plates

Add mc4 bidirectional.false.loss method

Add mc4 bidirectional.false.loss method to prevent bidirectional fitting and only fit data in the negative analysis direction. Currently the mc4 bidirectional=false method requires data to plot in the positive analysis direction. This can make biological interpretation difficult, e.g., data for cytotoxicity endpoints with unidirectional loss of signal always need to be inverted.

tcplPlot BMR/BMD line intersection for plots in the negative analysis direction

BMD/BMR line intersection appears to always display in the positive analysis direction, despite data plotting in the negative direction

MySQL headaches

I fought to set up ToxCast v.2, and now I'm fighting to set up version 3 (I don't use MySQL in any other project).

Do you have any ideas what is happening here:

1. I've downloaded and installed MySQL
2. I've downloaded and installed MySQL workbench
3. I run MySQL workbench as an administrator
4. I go to "Data Import/Restore"
5. I click on "New..." Default Target Schema, naming it "toxcastdb"
6. In the "Import from Self-Contained File" radio button, I navigate to my file that ends in "prod_internal_invitrodb_v3.all.sql" (that...I downloaded and unzipped before step 1)
7. Click over to the "Import Progress" tab to click Start Import.
8. Wait...who-knows-how-long....(At least on my Windows 10 machine, I never see the progress bar move. Googling it makes me think that might be normal).
   I took that screen shot after waiting well over an hour.
9. Click over to "Users and Privileges", create a user "ToxCast" and give them all the possible rights. (I'm not setting this up for adding new data or anything, just want to make the dose response curves).
10. I up the "max_allowed_packet" in MySQL Workbench to whatever that message indicated.
11. Then this is what I do in R:

library(tcpl)

tcplConf(drvr = 'MySQL', 
         user = 'ToxCval',
         pass = 'xxx',
         host = 'localhost', 
         db = 'toxcastdb')

tcplPlotM4ID(m4id = 18609966, lvl = 5)
Error in .prepField(fld = fld, tbl = tbls, db = getOption("TCPL_DB")) : 
  Not all given fields available in query.
tcplConfList()
$TCPL_DB
[1] "toxcastdb"

$TCPL_USER
[1] "toxEval"

$TCPL_HOST
[1] "localhost"

$TCPL_DRVR
[1] "MySQL"

Do you know what the error means? My guess is that I haven't waited long enough for that data import to complete. Is there a way to test? Like, can I query for a complete list of table names or something?

Specify database port to be used in dbConnect connection

I'm running the ToxCast DB in a MySQL database on a non-standard DB port. Currently, there is no input option to specify the port. However, you are using DBI::dbConnect to actually make the database connection, so it seems reasonably possible to add the port as an optional input as part of tcplConf.

Would it be possible to get the port added in as an option?
Am I misunderstanding anything about how you are using DBI::dbConnect that would cause an issue with adding in the DB port option?

Thanks in advance!

Investigate GitHub Actions for test coverage

One feature of GitHub is the use of automated workflows in a continuous integration service called GitHub Actions.

Investigate R package examples of test coverage and see if it is possible to implement testthat coverage for an action, such as a GitHub pull request in tcpl.

Understanding GitHub Actions
https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions

Some example GitHub Action workflows using R packages:
https://github.com/r-lib/actions/tree/v2/examples#test-coverage-workflow

bidirectional sc methods

I think that we probably need to add in a method for sc1 or sc2 that allows for the absolute value of the max_med to exceed the absolute value of the cutoff for sc endpoints that are bidirectionally interpretable. Consider testing on CEETOX endpoints potentially since these have corollary mc aeids that could be used to map to.
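
A minimal sketch of the proposed hit rule, assuming a hit call of 1 when the cutoff is met or exceeded. The function `sc_hit_sketch` is hypothetical, and using `>=` rather than `>` is an assumption, as is the example cutoff of 20.

```r
# Hypothetical hit-call rule; not an existing tcpl sc1 or sc2 method.
sc_hit_sketch <- function(max_med, coff, bidirectional = TRUE) {
  if (bidirectional) {
    # Compare magnitudes so strong responses in either direction count.
    as.integer(abs(max_med) >= abs(coff))
  } else {
    # Up direction only.
    as.integer(max_med >= coff)
  }
}

max_med <- c(35, -42, 5)
hits_bidir <- sc_hit_sketch(max_med, coff = 20)
hits_up    <- sc_hit_sketch(max_med, coff = 20, bidirectional = FALSE)
print(hits_bidir)
print(hits_up)
```

The strongly negative response is a hit only under the bidirectional rule, which is the case the issue wants to capture.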

data dictionary?

Hi there,

I'm interested in pulling out all the dose response fit parameters in the toxcast database for a set of 21 chemicals. My issue is that I can't find a complete data dictionary anywhere for the level 5 multiple-concentration (mc5) variables returned by my query.

Can anyone help me define all the variables returned from my query? (see list of variables below)

For example, I think that modl_ga = The log10AC50 for the best fit model, based on Appendix B at the bottom of this vignette, but I'm not sure what all the other modl_<other stuff here> variables are.

Workflow

Installed MySQL
Built the database locally using the mysql_th813_prod_internal_invitrodb_v3_5_2022-07-20_00h00m.dmp file.
Installed the tcpl and tcplfit2 packages in R v(4.2.1)
Used the tcplConf function to setup queries to the database
Ran the code below to query the invitrodb_v2 database

library(tidyverse)
library(tcplfit2)
library(tcpl)

# define chemical list by CAS
chems <- c("94-75-7", "116-06-3", "1912-24-9", "10108-64-2", "63-25-2",
          "5234-68-4", "2921-88-2", "8018-01-7", "16752-77-5", "115-09-3",
          "7718-54-9", "1910-42-5", "114-26-1", "175013-18-0", "115-86-6",
          "7646-85-7", "121-75-5", "122453-73-0", "1327-53-3", "1897-45-6",
          "7447-39-4")

# get every endpoint ID in the database: aeid = (assay endpoint ID)
aeids <- tcplLoadAeid()

# pull all the mc5 data for every endpoint ID?
mc5 <- tcplPrepOtpt(
  tcplLoadData(lvl=5, # data level
               fld="aeid", # query on assay endpoint ID
               val=aeids$aeid, # give me every value
               type = "mc") # return multiple concentration data type
)

# filter to the chemicals I care about, sorry everybody else, lol
mc5.chem <- mc5 %>%
  dplyr::filter(casn %in% chems)

# glimpse the data to see output
glimpse(mc5.chem)

Got the mc5.chem data.frame = mc5.chem.issue.rda in issue_files.zip
I've been building the data dictionary for this object, but I don't know what the variables marked with question marks in the dd.csv file in issue_files.zip are.
Here's the zip file issue_files.zip

Major variables in question - what are these bad boys?

modl_er
modl_tp
modl_ga
modl_gw
modl_la
modl_lw
modl_rmse
modl_prob
modl_acc
modl_acb
modl_ac10

Dose response curves

I've updated my script for creating custom dose response curve files. There is still one filter that I can't figure out that I think would assure the plots match up with the plots on https://comptox.epa.gov/dashboard.

Here is my current script:

cas <- "94125-34-5"
ep <- "ATG_PXRE_CIS_up"
chem_info <- tcplLoadChem(field = 'casn', val = cas)
assay_info <- tcplLoadAeid(fld = "aenm", val = ep)

mc4 <- tcplLoadData(lvl = 4, type = "mc", 
                    fld = c("spid", "aeid"), 
                    val = list(chem_info$spid,
                               assay_info$aeid))
mc5 <- tcplLoadData(lvl = 5L, type = "mc", 
                    fld = "m4id", val = mc4$m4id)  # pass the m4id values, not the whole table

mc4_id <- mc5$m4id[which(mc5$hitc == 1)]
index <- which(mc4$m4id %in% mc4_id)

tcplPlotM4ID(mc4[index,], lvl = 5)

Calling the mc5 table lets me filter for a hitc of 1, but in this case still leaves me with 3 plots. I've been told "use the index where gsid_rep=1". I can see the "gsid_rep" column in the ToxCast "INVITRODB_V3_2_LEVEL5" csv files. In those csv files, there are 90 columns. In the "mc5" data frame from the tcplLoadData function, there are 68 columns (but otherwise the ones that are there match up).

So my question is...is there a way to filter the mc4 data to just gsid_rep=1? I'm trying to find a document that gives the official/long name for the gsid_rep column, but so far I haven't found it. I'll comment back on this issue if I do.

What I'm trying to reproduce:
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=94125-34-5#invitrodb-bioassays-toxcast-tox21

tcplLoadData mc4, add.fld=FALSE - SQL syntax error

When loading example data from invitrodb at each level for the TOX-612 ticket, I found that mc4 with add.fld=FALSE receives the SQL syntax error found in TOX-614

tcplLoadData_mc4_addfldF <- tcplLoadData(lvl = 4, fld = c("spid", "aeid"), val = list( c("01504209", "1210314466", "TP0001652A01", "TP0001652B01", "TP0001652D01"), 80), type = "mc", add.fld = FALSE)

Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AND mc4.spid IN ("01504209","1210314466","TP0001652A01","TP0001652B01","TP00016' at line 20 [1064]

TOX-602 Summary statistics need to account for bidirectional fitting

Example: resp_max should be the maximum response in the positive or negative direction if bidirectional = TRUE and only the positive direction if bidirectional = FALSE
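
One reading of this example, sketched with a hypothetical helper: for bidirectional fits, take the response with the largest magnitude and keep its sign. Otherwise take the ordinary maximum. Both the function name and that interpretation are assumptions.

```r
# Hypothetical summary helper; not the tcpl implementation.
resp_max_sketch <- function(resp, bidirectional = TRUE) {
  resp <- resp[!is.na(resp)]
  if (bidirectional) {
    # Largest magnitude in either direction, sign preserved.
    resp[which.max(abs(resp))]
  } else {
    # Positive analysis direction only.
    max(resp)
  }
}

resp <- c(5, -80, 20, NA)
max_bidir <- resp_max_sketch(resp)
max_up    <- resp_max_sketch(resp, bidirectional = FALSE)
print(c(bidirectional = max_bidir, up_only = max_up))
```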

Error with tcplVarMat using tcplLite

Minimum reproducible example

The example in ?tcplVarMat:

## Store the current config settings, so they can be reloaded at the end 
## of the examples
conf_store <- tcplConfList()
TCPLlite <- file.path(system.file(package = "tcpl"), "example")
tcplConf(db = TCPLlite, user = NA, host = NA, drvr = "tcplLite")
## Not run: 
## Demonstrate the returned values. Note with no "burst" assays defined in
## the example database, the user must provide which aeid values to use 
## in calculating the cytotoxicity distributions for the 'zscore' matrix.
tcplVarMat(chid = 1:5, cyto.pars = list(aeid = 1:2))

## Other changes can be made
tcplVarMat(chid = 1:5, row.id = "chnm", cyto.pars = list(aeid = 1:2))
tcplVarMat(chid = 1:5, add.vars = "max_med", cyto.pars = list(aeid = 1:2))

## End(Not run)
## Reset configuration
options(conf_store)

Problem explanation and steps taken to resolve it

Running the tcplVarMat command yields Error: no such table: mc5. I changed lines 145 and 146 of tcplVarMat.R to:

mc <- tcplQuery("SELECT DISTINCT mc5.aeid, spid FROM mc5 inner join mc4 on mc4.m4id = mc5.m4id;", tbl=c('mc4','mc5'))
sc <- tcplQuery("SELECT DISTINCT aeid, spid FROM sc2;", tbl='sc2')

The next error I got was at line 227, mat.tested[ , eval(e1)]:

Error in `:=`("chid", NULL) : 
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

To solve this I changed all of the dcast commands (lines 209, 214, 219, 240) to dcast.data.table.

However I am not able to test whether these changes break the sql version of the function.

Also, I can't tell for sure if these changes actually produce the correct output.

Questions

My questions are:

Should these changes be accepted?
Are my package versions the reason for dcast not working properly?
- I installed tcpl using devtools::install_github("USEPA/CompTox-ToxCast-tcpl")
- I had to follow several troubleshooting steps when installing data.table before installing tcpl, outlined here.

Here is the output from my sessionInfo():

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.0       reshape2_1.4.4    nlsr_2019.9.7     RSQLite_2.2.0     htmlTable_2.0.1   tcpl_2.1.0       
[7] data.table_1.12.9

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6        pillar_1.4.4        compiler_4.0.1      RColorBrewer_1.1-2  plyr_1.8.6         
 [6] tools_4.0.1         testthat_2.3.2      pkgload_1.1.0       digest_0.6.25       bit_1.1-15.2       
[11] tibble_3.0.1        lifecycle_0.2.0     memoise_1.1.0       checkmate_2.0.0     pkgconfig_2.0.3    
[16] rlang_0.4.6         DBI_1.1.0           rstudioapi_0.11     parallel_4.0.1      xfun_0.15          
[21] proto_1.0.0         withr_2.2.0         RMariaDB_1.0.8      stringr_1.4.0       knitr_1.29         
[26] desc_1.2.0          generics_0.0.2      vctrs_0.3.1         htmlwidgets_1.5.1   hms_0.5.3          
[31] rprojroot_1.3-2     tidyselect_1.1.0    bit64_0.9-7         glue_1.4.1          sqldf_0.4-11       
[36] R6_2.4.1            tcltk_4.0.1         gsubfn_0.7          purrr_0.3.4         blob_1.2.1         
[41] magrittr_1.5        ellipsis_0.3.1      backports_1.1.8     htmltools_0.5.0     assertthat_0.2.1   
[46] numDeriv_2016.8-1.1 stringi_1.4.6       crayon_1.3.4        chron_2.3-55