Phenomniverse commented on June 3, 2024

No, it's all good. The output that read_chroms generates has the advantage of including the metadata from the FID file, which I lose as soon as I turn it into a data frame with only two columns. It's only for the sake of my existing code that I am converting it into a data frame. Thanks again for your efforts on this project!

ethanbass commented on June 3, 2024

Regarding the OpenChrom path, I think on Windows you need the full path to the .exe file, including the extension. Unfortunately, this is not totally consistent between operating systems. I should probably update the docs to reflect this more clearly. Once you provide the path, it should be saved so that you don't have to provide it again when you reopen the package, as long as you don't move or delete your OpenChrom installation.

Phenomniverse commented on June 3, 2024

Thanks Ethan I'll try it out tomorrow when I'm back in my office. Would you mind giving me an overview of the command line arguments that OpenChrom requires to convert a csd file to csv? I couldn't find any OpenChrom documentation on this point.

ethanbass commented on June 3, 2024

What you have looks good for converting just one file.

read_chroms(paths = <>, find_files=FALSE, path_out=getwd(), export=TRUE, parser="openchrom", format_in='csd', export_format = 'csv')

Theoretically, in the latest version you shouldn't need the find_files argument either, but it can't hurt to be explicit. I just added a new set of parsers from the rainbow package for Python that should theoretically work as well for Agilent, but I haven't tested them on Windows yet. You should be able to call this from read_chroms as follows:

read_chroms(paths = <>, find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')

You may need to run configure_rainbow() first. It's supposed to install automatically, but seems to not work so well on Windows.

ethanbass commented on June 3, 2024

Oh, you meant for OpenChrom, sorry. You need to provide an XML batch file to the OpenChrom command line to access the parsers. Then you run openchrom -cli -batchfile <path_to_batchfile>. The main work my R function does under the hood is writing the batch file. It has to include the paths to all the files you want to convert under "InputEntries", and then "ProcessEntries" with the commands you want OpenChrom to carry out. The ProcessEntry you need to convert csd to csv is csd.export.org.eclipse.chemclipse.csd.converter.supplier.csv. You can look at the source code of the call_openchrom function to see how it constructs the batch files (https://github.com/ethanbass/chromConverter/blob/master/R/call_openchrom.R). As far as I've been able to find, this isn't well documented anywhere.
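For anyone who wants to drive OpenChrom from R directly, here is a minimal sketch of just the command-line step; writing the batch XML (the part call_openchrom handles) is not reproduced. The helper name openchrom_cli_args() and the example paths are hypothetical. Only the -cli and -batchfile flags come from the description above, and the .exe check reflects the earlier note about Windows paths.

```r
# Hypothetical helper (not part of chromConverter): build the argument vector
# for an OpenChrom command-line run that is driven by an XML batch file.
openchrom_cli_args <- function(openchrom_path, batchfile) {
  # On Windows, OpenChrom needs the full path to the executable, including ".exe"
  if (.Platform$OS.type == "windows" &&
      !grepl("\\.exe$", openchrom_path, ignore.case = TRUE)) {
    warning("On Windows, 'openchrom_path' should point to the .exe file.")
  }
  # shQuote() protects batch-file paths that contain spaces
  c("-cli", "-batchfile", shQuote(batchfile))
}

# Example paths are made up for illustration
oc_path <- "/opt/openchrom/openchrom"
oc_args <- openchrom_cli_args(oc_path, "my batch.xml")
print(oc_args)

# Running it requires an actual OpenChrom installation:
# system2(oc_path, args = oc_args)
```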

ethanbass commented on June 3, 2024

(Maybe I should split the batch-file constructor code off into a separate function, in case people want to call it separately just to make the batch files.) I think that might make for cleaner code anyway.

ethanbass commented on June 3, 2024

Also, if you'd be open to sharing one of those FID files with me, that would be helpful for testing the package. I don't think I have any FID files from Agilent in my little collection yet.

Phenomniverse commented on June 3, 2024

> I just pushed a patch to the main branch that I think should solve this issue (070f297). It would be great if you can test it for me

I reinstalled the package from github but the path_out issue still occurs. The error message no longer includes the path_out value, so I can't confirm whether this is because of the leading slash still being present or if there is another issue.

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=data_dump,export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.

Not specifying a path_out results in being prompted to accept export to 'temp' directory, which also doesn't work:

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE,export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Export directory not specified! Export files to `temp` directory (y/n)?y
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.

> I just added a new set of parsers from the rainbow package for Python that should theoretically work as well for Agilent, but I haven't tested them on Windows yet.

I attempted this (I ran the configure_rainbow() function first, but I don't think it was necessary, I think that ran automatically when I loaded the package). This is the result:

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'), find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')
Warning in read_chroms(paste0(archive_sample_dirs[1], "/FID1A.ch"), find_files = FALSE,  :
  Error in converter(file) : could not find function "converter"

The following chromatograms could not be interpreted: 1
list()

> (Maybe I should split the batch-file constructor code off into a separate function, in case people want to call it separately just to make the batch files.) I think that might make for cleaner code anyway.

Sounds like a good idea to me.

> Also, if you'd be open to sharing one of those FID files with me, that would be helpful for testing the package. I don't think I have any FID files from Agilent in my little collection yet.

I should be able to find a chemstation data file that is okay to share. What's the best way to get it to you?

ethanbass commented on June 3, 2024

Ahh sorry this still isn't working. Thank you for the detailed report!

I'm pretty sure I know why the rainbow parser wasn't working and just patched it. Fixing the path issue on Windows is going to be a bit trickier: I'll need to spend some time troubleshooting on the Windows box in the lab, which I won't be able to do until next week because of the holiday. I clearly introduced a bug somewhere, but I can't quite figure out what the problem is. The backslashes in Windows paths cause a lot of problems in R...
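To illustrate why backslashes are awkward: in an R string literal every backslash has to be escaped, so one common approach is to convert Windows paths to forward slashes (which R also accepts on Windows) before passing them around. The helper to_forward_slashes() and the example folder are hypothetical; this is not the fix that went into the package.

```r
# Hypothetical helper (not the fix that went into chromConverter): turn
# Windows backslashes into forward slashes, which R also accepts on Windows,
# and drop trailing slashes before checking the directory with dir.exists().
to_forward_slashes <- function(path) {
  path <- gsub("\\", "/", path, fixed = TRUE)  # fixed = TRUE: a literal backslash
  sub("/+$", "", path)                         # remove any trailing slash(es)
}

# In an R string literal, each "\\" stands for a single backslash
win_dir <- "C:\\Users\\lab\\data_dump\\"
clean_dir <- to_forward_slashes(win_dir)
clean_dir
# returns "C:/Users/lab/data_dump"
dir.exists(clean_dir)  # FALSE unless this example folder actually exists
```

One caveat of this sketch: a bare drive root such as C:\ would become "C:", so a real implementation would need to special-case it. Base R's normalizePath(path, winslash = "/") does a similar conversion on Windows.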

If you want to just email me an example file (at [email protected]), that would be great. If you like, I can confirm whether rainbow is able to read it.

ethanbass commented on June 3, 2024

So I actually messed around with the paths a little more just now and pushed another update to fix the way the paths are parsed on Windows. I have the OpenChrom parser working again on my Windows 10 computer in the lab. Might be worth another look when you get a chance. I also confirmed that the rainbow parser seems to be running smoothly now at least on my installation of Windows 10.

Phenomniverse commented on June 3, 2024

Ok, so this is a bit weird, but the path_out option seems to work for the rainbow parser but not for the openchrom one. However, the rainbow parser is returning NULL rather than collecting the data from the .ch file, although it does successfully create an empty .csv file in the working directory. So that's progress.

Another issue that I foresee once I get the parser working properly is that the output .csv file is named after the input .ch file. If read_chroms is iterating over a list of input files that all have the same name (in different directories), I'll only end up with a .csv of the last .ch file evaluated. Chemstation saves its raw data as FID1A.ch by default and I have a lot (thousands) of these files, so it's not ideal to be changing the input file names.

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=getwd(),export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.
>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'), find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')
$FID1A
NULL

Phenomniverse commented on June 3, 2024

> If you want to just email me an example file

Email sent :)

ethanbass commented on June 3, 2024

Thank you for the update and for sending along the files. I'm glad to hear that the rainbow parser is running for you now (even though it isn't actually reading your files). Unfortunately, I checked out your files and it doesn't seem like any of the parsers currently included with the package can read them properly. I know Roderick Bovee, who develops the entab package, is working on an update for the Agilent FID parser (bovee/entab#42), and it seems like the rainbow developers might also be interested in your file (since they have a parser for Agilent FID files that throws an error on your file). Do you know which version of Chemstation created your files? With your permission, I'd be happy to pass your files along to Roderick and the rainbow people, and they might be able to update the parsers to read your files properly.

Regarding the path issue with OpenChrom, I'm pretty perplexed and don't really understand why this is happening since it's working fine on the other Windows computer I have access to. I will have to look into this further.

Regarding the file name issue: if you have the files in the original .D directories, you can change format_in to agilent_d and it will take the file names from the directory names. This is still a good point, though. I may add an argument to read the directory names when using format_in = chemstation, or at least have the parser read the directory names into the file's metadata. I will have to think a little more about how best to implement this.
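In the meantime, export names can be derived from the sample folders by hand. The sketch below uses a hypothetical helper, export_names_from_dirs(), which is not a chromConverter function; it assumes the usual <sample>.D/FID1A.ch layout and uses make.unique() in case two folders share a name.

```r
# Hypothetical helper: name each export after its Chemstation sample folder
# (e.g. ".../SAMPLE_001.D/FID1A.ch" -> "SAMPLE_001.csv") instead of the
# shared file name "FID1A.ch", so exports do not overwrite one another.
export_names_from_dirs <- function(files) {
  sample_dirs <- basename(dirname(files))                     # e.g. "SAMPLE_001.D"
  stems <- sub("\\.D$", "", sample_dirs, ignore.case = TRUE)  # drop the ".D" suffix
  paste0(make.unique(stems, sep = "_"), ".csv")               # guard against repeats
}

# Made-up paths for illustration
files <- c("archive/2014/SAMPLE_001.D/FID1A.ch",
           "archive/2018/SAMPLE_002.D/FID1A.ch",
           "archive/2019/SAMPLE_002.D/FID1A.ch")
export_names_from_dirs(files)
# returns c("SAMPLE_001.csv", "SAMPLE_002.csv", "SAMPLE_002_1.csv")
```

The resulting names could then be passed to write.csv() (or similar) for each element of the read_chroms output, instead of relying on the default file name.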

Thank you for all the feedback and please let me know if it would be OK to share the files.

ethanbass commented on June 3, 2024

It seems that your file is from Chemstation B.04, and maybe the rainbow parser can only handle files from Chemstation B.03.

Phenomniverse commented on June 3, 2024

I'll confirm the chemstation version when I'm back in the office tomorrow. It's possible we have different versions on different machines so I might be able to try more than one version. I will email you re: sharing FID files.

ethanbass commented on June 3, 2024

OK, sounds good. I am making some progress, by the way, figuring out the binary. I don't usually do this myself, but I decided to try, and I am getting back a chromatogram that looks a lot like the one you sent me, so I'm pretty sure it shouldn't be too hard to write a parser directly in R or to adapt the one in the rainbow package.

ethanbass commented on June 3, 2024

I added a new parser in ba59590 to read these Agilent FID files natively in R. It seems to work for both the 179-type and 181-type files. If anyone reading this has a 180-type file, it would be helpful to see where it falls along this spectrum.

Phenomniverse commented on June 3, 2024

Ok, our GCs are mostly running various iterations of Chemstation B.04.03, but we do have a couple on OpenLab CDS Chemstation Edition, and a couple on a newer version of OpenLab. We also have a GC-MS running MSD Chemstation D.02.00.275.

We have GCFID data going back to 2013, and maybe earlier, which I suspect has been collected using older versions of chemstation.

Your parser (parser="chromconverter", format_in='chemstation_fid') works pretty well for more recent files, but it doesn't work so well with the 2013 file that I tried.

With the newer files, the actual chromatogram data is good, which is the important bit, but some of the attribute fields that read_chroms generates are a bit off. For my purposes this information is more readily accessed through the other files stored in the .D folder anyway (e.g. .xls, .xml and .txt files, depending on what was specified in the Chemstation acquisition method).
But for further troubleshooting, if you want to delve into it: the read_chroms attribute called 'notebook' is actually the sample name. The attribute called "parent.directory" is the operator's name as entered in Chemstation when the sequence is started. And the "instrument" attribute seems to be populated by a concatenation of the acquisition method name and a string ("GCI\002GC\024" in the file I looked at) that doesn't relate to the instrument in any way that is obvious to me.
I note that when outputting to .csv, read_chroms now names the file after the directory, which is helpful.

Regarding the older files (tested on one from 2013 and another from 2014), the parser isn't working as expected. The rt (x-axis) data seems okay, but the value (y-axis) data is very strange. See the image below (I've plotted it as points because the line plot just fills the whole screen with black). Also, the attribute fields contain a lot of unreadable characters. I'm not sure which version of Chemstation generated these older files, although maybe it's contained in the software field, if that could be interpreted. The newer file just had "Asterix ChemStation" in that field, though.

attr(,"version")
[1] "181"
attr(,"file.type")
[1] "GC DATA FILE\022\003\005Èå\022\022\003\005!\u009dY|´"
attr(,"notebook")
[1] "blue cypress 111wÿÿÿÿ¸é\022ÐÁüw\030\aóNì\022ÿÿÿÿ©\024xBLUE CYPRESS.M\022ø£S\020Àì\022Üõ\022äè\022hè\022\003Œ*øwó\030\aó\003ÐH\003\b@è\022€w0ê\022U\037øwÐ*øwÿÿÿÿ@ê\022ÐÁüw\030\aóÿÿÿÿ©\024xDS\\BLUE CYPRESS.M\\\022H\003\001v\r"
attr(,"parent.directory")
[1] "M_GC-3ú¢ÂiwNì\022ç\016ö\022áw¢î\022lû\022\037Ð\020Pÿÿÿÿ,ì\022\bs\020P@„î\022\001¸î\022Ð0=@tì\022\003Œ*øw€ì\022\003Œ*øwó\030\aó\003ÐH\003\bXì\022€wHî\022U\037øwÐ*øwÿÿÿÿXî\022ÐÁüw\030\aóä\"UXŽV"
attr(,"run_date")
[1] "23/04/2014 11:49:59 AMlû\022j½xó"
attr(,"instrument")
[1] "HP G1530A\003H\003\b\03000\027\002GC\022Ðí\022ÐH\003\bœî\022\023¬î\022ÐWøwìWøw\016BLUE CYPRESS.M|\023¼\001\v\001XŽV\003¸\001\v\001\001\u0090î\022\002lû\022D\037\\|¨!W|ÿÿÿÿ4ï\022ä\"U\"cBÐï\022ä\"Ux\004À\021\bàáÃ\001\004(ï\022"
attr(,"method")
[1] "BLUE CYPRESS.M|\023¼\001\v\001XŽV\003¸\001\v\001\001\u0090î\022\002lû\022D\037\\|¨!W|ÿÿÿÿ4ï\022ä\"U\"cBÐï\022ä\"Ux\004À\021\bàáÃ\001\004(ï\022x>Ö\030,¼\001\v\001´\001\v\001\020@<ï\022ð?@ï\022\001ð?\001Tï\022 ¤ãwÈ3\02400\027\001"
attr(,"software")
[1] "\v\001\002œ\v\001ÄC\001ý\a\002\024È/\024œ\v\001Ä\177\002è2ý\a¨3\024\001\177\016\177\002hô\022\034G£\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\0013\021e<\b\v\001\024ò\022çäX|ëáX|q4e0ú\0248÷\001\bèñ\022\003\002\022\a\004@aø\a\t\004\021v\024ÝáwØ\u0090\023Pñ\022\035Œ*øwóø\vó\0350Õü\a(ñ\022\026\030ó\022U\037øwÐ*øwÿÿÿÿ(ó\022ÐÁüwø\vó \005M$êø\a>\003\003Í«ºÜ\u0090ñ\022Akáw\024ÝáwÆ\004S\021v \005M€_Œ\b€_Œ\b°ñ\022sPâw\024ÝáwÆ\004S\021v \005MÐñ\022æŒÿv\024ÝáwÆ\004S\021v \005M"
attr(,"unit")
[1] "pA"
attr(,"signal")
[1] ""
attr(,"time_range")
[1] 2.291713e-05 3.000312e+01
attr(,"data_format")
[1] "long"
attr(,"parser")
[1] "chromConverter"

[Image: chromatogram from the 2014 file, plotted as points]

Phenomniverse commented on June 3, 2024

Further to the above, the chromconverter parser works well for FID data generated on OpenLab CDS Chemstation Edition C.01.

ethanbass commented on June 3, 2024

Thanks for sending the additional files and doing all this testing. I actually have a function in the package already that reads the chemstation XLS files and attaches some of the metadata. I forgot to "activate it" for the new parser, but probably that would be worth doing. I will try to fix up the attribute fields from the binary files as well when I have time, thanks for the tips.

I'm not sure what's going on with the other Chemstation files you sent yet. They seem to be encoded differently from the newer files, even though they're also "181"-type files.
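One plausible explanation for the trailing garbage (an assumption; the thread does not say how the parser was eventually fixed) is that a metadata field is read as a fixed-width block when the string inside it is actually length-prefixed, so the leftover bytes get decoded too. The sketch below assumes a one-byte length followed by either one byte per character or two bytes per character (UTF-16LE); read_prefixed_string() is a hypothetical helper, not chromConverter's parser.

```r
# Hypothetical sketch: decode a length-prefixed string from a raw vector.
# Assumed layout: 1 length byte, then n_chars characters of 1 or 2 bytes each.
read_prefixed_string <- function(bytes, offset = 0L, bytes_per_char = 1L) {
  n_chars <- as.integer(bytes[offset + 1L])   # 'offset' is 0-based, R is 1-based
  if (n_chars == 0L) return("")
  start <- offset + 2L
  chunk <- bytes[start:(start + n_chars * bytes_per_char - 1L)]
  if (bytes_per_char == 2L) {
    # UTF-16LE: keep the low byte of each code unit (valid for ASCII text only)
    chunk <- chunk[c(TRUE, FALSE)]
  }
  rawToChar(chunk)
}

# "REF TTO" followed by leftover bytes from the rest of the field
# (0xD8 is the "Ø" seen in the garbled sample_name above)
field <- c(as.raw(7L), charToRaw("REF TTO"), as.raw(c(0xD8, 0x12, 0x14)))
read_prefixed_string(field)
# returns "REF TTO"

# The same idea for text stored as UTF-16LE
field16 <- c(as.raw(3L), as.raw(c(0x47, 0x00, 0x43, 0x00, 0x49, 0x00)))
read_prefixed_string(field16, bytes_per_char = 2L)
# returns "GCI"
```

Decoding all ten bytes after the length byte instead of the first seven would reproduce the kind of "REF TTOØ..." value reported for the older files.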

ethanbass commented on June 3, 2024

@Phenomniverse I just updated the agilent FID parser to a new version that can read the older 181 files you sent me. (06a80bf). Give it a whirl and please let me know if you have any other FID files that aren't being translated correctly.

Phenomniverse commented on June 3, 2024

Hi @ethanbass, happy new year! Great to know that you're still working on this!

Ok, so I tested the latest update against about a dozen random GCFID files dating back to ~2004, and it appears to be working nicely. I haven't found any that it doesn't work for yet, but will keep you updated if I do. I did have to tweak my code a little because the structure of the output from read_chroms appears to have changed slightly with this latest update. Previously I was able to convert the read_chroms output to a data frame and it would have two columns holding the x- and y-axis values respectively. Now the x-axis data appears as the row names when converted to a data frame. This isn't a big problem; I can change it back to the format that the rest of my code expects with one line. Just curious: why the change in the structure of the read_chroms output?

ethanbass commented on June 3, 2024

Glad it seems to be working. Sorry for the unexpected change re: the format of the data.frames. This is still not completely consistent between the different parsers wrapped by the read_chroms function. The way I have it now is actually more consistent with the output of most of the other parsers; the two-column format you were accustomed to was more of an outlier. I could consider adding an option to generate data.frames in the two-column format, if you think it would be helpful.

ethanbass commented on June 3, 2024

Thank you as well for all your helpful feedback! By the way, the data.frame generated by read_chroms should actually contain the same metadata as the matrix version, but print() doesn't display custom attributes of a data.frame, so the metadata doesn't show up in the console. You should be able to access it using attributes().
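A small illustration of that point, using a mock data.frame whose values and attribute names are invented to resemble the read_chroms output shown in this thread:

```r
# Mock stand-in for a read_chroms data.frame; the values are invented.
chrom_df <- data.frame(Intensity = c(22.7, 22.7, 22.8),
                       row.names = c("0.000", "0.001", "0.002"))
attr(chrom_df, "unit") <- "pA"
attr(chrom_df, "parser") <- "chromConverter"

print(chrom_df)          # the custom attributes are not printed here...
attr(chrom_df, "unit")   # ...but they are still attached: "pA"

# Everything except the data.frame's own structural attributes
meta <- attributes(chrom_df)
meta <- meta[setdiff(names(meta), c("names", "row.names", "class"))]
meta
```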

Phenomniverse commented on June 3, 2024

read_chroms returns a list:

fid <- read_chroms("FID1A.CH",parser="chromconverter",format_in="chemstation_fid")

typeof(fid)
[1] "list"

str(fid)
List of 1
 $ FID1A.CH: num [1:36001, 1] 22.7 22.7 22.7 22.7 22.7 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:36001] "-0.000584383328755697" "0.000248926827725658" "0.00108223698420701" "0.00191554714068837" ...
  .. ..$ : chr "Intensity"
  ..- attr(*, "version")= chr "181"
  ..- attr(*, "sample_name")= chr "Wild Crafted Buddha Wood Oil Batch 52"
  ..- attr(*, "run_date")= chr "07-Sep-22, 17:41:36"
  ..- attr(*, "instrument")= chr "GCI"
  ..- attr(*, "method")= chr "EO BASEMETHOD 2020.M"
  ..- attr(*, "software")= chr "Asterix ChemStation"
  ..- attr(*, "unit")= chr "pA"
  ..- attr(*, "signal")= chr ""
  ..- attr(*, "time_range")= num [1:2] -0.000584 29.998581
  ..- attr(*, "data_format")= chr "long"
  ..- attr(*, "parser")= chr "chromConverter"

I turn it into a dataframe using:

library(data.table)
fid_df <- as.data.frame(fid)
fid_df <- setDT(fid_df, keep.rownames = TRUE)[]
colnames(fid_df) <- c('rt', 'value')

The setDT function from data.table package is converting the rownames into a column here.
I lose the attributes data in converting it to a data frame, but I can always access it from the original output if I need it.

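Maybe a better way to do it would be:
# [, 1] takes the whole intensity column ([1] would take only the first value)
fid_df <- data.frame(rt = as.numeric(rownames(fid$FID1A.CH)), value = fid$FID1A.CH[, 1], row.names = NULL)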

I suppose I could pass the attributes from fid$FID1A.CH to fid_df as a comment, but there's no real need for it.

ethanbass commented on June 3, 2024

You could do something like this if you want to transfer the metadata over:

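fid_df <- lapply(fid, function(xx){
  # convert the one-column matrix to a data.frame, with numeric retention times
  x_new <- data.frame(rt = as.numeric(rownames(xx)), value = xx[, 1], row.names = NULL)
  # transfer metadata, skipping the matrix's own dim/dimnames; copying attributes
  # one at a time keeps the data.frame's names, class and row.names intact
  meta <- attributes(xx)
  for (nm in setdiff(names(meta), c("dim", "dimnames"))) {
    attr(x_new, nm) <- meta[[nm]]
  }
  x_new
})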

Phenomniverse commented on June 3, 2024

yeah okay, playing around with this makes me realise that retaining the metadata is pretty helpful.
I notice that the metadata for some of those older FID files contains strings of nonsense characters, although the actual metadata is in the strings as well. For example, see below, where some of the attribute fields are normal and others contain extraneous characters:

$ attributes:List of 11
  ..$ version    : chr "181"
  ..$ sample_name: chr "REF TTOØ\022\024\002\002‘|E"
  ..$ run_date   : chr "12/12/2018 12:03:24 AM"
  ..$ instrument : chr "HP G1530AÀ&$÷j‘|\030ß\022"
  ..$ method     : chr "REF-TTO.MÐý\177àý\177Ðý\177:4@\004x(e$Hß\022"
  ..$ software   : chr "‰$‘|?'‘|\002\a@u\002\a$á\022\a$‘|\177'‘|\002\a´à\022\034á\022Dá\022\235'‘|X \003@u\002\a$á\022n \200|\002\a@u\0"| __truncated__
  ..$ unit       : chr "pA"
  ..$ signal     : chr ""
  ..$ time_range : num [1:2] 1.82e-04 3.00e+01
  ..$ data_format: chr "long"
  ..$ parser     : chr "chromConverter"

ethanbass commented on June 3, 2024

@Phenomniverse FYI, this issue with the metadata in the Chemstation 181 files should be resolved in the latest version (v0.4.0). Also, you can now toggle the format of the data.frames using the data_format argument: wide format (the default) returns retention times as row names, while long format returns retention times in their own column.
