ethanbass / chromconverter

Parsers for chromatography data in R (HPLC-DAD/UV, GC-FID, MS)

Home Page: https://ethanbass.github.io/chromConverter/

License: GNU General Public License v3.0

R 99.96% Shell 0.04%

chromatography r-package hplc hplc-dad hplc-uv cheminformatics metabolomics metabolomics-data open-science gc-fid

chromconverter's Issues

Converting .cdf to .mzml - different parsers?

I am trying to convert a .cdf file from a Shimadzu machine to .mzml to use in GNPS, and when I try, it says that exporting to .mzml requires the openchrom parser, but when you use openchrom it says it doesn't recognize .cdf. Is there a way around this? From the documentation it seems like standard read_chroms parsers should recognize both .cdf and .mzml

error when I don't specify a parser:
The selected export format is currently only supported by openchrom parsers.

error when I specify openchrom:
The ‘cdf’ format can be converted using the following parsers: ‘chromconverter’.
The ‘openchrom’ parser can take the following formats as inputs:
‘msd’, ‘csd’, ‘wsd’.

Here's the code I'm trying to use (without specifying a parser), if it's simply just something wrong with how it's written:
data <- read_chroms(paths = "mypath/filename.CDF", format_in = "cdf", export = TRUE, path_out = "mypath/filename.mzml", export_format = "mzml")

Would appreciate any help!!

importing cdf (raw) from Waters

Hi Ethan,

This is a reply to the issue that I opened previously (ethanbass/chromatographR#29).

When exporting from the Waters instrument via their software (Empower) there are two options: 1) ascii (the resulting files have the .arw extension) and 2) raw (the resulting files have the .cdf extension).

Weird error when trying to convert to cdf

Hello, extreme noob to R here... I am trying to convert a CSV HPLC chromatogram (exported from Agilent ChemStation) to .cdf format.

I'm getting:

read_chroms("test2.CSV", format_in = "chemstation_csv", find_files = FALSE, export=TRUE, export_format = "cdf")
Export directory not specified! Export files to temp directory (y/n)?y
Error in if (is.null(lambda) && ncol(x) + as.numeric(attr(x, "data_format") == :
missing value where TRUE/FALSE needed

Whereas when export_format = "csv", the command succeeds.

It could be some extremely minor problem, but due to my inexperience I have no idea whether this is a real problem or not.

Any suggestions/pointers/advice would be greatly appreciated! thanks :)

Interface to OpenChrom file parsers

OpenChrom includes a command line interface that can be used to call their file parsers: https://github.com/OpenChrom/openchrom/wiki/CLI.

R should be able to write an appropriate batch job file and feed it to OpenChrom with a system call to access their file converters. For example, they have parsers for several FID formats which don't seem to be available otherwise.
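A minimal sketch of how such a system call might be assembled from R, assuming the `-nosplash -cli -batchfile` flags that appear in the warnings quoted elsewhere on this page. The helper name `build_openchrom_call` and both paths are hypothetical placeholders, and the command is only printed, not run.

```r
# Hypothetical helper (not part of chromConverter): assemble an OpenChrom
# command-line call as a single string without executing it.
build_openchrom_call <- function(openchrom_path, batchfile) {
  paste(
    shQuote(openchrom_path),          # quoting keeps paths with spaces intact
    "-nosplash", "-cli",
    "-batchfile", shQuote(batchfile)
  )
}

# Placeholder paths for illustration only.
cmd <- build_openchrom_call(
  "/path/to/openchrom",
  file.path(tempdir(), "batch job.xml")
)
print(cmd)
# system(cmd) would hand the string to the shell; it is left out on purpose.
```

Quoting both paths with `shQuote()` is a design choice here so that directories containing spaces do not split the command into extra arguments.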

Alternatively, if someone who understands Java reads this and wants to figure out how to call the parsers directly from R (e.g. using rJava) I think that would be great as well.

Reading Intensities Correctly

Hello! Thank you for this excellent repository. I'm trying to use read_chemstation_ch.R to read in a v179 .ch GC-FID file. The retention times seem right, but the intensities are very different from what I see when I open the file in MassHunter. Could there be an issue with the scaling factor that is applied?
I've attached the .ch file (FID1A.txt) as well as an Excel file in which I compared the MassHunter and read_chemstation_ch intensities. Any help is appreciated!

FID1A.txt
Test.xlsx

Reading shimadzu .lcd files

Dear Ethan,
I tried to use your code for reading the raw data of our Shimadzu HPLC, thanks for that code!

I am not a programmer, and I mainly work in Python rather than R. Here are some results from the last few days that my colleague (who uses R) and I spent working on this. I wonder whether you would like to incorporate the issues we found into your R code.

We are both using a new Mac with an M1 chip: with the standard installation, some dependencies in Miniconda are not provided for this processor architecture! So we could not use your code directly, but we also did not spend time figuring out which dependency is failing. My colleague used an older PC instead.
Instead, I am trying to translate the code for reading Shimadzu .lcd files into Python. This code is now (most likely) working for my data, and I can also extract the data from our fluorometer in addition to that of the PDA.

I needed to change mainly two things:

your line 147 in read_shimadzu_lcd.R, mat <- matrix(NA, nrow = fsize/(n_lambdas*1.5), ncol = n_lambdas)
This concerns the size of the data stream, which depends on the number of wavelengths from the PDA and the total time of the HPLC run. A simple factor of 1.5 does not work for my data. Instead, I first scanned the PDA raw data stream for the start bits of each data set header and summed them up. Later, I found the entry in a stream that contains the number of data sets, which can simply be read out.
your line 249 in function decode_shimadzu_block: buffer[[2]] <- twos_complement(substr(bin, 5, nchar(bin))),
This line cuts off the first 4 bits of the bit string that ultimately contains the difference from the previous value. It worked this way for my PDA data, but I could not reproduce the fluorometer results at some positions, and the signal was distorted. It took me some time to understand this, but in the end the function simply fails when the difference is a large number and more bytes are needed to decode it. I simply reduced the cut and now use the bits from position 3 onward. This seemed to work!
My question here is: did you find the number '5' simply by trial and error, or was there a reason?
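To make the effect of the cut position concrete, here is a small sketch with a stand-in decoder. The function `twos_complement_sketch` is a hypothetical re-implementation for illustration and is not the code in read_shimadzu_lcd.R; the 16-bit string is made up and does not come from a real .lcd file.

```r
# Hypothetical stand-in for a two's-complement decoder.
# `bits` is a string of "0"/"1" characters, most significant bit first.
twos_complement_sketch <- function(bits) {
  b <- as.integer(strsplit(bits, "")[[1]])
  n <- length(b)
  unsigned <- sum(b * 2^((n - 1):0))
  # A set sign bit means the value is negative: subtract 2^n.
  if (b[1] == 1) unsigned - 2^n else unsigned
}

block <- "0001001110001000"   # made-up 16-bit block for illustration

from_5 <- twos_complement_sketch(substr(block, 5, nchar(block)))  # 12 bits kept
from_3 <- twos_complement_sketch(substr(block, 3, nchar(block)))  # 14 bits kept

print(c(from_5 = from_5, from_3 = from_3))
# from_5 is 904 and from_3 is 5000: the 12-bit window drops the high bits
# of a large difference, while the 14-bit window keeps them.
```

For small differences both windows give the same number, which would be consistent with the PDA data decoding correctly while larger fluorometer steps come out distorted.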

If there is interest from your side, I can spend some time describing more details, e.g. where to find the fluorescence data and how to read it, or where the file size is stored in the .lcd file.
Best
Rüdiger

Path problems with system calls on Windows

Hi,
I'm struggling to get read_chroms to accept a path_out, and it's unclear what formatting it is looking for. It seems to want to add a "/" to the front of whatever I specify as path_out. The default behaviour is supposed to save it in the current working directory, but it doesn't do that either: it prompts whether to save it in 'temp', but it can't find the 'temp' folder even when I manually created it as a subfolder of the working directory. I would prefer to give the full path from the drive name (e.g. "C:/ ... "), but this doesn't work because it's putting a "/" in front of the drive name. See below, where I've used getwd() to provide the path to the current directory.

read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=getwd(), export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')

Error in read_chroms(paste0(archive_sample_dirs[1], "/FID1A.ch"), find_files = FALSE,  : 
  The export directory '/W:/ARL/Analytical/OPERATOR METHOD TEMPLATES/chemstation data/' does not exist.
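One way to avoid a leading "/" being glued onto a drive letter is to test whether the path is already absolute before resolving it. The sketch below is an assumption about how such a check could look, not how read_chroms actually handles path_out; both helper names are hypothetical, and the pattern deliberately ignores UNC paths to stay short.

```r
# Hypothetical helpers, not part of chromConverter.
# TRUE for "/...", "~..." and Windows drive paths such as "C:/...".
is_absolute_path <- function(path) {
  grepl("^(/|~|[A-Za-z]:)", path)
}

# Resolve only relative paths against a base directory.
resolve_export_dir <- function(path_out, wd = getwd()) {
  if (is_absolute_path(path_out)) path_out else file.path(wd, path_out)
}

# Drive-letter path from the error above: returned unchanged.
abs_result <- resolve_export_dir(
  "W:/ARL/Analytical/OPERATOR METHOD TEMPLATES/chemstation data/"
)
# Relative path: joined onto a (made-up) working directory.
rel_result <- resolve_export_dir("DATAs/OUTPUT/neg", wd = "/home/user")

print(abs_result)
print(rel_result)
```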

If I make path_out = "", it ignores this problem at least for long enough to encounter an additional problem, which is that it can't find the path to the OpenChrom command line. It seems to want the pathname with filename, sans extension, e.g. "C:/Users/ ... /Programs/OpenChrom/openchrom" for this. If I type this in, it moves on to another error (I suspect it's back to the first error). It seems to save this path in the path_to_openchrom_commandline.txt file, but it doesn't seem to be able to find OpenChrom unless I manually type it in each time. When I do so, I get:

Error in write_xml.xml_document(x, file = path_xml) : Error closing file
In addition: Warning message:
In write_xml.xml_document(x, file = path_xml) : Permission denie [1501]

Python environment sometimes fails to load on Windows due to dependency issues

Python environment sometimes fails to load (due to issues building C++ extensions?) as discussed in issue #7. I'm not sure what can be done about this directly, but at least providing better documentation about what the actual requirements are would probably be helpful.

seeking chemstation version 181 file

I am seeking a chemstation version 181 file to use for unit testing. If anyone has a file they wouldn't mind contributing, I would be grateful.

Add support for Waters PDA files

See ethanbass/chromatographR#26

Add R-based parser for Chemstation UV files

I don't want to be so dependent on the python and rust-based dependencies for interpreting Chemstation UV files. I think it shouldn't be too difficult to write a parser directly in R, following the documentation kindly provided in the rainbow-api package (https://rainbow-api.readthedocs.io/en/latest/agilent/uv.html).

Update chemstation 181 parser to read older FID files

The new read_chemstation_fid parser can read some of the newer chemstation 181 .ch files supplied by Phenomniverse but the older files seem to be encoded differently and aren't being interpreted correctly. Also see issue #6. (These 181 files are apparently not able to be read by any of the external libraries currently included with chromConverter either).

read_chroms() can't find openChrom executable

Apologies in advance, this is my first time posting an issue on GitHub. I am trying to use read_chroms() to convert Agilent .ch WSD files to .csv; however, I cannot get read_chroms() to find the OpenChrom executable. Also, the OpenChrom CLI is not behaving as expected.

OS: MacOS Ventura 13.1

Environment: R interpreter in zsh.

Executing:

dat <- read_chroms(paths = file_path, format_in = 'wsd', parser = 'chromconverter', format_out = "data.frame", export = FALSE)

Produces the following error dialog:

Export directory not specified! Export files to `temp` directory (y/n)?y
Warning in configure_call_openchrom() : OpenChrom not found!
Please provide path to `OpenChrom` command line):/Applications/Eclipse.app/contents/MacOS/openchrom
    The OpenChrom command-line interface is turned off!
    Update `openchrom.ini` to activate the command-line interface (y/n)?
    (Warning: This will deactivate the GUI on your OpenChrom installation!)
y
sh: /Applications/OpenChrom_CL.app/Contents/MacOS/openchrom: No such file or directory
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In system(paste0(openchrom_path, " -nosplash -cli -batchfile ",  :
  error in running command
2: In file(file, "rt") :
  cannot open file '/Users/jonathan/001_chromconverter_test_env/temp/DAD1D.csv': No such file or directory

Then running it again produces a different message:

Export directory not specified! Export files to `temp` directory (y/n)?y
sh: /Applications/OpenChrom_CL.app/Contents/MacOS/openchrom: No such file or directory
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  no lines available in input
In addition: Warning message:
In system(paste0(openchrom_path, " -nosplash -cli -batchfile ",  :
  error in running command

Manually setting up OpenChrom CLI and running ./openchrom -nosplash -cli --help results in an error message as below, instead of the help dialog:

WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
<<<< EncryptedJarClassLoader created >>>>
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Which I don't think is a good thing.

Investigating the openchrom path, I found that

../chromConverter/shell/path_to_openchrom_commandline.txt

contained

/Applications/Eclipse.app/contents/MacOS/openchrom

the path I initially entered, so that's behaving as expected. However, I'm not sure why read_chroms() is looking in a different path. Any help would be appreciated.

Issue reading in ChemStation .ch files

Hi Ethan-

I am importing chemstation .ch files, and for a significant portion of my files I get NaN values for intensity. Typically I have four .ch files per sample run from different wavelengths, and sometimes it happens for all four and sometimes only some of them. I can open the files without issue in ChemStation, so I know the files contain data. Unfortunately we don't have .uv files for this particular dataset.

I am running chromConverter 0.4.2

GitHub won't let me attach .ch files, but there are some example offending files here if you would like to check them out:
https://drive.google.com/drive/folders/1dhJe-JdV3ilXz_a_NBGWJ6COX4ENocxz?usp=sharing

With these four example files (all different wavelengths from the same run), I can read in Signal C and D but not A and B. Here is some example code:

chroms <- read_chroms(paths="C:/Chem32/1/DATA/Loren RV/PHENOLICS 2022 2022-02-03 14-31-44", format_in = "chemstation_ch")

head(chroms[[1]])
head(chroms[[3]])
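A quick way to see which signals are affected, rather than inspecting each with head(), is to flag every element that contains NaN. This is only a sketch: it assumes the result is a named list of numeric tables, and the `chroms_demo` object below is made-up stand-in data, not output from read_chroms.

```r
# Made-up stand-in for a list of chromatograms (one element per signal).
chroms_demo <- list(
  signal_A = data.frame(rt = c(0.1, 0.2), intensity = c(NaN, NaN)),
  signal_C = data.frame(rt = c(0.1, 0.2), intensity = c(1.5, 2.0))
)

# TRUE for every element that contains at least one NaN value.
# as.matrix() is needed because is.nan() has no data.frame method.
has_nan <- vapply(
  chroms_demo,
  function(x) any(is.nan(as.matrix(x))),
  logical(1)
)

affected <- names(has_nan)[has_nan]
print(affected)
```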

Any tips appreciated!!

Susan

Add Shimadzu .lcd parser?

Hi, is it possible to add a parser for Shimadzu .lcd files? I have attached a file example for reference.
Best, Silas
Anthocyanin_2_MeOH001.zip

package fails to load if python dependencies can't be installed on Windows

Hi Ethan,

Not sure what has happened because this was working previously, but now I'm not able to load the chromConverter package.

From the error message I gather that there are some issues with the installation process for the python package 'python-lzf'. It looks like it needs an install of Microsoft Visual C++, and I also see this warning:

SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.

I'll try manually installing the build tools for Microsoft Visual C++ and see if that resolves the issue, but in the meantime, here's the full output of library(chromConverter):

> library(chromConverter)
Configuring package 'chromConverter': please wait ...

C:\users>CALL "C:\Users\regan\AppData\Local\r-miniconda\condabin\activate.bat" "C:\Users\regan\AppData\Local\r-miniconda\envs\r-reticulate" 

C:\users>conda.bat activate "C:\Users\regan\AppData\Local\r-miniconda\envs\r-reticulate" 

(r-reticulate) C:\users>"C:/Users/regan/AppData/Local/r-miniconda/envs/r-reticulate/python.exe" -m pip install --upgrade --no-user "aston" "numpy" "pandas" "rainbow-api" "scipy" 
Collecting aston
  Using cached Aston-0.7.1-py3-none-any.whl (74 kB)
Requirement already satisfied: numpy in c:\users\regan\appdata\local\r-miniconda\envs\r-reticulate\lib\site-packages (1.23.5)
Collecting pandas
  Using cached pandas-1.5.2-cp38-cp38-win_amd64.whl (11.0 MB)
Collecting rainbow-api
  Using cached rainbow_api-1.0.1-py3-none-any.whl (21 kB)
Collecting scipy
  Using cached scipy-1.9.3-cp38-cp38-win_amd64.whl (39.8 MB)
Requirement already satisfied: pytz>=2020.1 in c:\users\regan\appdata\local\r-miniconda\envs\r-reticulate\lib\site-packages (from pandas) (2022.6)
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting python-lzf
  Using cached python-lzf-0.2.4.tar.gz (9.3 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting matplotlib
  Using cached matplotlib-3.6.2-cp38-cp38-win_amd64.whl (7.2 MB)
Collecting lxml
  Using cached lxml-4.9.1-cp38-cp38-win_amd64.whl (3.6 MB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting cycler>=0.10
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.4.4-cp38-cp38-win_amd64.whl (55 kB)
Collecting pillow>=6.2.0
  Using cached Pillow-9.3.0-cp38-cp38-win_amd64.whl (2.5 MB)
Collecting fonttools>=4.22.0
  Using cached fonttools-4.38.0-py3-none-any.whl (965 kB)
Collecting pyparsing>=2.2.1
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting contourpy>=1.0.1
  Using cached contourpy-1.0.6-cp38-cp38-win_amd64.whl (163 kB)
Collecting packaging>=20.0
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Building wheels for collected packages: python-lzf
  Building wheel for python-lzf (setup.py): started
  Building wheel for python-lzf (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [5 lines of output]
      running bdist_wheel
      running build
      running build_ext
      building 'lzf' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for python-lzf
  Running setup.py clean for python-lzf
Failed to build python-lzf
Installing collected packages: python-lzf, six, scipy, pyparsing, pillow, lxml, kiwisolver, fonttools, cycler, contourpy, python-dateutil, packaging, aston, pandas, matplotlib, rainbow-api
  Running setup.py install for python-lzf: started
  Running setup.py install for python-lzf: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Running setup.py install for python-lzf did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      running install
      C:\Users\regan\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_ext
      building 'lzf' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> python-lzf

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Error: package or namespace load failed for ‘chromConverter’:
 .onLoad failed in loadNamespace() for 'chromConverter', details:
  call: NULL
  error: Error installing package(s): "\"aston\"", "\"numpy\"", "\"pandas\"", "\"rainbow-api\"", "\"scipy\""
In addition: Warning message:
In shell(fi, intern = intern) :
  'C:\Users\regan\AppData\Local\Temp\RtmpINwiN5\file2c3c7541260c.bat' execution failed with error code 1

Python bindings don't work correctly in latest version of RStudio without altering Python settings (ModuleNotFoundError)

Recent versions of RStudio made some strange changes to the way reticulate functions, as discussed in this thread, which interfere with chromConverter's python bindings. chromConverter will still load but python-based parsers will likely not be available if a project is loaded. When trying to access python parsers, a module not found error will be generated. As far as I can tell, this is a bug with RStudio rather than chromConverter (though RStudio developers seem to think this is the expected behavior).

This issue can apparently be resolved by unchecking a box in the RStudio settings. To do this, open RStudio settings and navigate to the Python pane (Tools:Global Options:Python). Then uncheck the box that says "Automatically activate project-local Python environments" and click Apply. RStudio must then be restarted for the settings to take effect.

Path interpretation issue in read_chroms

Hi,

There is a problem with the read_chroms function: a "/" is added at the beginning and end of path_out, which prevents the function from working properly.

If I try this code (on Windows):

library(chromConverter)

if (dir.exists("DATAs/OUTPUT/neg")){
print("THE DIRECTORY EXIST")
}

dat <- read_chroms(paths = "DATAs/INPUT/neg", format_in = "thermoraw", path_out = "DATAs/OUTPUT/neg")

I get this output:

library(chromConverter)

if (dir.exists("DATAs/OUTPUT/neg")){
print("THE DIRECTORY EXIST")
}
[1] "THE DIRECTORY EXIST"

dat <- read_chroms(paths = "DATAs/INPUT/neg", format_in = "thermoraw", path_out = "DATAs/OUTPUT/neg")
Error in read_chroms(paths = "DATAs/INPUT/neg", format_in = "thermoraw", :
The export directory '/DATAs/OUTPUT/neg/' does not exist.

Chromatograms from mzXML files

Hi dear developers,

We want to use your tools to generate chromatograms for our article.

We have data in mzXML format, for which we do not have the original vendor files.

It seems that your tool does not support this format. Would it be possible to add it?

Thank you