clintval / sample-sheet
Parse Illumina sample sheets with Python
Home Page: https://sample-sheet.rtfd.io
License: MIT License
When running
import os
import sys
import csv
from sample_sheet import SampleSheet
url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/{}'
sample_sheet = SampleSheet(url.format('paired-end-single-index.csv'))
print(sample_sheet)
I get this
>> python3 SampleSheet.csv
File "SampleSheet.csv", line 1
[Header],,,,,,,,,,
^
SyntaxError: invalid syntax
both in the example SampleSheet and my own.
Running code installed with pip3, not the one on the repository.
Is it possible to bypass index validation?
sample_sheet = SampleSheet(args.samplesheet)
File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 418, in __init__
self._parse(self.path)
File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 524, in _parse
self.add_sample(Sample(dict(zip(sample_header, line))))
File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 296, in __init__
raise ValueError(f'Not a valid index: {value}')
ValueError: Not a valid index: SI-GA-F8
See #95
Clearly describe the add_sample() method along with all validation checks performed.
sample-sheet/sample_sheet/_sample_sheet.py
Lines 482 to 502 in 87deb18
Manifests section
Hi @clintval,
do you have any plans to support the new SampleSheet v2 format?
It seems Illumina has released new tools and, along with them, a new version of the sample sheet.
(https://blog.software.illumina.com/2020/07/30/announcing-the-release-of-bcl-convert-software/)
Cheers,
Florian
I have a sample sheet that looks like this:
[Header]
IEMFileVersion,4
Date,11/16/2015
Workflow,GenerateFASTQ
Application,RNA-Seq
Assay,TruSeq LT
Description
Chemistry,Default
[Reads]
75
75
[Settings]
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,GenomeFolder,Sample_Project,Description
MiSeq_L20151106_A1,MiSeq_L20151106_A1,,,AR001,ATCACG,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC
MiSeq_L20151106_B1,MiSeq_L20151106_B1,,,AR003,TTAGGC,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC
When reading this file using:
ss = IlluminaSampleSheet('SampleSheet.csv')
results in
$ python test.py
Traceback (most recent call last):
File "test.py", line 5, in <module>
ss = IlluminaSampleSheet(sample_sheet_path)
File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 419, in __init__
self._parse(self.path)
File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 537, in _parse
key, value, *_ = line
ValueError: not enough values to unpack (expected at least 2, got 1)
Looks like the trailing *_ is causing the problem on line 537. If you remove that, the key, value pair gets read correctly. The *_ is not used, so there is no need to include it here.
After making the change to line 537, the same problem arises for the Description line: since there is no comma, only a key is provided, with no corresponding value. I think a better check would be to execute these lines only if len(line) >= 2. I'll submit a PR that works for me.
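The length check suggested above could look roughly like this. It is a minimal sketch, not the library's _parse method: the function name parse_key_value_lines is hypothetical, and storing an empty string for a bare key (rather than skipping the line) is an assumption made for illustration.

```python
import csv
import io


def parse_key_value_lines(text):
    """Collect key/value pairs from a section, tolerating keys without values."""
    pairs = {}
    for line in csv.reader(io.StringIO(text)):
        if len(line) >= 2:
            # Normal case: "Key,Value" (any extra padded columns are ignored).
            key, value = line[0], line[1]
        elif len(line) == 1:
            # A bare key such as "Description" with no comma gets an empty value.
            key, value = line[0], ''
        else:
            # csv.reader yields [] for a completely empty line; skip it.
            continue
        pairs[key] = value
    return pairs


header = parse_key_value_lines('Assay,TruSeq LT\nDescription\nChemistry,Default\n')
print(header)
```

Indexing the first two cells avoids the unpacking error on short lines while still accepting comma-padded ones.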
Hi there. Cool lib!
Do you have any intention of adding the ability to create (and write) new sample sheets with this? It seems most of the mechanics are already in place; it probably just needs a function to write back to a string.
Hi, thanks for putting together this great library. It seems that it currently does not support empty lines, which the spec allows:
Empty lines or lines that consist entirely of commas and/or whitespace characters are valid, but ignored.
Traceback (most recent call last):
File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/62/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 210, in <module>
main()
File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 205, in main
'summary': summary,
File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 39, in launch
sample_sheet = SampleSheet(sample_sheet)
File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/sample_sheet/_sample_sheet.py", line 376, in __init__
self._parse(str(self.path))
File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/sample_sheet/_sample_sheet.py", line 468, in _parse
header_match = self._section_header_re.match(line[0])
IndexError: list index out of range
Getting a syntax error when installing on a fresh virtualenv system
(pipeline_run3) data_management> python
Python 3.4.5 (default, Sep 08 2016, 13:41:53) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
(pipeline_run3) pipeline@SequenceKing:~/cancerplus/code/data_management> pip3 install sample_sheet
Collecting sample_sheet
Using cached https://files.pythonhosted.org/packages/fc/75/69cab3b91ea745a909bedc53f30789414eedc11ce2ebe6189733560a9583/sample_sheet-0.6.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-5eqifgel/sample-sheet/setup.py", line 12
URL = f'https://github.com/clintval/{PACKAGE_NAME}'
^
SyntaxError: invalid syntax
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-5eqifgel/sample-sheet/
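The caret points at an f-string, which is only valid syntax from Python 3.6 onward, so a 3.4.5 interpreter cannot parse setup.py. As a minimal sketch, the same string can be built in a way older interpreters accept. The value of PACKAGE_NAME below is assumed for illustration.

```python
# Assumed value for illustration; the real setup.py defines its own PACKAGE_NAME.
PACKAGE_NAME = 'sample-sheet'

# str.format() is understood by Python 3.4, unlike an f-string.
URL = 'https://github.com/clintval/{}'.format(PACKAGE_NAME)
print(URL)
```

This would only address setup.py. Another traceback on this page shows the library itself using an f-string when raising an error, so running on Python 3.6 or newer is likely the practical fix.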
Something like:
>>> sample_sheet = SampleSheet('SampleSheet.csv')
>>> first_sample = sample_sheet.samples[0]
>>> first_sample.sample_sheet
SampleSheet("SampleSheet.csv")
But not on a Sample instantiated plainly:
>>> sample = Sample(dict(Sample_Name='test', Sample_ID='XXX', index='ACGT'))
>>> sample.sample_sheet
None
This would help with tracking the source of samples when reading more than one sample sheet into an environment and then processing the samples in other data structures such as dictionaries.
We only use it for a case-insensitive dictionary.
Running the test suite on the master branch fails on macOS.
I am running macOS Monterey 12.2.1 on an Intel Mac with Python 3.7.
pytest Error:
E AssertionError: 'BARC[68 chars]TC\t/System/Volumes/Data/home/user/49-tissue.e[278 chars]\t\n' != 'BARC[68 chars]TC\t/home/user/49-tissue.exp001/49-tissue.GAAC[218 chars]\t\n'
E Diff is 748 characters long. Set self.maxDiff to None to see it.
Full Error:
>tox
...cut
________________________________________ TestSampleSheet.test_to_picard_basecalling_params_output_files _________________________________________
self = <test_sample_sheet.TestSampleSheet testMethod=test_to_picard_basecalling_params_output_files>
    def test_to_picard_basecalling_params_output_files(self):
        """Test ``to_picard_basecalling_params()`` output files"""
        bam_prefix = '/home/user'
        lanes = [1, 2]
        with TemporaryDirectory() as temp_dir:
            sample1 = Sample(
                {
                    'Sample_ID': 49,
                    'Sample_Name': '49-tissue',
                    'Library_ID': 'exp001',
                    'Description': 'Lorum ipsum!',
                    'index': 'GAACT',
                    'index2': 'AGTTC',
                }
            )
            sample2 = Sample(
                {
                    'Sample_ID': 23,
                    'Sample_Name': '23-tissue',
                    'Library_ID': 'exp001',
                    'Description': 'Test description!',
                    'index': 'TGGGT',
                    'index2': 'ACCCA',
                }
            )
            sample_sheet = SampleSheet()
            sample_sheet.add_sample(sample1)
            sample_sheet.add_sample(sample2)
            sample_sheet.to_picard_basecalling_params(
                directory=temp_dir, bam_prefix=bam_prefix, lanes=lanes
            )
            prefix = Path(temp_dir)
            assert_true((prefix / 'barcode_params.1.txt').exists())
            assert_true((prefix / 'barcode_params.2.txt').exists())
            assert_true((prefix / 'library_params.1.txt').exists())
            assert_true((prefix / 'library_params.2.txt').exists())
            barcode_params = (
                'barcode_sequence_1\tbarcode_sequence_2\tbarcode_name\tlibrary_name\n' # noqa
                'GAACT\tAGTTC\tGAACTAGTTC\texp001\n' # noqa
                'TGGGT\tACCCA\tTGGGTACCCA\texp001\n'
            ) # noqa
            library_params = (
                'BARCODE_1\tBARCODE_2\tOUTPUT\tSAMPLE_ALIAS\tLIBRARY_NAME\tDS\n' # noqa
                'GAACT\tAGTTC\t/home/user/49-tissue.exp001/49-tissue.GAACTAGTTC.{lane}.bam\t49-tissue\texp001\tLorum ipsum!\n' # noqa
                'TGGGT\tACCCA\t/home/user/23-tissue.exp001/23-tissue.TGGGTACCCA.{lane}.bam\t23-tissue\texp001\tTest description!\n' # noqa
                'N\tN\t/home/user/unmatched.{lane}.bam\tunmatched\tunmatchedunmatched\t\n'
            ) # noqa
            self.assertMultiLineEqual(
                (prefix / 'barcode_params.1.txt').read_text(), barcode_params
            )
            self.assertMultiLineEqual(
                (prefix / 'barcode_params.2.txt').read_text(), barcode_params
            )
            self.assertMultiLineEqual(
                (prefix / 'library_params.1.txt').read_text(),
>               library_params.format(lane=1),
            )
E AssertionError: 'BARC[68 chars]TC\t/System/Volumes/Data/home/user/49-tissue.e[278 chars]\t\n' != 'BARC[68 chars]TC\t/home/user/49-tissue.exp001/49-tissue.GAAC[218 chars]\t\n'
E Diff is 748 characters long. Set self.maxDiff to None to see it.
tests/test_sample_sheet.py:535: AssertionError
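The two strings in the assertion differ only by a /System/Volumes/Data prefix. That suggests (an assumption, not something the report confirms) that the output path is being resolved through a symlink or mount point on macOS, while the expected string is not. The sketch below reproduces that kind of mismatch with a temporary symlink. The directory names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the base first: the temp dir itself may sit behind a symlink.
    base = Path(tmp).resolve()
    real = base / 'real'
    real.mkdir()
    link = base / 'link'
    os.symlink(str(real), str(link))  # 'link' stands in for a path like /home

    unresolved = link / 'user'
    resolved = unresolved.resolve()

    # Both paths name the same location, yet they differ as strings.
    same_string = str(unresolved) == str(resolved)
    same_target = resolved == real / 'user'
```

If that is the cause, a test that compares paths as text would need to resolve the expected prefix the same way the code under test does.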
Would be wonderful if it supported the TruSight 170 style sample sheets.
Hi,
I am trying to write some simple logic to reverse complement the i5 index within the sample_sheet object using a custom reverse_complement function.
Can you let me know how I can update the object to accommodate the reverse-complemented i5 index and then write an updated CSV sample sheet file?
for sample in sample_sheet:
    index_i5 = sample.index2
    index_i5_rc = reverse_complement(index_i5)
    sample.index2 = index_i5_rc
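For reference, a reverse_complement helper and the update loop might be sketched as below. The sample objects here are plain stand-ins rather than objects from the library, and writing the sheet back to CSV is left out because it depends on the library's own writer.

```python
from types import SimpleNamespace

# Translation table for complementing bases; N is left unchanged.
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Stand-ins for sample objects; real samples would come from a parsed sample sheet.
samples = [
    SimpleNamespace(Sample_ID='49', index='GAACT', index2='AGTTC'),
    SimpleNamespace(Sample_ID='23', index='TGGGT', index2='ACCCA'),
]

for sample in samples:
    if sample.index2:  # skip samples that have no i5 index
        sample.index2 = reverse_complement(sample.index2)

print([sample.index2 for sample in samples])
```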
I'm experiencing two separate issues when trying to install this package:
The README says to do the following:
conda install -c bioconda sample-sheet
However this results in a missing dependency:
Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- sample-sheet -> terminaltables
Current channels:
<truncated>
This can be fixed by explicitly including -c conda-forge like so:
conda install -c bioconda -c conda-forge sample-sheet
Using Bitbucket pipelines to test a package, I get the following output when listing sample-sheet as a dependency in my setup.py and installing my package:
Downloading https://files.pythonhosted.org/packages/e1/4e/c4af36fcf9f7d3364723a49ec802637e6dbe73725bd2be97e5c7647a0669/sample-sheet-0.9.0.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-py7rsp3d/sample-sheet/setup.py", line 20, in <module>
long_description=Path('README.md').read_text(),
File "/opt/miniconda3/envs/test/lib/python3.6/pathlib.py", line 1197, in read_text
return f.read()
File "/opt/miniconda3/envs/test/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1207: ordinal not in range(128)
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-py7rsp3d/sample-sheet/
Note that I do not get this error in a local development environment, so it could be specific to Bitbucket pipelines.
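The traceback shows Path.read_text() falling back to the ASCII codec, which happens when the environment's default encoding is ASCII. Byte 0xe2 is the first byte of many UTF-8 punctuation characters. A minimal sketch of reading a file with an explicit encoding follows. The temporary file stands in for README.md.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / 'README.md'
    # A curly apostrophe encodes to bytes starting with 0xe2 in UTF-8.
    readme.write_bytes('It\u2019s a README'.encode('utf-8'))

    # Passing the encoding explicitly avoids depending on the locale's default codec.
    long_description = readme.read_text(encoding='utf-8')
```

This would also explain why the error does not appear locally, where the default encoding is usually UTF-8.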
Hi Clint,
we had an issue before (#32) where the same Sample_ID caused issues, even if the lanes were different.
Now we are having the same issue, but without specifying lanes. This time it's 10X data, which you mentioned briefly in the previous issue.
Basically we would like to merge across lanes and indexes and the corresponding sample sheets end up having the same Sample_ID across multiple Data lines.
An example:
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,Index_Plate_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
PRJ180538_VPH20T,,,,,SI-GA-G6_1,CTGACGCG,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_2,GGTCGTAC,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_3,TCCTTCTT,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_4,AAAGAAGA,,,,
Here we merge across the four indexes and all lanes. However, your library does not currently allow this.
Could you allow duplicated Sample_IDs or recommend an alternative approach?
Thanks!
Florian
import typing
If the file passed to the SampleSheet constructor contains data but no sections, the error produced is:
'SampleSheet' object has no attribute ''
It should instead report an invalidly formatted sample sheet, with a message like "missing sections".
Hi,
I am using your fantastic library to manipulate existing sample sheets and I have a question/request:
Is it possible to retain/control the order of the columns in the Data section?
I know the order should not really matter, but we try to stick to internal conventions for readability. Besides, the reordering makes it a bit more difficult to compare the original and modified versions of the sample sheets we process.
Perhaps using an OrderedDict instead of just a dict could be a possibility...?
https://docs.python.org/3/library/collections.html#collections.OrderedDict
Thanks!
Florian
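As a rough illustration of keeping column order through a read, modify, and write cycle, the standard csv module can carry the header order along explicitly. This sketch does not use the library and the data is made up.

```python
import csv
import io

original = 'Sample_ID,Sample_Name,index,Sample_Project\nS1,tissue,ATCACG,QC\n'

reader = csv.DictReader(io.StringIO(original))
columns = reader.fieldnames  # header order exactly as read
rows = list(reader)

rows[0]['index'] = 'TTAGGC'  # modify a value; the column order is untouched

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
```

Since Python 3.7 a plain dict also preserves insertion order, so an OrderedDict is mainly needed when older interpreters must be supported.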
def __iter__(self):
    for sample in self.samples:
        yield sample
This will negate the need for __next__() and for binding a temporary _iter = [] variable.
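A small self-contained sketch of the generator approach is shown here. SampleContainer is a hypothetical stand-in, not the library's class.

```python
class SampleContainer:
    """Hypothetical stand-in for a class that holds a list of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __iter__(self):
        # A generator method returns a fresh iterator on every call,
        # so no __next__() or temporary iteration state is needed.
        for sample in self.samples:
            yield sample


container = SampleContainer(['s1', 's2'])
first_pass = list(container)
second_pass = list(container)
```

Because each call to __iter__ starts over, the container can be looped over repeatedly, including in nested loops.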
One of our lab operators provided an empty samplesheet intending to fill it out later, but never did. As a result, this run made it into our pipeline where we use this package to read the samplesheet. The samplesheet provided is:
[Data],,,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
No other sections, and no samples. When I loaded this samplesheet using
from sample_sheet import SampleSheet as IlluminaSampleSheet
ss = IlluminaSampleSheet(sample_sheet_path)
no error was thrown. Is that the intended behavior?
from typing import Optional
I finally have a need. And it's a sad sad but very important need.
The smart_open.smart_open function has been deprecated in version 1.8.1 of the smart_open library in favor of smart_open.open.
This throws a warning in the console whenever a sample sheet is opened.
piskvorky/smart_open#268
Version 1.8.1 also added SSH/SCP/SFTP support which throws a warning if paramiko is not installed.
piskvorky/smart_open#267
This could be solved by adding a paramiko requirement and changing smart_open calls or setting the version of smart_open to 1.8.0 in the setup file.
I recently downloaded some example data from Illumina and noticed the sample sheet had "Index" in it, not "index". Unfortunately I wasn't able to read the sample sheet because of this. FYI, the data header line looked like this:
Sample_ID,Sample_Name,BASESPACE_SAMPLE_RESOURCE_ID,GenomeFolder,I7_Index_ID,I5_Index_ID,Index,Index2,Sample_Well,Manifest,FastqFolder
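One way to tolerate this would be to normalize selected column names before parsing. The sketch below is an assumption about how that could be done, not the library's behavior, and normalize_header is a hypothetical name.

```python
def normalize_header(columns, canonical=('index', 'index2')):
    """Rewrite columns that match a canonical name case-insensitively."""
    lookup = {name.lower(): name for name in canonical}
    return [lookup.get(column.lower(), column) for column in columns]


header = 'Sample_ID,Sample_Name,I7_Index_ID,I5_Index_ID,Index,Index2'.split(',')
print(normalize_header(header))
```

Only exact case-insensitive matches are rewritten, so columns such as I7_Index_ID keep their original spelling.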
We have a sample sheet where we've added a custom section which is a headered table (along the lines of the [Data] section). It would be handy to be able to parse such a section using the sample-sheet library. (Currently, the library assumes that the custom section must be a dictionary of key-value pairs.)
A made-up example of such a section follows:
[Animals],,
Name,Species,Status
Benji,Dog,Good
Sparky,Cat,Aloof
Sparky,Bird,Sleeping
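To illustrate what parsing such a section as a table could mean, here is a sketch using only the standard library. It assumes the [Animals] section line has already been stripped off and is not how the library treats custom sections.

```python
import csv
import io

section = 'Name,Species,Status\nBenji,Dog,Good\nSparky,Cat,Aloof\nSparky,Bird,Sleeping\n'

# DictReader treats the first row as the header and yields one mapping per row.
rows = [dict(row) for row in csv.DictReader(io.StringIO(section))]
print(rows[0])
```

Keeping the rows as a list, rather than keying them by Name, preserves both Sparky entries.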
ReadStructure class supports all example read structures
+ operator on last read structure token
Try this:
from unittest import TestCase

from nose.tools import assert_false, assert_true

from sample_sheet import _sample_sheet
from sample_sheet._sample_sheet import is_ipython_interpreter


class TestIsIpythonInterpreter(TestCase):
    """Unit tests for ``is_ipython_interpreter()``"""

    def test_is_ipython_interpreter(self):
        """Test if this test framework is run in an IPython interpreter."""
        assert_false(is_ipython_interpreter())
        # Define __IPYTHON__ in the module namespace to mimic an IPython session.
        _sample_sheet.__IPYTHON__ = None
        assert_true(is_ipython_interpreter())
Add code coverage hook to commits and PRs:
Hi, I have a sample sheet that has the same sample ID but on multiple lanes, which in our case can happen quite frequently. This case is currently not supported. Could this be added?
Example [Data] section:
[Data],,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
1,WES013BL,,,,A010,TAGCTT,,
1,WES013FR,,,,A027,ATTCCT,,
1,MDx150891,,,,A012,CTTGTA,,
1,MDx150892,,,,A016,CCGTCC,,
2,WES013BL,,,,A010,TAGCTT,,
2,WES013FR,,,,A027,ATTCCT,,
2,MDx150891,,,,A012,CTTGTA,,
2,MDx150892,,,,A016,CCGTCC,,
self.__dict__.get(attr)
sample-sheet/sample_sheet/_sample_sheet.py
Line 259 in 87deb18
sample-sheet/sample_sheet/_sample_sheet.py
Lines 320 to 325 in 87deb18
return None if len(self.Reads) == 0 else len(self.Reads) == 1
sample-sheet/sample_sheet/_sample_sheet.py
Lines 437 to 439 in 87deb18
VALID_ASCII = {string.ascii_letters + string.digits + '-_'}
sample-sheet/sample_sheet/_sample_sheet.py
Lines 31 to 36 in 87deb18
When opening a SampleSheet with multiple samples with the same index (usual for me), where the final index for the sample is determined by the pair and not by uniqueness of each single index, we get a ValueError:
ValueError: Sample index combination for XX-XX-4Y has already been added: XX-XX-5G
It would be good to allow for shared indexes among samples.
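A check keyed on the (index, index2) pair, rather than on either index alone, might be sketched as follows. The function and the sample dictionaries are hypothetical and only illustrate the idea.

```python
def find_duplicate_index_pairs(samples):
    """Return (first_id, second_id) tuples for samples sharing an (index, index2) pair."""
    seen = {}
    duplicates = []
    for sample in samples:
        pair = (sample.get('index'), sample.get('index2'))
        if pair in seen:
            duplicates.append((seen[pair], sample['Sample_ID']))
        else:
            seen[pair] = sample['Sample_ID']
    return duplicates


samples = [
    {'Sample_ID': 'A', 'index': 'GAACT', 'index2': 'AGTTC'},
    {'Sample_ID': 'B', 'index': 'GAACT', 'index2': 'ACCCA'},  # shares only index with A
    {'Sample_ID': 'C', 'index': 'GAACT', 'index2': 'AGTTC'},  # same pair as A
]
print(find_duplicate_index_pairs(samples))
```

Here A and B are accepted even though they share an i7 index, and only C is flagged as colliding with A.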
https://conda.io/docs/build.html
Project may be suited for the bioconda channel:
This 10X index name is being rejected, SI-P03-C9. I know why. I submitted a few fixes to support 10X indices, but would be nice to turn off validation of indexes, generally speaking.
Our users copy/paste to create the samplesheet. They save the samplesheet using Excel. Somewhere along the way a space is added to the end of the index column. When we parse the samplesheet with this package, we get a validation error on the index since the space is present. Is it possible to strip the field of invalid characters before parsing?
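As a workaround outside the library, the cells can be stripped of surrounding whitespace before the sheet is handed to a parser. A minimal sketch with made-up data:

```python
import csv
import io

raw = 'Sample_ID,index\nS1,ATCACG \nS2, TTAGGC\n'

# Strip leading and trailing whitespace from every cell before any validation runs.
rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(raw))]
print(rows)
```

This removes only surrounding whitespace. Other invalid characters inside a field would still need to be reported to the user.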
I tried to install the package on a Windows machine with pip install sample_sheet following the docs. Unfortunately it resulted in an error that looks like a Linux/Windows path compatibility issue:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Foo\AppData\Local\Temp\pip-install-a02d2846\sample-sheet\setup.py", line 28, in <module>
packages=setuptools.find_packages(where='./'),
File "c:\users\tpham1.unimelb\appdata\local\programs\python\python37\lib\site-packages\setuptools\__init__.py", line 71, in find
convert_path(where),
File "c:\users\tpham1.unimelb\appdata\local\programs\python\python37\lib\distutils\util.py", line 112, in convert_path
raise ValueError("path '%s' cannot end with '/'" % pathname)
ValueError: path './' cannot end with '/'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\TPHAM1~1.UNI\AppData\Local\Temp\pip-install-a02d2846\sample-sheet\
Technically this is not specified in Illumina's sample sheet specification, but it would be handy if the parser could handle "comments", i.e. lines that start with a # character. If these lines could be stored in the SampleSheet class and then saved when writing to a file, that would be very handy.
>>> sample = Sample(dict(
...     Sample_ID='1823A',
...     Sample_Name='1823A-tissue',
...     index='ACGT'))
SampleSheet from HTTPS remote
SampleSheet and Sample
s/shallow/deep/
sample-sheet/sample_sheet/_sample_sheet.py
Line 161 in 87deb18
sample-sheet/sample_sheet/_sample_sheet.py
Line 732 in 87deb18
sample-sheet/sample_sheet/_sample_sheet.py
Lines 875 to 884 in 87deb18
sample-sheet/sample_sheet/_sample_sheet.py
Line 415 in 87deb18
"Read_Structure"
sample-sheet/sample_sheet/_sample_sheet.py
Line 184 in 87deb18
Would be useful to have the option to output the sample sheet as json.
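What a JSON rendering might contain can be sketched with the standard json module. The dictionary layout below is a hypothetical in-memory representation chosen for illustration, not a format defined by the library.

```python
import json

# Hypothetical in-memory representation of a parsed sample sheet.
sheet = {
    'Header': {'Workflow': 'GenerateFASTQ', 'Chemistry': 'Default'},
    'Reads': [151, 151],
    'Settings': {'BarcodeMismatches': '2'},
    'Data': [
        {'Sample_ID': '1823A', 'Sample_Name': '1823A-tissue', 'index': 'GAATCTGA'},
    ],
}

as_json = json.dumps(sheet, indent=2)
round_trip = json.loads(as_json)
```

Keeping Data as a list of row objects preserves sample order and allows repeated Sample_ID values.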
Hi there, I'm having some trouble parsing sample sheet files that don't use the comma-padded format. E.g.
[Header]
IEM1FileVersion,4
Investigator Name,jdoe
Experiment Name,exp001
Date,11/16/2017
Workflow,SureSelectXT
Application,NextSeq FASTQ Only
Assay,SureSelectXT
Description,A description of this flow cell
Chemistry,Default
[Reads]
151
151
[Settings]
CreateFastqForIndexReads,1
BarcodeMismatches,2
[Data]
Sample_ID,Sample_Name,index,Description,Library_ID,Read_Structure,Reference_Name,Sample_Project,Target_Set
1823A,1823A-tissue,GAATCTGA,0.5x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1823B,1823B-tissue,AGCAGGAA,0.5x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1824A,1824A-tissue,GAGCTGAA,1.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1825A,1825A-tissue,AAACATCG,10.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1826A,1826A-tissue,GAGTTAGC,100.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1826B,1823A-tissue,CGAACTTA,0.5x treatment,2017-01-17,151T8B151T,mm10,exp001,Intervals-001
1829A,1823B-tissue,GATAGACA,0.5x treatment,2017-01-17,151T8B151T,mm10,exp001,Intervals-001
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "sample_sheet/_sample_sheet.py", line 376, in __init__
self._parse(str(self.path))
File "sample_sheet/_sample_sheet.py", line 468, in _parse
header_match = self._section_header_re.match(line[0])
IndexError: list index out of range
list index out of range
It seems the lack of commas is causing csv.reader to produce an empty list for the empty lines. Moving the empty line check to the top of the for block fixes this.
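The behavior described above can be reproduced with the standard library alone. This sketch is not the library's _parse method. It only shows that csv.reader yields an empty list for a blank line and that checking for blank lines first makes line[0] safe.

```python
import csv
import io

text = '[Reads]\n151\n151\n\n[Settings]\nBarcodeMismatches,2\n,,\n'

kept = []
for line in csv.reader(io.StringIO(text)):
    # Check for blank lines first: csv.reader yields [] for an empty line,
    # and a comma-only line yields nothing but empty strings.
    if not any(cell.strip() for cell in line):
        continue
    kept.append(line[0])  # safe now that the line is known to be non-empty

print(kept)
```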
A SampleCollection is a container for samples which may have originated from multiple sample sheets / flow cells / lanes.
A SampleCollection will facilitate organizing samples by their Sample_Name or Library_ID. A few methods will help with merge strategies for identical samples that have either been topped-off (same library, sequenced on different flow cells or lanes) or re-prepared (different library, can exist on same flow cell or lane).
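The organizing step described above might be sketched with plain dictionaries as follows. The function group_samples and the Loading_ID values are hypothetical, and merging is not attempted here.

```python
from collections import defaultdict


def group_samples(samples, attr):
    """Group sample dictionaries by the value of one attribute."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[attr]].append(sample)
    return dict(groups)


samples = [
    {'Sample_Name': 'sample1', 'Library_ID': 'library1', 'Loading_ID': 'loading1'},
    {'Sample_Name': 'sample1', 'Library_ID': 'library1', 'Loading_ID': 'loading2'},
    {'Sample_Name': 'sample1', 'Library_ID': 'library2', 'Loading_ID': 'loading1'},
    {'Sample_Name': 'sample2', 'Library_ID': 'library1', 'Loading_ID': 'loading1'},
]

by_name = group_samples(samples, 'Sample_Name')
counts = {name: len(members) for name, members in by_name.items()}
print(counts)
```

Grouping each name's members again by Library_ID would give the nested sample, library, loading hierarchy.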
>>> from sample_sheet import SampleCollection
>>> collection = SampleCollection(samples)
>>> collection.visualize()
"""
collection(n=4)
├─ sample1
│  ├─ library1
│  │  ├─ loading1
│  │  └─ loading2
│  └─ library2
│     └─ loading1
└─ sample2
   └─ library1
      └─ loading1
"""
Grouping samples by loading returns a new collection. Samples that can be merged at this level will be equivalent (see L261-L265)
>>> collection = collection.group_by_loading(attr='Loading_ID')
>>> collection.visualize()
"""
collection(n=3)
├─ sample1
│  ├─ library1
│  └─ library2
└─ sample2
   └─ library1
"""
Grouping samples by library returns a final collection.
>>> collection = collection.group_by_library(attr='Library_ID')
>>> collection.visualize()
"""
collection(n=2)
├─ sample1
└─ sample2
"""
Hi,
thanks for this.
I have a lot of problems with incorrect and weird samplesheets from the lab generated with "copy-paste" and strange barcode schemes, such as mixed Truseq and Nextera.
I was starting to write a very simple validator to pick up on the worst errors, but now see you have done much more.
Are you planning to write a standalone validator or is this already possible via your library?
Thanks, Colin