
sample-sheet's Introduction

sample-sheet

A permissively licensed library designed to replace Illumina's Experiment Manager.

❯ pip install sample-sheet

Or install with the Conda package manager after setting up your Bioconda channels:

❯ conda install sample-sheet

Which should be equivalent to:

❯ conda install -c bioconda -c conda-forge -c defaults sample-sheet

Features:

  • Roundtrip reading, editing, and writing of Sample Sheets
  • de novo creation of Sample Sheets
  • Exporting Sample Sheets to JSON

Read the documentation at: sample-sheet.readthedocs.io

sample-sheet's People

Contributors

accumb3ns, clintval, dsommer, golharam, nuin

sample-sheet's Issues

Experimental support for SampleCollections and Loading_ID

A SampleCollection is a container for samples which may have originated from multiple sample sheets / flow cells / lanes.

A SampleCollection will facilitate organizing samples by their Sample_Name or Library_ID. A few methods will help with merge strategies for identical samples that have either been topped-off (same library, sequenced on different flow cells or lanes) or re-prepared (different library, can exist on same flow cell or lane).

>>> from sample_sheet import SampleCollection
>>> collection = SampleCollection(samples)
>>> collection.visualize()
"""
collection(n=4)
├─ sample1
│  ├─ library1
│  │  ├─ loading1
│  │  └─ loading2
│  └─ library2
│     └─ loading1
└─ sample2
   └─ library1
      └─ loading1
"""

Grouping samples by loading returns a new collection. Samples that can be merged at this level will be equivalent (see L261-L265).

>>> collection = collection.group_by_loading(attr='Loading_ID')
>>> collection.visualize()
"""
collection(n=3)
├─ sample1
│  ├─ library1
│  └─ library2
└─ sample2
   └─ library1
"""

Grouping samples by library returns a final collection.

>>> collection = collection.group_by_library(attr='Library_ID')
>>> collection.visualize()
"""
collection(n=2)
├─ sample1
└─ sample2
"""
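
The nesting printed by visualize() can be approximated with plain dictionaries. The following is only a sketch of the idea, not the proposed SampleCollection implementation: it assumes each sample is a mapping with Sample_Name, Library_ID, and Loading_ID keys, and build_tree and count_leaves are hypothetical helpers that do not exist in sample-sheet.

```python
from collections import defaultdict


def build_tree(samples):
    """Nest samples as Sample_Name -> Library_ID -> [Loading_ID, ...]."""
    tree = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        tree[sample['Sample_Name']][sample['Library_ID']].append(
            sample['Loading_ID']
        )
    return tree


def count_leaves(tree, level):
    """Count entries at the 'sample', 'library', or 'loading' level."""
    if level == 'sample':
        return len(tree)
    if level == 'library':
        return sum(len(libraries) for libraries in tree.values())
    return sum(
        len(loadings)
        for libraries in tree.values()
        for loadings in libraries.values()
    )


# Hypothetical input mirroring the first tree shown above.
samples = [
    {'Sample_Name': 'sample1', 'Library_ID': 'library1', 'Loading_ID': 'loading1'},
    {'Sample_Name': 'sample1', 'Library_ID': 'library1', 'Loading_ID': 'loading2'},
    {'Sample_Name': 'sample1', 'Library_ID': 'library2', 'Loading_ID': 'loading1'},
    {'Sample_Name': 'sample2', 'Library_ID': 'library1', 'Loading_ID': 'loading1'},
]
tree = build_tree(samples)
```

Counting at the loading, library, and sample levels gives the n=4, n=3, and n=2 sizes shown in the three trees.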

test_to_picard_basecalling_params_output_files fails on MacOS

Running the test suite on the master branch fails on macOS.
I am running macOS Monterey 12.2.1 on an Intel Mac with Python 3.7.

pytest Error:

E           AssertionError: 'BARC[68 chars]TC\t/System/Volumes/Data/home/user/49-tissue.e[278 chars]\t\n' != 'BARC[68 chars]TC\t/home/user/49-tissue.exp001/49-tissue.GAAC[218 chars]\t\n'
E           Diff is 748 characters long. Set self.maxDiff to None to see it.

Full Error:

>tox
...cut
________________________________________ TestSampleSheet.test_to_picard_basecalling_params_output_files _________________________________________

self = <test_sample_sheet.TestSampleSheet testMethod=test_to_picard_basecalling_params_output_files>

    def test_to_picard_basecalling_params_output_files(self):
        """Test ``to_picard_basecalling_params()`` output files"""
        bam_prefix = '/home/user'
        lanes = [1, 2]
        with TemporaryDirectory() as temp_dir:
            sample1 = Sample(
                {
                    'Sample_ID': 49,
                    'Sample_Name': '49-tissue',
                    'Library_ID': 'exp001',
                    'Description': 'Lorum ipsum!',
                    'index': 'GAACT',
                    'index2': 'AGTTC',
                }
            )
            sample2 = Sample(
                {
                    'Sample_ID': 23,
                    'Sample_Name': '23-tissue',
                    'Library_ID': 'exp001',
                    'Description': 'Test description!',
                    'index': 'TGGGT',
                    'index2': 'ACCCA',
                }
            )

            sample_sheet = SampleSheet()
            sample_sheet.add_sample(sample1)
            sample_sheet.add_sample(sample2)
            sample_sheet.to_picard_basecalling_params(
                directory=temp_dir, bam_prefix=bam_prefix, lanes=lanes
            )

            prefix = Path(temp_dir)
            assert_true((prefix / 'barcode_params.1.txt').exists())
            assert_true((prefix / 'barcode_params.2.txt').exists())
            assert_true((prefix / 'library_params.1.txt').exists())
            assert_true((prefix / 'library_params.2.txt').exists())

            barcode_params = (
                'barcode_sequence_1\tbarcode_sequence_2\tbarcode_name\tlibrary_name\n'  # noqa
                'GAACT\tAGTTC\tGAACTAGTTC\texp001\n'  # noqa
                'TGGGT\tACCCA\tTGGGTACCCA\texp001\n'
            )  # noqa

            library_params = (
                'BARCODE_1\tBARCODE_2\tOUTPUT\tSAMPLE_ALIAS\tLIBRARY_NAME\tDS\n'  # noqa
                'GAACT\tAGTTC\t/home/user/49-tissue.exp001/49-tissue.GAACTAGTTC.{lane}.bam\t49-tissue\texp001\tLorum ipsum!\n'  # noqa
                'TGGGT\tACCCA\t/home/user/23-tissue.exp001/23-tissue.TGGGTACCCA.{lane}.bam\t23-tissue\texp001\tTest description!\n'  # noqa
                'N\tN\t/home/user/unmatched.{lane}.bam\tunmatched\tunmatchedunmatched\t\n'
            )  # noqa

            self.assertMultiLineEqual(
                (prefix / 'barcode_params.1.txt').read_text(), barcode_params
            )
            self.assertMultiLineEqual(
                (prefix / 'barcode_params.2.txt').read_text(), barcode_params
            )
            self.assertMultiLineEqual(
                (prefix / 'library_params.1.txt').read_text(),
>               library_params.format(lane=1),
            )
E           AssertionError: 'BARC[68 chars]TC\t/System/Volumes/Data/home/user/49-tissue.e[278 chars]\t\n' != 'BARC[68 chars]TC\t/home/user/49-tissue.exp001/49-tissue.GAAC[218 chars]\t\n'
E           Diff is 748 characters long. Set self.maxDiff to None to see it.

tests/test_sample_sheet.py:535: AssertionError

Index validation on indexes with spaces at the end

Our users copy/paste to create the samplesheet. They save the samplesheet using Excel. Somewhere along the way a space is added to the end of the index column. When we parse the samplesheet with this package, we get a validation error on the index since the space is present. Is it possible to strip the field of invalid characters before parsing?
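
Until the library handles this, one possible workaround is to strip every field before it reaches validation. The sketch below uses only the standard csv module; read_stripped_rows is a hypothetical helper, and it assumes that surrounding whitespace is never meaningful in any field.

```python
import csv
import io


def read_stripped_rows(handle):
    """Yield CSV rows with surrounding whitespace removed from every field."""
    for row in csv.reader(handle):
        yield [field.strip() for field in row]


# Hypothetical rows with the stray spaces that Excel round-trips can leave.
raw = 'Sample_ID,index\nS1,GAACT \nS2, TGGGT\n'
rows = list(read_stripped_rows(io.StringIO(raw)))
```

The cleaned rows could then be written to a temporary file and handed to the parser.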

Request - handling nonstandard tables

We have a sample sheet where we've added a custom section which is a headered table (along the lines of the [Data] section). It would be handy to be able to parse such a section using the sample-sheet library. (Currently, the library assumes that the custom section must be a dictionary of key-value pairs.)

A made-up example of such a section follows:
[Animals],,
Name,Species,Status
Benji,Dog,Good
Sparky,Cat,Aloof
Sparky,Bird,Sleeping
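
For illustration, a table-style section like this one can be read with the standard library's csv.DictReader. This is a sketch of the requested behaviour rather than anything sample-sheet provides: parse_table_section is a hypothetical helper, and it assumes the text passed in contains exactly one section.

```python
import csv
import io

section = (
    '[Animals],,\n'
    'Name,Species,Status\n'
    'Benji,Dog,Good\n'
    'Sparky,Cat,Aloof\n'
    'Sparky,Bird,Sleeping\n'
)


def parse_table_section(text):
    """Return (section_name, rows) for one headered, table-style section."""
    lines = io.StringIO(text)
    # The first line holds the bracketed name, possibly padded with commas.
    name = next(lines).strip().strip(',').strip('[]')
    # The next line is the column header; the rest are records.
    rows = list(csv.DictReader(lines))
    return name, rows


name, rows = parse_table_section(section)
```

Each record becomes a mapping keyed by column name, so repeated values such as the two Sparky rows are kept as separate entries.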

Support for empty lines

Hi, thanks for putting together this great library. It seems that it does not currently support empty lines, which the spec allows:

Empty lines or lines that consist entirely of commas and/or whitespace characters are valid, but ignored.

Traceback (most recent call last):
  File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/snap/pycharm-community/62/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-community/62/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 210, in <module>
    main()
  File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 205, in main
    'summary': summary,
  File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "/home/cborroto/projects/samplesheet/scripts/wdlhelper.py", line 39, in launch
    sample_sheet = SampleSheet(sample_sheet)
  File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/sample_sheet/_sample_sheet.py", line 376, in __init__
    self._parse(str(self.path))
  File "/home/cborroto/.pyenv/versions/miniconda3-4.3.30/envs/samplesheet/lib/python3.6/site-packages/sample_sheet/_sample_sheet.py", line 468, in _parse
    header_match = self._section_header_re.match(line[0])
IndexError: list index out of range

Order of columns in Data section

Hi,

I am using your fantastic library to manipulate existing sample sheets and I have a question/request:

Is it possible to retain/control the order of the columns in the Data section?

I know the order should not really matter, but we try to stick to internal conventions for readability. Besides, the reordering makes it a bit more difficult to compare the original and modified versions of the sample sheets we process.

Perhaps using an OrderedDict instead of just a dict could be a possibility...?
https://docs.python.org/3/library/collections.html#collections.OrderedDict

Thanks!
Florian
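
As a sketch of what order-preserving round-tripping could look like outside the library, the standard csv module keeps the header order if the field names read from the file are reused when writing. The column names and values below are made up for illustration.

```python
import csv
import io

# Hypothetical [Data] table with a non-alphabetical column order.
original = 'Sample_ID,index,Sample_Name\nS1,GAACT,first\n'

reader = csv.DictReader(io.StringIO(original))
fieldnames = reader.fieldnames  # column order exactly as read
rows = list(reader)

# Edit a value without touching the column layout.
for row in rows:
    row['Sample_Name'] = row['Sample_Name'].upper()

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
```

Since Python 3.7 a plain dict also preserves insertion order, so an OrderedDict is only strictly needed on older interpreters.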

Installation issues

I'm experiencing two separate issues when trying to install this package:

Install with conda

The README says to do the following:

conda install -c bioconda sample-sheet

However this results in a missing dependency:

Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - sample-sheet -> terminaltables

Current channels:
<truncated>

This can be fixed by explicitly including -c conda-forge like so:

conda install -c bioconda -c conda-forge sample-sheet

Install with pip on bitbucket pipelines

Using Bitbucket pipelines to test a package, I get the following output when listing sample-sheet as a dependency in my setup.py and installing my package:

  Downloading https://files.pythonhosted.org/packages/e1/4e/c4af36fcf9f7d3364723a49ec802637e6dbe73725bd2be97e5c7647a0669/sample-sheet-0.9.0.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-py7rsp3d/sample-sheet/setup.py", line 20, in <module>
        long_description=Path('README.md').read_text(),
      File "/opt/miniconda3/envs/test/lib/python3.6/pathlib.py", line 1197, in read_text
        return f.read()
      File "/opt/miniconda3/envs/test/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1207: ordinal not in range(128)
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-py7rsp3d/sample-sheet/

Note that I do not get this error in a local development environment, so it could be specific to Bitbucket pipelines.
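
The traceback points at Path('README.md').read_text() decoding with the ASCII codec, which suggests the pipeline image runs with an ASCII default locale. A likely fix, sketched below under that assumption, is to pass the encoding explicitly in setup.py. The temporary file stands in for the real README.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    readme = Path(temp_dir) / 'README.md'
    # U+276F encodes to UTF-8 as 0xE2 0x9D 0xAF, so it starts with the
    # same 0xE2 byte that the traceback reports.
    readme.write_text('\u276f pip install sample-sheet\n', encoding='utf-8')

    # Decoding as ASCII reproduces the failure regardless of locale.
    try:
        readme.read_text(encoding='ascii')
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Passing the encoding explicitly does not depend on the environment.
    long_description = readme.read_text(encoding='utf-8')
```

This would also explain why the error does not appear locally, where the default encoding is usually UTF-8.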

Intended behavior of empty samplesheet with missing sections

One of our lab operators provided an empty samplesheet intending to fill it out later, but never did. As a result, this run made it into our pipeline where we use this package to read the samplesheet. The samplesheet provided is:

[Data],,,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description

No other sections, and no samples. When I loaded this samplesheet using

from sample_sheet import SampleSheet as IlluminaSampleSheet
ss = IlluminaSampleSheet(sample_sheet_path)

no error was thrown. Is that the intended behavior?

Duplicated Sample_ID

Hi Clint,

we had an issue before (#32) where the same Sample_ID caused issues, even if the lanes were different.

Now we are having the same issue, but without specifying lanes. This time it's 10X data, which you mentioned briefly in the previous issue.

Basically we would like to merge across lanes and indexes and the corresponding sample sheets end up having the same Sample_ID across multiple Data lines.

An example:

Sample_ID,Sample_Name,Sample_Plate,Sample_Well,Index_Plate_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
PRJ180538_VPH20T,,,,,SI-GA-G6_1,CTGACGCG,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_2,GGTCGTAC,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_3,TCCTTCTT,,,,
PRJ180538_VPH20T,,,,,SI-GA-G6_4,AAAGAAGA,,,,

Here we merge across the four indexes and all lanes. However, your library does not currently allow this.

Could you allow duplicated Sample_IDs or recommend an alternative approach?

Thanks!
Florian

How to update the sample sheet object?

Hi

I am trying to write simple logic to reverse complement the i5 index within the sample_sheet object using a custom reverse_complement function.

Can you let me know how I can update the object to accommodate the reverse-complemented i5 index and then write an updated CSV sample sheet file?

for sample in sample_sheet:

	index_i5 = sample.index2

	index_i5_rc = reverse_complement(index_i5)
	
	sample.index2 = index_i5_rc
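
For reference, the custom function mentioned above could be written as follows. This is a sketch: reverse_complement is the asker's own helper name, and plain dictionaries stand in for the library's sample objects, so it does not show how sample-sheet itself stores or writes the updated value.

```python
# Translation table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Plain dictionaries stand in for the library's sample objects here.
samples = [{'Sample_ID': '49', 'index': 'GAACT', 'index2': 'AGTTC'}]
for sample in samples:
    sample['index2'] = reverse_complement(sample['index2'])
```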

Review and update `add_sample` docstring

Clearly describe the add_sample() method along with all validation checks performed.

def add_sample(self, sample):
    """Validate and add a ``Sample`` to this ``SampleSheet``.

    All samples are checked against the first sample added to ensure they
    all have the same ``read_structure`` attribute, if supplied. The
    ``SampleSheet`` will inherit the same ``read_structure`` attribute.

    Samples cannot be added if any of the following criteria are met:

    - ``Sample_ID`` and ``Sample_Library`` combination exists
    - ``index`` and/or ``index2`` combination exists
    - Samplesheet.reads and Sample.Read_Structure are incompatible
    - Sample does not have ``index`` defined but others do
    - Sample does not have ``index2`` defined but others do
    - If defined, sample ``read_structure`` is different than others

    Parameters
    ----------
    sample : Sample
        Sample to add to this sample sheet.
    """

Error parsing samplesheet

I have a sample sheet that looks like this:

[Header]
IEMFileVersion,4
Date,11/16/2015
Workflow,GenerateFASTQ
Application,RNA-Seq
Assay,TruSeq LT
Description
Chemistry,Default

[Reads]
75
75

[Settings]
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,GenomeFolder,Sample_Project,Description
MiSeq_L20151106_A1,MiSeq_L20151106_A1,,,AR001,ATCACG,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC
MiSeq_L20151106_B1,MiSeq_L20151106_B1,,,AR003,TTAGGC,Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFasta,QC,GeneexperessionQC

When reading this file using:

ss = IlluminaSampleSheet('SampleSheet.csv')

results in

$ python test.py 
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    ss = IlluminaSampleSheet(sample_sheet_path)
  File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 419, in __init__
    self._parse(self.path)
  File "/Users/golharr/workspace/.venv/lib/python3.6/site-packages/sample_sheet/__init__.py", line 537, in _parse
    key, value, *_ = line
ValueError: not enough values to unpack (expected at least 2, got 1)

Looks like the trailing *_ is causing the problem on line 537. If you remove that, the key and value get read correctly. The *_ is not used, so there is no need to include it here.

After making the change to line 537, the same problem arises for the Description line: since there is no comma, only a key is provided, with no corresponding value. I think a better check would be to execute these lines only if len(line) >= 2. I'll submit a PR that works for me.
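
An alternative to the length check, shown here only as a sketch, is to unpack the key on its own and default the value to an empty string. parse_key_value is a hypothetical helper, not the library's parser, and the rows are what csv.reader would produce for the [Header] lines above.

```python
def parse_key_value(line):
    """Split one parsed CSV row into (key, value), tolerating a missing value."""
    key, *rest = line
    value = rest[0] if rest else ''
    return key, value


# 'Description' has no comma, so csv.reader yields a one-element row for it.
rows = [['IEMFileVersion', '4'], ['Description'], ['Chemistry', 'Default', '']]
header = dict(parse_key_value(row) for row in rows if row)
```

Unlike skipping short rows, this keeps the Description key in the header with an empty value.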

Multiple samples with same index

When opening a SampleSheet with multiple samples with the same index (usual for me), where the final index for the sample is determined by the pair and not by uniqueness of each single index, we get a ValueError:

ValueError: Sample index combination for XX-XX-4Y has already been added: XX-XX-5G

It would be good to allow for shared indexes among samples.
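
The check being asked for compares the (index, index2) pair rather than each index on its own. A minimal sketch of that rule, with a hypothetical find_index_collisions helper and dictionaries standing in for samples:

```python
def find_index_collisions(samples):
    """Return pairs of Sample_IDs that share the same (index, index2) pair."""
    seen = {}
    collisions = []
    for sample in samples:
        key = (sample.get('index'), sample.get('index2'))
        if key in seen:
            collisions.append((seen[key], sample['Sample_ID']))
        else:
            seen[key] = sample['Sample_ID']
    return collisions


samples = [
    {'Sample_ID': 'A', 'index': 'GAACT', 'index2': 'AGTTC'},
    {'Sample_ID': 'B', 'index': 'GAACT', 'index2': 'ACCCA'},  # shares index only
    {'Sample_ID': 'C', 'index': 'GAACT', 'index2': 'AGTTC'},  # true collision
]
collisions = find_index_collisions(samples)
```

Here samples A and B share an i7 index but are still distinguishable, so only A and C are reported.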

Error when attempting to parse non-comma-padded sample sheet files

Hi there, I'm having some trouble parsing sample sheet files that don't use the comma-padded format. E.g.

[Header]
IEM1FileVersion,4
Investigator Name,jdoe
Experiment Name,exp001
Date,11/16/2017
Workflow,SureSelectXT
Application,NextSeq FASTQ Only
Assay,SureSelectXT
Description,A description of this flow cell
Chemistry,Default

[Reads]
151
151

[Settings]
CreateFastqForIndexReads,1
BarcodeMismatches,2

[Data]
Sample_ID,Sample_Name,index,Description,Library_ID,Read_Structure,Reference_Name,Sample_Project,Target_Set
1823A,1823A-tissue,GAATCTGA,0.5x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1823B,1823B-tissue,AGCAGGAA,0.5x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1824A,1824A-tissue,GAGCTGAA,1.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1825A,1825A-tissue,AAACATCG,10.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1826A,1826A-tissue,GAGTTAGC,100.0x treatment,2017-01-20,151T8B151T,mm10,exp001,Intervals-001
1826B,1823A-tissue,CGAACTTA,0.5x treatment,2017-01-17,151T8B151T,mm10,exp001,Intervals-001
1829A,1823B-tissue,GATAGACA,0.5x treatment,2017-01-17,151T8B151T,mm10,exp001,Intervals-001
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sample_sheet/_sample_sheet.py", line 376, in __init__
    self._parse(str(self.path))
  File "sample_sheet/_sample_sheet.py", line 468, in _parse
    header_match = self._section_header_re.match(line[0])
IndexError: list index out of range
list index out of range

It seems the lack of commas is causing csv.reader to produce an empty list for the empty lines. Moving the empty line check to the top of the for block fixes this.
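
A sketch of that reordering, using only the standard csv module: the blank check runs before anything indexes into the row. is_blank and iter_rows are hypothetical names, and the rule follows the specification's wording that empty lines and lines made only of commas and/or whitespace are ignored.

```python
import csv
import io


def is_blank(row):
    """True for empty rows and rows made only of commas and/or whitespace."""
    return all(not field.strip() for field in row)


def iter_rows(handle):
    """Yield meaningful rows, checking for blanks before touching row[0]."""
    for row in csv.reader(handle):
        if is_blank(row):
            continue
        yield row


# csv.reader returns [] for the truly empty line, which is what made
# line[0] raise IndexError in the traceback above.
text = '[Reads]\n151\n\n,,,\n  \n[Settings]\nBarcodeMismatches,2\n'
rows = list(iter_rows(io.StringIO(text)))
```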

package fails installing on Windows

I tried to install the package on a Windows machine with pip install sample_sheet following the docs. Unfortunately it resulted in an error that looks like a Linux/Windows path compatibility issue:

Traceback (most recent call last):
     File "<string>", line 1, in <module>
     File "C:\Users\Foo\AppData\Local\Temp\pip-install-a02d2846\sample-sheet\setup.py", line 28, in <module>
       packages=setuptools.find_packages(where='./'),
     File "c:\users\tpham1.unimelb\appdata\local\programs\python\python37\lib\site-packages\setuptools\__init__.py", line 71, in find
       convert_path(where),
     File "c:\users\tpham1.unimelb\appdata\local\programs\python\python37\lib\distutils\util.py", line 112, in convert_path
       raise ValueError("path '%s' cannot end with '/'" % pathname)
   ValueError: path './' cannot end with '/'

   ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\TPHAM1~1.UNI\AppData\Local\Temp\pip-install-a02d2846\sample-sheet\

SyntaxError: invalid syntax

When running

import os
import sys
import csv
from sample_sheet import SampleSheet

url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/{}'
sample_sheet = SampleSheet(url.format('paired-end-single-index.csv'))

print sample_sheet

I get this

>> python3 SampleSheet.csv
  File "SampleSheet.csv", line 1
    [Header],,,,,,,,,,
             ^
SyntaxError: invalid syntax

both in the example SampleSheet and my own.

Running code installed with pip3, not the one on the repository.

smart_open Version 1.8.1 Incompatibilities

The smart_open.smart_open function has been deprecated in version 1.8.1 of the smart_open library in favor of smart_open.open.

This throws a warning in the console whenever a sample sheet is opened.
piskvorky/smart_open#268

Version 1.8.1 also added SSH/SCP/SFTP support which throws a warning if paramiko is not installed.
piskvorky/smart_open#267

This could be solved by adding a paramiko requirement and changing smart_open calls or setting the version of smart_open to 1.8.0 in the setup file.

Output as json

Would be useful to have the option to output the sample sheet as json.

Support comments

Technically this is not specified in Illumina's sample sheet specification, but it would be handy if the parser could handle "comments" i.e. lines that start with a # character. If these lines could be stored in the SampleSheet class and then saved when writing to a file that would be very handy.
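
One way this could work, sketched here with hypothetical split_comments and restore_comments helpers, is to set the comment lines aside with their line numbers before parsing and put them back when writing. The sketch assumes the non-comment content keeps the same number of lines between reading and writing.

```python
def split_comments(lines):
    """Separate '#' comment lines from content, remembering their positions."""
    comments, content = [], []
    for number, line in enumerate(lines):
        if line.lstrip().startswith('#'):
            comments.append((number, line))
        else:
            content.append(line)
    return comments, content


def restore_comments(comments, content):
    """Re-insert comments at their original line numbers."""
    lines = list(content)
    for number, line in comments:  # ascending order keeps offsets valid
        lines.insert(number, line)
    return lines


lines = ['# run 42', '[Header]', 'Date,11/16/2017', '# check date', '[Reads]']
comments, content = split_comments(lines)
```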

Unit test for IPython Interpreters

Try this:

from unittest import TestCase

from sample_sheet import _sample_sheet
from sample_sheet._sample_sheet import is_ipython_interpreter


class TestIsIpythonInterpreter(TestCase):
    """Unit tests for ``is_ipython_interpreter()``"""

    def test_is_ipython_interpreter(self):
        """Test if this test framework is run in an IPython interpreter."""
        self.assertFalse(is_ipython_interpreter())

        # Simulate IPython by defining ``__IPYTHON__`` on the module.
        _sample_sheet.__IPYTHON__ = None

        self.assertTrue(is_ipython_interpreter())

Same sample ID on multiple lanes causes error

Hi, I have a sample sheet that has the same sample ID but on multiple lanes, which in our case can happen quite frequently. This case is currently not supported. Could this be added?

Example [Data] section:

[Data],,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
1,WES013BL,,,,A010,TAGCTT,,
1,WES013FR,,,,A027,ATTCCT,,
1,MDx150891,,,,A012,CTTGTA,,
1,MDx150892,,,,A016,CCGTCC,,
2,WES013BL,,,,A010,TAGCTT,,
2,WES013FR,,,,A027,ATTCCT,,
2,MDx150891,,,,A012,CTTGTA,,
2,MDx150892,,,,A016,CCGTCC,,

CellRanger indexes are not recognized

Is it possible to bypass index validation?

    sample_sheet = SampleSheet(args.samplesheet)
  File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 418, in __init__
    self._parse(self.path)
  File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 524, in _parse
    self.add_sample(Sample(dict(zip(sample_header, line))))
  File "/home/ec2-user/cellranger-docker/env/lib/python3.6/site-packages/sample_sheet/__init__.py", line 296, in __init__
    raise ValueError(f'Not a valid index: {value}')
ValueError: Not a valid index: SI-GA-F8

Samples to have `sample_sheet` reference when instantiated through parsing

Something like:

>>> sample_sheet = SampleSheet('SampleSheet.csv')
>>> first_sample = sample_sheet.samples[0]
>>> first_sample.sample_sheet
SampleSheet("SampleSheet.csv")

But not on a Sample instantiated plainly:

>>> sample = Sample(dict(Sample_Name='test', Sample_ID='XXX', index='ACGT'))
>>> sample.sample_sheet
None

Would help with tracking the source of samples when reading in more than one sample sheet into an environment and then processing on the samples in other data structures like dictionaries.

Index vs index

Recently downloaded some example data from Illumina and I noticed the sample sheet had "Index" in it, not "index". Unfortunately wasn't able to read the sample sheet because of this. FYI, the data header line looked like:

Sample_ID,Sample_Name,BASESPACE_SAMPLE_RESOURCE_ID,GenomeFolder,I7_Index_ID,I5_Index_ID,Index,Index2,Sample_Well,Manifest,FastqFolder
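
A possible workaround is to normalize the header before parsing. The sketch below lowercases only the two index columns and leaves every other name untouched. CANONICAL and normalize_header are hypothetical names, and the header is a shortened version of the one above.

```python
# Columns whose spelling should be forced to lowercase.
CANONICAL = {'index': 'index', 'index2': 'index2'}


def normalize_header(columns):
    """Map 'Index'/'INDEX2' style names to the lowercase spelling."""
    return [CANONICAL.get(column.lower(), column) for column in columns]


header = 'Sample_ID,Sample_Name,I7_Index_ID,I5_Index_ID,Index,Index2,Sample_Well'
columns = normalize_header(header.split(','))
```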

Support Python2

I finally have a need. And it's a sad sad but very important need.

Re-write a few un-Pythonic lines

Error installing with pip

Getting a syntax error when installing on a fresh virtualenv system

(pipeline_run3) data_management> python
Python 3.4.5 (default, Sep 08 2016, 13:41:53) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
(pipeline_run3) pipeline@SequenceKing:~/cancerplus/code/data_management> pip3 install sample_sheet
Collecting sample_sheet
  Using cached https://files.pythonhosted.org/packages/fc/75/69cab3b91ea745a909bedc53f30789414eedc11ce2ebe6189733560a9583/sample_sheet-0.6.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-5eqifgel/sample-sheet/setup.py", line 12
        URL = f'https://github.com/clintval/{PACKAGE_NAME}'
                                                          ^
    SyntaxError: invalid syntax

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-5eqifgel/sample-sheet/

Confusing error on file with no sections

If the file passed to the SampleSheet constructor contains data but no sections, the error produced is:

'SampleSheet' object has no attribute ''

It should output an error about an invalidly formatted sample sheet, such as "missing sections".

index validation

This 10X index name is being rejected, SI-P03-C9. I know why. I submitted a few fixes to support 10X indices, but would be nice to turn off validation of indexes, generally speaking.
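
A switch of the kind requested might look like the sketch below. check_index and its validate flag are hypothetical and not part of sample-sheet. The pattern is an assumed definition of a sequence index, and the error message copies the wording seen in the CellRanger traceback earlier on this page.

```python
import re

# Assumed notion of a "sequence" index: one or more of A, C, G, T, N.
_SEQUENCE_RE = re.compile(r'^[ACGTN]+$')


def check_index(value, validate=True):
    """Return value unchanged; raise ValueError only when validation is on."""
    if validate and not _SEQUENCE_RE.match(value):
        raise ValueError(f'Not a valid index: {value}')
    return value


named_index = check_index('SI-P03-C9', validate=False)  # accepted as-is

try:
    check_index('SI-P03-C9')
    rejected = False
except ValueError:
    rejected = True
```

Keeping validation on by default would preserve the current behaviour for everyone who does not opt out.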

__iter__() should be a generator

def __iter__(self):
    for sample in self.samples:
        yield sample

This will negate the need for __next__() and for binding a temporary _iter = [] variable
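
A small stand-alone illustration of why the generator form is enough. SampleList is a hypothetical stand-in for the real class. Because every call to __iter__() returns a fresh generator, repeated and nested loops do not interfere with each other.

```python
class SampleList:
    """Hypothetical stand-in for a container that holds samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __iter__(self):
        # Each call returns a fresh generator, so no shared cursor is needed.
        yield from self.samples


sheet = SampleList(['sample1', 'sample2'])
first_pass = list(sheet)
second_pass = list(sheet)
nested = [(a, b) for a in sheet for b in sheet]
```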

Cleaner and more explicit documentation

  • Inline short assignments
>>> sample = Sample(dict(
...     Sample_ID='1823A',
...     Sample_Name='1823A-tissue',
...     index='ACGT'))

SampleSheet.write()?

Hi there. Cool lib!

Do you have any intention of adding the ability to use this to create (and write) new sample sheets? Seems like most of the mechanics are already in place; it probably just needs a function to write back to a string.

relationship with PEP

Hey just came across this repo -- I'm not sure if you're still developing this, but just wanted to alert you to our work on PEP, and more specifically, peppy. I thought you might find it interesting and may trigger some possibility for collaborating.

validator

Hi,

thanks for this.

I have a lot of problems with incorrect and weird samplesheets from the lab generated with "copy-paste" and strange barcode schemes, such as mixed Truseq and Nextera.

I was starting to write a very simple validator to pick up on the worst errors, but now see you have done much more.

Are you planning to write a standalone validator or is this already possible via your library?

Thanks, Colin
