buisciii-tools's Introduction

buisciii-tools's People

Contributors

alema91, daniel-vm, erikakvalem, fgomez-aldecoa, github-actions[bot], guillegorines, jaimeozaez, luissian, saramonzon, shettland, svarona

buisciii-tools's Issues

Sort path building in all modules

  • Make path building homogeneous across all modules.
  • Use the Area and Center data of the requesting user to build paths, as the archive module already does:
    import os

    def get_service_paths(conf, type, service):
        """
        Given a service, a conf and a type,
        get the path it would have in the
        archive, and outside of it
        """
        # Path in archive
        archived = os.path.join(
            conf["archive_path"],
            type,
            service["serviceUserId"]["Center"],
            service["serviceUserId"]["Area"],
        )
        # Path out of archive
        non_archived = os.path.join(
            conf["data_path"],
            type,
            service["serviceUserId"]["Center"],
            service["serviceUserId"]["Area"],
        )
        # Assumed: the quoted snippet ends here; returning both paths matches the docstring
        return archived, non_archived
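One way to make path building homogeneous would be a single helper that every module calls. The sketch below is only an illustration: `build_service_path`, the `conf` values and the `service` values are hypothetical, and only the `serviceUserId`, `Center` and `Area` keys come from the archive module snippet.

```python
import os


def build_service_path(base_path, service_type, service):
    """Build <base>/<type>/<Center>/<Area> from the requesting user's data."""
    user = service["serviceUserId"]
    return os.path.join(base_path, service_type, user["Center"], user["Area"])


# Hypothetical values, for illustration only
conf = {"archive_path": "/archive", "data_path": "/data"}
service = {"serviceUserId": {"Center": "CNM", "Area": "virology"}}

archived = build_service_path(conf["archive_path"], "services", service)
non_archived = build_service_path(conf["data_path"], "services", service)
```

With a shared helper like this, a change in the folder layout would only need to be made in one place.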

Files that should be included in clean process for rna-seq template

Aligned.out.bam and Aligned.toTranscriptome.out.bam are intermediate files, generated for each sample by the rna-seq nextflow pipeline, that are not used in later processes of the service. These files are usually large enough to be worth removing when the folders are cleaned.

I'd recommend adding them to "files" in the rnaseq entry of https://github.com/BU-ISCIII/buisciii-tools/blob/main/bu_isciii/templates/services.json:

"rnaseq": {
    "label": "",
    "template": "rnaseq",
    "url": "https://github.com/nf-core/rnaseq",
    "order": 1,
    "begin": "",
    "end": "mag_met",
    "description": "RNA-seq analysis",
    "clean": {
        "folders": [],
        "files": []
    },

Service template improvements or fixes

Things that have to be fixed in the service templates or configs:

  • MTBseq:

    • MTBseq & MTBseq assembly: fix folder to clean and update lablogs. @Daniel-VM
    • Fix 03-mtbseq lablog. Use current date instead of fixed year value to filter. @Daniel-VM
  • Assembly:

    • Fix clean directory in services.json @Daniel-VM
    • Update nextflow custom config (trimmed sequences). @Daniel-VM

Add permission fixing in finish module

Although the HPC configuration should ensure that permissions are 775 for folders and 664 for files, all of them assigned to group bi, some intermediate or copied files are somehow created with user-specific permissions in the shared folders.
This is annoying for several reasons:

  • Sometimes another user needs to modify such a file, or add a file to a folder with the wrong permissions, and we have to ask the owner to change them.
  • Sometimes the owner no longer works in the unit or is on holiday, so we need to open a ticket for Bruno to delete it.
  • The archive module fails hard when it tries to handle a file or folder with wrong permissions, forcing us to start the whole process over, which is a real pain.

The solution would be to add a process after the finish module so that, when each service analysis is finished, we fix the permissions, ensuring that all folders are 775 and all files 664, and assign bi group ownership to all files and folders. Ideally, these permission and group settings should be established in the config file so we can change them if needed without changing the code.
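A minimal sketch of such a step is shown below. `fix_permissions` and its parameters are hypothetical names; in the real tool the modes and the group would presumably be read from the config file rather than passed as defaults.

```python
import os
import shutil


def fix_permissions(root, dir_mode=0o775, file_mode=0o664, group=None):
    """Recursively set folder/file modes and, optionally, group ownership."""
    paths = [(root, True)]
    for current, dirs, files in os.walk(root):
        paths.extend((os.path.join(current, d), True) for d in dirs)
        paths.extend((os.path.join(current, f), False) for f in files)
    for path, is_dir in paths:
        if os.path.islink(path):
            continue  # skip symlinks so targets outside the service are untouched
        os.chmod(path, dir_mode if is_dir else file_mode)
        if group is not None:
            shutil.chown(path, group=group)


# Hypothetical call at the end of the finish module:
# fix_permissions("/path/to/service_folder", group="bi")
```

Only the owner of a file (or root) can change its mode, so this would have to run as the user who executed the service, which is why hooking it to the finish step seems a natural fit.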

Update the archive module

The archive module needs some work done!

The following will be needed:

  • Adding an archive function by month and not just by year (Completed 18-01-2023)
  • Including a rough size calculation before archiving (Completed 18-01-2023)
  • Adding an input file to select services to archive or retrieve
  • Adding .tar.gz conversion and md5 checksum generation
  • Table log listing the services to archive and the space needed.
  • Big log file with all the information
  • Remove deletion from the full archive option. Removal should be done separately once everything is checked.
  • Date params as command-line parameters. Date checks after the date is complete.
  • Move the selection of research or services and collaborations to the beginning. If research, make no API request; just search for the folder name in the predefined path.
  • Testing on real services!
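The .tar.gz conversion and md5 step could look roughly like the sketch below. `compress_and_checksum` is a hypothetical helper, not part of the current archive module; it only uses the standard `tarfile` and `hashlib` modules.

```python
import hashlib
import os
import tarfile


def compress_and_checksum(folder, tar_path):
    """Pack folder into a .tar.gz and return the MD5 hex digest of the tarball."""
    with tarfile.open(tar_path, "w:gz") as tar:
        # Store paths relative to the service folder name, not the full path
        tar.add(folder, arcname=os.path.basename(os.path.normpath(folder)))
    md5 = hashlib.md5()
    with open(tar_path, "rb") as handle:
        # Read in 1 MiB chunks so large archives do not need to fit in memory
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The tarball should be written outside the folder being compressed, and the digest can be recomputed after the transfer to confirm that the archived copy is intact before anything is deleted.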

Bioinfo-doc does not properly resume delivery function

If you provide delivery notes from a txt file and the process aborts before it finishes, resuming it by answering "No" to the prompt "? Do you want to overwrite delivery info?" makes the process crash right after the following prompt:


? Do you want to add some delivery notes to the e-mail? Yes
AttributeError: 'BioinfoDoc' object has no attribute 'provided_txt'

It seems that Bioinfo-doc is not able to find the file provided previously.

Adding IIER services to the template list

As far as I can see, there are some trio exome services that are making our recorded service queue look far too long.
I intend to tackle this as soon as I'm able to and, of course, to generate a proper template for all the services involved. This will take a while to start and to finish, though.

If I recall correctly, the services I should focus on are:

  • WGSTRIO
    • to do. Follow and refine 20220503_UDPsoftwareupdate_gregorios_T
  • TRIO
    • to do. Follow and refine 20220503_UDPsoftwareupdate_gregorios_T
  • RBPANEL - lowfreq_panel
    • change trimmomatic with fastp
  • EXOMAEB - exomeeb
    • update to the latest version of sarek
  • RNASeq - rnaseq @svarona
    • check if template is updated

I might be missing some, but I think it's a solid first step!

I'll try and check out previous lablogs work so I have a solid base to begin with!

Make sure every template is correctly included in iSkyLIMS

Some templates in services.json are not linked to their corresponding service in iSkyLIMS. Therefore, when a researcher requests one of these services, the default template (viralrecon) is loaded. This should be fixed.

Improvement of error message when resolution ID does not exist

When using bu-isciii tools with a resolution ID that does not exist (for example, for service ID SRVCNM1078, only SRVCNM1078.1 exists):

bu-isciii scratch --direction service_to_scratch SRVCNM1078.2

the following error message appears:

╭───────────────────────── Traceback (most recent call last) ──────────────────────────────────╮
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/bin/bu-isciii:8 in <module>                         │
│                                                                                                       │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                              │
│ ❱ 8 │   sys.exit(run_bu_isciii())                                                                     │
│   9                                                                                                   │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/bu_isciii/__main__.py:64 in run_bu_isciii │
│                                                                                                       │
│    63 │   # Lanch the click cli                                                                       │
│ ❱  64 │   bu_isciii_cli()                                                                             │
│    65                                                                                                 │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/core.py:1130 in   │
│ __call__                                                                                              │
│                                                                                                       │
│   1129 │   │   """Alias for :meth:`main`."""                                                          │
│ ❱ 1130 │   │   return self.main(*args, **kwargs)                                                      │
│   1131                                                                                                │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/core.py:1055 in   │
│ main                                                                                                  │
│                                                                                                       │
│   1054 │   │   │   │   with self.make_context(prog_name, args, **extra) as ctx:                       │
│ ❱ 1055 │   │   │   │   │   rv = self.invoke(ctx)                                                      │
│   1056 │   │   │   │   │   if not standalone_mode:                                                    │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/core.py:1657 in   │
│ invoke                                                                                                │
│                                                                                                       │
│   1656 │   │   │   │   with sub_ctx:                                                                  │
│ ❱ 1657 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))                    │
│   1658                                                                                                │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/core.py:1404 in   │
│ invoke                                                                                                │
│                                                                                                       │
│   1403 │   │   if self.callback is not None:                                                          │
│ ❱ 1404 │   │   │   return ctx.invoke(self.callback, **ctx.params)                                     │
│   1405                                                                                                │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/core.py:760 in    │
│ invoke                                                                                                │
│                                                                                                       │
│    759 │   │   │   with ctx:                                                                          │
│ ❱  760 │   │   │   │   return __callback(*args, **kwargs)                                             │
│    761                                                                                                │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/click/decorators.py:26  │
│ in new_func                                                                                           │
│                                                                                                       │
│    25 │   def new_func(*args, **kwargs):  # type: ignore                                              │
│ ❱  26 │   │   return f(get_current_context(), *args, **kwargs)                                        │
│    27                                                                                                 │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/bu_isciii/__main__.py:259 in scratch │
│                                                                                                       │
│   258 │   """                                                                                         │
│ ❱ 259 │   scratch_copy = bu_isciii.scratch.Scratch(                                                   │
│   260 │   │   resolution,                                                                             │
│                                                                                                       │
│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/bu_isciii/scratch.py:70 │
│ in __init__                                                                                           │
│                                                                                                       │
│    69 │   │   )                                                                                       │
│ ❱  70 │   │   self.service_folder = self.resolution_info["resolutions"][0][                           │
│    71 │   │   │   "resolution_full_number"                                                            │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'int' object is not subscriptable

This error should be caught and reported as: "Resolution SRVCNM1078.2 does not exist for service SRVCNM1078"
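A possible guard is sketched below. It assumes, based only on the traceback, that a failed lookup leaves an int (rather than a dict) in `resolution_info`; `get_service_folder` is a hypothetical helper, not the actual code in scratch.py.

```python
def get_service_folder(resolution_info, resolution_id):
    """Return the resolution's full number or raise a readable error."""
    # Assumption: a failed API lookup yields a non-dict value (an int in the traceback)
    if not isinstance(resolution_info, dict) or not resolution_info.get("resolutions"):
        service_id = resolution_id.split(".")[0]
        raise ValueError(
            f"Resolution {resolution_id} does not exist for service {service_id}"
        )
    return resolution_info["resolutions"][0]["resolution_full_number"]
```

The CLI could then catch the `ValueError` and print its message instead of the full traceback.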

Delivery automation

[ ] Copy in sftp
[ ] Update state in iskylims
[ ] Get information from iskylims, user, samples, dates, etc.
[ ] Generate markdown, html and pdf report

Big junk files that should be deleted in trio services

There's a file named dbNSFP_ENSG_plugin_Columns.txt that is generated during the 03-annotation process in both EXOMETRIO and WGSTRIO services.

This file is roughly 24GB in size and should be included in the lablog of each of these templates so it's deleted properly.

Better handling of scratch module execution

With the new release 2.0.0, scratch.py runs the rsync command through SLURM's srun to get better resource handling from the HTC cluster.

Although the module is fully functional, using srun leads to a small problem: the exit code returned by srun is generated as soon as the command is successfully launched rather than once it has finished, so we are not correctly checking the exit status of the command. The workaround was to generate service_info.txt inside the original folder instead of the destination folder (inside /scratch/bi/). If rsync fails for any reason during its execution, you will see the failure in your terminal, but the tool will still report that the copy was successful.

It would be very helpful to find a solution for this, either by waiting for the process to end (without using while + sleep) or by finding some way to get the correct exit status.
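One option to explore is Python's `subprocess.run`, which blocks until the child process exits and exposes its return code, so no while + sleep loop is needed. The sketch below is generic: `run_and_check` is a hypothetical helper, and the command shown in the comment is illustrative rather than the one scratch.py actually builds.

```python
import subprocess
import sys


def run_and_check(command):
    """Run command, block until it exits, and return its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        print(
            f"Command failed ({result.returncode}): {result.stderr.strip()}",
            file=sys.stderr,
        )
    return result.returncode


# Hypothetical usage; the real srun/rsync options live in scratch.py:
# exit_code = run_and_check(["srun", "rsync", "-a", source_dir, destination_dir])
```

Whether srun forwards rsync's exit code in this setup is not confirmed here and would need checking against the Slurm documentation and on the cluster itself.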

Bioinfo-doc does not offer to write delivery notes for the email through prompt after using a txt file

? Do you wish to provide a text file for delivery notes? Yes
? Write the path to the file with RAW text as delivery notes ./delivery.txt
File selected: ./delivery.txt
...
...
...
? Do you want to add some delivery notes to the e-mail? Yes
? Do you want to use notes from ./delivery.txt? No
? Do you want to send e-mail automatically? Yes
? Do you want to add any other sender? apart from [email protected]. Note: [email protected] is the default CC. No
 Mail sent correctly

As shown above, after selecting a txt file for the iSkyLIMS delivery notes, if you then choose to type the email delivery notes manually, the script ignores this option (even though it offers it).

New service throwing error when no samples involved

When there are no samples linked to the service in iSkyLIMS, the "new-service" module throws a somewhat ugly error.

[Screenshot from 2023-01-11 12-54-54]

Obvious as it seems, this could be handled more elegantly by showing a warning instead of the full error, or even a prompt asking whether to generate the directory scaffold without linking the samples. This would make the tools a little smoother to work with when the samples are provided externally (and, as a consequence, not loaded into iSkyLIMS).

Delete this issue if not on board with the idea plz!

Incorrect parsing of file sample names when long read files are present

ISSUE DESCRIPTION

When creating the samples_id.txt file I noticed that find ../RAW -name "*.fastq.gz" | cut -d "/" -f 3 | cut -d "_" -f 1 | sort -u > samples_id.txt doesn't correctly parse the sample names when long reads are present in the template's RAW/ directory. The following command might handle both long-read and short-read fastq.gz file names:

find ../RAW -name "*.fastq.gz" | rev | cut -d "/" -f 1 | rev | cut -d "_" -f 1 | cut -d "." -f 1 | sort -u > samples_id.txt
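For reference, the same parsing logic can be written in Python, which may be easier to test than the shell pipeline. `sample_ids` is a hypothetical helper and the file names below are made up for illustration.

```python
import os


def sample_ids(fastq_paths):
    """Return sorted unique sample IDs from fastq.gz paths."""
    ids = set()
    for path in fastq_paths:
        name = os.path.basename(path)  # same as rev | cut -d "/" -f 1 | rev
        ids.add(name.split("_")[0].split(".")[0])  # cut on "_" and then on "."
    return sorted(ids)


# Hypothetical short-read (paired) and long-read (single file) names
print(sample_ids([
    "../RAW/sampleA_R1.fastq.gz",
    "../RAW/sampleA_R2.fastq.gz",
    "../RAW/sampleB.fastq.gz",
]))  # prints ['sampleA', 'sampleB']
```

As with the shell version, sample names that themselves contain an underscore or a dot would still be truncated at that character.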

Fix error when resolution id does not exist

│ /data/bi/pipelines/miniconda3/envs/buisciii-tools/lib/python3.9/site-packages/bu_isciii/drylab_api.py:41 in get_request                                                                              │
│                                                                                                                                                                                                      │
│    40 │   │   │   │   │   log.info(                                                                                                                                                                  │
│ ❱  41 │   │   │   │   │   │   "Resolution id does not exist. Status code: " + req.status_code                                                                                                        │
│    42 │   │   │   │   │   )                                                                                                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "int") to str
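The TypeError comes from concatenating a str with `req.status_code`, which is an int. Formatting the value into the message avoids it; a minimal sketch follows, where `resolution_error_message` is a hypothetical helper and only the message text comes from the traceback.

```python
import logging

log = logging.getLogger(__name__)


def resolution_error_message(status_code):
    """Build the log message without concatenating str and int."""
    return f"Resolution id does not exist. Status code: {status_code}"


log.info(resolution_error_message(404))
```

Wrapping the value in `str()` would work as well; the f-string simply keeps the message on one line.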
