
lncrna_bc's Introduction

lncRNA_BC

ABSTRACT

This repository contains information about my master's project, whose main topic is lincRNAs as biomarkers in breast cancer. The main objective is to identify lincRNA biomarkers by transcriptome analysis. For more information about breast cancer and the use of lincRNAs as biomarkers, please check the general Repo_Analysis.

INTRODUCTION

This repository is divided into 2 kinds of analysis:

  1. Transcriptome Analysis
  2. Exome Analysis

Transcriptome Analysis identifies lincRNAs that are over- or underexpressed in breast cancer patients who did not respond to neoadjuvant chemotherapy. The main objective of this analysis is to identify the association of these lincRNAs with coding genes and how they are involved in chemoresistance mechanisms.

The exome analysis, on the other hand, aims to identify the possible genetic cause of the altered coding gene expression profiles in chemoresistant breast cancer patients, and whether it is related to clinical variables such as molecular subtype or hormone receptor expression. With this information, we will be able to associate the genetic profile of coding genes with prognosis and prediction, in order to benefit patient management.

The final aim of the whole project is to predict functional associations between lincRNAs and coding genes through bioinformatic analysis and then validate them, in order to identify possible therapeutic targets and biomarker panels for prediction and prognosis in breast cancer patients.

INDEX

  1. Transcriptome analysis ..\lncRNA_BC\Transcriptome: It contains folders with RNA-Seq data (paired-end), scripts, metadata and graphical results of the bioinformatic analysis. It contains a README too.
  • Scripts and pipeline ..\lncRNA_BC\Transcriptome\bin: It contains .R files with functions and scripts for the biomarker identification pipeline, that are organized in chronological order of use. The complete pipeline is specified in README.

  • Data ..\lncRNA_BC\Transcriptome\data: It contains all RNA-Seq raw data (fastq files). You can also find data in my OSF profile (User: Laura Contreras. Public Project: lincRNA as predictive biomarkers in Breast Cancer, Transcriptome data Folder)

    • Results ..\lncRNA_BC\Transcriptome\data\results: It contains all the processed files, and is subdivided into:
      • Quality ..\lncRNA_BC\Transcriptome\data\results\quality: It contains all the FastQC quality reports in .pdf format.

      • Tables ..\lncRNA_BC\Transcriptome\data\results\tables: It contains the output of transcripts counts in .csv.

  • Meta ..\lncRNA_BC\Transcriptome\meta: It contains the clinical information about samples in .csv format.

  • Graphs ..\lncRNA_BC\Transcriptome\Graphs: It contains .jpeg graphical results.

  2. Exome analysis ..\lncRNA_BC\Exome: It contains folders with exome data, scripts, metadata and graphical results of the bioinformatic analysis. It contains a README too.
  • Scripts and pipeline ..\lncRNA_BC\Exome\bin: It contains .R files with functions and scripts for the biomarker identification pipeline, organized in numerical order of use. The complete pipeline is specified in README.md.

  • Data ..\lncRNA_BC\Exome\data: It contains all exome raw data (fastq files). You can also find data in my OSF profile (User: Laura Contreras. Public Project: lincRNA as predictive biomarkers in Breast Cancer, Exome Folder)

    • Results ..\lncRNA_BC\Exome\data\result: It contains all the processed files, and is subdivided into:

      • Quality ..\lncRNA_BC\Exome\data\result\quality: It contains all the FastQC quality reports in .pdf format.
  • Meta ..\lncRNA_BC\Exome\meta: It contains the clinical information about samples in .csv format.

APPENDIX 1

Selected topics of Bioinformatics (STB)

..\lncRNA_BC\STB

This folder contains activities from our course, which are:

  • Analysis and discussion of repository
  • Master Project Slides: Identification_of_lincRNA_as_predictive_biomarkers_in_breast_cancer
  • An image for the script discussed in the issue: Volcano plot problem.
  • Methodological seminar: STAR ALIGNER.
    • Images for seminar document:

lncrna_bc's People

Contributors

lauramce, camillethuyentruong

Watchers

Alicia Mastretta-Yanes, Rodolfo Ángeles Argáiz

lncrna_bc's Issues

Code color in graphs

Help with my advisor's issue

Hi! My advisor asked me to do a very difficult task with my paper's figures. Here is literally what he said:

There are some problems with the color codes in the volcano plot and the PCA plot. First of all, the volcano plot does not distinguish under- and overexpressed genes by color: both have the same color. Second, there is no difference in color between significant and non-significant differentially expressed genes (all are black dots). Please check that. Also, in other graphs we have followed a color code for resistant and sensitive patients (cyan blue and red, respectively). In your graph they are inverted. Please correct. Thanks!

So, I hope that you can help me with the 2 issues.

1.- Volcano Plot

This is the code I'm using for Volcano plot.

##Indicate color code##

cols[BCresultsNR$log2FoldChange < -1.5] <- "#0066FF"
cols[BCresultsNR$log2FoldChange > 1.5] <- "#0033CC"
cols[BCresultsNR$pvalue == 0] <- "black"
cols[BCresultsNR$sig < -log10(alpha) ] <- "#000033"
cols[BCresultsNR$pvalue > 0.05] <- "#CCCCCC"


My problem is that I couldn't get a color code with 6 colors to work. Please help!
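One possible way to approach the request is sketched below in base R. It is only an illustration under stated assumptions: the toy table `res` stands in for `BCresultsNR` (its values are made up), the helper `volcano_colors` is hypothetical, and the specific colors are arbitrary choices rather than the advisor's palette. The idea is to start every point at a default grey and then overwrite only the points that pass both the p-value and fold-change cutoffs, so under- and overexpressed genes end up with distinct colors.

```r
# Toy results table standing in for BCresultsNR (made-up values for illustration).
res <- data.frame(
  log2FoldChange = c(-2.3, 2.1, 0.4, -1.8, 3.0, 0.5),
  pvalue         = c(0.001, 0.0005, 0.6, 0.2, 0.04, 0.01)
)

# Hypothetical helper: one color per gene.
# The 1.5 fold-change cutoff and the 0.05 p-value cutoff are taken from the code above.
volcano_colors <- function(lfc, pval, alpha = 0.05, lfc_cut = 1.5) {
  cols <- rep("#CCCCCC", length(lfc))            # default: not significant (grey)
  sig  <- !is.na(pval) & pval < alpha
  cols[which(sig & abs(lfc) <= lfc_cut)] <- "black"    # significant, small change
  cols[which(sig & lfc < -lfc_cut)]      <- "#0066FF"  # significant, underexpressed
  cols[which(sig & lfc >  lfc_cut)]      <- "#CC0000"  # significant, overexpressed
  cols
}

cols <- volcano_colors(res$log2FoldChange, res$pvalue)

# Draw the volcano plot with the color vector.
plot(res$log2FoldChange, -log10(res$pvalue),
     col = cols, pch = 19,
     xlab = "log2 fold change", ylab = "-log10(p-value)")
```

Because every element of `cols` is initialized before any condition is applied, no gene is left without a color, and the order of the assignments makes it explicit which rule wins when conditions overlap.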

2.- PCA plot (SOLVED!!)

I had problems with the PCA plot too, but I solved them. Here is my first script, which does not do what I want, and here is my solution. Thanks!!

Preliminary evaluation

Hi Laura,

In general your repository structure is fine, and your general README.md is well documented and explicit. But is it necessary to use these 2 big sections (Exome and Transcriptome)? (It is your decision.)

Your numbered scripts are well commented, but what about Choc_GSEA?
In your /Transcriptome/bin/ script 2, at lines 25 and 34, you are using absolute paths. I suggest using relative paths to avoid problems if the user does not place /lncRNA_BC/ at ~/

You have no data or scripts in the /Exome/ directories. Remember that you must put your data in OSF or provide toy data (just like in the transcriptome section). Also, it looks like the .csv files in both meta directories are the same; is this correct?
Your Exome/README.md is empty.
I understand this repo is in development, but this half is almost empty.

Remember the evaluation criteria: at this moment you do not have any R plot and there is no discussion file.

Final repo evaluation

Hi Laura,

I enjoyed reviewing your repo, which is clear and well organized. You have been particularly careful in addressing the issues that I mentioned in the last evaluation. I greatly appreciate it.

Some minor comments:
Be careful with the names of folders between your readme and the repo (i.e. "transcriptome analysis" or "transcriptome"). It can be confusing if they are not the same.

The discussion of results (Repo_Abstract_and_Discussion.md) needs to be mentioned in the general README, at least with a link (I almost missed it!). It would be good to link your results and graphics more closely to this file, because right now it is a little bit isolated from the rest of your repo. The idea of such a file is to summarize what you have done!

The GRAPHS.txt files are superfluous and can now be discarded. It would also be good to link the graphics to the code and data they were generated with, for example in the README of each analysis. That way a reader interested in a figure can readily find the code that generates it.

Don't hesitate to get back to us for any feedback. I hope you enjoyed the course!

Improve tasks

It is necessary for your own good

  • Subdivide your cards

  • Write Scripts

Using different scales in barplot simultaneously

Hi! I'm here again with problems to construct graphs.

I have very different data of the expression of one particular lincRNA between breast cancer cell lines MCF-10A, MCF-7, BT474 and MDA-MB-231, as I show in my data Promlc here

I want to construct a barplot with these data, and I want it to look like this

image

But... it looks like this...

image

This is the R code that I've been following:

Pre-requisites:

library(plotrix)

plotrix::gap.barplot(Promlc, gap = c(150, 650), xaxlab= c("MCF-10A", "MCF-7", "BT474", "MDA-MB-231"), ytics = c( 50, 100, 800, 900), col =  c("blue", "light blue", "blue", "green"), ylim =c(0,400))

My main problem is that I cannot establish scales that make the barplot look better. I'm not sure if I'm using the wrong tool, or if I'm not providing the necessary parameters in my original code.

Please Help!!
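As one possible alternative to a broken axis, bars whose heights differ by orders of magnitude can be drawn on a logarithmic y axis with base R's `barplot(log = "y")`. This is a different technique from `plotrix::gap.barplot`, shown only as a sketch: the expression values below are hypothetical (the real `Promlc` data are not reproduced here), and `plot_log_bars` is an illustrative helper name.

```r
# Hypothetical expression values; the real Promlc data are not shown in this issue.
expr <- c("MCF-10A" = 12, "MCF-7" = 95, "BT474" = 130, "MDA-MB-231" = 870)

# Illustrative helper: barplot on a log-scaled y axis.
plot_log_bars <- function(values, ...) {
  # A log axis cannot display zero or negative heights.
  stopifnot(all(values > 0))
  # With log = "y", the bars start at the lower ylim value instead of 0.
  barplot(values, log = "y", ylim = c(1, max(values) * 2), ...)
}

# barplot() returns the x positions of the bar midpoints.
mids <- plot_log_bars(
  expr,
  col  = c("blue", "lightblue", "blue", "green"),
  ylab = "Expression (log scale)"
)
```

A log axis keeps all four bars readable in a single panel, at the cost of making absolute differences harder to judge by eye; if the broken-axis look is required, `gap.barplot` remains the tool to adjust.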

Transcriptome: Results folder

The folder for "results" is missing in data. I suggest you make a separate folder inside the Transcriptome directory for results and graphics

Evaluation 30 October

Hi Laura,

I had a look at your repo and README file. In general it is good and well-structured, well done! Below are some suggestions for improvement:

General README

  • I suggest you include the information from Repo_analysis.md as an introduction for your project in the general README (or at least a link to Repo_analysis.md in the general README)
  • Please briefly describe the files included in the STB folder, with links to your presentations

Exome project
The README file of the Exome project needs to be completed, briefly explaining the goals of the project and describing the data, scripts and code for meta that are (or will be) included.

Transcriptome project:

  • Please explain the code for the meta data in the README file
  • The scripts Choc_GSEA and Choc_Volcano_plot are not described in the README file
  • The folder for "results" is missing in data. I suggest you make a separate folder inside the Transcriptome directory for results and graphics

Put data on OSF

You have no data or scripts in the /Exome/ directories. Remember that you must put your data in OSF or provide toy data (just like in the transcriptome section). Also, it looks like the .csv files in both meta directories are the same; is this correct?

Troubles with sublisting in Markdown

Hi! I was trying to put my repo in order and I started with my README.

I organized my repo with the following folders:

  1. Transcriptome
  • Bin
  • Graphs
  • Metadata
  • Data
  2. Exome
  • Bin
  • Metadata
  • Data

And, for specific use in my project, I need to have sub-folders in Data, just like this:

  • Data
    • Results
      • Quality (for FastQC reports)
      • Tables (for the count tables)
      • DESeq objects (for DESeq objects, because they hold a lot of information about the factors)

But I couldn't achieve that... All my folders look as if they were at the same level of the hierarchy, just like this. I also tried a numbered list style, but it doesn't make the README easier to follow...

Could you help me with that?

Modify statistical parameters to implement GSEA analysis on DESeq results

Hi! I have an issue that I want to discuss with you, hoping that you can give me adequate advice.

As you know, I want to find the lincRNAs that are differentially expressed in resistant patients, and I also want to learn more about how these lincRNAs are involved in resistance processes. The simplest way to answer that is to look at every differentially expressed lincRNA... but the problem is that most of my differentially expressed lincRNAs are only annotated, with no functional or biological information about them. For that reason, I thought it could be a good idea to perform a gene set enrichment analysis, to find out in what kind of processes these lincRNAs could be involved.

To do that, I was using the SeqGSEA package in DE-only analysis mode in R, but when I run it, this appears...

Warning message:
In .local(object, ...) :
  in estimateDispersions: sharingMode=='gene-est-only' will cause inflated numbers of false positives unless you have many replicates.

Anyway, if I continue running the analysis, it stops at the permutation step and then this appears:

DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat)
There were 50 or more warnings (use warnings() to see the first 50)

And there is no object created.

In here you can find my count table for differential expression analysis.

And in here is my script.

I would like to know if you could help me with the GSEA analysis: whether you know another R package for GSEA, how to avoid the replicates problem, or another way to perform this analysis.

Thanks!
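One simple alternative that does not rely on permutations, and so sidesteps the replicates warning, is an over-representation analysis (ORA) based on the hypergeometric test. To be clear, this is not GSEA: it ignores the ranking of genes and only asks whether a gene set contains more differentially expressed genes than expected by chance. The sketch below uses base R only; the gene identifiers are made up and `ora_pvalue` is a hypothetical helper name.

```r
# Hypothetical helper: one-sided hypergeometric p-value for the overlap
# between a list of differentially expressed (DE) genes and one gene set.
ora_pvalue <- function(de_genes, gene_set, universe) {
  universe <- unique(universe)
  de_genes <- intersect(unique(de_genes), universe)
  gene_set <- intersect(unique(gene_set), universe)

  overlap <- length(intersect(de_genes, gene_set))  # DE genes inside the set
  m <- length(gene_set)                             # set genes in the universe
  n <- length(universe) - m                         # genes outside the set
  k <- length(de_genes)                             # number of DE genes

  # P(X >= overlap) when drawing k genes at random from the universe.
  phyper(overlap - 1, m, n, k, lower.tail = FALSE)
}

# Made-up example: 100 tested genes, a gene set of 10, and 10 DE genes,
# 5 of which belong to the gene set.
universe <- paste0("gene", 1:100)
gene_set <- universe[1:10]
de_genes <- universe[c(1:5, 50:54)]

p <- ora_pvalue(de_genes, gene_set, universe)
```

When many gene sets are tested this way, the resulting p-values would normally be adjusted for multiple testing, for example with `p.adjust(p_values, method = "BH")`.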

Scripts in transcriptome

Your numbered scripts are well commented, but what about Choc_GSEA?
In your /Transcriptome/bin/ script 2, at lines 25 and 34, you are using absolute paths. I suggest using relative paths to avoid problems if the user does not place /lncRNA_BC/ at ~/
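The relative-path suggestion could look roughly like the sketch below. It assumes the script is run with `Transcriptome/bin` as the working directory; the file name `counts.csv` and the helper `read_counts` are hypothetical and only illustrate the pattern.

```r
# Build the path relative to Transcriptome/bin instead of hard-coding ~/lncRNA_BC/...
# "counts.csv" is a hypothetical file name used for illustration.
counts_path <- file.path("..", "data", "results", "tables", "counts.csv")

# Hypothetical helper: fail early with a clear message if the file is not found,
# which usually means the working directory is not Transcriptome/bin.
read_counts <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path,
         " (is the working directory set to Transcriptome/bin?)")
  }
  read.csv(path, row.names = 1)
}
```

With this layout the repository can be cloned anywhere, since nothing in the script depends on where `lncRNA_BC` sits in the user's home directory.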
