lauramce / lncrna_bc

It is a repository that contains information about my master's project. The main topic is lincRNA as biomarkers in breast cancer. The main objective is to identify lincRNA biomarkers by transcriptome analysis.

R 0.40% HTML 99.60%

lncrna_bc's Introduction

lncRNA_BC

ABSTRACT

This repository contains information about my master's project. The main topic is lincRNAs as biomarkers in breast cancer, and the main objective is to identify lincRNA biomarkers by transcriptome analysis. For more information about breast cancer and the use of lincRNAs as biomarkers, please check the general Repo_Analysis.

INTRODUCTION

This repository is divided into 2 kinds of analysis:

Transcriptome Analysis
Exome Analysis

The transcriptome analysis identifies lincRNAs that are over- or under-expressed in breast cancer patients who did not respond to neoadjuvant chemotherapy. The main objective of this analysis is to identify the association of these lincRNAs with coding genes and how they are involved in chemoresistance mechanisms.

The exome analysis, on the other hand, aims to identify the possible genetic cause of the altered coding gene expression profiles in chemoresistant breast cancer patients, and whether it is related to clinical variables such as molecular subtype or hormone receptor expression. With this information, we will be able to associate the genetic profile of coding genes with prognosis and prediction, to the benefit of patient management.

The final aim of the whole project is to predict functional associations between lincRNAs and coding genes through bioinformatic analysis, and then validate them to identify possible therapeutic targets and biomarker panels for prediction and prognosis in breast cancer patients.

INDEX

Transcriptome analysis ..\lncRNA_BC\Transcriptome: It contains folders with RNA-Seq data (paired-end), scripts, metadata and graphical results of the bioinformatic analysis. It contains a README too.

- Scripts and pipeline ..\lncRNA_BC\Transcriptome\bin: It contains .R files with functions and scripts for the biomarker identification pipeline, organized in chronological order of use. The complete pipeline is specified in the README.
- Data ..\lncRNA_BC\Transcriptome\data: It contains all RNA-Seq raw data (fastq files). The data are also available in my OSF profile (User: Laura Contreras. Public Project: lincRNA as predictive biomarkers in Breast Cancer, Transcriptome data Folder).
  - Results ..\lncRNA_BC\Transcriptome\data\results: It contains all the processed files, and is subdivided into:
    - Quality ..\lncRNA_BC\Transcriptome\data\results\quality: It contains all the FastQC quality reports in .pdf format.
    - Tables ..\lncRNA_BC\Transcriptome\data\results\tables: It contains the transcript count output in .csv format.
- Meta ..\lncRNA_BC\Transcriptome\meta: It contains the clinical information about samples in .csv format.
- Graphs ..\lncRNA_BC\Transcriptome\Graphs: It contains the graphical results in .jpeg format.

Exome analysis ..\lncRNA_BC\Exome: It contains folders with exome data, scripts, metadata and graphical results of the bioinformatic analysis. It contains a README too.

- Scripts and pipeline ..\lncRNA_BC\Exome\bin: It contains .R files with functions and scripts for the biomarker identification pipeline, numbered in order of use. The complete pipeline is specified in README.md.
- Data ..\lncRNA_BC\Exome\data: It contains all exome raw data (fastq files). The data are also available in my OSF profile (User: Laura Contreras. Public Project: lincRNA as predictive biomarkers in Breast Cancer, Exome Folder).
  - Results ..\lncRNA_BC\Exome\data\result: It contains all the processed files, and is subdivided into:
    - Quality ..\lncRNA_BC\Exome\data\result\quality: It contains all the FastQC quality reports in .pdf format.
- Meta ..\lncRNA_BC\Exome\meta: It contains the clinical information about samples in .csv format.

APPENDIX 1

Selected topics of Bioinformatics (STB)

..\lncRNA_BC\STB

This folder contains activities from our course, which are:

Analysis and discussion of repository
Master Project Slides: Identification_of_lincRNA_as_predictive_biomarkers_in_breast_cancer
An image for the script to be solved in the issue: Volcano plot problem.
Methodological seminar: STAR ALIGNER.
- Images for seminar document:
  - Maximal Mappable Prefix diagram
  - RNA-Seq methodology diagram
  - RNA-Seq Workflow diagram


lncrna_bc's Issues

Complete Choco and GSEA

The scripts Choc_GSEA and Choc_Volcano_plot are not described in the README file

Code color in graphs

Help with my advisor's issue

Hi! My advisor asked me to do a very difficult task with my paper's pictures. Here is literally what he said:

There are some problems with the color codes in the volcano plot and the PCA plot. First, the volcano plot does not distinguish between under- and over-expressed genes: both have the same color. Second, there is no difference in color between significant and non-significant differentially expressed genes (all are black dots). Please check that. Third, we have followed a color code in other graphs for resistant and sensitive patients (cyan blue and red, respectively). In your graph they are inverted. Please correct. Thanks!

So, I hope that you can help me with the 2 issues.

1.- Volcano Plot

This is the code I'm using for Volcano plot.

##Indicate color code##

cols[BCresultsNR$log2FoldChange < -1.5] <- "#0066FF"
cols[BCresultsNR$log2FoldChange > 1.5] <- "#0033CC"
cols[BCresultsNR$pvalue == 0] <- "black"
cols[BCresultsNR$sig < -log10(alpha) ] <- "#000033"
cols[BCresultsNR$pvalue > 0.05] <- "#CCCCCC"

My problem is that I couldn't get a color code with 6 colors to work. Please help!
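One possible way to approach this, as a minimal sketch only: build the color vector from mutually exclusive conditions, so that no assignment overwrites another. It assumes a results table with `log2FoldChange` and `pvalue` columns like `BCresultsNR`. The helper `volcano_colors`, the toy data frame and the red chosen for over-expressed genes are illustrative assumptions, not part of the original script.

```r
# Hypothetical helper: one color per category, based on fold change and p-value.
volcano_colors <- function(log2fc, pvalue, lfc_cut = 1.5, alpha = 0.05) {
  cols <- rep("#CCCCCC", length(log2fc))            # default: not significant (grey)
  ok   <- !is.na(log2fc) & !is.na(pvalue)           # ignore rows with missing values
  sig  <- ok & pvalue < alpha
  cols[sig & abs(log2fc) <= lfc_cut] <- "black"     # significant, small fold change
  cols[sig & log2fc < -lfc_cut]      <- "#0066FF"   # significant, under-expressed
  cols[sig & log2fc >  lfc_cut]      <- "#CC0000"   # significant, over-expressed (assumed color)
  cols
}

# Toy data standing in for BCresultsNR (values are made up for illustration).
toy <- data.frame(
  log2FoldChange = c(-3, -0.5, 0.2, 2.4, 2.0, NA),
  pvalue         = c(0.001, 0.01, 0.6, 0.0004, 0.3, 0.02)
)
cols <- volcano_colors(toy$log2FoldChange, toy$pvalue)

# Volcano plot: fold change on x, -log10(p-value) on y, colored by category.
plot(toy$log2FoldChange, -log10(toy$pvalue), col = cols, pch = 19,
     xlab = "log2 fold change", ylab = "-log10(p-value)")
legend("top", pch = 19, bty = "n",
       col = c("#0066FF", "#CC0000", "black", "#CCCCCC"),
       legend = c("Under-expressed", "Over-expressed", "Significant", "Not significant"))
```

Because every point starts as grey and is recolored only when it meets exactly one condition, the order of the assignments does not matter, which is not the case when each line tests a single column.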

2.- PCA plot (SOLVED!!)

I had problems with the PCA plot too, but I solved them. Here is my first script, which does not do what I want, and here is my solution. Thanks!!

Describe STB

Add meta folder to Transcriptome

Update scripts Exome

Preliminary evaluation

Hi Laura,

In general, your repository structure is fine, and your general README.md is well documented and explicit. But is it necessary to use these 2 big sections (Exome and Transcriptome)? (It is your decision.)

Your numbered scripts are well commented, but what about Choc_GSEA?
In your /Transcriptome/bin/ script 2, at lines 25 and 34, you are using absolute paths. I suggest using relative paths in order to avoid problems if the user does not place /lncRNA_BC/ at ~/

You have no data or scripts in the /Exome/ directories. Remember that you must put your data in OSF or provide toy data (just like in the transcriptome section). Also, it looks like the .csv files in both meta directories are the same. Is this correct?
Your Exome/README.md is empty
I understand this repo is in development, but this half is almost empty

Remember the evaluation criteria: at this moment you do not have any R plot and there is no discussion file
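The relative-path suggestion could look like the following minimal sketch. It assumes the script is run with the repository root (lncRNA_BC) as the working directory, and `counts.csv` is a hypothetical file name used only for illustration.

```r
# Assumption: the working directory is the repository root (lncRNA_BC).
# "counts.csv" is a hypothetical file name, not a file known to exist in the repo.
counts_file <- file.path("Transcriptome", "data", "results", "tables", "counts.csv")

# Read the table only if it is present; otherwise report where R was looking.
if (file.exists(counts_file)) {
  counts <- read.csv(counts_file, row.names = 1)
} else {
  message("File not found: ", counts_file, " (working directory: ", getwd(), ")")
}
```

`file.path()` builds the path from its components, so the same script works wherever the repository is cloned, as long as it is run from the repository root.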

final repo evaluation

Hi Laura,

I enjoyed reviewing your repo, which is clear and well organized. You have been particularly careful in addressing the issues that I mentioned in the last evaluation. I greatly appreciate it.

Some minor comments:
Be careful with the names of folders between your README and the repo (e.g. "transcriptome analysis" or "transcriptome"). It can be confusing if they are not the same.

The discussion of results (Repo_Abstract_and_Discussion.md) needs to be mentioned in the general README, at least with a link (I almost missed it!). It would be good to link your results and graphics more closely to this file, because right now it is a little isolated from the rest of your repo. The idea of such a file is to summarize what you have done!

The files GRAPHS.txt are superfluous and can now be discarded. It would also be good to link the graphics to the code and data they were generated with, for example in the README of each analysis. That way a reader interested in a figure can readily find the code that generates it.

Don't hesitate to get back to us for any feedback. I hope you enjoyed the course!

Modify in markdown READMEs

Edit and add info. Be organized and simple

Add Plots in R (Results)

There is no R plot and no discussion file

Transcriptome project: Explain code of meta

Update Scripts

All scripts in transcriptome folder

Improve tasks

It is necessary for your own good

Subdivide your cards
Write Scripts

Using different scales in barplot simultaneously

Hi! I'm here again with problems to construct graphs.

I have very different expression values for one particular lincRNA across the breast cancer cell lines MCF-10A, MCF-7, BT474 and MDA-MB-231, as shown in my data Promlc here

I want to construct a barplot with these data, and I want it to look like this

But... it looks like this...

This is the R code that I've been following:

Pre-requisites:

library(plotrix)

plotrix::gap.barplot(Promlc, gap = c(150, 650), xaxlab= c("MCF-10A", "MCF-7", "BT474", "MDA-MB-231"), ytics = c( 50, 100, 800, 900), col =  c("blue", "light blue", "blue", "green"), ylim =c(0,400))

My main problem is that I cannot establish scales that make the barplot look better. I'm not sure whether I'm using the wrong tool or not providing the necessary parameters in my original code.

Please Help!!
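As an alternative to a broken axis, and not a fix for the `gap.barplot` call itself, values that differ by orders of magnitude can be drawn on a logarithmic y-axis with base R. This is only a sketch: the numbers below are made up to stand in for `Promlc`, and a log scale requires all values to be greater than zero.

```r
# Toy expression values standing in for Promlc (made up for illustration).
expr <- c("MCF-10A" = 12, "MCF-7" = 85, "BT474" = 130, "MDA-MB-231" = 870)

# log = "y" puts the y-axis on a log scale, so small and large bars stay readable.
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(expr, log = "y",
                col = c("blue", "lightblue", "blue", "green"),
                ylab = "Expression (log scale)")
```

The trade-off is that bar heights are no longer proportional to the values, so the axis label should state the log scale explicitly.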

Reorganize Folders

You must reorganize your repo structure

Modify General README

Transcriptome: Results folder

The folder for "results" is missing in data. I suggest you make a separate folder inside the Transcriptome directory for results and graphics

Adding exome pipeline in R

Evaluation 30 October

Hi Laura,

I had a look at your repo and README file. In general it is good and well-structured, well done! Below are some suggestions for improvement:

General README

I suggest you include the information from Repo_analysis.md as an introduction for your project in the general README (or at least a link to Repo_analysis.md in the general README)
Please briefly describe the files included in the STB folder, with links to your presentations

Exome project
The README file of the Exome project needs to be completed, explaining briefly the goals of the projects and describing the data, scripts and code for meta that are (or will be) included.

Transcriptome project:

Please explain the code for the meta data in the README file
The scripts Choc_GSEA and Choc_Volcano_plot are not described in the README file
The folder for "results" is missing in data. I suggest you make a separate folder inside the Transcriptome directory for results and graphics

Add README to Exome

Your Exome/README.md is empty

Add discussion file

Put data on OSF

Add closed issues to Unit4 in repo

You have to add your issues to the Markdown file

Upload exome files

Add link to GENCODE

In README

Troubles with sublisting in Markdown

Hi! I was trying to put my repo in order, and I started with my README.

I organized my repo with the following folders:

Transcriptome

Bin
Graphs
Metadata
Data

Exome

Bin
Metadata
Data

And, for specific use in my project, I need to have sub-folders in Data, just like this:

Data
- Results
  - Quality (for FastQC reports)
  - Tables (for the count tables)
  - DESeq objects (for DESeq objects, because they contain a lot of factor information)

But I couldn't achieve that... All my folders look as if they were at the same level, just like this. I also tried a numbered list style, but it doesn't make the README easier to follow...

Could you help me with that?
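In Markdown, a sub-item is normally nested by indenting it further than its parent, far enough to line up with the parent's text after `- `. As a minimal sketch, the hypothetical helper `md_list` below prints such a nested list from an R list, using two extra spaces per level. The folder names are taken from the structure described above, and the helper itself is an illustrative assumption.

```r
# Hypothetical helper: turn a nested named list into Markdown bullet lines,
# indenting each level two spaces further than its parent.
md_list <- function(x, depth = 0) {
  lines <- character(0)
  for (name in names(x)) {
    lines <- c(lines, paste0(strrep("  ", depth), "- ", name))
    if (is.list(x[[name]])) {
      lines <- c(lines, md_list(x[[name]], depth + 1))
    }
  }
  lines
}

# Folder structure from the issue: leaves are NULL, folders are lists.
folders <- list(
  Data = list(
    Results = list(Quality = NULL, Tables = NULL, "DESeq objects" = NULL)
  )
)

readme_lines <- md_list(folders)
cat(readme_lines, sep = "\n")
```

The printed lines can be pasted into the README; the point is only that the indentation of each line, not its bullet style, decides the nesting level.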

Modify statistical parameters to implement GSEA analysis on DESeq results

Hi! I have an issue that I want to discuss with you, hoping that you can give me adequate advice.

As you know, I want to find the lincRNAs that are differentially expressed in resistant patients, and I also want to learn how these lincRNAs are involved in resistance processes. The simplest way to answer that is to look at every differentially expressed lincRNA... but the problem is that most of my differentially expressed lincRNAs are only annotated, and there is no functional or biological information about them. For that reason, I thought it could be a good idea to perform a gene set enrichment analysis, to find out what kind of processes these lincRNAs could be involved in.

To do that, I was using the SeqGSEA package in DE-only analysis mode in R, but when I run it, this warning appears...

Warning message:
In .local(object, ...) :
  in estimateDispersions: sharingMode=='gene-est-only' will cause inflated numbers of false positives unless you have many replicates.

Anyway, if I continue running the analysis, it stops at the permutation step and then this appears:

DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat)
There were 50 or more warnings (use warnings() to see the first 50)

And there is no object created.

Here you can find my count table for differential expression analysis.

And here is my script.

I want to know if you could help me with the GSEA analysis: whether you know another R package for GSEA, how to avoid the replicates problem, or another way to perform this analysis.

Thanks!
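One direction that might be worth exploring, offered as an assumption rather than a confirmed solution: preranked enrichment methods take a single statistic per gene (for example a test statistic or log2 fold change from the differential expression results) and permute gene labels instead of sample labels. The base R sketch below computes the weighted running-sum enrichment score for one gene set on toy data. The function `enrichment_score`, the gene names and the values are all hypothetical, and this is not the SeqGSEA implementation.

```r
# Running-sum enrichment score for one gene set on a ranked list of statistics.
# Hits step up in proportion to |statistic|^p, misses step down by a constant.
enrichment_score <- function(stats, gene_set, p = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% gene_set
  n_miss <- sum(!in_set)
  stopifnot(any(in_set), n_miss > 0)
  hit_weight <- abs(stats)^p * in_set
  step    <- ifelse(in_set, hit_weight / sum(hit_weight), -1 / n_miss)
  running <- cumsum(step)
  unname(running[which.max(abs(running))])   # maximum deviation from zero
}

# Toy per-gene statistics and a toy gene set (made up for illustration).
toy_stats <- c(g1 = 3.2, g2 = 2.5, g3 = 1.1, g4 = 0.4, g5 = -0.3, g6 = -1.8, g7 = -2.9)
toy_set   <- c("g1", "g2", "g4")
es <- enrichment_score(toy_stats, toy_set)

# Null distribution by shuffling gene labels (no sample permutation involved).
set.seed(1)
null_es <- replicate(1000, {
  shuffled <- setNames(toy_stats, sample(names(toy_stats)))
  enrichment_score(shuffled, toy_set)
})
p_value <- mean(null_es >= es)
```

Whether this is appropriate depends on having gene sets that actually contain the lincRNAs or their associated coding genes, which the sketch does not address.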

Scripts in transcriptome

Update locations in issues

Update all the file locations in the Volcano plot issue, bar plot issue and GSEA issue
