Abstract

Early human development requires precise regulation of gene expression to support normal embryonic growth. Long noncoding RNAs (lncRNAs) are RNA transcripts lacking coding potential that regulate gene expression during development. We have identified a novel human-specific X-linked lncRNA called lncRHOXF2B, located in the RHOX cluster of the X chromosome. LncRHOXF2B is expressed from a 4.8 kb gene to produce a 780 bp transcript, is found in both the cytoplasm and the nucleus, and is subject to X-chromosome inactivation. To analyze the function of this novel lncRNA, we targeted an inducible lncRHOXF2B transgene to chromosome 9 in human embryonic stem cells (hESCs). Overexpression of lncRHOXF2B in undifferentiated hESCs reduced cellular growth rates and increased differentiation. In this experiment, we examined the transcriptional profile of undifferentiated hESCs overexpressing lncRHOXF2B by Illumina microarray. The expression profiling analysis, along with highlighted results, is reviewed below.

R packages

The following R/Bioconductor packages are used in this analysis:

library(lumi)
library(lumiHumanIDMapping) 
library(lumiHumanAll.db)
library(RColorBrewer)
library(gplots)
library(ggplot2)
library(genefilter)
library(limma)
library(annotate)
library(reshape2)
library(Biobase)
library(dplyr)

This lncRHOXF2B microarray analysis summary report was compiled in R Markdown using the following packages:

library(rmarkdown)
library(knitr) 

Set-up & QC

First, I will read in the raw Illumina microarray data and add the control probe data.
I will then remove clone23 to focus on clone28, and drop the outlier replicate 2 of clone28.

rawData <- lumiR("FinalReport_noNorm_noBkrnd_samples.txt", convertNuID = TRUE, sep = NULL, detectionTh = 0.01, na.rm = TRUE, lib = "lumiHumanIDMapping")
rawData <- addControlData2lumi("FinalReport_noNorm_noBkrnd_controls.txt", rawData)
rawData <- rawData[, -(1:5)] # drops the first five arrays (clone23); same as [,-1:-5]
rawData <- rawData[, -2] # drops the second remaining array (outlier replicate for clone28)
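
The two subsetting steps above depend on the column order of the report file, which is not shown here. As a minimal base R sketch, the toy matrix below (hypothetical sample names and counts, not the real arrays) shows how the negative indices behave, and how the same selection could be made by name so that it does not depend on column position:

```r
# Toy expression matrix standing in for the LumiBatch object.
# Sample names and counts are hypothetical; the real column order may differ.
toyNames <- c(paste0("clone23_", 1:5), paste0("clone28_", 1:6))
toy <- matrix(seq_len(3 * length(toyNames)), nrow = 3,
              dimnames = list(paste0("probe", 1:3), toyNames))

# -1:-5 expands to c(-1, -2, -3, -4, -5), so it drops the first five columns.
step1 <- toy[, -1:-5]

# The second index is relative to the already-reduced object:
# it drops what was originally column 7.
step2 <- step1[, -2]
colnames(step2)

# Selecting by name gives the same result without relying on position.
toDrop <- c(grep("^clone23", colnames(toy), value = TRUE), "clone28_2")
byName <- toy[, !(colnames(toy) %in% toDrop)]
identical(step2, byName)
```

Because the second index applies to the reduced object, running the two lines in a different order would remove a different array.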

Next, I will read in a text file that describes the design of this study, and use it to set up sample labels, treatment groups for comparison, and replicates.

targets <- read.delim("Anguera_studyDesign_exper2.txt", sep="\t", stringsAsFactors = FALSE)
myGroups <- factor(paste(targets$Ident, targets$Treat, sep="."))
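
To illustrate what the group factor looks like, here is a small sketch with a made-up design table. Only the column names Ident and Treat come from the code above; the treatment labels, the number of arrays, and the object names are assumptions for illustration.

```r
# Hypothetical study-design table; the contents of the real file are not shown.
toyTargets <- data.frame(
  Ident = rep("clone28", 6),
  Treat = rep(c("noDox", "dox"), each = 3),  # assumed treatment labels
  stringsAsFactors = FALSE
)

# One label per array, e.g. "clone28.dox"
toyGroups <- factor(paste(toyTargets$Ident, toyTargets$Treat, sep = "."))
levels(toyGroups)
table(toyGroups)

# Sanity check worth running on the real objects: one design row per array.
# With the real data this would compare nrow(targets) to ncol(rawData).
nArrays <- 6  # assumed number of arrays kept after filtering
stopifnot(nrow(toyTargets) == nArrays)
```

If the design file still lists the arrays removed during set-up, the factor will be longer than the data and this check will fail, which is easier to diagnose here than inside a later model fit.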

Figure 1: Housekeeping genes

I begin by assessing array quality and consistency by looking at how a set of housekeeping genes behaves across the arrays.
Result: embryonic stem cells appear to have variable housekeeping gene signatures.

plotHousekeepingGene(rawData)
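
One way to put a number on the variability seen in this plot is the coefficient of variation of each housekeeping gene across arrays. The sketch below uses simulated intensities and gene symbols chosen purely for illustration; it is not computed from the study data and is not part of the lumi workflow.

```r
set.seed(1)
# Simulated raw intensities for three commonly used housekeeping genes
# across six arrays; the values are made up for illustration.
hk <- matrix(rlnorm(18, meanlog = 9, sdlog = 0.3), nrow = 3,
             dimnames = list(c("ACTB", "GAPDH", "TUBB"), paste0("array", 1:6)))

# Coefficient of variation per gene: standard deviation / mean across arrays.
cv <- apply(hk, 1, sd) / rowMeans(hk)
round(cv, 2)
```

A gene with a high coefficient of variation across arrays would be a poor internal reference for these cells.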

Figure 2: Distribution of signal intensity

This is another QC check, performed before any normalization or data filtering.

cols <- topo.colors(n=6, alpha=1)
hist(rawData, xlab = "log2 expression", main = "non-normalized data - histograms", col=cols)
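
The histograms can be complemented with a numeric summary of each array's distribution. This base R sketch uses simulated log2 intensities (not the study data), with one array deliberately shifted, to show how per-array quartiles flag an array whose distribution is offset from the rest:

```r
set.seed(2)
# Simulated log2 intensities: 1000 probes x 6 arrays, with array 6 shifted upward.
toyLog2 <- matrix(rnorm(6000, mean = 8, sd = 1.5), ncol = 6,
                  dimnames = list(NULL, paste0("array", 1:6)))
toyLog2[, 6] <- toyLog2[, 6] + 1

# Quartiles per array; columns with shifted medians stand out.
qs <- apply(toyLog2, 2, quantile, probs = c(0.25, 0.5, 0.75))
round(qs, 2)

# Deviation of each array's median from the median of all array medians.
medShift <- qs["50%", ] - median(qs["50%", ])
names(which.max(abs(medShift)))
```

Offsets of this kind between arrays are what normalization is meant to remove, so the same summary is worth repeating on the normalized data.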