Reads Alignments

Supported file formats

Read alignments can be visualized from files in SAM or BAM format. The specifications of the two formats are defined by the SAM Format Specification Working Group, which publishes them online. SAM (Sequence Alignment/Map) is a tab-delimited text format in which each alignment line describes a read and how it is mapped to the reference. BAM is the binary counterpart of SAM and is usually used to save disk space.
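To make the tab-delimited layout concrete, here is a minimal sketch that splits one SAM alignment line into the eleven mandatory fields defined by the SAM specification. The names `SamRecord` and `parse_sam_line` and the example read are illustrative only; this is not how RNAseqViewer reads files, and optional tags are ignored.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, in file order."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # how the read is aligned to the reference
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Return a SamRecord for an alignment line, or None for a header line."""
    if line.startswith("@"):  # header lines start with '@'
        return None
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Hypothetical alignment line, made up for illustration.
example = "read1\t0\tchr6\t1000\t50\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
```

Any fields after the eleventh column are optional tags, which this sketch simply leaves unparsed.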

Internally, RNAseqViewer can only read sorted and indexed BAM files. If you provide a SAM file, or a BAM file that is not sorted or not indexed, RNAseqViewer will offer to convert, sort and/or index the file before displaying it. These steps can take some time, but they only need to be run once.

BAM files are usually generated by RNA-seq read mappers such as TopHat. Both single-end and paired-end reads are supported.

Reads can be displayed in three types of view: Reads View, Read Coverage View and Heatmap View.

Reads View

The Reads View shows the individual reads mapped to the genome. If the scale is large enough, the nucleotide content of each read is shown. If the reference DNA sequence is also provided, only the nucleotides that do not match the reference sequence are shown in color, while the others remain grey.
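The mismatch display can be illustrated with a short sketch that compares a read with the reference bases it is aligned to. It assumes an ungapped alignment (no insertions, deletions or splicing), which is a simplification; the function names and sequences are hypothetical and do not come from RNAseqViewer.

```python
def mismatches(read_seq, ref_seq):
    """Return (offset, read_base, ref_base) for every mismatching position.

    Assumes the read aligns to the reference without gaps, so that
    read_seq[i] is compared with ref_seq[i].
    """
    pairs = zip(read_seq.upper(), ref_seq.upper())
    return [(i, r, g) for i, (r, g) in enumerate(pairs) if r != g]


def mask_matches(read_seq, ref_seq, placeholder="."):
    """Keep only the mismatching bases; matching bases become a placeholder."""
    pairs = zip(read_seq.upper(), ref_seq.upper())
    return "".join(r if r != g else placeholder for r, g in pairs)


# Made-up sequences: the read differs from the reference at offset 4.
read = "ACGTTCGA"
reference = "ACGTACGA"
print(mismatches(read, reference))
print(mask_matches(read, reference))
```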

Information about a read and its alignment is shown in a tool tip when you hover over the read with your mouse. At the same time, the read’s color changes to red and, if it is a paired-end read, its mate is also shown in red.

../../_images/reads.png

An example of the Reads View. The reads are displayed as grey rectangles. The reference sequence has also been loaded, and the mismatching nucleotides are shown in color. The reads on the left are mapped to a splicing junction. Only the right part of each junction read is shown; the left part is at the other end of the linking line toward the right.

../../_images/reads-small.png

Another example of the Reads View, at a smaller scale. The read under the mouse and its mate are shown in red. Additional data are shown in a tool tip.

Read Coverage View

Alternatively, you can display the read coverage. The scale of the plot is adjusted dynamically. In the Settings (Edit menu), you can choose whether all the plots share a common scale or each plot has its own. The upper limit of the scale is displayed in the top left-hand corner of the track.
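As a rough illustration of what a coverage track represents, the sketch below counts how many reads overlap each base of a region, then derives the upper limit of the y axis under a common scale or under independent scales. Reads are reduced to half-open `(start, end)` intervals, and the function and track names are hypothetical; RNAseqViewer's actual computation is not described here.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over the half-open region [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region before counting it.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def upper_limits(tracks, common_scale):
    """Upper limit of the y axis for each track.

    With a common scale every track uses the largest depth found in any
    track; otherwise each track uses its own maximum.
    """
    maxima = {name: max(depth, default=0) for name, depth in tracks.items()}
    if common_scale:
        shared = max(maxima.values(), default=0)
        return {name: shared for name in maxima}
    return maxima


# Made-up reads for two samples over the region [0, 10).
tracks = {
    "sample_a": coverage([(0, 5), (3, 8), (4, 6)], 0, 10),
    "sample_b": coverage([(2, 4)], 0, 10),
}
print(tracks["sample_a"])
print(upper_limits(tracks, common_scale=True))
print(upper_limits(tracks, common_scale=False))
```

A common scale makes tracks directly comparable, while independent scales keep low-coverage samples readable.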

../../_images/coverage.png

An example of the Read Coverage View. Eight BAM files have been loaded and the coverage of the samples is shown with a common scale: every track’s y axis ranges from 0 to 5,034.

Heatmap View

The Heatmap View is particularly useful for displaying a large number of tracks, since it can be very compact. It shows the expression level of each exon defined in the loaded genome annotations.

The view is divided according to the boundaries of the exons in the loaded annotations: a column is created for each exon or zone of overlapping exons. Each track shows a colored band under each exon, and the color depends on the FPKM of that exon in that sample. The view should therefore be read as an FPKM heat map, with columns representing exons and rows representing samples.
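For readers unfamiliar with the unit, the sketch below uses the standard definition of FPKM (fragments per kilobase of exon per million mapped fragments) to build a small matrix of samples by exons. How RNAseqViewer counts fragments per exon is not specified here, so the counts, exon lengths, library sizes and function names are all made up for illustration.

```python
def fpkm(fragment_count, exon_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon per Million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # count / (length / 1e3) / (library size / 1e6)
    return fragment_count * 1e9 / (exon_length_bp * total_mapped_fragments)


def fpkm_matrix(counts, exon_lengths, library_sizes):
    """One row per sample, one column per exon."""
    return {
        sample: [fpkm(c, length, library_sizes[sample])
                 for c, length in zip(row, exon_lengths)]
        for sample, row in counts.items()
    }


# Hypothetical fragment counts for two exons in two samples.
counts = {"control": [500, 0], "case": [1000, 40]}
exon_lengths = [1000, 200]  # in base pairs
library_sizes = {"control": 20_000_000, "case": 10_000_000}
matrix = fpkm_matrix(counts, exon_lengths, library_sizes)
for sample, row in matrix.items():
    print(sample, row)
```

Dividing by both exon length and library size is what makes values comparable across exons of different sizes and samples of different sequencing depth.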

Defining the heatmap’s columns

Before choosing the Heatmap View, you should first load some annotations to define the columns of the heat map. These can be BED Tracks, Genome Annotations or Transcripts. If several annotation tracks are loaded, you can see and change which annotations are used for the heat map: open the track settings (right-click on the track) and select the annotation file(s) you want to use as reference.

Note

One column is usually defined by one exon. When several exons overlap, however, more columns are added to represent the overlapping sections. For example, if exon A and exon B overlap partially, there will be three columns: 1) the part of exon A outside exon B, 2) the region where A and B overlap, and 3) the part of exon B outside exon A.
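The splitting rule in this note can be sketched by cutting the annotated region at every exon boundary and keeping only the segments covered by at least one exon. This is one plausible reading of the rule rather than RNAseqViewer's actual code; exons are represented as half-open `(start, end)` intervals and the name `heatmap_columns` is hypothetical.

```python
def heatmap_columns(exons):
    """Split exon intervals at every boundary and keep the covered segments.

    exons: iterable of half-open (start, end) intervals.
    Returns the resulting columns as a sorted list of (start, end) tuples.
    """
    exons = list(exons)
    # Every exon start or end is a potential column boundary.
    boundaries = sorted({p for start, end in exons for p in (start, end)})
    columns = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep the segment only if some exon fully contains it,
        # which drops the intronic gaps between exons.
        if any(start <= left and right <= end for start, end in exons):
            columns.append((left, right))
    return columns


# Exon A = [100, 200) and exon B = [150, 300) overlap partially.
print(heatmap_columns([(100, 200), (150, 300)]))
# Two exons that do not overlap keep one column each.
print(heatmap_columns([(100, 200), (400, 500)]))
```

With the partially overlapping pair, the three columns are A without B, the overlap, and B without A, matching the example in the note.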

When your cursor hovers over a heatmap cell, more information is shown and the region corresponding to the column is highlighted on the annotation track.

Heatmap’s configuration

In the default configuration, the color depends on \log_{10}(\text{FPKM} + 1), but this can be changed in the track settings (right-click on the track). The global settings window (Edit menu) offers two other configuration choices:

  • Whether the color range should be common to all the tracks or independent for each track. The latter option can be useful for samples with different coverage depth.
  • Which color scheme should be used, depending on your needs (intensity or diverging datasets, color-blind friendly, black & white printing-friendly, etc.)
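The default transform and the choice between a common and an independent color range can be sketched as follows. The code maps each FPKM value to a position between 0 and 1 on the color scale; the assumption that the range runs from 0 to the largest transformed value is ours, as are the function and track names, and the color scheme itself is left out.

```python
import math


def log_expression(fpkm):
    """Default transform: log10(FPKM + 1), which maps FPKM = 0 to 0."""
    return math.log10(fpkm + 1)


def color_positions(tracks, common_range):
    """Map each FPKM value to a position in [0, 1] on the color scale.

    tracks: dict of track name -> list of FPKM values (one per column).
    common_range: if True, all tracks share the largest transformed value
    as the top of the range; otherwise each track uses its own maximum.
    """
    logged = {name: [log_expression(v) for v in values]
              for name, values in tracks.items()}
    shared_top = max((v for vals in logged.values() for v in vals), default=0.0)
    positions = {}
    for name, values in logged.items():
        top = shared_top if common_range else max(values, default=0.0)
        positions[name] = [v / top if top > 0 else 0.0 for v in values]
    return positions


# Made-up FPKM values for a deeply and a shallowly expressed sample.
tracks = {"deep": [0, 9, 99], "shallow": [0, 9]}
print(color_positions(tracks, common_range=True))
print(color_positions(tracks, common_range=False))
```

With independent ranges the strongest exon of each track reaches the top of the scale, which is why that option helps when samples differ in coverage depth.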

Note

The color range is dynamically adjusted to represent the data currently on the screen. The color of a given exon might therefore change when you scroll or zoom, since the range is recomputed to take the other visible exons into account. For the best result, zoom in so that only the gene of interest is on your screen.

../../_images/heatmap.png

An example of the Heatmap View. Three BAM files have been loaded and the coverage of the exons is shown as a heatmap. Colors represent the expression in each sample (one per row) for the exons of the HDAC2 gene, which is shown on top. Colors are computed from \log_{10}(\text{FPKM} + 1).

../../_images/heatmap-big.png

This example is very similar to the previous one. Compared with it, more datasets have been added and intronic regions have been hidden to give a better view of the expression of small exons. The colors of the datasets have also been changed to reflect the two groups: case samples and control samples.