News

[Nov 8, 2011] The SeqSite paper has been accepted and will be published in the journal BMC Systems Biology.

[Nov 18, 2010] SeqSite version 1.1.2 released, adding new options for running SeqSite.

[Oct 28, 2010] SeqSite version 1.1.0 released, adding the ability to output detected binding regions.

[Sep 29, 2009] SeqSite version 1.0.0 released, with modified background modeling.

[Jun 10, 2009] SeqSite version 0.9.0 released.

[Jun 1, 2009] SeqSite website launched.

Introduction

SeqSite is an efficient and easy-to-use software tool implementing a novel method for identifying and pinpointing transcription factor binding sites. It first detects transcription factor binding regions by clustering tags and applying statistical hypothesis testing, and then locates every binding site within the detected regions by modeling the tag profiles. It can resolve closely spaced adjacent binding sites from ChIP-seq data. The software is written in C/C++ and supports major computer platforms.
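
The tag clustering step can be illustrated with a minimal Python sketch. This is not SeqSite's actual implementation: the function name cluster_tags is hypothetical, and the rule used here (start a new cluster whenever the gap between consecutive tags exceeds the clustering distance, then discard clusters with too few tags) is an assumption. The default values 30 and 10 mirror the documented defaults of the -d and -n options.

```python
from typing import List


def cluster_tags(positions: List[int], max_gap: int = 30,
                 min_tags: int = 10) -> List[List[int]]:
    """Group tag positions into clusters (hypothetical helper, not SeqSite code).

    Assumption: two consecutive tags belong to the same cluster when they
    are at most max_gap bases apart; clusters with fewer than min_tags
    tags are dropped.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # Close the current cluster when the gap to the previous tag is too large.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_tags:
                clusters.append(current)
            current = []
        current.append(pos)
    # Do not forget the last open cluster.
    if len(current) >= min_tags:
        clusters.append(current)
    return clusters


# Illustrative positions only: a dense group of 12 tags plus two isolated tags.
tags = list(range(1000, 1060, 5)) + [5000, 5200]
clusters = cluster_tags(tags)
print([(c[0], c[-1], len(c)) for c in clusters])
```

In this toy input only the dense group survives; the two isolated tags fall below the minimum tag count and are discarded.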

Usage

SeqSite [options] <input.bed> <output.bar> <output.bed> 
        input.bed    ChIP-seq data in BED format 
                     (4 fields required: chrId, start, end, and strand)
        output.bar   BAR file containing binding sites identified
        output.bed   BED file containing binding regions detected
Options: (* advanced)
        -c <string>  control data in BED format 
                     4 fields are required: chrId, start, end, and strand
                     (default: not use)
        -g <int>     effective genome size 
                     (default: 2.4e+9 for the human genome)
        -d <int>     * tag clustering distance 
                     (default: 30)
        -n <int>     * min tag count in a tag cluster 
                     (default: 10)
        -S           * filter single-strand tag clusters 
                     (default: not filter)
        -l <double>  * average DNA fragment length 
                     (default: estimate from data)
        -t <int>     * top <int>% read clusters for frag. length estimating 
                     (default: 5)
        -p <double>  p-value cutoff for binding region detection 
                     (default: 1e-3)
        -f <double>  FDR for binding region detection 
                     (default: 0.1)
        -s <int>     * arm length for smoothing tag signal 
                     (default: 20)
        -k <int>     * kernel density bandwidth for smoothing tag signal
                     (default: use -s)
        -w <int>     * experimental motif width 
                     (default: 20)
        -F           * filter out the duplicate reads 
                     (default: FALSE)
        -q           quiet: no screen display 
                     (default: show progress)
Help Options:
        -h           show this help message
        -v           show version information
        -a           about SeqSite
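
As an illustration of the usage above, the following Python sketch assembles a SeqSite command line from the documented arguments. The helper build_seqsite_command and all file names are hypothetical, and the sketch assumes the executable is called SeqSite and is on the PATH. Only the -c, -p, and -f options from the list above are covered.

```python
import shlex
from typing import List, Optional


def build_seqsite_command(input_bed: str, output_bar: str, output_bed: str,
                          control_bed: Optional[str] = None,
                          p_value: Optional[float] = None,
                          fdr: Optional[float] = None,
                          executable: str = "SeqSite") -> List[str]:
    """Build an argument list following 'SeqSite [options] <input.bed> <output.bar> <output.bed>'.

    Hypothetical helper; the executable name and location are assumptions.
    """
    cmd = [executable]
    if control_bed is not None:
        cmd += ["-c", control_bed]      # control data in BED format
    if p_value is not None:
        cmd += ["-p", str(p_value)]     # p-value cutoff for region detection
    if fdr is not None:
        cmd += ["-f", str(fdr)]         # FDR for region detection
    # Positional arguments come last, in the documented order.
    cmd += [input_bed, output_bar, output_bed]
    return cmd


# File names below are placeholders.
cmd = build_seqsite_command("chip.bed", "sites.bar", "regions.bed",
                            control_bed="control.bed", fdr=0.05)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the program, provided SeqSite is installed.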
	

Output

The BED file archives all detected binding regions, with statistical significance levels.

The columns of the BED file are:

	chr#, start, end, read-count|fold-change|p-value|q-value, score, strand(+)
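
A short Python sketch of reading one line of this BED output follows. It assumes the columns are tab-separated and that the fourth column packs the four statistics with "|" in the order listed above; the BindingRegion type, the parse_region_line helper, and the example values are hypothetical and not taken from real SeqSite output.

```python
from typing import NamedTuple


class BindingRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    read_count: int
    fold_change: float
    p_value: float
    q_value: float
    score: float
    strand: str


def parse_region_line(line: str) -> BindingRegion:
    """Parse one binding-region line (assumes tab-separated columns)."""
    chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
    # The 4th column holds read-count|fold-change|p-value|q-value.
    read_count, fold_change, p_value, q_value = name.split("|")
    return BindingRegion(chrom, int(start), int(end),
                         int(float(read_count)), float(fold_change),
                         float(p_value), float(q_value),
                         float(score), strand)


# Made-up example line for illustration only.
example = "chr1\t10500\t10720\t42|8.5|1e-06|0.001\t850\t+"
region = parse_region_line(example)
print(region.chrom, region.end - region.start, region.q_value)
```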
	

The BAR file archives all identified binding sites with the normalized scores indicating binding affinity.

The columns of the BAR file are:

	chr#, position, p-value, fold-change, q-value, R-square, slope(normalized)
	

Downloads

Software:

The latest version: SeqSite 1.1.2

Source Code: download

for Windows: download

for Linux 64-bit: download

for Linux 32-bit: download

 

Datasets used:

GABP (gzipped BED, 68.8MB) [original source: Valouev, A. et al (2008). Nat Methods]

STAT1 (gzipped BED, 242MB) [original source: Rozowsky, J. et al (2009). Nat Biotechnol]

NRSF (gzipped BED, 77.2MB) [original source: Valouev, A. et al (2008). Nat Methods]

Control data for GABP & NRSF (gzipped BED, 153MB) [original source: Valouev, A. et al (2008). Nat Methods]

Control data for STAT1 (gzipped BED, 211MB) [original source: Rozowsky, J. et al (2009). Nat Biotechnol]

 

Analysis results: (all results are in hg18 coordinates)

GABP: binding regions (gzipped BED, 234KB) binding sites (gzipped BAR, 595KB)

STAT1: binding regions (gzipped BED, 696KB) binding sites (gzipped BAR, 1.7MB)

NRSF: binding regions (gzipped BED, 66KB) binding sites (gzipped BAR, 123KB)

FAQ

Q: How can I convert other read-mapping formats to BED format?

A: We provide Perl scripts for this: [Eland] [bowtie] [ZOOM] [SAM].
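
For readers who prefer to write their own converter, the SAM case can be sketched in Python as below. This is not the provided Perl script. It assumes the 4-field input is tab-separated in the order chrId, start, end, strand, uses 0-based BED start coordinates, and derives the end position from the CIGAR string; the function name sam_line_to_bed and the example read are hypothetical.

```python
import re
from typing import Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_line_to_bed(line: str) -> Optional[str]:
    """Convert one SAM record to 'chrId<TAB>start<TAB>end<TAB>strand'.

    Returns None for header lines and unmapped reads.
    """
    if line.startswith("@"):
        return None                      # SAM header line
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:
        return None                      # read is unmapped
    chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    if ref_len == 0:
        ref_len = len(fields[9])         # CIGAR unavailable: fall back to read length
    start = pos - 1                      # SAM is 1-based, BED is 0-based
    end = start + ref_len
    strand = "-" if flag & 0x10 else "+"  # 0x10 marks a reverse-strand alignment
    return f"{chrom}\t{start}\t{end}\t{strand}"


# Made-up 36-bp read aligned to the reverse strand.
sam = "read1\t16\tchr2\t101\t30\t36M\t*\t0\t0\t" + "A" * 36 + "\t" + "I" * 36
bed = sam_line_to_bed(sam)
print(bed)
```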

 

Q: How can I browse BAR files?

A: BAR files can be viewed in CisGenome Browser, a mini browser for genomic data visualization. It can also display other data types, such as BED and RefFlat.

See also

QuEST: Valouev, A. et al. (2008). Nat Methods

SISSRs: Jothi, R. et al. (2008). Nucleic Acids Res

MACS: Zhang, Y. et al. (2008). Genome Biol

PeakSeq: Rozowsky, J. et al. (2009). Nat Biotechnol

GPS: Guo, Y. et al. (2010). Bioinformatics

PICS: Zhang, X. et al. (2010). Biometrics

Citation

Xi Wang and Xuegong Zhang. Pinpointing transcription factor binding sites from ChIP-seq data with SeqSite. BMC Systems Biology, 5(Suppl 2):S3. [abstract]

Feedback

Corresponding author: Xuegong Zhang

Email | Website

Software details/bugs: Xi Wang

Email | Website