[Nov 8, 2011] The SeqSite paper has been accepted and will be published in the journal BMC Systems Biology.

[Nov 18, 2010] SeqSite version 1.1.2 released, with additional options for running SeqSite.

[Oct 28, 2010] SeqSite version 1.1.0 released, adding a function to output detected binding regions.

[Sep 29, 2009] SeqSite version 1.0.0 released, with modified background modeling.

[Jun 10, 2009] SeqSite version 0.9.0 released.

[Jun 1, 2009] SeqSite website launched.


SeqSite is an efficient and easy-to-use software tool implementing a novel method for identifying and pinpointing transcription factor binding sites. It first detects transcription factor binding regions by tag clustering and statistical hypothesis testing, then locates each binding site within the detected regions by modeling the tag profiles. It can resolve closely spaced adjacent binding sites from ChIP-seq data. The software is written in C/C++ and supports major computer platforms.
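The tag-clustering idea can be illustrated with a short Python sketch. This is not SeqSite's implementation (which is in C/C++); it is a minimal illustration assuming that a cluster is a run of sorted tag positions whose neighboring gaps do not exceed a distance threshold, and that small clusters are discarded. The function name `cluster_tags` and its parameters are hypothetical; the defaults of 30 and 10 mirror the -d and -n options in the usage text.

```python
from typing import List


def cluster_tags(positions: List[int], max_gap: int = 30,
                 min_tags: int = 10) -> List[List[int]]:
    """Group tag positions into clusters (illustrative sketch only).

    Assumption: consecutive sorted tags belong to the same cluster when
    they are at most max_gap bases apart; clusters with fewer than
    min_tags tags are dropped.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_tags:
                clusters.append(current)
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_tags:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Made-up positions: a dense run, two stray tags, another dense run.
    tags = list(range(100, 112)) + [500, 505] + list(range(1000, 1030, 3))
    for c in cluster_tags(tags):
        print(c[0], c[-1], len(c))
```

In this sketch the two stray tags form a cluster of their own that is too small to keep, so only the two dense runs survive.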


SeqSite [options] <input.bed> <output.bar> <output.bed>
        input.bed    ChIP-seq data in BED format
                     (4 fields required: chrId, start, end, and strand)
        output.bar   BAR file containing binding sites identified
        output.bed   BED file containing binding regions detected
Options: (* advanced)
        -c <string>  control data in BED format 
                     4 fields are required: chrId, start, end, and strand
                     (default: not use)
        -g <int>     effective genome size 
                     (default: 2.4e+9 for the human genome)
        -d <int>     * tag clustering distance 
                     (default: 30)
        -n <int>     * min tag count in a tag cluster 
                     (default: 10)
        -S           * filter single-strand tag clusters 
                     (default: not filter)
        -l <double>  * average DNA fragment length 
                     (default: estimate from data)
        -t <int>     * top <int>% read clusters for frag. length estimating 
                     (default: 5)
        -p <double>  p-value cutoff for binding region detection 
                     (default: 1e-3)
        -f <double>  FDR for binding region detection 
                     (default: 0.1)
        -s <int>     * arm length for smoothing tag signal 
                     (default: 20)
        -k <int>     * kernel density bandwidth for smoothing tag signal
                     (default: use -s)
        -w <int>     * experimental motif width 
                     (default: 20)
        -F           * filter out the duplicate reads 
                     (default: FALSE)
        -q           quiet: no screen display 
                     (default: show progress)
Help Options:
        -h           show this help message
        -v           show version information
        -a           about SeqSite
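
As an illustration of how the options above combine, the following Python sketch assembles a SeqSite command line. It assumes the three positional arguments are, in order, the input BED file, the output BAR file, and the output BED file, and that the executable is named SeqSite and is on the PATH. The helper `build_seqsite_command` and all file names are hypothetical; only the flags -c, -p, -f, and -q are taken from the usage text.

```python
from typing import List, Optional


def build_seqsite_command(input_bed: str, output_bar: str, output_bed: str,
                          control_bed: Optional[str] = None,
                          p_value: Optional[float] = None,
                          fdr: Optional[float] = None,
                          quiet: bool = False,
                          executable: str = "SeqSite") -> List[str]:
    """Build an argument list for SeqSite (hypothetical helper).

    Options left as None are omitted so SeqSite applies its own defaults.
    """
    cmd = [executable]
    if control_bed is not None:
        cmd += ["-c", control_bed]   # control data in BED format
    if p_value is not None:
        cmd += ["-p", str(p_value)]  # p-value cutoff for region detection
    if fdr is not None:
        cmd += ["-f", str(fdr)]      # FDR for region detection
    if quiet:
        cmd.append("-q")             # no screen display
    # Assumed positional order: input BED, output BAR, output BED.
    cmd += [input_bed, output_bar, output_bed]
    return cmd


if __name__ == "__main__":
    cmd = build_seqsite_command("reads.bed", "sites.bar", "regions.bed",
                                control_bed="control.bed", p_value=0.001)
    print(" ".join(cmd))
    # To actually run it (requires SeqSite to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```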


The BED file archives all detected binding regions, with statistical significance levels.

The columns of the BED file are:

	chr#, start, end, read-count|fold-change|p-value|q-value, score, strand(+)
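
A minimal Python sketch for reading one line of this output is shown below. It assumes the columns are tab-separated and that the fourth column packs the four statistics with `|` separators, as the layout above suggests. The `BindingRegion` type, the `parse_region_line` function, and the sample line are hypothetical and are not part of SeqSite.

```python
from typing import NamedTuple


class BindingRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    read_count: float
    fold_change: float
    p_value: float
    q_value: float
    score: float
    strand: str


def parse_region_line(line: str) -> BindingRegion:
    """Parse one binding-region line (assumes tab-separated columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, start, end, stats, score, strand = fields[:6]
    parts = stats.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated values, got {len(parts)}")
    # Order assumed from the column description:
    # read-count|fold-change|p-value|q-value
    read_count, fold_change, p_value, q_value = (float(x) for x in parts)
    return BindingRegion(chrom, int(start), int(end), read_count,
                         fold_change, p_value, q_value, float(score), strand)


if __name__ == "__main__":
    # Made-up example line, not real SeqSite output.
    sample = "chr1\t1000\t1200\t85|12.5|1e-10|1e-8\t500\t+"
    print(parse_region_line(sample))
```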

The BAR file archives all identified binding sites with the normalized scores indicating binding affinity.

The columns of the BAR file are:

	chr#, position, p-value, fold-change, q-value, R-square, slope(normalized)
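
The BAR output can be read in a similar way. The sketch below assumes a text BAR file with whitespace-separated columns in the order listed above, and it skips blank lines and lines starting with `#` in case the file carries header comments. The `BindingSite` type, the `parse_bar_lines` function, and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class BindingSite(NamedTuple):
    chrom: str
    position: int
    p_value: float
    fold_change: float
    q_value: float
    r_square: float
    slope: float


def parse_bar_lines(lines: Iterable[str]) -> Iterator[BindingSite]:
    """Yield binding sites from text BAR lines (assumed layout)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and possible '#' header lines (assumption).
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}")
        chrom, position = fields[0], int(fields[1])
        p_value, fold_change, q_value, r_square, slope = (
            float(x) for x in fields[2:7])
        yield BindingSite(chrom, position, p_value, fold_change,
                          q_value, r_square, slope)


if __name__ == "__main__":
    # Made-up example data, not real SeqSite output.
    sample = ["# example header", "chr1\t12345\t1e-12\t15.2\t1e-9\t0.93\t0.87"]
    for site in parse_bar_lines(sample):
        print(site)
```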



The latest version: SeqSite 1.1.2

Source Code: download

for Windows: download

for Linux 64-bit: download

for Linux 32-bit: download


Datasets used:

GABP (gzipped BED, 68.8MB) [original source: Valouev, A. et al (2008). Nat Methods]

STAT1 (gzipped BED, 242MB) [original source: Rozowsky, J. et al (2009). Nat Biotechnol]

NRSF (gzipped BED, 77.2MB) [original source: Valouev, A. et al (2008). Nat Methods]

Control data for GABP & NRSF (gzipped BED, 153MB) [original source: Valouev, A. et al (2008). Nat Methods]

Control data for STAT1 (gzipped BED, 211MB) [original source: Rozowsky, J. et al (2009). Nat Biotechnol]


Analysis results: (all results are in hg18 coordinates)

GABP: binding regions (gzipped BED, 234KB) binding sites (gzipped BAR, 595KB)

STAT1: binding regions (gzipped BED, 696KB) binding sites (gzipped BAR, 1.7MB)

NRSF: binding regions (gzipped BED, 66KB) binding sites (gzipped BAR, 123KB)


Q: How can I convert other read-mapping formats to BED format?

A: We provide Perl scripts for this conversion: [Eland] [bowtie] [ZOOM] [SAM].
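
For readers who prefer Python, the kind of conversion these scripts perform can be sketched as follows for SAM input. This is not a port of the provided Perl scripts. It assumes standard six-column BED output (chrom, start, end, name, score, strand), which contains the four fields SeqSite requires; the exact column layout SeqSite expects should be checked against the provided scripts. The function names are hypothetical.

```python
import re
from typing import Optional

# CIGAR operations that consume reference bases (per the SAM specification).
_REF_OPS = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar: str, seq: str) -> int:
    """Number of reference bases covered by an alignment."""
    if cigar == "*":
        return len(seq)  # no CIGAR available: fall back to read length
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_OPS)


def sam_to_bed_line(sam_line: str) -> Optional[str]:
    """Convert one SAM alignment line to a BED line, or None if skipped."""
    if sam_line.startswith("@"):
        return None  # header line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # malformed line
    flag = int(fields[1])
    if flag & 0x4:
        return None  # read is unmapped
    start = int(fields[3]) - 1  # SAM is 1-based, BED is 0-based
    end = start + reference_span(fields[5], fields[9])
    strand = "-" if flag & 0x10 else "+"  # 0x10 marks reverse strand
    return "\t".join([fields[2], str(start), str(end), fields[0], "0", strand])


if __name__ == "__main__":
    # Made-up alignment line for illustration.
    line = "\t".join(["read1", "16", "chr1", "101", "255", "36M",
                      "*", "0", "0", "A" * 36, "I" * 36])
    print(sam_to_bed_line(line))
```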


Q: How can I browse BAR files?

A: BAR files can be viewed in CisGenome Browser, a mini browser for genomic data visualization. It can also display other data types, such as BED and RefFlat.

See also

QuEST: Valouev, A. et al. (2008). Nat Methods

SISSRs: Jothi, R. et al. (2008). Nucleic Acids Res

MACS: Zhang, Y. et al. (2008). Genome Biol

PeakSeq: Rozowsky, J. et al. (2009). Nat Biotechnol

GPS: Guo, Y. et al. (2010). Bioinformatics

PICS: Zhang, X. et al. (2010). Biometrics


Xi Wang and Xuegong Zhang. Pinpointing transcription factor binding sites from ChIP-seq data with SeqSite. BMC Systems Biology, 5(Suppl 2):S3. [abstract]


Corresponding author: Xuegong Zhang

Email | Website

Software details/bugs: Xi Wang

Email | Website