News
[Nov 8, 2011] The SeqSite paper has been accepted and will be published in the journal BMC Systems Biology.
[Nov 18, 2010] SeqSite version 1.1.2 released, adding new options for running SeqSite.
[Oct 28, 2010] SeqSite version 1.1.0 released, adding a function to output detected binding regions.
[Sep 29, 2009] SeqSite version 1.0.0 released, with modified background modeling.
[Jun 10, 2009] SeqSite version 0.9.0 released.
[Jun 1, 2009] SeqSite website launched.
Introduction
SeqSite is an efficient and easy-to-use software tool implementing a novel method for identifying and pinpointing transcription factor binding sites from ChIP-seq data. It first detects transcription factor binding regions by clustering tags and applying statistical hypothesis testing, and then locates every binding site within the detected regions by modeling the tag profiles. This allows it to pinpoint closely spaced adjacent binding sites. The software is written in C/C++ and supports major computer platforms.
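The tag-clustering idea can be illustrated with a small Python sketch. This is not SeqSite's actual implementation: it assumes each tag is reduced to a single coordinate on one chromosome, and the function name and the parameters `max_gap` and `min_tags` are hypothetical. Their defaults only mirror the -d and -n defaults listed under Usage. The hypothesis-testing step is not shown.

```python
from typing import List, Tuple


def cluster_tags(positions: List[int],
                 max_gap: int = 30,
                 min_tags: int = 10) -> List[Tuple[int, int, int]]:
    """Group tag positions into clusters and return (start, end, tag_count).

    Illustrative assumption: consecutive sorted tags belong to the same
    cluster when they are at most `max_gap` bp apart; clusters with fewer
    than `min_tags` tags are discarded.
    """
    clusters: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_tags:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the last open cluster.
    if len(current) >= min_tags:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

For example, with `min_tags=3` the positions 100, 110, 120, 500, 510 would yield one cluster spanning 100 to 120, because the pair at 500 and 510 is too small to keep.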
Usage
SeqSite [options] <input.bed> <output.bar> <output.bed>
input.bed ChIP-seq data in BED format
(4 fields required: chrId, start, end, and strand)
output.bar BAR file containing binding sites identified
output.bed BED file containing binding regions detected
Options: (* advanced)
-c <string> control data in BED format
4 fields are required: chrId, start, end, and strand
(default: not used)
-g <int> effective genome size
(default: 2.4e+9 for the human genome)
-d <int> * tag clustering distance
(default: 30)
-n <int> * min tag count in a tag cluster
(default: 10)
-S * filter single-strand tag clusters
(default: no filtering)
-l <double> * average DNA fragment length
(default: estimate from data)
-t <int> * top <int>% of read clusters used for fragment length estimation
(default: 5)
-p <double> p-value cutoff for binding region detection
(default: 1e-3)
-f <double> FDR for binding region detection
(default: 0.1)
-s <int> * arm length for smoothing tag signal
(default: 20)
-k <int> * kernel density bandwidth for smoothing tag signal
(default: use -s)
-w <int> * experimental motif width
(default: 20)
-F * filter out duplicate reads
(default: FALSE)
-q quiet: no screen display
(default: show progress)
Help Options:
-h show this help message
-v show version information
-a about SeqSite
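When SeqSite is called from a script, the command line can be assembled programmatically. The Python sketch below is one way to do that using only flags listed above. The helper name `build_seqsite_command` and its keyword arguments are hypothetical, and it assumes the executable is named `SeqSite` and is on the PATH.

```python
from typing import List, Optional


def build_seqsite_command(input_bed: str,
                          output_bar: str,
                          output_bed: str,
                          control_bed: Optional[str] = None,
                          p_value: Optional[float] = None,
                          fdr: Optional[float] = None,
                          filter_duplicates: bool = False,
                          quiet: bool = False,
                          executable: str = "SeqSite") -> List[str]:
    """Return the argument list for a SeqSite run (hypothetical helper).

    Options left at None/False are omitted so SeqSite uses its own defaults.
    """
    cmd = [executable]
    if control_bed is not None:
        cmd += ["-c", control_bed]      # control data in BED format
    if p_value is not None:
        cmd += ["-p", str(p_value)]     # p-value cutoff for region detection
    if fdr is not None:
        cmd += ["-f", str(fdr)]         # FDR for region detection
    if filter_duplicates:
        cmd.append("-F")                # filter out duplicate reads
    if quiet:
        cmd.append("-q")                # no screen display
    # Positional arguments follow the options, in the order shown in Usage.
    cmd += [input_bed, output_bar, output_bed]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The file names used in any call are placeholders chosen by the caller.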
Output
The BED file lists all detected binding regions together with their statistical significance levels.
The columns of the BED file are:
chr#, start, end, read-count|fold-change|p-value|q-value, score, strand(+)
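The fourth BED column packs four statistics into one field separated by `|`, so it needs to be split before use. The Python sketch below shows one possible way to read a region line. It assumes the file is whitespace-delimited with exactly the six columns listed above; the names `BindingRegion` and `parse_region_line` are hypothetical.

```python
from typing import NamedTuple


class BindingRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    read_count: float
    fold_change: float
    p_value: float
    q_value: float
    score: float
    strand: str


def parse_region_line(line: str) -> BindingRegion:
    """Parse one line of the output BED file (assumed six columns)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields
    # The name field holds read-count|fold-change|p-value|q-value.
    stats = name.split("|")
    if len(stats) != 4:
        raise ValueError(f"expected 4 '|'-separated values, got {len(stats)}")
    read_count, fold_change, p_value, q_value = (float(s) for s in stats)
    return BindingRegion(chrom, int(start), int(end), read_count,
                         fold_change, p_value, q_value, float(score), strand)
```

All statistics are read as floats so that values written in scientific notation are handled without special cases.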
The BAR file lists all identified binding sites, with normalized scores indicating binding affinity.
The columns of the BAR file are:
chr#, position, p-value, fold-change, q-value, R-square, slope(normalized)
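A binding-site record can be read in a similar way. The Python sketch below assumes the BAR file is plain text, whitespace-delimited, with exactly the seven columns listed above; this layout is an assumption and should be checked against an actual output file. The names `BindingSite` and `parse_site_line` are hypothetical.

```python
from typing import NamedTuple


class BindingSite(NamedTuple):
    chrom: str
    position: int
    p_value: float
    fold_change: float
    q_value: float
    r_square: float
    slope: float  # normalized slope


def parse_site_line(line: str) -> BindingSite:
    """Parse one binding-site line (assumed seven text columns)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    chrom, position = fields[0], int(fields[1])
    # Remaining columns: p-value, fold-change, q-value, R-square, slope.
    p_value, fold_change, q_value, r_square, slope = (float(f) for f in fields[2:])
    return BindingSite(chrom, position, p_value, fold_change,
                       q_value, r_square, slope)
```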