Child pages
  • Extract Consensus from Assembly

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Name in Workflow File: in-assembly

Slots:

 

Slot In GUI

Slot in Workflow File

Type

Assembly data

assembly

assembly

 

And 1 outut port:

Name in GUI: out-sequence

Name in Workflow File: out-sequence

Slots:

 

Slot In GUISlot in Workflow FileType
Sequencesequencestring

 

 

CAP3 is a contig assembly program. It allows to assembly long DNA reads (up to 1000 bp). Binaries can be downloaded from http://seq.cs.iastate.edu/cap3.html Huang, X. and Madan, A. (1999) CAP3: A DNA Sequence Assembly Program, Genome Research, 9: 868-877.

Parameters in GUI

 

ParameterDescriptionDefault valueOutput file Write assembly results to this output file in ACE format..result.aceQuality cutoff for clippingBase quality cutoff for clipping (-c).12Clipping rangeSet a number which unit is base. It will get the refGenes in n bases from peak center. (--distance).100Quality cutoff for differenecesBase quality cutoff for differences (-b).20Maximum difference scoreMax qscore sum at differences (-d). If an overlap contains lots of differences at bases of high quality, then the overlap is removed. The difference score is calculated as follows. If the overlap contains a difference at bases of quality values q1 and q2, then the score at the difference is max(0, min(q1, q2) - b), where b is Quality cutoff for differences. The difference score of an overlap is the sum of scores at each difference.200Match score factorMatch score factor (-m) is one of the parameters that affects similarity score of an overlap. See Overlap similarity score cutoff description for details.2Mismatch score factorMismatch score factor (-n) is one of the parameters that affects similarity score of an overlap. See Overlap similarity score cutoff description for details.-5Gap penalty factorGap penalty factor (-g) is one of the parameters that affects similarity score of an overlap. See Overlap similarity score cutoff description for details.6Overlap similarity score cutoffIf the similarity score of an overlap is less than the overlap similarity score cutoff (-s), then the overlap is removed. The similarity score of an overlapping alignment is defined using base quality values as follows. A match at bases of quality values q1 and q2 is given a score of m * min(q1,q2), where m is Match score factor. A mismatch at bases of quality values q1 and q2 is given a score of n * min(q1,q2), where n is Mismatch score factor. A base of quality value q1 in a gap is given a score of -g * min(q1,q2), where q2 is the quality value of the base in the other sequence right before the gap and g is Gap penalty factor. The score of a gap is the sum of scores of each base in the gap minus a gap open penalty. The similarity score of an overlapping alignment is the sum of scores of each match, each mismatch, and each gap. 900Overlap length cutoffAn overlap is taken into account only if the length of the overlap in bp is no less than the specified value (parameter -o of CAP3).40Overlap percent identity cutoffAn overlap is taken into account only if the percent identity of the overlap is no less than the specified value (parameter -p of CAP3).90Max number of word matchesThis parameter allows one to trade off the efficiency of the program for its accuracy (parameter -t of CAP3). For a read f, CAP3 computes overlaps between read f and other reads by considering short word matches between read f and other reads. A word match is examined to see if it can be extended into a long overlap. If read f has overlaps with many other reads, then read f has many short word matches with many other reads. This parameter gives an upper limit, for any word, on the number of word matches between read f and other reads that are considered by CAP3. Using a large value for this parameter allows CAP3 to consider more word matches between read f and other reads, which can find more overlaps for read f, but slows down the program. Using a small value for this parameter has the opposite effect.
300Band expansion sizeCAP3 determines a minimum band of diagonals for an overlapping alignment between two
sequence
reads. The band is expanded by a number of bases specified by this value (parameter -a of CAP3).20Max gap length in an overlapThe maximum length of gaps allowed in any overlap (-f). I.e. overlaps with longer gaps are rejected. Note that a small value for this parameter may cause the program to remove true overlaps and to produce incorrect results. The parameter may be used to split reads from alternative splicing forms into separate contigs.20Assembly reverse readsSpecifies whether to consider reads in reverse orientation for assembly (originally, parameter -r of CAP3).TrueCAP3 tool pathThe path to the CAP3 external tool in UGENE.defaultTemporary directoryThe directory for temporary files.default

Parameters in Workflow File

Type: cap3
ParameterParameter in the GUIType
out-fileOutput file string
clipping-cutoffQuality cutoff for clippingnumeric
clipping-rangeClipping rangenumeric
diff-cutoffQuality cutoff for differenecesnumeric
diff-max-qscoreMaximum difference scorenumeric
match-score-factorMatch score factornumeric
mismatch-score-factorMismatch score factornumeric
gap-penalty-factorGap penalty factornumeric
overlap-sim-score-cutoffOverlap similarity score cutoffnumeric
overlap-length-cutoffOverlap length cutoffnumeric
overlap-perc-id-cutoffOverlap percent identity cutoffnumeric
max-num-word-matchesMax number of word matchesnumeric
band-exp-sizeBand expansion sizenumeric
max-gap-in-overlapMax gap length in an overlapnumeric
assembly-reverseAssembly reverse readsboolean
pathCAP3 tool pathstring
tmp-dir
Temporary directorystring

Input/Output Ports

The element has 1 input port:Name in GUI: Input sequencesName in Workflow File: in-dataSlots:Slot In GUISlot in Workflow FileTypeDataset namedatasetstringInput URL(s)in.url
string