Child pages
  • Assembly Sequences with CAP3


| Parameter | Description | Default value |
| --- | --- | --- |
| Output file | Write assembly results to this output file in ACE format. | result.ace |
| Quality cutoff for clipping | Base quality cutoff for clipping (-c). | 12 |
| Clipping range | | 100 |
| Quality cutoff for differences | Base quality cutoff for differences (-b). | 20 |
| Maximum difference score | Max qscore sum at differences (-d). If an overlap contains many differences at bases of high quality, the overlap is removed. The difference score is calculated as follows. If the overlap contains a difference at bases of quality values q1 and q2, then the score at the difference is max(0, min(q1, q2) - b), where b is Quality cutoff for differences. The difference score of an overlap is the sum of the scores at each difference. | 200 |
| Match score factor | Match score factor (-m) is one of the parameters that affect the similarity score of an overlap. See the Overlap similarity score cutoff description for details. | 2 |
| Mismatch score factor | Mismatch score factor (-n) is one of the parameters that affect the similarity score of an overlap. See the Overlap similarity score cutoff description for details. | -5 |
| Gap penalty factor | Gap penalty factor (-g) is one of the parameters that affect the similarity score of an overlap. See the Overlap similarity score cutoff description for details. | 6 |
| Overlap similarity score cutoff | If the similarity score of an overlap is less than the overlap similarity score cutoff (-s), then the overlap is removed. The similarity score of an overlapping alignment is defined using base quality values as follows. A match at bases of quality values q1 and q2 is given a score of m * min(q1, q2), where m is Match score factor. A mismatch at bases of quality values q1 and q2 is given a score of n * min(q1, q2), where n is Mismatch score factor. A base of quality value q1 in a gap is given a score of -g * min(q1, q2), where q2 is the quality value of the base in the other sequence right before the gap and g is Gap penalty factor. The score of a gap is the sum of the scores of each base in the gap minus a gap open penalty. The similarity score of an overlapping alignment is the sum of the scores of each match, each mismatch, and each gap. | 900 |
| Overlap length cutoff | An overlap is taken into account only if its length in bp is no less than the specified value (parameter -o of CAP3). | 40 |
| Overlap percent identity cutoff | An overlap is taken into account only if its percent identity is no less than the specified value (parameter -p of CAP3). | 90 |
| Max number of word matches | This parameter trades the efficiency of the program against its accuracy (parameter -t of CAP3). For a read f, CAP3 computes overlaps between read f and other reads by considering short word matches between read f and other reads. A word match is examined to see if it can be extended into a long overlap. If read f has overlaps with many other reads, then read f has many short word matches with many other reads. This parameter gives an upper limit, for any word, on the number of word matches between read f and other reads that are considered by CAP3. A large value allows CAP3 to consider more word matches between read f and other reads, which can find more overlaps for read f but slows down the program. A small value has the opposite effect. | 300 |
| Band expansion size | CAP3 determines a minimum band of diagonals for an overlapping alignment between two sequence reads. The band is expanded by the number of bases specified by this value (parameter -a of CAP3). | 20 |
| Max gap length in an overlap | The maximum length of gaps allowed in any overlap (-f). Overlaps with longer gaps are rejected. Note that a small value for this parameter may cause the program to remove true overlaps and to produce incorrect results. The parameter may be used to split reads from alternative splicing forms into separate contigs. | 20 |
| Assembly reverse reads | Specifies whether to consider reads in reverse orientation for assembly (originally, parameter -r of CAP3). | True |
| CAP3 tool path | The path to the CAP3 external tool in UGENE. | default |
| Temporary directory | The directory for temporary files. | default |
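
The two scoring rules described in the table (the difference score and the similarity score) can be illustrated with a small sketch. This is not CAP3's or UGENE's code; it only restates the formulas above in Python. The function names, the input layout (lists of quality-value pairs), and the `gap_open` argument are assumptions made for the example. The page mentions a gap open penalty but does not give its value, so it is left as a caller-supplied number defaulting to 0. Treating an overlap as removed when its difference score exceeds the maximum is also an assumption about how the -d threshold is applied.

```python
def difference_score(diff_quals, b=20):
    """Sum of max(0, min(q1, q2) - b) over every difference in an overlap.

    diff_quals: list of (q1, q2) quality pairs, one per differing position.
    b: Quality cutoff for differences (-b).
    """
    return sum(max(0, min(q1, q2) - b) for q1, q2 in diff_quals)


def similarity_score(matches, mismatches, gaps, m=2, n=-5, g=6, gap_open=0):
    """Similarity score of an overlap as described in the parameter table.

    matches, mismatches: lists of (q1, q2) quality pairs.
    gaps: list of (gap_quals, q2), where gap_quals holds the quality q1 of
          each base inside the gap and q2 is the quality of the base in the
          other sequence right before the gap.
    gap_open: hypothetical gap open penalty (value not given on this page).
    """
    score = sum(m * min(q1, q2) for q1, q2 in matches)
    score += sum(n * min(q1, q2) for q1, q2 in mismatches)
    for gap_quals, q2 in gaps:
        score += sum(-g * min(q1, q2) for q1 in gap_quals) - gap_open
    return score


def keep_overlap(sim, diff, s=900, d=200):
    """Assumed filter: drop when sim < s (-s) or when diff exceeds d (-d)."""
    return sim >= s and diff <= d


# Toy overlap: 50 matches, 1 mismatch, one gap of two bases.
matches = [(30, 40)] * 50
mismatches = [(25, 35)]
gaps = [([20, 20], 30)]
sim = similarity_score(matches, mismatches, gaps)
diff = difference_score([(25, 35), (40, 22)])
print(sim, diff, keep_overlap(sim, diff))
```

With the default factors, the toy overlap scores 50 * 2 * 30 = 3000 for matches, -5 * 25 = -125 for the mismatch, and -6 * 20 * 2 = -240 for the gap, so it passes the default similarity cutoff of 900.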

Parameters in Workflow File

Type: cap3

| Parameter | Parameter in the GUI | Type |
| --- | --- | --- |
| out-file | Output file | string |
| clipping-cutoff | Quality cutoff for clipping | numeric |
| clipping-range | Clipping range | numeric |
| diff-cutoff | Quality cutoff for differences | numeric |
| diff-max-qscore | Maximum difference score | numeric |
| match-score-factor | Match score factor | numeric |
| mismatch-score-factor | Mismatch score factor | numeric |
| gap-penalty-factor | Gap penalty factor | numeric |
| overlap-sim-score-cutoff | Overlap similarity score cutoff | numeric |
| overlap-length-cutoff | Overlap length cutoff | numeric |
| overlap-perc-id-cutoff | Overlap percent identity cutoff | numeric |
| max-num-word-matches | Max number of word matches | numeric |
| band-exp-size | Band expansion size | numeric |
| max-gap-in-overlap | Max gap length in an overlap | numeric |
| assembly-reverse | Assembly reverse reads | boolean |
| path | CAP3 tool path | string |
| tmp-dir | Temporary directory | string |
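
To show how the workflow-file parameter names relate to the CAP3 options quoted in the parameter descriptions, here is a minimal sketch that turns a set of workflow parameters into a CAP3 argument list. It is an illustration only and does not reproduce how UGENE launches the tool. The helper `build_cap3_args` and the mapping table are hypothetical. The `-y` flag for the clipping range comes from CAP3's own usage text rather than from this page, and passing the boolean as `-r 1` or `-r 0` is likewise an assumption about the CAP3 command line.

```python
# Workflow-file parameter name -> (CAP3 flag, default value from this page).
CAP3_FLAGS = {
    "clipping-cutoff": ("-c", 12),
    "clipping-range": ("-y", 100),  # assumption: flag not stated on this page
    "diff-cutoff": ("-b", 20),
    "diff-max-qscore": ("-d", 200),
    "match-score-factor": ("-m", 2),
    "mismatch-score-factor": ("-n", -5),
    "gap-penalty-factor": ("-g", 6),
    "overlap-sim-score-cutoff": ("-s", 900),
    "overlap-length-cutoff": ("-o", 40),
    "overlap-perc-id-cutoff": ("-p", 90),
    "max-num-word-matches": ("-t", 300),
    "band-exp-size": ("-a", 20),
    "max-gap-in-overlap": ("-f", 20),
    "assembly-reverse": ("-r", True),  # assumption: sent as 1 or 0
}


def build_cap3_args(reads_path, overrides=None, cap3="cap3"):
    """Return an argument list such as ['cap3', 'reads.fasta', '-c', '12', ...].

    overrides: dict keyed by workflow-file parameter name (hypothetical API).
    """
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(CAP3_FLAGS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")

    args = [cap3, reads_path]
    for name, (flag, default) in CAP3_FLAGS.items():
        value = overrides.get(name, default)
        if isinstance(value, bool):
            value = int(value)  # True -> 1, False -> 0
        args += [flag, str(value)]
    return args


args = build_cap3_args(
    "reads.fasta",
    {"overlap-perc-id-cutoff": 95, "assembly-reverse": False},
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run` if a CAP3 binary is available on the system. The output file, tool path, and temporary directory parameters are left out because this page does not describe how they map to the command line.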

Input/Output Ports

The element has 1 input port:

Name in GUI: Input sequences

Name in Workflow File: in-data

Slots:

| Slot in GUI | Slot in Workflow File | Type |
| --- | --- | --- |
| Dataset name | dataset | string |
| Input URL(s) | in.url | string |