This looks like a Soaplab service.
About Soaplab
Soaplab services are command line applications, wrapped as SOAP services, and served from a Soaplab server. All Soaplab services have the same generic set of SOAP operations (depending on the Soaplab version) as they all share a standardised interface.
Certain tools, like the Taverna workflow workbench, provide automatic support for the Soaplab way of executing these services. In some cases you will need to use the Soaplab Server Base URL rather than the WSDL location in these tools.
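To illustrate the difference between the two addresses, here is a minimal sketch that converts between a Soaplab Server Base URL and a WSDL location. It assumes, as the URLs listed for this service suggest, that the WSDL location is simply the base URL with `?wsdl` appended. The helper names `wsdl_location` and `service_base_url` are hypothetical and not part of any Soaplab client.

```python
WSDL_SUFFIX = "?wsdl"  # assumption: WSDL location = base URL + "?wsdl"


def wsdl_location(base_url: str) -> str:
    """Return the WSDL location for a service base URL."""
    if base_url.endswith(WSDL_SUFFIX):
        return base_url
    return base_url + WSDL_SUFFIX


def service_base_url(wsdl_url: str) -> str:
    """Return the service base URL for a WSDL location."""
    if wsdl_url.endswith(WSDL_SUFFIX):
        return wsdl_url[: -len(WSDL_SUFFIX)]
    return wsdl_url


if __name__ == "__main__":
    base = "http://www.ebi.ac.uk/soaplab/services/assembly_fragment_assembly.emiraest"
    print(wsdl_location(base))
    print(service_base_url(wsdl_location(base)))
```

A tool that asks for the Base URL can be given the output of `service_base_url`; a generic SOAP client would normally take the WSDL location instead.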
Further documentation on Soaplab services is available:
- Soaplab overview
- Soaplab client guide page
- EBI Soaplab server documentation
- Soaplab 2
- Sourceforge site for EMBOSS Soaplab services
Provider:
European Bioinformatics Institute (EBI)
Location:
UNITED KINGDOM
Submitter / Source:
Gilles (29 days ago)
Base URL:
http://www.ebi.ac.uk/soaplab/services/assembly_fragment_assembly.emiraest
WSDL Location:
http://www.ebi.ac.uk/soaplab/services/assembly_fragment_assembly.emiraest?wsdl
Documentation URL(s):
None
Description(s):
No description(s) yet
Details (from Soaplab server, 8 days ago):
- ds_lsr_analysis :
- analysis :
- name : emiraest
- output :
- type : Assembly Fragment Assembly
- version : 6.3.0
- installation : Soaplab2 default installation
- description : MIRAest fragment assembly program
- analysis_extension :
- parameter :
- standard :
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- standard :
- list :
- name : Set parameters suited to the input type
- list_item :
- shown_as : Unspecified
- level : 0
- value : unspecified
- shown_as : Fasta
- level : 0
- value : fasta
- shown_as : PHD
- level : 0
- value : phd
- shown_as : CAF
- level : 0
- value : caf
- type : full
- repeatable :
- list :
- base :
- data :
- ioformat : url
- iotype : input
- repeatable :
- base :
- data :
- ioformat : url
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- data :
- ioformat : unspecified
- iotype : input
- repeatable :
- base :
- standard :
- list :
- name : Quality grades of de-novo assembly
- list_item :
- shown_as : Draft
- level : 0
- value : draft
- shown_as : Normal
- level : 0
- value : normal
- shown_as : Accurate
- level : 0
- value : accurate
- type : full
- repeatable :
- list :
- base :
- standard :
- list :
- name : Quality grades for mapping
- list_item :
- shown_as : Draft
- level : 0
- value : draft
- shown_as : Normal
- level : 0
- value : normal
- shown_as : Accurate
- level : 0
- value : accurate
- type : full
- repeatable :
- list :
- base :
- standard :
- list :
- name : Clipping grade modifiers
- list_item :
- shown_as : Light
- level : 0
- value : light
- shown_as : Medium
- level : 0
- value : medium
- shown_as : Heavy
- level : 0
- value : heavy
- type : full
- repeatable :
- list :
- base :
- base :
- ordering : 21
- name : highlyrepetitive
- help : A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : highlyrepetitive
- default : false
- mandatory : false
- type : boolean
- prompt : Highly repetitive DNA
- base :
- ordering : 22
- name : highqualitydata
- help : A modifier switch for when the sequences that are used are of exceptional quality. mira will then bump up a few quality parameters which should lead to fewer false positives in the repeat and SNP detection routines.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : highqualitydata
- default : false
- mandatory : false
- type : boolean
- prompt : High quality data
- base :
- ordering : 23
- name : estmode
- help : Switches mira to a good initial preset for assembling EST data. Note that this is not needed (and even counterproductive) when used with miraEST.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : estmode
- default : false
- mandatory : false
- type : boolean
- prompt : Preset EST assembly mode
- base :
- ordering : 24
- name : horrid
- help : Sets a number of parameters useful when dealing with really horrid data sets. Useful means that parameters are chosen so that time and memory consumption do not explode beyond all hope of the program returning. Note that MIRA will in most cases return useful assemblies with this switch, but these might not be as optimised as with normal operation. The definition of ‘horrid’ is a bit flexible, for example, (a) a genomic project with more than 2,000 reads that all seem to align partly to each other but have different repetitive structures or (b) EST clusters with a few thousand almost similar reads.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : horrid
- default : false
- mandatory : false
- type : boolean
- prompt : Preset horrid data set mode
- base :
- ordering : 25
- name : borg
- help : Sets several parameters to have mira try to assemble as many reads as possible. Will probably slow down the assembly process and use more memory. ‘We are MIRA of borg. You will be assembled, resistance is futile!’
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : borg
- default : false
- mandatory : false
- type : boolean
- prompt : Force assembly
- standard :
- list :
- name : Load Job Type
- list_item :
- shown_as : EXP files from a file of filenames
- level : 0
- value : fofnexp
- shown_as : Load and assemble FASTA
- level : 0
- value : fasta
- shown_as : Load and assemble CAF
- level : 0
- value : caf
- shown_as : Load and assemble PHD
- level : 0
- value : phd
- shown_as : PHD files from a file of filenames
- level : 0
- value : fofnphd
- type : full
- repeatable :
- list :
- base :
- base :
- ordering : 27
- name : fo
- help : If set to ‘Y’, the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : fo
- default : false
- mandatory : false
- type : boolean
- prompt : Filecheck only
- base :
- ordering : 28
- name : mxti
- help : Some of the file formats above (FASTA, PHD or even CAF and EXP) may not contain all the information necessary or useful for each read of an assembly. Should additional information, such as clipping positions, be available in an XML trace info file in NCBI format (see File formats), set this option to ‘Y’ and it will be merged with the loaded data. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : mxti
- default : false
- mandatory : false
- type : boolean
- prompt : Merge XML trace info
- standard :
- list :
- name : Read Naming Scheme
- list_item :
- shown_as : Sanger
- level : 0
- value : sanger
- shown_as : TIGR
- level : 0
- value : tigr
- type : full
- repeatable :
- list :
- base :
- standard :
- list :
- name : External quality
- list_item :
- shown_as : None
- level : 0
- value : none
- shown_as : SCF
- level : 0
- value : SCF
- type : full
- repeatable :
- list :
- base :
- base :
- ordering : 31
- name : eqo
- help : Only takes effect when ‘lj’ is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : eqo
- default : false
- mandatory : false
- type : boolean
- prompt : External quality override
- base :
- ordering : 32
- name : droeqe
- help : Defines whether a read is excluded from the assembly when there is a major mismatch between the external quality source and the sequence (e.g. the base sequence read from a SCF file does not match the originally read base sequence). If the read is not excluded, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source).
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : droeqe
- default : false
- mandatory : false
- type : boolean
- prompt : Discard read on eq error
- base :
- ordering : 33
- name : uti
- help : Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitely help for contigs containing lots of repeats. Set this to ‘Y’ if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using dismin and dismax.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : uti
- default : false
- mandatory : false
- type : boolean
- prompt : Use template information
- range :
- format : %d
- max : 4
- min : 1
- repeatable :
- base :
- ordering : 34
- name : ess
- help : Controls the starting step of the EST assembly and is therefore only useful in miraEST. EST assembly is a three step process, each with different settings to the assembly engine, with the result of each step being saved to disk. If results of previous steps are present in a directory, one can easily ‘play around’ with different settings for subsequent steps by reusing the results of the previous steps and directly starting with step two or three.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 4
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : ess
- default : 1
- mandatory : false
- type : long
- prompt : Integer start step
- base :
- ordering : 35
- name : ps
- help : Controls whether date and time are printed out during the assembly. Suppressing it isn’t useful in normal operation, only when debugging or benchmarking.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : ps
- default : false
- mandatory : false
- type : boolean
- prompt : Print date
- base :
- ordering : 36
- name : lsd
- help : Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate between types of SNPs appearing in organisms and to classify them.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : lsd
- default : false
- mandatory : false
- type : boolean
- prompt : Load straindata
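The straindata layout described for lsd (one read per line, read name first, strain name second) can be sketched as a small parser. This is only an illustration under stated assumptions: the two fields are taken to be separated by whitespace and blank lines are skipped, neither of which is confirmed by the listing. The function name `parse_straindata` is hypothetical.

```python
from typing import Dict


def parse_straindata(text: str) -> Dict[str, str]:
    """Map each read name to its strain name.

    Assumptions (not confirmed by the listing): fields are separated by
    whitespace, and blank lines are ignored.
    """
    strains: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # read name, then the rest as strain name
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected '<read> <strain>'")
        read_name, strain_name = parts
        strains[read_name] = strain_name.strip()
    return strains


if __name__ == "__main__":
    example = "read001 strainA\nread002\tstrainB\n"
    print(parse_straindata(example))
```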
- base :
- ordering : 37
- name : lb
- help : A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read ‘very closely related’ as in ‘only SNP mutations or short indels present’.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : lb
- default : false
- mandatory : false
- type : boolean
- prompt : Load backbone
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 38
- name : sbuip
- help : When assembling against backbones, this parameter defines the pass iteration (see nop) from which on the backbones will be really used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb – if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set sbuip to 1 lower than nop (example – nop 4 and sbuip 3).
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : sbuip
- default : 3
- mandatory : false
- type : long
- prompt : Start backbone usage in pass
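The rule of thumb given for sbuip can be written down as a small helper. This is a sketch of the stated rule only, not part of the service. The function name `suggested_sbuip` is hypothetical, and clamping the result to 1 (the listed minimum) when nop is 1 is an assumption.

```python
def suggested_sbuip(nop: int, same_strain: bool) -> int:
    """Suggest a value for sbuip following the rule of thumb in the help text.

    same_strain: True if the backbones belong to the same strain as the reads.
    """
    if nop < 1:
        raise ValueError("nop must be at least 1")
    if same_strain:
        return 1
    # Different strain: one lower than nop, but never below the minimum of 1
    # (the clamp is an assumption for the nop == 1 case).
    return max(1, nop - 1)


if __name__ == "__main__":
    print(suggested_sbuip(nop=4, same_strain=False))
    print(suggested_sbuip(nop=4, same_strain=True))
```

With nop 4 and backbones from a different strain this gives 3, matching the example in the help text.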
- standard :
- repeatable :
- base :
- standard :
- list :
- name : Backbone File type
- list_item :
- shown_as : Fasta
- level : 0
- value : fasta
- shown_as : CAF
- level : 0
- value : caf
- shown_as : GenBank
- level : 0
- value : gbf
- type : full
- repeatable :
- list :
- base :
- range :
- format : %d
- max : 3000
- min : 1000
- repeatable :
- base :
- ordering : 41
- name : brl
- help : Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 3000
- type : style
- name : scalemin
- value : 1000
- type : style
- qualifier : brl
- default : 2500
- mandatory : false
- type : long
- prompt : Backbone rail length
- range :
- format : %d
- max : 100
- min : -1
- repeatable :
- base :
- ordering : 42
- name : bbq
- help : Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : -1
- type : style
- qualifier : bbq
- default : -1
- mandatory : false
- type : long
- prompt : Backbone base quality
- base :
- ordering : 43
- name : abnc
- help : The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to ‘N’, the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : abnc
- default : false
- mandatory : false
- type : boolean
- prompt : Also build new contigs
- range :
- format : %d
- min : 20
- repeatable :
- base :
- ordering : 44
- name : mrl
- help : Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won’t be present in the final project.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 20
- type : style
- qualifier : mrl
- default : 40
- mandatory : false
- type : long
- prompt : Minimum read length
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 45
- name : nop
- help : Defines how many iterations of the whole assembly process are done. Rule of thumb – for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : nop
- default : 3
- mandatory : false
- type : long
- prompt : Number of passes
- base :
- ordering : 46
- name : sep
- help : Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to ‘N’, skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to ‘Y’ is highly recommended, setting it to ‘N’ is only for quick and dirty assemblies.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : sep
- default : false
- mandatory : false
- type : boolean
- prompt : Skim each pass
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 47
- name : rbl
- help : Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : rbl
- default : 2
- mandatory : false
- type : long
- prompt : RMB break loops
- base :
- ordering : 48
- name : sd
- help : Default is ‘Y’ for mira and ‘N’ for miraEST. A spoiler can be either a chimeric read or a read with long parts of unclipped vector sequence still included (that was too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs being joined; MIRA will cut them back so that they pose no further harm to the assembly. Recommended for assemblies of mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : sd
- default : false
- mandatory : false
- type : boolean
- prompt : Spoiler detection
- base :
- ordering : 49
- name : sdlpo
- help : Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : sdlpo
- default : false
- mandatory : false
- type : boolean
- prompt : Spoiler detection last pass only
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 50
- name : bdq
- help : Defines the default base quality of reads that have no quality read from a file.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : bdq
- default : 10
- mandatory : false
- type : long
- prompt : Base default quality
- base :
- ordering : 51
- name : ugpf
- help : MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps; a genomic and an EST pathfinder. The genomic looks a bit into the future of the assembly and tries to stay on safe grounds using a maximum of information already present in the contig that is being built. The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall down to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as ‘Y’ for genome projects and set to ‘N’ for EST projects.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : ugpf
- default : false
- mandatory : false
- type : boolean
- prompt : Use genomic pathfinder
- base :
- ordering : 52
- name : uess
- help : Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : uess
- default : false
- mandatory : false
- type : boolean
- prompt : Use emergency search stop
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 53
- name : esspd
- help : Defines the number of potential partners a read must have for MIRA to switch into emergency search stop mode for that read.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : esspd
- default : 500
- mandatory : false
- type : long
- prompt : Emergency search stop partner depth
- base :
- ordering : 54
- name : umcbt
- help : Defines whether there is an upper limit of time to be used to build one contig. Set this to ‘Y’ in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : umcbt
- default : false
- mandatory : false
- type : boolean
- prompt : Use max contig build time
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 55
- name : bts
- help : Depending on -umcbt above, this number defines the time in seconds allotted to building one contig.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : bts
- default : 10000
- mandatory : false
- type : long
- prompt : Build time in seconds
- base :
- ordering : 56
- name : ure
- help : Defines whether there is an upper limit of time to be used to build one contig. Set this to ‘Y’ in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : ure
- default : false
- mandatory : false
- type : boolean
- prompt : Use read extension
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 57
- name : rewl
- help : Only takes effect when -ure is set to ‘Y’. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : rewl
- default : 30
- mandatory : false
- type : long
- prompt : Read extension window length
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 58
- name : rewme
- help : Only takes effect when -ure is set to ‘Y’. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between two alignments in the given window.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : rewme
- default : 2
- mandatory : false
- type : long
- prompt : Read extension with max errors
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 59
- name : feip
- help : Only takes effect when -ure is set to ‘Y’. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : feip
- default : 0
- mandatory : false
- type : long
- prompt : First extension in pass
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 60
- name : leip
- help : Only takes effect when -ure is set to ‘Y’. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : leip
- default : 0
- mandatory : false
- type : long
- prompt : Last extension in pass
- base :
- ordering : 61
- name : tpae
- help : This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : tpae
- default : false
- mandatory : false
- type : boolean
- prompt : Tag poly-AT at ends
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 62
- name : pbwl
- help : Only takes effect when -tpae is set to ‘Y’. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : pbwl
- default : 7
- mandatory : false
- type : long
- prompt : Polybase window length
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 63
- name : pbwme
- help : Only takes effect when -tpae is set to ‘Y’. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : pbwme
- default : 2
- mandatory : false
- type : long
- prompt : Polybase window max errors
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 64
- name : pbwgd
- help : Only takes effect when -tpae is set to ‘Y’. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : pbwgd
- default : 9
- mandatory : false
- type : long
- prompt : Polybase window grace distance
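The window test described for pbwl and pbwme (every base in the window must be A or T, apart from a permitted number of errors) can be illustrated as follows. This is a minimal sketch of the rule as worded in the help text, not MIRA's implementation. It ignores the grace distance (pbwgd) and any masking, and the function names are hypothetical.

```python
from typing import List


def is_polybase_window(window: str, max_errors: int = 2) -> bool:
    """True if at most max_errors bases in the window are neither A nor T."""
    errors = sum(1 for base in window.upper() if base not in ("A", "T"))
    return errors <= max_errors


def polybase_window_starts(sequence: str, window_length: int = 7,
                           max_errors: int = 2) -> List[int]:
    """Start positions (0-based) of all windows that pass the test."""
    last_start = len(sequence) - window_length
    return [
        start
        for start in range(last_start + 1)
        if is_polybase_window(sequence[start:start + window_length], max_errors)
    ]


if __name__ == "__main__":
    # Defaults taken from the listing: pbwl = 7, pbwme = 2.
    print(polybase_window_starts("GGGGGAAAAAAA"))
```

The position of the errors inside the window plays no role, which matches the statement that their distribution is not important.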
- base :
- ordering : 65
- name : pvc
- help : Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : pvc
- default : false
- mandatory : false
- type : boolean
- prompt : Possible vector clip
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 66
- name : pvcmla
- help : The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : pvcmla
- default : 18
- mandatory : false
- type : long
- prompt : Possible vector clip max length allowed
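The threshold behaviour described for pvcmla can be summarised in a few lines. This is a sketch of the decision rule as stated in the help text, not code from mira itself, and the function name `should_clip_vector_relict` is hypothetical.

```python
def should_clip_vector_relict(relict_length: int,
                              max_length_allowed: int = 18) -> bool:
    """Decide whether a potential vector relict is clipped.

    relict_length: number of bases the potential clip would remove.
    max_length_allowed: the pvcmla value; 0 turns the threshold off.
    """
    if relict_length < 0 or max_length_allowed < 0:
        raise ValueError("lengths must not be negative")
    if max_length_allowed == 0:
        # Threshold disabled: a detected relict is always clipped.
        return True
    return relict_length <= max_length_allowed


if __name__ == "__main__":
    for length in (5, 18, 19):
        print(length, should_clip_vector_relict(length))
```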
- base :
- ordering : 67
- name : qc
- help : Default is ‘N’, but is automatically set to ‘Y’ when using the setparam options ‘fasta’ or ‘phd’ (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : qc
- default : false
- mandatory : false
- type : boolean
- prompt : Quality clip
- range :
- format : %d
- max : 35
- min : 15
- repeatable :
- base :
- ordering : 68
- name : qcmq
- help : This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don’t use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and higher than 35 are disallowed.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 35
- type : style
- name : scalemin
- value : 15
- type : style
- qualifier : qcmq
- default : 20
- mandatory : false
- type : long
- prompt : Quality clip minimum quality
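Several integer parameters in this listing carry a minimum and, in some cases, a maximum. The sketch below checks a value against a few of those ranges before a job is submitted. The range values are copied from the listing above. The table name `RANGES` and the function `check_parameter` are hypothetical, and whether the server itself rejects out-of-range values is not stated here.

```python
from typing import Dict, Optional, Tuple

# (min, max) per qualifier, copied from the listing; None means no bound listed.
RANGES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "ess": (1, 4),
    "brl": (1000, 3000),
    "bbq": (-1, 100),
    "mrl": (20, None),
    "nop": (1, None),
    "qcmq": (15, 35),
}


def check_parameter(name: str, value: int) -> int:
    """Return value unchanged if it lies in the listed range, else raise."""
    low, high = RANGES[name]
    if low is not None and value < low:
        raise ValueError(f"{name}={value} is below the minimum of {low}")
    if high is not None and value > high:
        raise ValueError(f"{name}={value} is above the maximum of {high}")
    return value


if __name__ == "__main__":
    print(check_parameter("qcmq", 20))  # listed default, within 15..35
```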
- range :
- format : %d
- min : 10
- repeatable :
- base :
- ordering : 69
- name : qcwl
- help : This is the length of a window in bases for the quality clip.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 10
- type : style
- qualifier : qcwl
- default : 30
- mandatory : false
- type : long
- prompt : Quality clip window length
- base :
- ordering : 70
- name : mbc
- help : This will let mira perform a ‘clipping’ of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this. But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arose for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be ‘clipped’.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : mbc
- default : false
- mandatory : false
- type : boolean
- prompt : Masked bases clip
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 71
- name : mbcgs
- help : While performing the clip of masked bases, mira will look if it can merge larger chunks of masked bases that are a maximum of -mbcgs apart.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : mbcgs
- default : 20
- mandatory : false
- type : long
- prompt : Masked bases clip gap size
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 72
- name : mbcmfg
- help : While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : mbcmfg
- default : 40
- mandatory : false
- type : long
- prompt : Masked bases clip max front gap
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 73
- name : mbcmeg
- help : While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : mbcmeg
- default : 60
- mandatory : false
- type : long
- prompt : Masked bases clip max end gap
- base :
- ordering : 74
- name : emlc
- help : If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : emlc
- default : false
- mandatory : false
- type : boolean
- prompt : Ensure minimum left clip
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 75
- name : mlcr
- help : If -emlc is ‘Y’, checks whether there is a left clip whose length is at least the size specified here.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : mlcr
- default : 25
- mandatory : false
- type : long
- prompt : Minimum left clip required
- range :
- format : %d
- min : 0
- repeatable :
- base :
- ordering : 76
- name : smlc
- help : If -emlc is ‘Y’ and the actual left clip is < -mlcr, then set the left clip of read to the value given here.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- qualifier : smlc
- default : 30
- mandatory : false
- type : long
- prompt : Set minimum left clip
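The interplay of -emlc, -mlcr and -smlc described above can be sketched as follows. This is only an illustration of the documented rule, not MIRA's own code; the function name `ensure_min_left_clip` and its arguments are hypothetical, and the defaults are the ones listed on this page.

```python
def ensure_min_left_clip(left_clip, emlc=False, mlcr=25, smlc=30):
    """Return the left clip of a read after applying the -emlc rule (sketch)."""
    # With -emlc off, the existing left clip is left untouched.
    if not emlc:
        return left_clip
    # If the existing clip is shorter than -mlcr, it is set to -smlc.
    if left_clip < mlcr:
        return smlc
    return left_clip


print(ensure_min_left_clip(10, emlc=True))
print(ensure_min_left_clip(40, emlc=True))
```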
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 77
- name : bph
- help : Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : bph
- default : 14
- mandatory : false
- type : long
- prompt : Bases per hash
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 78
- name : hss
- help : This is a parameter controlling the stepping increments with which hashes are generated. This allows for a more fine-grained search as matches are now found with at least n+s (see -bph) equal bases instead of the SSAHA 2n. The higher the value the faster the search. The lower the value the more weak matches are found.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : hss
- default : 4
- mandatory : false
- type : long
- prompt : Hash saving step
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 79
- name : pr
- help : Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches).
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : pr
- default : 50
- mandatory : false
- type : long
- prompt : Percent required
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 80
- name : mhpr
- help : Controls the maximum number of possible hits that one read can pass on to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects that contain extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly progresses through its passes (-nop), different combinations of possible hits will be checked, always starting with the most promising ones. So the accuracy of the assembly should only suffer when this number is lowered too much.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : mhpr
- default : 200
- mandatory : false
- type : long
- prompt : Max hits per read
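The figures quoted in the -mhpr help text can be reproduced as rough upper bounds. The sketch below assumes that every read may hit every other read, and that each hit is checked in both forward and reverse-complement orientation. The function name `max_candidate_alignments` is hypothetical and the counting model is an assumption, not MIRA's actual bookkeeping.

```python
def max_candidate_alignments(n_reads, mhpr=None):
    """Upper bound on alignments to check, with or without a per-read cap."""
    # Without a cap, each read can hit every other read.
    hits_per_read = n_reads - 1
    if mhpr is not None:
        hits_per_read = min(hits_per_read, mhpr)
    # Factor 2: forward and reverse-complement orientation.
    return n_reads * hits_per_read * 2


print(max_candidate_alignments(5000))
print(max_candidate_alignments(5000, mhpr=200))
```

Under these assumptions the uncapped bound is just under 50 million and the capped bound is 2 million, in line with the ranges given in the help text.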
- range :
- format : %d
- max : 100
- min : 1
- repeatable :
- base :
- ordering : 81
- name : bip
- help : The banded Smith-Waterman alignment uses this percentage number to compute the bandwidth it has to use when computing the alignment matrix. E.g. expected overlap is 150 bases, bip=10 -> the banded SW will compute a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts / deletes in the alignment. Increasing this number will find more non-optimal alignments but will also increase SW runtime (somewhere between linearly and quadratically); decreasing it works the other way round (it might miss a few bad alignments but gains speed).
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : bip
- default : 15
- mandatory : false
- type : long
- prompt : Bandwidth in percent
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 82
- name : bmin
- help : Minimum bandwidth in bases to each side.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : bmin
- default : 25
- mandatory : false
- type : long
- prompt : Bandwidth minimum
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 83
- name : bmax
- help : Maximum bandwidth in bases to each side.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : bmax
- default : 50
- mandatory : false
- type : long
- prompt : Bandwidth maximum
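A minimal sketch of how -bip, -bmin and -bmax might combine: the band is taken as a percentage of the expected overlap and then clamped to the minimum and maximum bandwidth. The clamping step and the integer rounding are assumptions based on the parameter descriptions; the function name `band_width` is hypothetical. Defaults are the ones listed on this page.

```python
def band_width(expected_overlap, bip=15, bmin=25, bmax=50):
    """Bases to each side of the expected alignment diagonal (sketch)."""
    # Percentage of the expected overlap, rounded down (assumed rounding).
    band = expected_overlap * bip // 100
    # Clamp to the configured minimum and maximum bandwidth (assumed behaviour).
    return max(bmin, min(band, bmax))


# The help text example: overlap 150, bip=10 gives a band of 15
# (bmin lowered here so the clamp does not interfere).
print(band_width(150, bip=10, bmin=1))
```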
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 84
- name : mo
- help : Minimum number of overlapping bases needed in an alignment of two sequences to be accepted.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : mo
- default : 15
- mandatory : false
- type : long
- prompt : Minimum overlap
- range :
- format : %d
- min : 1
- repeatable :
- base :
- ordering : 85
- name : ms
- help : Describes the minimum score of an overlap to be taken into account for assembly. mira uses a default scoring scheme for SW align. Each match counts 1, a match with an N counts 0, each mismatch with a non-N base -1 and each gap -2. Use a bigger score to weed out a number of chance matches, a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory).
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 1
- type : style
- qualifier : ms
- default : 15
- mandatory : false
- type : long
- prompt : Minimum score
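The default scoring scheme described for -ms can be illustrated with a small function that scores two already-aligned sequences. The representation (equal-length strings with `-` as the gap character) and the name `alignment_score` are assumptions made for this sketch, as is charging -2 per gap position.

```python
def alignment_score(a, b, gap="-"):
    """Score two aligned sequences: match +1, N 0, mismatch -1, gap -2."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            score -= 2      # gap position
        elif x == "N" or y == "N":
            score += 0      # match against N is neutral
        elif x == y:
            score += 1      # exact match
        else:
            score -= 1      # mismatch between two non-N bases
    return score


print(alignment_score("ACGTN", "ACCTA"))
```

An overlap would then be kept only if this score reaches the -ms threshold (default 15 in this listing).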
- range :
- format : %d
- max : 100
- min : 1
- repeatable :
- base :
- ordering : 86
- name : mrs
- help : Describes the min percentage of matching between two reads to be considered for assembly. Increasing this number will save memory but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing below 55 will probably make memory and time consumption explode.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : mrs
- default : 65
- mandatory : false
- type : long
- prompt : Minimum relative score
- base :
- ordering : 87
- name : egp
- help : Defines whether or not to increase penalties applied to alignments containing long gaps. Setting this to ‘Y’ might help in projects with frequent repeats. On the other hand, it is definitely disturbing when assembling very long reads containing multiple long indels in the called base sequence … although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to ‘Y’ for EST projects and de-novo genome assembly, set it to ‘N’ for assembly of closely related strains (assembly against a backbone). When set to ‘N’, it is recommended to have -amgb and -amgbemc both set to ‘Y’.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : egp
- default : false
- mandatory : false
- type : boolean
- prompt : Extra gap penalty
- standard :
- list :
- name : Extra gap penalty level
- list_item :
- shown_as : Low
- level : 0
- value : low
- shown_as : Medium
- level : 0
- value : medium
- shown_as : High
- level : 0
- value : high
- shown_as : EST split splices
- level : 0
- value : est
- type : full
- repeatable :
- list :
- base :
- range :
- format : %d
- max : 100
- min : 1
- repeatable :
- base :
- ordering : 89
- name : megpp
- help : Has no effect if extra_gap_penalty is off. Defines the maximum extra penalty in percent applied to ‘long’ gaps.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : megpp
- default : 100
- mandatory : false
- type : long
- prompt : Maximum extra gap penalty percent
- standard :
- repeatable :
- base :
- standard :
- list :
- name : Contig analysis
- list_item :
- shown_as : None
- level : 0
- value : none
- shown_as : Text
- level : 0
- value : text
- shown_as : Signal
- level : 0
- value : signal
- type : full
- repeatable :
- list :
- base :
- range :
- format : %d
- max : 100
- min : 1
- repeatable :
- base :
- ordering : 92
- name : rodirs
- help : When adding reads to a contig, reject the reads if the drop in the quality of the consensus is > the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : rodirs
- default : 15
- mandatory : false
- type : long
- prompt : Reject on drop in relscore
- range :
- format : %d
- max : 100
- min : 1
- repeatable :
- base :
- ordering : 93
- name : dmer
- help : When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous.
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemax
- value : 100
- type : style
- name : scalemin
- value : 1
- type : style
- qualifier : dmer
- default : 1
- mandatory : false
- type : long
- prompt : Danger max error rate
- base :
- ordering : 94
- name : mr
- help : One of the most important switches in MIRA. If set to ‘Y’, MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : mr
- default : false
- mandatory : false
- type : boolean
- prompt : Mark repeats
- base :
- ordering : 95
- name : asir
- help : Only takes effect when -mr is set to ‘Y’; the effect also depends on whether strain data (see -lsd) is present or not. Usually, mira will mark bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains, they are marked as SNP. However, if this switch is set to ‘Y’, then conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly.
- option :
- name : EDAM:
- value : Generic boolean
- type : normal
- qualifier : asir
- default : false
- mandatory : false
- type : boolean
- prompt : Assume SNP instead of repeat
- range :
- format : %d
- min : 2
- repeatable :
- base :
- ordering : 96
- name : mrpg
- help : Only takes effect when -mr is set to ‘Y’. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP).
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 2
- type : style
- qualifier : mrpg
- default : 2
- mandatory : false
- type : long
- prompt : Minimum reads per group
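The grouping rule behind -mrpg can be sketched for a single alignment column. This is a deliberate simplification and not MIRA's detection routine: it assumes each read contributes one (base, quality) pair, applies the -mgqrt threshold per read rather than per group, and flags the column when at least two different bases each have at least mrpg supporting reads. The function name `is_marker_candidate` is hypothetical.

```python
from collections import Counter


def is_marker_candidate(column, mgqrt, mrpg=2):
    """True if two or more bases are each backed by >= mrpg good reads."""
    # Count reads per base, ignoring reads below the quality threshold.
    counts = Counter(base.upper() for base, qual in column if qual >= mgqrt)
    # A group is a base carried by at least mrpg reads.
    groups = [base for base, n in counts.items() if n >= mrpg]
    return len(groups) >= 2


column = [("A", 40), ("A", 35), ("G", 40), ("G", 30)]
print(is_marker_candidate(column, mgqrt=30))
```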
- range :
- format : %d
- min : 25
- repeatable :
- base :
- ordering : 97
- name : mgqrt
- help : Only takes effect when -mr is set to ‘Y’. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you’d end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false po
- option :
- name : EDAM:
- value : Generic integer
- type : normal
- name : scalemin
- value : 25
- type : style
- standard :
- parameter :
- analysis :
License(s): None Cost: No info yet Usage Conditions: No info yet Contact Info: None Publications: None Citations: None