November 21st, 2009, 04:21 AM
Senior User
Join Date: Jun 2007
Posts: 2,964
Sequence Analysis Using MUSCLE
I thought I'd put together some simple instructions for anyone who wants to learn how to align sequences and understand what they see.
There are other ways to do this, but I think this is the easiest one to begin with. If you choose, you can download the sequences and the MUSCLE program to your hard drive and work offline. Genbank also has this program available, and you may find it easier to use theirs or one from another site.
I work with 2 Internet Explorer windows open; Opera is not compatible, and I'm not sure about Firefox.
-----------------------------------------
As an example, we will compare 2 sequences carrying the H274Y mutation, which confers Tamiflu resistance, with one that doesn't carry it. This will help you find the mutation even if you don't know which position to look for.
We need to look at the NA segment to find this particular mutation.
1. Start with the program found here:
http://www.ebi.ac.uk/Tools/muscle/
2. This page contains the latest H1N1 sequences at Genbank:
http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html At the bottom of the page in the left-hand corner, there is a small box; click on it to gain access to all the pandemic flu sequences.
Edited Jan. 1: Since the Genbank list is getting so long, I made a file containing these three sequences for convenience. http://www.ncbi.nlm.nih.gov/sites/my...7Sk7evg15jW57/
3. On Nov 20, we see Genbank noted that Pavia/21 contains that mutation; click on GU216651* and that sequence segment will open.
4. Almost at the top of the page, we see "FASTA"; that is the format we want to use. Click it and the segment will reload.
5. We need to cut and paste that information starting with that little ">gi" and continuing all the way to the end of the rows of letters.
6. Now we go to the MUSCLE window and paste that into the box where it says to enter a sequence. Make sure the FASTA box is checked.
7. Go back to the Genbank page and click on A/Omsk/02/2009, which is a couple of sequences down the page from Pavia/21. Omsk does not have the mutation.
8. Do the same with Omsk, making sure you have the NA segment. Add it on the next line in the MUSCLE box, just as we did with the first one. If your browser window is not maximized, the entries will look strange: one long line, then a single letter, then another long line, another letter, and so on. Do not change this.
9. Back to Genbank: we need another sequence with the mutation. Scroll down to Oct 13, where there is A/Quebec/147365/2009; open FN434454* and repeat what we did with the other two.
10. After we've entered our 3rd sequence in the MUSCLE box, click on "RUN" (and wait a while if you're on dialup). A window will come up as our job processes and when it's done, we will see "Start Jalview", click it.
11. A window will open with our segments lined up in different colors and a black bar with a scroll bar at the bottom. On the left of the colored area, mouse over each of the info lines and a small window will pop up with the segment info. Hopefully, the 2 with the mutation (Pavia, Quebec) will be the top two.
12. Slowly move the scroll bar to the right and the single nucleotide mutations will appear. See the white rectangle in the black bar? When we put the mouse over the "C", the position will appear at the bottom left; it should be 218 C. Now, click on the C and a red bar will appear at the top, serving as a visual aid.
13. We know 3 nucleotides make 1 amino acid; so the amino acid change H274Y should be near the 822 position (274 x 3), so we continue to scroll until we reach that point. At position 831, we see that white rectangle in the black bar and a "C" in a white box in the colored bar. Note that the two above letters are TT.
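If you save MUSCLE's result as an aligned FASTA file instead of scrolling through Jalview, a few lines of Python can list the columns where the sequences disagree. This is a minimal sketch, assuming the alignment was saved as aligned FASTA with gaps written as '-'; the function names and toy sequences are made up for illustration. The column numbers it reports count from the first column of the alignment, including any untranslated bases at the start, so they are alignment positions, not codon-based positions.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def variable_columns(aligned):
    """Return (1-based column, bases in that column) wherever the aligned sequences differ."""
    seqs = [seq for _, seq in aligned]
    length = min(len(s) for s in seqs)
    out = []
    for i in range(length):
        column = tuple(s[i] for s in seqs)
        if len(set(column)) > 1:
            out.append((i + 1, "".join(column)))
    return out


# Hypothetical toy alignment: seqB differs from the others at column 4.
toy = """>seqA
ATGCACTT
>seqB
ATGTACTT
>seqC
ATGCACTT
"""
print(variable_columns(read_fasta(toy)))   # [(4, 'CTC')]
```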
------------------------------------------------
Now, this is where my knowledge pretty much ends; I'm not sure if #13 is 100% correct, or why we see the mutation at position 831 instead of 822.
I hope someone with more knowledge will chime in and further explain how to interpret what we see.
__________________
The salvage of human life ought to be placed above barter and exchange ~ Louis Harris, 1918

November 21st, 2009, 04:54 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
I just tried MUSCLE, but it ran out of memory and seems a little slow.
I had been using kalign.exe and MAFFT online.
For simple flu alignments (no insertions or deletions, and no mixing of different HAs, NAs or NSs) I have a simple self-written program which is very fast and needs little memory. For example, I can align 7000 PB2s in 2 min.

November 21st, 2009, 04:58 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
The nucleotide sequences start with ~50 nucleotides which are not translated into amino acids. Often only part of these 50 is given, or none at all.
The first occurrence of "ATG" is usually the first translated amino acid (Methionine, Met, M).
Also, Niman's H274Y is H275Y in N1 numbering
and D225G is D239G in H1
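The rule of thumb above (the first ATG is usually the start codon, and its A becomes position 1) can be sketched as below. These are hypothetical helpers, assuming an ungapped sequence; they are not how any particular program numbers positions, and segment 5 turns out to be an exception later in the thread. With the first ATG at, say, position 9, alignment position 831 becomes coding position 823.

```python
def first_atg(seq):
    """Return the 1-based position of the first 'ATG', or None if there is none.

    Usually, but not always, this is the start codon.
    """
    i = seq.upper().replace("U", "T").find("ATG")
    return i + 1 if i >= 0 else None


def to_coding_position(pos, atg_pos):
    """Convert a 1-based position in the full sequence to a position counted
    from the A of the start codon (which becomes position 1).

    Assumes no gaps or insertions between the start codon and `pos`.
    """
    return pos - atg_pos + 1


example = "AGCAAAAG" + "ATGAATCCAAAC"   # hypothetical: 8 untranslated bases, then ATG
start = first_atg(example)
print(start)                            # 9
print(to_coding_position(831, start))   # 823
```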

November 21st, 2009, 06:13 AM
Senior User
Join Date: Jun 2007
Posts: 2,964
Re: Sequence Analysis Using MUSCLE
OK! The example I'm showing has the first "atg" for all three starting at position 9.
"We see the mutation at position 831 instead of 822;" so if I subtract 8, I'm at position 823... so I'm still 1 off? Niman's 275 makes it worse.
When I've seen the nucleotides and amino acids aligned for comparison, the M is under the T, so when I count, is the M position considered #1 or #2?
BTW, thank you for all your hours and patience.

November 21st, 2009, 06:18 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
C823T(6,n)=H275Y(NA) CAC-->TAC
This counts from the start of the coding region (the first amino acid, i.e. the A of ATG is nucleotide 1), which I think is unusual.
for ******:
S224P in PA is T670C(3)
M582L in PA is A1741C(3)
S91P in HA is T298C(4)
S206T in HA is T658A(4)
V323I in HA is G1012A(4)
V100I in NP is G298A(5)
T373I in NP is C1118T(5)
V106I in NA is G316A(6)
N247D in NA is A742G(6)
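Counting from the A of ATG as nucleotide 1, codon n covers nucleotides 3n-2 to 3n. That explains the off-by-one in step 13: 274 x 3 = 822 is the last base of codon 274, so a change at 823 hits the first base of codon 275 (H275Y in N1 numbering). A minimal sketch, assuming the position is already counted from the start codon with no insertions or deletions; the function names are hypothetical:

```python
def codon_of(nt_pos):
    """Map a 1-based coding nucleotide position to (codon number, position within codon 1-3)."""
    if nt_pos < 1:
        raise ValueError("positions are counted from 1 (the A of ATG)")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def codon_span(codon):
    """Return the three 1-based nucleotide positions that make up codon number `codon`."""
    first = 3 * codon - 2
    return first, first + 1, first + 2


print(codon_of(822))    # (274, 3): 274 x 3 is the LAST base of codon 274
print(codon_of(823))    # (275, 1): C823T -> H275Y (N1 numbering)
print(codon_span(275))  # (823, 824, 825)
print(codon_of(1118))   # (373, 2): C1118T in NP -> T373I, second codon position
```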
Any of the 3 positions in a codon can mutate; see the list below.
Mutations at position 3 of a codon (3 consecutive nucleotides) are usually synonymous (they don't change the encoded amino acid).
Alanine,Ala,A,4,GCT,GCC,GCA,GCG
Arginine,Arg,R,6,CGT,CGC,CGA,CGG,AGA,AGG
Asparagine,Asn,N,2,AAT,AAC
AsparticAcid,Asp,D,2,GAT,GAC
Cysteine,Cys,C,2,TGT,TGC
GlutamicAcid,Glu,E,2,GAA,GAG
Glutamine,Gln,Q,2,CAA,CAG
Glycine,Gly,G,4,GGT,GGC,GGA,GGG
Histidine,His,H,2,CAT,CAC
Isoleucine,Ile,I,3,ATT,ATC,ATA
Leucine,Leu,L,6,TTA,TTG,CTT,CTC,CTA,CTG
Lysine,Lys,K,2,AAA,AAG
Methionine,Met,M,1,ATG
Phenylalanine,Phe,F,2,TTT,TTC
Proline,Pro,P,4,CCT,CCC,CCA,CCG
Serine,Ser,S,6,TCT,TCC,TCA,TCG,AGT,AGC
Threonine,Thr,T,4,ACT,ACC,ACA,ACG
Tryptophan,Trp,W,1,TGG
Tyrosine,Tyr,Y,2,TAT,TAC
Valine,Val,V,4,GTT,GTC,GTA,GTG
STOP,Sto,},3,TAG,TGA,TAA
hydrophobic:GAVLIMFWP
hydrophilic:STCYNQ,DE,KRH
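The codon list above is the standard genetic code. A short sketch that builds it and checks whether a codon change is synonymous; it uses the common '*' for stop instead of the '}' used above, and the helper names are illustrative only:

```python
from itertools import product

# Standard genetic code: codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG
# (bases ordered T, C, A, G), with '*' for stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def classify(old_codon, new_codon):
    """Describe a codon change as 'X->Y (synonymous)' or 'X->Y (non-synonymous)'."""
    old_aa = CODON_TABLE[old_codon.upper()]
    new_aa = CODON_TABLE[new_codon.upper()]
    kind = "synonymous" if old_aa == new_aa else "non-synonymous"
    return f"{old_aa}->{new_aa} ({kind})"


print(classify("CAC", "TAC"))  # H->Y (non-synonymous), change at the first position
print(classify("CAC", "CAT"))  # H->H (synonymous), change at the third position

# Count codons per amino acid, e.g. Leucine has 6.
print(sum(1 for aa in CODON_TABLE.values() if aa == "L"))  # 6
```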

November 27th, 2009, 04:23 PM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Sequence Analysis Using MUSCLE
Thanks, great instructions!

November 29th, 2009, 09:12 PM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Sequence Analysis Using MUSCLE
Converting bare sequences to FASTA format
1. Get bare sequence
2. Paste it into the Readseq - biosequence conversion tool: http://www.ebi.ac.uk/cgi-bin/readseq.cgi
3. Select Pearson FASTA.
4. Select view in browser (or download to file) and click Submit.
5. Copy paste contents into MUSCLE http://www.ebi.ac.uk/Tools/muscle/
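If Readseq is unavailable, the same conversion is easy to do locally. A minimal sketch, assuming the bare sequence may contain spaces, line breaks and position numbers copied from a web page (all non-letters are dropped, so this is not suitable for aligned text containing '-' gaps); the function name and the 60-letter line width are arbitrary choices:

```python
def to_fasta(name, bare, width=60):
    """Return a FASTA record: a '>' header line, then the sequence in lines of `width` letters."""
    # Keep letters only: drops digits, spaces and line breaks copied from a web page.
    seq = "".join(ch for ch in bare if ch.isalpha()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


print(to_fasta("my_sequence", "1 atgaatccaa accaaaagat\n21 aataacc"), end="")
```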

November 29th, 2009, 11:28 PM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
We both used the same data, why is the numbering different?

November 30th, 2009, 12:39 AM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
Quote:
Originally Posted by Sally
Comparing both sequences against the vaccine. A/California/07/2009(H1N1) accession number FJ969540
|
Why would there be an R (2 of them) in A/California/07/2009(H1N1) at about positions 715 and 718?

November 30th, 2009, 02:51 AM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Stop codons: the Good, the Bad, and the Ugly
Comes with video
http://www.mcb.arizona.edu/courses/m.../XLateTut.html
Stop codons: the Good, the Bad, and the Ugly
Stop codons are a normal part of protein synthesis--they're the reason that all proteins don't go on 'forever'. Given a translation machinery that simply puts one foot in front of the other endlessly, a mechanism must exist for derailing the machine when its work is done. This machinery is the three Stop (or 'nonsense') codons and the proteins that read them. They're encoded by every gene, and are already there when the mRNA is produced--the whole process of translation is the interpretation of a ticker tape by an elegant machine (the ribosome) charged with 'translating' a nucleotide language into an amino acid language.
It is not known, at least by me, why there are 3 stop codons and why they are UAA, UAG and UGA (indeed, in some systems, such as some mitochondria, UGA actually specifies Trp instead of stop). But given that there are 64 possible codons and 3 mean 'stop', ON AVERAGE, with all other things being equal (which they never are...) 1 of 20 randomly selected codons says STOP. Similarly, if you're reading in an unanticipated/incorrect reading frame, you're in essence reading random codons, so will ON AVERAGE get about 20 amino acids before being stopped out. That's not very far!
The existence of stop codons needs to permeate your thinking about what is and is not 'fixable'. Sure, a -1 frameshift has the ability to compensate for a +1 frameshift--IF there is no intervening stop codon! Recall the translation tutorial (or review it if you can't recall it...). In the second movie shown, reading in the +1 frame (the result of a single nucleotide insertion) 'uncovered' a stop codon that derailed translation. In the third movie, our hero, in the form of a -1 frameshift (== nucleotide removal) fixed things 'just in time' such that reading frame was restored before the evil stop codon brought the party crashing down. Any mutations FURTHER DOWN (rightward, = the 3' direction) would have availed us naught.
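The "ON AVERAGE ... about 20 amino acids" figure follows from 3 stop codons out of 64: in random sequence a stop turns up roughly once every 64/3, about 21 codons. A small sketch (a hypothetical helper that accepts DNA or RNA letters) reporting where the first stop appears in each of the three forward reading frames:

```python
STOPS = {"TAA", "TAG", "TGA"}


def first_stop(seq, frame):
    """Return the 1-based codon number of the first stop codon in reading frame
    `frame` (0, 1 or 2, i.e. starting at base 1, 2 or 3), or None if there is none."""
    seq = seq.upper().replace("U", "T")   # accept mRNA (U) as well as DNA (T)
    for n, i in enumerate(range(frame, len(seq) - 2, 3), start=1):
        if seq[i:i + 3] in STOPS:
            return n
    return None


print(64 / 3)   # expected spacing of stops in random sequence, about 21.3 codons
seq = "AUGGCAUAAGG"
for frame in range(3):
    print(frame, first_stop(seq, frame))   # frame 0 stops at codon 3; frames 1 and 2 have none here
```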
Some simple questions to direct your thinking in fruitful ways about the influence of stop codons for good and ill:
--How can you pick a region such that you can be reasonably confident that a stop codon occurs in a given reading frame?
--if you don't wish to worry your pretty little head about the nasty possibility of stop codons, what locations will you choose to examine for your compensating mutations vis-a-vis the location of the mutation they're meant to fix?
--in general, what rules determine where a compensating mutation can occur relative to the mutation being 'fixed' or compensated for (this can be a little tricky, given most of our innate biases about who is the 'problem' and who the 'solution'--recall any frameshift is a drag unless corrected in a timely fashion, and that any solution is a good solution so long as we're still reading and reading in frame when we hit the 'business end' of the rIIb gene!)
http://www.mcb.arizona.edu/courses/m...der/Stops.html

November 30th, 2009, 04:35 AM
Senior Moderator
Join Date: Feb 2006
Location: UK
Posts: 1,608
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
Quote:
Originally Posted by Sally
We both used the same data, why is the numbering different?
|
The wonders of the numbering systems are something I have yet to master. I did use some other sequences in my alignment and a program called CLC Sequence Viewer for my nucleotide alignment. I exported a Clustal .aln alignment file, which I then loaded into Bioedit (because I am more familiar with it).
However, if I adjust my aligned sequences so that D225G really is at position 225, then the two synonymous (no-change) changes are N2N (A NADTL) & D475D (HKC DNTC), or in nucleotide terms 6 & 1425.
EDIT:
Oops, this must be confusing the hell out of everyone, as it is in the wrong thread and relates to Sally's and my numbering differences on the Lviv sequences.

November 30th, 2009, 04:46 AM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
Quote:
Originally Posted by JJackson
The wonders of the numbering systems is something I have yet to master. I did use some other sequences in my alignment and a program called CLC sequence viewer for my nucleotide alignment. I exported a Clustal .aln alignment file which I then loaded into Bioedit (because I am more familiar with it).
However if I adjust my aligned sequences so D225G really is at position 225 then the two non-change-changes are N2N (ANADTL) & D475D (HKCDNTC) or in nucleotide terms 6 & 1425.
|
How did you get these so easily: N2N (A NADTL) & D475D (HKC DNTC)? Do you have a conversion program?

November 30th, 2009, 04:52 AM
Senior Moderator
Join Date: Feb 2006
Location: UK
Posts: 1,608
Re: Sequence Analysis Using MUSCLE
I use a program called Bioedit. You just hold down the Ctrl key and press G to toggle back and forth between the protein and nucleotide sequences.
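In essence, that toggle translates the nucleotide sequence codon by codon. A rough sketch of the idea (not Bioedit's own implementation), assuming reading starts at a given 1-based position and that a codon made of three gap characters '---' should stay a gap so aligned sequences still line up; the names are illustrative:

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, ... GGG (bases T, C, A, G); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(nt, start=1):
    """Translate `nt` codon by codon from 1-based position `start`.

    '---' (an alignment gap) becomes '-'; any other codon not in the table
    (ambiguity codes such as R or N, partial gaps) becomes 'X'.
    Trailing bases that don't fill a whole codon are ignored.
    """
    nt = nt.upper().replace("U", "T")
    protein = []
    for i in range(start - 1, len(nt) - 2, 3):
        codon = nt[i:i + 3]
        protein.append("-" if codon == "---" else GENETIC_CODE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGAATCCAAACC"))   # MNPN (NA coding start as quoted in this thread)
print(translate("ATG---CAC"))       # M-H
```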

November 30th, 2009, 05:02 AM
Editor-in-Chief & President
Join Date: Feb 2006
Posts: 23,694
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
Quote:
Originally Posted by JJackson
The wonders of the numbering systems is something I have yet to master. I did use some other sequences in my alignment and a program called CLC sequence viewer for my nucleotide alignment. I exported a Clustal .aln alignment file which I then loaded into Bioedit (because I am more familiar with it).
However if I adjust my aligned sequences so D225G really is at position 225 then the two non-change-changes are N2N (ANADTL) & D475D (HKCDNTC) or in nucleotide terms 6 & 1425.
EDIT:
oops this must be confusing the hell out of everyone as this is in the wrong thread and relates to Sally and my numbering differences on the Lviv sequences
|
This is good on this thread because this is the learning to read sequences thread.
__________________
"May the long time sun
Shine upon you,
All love surround you,
And the pure light within you
Guide your way on."
"Where your talents and the needs of the world cross, lies your calling."
Aristotle
“In a gentle way, you can shake the world.”
Mohandas Gandhi
Be the light that is within.

November 30th, 2009, 05:10 AM
Managing Editor - Vice President
Join Date: Feb 2006
Posts: 10,382
Re: Sequence Analysis Using MUSCLE
Thanks JJackson.
How do we know what the start point of the sequence is? Is it where the START codon (AUG) is?
Is the difference between H1 and H3 numbering dependent on whether the start codon is included in the numbering?

November 30th, 2009, 05:31 AM
Senior Moderator
Join Date: Feb 2006
Location: UK
Posts: 1,608
Re: Sequence Analysis Using MUSCLE
Quote:
Originally Posted by Sally
Thanks JJackson.
How do we know what the start point of the sequence is? Is it where the START codon (AUG) is?
Is the difference between H1 and H3 numbering dependent on whether the start codon is included in the numbering?
|
I don't really know; I have always assumed so, as it seems the logical place to start. Nearly all my sequence aligning was done years ago on H5N1, where I just got to know the sequences around the areas that were of interest to me (mainly binding and cleavage sites). I have only begun to look at sequences again briefly over the last week or so, and H1 is new to me, as are the sites and online tools.
I have just been looking for a downloadable Bioedit, but it no longer seems to be available. However, it is so basic that it may run straight from the executable without needing to be installed, so if anyone is interested I can see if it works like that and upload a copy if it does. It is a lot more basic than most of the current tools. The free version of CLC only seems to work with multiple aligned nucleotide sequences: when I try to convert them to proteins, it splits the alignment into individual sequences which you can no longer see side by side.
Bioedit screen captures before and after Ctrl+G

November 30th, 2009, 09:55 AM
Senior User
Join Date: Jun 2007
Posts: 2,964
Re: Sequence Analysis Using MUSCLE
Quote:
Originally Posted by gsgs
the nucleotide-sequences start with ~50 nucleotides
which are not decoded to amino-acids.
Often only parts of these 50 are given or none
The first occurrence of "ATG" is usually the first decoded amino-acid
(Methionine,Met,M)
also niman-H274Y is H275Y in N1
and D225G is D239G in H1
|
Notice what Gs says here about the starting place: the first occurrence of "ATG" = "M".
So on the MUSCLE alignment example, you will notice their starting positions may vary but consensus starts with the first "atg". See my remarks about counting:
Quote:
OK! The example I'm showing has the first "atg" for all three starting at position 9.
"We see the mutation at position 831 instead of 822;" so if I subtract 8, I'm at position 823... so I'm still 1 off?
|
I'm not sure how the counting will work out if we just look at the sequence itself instead of using an alignment program.

November 30th, 2009, 12:26 PM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
In segment 5 it's the 2nd ATG.
What I use:
Code:
>A/Index/******/2009/02/01(H1N1)
XXXXXXXXXXXXXXXXXXXXTAGCAAAAAAGCAGGTCAAATATATTCAAT:ATGGAGAGAATA
XXAGTTTGTAAAGGGACGTCCAGTAAGCAAAAGCAGGTCAAACCATTTGA:ATGGATGT
XXXXXXXXXXXXXXXXXXXXXXXTTAGCAAAAAGCAGGTACTGATCCAAA:ATGGAAGACTTT
XXXXXXAGCAATAACAAGAGCAAAAGCAGGGGAAAACAAAAGCAACAAAA:ATGAAG
XTTAAGCAAAAGCAGGGTAGATAATCACCTCAATGAGTGACATCGAAGCC:ATGGCGT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGCAAAAGCAGGAGTTTAAA:ATGAATCCAAACC
XXXXXXXXXXXXXXXXXXXXCAGGGAGCAAAAGCAGGTAGATATTTAAAG:ATGAGTCTTCT
XXXXXXXXXXXXXXXXXXXXXXXXAGCAAAAGCAGGGTGACAAAAACATA:ATGGACTCCAA
The start is after the ":".
There are sometimes differences in the very first or last nucleotides, which I usually ignore as presumed sequencing errors.
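This format (X padding, the untranslated start, ':', then the coding sequence) makes it easy to check where the first-ATG rule breaks down, which is the segment 5 case. A sketch using two of the lines above; the function is hypothetical and simply strips the leading X padding before counting:

```python
def check_start(line):
    """Compare the annotated start (after ':') with the first ATG, both 1-based,
    counted from the first real base (leading 'X' padding removed)."""
    utr, coding = line.split(":", 1)
    utr = utr.lstrip("X")                 # X = padding, not sequence
    full = utr + coding
    annotated = len(utr) + 1              # first base after the ':'
    naive = full.find("ATG") + 1          # 0 would mean "no ATG found"
    return annotated, naive


SEG5 = "XTTAAGCAAAAGCAGGGTAGATAATCACCTCAATGAGTGACATCGAAGCC:ATGGCGT"
SEG6 = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGCAAAAGCAGGAGTTTAAA:ATGAATCCAAACC"
print(check_start(SEG6))   # (21, 21): the first ATG is the annotated start
print(check_start(SEG5))   # (50, 32): an earlier ATG sits 18 bases upstream
```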

November 30th, 2009, 02:54 PM
Senior User
Join Date: Jun 2007
Posts: 2,964
Re: Sequence Analysis Using MUSCLE
Thanks for posting that; it's exactly what we needed.
Just to clarify,
I think Gs's example is of all the segments of one virus. We won't be comparing in that same manner; we compare like segments to like: segment 4 with segment 4, etc...
But this is how the sequences may look when aligned in MUSCLE. Notice how they all align at the ":" but some have fewer letters on the left hand side. MUSCLE begins searching for the mutations at the (":") consensus point.
Why do we start at the 2nd "atg" in segment 5?

January 1st, 2010, 02:42 AM
Senior User
Join Date: Jun 2007
Posts: 2,964
Re: Mutations in A/H1N1 Not Confirmed to Affect Effectiveness of Current Vaccine
Quote:
Originally Posted by Sally Furniss
Why would there be an R (2 of them) on the A/California/07/2009(H1N1) at about positions 715 and 718 ?
|
I found 3 California/07 sequences at Genbank. 2 have no changes and 1 has mixed signals at amino acid positions 225 (715-717 nucleotide) and 226 (718-720 nucleotide). "R" indicates "A"s and "G"s were found in that position.
Here's a list from gs of what the letters in the nucleotide sequence mean:
These are the nucleotides:
"A" = "Adenine"
"C" = "Cytosine"
"G" = "Guanine"
"T" = "Thymine"
These indicate mixed signals (more than one base found at that position):
"Y" = "Pyrimidine (C & T)"
"R" = "Purine (A & G)"
"W" = "Weak (A & T)"
"S" = "Strong (G & C)"
"K" = "Keto (T & G)"
"M" = "Amino (A & C)"
"D" = "Not C"
"V" = "Not T"
"H" = "Not G"
"B" = "Not A"
"X" = "Unknown"
"N" = "Unknown"

September 17th, 2012, 02:20 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
-------------------------------------
better alignment program:
Make a database of a few hundred index (consensus) nucleotide sequences.
Compute all their length-12 subsequences and store the sequence numbers and positions in a table of all 4^12 possible 12-subsequences.
Whenever you get a new sequence, look up all its 12-subsequences in the database, find the index sequence with the most matches, and use its alignment.
If no index sequence matches well enough, add the new sequence to the index database.
Easier to use, easier to program, and faster and better than the existing alignment programs that I'm aware of.
It _should_ exist already.
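A minimal sketch of the 12-subsequence idea, under stated assumptions: k-mers are packed into integers at 2 bits per base (so a full table would have 4^12 = 16,777,216 slots; a dict is used here instead), and "use its alignment" is interpreted as voting for the most common offset between index and query positions, which gives an ungapped placement. This is one reading of the description above, not gsgs's typesz1.c; all names are hypothetical.

```python
from collections import Counter, defaultdict

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers(seq, k):
    """Yield (0-based position, integer code) for every k-mer made only of A/C/G/T.
    Each base takes 2 bits, so codes lie in 0 .. 4**k - 1."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(b in CODE for b in word):
            code = 0
            for b in word:
                code = code * 4 + CODE[b]
            yield i, code


def build_index(refs, k=12):
    """Map each k-mer code to the (index sequence number, position) pairs where it occurs."""
    index = defaultdict(list)
    for r, seq in enumerate(refs):
        for pos, code in kmers(seq, k):
            index[code].append((r, pos))
    return index


def best_match(index, query, k=12):
    """Vote for (index sequence, offset) with offset = index position - query position.
    Returns (index sequence number, offset, votes), or None if nothing matches."""
    votes = Counter()
    for qpos, code in kmers(query, k):
        for r, rpos in index.get(code, ()):
            votes[(r, rpos - qpos)] += 1
    if not votes:
        return None
    (r, offset), n = votes.most_common(1)[0]
    return r, offset, n


# Toy example with k=4 (real flu segments would use k=12).
refs = ["ACGTACGGTCAG", "TTTTGGGGCCCC"]
idx = build_index(refs, k=4)
print(best_match(idx, "ACGGTCA", k=4))   # (0, 4, 4): index sequence 0, query starts at its position 4
```

To follow the "add it if nothing matches well enough" step, a caller could compare the returned vote count against a threshold and, if it is too low, append the query to `refs` and rebuild the index; the threshold would have to be chosen empirically.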
-----------------------------------------------------------
With the author's help I have now installed MAFFT as a Windows XP executable.
It works well and is quite fast. But I can't run it from a batch file, since it doesn't return to the command line (command.com?) when finished. Presumably just a small programming error.
-------------------------------------------------------------------
For big databases (>~20MB) of similar sequences in my format (one header line, then one line with the nucleotides or amino acids, with ASCII 13+10 (CR+LF) as end of line) I use my own program align.c, which doesn't produce gaps; it just finds the best match by shifting.
This is quite fast, e.g. 1 minute for 7000 avian PB2s (16MB). But some (~1%) of the sequences are badly aligned because of insertions or deletions (which are usually sequencing errors). Those sequences can be filtered out and aligned separately with MAFFT or MUSCLE, which is then much faster because of the reduced size.
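The "best match by shifting" idea can be sketched as trying every offset and counting identical bases, with no gaps allowed. This is a hypothetical brute-force illustration (quadratic in sequence length), not gsgs's align.c; a low match count relative to the query length is one way to flag the ~1% of sequences to send to MAFFT or MUSCLE instead.

```python
def best_shift(ref, query, max_shift=None):
    """Find the shift s that maximises matches when query[i] is placed under ref[i + s].
    No gaps are allowed. Returns (shift, number of matching bases)."""
    ref, query = ref.upper(), query.upper()
    if max_shift is None:
        max_shift = max(len(ref), len(query))
    best_matches, best_s = -1, 0
    for s in range(-max_shift, max_shift + 1):
        matches = sum(
            1 for i, b in enumerate(query)
            if 0 <= i + s < len(ref) and ref[i + s] == b
        )
        if matches > best_matches:   # keep the first (leftmost) best shift on ties
            best_matches, best_s = matches, s
    return best_s, best_matches


ref = "AGCAAAAGCAGGAGTTTAAAATGAATCC"   # NA start as quoted in this thread
query = "GCAGGAGTTTAAAATG"
print(best_shift(ref, query))        # (7, 16): all 16 query bases match at shift 7
```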
---------------------------------------------------------
But as I wrote above, IMO everyone should be using the 12-subsequence method described above.
For longer sequences from other species we could use 14-subsequences.
---------------------------------------------------------------------------
I have a program, typesz1.c, that finds the best match from a list of index sequences using that subsequence-database method, but it doesn't align (yet).
-------------------------------------------------------------
Or types1.c: it finds the flugenome.org types of the unaligned sequences in a file with that method.
For gb191 (318MB, all 204496 Genbank flu A sequences) it takes only 40 sec to assign the types, out of a list of 189 types. The error rate is low.
It should be possible to align the sequences with this method in almost the same time.
-------------------------------------------------------------
this is flu-specific alignment only. But larger databases could be built to include more species.
Long (DNA) sequences could be split
--------------------------------------------------------
searching ...
http://www.google.de/#hl=de&sclient=...iw=971&bih=512
http://en.wikipedia.org/wiki/Smith%E...rman_algorithm
...
http://mafft.cbrc.jp/alignment/software/source66.html
CONTRAfold algorithm, instead of the McCaskill algorithm
MAFFT is a multiple alignment method that includes two algorithmic techniques: ...
MAFFT employs a progressive method (FFT-NS-2) and an iterative refinement ...
MAFFT: a novel method for rapid multiple sequence alignment based
on fast Fourier transform. Kazutaka Katoh
All pairwise alignments are computed with the Needleman-Wunsch algorithm.
More accurate but slower than --6merpair.
----------------------------------------------------------
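The MAFFT notes above mention the Needleman-Wunsch algorithm for pairwise alignment. For readers who want to see what that means, here is a textbook version with a simple linear gap penalty and arbitrarily chosen match/mismatch scores; MAFFT's real scoring (substitution matrices, affine gap costs) is more elaborate, so this is only an illustration:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))   # (2, 'ACGT', 'A-GT')
```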

December 26th, 2012, 01:20 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
http://en.wikipedia.org/wiki/BLAT_(bioinformatics)
Windows executable:
http://hgwdev.cse.ucsc.edu/~kent/exe...Suite.33x5.zip
blat - Standalone BLAT v. 33x5 fast sequence search command line tool
usage:
blat database query [-ooc=11.ooc] output.psl
where:
database and query are each either a .fa , .nib or .2bit file,
or a list these files one file name per line.
-ooc=11.ooc tells the program to load over-occurring 11-mers from
and external file. This will increase the speed
by a factor of 40 in many cases, but is not required
output.psl is where to put the output.
Subranges of nib and .2bit files may specified using the syntax:
/path/file.nib:seqid:start-end
or
/path/file.2bit:seqid:start-end
or
/path/file.nib:start-end
With the second form, a sequence id of file:start-end will be used.
options:
-t=type Database type. Type is one of:
dna - DNA sequence
prot - protein sequence
dnax - DNA sequence translated in six frames to protein
The default is dna
-q=type Query type. Type is one of:
dna - DNA sequence
rna - RNA sequence
prot - protein sequence
dnax - DNA sequence translated in six frames to protein
rnax - DNA sequence translated in three frames to protein
The default is dna
-prot Synonymous with -t=prot -q=prot
-ooc=N.ooc Use overused tile file N.ooc. N should correspond to
the tileSize
-tileSize=N sets the size of match that triggers an alignment.
Usually between 8 and 12
Default is 11 for DNA and 5 for protein.
-stepSize=N spacing between tiles. Default is tileSize.
-oneOff=N If set to 1 this allows one mismatch in tile and still
triggers an alignments. Default is 0.
-minMatch=N sets the number of tile matches. Usually set from 2 to 4
Default is 2 for nucleotide, 1 for protein.
-minScore=N sets minimum score. This is the matches minus the
mismatches minus some sort of gap penalty. Default is 30
-minIdentity=N Sets minimum sequence identity (in percent). Default is
90 for nucleotide searches, 25 for protein or translated
protein searches.
-maxGap=N sets the size of maximum gap between tiles in a clump. Usually
set from 0 to 3. Default is 2. Only relevent for minMatch > 1.
-noHead suppress .psl header (so it's just a tab-separated file)
-makeOoc=N.ooc Make overused tile file. Target needs to be complete genome.
-repMatch=N sets the number of repetitions of a tile allowed before
it is marked as overused. Typically this is 256 for tileSize
12, 1024 for tile size 11, 4096 for tile size 10.
Default is 1024. Typically only comes into play with makeOoc
-mask=type Mask out repeats. Alignments won't be started in masked region
but may extend through it in nucleotide searches. Masked areas
are ignored entirely in protein or translated searches. Types are
lower - mask out lower cased sequence
upper - mask out upper cased sequence
out - mask according to database.out RepeatMasker .out file
file.out - mask database according to RepeatMasker file.out
-qMask=type Mask out repeats in query sequence. Similar to -mask above but
for query rather than target sequence.
-repeats=type Type is same as mask types above. Repeat bases will not be
masked in any way, but matches in repeat areas will be reported
separately from matches in other areas in the psl output.
-minRepDivergence=NN - minimum percent divergence of repeats to allow
them to be unmasked. Default is 15. Only relevant for
masking using RepeatMasker .out files.
-dots=N Output dot every N sequences to show program's progress
-trimT Trim leading poly-T
-noTrimA Don't trim trailing poly-A
-trimHardA Remove poly-A tail from qSize as well as alignments in
psl output
-fastMap Run for fast DNA/DNA remapping - not allowing introns,
requiring high %ID
-out=type Controls output file format. Type is one of:
psl - Default. Tab separated format, no sequence
pslx - Tab separated format with sequence
axt - blastz-associated axt format
maf - multiz-associated maf format
sim4 - similar to sim4 format
wublast - similar to wublast format
blast - similar to NCBI blast format
blast8- NCBI blast tabular format
blast9 - NCBI blast tabular format with comments
-fine For high quality mRNAs look harder for small initial and
terminal exons. Not recommended for ESTs
-maxIntron=N Sets maximum intron size. Default is 750000
-extendThroughN - Allows extension of alignment through large blocks of N's
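With the default -out=psl, BLAT writes one tab-separated line per hit (-noHead drops the header). A small sketch for reading such a file and keeping the hit with the most matching bases for each query; the 21 column names follow the UCSC PSL format description, the example lines are made up, and "most matches wins" is a simplification for illustration, not BLAT's own scoring:

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def read_psl(lines):
    """Parse PSL lines into dicts, skipping header, separator and blank lines."""
    hits = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != len(PSL_FIELDS) or not cols[0].isdigit():
            continue
        hits.append(dict(zip(PSL_FIELDS, cols)))
    return hits


def best_hits(hits):
    """Keep, for each query name, the hit with the most matching bases."""
    best = {}
    for hit in hits:
        q = hit["qName"]
        if q not in best or int(hit["matches"]) > int(best[q]["matches"]):
            best[q] = hit
    return best


# Made-up example lines: query q1 hits ref1 (100 matches) and ref2 (80 matches).
example = [
    "100\t2\t0\t0\t0\t0\t0\t0\t+\tq1\t102\t0\t102\tref1\t1500\t10\t112\t1\t102,\t0,\t10,",
    "80\t22\t0\t0\t0\t0\t0\t0\t+\tq1\t102\t0\t102\tref2\t1500\t10\t112\t1\t102,\t0,\t10,",
]
print(best_hits(read_psl(example))["q1"]["tName"])   # ref1
```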

December 27th, 2012, 07:05 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
Also mentioned on the DNA forums (for RNA-virus alignment):
Bowtie
BWA
TopHat

January 8th, 2013, 01:00 AM
Registered User
Join Date: Feb 2006
Location: germany
Posts: 9,949
Re: Sequence Analysis Using MUSCLE
Disclaimers:
The reader is responsible for discerning the validity, factuality or implications of information posted here, be it fictional or based on real events. Moderators on this forum make every effort to review the material posted on this site; however, it is not realistically possible for our staff to manually review each post.
The content of posts on this site, including but not limited to links to other web sites, are the expressed opinion of the original authors or posters and are not endorsed by, or representative of the opinions of, the owners or administration of this website. The posts on this website are the opinion of the specific author or poster and should not be construed as statements of advice or factual information.
Not all posts on this website are intended as truthful or factual assertion by their authors. NO posts on this website should be considered factual information on face value alone. Users are encouraged to USE DISCERNMENT and do their own follow up research while reading and posting on this website. FluTrackers.com Inc. reserves the right to make changes to, corrections and/or remove entirely at any time posts made on this website without notice. In addition, FluTrackers.com Inc. disclaims any and all liability for damages incurred directly or indirectly as a result of a post on this website.
This site is provided "as is" without warranty of any kind, either expressed or implied. You should not assume that this site is error-free or that it will be suitable for the particular purpose which you have in mind when using it. In no event shall FluTrackers.com Inc. be liable for any special, incidental, indirect or consequential damages of any kind, or any damages whatsoever, including, without limitation, those resulting from loss of use, data or profits, whether or not advised of the possibility of damage, and on any theory of liability, arising out of or in connection with the use or performance of this site or other documents which are referenced by or linked to this site.
Finally, FluTrackers.com Inc. reserves the right to delete, correct, or make changes to any post on this website without notice at any time for any reason.
Fair Use Notice:
This site may contain copyrighted material the use of which has not always been specifically authorized by the copyright owner. Users may make such material available in an effort to advance awareness and understanding of issues relating to public health, civil rights, economics, individual rights, international affairs, liberty, science & technology, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed to those who have expressed a prior interest in receiving the included information for research and educational purposes.
In accordance with industry-accepted best practices, we ask that users limit their copy/paste of copyrighted material to the relevant portions of the article they wish to discuss (no more than 1 paragraph, and in no case more than 50% of the source material), provide a link back to the original article, and provide their original comments/criticism in the post with the article. Please remember you are responsible for what you post on the internet and you could be sued by the original copyright holder if you do not honor these rules.
If you are a legal copyright holder or a designated agent for such and you believe a post on this website falls outside the boundaries of "Fair Use" and legitimately infringes on your or your clients' copyright,
we may be contacted concerning copyright matters at:
FluTrackers.com Inc.
c/o Sharon Sanders
1676 Hibiscus Avenue
Winter Park, Florida 32789
Phone: 407-745-1513
E-Mail: flutrackers@earthlink.net
In accordance with section 512 of the U.S. Copyright Act our contact information has been registered with the United States Copyright Office. "Safe Harbor" noticing procedures as outlined in the DMCA apply to this website concerning all 3rd party posts published herein.
If notice is given of an alleged copyright violation we will act expeditiously to remove or disable access to the material(s) in question.
All 3rd party material posted on this website is the copyright of the respective owners / authors. FluTrackers.com Inc. makes no claim of copyright on such material.
For more information please visit:
http://www.law.cornell.edu/uscode/17/107.shtml
Please be aware any communications sent complaining about a post on this website may be posted publicly at the discretion of the administration.
FluTrackers Does Not Provide Any Medical Advice:
FluTrackers, Inc. does not provide medical advice. Information on this web site is collected from various internet resources, and the FluTrackers board of directors makes no warranty to the safety, efficacy, correctness or completeness of the information posted on this site by any author or poster.
The information collated here is for instructional and/or discussion purposes only and is NOT intended to diagnose or treat any disease, illness, or other medical condition. Every individual reader or poster should seek advice from their personal physician/healthcare practitioner before considering or using any interventions that are discussed on this website.
By continuing to access this website you agree to consult your personal physician before using any interventions posted on this website, and you agree to hold harmless FluTrackers.com Inc., the board of directors, the members, and all authors and posters for any effects from use of any medication, supplement, vitamin or other substance, device, intervention, etc. mentioned in posts on this website, or other internet venues referenced in posts on this website.
By using and/or accessing this site, either passively or actively, you are agreeing to all of the above conditions. Also, by using and/or accessing this site, either passively or actively, you agree to conduct all business and legal affairs related to this website in the jurisdiction of Flutrackers.com Inc. which is registered in Central Florida, USA.
These Disclaimers are subject to change at any time.
Email the Webmaster with questions or comments about this site at flutrackers@earthlink.net
All times are GMT -5.