Sometimes I find it quicker to just look at a sequence instead of running it through tools, so I thought I'd share how I do it. If anyone spots errors in what I'm saying or doing, please feel free to add corrections to this post.
Since we are all watching the D225 mutations in HA, that's what we'll look at. First, a word on numbering; I have no idea why people use so many different systems when talking about H1N1 pandemic flu. If we count from the first "M" (which is what we'll do here), we find the "D" at position 239. If we count from the start of the mature protein, after the 17-residue signal peptide (that is, starting at "DTLC"), we find the same "D" at position 222. In H3 numbering, it is at position 225.
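For anyone who wants to script the conversion, here's a minimal Python sketch. It assumes the mature protein starts at the first "DTLC" (true for A/California/04/2009, but check your own sequence) and uses that to get the signal peptide length. The function names are just ones made up for illustration. It does not compute H3 numbering, because that comes from aligning against an H3 hemagglutinin rather than from a fixed offset.

```python
# First line of the A/California/04/2009 HA translation, from GenBank.
HA_START = "MKAILVVLLYTFATANADTLCIGYHANNSTDTVDTVLEKNVTVT"


def signal_peptide_length(protein: str, mature_start: str = "DTLC") -> int:
    """Residues before the mature protein (assumes it begins at mature_start)."""
    idx = protein.find(mature_start)
    if idx < 0:
        raise ValueError(f"{mature_start!r} not found; check where the mature protein starts")
    return idx


def full_to_mature(pos_from_m: int, sp_len: int) -> int:
    """Convert a position counted from the initial M to mature-protein numbering."""
    return pos_from_m - sp_len


def mature_to_full(pos_mature: int, sp_len: int) -> int:
    """Convert a mature-protein position back to counting from the initial M."""
    return pos_mature + sp_len


sp = signal_peptide_length(HA_START)
print(sp)                       # 17
print(full_to_mature(239, sp))  # 222
```

The H3 value (225) for this site is taken from the numbering explanation above; the sketch only handles the two H1 schemes.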
I chose this first sequence because it's always easy to spot the mutations when the amino acid part starts with "M" and the nucleotide part starts with "atg". To save space, I'm only showing the parts relevant to this post. This is A/California/04/2009 from GenBank, and it does not have the mutation.
First, the amino acid part; notice all the different letters. We are looking for D225 (H3 numbering), so we start counting at the "M" and find the "D" at position 239. In every sequence I've looked at, it has been in that spot as long as the translation begins with "M". It's easier for me to spot a string of letters than a single one, so in the fifth line of the translation I look for PKVRDQEGRMNY. When there is a mutation, you will see an "N", "G", "E", "B" or "X" in place of the "D". ("B" means the residue could be either D or N, and "X" means it could not be determined; both usually come from mixed bases in the nucleotide sequence.)
/translation="MKAILVVLLYTFATANADTLCIGYHANNSTDTVDTVLEKNVTVT
HSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSS
DNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKS
FYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADTYVFV
GSSRYSKKFKPEIAIRPKVRDQEGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMER
NAGSGIIISDTPVHDCNTTCQTPKGAINTSLPFQNIHPITIGKCPKYVKSTKLRLATG
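Here is the same "count from the M, or look for the surrounding letters" check as a short Python sketch. It joins the first five translation lines above and uses a hypothetical regular expression built from the PKVR...QEGRMNY flank. If a sequence also differs in those flanking residues, the pattern won't match and you'd have to fall back on counting.

```python
import re

# First five lines of the A/California/04/2009 HA translation shown above.
TRANSLATION_LINES = [
    "MKAILVVLLYTFATANADTLCIGYHANNSTDTVDTVLEKNVTVT",
    "HSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSS",
    "DNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKS",
    "FYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADTYVFV",
    "GSSRYSKKFKPEIAIRPKVRDQEGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMER",
]
PROTEIN = "".join(TRANSLATION_LINES)

# Flanking residues around the site; the group captures whatever letter sits there.
SITE_PATTERN = re.compile(r"PKVR(.)QEGRMNY")


def residue_at(protein: str, pos_from_m: int) -> str:
    """Residue at a 1-based position counted from the initial M."""
    return protein[pos_from_m - 1]


def find_site(protein: str):
    """Return (position from M, residue) for the site, or None if the flanks don't match."""
    match = SITE_PATTERN.search(protein)
    if match is None:
        return None
    return match.start(1) + 1, match.group(1)


print(residue_at(PROTEIN, 239))  # D
print(find_site(PROTEIN))        # (239, 'D')
```

A sequence with the G mutation would come back as (239, 'G'), and one with mixed bases at that codon as (239, 'X') or (239, 'B').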
Then we look at the nucleotide portion, which contains only "a", "c", "g" and "t", apart from IUPAC ambiguity codes such as "r" (a or g) that mark mixed bases. We are looking for positions 715, 716 and 717, because each amino acid is encoded by three nucleotides: residue 239 ends at nucleotide 239 × 3 = 717, so its codon spans 715 to 717. A "D" is encoded by "gat" (or "gac"), and here it is the second-to-last codon on the line beginning 661. I usually check the codons before and after as well, which will normally read "agg gat caa".
When there is an "N" mutation, the codon will be "aat"; a "G" mutation gives "ggt", and an "E" gives "gaa" (or "gag"). A mixed sample will show an "r" at the changed position instead, for example "grt" for a D/G mix.
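To check which amino acid a codon gives, including mixed bases, here's a small Python sketch using the standard genetic code and the IUPAC ambiguity letters. It expands something like "grt" into every possible codon and translates each one. The rule for collapsing several possibilities into one letter ("B" for D or N, "X" for anything else) is a simplification assumed here; real translation tools may handle ambiguity differently.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code (NCBI translation table 1); codons enumerated with
# t, c, a, g at each position, first position changing slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide codes: each letter maps to the bases it can stand for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def possible_amino_acids(codon: str) -> set:
    """Every amino acid a (possibly ambiguous) codon could encode."""
    options = [IUPAC[base] for base in codon.lower()]
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


def translate_codon(codon: str) -> str:
    """One-letter call; 'B' for D-or-N, 'X' for any other mixture (simplified rule)."""
    aas = possible_amino_acids(codon)
    if len(aas) == 1:
        return next(iter(aas))
    if aas == {"D", "N"}:
        return "B"
    return "X"


for codon in ["gat", "aat", "ggt", "gaa", "grt", "rat"]:
    print(codon, translate_codon(codon), sorted(possible_amino_acids(codon)))
```

For example, "grt" comes back as X because it could be D or G, while "rat" comes back as B because it could be D or N.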
001 atgaaggcaa tactagtagt tctgctatat acatttgcaa ccgcaaatgc agacacatta
061 tgtataggtt atcatgcgaa caattcaaca gacactgtag acacagtact agaaaagaat
121 gtaacagtaa cacactctgt taaccttcta gaagacaagc ataacgggaa actatgcaaa
181 ctaagagggg tagccccatt gcatttgggt aaatgtaaca ttgctggctg gatcctggga
241 aatccagagt gtgaatcact ctccacagca agctcatggt cctacattgt ggaaacacct
301 agttcagaca atggaacgtg ttacccagga gatttcatcg attatgagga gctaagagag
361 caattgagct cagtgtcatc atttgaaagg tttgagatat tccccaagac aagttcatgg
421 cccaatcatg actcgaacaa aggtgtaacg gcagcatgtc ctcatgctgg agcaaaaagc
481 ttctacaaaa atttaatatg gctagttaaa aaaggaaatt catacccaaa gctcagcaaa
541 tcctacatta atgataaagg gaaagaagtc ctcgtgctat ggggcattca ccatccatct
601 actagtgctg accaacaaag tctctatcag aatgcagata catatgtttt tgtggggtca
661 tcaagataca gcaagaagtt caagccggaa atagcaataa gacccaaagt gagggatcaa
721 gaagggagaa tgaactatta ctggacacta gtagagccgg gagacaaaat aacattcgaa
781 gcaactggaa atctagtggt accgagatat gcattcgcaa tggaaagaaa tgctggatct
841 ggtattatca tttcagatac accagtccac gattgcaata caacttgtca aacacccaag
901 ggtgctataa acaccagcct cccatttcag aatatacatc cgatcacaat tggaaaatgt
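If you'd rather have the computer find the codon, this Python sketch does the position arithmetic described above on a GenBank-style numbered line (here, the line starting at 661). It assumes the listing starts at the "atg" start codon, as this one does; if a record includes sequence before the "atg", you would need to add that offset. The helper names are hypothetical.

```python
# The numbered line from the listing above that holds nucleotides 661-720.
ORIGIN_LINE_661 = "661 tcaagataca gcaagaagtt caagccggaa atagcaataa gacccaaagt gagggatcaa"


def parse_origin_line(line: str):
    """Split a GenBank-style numbered line into (first position, bases)."""
    number, *chunks = line.split()
    return int(number), "".join(chunks).lower()


def codon_span(pos_from_m: int):
    """1-based inclusive nucleotide range for an amino acid counted from the M.

    Assumes nucleotide 1 is the 'a' of the start codon 'atg'.
    """
    last = pos_from_m * 3
    return last - 2, last


def codon_at(line: str, pos_from_m: int) -> str:
    """Return the codon for pos_from_m if it lies entirely on this line."""
    start, bases = parse_origin_line(line)
    first, last = codon_span(pos_from_m)
    if first < start or last >= start + len(bases):
        raise ValueError("codon is not on this line")
    offset = first - start
    return bases[offset:offset + 3]


print(codon_span(239))  # (715, 717)
print(" ".join(codon_at(ORIGIN_LINE_661, p) for p in (238, 239, 240)))  # agg gat caa
```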
Please feel free to ask questions because this probably is not crystal clear.