
Reading Sequences Without Using Tools


  • Reading Sequences Without Using Tools

    Sometimes I find it quicker to just look at a sequence instead of running it through tools, so I thought I'd share how I do it. If anyone sees any errors in what I'm saying or doing, please feel free to add a correction to this post.

    Since we are all watching the D225 mutations in HA, that's what we'll look at. First, an explanation of the numbering; I have no idea why people use so many different schemes when talking about H1N1 pandemic flu. If we start counting at the first "M" (which is what we're doing here), we'll find the "D" at position 239. If we start after the signal peptide (at the DTLC that follows the first 17 residues), we'll find the "D" at position 222. If we use H3 numbering, we'll find it at 225.
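A minimal Python sketch of converting between the three numbering schemes for this one site. The offsets are assumptions taken from this thread: the signal peptide is the 17 residues MKAILVVLLYTFATANA before DTLC, and the H3 offset of 14 is the figure quoted later in the thread for this region. H1-to-H3 numbering is generally not a constant shift across the whole HA, so the 14 should not be reused for other positions.

```python
SIGNAL_PEPTIDE_LEN = 17  # MKAILVVLLYTFATANA, the residues before DTLC
H3_OFFSET_HERE = 14      # assumption: the thread's figure, valid near this site only


def from_m_to_mature(pos_from_m: int) -> int:
    """Counting from the first M -> counting from the D of DTLC."""
    return pos_from_m - SIGNAL_PEPTIDE_LEN


def from_m_to_h3(pos_from_m: int) -> int:
    """Counting from the first M -> H3 numbering (this region only)."""
    return pos_from_m - H3_OFFSET_HERE


def h3_to_from_m(h3_pos: int) -> int:
    """H3 numbering -> counting from the first M (this region only)."""
    return h3_pos + H3_OFFSET_HERE


print(from_m_to_mature(239), from_m_to_h3(239), h3_to_from_m(225))  # 222 225 239
```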

    I chose this first sequence because it's easy to spot the mutations when the amino acid part starts with "M" and the nucleotide part starts with "atg". To save space, I'm only showing the parts relevant to this post. This is A/California/04/2009 from Genbank, and it does not have the mutation.

    First, the amino acid part; notice all the different letters. We are looking for position 225 in H3 numbering, which is position 239 when we count from the "M", and that's exactly where we find the "D". In every sequence I've looked at, it has been in that place as long as this part begins with "M". It's easier for me to spot a run of letters than a single one, so in line 5 I look for PKVRDQEGRMNY. When there is a mutation, you will see an "N", "G", "E", "B" (D or N) or "X" (unknown) in place of the "D".

    /translation="MKAILVVLLYTFATANADTLCIGYHANNSTDTVDTVLEKNVTVT
    HSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSS
    DNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKS
    FYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADTYVFV
    GSSRYSKKFKPEIAIRPKVRDQEGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMER
    NAGSGIIISDTPVHDCNTTCQTPKGAINTSLPFQNIHPITIGKCPKYVKSTKLRLATG
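Counting by hand can be double-checked with a few lines of Python. This is a minimal sketch, assuming the translation lines above are joined into one string (any spaces removed); the function names are made up for illustration. It reads the residue at a position counted from the "M" and confirms that the "D" inside PKVRDQEGRMNY sits at 239.

```python
# The translation lines shown above, joined with no spaces.
translation = (
    "MKAILVVLLYTFATANADTLCIGYHANNSTDTVDTVLEKNVTVT"
    "HSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSS"
    "DNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKS"
    "FYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADTYVFV"
    "GSSRYSKKFKPEIAIRPKVRDQEGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMER"
    "NAGSGIIISDTPVHDCNTTCQTPKGAINTSLPFQNIHPITIGKCPKYVKSTKLRLATG"
)


def residue_at(protein: str, pos_from_m: int) -> str:
    """Residue at a 1-based position counted from the first M."""
    return protein[pos_from_m - 1]


def motif_position(protein: str, motif: str, offset_in_motif: int):
    """1-based position of motif[offset_in_motif], or None if the motif is absent."""
    start = protein.find(motif)
    if start == -1:
        return None
    return start + offset_in_motif + 1


print(residue_at(translation, 239))                    # D
print(motif_position(translation, "PKVRDQEGRMNY", 4))  # 239 (the D is 5th in the motif)
```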


    Then we look at the nucleotide portion, which only has "acgt"s apart from ambiguity codes such as "r" (a or g) that mark mixed bases. Because it takes 3 nucleotides to make 1 amino acid, codon 239 covers nucleotides 3 x 239 - 2 = 715 through 3 x 239 = 717, so we are looking for positions 715, 716 and 717. A "D" is "gat" (or "gac"), and we find it as the second triplet (codon) from the end of the line starting at 661. I usually check the codons before and after as well; normally the three read "agg gat caa".

    When there is an "N" mutation, the codon will be "aat"; the "G" mutation will be "ggt" and the "E" will be "gaa" (or "gag"), or there will be "r"s where the signal is mixed.

    001 atgaaggcaa tactagtagt tctgctatat acatttgcaa ccgcaaatgc agacacatta
    061 tgtataggtt atcatgcgaa caattcaaca gacactgtag acacagtact agaaaagaat
    121 gtaacagtaa cacactctgt taaccttcta gaagacaagc ataacgggaa actatgcaaa
    181 ctaagagggg tagccccatt gcatttgggt aaatgtaaca ttgctggctg gatcctggga
    241 aatccagagt gtgaatcact ctccacagca agctcatggt cctacattgt ggaaacacct
    301 agttcagaca atggaacgtg ttacccagga gatttcatcg attatgagga gctaagagag
    361 caattgagct cagtgtcatc atttgaaagg tttgagatat tccccaagac aagttcatgg
    421 cccaatcatg actcgaacaa aggtgtaacg gcagcatgtc ctcatgctgg agcaaaaagc
    481 ttctacaaaa atttaatatg gctagttaaa aaaggaaatt catacccaaa gctcagcaaa
    541 tcctacatta atgataaagg gaaagaagtc ctcgtgctat ggggcattca ccatccatct
    601 actagtgctg accaacaaag tctctatcag aatgcagata catatgtttt tgtggggtca
    661 tcaagataca gcaagaagtt caagccggaa atagcaataa gacccaaagt gagggatcaa
    721 gaagggagaa tgaactatta ctggacacta gtagagccgg gagacaaaat aacattcgaa
    781 gcaactggaa atctagtggt accgagatat gcattcgcaa tggaaagaaa tgctggatct
    841 ggtattatca tttcagatac accagtccac gattgcaata caacttgtca aacacccaag
    901 ggtgctataa acaccagcct cccatttcag aatatacatc cgatcacaat tggaaaatgt
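To make the position arithmetic and the codon reading concrete, here is a minimal Python sketch. It assumes GenBank-style sequence lines with the starting position first (as above), a coding sequence that starts at the "atg", the standard genetic code, and the standard IUPAC nucleotide ambiguity codes; the function names are illustrative, not from any tool mentioned in this thread. Because each line carries its own start position, a single excerpted line is enough.

```python
from itertools import product

# Standard genetic code, built in the usual tcag order (first base outermost).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC codes for mixed positions (e.g. r = a or g).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "r": "ag", "y": "ct",
         "k": "gt", "m": "ac", "s": "cg", "w": "at", "b": "cgt",
         "d": "agt", "h": "act", "v": "acg", "n": "acgt"}


def parse_origin(text):
    """Map 1-based positions to bases from lines like '661 tcaagataca ...'."""
    bases = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        start = int(fields[0])
        for offset, base in enumerate("".join(fields[1:]).lower()):
            bases[start + offset] = base
    return bases


def codon_at(bases, aa_pos):
    """Codon for amino acid aa_pos counted from the first M: nt 3n-2 .. 3n."""
    first = 3 * aa_pos - 2
    return "".join(bases[p] for p in range(first, first + 3))


def translate(codon):
    """Every amino acid a (possibly mixed) codon can encode, e.g. 'grt' -> 'DG'."""
    options = product(*(IUPAC[b] for b in codon.lower()))
    return "".join(sorted({CODON_TABLE["".join(o)] for o in options}))


excerpt = "661 tcaagataca gcaagaagtt caagccggaa atagcaataa gacccaaagt gagggatcaa"
bases = parse_origin(excerpt)
print(codon_at(bases, 238), codon_at(bases, 239), codon_at(bases, 240))  # agg gat caa
for codon in ["gat", "aat", "ggt", "gaa", "grt", "rat"]:
    print(codon, translate(codon))
```

The loop prints D, N, G and E for the four plain codons, then DG and DN for the two mixed ones, which is the "r"s-for-mixed-signals case described above.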

    Please feel free to ask questions because this probably is not crystal clear.
    The salvage of human life ought to be placed above barter and exchange ~ Louis Harris, 1918

  • #2
    Re: Reading Sequences Without Using Tools

    I use Firefox as a browser, and if you press Ctrl+F it puts a 'Find' bar at the bottom of the page. Put the sequence you are interested in, e.g. PKVRDQEGRMNY, in the find box and it will stay there as you change pages. If a sequence has PKVRDQEGRMNY it will highlight it for you - just check the next letter on each page.



    • #3
      Re: Reading Sequences Without Using Tools

      Well, thanks for that tip JJackson. That's pretty slick.

      I use Opera and tried it. A little box popped up; I pasted the series of letters into it, then opened a sequence. It told me the text couldn't be found. I immediately thought it wasn't working, until I looked and saw PKVRGQEGRMNY: the "D" had changed to a "G".

      I wanted to add a note about counting in the amino acid part: gs has said a number of times that we need to add 14 to 225, so the actual position, counting from the "M", is 239.



      • #4
        Re: Reading Sequences Without Using Tools

        In reality I generally use about 6 letters running up to the point I am interested in, so they act as a pointer to the position that is likely to change, and I do not include that position itself. This avoids having to hunt for the exact point in changed sequences. I have not tried Opera, but IE tends to hide the 'Find' box when you change pages, whereas FF keeps it on top with the same text in the box as you jump between the tabs you have open, which makes it very quick to spot the SNPs you are interested in.
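Post #3's Opera result and the six-letter trick both come down to string search. A minimal Python sketch, assuming you already have the protein sequence as plain text: searching for the full motif misses a variant, while searching for a prefix that stops just before the site and reading the next letter reports whatever is there. The prefix IRPKVR is the six residues before D239 in the A/California/04/2009 translation in the first post; the variant string is made up for illustration by changing that D to G.

```python
def residue_after(protein: str, prefix: str):
    """Residue immediately after `prefix`, or None if the prefix isn't found.

    The prefix stops just short of the site of interest, so a change at the
    site itself cannot make the search fail.
    """
    i = protein.find(prefix)
    if i == -1 or i + len(prefix) >= len(protein):
        return None
    return protein[i + len(prefix)]


# A stretch of the A/California/04/2009 translation, and a made-up D->G copy.
wild_type = "KFKPEIAIRPKVRDQEGRMNYYW"
variant = "KFKPEIAIRPKVRGQEGRMNYYW"

print(variant.find("PKVRDQEGRMNY"))        # -1: the full motif misses the variant
print(residue_after(wild_type, "IRPKVR"))  # D
print(residue_after(variant, "IRPKVR"))    # G
```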



        • #5
          Re: Reading Sequences Without Using Tools

          That works for single sequences, but you have to open the GenBank record, which is slow for me (for you too?).

          But there are thousands of sequences, so I download them in bulk and then analyze them with software.

          Someone should do this and publish the mutations regularly on a webpage. There's no need for everyone who is interested to repeat the procedure.
          I'm interested in expert panflu damage estimates
          my current links: http://bit.ly/hFI7H ILI-charts: http://bit.ly/CcRgT



          • #6
            Re: Reading Sequences Without Using Tools

            JJackson,
            I thought that would be ideal at GISAID, where it's harder to locate positions. But I couldn't get it to work there with Opera, didn't try with IE and I've never been able to get FF to work on this computer.

            Gs,
            Genbank is slow for me, too. I can look at all the sequences in your file in the time it takes me to look at 1 from Genbank. Your program is great; thanks for sharing it.

            I can post lists if there is interest; the one for 225 mutations in non-pandemic flu is quite long and dates back quite a few years.



            • #7
              Re: Reading Sequences Without Using Tools

              Using Find is only really useful if you are looking at a small number of new sequences for a specific SNP. For larger numbers of sequences, or for changes at other positions, gsgs is right: download, align and compare; it is quicker in the long run.
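For the bulk approach, here is a minimal Python sketch of the "compare" step only. It assumes the HA proteins have already been downloaded as one FASTA file and that each record starts at the M with no insertions or deletions before the site, so no alignment is done; if that assumption fails, the sequences should go through a real aligner first. `read_fasta` and `scan_site` are illustrative names, not part of gsgs's program.

```python
from collections import Counter


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scan_site(fasta_text, pos_from_m=239, expected="D"):
    """Tally the residue at one position and list the records that differ.

    Assumes each record is a protein starting at the M with no insertions or
    deletions before the site; no alignment is done here.
    """
    counts, changed = Counter(), []
    for header, protein in read_fasta(fasta_text):
        residue = protein[pos_from_m - 1] if len(protein) >= pos_from_m else "?"
        counts[residue] += 1
        if residue != expected:
            changed.append((header, residue))
    return counts, changed


# Tiny made-up records; with real data, pass the text of the downloaded file.
demo = ">record_1\nMKD\n>record_2\nMK\nN\n>record_3\nMK\n"
counts, changed = scan_site(demo, pos_from_m=3, expected="D")
print(dict(counts), changed)
```

With real data the call would be something like `scan_site(open(path).read())`, where `path` is wherever the download was saved; records too short to reach the site are counted under "?".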



              • #8
                Re: Reading Sequences Without Using Tools

                I mentioned to Sally some time ago that it would really help me understand the codons if I could actually see the triplets. She found that option in a feature at this site:
                http://www.fludb.org/brc/home.do?decorator=influenza It's similar to Genbank with some different features.

                I use this often when I'm trying to see what position a mutation is in. Feel free to copy it or I can post instructions for the site if anyone is interested.

                This is InDRE/4487, and it only has 1 mutation in this HA segment: A719G. The 4-digit number at the end of each row (0048 in the first row) is the running count of nucleotides through that row, and any mutations in the row are listed to its right.

                ATG AAG GCA ATA CTA GTA GTT CTG CTA TAT ACA TTT GCA ACC GCA AAT 0048
                GCA GAC ACA TTA TGT ATA GGT TAT CAT GCG AAC AAT TCA ACA GAC ACT 0096
                GTA GAC ACA GTA CTA GAA AAG AAT GTA ACA GTA ACA CAC TCT GTT AAC 0144
                CTT CTA GAA GAC AAG CAT AAC GGG AAA CTA TGC AAA CTA AGA GGG GTA 0192
                GCC CCA TTG CAT TTG GGT AAA TGT AAC ATT GCT GGC TGG ATC CTG GGA 0240 G237A
                AAT CCA GAG TGT GAA TCA CTC TCC ACA GCA AGC TCA TGG TCC TAC ATT 0288
                GTG GAA ACA TCT AGT TCA GAC AAT GGA ACG TGT TAC CCA GGA GAT TTC 0336
                ATC GAT TAT GAG GAG CTA AGA GAG CAA TTG AGC TCA GTG TCA TCA TTT 0384
                GAA AGG TTT GAG ATA TTC CCC AAG ACA AGT TCA TGG CCC AAT CAT GAC 0432
                TCG AAC AAA GGT GTA ACG GCA GCA TGT CCT CAT GCT GGA GCA AAA AGC 0480
                TTC TAC AAA AAT TTA ATA TGG CTA GTT AAA AAA GGA AAT TCA TAC CCA 0528
                AAG CTC AGC AAA TCC TAC ATT AAT GAT AAA GGG AAA GAA GTC CTC GTG 0576
                CTA TGG GGC ATT CAC CAT CCA TCT ACT AGT GCT GAC CAA CAA AGT CTC 0624 G610K
                TAT CAG AAT GCA GAT GCA TAT GTT TTT GTG GGG TCA TCA AGA TAC AGC 0672 T658A
                AAG AAG TTC AAG CCG GAA ATA GCA ATA AGA CCC AAA GTG AGG GAT CGA 0720 A719G
                GAA GGG AGA ATG AAC TAT TAC TGG ACA CTA GTA GAG CCG GGA GAC AAA 0768 A733G
                ATA ACA TTC GAA GCA ACT GGA AAT CTA GTG GTA CCG AGA TAT GCA TTC 0816
                GCA ATG GAA AGA AAT GCT GGA TCT GGT ATT ATC ATT TCA GAT ACA CCA 0864
                GTC CAC GAT TGC AAT ACA ACT TGT CAG ACA CCC AAG GGT GCT ATA AAC 0912 C870T(M)
                ACC AGC CTC CCA TTT CAG AAT ATA CAT CCG ATC ACA ATT GGA AAA TGT 0960
                CCA AAA TAT GTA AAA AGC ACA AAA TTG AGA CTG GCC ACA GGA TTG AGG 1008
                AAT GTC CCG TCT ATT CAA TCT AGA GGC CTA TTT GGG GCC ATT GCC GGT 1056
                TTC ATT GAA GGG GGG TGG ACA GGG ATG GTA GAT GGA TGG TAC GGT TAT 1104
                CAC CAT CAA AAT GAG CAG GGG TCA GGA TAT GCA GCC GAC CTG AAG AGC 1152
                ACA CAG AAT GCC ATT GAC GAG ATT ACT AAC AAA GTA AAT TCT GTT ATT 1200
                GAA AAG ATG AAT ACA CAG TTC ACA GCA GTA GGT AAA GAG TTC AAC CAC 1248
                CTG GAA AAA AGA ATA GAG AAT TTA AAT AAA AAA GTT GAT GAT GGT TTC 1296 A1281G
                CTG GAC ATT TGG ACT TAC AAT GCC GAA CTG TTG GTT CTA TTG GAA AAT 1344
                GAA AGA ACT TTG GAC TAC CAC GAT TCA AAT GTG AAG AAC TTA TAT GAA 1392
                AAG GTA AGA AGC CAG CTA AAA AAC AAT GCC AAG GAA ATT GGA AAC GGC 1440 C1408T
                TGC TTT GAA TTT TAC CAC AAA TGC GAT AAC ACG TGC ATG GAA AGT GTC 1488
                AAA AAT GGG ACT TAT GAC TAC CCA AAA TAC TCA GAG GAA GCA AAA TTA 1536
                AAC AGA GAA GAA ATA GAT GGG GTA AAG CTG GAA TCA ACA AGG ATT TAC 1584
                CAG ATT TTG GCG ATC TAT TCA ACT GTC GCC AGT TCA TTG GTA CTG GTA 1632
                GTC TCC CTG GGG GCA ATC AGT TTC TGG ATG TGC TCT AAT GGG TCT CTA 1680
                CAG TGT AGA ATA TGT ATT TAA 1701
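The triplet view above can also be produced offline. A minimal Python sketch, assuming a plain coding sequence that starts at the "atg": it lays out 16 codons per row with a zero-padded running count (mimicking the rows above; the exact fludb format is an assumption), and it splits a note like A719G into a codon number and a position within that codon using the same 3n-2 to 3n rule. The function names are hypothetical.

```python
def codon_rows(cds, codons_per_row=16):
    """Lay out a coding sequence as triplets with a running nucleotide count."""
    cds = cds.upper()
    width = 3 * codons_per_row
    rows = []
    for start in range(0, len(cds), width):
        chunk = cds[start:start + width]
        triplets = " ".join(chunk[i:i + 3] for i in range(0, len(chunk), 3))
        rows.append(f"{triplets} {start + len(chunk):04d}")
    return rows


def locate_snp(snp):
    """Split a plain note like 'A719G' into (ref, alt, codon number, position in codon)."""
    ref, pos, alt = snp[0], int(snp[1:-1]), snp[-1]
    codon_number = (pos + 2) // 3  # nucleotides 3n-2 .. 3n belong to codon n
    position_in_codon = pos - 3 * (codon_number - 1)
    return ref, alt, codon_number, position_in_codon


# First 54 nucleotides of A/California/04/2009 HA from the first post.
first_54 = "atgaaggcaatactagtagt" "tctgctatatacatttgcaa" "ccgcaaatgcagac"
for row in codon_rows(first_54):
    print(row)
print(locate_snp("A719G"))  # ('A', 'G', 240, 2)
```

For A719G this gives codon 240, second position: the "caa" at 718-720 in the first post becomes the CGA at the end of row 0720 above, which encodes R instead of Q.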
