Genbank Metadata

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Genbank Metadata

    I recently discovered that GenBank has additional data for many
    sequences (link):
    http://www.ncbi.nlm.nih.gov/Traces/t...CODE%3D'RT-PCR'

    When a flu genome is sequenced, the result is ~200 partial subsequences,
    each ~500-800 nucleotides long, taken from "random" positions in the
    genome. The genome is assembled from these subsequences; they usually
    overlap, so every position is covered multiple times.
    For ~9000 flu genomes (7000 of them via ftp), these sets of subsequences
    are also available.
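As a rough illustration of how "covered multiple times" can be quantified, here is a minimal Python sketch, assuming each subsequence has already been placed on the assembled genome as a 0-based (start, length) interval. The function names, the uniform random placement in `simulate_subsequences`, and the total genome length of 13,600 nt (all segments concatenated into one coordinate system) are illustrative assumptions, not what GenBank or the original analysis uses.

```python
import random

def coverage_from_intervals(intervals, genome_length):
    """Per-position coverage from (start, length) intervals (0-based).

    Difference array: +1 where a subsequence starts, -1 one past its last
    position, then a running sum gives the coverage at each position.
    """
    diff = [0] * (genome_length + 1)
    for start, length in intervals:
        lo = max(start, 0)
        hi = min(start + length, genome_length)  # clip at the genome end
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    coverage, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        coverage.append(running)
    return coverage

def expected_mean_coverage(n_subseqs, mean_length, genome_length):
    """Simple estimate: total sequenced bases / genome length
    (the mean coverage c = N*L/G used in Lander-Waterman)."""
    return n_subseqs * mean_length / genome_length

def simulate_subsequences(n_subseqs, genome_length, min_len=500, max_len=800, seed=0):
    """Hypothetical toy model: uniformly random lengths and start positions,
    constrained so every subsequence fits inside the genome."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_subseqs):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, genome_length - length)
        out.append((start, length))
    return out

if __name__ == "__main__":
    G = 13600  # assumed total genome length in nt, segments concatenated
    subseqs = simulate_subsequences(200, G)
    cov = coverage_from_intervals(subseqs, G)
    print("expected mean coverage:", round(expected_mean_coverage(200, 650, G), 2))  # 9.56
    print("simulated mean coverage:", round(sum(cov) / G, 2))
    print("min / max coverage:", min(cov), max(cov))
```

With ~200 subsequences of ~650 nt on ~13,600 nt, the estimate is 200 × 650 / 13,600 ≈ 9.6, which is consistent with every position being covered several times. In the toy model, positions near the ends come out less covered because subsequences must fit inside the genome.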
    I took only the ~600 sets from wild birds and filtered out those that for
    some reason didn't work with my programs. For the remaining 435, I
    calculated the alignments, the average number of subsequences covering
    each position (green in the pic), and the probability that the finally
    assigned nucleotide (I don't know how they assign that value) differs
    from the average (black in the pic).
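The two per-position quantities could be computed along these lines. This is a minimal sketch, not the program used for the pic: it assumes each subsequence is already aligned gap-free to the assigned sequence as (start, seq), that all genomes share one coordinate system, and it interprets "differs from the average" as "the assigned base is not among the most frequent bases of the covering subsequences". The function names are hypothetical.

```python
from collections import Counter

def per_position_stats(assigned, aligned_subseqs):
    """Coverage and "assigned base differs" flags for one genome.

    assigned: the finally assigned genome sequence (str).
    aligned_subseqs: list of (start, seq); each seq is assumed to sit
        gap-free at 0-based offset `start` on `assigned`. Only A/C/G/T
        count as covering; '-' and 'N' are skipped.
    """
    n = len(assigned)
    counts = [Counter() for _ in range(n)]
    for start, seq in aligned_subseqs:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < n and base in "ACGT":
                counts[pos][base] += 1
    coverage = [sum(c.values()) for c in counts]
    differs = []
    for pos, c in enumerate(counts):
        if not c:
            differs.append(None)  # uncovered position: undefined
        else:
            # True if the assigned base is not among the most frequent bases
            # (an ambiguous assigned base such as 'N' has count 0, so True)
            differs.append(c[assigned[pos].upper()] < max(c.values()))
    return coverage, differs

def average_over_genomes(per_genome):
    """Average per-position stats over many genomes.

    per_genome: list of (coverage, differs) pairs of equal length.
    Returns (mean coverage, fraction of genomes where the assigned base
    differs); genomes where a position is uncovered are skipped there.
    """
    n = len(per_genome[0][0])
    mean_cov = [sum(cov[i] for cov, _ in per_genome) / len(per_genome) for i in range(n)]
    p_differs = []
    for i in range(n):
        flags = [d[i] for _, d in per_genome if d[i] is not None]
        p_differs.append(sum(flags) / len(flags) if flags else None)
    return mean_cov, p_differs

if __name__ == "__main__":
    g1 = per_position_stats("ACTT", [(0, "ACG"), (1, "CTT"), (2, "GT")])
    g2 = per_position_stats("ACGT", [(0, "AC")])
    mean_cov, p_differs = average_over_genomes([g1, g2])
    print(mean_cov)   # [1.0, 1.5, 1.5, 1.0]
    print(p_differs)  # [0.0, 0.0, 1.0, 0.0]
```

Averaging a 0/1 "differs" flag over genomes gives an empirical probability per position, which is one plausible reading of the black curve; if the original used a different rule for the "average" nucleotide, only the `differs` line would need to change.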

    I noticed that the region in H2 was covered more often than the others;
    maybe subsequences preferentially break and start in that region.
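One way to probe the guess that subsequences preferentially start and break in that region is to count where subsequences begin and end along the genome. A minimal sketch, assuming 0-based (start, length) intervals on a common coordinate system; the bin size of 100 nt, the enrichment `factor`, and the function names are hypothetical choices for illustration.

```python
def boundary_histogram(intervals, genome_length, bin_size=100):
    """Count where subsequences start and end, per bin of `bin_size` nt.

    intervals: (start, length) pairs, 0-based, on one coordinate system.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    starts, ends = [0] * n_bins, [0] * n_bins
    for start, length in intervals:
        last = min(start + length, genome_length) - 1  # last covered position
        if 0 <= start < genome_length:
            starts[start // bin_size] += 1
        if 0 <= last < genome_length:
            ends[last // bin_size] += 1
    return starts, ends

def enriched_bins(counts, factor=2.0):
    """Indices of bins whose count exceeds `factor` times the mean bin count."""
    mean = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if mean > 0 and c > factor * mean]

if __name__ == "__main__":
    intervals = [(0, 50), (0, 60), (10, 40), (250, 30)]
    starts, ends = boundary_histogram(intervals, 300)
    print(starts, ends)                       # [3, 0, 1] [3, 0, 1]
    print(enriched_bins(starts, factor=1.5))  # [0]
```

If the bins flagged for starts and ends coincide with the highly covered region in H2, that would fit the guess; if they don't, the extra coverage has some other cause. Pooling the intervals from all 435 sets before binning would give more stable counts.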

    ======================================
    I'm interested in expert panflu damage estimates
    my current links: http://bit.ly/hFI7H ILI-charts: http://bit.ly/CcRgT