API
===

.. toctree::
   :maxdepth: 2

   API/convert
   API/gff
   API/genbank
   API/fasta
   API/consensus
   API/stats
   API/go
   API/utils


GFFtk works by parsing annotation files and storing in a python dictionary. After initial parsing
the records are sorted by contig and start position, translated into protein space to test complete
gene models or not, and then output in an a python OrderedDict(). The structure looks like this:

.. code-block:: none

   locustag: {
      'contig': contigName,  #string
      'type': [],  # list of str one for each transcript mRNA/rRNA/tRNA/ncRNA
      'location': (start, end), #integer tuple
      'strand': +/-, #string
      'ids': [transcript/protein IDs], #list
      'mRNA':[[(ex1,ex1),(ex2,ex2)]], #list of lists of tuples (start, end)
      'CDS':[[(cds1,cds1),(cds2,cds2)]], #list of lists of tuples (start, end)
      'transcript': [seq1, seq2], #list of mRNA trnascripts
      'cds_transcript': [seq1, seq2], #list of mRNA trnascripts (no UTRs)
      'protein': [protseq1,protseq2], #list of CDS translations
      'codon_start': [1,1], #codon start for translations
      'note': [[first note, second note], [first, second, etc]], #list of lists
      'name': genename, # str common gene name
      'product': [hypothetical protein, velvet complex], #list of product definitions
      'gene_synonym': [], # list of gene name Aliases
      'EC_number': [[ec number]], # list of lists
      'go_terms': [[GO:0000001,GO:0000002]],  #list of lists
      'db_xref': [[InterPro:IPR0001,PFAM:004384]],  #list of lists
      'partialStart': [bool], # list of True/False for each transcript
      'partialStop': [bootl], $ list of True/False for each transcript
      'source': source, # string annotation source
      'phase': [[0,2,1]], list of lists
      '5UTR': [[(),()]], #list of lists of tuples (start, end)
      '3UTR': [[(),()]] #list of lists of tuples (start, end)
      }
   }