Convert

A module for converting genome annotation (GFF/TBL) files into different formats.

gfftk.convert.gff2cdstranscripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GFF3 format to CDS transcript [no UTRs] FASTA format.

Will parse GFF3 format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – CDS transcripts in FASTA format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value
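
To make "CDS transcript [no UTRs]" concrete, the sketch below joins CDS segments given as 1-based, inclusive coordinates and reverse-complements the result for minus-strand features. It is an illustration only, not GFFtk's implementation; cds_transcript, reverse_complement, and the example contig are hypothetical.

```python
# Illustrative sketch only; not GFFtk's implementation.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_transcript(contig, cds_intervals, strand):
    """Join CDS segments (1-based, inclusive coordinates) into one sequence.

    UTRs and introns are excluded because they are not in cds_intervals.
    """
    # Sort by start so segments are joined in genomic order.
    ordered = sorted(cds_intervals)
    joined = "".join(contig[start - 1:end] for start, end in ordered)
    # Minus-strand features are read from the opposite strand.
    return reverse_complement(joined) if strand == "-" else joined


contig = "GGATGAAACCCTTTGGGTAAGG"
print(cds_transcript(contig, [(3, 8), (12, 20)], "+"))  # ATGAAATTTGGGTAA
```

A full transcript would be built the same way from exon intervals instead of CDS intervals, which is why it can include UTR sequence.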

gfftk.convert.gff2combined(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GFF3 and FASTA to combined GFF3+FASTA format.

Will parse GFF3 format into GFFtk annotation dictionary and then write to combined GFF3+FASTA format with both annotations and sequences.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – combined GFF3+FASTA format file

  • grep (list, default=[]) – filter results to only include gene models with locus_tag matching grep

  • grepv (list, default=[]) – filter results to exclude gene models with locus_tag matching grepv
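
In the GFF3 specification, sequences may be embedded in the annotation file after a ##FASTA directive. The sketch below shows that layout with in-memory text. It is a minimal illustration, not GFFtk's writer, and combine_gff3_fasta is a hypothetical helper.

```python
import io


def combine_gff3_fasta(gff_text, fasta_text):
    """Return GFF3 annotations followed by a ##FASTA section."""
    out = io.StringIO()
    out.write(gff_text.rstrip("\n") + "\n")
    # The directive marks the end of annotations and the start of sequences.
    out.write("##FASTA\n")
    out.write(fasta_text.rstrip("\n") + "\n")
    return out.getvalue()


gff = "##gff-version 3\ncontig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1\n"
fasta = ">contig1\nATGAAATAA\n"
combined = combine_gff3_fasta(gff, fasta)
print(combined, end="")
```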

gfftk.convert.gff2gbff(gff, fasta, output=False, table=1, organism=False, strain=False, debug=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])

Convert GFF3 format to GenBank format.

Will parse GFF3 format into GFFtk annotation dictionary and then write to GenBank output.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – annotation file in GenBank format

gfftk.convert.gff2gff3(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[], url_encode=False)

Convert GFF3 format to GFF3 format with filtering.

Will parse GFF3 format into GFFtk annotation dictionary, apply filtering, and then write to GFF3 output. This is useful for filtering GFF3 files. Default is to write to stdout.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – annotation file in GFF3 format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value
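
The grep and grepv parameters take key:value filter strings. The sketch below shows one way such filters could behave on a dictionary of gene models. Everything here is an assumption made for illustration: the matching rule (case-sensitive substring), the "any grep filter matches" logic, and the model keys contig and product are hypothetical, and GFFtk's actual rules may differ.

```python
def parse_filter(expr):
    """Split a 'key:value' filter string at the first colon."""
    key, sep, value = expr.partition(":")
    if not sep or not key:
        raise ValueError(f"expected key:value, got {expr!r}")
    return key, value


def matches(model, expr):
    """Assumption: a model matches when str(model[key]) contains value."""
    key, value = parse_filter(expr)
    return key in model and value in str(model[key])


def filter_models(models, grep=(), grepv=()):
    """Keep models matching any grep filter; drop models matching any grepv filter."""
    kept = {}
    for gene_id, model in models.items():
        if grep and not any(matches(model, g) for g in grep):
            continue
        if any(matches(model, g) for g in grepv):
            continue
        kept[gene_id] = model
    return kept


# Hypothetical gene models; real GFFtk dictionaries carry more fields.
models = {
    "gene1": {"contig": "chr1", "product": "hypothetical protein"},
    "gene2": {"contig": "chr2", "product": "kinase"},
}
print(sorted(filter_models(models, grep=["contig:chr1"])))  # ['gene1']
print(sorted(filter_models(models, grepv=["product:hypothetical"])))  # ['gene2']
```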

gfftk.convert.gff2gtf(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GFF3 format to GTF format.

Will parse GFF3 format into GFFtk annotation dictionary and then write to GTF output. Only coding genes are output with this method. Default is to write to stdout.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – annotation file in GTF format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value
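
One visible difference between the two formats is the attribute column: GFF3 uses key=value pairs linked by ID and Parent, while GTF carries gene_id and transcript_id on each feature line. The sketch below illustrates that mapping for a single mRNA record. It is not GFFtk's converter, and both helper names are hypothetical.

```python
def parse_gff3_attributes(column):
    """Parse a GFF3 attribute column such as 'ID=mrna1;Parent=gene1'."""
    attrs = {}
    for field in column.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def gtf_attributes(gene_id, transcript_id):
    """Format the gene_id and transcript_id attributes of a GTF line."""
    return f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'


# For an mRNA feature, Parent names the gene and ID names the transcript.
mrna = parse_gff3_attributes("ID=mrna1;Parent=gene1")
print(gtf_attributes(mrna["Parent"], mrna["ID"]))
```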

gfftk.convert.gff2proteins(gff, fasta, output=False, table=1, strip_stop=False, debug=False, grep=[], grepv=[])

Convert GFF3 format to translated protein FASTA format.

Will parse GFF3 format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • strip_stop (bool, default=False) – remove stop codons (*) from translation

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format
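
The table and strip_stop parameters can be pictured with the sketch below, which translates a CDS using NCBI translation table 1 (the standard code). It is an illustration rather than GFFtk's code: translate is a hypothetical function, and it assumes strip_stop removes only trailing stop symbols, which may not match GFFtk's behavior exactly.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered over T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(cds, strip_stop=False):
    """Translate a CDS codon by codon; unrecognized codons become 'X'."""
    cds = cds.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
    # Assumption: strip_stop drops the trailing '*' that marks the stop codon.
    return protein.rstrip("*") if strip_stop else protein


print(translate("ATGAAATTTGGGTAA"))                   # MKFG*
print(translate("ATGAAATTTGGGTAA", strip_stop=True))  # MKFG
```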

gfftk.convert.gff2tbl(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GFF3 format to NCBI TBL format.

Will parse GFF3 annotation format into GFFtk annotation dictionary and then write to NCBI TBL output. Default is to write to stdout.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – annotation file in NCBI TBL format
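
For readers unfamiliar with the NCBI feature table layout, the sketch below prints a minimal example: a >Feature header per sequence, tab-separated start, stop, and feature key lines, and qualifier lines indented by three tabs. It only illustrates the file layout and is not GFFtk's writer; tbl_gene_lines and the locus tags are hypothetical.

```python
def tbl_gene_lines(start, end, strand, locus_tag):
    """Format one gene feature; start and end are 1-based with start <= end."""
    # TBL has no strand column: minus-strand features list the larger coordinate first.
    first, second = (end, start) if strand == "-" else (start, end)
    return [
        f"{first}\t{second}\tgene",
        f"\t\t\tlocus_tag\t{locus_tag}",
    ]


lines = [">Feature contig1"]
lines += tbl_gene_lines(1, 9, "+", "ABC_0001")
lines += tbl_gene_lines(20, 40, "-", "ABC_0002")
print("\n".join(lines))
```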

gfftk.convert.gff2transcripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GFF3 format to transcript FASTA format.

Will parse GFF3 format into GFFtk annotation dictionary and then write transcripts in FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GFF3 format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – transcripts in FASTA format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value

gfftk.convert.gtf2cdstranscripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GTF format to CDS transcript [no UTRs] FASTA format.

Will parse GTF format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – CDS transcripts in FASTA format

gfftk.convert.gtf2gbff(gtf, fasta, output=False, table=1, organism=False, strain=False, debug=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])

Convert GTF format to GenBank format.

Will parse GTF format into GFFtk annotation dictionary and then write to GenBank output.

Parameters:
  • gtf (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – annotation file in GenBank format

gfftk.convert.gtf2gff(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GTF format to GFF3 format.

Will parse GTF format into GFFtk annotation dictionary and then write to GFF3 output. Only coding genes are output with this method. Default is to write to stdout.

Parameters:
  • gff (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – annotation file in GFF3 format

gfftk.convert.gtf2proteins(gff, fasta, output=False, table=1, strip_stop=False, debug=False, grep=[], grepv=[])

Convert GTF format to translated protein FASTA format.

Will parse GTF format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • strip_stop (bool, default=False) – remove stop codons (*) from translation

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format

gfftk.convert.gtf2tbl(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GTF format to NCBI TBL format.

Will parse GTF annotation format into GFFtk annotation dictionary and then write to NCBI TBL output. Default is to write to stdout.

Parameters:
  • gff (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – annotation file in NCBI TBL format

gfftk.convert.gtf2transcripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])

Convert GTF format to transcript FASTA format.

Will parse GTF format into GFFtk annotation dictionary and then write transcripts in FASTA format.

Parameters:
  • gff (filename) – genome annotation text file in GTF format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • output (str, default=sys.stdout) – transcripts in FASTA format

gfftk.convert.tbl2cdstranscripts(tbl, fasta, output=False, table=1, grep=[], grepv=[])

Convert NCBI TBL format to CDS transcript [no UTRs] FASTA format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – CDS transcripts in FASTA format

gfftk.convert.tbl2gbff(tbl, fasta, output=False, table=1, organism=False, strain=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])

Convert NCBI TBL format to GenBank format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GenBank output.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – annotation file in GenBank format

gfftk.convert.tbl2gff3(tbl, fasta, output=False, table=1, grep=[], grepv=[])

Convert NCBI TBL format to GFF3 format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GFF3 output. Default is to write to stdout.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – annotation file in GFF3 format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value

gfftk.convert.tbl2gtf(tbl, fasta, output=False, table=1, grep=[], grepv=[])

Convert NCBI TBL format to GTF format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GTF output. Only coding genes are output with this method. Default is to write to stdout.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – annotation file in GTF format

  • grep (list, default=[]) – keep only gene models that match a filter, given as key:value

  • grepv (list, default=[]) – remove gene models that match a filter, given as key:value

gfftk.convert.tbl2proteins(tbl, fasta, output=False, table=1, strip_stop=False, grep=[], grepv=[])

Convert NCBI TBL format to translated protein FASTA format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • strip_stop (bool, default=False) – remove stop codons (*) from translation

  • output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format

gfftk.convert.tbl2transcripts(tbl, fasta, output=False, table=1, grep=[], grepv=[])

Convert NCBI TBL format to transcript FASTA format.

Will parse NCBI TBL format into GFFtk annotation dictionary and then write transcripts in FASTA format.

Parameters:
  • tbl (filename) – genome annotation text file in NCBI tbl format

  • fasta (filename) – genome sequence in FASTA format

  • table (int, default=1) – codon table [1]

  • output (str, default=sys.stdout) – transcripts in FASTA format
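
Because the function names follow an input-to-output pattern, a caller can choose a converter from a pair of format names. The sketch below builds a lookup table from the functions listed in this module. CONVERTERS, converter_name, and convert are hypothetical helpers, not part of GFFtk; convert needs gfftk installed, and it assumes that output accepts a file path, as the str type in the parameter lists suggests.

```python
import importlib

# Function names as listed in this module, keyed by input and output format.
CONVERTERS = {
    "gff3": {
        "cdstranscripts": "gff2cdstranscripts", "combined": "gff2combined",
        "gbff": "gff2gbff", "gff3": "gff2gff3", "gtf": "gff2gtf",
        "proteins": "gff2proteins", "tbl": "gff2tbl",
        "transcripts": "gff2transcripts",
    },
    "gtf": {
        "cdstranscripts": "gtf2cdstranscripts", "gbff": "gtf2gbff",
        "gff3": "gtf2gff", "proteins": "gtf2proteins", "tbl": "gtf2tbl",
        "transcripts": "gtf2transcripts",
    },
    "tbl": {
        "cdstranscripts": "tbl2cdstranscripts", "gbff": "tbl2gbff",
        "gff3": "tbl2gff3", "gtf": "tbl2gtf", "proteins": "tbl2proteins",
        "transcripts": "tbl2transcripts",
    },
}


def converter_name(in_fmt, out_fmt):
    """Return the gfftk.convert function name for a format pair."""
    try:
        return CONVERTERS[in_fmt][out_fmt]
    except KeyError:
        raise ValueError(f"no converter from {in_fmt} to {out_fmt}") from None


def convert(in_fmt, out_fmt, annotation, fasta, output):
    """Call the matching converter; gfftk is imported only when this runs."""
    module = importlib.import_module("gfftk.convert")
    func = getattr(module, converter_name(in_fmt, out_fmt))
    # Annotation and FASTA are the first two positional arguments of every converter.
    return func(annotation, fasta, output=output)


print(converter_name("gtf", "gff3"))  # gtf2gff
```

With gfftk installed, a call such as convert("gff3", "proteins", "genome.gff3", "genome.fasta", "proteins.fasta") would dispatch to gff2proteins; the file names here are placeholders.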