Convert
A module for converting genome annotation files (GFF3, GTF, NCBI TBL) into other formats.
- gfftk.convert.gff2cdstranscripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GFF3 format to CDS transcript [no UTRs] FASTA format.
Will parse GFF3 format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – CDS transcripts in FASTA format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
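To make "CDS transcript [no UTRs]" concrete, the sketch below joins the CDS intervals of one transcript and reverse-complements the result for minus-strand models. It is a plain-Python illustration, not GFFtk's implementation; the helper name `cds_transcript` and the sample contig are hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
# Illustrative only: this helper is not part of gfftk.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_transcript(contig_seq, cds_intervals, strand):
    """Join CDS intervals (1-based, inclusive) into one coding sequence."""
    # Sort by start so the pieces are concatenated in genomic order.
    pieces = [contig_seq[start - 1:end] for start, end in sorted(cds_intervals)]
    seq = "".join(pieces)
    if strand == "-":
        # Minus-strand models are read from the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Hypothetical contig with two CDS segments separated by an intron.
contig = "CCATGGCCAAATTTTAAGG"
plus_cds = cds_transcript(contig, [(3, 8), (12, 17)], "+")
minus_cds = cds_transcript(contig, [(3, 8), (12, 17)], "-")
```

UTR and intron bases never enter the result because only the CDS intervals are sliced out of the contig.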
- gfftk.convert.gff2combined(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GFF3 and FASTA to combined GFF3+FASTA format.
Will parse GFF3 format into GFFtk annotation dictionary and then write to combined GFF3+FASTA format with both annotations and sequences.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – combined GFF3+FASTA format file
grep (list, default=[]) – filter results to only include gene models with locus_tag matching grep
grepv (list, default=[]) – filter results to exclude gene models with locus_tag matching grepv
- gfftk.convert.gff2gbff(gff, fasta, output=False, table=1, organism=False, strain=False, debug=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])
Convert GFF3 format to GenBank format.
Will parse GFF3 format into GFFtk annotation dictionary and then write to GenBank output.
- gfftk.convert.gff2gff3(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[], url_encode=False)
Convert GFF3 format to GFF3 format with filtering.
Will parse GFF3 format into GFFtk annotation dictionary, apply filtering, and then write to GFF3 output. This is useful for filtering GFF3 files. Default is to write to stdout.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – annotation file in GFF3 format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
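The keep/remove behaviour of `grep` and `grepv` can be pictured with the sketch below. It assumes, for illustration only, that gene models are held in a dictionary of attribute dictionaries and that each `key:value` filter is a case-insensitive substring match; GFFtk's actual matching rules and data structures may differ. `filter_models` and the sample records are hypothetical.

```python
# Illustrative only: not gfftk's filtering code.
def _matches(model, pattern):
    """Return True if the model's `key` field contains `value`."""
    key, _, value = pattern.partition(":")
    field = model.get(key, "")
    values = field if isinstance(field, list) else [field]
    return any(value.lower() in str(v).lower() for v in values)


def filter_models(models, grep=(), grepv=()):
    """Keep models matching any grep pattern, then drop grepv matches."""
    kept = {}
    for gene_id, model in models.items():
        if grep and not any(_matches(model, p) for p in grep):
            continue  # a keep-filter was given and this model misses it
        if any(_matches(model, p) for p in grepv):
            continue  # a remove-filter matched
        kept[gene_id] = model
    return kept


# Hypothetical gene models.
models = {
    "gene1": {"product": ["hypothetical protein"], "contig": "chr1"},
    "gene2": {"product": ["protein kinase"], "contig": "chr1"},
    "gene3": {"product": ["protein kinase"], "contig": "chr2"},
}
kinases_on_chr1 = filter_models(models, grep=["product:kinase"], grepv=["contig:chr2"])
```

With both lists empty, every model passes through unchanged, which mirrors the documented defaults.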
- gfftk.convert.gff2gtf(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GFF3 format to GTF format.
Will parse GFF3 format into GFFtk annotation dictionary and then write to GTF output. Only coding genes are output with this method. Default is to write to stdout.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – annotation file in GTF format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
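GFF3 and GTF differ mainly in the ninth (attributes) column: GFF3 writes `key=value` pairs separated by semicolons, while GTF writes `key "value";` pairs and conventionally carries `gene_id` and `transcript_id` on each feature line. The sketch below formats that column and a full GTF row; it is a generic illustration, and `gtf_attributes` and `gtf_line` are hypothetical helpers rather than GFFtk functions.

```python
# Illustrative only: not gfftk's GTF writer.
def gtf_attributes(gene_id, transcript_id, **extra):
    """Format the GTF attributes column as `key "value";` pairs."""
    pairs = {"gene_id": gene_id, "transcript_id": transcript_id, **extra}
    return " ".join(f'{key} "{value}";' for key, value in pairs.items())


def gtf_line(contig, source, feature, start, end, strand, phase, attrs):
    """Build one tab-separated GTF row (score is left as '.')."""
    cols = [contig, source, feature, str(start), str(end), ".", strand, str(phase), attrs]
    return "\t".join(cols)


attrs = gtf_attributes("gene1", "gene1-T1")
row = gtf_line("chr1", "example", "CDS", 3, 8, "+", 0, attrs)
```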
- gfftk.convert.gff2proteins(gff, fasta, output=False, table=1, strip_stop=False, debug=False, grep=[], grepv=[])
Convert GFF3 format to translated protein FASTA format.
Will parse GFF3 format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
strip_stop (bool, default=False) – remove stop codons (*) from translation
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format
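The roles of `table` and `strip_stop` can be sketched as follows. The example hard-codes NCBI translation table 1 (the standard genetic code) and assumes that stripping removes only trailing `*` characters; it is not GFFtk's translation routine, and `translate` is a hypothetical name.

```python
# Illustrative only: not gfftk's translation code.
from itertools import product

BASES = "TCAG"
# Amino acids for table 1, in TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, strip_stop=False):
    """Translate a CDS with the standard code; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = "".join(
        STANDARD_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )
    if strip_stop:
        protein = protein.rstrip("*")
    return protein


with_stop = translate("ATGGCCTTTTAA")
without_stop = translate("ATGGCCTTTTAA", strip_stop=True)
```

Supporting another codon table would mean swapping in a different amino-acid string for the same codon order.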
- gfftk.convert.gff2tbl(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GFF3 format to NCBI TBL format.
Will parse GFF3 annotation format into GFFtk annotation dictionary and then write to NCBI TBL output. Default is to write to stdout.
- gfftk.convert.gff2transcripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GFF3 format to transcript FASTA format.
Will parse GFF3 format into GFFtk annotation dictionary and then write transcripts in FASTA format.
- Parameters:
gff (filename) – genome annotation text file in GFF3 format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – transcripts in FASTA format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
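Because the GFF3-to-FASTA converters share the same leading arguments (`gff`, `fasta`, `output`), they can be driven from one loop. The sketch below is one possible wrapper, not part of GFFtk: `export_fasta_set` and the output file naming are hypothetical, and the `convert` argument exists only so the function can be exercised with a stand-in object when gfftk is not installed.

```python
# Illustrative wrapper; export_fasta_set is not a gfftk function.
import importlib


def export_fasta_set(gff, fasta, prefix, convert=None):
    """Write proteins, transcripts and CDS transcripts for one annotation."""
    if convert is None:
        # Imported lazily so the sketch can be tested without gfftk.
        convert = importlib.import_module("gfftk.convert")
    outputs = {
        "proteins": f"{prefix}.proteins.fa",
        "transcripts": f"{prefix}.transcripts.fa",
        "cdstranscripts": f"{prefix}.cds-transcripts.fa",
    }
    for kind, path in outputs.items():
        # Resolves to gff2proteins, gff2transcripts, gff2cdstranscripts.
        func = getattr(convert, f"gff2{kind}")
        func(gff, fasta, output=path)
    return outputs
```

A call such as `export_fasta_set("genome.gff3", "genome.fa", "run1")` would then write three FASTA files, assuming gfftk is importable and the input paths exist.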
- gfftk.convert.gtf2cdstranscripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GTF format to CDS transcript [no UTRs] FASTA format.
Will parse GTF format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.
- gfftk.convert.gtf2gbff(gtf, fasta, output=False, table=1, organism=False, strain=False, debug=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])
Convert GTF format to GenBank format.
Will parse GTF format into GFFtk annotation dictionary and then write to GenBank output.
- gfftk.convert.gtf2gff(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GTF format to GFF3 format.
Will parse GTF format into GFFtk annotation dictionary and then write to GFF3 output. Only coding genes are output with this method. Default is to write to stdout.
- gfftk.convert.gtf2proteins(gff, fasta, output=False, table=1, strip_stop=False, debug=False, grep=[], grepv=[])
Convert GTF format to translated protein FASTA format.
Will parse GTF format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.
- Parameters:
gff (filename) – genome annotation text file in GTF format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
strip_stop (bool, default=False) – remove stop codons (*) from translation
debug (bool, default=False) – print debug information to stderr
output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format
- gfftk.convert.gtf2tbl(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GTF format to NCBI TBL format.
Will parse GTF annotation format into GFFtk annotation dictionary and then write to NCBI TBL output. Default is to write to stdout.
- gfftk.convert.gtf2transcripts(gff, fasta, output=False, table=1, debug=False, grep=[], grepv=[])
Convert GTF format to transcript FASTA format.
Will parse GTF format into GFFtk annotation dictionary and then write transcripts in FASTA format.
- gfftk.convert.tbl2cdstranscripts(tbl, fasta, output=False, table=1, grep=[], grepv=[])
Convert NCBI TBL format to CDS transcript [no UTRs] FASTA format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write CDS transcripts in FASTA format.
- gfftk.convert.tbl2gbff(tbl, fasta, output=False, table=1, organism=False, strain=False, tmpdir='/tmp', cleanup=True, grep=[], grepv=[])
Convert NCBI TBL format to GenBank format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GenBank output.
- gfftk.convert.tbl2gff3(tbl, fasta, output=False, table=1, grep=[], grepv=[])
Convert NCBI TBL format to GFF3 format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GFF3 output. Default is to write to stdout.
- Parameters:
tbl (filename) – genome annotation text file in NCBI tbl format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
output (str, default=sys.stdout) – annotation file in GFF3 format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
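The signatures list `output=False` while the parameter descriptions give `sys.stdout` as the default. One common way to reconcile the two is to treat a falsy `output` as "write to stdout". The sketch below shows that pattern under that assumption; it is not taken from GFFtk's source, and `open_output` is a hypothetical helper.

```python
# Illustrative only: one way a falsy `output` could map to stdout.
import sys
from contextlib import contextmanager


@contextmanager
def open_output(output=False):
    """Yield a writable handle: a file if a path is given, else stdout."""
    if output:
        handle = open(output, "w")
        try:
            yield handle
        finally:
            handle.close()  # only close handles we opened ourselves
    else:
        yield sys.stdout


# With no path, the text goes to the terminal.
with open_output() as out:
    out.write("##gff-version 3\n")
```

Leaving stdout open after the `with` block matters here, since closing it would break any later printing in the same process.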
- gfftk.convert.tbl2gtf(tbl, fasta, output=False, table=1, grep=[], grepv=[])
Convert NCBI TBL format to GTF format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write to GTF output. Only coding genes are output with this method. Default is to write to stdout.
- Parameters:
tbl (filename) – genome annotation text file in NCBI tbl format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
output (str, default=sys.stdout) – annotation file in GTF format
grep (list, default=[]) – Filter gene models, keep matches. [key:value]
grepv (list, default=[]) – Filter gene models, remove matches [key:value]
- gfftk.convert.tbl2proteins(tbl, fasta, output=False, table=1, strip_stop=False, grep=[], grepv=[])
Convert NCBI TBL format to translated protein FASTA format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write protein coding translations to FASTA format.
- Parameters:
tbl (filename) – genome annotation text file in NCBI tbl format
fasta (filename) – genome sequence in FASTA format
table (int, default=1) – codon table [1]
strip_stop (bool, default=False) – remove stop codons (*) from translation
output (str, default=sys.stdout) – translated amino acids (proteins) in FASTA format
- gfftk.convert.tbl2transcripts(tbl, fasta, output=False, table=1, grep=[], grepv=[])
Convert NCBI TBL format to transcript FASTA format.
Will parse NCBI TBL format into GFFtk annotation dictionary and then write transcripts in FASTA format.