GFF

A module for parsing and generating files in Generic Feature Format version 3 (GFF3).

gfftk.gff.dict2combined_gff_fasta(annotation_dict, fasta_dict, output=False, debug=False, source=False)

Write GFFtk annotation dictionary and FASTA sequences to combined GFF3+FASTA format.

Parameters:
  • annotation_dict (dict) – GFFtk standardized annotation dictionary

  • fasta_dict (dict) – Dictionary of sequences keyed by contig name

  • output (str or file handle, default=False) – Output file path or handle. If False, writes to stdout

  • debug (bool, default=False) – Print debug information

  • source (str, default=False) – Override source field in GFF3 output

Return type:

None
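In the combined layout that dict2combined_gff_fasta targets, the feature lines come first and the sequences follow a ##FASTA directive. The sketch below shows that layout using plain lists instead of a GFFtk annotation dictionary. The helper name write_combined_gff_fasta and the 60-character line wrapping are assumptions made for illustration. This is not GFFtk's implementation.

```python
import io


def write_combined_gff_fasta(gff_lines, sequences, handle):
    """Write GFF3 feature lines, then a ##FASTA section (hypothetical helper)."""
    handle.write("##gff-version 3\n")
    for line in gff_lines:
        handle.write(line.rstrip("\n") + "\n")
    # Everything after the ##FASTA directive is read as FASTA, not features.
    handle.write("##FASTA\n")
    for name, seq in sequences.items():
        handle.write(f">{name}\n")
        # Wrap each sequence at 60 characters per line (an assumed width).
        for i in range(0, len(seq), 60):
            handle.write(seq[i:i + 60] + "\n")


buf = io.StringIO()
write_combined_gff_fasta(
    ["contig1\tdemo\tgene\t1\t9\t.\t+\t.\tID=gene1"],
    {"contig1": "ATGAAATAG"},
    buf,
)
print(buf.getvalue())
```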

gfftk.gff.dict2gff3(infile, output=False, debug=False, source=False, newline=False, url_encode=False)

Convert GFFtk standardized annotation dictionary to GFF3 file.

Takes as input an annotation dictionary generated by gff2dict or tbl2dict and writes it in GFF3 format.

Parameters:
  • infile (dict of dict) – standardized annotation dictionary keyed by locus_tag

  • output (str, default=False) – annotation file in GFF3 format. If False, writes to sys.stdout

  • debug (bool, default=False) – print debug information to stderr

  • source (str, default=False) – override source field in GFF3 output

  • newline (bool, default=False) – add newline after each gene

  • url_encode (bool, default=False) – URL encode attribute values for downstream tool compatibility
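The url_encode option makes attribute values safe for downstream tools. In column 9, the GFF3 specification reserves ;, =, &, and , and requires percent-encoding of %, tabs, newlines, and other control characters. Below is a minimal sketch of that escaping for a single attribute value. The helper gff3_escape is hypothetical, and the exact set of characters GFFtk encodes may differ. Because commas are escaped here, a multi-valued attribute would need each value escaped before the values are joined with commas.

```python
# Characters with reserved meaning in GFF3 column 9, plus the escape character itself.
RESERVED = set(";=&,%")


def gff3_escape(value):
    """Percent-encode reserved and control characters in one attribute value (sketch)."""
    out = []
    for ch in value:
        # Reserved delimiters, %, and control characters (tab, CR, LF, DEL, ...) are encoded.
        if ch in RESERVED or ord(ch) < 32 or ord(ch) == 127:
            out.append(f"%{ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)


attrs = {"ID": "gene1", "Name": "ABC1", "product": "kinase; putative"}
col9 = ";".join(f"{key}={gff3_escape(val)}" for key, val in attrs.items())
print(col9)  # ID=gene1;Name=ABC1;product=kinase%3B putative
```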

gfftk.gff.dict2gff3alignments(infile, output=False, debug=False, alignments='transcript', source=False, newline=False)

Convert GFFtk standardized annotation dictionary to GFF3 alignments file.

Takes as input an annotation dictionary generated by gff2dict or tbl2dict. The output is in GFF3 alignment format, also known as EVidenceModeler (EVM) evidence format.

Parameters:
  • infile (dict of dict) – standardized annotation dictionary keyed by locus_tag

  • output (str, default=False) – alignment file in GFF3 format. If False, writes to sys.stdout

  • debug (bool, default=False) – print debug information to stderr
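In GFF3 alignment (EVM evidence) format, each aligned block gets its own row. All rows from one alignment share a single ID, and a Target attribute gives the matching coordinates on the aligned sequence. The sketch below turns a list of exon coordinates into rows of this kind. Several details are assumptions made for illustration: the helper name, the default cDNA_match feature type, and the + strand written in Target. The feature types GFFtk emits for each alignments setting are not documented here.

```python
def exons_to_alignment_rows(contig, source, aln_id, target, strand, exons,
                            feature_type="cDNA_match"):
    """Emit one GFF3 alignment row per exon, all sharing one ID (hypothetical helper)."""
    # Walk exons in transcript order (5' to 3'): ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    rows, offset = [], 0
    for start, end in ordered:
        # Target coordinates are cumulative positions along the aligned transcript.
        tstart, tend = offset + 1, offset + (end - start + 1)
        offset = tend
        attrs = f"ID={aln_id};Target={target} {tstart} {tend} +"
        row = [contig, source, feature_type, str(start), str(end), ".", strand, ".", attrs]
        rows.append((start, "\t".join(row)))
    # Emit rows in ascending genomic order.
    return [line for _, line in sorted(rows)]


for row in exons_to_alignment_rows("contig1", "demo", "aln1", "tx1", "-",
                                   [(100, 199), (300, 349)]):
    print(row)
```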

gfftk.gff.dict2gtf(infile, output=False, source=False)

Convert GFFtk standardized annotation dictionary to GTF file.

Takes as input an annotation dictionary generated by gff2dict or tbl2dict and writes it in GTF format. Note that only protein-coding CDS features are written.

Parameters:
  • infile (dict of dict) – standardized annotation dictionary keyed by locus_tag

  • output (str, default=False) – annotation file in GTF format. If False, writes to sys.stdout
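The main difference between a GTF line and a GFF3 line is column 9. In GTF it holds space-separated key "value" pairs, each ending in a semicolon, and gene_id and transcript_id are required. The sketch below formats CDS segments as GTF lines and computes the phase column in transcript order. The helper gtf_cds_lines is hypothetical. It assumes a complete CDS whose first codon starts at phase 0.

```python
def gtf_cds_lines(contig, source, gene_id, transcript_id, strand, cds):
    """Format CDS segments as GTF lines (hypothetical; assumes the CDS starts in phase 0)."""
    # Phase = bases to skip before the next codon starts, counted in transcript order.
    phases, coding = {}, 0
    for start, end in sorted(cds, reverse=(strand == "-")):
        phases[(start, end)] = (3 - coding % 3) % 3
        coding += end - start + 1
    attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    # Lines are written in ascending genomic order.
    return [
        "\t".join([contig, source, "CDS", str(start), str(end), ".", strand,
                   str(phases[(start, end)]), attrs])
        for start, end in sorted(cds)
    ]


for line in gtf_cds_lines("contig1", "demo", "g1", "g1-T1", "+", [(10, 109), (200, 265)]):
    print(line)
```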

gfftk.gff.gff2dict(gff, fasta, annotation=False, table=1, debug=False, gap_filter=False, gff_format='auto', logger=sys.stderr.write)

Convert GFF3 and FASTA to standardized GFFtk dictionary format.

Annotation file in GFF3 format and genome FASTA file are parsed. The result is a dictionary that is keyed by locus_tag (gene name) and the value is a nested dictionary containing feature information.

Parameters:
  • gff (str) – path to annotation text file in GFF3 format

  • fasta (str) – path to genome text file in FASTA format

  • annotation (dict of str) – existing annotation dictionary

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • gap_filter (bool, default=False) – remove gene models that span gaps in sequence

  • logger (callable, default=sys.stderr.write) – function used to write log messages

Returns:

annotation – standardized annotation dictionary (OrderedDict) keyed by locus_tag

Return type:

dict of dict
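To parse GFF3, a reader has to handle column 9. There, key=value pairs are separated by semicolons, multiple values are separated by commas, and reserved characters are percent-encoded. The ID and Parent attributes read from this column are what link mRNA, exon, and CDS rows back to their gene. The sketch below covers only this parsing step. parse_gff3_attributes is a hypothetical helper, not part of GFFtk, and it does not reproduce the layout of GFFtk's annotation dictionary.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split a GFF3 column-9 string into a dict of value lists (hypothetical helper)."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # Split on commas first, then decode %XX escapes so an encoded %2C stays in one value.
        attrs[key.strip()] = [unquote(v) for v in value.split(",")]
    return attrs


attrs = parse_gff3_attributes("ID=mrna1;Parent=gene1;Note=kinase%3B putative;Dbxref=GO:0001,GO:0002")
print(attrs["Parent"], attrs["Note"])  # ['gene1'] ['kinase; putative']
```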

gfftk.gff.gtf2dict(gtf, fasta, annotation=False, table=1, debug=False, gap_filter=False, gtf_format='auto', logger=sys.stderr.write)

Convert GTF and FASTA to standardized GFFtk dictionary format.

Annotation file in GTF format and genome FASTA file are parsed. The result is a dictionary that is keyed by locus_tag (gene name) and the value is a nested dictionary containing feature information.

Parameters:
  • gtf (str) – path to annotation text file in GTF format

  • fasta (str) – path to genome text file in FASTA format

  • annotation (dict of str) – existing annotation dictionary

  • table (int, default=1) – codon table [1]

  • debug (bool, default=False) – print debug information to stderr

  • gap_filter (bool, default=False) – remove gene models that span gaps in sequence

  • logger (callable, default=sys.stderr.write) – function used to write log messages

Returns:

annotation – standardized annotation dictionary (OrderedDict) keyed by locus_tag

Return type:

dict of dict
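GTF attributes follow a different syntax from GFF3. Each pair is a key, then a double-quoted value, then a terminating semicolon. A minimal sketch of reading that column follows. parse_gtf_attributes is hypothetical, and it handles only quoted values. Unquoted numeric values, which some GTF producers write, are skipped.

```python
import re

# A key (no whitespace or semicolons) followed by a double-quoted value.
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_gtf_attributes(column9):
    """Collect GTF key "value" pairs into a dict of value lists (hypothetical helper)."""
    attrs = {}
    for key, value in ATTR_RE.findall(column9):
        attrs.setdefault(key, []).append(value)
    return attrs


line = 'chr1\tdemo\tCDS\t10\t109\t.\t+\t0\tgene_id "g1"; transcript_id "g1-T1";'
attrs = parse_gtf_attributes(line.split("\t")[8])
print(attrs["gene_id"], attrs["transcript_id"])  # ['g1'] ['g1-T1']
```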

gfftk.gff.is_combined_gff_fasta(filename)

Check if a file contains both GFF3 and FASTA data.

Parameters:

filename (str) – Path to the file to check

Returns:

True if file contains ##FASTA directive, False otherwise

Return type:

bool
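Here is a minimal sketch of the check described above, assuming a plain-text (uncompressed) file. has_fasta_directive is a hypothetical stand-in for is_combined_gff_fasta, not its actual source.

```python
import os
import tempfile


def has_fasta_directive(path):
    """Return True if any line of the file starts with ##FASTA (hypothetical helper)."""
    with open(path) as handle:
        return any(line.startswith("##FASTA") for line in handle)


combined = "##gff-version 3\ncontig1\tdemo\tgene\t1\t9\t.\t+\t.\tID=gene1\n##FASTA\n>contig1\nATGAAATAG\n"
paths = {}
for label, text in {"combined": combined, "plain": "##gff-version 3\n"}.items():
    with tempfile.NamedTemporaryFile("w", suffix=".gff3", delete=False) as tmp:
        tmp.write(text)
    paths[label] = tmp.name

results = {label: has_fasta_directive(p) for label, p in paths.items()}
for p in paths.values():
    os.remove(p)
print(results)  # {'combined': True, 'plain': False}
```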

gfftk.gff.split_combined_gff_fasta(filename)

Split a combined GFF3+FASTA file into separate GFF3 and FASTA components.

Parameters:

filename (str) – Path to the combined file

Returns:

(gff_content, fasta_content) as file-like objects

Return type:

tuple
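One way to sketch the split is to scan for the ##FASTA directive and send lines before it to the GFF3 part and lines after it to the FASTA part. To keep the example self-contained, the hypothetical split_gff_fasta_text below takes the file contents as a string instead of a filename. It also drops the directive line itself. This section does not say whether GFFtk keeps that line.

```python
import io


def split_gff_fasta_text(text):
    """Split combined GFF3+FASTA text at the ##FASTA directive (hypothetical helper)."""
    gff_lines, fasta_lines = [], []
    target = gff_lines
    for line in text.splitlines(keepends=True):
        if line.startswith("##FASTA"):
            # The directive belongs to neither part; later lines go to the FASTA part.
            target = fasta_lines
            continue
        target.append(line)
    return io.StringIO("".join(gff_lines)), io.StringIO("".join(fasta_lines))


combined = "##gff-version 3\ncontig1\tdemo\tgene\t1\t9\t.\t+\t.\tID=gene1\n##FASTA\n>contig1\nATGAAATAG\n"
gff_handle, fasta_handle = split_gff_fasta_text(combined)
print(fasta_handle.read())
```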