GFF
A module for parsing and generating Generic Feature Format version 3 (GFF3) files.
- gfftk.gff.dict2combined_gff_fasta(annotation_dict, fasta_dict, output=False, debug=False, source=False)
Write GFFtk annotation dictionary and FASTA sequences to combined GFF3+FASTA format.
- Parameters:
annotation_dict (dict) – GFFtk standardized annotation dictionary
fasta_dict (dict) – Dictionary of sequences keyed by contig name
output (str or file handle, default=False) – Output file path or handle. If False, writes to stdout
debug (bool, default=False) – Print debug information
source (str, default=False) – Override source field in GFF3 output
- Return type:
None
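To illustrate the combined layout, the sketch below writes annotation lines first, then a `##FASTA` directive, then the sequences. It is not GFFtk's implementation: `write_combined` and its arguments are hypothetical names, and the GFF3 lines are assumed to be formatted already.

```python
import io


def write_combined(gff_lines, fasta_dict, handle, width=80):
    """Write pre-formatted GFF3 lines, a ##FASTA directive, then sequences.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    handle.write("##gff-version 3\n")
    for line in gff_lines:
        handle.write(line.rstrip("\n") + "\n")
    # The ##FASTA directive separates the annotation from the sequence data
    handle.write("##FASTA\n")
    for contig, seq in fasta_dict.items():
        handle.write(f">{contig}\n")
        # Wrap each sequence at a fixed line width
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


buffer = io.StringIO()
write_combined(
    ["contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1"],
    {"contig1": "ATGAAATAG"},
    buffer,
)
combined_text = buffer.getvalue()
```

Writing to a handle rather than a path keeps the sketch usable with stdout, an open file, or an in-memory buffer.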
- gfftk.gff.dict2gff3(infile, output=False, debug=False, source=False, newline=False, url_encode=False)
Convert GFFtk standardized annotation dictionary to GFF3 file.
The input is an annotation dictionary generated by gff2dict or tbl2dict; this function writes it out in GFF3 format.
- Parameters:
infile (dict of dict) – standardized annotation dictionary keyed by locus_tag
output (str, default=False) – output annotation file in GFF3 format; when False, output is written to sys.stdout
debug (bool, default=False) – print debug information to stderr
source (str, default=False) – override source field in GFF3 output
newline (bool, default=False) – add newline after each gene
url_encode (bool, default=False) – URL encode attribute values for downstream tool compatibility
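As a rough picture of what URL encoding attribute values means, the sketch below percent-encodes characters that are reserved in GFF3 column 9 (such as `;`, `=`, `,` and `&`) using the standard library. It is an assumption about the general technique, not a copy of what `url_encode=True` does inside GFFtk; `format_attributes` is a hypothetical helper that treats every value as a single string.

```python
from urllib.parse import quote


def format_attributes(attrs):
    """Build a GFF3 attribute column from a dict, percent-encoding each value.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    parts = []
    for key, value in attrs.items():
        # Spaces are left readable; reserved characters are escaped as %XX
        parts.append(f"{key}={quote(str(value), safe=' ')}")
    return ";".join(parts)


column9 = format_attributes(
    {"ID": "gene1", "Note": "kinase; ATP-binding, putative"}
)
```

Without encoding, the semicolon inside the note would be read by a downstream parser as the start of a new attribute.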
- gfftk.gff.dict2gff3alignments(infile, output=False, debug=False, alignments='transcript', source=False, newline=False)
Convert GFFtk standardized annotation dictionary to GFF3 alignments file.
The input is an annotation dictionary generated by gff2dict or tbl2dict. The output format is GFF3 alignment, also known as EVM evidence format.
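GFF3 alignment records conventionally carry a `Target` attribute giving the aligned sequence name and its start and end coordinates. The sketch below shows that shape for a spliced transcript alignment. It is an illustration under assumptions, not GFFtk's code: `alignment_lines` is a hypothetical helper, the `cDNA_match` feature type is assumed, exons must be supplied in transcript order, and each segment is treated as ungapped.

```python
def alignment_lines(contig, source, align_id, target, exons, strand="+"):
    """Format exon segments as GFF3 alignment lines sharing one ID.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    lines = []
    offset = 1  # 1-based position within the aligned target sequence
    for start, end in exons:
        length = end - start + 1
        attrs = f"ID={align_id};Target={target} {offset} {offset + length - 1}"
        fields = [contig, source, "cDNA_match", str(start), str(end),
                  ".", strand, ".", attrs]
        lines.append("\t".join(fields))
        # The next segment continues where this one ended on the target
        offset += length
    return lines


evidence = alignment_lines(
    "contig1", "example", "align1", "mRNA1", [(101, 200), (301, 350)]
)
```

All segments of one alignment share the same ID, which is how a consumer can chain them back into a single spliced match.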
- gfftk.gff.dict2gtf(infile, output=False, source=False)
Convert GFFtk standardized annotation dictionary to GTF file.
The input is an annotation dictionary generated by gff2dict or tbl2dict; this function writes it out in GTF format. Note that only protein-coding CDS features are written.
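GTF differs from GFF3 mainly in the ninth column, where attributes are written as `key "value";` pairs and every line carries `gene_id` and `transcript_id`. A minimal sketch of one CDS line follows; `gtf_cds_line` is a hypothetical helper and the identifiers are made up for illustration, so this should not be read as GFFtk's output routine.

```python
def gtf_cds_line(contig, source, start, end, strand, phase,
                 gene_id, transcript_id):
    """Format a single CDS feature as a tab-delimited GTF line.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    # GTF attributes are quoted and each pair ends with a semicolon
    attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    fields = [contig, source, "CDS", str(start), str(end),
              ".", strand, str(phase), attrs]
    return "\t".join(fields)


cds_line = gtf_cds_line("contig1", "example", 1, 9, "+", 0,
                        "gene1", "gene1-T1")
```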
- gfftk.gff.gff2dict(gff, fasta, annotation=False, table=1, debug=False, gap_filter=False, gff_format='auto', logger=sys.stderr.write)
Convert GFF3 and FASTA to standardized GFFtk dictionary format.
An annotation file in GFF3 format and a genome FASTA file are parsed. The result is a dictionary keyed by locus_tag (gene name) whose values are nested dictionaries containing feature information.
- Parameters:
gff (filename : str) – annotation text file in GFF3 format
fasta (filename : str) – genome text file in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
gap_filter (bool, default=False) – remove gene models that span gaps in sequence
logger (handle, default=sys.stderr.write) – where to log messages to
- Returns:
annotation – standardized annotation dictionary (OrderedDict) keyed by locus_tag
- Return type:
OrderedDict
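The idea of a dictionary keyed by gene with nested feature information can be sketched with the standard library alone. The example below is a simplification under stated assumptions: `gff3_to_nested` is a hypothetical function, the nested keys (`contig`, `location`, `strand`, `mRNA`, `CDS`) are chosen for illustration and are not the GFFtk schema, no FASTA or translation is involved, and parent features are assumed to appear before their children.

```python
from collections import OrderedDict

GFF3_TEXT = """##gff-version 3
contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1
contig1\texample\tmRNA\t1\t9\t.\t+\t.\tID=gene1-T1;Parent=gene1
contig1\texample\tCDS\t1\t9\t.\t+\t0\tID=cds1;Parent=gene1-T1
"""


def parse_attributes(column9):
    """Split 'key=value;key=value' into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def gff3_to_nested(text):
    """Group gene, mRNA and CDS features under their gene ID.

    Hypothetical function for illustration only; not part of gfftk.
    """
    genes = OrderedDict()
    transcript_to_gene = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        contig, _, ftype, start, end, _, strand, _, col9 = line.split("\t")
        attrs = parse_attributes(col9)
        span = (int(start), int(end))
        if ftype == "gene":
            genes[attrs["ID"]] = {"contig": contig, "location": span,
                                  "strand": strand, "mRNA": [], "CDS": []}
        elif ftype == "mRNA":
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]
            genes[attrs["Parent"]]["mRNA"].append(span)
        elif ftype == "CDS":
            # CDS points at a transcript, which points at a gene
            genes[transcript_to_gene[attrs["Parent"]]]["CDS"].append(span)
    return genes


annotation = gff3_to_nested(GFF3_TEXT)
```

Keying by gene means every transcript and CDS segment for a locus can be reached with a single lookup, which is what makes the dictionary convenient as an interchange structure between the converters in this module.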
- gfftk.gff.gtf2dict(gtf, fasta, annotation=False, table=1, debug=False, gap_filter=False, gtf_format='auto', logger=sys.stderr.write)
Convert GTF and FASTA to standardized GFFtk dictionary format.
An annotation file in GTF format and a genome FASTA file are parsed. The result is a dictionary keyed by locus_tag (gene name) whose values are nested dictionaries containing feature information.
- Parameters:
gtf (filename : str) – annotation text file in GTF format
fasta (filename : str) – genome text file in FASTA format
table (int, default=1) – codon table [1]
debug (bool, default=False) – print debug information to stderr
gap_filter (bool, default=False) – remove gene models that span gaps in sequence
logger (handle, default=sys.stderr.write) – where to log messages to
- Returns:
annotation – standardized annotation dictionary (OrderedDict) keyed by locus_tag
- Return type:
OrderedDict
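Parsing GTF starts with its attribute column, which uses quoted `key "value";` pairs rather than the `key=value` form of GFF3. The sketch below extracts those pairs with a regular expression. `parse_gtf_attributes` is a hypothetical helper, not a GFFtk function, and it only handles quoted values; unquoted values that some GTF dialects allow are ignored.

```python
import re

# One attribute: a key, a space, a double-quoted value, then a semicolon
GTF_ATTR = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(column9):
    """Return the quoted key/value pairs of a GTF attribute column.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    return dict(GTF_ATTR.findall(column9))


gtf_attrs = parse_gtf_attributes('gene_id "gene1"; transcript_id "gene1-T1";')
```

The `gene_id` value recovered this way is the natural grouping key when collecting features under one locus.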
- gfftk.gff.is_combined_gff_fasta(filename)
Check if a file contains both GFF3 and FASTA data.
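One plausible way to make such a check is to look for nine-column annotation lines followed by a `##FASTA` directive. The sketch below does exactly that. The criteria GFFtk actually applies are not stated here, so `looks_combined` is a hypothetical stand-in rather than a reimplementation.

```python
import os
import tempfile


def looks_combined(filename):
    """Return True if annotation lines are followed by a ##FASTA directive.

    Hypothetical helper for illustration only; not part of gfftk.
    """
    has_gff = False
    with open(filename) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                # Combined only if annotation lines came before the directive
                return has_gff
            if line.strip() and not line.startswith("#") and line.count("\t") == 8:
                has_gff = True
    return False


GENE_LINE = "contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.gff3")
    with open(path, "w") as out:
        out.write("##gff-version 3\n" + GENE_LINE + "##FASTA\n>contig1\nATGAAATAG\n")
    combined = looks_combined(path)
    with open(path, "w") as out:
        out.write("##gff-version 3\n" + GENE_LINE)
    gff_only = looks_combined(path)
```

Stopping at the directive avoids reading the sequence portion, which is usually the bulk of a combined file.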