Usage
Command-Line Interface
gfftk provides a command-line interface with several subcommands for working with GFF3 files.
Basic Usage
gfftk [subcommand] [options]
Available Subcommands
consensus: Generate consensus gene models from multiple sourcesconvert: Convert between file formatssort: Sort GFF3 filessanitize: Clean up GFF3 filesrename: Rename features in GFF3 filesstats: Generate statistics about GFF3 filescompare: Compare GFF3 files
Detailed Command Options
consensus
gfftk consensus -h
usage: gfftk consensus [-h] -i GFF3 [GFF3 ...] -f FASTA [-o OUTPUT] [-w WEIGHTS] [-t THRESHOLD] [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3 [GFF3 ...], --input GFF3 [GFF3 ...]
Input GFF3 file(s)
-f FASTA, --fasta FASTA
Genome FASTA file
-o OUTPUT, --output OUTPUT
Output GFF3 file
-w WEIGHTS, --weights WEIGHTS
JSON file with weights for each input
-t THRESHOLD, --threshold THRESHOLD
Score threshold for consensus
--no-progress Disable progress bar
--debug Enable debug logging
convert
gfftk convert -h
usage: gfftk convert [-h] -i GFF3 -o OUTPUT -f {gtf,bed,tbl,proteins,transcripts} [-g GENOME] [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-o OUTPUT, --output OUTPUT
Output file
-f {gtf,bed,tbl,proteins,transcripts}, --format {gtf,bed,tbl,proteins,transcripts}
Output format
-g GENOME, --genome GENOME
Genome FASTA file (required for some formats)
--no-progress Disable progress bar
--debug Enable debug logging
sort
gfftk sort -h
usage: gfftk sort [-h] -i GFF3 -o OUTPUT [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-o OUTPUT, --output OUTPUT
Output GFF3 file
--no-progress Disable progress bar
--debug Enable debug logging
sanitize
gfftk sanitize -h
usage: gfftk sanitize [-h] -i GFF3 -o OUTPUT [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-o OUTPUT, --output OUTPUT
Output GFF3 file
--no-progress Disable progress bar
--debug Enable debug logging
rename
gfftk rename -h
usage: gfftk rename [-h] -i GFF3 -o OUTPUT -p PREFIX [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-o OUTPUT, --output OUTPUT
Output GFF3 file
-p PREFIX, --prefix PREFIX
Prefix for gene IDs
--no-progress Disable progress bar
--debug Enable debug logging
stats
gfftk stats -h
usage: gfftk stats [-h] -i GFF3 [-o OUTPUT] [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-o OUTPUT, --output OUTPUT
Output file for statistics
--no-progress Disable progress bar
--debug Enable debug logging
compare
gfftk compare -h
usage: gfftk compare [-h] -i GFF3 -c GFF3 -f FASTA [-o OUTPUT] [--no-progress] [--debug]
options:
-h, --help show this help message and exit
-i GFF3, --input GFF3
Input GFF3 file
-c GFF3, --compare GFF3
GFF3 file to compare against
-f FASTA, --fasta FASTA
Genome FASTA file
-o OUTPUT, --output OUTPUT
Output file for comparison results
--no-progress Disable progress bar
--debug Enable debug logging
Examples
Generate consensus gene models:
gfftk consensus -i input1.gff3 input2.gff3 -f genome.fasta -o consensus.gff3
Convert a GFF3 file to GTF format:
gfftk convert -i input.gff3 -o output.gtf -f gtf
Sort a GFF3 file:
gfftk sort -i input.gff3 -o sorted.gff3
Generate statistics about a GFF3 file:
gfftk stats -i input.gff3
Compare two GFF3 files:
gfftk compare -i input1.gff3 -c input2.gff3 -o comparison.txt
Python API
gfftk can also be used as a Python library. The library provides a comprehensive set of functions for working with GFF3 files.
Core Modules
gfftk.gff: Functions for parsing, manipulating, and writing GFF3 filesgfftk.consensus: Functions for generating consensus gene modelsgfftk.convert: Functions for converting between file formatsgfftk.compare: Functions for comparing GFF3 filesgfftk.fasta: Functions for working with FASTA filesgfftk.genbank: Functions for working with GenBank files
Basic Usage
import gfftk
# Parse a GFF3 file
gff_dict = gfftk.gff.gff2dict("input.gff3", "genome.fasta")
# Modify the GFF3 data
# ...
# Write the modified data back to a GFF3 file
gfftk.gff.dict2gff3(gff_dict, output="output.gff3")
GFF3 Data Structure
The GFF3 data structure used by gfftk is a nested dictionary with the following structure:
{
"gene_id": {
"type": "gene",
"contig": "contig_name",
"location": [start, end],
"strand": "+" or "-",
"source": "source_name",
"score": score_value,
"phase": phase_value,
"ID": "gene_id",
"Name": "gene_name",
# Other attributes...
"mRNA": [
{
"type": "mRNA",
"id": "mrna_id",
"parent": "gene_id",
"exon": [[start1, end1], [start2, end2], ...],
"CDS": [[start1, end1], [start2, end2], ...],
# Other features and attributes...
},
# More mRNAs...
],
},
# More genes...
}
Examples
Parse a GFF3 file and extract gene information:
import gfftk
# Parse a GFF3 file
gff_dict = gfftk.gff.gff2dict("input.gff3", "genome.fasta")
# Print information about each gene
for gene_id, gene in gff_dict.items():
print(f"Gene ID: {gene_id}")
print(f"Location: {gene['contig']}:{gene['location'][0]}-{gene['location'][1]}")
print(f"Strand: {gene['strand']}")
print(f"Number of mRNAs: {len(gene.get('mRNA', []))}")
print()
Convert a GFF3 file to BED format:
import gfftk
# Convert GFF3 to BED
gfftk.convert.gff2bed("input.gff3", "output.bed")
Generate consensus gene models:
import gfftk
# Generate consensus gene models
consensus = gfftk.consensus.consensus_gene_models(
["input1.gff3", "input2.gff3"],
"genome.fasta",
weights={"input1": 1, "input2": 2},
threshold=3,
)
# Write the consensus gene models to a GFF3 file
gfftk.gff.dict2gff3(consensus, output="consensus.gff3")