pybioinfo-utils is a collection of small Python functions designed for bioinformatics applications, particularly focused on protein sequence processing. These functions aim to simplify common tasks such as sequence manipulation, file format conversion, and sequence analysis.
This script contains a function that removes dash characters ("-") from protein sequences in a FASTA file and writes the cleaned sequences to a new file.
remove_gaps(input_file, output_file)
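The following is a minimal sketch of what remove_gaps might do. It is not the repository's implementation. It assumes plain FASTA input in which wrapped sequence lines are joined, and it writes each sequence on a single line. The `read_fasta` helper is hypothetical.

```python
def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def remove_gaps(input_file, output_file):
    """Write a copy of input_file with every '-' removed from the sequences."""
    with open(output_file, "w") as out:
        for header, seq in read_fasta(input_file):
            out.write(f">{header}\n{seq.replace('-', '')}\n")
```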
This script contains a function that converts lowercase characters to uppercase and replaces dots (".") with dashes ("-") in the protein sequences of a FASTA file, then writes the result to a new file.
fasta_to_uppercase_and_dashes(input_file, output_file)
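This conversion is useful for alignments that mark insert positions with lowercase letters and dots, as formats such as A2M do. The sketch below is hypothetical and not the repository's code. It assumes FASTA input and applies both transformations to every sequence line. The `read_fasta` helper is also hypothetical.

```python
def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def fasta_to_uppercase_and_dashes(input_file, output_file):
    """Uppercase every residue and turn '.' gap symbols into '-'."""
    with open(output_file, "w") as out:
        for header, seq in read_fasta(input_file):
            out.write(f">{header}\n{seq.upper().replace('.', '-')}\n")
```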
This script contains a function that plots the amino acid distribution at each position of a multiple sequence alignment (MSA) of protein sequences read from the given FASTA file.
aa_distribution(fasta_file)
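A plot like this depends on per-column residue frequencies. The sketch below computes only that counting step and leaves out plotting so that it has no dependencies. The function name `column_frequencies` and the `read_fasta` helper are hypothetical. Gap characters are counted like any other symbol.

```python
from collections import Counter


def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def column_frequencies(fasta_file):
    """Return one dict per alignment column mapping symbol -> fraction of sequences."""
    seqs = [seq for _, seq in read_fasta(fasta_file)]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (lengths differ)")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({aa: n / len(seqs) for aa, n in counts.items()})
    return freqs
```

Each dict in the returned list could then be drawn as one bar of a stacked bar chart, for example with matplotlib.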
This Python script contains two functions: one converts a FASTA file to Clustal format, and the other converts a directory of FASTA files to Clustal format files.
fasta_to_clustal(input_fasta, output_clustal)
convert_directory(input_dir, output_dir)
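Here is a simplified sketch of both conversions, assuming aligned FASTA input. It writes a "CLUSTAL" header line followed by blocks of 60 columns and omits the conservation line. Check that this output is accepted by your downstream tool. The `block` and `suffixes` parameters, the `.aln` output extension, and the `read_fasta` helper are all hypothetical choices, not the repository's API.

```python
import os


def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def fasta_to_clustal(input_fasta, output_clustal, block=60):
    """Write an aligned FASTA file as a simplified Clustal file (no conservation line)."""
    records = read_fasta(input_fasta)
    # Clustal uses a single name per row; keep the first word of each header.
    names = [(h.split() or ["unnamed"])[0] for h, _ in records]
    width = max((len(n) for n in names), default=0) + 4
    length = max((len(s) for _, s in records), default=0)
    with open(output_clustal, "w") as out:
        out.write("CLUSTAL W multiple sequence alignment\n\n")
        for start in range(0, length, block):
            out.write("\n")  # blank line before each block of columns
            for name, (_, seq) in zip(names, records):
                out.write(f"{name:<{width}}{seq[start:start + block]}\n")


def convert_directory(input_dir, output_dir, suffixes=(".fasta", ".fa")):
    """Convert every FASTA file in input_dir to a .aln file in output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    for fname in sorted(os.listdir(input_dir)):
        stem, ext = os.path.splitext(fname)
        if ext.lower() in suffixes:
            fasta_to_clustal(os.path.join(input_dir, fname),
                             os.path.join(output_dir, stem + ".aln"))
```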
This Python script contains two functions: one converts a file in seq format to FASTA format, and the other converts a directory of seq files to FASTA files.
convert_seq_to_fasta(input_file, output_file)
batch_convert_seq_to_fasta(input_dir, output_dir)
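The README does not define "seq format". The sketch below assumes that a .seq file holds one raw sequence with no header, possibly wrapped across lines, and it uses the file name stem as the FASTA identifier. That is a guess and not the repository's behaviour. The `width` parameter is hypothetical.

```python
import os


def convert_seq_to_fasta(input_file, output_file, width=60):
    """Assumes input_file holds one raw sequence; the file stem becomes the FASTA ID."""
    with open(input_file) as handle:
        seq = "".join(handle.read().split())  # drop all whitespace and line breaks
    seq_id = os.path.splitext(os.path.basename(input_file))[0]
    with open(output_file, "w") as out:
        out.write(f">{seq_id}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


def batch_convert_seq_to_fasta(input_dir, output_dir):
    """Convert every *.seq file in input_dir to a .fasta file in output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    for fname in sorted(os.listdir(input_dir)):
        stem, ext = os.path.splitext(fname)
        if ext.lower() == ".seq":
            convert_seq_to_fasta(os.path.join(input_dir, fname),
                                 os.path.join(output_dir, stem + ".fasta"))
```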
This Python script contains a function that counts the number of sequences in a FASTA file.
sequence_count = count_sequences_in_fasta(fasta_file)
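Each FASTA record begins with exactly one header line that starts with ">". Counting those lines is therefore enough. This is a minimal sketch and may not match the repository's implementation.

```python
def count_sequences_in_fasta(fasta_file):
    """Count records by counting header lines that start with '>'."""
    with open(fasta_file) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```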
This Python script contains a function that calculates consensus sequences from the multiple sequence alignments in a directory and writes them to an output file.
find_consensus(input_dir, output_file)
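The README does not describe the consensus rule. The sketch below uses a simple majority rule, which may differ from the repository's method: each column contributes its most common symbol, and columns where "-" is the most common symbol are dropped. It assumes that every regular file in input_dir is an aligned FASTA file and writes one record per file. The `majority_consensus` and `read_fasta` helpers are hypothetical.

```python
import os
from collections import Counter


def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def majority_consensus(seqs):
    """Most common symbol per column; gap-majority columns are omitted."""
    length = max(len(s) for s in seqs)
    out = []
    for i in range(length):
        column = [s[i] if i < len(s) else "-" for s in seqs]
        symbol, _ = Counter(column).most_common(1)[0]  # ties: first symbol seen wins
        if symbol != "-":
            out.append(symbol)
    return "".join(out)


def find_consensus(input_dir, output_file):
    """Write one consensus record per alignment file found in input_dir."""
    with open(output_file, "w") as out:
        for fname in sorted(os.listdir(input_dir)):
            path = os.path.join(input_dir, fname)
            if not os.path.isfile(path):
                continue
            seqs = [s for _, s in read_fasta(path)]
            if seqs:
                stem = os.path.splitext(fname)[0]
                out.write(f">{stem}_consensus\n{majority_consensus(seqs)}\n")
```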
This Python script contains a function that trims or pads each sequence in the input multiple sequence alignment so that every sequence has the maximum length.
make_seq_same_length(input_file, output_file)
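One way to support both trimming and padding is shown below, as a sketch only. A hypothetical `target_length` parameter defaults to the length of the longest sequence. Longer sequences are truncated to it, and shorter ones are padded on the right with "-". Both the parameter and the right-padding choice are assumptions, as is the `read_fasta` helper.

```python
def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def make_seq_same_length(input_file, output_file, target_length=None, gap="-"):
    """Trim or right-pad every sequence to target_length (default: the longest)."""
    records = read_fasta(input_file)
    if target_length is None:
        target_length = max((len(s) for _, s in records), default=0)
    with open(output_file, "w") as out:
        for header, seq in records:
            fixed = seq[:target_length].ljust(target_length, gap)
            out.write(f">{header}\n{fixed}\n")
```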
This function converts PFAM alignment files to FASTA format.
pfam2fasta(input_file, output_file)
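Pfam distributes its alignments in Stockholm format. The sketch below assumes that format and is not necessarily what pfam2fasta does internally. Lines starting with "#" are annotation, "//" ends the alignment, and every other line is "name sequence". Sequences split across interleaved blocks are concatenated in order.

```python
def pfam2fasta(input_file, output_file):
    """Convert a Stockholm-format alignment to FASTA, keeping the original gap symbols."""
    seqs = {}  # dicts keep first-seen order, so sequences stay in file order
    with open(input_file) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and #=GF/#=GS/#=GR/#=GC annotation
            if line == "//":
                break  # end of this alignment
            parts = line.split()
            if len(parts) < 2:
                continue
            name, chunk = parts[0], "".join(parts[1:])
            seqs[name] = seqs.get(name, "") + chunk
    with open(output_file, "w") as out:
        for name, seq in seqs.items():
            out.write(f">{name}\n{seq}\n")
```

If Biopython is installed, `Bio.AlignIO.convert(input_file, "stockholm", output_file, "fasta")` is an alternative.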
This Python script contains three functions: one converts an input multiple sequence alignment in FASTA format to a nested list, another removes alignment columns with a high gap frequency, and the third writes the resulting multiple sequence alignment to a FASTA file.
fasta_to_nested_list(input_file)
remove_columns_with_high_gap_frequency(nested_list, threshold)
write_to_fasta(output_file, data)
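The sketch below shows how the three steps could fit together. The README does not specify the nested-list layout, so this example assumes `[[header, [residue, ...]], ...]`. It also assumes a column is removed when its fraction of "-" is strictly greater than `threshold`. Both choices may differ from the repository, and `read_fasta` is a hypothetical helper.

```python
def read_fasta(path):
    """Hypothetical helper: return (header, sequence) pairs; wrapped lines are joined."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(r) for r in records]


def fasta_to_nested_list(input_file):
    """Return [[header, [residue, ...]], ...], one inner list per sequence."""
    return [[header, list(seq)] for header, seq in read_fasta(input_file)]


def remove_columns_with_high_gap_frequency(nested_list, threshold):
    """Drop columns whose fraction of '-' is strictly greater than threshold."""
    if not nested_list:
        return []
    lengths = {len(residues) for _, residues in nested_list}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length, n_seqs = lengths.pop(), len(nested_list)
    keep = [i for i in range(length)
            if sum(row[1][i] == "-" for row in nested_list) / n_seqs <= threshold]
    return [[header, [residues[i] for i in keep]] for header, residues in nested_list]


def write_to_fasta(output_file, data):
    """Write [[header, residues], ...] back out as FASTA."""
    with open(output_file, "w") as out:
        for header, residues in data:
            out.write(f">{header}\n{''.join(residues)}\n")
```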
To use these functions, import them into your Python script or interactive session and pass the input parameters listed in each function signature above.