Multiple Sequence Alignment
Alignments
Alignment
- class pytrimal.Alignment
A multiple sequence alignment.
- __init__(names, sequences)
Create a new alignment with the given names and sequences.
- Parameters
Examples
Create a new alignment with a list of sequences and a list of names:
>>> alignment = Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=[
...         "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
...         "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
...         "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
...     ]
... )
There should be as many sequences as there are names, otherwise a ValueError will be raised:
>>> Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["GLQIHMMGII", "GLEINMMVII"]
... )
Traceback (most recent call last):
    ...
ValueError: `Alignment` given 3 names but 2 sequences
Sequence characters are checked, and a ValueError is raised if any character does not belong to a biological alphabet:
>>> Alignment(
...     names=[b"Sp8", b"Sp10"],
...     sequences=["GLQIHMMGII", "GLEINMM123"]
... )
Traceback (most recent call last):
    ...
ValueError: The sequence "Sp10" has an unknown (49) character
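The two checks described above can be sketched in plain Python. This is only an illustration, not pytrimal's implementation: `check_alignment` and `ALPHABET` are hypothetical names, and the accepted character set (ASCII letters plus the gap character) is an assumption that may differ from what the library actually allows.

```python
import string

# Assumption: letters and the gap character are valid; the real library's
# accepted alphabet may be larger or smaller than this.
ALPHABET = set(string.ascii_letters) | {"-"}


def check_alignment(names, sequences):
    """Hypothetical helper mirroring the two validation steps described above."""
    # 1. There must be exactly one name per sequence.
    if len(names) != len(sequences):
        raise ValueError(
            f"`Alignment` given {len(names)} names but {len(sequences)} sequences"
        )
    # 2. Every character must belong to the assumed alphabet.
    for name, sequence in zip(names, sequences):
        for char in sequence:
            if char not in ALPHABET:
                raise ValueError(
                    f'The sequence "{name.decode()}" has an unknown '
                    f"({ord(char)}) character"
                )
```

The number in the second error message is the code point of the offending character: `ord("1")` is 49, which matches the message shown in the example above.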
- copy()
Create a copy of this alignment.
- dump(file, format='fasta')
Dump the alignment to a file or a file-like object.
- Parameters
file (str, bytes, os.PathLike or file-like object) – The file to which to write the alignment. If a file-like object is given, it must be open in binary mode. Otherwise, file is treated as a path.
format (str) – The name of the alignment format to write. See below for a list of supported formats.
- Raises
ValueError – When format is not a recognized file format.
OSError – When the path given as file could not be opened.
Hint
The alignment can be written in one of the following formats:
clustal: The alignment format produced by the Clustal and Clustal Omega alignment software.
fasta: The aligned FASTA format, which outputs all sequences in the alignment as FASTA records with gap characters (see Wikipedia:FASTA format).
html: An HTML report showing the alignment in pseudo-Clustal format with colored residues.
mega: The alignment format produced by the MEGA software for evolutionary analysis of alignments.
nexus: The NEXUS alignment format (see Wikipedia:Nexus file).
phylip (or phylip40): The PHYLIP 4.0 alignment format.
phylip32: The PHYLIP 3.2 alignment format.
phylippaml: A variant of PHYLIP 4.0 compatible with the PAML tool for phylogenetic analysis.
nbrf or pir: The format of Protein Information Resource database files, provided by the National Biomedical Research Foundation.
Additionally, the fasta, nexus, phylippaml, phylip32, and phylip40 formats support an _m10 variant, which limits the sequence names to 10 characters.
New in version 0.2.2.
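To make the binary-mode requirement and the 10-character name limit concrete, here is a minimal pure-Python sketch of an aligned FASTA writer. It does not call pytrimal: `write_aligned_fasta` and `max_name_length` are hypothetical names, and treating the name limit as simple truncation is an assumption about how the _m10 variants behave.

```python
import io


def write_aligned_fasta(names, sequences, file, max_name_length=None):
    """Hypothetical writer: one FASTA record per row, gap characters kept."""
    for name, sequence in zip(names, sequences):
        if max_name_length is not None:
            # Assumption: over-long names are simply cut off.
            name = name[:max_name_length]
        # The file is binary, so everything written must be bytes.
        file.write(b">" + name + b"\n")
        file.write(sequence.encode("ascii") + b"\n")


# io.BytesIO stands in for any file-like object opened in binary mode.
buffer = io.BytesIO()
write_aligned_fasta(
    [b"Sp8_long_sequence_name", b"Sp10"],
    ["QFSNWV", "KFS--S"],
    buffer,
    max_name_length=10,
)
text = buffer.getvalue().decode("utf-8")
```

Because names are stored as bytes and the target is binary, no text encoding is involved until the buffer is explicitly decoded at the end.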
- dumps(format='fasta', encoding='utf-8')
Dump the alignment to a string in the provided format.
- Parameters
- Raises
ValueError – When format is not a recognized file format.
New in version 0.2.2.
- load(file, format=None)
Load a multiple sequence alignment from a file.
- Parameters
file (str, bytes, os.PathLike or file-like object) – The file from which to read the alignment. If a file-like object is given, it must be open in binary mode and support random access with the seek method. Otherwise, file is treated as a path.
format (str, optional) – The file format the alignment is stored in. Must be given when loading from a file-like object; it is autodetected when reading from a path.
- Returns
Alignment – The deserialized alignment.
Example
>>> msa = Alignment.load("example.001.AA.clw")
>>> msa.names
[b'Sp8', b'Sp10', b'Sp26', b'Sp6', b'Sp17', b'Sp33']
Changed in version 0.3.0: Added support for reading from a file-like object.
- residues
The residues in the alignment.
- Type
- sequences
The sequences in the alignment.
- Type
Trimmed Alignment
- class pytrimal.TrimmedAlignment(Alignment)
A multiple sequence alignment that has been trimmed.
Internally, the trimming process produces a mask of sequences and a mask of residues. This class only exposes the filtered sequences and residues.
Example
Create a trimmed alignment using two lists to filter out some residues and sequences:
>>> trimmed = TrimmedAlignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["QFSNWV", "KFS--S", "NFA--A"],
...     sequences_mask=[True, True, False],
...     residues_mask=[True, True, True, False, False, True],
... )
The names and sequences properties will only contain the retained sequences and residues:
>>> list(trimmed.names)
[b'Sp8', b'Sp10']
>>> list(trimmed.sequences)
['QFSV', 'KFSS']
Use the original_alignment method to build the original unfiltered alignment containing all sequences and residues:
>>> ali = trimmed.original_alignment()
>>> list(ali.names)
[b'Sp8', b'Sp10', b'Sp26']
>>> list(ali.sequences)
['QFSNWV', 'KFS--S', 'NFA--A']
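The effect of the two masks can be reproduced in plain Python. The sketch below is not how pytrimal implements trimming internally; `apply_masks` is a hypothetical helper that only shows how a row mask and a column mask combine to give the filtered names and sequences from the example above.

```python
from itertools import compress


def apply_masks(names, sequences, sequences_mask=None, residues_mask=None):
    """Hypothetical helper: keep rows and columns whose mask entry is True."""
    if sequences_mask is None:
        # No row mask means every sequence is kept.
        sequences_mask = [True] * len(sequences)
    kept_names = list(compress(names, sequences_mask))
    kept_rows = list(compress(sequences, sequences_mask))
    if residues_mask is not None:
        # Apply the same column mask to every retained row.
        kept_rows = ["".join(compress(row, residues_mask)) for row in kept_rows]
    return kept_names, kept_rows


kept_names, kept_sequences = apply_masks(
    names=[b"Sp8", b"Sp10", b"Sp26"],
    sequences=["QFSNWV", "KFS--S", "NFA--A"],
    sequences_mask=[True, True, False],
    residues_mask=[True, True, True, False, False, True],
)
# kept_names     == [b'Sp8', b'Sp10']
# kept_sequences == ['QFSV', 'KFSS']
```

Keeping the masks separate from the data is what makes it possible to recover the unfiltered alignment later: nothing is deleted, the filtered view is simply recomputed from the original rows.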
- __init__(names, sequences, sequences_mask=None, residues_mask=None)
Create a new alignment with the given names, sequences and masks.
- Parameters
names (Sequence of bytes) – The names of the sequences in the alignment.
sequences (Sequence of str) – The actual sequences in the alignment.
sequences_mask (Sequence of bool) – A mask for which sequences to keep in the trimmed alignment. If given, must be as long as the sequences and names lists.
residues_mask (Sequence of bool) – A mask for which residues to keep in the trimmed alignment. If given, must be as long as every element in the sequences argument.
- sequences
The sequences in the alignment.
- Type
Attributes
AlignmentSequences
- class pytrimal.AlignmentSequences
A read-only view over the sequences of an alignment.
Objects from this class are created in the sequences property of Alignment objects. Use it to access the string data of individual rows from the alignment:
>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.sequences)
6
>>> msa.sequences[0]
'-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII'
>>> sum(letter == '-' for seq in msa.sequences for letter in seq)
43
AlignmentResidues
- class pytrimal.AlignmentResidues
A read-only view over the residues of an alignment.
Objects from this class are created in the residues property of Alignment objects. Use it to access the string data of individual columns from the alignment:
>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.residues)
46
>>> msa.residues[0]
'--A---'
>>> msa.residues[-1]
'IIIIFL'
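The relationship between the row view and the column view can be illustrated without pytrimal: the columns of an alignment are the transpose of its rows. The snippet below is a plain-Python sketch using lists of strings; the actual view classes are read-only objects backed by the alignment and are not built this way.

```python
# Rows of a small alignment, as returned by a sequences-like view.
rows = ["QFSNWV", "KFS--S", "NFA--A"]

# zip(*rows) groups the i-th character of every row together,
# which gives one string per alignment column.
columns = ["".join(column) for column in zip(*rows)]

# Transposing again recovers the original rows.
rows_again = ["".join(row) for row in zip(*columns)]

# Gap counts agree whichever view is iterated.
gaps_by_row = sum(letter == "-" for row in rows for letter in row)
gaps_by_column = sum(letter == "-" for column in columns for letter in column)
```

Here there are as many columns as there are characters in each row (6), and each column holds one character per sequence (3), which mirrors how the example above reports 46 columns of 6 residues each.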