Multiple Sequence Alignment
Alignments
Alignment
- class pytrimal.Alignment
A multiple sequence alignment.
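- __init__(names, sequences)
Create a new alignment with the given names and sequences.
- Parameters:
names (Sequence of bytes) – The names of the sequences in the alignment.
sequences (Sequence of str) – The actual sequences in the alignment.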
Examples
Create a new alignment with a list of sequences and a list of names:
>>> alignment = Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=[
...         "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
...         "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
...         "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
...     ]
... )
There should be as many sequences as there are names, otherwise a ValueError will be raised:
>>> Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["GLQIHMMGII", "GLEINMMVII"]
... )
Traceback (most recent call last):
  ...
ValueError: `Alignment` given 3 names but 2 sequences
Sequence characters will be checked, and an error will be raised if they are not one of the characters from a biological alphabet:
>>> Alignment(
...     names=[b"Sp8", b"Sp10"],
...     sequences=["GLQIHMMGII", "GLEINMM123"]
... )
Traceback (most recent call last):
  ...
ValueError: The sequence "Sp10" has an unknown (49) character
- copy()
Create a copy of this alignment.
- dump(file, format='fasta')
Dump the alignment to a file or a file-like object.
- Parameters:
file (str, bytes, os.PathLike or file-like object) – The file to which to write the alignment. If a file-like object is given, it must be open in binary mode. Otherwise, file is treated as a path.
format (str) – The name of the alignment format to write. See below for a list of supported formats.
- Raises:
ValueError – When format is not a recognized file format.
OSError – When the path given as file could not be opened.
Hint
The alignment can be written in one of the following formats:
clustal
The alignment format produced by the Clustal and Clustal Omega alignment software.
fasta
The aligned FASTA format, which outputs all sequences in the alignment as FASTA records with gap characters (see Wikipedia:FASTA format).
html
An HTML report showing the alignment in pseudo-Clustal format with colored residues.
mega
The alignment format produced by the MEGA software for evolutionary analysis of alignments.
nexus
The NEXUS alignment format (see Wikipedia:Nexus file).
phylip (or phylip40)
The PHYLIP 4.0 alignment format.
phylip32
The PHYLIP 3.2 alignment format.
phylippaml
A variant of PHYLIP 4.0 compatible with the PAML tool for phylogenetic analysis.
nbrf or pir
The format of Protein Information Resource database files, provided by the National Biomedical Research Foundation.
Additionally, the fasta, nexus, phylippaml, phylip32, and phylip40 formats support an _m10 variant, which limits the sequence names to 10 characters.
New in version 0.2.2.
- dumps(format='fasta', encoding='utf-8')
Dump the alignment to a string in the provided format.
- Parameters:
- Raises:
ValueError – When format is not a recognized file format.
New in version 0.2.2.
- from_biopython(alignment)
Create a new Alignment from an iterable of Biopython records.
- Parameters:
alignment (iterable of SeqRecord) – An iterable of Biopython record objects to build the alignment from. Passing a Bio.Align.MultipleSeqAlignment object is also supported.
- Returns:
Alignment – A new alignment object ready for trimming.
New in version 0.5.0.
- from_pyhmmer(alignment)
Create a new Alignment from a pyhmmer.easel.TextMSA.
- Parameters:
alignment (TextMSA) – A PyHMMER object storing a multiple sequence alignment in text format.
- Returns:
Alignment – A new alignment object ready for trimming.
New in version 0.5.0.
- load(file, format=None)
Load a multiple sequence alignment from a file.
- Parameters:
file (str, bytes, os.PathLike or file-like object) – The file from which to read the alignment. If a file-like object is given, it must be open in binary mode and support random access with the seek method. Otherwise, file is treated as a path.
format (str, optional) – The file format the alignment is stored in. Must be given when loading from a file-like object; it is autodetected when reading from a path.
- Returns:
Alignment – The deserialized alignment.
Example
>>> msa = Alignment.load("example.001.AA.clw")
>>> msa.names
[b'Sp8', b'Sp10', b'Sp26', b'Sp6', b'Sp17', b'Sp33']
Changed in version 0.3.0: Add support for reading an alignment from a file-like object.
- to_biopython()
Create a new MultipleSeqAlignment from this Alignment.
- Returns:
MultipleSeqAlignment – A multiple sequence alignment object as implemented in Biopython.
- Raises:
ImportError – When the Bio module cannot be imported.
New in version 0.5.0.
- to_pyhmmer()
Create a new TextMSA from this Alignment.
- Returns:
TextMSA – A PyHMMER multiple sequence alignment in text mode.
- Raises:
ImportError – When the pyhmmer module cannot be imported.
New in version 0.5.0.
- residues
The residues in the alignment.
- Type:
AlignmentResidues
- sequences
The sequences in the alignment.
- Type:
AlignmentSequences
Trimmed Alignment
- class pytrimal.TrimmedAlignment(Alignment)
A multiple sequence alignment that has been trimmed.
Internally, the trimming process produces a mask of sequences and a mask of residues. This class only exposes the filtered sequences and residues.
Example
Create a trimmed alignment using two lists to filter out some residues and sequences:
>>> trimmed = TrimmedAlignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["QFSNWV", "KFS--S", "NFA--A"],
...     sequences_mask=[True, True, False],
...     residues_mask=[True, True, True, False, False, True],
... )
The names and sequences properties will only contain the retained sequences and residues:
>>> list(trimmed.names)
[b'Sp8', b'Sp10']
>>> list(trimmed.sequences)
['QFSV', 'KFSS']
Use the original_alignment method to build the original unfiltered alignment containing all sequences and residues:
>>> ali = trimmed.original_alignment()
>>> list(ali.names)
[b'Sp8', b'Sp10', b'Sp26']
>>> list(ali.sequences)
['QFSNWV', 'KFS--S', 'NFA--A']
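The effect of the two masks can be reproduced in plain Python, which may help when reasoning about what a trimmed alignment exposes. The sketch below is only a model of the filtering described above: `apply_masks` is a hypothetical helper, not part of pytrimal, and it says nothing about how the library stores the masks internally.

```python
from itertools import compress


def apply_masks(names, sequences, sequences_mask=None, residues_mask=None):
    """Return the names and sequences retained by the given masks.

    Illustrative helper only (not part of pytrimal).
    """
    # A missing mask keeps everything, mirroring the None defaults.
    if sequences_mask is None:
        sequences_mask = [True] * len(sequences)
    if residues_mask is None:
        residues_mask = [True] * (len(sequences[0]) if sequences else 0)
    if len(sequences_mask) != len(sequences):
        raise ValueError("sequences_mask must be as long as sequences")
    if any(len(seq) != len(residues_mask) for seq in sequences):
        raise ValueError("residues_mask must be as long as every sequence")
    # The sequence mask drops whole rows, the residue mask drops whole columns.
    kept_names = list(compress(names, sequences_mask))
    kept_sequences = [
        "".join(compress(seq, residues_mask))
        for seq in compress(sequences, sequences_mask)
    ]
    return kept_names, kept_sequences


kept_names, kept_sequences = apply_masks(
    names=[b"Sp8", b"Sp10", b"Sp26"],
    sequences=["QFSNWV", "KFS--S", "NFA--A"],
    sequences_mask=[True, True, False],
    residues_mask=[True, True, True, False, False, True],
)
print(kept_names)      # [b'Sp8', b'Sp10']
print(kept_sequences)  # ['QFSV', 'KFSS']
```

The result matches the names and sequences shown for the trimmed alignment in the example above.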
- __init__(names, sequences, sequences_mask=None, residues_mask=None)
Create a new alignment with the given names, sequences and masks.
- Parameters:
names (Sequence of bytes) – The names of the sequences in the alignment.
sequences (Sequence of str) – The actual sequences in the alignment.
sequences_mask (Sequence of bool) – A mask for which sequences to keep in the trimmed alignment. If given, must be as long as the sequences and names list.
residues_mask (Sequence of bool) – A mask for which residues to keep in the trimmed alignment. If given, must be as long as every element in the sequences argument.
- copy()
Create a copy of this trimmed alignment.
- load()
Load a multiple sequence alignment from a file.
- Parameters:
file (str, bytes, os.PathLike or file-like object) – The file from which to read the alignment. If a file-like object is given, it must be open in binary mode and support random access with the seek method. Otherwise, file is treated as a path.
format (str, optional) – The file format the alignment is stored in. Must be given when loading from a file-like object; it is autodetected when reading from a path.
- Returns:
Alignment – The deserialized alignment.
Example
>>> msa = Alignment.load("example.001.AA.clw")
>>> msa.names
[b'Sp8', b'Sp10', b'Sp26', b'Sp6', b'Sp17', b'Sp33']
Changed in version 0.3.0: Add support for reading an alignment from a file-like object.
- original_alignment()
Rebuild the original alignment from which this object was obtained.
- Returns:
Alignment – The untrimmed alignment that produced this trimmed alignment.
- terminal_only()
Get a trimmed alignment where only the terminal residues are removed.
- Returns:
TrimmedAlignment – The alignment where only terminal residues have been trimmed.
Attributes
AlignmentSequences
- class pytrimal.AlignmentSequences
A read-only view over the sequences of an alignment.
Objects from this class are created in the sequences property of Alignment objects. Use it to access the string data of individual rows from the alignment:
>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.sequences)
6
>>> msa.sequences[0]
'-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII'
>>> sum(seq.count('-') for seq in msa.sequences)
43
A slice over a subset of the sequences can also be obtained without copying the internal data, which makes it possible to create a new Alignment with only some sequences from the original one:
>>> msa2 = Alignment(msa.names[:4:2], msa.sequences[:4:2])
>>> len(msa2.sequences)
2
>>> msa2.sequences[1] == msa.sequences[2]
True
New in version 0.4.0: Support for zero-copy slicing.
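As a rough mental model of zero-copy slicing, the sketch below keeps a reference to the original rows and only narrows the set of visible indices when sliced. `RowView` is a hypothetical class written for this illustration; it is an analogy in pure Python and not how pytrimal implements AlignmentSequences.

```python
from collections.abc import Sequence


class RowView(Sequence):
    """Read-only view over a list of rows (illustrative, not pytrimal code)."""

    def __init__(self, rows, indices=None):
        self._rows = rows
        # A range describes which rows are visible without copying them.
        self._indices = range(len(rows)) if indices is None else indices

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a range yields another range: the rows are shared.
            return RowView(self._rows, self._indices[index])
        return self._rows[self._indices[index]]


rows = ["QFSNWV", "KFS--S", "NFA--A", "QFANWV", "KFA--S", "NFS--A"]
view = RowView(rows)
subset = view[:4:2]      # rows 0 and 2 of the original data
print(len(subset))       # 2
print(subset[1] == view[2])  # True
```

The slice `[:4:2]` selects indices 0 and 2, which is why the second row of the subset equals the third row of the original view, as in the doctest above.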
AlignmentResidues
- class pytrimal.AlignmentResidues
A read-only view over the residues of an alignment.
Objects from this class are created in the residues property of Alignment objects. Use it to access the string data of individual columns from the alignment:
>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.residues)
46
>>> msa.residues[0]
'--A---'
>>> msa.residues[-1]
'IIIIFL'
New in version 0.4.0: Support for zero-copy slicing.
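Conceptually, the residues view is the transpose of the sequences view: each column holds one character per sequence. The plain-Python sketch below shows that relationship on a toy alignment; it does not use pytrimal and is only meant to clarify what a column string contains.

```python
# Three aligned rows of equal length (toy data).
sequences = ["QFSNWV", "KFS--S", "NFA--A"]

# Transposing the rows yields one string per alignment column.
columns = ["".join(column) for column in zip(*sequences)]

print(len(columns))  # 6 columns, one per aligned position
print(columns[0])    # 'QKN'
print(columns[3])    # 'N--'

# Column strings make per-position statistics straightforward.
gap_counts = [column.count("-") for column in columns]
print(gap_counts)    # [0, 0, 0, 2, 2, 0]
```

Working column by column like this is convenient for per-position statistics such as gap counts, which is the kind of information a trimming method typically relies on.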