Multiple Sequence Alignment

Alignments

Alignment

class pytrimal.Alignment(names, sequences)

A multiple sequence alignment.

__copy__(self)
__init__(names, sequences)

Create a new alignment with the given names and sequences.

Parameters
  • names (Sequence of bytes) – The names of the sequences in the alignment.

  • sequences (Sequence of bytes or str) – The actual sequences in the alignment.

Examples

Create a new alignment with a list of sequences and a list of names:

>>> alignment = Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=[
...         "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
...         "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
...         "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
...     ]
... )

There should be as many sequences as there are names, otherwise a ValueError will be raised:

>>> Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["GLQIHMMGII", "GLEINMMVII"]
... )
Traceback (most recent call last):
...
ValueError: `Alignment` given 3 names but 2 sequences

Sequence characters are checked, and a ValueError is raised if any of them does not belong to a biological alphabet:

>>> Alignment(
...     names=[b"Sp8", b"Sp10"],
...     sequences=["GLQIHMMGII", "GLEINMM123"]
... )
Traceback (most recent call last):
...
ValueError: The sequence "Sp10" has an unknown (49) character
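
The two checks described above can be sketched in plain Python. This is only an illustration, not the library's implementation: `check_alignment_input` is a hypothetical helper, the accepted alphabet (ASCII letters plus the gap character) is an assumption, and only `str` sequences are handled. The number in the error message appears to be the character code of the offending character, which is how the sketch reproduces it.

```python
import string

# Assumption: ASCII letters plus the gap character; the real library
# may accept additional symbols.
ALLOWED = set(string.ascii_letters) | {"-"}


def check_alignment_input(names, sequences):
    """Hypothetical helper mirroring the two checks described above."""
    # Check 1: as many sequences as names.
    if len(names) != len(sequences):
        raise ValueError(
            f"`Alignment` given {len(names)} names but {len(sequences)} sequences"
        )
    # Check 2: every character belongs to the (assumed) alphabet.
    for name, seq in zip(names, sequences):
        for char in seq:
            if char not in ALLOWED:
                raise ValueError(
                    f'The sequence "{name.decode()}" has an unknown '
                    f"({ord(char)}) character"
                )


try:
    check_alignment_input([b"Sp8", b"Sp10"], ["GLQIHMMGII", "GLEINMM123"])
except ValueError as err:
    message = str(err)  # the first offending character is "1", code 49
```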

copy(self) Alignment

Create a copy of this alignment.

dump(self, file, format='fasta') None

Dump the alignment to a file or a file-like object.

Parameters
  • file (str, bytes, os.PathLike or file-like object) – The file to which to write the alignment. If a file-like object is given, it must be open in binary mode. Otherwise, file is treated as a path.

  • format (str) – The name of the alignment format to write. See below for a list of supported formats.

Raises
  • ValueError – When format is not a recognized file format.

  • OSError – When the path given as file could not be opened.

Hint

The alignment can be written in one of the following formats:

clustal

The alignment format produced by the Clustal and Clustal Omega alignment programs.

fasta

The aligned FASTA format, which outputs all sequences in the alignment as FASTA records with gap characters (see Wikipedia:FASTA format).

html

An HTML report showing the alignment in pseudo-Clustal format with colored residues.

mega

The alignment format produced by the MEGA software for evolutionary analysis of alignments.

nexus

The NEXUS alignment format (see Wikipedia:Nexus file).

phylip or phylip40

The PHYLIP 4.0 alignment format.

phylip32

The PHYLIP 3.2 alignment format.

phylippaml

A variant of PHYLIP 4.0 compatible with the PAML tool for phylogenetic analysis.

nbrf or pir

The format of Protein Information Resource database files, provided by the National Biomedical Research Foundation.

Additionally, the fasta, nexus, phylippaml, phylip32, and phylip40 formats support an _m10 variant, which limits the sequence names to 10 characters.

New in version 0.2.2.
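
As a rough illustration of the aligned FASTA layout and of why a file-like object has to be open in binary mode, the following sketch writes records by hand. `dump_fasta` is a hypothetical helper, not part of pytrimal, and the real output may differ in details such as line wrapping.

```python
import io


def dump_fasta(names, sequences, file):
    """Hypothetical helper: write aligned FASTA records to a binary file."""
    for name, seq in zip(names, sequences):
        # Names are bytes, so they can be written to a binary file directly.
        file.write(b">" + name + b"\n")
        # Gap characters are kept, so every record has the same length.
        file.write(seq.encode("ascii") + b"\n")


buffer = io.BytesIO()  # stands in for a file opened with mode "wb"
dump_fasta([b"Sp8", b"Sp10"], ["QFSNWV", "KFS--S"], buffer)
text = buffer.getvalue().decode("utf-8")
```

Writing to an `io.StringIO` instead would fail in this sketch with a TypeError, because bytes cannot be written to a text stream.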

dumps(self, format='fasta', encoding='utf-8') str

Dump the alignment to a string in the provided format.

Parameters
  • format (str) – The format of the alignment. See the dump method for a list of supported formats.

  • encoding (str) – The encoding to use to decode sequence names.

Raises

ValueError – When format is not a recognized file format.

New in version 0.2.2.

classmethod from_biopython(cls, alignment) Alignment

Create a new Alignment from an iterable of Biopython records.

Parameters

alignment (iterable of SeqRecord) – An iterable of Biopython record objects to build the alignment from. Passing a Bio.Align.MultipleSeqAlignment object is also supported.

Returns

Alignment – A new alignment object ready for trimming.

New in version 0.5.0.

classmethod from_pyhmmer(cls, alignment) Alignment

Create a new Alignment from a pyhmmer.easel.TextMSA.

Parameters

alignment (TextMSA) – A PyHMMER object storing a multiple sequence alignment in text format.

Returns

Alignment – A new alignment object ready for trimming.

New in version 0.5.0.

classmethod load(cls, file, format=None) Alignment

Load a multiple sequence alignment from a file.

Parameters
  • file (str, bytes, os.PathLike or file-like object) – The file from which to read the alignment. If a file-like object is given, it must be open in binary mode and support random access with the seek method. Otherwise, file is treated as a path.

  • format (str, optional) – The file format the alignment is stored in. It must be given when loading from a file-like object, and is autodetected when reading from a path.

Returns

Alignment – The deserialized alignment.

Example

>>> msa = Alignment.load("example.001.AA.clw")
>>> msa.names
[b'Sp8', b'Sp10', b'Sp26', b'Sp6', b'Sp17', b'Sp33']

Changed in version 0.3.0: Add support for reading from a file-like object.

to_biopython(self) MultipleSeqAlignment

Create a new MultipleSeqAlignment from this Alignment.

Returns

MultipleSeqAlignment – A multiple sequence alignment object as implemented in Biopython.

Raises

ImportError – When the Bio module cannot be imported.

New in version 0.5.0.

to_pyhmmer(self) TextMSA

Create a new TextMSA from this Alignment.

Returns

TextMSA – A PyHMMER multiple sequence alignment in text mode.

Raises

ImportError – When the pyhmmer module cannot be imported.

New in version 0.5.0.

names

The names of the sequences in the alignment.

Type

sequence of bytes

residues

The residues in the alignment.

Type

AlignmentResidues

sequences

The sequences in the alignment.

Type

AlignmentSequences

Trimmed Alignment

class pytrimal.TrimmedAlignment(names, sequences, sequences_mask=None, residues_mask=None)

A multiple sequence alignment that has been trimmed. This class is a subclass of Alignment.

Internally, the trimming process produces a mask of sequences and a mask of residues. This class only exposes the filtered sequences and residues.

Example:

Create a trimmed alignment using two lists to filter out some residues and sequences:

>>> trimmed = TrimmedAlignment(
...    names=[b"Sp8", b"Sp10", b"Sp26"],
...    sequences=["QFSNWV", "KFS--S", "NFA--A"],
...    sequences_mask=[True, True, False],
...    residues_mask=[True, True, True, False, False, True],
... )

The names and sequences properties will only contain the retained sequences and residues:

>>> list(trimmed.names)
[b'Sp8', b'Sp10']
>>> list(trimmed.sequences)
['QFSV', 'KFSS']

Use the original_alignment method to build the original unfiltered alignment containing all sequences and residues:

>>> ali = trimmed.original_alignment()
>>> list(ali.names)
[b'Sp8', b'Sp10', b'Sp26']
>>> list(ali.sequences)
['QFSNWV', 'KFS--S', 'NFA--A']

__init__(names, sequences, sequences_mask=None, residues_mask=None)

Create a new alignment with the given names, sequences and masks.

Parameters
  • names (Sequence of bytes) – The names of the sequences in the alignment.

  • sequences (Sequence of str) – The actual sequences in the alignment.

  • sequences_mask (Sequence of bool) – A mask for which sequences to keep in the trimmed alignment. If given, must be as long as the sequences and names list.

  • residues_mask (Sequence of bool) – A mask for which residues to keep in the trimmed alignment. If given, must be as long as every element in the sequences argument.
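
The effect of the two masks can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: `apply_masks` is a hypothetical helper, and treating a missing mask as "keep everything" is an assumption.

```python
from itertools import compress


def apply_masks(names, sequences, sequences_mask=None, residues_mask=None):
    """Hypothetical helper: filter rows, then columns, of an alignment."""
    # Assumption: a mask left as None keeps every row or column.
    if sequences_mask is None:
        sequences_mask = [True] * len(sequences)
    if residues_mask is None:
        residues_mask = [True] * (len(sequences[0]) if sequences else 0)
    # Keep only the names and rows flagged in the sequences mask.
    kept_names = list(compress(names, sequences_mask))
    kept_rows = compress(sequences, sequences_mask)
    # Within each kept row, keep only the columns flagged in the residues mask.
    kept_sequences = ["".join(compress(row, residues_mask)) for row in kept_rows]
    return kept_names, kept_sequences


kept_names, kept_sequences = apply_masks(
    names=[b"Sp8", b"Sp10", b"Sp26"],
    sequences=["QFSNWV", "KFS--S", "NFA--A"],
    sequences_mask=[True, True, False],
    residues_mask=[True, True, True, False, False, True],
)
```

With the inputs from the class example, the sketch keeps the names `Sp8` and `Sp10` and the sequences `QFSV` and `KFSS`, matching the values shown there.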

names

The names of the sequences in the alignment.

Type

sequence of bytes

sequences

The sequences in the alignment.

Type

AlignmentSequences

Attributes

AlignmentSequences

class pytrimal.AlignmentSequences

A read-only view over the sequences of an alignment.

Objects from this class are created in the sequences property of Alignment objects. Use it to access the string data of individual rows from the alignment:

>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.sequences)
6
>>> msa.sequences[0]
'-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII'
>>> sum(seq.count('-') for seq in msa.sequences)
43

A slice over a subset of the sequences can also be obtained without copying the internal data, which makes it possible to create a new Alignment from only some of the sequences of the original one:

>>> msa2 = Alignment(msa.names[:4:2], msa.sequences[:4:2])
>>> len(msa2.sequences)
2
>>> msa2.sequences[1] == msa.sequences[2]
True

New in version 0.4.0: Support for zero-copy slicing.

AlignmentResidues

class pytrimal.AlignmentResidues

A read-only view over the residues of an alignment.

Objects from this class are created in the residues property of Alignment objects. Use it to access the string data of individual columns from the alignment:

>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.residues)
46
>>> msa.residues[0]
'--A---'
>>> msa.residues[-1]
'IIIIFL'

New in version 0.4.0: Support for zero-copy slicing.
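
The relationship between the row view and the column view can be sketched with plain strings. This is only an illustration: it copies the data, whereas the classes above are described as views over the alignment.

```python
# Rows of a small alignment, as exposed by a sequences view.
sequences = ["QFSNWV", "KFS--S", "NFA--A"]

# Each column gathers the residue at one position across all rows,
# which is what a residues view exposes.
columns = ["".join(column) for column in zip(*sequences)]

first_column = columns[0]
last_column = columns[-1]
```

Here `len(columns)` equals the length of each row, and each column string has one character per sequence.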