Multiple Sequence Alignment

Alignments

Alignment

class pytrimal.Alignment

A multiple sequence alignment.

__init__(names, sequences)

Create a new alignment with the given names and sequences.

Parameters
  • names (Sequence of bytes) – The names of the sequences in the alignment.

  • sequences (Sequence of str) – The actual sequences in the alignment.

Examples

Create a new alignment with a list of sequences and a list of names:

>>> alignment = Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=[
...         "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
...         "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
...         "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
...     ]
... )

There must be as many sequences as there are names; otherwise, a ValueError is raised:

>>> Alignment(
...     names=[b"Sp8", b"Sp10", b"Sp26"],
...     sequences=["GLQIHMMGII", "GLEINMMVII"]
... )
Traceback (most recent call last):
...
ValueError: `Alignment` given 3 names but 2 sequences

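Sequence characters are validated, and a ValueError is raised if any character does not belong to a biological alphabet. In the example below, the number in the error message appears to be the code of the offending character (49 is the ASCII code of 1):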

>>> Alignment(
...     names=[b"Sp8", b"Sp10"],
...     sequences=["GLQIHMMGII", "GLEINMM123"]
... )
Traceback (most recent call last):
...
ValueError: The sequence "Sp10" has an unknown (49) character

load(path)

Load a multiple sequence alignment from a file.

Parameters
  • path (str, bytes or os.PathLike) – The path to the file containing the serialized alignment to load.

Returns

Alignment – The deserialized alignment.

Example

>>> msa = Alignment.load("example.001.AA.clw")
>>> msa.names
[b'Sp8', b'Sp10', b'Sp26', b'Sp6', b'Sp17', b'Sp33']

names

The names of the sequences in the alignment.

Type

sequence of bytes

residues

The residues in the alignment.

Type

AlignmentResidues

sequences

The sequences in the alignment.

Type

AlignmentSequences

Trimmed Alignment

class pytrimal.TrimmedAlignment(Alignment)

A multiple sequence alignment that has been trimmed.

Internally, the trimming process produces only two masks: one over the sequences and one over the residues. This class exposes the sequences and residues that remain once those masks are taken into account.
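The idea of filtering through two masks can be illustrated in plain Python. This is only a minimal sketch of the concept, not pytrimal's implementation: the helper `apply_masks`, the boolean-list representation of the masks, and the toy sequences are all assumptions made for the example.

```python
from itertools import compress

def apply_masks(names, sequences, sequence_mask, residue_mask):
    """Keep only the rows and columns whose mask entry is True.

    Hypothetical helper: it models the masks as lists of booleans,
    which is an assumption and not how pytrimal stores them.
    """
    kept_names = list(compress(names, sequence_mask))
    kept_sequences = [
        "".join(compress(seq, residue_mask))        # filter columns
        for seq in compress(sequences, sequence_mask)  # filter rows
    ]
    return kept_names, kept_sequences

# Toy alignment with 3 rows and 6 columns (made up for this example).
names = [b"Sp8", b"Sp10", b"Sp26"]
sequences = ["--GLQI", "-AGLEI", "AAGLEI"]
sequence_mask = [True, False, True]                      # drop the second row
residue_mask = [False, False, True, True, True, True]    # drop the first two columns

kept_names, kept_sequences = apply_masks(
    names, sequences, sequence_mask, residue_mask
)
print(kept_names)      # [b'Sp8', b'Sp26']
print(kept_sequences)  # ['GLQI', 'GLEI']
```

Keeping the masks separate from the data means the original alignment is left untouched, and the filtered view can be computed on demand.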

__init__(*args, **kwargs)

names

The names of the sequences in the alignment.

Type

sequence of bytes

sequences

The sequences in the alignment.

Type

AlignmentSequences

Attributes

AlignmentSequences

class pytrimal.AlignmentSequences

A read-only view over the sequences of an alignment.

Objects of this class are created by the sequences property of Alignment objects. Use them to access the string data of individual rows of the alignment:

>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.sequences)
6
>>> msa.sequences[0]
'-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII'
>>> sum(letter == '-' for seq in msa.sequences for letter in seq)
43
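Since each row is an ordinary string, per-sequence statistics need nothing beyond the standard library. The sketch below uses a plain list holding the three sequences from the Alignment constructor example rather than a loaded file; the helper `gap_fractions` is hypothetical, and the assumption that it accepts `msa.sequences` directly rests only on the iteration shown above.

```python
def gap_fractions(sequences, gap="-"):
    """Return the fraction of gap characters in each row.

    Hypothetical helper: works on any iterable of equal-length strings.
    """
    return [seq.count(gap) / len(seq) for seq in sequences]

# First three rows of the example alignment, as plain strings.
rows = [
    "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
    "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
    "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
]

total_gaps = sum(seq.count("-") for seq in rows)
print(total_gaps)  # 19 (6 + 11 + 2 gaps in the three rows)

for fraction in gap_fractions(rows):
    print(f"{fraction:.2f}")
```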

AlignmentResidues

class pytrimal.AlignmentResidues

A read-only view over the residues of an alignment.

Objects of this class are created by the residues property of Alignment objects. Use them to access the string data of individual columns of the alignment:

>>> msa = Alignment.load("example.001.AA.clw")
>>> len(msa.residues)
46
>>> msa.residues[0]
'--A---'
>>> msa.residues[-1]
'IIIIFL'
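The relationship between rows and columns can be pictured as a transposition: column i collects the i-th character of every row. The following is a minimal plain-Python sketch of that relationship, not a description of how AlignmentResidues is implemented. It uses only the first three rows of the example alignment, so each column has three characters instead of six.

```python
# First three rows of the example alignment, as plain strings.
rows = [
    "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
    "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
    "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
]

# zip(*rows) yields one tuple per column; join each tuple back into a string.
columns = ["".join(column) for column in zip(*rows)]

print(len(columns))  # 46
print(columns[0])    # --A
print(columns[-1])   # III

# Number of columns that contain at least one gap character.
gapped_columns = sum("-" in column for column in columns)
```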