Trimmer

Base Trimmer

class pytrimal.BaseTrimmer

A sequence alignment trimmer.

All subclasses provide the same trim method, and are configured through their constructor.

__init__(*, backend='detect')

Create a new base trimmer.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

New in version 0.2.0: The backend keyword argument.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can safely be called across different threads. Most of the computation is done with the GIL released.

Changed in version 0.1.2: Added the matrix optional argument.
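
Because trim is re-entrant, one trimmer can be shared by several worker threads. The sketch below shows one way to do that with the standard library. The helper trim_all and the EchoTrimmer stand-in are hypothetical names introduced only so the example runs without pytrimal; any object exposing a trim method, such as a trimmer from this module, could be passed instead.

```python
from concurrent.futures import ThreadPoolExecutor


def trim_all(trimmer, alignments, max_workers=4):
    """Trim several alignments concurrently with one shared trimmer.

    Hypothetical helper: it only assumes that ``trimmer.trim`` can be
    called from several threads at once, as the hint above states.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(trimmer.trim, alignments))


class EchoTrimmer:
    """Stand-in for a real trimmer so the sketch runs without pytrimal."""

    def trim(self, alignment, matrix=None):
        return alignment


results = trim_all(EchoTrimmer(), ["aln1", "aln2", "aln3"])
print(results)
```

Threads only help here to the extent that the work happens with the GIL released, which the hint above says is the case for most of the computation.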

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Automatic Trimmer

class pytrimal.AutomaticTrimmer(BaseTrimmer)

A sequence alignment trimmer with automatic parameter detection.

trimAl provides several heuristic methods for automated trimming of multiple sequence alignments:

  • strict: A statistical method that combines gaps and similarity statistics to clean the alignment.

  • strictplus: A statistical method that combines gaps and similarity statistics, optimized for Neighbour-Joining tree reconstruction.

  • gappyout: A statistical method that only uses gaps statistic to clean the alignment.

  • automated1: A meta-method that chooses between strict and gappyout, optimized for Maximum Likelihood phylogenetic tree reconstruction.

  • nogaps: A naive method that removes every column containing at least one gap.

  • noallgaps: A naive method that removes every column containing only gaps.

  • noduplicateseqs: A naive method that removes sequences that are identical in the alignment, keeping the latest occurrence.
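
To make the two naive column filters concrete, here is a pure-Python sketch of what nogaps and noallgaps are described as doing. It is an illustration of the rules stated in the list above, not trimAl's implementation; the function names and the use of "-" as the only gap character are assumptions.

```python
def drop_columns(sequences, should_drop):
    """Return the sequences without the columns for which should_drop is true."""
    columns = list(zip(*sequences))
    kept = [column for column in columns if not should_drop(column)]
    if not kept:
        return ["" for _ in sequences]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*kept)]


def nogaps(sequences):
    """Remove every column containing at least one gap."""
    return drop_columns(sequences, lambda column: "-" in column)


def noallgaps(sequences):
    """Remove every column containing only gaps."""
    return drop_columns(sequences, lambda column: all(c == "-" for c in column))


alignment = ["AC-G-", "A--GT", "AT-G-"]
print(nogaps(alignment))
print(noallgaps(alignment))
```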

Hint

A Python frozenset containing all valid automatic trimming methods can be obtained with the AutomaticTrimmer.METHODS attribute. This can be useful for listing or validating methods beforehand, e.g. to build a CLI with argparse.

New in version 0.4.0: The AutomaticTrimmer.METHODS class attribute.

New in version 0.5.0: Support for the pickle protocol.
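
As an illustration of the argparse use case mentioned in the hint, the sketch below restricts a command-line option to a set of method names. To keep it runnable without pytrimal, the names are copied from the list above into a hypothetical DOCUMENTED_METHODS constant; with pytrimal installed, AutomaticTrimmer.METHODS could be passed to build_parser instead. The program name and option names are made up for the example.

```python
import argparse

# Method names copied from the documentation above (an assumption of this
# sketch); AutomaticTrimmer.METHODS is the authoritative source.
DOCUMENTED_METHODS = frozenset({
    "strict", "strictplus", "gappyout", "automated1",
    "nogaps", "noallgaps", "noduplicateseqs",
})


def build_parser(methods=DOCUMENTED_METHODS):
    """Build a parser that only accepts known trimming methods."""
    parser = argparse.ArgumentParser(prog="trim-alignment")
    parser.add_argument(
        "--method",
        choices=sorted(methods),
        default="strict",
        help="automatic trimming method",
    )
    parser.add_argument("alignment", help="path to the alignment file")
    return parser


args = build_parser().parse_args(["--method", "gappyout", "input.fasta"])
print(args.method, args.alignment)
```

Validating the name in the parser means an unsupported method is rejected before any alignment is loaded, rather than surfacing later as a ValueError from the constructor.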

__init__(method='strict', *, backend='detect')

Create a new automatic alignment trimmer using the given method.

Parameters:

method (str) – The automatic alignment trimming method. See the documentation for AutomaticTrimmer for a list of supported values.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

Raises:

ValueError – When method is not one of the automatic alignment trimming methods supported by trimAl.

New in version 0.2.0: The backend keyword argument.

New in version 0.4.0: The noduplicateseqs method.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can safely be called across different threads. Most of the computation is done with the GIL released.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Manual Trimmer

class pytrimal.ManualTrimmer(BaseTrimmer)

A sequence alignment trimmer with manually defined thresholds.

Manual trimming allows the user to specify independent thresholds for two different statistics:

  • Gap threshold: Remove columns where the gap ratio (or the absolute gap count) is higher than the provided threshold.

  • Similarity threshold: Remove columns with a similarity ratio lower than the provided threshold.

In addition, trimming can be restricted so that at least a configurable fraction of the original alignment is retained, in order to avoid over-trimming an alignment of distant sequences.
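
The interplay between a gap threshold and a conservation floor can be sketched in pure Python. The score used here follows the description of the gap_threshold keyword argument (the fraction of non-gap characters in a column). How the library decides which rejected columns to restore is not specified in this document, so restoring the best-scoring ones first is an assumption of this sketch, as are the function names.

```python
import math


def gap_scores(sequences):
    """Fraction of non-gap characters in each column."""
    n = len(sequences)
    return [sum(c != "-" for c in column) / n for column in zip(*sequences)]


def select_columns(scores, threshold, conservation_percentage=0.0):
    """Return a keep/drop mask, honouring a minimum percentage of kept columns."""
    keep = [score >= threshold for score in scores]
    required = math.ceil(len(scores) * conservation_percentage / 100)
    missing = max(0, required - sum(keep))
    # Assumption: restore rejected columns starting with the best-scoring ones.
    rejected = sorted(
        (i for i, kept in enumerate(keep) if not kept),
        key=lambda i: -scores[i],
    )
    for i in rejected[:missing]:
        keep[i] = True
    return keep


scores = gap_scores(["AC-G-", "A--GT", "AT-G-"])
print(select_columns(scores, threshold=1.0))
print(select_columns(scores, threshold=1.0, conservation_percentage=60.0))
```

With a threshold of 1.0 only the two gap-free columns pass; asking for 60% of the five columns forces a third one back in.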

__init__(*, gap_threshold=None, gap_absolute_threshold=None, similarity_threshold=None, conservation_percentage=None, window=None, gap_window=None, similarity_window=None, backend='detect')

Create a new manual alignment trimmer with the given parameters.

Keyword Arguments:
  • gap_threshold (float, optional) – The minimum fraction of non-gap characters that must be present in a column to keep the column.

  • gap_absolute_threshold (int, optional) – The absolute number of gaps allowed on a column to keep it in the alignment. Incompatible with gap_threshold.

  • similarity_threshold (float, optional) – The minimum average similarity required.

  • conservation_percentage (float, optional) – The minimum percentage of positions in the original alignment to conserve.

  • window (int, optional) – The size of the half-window to use when computing statistics for an alignment.

  • gap_window (int, optional) – The size of the half-window to use when computing the gap statistic for an alignment. Incompatible with window.

  • similarity_window (int, optional) – The size of the half-window to use when computing the similarity statistic for an alignment. Incompatible with window.

  • backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
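
The half-window parameters can be read as follows: with a half-window of size w, the score of column i becomes the average of the per-column scores from i - w to i + w. The sketch below illustrates that idea on a plain list of scores. Truncating the window at the ends of the alignment is an assumption of this sketch, and smooth is a hypothetical helper, not part of pytrimal.

```python
def smooth(values, half_window):
    """Average each value with its neighbours within half_window positions."""
    n = len(values)
    smoothed = []
    for i in range(n):
        # Assumption: the window is truncated at the alignment boundaries.
        low = max(0, i - half_window)
        high = min(n, i + half_window + 1)
        window = values[low:high]
        smoothed.append(sum(window) / len(window))
    return smoothed


column_scores = [1.0, 0.0, 1.0, 1.0]
print(smooth(column_scores, half_window=1))
```

A half-window of 0 leaves the scores unchanged, while larger values make an isolated bad column less likely to be removed on its own.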

New in version 0.2.0: The backend keyword argument.

New in version 0.2.2: The keyword arguments for controlling the half-window sizes.

Changed in version 0.4.0: Removed consistency_threshold and consistency_window.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can safely be called across different threads. Most of the computation is done with the GIL released.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Representative Trimmer

class pytrimal.RepresentativeTrimmer(BaseTrimmer)

A sequence alignment trimmer for selecting representative sequences.

Representative sequences in an alignment can be selected using a specific identity threshold, or a fixed number of representative sequences to keep. Representative trimming can be useful to reduce the weight of certain very similar sequences in an alignment, for instance to build a less conservative HMM.

New in version 0.5.0.
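
To give an intuition for selection by identity threshold, here is a simplified greedy sketch in pure Python. It is not the algorithm used by trimAl or pytrimal: the identity definition (identical characters over columns where at least one sequence has a residue) and the greedy first-come selection are both assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of identical positions, ignoring columns where both are gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_representatives(sequences, identity_threshold):
    """Return indices of sequences kept as representatives.

    A sequence becomes a representative only if it is less identical than
    the threshold to every representative selected so far.
    """
    representatives = []
    for index, sequence in enumerate(sequences):
        if all(
            identity(sequence, sequences[r]) < identity_threshold
            for r in representatives
        ):
            representatives.append(index)
    return representatives


sequences = ["ACGT-A", "ACGT-A", "ACCT-A", "TTGAC-"]
print(greedy_representatives(sequences, 0.75))
print(greedy_representatives(sequences, 0.9))
```

Raising the threshold keeps more sequences, since fewer pairs are considered similar enough to be collapsed onto the same representative.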

__init__(clusters=None, identity_threshold=None, *, backend='detect')

Create a new representative alignment trimmer.

Parameters:
  • clusters (int, optional) – The number of cluster representatives to keep in the trimmed alignment. Must be strictly positive. If the trimmer receives an alignment with fewer sequences than this, it will not perform any trimming.

  • identity_threshold (float, optional) – The identity threshold for which to get representative sequences.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

Raises:

ValueError – When both clusters and identity_threshold are provided at the same time, or when they don’t fall in a valid range.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can safely be called across different threads. Most of the computation is done with the GIL released.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Overlap Trimmer

class pytrimal.OverlapTrimmer(BaseTrimmer)

A sequence alignment trimmer for overlap blocks.

Overlap trimming works by defining “good” positions, i.e. a position where most sequences agree (given a certain threshold) that the alignment contains a gap or a residue (independently of their agreement on that given residue). Sequences not containing enough “good” positions are then removed.

Example

Consider the following alignment, where the last three sequences align decently, while the first sequence doesn’t. In particular, it creates a large gap in the rest of the alignment:

>>> from pytrimal import Alignment, OverlapTrimmer
>>> ali = Alignment(
...     names=[b"Sp8", b"Sp17", b"Sp10", b"Sp26"],
...     sequences=[
...         "LG-----------TKSD---NNNNNNNNNNNNNNNNWV----------",
...         "APDLLL-IGFLLKTV-ATFG-----------------DTWFQLWQGLD",
...         "DPAVL--FVIMLGTI-TKFS-----------------SEWFFAWLGLE",
...         "AAALLTYLGLFLGTDYENFA-----------------AAAANAWLGLE",
...     ]
... )

Let’s create an overlap trimmer so that “good” positions correspond to an agreement between at least half of the sequences, and make it remove sequences with less than 40% of good positions:

>>> trimmer = OverlapTrimmer(40.0, 0.5)

Trimming will remove the first sequence because it doesn’t contain enough good positions; then, the block containing only gaps will be removed (this is the default behaviour of all trimmer objects):

>>> trimmed = trimmer.trim(ali)
>>> for name, seq in zip(trimmed.names, trimmed.sequences):
...     print(name.decode().ljust(5), seq)
Sp17  APDLLL-IGFLLKTV-ATFGDTWFQLWQGLD
Sp10  DPAVL--FVIMLGTI-TKFSSEWFFAWLGLE
Sp26  AAALLTYLGLFLGTDYENFAAAAANAWLGLE
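
The notion of "good" positions used in the example above can be approximated in pure Python. In this sketch a position is good for a sequence when the fraction of the other sequences sharing its state (gap or residue) in that column reaches residue_overlap. Comparing against the other sequences only, and treating "-" as the sole gap character, are assumptions; the function name is hypothetical and this is not pytrimal's implementation.

```python
def good_position_percentages(sequences, residue_overlap):
    """Percentage of "good" positions for each sequence of an alignment."""
    columns = list(zip(*sequences))
    n_others = len(sequences) - 1
    percentages = []
    for i in range(len(sequences)):
        good = 0
        for column in columns:
            is_gap = column[i] == "-"
            # Count the other sequences agreeing on gap versus residue.
            agreeing = sum(
                1 for k, c in enumerate(column)
                if k != i and (c == "-") == is_gap
            )
            if agreeing / n_others >= residue_overlap:
                good += 1
        percentages.append(100.0 * good / len(columns))
    return percentages


names = ["Sp8", "Sp17", "Sp10", "Sp26"]
sequences = [
    "LG-----------TKSD---NNNNNNNNNNNNNNNNWV----------",
    "APDLLL-IGFLLKTV-ATFG-----------------DTWFQLWQGLD",
    "DPAVL--FVIMLGTI-TKFS-----------------SEWFFAWLGLE",
    "AAALLTYLGLFLGTDYENFA-----------------AAAANAWLGLE",
]
percentages = good_position_percentages(sequences, residue_overlap=0.5)
kept = [name for name, pct in zip(names, percentages) if pct >= 40.0]
print(kept)
```

Under these assumptions the first sequence falls well below the 40% cutoff while the other three stay far above it, which matches the sequences retained in the example.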

New in version 0.4.0.

New in version 0.5.0: Support for the pickle protocol.

__init__(sequence_overlap, residue_overlap, *, backend='detect')

Create a new overlap trimmer with the given thresholds.

Parameters:
  • sequence_overlap (float) – The minimum percentage of “good” positions a sequence must contain to be kept in the alignment.

  • residue_overlap (float) – The fraction of matching residues a column must contain to be considered a “good” position.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can safely be called across different threads. Most of the computation is done with the GIL released.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None