Trimmer

Base Trimmer

class pytrimal.BaseTrimmer

A sequence alignment trimmer.

All subclasses provide the same trim method, and are configured through their constructor.

__init__(*, backend='detect')

Create a new base trimmer.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

New in version 0.2.0: The backend keyword argument.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can be called safely across different threads. Most of the computations are done after releasing the GIL.

Changed in version 0.1.2: Added the matrix optional argument.
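Because trim is re-entrant and releases the GIL for most of its work, a single trimmer can be shared by a thread pool. The following is a minimal sketch rather than part of pytrimal: trim_all is a hypothetical helper, and it only assumes that trimmer exposes the trim(alignment) method documented above.

```python
from concurrent.futures import ThreadPoolExecutor


def trim_all(trimmer, alignments, max_workers=4):
    """Trim several alignments concurrently with one shared trimmer.

    `trimmer` is any object with a `trim(alignment)` method, such as an
    instance of a BaseTrimmer subclass. Results are returned in the same
    order as the input alignments.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, whichever thread finishes first.
        return list(pool.map(trimmer.trim, alignments))
```

With pytrimal installed, this could be called as trim_all(AutomaticTrimmer("gappyout"), alignments), where alignments is a list of Alignment objects.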

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Automatic Trimmer

class pytrimal.AutomaticTrimmer(BaseTrimmer)

A sequence alignment trimmer with automatic parameter detection.

trimAl provides several heuristic methods for automated trimming of multiple sequence alignments:

  • strict: A statistical method that combines gaps and similarity statistics to clean the alignment.

  • strictplus: A statistical method that combines gaps and similarity statistics, optimized for Neighbour-Joining tree reconstruction.

  • gappyout: A statistical method that only uses gaps statistic to clean the alignment.

  • automated1: A meta-method that chooses between strict and gappyout, optimized for Maximum Likelihood phylogenetic tree reconstruction.

  • nogaps: A naive method that removes every column containing at least one gap.

  • noallgaps: A naive method that removes every column containing only gaps.

  • noduplicateseqs: A naive method that removes sequences that are identical in the alignment, keeping the latest occurrence.
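The three naive methods are simple enough to sketch in pure Python. The functions below only illustrate the rules as described in the list above; they are not trimAl's implementation, they operate on plain strings rather than Alignment objects, and the choice to keep the surviving sequences in their original relative order is an assumption.

```python
GAP = "-"


def _filter_columns(sequences, keep):
    """Keep only the columns for which keep(column) is true."""
    mask = [keep(column) for column in zip(*sequences)]
    return ["".join(c for c, kept in zip(seq, mask) if kept) for seq in sequences]


def nogaps(sequences):
    """Remove every column containing at least one gap."""
    return _filter_columns(sequences, lambda column: GAP not in column)


def noallgaps(sequences):
    """Remove every column containing only gaps."""
    return _filter_columns(sequences, lambda column: any(c != GAP for c in column))


def noduplicateseqs(names, sequences):
    """Remove duplicated sequences, keeping the latest occurrence of each."""
    # A later index overwrites an earlier one for the same sequence.
    last_index = {seq: i for i, seq in enumerate(sequences)}
    kept = sorted(last_index.values())  # assumption: preserve original order
    return [names[i] for i in kept], [sequences[i] for i in kept]


rows = ["AC-T-", "A--T-", "ACGT-"]
print(nogaps(rows))     # ['AT', 'AT', 'AT']
print(noallgaps(rows))  # ['AC-T', 'A--T', 'ACGT']
```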

Hint

A Python frozenset containing all valid automatic trimming methods can be obtained with the AutomaticTrimmer.METHODS attribute. This can be useful for listing or validating methods beforehand, e.g. to build a CLI with argparse.

New in version 0.4.0: The AutomaticTrimmer.METHODS class attribute.
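As an illustration of the hint above, the sketch below builds an argparse parser whose --method option only accepts a given set of method names. build_parser and the option name are hypothetical; the function takes the set of methods as an argument so that it does not depend on pytrimal being importable.

```python
import argparse


def build_parser(methods):
    """Build a parser whose --method option is restricted to `methods`."""
    parser = argparse.ArgumentParser(description="Trim an alignment (example).")
    parser.add_argument(
        "--method",
        choices=sorted(methods),  # sorted so the help text is stable
        default="strict",
        help="automatic trimming method (default: %(default)s)",
    )
    return parser


# In real use, pass AutomaticTrimmer.METHODS instead of this literal set.
example_methods = frozenset({"strict", "strictplus", "gappyout", "automated1"})
args = build_parser(example_methods).parse_args(["--method", "gappyout"])
print(args.method)  # gappyout
```

Validating the name in the parser means an unsupported method is rejected before the AutomaticTrimmer constructor would raise ValueError.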

New in version 0.5.0: Support for pickle protocol.

__init__(method='strict', *, backend='detect')

Create a new automatic alignment trimmer using the given method.

Parameters:

method (str) – The automatic alignment trimming method. See the documentation for AutomaticTrimmer for a list of supported values.

Keyword Arguments:

backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.

Raises:

ValueError – When method is not one of the automatic alignment trimming methods supported by trimAl.

New in version 0.2.0: The backend keyword argument.

New in version 0.4.0: The noduplicateseqs method.

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can be called safely across different threads. Most of the computations are done after releasing the GIL.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None

Manual Trimmer

class pytrimal.ManualTrimmer(BaseTrimmer)

A sequence alignment trimmer with manually defined thresholds.

Manual trimming allows the user to specify independent thresholds for two different statistics:

  • Gap threshold: Remove columns that contain too many gaps, expressed either as a minimum fraction of non-gap characters (gap_threshold) or as a maximum absolute number of gaps (gap_absolute_threshold).

  • Similarity threshold: Remove columns with a similarity ratio lower than the provided threshold.

In addition, the trimming can be restricted so that at least a configurable fraction of the original alignment is retained, in order to avoid stripping an alignment of distant sequences through aggressive trimming.
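To make the gap threshold concrete, here is a small pure-Python sketch of the rule as it is worded for the gap_threshold keyword argument (a minimum fraction of non-gap characters per column). The helper gap_threshold_mask is hypothetical, works on plain strings, and treating a column exactly at the threshold as kept is an assumption; it is not how pytrimal computes the statistic internally.

```python
def gap_threshold_mask(sequences, gap_threshold):
    """Return one boolean per column: True if the column would be kept.

    A column is kept when its fraction of non-gap characters is at least
    `gap_threshold` (assumption: the comparison is inclusive).
    """
    n_sequences = len(sequences)
    mask = []
    for column in zip(*sequences):
        non_gaps = sum(1 for c in column if c != "-")
        mask.append(non_gaps / n_sequences >= gap_threshold)
    return mask


rows = ["AC-T-", "A--T-", "ACGT-"]
# Non-gap fractions per column: 1, 2/3, 1/3, 1, 0
print(gap_threshold_mask(rows, 0.6))  # [True, True, False, True, False]
```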

__init__(*, gap_threshold=None, gap_absolute_threshold=None, similarity_threshold=None, conservation_percentage=None, window=None, gap_window=None, similarity_window=None, backend='detect')

Create a new manual alignment trimmer with the given parameters.

Keyword Arguments:
  • gap_threshold (float, optional) – The minimum fraction of non-gap characters that must be present in a column to keep the column.

  • gap_absolute_threshold (int, optional) – The absolute number of gaps allowed on a column to keep it in the alignment. Incompatible with gap_threshold.

  • similarity_threshold (float, optional) – The minimum average similarity required.

  • conservation_percentage (float, optional) – The minimum percentage of positions in the original alignment to conserve.

  • window (int, optional) – The size of the half-window to use when computing statistics for an alignment.

  • gap_window (int, optional) – The size of the half-window to use when computing the gap statistic for an alignment. Incompatible with window.

  • similarity_window (int, optional) – The size of the half-window to use when computing the similarity statistic for an alignment. Incompatible with window.

  • backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
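The half-window arguments can be pictured as a smoothing step. The sketch below assumes that the statistic of column i is replaced by the average over columns i - w to i + w, with the window truncated at the alignment boundaries; both the averaging and the boundary handling are assumptions made for illustration, and smooth is a hypothetical helper, not a pytrimal function.

```python
def smooth(values, half_window):
    """Average each per-column value over a window of 2 * half_window + 1."""
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_window)                # truncate at left edge
        stop = min(len(values), i + half_window + 1)   # truncate at right edge
        window = values[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(smooth([0.0, 3.0, 6.0], 1))  # [1.5, 3.0, 4.5]
```

With a half-window of 0, each column is scored on its own and the values are returned unchanged.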

New in version 0.2.0: The backend keyword argument.

New in version 0.2.2: The keyword arguments for controlling the half-window sizes.

Changed in version 0.4.0: Removed consistency_threshold and consistency_window.
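The incompatibilities listed for the keyword arguments can be checked before a trimmer is constructed. The helper below is a hypothetical pre-check written for illustration; the choice of ValueError and the error messages are assumptions, and nothing here describes how pytrimal itself reports incompatible arguments.

```python
def check_manual_options(
    *,
    gap_threshold=None,
    gap_absolute_threshold=None,
    window=None,
    gap_window=None,
    similarity_window=None,
):
    """Raise ValueError if mutually incompatible options are combined."""
    if gap_threshold is not None and gap_absolute_threshold is not None:
        raise ValueError(
            "gap_threshold and gap_absolute_threshold are incompatible"
        )
    if window is not None and (
        gap_window is not None or similarity_window is not None
    ):
        raise ValueError(
            "window is incompatible with gap_window and similarity_window"
        )


# A valid combination passes silently.
check_manual_options(gap_threshold=0.8, gap_window=3, similarity_window=5)
```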

trim(alignment, matrix=None)

Trim the provided alignment.

Parameters:
  • alignment (Alignment) – A multiple sequence alignment to trim.

  • matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.

Returns:

TrimmedAlignment – The trimmed alignment.

Hint

This method is re-entrant and can be called safely across different threads. Most of the computations are done after releasing the GIL.

Changed in version 0.1.2: Added the matrix optional argument.

backend

The computation backend for this trimmer.

New in version 0.4.0.

Type:

str or None