Basic examples from the trimAl documentation

This example reproduces the basic methods shown in the trimAl manual, using the pytrimal API instead of the command line.

[1]:
import pytrimal
pytrimal.__version__
[1]:
'0.3.0'

For this example, we will use one of the example alignments from the trimAl source repository. We use Alignment.load to load the alignment from a file; besides plain filenames, os.PathLike objects such as pathlib.Path are supported as well.

[2]:
import pathlib
ali = pytrimal.Alignment.load(pathlib.Path("data").joinpath("example.001.AA.clw"))

Let’s see how the original alignment looks before trimming:

[3]:
for name, seq in zip(ali.names, ali.sequences):
    print(name.decode().ljust(10), seq)
Sp8        -----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII
Sp10       -------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII
Sp26       AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI
Sp6        -----ASGAILT-LGIYLFTLCAVIS--VSWYLAWLGLEINMMAII
Sp17       --FAYTAPDLL-LIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF
Sp33       -------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL

Example 1

Remove all positions in the alignment with gaps in 10% or more of the sequences, unless this leaves less than 60% of the original alignment. In that case, keep the best 60% of positions (those with the fewest gaps). Equivalent to:

$ trimal -in data/example.001.AA.clw -gt 0.9 -cons 60
[4]:
trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60)
trimmed = trimmer.trim(ali)

for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().ljust(10), seq)
Sp8        GKVIYGIVLGTKSQFSVVWLFPWNGLQIHMMGII
Sp10       DPAVFVIMLGTITKFSSEWFFAWLGLEINMMVII
Sp26       AALLLGLFLGTDYNFAAAAANAWLGLEINMMAQI
Sp6        GAILLGIYLFTLCVISVSWYLAWLGLEINMMAII
Sp17       PDLLIGFLLKTVATFGDTWFQLWQGLDLNKMPVF
Sp33       PTILAGLHMETDINFSLAWFQAWGGLEINKQAIL
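
To make the gap threshold more concrete, the rule described above can be sketched in plain Python, without pytrimal. This is a simplified illustration under stated assumptions, not trimAl's actual implementation: the helper names (gap_fractions, trim_by_gaps) are hypothetical, the handling of a column sitting exactly on the threshold follows the wording of this example, and the 60% fallback is only detected, not implemented. The sequences are copied from the alignment printed earlier.

```python
# Simplified illustration of gap-threshold trimming; NOT trimAl's implementation.
# Sequences copied from the alignment shown at the top of this example.
ALIGNMENT = {
    "Sp8":  "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII",
    "Sp10": "-------DPAVL-FVIMLGTIT-KFS--SEWFFAWLGLEINMMVII",
    "Sp26": "AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI",
    "Sp6":  "-----ASGAILT-LGIYLFTLCAVIS--VSWYLAWLGLEINMMAII",
    "Sp17": "--FAYTAPDLL-LIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF",
    "Sp33": "-------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL",
}


def gap_fractions(sequences):
    """Return, for each column, the fraction of sequences that have a gap."""
    n = len(sequences)
    return [column.count("-") / n for column in zip(*sequences)]


def trim_by_gaps(alignment, gap_threshold=0.9, conservation_percentage=60):
    """Keep the columns whose gap fraction is below 1 - gap_threshold.

    Hypothetical helper: the fallback that keeps the best columns when too
    few survive is not implemented here, only detected.
    """
    fractions = gap_fractions(list(alignment.values()))
    max_gap_fraction = 1.0 - gap_threshold
    kept = [i for i, f in enumerate(fractions) if f < max_gap_fraction]
    if 100 * len(kept) < conservation_percentage * len(fractions):
        raise NotImplementedError("conservation fallback is not sketched here")
    return {
        name: "".join(seq[i] for i in kept)
        for name, seq in alignment.items()
    }


sketch_trimmed = trim_by_gaps(ALIGNMENT)
for name, seq in sketch_trimmed.items():
    print(name.ljust(10), seq)
```

With only six sequences, a single gap already puts a column above the 10% limit, so the sketch keeps only the gap-free columns. That leaves 34 of 46 columns, above the 60% floor, so the fallback is not needed for this alignment.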

Example 2

Same as Example 1, but the gap score is averaged over a window starting 3 positions before and ending 3 positions after each column.

[5]:
trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60, window=3)
trimmed = trimmer.trim(ali)

for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().ljust(10), seq)
Sp8        V-YGIVLGTKSDQLFPWNGLQIHMMGII
Sp10       L-FVIMLGTIT-KFFAWLGLEINMMVII
Sp26       TYLGLFLGTDYENANAWLGLEINMMAQI
Sp6        T-LGIYLFTLCAVYLAWLGLEINMMAII
Sp17       -LIGFLLKTVA-TFQLWQGLDLNKMPVF
Sp33       NIAGLHMETDI-NFQAWGGLEINKQAIL

Example 3

Use an automatic method to decide optimal thresholds, based on the gap scores of the input alignment. Equivalent to:

$ trimal -in data/example.001.AA.clw -gappyout
[6]:
trimmer = pytrimal.AutomaticTrimmer(method="gappyout")
trimmed = trimmer.trim(ali)

for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().ljust(10), seq)
Sp8        GKVIVYGIVLGTKSQFSVVWLFPWNGLQIHMMGII
Sp10       DPAVLFVIMLGTITKFSSEWFFAWLGLEINMMVII
Sp26       AALLTLGLFLGTDYNFAAAAANAWLGLEINMMAQI
Sp6        GAILTLGIYLFTLCVISVSWYLAWLGLEINMMAII
Sp17       PDLL-IGFLLKTVATFGDTWFQLWQGLDLNKMPVF
Sp33       PTILNAGLHMETDINFSLAWFQAWGGLEINKQAIL

Example 4

Use an automatic method to decide optimal thresholds, based on a combination of gap and similarity scores. Equivalent to:

$ trimal -in data/example.001.AA.clw -strictplus
[7]:
trimmer = pytrimal.AutomaticTrimmer(method="strictplus")
trimmed = trimmer.trim(ali)

for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().ljust(10), seq)
Sp8        GIVLVWLFPWNGLQIHMMGII
Sp10       VIMLEWFFAWLGLEINMMVII
Sp26       GLFLAAANAWLGLEINMMAQI
Sp6        GIYLSWYLAWLGLEINMMAII
Sp17       GFLLTWFQLWQGLDLNKMPVF
Sp33       GLHMAWFQAWGGLEINKQAIL

Example 5

Use a heuristic to decide the optimal method for trimming the alignment. Equivalent to:

$ trimal -in data/example.001.AA.clw -automated1
[8]:
trimmer = pytrimal.AutomaticTrimmer(method="automated1")
trimmed = trimmer.trim(ali)

for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().ljust(10), seq)
Sp8        VWLFPWNGLQIHMMGII
Sp10       EWFFAWLGLEINMMVII
Sp26       AAANAWLGLEINMMAQI
Sp6        SWYLAWLGLEINMMAII
Sp17       TWFQLWQGLDLNKMPVF
Sp33       AWFQAWGGLEINKQAIL
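
To compare how aggressively each method trims this alignment, one can compute the fraction of columns each example retains. The sketch below is plain Python and does not call pytrimal: it hard-codes the Sp8 row from each output above, and the dictionary labels are only illustrative. With the objects used in this example, the same number could presumably be obtained from the length of any row in trimmed.sequences, since the sequences print as plain strings.

```python
# Sp8 row of the original alignment and of each trimmed output shown above.
ORIGINAL_SP8 = "-----GLGKVIV-YGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII"
TRIMMED_SP8 = {
    "gap_threshold=0.9, cons=60":  "GKVIYGIVLGTKSQFSVVWLFPWNGLQIHMMGII",
    "same, with window=3":         "V-YGIVLGTKSDQLFPWNGLQIHMMGII",
    "gappyout":                    "GKVIVYGIVLGTKSQFSVVWLFPWNGLQIHMMGII",
    "strictplus":                  "GIVLVWLFPWNGLQIHMMGII",
    "automated1":                  "VWLFPWNGLQIHMMGII",
}

# Every row of an alignment has the same length, so one row is enough
# to count the retained columns.
retained_columns = {label: len(row) for label, row in TRIMMED_SP8.items()}

for label, columns in retained_columns.items():
    fraction = columns / len(ORIGINAL_SP8)
    print(f"{label:<30}{columns:>3} columns ({fraction:.1%})")
```

On this alignment, gappyout keeps the most columns and automated1 the fewest, while the windowed variant of Example 2 retains 28 of 46 columns, just above the 60% floor set by the conservation percentage.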