{ "cells": [ { "cell_type": "markdown", "id": "805a9bde-7fde-4eeb-abc1-bc15b8aceb7f", "metadata": {}, "source": [ "# Basic examples from the trimAl documentation\n", "\n", "This example shows how to run the basic methods shown in the trimAl manual, but using the `pytrimal` API." ] }, { "cell_type": "code", "execution_count": null, "id": "a1f5f0b9-e3e3-413b-8ae6-486069e0705f", "metadata": {}, "outputs": [], "source": [ "import pytrimal\n", "pytrimal.__version__" ] }, { "cell_type": "markdown", "id": "c1e55561-0213-47e1-addb-5c4252051850", "metadata": {}, "source": [ "For this example, we will use an alignment used to generate HMMs for Pfam (namely [PF12574](https://www.ebi.ac.uk/interpro/entry/pfam/PF12574/), a family of Rickettsia surface antigen). We use `Alignment.load` to load the alignment from a filename; note that `os.PathLike` objects are supported as well." ] }, { "cell_type": "code", "execution_count": null, "id": "7f8d4a1c-eaa0-4d29-85e4-9ec852cb4df2", "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "ali = pytrimal.Alignment.load(pathlib.Path(\"data\").joinpath(\"PF12574.full.afa\"))" ] }, { "cell_type": "markdown", "id": "a0ae4dc6-1466-446a-8ec4-87fce67bfefd", "metadata": {}, "source": [ "To easily compare the alignments, let's use [rich-msa](https://pypi.org/project/rich-msa), a Python package for displaying multiple sequence alignments with [rich](https://github.com/Textualize/rich)." ] }, { "cell_type": "code", "execution_count": null, "id": "94f9efe9-7bf6-4203-ab2e-45007716709a", "metadata": {}, "outputs": [], "source": [ "import rich.console\n", "import rich.panel\n", "from rich_msa import RichAlignment\n", "\n", "def show_alignment(alignment):\n", " console = rich.console.Console(width=len(alignment.sequences[0])+40)\n", " widget = RichAlignment(names=[n.decode() for n in alignment.names], sequences=alignment.sequences, max_name_width=30)\n", " panel = rich.panel.Panel(widget, title_align=\"left\", title=\"({} residues, {} sequences)\".format(len(alignment.sequences[0]), len(alignment.sequences)))\n", " console.print(panel)" ] }, { "cell_type": "markdown", "id": "9cb77186-8f4b-4f7b-a60f-72ddf325828e", "metadata": {}, "source": [ "Let's see how the original alignment looks before trimming:" ] }, { "cell_type": "code", "execution_count": null, "id": "cb4bdda8-0206-4ca9-ba92-dccf1003393b", "metadata": { "tags": [] }, "outputs": [], "source": [ "show_alignment(ali)" ] }, { "cell_type": "markdown", "id": "293dddfc-ac4b-444d-9bd5-2b670f5f8e3f", "metadata": {}, "source": [ "## Example 1\n", "\n", "Remove all positions in the alignment with gaps in 10% or more of the sequences, unless this leaves less than 60% of original alignment. In such case, print the 60% best (with less gaps) positions. Equivalent to:\n", "```\n", "$ trimal -in data/example.001.AA.clw -gt 0.9 -cons 60\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "86e2b0cc-6cad-48a2-a615-a37ecd48a13e", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60)\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "b40760e3-b73a-41d9-9a92-0cc824b09e73", "metadata": {}, "source": [ "## Example 2\n", "\n", "Same as *Example 1*, but the gap score is averaged over a window starting 3 positions before and ending 3 positions after each column." ] }, { "cell_type": "code", "execution_count": null, "id": "872d8097-f54c-469b-9a5c-dba02da590b1", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60, window=3)\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "d7b66c60-54d8-4ab2-ac19-dfd4dfab7a90", "metadata": {}, "source": [ "## Example 3\n", "\n", "Use an automatic method to decide optimal thresholds, based on the gap scores the input alignment. Equivalent to:\n", "```\n", "$ trimal -in data/example.001.AA.clw -gappyout\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "00e658aa-9280-437d-a024-853bdaa3fafa", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.AutomaticTrimmer(method=\"gappyout\")\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "01efceb6-8eec-4548-9b70-7178a7670537", "metadata": {}, "source": [ "## Example 4\n", "\n", "Use automatic methods to decide optimal thresholds, based on the combination of gap and similarity scores. Equivalent to:\n", "```\n", "$ trimal -in data/example.001.AA.clw -strictplus\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "2580fe78-64a6-43f5-808a-fd3220804c69", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.AutomaticTrimmer(method=\"strictplus\")\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "664c6392-bb2b-4b21-ae4d-655a97f0d65a", "metadata": {}, "source": [ "## Example 5\n", "\n", "Use an heuristic to decide the optimal method for trimming the alignment. Equivalent to:\n", "```\n", "$ trimal -in data/example.001.AA.clw -automated1\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "38499837-b1ef-45a0-844d-be36adae2f47", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.AutomaticTrimmer(method=\"automated1\")\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "39b39b38-1f8c-4f00-81ca-96a811fab03d", "metadata": {}, "source": [ "## Example 6\n", "\n", "Use residues and sequences overlap thresholds to delete some sequences from the alignment. See the [trimAl User Guide](http://trimal.cgenomics.org) for details.\n", "\n", "```\n", "$ trimal -in data/example.001.AA.clw -resoverlap 0.6 -seqoverlap 75\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "bdf775d0-81e8-4202-9496-ac8f64cfb37c", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.OverlapTrimmer(residue_overlap=0.6, sequence_overlap=75)\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] }, { "cell_type": "markdown", "id": "b1c127f5-14ee-4fe3-b51b-32f71d806cc0", "metadata": {}, "source": [ "## Example 10\n", "\n", "Select the 5 most representative sequences from the alignment.\n", "\n", "```\n", "$ trimal -in data/example.001.AA.clw -clusters 5\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "0e8f515a-b458-4efc-80d5-f0428fb210e0", "metadata": {}, "outputs": [], "source": [ "trimmer = pytrimal.RepresentativeTrimmer(clusters=5)\n", "trimmed = trimmer.trim(ali)\n", "show_alignment(trimmed)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }