Merge pull request #67 from MPI-Dortmund/strategies
This PR adds strategies to the documentation
thorstenwagner authored Nov 2, 2023
2 parents 6823db8 + 0c5c064 commit ff9fb81
Showing 5 changed files with 122 additions and 13 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -30,6 +30,7 @@ User Guide: full table of contents

installation
tutorials/tutorials_overview
strategies/strategy_overview
developer/devs
Changes <https://github.com/MPI-Dortmund/tomotwin-cryoet/releases>

67 changes: 67 additions & 0 deletions docs/strategies/strategy_01.rst
@@ -0,0 +1,67 @@
Strategy 1: Refinement of references/targets using umaps
========================================================

When to use it
--------------

You have selected references or cluster targets, but you are not satisfied with the picking results. The embedding computed from a cluster or reference is not always an ideal representation of the target. Some references simply do not work well, and sometimes the UMAP does not show all the structure that is actually present in the embeddings.

What it does
------------

This strategy takes your references/targets and collects all embeddings that are at least slightly similar to one or more of them (similarity > 0.5). These embeddings are then used to estimate a new UMAP. In some cases, you will see new structures in this UMAP; some of them correspond to irrelevant embeddings (e.g. membranes). By finding the cluster in the UMAP that actually corresponds to your target protein, you can improve the picking!
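
The selection step can be pictured with a small sketch. It is not TomoTwin's implementation: it assumes that embeddings are plain vectors and that cosine similarity is the similarity measure, and the names ``cosine_similarity`` and ``filter_embeddings`` are hypothetical. Only the 0.5 threshold is taken from the text above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_embeddings(embeddings, references, threshold=0.5):
    """Return the indices of embeddings that are similar to at least one reference.

    An embedding is kept when its best similarity over all references
    exceeds the threshold (assumed here to mirror ``-t 0.5``).
    """
    kept = []
    for index, embedding in enumerate(embeddings):
        best = max(cosine_similarity(embedding, ref) for ref in references)
        if best > threshold:
            kept.append(index)
    return kept


# Toy data: three 2D "embeddings" and a single reference (real embeddings
# have many more dimensions).
references = [[1.0, 0.0]]
embeddings = [[0.9, 0.1], [0.0, 1.0], [0.6, 0.6]]
kept = filter_embeddings(embeddings, references)
```

In this toy example the second vector is orthogonal to the reference and is dropped, while the other two are kept. A UMAP estimated only from the kept embeddings has fewer unrelated points competing for space.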

How to use it
-------------

This example assumes that you ran the reference workflow, but the strategy can just as easily be used with cluster target embeddings.

1. Filter the tomogram embeddings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We first select the embeddings that are reasonably close (``-t 0.5``) to our reference embeddings.

.. prompt:: bash $

    tomotwin_tools.py filter_embedding -i embed/tomo_embeddings.temb -m map/map.tmap -t 0.5 -o filter/ --lower --concat

2. Estimate umap
^^^^^^^^^^^^^^^^

.. prompt:: bash $

    tomotwin_tools.py umap -i filter/tomo_embeddings_filtered_allrefs.temb -o umap/


3. Start napari and select regions of interest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To start napari, run:

.. prompt:: bash $

    napari tomo/tomo.mrc umap/tomo_embeddings_filtered_allrefs_label_mask.mrci

After starting napari, load the clustering plugin: :guilabel:`Plugins` -> :guilabel:`napari-tomotwin` -> :guilabel:`Cluster umap embeddings`.

Within the plugin, select the :file:`.tumap` file in the :file:`umap/` folder and press :guilabel:`load`.

Select your targets in the UMAP. You can select multiple targets by pressing :kbd:`Shift`. Save your targets when you are done. The following steps assume that you saved them in :file:`cluster_targets/`.

4. Map the cluster targets with the tomogram embeddings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. prompt:: bash $

    tomotwin_map.py distance -r cluster_targets/cluster_targets.temb -v embed/tomo_embeddings.temb -o map_cluster/


5. Locate the particles
^^^^^^^^^^^^^^^^^^^^^^^

.. prompt:: bash $

    tomotwin_locate.py findmax -m map_cluster/map.tmap -o locate_refined/


Check your results with the napari-boxmanager :-)
42 changes: 42 additions & 0 deletions docs/strategies/strategy_02.rst
@@ -0,0 +1,42 @@
Strategy 2: Speedup and improvement of embeddings using median filtering
========================================================================

When to use it
--------------

You may use this strategy for two reasons:

1. It makes the embedding step much faster, so it is worthwhile even if you only want to save time.
2. It skips a large part of the empty areas of the tomogram, which can improve the UMAP.

What it does
------------

The command used in this strategy creates a mask. The mask defines a region of interest (ROI) within the tomogram and excludes areas that are most likely not of interest.

It takes advantage of the fact that the average position within a tomogram is unlikely to contain a centered protein. Thus, if a sample of positions within the tomogram is taken, the median of their embeddings is likely to be a good representation of the background embedding.

The command first calculates the embeddings of a tomogram using a large stride (coarse sampling). It then calculates the median embedding from the coarse tomogram embeddings. Using the median embedding, we can calculate a heatmap of how likely it is that a given position is a background embedding. From this heatmap, a mask is generated using only those positions that are highly dissimilar to the median embedding.
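
The idea can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind ``embedding_mask median``: it assumes a component-wise median, cosine similarity as the similarity measure, and a hypothetical cut-off of 0.5; the function names are made up for this example. It also marks the coarse positions directly and ignores how the result is transferred to the full tomogram grid.

```python
import math
from statistics import median


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def median_embedding(embeddings):
    """Component-wise median over all embeddings (assumed background estimate)."""
    return [median(component) for component in zip(*embeddings)]


def background_mask(embeddings, threshold=0.5):
    """Return True for positions to keep, i.e. those dissimilar to the median.

    The list of similarities plays the role of the heatmap; the threshold
    (hypothetical value) turns it into a binary mask.
    """
    background = median_embedding(embeddings)
    heatmap = [cosine_similarity(e, background) for e in embeddings]
    return [similarity < threshold for similarity in heatmap]


# Toy data: three background-like embeddings and one outlier.
coarse_embeddings = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
mask = background_mask(coarse_embeddings)
```

Because most sampled positions are background, the median lands close to the three similar vectors, and only the outlier is marked as worth embedding at fine sampling.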

This mask can then be used to compute the embeddings with a smaller stride (fine sampling). Because the mask excludes the background positions, it reduces the total number of embeddings and makes the embedding step faster, which is the first advantage of this strategy.
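
A rough count shows where the saving comes from. All numbers below are hypothetical (tomogram shape, both strides, and the fraction of positions kept by the mask), and the count ignores border effects of the box size, so treat it as back-of-the-envelope arithmetic rather than a description of what TomoTwin does internally.

```python
def positions_per_axis(size, stride):
    """Number of sampling positions along one axis when stepping by `stride`."""
    return len(range(0, size, stride))


def total_positions(shape, stride):
    """Number of sampling positions in a volume of the given shape."""
    total = 1
    for size in shape:
        total *= positions_per_axis(size, stride)
    return total


shape = (200, 512, 512)  # hypothetical tomogram dimensions (z, y, x)
coarse = total_positions(shape, stride=8)  # hypothetical coarse stride
fine = total_positions(shape, stride=2)    # hypothetical fine stride

keep_fraction = 0.25  # hypothetical share of positions the mask retains
masked_fine = int(fine * keep_fraction)

# Work with the mask: one cheap coarse pass plus the masked fine pass.
with_mask = coarse + masked_fine
```

With these made-up numbers, the coarse pass adds 102,400 embeddings, while the mask removes 4,915,200 of the 6,553,600 fine positions, so the extra coarse pass is cheap compared with what it saves.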

In addition, and this is the second advantage, fewer background embeddings also mean that the UMAP can focus on the embeddings that actually matter, which may yield more protein clusters.

How to use it
-------------

1. Estimate the mask

To calculate the mask, all you need is your tomogram and the latest TomoTwin model:

.. prompt:: bash $

    CUDA_VISIBLE_DEVICES=0,1 tomotwin_tools.py embedding_mask median -i tomo/tomo.mrc -m tomotwin_latest.pth -o mask

2. Calculate the (filtered) embeddings

.. prompt:: bash $

    CUDA_VISIBLE_DEVICES=0,1 tomotwin_embed.py tomogram -v tomo/tomo.mrc -m tomotwin_latest.pth --mask mask/tomo_mask.mrc

Once the embeddings are computed, you can simply continue with either the reference or clustering workflow.
9 changes: 9 additions & 0 deletions docs/strategies/strategy_overview.rst
@@ -0,0 +1,9 @@
==========
Strategies
==========

.. _strategy-01:
.. include:: strategy_01.rst

.. _strategy-02:
.. include:: strategy_02.rst
16 changes: 3 additions & 13 deletions docs/tutorials/text_modules/embed.rst
@@ -10,18 +10,8 @@ To embed your tomogram using two GPUs and batchsize of 256 use:

To have your tomograms embedded as quickly as possible, choose a batch size that uses as much of your GPU memory as possible. However, if you choose it too large, you might run into memory problems. In that case, try different batch sizes and check the memory usage with ``nvidia-smi``.

-.. hint:: **Speed up embedding using a mask**
+.. hint:: **Strategy: Speed up embedding calculation using a mask**

-    With TomoTwin 0.5, the embedding command supports the use of masks. With masks you can define which regions of your tomogram actually get embedded and therefore speed up the embedding.
-    We also provide a new tool that calculates a mask that excludes areas that probably do not contain any protein. You can run it with:
+    Using masks can dramatically speed up the embedding calculation. It can also improve the estimated umaps!

-    .. prompt:: bash $
-
-        tomotwin_tools.py embedding_mask -i your_tomo_a10.mrc -o out/mask/
-
-    The mask you find there can be used when running ``tomotwin_embed.py`` using the argument ``--mask``.
-    As this is still experimental, please check that the masks do not exclude any important areas. You can do that easily with napari by opening the tomogram and your mask and then changing the opacity of your mask:
-
-    .. prompt:: bash $
-
-        napari your_tomo_a10.mrc out_mask/your_tomo_a10_mask.mrc
+    Check out the :ref:`corresponding strategy <strategy-02>`!
