Planned Analysis: Filter and Annotate Fusions #39
Comments
@kgaonkar6 and I have created a workflow for this, will update soon! |
Is this pipeline described in AlexsLemonade/OpenPBTA-manuscript#21? |
It is not yet described there. Do you think we should add it to the methods section or to the analysis/results section? We created a workflow (and @kgaonkar6 created an R package) that annotates fusion gene partners as TSG, oncogene, kinase, TF, or receptor, adds the expression of each gene, and filters out artifacts, fusions found in normal tissues, etc., to come up with a high-confidence list of putative driver fusions. It is probably more of a method, but we weren't going to add it until this PR was finished; we are still making tweaks. |
Ah, sounds good. This is the order I would expect. Are the |
They are all post data download of TSV files. |
@jaclyn-taroni @cgreene - seeking advice on this PR. We plan to create a package to do the annotations and prioritization, but it currently has some bugs. We were thinking of creating code for the PR that would use the new tool and output the results (TXT file and figures). In the meantime, for this PR, would you rather we contribute the entirety of the code as we have it in this repo: https://github.com/d3b-center/fusion_filtering_pipeline? It has been a work in progress for several months, so it may be a lot to go through for the purposes of the PR. cc: @kgaonkar6 |
It would be great to have that as a reusable analysis workflow. Sounds like that's your goal with creating a package. If you're open to it, we could put that code through code review, as it is often helpful to have some fresh eyes on a piece of work when the goal is to make something more generalizable/reusable. Before we figure out the mechanics of getting it through review and which repository, etc., I have a few questions. The most important of which is: what is the broad idea of what this pipeline does? Follow-up questions: What are the inputs to the pipeline? Can you make the files you are using as input public? |
To clarify, this is very helpful:
I'm wondering about things like where the information on fusions observed in normal tissue is coming from. |
Good point - I think a code review would be helpful. The goals of the package would be
Inputs are the fusion output files from arriba and STAR-Fusion. While we are only using these two algorithms now, we have run 4 other algorithms in the past and plan to add support for some of their output files as input to this package. A host of annotation tools and databases are used. For normal fusion removal, we are using Fusion Annotator, and arriba has its own blacklist. Now that I am writing this, I think we should also remove from the STAR-Fusion calls any fusions present in the arriba blacklist. There are a lot of pieces to this, as you will see. We hope to have the PR submitted today or tomorrow with what we have to date. |
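To make the blacklist idea above concrete, here is a minimal sketch of dropping calls from one caller when their gene pair appears in a blacklist. It is not the code in the linked repository. The record layout (`gene1`, `gene2` keys), the function name `remove_blacklisted`, and the treatment of the blacklist as a plain list of gene-name pairs matched in either orientation are all assumptions made for illustration; the real arriba blacklist file may be structured differently.

```python
def remove_blacklisted(fusions, blacklist_pairs):
    """Return the fusion calls whose gene pair is not in the blacklist.

    fusions: list of dicts with hypothetical "gene1" and "gene2" keys.
    blacklist_pairs: iterable of (gene_a, gene_b) tuples (assumed format).
    Pairs are matched regardless of orientation, which is an assumption.
    """
    blocked = {frozenset(pair) for pair in blacklist_pairs}
    return [
        call
        for call in fusions
        if frozenset((call["gene1"], call["gene2"])) not in blocked
    ]


# Toy usage with made-up gene names.
calls = [
    {"gene1": "GENE_A", "gene2": "GENE_B"},
    {"gene1": "GENE_C", "gene2": "GENE_D"},
]
kept = remove_blacklisted(calls, [("GENE_B", "GENE_A")])
```

Matching on an unordered pair is the simpler choice here; if orientation matters for a given blacklist, the `frozenset` could be replaced with an ordered tuple.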
What will the planned PR consist of? Will it be some wrapper script that calls the code in https://github.com/d3b-center/fusion_filtering_pipeline? As you state, there's quite a bit of code in that repository. It would be infeasible to review it well all at once. Is the plan to submit a draft pull request and that's where we'll discuss splitting it up (per https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#size-and-composition-of-pull-requests)? |
Yes, the plan is to submit a bash script that calls all of those scripts in the correct order. We also just realized that we annotated the arriba fusions so that they are annotated consistently with STAR-Fusion. Rather than having users reproduce that via this PR (it requires a 7 GB database download), we will release the annotated fusions in V3. We will try to get this released today with @yuankunzhu. |
Okay. Having the order will be helpful in figuring out next steps. If more context is needed, we can discuss here or on the pull request. |
It can come from multiple resources. For instance, TCGA normal samples have been analyzed for fusions, but the best source might be GTEx. One issue with using normal samples for filtering is that they should be analyzed with the same pipeline used for the cancer samples, so as to minimize tool-introduced artifacts. |
Here, we will filter potential artifacts, filter fusions observed in normal tissue, retain high-confidence calls, and annotate with several databases to create a final list of putative driver fusions.
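The four steps named above (artifact filtering, normal-tissue filtering, confidence filtering, annotation) could be sketched roughly as follows. This is an illustrative outline only, not the planned implementation: the `Fusion` class, the `prioritize` function, the `junction_reads` support metric, and the `min_junction_reads` threshold are hypothetical, and the lookup sets stand in for whatever databases the real workflow uses.

```python
from dataclasses import dataclass, field


@dataclass
class Fusion:
    gene1: str
    gene2: str
    junction_reads: int  # hypothetical read-support metric
    annotations: dict = field(default_factory=dict)

    @property
    def name(self):
        return f"{self.gene1}--{self.gene2}"


def prioritize(fusions, artifacts, seen_in_normal, gene_roles, min_junction_reads=2):
    """Filter and annotate fusion calls (all criteria are assumptions).

    artifacts, seen_in_normal: sets of fusion names to drop.
    gene_roles: dict mapping gene -> set of roles (e.g. "kinase", "TSG").
    """
    kept = []
    for fusion in fusions:
        if fusion.name in artifacts:
            continue  # step 1: drop potential artifacts
        if fusion.name in seen_in_normal:
            continue  # step 2: drop fusions observed in normal tissue
        if fusion.junction_reads < min_junction_reads:
            continue  # step 3: keep only well-supported calls
        # step 4: annotate each partner gene with its known roles
        fusion.annotations = {
            gene: sorted(gene_roles.get(gene, ()))
            for gene in (fusion.gene1, fusion.gene2)
        }
        kept.append(fusion)
    return kept


# Toy usage with made-up gene names and roles.
calls = [
    Fusion("GENE_A", "GENE_B", 5),
    Fusion("GENE_C", "GENE_D", 5),
    Fusion("GENE_E", "GENE_F", 5),
    Fusion("GENE_G", "GENE_H", 1),
]
result = prioritize(
    calls,
    artifacts={"GENE_C--GENE_D"},
    seen_in_normal={"GENE_E--GENE_F"},
    gene_roles={"GENE_A": {"kinase", "oncogene"}},
)
```

Keeping each filter as a separate early `continue` makes it easy to reorder or swap out individual steps while the criteria are still being tuned.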