Is it necessary to de-redundant the resulting transcript model? #257

Open
wjcre2023 opened this issue Nov 6, 2024 · 9 comments
Labels: question (Further information is requested), weird results (Something looks odd in the resulting files)

Comments

@wjcre2023

Dear @andrewprzh
I ran all the samples together and obtained a single transcript model .gtf file. However, in IGV I see many new but very similar transcripts that have the same number of exons and differ only by a few hundred bp in the length of some exons. Does this redundancy need to be removed? I suspect it is most likely caused by degradation at the 5' end. How should I deal with this situation? Should the reads be deduplicated before quantification and transcript model construction?
[image]

Best wishes
Jie

@andrewprzh
Collaborator

Dear @wjcre2023

Could you send the command line you used?
Normally, IsoQuant does not report multiple transcripts with the same intron chain unless strong evidence is found, for example, multiple distinct polyadenylation sites.
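To check whether the similar models in the output share an intron chain, one can group the transcripts in the resulting GTF by intron chain. The following is a minimal Python sketch, not part of IsoQuant; it assumes a GTF whose `exon` records carry a `transcript_id "..."` attribute, and the function names `intron_chains` and `redundant_groups` are hypothetical. Multi-exon transcripts with an identical intron chain can only differ in where the first exon starts and the last exon ends (TSS/TES).

```python
import re
from collections import defaultdict


def intron_chains(gtf_lines):
    """Map transcript_id -> (chrom, strand, introns) built from GTF exon records.

    Introns are (start, end) pairs in 1-based inclusive coordinates.
    """
    exons = defaultdict(list)
    location = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        # Match the transcript_id attribute itself, not e.g. ref_transcript_id.
        match = re.search(r'(?:^|;)\s*transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])
    chains = {}
    for tid, exon_list in exons.items():
        exon_list.sort()
        # An intron runs from one base after an exon end to one base before the next exon start.
        introns = tuple(
            (exon_list[i][1] + 1, exon_list[i + 1][0] - 1)
            for i in range(len(exon_list) - 1)
        )
        chains[tid] = location[tid] + (introns,)
    return chains


def redundant_groups(gtf_lines):
    """Return sorted lists of multi-exon transcript ids that share an identical intron chain."""
    groups = defaultdict(list)
    for tid, (chrom, strand, introns) in intron_chains(gtf_lines).items():
        if introns:  # mono-exonic transcripts have no intron chain to compare
            groups[(chrom, strand, introns)].append(tid)
    return [sorted(tids) for tids in groups.values() if len(tids) > 1]


example = [
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\texample\texon\t150\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
    'chr1\texample\texon\t300\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]
print(redundant_groups(example))  # [['t1', 't2']]
```

The records in `example` are made up for illustration. Groups found this way are candidates to inspect in IGV rather than something to delete automatically.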

Best
Andrey

@wjcre2023
Author

Dear @andrewprzh
My parameters are as follows:
[image]
The input file was a .fq of full-length transcript reads identified by pychopper.

Also, it seems that the .gff file is still not available in 3.6.1; I have uploaded a log file.
isoquant_log.txt

@andrewprzh
Collaborator

Dear @wjcre2023

Yes, I think the main reason is the --fl_data option. It assumes that every read corresponds to a full-length transcript, i.e., that its 5' and 3' ends are correctly detected. Thus, you get transcripts with the same intron chain but different TSS and TES positions. I suggest re-running IsoQuant without any additional options.
Also, it is possible to run IsoQuant on raw ONT data without any pre-processing.
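For a group of transcripts that already share an intron chain, a rough way to see whether their differences look like 5' truncation is to compare how widely their TSS and TES positions spread. The following Python sketch is a hypothetical post-hoc check, not an IsoQuant feature; the function names and the `tss_min`/`tes_max` thresholds are illustrative assumptions, and the input is assumed to be already grouped by intron chain.

```python
def terminal_spread(transcripts):
    """Return (TSS spread, TES spread) in bp for transcripts sharing one intron chain.

    `transcripts` maps a transcript id to (strand, [(start, end), ...]) using
    1-based inclusive GTF coordinates.
    """
    tss, tes = [], []
    for strand, exons in transcripts.values():
        start = min(s for s, _ in exons)
        end = max(e for _, e in exons)
        # On the minus strand the transcript starts at its highest coordinate.
        if strand == "-":
            start, end = end, start
        tss.append(start)
        tes.append(end)
    return max(tss) - min(tss), max(tes) - min(tes)


def looks_5prime_truncated(transcripts, tss_min=100, tes_max=20):
    """Heuristic: large TSS spread with nearly identical TES (thresholds are arbitrary)."""
    tss_spread, tes_spread = terminal_spread(transcripts)
    return tss_spread >= tss_min and tes_spread <= tes_max


group = {
    "t1": ("+", [(100, 500), (600, 700)]),
    "t2": ("+", [(350, 500), (600, 700)]),
}
print(terminal_spread(group))         # (250, 0)
print(looks_5prime_truncated(group))  # True
```

A pattern like this (shared intron chain and TES, TSS differing by a few hundred bp) is what one would expect when read starts are trusted as true TSSs, as with --fl_data.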

P.S. Your log shows an error caused by duplicated IDs in your reference annotation.

Best
Andrey

andrewprzh added the "question" label on Nov 18, 2024
@wjcre2023
Author

Dear @andrewprzh
Thank you for your reply! I will remove this parameter and try again. As for the duplication, it seems correct to me, because one protein ID corresponds to multiple CDS records. I don't understand why the error was reported; could you give me some advice?
[image]

@andrewprzh
Collaborator

Dear @wjcre2023

The ID should be unique for all features, even for exons belonging to a single CDS.
From the GFF documentation:

ID: Indicates the unique identifier of the feature. IDs must be unique within the scope of the GFF file.

That is why the gffutils library, which IsoQuant uses to convert GFF into a gene database, complains about this. I think it is better to modify your annotation.

It is possible to ignore these warnings (i.e., convert the GFF to a database with other options), but then the outcome is unpredictable.
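One way to modify the annotation is to give every line that repeats an ID its own unique ID before passing the file to IsoQuant. The following Python sketch is a minimal, hypothetical helper (the names `feature_id`, `with_new_id`, `make_ids_unique` and the `.1`, `.2` suffix scheme are assumptions); it reports duplicated `ID` values and appends a counter to each occurrence. It does not update `Parent=` attributes, so it is only safe when the duplicated features are leaf features, such as CDS lines, that no other line refers to.

```python
from collections import Counter


def feature_id(line):
    """Return the value of the ID attribute of a GFF3 feature line, or None."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    for attr in fields[8].split(";"):
        key, _, value = attr.partition("=")
        if key.strip() == "ID":
            return value.strip()
    return None


def with_new_id(line, new_id):
    """Rewrite the ID attribute of one GFF3 line (the trailing newline is dropped)."""
    fields = line.rstrip("\n").split("\t")
    attrs = []
    for attr in fields[8].split(";"):
        key, sep, _ = attr.partition("=")
        attrs.append(f"{key}{sep}{new_id}" if key.strip() == "ID" else attr)
    fields[8] = ";".join(attrs)
    return "\t".join(fields)


def make_ids_unique(gff_lines):
    """Append .1, .2, ... to every ID that occurs on more than one line.

    Returns (new_lines, sorted list of duplicated IDs). Parent= references are
    not updated, so only use this when the duplicated features have no children.
    """
    counts = Counter(fid for fid in map(feature_id, gff_lines) if fid is not None)
    duplicated = {fid for fid, n in counts.items() if n > 1}
    seen = Counter()
    out = []
    for line in gff_lines:
        fid = feature_id(line)
        if fid in duplicated:
            seen[fid] += 1
            line = with_new_id(line, f"{fid}.{seen[fid]}")
        out.append(line)
    return out, sorted(duplicated)


gff = [
    "chr1\texample\tmRNA\t100\t700\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\texample\tCDS\t300\t400\t.\t+\t1\tID=cds1;Parent=mrna1",
]
fixed, dups = make_ids_unique(gff)
print(dups)                     # ['cds1']
print(fixed[2].split("\t")[8])  # ID=cds1.2;Parent=mrna1
```

The records in `gff` are made up for illustration. Renaming turns each CDS segment into a separately identified feature, so it is worth checking that any other tool using the same annotation is fine with that.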

Best
Andrey

@wjcre2023
Author

Dear @andrewprzh
OK, thank you very much. I think I see what you mean.
Best
Jie

@wjcre2023
Author

Dear @andrewprzh
Unfortunately, the results of my re-run have not changed much. To save time, I re-ran IsoQuant on the BAM file from the previous alignment. Could this be related to these two parameters?
[image]
Here are my results:
[image]
[image]

@andrewprzh
Collaborator

Dear @wjcre2023

The parameters look OK. Could you send me the GTF records of these two transcripts?
I can take a look, but of course it may be hard to understand the real reason without having the data.

Best
Andrey

andrewprzh added the "weird results" label on Dec 3, 2024
@wjcre2023
Author

Dear @andrewprzh
Thank you for your reply!
Here are my two examples:
example.txt
[image]
[image]
Best
Jie
