Feat: Filter adVNTR results to exclude false positives like D23_2 #96

Open
berntpopp opened this issue Feb 4, 2025 · 2 comments
@berntpopp
Collaborator

adVNTR outputs some false-positive calls, such as D23_2.
These should be filtered out.

We need filtering logic for this.

Is this available in version 1.3?

@berntpopp berntpopp added bug Something isn't working enhancement New feature or request labels Feb 4, 2025
@hassansaei
Owner

The logic was implemented in version 1.3 in two functions, advntr_processing_del and advntr_processing_ins:

def advntr_processing_del(df):
    # NOTE: del_frame is not defined in this snippet; it must exist at module level.
    df1 = df.copy()
    df1.rename(columns={'State': 'Variant', 'Pvalue\n': 'Pvalue'}, inplace=True)
    # Count the 'D' and 'I' characters in each variant string.
    df1['Deletion_length'] = df1['Variant'].str.count('D').add(0).fillna(0)
    df1['Insertion'] = df1['Variant'].str.count('I').add(0).fillna(0)
    # Take the text after 'LEN' as the insertion length; use 0 when 'LEN' is absent.
    df1['Insertion_len'] = df1['Variant'].str.extract('(LEN.*)')
    df1.Insertion_len = df1.Insertion_len.fillna('LEN')
    df1[['I', 'Insertion_len']] = df1['Insertion_len'].str.split('LEN', expand=True)
    df1.Insertion_len = df1.Insertion_len.replace('', 0)
    df1.Deletion_length = df1.Deletion_length.fillna('0')
    df1.Deletion_length = df1.Deletion_length.astype(int)
    df1.Insertion_len = df1.Insertion_len.astype(int)
    # frame = |insertion length - deletion count|, stored as a string for isin().
    df1['frame'] = abs(df1.Insertion_len - df1.Deletion_length)
    df1.frame = df1['frame'].astype(str)
    # Keep rows with at least one deletion whose frame is listed in del_frame.
    df1 = df1.loc[(df1['Deletion_length'] >= 1)]
    df1 = df1[df1['frame'].isin(del_frame)]

    return df1

def advntr_processing_ins(df):
    # NOTE: ins_frame is not defined in this snippet; it must exist at module level.
    dff = df.copy()
    dff.rename(columns={'State': 'Variant', 'Pvalue\n': 'Pvalue'}, inplace=True)
    # Count the 'D' and 'I' characters in each variant string.
    dff['Deletion_length'] = dff['Variant'].str.count('D').add(0).fillna(0)
    dff['Insertion'] = dff['Variant'].str.count('I').add(0).fillna(0)
    # Take the text after 'LEN' as the insertion length; use 0 when 'LEN' is absent.
    dff['Insertion_len'] = dff['Variant'].str.extract('(LEN.*)')
    dff.Insertion_len = dff.Insertion_len.fillna('LEN')
    dff[['I', 'Insertion_len']] = dff['Insertion_len'].str.split('LEN', expand=True)
    dff.Insertion_len = dff.Insertion_len.replace('', 0)
    dff.Deletion_length = dff.Deletion_length.fillna('0')
    dff.Deletion_length = dff.Deletion_length.astype(int)
    dff.Insertion_len = dff.Insertion_len.astype(int)
    # frame = |insertion length - deletion count|, stored as a string for isin().
    dff['frame'] = abs(dff.Insertion_len - dff.Deletion_length)
    dff.frame = dff['frame'].astype(str)
    # Keep rows with an insertion length of at least 1 whose frame is listed in ins_frame.
    dff = dff.loc[(dff['Insertion_len'] >= 1)]
    dff = dff[dff['frame'].isin(ins_frame)]

    return dff
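
The two functions above differ only in the length column they test and the frame list they check, so a reimplementation could share the parsing step. The following is a minimal sketch, not the version 1.3 code: parse_advntr_states and filter_advntr are hypothetical names, the frame sets are placeholders (the real del_frame and ins_frame are not shown in this thread), and the example rows other than D23_2 are made up for illustration. It also simplifies the parsing by reading only the digits after LEN and by keeping frame as an integer.

```python
import pandas as pd


def parse_advntr_states(df: pd.DataFrame) -> pd.DataFrame:
    """Add Deletion_length, Insertion_len and frame columns (hypothetical helper)."""
    out = df.rename(columns={"State": "Variant", "Pvalue\n": "Pvalue"}).copy()
    # Number of 'D' characters in the variant string, as in the functions above.
    out["Deletion_length"] = out["Variant"].str.count("D").fillna(0).astype(int)
    # Digits following 'LEN'; rows without 'LEN' get an insertion length of 0.
    ins_len = out["Variant"].str.extract(r"LEN(\d+)", expand=False)
    out["Insertion_len"] = pd.to_numeric(ins_len).fillna(0).astype(int)
    # Absolute difference between insertion length and deletion count.
    out["frame"] = (out["Insertion_len"] - out["Deletion_length"]).abs()
    return out


def filter_advntr(df: pd.DataFrame, kind: str, allowed_frames) -> pd.DataFrame:
    """Keep deletion ('del') or insertion ('ins') calls whose frame is allowed."""
    if kind not in ("del", "ins"):
        raise ValueError("kind must be 'del' or 'ins'")
    parsed = parse_advntr_states(df)
    length_col = "Deletion_length" if kind == "del" else "Insertion_len"
    keep = (parsed[length_col] >= 1) & parsed["frame"].isin(allowed_frames)
    return parsed[keep]


# Placeholder frame sets for illustration only; NOT the values used in version 1.3.
HYPOTHETICAL_DEL_FRAMES = {2, 5, 8}
HYPOTHETICAL_INS_FRAMES = {1, 4, 7}

# Made-up example rows; only D23_2 comes from this issue.
calls = pd.DataFrame(
    {
        "State": ["D23_2", "I22_2_G_LEN1", "D20_2&D21_2"],
        "Pvalue\n": [0.01, 0.01, 0.01],
    }
)

kept_del = filter_advntr(calls, "del", HYPOTHETICAL_DEL_FRAMES)
kept_ins = filter_advntr(calls, "ins", HYPOTHETICAL_INS_FRAMES)
print(kept_del["Variant"].tolist())
print(kept_ins["Variant"].tolist())
```

With these placeholder sets, D23_2 has one deletion, no insertion, and a frame of 1, so it is dropped from the deletion calls. Whether the real del_frame excludes it depends on the values defined in the project.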

@berntpopp
Collaborator Author

OK, I will reimplement this.

@berntpopp berntpopp pinned this issue Feb 4, 2025
@berntpopp berntpopp added this to the 2.0.0-beta milestone Feb 4, 2025
2 participants