
Update aggregation to match ancestry output #79

Merged: 10 commits, Feb 21, 2024

@nebfield (Member) commented Feb 20, 2024

Doing this aggregation in the Quarto report with data.table was causing problems when building the Dockerfile.

Also, this fixes an issue where the DENOM column was missing when multiple custom scoring files were aggregated.
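A minimal, dependency-free sketch of what such an aggregation step can look like, assuming long-format input rows with `sampleset`, `IID`, `PGS`, `SUM` and `DENOM` fields (for example partial scores from several runs or several custom scoring files). The helper name `aggregate_scores_long` and the field layout are hypothetical illustrations, not the project's implementation. Summing `DENOM` alongside `SUM` in every group keeps the column present however many scoring files are combined, and `AVG` is recomputed from the totals rather than summed.

```python
from collections import defaultdict


def aggregate_scores_long(rows):
    """Sum SUM and DENOM per (sampleset, IID, PGS) and recompute AVG.

    Hypothetical sketch: assumes each row is a dict with the keys
    sampleset, IID, PGS, SUM and DENOM.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [summed SUM, summed DENOM]
    for row in rows:
        key = (row["sampleset"], row["IID"], row["PGS"])
        totals[key][0] += float(row["SUM"])
        totals[key][1] += int(row["DENOM"])

    out = []
    for (sampleset, iid, pgs), (score_sum, denom) in sorted(totals.items()):
        out.append({
            "sampleset": sampleset,
            "IID": iid,
            "PGS": pgs,
            "SUM": score_sum,
            "DENOM": denom,
            # AVG is derived from the aggregated totals, never summed directly
            "AVG": score_sum / denom if denom else float("nan"),
        })
    return out


# Made-up example: two partial results for one score plus a custom score
rows = [
    {"sampleset": "test", "IID": "s1", "PGS": "PGS000001", "SUM": 1.5, "DENOM": 10},
    {"sampleset": "test", "IID": "s1", "PGS": "PGS000001", "SUM": 0.5, "DENOM": 10},
    {"sampleset": "test", "IID": "s1", "PGS": "custom_score", "SUM": 2.0, "DENOM": 4},
]
for r in aggregate_scores_long(rows):
    print(r["PGS"], r["SUM"], r["DENOM"], r["AVG"])
# PGS000001 2.0 20 0.1
# custom_score 2.0 4 0.5
```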

@smlmbrt (Member) left a comment

We will need to change `read_pgs` in `ancestry_analysis`, because it still reads the old version of the aggregated scores:

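```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def read_pgs(loc_aggscore, onlySUM: bool):
    """
    Function to read the output of aggregate_scores
    :param loc_aggscore: path to aggregated scores output
    :param onlySUM: whether to return only _SUM columns (e.g. not _AVG)
    :return: DataFrame of scores indexed by (sampleset, IID)
    """
    logger.debug('Reading aggregated score data: {}'.format(loc_aggscore))
    # Read IID as a string so IDs such as "001" are not parsed as integers
    df = pd.read_csv(loc_aggscore, sep='\t', index_col=['sampleset', 'IID'], converters={"IID": str}, header=0)
    if onlySUM:
        # Keep only the summed scores and drop the "_SUM" suffix from their names
        df = df[[x for x in df.columns if x.endswith('_SUM')]]
        # removesuffix (Python 3.9+): rstrip('_SUM') strips characters, not the suffix
        df.columns = [x.removesuffix('_SUM') for x in df.columns]
    return df
```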
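A note on the suffix handling: `str.rstrip('_SUM')` treats its argument as a set of characters rather than a suffix, so a score name whose base ends in `_`, `S`, `U` or `M` loses more than the `_SUM` tag, while `str.removesuffix` (Python 3.9+) removes the literal suffix once. The score names below are made up for illustration.

```python
names = ["PGS000001_SUM", "custom_PGS_SUM"]

# rstrip removes any trailing run of the characters "_", "S", "U" and "M"
stripped = [n.rstrip("_SUM") for n in names]

# removesuffix removes the exact "_SUM" suffix, at most once
suffix_removed = [n.removesuffix("_SUM") for n in names]

print(stripped)        # ['PGS000001', 'custom_PG']
print(suffix_removed)  # ['PGS000001', 'custom_PGS']
```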

Signed-off-by: smlmbrt <sam.a.lambert@gmail.com>
@smlmbrt (Member) left a comment

I think my last commit solves it.

@nebfield merged commit a4df14d into dev on Feb 21, 2024
1 check passed
@nebfield deleted the fix-aggregate branch on February 21, 2024 12:34
@nebfield added a commit that referenced this pull request on Feb 21, 2024
* update vulnerable dependencies

* Update aggregation to match ancestry output (#79)

* match ancestry aggregation output

* bump version

* fix column name (accession -> PGS)

* fix column name

* add aggregate tests

* fix not respecting outdir

* read new version of pgs

* drop onlySUM parameter

* Make sure it only reads SUM and provides the correct column names back

Signed-off-by: smlmbrt <sam.a.lambert@gmail.com>

* drop deprecated parameter

---------

Signed-off-by: smlmbrt <sam.a.lambert@gmail.com>
Co-authored-by: smlmbrt <sam.a.lambert@gmail.com>

---------

Signed-off-by: smlmbrt <sam.a.lambert@gmail.com>
Co-authored-by: smlmbrt <sam.a.lambert@gmail.com>
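The commits "read new version of pgs" and "Make sure it only reads SUM and provides the correct column names back" describe reading a reshaped aggregate file. A minimal sketch of that reshaping, assuming the new output is long format (one row per sample and score) with at least the columns `sampleset`, `IID`, `PGS` and `SUM`; that layout and the helper name `read_pgs_long` are assumptions, not taken from the project. It uses only the standard library; with pandas, the same reshaping could be done with `df.pivot(index=["sampleset", "IID"], columns="PGS", values="SUM")`.

```python
import csv
import io


def read_pgs_long(handle):
    """Reshape long-format aggregated scores into wide SUM values.

    Hypothetical sketch: assumes a tab-separated file with at least the
    columns sampleset, IID, PGS and SUM (one row per sample per score).
    Returns {(sampleset, IID): {PGS: SUM}}; IIDs stay as strings.
    """
    wide = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["sampleset"], row["IID"])
        # Only SUM is kept; the score name comes straight from the PGS
        # column, so no suffix stripping is needed
        wide.setdefault(key, {})[row["PGS"]] = float(row["SUM"])
    return wide


example = io.StringIO(
    "sampleset\tIID\tPGS\tSUM\tDENOM\tAVG\n"
    "test\t001\tPGS000001\t2.0\t20\t0.1\n"
    "test\t001\tcustom_PGS\t2.0\t4\t0.5\n"
)
print(read_pgs_long(example))
# {('test', '001'): {'PGS000001': 2.0, 'custom_PGS': 2.0}}
```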
2 participants