Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: simplify csv files #263

Merged
merged 1 commit into from
Jun 1, 2021
Merged

feat: simplify csv files #263

merged 1 commit into from
Jun 1, 2021

Conversation

maxulysse
Copy link
Member

Trying something different with the samplesheet.csv files

@maxulysse
Copy link
Member Author

(works locally so far on my progress in nf-core/sarek#379)

@FriederikeHanssen
Copy link

Why are you removing gender information? It is needed for copy number calling

@maxulysse
Copy link
Member Author

I actually just remove it from the normal only samples.
cf https://github.com/nf-core/sarek/blob/c3714186c70e8d0e25072924bddf0b56f0f418ce/workflows/sarek.nf#L376-L412

So far, in previous sarek versions, we had one function to extract information from the TSV file for each step.
By using header in the csv file, we can directly assess if sample has gender, if it's for mappping or variant calling.
Hence simplifying this process.
I do like what Harshil did with the checksamplesheet process, but I feel like we can do all that in groovy/nextflow and not waste a process for it.

Basically my plan is to change as little as possible for users like (so switching from TSV to CSV(+header)).
But in the meantime, I feel like we can simplify some stuff for some other users as well.
I'm planning to start a discussion in nf-core/sarek#379 (as soon as I manage to fix the CI tests (and other issues) again), so we can discuss of what we want to be as header in said file.

@FriederikeHanssen
Copy link

Ah ok, I see. That makes sense. In the case of providing normal/tumor sample pairs, if the field is left empty, it will set it to 'NA' for both, right and CNV will be skipped?

@maxulysse
Copy link
Member Author

That's it.
But I think we can have the csv flexible.
Default status if not specified is normal -> 0.
Default gender if not specified is NA.
That way, if at some point we want to introduce CNV for normal sample, the csv would just need to have the gender specified.

Copy link

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes after our discussion in slack: I get what you mean. For everyone still wanting to provide all columns not much changes except tsv -> csv. The only processes needing the gender information are CNV processes anyway. and for the CNV on germline samples not even the csv parsing code would need to be changed, but we would need to catch somewhere whether CNV tools are specified without gender information

@maxulysse maxulysse merged commit cedaeb0 into nf-core:sarek Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants