Accounting for batch effects #14

Open
bontus opened this issue Aug 27, 2019 · 5 comments

Comments

@bontus

bontus commented Aug 27, 2019

Hi,
I was wondering if there is a way to include batches in the binless analyses. I have 4 conditions, of which two have 3 replicates (control & treatment) and two have only 2 replicates (control + inhibitor as well as treatment + inhibitor). We are interested in detecting differences that are induced by treatment and depend on the inhibitor, but we have already noticed that one of our replicate batches clusters separately (globally, the same changes are still visible, though).
Any advice is greatly appreciated!
Best regards

@yannickspill
Member

It depends on what exactly you call batch effects. Binless already accounts for some of that by default. However, could you explain how you see the batch effect, i.e. does it affect the diagonal decay, the biases, etc.?

@bontus
Author

bontus commented Sep 3, 2019

The decay values are indeed different (i.e. smaller in batch 3 than in batches 1 and 2), and I mainly noticed the differences in downstream calculations when looking at TAD borders and compartment strength. However, I realize that my question was somewhat arbitrary, as I am mostly interested in accounting for batch effects during the difference test implemented in binless. Basically, my question could be translated to: can detect_binless_differences() use pairing information (akin to a paired t-test)? read_and_prepare() does provide the replicate parameter, but I did not see any other function make use of it.
Best
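
As a rough illustration of the decay difference described above, the following Python sketch compares how contacts are distributed over genomic distance in two batches. It is not binless code: binless estimates the diagonal decay inside its normalization model, whereas this is only a crude empirical profile. The function name `distance_profile`, the bin width and the contact coordinates are all hypothetical.

```python
from collections import Counter

def distance_profile(contacts, step):
    """Fraction of contacts in each genomic-distance bin of width `step` (bp).

    `contacts` is a list of (pos1, pos2) pairs; keys of the result are the
    lower edges of the distance bins.
    """
    counts = Counter(abs(p2 - p1) // step for p1, p2 in contacts)
    total = sum(counts.values())
    return {d * step: n / total for d, n in sorted(counts.items())}

# Hypothetical contact pairs (bp), not real Hi-C data.
batch1 = [(0, 1000), (0, 1500), (0, 12000), (0, 25000)]
batch3 = [(0, 1000), (0, 2000), (0, 3000), (0, 15000)]

profile1 = distance_profile(batch1, step=10000)
profile3 = distance_profile(batch3, step=10000)
print(profile1)
print(profile3)
```

A batch whose profile is consistently shifted towards short distances would be a candidate for a separate decay, but that call is better made on the decay estimated by the normalization itself.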

@yannickspill
Member

In general, detect_binless_differences pairs the samples, so it acts like a paired t-test, albeit a more complicated one because it takes into account the neighborhood of each pixel. In that sense, batch effects are already accounted for.

The replicate parameter in read_and_prepare essentially serves to give each sample a different name. If you want to model a different decay, you could adapt the condition or enzyme fields of read_and_prepare, and then play with the different.decays argument of merge_cs_norm_datasets.

Also, in difference detection, did you group your datasets before, or did you call differences in each dataset individually?
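
To make the paired t-test analogy concrete, here is a minimal Python sketch of why a paired comparison is less sensitive to a batch offset than an unpaired one. It does not reproduce what detect_binless_differences does internally (the neighborhood modelling is left out entirely), and all numbers are made up for illustration.

```python
import math
import statistics

def unpaired_t(a, b):
    """Welch's t statistic: treats the two groups as independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """Paired t statistic: tests the within-pair differences against zero."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical numbers, not Hi-C data: one value per batch and condition.
batch_offset = [0.0, 0.5, -3.0]      # the third batch sits apart from the others
noise = [0.1, -0.1, 0.05]            # small replicate-specific noise
control = [10.0 + o for o in batch_offset]
treatment = [11.0 + o + e for o, e in zip(batch_offset, noise)]

print("unpaired t:", round(unpaired_t(control, treatment), 2))
print("paired t:  ", round(paired_t(control, treatment), 2))
```

The batch offset cancels in the within-pair differences, so the paired statistic comes out much larger than the unpaired one even though the treatment effect is the same in both calculations.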

@bontus
Author

bontus commented Sep 3, 2019

> Also, in difference detection, did you group your datasets before, or did you call differences in each dataset individually?

I grouped them after normalization and before calling detect_binless_interactions().

> The replicate parameter in read_and_prepare essentially serves to give each sample a different name. If you want to model a different decay, you could adapt the condition or enzyme fields of read_and_prepare, and then play with the different.decays argument of merge_cs_norm_datasets.

Alright, I will give that a try.

> In general, detect_binless_differences pairs the samples, so it acts like a paired t-test, albeit a more complicated one because it takes into account the neighborhood of each pixel. In that sense, batch effects are already accounted for.

That's great to hear, but I am still wondering which information is used to pair the samples if it is not explicitly provided by the user?

@yannickspill
Member

> I am still wondering which information is used to pair the samples if it is not explicitly provided by the user?

For difference detection, data is grouped into square bins of size base.res and compared two by two, taking neighbour information into account. That is done automatically and does not require user input. A stricter pairing, in the sense of a patient before and after treatment, would not make sense in this context anyway.
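
The grouping step described above can be sketched in Python as follows. This is only an illustration of assigning contacts to square bins and comparing two datasets bin by bin; it leaves out the neighbour information that binless uses, and the function names, the `base_res` value and the contact coordinates are hypothetical.

```python
from collections import Counter

def bin_contacts(contacts, base_res):
    """Count contacts per square bin of side `base_res` (bp).

    Each contact is a (pos1, pos2) pair; a bin is identified by the pair of
    integer indices (pos1 // base_res, pos2 // base_res).
    """
    return Counter((p1 // base_res, p2 // base_res) for p1, p2 in contacts)

def binwise_difference(ref, other, base_res):
    """Difference in contact counts (other minus ref) for every occupied bin."""
    ref_bins = bin_contacts(ref, base_res)
    other_bins = bin_contacts(other, base_res)
    all_bins = sorted(set(ref_bins) | set(other_bins))
    # Counter returns 0 for bins that are empty in one of the datasets.
    return {b: other_bins[b] - ref_bins[b] for b in all_bins}

# Hypothetical contact pairs (bp) for a reference and a second dataset.
ref = [(1200, 8400), (1900, 8100), (5100, 9900)]
other = [(1500, 8800), (5200, 9100), (5900, 9500)]

diff = binwise_difference(ref, other, base_res=5000)
print(diff)
```

Because both datasets are mapped onto the same grid, every bin in one dataset has a counterpart in the other without any user-supplied pairing; the pairing here is between bins, not between replicates.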
