Error when associating TFs to TREs and genes (mapTF / get.proximal.genes) #1

Open
mattgalbraith opened this issue May 22, 2019 · 6 comments

Comments

@mattgalbraith

When running tfTarget via run_tfTarget.bsh with the following command:
bash run_tfTarget.bsh \
  -query $TREATMENT_SAMPLES \
  -control $CONTROL_SAMPLES \
  -bigWig.path $BIGWIG_PATH \
  -prefix gencode_test \
  -TRE.path $TRE_MERGED_BED \
  -gene.path $ANNOTATION_BED \
  -2bit.path $HG19_2BIT \
  -pval.up 0.1 \
  -pval.down 0.1 \
  -ncores 3 \
  -dist 50000 \
  -closest.N 2 \
  -pval.gene 0.1

I am getting the following error:

[1] "associating TFs to TREs and genes"
awk: syntax error at source line 1
context is
BEGIN{OFS=" "} {print >>> $1,$6== <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1
Error in $<-.data.frame(*tmp*, "closest.N", value = c(1L, 2L, 1L, :
replacement has 36 rows, data has 37
Calls: mapTF -> get.proximal.genes -> $<- -> $<-.data.frame
Execution halted

This appears to be related to the awk command at lines 18-20 or 43-45 of mapTF.R
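The exact awk command in mapTF.R is not reproduced in this thread, so the following is only a sketch of one possible cause and workaround. It assumes the command derives a 1 bp TSS interval from a six-column BED file using a comparison or conditional expression directly inside `print`, which some awk implementations (such as the one shipped with macOS) may parse differently from gawk. The helper name `tss_from_bed` and the TSS definition are hypothetical; the point is that computing the value in a separate, fully parenthesized assignment keeps the `print` statement unambiguous:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from tfTarget): print a 1 bp TSS interval
# for each record of a 6-column BED file (chrom, start, end, id, name, strand).
# The conditional is evaluated in its own parenthesized assignment, so the
# print statement contains no bare comparison or ?: operator.
tss_from_bed() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
       {
         # Assumed TSS definition: start for "+" genes, end - 1 for "-" genes.
         tss = ($6 == "+") ? $2 : ($3 - 1)
         print $1, tss, (tss + 1), $4, $5, $6
       }' "$1"
}

# Small tab-separated example in the layout of a gene annotation BED file.
bed="$(mktemp)"
printf 'chr1\t11868\t14412\tENSG00000223972.4\tDDX11L1\t+\n' >  "$bed"
printf 'chr1\t14362\t29806\tENSG00000227232.4\tWASH7P\t-\n'  >> "$bed"

tss_from_bed "$bed"
```

If the real command follows a similar pattern, wrapping each comparison in parentheses in the R-side command string should have the same effect without changing the output.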

R session info with tfTarget loaded:

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tfTarget_1.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.54.0 rtfbsdb_0.4.5
[4] bit64_0.9-7 RColorBrewer_1.1-2 GenomeInfoDb_1.18.1
[7] tools_3.5.1 backports_1.1.3 R6_2.3.0
[10] KernSmooth_2.23-15 rpart_4.1-13 sm_2.2-5.4
[13] Hmisc_4.1-1 DBI_1.0.0 lazyeval_0.2.1
[16] BiocGenerics_0.28.0 colorspace_1.3-2 nnet_7.3-12
[19] tidyselect_0.2.5 gridExtra_2.3 DESeq2_1.22.1
[22] bit_1.1-14 compiler_3.5.1 Biobase_2.42.0
[25] htmlTable_1.12 DelayedArray_0.8.0 rphast_1.6.9
[28] caTools_1.17.1.1 scales_1.0.0 checkmate_1.8.5
[31] genefilter_1.64.0 stringr_1.3.1 apcluster_1.4.7
[34] digest_0.6.18 foreign_0.8-71 XVector_0.22.0
[37] vioplot_0.3.0 base64enc_0.1-3 pkgconfig_2.0.2
[40] htmltools_0.3.6 htmlwidgets_1.3 rlang_0.3.0.1
[43] rstudioapi_0.8 RSQLite_2.1.1 bindr_0.1.1
[46] zoo_1.8-5 BiocParallel_1.16.5 bigWig_0.2-9
[49] gtools_3.8.1 acepack_1.4.1 dplyr_0.7.8
[52] RCurl_1.95-4.11 magrittr_1.5 GenomeInfoDbData_1.2.0
[55] Formula_1.2-3 Matrix_1.2-15 Rcpp_1.0.0
[58] munsell_0.5.0 S4Vectors_0.20.1 stringi_1.2.4
[61] yaml_2.2.0 rtfbs_0.3.9 SummarizedExperiment_1.12.0
[64] zlibbioc_1.28.0 gplots_3.0.1 plyr_1.8.4
[67] grid_3.5.1 blob_1.1.1 gdata_2.18.0
[70] parallel_3.5.1 crayon_1.3.4 lattice_0.20-38
[73] splines_3.5.1 annotate_1.60.0 locfit_1.5-9.1
[76] knitr_1.21 pillar_1.3.0 GenomicRanges_1.34.0
[79] geneplotter_1.60.0 stats4_3.5.1 XML_3.98-1.16
[82] glue_1.3.0 latticeExtra_0.6-28 data.table_1.11.8
[85] gtable_0.2.0 purrr_0.2.5 assertthat_0.2.0
[88] ggplot2_3.1.0 xfun_0.4 xtable_1.8-3
[91] survival_2.43-3 tibble_1.4.2 AnnotationDbi_1.44.0
[94] memoise_1.1.0 IRanges_2.16.0 bindrcpp_0.2.2
[97] cluster_2.0.7-1

@tinyi
Collaborator

tinyi commented May 22, 2019 via email

@mattgalbraith
Author

head ~/Refs/hg19/gencode.v19.annotation.bed
chr1 11868 14412 ENSG00000223972.4 DDX11L1 +
chr1 14362 29806 ENSG00000227232.4 WASH7P -
chr1 29553 31109 ENSG00000243485.2 MIR1302-11 +
chr1 34553 36081 ENSG00000237613.2 FAM138A -
chr1 52472 54936 ENSG00000268020.2 OR4G4P +
chr1 62947 63887 ENSG00000240361.1 OR4G11P +
chr1 69090 70008 ENSG00000186092.4 OR4F5 +
chr1 89294 133566 ENSG00000238009.2 RP11-34P13.7 -
chr1 89550 91105 ENSG00000239945.1 RP11-34P13.8 -
chr1 131024 134836 ENSG00000233750.3 CICP27 +
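Since the output above does not show whether the fields are separated by tabs or spaces, a quick sanity check on the annotation file may help rule it out as the source of the row mismatch. This is a sketch only; `check_bed6` is a hypothetical helper, not part of tfTarget, and it assumes the pipeline expects exactly six tab-separated fields with the strand in column 6:

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of tfTarget): report every line that does not
# have exactly six tab-separated fields or whose strand is not "+" or "-".
# Exits with status 1 if any such line is found, 0 otherwise.
check_bed6() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
       (NF != 6) || (($6 != "+") && ($6 != "-")) {
         printf "line %d: %d fields, strand \"%s\"\n", NR, NF, $6
         bad++
       }
       END { if (bad > 0) exit 1 }' "$1"
}
```

For example, `check_bed6 ~/Refs/hg19/gencode.v19.annotation.bed && echo "BED looks OK"` prints the offending line numbers if any record is malformed and the confirmation message otherwise.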

I was unable to install all the R dependencies on our Linux system, hence the Mac.

@mattgalbraith
Author

I have now managed to get tfTarget and all its dependencies running on Linux and no longer get the awk error. However, I am now getting a new error:

[1] "associating TFs to TREs and genes"
Error in names(x) <- value :
'names' attribute [27] must be the same length as the vector [16]
Calls: mapTF -> colnames<-
Execution halted

From looking into the mapTF function, it appears that
TF.TRE.gene.tab.short <- TF.TRE.gene.tab[, -c(1, 13:15)]
is generating a data frame with only 16 columns rather than the 27 suggested by
header.vec <- c("tre.chrom", "tre.chromStart", "tre.chromEnd", "tf.chrom", "tf.chromStart", "tf.chromEnd", "score", "strand", "motif.name", "motif.id", "motif.idx", "TRE.baseMean", "TRE.log2FoldChange", "TRE.pvalue", "TRE.padj", "gene.TSS.chr", "gene.TSS.start", "gene.TSS.end", "transcript.id", "gene.name", "gene.strand", "gene.baseMean", "gene.log2FoldChange", "gene.pvalue", "gene.padj", "distance")

if (!is.null(closest.N)) header.vec <- c(header.vec, "closest.N")

colnames(TF.TRE.gene.tab.short) <- header.vec

I will try running the R commands manually to see if I can track this down any further...

@mattgalbraith
Author

For reference:
The last error was caused by an empty TF.TRE.gene.tab object, a result of the stringent settings used.

@tinyi
Collaborator

tinyi commented May 28, 2019 via email

@CholponZ

> I have now managed to get tfTarget and all dependencies running on linux and no longer get the awk error. However, I am now getting a new error:
>
> [1] "associating TFs to TREs and genes"
> Error in names(x) <- value :
> 'names' attribute [27] must be the same length as the vector [16]
> Calls: mapTF -> colnames<-
> Execution halted
>
> From looking into the mapTF function, it appears that
> TF.TRE.gene.tab.short <- TF.TRE.gene.tab[, -c(1, 13:15)]
> is generating a data frame with only 16 columns rather than the 27 suggested by
> header.vec <- c("tre.chrom", "tre.chromStart", "tre.chromEnd", "tf.chrom", "tf.chromStart", "tf.chromEnd", "score", "strand", "motif.name", "motif.id", "motif.idx", "TRE.baseMean", "TRE.log2FoldChange", "TRE.pvalue", "TRE.padj", "gene.TSS.chr", "gene.TSS.start", "gene.TSS.end", "transcript.id", "gene.name", "gene.strand", "gene.baseMean", "gene.log2FoldChange", "gene.pvalue", "gene.padj", "distance")
>
> if (!is.null(closest.N)) header.vec <- c(header.vec, "closest.N")
>
> colnames(TF.TRE.gene.tab.short) <- header.vec
>
> I will try running the R commands manually to see if I can track this down any further...

I am having the same issue. Did running the commands manually resolve it?

Best regards

3 participants