hash generates different results on identical objects (even with same memory address) #1681

Open
dipterix opened this issue Jan 16, 2024 · 11 comments


@dipterix

I thought hash() was supposed to generate the same result for identical objects. Could you help me with the following cases?

options(keep.source = TRUE)
a <-   function(){}
rlang::hash(a)
#> [1] "eca7e650f5be54ba7c122fbc88ed0811"
a <- function(){}
rlang::hash(a)
#> [1] "7bcbcbd5583248d94607c79bab4b70f0"
a()
#> NULL
rlang::hash(a)
#> [1] "a70206d89b1b3cc96363d3413aea8ed6"
a()
#> NULL
rlang::hash(a)
#> [1] "c3b887fbb758842048691f04406c130c"

Created on 2024-01-16 with reprex v2.1.0

Also

memF <- memoise::memoise(function(f){ f() })

a <- function(){
  message("a is evaluated")
}

memF(a)
#> a is evaluated
memF(a)
#> a is evaluated
memF(a)
#> a is evaluated
memF(a)

Created on 2024-01-16 with reprex v2.1.0

@lionel-
Member

lionel- commented Jan 16, 2024

Try rlang::zap_srcref() on the function.

@dipterix
Author

dipterix commented Jan 16, 2024

zap_srcref removes the source reference. I turned keep.source off and the result is still weird.

options(keep.source = FALSE)
a <-   function(){}
rlang::hash(a)
#> [1] "f53c7a98354786f99f3bdd8a1f655827"
a <- function(){}
rlang::hash(a)
#> [1] "57f55bbc9d540a599eb94edb2b4422d1"
a()
#> NULL
rlang::hash(a)
#> [1] "b5c51b4d7ecdc52b7ad405be5e10d998"
a()
#> NULL
rlang::hash(a)
#> [1] "4489adf065cb6c71e077228d56c46424"

Created on 2024-01-16 with reprex v2.1.0

@lionel-
Member

lionel- commented Jan 16, 2024

R code is evaluated only after it has been parsed, and source refs are attached by the parser, so setting the option inside the same source has no effect on code that was already parsed.

Call zap_srcref() before hash() if you need a stable hash.
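
A minimal base-R sketch of why srcrefs matter here (not rlang's implementation): the two functions below differ only in leading whitespace, mirroring the first reprex. The strings passed to parse() are illustrative, and utils::removeSource() stands in for rlang::zap_srcref() as an assumed base-R equivalent.

```r
# Parse two functions that differ only in leading whitespace,
# asking the parser to keep source references.
f1 <- eval(parse(text = "  function(){}", keep.source = TRUE))
f2 <- eval(parse(text = "function(){}", keep.source = TRUE))

# The parser attached a srcref to each closure.
has_srcref <- !is.null(attr(f1, "srcref")) && !is.null(attr(f2, "srcref"))

# identical() ignores srcrefs by default; compare strictly instead.
same_with_srcref <- identical(f1, f2, ignore.srcref = FALSE)

# After stripping the source references, the strict comparison is repeated.
same_without_srcref <- identical(
  utils::removeSource(f1),
  utils::removeSource(f2),
  ignore.srcref = FALSE
)

print(c(has_srcref, same_with_srcref, same_without_srcref))
```

Anything that looks at the whole object, attributes included, sees the srcref, which is why stripping it first is needed for a stable result.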

@dipterix
Author

options(keep.source = FALSE)
a <- rlang::zap_srcref(function(){})
rlang::hash(a)
#> [1] "b7e8ef5f48c3aa74a30b6ca39fbac850"
a()
#> NULL
rlang::hash(a)
#> [1] "f5852273190358bbe0b6e8328b37a4d6"
a()
#> NULL
rlang::hash(a)
#> [1] "c40ea5774ecd8ba9dccf890d710f0b46"

Created on 2024-01-16 with reprex v2.1.0

@lionel-
Member

lionel- commented Jan 16, 2024

oh that's the bytecode I bet

@lionel-
Member

lionel- commented Jan 16, 2024

From the JIT

@dipterix
Copy link
Author

Gotcha, then how can I get rid of it and produce stable results?

@lionel-
Member

lionel- commented Jan 16, 2024

You could do something like this:

my_hash <- function(x) {
  if (is.function(x)) {
    # Attach a marker to disambiguate from an actual list
    x <- c("my_unique_function_marker", as.list(x))
  }
  rlang::hash(x)
}

On our side we should consider ignoring bytecode when computing the hash (and possibly the srcrefs).
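
A rough base-R illustration of why going through as.list() sidesteps the bytecode, assuming rlang::hash() is computed from R's serialized representation of the object. serialize() is used here only as a proxy for the hash input, and f and cf are hypothetical names.

```r
library(compiler)  # ships with base R

f  <- function(x, y = 2) x + y
cf <- cmpfun(f)    # same code, but the body is now stored as bytecode

# Serializing the closures directly picks up the bytecode difference.
raw_differs <- !identical(serialize(f, NULL), serialize(cf, NULL))

# as.list() returns the formals plus the body expression, so the
# compiled and uncompiled versions are compared on their code alone.
list_matches <- identical(
  serialize(as.list(f), NULL),
  serialize(as.list(cf), NULL)
)

print(c(raw_differs, list_matches))
```

One trade-off of this approach: as.list() drops the closure environment, so two functions with the same code but different enclosing environments would get the same key.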

@dipterix
Author

dipterix commented Jan 17, 2024

I see at https://github.com/wch/r-source/blob/67c905672a7f4dd00d12d9a0f1763bc46b985bb5/src/main/serialize.c#L1023C5-L1045C6 that if

    if (R_compile_pkgs && TYPEOF(s) == CLOSXP && TYPEOF(BODY(s)) != BCODESXP &&
        !R_disable_bytecode &&
        (!IS_S4_OBJECT(s) || (!inherits(s, "refMethodDef") &&
                              !inherits(s, "defaultBindingFunction")))) {

        /* Do not compile reference class methods in their generators, because
           the byte-code is dropped as soon as the method is installed into a
           new environment. This is a performance optimization but it also
           prevents byte-compiler warnings about no visible binding for super
           assignment to a class field.

           Do not compile default binding functions, because the byte-code is
           dropped as fields are set in constructors (just an optimization).
        */

        SEXP new_s;
        R_compile_pkgs = FALSE;
        PROTECT(new_s = R_cmpfun1(s));
        WriteItem (new_s, ref_table, stream);
        UNPROTECT(1);
        R_compile_pkgs = TRUE;
        return;
    }

then serialize() compiles the function while serializing it, provided package compilation is enabled, bytecode is not disabled, and BODY(s) is not already a BCODESXP. Maybe we should consider temporarily enabling

Sys.setenv("_R_COMPILE_PKGS_" = "1")
Sys.setenv("R_DISABLE_BYTECODE" = "0")

in rlang::hash? This should resolve most JIT cases and trim the source reference.

@dipterix
Author

dipterix commented Jan 17, 2024

I tried Sys.setenv("_R_COMPILE_PKGS_" = "1"), and it seemed to work only in reprex, not interactively. However, if I compile the function explicitly, the hashes are the same. I must be missing something...

options(keep.source = TRUE)
Sys.setenv("_R_COMPILE_PKGS_" = "1")
a <- compiler:::tryCmpfun(  function() {})
rlang::hash(a)
#> [1] "bcbe14ef3a47a590767f94f36c0765d8"
a()
#> NULL
rlang::hash(a)
#> [1] "bcbe14ef3a47a590767f94f36c0765d8"
a()
#> NULL
rlang::hash(a)
#> [1] "bcbe14ef3a47a590767f94f36c0765d8"
a()
#> NULL
rlang::hash(a)
#> [1] "bcbe14ef3a47a590767f94f36c0765d8"
a()
#> NULL
rlang::hash(a)
#> [1] "bcbe14ef3a47a590767f94f36c0765d8"

Created on 2024-01-16 with reprex v2.1.0
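
The explicit-compilation observation can also be sketched in base R alone. This uses the exported compiler::cmpfun() instead of the internal tryCmpfun(), and again treats serialize() as a stand-in for whatever rlang::hash() consumes, which is an assumption.

```r
library(compiler)  # ships with base R

# Compile up front so the body is already bytecode before any call.
a <- cmpfun(function() NULL)

before <- serialize(a, NULL)
for (i in 1:5) a()          # repeated calls, as in the reprex above
after <- serialize(a, NULL)

# If the function is already compiled, calling it should leave its
# serialized form unchanged.
stable <- identical(before, after)
print(stable)
```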

@lionel-
Member

lionel- commented Jan 17, 2024

Disabling JIT will not help here because JIT kicks in when a function is called and hash() doesn't call the function.

2 participants