Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: memoization #62637

Closed
quantumsheep opened this issue Sep 14, 2023 · 19 comments
Closed

proposal: spec: memoization #62637

quantumsheep opened this issue Sep 14, 2023 · 19 comments
Labels
LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Milestone

Comments

@quantumsheep
Copy link

quantumsheep commented Sep 14, 2023

Author background

Would you consider yourself a novice, intermediate, or experienced Go programmer?
Experienced (working professionally with Go).

What other languages do you have experience with?
Rust, TypeScript, Python, C, C++, C#.

Related proposals

Has this idea, or one like it, been proposed before?
It does not seem so, at least I haven't found one similar case searching the current issues.

Does this affect error handling?
No.

Is this about generics?
No.

Proposal

What is the proposed change?
A build-in memoization system.

Who does this proposal help, and why?
This feature would help developers relying on manual memoization drastically reduce code boilerplate and complexity. Especially when dealing with a lots of functions that requires memoization.

Please describe as precisely as possible the change to the language.
Having a cache system to optimize compute-heavy functions is not an uncommon practice. Here's an example:

var heavyComputationCache = map[string]int{}

func heavyComputation(input string) int {
	if cached, ok := heavyComputationCache[input]; ok {
		return cached
	}

        result := // code that takes a long time to process

	heavyComputationCache[input] = result
	return result
}

While this code is fine, maintaining multiple functions with similar caching logic, especially when dealing with functions that have multiple or variadic arguments, can become difficult.

I propose to introduce a built-in memo keyword prefixing a function definition.

memo func heavyComputation(input string) int {
        result := // code that takes a long time to process
	return result
}

We would need to think about how to handle complex data structures and pointers (should we compare the pointers or the content of the pointers ?).

This concept can be thought of as a syntactic sugar feature, which may not align perfectly with Go's typical design principles. However, I'm still sharing the idea anyway to see what the community thinks.

What would change in the language spec?
The addition of a new keyword along with function "options" will be required.

Please also describe the change informally, as in a class teaching Go.
Sometimes in Go we want to speed up a function that takes a lot of time to process but with the same inputs. A good alternative is memoization where you cache the output of a function for the given inputs. When the function is called another time with the same inputs, the cached output is fetched and returned directly without processing the instructions again.

This proposal would remove any boilerplate needed to perform this task in favor of a single keyword.

Is this change backward compatible?
Yes, it would not change how previous Go code worked.

Orthogonality: how does this change interact or overlap with existing features?

Is the goal of this change a performance improvement?
Memoization is optimization so yes.

  • If so, what quantifiable improvement should we expect?

    • We should expect a reduction of almost 100% of a function call time.
  • How would we measure it?

    • We can compare the time it takes for the same function with and without memoization.

Costs

Would this change make Go easier or harder to learn, and why?
This proposal requires developers to know about memoization and it's a new keyword so I suppose it would make it harder to learn Go entirely.

What is the cost of this proposal? (Every language change has a cost).

How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
I'm not familiar on how tools like these handles Go code but current parsers will certainly fail if this feature is added.

What is the compile time cost?
Should be negligible but I'm not sure as I haven't dig into the Go compiler.

What is the run time cost?
It will certainly add a little overhead to each functions that have this option enabled as it requires to check the presence of a value in a data structure.

Can you describe a possible implementation?
I never got my hands into the Go compiler so I'm not sure. However I think we would need a map for each function, each map having the function arguments as key and output as value.

We could have a TTL by cleaning the old cache with the GC so it does not impact the main process' performances.

@gopherbot gopherbot added this to the Proposal milestone Sep 14, 2023
@quantumsheep quantumsheep changed the title proposal: spec: functions memoization proposal: spec: memoization Sep 14, 2023
@seankhliao
Copy link
Member

Please fill out https://github.com/golang/proposal/blob/master/go2-language-changes.md when proposing language changes

@quantumsheep
Copy link
Author

Please fill out https://github.com/golang/proposal/blob/master/go2-language-changes.md when proposing language changes

Thanks, I wasn't sure if it was needed for this proposal or not. It's now updated.

@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Sep 14, 2023
@ianlancetaylor
Copy link
Member

It seems to me that we could write a general purpose memoization package instead, and a function could use that. I don't see a reason that this should be built into the language.

@seankhliao
Copy link
Member

would intern #62483 be such a package?

@qiulaidongfeng
Copy link
Member

would intern #62483 be such a package?

no.
The purpose of the #62483 proposal is to retain only one copy of duplicate data and to quickly compare data. This is based on the function input cache output.

@qiulaidongfeng
Copy link
Member

It seems to me that we could write a general purpose memoization package instead, and a function could use that. I don't see a reason that this should be built into the language.

I don't quite understand. Can you write out all the APIs of the package to help understand?

@psnilesh
Copy link

While this code is fine, maintaining multiple functions with similar caching logic, especially when dealing with functions that have multiple or variadic arguments, can become difficult.

Is it possible to use go:generate to generate a memorized version of the function that can handle all these scenarios at compile time ?

@atdiar
Copy link

atdiar commented Sep 15, 2023

@psnilesh as I understand it I'd think no because some functions are probably only executed at runtime.

Wrt the proposal I wonder if it doesn't require to tie into the runtime.

If done at a library level, it's basically a value cache that function writers could import and would retrieve value from if it exists. Probably the better and easier way to do this. I'd join @ianlancetaylor on that.

Just a question in passing, although I don't think that these are the kinds of language changes that are usually considered for go, but
memo func would probably be func memo to keep changes to the parser localized?

@apparentlymart
Copy link

apparentlymart commented Sep 15, 2023

I agree with Ian that this feels more appropriate to be a library feature than a language feature. Having memoization built into a language is not a totally unreasonable idea, but this level of abstraction feels out of step with the rest of Go, and suggests a built-in feature for a need that is -- based on my experience, at least -- a relatively specialized one that isn't important for most programs. (Of course that's just a hunch; I don't have data to support it.)

When I think about API for this I think of something like sync.Once but with support for multiple different responses that are each identified by a comparable key. For example:

// ManyOnce is a terrible name. I'm more focused on the API than the
// name for the moment.
type ManyOnce[ReqKey comparable, Result any] struct {
    // ...
}

func (mo *ManyOnce[ReqK, R]) Do(k RK, f func () R) R {
    // ...
}

The Do method would check if the ManyOnce object already has a result matching k, and return it immediately if so. Otherwise it would call f, save its return value as the result for k, and return that value. Do should be concurrency-safe so that multiple callers can potentially ask for a particular k at the same time without causing a data race. The zero value of ManyOnce should be ready to use. I'm intentionally not specifying the implementation here but the intuition is that this is an unexported map[ReqKey]Result that is allocated automatically at the first call to Do.

The same caveats apply as for sync.Once: the closure passed to Do should not capture anything that might vary between different dynamic callers, but it can capture values that are represented in k.

So from a callsite it might appear like this (based on the motivating example in the proposal):

var results ManyOnce[string, int]

func heavyComputation(input string) int {
    return results.Do(input, func () int {
        return // code that takes a long time to process
    })
}

From the perspective of heavyComputation's caller, the memoization is an invisible implementation detail. Inside heavyComputation, the memoization adds only a little extra boilerplate.

Does that seem like it would meet the original stated need, or have I missed something?

@atdiar
Copy link

atdiar commented Sep 15, 2023

@apparentlymart now that you mention it, isn't it possible to simply use sync.Once in a closure? (edit: answering to self, it's not)

As long as the function has been called once, a boolean var and the variables storing the results would be set:

If the bool is false, then call once.Do(f) and return the computed values, else return the computed values.

Does it solve the problem?

Edit: since memoization is per function and functions are not comparable, one should probably use a function pointer as key in the global result map.

Naively:
https://go.dev/play/p/y9geuoK3KSL

But now making that automatic and generic is probably another ballgame.

@quantumsheep
Copy link
Author

quantumsheep commented Sep 15, 2023

@atdiar unfortunately, while Once will memoize the function after it's called at least once but won't manage input changes.

In general I see that it's a controversial proposal, but at least I shared my thoughts.

It's certainly feasible to achieve a form of memoization using a library. I've personally implemented it here. Nevertheless, it's worth noting that Go still lacks certain essential features for this solution to be considered flawless, such as variadic generics for function arguments.

@apparentlymart
Copy link

It is true that the API I proposed would suffer from the typical problem that it isn't possible to express multiple return values from a function as a single type. But I would suggest then that it would be better to fix that wart in the language design, which would be an orthogonal change that would solve for many variations of this problem, than to introduce a "memoization" feature to work around that design problem in only one situation.

(There are already various existing proposals open for variations on how we might deal with that schism, so I don't think we need to discuss the details of that here, only to consider what we might do if any one were approved and implemented.)

@atdiar
Copy link

atdiar commented Sep 15, 2023

@quantumsheep yes, it is not easy. The problems I see are:

  • how to make it generic
  • how to get a function id since they are not comparable (needs assistance from the runtime, knowing that using the function address works as long as the GC is not moving, so should be ok... As long as we know... For now)
  • how to make sure that the function is pure

I think you're right that it will be difficult to do that in a library without answers to at least the above.

@quantumsheep
Copy link
Author

@atdiar

For the function id I use runtime.FuncForPC(reflect.ValueOf(function).Pointer()).Name(), it works fine.

Knowing that functions are pure seems impossible with today's specs and I don't this a feature like this will be implemented. Every functions implemented by now would need to be marked pure. However the compiler could infer pure functions under certain conditions, it's clearly feasible but I'm not sure about the cost/benefit ratio.

We could have variadic generic arguments:

func example[T any, Args ...any](f func(args ...Args) T) func(args ...Args) T {
	return func(args ...Args) T {
		return f(args...)
	}
}

What do you think ? There might already be an existing proposal for this thought.

@atdiar
Copy link

atdiar commented Sep 19, 2023

Yes you could use that or use fmt.Sprint(f) where f is the function.
The alternative is using function pointers which is probably more future-proof but isn't really approximating global function identity/serialization (more likely scoped).

Wrt purity, unfortunately, I'd tend to think that it is necessary. Memoizing a function that can have side-effects seems a bit too dangerous.
But it's probably not simple to detect it. Every call within a pure function body should be of a pure function/method wrt the external scope. Function arguments shouldn't be mutated etc... Tends to be recursive, interprocedural really fast.
As you remark, not sure of the cost/benefit ratio of implementing this.

Wrt variadic generic arguments, I admit I am not too sure I understand exactly variadic type parameters.
There is still no way to write multiple function signatures even if type parameters are variadic.

I think the real feature would be function constraints that are arity-independent. And even then, the theory deems it impossible as it would be equivalent to having types of infinite size.

So to conclude, I'm afraid it will be impossible.

[edit] it might be possible in fact to envision variadic type parameters as a way to construct an object that is similar to a struct type and whose components are each of given type, but it will also require to have the equivalence between function argument lists and this type. Essentially some kind of tuple type. This is involved but probably necessary to have that kind of equivalence because function returns have to be assignable to variables and it's not possible/desirable to have variadic variable declarations as well.
Since every type in Go is of finite size, there wouldn't be any problem.

But that's just merely an idea.

Another question: for a function to be memoizable, shouldn't the type of each function argument comparable?

@ianlancetaylor
Copy link
Member

Based on the discussion above, and the emoji voting, this is a likely decline. Leaving open for three weeks for final comments.

@qiulaidongfeng
Copy link
Member

I am inclined to support the proposal.

First, if you write a general memorization package, suppose a function has three arguments, two return values, and five different value types, the implementation method I think of is to use reflection. But the reflection performance is not high and the readability is poor. Or the memo from the author of the proposal, which I tested and found that the memorized function (which is just a recursive solution to the Fipolacci sequence) has 3 allocs/op, and the original function is 0 allocs/op. If it is a more complex function, I am not sure that there will not be more malloc.

Second, I think using this proposal for a function that has side effects
Users should understand that side effects will only occur when the function is called for the first time, subsequent calls will return the cached return value, and side effects will not occur, so whether you remember a function with side effects does not affect whether the proposal is accepted.

Third, I think the question of whether a memorized function can be called from multiple Goroutines at the same time is more open to discussion.

@griesemer
Copy link
Contributor

No substantial change in consensus.

@griesemer griesemer closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2023
@emad-elsaid
Copy link

emad-elsaid commented May 23, 2024

As the @apparentlymart pointed out. it's possible to do a generic solution. but the function interface still has some constraints like the number of arguments.
As for making sure the function is pure. this is something the function writer should take care of, after all memoization can't memoize the whole environment to make it part of the input that's being memoized for.
So any solution can't check if the function is pure. it's up to the user of the solution.

For me I have wrote my own solution to memoize functions of type func[K,V any](K) V and func[K,V any](K) (V,error) which is the most common interfaces I use.
I have the package here: https://github.com/emad-elsaid/memoize
and I have been testing it on some side projects that's been CPU heavy and it shows good results for my usecase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

10 participants