
WTP Space, GMNL, and Starting Values #22

Open
chevrotin opened this issue Dec 19, 2023 · 23 comments
@chevrotin

Thanks for the awesome package, Cristian.

For the implementation of WTP space models, having reasonable starting values may be critical. My attempt to estimate my own data resulted in a convergence issue: "Nonpositive definiteness in Cholesky factorization in formt; refresh the lbfgs memory and restart the iteration." The log-likelihood is poor, indicating that the algorithm got stuck somewhere.

My intuition is that with proper starting values, the issues of local maxima or flat regions can be tackled. Is it possible to provide starting values to the procedure?

Further, Greene and Hensher (2010; Transportation Research Part D) and Fiebig et al. (2010; Marketing Science) have shown that WTP space can be implemented with GMNL. The advantage is that the price parameter follows a log-normal distribution. I wonder how the present setup of xlogit compares to or allows for this specification.

Again, thanks for the awesome package. xlogit is unrivalled in speed, and it would be great to complete it with more useful features. This is a lot to ask, but I would help if I could overcome the learning curve of programming it.

@arteagac
Owner

Hi @chevrotin, you can pass initial values to xlogit using the init_coeff parameter of the fit method (see example below). You just need to make sure to pass your initial values following the order in which xlogit creates the coefficients. The order of the coefficients is varnames + sd of varnames + scale_factor. An easy way to figure out the order is to run a test estimation and follow the order of the coefficients in the summary (this test estimation does not need to be successful, as the goal is simply to figure out the order of the coefficients).

from xlogit import MixedLogit
import numpy as np

varnames = ["a", "b", "c"]
ml = MixedLogit()
ml.fit(X=df[varnames],
       y=df["choice"],
       varnames=varnames,
       ids=df["chid"],
       randvars={"a": "n", "b": "n", "c": "n"},
       ...,
       init_coeff=np.array([.1, .1, .1, .1, .1, .1, .1])  # Order -> [a, b, c, sd.a, sd.b, sd.c, scale_factor]
       )

Regarding your other question, the current version of xlogit does not allow the scale_factor to follow a distribution. I know this would be a valuable addition to xlogit, but I have not had the chance to implement this yet.

@chevrotin
Author

Thanks, Cristian. The code works. I managed to estimate a WTP-space model with my data. Providing starting values derived from a preference-space mixed logit model seems to do the trick.
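For concreteness, here is a minimal sketch of how starting values could be assembled from preference-space estimates (the numbers and the division by the negative price coefficient are my own illustrative assumptions, not output from an actual estimation):

```python
import numpy as np

# Hypothetical preference-space estimates (made-up numbers for illustration).
pref_means = np.array([0.8, -0.3, 1.2])   # means for a, b, c
pref_sds = np.array([0.4, 0.2, 0.6])      # their standard deviations
price_coef = -2.0                         # fixed price coefficient

# Rescale into WTP space: divide by the negative of the price coefficient
# so the coefficients are in money units, and use -price_coef as the
# starting value for the scale factor.
scale = -price_coef
init_coeff = np.concatenate([pref_means / scale, pref_sds / scale, [scale]])
# Order follows xlogit: [a, b, c, sd.a, sd.b, sd.c, scale_factor]
```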

A few things to note: the model works with a lower number of draws (4,000); however, it runs into a memory problem (similar to our discussion in logitr) when I specify 10,000. I have 8 random variables. I imagine it will become much more constraining when you allow correlated parameters in future versions.

I wonder if CuPy allows for some degree of memory management, where it can pass some of the memory burden to the hard drive. Essentially, you have a rocket here: the estimation took less than 15 minutes. We have plenty of room to dial the speed down to allow for a larger model.

error message: cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 2,691,840,000 bytes (allocated so far: 32,640,581,120 bytes).

@arteagac
Owner

arteagac commented Dec 21, 2023

I think the batch processing functionality in xlogit can fix your GPU OutOfMemoryError. Simply pass the batch_size=1000 parameter to the fit method to tell xlogit to process 1,000 random draws at a time. This avoids overflowing the GPU memory, as xlogit processes one batch at a time, computes the log-likelihoods, and averages them at the end, which does not affect the final estimates or log-likelihood. Please let me know if this works for you. You can also increase the batch_size depending on your GPU memory size, but the maximum for you would be around 4,000, which was when you ran out of GPU RAM. The batch processing functionality allows estimations with a very large number of random draws; this is how I estimated a model with half a million random draws using a 6GB GPU and 16GB of RAM. Section 5.4 in xlogit's paper shows how this functionality works, along with stats on RAM and GPU memory use.
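To see why combining batch results leaves the estimates unchanged, here is an illustrative numpy check (not xlogit's internals): averaging simulated probabilities over all draws at once equals averaging equal-sized batches and then combining the batch averages.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, batch_size = 4000, 1000
probs = rng.uniform(0.1, 0.9, size=n_draws)  # simulated per-draw probabilities

# Full average over all draws at once.
full_average = probs.mean()

# Batched average: process batch_size draws at a time, then combine.
batch_averages = [probs[i:i + batch_size].mean()
                  for i in range(0, n_draws, batch_size)]
batched_average = np.mean(batch_averages)

assert np.isclose(full_average, batched_average)  # identical result
```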

Regarding offloading to disk, this is feasible, but it requires a separate implementation and would be quite slow. However, the batch processing functionality should solve most of the issues, and the only limitation would be the computer's RAM. Note that 16GB of RAM was enough for half a million draws, so this should not be that big of a constraint.

@chevrotin
Author

Ah, thanks. Glad that you have already solved that; I wasn't reading carefully enough. I will try the batch functionality.

@arteagac
Owner

Sure, please let me know how it works. Unfortunately, xlogit's documentation does not properly elaborate on batch processing or WTP space models. I definitely need to spend some time expanding the documentation to better highlight some of the nice features implemented in xlogit.

@chevrotin
Author

If you need a helping hand for documentation, I'll be glad to help.

@arteagac
Owner

That would actually be fantastic, and I would love your help. If you can create some examples for WTP space models in a Google Colab notebook, I can directly integrate the notebook into the documentation. It would be great if one example includes passing initial values to the estimation using the init_coeff parameter, and another shows the use of the batch_size functionality to handle a large number of random draws. You could use the Yogurt dataset or any other open-access dataset of your preference. For the examples, I try to use a tutorial-like style, as illustrated in the examples below:

@chevrotin
Author

Alright, will do.

@arteagac
Owner

Thanks a lot @chevrotin!

@chevrotin
Author

https://colab.research.google.com/drive/1bjuh4-oAcsz5m9rN4uYnmQtqYPeC-Vwf?usp=sharing

@arteagac Please see the examples of both starting values and using batch_size to estimate with ndraws = 50000.

Note that the results differ a bit from logitr's. I think this might be because of the multiple starting values that John has implemented. The log-likelihood suggests that the solution might be a local maximum.

@arteagac
Owner

Hi @chevrotin,
I am glad that the batch processing functionality worked for you and that you managed to estimate a model with 50k random draws. Thank you so much for creating the examples. I will integrate them into the documentation sometime this weekend. Yes, the results differ a bit from logitr's, and logitr has a slightly better log-likelihood. I will try replicating @jhelvy's multi-start approach to see if this makes xlogit's log-likelihood closer to logitr's.

Dear @jhelvy, can you please give me some insight into how your multi-start approach samples the initial values? What upper and lower bounds do you use for the random sampling of initial values? I am trying to figure out why logitr yields a slightly better final log-likelihood despite using a lower number of draws. Any ideas why this might be happening? Is it perhaps the multi-start? Do you maybe use Sobol draws by default? I don't think the cause is the optimization routine, because we both use Limited-Memory BFGS and similar tolerance values (>= 1e-6).

@jhelvy

jhelvy commented Dec 22, 2023

Cool thread here folks. A couple things:

  1. You don't need GMNL to allow the scale (price) parameter to follow a distribution. You can just specify a distribution for that parameter. In {logitr} this is handled with the randScale parameter, which can take normal (n), log-normal (ln), or censored normal (cn). That said, I am actually going to restrict this in the future to not allow normal, since sometimes you can get a negative draw in the simulation, and that basically breaks everything (the scale term must be strictly positive). This came up in another {logitr} issue and was the culprit, so it really shouldn't be allowed to be normal. Also, while theoretically log-normal should be okay (it enforces positivity), in practice those models really struggle to converge. Because log-normals have long tails, the simulations often get a few large numbers in the draws, and when you scale the whole utility model by some large number, it bounces all over the place and diverges. WTP space models in general are rather sensitive to the scale parameter, so I often end up just choosing a fixed scale parameter. Censored normal is a good compromise in that it forces positivity while not producing too many large-value draws.
  2. For setting up the starting values, you can see my heuristics here. I've added a handful of hard-coded rules, depending on the context, to make sure I'm using reasonable starting values. The default is to draw random values between (-1, 1). I allow the user to pass in different boundaries as well. But after I draw those values, I run through these checks to override some of them. For example, if a negative value was drawn for the scale parameter, I overwrite it to be positive; otherwise the model won't converge. These are purely heuristics that I've arrived at through trial and error.

This is also why I went through the trouble of building in a multi-start loop. I had enough convergence issues with WTP space models that I didn't want users to have to do this much work - just let it iterate more to find a better solution. But this is also true for mixed logit in the preference space. Mixed logit models have non-convex log-likelihood functions, so really we should be running them from different starting points to be a little more confident in the solutions.
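The starting-value heuristic described above can be sketched as follows (a minimal illustration in the spirit of those rules; the function name, bounds, and sign flip are my own assumptions, not logitr's actual code):

```python
import numpy as np

def draw_start_values(n_params, scale_index, lower=-1.0, upper=1.0, seed=None):
    """Draw random starting values in (lower, upper) and force the scale
    parameter to be positive. A sketch of the heuristic described above;
    this is not logitr's implementation."""
    rng = np.random.default_rng(seed)
    start = rng.uniform(lower, upper, size=n_params)
    start[scale_index] = abs(start[scale_index])  # scale must be > 0
    return start

# One candidate starting vector per multi-start run.
starts = [draw_start_values(7, scale_index=-1, seed=s) for s in range(10)]
```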

@arteagac
Owner

Thanks a lot for your detailed explanation, @jhelvy. I will check the heuristics included in your multi-start loop. Given the importance of the multi-start functionality, I might include it in xlogit.

@chevrotin
Author

Glad to hear from you, John. Here are some of my thoughts:

@jhelvy It's true that GMNL is not necessary for WTP space; plain mixed logit can do it. In my own implementations, however, GMNL tends to converge better. To avoid the long tail, you can use draws from a truncated normal (± 2) as the log-normal price parameter's underlying normal distribution.
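The truncation idea above can be sketched as follows (a minimal numpy illustration; mu, sigma, and the simple rejection sampling are my own assumptions, not any particular package's implementation):

```python
import numpy as np

# Draw the log-normal price parameter's underlying normal from a
# +/- 2 truncated normal via simple rejection sampling, then
# exponentiate. mu and sigma are made-up values for illustration.
mu, sigma = -1.0, 0.5
rng = np.random.default_rng(42)
z = rng.standard_normal(25_000)
z = z[np.abs(z) <= 2][:10_000]  # keep only draws within +/- 2 sd
price_draws = np.exp(mu + sigma * z)

# No extreme right tail: every draw stays below exp(mu + 2 * sigma).
assert price_draws.max() <= np.exp(mu + 2 * sigma)
```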

@arteagac: I'm kind of curious about xlogit's implementation of WTP space. I take it that the scale parameter is assumed fixed and therefore does not have a standard deviation associated with it. Or does it? Also, does it automatically take the negative of price (when scale_factor = price)?

Merry Christmas to both of you, Cristian and John.

@arteagac
Owner

arteagac commented Dec 23, 2023

Hi @chevrotin, you are correct: in xlogit the scale parameter is fixed and does not have an associated standard deviation. Yes, xlogit uses the negative of the scale parameter, as it follows the WTP space formulation shown below (see $p_{nj}$), which was kindly provided by @jhelvy. Merry Christmas to you too!

[image: WTP space utility formulation]

@chevrotin
Author

chevrotin commented Dec 24, 2023 via email

@jhelvy

jhelvy commented Dec 24, 2023

@chevrotin I don't have a way to allow bounds on the underlying normal for a log-normal parameter. I just use cn for the scale parameter in a WTP space model and it works reasonably well.

But I'm curious how you've been able to implement GMNL for WTP space models. Do you use custom code? I've struggled to get it to converge for a WTP space setup using alternatives like {gmnl} and {apollo} (see, for example, this article).

WTP space models are just weird, and in my experience there haven't been many developers who have made them easy to estimate. Other than {logitr}, there's a routine in Stata that seems pretty robust, and of course there's xlogit. I guess you could go Bayesian with Stan too and set this up, but at that point you're doing a lot of coding yourself and you really, really need to know what you're doing. I wanted WTP space models to be easily accessible to researchers who aren't programming experts, so I've tried to make the default settings in {logitr} work well for most.

@chevrotin
Author

@jhelvy I use the GMNL estimator with Stata. It usually takes days to estimate.

You are right, there aren't many routines that are fast and complete.

@jhelvy

jhelvy commented Dec 30, 2023

Days!? 😲

Wow... I have no idea why. For licensed software, you'd think they'd spend some time doing a little performance engineering. I don't have gmnl on the list to support in logitr, though; there are just too many other things to add right now that are more pressing, like maybe support for xlogit.

@chevrotin
Author

chevrotin commented Dec 30, 2023 via email

@arteagac
Owner

Dear @chevrotin,
I finally had the chance to integrate the example you kindly created into the docs (see link below):
https://xlogit.readthedocs.io/en/latest/notebooks/wtp_space.html

Thank you again for the time you dedicated to develop this example.

Regarding the slight difference with logitr's final log-likelihood, I don't think it is caused by the fixed scale parameter. I think logitr also uses a fixed scale parameter by default, which you can change to random using the randScale option; I will confirm this with @jhelvy. Also, I am still working on figuring out whether the multi-start approach helps xlogit get a log-likelihood closer to logitr's.

Dear @jhelvy,
Could you please kindly confirm the following two questions about logitr?

  1. Does logitr use by default a fixed scale parameter for models in WTP space?
  2. How does the multi-start approach sample values when you pass pre-computed initial coefficients? For instance, if the initial coefficients are based on the estimates of a model in preference space, how does the multi-start approach use these pre-computed initial coefficients?

@jhelvy

jhelvy commented Jan 2, 2024

Yes, logitr uses a fixed scale parameter by default. It's only random if the user specifies so using randScale.

In logitr, a user can specify a set of starting points using the startVals argument. These values are only used as the starting values for the first run when numMultiStarts > 1, after which random starting values are used. If no startVals are provided, a zero vector is used for the first run, after which, again, random values are used.
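That rule can be sketched as follows (an illustrative Python sketch of the behavior described above; the function name and the (-1, 1) bounds for the random runs are my own assumptions, not logitr's code):

```python
import numpy as np

def multistart_inits(n_params, num_multi_starts, start_vals=None, seed=0):
    """Yield one starting-value vector per run: start_vals (or a zero
    vector) for the first run, then random values for the remaining runs.
    A sketch of the behavior described above, not logitr's implementation."""
    rng = np.random.default_rng(seed)
    if start_vals is None:
        yield np.zeros(n_params)
    else:
        yield np.asarray(start_vals, dtype=float)
    for _ in range(num_multi_starts - 1):
        yield rng.uniform(-1.0, 1.0, size=n_params)

inits = list(multistart_inits(3, num_multi_starts=4, start_vals=[0.5, -0.2, 1.0]))
```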

@arteagac
Owner

arteagac commented Jan 2, 2024

Thanks a lot for the clarification, John!
