Support for non-static data for reinforcement learning #713

Closed
ghost opened this issue Jan 19, 2020 · 20 comments · Fixed by #1232
Labels
feature (Is an improvement or enhancement) · help wanted (Open to be worked on)

Comments

@ghost

ghost commented Jan 19, 2020

What would be the best approach for reinforcement learning problems, where you need to interact with the environment to generate data? Maybe DataLoader is too restrictive?

@williamFalcon
Contributor

Could you post a snippet?

@irustandi
Contributor

Along these lines, I think it would be good to have the PyTorch Lightning equivalent of the reinforcement learning examples in PyTorch or PyTorch Ignite:

https://github.com/pytorch/examples/tree/master/reinforcement_learning
https://github.com/pytorch/ignite/tree/master/examples/reinforcement_learning

Is this possible?

@colllin

colllin commented Jan 22, 2020

I'm interested in this too. I'm thinking about trying to make it work using PyTorch's new IterableDataset for feeding data from a (prioritized) replay buffer.

Edit: Then I would roll out episodes (across a cluster) before each "epoch", which is just a fixed number of training steps between rollouts.
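
A minimal sketch of what I mean, assuming a hypothetical buffer object with a sample() method (not a real API):

from torch.utils.data import IterableDataset, DataLoader

class ReplayDataset(IterableDataset):
    """Streams transitions sampled from a (prioritized) replay buffer."""

    def __init__(self, buffer, samples_per_epoch):
        self.buffer = buffer                    # hypothetical: exposes .sample()
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        for _ in range(self.samples_per_epoch):
            # each item is e.g. (state, action, reward, next_state, done)
            yield self.buffer.sample()

# loader = DataLoader(ReplayDataset(buffer, samples_per_epoch=5000), batch_size=32)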

@Borda
Member

Borda commented Jan 24, 2020

@colllin would you consider creating a PR?

@djbyrne
Contributor

djbyrne commented Feb 11, 2020

Hey guys, I'm also really interested in using PyTorch Lightning for reinforcement learning. I'm not sure that the DataLoader is the best structure for RL, though. Has anyone found a good way of incorporating DataLoaders for things like gym environments?

@ghost
Author

ghost commented Feb 11, 2020

I've been looking at PyTorch's built-in map-style and iterable-style datasets, and I think there might be a way of getting RL to work with them. Map-style might work for replay buffers; otherwise, iterable-style would provide more flexibility in feeding data. I'll post code if I get something to work.

@djbyrne
Contributor

djbyrne commented Feb 11, 2020

I was trying to see if there was a good way to incorporate the DataLoader into the RL environment, but it doesn't seem to fit. Using it for a replay buffer sounds like a good idea. But what should you do if you are using Lightning for an RL agent that doesn't use a replay buffer? Should you just use a dummy DataLoader that isn't utilized?

@ghost
Author

ghost commented Feb 11, 2020

PyTorch's IterableDataset lets you use a Python iterator as your dataloader. That should work as a sort of dummy dataloader: you can just ask for a sample and run the environment in next(). This is what I'm thinking might work.
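
Something like this minimal sketch, where env (a Gym-style environment) and policy (a callable mapping observation to action) are placeholders:

from torch.utils.data import IterableDataset

class EnvIterator(IterableDataset):
    """Runs the environment inside the iterator: each item is one transition."""

    def __init__(self, env, policy):
        self.env = env        # assumed: Gym-style env with reset()/step()
        self.policy = policy  # assumed: maps observation -> action
        self.obs = self.env.reset()

    def __iter__(self):
        while True:
            action = self.policy(self.obs)
            next_obs, reward, done, _ = self.env.step(action)
            yield self.obs, action, reward, next_obs, done
            self.obs = self.env.reset() if done else next_obs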

@djbyrne
Contributor

djbyrne commented Feb 12, 2020

I was looking at something like this, using the DataLoader simply to retrieve the current state:

from torch.utils.data import IterableDataset

class Environment(IterableDataset):
    """Basic Gym Environment Dataset."""

    def __init__(self, env):
        super().__init__()  # was super(EnvDataset).__init__(), a leftover class name
        self.env = env
        self.obs = self.env.reset()

    def __iter__(self):
        # serve only the current observation -- a "dummy" dataset for Lightning
        return iter([self.obs])  # was self.env.state, which gym envs don't expose

This would provide a "dummy" dataloader, providing Lightning with everything it needs. However, this solution feels like trying to fit the project to the framework.

Would it be possible to change the hard requirement of providing a dataloader to Lightning for systems like RL agents?

@AwokeKnowing

Doesn't RL usually involve 'rollouts with the existing network', then 'evaluation of the data' for learning? It seems odd, even for RL, to have the 'next()' of the environment in the 'inner loop' of the learning.

There does need to be a hook to switch back and forth between learning and 'rollouts', but it might be counterproductive to put the learning on a 'per frame' basis, where each pull of a sample from the dataloader 'runs' the environment. So, in the design of this, it's not about pulling one sample from 'the environment'; it's about pulling a 'batch of data' from the environment. But there would be a benefit to having a standard way to connect the dataloader to the environment to pull batches (theoretically as small as single frames/samples).

@williamFalcon
Contributor

@AwokeKnowing I wish I were more up to speed on RL, but I haven't been doing much of it. I'd love to make sure Lightning supports it. Mind suggesting what needs to change to do that?

thanks

@djbyrne
Contributor

djbyrne commented Feb 12, 2020

@AwokeKnowing are you saying that the dataset would have a reference to both the agent and the env? Then the iter/getitem function inside the dataset would collect a batch of transitions?

@colllin

colllin commented Feb 12, 2020 via email

@AwokeKnowing

AwokeKnowing commented Feb 12, 2020

@colllin I don't think any hardcoded value (5000) is appropriate, because in some tasks the samples ("frames") are a few floats (many gym tasks) or a tiny matrix (chess/go), and sometimes they are 1024x1024 images. And some tasks (meta-learning) may require different amounts of samples per 'model update' step.

So from Lightning's perspective, it cannot know how many rollouts are needed; this has to be configured for each task.

What I am suggesting is something in between the DataLoader and the Environment; call it EnvDataManager. The EnvDataManager is configured with information about how to collect rollouts and feed them to the DataLoader. When the DataLoader requests data from the EnvDataManager (EDM) and the EDM decides it's time for a 'brain' update, the EDM updates the model used by the Environment, collects more samples (async), and begins feeding the DataLoader. The EDM would also know when to 'add' to the existing data vs 'replace' it with new data (rough skeleton below).

[diagram: the proposed DataLoader ↔ EnvDataManager ↔ Environment flow]

Note that by 'new model weights' I just mean access to the agent, to run inference to select an action to pass to the environment to get a new observation. However, typically in RL you don't run the latest 'agent' but a 'checkpoint', which you also pass to multiple 'rollout servers/processes'. You might even have a couple of different versions of the agent. Hence 'weights': the EDM will keep copies of them as needed.
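
A rough skeleton of the EDM idea (every name here is illustrative; nothing like this exists in Lightning):

class EnvDataManager:
    """Sits between the DataLoader and the Environment (illustrative sketch)."""

    def __init__(self, env, rollout_fn, steps_per_update, replace_old_data=False):
        self.env = env
        self.rollout_fn = rollout_fn            # user code: (weights, env, n) -> list of samples
        self.steps_per_update = steps_per_update
        self.replace_old_data = replace_old_data
        self.weights = None                     # agent checkpoint(s) kept by the EDM
        self.samples = []
        self.served = 0

    def update_weights(self, weights):
        # the training loop pushes checkpointed agent weights in here
        self.weights = weights

    def __iter__(self):
        while True:
            if self.served % self.steps_per_update == 0:
                # time for a 'brain' update: roll out with the current checkpoint
                new = self.rollout_fn(self.weights, self.env, self.steps_per_update)
                self.samples = new if self.replace_old_data else self.samples + new
            yield self.samples[self.served % len(self.samples)]
            self.served += 1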

@djbyrne I think so, if I understood you correctly.

@williamFalcon I think Lightning is flexible enough to work with RL, but since it wraps common scenarios, I think the RL case, where there is no 'static data set', has good potential for wrapping, so people don't write the same or similar custom code in all their RL projects to make them work with Lightning.

@colllin

colllin commented Feb 13, 2020

The 5000 was in reference to either the number of time steps of experience you want to collect by rolling out the current version of the agent, or the number of updates to the model you want to make; often these numbers are similar. I think your diagram is sensible.

I was only suggesting that the rollouts probably make sense to occur in an on_epoch_start hook, so that you can collect some experience, after which Lightning will perform a training loop over each batch of your dataloader. I was proposing to "fake" the length of your dataset by returning 5000*batch_size, so that Lightning will call your training_step 5000 times. Then all you need to do is get the experience collected in on_epoch_start into whatever data structure is being sampled by your dataloader.

Actually, it looks like on_epoch_start is called before train_dataloader (which is called before every epoch rather than keeping the dataloader around across epochs), so it should be pretty easy to roll out in on_epoch_start, save a dataset (self.rollout_data = whatever), and then return a new dataloader in train_dataloader.

That's how I see it, but maybe we're coming from different ends of the RL/Lightning universe and this doesn't make sense to you. Either way, I'm excited to see what you come up with.
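
A sketch of that flow; collect_rollouts and the exact numbers are placeholders:

import torch
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl

def collect_rollouts(model):
    """Placeholder: roll out the current policy, return a list of transitions."""
    raise NotImplementedError

class FakeLengthDataset(Dataset):
    """Fakes its length so Lightning runs a fixed number of steps per epoch."""

    def __init__(self, transitions, n_batches, batch_size):
        self.transitions = transitions
        self.fake_len = n_batches * batch_size  # e.g. 5000 * batch_size

    def __len__(self):
        return self.fake_len

    def __getitem__(self, idx):
        # sample uniformly; idx only controls how many steps get run
        return self.transitions[torch.randint(len(self.transitions), ()).item()]

class RLAgent(pl.LightningModule):
    # training_step / configure_optimizers omitted for brevity

    def on_epoch_start(self):
        # collect experience with the current policy before each epoch
        self.rollout_data = collect_rollouts(self)

    def train_dataloader(self):
        # re-called each epoch (per the observation above), so the fresh
        # rollout data ends up in a fresh dataloader
        dataset = FakeLengthDataset(self.rollout_data, n_batches=5000, batch_size=32)
        return DataLoader(dataset, batch_size=32)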

@djbyrne
Contributor

djbyrne commented Feb 13, 2020

I came across the Ptan RL library, which uses a class called ExperienceSource. This is essentially an iterator that keeps track of the environment and the weights of the current policy, and rolls out batches of trajectory data. I think this is aligned with what you were describing, @AwokeKnowing.

@AwokeKnowing

AwokeKnowing commented Feb 13, 2020

@djbyrne yes, that's the general idea, though the ExperienceSource there seems to include the parts about working with gym, DQN-specific concepts, etc.

I think for PyTorch Lightning it would make more sense to have the ExperienceDataManager not know how to work with gym, specific buffers, etc., but rather be focused on interfacing with the Lightning agent and Lightning dataset. Maybe a better name is DynamicDataset.

So concretely, on the Lightning side, we need to provide a class that 'looks like a dataset' (to the dataloader) but can also receive 'model checkpoints'.

Then users could use a library like Ptan, or their own, or just a couple of simple hand-coded methods to launch/run the 'gym'. But Lightning would give them an automatic flow of 'updated agents' and a clear place to feed the data.

It seems a bit odd that a 'dataset' should have this functionality, but in RL the 'dataset' is very much 'alive', and changes in the model directly affect the data that is passed to the dataloader. There may be something we can learn from Unity ML-Agents about where to separate the concerns.

So think of the simplest possible environment: it provides an observation of a number from 1 to 10, the action is 1 or 0 to say whether it's over 5 or not, and the reward is 1 or 0. We need Lightning to think of the series of observations and rewards as a DynamicDataset, and we need Lightning to provide the agent checkpoints to the DynamicDataset so that it can continue to generate (unlimited) data.
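
That toy example, spelled out (all names illustrative):

import random
from torch.utils.data import IterableDataset

class OverFiveEnv:
    """Toy environment: observe a number 1-10; action 1 means 'over 5'."""

    def reset(self):
        self.number = random.randint(1, 10)
        return self.number

    def step(self, action):
        reward = 1 if action == (1 if self.number > 5 else 0) else 0
        return self.reset(), reward  # next observation, reward

class DynamicDataset(IterableDataset):
    """'Alive' dataset: the current agent checkpoint drives data generation."""

    def __init__(self, env):
        self.env = env
        self.agent = None  # set via receive_checkpoint before iterating

    def receive_checkpoint(self, agent):
        self.agent = agent  # agent: callable mapping observation -> action (0 or 1)

    def __iter__(self):
        obs = self.env.reset()
        while True:  # unlimited data, as long as checkpoints keep arriving
            action = self.agent(obs)
            next_obs, reward = self.env.step(action)
            yield obs, action, reward
            obs = next_obs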

@djbyrne
Contributor

djbyrne commented Feb 13, 2020

@AwokeKnowing yeah, I agree that the EDM should not need to know about the specific env or buffer, and should really just be an interface.

Suppose the Lightning model contained an env_step() function, where the user provides the logic for carrying out a single step of their specific environment. The EDM would hold a reference to the PL model, which gives it access to the weights, forward(), and env_step(). The EDM could then handle the rollout agnostic to the type of environment being used, and provide the dataset interface for the dataloader.
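
Roughly like this, where env_step() and the class name are just the suggestion above, not a real API:

from torch.utils.data import IterableDataset

class ExperienceDataManager(IterableDataset):
    """Rolls out transitions through the model's own env_step() hook."""

    def __init__(self, model, steps_per_epoch):
        self.model = model                  # a LightningModule exposing env_step()
        self.steps_per_epoch = steps_per_epoch

    def __iter__(self):
        for _ in range(self.steps_per_epoch):
            # env_step() is the user-defined hook proposed above: it performs
            # one interaction with the environment and returns a transition
            yield self.model.env_step()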

I wonder: is this actually a problem that Lightning should be trying to solve, or should this be solely in the domain of the user?

@AwokeKnowing

AwokeKnowing commented Feb 13, 2020

@djbyrne for the question of whether 'Lightning should solve it', the question is: is there some 'repetitive' code that all RL projects will be writing to wire these together? If so, I think yes, because the point of PyTorch Lightning is that I want to just write the logic (the model, and the code to interact with "minecraft" or my own env using the model), and I don't want to write the code to manage checkpointed agents and transform/pool my observations into a 'dataset'.

What would help is to actually use PL to do 10 RL projects, some similar and some totally different, see what the repetitive code specific to RL 'data' is, and try to put that part in PL as a DynamicDataset.

My expectation is that a common thread is managing the agent checkpoints and batching observations together into randomized buffers of sequences. It would be good to have something that you can hook an environment up to and start 'filling up' a dataset. The difference between RL and other dynamic datasets (e.g. a webcam) is that the 'agent' affects the data. So: standardize the way to plug the PL agent (checkpoints) into a dataset and save the samples to disk or a DB as buffers, leaving the RL practitioner to write the code of the model and the code that directly pulls samples from the environment given a particular agent. And there would need to be a clear place to inject the logic of how to select data from the buffers to feed to the dataloader.

@djbyrne
Contributor

djbyrne commented Feb 13, 2020

I'd certainly be up for building some varied examples of RL projects with Lightning, to get a better idea of what works across the board.

@Borda added the "feature" and "help wanted" labels on Feb 22, 2020