Pure Continuous version of ALE on CPP #550

Merged on Aug 8, 2024 (33 commits)
Changes from 1 commit
Commits
9d3605d
step by step
jet-sony Aug 1, 2024
78ad5b4
stash, gotta get back to work
jet-sony Aug 1, 2024
5ae27d9
remove discrete implementation and use only continuous
jjshoots Aug 3, 2024
b2af342
split the thresholds
jjshoots Aug 3, 2024
229f1ed
add thresholds
jjshoots Aug 3, 2024
0355c91
use true-to-game actions
jjshoots Aug 3, 2024
0bc1ea3
I think... I'm happy with this interface for now
jjshoots Aug 3, 2024
998e3e3
amend stella env
jjshoots Aug 3, 2024
c3a9164
remove redundant params
jjshoots Aug 3, 2024
f25a114
make default parameter
jjshoots Aug 3, 2024
10903c5
amend interface to have default parameter at top level
jjshoots Aug 3, 2024
84ca055
swap parameter order and implement continuous for wrappers
jjshoots Aug 3, 2024
9aef3d7
maybe stella shouldn't use default params
jjshoots Aug 3, 2024
3a971e8
move discretization to Python
jjshoots Aug 3, 2024
cef5892
fix some bugs
jjshoots Aug 3, 2024
b9ac273
fix another bug
jjshoots Aug 3, 2024
4189fb8
stash
jjshoots Aug 3, 2024
9dfe314
stash
jjshoots Aug 3, 2024
3c9c8aa
fix some more bugs
jjshoots Aug 3, 2024
86722e9
ALWAYS the rogue curlies you gotta watch out for
jjshoots Aug 3, 2024
fae19c1
streamline
jjshoots Aug 3, 2024
2e3d879
fixing tests
jjshoots Aug 3, 2024
d7f37b7
make int
jjshoots Aug 3, 2024
ea133ae
fix argument
jjshoots Aug 3, 2024
444dde3
passing tests
jjshoots Aug 3, 2024
2d86927
fix bug
jjshoots Aug 3, 2024
b58e9b2
use full action space in continuous mode
jjshoots Aug 3, 2024
a5df888
precommit
jjshoots Aug 3, 2024
ea2303f
fix bug
jjshoots Aug 3, 2024
394c515
additional warning
jjshoots Aug 3, 2024
be3e869
precommit
jjshoots Aug 3, 2024
c38033e
update interface signature
jjshoots Aug 4, 2024
140f95e
change to default emulate strength 1.0
jjshoots Aug 6, 2024
fix some bugs
jjshoots committed Aug 3, 2024
commit cef5892da4807da4399fc8849b26aaf29f5a429e
22 changes: 12 additions & 10 deletions src/environment/stella_environment.cpp
@@ -107,7 +107,7 @@ void StellaEnvironment::reset() {
   int noopSteps;
   noopSteps = 60;

-  emulate(PLAYER_A_NOOP, PLAYER_B_NOOP, noopSteps);
+  emulate(PLAYER_A_NOOP, PLAYER_B_NOOP, 0.0, 0.0, noopSteps);
   // Reset the emulator
   softReset();

@@ -122,7 +122,7 @@ void StellaEnvironment::reset() {
   // Apply necessary actions specified by the rom itself
   ActionVect startingActions = m_settings->getStartingActions();
   for (size_t i = 0; i < startingActions.size(); i++) {
-    emulate(startingActions[i], PLAYER_B_NOOP);
+    emulate(startingActions[i], PLAYER_B_NOOP, 0.0, 0.0);
   }
 }

@@ -163,13 +163,15 @@ reward_t StellaEnvironment::act(Action player_a_action, Action player_b_action,
   // past the terminal state
   for (size_t i = 0; i < m_frame_skip; i++) {
     // Stochastically drop actions, according to m_repeat_action_probability
-    if (rng.nextDouble() >= m_repeat_action_probability)
+    if (rng.nextDouble() >= m_repeat_action_probability) {
       m_player_a_action = player_a_action;
       m_paddle_a_strength = paddle_a_strength;
+    }
     // @todo Possibly optimize by avoiding call to rand() when player B is "off" ?
-    if (rng.nextDouble() >= m_repeat_action_probability)
+    if (rng.nextDouble() >= m_repeat_action_probability) {
       m_player_b_action = player_b_action;
       m_paddle_b_strength = paddle_b_strength;
+    }

     // If so desired, request one frame's worth of sound (this does nothing if recording
     // is not enabled)
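Review note: the hunk header above (-163,13 +163,15) is consistent with only the braces being new, which suggests the strength assignments previously sat outside the `if` and ran on every frame regardless of the sticky-action draw. The sketch below illustrates that difference in isolation. `StickyState`, `updateUnbraced`, and `updateBraced` are hypothetical names used only for illustration; they are not part of ALE.

```cpp
#include <cassert>

// Hypothetical stand-in for the stored per-player action and paddle strength.
struct StickyState {
  int action = 0;
  float strength = 0.0f;
};

// What the compiler sees without braces: only the first assignment is
// guarded by the condition; the second one always runs.
void updateUnbraced(StickyState& s, bool accept, int action, float strength) {
  if (accept) s.action = action;
  s.strength = strength;
}

// Braced form, matching the change in this hunk: both assignments are
// applied together or skipped together.
void updateBraced(StickyState& s, bool accept, int action, float strength) {
  if (accept) {
    s.action = action;
    s.strength = strength;
  }
}
```

With `accept == false`, the unbraced version leaves the action untouched but still overwrites the strength, so the stored action and strength can come from different steps. The braced version keeps the pair in sync.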
@@ -183,16 +185,16 @@ reward_t StellaEnvironment::act(Action player_a_action, Action player_b_action,
       m_screen_exporter->saveNext(m_screen);

     // Use the stored actions, which may or may not have changed this frame
-    sum_rewards += oneStepAct(m_player_a_action, m_player_a_strength,
-                              m_player_b_action, m_player_b_strength);
+    sum_rewards += oneStepAct(m_player_a_action, m_player_b_action,
+                              m_paddle_a_strength, m_paddle_b_strength);
   }

   return std::clamp(sum_rewards, m_reward_min, m_reward_max);
 }

 /** This functions emulates a push on the reset button of the console */
 void StellaEnvironment::softReset() {
-  emulate(RESET, PLAYER_B_NOOP, m_num_reset_steps);
+  emulate(RESET, PLAYER_B_NOOP, 0.0, 0.0, m_num_reset_steps);

   // Reset previous actions to NOOP for correct action repeating
   m_player_a_action = PLAYER_A_NOOP;
@@ -213,8 +215,8 @@ reward_t StellaEnvironment::oneStepAct(Action player_a_action, Action player_b_a
   noopIllegalActions(player_a_action, player_b_action);

   // Emulate in the emulator
-  emulate(player_a_action, paddle_a_strength,
-          player_b_action, paddle_b_strength);
+  emulate(player_a_action, player_b_action,
+          paddle_a_strength, paddle_b_strength);
   // Increment the number of frames seen so far
   m_state.incrementFrame();

@@ -250,7 +252,7 @@ void StellaEnvironment::pressSelect(size_t num_steps) {
   }
   processScreen();
   processRAM();
-  emulate(PLAYER_A_NOOP, PLAYER_B_NOOP);
+  emulate(PLAYER_A_NOOP, PLAYER_B_NOOP, 0.0, 0.0);
   m_state.incrementFrame();
 }
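
Review note: every updated call site in this commit passes both actions first, then both paddle strengths, then an optional step count. The sketch below models that order under stated assumptions. The `Action` enum values, the `Frame` struct, and this `emulate` signature (including the default of one step) are hypothetical stand-ins inferred from the call sites, not ALE's actual declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for ALE's Action enum (values are illustrative only).
enum Action { DEMO_NOOP = 0, DEMO_FIRE = 1 };

// Hypothetical record of what one emulate() call received.
struct Frame {
  Action a;
  Action b;
  float strength_a;
  float strength_b;
  std::size_t steps;
};

// Assumed parameter order: actions, then strengths, then an optional step
// count. The default of 1 step is an assumption for this sketch.
Frame emulate(Action a, Action b, float strength_a, float strength_b,
              std::size_t num_steps = 1) {
  return Frame{a, b, strength_a, strength_b, num_steps};
}

// Why the order matters for an unscoped enum: a float cannot silently become
// an Action, but an Action can silently become a float.
static_assert(!std::is_convertible<float, Action>::value,
              "float does not implicitly convert to Action");
static_assert(std::is_convertible<Action, float>::value,
              "unscoped enum implicitly converts to float");
```

Under these assumptions, an interleaved call such as `emulate(action_a, strength_a, action_b, strength_b)` would be rejected at the second argument, while a swapped action in a strength slot would compile silently, which is one reason to keep the grouped order consistent across `act`, `oneStepAct`, and `emulate`.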
