Reward handler issues #275

lydiatliu · 2016-08-11T12:16:14Z

The reward handler might be firing twice, firing late, or firing with the wrong value.

For example, I can get a reward of -200 for a turn -1 command, even though the only reward handlers in my mission are:

<RewardForTouchingBlockType>
    <Block reward="-100.0" type="obsidian" behaviour="onceOnly"/>
    <Block reward="100.0" type="stained_hardened_clay" behaviour="oncePerBlock"/>
</RewardForTouchingBlockType>
<RewardForSendingCommand reward="-1" />
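To make the expected per-command values concrete, here is a minimal Python model of the handlers above. It assumes onceOnly pays the first time any block of that type is touched and oncePerBlock pays once per distinct block position; that is a reading of the behaviour names, not Malmo's source. RewardModel and step are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Pos = Tuple[int, int, int]


@dataclass
class RewardModel:
    """Hypothetical model of the reward handlers above (not Malmo's implementation).

    Assumed semantics:
      * RewardForSendingCommand: a fixed reward for every command sent.
      * onceOnly: paid the first time any block of that type is touched.
      * oncePerBlock: paid once for each distinct block (type, position) touched.
    """
    command_reward: float = -1.0
    once_only: Dict[str, float] = field(default_factory=lambda: {"obsidian": -100.0})
    once_per_block: Dict[str, float] = field(
        default_factory=lambda: {"stained_hardened_clay": 100.0})
    _types_paid: Set[str] = field(default_factory=set)
    _blocks_paid: Set[Tuple[str, Pos]] = field(default_factory=set)

    def step(self, touched: Iterable[Tuple[str, Pos]]) -> float:
        """Total reward for one command, given the blocks touched after it."""
        total = self.command_reward
        for block_type, pos in touched:
            if block_type in self.once_only:
                # onceOnly: pay only for the first touch of this block type.
                if block_type not in self._types_paid:
                    self._types_paid.add(block_type)
                    total += self.once_only[block_type]
            elif block_type in self.once_per_block:
                # oncePerBlock: pay once per distinct block position.
                key = (block_type, pos)
                if key not in self._blocks_paid:
                    self._blocks_paid.add(key)
                    total += self.once_per_block[block_type]
        return total
```

Under this model the most negative value a single command can produce is -101 (the -1 command cost plus the one-time -100 obsidian penalty), so -200 for a single turn does not fit any combination of these handlers.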


DaveyBiggers · 2016-08-12T17:26:25Z

So far I've been unable to reproduce this using the attached script.
There were a handful of cases where the agent started the mission in the wrong spot, which occasionally triggered a reward, but this turned out to be because the agent was being attacked by mobs due to another bug (#280).

The reward hasn't double-fired once in around 15,000 missions. Is it possible the bug lies in the rl-framework code?

(NB: I'm testing on an updated codebase that contains the fix for slow XML reward parsing (#261); perhaps this has an impact?)

# ------------------------------------------------------------------------------------------------
# Copyright (c) 2016 Microsoft Corporation
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ------------------------------------------------------------------------------------------------

import MalmoPython
import os
import random
import sys
import time
import json
import errno

def GetMissionXML():
    ''' Build an XML mission string that uses the RewardForTouchingBlockType mission handler.'''

    return '''<?xml version="1.0" encoding="UTF-8" ?>
    <Mission xmlns="http://ProjectMalmo.microsoft.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <About>
            <Summary>Nom nom nom</Summary>
        </About>

        <ModSettings>
            <MsPerTick>5</MsPerTick>
        </ModSettings>

        <ServerSection>
            <ServerHandlers>
                <FlatWorldGenerator generatorString="3;7,220*1,5*3,2;3;,biome_1" />
                <DrawingDecorator>
                    <DrawCuboid x1="-21" y1="226" z1="-21" x2="21" y2="227" z2="21" type="air"/>
                    <DrawCuboid x1="-21" y1="226" z1="-21" x2="21" y2="226" z2="21" type="stained_glass" colour="PINK"/>
                    <DrawCuboid x1="-20" y1="226" z1="-20" x2="20" y2="226" z2="20" type="emerald_block" />
                </DrawingDecorator>
                <DrawingDecorator>
                    ''' + GetMineDrawingXML() + '''
                </DrawingDecorator>
                <ServerQuitFromTimeUp timeLimitMs="150000" description="out_of_time"/>
                <ServerQuitWhenAnyAgentFinishes />
            </ServerHandlers>
        </ServerSection>

        <AgentSection mode="Survival">
            <Name>Emma</Name>
            <AgentStart>
                <Placement x="0.5" y="227.0" z="0.5"/>
                <Inventory>
                </Inventory>
            </AgentStart>
            <AgentHandlers>
                <VideoProducer>
                    <Width>640</Width>
                    <Height>480</Height>
                </VideoProducer>
                <RewardForTouchingBlockType>
                    <Block type="obsidian" reward="100.0" behaviour="onceOnly"/>
                </RewardForTouchingBlockType>
                <AgentQuitFromTouchingBlockType>
                    <Block type="stained_glass" description="out_of_arena"/>
                    <Block type="obsidian" description="wooo"/>
                </AgentQuitFromTouchingBlockType>
                <DiscreteMovementCommands/>
            </AgentHandlers>
        </AgentSection>

    </Mission>'''


def GetMineDrawingXML():
    ''' Build an XML string that contains some randomly positioned "mines"'''
    xml=""
    for item in range(200):
        x = 0
        z = 0
        while abs(x) < 4:
            x = random.randint(-20,20)
        while abs(z) < 4:
            z = random.randint(-20,20)
        block_type = random.choice(["ice", "obsidian", "obsidian"])  # avoid shadowing the built-in type()
        y = random.randint(226,227)
        xml += '<DrawBlock x="{}" y="{}" z="{}" type="{}"/>'.format(x, y, z, block_type)
    return xml


sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)  # flush print output immediately

recordingsDirectory = r"F:\RewardTestRecordings"  # raw string so the backslash is never treated as an escape
try:
    os.makedirs(recordingsDirectory)
except OSError as exception:
    if exception.errno != errno.EEXIST: # ignore error if already existed
        raise

validate = True
my_mission = MalmoPython.MissionSpec(GetMissionXML(),validate)
agent_host = MalmoPython.AgentHost()
try:
    agent_host.parse( sys.argv )
except RuntimeError as e:
    print 'ERROR:',e
    print agent_host.getUsage()
    exit(1)
if agent_host.receivedArgument("help"):
    print agent_host.getUsage()
    exit(0)

# Create a pool of Minecraft Mod clients.
# By default, mods will choose consecutive mission control ports, starting at 10000,
# so running four mods locally should produce the following pool by default (assuming nothing else
# is using these ports):
my_client_pool = MalmoPython.ClientPool()
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10000))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10001))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10002))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10003))

if agent_host.receivedArgument("test"):
    num_reps = 1
else:
    num_reps = 30000

for iRepeat in range(num_reps):
    # Set up a recording
    my_mission_record = MalmoPython.MissionRecordSpec(os.path.join(recordingsDirectory, "Mission_" + str(iRepeat) + ".tgz"))
    my_mission_record.recordRewards()
    my_mission_record.recordCommands()
    my_mission_record.recordObservations()
    my_mission_record.recordMP4(24,800000)
    max_retries = 3
    for retry in range(max_retries):
        try:
            # Attempt to start the mission:
            agent_host.startMission( my_mission, my_client_pool, my_mission_record, 0, "missionEndTestExperiment" )
            break
        except RuntimeError as e:
            if retry == max_retries - 1:
                print "Error starting mission",e
                print "Is the game running?"
                exit(1)
            else:
                time.sleep(2)

    world_state = agent_host.getWorldState()
    while not world_state.is_mission_running:
        time.sleep(0.1)
        world_state = agent_host.getWorldState()

    reward = 0.0    # keep track of reward for this mission.

    # main loop:
    while world_state.is_mission_running:
#        agent_host.sendCommand(random.choice(["movenorth 1", "movesouth 1", "moveeast 1", "movewest 1"]))
        agent_host.sendCommand(random.choice(["move 1", "move -1", "turn 1", "turn -1"]))
        world_state = agent_host.getWorldState()
        if world_state.number_of_rewards_since_last_state > 0:
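            # One or more reward signals have come in - sum all of them, not just the first,
            # otherwise a second (duplicate) reward in the same world state would be missed:
            delta = sum(r.getValue() for r in world_state.rewards)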
            if delta != 0:
                print "New reward: " + str(delta)
                reward += delta

        time.sleep(0.002)

    # mission has ended.
    print "Mission " + str(iRepeat+1) + ": Reward = " + str(reward)
    if reward > 100:
        print "ERROR!!!!!!!!!!!!"
        exit(1)
    time.sleep(0.2) # Give the mod a little time to prepare for the next mission.
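The script's final check (reward > 100) catches a doubled total, but double firing could show up either as a single reward of 200 or as two separate rewards of 100. A small offline checker over the per-mission reward lists can separate those two cases. It assumes, as in the test mission, that the only handler is the onceOnly obsidian reward of 100; find_suspicious_missions and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence


def find_suspicious_missions(rewards_per_mission: Sequence[Sequence[float]],
                             allowed_values: Sequence[float] = (100.0,),
                             max_total: float = 100.0) -> Dict[int, str]:
    """Flag missions whose rewards can't come from the configured handlers.

    Assumes the only handler is a onceOnly reward of 100, so each mission
    should see at most one non-zero reward, equal to 100.
    """
    problems: Dict[int, str] = {}
    for i, rewards in enumerate(rewards_per_mission):
        nonzero: List[float] = [r for r in rewards if r != 0]
        unexpected = [r for r in nonzero if r not in allowed_values]
        if unexpected:
            # e.g. a single doubled reward of 200
            problems[i] = f"unexpected value(s): {unexpected}"
        elif sum(nonzero) > max_total:
            # e.g. the same reward delivered twice
            problems[i] = f"total {sum(nonzero)} exceeds {max_total}"
    return problems
```

For example, find_suspicious_missions([[100.0], [], [200.0], [100.0, 100.0]]) flags mission 2 as a single doubled value and mission 3 as a repeated reward.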

timhutton · 2016-08-16T08:55:19Z

Thanks to Dave for discovering this unused-parameter bug, which I introduced: https://github.com/Microsoft/malmo/blob/master/Malmo/src/TimestampedReward.cpp#L155
It will be fixed shortly.

timhutton · 2016-08-16T09:01:46Z

Fixed in 8e50eb2

SamNPowers · 2018-08-17T19:33:57Z

Is it possible there's been a regression? I'm currently seeing reward doubling on 0.36.0.

Scenario 1:

            <RewardForMissionEnd>
                <Reward description="out_of_time" reward="-1000" />
            </RewardForMissionEnd>
            <AgentQuitFromReachingCommandQuota total="1500" description="out_of_time"/>

This gives me rewards of only -1000, as it should: no doubling.

Scenario 2:

            <RewardForMissionEnd>
                <Reward description="out_of_time" reward="-1000" />
                <Reward description="found_goal" reward="1000" />
            </RewardForMissionEnd>
            <AgentQuitFromReachingCommandQuota total="1500" description="out_of_time"/>
            <AgentQuitFromTouchingBlockType>
                <Block type="redstone_block" description="found_goal"/>
            </AgentQuitFromTouchingBlockType>

This is Scenario 1 with a second reward added, found_goal. In this case I almost exclusively see rewards of -2000 and 2000.

This doesn't seem right to me; I expected rewards of only -1000 and 1000.
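The expected behaviour can be written down as a lookup: RewardForMissionEnd should pay exactly one entry, the one whose description matches the quit description. Comparing observed totals against that lookup makes the symptom explicit: Scenario 1 gives a ratio of 1.0, Scenario 2 a ratio of 2.0. This is a minimal sketch; mission_end_reward and reward_ratios are hypothetical helpers, it assumes an unmatched description pays 0, and it assumes the observed total contains only the mission-end reward (true for the two scenarios above).

```python
import math
from typing import List, Mapping, Sequence, Tuple


def mission_end_reward(quit_description: str, rewards: Mapping[str, float]) -> float:
    """Expected RewardForMissionEnd payout: the single entry matching the description."""
    return rewards.get(quit_description, 0.0)  # assumption: unmatched descriptions pay 0


def reward_ratios(episodes: Sequence[Tuple[str, float]],
                  rewards: Mapping[str, float]) -> List[float]:
    """observed / expected per (quit_description, observed_total) episode.

    1.0 means the reward fired once; 2.0 means it fired twice.
    """
    ratios = []
    for description, observed in episodes:
        expected = mission_end_reward(description, rewards)
        ratios.append(observed / expected if expected else math.nan)
    return ratios
```

With the Scenario 2 rewards {"out_of_time": -1000.0, "found_goal": 1000.0}, the reported totals of -2000 and 2000 both give a ratio of 2.0.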

melorian94 · 2022-05-16T12:40:24Z

I am having the same issue with the findthegoal mission. After changing the movement type to discrete, I get many moves with 0 reward and then, some steps later, several rewards at once. One workaround I found is to call time.sleep for around 150 ms after each command; with that, the reward values come through correctly.
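The 150 ms sleep suggests the reward for a discrete command sometimes arrives in a later world state than the one read right after sending it. An alternative to a fixed sleep is to poll until a reward actually arrives (or a timeout passes) and to sum every reward in each world state rather than only rewards[0]. This sketch assumes each command produces at least one reward (for example via RewardForSendingCommand) and that get_world_state returns an object with rewards (items with getValue()) and is_mission_running, like the world states in the script earlier in this thread; wait_for_rewards is a hypothetical helper, not part of MalmoPython.

```python
import time
from typing import Any, Callable, Tuple


def wait_for_rewards(get_world_state: Callable[[], Any],
                     timeout_s: float = 1.0,
                     poll_s: float = 0.01) -> Tuple[float, Any]:
    """Poll until at least one reward arrives, the mission ends, or timeout_s passes.

    Returns (sum of all rewards seen, last world state). Every reward in each
    world state is summed, rather than only rewards[0].
    """
    deadline = time.monotonic() + timeout_s
    total = 0.0
    got_reward = False
    while True:
        world_state = get_world_state()
        for r in world_state.rewards:
            total += r.getValue()
            got_reward = True
        if got_reward or not world_state.is_mission_running or time.monotonic() >= deadline:
            return total, world_state
        time.sleep(poll_s)
```

In a loop like the one in the script above, this would replace the single getWorldState() call after each sendCommand, e.g. delta, world_state = wait_for_rewards(agent_host.getWorldState). The timeout keeps the agent from stalling if a command happens to produce no reward.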

timhutton modified the milestone: Dolphin Aug 15, 2016

timhutton closed this as completed Aug 16, 2016
