Reward handler issues #275

Closed
lydiatliu opened this issue Aug 11, 2016 · 5 comments

@lydiatliu

lydiatliu commented Aug 11, 2016

The reward handler might be firing twice, firing late, or firing the wrong value.

For example, I can get a reward of -200 for a "turn -1" command, even though the only reward handlers in my mission are:

<RewardForTouchingBlockType>
    <Block reward="-100.0" type="obsidian" behaviour="onceOnly"/>
    <Block reward="100.0" type="stained_hardened_clay" behaviour="oncePerBlock"/>
</RewardForTouchingBlockType>
<RewardForSendingCommand reward="-1" />

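To make the report concrete, here is a minimal sketch that enumerates the per-step totals these handlers could produce. It rests on an assumption that is not confirmed anywhere in this thread: that each configured reward fires at most once per step. The wrapping AgentHandlers element and the helper names (configured_rewards, reachable_step_totals) are only there for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# The handlers from the report, wrapped in a single root element so the
# fragment parses on its own (the wrapper is an assumption for this sketch).
HANDLERS = '''
<AgentHandlers>
  <RewardForTouchingBlockType>
    <Block reward="-100.0" type="obsidian" behaviour="onceOnly"/>
    <Block reward="100.0" type="stained_hardened_clay" behaviour="oncePerBlock"/>
  </RewardForTouchingBlockType>
  <RewardForSendingCommand reward="-1" />
</AgentHandlers>
'''

def configured_rewards(xml_text):
    """Return every reward value declared in the fragment, in document order."""
    root = ET.fromstring(xml_text)
    return [float(el.get("reward")) for el in root.iter()
            if el.get("reward") is not None]

def reachable_step_totals(rewards):
    """Return all sums obtainable if each reward fires at most once in a step."""
    totals = set()
    for n in range(1, len(rewards) + 1):
        for combo in itertools.combinations(rewards, n):
            totals.add(sum(combo))
    return totals

rewards = configured_rewards(HANDLERS)
totals = reachable_step_totals(rewards)
print(sorted(totals))
print(-200.0 in totals)
```

Under that assumption, -200 is not a reachable total, which is consistent with the suspicion that a handler fired twice or that a reward was delivered late and lumped in with another one.
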
@DaveyBiggers
Member

So far I've been totally unable to reproduce this, using the attached script.
There were a handful of cases where the agent started the mission in the wrong spot - which occasionally triggered a reward - but this turned out to be because the agent was being attacked by mobs, due to another bug (#280).

The double reward firing hasn't happened once, in around 15000 missions. Is it possible the bug lies in the rl-framework code?

(NB: I'm testing on an updated code base that contains the fix for slow XML reward parsing, #261. Perhaps this has an impact?)

# ------------------------------------------------------------------------------------------------
# Copyright (c) 2016 Microsoft Corporation
# 
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ------------------------------------------------------------------------------------------------

import MalmoPython
import os
import random
import sys
import time
import json
import errno

def GetMissionXML():
    ''' Build an XML mission string that uses the RewardForTouchingBlockType mission handler.'''

    return '''<?xml version="1.0" encoding="UTF-8" ?>
    <Mission xmlns="http://ProjectMalmo.microsoft.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <About>
            <Summary>Nom nom nom</Summary>
        </About>

        <ModSettings>
            <MsPerTick>5</MsPerTick>
        </ModSettings>

        <ServerSection>
            <ServerHandlers>
                <FlatWorldGenerator generatorString="3;7,220*1,5*3,2;3;,biome_1" />
                <DrawingDecorator>
                    <DrawCuboid x1="-21" y1="226" z1="-21" x2="21" y2="227" z2="21" type="air"/>
                    <DrawCuboid x1="-21" y1="226" z1="-21" x2="21" y2="226" z2="21" type="stained_glass" colour="PINK"/>
                    <DrawCuboid x1="-20" y1="226" z1="-20" x2="20" y2="226" z2="20" type="emerald_block" />
                </DrawingDecorator>
                <DrawingDecorator>
                    ''' + GetMineDrawingXML() + '''
                </DrawingDecorator>
                <ServerQuitFromTimeUp timeLimitMs="150000" description="out_of_time"/>
                <ServerQuitWhenAnyAgentFinishes />
            </ServerHandlers>
        </ServerSection>

        <AgentSection mode="Survival">
            <Name>Emma</Name>
            <AgentStart>
                <Placement x="0.5" y="227.0" z="0.5"/>
                <Inventory>
                </Inventory>
            </AgentStart>
            <AgentHandlers>
                <VideoProducer>
                    <Width>640</Width>
                    <Height>480</Height>
                </VideoProducer>
                <RewardForTouchingBlockType>
                    <Block type="obsidian" reward="100.0" behaviour="onceOnly"/>
                </RewardForTouchingBlockType>
                <AgentQuitFromTouchingBlockType>
                    <Block type="stained_glass" description="out_of_arena"/>
                    <Block type="obsidian" description="wooo"/>
                </AgentQuitFromTouchingBlockType>
                <DiscreteMovementCommands/>
            </AgentHandlers>
        </AgentSection>

    </Mission>'''


def GetMineDrawingXML():
    ''' Build an XML string that contains some randomly positioned "mines"'''
    xml=""
    for item in range(200):
        x = 0
        z = 0
        while abs(x) < 4:
            x = random.randint(-20,20)
        while abs(z) < 4:
            z = random.randint(-20,20)
        block_type = random.choice(["ice", "obsidian", "obsidian"])  # obsidian is twice as likely as ice
        y = random.randint(226,227)
        xml += '''<DrawBlock x="''' + str(x) + '''" y="''' + str(y) + '''" z="''' + str(z) + '''" type="''' + block_type + '''"/>'''
    return xml


sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)  # flush print output immediately

recordingsDirectory = r"F:\RewardTestRecordings"  # raw string, so the backslash is never read as an escape
try:
    os.makedirs(recordingsDirectory)
except OSError as exception:
    if exception.errno != errno.EEXIST: # ignore error if already existed
        raise

validate = True
my_mission = MalmoPython.MissionSpec(GetMissionXML(),validate)
agent_host = MalmoPython.AgentHost()
try:
    agent_host.parse( sys.argv )
except RuntimeError as e:
    print 'ERROR:',e
    print agent_host.getUsage()
    exit(1)
if agent_host.receivedArgument("help"):
    print agent_host.getUsage()
    exit(0)

# Create a pool of Minecraft Mod clients.
# By default, mods will choose consecutive mission control ports, starting at 10000,
# so running four mods locally should produce the following pool by default (assuming nothing else
# is using these ports):
my_client_pool = MalmoPython.ClientPool()
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10000))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10001))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10002))
my_client_pool.add(MalmoPython.ClientInfo("127.0.0.1", 10003))

if agent_host.receivedArgument("test"):
    num_reps = 1
else:
    num_reps = 30000

for iRepeat in range(num_reps):
    # Set up a recording
    my_mission_record = MalmoPython.MissionRecordSpec(os.path.join(recordingsDirectory, "Mission_" + str(iRepeat) + ".tgz"))
    my_mission_record.recordRewards()
    my_mission_record.recordCommands()
    my_mission_record.recordObservations()
    my_mission_record.recordMP4(24,800000)
    max_retries = 3
    for retry in range(max_retries):
        try:
            # Attempt to start the mission:
            agent_host.startMission( my_mission, my_client_pool, my_mission_record, 0, "missionEndTestExperiment" )
            break
        except RuntimeError as e:
            if retry == max_retries - 1:
                print "Error starting mission",e
                print "Is the game running?"
                exit(1)
            else:
                time.sleep(2)

    world_state = agent_host.getWorldState()
    while not world_state.is_mission_running:
        time.sleep(0.1)
        world_state = agent_host.getWorldState()

    reward = 0.0    # keep track of reward for this mission.

    # main loop:
    while world_state.is_mission_running:
#        agent_host.sendCommand(random.choice(["movenorth 1", "movesouth 1", "moveeast 1", "movewest 1"]))
        agent_host.sendCommand(random.choice(["move 1", "move -1", "turn 1", "turn -1"]))
        world_state = agent_host.getWorldState()
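        if world_state.number_of_rewards_since_last_state > 0:
            # One or more reward signals have come in - add up all of them,
            # not just the first, so a second reward in the same state is not missed:
            for timestamped_reward in world_state.rewards:
                delta = timestamped_reward.getValue()
                if delta != 0:
                    print "New reward: " + str(delta)
                    reward += delta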

        time.sleep(0.002)

    # mission has ended.
    print "Mission " + str(iRepeat+1) + ": Reward = " + str(reward)
    if reward > 100:
        print "ERROR!!!!!!!!!!!!"
        exit(1)
    time.sleep(0.2) # Give the mod a little time to prepare for the next mission.

timhutton modified the milestone: Dolphin Aug 15, 2016
@timhutton
Contributor

Clicks for Dave for discovering this unused parameter bug: https://github.com/Microsoft/malmo/blob/master/Malmo/src/TimestampedReward.cpp#L155
which I introduced. This will be fixed shortly.

@timhutton
Contributor

timhutton commented Aug 16, 2016

Fixed in 8e50eb2

@SamNPowers

Is it possible there's been a regression? I'm currently seeing reward doubling on 0.36.0.

Scenario 1:

            <RewardForMissionEnd>
                <Reward description="out_of_time" reward="-1000" />
            </RewardForMissionEnd>
            <AgentQuitFromReachingCommandQuota total="1500" description="out_of_time"/>

This gives me rewards of only -1000, as it should. No doubling.

Scenario 2:

            <RewardForMissionEnd>
                <Reward description="out_of_time" reward="-1000" />
                <Reward description="found_goal" reward="1000" />
            </RewardForMissionEnd>
            <AgentQuitFromReachingCommandQuota total="1500" description="out_of_time"/>
            <AgentQuitFromTouchingBlockType>
                <Block type="redstone_block" description="found_goal"/>
            </AgentQuitFromTouchingBlockType>

This is scenario 1 with a second type of reward added, the found_goal reward. In this case I almost exclusively see -2000 and 2000 rewards.

This doesn't seem expected to me; I was aiming for just rewards of -1000 and 1000.
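One way to quantify the doubling in scenario 2 is to compare each observed end-of-mission total against the value configured for the matching quit description. The sketch below is only illustrative: the helper names (expected_end_rewards, reward_multiplier) are hypothetical, and it assumes the quit description for each mission is known to the caller.

```python
import xml.etree.ElementTree as ET

# The RewardForMissionEnd block from scenario 2.
FRAGMENT = '''
<RewardForMissionEnd>
    <Reward description="out_of_time" reward="-1000" />
    <Reward description="found_goal" reward="1000" />
</RewardForMissionEnd>
'''

def expected_end_rewards(xml_text):
    """Map each quit description to its configured end-of-mission reward."""
    root = ET.fromstring(xml_text)
    return {el.get("description"): float(el.get("reward"))
            for el in root.iter("Reward")}

def reward_multiplier(expected, description, observed_total):
    """Return how many times the configured reward the observed total is."""
    return observed_total / expected[description]

expected = expected_end_rewards(FRAGMENT)
print(reward_multiplier(expected, "out_of_time", -2000.0))
print(reward_multiplier(expected, "found_goal", 1000.0))
```

A multiplier of 2 for both descriptions would suggest that the same reward is delivered twice, rather than the two Reward entries being added together (which would give 0).
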

@melorian94

melorian94 commented May 16, 2022

I am having the same issue with the findthegoal mission. I changed the movement type to discrete, and I am getting many moves with 0 reward and then, some steps later, multiple rewards. One workaround I found was to use a time.sleep of around 150 ms; with that, the reward values work properly.
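
The workaround above might look roughly like the following sketch. It assumes the agent host exposes getWorldState(), and that the returned world state has a rewards sequence whose items have getValue(), as in the test script earlier in this thread. The helper name collect_rewards and the 0.15 s default are illustrative, not part of Malmo.

```python
import time

def collect_rewards(agent_host, settle_seconds=0.15):
    """Wait briefly after a command, then sum every reward in the new world state.

    agent_host is expected to behave like MalmoPython.AgentHost: only
    getWorldState(), world_state.rewards and reward.getValue() are used.
    """
    # Give the mod time to deliver the reward for the command just sent.
    time.sleep(settle_seconds)
    world_state = agent_host.getWorldState()
    total = 0.0
    # Sum all rewards, not just the first, in case several arrived together.
    for reward in world_state.rewards:
        total += reward.getValue()
    return total, world_state
```

Usage would be to call agent_host.sendCommand(...) and then step_reward, world_state = collect_rewards(agent_host). The delay trades throughput for attributing each reward to the command that caused it, so the right value probably depends on MsPerTick and the machine.
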
