[Discussion]: Audio Query Support to master #577
Comments
Thanks for raising the issue, I agree that it would be super interesting to add this directly into ALE. If so, for multi-modal policies, would it be worth adding the option for observations that contain both image and audio?
Yes, can do. What would be the preferred way to submit a working version, given the state of the original PR and wanting to maintain attribution/credit? The other question: given that the doc/example/ folder was removed a while back and replaced with a minimal set of markdown docs, should the scripts added in 53eb67d be converted to markdown and moved under docs/, or removed?
The original commit added a combined call on the ALE Python interface; I did not integrate that change into the version we maintain, opting to keep the lower-level ALE Python interface as minimal as possible. For the AtariEnv, returning an observation that contains both the configured 'obs_type' data and the audio would seem ideal, given that gym wraps everything required for a step(). I am exclusively using the ALE API, so I do not currently have audio exposed on the gym interface. If this would be useful to expose to gym, one way to achieve it could be to:
For applications that do not use audio, it would be backwards-compatible. It might be useful for _get_obs() to return an extensible dict or utility class in the future, as there has been research that used image data along with RAM as additional compressed state. Open to suggestions.
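For illustration, a minimal sketch of that idea, assuming the sound_obs flag and getAudio() accessor proposed here land as described (the helper name and dict keys are hypothetical, not a final API):

```python
import numpy as np
from ale_py import ALEInterface

def get_multimodal_obs(ale: ALEInterface) -> dict:
    # Illustrative helper (hypothetical name): pair the image observation
    # with the per-frame audio samples when the proposed "sound_obs" flag
    # is enabled. getAudio() is the accessor proposed in #579.
    obs = {"image": ale.getScreenRGB()}
    if ale.getBool("sound_obs"):
        obs["audio"] = np.asarray(ale.getAudio())
    return obs
```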
Amazing, could you make a PR here? We can link/reference the older versions.
I'm happy with a minimal interface, as that means less work to support.
Are you currently working on a research paper with this? Is there anything that I should be aware of?
Ok, thank you. I'll get started on this.
No paper planned as of yet, just general research.
While putting together a PR of the original commit (#233) on top of latest, I reviewed what was done and decided to refactor/simplify the code. The original commit dual-purposed the SoundSDL + SoundExporter path to implement the feature. Conflating SoundSDL with additional support for a per-frame sound observation through SoundExporter added a level of complexity that would most likely make maintenance more difficult and error-prone, due to the multiple boolean switches needed to enforce mutually exclusive behavior in both the exporter and sound-playback paths, and it also pulled ALE common utility headers into an emucore base class. Additionally, generating a sound observation from the TIA sound registers does not require SDL support, so an artificial dependency was created. It looks like the main reason the feature was implemented through SoundExporter on SoundSDL was to:
The commit I'm proposing at #579 does the following:
For the API:
The other consideration is how many users would be interested in multi-modal tests in gym. We could forego modifying the gym API until there's demand; in the meantime, developers of multi-modal gym applications could call env.ale.setBool("sound_obs", True) and env.ale.getAudio() directly. Thoughts? Pull request: #579
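As a rough sketch of that interim usage, assuming PR #579 lands with the sound_obs flag and getAudio() accessor (the game choice and the timing of the setBool call are illustrative):

```python
import gym

# Illustrative sketch: query audio directly on the underlying ALE handle,
# with no change to the gym API. Assumes the "sound_obs" flag and
# getAudio() accessor proposed in #579.
env = gym.make("ALE/MsPacman-v5")

# Enable per-frame audio observations. ALE settings generally take effect
# when the ROM is (re)loaded, so this may need to happen before reset().
env.ale.setBool("sound_obs", True)

obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    audio = env.ale.getAudio()  # raw TIA audio samples for this frame
    if done:
        obs = env.reset()
env.close()
```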
Providing an interface to query the raw audio samples is a very useful feature for multi-modal research.
The topic has been discussed before, with feature support added here: #233
But the discussion from 2021 (#183) left off with the recommendation to use the fork until the feature could be refined.
The main issue with this approach is that there have been many useful changes in master since, and with those changes, enough divergence to require modifications to the fork. On our end, we have to maintain a custom build in order to support these audio use cases, which is not ideal as it is error-prone and costly from a time perspective.
Is there a plan to merge this feature to master?