Hi @paul-gauthier, I'm trying to get this to work with the current versions and I faced the same issues described in #5. These are some updates to this repo, but some changes would need to happen on the Aider side.
This installation pins `swebench==1.1.5` because `report.py` requires `swebench.metrics.report.get_model_report`, which was removed from later versions (I'm not clear what they replaced it with).
There is an extra change I had to make to run SWE-Bench Lite that I haven't added here because there might already be a better way to do it. That change is to `report.py`: importing `LITE_DATASET_FNAME` and substituting it in most places that mention `FULL_DATASET_FNAME`. There may be existing functionality for this; I haven't read the code in detail yet.
On the aider side, what I've had to change was:
1- In `repomap.py`, in `get_scm_fname`: change `except KeyError` to `except (KeyError, TypeError)`.
2- Also in `repomap.py`: change
to
Otherwise, it may fail when facing file extensions not in tree-sitter.
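To illustrate change 1, here is a minimal sketch of why catching both exceptions helps. This is not aider's actual `get_scm_fname`; `KNOWN_QUERIES` and `get_query_fname` are hypothetical names, and I'm assuming the `TypeError` shows up when the language lookup yields `None` for an unrecognized extension.

```python
# Hypothetical stand-in for a language -> query-file lookup (not aider's code).
KNOWN_QUERIES = {"python": "tags.scm"}


def get_query_fname(lang):
    """Return a query file name for lang, or None if it is unsupported."""
    try:
        # str + None raises TypeError when lang is None (assumed failure mode).
        prefix = "tree-sitter-" + lang
        # A language with no registered query file raises KeyError.
        return prefix + "-" + KNOWN_QUERIES[lang]
    except (KeyError, TypeError):
        # Treat both cases as "no query file available" instead of crashing.
        return None


print(get_query_fname("python"))
print(get_query_fname("cobol"))
print(get_query_fname(None))
```

With only `except KeyError`, the last call would propagate the `TypeError` instead of returning `None`.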
I also need to point out that this now needs to be run on Python 3.11, because SWE-bench-docker's `run_evaluations.py` needs `asyncio.TaskGroup` (added in Python 3.11).
So for my setup, I did this before installing the requirements: