-
Notifications
You must be signed in to change notification settings - Fork 763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Her 2031 - Improve login-form submission options #20
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
createRecorder(String,String) - new method for creating test Recorder and specifying charset createRecorder(String) - deprecate (use default charset as before)
use real use case actual content, actual urls from facebook for testing * ExtractorMultipleRegex.java some comments, little twiddles
use groovy templating facility
getEngine() - fix logic to respect isolateThreads setting
refactor for readability
more refactoring to avoid redundant operations
fix omission from last commit, uriRegex
working on making test work - first two scroll-down urls are extracted successfully, others fail
outLinks, outCandidates - use LinkedHashSet to ensure predictable order (any reason not to do that?)
remove the "fooIndex" thing from available bindings, since it's kinda hacky and turned out not to be needed for our use case
turns out that __adt parameter can be found near the json blob - most, but not all, expected links are now found
test passes now; extractor gets what it can, which is most of the scroll down urls
javadocs
keep cache of groovy Template objects, since they are expensive to create
remove temporary performance testing code
avoid "constant string too long" compile error
add "implements Closeable" since it already has the close() method
testLaxUrlEncoding() - Tests a URL not correctly url-encoded, but that heritrix lets pass through to mimic browser behavior.
include junit as a regular dependency not managed by eclipse, so source jar can be attached
…ng down the test http servers happen only once at start and finish respectively
runTest() - convenience logging of test failures
speculativeFixup() - improve detection of scheme-less intended-absolute-URIs
refactor so considerStrings() is not static, allowing it to be overridden in subclasses * ExtractorHTML.java, ExtractorJS.java add @Autowired parameter extractorJS used to process inline javascript, instead of call to static ExtractorJS.considerStrings()
makeExtractor() - call setExtractorJS(new ExtractorJS()) so testSpeculativeLinkExtraction() passes
…ses HER-1523) * UriUtils.java new method isVeryLikelyUri() with tighter heuristic than isLikelyUri() * ExtractorJS.java use UriUtils.isVeryLikelyUri(), and change order of operations to do fixup before call to isVeryLikelyUri(), since it doesn't expect strings with javascript escaping and stuff * StringExtractorTestBase.java handle test data with expected value null, meaning no outlinks expected * ExtractorHTMLTest.java avoid redundancy by using the extractor created in ContentExtractorTestBase.setUp() * ExtractorJSTest.java some new tests
avoid redundancy by using extractor built in setUp() makeExtractor() - call extractor.setExtractorJS(new ExtractorJS()) since we got rid of the static ExtractorJS stuff testConditionalComment1() - override to skip the test since it fails with JerichoExtractorHTML
…e-commons dependency
…ary.INSTANCE.link() with FilesystemLinkMaker.makeHardLink() in doRecover() - contributed by Andrés Aguilar
The new class BeanLookupBindings allows scripts to skip getBean calls. Lines like 'beanname = appCtx.getBean("beanname")' can be left out. Past scripts remain compatible. This change effects action directory scripts and rest console scripts.
bean lookup may be enabled by setting a variable named beanBindings to true in the Bindings or in the script.
BeanLookupBindings for simpler script access to beans
…thout compression
…sions to get rid of warnings
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Basic support for detecting and submitting to login forms, primarily via the ExtractorHTMLForms and FormLoginProcessor processors. Examples of configuration in the class Javadocs.