Releases: strohne/Facepager
Version 4.5.3
Please be aware of changing access restrictions implemented by the platforms. If something does not work as expected, first read the reference of the specific API. You will find links to the references in the presets.
The latest Mac build is not yet ready; you'll find the previous version below.
Latest changes:
- Application rate limit handling in the Facebook module. Requests are throttled to one per minute as soon as 95% of the rate limit is reached and return to full speed once the request rate has calmed down.
- Signed installer for Mac. This simplifies the installation, though it is not yet perfect (notarization is still an issue).
- Add first, last, min, and max modifiers to pull out IDs from the fetched data. Usage for example in the pagination setup for Mastodon:
*.id|last
- Bug fix in the transfer nodes function. The function can be used in crawling scenarios to create new seed nodes from already fetched nested data.
- Bug fix in the re modifier. The re modifier can be used to extract data with regular expressions. For example, in the column setup you can use
snippet.description|re:#[^\W]+
to extract hashtags from the snippet.description field.
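To illustrate what the pattern in the re modifier example matches, here is a minimal Python sketch using the standard re module. It is not Facepager's internal implementation; the helper name extract_hashtags and the sample text are made up for illustration.

```python
import re

# Same pattern as in the column setup example: a "#" followed by one or
# more word characters ([^\W] is equivalent to \w).
HASHTAG_PATTERN = re.compile(r"#[^\W]+")


def extract_hashtags(text):
    """Return all hashtags found in text (hypothetical helper)."""
    return HASHTAG_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "New video about #python and #data_science! #2021"
    print(extract_hashtags(sample))
```

Note that word characters include digits and underscores, so purely numeric tags are matched as well.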
Sorry, the binaries are huge due to lots of functionality from PySide behind the scenes. Feel free to contact me if you have ideas for improving the packages.
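The rate limit handling listed among the changes of this version can be pictured roughly as follows. This is only a sketch under assumptions, not the code used in Facepager: it assumes the usage is read from an X-App-Usage response header containing a JSON object of percentages, and the function names and the zero delay at full speed are hypothetical.

```python
import json

THROTTLE_THRESHOLD = 95.0  # percent of the rate limit, as stated in the release note
THROTTLED_DELAY = 60.0     # one request per minute while throttled
FULL_SPEED_DELAY = 0.0     # assumption: no artificial delay at full speed


def usage_percent(headers):
    """Return the highest usage percentage reported in the headers.

    Assumption: the X-App-Usage header holds a JSON object of percentages,
    e.g. {"call_count": 96, "total_time": 20, "total_cputime": 15}.
    """
    raw = headers.get("X-App-Usage")
    if not raw:
        return 0.0
    return float(max(json.loads(raw).values(), default=0))


def delay_before_next_request(headers):
    """Return the number of seconds to wait before the next request."""
    if usage_percent(headers) >= THROTTLE_THRESHOLD:
        return THROTTLED_DELAY
    return FULL_SPEED_DELAY
```

Because the delay is recomputed from every response, the collection returns to full speed as soon as the reported usage drops below the threshold.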
Version 4.4.4
Note that the binary for Mac is delayed and will be updated later. If this release crashes on your computer, try an older version!
Please be aware of changing access restrictions implemented by the platforms. Preregistered access is volatile. Due to changes in the Facebook API you can't request metadata about pages with the preregistered access anymore. Fetching posts still works.
Latest changes:
- Load multiple API docs with different basepaths
- Refactored settings handling to support Crowdtangle and Twitter v2 (see the Presets and the API viewer)
- Transfer nodes feature for crawling
- Timestamp conversion modifier, see https://github.com/strohne/Facepager/wiki/Webscraping#supported-modifiers
- Revised token response workflow for OAuth (e.g. for using VKontakte)
- Signed paginated URLs in Amazon module
- Markdown formatting in the preset window
- Bug fixes
Sorry, the binaries are huge due to the awesome builtin web browser (Qt for Python). If you have ideas how to improve the installers, feel free to contact me.
Version 4.3.10
Please note: the release date shown on GitHub doesn't reflect the latest changes. You will find newer releases, published after November 2020, in the download section below.
Latest changes:
- Load multiple API docs with different basepaths
- Fix decoding issue
- Fix "content already consumed" error
- Option to keep complete data in the offcut node (useful for webscraping)
- Fixed Facebook login - the red success error message over which some have stumbled should be history :)
- Maximum size parameter to cap large downloads
- Adjusted the Generic module to support Twitter API v2 (for the academic research track)
- Google announced the YouTube login procedure will be changed in January 2021. Embedded browsers will not be supported anymore; instead, users must log in using the system browser. You'll find the option "External OAuth2" in the settings. If the login to YouTube fails, try external OAuth2.
- New login process. You will be presented with two options: use the preregistered Facepager app or your own app. When using the Facepager app, we need to maintain an anonymized user list due to the API providers' terms. You will find an explanation in the privacy policy of Facepager.
- Experimental screenshot and rendered HTML feature (see Preset)
- HEAD verb in Generic Module. Useful for resolving shortlinks or redirects. See the preset in the scraping category.
- Updated SSL certificate.
- Updated internal browser
- Under the hood refactorings.
Sorry, the binaries are huge. You always want the latest technology, don't you?
Note: Facepager is under perpetual reconstruction. Keep an eye on the status log. If you encounter any bugs or black cats, update to the latest version and report in the issues section.
Version 4.2.22
Note: Facepager is under perpetual reconstruction. Keep an eye on the status log. If you encounter any bugs or black cats, update to the latest version and report in the issues section.
Latest changes:
- Open browser with a URL generated from the query settings: hold the Control key and click Fetch data
- Select style in the settings (maybe try out Fusion style on Mac?)
- Move some settings into an extra window, these settings are saved when closing Facepager
- Update preset version number
- Fix timer function
- Minor improvements and bug fixes
Version 4.2.16
Note: Facepager is under perpetual reconstruction. Keep an eye on the status log. If you encounter any bugs or black cats, update to the latest version and report in the issues section.
Latest changes:
- Find nodes function
- Progress indicator when adding large amounts of nodes
- Fixed missing icons
- Open db files with command line (Windows: right click the file, open with Facepager)
- Faster delete (not that fast, though)
- Extract data for multiple nodes
- Parse Twitter dates using a new "shortdate"-modifier, e.g. in the column setup add created_at|shortdate. This converts dates such as "Tue Mar 29 08:11:25 +0000 2011" to ISO 8601 dates such as "2011-03-29T08:11:25+00:00"
- Parse JavaScript with the js-modifier. This is useful for extracting data from JavaScript that is embedded in HTML. First extract the script tags, then pipe the content into the js-modifier and select the property you want to extract. For example, in the Generic module use the Extract data-function with the following key:
text|xpath://script/text()|js:fancydata
- Fetch data for multiple nodes in Twitter Streaming module (by increasing the threads)
- Support multiple identical parameter names
- Categories in the preset window are now ordered alphabetically, filenames are changed when a preset is renamed
- Minor improvements
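The date conversion described for the shortdate modifier can be reproduced with Python's standard datetime module. This is a minimal sketch of the conversion itself, not necessarily how Facepager implements it; the function name shortdate_to_iso is hypothetical.

```python
from datetime import datetime

# Format of Twitter's created_at field, e.g. "Tue Mar 29 08:11:25 +0000 2011".
# Note: %a and %b depend on the locale; this assumes English day and month names.
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def shortdate_to_iso(value):
    """Convert a Twitter date string to an ISO 8601 string (hypothetical helper)."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT).isoformat()


if __name__ == "__main__":
    print(shortdate_to_iso("Tue Mar 29 08:11:25 +0000 2011"))
    # prints 2011-03-29T08:11:25+00:00
```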
Version 4.2.8
Note: At the moment, Facepager is under heavy reconstruction and a bunch of features are under development. Keep an eye on the status log. If you encounter any bugs or black cats, update to the latest version and report in the issues section. After updating to a new version, reinstalling default API definitions from GitHub may be necessary: start Facepager, wait for the message in the status log, restart Facepager.
Latest changes:
- Webscraping features in the Generic Module: Set the response format to "text" and you will find the HTML source code of downloaded pages in the text property. Then, you can use CSS selectors, XPath and regular expressions to extract data. See the wiki for a very brief explanation.
- Preview in the Extract data dialog. This will greatly help you with webscraping. Type keys such as text|xpath://a and you will directly see the HTML content of all a-elements. Clicking Apply creates new nodes. Or you can develop your keys here and enter them into the column setup or even in the placeholders for further fetching actions.
- Renaming of keys in the column setup or when extracting data. Prefix your key with newname=. For example, links=text|xpath://@href will save all links contained in the text property under the new key links.
- Resume canceled data collection, even with pagination. See the tooltip of the Resume collection checkbox.
- Option to stop pagination based on data from a request (e.g. stop if the value "hasnextpage" is empty)
- Login using cookies: Authorization=header; Name=Cookie. Click the settings button next to the login button, choose "Cookie", add the URL of the website, then click Login. After logging into the website, the cookies are transferred to the access token field. Close the login window.
- Detect rate limit in Generic module (status 429)
- Timestamp modifier: |timestamp in keys converts a timestamp to date & time.
- API key support in YouTube module
- Improved error logging (request errors won't stop the whole process, error nodes are created instead)
- Separate all connections. Each request gets its own session now.
- Support empty keys for extracting the object ID (e.g. to get IDs of Twitter followers or friends).
- Pro tip: You can build a pipeline by creating multiple presets in the same category and then apply the category.
- Bug fix: bring tool windows to front on OSX (preset window, api viewer, extract data dialog)
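The option to stop pagination based on data from a request can be thought of as a loop that checks a value in every response. The following Python sketch only illustrates the idea: the callback fetch_page, the response keys "items" and "cursor", and the page cap are assumptions made for the example; only the "hasnextpage" value comes from the note above.

```python
def collect_pages(fetch_page, stop_key="hasnextpage", max_pages=100):
    """Fetch pages until the value under stop_key is empty.

    fetch_page is a hypothetical callback that takes a cursor (None for
    the first page) and returns the parsed response as a dict.
    """
    results = []
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        results.extend(page.get("items", []))
        # Stop as soon as the response no longer signals a next page.
        if not page.get(stop_key):
            break
        cursor = page.get("cursor")
    return results


if __name__ == "__main__":
    # Fake responses standing in for an API; the second page ends the loop.
    pages = {
        None: {"items": [1, 2], "hasnextpage": True, "cursor": "a"},
        "a": {"items": [3], "hasnextpage": "", "cursor": "b"},
        "b": {"items": [4], "hasnextpage": True},
    }
    print(collect_pages(lambda c: pages[c]))
```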
Version 4.1.7
Latest changes:
- Experimental webscraping features: In the Generic Module, if you set the response format to "text", you will find the HTML source code of downloaded pages in the text property. Then, you can use CSS selectors and XPath to extract data. See the wiki for a very brief explanation.
- Improved handling of Facebook app rate limit
- Support paging in URL path: set paging param to placeholder <page> and use the placeholder in the base path or resource field. Useful for AJAX based paging of websites (e.g. 'load more' buttons)
- Add tooltips in nodes view
- Bug fix in Twitter and YouTube paging mechanism
- Bug fix in export mechanism
- Bug fix in Twitter app-only login (necessary for Premium API)
- Improved speed of the file handler (file://), broken releases fixed at v4.1.7
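Paging in the URL path boils down to substituting the <page> placeholder with consecutive page numbers. A minimal Python sketch of that substitution, assuming a hypothetical helper build_page_urls and an example.com URL that is not taken from Facepager:

```python
def build_page_urls(base_path, resource, first_page, last_page, placeholder="<page>"):
    """Return one URL per page number by filling in the placeholder."""
    template = base_path.rstrip("/") + "/" + resource.lstrip("/")
    return [
        template.replace(placeholder, str(number))
        for number in range(first_page, last_page + 1)
    ]


if __name__ == "__main__":
    for url in build_page_urls("https://example.com", "/articles/page/<page>", 1, 3):
        print(url)
```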
Version 4.0.10
Latest changes:
- Facepager module supports login for own Facebook pages: 1. click settings, 2. add page ID (last part of the URL), 3. login.
- API definitions and presets are both downloaded from GitHub and may be updated with a reload button
- Scrape links out of webpages and crawl the web (response format option 'link')
- Save response data to files. All functions of the former Files module are incorporated into the Generic module. Just specify download folder and file name, set response format to file. All responses including json are downloaded to the folder.
- Removed wide format export option. If you need this, see https://github.com/strohne/Facepager/wiki/Data-Analysis
- Load CSV files with additional data as seed nodes
- Minor bug fixes
Version 4.0.4
Updated on 2019-08-29.
Changes on the surface:
- The Generic module can process XML, e.g. for downloading RSS-feeds. See the arxiv.org preset for an example.
- The Generic module can convert HTML to JSON. Thus, Facepager can be used for very simple webscraping tasks.
- The Generic module can save arbitrary text data. Why? I needed to download millions of small files, they messed up my file system. Storing them in a database is much more convenient.
- Introduced an API viewer, based on OpenAPI. You can plug in your own OpenAPI files.
- Faster selection of nodes, fixed hanging interface for large amounts of nodes.
Recent changes under the hood:
- Updated from Python2 to Python3
- Updated from PySide1 to PySide2
- Updated documentation to use OpenAPI format
Caveats:
- Facebook announced that it will close access to the public pages API. The Facepager module will probably stop working within a few days.
- Google introduced restricted and sensitive scopes; restrictions are announced for October.
- Packaging the Mac version cost a great deal of nerves. Why is it worth the hassle? One of the primary goals of Facepager is to help people learn about automated methods. The documentation lags a bit behind, but we are working on it. Feel free to provide presets or to contribute to the wiki. Find out what works and see the limitations. Trial and error.
- The resulting OSX file is huge, sorry for that. The internal webbrowser (QtWebView) blows up the file size.
Version 3.10.2
See installation hints on https://github.com/strohne/Facepager#installer
New features in v3.10:
- Generic module and Files module come with OAuth2 now.
- Post and put requests in Generic module and Files module
- Upload files with placeholders <filename|file> or <filename|file|base64> (replace "filename" with the filename and select the folder in the settings)
- Upload multipart/form-data. How to format the data will soon be documented in the Wiki (JSON with name-value pairs).
- Convert XML or HTML responses to JSON. This way Facepager goes beyond JSON APIs, e.g. for using Amazon.
- Amazon module (experimental)
- Custom categories for presets to improve the ordering.
- Improved handling of rate limits (automatic retry)
- Some convenience improvements
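The file upload placeholders listed above can be read as "replace the placeholder with the file content, optionally Base64-encoded". The sketch below shows that reading in Python. It is an assumption about the semantics, not Facepager's code; the helper name resolve_file_placeholders and the UTF-8 decoding of plain files are hypothetical.

```python
import base64
import re
from pathlib import Path

# Matches <name|file> and <name|file|base64>.
FILE_PLACEHOLDER = re.compile(r"<([^<>|]+)\|file(\|base64)?>")


def resolve_file_placeholders(template, folder):
    """Replace file placeholders in template with content from folder."""

    def replace(match):
        data = (Path(folder) / match.group(1)).read_bytes()
        if match.group(2):
            # Base64 keeps binary content safe inside text payloads.
            return base64.b64encode(data).decode("ascii")
        # Assumption: plain file content is UTF-8 text.
        return data.decode("utf-8")

    return FILE_PLACEHOLDER.sub(replace, template)
```

For a file hello.txt containing the text hello, the payload {"content": "<hello.txt|file|base64>"} would become {"content": "aGVsbG8="}.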
It took quite a long time to refactor the code for these features. Why is it worth the effort? Facepager is on its way to becoming a versatile cloud computing tool. You can now connect to Google Cloud Console. Try out the Getting Started for speech recognition: https://github.com/strohne/Facepager/wiki/Getting-Started-with-Google-Cloud-Platform
The macOS version is tested with High Sierra and probably will not work with older versions. Sorry for that.