Releases: floschnell/properwatcher
Update dependencies, new modules & use await/async
This release should increase performance once more, because the whole core has been refactored to use Rust's async/await language features to drive IO tasks asynchronously.
Debug module
The debug observer module can be used to get a listing of all found properties rendered to stdout.
Removed Firebase
Firebase support has been removed due to its lack of async/await support. DynamoDb is recommended instead.
CSV filter module
The CSV filter module can be used to deduplicate entries against an existing CSV file. It is best used in combination with the CSV observer module. When both are active, properwatcher can be restarted without revisiting all properties again. It checks the CSV file it has written previously and only observes properties that are not yet in that file.
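The deduplication idea can be sketched as follows. This is a minimal illustration, not the module's actual code: the `Property` struct, the function names `known_ids` and `filter_new`, and the CSV layout (header line first, id in the first column, no quoted commas) are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified property record (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Property {
    pub id: String,
    pub title: String,
}

/// Collects the ids already present in a previously written CSV.
/// Assumes the first line is a header and the id is the first column.
/// Fields are split naively on commas, so quoted commas are not handled.
pub fn known_ids(csv: &str) -> HashSet<String> {
    csv.lines()
        .skip(1) // skip the header line
        .filter_map(|line| line.split(',').next())
        .map(|id| id.trim().to_string())
        .filter(|id| !id.is_empty())
        .collect()
}

/// Keeps only the properties whose id is not yet in the CSV.
pub fn filter_new(found: Vec<Property>, known: &HashSet<String>) -> Vec<Property> {
    found
        .into_iter()
        .filter(|p| !known.contains(&p.id))
        .collect()
}
```

Calling `known_ids` on the contents of the existing file and passing the result to `filter_new` leaves only the properties that still need to be observed, which is what makes a restart cheap.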
Updated project's dependencies
All dependencies have been updated to their latest versions to improve stability and performance.
Introducing filters, better configuration and a more robust pipeline
Filters
A new type of module has been added. Filters can remove found properties from the pipeline before they are processed further (enriched and observed). For now, there is only the dynamodb filter, which removes items that have already been written to the database.
Configuration changes
There have been breaking changes in the way modules are configured. Service access (for example, access to DynamoDb or Firebase) is now configured in a single place. This access configuration can then be used by different modules (filters, enrichers, observers). For instance, there are now a filter and an observer that both use the DynamoDb config object.
An example TOML configuration file can, as usual, be found in the repository. Pay special attention to the filters, enrichers and observers attributes. Each of them takes an array of strings that enables the corresponding modules.
More robust pipeline
Rather than running each pipeline stage to completion, one after another, each property now flows individually from filtering through enriching to the observers. This is not parallelized yet, but soon will be. For now, items are processed one after another. The big advantage is that if you run properwatcher in an environment that limits execution time, properties that have already been completed (and persisted) do not need to be processed again in the retry after a run has hit the execution time limit.
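The per-property flow can be sketched roughly as below. The trait names `Filter`, `Enricher` and `Observer`, their method signatures, the `Property` struct and the `run_pipeline` function are assumptions made for this illustration and are not taken from the properwatcher source.

```rust
/// Hypothetical minimal property record (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Property {
    pub id: String,
    pub price: Option<u32>,
}

/// Hypothetical module interfaces, one per pipeline step.
pub trait Filter {
    /// Returns false if the property should be dropped.
    fn keep(&self, property: &Property) -> bool;
}
pub trait Enricher {
    fn enrich(&self, property: Property) -> Property;
}
pub trait Observer {
    fn observe(&mut self, property: &Property);
}

/// Sends each property through filter -> enrich -> observe before the
/// next one is touched, so finished items are fully handled (for example
/// persisted by an observer) even if a later item is never reached.
pub fn run_pipeline(
    found: Vec<Property>,
    filters: &[Box<dyn Filter>],
    enrichers: &[Box<dyn Enricher>],
    observers: &mut [Box<dyn Observer>],
) -> Vec<Property> {
    let mut completed = Vec::new();
    for property in found {
        // Drop the property as soon as any filter rejects it.
        if !filters.iter().all(|f| f.keep(&property)) {
            continue;
        }
        // Apply all enrichers in order.
        let enriched = enrichers.iter().fold(property, |p, e| e.enrich(p));
        // Hand the finished property to every observer.
        for observer in observers.iter_mut() {
            observer.observe(&enriched);
        }
        completed.push(enriched);
    }
    completed
}
```

Because a persisting observer has already seen every completed property, a filter that checks the same store can skip those items on the retry.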
First release
Lambda support
You can now use the zip release to create a custom runtime lambda. Trigger it with a JSON representation of the properwatcher configuration file and receive a list of results in JSON format.
{
  "thread_count": 1,
  "watchers": [
    {
      "address": "https://www.immobilienscout24.de/Suche/de/bayern/muenchen-kreis/wohnung-mieten?enteredFrom=one_step_search",
      "city": "Munich",
      "crawler": "immoscout",
      "property_type": "Flat",
      "contract_type": "Rent"
    }
  ],
  "dynamodb": {
    "enabled": true,
    "table_name": "properties",
    "region": "eu-central-1"
  }
}
DynamoDb support
From either Lambda or Docker, publish newly found properties to a DynamoDb table. Pass credentials via environment variables or set the right permissions on the Lambda role.