Geotagging model v1.0.0
We are excited to release our latest text-to-location geotagging model, based on the ByT5-encoder. This transformer-based model offers improved accuracy and validation metrics, enabling more precise predictions of where a text originates.
Features:
- ByT5-encoder based model for geotagging: We have introduced a transformer-based model specifically designed for geotagging tasks. Built on the ByT5-encoder, this model predicts the coordinates of a text's origin by classifying it into geographic clusters.
- Granularity: The model covers 3000 distinct clusters, allowing for fine-grained location predictions.
- Confidence inference: The model reports the confidence of each prediction as the highest probability it assigns to any location cluster.
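The confidence rule described above can be illustrated with a minimal sketch. It assumes the model emits one raw score (logit) per cluster and that each cluster is represented by a centroid coordinate; both assumptions, and the names `softmax`, `predict_location`, and `centroids`, are hypothetical and not taken from the released code.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw per-cluster scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_location(
    logits: Sequence[float],
    centroids: Sequence[Tuple[float, float]],
) -> Tuple[float, float, float]:
    """Return (lat, lon, confidence) for the most probable cluster.

    Assumption: each cluster is summarized by a (lat, lon) centroid.
    Confidence is the highest probability over all clusters.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    lat, lon = centroids[best]
    return lat, lon, probs[best]

# Toy example with 3 clusters instead of 3000 (illustrative values only).
centroids = [(48.85, 2.35), (40.71, -74.01), (35.68, 139.69)]
logits = [0.2, 2.5, -1.0]
lat, lon, confidence = predict_location(logits, centroids)
print(f"predicted: ({lat}, {lon}), confidence: {confidence:.3f}")
```

A confidence defined this way can be used to rank or filter predictions, for example by keeping only the most confident share of samples.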
Validation Metrics:
As part of this release, we have made significant improvements to the validation metrics for the ByT5 geotagging model.
- Median Error: We have achieved a median error of 19.22 kilometers.
- Mean Absolute Error (MAE): For the top 10% of samples ranked by confidence, the MAE is 434.16 kilometers. This metric is the average absolute distance between the predicted and true locations.
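For readers who want to compute comparable metrics on their own data, the following is a rough sketch of a median error and a confidence-filtered MAE. It assumes errors are great-circle (haversine) distances in kilometers; the release does not state which distance was used, and the function names here are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def median_error_km(errors: Sequence[float]) -> float:
    """Median of the per-sample distance errors."""
    return statistics.median(errors)

def mae_top_fraction(
    errors: Sequence[float],
    confidences: Sequence[float],
    fraction: float = 0.10,
) -> float:
    """Mean error over the `fraction` of samples with the highest confidence."""
    ranked = sorted(zip(confidences, errors), key=lambda pair: pair[0], reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    top_errors = [err for _, err in ranked[:k]]
    return sum(top_errors) / len(top_errors)

# Toy data (illustrative values only, not the reported validation set).
errors = [5.0, 100.0, 20.0, 50.0]
confidences = [0.9, 0.2, 0.8, 0.4]
print(median_error_km(errors))
print(mae_top_fraction(errors, confidences, fraction=0.5))
```

Because the mean is sensitive to a few very distant mispredictions while the median is not, the two numbers can differ widely on the same set of predictions.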
This update enhances the geotagging capabilities, enabling more accurate identification of the coordinates associated with texts. For more details on the implementation and usage, please refer to the updated documentation in the repository. We welcome any feedback or contributions from the community to further improve and refine the geotagging model.