Geotagged tweets in English about earthquakes in Japan

The released dataset contains 39,772 geotagged tweets in English that relate to earthquakes in Japan and were collected with the Twitter Standard Streaming API. The tweets were posted between March 1, 2020 and February 28, 2021 (a one-year period) and contain the terms "earthquake" and "Japan" in their text.

The provided geoinformation is generated by an automatic geotagging methodology that transforms English tweets into georeferenced data by using their textual content to detect mentioned locations. After preprocessing, Named Entity Recognition (NER) is applied, in the form of a pre-trained Bidirectional Long Short-Term Memory (biLSTM)-based model, to retrieve location-type mentions in the tweet’s text. Terms that are recognized as places (single-word, e.g. “Tokyo”, or multi-word, e.g. “Sendai Airport”) are then associated with a geographical point (a pair of coordinates) through a query to the OpenStreetMap API.
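The last step of this pipeline, turning a recognized place name into a point, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the public Nominatim search endpoint of OpenStreetMap as the geocoder and assumes the result fields `display_name`, `lat`, and `lon` that Nominatim returns in its JSON format. The function names are hypothetical, and the sample result contains made-up illustrative values. No network request is made here.

```python
from urllib.parse import urlencode

# Assumption: the Nominatim search endpoint stands in for "the OpenStreetMap API".
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(place: str) -> str:
    """Build a search URL that asks for the single best match as JSON."""
    params = {"q": place, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def to_detected_location(location_in_text: str, hit: dict) -> dict:
    """Map one geocoder result to the record shape used in this dataset."""
    return {
        "location_in_text": location_in_text,
        "location_fullname": hit["display_name"],
        "geometry": {
            "type": "Point",
            # Nominatim returns coordinates as strings; the dataset stores
            # doubles in [latitude, longitude] order.
            "coordinates": [float(hit["lat"]), float(hit["lon"])],
        },
    }


# Illustrative (made-up) geocoder result for a place name found by NER.
sample_hit = {"display_name": "Tokyo, Japan", "lat": "35.68", "lon": "139.76"}
url = build_geocode_url("Sendai Airport")
location = to_detected_location("Tokyo", sample_hit)
```

Keeping the URL construction separate from the response mapping makes it possible to test the mapping without contacting the geocoding service.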

Data Organization

| Property Name | Property Type | Description |
| --- | --- | --- |
| _id | String | The unique identifier of a tweet, as provided by Twitter. |
| detected_locations | Array | An array of objects that contain information about the locations that have been extracted from a tweet’s text. |
| location_in_text | String | The word(s) in a tweet’s text that has/have been recognized as locations after analysis. |
| location_fullname | String | The full location name as retrieved by the OpenStreetMap API. |
| geometry | JSON Object | A JSON object that contains information about the coordinates of the location. |
| type | String (predefined value: "Point") | A field that defines the type of the coordinates. |
| coordinates | Array (format [latitude,longitude]) | An array of Double values that refer to the latitude and longitude coordinates of the location, as retrieved by the OpenStreetMap API. |
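A record with the properties described in this section could be read as in the sketch below. It assumes that each record is a JSON object in which location_in_text, location_fullname, and geometry sit inside each element of detected_locations, and type and coordinates sit inside geometry; the sample record, its placeholder identifier, and the helper name `iter_points` are hypothetical and are not taken from the dataset.

```python
import json

# Hypothetical record following the documented properties (values are made up).
SAMPLE_RECORD = """
{
  "_id": "0000000000000000000",
  "detected_locations": [
    {
      "location_in_text": "Tokyo",
      "location_fullname": "Tokyo, Japan",
      "geometry": {"type": "Point", "coordinates": [35.68, 139.76]}
    }
  ]
}
"""


def iter_points(record: dict):
    """Yield (text mention, full name, latitude, longitude) per detected location."""
    for loc in record.get("detected_locations", []):
        # The order documented here is [latitude, longitude].
        lat, lon = loc["geometry"]["coordinates"]
        yield loc["location_in_text"], loc["location_fullname"], lat, lon


record = json.loads(SAMPLE_RECORD)
points = list(iter_points(record))
```

Note that the [latitude, longitude] order documented here is the reverse of the [longitude, latitude] order that the GeoJSON specification uses for a Point, so the pair may need to be swapped before the geometry is passed to GeoJSON-based tools.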

Licensing

This dataset is licensed under the Creative Commons Attribution-NonCommercial International Public License (CC BY-NC). When downloading tweets by means of the distributed Tweet IDs, users must comply with Twitter’s Developer Agreement and Policy. By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript.

How to Cite

Andreadis, S., Gialampoukidis, I., Manconi, A., Cordeiro, D., Conde, V., Sagona, M., Brito, F., Pantelidis, N., Mavropoulos, T., Grosso, N. and Vrochidis, S., 2022. Earthquakes: From Twitter Detection to EO Data Processing. IEEE Geoscience and Remote Sensing Letters, 19, pp.1-5.

BibTeX:

@article{andreadis2022earthquakes,
    title={Earthquakes: From Twitter Detection to EO Data Processing},
    author={Andreadis, Stelios and Gialampoukidis, Ilias and Manconi, Andrea and Cordeiro, David and Conde, Vasco and Sagona, Manuela and Brito, Fabrice and Pantelidis, Nick and Mavropoulos, Thanassis and Grosso, Nuno and Vrochidis, Stefanos},
    journal={IEEE Geoscience and Remote Sensing Letters},
    volume={19},
    pages={1--5},
    year={2022},
    publisher={IEEE}
}

Contact

If you have any further questions about the dataset or if you are interested in running some additional analyses on the data, please contact Stelios Andreadis at andreadisst@iti.gr.

Team

Stelios Andreadis, Nick Pantelidis, Thanassis Mavropoulos, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris

Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH)