How social media predict the natural environment and how topological data analysis can model it


Chris Jones, School of Computer Science & Informatics, Cardiff University

Monday, April 15, 3:30pm – 4:30pm
Building 8, Level 8, Room 44

Biographical note

Chris Jones is Professor of Geographical Information Systems at Cardiff University. Previously he held academic positions at the University of South Wales and the University of Cambridge. He also worked at BP Exploration and the British Geological Survey. His research interests include aspects of geographical information retrieval (GIR); exploitation of social media to model the environment; interpretation and generation of geospatial natural language; application of topological data analysis to geospatial data; place name ontologies and gazetteers, especially vernacular place names; spatio-textual indexing methods; 3D city modelling; and automated cartographic design, particularly map generalisation and automated text placement. His work on cartographic text placement led to the development of the commercial cartographic program Maplex, which is now an ESRI product.

Talk summary

Social media such as photo-sharing websites frequently contain tags describing aspects of the environment, relating, for example, to climate, land cover, scenicness and wildlife distribution. This talk illustrates the use of a bag-of-words-based machine learning method that exploits these tags to predict such phenomena. When the tag data are combined with conventional scientific environmental datasets, the predictions are significantly better than those based solely on the conventional datasets. Used on their own, the tag data sometimes outperform the scientific datasets. Prediction quality can be improved further by a location-specific adaptation of word embeddings that reduces the dimensionality of the tag data.
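To make the idea of combining tag data with conventional datasets concrete, here is a minimal sketch, not the speaker's method. It assumes each location (e.g. a grid cell) has a list of photo tags and one or more numeric scientific features, turns the tags into a bag-of-words vector of relative frequencies, concatenates it with the scientific features, and fits a plain logistic regression by stochastic gradient descent. The classifier choice, the function names (`build_vocabulary`, `bag_of_words`, `train_logistic`) and all of the data are hypothetical stand-ins for illustration only.

```python
import math
from collections import Counter


def build_vocabulary(tag_lists):
    """Collect every distinct tag across all locations, in a fixed (sorted) order."""
    return sorted({tag for tags in tag_lists for tag in tags})


def bag_of_words(tags, vocab):
    """Relative tag frequencies for one location, aligned with `vocab`."""
    counts = Counter(tags)
    total = sum(counts.values()) or 1  # guard against locations with no tags
    return [counts[term] / total for term in vocab]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the log-loss w.r.t. the linear score
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Predicted probability that the phenomenon is present at a location."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training data: photo tags per grid cell, one conventional
# environmental feature per cell (e.g. a normalised vegetation index), and
# a binary label (1 = woodland present).
cell_tags = [
    ["forest", "oak", "trail"],
    ["beach", "sand", "sea"],
    ["woods", "forest", "moss"],
    ["harbour", "sea", "boat"],
]
env_features = [[0.8], [0.1], [0.7], [0.2]]
labels = [1, 0, 1, 0]

vocab = build_vocabulary(cell_tags)
# Combined representation: tag bag-of-words concatenated with the scientific feature(s).
X = [bag_of_words(t, vocab) + e for t, e in zip(cell_tags, env_features)]
w, b = train_logistic(X, labels)

new_cells = [(["forest", "moss"], [0.75]), (["sea", "sand"], [0.15])]
for tags, env in new_cells:
    print(tags, round(predict(w, b, bag_of_words(tags, vocab) + env), 3))
```

Dropping `env_features` from `X` gives a tags-only model, and dropping the bag-of-words part gives a scientific-data-only baseline, which is the kind of comparison the summary describes. The word-embedding refinement mentioned above is not shown here.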

A major challenge in modelling environmental data is handling the imprecision of point-based observations. This talk also shows how the relatively new field of topological data analysis uses methods from persistent homology that have the potential to help track and analyse the form of spatio-temporal patterns.
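As a small illustration of what persistent homology computes, the sketch below derives the 0-dimensional persistence barcode (connected components) of a static set of point observations under a Vietoris-Rips filtration, using union-find over edges sorted by length. It is only an assumed, simplified example: it ignores time, higher-dimensional features such as loops, and whatever specific pipeline the talk uses; dedicated libraries such as GUDHI or Ripser are the usual choice in practice. The function names and the point data are hypothetical. Long bars correspond to clusters that persist across many scales, while short bars are typically attributed to noise, and small perturbations of the points shift the bar endpoints only slightly, which is one reason persistence is attractive for imprecise observations.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born at scale 0. An edge joins two points once the scale
    reaches their distance; whenever an edge merges two separate components,
    one component dies at that scale (Kruskal's algorithm with union-find).
    The last surviving component never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    if points:
        bars.append((0.0, math.inf))
    return bars


def persistent_components(bars, min_lifetime):
    """Count components whose bar is longer than `min_lifetime`."""
    return sum(1 for birth, death in bars if death - birth > min_lifetime)


# Hypothetical point observations forming two well-separated clusters.
observations = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
bars = h0_persistence(observations)
print(bars)
print(persistent_components(bars, min_lifetime=2.0))
```

Here the two short bars record points merging within each cluster at distance 1, while the bar ending at 5 and the infinite bar indicate two clusters that survive over a wide range of scales.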