PhD Dissertation Defense - Zifu Wang - Geography and Geoinformation Science

Mar 7, 2025, 1:00 - 3:00 PM

Exploratory Hall, Room 2312

PhD Candidate: Zifu Wang
PhD in Earth Systems and Geoinformation Sciences
Department of Geography and Geoinformation Science

Date: Friday, March 7, 2025

Time: 1:00 pm - 3:00 pm

Location: Exploratory 2312

Virtual Meeting link: email zwang31@gmu.edu for Zoom link

Dissertation Chair: Dr. Chaowei Yang (GMU, GGS)

Committee Members:

Dr. Naoru Koizumi (GMU, Schar School of Policy & Government)

Dr. Taylor M. Anderson (GMU, GGS)

Dr. Ruixin Yang (GMU, GGS)

Abstract:

The Internet, especially online news media, provides a rich source of spatiotemporal data, capturing descriptions of critical events, their locations, and their temporal evolution. Extracting insights from such data is essential for obtaining actionable information to address regional and global challenges. However, it is challenging to (1) extract context-aware geographic information, (2) classify articles under varying data constraints, and (3) automate spatiotemporal visualization of the extracted information. This dissertation addresses these challenges through two case studies: the Sudan conflict and the global illegal kidney trade. Actionable information extracted for these cases can support planning for humanitarian needs in global conflicts and help protect victims of the illegal organ trade. Large Language Models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformers (GPT), offer potential solutions to these challenges through their ability to capture complex linguistic patterns and contextual relationships. However, simple LLM prompts often fail to generate the structured outputs required for multi-entity extraction, struggle to achieve consistent classification performance, and cannot automate map creation with the quality and precision required for effective visualization. To address these limitations, this research employed advanced tuning strategies such as Retrieval-Augmented Generation (RAG) to optimize location extraction, improve classification accuracy, and streamline spatiotemporal mapping.

The first challenge lies in extracting detailed geographic information and associating it with temporal and thematic attributes. News articles often include multi-entity location descriptors and require structured outputs tailored to the context. For the Sudan conflict, structured outputs such as ‘neighborhood, state, country, and date’ are essential to associate incidents with precise spatiotemporal contexts. In the kidney trade project, structured outputs like ‘buyer: country name’ are needed to identify the roles that countries play in transnational trade. Traditional Named Entity Recognition (NER) tools often fail to extract such structured information accurately or to associate geographic entities with contextual elements like dates and roles, so they require extensive manual intervention. I employed RAG to enhance GPT’s capability to generate structured outputs for complex geographic descriptors. RAG-tuned models demonstrated significant improvements, achieving an F1 score exceeding 0.9 for multi-entity location and date extraction in the Sudan conflict case. In the kidney trade case, the F1 score for identifying roles such as buyers, sellers, brokers, and surgery host countries ranged from 0.73 to 0.86, depending on the specific role being extracted.
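As a rough illustration of how an F1 score can be computed over structured records like ‘neighborhood, state, country, and date’, the sketch below scores predicted records against a gold set. The abstract does not state how records were matched, so exact matching on all four fields is an assumption, and the `Incident` type, the `f1_exact_match` function, and the placeholder values are hypothetical.

```python
from typing import Iterable, NamedTuple


class Incident(NamedTuple):
    """One structured extraction record (field names are illustrative)."""
    neighborhood: str
    state: str
    country: str
    date: str


def f1_exact_match(gold: Iterable[Incident], predicted: Iterable[Incident]) -> float:
    """F1 score where a prediction counts only if all four fields match.

    Exact matching is an assumption made for this sketch; a real evaluation
    might score each field separately or allow partial matches.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Placeholder records, not data from the dissertation.
gold = [
    Incident("Neighborhood A", "State X", "Country Z", "2023-05-01"),
    Incident("Neighborhood B", "State Y", "Country Z", "2023-05-02"),
]
predicted = [
    Incident("Neighborhood A", "State X", "Country Z", "2023-05-01"),
    Incident("Neighborhood B", "State Y", "Country Z", "2023-05-03"),  # wrong date
]
print(f1_exact_match(gold, predicted))  # 0.5
```

Under this strict matching rule, a single wrong field (here the date) costs the whole record, which is one reason multi-entity extraction is harder to score well on than single-entity NER.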

The second challenge involves classification tasks with distinct constraints. The Sudan case required multi-label classification, because a single article could report multiple overlapping event types; limited training data and label ambiguity made the task more complex. In contrast, the kidney trade case used binary classification, separating related from unrelated articles along a well-defined, mutually exclusive distinction, which made for a simpler task. This difference in task complexity contributed to the variation in performance metrics: in the Sudan case, where multiple labels increased classification difficulty, GPT with RAG achieved an F1 score of 0.6879 and BERT achieved 0.6301. In the kidney trade case, where each article was assigned exactly one of two categories, BERT achieved 88.75% accuracy, benefiting from the clear label separation and a larger dataset.
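The two metrics in this paragraph measure different things, which a small sketch can make concrete. The code below computes a micro-averaged F1 for multi-label predictions and plain accuracy for binary predictions. The abstract does not say which F1 averaging was used, so micro-averaging is an assumption, and the function names and label strings are hypothetical.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    Each article has a set of labels; true positives, false positives, and
    false negatives are pooled across all articles before computing F1.
    """
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)
        fp += len(pred_labels - gold_labels)
        fn += len(gold_labels - pred_labels)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of articles whose single binary label is predicted correctly."""
    if not gold:
        raise ValueError("accuracy is undefined for an empty label list")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Multi-label case: an article may carry several event types (placeholder labels).
gold_events = [{"event_a", "event_b"}, {"event_c"}]
pred_events = [{"event_a"}, {"event_c", "event_b"}]
print(round(micro_f1(gold_events, pred_events), 4))  # 0.6667

# Binary case: 1 = related, 0 = unrelated.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

In the multi-label example, both articles are partly right, yet one missed label and one spurious label pull the score down to about 0.67. A binary task has no partial credit of this kind, so the two figures are not directly comparable.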

The third challenge is the lack of automated workflows for visualizing spatiotemporal patterns. Current approaches require manual effort to produce high-quality maps with accurate data labels, appropriate basemaps, and essential map elements such as legends, scale bars, and titles. The automated mapping approach took the extracted locations as points of interest, geocoded them to determine spatial coordinates, and derived appropriate map scales and resolutions from the coordinates and geographic scope. Classification outputs were used for data symbology, while LLM-based summarization provided concise and contextually relevant map titles, producing high-quality visualizations to support spatiotemporal analysis. For the Sudan conflict case, an automated mapping workflow was developed that integrated the extracted data with ArcPy to show patterns of incidents over time. In the kidney trade case, a comprehensive database was constructed to describe trade cases annually across countries, detailing their roles as buyers, sellers, brokers, or surgery hosts. Visualization tools were used to depict this database and provide insights into transnational kidney trade networks spanning two decades.
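One step in such a workflow, deriving a map extent from geocoded points, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the dissertation's ArcPy implementation: the `padded_extent` function, its `pad_fraction` and `min_span` parameters, and the sample coordinates are all hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees
Extent = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def padded_extent(points: Iterable[Point],
                  pad_fraction: float = 0.1,
                  min_span: float = 0.01) -> Extent:
    """Bounding box around geocoded points, widened by a margin on each side.

    min_span keeps the box from collapsing when all points coincide.
    The sketch ignores extents that cross the antimeridian.
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    width = max(max(lons) - min(lons), min_span)
    height = max(max(lats) - min(lats), min_span)
    center_lon = (max(lons) + min(lons)) / 2
    center_lat = (max(lats) + min(lats)) / 2
    half_w = width * (0.5 + pad_fraction)
    half_h = height * (0.5 + pad_fraction)
    return (center_lon - half_w, center_lat - half_h,
            center_lon + half_w, center_lat + half_h)


# Placeholder coordinates, not incidents from the dissertation.
extent = padded_extent([(30.0, 10.0), (34.0, 16.0)])
print(extent)
```

An extent computed this way could then be handed to whatever mapping library renders the layout; the margin keeps point symbols and labels away from the map frame.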

This dissertation demonstrates that advanced tuning methods, such as RAG, significantly enhance the accuracy and consistency of geographic location extraction, while BERT and GPT models adapt effectively to different classification tasks under varying data constraints. By developing automated mapping workflows, this research provides an approach to bridge the gap between unstructured textual data and actionable spatiotemporal insights. The methodologies proposed in this study, demonstrated through the Sudan conflict and kidney trade use cases, offer potential solutions for analyzing multi-incident events and visualizing transnational networks. These findings apply to humanitarian crises in global conflicts and to public health monitoring, and they could be extended to combating illicit activities and understanding global patterns of resource flows. Future research could focus on refining LLMs to handle more complex multi-entity relationships, improving runtime efficiency, and expanding the automated mapping framework.
