IC2S2 Cologne July 10 2017
Extended Abstract Submitted to INTERNATIONAL CONFERENCE ON COMPUTATIONAL SOCIAL SCIENCE, Cologne 2017
Quantifying Attention to Electoral Integrity of Foreign States with Text Analysis
keywords: sanctions, elections, web-scraping, text analysis, event collections
Sanctions in international relations are viewed as progressive tools for achieving political objectives. Among the most prominent goals of sanctions are promoting democratization, human rights and respect for political oppositions. Our research project uses web-scraping and text-analysis techniques to identify all instances in which the U.S. considered or imposed sanctions in pursuit of those objectives. We aim to improve the precision of existing sanctions data by generating a dataset that is complete and exhaustive, and to ensure the transparency of the data-generating process. Scholars have previously collected data to measure the occurrence and intensity of sanctions. We identify four data sources: (1) HSE dataset [Hufbauer et al., 1990]; (2) Hadewych [Hazelzet, 2001]; (3) WPD - World Peace for Democracy; (4) TIES dataset [Morgan et al., 2014]. We examine the subsets of these datasets that cover sanctions aimed at improving human rights, respect for the opposition and democratization. These datasets rely on bibliographic sources, Lexis-Nexis searches, searches of Keesing's World News Archive, searches of UN and EU files, and other news sources to identify if and when countries were targeted by sanctions. A notable feature of almost all existing efforts is that the precise data collection procedure is not detailed. This raises the prospect of low inter-dataset agreement: if the procedure relies on sources arbitrarily, on a case-by-case basis, we may well see little overlap between the different datasets.
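The concern about low inter-dataset agreement can be made concrete with a simple overlap measure. The following is a minimal sketch, not the procedure used in this project: it assumes each dataset can be reduced to a set of (target country, year) episodes and compares datasets pairwise with the Jaccard index. The function names, the episode representation and the toy entries are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of episodes present in both sets among episodes present in either."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty datasets are treated as being in full agreement.
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(datasets):
    """Map each unordered pair of dataset names to its Jaccard overlap."""
    return {
        (x, y): jaccard(datasets[x], datasets[y])
        for x, y in combinations(sorted(datasets), 2)
    }


# Hypothetical toy episodes (assumption): not taken from any real dataset.
toy = {
    "A": {("Country X", 1990), ("Country Y", 1992)},
    "B": {("Country X", 1990), ("Country Z", 1995)},
}
agreement = pairwise_agreement(toy)
```

A value near 1 would indicate that two datasets record largely the same episodes, while a value near 0 would indicate the low overlap anticipated above.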
We use web-scraping and text-analysis techniques to identify all election-related instances in which the U.S. considered or imposed sanctions in pursuit of human rights, democratization, the quality of elections or respect for the opposition. We comb through more than 2 million documents contained in the Congressional Record, the American Presidency Project, and the New York Times; using these different sources helps us differentiate between congressional and executive threats and impositions of sanctions. At the heart of our approach is the use of salient action-reaction language around specific events of interest, in particular elections.
Using Natural Language Processing (NLP) tools, we look for government responses (foreign or American), and the types of those responses, to policy issues that arise from progress toward or deviation from democracy in other countries. Given an election under study, we first define a window of three years before and after the event and apply a document-filtering approach to collect the subset of our collection made up of all documents published in this period that mention the name of the country at least once. Country mentions are detected and disambiguated using the entity linker TagMe [Ferragina and Scaiella, 2010]. This yields, on average, over a thousand documents per election. After creating this initial sub-collection, we follow different strategies depending on the type of document under study.
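The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: it assumes each document already carries the set of country entities produced by an entity linker (the linker itself is not called here), and the `Document` class, `shift_years` and `election_subcollection` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    doc_id: str
    published: date
    # Country entities linked in the text; assumed to be precomputed.
    countries: set = field(default_factory=set)


def shift_years(d, years):
    """Move a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def election_subcollection(docs, country, election_day, window_years=3):
    """Keep documents inside the window that mention the country at least once."""
    start = shift_years(election_day, -window_years)
    end = shift_years(election_day, window_years)
    return [
        d for d in docs
        if start <= d.published <= end and country in d.countries
    ]


# Hypothetical toy collection (assumption): not real documents.
docs = [
    Document("d1", date(1998, 1, 1), {"Country X"}),
    Document("d2", date(2004, 1, 1), {"Country X"}),  # outside the window
    Document("d3", date(2000, 1, 1), {"Country Y"}),  # different country
]
subset = election_subcollection(docs, "Country X", date(2000, 6, 15))
```

Treating the window as inclusive on both ends is a choice made for this sketch; the text does not specify how boundary dates are handled.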
For the Congressional Record in particular, we first create customised trigger lists, one containing trigger phrases about democratic violations and one containing trigger phrases about policy responses, and we use them to identify relevant documents.
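A minimal sketch of trigger-list matching is given below. The phrases are hypothetical placeholders, not the project's actual lists, and the rule that a document counts as relevant only when it contains at least one phrase from each list is an assumption made for illustration.

```python
import re

# Hypothetical trigger phrases (assumption): the real lists are not given here.
VIOLATION_TRIGGERS = ["electoral fraud", "vote rigging", "human rights abuses"]
RESPONSE_TRIGGERS = ["impose sanctions", "suspend aid", "arms embargo"]


def compile_triggers(phrases):
    """Build one case-insensitive pattern matching any phrase on word boundaries."""
    # Longer phrases go first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


VIOLATION_RE = compile_triggers(VIOLATION_TRIGGERS)
RESPONSE_RE = compile_triggers(RESPONSE_TRIGGERS)


def match_triggers(text):
    """Return the distinct lower-cased trigger phrases found, per list."""
    return {
        "violations": sorted({m.group(0).lower() for m in VIOLATION_RE.finditer(text)}),
        "responses": sorted({m.group(0).lower() for m in RESPONSE_RE.finditer(text)}),
    }


def is_relevant(text):
    """Assumed rule: a violation phrase and a response phrase must both occur."""
    hits = match_triggers(text)
    return bool(hits["violations"]) and bool(hits["responses"])


example = "Reports of vote rigging led senators to urge the President to impose sanctions."
example_hits = match_triggers(example)
```

Exact phrase matching of this kind misses inflected or paraphrased variants, so in practice the lists would likely need stemming or manual expansion.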