This data paper accompanies the paper on deforestation and weather shocks by Vaglietti et al. (2022)¹. It presents the data used in the analysis and describes how they were processed to obtain the final results.
It also presents additional databases that were not used in the analysis but may help answer future questions on the same topics. We believe this additional material, although not applied in Vaglietti et al. (2022), may be useful to many scholars working on similar topics.
For each database, we describe its main characteristics in terms of data availability and spatial/temporal coverage, as well as the procedure followed to make the data comparable. The purpose here is mainly informative; to reproduce the processing, each database is also associated with a code file².
The results were then aggregated into a panel dataset used for the econometric analysis presented in the paper.
For any further questions: giulia.vaglietti@chaireeconomieduclimat.org
The data were retrieved from Funk et al. (2014)¹⁴.
Rainfall dataset information: although precipitation data are available at numerous time scales, from yearly to daily, the monthly scale was chosen. In particular, we chose the dataset covering Africa only, which, like the others, covers the period from 1981 to the present; the data were retrieved and processed up to April 2021. Download at: https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_monthly/tifs/
The data are georeferenced .tif files with a resolution of 0.05 degrees; this resolution is used as the reference to which all subsequent data are aggregated.
Processing:
monthly precipitation rasters, originally covering the whole of Africa, were first cropped to Congo's borders (shapefile retrieved from https://gadm.org/download_country_v3.html, level 0);
then masked by the same borders, setting values outside them to NA and keeping only the relevant data.
For precipitation, the processing was carried out in Python.
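The cropping and masking steps above can be illustrated with a minimal, dependency-free Python sketch. It assumes a north-up grid stored as a list of rows, a border given as a single polygon ring of (lon, lat) vertices, and a cell-centre rule for deciding which cells lie inside; the function names are hypothetical and this is not the authors' code. In practice, a library routine such as `rasterio.mask.mask(dataset, shapes, crop=True)` performs both steps at once.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def crop_and_mask(grid, x0, y0, res, polygon):
    """Crop a north-up grid to the polygon's bounding box, then set cells whose
    centre falls outside the polygon to NaN.

    grid: list of rows (row 0 = northernmost); (x0, y0): top-left corner; res: cell size.
    Returns (cropped_grid, new_x0, new_y0).
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Column and row ranges intersecting the bounding box (crop step)
    c_min = max(0, int(math.floor((min(xs) - x0) / res)))
    c_max = min(len(grid[0]), int(math.ceil((max(xs) - x0) / res)))
    r_min = max(0, int(math.floor((y0 - max(ys)) / res)))
    r_max = min(len(grid), int(math.ceil((y0 - min(ys)) / res)))
    out = []
    for r in range(r_min, r_max):
        row = []
        for c in range(c_min, c_max):
            cx = x0 + (c + 0.5) * res  # cell-centre x (longitude)
            cy = y0 - (r + 0.5) * res  # cell-centre y (latitude)
            # Mask step: keep the value only if the cell centre is inside the border
            row.append(grid[r][c] if point_in_polygon(cx, cy, polygon) else math.nan)
        out.append(row)
    return out, x0 + c_min * res, y0 - r_min * res
```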
Thanks to the availability of monthly data, two aggregation procedures were followed: by year and by agricultural season.
Yearly precipitation is the cumulative precipitation over an entire year (the sum of the monthly values from January to December) in a single cell. The value is expressed in mm.
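A minimal sketch of the yearly aggregation, assuming the twelve monthly grids are already cropped, masked and aligned (plain Python lists of rows, NaN outside the borders); the function name is hypothetical.

```python
import math

def yearly_total(monthly_grids):
    """Sum 12 monthly precipitation grids (mm) cell by cell.

    Cells that are NaN in any month (e.g. outside the country mask) stay NaN,
    because NaN propagates through addition.
    """
    if len(monthly_grids) != 12:
        raise ValueError("expected 12 monthly grids (January to December)")
    n_rows, n_cols = len(monthly_grids[0]), len(monthly_grids[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for grid in monthly_grids:
        for r in range(n_rows):
            for c in range(n_cols):
                total[r][c] += grid[r][c]
    return total
```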
Yearly precipitation mean and deviation from the mean for the years of interest (panels: droughts; floods).
Similarly to yearly precipitation, seasonal precipitation is the cumulative precipitation over one agricultural season in a single cell. The period over which it is accumulated depends on the agricultural season considered (planting, growing or harvesting) and on the agricultural area (north, centre, south and extreme south), defined on the basis of the crop calendars of Sacks, W.J., D. Deryng, J.A. Foley, and N. Ramankutty (2010). Precipitation is therefore summed only over the months of each agricultural season, which differ by area. Seasonal totals were calculated for maize (two agricultural cycles per year) and cassava (a single agricultural cycle per year). Unlike the definitions of Sacks et al. (2010), in our analysis the first and second cycles are distinguished only by which one occurs first in the calendar year, not by harvested area or other factors. For instance, for the first planting season of year 2000, precipitation is aggregated over:
MAIZE, PLANTING, FIRST SEASON:
north: month = [“02”, “03”]
centre: month = [“11”, “12”, “01”]
south: month = [“01”, “02”]
extreme south: month = [“11”, “12”]
MAIZE, PLANTING, SECOND SEASON:
north: month = [“06”, “07”]
centre: month = [“07”, “08”, “09”]
south: month = [“09”, “10”]
extreme south: no second season
CASSAVA, PLANTING, UNIQUE SEASON:
north: month = [“04”, “05”, “06”, “07”, “08”]
centre, south and extreme south: month = [“10”, “11”, “12”, “01”, “02”]
Again, values are expressed in mm. Some agricultural seasons span two calendar years: for example, the first planting season in the centre region starts in November (11) of the previous year and ends in January (01) of the year under analysis. In such cases, the months of the previous (or following) year are also included.
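The season definitions above can be encoded as a lookup table, with the cross-year rule applied automatically. This is a hedged sketch: the dictionary reproduces the planting-season months listed above, while the rule that months before a wrap-around (e.g. 11, 12, then 1) belong to the previous year is our generalisation of the centre-region example. The function names are hypothetical.

```python
import math

# Planting-season months per (crop, season, cycle) and area, as listed above
SEASONS = {
    ("maize", "planting", 1): {
        "north": [2, 3], "centre": [11, 12, 1], "south": [1, 2], "extreme south": [11, 12],
    },
    ("maize", "planting", 2): {
        "north": [6, 7], "centre": [7, 8, 9], "south": [9, 10], "extreme south": [],
    },
    ("cassava", "planting", 1): {
        "north": [4, 5, 6, 7, 8],
        "centre": [10, 11, 12, 1, 2],
        "south": [10, 11, 12, 1, 2],
        "extreme south": [10, 11, 12, 1, 2],
    },
}

def season_months(year, months):
    """Map a season's month list to (year, month) pairs.

    Assumption: if the list wraps around the new year (a month smaller than the
    previous one), the months before the wrap belong to the previous year.
    """
    wrap = next((i for i in range(1, len(months)) if months[i] < months[i - 1]), None)
    if wrap is None:
        return [(year, m) for m in months]
    return [(year - 1, m) for m in months[:wrap]] + [(year, m) for m in months[wrap:]]

def seasonal_total(monthly, year, months):
    """Cumulative precipitation (mm) for one cell over a season.

    monthly: dict keyed by (year, month). Returns NaN when the area has no such season.
    """
    if not months:
        return math.nan
    return sum(monthly[key] for key in season_months(year, months))
```

For example, `season_months(2000, [11, 12, 1])` returns `[(1999, 11), (1999, 12), (2000, 1)]`, matching the centre-region case described above.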
The definition of the areas, season and cycles is later discussed in the dedicated chapter [CROP CALENDAR & AGRICULTURAL AREAS].
Cumulative precipitation over the agricultural seasons of maize, first cycle, in the four regions (delimited by black borders):
The Standardized Precipitation Index (SPI), as described by McKee et al. (1993), is the index chosen to measure droughts. It was calculated directly from the processed precipitation data with the Python package climate_indices.
It was computed at a 1-month time scale to capture immediate impacts and evaluate adaptive responses, and to avoid attributing impacts to previous agricultural seasons or cycles. The fitted distribution is the gamma distribution.
As with precipitation, SPI observations were aggregated over seasons to extract each season's maximum, mean and minimum values. A count of droughts occurring during each season (episodes below the threshold, set to -1.5) was also computed.
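The seasonal summary of SPI values can be sketched as below. The SPI itself is assumed to be already computed (e.g. with climate_indices); this only covers the aggregation. Two choices are assumptions: each month with SPI strictly below -1.5 counts as one drought episode, and NaN months are ignored. The function name is hypothetical.

```python
import math

DROUGHT_THRESHOLD = -1.5  # threshold stated in the text

def summarise_spi(spi_values, threshold=DROUGHT_THRESHOLD):
    """Season-level summary of 1-month SPI values for one cell.

    Returns the max, mean and min SPI and the number of months below the threshold.
    """
    vals = [v for v in spi_values if not math.isnan(v)]
    if not vals:
        return {"max": math.nan, "mean": math.nan, "min": math.nan, "droughts": 0}
    return {
        "max": max(vals),
        "mean": sum(vals) / len(vals),
        "min": min(vals),
        "droughts": sum(1 for v in vals if v < threshold),
    }
```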
The crop calendar of Sacks, W.J., D. Deryng, J.A. Foley, and N. Ramankutty (2010) was used as an essential tool to aggregate precipitation over agricultural areas at grid-cell level: for each crop and each cycle, it provides the start, duration and end of the planting, growing and harvesting seasons at grid-cell level.
The crop calendars of Sacks et al. (2010) are available both by country macro-region (as can be visualised through this tool) and as georeferenced gridded data. The two differ in temporal and geographical scale, but both serve the purpose of our analysis. The first difference is geographical detail: the former divides each country into a few loosely defined regions with broadly uniform characteristics (in our case North, Centre, South and Extreme South), while the latter associates every geographical point with a calendar (on a georeferenced grid at 5-arc-minute resolution). The second difference is temporal detail: the visual calendar gives an idea of the monthly timing of the agricultural cycle, while the gridded calendar provides, for each cell, the day on which each agricultural season starts and ends and its duration in days.
Given these characteristics, both were used to structure our database: the monthly precipitation discussed above was summed for each season over the periods defined at the macro-region level, while the gridded data were used to assign each point of the database to a specific macro-region.
Given dietary traditions in the DRC, the staples under analysis are maize (double cropping) and cassava. They were chosen for their importance in people's diet, as they are the main ingredients of fufu, the most common dish in the area, a puree that serves as a base for meat and vegetables.
These calendars were then used to aggregate precipitation and drought indices over time.
Deforestation data were retrieved from the Global Forest Change 2000–2020 database (Hansen et al., 2013). The database covers the entire globe and offers an exceptional overview of forest cover and deforestation over the last twenty years at very high resolution (1 arc-second per pixel, approximately 30 metres at the equator). It consists of three main layers: tree cover (the forest cover share of each pixel in year 2000), loss year (the year in which a pixel was deforested) and a data mask distinguishing mapped from unmapped areas. A gain layer is also available; however, it only records pixels that gained forest up to 2012, thus excluding part of our study period, and does not specify the year in which the gain occurred.
Hansen data, 2020 edition (mask, lossyear, gain, treecover2000), processing:
the rasters were first cropped to Congo's borders (retrieved from https://gadm.org/download_country_v3.html, level 0);
merged to cover the whole area of Congo (7 raster tiles for each variable of interest: mask, lossyear, etc.);
masked by Congo's borders to set values outside them to NA.
The mask raster defines which areas were mapped (no data (0), mapped land surface (1) and permanent water bodies (2)). Only cells where mask == 1 are of interest: these were kept and all other values were set to NA.
The remaining rasters (_lossyear, gain, treecover2000_) were processed to keep data only where the *mask raster* equals 1. This ensures that we only process information in areas actually mapped by Hansen et al.
The masked lossyear raster contains, for each pixel, the year in which the corresponding area was deforested.
From this file, a separate raster was created for each year from 2001 to 2020, holding a dummy (0/1) for each pixel.
A dummy value of 1 indicates that the forest in that pixel was cleared in the year considered; 0 otherwise.
All processed files were reshaped to match our reference resolution (level of aggregation of the information), as well as the origin and CRS of the precipitation data (https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_monthly/tifs/):
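The masking and dummy-creation steps can be sketched as follows. The lossyear encoding (0 = no loss, k = loss in year 2000 + k) is that of the Hansen GFC product; the list-of-rows representation and the function names are assumptions made for illustration.

```python
import math

def apply_mask(raster, mask):
    """Keep raster values only where the Hansen data mask equals 1 (mapped land); NaN elsewhere."""
    return [[v if m == 1 else math.nan for v, m in zip(r_row, m_row)]
            for r_row, m_row in zip(raster, mask)]

def loss_dummies(lossyear, first=2001, last=2020):
    """Split a masked lossyear raster into one 0/1 layer per calendar year.

    A pixel is 1 in the layer of the year it was deforested and 0 in all others;
    NaN pixels (unmapped or outside the borders) stay NaN in every layer.
    """
    layers = {}
    for year in range(first, last + 1):
        code = year - 2000  # lossyear value corresponding to this calendar year
        layers[year] = [[math.nan if math.isnan(v) else (1 if v == code else 0) for v in row]
                        for row in lossyear]
    return layers
```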
deforestation res = 0.0002500000000000000052,-0.0002500000000000000052
precipitation res = 0.05000000074505805969,-0.05000000074505805969 (the reference file is the already processed cumulative precipitation for 2020)
Each yearly loss raster was aggregated with an aggregation factor of 200 (0.05 / 0.00025 = 200): aggregating the dummy pixels with the function *mean* returns a value between 0 and 1, which can be interpreted as the share of each new pixel's area that was deforested (share of the mapped area deforested).
The aggregated loss data were then resampled using the **precipitation Congo2020** raster as the template, with method = ‘ngb’ (nearest neighbour). This last step ensures that all data overlap exactly, which is essential for building the time series.
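The aggregation step can be sketched in plain Python. Two points are assumptions rather than documented choices: NaN pixels (unmapped or outside the borders) are excluded from the mean, consistent with reading the result as the share of the mapped area deforested, and the grid dimensions are taken to be exact multiples of the factor. The subsequent nearest-neighbour resampling onto the precipitation template is not shown, and the function names are hypothetical.

```python
import math

def aggregation_factor(fine_res, coarse_res):
    """Number of fine cells per coarse cell along each axis (e.g. 0.05 / 0.00025 = 200)."""
    return round(coarse_res / fine_res)

def block_mean(grid, factor):
    """Average the non-NaN values in each factor x factor block.

    For a 0/1 loss dummy this gives the share of the mapped area deforested in each
    coarse cell; blocks with no valid pixels become NaN.
    """
    n_rows, n_cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for big_r in range(n_rows):
        row = []
        for big_c in range(n_cols):
            vals = [grid[r][c]
                    for r in range(big_r * factor, (big_r + 1) * factor)
                    for c in range(big_c * factor, (big_c + 1) * factor)
                    if not math.isnan(grid[r][c])]
            row.append(sum(vals) / len(vals) if vals else math.nan)
        out.append(row)
    return out
```

Applied to the mask layer, the same function returns the mapped share only if unmapped pixels are coded 0 rather than NaN; otherwise every non-empty block would average to 1.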
Final loss by year:
Average yearly deforestation:
The same procedure was applied to the mask raster; in this case the cell value (again between 0 and 1) represents the share of each new (larger) pixel that has been mapped.
Final mapped area:
Lastly, the same procedure was applied to treecover2000: in this case cell values range from 0 to 100 and represent the percentage tree cover of each pixel.
Before reshaping, however, the tree cover was matched with the loss year to obtain, for each year, the forest cover share in the deforested areas only.
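Matching tree cover with the loss year can be sketched as below. The convention used here is a hypothetical choice, not necessarily the authors': pixels not deforested in the given year are set to 0 and NaN stays NaN, so that a subsequent block mean gives the average year-2000 tree cover lost per coarse cell. The function name is also hypothetical.

```python
import math

def treecover_lost_in_year(treecover2000, lossyear, year):
    """Year-2000 tree cover (0-100 %) kept only for pixels deforested in `year`.

    Hypothetical convention: pixels not lost in that year become 0; NaN stays NaN.
    """
    code = year - 2000  # Hansen lossyear value for this calendar year
    out = []
    for tc_row, ly_row in zip(treecover2000, lossyear):
        row = []
        for tc, ly in zip(tc_row, ly_row):
            if math.isnan(tc) or math.isnan(ly):
                row.append(math.nan)
            else:
                row.append(tc if ly == code else 0)
        out.append(row)
    return out
```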
Final file for Treecover in year 2000 (aggregation without considering the year):
Final file for Treecover Loss by year:
Land cover change data were retrieved from the Copernicus Climate Change Service, which classifies land cover from 1992 to the present and makes it possible to track its changes over time.
The global surface is classified into 37 classes at 300 m resolution; the files were therefore cropped and masked to the area of interest and then reshaped to match the 0.05-degree resolution of the present analysis.
In the following plotted examples, the reshaped land cover is represented by simplifying the 22 land cover classes into macro-groups.
Of the 37 land cover classes in the database, the main classes observed in the DRC were:
10 Agriculture - Cropland, rainfed;
11 Agriculture - Cropland, rainfed, herbaceous cover;
12 Agriculture - Cropland, rainfed, tree or shrub cover;
20 Agriculture - Cropland, irrigated or post-flooding;
30 Agriculture - Mosaic cropland (>50%) / natural vegetation (tree, shrub, herbaceous cover) (<50%);
40 Agriculture - Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%) / cropland (<50%);
50 Forest - Tree cover, broadleaved, evergreen, closed to open (>15%). Most represented category;
60 Forest - Tree cover, broadleaved, deciduous, closed to open (>15%). Third most represented category;
61 Forest - Tree cover, broadleaved, deciduous, closed (>40%);
62 Forest - Tree cover, broadleaved, deciduous, open (15-40%). Second most represented category;
100 Forest - Mosaic tree and shrub (>50%) / herbaceous cover (<50%);
110 Grassland - Mosaic herbaceous cover (>50%) / tree and shrub (<50%);
120 Shrubland;
130 Grassland;
160 Forest - Tree cover, flooded, fresh, or brackish water;
170 Forest - Tree cover, flooded, saline water;
180 Wetland - Shrub or herbaceous cover, flooded, fresh/saline/brackish water;
190 Settlement - Artificial surfaces and associated areas;
200 Bare area;
210 Water - Natural water bodies // Artificial water bodies;
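The grouping into macro-groups used for plotting can be expressed as a lookup from class code to the label that prefixes each entry in the list above (classes 120, 130 and 200 take their own names). This is an illustrative sketch; the fallback label "Other" for codes not listed is hypothetical.

```python
# Class code -> macro-group, taken from the labels in the list above
MACRO_GROUP = {
    10: "Agriculture", 11: "Agriculture", 12: "Agriculture", 20: "Agriculture",
    30: "Agriculture", 40: "Agriculture",
    50: "Forest", 60: "Forest", 61: "Forest", 62: "Forest", 100: "Forest",
    160: "Forest", 170: "Forest",
    110: "Grassland", 130: "Grassland",
    120: "Shrubland",
    180: "Wetland",
    190: "Settlement",
    200: "Bare area",
    210: "Water",
}

def to_macro_groups(class_grid, default="Other"):
    """Replace each land cover class code in a grid with its macro-group label."""
    return [[MACRO_GROUP.get(code, default) for code in row] for row in class_grid]
```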
Conflict data were retrieved from the Armed Conflict Location & Event Data Project (ACLED) database, which reports political violence and protest events at almost global scale, covering Africa from 1997 to the present. Events are classified by type, actors, actor interaction, time, location and fatalities; for the purposes of this paper, only location and fatalities were considered.
The data are provided as a .csv file and were converted into a shapefile to map the location of each event from 2000 to 2020 (latitude and longitude are provided as attributes in the original data). To assign a conflict density to each cell and each year, a conflict heat map was produced from the yearly conflict shapefiles using the spatial kernel density estimate tool available in the spatialEco R package. In addition, to account for fatalities, a second heat map was produced weighting each conflict event by its number of fatalities.
The parameter defining the area of influence, the distance bandwidth of the Gaussian kernel, was set to 169,504.94 m, the radius of a circle with the area of an average province (90,264 km²). The resulting rasters were given the same geometry as the precipitation rasters discussed above.
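The bandwidth follows from treating the average province as a circle, r = sqrt(A / π) with A = 90,264 km², which gives about 169,505 m. The sketch below computes it and evaluates a generic bivariate Gaussian kernel density, optionally weighted by fatalities, at a cell centre. It assumes projected coordinates in metres and is not the spatialEco implementation, whose normalisation and edge handling may differ; the function names are hypothetical.

```python
import math

def province_bandwidth(area_km2):
    """Radius (m) of a circle with the given area (km^2): r = sqrt(A / pi)."""
    return math.sqrt(area_km2 * 1e6 / math.pi)

def gaussian_kde(cx, cy, events, bandwidth):
    """Gaussian kernel density at (cx, cy) from weighted events [(x, y, weight), ...].

    With weight = 1 for every event this is the plain event density; with
    weight = fatalities it is the fatality-weighted version.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y, w in events:
        d2 = (cx - x) ** 2 + (cy - y) ** 2  # squared distance in metres
        total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total
```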
It may also be relevant to account for population, both in absolute (count) and relative terms (density). These data are available worldwide at 100 m or 1 km resolution and are provided by the WorldPop working group. Unlike the most commonly used georeferenced population data, which are provided every five years, these are available yearly thanks to a random forest machine learning model. The model uses a top-down approach to predict and distribute over the grid the data available at administrative level. It is fed with other georeferenced data such as settlement locations, settlement extents, land cover, roads, building maps, health facility locations, satellite night lights, vegetation, topography and refugee camps. The data should therefore be used with caution in econometric analyses, since the population estimates are themselves modelled from covariates (such as land cover and vegetation) that may be related to the outcomes under study.
Two datasets are available, produced with a top-down unconstrained model and with a constrained one, which respectively do not and do take into account the mapping of human settlements and buildings. We chose the former, which is also adjusted to UN estimates, because of the peculiarity of the rural areas under analysis and its availability over time.
Data processing:
The unconstrained 1 km population count and population density for the DRC from 2000 to 2020 were rescaled to the precipitation resolution. Population counts were summed over pixels (to account for all individuals within each new pixel), while densities were averaged (to capture population pressure on the land).
Two databases on ethnicity were taken into account:
Since the data come as a shapefile, with no specific resolution, they were rasterised following the characteristics of our reference raster ([Yearly cumulative precipitations]).
Where polygons overlap (i.e. two different ethnic groups occupy the same area), multiple raster layers were created.
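The two aggregation rules can be sketched with a single block function. The factor of 6 assumes the WorldPop 1 km grid is a 30-arc-second grid aligned with the 0.05-degree precipitation grid; that alignment, the NaN handling and the function name are assumptions for illustration.

```python
import math

def block_aggregate(grid, factor, how):
    """Aggregate factor x factor blocks with 'sum' (population counts) or 'mean' (densities).

    NaN cells are skipped; a block with no valid cells becomes NaN.
    """
    out = []
    for big_r in range(len(grid) // factor):
        row = []
        for big_c in range(len(grid[0]) // factor):
            vals = [grid[r][c]
                    for r in range(big_r * factor, (big_r + 1) * factor)
                    for c in range(big_c * factor, (big_c + 1) * factor)
                    if not math.isnan(grid[r][c])]
            if not vals:
                row.append(math.nan)
            elif how == "sum":
                row.append(sum(vals))
            elif how == "mean":
                row.append(sum(vals) / len(vals))
            else:
                raise ValueError("how must be 'sum' or 'mean'")
        out.append(row)
    return out

# 30 arc-seconds (~1 km) -> 0.05 degrees: 0.05 / (30 / 3600) = 6
FACTOR = round(0.05 / (30 / 3600))
```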
Each ID corresponds to the following ethnic groups:
ID | Ethnic group ID | Ethnic group name (main group) | Ethnic group name (secondary group) |
---|---|---|---|
0 | 110 | Bakongo | NA |
2 | 124 | Bambundu | NA |
3 | 122 | Balunda | NA |
4 | 1180 | Wachokwe | NA |
5 | 137 | Banyaruanda | NA |
6 | 146 | Barundi | NA |
7 | 983 | Sere-Mundu: Ndogo-Sere, Mundu-Ngbaka | NA |
8 | 815 | Ngiri (incl. Ngundi, Bamitaba) | NA |
9 | 189 | Bobangi and Bangala | NA |
11 | 157 | Bateke | NA |
13 | 129 | Banda | NA |
18 | 393 | Gbaya | NA |
19 | 814 | Ngbandi | NA |
20 | 816 | Ngombe | NA |
23 | 148 | Basakata | NA |
26 | 113 | Bakuba | NA |
27 | 143 | Barega | NA |
28 | 120 | Baluba | NA |
31 | 135 | Bantu-speaking Pygmy tribes: Babinga, Bakwe, Batwa, Bakola, etc. | NA |
32 | 109 | Bakomo (incl. Mabudu) | NA |
33 | 138 | Banyoro | NA |
34 | 772 | Moru-Mangbetu and Sere-Mundu-speaking Pygmy tribes: Efe, Basua, etc. | NA |
35 | 771 | Moru-Mangbetu | Moru-Mangbetu and Sere-Mundu-speaking Pygmy tribes: Efe, Basua, etc. |
37 | 734 | Mba (Dongo) | NA |
38 | 144 | Bari | NA |
40 | 91 | Baboa | NA |
41 | 771 | Moru-Mangbetu | NA |
42 | 91 | Baboa | Moru-Mangbetu and Sere-Mundu-speaking Pygmy tribes: Efe, Basua, etc. |
56 | 761 | Mongo | Bakomo (incl. Mabudu) |
67 | 1023 | Southern Lwo | NA |
68 | 111 | Bakonjo | NA |
75 | 761 | Mongo | NA |
78 | 87 | Azande | NA |
81 | 168 | Bemba | NA |
85 | 109 | Bakomo (incl. Mabudu) | Moru-Mangbetu and Sere-Mundu-speaking Pygmy tribes: Efe, Basua, etc. |
87 | 129 | Banda | Sere-Mundu: Ndogo-Sere, Mundu-Ngbaka |
91 | 105 | Bakare | NA |
108 | 6 | Acholi | NA |
111 | 168 | Bemba | Balunda |
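The rule described above for overlapping ethnic-group polygons (one extra raster layer per overlap) can be sketched as follows. A cell is assigned to a polygon when its centre falls inside it; this cell-centre rule, the list-of-rows layers and the function names are assumptions, not the authors' code.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def rasterise_layers(polygons, n_rows, n_cols, x0, y0, res):
    """Rasterise (group_id, polygon) pairs into as many layers as overlaps require.

    The first polygon covering a cell writes to layer 0, a second overlapping
    polygon to layer 1, and so on. Cells covered by no polygon are None.
    """
    layers = []
    for gid, poly in polygons:
        for r in range(n_rows):
            for c in range(n_cols):
                cx = x0 + (c + 0.5) * res  # cell-centre x
                cy = y0 - (r + 0.5) * res  # cell-centre y (row 0 = north)
                if not point_in_polygon(cx, cy, poly):
                    continue
                # Find the first layer in which this cell is still free
                k = 0
                while k < len(layers) and layers[k][r][c] is not None:
                    k += 1
                if k == len(layers):
                    layers.append([[None] * n_cols for _ in range(n_rows)])
                layers[k][r][c] = gid
    return layers
```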
For the DRC, the authors mapped 8 ethnic groups based on the national provinces, plus an additional mixed group (“others”) collecting the less represented identities. The data, available at 1 km resolution, were reshaped to match the reference characteristics required for our analysis.
Although this database represents the distribution of each ethnic group in more detail, it covers only a limited number of groups; it is therefore essential to consider both databases for robustness checks.
Unfortunately, neither database has a time dimension, so both should be used with caution: they do not account for population mobility and thus for possible changes in the prevalence of ethnic groups in a region over time. Particularly in the case of conflicts and environmental damage, population displacement may significantly alter the balance between ethnic groups.
Protected areas (PAs) worldwide are listed and classified in the World Database on Protected Areas (WDPA) by UNEP and IUCN.
The database reports the georeferenced polygons of each PA together with key attributes describing it, such as ID, protected area definition, name, category or type of protected area as legally/officially designated, resource access policy, status, year in which the status was established, governance, ownership and management.
The rasterisation process described above for the GREG database was also applied to the Congolese PAs; this time it was carried out with the function fasterize, which made it possible to create a RasterBrick (a stack of rasters) whose layers represent each year in which a PA was established, with each cell categorised by the type of PA covering it.
The PA design types were classified into the following 11 categories:
Classification | Design type (eng) |
---|---|
1 | Biosphere Reserve |
2 | Community Reserve |
3 | Hunting Area |
4 | National Park |
5 | Nature reserve |
6 | Primate Nature Reserve |
7 | Ramsar Site, Wetland of International Importance |
8 | Scientific Reserve |
9 | UNESCO-MAB Biosphere Reserve |
10 | Wildlife Reserve |
11 | World Heritage Site (natural or mixed) |
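The per-year brick built with fasterize can be mimicked in plain Python, using the design-type codes from the table above. This is a hypothetical sketch: each PA is assumed to be already rasterised to a list of (row, col) cells, 0 marks cells with no PA, and when two PAs established in the same year overlap, the one processed last overwrites the other. The dictionary keys and function name are illustrative.

```python
# Design type -> classification code, as in the table above
DESIGN_CODE = {
    "Biosphere Reserve": 1, "Community Reserve": 2, "Hunting Area": 3,
    "National Park": 4, "Nature reserve": 5, "Primate Nature Reserve": 6,
    "Ramsar Site, Wetland of International Importance": 7, "Scientific Reserve": 8,
    "UNESCO-MAB Biosphere Reserve": 9, "Wildlife Reserve": 10,
    "World Heritage Site (natural or mixed)": 11,
}

def pa_layers(protected_areas, n_rows, n_cols):
    """Build one layer per status year; each cell holds the design-type code of its PA.

    protected_areas: list of dicts with hypothetical keys 'year', 'design' and 'cells'.
    Returns a dict {year: grid}, ordered by year.
    """
    layers = {}
    for pa in protected_areas:
        layer = layers.setdefault(pa["year"], [[0] * n_cols for _ in range(n_rows)])
        for r, c in pa["cells"]:
            layer[r][c] = DESIGN_CODE[pa["design"]]
    return dict(sorted(layers.items()))
```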
Because some PAs had an unknown status or status year, the missing data first had to be completed through manual research.
Elevation data (filled Digital Elevation Models, DEM) are provided by HydroSHEDS. They are available at different spatial resolutions and for each continent. We chose the 30-arc-second DEM for Africa, which was then cropped and reshaped to match the reference (precipitation) rasters. Slope was then computed from the processed DEM with the R function terrain (from the raster package).
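For illustration, the slope at one cell can be computed from its 3x3 neighbourhood with Horn's (1981) method, which, to our understanding, is what raster's terrain uses with neighbors = 8. This is a sketch, not the package's code: it assumes cell sizes already converted to metres (for a 30-arc-second DEM, roughly 0.93 km north-south and less east-west away from the equator), and the function name is hypothetical.

```python
import math

def horn_slope(window, cell_x, cell_y):
    """Slope (degrees) at the centre of a 3x3 elevation window using Horn's (1981) method.

    window: [[a, b, c], [d, e, f], [g, h, i]] elevations in metres, row 0 = north.
    cell_x, cell_y: cell width and height in metres.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted finite differences: east minus west, south minus north
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_x)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_y)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

For example, a plane rising 1 m per metre eastward, `[[0, 1, 2], [0, 1, 2], [0, 1, 2]]` with 1 m cells, gives a slope of 45 degrees.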
Interactive map of the final DEM:
Nelson (2008) produced a raster map reporting the travel time (in minutes) from each raster cell to the nearest city of 50,000 or more people in year 2000. The product covers the entire world and is provided at 30-arc-second resolution by the Forest Resources and Carbon Emissions (IFORCE) unit.
Again, it was cropped and reshaped to fit the precipitation resolution.
A shapefile of the DRC's internal borders was retrieved from the World Resources Institute: it covers provinces, districts, territories and sectors.
For each variable and each year, a single .tif file was finally obtained; in R, these were collected into a raster stack to which a time (year) identifier was assigned. They were then exported as tibbles and combined into a panel database.
The database could potentially be matched with a shapefile mirroring the precipitation raster grid to enable spatial econometric analysis.
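The final reshaping into a panel can be sketched without R: every (variable, year) grid, already on the common reference grid, is flattened into one row per cell and year. The cell identifier `row * n_cols + col` is a hypothetical choice that could also serve as the key for matching a shapefile of the grid; the function name and dict layout are assumptions.

```python
def to_panel(layers, variables, years):
    """Flatten per-variable, per-year grids into long panel rows (one per cell and year).

    layers: dict keyed by (variable, year) -> grid (list of rows), all on the same grid.
    Returns a list of dicts with cell id, row, col, year and one column per variable.
    """
    first = layers[(variables[0], years[0])]
    n_rows, n_cols = len(first), len(first[0])
    panel = []
    for year in years:
        for r in range(n_rows):
            for c in range(n_cols):
                rec = {"cell": r * n_cols + c, "row": r, "col": c, "year": year}
                for var in variables:
                    rec[var] = layers[(var, year)][r][c]
                panel.append(rec)
    return panel
```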
Giulia Vaglietti, Philippe Delacote, and Antoine Leblois, Droughts and deforestation: does seasonality matter?, Working paper (2022).↩︎
To obtain the code files, please contact the authors.↩︎
Funk, C.C., Peterson, P.J., Landsfeld, M.F., Pedreros, D.H., Verdin, J.P., Rowland, J.D., Romero, B.E., Husak, G.J., Michaelsen, J.C., and Verdin, A.P., 2014, A quasi-global precipitation time series for drought monitoring: U.S. Geological Survey Data Series 832, 4 p. <http://pubs.usgs.gov/ds/832/>↩︎
Sacks, W.J., D. Deryng, J.A. Foley, and N. Ramankutty (2010). Crop planting dates: an analysis of global patterns. Global Ecology and Biogeography 19, 607-620. DOI: 10.1111/j.1466-8238.2010.00551.x.↩︎
Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., … & Townshend, J. (2013). High-resolution global maps of 21st-century forest cover change. Science, 342(6160), 850-853.↩︎
UNEP-WCMC - UN Environment Programme World Conservation Monitoring Centre (2019). User Manual for the World Database on Protected Areas and world database on other effective area-based conservation measures: 1.6. UNEP-WCMC: Cambridge, UK. Available at: http://wcmc.io/WDPA_Manual.↩︎
Nelson, A., 2008. Travel time to major cities: A global map of accessibility. Office for Official Publications of the European Communities, Luxembourg. URL: https://forobs.jrc.ec.europa.eu/products/gam/download.php, doi:10.2788/95835.↩︎
World Resources Institute (2009). Administrative Boundaries, Democratic Republic of Congo, 2009. Retrieved at https://earthworks.stanford.edu/catalog/tufts-drcadminbounds09 in 2021.↩︎
ESA (2017). Land Cover CCI Product User Guide Version 2. Technical Report, European Space Agency and Climate Change Initiative (available at: http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf).
Database: Land cover classification gridded maps from 1992 to present derived from satellite observations, accessed in 2021.↩︎
Raleigh, C., Linke, A., Hegre, H., & Karlsen, J. (2010). Introducing ACLED: an armed conflict location and event dataset: special data feature. Journal of Peace Research, 47(5), 651-660. Data downloaded in 2021 at https://acleddata.com/#/dashboard.↩︎
Lloyd, C. T., Chamberlain, H., Kerr, D., Yetman, G., Pistolesi, L., Stevens, F. R., … & Tatem, A. J. (2019). Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets. Big Earth Data, 3(2), 108-139.↩︎
Weidmann, N. B., Rød, J. K., & Cederman, L. E. (2010). Representing ethnic groups in space: A new dataset. Journal of Peace Research, 47(4), 491-499.↩︎
Lehner, B., Verdin, K., & Jarvis, A. (2006). HydroSHEDS - Technical Documentation Version 1.0. USGS Earth Resources Observation and Science: Sioux Falls, SD, USA.↩︎