GIS Analyses


Data Preparation:

Climate data processing
Projecting raster data
Projecting vector data
Resampling
Creating a slope layer
Creating attribute tables

Data Analysis:

Generate a random sample
Retrieve X,Y coordinates
Extract data from random sample
Create the population density model
Run population model with future climate data

Data Preparation

Climate

Climate data processing

Accessing Data:

Data are available at the WorldClim website: http://www.worldclim.org/download (Hijmans et al. 2005). Here you will find data available for three categories: past, current, and future. We want to first explore the future scenarios available. If you click on "future" you will find that there are three models available: CCCMA, HADCM3, and CSIRO. For each model, two emission scenarios (a2a, b2a) and three years (2020, 2050, 2080) are available. The resolution varies from 30 arc-seconds (finest scale) to 10 arc-minutes (very coarse scale). The model structures of CCCMA, HADCM3 and CSIRO differ but are similar in their climate predictions. Researchers frequently use all three in their research, but we will be using the Hadley model (HADCM3).

Future Climate Data:*

*Data files for future scenarios are large and include coverage for the entire globe. Make sure adequate space exists to work with climate data.

1.) On the WorldClim website, navigate to the download page and click on the link for “Future conditions.”

2.) Download the zip file for the desired projected year, variable (tmin, tmax, precip), and emission scenario of interest (e.g. 2020, tmin, a2a).

3.) Extract zipped files to appropriate folder to work with later. Note: extracted files will have .bil extensions at this point.

4.) WorldClim data is in geographic (lat/long) coordinates, WGS 1984. You must define this projection for your newly downloaded layer.

a.) Open ArcToolbox

b.) Data Management Tools --> Projections and Transformations --> Define Projection. Navigate to the file of interest by clicking on the folder button to the right of the “Input Data Set or Feature Class” input.

c.) Select the data file of interest and click ‘Add.’

d.) Select the coordinate system. Choose “Select a predefined coordinate system.” Select Geographic Coordinate Systems --> World --> WGS1984.prj --> Apply --> OK

Unfortunately, the files with BIL extensions are not very compatible with ESRI products, so they must be converted to ASCII files before we can work with them directly. There are several ways to do this, the easiest being through DIVA-GIS, a freeware program available for download at http://www.diva-gis.org/. The following instructions demonstrate how to convert BIL files to ASCII using DIVA-GIS. It is important that the previous steps occur before conversion with DIVA-GIS as header files tend to get lost in the conversion, and as a result the latitude/longitude data will also be lost for all converted raster files.

5.) Open DIVA-GIS. Go to Data --> Import Gridfile --> Multiple files. Click on “BIL, BIP, BSQ” and Apply.

6.) Next, export the data. Go to Data --> Export Gridfile --> Multiple files. Click on “Arc ASCII” and Apply.
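For orientation, an Arc ASCII grid is a plain-text file: six header lines that georeference the grid, followed by rows of cell values with the top row first. The short Python sketch below only illustrates that layout; it is not part of the DIVA-GIS or ArcGIS workflow, and the function name, corner coordinates, and cell values are invented for the example.

```python
def to_arc_ascii(rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Format a 2-D list of cell values (top row first) as Arc ASCII grid text."""
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xllcorner}",      # x of the lower-left corner of the grid
        f"yllcorner {yllcorner}",      # y of the lower-left corner of the grid
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(value) for value in row) for row in rows]
    return "\n".join(header + body) + "\n"


# Hypothetical 2 x 3 grid (values invented for illustration only)
grid = [[121, 118, -9999],
        [125, 120, 117]]
text = to_arc_ascii(grid, xllcorner=33.0, yllcorner=3.0, cellsize=0.5)
print(text)
```

Because the position of every cell is derived from these header values, a converted file that is missing them can no longer be placed on the map.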

Now the data are ready to be projected to UTMs.

7.) Open ArcCatalog. Go to Data Management Tools --> Projections and Transformations --> Raster --> Project Raster --> Select a predefined coordinate system --> Projected Coordinate Systems --> UTM --> WGS 1984 --> WGS 1984 UTM Zone 37N.prj.

8.) Repeat step 7 for all converted raster files. (This process will also later be repeated for the population and elevation rasters.)

The next step involves clipping climate layers to the boundaries of Ethiopia. Political boundary layers are readily available on the web. Before clipping to a political boundary, however, the boundary layer's coordinate system must be defined and projected to match the climate layers above.

9.) Open ArcCatalog. Go to Data Management Tools --> Raster --> Clip. Navigate to the raster layer to be clipped under “Input Raster.” Navigate to the layer the raster file is being clipped to under “Output Extent.” Click on “Use Input Features for Clipping Geometry.” Click OK.

10.) Repeat step 9 for all data files converted to ASCII.

Depending on the questions you are asking in your research, you may decide that the BIOCLIM variables (19) are appropriate explanatory variables for your study. These variables can be derived directly from the minimum temp, maximum temp, and precipitation variables using an aml script that can be run in ArcInfo. The script and directions are available at http://www.worldclim.org/bioclim-aml.


Project raster files to WGS 1984 UTM Zone 37N

(Population and Elevation rasters)

1.) Open Arc Toolbox.

2.) Data Management Tools --> Projections and Transformations --> Raster --> Project Raster

3.) For "Input Raster," select the raster to be transformed. Name the output raster and navigate to where it will be saved. Open the "Output Coordinate System" menu.

4.) The "Spatial Reference Properties" menu will open. Click: Select --> Projected Coordinate Systems --> UTM --> WGS 1984 UTM Zone 37N. Repeat for each raster.
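As a quick check on the choice of zone, the UTM zone number follows directly from longitude, because each zone is 6 degrees wide and numbering starts at 180°W. The Python sketch below shows the arithmetic; the function names are our own, and the longitude used for Addis Ababa is an approximate value assumed for illustration.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) for a longitude in decimal degrees."""
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1


def utm_central_meridian(zone):
    """Return the central meridian (degrees east) of a UTM zone."""
    return zone * 6 - 183


# Approximate longitude of Addis Ababa (assumed value for illustration)
zone = utm_zone(38.74)
print(zone, utm_central_meridian(zone))
```

Zone 37 spans 36°E to 42°E, so points near the capital fall inside it.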


Project shape file to WGS 1984 UTM Zone 37N

(Woreda shape file)

1.) Projecting vector data follows essentially the same steps as projecting raster data. Open Arc toolbox.

2.) Data Management Tools --> Projections and Transformations --> Feature --> Project Feature

3.) For "Input Dataset or Feature Class," select the shape file to be transformed. Name the output file and navigate to where it will be saved. Open the "Output Coordinate System" menu.

4.) Select --> Projected Coordinate Systems --> UTM --> WGS 1984 UTM Zone 37N


Resampling

Resample to create identical cell sizes for all rasters

For later analyses, all rasters must have the same cell size. To achieve equal cell sizes, resample data from the layer with larger cells to match the cell size of the higher resolution layer. This creates smaller cells with redundant values, but it prevents data from being lost, as would be the case if smaller cells were averaged into larger cells. For these data, resample the population and climate rasters so that they have the same cell size as the elevation raster. The population and climate resolution will then be higher, but the accuracy of the data will remain the same.

1.) Identify the cell sizes for all rasters. For these data, the population and climate layers have cells that are 932.13 x 932.13 m, so each cell covers approximately 1 km2 (about 0.87 km2). The elevation layer has cells that are 93.21 x 93.21 m (about 0.0087 km2 each). Since the population densities are expressed in people per km2, the cell values remain densities after resampling and can be interpreted the same way at the finer cell size.

a.) In ArcMap, right click on each layer. Go to "Properties" --> "Source," and scroll to cell size.

2.) Arc Toolbox --> Data Management Tools --> Raster --> Raster Processing --> Resample

3.) Identify input raster (population). Name the output raster "pop_nearest" and navigate to where it will be saved.

4.) Set output cell size to be the same as the Elevation layer.

5.) Select "NEAREST" as the resampling technique.

a.) Nearest neighbor resampling is appropriate for both discrete and continuous values. For this situation, it is preferable to the other resampling options because the input values are maintained in the output. Because we are going from coarser to finer resolution, we aren't interested in interpolating between points; we want to preserve the values in the original data.

6.) Repeat these steps for all climate rasters.
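To see why nearest neighbor resampling to a finer grid keeps the original values, it helps to look at a toy case. The Python sketch below is only an illustration, not the ArcGIS Resample tool: it assumes the coarse cell size is an exact integer multiple of the fine cell size and that the grids are aligned, and the function name and values are invented.

```python
def resample_nearest(grid, factor):
    """Split every cell into factor x factor smaller cells that keep the original value."""
    fine = []
    for row in grid:
        # Repeat each value across the new columns...
        fine_row = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row for each new row.
        fine.extend([list(fine_row) for _ in range(factor)])
    return fine


# Hypothetical coarse raster (2 x 2) resampled to cells half the size
coarse = [[10.5, 20.0],
          [30.25, 40.0]]
fine = resample_nearest(coarse, 2)
for row in fine:
    print(row)
```

No new values appear in the output; each coarse value is simply repeated. For the layers described above, the ratio of cell sizes is close to 10, so each coarse cell would become roughly 10 x 10 finer cells.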


Slope

Create a slope layer from the Digital Elevation Model

(DEM raster)

Topographic slope may be an important variable contributing to distribution of human settlements in Ethiopia. In order to be able to add slope as a parameter in the population density model, a slope raster must first be derived from the digital elevation model.

1.) In ArcMap, enable Spatial Analyst

a.) Tools --> Extensions --> Enable Spatial Analyst

b.) View --> Toolbars --> Spatial Analyst

c.) From the Spatial Analyst dropdown menu, select "Options" and select your working directory

2.) Calculate slope: Spatial Analyst --> Surface Analysis --> Slope

a.) The input surface should be the projected DEM ("dem_utm37n"). Name the output raster "dem_slope" and navigate to where it will be saved.

b.) Accept defaults. It won't affect the analysis if slope is measured in degrees or percent.
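Degrees and percent describe the same slope on different scales: percent slope is 100 x rise/run, and slope in degrees is the arctangent of rise/run. The Python sketch below shows only this unit relationship. It does not reproduce the neighborhood calculation used by the Slope tool, and the function names and numbers are our own. Whichever unit is chosen, use the same one when fitting the model and when applying it later.

```python
import math


def slope_percent(rise, run):
    """Slope expressed as a percentage (100 * rise / run)."""
    return 100.0 * rise / run


def slope_degrees(rise, run):
    """Slope expressed as an angle in degrees."""
    return math.degrees(math.atan2(rise, run))


# Hypothetical elevation changes across one 93.21 m cell
for rise in (9.321, 93.21):
    print(rise, slope_percent(rise, 93.21), slope_degrees(rise, 93.21))
```

A rise equal to the run gives a 100 percent slope, which corresponds to 45 degrees.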

Percent Slope of Ethiopian Land Surface

Created by: Kelly Hopping and Greg Wann, December 5, 2009
Projection: WGS 1984 UTM Zone 37N
Data Source: NR 505, Warner College of Natural Resources, Colorado State University (Q:\nr505\nr505_08\EthiopiaData)


Create an Attribute Table

(Population raster)

The population raster currently has floating point data, but it must have integer data in order to be able to create an attribute table for it.

1.) As a preliminary step, the raster settings will need to be adjusted so that decimal places can be retained in long integers.

a.) In ArcMap --> Tools --> Options --> Raster --> Raster Attribute Table

b.) Enter 1,000,000,000 where it says "Do not build raster attribute tables when the number of unique values is greater than:"

If you are unable to modify your computer’s settings and the original population data is already at the maximum number of digits, go straight to the data analysis steps in the following section. You will then have to use the “pop_nearest” layer for sampling random points. This will not affect your results, but it will prevent you from being able to create an attribute table for the population data.

2.) In order to avoid losing data when converting to integer format, multiply population data by 10,000 to preserve all decimal places. (Data will be converted back to reasonable population density values at a later stage in the analysis.)

a.) On the Spatial Analyst toolbar in ArcMap, select "Raster Calculator" from the Spatial Analyst dropdown menu.

i.) Enter: [pop_10000] = [pop_nearest] * 10000

ii.) Evaluate

3.) To convert to integer data: Arc Toolbox --> Spatial Analyst Tools --> Math --> Int

a.) Select the new "pop_10000" raster as the input raster. Name the output raster "pop_integer" and navigate to where it will be saved.

4.) To create an attribute table: Arc Toolbox --> Data Management Tools --> Raster --> Raster Properties --> Build Raster Attribute Table

a.) Select "pop_integer" as the input raster. An attribute table may now be opened for this layer.

5.) At this point it is possible, but not necessary, to convert population data back to reasonable population density values in the attribute table.

a.) In ArcMap Table of Contents, right click on "pop_integer" and open attribute table.

i.) Options --> Add Field: "people_km2"

b.) Right click on new column heading --> Field Calculator

i.) Choose type = "double" (for long numbers); precision = "8" to preserve integer places; scale = "4" to preserve decimal places

ii.) * Enter: [Value] / 1000000

* Note: This number (1,000,000) comes from multiplying 10,000 by 100 because the original population data were 2 orders of magnitude higher than actual Ethiopian population densities, and then we multiplied them by 10,000 to preserve decimal places. Actual Ethiopian population density numbers were obtained from the Atlas of the Rural Economy, allowing us to deduce that the GIS data had already been multiplied by 100 in the original raster available to us (Atlas 2006). Aside from differences in decimal placement, we have no reason to believe that the population raster data is inaccurate.
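The scaling arithmetic in this section can be checked on a single cell value. The Python sketch below is an illustration only: the constant and function names are our own and the raw value is hypothetical. It assumes, as the Int tool does, that conversion to integer truncates the decimal part.

```python
SCALE_FOR_INT = 10_000   # multiplier applied before Int (step 2)
SOURCE_SCALE = 100       # factor already present in the source raster (see note above)


def to_integer_value(raw_value):
    """Multiply by 10,000 and truncate, as in steps 2 and 3."""
    return int(raw_value * SCALE_FOR_INT)


def to_people_per_km2(integer_value):
    """Divide by 10,000 x 100 = 1,000,000 to recover people per km2."""
    return integer_value / (SCALE_FOR_INT * SOURCE_SCALE)


raw = 4523.25                           # hypothetical cell value in the source raster
stored = to_integer_value(raw)          # value held in "pop_integer"
density = to_people_per_km2(stored)     # value for the "people_km2" field
print(stored, density)
```

Four decimal places of the source value survive the conversion to integer, which is the purpose of multiplying by 10,000 first.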


Data Analysis


Generate Random Sample Points

Random sample points will be generated using Hawth's Tools so that data can be obtained for a more manageable number of cells in each raster layer than if every cell were used. To ensure that both high and low population density areas are represented by the random sample, points are stratified by Woreda (Ethiopian political region). The population data seem to have been gathered by Woreda, so the population values do not vary much within each political region. However, climate and elevation will vary within Woredas, so we will select 10 random points per Woreda to capture climatic and topographic variation.

1.) Download and install Hawth's Tools, a free ArcGIS extension available online. (Hawth's Tools; http://www.spatialecology.com/htools/download.php)

2.) In ArcMap: Tools --> Extensions --> Enable Hawth's Tools

3.) View --> Toolbars --> Hawth's Tools

4.) On the Hawth's Tools toolbar, click: Sampling Tools --> Generate Random Points

a.) Select "Polygon Layers" and choose projected Woreda layer

b.) Stratify points by Woreda. Select "ID" for "Polygon unique ID field." Generate 10 points per polygon.

c.) Name the output shapefile of random points and navigate to where it will be saved.
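The idea behind stratified sampling by polygon can be sketched in a few lines of Python. This is not how Hawth's Tools is implemented (we do not know its internals); it is a minimal illustration that draws random points in each polygon's bounding box and keeps those that fall inside the polygon. The polygon coordinates, IDs, and function names are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, rng):
    """Draw points in the bounding box and keep those inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Two hypothetical "Woreda" polygons keyed by ID (coordinates in metres)
woredas = {
    1: [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    2: [(1000, 0), (3000, 0), (2000, 2000)],
}
rng = random.Random(42)  # fixed seed so the sample is repeatable
sample = {wid: random_points_in_polygon(poly, 10, rng) for wid, poly in woredas.items()}
print({wid: len(points) for wid, points in sample.items()})
```

Every polygon receives the same number of points regardless of its area, which is what guarantees that small Woredas are represented in the sample.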

Randomly sampled points, stratified by Woreda

Created by: Kelly Hopping and Greg Wann, December 3, 2009
Projection: WGS 1984 UTM Zone 37N
Data Source: NR 505, Warner College of Natural Resources, Colorado State University (Q:\nr505\nr505_08\EthiopiaData)


X,Y

Retrieve X,Y coordinates for random points

In order to locate where the randomly generated points fall on each raster layer, X,Y coordinates must be known for each point.

1.) In ArcMap, right click on the Random Point shape file and open its attribute table.

a.) Options --> Add field --> "Point_X"

b.) Options --> Add field --> "Point_Y"

2.) Arc Toolbox --> Data Management Tools --> Features --> Add XY Coordinates

a.) Coordinates are automatically added to new fields in attribute table

3.) In the attribute table, right click on "ID" column --> Field Calculator --> [FID] + 1

a.) This step formats the table for future data analysis in Excel


Extract

Extract data from rasters for all random points

In order to do statistics on the data from the random sample points, the data must be entered into a worksheet. Before the data can be opened in Excel, it must be extracted from the raster layers.

1.) Toolbox --> Spatial Analyst Tools --> Extraction --> Sample

a.) Add all Current Climate rasters (for variables 1-19), Elevation raster, Slope raster, and Population raster.

b.) For "input location raster or point features," input the random points shape file

c.) Select your working directory as the output location for the table of extracted data points.

d.) Leave all defaults (e.g., "nearest," etc.)

2.) In ArcCatalog, right click on the table that was created by step 1

a.) Export --> to dBase (single) --> select a folder as the output location

3.) Name the new database file in ArcCatalog, as it may have lost its file name when it was converted from a table to a .dbf file

4.) In Windows Explorer, open the new database file with Microsoft Excel

5.) Rename the column headings to follow the original raster layer titles, in exactly the order that the rasters were input into the sample extraction in Step 1

6.) If the population data have not been converted back to people/km2, divide the population numbers by 1,000,000 in Excel. (See explanation for this number under "Create an Attribute Table.") If you were unable to multiply the original population data by 10,000, only divide by 100 in this step.
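Conceptually, the extraction in step 1 looks up, for each point, the raster cell that contains the point's X,Y coordinates. The Python sketch below illustrates that lookup for a grid described by its lower-left corner and cell size. It is not the Sample tool itself, and the function name, coordinates, and elevation values are hypothetical.

```python
import math


def sample_raster(grid, xllcorner, yllcorner, cellsize, x, y):
    """Return the value of the cell containing (x, y), or None if the point is outside."""
    nrows, ncols = len(grid), len(grid[0])
    col = math.floor((x - xllcorner) / cellsize)
    row_from_bottom = math.floor((y - yllcorner) / cellsize)
    if not (0 <= col < ncols and 0 <= row_from_bottom < nrows):
        return None
    # Rows are stored top first, so flip the row index.
    return grid[nrows - 1 - row_from_bottom][col]


# Hypothetical 2 x 3 elevation grid with 93.21 m cells
elevation = [[2400, 2350, 2300],
             [2200, 2150, 2100]]
value = sample_raster(elevation, 470000.0, 990000.0, 93.21, 470100.0, 990050.0)
print(value)
```

Running the same lookup against every raster with the same points yields one row per point and one column per raster, which is the table that is exported to Excel.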


Model

Create the population density model

A multiple regression approach was chosen to build a population density model for Ethiopia, with a digital elevation layer and climate data (BIOCLIM) as the explanatory variables and population density as the response variable. We chose SAS 9.2 as the statistical package for our analysis, although free statistical software (for example, R) can readily implement the same methods. All steps outlined below assume the use of SAS.

1.) Using the MS Excel file created in the above steps (which includes all of the stratified random sample points), eliminate rows where all explanatory variables are zero (these are a result of errors in the BIOCLIM data). The ‘Sort’ option under ‘Data’ on the toolbar menu will make this step easy, but make sure the entire range of columns is highlighted before any sorting is done.

2.) Place the data into a template appropriate for the statistical software you have selected. For example, SAS and R both consist of editor windows where the code underlying the analysis is entered. Using a text editing program (such as MS Notepad) to write code makes this process easier. Examples of the steps that may be taken are outlined below (using SAS as the program example).

3.) The code from the text editor can be copied and pasted directly into the SAS editor, or an infile statement can be used directing SAS to read the data directly from the MS Excel file.

4a.) First run all explanatory variables in a single model (the ‘global model’ = 19 BIOCLIM variables + DEM + slope). Check for multicollinearity in the explanatory variables using collinearity diagnostics (variance inflation factor). This can be done in Proc Reg using the ‘/vif collinoint’ option at the end of the model statement.

4b.) Typically a variance inflation factor (VIF) value greater than 10 for a variable is of concern. Consider removing these variables from the model. Depending on the goals of the analysis, which correlated variables are removed will be a personal choice of the modeler.

4c.) Rerun the new global model (which excludes the variables removed in the previous step), this time using ‘/selection=adjrsq best=10 aic’ which provides the best 10 subsets of the global model based on the model selection criteria of AIC and adjusted R squared.

4d.) Consider running the model from step 4c again using a different selection method, such as backward selection (‘/selection = backward sls = 0.05’) or forward selection (‘/selection = forward sle=0.05’). Which model you settle on is a personal choice, but different model selection methods frequently pick the same model as best.

5.) Examine model output. Select the model with the lowest AIC value and run this model just as the global model was run in step 4a.

6.) Examine the output from step 5 above. The coefficients of this model (including slope) can now be applied to data layers in raster calculator (under Spatial Analyst).
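For readers who want to see what the variance inflation factor in steps 4a and 4b measures, the sketch below computes it in plain Python: each explanatory variable is regressed on the others, and VIF = 1 / (1 - R^2). This is an illustration of the definition, not SAS output, and the variable names and data values are invented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictor_rows, response):
    """R^2 of an ordinary least squares fit with an intercept."""
    rows = [[1.0] + list(p) for p in predictor_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance inflation factor for each explanatory variable (one list per variable)."""
    factors = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        factors.append(1.0 / (1.0 - r_squared(list(zip(*others)), target)))
    return factors


# Invented sample: two nearly collinear temperature variables and one precipitation variable
mean_temp = [18, 20, 22, 24, 26, 19, 21, 25]
max_temp = [26.1, 27.9, 30.2, 31.8, 34.1, 27.0, 29.1, 32.9]
precip = [900, 1200, 700, 1100, 800, 1300, 1000, 950]
vifs = vif([mean_temp, max_temp, precip])
print(vifs)
```

In this invented sample the two temperature variables carry almost the same information, so both receive very large VIF values; dropping one of them is the kind of decision described in step 4b.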


Map future population density in response to climate change

Use raster calculator to run the model with future climate data.

1.) In ArcMap, insert 2 new data frames: "2020" and "2050." Add the projected DEM, as well as the climate variables used in the final model (from 2020 and 2050, respectively) to each frame. Add the "current" population layer to the 2020 data frame as well.

2.) Activate the 2020 data frame, and select Raster Calculator from the Spatial Analyst Toolbar. Name the new layer "pop_2020" and enter your model equation. Add the current population as a term so that the model will grow off of the population density that already exists in each cell. This will allow the model to take account of where population density is already high for reasons other than climate, such as in the capital.

3.) Repeat step 2 for the 2050 climate variables, but add the new "pop_2020" term instead of current population so that the final population density will take account of the climate-driven population changes that occurred at the intermediate time step.

4.) To interpret how population changes in response to climate over time, use raster calculator to subtract current population from the future population layers. Each cell will then represent the amount by which population density increased or declined.

5.) Step 4 may be repeated for the climate variables to see how they change through time as well.
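The raster calculator steps above amount to cell-by-cell arithmetic. The Python sketch below illustrates that arithmetic on a tiny grid. The intercept, coefficients, variable names ("bio1", "dem"), and all cell values are hypothetical placeholders, not results from our model; substitute the coefficients from your own final model.

```python
import operator


def apply_model(intercept, coefficients, layers):
    """Evaluate intercept + sum(coefficient * layer) for every cell."""
    names = list(coefficients)
    first = layers[names[0]]
    return [
        [intercept + sum(coefficients[n] * layers[n][r][c] for n in names)
         for c in range(len(first[0]))]
        for r in range(len(first))
    ]


def combine(a, b, op):
    """Apply a binary operation to two grids cell by cell."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical model and 1 x 2 rasters (all numbers invented for illustration)
intercept = 2.0
coefficients = {"bio1": -0.5, "dem": 0.01}
pop_current = [[100.0, 50.0]]
layers_2020 = {"bio1": [[20.0, 22.0]], "dem": [[2000.0, 1500.0]]}
layers_2050 = {"bio1": [[22.0, 25.0]], "dem": [[2000.0, 1500.0]]}

# Steps 2 and 3: model output plus the population from the previous time step
pop_2020 = combine(apply_model(intercept, coefficients, layers_2020), pop_current, operator.add)
pop_2050 = combine(apply_model(intercept, coefficients, layers_2050), pop_2020, operator.add)

# Step 4: change relative to current population
change_2020 = combine(pop_2020, pop_current, operator.sub)
print(pop_2020, pop_2050, change_2020)
```

Positive cells in the change grid indicate an increase in population density relative to the current layer, and negative cells indicate a decline.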
