Monday, January 12, 2015

MAP DIGITIZATION PROCESS IN GIS



MAP DIGITIZATION PROCESS
Digitizing is the process of converting features on a paper map into digital format. To digitize a map, you use a digitizing tablet (also known as a digitizer) connected to your computer to trace over the features that interest you. The x,y coordinates of these features are automatically recorded and stored as spatial data. (Davis, 2001)

Digitizing with a digitizing tablet offers another way, besides screen digitizing freehand, to create and edit spatial data. You can convert features from almost any paper map into digital features. You can use a digitizer in conjunction with the editing tools in ArcMap to create new features or edit existing features on a digital map (Murayama, 2012).

You might want to digitize features into a new layer and add the layer to an existing map document or create a completely new set of layers for an area for which no digital data is available. You can also use a digitizer to update an existing layer on your digital map.
 Sample table digitizer


THE FOLLOWING ARE THE PROCESSES/STEPS FOR MAP DIGITIZATION
Step 1: Set up your digitizing tablet and install the driver software
To use a digitizing tablet with ArcMap, it must have WinTab-compliant digitizer driver software. To find out if a WinTab-compliant driver is available for your digitizer, see the documentation that came with the tablet or contact the manufacturer.
If you installed ArcGIS before installing your digitizer, the Digitizer tab may not appear in the Editing Options dialog box. To add the tab, you must register the digitizer.dll file using the ArcGIS ESRIRegAsm.exe utility. You need to have administrator privileges to perform these steps.
Tip:
If you have installed the ArcGIS ArcObjects Software Development Kit, you can simply browse to the directory containing the digitizer.dll, right-click it, then register it from the shortcut menu.
1.     Close any open ArcGIS applications.
2.     Start the DOS Command Prompt, which is usually accessed by clicking Start, pointing to Programs (or All Programs), then clicking Accessories.
3.     In the Command Prompt window, type cd and a space, followed by the path to the directory containing the ESRIRegAsm.exe utility: C:\Program Files\Common Files\ArcGIS\bin. This changes the Command Prompt's active directory to the folder where the ESRIRegAsm.exe utility is installed.
4.     Press the ENTER key.
5.     Type ESRIRegAsm.exe, a space, a quotation mark, the full path to your ArcGIS installation location with the name of the DLL, and a closing quotation mark. The default path is "C:\Program Files\ArcGIS\Desktop10.1\bin\digitizer.dll". If you installed ArcGIS in another directory, substitute that path.
6.     Press the ENTER key.
7.     If the registration was successful, close the Command Prompt window. The Editing Options dialog box will have the Digitizer tab when you restart ArcMap.


 A specialist processing map data for digitizing


Step 2: Configure the digitizer puck buttons
After installing the driver software, use the WinTab Manager setup program to configure the buttons on your digitizer puck (you might have to turn on your digitizer and reboot your machine before you can use the setup program). One puck button should be configured to perform a single click to digitize point features and vertices; another button should be configured to perform a double-click to finish digitizing line or polygon features. You might also want to configure a button to perform a right-click so you can access shortcut menus.
With any development programming language, you can configure additional buttons to run specific ArcMap commands.
Step 3: Ensure the quality of your paper map
Your map should ideally be reliable, up-to-date, flat, and not torn or folded. Paper expands or shrinks according to the weather. To minimize distortion in digitizing, experienced digitizers often copy paper maps to a more stable material, such as Mylar.
Step 4: Establish control points on the paper map
Before you can begin digitizing from your paper map, you must first establish control points that you will later use to register the map to the geographic space in ArcMap. If your map has a grid or a set of known ground points, you can use these as your control points. If not, you should choose between four and 10 distinctive locations, such as road intersections, and mark them on your map with a pencil. Give each location a unique number, and write down its actual ground coordinates.
Once you have identified at least four well-placed control points, place your map on the tablet and attach it with special residue-free putty; masking tape; or drafting tape, which looks like masking tape but leaves less residue when it's removed. You do not have to align the map precisely on your tablet; ArcMap corrects any alignment problems when you register the map and displays such adjustments in the error report.
Step 5: Register the paper map
After identifying your control points, you must register your paper map in real-world coordinates. This allows you to digitize features directly in geographic space.
Registering your map involves recording the ground coordinates for the control points you identified. These are recorded using the Digitizer tab of the Editing Options dialog box.
After you have entered the ground coordinates, ArcMap displays an error report. The error report includes two error calculations: a point-by-point error and a root mean square (RMS) error. The point-by-point error represents the distance deviation between the transformation of each input control point and the corresponding point in map coordinates. The RMS error summarizes those deviations in a single value: the square root of the mean of the squared point errors.
ArcMap reports the point-by-point error in current map units. The RMS error is reported in both current map units and digitizer inches. If the RMS error is too high, you can reregister the appropriate control points. To maintain highly accurate data, your RMS error should be less than 0.004 digitizer units (often inches) or the equivalent scaled distance in map units—the ground units in which the coordinates are stored. For less accurate data, the value can be as high as 0.008 digitizer units.
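The two error measures can be illustrated with a minimal sketch. It assumes the transformation has already been fitted and uses made-up control point coordinates; the function name and values are hypothetical, and this is the usual RMS definition rather than ArcMap's internal code.

```python
import math

def registration_errors(transformed, ground):
    """Return (point-by-point errors, RMS error) for paired control points."""
    errors = [math.hypot(tx - gx, ty - gy)
              for (tx, ty), (gx, gy) in zip(transformed, ground)]
    # RMS: square root of the mean of the squared point errors
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return errors, rms

# Hypothetical control points in map units: where the fitted transformation
# places each digitized point, and the ground coordinates that were entered.
transformed = [(1000.3, 2000.0), (1500.0, 2000.4), (1500.0, 2500.0), (1000.0, 2499.5)]
ground = [(1000.0, 2000.0), (1500.0, 2000.0), (1500.0, 2500.0), (1000.0, 2500.0)]

point_errors, rms_error = registration_errors(transformed, ground)
```

A control point whose individual error is much larger than the others is the natural candidate for reregistering.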
Step 6: Set the correct projection
If you know what coordinate system (projection) your paper map is in, you should set the same projection for the layer into which you're digitizing. If you are digitizing features into an existing feature layer, you must ensure that your paper map and digital layer share the same coordinate system.
Step 7: Enable digitizing mode and begin digitizing
To digitize features, you need to enable digitizing mode to create features.
Digitizing tablets generally operate in two modes: digitizing (absolute) mode and mouse (relative) mode. When you are in digitizing mode, you can only digitize features; you cannot choose buttons, menu commands, or tools from the ArcMap user interface because the screen pointer is locked to the drawing area. In mouse mode, however, there is no correlation between the position of the screen pointer and the digitizing tablet. When digitizing, you can switch between digitizing mode and mouse mode using the Editing Options dialog box. This allows you to use the digitizer puck to digitize features as well as access user interface choices (as a substitute for the mouse). Also, you can use your mouse to choose interface elements at any time, whether your digitizer is in mouse mode or digitizing mode (Wenzhong, 2010).
You can digitize features on a paper map in two ways: point mode digitizing or stream mode digitizing (streaming). You can switch back and forth between the two modes as you digitize by pressing F8.
When you start a digitizing session, the default is point mode. With point mode digitizing you convert a feature on a paper map by digitizing a series of precise points, or vertices. ArcMap connects the vertices to create a digital feature. Point mode digitizing works the same way with a digitizer as with screen digitizing with the construction tools; the only difference is that with the digitizer you are converting a feature from a paper map using a digitizer puck instead of a mouse.
When streaming, ArcMap automatically adds vertices at an interval as you move around the map. You might want to stream when creating a curved line, such as a river. Streaming, or stream mode digitizing, is commonly used with a digitizing tablet but can be used simply with a mouse.
To begin digitizing in stream mode, right-click and click Streaming when creating features. You can also press the F8 key to enter stream mode. If you click the map, streaming is suspended. This allows you to click buttons, menus, and other user interface elements. This means you can right-click to access the shortcut menu, enabling you to place a vertex using Absolute X,Y, Delta X,Y, or any of the other commands on that menu. Click the map again to start streaming. To exit from stream mode entirely, right-click and click Streaming or press F8.
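The idea of adding vertices at an interval can be sketched as follows. This is an illustration only, not ArcMap's implementation: the `tolerance` parameter and the cursor path are assumptions introduced for the example.

```python
import math

def stream_vertices(cursor_path, tolerance):
    """Keep a cursor position as a vertex once it lies at least
    `tolerance` map units from the last vertex that was kept."""
    if not cursor_path:
        return []
    vertices = [cursor_path[0]]
    for x, y in cursor_path[1:]:
        last_x, last_y = vertices[-1]
        if math.hypot(x - last_x, y - last_y) >= tolerance:
            vertices.append((x, y))
    return vertices

# Hypothetical cursor positions sampled while tracing a straight stretch
path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
vertices = stream_vertices(path, tolerance=2.5)
```

A smaller interval follows curves such as rivers more closely but produces more vertices to store and edit.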

References
Davis, B. (2001). GIS: A Visual Approach. Canada: OnWord Press.

Murayama, Y. (2012). Progress in Geospatial Analysis. Tokyo: Springer Publishers.

Wenzhong, S. (2010). Principles of Modeling Uncertainties in Spatial Data and Spatial Analyses. USA: CRC Press.

A Supportive material prepared by

Ngogo MN.
Geography Dept.
St. Augustine University of Tanzania


Friday, January 2, 2015

THE GEOGRAPHICAL INFORMATION SYSTEM DATABASE




5.1 The GIS Database
·        The GIS database is the collection of graphic (spatial) and non-graphic data organized in a particular structure.

5.1.2. Types of database

There are four main structural types of database management systems: hierarchical, network, relational, and object-oriented. 



1. Hierarchical Databases




·        The hierarchical database is one of the oldest methods of organizing and storing data, and it is still used.
·        A hierarchical database is organized in pyramid fashion, like the branches of a tree extending downwards.
·        Related fields or records are grouped together so that there are higher-level records and lower-level records, just as the parents in a family tree sit above their children.
·        Based on this analogy, the parent record at the top of the pyramid is called the root record.
·        A child record always has only one parent record to which it is linked, just like in a normal family tree.
·        In contrast, a parent record may have more than one child record linked to it.
·        Hierarchical databases work by moving from the top down.
·        A record search is conducted by starting at the top of the pyramid and working down through the tree from parent to child until the appropriate child record is found.
·        Furthermore, each child can also be a parent with children underneath it.
·        The advantage of hierarchical databases is that they can be accessed and updated rapidly because the tree-like structure and the relationships between records are defined in advance.
·        The disadvantage of this type of database structure is that each child in the tree may have only one parent, and relationships or linkages between children are not permitted, even if they make sense from a logical standpoint.
·        Hierarchical databases are so rigid in their design that adding a new field or record requires that the entire database be redefined.
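The parent-child structure and the top-down search described above can be sketched in a few lines. The `Record` class, the `find` function, and the administrative names are hypothetical, chosen only for illustration.

```python
class Record:
    """A record in a hierarchical database: at most one parent, any number of children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # a child is linked to exactly one parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def find(record, name, path=()):
    """Search from `record` downward; return the chain of names leading to the match."""
    path = path + (record.name,)
    if record.name == name:
        return path
    for child in record.children:
        found = find(child, name, path)
        if found is not None:
            return found
    return None

root = Record("Region")                  # the root record
district_a = Record("District A", root)
district_b = Record("District B", root)
ward_1 = Record("Ward 1", district_a)
ward_2 = Record("Ward 2", district_b)

route = find(root, "Ward 2")
```

Because each record stores a single `parent`, a ward shared by two districts cannot be expressed, which is the rigidity noted in the list above.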
 


2.  Network Databases


 
·        Network databases resemble hierarchical databases in that they also have a hierarchical structure.
·        There are a few key differences, however. Instead of looking like an upside-down tree, a network database looks more like a cobweb or interconnected network of records.
·        In network databases, children are called members and parents are called owners. The most important difference is that each child or member can have more than one parent (or owner).
·        Since more connections can be made between different types of data, network databases are considered more flexible.
·        However, two limitations must be considered when using this kind of database.
·        Similar to hierarchical databases, network databases must be defined in advance.
·        There is also a limit to the number of connections that can be made between records. 
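The owner-member idea can be sketched as below. The `NetworkDB` class and the road and junction names are hypothetical; the point is only that one member may be linked to several owners.

```python
from collections import defaultdict

class NetworkDB:
    """Minimal owner/member store in which a member may have several owners."""
    def __init__(self):
        self.members_of = defaultdict(set)   # owner  -> its members
        self.owners_of = defaultdict(set)    # member -> its owners

    def link(self, owner, member):
        self.members_of[owner].add(member)
        self.owners_of[member].add(owner)

db = NetworkDB()
db.link("Road R1", "Junction J1")
db.link("Road R2", "Junction J1")   # the same member gets a second owner
db.link("Road R2", "Junction J2")

owners = sorted(db.owners_of["Junction J1"])
```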

3. Relational Databases




·        In relational databases, the relationship between data files is relational, not hierarchical.
·        Hierarchical and network databases require the user to pass down through a hierarchy in order to access needed data.
·        Relational databases connect data in different files by using common data elements or a key field.
·        Data in relational databases is stored in different tables, each having a key field that uniquely identifies each row.
·        Relational databases are more flexible than either the hierarchical or network database structures.
·        In relational databases, tables or files filled with data are called relations, a row or record is called a tuple, and columns are referred to as attributes or fields.
·        Relational databases work on the principle that each table has a key field that uniquely identifies each row, and that these key fields can be used to connect one table of data to another.
·        Thus, one table might have a row consisting of a customer account number as the key field along with address and telephone number.
·        The customer account number in this table could be linked to another table of data that also includes customer account number (a key field), but in this case, contains information about product returns, including an item number (another key field).
·        This key field can be linked to another table that contains item numbers and other product information such as production location, color, quality control person, and other data.
·        Therefore, using this database, customer information can be linked to specific product information.
·        The relational database has become quite popular for two major reasons.
·        First, relational databases can be used with little or no training.
·        Second, database entries can be modified without redefining the entire structure.
·        The downside of using a relational database is that searching for data can take more time than if other methods are used.
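The customer, product return, and item example above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (account_no INTEGER PRIMARY KEY, address TEXT, phone TEXT);
CREATE TABLE returns   (account_no INTEGER, item_no INTEGER);
CREATE TABLE items     (item_no INTEGER PRIMARY KEY, location TEXT, color TEXT);
INSERT INTO customers VALUES (101, '12 Lake Road', '555-0101');
INSERT INTO returns   VALUES (101, 7);
INSERT INTO items     VALUES (7, 'Plant A', 'blue');
""")

# The key fields (account_no, item_no) connect one table to the next.
rows = conn.execute("""
SELECT c.account_no, c.address, i.item_no, i.location, i.color
FROM customers AS c
JOIN returns AS r ON r.account_no = c.account_no
JOIN items   AS i ON i.item_no = r.item_no
""").fetchall()
```

Adding a new column or table here does not require redefining the existing ones, which matches the second advantage listed above.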

4. Object-oriented Databases (OODBMS)
·        Object-oriented databases represent a significant advance over their other database cousins.
·        They are able to handle many new data types, including graphics, photographs, audio, and video.
·        Hierarchical, network, and relational databases are all designed to handle structured data; that is, data that fits nicely into fields, rows, and columns.
·        They are useful for handling small snippets of information such as names, addresses, zip codes, product numbers, and any kind of statistic or number you can think of.
·        On the other hand, an object-oriented database can be used to store data from a variety of media sources, such as photographs and text, and produce work, as output, in a multimedia format.
·        Object-oriented databases use small, reusable chunks of software called objects.
·        The objects themselves are stored in the object-oriented database.
·        Each object consists of two elements: 1) a piece of data (e.g., sound, video, text, or graphics), and 2) the instructions, or software programs called methods, for what to do with the data.
·        The instructions contained within the object are used to do something with the data in the object.
·        For example, test scores would be within the object as would the instructions for calculating average test score.
·        Object-oriented databases have two disadvantages. First, they are more costly to develop.
·        Second, most organizations are reluctant to abandon or convert from those databases that they have already invested money in developing and implementing.
·        However, the benefits to object-oriented databases are compelling. The ability to mix and match reusable objects provides incredible multimedia capability.
·        Healthcare organizations, for example, can store, track, and recall CAT scans, X-rays, electrocardiograms and many other forms of crucial data.
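The test score example can be written as a small object that bundles the data with its method. The class name `ScoreSheet` and the sample scores are hypothetical.

```python
class ScoreSheet:
    """An object holding data (test scores) and a method that works on that data."""
    def __init__(self, scores):
        self.scores = list(scores)      # the data element

    def average(self):                  # the method (instructions) stored with the data
        return sum(self.scores) / len(self.scores)

sheet = ScoreSheet([70, 80, 90])
mean_score = sheet.average()
```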

5.2 Elements of the GIS Database
The elements of a GIS database are the graphic and non-graphic data coded and stored in the GIS.
·        Graphic data are the spatial description of a feature, and
·        Attribute data are the qualities or characteristics of that feature.
Graphic data elements are the points, lines, polygons, and symbols that are used to depict maps and other graphic features.



·        Other graphic elements are shown below:




·        A point is a zero-dimensional object that specifies a geometric location through a set of coordinates.
·        A node is a special type of point, a zero-dimensional object that is a topological junction or end point and may specify a geometric location.
·        A pixel (picture element) is a two-dimensional element that is the smallest indivisible element of an image.
·        A line is a one-dimensional object. A line segment is a direct line between two points.
·        A polygon is a continuous two-dimensional object bounded by line segments.
·        Symbols are graphic elements that represent features at points on a map.

II) Non-graphic data
Non-graphic data are representations of the characteristics, qualities, or relationships of map or graphic features. Non-graphic data include:
·        Attributes
·        Topology

a) Attributes
·        Attributes are the descriptive information about a map or graphic feature.
·        For example, attributes for a town plot could include the plot number, size, and name, as well as incidents that occur on it, such as ownership, building permits, surveys, or price.
·        Geographic indexes are a special type of attribute. These are codes given to features to distinguish one feature from another.
·        Geographic indexes are also called identifiers (IDs). They include street addresses, box numbers, parcel numbers, or arbitrarily assigned numbers such as polygon 1, polygon 2, and so on.

b) Topology
·        Topology is data that indicates the spatial relationships between one geographical phenomenon and another.
·        Topological data include connectivity, adjacency, proximity, and containment.
·        This information is stored in the database because a computer cannot recognize that line 1 is connected to line 2, that polygon C is contained in polygon D, or that feature E is adjacent to feature F unless these spatial relationships are explicitly defined.
·        Such systems are called raster. In those systems which store graphic and non-graphic data separately, the data are linked together during analyses or display.
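A minimal sketch of explicitly stored topology, using the feature names from the example above. The dictionary layout and the `related` function are assumptions for illustration, not the storage format of any particular GIS.

```python
# Spatial relationships recorded explicitly, because they cannot be
# inferred by the computer from the drawing alone.
topology = {
    "connected": {("line 1", "line 2")},
    "contained_in": {("polygon C", "polygon D")},
    "adjacent": {("feature E", "feature F")},
}

def related(kind, a, b):
    """Check whether relationship `kind` has been recorded between a and b."""
    pairs = topology[kind]
    if kind == "contained_in":          # containment is directional
        return (a, b) in pairs
    return (a, b) in pairs or (b, a) in pairs   # the others are symmetric

line_check = related("connected", "line 2", "line 1")
reverse_containment = related("contained_in", "polygon D", "polygon C")
```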

5.3 Spatial Data Structure
There are two techniques for representing graphic (spatial) data: vector and raster.

Vector
The elements of the database that make up points, lines, polygons, annotation, nodes, vertices, arcs, and strings form a vector representation.
·        With vector representation, the boundaries or the course of the features are defined by a series of points that, when joined with straight lines, form the graphic representation of that feature.
·        The points themselves are encoded with a pair of numbers giving the X and Y coordinates in systems such as latitude/longitude or Universal Transverse Mercator grid coordinates.
The attributes of features are stored in the database management system (DBMS) software.
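A vector feature can be sketched as a list of coordinate pairs joined by straight lines, with its attributes kept in a separate table. The coordinates, feature names, and attribute values below are hypothetical.

```python
import math

def line_length(vertices):
    """Length of a line built from straight segments between consecutive vertices."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))

def polygon_area(vertices):
    """Area of a simple polygon from its boundary vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2

# Hypothetical features in projected coordinates (metres)
road = [(0, 0), (3, 4), (3, 10)]
plot = [(0, 0), (40, 0), (40, 30), (0, 30)]

# Attributes live in a separate table keyed by a feature identifier
attributes = {"plot_1": {"land_use": "residential"}}

road_length = line_length(road)
plot_area = polygon_area(plot)
```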

Vector data representation


Raster
·        A raster system does not define features; instead, the area is subdivided into a fine mesh of grid cells.
·        The cells record the condition or attribute of the earth's surface at that point.
·        Each cell is given a numeric value which may then represent a feature identifier, a qualitative attribute code or a quantitative attribute value.
·        For example, a cell could have the value "6" to indicate that it belongs to District 6 (a feature identifier), or that it is covered by soil type 6 (a qualitative attribute), or that it is 6 meters above sea level (a quantitative attribute value).
·        Although the data we store in these grid cells do not necessarily refer to phenomena that can be seen in the environment, the data grids themselves can be thought of as images or layers, each depicting one type of information over the mapped region.
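A raster layer can be sketched as a grid of numeric cell values plus the information needed to locate each cell on the ground. The grid values, cell size, and origin below are made up for illustration.

```python
# One layer: each cell holds a code, for example a soil type.
grid = [
    [6, 6, 2],
    [6, 2, 2],
    [1, 1, 2],
]
cell_size = 10.0                     # metres per cell side (assumed)
origin_x, origin_y = 500.0, 1030.0   # top-left corner of the grid (assumed)

def cell_value(x, y):
    """Return the value of the cell that contains ground position (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)   # rows count downward from the top
    return grid[row][col]

def area_of(code):
    """Total ground area covered by cells holding `code`."""
    return sum(row.count(code) for row in grid) * cell_size ** 2

top_left_value = cell_value(505.0, 1025.0)
soil_6_area = area_of(6)
```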

Raster versus Vector

 RASTER DATA DISPLAY                                                VECTOR DATA DISPLAY

·        Raster systems are typically data intensive (although good data compaction techniques exist) since they must record data at every point.
·        Raster systems have substantially more analytical power than their vector counterparts in the analysis of continuous space.
·        Raster is suited to the study of data that are continuously changing over space such as terrain, vegetation biomass, rainfall and the like.
·        Raster systems tend to be very rapid in the evaluation of problems that involve various mathematical combinations of the data in multiple layers (modeling).
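Combining layers cell by cell can be sketched as follows. The two layers and the suitability thresholds are hypothetical; the point is that the same rule is applied independently to every cell position.

```python
# Two hypothetical layers covering the same area with the same cell layout
slope = [[2, 5], [12, 30]]                # degrees
rainfall = [[800, 900], [1000, 1200]]     # mm per year

# A cell is marked suitable (1) when slope is gentle and rainfall is adequate
suitable = [
    [1 if s < 15 and r >= 900 else 0 for s, r in zip(slope_row, rain_row)]
    for slope_row, rain_row in zip(slope, rainfall)
]
```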
Vector versus raster
·        Vector systems are quite efficient in their storage of map data because they only store the boundaries of features and not that which is inside those boundaries.
·        Vector systems usually allow one to roam around the graphic display with a mouse and query the attributes associated with a displayed feature, such as the distance between points or along lines, the areas of regions defined on the screen, and so on.
·        In addition, they can produce simple thematic maps of database queries.
·        Vector systems do not have as extensive a range of capabilities for analyses over continuous space.
·        They excel at problems concerning movements over a network and can undertake the most fundamental of GIS operations.
·        Raster and vector systems each have their own strengths and weaknesses, so current GIS systems tend to combine both.


5.4. Data source and input
·        Spatial data and associated attributes are collected from different sources.
·        The features described by geographic data, and their sources, are as follows:
·        Geodetic control points from GPS or plane surveys.
·        Planimetric features such as roads, buildings, water bodies, rivers, and utility poles, obtained from aerial photographs or maps.
·        Topographic features such as benchmark points, digital elevation models from surveyed data and remote sensing.
·        Cadastral features which are boundaries of ownership or rights, or any other enclosed area established by legal or administration definition, such as administrative units (wards, districts, regions, countries) and statistical units or electoral units.
·        Facility features such as water supply networks, gas supply networks, sewer, electricity networks
·        Natural features such as soils, vegetation, water bodies, rock types from surveys, remote sensing.

5.4.1 Data input methods
·        GIS requires the availability of digital data before analysis and production operations are performed
·        To create a digital database you need to convert a full set of existing data into a digital format or compile a new set of data from the field or remote sensing
·        The procedure of converting non-digital data into digital form is called digitizing, and the process of importing the data into the GIS is called data input or coding.

·        There are several techniques of data input used to develop a GIS database.
        1.  Manual digitizing
·        The source data for manual digitizing come from a hardcopy map, a hardcopy aerial photograph, or a hardcopy image.
·        The hardcopy to be digitized is mounted on the digitizing table (digitizer).
·        Each map feature is traced by pointing the stylus at it and pushing the appropriate button.
·        The map's scale, coordinates, and area coverage, along with the topological data of its features, are registered in the computer system.

        2. Photogrammetric digitizing
·        It is used to compile new maps from aerial photographs.
·        The table used in manual digitizing is replaced by a photogrammetric instrument.
·        Photogrammetric digitizing is often used to record digital planimetric features and elevation data from stereo photographs.

        3. Scanning
·        Scanning uses optical, laser, or other electronic devices to scan maps and convert them into digital format.
·        Most scanners produce bitmaps (BMP) or other raster images.
·        Scanning is very quick and less costly.
·        GIS software can convert raster images into vector files.

        4. Co-ordinate geometry
·        Coordinate geometry (COGO) is software that converts booked survey data into maps.
·        The geometric description of the feature (point of origin, bearings, leg distances) is typed into the computer.
·        The COGO software then generates coordinates, which are stored and used to generate maps.
·        The technique is usually used for surveyed data.
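How coordinates follow from a point of origin, bearings, and leg distances can be sketched as below. It assumes bearings measured in degrees clockwise from north and planar coordinates; the `traverse` function and the survey values are hypothetical, not the behavior of any specific COGO package.

```python
import math

def traverse(origin, legs):
    """Compute the corner coordinates of a traverse.

    `legs` is a list of (bearing in degrees clockwise from north, distance).
    """
    x, y = origin
    points = [(x, y)]
    for bearing_deg, distance in legs:
        bearing = math.radians(bearing_deg)
        x += distance * math.sin(bearing)   # change in easting
        y += distance * math.cos(bearing)   # change in northing
        points.append((x, y))
    return points

# Hypothetical rectangular plot: east 50 m, south 30 m, west 50 m, north 30 m
corners = traverse((1000.0, 2000.0),
                   [(90, 50.0), (180, 30.0), (270, 50.0), (0, 30.0)])
```

In a closed traverse like this one, the last point should fall back on the origin; any gap is the misclosure of the survey.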

        5. Key entry and onscreen digitizing
·        Non-graphic data and annotation are commonly entered by typing on a keyboard.
·        Some systems allow onscreen digitizing using a mouse.

        6. Digital data transfer
·        Existing data files are a common source of attribute and graphic data for GIS.
·        Use of existing data often requires translation from the original format to the receiving GIS system format.

        7. Remote sensing and GPS data collection
·        Digital images from remote sensing are directly imported into GIS and are important source of graphic data.
·        Remote sensing provides GIS with both graphic data and non-graphic data (such as coordinates and attribute qualities).
·        The Global Positioning System (GPS), a satellite-based system for collecting location information, is also used to collect GIS data in the field.

5.5 Sources of errors and quality of spatial data
Errors in data may derive from three main sources:
1.       errors in the source data;
2.       errors introduced during encoding; and
3.       errors propagated during data transfer and conversion.

1.      Error in the Source Data
·        Errors in source data may be difficult to identify.
·        There may be subtle errors in a paper map used for digitizing that arose when the paper map was created.
·        There may also be printing errors in the paper map itself.

2.      Errors introduced during encoding
·        During encoding, a range of errors can be introduced.
·        During keyboard encoding, an operator can make a typing mistake.
·        During digitizing, an operator may encode the wrong line.
·        Folds and stains can easily be scanned and mistaken for real geographical features.
3.      Errors propagated during data transfer and conversion
·        During data transfer, conversion of data between the different formats required by different packages may lead to a loss of data.

1.     Error Detection in attribute data
·        Errors in attribute data are relatively easy to spot and may be identified using manual comparison with the original data.
·        For example, if the operator notices that a hotel has been coded as a cafe, then the attribute database may be corrected accordingly.
·        Several methods, in addition to manual comparison, may be used to check for errors in the encoding of attribute data.
Methods of attribute data checking
1. Impossible values. Simple checks for impossible data values can be made when the range of the data is known. Data values falling outside this range are obviously incorrect. For example, a negative rainfall measurement is impossible, as is a slope of 100 degrees.

2. Extreme values. Extreme data values should be cross-checked against the source document to see if they are correct. An entry in the attribute database that says the Mountain View Hotel has 2000 rooms needs to be checked. It is more likely that this hotel has 200 rooms and that the error is the result of a typing mistake.

3. Internal consistency. Checks can be made against summary statistics provided with source documents where data are derived from statistical tables. Totals and means for attribute data entered into the GIS should tally with the totals and means reported in the source document. If a discrepancy is found, then there must be an error somewhere in the attribute data.

4. Scattergrams. If two or more variables in the attribute data are correlated, then errors can be identified using scattergrams. The two variables are plotted along the x and y axes of a graph, and values that depart noticeably from the regression line are investigated. Examples of correlated variables from Happy Valley might be altitude and temperature, or the category of a hotel and the cost of accommodation.

5. Trend surfaces. Trend surface analyses may be used to highlight points with values that depart markedly from the general trend of the data. This technique may be useful where a regional trend is known to exist. For example, in the case of Happy Valley most ski accidents occur on the nursery slopes and the general trend is for accidents to decrease as the ski piste becomes more difficult. Therefore, an advanced piste recording a high number of accidents reflects either an error in the data set or an area requiring investigation.
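The first three checks can be sketched as simple functions over attribute records. The hotel records, field names, the median-based rule for "extreme", and the reported total are all hypothetical choices made for this illustration.

```python
def impossible_values(records, field, low, high):
    """Records whose value lies outside the known valid range."""
    return [r for r in records if not (low <= r[field] <= high)]

def extreme_values(records, field, factor=5):
    """Records whose value exceeds `factor` times the median (an assumed rule)."""
    values = sorted(r[field] for r in records)
    median = values[len(values) // 2]
    return [r for r in records if r[field] > factor * median]

def totals_match(records, field, reported_total):
    """Internal consistency: does the entered total tally with the source total?"""
    return sum(r[field] for r in records) == reported_total

# Hypothetical attribute table
hotels = [
    {"name": "Mountain View Hotel", "rooms": 2000, "slope_deg": 12},
    {"name": "Lakeside Lodge", "rooms": 150, "slope_deg": 100},
    {"name": "Pine Inn", "rooms": 80, "slope_deg": 5},
    {"name": "Valley Hostel", "rooms": 120, "slope_deg": 8},
]

bad_slopes = [r["name"] for r in impossible_values(hotels, "slope_deg", 0, 90)]
suspect_rooms = [r["name"] for r in extreme_values(hotels, "rooms")]
consistent = totals_match(hotels, "rooms", 550)   # 550: total assumed from the source table
```

Flagged records are candidates for cross-checking against the source document, not automatic corrections.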

2.     Error in Spatial data
·        Errors in spatial data are often more difficult to identify and correct than errors in attribute data.
·        These errors take many forms, depending on the data model being used (vector or raster) and the method of data capture.
·        Examples of errors in spatial data include:
o   A meteorological station has been located in the wrong place.
o   A forest polygon has been wrongly identified during image processing.
o   A railway line has been erroneously digitized as a road.
·        Most GIS packages provide a suite of editing tools for the identification and removal of errors in vector data.

·        Corrections can be done interactively by the operator on-screen, or automatically by the GIS software.
·        However, visual comparison of the digitized data against the source document, either on paper or on the computer screen, is a good starting point.
·        It is important for data to be absolutely correct if topology is to be created for a vector data set.


Examples of spatial error in vector data  
 Errors will also be present in raster data.
·        Missing entities and noise are particular problems. Data for some.
·        Noise may be independently added to the data, either when data were collected or during processing.
·        This noise often shows up as scattered pixels whose attributes do not conform to those of neighboring pixels.
·        Filtering, a technique we learned in digital image processing, can be used to remove these artifacts.
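One possible filter is a 3 x 3 majority filter, sketched below. The choice of a majority filter, the window size, and the sample grid are assumptions; other filters (for example a median filter) follow the same pattern.

```python
from collections import Counter

def majority_filter(grid):
    """Replace each cell with the value held by more than half of its
    3 x 3 neighborhood (clipped at the grid edges), if such a value exists."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            value, count = Counter(window).most_common(1)[0]
            if count > len(window) / 2:
                result[r][c] = value
    return result

# A hypothetical layer with one noisy pixel (9) among class 1 cells
noisy = [
    [1, 1, 1, 1],
    [1, 9, 1, 1],
    [1, 1, 1, 1],
]
cleaned = majority_filter(noisy)
```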


6.0 Spatial Analysis and modeling
·        The analytical characteristics of GIS can be categorized in two ways: 1) the tools that GIS provides, and 2) the operations that GIS allows.
·        Regardless of the data structure, raster or vector, there are four basic groups of tools and three basic operations.

6.1 Analytical Tools
6.1.1. Database Query
This group of tools is fundamental to GIS analysis; the tools work with both traditional database queries and geographical analysis.
A database query simply asks questions about the currently stored information.

Types of database queries:
1) Query by attribute, e.g. What section of the river has a high level of coliform?
2) Query by location, e.g. What quantity of sediment is derived from this location?
3) Complex or conditional queries, e.g. Show me all wetlands that are larger than 1 hectare and that are adjacent to industrial lands.
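The three query types can be sketched over a small in-memory feature list. The features, field names, coordinates, and search window are hypothetical; a real GIS would run these queries against its own database and geometry engine.

```python
# Hypothetical features with attributes, a representative location, and
# the land types they are adjacent to
features = [
    {"id": "W1", "kind": "wetland", "area_ha": 2.5, "xy": (10, 10), "adjacent_to": {"industrial"}},
    {"id": "W2", "kind": "wetland", "area_ha": 0.4, "xy": (20, 15), "adjacent_to": {"industrial"}},
    {"id": "W3", "kind": "wetland", "area_ha": 3.0, "xy": (80, 60), "adjacent_to": {"forest"}},
    {"id": "R1", "kind": "river section", "coliform": "high", "xy": (15, 40), "adjacent_to": set()},
]

def within(feature, xmin, ymin, xmax, ymax):
    """True when the feature's location falls inside the search window."""
    x, y = feature["xy"]
    return xmin <= x <= xmax and ymin <= y <= ymax

# 1) Query by attribute: which features have a high coliform level?
by_attribute = [f["id"] for f in features if f.get("coliform") == "high"]

# 2) Query by location: which features lie inside the window (0, 0)-(30, 30)?
by_location = [f["id"] for f in features if within(f, 0, 0, 30, 30)]

# 3) Complex query: wetlands larger than 1 hectare adjacent to industrial land
complex_query = [f["id"] for f in features
                 if f["kind"] == "wetland"
                 and f["area_ha"] > 1
                 and "industrial" in f["adjacent_to"]]
```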

Vector and Raster analyses 
Example of query by attribute
Which ward in Arumeu District has more women than men?
ArcGIS tool = Query by attribute

Example of query by location
Which land covers are found on the Nelson Mandela Campus?
ArcGIS tool = Clip