
Data Harvest: Upcoming Changes to the Geocoder
The upcoming release of the Indiana Geographic Information Office's (IGIO) statewide geocoder, built on data collected during the 2023 Data Harvest program, brings significant updates, including changes to data formatting and locator styles. Extensive testing throughout the first half of 2024 revealed many updates and configuration changes that help the geocoder return the most accurate result possible. Here, we describe these upcoming changes, share lessons learned, and explain the improvements.
Overview of the Geocoder
The IGIO's statewide geocoder is a key service resulting from the Data Harvest program. Each year, through a collaborative project with the Indiana Geographic Information Council (IGIC) and Indiana's local governments, we collect address point, street centerline, parcel boundary, and administrative boundary GIS data from all 92 counties. These datasets are standardized and aggregated to the statewide level before being shared with the public on IndianaMap. Two of these datasets, address points and road centerlines, are especially crucial for the statewide geocoder tools.
A geocoder is a GIS tool that converts a text-based address (e.g., 100 N Senate Ave, Indianapolis, IN 46204) to X/Y coordinates, or to a point in GIS data representing that coordinate pair. Geocoders can also perform reverse geocoding, where coordinates are entered and the address at, or nearest to, those coordinates is returned. Geocoders can convert one address at a time in search tools and other solutions, or they can bulk process tabular address data and generate larger GIS datasets from the input.
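As a rough illustration of these two operations, the sketch below builds request URLs for the standard ArcGIS REST findAddressCandidates and reverseGeocode operations against the service endpoint listed at the end of this article. It only constructs the URLs and sends nothing. The helper names, the parameter values, and the sample coordinates are assumptions for illustration, and whether this particular service accepts single-line input should be confirmed on its REST page.

```python
from urllib.parse import urlencode

# Service endpoint published by the IGIO (listed at the end of this article).
GEOCODE_SERVER = (
    "https://gisdata.in.gov/server/rest/services/"
    "Geocode/State_Geocoder_WGS84/GeocodeServer"
)


def forward_geocode_url(address, max_locations=5):
    """Build a findAddressCandidates URL for one single-line address."""
    params = {
        "SingleLine": address,          # whole address in one string
        "outFields": "*",               # ask for all output attributes
        "maxLocations": max_locations,  # cap the number of candidates
        "f": "json",
    }
    return f"{GEOCODE_SERVER}/findAddressCandidates?{urlencode(params)}"


def reverse_geocode_url(longitude, latitude):
    """Build a reverseGeocode URL for one coordinate pair (x,y order)."""
    # Assumes WGS84 longitude/latitude, as the service name suggests.
    params = {"location": f"{longitude},{latitude}", "f": "json"}
    return f"{GEOCODE_SERVER}/reverseGeocode?{urlencode(params)}"


print(forward_geocode_url("100 N Senate Ave, Indianapolis, IN 46204"))
# Approximate, illustrative coordinates in downtown Indianapolis.
print(reverse_geocode_url(-86.1646, 39.7691))
```

Either URL could then be fetched with any HTTP client; bulk processing would normally use the geocodeAddresses operation or desktop GIS tools instead.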
Geocoders utilize reference data and scoring algorithms to determine the "score" of candidate addresses, or how similar the candidates are to the input address. A geocoder can have a minimum score to be considered a candidate, and a minimum score to match a candidate to the address. Increasing these values means that the reference data must match the input more closely to fully geocode a given address.
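The effect of the two thresholds can be shown with a small sketch. The threshold values, the function name, and the sample scores below are hypothetical and are not the settings used by the statewide geocoder.

```python
def classify_candidates(candidates, min_candidate_score=70, min_match_score=85):
    """Apply both thresholds to a list of (address, score) tuples.

    Returns (match, kept): the matched candidate or None, and every
    candidate that cleared the minimum candidate score, best first.
    """
    # Drop anything below the candidate threshold and sort best first.
    kept = sorted(
        (c for c in candidates if c[1] >= min_candidate_score),
        key=lambda c: c[1],
        reverse=True,
    )
    # The top candidate is a match only if it also clears the match threshold.
    match = kept[0] if kept and kept[0][1] >= min_match_score else None
    return match, kept


# Hypothetical scored candidates for "100 N Senate Ave, Indianapolis".
scored = [
    ("100 N Senate Ave, Indianapolis", 98.5),
    ("100 S Senate Ave, Indianapolis", 82.0),
    ("100 N Senate St, Indianola", 55.0),
]
match, kept = classify_candidates(scored)
print(match)      # the 98.5 candidate clears both thresholds
print(len(kept))  # 2: the 55.0 candidate is not even considered
```

Raising min_match_score above 98.5 in this example would leave two candidates but no match, which is the stricter behavior described above.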
The statewide geocoder enables users to search for and locate addresses or to call the API from a variety of applications, offering both one-off and bulk address lookup capabilities. Multiple state agencies, including the Bureau of Motor Vehicles (BMV), Family and Social Services Administration (FSSA), and the Indiana Office of Technology (IOT), have integrated the geocoder web service into various applications. Some local governments have also integrated the statewide locator into address search tools and their public-facing web maps. The public uses the geocoder as the primary address search tool on the IndianaMap Map Viewer. In 2023, the state's geocoder web service processed over 17 million requests, showcasing its widespread usage.
Previous Iterations and Lessons Learned
The previous version of the statewide geocoder was built as a composite locator utilizing three primary datasets:
- Address points from the Data Harvest program.
- Street centerlines from the Data Harvest program.
- Street centerlines from the US Census Bureau's TIGER/Line program.
In this composite setup, if an address could not be matched to an address point, the geocoder would fall back and attempt to match it to the Data Harvest street centerlines. If that failed as well, it would fall back to the TIGER street centerlines.
During testing in the first half of 2024, we noticed many instances where the street centerline data contained a better address match, with a higher score, than the address points. However, because the address points were first in the composite locator order and the address point candidate still met the address point minimum match score, the address point candidate would be chosen as the match. Typically, this behavior was observed when an address point matched the correct house number and street name but was located in the wrong municipality, ZIP code, or even county, and was returned anyway because of the result order and prioritization.
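The ordering problem can be reproduced with a minimal sketch of fallback logic. This is a simplification written for illustration, not Esri's implementation; the lookup functions, scores, threshold of 85, and the "TIGER" locator name are hypothetical.

```python
def composite_geocode(address, locators):
    """Return (locator_name, candidate) from the first locator with a match.

    locators: ordered list of (name, lookup, min_match_score), where
    lookup(address) returns a list of (candidate_text, score) tuples.
    """
    for name, lookup, min_match_score in locators:
        candidates = [c for c in lookup(address) if c[1] >= min_match_score]
        if candidates:
            # An earlier locator wins even if a later one would score higher.
            return name, max(candidates, key=lambda c: c[1])
    return None


# Hypothetical lookups standing in for the three reference datasets.
def address_points(address):
    return [("1500 N 400 W, wrong ZIP code", 86.0)]


def harvest_streets(address):
    return [("1500 N 400 W, correct ZIP code", 97.0)]


def tiger_streets(address):
    return []


locators = [
    ("IDSI_Address", address_points, 85),
    ("IDSI_Street", harvest_streets, 85),
    ("TIGER", tiger_streets, 85),
]
# The 86.0 address point is returned; the 97.0 street candidate is never checked.
print(composite_geocode("1500 N 400 W", locators))
```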
Additionally, we found that many addresses on certain types of roads that are commonly spelled or written in different ways were not being matched at all. For example, "County Road" is sometimes abbreviated to "Co Rd" or "CR", and sometimes the road type is left out of the address entirely (such as 1500 N 400 W, which could also be written as 1500 N County Road 400 W).
Upcoming Changes and Enhancements
The upcoming geocoder release will modify how candidates are handled between the Data Harvest address points and street centerlines, include alternate street names for certain road name types, integrate alternate street names or street name aliases for counties that provided street name alias tables in the 2023 Data Harvest cycle, and add information regarding tax districts and parcels to the output results.
A recent geocoder style option created by Esri is the multirole locator. Previously, a single locator built in ArcGIS Pro could only use one primary reference dataset and its associated alternate tables. To combine multiple datasets into one geocoder, users had to create composite locators that fall back to another dataset, based on the locator order, if a candidate did not match the first dataset. The multirole locator allows multiple reference datasets to participate in the locator simultaneously.
For the Indiana statewide geocoder, this means any address can match to either the Data Harvest address points or the street centerlines, depending on which candidate has the higher score, rather than falling back to the street centerlines only if an address point candidate is not matched. This configuration has resolved many issues where a less accurate address point candidate was chosen over a higher scoring street centerline candidate simply because the address point locator was first in the composite geocoder order.
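In contrast to ordered fallback, the multirole behavior described here amounts to pooling candidates from every role and keeping the highest score. The sketch below is a simplified illustration under that assumption, not Esri's actual matching logic; the lookups, scores, and threshold are hypothetical.

```python
def multirole_geocode(address, roles, min_match_score=85):
    """Pool candidates from every role and return the highest scoring one.

    roles: list of (role_name, lookup), where lookup(address) returns a
    list of (candidate_text, score) tuples. Returns None if nothing matches.
    """
    pooled = []
    for role_name, lookup in roles:
        for text, score in lookup(address):
            if score >= min_match_score:
                pooled.append((role_name, text, score))
    # Role order no longer matters; only the score decides.
    return max(pooled, key=lambda c: c[2]) if pooled else None


# Hypothetical lookups standing in for the two Data Harvest datasets.
def address_points(address):
    return [("1500 N 400 W, wrong ZIP code", 86.0)]


def harvest_streets(address):
    return [("1500 N 400 W, correct ZIP code", 97.0)]


roles = [("PointAddress", address_points), ("StreetAddress", harvest_streets)]
# The 97.0 street centerline candidate wins despite being listed second.
print(multirole_geocode("1500 N 400 W", roles))
```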
Moving from a composite locator with three separate locators to a composite locator with a multirole locator and an additional separate locator for TIGER data will cause results to return different information from previous versions. In the past, if an address matched a Data Harvest address point, the "Loc_Name" field (indicating which specific geocoder from the composite locator was used to match the address) would return "IDSI_Address". If matched to a Data Harvest street centerline, it would return "IDSI_Street".
Because the multirole locator combines both address points and street centerlines into a single locator, the "Loc_Name" field will now return the value "IDSI_MultiRole" for either kind of match. To find out whether a matched address came from an address point or a street centerline, check the "Addr_type" field. If matched to an address point, this field will have the value "PointAddress" or "Subaddress". If matched to a street centerline, it will have the value "StreetAddress", "StreetInt", "StreetAddressExt", or "StreetName", depending on how the address matched the street centerline. See this page for information about the values returned by the geocoder and what they mean.
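Client code that previously relied on the locator name can switch to the address type instead. The sketch below is one possible way to do that, assuming the field appears as Addr_type (its name in standard Esri locator output) inside the "attributes" object of each candidate; the function name and the sample candidates are hypothetical.

```python
# Address type values listed above, grouped by reference dataset.
POINT_TYPES = {"PointAddress", "Subaddress"}
STREET_TYPES = {"StreetAddress", "StreetInt", "StreetAddressExt", "StreetName"}


def reference_source(candidate):
    """Report which kind of reference data a geocoder candidate matched."""
    addr_type = candidate.get("attributes", {}).get("Addr_type")
    if addr_type in POINT_TYPES:
        return "address point"
    if addr_type in STREET_TYPES:
        return "street centerline"
    return "unknown"


# Hypothetical candidates shaped like findAddressCandidates results.
print(reference_source({"score": 100, "attributes": {"Addr_type": "PointAddress"}}))
print(reference_source({"score": 93, "attributes": {"Addr_type": "StreetAddressExt"}}))
```

Returning "unknown" for unexpected values keeps the check from silently misclassifying results, for example those from the separate TIGER locator.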
To resolve issues with how certain road names and types may be entered into the geocoder, we created a comprehensive alternate street name table. Each address point or street centerline record is linked to its alternate street names through the NGUID (NENA Globally Unique ID) field. Using Python, we iterated through each record of the data, queried certain street types (county roads, highways, state highways, state roads, state routes, and United States highways), and created a record for each address and potential street type spelling or configuration. This new table was integrated as an alternate street name table in the multirole geocoder and will be included in the locator files that the IGIO will make available for download soon.
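The general idea can be sketched as follows. This is not the script used to build the statewide table; the variant lists, the field names other than NGUID, and the sample record are all hypothetical.

```python
# Hypothetical spelling variants per road type; "" means the type is omitted.
VARIANTS = {
    "COUNTY ROAD": ["COUNTY ROAD", "CO RD", "CR", ""],
    "STATE ROAD": ["STATE ROAD", "SR", "STATE ROUTE", "STATE HIGHWAY"],
}


def alternate_names(record):
    """Return one alternate-name row per spelling variant of a record's street.

    record: dict with "NGUID", "pre_type" (road type), and "name" keys.
    Each row carries the same NGUID so it can be joined to the source record.
    """
    rows = []
    for variant in VARIANTS.get(record["pre_type"], []):
        if variant == record["pre_type"]:
            continue  # the primary spelling is already in the reference data
        alt = f"{variant} {record['name']}".strip()
        rows.append({"NGUID": record["NGUID"], "alt_street_name": alt})
    return rows


# Hypothetical record for an address on County Road 400 W.
record = {"NGUID": "hypothetical-id-001", "pre_type": "COUNTY ROAD", "name": "400 W"}
for row in alternate_names(record):
    print(row)
```

With rows like these loaded as an alternate name table, an input such as 1500 N 400 W or 1500 N CR 400 W has a spelling to match against, in addition to the full County Road form.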
Feedback and Involvement
We encourage users to provide feedback on the geocoder and its recent updates. If you use the statewide geocoder and would like to provide feedback or ask questions, e-mail us at DataHarvest@iot.in.gov.
We are committed to continuous improvement in user satisfaction. Given the power of free-to-use statewide geocoder services, improving the accuracy, efficiency, and stability of the geocoder is a high priority for us. As the Data Harvest continues to progress and focus on data standardization and improvement across Indiana, ensuring that improved data quality is integrated into the geocoder is also key.
The current statewide geocoder is available from https://gisdata.in.gov/server/rest/services/Geocode/State_Geocoder_WGS84/GeocodeServer.