MaxMind GeoCity Lite Parser in C++

Posted by Tom on 2012-07-17 10:22

Not so long ago I was putting together a toy project that would snag all of the packets going in and out of your machine, geolocate them and then plot them on a spinning globe. I'll probably bung it up on GitHub at some point, although it's mostly Duplo programming - nailing together various APIs and not doing any real work myself. Framework and graphics by Cinder, shoreline data loading via ShapeLib and packet capture by WinPcap. Which leaves only the geolocation, and this was the only bit I ended up writing any code for.

MaxMind seem to be the provider of choice for free IP geolocation data, as well as various commercial offerings, and it's all available from their site. They provide a decent block of free data which can map IPs to cities with enough accuracy for our needs (although we appear to know less about Ukraine than we do about the moon, bizarrely). And they also have a binary version of their free data as well as a C API. How thoughtful! Except . . .

Getting MaxMind's C API code to build under Windows is like trying to beat a grizzly bear to death with cooked spaghetti.

After spending an embarrassing amount of time flailing I just decided to write my own parser. So I ported the C# version of the API into C++ without all of the legacy support. It's all in a VS2008 project and I may dump it onto GitHub at some point in case anyone else has had the same tribulations as me. As long as you're only targeting recent versions of the GeoLite City data it does the job.

Code follows!

#ifndef COLOURBLIND_GEOIP_H
#define COLOURBLIND_GEOIP_H
 
#include <cstdlib>   // atoi
#include <cstring>   // strchr
#include <string>
#include <fstream>
#include <sstream>
 
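// const gives this table internal linkage, so the header can be included
// from more than one source file without duplicate-symbol link errors.
const std::string COUNTRY_CODE[] = {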
    "--","AP","EU","AD","AE","AF","AG","AI","AL","AM","AN","AO","AQ","AR",
    "AS","AT","AU","AW","AZ","BA","BB","BD","BE","BF","BG","BH","BI","BJ",
    "BM","BN","BO","BR","BS","BT","BV","BW","BY","BZ","CA","CC","CD","CF",
    "CG","CH","CI","CK","CL","CM","CN","CO","CR","CU","CV","CX","CY","CZ",
    "DE","DJ","DK","DM","DO","DZ","EC","EE","EG","EH","ER","ES","ET","FI",
    "FJ","FK","FM","FO","FR","FX","GA","GB","GD","GE","GF","GH","GI","GL",
    "GM","GN","GP","GQ","GR","GS","GT","GU","GW","GY","HK","HM","HN","HR",
    "HT","HU","ID","IE","IL","IN","IO","IQ","IR","IS","IT","JM","JO","JP",
    "KE","KG","KH","KI","KM","KN","KP","KR","KW","KY","KZ","LA","LB","LC",
    "LI","LK","LR","LS","LT","LU","LV","LY","MA","MC","MD","MG","MH","MK",
    "ML","MM","MN","MO","MP","MQ","MR","MS","MT","MU","MV","MW","MX","MY",
    "MZ","NA","NC","NE","NF","NG","NI","NL","NO","NP","NR","NU","NZ","OM",
    "PA","PE","PF","PG","PH","PK","PL","PM","PN","PR","PS","PT","PW","PY",
    "QA","RE","RO","RU","RW","SA","SB","SC","SD","SE","SG","SH","SI","SJ",
    "SK","SL","SM","SN","SO","SR","ST","SV","SY","SZ","TC","TD","TF","TG",
    "TH","TJ","TK","TM","TN","TO","TL","TR","TT","TV","TW","TZ","UA","UG",
    "UM","US","UY","UZ","VA","VC","VE","VG","VI","VN","VU","WF","WS","YE",
    "YT","RS","ZA","ZM","ME","ZW","A1","A2","O1","AX","GG","IM","JE","BL",
    "MF"};
 
// const for the same reason as COUNTRY_CODE: internal linkage per source file.
const std::string COUNTRY_NAME[] = {
    "N/A","Asia/Pacific Region","Europe","Andorra","United Arab Emirates",
    "Afghanistan","Antigua and Barbuda","Anguilla","Albania","Armenia",
    "Netherlands Antilles","Angola","Antarctica","Argentina","American Samoa",
    "Austria","Australia","Aruba","Azerbaijan","Bosnia and Herzegovina",
    "Barbados","Bangladesh","Belgium","Burkina Faso","Bulgaria","Bahrain",
    "Burundi","Benin","Bermuda","Brunei Darussalam","Bolivia","Brazil","Bahamas",
    "Bhutan","Bouvet Island","Botswana","Belarus","Belize","Canada",
    "Cocos (Keeling) Islands","Congo, The Democratic Republic of the",
    "Central African Republic","Congo","Switzerland","Cote D'Ivoire",
    "Cook Islands","Chile","Cameroon","China","Colombia","Costa Rica","Cuba",
    "Cape Verde","Christmas Island","Cyprus","Czech Republic","Germany",
    "Djibouti","Denmark","Dominica","Dominican Republic","Algeria","Ecuador",
    "Estonia","Egypt","Western Sahara","Eritrea","Spain","Ethiopia","Finland",
    "Fiji","Falkland Islands (Malvinas)","Micronesia, Federated States of",
    "Faroe Islands","France","France, Metropolitan","Gabon","United Kingdom",
    "Grenada","Georgia","French Guiana","Ghana","Gibraltar","Greenland","Gambia",
    "Guinea","Guadeloupe","Equatorial Guinea","Greece",
    "South Georgia and the South Sandwich Islands","Guatemala","Guam",
    "Guinea-Bissau","Guyana","Hong Kong","Heard Island and McDonald Islands",
    "Honduras","Croatia","Haiti","Hungary","Indonesia","Ireland","Israel","India",
    "British Indian Ocean Territory","Iraq","Iran, Islamic Republic of",
    "Iceland","Italy","Jamaica","Jordan","Japan","Kenya","Kyrgyzstan","Cambodia",
    "Kiribati","Comoros","Saint Kitts and Nevis",
    "Korea, Democratic People's Republic of","Korea, Republic of","Kuwait",
    "Cayman Islands","Kazakhstan","Lao People's Democratic Republic","Lebanon",
    "Saint Lucia","Liechtenstein","Sri Lanka","Liberia","Lesotho","Lithuania",
    "Luxembourg","Latvia","Libyan Arab Jamahiriya","Morocco","Monaco",
    "Moldova, Republic of","Madagascar","Marshall Islands",
    "Macedonia, the Former Yugoslav Republic of","Mali","Myanmar","Mongolia",
    "Macau","Northern Mariana Islands","Martinique","Mauritania","Montserrat",
    "Malta","Mauritius","Maldives","Malawi","Mexico","Malaysia","Mozambique",
    "Namibia","New Caledonia","Niger","Norfolk Island","Nigeria","Nicaragua",
    "Netherlands","Norway","Nepal","Nauru","Niue","New Zealand","Oman","Panama",
    "Peru","French Polynesia","Papua New Guinea","Philippines","Pakistan",
    "Poland","Saint Pierre and Miquelon","Pitcairn","Puerto Rico",
    "Palestinian Territory, Occupied","Portugal","Palau","Paraguay","Qatar",
    "Reunion","Romania","Russian Federation","Rwanda","Saudi Arabia",
    "Solomon Islands","Seychelles","Sudan","Sweden","Singapore","Saint Helena",
    "Slovenia","Svalbard and Jan Mayen","Slovakia","Sierra Leone","San Marino",
    "Senegal","Somalia","Suriname","Sao Tome and Principe","El Salvador",
    "Syrian Arab Republic","Swaziland","Turks and Caicos Islands","Chad",
    "French Southern Territories","Togo","Thailand","Tajikistan","Tokelau",
    "Turkmenistan","Tunisia","Tonga","Timor-Leste","Turkey","Trinidad and Tobago",
    "Tuvalu","Taiwan","Tanzania, United Republic of","Ukraine","Uganda",
    "United States Minor Outlying Islands","United States","Uruguay","Uzbekistan",
    "Holy See (Vatican City State)","Saint Vincent and the Grenadines",
    "Venezuela","Virgin Islands, British","Virgin Islands, U.S.","Vietnam",
    "Vanuatu","Wallis and Futuna","Samoa","Yemen","Mayotte","Serbia",
    "South Africa","Zambia","Montenegro","Zimbabwe","Anonymous Proxy",
    "Satellite Provider","Other",
    "Aland Islands","Guernsey","Isle of Man","Jersey","Saint Barthelemy",
    "Saint Martin"};
 
const int COUNTRY_EDITION = 1;
const int REGION_EDITION_REV0 = 7;
const int REGION_EDITION_REV1 = 3;
const int CITY_EDITION_REV0 = 6;
const int CITY_EDITION_REV1 = 2;
const int ORG_EDITION = 5;
const int ISP_EDITION = 4;
const int PROXY_EDITION = 8;
const int ASNUM_EDITION = 9;
const int NETSPEED_EDITION = 10;
const int DOMAIN_EDITION = 11;
const int COUNTRY_EDITION_V6 = 12;
const int ASNUM_EDITION_V6 = 21;
const int ISP_EDITION_V6 = 22;
const int ORG_EDITION_V6 = 23;
const int DOMAIN_EDITION_V6 = 24;
const int CITY_EDITION_REV1_V6 = 30;
const int CITY_EDITION_REV0_V6 = 31;
const int NETSPEED_EDITION_REV1 = 32;
const int NETSPEED_EDITION_REV1_V6 = 33;
 
const int COUNTRY_BEGIN = 16776960;
const int STATE_BEGIN   = 16700000;
const int STRUCTURE_INFO_MAX_SIZE = 20;
const int DATABASE_INFO_MAX_SIZE = 100;
const int FULL_RECORD_LENGTH = 100;//???
const int SEGMENT_RECORD_LENGTH = 3;
const int STANDARD_RECORD_LENGTH = 3;
const int ORG_RECORD_LENGTH = 4;
const int MAX_RECORD_LENGTH = 4;
const int MAX_ORG_RECORD_LENGTH = 1000;//???
const int FIPS_RANGE = 360;
const int STATE_BEGIN_REV0 = 16700000;
const int STATE_BEGIN_REV1 = 16000000;
const int US_OFFSET = 1;
const int CANADA_OFFSET = 677;
const int WORLD_OFFSET = 1353;
const int GEOIP_STANDARD = 0;
const int GEOIP_MEMORY_CACHE = 1;
const int GEOIP_UNKNOWN_SPEED = 0;
const int GEOIP_DIALUP_SPEED = 1;
const int GEOIP_CABLEDSL_SPEED = 2;
const int GEOIP_CORPORATE_SPEED = 3;
 
struct Location
{
    Location() : countryName("Unknown"), latitude(-200), longitude(-200) { }
 
    std::string countryCode;
    std::string countryName;
    std::string region;
    std::string city;
    std::string postalCode;
    float latitude;
    float longitude;
};
 
class GeoIp
{
public:
    GeoIp() : 
        dataFile_("GeoLiteCity.dat", std::ios::in|std::ios::binary), 
        segmentCount_(0) 
    {
 
    }
 
    GeoIp(std::string path)    :
        dataFile_(path.c_str(), std::ios::in|std::ios::binary), 
        segmentCount_(0) 
    {
 
    }
 
    ~GeoIp()
    {
        dataFile_.close();
    }
 
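    // Reads the segment count from the last three bytes of the file.
    // Returns false if the file could not be opened or read.
    bool Init()
    {
        unsigned char buffer[SEGMENT_RECORD_LENGTH];
        
        recordLength_ = STANDARD_RECORD_LENGTH;
        segmentCount_ = 0; // reset so that calling Init() twice does not double the count
 
        if (!dataFile_.is_open())
            return false;
 
        dataFile_.clear();
        dataFile_.seekg(-SEGMENT_RECORD_LENGTH, std::ios::end);
        if (!dataFile_.read((char *)&buffer[0], SEGMENT_RECORD_LENGTH))
            return false;
        for (int i = 0; i < SEGMENT_RECORD_LENGTH; i ++)
            segmentCount_ += (unsigned int)buffer[i] << (8 * i);
 
        return true;
    }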
 
    Location Lookup(std::string ip)
    {
        std::stringstream ipString(ip);
        unsigned char ipBytes[4];
        for (int i = 0; i < 4; i ++)
        {
            std::string octet;
            getline(ipString, octet, '.');
            ipBytes[i] = (unsigned char)atoi(octet.c_str());
        }
        return Lookup(ipBytes);
    }
 
    Location Lookup(unsigned char *ipBytes)
    {
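        // Widen each octet before shifting so the top byte cannot overflow a signed int.
        unsigned int ip = ((unsigned int)ipBytes[0] << 24) + ((unsigned int)ipBytes[1] << 16) + ((unsigned int)ipBytes[2] << 8) + ipBytes[3];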
        return Lookup(ip);
    }
 
    Location Lookup(unsigned int ip)
    {
        Location result;
 
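        // Zeroed, with one spare byte, so a short read near the end of the
        // file still leaves a terminator for strchr to find.
        unsigned char buffer[FULL_RECORD_LENGTH + 1] = { 0 };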
        char *buffPtr = (char *)buffer;
        int stringLen = 0;
 
        int countryPtr = SeekCountry(ip);
        if (countryPtr < 0 || countryPtr == segmentCount_)
            return result;
 
        int recordPtr = countryPtr + (2 * recordLength_ - 1) * segmentCount_;
 
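        // An earlier read past the end of the file leaves the stream in a
        // failed state, so reset it before seeking.
        dataFile_.clear();
        dataFile_.seekg(recordPtr);
        dataFile_.read((char *)&buffer[0], FULL_RECORD_LENGTH);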
 
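        // Guard the table lookups in case the file holds a country index
        // that these tables do not know about.
        unsigned int countryIndex = buffer[0];
        if (countryIndex < sizeof(COUNTRY_CODE) / sizeof(COUNTRY_CODE[0]))
            result.countryCode = COUNTRY_CODE[countryIndex];
        if (countryIndex < sizeof(COUNTRY_NAME) / sizeof(COUNTRY_NAME[0]))
            result.countryName = COUNTRY_NAME[countryIndex];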
        buffPtr += 1;
 
        stringLen = strchr(buffPtr, '\0') - buffPtr;
        result.region = std::string(buffPtr, stringLen);
        buffPtr += stringLen + 1;
 
        stringLen = strchr(buffPtr, '\0') - buffPtr;
        result.city = std::string(buffPtr, stringLen);
        buffPtr += stringLen + 1;
 
        stringLen = strchr(buffPtr, '\0') - buffPtr;
        result.postalCode = std::string(buffPtr, stringLen);
        buffPtr += stringLen + 1;
 
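        // Coordinates are 3-byte little-endian integers. Each byte must be
        // read as unsigned: plain char is signed under MSVC, and a
        // sign-extended byte of 0x80 or above would corrupt the total.
        unsigned int lat = 0;
        for (int i = 0; i < 3; i ++, buffPtr ++)
            lat += (unsigned int)(unsigned char)*buffPtr << (i * 8);
        result.latitude = (float)lat / 10000 - 180;
 
        unsigned int lon = 0;
        for (int i = 0; i < 3; i ++, buffPtr ++)
            lon += (unsigned int)(unsigned char)*buffPtr << (i * 8);
        result.longitude = (float)lon / 10000 - 180;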
 
        return result;
    }
 
private:
    int SeekCountry(unsigned int ip)
    {
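        unsigned char buffer[2 * MAX_RECORD_LENGTH] = { 0 };
        int offset = 0;
        int x[2];
 
        for (int depth = 31; depth >= 0; depth --)
        {
            // A read that ran past the end of the file on an earlier lookup
            // leaves the stream in a failed state, so reset it before seeking.
            dataFile_.clear();
            dataFile_.seekg(2 * recordLength_ * offset);
            dataFile_.read((char *)&buffer[0], 2 * MAX_RECORD_LENGTH);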
 
            for (unsigned int i = 0; i < 2; i ++)
            {
                x[i] = 0;
                for (unsigned int j = 0; j < recordLength_; j ++)
                {
                    // buffer holds unsigned bytes, so the value is never
                    // negative and needs no sign fix-up.
                    x[i] += (int)buffer[i * recordLength_ + j] << (j * 8);
                }
            }
 
            if ((ip & (1u << depth)) != 0)
            {
                if (x[1] >= segmentCount_)
                    return x[1];
                offset = x[1];
            }
            else
            {
                if (x[0] >= segmentCount_)
                    return x[0];
                offset = x[0];
            }
        }
 
        return -1;
    }
 
    std::ifstream dataFile_;
    unsigned int segmentCount_;
    unsigned int recordLength_;
};
 
#endif // COLOURBLIND_GEOIP_H
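
The two fiddliest bits of arithmetic in there are turning a dotted-quad string into a 32-bit number and unpacking the 3-byte coordinates, so here is a rough sketch of both as standalone functions. ParseIPv4 and DecodeCoordinate are names I've made up for illustration and are not part of the header; unlike the atoi version above, this parser rejects input it can't make sense of. The coordinate encoding is simply the inverse of the "divide by 10000, subtract 180" step in Lookup.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not in the header above): parses "a.b.c.d" into a
// 32-bit value with the first octet in the top byte. Returns false if it
// cannot read four dot-separated octets in the range 0-255, or if anything
// trails the last octet.
inline bool ParseIPv4(const std::string &text, unsigned int &ip)
{
    std::istringstream in(text);
    unsigned int value = 0;
    for (int i = 0; i < 4; i ++)
    {
        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255)
            return false;
        if (i < 3 && in.get() != '.')
            return false;
        value = (value << 8) | (unsigned int)octet;
    }
    if (in.peek() != std::char_traits<char>::eof())
        return false;
    ip = value;
    return true;
}

// Hypothetical helper: decodes one coordinate stored as a 3-byte
// little-endian integer holding (degrees + 180) * 10000.
inline double DecodeCoordinate(const unsigned char *bytes)
{
    unsigned int raw = (unsigned int)bytes[0]
                     | ((unsigned int)bytes[1] << 8)
                     | ((unsigned int)bytes[2] << 16);
    return raw / 10000.0 - 180.0;
}
```

As a sanity check, "192.168.1.10" should come out as 0xC0A8010A, and the bytes F8 52 23 (2315000 as a little-endian integer) should decode to 51.5 degrees.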

The parser is pretty hacky and has only been tested on the free GeoIP City data, but it should get you up and running a little quicker.
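
If you'd like to poke at the tree walk from SeekCountry without a data file to hand, something along these lines should work on an in-memory buffer. It's only a sketch: ReadLE24, AppendNode and WalkTree are hypothetical names of my own, and it assumes the same layout the header relies on, namely nodes made of two 3-byte little-endian records where any record at or above the segment count is a leaf.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reads a 3-byte little-endian unsigned integer.
inline unsigned int ReadLE24(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8) | ((unsigned int)p[2] << 16);
}

// Hypothetical helper for building test data: appends one tree node, i.e. a
// left record followed by a right record, three bytes each.
inline void AppendNode(std::vector<unsigned char> &tree, unsigned int left, unsigned int right)
{
    const unsigned int records[2] = { left, right };
    for (int r = 0; r < 2; r ++)
        for (int b = 0; b < 3; b ++)
            tree.push_back((unsigned char)((records[r] >> (8 * b)) & 0xFF));
}

// Walks the tree from the most significant bit of ip downwards. A record
// below segmentCount is the index of the next node; anything else is a leaf
// and is returned as-is. Returns -1 if the walk runs off the end of the data
// or never reaches a leaf.
inline long WalkTree(const std::vector<unsigned char> &tree, unsigned int segmentCount, unsigned int ip)
{
    const std::size_t nodeSize = 6;
    std::size_t node = 0;
    for (int depth = 31; depth >= 0; depth --)
    {
        const std::size_t base = node * nodeSize;
        if (base + nodeSize > tree.size())
            return -1;
        const bool bitSet = (ip & (1u << depth)) != 0;
        const unsigned int next = ReadLE24(&tree[base + (bitSet ? 3 : 0)]);
        if (next >= segmentCount)
            return (long)next;
        node = next;
    }
    return -1;
}
```

For example, build a two-node tree with AppendNode(tree, 1, 100) followed by AppendNode(tree, 200, 300) and walk it with a segment count of 2. An address with the top bit set (0x80000000) takes the right branch of the first node and returns 100, while 0x00000000 goes left to the second node and returns 200.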