IIS Logfiles Go In - SQLite Databases Come Out

Posted by Tom on 2014-03-31 22:14

You know shit's gotten real when you need to start trawling through IIS logs. After some flailing around in Excel I got frustrated and hacked up some quick code to parse the log file and turn it into an SQLite database.

import sys
import sqlite3 

def go(in_file, out_filename):
    con = sqlite3.connect(out_filename)
    cur = con.cursor()
    fields = []
    current_table_index = 0

    for line in in_file:
        if line.endswith('\n'):
            line = line[:-1]
        if line[0] == '#':
            if line[:9] == '#Fields: ':
                # parse field list
                new_fields = line[9:].split(' ')
                new_fields = filter(lambda x: len(x) > 0, new_fields)
                # different layout for this section? create a new table
                if not fields == new_fields:
                    current_table_index = current_table_index + 1
                    fields = new_fields

                    fields_sql = ', '.join(map(lambda x: "'" + x + "' text", fields))
                    create_table_sql = 'CREATE TABLE iis_log{0} ({1})'.format(current_table_index, fields_sql)
            values = line.split(' ')
            values_sql = ', '.join(map(lambda x: 'NULL' if x == '-' else "'" + x + "'", values))
            cur.execute('INSERT INTO iis_log{0} VALUES ({1})'.format(current_table_index, values_sql))


if __name__ == '__main__':
    in_filename = sys.argv[1]
    out_filename = in_filename + '.db' if len(sys.argv) < 3 else sys.argv[2]
    in_file = open(in_filename, 'r')
    go(in_file, out_filename)

If the code encounters a new #Fields line on its travels it will compare it to the previous list of fields and create a new table if they're different. I don't know if that's actually possible - maybe if someone changes the log format and restarts the app pool? - but it couldn't hurt to cover the possibility. Beyond that it's as done as it needs to be. The fact that everything is stringly typed is not as much of an issue as it could be since the dates and times are in a rational format and sort naturally as strings.