July 09, 2017
After spending over $300 on AWS g2 instances last summer, I recently upgraded my own GPU so that I would not have to pay as much to run GPU-intensive machine learning code over the course of several weeks.
To keep track of how long my GPU runs take, I created a Python file, gpu_rec.py, that can be imported at the beginning of a script. On import it records the start time and the command used to invoke the script, and when the program exits it records the end time. Apart from the import, the scripts being run need no changes.
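The mechanism relies on two standard Python behaviors: module-level code runs once, when the module is first imported, and functions passed to atexit.register run when the interpreter exits. A minimal sketch of just that mechanism, assuming you only want the result printed rather than stored in a database, could look like this. The names START_TIME, COMMAND, elapsed_minutes and report_run are illustrative and are not taken from the original module.

```python
import atexit
import datetime
import sys

# Placed at the top level of a module, these lines run once, at import time.
START_TIME = datetime.datetime.now()
COMMAND = ' '.join(sys.argv)


def elapsed_minutes(start, end):
    """Return the wall-clock minutes between two datetimes."""
    return (end - start).total_seconds() / 60.0


@atexit.register
def report_run():
    # Called on normal interpreter exit (including sys.exit() and uncaught
    # exceptions), but not if the process is killed by an unhandled signal
    # or exits through os._exit().
    end = datetime.datetime.now()
    print('Command: %s' % COMMAND)
    print('TOTAL MINUTES: %.02f' % elapsed_minutes(START_TIME, end))
```

Because atexit.register returns the function it is given, it can be used as a decorator, as here.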
I used PostgreSQL to log the runs, but any form of logging would work. I symlinked this file into /usr/local/lib/python2.7/dist-packages/ so that it can be imported by any Python script I run from the command line. This will not work if import gpu_rec is executed outside a script launched from the command line, such as in an interactive interpreter session, because the module reads the script name from __main__.__file__, which does not exist there.
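Since any form of logging would do, here is a sketch of the same two-step bookkeeping (insert a row when the run starts, update it when the run ends) using Python's built-in sqlite3 module instead of PostgreSQL. The helper names open_log, start_run and finish_run and the database file name are hypothetical; the columns mirror the gpu table in the module. SQLite has no TIMESTAMPTZ type, so this sketch stores times as ISO 8601 strings.

```python
import datetime
import sqlite3


def open_log(path='gpu_runs.sqlite3'):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS gpu(
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            app_filename TEXT,
            timestamp_start TEXT,
            timestamp_end TEXT,
            total_minutes REAL,
            command_line_args TEXT
        )""")
    conn.commit()
    return conn


def start_run(conn, filename, args, start):
    """Insert a row for a new run and return its id."""
    cur = conn.execute(
        "INSERT INTO gpu(app_filename, timestamp_start, command_line_args) "
        "VALUES (?, ?, ?)",
        (filename, start.isoformat(), ' '.join(args)))
    conn.commit()
    # lastrowid is the id of the row this cursor just inserted, so
    # concurrent runs cannot pick up each other's ids.
    return cur.lastrowid


def finish_run(conn, run_id, start, end):
    """Record the end time and duration of a run; return the minutes."""
    minutes = (end - start).total_seconds() / 60.0
    conn.execute(
        "UPDATE gpu SET timestamp_end = ?, total_minutes = ? WHERE id = ?",
        (end.isoformat(), minutes, run_id))
    conn.commit()
    return minutes
```

In a module like gpu_rec.py, start_run would be called at import time and finish_run from an atexit-registered function.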
```python
from __future__ import print_function  # only needed on Python 2

import atexit
import datetime
import sys

import __main__ as main
import psycopg2  # PostgreSQL driver; replace it if you log somewhere else
from pytz import timezone

tz = timezone('US/Central')


def get_current_time():
    # assumes the machine's local clock is set to US Central time
    return tz.localize(datetime.datetime.now())


fn = main.__file__
start_time = get_current_time()

# set up your own database and user if you want to use PostgreSQL as it is implemented here
conn = psycopg2.connect(database='gpu_db', host='localhost',
                        user='gpu_recorder', password='gpu')
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS gpu(
        id BIGSERIAL,
        app_filename VARCHAR(256),
        timestamp_start TIMESTAMPTZ,
        timestamp_end TIMESTAMPTZ,
        total_minutes DOUBLE PRECISION,
        command_line_args TEXT
    )""")
conn.commit()

# register the initial event; RETURNING id yields this run's own id as a
# single value (SELECT max(id) with fetchall() returned a list of tuples and
# could pick up a row inserted by another run)
cur.execute("""INSERT INTO gpu(app_filename, timestamp_start, command_line_args)
               VALUES (%s, %s, %s) RETURNING id""",
            (fn, start_time, ' '.join(sys.argv)))
gpu_run_id = cur.fetchone()[0]
conn.commit()


@atexit.register
def func_atexit():
    # runs when the importing script exits: store the end time and duration
    now = get_current_time()
    difference = (now - start_time).total_seconds() / 60.
    cur.execute("""UPDATE gpu SET timestamp_end=%s, total_minutes=%s
                   WHERE id=%s""",
                (now, difference, gpu_run_id))
    conn.commit()
    conn.close()
    print('Finished GPU run #%s, from %s to %s' %
          (gpu_run_id, start_time.strftime('%x %X'), now.strftime('%x %X')))
    print('TOTAL MINUTES: %.02f' % difference)
```
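The module reads main.__file__ at import time, so importing it from an interactive interpreter session raises AttributeError. One possible variation, not part of the original code, is to fall back to a placeholder name; script_name is a hypothetical helper that could replace the line fn = main.__file__.

```python
import __main__ as main


def script_name(module=main):
    """Return the running script's path, or '<interactive>' if there is none."""
    # An interactive interpreter's __main__ module has no __file__ attribute,
    # so getattr returns the default None and the placeholder is used.
    return getattr(module, '__file__', None) or '<interactive>'
```

With this change, runs started interactively would be logged under '<interactive>' instead of crashing the import.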