Wednesday, March 18, 2015

Measuring Time

My previous post, on sychronized tasks, started me thinking - for long running processes, it would be nice if we had some way of estimating how much time is required or what time a process might complete. Given that most of my development is in Bash, I cobbled together a tool that will do some rough calculations and give that information. In my sample version (shown below) a few of the variables are hard-coded, but if you set up process logging, you should be able to pull most of your information from there. On to the code...

REC_TOTALD="100000"
STARTED=$(date -d '2015-03-17 17:30:00' '+%F %T.%N %Z')

diff () {
    printf '%s' $(( $(date -u -d"$CURRENT" +%s) -
        $(date -u -d"$STARTED" +%s)))
}

REC_FAILED=`grep FAILURE output.log | wc -l`
REC_PASSED=`grep SUCCESS output.log | wc -l`
REC_COMPLT=$((REC_FAILED + REC_PASSED))
REC_PERCNT=$((REC_COMPLT * 10000 / REC_TOTALD)) # percent as 4 decimals

CURRENT=$(date '+%F %T.%N %Z')
MINUTES=$(( $(diff) / 60 ))
EST_TIME=$((( $(diff) * 10000 / 60 ) / REC_PERCNT))
ETA=$(date -d "$STARTED+$EST_TIME minutes" '+%F %T')

echo ""
echo "   STARTED: $(date -d "$STARTED" '+%F %T')"
echo "   CURRENT: $(date -d "$CURRENT" '+%F %T')"
echo ""
printf "    QUEUED: %6d records\n" $REC_TOTALD
printf "    FAILED: %6d\n" $REC_FAILED
printf "    PASSED: %6d\n" $REC_PASSED
printf " COMPLETED: %6d\n" $REC_COMPLT
echo ""
echo "   PERCENT: $((REC_PERCNT / 100)).$((REC_PERCNT - ((REC_PERCNT / 100) * 100)))%"
echo "   ELAPSED: $((MINUTES / 60)):$((MINUTES % 60))"
echo "       ETR: $((EST_TIME / 60)):$((EST_TIME % 60))"
echo "       ETA: $ETA"
echo ""

Now a little explanation so you know what I'm doing here. First, I'm setting the number of items I'm going to process - in this case 100K. When each item in the queue is processed, I log its completion with a status that says either SUCCESS or FAILURE. The simple mathematics need no explanation, but I should note that since I'm writing a shell script that it doesn't do floating point, so everything is multiplied by a large enough number to make my precision available in integer form. This will basically output something like this...

   STARTED: 2015-03-17 07:30:00
   CURRENT: 2015-03-18 17:00:00

    QUEUED: 100000 records
    FAILED:    100
    PASSED:  99000
 COMPLETED:  99100

   PERCENT: 99.10%
   ELAPSED: 23:30
       ETR: 00:43
       ETA: 2015-03-18 17:43:00

Simple and relatively precise. Of course it's operating under the assumption that each action takes roughly the same amount of time...which it may or may not. There are a lot of variables that go into that assumption - CPU load being one of the most important.

If you decide to use this, feel free to clean up the script - you may want to use printf for the ELASPED and ETR output to zero-pad the minutes and seconds, for example. You might also want to extend it by adding the ability to pass a command-line argument that contains the process name and have it look up the process, get the elapsed time and calculate the start time using something like ps -eo pid,cmd,etime and the diff function that's in the example - of course you'd want to modify the function too so that it will accept parameters and then modify the diff calls in the remainder of the example to pass in the parameters...and of course you'd want to be able to pass in the number of records somehow - maybe by passing in a process log file that has the number of items in the queue...and possibly contains the start time - or you could read the create time of the log file, either negating the need to pull the time using ps. The possibilities abound for crafting a tool that is a nearly perfect fit for your situation.

Once you've built your tool, you can then use it to keep everyone up-to-date on just when you believe the process will be done. They can then be free to go about their business and not bother looking for an email or checking in - all of that reduces multi-tasking, which we all know is a good thing.

Happy coding.

No comments:

Post a Comment