Let's say you work for a company that uses Splunk. While I think Splunk is a wonderful tool, sometimes it doesn't give you the data you need, or the data isn't in the form you need. To make matters a little more complicated, the Splunk interface can be a little arcane. There are a couple of things you can do in this situation: you can train all your engineers to interface with Splunk, or you can write REST APIs that act as middleware and are geared directly to your business needs. Just so there is no doubt about my position, I will be clear and say that the middleware route is the route to choose.
Let's take, as a hypothetical, that I run a content-delivery company and want to show throughput information for cities around the world. Splunk is fed information from network operations, and now I have to pull it out to display it. The problem is that the data is unlikely to be in the form I need: the feed is most likely raw, and the data is probably being used for other purposes as well. Splunk can slice and dice the data and can give you a range of formats (XML and JSON, for example), but your front-end engineers are not terribly likely to want to parse all the data that comes back. It also doesn't make sense to do much parsing in the user agent, for a variety of reasons.
Given our hypothetical situation and my best advice based on direct experience, you create a Node.js server app using the Splunk SDK (by the way, don't try to create an entire Splunk middleware app without using the SDK; it's just not worth it), and in a few relatively short lines you're able to run the search you want against the Splunk data. Great, so now what do you do with the results? There's a ton of data in there that you don't need, and a few pieces that you do need are missing.
Since you're using Node.js, you'll want to create a JS object (so you can send JSON back in the server responses) and populate it using the data you got back from Splunk. One of the problems you'll likely run into is that when you're populating your object, you're referencing data that is deep within the source (Splunk) object. This is a fairly common problem, and one that will come up pretty consistently, especially when transforming from XML to a JS object: an XML Schema allows for minOccurs="0", but JavaScript isn't quite so forgiving when an element is absent. Let's say you have code that looks something like Example 1 in your transformation method.
Example 1
data.push({
    country:splunk[index1].location.country,
    city:splunk[index1].location.city,
    bps:(splunk[index1].event[index2].packet.transferred*splunk[index1].event[index2].packet.size)/
        (splunk[index1].event[index2].durations.dns +
         splunk[index1].event[index2].durations.connect +
         splunk[index1].event[index2].durations.latency),
    loss:splunk[index1].event[index2].packet.lost*splunk[index1].event[index2].packet.size
});
The code will run without a problem as long as all of those properties are in the array element referenced by splunk[index1], but what if the event[index2] element doesn't contain the durations property because it's a different kind of event, for example? I'll tell you what will happen: a TypeError will be thrown when the code attempts to read a sub-property (e.g., dns, connect, or latency) of a property that is null or undefined.
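To see the failure in isolation, here is a minimal sketch. The record below is made-up sample data shaped like the object in Example 1, and totalDuration is a hypothetical helper, not part of any Splunk API.

```javascript
// Made-up record shaped like the one in Example 1; the second event
// deliberately has no `durations` property.
var record = {
    location: { country: 'US', city: 'Denver' },
    event: [
        { packet: { transferred: 10, size: 1500, lost: 1 },
          durations: { dns: 1, connect: 2, latency: 3 } },
        { packet: { transferred: 4, size: 1500, lost: 0 } }
    ]
};

// Hypothetical helper that reads the nested properties directly.
function totalDuration(evt) {
    return evt.durations.dns + evt.durations.connect + evt.durations.latency;
}

var ok = totalDuration(record.event[0]); // 1 + 2 + 3
var failure = null;
try {
    totalDuration(record.event[1]);      // evt.durations is undefined here
} catch (e) {
    failure = e;
}

console.log(ok);
console.log(failure instanceof TypeError);
```

The first event works; the second never produces a value at all, so one incomplete event is enough to abort the whole transformation unless every access is guarded.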
How do you fix this without using an ever-increasing if statement...something that begins to look like Example 2?
Example 2
if (splunk[index1] && splunk[index1].location && splunk[index1].location.country && splunk[index1].location.city && ... splunk[index1].event[index2].durations) {
Here's how: use a function to wrap object references.
Example 3
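/**
 * Returns the data in an object property using a dot-notation expression.
 * @param {string} path  dot-notation path, e.g. 'event[0].packet.size'
 * @param {Object} jso   the object to read from
 * @return {*} the value at path, or null if any step along the way is missing
 */
function deepDive(path, jso) {
    var index
      , keys = path.split('.')
      , n
      , obj = jso
      ;
    for (index = 0; obj !== null && obj !== undefined && index < keys.length; index += 1) {
        n = /\[(\d{1,})\]/.exec(keys[index]);
        if (n) {
            // key looks like "name[3]": read "name" first, then element 3
            obj = obj[keys[index].replace(n[0], '')];
            if (obj && obj.length > n[1]) {
                obj = obj[n[1]];
            } else {
                // invalid array reference
                obj = null;
            }
        } else {
            obj = obj[keys[index]];
        }
    }
    // only a missing value becomes null; legitimate 0, '' and false are returned as-is
    return (obj === undefined ? null : obj);
}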
This will allow you to set properties in your object using dot-notation without throwing errors when a child of a null or undefined property is referenced, and it does so without adding much clutter to your code.
Example 4
data.push({
    country:deepDive('location.country', splunk[index1]),
    city:deepDive('location.city', splunk[index1]),
    bps:((deepDive('event['+index2+'].packet.transferred', splunk[index1]) || 0) *
         (deepDive('event['+index2+'].packet.size', splunk[index1]) || 0)) /
        Math.max(((deepDive('event['+index2+'].durations.dns', splunk[index1]) || 0) +
                  (deepDive('event['+index2+'].durations.connect', splunk[index1]) || 0) +
                  (deepDive('event['+index2+'].durations.latency', splunk[index1]) || 0)), 1),
    loss:(deepDive('event['+index2+'].packet.lost', splunk[index1]) || 0) *
         (deepDive('event['+index2+'].packet.size', splunk[index1]) || 0)
});
By using the deepDive method shown in Example 3, you've made your code more robust and fault tolerant, and avoided the bends.
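If you'd like to try the approach end to end, the sketch below runs on its own. It repeats a condensed copy of the Example 3 logic so nothing else is needed; the sample record and the toRow helper are assumptions made up for illustration, not real Splunk output.

```javascript
// Condensed copy of the Example 3 logic so this sketch is self-contained.
function deepDive(path, jso) {
    var index, n, obj = jso, keys = path.split('.');
    for (index = 0; obj && index < keys.length; index += 1) {
        n = /\[(\d{1,})\]/.exec(keys[index]);
        if (n) {
            obj = obj[keys[index].replace(n[0], '')];
            obj = (obj && obj.length > n[1]) ? obj[n[1]] : null;
        } else {
            obj = obj[keys[index]];
        }
    }
    return (obj || null);
}

// Made-up sample data; the second event has no `durations`.
var record = {
    location: { country: 'US', city: 'Denver' },
    event: [
        { packet: { transferred: 10, size: 1500, lost: 1 },
          durations: { dns: 1, connect: 2, latency: 3 } },
        { packet: { transferred: 4, size: 1500, lost: 0 } }
    ]
};

// Hypothetical helper wrapping the Example 4 expression for one event.
function toRow(rec, i) {
    var evt = 'event[' + i + ']';
    var size = deepDive(evt + '.packet.size', rec) || 0;
    var time = (deepDive(evt + '.durations.dns', rec) || 0) +
               (deepDive(evt + '.durations.connect', rec) || 0) +
               (deepDive(evt + '.durations.latency', rec) || 0);
    return {
        country: deepDive('location.country', rec),
        city: deepDive('location.city', rec),
        bps: ((deepDive(evt + '.packet.transferred', rec) || 0) * size) / Math.max(time, 1),
        loss: (deepDive(evt + '.packet.lost', rec) || 0) * size
    };
}

var full = toRow(record, 0);          // bps 2500, loss 1500
var partial = toRow(record, 1);       // no durations: divisor falls back to 1
var empty = toRow({ event: [] }, 0);  // nothing there: nulls and zeros, no throw

console.log(full, partial, empty);
```

Note the Math.max(..., 1) guard carried over from Example 4: when the durations are missing, the divisor becomes 1 instead of 0, so the result is a finite number rather than Infinity or NaN. Whether that fallback is the right business answer is a separate question.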
One last note: if you'd rather use simple XPath notation than dot notation, make three changes to Example 3. Change keys = path.split('.') to keys = path.split('/'), change obj.length > n[1] to obj.length > n[1] - 1, and change obj = obj[n[1]]; to obj = obj[n[1] - 1];. The last two changes are needed because JavaScript arrays are zero-based and XPath positions are one-based. This will not allow you to use the full XPath syntax (which you can read about online), but it will allow simple references. Also, you should feel free to build out the remaining syntax parsing using this approach, and if you do, I would appreciate it if you would post a comment here showing the changes you made.
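Put together, the XPath-style variant might look like the sketch below. The name deepDiveXPath and the sample record are made up for illustration, and paths are assumed to be relative, with no leading slash.

```javascript
// XPath-flavored variant of the Example 3 function: "/" separates steps and
// positions in [n] are one-based. Only simple relative paths are supported.
function deepDiveXPath(path, jso) {
    var index, n, obj = jso, keys = path.split('/');
    for (index = 0; obj && index < keys.length; index += 1) {
        n = /\[(\d{1,})\]/.exec(keys[index]);
        if (n) {
            obj = obj[keys[index].replace(n[0], '')];
            if (obj && obj.length > n[1] - 1) {
                obj = obj[n[1] - 1];   // shift the one-based position to a zero-based index
            } else {
                obj = null;            // invalid array reference
            }
        } else {
            obj = obj[keys[index]];
        }
    }
    return (obj || null);
}

// Made-up sample data.
var record = {
    location: { country: 'US', city: 'Denver' },
    event: [
        { packet: { size: 1500 }, durations: { dns: 1 } },
        { packet: { size: 900 } }
    ]
};

var firstSize = deepDiveXPath('event[1]/packet/size', record);   // first event
var secondSize = deepDiveXPath('event[2]/packet/size', record);  // second event
var missing = deepDiveXPath('event[2]/durations/dns', record);   // no durations there
var outOfRange = deepDiveXPath('event[3]/packet/size', record);  // only two events

console.log(firstSize, secondSize, missing, outOfRange);
```

A position of [0] is not valid in XPath; in this sketch it simply resolves to null rather than throwing.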
Happy coding.