Find an Answer
MERGE statements are a hybrid tool for inserting or updating information to the database in situations where GoToAssist needs to make an intelligent decision about which data to insert, which to update (replace), and which data to leave unchanged. The MERGE statement identifies the node or nodes in question, and then looks for it. If it finds it, it adds any missing information, or updates any obsolete information. If it doesn't find the node, it creates it. You can use MERGE statements to add nodes that were not originally discovered.
Think of a MERGE statement as a tree of nested condition statements. Each condition statement identifies a node or a set of nodes that match the condition. Going down the tree, condition statements are applied one by one. After each condition statement is satisfied, the children of that condition statement are then applied, making the statement more and more specific.
Any single condition statement may of course match more than one node. Make sure that you construct your MERGE statement with that in mind. If you want to merge data into a single node in your PQL database, make sure that your condition statements identify only that node and filter out all others.
Caution: Be careful using MERGE statements. Because a MERGE statement replaces existing data with new data, it's possible to accidently delete or overwrite existing data by using such innocuous-looking statements as these two:
MERGE INTO / VALUES {};
MERGE @'timestamp' INTO / VALUES {};
The possible elements of an MERGE statement include:
pql_statement ::= MERGE [INTO] [@timestamp] path VALUES conditional_tree_literal timestamp ::= 'yyyy-mm-ddThh:mm:dd.uuuuuuZ'
conditional_tree_literal ::=
conditional_tree_assignment ::= {conditional_tree_assignment [,conditional_tree_assignment]}
identifier[boolean_expression] => expression
| identifier[boolean_expression] => tree_literal
| identifier => expression
| identifier => tree_literal
It is important to understand when and why the MERGE statement updates (replaces) data, and when it simply inserts data, because this statement can accidently delete your entire tree if used incorrectly.
The MERGE rules are as follows:
If there are no conditions on a node because the Crawler is reporting all known information about the node, then MERGE updates (replaces), rather than inserts this information. If the server finds no conditions, it replaces all existing nodes that have the same name. You can avoid Rule 1 by adding a condition to exclude a particular node from this rule, where the node has no other conditions to distinguish it such as the network node, by adding a TRUE condition.
The elements of an MERGE statement are defined as follows:
| MERGE [INTO] |
The MERGE statement is a hybrid between INSERT and UPDATE, for situations where the GoToAssist Crawler discovers new data and needs to make intelligent decisions about how to merge it with existing data. For example, if either the new or the old data is incomplete, then the server must decide which data to insert, which data to replace or update, and which data to keep unchanged. The INTO keyword is optional. The MERGE clause must include a path to indicate the location of the data to be added or replaced, and can include a [@timestamp] option to add or replace history. A path can be represented by an identifier or a boolean_expression. The [@timestamp] option updates history, and is represented by the string: 'yyyy-mm-ddThh:mm:dd.uuuuuuZ', where uuuuuu is the fraction of a second in microseconds. For example: 2007-09-02T16:14:52.171176Z . | |
| VALUES |
Indicates the data to be added or replaced, and must be specified with a conditional_tree_literal. A conditional_tree_literal is one or more conditional_tree_assignments. A conditional_tree_assignment can be any tree_assignment with additional conditions. |
You can try the following examples yourself against fictitious data in the Query Sandbox:
Suppose you want to add some information to the database, but are not sure whether it is actually missing or not. Some of the data is new, and some must update (replace) existing data. You could write an MERGE statement that specifies that where b is 2, a should be inserted, and where b is not 2, b should be updated to 3, like this:
MERGE INTO / values {
a[b='2'] => {
b => '3'
}
}
This statement starts with the MERGE INTO keywords, indicating your intention to insert and/or update data. Next is a slash / to represent the root node where you are taking action. The VALUES keyword follows, and then the specific values themselves.
The GoToAssist Crawler specifies information to the server in the form of a complete tree. The server then merges the data with the existing forest. The new trees include conditional paths, and the conditions specify how they should merge with the existing forest.
The resulting output might look something like this:
row:
*:
a:
b: 3
It might be helpful to compare a MERGE statement with a similar INSERT statement. The following INSERT statement simply inserts the {a => 'b'} values into the root:
INSERT INTO / values { MERGE INTO / values {
a => 'b' a[b = '2'] => {
} b => '3'
}
}
The MERGE statement inserts and/or overrides data, and the conditional path: {a[b = '2'] => {b => '3'}} indicates what should be inserted and what should be updated. In this case, where b = 2, a is inserted, and where b != 2, b should be updated to 3.
The resulting output is the same as in Example 1.
Let's use a more realistic example. Suppose the GoToAssist Crawler discovers a new device, and that it already has information about one of the interfaces on that device. The Crawler sends a statement like this to the Search Index:
MERGE INTO / values {
network[TRUE] => {
device[interface/mac_address = '00:00' or
interface/ip_address = '1.2.3.4'] => {
interface => {
name => 'eth0',
mac_address => '00:B1:B2:B3:B4:B5',
ip_address => '10.10.10.40'
}
}
}
}
The device node has a condition that specifies which existing node should be updated. As we know from Rule 2, if the condition does not match any existing nodes, then a new node is created.
The interface node has no conditions. As we know from Rule 1, nodes without conditions are updated (completely replaced). If the server finds no conditions, it replaces all existing nodes with that name.
Note that the network node has a condition of [TRUE]. That is because the top-level network node normally has no conditions. Rule 1 completely replaces nodes without conditions. So [TRUE] is used as an exception to Rule 1. If we do not use this exception, every update would replace the network node, which is equivalent to replacing the entire tree.
If this statement is run on the sample database, it produces the following results:
row:
*:
network:
device:
interface:
ip_address: 10.10.10.40
mac_address: 00:B1:B2:B3:B4:B5
name: eth0
In this example, the Crawler reports the set of running processes for a device:
MERGE INTO / values {
network[TRUE] => {
device[uuid = '1'] => {
running_processes => {
win32_process => 'foo',
win32_process => 'bar'
}
}
}
}
Note that the network node has a condition of [TRUE] to avoid (Rule 1) replacing the entire tree. This statement produces the following results:
row:
*:
network:
device:
running_processes:
win32_process: foo
win32_process: bar
In this example, the free disk space is updated, but only for the disk called '/dev/hda':
MERGE INTO / values {
network[TRUE] => {
device[uuid = '1'] => {
disk[name = '/dev/hda'] => {
free_space => 10000
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
disk:
free_space: 10000
For nodes that are never replaced, such as log messages, we need a way to force Rule 2 and ensure new node creation. You can do that by adding a [FALSE] condition so that new nodes are added instead of being updated (replaced). For that case, you can use:
MERGE INTO / values {
log[FALSE] => {
message => 'Service started',
timestamp => '1/2/07'
}
}
This statement would produces results like this:
row:
*:
log:
message: Service started
timestamp: 1/2/07
This example merges historical values by adding a timestamp. Note that the timestamp must be in single quotes:
'2008-03-21T15:50:01.528935Z'.
Note the use of an indented format to write a good MERGE statement that can be read by human eyes. The first 5 lines of the following more complex statement are boilerplate. They identify the node and its location. From the sixth line and below, the details about that node are listed:
Note: GoToAssist timestamps are in coordinated universal time (UTC/GMT). When you view your data in your GoToAssist Web account, GoToAssist converts your data to your local time. But when you retrieve timestamps from your database as results to your searches or PQL queries, it is in UTC.
MERGE@'2008-03-21T15:50:01.528935Z' into / values {
network[true] => {
device[interface/mac_address='00:13:20:40:9A:9E'] => {
system[true] => {
meminfo[true] => {
SwapTotal[true] => '3997474816',
SwapFree[true] => '3548553216',
MemFree[true] => '40374272',
HighTotal[true] => '1198030848',
Committed_AS[true] => '1972723712',
SUnreclaim[true] => '19660800',
NFS_Unstable[true] => '0',
VmallocChunk[true] => '64999424',
Writeback[true] => '0',
MemTotal[true] => '2116120576',
VmallocUsed[true] => '47677440',
AnonPages[true] => '1147985920',
Active[true] => '1293049856',
CommitLimit[true] => '5055533056',
Cached[true] => '475234304',
SwapCached[true] => '247386112',
LowTotal[true] => '918089728',
Dirty[true] => '24576',
Mapped[true] => '58638336',
HighFree[true] => '30834688',
SReclaimable[true] => '64770048',
VmallocTotal[true] => '117432320',
Bounce[true] => '0',
Inactive[true] => '676634624',
PageTables[true] => '3608576',
Slab[true] => '84430848',
Buffers[true] => '346124288',
LowFree[true] => '9539584'
}
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
system:
meminfo:
lowfree: 9539584
buffers: 346124288
slab: 84430848
pagetables: 3608576
inactive: 676634624
bounce: 0
vmalloctotal: 117432320
sreclaimable: 64770048
highfree: 30834688
mapped: 58638336
dirty: 24576
lowtotal: 918089728
swapcached: 247386112
cached: 475234304
commitlimit: 5055533056
active: 1293049856
anonpages: 1147985920
vmallocused: 47677440
memtotal: 2116120576
writeback: 0
vmallocchunk: 64999424
nfs_unstable: 0
sunreclaim: 19660800
committed_as: 1972723712
hightotal: 1198030848
memfree: 40374272
swapfree: 3548553216
swaptotal: 3997474816
This example merges device data.
MERGE INTO / values {
network[true] => {
device[interface/mac_address = '00:E0:81:58:FE:CA'
or interface/inet/ip_address = '50.10.33.64'] => {
interface[mac_address = '00:0C:29:93:CE:EE' or
inet/ip_address = '10.10.20.27'] => {
mac_address[true] => '00:0C:29:93:CE:EE',
inet[ip_address = '10.10.20.27'] => {
last_seen[true] => '2008-03-19T18:42:10.0000000Z'
}
}
}
}
}
This statement produces the following results:
row:
row
*
network
device
interface
inet
last_seen 2008-02-19T18:42:10.0000000Z
mac_address 00:0C:29:93:CE:EE
device
interface
inet
last_seen 2008-03-19T18:42:10.0000000Z
mac_address 00:0C:29:93:CE:EE
In reality, your MERGE statements may be far more complex than these examples. To add multiple nodes to your database that were not previously discovered, you may want to use a different MERGE statement for each node. Here's a more complex and more realistic example:
ERGE@'2008-03-21T17:15:12.014103Z' into / values {
network[true] => {
device[interface/mac_address = '00:0C:29:93:CE:EE' or
or interface/inet/ip_address = '10.10.20.27'] => {
interface[mac_address = '00:0C:29:93:CE:EE' or
or inet/ip_address = '10.10.20.27'] => {
mac_address[true] => '00:0C:29:93:CE:EE',
inet[ip_address = '10.10.20.27'] => {
ip_address[true] => '10.10.20.27'
}
},
system[true] => {
dns_name[true] => 'maruichi'
},
evidence => {
dhcp[false] => {
value[true] => {
vendor_class[true] => 'MSFT 5.0'
},
value[true] => {
host_name[true] => 'maruichi'
},
value[true] => {
param_request_list[true] =>
'1,15,3,6,44,46,47,31,33,43,252'
}
},
dns_name[false] => {
value[true] => {
value[true] => 'maruichi'}
},
ip_address[false] => {
value[true] => {
value[true] => '10.10.20.27'
}
},
mac_address[false] => {
value[true] => {
value[true] => '00:0C:29:93:CE:EE'
}
},
probe[false] => {
param[true] => {
technique[true] => 'snmp'
},
param[true] => {
port[true] => '161'
},
param[true] => {
request_label[true] => 'snmp_sysdescr'
},
param[true] => {
request[true] => '1.3.6.1.2.1.1.1.0'
},
value[true] => {
response[true] => 'Hardware: x86 Family 6 Model
15 Stepping 8 AT/AT COMPATIBLE
- Software: Windows 2000
Version 5.0 (Build 2195
Uniprocessor Free)'
}
},
probe[false] => {
param[true] => {
technique[true] => 'snmp'
},
param[true] => {
port[true] => '161'
},
param[true] => {
request_label[true] => 'snmp_sysname'
},
param[true] => {
request[true] => '1.3.6.1.2.1.1.5.0'
},
value[true] => {
response[true] => 'MARUICHI'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_DNS_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_NBNS_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_ARP_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_TCP_PORTS'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_UDP_PORTS'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_SNMP'
},
value[true] => {
value[true] => 'completed'
}
},
subnet[false] => {
value[true] => {
value[true] => '10.10.0.0/16'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '135'
},
value[true] => {
value[true] => 'open'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '139'
},
value[true] => {
value[true] => 'open'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '7'
},
value[true] => {
value[true] => 'open'
}
}
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
interface:
mac_address: 00:0C:29:93:CE:EE
inet:
ip_address: 10.10.20.27
system:
dns_name: maruichi
evidence:
mac_address:
value:
value: 00:0C:29:93:CE:EE
ip_address:
value:
value: 10.10.20.27
tcp_port:
value:
value: open
param:
port: 7
tcp_port:
value:
value: open
param:
port: 139
tcp_port:
value:
value: open
param:
port: 135
subnet:
value:
value: 10.10.0.0/16
scan_status:
value:
value: completed
param:
scan: SCAN_SNMP
scan_status:
value:
value: completed
param:
scan: SCAN_UDP_PORTS
scan_status:
value:
value: completed
param:
scan: SCAN_TCP_PORTS
scan_status:
value:
value: completed
param:
scan: SCAN_ARP_QUERY
scan_status:
value:
value: completed
param:
scan: SCAN_NBNS_QUERY
scan_status:
value:
value: completed
param:
scan: SCAN_DNS_QUERY
probe:
value:
response: MARUICHI
param:
request: 1.3.6.1.2.1.1.5.0
param:
request_label: snmp_sysname
param:
port: 161
param:
technique: snmp
probe:
value:
response: Hardware: x86 Family 6 Model 15
Stepping 8 AT/AT COMPATIBLE -
Software: Windows 2000 Version 5.0
(Build 2195 Uniprocessor Free)
param:
request: 1.3.6.1.2.1.1.1.0
param:
request_label: snmp_sysdescr
param:
port: 161
param:
technique: snmp
dns_name:
value:
value: maruichi
dhcp:
value:
param_request_list: 1,15,3,6,44,46,47,31,33,43,252
value:
host_name: maruichi
value:
vendor_class: MSFT 5.0
You can use Node IDs in MERGE statements, in place of lengthy condition statements in the FROM clause. Many typical MERGE statements include a lengthy path that identifies which node or set of nodes to merge the new data into, such as: MERGE INTO / . . . where the slash / represents a path that roots the first clause of the VALUES tree. Node IDs cut that path down to just a number that uniquely identifies a single node.
For example, say you have 3 alerts in your database and you want to update the first one. To merge data into Alert 1, you must write a MERGE statement that identifies Alert 1, but filters out Alerts 2 and 3. To do this without using Node ID, you'd write a MERGE statement with some conditions that identify Alert 1, like this:
MERGE INTO / VALUES {
apps[true] => {
com[true] => {
alerts[true] =>
{
alert[name='Alert 1'] =>
{name[true] => 'Renamed Alert 1'
}
}
}
}
}
Instead, you can use the Node ID of Alert 1 to identify it. To find the Node ID of any particular node, run a query like this:
Query: What is the Node ID of Alert 1?
SELECT /#id FROM /apps/com/paglo/alerts/alert WHERE name = 'Alert 1'
Result: The result indicates that 10357496 is the Node ID of Alert 1.
row id(): 10357496
Now you can update Alert 1 with a streamlined MERGE statement like this:
MERGE INTO 10357496 VALUES {
name[true] => 'Renamed Alert 1'
}
For more examples, see Examples and More examples.
WARNING! Be careful with the MERGE statement. With only a simple but all-encompasing statement, it is possible to delete your entire database with a statement this simple: MERGE INTO / VALUES {};
Problems can occur if you use a MERGE statement in which you use a field in a condition to select a node, in a case where the field does not actually exist. For example, look at the following MERGE statement:
MERGE into / values {
network[true] => {
device[interface/mac_address = '00:A0:B0:C0:D0:E0'
OR interface/inet/ip_address = '10.10.10.10'] => {
last_seen[true] => '2008-02-19T18:42:10.0000000Z'
}
}
}
If you run this statement, it creates a device, but does not create the interface sub-tree. Therefore, if you run this same MERGE statement again, it creates a second device instead of merely updating the first device.
When data about a device is gathered from different sources, they often differ in how they identify the device. One plugin identifies the device by its IP address, and another by its MAC address. As part of best practices, always include both addresses in your MERGE statements so that data from different sources merges properly. Write this condition in the same order: mac_address OR ip_address as in the following example:
MERGE into / values {
network[true] => {
device[interface/mac_address = '00:A0:B0:C0:D0:E0'
OR interface/inet/ip_address = '10.10.10.10'] => {
last_seen[true] => '2008-02-19T18:42:10.0000000Z'
}
}
}
It might be helpful to examine a poorly-written MERGE statement and see what's wrong with it. The following example attempts to identify an interface that has an IP address of 10.20.30.40 or a MAX address of 00:A0:B0:C0:D0:E0, and instead creates problems in a couple of ways:
Incorrect MERGE statement:
MERGE INTO / values {
network[true] => {
device[interface[inet/ip_address = '10.20.30.40'
or mac_address = '00:A0:B0:C0:D0:E0'] => {
mac_address[true] => '00:A0:B0:C0:D0:E0',
inet => {
ip_address[true] => '10.20.30.40'
}
}
}
}
One problem is that the same clause that provides the conditions to match, also adds the sub-tree. That means that if GoToAssist doesn't find this sub-tree in the database already, it creates it new.
In other words, this MERGE statement adds an IP address of 10.20.30.40 and a MAC address of 00:A0:B0:C0:D0:E0 if it doesn't find them already in existence. Was that your intention? If not, you have added data accidently.
Another problem occurs if you run the same MERGE statement again at a later date, to add or update the database. Since the sub-tree now already exists, the MERGE statement creates a second inet clause and results in an interface with two identical IP addresses:
Result:
. . .
interface/
mac_address = '00:A0:B0:C0:D0:E0'
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.20.30.40'
. . .
The reason this happens is an omission in the following lines of the original MERGE statement:
inet => {
ip_address[true] => '10.20.30.40'
}
hese lines do not specify which inet tree to update, so the MERGE statement creates a second inet sub-tree.
To correct this omission, apply [true] to the inet clause:
inet[true] => {
ip_address[true] => '10.20.30.40'
}
This would match any inet sub-tree under interface. However, what about an interface that has more than one IP address? The above inet[true] clause matches any and all inet sub-trees under interface and then sets the IP address for all of them to 10.20.30.40. For example, if you start with this interface in your database:
Database:
../interface/
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.10.10.1'
After the MERGE statement you would have this interface in your database:
Result:
../interface/
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.20.30.40'
The correct way to write this merge statement is to fully specify the condition clause for every non-leaf node, as follows:
Correct MERGE statement:
MERGE INTO / values {
network[true] => {
device[interface[inet/ip_address = '10.20.30.40'
or mac_address = '00:A0:B0:C0:D0:E0'] => {
mac_address[true] => '00:A0:B0:C0:D0:E0',
inet[ip_address = '10.20.30.40'] => {
ip_address[true] => '10.20.30.40'
}
}
}
}
Conclusion: Remember that when you have a sub-tree that may or may not exist, or when you may have multiple sub-trees with the same node name, construct your condition clauses so that you full identify every non-leaf node, and thereby update the correct sub-tree.
Related topics