Troubleshooting & Agent Advanced Topics


Documentation Note

You have wandered into the Splunk Agent Advanced Topics section. At this point the documentation is written in a general tone and is provided here to point you toward a topic you may not have been aware of, or to act as a quick reference for a helpful command. It is not intended to be exhaustive, and you are encouraged to consult the well-documented, publicly available Splunk documentation on the topics below. Also note that paths to commands may differ between Splunk Universal Forwarder, Splunk Heavy Forwarder, and Splunk Enterprise installations.

Logging

Splunk Agent logs are stored in $SPLUNK_HOME/var/log/splunk.

The most valuable of these logs is the splunkd.log file.

# Bash - Splunk Universal Forwarder
less +F -N /opt/splunkforwarder/var/log/splunk/splunkd.log
# Bash - Splunk Heavy Forwarder / Enterprise
less +F -N /opt/splunk/var/log/splunk/splunkd.log
# PowerShell
Get-Content -Path "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log" -Wait -Tail 10

Splunkd Log Files

By default, splunkd.log and all other internal Splunk log files are sent to the main Splunk indexing tier. These logs do not count against the service's ingestion charges, either from Splunk or from Technology Services. They are also not available for general searching due to the sensitive nature of internally generated log files: at a service level these logs can contain search strings, results, authentication details, and connection information that is not suitable for general viewing. These logs can always be accessed from the local machine's var/log folder.

In some cases a service owner may want to ingest these logs. This is especially important where key workflows such as alerting or automation rely on the free flow of Splunk data from the endpoint to the Splunk indexing tiers.

To ingest the splunkd log files a second time, add another monitor stanza to the inputs.conf file located in the etc/system/local folder.

[monitor:///opt/splunk/var/log/splunk/.../splunkd.log]
index = my_index
crcSalt = my_arbitrary_salt_value

Index Lag

Where the above trick comes in handy is in determining Splunk Forwarder-specific issues.

Though not perfect, and performed over the internet to Splunk Cloud, we are attempting to reach a <30 second delay between the agent sending an event and the event being indexed at the Splunk indexing tier. For systems where timeliness is imperative, you can run a simple Splunk query against the indexed data to determine when the data was generated and when it was indexed. This can be helpful in alerting on a potential problem with the data or the machine. For example, a single machine that experiences higher lag in log generation might be undersized compared to its peer machines. This query does not help for systems that batch-generate their logs.

This query can also be an initial indicator of issues such as blocked queues mentioned below.

index=my_index source=*splunkd.log* earliest=-200m@m latest=-1m@m
| eval delay_sec=_indextime-_time
| stats min(delay_sec) avg(delay_sec) as avg_delay max(delay_sec) by host
| search ((avg_delay > 10) OR (avg_delay < -3))

Blocked Queues

When a forwarder is having trouble sustaining data throughput to the indexing tier, it will begin to log events to splunkd.log indicating that a queue is blocked, such as the example below. These problems should be corrected as quickly as possible to ensure that all data arrives at Splunk appropriately and no events are dropped.

mm-dd-yyyy 23:31:18.987 -0600 INFO  Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=511, current_size=14, largest_size=34, smallest_size=0
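
As a quick local check, you can also grep for these messages directly on the endpoint before querying any re-ingested data. The paths below assume a Universal Forwarder install, and depending on version the queue metrics lines may be written to metrics.log rather than splunkd.log:

# Count blocked-queue messages in splunkd.log (Universal Forwarder path assumed)
grep -c "blocked=true" /opt/splunkforwarder/var/log/splunk/splunkd.log
# The same queue metrics are typically also written to metrics.log
grep "group=queue" /opt/splunkforwarder/var/log/splunk/metrics.log | grep "blocked=true" | tail -5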

Throughput

These messages can indicate an upstream problem or a local problem with the agent's configuration. If the number of these blocked-queue alerts is not constant and no other connectivity issues are found in the logs, start by increasing the throughput of the Universal Forwarder agent. This is done through the limits.conf file: create a new limits.conf file in the etc/system/local folder, add the following stanza, and restart.

[thruput]
maxKBps = 1024

This raises the cap on the network bandwidth that the agent is allowed to consume. Proceed with caution, though: the more you allow the agent to send, the more CPU and network utilization the agent will require on the machine.
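
To confirm which limits.conf is in effect and where the value is coming from, you can use btool (covered further below); the path assumes a Universal Forwarder install:

# Show the effective [thruput] settings and which file each comes from
/opt/splunkforwarder/bin/splunk btool limits list thruput --debug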

Another way to increase throughput is to increase the number of pipelines (threads) that the Splunk agent uses. Each pipeline acts as a separate data path, which can speed up ingestion significantly, up to a point. Adding a second pipeline can roughly double performance, but it also requires roughly double the resources in the form of memory, CPU, and network bandwidth. Beyond the second pipeline you begin to see diminishing returns, which need to be monitored closely. While it is frequently safe to double your agent's number of pipelines (the default is 1), you should read the documentation and understand the risks associated with changing this setting in the server.conf file, along with the other documentation on docs.splunk.com.

# $SPLUNK_DIR/etc/system/local/server.conf
[general]
...
parallelIngestionPipelines = 2
#* The number of discrete data ingestion pipeline sets to create for this
#  instance.
#* A pipeline set handles the processing of data, from receiving streams
#  of events through event processing and writing the events to disk.
#* An indexer that operates multiple pipeline sets can achieve improved
#  performance with data parsing and disk writing, at the cost of additional
#  CPU cores.
#* For most installations, the default setting of "1" is optimal.
#* Use caution when changing this setting. Increasing the CPU usage for data
#  ingestion reduces available CPU cores for other tasks like searching.
#* NOTE: Enabling multiple ingestion pipelines can change the behavior of some
#  settings in other configuration files. Each ingestion pipeline enforces
#  the limits of the following settings independently:
#    1. maxKBps (in the limits.conf file)
#    2. max_fd (in the limits.conf file)
#    3. maxHotBuckets (in the indexes.conf file)
#    4. maxHotSpanSecs (in the indexes.conf file)
#* Default: 1
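
After restarting, one way to confirm the new value is in effect is btool (the path again assumes a Universal Forwarder install):

# Show the effective [general] settings and filter for the pipeline count
/opt/splunkforwarder/bin/splunk btool server list general --debug | grep -i parallelIngestionPipelines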

Full Queues

Once bandwidth constraints are removed or alleviated in the previous step, you can begin to troubleshoot queues filling up. This occurs when the agent has to process very large files, causing the queue to fill and near-real-time processing to be delayed.

These queues relate to the data pipeline queues described in the Splunk documentation. Most Universal Forwarder queue issues can be fixed with the main queue setting, and there is little need to adjust the individual queue settings.

Start by adjusting the main queue size: edit the server.conf file in $SPLUNK_HOME/etc/system/local and restart. If especially large files are being processed, adjust the queue maxSize to be larger than those files. Each adjustment requires additional system resources dedicated to the Splunk agent.

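# $SPLUNK_HOME/etc/system/local/server.conf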
[queue]
maxSize = 512MB
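
If you re-ingested the internal Splunk logs as described earlier, a rough way to watch queue pressure over time is to chart the fill ratio from the queue metrics events. The field names come from the sample event above; the index name is the hypothetical one used earlier:

index=my_index source=*splunkd.log* group=queue
| eval fill_pct=round((current_size_kb/max_size_kb)*100,1)
| timechart span=5m max(fill_pct) by host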

Resetting the Splunk Universal Forwarder Password for v7.1+

If you need to reset the password for the local Universal Forwarder agent, follow these instructions. Do not do this on a Splunk Enterprise instance or a Splunk Heavy Forwarder where complex authentication is configured.

# Stop Splunk
$SPLUNK_DIR/bin/splunk stop
# Change the admin password
$SPLUNK_DIR/bin/splunk edit user admin -password thenewpassword
# Start Splunk
$SPLUNK_DIR/bin/splunk start
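
If the existing admin password is unknown and the CLI edit cannot authenticate, Splunk 7.1+ also supports seeding credentials from a user-seed.conf file. The sketch below follows that documented mechanism; the password value is a placeholder, and moving etc/passwd aside while Splunk is stopped causes the seed file to be applied on the next start:

# Stop Splunk and move the existing password store aside
$SPLUNK_DIR/bin/splunk stop
mv $SPLUNK_DIR/etc/passwd $SPLUNK_DIR/etc/passwd.bak
# Create $SPLUNK_DIR/etc/system/local/user-seed.conf containing:
# [user_info]
# USERNAME = admin
# PASSWORD = thenewpassword
# Start Splunk so the seeded credentials are applied
$SPLUNK_DIR/bin/splunk start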

Configuration Files

At the end of the day, Splunk configuration occurs in tiers of configuration files. Much has been written in the Splunk documentation about precedence. Boiled down to general guidelines, live by:

  1. NEVER alter a setting in a directory named default
  2. $SPLUNK_DIR/etc/system/local wins
    1. Configuration files that reside here win the precedence battle at the application configuration layer
  3. $SPLUNK_DIR/etc/apps/ resolves conflicts between stanzas in ASCII order of the app folder names (see the sketch below)
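
As an illustration of those guidelines, the layout below is hypothetical (the app names, paths, and index values are made up): the same stanza is defined in three places, and the copy in etc/system/local is the one splunkd uses.

# $SPLUNK_DIR/etc/apps/a_first_app/local/inputs.conf
[monitor:///var/log/app.log]
index = app_index

# $SPLUNK_DIR/etc/apps/z_last_app/local/inputs.conf   (loses to a_first_app in ASCII order)
[monitor:///var/log/app.log]
index = other_index

# $SPLUNK_DIR/etc/system/local/inputs.conf   (wins over both apps)
[monitor:///var/log/app.log]
index = my_index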

While a limited set of configuration file changes can be read into the running splunkd application with a $SPLUNK_DIR/bin/splunk reload xxxxx command, most of the time you are going to want to do a $SPLUNK_DIR/bin/splunk restart.

Another important implication of the configuration-file-based approach is that file system access is powerful: a savvy Splunker does not need the endpoint password if they can write a conf file and restart the splunkd agent.

Btool

Splunk layers configurations in orders of precedence. This precedence is both by folder location and by naming of the folders. It applies to numerous levels of the Splunk configuration differently depending on the piece of the application you are working within (search, index, configuration, props/transforms, etc).

It can get tricky. One tool that you will see advanced Splunkers use frequently is btool, which shows where and how a particular stanza is compiled from the various configuration files.

Here are some key conf files:

  • inputs
  • outputs
  • server
  • deploymentclient

Often you will start with a general btool to gather an idea about what is being compiled into the runtime.

/opt/splunk/bin/splunk btool inputs list

Once you find the stanza that might be at play you can invoke the double-dash debug to find the exact configuration file that is winning the precedence battle.

/opt/splunk/bin/splunk btool inputs list --debug

For the long configuration files (like server.conf or inputs.conf) you can target a specific stanza that you are trying to troubleshoot.

/opt/splunk/bin/splunk btool inputs list monitor:///var/ossec/logs/ossec.log --debug

If you are stuck with having to dig through a huge pile of configuration stanzas, you can use a regex in your favorite editor to quickly jump through stanza headers.

For example, in Notepad++

  1. Search > Mark
  2. Change Search Mode to Regular Expression
  3. Search for the following regex: ^\[.*?\]
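
If you are working on the endpoint rather than in an editor, a quick command-line equivalent is to list the stanza headers with grep:

# List every stanza header in a conf file, with line numbers
grep -n '^\[' /opt/splunk/etc/system/local/inputs.conf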

Splunk CLI

The agent has its own command line that utilizes the local API port 8089 (by default).

This is helpful for troubleshooting local settings and conflicts.

http://docs.splunk.com/Documentation/Splunk/latest/Admin/CLIadmincommands

Some useful commands to call attention to…

# List your inputs
$SPLUNK_DIR/bin/splunk list monitor
# Restart Splunk
$SPLUNK_DIR/bin/splunk restart
# Bundled OpenSSL for machines without OpenSSL or with an outdated SSL installed
$SPLUNK_DIR/bin/splunk cmd openssl
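
Another command that can help when troubleshooting forwarding problems is list forward-server, which shows the configured destinations and which of them are actively connected (it may prompt for admin credentials):

# Show configured and active forwarding destinations
$SPLUNK_DIR/bin/splunk list forward-server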

Ingestion Tricks

Double Reading

While not typically recommended, you can trick the Splunk agent into reading a file twice by including a dot directory in the file path. By adding a crcSalt at the stanza level, the files referenced through that dot path are registered under a second CRC value in the agent's manifest.

This can be useful in situations such as the code block below, where we are sending data to one Splunk instance (splunk_legacy) under a legacy index name (main) and to the new Splunk instance (on_ramp) under the new Splunk index name (djl_test_idx).

[monitor://c:\temp\random.txt]
index = main
_TCP_ROUTING = splunk_legacy

[monitor://c:\temp\.\random.txt]
index = djl_test_idx
_TCP_ROUTING = on_ramp
crcSalt = on_ramp

Double Reading – Linux

In Linux the preferred route to read a file twice is to use a symbolic link with a different crcSalt.

ln -s /services/source_file /tmp/new_symlink

In your inputs.conf you would configure both paths:

[monitor:///services/source_file]
index = main

[monitor:///tmp/new_symlink]
index = djl_test_idx
crcSalt = on_ramp

Split Streams

The phrase "split streams" refers to sending data from the universal forwarder to two Splunk instances at the same time. This requires two tcpout group stanzas in outputs.conf without a default output group. By default, all data defined in inputs.conf will route to both of these outputs, though you can control this with _TCP_ROUTING values, as shown in the Double Reading example above.

In the example below we push a specially formatted outputs.conf to endpoints to split the stream between the legacy Splunk and new Splunk environments, which are secured with custom certificates.

[tcpout:on_ramp]
server = splunk-on-ramp-index.machinedata.illinois.edu:9997
sslCertPath = $SPLUNK_HOME/etc/auth/cert.pem
sslPassword = ¯\_(ツ)_/¯
sslRootCAPath = $SPLUNK_HOME/etc/auth/ca.pem
sslCommonNameToCheck = illinoisSplunkIndexer
sslVerifyServerCert = true

[tcpout:splunk_legacy]
server = splunk-indexers.opia.illinois.edu:8514

Oneshot

Ever have a single file that you want to upload and do not feel like restarting splunkd to consume it? Rarely.

This command leverages the CLI on the endpoint to consume a single log file that you have no desire to ingest again.

$SPLUNK_DIR/bin/splunk add oneshot /var/log/applog

Or go directly via the spooler. The result is the same, though the spool method is less hospitable to wildcards and directories.

$SPLUNK_DIR/bin/splunk spool /var/log/applog
