Monitor Files
The most basic approach to getting data into Splunk is to monitor files on your server or workstation directly with a lightweight Splunk agent. This lightweight agent is called a “Universal Forwarder” – or “UF”. As you might suspect, the purpose of the UF is simply to monitor the files or directories you specify and forward that data to the Splunk environment. (The monitoring method is similar to the Linux ‘tail’ command.) For more about ingesting by way of UF, see our UF docs (for installing, updating, and configuring).
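As a sketch, a UF’s file monitoring is driven by stanzas in its inputs.conf. The paths, index, and sourcetype below are placeholders for illustration, not values from our environment (in our environment these settings are distributed by the deployment server):

```
# inputs.conf on the host running the Universal Forwarder.
# (Paths, index, and sourcetype here are placeholders.)

# Monitor a single file:
[monitor:///var/log/myapp/app.log]
index = myapp_logs
sourcetype = myapp:log

# Monitor a whole directory; the UF tails new files as they appear:
[monitor:///var/log/myapp/]
index = myapp_logs
sourcetype = myapp:log
```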
INFO: In our environment, a Splunk “deployment server” is used to consistently distribute the set of configurations a UF needs to send data to Splunk. Think of the deployment server as a Splunk-centric endpoint-management system. These managed configurations specify where the data is to be delivered and how and where to store it within Splunk; they include certificates for securing the transmission; and they can include technology-specific “bundles” (e.g., a bundle for Windows machines tells the UF which Windows log files to monitor, where to find them, and what metadata to attach to improve the data’s searchability from within Splunk), and more.
“Push” data to a Splunk HTTP Event Collector (HEC)
A Splunk HTTP Event Collector (HEC) is a passive API endpoint on an instance of Splunk Enterprise. If your source application can be configured to send its machine (log) data over HTTPS, the HEC can listen for connections, receive the data, parse it, and then forward it to the Splunk indexers for ingestion. The HEC method has become increasingly popular largely because of the “push” vs. “pull” advantage: allowing the source to “push” data to Splunk requires less configuration and coordination, thereby reducing overhead and improving resiliency. (Imagine being able to throw things into a bucket whenever and in whatever quantity you want, instead of being asked every so often for what you have to share and coordinating the hand-off.)
Each source is secured with its own token and is configured (to send over HTTPS) with the assistance of the Splunk Service Administration team.
This is the primary and preferred method of acquiring data from SaaS solutions where installing an agent is impractical or infeasible. HEC is also the path of choice for messaging-based (publish/subscribe) scenarios such as Kafka. For more about ingesting by way of HEC, see our HEC documentation.
A robust and scalable HEC capability is part of the Splunk Service — customers do not need to run their own HEC.
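To make the “push” concrete, here is a minimal sketch of an HEC event push using only the Python standard library. The URL, token, and sourcetype are placeholders, not values from our environment; the Splunk Service Administration team supplies the real endpoint and token.

```python
import json
import urllib.request

def build_hec_request(url, token, event, sourcetype=None):
    """Build the HTTPS request an HEC expects: a JSON envelope with an
    "event" field, authorized via a "Splunk <token>" header.
    (url and token are placeholders supplied by your Splunk admins.)"""
    body = {"event": event}
    if sourcetype:
        body["sourcetype"] = sourcetype
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
    )

# Pushing one event is then a single call (not executed here):
# urllib.request.urlopen(build_hec_request(
#     "https://hec.example.edu:8088/services/collector/event",
#     "YOUR-HEC-TOKEN", "application started", sourcetype="myapp:log"))
```

Note how little the source needs to know: one URL, one token, and a JSON body — this is the reduced coordination overhead the “push” model provides.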
“Pull” data from a source by way of an API (or JDBC)
When necessary, instances of Splunk Enterprise called “Heavy Forwarders” can be employed to pull data from sources on a schedule (cron) by way of API calls (e.g., SOAP/REST) or database queries (over JDBC). As with an HEC, a Heavy Forwarder parses the data and forwards it to the indexers for ingestion.
Although available and sometimes the only option, we will seek to rule out the approaches described above (UF, HEC) before implementing a solution that relies on ingestion by Heavy Forwarder.
Units do not need to run their own Heavy Forwarder(s) — ingestion by way of Heavy Forwarders is available as part of the Splunk Service. Some organizations, however, may have use cases (e.g., JDBC / DB Connect where SQL needs to be developed and refined) where running a Heavy Forwarder is desirable.
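To make the JDBC / DB Connect pattern concrete, here is a generic sketch of the shaping step a Heavy Forwarder performs: run a scheduled query and turn each row into a JSON event. This is only an illustration — sqlite3 stands in for a JDBC source, the table and query are hypothetical, and real DB Connect inputs are configured within Splunk rather than hand-written.

```python
import json
import sqlite3

def pull_rows_as_events(conn, query):
    """Run a (scheduled) query and shape each row into one JSON event,
    roughly what a Heavy Forwarder's DB Connect input does before
    forwarding results to the indexers."""
    cur = conn.execute(query)
    columns = [col[0] for col in cur.description]
    return [json.dumps(dict(zip(columns, row)), sort_keys=True)
            for row in cur]

# Hypothetical source database for illustration:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user TEXT, ts TEXT)")
conn.execute("INSERT INTO logins VALUES ('alice', '2024-01-01T00:00:00Z')")
events = pull_rows_as_events(conn, "SELECT user, ts FROM logins")
# Each event is now a self-describing JSON string ready to forward.
```

This is also where the SQL development and refinement mentioned above happens — a reason some organizations choose to run their own Heavy Forwarder.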
Syslog and other TCP or UDP Data Streams
Syslog still represents a large percentage of the log data flowing into Splunk. However…
Although Splunk has the ability to receive (listen on a port for) syslog data and other TCP or UDP data streams (similar to Wireshark), using Splunk as the syslog (or other TCP or UDP data stream) target for ingestion is recommended neither by Splunk (the vendor) nor by us. This port-listening functionality is something of a “carry-over” from a time in Splunk’s history when the product was a complement to – or extension of – syslog. But the lossy nature of UDP, and the need for resilience even for TCP streams, means that such use cases need an intermediary to improve the likelihood that events are not lost.
If you need to get syslog data (or another TCP or UDP data stream) into Splunk:
1) Use a dedicated solution for collecting that data (for example, a syslog-ng or rsyslog server).
2) Install a Universal Forwarder (UF) on the collecting server to forward the syslog data to Splunk.
This approach significantly improves resiliency.
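The two steps above can be sketched as a pair of config fragments. All paths, the index, and the sourcetype are hypothetical placeholders; adapt them to your environment:

```
# Step 1 -- rsyslog on the dedicated collector
# (e.g., /etc/rsyslog.d/splunk-collect.conf), writing each
# sending host's messages to its own file:
module(load="imudp")
input(type="imudp" port="514")
template(name="PerHost" type="string"
         string="/var/log/remote/%HOSTNAME%/syslog.log")
action(type="omfile" dynaFile="PerHost")

# Step 2 -- inputs.conf for the UF installed on the same collector,
# monitoring the directory rsyslog writes into:
#   [monitor:///var/log/remote/]
#   index = network_syslog
#   sourcetype = syslog
```

Because rsyslog persists events to disk before the UF forwards them, a brief Splunk outage no longer means lost events — the resiliency improvement described above.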
INFO: Increasingly, hardware manufacturers and software service providers are offering alternatives to syslog. Consider encouraging your vendors to support the ability to push logs over HTTP.