The Splunk Success Framework (SSF) is a flexible collection of best practices that accelerates and increases the value users derive from their data with Splunk software. The framework provides reference materials, guidance, templates, and expertise for each phase of a Splunk implementation, from data onboarding and framework governance to recommendations for user learning. The SSF's foundational best practices establish a robust base, and its implementation best practices are organized into four functional areas that support standard, intermediate, and advanced objectives. These best practices apply to both Splunk Cloud and on-premises Splunk Enterprise deployments.
Table of Contents
- 1 Platform Management
- 1.1 Setting Unix Profiles
- 1.2 Preparing for failures in the Splunk utility tier
- 1.3 Using Splunk Sandbox
- 1.4 Managing Backup and Restore Processes
- 1.5 Setting up Lab Environment
- 2 Program Management
- 2.1 Managing Stakeholders
- 2.2 Defining Charter for Splunk Implementation
- 2.3 Establishing Operating Framework
- 2.4 Establishing Service levels
- 2.5 Assigning Responsibility for Splunk Deployment
- 2.6 Change Control
- 2.7 Building Splunk Community
- 2.8 Communication with User Community
- 2.9 Sending newsletter to Splunk Community
- 3 Data Management
- 3.1 Data Onboard Workflow
- 3.1.1 Step one: Request Data
- 3.1.2 Step 2: Define the Data
- 3.1.3 Step Three: Implement the use case
- 3.1.4 Step four: Validate
- 3.1.5 Step five: Communicate
- 3.2 Collecting Logs in Splunk
- 4 People Management
- 4.1 Staffing a Splunk deployment
- 4.1.1 Guidelines for creating a staffing model
- 4.1.2 Recommendations for staff sizing
- 4.1.3 Technical drivers that influence staffing decisions
- 4.1.4 Operational drivers that influence staffing decisions
- 4.2 Setting Roles and responsibilities
- 4.3 Managing data based on role
- 4.4 Setting up a welcome page
- 4.5 Building user group workspaces
- 4.6 Enabling users with incentives
- 5 References
In the SSF, foundational best practices are the decisions, agreements, and deliverables that establish the purpose, goals, and governance of a Splunk implementation. These strategic decisions provide the transparency and accountability that are necessary for an effective deployment. The SSF lays out the following four foundational best practices:
- A purpose statement that sets the goals of the Splunk implementation.
- An executive sponsor, a leader who is responsible for the success of the Splunk implementation.
- Metrics that set benchmarks for measuring success as the Splunk implementation grows.
- An operating framework that outlines how to set up the Splunk environment based on goals and objectives, plus best practices for building a successful Splunk implementation team.
These foundational practices set expectations with stakeholders, keeping the Splunk implementation on track so it can prosper and develop as user requirements grow.
Best practices for implementing Splunk are organized into four functional areas:
- Platform Management best practices support the availability, scalability, and manageability of the Splunk deployment.
- Program Management best practices sustain the processes for running the Splunk implementation, driving adoption and realizing the highest possible value from it.
- Data Management best practices provide effective data management strategies and produce efficient use cases that are closely tied to the data.
- People Management best practices empower users and team members through learning incentives and role-based access to features and data.
The SSF classifies best practices into three maturity levels so that each activity can meet user priorities, objectives, and needs:
- Standard: Creates the foundation for an ideal, working Splunk environment.
- Intermediate: Provides more control over the outcomes you can adopt for managing the Splunk deployment.
- Advanced: Suggests configurations and optimizations to expand and mature the Splunk deployment.
The SSF uses the following terminology:
- Splunk deployment: The installation and configuration of Splunk software, made available to at least one user and one data source.
- Splunk environment: The infrastructure hosting the Splunk deployment.
- Splunk implementation: The Splunk deployment and environment, the Splunk team, its data, the use cases for Splunk software, the solutions (apps), and the community of users that develop and use them.
The following sections describe each functional area in detail.
Platform management best practices support the availability, scalability, and sustainability of your Splunk deployment. They help establish the Splunk platform for stability planning, capacity planning, and incident management.
When deploying Splunk software, you often work with Unix-based operating systems. In these environments, the Splunk installation may not be in the same location on every host, which makes using Splunk's command-line features cumbersome. Because you cannot always enforce a single standard, a non-intrusive alternative is to set environment variables, creating a consistent and efficient working environment.
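As a sketch, such environment variables could live in each administrator's shell profile; the paths below are assumptions to adjust per host:

```shell
# Hypothetical ~/.profile additions; point SPLUNK_HOME at wherever
# Splunk is actually installed on this host.
export SPLUNK_HOME=/opt/splunk

# Put the Splunk CLI on the PATH so `splunk` works from anywhere.
export PATH="$SPLUNK_HOME/bin:$PATH"

# Convenience variable for the main Splunk log directory.
export SPLUNK_LOGS="$SPLUNK_HOME/var/log/splunk"
```

With this in place, the same CLI commands and paths work identically on every host, regardless of where Splunk was installed.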
Splunk provides product features for high availability and recovery on two tiers, the search tier and the indexing tier. Administrative functions and roles, including the deployment server, the deployer, and the license server, instead rely on best practices to deliver resiliency.
Splunk utility tier components host Splunk's administrative functions; they do not include search heads, indexers, or forwarders. If any of these components becomes unavailable or is destroyed, the function it provides becomes unavailable, and you must build a new instance and update the references to the new host information everywhere in the environment. These strenuous, error-prone efforts can be mitigated by following best practices.
Many users run utility tier components on VMs rather than bare-metal hardware, because VMs provide two features that are especially valuable for these components.
Dynamic resource sizing
VMs can alter the hardware specification of a host as load increases.
State preservation and transition
VMs provide snapshots that preserve an image of the instance, and tools such as VMware vMotion let you move a host image to a new VM.
When a utility instance fails in a large, distributed environment, the administrative task of updating networking details such as host name and IP address on every client can be impractical. You might hope to avoid rebuilding utility components by reusing the same networking specifications, but that is not always possible. The best practice is to use DNS CNAME (canonical name) records as a translation service.
Once CNAME records are established, you can point all clients at the CNAME rather than the true host name or IP address of the host hardware. When you replace host hardware, there is no need to reuse the same IP and host name: build a new utility instance that mirrors the old one and toggle DNS as the cutover.
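For illustration, a DNS zone fragment might alias a stable service name to whichever host currently runs the deployment server; all names here are hypothetical:

```
; Clients are configured against the alias, never the real host.
ds.splunk.example.com.    IN  CNAME  host-a.example.com.

; Cutover after a rebuild: repoint the alias and clients follow.
; ds.splunk.example.com.  IN  CNAME  host-b.example.com.
```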
The same best practice applies to load balancing on the data collection tier and the search tier, where a DNS A record distributes traffic across several hosts, providing a convenient way to scale. For the indexing tier, Splunk's built-in load balancing on the forwarders is the best practice for distributing data to the indexers.
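As a sketch of the forwarder-side approach, an outputs.conf can list all indexers in one target group so Splunk's auto load balancing rotates among them; host names here are examples:

```ini
# outputs.conf on a forwarder -- hypothetical indexer host names.
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# How often (seconds) the forwarder switches targets to spread load.
autoLBFrequency = 30
```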
A best practice for establishing a consistent, dependable production Splunk environment is to set up a workflow that includes separate sandboxes for development, a Splunk lab environment for testing, and a safe push to production once things are ready.
A sandbox is a stand-alone Splunk Enterprise instance used by one person to develop new ideas. Every member of the Splunk team should have their own sandbox so they feel safe taking risks and learning; with a personal sandbox, it is never costly to start over when needed.
Promoting a Splunk sandbox culture on the team enhances learning and growth, ensuring everyone has latitude to try new things without disturbing work already in progress.
Making regular backups of deployed software is a core best practice. By identifying backup and restore points and regularly backing up Splunk configuration files, you ensure continuity if a failure or mistake occurs.
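A minimal sketch of such a backup, assuming configuration lives under $SPLUNK_HOME/etc (the default layout); paths and retention policy are assumptions left to you:

```shell
# backup_splunk_etc SPLUNK_HOME BACKUP_DIR
# Archives the etc/ directory (where Splunk keeps its configuration
# files) into a timestamped tarball that serves as a restore point.
backup_splunk_etc() {
    splunk_home=$1
    backup_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$backup_dir"
    tar -czf "$backup_dir/splunk-etc-$stamp.tar.gz" -C "$splunk_home" etc
}
```

Run it from cron before every change window, and exercise the restore path (untar into a scratch directory) as part of the same routine.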
Setting up a lab environment is also a best practice for any Splunk Enterprise deployment, as it simulates the production environment for testing product features.
A lab is different from a sandbox, which is a more lightweight, highly volatile environment for individual development and experimentation. A lab resembles the production environment in many respects, with enough stability and persistence for you to efficiently test and develop features before deploying them to production.
Best practices in the program functional area cover business alignment, operations, collaboration, use cases, and staffing, helping you realize the maximum possible value from the Splunk implementation. A program manager usually drives these activities and manages the interdependencies between stakeholders.
A stakeholder register is a record of everyone in the organization with a stake in Splunk, from the executive who approved the purchase to individual Splunk users.
Maintaining a stakeholder register is vital: it helps you keep track of your Splunk constituents and understand their needs. It captures the requirements for working with each stakeholder, provides a benchmark for measuring success with Splunk, and ultimately shows how Splunk contributes to the overall success of the organization.
It is always a best practice to define the purpose and scope of the Splunk implementation: how the Splunk software and its solutions will be used, who the stakeholders are, and how Splunk solutions will leverage machine data. Clearly defining purpose and scope sets expectations among stakeholders and helps prioritize the decisions you make and the actions you take.
Following are guidelines for establishing the purpose of a Splunk implementation:
- Define the clear objectives you want to achieve with Splunk software and solutions.
- Investigate any constraints on the Splunk implementation.
- Identify the stakeholders.
- List the benefits of applying Splunk software and solutions to machine data.
An operating framework provides the structure for how to set up and run the Splunk implementation. Typically, one of the following operating models is chosen:
- Federated: Teams run their own Splunk implementations, architectures, and project operations.
- Centralized: Splunk engineering is concentrated in a central team with a single Splunk deployment.
- Hybrid: A combination of the centralized and federated models, where the critical mass of Splunk activity sits with the central team.
Program managers fulfill the following responsibilities:
- Drive decision making
- Manage interdependencies among the Success Framework pillars
- Ensure the Splunk deployment plan aligns with business objectives
- Oversee Splunk success measurements
- Promote and support program-wide communication
- Ensure executive alignment
Service level definitions (SLDs) are agreements between a service provider and the organizations it serves that define specific aspects of the service, including quality, availability, and responsibilities. An SLD comprises service level objectives (SLOs), service level agreements (SLAs), case priority levels, and incident response times. When Splunk is run as a service offering, an SLD assures all teams and organizations that their needs will be met, whichever areas of Splunk operations and response models are affected.
A responsibility assignment matrix is a tool used in project management to define roles. Understanding the tasks and activities linked to each role sets expectations for the roles assigned in the staffing model. The RACI model defines:
- Responsible: Those working to complete the activities
- Accountable: Those answerable for the results
- Consulted: Those with opinions
- Informed: Those needed to be kept updated
When completing a RACI matrix for the Splunk community, consider the needs and preferences of all stakeholders in the organization, not just Splunk admins and users.
Change management is the end-to-end process that governs the lifecycle of a change, from the initial request for an alteration to its final implementation and communication.
Using change management to administer changes in the Splunk environment provides the following advantages:
- A defined, documented process to approve and install changes in the Splunk environment.
- A common, visible method for addressing changes.
- The ability to respond to problems by providing a record of changes, which sustains platform health and troubleshooting.
The best practices above apply to users with the following roles:
- Executive sponsor
- Program Manager
- Project Manager
Because each organization is unique, its change management implementation will vary. These guidelines provide a best-practice framework and tools to define and develop CM processes that fit your particular needs and requirements.
A Splunk community platform is a great way to keep the community engaged and informed. The platform serves several purposes:
- Exchange ideas and offer support
- Recognize common goals and interests
- Create a thriving, growing Splunk community
- Provide a space for open collaboration
A communication plan is an important component of a broad Splunk program management plan. Some of the best ways to communicate with the user community and provide updates about Splunk are:
- Splunk banner message
- SMS alert
- Phone calls
- Dashboard panel
- Ticketing system
- Portal (wiki)
A newsletter is an efficient way to reach a broad audience with recent news and activities in the Splunk environment. Advantages of a newsletter include:
- Motivates innovative ideas from the Splunk community
- Establishes the value of Splunk to administrators and stakeholders
- Complements the Splunk community portal, keeping the organization informed and providing outreach
- Records Splunk accomplishments and activities
Data management and the data lifecycle are integral parts of a Splunk implementation. Before diving into Splunk software, apply best practices to manage and structure the data effectively, improving its searchability and value.
Best practices in the data functional area help you design effective use cases that are closely tied to the data, so that Splunk can answer your questions.
There are several resources about ingesting data into Splunk. The onboarding workflow consists of five phases: requesting data, defining the data, implementing the use case, validating the results, and communicating the availability of the new data.
Throughout the process, document your approach to keep the community well informed. Documentation not only answers questions; it also sets expectations, makes users aware of their responsibilities, and teaches them how to contribute to the data onboarding process.
Data onboarding starts with a request to add data. The request can be as simple as an email, but establish a formal process with defined request requirements.
Simplify data requests
Capture only what matters. Avoid Splunk-specific terms such as index names and field extractions.
Ask for specific, known information
Ask for concrete details, including host name and IP address, path, location and availability information, retention needs, and a brief explanation of what the data represents. All of this helps in prioritizing requests and defining source types.
Estimate data volume
The requester may not know the estimated data volumes, so it is often easier to evaluate the source location and do the math yourself. Use the maximum data volume when sizing against the Splunk licensing model; the average or median must not be used, because the real data volume will exceed such a threshold about 50% of the time.
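As a small sketch of "doing your own math", the figure to size against is the peak of the observed daily volumes, not their mean; units and input format here are arbitrary:

```shell
# peak_mb: read one size-in-MB per line on stdin, print the maximum.
# Sizing a license against this peak avoids the roughly half of days
# on which actual volume would exceed an average-based estimate.
peak_mb() {
    awk 'NR == 1 || $1 > max { max = $1 } END { print max }'
}
```

For example, `printf '12\n40\n25\n' | peak_mb` prints `40`, whereas the average (about 26) would be exceeded on the heaviest day.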
Data definition clarifies the request details. This is the phase where miscommunication is caught, which helps the implementation stage go smoothly. Reviewing the data up front lets you verify the following:
- Splunk system ability to validate the data
- Data authorization within Splunk
- Dependency on modular input
- Data retention and storage considerations
Once the data is defined, proceed with the technical implementation.
Build out search and reporting artifacts
Use the information collected in the data definition step. Focus on the value-add elements that only you can provide, including tags, saved searches, reports, dashboards, forms, and other elements asked for by the requesters.
Ask for clarification as needed
Ask the requester for additional information about the data, its details, and the objectives of the use case as needed during deployment and implementation.
After developing the use case, validate that the system achieves the expected results. To do so, follow this procedure:
- Run through the use case in the lab
- Call the requester to validate the use case
This phase ensures that the data points added to your analytics contribute directly to business value.
Splunk doesn't require a logging standard. It recognizes events using a few default fields from the raw event data, then classifies and correlates common elements with other events on the fly at search time. There is no permanent or fixed schema, which makes searching with Splunk quick, easy, and flexible.
Nevertheless, you can optimize how data is formatted at the source so that Splunk can break events into fields more easily, more quickly, and more accurately for each incoming event.
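For instance, emitting events as timestamped key=value pairs is a source-side format that Splunk's automatic search-time field extraction handles with no extra configuration; the field names below are illustrative:

```shell
# log_event LEVEL COMPONENT MESSAGE
# Writes one event as key=value pairs with an ISO-8601 UTC timestamp,
# a shape Splunk extracts into fields automatically at search time.
log_event() {
    printf '%s level=%s component=%s msg="%s"\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3"
}
```

A call such as `log_event INFO auth "user login succeeded"` yields an event whose `level`, `component`, and `msg` fields are immediately searchable.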
The people functional area uses learning incentives and role-based access to features and data to empower users and get the best out of Splunk software. This area ensures there is an education plan in which users earn further access as their experience and knowledge advance, and that the workplace is a safe place for collaboration and the development of new ideas.
Staff size depends on the business model and organizational needs, not on the amount of data ingested. A Splunk team comprises roles, which are collections of responsibilities; a role is a delegation of responsibilities to existing members. Staffing is better estimated by thinking in terms of roles rather than individuals.
Create a resource that identifies the Splunk team members and their roles, and make it easily accessible. Display contact information, such as an email link and a picture, so people know who the team members are and where to find them.
Keep in mind as the staffing model is completed that one person can be assigned multiple roles, and multiple team members can fulfill the same role.
Figure 1. User Management 
One person can usually manage a single Splunk instance, a deployment server, and several forwarders. As your Splunk implementation grows and you add advanced features to meet your data analysis needs, staffing must grow with it. The larger, more distributed, and more service-oriented your implementation, the more staff you will need to run it efficiently.
Risk aversion and increasing complexity are the driving forces for adding Splunk staff. Look closely at the situations below that can increase demand on your team; for each that applies, consider adding half a person's time. Engineers and architects can provide the required skills.
Distributed deployment
If an implementation shifts to a more distributed deployment model, separating the indexers from the search head, adding an engineer or architect may help manage the expansion. Another team member can also provide peer review and help with optimization.
Indexer clustering
When implementing indexer clustering, staff need the data management skills to maintain data fidelity between the data origin and the indexer cluster nodes. Sufficient staff should also be available to respond to problems in a timely manner: if you have high availability requirements for data or search, you also need high availability for people.
Search head clustering
When implementing search head clustering, your staff should have the necessary capacity tuning and optimization skills to maintain and optimize search head performance.
Data collection tier
Establishing a data collection tier, through modular inputs and third-party data forwarding, can add administrative complexity. Your staff should have expertise with the systems your Splunk deployment integrates with.
Complex utility tier
Utility Splunk instances, such as deployers, cluster masters, and deployment servers, are easily managed within normal operations. However, deploying complex redundancies, such as a pool of deployment servers, can increase the team's workload.
How you set up your operational model also influences staffing needs. Consider how Splunk is set up in your organization and how that setup influences staffing decisions. The developer, search expert, and user community roles fill the skill requirements for interacting with your customers and their use cases.
Closed platform approach
In a closed platform setup, responsibility for creating and managing all knowledge objects lies with the Splunk staff. This is a resource-intensive model. Splunk staff are not subject-matter experts in every domain, so they need more consultation time to determine the importance of the available data. If this is your model, make sure enough staff are available for ample data exploration.
Open platform approach
This approach is the least resource-intensive. Users are empowered to implement their own use cases, enabling them to develop expertise in their own subject matter. The team's focus shifts from consultation to the more important roles of education, empowerment, and community management.
Whether you adopt a closed or open platform approach, Splunk usage often grows as the user community begins to use it, experiments with their own SPL, and eventually becomes proficient enough to create their own knowledge objects. Knowledge is shared as more people learn to use it. In a closed platform approach, equip staff for consultation; in an open platform approach, equip them to educate, empower, and manage communities.
A Splunk implementation team comprises roles that represent different strengths and skills, both with Splunk software and within your general business. These roles reflect the business skills needed to fulfill the associated duties and do not necessarily map directly to the default user roles of the Splunk platform. One person on your Splunk team can fulfill more than one role. Managing incentive-based access well encourages the user community to sharpen their Splunk software skills.
The Splunk roles feature enables you to define permissions and capabilities for a collection of users, for example, setting search limitations, providing access to product features, data, and knowledge objects, and setting the default app users land in when they log into Splunk. When data becomes more diverse and the number of users increases, it can be more complex to manage roles. This topic explores how you can manage roles, access to data, and access to product features in a more modular and scalable way.
Role separation is a process that separates a user’s access to data from their access to Splunk capabilities. Data governance is a process that controls access to certain data and certain role capabilities. Introducing a naming convention and separating roles and features can enable you to implement a highly flexible and scalable role-based access control (RBAC) solution.
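As an illustration of role separation, authorize.conf can define narrow "data" roles that only scope indexes and separate "feature" roles that only grant capabilities, with users assigned one of each. The role names and index below are hypothetical:

```ini
# authorize.conf -- hypothetical roles following a naming convention.

# Data role: grants access to one index, nothing else.
[role_data_web]
srchIndexesAllowed = web
srchIndexesDefault = web

# Feature role: grants capabilities, no data access of its own.
[role_feature_scheduler]
importRoles = user
schedule_search = enabled
```

A user inheriting both `data_web` and `feature_scheduler` can schedule searches over the `web` index; swapping either role changes data access and capabilities independently.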
A welcome page is a default dashboard within a default app tailored to a specific role or user group that presents users with all the resources they need to get Splunking. A welcome page is a good way to focus users' attention on workflows that are relevant to them. You can create your own welcome pages from scratch, or use the Welcome Page Creator app on Splunkbase to help you quickly implement welcome pages that accelerate your users' time to effectiveness.
A workspace is a Splunk app tailored to a specific role or user group that enables users to search, explore, and create without distractions from other teams and users.
In many Splunk deployments, all users work in the default Search and Reporting app. This can cause some users and teams to struggle as they sort through dashboards, saved searches and knowledge objects from other teams to figure out what to use. It can also stifle adoption and discovery, as users may feel intimidated about participating or concerned they may break knowledge objects created by other teams. The volume and diversity of knowledge objects can also have a negative effect on overall search performance.
Creating an app as a workspace for each role or user group helps teams focus on what is important to them without getting distracted by content from other teams. Workspaces also facilitate co-working, and faster learning and adoption. When teams have their own workspaces, users can feel more confident, discover more, and collaborate more.
Creating workspace apps can also help improve search performance. You can set permissions so that each team's searches draw only on the knowledge objects of their own workspace, not those of other teams.
User enablement is about motivating your users to learn and grow. When you provide an environment of incentive-driven access, you encourage users to explore and implement best practices, which adds value to the whole user community.