Skip to main content

Measuring SSSD performance with SystemTap

This post is intended to provide information about finding SSSD bottlenecks with SystemTap.

One of the most common complaints with SSSD is slowness during login or NSS commands such as ‘getent’ or ‘id’ especially in large LDAP/Active Directory environments. Log analysis alone can be difficult to track down the source of the delay, especially with certain configurations(Indirect AD Integration) where there is a significant number of backend operations that occur during login.

In SSSD 1.14, performance enhancements were made to optimize cache write operations decreasing overall time spent updating the filesystem cache. These bottlenecks were discovered by developers based on userspace probing in certain areas of the SSSD code with SystemTap.

Below are some steps on getting started with SystemTap and SSSD, in this example we will use recent additions of High-Level Data Provider request probes.

  • First, install the necessary packages mentioned here: Installation and Setup

    • It is not required to install kernel-debuginfo or sssd-debuginfo to run these userspace systemtap scripts.
  • You can now check if the probe markers are available with:

# stap -L 'process("/usr/libexec/sssd/sssd_be").mark("*")'
process("/usr/libexec/sssd/sssd_be").mark("dp_req_done") $arg1:long $arg2:long $arg3:long
process("/usr/libexec/sssd/sssd_be").mark("dp_req_send") $arg1:long $arg2:long

# stap -L 'process("/usr/lib64/sssd/libsss_ldap_common.so").mark("*")' | head
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_acct_req_recv") $arg1:long $arg2:long $arg3:long $arg4:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_acct_req_send") $arg1:long $arg2:long $arg3:long $arg4:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_deref_search_recv") $arg1:long $arg2:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_deref_search_send") $arg1:long $arg2:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_get_generic_ext_recv") $arg1:long $arg2:long $arg3:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_get_generic_ext_send") $arg1:long $arg2:long $arg3:long
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_nested_group_check_cache_post")
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_nested_group_check_cache_pre")
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_nested_group_deref_process_post")
process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_nested_group_deref_process_pre")

  • The existing SystemTap scripts are located in /usr/share/sssd/systemtap. The id_perf.stp can be used to measure performance specifically with the ‘id’ command, while the nested_group_perf.stp generates metrics and useful information associated with nested group processing code.
# ll /usr/share/sssd/systemtap/
-rw-r--r--. 1 root root 2038 May  4 18:16 dp_request.stp
-rw-r--r--. 1 root root 3854 May  4 13:56 id_perf.stp
-rw-r--r--. 1 root root 8613 May  4 14:44 nested_group_perf.stp

  • Running the dp_request.stp script will track Data Provider requests and provide information about the request which took the most time to complete.
# vim /usr/share/sssd/systemtap/dp_request.stp 
/* Start Run with:
 *   stap -v dp_request.stp
 *
 * Then reproduce slow login or id/getent in another terminal.
 * Ctrl-C running stap once login completes.

# stap -v /usr/share/sssd/systemtap/dp_request.stp 
Pass 1: parsed user script and 469 library scripts using 244964virt/45004res/7588shr/37596data kb, in 100usr/20sys/128real ms.
Pass 2: analyzed script: 4 probes, 13 functions, 5 embeds, 11 globals using 246992virt/48356res/8816shr/39624data kb, in 30usr/160sys/396real ms.
Pass 3: using cached /root/.systemtap/cache/d5/stap_d5d7fd869e61741e13b43b7a6932a761_11210.c
Pass 4: using cached /root/.systemtap/cache/d5/stap_d5d7fd869e61741e13b43b7a6932a761_11210.ko
Pass 5: starting run.
        *** Beginning run! ***
        --> DP Request [Account #1] sent for domain [AD.JSTEPHEN]
                 DP Request [Account #1] finished with return code [0]: [Success]
                 Elapsed time [0m8.476s]

        --> DP Request [Account #2] sent for domain [idm.jstephen]
                 DP Request [Account #2] finished with return code [0]: [Success]
                 Elapsed time [0m0.003s]

        --> DP Request [Initgroups #3] sent for domain [AD.JSTEPHEN]
                 DP Request [Initgroups #3] finished with return code [0]: [Success]
                 Elapsed time [0m0.115s]

        --> DP Request [Account #4] sent for domain [idm.jstephen]
                 DP Request [Account #4] finished with return code [0]: [Success]
                 Elapsed time [0m0.001s]

        --> DP Request [Account #5] sent for domain [idm.jstephen]
                 DP Request [Account #5] finished with return code [0]: [Success]
                 Elapsed time [0m0.002s]

        --> DP Request [Account #6] sent for domain [idm.jstephen]
                 DP Request [Account #6] finished with return code [0]: [Success]
                 Elapsed time [0m0.001s]

        --> DP Request [Account #7] sent for domain [idm.jstephen]
                 DP Request [Account #7] finished with return code [0]: [Success]
                 Elapsed time [0m0.000s]

        --> DP Request [Account #8] sent for domain [idm.jstephen]
                 DP Request [Account #8] finished with return code [0]: [Success]
                 Elapsed time [0m0.001s]

        --> DP Request [Account #9] sent for domain [idm.jstephen]
                 DP Request [Account #9] finished with return code [0]: [Success]
                 Elapsed time [0m0.001s]

^C
Ending Systemtap Run - Providing Summary
Total Number of DP requests: [9]
Total time in DP requests: [0m8.600s]
Slowest request data:
        Request: [Account #1]
        Start Time: [Fri May  5 10:47:14 2017 EDT]
        End Time: [Fri May  5 10:47:23 2017 EDT]
        Duration: [0m8.476s]

Pass 5: run completed in 0usr/40sys/15329real ms.

  • We can see that the getAccountInfo #1 DP request completed in 8.476 seconds, the Start Time/End Time provided here can be used to help narrow down log analysis.
(Fri May  5 10:47:14 2017) [sssd[be[idm.jstephen]]] [dp_get_account_info_handler] (0x0200): Got request for [0x1][BE_REQ_USER][name=trustuser1@ad.jstephen]
(Fri May  5 10:47:14 2017) [sssd[be[idm.jstephen]]] [dp_attach_req] (0x0400): DP Request [Account #1]: New request. Flags [0x0001].
(Fri May  5 10:47:14 2017) [sssd[be[idm.jstephen]]] [dp_attach_req] (0x0400): Number of active DP request: 1
...
<snip>
...
(Fri May  5 10:47:23 2017) [sssd[be[idm.jstephen]]] [dp_req_done] (0x0400): DP Request [Account #1]: Request handler finished [0]: Success

  • The existing SystemTap scripts can be modified or new scripts can be created for a certain use-case as long as the existing probes/tapsets in /usr/share/systemtap/tapset/sssd.stp are used.
# LDAP search probes
probe sdap_search_send = process("/usr/lib64/sssd/libsss_ldap_common.so").mark("sdap_get_generic_ext_send")
{
    base = user_string($arg1);
    scope = $arg2;
    filter = user_string($arg3);

    probestr = sprintf("-> search base [%s] scope [%d] filter [%s]",
                       base, scope, filter);
}

The stap -L command shown previously will list out the functions where probes were added making these markers available for writing scripts with.

The goal will be to add more low-level probes to iterative functions where SSSD spends a lot of time. This will allow developers and administrators to analyze performance issues in detail.

Comments

  1. By the way, if your stap scripts only use process probes such as these, you don't need kernel-debuginfo. If the scripts only use sys/sdt.h markers, then they don't need sssd-debuginfo either.

    ReplyDelete

Post a Comment