Skip to content

Conversation

@maurycy
Copy link
Contributor

@maurycy maurycy commented Jan 30, 2026

The PR reduces the high CPU usage of profiling.sampling, from 99%, without having any impact on the sampling quality or frequency.

The trick is to just use time.sleep.

Am I missing something obvious here?

I'm sure there are faster ways to do this (eg: timerfd), but not sure if worth, given complexity / decreased maintainability.

Test

Platform Before After Samples (before/after)
Linux 99% CPU (60.04s user) 6% CPU (2.78s user) 60,001 / 60,001
macOS 99% CPU (59.44s user) 2% CPU (0.53s user) 60,001 / 60,000

Linux (6.12.63+deb13-amd64, Debian trixie)

After

6% CPU:

2026-01-30T15:25:41.168435384+0100 maurycy@eiger /home/maurycy/github.com/maurycy/cpython (profiling-sampling-sleep) % ./test.sh 
Python: /home/maurycy/work/cpython/python ('--enable-optimizations' '--with-lto')
Duration: 60s
Target PID: 2909269
Captured 60,001 samples in 60.00 seconds
Sample rate: 1,000.02 samples/sec
Error rate: 0.70
Profile Stats:
       nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
    46378/46378      77.8        46.378      77.8        46.378  foof.py:6(l3)
    12559/12559      21.1        12.559      21.1        12.559  foof.py:5(l3)
      307/59493       0.5         0.307      99.9        59.493  foof.py:19(main)
        226/226       0.4         0.226       0.4         0.226  foof.py:7(l3)
       60/59164       0.1         0.060      99.3        59.164  foof.py:10(l2)
       43/59183       0.1         0.043      99.3        59.183  foof.py:13(l1)
            4/4       0.0         0.004       0.0         0.004  foof.py:3(l3)
            1/1       0.0         0.001       0.0         0.001  foof.py:4(l3)
        0/59578       0.0         0.000     100.0        59.578  foof.py:23(<module>)
           0/84       0.0         0.000       0.1         0.084  foof.py:18(main)
            0/2       0.0         0.000       0.0         0.002  foof.py:12(l1)
            0/1       0.0         0.000       0.0         0.001  foof.py:20(main)
            0/3       0.0         0.000       0.0         0.003  foof.py:9(l2)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 99.3% direct samples: foof.py:(l3)
  0.005 direct/cumulative ratio, 0.5% direct samples: foof.py:(main)
  0.001 direct/cumulative ratio, 0.1% direct samples: foof.py:(l2)

Functions with Highest Call Frequency (Indirect Calls):
  59578 indirect calls, 100.0% total stack presence: foof.py:(<module>)
  59271 indirect calls, 100.0% total stack presence: foof.py:(main)
  59142 indirect calls, 99.3% total stack presence: foof.py:(l1)

Functions with Highest Call Magnification (Cumulative/Direct):
  1376.4x call magnification, 59142 indirect calls from 43 direct: foof.py:(l1)
  986.1x call magnification, 59107 indirect calls from 60 direct: foof.py:(l2)
  194.1x call magnification, 59271 indirect calls from 307 direct: foof.py:(main)
2.78user 0.92system 1:00.04elapsed 6%CPU (0avgtext+0avgdata 20812maxresident)k
0inputs+0outputs (0major+3671minor)pagefaults 0swaps

Before

99% CPU:

2026-01-30T15:04:49.171073215+0100 maurycy@eiger /home/maurycy/github.com/maurycy/cpython (main) % ./test.sh 
Python: /home/maurycy/work/cpython/python ('--enable-optimizations' '--with-lto')
Duration: 60s
Target PID: 2907235
Captured 60,001 samples in 60.00 seconds
Sample rate: 1,000.02 samples/sec
Error rate: 0.58
Profile Stats:
       nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
    39116/39116      65.6        39.116      65.6        39.116  foof.py:6(l3)
    19916/19916      33.4        19.916      33.4        19.916  foof.py:5(l3)
      289/59643       0.5         0.289     100.0        59.643  foof.py:19(main)
        177/177       0.3         0.177       0.3         0.177  foof.py:7(l3)
       68/59333       0.1         0.068      99.5        59.333  foof.py:13(l1)
       53/59295       0.1         0.053      99.4        59.295  foof.py:10(l2)
          17/17       0.0         0.017       0.0         0.017  foof.py:3(l3)
          16/16       0.0         0.016       0.0         0.016  foof.py:4(l3)
        0/59652       0.0         0.000     100.0        59.652  foof.py:23(<module>)
            0/9       0.0         0.000       0.0         0.009  foof.py:18(main)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 99.3% direct samples: foof.py:(l3)
  0.005 direct/cumulative ratio, 0.5% direct samples: foof.py:(main)
  0.001 direct/cumulative ratio, 0.1% direct samples: foof.py:(l1)

Functions with Highest Call Frequency (Indirect Calls):
  59652 indirect calls, 100.0% total stack presence: foof.py:(<module>)
  59363 indirect calls, 100.0% total stack presence: foof.py:(main)
  59265 indirect calls, 99.5% total stack presence: foof.py:(l1)

Functions with Highest Call Magnification (Cumulative/Direct):
  1118.8x call magnification, 59242 indirect calls from 53 direct: foof.py:(l2)
  872.5x call magnification, 59265 indirect calls from 68 direct: foof.py:(l1)
  206.4x call magnification, 59363 indirect calls from 289 direct: foof.py:(main)
60.04user 0.00system 1:00.05elapsed 99%CPU (0avgtext+0avgdata 21108maxresident)k
0inputs+0outputs (0major+3687minor)pagefaults 0swaps

macOS (Tahoe 26.2)

After

2026-01-30T15:38:08.554326000+0100 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling (profiling-sampling-sleep) % sudo ./test.sh              
Password:
Python: /Users/maurycy/work/cpython/python.exe
Duration: 60s
Target PID: 56781
Captured 60,000 samples in 60.00 seconds
Sample rate: 1,000.00 samples/sec
Error rate: 1.00
Profile Stats:
       nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
    52633/52633      88.6        52.633      88.6        52.633  foof.py:7(l3)
      6067/6067      10.2         6.067      10.2         6.067  foof.py:6(l3)
      428/59395       0.7         0.428     100.0        59.395  foof.py:21(main)
       80/58893       0.1         0.080      99.2        58.893  foof.py:11(l2)
          78/78       0.1         0.078       0.1         0.078  foof.py:8(l3)
       74/58967       0.1         0.074      99.3        58.967  foof.py:14(l1)
          23/23       0.0         0.023       0.0         0.023  foof.py:4(l3)
          12/12       0.0         0.012       0.0         0.012  foof.py:5(l3)
            2/2       0.0         0.002       0.0         0.002  foof.py:19(main)
        0/59397       0.0         0.000     100.0        59.397  foof.py:25(<module>)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 99.0% direct samples: foof.py:(l3)
  0.007 direct/cumulative ratio, 0.7% direct samples: foof.py:(main)
  0.001 direct/cumulative ratio, 0.1% direct samples: foof.py:(l2)

Functions with Highest Call Frequency (Indirect Calls):
  59397 indirect calls, 100.0% total stack presence: foof.py:(<module>)
  58967 indirect calls, 100.0% total stack presence: foof.py:(main)
  58893 indirect calls, 99.3% total stack presence: foof.py:(l1)

Functions with Highest Call Magnification (Cumulative/Direct):
  796.9x call magnification, 58893 indirect calls from 74 direct: foof.py:(l1)
  736.2x call magnification, 58813 indirect calls from 80 direct: foof.py:(l2)
  138.1x call magnification, 58967 indirect calls from 430 direct: foof.py:(main)
       60.05 real         0.53 user         0.59 sys

Before

[1] 2026-01-30T15:28:41.302602000+0100 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython/Lib/profiling/sampling (main) % sudo ./test.sh 
Password:
Python: /Users/maurycy/work/cpython/python.exe
Duration: 60s
Target PID: 56466
Captured 60,001 samples in 60.00 seconds
Sample rate: 1,000.02 samples/sec
Error rate: 0.92
Profile Stats:
       nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
    52660/52660      88.6        52.660      88.6        52.660  foof.py:7(l3)
      6076/6076      10.2         6.076      10.2         6.076  foof.py:6(l3)
      443/59446       0.7         0.443     100.0        59.446  foof.py:21(main)
      105/58933       0.2         0.105      99.1        58.933  foof.py:11(l2)
       70/59003       0.1         0.070      99.2        59.003  foof.py:14(l1)
          50/50       0.1         0.050       0.1         0.050  foof.py:8(l3)
          28/28       0.0         0.028       0.0         0.028  foof.py:4(l3)
          14/14       0.0         0.014       0.0         0.014  foof.py:5(l3)
            3/3       0.0         0.003       0.0         0.003  foof.py:22(main)
            1/1       0.0         0.001       0.0         0.001  foof.py:19(main)
        0/59450       0.0         0.000     100.0        59.450  foof.py:25(<module>)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 99.0% direct samples: foof.py:(l3)
  0.008 direct/cumulative ratio, 0.8% direct samples: foof.py:(main)
  0.002 direct/cumulative ratio, 0.2% direct samples: foof.py:(l2)

Functions with Highest Call Frequency (Indirect Calls):
  59450 indirect calls, 100.0% total stack presence: foof.py:(<module>)
  59003 indirect calls, 100.0% total stack presence: foof.py:(main)
  58933 indirect calls, 99.2% total stack presence: foof.py:(l1)

Functions with Highest Call Magnification (Cumulative/Direct):
  842.9x call magnification, 58933 indirect calls from 70 direct: foof.py:(l1)
  561.3x call magnification, 58828 indirect calls from 105 direct: foof.py:(l2)
  133.0x call magnification, 59003 indirect calls from 447 direct: foof.py:(main)
       60.05 real        59.44 user         0.49 sys

Test scripts

test.sh
#!/bin/bash
set -e

PYTHON=~/work/cpython/python
DURATION=60

echo "Python: $PYTHON"

$PYTHON foof.py &
TARGET_PID=$!
sleep 1

echo "Target PID: $TARGET_PID"

cleanup() { kill $TARGET_PID 2>/dev/null || true; }
trap cleanup EXIT

/usr/bin/time env PYTHONPATH=. $PYTHON -m profiling.sampling attach $TARGET_PID -d $DURATION #  2>&1
foof.py
#!/bin/bash
set -e

PYTHON=~/work/cpython/python
DURATION=60

echo "Python: $PYTHON ($($PYTHON -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"))"

$PYTHON foof.py &
TARGET_PID=$!
sleep 1

echo "Target PID: $TARGET_PID"

cleanup() { kill $TARGET_PID 2>/dev/null || true; }
trap cleanup EXIT

/usr/bin/time env PYTHONPATH=. $PYTHON -m profiling.sampling attach $TARGET_PID -d $DURATION #  2>&1

@pablogsal
Copy link
Member

pablogsal commented Jan 30, 2026

Good catch!

Am I missing something obvious here?

I'm sure there are faster ways to do this (eg: timerfd), but not sure if worth, given complexity / decreased maintainability.

The problem i think was that with very small sample frequencies (microseconds) sleep was sleeping for more than the requested sampling rate and it was skewing the results and causing 'slow samples'. I think in some of the refactors the sleep fallback for bigger sample rates was lost and that's the reason this is happening :(

@pablogsal pablogsal enabled auto-merge (squash) January 30, 2026 14:53
@pablogsal
Copy link
Member

Thanks a lot for the investigation, the issue and the fix! Great work !

@pablogsal pablogsal merged commit be4ee8e into python:main Jan 30, 2026
53 checks passed
@maurycy maurycy deleted the profiling-sampling-sleep branch January 30, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants