Python 是一门非常优秀的面向对象的解释性语言,代码编写快且易读(但从代码字面意义上)。如果非要挑出一个缺点,那就是代码的执行速度相对于 C、Java、Golang 等语言较慢,不过在 3.11 版的 Python 运行速度已经进行大幅提高。但是,对于 Python 代码的性能分析仍然不可或缺,这有助于找出耗时的代码部分,通过优化能够加速程序运行。本篇介绍一些 Python 代码性能分析的方法。
代码的性能分析跟代码执行时间密切相关,只不过它关注的是耗时的位置。默认的 Python 性能分析工具是 cProfile 模块,它在执行一个程序或代码块时,会记录各函数所耗费的时间。但它不是转为 Python 设计的。
cProfile 一般是在命令行上使用的,它将执行整个程序然后输出各函数的执行时间。我们首先编写一个 Python 模块 py_performance.py:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 import  numpy as  npfrom  numpy.linalg import  eigvalsnp.random.seed(33 ) def  run_experiment (niter=100  ):    K = 100      results = []     for  _ in  range (niter):         mat = np.random.randn(K, K)         max_eigenvalue = np.abs (eigvals(mat)).max ()         results.append(max_eigenvalue)     return  results if  __name__ == "__main__" :    some_results = run_experiment()     print (f"最大特征值: {np.max (some_results)} " ) 
执行性能分析:
1 python -m cProfile py_performance.py 
结果大致是下面:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Largest one we saw: 11.375894329102476          80055 function  calls (77991 primitive calls) in  1.101 seconds    Ordered by: standard name    ncalls  tottime  percall  cumtime  percall filename:lineno(function )       100    0.000    0.000    0.002    0.000 <__array_function__ internals>:177(all)         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(amax)         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(concatenate)         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(copyto)       100    0.000    0.000    0.592    0.006 <__array_function__ internals>:177(eigvals)     149/1    0.001    0.000    0.454    0.454 <frozen importlib._bootstrap>:1002(_find_and_load)    180/16    0.001    0.000    0.435    0.027 <frozen importlib._bootstrap>:1033(_handle_fromlist)       452    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap>:112(release)       149    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:152(__init__)       149    0.000    0.000    0.003    0.000 <frozen importlib._bootstrap>:156(__enter__)       149    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:160(__exit__)       452    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:166(_get_module_lock) 
单从上面的日志很难发现最耗时的地方在哪里,常用 -s 标志对某一列进行排序,如 cumtime:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Largest one we saw: 11.375894329102476          80055 function  calls (77991 primitive calls) in  0.743 seconds    Ordered by: cumulative time    ncalls  tottime  percall  cumtime  percall filename:lineno(function )     110/1    0.000    0.000    0.743    0.743 {built-in method builtins.exec}         1    0.000    0.000    0.743    0.743 py_performance.py:1(<module>)         1    0.001    0.001    0.621    0.621 py_performance.py:5(run_experiment)       100    0.000    0.000    0.567    0.006 <__array_function__ internals>:177(eigvals)   203/103    0.000    0.000    0.567    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}       100    0.562    0.006    0.567    0.006 linalg.py:976(eigvals)        13    0.001    0.000    0.259    0.020 __init__.py:1(<module>)     149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:1002(_find_and_load)     149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)     138/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:659(_load_unlocked)     109/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap_external>:844(exec_module)     216/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap>:220(_call_with_frames_removed)    180/16    0.001    0.000    0.118    0.007 <frozen importlib._bootstrap>:1033(_handle_fromlist)     362/8    0.001    0.000    0.117    0.015 {built-in method builtins.__import__}       100    0.053    0.001    0.053    0.001 {method 'randn'  of 'numpy.random.mtrand.RandomState'  objects} 
注意:如果一个函数调用了别的函数,计时器是不会停下来重新计时的。
除了上面命令行的方式外,cProfile 还提供了编程式的分析代码块的性能方法,IPython 提供了方便的接口,如 %prun 和 %run -p.
基本性能分析 %prun 和 %run -p 该魔法函数可以直接在 IPython Jupyter notebook 中使用,
1 2 3 from  pyscripts.py_performance import  run_experiment%prun -l 7  -s cumulative run_experiment() 
结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13        4004 function  calls (3904 primitive calls) in  0.603 seconds  Ordered by: cumulative time  List reduced from 35 to 7 due to restriction <7>  ncalls  tottime  percall  cumtime  percall filename:lineno(function )       1    0.000    0.000    0.603    0.603 {built-in method builtins.exec}       1    0.000    0.000    0.603    0.603 <string>:1(<module>)       1    0.001    0.001    0.603    0.603 py_performance.py:5(run_experiment)     100    0.000    0.000    0.552    0.006 <__array_function__ internals>:177(eigvals) 200/100    0.000    0.000    0.552    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}     100    0.547    0.005    0.552    0.006 linalg.py:976(eigvals)     100    0.049    0.000    0.049    0.000 {method 'randn'  of 'numpy.random.mtrand.RandomState'  objects} 
1 %run -p -l 7  -s cumulative pyscripts/py_performance.py 
结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Largest one we saw: 11.375894329102476            4163 function  calls (4062 primitive calls) in  0.584 seconds    Ordered by: cumulative time    List reduced from 106 to 7 due to restriction <7>    ncalls  tottime  percall  cumtime  percall filename:lineno(function )       2/1    0.000    0.000    0.584    0.584 {built-in method builtins.exec}         1    0.000    0.000    0.584    0.584 <string>:1(<module>)         1    0.000    0.000    0.584    0.584 interactiveshell.py:2774(safe_execfile)         1    0.000    0.000    0.583    0.583 py3compat.py:51(execfile)         1    0.000    0.000    0.582    0.582 py_performance.py:1(<module>)         1    0.001    0.001    0.581    0.581 py_performance.py:5(run_experiment)       100    0.000    0.000    0.532    0.005 <__array_function__ internals>:177(eigvals) 
逐行分析函数性能 %lprun 逐行分析函数性能需要第三方模块 line_profiler,安装方法如下:
1 pip install line_profiler 
安装后,可以编辑 ~/.ipython/profile_default/ipython_config.py,增加如下内容:
1 2 3 c.TerminalIPythonApp.extensions = [     'line_profiler' , ] 
此时,可以在 IPython 中使用:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 In [1 ]: from  pyscripts.py_performance import  run_experiment In [2 ]: %lprun -f run_experiment run_experiment() Timer unit: 1e-09  s Total time: 0.997021  s File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py Function: run_experiment at line 5  Line  ==============================================================      5                                            def  run_experiment (niter=100  ):      6          1        1530.0    1530.0       0.0       K = 100       7          1         421.0     421.0       0.0       results = []      8        100       57943.0     579.4       0.0       for  _ in  range (niter):      9        100    66120221.0  661202.2       6.6           mat = np.random.randn(K, K)     10        100   930721489.0  9307214.9      93.4           max_eigenvalue = np.abs (eigvals(mat)).max ()     11        100      118745.0    1187.5       0.0           results.append(max_eigenvalue)     12          1         321.0     321.0       0.0       return  results 
%lprun 的通用语法为:
1 %lprun -f func1 -f func2 func(x, y) 
在 Jupyter notebook 中使用:
1 2 3 %load_ext line_profiler from pyscripts.py_performance import run_experiment %lprun -f run_experiment run_experiment() 
结果如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Timer unit: 1e-09 s Total time: 0.58282 s File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py Function: run_experiment at line 5 Line  ==============================================================      5                                           def run_experiment(niter=100):      6         1       4160.0   4160.0      0.0      K = 100      7         1        952.0    952.0      0.0      results = []      8       100      43829.0    438.3      0.0      for  _ in  range(niter):      9       100   46188517.0 461885.2      7.9          mat = np.random.randn(K, K)     10       100  536499670.0 5364996.7     92.1          max_eigenvalue = np.abs(eigvals(mat)).max()     11       100      82394.0    823.9      0.0          results.append(max_eigenvalue)     12         1        192.0    192.0      0.0      return  results 
参考文献 
4.3. Profiling your code line-by-line with line_profiler pyutils/line_profiler Wes McKinney 著 唐学韬 等译。利用 Python 进行数据分析。机械工业出版社。