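Python is an excellent object-oriented, interpreted language: code is quick to write and easy to read (at least on the surface). If one drawback has to be named, it is execution speed, which is slower than that of languages such as C, Java, and Golang. Python 3.11 did bring a substantial speedup; its release notes cite an average of about 1.25x over 3.10 on the standard benchmark suite. Profiling Python code is still indispensable. It shows which parts of a program consume the most time, so that optimization effort goes where it actually speeds the program up. This post introduces several ways to profile Python code.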
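Profiling is closely related to timing code, except that it focuses on where the time is spent. The main profiling tool in Python is the standard-library cProfile module. While running a program or a block of code, it records how much time is spent in each function. It is not specific to IPython.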
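Here is a minimal sketch of what cProfile records, using only the standard library. It profiles a single call with cProfile.Profile.runcall and then reads back each function's ncalls, tottime and cumtime through pstats. The functions parse, total_length and program are hypothetical. Stats.get_stats_profile() assumes Python 3.9 or later.

```python
import cProfile
import pstats


def parse(n):
    # Hypothetical helper: build a list of decimal strings
    return [str(i) for i in range(n)]


def total_length(items):
    # Hypothetical helper: add up the string lengths
    return sum(len(s) for s in items)


def program():
    items = parse(100_000)
    return total_length(items)


profiler = cProfile.Profile()
answer = profiler.runcall(program)  # enable, call program(), disable

# One record per function: call count, own time (tottime) and time
# including everything it called (cumtime), in seconds.
# get_stats_profile() needs Python 3.9+.
records = pstats.Stats(profiler).get_stats_profile().func_profiles
for name in ("program", "parse", "total_length"):
    rec = records[name]
    print(f"{name:<13} ncalls={rec.ncalls} "
          f"tottime={rec.tottime:.3f}s cumtime={rec.cumtime:.3f}s")
```

Records are keyed by function name, so this lookup style suits small examples where names are unique.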
cProfile is usually run from the command line. It executes the whole program and then prints the time spent in each function. First, write a Python module named py_performance.py:
```python
import numpy as np
from numpy.linalg import eigvals
np.random.seed(33)  # fixed seed so runs are reproducible

def run_experiment(niter=100):
    K = 100
    results = []
    for _ in range(niter):
        mat = np.random.randn(K, K)
        max_eigenvalue = np.abs(eigvals(mat)).max()
        results.append(max_eigenvalue)
    return results


if __name__ == "__main__":
    some_results = run_experiment()
    print(f"Largest one we saw: {np.max(some_results)}")
```
Run it under the profiler:

```shell
python -m cProfile py_performance.py
```

The output looks roughly like this:
```text
Largest one we saw: 11.375894329102476
         80055 function calls (77991 primitive calls) in 1.101 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.000    0.000    0.002    0.000 <__array_function__ internals>:177(all)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(amax)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(concatenate)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(copyto)
      100    0.000    0.000    0.592    0.006 <__array_function__ internals>:177(eigvals)
    149/1    0.001    0.000    0.454    0.454 <frozen importlib._bootstrap>:1002(_find_and_load)
   180/16    0.001    0.000    0.435    0.027 <frozen importlib._bootstrap>:1033(_handle_fromlist)
      452    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap>:112(release)
      149    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:152(__init__)
      149    0.000    0.000    0.003    0.000 <frozen importlib._bootstrap>:156(__enter__)
      149    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:160(__exit__)
      452    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:166(_get_module_lock)
```
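From the output above it is hard to tell where most of the time goes. The -s flag sorts the report by a given column. For example, `python -m cProfile -s cumulative py_performance.py` sorts by cumulative time (the cumtime column):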
```text
Largest one we saw: 11.375894329102476
         80055 function calls (77991 primitive calls) in 0.743 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    110/1    0.000    0.000    0.743    0.743 {built-in method builtins.exec}
        1    0.000    0.000    0.743    0.743 py_performance.py:1(<module>)
        1    0.001    0.001    0.621    0.621 py_performance.py:5(run_experiment)
      100    0.000    0.000    0.567    0.006 <__array_function__ internals>:177(eigvals)
  203/103    0.000    0.000    0.567    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}
      100    0.562    0.006    0.567    0.006 linalg.py:976(eigvals)
       13    0.001    0.000    0.259    0.020 __init__.py:1(<module>)
    149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:1002(_find_and_load)
    149/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
    138/1    0.001    0.000    0.122    0.122 <frozen importlib._bootstrap>:659(_load_unlocked)
    109/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap_external>:844(exec_module)
    216/1    0.000    0.000    0.122    0.122 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
   180/16    0.001    0.000    0.118    0.007 <frozen importlib._bootstrap>:1033(_handle_fromlist)
    362/8    0.001    0.000    0.117    0.015 {built-in method builtins.__import__}
      100    0.053    0.001    0.053    0.001 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}
```
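Note: a function's clock does not stop while it calls other functions. As a result, cumtime includes the time spent in every function it calls, whereas tottime counts only the function's own code. In the sorted output above, run_experiment has a cumtime of 0.621 s but a tottime of only 0.001 s. By contrast, linalg.py:976(eigvals) alone has a tottime of 0.562 s, so computing eigenvalues dominates the run time.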
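The difference between tottime and cumtime can be reproduced with a small standard-library sketch. It assumes Python 3.9+ for get_stats_profile(), and inner and outer are hypothetical functions. outer should show a cumtime of roughly 0.1 s but a tottime close to zero, because all of its time is spent in the functions it calls.

```python
import cProfile
import pstats
import time


def inner():
    # Stand-in for real work; cProfile also profiles C built-ins by default,
    # so the sleep is booked under time.sleep, not in inner's own tottime
    time.sleep(0.05)


def outer():
    # outer does almost nothing itself; it only calls inner twice
    for _ in range(2):
        inner()


with cProfile.Profile() as profiler:  # context-manager form: Python 3.8+
    outer()

records = pstats.Stats(profiler).get_stats_profile().func_profiles  # Python 3.9+
for name in ("outer", "inner"):
    rec = records[name]
    print(f"{name}: tottime={rec.tottime:.3f}s cumtime={rec.cumtime:.3f}s")
```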
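Besides the command line, cProfile can also profile an arbitrary block of code programmatically. IPython offers convenient interfaces to this through the %prun and %run -p magics.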
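Outside IPython, the same programmatic interface can be used directly. The sketch below uses only the standard library, and fib is a hypothetical workload. It profiles just the block between enable() and disable(), then prints a report sorted by cumulative time and cut to 7 rows, which is roughly what %prun -l 7 -s cumulative shows. Because fib is recursive, its ncalls column shows two numbers, total calls / primitive (non-recursive) calls, like entries such as 149/1 in the cProfile output above.

```python
import cProfile
import io
import pstats


def fib(n):
    # Hypothetical workload: naive recursive Fibonacci
    return n if n < 2 else fib(n - 1) + fib(n - 2)


profiler = cProfile.Profile()
profiler.enable()                      # profile only the block below
values = [fib(k) for k in range(20)]
profiler.disable()

# Sort by cumulative time and keep at most 7 rows, written to a string buffer
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(7)
report = buffer.getvalue()
print(report)
```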
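Basic profiling: %prun and %run -p

These magic functions work directly in IPython and in Jupyter notebooks. %prun profiles a single statement. In the example below, -l 7 limits the report to the first 7 rows and -s cumulative sorts it by cumulative time: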
```ipython
from pyscripts.py_performance import run_experiment

%prun -l 7 -s cumulative run_experiment()
```
Output:
```text
         4004 function calls (3904 primitive calls) in 0.603 seconds

   Ordered by: cumulative time
   List reduced from 35 to 7 due to restriction <7>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.603    0.603 {built-in method builtins.exec}
        1    0.000    0.000    0.603    0.603 <string>:1(<module>)
        1    0.001    0.001    0.603    0.603 py_performance.py:5(run_experiment)
      100    0.000    0.000    0.552    0.006 <__array_function__ internals>:177(eigvals)
  200/100    0.000    0.000    0.552    0.006 {built-in method numpy.core._multiarray_umath.implement_array_function}
      100    0.547    0.005    0.552    0.006 linalg.py:976(eigvals)
      100    0.049    0.000    0.049    0.000 {method 'randn' of 'numpy.random.mtrand.RandomState' objects}
```
%run -p runs an entire script under the profiler and accepts the same options:

```ipython
%run -p -l 7 -s cumulative pyscripts/py_performance.py
```
Output:
```text
Largest one we saw: 11.375894329102476
         4163 function calls (4062 primitive calls) in 0.584 seconds

   Ordered by: cumulative time
   List reduced from 106 to 7 due to restriction <7>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.000    0.000    0.584    0.584 {built-in method builtins.exec}
        1    0.000    0.000    0.584    0.584 <string>:1(<module>)
        1    0.000    0.000    0.584    0.584 interactiveshell.py:2774(safe_execfile)
        1    0.000    0.000    0.583    0.583 py3compat.py:51(execfile)
        1    0.000    0.000    0.582    0.582 py_performance.py:1(<module>)
        1    0.001    0.001    0.581    0.581 py_performance.py:5(run_experiment)
      100    0.000    0.000    0.532    0.005 <__array_function__ internals>:177(eigvals)
```
Line-by-line profiling: %lprun

Line-by-line profiling requires the third-party line_profiler package, which can be installed with:

```shell
pip install line_profiler
```
After installing it, add the following to ~/.ipython/profile_default/ipython_config.py. If the file does not exist yet, `ipython profile create` generates it.
```python
c.TerminalIPythonApp.extensions = [
    'line_profiler',
]
```
IPython then loads the extension at startup, and %lprun can be used directly:
```text
In [1]: from pyscripts.py_performance import run_experiment

In [2]: %lprun -f run_experiment run_experiment()
Timer unit: 1e-09 s

Total time: 0.997021 s
File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py
Function: run_experiment at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def run_experiment(niter=100):
     6         1       1530.0   1530.0      0.0      K = 100
     7         1        421.0    421.0      0.0      results = []
     8       100      57943.0    579.4      0.0      for _ in range(niter):
     9       100   66120221.0 661202.2      6.6          mat = np.random.randn(K, K)
    10       100  930721489.0 9307214.9     93.4          max_eigenvalue = np.abs(eigvals(mat)).max()
    11       100     118745.0   1187.5      0.0          results.append(max_eigenvalue)
    12         1        321.0    321.0      0.0      return results
```
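The general syntax of %lprun is shown below. Each -f names a function to profile line by line, and the option can be repeated. The trailing statement is what actually gets executed:

```ipython
%lprun -f func1 -f func2 func(x, y)
```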
In a Jupyter notebook, the terminal-only setting above does not apply, so load the extension explicitly:
```ipython
%load_ext line_profiler
from pyscripts.py_performance import run_experiment
%lprun -f run_experiment run_experiment()
```
Output:
```text
Timer unit: 1e-09 s

Total time: 0.58282 s
File: /home/jinzhongxu/MEGA/py/nb/bs/pyscripts/py_performance.py
Function: run_experiment at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def run_experiment(niter=100):
     6         1       4160.0   4160.0      0.0      K = 100
     7         1        952.0    952.0      0.0      results = []
     8       100      43829.0    438.3      0.0      for _ in range(niter):
     9       100   46188517.0 461885.2      7.9          mat = np.random.randn(K, K)
    10       100  536499670.0 5364996.7     92.1          max_eigenvalue = np.abs(eigvals(mat)).max()
    11       100      82394.0    823.9      0.0          results.append(max_eigenvalue)
    12         1        192.0    192.0      0.0      return results
```
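The sketch below illustrates the idea behind line_profiler's Hits and Time columns using only the standard library's sys.settrace. This is a different and far slower mechanism than the one line_profiler uses, and time_lines and demo are hypothetical names. The time between two consecutive line events in the traced function is charged to the earlier line. Time spent in calls made from a line therefore counts toward that line, as eigvals(mat) does for line 10 above.

```python
import sys
import time
from collections import defaultdict


def time_lines(func, *args, **kwargs):
    """Call func once and time each of its source lines.

    Returns (result, seconds_per_line, hits_per_line), keyed by line number.
    Toy sketch: recursion and generators are not handled.
    """
    target = func.__code__
    seconds = defaultdict(float)
    hits = defaultdict(int)
    current = {"line": None, "start": 0.0}

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # leave every other function untraced
        now = time.perf_counter()
        if current["line"] is not None:
            # Charge the elapsed time to the line that was running
            seconds[current["line"]] += now - current["start"]
            current["start"] = now
        if event == "line":
            # A new line of func is about to run
            hits[frame.f_lineno] += 1
            current["line"] = frame.f_lineno
            current["start"] = now
        elif event == "return":
            current["line"] = None
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any debugger/coverage tracer
    return result, dict(seconds), dict(hits)


def demo(n):
    # Hypothetical function to profile
    total = 0
    for i in range(n):
        total += i * i
    time.sleep(0.05)  # deliberately the slowest line
    return total


result, seconds, hits = time_lines(demo, 1000)
first = demo.__code__.co_firstlineno
for lineno in sorted(seconds):
    print(f"def+{lineno - first}: hits={hits[lineno]:>5} "
          f"time={seconds[lineno] * 1e6:10.1f} us")
```

The time.sleep line should dominate the report, much as the eigvals line does in the %lprun output above. For real work, line_profiler is the better tool.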
References
4.3. Profiling your code line-by-line with line_profiler
pyutils/line_profiler
Wes McKinney, Python for Data Analysis (Chinese edition 利用 Python 进行数据分析, translated by Tang Xuetao et al.), China Machine Press.