Evaluating Large Models with OpenCompass
This post shows how to use OpenCompass to evaluate large models from a Jupyter Notebook.
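Evaluation targets
The main evaluation targets are large language models and large multimodal models. For language models, the specific types are:
- Base models: usually obtained by self-supervised training on massive text corpora (e.g. OpenAI's GPT-3, Meta's LLaMA); they tend to be strong at text continuation.
- Chat models: usually obtained from a base model through instruction fine-tuning or human-preference alignment (e.g. OpenAI's ChatGPT, Shanghai AI Laboratory's InternLM (书生·浦语)); they can follow human instructions and hold a conversation well.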
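Tool architecture
- Model layer: the main kinds of models involved in large-model evaluation. OpenCompass focuses on base models and chat models.
- Capability layer: OpenCompass designs its evaluation dimensions along two axes, general capabilities and specialized capabilities. General capabilities are assessed across language, knowledge, understanding, reasoning, safety, and other dimensions; specialized capabilities are assessed across long context, code, tool use, and knowledge augmentation.
- Method layer: OpenCompass uses both objective and subjective evaluation. Objective evaluation conveniently measures a model on tasks with definite answers (multiple choice, fill-in-the-blank, closed-ended QA, and so on). Subjective evaluation measures how satisfied users actually are with a model's responses; OpenCompass supports two forms of it, model-assisted and human-feedback-based.
- Tool layer: OpenCompass provides a rich set of features for automated, efficient evaluation of large language models, including distributed evaluation, prompt engineering, integration with evaluation databases, leaderboard publishing, and evaluation report generation.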
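Capability dimensions
General capabilities cover six dimensions: comprehensive subject examination, knowledge, language, understanding, reasoning, and safety. Together they form a multi-faceted system for assessing model capability.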
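Evaluation methods
OpenCompass combines objective and subjective evaluation. For capability dimensions and scenarios with definite answers, it assesses the model comprehensively using rich, well-constructed evaluation sets. For open-ended or semi-open-ended questions that reflect model capability, and for model safety questions, it combines subjective and objective evaluation.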
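Objective evaluation
For objective questions with reference answers, we can use quantitative metrics to compare the model's output against the reference answer and measure model performance from the result. Because large language models have a lot of freedom in what they output, the inputs and outputs also need to be constrained and designed at evaluation time to minimize the effect of noisy outputs; only then does the evaluation give a complete and objective picture of the model's capability.
To better elicit the model's ability in the domain being tested, and to steer the model toward producing answers in a given template, OpenCompass uses prompt engineering and in-context learning for objective evaluation.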
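To make the idea of in-context learning concrete, here is a minimal sketch of a few-shot prompt builder. It is not OpenCompass's own prompt template machinery; the function name `build_few_shot_prompt`, the "Question:/Answer:" layout, and the sample questions are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join solved examples and the new question into a single prompt.

    Hypothetical helper: the "Question:/Answer:" layout is an assumed
    template, not the one OpenCompass ships.
    """
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The last block leaves the answer empty so the model fills it in,
    # following the pattern set by the solved examples.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Made-up in-context examples in multiple-choice form.
examples = [
    ("1 + 1 = ?  A. 1  B. 2  C. 3  D. 4", "B"),
    ("Which number is prime?  A. 4  B. 6  C. 7  D. 9", "C"),
]
prompt = build_few_shot_prompt(examples, "2 * 3 = ?  A. 5  B. 6  C. 7  D. 8")
print(prompt)
```

The solved examples show the model the expected answer format (a single option letter), which is what makes the later answer extraction step tractable.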
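In practice, model outputs in objective evaluation are usually scored in one of two ways:
- Discriminative evaluation: each candidate answer is combined with the question, the model's perplexity is computed on every combination, and the answer with the lowest perplexity is taken as the model's final output. For example, if the model's perplexity is 0.1 on "question? answer1" and 0.2 on "question? answer2", answer1 is chosen as the model's output.
- Generative evaluation: used mainly for generation tasks such as translation, program generation, and logical analysis problems. The question is given to the model as the raw input, with the answer region left blank for the model to complete. The output usually needs post-processing so that it meets the dataset's requirements.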
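The two scoring modes above can be sketched in a few lines. This is an illustrative toy, not OpenCompass's inferencer code: the per-token log-probabilities are made up, perplexity is taken to be the exponential of the mean negative log-likelihood, and `pick_by_perplexity` and `extract_choice` are hypothetical helper names.

```python
import math
import re
from typing import Dict, List, Optional


def perplexity(token_logprobs: List[float]) -> float:
    """Exponential of the mean negative log-likelihood of the tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_by_perplexity(candidates: Dict[str, List[float]]) -> str:
    """Discriminative mode: return the label whose sequence has the lowest perplexity."""
    return min(candidates, key=lambda label: perplexity(candidates[label]))


def extract_choice(generated: str) -> Optional[str]:
    """Generative mode: reduce free-form output to the first standalone A-D letter."""
    match = re.search(r"\b([ABCD])\b", generated)
    return match.group(1) if match else None


# Made-up per-token log-probabilities for "question? answer1" and "question? answer2".
scores = {
    "answer1": [-0.1, -0.2, -0.1],
    "answer2": [-0.9, -1.2, -0.7],
}
print(pick_by_perplexity(scores))
print(extract_choice("The answer is B, because ..."))
```

The regular expression is deliberately simple; a real post-processor would need to handle cases such as a response that starts with the article "A".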
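Subjective evaluation
Language is vivid and endlessly varied, and many scenarios and capabilities cannot be assessed with objective metrics. For areas such as model safety and language ability, evaluation based on human judgment better reflects a model's real capability and is closer to how large models are actually used.
OpenCompass's subjective evaluation scheme relies on the subjective judgment of human raters to assess large language models that have conversational ability. In practice, a set of subjective test questions is built in advance along the model's capability dimensions; different models' responses to the same question are shown to raters, and their subjective scores are collected. Because subjective testing is expensive, the scheme also uses a high-performing large language model to simulate human raters. In actual evaluations, assessment by real human experts is combined with model-based scoring.
Concretely, OpenCompass runs subjective evaluation in two ways: satisfaction statistics for a single model's responses, and satisfaction comparison across multiple models.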
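As a rough illustration of the two modes, the sketch below aggregates rater scores for one model and pairwise preferences between two models. The rating scale, the model names `model_x` and `model_y`, and both helper functions are assumptions for this example; OpenCompass's actual subjective evaluation pipeline is not shown here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mean_satisfaction(ratings: List[int]) -> float:
    """Single-model mode: average rater score for one model's responses."""
    return sum(ratings) / len(ratings)


def win_rates(comparisons: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Multi-model mode: fraction of pairwise comparisons each model wins.

    Each comparison is (model_a, model_b, winner), where winner is one of
    the two model names or the string "tie".
    """
    wins: Counter = Counter()
    games: Counter = Counter()
    for model_a, model_b, winner in comparisons:
        games[model_a] += 1
        games[model_b] += 1
        if winner != "tie":
            wins[winner] += 1
    return {model: wins[model] / games[model] for model in games}


# Made-up data: four ratings on an assumed 1-5 scale, and four pairwise judgments.
ratings = [4, 5, 3, 4]
comparisons = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "tie"),
    ("model_x", "model_y", "model_y"),
    ("model_x", "model_y", "model_x"),
]
print(mean_satisfaction(ratings))
print(win_rates(comparisons))
```

The same aggregation applies whether the judgments come from human raters or from a judge model.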
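Evaluation workflow
Evaluating a model in OpenCompass usually goes through these stages: configuration -> inference -> evaluation -> visualization.
- Configuration: the starting point of the whole workflow. You configure the entire evaluation here: choose the models and datasets to evaluate, optionally pick the evaluation strategy and compute backend, and define how results are displayed.
- Inference and evaluation: OpenCompass runs inference and evaluation over the models and datasets in parallel. Inference has the model produce outputs for the dataset; evaluation measures how well those outputs match the reference answers. Both are split into multiple "tasks" that run concurrently for efficiency, but note that with limited compute this strategy can make the evaluation slower.
- Visualization: once evaluation finishes, OpenCompass collects the results into readable tables and saves them as CSV and TXT files. You can also enable Feishu (Lark) status reporting to receive evaluation status updates in the Feishu client.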
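The stages above can be mimicked with a toy pipeline. This is only a sketch of the control flow, assuming a stand-in model that always answers "B", one task per dataset, and exact-match accuracy; none of the names below come from OpenCompass.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[str, str]]  # (question, reference answer)


def toy_model(question: str) -> str:
    """Stand-in for a real model: always answers 'B'."""
    return "B"


def run_task(name: str, dataset: Dataset, model: Callable[[str], str]) -> Tuple[str, float]:
    # Inference: one prediction per question.
    predictions = [model(question) for question, _ in dataset]
    # Evaluation: exact-match accuracy against the reference answers.
    correct = sum(pred == ref for pred, (_, ref) in zip(predictions, dataset))
    return name, 100.0 * correct / len(dataset)


def evaluate(datasets: Dict[str, Dataset], model: Callable[[str], str], max_workers: int = 2) -> str:
    # Configuration is the `datasets` and `model` arguments; each dataset becomes one task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: run_task(item[0], item[1], model), datasets.items()))
    # Visualization: write the summary table as CSV text.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["dataset", "accuracy"])
    for name, accuracy in results:
        writer.writerow([name, f"{accuracy:.2f}"])
    return buffer.getvalue()


datasets = {
    "subset_a": [("q1", "B"), ("q2", "A")],
    "subset_b": [("q3", "B")],
}
print(evaluate(datasets, toy_model))
```

Raising `max_workers` only helps when there are enough resources to run tasks side by side, which mirrors the caveat about limited compute in the list above.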
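Next, we walk through basic OpenCompass usage by evaluating InternLM (书生·浦语) on the C-Eval benchmark. The configuration files can be found in configs/eval_demo.py.
Unless stated otherwise, all commands are run in a Jupyter notebook.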
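Environment setup
Run the following command in a terminal:
conda create --name opencompass --clone=/root/share/conda_envs/internlm-base
Once the base environment is set up, open Jupyter, create a notebook, and select the opencompass kernel to start writing code.
# Jupyter notebook environment setup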
env: CONDA_EXE=/root/.conda/bin/conda
env: CONDA_PREFIX=/root/.conda/envs/opencompass
env: CONDA_PYTHON_EXE=/root/.conda/bin/python
env: PATH=/root/.conda/envs/opencompass/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
%pip install -q pickleshare
# Install opencompass
%cd opencompass
Data preparation
%cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
!unzip -q OpenCompassData-core-20231110.zip
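If `unzip` is not available, or you want to confirm which top-level folder the archive produced, the extraction can also be done from Python. This is a minimal sketch using the standard `zipfile` module; the helper name `extract_and_list` is an assumption, and nothing here is specific to OpenCompass.

```python
import zipfile
from typing import List


def extract_and_list(zip_path: str, dest: str) -> List[str]:
    """Extract the archive into dest and return its sorted top-level entries."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Keep only the first path component of every member name.
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    return sorted(top_level)


# Example call for the archive copied above (run from /root/opencompass):
# print(extract_and_list("OpenCompassData-core-20231110.zip", "."))
```

The returned list shows where the datasets landed, which is useful to check before launching an evaluation.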
# List all configs related to internlm and ceval
+--------------------------+--------------------------------------------------------+
| Model | Config Path |
|--------------------------+--------------------------------------------------------|
| hf_internlm2_20b | configs/models/hf_internlm/hf_internlm2_20b.py |
| hf_internlm2_7b | configs/models/hf_internlm/hf_internlm2_7b.py |
| hf_internlm2_chat_20b | configs/models/hf_internlm/hf_internlm2_chat_20b.py |
| hf_internlm2_chat_7b | configs/models/hf_internlm/hf_internlm2_chat_7b.py |
| hf_internlm_20b | configs/models/hf_internlm/hf_internlm_20b.py |
| hf_internlm_7b | configs/models/hf_internlm/hf_internlm_7b.py |
| hf_internlm_chat_20b | configs/models/hf_internlm/hf_internlm_chat_20b.py |
| hf_internlm_chat_7b | configs/models/hf_internlm/hf_internlm_chat_7b.py |
| hf_internlm_chat_7b_8k | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py |
| hf_internlm_chat_7b_v1_1 | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py |
| internlm_7b | configs/models/internlm/internlm_7b.py |
| ms_internlm_chat_7b_8k | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py |
+--------------------------+--------------------------------------------------------+
+----------------------------+------------------------------------------------------+
| Dataset | Config Path |
|----------------------------+------------------------------------------------------|
| ceval_clean_ppl | configs/datasets/ceval/ceval_clean_ppl.py |
| ceval_gen | configs/datasets/ceval/ceval_gen.py |
| ceval_gen_2daf24 | configs/datasets/ceval/ceval_gen_2daf24.py |
| ceval_gen_5f30c7 | configs/datasets/ceval/ceval_gen_5f30c7.py |
| ceval_ppl | configs/datasets/ceval/ceval_ppl.py |
| ceval_ppl_578f8d | configs/datasets/ceval/ceval_ppl_578f8d.py |
| ceval_ppl_93e5ce | configs/datasets/ceval/ceval_ppl_93e5ce.py |
| ceval_zero_shot_gen_bd40ef | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py |
+----------------------------+------------------------------------------------------+
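The two tables map short config names to files under configs/models and configs/datasets. A listing of this kind can be approximated with a plain directory search. The sketch below is not the OpenCompass tool that produced the tables; `find_configs` is a hypothetical helper that assumes a name is simply the file name without its `.py` extension.

```python
from pathlib import Path
from typing import List, Tuple


def find_configs(root: str, kind: str, patterns: List[str]) -> List[Tuple[str, str]]:
    """Return (name, path) pairs for config files under root/kind.

    A file matches when its name (without .py) contains any of the patterns.
    Hypothetical helper, assuming the configs/<kind>/<group>/<name>.py layout
    visible in the tables above.
    """
    base = Path(root) / kind
    if not base.is_dir():
        return []
    hits = []
    for path in base.rglob("*.py"):
        if path.stem == "__init__":
            continue
        if any(pattern in path.stem for pattern in patterns):
            hits.append((path.stem, path.as_posix()))
    return sorted(hits)


# Example call, run from the opencompass checkout:
# for name, path in find_configs("configs", "models", ["internlm", "ceval"]):
#     print(name, path)
```

The names in the first column are the values passed to options such as `--datasets` in the launch command that follows.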
Launching the evaluation
!python run.py --datasets ceval_gen --hf-path /share/temp/model_repos/internlm-chat-7b/ \
01/21 12:54:59 - OpenCompass - INFO - Loading ceval_gen: configs/datasets/ceval/ceval_gen.py
01/21 12:54:59 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
01/21 12:54:59 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
01/21 12:54:59 - OpenCompass - DEBUG - Get class `SizePartitioner` from "partitioner" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `SizePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.size
01/21 12:54:59 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
01/21 12:54:59 - OpenCompass - DEBUG - Additional config: {}
01/21 12:54:59 - OpenCompass - DEBUG - Modules of opencompass's load_dataset registry have been automatically imported from opencompass.datasets
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:54:59 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:54:59 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
01/21 12:55:00 - OpenCompass - INFO - Partitioned into 1 tasks.
01/21 12:55:00 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine,opencompass.models.huggin
gface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_int
ernlm-chat-7b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
01/21 12:55:00 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
01/21 12:55:00 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
01/21 12:55:00 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
01/21 12:55:00 - OpenCompass - DEBUG - Get class `OpenICLInferTask` from "task" registry in "opencompass"
01/21 12:55:00 - OpenCompass - DEBUG - An `OpenICLInferTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
01/21 12:55:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine,opencompass.models.huggingfac
e.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internl
m-chat-7b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:12<00:00, 1.52s/it]
01/21 12:55:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]
100%|██████████████████████████████████████| 55/55 [00:00<00:00, 1548233.02it/s]
[2024-01-21 12:55:39,531] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 14/14 [00:38<00:00, 2.75s/it]
01/21 12:56:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1478567.60it/s]
[2024-01-21 12:56:18,207] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:38<00:00, 2.93s/it]
01/21 12:56:56 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1500152.53it/s]
[2024-01-21 12:56:56,496] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:34<00:00, 2.69s/it]
01/21 12:57:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]
100%|██████████████████████████████████████| 49/49 [00:00<00:00, 1533738.03it/s]
[2024-01-21 12:57:31,552] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 13/13 [00:22<00:00, 1.71s/it]
01/21 12:57:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]
100%|██████████████████████████████████████| 47/47 [00:00<00:00, 1552222.74it/s]
[2024-01-21 12:57:53,942] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 12/12 [00:42<00:00, 3.50s/it]
01/21 12:58:35 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]
100%|██████████████████████████████████████| 46/46 [00:00<00:00, 1162277.01it/s]
[2024-01-21 12:58:36,114] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 12/12 [00:25<00:00, 2.17s/it]
01/21 12:59:02 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]
100%|██████████████████████████████████████| 44/44 [00:00<00:00, 1476395.01it/s]
[2024-01-21 12:59:02,249] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 11/11 [00:20<00:00, 1.89s/it]
01/21 12:59:23 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]
100%|██████████████████████████████████████| 37/37 [00:00<00:00, 1241513.98it/s]
[2024-01-21 12:59:23,203] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 10/10 [00:24<00:00, 2.46s/it]
01/21 12:59:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]
100%|██████████████████████████████████████| 37/37 [00:00<00:00, 1241513.98it/s]
[2024-01-21 12:59:47,894] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████| 10/10 [00:24<00:00, 2.45s/it]
01/21 13:00:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]
100%|██████████████████████████████████████| 33/33 [00:00<00:00, 1134524.85it/s]
[2024-01-21 13:00:12,543] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 9/9 [00:18<00:00, 2.08s/it]
01/21 13:00:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]
100%|██████████████████████████████████████| 33/33 [00:00<00:00, 1163126.32it/s]
[2024-01-21 13:00:31,424] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 9/9 [00:13<00:00, 1.51s/it]
01/21 13:00:45 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]
100%|██████████████████████████████████████| 31/31 [00:00<00:00, 1150649.77it/s]
[2024-01-21 13:00:45,156] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:19<00:00, 2.47s/it]
01/21 13:01:04 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]
100%|██████████████████████████████████████| 31/31 [00:00<00:00, 1015808.00it/s]
[2024-01-21 13:01:05,057] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:17<00:00, 2.19s/it]
01/21 13:01:22 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]
100%|██████████████████████████████████████| 29/29 [00:00<00:00, 1030803.53it/s]
[2024-01-21 13:01:22,695] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:11<00:00, 1.40s/it]
01/21 13:01:33 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]
100%|██████████████████████████████████████| 29/29 [00:00<00:00, 1039613.81it/s]
[2024-01-21 13:01:34,053] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 8/8 [00:12<00:00, 1.62s/it]
01/21 13:01:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 883011.37it/s]
[2024-01-21 13:01:47,116] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00, 2.38s/it]
01/21 13:02:01 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 915120.87it/s]
[2024-01-21 13:02:01,487] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00, 2.36s/it]
01/21 13:02:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 713924.09it/s]
[2024-01-21 13:02:15,714] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:11<00:00, 1.90s/it]
01/21 13:02:27 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]
100%|███████████████████████████████████████| 24/24 [00:00<00:00, 689474.63it/s]
[2024-01-21 13:02:27,195] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:17<00:00, 2.85s/it]
01/21 13:02:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 869090.02it/s]
[2024-01-21 13:02:44,418] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:10<00:00, 1.80s/it]
01/21 13:02:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 846219.23it/s]
[2024-01-21 13:02:55,314] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:12<00:00, 2.07s/it]
01/21 13:03:07 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 893231.41it/s]
[2024-01-21 13:03:07,826] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:10<00:00, 1.74s/it]
01/21 13:03:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]
100%|███████████████████████████████████████| 23/23 [00:00<00:00, 853707.89it/s]
[2024-01-21 13:03:18,394] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:22<00:00, 3.81s/it]
01/21 13:03:41 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 862380.26it/s]
[2024-01-21 13:03:41,414] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:29<00:00, 4.84s/it]
01/21 13:04:10 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 862380.26it/s]
[2024-01-21 13:04:10,543] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:09<00:00, 1.55s/it]
01/21 13:04:19 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 878806.55it/s]
[2024-01-21 13:04:19,944] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:08<00:00, 1.47s/it]
01/21 13:04:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]
100%|███████████████████████████████████████| 22/22 [00:00<00:00, 838860.80it/s]
[2024-01-21 13:04:28,856] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:12<00:00, 2.01s/it]
01/21 13:04:40 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 547083.13it/s]
[2024-01-21 13:04:41,027] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:13<00:00, 2.22s/it]
01/21 13:04:54 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 765916.38it/s]
[2024-01-21 13:04:54,470] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:09<00:00, 1.53s/it]
01/21 13:05:03 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]
100%|███████████████████████████████████████| 21/21 [00:00<00:00, 830947.02it/s]
[2024-01-21 13:05:03,749] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 6/6 [00:14<00:00, 2.41s/it]
01/21 13:05:18 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]
100%|███████████████████████████████████████| 20/20 [00:00<00:00, 791378.11it/s]
[2024-01-21 13:05:18,296] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:12<00:00, 2.46s/it]
01/21 13:05:30 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]
100%|███████████████████████████████████████| 20/20 [00:00<00:00, 798915.05it/s]
[2024-01-21 13:05:30,687] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:11<00:00, 2.25s/it]
01/21 13:05:41 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 731117.21it/s]
[2024-01-21 13:05:42,035] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00, 1.73s/it]
01/21 13:05:50 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 724470.69it/s]
[2024-01-21 13:05:50,775] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:07<00:00, 1.44s/it]
01/21 13:05:57 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:05:58,040] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:14<00:00, 2.90s/it]
01/21 13:06:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 766267.08it/s]
[2024-01-21 13:06:12,653] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:22<00:00, 4.41s/it]
01/21 13:06:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 717943.93it/s]
[2024-01-21 13:06:34,792] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00, 2.02s/it]
01/21 13:06:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 751809.21it/s]
[2024-01-21 13:06:44,964] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00, 2.06s/it]
01/21 13:06:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 804967.43it/s]
[2024-01-21 13:06:55,327] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00, 2.09s/it]
01/21 13:07:05 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 758969.30it/s]
[2024-01-21 13:07:05,839] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:15<00:00, 3.07s/it]
01/21 13:07:21 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 581691.80it/s]
[2024-01-21 13:07:21,273] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:10<00:00, 2.14s/it]
01/21 13:07:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 744782.95it/s]
[2024-01-21 13:07:32,035] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00, 1.65s/it]
01/21 13:07:40 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-01-21 13:07:40,420] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:18<00:00, 3.65s/it]
01/21 13:07:58 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 781291.92it/s]
[2024-01-21 13:07:58,750] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:08<00:00, 1.64s/it]
01/21 13:08:06 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:08:07,017] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:06<00:00, 1.21s/it]
01/21 13:08:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 766267.08it/s]
[2024-01-21 13:08:13,184] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:24<00:00, 4.92s/it]
01/21 13:08:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 758969.30it/s]
[2024-01-21 13:08:37,875] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:07<00:00, 1.42s/it]
01/21 13:08:45 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]
100%|███████████████████████████████████████| 19/19 [00:00<00:00, 737886.81it/s]
[2024-01-21 13:08:45,046] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:06<00:00, 1.38s/it]
01/21 13:08:51 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 725937.23it/s]
[2024-01-21 13:08:52,026] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:23<00:00, 4.77s/it]
01/21 13:09:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 740171.29it/s]
[2024-01-21 13:09:15,962] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 5/5 [00:16<00:00, 3.30s/it]
01/21 13:09:32 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]
100%|███████████████████████████████████████| 16/16 [00:00<00:00, 122461.43it/s]
[2024-01-21 13:09:32,566] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 4/4 [00:07<00:00, 1.79s/it]
01/21 13:09:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
100%|███████████████████████████████████████| 12/12 [00:00<00:00, 518882.97it/s]
[2024-01-21 13:09:39,795] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████| 3/3 [00:05<00:00, 1.73s/it]
01/21 13:09:44 - OpenCompass - INFO - time elapsed: 873.46s
01/21 13:09:49 - OpenCompass - DEBUG - Get class `NaivePartitioner` from "partitioner" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `NaivePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.naive
01/21 13:09:49 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
01/21 13:09:49 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
01/21 13:09:49 - OpenCompass - DEBUG - Additional config: {'eval': {'runner': {'task': {}}}}
01/21 13:09:49 - OpenCompass - INFO - Partitioned into 52 tasks.
01/21 13:09:49 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]
01/21 13:09:49 - OpenCompass - DEBUG - Task 1: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]
01/21 13:09:49 - OpenCompass - DEBUG - Task 2: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]
01/21 13:09:49 - OpenCompass - DEBUG - Task 3: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]
01/21 13:09:49 - OpenCompass - DEBUG - Task 4: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 5: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 6: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 7: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 8: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 9: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 10: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 11: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 12: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 13: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 14: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]
01/21 13:09:49 - OpenCompass - DEBUG - Task 15: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 16: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]
01/21 13:09:49 - OpenCompass - DEBUG - Task 17: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 18: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]
01/21 13:09:49 - OpenCompass - DEBUG - Task 19: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 20: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 21: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]
01/21 13:09:49 - OpenCompass - DEBUG - Task 22: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]
01/21 13:09:49 - OpenCompass - DEBUG - Task 23: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]
01/21 13:09:49 - OpenCompass - DEBUG - Task 24: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]
01/21 13:09:49 - OpenCompass - DEBUG - Task 25: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]
01/21 13:09:49 - OpenCompass - DEBUG - Task 26: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 27: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]
01/21 13:09:49 - OpenCompass - DEBUG - Task 28: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]
01/21 13:09:49 - OpenCompass - DEBUG - Task 29: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]
01/21 13:09:49 - OpenCompass - DEBUG - Task 30: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 31: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]
01/21 13:09:49 - OpenCompass - DEBUG - Task 32: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]
01/21 13:09:49 - OpenCompass - DEBUG - Task 33: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]
01/21 13:09:49 - OpenCompass - DEBUG - Task 34: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]
01/21 13:09:49 - OpenCompass - DEBUG - Task 35: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]
01/21 13:09:49 - OpenCompass - DEBUG - Task 36: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]
01/21 13:09:49 - OpenCompass - DEBUG - Task 37: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]
01/21 13:09:49 - OpenCompass - DEBUG - Task 38: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]
01/21 13:09:49 - OpenCompass - DEBUG - Task 39: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 40: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]
01/21 13:09:49 - OpenCompass - DEBUG - Task 41: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 42: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]
01/21 13:09:49 - OpenCompass - DEBUG - Task 43: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]
01/21 13:09:49 - OpenCompass - DEBUG - Task 44: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 45: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]
01/21 13:09:49 - OpenCompass - DEBUG - Task 46: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]
01/21 13:09:49 - OpenCompass - DEBUG - Task 47: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 48: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 49: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]
01/21 13:09:49 - OpenCompass - DEBUG - Task 50: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]
01/21 13:09:49 - OpenCompass - DEBUG - Task 51: [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]
01/21 13:09:49 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
01/21 13:09:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:09:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:11:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_network]: {'accuracy': 31.57894736842105}
01/21 13:11:21 - OpenCompass - INFO - time elapsed: 60.60s
01/21 13:11:22 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:11:22 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:11:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-operating_system]: {'accuracy': 36.84210526315789}
01/21 13:11:49 - OpenCompass - INFO - time elapsed: 19.91s
01/21 13:11:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:11:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:12:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-computer_architecture]: {'accuracy': 28.57142857142857}
01/21 13:12:17 - OpenCompass - INFO - time elapsed: 20.15s
01/21 13:12:18 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:12:18 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_programming]: {'accuracy': 32.432432432432435}
01/21 13:13:20 - OpenCompass - INFO - time elapsed: 56.35s
01/21 13:13:21 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:21 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_physics]: {'accuracy': 26.31578947368421}
01/21 13:13:33 - OpenCompass - INFO - time elapsed: 5.75s
01/21 13:13:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:13:53 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_chemistry]: {'accuracy': 16.666666666666664}
01/21 13:13:54 - OpenCompass - INFO - time elapsed: 14.26s
01/21 13:13:55 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:13:55 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:14:26 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-advanced_mathematics]: {'accuracy': 21.052631578947366}
01/21 13:14:26 - OpenCompass - INFO - time elapsed: 24.90s
01/21 13:14:27 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:14:27 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:14:47 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-probability_and_statistics]: {'accuracy': 38.88888888888889}
01/21 13:14:49 - OpenCompass - INFO - time elapsed: 15.55s
01/21 13:14:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:14:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:15:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-discrete_mathematics]: {'accuracy': 18.75}
01/21 13:16:16 - OpenCompass - INFO - time elapsed: 61.13s
01/21 13:16:16 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:16:16 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:16:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-electrical_engineer]: {'accuracy': 35.13513513513514}
01/21 13:16:54 - OpenCompass - INFO - time elapsed: 17.86s
01/21 13:16:55 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:16:55 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:17:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-metrology_engineer]: {'accuracy': 50.0}
01/21 13:17:48 - OpenCompass - INFO - time elapsed: 30.54s
01/21 13:17:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:17:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:18:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_mathematics]: {'accuracy': 22.22222222222222}
01/21 13:18:25 - OpenCompass - INFO - time elapsed: 24.63s
01/21 13:18:26 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:18:26 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:19:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_physics]: {'accuracy': 31.57894736842105}
01/21 13:19:07 - OpenCompass - INFO - time elapsed: 14.48s
01/21 13:19:08 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:19:08 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:19:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chemistry]: {'accuracy': 15.789473684210526}
01/21 13:20:12 - OpenCompass - INFO - time elapsed: 54.42s
01/21 13:20:13 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:13 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:20:32 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_biology]: {'accuracy': 36.84210526315789}
01/21 13:20:32 - OpenCompass - INFO - time elapsed: 10.37s
01/21 13:20:33 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:33 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:20:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_mathematics]: {'accuracy': 26.31578947368421}
01/21 13:20:50 - OpenCompass - INFO - time elapsed: 7.74s
01/21 13:20:50 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:20:50 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_biology]: {'accuracy': 61.904761904761905}
01/21 13:21:05 - OpenCompass - INFO - time elapsed: 7.39s
01/21 13:21:06 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:06 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_physics]: {'accuracy': 63.1578947368421}
01/21 13:21:19 - OpenCompass - INFO - time elapsed: 6.52s
01/21 13:21:20 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:20 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_chemistry]: {'accuracy': 60.0}
01/21 13:21:34 - OpenCompass - INFO - time elapsed: 7.03s
01/21 13:21:35 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:35 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:21:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-veterinary_medicine]: {'accuracy': 47.82608695652174}
01/21 13:21:48 - OpenCompass - INFO - time elapsed: 6.74s
01/21 13:21:49 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:21:49 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:01 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-college_economics]: {'accuracy': 41.81818181818181}
01/21 13:22:01 - OpenCompass - INFO - time elapsed: 5.99s
01/21 13:22:01 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:01 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-business_administration]: {'accuracy': 33.33333333333333}
01/21 13:22:14 - OpenCompass - INFO - time elapsed: 5.85s
01/21 13:22:15 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:15 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-marxism]: {'accuracy': 68.42105263157895}
01/21 13:22:27 - OpenCompass - INFO - time elapsed: 6.38s
01/21 13:22:28 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:28 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:39 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-mao_zedong_thought]: {'accuracy': 70.83333333333334}
01/21 13:22:39 - OpenCompass - INFO - time elapsed: 5.05s
01/21 13:22:40 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:40 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:22:51 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-education_science]: {'accuracy': 58.620689655172406}
01/21 13:22:51 - OpenCompass - INFO - time elapsed: 5.12s
01/21 13:22:52 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:22:52 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:03 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-teacher_qualification]: {'accuracy': 70.45454545454545}
01/21 13:23:03 - OpenCompass - INFO - time elapsed: 5.09s
01/21 13:23:03 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:03 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:15 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_politics]: {'accuracy': 26.31578947368421}
01/21 13:23:15 - OpenCompass - INFO - time elapsed: 5.58s
01/21 13:23:15 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:15 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_geography]: {'accuracy': 47.368421052631575}
01/21 13:23:27 - OpenCompass - INFO - time elapsed: 5.31s
01/21 13:23:28 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:28 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_politics]: {'accuracy': 52.38095238095239}
01/21 13:23:40 - OpenCompass - INFO - time elapsed: 6.11s
01/21 13:23:41 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:41 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:23:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_geography]: {'accuracy': 58.333333333333336}
01/21 13:23:52 - OpenCompass - INFO - time elapsed: 5.68s
01/21 13:23:53 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:23:53 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-modern_chinese_history]: {'accuracy': 73.91304347826086}
01/21 13:24:05 - OpenCompass - INFO - time elapsed: 6.29s
01/21 13:24:05 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:05 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-ideological_and_moral_cultivation]: {'accuracy': 63.1578947368421}
01/21 13:24:16 - OpenCompass - INFO - time elapsed: 5.21s
01/21 13:24:17 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:17 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:29 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-logic]: {'accuracy': 31.818181818181817}
01/21 13:24:29 - OpenCompass - INFO - time elapsed: 5.91s
01/21 13:24:29 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:29 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:41 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-law]: {'accuracy': 25.0}
01/21 13:24:41 - OpenCompass - INFO - time elapsed: 6.18s
01/21 13:24:42 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:42 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:24:55 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-chinese_language_and_literature]: {'accuracy': 30.434782608695656}
01/21 13:24:55 - OpenCompass - INFO - time elapsed: 6.18s
01/21 13:24:56 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:24:56 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-art_studies]: {'accuracy': 60.60606060606061}
01/21 13:25:10 - OpenCompass - INFO - time elapsed: 6.80s
01/21 13:25:10 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:10 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-professional_tour_guide]: {'accuracy': 62.06896551724138}
01/21 13:25:21 - OpenCompass - INFO - time elapsed: 4.76s
01/21 13:25:22 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:22 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-legal_professional]: {'accuracy': 39.130434782608695}
01/21 13:25:34 - OpenCompass - INFO - time elapsed: 6.15s
01/21 13:25:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_chinese]: {'accuracy': 63.1578947368421}
01/21 13:25:46 - OpenCompass - INFO - time elapsed: 5.56s
01/21 13:25:46 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:46 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:25:56 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-high_school_history]: {'accuracy': 70.0}
01/21 13:25:56 - OpenCompass - INFO - time elapsed: 4.41s
01/21 13:25:57 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:25:57 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-middle_school_history]: {'accuracy': 59.09090909090909}
01/21 13:26:08 - OpenCompass - INFO - time elapsed: 5.36s
01/21 13:26:09 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:09 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:20 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-civil_servant]: {'accuracy': 53.191489361702125}
01/21 13:26:20 - OpenCompass - INFO - time elapsed: 5.42s
01/21 13:26:21 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:21 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-sports_science]: {'accuracy': 52.63157894736842}
01/21 13:26:33 - OpenCompass - INFO - time elapsed: 6.01s
01/21 13:26:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:45 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-plant_protection]: {'accuracy': 59.09090909090909}
01/21 13:26:45 - OpenCompass - INFO - time elapsed: 5.62s
01/21 13:26:46 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:46 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:26:57 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-basic_medicine]: {'accuracy': 47.368421052631575}
01/21 13:26:57 - OpenCompass - INFO - time elapsed: 5.50s
01/21 13:26:58 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:26:58 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-clinical_medicine]: {'accuracy': 40.909090909090914}
01/21 13:27:10 - OpenCompass - INFO - time elapsed: 5.61s
01/21 13:27:11 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:11 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:22 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-urban_and_rural_planner]: {'accuracy': 45.65217391304348}
01/21 13:27:22 - OpenCompass - INFO - time elapsed: 4.93s
01/21 13:27:23 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:23 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-accountant]: {'accuracy': 26.53061224489796}
01/21 13:27:34 - OpenCompass - INFO - time elapsed: 5.41s
01/21 13:27:35 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:35 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-fire_engineer]: {'accuracy': 22.58064516129032}
01/21 13:27:46 - OpenCompass - INFO - time elapsed: 5.98s
01/21 13:27:47 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:27:47 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:27:59 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-environmental_impact_assessment_engineer]: {'accuracy': 64.51612903225806}
01/21 13:27:59 - OpenCompass - INFO - time elapsed: 6.09s
01/21 13:28:00 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:28:00 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:28:13 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-tax_accountant]: {'accuracy': 34.69387755102041}
01/21 13:28:13 - OpenCompass - INFO - time elapsed: 6.41s
01/21 13:28:14 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
01/21 13:28:14 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
01/21 13:28:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b/ceval-physician]: {'accuracy': 40.816326530612244}
01/21 13:28:27 - OpenCompass - INFO - time elapsed: 5.91s
01/21 13:28:27 - OpenCompass - DEBUG - An `DefaultSummarizer` instance is built from registry, and its implementation can be found in opencompass.summarizers.default
dataset version metric mode opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b
---------------------------------------------- --------- ------------- ------ -------------------------------------------------------------------------
ceval-computer_network db9ce2 accuracy gen 31.58
ceval-operating_system 1c2571 accuracy gen 36.84
ceval-computer_architecture a74dad accuracy gen 28.57
ceval-college_programming 4ca32a accuracy gen 32.43
ceval-college_physics 963fa8 accuracy gen 26.32
ceval-college_chemistry e78857 accuracy gen 16.67
ceval-advanced_mathematics ce03e2 accuracy gen 21.05
ceval-probability_and_statistics 65e812 accuracy gen 38.89
ceval-discrete_mathematics e894ae accuracy gen 18.75
ceval-electrical_engineer ae42b9 accuracy gen 35.14
ceval-metrology_engineer ee34ea accuracy gen 50
ceval-high_school_mathematics 1dc5bf accuracy gen 22.22
ceval-high_school_physics adf25f accuracy gen 31.58
ceval-high_school_chemistry 2ed27f accuracy gen 15.79
ceval-high_school_biology 8e2b9a accuracy gen 36.84
ceval-middle_school_mathematics bee8d5 accuracy gen 26.32
ceval-middle_school_biology 86817c accuracy gen 61.9
ceval-middle_school_physics 8accf6 accuracy gen 63.16
ceval-middle_school_chemistry 167a15 accuracy gen 60
ceval-veterinary_medicine b4e08d accuracy gen 47.83
ceval-college_economics f3f4e6 accuracy gen 41.82
ceval-business_administration c1614e accuracy gen 33.33
ceval-marxism cf874c accuracy gen 68.42
ceval-mao_zedong_thought 51c7a4 accuracy gen 70.83
ceval-education_science 591fee accuracy gen 58.62
ceval-teacher_qualification 4e4ced accuracy gen 70.45
ceval-high_school_politics 5c0de2 accuracy gen 26.32
ceval-high_school_geography 865461 accuracy gen 47.37
ceval-middle_school_politics 5be3e7 accuracy gen 52.38
ceval-middle_school_geography 8a63be accuracy gen 58.33
ceval-modern_chinese_history fc01af accuracy gen 73.91
ceval-ideological_and_moral_cultivation a2aa4a accuracy gen 63.16
ceval-logic f5b022 accuracy gen 31.82
ceval-law a110a1 accuracy gen 25
ceval-chinese_language_and_literature 0f8b68 accuracy gen 30.43
ceval-art_studies 2a1300 accuracy gen 60.61
ceval-professional_tour_guide 4e673e accuracy gen 62.07
ceval-legal_professional ce8787 accuracy gen 39.13
ceval-high_school_chinese 315705 accuracy gen 63.16
ceval-high_school_history 7eb30a accuracy gen 70
ceval-middle_school_history 48ab4a accuracy gen 59.09
ceval-civil_servant 87d061 accuracy gen 53.19
ceval-sports_science 70f27b accuracy gen 52.63
ceval-plant_protection 8941f9 accuracy gen 59.09
ceval-basic_medicine c409d6 accuracy gen 47.37
ceval-clinical_medicine 49e82d accuracy gen 40.91
ceval-urban_and_rural_planner 95b885 accuracy gen 45.65
ceval-accountant 002837 accuracy gen 26.53
ceval-fire_engineer bc23f5 accuracy gen 22.58
ceval-environmental_impact_assessment_engineer c64e2d accuracy gen 64.52
ceval-tax_accountant 3a5e3c accuracy gen 34.69
ceval-physician 6e277d accuracy gen 40.82
ceval-stem - naive_average gen 35.09
ceval-social-science - naive_average gen 52.79
ceval-humanities - naive_average gen 52.58
ceval-other - naive_average gen 44.36
ceval-hard - naive_average gen 23.91
ceval - naive_average gen 44.16
01/21 13:28:27 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.txt
01/21 13:28:27 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.csv
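The `naive_average` rows in the summary above appear to be plain unweighted means of the per-subject accuracies. The short check below is a sketch under one assumption: that `ceval-hard` consists of the eight subjects listed here (the grouping is not shown in the log, so treat it as hypothetical). The accuracy values are copied from the log lines above, and `naive_average` is an illustrative helper, not an OpenCompass function.

```python
# Unrounded per-subject accuracies copied from the evaluation log above.
# Assumption: these eight subjects make up the ceval-hard group.
hard_scores = {
    "ceval-advanced_mathematics": 21.052631578947366,
    "ceval-discrete_mathematics": 18.75,
    "ceval-probability_and_statistics": 38.88888888888889,
    "ceval-college_chemistry": 16.666666666666664,
    "ceval-college_physics": 26.31578947368421,
    "ceval-high_school_mathematics": 22.22222222222222,
    "ceval-high_school_chemistry": 15.789473684210526,
    "ceval-high_school_physics": 31.57894736842105,
}


def naive_average(scores):
    """Unweighted mean: each subject counts equally, whatever its question count."""
    return sum(scores.values()) / len(scores)


hard_avg = round(naive_average(hard_scores), 2)
print(hard_avg)  # 23.91, matching the ceval-hard row in the summary
```

Because the mean is unweighted, a subject with 16 questions influences the group score as much as one with 50, which is worth keeping in mind when comparing group scores across models.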
# %load /root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.txt
import pandas as pd
data = pd.read_csv("/root/opencompass/outputs/default/20240121_125459/summary/summary_20240121_125459.csv")
 | dataset | version | metric | mode | opencompass.models.huggingface.HuggingFace_model_repos_internlm-chat-7b |
---|---|---|---|---|---|
0 | ceval-computer_network | db9ce2 | accuracy | gen | 31.58 |
1 | ceval-operating_system | 1c2571 | accuracy | gen | 36.84 |
2 | ceval-computer_architecture | a74dad | accuracy | gen | 28.57 |
3 | ceval-college_programming | 4ca32a | accuracy | gen | 32.43 |
4 | ceval-college_physics | 963fa8 | accuracy | gen | 26.32 |
5 | ceval-college_chemistry | e78857 | accuracy | gen | 16.67 |
6 | ceval-advanced_mathematics | ce03e2 | accuracy | gen | 21.05 |
7 | ceval-probability_and_statistics | 65e812 | accuracy | gen | 38.89 |
8 | ceval-discrete_mathematics | e894ae | accuracy | gen | 18.75 |
9 | ceval-electrical_engineer | ae42b9 | accuracy | gen | 35.14 |
10 | ceval-metrology_engineer | ee34ea | accuracy | gen | 50.00 |
11 | ceval-high_school_mathematics | 1dc5bf | accuracy | gen | 22.22 |
12 | ceval-high_school_physics | adf25f | accuracy | gen | 31.58 |
13 | ceval-high_school_chemistry | 2ed27f | accuracy | gen | 15.79 |
14 | ceval-high_school_biology | 8e2b9a | accuracy | gen | 36.84 |
15 | ceval-middle_school_mathematics | bee8d5 | accuracy | gen | 26.32 |
16 | ceval-middle_school_biology | 86817c | accuracy | gen | 61.90 |
17 | ceval-middle_school_physics | 8accf6 | accuracy | gen | 63.16 |
18 | ceval-middle_school_chemistry | 167a15 | accuracy | gen | 60.00 |
19 | ceval-veterinary_medicine | b4e08d | accuracy | gen | 47.83 |
20 | ceval-college_economics | f3f4e6 | accuracy | gen | 41.82 |
21 | ceval-business_administration | c1614e | accuracy | gen | 33.33 |
22 | ceval-marxism | cf874c | accuracy | gen | 68.42 |
23 | ceval-mao_zedong_thought | 51c7a4 | accuracy | gen | 70.83 |
24 | ceval-education_science | 591fee | accuracy | gen | 58.62 |
25 | ceval-teacher_qualification | 4e4ced | accuracy | gen | 70.45 |
26 | ceval-high_school_politics | 5c0de2 | accuracy | gen | 26.32 |
27 | ceval-high_school_geography | 865461 | accuracy | gen | 47.37 |
28 | ceval-middle_school_politics | 5be3e7 | accuracy | gen | 52.38 |
29 | ceval-middle_school_geography | 8a63be | accuracy | gen | 58.33 |
30 | ceval-modern_chinese_history | fc01af | accuracy | gen | 73.91 |
31 | ceval-ideological_and_moral_cultivation | a2aa4a | accuracy | gen | 63.16 |
32 | ceval-logic | f5b022 | accuracy | gen | 31.82 |
33 | ceval-law | a110a1 | accuracy | gen | 25.00 |
34 | ceval-chinese_language_and_literature | 0f8b68 | accuracy | gen | 30.43 |
35 | ceval-art_studies | 2a1300 | accuracy | gen | 60.61 |
36 | ceval-professional_tour_guide | 4e673e | accuracy | gen | 62.07 |
37 | ceval-legal_professional | ce8787 | accuracy | gen | 39.13 |
38 | ceval-high_school_chinese | 315705 | accuracy | gen | 63.16 |
39 | ceval-high_school_history | 7eb30a | accuracy | gen | 70.00 |
40 | ceval-middle_school_history | 48ab4a | accuracy | gen | 59.09 |
41 | ceval-civil_servant | 87d061 | accuracy | gen | 53.19 |
42 | ceval-sports_science | 70f27b | accuracy | gen | 52.63 |
43 | ceval-plant_protection | 8941f9 | accuracy | gen | 59.09 |
44 | ceval-basic_medicine | c409d6 | accuracy | gen | 47.37 |
45 | ceval-clinical_medicine | 49e82d | accuracy | gen | 40.91 |
46 | ceval-urban_and_rural_planner | 95b885 | accuracy | gen | 45.65 |
47 | ceval-accountant | 002837 | accuracy | gen | 26.53 |
48 | ceval-fire_engineer | bc23f5 | accuracy | gen | 22.58 |
49 | ceval-environmental_impact_assessment_engineer | c64e2d | accuracy | gen | 64.52 |
50 | ceval-tax_accountant | 3a5e3c | accuracy | gen | 34.69 |
51 | ceval-physician | 6e277d | accuracy | gen | 40.82 |
52 | ceval-stem | - | naive_average | gen | 35.09 |
53 | ceval-social-science | - | naive_average | gen | 52.79 |
54 | ceval-humanities | - | naive_average | gen | 52.58 |
55 | ceval-other | - | naive_average | gen | 44.36 |
56 | ceval-hard | - | naive_average | gen | 23.91 |
57 | ceval | - | naive_average | gen | 44.16 |
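Once the summary is in a DataFrame, per-subject rows can be separated from the aggregated rows and ranked. The following is a minimal sketch rather than part of the original notebook: it rebuilds a handful of rows from the table above in memory so that it runs on its own, and it shortens the model column name to `internlm-chat-7b` for readability (an assumption; the real CSV uses the full name shown above).

```python
import io

import pandas as pd

# A few rows in the same layout as the summary CSV; values copied from the table above.
# Assumption: the score column is the last column, as in the real file.
csv_text = """dataset,version,metric,mode,internlm-chat-7b
ceval-college_chemistry,e78857,accuracy,gen,16.67
ceval-high_school_chemistry,2ed27f,accuracy,gen,15.79
ceval-modern_chinese_history,fc01af,accuracy,gen,73.91
ceval-mao_zedong_thought,51c7a4,accuracy,gen,70.83
ceval-stem,-,naive_average,gen,35.09
ceval,-,naive_average,gen,44.16
"""
data = pd.read_csv(io.StringIO(csv_text))
score_col = data.columns[-1]  # the last column holds the model's scores

# Per-subject rows report "accuracy"; group rows report "naive_average".
subjects = data[data["metric"] == "accuracy"]
aggregates = data[data["metric"] == "naive_average"]

# Rank subjects from weakest to strongest.
ranked = subjects.sort_values(score_col)
weakest = ranked.iloc[0]["dataset"]
strongest = ranked.iloc[-1]["dataset"]
print(weakest, strongest)
```

With the real file, replacing the `io.StringIO(csv_text)` argument with the CSV path from the log should give the same ranking over all 52 subjects.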