Preface
This is a workflow I only recently started experimenting with. It is still fairly rough, and I expect to refine it over time.
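Scenario
Suppose the benchmark directory holds several sets of instances: benchmark/B1, benchmark/B2, and benchmark/B3.
Several solvers live in bin (it does not have to be bin, as long as the paths are agreed on): bin/A, bin/B, and bin/C.
Now I want to run every solver on benchmark/B1 and collect the results.
With the previous approach, we would have to write n scripts by hand, one per solver, because the solvers differ in their parameters, input file formats, paths, output formats, and a string of other details.
If you don't mind the tedium, you can of course write a separate generation script for each solver, but that is not what this post is about.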
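Configuration files
We can use an experiment configuration file and a solver configuration file to help generate the scripts. Let's start with the solver configuration file.
Solver configuration file
Tip
Because the configuration file depends heavily on the solvers themselves, there is no one-size-fits-all approach. What follows is one simple way to do it.
We describe a configuration in JSON with comments (jsonc). The simplest way to start is to define a JSON Schema, for example: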
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "A": {
      "type": "object",
      "properties": {
        "Complete": { "type": "boolean" },
        "program": { "type": "string" },
        "args": { "type": "array", "items": { "type": "string" } },
        "input": { "type": "string" },
        "format": {
          "type": "object",
          "properties": {
            "sat": { "type": "string" },
            "unsat": { "type": "string" },
            "has_timer": { "type": "string" }
          },
          "required": ["sat", "unsat", "has_timer"]
        }
      },
      "required": ["Complete", "program", "args", "input", "format"]
    },
    "B": {
      "type": "object",
      "properties": {
        "Complete": { "type": "boolean" },
        "program": { "type": "string" },
        "args": { "type": "array", "items": { "type": "string" } },
        "input": { "type": "string" },
        "format": {
          "type": "object",
          "properties": {
            "sat": { "type": "string" },
            "unsolved": { "type": "string" },
            "has_timer": { "type": "string" }
          },
          "required": ["sat", "unsolved", "has_timer"]
        }
      },
      "required": ["Complete", "program", "args", "input", "format"]
    },
    "C": {
      "type": "object",
      "properties": {
        "Complete": { "type": "boolean" },
        "program": { "type": "string" },
        "args": { "type": "array", "items": { "type": "string" } },
        "input": { "type": "string" },
        "format": {
          "type": "object",
          "properties": {
            "sat": { "type": "string" },
            "unsolved": { "type": "string" },
            "has_timer": { "type": "string" }
          },
          "required": ["sat", "unsolved", "has_timer"]
        }
      },
      "required": ["Complete", "program", "args", "input", "format"]
    }
  },
  "required": ["A", "B", "C"]
}
Then we can hand this JSON Schema to a large language model and have it generate the corresponding jsonc file.
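To make the target concrete, here is a minimal sketch of what an entry for solver A might look like and how it could be sanity-checked with only the standard library. Every value in the sample (the --seed={seed} argument, the sat/unsat strings, the has_timer string) is made up for illustration, and strip_comments and missing_fields are hypothetical helpers, not part of any library. A real project would more likely rely on a proper JSONC parser and a JSON Schema validator.

```python
import json
import re

# Hypothetical JSONC entry for solver "A"; all values are invented for illustration.
SOLVER_JSONC = r'''
{
  // solver A: a complete solver reading CNF input
  "A": {
    "Complete": true,
    "program": "bin/A",
    "args": ["--seed={seed}"],
    "input": "CNF",
    "format": {
      "sat": "s SATISFIABLE",      /* assumed marker for a solved instance */
      "unsat": "s UNSATISFIABLE",
      "has_timer": "c time"
    }
  }
}
'''

# Match a JSON string first so that "//" inside a string (e.g. a URL) survives.
_TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove // and /* */ comments while leaving string literals untouched."""
    return _TOKEN_RE.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text
    )


# Top-level fields that the schema marks as required for each solver.
REQUIRED = ("Complete", "program", "args", "input", "format")


def missing_fields(solver: dict) -> list:
    """Return the required fields that are absent from one solver entry."""
    return [key for key in REQUIRED if key not in solver]


config = json.loads(strip_comments(SOLVER_JSONC))
print(missing_fields(config["A"]))
```

An empty list means the entry carries every required top-level field. The check is far weaker than real schema validation, since it ignores types and the nested format object.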
Info
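Of course, you can also write the jsonc file by hand. Personally I find writing a JSON Schema by hand a bit difficult, so writing the configuration directly may be easier: you only need to decide up front which fields each solver has.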
Attention
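The schema above was generated rather than written by hand. If I were writing it by hand, I would use an array in which each item is an object describing one solver's configuration, rather than the current layout where the solver name is the key and its parameters are the value.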
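As a rough sketch of the two layouts, the snippet below converts the keyed form into the array form and back. The name field is an assumption introduced here for illustration (the original schema has no such field), and the sample solver entries are abbreviated.

```python
# Hypothetical keyed layout: solver name -> settings, abbreviated for the example.
keyed = {
    "A": {"program": "bin/A", "input": "CNF"},
    "B": {"program": "bin/B", "input": "OPB"},
}


def to_array(config: dict) -> list:
    """Turn {"A": {...}} into [{"name": "A", ...}] so all solvers share one item shape."""
    return [{"name": name, **settings} for name, settings in config.items()]


def by_name(solvers: list) -> dict:
    """Rebuild a name -> settings lookup for code that indexes solvers by name."""
    return {s["name"]: {k: v for k, v in s.items() if k != "name"} for s in solvers}


solvers = to_array(keyed)
print(solvers[0]["name"])
```

With the array form, a schema only has to describe one item type, and adding a solver no longer means adding a new property to the schema.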
Experiment configuration
A simple JSON example of an experiment configuration file looks like this:
{
  "MagicsSquare": {
    "solvers": {
      "local_search": ["A", "B", "C"]
    },
    "file": {
      "CNF": "benchmarks/Magicsq/CNF",
      "KNF": "benchmarks/Magicsq/KNF",
      "OPB": "benchmarks/Magicsq/OPB"
    }
  }
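}
This says that I want to run an experiment named MagicsSquare. The benchmark files in each format are recorded under file, and the solvers to run, grouped by category and listed by name, are under solvers. Each name must match an entry in the solver configuration file.
I won't include its schema here; the general idea is enough.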
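Since the solver names have to line up across the two files, a small consistency check can catch mismatches before any script is generated. The sketch below assumes that a solver's input field names one of the keys under file. check_experiments is a hypothetical helper, and the sample data is deliberately broken to show both kinds of problem.

```python
def check_experiments(experiments: dict, solver_config: dict) -> list:
    """Collect human-readable problems instead of failing on the first one."""
    problems = []
    for exp_name, exp in experiments.items():
        for solver_type, names in exp["solvers"].items():
            for name in names:
                if name not in solver_config:
                    problems.append(
                        f"{exp_name}: unknown solver {name!r} ({solver_type})"
                    )
                    continue
                fmt = solver_config[name]["input"]
                if fmt not in exp["file"]:
                    problems.append(
                        f"{exp_name}: no {fmt!r} benchmark folder for {name!r}"
                    )
    return problems


# Sample data: "B" wants a format the experiment does not provide, "C" is undefined.
experiments = {
    "MagicsSquare": {
        "solvers": {"local_search": ["A", "B", "C"]},
        "file": {
            "CNF": "benchmarks/Magicsq/CNF",
            "OPB": "benchmarks/Magicsq/OPB",
        },
    }
}
solver_config = {"A": {"input": "CNF"}, "B": {"input": "KNF"}}

for problem in check_experiments(experiments, solver_config):
    print(problem)
```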
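Python script
With the configuration files in place, we can use Python to generate the command script files.
Since this is the lazy version, this part can also be handed to an AI: write a good prompt and let it produce the script, because AI is very good at this kind of scripting work.
For example, one of my simple scripts is: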
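import os
import re

import jsonc  # third-party JSONC parser; any parser with a json-like load() would do

# Matches {name} placeholders inside a solver argument.
_placeholder_re = re.compile(r"\{(\w+)\}")
SCRIPTS_PATH = "scripts/run"
EXCLUDED_SOLVER = {""}  # names of solvers to skip
NTHREADS = 50  # number of commands xargs runs in parallel


def seed_generator():
    """Return a random seed in the range 0..98."""
    u = int.from_bytes(os.urandom(2), byteorder="little")
    while u >= 99:
        u = u // 10
    return u


# Maps a placeholder name to the function that produces its value.
ARGS = {"seed": seed_generator}


def mkdir(path: str):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)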
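def generate_cmd(
    bash_path: str,
    solver_config: dict,
    filefolder: str,
    sol_folder: str,
    enable_timeout: bool,
):
    """Append one command line per benchmark file in filefolder to bash_path."""
    cmd_list = []
    filelists = os.listdir(filefolder)
    cmd_template = []
    time_prefix = False
    # Assumption: a null has_timer means the solver prints no timing of its own,
    # so the run is wrapped in bash's `time`.
    if solver_config["format"]["has_timer"] is None:
        cmd_template.append("{ time")
        time_prefix = True
    # The brace group must open the command line: `timeout 5000 { time ...` is
    # not valid bash, so timeout goes inside the group.
    if enable_timeout:
        cmd_template.append("timeout 5000")
    cmd_template.append(solver_config["program"])
    for args in solver_config["args"]:
        keys = _placeholder_re.findall(args)
        if not keys:
            cmd_template.append(args)
            continue
        filled: str = args
        for k in keys:
            if k in ARGS:
                filled = filled.replace("{" + k + "}", str(ARGS[k]()))
        cmd_template.append(filled)
    # Placeholders are filled once here, so all files of this solver share the same values.
    cmd = " ".join(cmd_template)
    for file in filelists:
        solution = f"{sol_folder}/{os.path.basename(file)}.sol"
        final_cmd = cmd + f" {os.path.join(filefolder, file)} > {solution}"
        if time_prefix:
            # Close the brace group and append the `time` report to the same file.
            final_cmd += "; } >> " + solution + " 2>&1"
        cmd_list.append(final_cmd)
    # "a+" appends, so remove old scripts before regenerating to avoid duplicates.
    with open(bash_path, "a+") as f:
        for c in cmd_list:
            f.write(f"{c}\n")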
with open("scripts/solver.json", "r") as f:
    SOLVER_CONFIG = jsonc.load(f)
with open("scripts/experiment_config.json", "r") as f:
    EXPERIMENTS = jsonc.load(f)

mkdir(SCRIPTS_PATH)
# solver name -> (per-type runner script, per-solver command list)
solver_bash = {}
for experiment_name, experiment_info in EXPERIMENTS.items():
    experiments_folder = f"solution/{experiment_name}"
    mkdir(experiments_folder)
    for solver_type, solver_lists in experiment_info["solvers"].items():
        details_folder = f"{experiments_folder}/{solver_type}"
        mkdir(details_folder)
        type_bash_script_file = f"{SCRIPTS_PATH}/run_{solver_type}.sh"
        for solver in solver_lists:
            if solver in EXCLUDED_SOLVER:
                continue
            bash_script = f"{SCRIPTS_PATH}/{solver}.sh"
            solver_folder = f"{details_folder}/{solver}"
            mkdir(solver_folder)
            solver_bash[solver] = (type_bash_script_file, bash_script)
            solver_config = SOLVER_CONFIG[solver]
            generate_cmd(
                bash_script,
                solver_config,
                # The solver's input format selects the benchmark folder.
                experiment_info["file"][solver_config["input"]],
                solver_folder,
                # Only local-search solvers are run under a timeout.
                solver_type == "local_search",
            )

for filename, bash_script in solver_bash.values():
    with open(filename, "a+") as f:
        # Each line of bash_script becomes one `bash -c` job; xargs runs NTHREADS at a time.
        f.write(f"cat {bash_script} | xargs -P{NTHREADS} -d'\\n' -n1 bash -c\nwait\n")
This way, running this one script generates several sensible bash files, which we can then invoke as needed.
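To show what one generated line looks like, here is a small standalone sketch of the placeholder-filling step. It is a variant, not the script's exact logic: it uses re.sub so each placeholder occurrence gets its own provider call. The argument names --seed and --mode, the instance path, and the fixed seed of 7 are all made up for the example.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def fill(arg: str, providers: dict) -> str:
    """Replace each known {name} with its provider's value; leave unknown ones as-is."""

    def substitute(match: re.Match) -> str:
        provider = providers.get(match.group(1))
        return str(provider()) if provider else match.group(0)

    return PLACEHOLDER_RE.sub(substitute, arg)


def build_line(solver: dict, instance: str, solution: str, providers: dict) -> str:
    """Join program, filled arguments, input file, and output redirect into one command."""
    parts = [solver["program"]] + [fill(a, providers) for a in solver["args"]]
    return " ".join(parts) + f" {instance} > {solution}"


# Hypothetical solver entry; a fixed provider keeps the output reproducible.
solver = {"program": "bin/A", "args": ["--seed={seed}", "--mode={mode}"]}
line = build_line(
    solver,
    "benchmarks/Magicsq/CNF/x.cnf",
    "solution/x.cnf.sol",
    {"seed": lambda: 7},
)
print(line)
```

The {mode} placeholder is left untouched because no provider is registered for it, which mirrors how the script skips keys that are not in ARGS. If file names may contain spaces, wrapping the paths in shlex.quote would be a natural extension.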
Tip
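If anything about the experiment needs to change, we can edit the configuration files instead of the generation script, which immediately lightens the mental load.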