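Preface

This is a workflow I only recently started experimenting with. It is still fairly rough and should improve gradually over time.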

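Scenario

Suppose benchmark contains several groups of instances: benchmark/B1, benchmark/B2, and benchmark/B3.

bin contains several solvers (it does not have to be bin, any agreed-upon path works): bin/A, bin/B, and bin/C.

Now I want to run every solver on benchmark/B1 and collect the results.

With the previous approach, we would have to write n scripts ourselves (one per solver), because each solver differs in its parameter settings, input file format, paths, output format, and a whole series of other details.

If you don't mind the hassle, you can of course write a separate generation script for each solver, but that is not what this post is about.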

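Configuration files

We can use an experiment configuration file and a solver configuration file to help generate the scripts. Let's start with the solver configuration file.

Solver configuration file

Tip

Because the configuration file is tightly coupled to the solvers, there is no universal approach. Here I give one simple approach.

We use json-with-comment (jsonc) to describe a configuration. The simplest way is to define a JSON schema first, for example: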

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "A": {
            "type": "object",
            "properties": {
                "Complete": {
                    "type": "boolean"
                },
                "program": {
                    "type": "string"
                },
                "args": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "input": {
                    "type": "string"
                },
                "format": {
                    "type": "object",
                    "properties": {
                        "sat": {
                            "type": "string"
                        },
                        "unsat": {
                            "type": "string"
                        },
                        "has_timer": {
                            "type": ["string", "null"]
                        }
                    },
                    "required": [
                        "sat",
                        "unsat",
                        "has_timer"
                    ]
                }
            },
            "required": [
                "Complete",
                "program",
                "args",
                "input",
                "format"
            ]
        },
        "B": {
            "type": "object",
            "properties": {
                "Complete": {
                    "type": "boolean"
                },
                "program": {
                    "type": "string"
                },
                "args": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "input": {
                    "type": "string"
                },
                "format": {
                    "type": "object",
                    "properties": {
                        "sat": {
                            "type": "string"
                        },
                        "unsolved": {
                            "type": "string"
                        },
                        "has_timer": {
                            "type": ["string", "null"]
                        }
                    },
                    "required": [
                        "sat",
                        "unsolved",
                        "has_timer"
                    ]
                }
            },
            "required": [
                "Complete",
                "program",
                "args",
                "input",
                "format"
            ]
        },
        "C": {
            "type": "object",
            "properties": {
                "Complete": {
                    "type": "boolean"
                },
                "program": {
                    "type": "string"
                },
                "args": {
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "input": {
                    "type": "string"
                },
                "format": {
                    "type": "object",
                    "properties": {
                        "sat": {
                            "type": "string"
                        },
                        "unsolved": {
                            "type": "string"
                        },
                        "has_timer": {
                            "type": ["string", "null"]
                        }
                    },
                    "required": [
                        "sat",
                        "unsolved",
                        "has_timer"
                    ]
                }
            },
            "required": [
                "Complete",
                "program",
                "args",
                "input",
                "format"
            ]
        }
    },
    "required": [
        "A",
        "B",
        "C"
    ]
}

Then we can hand this JSON schema to an LLM and let it generate the corresponding jsonc file.
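As a rough illustration of what a conforming entry might look like, the sketch below defines a made-up entry for solver A and checks that the top-level fields the schema marks as required are present. Every value (the program path, the arguments, the format strings) is an assumption for illustration, not the real configuration, and the check is a hand-rolled stand-in for proper JSON Schema validation.

```python
import json

# Hypothetical entry for solver "A"; every value is made up for illustration.
SAMPLE = """
{
    "A": {
        "Complete": false,
        "program": "bin/A",
        "args": ["--seed", "{seed}"],
        "input": "CNF",
        "format": {
            "sat": "s SATISFIABLE",
            "unsat": "s UNSATISFIABLE",
            "has_timer": "c time"
        }
    }
}
"""

# Top-level fields that the schema lists as required for each solver.
REQUIRED = ("Complete", "program", "args", "input", "format")


def missing_fields(entry: dict) -> list[str]:
    """Return the required top-level fields that the entry lacks."""
    return [key for key in REQUIRED if key not in entry]


config = json.loads(SAMPLE)
for name, entry in config.items():
    print(name, missing_fields(entry))
```

An empty list means the entry has all required top-level fields; nested fields under format would need the same kind of check.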

Info

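Of course, you can also write the configuration by hand. Writing a JSON schema by hand is itself fairly tricky, so it may be easier to write the config file directly; you only need to decide in advance which fields each solver has.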

Attention

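I generated the schema above rather than writing it by hand. If written by hand, it should use an array in which each item is an object describing one solver's configuration, instead of the current layout where the solver name is the key and its parameters are the value.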
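A minimal sketch of the array-based layout, assuming each object carries a hypothetical name field (not part of the schema above). The array is turned back into a name-keyed dictionary so that lookups by solver name keep working.

```python
import json

# Hypothetical array-based solver config; the "name" field is an assumption.
SOLVERS_JSON = """
[
    {"name": "A", "program": "bin/A", "input": "CNF"},
    {"name": "B", "program": "bin/B", "input": "OPB"}
]
"""


def index_by_name(solvers: list[dict]) -> dict[str, dict]:
    """Build a name -> config mapping and reject duplicate names."""
    by_name = {}
    for solver in solvers:
        name = solver["name"]
        if name in by_name:
            raise ValueError(f"duplicate solver name: {name}")
        by_name[name] = solver
    return by_name


solvers = index_by_name(json.loads(SOLVERS_JSON))
print(sorted(solvers))  # ['A', 'B']
```

With this layout the schema only has to describe one solver object once, instead of repeating the same properties under every solver name.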

Experiment configuration

A simple JSON example of an experiment configuration file looks like this:

{
    "MagicsSquare": {
        "solvers": {
            "local_search": [
                "A",
                "B",
                "C"
            ]
        },
        "file": {
            "CNF": "benchmarks/Magicsq/CNF",
            "KNF": "benchmarks/Magicsq/KNF",
            "OPB": "benchmarks/Magicsq/OPB"
        }
    }
}

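This means I want to run an experiment named MagicsSquare. The benchmark files for each input format are recorded under file, and the solvers to run, grouped by category and listed by name, are under solvers. Each solver name must match an entry in the solver configuration file.

I won't include the schema here; the general idea is enough.

Python script

With the configuration files in place, we can use Python to generate the command script files.

Since this is the lazy version, this part can also be written by AI: just write a good prompt. AI is very good at this kind of scripting work.

For example, a simple script of mine is: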
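Because the names in solvers have to match the solver configuration file, a small cross-check before generating anything can catch typos early. The sketch below is one possible way to do it; the function name unknown_solvers and the sample data are assumptions for illustration.

```python
def unknown_solvers(experiments: dict, solver_config: dict) -> dict[str, list[str]]:
    """Map each experiment to the solver names missing from solver_config."""
    problems = {}
    for exp_name, info in experiments.items():
        missing = [
            solver
            for solver_list in info["solvers"].values()
            for solver in solver_list
            if solver not in solver_config
        ]
        if missing:
            problems[exp_name] = missing
    return problems


# Sample data shaped like the experiment config above.
EXPERIMENTS = {
    "MagicsSquare": {
        "solvers": {"local_search": ["A", "B", "C"]},
        "file": {"CNF": "benchmarks/Magicsq/CNF"},
    }
}
SOLVER_CONFIG = {"A": {}, "B": {}}  # "C" is deliberately missing

print(unknown_solvers(EXPERIMENTS, SOLVER_CONFIG))  # {'MagicsSquare': ['C']}
```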

import jsonc
import os
import re
 
_placeholder_re = re.compile(r"\{(\w+)\}")
 
SCRIPTS_PATH = "scripts/run"
 
EXCLUDED_SOLVER = set([""])
 
NTHREADS = 50
 
def seed_generator():
    u = int.from_bytes(os.urandom(2), byteorder="little")
    while u >= 99:
        u = u // 10
    return u
 
 
ARGS = {"seed": seed_generator}
 
 
def mkdir(path: str):
    # Create missing parent directories too, and tolerate existing ones.
    os.makedirs(path, exist_ok=True)
 
 
def generate_cmd(
    bash_path: str,
    solver_config: dict,
    filefolder: str,
    sol_folder: str,
    enable_timeout: bool,
):
    cmd_list = []
    filelists = os.listdir(filefolder)
 
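    cmd_template = []
    time_prefix = False

    # A null "has_timer" is treated as "the solver does not report its own
    # runtime", so the command is wrapped in bash's `time`. The group has to
    # open the command line: `timeout` cannot execute the shell keyword `{`.
    if solver_config["format"]["has_timer"] is None:
        cmd_template.append("{ time")
        time_prefix = True

    if enable_timeout:
        cmd_template.append("timeout 5000")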
    cmd_template.append(solver_config["program"])
    for args in solver_config["args"]:
        keys = _placeholder_re.findall(args)
        if not keys:
            cmd_template.append(args)
            continue
 
        filled: str = args
 
        for k in keys:
            if k in ARGS:
                filled = filled.replace("{" + k + "}", str(ARGS[k]()))
        cmd_template.append(filled)
    cmd = " ".join(cmd_template)
 
    for file in filelists:
        solution = f"{sol_folder}/{os.path.basename(file)}.sol"
        final_cmd = cmd + f" {os.path.join(filefolder, file)} > {solution}"
        if time_prefix:
            final_cmd += "; } >> " + solution + " 2>&1"
        cmd_list.append(final_cmd)
 
    with open(bash_path, "a+") as f:
        for c in cmd_list:
            f.write(f"{c}\n")
 
 
with open("scripts/solver.json", "r") as f:
    SOLVER_CONFIG = jsonc.load(f)
 
with open("scripts/experiment_config.json", "r") as f:
    EXPERIMENTS = jsonc.load(f)
 
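# The generated scripts are opened in append mode below, so their target
# directory has to exist first.
mkdir(SCRIPTS_PATH)

solver_bash = {}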
 
for experiment_name, experiment_info in EXPERIMENTS.items():
    experiments_folder = f"solution/{experiment_name}"
    mkdir(experiments_folder)
 
    for solver_type, solver_lists in experiment_info["solvers"].items():
        details_folder = f"{experiments_folder}/{solver_type}"
        mkdir(details_folder)
 
        type_bash_script_file = f"{SCRIPTS_PATH}/run_{solver_type}.sh"
        for solver in solver_lists:
            if solver in EXCLUDED_SOLVER:
                continue
 
            bash_script = f"{SCRIPTS_PATH}/{solver}.sh"
            solver_folder = f"{details_folder}/{solver}"
            mkdir(solver_folder)
 
            solver_bash[solver] = (type_bash_script_file, bash_script)
 
            solver_config = SOLVER_CONFIG[solver]
 
            generate_cmd(
                bash_script,
                solver_config,
                experiment_info["file"][solver_config["input"]],
                solver_folder,
                solver_type == "local_search",
            )
 
for _, (filename, bash_script) in solver_bash.items():
    with open(filename, "a+") as f:
        f.write(f"cat {bash_script} | xargs -P{NTHREADS} -d'\\n' -n1 bash -c\nwait\n")
 

This way, we only need to run this script to generate several reasonable bash files, and then invoke them as needed.
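To make the placeholder step easier to see in isolation, here is a standalone sketch of how an argument such as --seed={seed} could be expanded. It uses re.sub instead of the findall/replace loop in the script, so a generator is called once per occurrence; the helper name fill_placeholders and the sample arguments are assumptions for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_placeholders(arg: str, generators: dict) -> str:
    """Replace each {key} with generators[key](); leave unknown keys as-is."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key in generators:
            return str(generators[key]())
        return match.group(0)

    return PLACEHOLDER.sub(replace, arg)


# A fixed generator keeps the example deterministic.
args = ["--seed={seed}", "--mode", "{unknown}"]
print([fill_placeholders(a, {"seed": lambda: 42}) for a in args])
# ['--seed=42', '--mode', '{unknown}']
```

Note that in the script above the template is filled once per solver, so every instance file of that solver shares the same seed; calling a helper like this inside the per-file loop would give each command its own seed.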

Tip

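If anything about an experiment needs to change, we can edit the configuration files instead of the generation script, which greatly reduces the mental burden.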