First, install the CUDA driver on the host server.
Image Preparation

Download the image (either `docker pull` it or download the `.tar` file directly); make sure the selected ARCH is amd64. If you download the `.tar` file, import it with `docker load -i`.

Once the image is downloaded, register the NVIDIA runtime by adding the following to `/etc/docker/daemon.json` (create the file first if it does not exist):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Container Creation
Run:

```shell
docker run --restart=on-failure --runtime=nvidia --network host -it {{your image name}} /usr/bin/sh
```

Replace `{{your image name}}` with the image name, which you can look up with `docker image ls`. Note that the name must include its tag, e.g. `nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04`.

After entering the container, run:

```shell
nvidia-smi
```

If a table like the one below appears, a CUDA-capable Docker container has been created successfully.

Build Preparation

Install the required software. First, switch to a faster mirror:

```shell
sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
```

Then install the packages:

```shell
apt update
apt upgrade
apt update
apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential libedit-dev libxml2-dev libssl-dev unzip vim wget git
```

Install cmake-3.20:

```shell
wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
tar -xvf cmake-3.20.0.tar.gz
cd cmake-3.20.0
./configure
make -j6
make install
cp bin/cmake /usr/bin/
```

When it finishes, run `cmake --version` to confirm the installation.
Install LLVM

Run:

```shell
apt install lsb-release wget software-properties-common gnupg -y
wget https://apt.llvm.org/llvm.sh
chmod u+x llvm.sh
./llvm.sh 14 all
```

Build TVM
There are two ways to get the source:

- Clone it from GitHub:

  ```shell
  git clone --recursive https://github.com/apache/tvm tvm
  ```

- Download the `.zip` and copy it from the host machine into the container. On the host, run `docker cp ~/tvm.zip {{container id}}:/root/`, filling `{{container id}}` with the container's id (check it with `docker ps -a`). Then, inside the container, run:

  ```shell
  unzip tvm.zip
  ```

Either way you end up with a `tvm` folder; next, build it.
Note: please download the v0.1.0 release, otherwise it may not run correctly.

```shell
unzip tvm.zip
cd tvm
mkdir -p build
cd build
cp ../cmake/config.cmake .
vim config.cmake  # edit the configuration here
```

Make the following changes:
- Change `set(USE_CUDA OFF)` to `set(USE_CUDA ON)` to enable the CUDA backend. To use another backend such as OpenGL, enable the corresponding option instead.
- LLVM: change `set(USE_LLVM OFF)` to `set(USE_LLVM ON)`.
- For IR debugging, set `set(USE_RELAY_DEBUG ON)` and also set the environment variable `TVM_LOG_DEBUG`:
```shell
export TVM_LOG_DEBUG="ir/transform.cc=1,relay/ir/transform.cc=1"
```

Add this to `~/.bashrc`, then run `source ~/.bashrc` to reload the environment.
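The `TVM_LOG_DEBUG` value is a comma-separated list of `source-file=level` pairs. As a rough illustration of that format only (TVM parses this internally; the sketch below is not TVM's actual parser):

```python
# Hypothetical sketch: split a TVM_LOG_DEBUG-style spec into per-file levels.
spec = "ir/transform.cc=1,relay/ir/transform.cc=1"
levels = {}
for item in spec.split(","):
    path, level = item.split("=")
    levels[path] = int(level)
print(levels)  # {'ir/transform.cc': 1, 'relay/ir/transform.cc': 1}
```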
Then, in the `build` folder, run:

```shell
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j8
```

and wait for `libtvm.so` to finish building.
To build the Python package, set the `PYTHONPATH` environment variable to tell Python where to find the library. For example, if we cloned tvm under `/path/to/tvm`, we can add the following lines to `~/.bashrc`. The changes take effect as soon as you pull the code and rebuild the project (there is no need to run setup again):

```shell
export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
```

Then, in the tvm directory, run:

```shell
cd python; python3 setup.py install --user; cd ..
```

Bug
Installing scipy fails with an error demanding Python >= 3.9, so some packages have to be installed manually:

```shell
apt install -y pip
pip uninstall numpy
pip install "numpy<=1.23" decorator attrs psutil 'xgboost<1.6.0' cloudpickle ml_dtypes scipy
```
Enabling the C++ Tests (note: do this in the `~` directory)

```shell
git clone https://github.com/google/googletest
cd googletest
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make -j6
make install
```

Then, in the tvm directory, run:

```shell
make cpptest -j6
```

If there are no errors, the build succeeded.
Testing

See "Compile PyTorch Models" for the test.

Run:

```shell
pip install torch==1.7.0
pip install torchvision==0.8.1
```

Then create a `torch_test.py` with the following content:
```python
import time
import tvm
from tvm import relay
import numpy as np
from tvm.contrib.download import download_testdata
import torch
import torchvision
from scipy.special import softmax

# device = torch.device("cpu")
model_name = "resnet18"
model = getattr(torchvision.models, model_name)(pretrained=True)
model = model.eval()

# We grab the TorchScripted model via tracing
input_shape = [1, 3, 224, 224]
input_data = torch.randn(input_shape)
scripted_model = torch.jit.trace(model, input_data).eval()

from PIL import Image

img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
img_path = download_testdata(img_url, "cat.png", module="data")
print(img_path)
img = Image.open(img_path).resize((224, 224))

# Preprocess the image and convert to tensor
from torchvision import transforms

my_preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
img = my_preprocess(img)
img = np.expand_dims(img, 0)

######################################################################
# Import the graph to Relay
# -------------------------
# Convert PyTorch graph to Relay graph. The input name can be arbitrary.
input_name = "input0"
shape_list = [(input_name, img.shape)]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

######################################################################
# Relay Build
# -----------
# Compile the graph to llvm target with given input specification.
target = "llvm"
target_host = "llvm"
dev = tvm.cpu(0)
with tvm.transform.PassContext(opt_level=7):
    lib = relay.build(mod, target=target, target_host=target_host, params=params)

######################################################################
# Execute the portable graph on TVM
# ---------------------------------
# Now we can try deploying the compiled model on target.
from tvm.contrib import graph_executor

m = graph_executor.GraphModule(lib["default"](dev))

tvm_time_spent = []
torch_time_spent = []
n_warmup = 5
n_time = 10
# tvm_t0 = time.process_time()
for i in range(n_warmup + n_time):
    dtype = "float32"
    # Set inputs
    m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
    tvm_t0 = time.time()
    # Execute
    m.run()
    # Get outputs
    tvm_output = m.get_output(0)
    tvm_time_spent.append(time.time() - tvm_t0)
# tvm_t1 = time.process_time()

######################################################################
# Look up synset name
# -------------------
# Look up prediction top 1 index in 1000 class synset.
synset_url = "".join(
    [
        "https://raw.githubusercontent.com/Cadene/",
        "pretrained-models.pytorch/master/data/",
        "imagenet_synsets.txt",
    ]
)
synset_name = "imagenet_synsets.txt"
synset_path = download_testdata(synset_url, synset_name, module="data")
with open(synset_path) as f:
    synsets = f.readlines()

synsets = [x.strip() for x in synsets]
splits = [line.split(" ") for line in synsets]
key_to_classname = {spl[0]: " ".join(spl[1:]) for spl in splits}

class_url = "".join(
    [
        "https://raw.githubusercontent.com/Cadene/",
        "pretrained-models.pytorch/master/data/",
        "imagenet_classes.txt",
    ]
)
class_name = "imagenet_classes.txt"
class_path = download_testdata(class_url, class_name, module="data")
with open(class_path) as f:
    class_id_to_key = f.readlines()

class_id_to_key = [x.strip() for x in class_id_to_key]

# Get top-1 result for TVM
top1_tvm = np.argmax(tvm_output.asnumpy()[0])
tvm_class_key = class_id_to_key[top1_tvm]

# Convert input to PyTorch variable and get PyTorch result for comparison
# torch_t0 = time.process_time()
# torch.set_num_threads(1)
for i in range(n_warmup + n_time):
    with torch.no_grad():
        torch_img = torch.from_numpy(img)
        torch_t0 = time.time()
        output = model(torch_img)
        torch_time_spent.append(time.time() - torch_t0)
        # Get top-1 result for PyTorch
        top1_torch = np.argmax(output.numpy())
        torch_class_key = class_id_to_key[top1_torch]
# torch_t1 = time.process_time()

# tvm_time = tvm_t1 - tvm_t0
# torch_time = torch_t1 - torch_t0
tvm_time = np.mean(tvm_time_spent[n_warmup:]) * 1000
torch_time = np.mean(torch_time_spent[n_warmup:]) * 1000
tvm_output_prob = softmax(tvm_output.asnumpy())
output_prob = softmax(output.numpy())
print("Relay top-1 id: {}, class name: {}, class probability: {}".format(
    top1_tvm, key_to_classname[tvm_class_key], tvm_output_prob[0][top1_tvm]))
print("Torch top-1 id: {}, class name: {}, class probability: {}".format(
    top1_torch, key_to_classname[torch_class_key], output_prob[0][top1_torch]))
print('Relay time(ms): {:.3f}'.format(tvm_time))
print('Torch time(ms): {:.3f}'.format(torch_time))
```
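As a side note on the preprocessing step above: `np.expand_dims(img, 0)` adds the batch dimension, turning the single CHW image produced by `transforms.ToTensor()` into the NCHW layout the model expects. A minimal illustration:

```python
import numpy as np

chw = np.zeros((3, 224, 224), dtype="float32")  # single CHW image
nchw = np.expand_dims(chw, 0)                   # add the batch dimension
print(nchw.shape)  # (1, 3, 224, 224)
```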
Note: the documentation also provides `import set_env`; `set_env` is meant to be run before the code above to set up a temporary Python environment:

```python
def set_env(num, current_path='.'):
    '''`num` is how many levels the project root sits above `current_path`.'''
    import sys
    from pathlib import Path

    ROOT = Path(current_path).resolve().parents[num]
    sys.path.extend([str(ROOT/'src')])  # set up the `tvm_book` environment
    from tvm_book.config.env import set_tvm
    # set up the TVM environment
    set_tvm(TVM_ROOT)
```

`set_tvm` needs to be configured by you to match your device.
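To illustrate the `num` argument: `Path(...).parents[num]` walks up the directory tree, so `num` selects how many levels above `current_path` the project root sits. A small self-contained example (the `/repo/...` layout is hypothetical; `PurePosixPath` is used so the output is OS-independent):

```python
from pathlib import PurePosixPath

# Hypothetical layout: a notebook three levels below the project root.
p = PurePosixPath("/repo/book/chapters/ch1")
print(p.parents[0])  # /repo/book/chapters  (num=0: one level up)
print(p.parents[1])  # /repo/book           (num=1: two levels up)
print(p.parents[2])  # /repo                (num=2: three levels up)
```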
If you use CUDA, the code is identical except for the Relay Build section, which becomes:

```python
######################################################################
# Relay Build
# -----------
target = "cuda"
target_host = "llvm"
dev = tvm.gpu(0)
with tvm.transform.PassContext(opt_level=7):
    lib = relay.build(mod, target=target, target_host=target_host, params=params)
```

Run the script as before.
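The timing pattern both scripts use (run `n_warmup + n_time` iterations, then average only the last `n_time`) can be factored into a small helper. A sketch with a placeholder workload, independent of TVM and PyTorch (`bench_ms` is a name invented here):

```python
import time
import numpy as np

def bench_ms(fn, n_warmup=5, n_time=10):
    """Call fn n_warmup + n_time times and return the mean wall-clock
    duration of the last n_time calls, in milliseconds."""
    times = []
    for _ in range(n_warmup + n_time):
        t0 = time.time()
        fn()
        times.append(time.time() - t0)
    return float(np.mean(times[n_warmup:]) * 1000)

# Placeholder workload: a small matrix multiply
a = np.random.rand(64, 64)
print("matmul time(ms): {:.3f}".format(bench_ms(lambda: a @ a)))
```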
Running it produces a lot of log output; I have not yet found a way to silence it. The logging is presumably caused by enabling `USE_RELAY_DEBUG`.
A screenshot of a successful run:

If a second run fails with an error like `ModuleNotFoundError: No module named 'tvm'`, reinstall the tvm Python environment.