Installing the extensions
Locally: install the Remote-SSH and Python extensions.
On the remote server: install the Python extension.
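- Difficulty: installing the Python extension failed on both the local machine and the remote server, probably because of network or proxy problems.
- Solution:
- On the local machine, download the Python extension from the official Marketplace: https://marketplace.visualstudio.com/items?itemName=ms-python.python
Put the .vsix file in VS Code's bin directory and run code --install-extension ms-python.python-2022.9.11681004.vsix there (details at https://www.hangge.com/blog/cache/detail_3191.html).
- For the server, first copy the .vsix up from the local machine with scp, then grant it execute permission. I was stuck at first precisely because I had not set the permission; after a plain chmod 777, "Install from VSIX" worked (chmod +x should also be enough).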
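The offline install steps above can be sketched as a short script that only assembles the commands and prints them for review. The SSH target `user@server` and the remote directory `/tmp` are hypothetical placeholders, not values from my setup; the final "Install from VSIX" step on the server is still done in the VS Code UI.

```python
import shlex

# Hypothetical values for illustration only; replace them with your own.
VSIX = "ms-python.python-2022.9.11681004.vsix"
REMOTE = "user@server"   # hypothetical SSH target
REMOTE_DIR = "/tmp"      # hypothetical upload directory on the server


def offline_install_commands(vsix, remote, remote_dir):
    """Return the commands for the offline install, in the order described above."""
    remote_path = f"{remote_dir}/{vsix}"
    return [
        # 1. Install the downloaded .vsix into the local VS Code.
        ["code", "--install-extension", vsix],
        # 2. Copy the same file to the server.
        ["scp", vsix, f"{remote}:{remote_path}"],
        # 3. Grant the permission that was missing in my first attempt.
        ["ssh", remote, "chmod", "+x", remote_path],
    ]


# Print the commands instead of running them, so they can be checked first.
for cmd in offline_install_commands(VSIX, REMOTE, REMOTE_DIR):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a local terminal once the placeholders are adjusted.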
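After that, the environment showed up, and I could select my own conda environment on the server for debugging.
The "permission" problem that cost me a day and a half is finally cracked! Right now I feel like listening to 越权访问 a hundred times to let it sink in…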
Next, go to Run -> Add Configuration.
The launch.json is as follows:
{
  "version": "0.2",
  "configurations": [
    {
      "name": "Python: Launch",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
      "args": [
        "--do_train",
        "--num_thread_reader=0",
        "--epochs=5",
        "--batch_size=128",
        "--n_display=50",
        "--train_csv", "${env:DATA_PATH}/MSRVTT_train.9k.csv",
        "--val_csv", "${env:DATA_PATH}/MSRVTT_JSFUSION_test.csv",
        "--data_path", "${env:DATA_PATH}/MSRVTT_data.json",
        "--features_path", "${env:DATA_PATH}/MSRVTT_Videos",
        "--output_dir", "ckpts/ckpt_msrvtt_retrieval_looseType",
        "--lr", "1e-4",
        "--max_words", "32",
        "--max_frames", "12",
        "--batch_size_val", "16",
        "--datatype", "msrvtt",
        "--expand_msrvtt_sentences",
        "--feature_framerate", "1",
        "--coef_lr", "1e-3",
        "--freeze_layer_num", "0",
        "--slice_framepos", "2",
        "--loose_type",
        "--linear_patch", "2d",
        "--sim_header", "meanP",
        "--pretrained_clip_name", "ViT-B/32"
      ],
      "env": {"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"},
      "console": "integratedTerminal"
    }
  ]
}
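This raised a problem: the ${env:DATA_PATH} references came out empty on the command line. The likely reason is that ${env:...} is resolved from the environment VS Code itself runs in, while the env block only sets variables for the launched program. So this style of reference does not work here, and I had to fall back on the clumsy approach of pasting the full path into every argument by hand.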
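To avoid doing the copy-paste by hand, the substitution can be scripted. The following is only a minimal sketch: the helper name `inline_env_refs` is my own invention, and it handles nothing beyond `${env:NAME}` placeholders inside `args`, filling them from the configuration's own `env` block.

```python
import json
import re

# Matches VS Code style placeholders such as ${env:DATA_PATH}.
ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def inline_env_refs(config):
    """Return a copy of one launch configuration whose args have every
    ${env:NAME} replaced by the value from the configuration's "env" block.
    Names that are not defined there are left untouched."""
    env = config.get("env", {})

    def substitute(match):
        return env.get(match.group(1), match.group(0))

    expanded = dict(config)
    expanded["args"] = [ENV_REF.sub(substitute, arg) for arg in config.get("args", [])]
    return expanded


# Shortened version of the configuration above, for illustration.
example = {
    "name": "Python: Launch",
    "args": [
        "--train_csv", "${env:DATA_PATH}/MSRVTT_train.9k.csv",
        "--data_path", "${env:DATA_PATH}/MSRVTT_data.json",
    ],
    "env": {"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"},
}

print(json.dumps(inline_env_refs(example)["args"], indent=2))
```

The printed list can be pasted back into launch.json as the new args value.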
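In addition, a python -m invocation has to be expressed with the module field, and module conflicts with program, so the configuration needs to be adjusted: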
{
  "version": "0.2",
  "configurations": [
    {
      "name": "Python: Launch",
      "type": "python",
      "request": "launch",
      "module": "torch.distributed.launch",
      "args": [
        "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
        "--do_train",
        "--num_thread_reader=0",
        "--epochs=5",
        "--batch_size=128",
        "--n_display=50",
        "--train_csv", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_train.9k.csv",
        "--val_csv", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_JSFUSION_test.csv",
        "--data_path", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_data.json",
        "--features_path", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_Videos",
        "--output_dir", "ckpts/ckpt_msrvtt_retrieval_looseType",
        "--lr", "1e-4",
        "--max_words", "32",
        "--max_frames", "12",
        "--batch_size_val", "16",
        "--datatype", "msrvtt",
        "--expand_msrvtt_sentences",
        "--feature_framerate", "1",
        "--coef_lr", "1e-3",
        "--freeze_layer_num", "0",
        "--slice_framepos", "2",
        "--loose_type",
        "--linear_patch", "2d",
        "--sim_header", "meanP",
        "--pretrained_clip_name", "ViT-B/32"
      ],
      "console": "integratedTerminal"
    }
  ]
}
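To make the module/program distinction concrete, here is a rough sketch that maps a launch configuration to the command line it approximately corresponds to. The function `equivalent_command` is a hypothetical helper of mine; the real debugger also wraps the call in its own launcher, which is ignored here.

```python
import shlex


def equivalent_command(config, python="python"):
    """Build the argv that a launch configuration roughly corresponds to."""
    if "module" in config and "program" in config:
        raise ValueError('use either "module" or "program", not both')
    if "module" in config:
        # "module" plays the role of `python -m <module>`.
        head = [python, "-m", config["module"]]
    elif "program" in config:
        # "program" plays the role of `python <script>`.
        head = [python, config["program"]]
    else:
        raise ValueError('configuration needs "module" or "program"')
    return head + list(config.get("args", []))


# Shortened configuration for illustration: the script moves into args.
config = {
    "module": "torch.distributed.launch",
    "args": ["CLIP4Clip/main_task_retrieval.py", "--do_train"],
}
print(shlex.join(equivalent_command(config)))
```

With module set, the training script is no longer the program but simply the first entry in args, which is exactly the change made in the configuration above.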
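After setting breakpoints and starting the debugger, I hit an error.
Stepping through statement by statement showed that it came from a place where a model is loaded: the model file had been put in the wrong location. Remote debugging is really handy, because the call stack shows the whole calling sequence clearly.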
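Then I found the following problem:
When the program computed frameCount, the result was 0, and fps was 0 as well, which triggered a division-by-zero error.
On inspection, the video dataset had been put in the wrong location, so the input video was empty. After fixing the path, everything worked.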
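A small guard would have turned this division-by-zero into a much clearer message. The sketch below is not the CLIP4Clip code; `video_duration_seconds` is a hypothetical helper, and it assumes the two values were read beforehand, for example with OpenCV via cap.get(cv2.CAP_PROP_FRAME_COUNT) and cap.get(cv2.CAP_PROP_FPS).

```python
def video_duration_seconds(frame_count, fps, path="<unknown>"):
    """Return the duration in seconds (rounded up), or raise a clear error
    when the video could not be read and both values are zero."""
    frame_count = int(frame_count)
    fps = int(fps)
    if frame_count <= 0 or fps <= 0:
        raise ValueError(
            f"no frames read from {path!r} (frame_count={frame_count}, fps={fps}); "
            "check that the video dataset is in the expected directory"
        )
    # Integer ceiling division: a partial last second still counts.
    return (frame_count + fps - 1) // fps


# A readable video: 300 frames at 30 fps.
print(video_duration_seconds(300, 30))

# A missing file typically yields zeros, which now fails with a clear message.
try:
    video_duration_seconds(0, 0, "MSRVTT_Videos/video0.mp4")
except ValueError as err:
    print(err)
```

Failing early like this points straight at the dataset path instead of at an arithmetic error several calls deeper.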
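Following the download instructions in CLIP4Clip is enough.
Note that msrvtt.zip only provides the index; the actual mp4 files still have to be downloaded from the separate link given in those instructions (marked with a red box in my screenshot).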
Next came a NumPy error: AttributeError: module 'numpy' has no attribute 'long'
It says np.long does not exist. The np.long alias was removed in NumPy 1.24, so rolling back to an earlier version is enough:
pip install numpy==1.23.0
After that, it runs.
To summarize, the conda environment for CLIP4Clip (which is also what CLIP requires) looks like this:
# packages in environment at /mnt/cloud_disk/wf/anaconda3/envs/clip4:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
boto3 1.34.1 pypi_0 pypi
botocore 1.34.1 pypi_0 pypi
ca-certificates 2023.08.22 h06a4308_0 defaults
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
fsspec 2023.4.0 pypi_0 pypi
ftfy 6.1.3 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 defaults
libffi 3.4.4 h6a678d5_0 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
markupsafe 2.1.3 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
networkx 3.0 pypi_0 pypi
numpy 1.23.0 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
opencv-python 4.8.1.78 pypi_0 pypi
openssl 3.0.12 h7f8727e_0 defaults
pandas 2.1.4 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.3.1 py39h06a4308_0 defaults
python 3.9.18 h955ad1f_0 defaults
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3.post1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
regex 2023.10.3 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
s3transfer 0.9.0 pypi_0 pypi
setuptools 68.2.2 py39h06a4308_0 defaults
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 defaults
sympy 1.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torch 1.13.0 pypi_0 pypi
torchaudio 0.13.0 pypi_0 pypi
torchvision 0.14.0 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typing-extensions 4.4.0 pypi_0 pypi
tzdata 2023.3 pypi_0 pypi
urllib3 1.26.13 pypi_0 pypi
wcwidth 0.2.12 pypi_0 pypi
wheel 0.41.2 py39h06a4308_0 defaults
xz 5.4.5 h5eee18b_0 defaults
zlib 1.2.13 h5eee18b_0 defaults
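The points to watch are PyTorch 1.13 (the recommended conda install command for 1.13 on the previous-versions page of the official PyTorch site works as is) and numpy 1.23.0; the rest can be installed without much thought.
Gripe: after PyTorch 1.13, torch.distributed.launch is being replaced by torchrun, and local_rank has been changed to local-rank (it seems a lot of underscores were cleaned up this way), so backward compatibility suffers, which is a bit painful.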
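One way to soften the local_rank versus local-rank breakage is to accept both spellings and fall back to the LOCAL_RANK environment variable that torchrun sets. This is a minimal sketch under those assumptions, not what CLIP4Clip does; `parse_local_rank` is a hypothetical helper name.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Read the local rank from --local_rank, --local-rank, or LOCAL_RANK."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank", "--local-rank",  # accept the old and the new spelling
        dest="local_rank",
        type=int,
        # torchrun passes the rank through the environment instead of a flag.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # Ignore the script's other options; this sketch only cares about the rank.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


print(parse_local_rank(["--local_rank", "2"]))
print(parse_local_rank(["--do_train", "--local-rank=3"]))
```

In a real training script the same add_argument call would simply replace the existing --local_rank definition, so the script works under both launchers.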