GeoAB 模型安装与训练

GeoAB 是一个专注于 抗体药物开发 的 AI 模型系统,主要解决两个核心问题:通过深度学习预测抗体的 3D 结构(尤其是抗原结合部位 CDR 区),生成符合物理规律和生物特性的抗体变体;模拟自然免疫系统的亲和力成熟过程,定向优化抗体对抗原的结合强度。

文献:GeoAB: Towards Realistic Antibody Design and Reliable Affinity Maturation

GitHub 上面其实给了安装步骤,但是很难评 … 我老半天装不上,pip 库和 conda 包混淆了,自己折腾了半天才装上。

官方给的 GitHub 链接是:https://github.com/EDAPINENUT/GeoAB

我这里直接把我的系统信息和 conda 配置放出来。

[kl2@localhost ~]$ cat /etc/centos-release
CentOS Linux release 8.5.2111
[kl2@localhost ~]$ nvidia-smi 
Mon Mar 31 10:26:43 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:17:00.0 Off |                  Off |
| 58%   82C    P0            246W /  300W |    8805MiB /  49140MiB |     76%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2931      G   /usr/libexec/Xorg                               4MiB |
|    0   N/A  N/A   2378303      C   python3                                      8782MiB |
+-----------------------------------------------------------------------------------------+

这里涉及到的系统级依赖比较多,所以 conda 和 pip 分开来进行安装,conda 可以直接导入下面的 yml:

name: geoab
channels:
  - https://conda.rosettacommons.org
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python==3.9
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - bzip2=1.0.8
  - ca-certificates=2024.2.2
  - c-ares=1.19.1
  - krb5=1.21.2
  - libblas=3.9.0
  - libcblas=3.9.0
  - liblapack=3.9.0
  - libopenblas=0.3.24
  - libstdcxx-ng=13.2.0
  - libgcc-ng=13.1.0
  - libgfortran-ng=13.2.0
  - libgfortran5=13.2.0
  - libzlib=1.2.13
  - openssl=3.2.1

然后进入环境,安装 pip 的依赖:

aiohttp==3.11.14
biopython==1.81
biotite==0.38.0
e3nn==0.5.1
easydict==1.13
fair-esm==2.0.0
matplotlib==3.8.0
nni==3.0
pip==25.0.1
pytorch-lightning==2.0.9
rdkit==2023.3.2
tensorboard==2.13.0
torch_geometric==2.3.1
torch-scatter==2.1.2+pt20cpu
torchaudio==2.0.2
torchvision==0.15.2

这里 torch-scatter 由于和系统的 CUDA 版本不匹配,所以我是安装的预编译的 CPU 版本。

安装完成之后进入环境,拉取官方的仓库:

git clone https://github.com/EDAPINENUT/GeoAB.git
cd GeoAB

然后现在需要下载数据集:

wget "https://drive.usercontent.google.com/download?id=1UwFDuQSE_7gEvrfQkEhqOp2h1NfGAHua&export=download&authuser=0&confirm=t&uuid=7e68c0a8-d842-4cb7-b09c-48e7841cb0b1&at=AEz70l7zL3Yrg_jnQ6f5lACx2v-e:1743385403752" -O all_data.zip
unzip all_data.zip

Run the following command for training:

# Train GeoAB-refiner
python train_refine.py

# Train GeoAB-Initializer
python train_init.py
# After GeoAB-Initializer is trained, train GeoAB-Designer
python train_design.py

For evaluation, run the following command:

# Evaluate GeoAB-Refiner
python eval.py --eval_dir H3_refine --run 1

# Evaluate GeoAB-Designer
python eval.py --eval_dir H3_design