{ "cells": [ { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "WORKDIR=%pwd " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example 2: Water dipole and polarizability\n", "## Tasks\n", " - Train dipole and polarizability of [bulk water](https://github.com/lab-cosmo/SA-GPR/tree/master/example/water_bulk).\n", " - Use LAMMPS Calculator interface.\n", " - Plot the infrared and raman spectrum. \n", "\n", "## Prepare data \n", "You can see the following files required by training in this fold.\n", "\n", "```bash\n", "water\n", "|- dipole.yaml \n", "|- polar.yaml\n", "|- data/\n", " |- train.xyz \n", " |- test.xyz \n", "```\n", "\n", "[`dipole.yaml`](dipole.yaml) and [`polar.yaml`](polar.yaml) controls the details of model architecture and training for dipole and polarizability respectively. Some import parameters in this tasks are:\n", "\n", "For `dipole.yaml`:\n", "```yaml\n", "cutoff: 4.0\n", "Data:\n", " trainSet: data/train.xyz\n", " testSet: data/test.xyz\n", "Train:\n", " targetProp: [\"dipole\"] \n", " weight: [1.0]\n", "```\n", "\n", "These parameters mean the cutoff in this task is 4.0 Å. We use `data/train.xyz` as trainset and `data/test.xyz` as testset (actually it should be vaildation dataset here). We only train `dipole` in this task.\n", "\n", "For `polar.yaml`:\n", "```yaml\n", "Train:\n", " targetProp: [\"polarizability\"] \n", "```\n", "means we only train `polarizability`.\n", "\n", "## Train\n", "Now, we can train the model. For dipole:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13:21:42 \n", "\n", " ) ( ( \n", " ( /( ) )\\ ) )\\ ) \n", " )\\()) ( /((()/((()/( \n", "((_)\\ ( )\\())/(_))/(_)) \n", " _((_) )\\ (_))/(_)) (_)) \n", "| || | ((_)| |_ | _ \\| _ \\ \n", "| __ |/ _ \\| _|| _/| _/ \n", "|_||_|\\___/ \\__||_| |_| \n", "HotPP (v.1.0.0 RELEASE)\n", "\n", "cp: cannot stat 'input.yaml': No such file or directory\n", "13:21:45 Using seed 0\n", "13:21:45 Preparing data...\n", "13:22:05 n_neighbor : 25.826607142857142\n", "13:22:05 all_elements : [1, 8]\n", "13:22:05 ground_energy : [0.0]\n", "13:22:05 std : 1.0\n", "13:22:05 mean : 0.0\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..\n", "13:22:06 Load checkpoints from dipole.ckpt\n", "You are using a CUDA device ('NVIDIA RTX 5880 Ada Generation') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n", "Missing logger folder: /home/gegejun/src/hotpp/examples/water/dipole/lightning_logs\n", "Restoring states from the checkpoint path at dipole.ckpt\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n", "\n", " | Name | Type | Params\n", "--------------------------------------------\n", "0 | model | MultiAtomicModule | 359 K \n", "--------------------------------------------\n", "359 K Trainable params\n", "0 Non-trainable params\n", "359 K Total params\n", "1.440 Total estimated model params size (MB)\n", "Restored all states from the checkpoint at dipole.ckpt\n", "/home/gegejun/anaconda3/envs/gpu/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:438: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 52 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n", " rank_zero_warn(\n", "/home/gegejun/anaconda3/envs/gpu/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py:281: PossibleUserWarning: The number of training batches (44) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.\n", " rank_zero_warn(\n", "/home/gegejun/anaconda3/envs/gpu/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:438: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 52 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n", " rank_zero_warn(\n", "13:22:07 epoch | step | lr | total | dipole \n", "13:22:07 793 | 34936 | 1.00e-03 | nan / 0.0006 | nan / 0.0006 \n", "13:22:44 794 | 34980 | 1.00e-03 | 0.0006 / 0.0008 | 0.0006 / 0.0008 \n", "13:23:30 795 | 35024 | 1.00e-03 | 0.0006 / 0.0008 | 0.0006 / 0.0008 \n", "13:24:20 796 | 35068 | 1.00e-03 | 0.0007 / 0.0007 | 0.0007 / 0.0007 \n", "13:25:16 797 | 35112 | 1.00e-03 | 0.0006 / 0.0007 | 0.0006 / 0.0007 \n", "13:25:51 798 | 35156 | 1.00e-03 | 0.0006 / 0.0007 | 0.0006 / 0.0007 \n", "13:26:26 799 | 35200 | 1.00e-03 | 0.0007 / 0.0009 | 0.0007 / 0.0009 \n", "13:27:08 800 | 35244 | 1.00e-03 | 0.0008 / 0.0009 | 0.0008 / 0.0009 \n", "13:27:49 801 | 35288 | 1.00e-03 | 0.0007 / 0.0007 | 0.0007 / 0.0007 \n", "13:28:25 802 | 35332 | 1.00e-03 | 0.0006 / 0.0007 | 0.0006 / 0.0007 \n", "`Trainer.fit` stopped: `max_epochs=803` reached.\n" ] } ], "source": [ "! hotpp train -i dipole.yaml -o dipole --load_checkpoint dipole.ckpt -ll INFO INFO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where \n", "- `-i dipole.yaml` means we use `dipole.yaml` as input parameters (default is `input.yaml`)\n", "- `-o dipole` means we use `dipole` as the output folder (default is `outDir`)\n", "- `-ll INFO INFO` means the log level for stream and file outputs are `INFO` and `INFO` (default are `DEBUG` and `INFO`)\n", "and similarly, for polarizability:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13:29:36 \n", "\n", " ) ( ( \n", " ( /( ) )\\ ) )\\ ) \n", " )\\()) ( /((()/((()/( \n", "((_)\\ ( )\\())/(_))/(_)) \n", " _((_) )\\ (_))/(_)) (_)) \n", "| || | ((_)| |_ | _ \\| _ \\ \n", "| __ |/ _ \\| _|| _/| _/ \n", "|_||_|\\___/ \\__||_| |_| \n", "HotPP (v.1.0.0 RELEASE)\n", "\n", "cp: cannot stat 'input.yaml': No such file or directory\n", "13:29:39 Using seed 0\n", "13:29:39 Preparing data...\n", "13:29:42 n_neighbor : 25.826607142857142\n", "13:29:42 all_elements : [1, 8]\n", "13:29:42 ground_energy : [0.0]\n", "13:29:42 std : 1.0\n", "13:29:42 mean : 0.7\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..\n", "13:29:42 Load checkpoints from polar.ckpt\n", "You are using a CUDA device ('NVIDIA RTX 5880 Ada Generation') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n", "Missing logger folder: /home/gegejun/src/hotpp/examples/water/polar/lightning_logs\n", "Restoring states from the checkpoint path at polar.ckpt\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n", "\n", " | Name | Type | Params\n", "--------------------------------------------\n", "0 | model | MultiAtomicModule | 389 K \n", "--------------------------------------------\n", "389 K Trainable params\n", "0 Non-trainable params\n", "389 K Total params\n", "1.556 Total estimated model params size (MB)\n", "Restored all states from the checkpoint at polar.ckpt\n", "/home/gegejun/anaconda3/envs/gpu/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py:151: UserWarning: You're resuming from a checkpoint that ended before the epoch ended. This can cause unreliable results if further training is done. Consider using an end-of-epoch checkpoint\n", " rank_zero_warn(\n", "/home/gegejun/anaconda3/envs/gpu/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:359: UserWarning: `ModelCheckpoint(monitor='val_loss')` could not find the monitored key in the returned metrics: ['train_loss', 'train_polarizability', 'epoch', 'step']. HINT: Did you call `log('val_loss', value)` in the `LightningModule`?\n", " warning_cache.warn(m)\n", "13:30:26 epoch | step | lr | total | polarizability \n", "13:30:26 840 | 58845 | 1.00e-03 | 0.0009 / 0.0009 | 0.0009 / 0.0009 \n", "13:31:05 841 | 58889 | 1.00e-03 | 0.0009 / 0.0011 | 0.0009 / 0.0011 \n", "13:31:48 842 | 58933 | 1.00e-03 | 0.0009 / 0.0010 | 0.0009 / 0.0010 \n", "13:32:27 843 | 58977 | 1.00e-03 | 0.0008 / 0.0009 | 0.0008 / 0.0009 \n", "13:33:16 844 | 59021 | 1.00e-03 | 0.0009 / 0.0010 | 0.0009 / 0.0010 \n", "13:34:10 845 | 59065 | 1.00e-03 | 0.0009 / 0.0010 | 0.0009 / 0.0010 \n", "13:34:49 846 | 59109 | 1.00e-03 | 0.0008 / 0.0010 | 0.0008 / 0.0010 \n", "13:35:27 847 | 59153 | 1.00e-03 | 0.0009 / 0.0011 | 0.0009 / 0.0011 \n", "13:36:16 848 | 59197 | 1.00e-03 | 0.0009 / 0.0009 | 0.0009 / 0.0009 \n", "`Trainer.fit` stopped: `max_epochs=849` reached.\n" ] } ], "source": [ "! hotpp train -i polar.yaml -o polar --load_checkpoint polar.ckpt -ll INFO INFO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Eval\n", "After training done, we can evaluate the models:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/gegejun/src/hotpp/examples/water/eval\n", "100%|███████████████████████████████████████████| 19/19 [00:02<00:00, 6.87it/s]\n", "100%|███████████████████████████████████████████| 19/19 [00:09<00:00, 2.00it/s]\n" ] } ], "source": [ "%mkdir eval\n", "%cd eval\n", "! hotpp eval -m ../dipole/best.pt -d ../data/test.xyz -p dipole --device cuda -b 16\n", "! hotpp eval -m ../polar/best.pt -d ../data/test.xyz -p polarizability --device cuda -b 16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you can analyze them. To plot them to compare with DFT:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "polarizability: 0.0908 0.0698\n", " dipole : 0.0693 0.0515\n" ] } ], "source": [ "! hotpp plot -p polarizability dipole" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This command means that we plot `polarizability` and `dipole` calculated by `HotPP` and the target values. And you can see the results: \n", "\n", "|`polarizability`|`dipole`|\n", "|:-:|:-:|\n", "|\"polarizability\"|\"dipole\"|\n", "\n", "\n", "If you need plot `peratom polarizability` instead of `polarizability`, just use:\n", "```bash\n", "$ hotpp plot -p per_polarizability\n", "```\n", "\n", "## Molecule Dynamics\n", "Then we introduce how to calculate infrared and raman spectrum with lammps. \n", "First, we freeze the model so you can use it without `hotpp` being installed." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/gegejun/src/hotpp/examples/water\n", "mkdir: cannot create directory ‘lmps’: File exists\n", "/home/gegejun/src/hotpp/examples/water/lmps\n" ] } ], "source": [ "%cd {WORKDIR}\n", "%mkdir lmps\n", "%cd lmps\n", "! hotpp freeze ../dipole/best.pt -o dipole.pt\n", "! hotpp freeze ../polar/best.pt -o polar.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get `ase-dipole.pt`, `ase-polar.pt`, `lammps-dipole.pt`, and `lammps-polar.pt`. As its name, `lammps-dipole.pt` and `lammps-polar.pt` are we need. Besides, a machine learning potential is required to run molecular dynamics, and it can be trained as shown in the [carbon] example. Here, we skip this process and directly use the pre-trained force field.\n", "In summary, we need:\n", "```bash\n", "lmps\n", "|- lammps-dipole.pt # model to calculate dipole\n", "|- lammps-polar.pt # model to calculate polarizability\n", "|- lammps-infer.pt # model to calculate energy, forces, and virials\n", "|- restart.xyz # initial structure\n", "|- in.lammps # lammps input file\n", "|- lmp_hotpp # lammps binary compiled with hotpp\n", "```\n", "And the `lmp_hotpp` can be got as shown in the [document](https://hotpp.readthedocs.io/en/latest/install.html). \n", "\n", "To quickly show how we calculate infrared and raman spectrum, in this example we just run a 20 ps NVE with only 96 H2O. When used in practice, the simulation time and structure size should be extended. \n", "Now we can run lammps by:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LAMMPS (2 Aug 2023 - Update 1)\n", "OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/lammps-2Aug2023/src/comm.cpp:98)\n", " using 1 OpenMP thread(s) per MPI task\n", "Reading data file ...\n", " orthogonal box = (0 0 0) to (13.348493 15.413518 14.532002)\n", " 1 by 1 by 1 MPI processor grid\n", " reading atoms ...\n", " 288 atoms\n", " reading velocities ...\n", " 288 velocities\n", " read_data CPU = 0.003 seconds\n", "The simulations are performed on the GPU\n", "ok\n", "Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule\n", "The simulations are performed on the GPU\n", "ok\n", "The simulations are performed on the GPU\n", "ok\n", "Neighbor list info ...\n", " update: every = 10 steps, delay = 0 steps, check = no\n", " max neighbors/atom: 2000, page size: 100000\n", " master list distance cutoff = 6\n", " ghost atom cutoff = 6\n", " binsize = 3, bins = 5 6 5\n", " 3 neighbor lists, perpetual/occasional/extra = 1 2 0\n", " (1) pair miao, perpetual\n", " attributes: full, newton on\n", " pair build: full/bin/atomonly\n", " stencil: full/bin/3d\n", " bin: standard\n", " (2) compute miao/dipole, occasional, copy from (1)\n", " attributes: full, newton on\n", " pair build: copy\n", " stencil: none\n", " bin: none\n", " (3) compute miao/polarizability, occasional, copy from (1)\n", " attributes: full, newton on\n", " pair build: copy\n", " stencil: none\n", " bin: none\n", "Setting up Verlet run ...\n", " Unit style : metal\n", " Current step : 0\n", " Time step : 0.0005\n", "Per MPI rank memory allocation (min/avg/max) = 3.398 | 3.398 | 3.398 Mbytes\n", " Step PotEng KinEng TotEng Temp Press Volume \n", " 0 -25.274113 10.595795 -14.678317 285.61896 5714.4326 2989.9194 \n", " 2000 -24.876068 10.762293 -14.113775 290.10705 5837.1078 2989.9194 \n", " 4000 -25.068634 10.552267 -14.516367 284.44561 10893.037 2989.9194 \n", " 6000 -24.807093 10.840265 -13.966828 292.20884 5121.4215 2989.9194 \n", " 8000 -24.696821 10.775416 -13.921405 290.46078 8179.3193 2989.9194 \n", " 10000 -24.577827 10.886911 -13.690916 293.46624 13758.63 2989.9194 \n", " 12000 -25.001028 11.466796 -13.534232 309.09753 9750.4965 2989.9194 \n", " 14000 -24.072268 11.033265 -13.039002 297.41134 4151.8793 2989.9194 \n", " 16000 -24.983761 11.832023 -13.151737 318.94257 7842.389 2989.9194 \n", " 18000 -25.091131 12.09712 -12.994011 326.08848 12013.717 2989.9194 \n", " 20000 -24.735641 11.603845 -13.131797 312.79181 1327.8405 2989.9194 \n", " 22000 -24.224955 11.955183 -12.269772 322.26244 6106.2622 2989.9194 \n", " 24000 -24.92952 11.920466 -13.009054 321.32661 4742.5286 2989.9194 \n", " 26000 -24.36871 11.703269 -12.665441 315.47187 10279.647 2989.9194 \n", " 28000 -23.61507 11.501316 -12.113755 310.02805 6013.8162 2989.9194 \n", " 30000 -24.571568 12.66074 -11.910828 341.28134 4684.8707 2989.9194 \n", " 32000 -24.174849 12.356578 -11.81827 333.0824 4163.8555 2989.9194 \n", " 34000 -23.360071 12.387513 -10.972559 333.91627 11846.909 2989.9194 \n", " 36000 -24.550678 14.121505 -10.429173 380.65755 8654.3688 2989.9194 \n", " 38000 -23.345896 12.612058 -10.733838 339.96908 8257.0962 2989.9194 \n", " 40000 -22.509384 12.357559 -10.151825 333.10883 2826.3801 2989.9194 \n", "Loop time of 1968.52 on 1 procs for 40000 steps with 288 atoms\n", "\n", "Performance: 0.878 ns/day, 27.341 hours/ns, 20.320 timesteps/s, 5.852 katom-step/s\n", "92.1% CPU use with 1 MPI tasks x 1 OpenMP threads\n", "\n", "MPI task timing breakdown:\n", "Section | min time | avg time | max time |%varavg| %total\n", "---------------------------------------------------------------\n", "Pair | 1273.1 | 1273.1 | 1273.1 | 0.0 | 64.67\n", "Neigh | 3.8581 | 3.8581 | 3.8581 | 0.0 | 0.20\n", "Comm | 0.49211 | 0.49211 | 0.49211 | 0.0 | 0.02\n", "Output | 0.00097652 | 0.00097652 | 0.00097652 | 0.0 | 0.00\n", "Modify | 690.99 | 690.99 | 690.99 | 0.0 | 35.10\n", "Other | | 0.1057 | | | 0.01\n", "\n", "Nlocal: 288 ave 288 max 288 min\n", "Histogram: 1 0 0 0 0 0 0 0 0 0\n", "Nghost: 1640 ave 1640 max 1640 min\n", "Histogram: 1 0 0 0 0 0 0 0 0 0\n", "Neighs: 0 ave 0 max 0 min\n", "Histogram: 1 0 0 0 0 0 0 0 0 0\n", "FullNghs: 25640 ave 25640 max 25640 min\n", "Histogram: 1 0 0 0 0 0 0 0 0 0\n", "\n", "Total # of neighbors = 25640\n", "Ave neighs/atom = 89.027778\n", "Neighbor list builds = 4000\n", "Dangerous builds not checked\n", "Total wall time: 0:32:59\n" ] } ], "source": [ "! ./lmp_hotpp -in in.lammps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have got `dipole.txt` and `polar.txt`, and can calculate IR and Raman by:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "%cp ../../../tools/spectrum.py .\n", "import numpy as np\n", "from spectrum import load_lmps_dipole, load_lmps_polar, calc_acf_dp, calc_acf_beta, calc_ir, calc_raman, fs\n", "import matplotlib.pyplot as plt\n", "\n", "N = 1000\n", "T = 298.15\n", "dt = 1 * fs\n", "w_max = 4000\n", "\n", "dipole = load_lmps_dipole(\"dipole.txt\")\n", "acf_dp = calc_acf_dp(dipole, N)\n", "freq, ir = calc_ir(acf_dp, N, T, dt, w_max)\n", "ir = ir / ir.max()\n", "\n", "fig = plt.figure(figsize=(12, 7))\n", "fig.patch.set_facecolor('white')\n", "ax = plt.gca()\n", "ax.spines['bottom'].set_linewidth(3)\n", "ax.spines['left'].set_linewidth(3)\n", "ax.spines['right'].set_linewidth(3)\n", "ax.spines['top'].set_linewidth(3)\n", "ax.tick_params(labelsize=16)\n", "ax.set_xlabel('Frequency (cm-1)', fontsize=20)\n", "ax.set_ylabel('IR intensity (arb. units)', fontsize=20)\n", "plt.plot(freq, ir, color='blue', linewidth=3)\n", "plt.savefig(\"ir.png\")\n", "plt.close()\n", "\n", "polar = load_lmps_polar(\"polar.txt\")\n", "acf_beta = calc_acf_beta(polar, N)\n", "freq, raman = calc_raman(acf_beta, N, T, dt, w_max)\n", "raman /= raman.max()\n", "\n", "fig = plt.figure(figsize=(12, 7))\n", "fig.patch.set_facecolor('white')\n", "ax = plt.gca()\n", "ax.spines['bottom'].set_linewidth(3)\n", "ax.spines['left'].set_linewidth(3)\n", "ax.spines['right'].set_linewidth(3)\n", "ax.spines['top'].set_linewidth(3)\n", "ax.tick_params(labelsize=16)\n", "ax.set_xlabel('Frequency (cm-1)', fontsize=20)\n", "ax.set_ylabel('anisotropic Raman intensity (arb. units)', fontsize=20)\n", "plt.plot(freq, raman, color='blue', linewidth=3)\n", "plt.savefig(\"raman.png\")\n", "plt.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you can see the results:\n", "|`IR`|`Raman`|\n", "|:-:|:-:|\n", "|\"IR\"|\"Raman\"|\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "gpu", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }