Spaces: alexsoleg/cartnet-demo

Àlex Solé committed on Nov 18, 2024

Commit

744c6a1

1 Parent(s): 3fa8a08

merged from streamlit


Files changed (15)

.gitattributes +1 -0
.gitignore +1 -0
LICENSE +21 -0
README.md +57 -13
cpkt/cartnet_adp.ckpt +3 -0
fig/pipeline.png +3 -0
main.py +127 -0
main_local.py +119 -0
models/cartnet.py +289 -0
models/master.py +14 -0
models/utils.py +129 -0
predict.py +73 -0
process.py +100 -0
requirements.txt +8 -0
utils.py +323 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+*.png filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

@@ -0,0 +1 @@

+.DS_Store

LICENSE ADDED

@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2024 Àlex Solé Gómez
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,13 +1,57 @@
----
-title: Cartnet Demo
-emoji: 📈
-colorFrom: purple
-colorTo: red
-sdk: streamlit
-sdk_version: 1.40.1
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# CartNet Streamlit Web App
+![Pipeline](fig/pipeline.png)
+### CartNet online demo available at: [CartNet Web App](https://cartnet-adp-estimation.streamlit.app)
+CartNet is a graph neural network specifically designed to predict Anisotropic Displacement Parameters (ADPs) in crystal structures. The model has been trained on over 220,000 molecular crystal structures from the Cambridge Structural Database (CSD), making it highly accurate and robust for ADP prediction tasks. CartNet addresses the computational challenges of traditional methods by encoding the full 3D geometry of atomic structures into a Cartesian reference frame, bypassing the need for unit cell encoding. The model incorporates innovative features, including a neighbour equalization technique to enhance interaction detection and a Cholesky-based output layer to ensure valid ADP predictions. Additionally, it introduces a rotational SO(3) data augmentation technique to improve generalization across different crystal structure orientations, making the model highly efficient and accurate in predicting ADPs while significantly reducing computational costs.
+This repository contains a web application based on the official implementation of CartNet, which can be found at [imatge-upc/CartNet](https://github.com/imatge-upc/CartNet).
+⚠️ **Warning**: The online web application can only process systems with up to 300 atoms in the unit cell. For larger systems, please use the local application.
+## Local Application
+### Installation of the local application
+To set up the local application, you need to install the dependencies listed in `requirements.txt`. You can do this by running the following command:
+```bash
+pip install -r requirements.txt
+```
+### Usage
+You can make predictions from the command line using the `predict.py` script.
+The script takes two arguments:
+1. `input_file`: Path to the input CIF file
+2. `output_file`: Path where you want to save the processed CIF file
+Example usage:
+```bash
+python predict.py input.cif output.cif
+```
+Or, if you prefer, you can use the browser app on your local machine without the atom number limitation by running:
+```bash
+streamlit run main_local.py
+```
+## How to cite
+If you use CartNet in your research, please cite our paper:
+```bibtex
+@article{your_paper_citation,
+title={Title of the Paper},
+author={Author1 and Author2 and Author3},
+journal={Journal Name},
+year={2023},
+volume={XX},
+number={YY},
+pages={ZZZ}
+}
+```
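
The README usage above handles one CIF file per call. For a folder of structures, a minimal sketch like the following could build one `predict.py` command per file. The helper names `build_commands` and `run_all`, the `_cartnet` output suffix, and the directory layout are assumptions for illustration, not part of the repository; the commands are meant to be run from the repository root so that relative paths such as the checkpoint resolve.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(input_dir: str, output_dir: str) -> List[List[str]]:
    """Build one `python predict.py <input> <output>` command per CIF file.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    commands = []
    for cif in sorted(in_dir.glob("*.cif")):
        out = out_dir / f"{cif.stem}_cartnet.cif"
        commands.append([sys.executable, "predict.py", str(cif), str(out)])
    return commands


def run_all(input_dir: str, output_dir: str) -> None:
    """Run the commands sequentially; stops at the first failing file."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(input_dir, output_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_all("cifs", "predictions")` would process every `*.cif` file in `cifs/` in alphabetical order; each call starts a fresh interpreter, so the model is reloaded per file.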

cpkt/cartnet_adp.ckpt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0829f2e8a631b380c04e2beb5c75b36767e5635321deea17fc5e7bafee643332
+size 30066357

fig/pipeline.png ADDED

Git LFS Details

SHA256: 2c02f5df0375788cb98c5f80872cb0859b48a86c23927757c531e2cc21de1d96
Pointer size: 132 Bytes
Size of remote file: 1.83 MB

main.py ADDED

@@ -0,0 +1,127 @@

+import streamlit as st
+import os
+from ase.io import read
+from CifFile import ReadCif
+from torch_geometric.data import Data, Batch
+import torch
+from models.master import create_model
+from process import process_data
+from utils import radius_graph_pbc
+import gc
+MEAN_TEMP = torch.tensor(192.1785) #training temp mean
+STD_TEMP = torch.tensor(81.2135) #training temp std
+@torch.no_grad()
+def main():
+    model = create_model()
+    st.title("CartNet ADP Prediction")
+    st.image('fig/pipeline.png')
+    st.markdown("""
+                CartNet is a graph neural network specifically designed for predicting Anisotropic Displacement Parameters (ADPs) in crystal structures. The model has been trained on over 220,000 molecular crystal structures from the Cambridge Structural Database (CSD), making it highly accurate and robust for ADP prediction tasks. CartNet addresses the computational challenges of traditional methods by encoding the full 3D geometry of atomic structures into a Cartesian reference frame, bypassing the need for unit cell encoding. The model incorporates innovative features, including a neighbour equalization technique to enhance interaction detection and a Cholesky-based output layer to ensure valid ADP predictions. Additionally, it introduces a rotational SO(3) data augmentation technique to improve generalization across different crystal structure orientations, making the model highly efficient and accurate in predicting ADPs while significantly reducing computational costs.
+    """)
+    uploaded_file = st.file_uploader("Upload a CIF file", type=["cif"], accept_multiple_files=False)
+    # uploaded_file = "ABABEM.cif"
+    if uploaded_file is not None:
+        try:
+            with open(uploaded_file.name, "wb") as f:
+                f.write(uploaded_file.getbuffer())
+            filename = str(uploaded_file.name)
+            # Read the CIF file using ASE
+            atoms = read(filename, format="cif")
+            cif = ReadCif(filename)
+            cif_data = cif.first_block()
+            if "_diffrn_ambient_temperature" in cif_data.keys():
+                temperature = float(cif_data["_diffrn_ambient_temperature"])
+            else:
+                raise ValueError("Temperature not found in the CIF file. "
+                                 "Please provide a temperature in the _diffrn_ambient_temperature field of the CIF file.")
+            st.success("CIF file successfully read.")
+            data = Data()
+            data.x = torch.tensor(atoms.get_atomic_numbers(), dtype=torch.int32)
+            if len(atoms.positions) > 300:
+                st.markdown("""
+                ⚠️ **Warning**: The structure is too large. Please upload a smaller one or use the [local implementation of CartNet Web App](https://github.com/alexsoleg/cartnet-streamlit/).
+                """)
+                raise ValueError("Please provide a structure with at most 300 atoms in the unit cell.")
+            data.pos = torch.tensor(atoms.positions, dtype=torch.float32)
+            data.temperature_og = torch.tensor([temperature], dtype=torch.float32)
+            data.temperature = (data.temperature_og - MEAN_TEMP) / STD_TEMP
+            data.cell = torch.tensor(atoms.cell.array, dtype=torch.float32).unsqueeze(0)
+            data.pbc = torch.tensor([True, True, True])
+            data.natoms = len(atoms)
+            del atoms
+            gc.collect()
+            batch = Batch.from_data_list([data])
+            edge_index, _, _, edge_attr = radius_graph_pbc(batch, 5.0, 64)
+            del batch
+            gc.collect()
+            data.cart_dist = torch.norm(edge_attr, dim=-1)
+            data.cart_dir = torch.nn.functional.normalize(edge_attr, dim=-1)
+            data.edge_index = edge_index
+            data.non_H_mask = data.x != 1
+            delattr(data, "pbc")
+            delattr(data, "natoms")
+            batch = Batch.from_data_list([data])
+            del data, edge_index, edge_attr
+            gc.collect()
+            st.success("Graph successfully created.")
+            process_data(batch, model)
+            st.success("ADPs successfully predicted.")
+            # Create a download button for the processed CIF file
+            with open("output.cif", "r") as f:
+                cif_contents = f.read()
+            st.download_button(
+                label="Download processed CIF file",
+                data=cif_contents,
+                file_name="output.cif",
+                mime="text/plain"
+            )
+            os.remove("output.cif")
+            os.remove(filename)
+            gc.collect()
+        except Exception as e:
+            st.error(f"An error occurred while processing the CIF file: {e}")
+    st.markdown("""
+    ⚠️ **Warning**: This online web application is designed for structures with up to 300 atoms in the unit cell. For larger structures, please use the [local implementation of CartNet Web App](https://github.com/alexsoleg/cartnet-streamlit/).
+    """)
+    st.markdown("""
+    📌 The official implementation of the paper with all experiments can be found at [CartNet GitHub Repository](https://github.com/imatge-upc/CartNet).
+    """)
+    st.markdown("""
+    ### How to cite
+    If you use CartNet in your research, please cite our paper:
+    ```bibtex
+    @article{your_paper_citation,
+    title={Title of the Paper},
+    author={Author1 and Author2 and Author3},
+    journal={Journal Name},
+    year={2023},
+    volume={XX},
+    number={YY},
+    pages={ZZZ}
+    }
+    ```
+    """)
+if __name__ == "__main__":
+    main()
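
The temperature handling in `main.py` passes the `_diffrn_ambient_temperature` value straight to `float()` and then standardizes it with the training mean and standard deviation. CIF numeric fields may carry a standard uncertainty in parentheses, such as `100(2)`, which `float()` rejects. The sketch below shows one way to strip that suffix before normalizing. `parse_cif_number` and `normalize_temperature` are hypothetical helpers, not part of the repository; only the two constants are taken from the code above.

```python
import re

# Constants copied from main.py (training-set statistics, in kelvin).
MEAN_TEMP = 192.1785
STD_TEMP = 81.2135

# A number, optionally followed by a standard uncertainty such as "(2)".
_CIF_NUMBER = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?\s*$"
)


def parse_cif_number(raw: str) -> float:
    """Parse a CIF numeric value, dropping a trailing uncertainty like '(2)'.

    Hypothetical helper for illustration only.
    """
    match = _CIF_NUMBER.match(raw)
    if match is None:
        raise ValueError(f"Not a CIF number: {raw!r}")
    return float(match.group(1))


def normalize_temperature(kelvin: float) -> float:
    """Standardize a temperature the same way main.py does."""
    return (kelvin - MEAN_TEMP) / STD_TEMP
```

With these helpers, a field value of `"100(2)"` parses to `100.0`, while placeholder values such as `"?"` raise a `ValueError` that the app could report to the user.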

main_local.py ADDED

@@ -0,0 +1,119 @@

+import streamlit as st
+import os
+from ase.io import read
+from CifFile import ReadCif
+from torch_geometric.data import Data, Batch
+import torch
+from models.master import create_model
+from process import process_data
+from utils import radius_graph_pbc
+import gc
+MEAN_TEMP = torch.tensor(192.1785) #training temp mean
+STD_TEMP = torch.tensor(81.2135) #training temp std
+@torch.no_grad()
+def main():
+    model = create_model()
+    st.title("CartNet ADP Prediction")
+    st.image('fig/pipeline.png')
+    st.markdown("""
+                CartNet is a graph neural network specifically designed for predicting Anisotropic Displacement Parameters (ADPs) in crystal structures. The model has been trained on over 220,000 molecular crystal structures from the Cambridge Structural Database (CSD), making it highly accurate and robust for ADP prediction tasks. CartNet addresses the computational challenges of traditional methods by encoding the full 3D geometry of atomic structures into a Cartesian reference frame, bypassing the need for unit cell encoding. The model incorporates innovative features, including a neighbour equalization technique to enhance interaction detection and a Cholesky-based output layer to ensure valid ADP predictions. Additionally, it introduces a rotational SO(3) data augmentation technique to improve generalization across different crystal structure orientations, making the model highly efficient and accurate in predicting ADPs while significantly reducing computational costs.
+    """)
+    uploaded_file = st.file_uploader("Upload a CIF file", type=["cif"], accept_multiple_files=False)
+    # uploaded_file = "ABABEM.cif"
+    if uploaded_file is not None:
+        try:
+            with open(uploaded_file.name, "wb") as f:
+                f.write(uploaded_file.getbuffer())
+            filename = str(uploaded_file.name)
+            # Read the CIF file using ASE
+            atoms = read(filename, format="cif")
+            cif = ReadCif(filename)
+            cif_data = cif.first_block()
+            if "_diffrn_ambient_temperature" in cif_data.keys():
+                temperature = float(cif_data["_diffrn_ambient_temperature"])
+            else:
+                raise ValueError("Temperature not found in the CIF file. "
+                                 "Please provide a temperature in the _diffrn_ambient_temperature field of the CIF file.")
+            st.success("CIF file successfully read.")
+            data = Data()
+            data.x = torch.tensor(atoms.get_atomic_numbers(), dtype=torch.int32)
+            data.pos = torch.tensor(atoms.positions, dtype=torch.float32)
+            data.temperature_og = torch.tensor([temperature], dtype=torch.float32)
+            data.temperature = (data.temperature_og - MEAN_TEMP) / STD_TEMP
+            data.cell = torch.tensor(atoms.cell.array, dtype=torch.float32).unsqueeze(0)
+            data.pbc = torch.tensor([True, True, True])
+            data.natoms = len(atoms)
+            del atoms
+            gc.collect()
+            batch = Batch.from_data_list([data])
+            edge_index, _, _, edge_attr = radius_graph_pbc(batch, 5.0, 64)
+            del batch
+            gc.collect()
+            data.cart_dist = torch.norm(edge_attr, dim=-1)
+            data.cart_dir = torch.nn.functional.normalize(edge_attr, dim=-1)
+            data.edge_index = edge_index
+            data.non_H_mask = data.x != 1
+            delattr(data, "pbc")
+            delattr(data, "natoms")
+            batch = Batch.from_data_list([data])
+            del data, edge_index, edge_attr
+            gc.collect()
+            st.success("Graph successfully created.")
+            process_data(batch, model)
+            st.success("ADPs successfully predicted.")
+            # Create a download button for the processed CIF file
+            with open("output.cif", "r") as f:
+                cif_contents = f.read()
+            st.download_button(
+                label="Download processed CIF file",
+                data=cif_contents,
+                file_name="output.cif",
+                mime="text/plain"
+            )
+            os.remove("output.cif")
+            os.remove(filename)
+            gc.collect()
+        except Exception as e:
+            st.error(f"An error occurred while processing the CIF file: {e}")
+    st.markdown("""
+    📌 The official implementation of the paper with all experiments can be found at [CartNet GitHub Repository](https://github.com/imatge-upc/CartNet).
+    """)
+    st.markdown("""
+    ### How to cite
+    If you use CartNet in your research, please cite our paper:
+    ```bibtex
+    @article{your_paper_citation,
+    title={Title of the Paper},
+    author={Author1 and Author2 and Author3},
+    journal={Journal Name},
+    year={2023},
+    volume={XX},
+    number={YY},
+    pages={ZZZ}
+    }
+    ```
+    """)
+if __name__ == "__main__":
+    main()
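
Both apps turn the edge vectors returned by `radius_graph_pbc` into a scalar distance (`cart_dist`) and a unit direction (`cart_dir`). The plain-Python sketch below mirrors that step without PyTorch so it can be read in isolation. The function name `edge_features` is an assumption; the `eps` clamp follows the documented behaviour of `torch.nn.functional.normalize`, which divides by `max(norm, eps)`.

```python
import math
from typing import List, Sequence, Tuple


def edge_features(
    edge_vectors: Sequence[Sequence[float]], eps: float = 1e-12
) -> Tuple[List[float], List[List[float]]]:
    """Split Cartesian edge vectors into distances and unit directions.

    Illustrative stand-in for the torch.norm / normalize calls in the apps.
    """
    dists: List[float] = []
    dirs: List[List[float]] = []
    for vec in edge_vectors:
        dist = math.sqrt(sum(c * c for c in vec))
        dists.append(dist)
        # Clamp the divisor so a zero-length vector maps to a zero direction.
        scale = max(dist, eps)
        dirs.append([c / scale for c in vec])
    return dists, dirs
```

For an edge vector `[3.0, 4.0, 0.0]` this gives a distance of `5.0` and a direction of approximately `[0.6, 0.8, 0.0]`.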

models/cartnet.py ADDED

@@ -0,0 +1,289 @@

+# Copyright Universitat Politècnica de Catalunya 2024 https://imatge.upc.edu
+# Distributed under the MIT License.
+# (See accompanying LICENSE file or copy at http://opensource.org/licenses/MIT)
+import torch
+import torch_geometric.nn as pyg_nn
+import torch.nn as nn
+import torch.nn.functional as F
+from torch_scatter import scatter
+from models.utils import ExpNormalSmearing, CosineCutoff
+class CartNet(torch.nn.Module):
+    """
+    CartNet model from Cartesian Encoding Graph Neural Network for Crystal Structures Property Prediction: Application to Thermal Ellipsoid Estimation.
+    Args:
+        dim_in (int): Dimensionality of the input features.
+        dim_rbf (int): Dimensionality of the radial basis function embeddings.
+        num_layers (int): Number of CartNet layers in the model.
+        radius (float, optional): Radius cutoff for neighbor interactions. Default is 5.0.
+        invariant (bool, optional): If `True`, enforces rotational invariance in the encoder. Default is `False`.
+        temperature (bool, optional): If `True`, includes temperature information in the encoder. Default is `True`.
+        use_envelope (bool, optional): If `True`, applies an envelope function to the interactions. Default is `True`.
+        cholesky (bool, optional): If `True`, uses a Cholesky head for the output. If `False`, uses a scalar head. Default is `True`.
+    Methods:
+        forward(batch):
+            Performs a forward pass of the model.
+            Args:
+                batch: A batch of input data.
+            Returns:
+                pred: The model's predictions.
+    """
+    def __init__(self,
+        dim_in: int,
+        dim_rbf: int,
+        num_layers: int,
+        radius: float = 5.0,
+        invariant: bool = False,
+        temperature: bool = True,
+        use_envelope: bool = True,
+        cholesky: bool = True):
+        super().__init__()
+        self.encoder = Encoder(dim_in, dim_rbf=dim_rbf, radius=radius, invariant=invariant, temperature=temperature)
+        self.dim_in = dim_in
+        layers = []
+        for _ in range(num_layers):
+            layers.append(CartNet_layer(
+                dim_in=dim_in,
+                use_envelope=use_envelope,
+            ))
+        self.layers = torch.nn.Sequential(*layers)
+        if cholesky:
+            self.head = Cholesky_head(dim_in)
+        else:
+            self.head = Scalar_head(dim_in)
+    def forward(self, batch):
+        batch = self.encoder(batch)
+        for layer in self.layers:
+            batch = layer(batch)
+        pred = self.head(batch)
+        return pred
+class Encoder(torch.nn.Module):
+    """
+    Encoder module for the CartNet model.
+    This module encodes node and edge features for input into the CartNet model, incorporating optional temperature information and rotational invariance.
+    Args:
+        dim_in (int): Dimension of the input features after embedding.
+        dim_rbf (int): Dimension of the radial basis function used for edge attributes.
+        radius (float, optional): Cutoff radius for neighbor interactions. Defaults to 5.0.
+        invariant (bool, optional): If True, the encoder enforces rotational invariance by excluding directional information from edge attributes. Defaults to False.
+        temperature (bool, optional): If True, includes temperature data in the node embeddings. Defaults to True.
+    Attributes:
+        dim_in (int): Dimension of the input features.
+        invariant (bool): Indicates if rotational invariance is enforced.
+        temperature (bool): Indicates if temperature information is included.
+        embedding (nn.Embedding): Embedding layer mapping atomic numbers to feature vectors.
+        temperature_proj_atom (pyg_nn.Linear): Linear layer projecting temperature to embedding dimensions (used if temperature is True).
+        bias (nn.Parameter): Bias term added to embeddings (used if temperature is False).
+        activation (nn.Module): Activation function (SiLU).
+        encoder_atom (nn.Sequential): Sequential network encoding node features.
+        encoder_edge (nn.Sequential): Sequential network encoding edge features.
+        rbf (ExpNormalSmearing): Radial basis function for encoding distances.
+    """
+    def __init__(
+        self,
+        dim_in: int,
+        dim_rbf: int,
+        radius: float = 5.0,
+        invariant: bool = False,
+        temperature: bool = True,
+    ):
+        super(Encoder, self).__init__()
+        self.dim_in = dim_in
+        self.invariant = invariant
+        self.temperature = temperature
+        self.embedding = nn.Embedding(119, self.dim_in*2)
+        if self.temperature:
+            self.temperature_proj_atom = pyg_nn.Linear(1, self.dim_in*2, bias=True)
+        else:
+            self.bias = nn.Parameter(torch.zeros(self.dim_in*2))
+        self.activation = nn.SiLU(inplace=True)
+        self.encoder_atom = nn.Sequential(self.activation,
+                                        pyg_nn.Linear(self.dim_in*2, self.dim_in),
+                                        self.activation)
+        if self.invariant:
+            dim_edge = dim_rbf
+        else:
+            dim_edge = dim_rbf + 3
+        self.encoder_edge = nn.Sequential(pyg_nn.Linear(dim_edge, self.dim_in*2),
+                                        self.activation,
+                                        pyg_nn.Linear(self.dim_in*2, self.dim_in),
+                                        self.activation)
+        self.rbf = ExpNormalSmearing(0.0,radius,dim_rbf,False)
+        torch.nn.init.xavier_uniform_(self.embedding.weight.data)
+    def forward(self, batch):
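+        x = self.embedding(batch.x)
+        if self.temperature:
+            # Broadcast the per-graph temperature projection to every atom of that graph.
+            x = x + self.temperature_proj_atom(batch.temperature.unsqueeze(-1))[batch.batch]
+        else:
+            x = x + self.bias
+        batch.x = self.encoder_atom(x)
+        rbf = self.rbf(batch.cart_dist)
+        # The invariant encoder drops the direction vectors, matching dim_edge = dim_rbf.
+        edge_feats = rbf if self.invariant else torch.cat([rbf, batch.cart_dir], dim=-1)
+        batch.edge_attr = self.encoder_edge(edge_feats)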
+        return batch
+class CartNet_layer(pyg_nn.conv.MessagePassing):
+    """
+    The message-passing layer used in the CartNet architecture.
+    Parameters:
+        dim_in (int): Dimension of the input node features.
+        use_envelope (bool, optional): If True, applies an envelope function to the distances. Defaults to True.
+    Attributes:
+        dim_in (int): Dimension of the input node features.
+        activation (nn.Module): Activation function (SiLU) used in the layer.
+        MLP_aggr (nn.Sequential): MLP used for aggregating messages.
+        MLP_gate (nn.Sequential): MLP used for computing gating coefficients.
+        norm (nn.BatchNorm1d): Batch normalization applied to the gating coefficients.
+        norm2 (nn.BatchNorm1d): Batch normalization applied to the aggregated messages.
+        use_envelope (bool): Indicates if the envelope function is used.
+        envelope (CosineCutoff): Envelope function applied to the distances.
+    """
+    def __init__(self,
+        dim_in: int,
+        use_envelope: bool = True
+    ):
+        super().__init__()
+        self.dim_in = dim_in
+        self.activation = nn.SiLU(inplace=True)
+        self.MLP_aggr = nn.Sequential(
+            pyg_nn.Linear(dim_in*3, dim_in, bias=True),
+            self.activation,
+            pyg_nn.Linear(dim_in, dim_in, bias=True),
+        )
+        self.MLP_gate = nn.Sequential(
+            pyg_nn.Linear(dim_in*3, dim_in, bias=True),
+            self.activation,
+            pyg_nn.Linear(dim_in, dim_in, bias=True),
+        )
+        self.norm = nn.BatchNorm1d(dim_in)
+        self.norm2 = nn.BatchNorm1d(dim_in)
+        self.use_envelope = use_envelope
+        self.envelope = CosineCutoff(0, 5.0)
+    def forward(self, batch):
+        x, e, edge_index, dist = batch.x, batch.edge_attr, batch.edge_index, batch.cart_dist
+        """
+        x               : [n_nodes, dim_in]
+        e               : [n_edges, dim_in]
+        edge_index      : [2, n_edges]
+        dist            : [n_edges]
+        batch           : [n_nodes]
+        """
+        x_in = x
+        e_in = e
+        x, e = self.propagate(edge_index,
+                                Xx=x, Ee=e,
+                                He=dist,
+                            )
+        batch.x = self.activation(x) + x_in
+        batch.edge_attr = e_in + e
+        return batch
+    def message(self, Xx_i, Ee, Xx_j, He):
+        """
+        x_i           : [n_edges, dim_in]
+        x_j           : [n_edges, dim_in]
+        e             : [n_edges, dim_in]
+        """
+        e_ij = self.MLP_gate(torch.cat([Xx_i, Xx_j, Ee], dim=-1))
+        e_ij = F.sigmoid(self.norm(e_ij))
+        if self.use_envelope:
+            sigma_ij = self.envelope(He).unsqueeze(-1)*e_ij
+        else:
+            sigma_ij = e_ij
+        self.e = sigma_ij
+        return sigma_ij
+    def aggregate(self, sigma_ij, index, Xx_i, Xx_j, Ee, Xx):
+        """
+        sigma_ij        : [n_edges, dim_in]  ; is the output from message() function
+        index           : [n_edges]
+        x_j           : [n_edges, dim_in]
+        """
+        dim_size = Xx.shape[0]
+        sender = self.MLP_aggr(torch.cat([Xx_i, Xx_j, Ee], dim=-1))
+        out = scatter(sigma_ij*sender, index, 0, None, dim_size,
+                                   reduce='sum')
+        return out
+    def update(self, aggr_out):
+        """
+        aggr_out        : [n_nodes, dim_in] ; is the output from aggregate() function after the aggregation
+        x             : [n_nodes, dim_in]
+        """
+        x = self.norm2(aggr_out)
+        e_out = self.e
+        del self.e
+        return x, e_out
+class Cholesky_head(torch.nn.Module):
+    """
+    The Cholesky head used in the CartNet model.
+    It enforces the positive definiteness of the output covariance matrix.
+    Args:
+        dim_in (int): The input dimension of the features.
+    """
+    def __init__(self,
+        dim_in: int
+    ):
+        super(Cholesky_head, self).__init__()
+        self.MLP = nn.Sequential(pyg_nn.Linear(dim_in, dim_in//2),
+                                nn.SiLU(inplace=True),
+                                pyg_nn.Linear(dim_in//2, 6))
+    def forward(self, batch):
+        pred = self.MLP(batch.x[batch.non_H_mask])
+        diag_elements = F.softplus(pred[:, :3])
+        i,j = torch.tensor([0,1,2,0,0,1]), torch.tensor([0,1,2,1,2,2])
+        L_matrix = torch.zeros(pred.size(0),3,3, device=pred.device, dtype=pred.dtype)
+        L_matrix[:,i[:3], i[:3]] = diag_elements
+        L_matrix[:,i[3:], j[3:]] = pred[:,3:]
+        U = torch.bmm(L_matrix.transpose(1, 2), L_matrix)
+        return U
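
The `Cholesky_head` above maps six unconstrained outputs per atom to a 3x3 matrix `U = LᵀL`, where `L` is upper triangular with a softplus-activated diagonal. Because the diagonal is strictly positive, `U` is symmetric positive definite. The plain-Python sketch below reproduces that construction for a single atom so the property can be checked by hand. The function names are assumptions, and the positive-definiteness check via leading principal minors (Sylvester's criterion) is added for illustration only.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def adp_from_raw(raw: Sequence[float]) -> Matrix:
    """Map six raw outputs to U = L^T L, as in Cholesky_head.forward."""
    d0, d1, d2 = (softplus(v) for v in raw[:3])
    a, b, c = raw[3:6]
    # Off-diagonal entries fill positions (0,1), (0,2), (1,2), as in the head.
    L = [[d0, a, b], [0.0, d1, c], [0.0, 0.0, d2]]
    return [
        [sum(L[k][i] * L[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def is_positive_definite(U: Matrix) -> bool:
    """Sylvester's criterion: all leading principal minors are positive."""
    m1 = U[0][0]
    m2 = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    m3 = (
        U[0][0] * (U[1][1] * U[2][2] - U[1][2] * U[2][1])
        - U[0][1] * (U[1][0] * U[2][2] - U[1][2] * U[2][0])
        + U[0][2] * (U[1][0] * U[2][1] - U[1][1] * U[2][0])
    )
    return m1 > 0 and m2 > 0 and m3 > 0
```

Any finite input, for example `adp_from_raw([-3.0, 0.0, 2.0, 1.5, -0.7, 4.0])`, yields a symmetric matrix that passes `is_positive_definite` in exact arithmetic, which is why the head needs no further constraint on its outputs; in floating point, extreme inputs could still lose precision.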

models/master.py ADDED

@@ -0,0 +1,14 @@

+import torch
+import streamlit as st
+from models.cartnet import CartNet
+# We cache the loading function to make it very fast on reload.
+@st.cache_resource
+def create_model():
+    model = CartNet(dim_in=256, dim_rbf=64, num_layers=4, radius=5.0, invariant=False, temperature=True, use_envelope=True, cholesky=True)
+    ckpt_path = "cpkt/cartnet_adp.ckpt"
+    load = torch.load(ckpt_path, map_location=torch.device('cpu'))["model_state"]
+    model.load_state_dict(load)
+    model.eval()
+    return model

models/utils.py ADDED

@@ -0,0 +1,129 @@

+import torch
+import math
+import numpy as np
+from torch import nn, Tensor
+import torch.nn.functional as F
+from typing import Optional
+# Implementation from TensorNet
+# https://github.com/torchmd/torchmd-net
+class ExpNormalSmearing(nn.Module):
+    def __init__(
+        self,
+        cutoff_lower=0.0,
+        cutoff_upper=5.0,
+        num_rbf=50,
+        trainable=True,
+        dtype=torch.float32,
+    ):
+        super(ExpNormalSmearing, self).__init__()
+        self.cutoff_lower = cutoff_lower
+        self.cutoff_upper = cutoff_upper
+        self.num_rbf = num_rbf
+        self.trainable = trainable
+        self.dtype = dtype
+        self.cutoff_fn = CosineCutoff(0, cutoff_upper)
+        self.alpha = 5.0 / (cutoff_upper - cutoff_lower)
+        means, betas = self._initial_params()
+        if trainable:
+            self.register_parameter("means", nn.Parameter(means))
+            self.register_parameter("betas", nn.Parameter(betas))
+        else:
+            self.register_buffer("means", means)
+            self.register_buffer("betas", betas)
+    def _initial_params(self):
+        # initialize means and betas according to the default values in PhysNet
+        # https://pubs.acs.org/doi/10.1021/acs.jctc.9b00181
+        start_value = torch.exp(
+            torch.scalar_tensor(
+                -self.cutoff_upper + self.cutoff_lower, dtype=self.dtype
+            )
+        )
+        means = torch.linspace(start_value, 1, self.num_rbf, dtype=self.dtype)
+        betas = torch.tensor(
+            [(2 / self.num_rbf * (1 - start_value)) ** -2] * self.num_rbf,
+            dtype=self.dtype,
+        )
+        return means, betas
+    def reset_parameters(self):
+        means, betas = self._initial_params()
+        self.means.data.copy_(means)
+        self.betas.data.copy_(betas)
+    def forward(self, dist):
+        dist = dist.unsqueeze(-1)
+        return self.cutoff_fn(dist) * torch.exp(
+            -self.betas
+            * (torch.exp(self.alpha * (-dist + self.cutoff_lower)) - self.means) ** 2
+        )
+class CosineCutoff(nn.Module):
+    def __init__(self, cutoff_lower=0.0, cutoff_upper=5.0):
+        super(CosineCutoff, self).__init__()
+        self.cutoff_lower = cutoff_lower
+        self.cutoff_upper = cutoff_upper
+    def forward(self, distances: Tensor) -> Tensor:
+        if self.cutoff_lower > 0:
+            cutoffs = 0.5 * (
+                torch.cos(
+                    math.pi
+                    * (
+                        2
+                        * (distances - self.cutoff_lower)
+                        / (self.cutoff_upper - self.cutoff_lower)
+                        + 1.0
+                    )
+                )
+                + 1.0
+            )
+            # remove contributions outside the [cutoff_lower, cutoff_upper] range
+            cutoffs = cutoffs * (distances < self.cutoff_upper)
+            cutoffs = cutoffs * (distances > self.cutoff_lower)
+            return cutoffs
+        else:
+            cutoffs = 0.5 * (torch.cos(distances * math.pi / self.cutoff_upper) + 1.0)
+            # remove contributions beyond the cutoff radius
+            cutoffs = cutoffs * (distances < self.cutoff_upper)
+            return cutoffs
+# Implementation from ComFormer
+# https://github.com/divelab/AIRS/tree/main/OpenMat/ComFormer
+class RBFExpansion(nn.Module):
+    """Expand interatomic distances with radial basis functions."""
+    def __init__(
+        self,
+        vmin: float = 0,
+        vmax: float = 8,
+        bins: int = 40,
+        lengthscale: Optional[float] = None,
+    ):
+        """Register torch parameters for RBF expansion."""
+        super().__init__()
+        self.vmin = vmin
+        self.vmax = vmax
+        self.bins = bins
+        self.register_buffer(
+            "centers", torch.linspace(self.vmin, self.vmax, self.bins)
+        )
+        if lengthscale is None:
+            # SchNet-style
+            # set lengthscales relative to granularity of RBF expansion
+            self.lengthscale = np.diff(self.centers).mean()
+            self.gamma = 1 / self.lengthscale
+        else:
+            self.lengthscale = lengthscale
+            self.gamma = 1 / (lengthscale ** 2)
+    def forward(self, distance: torch.Tensor) -> torch.Tensor:
+        """Apply RBF expansion to interatomic distance tensor."""
+        return torch.exp(
+            -self.gamma * (distance.unsqueeze(1) - self.centers) ** 2
+        )
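
The `ExpNormalSmearing` and `CosineCutoff` modules above can be hard to read in tensor form. The sketch below evaluates the same formulas for a single distance in plain Python, using the zero-lower-bound cutoff branch that `ExpNormalSmearing` always selects. The function names are assumptions; the default of 64 basis functions matches the `dim_rbf=64` passed in `models/master.py`.

```python
import math
from typing import List


def cosine_cutoff(dist: float, cutoff_upper: float = 5.0) -> float:
    """Smooth envelope: 1 at dist = 0, falling to 0 at cutoff_upper."""
    if dist >= cutoff_upper:
        return 0.0
    return 0.5 * (math.cos(dist * math.pi / cutoff_upper) + 1.0)


def exp_normal_rbf(
    dist: float,
    cutoff_lower: float = 0.0,
    cutoff_upper: float = 5.0,
    num_rbf: int = 64,
) -> List[float]:
    """Exponential-normal radial basis expansion of one distance."""
    alpha = 5.0 / (cutoff_upper - cutoff_lower)
    start = math.exp(-(cutoff_upper - cutoff_lower))
    # Means are evenly spaced between exp(-(upper - lower)) and 1.
    step = (1.0 - start) / (num_rbf - 1)
    means = [start + k * step for k in range(num_rbf)]
    beta = (2.0 / num_rbf * (1.0 - start)) ** -2
    decay = math.exp(alpha * (cutoff_lower - dist))
    envelope = cosine_cutoff(dist, cutoff_upper)
    return [envelope * math.exp(-beta * (decay - m) ** 2) for m in means]
```

At zero distance the last basis function (mean 1) is fully active, and every basis function vanishes at or beyond the cutoff because the envelope is zero there.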

predict.py ADDED

@@ -0,0 +1,73 @@

+import argparse
+import os
+from ase.io import read
+from CifFile import ReadCif
+from torch_geometric.data import Data, Batch
+import torch
+from models.master import create_model
+from process import process_data
+from utils import radius_graph_pbc
+import gc
+MEAN_TEMP = torch.tensor(192.1785)  # mean of the training temperatures
+STD_TEMP = torch.tensor(81.2135)  # standard deviation of the training temperatures
+@torch.no_grad()
+def process_cif(input_file, output_file):
+    model = create_model()
+    try:
+        # Read the CIF file using ASE
+        atoms = read(input_file, format="cif")
+        cif = ReadCif(input_file)
+        cif_data = cif.first_block()
+        if "_diffrn_ambient_temperature" in cif_data.keys():
+            temperature = float(cif_data["_diffrn_ambient_temperature"])
+        else:
+            raise ValueError(
+                "Temperature not found in the CIF file. "
+                "Please provide a temperature in the _diffrn_ambient_temperature field of the CIF file."
+            )
+        data = Data()
+        data.x = torch.tensor(atoms.get_atomic_numbers(), dtype=torch.int32)
+        if len(atoms.positions) > 300:
+            raise ValueError("This implementation is not optimized for large systems. For large systems, please use the local version.")
+        data.pos = torch.tensor(atoms.positions, dtype=torch.float32)
+        data.temperature_og = torch.tensor([temperature], dtype=torch.float32)
+        data.temperature = (data.temperature_og - MEAN_TEMP) / STD_TEMP
+        data.cell = torch.tensor(atoms.cell.array, dtype=torch.float32).unsqueeze(0)
+        data.pbc = torch.tensor([True, True, True])
+        data.natoms = len(atoms)
+        del atoms
+        gc.collect()
+        batch = Batch.from_data_list([data])
+        edge_index, _, _, edge_attr = radius_graph_pbc(batch, 5.0, 64)
+        del batch
+        gc.collect()
+        data.cart_dist = torch.norm(edge_attr, dim=-1)
+        data.cart_dir = torch.nn.functional.normalize(edge_attr, dim=-1)
+        data.edge_index = edge_index
+        data.non_H_mask = data.x != 1
+        delattr(data, "pbc")
+        delattr(data, "natoms")
+        batch = Batch.from_data_list([data])
+        del data, edge_index, edge_attr
+        gc.collect()
+        process_data(batch, model, output_file)
+        gc.collect()
+    except Exception as e:
+        print(f"An error occurred while processing the CIF file: {e}")
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Process a CIF file and output the result.")
+    parser.add_argument("input_file", type=str, help="Path to the input CIF file.")
+    parser.add_argument("output_file", type=str, help="Path to the output CIF file.")
+    args = parser.parse_args()
+    process_cif(args.input_file, args.output_file)
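
`process_cif` passes the raw `_diffrn_ambient_temperature` value straight to `float()`. CIF numeric values may carry a standard uncertainty in parentheses, for example `293(2)`, which `float()` rejects. The sketch below shows one way to strip it before normalizing. It assumes the value arrives as a string; `parse_cif_number` and `normalize_temperature` are hypothetical helpers, not part of the repository.

```python
import re

MEAN_TEMP = 192.1785  # training mean, copied from predict.py
STD_TEMP = 81.2135    # training standard deviation, copied from predict.py

# A number, optionally followed by an uncertainty in parentheses, e.g. "293(2)"
_CIF_NUMBER = re.compile(
    r"\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)(?:\(\d+\))?\s*"
)


def parse_cif_number(raw):
    """Return the numeric part of a CIF value, dropping a trailing '(n)'."""
    match = _CIF_NUMBER.fullmatch(raw)
    if match is None:
        raise ValueError(f"Not a CIF number: {raw!r}")
    return float(match.group(1))


def normalize_temperature(kelvin):
    """Apply the same standardization as predict.py."""
    return (kelvin - MEAN_TEMP) / STD_TEMP


temperature = parse_cif_number("293(2)")
scaled = normalize_temperature(temperature)
```

Placeholder values such as `?` or `.` still raise `ValueError`, which keeps the existing error path in `process_cif` meaningful.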

process.py ADDED Viewed

	@@ -0,0 +1,100 @@

+import torch
+from ase.io import write
+from ase import Atoms
+import gc
+@torch.no_grad()
+def process_data(batch, model, output_file="output.cif"):
+    atoms = batch.x.numpy().astype(int)  # Atomic numbers
+    positions = batch.pos.numpy()  # Atomic positions
+    cell = batch.cell.squeeze(0).numpy()  # Cell parameters
+    temperature = batch.temperature_og.numpy()[0]
+    adps = model(batch)
+    # Convert Ucart to Ucif
+    M = batch.cell.squeeze(0)
+    N = torch.diag(torch.linalg.norm(torch.linalg.inv(M.transpose(-1,-2)).squeeze(0), dim=-1))
+    M = torch.linalg.inv(M)
+    N = torch.linalg.inv(N)
+    adps = M.transpose(-1,-2)@adps@M
+    adps = N.transpose(-1,-2)@adps@N
+    del M, N
+    gc.collect()
+    non_H_mask = batch.non_H_mask.numpy()
+    indices = torch.arange(len(atoms))[non_H_mask].numpy()
+    indices = {indices[i]: i for i in range(len(indices))}
+    # Create ASE Atoms object
+    ase_atoms = Atoms(numbers=atoms, positions=positions, cell=cell, pbc=True)
+    # Convert positions to fractional coordinates
+    fractional_positions = ase_atoms.get_scaled_positions()
+    # Write to CIF file
+    write(output_file, ase_atoms)
+    with open(output_file, 'r') as file:
+        lines = file.readlines()
+    # Find the line where "loop_" appears and remove lines from there to the end
+    for i, line in enumerate(lines):
+        if line.strip().startswith('loop_'):
+            lines = lines[:i]
+            break
+    # Write the truncated lines back to the same output file
+    with open(output_file, 'w') as file:
+        file.writelines(lines)
+    # Manually append positions and ADPs to the CIF file
+    with open(output_file, 'a') as cif_file:
+        # Write temperature
+        cif_file.write(f"\n_diffrn_ambient_temperature    {temperature}\n")
+        # Write atomic positions
+        cif_file.write("\nloop_\n")
+        cif_file.write("_atom_site_label\n")
+        cif_file.write("_atom_site_type_symbol\n")
+        cif_file.write("_atom_site_fract_x\n")
+        cif_file.write("_atom_site_fract_y\n")
+        cif_file.write("_atom_site_fract_z\n")
+        cif_file.write("_atom_site_U_iso_or_equiv\n")
+        cif_file.write("_atom_site_thermal_displace_type\n")
+        element_count = {}
+        for i, (atom_number, frac_pos) in enumerate(zip(atoms, fractional_positions)):
+            element = ase_atoms[i].symbol
+            assert atom_number == ase_atoms[i].number
+            if element not in element_count:
+                element_count[element] = 0
+            element_count[element] += 1
+            label = f"{element}{element_count[element]}"
+            # U_iso is the mean of the diagonal (trace / 3); hydrogens get a fixed 0.01
+            u_iso = torch.diagonal(adps[indices[i]]).mean() if element != 'H' else 0.01
+            adp_type = "Uani" if element != 'H' else "Uiso"
+            cif_file.write(f"{label} {element} {frac_pos[0]} {frac_pos[1]} {frac_pos[2]} {u_iso} {adp_type}\n")
+        # Write ADPs
+        cif_file.write("\nloop_\n")
+        cif_file.write("_atom_site_aniso_label\n")
+        cif_file.write("_atom_site_aniso_U_11\n")
+        cif_file.write("_atom_site_aniso_U_22\n")
+        cif_file.write("_atom_site_aniso_U_33\n")
+        cif_file.write("_atom_site_aniso_U_23\n")
+        cif_file.write("_atom_site_aniso_U_13\n")
+        cif_file.write("_atom_site_aniso_U_12\n")
+        element_count = {}
+        for i, atom_number in enumerate(atoms):
+            if atom_number == 1:
+                continue
+            element = ase_atoms[i].symbol
+            if element not in element_count:
+                element_count[element] = 0
+            element_count[element] += 1
+            label = f"{element}{element_count[element]}"
+            cif_file.write(f"{label} {adps[indices[i],0,0]} {adps[indices[i],1,1]} {adps[indices[i],2,2]} {adps[indices[i],1,2]} {adps[indices[i],0,2]} {adps[indices[i],0,1]}\n")
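
The Ucart to Ucif conversion at the top of `process_data` can be written out for a single 3x3 tensor. The sketch below is a dependency-free restatement under two assumptions: the rows of `cell` are the lattice vectors a, b, c (the ASE convention), and the helper names (`transpose`, `matmul`, `inverse`, `ucart_to_ucif`) are hypothetical. It first moves the tensor into the fractional basis, then divides each entry by the lengths of the corresponding reciprocal vectors.

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse(m):
    """Invert a 3x3 matrix with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def ucart_to_ucif(u_cart, cell):
    """Convert one Cartesian ADP tensor to the CIF convention."""
    cell_inv = inverse(cell)  # columns are the reciprocal vectors a*, b*, c*
    recip_len = [
        math.sqrt(sum(cell_inv[r][k] ** 2 for r in range(3))) for k in range(3)
    ]
    # Fractional basis: inv(cell)^T @ U_cart @ inv(cell)
    u_frac = matmul(matmul(transpose(cell_inv), u_cart), cell_inv)
    # Scale entry (i, j) by 1 / (|a_i*| |a_j*|)
    return [[u_frac[i][j] / (recip_len[i] * recip_len[j]) for j in range(3)]
            for i in range(3)]


# Isotropic displacement in a cell whose c axis is tilted towards -a
cell = [[5.0, 0.0, 0.0], [0.0, 6.0, 0.0], [-1.0, 0.0, 7.0]]
u_cart = [[0.02, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.02]]
u_cif = ucart_to_ucif(u_cart, cell)
```

For an isotropic tensor the diagonal of `u_cif` stays at the input value, while the off-diagonal 13 term becomes non-zero because a* and c* are not orthogonal in this cell. For a cubic cell the conversion leaves the tensor unchanged.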

requirements.txt ADDED Viewed

	@@ -0,0 +1,8 @@

+numpy<2
+torch==1.13.1
+torch_geometric==2.5.2
+torch-scatter==2.1.1
+-f https://data.pyg.org/whl/torch-1.13.1+cpu.html
+streamlit==1.40.1
+ase==3.23.0
+PyCifRW==4.4.6

utils.py ADDED Viewed

	@@ -0,0 +1,323 @@

+"""
+Copyright (c) Facebook, Inc. and its affiliates.
+This source code is licensed under the MIT license found in the
+LICENSE file in the root directory of this source tree.
+"""
+import numpy as np
+import torch
+from torch_scatter import segment_coo, segment_csr
+def radius_graph_pbc(
+    data,
+    radius,
+    max_num_neighbors_threshold,
+    enforce_max_neighbors_strictly: bool = False,
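+    pbc=None,
+):
+    # Copy the default so the per-axis updates below never mutate a shared list
+    pbc = [True, True, True] if pbc is None else list(pbc)
+    device = data.pos.device
+    batch_size = len(data.natoms)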
+    if hasattr(data, "pbc"):
+        data.pbc = torch.atleast_2d(data.pbc)
+        for i in range(3):
+            if not torch.any(data.pbc[:, i]).item():
+                pbc[i] = False
+            elif torch.all(data.pbc[:, i]).item():
+                pbc[i] = True
+            else:
+                raise RuntimeError(
+                    "Different structures in the batch have different PBC configurations. This is not currently supported."
+                )
+    # position of the atoms
+    atom_pos = data.pos
+    # Before computing the pairwise distances between atoms, first create a list of atom indices to compare for the entire batch
+    num_atoms_per_image = data.natoms
+    num_atoms_per_image_sqr = (num_atoms_per_image**2).long()
+    # index offset between images
+    index_offset = (
+        torch.cumsum(num_atoms_per_image, dim=0) - num_atoms_per_image
+    )
+    index_offset_expand = torch.repeat_interleave(
+        index_offset, num_atoms_per_image_sqr
+    )
+    num_atoms_per_image_expand = torch.repeat_interleave(
+        num_atoms_per_image, num_atoms_per_image_sqr
+    )
+    # Compute a tensor containing sequences of numbers that range from 0 to num_atoms_per_image_sqr for each image
+    # that is used to compute indices for the pairs of atoms. This is a very convoluted way to implement
+    # the following (but 10x faster since it removes the for loop)
+    # for batch_idx in range(batch_size):
+    #    batch_count = torch.cat([batch_count, torch.arange(num_atoms_per_image_sqr[batch_idx], device=device)], dim=0)
+    num_atom_pairs = torch.sum(num_atoms_per_image_sqr)
+    index_sqr_offset = (
+        torch.cumsum(num_atoms_per_image_sqr, dim=0) - num_atoms_per_image_sqr
+    )
+    index_sqr_offset = torch.repeat_interleave(
+        index_sqr_offset, num_atoms_per_image_sqr
+    )
+    atom_count_sqr = (
+        torch.arange(num_atom_pairs, device=device) - index_sqr_offset
+    )
+    # Compute the indices for the pairs of atoms (using division and mod)
+    # If the systems get too large this approach could run into numerical precision issues
+    index1 = (
+        torch.div(
+            atom_count_sqr, num_atoms_per_image_expand, rounding_mode="floor"
+        )
+    ) + index_offset_expand
+    index2 = (
+        atom_count_sqr % num_atoms_per_image_expand
+    ) + index_offset_expand
+    # Get the positions for each atom
+    pos1 = torch.index_select(atom_pos, 0, index1)
+    pos2 = torch.index_select(atom_pos, 0, index2)
+    # Calculate required number of unit cells in each direction.
+    # Smallest distance between planes separated by a1 is
+    # 1 / ||(a2 x a3) / V||_2, since a2 x a3 is the area of the plane.
+    # Note that the unit cell volume V = a1 * (a2 x a3) and that
+    # (a2 x a3) / V is also the reciprocal primitive vector
+    # (crystallographer's definition).
+    cross_a2a3 = torch.cross(data.cell[:, 1], data.cell[:, 2], dim=-1)
+    cell_vol = torch.sum(data.cell[:, 0] * cross_a2a3, dim=-1, keepdim=True)
+    if pbc[0]:
+        inv_min_dist_a1 = torch.norm(cross_a2a3 / cell_vol, p=2, dim=-1)
+        rep_a1 = torch.ceil(radius * inv_min_dist_a1)
+    else:
+        rep_a1 = data.cell.new_zeros(1)
+    if pbc[1]:
+        cross_a3a1 = torch.cross(data.cell[:, 2], data.cell[:, 0], dim=-1)
+        inv_min_dist_a2 = torch.norm(cross_a3a1 / cell_vol, p=2, dim=-1)
+        rep_a2 = torch.ceil(radius * inv_min_dist_a2)
+    else:
+        rep_a2 = data.cell.new_zeros(1)
+    if pbc[2]:
+        cross_a1a2 = torch.cross(data.cell[:, 0], data.cell[:, 1], dim=-1)
+        inv_min_dist_a3 = torch.norm(cross_a1a2 / cell_vol, p=2, dim=-1)
+        rep_a3 = torch.ceil(radius * inv_min_dist_a3)
+    else:
+        rep_a3 = data.cell.new_zeros(1)
+    # Take the max over all images for uniformity. This is essentially padding.
+    # Note that this can significantly increase the number of computed distances
+    # if the required repetitions are very different between images
+    # (which they usually are). Changing this to sparse (scatter) operations
+    # might be worth the effort if this function becomes a bottleneck.
+    max_rep = [rep_a1.max(), rep_a2.max(), rep_a3.max()]
+    # Tensor of unit cells
+    cells_per_dim = [
+        torch.arange(-rep, rep + 1, device=device, dtype=torch.float)
+        for rep in max_rep
+    ]
+    unit_cell = torch.cartesian_prod(*cells_per_dim)
+    num_cells = len(unit_cell)
+    unit_cell_per_atom = unit_cell.view(1, num_cells, 3).repeat(
+        len(index2), 1, 1
+    )
+    unit_cell = torch.transpose(unit_cell, 0, 1)
+    unit_cell_batch = unit_cell.view(1, 3, num_cells).expand(
+        batch_size, -1, -1
+    )
+    # Compute the x, y, z positional offsets for each cell in each image
+    data_cell = torch.transpose(data.cell, 1, 2)
+    pbc_offsets = torch.bmm(data_cell, unit_cell_batch)
+    pbc_offsets_per_atom = torch.repeat_interleave(
+        pbc_offsets, num_atoms_per_image_sqr, dim=0
+    )
+    # Expand the positions and indices across all num_cells periodic images
+    pos1 = pos1.view(-1, 3, 1).expand(-1, -1, num_cells)
+    pos2 = pos2.view(-1, 3, 1).expand(-1, -1, num_cells)
+    index1 = index1.view(-1, 1).repeat(1, num_cells).view(-1)
+    index2 = index2.view(-1, 1).repeat(1, num_cells).view(-1)
+    # Add the PBC offsets for the second atom
+    pos2 = pos2 + pbc_offsets_per_atom
+    # Compute the squared distance between atoms
+    direction = pos1 - pos2
+    atom_distance_sqr = torch.sum((direction) ** 2, dim=1)
+    direction = direction.permute(0, 2, 1).reshape(-1, 3)
+    atom_distance_sqr = atom_distance_sqr.view(-1)
+    # Remove pairs that are too far apart
+    mask_within_radius = torch.le(atom_distance_sqr, radius * radius)
+    # Remove pairs with the same atoms (distance = 0.0)
+    mask_not_same = torch.gt(atom_distance_sqr, 0.0001)
+    mask = torch.logical_and(mask_within_radius, mask_not_same)
+    index1 = torch.masked_select(index1, mask)
+    index2 = torch.masked_select(index2, mask)
+    unit_cell = torch.masked_select(
+        unit_cell_per_atom.view(-1, 3), mask.view(-1, 1).expand(-1, 3)
+    )
+    unit_cell = unit_cell.view(-1, 3)
+    atom_distance_sqr = torch.masked_select(atom_distance_sqr, mask)
+    direction = torch.masked_select(direction, mask.view(-1, 1).expand(-1, 3)).view(-1, 3)
+    if max_num_neighbors_threshold is not None:
+        mask_num_neighbors, num_neighbors_image = get_max_neighbors_mask(
+            natoms=data.natoms,
+            index=index1,
+            atom_distance=atom_distance_sqr,
+            max_num_neighbors_threshold=max_num_neighbors_threshold,
+            enforce_max_strictly=enforce_max_neighbors_strictly,
+        )
+        if not torch.all(mask_num_neighbors):
+            # Mask out the atoms to ensure each atom has at most max_num_neighbors_threshold neighbors
+            index1 = torch.masked_select(index1, mask_num_neighbors)
+            index2 = torch.masked_select(index2, mask_num_neighbors)
+            atom_distance_sqr = torch.masked_select(atom_distance_sqr, mask_num_neighbors)
+            direction = torch.masked_select(direction, mask_num_neighbors.view(-1, 1).expand(-1, 3)).view(-1, 3)
+            unit_cell = torch.masked_select(
+                unit_cell.view(-1, 3), mask_num_neighbors.view(-1, 1).expand(-1, 3)
+            )
+            unit_cell = unit_cell.view(-1, 3)
+    edge_index = torch.stack((index2, index1))
+    return edge_index, unit_cell, torch.sqrt(atom_distance_sqr), direction
+def get_max_neighbors_mask(
+    natoms,
+    index,
+    atom_distance,
+    max_num_neighbors_threshold,
+    degeneracy_tolerance: float = 0.01,
+    enforce_max_strictly: bool = False,
+):
+    """
+    Give a mask that filters out edges so that each atom has at most
+    `max_num_neighbors_threshold` neighbors.
+    Assumes that `index` is sorted.
+    Enforcing the max strictly can force the arbitrary choice between
+    degenerate edges. This can lead to undesired behaviors; for
+    example, bulk formation energies which are not invariant to
+    unit cell choice.
+    A degeneracy tolerance can help prevent sudden changes in edge
+    existence from small changes in atom position, for example,
+    rounding errors, slab relaxation, temperature, etc.
+    """
+    device = natoms.device
+    num_atoms = natoms.sum()
+    # Get number of neighbors
+    # segment_coo assumes sorted index
+    ones = index.new_ones(1).expand_as(index)
+    num_neighbors = segment_coo(ones, index, dim_size=num_atoms)
+    max_num_neighbors = num_neighbors.max()
+    num_neighbors_thresholded = num_neighbors.clamp(
+        max=max_num_neighbors_threshold
+    )
+    # Get number of (thresholded) neighbors per image
+    image_indptr = torch.zeros(
+        natoms.shape[0] + 1, device=device, dtype=torch.long
+    )
+    image_indptr[1:] = torch.cumsum(natoms, dim=0)
+    num_neighbors_image = segment_csr(num_neighbors_thresholded, image_indptr)
+    # If max_num_neighbors is below the threshold, return early
+    if (
+        max_num_neighbors <= max_num_neighbors_threshold
+        or max_num_neighbors_threshold <= 0
+    ):
+        mask_num_neighbors = torch.tensor(
+            [True], dtype=bool, device=device
+        ).expand_as(index)
+        return mask_num_neighbors, num_neighbors_image
+    # Create a tensor of size [num_atoms, max_num_neighbors] to sort the distances of the neighbors.
+    # Fill with infinity so we can easily remove unused distances later.
+    distance_sort = torch.full(
+        [num_atoms * max_num_neighbors], np.inf, device=device
+    )
+    # Create an index map to map distances from atom_distance to distance_sort
+    # index_sort_map assumes index to be sorted
+    index_neighbor_offset = torch.cumsum(num_neighbors, dim=0) - num_neighbors
+    index_neighbor_offset_expand = torch.repeat_interleave(
+        index_neighbor_offset, num_neighbors
+    )
+    index_sort_map = (
+        index * max_num_neighbors
+        + torch.arange(len(index), device=device)
+        - index_neighbor_offset_expand
+    )
+    distance_sort.index_copy_(0, index_sort_map, atom_distance)
+    distance_sort = distance_sort.view(num_atoms, max_num_neighbors)
+    # Sort neighboring atoms based on distance
+    distance_sort, index_sort = torch.sort(distance_sort, dim=1)
+    # Select the max_num_neighbors_threshold neighbors that are closest
+    if enforce_max_strictly:
+        distance_sort = distance_sort[:, :max_num_neighbors_threshold]
+        index_sort = index_sort[:, :max_num_neighbors_threshold]
+        max_num_included = max_num_neighbors_threshold
+    else:
+        effective_cutoff = (
+            distance_sort[:, max_num_neighbors_threshold]
+            + degeneracy_tolerance
+        )
+        is_included = torch.le(distance_sort.T, effective_cutoff)
+        # Set all undesired edges to infinite length to be removed later
+        distance_sort[~is_included.T] = np.inf
+        # Subselect tensors for efficiency
+        num_included_per_atom = torch.sum(is_included, dim=0)
+        max_num_included = torch.max(num_included_per_atom)
+        distance_sort = distance_sort[:, :max_num_included]
+        index_sort = index_sort[:, :max_num_included]
+        # Recompute the number of neighbors
+        num_neighbors_thresholded = num_neighbors.clamp(
+            max=num_included_per_atom
+        )
+        num_neighbors_image = segment_csr(
+            num_neighbors_thresholded, image_indptr
+        )
+    # Offset index_sort so that it indexes into index
+    index_sort = index_sort + index_neighbor_offset.view(-1, 1).expand(
+        -1, max_num_included
+    )
+    # Remove "unused pairs" with infinite distances
+    mask_finite = torch.isfinite(distance_sort)
+    index_sort = torch.masked_select(index_sort, mask_finite)
+    # At this point index_sort contains the index into index of the
+    # closest max_num_neighbors_threshold neighbors per atom
+    # Create a mask to remove all pairs not in index_sort
+    mask_num_neighbors = torch.zeros(len(index), device=device, dtype=bool)
+    mask_num_neighbors.index_fill_(0, index_sort, True)
+    return mask_num_neighbors, num_neighbors_image
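
The selection rule in `get_max_neighbors_mask` is easier to follow for one atom at a time. The sketch below restates it in plain Python and is not the batched implementation: `cap_neighbors` is a hypothetical name, and the input is a list of squared distances because `radius_graph_pbc` passes `atom_distance_sqr`. In the non-strict mode the cutoff is the value of the first neighbor past the cap plus the tolerance, so ties at the boundary are kept together instead of being broken arbitrarily.

```python
def cap_neighbors(distances_sq, max_neighbors, degeneracy_tolerance=0.01,
                  strict=False):
    """Return the indices of one atom's neighbors that survive the cap."""
    # Neighbor indices ordered from closest to farthest
    order = sorted(range(len(distances_sq)), key=lambda k: distances_sq[k])
    # Early return: nothing to drop, or the cap is disabled
    if len(order) <= max_neighbors or max_neighbors <= 0:
        return sorted(order)
    if strict:
        # Keep exactly the closest max_neighbors entries
        return sorted(order[:max_neighbors])
    # Non-strict: keep everything up to the first excluded value + tolerance
    cutoff = distances_sq[order[max_neighbors]] + degeneracy_tolerance
    return sorted(k for k in order if distances_sq[k] <= cutoff)


squared = [1.0, 4.0, 4.005, 9.0]
kept_relaxed = cap_neighbors(squared, max_neighbors=2)
kept_strict = cap_neighbors(squared, max_neighbors=2, strict=True)
```

Here the relaxed call keeps indices 0, 1 and 2, since the second and third neighbors are nearly degenerate, while the strict call keeps only 0 and 1. Note that the tolerance is applied to whatever quantity is passed in, which in this code path is a squared distance.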