Analysis Module

The analysis module provides low-level utilities for structure graph construction and compositional sequence computation.

StructureGraph

Bases: pymatgen.analysis.graphs.StructureGraph (imported in the source as `PmgStructureGraph`)

Extended StructureGraph with methods for Graph ID computation.

This class extends pymatgen's StructureGraph with additional functionality for computing compositional sequences and handling loops/rings in the structure graph.

Attributes:

Name	Type	Description
`starting_labels`	`list of str`	Labels for each site used as starting points for compositional sequences.
`cc_cs`	`list of dict`	Compositional sequences for each connected component. Each dict contains `site_i` (set of site indices) and `cs_list` (list of compositional sequence strings).

See Also

pymatgen.analysis.graphs.StructureGraph : Base class

Source code in graph_id/analysis/graphs.py

class StructureGraph(PmgStructureGraph):  # type: ignore
    """Extended StructureGraph with methods for Graph ID computation.

    This class extends pymatgen's StructureGraph with additional functionality
    for computing compositional sequences and handling loops/rings in the
    structure graph.

    Attributes
    ----------
    starting_labels : list of str
        Labels for each site used as starting points for compositional sequences.
    cc_cs : list of dict
        Compositional sequences for each connected component.
        Each dict contains ``site_i`` (set of site indices) and ``cs_list``
        (list of compositional sequence strings).

    See Also
    --------
    pymatgen.analysis.graphs.StructureGraph : Base class

    """

    @staticmethod
    def from_pymatgen_structure_graph(sg: PmgStructureGraph):
        """Create a StructureGraph from a pymatgen StructureGraph.

        Parameters
        ----------
        sg : PmgStructureGraph
            A pymatgen StructureGraph object.

        Returns
        -------
        StructureGraph
            A new StructureGraph instance.

        """
        graph_data = sg.as_dict()["graphs"]

        return StructureGraph(sg.structure, graph_data)

    @staticmethod
    def from_local_env_strategy(structure, strategy, weights=False):
        """Create a StructureGraph using a neighbor-finding strategy.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.
        strategy : NearNeighbors
            A neighbor-finding strategy from pymatgen.analysis.local_env,
            such as MinimumDistanceNN, CrystalNN, etc.
        weights : bool, default False
            If True, include bond weights from the strategy.

        Returns
        -------
        StructureGraph
            A new StructureGraph with edges representing bonds.

        Raises
        ------
        ValueError
            If the strategy does not support structures.

        Examples
        --------
        >>> from pymatgen.analysis.local_env import MinimumDistanceNN
        >>> sg = StructureGraph.from_local_env_strategy(
        ...     structure, MinimumDistanceNN()
        ... )

        """
        if not strategy.structures_allowed:
            msg = "Chosen strategy is not designed for use with structures! Please choose another strategy."
            raise ValueError(msg)

        sg = StructureGraph.from_empty_graph(structure, name="bonds")

        for n, neighbors in enumerate(strategy.get_all_nn_info(structure)):
            for neighbor in neighbors:
                # local_env will always try to add two edges
                # for any one bond, one from site u to site v
                # and another from site v to site u: this is
                # harmless, so warn_duplicates=False
                sg.add_edge(
                    from_index=n,
                    from_jimage=(0, 0, 0),
                    to_index=neighbor["site_index"],
                    to_jimage=neighbor["image"],
                    weight=neighbor["weight"] if weights else None,
                    warn_duplicates=False,
                )

        return sg

    @staticmethod
    def with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0):
        """Add edges for a specific site using a distance clustering strategy.

        This method is used by DistanceClusteringGraphID to add bonds
        for a specific site and distance cluster.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.
        strategy : DistanceClusteringNN
            A distance clustering neighbor-finding strategy.
        _sg : StructureGraph
            An existing StructureGraph to modify.
        n : int
            The site index to add edges for.
        weights : bool, default False
            If True, include bond weights from the strategy.
        rank_k : int, default 1
            The distance cluster index (0-based).
        cutoff : float, default 6.0
            Maximum distance cutoff in Angstroms.

        Returns
        -------
        StructureGraph
            The modified StructureGraph with new edges.

        Raises
        ------
        ValueError
            If the strategy does not support structures.

        """
        if not strategy.structures_allowed:
            raise ValueError(  # noqa: TRY003
                "Chosen strategy is not designed for use with structures!",  # noqa: EM101
            )

        nn_info = strategy.get_nn_info(structure, n, rank_k, cutoff)

        for neighbor in nn_info:
            # local_env will always try to add two edges
            # for any one bond, one from site u to site v
            # and another from site v to site u: this is
            # harmless, so warn_duplicates=False
            _sg.add_edge(
                from_index=n,
                from_jimage=(0, 0, 0),
                to_index=neighbor["site_index"],
                to_jimage=neighbor["image"],
                weight=neighbor["weight"] if weights else None,
                warn_duplicates=False,
                edge_properties=neighbor["edge_properties"],
            )

        return _sg

    def set_elemental_labels(self):
        """Set element symbols as starting labels for compositional sequences.

        This is the default labeling scheme where each site is labeled
        by its element symbol (e.g., "Na", "Cl").
        """
        self.starting_labels = [site.species_string for site in self.structure]

    def get_connected_sites_light(self, n, jimage=(0, 0, 0)):
        """Get connected sites with minimal memory footprint.

        A lightweight version of get_connected_sites that returns
        ConnectedSiteLight objects instead of full ConnectedSite objects.

        Parameters
        ----------
        n : int
            The site index to get neighbors for.
        jimage : tuple, default (0, 0, 0)
            The periodic image of the site.

        Returns
        -------
        list of ConnectedSiteLight
            List of connected sites with minimal information.

        """
        connected_sites = set()
        connected_site_images = set()

        out_edges = [(u, v, d, "out") for u, v, d in self.graph.out_edges(n, data=True)]
        in_edges = [(u, v, d, "in") for u, v, d in self.graph.in_edges(n, data=True)]

        for u, v, d, direction in out_edges + in_edges:
            to_jimage = d["to_jimage"]

            if direction == "in":
                u, v = v, u  # noqa: PLW2901
                to_jimage = np.multiply(-1, to_jimage)

            to_jimage = tuple(map(int, np.add(to_jimage, jimage)))

            if (v, to_jimage) not in connected_site_images:
                connected_site = ConnectedSiteLight(
                    site=self.structure[v],
                    jimage=to_jimage,
                    index=v,
                    weight=None,
                    dist=None,
                )

                connected_sites.add(connected_site)
                connected_site_images.add((v, to_jimage))

        return list(connected_sites)

    def set_wyckoffs(self, symmetry_tol: float = 0.01) -> None:
        """Set Wyckoff position labels for each site.

        Labels each site with its element, Wyckoff letter, and space group
        number in the format ``"{element}_{wyckoff}_{spacegroup}"``.

        Parameters
        ----------
        symmetry_tol : float, default 0.01
            Tolerance for symmetry detection in Angstroms.

        Notes
        -----
        If symmetry detection fails, falls back to elemental labels.

        """
        siteless_strc = self.structure.copy()

        for site_i in range(len(self.structure)):
            siteless_strc.replace(site_i, Element("H"))

        sga = SpacegroupAnalyzer(siteless_strc, symprec=symmetry_tol)
        sym_dataset = sga.get_symmetry_dataset()

        if sym_dataset is None:
            self.set_elemental_labels()
            return

        wyckoffs = sym_dataset.wyckoffs
        number = sym_dataset.number

        # One label per site: "{element}_{wyckoff}_{spacegroup}".
        self.starting_labels = [
            f"{self.structure[site_i].species_string}_{w}_{number}" for site_i, w in enumerate(wyckoffs)
        ]

    def set_compositional_sequence_node_attr(
        self,
        hash_cs: bool = False,
        wyckoff: bool = False,
        additional_depth: int = 0,
        diameter_factor: int = 2,
        use_previous_cs: bool = False,
    ) -> None:
        """Compute and set compositional sequences as node attributes.

        This is the core method that computes the local environment
        fingerprint for each site by traversing the graph and counting
        neighbors at each depth.

        Parameters
        ----------
        hash_cs : bool, default False
            If True, hash the compositional sequence incrementally
            during computation for memory efficiency.
        wyckoff : bool, default False
            If True, use Wyckoff labels in the computation.
        additional_depth : int, default 0
            Extra traversal depth to add.
        diameter_factor : int, default 2
            Multiplier for graph diameter to determine traversal depth.
        use_previous_cs : bool, default False
            If True, use previous compositional sequence as starting labels.

        Notes
        -----
        After calling this method:

        - Node attributes are set with key ``"compositional_sequence"``
        - ``self.cc_cs`` contains compositional sequences per component

        """
        node_attributes = {}
        self.cc_cs = []
        get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

        ug = self.graph.to_undirected()

        for cc in nx.connected_components(ug):
            cs_list = []

            d = diameter(ug.subgraph(cc))

            for focused_site_i in cc:
                depth = diameter_factor * d + additional_depth

                cs = CompositionalSequence(
                    focused_site_i=focused_site_i,
                    starting_labels=self.starting_labels,
                    hash_cs=hash_cs,
                    use_previous_cs=use_previous_cs or wyckoff,
                )

                for _ in range(depth):
                    for c_site in cs.get_current_starting_sites():
                        nsites = get_connected_sites_light(c_site[0], c_site[1])
                        cs.count_composition_for_neighbors(nsites)

                    cs.finalize_this_depth()

                this_cs = str(cs)

                node_attributes[focused_site_i] = self.starting_labels[focused_site_i] + "_" + this_cs
                cs_list.append(this_cs)

            self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

        nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")

    def get_loops(self, depth: int, index: int, shortest: bool = True):  # noqa: C901
        """Find all loops/rings starting from a given atom.

        Traverses the graph to find closed loops that start and end at the
        specified atom index.

        Parameters
        ----------
        depth : int
            Maximum loop size to search for.
        index : int
            The starting atom index.
        shortest : bool, default True
            If True, stop searching when all theoretically possible shortest
            loops are found.

        Returns
        -------
        list of list of tuple
            A list of loops, where each loop is a list of ``(index, image)``
            tuples representing the path.

        Notes
        -----
        Loops are found by breadth-first traversal and tracking when paths
        return to their starting point.

        """
        get_connected_sites = functools.lru_cache(maxsize=None)(self.get_connected_sites)

        def find_all_rings(index, ring_list):
            neighbors = get_connected_sites(index, (0, 0, 0))
            for n0, n1 in combinations(neighbors, 2):
                found = False
                for ring in ring_list:
                    term0 = ring[1]
                    term1 = ring[-2]

                    if all(
                        (
                            n0.index == term0[0],
                            n0.jimage == term0[1],
                            n1.index == term1[0],
                            n1.jimage == term1[1],
                        ),
                    ):
                        found = True
                        break

                    if all(
                        (
                            n1.index == term0[0],
                            n1.jimage == term0[1],
                            n0.index == term1[0],
                            n0.jimage == term1[1],
                        ),
                    ):
                        found = True
                        break

                if found is False:
                    return False

            return True

        def get_further_lines_from_lines(lines):
            new_lines = []
            for line in lines:
                ind, image = line[-1]
                neighbors = get_connected_sites(ind, image)

                for n in neighbors:
                    new_line = [*line, (n.index, n.jimage)]

                    # Keep only paths that do not turn back onto a visited site.
                    if len(new_line[:-1]) == len(set(new_line[:-1])):
                        new_lines.append(new_line)

            return new_lines

        lines = []
        lines.append([(index, (0, 0, 0))])

        ring_list = []

        for depth_i in range(depth):
            next_lines = []
            lines = get_further_lines_from_lines(lines)

            for line in lines:
                # First and last entries are the same (the path has closed).
                if line[0] == line[-1]:
                    if depth_i > 1 and list(reversed(line)) not in ring_list:
                        ring_list.append(line)
                else:
                    next_lines.append(line)

            lines = next_lines

            # Stop the search here once the theoretical number of rings has been reached.
            if shortest and find_all_rings(index, ring_list):
                return ring_list

        return list(ring_list)

    def set_loops(self, diameter_factor: int, additional_depth: int) -> None:
        """Set loop-based labels for each site.

        Computes all loops for each site and creates a hashed label
        representing the ring topology around that site.

        Parameters
        ----------
        diameter_factor : int
            Multiplier for graph diameter to determine search depth.
        additional_depth : int
            Extra depth to add to the search.

        Notes
        -----
        Sets ``self.starting_labels`` to hashed loop representations.
        Used when ``loop=True`` in GraphIDGenerator.

        """
        self.starting_labels = []

        undirected_graph = self.graph.to_undirected()

        max_diameter = 0
        for cc in nx.connected_components(undirected_graph):
            d = diameter(undirected_graph.subgraph(cc))
            max_diameter = max(max_diameter, d)

        depth = max_diameter * diameter_factor + additional_depth

        for site_i in range(len(self.graph.nodes)):
            all_loops = self.get_loops(depth=depth, index=site_i)
            all_loop_strings = []
            for loop in all_loops:
                loop_elements = []
                for site_i_jimage in loop:
                    loop_species_string = self.structure[site_i_jimage[0]].species_string
                    loop_elements.append(loop_species_string)

                loop_elements = standardize_loop(loop_elements)

                seed_str = "-".join(loop_elements)
                hashed_loop = blake2b(seed_str.encode(), digest_size=8).hexdigest()

                all_loop_strings.append(hashed_loop)

            seed_str_all_loops = ":".join(sorted(all_loop_strings))
            hashed_all_loops = blake2b(seed_str_all_loops.encode(), digest_size=8).hexdigest()

            self.starting_labels.append(hashed_all_loops)

    def set_indivisual_compositional_sequence_node_attr(
        self,
        n: int,
        hash_cs: bool = False,
        wyckoff: bool = False,
        additional_depth: int = 0,
        diameter_factor: int = 2,
        use_previous_cs: bool = False,
    ) -> None:
        """Compute compositional sequence for a single site.

        Similar to set_compositional_sequence_node_attr, but only computes
        the sequence for the specified site. Used by DistanceClusteringGraphID.

        Parameters
        ----------
        n : int
            The site index to compute the sequence for.
        hash_cs : bool, default False
            If True, hash the sequence incrementally.
        wyckoff : bool, default False
            If True, use Wyckoff labels.
        additional_depth : int, default 0
            Extra traversal depth.
        diameter_factor : int, default 2
            Multiplier for graph diameter.
        use_previous_cs : bool, default False
            If True, use previous sequence as starting labels.

        """
        node_attributes = {}
        self.cc_cs = []
        get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

        ug = self.graph.to_undirected()

        for cc in nx.connected_components(ug):
            cs_list = []

            d = diameter(ug.subgraph(cc))

            if n in cc:
                depth = diameter_factor * d + additional_depth

                cs = CompositionalSequence(
                    focused_site_i=n,
                    starting_labels=self.starting_labels,
                    hash_cs=hash_cs,
                    use_previous_cs=use_previous_cs or wyckoff,
                )

                for _this_depth in range(depth):
                    for c_site in cs.get_current_starting_sites():
                        nsites = get_connected_sites_light(c_site[0], c_site[1])
                        cs.count_composition_for_neighbors(nsites)

                    cs.finalize_this_depth()

                this_cs = str(cs)

                node_attributes[n] = self.starting_labels[n] + "_" + this_cs
                cs_list.append(this_cs)

                self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

        nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")

Methods:

`from_local_env_strategy(structure, strategy, weights=False)` `staticmethod`

Create a StructureGraph using a neighbor-finding strategy.

Parameters:

Name	Type	Description	Default
`structure`	`Structure`	A pymatgen Structure object.	required
`strategy`	`NearNeighbors`	A neighbor-finding strategy from pymatgen.analysis.local_env, such as MinimumDistanceNN, CrystalNN, etc.	required
`weights`	`bool`	If True, include bond weights from the strategy.	`False`

Returns:

Type	Description
`StructureGraph`	A new StructureGraph with edges representing bonds.

Raises:

Type	Description
`ValueError`	If the strategy does not support structures.

Examples:

>>> from pymatgen.analysis.local_env import MinimumDistanceNN
>>> sg = StructureGraph.from_local_env_strategy(
...     structure, MinimumDistanceNN()
... )
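The source comments note that a neighbor-finding strategy reports each bond twice, once from each end, and that the second report is harmless. The sketch below is a simplified, pure-Python model of that idea, not pymatgen's actual `add_edge` implementation: `canonical_edge` and `build_edges` are hypothetical helpers, and the neighbor dictionaries only mimic the `site_index` and `image` keys used in the source.

```python
def canonical_edge(u, v, image):
    """Return a direction-independent key for the bond u -> v (+image)."""
    neg = tuple(-x for x in image)
    if u > v:
        # Seen from the other end, the periodic image offset changes sign.
        return (v, u, neg)
    if u == v:
        # A self-bond to image +R is the same bond as the one to -R.
        return (u, v, max(image, neg))
    return (u, v, image)


def build_edges(nn_info):
    """Collect unique bonds from per-site neighbor lists."""
    edges = set()
    for n, neighbors in enumerate(nn_info):
        for nb in neighbors:
            edges.add(canonical_edge(n, nb["site_index"], tuple(nb["image"])))
    return edges


# Toy two-site cell (made-up data): site 0 bonds to site 1 in the home cell
# and in image (-1, 0, 0); site 1 reports the same two bonds from its side.
nn_info = [
    [{"site_index": 1, "image": (0, 0, 0)}, {"site_index": 1, "image": (-1, 0, 0)}],
    [{"site_index": 0, "image": (0, 0, 0)}, {"site_index": 0, "image": (1, 0, 0)}],
]
edges = build_edges(nn_info)
print(sorted(edges))
```

Four neighbor reports collapse to two stored bonds, which is why the duplicate warning can safely be switched off in this kind of loop.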

Source code in graph_id/analysis/graphs.py

@staticmethod
def from_local_env_strategy(structure, strategy, weights=False):
    """Create a StructureGraph using a neighbor-finding strategy.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.
    strategy : NearNeighbors
        A neighbor-finding strategy from pymatgen.analysis.local_env,
        such as MinimumDistanceNN, CrystalNN, etc.
    weights : bool, default False
        If True, include bond weights from the strategy.

    Returns
    -------
    StructureGraph
        A new StructureGraph with edges representing bonds.

    Raises
    ------
    ValueError
        If the strategy does not support structures.

    Examples
    --------
    >>> from pymatgen.analysis.local_env import MinimumDistanceNN
    >>> sg = StructureGraph.from_local_env_strategy(
    ...     structure, MinimumDistanceNN()
    ... )

    """
    if not strategy.structures_allowed:
        msg = "Chosen strategy is not designed for use with structures! Please choose another strategy."
        raise ValueError(msg)

    sg = StructureGraph.from_empty_graph(structure, name="bonds")

    for n, neighbors in enumerate(strategy.get_all_nn_info(structure)):
        for neighbor in neighbors:
            # local_env will always try to add two edges
            # for any one bond, one from site u to site v
            # and another from site v to site u: this is
            # harmless, so warn_duplicates=False
            sg.add_edge(
                from_index=n,
                from_jimage=(0, 0, 0),
                to_index=neighbor["site_index"],
                to_jimage=neighbor["image"],
                weight=neighbor["weight"] if weights else None,
                warn_duplicates=False,
            )

    return sg

`with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0)` `staticmethod`

Add edges for a specific site using a distance clustering strategy.

This method is used by DistanceClusteringGraphID to add bonds for a specific site and distance cluster.

Parameters:

Name	Type	Description	Default
`structure`	`Structure`	A pymatgen Structure object.	required
`strategy`	`DistanceClusteringNN`	A distance clustering neighbor-finding strategy.	required
`_sg`	`StructureGraph`	An existing StructureGraph to modify.	required
`n`	`int`	The site index to add edges for.	required
`weights`	`bool`	If True, include bond weights from the strategy.	`False`
`rank_k`	`int`	The distance cluster index (0-based).	`1`
`cutoff`	`float`	Maximum distance cutoff in Angstroms.	`6.0`

Returns:

Type	Description
`StructureGraph`	The modified StructureGraph with new edges.

Raises:

Type	Description
`ValueError`	If the strategy does not support structures.
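This page does not show how `DistanceClusteringNN` groups neighbor distances, so the following is only an illustration of what a 0-based "distance cluster index" and a cutoff could mean. It uses plain gap-based grouping of sorted distances; the function name `cluster_distances`, the `gap_tol` parameter, and the sample distances are all hypothetical and are not taken from the library.

```python
def cluster_distances(distances, cutoff=6.0, gap_tol=0.1):
    """Group sorted distances; a gap larger than gap_tol starts a new cluster."""
    kept = sorted(d for d in distances if d <= cutoff)
    clusters = []
    for d in kept:
        if clusters and d - clusters[-1][-1] <= gap_tol:
            clusters[-1].append(d)
        else:
            clusters.append([d])
    return clusters


# Made-up neighbor distances in Angstroms; 7.5 lies beyond the cutoff.
distances = [2.82, 2.82, 2.83, 3.99, 4.00, 4.89, 7.5]
clusters = cluster_distances(distances)

rank_k = 1  # 0-based index: the second-nearest shell of neighbors
print(clusters[rank_k])
```

Under this assumption, `rank_k=0` would select the nearest shell and `rank_k=1` the next one, and only neighbors in the selected shell would receive edges.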

Source code in graph_id/analysis/graphs.py

@staticmethod
def with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0):
    """Add edges for a specific site using a distance clustering strategy.

    This method is used by DistanceClusteringGraphID to add bonds
    for a specific site and distance cluster.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.
    strategy : DistanceClusteringNN
        A distance clustering neighbor-finding strategy.
    _sg : StructureGraph
        An existing StructureGraph to modify.
    n : int
        The site index to add edges for.
    weights : bool, default False
        If True, include bond weights from the strategy.
    rank_k : int, default 1
        The distance cluster index (0-based).
    cutoff : float, default 6.0
        Maximum distance cutoff in Angstroms.

    Returns
    -------
    StructureGraph
        The modified StructureGraph with new edges.

    Raises
    ------
    ValueError
        If the strategy does not support structures.

    """
    if not strategy.structures_allowed:
        raise ValueError(  # noqa: TRY003
            "Chosen strategy is not designed for use with structures!",  # noqa: EM101
        )

    nn_info = strategy.get_nn_info(structure, n, rank_k, cutoff)

    for neighbor in nn_info:
        # local_env will always try to add two edges
        # for any one bond, one from site u to site v
        # and another from site v to site u: this is
        # harmless, so warn_duplicates=False
        _sg.add_edge(
            from_index=n,
            from_jimage=(0, 0, 0),
            to_index=neighbor["site_index"],
            to_jimage=neighbor["image"],
            weight=neighbor["weight"] if weights else None,
            warn_duplicates=False,
            edge_properties=neighbor["edge_properties"],
        )

    return _sg

`set_elemental_labels()`

Set element symbols as starting labels for compositional sequences.

This is the default labeling scheme where each site is labeled by its element symbol (e.g., "Na", "Cl").

Source code in graph_id/analysis/graphs.py

def set_elemental_labels(self):
    """Set element symbols as starting labels for compositional sequences.

    This is the default labeling scheme where each site is labeled
    by its element symbol (e.g., "Na", "Cl").
    """
    self.starting_labels = [site.species_string for site in self.structure]

`set_wyckoffs(symmetry_tol: float = 0.01) -> None`

Set Wyckoff position labels for each site.

Labels each site with its element, Wyckoff letter, and space group number in the format `"{element}_{wyckoff}_{spacegroup}"`.

Parameters:

Name	Type	Description	Default
`symmetry_tol`	`float`	Tolerance for symmetry detection in Angstroms.	`0.01`

Notes

If symmetry detection fails, falls back to elemental labels.
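The label format and the fallback can be sketched without any symmetry library. In the source, the symmetry analysis runs on a copy of the structure in which every site is replaced by hydrogen, so the Wyckoff letter and space group number reflect the site geometry only. The helper `make_starting_labels` below is hypothetical, and the Wyckoff letters and space group number are placeholders, not the result of a real analysis.

```python
def make_starting_labels(species, wyckoffs=None, spacegroup_number=None):
    """Build "{element}_{wyckoff}_{spacegroup}" labels, or fall back to elements."""
    if wyckoffs is None or spacegroup_number is None:
        # Mirrors the documented fallback: plain element symbols.
        return list(species)
    return [f"{el}_{w}_{spacegroup_number}" for el, w in zip(species, wyckoffs)]


species = ["Na", "Cl"]

# Placeholder symmetry data for illustration only.
labels = make_starting_labels(species, wyckoffs=["a", "b"], spacegroup_number=1)
fallback = make_starting_labels(species)

print(labels)
print(fallback)
```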

Source code in graph_id/analysis/graphs.py

def set_wyckoffs(self, symmetry_tol: float = 0.01) -> None:
    """Set Wyckoff position labels for each site.

    Labels each site with its element, Wyckoff letter, and space group
    number in the format ``"{element}_{wyckoff}_{spacegroup}"``.

    Parameters
    ----------
    symmetry_tol : float, default 0.01
        Tolerance for symmetry detection in Angstroms.

    Notes
    -----
    If symmetry detection fails, falls back to elemental labels.

    """
    siteless_strc = self.structure.copy()

    for site_i in range(len(self.structure)):
        siteless_strc.replace(site_i, Element("H"))

    sga = SpacegroupAnalyzer(siteless_strc, symprec=symmetry_tol)
    sym_dataset = sga.get_symmetry_dataset()

    if sym_dataset is None:
        self.set_elemental_labels()
        return

    wyckoffs = sym_dataset.wyckoffs
    number = sym_dataset.number

    # One label per site: "{element}_{wyckoff}_{spacegroup}".
    self.starting_labels = [
        f"{self.structure[site_i].species_string}_{w}_{number}" for site_i, w in enumerate(wyckoffs)
    ]

`set_compositional_sequence_node_attr(hash_cs: bool = False, wyckoff: bool = False, additional_depth: int = 0, diameter_factor: int = 2, use_previous_cs: bool = False) -> None`

Compute and set compositional sequences as node attributes.

This is the core method that computes the local environment fingerprint for each site by traversing the graph and counting neighbors at each depth.

Parameters:

Name	Type	Description	Default
`hash_cs`	`bool`	If True, hash the compositional sequence incrementally during computation for memory efficiency.	`False`
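`wyckoff`	`bool`	If True, use Wyckoff labels in the computation. In the source this flag only forces `use_previous_cs` on (`use_previous_cs or wyckoff`); the labels themselves are set by `set_wyckoffs()`.	`False`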
`additional_depth`	`int`	Extra traversal depth to add.	`0`
`diameter_factor`	`int`	Multiplier for graph diameter to determine traversal depth.	`2`
`use_previous_cs`	`bool`	If True, use previous compositional sequence as starting labels.	`False`

Notes

After calling this method:

- Node attributes are set under the key `"compositional_sequence"`.
- `self.cc_cs` holds the compositional sequences for each connected component.
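The traversal depth used in the source is `diameter_factor * d + additional_depth`, where `d` is the diameter of the connected component. The sketch below shows that depth rule and a simplified per-depth neighbor count on a small non-periodic graph. It is not the library's `CompositionalSequence` class: the function names are hypothetical, periodic images are ignored, and counting only newly reached sites at each depth is an assumption made for this illustration.

```python
from collections import Counter, deque


def eccentricity(adj, start):
    """Longest shortest-path distance from `start` (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def graph_diameter(adj):
    """Diameter of a connected graph given as an adjacency dict."""
    return max(eccentricity(adj, s) for s in adj)


def neighbor_shell_counts(adj, labels, start, depth):
    """Count the labels of newly reached sites at each traversal depth."""
    visited = {start}
    frontier = [start]
    shells = []
    for _ in range(depth):
        counts = Counter()
        next_frontier = []
        for site in frontier:
            for nb in adj[site]:
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
                    counts[labels[nb]] += 1
        shells.append("".join(f"{label}{n}" for label, n in sorted(counts.items())))
        frontier = next_frontier
    return shells


# Toy chain H0 - O1 - C2 - H3 (hypothetical, non-periodic).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = ["H", "O", "C", "H"]

diameter_factor, additional_depth = 2, 0
depth = diameter_factor * graph_diameter(adj) + additional_depth
shells = neighbor_shell_counts(adj, labels, start=1, depth=depth)
print(depth, shells)
```

On this finite chain the later shells are empty because every site has already been reached; the sketch makes no claim about how the library handles periodic images at larger depths.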

Source code in graph_id/analysis/graphs.py

def set_compositional_sequence_node_attr(
    self,
    hash_cs: bool = False,
    wyckoff: bool = False,
    additional_depth: int = 0,
    diameter_factor: int = 2,
    use_previous_cs: bool = False,
) -> None:
    """Compute and set compositional sequences as node attributes.

    This is the core method that computes the local environment
    fingerprint for each site by traversing the graph and counting
    neighbors at each depth.

    Parameters
    ----------
    hash_cs : bool, default False
        If True, hash the compositional sequence incrementally
        during computation for memory efficiency.
    wyckoff : bool, default False
        If True, use Wyckoff labels in the computation.
    additional_depth : int, default 0
        Extra traversal depth to add.
    diameter_factor : int, default 2
        Multiplier for graph diameter to determine traversal depth.
    use_previous_cs : bool, default False
        If True, use previous compositional sequence as starting labels.

    Notes
    -----
    After calling this method:

    - Node attributes are set with key ``"compositional_sequence"``
    - ``self.cc_cs`` contains compositional sequences per component

    """
    node_attributes = {}
    self.cc_cs = []
    get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

    ug = self.graph.to_undirected()

    for cc in nx.connected_components(ug):
        cs_list = []

        d = diameter(ug.subgraph(cc))

        for focused_site_i in cc:
            depth = diameter_factor * d + additional_depth

            cs = CompositionalSequence(
                focused_site_i=focused_site_i,
                starting_labels=self.starting_labels,
                hash_cs=hash_cs,
                use_previous_cs=use_previous_cs or wyckoff,
            )

            for _ in range(depth):
                for c_site in cs.get_current_starting_sites():
                    nsites = get_connected_sites_light(c_site[0], c_site[1])
                    cs.count_composition_for_neighbors(nsites)

                cs.finalize_this_depth()

            this_cs = str(cs)

            node_attributes[focused_site_i] = self.starting_labels[focused_site_i] + "_" + this_cs
            cs_list.append(this_cs)

        self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

    nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")

`get_loops(depth: int, index: int, shortest: bool = True)`

Find all loops/rings starting from a given atom.

Traverses the graph to find closed loops that start and end at the specified atom index.

Parameters:

Name	Type	Description	Default
`depth`	`int`	Maximum loop size to search for.	required
`index`	`int`	The starting atom index.	required
`shortest`	`bool`	If True, stop searching when all theoretically possible shortest loops are found.	`True`

Returns:

Type	Description
`list of list of tuple`	A list of loops, where each loop is a list of `(index, image)` tuples representing the path.

Notes

Loops are found by breadth-first traversal and tracking when paths return to their starting point.
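The path-extension idea can be sketched on a plain, non-periodic graph. The version below follows the same outline as the source listing (extend every open path by one bond, stop extending paths that revisit a site, record a path when it returns to its start, and skip a ring whose reverse is already stored). It is a simplification under stated assumptions: sites are bare indices rather than `(index, image)` tuples, the `shortest` early exit is omitted, and `find_rings` is a hypothetical name.

```python
def find_rings(adj, start, max_size):
    """Return closed paths of at most `max_size` bonds that begin and end at `start`."""
    lines = [[start]]
    rings = []
    for step in range(max_size):
        # Extend each open path by one bond; paths that already
        # revisit a site are not extended any further.
        extended = [
            [*line, nb]
            for line in lines
            if len(line) == len(set(line))
            for nb in adj[line[-1]]
        ]
        lines = []
        for line in extended:
            if line[0] == line[-1]:
                # Closed path: ignore trivial A-B-A walks (step <= 1)
                # and rings already stored in the opposite direction.
                if step > 1 and line[::-1] not in rings:
                    rings.append(line)
            else:
                lines.append(line)
    return rings


# Triangle 0-1-2 with a pendant site 3 attached to site 2 (toy data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rings = find_rings(adj, start=0, max_size=4)
print(rings)
```

Site 3 lies on no ring, so calling `find_rings(adj, start=3, max_size=4)` yields an empty list, while the triangle is reported once rather than once per direction.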

Source code in graph_id/analysis/graphs.py

def get_loops(self, depth: int, index: int, shortest: bool = True):  # noqa: C901
    """Find all loops/rings starting from a given atom.

    Traverses the graph to find closed loops that start and end at the
    specified atom index.

    Parameters
    ----------
    depth : int
        Maximum loop size to search for.
    index : int
        The starting atom index.
    shortest : bool, default True
        If True, stop searching when all theoretically possible shortest
        loops are found.

    Returns
    -------
    list of list of tuple
        A list of loops, where each loop is a list of ``(index, image)``
        tuples representing the path.

    Notes
    -----
    Loops are found by breadth-first traversal and tracking when paths
    return to their starting point.

    """
    get_connected_sites = functools.lru_cache(maxsize=None)(self.get_connected_sites)

    def find_all_rings(index, ring_list):
        neighbors = get_connected_sites(index, (0, 0, 0))
        for n0, n1 in combinations(neighbors, 2):
            found = False
            for ring in ring_list:
                term0 = ring[1]
                term1 = ring[-2]

                if all(
                    (
                        n0.index == term0[0],
                        n0.jimage == term0[1],
                        n1.index == term1[0],
                        n1.jimage == term1[1],
                    ),
                ):
                    found = True
                    break

                if all(
                    (
                        n1.index == term0[0],
                        n1.jimage == term0[1],
                        n0.index == term1[0],
                        n0.jimage == term1[1],
                    ),
                ):
                    found = True
                    break

            if found is False:
                return False

        return True

    def get_further_lines_from_lines(lines):
        new_lines = []
        for line in lines:
            ind, image = line[-1]
            neighbors = get_connected_sites(ind, image)

            for n in neighbors:
                new_line = [*line, (n.index, n.jimage)]

                # Keep the path only if it does not revisit a node.
                if len(new_line[:-1]) == len(set(new_line[:-1])):
                    new_lines.append(new_line)

        return new_lines

    lines = []
    lines.append([(index, (0, 0, 0))])

    ring_list = []

    for depth_i in range(depth):
        next_lines = []
        lines = get_further_lines_from_lines(lines)

        for line in lines:
            # The path starts and ends at the same site.
            if line[0] == line[-1]:
                if depth_i > 1 and list(reversed(line)) not in ring_list:
                    ring_list.append(line)
            else:
                next_lines.append(line)

        lines = next_lines

        # Stop searching once the theoretically expected set of rings has been found.
        if shortest and find_all_rings(index, ring_list):
            return ring_list

    return list(ring_list)
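
The path-extension idea in `get_loops` can be illustrated on a plain adjacency dictionary. This is a minimal sketch, not the library's implementation: the function name `find_rings` and the use of bare node keys instead of `(index, image)` tuples are assumptions made for illustration, and the `shortest` early-exit check is left out.

```python
def find_rings(adjacency, start, max_size):
    """Return closed paths that begin and end at `start`, using at most `max_size` edges."""
    paths = [[start]]
    rings = []
    for step in range(max_size):
        # Extend every open path by one edge.
        extended = []
        for path in paths:
            for neighbor in adjacency[path[-1]]:
                candidate = [*path, neighbor]
                # Keep the candidate only if the nodes before the new one are all distinct.
                if len(set(candidate[:-1])) == len(candidate[:-1]):
                    extended.append(candidate)

        paths = []
        for path in extended:
            if path[0] == path[-1]:
                # step > 1 discards the trivial A-B-A walk; the reversed
                # check avoids storing the same ring in both directions.
                if step > 1 and list(reversed(path)) not in rings:
                    rings.append(path)
            else:
                paths.append(path)
    return rings


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rings = find_rings(triangle, start=0, max_size=3)
```

In this sketch a triangle yields a single ring, because the second traversal direction is recognised as the reverse of the first.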

`set_loops(diameter_factor: int, additional_depth: int) -> None` ¶

Set loop-based labels for each site.

Computes all loops for each site and creates a hashed label representing the ring topology around that site.

Parameters:

Name	Type	Description	Default
`diameter_factor`	`int`	Multiplier for graph diameter to determine search depth.	required
`additional_depth`	`int`	Extra depth to add to the search.	required

Notes

Sets self.starting_labels to hashed loop representations. Used when loop=True in GraphIDGenerator.
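
The search depth follows from the two parameters as `max_diameter * diameter_factor + additional_depth`, where `max_diameter` is the largest diameter among the connected components. The following is a rough standalone sketch of that arithmetic; the helper names `eccentricity` and `search_depth` and the adjacency-dictionary input are assumptions for illustration, whereas the library works on its networkx graph.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Longest shortest-path distance from `source` within its component (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def search_depth(adjacency, diameter_factor, additional_depth):
    """Largest component diameter, scaled and offset as described above."""
    # The maximum eccentricity over all nodes equals the largest component diameter.
    max_diameter = max(eccentricity(adjacency, node) for node in adjacency)
    return max_diameter * diameter_factor + additional_depth


# Two components: a chain 0-1-2 (diameter 2) and a pair 3-4 (diameter 1).
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
depth = search_depth(graph, diameter_factor=2, additional_depth=1)  # 2 * 2 + 1
```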

Source code in graph_id/analysis/graphs.py

def set_loops(self, diameter_factor: int, additional_depth: int) -> None:
    """Set loop-based labels for each site.

    Computes all loops for each site and creates a hashed label
    representing the ring topology around that site.

    Parameters
    ----------
    diameter_factor : int
        Multiplier for graph diameter to determine search depth.
    additional_depth : int
        Extra depth to add to the search.

    Notes
    -----
    Sets ``self.starting_labels`` to hashed loop representations.
    Used when ``loop=True`` in GraphIDGenerator.

    """
    self.starting_labels = []

    undirected_graph = self.graph.to_undirected()

    max_diameter = 0
    for cc in nx.connected_components(undirected_graph):
        d = diameter(undirected_graph.subgraph(cc))
        max_diameter = max(max_diameter, d)

    depth = max_diameter * diameter_factor + additional_depth

    for site_i in range(len(self.graph.nodes)):
        all_loops = self.get_loops(depth=depth, index=site_i)
        all_loop_strings = []
        # print(all_loops)
        for loop in all_loops:
            loop_elements = []
            for site_i_jimage in loop:
                loop_species_string = self.structure[site_i_jimage[0]].species_string
                # print(loop_species_string)
                loop_elements.append(loop_species_string)

            loop_elements = standardize_loop(loop_elements)

            seed_str = "-".join(loop_elements)
            hashed_loop = blake2b(seed_str.encode(), digest_size=8).hexdigest()

            all_loop_strings.append(hashed_loop)

        seed_str_all_loops = ":".join(sorted(all_loop_strings))
        hashed_all_loops = blake2b(seed_str_all_loops.encode(), digest_size=8).hexdigest()

        self.starting_labels.append(hashed_all_loops)
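
The two-level hashing in `set_loops` (one hash per loop, then one hash over the sorted loop hashes) can be sketched in isolation. The behaviour of `standardize_loop` is not shown in this section, so `canonical_loop` below is a hypothetical stand-in that picks the lexicographically smallest rotation in either direction; `site_label` is likewise an illustrative name.

```python
from hashlib import blake2b


def canonical_loop(elements):
    """Hypothetical stand-in for standardize_loop: smallest rotation, either direction."""
    # A closed path repeats its first element at the end, so drop the last entry.
    ring = list(elements[:-1])
    candidates = []
    for seq in (ring, ring[::-1]):
        for shift in range(len(seq)):
            candidates.append(seq[shift:] + seq[:shift])
    return min(candidates)


def site_label(loops):
    """Hash each loop, then hash the sorted, colon-joined loop hashes."""
    loop_hashes = []
    for elements in loops:
        seed = "-".join(canonical_loop(elements))
        loop_hashes.append(blake2b(seed.encode(), digest_size=8).hexdigest())
    combined = ":".join(sorted(loop_hashes))
    return blake2b(combined.encode(), digest_size=8).hexdigest()


# The same Si-O-Al-O ring, written from two different starting atoms.
ring_from_si = ["Si", "O", "Al", "O", "Si"]
ring_from_al = ["Al", "O", "Si", "O", "Al"]
same = site_label([ring_from_si]) == site_label([ring_from_al])
```

Sorting the per-loop hashes before joining makes the final label independent of the order in which loops were discovered.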

Extended version of pymatgen's StructureGraph with additional methods for Graph ID generation.

Import¶

from graph_id.analysis.graphs import StructureGraph

Class Methods¶

from_local_env_strategy¶

@staticmethod
def from_local_env_strategy(structure, strategy, weights=False)

Constructor for StructureGraph using a neighbor detection strategy.

Parameters:

structure (Structure): pymatgen Structure object
strategy (NearNeighbors): A neighbor detection strategy
weights (bool): If True, use weights from the strategy

Returns:

StructureGraph: Constructed structure graph

Example:

from graph_id.analysis.graphs import StructureGraph
from pymatgen.analysis.local_env import MinimumDistanceNN

sg = StructureGraph.from_local_env_strategy(structure, MinimumDistanceNN())

Instance Methods¶

set_elemental_labels¶

def set_elemental_labels(self)

Set elemental species strings as starting labels for compositional sequence computation.

set_wyckoffs¶

def set_wyckoffs(self, symmetry_tol: float = 0.01)

Set Wyckoff position labels for each site.

Parameters:

symmetry_tol (float): Tolerance for symmetry detection

set_compositional_sequence_node_attr¶

def set_compositional_sequence_node_attr(
    self,
    hash_cs: bool = False,
    wyckoff: bool = False,
    additional_depth: int = 0,
    diameter_factor: int = 2,
    use_previous_cs: bool = False
)

Compute and set compositional sequences as node attributes.

Parameters:

hash_cs (bool): Hash the compositional sequence during computation
wyckoff (bool): Use Wyckoff-labeled sequences
additional_depth (int): Extra traversal depth
diameter_factor (int): Multiplier for graph diameter
use_previous_cs (bool): Use previous CS as starting point

get_loops¶

def get_loops(self, depth: int, index: int, shortest: bool = True)

Compute loops/rings starting from a given atom.

Parameters:

depth (int): Maximum loop size to search
index (int): Starting atom index
shortest (bool): Stop when all theoretical shortest loops are found

Returns:

list: List of loops, each as a list of (index, image) tuples

CompositionalSequence¶

Compute the compositional sequence for a site in a structure graph.

A compositional sequence is a fingerprint of the local chemical environment around an atom, computed by traversing the graph in shells and counting the elements encountered at each depth.

For example, for Na in NaCl rock salt structure:

Depth 0: Na (the central atom)
Depth 1: Cl6 (6 nearest Cl neighbors)
Depth 2: Na12 (12 next-nearest Na neighbors)

The sequence "Na-Cl6-Na12-..." uniquely identifies the local environment.

Parameters:

Name	Type	Description	Default
`focused_site_i`	`int`	The index of the central atom.	required
`starting_labels`	`list of str`	Labels for each site in the structure.	required
`hash_cs`	`bool`	If True, hash the sequence incrementally to save memory.	`False`
`use_previous_cs`	`bool`	If True, use previous compositional sequences as labels (for iterative refinement).	`False`

Attributes:

Name	Type	Description
`focused_site_i`	`int`	The central site index.
`first_element`	`str`	The label of the central site.
`compositional_seq`	`list of str`	The composition at each depth (if hash_cs=False).
`cs_for_hashing`	`str`	The incrementally hashed sequence (if hash_cs=True).

Examples:

>>> cs = CompositionalSequence(0, ["Na", "Cl", "Na", "Cl"])
>>> # ... add neighbors at each depth ...
>>> print(str(cs))
'Na-Cl6-Na12...'
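
The shell-by-shell counting described above can be sketched without the library. This is an illustrative approximation only: the function `shell_sequence` and the adjacency-dictionary input are assumptions, nodes are plain indices rather than `(index, jimage)` pairs, and each shell is counted separately as in the description.

```python
from collections import Counter


def shell_sequence(adjacency, labels, start, depth):
    """Join the start label with the label counts of each newly reached shell."""
    seen = {start}
    frontier = [start]
    shells = []
    for _ in range(depth):
        counter = Counter()
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency[node]:
                # Count a site only the first time it is reached.
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
                    counter[labels[neighbor]] += 1
        shells.append("".join(f"{label}{counter[label]}" for label in sorted(counter)))
        frontier = next_frontier
    return "-".join([labels[start], *shells])


# A methane-like graph: one C bonded to four H.
labels = ["C", "H", "H", "H", "H"]
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
from_hydrogen = shell_sequence(adjacency, labels, start=1, depth=2)
```

Starting from a hydrogen, the first shell holds the single carbon and the second shell holds the three remaining hydrogens.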

Source code in graph_id/analysis/compositional_sequence.py

class CompositionalSequence:
    """Compute the compositional sequence for a site in a structure graph.

    A compositional sequence is a fingerprint of the local chemical environment
    around an atom, computed by traversing the graph in shells and counting
    the elements encountered at each depth.

    For example, for Na in NaCl rock salt structure:

    - Depth 0: Na (the central atom)
    - Depth 1: Cl6 (6 nearest Cl neighbors)
    - Depth 2: Na12 (12 next-nearest Na neighbors)

    The sequence ``"Na-Cl6-Na12-..."`` uniquely identifies the local environment.

    Parameters
    ----------
    focused_site_i : int
        The index of the central atom.
    starting_labels : list of str
        Labels for each site in the structure.
    hash_cs : bool, default False
        If True, hash the sequence incrementally to save memory.
    use_previous_cs : bool, default False
        If True, use previous compositional sequences as labels
        (for iterative refinement).

    Attributes
    ----------
    focused_site_i : int
        The central site index.
    first_element : str
        The label of the central site.
    compositional_seq : list of str
        The composition at each depth (if hash_cs=False).
    cs_for_hashing : str
        The incrementally hashed sequence (if hash_cs=True).

    Examples
    --------
    >>> cs = CompositionalSequence(0, ["Na", "Cl", "Na", "Cl"])
    >>> # ... add neighbors at each depth ...
    >>> print(str(cs))
    'Na-Cl6-Na12...'

    """

    def __init__(self, focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False):
        """Initialize the compositional sequence computation."""
        self.hash_cs = hash_cs
        if hash_cs:
            self.cs_for_hashing = ""
        else:
            self.compositional_seq = []

        self.focused_site_i = focused_site_i
        self.new_sites = [(focused_site_i, (0, 0, 0))]

        self.seen_sites = set(self.new_sites)
        self.use_previous_cs = use_previous_cs
        self.labels = starting_labels
        self.composition_counter: Counter = Counter()
        self.first_element = starting_labels[focused_site_i]

    def __str__(self):
        """Return the string representation of the compositional sequence.

        Returns
        -------
        str
            Format: ``"{first_element}-{depth1}-{depth2}-..."``

        """
        if self.hash_cs:
            return f"{self.first_element}-{self.cs_for_hashing}"  # type: ignore

        return f"{self.first_element}-{'-'.join(self.compositional_seq)}"  # type: ignore

    def get_current_starting_sites(self):
        """Get the sites to expand from for the next depth.

        Returns
        -------
        list of tuple
            List of ``(site_index, jimage)`` tuples for the frontier sites.

        """
        new_sites = self.new_sites
        self.new_sites = []
        return [*new_sites]

    def count_composition_for_neighbors(
        self,
        nsites: list[Neighbor],
    ) -> None:
        """Count the composition of neighboring sites.

        Adds new neighbors to the frontier and counts their labels
        for the current depth.

        Parameters
        ----------
        nsites : list of Neighbor
            The neighboring sites to count.

        """
        for neighbor in nsites:
            neighbor_info = (neighbor.index, neighbor.jimage)

            if neighbor_info not in self.seen_sites:
                self.seen_sites.add(neighbor_info)

                self.new_sites.append(neighbor_info)

                if self.use_previous_cs:
                    cs = self.labels[neighbor.index]
                    self.composition_counter[cs] += 1
                else:
                    self.composition_counter[self.labels[neighbor.index]] += 1

    def finalize_this_depth(self):
        """Finalize counting for the current depth.

        Converts the composition counter to a formula string and
        either appends it to the sequence or hashes it incrementally.
        Resets the counter for the next depth.
        """
        formula = self.get_sorted_composition_list_from(self.composition_counter)

        if self.hash_cs:
            self.cs_for_hashing = blake(f"{self.cs_for_hashing}-{''.join(formula)}")
        else:
            self.compositional_seq.append("".join(formula))

    def get_sorted_composition_list_from(self, composition_counter: Counter) -> list[str]:
        """Convert a composition counter to a sorted formula list.

        Parameters
        ----------
        composition_counter : Counter
            Counts of each element/label.

        Returns
        -------
        list of str
            Sorted list of ``"{element}{count}"`` strings.

        """
        sorted_symbols = sorted(composition_counter.keys())
        return [s + str(formula_double_format(composition_counter[s], ignore_ones=False)) for s in sorted_symbols]
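
As a small illustration of the sorting step, the sketch below turns a `Counter` into label-count strings in plain string order. It is a simplified stand-in: `sorted_composition` is a hypothetical name, and integer counts are formatted directly instead of going through `formula_double_format`.

```python
from collections import Counter


def sorted_composition(counter):
    """Return "{label}{count}" strings ordered by label (plain string sort)."""
    return [f"{symbol}{counter[symbol]}" for symbol in sorted(counter)]


counts = Counter(["O", "Si", "O", "O", "O"])
parts = sorted_composition(counts)
formula = "".join(parts)
```

Because the order is alphabetical on the label strings, the same shell always produces the same formula string regardless of the order in which neighbors were visited.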

Methods:¶

`__init__(focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False)` ¶

Initialize the compositional sequence computation.

Source code in graph_id/analysis/compositional_sequence.py

def __init__(self, focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False):
    """Initialize the compositional sequence computation."""
    self.hash_cs = hash_cs
    if hash_cs:
        self.cs_for_hashing = ""
    else:
        self.compositional_seq = []

    self.focused_site_i = focused_site_i
    self.new_sites = [(focused_site_i, (0, 0, 0))]

    self.seen_sites = set(self.new_sites)
    self.use_previous_cs = use_previous_cs
    self.labels = starting_labels
    self.composition_counter: Counter = Counter()
    self.first_element = starting_labels[focused_site_i]

`count_composition_for_neighbors(nsites: list[Neighbor]) -> None` ¶

Count the composition of neighboring sites.

Adds new neighbors to the frontier and counts their labels for the current depth.

Parameters:

Name	Type	Description	Default
`nsites`	`list of Neighbor`	The neighboring sites to count.	required

Source code in graph_id/analysis/compositional_sequence.py

def count_composition_for_neighbors(
    self,
    nsites: list[Neighbor],
) -> None:
    """Count the composition of neighboring sites.

    Adds new neighbors to the frontier and counts their labels
    for the current depth.

    Parameters
    ----------
    nsites : list of Neighbor
        The neighboring sites to count.

    """
    for neighbor in nsites:
        neighbor_info = (neighbor.index, neighbor.jimage)

        if neighbor_info not in self.seen_sites:
            self.seen_sites.add(neighbor_info)

            self.new_sites.append(neighbor_info)

            if self.use_previous_cs:
                cs = self.labels[neighbor.index]
                self.composition_counter[cs] += 1
            else:
                self.composition_counter[self.labels[neighbor.index]] += 1

`finalize_this_depth()` ¶

Finalize counting for the current depth.

Converts the composition counter to a formula string and either appends it to the sequence or hashes it incrementally. Resets the counter for the next depth.

Source code in graph_id/analysis/compositional_sequence.py

def finalize_this_depth(self):
    """Finalize counting for the current depth.

    Converts the composition counter to a formula string and
    either appends it to the sequence or hashes it incrementally.
    Resets the counter for the next depth.
    """
    formula = self.get_sorted_composition_list_from(self.composition_counter)

    if self.hash_cs:
        self.cs_for_hashing = blake(f"{self.cs_for_hashing}-{''.join(formula)}")
    else:
        self.compositional_seq.append("".join(formula))
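
When `hash_cs=True`, each depth folds the new formula into a running digest, so only a short string is kept instead of the full sequence. The sketch below shows that chaining pattern. The library's `blake` helper is not shown in this section, so `blake_hex` (BLAKE2b with an 8-byte digest) is an assumed stand-in, and `fold_shells` is an illustrative name.

```python
from hashlib import blake2b


def blake_hex(text):
    """Assumed stand-in for the library's `blake` helper."""
    return blake2b(text.encode(), digest_size=8).hexdigest()


def fold_shells(shell_formulas):
    """Chain the per-depth formulas into one fixed-length digest."""
    digest = ""
    for formula in shell_formulas:
        # Each step hashes the previous digest together with the new shell.
        digest = blake_hex(f"{digest}-{formula}")
    return digest


forward = fold_shells(["Cl6", "Na12"])
swapped = fold_shells(["Na12", "Cl6"])
```

Because every step depends on the previous digest, the result is sensitive to the order of the shells, while its length stays constant however deep the traversal goes.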

`get_current_starting_sites()` ¶

Get the sites to expand from for the next depth.

Returns:

Type	Description
`list of tuple`	List of `(site_index, jimage)` tuples for the frontier sites.

Source code in graph_id/analysis/compositional_sequence.py

def get_current_starting_sites(self):
    """Get the sites to expand from for the next depth.

    Returns
    -------
    list of tuple
        List of ``(site_index, jimage)`` tuples for the frontier sites.

    """
    new_sites = self.new_sites
    self.new_sites = []
    return [*new_sites]

Class for computing compositional sequences around an atom.

Import¶

from graph_id.analysis.compositional_sequence import CompositionalSequence

Constructor¶

CompositionalSequence(
    focused_site_i,
    starting_labels,
    hash_cs=False,
    use_previous_cs=False
)

Parameters:

focused_site_i (int): Index of the central atom
starting_labels (list[str]): Labels for each site
hash_cs (bool): Hash sequences incrementally
use_previous_cs (bool): Use previous sequence as labels

Methods¶

count_composition_for_neighbors¶

def count_composition_for_neighbors(self, nsites)

Count the composition of neighboring sites.

finalize_this_depth¶

def finalize_this_depth(self)

Finalize counting for the current depth level.

String Representation¶

The string representation gives the full compositional sequence:

cs = CompositionalSequence(0, labels)
# ... compute neighbors ...
print(str(cs))  # "Na-Cl6-Na12-..."

DistanceClusteringNN¶

Bases: NearNeighbors

Neighbor detection using DBSCAN clustering on interatomic distances.

This class identifies neighbors by clustering the distribution of interatomic distances using the DBSCAN algorithm. This allows for automatic detection of distinct bond length populations, which is useful for structures with multiple bond types or unusual bonding.

The algorithm:

1. Computes the distances from the central site to every site within a cutoff
2. Applies DBSCAN clustering (eps=0.5, min_samples=2) to those distances
3. Treats each cluster as a distinct bond length population
4. Assigns neighbors to clusters by their distance

Examples:

>>> from graph_id.analysis.local_env import DistanceClusteringNN
>>> nn = DistanceClusteringNN()
>>> neighbors = nn.get_nn_info(structure, n=0, rank_k=0)

Attributes¶

`structures_allowed: bool` `property` ¶

Check if this neighbor finder can be used with Structure objects.

Returns:

Type	Description
`bool`	Always True for this class.

Methods:¶

`__init__() -> None` ¶

Initialize the DistanceClusteringNN neighbor finder.

Source code in graph_id/analysis/local_env.py

def __init__(self) -> None:
    """Initialize the DistanceClusteringNN neighbor finder."""

`get_nn_info(structure: Structure, n: int, rank_k: int, cutoff: float = 6.0) -> list[dict[str, Any]]` ¶

Get neighbor information for a specific site and distance cluster.

Parameters:

Name	Type	Description	Default
`structure`	`Structure`	The input pymatgen Structure.	required
`n`	`int`	Index of the site to find neighbors for.	required
`rank_k`	`int`	The distance cluster index (0-based). Cluster 0 contains the shortest bonds, cluster 1 the next shortest, etc.	required
`cutoff`	`float`	Maximum distance cutoff in Angstroms.	`6.0`

Returns:

Type	Description
`list of dict`	List of neighbor information dictionaries, each containing: `site` (the neighbor Site object), `image` (periodic image indices (i, j, k)), `weight` (the bond distance, rounded to 3 decimals), `site_index` (index of the neighbor in the structure), and `edge_properties` (dict with a `cluster_idx` key).

Source code in graph_id/analysis/local_env.py

def get_nn_info(self, structure: Structure, n: int, rank_k: int, cutoff: float = 6.0) -> list[dict[str, Any]]:
    """Get neighbor information for a specific site and distance cluster.

    Parameters
    ----------
    structure : Structure
        The input pymatgen Structure.
    n : int
        Index of the site to find neighbors for.
    rank_k : int
        The distance cluster index (0-based). Cluster 0 contains the
        shortest bonds, cluster 1 the next shortest, etc.
    cutoff : float, default 6.0
        Maximum distance cutoff in Angstroms.

    Returns
    -------
    list of dict
        List of neighbor information dictionaries, each containing:

        - ``site``: The neighbor Site object
        - ``image``: Periodic image indices (i, j, k)
        - ``weight``: The bond distance (rounded to 3 decimals)
        - ``site_index``: Index of the neighbor in the structure
        - ``edge_properties``: Dict with ``cluster_idx`` key

    """
    site = structure[n]
    cutoff_cluster_list = self.get_cutoff_cluster(structure, n, cutoff)
    if len(cutoff_cluster_list) <= rank_k:
        return []

    neighs_dists = structure.get_neighbors(site, cutoff_cluster_list[rank_k])
    max_weight = round(cutoff_cluster_list[rank_k], 3)
    # is_periodic = isinstance(structure, Structure | IStructure)  # supported only in Python 3.10 and later
    is_periodic = isinstance(structure, (IStructure, Structure))
    siw = []

    for nn in neighs_dists:
        weight = round(nn.nn_distance, 3)
        if (rank_k > 0 and weight <= max_weight and weight > round(cutoff_cluster_list[rank_k - 1], 3)) or (
            rank_k == 0 and weight <= max_weight
        ):
            siw.append(
                {
                    "site": nn,
                    "image": self._get_image(structure, nn) if is_periodic else None,
                    "weight": weight,
                    "site_index": self._get_original_site(structure, nn),
                    "edge_properties": {"cluster_idx": rank_k + 1},
                },
            )

    return siw
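
The distance window that `get_nn_info` applies for a given `rank_k` can be shown with plain numbers. This is a simplified sketch: `select_by_rank` is a hypothetical helper that works on bare distances and omits the site, image, and edge-property bookkeeping.

```python
def select_by_rank(distances, cluster_cutoffs, rank_k):
    """Keep the distances that fall into the window of cluster `rank_k`."""
    if len(cluster_cutoffs) <= rank_k:
        return []

    upper = round(cluster_cutoffs[rank_k], 3)
    # Cluster 0 has no lower bound; later clusters start above the previous cutoff.
    lower = round(cluster_cutoffs[rank_k - 1], 3) if rank_k > 0 else None

    selected = []
    for distance in distances:
        weight = round(distance, 3)
        if weight <= upper and (lower is None or weight > lower):
            selected.append(weight)
    return selected


distances = [2.35, 2.36, 3.84, 3.85, 5.9]
cutoffs = [2.36, 3.85]
first_shell = select_by_rank(distances, cutoffs, rank_k=0)
second_shell = select_by_rank(distances, cutoffs, rank_k=1)
```

A `rank_k` beyond the number of clusters returns an empty list, matching the early return in the listing above.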

`get_cutoff_cluster(structure: Structure, n: int, cutoff: float = 6.0) -> list` ¶

Get distance cluster cutoffs using DBSCAN clustering.

Computes all interatomic distances from site n within the cutoff, then clusters them using DBSCAN. Returns the maximum distance in each cluster as cutoff thresholds.

Parameters:

Name	Type	Description	Default
`structure`	`Structure`	The input pymatgen Structure.	required
`n`	`int`	Index of the central site.	required
`cutoff`	`float`	Maximum distance to consider in Angstroms.	`6.0`

Returns:

Type	Description
`list of float`	Sorted list of maximum distances for each cluster. The i-th element is the cutoff for cluster i.

Notes

Uses DBSCAN with eps=0.5 and min_samples=2 to cluster distances. Distances that don't fit into any cluster are ignored.

Source code in graph_id/analysis/local_env.py

def get_cutoff_cluster(self, structure: Structure, n: int, cutoff: float = 6.0) -> list:
    """Get distance cluster cutoffs using DBSCAN clustering.

    Computes all interatomic distances from site n within the cutoff,
    then clusters them using DBSCAN. Returns the maximum distance in
    each cluster as cutoff thresholds.

    Parameters
    ----------
    structure : Structure
        The input pymatgen Structure.
    n : int
        Index of the central site.
    cutoff : float, default 6.0
        Maximum distance to consider in Angstroms.

    Returns
    -------
    list of float
        Sorted list of maximum distances for each cluster.
        The i-th element is the cutoff for cluster i.

    Notes
    -----
    Uses DBSCAN with eps=0.5 and min_samples=2 to cluster distances.
    Distances that don't fit into any cluster are ignored.

    """
    # # Build a supercell and enumerate bond lengths up to 6.0 Å
    # copy_structure = structure.copy()
    # supercell = copy_structure.make_supercell([3, 3, 3])
    # site_i = structure[n]

    # site_index = None
    # for idx, site in enumerate(supercell):
    #     # For some reason, Site's distance method does not give the correct distance here
    #     if float(np.linalg.norm(site_i.coords - site.coords)) < 0.01:
    #         site_index = idx
    #         break

    distance_list = []
    neighbors = structure.get_sites_in_sphere(structure[n].coords, cutoff)
    for neighbor in neighbors:
        dist = neighbor.nn_distance
        distance_list.append([dist, 0])

    dbscan = DBSCAN(eps=0.5, min_samples=2)
    dbscan.fit(distance_list)
    labels = dbscan.labels_

    max_dist_list = [0 for _ in range(max(labels) + 1)]
    for label_number in range(max(labels) + 1):
        max_dist = 0
        for label, distance in zip(labels, distance_list, strict=False):
            if label == label_number:
                max_dist = max(max_dist, distance[0])

        max_dist_list[label_number] = max_dist

    return sorted(max_dist_list)

Neighbor detection based on DBSCAN clustering of interatomic distances.

Import¶

from graph_id.analysis.local_env import DistanceClusteringNN

Constructor¶

DistanceClusteringNN()

Methods¶

get_nn_info¶

def get_nn_info(
    self,
    structure: Structure,
    n: int,
    rank_k: int,
    cutoff: float = 6.0
) -> list[dict]

Get neighbor information for a specific site and distance cluster.

Parameters:

structure (Structure): Input structure
n (int): Site index
rank_k (int): Cluster index (0-based)
cutoff (float): Maximum distance cutoff

Returns:

list[dict]: List of neighbor information dictionaries

get_cutoff_cluster¶

def get_cutoff_cluster(
    self,
    structure: Structure,
    n: int,
    cutoff: float = 6.0
) -> list

Get distance cutoffs for each cluster using DBSCAN.

Parameters:

structure (Structure): Input structure
n (int): Site index
cutoff (float): Maximum distance to consider

Returns:

list: Sorted list of maximum distances for each cluster

How DBSCAN Clustering Works¶

The algorithm:

1. Computes the distances from the central site to all sites within the cutoff
2. Runs DBSCAN with eps=0.5 and min_samples=2
3. Groups the distances into clusters
4. Returns the maximum distance in each cluster as that cluster's cutoff

This is useful for structures with distinct bond length populations.
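
To make the steps above concrete without a scikit-learn dependency, the sketch below groups sorted distances by splitting wherever the gap exceeds `eps` and drops groups with a single member. For one-dimensional data with `min_samples=2` this should give the same grouping as DBSCAN, but that equivalence is an assumption of this sketch; the library itself calls DBSCAN, and `cluster_cutoffs` is a hypothetical name.

```python
def cluster_cutoffs(distances, eps=0.5):
    """Return the largest distance of each cluster, in ascending order."""
    groups = []
    current = []
    for distance in sorted(distances):
        # Start a new group when the gap to the previous distance exceeds eps.
        if current and distance - current[-1] > eps:
            groups.append(current)
            current = []
        current.append(distance)
    if current:
        groups.append(current)

    # A group with one member has no neighbor within eps and is treated as noise.
    return [group[-1] for group in groups if len(group) >= 2]


distances = [0.0, 2.35, 2.35, 2.36, 3.84, 3.84, 5.2]
cutoffs = cluster_cutoffs(distances)
```

Here the isolated values 0.0 and 5.2 are discarded as noise, leaving one cutoff per bond length population.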
