Analysis Module

The analysis module provides low-level utilities for structure graph construction and compositional sequence computation.

StructureGraph

Bases: pymatgen.analysis.graphs.StructureGraph

Extended StructureGraph with methods for Graph ID computation.

This class extends pymatgen's StructureGraph with additional functionality for computing compositional sequences and handling loops/rings in the structure graph.

Attributes:

| Name | Type | Description |
|---|---|---|
| starting_labels | list of str | Labels for each site used as starting points for compositional sequences. |
| cc_cs | list of dict | Compositional sequences for each connected component. Each dict contains site_i (set of site indices) and cs_list (list of compositional sequence strings). |

See Also

pymatgen.analysis.graphs.StructureGraph : Base class

Source code in graph_id/analysis/graphs.py
class StructureGraph(PmgStructureGraph):  # type: ignore

    """Extended StructureGraph with methods for Graph ID computation.

    This class extends pymatgen's StructureGraph with additional functionality
    for computing compositional sequences and handling loops/rings in the
    structure graph.

    Attributes
    ----------
    starting_labels : list of str
        Labels for each site used as starting points for compositional sequences.
    cc_cs : list of dict
        Compositional sequences for each connected component.
        Each dict contains ``site_i`` (set of site indices) and ``cs_list``
        (list of compositional sequence strings).

    See Also
    --------
    pymatgen.analysis.graphs.StructureGraph : Base class
    """

    @staticmethod
    def from_pymatgen_structure_graph(sg: PmgStructureGraph):
        """Create a StructureGraph from a pymatgen StructureGraph.

        Parameters
        ----------
        sg : PmgStructureGraph
            A pymatgen StructureGraph object.

        Returns
        -------
        StructureGraph
            A new StructureGraph instance.
        """
        graph_data = sg.as_dict()["graphs"]

        return StructureGraph(sg.structure, graph_data)

    @staticmethod
    def from_local_env_strategy(structure, strategy, weights=False):
        """Create a StructureGraph using a neighbor-finding strategy.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.
        strategy : NearNeighbors
            A neighbor-finding strategy from pymatgen.analysis.local_env,
            such as MinimumDistanceNN, CrystalNN, etc.
        weights : bool, default False
            If True, include bond weights from the strategy.

        Returns
        -------
        StructureGraph
            A new StructureGraph with edges representing bonds.

        Raises
        ------
        ValueError
            If the strategy does not support structures.

        Examples
        --------
        >>> from pymatgen.analysis.local_env import MinimumDistanceNN
        >>> sg = StructureGraph.from_local_env_strategy(
        ...     structure, MinimumDistanceNN()
        ... )
        """
        if not strategy.structures_allowed:
            msg = "Chosen strategy is not designed for use with structures! Please choose another strategy."
            raise ValueError(msg)

        sg = StructureGraph.from_empty_graph(structure, name="bonds")

        for n, neighbors in enumerate(strategy.get_all_nn_info(structure)):
            for neighbor in neighbors:
                # local_env will always try to add two edges
                # for any one bond, one from site u to site v
                # and another from site v to site u: this is
                # harmless, so warn_duplicates=False
                sg.add_edge(
                    from_index=n,
                    from_jimage=(0, 0, 0),
                    to_index=neighbor["site_index"],
                    to_jimage=neighbor["image"],
                    weight=neighbor["weight"] if weights else None,
                    warn_duplicates=False,
                )

        return sg

    @staticmethod
    def with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0):
        """Add edges for a specific site using a distance clustering strategy.

        This method is used by DistanceClusteringGraphID to add bonds
        for a specific site and distance cluster.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.
        strategy : DistanceClusteringNN
            A distance clustering neighbor-finding strategy.
        _sg : StructureGraph
            An existing StructureGraph to modify.
        n : int
            The site index to add edges for.
        weights : bool, default False
            If True, include bond weights from the strategy.
        rank_k : int, default 1
            The distance cluster index (0-based).
        cutoff : float, default 6.0
            Maximum distance cutoff in Angstroms.

        Returns
        -------
        StructureGraph
            The modified StructureGraph with new edges.

        Raises
        ------
        ValueError
            If the strategy does not support structures.
        """
        if not strategy.structures_allowed:
            raise ValueError(  # noqa: TRY003
                "Chosen strategy is not designed for use with structures!",  # noqa: EM101
            )

        nn_info = strategy.get_nn_info(structure, n, rank_k, cutoff)

        for neighbor in nn_info:
            # local_env will always try to add two edges
            # for any one bond, one from site u to site v
            # and another from site v to site u: this is
            # harmless, so warn_duplicates=False
            _sg.add_edge(
                from_index=n,
                from_jimage=(0, 0, 0),
                to_index=neighbor["site_index"],
                to_jimage=neighbor["image"],
                weight=neighbor["weight"] if weights else None,
                warn_duplicates=False,
                edge_properties=neighbor["edge_properties"],
            )

        return _sg

    def set_elemental_labels(self):
        """Set element symbols as starting labels for compositional sequences.

        This is the default labeling scheme where each site is labeled
        by its element symbol (e.g., "Na", "Cl").
        """
        self.starting_labels = [site.species_string for site in self.structure]

    def get_connected_sites_light(self, n, jimage=(0, 0, 0)):
        """Get connected sites with minimal memory footprint.

        A lightweight version of get_connected_sites that returns
        ConnectedSiteLight objects instead of full ConnectedSite objects.

        Parameters
        ----------
        n : int
            The site index to get neighbors for.
        jimage : tuple, default (0, 0, 0)
            The periodic image of the site.

        Returns
        -------
        list of ConnectedSiteLight
            List of connected sites with minimal information.
        """
        connected_sites = set()
        connected_site_images = set()

        out_edges = [(u, v, d, "out") for u, v, d in self.graph.out_edges(n, data=True)]
        in_edges = [(u, v, d, "in") for u, v, d in self.graph.in_edges(n, data=True)]

        for u, v, d, direction in out_edges + in_edges:
            to_jimage = d["to_jimage"]

            if direction == "in":
                u, v = v, u  # noqa: PLW2901
                to_jimage = np.multiply(-1, to_jimage)

            to_jimage = tuple(map(int, np.add(to_jimage, jimage)))

            if (v, to_jimage) not in connected_site_images:
                connected_site = ConnectedSiteLight(
                    site=self.structure[v],
                    jimage=to_jimage,
                    index=v,
                    weight=None,
                    dist=None,
                )

                connected_sites.add(connected_site)
                connected_site_images.add((v, to_jimage))

        return list(connected_sites)

    def set_wyckoffs(self, symmetry_tol: float = 0.01) -> None:
        """Set Wyckoff position labels for each site.

        Labels each site with its element, Wyckoff letter, and space group
        number in the format ``"{element}_{wyckoff}_{spacegroup}"``.

        Parameters
        ----------
        symmetry_tol : float, default 0.01
            Tolerance for symmetry detection in Angstroms.

        Notes
        -----
        If symmetry detection fails, falls back to elemental labels.
        """
        siteless_strc = self.structure.copy()

        for site_i in range(len(self.structure)):
            siteless_strc.replace(site_i, Element("H"))

        sga = SpacegroupAnalyzer(siteless_strc, symprec=symmetry_tol)
        sym_dataset = sga.get_symmetry_dataset()

        if sym_dataset is None:
            self.set_elemental_labels()
            return

        wyckoffs = sym_dataset.wyckoffs
        number = sym_dataset.number

        # Build "{element}_{wyckoff}_{spacegroup}" labels, one per site.
        self.starting_labels = [
            f"{self.structure[site_i].species_string}_{w}_{number}" for site_i, w in enumerate(wyckoffs)
        ]

    def set_compositional_sequence_node_attr(
        self,
        hash_cs: bool = False,
        wyckoff: bool = False,
        additional_depth: int = 0,
        diameter_factor: int = 2,
        use_previous_cs: bool = False,
    ) -> None:
        """Compute and set compositional sequences as node attributes.

        This is the core method that computes the local environment
        fingerprint for each site by traversing the graph and counting
        neighbors at each depth.

        Parameters
        ----------
        hash_cs : bool, default False
            If True, hash the compositional sequence incrementally
            during computation for memory efficiency.
        wyckoff : bool, default False
            If True, use Wyckoff labels in the computation.
        additional_depth : int, default 0
            Extra traversal depth to add.
        diameter_factor : int, default 2
            Multiplier for graph diameter to determine traversal depth.
        use_previous_cs : bool, default False
            If True, use previous compositional sequence as starting labels.

        Notes
        -----
        After calling this method:

        - Node attributes are set with key ``"compositional_sequence"``
        - ``self.cc_cs`` contains compositional sequences per component
        """
        node_attributes = {}
        self.cc_cs = []
        get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

        ug = self.graph.to_undirected()

        for cc in nx.connected_components(ug):
            cs_list = []

            d = diameter(ug.subgraph(cc))

            for focused_site_i in cc:
                depth = diameter_factor * d + additional_depth

                cs = CompositionalSequence(
                    focused_site_i=focused_site_i,
                    starting_labels=self.starting_labels,
                    hash_cs=hash_cs,
                    use_previous_cs=use_previous_cs or wyckoff,
                )

                for _ in range(depth):
                    for c_site in cs.get_current_starting_sites():
                        nsites = get_connected_sites_light(c_site[0], c_site[1])
                        cs.count_composition_for_neighbors(nsites)

                    cs.finalize_this_depth()

                this_cs = str(cs)

                node_attributes[focused_site_i] = self.starting_labels[focused_site_i] + "_" + this_cs
                cs_list.append(this_cs)

            self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

        nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")

    def get_loops(self, depth: int, index: int, shortest: bool = True):  # noqa: C901
        """Find all loops/rings starting from a given atom.

        Traverses the graph to find closed loops that start and end at the
        specified atom index.

        Parameters
        ----------
        depth : int
            Maximum loop size to search for.
        index : int
            The starting atom index.
        shortest : bool, default True
            If True, stop searching when all theoretically possible shortest
            loops are found.

        Returns
        -------
        list of list of tuple
            A list of loops, where each loop is a list of ``(index, image)``
            tuples representing the path.

        Notes
        -----
        Loops are found by breadth-first traversal and tracking when paths
        return to their starting point.
        """
        get_connected_sites = functools.lru_cache(maxsize=None)(self.get_connected_sites)

        def find_all_rings(index, ring_list):
            neighbors = get_connected_sites(index, (0, 0, 0))
            for n0, n1 in combinations(neighbors, 2):
                found = False
                for ring in ring_list:
                    term0 = ring[1]
                    term1 = ring[-2]

                    if all(
                        (
                            n0.index == term0[0],
                            n0.jimage == term0[1],
                            n1.index == term1[0],
                            n1.jimage == term1[1],
                        ),
                    ):
                        found = True
                        break

                    if all(
                        (
                            n1.index == term0[0],
                            n1.jimage == term0[1],
                            n0.index == term1[0],
                            n0.jimage == term1[1],
                        ),
                    ):
                        found = True
                        break

                if found is False:
                    return False

            return True

        def get_further_lines_from_lines(lines):
            new_lines = []
            for line in lines:
                ind, image = line[-1]
                neighbors = get_connected_sites(ind, image)

                for n in neighbors:
                    new_line = [*line, (n.index, n.jimage)]

                    # Keep the path only if it does not revisit a site.
                    if len(new_line[:-1]) == len(set(new_line[:-1])):
                        new_lines.append(new_line)

            return new_lines

        lines = []
        lines.append([(index, (0, 0, 0))])

        ring_list = []

        for depth_i in range(depth):
            next_lines = []
            lines = get_further_lines_from_lines(lines)

            for line in lines:
                # The path starts and ends at the same site, so it is a closed loop.
                if line[0] == line[-1]:
                    if depth_i > 1 and list(reversed(line)) not in ring_list:
                        ring_list.append(line)
                else:
                    next_lines.append(line)

            lines = next_lines

            # Stop searching once the theoretically expected set of loops has been found.
            if shortest and find_all_rings(index, ring_list):
                return ring_list

        return list(ring_list)

    def set_loops(self, diameter_factor: int, additional_depth: int) -> None:
        """Set loop-based labels for each site.

        Computes all loops for each site and creates a hashed label
        representing the ring topology around that site.

        Parameters
        ----------
        diameter_factor : int
            Multiplier for graph diameter to determine search depth.
        additional_depth : int
            Extra depth to add to the search.

        Notes
        -----
        Sets ``self.starting_labels`` to hashed loop representations.
        Used when ``loop=True`` in GraphIDGenerator.
        """
        self.starting_labels = []

        undirected_graph = self.graph.to_undirected()

        max_diameter = 0
        for cc in nx.connected_components(undirected_graph):
            d = diameter(undirected_graph.subgraph(cc))
            if d > max_diameter:
                max_diameter = d

        depth = max_diameter * diameter_factor + additional_depth

        for site_i in range(len(self.graph.nodes)):
            all_loops = self.get_loops(depth=depth, index=site_i)
            all_loop_strings = []
            # print(all_loops)
            for loop in all_loops:
                loop_elements = []
                for site_i_jimage in loop:
                    loop_species_string = self.structure[site_i_jimage[0]].species_string
                    # print(loop_species_string)
                    loop_elements.append(loop_species_string)

                loop_elements = standardize_loop(loop_elements)

                seed_str = "-".join(loop_elements)
                hashed_loop = blake2b(seed_str.encode(), digest_size=8).hexdigest()

                all_loop_strings.append(hashed_loop)

            seed_str_all_loops = ":".join(sorted(all_loop_strings))
            hashed_all_loops = blake2b(seed_str_all_loops.encode(), digest_size=8).hexdigest()

            self.starting_labels.append(hashed_all_loops)

    def set_indivisual_compositional_sequence_node_attr(
        self,
        n: int,
        hash_cs: bool = False,
        wyckoff: bool = False,
        additional_depth: int = 0,
        diameter_factor: int = 2,
        use_previous_cs: bool = False,
    ) -> None:
        """Compute compositional sequence for a single site.

        Similar to set_compositional_sequence_node_attr, but only computes
        the sequence for the specified site. Used by DistanceClusteringGraphID.

        Parameters
        ----------
        n : int
            The site index to compute the sequence for.
        hash_cs : bool, default False
            If True, hash the sequence incrementally.
        wyckoff : bool, default False
            If True, use Wyckoff labels.
        additional_depth : int, default 0
            Extra traversal depth.
        diameter_factor : int, default 2
            Multiplier for graph diameter.
        use_previous_cs : bool, default False
            If True, use previous sequence as starting labels.
        """
        node_attributes = {}
        self.cc_cs = []
        get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

        ug = self.graph.to_undirected()

        for cc in nx.connected_components(ug):
            cs_list = []

            d = diameter(ug.subgraph(cc))

            if n in cc:
                depth = diameter_factor * d + additional_depth

                cs = CompositionalSequence(
                    focused_site_i=n,
                    starting_labels=self.starting_labels,
                    hash_cs=hash_cs,
                    use_previous_cs=use_previous_cs or wyckoff,
                )

                for _this_depth in range(depth):
                    for c_site in cs.get_current_starting_sites():
                        nsites = get_connected_sites_light(c_site[0], c_site[1])
                        cs.count_composition_for_neighbors(nsites)

                    cs.finalize_this_depth()

                this_cs = str(cs)

                node_attributes[n] = self.starting_labels[n] + "_" + this_cs
                cs_list.append(this_cs)

                self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

        nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")

Methods:

from_local_env_strategy(structure, strategy, weights=False) staticmethod

Create a StructureGraph using a neighbor-finding strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| structure | Structure | A pymatgen Structure object. | required |
| strategy | NearNeighbors | A neighbor-finding strategy from pymatgen.analysis.local_env, such as MinimumDistanceNN, CrystalNN, etc. | required |
| weights | bool | If True, include bond weights from the strategy. | False |

Returns:

| Type | Description |
|---|---|
| StructureGraph | A new StructureGraph with edges representing bonds. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the strategy does not support structures. |

Examples:

>>> from pymatgen.analysis.local_env import MinimumDistanceNN
>>> sg = StructureGraph.from_local_env_strategy(
...     structure, MinimumDistanceNN()
... )
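
The source comments note that a neighbor-finding strategy reports every bond twice, once from each end, and that the repeats are harmless. The sketch below shows why, using plain Python only. The neighbor dictionaries mimic the `site_index` and `image` keys used in the source, but `canonical_edge` and the toy `nn_info` data are hypothetical stand-ins, and the canonical form chosen here is an assumption for illustration, not pymatgen's actual bookkeeping.

```python
def canonical_edge(u, v, image):
    """Return one representative for a bond, whichever end reported it.

    Hypothetical helper: a bond u -> v across `image` is the same bond as
    v -> u across the negated image, so we always store the u <= v form.
    """
    negated = tuple(-c for c in image)
    if u > v:
        return (v, u, negated)
    if u == v:
        # A site bonded to its own periodic image: pick one sign convention.
        return (u, v, max(tuple(image), negated))
    return (u, v, tuple(image))


# Toy neighbor report for a two-site chain (site 0 and site 1 alternate).
nn_info = {
    0: [{"site_index": 1, "image": (0, 0, 0)}, {"site_index": 1, "image": (-1, 0, 0)}],
    1: [{"site_index": 0, "image": (0, 0, 0)}, {"site_index": 0, "image": (1, 0, 0)}],
}

reports = 0
edges = set()
for n, neighbors in nn_info.items():
    for neighbor in neighbors:
        reports += 1
        edges.add(canonical_edge(n, neighbor["site_index"], neighbor["image"]))

print(reports, len(edges))  # 4 2
```

Four reports collapse to two distinct bonds, which is why the source passes `warn_duplicates=False` rather than treating the second report as a problem.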
Source code in graph_id/analysis/graphs.py
@staticmethod
def from_local_env_strategy(structure, strategy, weights=False):
    """Create a StructureGraph using a neighbor-finding strategy.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.
    strategy : NearNeighbors
        A neighbor-finding strategy from pymatgen.analysis.local_env,
        such as MinimumDistanceNN, CrystalNN, etc.
    weights : bool, default False
        If True, include bond weights from the strategy.

    Returns
    -------
    StructureGraph
        A new StructureGraph with edges representing bonds.

    Raises
    ------
    ValueError
        If the strategy does not support structures.

    Examples
    --------
    >>> from pymatgen.analysis.local_env import MinimumDistanceNN
    >>> sg = StructureGraph.from_local_env_strategy(
    ...     structure, MinimumDistanceNN()
    ... )
    """
    if not strategy.structures_allowed:
        msg = "Chosen strategy is not designed for use with structures! Please choose another strategy."
        raise ValueError(msg)

    sg = StructureGraph.from_empty_graph(structure, name="bonds")

    for n, neighbors in enumerate(strategy.get_all_nn_info(structure)):
        for neighbor in neighbors:
            # local_env will always try to add two edges
            # for any one bond, one from site u to site v
            # and another from site v to site u: this is
            # harmless, so warn_duplicates=False
            sg.add_edge(
                from_index=n,
                from_jimage=(0, 0, 0),
                to_index=neighbor["site_index"],
                to_jimage=neighbor["image"],
                weight=neighbor["weight"] if weights else None,
                warn_duplicates=False,
            )

    return sg

with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0) staticmethod

Add edges for a specific site using a distance clustering strategy.

This method is used by DistanceClusteringGraphID to add bonds for a specific site and distance cluster.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| structure | Structure | A pymatgen Structure object. | required |
| strategy | DistanceClusteringNN | A distance clustering neighbor-finding strategy. | required |
| _sg | StructureGraph | An existing StructureGraph to modify. | required |
| n | int | The site index to add edges for. | required |
| weights | bool | If True, include bond weights from the strategy. | False |
| rank_k | int | The distance cluster index (0-based). | 1 |
| cutoff | float | Maximum distance cutoff in Angstroms. | 6.0 |

Returns:

| Type | Description |
|---|---|
| StructureGraph | The modified StructureGraph with new edges. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the strategy does not support structures. |

Source code in graph_id/analysis/graphs.py
@staticmethod
def with_indivisual_state_comp_strategy(structure, strategy, _sg, n, weights=False, rank_k=1, cutoff=6.0):
    """Add edges for a specific site using a distance clustering strategy.

    This method is used by DistanceClusteringGraphID to add bonds
    for a specific site and distance cluster.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.
    strategy : DistanceClusteringNN
        A distance clustering neighbor-finding strategy.
    _sg : StructureGraph
        An existing StructureGraph to modify.
    n : int
        The site index to add edges for.
    weights : bool, default False
        If True, include bond weights from the strategy.
    rank_k : int, default 1
        The distance cluster index (0-based).
    cutoff : float, default 6.0
        Maximum distance cutoff in Angstroms.

    Returns
    -------
    StructureGraph
        The modified StructureGraph with new edges.

    Raises
    ------
    ValueError
        If the strategy does not support structures.
    """
    if not strategy.structures_allowed:
        raise ValueError(  # noqa: TRY003
            "Chosen strategy is not designed for use with structures!",  # noqa: EM101
        )

    nn_info = strategy.get_nn_info(structure, n, rank_k, cutoff)

    for neighbor in nn_info:
        # local_env will always try to add two edges
        # for any one bond, one from site u to site v
        # and another from site v to site u: this is
        # harmless, so warn_duplicates=False
        _sg.add_edge(
            from_index=n,
            from_jimage=(0, 0, 0),
            to_index=neighbor["site_index"],
            to_jimage=neighbor["image"],
            weight=neighbor["weight"] if weights else None,
            warn_duplicates=False,
            edge_properties=neighbor["edge_properties"],
        )

    return _sg

set_elemental_labels()

Set element symbols as starting labels for compositional sequences.

This is the default labeling scheme where each site is labeled by its element symbol (e.g., "Na", "Cl").

Source code in graph_id/analysis/graphs.py
def set_elemental_labels(self):
    """Set element symbols as starting labels for compositional sequences.

    This is the default labeling scheme where each site is labeled
    by its element symbol (e.g., "Na", "Cl").
    """
    self.starting_labels = [site.species_string for site in self.structure]

set_wyckoffs(symmetry_tol: float = 0.01) -> None

Set Wyckoff position labels for each site.

Labels each site with its element, Wyckoff letter, and space group number in the format "{element}_{wyckoff}_{spacegroup}".

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| symmetry_tol | float | Tolerance for symmetry detection in Angstroms. | 0.01 |

Notes

If symmetry detection fails, falls back to elemental labels.
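
As a minimal sketch of the label format and the fallback, the function below builds the same `"{element}_{wyckoff}_{spacegroup}"` strings from plain lists. `wyckoff_labels` is a hypothetical helper, and the Wyckoff letters and space group number passed in are made-up inputs for illustration, not values computed by a symmetry analysis. In the source, the symmetry analysis runs on a copy of the structure in which every site is replaced by H, so the letters depend on site positions only.

```python
def wyckoff_labels(species, wyckoffs, number):
    """Build per-site labels, falling back to element symbols.

    `wyckoffs` and `number` stand in for the symmetry dataset fields;
    pass None for both to mimic a failed symmetry detection.
    """
    if wyckoffs is None or number is None:
        # Same fallback as the documented behavior: elemental labels.
        return list(species)
    return [f"{element}_{w}_{number}" for element, w in zip(species, wyckoffs)]


# Illustrative inputs only.
print(wyckoff_labels(["Na", "Cl"], ["a", "a"], 221))  # ['Na_a_221', 'Cl_a_221']
print(wyckoff_labels(["Na", "Cl"], None, None))       # ['Na', 'Cl']
```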

Source code in graph_id/analysis/graphs.py
def set_wyckoffs(self, symmetry_tol: float = 0.01) -> None:
    """Set Wyckoff position labels for each site.

    Labels each site with its element, Wyckoff letter, and space group
    number in the format ``"{element}_{wyckoff}_{spacegroup}"``.

    Parameters
    ----------
    symmetry_tol : float, default 0.01
        Tolerance for symmetry detection in Angstroms.

    Notes
    -----
    If symmetry detection fails, falls back to elemental labels.
    """
    siteless_strc = self.structure.copy()

    for site_i in range(len(self.structure)):
        siteless_strc.replace(site_i, Element("H"))

    sga = SpacegroupAnalyzer(siteless_strc, symprec=symmetry_tol)
    sym_dataset = sga.get_symmetry_dataset()

    if sym_dataset is None:
        self.set_elemental_labels()
        return

    wyckoffs = sym_dataset.wyckoffs
    number = sym_dataset.number

    # Build "{element}_{wyckoff}_{spacegroup}" labels, one per site.
    self.starting_labels = [
        f"{self.structure[site_i].species_string}_{w}_{number}" for site_i, w in enumerate(wyckoffs)
    ]

set_compositional_sequence_node_attr(hash_cs: bool = False, wyckoff: bool = False, additional_depth: int = 0, diameter_factor: int = 2, use_previous_cs: bool = False) -> None

Compute and set compositional sequences as node attributes.

This is the core method that computes the local environment fingerprint for each site by traversing the graph and counting neighbors at each depth.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| hash_cs | bool | If True, hash the compositional sequence incrementally during computation for memory efficiency. | False |
| wyckoff | bool | If True, use Wyckoff labels in the computation. | False |
| additional_depth | int | Extra traversal depth to add. | 0 |
| diameter_factor | int | Multiplier for graph diameter to determine traversal depth. | 2 |
| use_previous_cs | bool | If True, use previous compositional sequence as starting labels. | False |

Notes

After calling this method:

  • Node attributes are set with key "compositional_sequence"
  • self.cc_cs contains compositional sequences per component
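
The traversal described above can be sketched on a toy periodic chain with plain Python. The `CompositionalSequence` class is not shown in this section, so everything here is a simplified stand-in: `bonds`, `neighbors`, and `toy_compositional_sequence` are hypothetical names, the rule that each depth counts only newly reached sites is an assumption, and the output string format is invented for readability. Only the depth formula and the `label + "_" + sequence` node attribute follow the source directly.

```python
from collections import Counter

labels = ["Na", "Cl"]  # starting labels, one per site

# bonds[i] lists (neighbor_index, image_offset) along a 1-D periodic chain.
bonds = {0: [(1, 0), (1, -1)], 1: [(0, 0), (0, 1)]}


def neighbors(index, image):
    """Neighbors of site `index` located in periodic image `image`."""
    return [(j, image + offset) for j, offset in bonds[index]]


def toy_compositional_sequence(start, depth):
    """Count the labels of newly reached sites at each traversal depth."""
    seen = {(start, 0)}
    frontier = [(start, 0)]
    shells = []
    for _ in range(depth):
        new_sites = set()
        for index, image in frontier:
            for site in neighbors(index, image):
                if site not in seen:
                    new_sites.add(site)
        seen |= new_sites
        counts = Counter(labels[i] for i, _ in new_sites)
        shells.append("".join(f"{el}{counts[el]}" for el in sorted(counts)))
        frontier = sorted(new_sites)
    return "-".join(shells)


# Depth rule from the source; the two-site graph has diameter d = 1.
diameter_factor, d, additional_depth = 2, 1, 0
depth = diameter_factor * d + additional_depth

cs = toy_compositional_sequence(0, depth)
node_attribute = labels[0] + "_" + cs
print(node_attribute)  # Na_Cl2-Na2
```

Because the depth scales with the diameter of each connected component, larger components are traversed further before their sequences are recorded.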
Source code in graph_id/analysis/graphs.py
def set_compositional_sequence_node_attr(
    self,
    hash_cs: bool = False,
    wyckoff: bool = False,
    additional_depth: int = 0,
    diameter_factor: int = 2,
    use_previous_cs: bool = False,
) -> None:
    """Compute and set compositional sequences as node attributes.

    This is the core method that computes the local environment
    fingerprint for each site by traversing the graph and counting
    neighbors at each depth.

    Parameters
    ----------
    hash_cs : bool, default False
        If True, hash the compositional sequence incrementally
        during computation for memory efficiency.
    wyckoff : bool, default False
        If True, use Wyckoff labels in the computation.
    additional_depth : int, default 0
        Extra traversal depth to add.
    diameter_factor : int, default 2
        Multiplier for graph diameter to determine traversal depth.
    use_previous_cs : bool, default False
        If True, use previous compositional sequence as starting labels.

    Notes
    -----
    After calling this method:

    - Node attributes are set with key ``"compositional_sequence"``
    - ``self.cc_cs`` contains compositional sequences per component
    """
    node_attributes = {}
    self.cc_cs = []
    get_connected_sites_light = functools.lru_cache(maxsize=None)(self.get_connected_sites_light)

    ug = self.graph.to_undirected()

    for cc in nx.connected_components(ug):
        cs_list = []

        d = diameter(ug.subgraph(cc))

        for focused_site_i in cc:
            depth = diameter_factor * d + additional_depth

            cs = CompositionalSequence(
                focused_site_i=focused_site_i,
                starting_labels=self.starting_labels,
                hash_cs=hash_cs,
                use_previous_cs=use_previous_cs or wyckoff,
            )

            for _ in range(depth):
                for c_site in cs.get_current_starting_sites():
                    nsites = get_connected_sites_light(c_site[0], c_site[1])
                    cs.count_composition_for_neighbors(nsites)

                cs.finalize_this_depth()

            this_cs = str(cs)

            node_attributes[focused_site_i] = self.starting_labels[focused_site_i] + "_" + this_cs
            cs_list.append(this_cs)

        self.cc_cs.append({"site_i": cc, "cs_list": cs_list})

    nx.set_node_attributes(self.graph, values=node_attributes, name="compositional_sequence")
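
The shell-by-shell traversal can be tried out without pymatgen. The following is a minimal sketch, assuming a finite graph stored as a plain adjacency dict (so periodic images are ignored); the helper names `eccentricity`, `graph_diameter` and `shell_counts` are hypothetical and not part of graph_id. As far as the listing above shows, the library keeps one counter for the whole traversal, whereas this sketch starts a fresh counter for every shell.

```python
from collections import Counter, deque


def eccentricity(adjacency, start):
    """Largest shortest-path distance from `start`, found by plain BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())


def graph_diameter(adjacency):
    """Diameter of a connected graph: the maximum eccentricity."""
    return max(eccentricity(adjacency, node) for node in adjacency)


def shell_counts(adjacency, labels, start, depth):
    """Count the labels of newly reached nodes at each depth."""
    seen = {start}
    frontier = [start]
    shells = []
    for _ in range(depth):
        counter = Counter()  # assumption: one fresh counter per shell
        next_frontier = []
        for node in frontier:
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
                    counter[labels[nb]] += 1
        shells.append("".join(f"{k}{counter[k]}" for k in sorted(counter)))
        frontier = next_frontier
    return shells


# A six-membered ring with alternating labels.
adjacency = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
labels = ["Na", "Cl"] * 3

diameter_factor, additional_depth = 2, 0
depth = diameter_factor * graph_diameter(adjacency) + additional_depth
shells = shell_counts(adjacency, labels, start=0, depth=depth)
print(depth, shells)
```

On a finite graph the later shells come out empty once every node has been seen; in a periodic structure new images keep appearing, which is why the depth is tied to the graph diameter.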

get_loops(depth: int, index: int, shortest: bool = True)

Find all loops/rings starting from a given atom.

Traverses the graph to find closed loops that start and end at the specified atom index.

Parameters:

  • depth (int, required): Maximum loop size to search for.
  • index (int, required): The starting atom index.
  • shortest (bool, default True): If True, stop searching when all theoretically possible shortest loops are found.

Returns:

  • list of list of tuple: A list of loops, where each loop is a list of (index, image) tuples representing the path.

Notes

Loops are found by breadth-first traversal and tracking when paths return to their starting point.

Source code in graph_id/analysis/graphs.py
def get_loops(self, depth: int, index: int, shortest: bool = True):  # noqa: C901
    """Find all loops/rings starting from a given atom.

    Traverses the graph to find closed loops that start and end at the
    specified atom index.

    Parameters
    ----------
    depth : int
        Maximum loop size to search for.
    index : int
        The starting atom index.
    shortest : bool, default True
        If True, stop searching when all theoretically possible shortest
        loops are found.

    Returns
    -------
    list of list of tuple
        A list of loops, where each loop is a list of ``(index, image)``
        tuples representing the path.

    Notes
    -----
    Loops are found by breadth-first traversal and tracking when paths
    return to their starting point.
    """
    get_connected_sites = functools.lru_cache(maxsize=None)(self.get_connected_sites)

    def find_all_rings(index, ring_list):
        neighbors = get_connected_sites(index, (0, 0, 0))
        for n0, n1 in combinations(neighbors, 2):
            found = False
            for ring in ring_list:
                term0 = ring[1]
                term1 = ring[-2]

                if all(
                    (
                        n0.index == term0[0],
                        n0.jimage == term0[1],
                        n1.index == term1[0],
                        n1.jimage == term1[1],
                    ),
                ):
                    found = True
                    break

                if all(
                    (
                        n1.index == term0[0],
                        n1.jimage == term0[1],
                        n0.index == term1[0],
                        n0.jimage == term1[1],
                    ),
                ):
                    found = True
                    break

            if found is False:
                return False

        return True

    def get_further_lines_from_lines(lines):
        new_lines = []
        for line in lines:
            ind, image = line[-1]
            neighbors = get_connected_sites(ind, image)

            for n in neighbors:
                new_line = [*line, (n.index, n.jimage)]

                # Keep the path only if it has not revisited a site.
                if len(new_line[:-1]) == len(set(new_line[:-1])):
                    new_lines.append(new_line)

        return new_lines

    lines = []
    lines.append([(index, (0, 0, 0))])

    ring_list = []

    for depth_i in range(depth):
        next_lines = []
        lines = get_further_lines_from_lines(lines)

        for line in lines:
            # First and last entries match: the path has closed into a loop.
            if line[0] == line[-1]:
                if depth_i > 1 and list(reversed(line)) not in ring_list:
                    ring_list.append(line)
            else:
                next_lines.append(line)

        lines = next_lines

        # Stop searching once the theoretically expected rings have all been found.
        if shortest and find_all_rings(index, ring_list):
            return ring_list

    return list(ring_list)
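
The path-extension idea behind get_loops can be sketched on a small finite graph. This is a simplified illustration, not the library routine: it assumes a plain adjacency dict, drops the periodic image from each path entry, and omits the `shortest` early exit. The function name `find_rings` is hypothetical.

```python
def find_rings(adjacency, start, max_depth):
    """Grow paths from `start` one step at a time and collect closed ones."""
    lines = [[start]]
    rings = []
    for depth_i in range(max_depth):
        extended = []
        for line in lines:
            for nb in adjacency[line[-1]]:
                new_line = [*line, nb]
                # Keep the path only if everything before the new node is unique.
                if len(new_line[:-1]) == len(set(new_line[:-1])):
                    extended.append(new_line)

        lines = []
        for line in extended:
            if line[0] == line[-1]:
                # Ignore immediate back-and-forth steps and mirrored duplicates.
                if depth_i > 1 and list(reversed(line)) not in rings:
                    rings.append(line)
            else:
                lines.append(line)
    return rings


# A triangle (0, 1, 2) with one pendant node (3) attached to node 2.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rings = find_rings(adjacency, start=0, max_depth=4)
print(rings)
```

The triangle is reported once: the same ring walked in the opposite direction is recognised as the reverse of a stored path and skipped.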

set_loops(diameter_factor: int, additional_depth: int) -> None

Set loop-based labels for each site.

Computes all loops for each site and creates a hashed label representing the ring topology around that site.

Parameters:

  • diameter_factor (int, required): Multiplier for graph diameter to determine search depth.
  • additional_depth (int, required): Extra depth to add to the search.

Notes

Sets self.starting_labels to hashed loop representations. Used when loop=True in GraphIDGenerator.

Source code in graph_id/analysis/graphs.py
def set_loops(self, diameter_factor: int, additional_depth: int) -> None:
    """Set loop-based labels for each site.

    Computes all loops for each site and creates a hashed label
    representing the ring topology around that site.

    Parameters
    ----------
    diameter_factor : int
        Multiplier for graph diameter to determine search depth.
    additional_depth : int
        Extra depth to add to the search.

    Notes
    -----
    Sets ``self.starting_labels`` to hashed loop representations.
    Used when ``loop=True`` in GraphIDGenerator.
    """
    self.starting_labels = []

    undirected_graph = self.graph.to_undirected()

    max_diameter = 0
    for cc in nx.connected_components(undirected_graph):
        d = diameter(undirected_graph.subgraph(cc))
        if d > max_diameter:
            max_diameter = d

    depth = max_diameter * diameter_factor + additional_depth

    for site_i in range(len(self.graph.nodes)):
        all_loops = self.get_loops(depth=depth, index=site_i)
        all_loop_strings = []
        # print(all_loops)
        for loop in all_loops:
            loop_elements = []
            for site_i_jimage in loop:
                loop_species_string = self.structure[site_i_jimage[0]].species_string
                # print(loop_species_string)
                loop_elements.append(loop_species_string)

            loop_elements = standardize_loop(loop_elements)

            seed_str = "-".join(loop_elements)
            hashed_loop = blake2b(seed_str.encode(), digest_size=8).hexdigest()

            all_loop_strings.append(hashed_loop)

        seed_str_all_loops = ":".join(sorted(all_loop_strings))
        hashed_all_loops = blake2b(seed_str_all_loops.encode(), digest_size=8).hexdigest()

        self.starting_labels.append(hashed_all_loops)
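
The two-level hashing in set_loops can be sketched as follows. The implementation of `standardize_loop` is not shown on this page, so the sketch substitutes a hypothetical `canonical_cycle` that picks the lexicographically smallest rotation over both directions; the real helper may normalise loops differently. `hash_label` and `site_label` are likewise illustrative names.

```python
from hashlib import blake2b


def canonical_cycle(elements):
    """Hypothetical stand-in for standardize_loop: smallest rotation, either direction."""
    n = len(elements)
    candidates = []
    for seq in (list(elements), list(reversed(elements))):
        for shift in range(n):
            candidates.append(seq[shift:] + seq[:shift])
    return min(candidates)


def hash_label(text):
    """8-byte BLAKE2b digest as 16 hex characters, as in the listing above."""
    return blake2b(text.encode(), digest_size=8).hexdigest()


def site_label(loops):
    """Hash each loop, then hash the sorted loop hashes into one site label."""
    loop_hashes = [hash_label("-".join(canonical_cycle(loop))) for loop in loops]
    return hash_label(":".join(sorted(loop_hashes)))


# The same two rings, written with different starting atoms and in a different order.
label_a = site_label([["Si", "O", "Si", "O"], ["Si", "O", "Al", "O"]])
label_b = site_label([["O", "Al", "O", "Si"], ["O", "Si", "O", "Si"]])
print(label_a == label_b, len(label_a))
```

Sorting the per-loop hashes before the second hash makes the site label independent of the order in which loops are discovered.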

Extended version of pymatgen's StructureGraph with additional methods for Graph ID generation.

Import

from graph_id.analysis.graphs import StructureGraph

Class Methods

from_local_env_strategy

@staticmethod
def from_local_env_strategy(structure, strategy, weights=False)

Constructor for StructureGraph using a neighbor detection strategy.

Parameters:

  • structure (Structure): pymatgen Structure object
  • strategy (NearNeighbors): A neighbor detection strategy
  • weights (bool): If True, use weights from the strategy

Returns:

  • StructureGraph: Constructed structure graph

Example:

from graph_id.analysis.graphs import StructureGraph
from pymatgen.analysis.local_env import MinimumDistanceNN

sg = StructureGraph.from_local_env_strategy(structure, MinimumDistanceNN())

Instance Methods

set_elemental_labels

def set_elemental_labels(self)

Set elemental species strings as starting labels for compositional sequence computation.

set_wyckoffs

def set_wyckoffs(self, symmetry_tol: float = 0.01)

Set Wyckoff position labels for each site.

Parameters:

  • symmetry_tol (float): Tolerance for symmetry detection

set_compositional_sequence_node_attr

def set_compositional_sequence_node_attr(
    self,
    hash_cs: bool = False,
    wyckoff: bool = False,
    additional_depth: int = 0,
    diameter_factor: int = 2,
    use_previous_cs: bool = False
)

Compute and set compositional sequences as node attributes.

Parameters:

  • hash_cs (bool): Hash the compositional sequence during computation
  • wyckoff (bool): Use Wyckoff-labeled sequences
  • additional_depth (int): Extra traversal depth
  • diameter_factor (int): Multiplier for graph diameter
  • use_previous_cs (bool): Use previous CS as starting point

get_loops

def get_loops(self, depth: int, index: int, shortest: bool = True)

Compute loops/rings starting from a given atom.

Parameters:

  • depth (int): Maximum loop size to search
  • index (int): Starting atom index
  • shortest (bool): Stop when all theoretical shortest loops are found

Returns:

  • list: List of loops, each as a list of (index, image) tuples

CompositionalSequence

Compute the compositional sequence for a site in a structure graph.

A compositional sequence is a fingerprint of the local chemical environment around an atom, computed by traversing the graph in shells and counting the elements encountered at each depth.

For example, for Na in NaCl rock salt structure:

  • Depth 0: Na (the central atom)
  • Depth 1: Cl6 (6 nearest Cl neighbors)
  • Depth 2: Na12 (12 next-nearest Na neighbors)

The sequence "Na-Cl6-Na12-..." uniquely identifies the local environment.

Parameters:

  • focused_site_i (int, required): The index of the central atom.
  • starting_labels (list of str, required): Labels for each site in the structure.
  • hash_cs (bool, default False): If True, hash the sequence incrementally to save memory.
  • use_previous_cs (bool, default False): If True, use previous compositional sequences as labels (for iterative refinement).

Attributes:

  • focused_site_i (int): The central site index.
  • first_element (str): The label of the central site.
  • compositional_seq (list of str): The composition at each depth (if hash_cs=False).
  • cs_for_hashing (str): The incrementally hashed sequence (if hash_cs=True).

Examples:

>>> cs = CompositionalSequence(0, ["Na", "Cl", "Na", "Cl"])
>>> # ... add neighbors at each depth ...
>>> print(str(cs))
'Na-Cl6-Na12...'
Source code in graph_id/analysis/compositional_sequence.py
class CompositionalSequence:

    """Compute the compositional sequence for a site in a structure graph.

    A compositional sequence is a fingerprint of the local chemical environment
    around an atom, computed by traversing the graph in shells and counting
    the elements encountered at each depth.

    For example, for Na in NaCl rock salt structure:

    - Depth 0: Na (the central atom)
    - Depth 1: Cl6 (6 nearest Cl neighbors)
    - Depth 2: Na12 (12 next-nearest Na neighbors)

    The sequence ``"Na-Cl6-Na12-..."`` uniquely identifies the local environment.

    Parameters
    ----------
    focused_site_i : int
        The index of the central atom.
    starting_labels : list of str
        Labels for each site in the structure.
    hash_cs : bool, default False
        If True, hash the sequence incrementally to save memory.
    use_previous_cs : bool, default False
        If True, use previous compositional sequences as labels
        (for iterative refinement).

    Attributes
    ----------
    focused_site_i : int
        The central site index.
    first_element : str
        The label of the central site.
    compositional_seq : list of str
        The composition at each depth (if hash_cs=False).
    cs_for_hashing : str
        The incrementally hashed sequence (if hash_cs=True).

    Examples
    --------
    >>> cs = CompositionalSequence(0, ["Na", "Cl", "Na", "Cl"])
    >>> # ... add neighbors at each depth ...
    >>> print(str(cs))
    'Na-Cl6-Na12...'
    """

    def __init__(self, focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False):
        """Initialize the compositional sequence computation."""
        self.hash_cs = hash_cs
        if hash_cs:
            self.cs_for_hashing = ""
        else:
            self.compositional_seq = []

        self.focused_site_i = focused_site_i
        self.new_sites = [(focused_site_i, (0, 0, 0))]

        self.seen_sites = set(self.new_sites)
        self.use_previous_cs = use_previous_cs
        self.labels = starting_labels
        self.composition_counter: Counter = Counter()
        self.first_element = starting_labels[focused_site_i]

    def __str__(self):
        """Return the string representation of the compositional sequence.

        Returns
        -------
        str
            Format: ``"{first_element}-{depth1}-{depth2}-..."``
        """
        if self.hash_cs:
            return f"{self.first_element}-{self.cs_for_hashing}"  # type: ignore

        return f"{self.first_element}-{'-'.join(self.compositional_seq)}"  # type: ignore

    def get_current_starting_sites(self):
        """Get the sites to expand from for the next depth.

        Returns
        -------
        list of tuple
            List of ``(site_index, jimage)`` tuples for the frontier sites.
        """
        new_sites = self.new_sites
        self.new_sites = []
        return [*new_sites]

    def count_composition_for_neighbors(
        self,
        nsites: list[Neighbor],
    ) -> None:
        """Count the composition of neighboring sites.

        Adds new neighbors to the frontier and counts their labels
        for the current depth.

        Parameters
        ----------
        nsites : list of Neighbor
            The neighboring sites to count.
        """
        for neighbor in nsites:
            neighbor_info = (neighbor.index, neighbor.jimage)

            if neighbor_info not in self.seen_sites:
                self.seen_sites.add(neighbor_info)

                self.new_sites.append(neighbor_info)

                if self.use_previous_cs:
                    cs = self.labels[neighbor.index]
                    self.composition_counter[cs] += 1
                else:
                    self.composition_counter[self.labels[neighbor.index]] += 1

    def finalize_this_depth(self):
        """Finalize counting for the current depth.

        Converts the composition counter to a formula string and
        either appends it to the sequence or hashes it incrementally.
        Resets the counter for the next depth.
        """
        formula = self.get_sorted_composition_list_from(self.composition_counter)

        if self.hash_cs:
            self.cs_for_hashing = blake(f"{self.cs_for_hashing}-{''.join(formula)}")
        else:
            self.compositional_seq.append("".join(formula))

    def get_sorted_composition_list_from(self, composition_counter: Counter) -> list[str]:
        """Convert a composition counter to a sorted formula list.

        Parameters
        ----------
        composition_counter : Counter
            Counts of each element/label.

        Returns
        -------
        list of str
            Sorted list of ``"{element}{count}"`` strings.
        """
        sorted_symbols = sorted(composition_counter.keys())
        return [s + str(formula_double_format(composition_counter[s], ignore_ones=False)) for s in sorted_symbols]
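
How a counter becomes one segment of the sequence can be shown with the standard library alone. This is a minimal sketch, assuming integer counts; it formats the count directly instead of calling pymatgen's `formula_double_format`, and the name `sorted_composition` is hypothetical.

```python
from collections import Counter


def sorted_composition(counter):
    """Return "{label}{count}" strings ordered by label; counts of 1 are kept."""
    return [f"{label}{counter[label]}" for label in sorted(counter)]


shell = Counter(["Na"] * 12 + ["Cl"] * 6)
parts = sorted_composition(shell)
formula = "".join(parts)
print(parts, formula)
```

Sorting by label makes the segment independent of the order in which neighbors were visited.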

Methods:

__init__(focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False)

Initialize the compositional sequence computation.

Source code in graph_id/analysis/compositional_sequence.py
def __init__(self, focused_site_i, starting_labels, hash_cs=False, use_previous_cs=False):
    """Initialize the compositional sequence computation."""
    self.hash_cs = hash_cs
    if hash_cs:
        self.cs_for_hashing = ""
    else:
        self.compositional_seq = []

    self.focused_site_i = focused_site_i
    self.new_sites = [(focused_site_i, (0, 0, 0))]

    self.seen_sites = set(self.new_sites)
    self.use_previous_cs = use_previous_cs
    self.labels = starting_labels
    self.composition_counter: Counter = Counter()
    self.first_element = starting_labels[focused_site_i]

count_composition_for_neighbors(nsites: list[Neighbor]) -> None

Count the composition of neighboring sites.

Adds new neighbors to the frontier and counts their labels for the current depth.

Parameters:

  • nsites (list of Neighbor, required): The neighboring sites to count.

Source code in graph_id/analysis/compositional_sequence.py
def count_composition_for_neighbors(
    self,
    nsites: list[Neighbor],
) -> None:
    """Count the composition of neighboring sites.

    Adds new neighbors to the frontier and counts their labels
    for the current depth.

    Parameters
    ----------
    nsites : list of Neighbor
        The neighboring sites to count.
    """
    for neighbor in nsites:
        neighbor_info = (neighbor.index, neighbor.jimage)

        if neighbor_info not in self.seen_sites:
            self.seen_sites.add(neighbor_info)

            self.new_sites.append(neighbor_info)

            if self.use_previous_cs:
                cs = self.labels[neighbor.index]
                self.composition_counter[cs] += 1
            else:
                self.composition_counter[self.labels[neighbor.index]] += 1

finalize_this_depth()

Finalize counting for the current depth.

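Converts the composition counter to a formula string and either appends it to the sequence or folds it into the running hash. The docstring also states that the counter is reset for the next depth, but the source shown below does not clear composition_counter, so counts carry over from one depth to the next.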

Source code in graph_id/analysis/compositional_sequence.py
def finalize_this_depth(self):
    """Finalize counting for the current depth.

    Converts the composition counter to a formula string and
    either appends it to the sequence or hashes it incrementally.
    Resets the counter for the next depth.
    """
    formula = self.get_sorted_composition_list_from(self.composition_counter)

    if self.hash_cs:
        self.cs_for_hashing = blake(f"{self.cs_for_hashing}-{''.join(formula)}")
    else:
        self.compositional_seq.append("".join(formula))
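
The incremental hashing used when hash_cs=True can be sketched as a simple fold. The `blake` helper called in the listing is not shown on this page, so `blake_hex` below is a hypothetical stand-in built on hashlib's BLAKE2b with an assumed 8-byte digest; `fold_shells` is also an illustrative name.

```python
from hashlib import blake2b


def blake_hex(text):
    """Hypothetical stand-in for the library's `blake` helper."""
    return blake2b(text.encode(), digest_size=8).hexdigest()


def fold_shells(shell_formulas):
    """Fold each depth's formula into one fixed-length running hash."""
    state = ""
    for formula in shell_formulas:
        state = blake_hex(f"{state}-{formula}")
    return state


short = fold_shells(["Cl6", "Na12"])
longer = fold_shells(["Cl6", "Na12", "Cl8"])
print(short, longer)
```

The running state stays the same length however many depths are folded in, which is where the memory saving over storing every formula string comes from.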

get_current_starting_sites()

Get the sites to expand from for the next depth.

Returns:

  • list of tuple: List of (site_index, jimage) tuples for the frontier sites.

Source code in graph_id/analysis/compositional_sequence.py
def get_current_starting_sites(self):
    """Get the sites to expand from for the next depth.

    Returns
    -------
    list of tuple
        List of ``(site_index, jimage)`` tuples for the frontier sites.
    """
    new_sites = self.new_sites
    self.new_sites = []
    return [*new_sites]

Class for computing compositional sequences around an atom.

Import

from graph_id.analysis.compositional_sequence import CompositionalSequence

Constructor

CompositionalSequence(
    focused_site_i,
    starting_labels,
    hash_cs=False,
    use_previous_cs=False
)

Parameters:

  • focused_site_i (int): Index of the central atom
  • starting_labels (list[str]): Labels for each site
  • hash_cs (bool): Hash sequences incrementally
  • use_previous_cs (bool): Use previous sequence as labels

Methods

count_composition_for_neighbors

def count_composition_for_neighbors(self, nsites)

Count the composition of neighboring sites.

finalize_this_depth

def finalize_this_depth(self)

Finalize counting for the current depth level.

String Representation

The string representation gives the full compositional sequence:

cs = CompositionalSequence(0, labels)
# ... compute neighbors ...
print(str(cs))  # "Na-Cl6-Na12-..."

DistanceClusteringNN

Bases: NearNeighbors

Neighbor detection using DBSCAN clustering on interatomic distances.

This class identifies neighbors by clustering the distribution of interatomic distances using the DBSCAN algorithm. This allows for automatic detection of distinct bond length populations, which is useful for structures with multiple bond types or unusual bonding.

The algorithm:

  1. Computes the distances from the target site to every site within a cutoff
  2. Applies DBSCAN clustering (eps=0.5, min_samples=2)
  3. Each cluster represents a distinct bond length population
  4. Neighbors are assigned to clusters by their distance

Examples:

>>> from graph_id.analysis.local_env import DistanceClusteringNN
>>> nn = DistanceClusteringNN()
>>> neighbors = nn.get_nn_info(structure, n=0, rank_k=0)
See Also

DistanceClusteringGraphID : Graph ID generator using this class

Source code in graph_id/analysis/local_env.py
class DistanceClusteringNN(NearNeighbors):

    """Neighbor detection using DBSCAN clustering on interatomic distances.

    This class identifies neighbors by clustering the distribution of
    interatomic distances using the DBSCAN algorithm. This allows for
    automatic detection of distinct bond length populations, which is
    useful for structures with multiple bond types or unusual bonding.

    The algorithm:

    1. Computes the distances from the target site to every site within a cutoff
    2. Applies DBSCAN clustering (eps=0.5, min_samples=2)
    3. Each cluster represents a distinct bond length population
    4. Neighbors are assigned to clusters by their distance

    Examples
    --------
    >>> from graph_id.analysis.local_env import DistanceClusteringNN
    >>> nn = DistanceClusteringNN()
    >>> neighbors = nn.get_nn_info(structure, n=0, rank_k=0)

    See Also
    --------
    DistanceClusteringGraphID : Graph ID generator using this class
    """

    def __init__(self) -> None:
        """Initialize the DistanceClusteringNN neighbor finder."""

    @property
    def structures_allowed(self) -> bool:
        """Check if this neighbor finder can be used with Structure objects.

        Returns
        -------
        bool
            Always True for this class.
        """
        return True

    def get_nn_info(self, structure: Structure, n: int, rank_k: int, cutoff: float = 6.0) -> list[dict[str, Any]]:
        """Get neighbor information for a specific site and distance cluster.

        Parameters
        ----------
        structure : Structure
            The input pymatgen Structure.
        n : int
            Index of the site to find neighbors for.
        rank_k : int
            The distance cluster index (0-based). Cluster 0 contains the
            shortest bonds, cluster 1 the next shortest, etc.
        cutoff : float, default 6.0
            Maximum distance cutoff in Angstroms.

        Returns
        -------
        list of dict
            List of neighbor information dictionaries, each containing:

            - ``site``: The neighbor Site object
            - ``image``: Periodic image indices (i, j, k)
            - ``weight``: The bond distance (rounded to 3 decimals)
            - ``site_index``: Index of the neighbor in the structure
            - ``edge_properties``: Dict with ``cluster_idx`` key
        """
        site = structure[n]
        cutoff_cluster_list = self.get_cutoff_cluster(structure, n, cutoff)
        if len(cutoff_cluster_list) <= rank_k:
            return []

        neighs_dists = structure.get_neighbors(site, cutoff_cluster_list[rank_k])
        max_weight = round(cutoff_cluster_list[rank_k], 3)
        # is_periodic = isinstance(structure, Structure | IStructure)  # supported only on Python 3.10 and later
        is_periodic = isinstance(structure, (IStructure, Structure))
        siw = []

        for nn in neighs_dists:
            weight = round(nn.nn_distance, 3)
            if (rank_k > 0 and weight <= max_weight and weight > round(cutoff_cluster_list[rank_k - 1], 3)) or (
                rank_k == 0 and weight <= max_weight
            ):
                siw.append(
                    {
                        "site": nn,
                        "image": self._get_image(structure, nn) if is_periodic else None,
                        "weight": weight,
                        "site_index": self._get_original_site(structure, nn),
                        "edge_properties": {"cluster_idx": rank_k + 1},
                    },
                )

        return siw

    def get_cutoff_cluster(self, structure: Structure, n: int, cutoff: float = 6.0) -> list:
        """Get distance cluster cutoffs using DBSCAN clustering.

        Computes all interatomic distances from site n within the cutoff,
        then clusters them using DBSCAN. Returns the maximum distance in
        each cluster as cutoff thresholds.

        Parameters
        ----------
        structure : Structure
            The input pymatgen Structure.
        n : int
            Index of the central site.
        cutoff : float, default 6.0
            Maximum distance to consider in Angstroms.

        Returns
        -------
        list of float
            Sorted list of maximum distances for each cluster.
            The i-th element is the cutoff for cluster i.

        Notes
        -----
        Uses DBSCAN with eps=0.5 and min_samples=2 to cluster distances.
        Distances that don't fit into any cluster are ignored.
        """
        # # Build a supercell and enumerate the bond lengths up to 6.0 Å
        # copy_structure = structure.copy()
        # supercell = copy_structure.make_supercell([3, 3, 3])
        # site_i = structure[n]

        # site_index = None
        # for idx, site in enumerate(supercell):
        #     # For some reason, Site's distance method does not compute the distance correctly here
        #     if float(np.linalg.norm(site_i.coords - site.coords)) < 0.01:
        #         site_index = idx
        #         break

        distance_list = []
        neighbors = structure.get_sites_in_sphere(structure[n].coords, cutoff)
        for neighbor in neighbors:
            dist = neighbor.nn_distance
            distance_list.append([dist, 0])

        dbscan = DBSCAN(eps=0.5, min_samples=2)
        dbscan.fit(distance_list)
        labels = dbscan.labels_

        max_dist_list = [0 for _ in range(max(labels) + 1)]
        for label_number in range(max(labels) + 1):
            max_dist = 0
            for label, distance in zip(labels, distance_list, strict=False):
                if label == label_number:
                    max_dist = max(max_dist, distance[0])

            max_dist_list[label_number] = max_dist

        return sorted(max_dist_list)

Attributes

structures_allowed: bool property

Check if this neighbor finder can be used with Structure objects.

Returns:

  • bool: Always True for this class.

Methods:

__init__() -> None

Initialize the DistanceClusteringNN neighbor finder.

Source code in graph_id/analysis/local_env.py
def __init__(self) -> None:
    """Initialize the DistanceClusteringNN neighbor finder."""

get_nn_info(structure: Structure, n: int, rank_k: int, cutoff: float = 6.0) -> list[dict[str, Any]]

Get neighbor information for a specific site and distance cluster.

Parameters:

  • structure (Structure, required): The input pymatgen Structure.
  • n (int, required): Index of the site to find neighbors for.
  • rank_k (int, required): The distance cluster index (0-based). Cluster 0 contains the shortest bonds, cluster 1 the next shortest, etc.
  • cutoff (float, default 6.0): Maximum distance cutoff in Angstroms.

Returns:

  • list of dict: List of neighbor information dictionaries, each containing:
      • site: The neighbor Site object
      • image: Periodic image indices (i, j, k)
      • weight: The bond distance (rounded to 3 decimals)
      • site_index: Index of the neighbor in the structure
      • edge_properties: Dict with cluster_idx key, set to rank_k + 1 (1-based, unlike rank_k)
Source code in graph_id/analysis/local_env.py
def get_nn_info(self, structure: Structure, n: int, rank_k: int, cutoff: float = 6.0) -> list[dict[str, Any]]:
    """Get neighbor information for a specific site and distance cluster.

    Parameters
    ----------
    structure : Structure
        The input pymatgen Structure.
    n : int
        Index of the site to find neighbors for.
    rank_k : int
        The distance cluster index (0-based). Cluster 0 contains the
        shortest bonds, cluster 1 the next shortest, etc.
    cutoff : float, default 6.0
        Maximum distance cutoff in Angstroms.

    Returns
    -------
    list of dict
        List of neighbor information dictionaries, each containing:

        - ``site``: The neighbor Site object
        - ``image``: Periodic image indices (i, j, k)
        - ``weight``: The bond distance (rounded to 3 decimals)
        - ``site_index``: Index of the neighbor in the structure
        - ``edge_properties``: Dict with ``cluster_idx`` key
    """
    site = structure[n]
    cutoff_cluster_list = self.get_cutoff_cluster(structure, n, cutoff)
    if len(cutoff_cluster_list) <= rank_k:
        return []

    neighs_dists = structure.get_neighbors(site, cutoff_cluster_list[rank_k])
    max_weight = round(cutoff_cluster_list[rank_k], 3)
    # is_periodic = isinstance(structure, Structure | IStructure)  # supported only on Python 3.10 and later
    is_periodic = isinstance(structure, (IStructure, Structure))
    siw = []

    for nn in neighs_dists:
        weight = round(nn.nn_distance, 3)
        if (rank_k > 0 and weight <= max_weight and weight > round(cutoff_cluster_list[rank_k - 1], 3)) or (
            rank_k == 0 and weight <= max_weight
        ):
            siw.append(
                {
                    "site": nn,
                    "image": self._get_image(structure, nn) if is_periodic else None,
                    "weight": weight,
                    "site_index": self._get_original_site(structure, nn),
                    "edge_properties": {"cluster_idx": rank_k + 1},
                },
            )

    return siw
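
The way rank_k picks out one distance shell can be sketched on plain numbers. This is an illustration only, assuming the cluster cutoffs are already known; `select_shell` is a hypothetical helper that mirrors the rounding and comparison logic of the listing above without touching pymatgen objects.

```python
def select_shell(distances, cutoffs, rank_k):
    """Return the rounded distances that fall inside shell `rank_k`."""
    if len(cutoffs) <= rank_k:
        return []  # no such cluster

    upper = round(cutoffs[rank_k], 3)
    # Shell 0 has no lower bound; later shells start just above the previous cutoff.
    lower = round(cutoffs[rank_k - 1], 3) if rank_k > 0 else None

    selected = []
    for dist in distances:
        weight = round(dist, 3)
        if weight <= upper and (lower is None or weight > lower):
            selected.append(weight)
    return selected


cutoffs = [2.81, 3.98]
distances = [2.8, 2.81, 3.97, 3.98, 5.6]
first = select_shell(distances, cutoffs, 0)
second = select_shell(distances, cutoffs, 1)
third = select_shell(distances, cutoffs, 2)
print(first, second, third)
```

Each shell is a half-open window, so a distance equal to a cutoff belongs to the lower shell and is never counted twice.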

get_cutoff_cluster(structure: Structure, n: int, cutoff: float = 6.0) -> list

Get distance cluster cutoffs using DBSCAN clustering.

Computes all interatomic distances from site n within the cutoff, then clusters them using DBSCAN. Returns the maximum distance in each cluster as cutoff thresholds.

Parameters:

  • structure (Structure, required): The input pymatgen Structure.
  • n (int, required): Index of the central site.
  • cutoff (float, default 6.0): Maximum distance to consider in Angstroms.

Returns:

  • list of float: Sorted list of maximum distances for each cluster. The i-th element is the cutoff for cluster i.

Notes

Uses DBSCAN with eps=0.5 and min_samples=2 to cluster distances. Distances that don't fit into any cluster are ignored.

Source code in graph_id/analysis/local_env.py
def get_cutoff_cluster(self, structure: Structure, n: int, cutoff: float = 6.0) -> list:
    """Get distance cluster cutoffs using DBSCAN clustering.

    Computes all interatomic distances from site n within the cutoff,
    then clusters them using DBSCAN. Returns the maximum distance in
    each cluster as cutoff thresholds.

    Parameters
    ----------
    structure : Structure
        The input pymatgen Structure.
    n : int
        Index of the central site.
    cutoff : float, default 6.0
        Maximum distance to consider in Angstroms.

    Returns
    -------
    list of float
        Sorted list of maximum distances for each cluster.
        The i-th element is the cutoff for cluster i.

    Notes
    -----
    Uses DBSCAN with eps=0.5 and min_samples=2 to cluster distances.
    Distances that don't fit into any cluster are ignored.
    """
    # # Build a supercell and enumerate the bond lengths up to 6.0 Å
    # copy_structure = structure.copy()
    # supercell = copy_structure.make_supercell([3, 3, 3])
    # site_i = structure[n]

    # site_index = None
    # for idx, site in enumerate(supercell):
    #     # For some reason, Site's distance method does not compute the distance correctly here
    #     if float(np.linalg.norm(site_i.coords - site.coords)) < 0.01:
    #         site_index = idx
    #         break

    distance_list = []
    neighbors = structure.get_sites_in_sphere(structure[n].coords, cutoff)
    for neighbor in neighbors:
        dist = neighbor.nn_distance
        distance_list.append([dist, 0])

    dbscan = DBSCAN(eps=0.5, min_samples=2)
    dbscan.fit(distance_list)
    labels = dbscan.labels_

    max_dist_list = [0 for _ in range(max(labels) + 1)]
    for label_number in range(max(labels) + 1):
        max_dist = 0
        for label, distance in zip(labels, distance_list, strict=False):
            if label == label_number:
                max_dist = max(max_dist, distance[0])

        max_dist_list[label_number] = max_dist

    return sorted(max_dist_list)

Neighbor detection based on DBSCAN clustering of interatomic distances.

Import

from graph_id.analysis.local_env import DistanceClusteringNN

Constructor

DistanceClusteringNN()

Methods

get_nn_info

def get_nn_info(
    self,
    structure: Structure,
    n: int,
    rank_k: int,
    cutoff: float = 6.0
) -> list[dict]

Get neighbor information for a specific site and distance cluster.

Parameters:

  • structure (Structure): Input structure
  • n (int): Site index
  • rank_k (int): Cluster index (0-based)
  • cutoff (float): Maximum distance cutoff

Returns:

  • list[dict]: List of neighbor information dictionaries

get_cutoff_cluster

def get_cutoff_cluster(
    self,
    structure: Structure,
    n: int,
    cutoff: float = 6.0
) -> list

Get distance cutoffs for each cluster using DBSCAN.

Parameters:

  • structure (Structure): Input structure
  • n (int): Site index
  • cutoff (float): Maximum distance to consider

Returns:

  • list: Sorted list of maximum distances for each cluster

How DBSCAN Clustering Works

The algorithm:

  1. Computes the distances from the chosen site to every site within the cutoff
  2. Runs DBSCAN with eps=0.5 and min_samples=2
  3. Groups distances into clusters
  4. Returns cutoffs as the maximum distance in each cluster

This is useful for structures with distinct bond length populations.
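
The clustering step can be approximated with the standard library. The sketch below does not call scikit-learn's DBSCAN as the library does; it relies on the observation that, for one-dimensional data with min_samples=2, DBSCAN should reduce to grouping sorted values whose consecutive gaps are at most eps and discarding isolated values as noise. The function name `cluster_cutoffs` is hypothetical.

```python
def cluster_cutoffs(distances, eps=0.5):
    """Group sorted distances by gaps <= eps; return the maximum of each group."""
    ordered = sorted(distances)
    if not ordered:
        return []

    runs = [[ordered[0]]]
    for dist in ordered[1:]:
        if dist - runs[-1][-1] <= eps:
            runs[-1].append(dist)  # close enough to extend the current group
        else:
            runs.append([dist])  # gap too large: start a new group

    # A group of one has no neighbor within eps, so it is treated as noise.
    return [run[-1] for run in runs if len(run) >= 2]


distances = [3.98, 2.8, 2.8, 5.6, 2.81, 3.97]
cutoffs = cluster_cutoffs(distances)
print(cutoffs)
```

Here the lone distance at 5.6 is dropped as noise, and the two remaining groups yield one cutoff each, already in ascending order.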