MATHEMATICAL MODELLING OF TECHNOLOGICAL PROCESSES AND SYSTEMS

Topology as a lens for semantic organization in transformer embeddings

  • 1 Faculty of Computer Science and Engineering, “Ss. Cyril and Methodius” University, Skopje, Macedonia

Abstract

This paper examines the geometric structure of sentence embeddings through the lens of persistent homology. The goal is to determine whether semantic similarity produces distinctive topological patterns in a controlled embedding environment. To isolate semantic effects, a single sentence template was combined with different target words, forming two point clouds in a transformer embedding space: one derived from semantically similar words and one from dissimilar words. A Vietoris–Rips filtration was applied to both clouds, and the resulting persistence diagrams were summarized by average lifetime, entropy of birth–death intervals, and the area under the Betti curve. The results show a coherent difference across topological dimensions: similar words generate stable connected components with lower variability, while dissimilar words produce a richer set of cycle features that persist across a broader range of scales. These findings indicate that persistent homology can capture multi-scale structural differences in embedding spaces that are not visible through standard distance-based comparisons. Although the experiment is intentionally simple, it highlights the potential of topological methods for studying how semantic structure is distributed across levels of a neural embedding space.
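The pipeline described above can be illustrated in part with a minimal sketch, which is not the authors' implementation. It covers dimension 0 only, where every connected component of a Vietoris–Rips filtration is born at scale 0 and dies at the length of a minimum-spanning-tree edge. The function names `h0_deaths` and `summarize` and the toy 2-D points are hypothetical; the paper uses transformer sentence embeddings, and cycle (dimension 1) features would require a dedicated persistent homology library. The sketch also assumes the Betti curve counts finite bars only, in which case its area equals the sum of the lifetimes.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Death scales of the finite H0 bars of a Vietoris-Rips filtration.

    All components are born at scale 0. Two components merge when the
    shortest edge between them enters the filtration, so the death scales
    are the edge lengths of a minimum spanning tree (Kruskal's algorithm).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite bars; the single infinite bar is omitted


def summarize(lifetimes):
    """Average lifetime, persistence entropy, and Betti-curve area."""
    total = sum(lifetimes)
    if not lifetimes or total == 0:
        raise ValueError("need at least one bar with positive lifetime")
    probs = [life / total for life in lifetimes if life > 0]
    return {
        "mean_lifetime": total / len(lifetimes),
        "entropy": -sum(p * math.log(p) for p in probs),
        # The integral of the Betti curve over finite bars equals the
        # sum of the bar lengths.
        "betti_area": total,
    }


# Hypothetical toy clouds standing in for the two embedding point clouds.
tight_cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
spread_cloud = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
tight_summary = summarize(h0_deaths(tight_cloud))
spread_summary = summarize(h0_deaths(spread_cloud))
```

Comparing `tight_summary` with `spread_summary` mirrors the kind of contrast the abstract reports for connected components, though only on toy data and only in dimension 0.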
