Datasets
DeepCosmoNet datasets are grouped into two categories: the N-body Simulation Dataset (A) and
the Cosmological Structures Catalogues (B), which comprise the Sub-halo Catalogue and the Void Catalogue.
Each dataset comes with a minimal open sample (CSV, 5 rows) for quick inspection and the full files in compact formats (e.g. .csv) for research use.
FAIR artefacts (metadata, README, provenance, dictionary, and citation) are being added incrementally and are clearly marked below.
Need the full files? See “Licence & Citation” for terms and preferred citation, then follow the repository or the contact instructions where noted.
A. N-body Simulation Dataset
A1. N-body simulation sample DEMNUni (1 Gpc)
Overview. The DEMNUni simulations are a suite of large-scale N-body cosmological simulations that investigate the clustering of cosmic structures in the presence of massive neutrinos. This dataset is a sample from a simulation with a box side length of 1 Gpc. The particle data from these simulations typically include the following columns: x, y, z (positions), iord (a unique particle identifier), velx, vely, velz (velocities), and mass.
Intended use: training and validation of our pipeline.
Primary files
Last updated: 2025-08-23
Preview (CSV)
First 5 rows from a spatial subset of the dataset; the full subset is available via the .csv download above.
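As a rough illustration of how the preview could be inspected, the sketch below parses particle rows using only the Python standard library. It assumes the CSV has a header row with exactly the column names listed in the overview (x, y, z, iord, velx, vely, velz, mass); the real header, units, and file name are not stated here. The embedded rows are placeholder values for illustration, not DEMNUni data, and `read_particles` is a hypothetical helper name.

```python
import csv
import io

# Column names as listed in the overview; the real file header is an assumption.
EXPECTED_COLUMNS = ["x", "y", "z", "iord", "velx", "vely", "velz", "mass"]

# Placeholder rows for illustration only (NOT real DEMNUni values).
SAMPLE_TEXT = """x,y,z,iord,velx,vely,velz,mass
1.0,2.0,3.0,101,10.0,-5.0,0.5,1.0
4.0,5.0,6.0,102,-2.0,3.0,1.5,1.0
"""


def read_particles(handle):
    """Parse particle rows: iord becomes int, every other column becomes float."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    particles = []
    for row in reader:
        particle = {c: float(row[c]) for c in EXPECTED_COLUMNS if c != "iord"}
        particle["iord"] = int(row["iord"])
        particles.append(particle)
    return particles


particles = read_particles(io.StringIO(SAMPLE_TEXT))
total_mass = sum(p["mass"] for p in particles)
print(len(particles), total_mass)
```

To read a downloaded file instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.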
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE · citation.bib
B. Cosmological Structures Catalogues
B1. Sub-halo Catalogue
Overview. This catalogue contains sub-halos identified by our pipeline from a subset of the N-body simulation.
Intended use: analysing and studying the cosmic web.
Primary files
Last updated:
Preview (CSV)
First 5 rows from a tiny sample file.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE · citation.txt
B2. Void Catalogue
Overview. This catalogue contains cosmic voids identified by our pipeline, which uses a 3D YOLO-like architecture to process voxelised data from the N-body simulation. Each entry gives the centre and radius of a spherical void; detections are validated with metrics such as spherical Intersection over Union (IoU).
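The voxelisation step mentioned above can be pictured as binning particle positions onto a regular grid. The following is a minimal sketch under stated assumptions, not the project's actual preprocessing: it assumes a cubic box, a periodic wrap of coordinates (common for N-body boxes but not confirmed here), and plain particle counts per voxel. The function name `voxelise` and its parameters are hypothetical.

```python
def voxelise(positions, box_size, n_voxels):
    """Count particles per voxel on a regular n_voxels^3 grid over [0, box_size)^3."""
    grid = [[[0] * n_voxels for _ in range(n_voxels)] for _ in range(n_voxels)]
    cell = box_size / n_voxels

    def index(coord):
        # Periodic wrap (assumption), then clamp to guard against
        # floating-point results that land exactly on the upper edge.
        return min(int((coord % box_size) / cell), n_voxels - 1)

    for x, y, z in positions:
        grid[index(x)][index(y)][index(z)] += 1
    return grid


# Placeholder positions in arbitrary units (not simulation data).
positions = [(0.5, 0.5, 0.5), (0.6, 0.4, 0.5), (9.9, 9.9, 9.9), (10.5, 0.1, 0.1)]
grid = voxelise(positions, box_size=10.0, n_voxels=2)
print(grid[0][0][0], grid[1][1][1])
```

For realistic particle counts an array library would be the natural choice (for example `numpy.histogramdd`); the pure-Python version is kept here only so that the logic is easy to follow.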
Intended use: studying large-scale cosmic structures and comparing results against other void-detection methods.
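When comparing two spherical void detections (for instance a catalogue entry against a reference), the spherical IoU mentioned in the overview can be computed in closed form from the centres and radii. The sketch below uses the standard sphere-sphere intersection volume and a plain Euclidean distance; whether the project's metric accounts for periodic boundaries or uses a different convention is not stated here, so treat this as an illustration. The function names are hypothetical.

```python
import math


def sphere_volume(r):
    """Volume of a sphere of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def sphere_intersection_volume(c1, r1, c2, r2):
    """Overlap volume of two spheres given centres (x, y, z) and radii."""
    d = math.dist(c1, c2)  # Euclidean distance, no periodic wrap (assumption)
    if d >= r1 + r2:
        return 0.0  # disjoint or just touching
    if d <= abs(r1 - r2):
        return sphere_volume(min(r1, r2))  # one sphere inside the other
    # Lens-shaped overlap of two partially intersecting spheres.
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def spherical_iou(c1, r1, c2, r2):
    """Intersection over Union of two spheres."""
    inter = sphere_intersection_volume(c1, r1, c2, r2)
    union = sphere_volume(r1) + sphere_volume(r2) - inter
    return inter / union if union > 0.0 else 0.0


# Two unit spheres whose centres are one radius apart.
iou = spherical_iou((0.0, 0.0, 0.0), 1.0, (1.0, 0.0, 0.0), 1.0)
print(round(iou, 4))
```

For two unit spheres separated by one radius the overlap is 5π/12 and the union 27π/12, so the IoU is 5/27.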
Primary files
Last updated:
Preview (CSV)
First 5 rows from a tiny sample file.
FAIR artefacts (status)
- Metadata record · metadata.json
- README · README.md
- Data dictionary · dictionary.csv
- Provenance & methods · provenance.md
- Licensing & citation · LICENCE · citation.txt
Source Code
Heads up: FAIR artefacts are being published in stages. Items marked “Coming soon” will appear in the next updates; “External” links point to project-controlled sources (e.g., GitHub or a data catalogue) when appropriate.
DeepCosmoNet Halos Core Repository
Planned contents
- Training & evaluation scripts
- Model architectures
- Data loaders and preprocessing utilities
- Reproducible configs
DeepCosmoNet Voids Core Repository
Planned contents
- Training & evaluation scripts
- Model architectures
- Data loaders and preprocessing utilities
- Reproducible configs
Publications
This section lists journal & conference submissions, technical diagrams/notes, and selected
Journal & Conference
- HALOS: Hierarchical Aggregation Learning for Overdensity Search
  Scope: instance segmentation of cosmic web halos and subhalos.
- 3D YOLO-like Detector for Cosmic Voids: A Multi-Scale Deep Learning Approach to Large-Scale Underdense Structures
  Abstract · Preprint (coming soon)
  Scope: instance segmentation of cosmic web classes.
Diagrams & Technical Notes
- Schematic of the HALOS segmentation and identification pipeline.
- Predicted vs. true density distribution.
- Density distribution on the 1 Gpc simulation box.
- Subhalo mass function comparison on the 0.5 Gpc box.
- Subhalo mass function comparison on the 1 Gpc box.
- Feature importance ranking for the high-density regime.
- Visualizations of the first identified halo.
- Comparison plots between predicted and reference subhalos.
Licence & Citation
To support ethical reuse and proper attribution, DeepCosmoNet provides default licensing and citation templates for datasets and software.
Important: if a dataset or repository includes its own LICENSE, citation.txt, or DOI,
that local file overrides the defaults below. Always prefer the per-item files when present.
If you adapt the datasets or code, indicate changes and, where practical, link back to this hub so others can find the original materials.
Licence & how to cite
Licence (default): Creative Commons Attribution 4.0 International (CC BY 4.0). You must provide appropriate credit and indicate if changes were made. Read the licence.
Recommended citation for the HALOS paper (plain text)
Fabio Spampinato, Vincenzo Del Zoppo, Giuseppe Puglisi, Alessio Mezzina, Marco Cataldo, Jean Marc Christille, Matteo Calabrese, Luca Naso, Carmelita Carbone,
HALOS: Hierarchical Aggregation Learning for Overdensity Search,
Astronomy and Computing,
Volume 56,
2026,
101114,
ISSN 2213-1337,
https://doi.org/10.1016/j.ascom.2026.101114.
(https://www.sciencedirect.com/science/article/pii/S2213133726000569)
Abstract: The identification of gravitationally bound substructures (subhalos) within cosmological simulations is a cornerstone for understanding galaxy formation and evolution. Traditional algorithms, while accurate, are often computationally intensive, posing a significant bottleneck for the analysis of next-generation cosmological simulations and limiting the feasibility of on-the-fly processing. In this work, we introduce HALOS (Hierarchical Aggregation Learning for Overdensity Search), a novel deep learning pipeline for subhalo identification in 3D point clouds. Our method employs a multi-stage approach that decouples particle classification from instance segmentation. First, we engineer a set of physically motivated features for each particle. Second, a multi-layer perceptron simultaneously performs two tasks: (i) a semantic segmentation to classify particles as either bound to a subhalo or part of the unbound background, and (ii) a regression to predict the 3D coordinates of the parent subhalo centroid for each bound particle. Finally, the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) algorithm performs a density-based clustering exclusively on the pre-filtered set of bound particles, significantly reducing computational complexity. We train and validate our model using catalogues generated from cosmological N-body simulations by the SUBFIND algorithm. HALOS achieves a semantic classification accuracy of 95%, an Adjusted Rand Index for instance segmentation >90%, and an overall Completeness of 90%, demonstrating a close alignment with SUBFIND, while reducing computational time by a factor of ∼ 16.
Keywords: Subhalos; Cosmology; Deep learning; Clustering; Particle segmentation; SUBFIND; Halo-finders; 3D point clouds
Data URL: https://deepcosmonet.koexai.com/resources/ Licence: CC BY 4.0.
BibTeX for the HALOS paper (template)
@article{SPAMPINATO2026101114,
title = {HALOS: Hierarchical Aggregation Learning for Overdensity Search},
journal = {Astronomy and Computing},
volume = {56},
pages = {101114},
year = {2026},
issn = {2213-1337},
doi = {10.1016/j.ascom.2026.101114},
url = {https://www.sciencedirect.com/science/article/pii/S2213133726000569},
author = {Fabio Spampinato and Vincenzo {Del Zoppo} and Giuseppe Puglisi and Alessio Mezzina and Marco Cataldo and Jean Marc Christille and Matteo Calabrese and Luca Naso and Carmelita Carbone},
keywords = {Subhalos, Cosmology, Deep learning, Clustering, Particle segmentation, SUBFIND, Halo-finders, 3D point clouds},
abstract = {The identification of gravitationally bound substructures (subhalos) within cosmological simulations is a cornerstone for understanding galaxy formation and evolution. Traditional algorithms, while accurate, are often computationally intensive, posing a significant bottleneck for the analysis of next-generation cosmological simulations and limiting the feasibility of on-the-fly processing. In this work, we introduce HALOS (Hierarchical Aggregation Learning for Overdensity Search), a novel deep learning pipeline for subhalo identification in 3D point clouds. Our method employs a multi-stage approach that decouples particle classification from instance segmentation. First, we engineer a set of physically motivated features for each particle. Second, a multi-layer perceptron simultaneously performs two tasks: (i) a semantic segmentation to classify particles as either bound to a subhalo or part of the unbound background, and (ii) a regression to predict the 3D coordinates of the parent subhalo centroid for each bound particle. Finally, the HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) algorithm performs a density-based clustering exclusively on the pre-filtered set of bound particles, significantly reducing computational complexity. We train and validate our model using catalogues generated from cosmological N-body simulations by the SUBFIND algorithm. HALOS achieves a semantic classification accuracy of 95%, an Adjusted Rand Index for instance segmentation >90%, and an overall Completeness of 90%, demonstrating a close alignment with SUBFIND, while reducing computational time by a factor of ∼ 16.}
}
Software — Licence & how to cite
Licence (intended): GNU General Public License v3 (GPL V3), to be confirmed in the repository.
A copy of the licence will be included as LICENSE in the repo.
About GPL V3.
Recommended software citation (plain text)
DeepCosmoNet Project (2025). DeepCosmoNet Core (v0.1) — Deep Learning for cosmic web Analysis. Source code. URL: https://deepcosmonet.koexai.com/resources/ Licence: GPL V3.
Software BibTeX (template)
@software{deepcosmonet_core_v0_1_2025,
author = {Koexai Srl},
title = {DeepCosmoNet Source Code},
year = {2025},
version = {0.1},
url = {https://deepcosmonet.koexai.com/resources/},
license = {GPL V3},
note = {Replace with repository URL and tag when public}
}