Welcome to the Data Clumps Dataset repository, a comprehensive collection designed for research purposes. This repository hosts a curated set of projects analyzed for data clumps, providing valuable insights for researchers and practitioners in the field.
The repository is structured as follows:
Each project within the Projects directory contains its own set of tags, encapsulating the unique aspects of data clumps found within that project.
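The layout described above can be explored with a short script. The following is a minimal Python sketch, assuming a local clone in which `Projects/` holds one subdirectory per project and every entry inside a project directory corresponds to one tag. The function name `list_project_tags` is hypothetical and not part of this repository, and the exact on-disk form of a tag (file or directory) is an assumption.

```python
from pathlib import Path


def list_project_tags(projects_dir):
    """Map each project directory name to the sorted names of its tag entries.

    Assumption: every non-hidden entry directly inside a project directory
    represents one tag. Adjust the filter if the real layout differs.
    """
    root = Path(projects_dir)
    tags_by_project = {}
    # Only directories directly below the root count as projects.
    for project in sorted(p for p in root.iterdir() if p.is_dir()):
        tags_by_project[project.name] = sorted(
            entry.name
            for entry in project.iterdir()
            if not entry.name.startswith(".")  # skip hidden files such as .DS_Store
        )
    return tags_by_project
```

Calling `list_project_tags("Projects")` from the repository root returns a dictionary keyed by project name, which makes it easy to count the tags per project before loading any reports.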
To understand the format of the data clumps report, please refer to our detailed definition, data-clumps-type-context. It provides the necessary context and describes the types of data clumps that we have identified in our research.
We have collected and analyzed the following projects for data clumps:
- Android-Universal-Image-Loader
- BroadleafCommerce
- antlr4
- argouml
- caffeine
- ceylon-ide-eclipse
- dolphinscheduler
- elasticsearch
- hazelcast
- jflex
- jfreechart
- junit4
- junit5
- mapdb
- mcMMO
- neo4j
- netty
- orientdb
- oryx
- rocketmq
- spring-boot
- titan
- xerces2-j
Each project offers a unique perspective and contributes significantly to our understanding of data clumps in software development.
We analyzed the following UML model datasets from https://github.com/NilsBaumgartner1994/UML-Class-Diagram-Dataset:
- LindholmenDb & Public Repos.
- ModelsDb
Related publications on data clumps:
- Live Code Smell Detection of Data Clumps in an Integrated Development Environment - Baumgartner, N., Adleh, F., & Pulvermüller, E. (2023). Live Code Smell Detection of Data Clumps in an Integrated Development Environment. In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering. 18th International Conference on Evaluation of Novel Approaches to Software Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0011727500003464
- Advancing Code Smell Detection: Live Global Data Detection and Data Clumps Testcases - Baumgartner, N., Adleh, F., & Pulvermüller, E. (2024). Advancing Code Smell Detection: Live Global Data Detection and Data Clumps Testcases. In Communications in Computer and Information Science (pp. 230–250). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-64182-4_11
- An Extensive Analysis of Data Clumps in UML Class Diagrams - Baumgartner, N., & Pulvermüller, E. (2024). An Extensive Analysis of Data Clumps in UML Class Diagrams. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE (pp. 15–26). ISBN 978-989-758-696-5. SciTePress. https://doi.org/10.5220/0012550500003687
- Considerations in Prioritizing for Efficiently Refactoring the Data Clumps Model Smell: A Preliminary Study - Baumgartner, N., Iyenghar, P., & Pulvermüller, E. (2024). Considerations in Prioritizing for Efficiently Refactoring the Data Clumps Model Smell: A Preliminary Study. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE (pp. 144–155). ISBN 978-989-758-696-5. SciTePress. https://doi.org/10.5220/0012698000003687
- Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment - Baumgartner, N., & Pulvermüller, E. (2024). Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2403.03903
- AI-Driven Refactoring: A Pipeline for Identifying and Correcting Data Clumps in Git Repositories - Baumgartner, N., Iyenghar, P., Schoemaker, T., & Pulvermüller, E. (2024). AI-Driven Refactoring: A Pipeline for Identifying and Correcting Data Clumps in Git Repositories. In Electronics (Vol. 13, Issue 9, p. 1644). MDPI AG. https://doi.org/10.3390/electronics13091644
- The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects - Baumgartner, N., & Pulvermüller, E. (2024). The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects. In Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering. 12th International Conference on Model-Based Software and Systems Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0012313900003645
- Unveiling Data Clumps: A Detailed Longitudinal Analysis on Software Quality Across Public Repositories - Not published yet
This repository is an essential resource for research on data clumps. It can be utilized to explore various open questions in the field, such as:
- Prioritizing data clumps by severity
- Analyzing the correlation between data clumps and bugs
- Developing new methodologies for detecting and resolving data clumps
- Understanding the impact of data clumps on software maintainability and performance
Please cite this dataset as follows:
- The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects - Baumgartner, N., & Pulvermüller, E. (2024). The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects. In Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering. 12th International Conference on Model-Based Software and Systems Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0012313900003645
@conference{lifecycleDataClumps2024aBaumgartner,
author = {Nils Baumgartner and Elke Pulvermüller},
title = {The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects},
booktitle = {Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering (MODELSWARD 2024)},
year = {2024},
month = {February 21-23},
address = {Rome, Italy},
publisher = {SciTePress},
doi = {10.5220/0012313900003645},
note = {{Nominated for Best Paper Award}}
}
We welcome contributions from the research community. If you have suggestions, data, or analyses that could enrich this repository, please feel free to contribute.
All Rights Reserved.
Copyright (c) 2023 Nils Baumgartner
No part of this software may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the copyright holder, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law.
For permission requests, please contact the copyright holder.