Data Clumps Dataset

Overview

Welcome to the Data Clumps Dataset repository, a collection built for research purposes. It hosts a curated set of projects that have been analyzed for data clumps, for use by researchers and practitioners.

Repository Structure

The repository is structured as follows:

Each project within the Projects directory contains its own set of tags, encapsulating the unique aspects of data clumps found within that project.
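
As a rough illustration of navigating this layout, the following Python sketch lists the tags found for each project. It assumes a hypothetical layout of the form `<projects_dir>/<project>/<tag>/...`, where every tag is a subdirectory of its project; the function name `list_project_tags` and this exact nesting are assumptions for illustration, not something this README specifies, so adjust the path handling to the actual directory structure.

```python
from pathlib import Path


def list_project_tags(projects_dir):
    """Map each project directory name to the sorted names of its tag subdirectories.

    Assumption (hypothetical layout): <projects_dir>/<project>/<tag>/...
    Plain files at either level are ignored.
    """
    root = Path(projects_dir)
    result = {}
    # Every subdirectory of the root is treated as one project.
    for project in sorted(p for p in root.iterdir() if p.is_dir()):
        # Every subdirectory of a project is treated as one tag.
        result[project.name] = sorted(t.name for t in project.iterdir() if t.is_dir())
    return result
```

Calling `list_project_tags("Projects")` from the repository root would then return a dictionary keyed by project name, provided the directory is organized as assumed above.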

Data Clumps Report Format

To understand the format of the data clumps report, please refer to our detailed definition, data-clumps-type-context. It describes the types of data clumps we have identified in our research and the context recorded for each.

Collected Source Code Projects

We have collected and analyzed the following projects for data clumps:

Android-Universal-Image-Loader
BroadleafCommerce
antlr4
argouml
caffeine
ceylon-ide-eclipse
dolphinscheduler
elasticsearch
hazelcast
jflex
jfreechart
junit4
junit5
mapdb
mcMMO
neo4j
netty
orientdb
oryx
rocketmq
spring-boot
titan
xerces2-j

Each project offers a unique perspective and contributes significantly to our understanding of data clumps in software development.

Collected Model Projects

We analysed the following UML model datasets from https://github.com/NilsBaumgartner1994/UML-Class-Diagram-Dataset:

LindholmenDb & Public Repos.
ModelsDb

Research:

Live Code Smell Detection of Data Clumps in an Integrated Development Environment - Baumgartner, N., Adleh, F., & Pulvermüller, E. (2023). Live Code Smell Detection of Data Clumps in an Integrated Development Environment. In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering. 18th International Conference on Evaluation of Novel Approaches to Software Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0011727500003464
Advancing Code Smell Detection: Live Global Data Detection and Data Clumps Testcases - Baumgartner, N., Adleh, F., & Pulvermüller, E. (2024). Advancing Code Smell Detection: Live Global Data Detection and Data Clumps Testcases. In Communications in Computer and Information Science (pp. 230–250). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-64182-4_11
An Extensive Analysis of Data Clumps in UML Class Diagrams - Baumgartner, N., & Pulvermüller, E. (2024). An Extensive Analysis of Data Clumps in UML Class Diagrams. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE (pp. 15–26). SciTePress. ISBN 978-989-758-696-5. https://doi.org/10.5220/0012550500003687
Considerations in Prioritizing for Efficiently Refactoring the Data Clumps Model Smell: A Preliminary Study - Baumgartner, N., Iyenghar, P., & Pulvermüller, E. (2024). Considerations in Prioritizing for Efficiently Refactoring the Data Clumps Model Smell: A Preliminary Study. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE (pp. 144–155). SciTePress. ISBN 978-989-758-696-5. https://doi.org/10.5220/0012698000003687
Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment - Baumgartner, N., & Pulvermüller, E. (2024). Challenges of Processing Data Clumps within Plugin Architectures of Integrated Development Environment (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2403.03903
AI-Driven Refactoring: A Pipeline for Identifying and Correcting Data Clumps in Git Repositories - Baumgartner, N., Iyenghar, P., Schoemaker, T., & Pulvermüller, E. (2024). AI-Driven Refactoring: A Pipeline for Identifying and Correcting Data Clumps in Git Repositories. In Electronics (Vol. 13, Issue 9, p. 1644). MDPI AG. https://doi.org/10.3390/electronics13091644
The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects - Baumgartner, N., & Pulvermüller, E. (2024). The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects. In Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering. 12th International Conference on Model-Based Software and Systems Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0012313900003645
Unveiling Data Clumps: A Detailed Longitudinal Analysis on Software Quality Across Public Repositories - Not published yet

Research Applications

This repository is an essential resource for research on data clumps. It can be utilized to explore various open questions in the field, such as:

Prioritizing data clumps by severity
Analyzing the correlation between data clumps and bugs
Developing new methodologies for detecting and resolving data clumps
Understanding the impact of data clumps on software maintainability and performance

Cite

Please cite this dataset as follows:

The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects - Baumgartner, N., & Pulvermüller, E. (2024). The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects. In Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering. 12th International Conference on Model-Based Software and Systems Engineering. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220/0012313900003645

@conference{lifecycleDataClumps2024aBaumgartner,
  author    = {Nils Baumgartner and Elke Pulvermüller},
  title     = {The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects},
  booktitle = {Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering (MODELSWARD 2024)},
  year      = {2024},
  month     = {February 21-23},
  address   = {Rome, Italy},
  publisher = {SciTePress},
  doi       = {10.5220/0012313900003645},
  note      = {{Nominated for Best Paper Award}}
}

Contributing

We welcome contributions from the research community. If you have suggestions, data, or analyses that could enrich this repository, please feel free to contribute.

License

No part of this software may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the copyright holder, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law.

For permission requests, please contact the copyright holder.

Contact

[email protected]
