NilsBaumgartner1994/Data-Clumps-Dataset

Data Clumps Dataset

Overview

Welcome to the Data Clumps Dataset repository, a collection built for research purposes. It hosts a curated set of projects that have been analyzed for data clumps and is intended for researchers and practitioners in the field.

Repository Structure

The repository is structured as follows:

Each project within the Projects directory contains its own set of tags, which capture the data clumps found within that project.
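
As a rough illustration of this layout, the following sketch lists each project folder together with the entries found directly inside it. It assumes a local clone in which the `Projects` directory holds one folder per project and treats every entry below a project folder as belonging to a tag. That layout is an assumption and is not verified against the repository. The function name `list_project_tags` is hypothetical.

```python
from pathlib import Path


def list_project_tags(projects_dir):
    """Map each project folder name to the sorted names of the entries inside it."""
    root = Path(projects_dir)
    result = {}
    # Only folders count as projects; loose files next to them are ignored.
    for project in sorted(p for p in root.iterdir() if p.is_dir()):
        # Assumption: every entry directly below a project folder belongs to one tag.
        result[project.name] = sorted(entry.name for entry in project.iterdir())
    return result
```

Calling `list_project_tags("Projects")` from the root of a clone would return a dictionary keyed by project name, which can then be used to pick a project and tag for closer inspection.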

Data Clumps Report Format

To understand the format of the data clumps report, please refer to our detailed definition, data-clumps-type-context. It describes the context and the types of data clumps that we have identified in our research.

Collected Source Code Projects

We have collected and analyzed the following projects for data clumps:

  • Android-Universal-Image-Loader
  • BroadleafCommerce
  • antlr4
  • argouml
  • caffeine
  • ceylon-ide-eclipse
  • dolphinscheduler
  • elasticsearch
  • hazelcast
  • jflex
  • jfreechart
  • junit4
  • junit5
  • mapdb
  • mcMMO
  • neo4j
  • netty
  • orientdb
  • oryx
  • rocketmq
  • spring-boot
  • titan
  • xerces2-j

Each project offers a unique perspective and contributes significantly to our understanding of data clumps in software development.

Collected Model Projects

We analysed the following UML model datasets from https://github.com/NilsBaumgartner1994/UML-Class-Diagram-Dataset:

  • LindholmenDb & Public Repos.
  • ModelsDb

Research Applications

This repository is an essential resource for research on data clumps. It can be utilized to explore various open questions in the field, such as:

  • Prioritizing data clumps by severity
  • Analyzing the correlation between data clumps and bugs
  • Developing new methodologies for detecting and resolving data clumps
  • Understanding the impact of data clumps on software maintainability and performance

Cite

Please cite this dataset as follows:

@conference{lifecycleDataClumps2024aBaumgartner,
  author    = {Nils Baumgartner and Elke Pulvermüller},
  title     = {The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects},
  booktitle = {Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering (MODELSWARD 2024)},
  year      = {2024},
  month     = {February 21-23},
  address   = {Rome, Italy},
  publisher = {SciTePress},
  doi       = {10.5220/0012313900003645},
  note      = {{Nominated for Best Paper Award}}
}

Contributing

We welcome contributions from the research community. If you have suggestions, data, or analyses that could enrich this repository, please feel free to contribute.

License

All Rights Reserved.

Copyright (c) 2023 Nils Baumgartner

No part of this software may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the copyright holder, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law.

For permission requests, please contact the copyright holder.
