Turkish-NLP-QA-Dataset (SQuAD Format)

This dataset contains question-answer pairs about historical places and tourist attractions in Turkey, prepared in SQuAD format. The dataset includes 15,000 QA pairs in total.

📝 About the Dataset

This dataset was built from fully validated source data. Google Gemini was used only to convert that data into SQuAD format. The result is a comprehensive question-answer collection about Turkey's historical and tourist attractions, prepared in SQuAD (Stanford Question Answering Dataset) format for machine learning and natural language processing studies.

🔍 Dataset Content

The dataset contains information about structures in the following categories:

Historical Baths (Hamams)
Ancient Cities and Necropolises
Domed Tombs (Kumbets) and Monuments
Civil Architecture Examples (Mansions and Houses)
Historical Public Buildings
and more.

📊 Dataset Characteristics

Format: SQuAD (Stanford Question Answering Dataset)
Language: Turkish
Subject: Historical places and tourist attractions in Turkey
Data Type: Question-Answer pairs
Source: Content generated with Google Gemini AI

🎯 Example Data

Example 1: Kozlu Ancient Site (Kırıkkale)

English translation:

{
  "context": "Located approximately 7 km from Sulakyurt District of Kırıkkale Province, accessible via dirt roads, it is an ancient city ruins with no standing structural remains...",
  "qas": [
    {
      "question": "Which period is Kozlu Ancient Site thought to belong to?",
      "answer": "Roman Period"
    },
    {
      "question": "How can one reach Kozlu Ancient Site?",
      "answer": "via dirt roads"
    }
  ]
}

Example 2: Emir Ali Kumbet (Bitlis)

English translation:

{
  "context": "Measuring 9.10X 6.05 in total external dimensions, Emir Ali Kumbet...",
  "qas": [
    {
      "question": "What is the plan shape of the kumbet?",
      "answer": "rectangular"
    },
    {
      "question": "What are the external dimensions of Emir Ali Kumbet?",
      "answer": "9.10X 6.05"
    }
  ]
}

Example 3: Kazım Civciv House (Denizli)

English translation:

{
  "context": "Located in Serinhisar District of Denizli Province, the house is two-storied, built with stone foundation and adobe in upper floors...",
  "qas": [
    {
      "question": "What materials were used in the construction of the house?",
      "answer": "stone foundation and adobe in upper floors"
    },
    {
      "question": "What is the plan type of the house?",
      "answer": "open-sofa"
    }
  ]
}
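The examples above use a simplified layout with a single "answer" string per question, while the full SQuAD layout stores a list of answers with character offsets. The following is a minimal sketch of how such a simplified record could be mapped to a SQuAD-style paragraph. The helper name `to_squad_paragraph` and the UUID-based id scheme are assumptions for illustration, not the author's actual conversion code. It also assumes the answer appears verbatim in the context; questions whose answer cannot be located are skipped.

```python
import uuid


def to_squad_paragraph(record):
    """Convert a simplified record ({"context", "qas": [{"question", "answer"}]})
    into a SQuAD-style paragraph with character-level answer offsets."""
    context = record["context"]
    qas = []
    for qa in record["qas"]:
        # str.find returns the first verbatim occurrence, or -1 if absent
        start = context.find(qa["answer"])
        if start == -1:
            continue  # answer is not a span of the context; skip rather than guess
        qas.append({
            "question": qa["question"],
            "id": uuid.uuid4().hex,  # hypothetical id scheme
            "answers": [{"text": qa["answer"], "answer_start": start}],
        })
    return {"context": context, "qas": qas}


example = {
    "context": "Measuring 9.10X 6.05 in total external dimensions, Emir Ali Kumbet...",
    "qas": [
        {"question": "What are the external dimensions of Emir Ali Kumbet?",
         "answer": "9.10X 6.05"},
        {"question": "What is the plan shape of the kumbet?",
         "answer": "rectangular"},
    ],
}
paragraph = to_squad_paragraph(example)
```

Because the context in Example 2 is truncated here, only the first question can be aligned; the second is dropped since "rectangular" does not occur in the shortened text.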

🛠️ Dataset Creation Process

The dataset was created following these steps:

Collection of raw data in Excel format
Processing content using Google Gemini AI
Generation of JSON outputs for every 500 records
Conversion of data to SQuAD format
Quality control and editing
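The step that produces one JSON output per 500 records can be sketched as below. This is only an illustration under stated assumptions: the function names `batched` and `write_batches`, the `batch_NNNN.json` file naming, and the idea that records are plain Python dictionaries are all hypothetical, and the Gemini processing step itself is deliberately left out.

```python
import json
from pathlib import Path


def batched(records, batch_size=500):
    """Yield consecutive slices of at most batch_size records."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


def write_batches(records, out_dir, batch_size=500):
    """Write one JSON file per batch and return the list of written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(batched(records, batch_size)):
        path = out_dir / f"batch_{n:04d}.json"  # hypothetical naming scheme
        with path.open("w", encoding="utf-8") as f:
            # ensure_ascii=False keeps Turkish characters readable in the file
            json.dump(batch, f, ensure_ascii=False, indent=2)
        paths.append(path)
    return paths
```

Writing in fixed-size batches means a failure partway through a long run loses at most one batch of work, which is one plausible reason for the 500-record outputs described above.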

📊 Dataset Structure

The dataset is in JSON format, with each record containing the following information:

{
    "version": "v2.0",
    "data": [
        {
            "title": "context_title",
            "paragraphs": [
                {
                    "context": "Text content",
                    "qas": [
                        {
                            "question": "Question text",
                            "id": "unique_id",
                            "answers": [
                                {
                                    "text": "Answer text",
                                    "answer_start": integer
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
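A short sketch of reading a file with this structure and checking that every `answer_start` offset really points at its answer text. The helper names (`load_squad`, `iter_qas`, `misaligned_answers`) are hypothetical, and the sketch assumes the file follows the nesting shown above exactly.

```python
import json


def load_squad(path):
    """Load a SQuAD-style JSON file as a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qas(squad):
    """Yield (title, context, qa) for every question in a SQuAD-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield article.get("title", ""), paragraph["context"], qa


def misaligned_answers(squad):
    """Return ids of questions whose answer text is not found at answer_start."""
    bad = []
    for _, context, qa in iter_qas(squad):
        for ans in qa.get("answers", []):
            start = ans["answer_start"]
            if context[start:start + len(ans["text"])] != ans["text"]:
                bad.append(qa["id"])
    return bad


# Usage (file name taken from the repository's file listing):
# squad = load_squad("Squad_Turkish_Dataset.json")
# print(sum(1 for _ in iter_qas(squad)), "QA pairs")
# print(len(misaligned_answers(squad)), "misaligned answers")
```

Running the alignment check before training is a cheap way to catch offsets that drifted during conversion or manual editing.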

🎯 Use Cases

Training Turkish Natural Language Processing models
Developing Question-Answering systems
Historical and cultural heritage information systems
Tourism applications
Educational material development

📦 Requirements

Tools and libraries used to create the dataset:

Python 3.x
pandas
google.generativeai
json
logging

🤝 Contributing

To contribute to the dataset:

Fork the repository
Create a new branch
Commit your changes
Submit a pull request

📄 License

This dataset is licensed under the GNU General Public License v3.0 (GPL-3.0). This means you are free to:

Use the dataset for commercial purposes
Modify the dataset
Distribute the dataset
Make use of the patent rights that contributors grant under the license
Use the dataset for private use

For more details, see the LICENSE file or visit GNU GPL v3.0.

📞 Contact

For project development and collaboration:

Email: [email protected]

For questions and feedback about the dataset, please create an issue in this repository.

Aieyup/Turkish-NLP-QA-Dataset
