This dataset contains question-answer pairs about historical places and tourist attractions in Turkey, prepared in SQuAD format. It includes 15,000 QA pairs in total.
The dataset was created with Google Gemini AI. It is built from fully validated source data, and only the Google Gemini model was used to convert that data to SQuAD format. The result is a comprehensive question-answer collection about Turkey's historical and tourist attractions, prepared in SQuAD (Stanford Question Answering Dataset) format for machine learning and natural language processing work.
The dataset contains information about structures in the following categories:
- Historical Baths (Hamams)
- Ancient Cities and Necropolises
- Domed Tombs (Kumbets) and Monuments
- Civil Architecture Examples (Mansions and Houses)
- Historical Public Buildings
- and more.
- Format: SQuAD (Stanford Question Answering Dataset)
- Language: Turkish
- Subject: Historical places and tourist attractions in Turkey
- Data Type: Question-Answer pairs
- Source: Content generated with Google Gemini AI
ENGLISH Translation:
{
"context": "Located approximately 7 km from Sulakyurt District of Kırıkkale Province, accessible via dirt roads, it is an ancient city ruins with no standing structural remains...",
"qas": [
{
"question": "Which period is Kozlu Ancient Site thought to belong to?",
"answer": "Roman Period"
},
{
"question": "How can one reach Kozlu Ancient Site?",
"answer": "via dirt roads"
}
]
}
ENGLISH Translation:
{
"context": "Measuring 9.10X 6.05 in total external dimensions, Emir Ali Kumbet...",
"qas": [
{
"question": "What is the plan shape of the kumbet?",
"answer": "rectangular"
},
{
"question": "What are the external dimensions of Emir Ali Kumbet?",
"answer": "9.10X 6.05"
}
]
}
ENGLISH Translation:
{
"context": "Located in Serinhisar District of Denizli Province, the house is two-storied, built with stone foundation and adobe in upper floors...",
"qas": [
{
"question": "What materials were used in the construction of the house?",
"answer": "stone foundation and adobe in upper floors"
},
{
"question": "What is the plan type of the house?",
"answer": "open-sofa"
}
]
}
The dataset was created following these steps:
- Collection of raw data in Excel format
- Processing content using Google Gemini AI
- Generation of JSON outputs in batches of 500 records
- Conversion of data to SQuAD format
- Quality control and editing
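The conversion and batching steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names `to_squad` and `batches`, the simplified input record shape, and the use of random hex ids are all assumptions made for this example. The answer offset is found with a plain substring search, and answers that do not appear verbatim in the context are skipped.

```python
import uuid


def to_squad(records, version="v2.0"):
    """Convert simplified records into a SQuAD-style dict.

    Assumed (hypothetical) input shape per record:
    {"title": str, "context": str, "qas": [{"question": str, "answer": str}]}
    """
    data = []
    for record in records:
        context = record["context"]
        qas = []
        for qa in record["qas"]:
            # SQuAD stores the character offset of the answer in the context
            start = context.find(qa["answer"])
            if start == -1:
                # The answer is not a literal span of the context; skip it
                continue
            qas.append({
                "question": qa["question"],
                "id": uuid.uuid4().hex,  # assumption: any unique string works as an id
                "answers": [{"text": qa["answer"], "answer_start": start}],
            })
        data.append({
            "title": record.get("title", ""),
            "paragraphs": [{"context": context, "qas": qas}],
        })
    return {"version": version, "data": data}


def batches(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Illustrative record, not taken from the dataset
sample = [{
    "title": "example_site",
    "context": "The site is accessible via dirt roads.",
    "qas": [
        {"question": "How can one reach the site?", "answer": "via dirt roads"},
        {"question": "Which period does it belong to?", "answer": "Roman Period"},
    ],
}]
squad = to_squad(sample)
```

In this sample the second question is dropped because its answer does not occur in the context, which is one simple form the quality control step could take.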
The dataset is stored as JSON with the following structure, where answer_start is the integer character offset of the answer text within the context:
{
"version": "v2.0",
"data": [
{
"title": "context_title",
"paragraphs": [
{
"context": "Text content",
"qas": [
{
"question": "Question text",
"id": "unique_id",
"answers": [
{
"text": "Answer text",
"answer_start": integer
}
]
}
]
}
]
}
]
}
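As a hedged sketch of how this structure might be read, the snippet below loads a SQuAD-style file, walks the nested `data` / `paragraphs` / `qas` / `answers` levels, and checks that each `answer_start` really points at the answer text. The helper names (`load_squad`, `iter_qa`, `misaligned`) and the file name in the comment are hypothetical; only the standard `json` module is used.

```python
import json


def load_squad(path):
    """Read a SQuAD-style JSON file (UTF-8, since the text is Turkish)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qa(squad):
    """Yield (title, context, question, answer_text, answer_start) tuples."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield (article["title"], context, qa["question"],
                           answer["text"], answer["answer_start"])


def misaligned(squad):
    """Return ids of questions with an answer_start that does not match the context."""
    bad = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad.append(qa["id"])
                        break  # report each question once
    return bad


# Usage (the file name is a placeholder, not the dataset's real path):
# squad = load_squad("dataset.json")
# for title, context, question, text, start in iter_qa(squad):
#     print(question, "->", text)
# print(misaligned(squad))
```

Flattening to tuples keeps downstream code independent of the nesting, and the alignment check is a cheap sanity test to run before training an extractive QA model.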
- Training Turkish Natural Language Processing models
- Developing Question-Answering systems
- Historical and cultural heritage information systems
- Tourism applications
- Educational material development
Libraries used to create the dataset:
- Python 3.x
- pandas
- google.generativeai
- json
- logging
To contribute to the dataset:
- Fork the repository
- Create a new branch
- Commit your changes
- Submit a pull request
This dataset is licensed under the GNU General Public License v3.0 (GPL-3.0). This means you are free to:
- Use the dataset for commercial purposes
- Modify the dataset
- Distribute the dataset
- Use the patent rights granted by contributors
- Use the dataset privately
For more details, see the LICENSE file or visit GNU GPL v3.0.
For project development and collaboration:
- Email: [email protected]
For questions and feedback about the dataset, please create an issue in this repository.