Merge pull request #4 from ksh24865/dev

Dev

ksh24865 authored Sep 7, 2020
2 parents c8c46e8 + 4ee708e commit 76c2a37
Showing 12 changed files with 210 additions and 110 deletions.
28 changes: 28 additions & 0 deletions .gitignore
@@ -0,0 +1,28 @@
*.pyc
*.o
*.so
*.swp
*.png
*.jpg
*.bmp
*.out*
*.dat*
*.log
*.pot
*.report
__pycache__
build
static

# virtualenv
.python-version

# Distribution / packaging
MANIFEST
*.idea

# python Virtualenv
venv/

# MAC
.DS_Store
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Yoonje Choi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
77 changes: 46 additions & 31 deletions README.md
@@ -1,61 +1,76 @@
# COVID-19 Project
Store and Analyze data for COVID-19.
Visualize and Analyze data for COVID-19.

Table of contents
=================
<!--ts-->
* [Result](#Result)
* [Requirement](#Requirement)
* [Installation](#Build--Installation)
* [Open Data](#open-data)
* [Installation](#Installation)
* [Run](#run)
* [Open Data](#open-data)
<!--te-->

Result
=======
* Real-time COVID-19 status board
* Data from March 1 to today

![date_change_](https://user-images.githubusercontent.com/55729930/92361595-90972880-f129-11ea-918c-7aa35ae12ab0.gif)


* Today's confirmed cases, deaths, and released-from-quarantine counts by region
* Graph of test results by date
* Graph of the daily increase in confirmed cases
![preview_](https://user-images.githubusercontent.com/55729930/92361544-7b21fe80-f129-11ea-87b4-f4b82b83468d.gif)

* Graph of infections by age group and sex (planned)
* Graph of confirmed-case ratios by sex and age
![covid_board - Kibana](https://user-images.githubusercontent.com/55729930/92398418-7bd78680-f163-11ea-9cb8-6a72bf165737.png)

Requirement
=======
```
Python >= 3.0
```

Installation
=======
$ git clone https://github.com/COVID19-SSU/covid19-project.git
Open Data
=======
* Public Data Portal (https://data.go.kr/)
* 보건복지부_코로나19 감염_현황
* 보건복지부_코로나19 시·도발생_현황
* 보건복지부_코로나19 연령별·성별감염_현황
```sh
$ git clone https://github.com/COVID19-SSU/covid19-project.git
```
```sh
$ sudo docker-compose up
```
```sh
$ pip3 install -r requirements.txt
```

Run
=======
* crawling
  * Crawl Ministry of Health and Welfare COVID-19 data (March 1 to today) and add it to Elasticsearch


```sh
$ python3 covid19-project/covid19_infection_city/crawling_covid19_infection_city.py
$ python3 covid19-project/covid19_infection_status/crawling_covid19_infection_status.py
```

* update
  * Update Elasticsearch with today's Ministry of Health and Welfare COVID-19 data on top of the existing data

```sh
$ python3 covid19-project/covid19_infection_city/update_covid19_infection_city.py
$ python3 covid19-project/covid19_infection_status/update_covid19_infection_status.py
```

* add Task scheduler(cron)
  * Register the job in crontab so the daily data update runs at 12:00 PM (adjust the absolute paths to your environment; note that this pipe replaces the existing crontab)
```sh
$ echo -e "0 12 * * * python3 ~/covid19-project/covid19_infection_status/update_covid19_infection_status.py\n0 12 * * * python3 ~/covid19-project/covid19_infection_city/update_covid19_infection_city.py" | crontab
```

* Dashboard
* Open http://localhost:5601
* Click `Management` tab
* Import [Kibana dashboard](https://github.com/COVID19-SSU/covid19-project/dashboard/export.ndjson)
* Create Index pattern
* Go to `Dashboard` tab

Open Data
=======
* Public Data Portal (https://data.go.kr/)
* 보건복지부_코로나19 감염_현황
* 보건복지부_코로나19 시·도발생_현황
* 보건복지부_코로나19 연령별·성별감염_현황
4 changes: 1 addition & 3 deletions covid19_infection_city/README.md
@@ -1,9 +1,7 @@
## Data Description

- Save 'covid19_infection_city' data from 'OPEN api' in 'elastick search' and update daily
Save `covid19_infection_city` data from the open API in `elasticsearch` and update it daily

## Meaning of an item

|Item name|Item Description|Sample|
|:----:|:----:|:----:|
|SEQ|Post number (unique value for domestic status by city/province)|130|
50 changes: 24 additions & 26 deletions covid19_infection_city/crawling_covid19_infection_city.py
@@ -1,4 +1,3 @@

from bs4 import BeautifulSoup
import requests
import json
@@ -7,50 +6,49 @@
from elasticsearch import Elasticsearch
import os


# String helper that replaces the last occurrences of a substring (right-to-left counterpart of str.replace)
def rreplace(s, old, new, occurrence):
li = s.rsplit(old, occurrence)
return new.join(li)
li = s.rsplit(old, occurrence)
return new.join(li)
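A quick usage sketch of `rreplace`, which the scripts below use to zero-pad only the last date component without touching earlier matches:

```python
def rreplace(s, old, new, occurrence):
    # Replace the last `occurrence` occurrences of `old`, scanning from the right
    li = s.rsplit(old, occurrence)
    return new.join(li)

# Only the final "1" (the day) is replaced, not the "1"s in the month
print(rreplace("2020-11-1", "1", "01", 1))  # -> 2020-11-01
```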


# Crawl regional COVID-19 case data from the public data portal, from the first tally date to today, and insert it into Elasticsearch

# Fetch regional case data from the first tally date through today

update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt=20200120&endCreateDt='+datetime.today().strftime("%Y%m%d")+'&'
url='http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19SidoInfStateJson'+update
update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt=20200120&endCreateDt=' + datetime.today().strftime(
"%Y%m%d") + '&'
url = 'http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19SidoInfStateJson' + update

# Parse the XML response and convert it to JSON
req=requests.get(url)
html=req.text
jsontxt=json.dumps(xmltodict.parse(html), indent=4)
req = requests.get(url)
html = req.text
jsontxt = json.dumps(xmltodict.parse(html), indent=4)
root_json = json.loads(jsontxt)
#print(root_json)



# Load the index mapping

with open("mapping_covid19_infection_city.json", 'r') as f:
mapping = json.load(f)
mapping = json.load(f)

# Convert each doc up to today into a dict and append it to doc_list
# Normalize the reference-date string from the public API into a date format
doc_list=[]
doc_list = []
for j in root_json['response']['body']['items']['item']:
doc ={}
for i in j:
doc[i]=j[i]
doc['stdDay']=doc['stdDay'][:doc['stdDay'].index('일')]
doc['stdDay']=doc['stdDay'].replace('년 ','-').replace('월 ','-')
if len(doc['stdDay'].split('-')[1]) <2:
doc['stdDay']=doc['stdDay'].replace(doc['stdDay'].split('-')[1],'0'+doc['stdDay'].split('-')[1],1)
if len(doc['stdDay'].split('-')[2]) <2:
doc['stdDay']=rreplace(doc['stdDay'],doc['stdDay'].split('-')[2],'0'+doc['stdDay'].split('-')[2],1)
doc_list.append(doc)
doc = {}
for i in j:
doc[i] = j[i]
doc['stdDay'] = doc['stdDay'][:doc['stdDay'].index('일')]
doc['stdDay'] = doc['stdDay'].replace('년 ', '-').replace('월 ', '-')
if len(doc['stdDay'].split('-')[1]) < 2:
doc['stdDay'] = doc['stdDay'].replace(doc['stdDay'].split('-')[1], '0' + doc['stdDay'].split('-')[1], 1)
if len(doc['stdDay'].split('-')[2]) < 2:
doc['stdDay'] = rreplace(doc['stdDay'], doc['stdDay'].split('-')[2], '0' + doc['stdDay'].split('-')[2], 1)
doc_list.append(doc)
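The replace/rreplace zero-padding above can be sketched more directly by splitting on the Korean date markers; a minimal alternative, assuming `stdDay` looks like `"2020년 9월 7일 00시"` (the API's reference-date format):

```python
def normalize_std_day(std_day):
    # Equivalent sketch of the stdDay normalization, assuming input like
    # "2020년 9월 7일 00시": keep the part before '일', split on the
    # year/month markers, and zero-pad into an ISO date
    date_part = std_day[:std_day.index('일')]            # "2020년 9월 7"
    y, m, d = date_part.replace('년', ' ').replace('월', ' ').split()
    return f"{int(y):04d}-{int(m):02d}-{int(d):02d}"

print(normalize_std_day("2020년 9월 7일 00시"))  # -> 2020-09-07
```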

# Add doc_list to Elasticsearch
es = Elasticsearch('localhost:9200')
index="covid19_infection_city"
index = "covid19_infection_city"
es.indices.create(index=index, body=mapping)
for i in doc_list:
es.index(index=index,doc_type="_doc",body=i)
es.index(index=index, doc_type="_doc", body=i)
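The loop above issues one HTTP call per document; the client's bulk helper batches them. A pure action-builder sketch (the `helpers.bulk` usage is an assumption about your cluster setup, not part of the original scripts):

```python
def build_bulk_actions(index, docs):
    # One action per document, in the shape elasticsearch.helpers.bulk expects;
    # a sketch of batching the per-doc es.index() calls above
    return [{"_index": index, "_source": doc} for doc in docs]

# Usage sketch (assumes a running local cluster):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch('localhost:9200')
#   helpers.bulk(es, build_bulk_actions("covid19_infection_city", doc_list))
```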
44 changes: 23 additions & 21 deletions covid19_infection_city/update_covid19_infection_city.py
@@ -8,40 +8,42 @@

# String helper that replaces the last occurrences of a substring (right-to-left counterpart of str.replace)
def rreplace(s, old, new, occurrence):
li = s.rsplit(old, occurrence)
return new.join(li)
li = s.rsplit(old, occurrence)
return new.join(li)


# Crawl today's regional COVID-19 case data from the public data portal and insert (update) it into Elasticsearch
# Schedule this script to run automatically around noon each day (the portal's data is updated around 10-11 AM)


# Fetch today's regional COVID-19 case data
update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt='+datetime.today().strftime("%Y%m%d")+'&'
url='http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19SidoInfStateJson'+update
update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt=' + datetime.today().strftime(
"%Y%m%d") + '&'
url = 'http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19SidoInfStateJson' + update

# Parse the XML response and convert it to JSON
req=requests.get(url)
html=req.text
soup=BeautifulSoup(html, 'xml')
jsontxt=json.dumps(xmltodict.parse(html), indent=4)
req = requests.get(url)
html = req.text
soup = BeautifulSoup(html, 'xml')
jsontxt = json.dumps(xmltodict.parse(html), indent=4)
root_json = json.loads(jsontxt)

# Convert today's doc into a dict and store it in doc_list
doc_list=[]
doc_list = []
for j in root_json['response']['body']['items']['item']:
doc ={}
for i in j:
doc[i]=j[i]
doc['stdDay']=doc['stdDay'][:doc['stdDay'].index('일')]
doc['stdDay']=doc['stdDay'].replace('년 ','-').replace('월 ','-')
if len(doc['stdDay'].split('-')[1]) <2:
doc['stdDay']=doc['stdDay'].replace(doc['stdDay'].split('-')[1],'0'+doc['stdDay'].split('-')[1],1)
if len(doc['stdDay'].split('-')[2]) <2:
doc['stdDay']=rreplace(doc['stdDay'],doc['stdDay'].split('-')[2],'0'+doc['stdDay'].split('-')[2],1)
doc_list.append(doc)
doc = {}
for i in j:
doc[i] = j[i]
doc['stdDay'] = doc['stdDay'][:doc['stdDay'].index('일')]
doc['stdDay'] = doc['stdDay'].replace('년 ', '-').replace('월 ', '-')
if len(doc['stdDay'].split('-')[1]) < 2:
doc['stdDay'] = doc['stdDay'].replace(doc['stdDay'].split('-')[1], '0' + doc['stdDay'].split('-')[1], 1)
if len(doc['stdDay'].split('-')[2]) < 2:
doc['stdDay'] = rreplace(doc['stdDay'], doc['stdDay'].split('-')[2], '0' + doc['stdDay'].split('-')[2], 1)
doc_list.append(doc)

# Add the docs to Elasticsearch
es = Elasticsearch('localhost:9200')
index="covid19_infection_city"
index = "covid19_infection_city"
for i in doc_list:
es.index(index=index,doc_type="_doc",body=i)
es.index(index=index, doc_type="_doc", body=i)
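Because `es.index` generates a fresh id each call, re-running this update script appends duplicate documents. One hedged fix is a deterministic id built from date plus region, so a re-run overwrites the same document (the field names `stdDay` and `gubun` are assumed from the sido API response):

```python
def doc_id(doc):
    # Deterministic document id from date + region, so re-running the daily
    # update overwrites the existing document instead of appending a duplicate.
    # Field names (stdDay, gubun) are assumed from the sido API response.
    return "{}-{}".format(doc.get('stdDay', ''), doc.get('gubun', ''))

# Usage sketch: es.index(index=index, doc_type="_doc", id=doc_id(i), body=i)
```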
4 changes: 1 addition & 3 deletions covid19_infection_status/README.md
@@ -1,9 +1,7 @@
## Data Description

- Save 'covid19_infection_status' data from 'OPEN api' in 'elastick search' and update daily
Save `covid19_infection_status` data from the open API in `elasticsearch` and update it daily

## Meaning of an item

|Item name|Item Description|Sample|
|:----:|:----:|:----:|
|SEQ|Post number (unique value for domestic status by city/province)|74|
31 changes: 16 additions & 15 deletions covid19_infection_status/crawling_covid19_infection_status.py
@@ -8,32 +8,33 @@
# Map and insert COVID-19 case data from the public data portal, from the first tally date to today, into Elasticsearch

# Fetch data from the first tally date through today
update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt=20200120&endCreateDt='+datetime.today().strftime("%Y%m%d")+'&'
url='http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19InfStateJson'+update
update = '?serviceKey=MMq1VsRlz5qKvsdKDrvMavJB5rGdJOA8JGKgyojceXcL5tj6MJtzG21jN30ke9OOHZI%2FsQEwftRprl%2FQjcE2bg%3D%3D&pageNo=1&numOfRows=10&startCreateDt=20200120&endCreateDt=' + datetime.today().strftime(
"%Y%m%d") + '&'
url = 'http://openapi.data.go.kr/openapi/service/rest/Covid19/getCovid19InfStateJson' + update

# Parse the XML response and convert it to JSON
req=requests.get(url)
html=req.text
soup=BeautifulSoup(html, 'xml')
jsontxt=json.dumps(xmltodict.parse(html), indent=4)
req = requests.get(url)
html = req.text
soup = BeautifulSoup(html, 'xml')
jsontxt = json.dumps(xmltodict.parse(html), indent=4)
root_json = json.loads(jsontxt)

# Load the index mapping
with open('mapping_covid19_infection_status.json', 'r') as f:
mapping = json.load(f)
mapping = json.load(f)

# Convert each doc up to today into a dict and append it to doc_list
doc_list=[]
doc_list = []
for j in root_json['response']['body']['items']['item']:
doc ={}
for i in j:
doc[i]=j[i]
doc['createDt']=doc['createDt'][0:10]
doc_list.append(doc)
doc = {}
for i in j:
doc[i] = j[i]
doc['createDt'] = doc['createDt'][0:10]
doc_list.append(doc)

# Add doc_list to Elasticsearch
es = Elasticsearch('localhost:9200')
index="covid19_infection_status"
index = "covid19_infection_status"
es.indices.create(index=index, body=mapping)
for i in doc_list:
es.index(index=index,doc_type="_doc",body=i)
es.index(index=index, doc_type="_doc", body=i)