Store the data files in the dataset folder using the following tree structure:
dataset
โโCFPS 2010
โ โ
โ โโData
โ โ โโStata
โ โ cfps2010adult_202008.dta
โ โ cfps2010child_201906.dta
โ โ cfps2010comm_201906.dta
โ โ cfps2010famconf_202008.dta
โ โ cfps2010famecon_202008.dta
โ โ
โ โโDocumentation
โ ๅฎถๅบญๅ
ณ็ณปๅบ.pdf
โ ๅฎถๅบญ้ฎๅทๅบ.pdf
โ ๅฐๅฟ้ฎๅทๅบ.pdf
โ ๆไบบ้ฎๅทๅบ.pdf
โ ็คพๅบ้ฎๅทๅบ.pdf
โ ้ฎๅท.pdf
โ
โโCFPS 2011
โ โ
โ โโData
โ โ โโStata
โ โ cfps2011adult_102014.dta
โ โ cfps2011child_102014.dta
โ โ cfps2011family_202008.dta
โ โ cfps2011famroster_202008.dta
โ โ
โ โโDocumentation
โ 6b1bb40d683b405e9b7ed1e8329c1e65.pdf
โ
โโCFPS 2012
โ โ
โ โโData
โ โ โโStata
โ โ cfps2012adult_201906.dta
โ โ cfps2012child_201906.dta
โ โ cfps2012famconf_092015.dta
โ โ cfps2012famecon_201906.dta
โ โ
โ โโDocumentation
โ CFPS2012codebook(ๅฐๅฟ้ฎๅท).xls
โ ๅฎถๅบญๅ
ณ็ณปๅบ.pdf
โ ๅฎถๅบญ็ปๆตๅบ.pdf
โ ๆไบบ้ฎๅท.pdf
โ ่ทจๅนดidๅบ.pdf
โ ้ฎๅท.pdf
โ
โโCFPS 2014
โ โ
โ โโData
โ โ โโStata
โ โ cfps2014adult_201906.dta
โ โ cfps2014child_201906.dta
โ โ cfps2014comm_201906.dta
โ โ cfps2014famconf_170630.dta
โ โ cfps2014famecon_201906.dta
โ โ
โ โโDocumentation
โ 019b4fced85d4e42a825c3a186695155.pdf
โ CFPS2014codebook.xls
โ
โโCFPS 2016
โ โ
โ โโData
โ โ โโStata
โ โ cfps2016adult_201906.dta
โ โ cfps2016child_201906.dta
โ โ cfps2016famconf_201804.dta
โ โ cfps2016famecon_201807.dta
โ โ
โ โโDocumentation
โ CFPS2016codebook.xls
โ ้ฎๅท.pdf
โ
โโCFPS 2018
โ โ
โ โโData
โ โ โโStata
โ โ cfps2018childproxy_202012.dta
โ โ cfps2018crossyearid_202104.dta
โ โ cfps2018famconf_202008.dta
โ โ cfps2018famecon_202101.dta
โ โ cfps2018person_202012.dta
โ โ
โ โโDocumentation
โ CFPS2018codebook.xlsx
โ crossyearid_codebook.xlsx
โ ้ฎๅท.pdf
โ
โโๆๅญฆๆฐๆฎ้
onlinedemo.dta
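Every Stata file in the tree follows the pattern cfps<year><table>_<version>.dta inside dataset/CFPS <year>/Data/Stata. The minimal sketch below uses that pattern to check that your files are in place. parse_dta_name and find_dta_files are hypothetical helpers (not part of the repo), and the regular expression is inferred from the file names above rather than taken from the project code.

```python
import re
from pathlib import Path

# Hypothetical pattern inferred from the tree above:
# cfps2011adult_102014.dta -> year 2011, table "adult", version "102014".
DTA_NAME = re.compile(r"^cfps(?P<year>\d{4})(?P<table>[a-z]+)_(?P<version>\d+)\.dta$")


def parse_dta_name(path):
    """Split a CFPS Stata file name into (year, table, version), or return None."""
    match = DTA_NAME.match(Path(path).name)
    if match is None:
        return None
    return int(match["year"]), match["table"], match["version"]


def find_dta_files(root="dataset"):
    """Yield (year, table, version, path) for each .dta under <root>/CFPS <year>/Data/Stata."""
    for path in sorted(Path(root).glob("CFPS */Data/Stata/*.dta")):
        parsed = parse_dta_name(path)
        if parsed is not None:
            yield (*parsed, path)


if __name__ == "__main__":
    # Run from the repo root; prints one line per recognised Stata file.
    for year, table, version, path in find_dta_files():
        print(year, table, version, path)
```

Versions are kept as strings so that leading zeros (as in famconf_092015) survive.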
- Linux is recommended.
- Before installing the dependencies, make sure you are using Python 3.10 or later.
- ipython is recommended but optional.
pip install -r process/requirements.txt
pip install ipython
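To fail fast on an older interpreter, a check like the following can be run before installing the dependencies. It is a small illustrative snippet, not a script shipped with the repo.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_VERSION = (3, 10)


def check_python(version_info=sys.version_info):
    """Return True when the given interpreter version is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


# Report the result for the interpreter running this snippet.
print(f"Python {sys.version.split()[0]}: {'OK' if check_python() else 'too old, need 3.10+'}")
```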
Run the following command from the repo root to generate the schemas
(each generated *.schemas.json file is written next to its corresponding *.dta file):
python process/stata_converter.py gen-schemas dataset
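Because each schema sits next to its .dta file, files that still lack a schema are easy to spot. A minimal sketch, assuming the schema file keeps the .dta stem (cfps2011adult_102014.dta → cfps2011adult_102014.schemas.json); schema_path, missing_schemas and load_schema are hypothetical helpers, and nothing is assumed about the schema's internal layout.

```python
import json
from pathlib import Path


def schema_path(dta_path):
    """Return the *.schemas.json path next to a *.dta file (assumes the same stem)."""
    dta_path = Path(dta_path)
    return dta_path.with_name(dta_path.stem + ".schemas.json")


def missing_schemas(root="dataset"):
    """List .dta files under root that do not have a generated schema yet."""
    return [p for p in sorted(Path(root).rglob("*.dta")) if not schema_path(p).exists()]


def load_schema(dta_path):
    """Read a generated schema as a plain dict."""
    with open(schema_path(dta_path), encoding="utf-8") as f:
        return json.load(f)
```

An empty result from missing_schemas() after running gen-schemas suggests every data file was processed.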
You can also export the variable tables by running:
python process/stata_converter.py gen-labels dataset
You can also export CSV files (experimental; the quality of the exported data is not guaranteed):
python process/stata_converter.py gen-csv dataset
Start the interactive shell with:
ipython -i process/cfps_shell.py
This interactive shell imports numpy, pandas, matplotlib and other libraries by default.
Before using this interactive shell, make sure the schemas have been generated.
In this shell, the global variable cfps gives access to all CFPS data. You can also use per-year globals named cfps<year> (for example cfps2011) to access the data for that year.
In [2]: cfps2011
Out[2]:
namespace(adult=StataDetail(2011, adult_102014, primary:pid),
child=StataDetail(2011, child_102014, primary:pid),
family=StataDetail(2011, family_202008, primary:fid),
famroster=StataDetail(2011, famroster_202008, primary:pid))
In [3]: cfps[2012]
Out[3]:
namespace(adult=StataDetail(2012, adult_201906, primary:pid),
child=StataDetail(2012, child_201906, primary:pid),
famconf=StataDetail(2012, famconf_092015, primary:('pid', 'fid12')),
famecon=StataDetail(2012, famecon_201906, primary:fid12))
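The Out values above print like types.SimpleNamespace objects, and cfps[2012] and cfps2012 refer to the same wave. One way such globals could be wired up is sketched below. This is an assumption about the mechanism, not the code in process/cfps_shell.py, and it stores placeholder strings where the real shell stores StataDetail objects.

```python
from types import SimpleNamespace

# Hypothetical table lists per wave, taken from the output above.
TABLES = {
    2011: ["adult", "child", "family", "famroster"],
    2012: ["adult", "child", "famconf", "famecon"],
}


def build_globals(tables):
    """Return (cfps, per_year); cfps[year] and per_year["cfps<year>"] are the same object."""
    cfps = {year: SimpleNamespace(**{name: f"{name}_{year}" for name in names})
            for year, names in tables.items()}
    per_year = {f"cfps{year}": ns for year, ns in cfps.items()}
    return cfps, per_year


cfps, per_year = build_globals(TABLES)
globals().update(per_year)  # exposes cfps2011, cfps2012, ... as plain names
```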
You can then access data and metadata through the following fields:
# Index cfps by year to get that wave's tables
cfps[2011].adult.year  # 2011
# Or use the per-year variable directly
cfps2011.adult.key  # adult_102014
cfps2011.adult.primary  # pid
cfps2011.adult.path  # 'dataset/CFPS 2011/Data/Stata/cfps2011adult_102014.dta'
cfps2011.adult.schema  # returns the schema dict
cfps2011.adult.data  # returns a pandas DataFrame (lazy-loaded)
cfps2012.adult.rural  # returns data for rural areas
cfps2012.adult.urban  # returns data for urban areas
# Filters can also be passed as indices
cfps2012.adult["urban"]  # returns data for urban areas
cfps2016.child["east"]  # returns data for the eastern region
cfps2016.child["west"]  # returns data for the western region
cfps2018.person["west", "rural"]  # returns data for rural areas in the western region
cfps2018.person["northeast", "urban"]  # returns data for urban areas in the northeastern region
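The attribute and index filters above can be modelled as a small class with a lazily loaded data property and a __getitem__ that accepts one name or a tuple of names, all of which must match. In this sketch rows are plain dicts so it runs without pandas; LazyTable, FILTERS and the column names urban and region are hypothetical, not the real CFPS variable names.

```python
from functools import cached_property

# Hypothetical filters; real CFPS region/urban variables are named differently.
FILTERS = {
    "urban": lambda row: row["urban"] == 1,
    "rural": lambda row: row["urban"] == 0,
    "east": lambda row: row["region"] == "east",
    "west": lambda row: row["region"] == "west",
    "northeast": lambda row: row["region"] == "northeast",
}


class LazyTable:
    def __init__(self, year, key, primary, loader):
        self.year, self.key, self.primary = year, key, primary
        self._loader = loader  # called once, on first access to .data

    @cached_property
    def data(self):
        return self._loader()

    def __getitem__(self, names):
        # Accept t["urban"] as well as t["west", "rural"]; every filter must match.
        if isinstance(names, str):
            names = (names,)
        checks = [FILTERS[name] for name in names]
        return [row for row in self.data if all(check(row) for check in checks)]

    def __getattr__(self, name):
        # Only reached when normal lookup fails: t.rural behaves like t["rural"].
        if name in FILTERS:
            return self[name]
        raise AttributeError(name)
```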
Start the database (skip this if it is already running) and enter the MySQL shell.
This guide assumes a Linux environment; Windows users should perform the equivalent of the commands below themselves.
sudo systemctl start mysql
sudo mysql
Create a MySQL account for the application, then update the database connection settings in process/mysql_storage.py.
Note: the commands below grant the cfps user all privileges. Restrict them as needed in a production environment!
CREATE USER 'cfps'@'localhost' IDENTIFIED BY 'cfpsMySQL111++';
GRANT ALL PRIVILEGES ON * . * TO 'cfps'@'localhost';
FLUSH PRIVILEGES;
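If you express the connection settings as a URL (the form SQLAlchemy uses, for instance), note that the password above contains "+" characters, which should be percent-encoded. The driver prefix mysql+pymysql, the port and the database name cfps below are assumptions; use whatever process/mysql_storage.py actually expects.

```python
from urllib.parse import quote_plus

# Hypothetical settings mirroring the account created above.
DB_USER = "cfps"
DB_PASSWORD = "cfpsMySQL111++"
DB_HOST = "localhost"
DB_PORT = 3306
DB_NAME = "cfps"  # assumed database name


def mysql_url(user, password, host, port, database):
    # Percent-encode reserved characters such as "+" and "@" so a URL parser cannot misread them.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


print(mysql_url(DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME))
```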
Run the following command to initialize the database:
python process/mysql_storage.py db init
Run the following command to write all data into the database
(note: you must generate the schema files first; see above):
python process/mysql_storage.py db write
While writing to the database, the application shows progress bars:
.......
Creating famecon_2010...
100%|████████████████████████████████████████| 14797/14797 [00:28<00:00, 527.86it/s]
Creating adult_2011...
100%|████████████████████████████████████████| 1279/1279 [00:03<00:00, 364.06it/s]
Creating child_2011...
100%|████████████████████████████████████████| 7524/7524 [00:16<00:00, 443.37it/s]
Creating family_2011...
100%|████████████████████████████████████████| 13129/13129 [00:13<00:00, 948.86it/s]
Creating famroster_2011...
100%|████████████████████████████████████████| 50954/50954 [00:10<00:00, 4902.50it/s]
Creating adult_2012...
100%|████████████████████████████████████████| 35719/35719 [03:01<00:00, 197.33it/s]
Creating child_2012...
100%|████████████████████████████████████████| 8620/8620 [00:22<00:00, 379.30it/s]
Creating famconf_2012...
100%|████████████████████████████████████████| 55012/55012 [01:07<00:00, 811.34it/s]
Creating famecon_2012...
100%|████████████████████████████████████████| 13315/13315 [00:25<00:00, 515.88it/s]
Creating adult_2014...
100%|████████████████████████████████████████| 37147/37147 [02:13<00:00, 277.24it/s]
Creating child_2014...
For a dry run, replace db with dry in the commands above (for example, python process/mysql_storage.py dry write).
Write the data from start_year through end_year (closed interval) into the database; both arguments are optional and default to 2010 and 2018:
python process/mysql_storage.py db write [start_year=2010] [end_year=2018]
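Because the interval is closed and the CFPS waves are not consecutive years, a range such as 2012 to 2016 covers the 2012, 2014 and 2016 waves. The selection can be illustrated with the hypothetical helper below; it is not the project's argument parser.

```python
# Waves present in the dataset tree at the top of this README.
CFPS_YEARS = (2010, 2011, 2012, 2014, 2016, 2018)


def years_to_write(start_year=2010, end_year=2018, available=CFPS_YEARS):
    """Return the waves in the closed interval [start_year, end_year]."""
    if start_year > end_year:
        raise ValueError(f"start_year {start_year} is after end_year {end_year}")
    return [year for year in available if start_year <= year <= end_year]
```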
Write the data of a single table into the database:
python process/mysql_storage.py db write-one <year> <table_base_name>
For example:
python process/mysql_storage.py db write-one 2011 adult
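Judging from the progress log (Creating adult_2011...), each base name maps to a table named <table_base_name>_<year> and to one .dta file in the directory layout described at the top. The sketch below resolves both; resolve_table is a hypothetical helper, and the naming rule is inferred rather than taken from process/mysql_storage.py.

```python
from pathlib import Path


def resolve_table(year, base_name, root="dataset"):
    """Return (target table name, source .dta path) for `db write-one <year> <base_name>`."""
    stata_dir = Path(root) / f"CFPS {year}" / "Data" / "Stata"
    # The "_" right after the base name keeps "child" from matching "childproxy".
    matches = sorted(stata_dir.glob(f"cfps{year}{base_name}_*.dta"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one file for {base_name} {year}, found {len(matches)}")
    return f"{base_name}_{year}", matches[0]
```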
Because this course project requires the original database to be preserved, we keep all original tables untouched and create separate tables/views to hold the cleaned data.
Run the following command to filter the data. The resulting tables carry a clean suffix, e.g. adult_2010_clean:
python process/mysql_storage.py db filter
You can then decompose some of the tables with:
python process/mysql_storage.py db decompose <config_file_path>
The configuration file syntax is illustrated by the following example:
decompositions/child.json
{
  "table": "child",
  "postfix": "infant",
  "2012|2014": {
    "condition": "cfps{year}_age<2",
    "columns": [
      "wb8",
      "wf701",
      "wd2",
      "wf603m",
      "wa103",
      "wa105b",
      "wz302",
      "wf605m",
      "wg305",
      "wb701",
      "wg302",
      "wd402",
      "wb401",
      "wb801",
      "wf501",
      "pid"
    ]
  },
  "2010": {
    "condition": "childgroup=1",
    "columns": {
      "pid": "pid",
      "wa101": "孩子的胎龄（月）",
      "wa102": "孩子出生时的体重（斤）",
      "wa103": "孩子现在的体重（斤）",
      "wa104": "孩子现在的身高（厘米）"
    }
  },
  "2016|2018": {
    "condition": {
      "2016": "cfps_age<2",
      "2018": "age<2"
    },
    "columns": [
      "wb8",
      "wf701",
      "wd2",
      "wf603m",
      "wa103",
      "wa105b",
      "wz302",
      "wf605m",
      "wg305",
      "wb701",
      "wg302",
      "wd402",
      "wb401",
      "wb801",
      "wf501",
      "pid"
    ]
  }
}
The configuration syntax is flexible. Put the name of the table to process in table and the suffix in postfix; the generated tables are then named like child_2010_infant.
When one setting applies to several years, join the years with | and use the result as the key (for example "2012|2014").
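Expanding such keys into one entry per year could look like the sketch below. expand_year_keys is a hypothetical helper, and treating table and postfix as the only non-year keys is an assumption based on child.json.

```python
# Keys in child.json that are not year selectors (assumption).
RESERVED_KEYS = {"table", "postfix"}


def expand_year_keys(config):
    """Map {"2012|2014": spec, "2010": spec2, ...} to {2012: spec, 2014: spec, 2010: spec2}."""
    per_year = {}
    for key, spec in config.items():
        if key in RESERVED_KEYS:
            continue
        for part in key.split("|"):
            year = int(part.strip())
            if year in per_year:
                raise ValueError(f"year {year} is configured more than once")
            per_year[year] = spec  # years sharing a key share the same spec object
    return per_year
```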
When several years need the same columns but different query conditions, you can also write condition as a dict keyed by year.
Interpolation is also supported in condition: a placeholder such as {year} handles year-specific variable names like cfps2012_age.
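Picking the per-year condition and filling in {year} can be done with str.format, as in this sketch. resolve_condition is hypothetical, and the real program may interpolate differently (str.format, for example, would fail on literal braces in a condition).

```python
def resolve_condition(condition, year):
    """Return the SQL condition for one year.

    `condition` is either one string shared by every year in the key, or a dict
    keyed by year strings, as in the "2016|2018" entry of child.json.
    """
    if isinstance(condition, dict):
        condition = condition[str(year)]
    return condition.format(year=year)  # "cfps{year}_age<2" -> "cfps2012_age<2"
```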
List the variables to select in columns, either as a list or as a dict; the program handles both forms.
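Once the condition for a year is known, a decomposition entry can be rendered into SQL. This is a sketch under stated assumptions: the source table is <table>_<year>, the output is a CREATE TABLE ... AS SELECT, and only the column names (dict keys) are used. How the program actually uses the dict values is not documented here, and the real implementation may create views or emit different SQL.

```python
def normalize_columns(columns):
    """Return an ordered {column: value} mapping from a list or a dict."""
    if isinstance(columns, dict):
        return dict(columns)
    return {name: name for name in columns}


def build_select(table, postfix, year, condition, columns):
    """Render one hypothetical CREATE TABLE ... AS SELECT for a decomposition entry."""
    cols = ", ".join(f"`{name}`" for name in normalize_columns(columns))
    return (f"CREATE TABLE `{table}_{year}_{postfix}` AS "
            f"SELECT {cols} FROM `{table}_{year}` WHERE {condition};")


print(build_select("child", "infant", 2010, "childgroup=1", {"pid": "pid", "wa101": "..."}))
```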