add ado, docs, data
vedshastry committed Feb 1, 2022
1 parent b7d3c03 commit 0d797e1
Showing 13 changed files with 9,389 additions and 0 deletions.
14 changes: 14 additions & 0 deletions README.md
@@ -0,0 +1,14 @@
# india-bridge

India bridge is a collection of Stata ado programs and datasets tracking district changes in India between 1951 and 2011.

## Directory structure

- /ado contains `indiabridge.ado`, a Stata program that assigns district- and state-level identifiers for a given Census year.
- /data contains a bridge/crosswalk in Stata .dta format and an Excel file that can be used to track these changes.
- /docs contains the references and documentation used to build india-bridge.

## Usage
Running `help indiabridge` in Stata opens the help file, which documents the syntax and gives examples.
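
A minimal quick-start sketch, assuming the files in /ado have been copied to a directory on Stata's adopath (for example the PERSONAL directory). The toy data and the variable names `state_name` and `district_name` are illustrative, and whether these particular spellings are recognised depends on the cleaning programs:

```stata
* install the user-written dependency (indiabridge checks for it before running)
ssc install egenmore, replace

* toy data; the names below are hypothetical inputs, not taken from the bridge files
clear
input str20 state_name str20 district_name
"Maharashtra" "Pune"
"Kerala"      "Ernakulam"
end

* assign 2001 Census identifiers to one state and one district variable
indiabridge, y(2001) s(state_name) d(district_name)
```

The command prints the names of the variables it creates as it runs.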

Last updated: Feb 01, 2022
4,248 changes: 4,248 additions & 0 deletions ado/dclean.ado

Large diffs are not rendered by default.

4,189 changes: 4,189 additions & 0 deletions ado/dcode.ado

Large diffs are not rendered by default.

107 changes: 107 additions & 0 deletions ado/indiabridge.ado
@@ -0,0 +1,107 @@
*-------------------------------------------------------------------------------
* Objective: Assign india_bridge consistent identifiers
*-------------------------------------------------------------------------------

* Input: year (YYYY, entered directly), state variable, district variable
* Usage: indiabridge, y() s() d()

* define program
capture program drop indiabridge
program define indiabridge
* syntax: statename must be string, and year must be specified
syntax [if], Year(string) State(varlist string) District(varlist string)

* confirm if dependency -egenmore- is available
capture findfile egenmore.sthlp, path(BASE;SITE;PERSONAL;PLUS)
* return error if not available (findfile sets a nonzero return code when the file is not found)
if _rc {
di as error "User-written package -egenmore- needs to be installed first;"
di as error "Ensure the dependency is available by running -ssc install egenmore- before indiabridge"
exit 498
}

* display progress
di as text _dup(99) "_"
di as text "Running india-bridge for year (`year') on state variable/s (`state') and district variable/s (`district')"
di as text _dup(99) "_"

* pass arguments to state and district programs individually over varlist specified

* for each statename variable,
di as text "Applying (`year') identification on states: `state'"

foreach sv of local state {

* run state programs for the given year
ibrstate, y(`year') s(`sv')

* for each district variable,
di as text "Applying (`year') identification on districts: `district'"
foreach dv of local district {
* name of the iso code variable created by ibrstate above
local iso iso_`sv'
* run district programs for the given year
ibrdist, y(`year') d(`dv') i(`iso')
* concatenate state & district identifiers to generate a unique identifier (skipped for 2011)
if `year' != 2011 {
qui replace dcode_`dv' = scode_`sv' + dcode_`dv'
}

di as text _dup(99) "-"
di as text "State identifiers assigned in variables: iso_`sv' scode_`sv' ut_`sv'"
di as text _dup(99) "-"
di as text "District codes assigned in variable: dcode_`dv'"
}
}

di as text _dup(99) "_"
* end indiabridge
end


*-------------------------------------------------------------------------------
* subprograms
*-------------------------------------------------------------------------------
* States
* Objective: Assign india_bridge consistent identifiers to a list of state names
*-------------------------------------------------------------------------------

* define ibrstate (india_bridge state)
capture program drop ibrstate
program define ibrstate

* syntax: statename must be string, and year must be specified
syntax [if], Year(string) Statevar(varlist string)

* call state clean + assign programs
sclean `statevar' `year'
scode `statevar' `year'

* end ibrstate
end

*-------------------------------------------------------------------------------
* Districts
* Objective: Assign india_bridge consistent identifiers to a column of district names
*-------------------------------------------------------------------------------

* define ibrdist (india_bridge district)
capture program drop ibrdist
program define ibrdist

* syntax: distname must be string, and year must be specified along with iso code
syntax [if], Year(string) Distvar(varlist string) Isocode(varlist string)

* backup original district name, and generate new sieved name (alphabetic characters only)
qui rename `distvar' `distvar'_raw
qui egen `distvar' = sieve(`distvar'_raw), keep(a)
* trim spaces and standardise to lowercase
qui replace `distvar' = ustrtrim(strlower(`distvar'))
qui replace `distvar' = subinstr(`distvar'," ","",.)

* call district clean + assign programs
dclean `distvar' `isocode' `year'
dcode `distvar' `isocode' `year'

* end ibrdist
end
86 changes: 86 additions & 0 deletions ado/indiabridge.sthlp
@@ -0,0 +1,86 @@
{smcl}

{* *! version 17.1 01feb2022}{...}

{p2col:{bf:indiabridge}}

Census consistent district identification

{marker syntax}{...}

{title:Syntax}

{p}

{cmd:indiabridge} {it:[if]}{cmd:,} {opt y:ear(num)} {opt s:tate(varlist)} {opt d:istrict(varlist)}

{marker menu}{...}

{marker description}{...}

{title:Description}

{pstd}

{cmd:indiabridge} is a collection of ado programs that standardise spelling variations and typos in Indian state and district names for Census years from 1951 to 2011.
These names are standardised in accordance with Census reports and the administrative atlas (available in the project documentation).
After standardising names, {cmd:indiabridge} also assigns unique state codes and district codes along with the relevant (current) ISO code & union territory status.

{marker options}{...}

{title:Options}

{phang}{opt y:ear(num)} specifies the Census year as a number in YYYY format; it is required.

{phang}{opt s:tate(var)} specifies a variable containing state names.

{phang}{opt d:istrict(var)} specifies a variable containing district names.

{marker examples}{...}

{title:Examples}
{hline}

{pstd} Assuming 2001 identification is required for state names stored in {cmd: state_name} and district names stored in {cmd: district_name} -

{phang2}{cmd:. indiabridge, y(2001) s(state_name) d(district_name)}
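
{pstd} A sketch of inspecting the result, assuming the call above succeeded. The variable names follow the command's own progress messages and take their suffixes from the names passed to {cmd:state()} and {cmd:district()}; {cmd:district_name_raw} is the backup of the original district names -

{phang2}{cmd:. describe iso_state_name scode_state_name ut_state_name}

{phang2}{cmd:. describe district_name_raw district_name dcode_district_name}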


{hline}

{title:Maintainer}

{p 4 4 2}{bf:Vedarshi Shastry}{break}
[email protected]{break}
shastryved.github.io

{bf:To-do}
{hline}
{cmd:indiabridge} currently only works for one variable of state & district each, for one year at a time.
I plan on building in input for a year variable in YYYY format for {opt y:ear()}, and parsing multiple state/district variables in one go.

{pstd} Once this is implemented, suppose the variable {cmd: cenyear} contains multiple Census years (say, 1991 and 2001) -

{phang2}{cmd:. indiabridge, y(cenyear) s(statelist) d(districtlist)}

{phang2} would then read the year from {cmd: cenyear} and assign the relevant state/district identifiers to the variables in {cmd:statelist} and {cmd:districtlist}.

{hline}

{title:Acknowledgements & References}

{p 4 8 2}
{bf:Kumar, Hemanshu and Somanathan, Rohini}{break}
{it:State and District Boundary Changes in India: 1961-2001 (November 6, 2015)}{break}
http://dx.doi.org/10.2139/ssrn.2687484

Additional information on district splits and merges was derived from work previously done by the authors of this paper.

{p 4 4 2}{bf:Nicholas J. Cox, Durham University, U.K.}{break}
[email protected]

{cmd:egenmore}, maintained by Nicholas J. Cox, is a dependency of {cmd:indiabridge}.



{hline}