types of classification:
- state
- use
- importance
classification is for risk management - reduce costs associated with protecting data
- match level of protection and cost with value of asset
- at rest
- being created
- in transit
- being changed/deleted
data storage location should be considered as well
- determining how it is used can help determine how it should be shared (or not)
Usage classifications
- Internal - created, computed, stored (in memory) within application
- Input - read into system and possibly stored
- Output - written to output destination
Sensitivity categories
- Security-sensitive - subset of data, very valuable to attacker
- PII
- Hidden - concealed using obfuscation to protect from unauthorized disclosure
- 📝 labeled according to risk if lost (high, medium, low)
- will differ from firm to firm
- 📝 impact considerations include cost, operation impact, people impact, customer impact
- data is not owned by a person, it is owned by the enterprise
- data is assigned to people for stewardship/ownership for practical reasons
- ownership is business driven
- acts in the interests of the enterprise
- determines who has what access
- owner != custodian (e.g. CFO owns accounting records but DBA can make direct changes)
- 📝 data owners define data classification, authorized users, access criteria, and security controls
- ensure processes safely transport, manipulate, and store data
- ensure data management processes (set by owner) are followed
- 📝 perform backups, data retention, disposal
- 📝 manage anything else (e.g. security controls) defined by owner
- custodians may not need read access
- built around business purpose
- wider concern than sensitivity
- includes loss, disclosure, and alteration
- three levels (high, medium or moderate, low) - each level should be clearly defined
- 📝 NIST FIPS 199 and SP 800-18 provide framework for classifying based on CIA
- high: set high enough that a small number of data elements are included
- financial limits and customer impact vary, each company needs to decide for itself
- has a defined structure that can be parsed, sorted, searched
- 📝 is not determined by where it is stored, but how
- 📝 look for relationships between data elements
examples:
- tables in a database
- formatted file structures
- XML
- JSON
- certain text files like log files
- not easily parsed/searched
- more difficult to modify outside originating application (word docs, pdfs)
- majority of data is unstructured
- cost of storing data is still a resource issue despite lower costs
- must be managed from backup, business continuity, disaster recovery perspective
if data is going to be persistent, need to label, classify, protect
if retained, must have metadata defined such as:
- data owner
- purpose of storing
- level of protection
- length of storage (retention policy)
protection must also consider how to protect backups and copies for DR
System logs: are considered important from a legal/compliance perspective, often have sensitive info that needs to be protected
primary purposes of disposal:
- conserve resources
- limit liabilities
📝 Length of storage is determined by:
- business purpose
- compliance
Legal hold data is not subject to normal disposal procedures