GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.
Fall 2015: Software Engineering
[TOC]
- casual vs pro soft. dev.
- real soft. : millions of LoC, written by team, maintained and evolved over time
- can cause real harm if soft. fails
- quite new science
- soft. depends on people : fallible, written in group
- topics
- soft. dev. tools
- soft. dev. methodology
- modularity and abstraction
- verification and validation
- design patterns
- coding conventions
- refactoring code
- secure programming
- good performance
- program analysis and defect detection tools
- soft. engineering research
- agile soft. dev. : scrum
- collections of practices for a small team
- lightweight
- organized in short-term called sprints (complete iterations)
- close interaction with customer : no advance planning
- transparent : everything is showed to the team (included in-person progress)
- needs
- safety
- security
- reliability
- performance
- manageability
- soft. errors is costly
- need for reliability : mars, rocket, cars
- improve developers productivity
- stimulate programmers creativity
- make cheaper to build good software
- powerful tools
- management : large soft. project need coordinations (and budjet control)
- understand the problem
- read carefully
- digest and think
- break down the problem
- separate thinking from execution
- time management : key to success
- make a plan with deadlines
- expect the unexpected : margin of error (nasty bugs)
- track time-to-deliver, learn from mistakes (analysis)
- team coordination
- designate a part-time manager : rotate management role among team members
- tracker system : split problem into individual, assign task to people
- how to deal with slacker : set per-task deadlines, present concrete evidence
- team effort
- code reviewed by peers : >= on each piece of code
- put code in the public eye : public tests and results
- get involved in others code
- think of code as write-once/read-many
- focus on code readability
- adopt a specific coding style for the team
- self-documenting code repository : informative commit messages, enable checkstyle
- code in repository should always compile
- use quantitative metrics
- measurement : #1 rule in engineering
- examples : code size, test effectiveness/efficiency, debugging effectiveness/efficiency, productivity (LoC)
- what if you screw up
- avoid foolish optimism
- add more people : only if clear splitted tasks
- best choice is reduce scope of project : less functionality, partition into (must-have/nice-to-have/optional)
- work smarter
- be agile : prototype and throw if not good, fix inefficiencies on the go, don't aim for perfection
- be smart : factor out things need > 3 usages
- love what you do : invest time in learning the full power of tools, contribute to open-source projects, news
- defining the problem : what problem should the software solve? (product vision, definition)
- brief : less than 2 pages
- no reference to possible solutions
- from users point of view
- formulating requirements : what should the software do to solve the problem? (functional specification)
- acts as a contract between user and developer
- makes users requirements explicit
- good functional specifications
- all user tasks : description, expected reponse time, success vs failure
- all system inputs : source, accuracy, value range frquency of arrival
- all system outputs : destination, accuracy, value range, format
- all interfaces with the rest of the world : hw, sw, communication interfaces
- good functional specs
- requirements are verifiable : quantitative vs qualitative assessment
- competing attriutes can be resolved
- clear connection to problem definition
- requirement change
- customer does not know what he wants
- 25% of requirements change during development : account for 75% of the amout of work
- software architecture
- block approach : requirements -> architecture -> detail design -> construction -> testing
- when requirements are stable
- design is straighforward and well understood
- development team need not to experiment
- agile approach : bounce between everything
- when requirement are blurry
- complex and challenging design
- dev. team not familiar with the soft. being built
- preprocessors : macros
- compilers : static analysises, complexity metrics
- linker
- builders
- testing/debugging framework : coverage tools, fault injectors, logging
- profiler : identify bottlenecks, time spent
- IDE
- VCS
- continuous integrations server
- documentation automators
- CASE : computer aided soft. engineering (code generation)
- checkpoints of a project
- rebuild any version of the project (revert if needed)
- finds bugs introduced
- save and copy the code
- trace evolution and issue
- share code dev.
- branching allow parallel work
- conflict : modification overlap
- logical structure of program
- make the code understandable
- whitespace : key to understandable text
- indentation : 2-4 spaces optimal
- parentheses to clarify
- class layout
- header comment
- class data
- public methods
- protected methods
- private methods
- files : separate files for interface and implementations, one class per file, name of file after the name of the class
- formatting : length of line <= 120 chars. and length of class <= 2'000
- imports : avoid wildcard, avoid platform-specific, group imports by package
- methods parameters : no alignmenent
- simple methods : small and focused (not > 50 LoC, max 7 params)
- braces : always
- vertical alignements can make things more readable
- split complex predicates cleanly
- multi-line statements : make incompleteness obvious
- each line should do one thing
- var names
- short but clear
- long names for rarely used variables
- loop indexes only inside of loop : i, j, k
- computed-value qualifies at end of vars : total, sum, average, max, min, count
- common opposites : begin/end, first/last, locked/unlocked, min/max, next/previous, old/new, opened/closed, visible/invisibe, source/target, source/destination, up/down
- purpose-driven naming (no temp)
- no flag in the name
- avoid hybrid coupling (hidden meaning) : prefer things like positiveBytes
- separate namespaces
- use enums liberally
- bools : use done, error, found, success names but positive meaning
- shortening names : remove non-leading vowels, remove articles, remove useless endings, stay consistent, should pass the telephone test
- data declaration : put in comments what cannot go in name
- avoid numerals namings or commonly misspelled words
- improper initilization : fertile source of bugs, should be close to usage, can be set to final
- use const instead or hardcoded value
- variable span : LoC between successive references
- variable liveness interval : LoC between first and last reference (can be reordered at compile time)
- variable scope : program space where var is valid
- in general, try to minimize these
- soft. system : group of interconnected components that exhibits an exprected collective behavior observed at the interfaces with its environment
- symptoms of complexity
- large number of components : libs
- large number of interconnections
- many irregularities and exceptions
- high "Kolmogorov" complexity : minimal lenght of a description of the object
- structured programming
- single-entry/exit control constructs
- no unpredictable jump
- can reason about
- OOP
- encapsulation
- contain behavior inside of modules
- isolation
- configure and maintain separately
- virtualization : server consolidation, on-demand cloud computing, replication
- modules : replaceable units
- top design : break down into major subsystems, define interfaces for them, internal class design left to programmer
- example of abstractions : routines, lambda, objects
- guidlines
- clean and lean : compartmentalized, avoid being too clever, aim for minimality
- loosely coupled : minize bonds to other classes, avoid implicit connections
- portable : avoid exposing environement
- extensible : anticipate change
- stratified : use layers of abstraction, nested classes
- reusable : capture the right level of fundamental attributes correspondin to level of abstraction
- maintainable : self-explanatory design, use hierarchy, design for testability, be paranoid
- hide what is likely to change : protects from legacy code
- use invariants (always truee)
- pre/post-conditions
- asserts : assumptions, impossible case, documentation (
java -ea
to enable), default switch statement - audit can check invariant at call
- testing, debugging during execution
- beware of concurrency
- return the number of errors
- garbage in output non garbage
- source of garbage : errors, corruption, unpredictable events
- handling garbage
- nothing out
- error message out
- clean input
- things to check : reference not null, valid range, stream status, file access type
- throw exception if bad input
- fix invalid data : previously used value, neutral value, closest legal value
- be careful on known truths
- sanity checks for your program
- defensive copying
- systematic process of running a program and verifying
- prevent bug respawn
- "Program testing can be used to show the presence of bugs, but never to show their absence!" Dijkstra
- definitions : bug (fault) -> error -> failure
- error : a discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition
- failure : the inability of a system or component to perform its required functions within specified performance requirements
- fault : an incorrect step, process, or data definition in a computer program which causes the program to perform in an unintended or unanticipated manner
- bug : a fault in a program which causes the program to perform in an unintended or unanticipated manner
- goal of testing is to identify faults
- types of tests
- unit tests : self-contained test of a method, class, module
- integration tests : do the pieces work together
- check-in tests (smoke tests) :inexpensive, broad coverage of apps's functionality (run before every check-in)
- regression tests : thorough tests (high coverage), test of every bug that has been encountered
- random testing (fuzz testing) : concolic testing
- white box vs black box testing
- cannot prove correctness
- risk management
- bug density : 10-100 bugs / KLoC after coding, 1-5 bugs / KLoC not detected before delivery
- boundary testing : special cases
- equivalence classes : two classes one check
- traditional quality assurance
- run code in your head
- heed IDE warnings
- heed compiler warnings
- code review
- unit testing (black + white)
- integration testing
- system testing
- alpha testing
- beta testing
- metrics : covered if it is executed by at least one test (methods, statements, branches, paths coverage)
- tension between quality vs cost
- unit test : independ of each other
- others testings
- acceptance testing : performed by user upon receiving product
- smoke/sanity testing : quick test for serious errors (compile)
- compatibility testing : work elsewhere
- fault injection : bad inputs, no network
- performance testing
- usability testing : can users accomplish their objectives with the software as designed ?
- test-driven dev. : write test that fails -> make test pass -> refactor -> ..
- the #1 challenge
- paths ~= 2 ^ program size (firefox: 5'000'000 LoC)
- software verification
- testing
- human proofs (< 100 LoC) : mathematical view of soft. engineering
- machine proofs (< 100'000 LoC)
- soft. complexity grows fast
- cannot only rely on testing
- building bridge vs soft. dev. : fundamental difference is that bridge is based on immutable laws
- average vs best programmer
- initial coding time : 20x ratio
- debugging time : 20x ratio
- program execution speed : 10x ratio
- 80% of contribution from 20% of programmers
- debugging is 50% average dev. cycle
- long-lived errors are expensive : finding errors early reduces time and cost >2x
- cost of downtime : from email app (2000$/h) to trading app (75000$/h)
- test-driven dev. : verification before coding
- behavior-driven dev. : validation before coding
- agile dev. processes : shorten time between coding and error discovery
- discipline and rigor
- building blocks
- experience = toolbox of reusable solutions
- description : name, problem, solution, consequences
- creational patterns
- Abstract Factory : assemble something without regard to components details (UI builder call on factory depending on the OS)
- Builder
- Factory Method
- static factory methods : gain control over object creation (from cartesian, from polar)
- original GOF (gang of four) : delegate to subclasses the decision of what instance of a class to create (polymorphism, non-static)
- Factory Object
- Lazy Initialization
- Prototype
- Singleton : only one object (static get intance, should check for reflexion attacks, can use enum)
- structural patterns
- Adaptor
- Bridge
- Composite
- Decorator
- Façade
- Flyweight
- Proxy
- behavioral patterns
- Chain of Responsibility
- Command
- Interpreter
- Iterator
- Mediator
- Memento
- Observer : one-to-many dependency, subject changes and observers get notified
- State
- Strategy
- Template Method
- Visitor
- architectural patterns
- model-view-controller
- service-oriented achitecture
- concurrency patterns
- active object
- monitor
- thread pool
- basic patterns
- encapsulation
- problem : exposure can lead to invariance
- solution : hide components
- consequences : may reduce perfs
- inheritance
- problem : similar abstractions
- solution : inherit default members
- consequences : runtime dispatching introduces overhead
- iteration
- problem : accessing all collection members
- solution : implementation does traversal
- consequences : iteration order constrained by implementation
- exceptions
- problem : errors occur in one place
- solution : specialized language struct
- consequences : handling level unknown
- iterator
- problem : iterating over a collection with state
- solution : encapsulate iteration
- consequences : can fail rapidly (robust: keep modifications count at creation, check it at each iterator step, throw
ConcurrentModificationException
if changed)
- strategy
- problem : different method for different type
- solution : encapsulate differences into a family
- consequences : limited interface
- model-view-controller
- problem : differentiate gui, data and controls
- solution : 3 modules with controller controlling model and view
- consequences : can be fuzzy to split roles
- adapter
- problem : class has different interface from what caller expects
- solution : solution template
- consequences : overhead but multiple times used
- decorator
- problem : augment functionality
- solution : dynamically add/override behavirs underneath same interface
- consequences : large number, use factories
- proxy
- problem : different object might use same protocol
- solution : isolate protocol specification with a surrogate object having the same interface (caching)
- consequences : can overlap with other patterns
- composite
- problem : recursive structures of similar objects used in the same manner
- solution : compose one or more object behind same interface, blackbox reuse
- consequences : need to exhibit same functionality
- visitor
- problem : large hierarchy of dispatched overriden function (add new behavior to a group of classes without changing them)
- solution : centralize same methods in visitor (pattern match)
- consequences : breaks encapsulation, may need privileges
- encapsulation
- Liskov Subsitution Principle (LSP) : let q(x) be a property provable about objects x of type T, then q(y) should be provable for objects y of type S where S is a subtype if T
- subtype must preserve supertypes invariants
- subtype not allowed to strengthen preconditions or weaken postconditions
- inheritance : is a
- use only when it simplifies design
- use final when enforcing limitation
- can break encapsulation (favor private over protected)
- avoid deep hierarchy (max 3 level and 7+/-2 subclasses, otherwise buggy)
- avoid linear hierarchy
- push common things as high up as possible
- multiple heritance is bad (diamond problem)
- mixins : inheritance of implementations allowed, orthogonal functionality, single purpose, not instantiable in their own
- containement : has a
- avoid excessive method forwarding
- contraing class controls the interface
- use when classes share common data but not behavior
- design by contract
- natural language is less precise than code
- comment should be checked at reviews (disagree between code and comment is nasty)
- types of comments
- summary : OK
- description of code intent : OK
- repeat, explain the code : BAD
- code markers (issue) : OK
- comment what, not how
- warn the reader : e.g. workaround
- class comments : header comment contains purpose + design
- interface comments : dependent from implementation
- method header : 1/2 sentence, desribe assumptions, include source (algo)
- end-of-line comments : rarely justified (except for bug or non-obvious case)
- comment while coding
- coding conventions is important : 40-80% of lifetime cost of soft. goes to maintenance
- one convention is better than none or many
- use checkstyle
- understand phases of soft. dev.
- dev. phases
- market research : talks to the users, evaluate feasability, cost
- requirements gathering and analysis : define product
- specification, planning, desing : turn requirements into design specification
- implementation and testing : write (15%) and test with integration (50%)
- deployment : package code
- support and maintenance
- prototyping model : frim structured dev. to amorphous dev.
- waterfall model : linear, well defined, requirement must be known upfront, inflexible
- iterative model : waterfall with a divide and conquery strategy, benefit of better control, important functionality early, temptation todefer
- sequence of mini waterfalls
- break down into mini waterfalls to be pursued in parallel
- do a waterfall up to design, then do iterative prototyping
- spiral model : identify risk early, users involved, evoluation possible, maintenance included, complex, can spiral forever, use when medium size, robust implementation more important than functionality
- prototyping model (evolutionary model) : encourage user pariticipation, resolves unclearness, innovation, flexibility, difficult to analyse and manage, use for high-pressure, short-lived, customer not very knowledgeable
- waterfall model : linear, well defined, requirement must be known upfront, inflexible
- agile methods : scrum, adaptive soft. dev., feature driven dev., crystal clear, extreme programming, rapid appliation dev., rational unify process
- scrum
- basic structure : sprint (1-4 weeks)
- working product after each sprint
- cross-functional teams of 5-10 people
- meet daily
- product owner : represents customer interests, maximizes buisness value
- team : build the product, multiple domain, self-managing
- scrum master : ensure team success (not manager) protect team from outside interference, helps resolve impediments
- manager : not a nanny but a guru
- product backlog : to-do list to prioritized by value to customer (contaisn features, dev. requirements, bugs, reserach) evolves over time, team provides estimates
- sprint planning meeting : review of product backlog, team selects items to complete, go for the sprint
- daily scrum : what was done (rapidly), no discussion, just rapporting, and what will be done until next scrum
- if goal not met, learn from it
- sprint review : demo product, everyone, ask questions, last as long as needed
- sprint retrospective : updates product backlog, can be done by someone from outside, maintain the pace
- confidentiality : not disclosed to unauthorized parties
- integrity : data modifications must be detected
- availability : information available when needed
- authenticity : each party is who they claim to be
- accountability : actions of a party can be traced uniquely to that party
- security controls
- administrative : policies, procedures (enforce logical and phyical)
- logical : encryption, access control
- physical : locks, separation of duties
- challenges
- threat : deliberate (cracker), accidental (failure), environmental (tornado)
- vulnerability : buffer underflow
- attack : assault, intelligent act to violate the security
- anticipate abnormal behavior
- attacks
- interruption : availability (disrupting traffic)
- interception : confidentiality (eavesdropping)
- modification : integrity (corrupting)
- fabrication : authenticity, accountability (forging data)
- toolkit : disassemblers, debuggers, fuzzers, malware
- risks
- connectivity : globally accessible
- extensibility : plugin architectures
- complexcity : system sizes
- security is not a feature but a property
- lack of planning issue
- cannot be added
- pillars of soft. security
- buisness goals
- high ROI (return on investments)
- high SLA (service level agreement)
- reducing dev. costs
- rank goals according to priority
- identify risks and threats : into context (likelihood, severity, impact, frequency, completeness)
- monitor and update risks over time
- known problems
- target priviledged level : root
- argument injection : shell scripts
- embedding scripts in scripts : mix languages
- buffer overflows
- attack configuration files
- manipulate parsing
- abusing input sanitization : cross-site scripting, use local filename
- client-server : directly to the server, manipulate session ID, inject scripts into variables
- race conditions
- misuse of cryptograhpy
- password management : unsalted hashes
- best pratices
- check code quality : broken error handling
- encapsulation : restrict interactions
- defensive programming : separate clean inside from dirty outside
- white list, not black list
- decouple user input from application logic
- least privilege
- redundant security controls
- multiple layers : firewall, vpn, authentication, checksums, bometric access
- improve code : prevent decay, clean up, simplify, increase readability
- avoid future problems : find bugs, reduce debugging time
- rethink, be creative : better way to solve a problem
- code smell : indication for refactoring, bad practices
- feature envy : using more from another class (move or extract)
- shotgun surgery : making a change require many changes (move or extract)
- workflow : save original, simplify and tests for correctness (chance of error is high for a few line), review changes
- risks
- introduce new bugs
- expensive
- easy to waste time (focus on the 20%)
- refactor when : changes needed, add method of class, fix bug, smells
- do not refactor when : close to deadline, API published, behind a clean interface, someone else is working with the code
- composing fields, methods and classes
- extract method : improves clarity
- inline method
- remplace temporary var by a method
- subsitude algorithm
- class with too much functionnality : divide
- duplicated code
- in same class or method : extract
- in sibiling classes : extract and pull up
- in different classes : extract
- different algorhtm : subsitute algorithm and extract
- inheritance
- pull up fields
- share same features : creates superclass
- replace inheritance with delegration
- latency : time from submission to receipt of first bit (threshold ~200ms latency)
- throughput: # of ops per second
- 80/20 rule : 20% of code use 80% of runtime
- focus on observable performance : end user matterns
- improve performance receipt
- turn on compiler optimizations
- try another compiler
- adapt performance requirements
- get faster hardware
- change high-level design
- tune the code : only if really needed
- process
- write functionnaly complete software including tests
- save known-good version
- profile code to find bottlenecks
- premature optimization is the root of all evil (Donald Knuth)
- measurements
- profilers : instrumentation, dump, interpret
- sampling-based profilers : minimize interference
- duration : wall-clock time, CPU ticks, app ticks vs kernel ticks, synchronization time
- page faults
- cache misses
- branch mispredictions
- best pratices
- make common case fast
- keep infrequent case correct
- low performance sources
- unnecessary IO
- cache miscomprehension : matrix multiplication
- control flow
- if : fastest test first and most likely to be true
- prefer switch statements
- loop early exit
- loop unswitching : if inside loop
- loop jamming : join similar loop
- loop unrolling
- place busiest loop inside : swap loop reduce condition checks
- replace operation with cheaper ones
- data accesses
- caching lacks
- memoization
- precomputing results : constants
- automation
- blackbox testing : determine equivalence classes of inputs (fast, easy, manual configuration, redundance)
- input generation : derived from specifications, probabilistics, corner cases, grammers
- whitebox testing : determine equivalence classes of program paths (SMT solvers)
- contracts : precondition and postcondition
- static code analysis : slow, hard to scale
- manage issues over time : prioritize, use filter, highlight new issues wiht bug trackers
-
endoscopic soft. analysis
- in vitro : mockup environement
- in vivo : real environement
- automatically prune paths that are out of interests
- merge similar nodes to collaspe paths without loss of precision
-
immunity against bugs
- autoimmune disease : deadlock
- vaccination : faster immunity through collaboration
- learn to avoid previously seen failures
-
diagnosis of failed soft.
- speed up the debugging of complex soft.
-
soft. rating service : 100% objective
- evaluate reliability automatically
- publish results and create comparison
- explan and quantify reliability to consumers
-
crowd-sourced dependability