Skip to content

Latest commit

 

History

History
707 lines (667 loc) · 28.9 KB

cs305.md

File metadata and controls

707 lines (667 loc) · 28.9 KB

CS305

GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.

Fall 2015: Software Engineering

[TOC]

Introduction

  • casual vs pro soft. dev.
  • real soft. : millions of LoC, written by team, maintained and evolved over time
  • can cause real harm if soft. fails
  • quite new science
  • soft. depends on people : fallible, written in group
  • topics
    • soft. dev. tools
    • soft. dev. methodology
    • modularity and abstraction
    • verification and validation
    • design patterns
    • coding conventions
    • refactoring code
    • secure programming
    • good performance
    • program analysis and defect detection tools
    • soft. engineering research
  • agile soft. dev. : scrum
    • collections of practices for a small team
    • lightweight
    • organized in short-term called sprints (complete iterations)
    • close interaction with customer : no advance planning
    • transparent : everything is showed to the team (included in-person progress)
  • needs
    • safety
    • security
    • reliability
    • performance
    • manageability
  • soft. errors is costly
  • need for reliability : mars, rocket, cars
  • improve developers productivity
  • stimulate programmers creativity
  • make cheaper to build good software

Software engineering

  • powerful tools
  • management : large soft. project need coordinations (and budjet control)
  • understand the problem
    • read carefully
    • digest and think
    • break down the problem
    • separate thinking from execution
  • time management : key to success
    • make a plan with deadlines
    • expect the unexpected : margin of error (nasty bugs)
    • track time-to-deliver, learn from mistakes (analysis)
  • team coordination
    • designate a part-time manager : rotate management role among team members
    • tracker system : split problem into individual, assign task to people
    • how to deal with slacker : set per-task deadlines, present concrete evidence
  • team effort
    • code reviewed by peers : >= on each piece of code
    • put code in the public eye : public tests and results
    • get involved in others code
  • think of code as write-once/read-many
    • focus on code readability
    • adopt a specific coding style for the team
    • self-documenting code repository : informative commit messages, enable checkstyle
    • code in repository should always compile
  • use quantitative metrics
    • measurement : #1 rule in engineering
    • examples : code size, test effectiveness/efficiency, debugging effectiveness/efficiency, productivity (LoC)
  • what if you screw up
    • avoid foolish optimism
    • add more people : only if clear splitted tasks
    • best choice is reduce scope of project : less functionality, partition into (must-have/nice-to-have/optional)
  • work smarter
    • be agile : prototype and throw if not good, fix inefficiencies on the go, don't aim for perfection
    • be smart : factor out things need > 3 usages
    • love what you do : invest time in learning the full power of tools, contribute to open-source projects, news

Building software

  • defining the problem : what problem should the software solve? (product vision, definition)
    • brief : less than 2 pages
    • no reference to possible solutions
    • from users point of view
  • formulating requirements : what should the software do to solve the problem? (functional specification)
    • acts as a contract between user and developer
    • makes users requirements explicit
  • good functional specifications
    • all user tasks : description, expected reponse time, success vs failure
    • all system inputs : source, accuracy, value range frquency of arrival
    • all system outputs : destination, accuracy, value range, format
    • all interfaces with the rest of the world : hw, sw, communication interfaces
  • good functional specs
    • requirements are verifiable : quantitative vs qualitative assessment
    • competing attriutes can be resolved
    • clear connection to problem definition
  • requirement change
    • customer does not know what he wants
    • 25% of requirements change during development : account for 75% of the amout of work
  • software architecture
    • data design : data structures, DB tables, interoperability
    • user interface : GUI, CLI, UI replaceable (evolution and testing)
    • resources : memory, disk space, threads, etc.
    • level of fault tolerance
    • detection and handling of errors
    • build vs reuse
    • internationalization
    • design for change extensibility
  • block approach : requirements -> architecture -> detail design -> construction -> testing
    • when requirements are stable
    • design is straighforward and well understood
    • development team need not to experiment
  • agile approach : bounce between everything
    • when requirement are blurry
    • complex and challenging design
    • dev. team not familiar with the soft. being built

Dev. tools

  • preprocessors : macros
  • compilers : static analysises, complexity metrics
  • linker
  • builders
  • testing/debugging framework : coverage tools, fault injectors, logging
  • profiler : identify bottlenecks, time spent
  • IDE
  • VCS
  • continuous integrations server
  • documentation automators
  • CASE : computer aided soft. engineering (code generation)

Version control systems

  • checkpoints of a project
  • rebuild any version of the project (revert if needed)
  • finds bugs introduced
  • save and copy the code
  • trace evolution and issue
  • share code dev.
  • branching allow parallel work
  • conflict : modification overlap
    • syntactic
    • semantic : change behaviour

Code layout

  • logical structure of program
  • make the code understandable
  • whitespace : key to understandable text
  • indentation : 2-4 spaces optimal
  • parentheses to clarify
  • class layout
    • header comment
    • class data
    • public methods
    • protected methods
    • private methods
  • files : separate files for interface and implementations, one class per file, name of file after the name of the class
  • formatting : length of line <= 120 chars. and length of class <= 2'000
  • imports : avoid wildcard, avoid platform-specific, group imports by package
  • methods parameters : no alignmenent
  • simple methods : small and focused (not > 50 LoC, max 7 params)
  • braces : always
  • vertical alignements can make things more readable
  • split complex predicates cleanly
  • multi-line statements : make incompleteness obvious
  • each line should do one thing
  • var names
    • short but clear
    • long names for rarely used variables
    • loop indexes only inside of loop : i, j, k
    • computed-value qualifies at end of vars : total, sum, average, max, min, count
    • common opposites : begin/end, first/last, locked/unlocked, min/max, next/previous, old/new, opened/closed, visible/invisibe, source/target, source/destination, up/down
    • purpose-driven naming (no temp)
    • no flag in the name
    • avoid hybrid coupling (hidden meaning) : prefer things like positiveBytes
  • separate namespaces
  • use enums liberally
  • bools : use done, error, found, success names but positive meaning
  • shortening names : remove non-leading vowels, remove articles, remove useless endings, stay consistent, should pass the telephone test
  • data declaration : put in comments what cannot go in name
  • avoid numerals namings or commonly misspelled words
  • improper initilization : fertile source of bugs, should be close to usage, can be set to final
  • use const instead or hardcoded value

Metrics and symptoms

  • variable span : LoC between successive references
  • variable liveness interval : LoC between first and last reference (can be reordered at compile time)
  • variable scope : program space where var is valid
  • in general, try to minimize these
  • soft. system : group of interconnected components that exhibits an exprected collective behavior observed at the interfaces with its environment
  • symptoms of complexity
    • large number of components : libs
    • large number of interconnections
    • many irregularities and exceptions
    • high "Kolmogorov" complexity : minimal lenght of a description of the object

Modularity and abstraction

  • structured programming
    • single-entry/exit control constructs
    • no unpredictable jump
    • can reason about
  • OOP
    • encapsulation
    • contain behavior inside of modules
    • isolation
    • configure and maintain separately
  • virtualization : server consolidation, on-demand cloud computing, replication
  • modules : replaceable units
  • top design : break down into major subsystems, define interfaces for them, internal class design left to programmer
  • example of abstractions : routines, lambda, objects

Good interfaces

  • guidlines
    • clean and lean : compartmentalized, avoid being too clever, aim for minimality
    • loosely coupled : minize bonds to other classes, avoid implicit connections
    • portable : avoid exposing environement
    • extensible : anticipate change
    • stratified : use layers of abstraction, nested classes
    • reusable : capture the right level of fundamental attributes correspondin to level of abstraction
    • maintainable : self-explanatory design, use hierarchy, design for testability, be paranoid
  • hide what is likely to change : protects from legacy code

Audit methods and defensive programming

  • use invariants (always truee)
  • pre/post-conditions
  • asserts : assumptions, impossible case, documentation (java -ea to enable), default switch statement
  • audit can check invariant at call
    • testing, debugging during execution
    • beware of concurrency
    • return the number of errors
  • garbage in output non garbage
  • source of garbage : errors, corruption, unpredictable events
  • handling garbage
    • nothing out
    • error message out
    • clean input
  • things to check : reference not null, valid range, stream status, file access type
  • throw exception if bad input
    • at right level of abstraction
    • informative
    • cost : cost of getting wrong input x probability of it being wrong
  • fix invalid data : previously used value, neutral value, closest legal value
  • be careful on known truths
  • sanity checks for your program
  • defensive copying

Testing

  • systematic process of running a program and verifying
  • prevent bug respawn
  • "Program testing can be used to show the presence of bugs, but never to show their absence!" Dijkstra
  • definitions : bug (fault) -> error -> failure
    • error : a discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition
    • failure : the inability of a system or component to perform its required functions within specified performance requirements
    • fault : an incorrect step, process, or data definition in a computer program which causes the program to perform in an unintended or unanticipated manner
    • bug : a fault in a program which causes the program to perform in an unintended or unanticipated manner
  • goal of testing is to identify faults
  • types of tests
    • unit tests : self-contained test of a method, class, module
    • integration tests : do the pieces work together
    • check-in tests (smoke tests) :inexpensive, broad coverage of apps's functionality (run before every check-in)
    • regression tests : thorough tests (high coverage), test of every bug that has been encountered
    • random testing (fuzz testing) : concolic testing
  • white box vs black box testing
  • cannot prove correctness
  • risk management
  • bug density : 10-100 bugs / KLoC after coding, 1-5 bugs / KLoC not detected before delivery
  • boundary testing : special cases
  • equivalence classes : two classes one check
  • traditional quality assurance
    • run code in your head
    • heed IDE warnings
    • heed compiler warnings
    • code review
    • unit testing (black + white)
    • integration testing
    • system testing
    • alpha testing
    • beta testing
  • metrics : covered if it is executed by at least one test (methods, statements, branches, paths coverage)
  • tension between quality vs cost
  • unit test : independ of each other
  • others testings
    • acceptance testing : performed by user upon receiving product
    • smoke/sanity testing : quick test for serious errors (compile)
    • compatibility testing : work elsewhere
    • fault injection : bad inputs, no network
    • performance testing
    • usability testing : can users accomplish their objectives with the software as designed ?
  • test-driven dev. : write test that fails -> make test pass -> refactor -> ..
    • reduce bug density
    • good candidates : UI logic, buisness logic, any java class
    • bad : UI appearance, client/server interaction, legacy code
    • tips : should cover > 90%, TDD at start of project, displice, each problem = test

Complexity

  • the #1 challenge
  • paths ~= 2 ^ program size (firefox: 5'000'000 LoC)
  • software verification
    • testing
    • human proofs (< 100 LoC) : mathematical view of soft. engineering
    • machine proofs (< 100'000 LoC)
  • soft. complexity grows fast
  • cannot only rely on testing

Preventing and finding errors early

  • building bridge vs soft. dev. : fundamental difference is that bridge is based on immutable laws
  • average vs best programmer
    • initial coding time : 20x ratio
    • debugging time : 20x ratio
    • program execution speed : 10x ratio
    • 80% of contribution from 20% of programmers
  • debugging is 50% average dev. cycle
  • long-lived errors are expensive : finding errors early reduces time and cost >2x
  • cost of downtime : from email app (2000$/h) to trading app (75000$/h)
  • test-driven dev. : verification before coding
  • behavior-driven dev. : validation before coding
  • agile dev. processes : shorten time between coding and error discovery
  • discipline and rigor

Design patterns

  • building blocks
  • experience = toolbox of reusable solutions
  • description : name, problem, solution, consequences
  • creational patterns
    • Abstract Factory : assemble something without regard to components details (UI builder call on factory depending on the OS)
    • Builder
    • Factory Method
      • static factory methods : gain control over object creation (from cartesian, from polar)
      • original GOF (gang of four) : delegate to subclasses the decision of what instance of a class to create (polymorphism, non-static)
    • Factory Object
    • Lazy Initialization
    • Prototype
    • Singleton : only one object (static get intance, should check for reflexion attacks, can use enum)
  • structural patterns
    • Adaptor
    • Bridge
    • Composite
    • Decorator
    • Façade
    • Flyweight
    • Proxy
  • behavioral patterns
    • Chain of Responsibility
    • Command
    • Interpreter
    • Iterator
    • Mediator
    • Memento
    • Observer : one-to-many dependency, subject changes and observers get notified
    • State
    • Strategy
    • Template Method
    • Visitor
  • architectural patterns
    • model-view-controller
    • service-oriented achitecture
  • concurrency patterns
    • active object
    • monitor
    • thread pool
  • basic patterns
    • encapsulation
      • problem : exposure can lead to invariance
      • solution : hide components
      • consequences : may reduce perfs
    • inheritance
      • problem : similar abstractions
      • solution : inherit default members
      • consequences : runtime dispatching introduces overhead
    • iteration
      • problem : accessing all collection members
      • solution : implementation does traversal
      • consequences : iteration order constrained by implementation
    • exceptions
      • problem : errors occur in one place
      • solution : specialized language struct
      • consequences : handling level unknown
    • iterator
      • problem : iterating over a collection with state
      • solution : encapsulate iteration
      • consequences : can fail rapidly (robust: keep modifications count at creation, check it at each iterator step, throw ConcurrentModificationException if changed)
    • strategy
      • problem : different method for different type
      • solution : encapsulate differences into a family
      • consequences : limited interface
    • model-view-controller
      • problem : differentiate gui, data and controls
      • solution : 3 modules with controller controlling model and view
      • consequences : can be fuzzy to split roles
    • adapter
      • problem : class has different interface from what caller expects
      • solution : solution template
      • consequences : overhead but multiple times used
    • decorator
      • problem : augment functionality
      • solution : dynamically add/override behavirs underneath same interface
      • consequences : large number, use factories
    • proxy
      • problem : different object might use same protocol
      • solution : isolate protocol specification with a surrogate object having the same interface (caching)
      • consequences : can overlap with other patterns
    • composite
      • problem : recursive structures of similar objects used in the same manner
      • solution : compose one or more object behind same interface, blackbox reuse
      • consequences : need to exhibit same functionality
    • visitor
      • problem : large hierarchy of dispatched overriden function (add new behavior to a group of classes without changing them)
      • solution : centralize same methods in visitor (pattern match)
      • consequences : breaks encapsulation, may need privileges

Inheritance vs containment

  • Liskov Subsitution Principle (LSP) : let q(x) be a property provable about objects x of type T, then q(y) should be provable for objects y of type S where S is a subtype if T
    • subtype must preserve supertypes invariants
    • subtype not allowed to strengthen preconditions or weaken postconditions
  • inheritance : is a
    • use only when it simplifies design
    • use final when enforcing limitation
    • can break encapsulation (favor private over protected)
    • avoid deep hierarchy (max 3 level and 7+/-2 subclasses, otherwise buggy)
    • avoid linear hierarchy
    • push common things as high up as possible
    • multiple heritance is bad (diamond problem)
  • mixins : inheritance of implementations allowed, orthogonal functionality, single purpose, not instantiable in their own
  • containement : has a
    • avoid excessive method forwarding
    • contraing class controls the interface
    • use when classes share common data but not behavior
  • design by contract
    • caller satisfy precondition : rely on postcondition
    • callee ensure postcondition : rely on precondition
    • by default, pre/post condition are assumed to be true
    • clarify obligations of caller and callee, provide documentation

UML and comments

  • natural language is less precise than code
  • comment should be checked at reviews (disagree between code and comment is nasty)
  • types of comments
    • summary : OK
    • description of code intent : OK
    • repeat, explain the code : BAD
    • code markers (issue) : OK
  • comment what, not how
  • warn the reader : e.g. workaround
  • class comments : header comment contains purpose + design
  • interface comments : dependent from implementation
  • method header : 1/2 sentence, desribe assumptions, include source (algo)
  • end-of-line comments : rarely justified (except for bug or non-obvious case)
  • comment while coding
  • coding conventions is important : 40-80% of lifetime cost of soft. goes to maintenance
  • one convention is better than none or many
  • use checkstyle

Soft. dev. processes

  • understand phases of soft. dev.
  • dev. phases
    • market research : talks to the users, evaluate feasability, cost
    • requirements gathering and analysis : define product
    • specification, planning, desing : turn requirements into design specification
    • implementation and testing : write (15%) and test with integration (50%)
    • deployment : package code
    • support and maintenance
  • prototyping model : frim structured dev. to amorphous dev.
    • waterfall model : linear, well defined, requirement must be known upfront, inflexible
    • iterative model : waterfall with a divide and conquery strategy, benefit of better control, important functionality early, temptation todefer
      • sequence of mini waterfalls
      • break down into mini waterfalls to be pursued in parallel
      • do a waterfall up to design, then do iterative prototyping
    • spiral model : identify risk early, users involved, evoluation possible, maintenance included, complex, can spiral forever, use when medium size, robust implementation more important than functionality
      • objectives (performance, interfaces), alternatives (build, reuse), constraints (cost, schedule)
      • evaluate, identify, resolve risks (lack of experience)
      • develop and verify
      • plan next phase
    • prototyping model (evolutionary model) : encourage user pariticipation, resolves unclearness, innovation, flexibility, difficult to analyse and manage, use for high-pressure, short-lived, customer not very knowledgeable
  • agile methods : scrum, adaptive soft. dev., feature driven dev., crystal clear, extreme programming, rapid appliation dev., rational unify process
  • scrum
    • basic structure : sprint (1-4 weeks)
    • working product after each sprint
    • cross-functional teams of 5-10 people
    • meet daily
    • product owner : represents customer interests, maximizes buisness value
    • team : build the product, multiple domain, self-managing
    • scrum master : ensure team success (not manager) protect team from outside interference, helps resolve impediments
    • manager : not a nanny but a guru
    • product backlog : to-do list to prioritized by value to customer (contaisn features, dev. requirements, bugs, reserach) evolves over time, team provides estimates
    • sprint planning meeting : review of product backlog, team selects items to complete, go for the sprint
    • daily scrum : what was done (rapidly), no discussion, just rapporting, and what will be done until next scrum
    • if goal not met, learn from it
    • sprint review : demo product, everyone, ask questions, last as long as needed
    • sprint retrospective : updates product backlog, can be done by someone from outside, maintain the pace

Information security

  • confidentiality : not disclosed to unauthorized parties
  • integrity : data modifications must be detected
  • availability : information available when needed
  • authenticity : each party is who they claim to be
  • accountability : actions of a party can be traced uniquely to that party
  • security controls
    • administrative : policies, procedures (enforce logical and phyical)
    • logical : encryption, access control
    • physical : locks, separation of duties
  • challenges
    • threat : deliberate (cracker), accidental (failure), environmental (tornado)
    • vulnerability : buffer underflow
    • attack : assault, intelligent act to violate the security
    • anticipate abnormal behavior
  • attacks
    • interruption : availability (disrupting traffic)
    • interception : confidentiality (eavesdropping)
    • modification : integrity (corrupting)
    • fabrication : authenticity, accountability (forging data)
    • toolkit : disassemblers, debuggers, fuzzers, malware
  • risks
    • connectivity : globally accessible
    • extensibility : plugin architectures
    • complexcity : system sizes
  • security is not a feature but a property
    • lack of planning issue
    • cannot be added
  • pillars of soft. security
    • risk management : understand and quantify
    • soft. dev. for security : best practices
    • knowledge : prescriptive (principles), diagnostic (exploits), historical
  • buisness goals
    • high ROI (return on investments)
    • high SLA (service level agreement)
    • reducing dev. costs
    • rank goals according to priority
    • identify risks and threats : into context (likelihood, severity, impact, frequency, completeness)
  • monitor and update risks over time
    • validation process : penetration tests, security tests
    • measure progress
  • known problems
    • target priviledged level : root
    • argument injection : shell scripts
    • embedding scripts in scripts : mix languages
    • buffer overflows
    • attack configuration files
    • manipulate parsing
    • abusing input sanitization : cross-site scripting, use local filename
    • client-server : directly to the server, manipulate session ID, inject scripts into variables
    • race conditions
    • misuse of cryptograhpy
    • password management : unsalted hashes
  • best pratices
    • check code quality : broken error handling
    • encapsulation : restrict interactions
    • defensive programming : separate clean inside from dirty outside
    • white list, not black list
    • decouple user input from application logic
    • least privilege
    • redundant security controls
    • multiple layers : firewall, vpn, authentication, checksums, bometric access

Refactoring

  • improve code : prevent decay, clean up, simplify, increase readability
  • avoid future problems : find bugs, reduce debugging time
  • rethink, be creative : better way to solve a problem
  • code smell : indication for refactoring, bad practices
    • feature envy : using more from another class (move or extract)
    • shotgun surgery : making a change require many changes (move or extract)
  • workflow : save original, simplify and tests for correctness (chance of error is high for a few line), review changes
  • risks
    • introduce new bugs
    • expensive
    • easy to waste time (focus on the 20%)
    • refactor when : changes needed, add method of class, fix bug, smells
    • do not refactor when : close to deadline, API published, behind a clean interface, someone else is working with the code
  • composing fields, methods and classes
    • extract method : improves clarity
    • inline method
    • remplace temporary var by a method
    • subsitude algorithm
    • class with too much functionnality : divide
  • duplicated code
    • in same class or method : extract
    • in sibiling classes : extract and pull up
    • in different classes : extract
    • different algorhtm : subsitute algorithm and extract
  • inheritance
    • pull up fields
    • share same features : creates superclass
    • replace inheritance with delegration

Performance and optimization

  • latency : time from submission to receipt of first bit (threshold ~200ms latency)
    • throughput: # of ops per second
  • 80/20 rule : 20% of code use 80% of runtime
  • focus on observable performance : end user matterns
  • improve performance receipt
    • turn on compiler optimizations
    • try another compiler
    • adapt performance requirements
    • get faster hardware
    • change high-level design
    • tune the code : only if really needed
  • process
    • write functionnaly complete software including tests
    • save known-good version
    • profile code to find bottlenecks
  • premature optimization is the root of all evil (Donald Knuth)
  • measurements
    • profilers : instrumentation, dump, interpret
    • sampling-based profilers : minimize interference
    • duration : wall-clock time, CPU ticks, app ticks vs kernel ticks, synchronization time
    • page faults
    • cache misses
    • branch mispredictions
  • best pratices
    • make common case fast
    • keep infrequent case correct
  • low performance sources
    • unnecessary IO
    • cache miscomprehension : matrix multiplication
    • control flow
      • if : fastest test first and most likely to be true
      • prefer switch statements
      • loop early exit
      • loop unswitching : if inside loop
      • loop jamming : join similar loop
      • loop unrolling
      • place busiest loop inside : swap loop reduce condition checks
      • replace operation with cheaper ones
    • data accesses
      • caching lacks
      • memoization
      • precomputing results : constants

Automated testing

  • automation
    • blackbox testing : determine equivalence classes of inputs (fast, easy, manual configuration, redundance)
    • input generation : derived from specifications, probabilistics, corner cases, grammers
    • whitebox testing : determine equivalence classes of program paths (SMT solvers)
    • contracts : precondition and postcondition
  • static code analysis : slow, hard to scale
    • analyze without execution
    • look for generic bugs : injection, NPE
    • look for functional errors : contract violations
    • findbugs
      • difficult language features
      • misunderstood API, invariants
      • code modified during maintenance
      • typos, bad operators
      • hard part : dealing with reports
  • manage issues over time : prioritize, use filter, highlight new issues wiht bug trackers

Soft. as biological organisms

  • endoscopic soft. analysis

    • in vitro : mockup environement
    • in vivo : real environement
    • automatically prune paths that are out of interests
    • merge similar nodes to collaspe paths without loss of precision
  • immunity against bugs

    • autoimmune disease : deadlock
    • vaccination : faster immunity through collaboration
    • learn to avoid previously seen failures
  • diagnosis of failed soft.

    • speed up the debugging of complex soft.
  • soft. rating service : 100% objective

    • evaluate reliability automatically
    • publish results and create comparison
    • explan and quantify reliability to consumers
  • crowd-sourced dependability