Software Engineering Best Practices: A Comprehensive Guide

Ethics

What is human flourishing? According to Harvard's Human Flourishing Program, human flourishing is composed of five central domains: happiness & life satisfaction, mental & physical health, meaning & purpose, character & virtue, and close social relationships.

Why human flourishing? Universal Declaration of Human Rights: "All human beings are born free and equal in dignity and rights." Declaration of Independence: "We hold these truths to be self-evident…" Also: internal compass, faith.

Algorithmic bias: Algorithms affect where we go to school, access to money, access to health care, receiving parole, the possibility of bail, and risk scores.

Therac-25: A race-condition bug in the software led to at least 6 deaths. Traced to: lack of bug reporting, lack of proper due diligence, and overconfident engineers who removed the hardware interlocks; a race window of 8 seconds could trigger the failure.

Code of ethics: Research shows that codes of ethics do not appear to affect the decisions made by software developers.

Does my software respect the humanity of the users? Six human sensitivities: emotional, attention, sense making, decision making, social reasoning, and group dynamics. User-centered design tries to optimize the product around how users can, want, or need to use the product, rather than forcing the users to change their behavior to accommodate the product.

Does my software amplify positive or negative behavior for users and society at large? Anil Dash on how to prevent abuse: have real humans dedicated to monitoring and responding to your community; have community policies about what is and isn't acceptable behavior; give your site accountable identities; have the technology to easily identify and stop bad behaviors; and make a budget that supports having a good community, or find another line of work.

Will my software's quality impact the humanity of others? Engineering ethics: Ethics applies and is formalized in many professional fields: medical, legal, business, and engineering. The first codes of engineering ethics were formally adopted by American engineering societies in 1912-1914. In 1946 the National Society of Professional Engineers (NSPE) adopted its first formal Canons of Ethics. Professional ethics encompass the personal and corporate standards of behavior expected of professionals. The first three "professions" were divinity, law, and medicine.

Legal malpractice: Not every mistake is legal malpractice. For malpractice to exist, an attorney must handle a case inappropriately, due to negligence or with intent to harm, and thereby cause damages to a client.

Malpractice vs. negligence: Negligence is a failure to exercise the care that a reasonably prudent person would exercise in like circumstances. Malpractice is a type of negligence, often called "professional negligence." It occurs when a licensed professional (such as a doctor, lawyer, or accountant) fails to provide services at the level required by the governing body (the "standard of care"), subsequently causing harm to the plaintiff.

CI/Deployment

CI helps us catch errors before others see them. Agile values fast, high-quality feedback loops: faster feedback means a lower cost to fix bugs. CI is triggered by commits, pull requests, and other actions.

Attributes of effective CI processes:
• Policies: do not allow builds to remain broken for a long time; CI should run for every change; CI should not completely replace pre-commit testing.
• Infrastructure: CI should be fast, providing feedback within minutes or hours, and repeatable (deterministic).

Effective CI processes run often enough to reduce debugging effort: a failed CI run indicates a bug was introduced, and caught, in that run. More changes per CI run mean more manual debugging effort to assign blame; a single change per CI run pinpoints the culprit.

Effective CI processes allocate enough resources to mitigate flaky tests: flaky tests may be timing-dependent (failing due to timeouts), and running tests without enough CPU/RAM can increase flaky failure rates and produce unreliable builds. (A small retry sketch appears at the end of this section.)

CI in practice at Google: a large-scale example is Google TAP, with 50,000 unique changes per day and 4 billion test cases per day. Pre-submit optimization: run fast tests for each individual change (before code review) and block the merge if they fail. Then run all affected tests; a "build cop" monitors integration test runs and acts immediately to roll back or fix. Average wait time to submit a change: 11 minutes.

How can we continuously update our software in production? Cloud computing enables CD. Many apps rely on common infrastructure:
• Content delivery network: caches static content "at the edge" (e.g., Cloudflare, Akamai)
• Web servers: speak HTTP, serve static content, load-balance between app servers (e.g., HAProxy, Traefik)
• App servers: run our application (e.g., Node.js)
• Misc services: logging, monitoring, firewall
• Database servers: persistent data

What is the infrastructure that needs to be shared? Our apps run on a "tall stack" of dependencies. Traditionally this full stack is self-managed; cloud providers offer products that manage parts of that stack for us: "Infrastructure as a Service", "Platform as a Service", "Software as a Service".

Multi-tenancy creates economies of scale:
• At the physical level: multiple customers' physical machines in the same data center save on physical costs (centralized power, cooling, security, maintenance).
• At the physical-server level: multiple customers' virtual machines in the same physical machine save on resource costs (utilizing marginal computing capacity: CPUs, RAM, disk).
• At the application level: multiple customers' applications hosted in the same virtual machine save on resource overhead (eliminating redundant infrastructure like the OS).
"Cloud" is the natural expansion of multi-tenancy at all levels.

Cloud infrastructure scales elastically. "Traditional" computing infrastructure requires capital investment: "scaling up" means buying more hardware or maintaining excess capacity for when scale is needed, and "scaling down" means selling hardware or powering it off. Cloud computing scales elastically: "scaling up" means allocating more shared resources, "scaling down" means releasing resources into a pool, and billing is based on consumption (usually per second, per minute, or per hour).
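A minimal sketch of the flaky-test mitigation mentioned above: re-run a timing-dependent test a few times before declaring the build broken. This is my own illustration, not something prescribed by the notes; the decorator and the test are hypothetical, written for pytest-style test functions.

```python
import functools
import time

def retry_flaky(attempts: int = 3, backoff_seconds: float = 1.0):
    """Re-run a test that may fail intermittently (e.g., due to timeouts).

    A pass on retry suggests flakiness rather than a real regression, so CI
    can flag the test as flaky instead of failing the whole build.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    if attempt == attempts:
                        raise  # failed every time: treat as a genuine bug
                    time.sleep(backoff_seconds * attempt)  # give timing issues room
        return wrapper
    return decorator

@retry_flaky(attempts=3)
def test_server_responds_within_timeout():
    ...  # a timing-sensitive check would go here
```

Retries trade CI time for fewer false alarms; teams still need to track tests that retry often and eventually fix them.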
Cloud services give on-demand access to infrastructure, "as a service": the vendor provides a service catalog of "X as a service" abstractions, an API allows us to provision resources on demand, and responsibility for managing the underlying infrastructure transfers to the vendor.

Infrastructure as a Service: virtual machines
• Virtualize a single large server into many smaller machines.
• Separate administration responsibilities for the physical machine vs. the virtual machines.
• The OS limits resource usage and guarantees quality per VM.
• Each VM runs its own OS.
• Examples: cloud (Amazon EC2, Google Compute Engine, Azure) and on-premises (VMware, Proxmox).

The operating system allows several apps to share the underlying hardware; a virtual machine allows the hardware itself to be shared.

Virtual machines facilitate multi-tenancy:
• Multiple customers share the same physical machine, oblivious to each other.
• The application is decoupled from the hardware: the virtualization service can provide "live migration" transparent to the operating system, maximizing utilization.
• VMs are faster to provision and release: roughly minutes for a VM vs. hours for a physical machine.

Virtual machines to containers: each VM contains a full operating system. What if each application could run in the same (overall) operating system? Why have multiple copies? Advantages of smaller apps: they are faster to copy (and hence provision) and consume less storage (base OS images are usually 3-10 GB).

Continuous delivery: "faster is safer." Key values of continuous delivery: release frequently, in small batches; maintain key performance indicators to evaluate the impact of updates; phase roll-outs; evaluate the business impact of new features.

Continuous delivery != immediate delivery. Even if you are deploying every day ("continuously"), you still have some latency: a new feature I develop today won't be released today. But a new feature I develop today can begin the release pipeline today, which minimizes risk. Release engineer: the gatekeeper who decides when something is ready to go out and oversees the actual deployment process.

Split deployments mitigate risk. Idea: deploy to a complete production-like environment, but don't have users use it; collect preliminary feedback. A problem in staging is far lower risk than a problem in production. (A sketch of a canary health check follows this section.)

Test-driven development vs. continuous delivery:
• Test-driven development: write and maintain tests per feature; unit tests help locate bugs at the unit level; integration/system tests are also needed to locate interaction-related faults.
• Continuous delivery: write and maintain high-level observability metrics; deploy features one at a time and look for canaries in the metrics; write fewer integration/system tests.
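A minimal sketch of the kind of canary check implied above: compare an error-rate metric from the newly deployed instances against the stable fleet before promoting the release. The function, threshold, and sample numbers are hypothetical.

```python
import statistics

def canary_is_healthy(canary_error_rates, baseline_error_rates,
                      max_ratio: float = 1.2) -> bool:
    """Return True if the canary's mean error rate stays within `max_ratio`
    of the stable fleet's; otherwise the rollout should be rolled back."""
    canary = statistics.mean(canary_error_rates)
    baseline = statistics.mean(baseline_error_rates)
    return canary <= baseline * max_ratio

# Hypothetical 5-minute error-rate samples pulled from monitoring:
baseline_samples = [0.010, 0.012, 0.011]
canary_samples = [0.011, 0.013, 0.012]
print("promote" if canary_is_healthy(canary_samples, baseline_samples)
      else "roll back")
```

In practice the release engineer (or an automated gate) runs a check like this per phase of the roll-out, widening exposure only while the metrics stay healthy.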

LLM

Large language models. Language modeling: measure the probability of a sequence of words. Input: a text sequence. Output: the most likely next word.

LLMs are... large. GPT-3 has 175B parameters; GPT-4 is estimated to have ~1.24 trillion; Google Gemini is rumored to have ~1.5 trillion. They are pre-trained with up to a petabyte of Internet text data, at massive financial and environmental cost. Access is through API calls: OpenAI, Google Vertex AI, Anthropic, Hugging Face.

LLMs are far from perfect:
• Hallucinations: factually incorrect output.
• High latency: output words are generated one at a time, and larger models also tend to be slower.
• Output format: it is hard to structure the output (e.g., extracting a date from text).

Consider alternative solutions, error probability, risk tolerance, and risk mitigation strategies:
• Alternative solutions: are there alternative solutions to your task that deterministically yield better results? E.g., type checking Java code.
• Error probability: how often do we expect the LLM to correctly solve an instance of your problem? This will change over time. E.g., grading mathematical proofs.
• Risk tolerance: what is the cost associated with making a mistake? E.g., answering emergency medical questions.
• Risk mitigation strategies: are there ways to verify outputs and/or minimize the cost of errors? E.g., unit test generation.

Basic LLM integration parameters:
• Model: gpt-3.5-turbo, gpt-4, claude-2, etc.; different performance, latency, and pricing.
• Temperature: controls the randomness of the output; lower is more deterministic, higher is more diverse.
• Token limit: controls the token length of the output.
• Top-K, Top-P: control which words the LLM considers (API-dependent).

Techniques to improve performance:
• Prompt engineering: rewording text prompts to achieve the desired output; low-hanging fruit for improving LLM performance. Popular prompt styles: zero-shot (instruction + no examples) and few-shot (instruction + examples of desired input-output pairs).
• Chain-of-thought prompting: a few-shot prompting strategy in which example responses include reasoning; useful for solving more complex word problems [arXiv]. Example: Q: A person is traveling at 20 km/hr and reached his destiny in 2.5 hr then find the distance? Answer choices: (a) 53 km (b) 55 km (c) 52 km (d) 60 km (e) 50 km. A: The distance that the person traveled would have been 20 km/hr * 2.5 hrs = 50 km. The answer is (e).
• Fine-tuning: retrain part of the LLM with your own data; create a dataset specific to your task with input-output examples (>= 100); quality over quantity!
• RAG (retrieval-augmented generation): used when you want LLMs to interact with a large knowledge base (e.g., a codebase or company documents).
• Pipelines: break a large task into smaller sub-tasks, use LLMs to solve the subtasks, and build a function/microservice for each one. Pros: useful for multi-step tasks, with maximum control over each step. Challenges: standardizing LLM output formats (e.g., JSON) and implementing multiple services and LLM calls.

Estimating operational costs: most LLMs charge based on prompt length. Use these prices together with assumptions about usage of your application to estimate operating costs. Some companies (like OpenAI) quote prices in terms of tokens: chunks of words that the model operates on.

Understanding and optimizing latency/speed: making inferences using LLMs can be slow. Strategies to improve performance: caching (store LLM input/output pairs for future use) and streaming responses (supported by most LLM API providers; streaming the response line by line gives a better UX). A sketch combining several of these ideas follows.
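A minimal sketch of the integration ideas above (few-shot prompting, a low temperature, a token limit, and a cache), assuming the OpenAI Python client (openai >= 1.0, with an OPENAI_API_KEY in the environment). The task, model choice, and few-shot examples are illustrative, not from the notes.

```python
from openai import OpenAI  # third-party: pip install openai

client = OpenAI()

# Few-shot: instruction plus example input/output pairs (hypothetical task).
FEW_SHOT = [
    {"role": "user", "content": "Extract the date: 'Meet me on March 3, 2024.'"},
    {"role": "assistant", "content": "2024-03-03"},
]

_cache: dict[str, str] = {}  # cache input/output pairs to cut cost and latency

def extract_date(text: str, model: str = "gpt-3.5-turbo") -> str:
    if text in _cache:
        return _cache[text]
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,   # low temperature: more deterministic output
        max_tokens=20,     # token limit bounds output length (and cost)
        messages=[
            {"role": "system",
             "content": "Reply with only a date in YYYY-MM-DD format."},
            *FEW_SHOT,
            {"role": "user", "content": f"Extract the date: '{text}'"},
        ],
    )
    answer = response.choices[0].message.content.strip()
    _cache[text] = answer
    return answer
```

Swapping the model name changes the performance/latency/pricing trade-off; passing stream=True to the same call would enable the streaming-response strategy mentioned above.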
Reinforcement learning from human feedback: use user feedback and interactions to improve the performance of your LLM application. This is the basis for the success of ChatGPT.

LLM as Tools

The stochastic parrot: LLMs still struggle with large context windows and detail-oriented tasks, even with the many techniques for improving performance in those areas. The "ChatGPT lawyer": a lawyer submitted a legal brief largely created by ChatGPT, with responses described as filled with "bogus judicial decisions, bogus quotes, and bogus internal citations." This is one example of the larger problem of hallucinations in detail-oriented tasks. Developers are using LLMs as part of their workflow to generate code. They can do a lot, but not everything.

Review-question notes:

LLM: 1. Prompt engineering affects quality through the way the question is written: list the products, sizes, and prices; give an example of how to calculate prices; then make the request to the LLM. 2. Privacy issue with sending an address: ask only for the zip-code area, i.e., anonymize the data before it reaches the LLM. 3. Relevance, personalization, variation, consistency.

LLMs as tools: 1. Complexity and volume: LLMs have processing limitations for large inputs and lack specificity. 2. Generating personalized marketing content for users based on order history: data-driven, scalable, efficient. 3. Provide context, give iterative feedback, divide and conquer.

Ethics: 1. Privacy and data security, accessibility and inclusivity, user-centric design. 2. Community building, sustainability. 3. Reliability, accuracy, safety, social interaction.

Software quality: 1. Boundary analysis (edge cases), equivalence classes (formatting, out of bounds), testing all combinations. 2. Technical-debt quadrants: deliberate-prudent (chose the simpler way, at the cost of scalability), reckless-inadvertent (skips testing), reckless-deliberate (no security), prudent-inadvertent (bad design). 3. CI pipeline: automates testing, early bug detection, improved developer productivity, reliable releases.

Security & privacy: 1. Authentication (phishing), authorization (privilege-escalation attacks), confidentiality (person-in-the-middle). 2. Data entry, storage. 3. Data leak, data deletion.


Software Quality

Internal quality: Is the code well structured? Is the code understandable? How well is it documented? External quality: Does the software crash? Does it meet the requirements? Is the UI well designed?

Failure: the manifested inability of a system to perform a required function. Defect (fault): missing or incorrect code. Error (mistake): the human action that produces a fault. Testing: an attempt to trigger failures. Debugging: an attempt to find faults given a failure.

Principles of testing:
1. Avoid the absence-of-defects fallacy. Testing shows the presence of defects; it does not show their absence. "No test team can achieve 100% defect detection effectiveness."
2. Exhaustive testing is impossible. Consider a simple function with one string input of at most 26 lowercase characters plus symbols (@, ., _, -), and assume we can run 10^21 tests per second (one zettaFLOPS); see the arithmetic after this section.
3. Start testing early: to let tests guide design, to get feedback as early as possible, and to find bugs when they are cheapest to fix and have caused the least damage.
4. Defects are usually clustered: "hot" components requiring frequent change, bad habits, poor developers, tricky logic, business uncertainty, innovation, size, ... Use this as a heuristic to focus test effort.
5. The pesticide paradox: "Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual." Re-running the same test suite again and again on a changing program gives a false sense of security; vary your testing.
6. Testing is context-dependent.
7. Verification is not validation. Verification: does the software system meet the requirements specifications? Are we building the software right? Validation: does the software system meet the user's real needs? Are we building the right software?

Test design techniques:
• Opportunistic/exploratory testing: add some unit tests, without much planning.
• Specification-based testing ("black box"): derive test cases from specifications, via boundary value analysis, equivalence classes, combinatorial testing, and random testing.
• Structural testing ("white box"): derive test cases to cover implementation paths, e.g., line coverage and branch coverage.

Specification testing: tests are based on the specification. Advantages: it avoids implementation bias; tests are robust to changes in the implementation; tests don't require familiarity with the code; tests can be developed before the implementation.

What about exhaustive testing? Idea: try all values! age: int (2-117) years; datetime: DateTime (hh:mm + M/D/Y); rideTime: int (in minutes, 1-2 hours); is_public_holiday: bool (2 values). Exhaustive testing is usually impractical, even for trivially small problems. The key problem is choosing a test suite that is small enough to finish in a useful amount of time yet large enough to provide a useful amount of validation. The alternative: heuristics.
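A back-of-the-envelope calculation (my own arithmetic, following the numbers above) of why exhaustive testing is impossible even for that one-string function: 26 lowercase letters plus 4 symbols gives a 30-character alphabet, with strings of up to 26 characters.

```python
# 26 lowercase letters + 4 symbols (@ . _ -) = 30 possible characters.
ALPHABET = 30
MAX_LEN = 26

total_inputs = sum(ALPHABET**k for k in range(1, MAX_LEN + 1))  # all strings up to length 26
tests_per_second = 10**21  # the assumed zettaFLOPS test rig
seconds = total_inputs / tests_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{total_inputs:.2e} inputs -> {years:.2e} years")
# ~2.6e38 inputs -> ~8e9 years of nonstop testing, for one tiny function.
```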
Equivalence partitioning: identify sets of inputs with the same behavior (equivalence classes) and try one input from each set. Equivalence classes are derived from specifications (e.g., cases, input ranges, error conditions, fault models), which requires domain knowledge. (A test sketch follows this section.)

The category-partition method: identify the parameters and the domains of each parameter, both from the specs and beyond the specs; add constraints to minimize the set (remove invalid combinations, reduce the number of exceptional behaviors); then generate combinations.

Boundary-value analysis. Key insight: errors often occur at the boundaries of a variable's value.

Pairwise testing. Key insight: some problems only occur as the result of an interaction between parameters/components. Examples of interactions: the bug occurs for senior citizens traveling on weekends (pairwise interaction); the bug occurs for senior citizens traveling on weekends during peak hours (3-way interaction); the bug occurs for adults traveling long trips during public holidays that are weekends (4-way interaction).

Technical debt: internal quality makes it easier to add features. Technical debt != bad internal quality: "In software-intensive systems, technical debt consists of design or implementation constructs that are expedient in the short term but that set up a technical context that can make a future change more costly or impossible." "Technical debt is a contingent liability whose impact is limited to internal system qualities – primarily, but not only, maintainability and evolvability." High internal quality is an investment.

What actions cause technical debt? Tightly coupled components, poorly specified requirements, business pressure, lack of process, lack of documentation, lack of automated testing, lack of knowledge, lack of ownership, delayed refactoring, and multiple long-lived development branches.

Managing technical debt: organizations need to address the following challenges continuously: 1. recognizing technical debt; 2. making technical debt visible; 3. deciding when and how to resolve debt; 4. living with technical debt.

Common anti-patterns: not having a QA process (or having one that no one follows); bad version control practices; slow and encumbering QA processes; reliance on repetitive manual labor, which focuses on superficial problems rather than structural ones, yields varying results (e.g., manual testing), and invites mistakes.
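A minimal sketch of equivalence partitioning plus boundary-value analysis, written as pytest-style parameterized tests. The ticket-price function is hypothetical (the 2-117 age range echoes the spec example above; the classes and prices are made up for illustration).

```python
import pytest  # third-party: pip install pytest

def ticket_price(age: int) -> float:
    """Hypothetical system under test: child (2-12), adult (13-64), senior (65-117)."""
    if not 2 <= age <= 117:
        raise ValueError("age out of range")
    if age <= 12:
        return 5.0
    if age <= 64:
        return 10.0
    return 7.0

# Equivalence partitioning: one representative input per class.
@pytest.mark.parametrize("age,expected", [(8, 5.0), (30, 10.0), (70, 7.0)])
def test_equivalence_classes(age, expected):
    assert ticket_price(age) == expected

# Boundary-value analysis: values at each class boundary, where bugs cluster.
@pytest.mark.parametrize("age,expected", [(2, 5.0), (12, 5.0), (13, 10.0),
                                          (64, 10.0), (65, 7.0), (117, 7.0)])
def test_boundaries(age, expected):
    assert ticket_price(age) == expected

# The error-condition class just beyond the valid range.
@pytest.mark.parametrize("age", [1, 118])
def test_invalid_boundaries(age):
    with pytest.raises(ValueError):
        ticket_price(age)
```

Nine targeted tests cover what exhaustive testing of 116 valid ages (and all invalid ones) would: one per class, plus every boundary.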

Open Source

OSS: source code availability; the right to modify and create derivative works; (often) the right to redistribute derivative works.

Proprietary software is a black box: the intention is for it to be used, not examined, inspected, or modified. There is no source code; you only download a binary (e.g., an app) or use it via the internet (e.g., a web service). It often comes with an End User License Agreement (EULA) governing rights and liabilities, and EULAs may specifically prohibit attempts to understand application internals.

Why go open source (vs. proprietary)? Advantages:
• Transparency: gain user trust.
• Many eyes: crowd-source bug reports and fixes.
• Security: vulnerabilities are more likely to be quickly identified.
• Community and adoption: get others to contribute features, build things around you, or fork your project.
Disadvantages:
• Reveals implementation secrets.
• Many eyes: users can find faults more easily.
• Security: others are more likely to find vulnerabilities first.
• Control: you may not be able to influence the long-term direction of your platform.

OSS has many stakeholders:
• Core members: often (but not always) include the original creators; have direct push access to the main repository; may be further split into admin and developer roles.
• External contributors: file bug reports and report other issues; contribute code and documentation via pull requests.
• Other supporters: beta testers (users), sponsors (financial or platform), steering committees or public commenters (for standards and RFCs).
• Spin-offs: maintainers of forks of the original repository.

Common requirements for contributions: coding style (recall: linters) and passing static checks; inclusion of test cases with new code; a minimum number of code reviews from core devs; standards for documentation; contributor licensing agreements (more on that later).

Use of open source software within companies:
• Is the license compatible with our intended use? (More on this later.)
• How will we handle versioning and updates? Does every internal project declare its own versioned dependency, or do we all agree on one fixed (e.g., latest) version? This is sometimes resolved by assigning internal "owners" of a third-party dependency, who are responsible for testing updates and declaring allowable versions.
• How do we handle customization of the OSS software? Internal forks are useful but hard to sync with upstream changes. One option: assign an internal owner who keeps the internal fork up to date with upstream. Another option: contribute all customizations back upstream to maintain clean dependencies.
• Security risks? Supply chain attacks are on the rise.

MIT License: a simple, commercial-friendly license; you must retain the copyright credit; the software is provided as is; the authors are not liable for the software; no other restrictions. Apache License: similar to the MIT license; not copyleft; not required to distribute source code; does not grant permission to use the project's trademark; does not require modifications to use the same license. BSD License: no liability, provided as is; the copyright statement must be included in source and binary forms; the copyright holder does not endorse any extensions without explicit written consent.

Security and Privacy
Attacking the network. Examples: person-in-the-middle attacks, sniffing, spoofing. We must assume the network is not secure and guard against a compromised network.

Person-in-the-middle: someone who can intercept network traffic can read the messages (coming and going) and can change the messages before sending them on (to the correct or an incorrect destination).

Sniffing, eavesdropping, etc.: you can listen to the traffic going by on the net. This is typically traffic on your subnet; still, it can be most interesting if you can plug in to the backbone.

Spoofing: pretending to be someone you're not. IP spoofing: pretending to be a "client" you're not (with a specific IP address). E-mail spoofing. DNS spoofing: pretending to be a server that you're not, by fooling a DNS server into giving out incorrect IP addresses for DNS names.

The "big three" concepts in network security:
• Authentication: knowing with whom you are communicating; the user knowing the server and/or the server knowing the user.
• Authorization: the user having the privilege to perform an operation on the server.
• Confidentiality: communicating without others knowing what was said, with intermediaries unable to change what was said; typically includes protection from replay attacks. (It typically does *not* provide secrecy of communication: others can know that communication occurred.)
Mapping attacks to concepts: sniffing threatens confidentiality; spoofing threatens authentication; a person-in-the-middle attack threatens both authentication and confidentiality.

A hash function is a one-way encoding of data: the same input yields the same output, and a different output implies a different input.

Secret-key cryptography: like in the old movies and spy books. One key (K), a shared secret, is used both to encrypt and to decrypt. Notation: {data}K.

Public-key cryptography: a key pair (key 1 & key 2). Either key can be used to encrypt, and you can only decrypt using the other key. One key is given out (the public key); the other is kept secret (the private key). Notation: for entity X, we have keys Xpub and Xpriv. A public key can be given out freely, to encrypt data sent to the holder (X) of the private key. Notation: {data}Xpub. (A short code sketch follows.)
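A minimal sketch of two of the ideas above, using Python's standard hashlib and, for the shared-key part, the third-party cryptography package. This is illustrative, not a hardened design; Fernet is simply one well-known secret-key scheme standing in for the {data}K notation.

```python
import hashlib
from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Hash function: a one-way encoding -- same input, same output.
h1 = hashlib.sha256(b"transfer $100").hexdigest()
h2 = hashlib.sha256(b"transfer $100").hexdigest()
h3 = hashlib.sha256(b"transfer $900").hexdigest()
assert h1 == h2  # deterministic
assert h1 != h3  # any change to the input changes the digest

# Secret-key cryptography: one shared key K both encrypts and decrypts ({data}K).
key = Fernet.generate_key()  # the shared secret K
ciphertext = Fernet(key).encrypt(b"attack at dawn")
assert Fernet(key).decrypt(ciphertext) == b"attack at dawn"
print("digest:", h1[:16], "... ciphertext bytes:", len(ciphertext))
```

The hash's tamper-evidence is what lets a recipient detect a person-in-the-middle who alters a message whose digest was published separately; the shared key keeps a sniffer from reading the traffic at all.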
Patents and Dependencies
"A patent is an exclusive right granted for an invention, which is a product or a process that provides, in general, a new way of doing something, or offers a new technical solution to a problem. To get a patent, technical information about the invention must be disclosed to the public in a patent application."

What rights do patents grant? Patents don't give you the right to make, use, or sell an invention. They do give you the right to exclude others from making, using, and selling an invention for the term of the patent (20 years): you can stop or sue others, and grant licenses for royalties.

Patents vs. copyright: copyrights cover the details of expression of a work and don't cover any ideas; patents cover only ideas and the use of ideas. Copyrights happen automatically; patents are issued by a patent office in response to an application.

Why do patents exist? To encourage disclosure of inventions, reward invention and creativity, protect investment of capital into R&D, encourage the market to "design around", and protect small companies from large ones.

Problem: only large organizations benefit. The patent system relies on people to challenge bad patents, which requires considerable time, money, and legal expertise, and the US legal system requires both parties to pay their own legal fees (cf. "loser pays costs" in Europe). US software patents cost between $15,000 and $45,000, before you even apply for international patents.

What is a dependency? Dependencies are the core of what most build systems do; "compile" and "run tests" is just a fraction of their job. Examples: Maven, Gradle, NPM, Bazel. Foo->Bar means: to build Foo, you may need a built version of Bar. Dependency scopes:
• Compile: Foo uses classes, functions, etc. defined by Bar.
• Runtime: Foo uses an abstract API whose implementation is provided by Bar (e.g., logging, database, network, or other I/O).
• Test: Foo needs Bar only for tests (e.g., JUnit, mocks).
Internal vs. external dependencies: is Bar also built/maintained by your org, or is it pulled from elsewhere using a package manager? Transitive dependencies: packages can depend on other packages.

Resolutions to the diamond problem:
1. Duplicate it. This doesn't work with static linking (e.g., C/C++) but may be doable in Java (e.g., using ClassLoader hacking or package renaming); values of types defined by duplicated libraries cannot be exchanged across the copies.
2. Ban transitive dependencies; use a global list with one version for each dependency. Challenges: keeping things in sync with the latest versions, and deciding which version of a transitive dependency to keep.
3. Newest version (keep everything at latest). Requires ordering semantics; an intermediate dependency may break when a transitive dependency updates.
4. Oldest version (lowest common denominator). Also requires ordering semantics; sacrifices new functionality.
5. Oldest non-breaking version / newest non-breaking version. Requires faith in tests or in the semantic versioning contract; see the sketch at the end of this section.

Semantic versioning contracts largely trust developers to maintain them; constrained/range dependencies can cause unexpected build failures, and automatic validation of SemVer is hard.

Cyclic dependencies: a very bad thing; avoid at all costs, though they are sometimes unavoidable or intentional.
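A minimal sketch (my own simplification; real resolvers such as NPM's are far more elaborate) of the SemVer compatibility rule that "newest non-breaking version" resolution leans on: with MAJOR.MINOR.PATCH versions, only a MAJOR bump is allowed to break the API.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into comparable integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def compatible(installed: str, required: str) -> bool:
    """True if `installed` can stand in for `required` under SemVer:
    same MAJOR version and at least as new overall."""
    return (parse(installed)[0] == parse(required)[0]
            and parse(installed) >= parse(required))

assert compatible("1.4.2", "1.2.0")      # newer minor: additive, OK
assert not compatible("2.0.0", "1.2.0")  # major bump: may break callers
assert not compatible("1.1.9", "1.2.0")  # too old: missing required features
# The rule is only as trustworthy as the maintainers who follow it, which is
# why the notes above say automatic validation of SemVer is hard.
```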