A curated list of awesome and useful posts, videos, and articles on leading a data team. This includes leadership at the middle-management, Director/VP, or C-suite level, for organizations both big and small. A few relevant engineering management articles are sprinkled in.
- Hiring (15)
- Culture (15)
- Impact (6)
- Strategy (9)
- Diversity Equity and Inclusion (3)
- Project Management (8)
- Code Review (3)
- Organization Structure and Job Titles (18)
- ML and AI Within an Organization (11)
- BI and Analytics Within an Organization (17)
- Management Skills (6)
- Data Platforms (10)
- Data Governance (4)
Author | Title | One-sentence summary | Year |
---|---|---|---|
Eli Goldberg | Hire better data scientists: A field guide for hiring managers new to data science. Part 1. Creating better job descriptions brings in better talent. | When hiring, highlight the "why you", describe opportunities instead of responsibilities, describe the key actions and background experience needed rather than technologies, and proofread! | 2020 |
Eli Goldberg | Hire better data scientists: A field guide for hiring managers new to data science. Part 2. Create a clear interviewing process. | Make time for hiring and use your shift in priorities to your advantage, don't "wing it", write your process down and engineer it to be data driven, and modify the process, not your adherence to it. | 2020 |
Gergely Orosz | Hiring (and Retaining) a Diverse Engineering Team | Stories from six engineering leaders who succeeded in building and growing diverse teams | 2021 |
Reddit | “Are we being too harsh on junior candidates?” | Reddit thread discussing expectations of junior ML job candidates. | 2022 |
Hacker News | “When did 7 interviews become normal” | An “Ask HN” forum thread on the topic of over-interviewing. | 2022 |
Farhan Thawar | VP of Engineering hiring cheatsheet | A guide for assessing a candidate for an engineering or data leadership role, with examples of good and bad responses to questions. | 2022 |
Freaking Rectangle Blog | How to Freaking Find Great Developers By Having Them Read Code | Argues that when hiring for data engineering, analytics, data science, or ML engineering roles, it is better to have candidates read code than write it (the code can be neutral, interview-only code). | 2022 |
Emily Thompson | Hiring Data Scientists With Intention | Gives guidance on writing a focused job description, sourcing strategically, and designing a structured interview process so that you can evaluate candidates consistently. | 2022 |
Nate Rosidi | 15 Python Coding Interview Questions You Must Know For Data Science | Provides 15 example questions for testing basic Python data manipulation skills in interviews. | 2022 |
Jike Chong, Ben Lorica, Yue Cathy Chang | Top Places to Work for Data Scientists: We identify U.S. organizations that will help you develop your career in data science | Looks at the factors that make a data science org attractive to an individual contributor (IC), which also gives hiring managers insight into what talent is looking for. | 2022 |
Randy Au | Let's talk a bit about giving interviews | Gives thoughts on planning and carrying out a technical data science interview. | 2022 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapters 2 and 3 | "How to Win Friends and Recruit Data Scientists" and "Interview with the Data Scientist" have tips on recruiting and interviewing. | 2019 |
Dip Ranjan Chatterjee | The Data Science Interview Book | A comprehensive set of topics for interviewing data science candidates (spanning statistics, ML, NLP, etc.). | 2022 |
Tristan Handy | When to hire a data engineer? | Argues that data analysts and scientists increasingly build ETL pipelines themselves (with the help of Stitch, Fivetran, dbt, etc.), but that data engineers are still essential for managing core data infrastructure, building and maintaining custom ingestion pipelines, supporting the data team with design and performance optimization, and building non-SQL transformation pipelines. | 2022 |
Jacob Kaplan-Moss | My questions for prospective employers (Director/VP roles) | This post discusses the other side of the hiring table and gives great questions that a candidate for a Director or VP-level engineering leadership role should ask (it can also help a hiring team think through the scope of a Director or VP-level role). | 2019 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Emily Thompson | Growing Data Teams from Reactive to Influential | Reactive data teams lead to low impact and attrition, so acknowledge whether your team is reactive, assess reactivity quantitatively, focus on near-term wins for cultural change, and build longer-term foundational work into the team’s capacity. | 2022 |
Prukalpa Sankar | It’s Time for the Modern Data Culture Stack | We need a modern data culture stack: best practices, values, and cultural rituals that will help data people come together and collaborate effectively. | 2021 |
Kuba Niechcial | How to set goals for engineers? | Provides examples of good personal goals for engineers and things to keep in mind (e.g. KPIs should not be personal goals). | 2021 |
Jacob Kaplan-Moss | “Exit Interviews Are a Trap” | Rethinking the exit interview: there is very little upside (things are unlikely to change) and potentially significant downside (bad blood, retracted references, malicious actions by the employer, etc.). | 2022 |
Christoph Neijenhuis | How to stop shrinkage in engineering teams | The journey to stopping shrinkage in engineering teams is long and rarely straightforward, but there are practical things leaders can do to take control of the chaos, from taking steps to get out of survival mode and tackling problems around culture to involving teams in the development of a solid technical strategy. | 2022 |
Caitlin Moorman | Proficiency v. Creativity | It is critical to find a balance between open-endedness/opportunities for creativity and standardized rigor when leading a data function. | 2020 |
Shimin Zhang | Why a Meeting Costs More than a MacBook Pro – the Business Case for Fewer Developers in Meetings | Describes the opportunity cost of having all developers or data engineers attending meetings and describes ways to recoup this. | 2022 |
David Waller | 10 Steps to Creating a Data-Driven Culture | Details some steps for working towards a data-driven culture, from taking care in choosing metrics to quantifying uncertainty. | 2020 |
Michael Kaminsky | A Culture of Partnership | Building a culture of partnership on your analytics team is crucial to maximizing the impact your team can have. | 2019 |
Benn Stancil | Do data-driven companies actually win? | Uses a handful of hypothetical fashion companies to examine how much a data-driven culture actually contributes to a company's success. | 2022 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 4 | "Fear and Loathing in Data Science" offers concrete tips on culture that help to retain your best people. | 2019 |
Benjamin Rogojan | Onboarding For Data Teams | The costs of poor onboarding (both opportunity costs and retention problems) are high; to help, the author writes about 'Onboarding For Context', 'Environment Set-Up', and the concept of 'Commit Something Day One'. | 2022 |
Prukalpa Sankar | The “knowledge-creating” company, a big announcement and other takeaways from dbt Coalesce | Prukalpa shares thoughts on this great quote from an early-1990s HBR article: "...markets shift, technologies proliferate, competitors multiply, and products become obsolete almost overnight, successful companies are those that consistently create new knowledge, disseminate it widely throughout the organization, and quickly embody it in new technologies and products; these activities define the ‘knowledge-creating’ company, whose sole business is continuous innovation." | 2022 |
Christine Garcia | The secrets of a modern data leader: The first 365 days inside a data team | Fantastic video covering how leaders should nurture their data teams, build the right team values, establish governance inside the team, create cadences and rituals, etc. | 2022 |
Claire Carroll | Data education is broken | The post explores the disconnect between data education and real data practice in industry (e.g. analyzing static flat files in R, Pandas, or SPSS compared with using SQL along with tools like git, dbt, Airflow, VSCode, etc), why this occurs, and the effects it has on the data industry. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
McKinsey | Ten red flags signaling your analytics program will fail. | A list of red flags ranging from the executive team not having a clear vision for its analytics program to nobody knowing the quantitative impact that analytics is providing. | 2018 |
Erik Bernhardsson | Building a data team at a mid-stage startup: a short story | A story about a fictional company that became more data-driven and how it was done. | 2021 |
Abinaya Sundarraj | Data Management: How to Stay on Top of Your Customer’s Mind? | Describes the virtues and challenges of achieving a customer-centric, data-driven perspective in a business. | 2022 |
Mikkel Dengsøe | How to measure data quality: Practical guidelines for how to measure quality, engagement and productivity in a data team | Provides some thoughts around how to evaluate your data team and suggests three categories of metrics: quality, productivity, and engagement. | 2022 |
Sarah Krasnik | Choosing a Data Catalog | Although not strictly about management, this tackles documentation, data dictionaries, knowledge repos, and the like, which are critically important for a data org. | 2022 |
Chad Sanderson | The Existential Threat of Data Quality: and Why the Modern Data Stack Can't Solve It | Despite the rapidly-evolving/growing data stack, poor data quality remains an enormous problem; the article breaks it down into "downstream" and "upstream" categories. | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Prukalpa Sankar | Data Advantage Matrix: A New Way to Think About Data Strategy | Break down your data advantage into four categories (i.e. operational, strategic, product, and business opportunity) and then assess what stage each of these is at (e.g. basic, intermediate, advanced). | 2021 |
Ilan Man | Creating a Data Road Map | Provides suggestions for what factors to consider when thinking about a data roadmap or data strategy (e.g. identifying the audience, setting up the scaffolding, etc.). | 2019 |
Chris Brown | Executing a Data Strategy with OKRs | Outlines how OKRs (Objectives and Key Results) can help with executing on data strategy and provides some examples. | 2022 |
Yali Sassoon | Organizations need to deliberately create data | Organizations spend an incredible amount of time and resources extracting data from various sources, but rarely consider deliberately creating their own data as inputs for ML systems. | 2022 |
Leo Polovets | The Value of Data, Part 1: Using Data as a Competitive Advantage | Software and hardware infrastructure are becoming commoditized, so the data you generate gives you the advantage; data helps you make good content recommendations, helps with ad targeting, gives you actionable insights, makes operations more efficient, and more. | 2015 |
Leo Polovets | The Value of Data, Part 2: Building Valuable Datasets | Describes the attributes of high-value datasets, common approaches for capturing this data, and common pitfalls people fall into during this process (e.g. consider the law of diminishing returns, how clean is your data, etc.) | 2015 |
Leo Polovets | The Value of Data, Part 3: Data Business Models | The final post in this series describes the concept of a "Data Business Model" and how data can realistically be monetized, with examples of companies in each scenario. | 2015 |
Emilie Schario and Taylor A Murphy | Run Your Data Team Like A Product Team | Service-oriented data teams aren’t effective; the authors suggest running the data team like a product team instead, taking a more active role in defining your org's success metrics and pushing the business forward. | 2021 |
Jeremy Salfen | Building a Data Practice from Scratch | Provides a series of suggestions for first data hires at an early stage startup, including the following principles: "don’t worry about making things fancy", "keep an eye on how things will scale, but rein in your impulses to optimize them", and "documentation, transparency, and reproducibility are interrelated and fundamental". | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Sergio Morales | Future-proof your Analytics Efforts in 2020: Hire Diverse Teams | Describes how data team diversity deters bias and encourages curiosity, skepticism, and analytical thinking, attributes that any analytics enterprise will value highly. | 2020 |
Swathi Young | How To Make Sure That Diversity In AI Works | Post provides guidance on how management teams can build diverse AI teams, including suggestions like restructuring talent acquisition, thinking through pay parity, and more. | 2021 |
Gergely Orosz | Hiring (and Retaining) a Diverse Engineering Team | Stories from six engineering leaders who succeeded in building and growing diverse teams | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Erik Bernhardsson | “Why software projects take longer than you think: a statistical model” | Simply adding up time estimates for many subtasks is misleading; instead, figure out which tasks have the highest uncertainty, since those tasks will largely dominate the time to completion. | 2019 |
Erik Bernhardsson | “σ-driven project management: when is the optimal time to give up?” | Describes an abstract measure, σ (sigma), that captures the risk of a project, and uses a statistical model based on that risk to show when one ought to give up on a project. | 2022 |
Michael Kaminsky | Agile Analytics, Part 1: The Good Stuff | When it comes to data science and analytics, these aspects of the Scrum workflow work well: acceptance criteria, pointing, two-week chunks (sprints), and explicit prioritization. | 2018 |
Michael Kaminsky | Agile Analytics, Part 2: The Bad Stuff | Some aspects of agile don't work so well with data teams; these include "the fortuitous finding", exploratory data analysis needs, product ownership / story-writing, and business-as-usual support. | 2018 |
Michael Kaminsky | Agile Analytics, Part 3: The Adjustments | Suggests adjustments that help agile work on a data team: time-bound spikes for research, built-in slack time for exploration, acceptance criteria that include “write the next story”, and peer review instead of sprint review. | 2018 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 5 | "To Agile or Not to Agile". | 2019 |
Oscar Baruffa | Dealing with difficult stakeholders | Presents some approaches for handling difficult stakeholders that you need buy-in from, including things like take the path of least resistance, work towards getting stakeholders to think it's their idea, have lots of private conversations beforehand, and more. | 2022 |
Lucas F Costa | Useful engineering metrics and why velocity is not one of them | Covers four useful metrics that are easy to obtain from JIRA, are hard to game, and can help you debug process problems: arrival rate, work in progress, throughput, and cycle time. | 2022 |
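
As a rough illustration of the four flow metrics in the Costa post above, they can be computed from a plain export of ticket dates. This is a minimal sketch under stated assumptions: the `Ticket` fields (`created`, `started`, `finished`) and the `flow_metrics` function are hypothetical, nothing here calls a JIRA API, and the post's exact definitions may differ from the simple ones used here (e.g. WIP measured only at the end of the window).

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import List, Optional


@dataclass
class Ticket:
    # Hypothetical ticket export; field names are illustrative, not JIRA's schema.
    created: date
    started: Optional[date] = None   # date work began ("In Progress")
    finished: Optional[date] = None  # date work was completed ("Done")


def flow_metrics(tickets: List[Ticket], start: date, end: date) -> dict:
    """Compute simple flow metrics over the inclusive window [start, end]."""
    days = (end - start).days + 1

    def in_window(d: Optional[date]) -> bool:
        return d is not None and start <= d <= end

    # Arrival rate: tickets created per day within the window.
    arrivals = sum(1 for t in tickets if in_window(t.created))
    # Throughput: tickets finished per day within the window.
    done = [t for t in tickets if in_window(t.finished)]
    # WIP at the end of the window: started on or before `end`, not finished by then.
    wip = sum(
        1 for t in tickets
        if t.started is not None and t.started <= end
        and (t.finished is None or t.finished > end)
    )
    # Cycle time: days from start of work to completion, for items finished in the window.
    cycle_times = [(t.finished - t.started).days for t in done if t.started is not None]
    return {
        "arrival_rate_per_day": arrivals / days,
        "throughput_per_day": len(done) / days,
        "wip_at_end": wip,
        "mean_cycle_time_days": mean(cycle_times) if cycle_times else None,
    }


# Illustrative data only.
tickets = [
    Ticket(date(2022, 3, 1), date(2022, 3, 2), date(2022, 3, 5)),
    Ticket(date(2022, 3, 3), date(2022, 3, 4), date(2022, 3, 9)),
    Ticket(date(2022, 3, 6), date(2022, 3, 8)),  # still in progress
    Ticket(date(2022, 3, 9)),                    # not started yet
]
print(flow_metrics(tickets, date(2022, 3, 1), date(2022, 3, 10)))
```

Tracking these per week makes it visible when work arrives faster than it is finished (arrival rate above throughput). In a roughly stable system, Little's law ties them together: average cycle time is approximately WIP divided by throughput, so rising WIP tends to show up as longer cycle times.
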
Author | Title | One-sentence summary | Year |
---|---|---|---|
Gunnar Morling | The Code Review Pyramid | There should be a hierarchy of effort in reviewing code, where more effort is spent on core concepts, how performant the code is, and documentation, with less effort on test quality (though of course tests are important) and syntax. | 2022 |
Tim Hopper | Code Review Guidelines for Data Science Teams | In the context of data teams, describes what a code review should achieve, offers guidelines for carrying out pull requests, and links to additional reading. | 2020 |
Eric Ma | Practicing Code Review | In the context of data science the essay briefly describes the purpose of code review, what it should not be, and the value of it in data work. | 2021 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Rob Dearborn | Organizing and scaling an effective data team | General guidelines on what a properly structured data team should look like, with descriptions ranging from a 1-person data team to a 32+ person team. | 2022 |
Brittany Bennett | Building Powerful Data Teams: On Investing in Junior Talent | Provides suggestions for developing junior talent: blocking off time for personal development, celebrating this blocked-off time, hiring tutors, and more. | 2021 |
Eric Colson | "Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function" | Beware specialization in data science, as it carries costs: the goal of data science is not to execute, but to learn and develop profound new business capabilities. | 2019 |
Chuong Do | "What is the most effective way to structure a data science team?" | Covers how data scientist roles should be defined (analysis vs. building), where data scientists should report (centralized vs. decentralized), where the data science function should live (engineering org vs. product org vs. independent consultancy), and what an organization should do to set data science up for success. | 2017 |
Mikkel Dengsøe | "Data team structure: embedded or centralised?" | There are three common models for structuring data teams, each with its own advantages and drawbacks: centralized, embedded, and hybrid. | 2022 |
Randy Bean | Chief Data Officers Struggle To Make A Business Impact | There is widespread disparity of opinion on what defines a successful Chief Data Officer, so it makes sense that, according to a recent Gartner report, only a fraction of CDOs are poised for success. | 2019 |
Matthew Mayo | Data Scientist, Data Engineer & Other Data Careers, Explained | Explanations of various titles such as Data Architect, Data Engineer, Analyst, ML Engineer, and Data Scientist | 2022 |
Gergely Orosz | What Silicon Valley "Gets" about Software Engineers that Traditional Companies Do Not | Silicon Valley companies treat engineers as smart, autonomous adults (because those are the people who can do the work they need done), while traditional companies tend to keep developers in pure execution roles. | 2021 |
Rifat Majumder | The Data Product Manager | Describes the emerging role of "Data Product Manager" and the benefits it provides an org: better business impact, a deep understanding of customer problems, and more clarity on priorities. | 2021 |
Benn Stancil | The technical pay gap: The culture we build is the culture we buy | Describes the current state of confusion around data titles (using the "analytics engineer" as an example), and describes how the tech industry overvalues technical skills at times. | 2022 |
Ben Darfler | Engineering Levels at Honeycomb: Avoiding the Scope Trap | Describes a nice framework for thinking about job levels, based on scope and level of project complexity. | 2022 |
Mikkel Dengsøe | Data teams are getting larger, faster | There are many problems you can encounter when your data team grows beyond a handful of people; the article provides some tips on working through these problems. | 2022 |
Jorge Fioranelli | A framework for Engineering Managers | Although not directly about data, this is relevant: a framework for engineering managers to think through titles and expectations (including the domains of technology, systems, people, process, and influence). | 2022 |
Pardis Noorzad | Models for integrating data science teams within companies: A comparative analysis | Compares different models for situating DS teams including the "center-of-excellence model", the "Accounting model", the "consultant model", the "embedded model", and more, and considers factors like "Coordination efficiency", "Employee happiness", and others. | 2019 |
Kurt Cagle | Why You Don’t Need Data Scientists | Early in an organization's data maturity stage, you don't need "data scientists" and machine learning people, you instead need to focus on data quality and ontological engineering problems. | 2018 |
Michelangelo D'Agostino, Katie Malone | The Care and Feeding of Data Scientists, Chapter 6 | "Chutes and Career Ladders" discusses how to write a great career ladder for your team. | 2019 |
Benjamin Rogojan | Different Types Of "Data Engineering" Teams | Gives a nice overview of the various flavors of data engineering roles in organizations (including software engineers, data platform engineers, etc.). | 2022 |
Morgan Krey | Storytellers and System Builders: A New Way to Think About Data Roles | There has been a proliferation of "data X" roles (e.g. data engineer, data scientist, data analyst, etc) but the author argues that there are really just two kinds of data practitioners: system builders (your engineers that build pipelines, schedule jobs, stand up APIs, etc.) and storytellers (looking for actionable insights, visualizing data on dashboards, etc). | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Monica Rogati | The AI Hierarchy of Needs | Before you can fully get value out of ML/AI in an organization, it is critical to have foundational data needs met (i.e. good data collection processes, checks, and analytics). | 2017 |
Mario Perrakis | The “0 / 1 / Done” Strategy for Data Science | A description of what a DS org should aspire to: 0-day handovers facilitated by great documentation and code, 1-day prototypes enabled by good tooling and good knowledge, and a clear definition of “done”. | 2022 |
Thomas Redman | “Your Data Initiatives Can’t Just Be for Data Scientists” | Describes the role and importance of non-data experts in DS projects: as collaborators, as customers, and as creators of the data. | 2022 |
Natassha Selvaraj | “Why Are So Many Data Scientists Quitting Their Jobs?” | Two primary factors drive a number of new data scientists out of the profession: a mismatch between employer and employee expectations around data science work and the general difficulty of getting ML to add clear business value. | 2022 |
Pete Warden | How Should you Protect your Machine Learning Models and IP? | Some thoughts on the importance of protecting IP in a ML org. | 2022 |
Jeff Saltz | Managing Machine Learning Projects | Touches on difficulties of managing ML projects and how the management process differs from standard software development. | 2021 |
Alfred Spector, Peter Norvig, Chris Wiggins, and Jeannette M. Wing | Data Science in Context: Foundations, Challenges, Opportunities | A pre-release of a book that gives a thorough accounting of the history of Data Science, a high-level understanding of its applications, and the ethical and social concerns associated with it. | 2022 |
Brooke Carter, Melissa Barr, and Michael Mui | ML Education at Uber: Frameworks Inspired by Engineering Principles | Provides an overview of the philosophy behind Uber's ML education program. | 2022 |
Eyal Trabelsi | How to build TRUST in Machine Learning, the sane way | Provides suggestions on how teams can improve trust in ML in their org, including defining metrics up front, following some best practices when developing the model, A/B testing the model upon deployment, and more. | 2022 |
Andrew Lukyanenko | Lessons learned after 10 years in IT: What I have learned from my mistakes and successes | A senior data scientist gives general DS career advice (some of which is worth noting as a leader), including topics around interviewing, productivity, communication, time estimation, and more. | 2022 |
Shreya Shankar et al. | Operationalizing Machine Learning: An Interview Study | Per the abstract, the authors interviewed 18 MLEs working across many applications, examining how Velocity, Validation, and Versioning govern project success (in terms of deployment and long-term maintenance), and they also discuss interviewees’ pain points and anti-patterns. | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Lenny Rachitsky | Choosing Your North Star Metric | Proposes metrics based on your type of business, recommends having a single north star metric, and advises against using revenue as that metric. | 2021 |
Ron Berman | “The Value of Descriptive Analytics: Evidence from Online Retailers” | The authors estimate that adopting descriptive analytics is associated with a 4%–10% increase in average weekly revenue among online retailers. | 2020 |
Roger M. Stein | "Why Managing Data Scientists Is Different" | Two challenges in managing data scientists: (1) managing a data research effort tends to be a dynamic and self-correcting process in which it is difficult to plan either a project’s timing or final outcomes, and (2) analytics is highly sensitive to time, cost, and quality tradeoffs. | 2015 |
Eric Colson | "The Sobering Truth about the Impact of your Business Ideas" | The vast majority of business ideas fail to generate a positive impact, and this underscores the value of measuring impact, collecting data, and testing. | 2021 |
Joe McFarren | 5 Tips for Managing a Successful Analytics Project | In the context of analytics consulting, it is important to clearly establish project scope, be in constant communication, determine a line of escalation, monitor work with tracking apps, and track finances. | 2022 |
Erik Balodis | A Framework for Embedding Decision Intelligence into your Organization | Provides a high-level overview of how to infuse decision-intelligence into an organization, along with some additional reading sources. | 2022 |
Nelson Auner | Building an Analytics Stack in 2020 | Gives an overview of the modern analytics stack via three buckets: a data-moving tool (ETL), a data warehouse to store the data, and a BI layer to analyze the data. | 2020 |
Mode | The Data Team’s Guide for Marketing Metrics | A good overview of the landscape of metrics used in marketing data work, along with information on the technical side of computing them. | 2022 |
SeattleDataGuy | Why Are We Still Struggling To Answer How Many Active Customers We Have? | Surprisingly, metrics are still hard to calculate, at least partly because of developer turnover, ERP and CRM migrations, data producers constantly changing what data they provide, mergers and acquisitions, and other reasons. | 2022 |
Randy Au | We take our units of analysis for granted | Understanding the "unit of analysis" is critical to answering a research question, and yet in industry it's something we often handle poorly. | 2022 |
Marie Lefevre | Not All Data Requests Are Urgent, So Start by Asking These 5 Questions | Details five questions the author typically asks of those who request analyses: Why? Why again? Who is it for? When is it due? Is it more of a priority than that other request? | 2022 |
Amplitude | The North Star Playbook: The guide to discovering your product’s North Star | A short book intended for product managers and product designers that describes the value of North Star metrics and how to identify them. | 2018 |
Gergely Orosz | Checklist used at Uber to determine if something is urgent | 1. What is the impact? 2. Do you have a signed spec answering the why and the what? 3. Do you have your estimate of the cost? 4. Make the cost of dropping what you're doing very clear. | 2022 |
Dan Frank | Experimentation Platform in a Day | A short, technical (but very accessible) guide to setting up a simple experimentation "platform" with elements of logging, measurement, assignment, and analysis. | 2022 |
Ron Kohavi, Diane Tang, and Ya Xu | Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing | A fantastic introductory book on A/B testing for program and feature evaluation; covers methods, interpretation, biases that can arise, and culture around experimentation. | 2020 |
W.D. | Caveats and Limitations of A/B Testing at Growth Tech Companies | Highlights an issue of A/B tests where over time effect sizes tend to shrink, and growth companies can find themselves in a situation where the statistical power benefits of a growing user base are outweighed by this diminishing returns effect. | 2022 |
Tristan Handy | The Startup Founder’s Guide to Analytics | Although written in 2017, this article gives a still-relevant, high-level overview of building the analytics competency at your org at different company sizes. | 2017 |
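
Assignment is one of the elements of the simple experimentation "platform" in Dan Frank's guide above. One common way to implement it, shown here as an assumption rather than as the guide's own method, is deterministic hashing: each user is bucketed by a hash of the experiment name and user ID, so the same user always gets the same variant without storing assignments anywhere. `assign_variant` and the experiment name are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("control", "treatment")) -> str:
    """Deterministically map a user to a variant for one experiment.

    Hypothetical helper: hashing "experiment:user_id" (rather than user_id
    alone) keeps assignments independent across different experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 8 hex characters (32 bits) as an integer, then bucket it.
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    counts = Counter(assign_variant(f"user-{i}", "new-checkout") for i in range(10_000))
    print(counts)  # expect a roughly even split between the two variants
```

Logging each assignment together with the exposure event lets the analysis step join outcomes back to variants; renaming the experiment reshuffles users, which is usually what you want for a fresh test.
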
Author | Title | One-sentence summary | Year |
---|---|---|---|
David Loftesness | The Engineer to Manager Transition, by Former Twitter Director of Engineering | Talks about an engineering management "event loop", where you touch base on people, projects, process, and self on a daily, weekly, and monthly basis. | 2015 |
The Institute of Leadership & Management | "Spotlight on Leadership Styles" | Describes a set of leadership/management styles including pace-setting, democratic, laissez-faire, and more. | 2018 |
Andy Johns | How to know when to stop: A guide to avoiding burnout and establishing balance in your life—by guest author Andy Johns | A framework for thinking through burnout: 1) define your personal range of tolerance, 2) pick your career progression, 3) pick your life progression. | 2022 |
Alan Johnson | 11 Principles of Engineering Management | A brief, digestible list of management principles for new engineering managers. | 2022 |
GitLab | Preventing burnout: A manager's toolkit | Provides 12 strategies managers can utilize to support their team and prevent burnout | 2022 |
Tanya Reilly | Being glue | Describes the importance of "glue work" (e.g. noticing when other people on the team are blocked and helping them out, reviewing design documents and noticing what's inconsistent, onboarding new people and making them productive faster, or improving processes to make customers happy). | 2019 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Elijah Ben Izzy and Stefan Krawczyk | Deployment for Free -- A Machine Learning Platform for Stitch Fix's Data Scientists | The authors describe, at a high-level, the initial design considerations for Stitch Fix's ML platform, present the API data scientists use to interact with it, and detail its capabilities. | 2022 |
Barr Moses and Lior Gavish | What is a Data Platform? And How to Build One | While every organization’s data platform approach will vary based on the industry and the size of their company, this quick and dirty guide lays out a blueprint for a modern data platform. | 2022 |
Jordan Volz | The Modern Data Stack Ecosystem: Spring 2022 Edition | Maps out the various pieces of the modern data stack, including event tracking, a data warehouse, data governance, and more. | 2022 |
Krzysztof Szafranek | Zalando's Machine Learning Platform: Architecture and tooling behind machine learning at Zalando | Provides an overview of Zalando's ML platform (AWS-powered) from the perspective of a machine learning practitioner. | 2022 |
Jean-Georges Perrin | The next generation of Data Platforms is the Data Mesh | The post summarizes Zhamak Dehghani's proposal for transitioning from current breadth-first data platforms (end-to-end data lifecycle) into vertical/depth-first architectures (one business domain at a time). | 2022 |
Gabrielle Davelaar and Jordan Edwards | DevOps for AI - Microsoft | A great talk that outlines how DevOps principles can be applied to AI, then shows in detail how CI/CD, version control, model storage, and more fit into a great MLOps process. | 2018 |
Kevin Hu | The Four Pillars of Data Observability | Defines data observability and explains that, in the context of a data platform, it includes four facets: metrics, lineage, metadata, and logs. | 2022 |
Stefan Krawczyk | What I Learned Building Platforms at Stitch Fix: Five lessons learned while building platforms for Data Scientists. | The author describes 5 lessons learned in building a data science platform, including not building platforms for all possible users and abstracting away underlying APIs to simplify things for end users. | 2022 |
Lak Lakshmanan | No, you don’t need MLOps: Keep It Simple: the complexity of full MLOps is rarely needed | In counterpoint to all the buzz, the author warns that MLOps is no panacea, and can often automate away important detail or cause a large amount of technical debt that ultimately doesn't save time. | 2022 |
Nishith Agarwal | The Build vs. Buy Guide for the Modern Data Stack | The author claims that the decision to build vs buy comes down to five main considerations: cost, complexity, expertise, time to value, and competitive advantage. | 2022 |
Author | Title | One-sentence summary | Year |
---|---|---|---|
Sanjana Sen and Stephen Bailey | Locally Optimistic Meetup - Governance and Compliance | A conversation among many data practitioners about how their organizations handle data access control, data tagging, anonymization, and other key compliance activities, and what frameworks they have found helpful. | 2020 |
Bryan Petzold, Matthias Roggendorf, Kayvaun Rowshankish, and Christoph Sporleder | Designing data governance that delivers value | Briefly surveys the problem of poor data governance, describes an ideal data governance model, and provides six ways to drive data-governance excellence. | 2020 |
Ilan Man | People-first Data stacks | Proposes switching from tech- to user-centric data management by i) integrating data into company culture (raising awareness, tracking adoption); ii) making data governance options actionable for stakeholders outside of the data platform and iii) introducing ownership of tests on data quality. | 2022 |
Yali Sassoon | Why Data Contracts are Obviously a Good Idea. And why there is so much resistance to this idea from the community around the Modern Data Stack | Briefly describes the importance of data contracts, gives an example of a complaint against them, and explains that such complaints arise because practitioners are stuck in the “data is oil” paradigm, i.e. they assume data is extracted rather than deliberately created. | 2022 |
- Add the following posts:
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3454379
- https://cghlewis.com/blog/data_dictionary/
- https://sarahsnewsletter.substack.com/p/the-analytics-requirements-document
- https://medium.com/@leandroscarvalho/data-product-canvas-a-practical-framework-for-building-high-performance-data-products-7a1717f79f0
- https://arxiv.org/pdf/2205.02302.pdf
- https://www.techrxiv.org/articles/preprint/Requirements_and_Reference_Architecture_for_MLOps_Insights_from_Industry/21397413
- https://medium.com/@koendit/whats-the-big-deal-about-data-products-26ac347b7d7a
- https://docs.getdbt.com/blog/demystifying-event-streams?utm_source=substack&utm_medium=email
- https://betterprogramming.pub/architecture-of-modern-startup-abaec235c2eb