
Sprint - July 8 to July 19, 2024 #23325

Closed
mariusandra opened this issue Jun 28, 2024 · 10 comments
Labels
sprint Sprint planning

Comments

@mariusandra
Collaborator

mariusandra commented Jun 28, 2024

Global Sprint Planning

3 things that might take us down

  1. MSK Kafka -> someone shipping something that writes a lot to Kafka, since our Kafka is shared. Could we monitor/alert on this?

Team sprint planning

For your team sprint planning copy this template into a comment below for each team.

# Team ___

**Support hero:** ___

## Retro

<!-- Grab the high and low priority items from last time and add whether that item was completed or not -->

- 

## Hang over items from previous sprint

<!-- For each item, decide to re-prioritise (and add below) or deprioritise -->

- Item 1. prioritise/deprioritise

## OKR

1. OKR, status (red/yellow/green) and action points if yellow/red


### High priority

-

### Low priority / side quests

-

@mariusandra mariusandra added the sprint Sprint planning label Jun 28, 2024
@mariusandra mariusandra pinned this issue Jun 28, 2024
@mariusandra
Collaborator Author

mariusandra commented Jun 28, 2024

Team CDP, WIP.

We're both off for the last week of the previous sprint, so we're posting this early.

Retro

  • Marius: did a lot of work this week to improve the editing experience.
  • Ben:

Hang over items from previous sprint

  • Secret management (probably go for simple encrypted field in django) @benjackwhite
  • Hook up to rusty webhook service (hopefully Brett, if not Marius)
  • Go through existing CodeQL reports and sanity check the safety of the system @mariusandra
  • Ensure every existing plugin destination could be built with Hog and build them as templates @mariusandra
  • Add calculation and limits for resource usage (memory)

OKR

  • Goal 1: Widespread usage goal
    • Generally available to all customers
    • 5 happy customers (tight feedback loop with them)
    • Get all post-ingestion plugins migrated to Hog Functions
    • Idea: Template gallery (publish your own template for others to use)
    • Scaling work
  • Goal 2: Messaging V1
    • Build on top of hog functions to have “HogWorkflows”
    • Requirements gathering - what do we need to build here
    • We should be able to replace some (or all) of our customer.io workflows with our product
  • Goal 3: Hog Functions as a building block
    • Work with other teams to spread understanding of the power of Hog (functions)
    • Generate various use cases for embeddable functions
      • Multiple sources for functions (ActivityLog, InternalEvents, Alerts)
      • More destinations for functions (tracking events, updating person properties)

Sprint plan

Megaissues: CDP & Hog

  • Goal 1 @benjackwhite
    • Launch the private beta
    • Get 5 happy users
    • Ensure the system stays up and running
  • Goal 2
    • Hook up the rusty webhook service
  • Goal 3 @mariusandra
    • Port over all existing destinations
    • Implement missing language features

@pauldambra pauldambra changed the title Sprint - July 1 to July 12, 2024 Sprint - July 8 to July 19, 2024 Jul 3, 2024
@pauldambra
Member

pauldambra commented Jul 3, 2024

Team Error Obsession, Obsessing on things

Support hero: @pauldambra

Items from previous sprint

High priority

  • ✅ Q3 goals - everyone
  • ✅ masking text in the screenshot images on iOS @marandaneto
  • 🌀 Replay React Native plugin for Android and iOS @marandaneto - going faster than expected - good results but tricky
  • 🌀 Universal filters released for everyone @daibhin - will definitely be release-ready this week
    - will roll out to team 2 and a few customers waiting on it in support tickets
    - it's a big change so we want to get some feedback before big bang
  • 🌀 Error tracking @daibhin @pauldambra
    - ⛔ Generate embeddings in plugin server using new S3 mounted disk
      - learned we don't need to do this first, but that we can if/when we want to
    - ⛔ Materialized embeddings column on the events table
      - learned this is possible but not needed yet
    - ✅ Playlist of errors & associated recordings
      - using the new virtual playlist code 🎉
    - ✅ Figure out grouping (https://github.com/PostHog/product-internal/pull/615)
      - have a plan here and need https://github.com/pipeline 🤝 replay ingestion changes #23395 before we can really start experimenting
    - ✅ first user interview done
      - great 30 minutes with $largeEUCustomer

Low priority / side quests

OKR

  1. OKR, status (red/yellow/green) and action points if yellow/red
  • 🟡 📱Goal 1: People think of PostHog as a mobile solution
  • 🟡 🪲 Goal 2: Error tracking in people's hands
  • 🟡 ⁉️ Goal 3: Hiring

High priority

Low priority / side quests

@EDsCODE
Member

EDsCODE commented Jul 3, 2024

Team Data <->, collecting of Hogs and more

OKR Q2 2024

Objective

Query 3000

  • Key Results:
    • Autocomplete
    • Improve general BI experience/product BI meta#157
    • Declutter the data warehouse UI and make the features intuitive to find

Data Modeling MVP

  • Key Results:
    • Infrastructure decided and implemented
    • Integrating external data with feature flags
    • External data everywhere in insights/persons/cohorts
    • Get billing team to use modeling in posthog for their invoices_with_annual table

Retro

High Priority

  • finish launching pricing @EDsCODE
  • finish work for showing errors from syncs @EDsCODE
  • Improve error visibility from querying @Gilbert09
  • data modeling (TBD after planning meeting) @tomasfarias
  • add historical exports to pipeline 3000

@thmsobrmlr
Contributor

thmsobrmlr commented Jul 3, 2024

Team Product Analytics

Support hero:
Week 1: @thmsobrmlr @skoob13 (secondary)
Week 2: @Twixes @thmsobrmlr (secondary)

Time off: @aspicer only around for the first three days

Retro

  • 🟡 Getting rid of remaining legacy filters use continued (owner: @thmsobrmlr)
    • This turned out to be quite complex, as we store insights and their dashboards in memory for "fast mode" but don't actually have a normalized state. Added Playwright e2e tests, which aren't yet working reliably on CI. For now, back to refactoring. We'll want to add e2e tests as part of the quarterly goals, and it might make sense to talk about normalization or an alternative like GraphQL to improve maintainability.
  • 🔴 Experiments migrated from legacy trends/funnels to HogQL-based (owner: @thmsobrmlr)
    • Not started yet.
  • 🟢 Multiple breakdowns in Trends released to users (owner: @skoob13)
    • Probably going to be released this week. Some issues with data warehouse queries & a remaining bugfix.
  • 🔴 Project Environments (owner: @Twixes)
    • Probably lots of things going on + vacation.
  • 🟢 Insight background reloads monitoring/cleanup (@webjunkie)
    • Released behind a feature flag. Improved cache lifetime, so we can roll it out more.
  • 🟢 Fixed a lot of with new support system. Fixed an OOM issue with cohorts for a large customer, but the places where we naively fetch persons are breaking one by one, so we need a general answer. Same for properties of events.

Extra things done

  • Offsite planning
  • Working with a contributor, Nikita, who's in the process of shipping analytics alerts

High priority

  • @webjunkie Insight caching state investigation (do we need pre-warming? -> inclined to remove).
  • @skoob13 Probably start experimenting with LLMs on insights (natural text -> query nodes).
  • @aspicer Driving along tickets.
  • @Twixes tbd

Low priority / side quests

  • @webjunkie Staying on alerts topic to make sure we have a good first version.

Q3 2024 objectives

  1. Rock-solid analytics (@thmsobrmlr + @webjunkie + @aspicer + @anirudhpillai)
    1. Legacy Minus – removing legacy insights code so that we can move fast
      • FilterType gone from the frontend.
      • rm -rf posthog/queries/
      • Experiments ported to HogQL.
      • All the flags from HogQL/querying work.
    2. Tests Plus – shipping fewer bugs in the first place
      • Ensure we test with the feature flags that users actually experience, both in end-to-end and integration tests.
      • When shipping changes to queries, replay old vs. new version on thousands of real queries to check for regressions.
    3. Metrics Plus – catching issues before users report them
      • Analytics performance dashboard in Grafana (query duration, failures, etc.). Paging alerts on critical metrics, e.g. if the number of queries drops rapidly, or failures rise.
      • Analytics experience dashboard in PostHog (time till data available, result freshness across insights and subscriptions, refreshes initiated manually vs. automatically, etc.)
      • Alerts on major Product Analytics errors from Sentry, and us acting on every alert. (Bonus: checking up the Sentry routing rules for the #product-analytics team.)
      • Cohorts dashboard in Grafana (successful vs. failed calculations per day, recalculation backlog). Alerts here too.
    4. Performance Plus - eliminating UX pain via maximum query performance/reliability, based on Metrics Plus data
      • Partial calculation of multi-day time series results
      • …and more – work with Team Query Performance to find the lowest-hanging fruit, similarly to Tim's performance mega issue
    5. Support Plus – sparking joy for users when they’re led to report a bug
      • 1 hero + 1 sidekick
      • Goal: 90% of tickets fulfill the SLA
  2. Answering more product questions, deeper (@thmsobrmlr + @webjunkie + @aspicer + @anirudhpillai)
    1. Growth Plus - increasing ease of onboarding, and subsequent retention
      • Identify growth opportunities working with Anna, our product manager – implement growth optimizations and track their impact whenever possible.
      • Work with Team Growth on optimizing the onboarding experience of Product Analytics.
    2. Analysis Plus - answering more product questions, more deeply
      • Analytics alerts are out to users (implemented with the contributor)
      • “Done for the first time” in Trends, to kill the janky First Time Event Plugin
      • Query in new insight URL for instant insight sharing
      • Optional funnel steps
      • ...and more, based on user feedback - see the most requested features in GitHub
  3. ArtificialHog (@Twixes + @skoob13) – an LLM-based chat-like interface for answering product questions.

@tiina303
Contributor

tiina303 commented Jul 3, 2024

Team Pipeline

Off: Brett 2 days, Xavier 2 days
Support: Tiina

Retro

High priority

  • Fix excessive overrides written in support of Personless mode (Brett)
  • Hog support for Rusty-Hook (Brett) (this takes a back seat to the one above)
    • carry-over
  • capture-rs: fix billing limits (Xavier)

Low priority / side quests

  • Finish hog-rs to posthog repo migration (deploy out of posthog through state.yaml)
  • Collect rdkafka metrics (broker response latency, error rates) for all node producers & consumers (Xavier)
    • carry-over
  • capture-rs: read redis out-of-band (avoid latency if redis slow)
    • carry-over

OKR

✅=finished 🟢=on track to finish this quarter 🟡=might not finish 🔴=won't finish
✔️=progressed last sprint ; ➡️=planned work for this sprint

🟢 Test Warpstream as PoC and decide whether to do it or not
🟢➡️ Pipeline scalability: improving pipeline throughput
🟢➡️ Help other teams ship fast
🟢 Stretch: better e2e monitoring

High priority

  • Hog support for Rusty-Hook (Brett)
  • Separate pipeline for $$heatmap events (Xavier, Tiina, Paul)
  • Inline a processEvent plugin (Oliver)

Low priority / side quests

  • Collect rdkafka metrics (broker response latency, error rates) for all node producers & consumers (Xavier)

@fuziontech
Member

fuziontech commented Jul 3, 2024

Team Click Haus, Haus of the Hogs

OKR Q2 2024

Objective

James as a Service -> ClickHouse as a Service

  • P0 tasks such as
    • 🟡 Deletes
    • 🟢 Keeping clusters happy
    • 🟢 Provisioning more disks
    • 🟢 Schema Reviews
    • 🟢 Debugging
    • 🟡 Performance
    • 🟢 Backups/Restores
  • Decide whether ByConity is the way forward
    • 🟢 Load it with data, set up
    • 🟢 Test performance, test the functionality/compatibility gaps
  • IF ByConity works, migrate over to it
    • 🟢 Enumerate all functionality that doesn’t work and update the functions/contribute to ByConity
    • 🟢 Syntax
    • 🟡 If it works on metal, put it in k8s with Karpenter
    • 🟡 Evaluate which nodes we should use
  • IF ByConity doesn’t work, reshard US to look like EU cluster
    • 🟡 All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
    • 🟢 We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
    • 🟡 Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
    • 🟢 We should be able to spin up and down replicas of any cluster with no manual intervention.
    • 🟢 We should be able to upgrade ClickHouse versions with no manual intervention.
    • 🟡 We should have tooling / runbooks for resharding (if we continue down the current coordinator path)

Board

https://github.com/orgs/PostHog/projects/85/views/2

Retro

@Daesgar - Our scope has changed by about half because of shifting priorities and fires. Feeling comfortable. I was able to do config automation and provide value in the first sprint. Working on the backups. I need more context on how the rest of PH works (like the plugin server). Sometimes it's hard to focus on one thing: when a question comes up in the chat, it's ambiguous whether it's urgent or something to focus on.

@fuziontech - Overall I think this sprint went amazingly. Having 2x the firepower is a hack. We're getting a lot more done than even my highest expectations. Having ~two incidents in a first sprint was less than ideal, though.

  • 📟 Monitoring and Alerting on EU Coordinator
  • ⏩ Move parts around so last 3 months of data are on NVME on US @Daesgar
  • Retire old Offline Nodes on US Cluster @fuziontech
  • Remove projections in EU on events table @fuziontech
  • 🗑️ Delete persons on teams that are still ingesting data (for personless events) @fuziontech
  • Configs in Ansible for ClickHouse EU Coordinator @Daesgar
  • Configs in Ansible for ClickHouse US @Daesgar
  • 🧪 Test incremental backup restores
  • Major fixes to HouseWatch backups @Daesgar (unplanned but needed)
  • 🏃 2 new i4i.metal replicas for US
  • Configs in Ansible for ClickHouse EU @Daesgar
  • 🔥 Kafka consumer fire recovery and initial debugging (major distraction)

High priority


@robbie-c
Member

robbie-c commented Jul 3, 2024

Team web analytics session table

Support hero: @robbie-c

Retro

Session table PR got merged, we are dogfooding, I'm fixing issues as they come up.

Had some detailed customer interactions around channel type attribution. One customer sent me a spreadsheet comparing their GA numbers with ours. We're pretty close, but there were a few differences that I was able to fix or help them fix. A few other support tickets have asked for help with this, so I'm adding a session attribution debugger.

Tasks

🟢 Get session table v2 PR over the line
🟢 Start backfilling, prioritising EU, and team 2 on US for dogfooding
🆕 Help customers debug attribution
🆕🟢 Add live session count

Stretch

🔴 Get a version of WA up that is terrifyingly fast because it can just use the sessions table + it can sample

OKR

  1. Make querying fast enough for large customers
  2. Heavily requested features
  3. Improve synergy with other products
  4. Product and growth

High priority

  • Figure out difference between queries with session v1 and v2
  • Clear out the support queue
  • Finish the attribution debug tool
  • Improve the refresh logic

Ongoing

  • In the background, continue to backfill the sessions table

@neilkakkar
Contributor

neilkakkar commented Jul 3, 2024

Team Feature Success

Support hero: @Phanatic
Days off:
Juraj: 1 day
Phani: 0 days
Dylan: 1 day
Neil: 2 weeks

Retro

Hang over items from previous sprint


OKRs

  1. Make sure feature flags can handle 10x current scale
  2. No-code experiments
  3. Split out experiments into its own product

High priority

Low priority / side quests / maybe Neil will get to this next year

@raquelmsmith
Member

raquelmsmith commented Jul 3, 2024

Team Growth

Retro

Retro items
  • Q3 planning
  • @raquelmsmith
    • Support for first week
    • Pricing page experiments - iterate here with Cory and Eli until it's done
    • Stay on top of revenue issues
    • Start working on toolbar dashboard template thing
    • Keep on top of personless comms and customer issues and metrics
    • Lots of interviews...
  • @zlwaterfield
    • Complete subscribe to all products
      • frontend changes
      • release under feature flag to new users
      • backfill existing users and communicate with them
      • (if time permits - probably next sprint) cleanup! remove/clean single product subscribe code where we can.
    • Start on the Stripe metadata changes - close RFC, updates to Zapier, work on backfill, etc.
    • re-run the plans map and compare with the new auto-cancel functionality

Q3 Goals

✅=finished 🟡=in progress 🔴=won't finish ⚪=not started

  1. 🟡 Make onboarding awesome for Product analytics and Data warehouse (Raquel)
  2. ⚪ Support self-serve annual commitments (Zach)
  3. ⚪ Dive into the data to understand our billing metrics and customers better (Zach)
  4. 🟡 Launch pricing for data warehouse (Raquel)
  5. 🟡 Hire 2 people (one for billing, one for auth/permissions focus)

This sprint

High priority

  • @raquelmsmith (support first week, on-call second week)
    • Personless events launch
      • Oversee pricing calc changes, keep iterating until sales feels like it's good and we feel like it works for us as well
      • If above is completed, make sure comms are sent out
      • Figure out if we can roll the default out to everyone
    • Data warehouse pricing
      • Launch it for non-beta-users
    • Dashboard templates in onboarding
    • Hiring
    • Project-access-on-invites
      • Do some digging to see what this entails (I don't think it will be involved or difficult)
  • @zlwaterfield (on call first week, support second week)
    • subscribe to all products
      • Run backfill for subscribe to all products and notify users
      • Remove feature flag code for subscribe to all products and cleanup code
    • stripe startup metadata
      • Finish Stripe metadata cleanup - 20-30 left to fix manually + a few hours of manual checks
      • Build a startup plan dashboard in PostHog
    • misc
      • Think through "Free / paid - same feature-set"
      • Add at least one E2E SAML test
      • Run backfill for starting/ending backfill bug
      • Re-run the plans map compare
      • Deprecate billing v2 (PR done just need to merge)

@benjackwhite
Contributor

Team Infra

OKR

  1. 🦹 Zero-trust security
  2. 🤓 10x Developer Experience
  3. 💪 Every service lives and dies alone
  4. 💰 Save big on cost

High priority

  • Reverse proxy sharding approach + roadmap @frankh
  • Billing alerts and follow up from incident issues @danielxnj
  • Postgres issue investigation in EU - bigint etc @danielxnj
  • Start planning out security group changes @danielxnj
  • VPA grafana chart changes in all regions @ZeleniJure
  • Autoscaling based on celery queue depth @ZeleniJure

@daibhin daibhin closed this as completed Jul 31, 2024
@daibhin daibhin unpinned this issue Jul 31, 2024