Sprint - Jul 22 to Aug 2, 2024 #23768

Closed
pauldambra opened this issue Jul 17, 2024 · 10 comments
Labels
sprint Sprint planning

Comments

@pauldambra
Member

pauldambra commented Jul 17, 2024

Global Sprint Planning

3 things that might take us down

  1. MSK Kafka -> someone shipping something that writes a lot to kafka, b/c we use shared kafka. Could we monitor/alert? Notes: we have alerts on capture latency as a whole, so if it becomes too high we have a runbook. See infra planning (looking at kafka alternatives which might make it easy to scale up "many kafkas")
  2. 1-based indexes - need to decide w/ Hog, please check the RFC as this will influence a lot of how functions are written in future.

Team sprint planning

For your team sprint planning copy this template into a comment below for each team.

# Team ___

**Support hero:** ___

## Retro

<!-- Grab the high and low priority items from last time and add whether that item was completed or not -->

- 

## Hang over items from previous sprint

<!-- For each item, decide to re-prioritise (and add below) or deprioritise -->

- Item 1. prioritised/deprioritised

## OKR

1. OKR, status (red/yellow/green) and action points if yellow/red


### High priority

-

### Low priority / side quests

-

Sprint name game score

6 out of 10. Barely average. C'mon people.

@pauldambra pauldambra added the sprint Sprint planning label Jul 17, 2024
@daibhin daibhin pinned this issue Jul 17, 2024
@pauldambra
Member Author

pauldambra commented Jul 17, 2024

Team Something

Support hero: @pauldambra

retro

  • 3 and a bit days of incidents and confusing bugs on top made for "lots of fun" - @pauldambra
  • basically turned into a one-week sprint for me 😫 - @pauldambra
  • feels good that filters have landed without kerfuffle - @daibhin
  • very excited to delete the old filtering code - @daibhin
  • still feel like we don't have a handle on how to investigate complex replay bugs - @daibhin
  • exciting that we've started on pricing changes - @pauldambra and @annikaschmid
  • feels hard to coordinate pricing changes, there are a lot of pricing changes happening - @annikaschmid
  • we're still doing a lot of things at the same time which means we risk half-assing things @pauldambra @daibhin

items from previous sprint

High priority

Low priority / side quests

  • we are recruiting beta testers for Android and iOS - (we should have screenshots now) @marandaneto @annikaschmid
    ✅ mobile support @pauldambra
    • holding the fort while @marandaneto relaxes in Brazil
      ✅ continue investigating recordings snapshots capture @pauldambra
      🟡 finish the rrweb alpha-16 upgrade! @daibhin and @pauldambra
    • maybe waiting for a type fix from rrweb
    • David's CSS parsing changes have been merged, so we might end up waiting for alpha 17

OKR

  1. OKR, status (red/yellow/green) and action points if yellow/red
  • 🟡 📱Goal 1: People think of PostHog as a mobile solution
  • 🟡 🪲 Goal 2: Error tracking in people's hands
  • 🟡 ⁉️ Goal 3: Hiring

High priority

  • make sure pricing changes have landed @pauldambra
  • error tracking
    • if/when new ingestion pipeline
      • add s3 mounted volume @pauldambra
      • add source maps
        • what's the state of the art here
      • demangle posthog stack traces
    • alerting/notifications? @daibhin
      • only if we can CDP this?
        • e.g. slack message when error first happens / an archived error re-occurs
  • heatmap ingestion separation? @pauldambra

Low priority / side quests

  • replay capture "large messages" @pauldambra
  • feels like there is a person filtering bug where we don't get consistent results in replay @daibhin / @pauldambra
  • replay bug investigation tooling @pauldambra / @daibhin
    • playback snapshot by snapshot
    • chrome extension avoidance
  • session replays & survey following up on the new filtering experience @daibhin

@benjackwhite
Contributor

benjackwhite commented Jul 17, 2024

Team Crocodile Dundee Programming

Team Availability

  • Marius: 50% (second week off)
  • Ben 100%

Retro

  • Marius
    • Starting to feel like a great programming exp (Hog)
    • Great to build the Kinesis app and implement missing features
    • Happy about the progress
  • Ben
    • Working with Brett is super nice, and it seems that Webhooks and App Metrics should be ready and working soon
    • Generally lots of good polishing progress
    • Good to get some actual users and feedback (generally feedback was positive and mostly UX based)
    • I made 6 integrations in one day!

Hang over items from previous sprint

  • 🟢 Launch the private beta
  • 🟡 Get 5 happy users
  • 🟢 Ensure the system stays up and running
  • 🟡 Hook up the rusty webhook service
  • 🟡 Port over all existing destinations
    • not all ported, but Kinesis was complex which meant we ended up with lots of nice STL functions
    • Experience feels pretty good
    • Implement missing language features
  • 🟢 Secret management (actual encryption will follow)
  • 🟢 Add calculation and limits for resource usage (memory)

OKR

  • 🟢 Goal 1: Widespread usage goal
    • Generally available to all customers
    • 5 happy customers (tight feedback loop with them)
    • Get all post-ingestion plugins migrated to Hog Functions
    • Idea: Template gallery (publish your own template for others to use)
    • Scaling work
  • 🟡 Goal 2: Messaging V1
    • Build on top of hog functions to have “HogWorkflows”
    • Requirements gathering - what do we need to build here
    • We should be able to replace some (or all) of our customer.io workflows with our product
  • 🟢 Goal 3: Hog Functions as a building block
    • Work with other teams to spread understanding of the power of Hog (functions)
    • Generate various use cases for embeddable functions
      • Multiple sources for functions (ActivityLog, InternalEvents, Alerts)
      • More destinations for functions (tracking events, updating person properties)

Sprint plan

Megaissues: CDP & Hog

  • Focus on "big Hog problems" @mariusandra
    • Array access (1-based?)
    • Exceptions
    • Port over more plugins/apps:
      • GCS pubsub (& blob?)
    • Stretch: inline functions/lambdas (map, reduce, each, etc)
  • Final work to enable HogFunctions as feature preview @benjackwhite
    • Hook up to an actual webhook service
    • Load testing of Hog functions in our own account
    • Get AppMetrics connected
    • Native oauth integration (needed for a lot of destinations, e.g. salesforce)
    • Show function status in the list view and alert when disabled
    • Test autocapture work (likely works but let's just triple-check)

@jurajmajerik
Contributor

jurajmajerik commented Jul 17, 2024

Team Feature Success

Support hero: @jurajmajerik
Days off:
Juraj: 0 days
Phani: 2-3 days
Dylan: 2 days
Neil: 0 days

Retro

Hang over items from previous sprint


OKRs

  1. Make sure feature flags can handle 10x current scale
  2. No-code experiments
  3. Split out experiments into its own product

High priority

  • Experiment confidence intervals @jurajmajerik
  • verification framework/service to ensure that rust flags service works the same as decide. @dmarticus
  • no-code experiments RFC - @Phanatic

Low priority / side quests / maybe Neil will get to this next year

@benjackwhite
Contributor

Team Infra

Retro / hangover

  • 🟢 Reverse proxy sharding approach + plan @frankh
    • Plan in place and we know high level what to do, just need to do it.
  • 🟢 Billing alerts and follow up from incident issues @danielxnj
  • 🟡 Postgres issue investigation in EU - bigint etc @danielxnj
    • Generally good but one ID field still needs changing (working with Jams)
    • Worked well on dev but locked up on EU replica (coming up with a follow-up plan)
  • 🟡 Start planning out security group changes @danielxnj
    • Testing on dev to lock down DBs as starting point
  • 🟢 K8s rbac improvements
    • Rolled out with admin/developer/readonly roles
  • 🟢 VPA grafana chart changes in all regions @benjackwhite
  • 🟢 Autoscaling based on celery queue depth @benjackwhite
    • Just need to roll it out in EU as well
  • 🥳 Testing out Warpstream
    • Brought this into prio due to last sprint MSK issues
    • By end of sprint will have a duplicate replay ingestion stack running on it
    • Will run for some time to validate perf and costs estimates
  • 🟡 Infisical testing

OKR

  1. 🦹 Zero-trust security 🟢
  2. 🤓 10x Developer Experience 🟡
  3. 💪 Every service lives and dies alone
  4. 💰 Save big on cost 🟡

High priority

@EDsCODE
Member

EDsCODE commented Jul 17, 2024

Team Data <->, collecting of Hogs and more

OKR Q2 2024

Objective

Query 3000

  • Key Results:
    • Autocomplete
    • Increase general BI experience/product BI meta#157
    • Declutter the data warehouse UI and make the features intuitive to find

Data Modeling MVP

  • Key Results:
    • Infrastructure decided and implemented
    • Integrating external data with feature flags
    • External data everywhere in insights/persons/cohorts
    • Get billing team to use modeling in posthog for their invoices_with_annual table

Retro

  • finish launching pricing
  • finish work for showing errors from syncs @EDsCODE
  • Improve error visibility from querying @Gilbert09
  • data modeling (TBD after planning meeting) @tomasfarias
  • add historical exports to pipeline 3000

High Priority

@daibhin daibhin changed the title Sprint Jul 22 to August 2nd Sprint Jul 22 to Aug 2, 2024 Jul 17, 2024
@daibhin daibhin changed the title Sprint Jul 22 to Aug 2, 2024 Sprint - Jul 22 to Aug 2, 2024 Jul 17, 2024
@zlwaterfield
Contributor

zlwaterfield commented Jul 17, 2024

Team Growth

Retro

Retro items
  • @raquelmsmith (support first week, on-call second week)
    • Personless events launch
      • Oversee pricing calc changes, keep iterating until sales feels like it's good and we feel like it works for us as well
      • If above is completed, make sure comms are sent out
      • Figure out if we can roll the default out to everyone
    • Data warehouse pricing
      • Launch it for non-beta-users
    • Dashboard templates in onboarding
    • Hiring
    • Project-access-on-invites
      • Do some digging to see what this entails (I don't think it will be involved or difficult)
  • @zlwaterfield (on call first week - support second week)
    • subscribe to all products
      • Run backfill for subscribe to all products and notify users
      • Remove feature flag code for subscribe to all products and cleanup code
    • stripe startup metadata
      • Finish stripe metadata clean - 20-30 left needing manual fixes + a few hours of manual checks
      • Build a startup plan dashboard in PostHog
    • misc
      • Think through "Free / paid - same feature-set"
      • Add at least one E2E SAML test
      • Run backfill for starting/ending backfill bug
      • Re-run the plans map compare
      • Deprecate billing v2 (PR done just need to merge)

Q3 Goals

✅=finished 🟡=in progress 🔴=won't finish ⚪=not started

  1. 🟡 Make onboarding awesome for Product analytics and Data warehouse (Raquel)
  2. ⚪ Support self-serve annual commitments (Zach)
  3. 🟡 Dive into the data to understand our billing metrics and customers better (Zach)
  4. ✅ Launch pricing for data warehouse (Raquel)
  5. 🟡 Hire 2 people (one for billing, one for auth/permissions focus)

This sprint

Time off: @raquelmsmith (July 15-19 and July 26)

  • @zlwaterfield
    • Complete the subscribe to all products backfill (left over - ran into data issues that have been resolved)
    • Complete the startup plan metadata clean and dashboard (left over)
    • Add at least one E2E SAML test (left over)
    • Improve activation error redirects in billing
    • Block customers from resubscribing if they've previously had their sub canceled from failed payments
    • Update support response time copy to use "target response time"
    • Improve billing limits - store as number, improve validation / error handling in client and server
    • Startup customer events for customer.io emails when rolling off
    • Misc plan issues - users on enterprise when they shouldn't be, mismatched tiers, free session replay plans, etc.
  • @raquelmsmith

@fuziontech
Member

fuziontech commented Jul 17, 2024

Team Click Haus, Haus of the Hogs

OKR Q2 2024

Objective

James as a Service -> Clickhouse as a Service

  • P0 tasks such as
    • 🟡 Deletes
    • 🟢 Keeping clusters happy
    • 🟢 Provisioning more disks
    • 🟢 Schema Reviews
    • 🟢 Debugging
    • 🟡 Performance
    • 🟢 Backups/Restores
  • Decide whether ByConity is the way forward
    • 🟢 Load it with data, set up
    • 🟢 Test performance, test the functionality/compatibility gaps
  • IF ByConity works, migrate over to it
    • 🟢 Enumerate all functionality that doesn’t work and update the functions/contribute to ByConity
    • 🟢 Syntax
    • 🟡 If it works on metal, put it in k8s with Karpenter
    • 🟡 Evaluate which nodes we should use
  • IF ByConity doesn’t work, reshard US to look like EU cluster
    • 🟡 All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
    • 🟢 We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
    • 🟡 Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
    • 🟢 We should be able to spin up and down replicas of any cluster with no manual intervention.
    • 🟢 We should be able to upgrade ClickHouse versions with no manual intervention.
    • 🟡 We should have tooling / runbooks for resharding (if we continue down the current coordinator path)

Board

https://github.com/orgs/PostHog/projects/85/views/2

Retro

@Daesgar
The good: We have accomplished quite a bit. US cluster is pretty much done. 4 new replicas are available. Just about to the point where we can retire the old nodes. We can start with deletes and test ByConity. Very happy that backups are working - huge peace of mind. We can retire the snapshot nodes.

The bad: Waiting a lot for PRs to get approved (in general). Feels annoying to bug people in dev channel. Developer role in AWS does not have privileges to read IAM roles (we should grant this)

@fuziontech - It feels like we got a lot done and that we've hit our stride. We are both working effectively together getting things done at the same time. It's really nice having follow the sun coverage for CH. Overall 11/10 right now.

Board Snapshot

(image: board snapshot)

@Twixes
Member

Twixes commented Jul 17, 2024

Team ___ (placeholder left intentionally)

Support hero: @anirudhpillai (one week by @Twixes, one week by @aspicer)

Time off: @webjunkie (full sprint) + @aspicer (first week)

Retro

  • @webjunkie Insight caching state investigation (do we need pre-warming? -> inclined to remove). 🟢 New pre-warming rolled out to the largest customers (for when Julian's back: cleaning up after the older InsightCachingState)
  • @skoob13 Probably start experimenting with LLMs on insights (natural text -> query nodes). Prioritized rolling out multiple breakdowns.
  • @aspicer Driving along tickets. 🟢 Also, funnels as a user-defined function.
  • @Twixes tbd 🟢 Support. Project environments still WIP.
  • @thmsobrmlr Support.

OKR

Q3 2024 objectives

  1. Rock-solid analytics (@thmsobrmlr + @webjunkie + @aspicer + @anirudhpillai)
    1. 🟢 Legacy Minus – removing legacy insights code so that we can move fast
    2. 🔴 Tests Plus – shipping fewer bugs in the first place.
    3. 🔴 Metrics Plus – catching issues before users report them
    4. 🟡 Performance Plus - eliminating UX pain via maximum query performance/reliability, based on Metrics Plus data
    5. 🟢 Support Plus – sparking joy for users when they’re led to report a bug
  2. Answering more product questions, deeper (@thmsobrmlr + @webjunkie + @aspicer + @anirudhpillai)
    1. 🔴 Growth Plus - increasing ease of onboarding, and subsequent retention
    2. 🟡 Analysis Plus - answering more product questions, more deeply
  3. 🟡 ArtificialHog (@Twixes + @skoob13) – an LLM-based chat-like interface for answering product questions.

High priority

  • @thmsobrmlr Remove filters from the frontend, allowing us to save new insights with query only.
  • @skoob13 "First time this event was done by user" filter in Trends. Then, LLM-based insight generation MVP.
  • @Twixes 1 week of support. Project environments rolled out internally. Onboarding @annaszell.
  • @aspicer Just 1 week of support.

@robbie-c
Member

robbie-c commented Jul 17, 2024

Team web analytics

Support hero: @robbie-c

Retro

Sessions v2 backfill finished on EU, I've started migrating some users, planning to change the default later this week.

Sessions attribution explorer has been pretty handy for one customer, it's enabled some customers to give me super detailed feedback on how our attribution works.

I've also made some other improvements to attribution, for example we now support much more traffic referred from mobile apps (e.g. the native android search widget). I wrote a scraper that uses the list of domains we already have, and grabs their .well-known files to see what mobile apps they have.
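As a rough illustration of the scraping step described above, here is a minimal Python sketch. It assumes the Android Digital Asset Links file served at `/.well-known/assetlinks.json` is one of the `.well-known` files in question, and that its contents have already been downloaded. The function names and the `files_by_domain` input shape are hypothetical and not taken from the actual scraper.

```python
import json

# Path of the Android Digital Asset Links file on a domain (assumed to be one
# of the .well-known files the scraper reads).
ASSETLINKS_PATH = "/.well-known/assetlinks.json"


def android_packages_from_assetlinks(raw: str) -> set[str]:
    """Return the Android package names declared in an assetlinks.json body.

    Malformed or unexpected input yields an empty set instead of raising.
    """
    try:
        statements = json.loads(raw)
    except json.JSONDecodeError:
        return set()
    if not isinstance(statements, list):
        return set()

    packages: set[str] = set()
    for statement in statements:
        if not isinstance(statement, dict):
            continue
        target = statement.get("target")
        if not isinstance(target, dict):
            continue
        package = target.get("package_name")
        # Only "android_app" targets describe an app; "web" targets are skipped.
        if target.get("namespace") == "android_app" and isinstance(package, str):
            packages.add(package)
    return packages


def packages_by_domain(files_by_domain: dict[str, str]) -> dict[str, str]:
    """Map each declared package name to the domain whose file declared it.

    `files_by_domain` is a hypothetical input: domain -> downloaded file text.
    """
    mapping: dict[str, str] = {}
    for domain, raw in files_by_domain.items():
        for package in android_packages_from_assetlinks(raw):
            mapping[package] = domain
    return mapping
```

A mapping like this could then be used to attribute traffic referred by a known app back to its owning domain. Fetching the files, and handling the iOS equivalent, are left out of the sketch.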

Tasks

  • 🟢 Figure out difference between queries with session v1 and v2
  • 🟡 Clear out the support queue
  • 🟢 Finish the attribution debug tool
  • 🟡 Improve the refresh logic

OKR

  1. Make querying fast enough for large customers
  2. Heavily requested features
  3. Work better with other products
  4. Product and growth

High priority

Stretch goals

  • Add some visual regression tests for web analytics
  • Docs changes for the improvements to attribution

Ongoing

  • In the background, continue to backfill the sessions table

@tiina303
Contributor

tiina303 commented Jul 17, 2024

Team Pipeline

Off: Tiina 7 days, Xavier 5 days
Support: Brett

Retro

High priority

  • Hog support for Rusty-Hook (Brett)
  • Separate pipeline for $$heatmap events (Xavier, Tiina, Paul)
    • iwarnings, heatmaps and exceptions will all be deployed to prod by eow
  • Inline a processEvent plugin (Oliver)
    • likely will ship by eow

Low priority / side quests

  • Collect rdkafka metrics (broker response latency, error rates) for all node producers & consumers (Xavier)
    • likely will ship by eow

OKR

✅=finished 🟢=on track to finish this quarter 🟡=might not finish 🔴=won't finish
✔️=progressed last sprint ; ➡️=planned work for this sprint

🟢➡️ Test Warpstream as PoC and decide whether to do it or not
🟢✔️➡️ Pipeline scalability: improving pipeline throughput
🟢✔️➡️ Help other teams ship fast
🟢➡️ Stretch: better e2e monitoring

High priority

Low priority / side quests

  • mvp of event and property definitions as a separate service (Oliver)
  • Kafka topic in front of Rusty Hook (Brett)

@marandaneto marandaneto unpinned this issue Aug 12, 2024

10 participants