SREcon18 Americas: My Recommendation List #63

Data Collection: Packetbeat, Collectd, DSC, Fievel, and GoPassiveDNS
DB engines: Prometheus, Druid, ClickHouse, InfluxDB, ElasticSearch, and OpenTSDB
Visualization: Kibana, Grafana, and Graphite Web

17. Security as a Service ⭐️

Security as a service. Well, I'm not on a security team, so this one doesn't interest me.

18. "Capacity Prediction" instead of "Capacity Planning": How Uber Uses ML to Accurately Forecast Resource Utilization ⭐️⭐️⭐️

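Some of Uber's exploration into using machine learning for "capacity prediction" in place of the earlier "capacity planning". Capacity is a complicated topic, and this is only one direction to explore.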

19. Distributed Tracing, Lessons Learned ⭐️

Some lessons learned about distributed tracing.

20. Know Thy Enemy: How to Prioritize and Communicate Risks ⭐️⭐️⭐️

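Understanding and managing risk. This is an introductory piece from Google's CRE team. If you have read the Google SRE book, you will find that most of it is already covered there.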

21. Building Shopify's PaaS on Kubernetes ⭐️

Hands-on K8s experience <_<

22. Automatic Metric Screening for Service Diagnosis ⭐️⭐️⭐️⭐️

Teacher Chen's talk on root-cause diagnosis of failures!

23. Approaching the Unacceptable Workload Boundary ⭐️⭐️

A talk about capacity and workload. Worth a look if you are interested in performance bottlenecks and load testing.

ninehills changed the title ~~[SREcon18 Americas] My Recommendations~~ [SREcon18 Americas] My Recommendation List Jun 2, 2018

ninehills added blog done labels Jun 3, 2018

ninehills changed the title ~~[SREcon18 Americas] My Recommendation List~~ SREcon18 Americas: My Recommendation List Jun 6, 2018

ninehills commented Jun 2, 2018

1. [Workshop] Containers from Scratch ⭐️⭐️⭐️⭐️

2. [Workshop] How to Build a Distributed System in 3 Hours ⭐️⭐️⭐️

3. [Workshop] Ansible for SRE Teams ⭐️⭐️⭐️

4. [Workshop] Tech Writing 101 for SREs ⭐️⭐️

5. [Workshop] Chaos Engineering Bootcamp ⭐️⭐️

6. If You Don’t Know Where You’re Going, It Doesn’t Matter How Fast You Get There ⭐️⭐️⭐️

7. Stable and Accurate Health-Checking of Horizontally-Scaled Services ⭐️⭐️⭐️⭐️

8. Don’t Ever Change! Are Immutable Deployments Really Simpler, Faster, and Safer? ⭐️⭐️⭐️

9. Lessons Learned from Our Main Database Migrations at Facebook ⭐️⭐️

10. Leveraging Multiple Regions to Improve Site Reliability: Lessons Learned from Jet.com ⭐️⭐️⭐️

11. Lessons Learned from Five Years of Multi-Cloud at PagerDuty ⭐️⭐️

12. Help Protect Your Data Centers with Safety Constraints ⭐️⭐️⭐️⭐️⭐️

13. Real World SLOs and SLIs: A Deep Dive ⭐️⭐️

14. Learning at Scale Is Hard! Outage Pattern Analysis and Dirty Data ⭐️⭐️

15. Containerization War Stories ⭐️

16. Monitoring DNS with Open-Source Solutions ⭐️⭐️

17. Security as a Service ⭐️

18. "Capacity Prediction" instead of "Capacity Planning": How Uber Uses ML to Accurately Forecast Resource Utilization ⭐️⭐️⭐️

19. Distributed Tracing, Lessons Learned ⭐️

20. Know Thy Enemy: How to Prioritize and Communicate Risks ⭐️⭐️⭐️

21. Building Shopify's PaaS on Kubernetes ⭐️

22. Automatic Metric Screening for Service Diagnosis ⭐️⭐️⭐️⭐️

23. Approaching the Unacceptable Workload Boundary ⭐️⭐️
