index.xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>datastrophic</title><link>https://datastrophic.io/</link><description>Recent content on datastrophic</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 16 Dec 2021 00:00:00 +0000</lastBuildDate><atom:link href="https://datastrophic.io/index.xml" rel="self" type="application/rss+xml"/><item><title>Secure Kubeflow Ingress and Authentication with Istio External Auth, Dex, and OAuth2 Proxy</title><link>https://datastrophic.io/secure-kubeflow-ingress-and-authentication/</link><pubDate>Thu, 16 Dec 2021 00:00:00 +0000</pubDate><guid>https://datastrophic.io/secure-kubeflow-ingress-and-authentication/</guid><description>Publicly exposed insecure service endpoints on Kubernetes pose a major risk of malicious workloads being deployed on your clusters. We&rsquo;ve seen reports of the Kubernetes Dashboard, the Kubeflow Central Dashboard, and Kubeflow Pipelines all being compromised when publicly exposed to the Internet. Combined with wide RBAC permissions, publicly exposed software with workload scheduling capabilities opens your clusters to malicious deployments by anybody who knows the endpoint URL.
This blog post focuses on building a secure ingress and authentication stack on Kubernetes with Istio, targeting Kubeflow installations.</description></item><item><title>The Ultimate Kubernetes Homelab Guide: From Zero to Production Cluster On-Premises</title><link>https://datastrophic.io/kubernetes-homelab-with-proxmox-kubeadm-calico-openebs-and-metallb/</link><pubDate>Wed, 01 Dec 2021 00:00:00 +0000</pubDate><guid>https://datastrophic.io/kubernetes-homelab-with-proxmox-kubeadm-calico-openebs-and-metallb/</guid><description>Whether you&rsquo;re looking for a more powerful development environment or a production-grade Kubernetes cluster for experiments, this guide provides end-to-end deployment and configuration instructions to get the cluster up and running.
The first part of this guide covers the planning and provisioning of the infrastructure with Proxmox and Terraform. The second part is dedicated to installing Kubernetes and essential software such as Calico for networking, OpenEBS for volume provisioning, and MetalLB for network load balancing.</description></item><item><title>Kubeflow Training Operators and Istio: solving the proxy sidecar lifecycle problem for AI/ML workloads</title><link>https://datastrophic.io/kubeflow-training-operators-and-istio-solving-the-proxy-sidecar-lifecycle-problem-for-aiml-workloads/</link><pubDate>Mon, 04 Oct 2021 00:00:00 +0000</pubDate><guid>https://datastrophic.io/kubeflow-training-operators-and-istio-solving-the-proxy-sidecar-lifecycle-problem-for-aiml-workloads/</guid><description>With Kubeflow gaining traction in the community and its early adoption in enterprises, security and observability concerns are becoming more and more important. Many organizations that run AI/ML workloads operate on sensitive personal or financial data and have stricter requirements for data encryption, traceability, and access control. Quite often, the Istio service mesh is used to solve these problems and to gain other benefits of the rich functionality it provides.</description></item><item><title>Spark JobServer: from Spark Standalone to Mesos, Marathon and Docker</title><link>https://datastrophic.io/spark-jobserver-from-spark-standalone-to-mesos-marathon-and-docker-part-i/</link><pubDate>Thu, 12 Oct 2017 00:00:00 +0000</pubDate><guid>https://datastrophic.io/spark-jobserver-from-spark-standalone-to-mesos-marathon-and-docker-part-i/</guid><description>After several years of running Spark JobServer workloads, the need for better availability and multi-tenancy emerged across several projects the author was involved in. This blog post covers the design decisions made to provide higher availability and fault tolerance of JobServer installations, multi-tenancy for Spark workloads, scalability and failure recovery automation, and the software choices made to reach these goals.
Spark JobServer is widely used across a variety of reporting and aggregation systems.</description></item><item><title>Resource Allocation in Mesos: Dominant Resource Fairness</title><link>https://datastrophic.io/resource-allocation-in-mesos-dominant-resource-fairness-explained/</link><pubDate>Sun, 27 Mar 2016 00:00:00 +0000</pubDate><guid>https://datastrophic.io/resource-allocation-in-mesos-dominant-resource-fairness-explained/</guid><description>Apache Mesos provides a unique approach to cluster resource management called two-level scheduling: instead of storing information about available cluster resources in a centralized manner, it operates with a notion of resource offers that slave nodes advertise to running frameworks via the Mesos master, thus keeping the whole system architecture concise and scalable. The master&rsquo;s allocation module is responsible for deciding which application should receive the next resource offer, and it relies on the Dominant Resource Fairness (DRF) algorithm to make these decisions.</description></item><item><title>Apache Spark: core concepts, architecture and internals</title><link>https://datastrophic.io/core-concepts-architecture-and-internals-of-apache-spark/</link><pubDate>Thu, 03 Mar 2016 00:00:00 +0000</pubDate><guid>https://datastrophic.io/core-concepts-architecture-and-internals-of-apache-spark/</guid><description>This post covers core concepts of Apache Spark such as RDD, DAG, execution workflow, forming stages of tasks, and shuffle implementation, and also describes the architecture and main components of the Spark Driver. There&rsquo;s a github.com/datastrophic/spark-workshop project created alongside this post which contains Spark application examples and a dockerized Hadoop environment to play with. Slides are also available on SlideShare.
Spark is a generalized framework for distributed data processing providing a functional API for manipulating data at scale, in-memory data caching, and reuse across computations.</description></item><item><title>Data processing platforms architectures with SMACK: Spark, Mesos, Akka, Cassandra and Kafka</title><link>https://datastrophic.io/data-processing-platforms-architectures-with-spark-mesos-akka-cassandra-and-kafka/</link><pubDate>Wed, 16 Sep 2015 00:00:00 +0000</pubDate><guid>https://datastrophic.io/data-processing-platforms-architectures-with-spark-mesos-akka-cassandra-and-kafka/</guid><description>This post is a follow-up to the talk given at the Big Data AW meetup in Stockholm and focuses on different use cases and design approaches for building scalable data processing platforms with the SMACK (Spark, Mesos, Akka, Cassandra, Kafka) stack. While the stack is really concise and consists of only a few components, it is possible to implement different system designs covering not only purely batch or stream processing, but more complex Lambda and Kappa architectures as well.</description></item><item><title>Cassandra 2.1 Counters: Testing Consistency During Node Failures</title><link>https://datastrophic.io/evaluating-cassandra-2-1-counters-consistency/</link><pubDate>Thu, 03 Sep 2015 00:00:00 +0000</pubDate><guid>https://datastrophic.io/evaluating-cassandra-2-1-counters-consistency/</guid><description>For some use cases, such as the ones present in AdServing, counters come in really handy for accumulating totals for events coming into a system, compared to batch aggregates. While distributed counter consistency is a well-known problem, Cassandra counters in version 2.1 are claimed to be more accurate than the prior ones. This post describes the approach and the results of Cassandra counters consistency testing in different failure scenarios such as rolling restarts, abnormal termination of nodes, and network splits.</description></item><item><title>In the Wake of Scala Days 2015</title><link>https://datastrophic.io/in-the-wake-of-scala-days-2015/</link><pubDate>Wed, 01 Jul 2015 00:00:00 +0000</pubDate><guid>https://datastrophic.io/in-the-wake-of-scala-days-2015/</guid><description>The Scala Days Amsterdam conference was full of interesting topics, so in this post I&rsquo;ll cover talks on the Scala platform, core concepts for making Scala code more idiomatic, monad transformers, consistency in distributed systems, distributed domain-driven design, and a little more.
This post came out of a post-conference presentation to my team, so the slides are also available here and contain all the links to related materials and presentations for you to discover more on your own.</description></item><item><title/><link>https://datastrophic.io/top/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://datastrophic.io/top/about/</guid><description>Hi! It&rsquo;s Anton here, and I&rsquo;m the author of datastrophic.io. I&rsquo;m a technical leader and a software engineer specializing in distributed systems, data platforms, and AI infrastructure. My tenure in the industry has passed the 15-year mark, during which I have worked on high-load, distributed, big data, workload, and container orchestration systems. If you&rsquo;d like to connect or learn more about my background, the best way to do it is via LinkedIn.</description></item><item><title>Posts Archive</title><link>https://datastrophic.io/top/archive/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://datastrophic.io/top/archive/</guid><description/></item></channel></rss>