Netflix Technology Blog. Data Gateway — A Platform for Growing and Protecting the Data Tier. Shahar Zimmerman, Vidhya Arvind, Joey Lynch, Vinay Chella. Apr 23, 2024.
In TDS Archive, by Ciro Greco. Write-Audit-Publish for Data Lakes in Pure Python (no JVM). An open source implementation of WAP using Apache Iceberg, Lambdas, and Project Nessie, all running entirely in Python. Apr 12, 2024.
Ong Xuan Hong. DataOps 03: Trino + DBT + Spark — Everything Everywhere All at Once. "I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it." — Bill Gates. Feb 22, 2023.
Albert Wong. Open Source Alternatives to DataBricks SQL Warehouse. Databricks SQL Warehouse is a managed service within the Databricks platform that provides scalable SQL compute resources decoupled from… Jan 5, 2024.
In StarRocks Engineering, by StarRocks Engineering. Trip.com Chooses StarRocks over Trino to Query Data in Apache Hive (10x performance). Dec 9, 2023.
Jeremy Surget. What I learned after one year of building a Data Platform from scratch. My key learnings on building a Data platform, from the tech side to the business side. Nov 14, 2023.
In Dev Genius, by Petrica Leuca. Data processing with Spark: time traveling. The utilities which enable us to accomplish ACID in Spark, do not come empty handed. They have many features and in this article I will… Sep 28, 2022.
In InterviewNoodle, by Mahesh Saini. DoorDash’s Write-Heavy Scalable Inventory Platform — System Design. As DoorDash made the move from made-to-order restaurant delivery into the Convenience and Grocery (CnG) business, they had to find a way… Oct 4, 2023.
In BlaBlaCar, by Souhaib Guitouni. 11 lessons learned managing a Data Platform team within a data mesh. Oct 10, 2023.
Sairam Krish. Visualize parquet files with Apache Superset using Trino or PrestoSQL. Many times, I like to visualize contents in formats like parquet, csv, json etc in Apache Superset. This article tries to provide a demo… Dec 30, 2021.
Mariusz Kujawski. Data Lakehouse vs Data Warehouse vs Data Lake-Comparison of data platforms. For decades, data warehouses have been the dominant architectural approach for building data platforms in enterprises. However, with the… Jul 24, 2023.
In TDS Archive, by 💡Mike Shakhomirov. How to Become a Data Engineer. A shortcut for beginners in 2024. Oct 7, 2023.
In TDS Archive, by Damian Gil. Mastering Customer Segmentation with LLM. Unlock advanced customer segmentation techniques using LLMs, and improve your clustering models with advanced techniques. Sep 26, 2023.
In ProfitOptics, by Jean-Georges Perrin. Data Contract 101. A quick and not-so-dirty introduction to data contracts. Sep 10, 2023.
In AWS in Plain English, by Apache SeaTunnel. Meet Apache SeaTunnel, a new Apache Top-Level Project! Jun 1, 2023.
In TDS Archive, by Mahdi Karabiben. Writing design docs for data pipelines. Exploring the what, why, and how of design docs for data components — and why they matter. May 22, 2023.
Vasileios Anagnostopoulos. Manipulating Delta Lake tables on MinIO with Trino. An educational Delta Lakehouse. Mar 2, 2023.
In ITNEXT, by Gary A. Stafford. Building Data Lakes on AWS with Kafka Connect, Debezium, Apicurio Registry, and Apache Hudi. Learn how to build a near real-time transactional data lake on AWS using a combination of Open Source Software (OSS) and AWS Services. Feb 28, 2023.
In Netflix TechBlog, by Netflix Technology Blog. Auto-Diagnosis and Remediation in Netflix Data Platform. By Vikram Srivastava and Marcelo Mayworm. Jan 14, 2022.