<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Coder Cafe: The Coding Corner]]></title><description><![CDATA[A corner of The Coder Cafe where we build real-world systems together.]]></description><link>https://read.thecoder.cafe/s/coding-corner</link><image><url>https://substackcdn.com/image/fetch/$s_!OZXv!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa08792c7-cadb-478c-bce4-a10c5dc5ac05_1280x1280.png</url><title>The Coder Cafe: The Coding Corner</title><link>https://read.thecoder.cafe/s/coding-corner</link></image><generator>Substack</generator><lastBuildDate>Sat, 18 Apr 2026 21:24:56 GMT</lastBuildDate><atom:link href="https://read.thecoder.cafe/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Teiva Harsanyi]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thecodercafe@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thecodercafe@substack.com]]></itunes:email><itunes:name><![CDATA[Teiva Harsanyi]]></itunes:name></itunes:owner><itunes:author><![CDATA[Teiva Harsanyi]]></itunes:author><googleplay:owner><![CDATA[thecodercafe@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thecodercafe@substack.com]]></googleplay:email><googleplay:author><![CDATA[Teiva Harsanyi]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 8]]></title><description><![CDATA[Concurrency]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-8</link><guid 
isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-8</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 11 Mar 2026 11:02:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghr3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;Monster Scale Summit.&quot;,&quot;title&quot;:&quot;Monster Scale Summit.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monster Scale Summit." title="Monster Scale Summit." 
srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></strong></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ghr3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ghr3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ghr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2297071,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ghr3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Ghr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed12df79-76ee-4217-af6b-eb4a13f41498_1600x800.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Over this series, you built a working LSM tree: you flush to persist the memtable to disk and compact to reclaim space.</p><p>Yet, you&#8217;ve been single-threaded so far. This week, we lift that constraint: flush and compaction will run in the background while you keep serving requests.</p><p>There are many ways to add concurrency. The approach here is to introduce a versioned, ref-counted catalog that lets readers take a stable snapshot while background flush/compaction proceeds.</p><p>A catalog holds references to:</p><ul><li><p>The current memtable.</p></li><li><p>The current WAL.</p></li><li><p>The current MANIFEST.</p></li></ul><p>Each request pins one catalog version for the duration of the operation.</p><p>When a flush or compaction completes, the system creates a new catalog version. 
Old resources (e.g., obsolete SSTables) are not deleted immediately. Instead, each catalog tracks a refcount of in-flight requests. Once an old catalog&#8217;s refcount drops to zero and a newer catalog exists, you can safely garbage-collect the resources that appear in the old version but not in the new one.</p><p>For example, with two catalog versions (red = older version&#8217;s elements, blue = newer version&#8217;s elements):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qdlg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qdlg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 424w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 848w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!qdlg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png" width="550" height="528.8867562380038" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:1042,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:63459,&quot;alt&quot;:&quot;Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column contains stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. In Catalog v1, some boxes (MANIFEST, SSTable 2, SSTable 3) are shaded red to mark older elements, while in Catalog v2 the MANIFEST and SSTable 4 are shaded blue to indicate newer ones.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column contains stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. In Catalog v1, some boxes (MANIFEST, SSTable 2, SSTable 3) are shaded red to mark older elements, while in Catalog v2 the MANIFEST and SSTable 4 are shaded blue to indicate newer ones." 
title="Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column contains stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. In Catalog v1, some boxes (MANIFEST, SSTable 2, SSTable 3) are shaded red to mark older elements, while in Catalog v2 the MANIFEST and SSTable 4 are shaded blue to indicate newer ones." srcset="https://substackcdn.com/image/fetch/$s_!qdlg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 424w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 848w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!qdlg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe51f1a4-8d59-4b32-a5d9-7c96c58e7c4c_1042x1002.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once we can guarantee that catalog v1 is no longer referenced, we can delete the old MANIFEST, SST-2 and SST-3.</p><p>Another example: a flush produced a new memtable and WAL file:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mjZi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mjZi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 424w, https://substackcdn.com/image/fetch/$s_!mjZi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 848w, 
https://substackcdn.com/image/fetch/$s_!mjZi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!mjZi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mjZi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png" width="550" height="528.8867562380038" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:1042,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:61772,&quot;alt&quot;:&quot;Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column lists stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. 
In Catalog v1, the Memtable and WAL are shaded red, while in Catalog v2, those same elements are shaded blue to indicate the new versions created after a flush.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column lists stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. In Catalog v1, the Memtable and WAL are shaded red, while in Catalog v2, those same elements are shaded blue to indicate the new versions created after a flush." title="Diagram showing two side-by-side columns labeled &#8220;Catalog v1&#8221; and &#8220;Catalog v2.&#8221; Each column lists stacked boxes for &#8220;Memtable,&#8221; &#8220;WAL,&#8221; &#8220;MANIFEST,&#8221; and several &#8220;SSTable&#8221; entries. In Catalog v1, the Memtable and WAL are shaded red, while in Catalog v2, those same elements are shaded blue to indicate the new versions created after a flush." 
srcset="https://substackcdn.com/image/fetch/$s_!mjZi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 424w, https://substackcdn.com/image/fetch/$s_!mjZi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 848w, https://substackcdn.com/image/fetch/$s_!mjZi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!mjZi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa51ae17f-7a55-40ea-96e3-6fa5bda5602f_1042x1002.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In this case, once catalog v1 has no remaining references, we can free the old memtable and delete the old WAL file.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Catalog</h2><ul><li><p>Add a <code>Catalog</code> data structure that tracks:</p><ul><li><p>Memtable reference.</p></li><li><p>WAL path.</p></li><li><p>MANIFEST path.</p></li><li><p>Version (monotonic).</p></li><li><p>Refcount of active readers.</p></li></ul></li><li><p>Implement a <code>Catalog</code> manager that keeps catalog versions in memory:</p><ul><li><p><code>Acquire() &#8594; Catalog</code>:</p><ul><li><p>Pick the latest catalog.</p></li><li><p>Increment its refcount.</p></li><li><p>Return it.</p></li></ul></li><li><p><code>Release(Catalog)</code>:</p><ul><li><p>Decrement the refcount of the catalog.</p></li><li><p>If the refcount is zero and a newer catalog version exists:</p><ul><li><p>Remove the released catalog.</p></li><li><p>Delete elements present in the released catalog but not in the latest version (files, WAL, etc.).</p></li></ul></li></ul></li><li><p><code>NewCatalog(memtable, walPath, 
manifestPath)</code>:</p><ul><li><p>Create a new catalog based on the provided data.</p></li><li><p>Assign a unique, monotonic version.</p></li></ul></li></ul></li><li><p>At startup:</p><ul><li><p>Read from the authoritative MANIFEST (latest MANIFEST file).</p></li><li><p>Treat any files not listed in MANIFEST as orphans and delete them.</p></li><li><p>Read all WAL files you still have on disk, in order, to rebuild the in-memory state.</p></li><li><p>Create the current catalog version from the reconstructed state.</p></li><li><p>Start the background worker.</p></li></ul></li></ul><h2>Background Workers</h2><p>In a nutshell, flush and compaction will move to the background. You&#8217;ll use internal queues plus worker pools to ensure no overlapping work on the same resources: at most one flush running at a time, and at most one compaction running at a time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xJrM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xJrM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 424w, https://substackcdn.com/image/fetch/$s_!xJrM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 848w, https://substackcdn.com/image/fetch/$s_!xJrM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xJrM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xJrM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png" width="500" height="388.5964912280702" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1140,&quot;resizeWidth&quot;:500,&quot;bytes&quot;:210788,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xJrM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 424w, https://substackcdn.com/image/fetch/$s_!xJrM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 848w, 
https://substackcdn.com/image/fetch/$s_!xJrM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 1272w, https://substackcdn.com/image/fetch/$s_!xJrM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9928275d-93f5-4632-91bf-aeb059efcb60_1140x886.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Compaction:</p><ul><li><p>Keep the same trigger: Every 10,000 update requests.</p></li><li><p>Behavior:</p><ul><li><p>Do not run 
compaction in the request path.</p></li><li><p>On compaction trigger: Post a notification to an internal queue and return.</p></li></ul></li><li><p>Worker:</p><ul><li><p>A single background thread listens on the queue and runs the actual work.</p></li><li><p>Same compaction process as before, except:</p><ul><li><p>Do not overwrite the existing WAL file. Instead, create a new <code>WAL-&lt;version+1&gt;</code> file.</p></li><li><p>Create a new catalog that references the new WAL.</p></li></ul></li></ul></li></ul><p>Flush:</p><ul><li><p>Keep the same trigger: When the memtable contains 2,000 entries.</p></li><li><p>Behavior:</p><ul><li><p>Do not run flush in the request path.</p></li><li><p>On flush trigger:</p><ul><li><p>Allocate a new memtable and create a new WAL file for subsequent writes.</p></li><li><p>Post a notification to an internal queue.</p></li><li><p>Return immediately to the caller.</p></li></ul></li></ul></li><li><p>Worker:</p><ul><li><p>A single background thread listens on the queue and runs the actual work.</p></li><li><p>Same flush process as before, except:</p><ul><li><p>Do not overwrite the existing MANIFEST file. Instead, create a new <code>MANIFEST-&lt;version+1&gt;</code> file.</p></li><li><p>Create a new catalog that references the new MANIFEST.</p></li></ul></li></ul></li></ul><h2>GET/PUT/DELETE</h2><ul><li><p>Acquire a catalog from the manager.</p></li><li><p>Perform the operation using the paths and references from that catalog.</p></li><li><p>Release the catalog.</p></li></ul><h2>Client &amp; Validation</h2><p>Concurrent requests make deterministic assertions harder. 
For example, suppose the validation file contains the following requests that can run in parallel:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KMgu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KMgu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 424w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 848w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 1272w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KMgu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png" width="300" height="263.6913767019667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1162,&quot;width&quot;:1322,&quot;resizeWidth&quot;:300,&quot;bytes&quot;:50359,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KMgu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 424w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 848w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 1272w, https://substackcdn.com/image/fetch/$s_!KMgu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa3bcb0-62f0-4e4e-bcb6-52ec79e2df48_1322x1162.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What should you assert for <code>GET foo</code>: <code>1</code>, <code>2</code>, or <code>3</code>?</p><p>To make validation deterministic, you will handle barriers: all requests before a barrier must finish before starting the next block. 
You will also relax <code>GET</code> checks: a <code>GET</code> is valid if it returns any value written for a key before the last barrier.</p><p>A similar example with barriers:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lLYs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lLYs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 424w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 848w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lLYs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png" width="300" height="513.6363636363636" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1130,&quot;width&quot;:660,&quot;resizeWidth&quot;:300,&quot;bytes&quot;:197240,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lLYs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 424w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 848w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!lLYs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F161c49de-f751-4f5a-9969-eb71a5dc0d4a_660x1130.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>The first two <code>PUT</code> requests run in parallel.</p></li><li><p>The first barrier waits for both to complete.</p></li><li><p>The first <code>GET</code> request should accept either <code>1</code> or <code>2</code>.</p></li><li><p>The second <code>GET</code> request should accept only <code>3</code>.</p></li></ul><h3>File Format</h3><p>The new validation file is a sequence of blocks separated by <code>BARRIER</code> instructions:</p><pre><code>(PUT|DELETE) &lt;key&gt; &lt;value&gt;
(PUT|DELETE) &lt;key&gt; &lt;value&gt;
...
BARRIER # New instruction
GET &lt;key&gt; [&lt;value1&gt; &lt;value2&gt; ... &lt;valueK&gt;]
GET &lt;key&gt; [&lt;value1&gt; &lt;value2&gt; ... &lt;valueK&gt;]
...
BARRIER</code></pre><ul><li><p>All the lines between two barriers form a block.</p></li><li><p>On a <code>BARRIER</code> instruction, wait for all in-flight requests in the current block to finish before starting the next block.</p></li><li><p><code>PUT</code>/<code>DELETE</code> lines are issued in parallel within their block.</p></li><li><p><code>GET</code> lines are also issued in parallel within their block.</p></li><li><p><code>GET &lt;key&gt; [&lt;value1&gt; &lt;value2&gt; ... &lt;valueK&gt;]</code> means the response must be one of the listed values.</p></li></ul><p>Download and run your client against a new file: <a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/concurrency.txt">concurrency.txt</a>.</p><h2>[Optional] Flush Anticipation</h2><p>When the memtable reaches 80% of capacity:</p><ul><li><p>Pre-allocate the next memtable in memory.</p></li><li><p>Pre-create/rotate to the next WAL on disk.</p></li></ul><h1>Wrap Up</h1><p>That's it for the whole series.</p><p>You implemented a fully functional LSM tree:</p><ul><li><p>Started with a memtable (hash table) and a flush that writes immutable SSTables to disk.</p></li><li><p>Added a WAL for durability.</p></li><li><p>Handled deletes and compaction to reclaim space.</p></li><li><p>Introduced leveling and key-range partitioning to speed up reads.</p></li><li><p>Switched to block-based SSTables with indexing.</p></li><li><p>Added Bloom filters and replaced the memtable with a radix trie for faster lookups.</p></li><li><p>Finally, introduced concurrency: a simple, single-threaded foreground path with flush and compaction running in the background.</p></li></ul><p>I hope you had fun building it. 
Thank you for following the series, and special thanks to our partner, <a href="https://www.scylladb.com/">ScyllaDB</a>!</p><h1>Further Notes</h1><p>To learn more about how this works in production databases, you can read how <a href="https://github.com/facebook/rocksdb/wiki/How-we-keep-track-of-live-SST-files">RocksDB keeps track of live SST files</a>. The <code>Catalog</code> structure is inspired by RocksDB&#8217;s <code>VersionSet</code>.</p><p>Conflict resolution is one aspect we&#8217;re missing in the series (maybe as a follow-up?). A versioned catalog is enough for reads, but what about conflicting writes?</p><p>Suppose two clients, Alice and Bob, update the same key around the same time. A simple policy to resolve conflicts is &#8220;latest wins&#8221;. The database can serialize operations for the same key to ensure the latest request wins:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XJcG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XJcG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 424w, https://substackcdn.com/image/fetch/$s_!XJcG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 848w, https://substackcdn.com/image/fetch/$s_!XJcG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XJcG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XJcG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png" width="350" height="481.25" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:640,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:188643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XJcG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 424w, https://substackcdn.com/image/fetch/$s_!XJcG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 848w, https://substackcdn.com/image/fetch/$s_!XJcG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XJcG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e3cb329-c938-4bc6-a672-c06f6d6c143d_640x880.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In this example, the database ends up with <code>foo=baz</code> as the latest state.</p><p>This approach works with one node. But what about databases composed of multiple nodes? 
Say the two requests go to two different nodes at roughly the same time:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A_zd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A_zd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 424w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 848w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A_zd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png" width="522" height="301.5123626373626" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:522,&quot;bytes&quot;:70795,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A_zd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 424w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 848w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!A_zd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca8181-7b76-4c35-bc23-9afcbdd6de14_2082x1202.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With multiple nodes, the database must resolve conflicts consistently. There are two common ways:</p><ol><li><p>Coordination via a leader (consensus): Route both writes to the same leader node, which resolves the conflict and determines the end state.</p></li><li><p>Reconcile with comparable timestamps: Attach a timestamp to each write and store it with the key. By timestamp, we don&#8217;t mean wall-clock time but a logical clock, so that &#8220;later&#8221; is well-defined across nodes.</p></li></ol><p>If we go with the second approach and start storing <code>(key, value, timestamp)</code> data, we also unlock something production systems use: consistent snapshots. 
A read can include a timestamp, and the database returns the latest version at or before that time, providing a consistent view of the data even while flush and compaction run in the background.</p><p>This pattern has a name: Multi-Version Concurrency Control (MVCC). It involves keeping multiple versions per key instead of only the last one, reading as of a chosen point in time, and deleting old versions once they are no longer needed.</p><p>See how ScyllaDB handles <a href="https://github.com/scylladb/scylladb/blob/master/docs/dev/timestamp-conflict-resolution.md">timestamp conflict resolution</a> for more information.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://read.thecoder.cafe/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI is getting better every day. Are you? At The Coder Cafe, we serve fundamental concepts to make you an engineer that AI can&#8217;t replace. 
Written by a Google SWE, trusted by thousands of engineers worldwide.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 7]]></title><description><![CDATA[Block-Based SSTables and Indexing]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-7</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-7</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Thu, 26 Feb 2026 15:47:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1585479e-d6ed-4950-8029-ad87c0e15827_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). 
It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;Monster Scale Summit.&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monster Scale Summit." title="Monster Scale Summit." 
srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5k1K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5k1K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5k1K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2303878,&quot;alt&quot;:&quot;Week 7 Bloom Filters and Trie Memtable&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Week 7 Bloom Filters and Trie Memtable" title="Week 7 Bloom Filters and Trie Memtable" srcset="https://substackcdn.com/image/fetch/$s_!5k1K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!5k1K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7aa8b83e-3673-497d-b6be-4c53075b3c7b_1600x800.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Over the last few weeks, you refined your LSM tree to introduce leveling. In case of a key miss, the <code>GET</code> process requires the following steps:</p><ul><li><p>Lookup from the memtable.</p></li><li><p>Lookup from all the L0 SSTables.</p></li><li><p>Lookup from one L1 SSTable.</p></li><li><p>Lookup from one L2 SSTable.</p></li><li><p>Etc.</p></li></ul><p>Last week, you optimized the lookups by introducing block-based SSTables and indexing, but a lookup is still not a &#8220;free&#8221; operation. 
Worst case, it requires fetching two pages (one for the index block and one for the data block) to find out that a key is missing in an SSTable.</p><p>This week, you will optimize searches by introducing a &#8220;tiny&#8221; level of caching per SSTable.</p><p>If you&#8217;re an avid reader of <em>The Coder Cafe<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></em>, we already discussed a great candidate for such a cache:</p><ul><li><p>One that doesn&#8217;t consume too much memory to make sure we don&#8217;t increase space amplification drastically.</p></li><li><p>One that is fast enough so that a lookup doesn&#8217;t introduce too much overhead, especially if we have to check a cache before making any lookup in an SSTable.</p></li></ul><p>You will implement a cache using <a href="https://read.thecoder.cafe/p/bloom-filters">Bloom filters</a>: a space-efficient, probabilistic data structure to check for set membership. A Bloom filter can return two possible answers:</p><ul><li><p>The element is definitely not in the set (no false negatives).</p></li><li><p>The element may be in the set (false positives are possible).</p></li></ul><p>In addition to optimizing SSTable lookups, you will also optimize your memtable.</p><p>In week 2, you implemented a memtable using a hashtable. Let&#8217;s get some perspective to understand the problems of using a hashtable:</p><ul><li><p>A memtable buffers writes.</p></li><li><p>As it&#8217;s the main entry point for writes, a write has to be fast. 
&#8594; OK: a hashtable has average <code>O(1)</code> inserts, plus <code>O(k)</code> (<code>k</code>: the length of the key) for hashing.</p></li><li><p>For reads, doing a key lookup has to be fast &#8594; OK: average <code>O(1)</code> lookups, plus <code>O(k)</code> to hash.</p></li><li><p>Range scans (week 5, optional work), such as &#8220;<em>Give me the list of keys between bar and foo</em>&#8221; &#8594; Poor: because a hashtable isn&#8217;t an ordered data structure, you end up touching every entry, so <code>O(n)</code> with <code>n</code> the number of elements in the hashtable.</p></li><li><p>Flush to L0 &#8594; Poor: a hashtable isn&#8217;t ordered, so it requires sorting all the keys (<code>O(n log n)</code> with <code>n</code> the number of elements) to produce the SSTables.</p></li></ul><p>Given these drawbacks, could we find a better data structure? Yes! This week, you will switch the memtable to a radix trie (see Further Notes for a discussion on alternative data structures).</p><p>A trie is a tree-shaped data structure usually used to store strings efficiently. The common example to illustrate a trie is to store a dictionary. For example, suppose you want to store these two words:</p><ul><li><p><code>bake</code></p></li><li><p><code>baker</code></p></li></ul><p>Even though <code>baker</code> starts with the same four letters as <code>bake</code>, storing the two words separately takes a total of 4 + 5 = 9 letters.</p><p>Tries optimize the storage required by sharing prefixes. Each node stores one letter. 
Here&#8217;s an example of a trie storing these two words in addition to the word foo (<code>*</code> nodes represent the end of a word):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AaUV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AaUV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 424w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 848w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AaUV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png" width="268" height="597.8461538461538" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1160,&quot;width&quot;:520,&quot;resizeWidth&quot;:268,&quot;bytes&quot;:99972,&quot;alt&quot;:&quot;Diagram of a trie structure with a root node branching into two paths. One path spells &#8220;b &#8594; a &#8594; k &#8594; e,&#8221; with branches leading to &#8220;*&#8221; and &#8220;r.&#8221; The other path spells &#8220;f &#8594; o &#8594; o &#8594; *.&#8221;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174613526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram of a trie structure with a root node branching into two paths. One path spells &#8220;b &#8594; a &#8594; k &#8594; e,&#8221; with branches leading to &#8220;*&#8221; and &#8220;r.&#8221; The other path spells &#8220;f &#8594; o &#8594; o &#8594; *.&#8221;" title="Diagram of a trie structure with a root node branching into two paths. 
One path spells &#8220;b &#8594; a &#8594; k &#8594; e,&#8221; with branches leading to &#8220;*&#8221; and &#8220;r.&#8221; The other path spells &#8220;f &#8594; o &#8594; o &#8594; *.&#8221;" srcset="https://substackcdn.com/image/fetch/$s_!AaUV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 424w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 848w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!AaUV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4141c7eb-6654-466f-af28-943ef8878e92_520x1160.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As you can see, we didn&#8217;t duplicate the first four letters of <code>bake</code> to store <code>baker</code>. In this very example, instead of storing 9 letters for <code>bake</code> and <code>baker</code>, we stored only five letters.</p><p>Yet, you&#8217;re not going to implement a &#8220;basic&#8221; trie for your memtable; instead, you will implement a compressed trie called a radix trie (also known as a patricia<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> trie).</p><p>Back to the previous example, storing one node (one square) has an overhead. It usually means at least one extra field to store the next element, usually a pointer.</p><p>In the previous example, we needed 11 nodes in total, but what if we could compress the number of nodes required? 
The idea is to combine nodes with a single child:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9-It!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9-It!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 424w, https://substackcdn.com/image/fetch/$s_!9-It!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 848w, https://substackcdn.com/image/fetch/$s_!9-It!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 1272w, https://substackcdn.com/image/fetch/$s_!9-It!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9-It!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png" width="278" height="299.38461538461536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:520,&quot;resizeWidth&quot;:278,&quot;bytes&quot;:87661,&quot;alt&quot;:&quot;Diagram of a compressed trie with a root node branching into two paths. One branch leads to &#8220;bake&#8221; with child nodes &#8220;*&#8221; and &#8220;r,&#8221; and the other branch leads to &#8220;foo&#8221; with a child node &#8220;*.&#8221;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174613526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram of a compressed trie with a root node branching into two paths. One branch leads to &#8220;bake&#8221; with child nodes &#8220;*&#8221; and &#8220;r,&#8221; and the other branch leads to &#8220;foo&#8221; with a child node &#8220;*.&#8221;" title="Diagram of a compressed trie with a root node branching into two paths. 
One branch leads to &#8220;bake&#8221; with child nodes &#8220;*&#8221; and &#8220;r,&#8221; and the other branch leads to &#8220;foo&#8221; with a child node &#8220;*.&#8221;" srcset="https://substackcdn.com/image/fetch/$s_!9-It!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 424w, https://substackcdn.com/image/fetch/$s_!9-It!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 848w, https://substackcdn.com/image/fetch/$s_!9-It!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 1272w, https://substackcdn.com/image/fetch/$s_!9-It!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21b2b981-3224-408a-93fe-b7f8c577cd96_520x560.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This new trie stores the exact same information, except it requires 6 nodes instead of 11. That&#8217;s what radix tries are about.</p><p>To summarize the benefits of switching a memtable from a hashtable to a radix trie:</p><ul><li><p>Ordered by design: Tries keep keys in order and make prefix/range lookups natural, which helps for <code>SCAN</code> and for streaming a sorted flush.</p></li><li><p>No rebalancing/rehashing pauses: The shape doesn&#8217;t depend on insertion order, and operations don&#8217;t need rebalancing; you avoid periodic rehash work.</p></li><li><p>Prefix compression: A radix trie can cut duplicated key bytes in the memtable, reducing in-memory space.</p></li></ul><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Bloom Filter</h2><p>Let&#8217;s size the Bloom filter. 
You will target:</p><ul><li><p><code>p</code> (false-positive rate) = 1%</p></li><li><p><code>n</code> (max elements per SSTable) = 1,953</p></li><li><p><code>k</code> (hash functions) = 5</p></li></ul><p>Using the formula from the <a href="https://www.thecoder.cafe/p/bloom-filters">Bloom Filters</a> post:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;m = -\\frac{n k}{\\ln(1 - p^{\\frac{1}{k}})}&quot;,&quot;id&quot;:&quot;WGVAGHLAGR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Plugging in the values gives <code>p^(1/k)</code> = 0.01^(1/5) &#8776; 0.3981 and <code>ln(1 &#8722; 0.3981)</code> &#8776; &#8722;0.5077, so <code>m</code> &#8776; (1,953 &#215; 5) / 0.5077 &#8776; 19,230 bits, i.e., 2,404 B. We will round up to 2,496 B (39 &#215; 64 B), so the bitset is a whole number of 64 B cache lines.</p><blockquote><p><strong>NOTE</strong>: Using <code>k</code>=7 would shave only ~2&#8211;3% space for ~40% more hash work, so <code>k</code>=5 is a good trade-off.</p></blockquote><p>To distribute elements across the bitvector, you will use xxHash64 with two different constant seeds to get two base hashes, then derive <code>k</code> indices by double hashing (pseudo-code):</p><pre><code>const m: uint64 = 19230
const k: int    = 5

const seed1: uint64 = 0x9E3779B185EBCA87
const seed2: uint64 = 0xC2B2AE3D27D4EB4F
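
// Illustrative sketch only: add() and mayContain() are hypothetical helpers,
// not part of the original pseudo-code. They show one way the indices
// returned by bits() (defined below) could be used, assuming the filter is
// stored as a byte array of 2,496 B and that "/" is integer division.
function add(filter: byte[], key: string) {
    for pos in bits(key):
        filter[pos / 8] = filter[pos / 8] | (1 &lt;&lt; (pos % 8))
}

function mayContain(filter: byte[], key: string): bool {
    for pos in bits(key):
        if (filter[pos / 8] &amp; (1 &lt;&lt; (pos % 8))) == 0:
            return false // Definitely absent: skip this SSTable.
    return true // Possibly present: proceed with the SSTable lookup.
}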

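// Returns the k bit positions of key in the Bloom filter.
function bits(key: string): int[] {
    // Two base hashes of the same key, one per seed.
    h1: uint64 = xxhash64(key, seed1)
    h2: uint64 = xxhash64(key, seed2)

    // Initialize an integer array of k elements.
    idx: int[] = new int[k]
    // Double hashing: the i-th index is (h1 + i*h2) mod m
    // (assuming uint64 arithmetic wraps around on overflow).
    for i from 0 to k-1:
        idx[i] = (h1 + i*h2) % m
    return idx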
}</code></pre><p>The required changes to introduce Bloom filters:</p><ul><li><p>Startup:</p><ul><li><p>For each SSTable in the MANIFEST, load its Bloom filter and cache it in memory.</p><p>Since each Bloom filter requires only a small amount of space, this optimization has a minimal memory footprint. For example, caching 1,000 Bloom filters of the type you designed requires less than 2.5 MB of memory (1,000 &#215; 2,496 B).</p></li></ul></li><li><p>SSTable creation:</p><ul><li><p>For each new SSTable you write, initialize a zeroed bitvector of 2,496 B.</p></li><li><p>Build the Bloom filter in memory as you emit the keys (including tombstones):</p><ul><li><p>Compute <code>idx</code> based on the key.</p></li><li><p>For each <code>i</code>, set the bit at position <code>idx[i]</code>.</p></li></ul></li><li><p>When the SSTable is done, persist the Bloom filter as a sidecar file next to it (e.g., <code>/l1/sst-3.sst</code> and <code>/l1/sst-3.bloom</code>) and <code>fsync</code> the file.</p></li><li><p>Update the cache containing the Bloom filters.</p></li></ul></li><li><p>Compaction:</p><ul><li><p>Remove from the in-memory cache the Bloom filters of the SSTables that compaction deleted.</p></li></ul></li><li><p>Lookup:</p><ul><li><p>Before reading an SSTable:</p><ul><li><p>Compute <code>idx</code> based on the key.</p><ul><li><p>If all the bits at the positions in <code>idx</code> are set: The key may be present, so proceed with your normal lookup in the SSTable.</p></li><li><p>Otherwise: Skip this SSTable.</p></li></ul></li></ul></li></ul></li></ul><h2>Trie Memtable</h2><p>Now, let&#8217;s replace your hashtable with a trie.</p><h3>Node Shape</h3><ul><li><p><code>label</code>: Compressed edge fragment.</p></li><li><p><code>children</code>: A map keyed by the next character after <code>label</code> to a node.</p></li><li><p><code>state</code>: An enum with the different possible values:</p><ul><li><p><code>Empty</code>: The node is just a prefix, no full key ends here.</p></li><li><p><code>HasValue</code>: A full key exists at this 
node.</p></li><li><p><code>Tombstone</code>: This key was explicitly deleted.</p></li></ul></li><li><p><code>value</code>: If <code>state</code> is <code>HasValue</code>, the corresponding value.</p></li></ul><p>Root is a sentinel node with an empty <code>label</code>.</p><h3>Trie Operations</h3><ul><li><p><code>update(key, val)</code>:</p><ul><li><p>Walk from the root, matching the longest common prefix against <code>label</code>.</p></li><li><p>If the match ends in the middle of an edge, split that edge once: create a parent with the common part and two children, the old suffix and the new suffix (if the key ends exactly at the split point, the parent itself is the terminal node and only the old suffix becomes a child).</p></li><li><p>Descend via the next child (next unmatched character).</p></li><li><p>At the terminal node: set <code>value = val</code> and <code>state = HasValue</code>.</p></li></ul></li><li><p><code>get(key)</code>:</p><ul><li><p>Walk edges by longest-prefix match. If an edge doesn&#8217;t match, return not found.</p></li><li><p>At the terminal node:</p><ul><li><p>If <code>state == HasValue</code>, return <code>value</code>.</p></li><li><p>If <code>state == Empty</code> or <code>state == Tombstone</code>, return not found.</p></li></ul></li></ul></li><li><p><code>delete(key)</code>:</p><ul><li><p>Walk as in <code>get</code>. If the path doesn&#8217;t fully exist, create the missing suffix nodes with <code>state = Empty</code> so that a terminal node exists.</p></li><li><p>At the terminal node: set <code>state = Tombstone</code> (you may have to clear <code>value</code>).</p></li></ul></li></ul><p>Flush process:</p><ul><li><p>In-order traversal (visit a node, then its children in ascending order of their first character):</p><ul><li><p><code>HasValue</code>: Emit <code>(key, value)</code>.</p></li><li><p><code>Tombstone</code>: Emit tombstone.</p></li><li><p><code>Empty</code>: Emit nothing.</p></li></ul></li></ul><h2>Client &amp; Validation</h2><p>There are no changes to the client. 
Run it against the same file (<a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put-delete.txt">put-delete.txt</a>) to validate that your changes are correct.</p><h2>[Optional] Variable Bloom Filter Seed</h2><p>Use per-SSTable random seeds for the Bloom hash functions. Persist them in the Bloom filter files so that lookups can recompute the same indices.</p><h2>[Optional] Blocked Bloom Filters</h2><p>In <a href="https://www.thecoder.cafe/p/bloom-filters">Bloom Filters</a>, we introduced blocked Bloom filters, a variant that optimizes spatial locality by:</p><ul><li><p>Dividing the Bloom filter into contiguous blocks, each the size of a cache line.</p></li><li><p>Restricting each query to a single block to ensure all bit lookups stay within the same cache line.</p></li></ul><p>Switch to blocked Bloom filters and measure the impact on latency and throughput.</p><h2>[Optional] SCAN</h2><p>If you implemented the <code>SCAN</code> operation from week 5 (optional work), wire it to your memtable radix trie.</p><h1>Wrap Up</h1><p>That&#8217;s it for this week! You optimized lookups with per-SSTable Bloom filters and switched the memtable to a radix trie, an ordered data structure.</p><p>Since the beginning of the series, everything you built has been single-threaded, and flush/compaction remains stop-the-world. In two weeks, you will finally tackle the final boss of LSM trees: concurrency.</p><h1>Further Notes</h1><p>If you want to dive deeper into tries, <em><a href="https://www.vldb.org/pvldb/vol15/p3359-lambov.pdf">Trie Memtables in Cassandra</a></em> is a paper that explains why Cassandra moved from a skip list + B-tree memtable to a trie, and what it changed for topics such as GC and CPU locality.</p><p>A popular variant of radix trie is the Adaptive Radix Tree (ART): it dynamically resizes node types based on the number of children to stay compact and cache-friendly, while supporting fast in-memory lookups, inserts, and deletes. 
<a href="https://db.in.tum.de/~leis/papers/ART.pdf">This paper</a> (or this <a href="https://www.the-paper-trail.org/post/art-paper-notes/">summary</a>) explores the topic in depth.</p><p>You should also be aware that tries aren&#8217;t the only option for memtables, as other data structures exist. For example, RocksDB relies on a skip list. See <a href="https://github.com/facebook/rocksdb/wiki/MemTable">this resource</a> for more information.</p><p>About Bloom filters, some engines keep a Bloom filter not only per SSTable but per data-block range as well. This was the case for RocksDB&#8217;s older block-based filter format (<a href="https://rocksdb.org/blog/2014/09/12/new-bloom-filter-format.html">source</a>). RocksDB later shifted toward partitioned index/filters, which partition the index and full-file filter into smaller blocks with a top-level directory for on-demand loading. The <a href="https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters">official doc</a> delves into the new approach.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, 
https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://read.thecoder.cafe/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI is getting better every day. Are you? At The Coder Cafe, we serve fundamental concepts to make you an engineer that AI can&#8217;t replace. 
Written by a Google SWE, trusted by thousands of engineers worldwide.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I&#8217;m sure you are.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Hey mom!</p></div></div>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 6]]></title><description><![CDATA[Block-Based SSTables and Indexing]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-6</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-6</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 21 Jan 2026 13:01:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zbw_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). 
It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;Monster Scale Summit.&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monster Scale Summit." title="Monster Scale Summit." 
srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zbw_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zbw_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zbw_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2300272,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613519?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zbw_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!zbw_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ae38adb-d254-46e8-9e73-19f2ada7d5a7_1600x800.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In week 2, you used JSON as the SSTable format. That works for document databases, but the overhead of this serialization format doesn&#8217;t make it the best choice for your storage engine:</p><ul><li><p>Best case: You stream the file and linearly scan entries until you find the key, but a miss means scanning the entire file.</p></li><li><p>Worst case: You read the whole file and parse everything, then search for the key.</p></li></ul><p>This week, you will switch to block-based SSTables. Data will be chunked into fixed-size blocks designed to fit within a single disk page. 
The main benefits:</p><ul><li><p>Efficient I/O: Each lookup can fetch a complete block with a single page read.</p></li><li><p>Predictable latency: Since every block maps to exactly one page, each read involves a fixed, bounded amount of I/O, improving latency consistency.</p></li><li><p>Smaller on disk: Binary encoding typically compresses better than JSON.</p></li><li><p>Integrity: Per-block checksums detect corruption without re-reading the whole file.</p></li><li><p>Caching: Hot SSTable blocks can be kept in an in-memory block cache to reduce I/O and decompression overhead.</p></li></ul><p>Alongside the data blocks, you will maintain a small index that stores the first key of each block (a block&#8217;s offset is derived from its position in the index), allowing lookups to jump directly to the relevant block without scanning all of them.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Assumptions</h2><ul><li><p>Fixed 64-byte keys and values: Each key and each value occupies a fixed 64-byte slot (shorter ones are right-padded with 0x00). This removes much of the logic needed to keep blocks fixed-size, making the implementation easier to write and reason about.</p></li><li><p>Because of the week 1 assumption (keys are lowercase ASCII strings), each character is one byte, which also makes the implementation easier.</p></li></ul><h2>SSTable Format</h2><p>A block-based SSTable will be composed of:</p><ul><li><p>One index block (first 4 KB page)</p></li><li><p>Multiple data blocks (each 4 KB)</p></li></ul><p>Each block has a fixed size of 4 KB. 
Aligning blocks to 4 KB means a disk read can fetch a block in one page. If blocks are not aligned, a read may span two pages.</p><p>Here&#8217;s the file layout at a glance:</p><pre><code><code>Offset 0             4096                 8192                 ...
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;
&#9474; Index block (4 KB)  &#9474; Data block 0 (4 KB)&#9474; Data block 1 (4 KB)&#9474; ...
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;</code></code></pre><h3>Index Block</h3><p>The layout of an index block (4 KB):</p><ul><li><p><code>blockCount</code>: The number of data blocks in the SSTable.</p></li><li><p>A set of key entries (64 B), each being the first key of the corresponding data block. Entries are sorted by key and used to decide which block to fetch during a lookup.</p></li></ul><p>To make the index fit into a single 4 KB page, it must contain at most 63 entries.</p><p>Here&#8217;s the layout (note this is a binary layout; newlines are used only for the representation):</p><pre><code><code>&lt;blockCount&gt;&lt;pad&gt;
|----1B----||63B|

# First index entry
&lt;firstKey&gt; # Right-padded with 0x00
|--64B--|

# Second index entry
&lt;firstKey&gt; # Right-padded with 0x00
|--64B--|

# Third index entry, etc.

# Padding with 0x00 to reach exactly 4096 bytes</code></code></pre><blockquote><p><strong>NOTE</strong>: If you&#8217;re not familiar with the concept of padding: it&#8217;s filling unused bytes (here with 0x00) so fields and blocks have fixed sizes.</p></blockquote><p><code>blockCount</code> has a value between 0 and 63. If you encoded 63 as text, you would need two bytes (<code>&#8216;6&#8217;</code> = <code>0x36</code> and <code>&#8216;3&#8217;</code> = <code>0x33</code>). Instead, you can store it as a binary integer so it fits in one byte: <code>0x3f</code>.</p><p>Same layout, with explicit offsets:</p><pre><code><code>Offset  Size  Field
0       1     blockCount
1       63    padding (0x00) to 64-byte alignment
64      64    Entry 0: firstKey of data block 0 (right-padded to 64B)
128     64    Entry 1: firstKey of data block 1 (right-padded to 64B)
192     64    Entry 2: firstKey of data block 2 (right-padded to 64B)
...     64    ...
64+64*n 64    Entry n: firstKey of data block n (right-padded to 64B)

# If fewer than 63 entries, fill remaining bytes with 0x00 up to 4096B.</code></code></pre><p>Here&#8217;s an example index block for an SSTable with three data blocks, hence three entries. Remember: this is binary; newlines are for readability only:</p><pre><code><code>0x03&lt;63 0x00&gt;                        # 1B blockCount + 63B pad = 64B slot
aaa&lt;61 0x00&gt;                         # 3B key + 61B pad  = 64B slot
everyLittleIslandSeemsEmpty&lt;37 0x00&gt; # 27B key + 37B pad = 64B slot
hello&lt;59 0x00&gt;                       # 5B key + 59B pad  = 64B slot
&lt;3840 0x00&gt;                          # Tail pad: 4096 - (4*64)</code></code></pre><p>This index block indicates:</p><ul><li><p>Block 0 starts with the key <code>aaa</code>.</p></li><li><p>Block 1 starts with the key <code>everyLittleIslandSeemsEmpty</code>.</p></li><li><p>Block 2 starts with the key <code>hello</code>.</p></li></ul><p>You don&#8217;t need to store per-block offsets. Because the index occupies the first 4 KB page and every data block is exactly 4 KB and written contiguously, offsets can be calculated this way (<code>blockId</code> starts at 0):</p><pre><code><code>offset(blockId) = (blockId + 1) &#215; 4096</code></code></pre><p>Therefore:</p><ul><li><p>Block 0 starts at offset 4096.</p></li><li><p>Block 1 starts at offset 8192.</p></li><li><p>Block 2 starts at offset 12288.</p></li></ul><h3>Data Blocks</h3><p>Now, let&#8217;s focus on data blocks.</p><p>In addition to the key-value entries, reserve the first 8 bytes of each block to store a CRC computed over the rest of the block (<code>entryCount</code>, padding, and all entry slots); this lets you verify data integrity on read.</p><p>The layout of a data block (4 KB per block):</p><ul><li><p>Header (128 B):</p><ul><li><p><code>CRC-64</code> (8 B): A checksum computed over bytes [8..4096). You can choose any standard variant (e.g., CRC-64/ECMA-182).</p></li><li><p><code>entryCount</code> (1 B): The number of entries in this block (0..31).</p></li><li><p>Padding (119 B).</p></li></ul></li><li><p>Entries area (31 x 128 B = 3968 B), each entry is:</p><ul><li><p><code>key</code> (64 B, right-padded).</p></li><li><p><code>value</code> (64 B, right-padded).</p></li></ul></li></ul><p>The last data block may contain fewer than 31 entries (<code>entryCount &lt; 31</code>), but always pad with zeros to reach exactly 4 KB. 
This guarantees one-page reads and prevents errors across read modes (e.g., <code>SIGBUS</code> with <a href="https://man7.org/linux/man-pages/man2/mmap.2.html">mmap</a>).</p><p>The layout of a data block (again, newlines are used only for the representation):</p><pre><code><code>&lt;CRC64&gt;&lt;entryCount&gt;&lt;pad&gt;
|--8B--|----1B----|119B|

&lt;key&gt;&lt;value&gt; # Keys and values are right-padded with 0x00
|-64B|-64B|
&lt;key&gt;&lt;value&gt; # Keys and values are right-padded with 0x00
|-64B|-64B|
&lt;key&gt;&lt;value&gt; # Keys and values are right-padded with 0x00
|-64B|-64B|
# ...
# Zero-fill unused 128B entry slots so block size = 4096B</code></code></pre><p>Same layout, with explicit offsets:</p><pre><code>Offset  Size  Field
0       8     CRC64: CRC over [8..4096)
8       1     entryCount: 0..31 valid entries
9       119   padding

# Entries area (31 slots &#215; 128B = 3968B)
128     64    Entry 0: key (right-padded to 64B)
192     64    Entry 0: value (right-padded to 64B)

256     64    Entry 1: key (right-padded to 64B)
320     64    Entry 1: value (right-padded to 64B)

384     64    Entry 2: key (right-padded to 64B)
448     64    Entry 2: value (right-padded to 64B)
...     ...   ...

# If entryCount &lt; 31, zero-fill remaining 128B slots so block size = 4096B.</code></pre><p>An example of a block composed of three key-value pairs:</p><pre><code><code>0x42F0E1EBA9EA3693&lt;entryCount=3&gt;&lt;119 0x00&gt;
alastor&lt;57 0x00&gt;foo&lt;61 0x00&gt;
aristide&lt;56 0x00&gt;bar&lt;61 0x00&gt;
armelle&lt;57 0x00&gt;z&lt;63 0x00&gt;
&lt;3584 0x00&gt; # (31 - 3) &#215; 128B = 3584B</code></code></pre><p>Note that because the index block holds at most 63 key entries, an SSTable can have at most 63 data blocks. With 31 entries per block, that caps an SSTable at 63 &#215; 31 = 1,953 entries.</p><p>A tombstone is represented by a value of 64 bytes all set to 0x00. Due to this sentinel, the all-zero value is reserved and cannot be used as an application value from this week onward.</p><h2>GET</h2><p>Searching for a value doesn&#8217;t change (memtable &#8594; L0 &#8594; L1, etc.). What changes is how you read one SSTable (remember: from L1, you only need to read one SSTable per level because of non-overlapping key ranges).</p><p>The process to read from an SSTable:</p><ol><li><p>Read <code>blockCount</code>.</p></li><li><p>Binary search the index in <code>[0, blockCount-1]</code> to find the largest <code>firstKey</code> &#8804; key and get <code>blockId</code>. </p><ul><li><p>If not found (e.g., first index key is <code>bac</code> and your key is <code>aaa</code>), return a miss for this SSTable.</p></li></ul></li><li><p>Compute the block offset:<br><code>blockOffset(blockId) = (blockId + 1) &#215; 4096</code>.</p></li><li><p>Fetch the corresponding 4 KB block.</p></li><li><p>Verify CRC before using the block:</p><ul><li><p>Compute CRC64 over bytes [8..4096).</p></li><li><p>Compare with the 8-byte CRC stored at offset 0..7. If it doesn&#8217;t match, fail the read for this SSTable.</p></li></ul></li><li><p>Read <code>entryCount</code>.</p></li><li><p>Binary search the entries in <code>[0, entryCount-1]</code> for the key.</p></li><li><p>Return the corresponding value or a miss.</p></li></ol><h2>Compaction</h2><p>Last week, you split at 2,000 entries during the compaction process. This week, because a single SSTable is limited to 1,953 entries, change the split threshold to 1,953.</p><h2>Client &amp; Validation</h2><p>There are no changes to the client. 
Run it against the same file (<a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put-delete.txt">put-delete.txt</a>) to validate that your changes are correct.</p><h2>[Optional] Variable-Length Keys/Values</h2><p>Drop the 64-byte constraint: store a length-prefixed key and value per entry (short header with key length and value length).</p><p>Keep entries sorted and include the lengths in your checksum.</p><h2>[Optional] No Sentinel Value for Tombstones</h2><p>Tombstones are currently represented by a sentinel value (a 64-byte all-zero value), which prevents storing an actual empty value.</p><p>Instead, avoid reserving any value for deletes: add an explicit entry type per record (value or tombstone).</p><h2>[Optional] Compression</h2><p>Now that the format is binary, compression becomes more effective and saves more space.</p><p>As an optional task, compress each data block independently so lookups still touch only one block:</p><ul><li><p>Record each block&#8217;s offset and compressed size in the index.</p></li><li><p>Read just those bytes, decompress, and search.</p></li></ul><p>This packs more logical blocks into each cached page, raising cache hit rates, reducing pages touched during scans, and smoothing read latency.</p><h1>Wrap Up</h1><p>That&#8217;s it for this week! You implemented block-based SSTables and indexing, gaining benefits like more efficient I/O and more predictable read latency.</p><p>In two weeks, you will focus on improving read performance by adding a layer that can tell whether an SSTable is worth parsing, and say goodbye to your hashtable-based memtable, replacing it with a more efficient data structure.</p><h1>Further Notes</h1><p>For a production-grade implementation of block-based SSTables, see <a href="https://github.com/facebook/rocksdb/wiki/rocksdb-blockbasedtable-format">RocksDB&#8217;s block-based SSTable format</a>. 
It details block layout, per-block compression, and how the index stores offsets and sizes.</p><p>You can also check out <a href="https://docs.scylladb.com/manual/stable/architecture/sstable/sstable3/sstables-3-summary.html">ScyllaDB&#8217;s SSTables v3 docs</a>. ScyllaDB maintains a small in-memory summary of sampled keys to narrow the search, then uses the on-disk index to locate the exact block. This provides a nice contrast to our single-page index and illustrates how to scale when SSTables grow large.</p><p>For a deeper look at how things work in practice in terms of directory structure, you can explore the <a href="https://github.com/scylladb/scylladb/blob/master/docs/dev/sstables-directory-structure.md">ScyllaDB SSTables directory structure</a>, which shows how metadata and data are organized on disk.</p><p>Regarding CRC read failures, we mentioned that a checksum mismatch should simply cause the read to fail for that SSTable. In real systems, databases rely on replication to handle corruption. When multiple replicas exist, a system can recover by using data from an intact replica if one becomes corrupted or unavailable. Upon detecting a checksum mismatch, the system discards the corrupt replica and rebuilds it from a healthy one. 
This approach only works as long as a valid replica exists, which is why frequent checksum verification is critical: it ensures corruption is caught and repaired as early as possible, before it propagates.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie Memtable</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://read.thecoder.cafe/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI is getting better every day. Are you? At The Coder Cafe, we serve fundamental concepts to make you an engineer that AI can&#8217;t replace. 
Written by a Google SWE, trusted by thousands of engineers worldwide.</p></div></div></div><p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 5]]></title><description><![CDATA[Leveling and Key-Range Partitioning]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-5</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-5</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 14 Jan 2026 11:03:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ofwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). 
It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><p><em>I&#8217;ll also give a talk there, so feel free to join!</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;Monster Scale Summit.&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monster Scale Summit." title="Monster Scale Summit." 
srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ofwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ofwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ofwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2305364,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ofwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!ofwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F136af441-23ba-4088-9335-4e9ea105ca68_1600x800.png 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>Last week, you implemented deletion and compaction, making sure the LSM tree wouldn&#8217;t grow indefinitely.</p><p>Still, there&#8217;s a weak spot: in the worst-case scenario (e.g., on a key miss), a single read has to scan all SSTables. 
To address this, you will implement leveling, a core idea in LSM trees.</p><p>Instead of a single flat list of SSTables, leveling stores data across multiple levels: <code>L0</code>, <code>L1</code>, <code>L2</code>, etc.</p><ul><li><p><code>L0</code> gets compacted to <code>L1</code> and makes space for future memtable flushes.</p></li><li><p><code>L1</code> gets compacted to <code>L2</code> and makes space for <code>L0</code> compaction.</p></li><li><p><code>L2</code> gets compacted to <code>L3</code> and makes space for <code>L1</code> compaction.</p></li><li><p>&#8230;</p></li><li><p><code>Ln-1</code> gets compacted to <code>Ln</code> and makes space for <code>Ln-2</code> compaction.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UzuB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UzuB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 424w, https://substackcdn.com/image/fetch/$s_!UzuB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 848w, https://substackcdn.com/image/fetch/$s_!UzuB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UzuB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UzuB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png" width="550" height="736.0294117647059" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1820,&quot;width&quot;:1360,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:177915,&quot;alt&quot;:&quot;Diagram showing a hierarchical LSM tree structure. A blue box labeled &#8220;Memtable&#8221; sits above a dashed line separating memory from disk. Below it, multiple levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward from one level to the next, indicating data movement from memory to lower disk levels.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174613510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing a hierarchical LSM tree structure. A blue box labeled &#8220;Memtable&#8221; sits above a dashed line separating memory from disk. 
Below it, multiple levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward from one level to the next, indicating data movement from memory to lower disk levels." title="Diagram showing a hierarchical LSM tree structure. A blue box labeled &#8220;Memtable&#8221; sits above a dashed line separating memory from disk. Below it, multiple levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward from one level to the next, indicating data movement from memory to lower disk levels." srcset="https://substackcdn.com/image/fetch/$s_!UzuB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 424w, https://substackcdn.com/image/fetch/$s_!UzuB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 848w, https://substackcdn.com/image/fetch/$s_!UzuB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 1272w, https://substackcdn.com/image/fetch/$s_!UzuB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F526b14d6-6ee3-438f-b59f-431998513f19_1360x1820.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This process is called level compaction.</p><p>Something important to understand: <code>L0</code> is slightly different from all the other levels. <code>L0</code> is created during memtable flushes. If a key already exists at <code>L0</code> and also in the memtable, the next flush can write that key again to a new <code>L0</code> file. In other words, <code>L0</code> can have overlapping keys.</p><p>For all the other levels (<code>L1</code> to <code>Ln</code>), that&#8217;s not the case. They are created by compaction, which removes duplicates and produces non-overlapping key ranges. In this week&#8217;s simplified design, an <code>Li-1</code> to <code>Li</code> compaction takes all SSTables from <code>Li-1</code> and <code>Li</code>, performs a k-way merge, then rewrites <code>Li</code> fully. 
As a result, each key appears at most once per level from <code>L1</code> downward.</p><p>What&#8217;s the consequence of non-overlapping keys? You can improve lookups using a simple range-to-file mapping, for example:</p><ul><li><p>Keys from <code>aaa</code> (included) to <code>bac</code> (excluded) are stored in one SSTable.</p></li><li><p>Keys from <code>bac</code> (included) to <code>cad</code> (excluded) are stored in another SSTable.</p></li><li><p>etc.</p></li></ul><p>With this setup, a read checks only one SSTable per level from <code>L1</code> to <code>Ln</code>. <code>L0</code> is the exception due to overlaps, so a read may still need to scan all <code>L0</code> SSTables.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Leveling</h2><ul><li><p>Limit the number of levels to two:</p><ul><li><p><code>L0</code>, which may contain overlapping keys.</p></li><li><p><code>L1</code>, which contains no overlapping keys.</p></li></ul></li><li><p>Create a folder for each level: <code>/l0</code> and <code>/l1</code>.</p></li><li><p>Keep one global <code>MANIFEST</code> file at the root.</p></li></ul><h2>Manifest Layout</h2><p>You will create a <code>MANIFEST</code> layout for both <code>L0</code> and <code>L1</code>:</p><ul><li><p><code>L0</code> remains a simple list of SSTables.</p></li><li><p><code>L1</code> is partitioned by key range.</p></li></ul><p>For example:</p><pre><code>[L0]
sst-11.json
sst-12.json
sst-13.json

[L1]
aaa-karate: sst-1.json
karate-leo: sst-2.json
leo-yaml: sst-3.json</code></pre><p>This <code>MANIFEST</code> indicates:</p><ul><li><p><code>L0</code> is composed of three SSTables:</p><ul><li><p><code>/l0/sst-11.json</code>.</p></li><li><p><code>/l0/sst-12.json</code>.</p></li><li><p><code>/l0/sst-13.json</code>.</p></li></ul></li><li><p><code>L1</code> is composed of three key ranges:</p><ul><li><p>Keys between <code>aaa</code> (included) and <code>karate</code> (excluded) live in <code>/l1/sst-1.json</code>.</p></li><li><p>Keys between <code>karate</code> (included) and <code>leo</code> (excluded) live in <code>/l1/sst-2.json</code>.</p></li><li><p>Keys between <code>leo</code> (included) and <code>yaml</code> (excluded) live in <code>/l1/sst-3.json</code>.</p></li></ul></li></ul><h2>Compaction</h2><p>The goal of the compaction process is to merge all the data from <code>L0</code> and <code>L1</code> into <code>L1</code>, leaving <code>L0</code> empty.</p><p>When <code>L0</code> reaches five full SSTable files (2,000 entries each), run an <code>L0</code> &#8594; <code>L1</code> compaction:</p><ul><li><p>Open iterators on all <code>L0</code> and <code>L1</code> SSTables.</p></li><li><p>Apply the k-way merge algorithm:</p><ul><li><p>Comparator:</p><ul><li><p>Primary: <code>key</code>.</p></li><li><p>Tie-break (equal <code>key</code>):</p><ol><li><p>Prefer <code>L0</code> over <code>L1</code>.</p></li><li><p>Within <code>L0</code>, prefer the newest SSTable.</p></li></ol></li></ul></li><li><p>Version order: any record from <code>L0</code> is newer than any record from <code>L1</code>. 
Within <code>L0</code>, newer files win (same as week 4).</p></li><li><p>Keep at most one record per key (newest wins).</p></li><li><p>Tombstones: because <code>L1</code> is the bottom level, drop a tombstone if no older value for that key remains in the merge result.</p></li><li><p>Create new <code>L1</code> SSTables with at most 2,000 entries each.</p></li><li><p>When naming new <code>L1</code> files, make sure they are unique. For example, if <code>/l1</code> contains <code>sst-1.json</code> and <code>sst-2.json</code>, the first SSTable file created should be <code>sst-3.json</code>.</p></li></ul></li><li><p>Publish atomically:</p><ol><li><p><code>fsync</code> each new <code>L1</code> file.</p></li><li><p><code>fsync</code> the <code>/l1</code> directory.</p></li><li><p>Update the <code>MANIFEST</code> atomically.</p></li><li><p><code>fsync</code> the <code>MANIFEST</code> file.</p></li><li><p><code>fsync</code> the root directory (the directory containing the <code>MANIFEST</code> file and the <code>/l0</code> and <code>/l1</code> folders).</p></li></ol></li><li><p>Clean up:</p><ol><li><p>Delete obsolete <code>L1</code> files, then <code>fsync</code> <code>/l1</code>.</p></li><li><p>Delete all files in <code>/l0</code>, then <code>fsync</code> <code>/l0</code>.</p></li></ol></li></ul><h2>Flush (memtable to <code>L0</code>)</h2><p>The logic is unchanged from previous weeks. 
The only difference is that flush writes to <code>L0</code> and updates the <code>[L0]</code> section of the <code>MANIFEST</code> file.</p><h2>GET</h2><ul><li><p>Check the memtable.</p></li><li><p>If not found, scan all <code>L0</code> files newest to oldest using section <code>[L0]</code> of the <code>MANIFEST</code>.</p></li><li><p>If not found at <code>L0</code>:</p><ol><li><p>Use the section <code>[L1]</code> of the <code>MANIFEST</code> to find the one SSTable whose key range contains the key, then read only that <code>L1</code> file.</p></li><li><p>Return the value if found; otherwise, return <code>404 Not Found</code>.</p></li></ol></li></ul><h2>Client &amp; Validation</h2><p>There are no changes to the client. Run it against the same file (<a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put-delete.txt">put-delete.txt</a>) to validate that your changes are correct.</p><h2>[Optional] Configurable Number of Levels</h2><p>Introducing leveling has a fundamental impact on deletions. With a single level, compaction sees all versions of every key at once, so a tombstone can be dropped as soon as it has &#8220;killed&#8221; every older record for that key. Still, the rule we mentioned last week holds: a tombstone can be evicted only once all the data it shadows no longer exists on disk.</p><p>With multiple levels, compaction must propagate tombstones downward. 
It&#8217;s only at the bottommost level <code>Ln</code> that tombstones can be dropped, because only there can you prove they no longer shadow any other records.</p><p>As an optional task, make the number of levels configurable (<code>L0</code>, <code>L1</code>, &#8230;, <code>Ln</code>):</p><ul><li><p>Define a size ratio so each level has a target size larger than the previous one.</p></li><li><p>Keep one directory per level: <code>/l0</code>, <code>/l1</code>, &#8230;, <code>/ln</code>.</p></li><li><p>Keep a single global <code>MANIFEST</code>.</p></li><li><p>When a level reaches its maximum number of SSTables (derived from the size ratio), compact that level into the next.</p></li><li><p>Drop tombstones only at the bottommost level <code>Ln</code>. At any intermediate level <code>Li</code> with <code>0 &#8804; i &lt; n</code>, propagate the tombstone downward during compaction.</p></li></ul><h2>[Optional] SCAN</h2><p>Implement <code>GET /scan?start={start}&amp;end={end}</code>:</p><ul><li><p>Return all keys between <code>start</code> (included) and <code>end</code> (excluded).</p></li><li><p>Use <a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put-delete-scan.txt">put-delete-scan.txt</a> to validate that your changes are correct. It introduces the <code>SCAN</code> keyword. For example:</p><pre><code>SCAN a-c aaa,bbb,bdx</code></pre><p>This line means: between <code>a</code> (included) and <code>c</code> (excluded), the keys are <code>aaa</code>, <code>bbb</code>, <code>bdx</code>. The output is always sorted.</p></li></ul><blockquote><p><strong>NOTE</strong>: If this route conflicts with <code>GET /{key}</code>, rename the single-key route to <code>GET /keys/{key}</code>.</p></blockquote><h1>Wrap Up</h1><p>That&#8217;s it for this week! Your LSM tree is taking shape. 
You implemented leveling, a key LSM design idea, and refined compaction so that reads touch fewer SSTables and storage stays under control.</p><p>In two weeks, you will revisit the week 2 choice of JSON for SSTables. You will switch to block-based SSTables to reduce parsing and I/O overhead, and add indexing within each SSTable.</p><h1>Further Notes</h1><p>We mentioned that, because of key overlaps, a read may still need to scan all <code>L0</code> SSTables (e.g., on a key miss). This is the main reason why <code>L0</code> is typically kept small. In general, each level is larger than the one above it by a fixed size ratio (e.g., 10&#215;). Some databases use more dynamic mechanisms. For instance, RocksDB relies on <a href="https://rocksdb.org/blog/2015/07/23/dynamic-level.html">Dynamic Leveled Compaction</a>, where the target size of each level is automatically adjusted based on the size of the last (bottommost) level, eliminating the need to define each level&#8217;s size statically.</p><p>In real-world databases, compaction isn&#8217;t done in one batch across all data. Let&#8217;s understand why.</p><p>Suppose you have four levels and a layout like this for one key:</p><ul><li><p>The key exists at L3.</p></li><li><p>The key doesn&#8217;t exist at L2.</p></li><li><p>The key is updated at L1.</p></li><li><p>A tombstone is placed at L0.</p></li></ul><p>You can&#8217;t compact L0 with L1/L2/L3 in one shot; that would mean checking every SSTable against every level.</p><p>In reality, compaction is a promotion process. In our example, the tombstone at L0 is promoted to L1. Implementations ensure that it either (a) is compacted together with the L1 SSTable it shadows, or (b) waits until that L1 data is promoted to L2. The same rule repeats level by level, until the tombstone reaches L3 and finally removes the shadowed value.</p><p>Compaction is also critical to the health of an LSM tree. 
Let&#8217;s step back to see why. An LSM tree buffers writes in a memtable and flushes it to L0. Compaction merges SSTables across levels to control read amplification. If compaction falls behind, L0 files accumulate, flushes slow down (or stall at file-count thresholds), write latency climbs, and in the worst case, writes pause. This happens not because the memtable is &#8220;locked,&#8221; but because the engine can&#8217;t safely create more L0 files until compaction catches up.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NPHB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NPHB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 424w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 848w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 1272w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NPHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png" width="599" height="755.7438186813187" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1837,&quot;width&quot;:1456,&quot;resizeWidth&quot;:599,&quot;bytes&quot;:288097,&quot;alt&quot;:&quot;Diagram showing the LSM tree hierarchy with a pink &#8220;Client&#8221; box writing to a blue &#8220;Memtable.&#8221; Below, multiple disk levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward, while red arrows labeled &#8220;Backpressure&#8221; point upward from each level back toward the memtable and client.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174613510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing the LSM tree hierarchy with a pink &#8220;Client&#8221; box writing to a blue &#8220;Memtable.&#8221; Below, multiple disk levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward, while red arrows labeled &#8220;Backpressure&#8221; 
point upward from each level back toward the memtable and client." title="Diagram showing the LSM tree hierarchy with a pink &#8220;Client&#8221; box writing to a blue &#8220;Memtable.&#8221; Below, multiple disk levels labeled &#8220;Level 0,&#8221; &#8220;Level 1,&#8221; and &#8220;Level 2&#8221; each contain several yellow boxes labeled &#8220;SSTable.&#8221; Arrows labeled &#8220;Flush&#8221; and &#8220;Compaction&#8221; point downward, while red arrows labeled &#8220;Backpressure&#8221; point upward from each level back toward the memtable and client." srcset="https://substackcdn.com/image/fetch/$s_!NPHB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 424w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 848w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 1272w, https://substackcdn.com/image/fetch/$s_!NPHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4e2086-c0e2-4179-86c8-f047be7c75cf_1680x2120.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This is one of the reasons why the RUM conjecture we introduced last week is important.</p><ul><li><p>If you compact too eagerly, you burn a lot of disk I/O and lose the LSM&#8217;s write advantage.</p></li><li><p>If you compact too lazily, you incur a penalty on your read path.</p></li><li><p>If you compact everything all the time, you incur a space-amplification penalty during compaction roughly equal to the working set size.</p></li></ul><p>Because compaction is so important, most key-value stores support parallel compactions across levels (except <code>L0</code> &#8594; <code>L1</code>, which isn&#8217;t parallelized due to overlapping key ranges in L0).</p><p>You should also be aware that ongoing research keeps improving compaction. For example, the <em><a href="https://drive.google.com/file/d/1RCBW70TNXqGowl4I7cPjRJjzpfZqI0rs/view">SILK: Preventing Latency Spikes in LSM Key-Value Stores</a></em> paper analyzes why LSM systems can exhibit high tail latency. The main reason is that limited I/O bandwidth causes interference between client writes, flushes, and compactions. 
The key takeaway is that not all internal operations are equal. The paper explores solutions such as:</p><ul><li><p>Bandwidth awareness: Monitor client I/O and allocate the leftover bandwidth to internal work dynamically instead of relying on a static configuration.</p></li><li><p>Prioritization: Give priority to operations near the top of the tree (flushes and L0 &#8594; L1 compaction). Slowdowns there create backpressure that impacts tail latency more than work at deeper levels.</p></li></ul><p>Last but not least, what you implemented this week is called leveled compaction. Other strategies exist, such as tiered compaction, which merges SSTables based on their size and count rather than fixed levels. You can explore <a href="https://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html">this great resource</a> from Mark Callaghan, which dives deeper into the design trade-offs and performance characteristics of different compaction strategies in LSM trees.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, 
https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
<p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 4]]></title><description><![CDATA[Deletes, Tombstones, and Compaction]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-4</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-4</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 17 Dec 2025 13:00:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XgwD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). 
It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XgwD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XgwD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XgwD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22e744e2-f920-4fde-b050-459b014e8307_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2297638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XgwD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!XgwD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22e744e2-f920-4fde-b050-459b014e8307_1600x800.png 1456w" sizes="100vw"></picture></div></a></figure></div>
<p>Over the past few weeks, you built an LSM tree with three main components: a memtable, SSTables, and a WAL that records the same operations you keep in the memtable.</p><p>To prevent on-disk data from growing forever, you will implement compaction, a critical process in LSM trees. Compaction periodically merges SSTables to reclaim space and keep read performance predictable. For example, if key <code>1234</code> exists in every SSTable on disk:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4DBE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4DBE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 424w, https://substackcdn.com/image/fetch/$s_!4DBE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 848w, https://substackcdn.com/image/fetch/$s_!4DBE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4DBE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4DBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png" width="290" height="308.125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:680,&quot;width&quot;:640,&quot;resizeWidth&quot;:290,&quot;bytes&quot;:133247,&quot;alt&quot;:&quot;Diagram showing three SSTables stacked vertically, each containing the same key 1234 with different values (foo, bar, and baz).&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing three SSTables stacked vertically, each containing the same key 1234 with different values (foo, bar, and baz)." title="Diagram showing three SSTables stacked vertically, each containing the same key 1234 with different values (foo, bar, and baz)." 
srcset="https://substackcdn.com/image/fetch/$s_!4DBE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 424w, https://substackcdn.com/image/fetch/$s_!4DBE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 848w, https://substackcdn.com/image/fetch/$s_!4DBE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 1272w, https://substackcdn.com/image/fetch/$s_!4DBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdfa524f-63f7-4c5b-b720-bcaf5b4f8663_640x680.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Compaction drops duplicates and keeps only the newest record: <code>1234=baz</code>.</p><p>In addition, you will implement a <code>DELETE</code> endpoint. Handling deletes in an LSM tree isn&#8217;t straightforward because SSTables are immutable: you can&#8217;t remove a key from a file that is already on disk. To preserve the append-only nature of LSM trees, deletions are written as tombstones: markers indicating a key was logically deleted. You write the tombstone to the WAL, keep it in the memtable, and propagate it to an SSTable during flush.</p><p>How should compaction work in the presence of tombstones? Suppose you have the following SSTables on disk: the key exists in <code>SST-1</code>, doesn&#8217;t exist in <code>SST-2</code>, exists in <code>SST-3</code>, and is deleted in <code>SST-4</code>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rqB7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rqB7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 424w, https://substackcdn.com/image/fetch/$s_!rqB7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 848w, 
https://substackcdn.com/image/fetch/$s_!rqB7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 1272w, https://substackcdn.com/image/fetch/$s_!rqB7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rqB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png" width="290" height="398.75" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:640,&quot;resizeWidth&quot;:290,&quot;bytes&quot;:148258,&quot;alt&quot;:&quot;Diagram with four vertically stacked boxes labeled &#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221;; the first box contains the text &#8220;1234 = foo,&#8221; the second box contains &#8220;Key 1234 doesn&#8217;t exist,&#8221; the third box contains &#8220;1234 = bar,&#8221; and the fourth box contains &#8220;1234 = <deleted>.&#8221;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram with four vertically stacked boxes labeled 
&#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221;; the first box contains the text &#8220;1234 = foo,&#8221; the second box contains &#8220;Key 1234 doesn&#8217;t exist,&#8221; the third box contains &#8220;1234 = bar,&#8221; and the fourth box contains &#8220;1234 = <deleted>.&#8221;" title="Diagram with four vertically stacked boxes labeled &#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221;; the first box contains the text &#8220;1234 = foo,&#8221; the second box contains &#8220;Key 1234 doesn&#8217;t exist,&#8221; the third box contains &#8220;1234 = bar,&#8221; and the fourth box contains &#8220;1234 = <deleted>.&#8221;" srcset="https://substackcdn.com/image/fetch/$s_!rqB7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 424w, https://substackcdn.com/image/fetch/$s_!rqB7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 848w, https://substackcdn.com/image/fetch/$s_!rqB7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 1272w, https://substackcdn.com/image/fetch/$s_!rqB7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc96eea5b-0fbf-4f4b-8471-05b0235c0f59_640x880.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>If the key doesn&#8217;t exist in the memtable, the current state for <code>1234</code> is deleted. Now, imagine that during compaction, you merge <code>SST-3</code> and <code>SST-4</code>. 
As the key is marked as deleted in the newest SSTable, you may decide to drop the tombstone, as it hides the key in <code>SST-3</code>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UEzj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UEzj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 424w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 848w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 1272w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UEzj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png" width="626" height="399.18840579710144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1380,&quot;resizeWidth&quot;:626,&quot;bytes&quot;:191910,&quot;alt&quot;:&quot;Diagram showing four stacked boxes labeled &#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221; on the left, each containing key 1234 with various states. An arrow labeled &#8220;Compact&#8221; points from SSTable 3 and SSTable 4 to a new box on the right labeled &#8220;SSTable 5,&#8221; which contains the text &#8220;Key 1234 doesn&#8217;t exist.&#8221;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174613473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing four stacked boxes labeled &#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221; on the left, each containing key 1234 with various states. An arrow labeled &#8220;Compact&#8221; points from SSTable 3 and SSTable 4 to a new box on the right labeled &#8220;SSTable 5,&#8221; which contains the text &#8220;Key 1234 doesn&#8217;t exist.&#8221;" title="Diagram showing four stacked boxes labeled &#8220;SSTable 1,&#8221; &#8220;SSTable 2,&#8221; &#8220;SSTable 3,&#8221; and &#8220;SSTable 4&#8221; on the left, each containing key 1234 with various states. 
An arrow labeled &#8220;Compact&#8221; points from SSTable 3 and SSTable 4 to a new box on the right labeled &#8220;SSTable 5,&#8221; which contains the text &#8220;Key 1234 doesn&#8217;t exist.&#8221;" srcset="https://substackcdn.com/image/fetch/$s_!UEzj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 424w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 848w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 1272w, https://substackcdn.com/image/fetch/$s_!UEzj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F048a894c-54b3-40a2-80f6-6c662249ef88_1380x880.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now, <code>GET 1234</code> would do:</p><ul><li><p>Key doesn&#8217;t exist in the memtable &#8594; Continue.</p></li><li><p>Key doesn&#8217;t exist in SST-5 &#8594; Continue.</p></li><li><p>Key doesn&#8217;t exist in SST-2 &#8594; Continue.</p></li><li><p>Key exists in SST-1 &#8594; Return <code>foo</code> (instead of <code>404 Not Found</code>).</p></li></ul><p>The fundamental rule is the following: during compaction, a tombstone can be evicted only after all data it shadows no longer exist on disk. Otherwise, dropping a tombstone too early can make an old value reappear.
This is known as data resurrection: a key that &#8220;comes back to life&#8221; after a deletion.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Assumptions</h2><ul><li><p>Flush and compaction should be single-threaded and stop-the-world operations: do not serve client requests until the operations complete.</p></li></ul><h2>DELETE Endpoint</h2><p>Add <code>DELETE /{key}</code>:</p><ol><li><p>Append a tombstone to the WAL file, with <code>op=delete</code>:</p></li></ol><pre><code>{"op":"delete","key":"k"}</code></pre><ol start="2"><li><p><code>fsync</code> the WAL.</p></li><li><p>Update the memtable: Do not remove the key directly; mark it deleted with a tombstone.</p></li><li><p>Acknowledge the request.</p></li></ol><h2>Flush</h2><p>During flush, carry tombstones into the new SSTable using a new <code>deleted</code> field. For example:</p><pre><code>[
  ...
  { "key": "foo", "deleted": true },
  ...
]</code></pre><p>The keys must remain sorted.</p><h2>Compaction</h2><p>The goals of the compaction process for this week are the following:</p><ul><li><p>For each key, keep only the newest record.</p></li><li><p>Drop records hidden by newer versions. This is where merging happens: the newest record wins, and older versions are evicted.</p></li><li><p>Drop tombstones when no older value remains. Because this week&#8217;s compaction merges all SSTables at once, every value a tombstone shadows is part of the merge, so the tombstone can always be dropped.</p></li></ul><p>The compaction trigger is: every 10,000 update requests (<code>PUT</code> and <code>DELETE</code>, not <code>GET</code>), compact all SSTables.</p><p>Algorithm (<a href="https://en.wikipedia.org/wiki/K-way_merge_algorithm">k-way</a> merge using a <a href="https://read.thecoder.cafe/p/binary-heaps">min-heap</a> on key):</p><ol><li><p>Open an iterator for each SSTable file known by the MANIFEST.</p></li><li><p>Push each iterator&#8217;s current record into a min-heap with the following comparator:</p><ul><li><p>Primary: <code>key</code>.</p></li><li><p>Tie-break (equal <code>key</code>): Newest SSTable first based on MANIFEST order (to make sure an old value doesn&#8217;t win).</p></li></ul></li><li><p>While the heap is not empty:</p><ol><li><p>Pop the smallest key <code>k</code> (this first pop is the newest version of <code>k</code> due to the tie-break).</p></li><li><p>Drain all other heap entries whose key is <code>k</code> and discard them (older values).</p></li><li><p>For the record you picked:</p><ul><li><p>If it&#8217;s a tombstone, emit nothing for <code>k</code>.</p></li><li><p>Otherwise, emit the value for <code>k</code>.</p></li></ul></li><li><p>Advance only the iterators that produced a record for <code>k</code> (the one you popped and the ones you drained) and push their next records into the heap.</p></li></ol></li><li><p>Stream emitted records (sorted) into new SSTables.
Remember: the max entries in an SSTable should remain 2,000.</p></li><li><p><code>fsync</code> each new SSTable file, then <code>fsync</code> its parent directory.</p></li><li><p>Update the MANIFEST atomically (see week 3).</p></li><li><p>Remove the old SSTable files.</p></li></ol><h2>GET</h2><ul><li><p>Check the memtable:</p><ul><li><p>If the key is marked as deleted, return <code>404 Not Found</code>.</p></li><li><p>Else, return the value.</p></li></ul></li><li><p>Scan SSTables from newest to oldest, given the MANIFEST order (same as before).</p></li><li><p>For the first record with the requested key:</p><ul><li><p>If <code>deleted=true</code>, return <code>404 Not Found</code>.</p></li><li><p>Else, return the value.</p></li></ul></li><li><p>If the key isn&#8217;t found, return <code>404 Not Found</code>.</p></li></ul><h2>Startup</h2><ul><li><p>When replaying the WAL, make sure to take into account tombstone values (<code>op=delete</code>).</p></li></ul><h2>Client &amp; Validation</h2><ul><li><p>Update your client to handle <code>DELETE k</code> lines &#8594; Send a <code>DELETE</code> request to <code>/k</code>.</p></li><li><p>Download and run your client against a new file containing <code>DELETE</code> requests: <a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put-delete.txt">put-delete.txt</a>.</p><blockquote><p><strong>NOTE</strong>: Refer to week 1 if you need to generate your own file with the number of lines you want.</p></blockquote></li></ul><h1>Wrap Up</h1><p>That&#8217;s it for this week! Your storage engine now supports deletes and a compaction mechanism that prevents unbounded growth.</p><p>The Coder Cafe will take a break for two weeks. On January 7th, you will continue exploring LSM trees and cover leveling. 
In your current implementation, a miss still scans all SSTables; therefore, you will also add key-range partitioning to limit the number of SSTables that need to be checked during a lookup.</p><p>See you next year!</p><h1>Further Notes</h1><p>The compaction trigger you used was simple: every 10,000 PUT or DELETE requests. In real systems, compaction is usually driven by factors such as too many SSTable files, space pressure, or high read amplification.</p><p>Also, many systems add safeguards to keep compaction controlled and resource-efficient. For example, a common one is bounded fan-in (merging only a small, fixed number of SSTables per batch), so the engine never opens every file at once. Others track each SSTable&#8217;s first and last key to select only overlapping candidates, hence avoiding unrelated files.</p><p>Taking a step back, it&#8217;s interesting to note that the core LSM idea, append-only writes with regular compaction, shows up in many systems, even outside pure LSM trees. For example:</p><ul><li><p><a href="https://lucenenet.apache.org/quick-start/introduction.html#lucenes-lsm-inspired-architecture">Lucene</a>: Immutable segments are created and later merged in the background, an LSM-like pattern, even though it isn&#8217;t an LSM tree per se.</p></li><li><p><a href="https://docs.memcached.org/features/flashstorage/">Memcached Extstore</a>: Flushes values to free RAM, but keeps the hashtable, keys, and storage pointers in memory. It later compacts the data.</p></li><li><p><a href="https://docs.confluent.io/kafka/design/log_compaction.html">Kafka</a>: Rewrites segments to keep the latest value per key and drop older versions, which is conceptually similar to SSTable compaction.</p></li></ul><p>Also, we briefly introduced the concept of data resurrection in the introduction. You should be aware that this is a common challenge with LSM trees.
In real-world conditions, crashes, slow WAL truncation, and complex compaction can allow an old value to be replayed during recovery after its tombstone has been removed, leading to data resurrection. Here are two great references that dig deeper into this kind of problem:</p><ul><li><p><em><a href="https://resources.scylladb.com/overhead-complexity/preventing-data-resurrection-with-repair-based-tombstone-garbage-collection">Preventing Data Resurrection with Repair Based Tombstone Garbage Collection</a></em></p></li><li><p><em><a href="https://msun.io/cassandra-scylla-repairs/">Repair Time Requirements to Prevent Data Resurrection in Cassandra &amp; Scylla</a></em></p></li></ul><p>Another excellent reference is <em><a href="https://cs-people.bu.edu/mathan/publications/sigmod23-zhu.pdf">Acheron: Persisting Tombstones in LSM Engines</a></em>. It shows how standard LSM compaction can leave tombstones stuck for long periods, so &#8220;deleted&#8221; data may still linger in lower levels and complicate compliance with regulations such as GDPR and CCPA. The paper introduces delete-aware techniques that prioritize pushing tombstones down the tree to make deletions persist more predictably.</p><p>Lastly, you can explore the <a href="https://openproceedings.org/2016/conf/edbt/paper-12.pdf">RUM conjecture</a>. Structurally, it&#8217;s similar to <a href="https://www.thecoder.cafe/p/cap">the CAP theorem</a>: &#8220;<em>three things, pick two</em>&#8221;. In short, you can make a database excel at two of: reads, updates (insert/change/delete), and memory/space, but not all three at once. Make any two really good and the third gets worse; that&#8217;s an unavoidable trade-off.
This helps explain why, for example, LSM trees optimized for fast updates and good space efficiency pay a cost in read performance due to read amplification.</p><p>That trade-off shows up in the design of the compaction process you implemented this week: you trade space and significant I/O for simplicity by compacting everything in one shot. This is fine for the example, but with 500GB of SSTables, you may need roughly another 500GB of free space during the merge in the worst case.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://read.thecoder.cafe/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. 
Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.</p></div></div></div><p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 3]]></title><description><![CDATA[Durability with Write-Ahead Logging]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-3</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-3</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 03 Dec 2025 13:00:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EuPg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EuPg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!EuPg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EuPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2304993,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174611991?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EuPg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!EuPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc14ddec1-bab4-4a52-92a9-4c2bd99deef6_1600x800.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Last week, you built the first version of an LSM: an in-memory memtable for recent writes, immutable SSTables on disk, and a MANIFEST file listing the SSTable files. However, if the database crashes, data in the memtable would be lost.</p><p>This week, you will focus on durability by introducing Write-Ahead Logging (WAL). A WAL is an append-only file on disk that records the same operations you keep in memory. How it works:</p><ul><li><p>On write, record it in the WAL and the memtable.</p></li><li><p>On restart, you read the WAL from start to end and apply each record to the memtable.</p></li></ul><p>Introducing a WAL is not free, though. Writes are slower because each write also goes to the WAL. It also increases write amplification, the ratio of data written to data requested by a client.</p><p>Another important aspect of durability is when to synchronize a file&#8217;s state with the storage device. When you write to a file, it may appear as saved, but the bytes may sit in memory caches rather than on the physical disk. These caches are managed by the OS&#8217;s filesystem, an abstraction over the disk. If the machine crashes before the data is flushed, you can lose data.</p><p>To force the data to stable storage, you need to call a sync primitive.
The simple, portable choice is to call <em><a href="https://man7.org/linux/man-pages/man2/fsync.2.html">fsync</a></em>, a system call that flushes a file&#8217;s buffered data and required metadata to disk.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>PUT</h2><p>For the WAL data format, you won&#8217;t use JSON like the SSTables, but NDJSON (Newline-Delimited JSON). It is a true append-only format with one JSON object per line.</p><ol><li><p>Append a record to the WAL file <code>wal.db</code>, opened with <code>O_APPEND</code>. Set the <code>op</code> field to <code>&quot;put&quot;</code>, and the <code>key</code> and <code>value</code> fields to the provided key and value. For example, writing <code>k/v2</code> when the WAL already contains <code>k/v1</code>:</p><pre><code>{&quot;op&quot;:&quot;put&quot;,&quot;key&quot;:&quot;k&quot;,&quot;value&quot;:&quot;v1&quot;}
{&quot;op&quot;:&quot;put&quot;,&quot;key&quot;:&quot;k&quot;,&quot;value&quot;:&quot;v2&quot;} &lt;- Latest write</code></pre></li><li><p><code>fsync</code> <code>wal.db</code>.</p></li><li><p>Update the memtable with the same logic as before:</p><ul><li><p>If the key exists, update the value.</p></li><li><p>Otherwise, create a new entry.</p></li></ul></li><li><p>Acknowledge the HTTP request.</p></li></ol><h2>Startup</h2><ul><li><p>Create an empty <code>wal.db</code> file if it doesn&#8217;t exist.</p></li><li><p>Replay the WAL from start to end. For each valid line, apply it to the memtable.</p></li></ul><h2>Flush</h2><p>Keep the same flush trigger (2,000 entries) and the same logic (stop-the-world operation) as last week:</p><ol><li><p>Write the new SSTable:</p><ol><li><p>Flush the memtable as a new immutable JSON SSTable file with keys sorted (same as before).</p></li><li><p><code>fsync</code> the SSTable file.</p></li><li><p><code>fsync</code> the parent directory of the SSTable to make the new filename persistent.</p></li></ol></li><li><p>Update the MANIFEST atomically:</p><ol><li><p>Read the current MANIFEST lines into memory and append the new SSTable filename.</p></li><li><p>Open <code>MANIFEST.tmp</code> with <code>O_CREAT | O_TRUNC | O_WRONLY</code>.</p></li><li><p>Write the entire list to <code>MANIFEST.tmp</code> from the start.</p></li><li><p><code>fsync</code> <code>MANIFEST.tmp</code>.</p></li><li><p>Rename <code>MANIFEST.tmp</code> &#8594; <code>MANIFEST</code>.</p></li><li><p><code>fsync</code> <code>MANIFEST</code>.</p></li><li><p><code>fsync</code> the parent directory of the MANIFEST.</p></li></ol></li><li><p>Reset the WAL:</p><ol><li><p>Truncate the WAL to zero length.</p></li><li><p><code>fsync</code> the WAL file.</p></li></ol></li></ol><h2>Client &amp; Validation</h2><p>If the server is unavailable, do not fail. 
Retry indefinitely with a short delay (or exponential backoff).</p><p>To assess durability:</p><ul><li><p>Run the client against the same input file (<a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put.txt">put.txt</a>).</p></li><li><p>Stop and restart your database randomly during the run.</p></li><li><p>Your client should confirm that no acknowledged writes were lost after recovery.</p></li></ul><h2>[Optional] Partial WAL Records</h2><p>A crash in the middle of an append can leave a truncated or corrupted record at the end of the WAL. Add a checksum to each WAL record. On startup, verify each record and stop at the first invalid or truncated one, discarding the tail.</p><p>For reference, ScyllaDB checksums segments using CRC32; see its <a href="https://github.com/scylladb/scylladb/blob/master/docs/dev/commitlog-file-format.md">commitlog segment file format</a> for inspiration.</p><h2>[Optional] Dangling Files</h2><p>Regarding the flush process, if the database crashes after step 1 (write the new SSTable) and before step 2 (update the MANIFEST atomically), you may end up with a dangling SSTable file on disk.</p><p>Add a startup routine to delete any SSTable file that exists on disk but is not listed in the MANIFEST. This keeps the data directory aligned with the MANIFEST after a crash.</p><h1>Wrap Up</h1><p>That&#8217;s it for this week! Your storage engine is now durable. On restart, data that was in the memtable is recovered from the WAL. This is made possible by <code>fsync</code> and the atomic update of the MANIFEST.</p><p>Deletion is not handled yet. In the worst case, a lookup for a missing key reads every SSTable, which quickly becomes inefficient.</p><p>In two weeks, you will add a <code>DELETE</code> endpoint and learn how SSTables are compacted so the engine can reclaim space and keep reads efficient.</p><h1>Further Notes</h1><p>In your implementation, you used <code>fsync</code> as a simple &#8220;make it durable now&#8221; button. 
In practice, you have finer control over both <em>what</em> you sync and <em>when</em> you sync.</p><ul><li><p>What: <code>fdatasync</code> (or opening the file with <code>O_DSYNC</code>) persists the data without pushing unrelated metadata, which is usually what you want for WAL appends. You can go further with <code>O_DIRECT + O_DSYNC</code> to bypass the page cache and sync only the data you wrote, but that comes with extra complexity.</p></li><li><p>When: Systems that promise durability let you call a sync primitive after every request, but it is often not the default. Many databases use group commit, which batches several writes into one <code>fsync</code> call to amortize the cost while still providing strong guarantees. For additional information, see <a href="https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html">A write-ahead log is not a universal part of durability</a> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Phil Eaton&quot;,&quot;id&quot;:29407701,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/347f3ad8-fe7e-46da-a04b-f587e1206b56_400x400.jpeg&quot;,&quot;uuid&quot;:&quot;c70ed8a9-fb45-4511-91dc-8346a7eed2eb&quot;}" data-component-name="MentionToDOM">Phil Eaton</span>.</p><p>For example, RocksDB provides options for tuning WAL behavior to meet the needs of different applications:</p><ul><li><p>Synchronous WAL writes (what you implemented this week).</p></li><li><p>Group commit.</p></li><li><p>No WAL writes at all.</p></li></ul></li></ul><p>If you want, you can also explore group commit in your implementation and its impact on durability and latency/throughput, since this series will not cover it later.</p><p>Also, since a WAL adds I/O to the write path, storage engines use a few practical tricks to keep it fast and predictable. 
A common one is to preallocate fixed-size WAL segments at startup to:</p><ul><li><p>Avoid the penalty of dynamic allocation.</p></li><li><p>Prevent write fragmentation.</p></li><li><p>Align buffers for <code>O_DIRECT</code> (an <em><a href="https://man7.org/linux/man-pages/man2/open.2.html">open</a></em><a href="https://man7.org/linux/man-pages/man2/open.2.html">(2)</a> flag for direct I/O that bypasses the OS page cache).</p></li></ul><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://read.thecoder.cafe/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI is getting better every day. Are you? At The Coder Cafe, we serve fundamental concepts to make you an engineer that AI can&#8217;t replace. 
Written by a Google SWE, trusted by thousands of engineers worldwide.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 2]]></title><description><![CDATA[LSM Tree Foundations]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-2</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-2</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 19 Nov 2025 13:02:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yz9h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Monster Scale Summit</h1><p><em>Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). 
It&#8217;s hosted by ScyllaDB, the monstrously fast and scalable database.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png" width="1456" height="182" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:737127,&quot;alt&quot;:&quot;Monster Scale Summit.&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.scylladb.com/monster-scale-summit/?latest_sfdc_campaign=701Rb00000YVkNx&amp;campaign_status=Submitted&amp;utm_campaign=pn%20coder%20cafe%202026-03-11%20monster%20scale%20summit&amp;utm_medium=paid%20newsletter&amp;utm_source=paid%20newsletter&amp;lead_source_type=coder%20cafe&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monster Scale Summit." title="Monster Scale Summit." 
srcset="https://substackcdn.com/image/fetch/$s_!p4cN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 424w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 848w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1272w, https://substackcdn.com/image/fetch/$s_!p4cN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fe5120b-8d18-4055-b1e0-d01af181fde6_2692x336.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div><hr></div><h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yz9h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yz9h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yz9h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2291768,&quot;alt&quot;:&quot;Week 2 LSM Tree Foundations&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174570737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Week 2 LSM Tree Foundations" title="Week 2 LSM Tree Foundations" srcset="https://substackcdn.com/image/fetch/$s_!Yz9h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Yz9h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F166c05d8-f6f2-4a0c-b0f6-3f30340ddcdb_1600x800.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Before delving into this week&#8217;s tasks, it&#8217;s important to understand what you will implement. This week, you will implement a basic log-structured merge-tree (LSM tree).</p><p>At its core, an LSM tree is a data structure that prioritizes write efficiency by trading off some read complexity. It buffers writes in memory and uses append-only files on disk, then rewrites data during compaction. 
It consists of two main components:</p><ul><li><p>A mutable in-memory data structure called a memtable, used to store recent writes.</p></li><li><p>A set of immutable SSTables (Sorted String Table) stored on disk.</p></li></ul><p>Regularly, the current memtable is snapshotted, its entries are sorted by key, and a new immutable SSTable file is written.</p><p>In addition, a MANIFEST file is an append-only list of SSTable filenames. It tells the engine which SSTable files exist and in which order to read them, newest to oldest.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!COEC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!COEC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 424w, https://substackcdn.com/image/fetch/$s_!COEC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 848w, https://substackcdn.com/image/fetch/$s_!COEC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 1272w, https://substackcdn.com/image/fetch/$s_!COEC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!COEC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png" width="1456" height="694" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:694,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113973,&quot;alt&quot;:&quot;Diagram showing how a memtable is periodically flushed to disk as immutable SSTables. The in-memory memtable writes to SSTables 1, 2, and 3 on disk, while a MANIFEST file tracks the existing SSTables and the order in which to read them.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/174570737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram showing how a memtable is periodically flushed to disk as immutable SSTables. The in-memory memtable writes to SSTables 1, 2, and 3 on disk, while a MANIFEST file tracks the existing SSTables and the order in which to read them." title="Diagram showing how a memtable is periodically flushed to disk as immutable SSTables. The in-memory memtable writes to SSTables 1, 2, and 3 on disk, while a MANIFEST file tracks the existing SSTables and the order in which to read them." 
srcset="https://substackcdn.com/image/fetch/$s_!COEC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 424w, https://substackcdn.com/image/fetch/$s_!COEC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 848w, https://substackcdn.com/image/fetch/$s_!COEC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 1272w, https://substackcdn.com/image/fetch/$s_!COEC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d24cf55-591a-4a49-9d5f-e8c988bb07c5_1720x820.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Why LSM trees shine for write-heavy workloads:</p><ul><li><p>Fast writes with sequential I/O: New updates are buffered in memory (memtable) and later written sequentially to disk during a flush (SSTable), which is faster than the random I/O patterns common with B-trees, for example.</p></li><li><p>Decouples writes from read optimization: Writes complete against the memtable, while compaction work runs later (you will tackle that in a future week).</p></li><li><p>Space and long-term efficiency: Compaction processes remove dead data and merge many small files into larger sorted files, which keeps space usage in check and sustains read performance over time.</p></li></ul><p>For the memtable, you will start with a hashtable. In a future week, you will learn why a hashtable is not the most efficient data structure for an LSM tree, but it is a simple starting point.</p><p>For the SSTables, you will use JSON as the data format. Get comfortable with a JSON parser if you are not already.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Assumptions</h2><ul><li><p>This week&#8217;s implementation is single-threaded. 
You will revisit that assumption later.</p></li></ul><h2>Memtable</h2><p>Implement a hashtable to store <code>PUT</code> requests (create or update). You can probably reuse a lot of code from Week 1.</p><h2>Flush</h2><p>When your memtable contains 2,000 entries:</p><ul><li><p>Flush the memtable as a new immutable JSON SSTable file with keys sorted. The SSTable file is a JSON array of objects, each with two fields, <code>key</code> and <code>value</code>. Keys are unique within a file.</p></li><li><p>For example, if your memtable contains the following entries:</p><pre><code>a = hello
b = world
z = 42</code></pre><p>You need to create the following SSTable:</p><pre><code>[
  { &quot;key&quot;: &quot;a&quot;, &quot;value&quot;: &quot;hello&quot; },
  { &quot;key&quot;: &quot;b&quot;, &quot;value&quot;: &quot;world&quot; },
  { &quot;key&quot;: &quot;z&quot;, &quot;value&quot;: &quot;42&quot; }
]</code></pre></li><li><p>Name each file with an incrementing counter, for example <code>sst-1.json</code>, <code>sst-2.json</code>, <code>sst-3.json</code>.</p></li><li><p>After writing the new SSTable, append its filename to the MANIFEST (append only), then clear the memtable:</p><pre><code>sst-1.json
sst-2.json
sst-3.json &lt;- new</code></pre></li></ul><p>For now, the flush is a stop-the-world operation. While the file is being written, do not serve reads or writes. You will revisit that later.</p><h2>Startup</h2><ul><li><p>Create an empty <code>MANIFEST</code> file if it doesn&#8217;t exist.</p></li><li><p>Derive the next SSTable ID from the MANIFEST so you never reuse a filename.</p></li></ul><h2>GET</h2><p>Update <code>GET /{key}</code>:</p><ul><li><p>Check the memtable:</p><ul><li><p>If found, return the corresponding value.</p></li><li><p>If not found, read the MANIFEST to list SSTable filenames:</p><ul><li><p>Scan SSTables from newest to oldest (for example <code>sst-3.json</code>, then <code>sst-2.json</code>, then <code>sst-1.json</code>). Use a simple linear scan inside each file for now. Stop at the first hit and return the corresponding value.</p></li><li><p>If still not found, return <code>404 Not Found</code>.</p></li></ul></li></ul></li></ul><h2>Client &amp; Validation</h2><p>There are no changes to the client you built in week 1. Run it against the same file (<a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put.txt">put.txt</a>) to validate that your changes are correct.</p><h2>[Optional] Negative Cache</h2><p>Keep a small LRU cache of known-absent keys (negative cache) between the memtable and SSTables. This avoids repeated disk scans for hot misses: after the first miss, subsequent lookups are O(1). Remember to evict a key from the negative cache when a <code>PUT</code> writes it; otherwise, once the memtable is flushed, a later <code>GET</code> would wrongly return <code>404 Not Found</code>.</p><p>Other implementation details are up to you.</p><h2>[Optional] MANIFEST Cache</h2><p>Instead of parsing the MANIFEST file for each <code>GET</code> request, you can cache its content in memory and update the cache whenever a flush appends a new SSTable.</p><h1>Wrap Up</h1><p>That&#8217;s it for this week! You have built the first version of an LSM tree: a memtable in memory, SSTable files written by regular flushes, and a MANIFEST that lists those SSTables.</p><p>For now, durability isn&#8217;t guaranteed. 
Data already flushed to SSTables survives a restart, but anything still in the memtable when a crash occurs is lost.</p><p>In two weeks, you will make sure that any request acknowledged to a client remains in your storage engine, even after a restart.</p><h1>Further Notes</h1><p>The flush trigger you used is simple: flush once the memtable contains 2,000 entries. In real systems, flushes can be triggered by various factors, for example:</p><ul><li><p>Some databases flush when the memtable reaches a target size in bytes, ensuring predictable memory usage.</p></li><li><p>A flush can also be triggered after a period of time has passed, because the database eventually needs to release commit log segments. For tables with very low write activity, this can sometimes lead to data resurrection scenarios. Here&#8217;s an old <a href="https://github.com/scylladb/scylladb/issues/14870">issue</a> from the ScyllaDB codebase that illustrates this behavior.</p></li></ul><p>Regarding the data model, this series assumes a simple key&#8211;value model: every <code>PUT</code> stores the whole value, so a <code>GET</code> just finds the newest entry and returns it. If you need a richer model (e.g., rows with many fields or collections), writes are often partial (patches) rather than full replacements. Reads must then reconstruct the result by scanning newest to oldest and merging changes until all required fields are found or a full-write record is encountered.</p><p>Last but not least, in this series, you implicitly rely on client-side ordering: the validation client issues requests sequentially. Production KV databases typically attach a sequence number or a logical timestamp to each write so they can handle out-of-order arrivals and merge and reconcile results. 
Pure wall-clock timestamps are convenient but brittle; see <a href="https://github.com/aphyr/distsys-class?tab=readme-ov-file#clocks">Kyle Kingsbury&#8217;s notes on clock pitfalls</a> for a deeper dive.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
<h1>Sources</h1><ul><li><p><a href="https://www.cs.umb.edu/~poneil/lsmtree.pdf">The Log-Structured Merge-Tree (LSM-Tree)</a> <em>// The original LSM tree whitepaper.</em></p></li><li><p><a href="https://www.scylladb.com/glossary/log-structured-merge-tree">Log Structured Merge Tree - ScyllaDB</a> <em>// LSM tree definition from ScyllaDB <a href="https://www.scylladb.com/technical-glossary/">technical glossary</a>.</em></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine—Week 1]]></title><description><![CDATA[In-Memory Store]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine-1</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine-1</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 05 Nov 2025 13:00:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OzlO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Agenda</h1><ul><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></p></li><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OzlO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OzlO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OzlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2294642,&quot;alt&quot;:&quot;Week 1  In-Memory Store  Week 1  In-Memory Store&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174535414?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Week 1  In-Memory Store  Week 1  In-Memory Store" title="Week 1  In-Memory Store  Week 1  In-Memory Store" srcset="https://substackcdn.com/image/fetch/$s_!OzlO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!OzlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F931eea2b-33a3-4ab5-99b4-9c4ab15dac91_1600x800.png 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>Welcome to week 1 of <em>Build Your Own Key-Value Storage Engine</em>!</p><p>Let&#8217;s start by making sure what you&#8217;re about to build in this series makes complete sense: what&#8217;s a storage engine?</p><p>A storage engine is the part of a database that actually stores, indexes, and retrieves data, whether on disk or in memory. Think of the database as the restaurant, and the storage engine as the kitchen that decides how food is prepared and stored.</p><p>Some databases let you choose the storage engine. For example, MySQL uses InnoDB by default (based on B+-trees). 
Through plugins, you can switch to RocksDB, which is based on LSM trees.</p><p>This week, you will build an in-memory storage engine and the first version of the validation client that you will reuse throughout the series.</p><h1>Your Tasks</h1><p>&#128172; If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server (<code>#kv-store-engine</code> channel):</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.thecoder.cafe/&quot;,&quot;text&quot;:&quot;Join the Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.thecoder.cafe/"><span>Join the Discord</span></a></p><h2>Assumptions</h2><ul><li><p>Keys are lowercase ASCII strings.</p></li><li><p>Values are ASCII strings.</p></li></ul><blockquote><p><strong>NOTE</strong>: Assumptions persist for the rest of the series unless explicitly discarded.</p></blockquote><h2>REST Endpoints to Implement</h2><ul><li><p><code>PUT /{key}</code>:</p><ul><li><p>The request body contains the value.</p></li><li><p>If the key exists, update its value and return success.</p></li><li><p>If the key doesn&#8217;t exist, create it and return success.</p></li><li><p>Keep all data in memory.</p></li></ul></li><li><p><code>GET /{key}</code>:</p><ul><li><p>If the key exists, return <code>200 OK</code> with the value in the body.</p></li><li><p>If the key does not exist, return <code>404 Not Found</code>.</p></li></ul></li></ul><h2>Client &amp; Validation</h2><p>Implement a client to validate your server:</p><ul><li><p>Read the testing scenario from this file: <a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/put.txt">put.txt</a>.</p></li><li><p>Run an HTTP request for each line:</p><ul><li><p><code>PUT k v</code> &#8594; Send a <code>PUT</code> to <code>/k</code> with body <code>v</code>.</p></li><li><p><code>GET k v</code> &#8594; Send a <code>GET</code> to <code>/k</code>. Confirm that <code>v</code> is returned. If not, something is wrong with your implementation.</p></li><li><p><code>GET k NOT_FOUND</code> &#8594; Send a <code>GET</code> to <code>/k</code>. Confirm that <code>404 Not Found</code> is returned. If not, something is wrong with your implementation.</p></li></ul></li><li><p>Each request must be executed sequentially, one line at a time; otherwise, out-of-order responses may fail the client&#8217;s assertions.</p></li></ul><h2>Input File Generation</h2><p>If you want to generate an input file with a different number of lines, you can use this <a href="https://github.com/teivah/thecodercafe/blob/main/res/kv/gen/gen.go">Go generator</a>:</p><pre><code>go run gen.go &lt;format&gt; &lt;lines&gt;</code></pre><ul><li><p><code>&lt;format&gt;</code> is the format to generate.</p></li><li><p><code>&lt;lines&gt;</code> is the number of lines.</p></li></ul><p>At this stage, you need a <code>put</code>-type file. For example, to generate one million lines:</p><pre><code>go run gen.go put 1000000</code></pre><h2>[Optional] Metrics</h2><p>Add basic metrics for latency:</p><ul><li><p>Record the start and end time of each request.</p></li><li><p>Keep a small histogram of latencies in milliseconds.</p></li><li><p>At the end, print <code>p50</code>, <code>p95</code>, and <code>p99</code>.</p></li></ul><p>This work is optional as there is no latency target in this series. However, it can be an interesting point of comparison across weeks to see how your changes affect latency.</p><h1>Wrap Up</h1><p>That&#8217;s it for this week! You have built a simple storage engine that keeps everything in memory.</p><p>In two weeks, we will level up. 
You will delve into a data structure widely used in key-value databases: LSM trees.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZoDz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
]]></content:encoded></item><item><title><![CDATA[Build Your Own Key-Value Storage Engine]]></title><description><![CDATA[Eight Weeks to a Working Key-Value Storage Engine]]></description><link>https://read.thecoder.cafe/p/build-your-own-kv-engine</link><guid isPermaLink="false">https://read.thecoder.cafe/p/build-your-own-kv-engine</guid><dc:creator><![CDATA[Teiva Harsanyi]]></dc:creator><pubDate>Wed, 29 Oct 2025 13:00:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XGLo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Agenda</h1><ul><li><p><strong><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine">Week 0: Introduction</a></strong></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-2">Week 2: LSM Tree Foundations</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-3">Week 3: Durability with Write-Ahead Logging</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-4">Week 4: Deletes, Tombstones, and Compaction</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-5">Week 5: Leveling and Key-Range Partitioning</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-6">Week 6: Block-Based SSTables and Indexing</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-7">Week 7: Bloom Filters and Trie 
Memtable</a></p></li><li><p><a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-8">Week 8: Concurrency</a></p></li></ul><h1>Introduction</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XGLo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XGLo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XGLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2282863,&quot;alt&quot;:&quot;Build Your Own Key-Value Storage Engine&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://read.thecoder.cafe/i/174600320?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Build Your Own Key-Value Storage Engine" title="Build Your Own Key-Value Storage Engine" srcset="https://substackcdn.com/image/fetch/$s_!XGLo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 424w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 848w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 1272w, https://substackcdn.com/image/fetch/$s_!XGLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9db2dd72-a511-4fe4-881f-d7afc7668811_1600x800.png 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>Welcome to <strong><a href="https://read.thecoder.cafe/s/coding-corner">The Coding Corner</a></strong>! This is our new section at The Coder Cafe, where we build real-world systems together, one step at a time.</p><p>Next week, we will launch the first post series: <strong>Build Your Own Key-Value Storage Engine</strong><em><strong>.</strong></em></p><p>Are you interested in understanding how key-value databases work? Tackling challenges like durability, partitioning, and compaction? Exploring data structures like LSM trees, Bloom filters, and tries? Then this series is for you.</p><p><em>Build Your Own Key-Value Storage Engine</em> focuses on the storage engine itself; we will stay single-node.
Topics such as replication and consensus are out of scope. If this format works, though, we may cover them in a future series.</p><p>The structure of each post will be as follows:</p><ol><li><p><strong>Introduction</strong>: The theory behind what you will build that week.</p></li><li><p><strong>Your tasks</strong>: A list of tasks to complete the week&#8217;s challenges. Note that you can complete the series in any programming language you want.</p></li><li><p><strong>Further notes</strong>: Additional perspective on how things work in real systems.</p></li></ol><p>Last but not least, <strong>I&#8217;m delighted to share that this series was written in collaboration with ScyllaDB.</strong> They reviewed the content for accuracy and shared practical context from real systems, providing a clearer view of how production databases behave and the problems they solve. Huge thanks to <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Cynthia Dunlop&quot;,&quot;id&quot;:284826972,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c08d6ae6-6b9f-4742-a79b-70c8e4c365a4_640x960.jpeg&quot;,&quot;uuid&quot;:&quot;e18106ec-ddb0-4747-a765-098fa3af6701&quot;}" data-component-name="MentionToDOM">Cynthia Dunlop</span>, <a href="https://www.linkedin.com/in/felipe-cardeneti-mendes-858b92122">Felipe Cardeneti Mendes</a>, and ScyllaDB.</p><p>By the way, they host a free virtual conference called Monster Scale Summit, and the content is always excellent. If you care about scaling challenges, it&#8217;s absolutely worth registering! Also, if you&#8217;re interested in giving a talk, the CFP closes in two days.</p><p>On a personal note, this has been the most time-consuming project I have done for <em>The Coder Cafe</em>.
I really hope you will enjoy it!</p><p>See you this Friday for a special post for Halloween and next Wednesday for the first post of the series.</p><p>Next: <a href="https://read.thecoder.cafe/p/build-your-own-kv-engine-1">Week 1: In-Memory Store</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://read.thecoder.cafe/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png" width="449" height="224.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:449,&quot;bytes&quot;:82853,&quot;alt&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;title&quot;:&quot;The Coder Cafe: Learn One Concept With Your Coffee.&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://read.thecoder.cafe/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecoder.cafe/i/151119215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Coder Cafe: Learn One Concept With Your Coffee." title="The Coder Cafe: Learn One Concept With Your Coffee." 
srcset="https://substackcdn.com/image/fetch/$s_!ZoDz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!ZoDz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5b47134-fe05-42e3-9aaf-dd2758923a98_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
<p>&#10084;&#65039; <em>If you enjoyed this post, please hit the like button.</em></p>]]></content:encoded></item></channel></rss>