I Migrated from a Postgres Cluster to Distributed SQLite with LiteFS

Over the last few months, I've been heads-down on building the content for
EpicWeb.dev. And I've been building it all in the open on
my YouTube channel. If you've been watching
my EpicWeb.dev live streams,
you'll know that I've been building the app we'll use to learn web dev: Rocket
Rental 🚀. With that app, I started with SQLite and I've been trying to
decide whether I should move over to Postgres.

When I launched the 2021 update to kentcdodds.com,
I decided to use Postgres on Fly.io because of its support for
multi-regional deploys and both Postgres and Redis clusters with automatic data
replication (I use Redis for caching slow/rate-limited third party APIs and
expensive computations). That proved to work out really well. My site is fast
wherever you are in the world. However, there are a few downsides:

  1. Databases are way outside my wheelhouse (and personal interest).
  2. Running my app locally requires starting up Postgres/Redis via
    docker compose which is fine, but annoying, especially for contributors.
  3. The infrastructure is complicated and can fail occasionally.

Also, when I built and deployed my site, I think I was the first to build and
deploy a distributed Node.js app with a Postgres cluster on the Fly.io network.
I believe the infra has improved, but I was still running on older versions of
things and reliability was a problem (for both Postgres and Redis). I could
definitely upgrade and improve those reliability concerns, but I had another
idea. My site gets enough regular traffic that these reliability issues resulted
in me getting a lot of tweets about it.

So I've been wanting to address these issues, but because I'm heads-down on
EpicWeb, I didn't know how I could justify spending time on my personal site.

Whelp, I started the "Rocket Rental" app with SQLite to keep things simple and
I'd really like to keep that if possible. At the same time, it's important to me
that the app I use to teach web development is realistic. So I was expecting
that eventually I'd move to "a real database" like Postgres. But I've been
hearing rumblings about SQLite being pretty dang capable, so I decided to take
advantage of my personal site's database reliability issues to spend time
investigating SQLite's feasibility for use in my site.

A few months ago, I met Kurt (Fly.io CEO) at
Remix Conf. We talked about a new project he's been
very excited about that I'd heard rumblings about called
Litestream. He suggested that for many categories of
applications, SQLite can provide an even better user experience than Postgres
along with a simplified developer experience. Whenever I hear "better UX and
DX," I'm interested.

Litestream was
originally created to
improve the reliability of SQLite in edge cases (specifically, disaster recovery),
but the way this was done also lends itself well to the idea of distributed
SQLite deployments.

Turns out by the time Kurt and I had our conversation, he'd already hired
Ben Johnson (the creator of Litestream) to
work at Fly. Ben wrote
a great article
explaining what Litestream is. I recommend you give it a read if you're
unconvinced that SQLite is production-ready. He'll convince you it is.

Litestream has had "read-replicas" as a desired future feature for over a year.
Ben joined Fly to work on that and has now released
LiteFS. Architecturally, LiteFS works
similarly to Postgres clusters. One LiteFS node is the "primary" and other nodes
automatically replicate data writes. LiteFS acts as a proxy to your SQLite
database (you connect to LiteFS instead of the underlying database) and then
LiteFS effectively replays data updates to all read replicas. And this replication
happens behind the scenes, typically within 200ms (on par with distributed
Postgres clusters).

LiteFS allows us to get distributed SQLite. As a reminder, the benefit of
distributed databases is that the data for your app can be geographically close
to the users of your app, leading to a performance boost of hundreds of
milliseconds (per request) for many users.

Additionally, SQLite data access is much faster than Postgres. Postgres is "fast
enough" typically, but the faster access means
SQLite suffers much less from N+1 queries
(which should still be avoided, but that's nice).

So with LiteFS, our users get a faster app because we get faster data access
that's distributed all over the world. And on top of that, it's simpler thanks
to the fact we don't have to manage a database server.

So, with that, I realized that it's possible for me to migrate to SQLite on my
site without giving up the distributed nature of my app. Luckily for me, LiteFS
is still in beta at Fly, so the migration went off without a hitch of course (jk
jk, I felt the beta nature of things pretty hard 😅).

My existing app has the following Postgres data model:

datasource db {
  provider = "postgresql"
  url      = env("POSTGRES_DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

enum Role {
  ADMIN
  MEMBER
}

enum Team {
  RED
  BLUE
  YELLOW
}

model User {
  id           String     @id @default(uuid())
  createdAt    DateTime   @default(now())
  updatedAt    DateTime   @updatedAt
  email        String     @unique(map: "User.email_unique")
  firstName    String
  discordId    String?
  convertKitId String?
  role         Role       @default(MEMBER)
  team         Team
  calls        Call[]
  sessions     Session[]
  postReads    PostRead[]
}

model Session {
  id             String   @id @default(uuid())
  createdAt      DateTime @default(now())
  user           User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  userId         String
  expirationDate DateTime
}

model Call {
  id          String   @id @default(uuid())
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt
  title       String
  description String
  keywords    String
  user        User     @relation(fields: [userId], references: [id], onDelete: Cascade)
  userId      String
  base64      String
}

model PostRead {
  id        String   @id @default(uuid())
  createdAt DateTime @default(now())
  user      User?    @relation(fields: [userId], references: [id], onDelete: Cascade)
  userId    String?
  clientId  String?
  postSlug  String
}

By far, the largest table in my database is the PostRead table with nearly
half a million rows. This actually blows my mind because that measures
actual reads of my blog posts, not just page loads. Anyway, there are several
thousand user and session rows and only a few call rows (though those hold a lot
of data).

The amount of data is important because the first thing I needed to think about
was how to migrate the data. There are various established practices for doing
this. I'm not a database guy, so I'm not well versed in them, but I considered
one approach: run both databases at the same time, send writes to both while
still reading from the old one, then switch to reading from the new one and take
down the old one... That sounded... annoyingly
difficult. And unnecessary for my situation. But I did explore it for a while.
It's probably what you'd need to do to make sure you had absolutely zero data
loss in the time it takes to move users to the new version of the app.

In my case, I was fine with a tiny bit of data loss. All that would be lost would
be the data collected and changed in the Postgres database after I copied
everything over to SQLite. I planned on doing the switch over as a DNS switch
immediately after copying over the database so hopefully that would be a pretty
short time.

We'll talk more about how I actually migrated the data itself later.

With that established, I needed to migrate my application from Postgres to
SQLite. Luckily, because I use Prisma, this was pretty simple! Here's the diff
to the schema file:

 datasource db {
-  provider = "postgresql"
+  provider = "sqlite"
   url      = env("DATABASE_URL")
 }

-enum Role {
-  ADMIN
-  MEMBER
-}
-
-enum Team {
-  RED
-  BLUE
-  YELLOW
-}

 model User {
   id           String     @id @default(uuid())
   createdAt    DateTime   @default(now())
   updatedAt    DateTime   @updatedAt
   email        String     @unique(map: "User.email_unique")
   firstName    String
   discordId    String?
   convertKitId String?
-  role         Role       @default(MEMBER)
-  team         Team
+  role         String     @default("MEMBER")
+  team         String
   calls        Call[]
   sessions     Session[]
   postReads    PostRead[]
 }


I had to turn enum types into strings because the SQLite connector doesn't
support enums. Other than that, all I had to do was switch from postgresql to
sqlite in the provider and everything else just worked for the schema.

Of course, this impacted my application a bit due to my reliance on those enums.
I know prisma is working on a feature called "extensions" that would probably
allow me to hook into prisma to do runtime checks to turn those columns into
enums (effectively) so I don't have to update any app code, but I'd rather
not use that experimental feature just yet. So instead, I wrote a few handy
types and utility functions:

type Team = 'RED' | 'BLUE' | 'YELLOW'
type Role = 'ADMIN' | 'MEMBER'
type OptionalTeam = Team | 'UNKNOWN'

const teams: Array<Team> = ['RED', 'BLUE', 'YELLOW']
const roles: Array<Role> = ['ADMIN', 'MEMBER']
const isTeam = (team?: string): team is Team => teams.includes(team as Team)
const isRole = (role?: string): role is Role => roles.includes(role as Role)
const getTeam = (team?: string): Team | null => (isTeam(team) ? team : null)
const getOptionalTeam = (team?: string): OptionalTeam =>
  isTeam(team) ? team : 'UNKNOWN'

With that, I just followed the TypeScript squiggly lines to fix a few places that
were busted (huzzah for TypeScript).
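To show how helpers like these narrow a plain string column back into a union type at the call site, here's a minimal sketch. The UserRow shape and the describeUser function are hypothetical, made up for illustration rather than taken from my codebase.

```typescript
type Team = 'RED' | 'BLUE' | 'YELLOW'
type OptionalTeam = Team | 'UNKNOWN'

const teams: Array<Team> = ['RED', 'BLUE', 'YELLOW']
const isTeam = (team?: string): team is Team => teams.includes(team as Team)
const getOptionalTeam = (team?: string): OptionalTeam =>
  isTeam(team) ? team : 'UNKNOWN'

// Hypothetical row shape: with SQLite the `team` column is just a string.
type UserRow = {firstName: string; team: string}

// Hypothetical consumer: anything unexpected in the column becomes 'UNKNOWN'
// instead of leaking an arbitrary string into the rest of the app.
function describeUser(user: UserRow): string {
  const team: OptionalTeam = getOptionalTeam(user.team)
  return `${user.firstName} is on team ${team}`
}
```

The database no longer enforces the allowed values, so the check has to happen wherever a row is read.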

Database connections

The next thing that added complexity to my setup was updating how the database
connects. To understand why this is so complicated, you need to understand a bit
about the limitations of multi-region databases.

In my Postgres app, my primary region was Dallas (dfw). Let's say you're in
Amsterdam (ams) and you want to update your first name in your KCD profile.
When that POST request comes in, I automatically send a special kind of
response with a header: fly-replay: dfw. Fly will intercept that response
before sending it to you and "replay" that request to the node app running in
dfw.

It does this because the read replicas in all regions other than the primary
region are unable to accept write requests.
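Here's a minimal sketch of the decision that kind of replay logic has to make. The getReplayDecision function and its parameters are hypothetical (in a real app the regions would come from the environment), and I'm assuming the region=<code> value format from Fly's fly-replay documentation rather than the shorthand used in the prose above.

```typescript
type ReplayDecision =
  | {replay: false}
  | {replay: true; headers: Record<string, string>}

// Hypothetical helper: decide whether a request should be replayed elsewhere.
function getReplayDecision(
  method: string,
  currentRegion: string,
  primaryRegion: string,
): ReplayDecision {
  // Read-only methods can be served from the local read replica.
  const isReadOnly = ['GET', 'HEAD', 'OPTIONS'].includes(method.toUpperCase())
  if (isReadOnly || currentRegion === primaryRegion) return {replay: false}
  // Anything that may write has to run next to the primary database.
  return {replay: true, headers: {'fly-replay': `region=${primaryRegion}`}}
}
```

In an express middleware, a replay decision would translate into setting those headers on the response and ending it early instead of calling next().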

Doing this is easy enough because I'm using express, so I just have a simple
middleware to do this for me. Unfortunately I do need to make data writes on
non-POST requests. For example, I like to update your session if you make a
request within 6 months of your session's expiration so you don't have to
re-authenticate if you're a regular reader of the blog. In that case, I have two
options:

  1. Write to the primary region directly (even though you're not in the primary
    region)
  2. Send a fly-replay response so it can be handled by the primary region

At the time I was working on my site originally, option 2 was very difficult. So
I went with option 1. So every region created two Prisma clients: one called
prismaRead (which connected to the local region db) and the other called
prismaWrite (which connected to the primary region db). This worked just fine.

Unfortunately, with this migration to SQLite, option 1 is no longer possible.
SQLite literally only runs locally on the machine and can't be connected to from
the outside world. So now I have new options:

  1. Create special endpoints in my application so ams can call dfw to perform
    these special writes.
  2. Send a fly-replay response so it can be handled by the primary region

I started with option 1 and I almost finished with it. It wasn't really complex
or anything, but it sure was annoying. Then I remembered that when I was writing
my site originally, we didn't have the throw Response feature in Remix. In
Remix, at any point in your action or loader, you can throw a Response
object and Remix will catch that and send the response. This made option 2
much easier. Had we been able to do this before, I would have gone with this
option originally!
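To make the throw Response idea concrete, here's a rough, framework-free sketch. ensurePrimaryRegion and runLoader are hypothetical names: the first is not the litefs-js implementation, and the second only imitates the "catch a thrown Response and send it" behaviour described above. The 409 status and the region=<code> header value are assumptions on my part.

```typescript
// Hypothetical helper: bail out of the current request unless we're in the
// primary region, by throwing a Response carrying the fly-replay header.
function ensurePrimaryRegion(currentRegion: string, primaryRegion: string): void {
  if (currentRegion === primaryRegion) return
  throw new Response('Replaying request in the primary region', {
    status: 409, // assumption: the status itself is not the important part
    headers: {'fly-replay': `region=${primaryRegion}`},
  })
}

// Stand-in for the framework: a thrown Response is treated as the response.
function runLoader(loader: () => Response): Response {
  try {
    return loader()
  } catch (thrown) {
    if (thrown instanceof Response) return thrown
    throw thrown
  }
}
```

Because the helper throws, it can be called from deep inside a loader without threading a return value back up through every function in between.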

So, I made a util for interacting with fly, which was later expanded into an
official library, litefs-js.

With that, I could simply add ensurePrimary() in the few situations where I
needed to do a write on a GET request:

// if there's less than ~six months left, extend the session
const sixMonths = 1000 * 60 * 60 * 24 * 30 * 6
if (Date.now() + sixMonths > session.expirationDate.getTime()) {
  await ensurePrimary()
  const newExpirationDate = new Date(Date.now() + sessionExpirationTime)
  await prisma.session.update({
    data: {expirationDate: newExpirationDate},
    where: {id: sessionId},
  })
}

There really aren't many situations where a mutation on a GET makes sense so
this works great. All the user experiences is a slightly longer response time
in these rare situations.
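The window arithmetic in that snippet is easy to get wrong, so here's the same check pulled out into a small pure function. The names SIX_MONTHS_IN_MS and shouldExtendSession are hypothetical, and "six months" is approximated as 180 days, matching the multiplication in the snippet.

```typescript
const DAY_IN_MS = 1000 * 60 * 60 * 24 // 86,400,000 ms
const SIX_MONTHS_IN_MS = DAY_IN_MS * 30 * 6 // 15,552,000,000 ms (180 days)

// Hypothetical helper: true when the session expires within the next ~6 months,
// which is the only case where a GET request needs to trigger a write.
function shouldExtendSession(nowMs: number, expirationDate: Date): boolean {
  return nowMs + SIX_MONTHS_IN_MS > expirationDate.getTime()
}
```

Taking the current time as a parameter instead of calling Date.now() inside keeps the check deterministic, which makes it trivial to test.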

With that in place, I was able to remove all the prismaWrite and prismaRead
clients and just have a single prisma client. Ultimately this simplified things
for me quite a bit.

To take things further, the fly team is currently working on making it so
LiteFS will handle this for me. They'll effectively allow writes to
non-primary DBs. So hopefully in the near-ish future I can remove all this from
my codebase. This will be a significant improvement. There's a trade-off here
that I'll need to evaluate (I think it basically means that all writes are slow,
even those to the primary region). Trade-offs everywhere!


Another thing I've been wanting to do is get rid of Redis. Remix uses SQLite to
cache its documentation pages and I realized I could do the same thing with my
blog posts and the third party APIs I cached. This would significantly simplify
my architecture further by reducing the number of services my app depends on
(Redis had definitely caused outages on my site as well).

When I built my site, I created an abstraction for turning a function into a
"cachified" version of itself called... wait for it... cachified.
Hannes Diercks asked whether he could extract my
code for that and package it up as a library. I said yes and
here it is. So I decided to take this
opportunity to delete my own implementation and use Hannes' improved library.

At first, I planned on just adding a Cache table to my SQLite database as part
of my prisma schema. But then I realized that I would run into issues with not
being able to write to the cache in non-primary regions. Replaying requests to
do that would kind of defeat the purpose of the cache (which is to make things
faster), so I decided to use a separate SQLite database for the cache. Thanks to
better-sqlite3 (which is COMPLETELY
SYNCHRONOUS!), this was pretty simple.

For the most part, moving from my own cachified to Hannes' version went very
well. There were only a few different (better) option names and I needed to
write a SQLite adapter:

export const cache: CachifiedCache = {
  name: 'SQLite cache',
  get(key) {
    const result = cacheDb
      .prepare('SELECT value, metadata FROM cache WHERE key = ?')
      .get(key)
    if (!result) return null
    return {
      metadata: JSON.parse(result.metadata),
      value: JSON.parse(result.value),
    }
  },
  set(key, {value, metadata}) {
    cacheDb
      .prepare(
        'INSERT OR REPLACE INTO cache (key, value, metadata) VALUES (@key, @value, @metadata)',
      )
      .run({
        key,
        value: JSON.stringify(value),
        metadata: JSON.stringify(metadata),
      })
  },
  async delete(key) {
    cacheDb.prepare('DELETE FROM cache WHERE key = ?').run(key)
  },
}

Now I can ditch another service and simplify my architecture even further! One
drawback to this, however, is that I don't get auto-replication like I did when
using Redis (fly managed that for me). So every region has to manage its own
cache which is not a huge deal, but it's something to keep in mind. Maybe one
day when LiteFS supports writing to all regions, I'll move this into LiteFS as
well.

The double-thick wall

Unfortunately, I hit a bit of a wall after I finished this part of the migration.
MDX pages wouldn't display. Anything that used MDX wouldn't work at all. This
meant not only blog posts or mdx pages, but also any page that has blog post
recommendations (so most pages). It took me quite some time debugging this to
figure out that MDX compilation was causing me trouble because there were
actually two problems and one of them only happened when deployed to my staging
environment 😱

The first issue was I happened to be using the same cache key for two functions
🙈 For various reasons I don't want to get into, they actually do end up storing
the same data. The reason this migration to the new cachified package broke
this is that the cachified library does a good job of making sure you only
get a fresh value for a particular key once and all other requests for it wait
on that first one. So I ended up getting into a situation where Function A would
trigger getting a fresh value which called Function B which asked for that same
key in the cache so it hung forever.
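Here's a small sketch of how that kind of hang can happen. getFresh is a hypothetical, heavily simplified stand-in for the "only fetch a fresh value once per key" behaviour, not the cachified implementation, and functionA/functionB are made-up call sites.

```typescript
const inFlight = new Map<string, Promise<unknown>>()

// Hypothetical dedupe: concurrent callers for the same key share one promise.
function getFresh<T>(key: string, getFreshValue: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key)
  if (pending) return pending as Promise<T>
  // Defer the call so the pending entry is registered before the getter runs.
  const promise = Promise.resolve()
    .then(getFreshValue)
    .finally(() => {
      inFlight.delete(key)
    })
  inFlight.set(key, promise)
  return promise
}

// Two call sites that (mistakenly) share one key. A's getter calls B, and B is
// handed A's own pending promise, so A ends up waiting on itself.
const functionB = () => getFresh('shared-key', async () => 'value from B')
const functionA = () => getFresh('shared-key', () => functionB())
```

With distinct keys the nested call would start its own fetch and everything would settle. With a shared key, the promise returned by functionA() never resolves or rejects.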

If that's confusing, that's because it was. Don't worry, it's not important that
you understand the problem. Just know this: when dealing with a cache, make sure
not to share a cache key between two different places that would create the
cached value.

The second issue was extremely frustrating because I couldn't reproduce it
locally. This meant I had to deploy my site to make and verify changes. I couldn't
use my preferred debugging tools (I could only use console.log and watch the
logs). I made use of patch-package to add logs
to libraries I was using. I added a crazy number of logs to
@remix-run/server-runtime only to determine that it was working perfectly.

I added a resource route to the app so I could run only the offending code (to
reduce logs and just because that's generally a good practice during debugging
to scope down the search area). I eventually found that the issue lay in MDX
compilation again. I added logs to mdx-bundler and found that execution
stopped when starting to actually compile the MDX with esbuild.

Finally I realized it was one very specific blog post that caused the hang up. It
was my "Migrating to Jest" blog post. I realized I
needed to figure out what part of that post was causing issues. So I updated my
GitHub fetching code so it could fetch from the right branch (so I could make
content changes in the dev branch and not main where people would be
confused about my experiments in production). With that in place, I chopped up
the post into 10 sections. Each had a part of the post.

I hit the special resource route for each of those sections. I found it was in
the last section. I further split that into 6 more sections and found the
problem was a specific tweet: the tweet at the end of the post where I said
"PayPal is hiring!", which also linked to a PayPal job listings page that is no
longer working. My custom twitter embed remark plugin performs a GET request
to links within tweets so I can display metadata like the
title/description/image for the page. Unfortunately because the page wasn't
working (it just hangs forever), it caused compilation to hang forever which
caused most pages on my site to hang forever as well.

Phew! What a pain! So I added a timeout for compilation and I removed the tweet
from the post. There's probably more I could do to improve the resilience of
this code as well 😅
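For reference, a generic timeout wrapper can be sketched with Promise.race. withTimeout is a hypothetical helper and not necessarily how the timeout on my site is implemented. Note that it only stops waiting on the slow work. It doesn't cancel it.

```typescript
// Hypothetical helper: reject if `promise` hasn't settled within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    )
  })
  // Whichever settles first wins; always clear the timer afterwards so it
  // doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

Wrapping the compile step this way turns "hangs forever" into an error that can be caught, logged, and rendered as a failed page instead of a stuck request.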

Deploying LiteFS

With that bug worked out, my site was humming along just fine in a single
region. So I was ready to figure out how to set up LiteFS to get multi-region
deploys.

Because LiteFS is still in beta, documentation is still a bit sparse. Even
worse, somehow I missed all but the example documentation page. So I was
stumbling around in the dark quite a bit. Eventually I figured out which files I
needed and where. I got help from Ben directly in a Zoom call during one of my
streams. I did run into a number of issues that went back as feedback to the
Fly team. Ben was quite responsive and helpful. I eventually did get it working.

One thing I love about LiteFS is that it doesn't infect my app code all over the
place. The only thing I need to worry about is that read replicas are read-only
(for now), so I have to do the fly-replay stuff. I'm really looking forward to
when I won't have to even do that and it really will just be a bit of
configuration that doesn't impact my app code one bit.

I had my app working in a staging environment for a while. Once I felt like that
was working well in multiple regions I decided to start the migration. So I put
together a little list of things I needed to do:

  1. Open up logs on everything
  2. Double-check volumes in kcd (den is primary)
  3. Double-check memory allocation in kcd (use 2GB during migration, can drop
    to 1GB later)
  4. Double-check env vars in kcd
  5. Merge to main - trigger a deploy
  6. Prepare DNS switch (don't run it)
  7. SSH into the server and run npx /app/prisma-postgres/migrate.ts
  8. Quick manual quality check
  9. Run DNS switch
  10. Pray
  11. Add new regions
    (fly vol create --size 3 -a kcd --region {ams,maa,syd,gru,hkg}) and scale
    (fly scale count -a kcd 6)
  12. Verify that writes in non-primary regions are working
  13. Delete prisma postgres stuff (the migrate script and stuff in the dockerfile)
  14. Enable workflow quality control (turn back on ESLint, TypeScript, Testing,
    etc.)

Everything went great. As I write this, I'm running the migration script. Turns
out, with almost half a million records, using a naïve approach to migration
takes quite some time. It would probably be much faster if I were smarter with
databases, but here's what my script did:

import {PrismaClient as SqliteClient} from '@prisma/client'
// eslint-disable-next-line import/no-extraneous-dependencies
import {PrismaClient as PostgresClient} from '@prisma/client-postgres'

// TIP: do not do this if you have lots of data...
async function main() {
  const pg = new PostgresClient({
    datasources: {db: {url: process.env.POSTGRES_DATABASE_URL}},
  })
  const sq = new SqliteClient({
    datasources: {db: {url: process.env.DATABASE_URL}},
  })
  await pg.$connect()
  await sq.$connect()

  console.log('connected 🔌')

  await upsertUsers()
  await upsertSessions()
  await upsertPostReads()
  await upsertCalls()

  console.log('✅  all finished')

  await pg.$disconnect()
  await sq.$disconnect()

  async function upsertUsers() {
    console.time('users 👥')
    const users = await pg.user.findMany()
    console.log(`Found ${users.length} users. Upserting them into SQLite ⤴️`)
    for (const user of users) {
      // eslint-disable-next-line no-await-in-loop
      await sq.user.upsert({where: {id: user.id}, update: user, create: user})
    }
    console.timeEnd('users 👥')
  }

  async function upsertSessions() {
    console.time('sessions 📊')
    const sessions = await pg.session.findMany()
    console.log(
      `Found ${sessions.length} sessions. Upserting them into SQLite ⤴️`,
    )
    for (const session of sessions) {
      // eslint-disable-next-line no-await-in-loop
      await sq.session.upsert({
        where: {id: session.id},
        update: session,
        create: session,
      })
    }
    console.timeEnd('sessions 📊')
  }

  async function upsertPostReads() {
    console.time('postReads 📖')
    const postReads = await pg.postRead.findMany()
    console.log(
      `Found ${postReads.length} post reads. Upserting them into SQLite ⤴️`,
    )
    for (let index = 0; index < postReads.length; index++) {
      if (index % 100 === 0) {
        console.log(`Upserting ${index}`)
      }
      const postRead = postReads[index]
      if (!postRead) {
        console.log('HUH??? No post read??', index)
        continue
      }
      // eslint-disable-next-line no-await-in-loop
      await sq.postRead
        .upsert({
          where: {id: postRead.id},
          update: postRead,
          create: postRead,
        })
        .catch(err => {
          console.error('error', err, postRead)
        })
    }
    console.timeEnd('postReads 📖')
  }

  async function upsertCalls() {
    console.time('calls 📞')
    const calls = await pg.call.findMany()
    console.log(`Found ${calls.length} calls. Upserting them into SQLite ⤴️`)
    for (const call of calls) {
      // eslint-disable-next-line no-await-in-loop
      await sq.call.upsert({where: {id: call.id}, update: call, create: call})
    }
    console.timeEnd('calls 📞')
  }
}

main().catch(e => {
  console.error(e)
  process.exit(1)
})
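One common way to speed up a row-at-a-time copy like this is to write in batches. The sketch below is an unbenchmarked illustration, not what the script above did: chunk and migrateInBatches are hypothetical helpers, and the writeBatch callback is where a real script might wrap a batch of upserts in a single Prisma $transaction call.

```typescript
// Hypothetical helper: split a list into consecutive batches of `size` items.
function chunk<T>(items: ReadonlyArray<T>, size: number): Array<Array<T>> {
  if (size < 1) throw new Error('size must be at least 1')
  const batches: Array<Array<T>> = []
  for (let index = 0; index < items.length; index += size) {
    batches.push(items.slice(index, index + size))
  }
  return batches
}

// Hypothetical driver: hand each batch to `writeBatch` in order and return
// how many items were written in total.
async function migrateInBatches<T>(
  items: ReadonlyArray<T>,
  size: number,
  writeBatch: (batch: Array<T>) => Promise<void>,
): Promise<number> {
  let written = 0
  for (const batch of chunk(items, size)) {
    await writeBatch(batch)
    written += batch.length
  }
  return written
}
```

Batches still run one after another, which keeps memory use and database load predictable. The potential saving comes from paying the per-write overhead once per batch instead of once per row.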

There's more interesting stuff that happened here. If you're interested in
watching the migration, then tune in
here.

The whole thing took about an hour and fifteen minutes to run the migration
script. But once that was finished, everything else went off pretty much without
issue.

"Went off pretty much without issue" that is... until I realized I had a
hard-core memory leak... Read more about that in
Fixing a Memory Leak in a Production Node.js App.

Only time will tell whether this was a good idea or not. I'm pretty confident
that it will be. Just moving from three services to one is quite nice. Local
development is simpler thanks to just using SQLite as well.

And I'm hopeful that the site will be noticeably faster. I may add real numbers
to this section of the post once the new site gets some more traffic.

Once I fixed the memory leak I decided to try deploying to multiple regions. I
did and it's fantastic. Now, if you're outside the US, you'll get a much faster
kentcdodds.com because my app and data are side-by-side in a region as close to
you as possible. That's just great.
