Why can’t you just put Promise.all() everywhere — especially while accessing databases (Parallelization Part 2)

Aashish Peepra
9 min readApr 7, 2024

--

If you ever have been in a relationship, I’m sure you would be in a situation where you made just too many promises and now it’s getting hard to keep them. The problem is, that it seems that all of them can be executed independently and are almost mutually exclusive tasks. But you didn’t consider how these tasks might get affected by each other due to shared resources. You will need to solve the relationship problem yourself but I can help you understand the computer version of this.

P.S.: This one is a tough read. So if you just woke up and this is your first email, I suggest you grab the coffee first.

Let’s look at some code first

 async syncInvoiceIndexForAllRootFiCompanies(): Promise<boolean> {
const rootFiCompanies = await this.prisma.rootFiCompany.findMany({
where: {
corporationId: {
gt: 0,
},
},
include: {
corporation: true,
},
});

const invoiceSyncPromises = (
await Promise.all(rootFiCompanies.map(async rootFiCompany => await this.setInvoicesToDatabaseFuture(rootFiCompany.id, rootFiCompany.corporation.companyId)))
).flat();

try {
await this.prisma.$transaction(invoiceSyncPromises);
return true;
} catch (err) {
console.error(err);
}

return false;
}

Problem: There are some companies in the database. Each company has some invoices in a third-party solution. We need to sync those invoices against the companies in our database.

The code does the following

  1. Gets all the companies from the database
  2. For each company fetch all the invoices from a third-party solution like QuickBooks. For each invoice, create a promise to write that invoice in our database. I will rephrase this. Let’s say there are 100 invoices for each company, this code will create 100 promises, and each promise to write one invoice in the database.
  3. Then we collect all these promises and put them inside a transaction. Hoping that if we fail to write any single invoice correctly, the whole thing will blow and we will get an alert. Then we can debug and handle it accordingly.

A couple of things to consider even before we move forward:

  1. This is bad from a business perspective. We should sync each company separately. This way if the data for one company’s invoice is corrupted, the rest of the companies will have no sync issues. Hence the code should be modularized at the company level with alerting added accordingly.
  2. What happens if the user has disconnected their accounts? What happens if an invoice is deleted from the original system?
  3. I think Sharma Ji’s grandma can catch her pug before this code is completed. It’s slow. It’s bad. It’s ugly and you know what’s the best part. I wrote it. It looks well abstracted and guess what, that’s the problem. This was a corn job in our system at Commenda that ran every 3 hours. The task itself took half an hour to complete and to be honest, I’ve never seen it complete 100%.

Let’s try to fix it. Let’s re-write it together

  1. Let’s write a function that syncs one company at a time.
  async syncInvoiceForRootFiCompany(company: RootFiCompany): Promise<boolean> {
return true;
}

2. Let’s get all the invoices for this company and just this company.

  async syncInvoiceForRootFiCompany(company: RootFiCompany): Promise<boolean> {
try {
const invoices = await this.getInvoices(company.id);
} catch (err) {
console.error("Failed to load invoices from third party for ", company.id, err);
// trigger an alert
return false;
}

return true;
}

3. Generate the promises to update the invoices in our database.

async syncInvoiceForRootFiCompany(company: RootFiCompany): Promise<boolean> {
let invoices: InvoicesResponse;
try {
invoices = (await this.getInvoices(company.id))?.invoices;
} catch (err) {
console.error("Failed to load invoices from third party for ", company.id, err);
// trigger an alert
return false;
}

const invoiceSyncPromises = invoices.data.map(invoice => this.setInvoiceToDatabase(invoice, company.id, company.corporationId))?.flat();

return true;
}

4. Let’s write the invoices in a single transaction to the database.

  async syncInvoiceForRootFiCompany(company: RootFiCompany): Promise<boolean> {
let invoices: InvoicesResponse;
try {
invoices = (await this.getInvoices(company.id))?.invoices;
} catch (err) {
console.error("Failed to load invoices from third party for ", company.id, err);
// trigger an alert
return false;
}

const invoiceSyncPromises = invoices.data.map(invoice => this.setInvoiceToDatabase(invoice, company.id, company.corporationId))?.flat();

try {
await this.prisma.$transaction(invoiceSyncPromises);
} catch (err) {
console.error("Failed to write invoices to th database for ", company.id, err);
// trigger an alert
return false;
}

return true;
}

P.S: Check how to setup production ready logging system

The setInvoiceToDatabase function is an abstraction for now. We will expose it in a second. This will also show why abstractions at the wrong level are useless if not worse.

5. Call this function for each company in parallel from the root caller (corn job).

  async syncInvoicesForAllCompanies(): Promise<void> {
const rootFiCompanies = await this.prisma.rootFiCompany.findMany({
where: {
corporationId: {
gt: 0,
},
},
include: {
corporation: true,
},
});

await Promise.all(rootFiCompanies.map(company => this.syncInvoiceForRootFiCompany(company)));
}

Well, this is not the only way to do task 5. It depends on your requirements. For example what if you don’t put an await on your Promise.all() and let all the companies sync in the background? The alerting already exists on per company level and if something goes wrong you will be notified nonetheless of the result of your await Promise.all().

But I prefer having an await so this function could then be used in different parts of the app. Imagine someone else uses your code and writes await syncInvoicesForAllCompanies() and finds out, it has a race condition inside and it doesn’t wait for the entire sync process to happen. So it’s better to think about your function from a user’s perspective.

Conclusion: It was still dead slow. Also, I encountered something interesting. I encountered a database lock error and the code bursts.

It’s time to learn more about the setInvoiceToDatabase function.

I keep saying that for each invoice that we get from the third-party solution, we will create a record in our database. There’s one more thing along with an invoice, the customer who owns the invoice. Now customers can be repeated across invoices. This means, there could be one customer who has 100 invoices.

The setInvoiceToDatabase function returns an array of the database write promises. Elements in that array are a write to an upsert customer, upsert invoice, and upsert line items (products sold in the invoice).

  async function setInvoiceToDatabase(invoice: InvoicesResponse["data"][number], companyId: number, rootFiCompanyId): Prisma.PrismaPromise<any>[] {
const contactUpsert: Prisma.RootFiContactUpsertArgs = {
where: {
id: invoice.contact.data.rootfi_id.toString(),
},
create: {},
update: {}
};

if (invoice.contact.data.addresses?.data?.length > 0) {

}

const invoiceUpsertQuery: Prisma.RootFiInvoiceUpsertArgs = {
where: {
id: invoice.rootfi_id.toString(),
},
create: {},

update:{},
};

if (invoice?.addresses?.data?.length > 0) {

}

let prismaMutations: Prisma.PrismaPromise<any>[] = [this.prisma.rootFiInvoice.upsert(invoiceUpsertQuery)];
if (invoice?.line_items?.data?.length > 0) {
prismaMutations = [...prismaMutations, ...this.getLineItemUpsertMutations(invoice?.rootfi_id?.toString(), invoice.line_items)];
}

return prismaMutations;
}

Some parts of the code have been deleted so I don’t expose our entire database model. If you’re from the Commenda engineering team, wassup guys, and don’t forget to approve my PRs!

So now we have one transaction with thousands of invoice nested writes. That also writes customers and line items. The biggest redundancy is the line items and then the customers. A company can have one thousand invoices, but only a couple of SKUs, and customers often repeat.

If you still don’t know what’s wrong. I will write a brief paragraph to explain. Look at this table for starters

The first query starts and it says, does the database have an invoice with ID 1001, it’s a miss. So it creates a new invoice. Does it have the customer C001, it was a miss so it created the customer. Then does the database have SKU123 and SKU456, it was a miss, so both products are created.

To execute the first row, we did 4 writes (1 invoice, 1 customer, 2 products).

In the second query, there’s no invoice with ID 1002, and no customer with ID C002, but SKU456 already exists inside the transaction scope. So it will update it. The SKU data is still the same, but it is still going to update all the fields in the SKU table. Why? Because it’s an upsert query.

To execute the second row, we did 4 writes (1 invoice, 1 customer, 1 SKU creation, and 1 SKU update).

Similarly, to execute the third row. We created a new invoice 1003. But we updated all the fields for the customer table where the row ID is C001 for no reason. Hence again 4 write operations.

P.S: If at the beginning of the article, you thought, why not create each invoice parallelly, well this is the reason. The current abstraction couldn’t support it because of the nested upsert queries.

Do you come to see the redundancy now? Hold that thought for a second.

Let’s talk about the latter problem. The Locks

When does a lock error happen? Why will there be a database lock error in the first place? How to fix it?

Race Conditions (Quick Recap)

Let’s say you are writing a bank server. It should be able to do 2 simple things. Transfer money between accounts and let you withdraw it. Let’s focus on the first one.

You wrote a simple script that reads the bank balance, adds X amount to it, and updates it back to the database.

But what if Alice and Bob, both send some X and Y amounts at the precisely same time?

This can result in unexpected behavior. In the scenario described, the result might be that the balance reflects only one of the transactions, rather than both. For example, if both transactions are processed simultaneously, the balance might end up being $40 (the original balance plus Bob’s $10), rather than the expected $52 (original balance plus both Alice’s $12 and Bob’s $10). This is called a race condition. To prevent this we use Locking.

Locking (Very Quick Recap)

I won’t get into detail here. This is Computer Science 101. But locking lets only one user update one field in the database at any given time. So no matter if Bob and Alice sent the money at the same time. The database will execute the enter transaction for Alice first, only once the transaction is completed, it will unlock the record and let Bob update it. Operating systems internally also use locking while executing parallelization. This prevents the parallel processes (tasks) from overwriting each other’s results.

Why the heck did we get lock errors in our code?

It’s been a month since I debugged this. So I don’t remember the exact reason. But it was a function of parallelly creating different resources with the same IDs across different companies and the spaghetti upsert queries. But it was a fun one to debug. I will try to reproduce it again and update this section in the future.

Okay, so what’s the solution?

The problem was the upsert queries. The solution was simple. Find all the unique customers first. Then upsert all the customers. Find all the unique invoices first then upsert the invoices. Find all the unique line items and then upsert and connect those line items with the invoices.

Break the nested upserts, into truly independent tasks. By doing this, you’re slowly making them mutually exclusive and closer to parallelizable.

  async setInvoicesToDatabaseFuture(rootFiCompanyId: number, companyId: number): Promise<boolean> {
const invoices = await this.getInvoices(rootFiCompanyId);

try {
// Order matters here
await this.prisma.$transaction([
// This handles the case where the rootFi invoices are deleted from quickbook but persistent in our system
this.prisma.rootFiInvoice.deleteMany({
where: {
rootFiCompanyId,
},
}),
this.prisma.rootFiContact.deleteMany({
where: {
rootFiCompanyId,
},
}),
]);

// upsert all the customers first
await this.prisma.$transaction(this.getContactUpsertMutations(invoices.invoices, rootFiCompanyId).contacts.map(mutation => this.prisma.rootFiContact.upsert(mutation)));

// upsert all the invoices
await this.prisma.$transaction(
this.getInvoiceUpsertMutations(invoices.invoices, rootFiCompanyId, companyId).invoices.map(mutation => this.prisma.rootFiInvoice.upsert(mutation)),
);

// Then upsert the line items and connect it to the invoices
await this.prisma.$transaction(this.getLintItemsUpsertMutation(invoices.invoices).lineItems.map(mutation => this.prisma.rootFiLineItem.upsert(mutation)));

return true;
} catch (err) {
console.error(err);
return false;
}
}

This is what the final code ended up looking like. This function updated the invoice for one company at a time. But it further breaks the task into upserting customers, invoices, and line items.

If you want to re-write the above code for improvement by doing the following, then my friend you learned nothing.


await Promise.all([
await this.prisma.$transaction(this.getContactUpsertMutations(invoices.invoices, rootFiCompanyId).contacts.map(mutation => this.prisma.rootFiContact.upsert(mutation))),
await this.prisma.$transaction(
this.getInvoiceUpsertMutations(invoices.invoices, rootFiCompanyId, companyId).invoices.map(mutation => this.prisma.rootFiInvoice.upsert(mutation)),
),
await this.prisma.$transaction(this.getLintItemsUpsertMutation(invoices.invoices).lineItems.map(mutation => this.prisma.rootFiLineItem.upsert(mutation)));
])

Outcome

What are you getting out of this? Well, nothing. We did do a quick recap on some basic parallelization principles to jog your memory. We saw you can’t run around throwing Promise.all() everywhere like a maniac and that was the title of the blog. We also saw how your code should consider business logic and give it a higher priority than simply writing a beautiful piece of complex nothingness.

So next time, think about will one of the processes in your promise fuck up another process silently and result in a Saturday spent on debugging.

Read Part 1 of Parallelization Series 👇

--

--

Aashish Peepra

Software Engineer @commenda-eng | 100K+ people used my websites | Mentored 28K+ Developers globally | Google Solution challenge global finalist