Understanding Parallelization from Mom’s perspective (Parallelization Part 1)
How many times have you been told to make your code faster, and the first thing that comes to your mind is parallelization? If you’re like me, this blog is for you.
A little story to understand why we need Parallelization.
A chore
Imagine you are 28 and live with your mom. She asks you to buy groceries and put the clothes in the washer. You could go to the market, come back home, put the clothes in the washer, and be done. But that’s slow: you are doing one task at a time. This methodology, where you execute one task, finish it, and only then start the next, is called sequential execution.
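To make that concrete, here’s a tiny sketch in Python (the chore names and sleep times are made up; the sleeps just stand in for time spent on each chore):

```python
import time

def buy_groceries():
    time.sleep(0.1)  # stand-in for the trip to the market
    return "groceries"

def do_laundry():
    time.sleep(0.1)  # stand-in for loading the washer
    return "clean clothes"

# Sequential execution: the second task cannot start until the first
# finishes, so the total time is the sum of the individual task times.
start = time.perf_counter()
results = [buy_groceries(), do_laundry()]
elapsed = time.perf_counter() - start  # roughly 0.1 + 0.1 seconds
```

Nothing clever happens here, and that’s the point: the total time is simply the sum of the parts.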
Also, throughout this blog I’ll talk about uniprocessor computers only, meaning computers that can execute only one task at a time. Imagine booting your system and waiting for it to load the entire desktop. Only after it is done can you move your cursor to an application icon and double-click to open it. While it’s opening, you can’t move your cursor again, because there is no processor time left to handle the cursor movement. (You might have noticed something like this on a Windows laptop. When you have 53 Chrome tabs open, the entire screen freezes and you can’t move your cursor. If you leave the laptop alone for a couple of minutes, the background processes finish executing and there is enough processor time again to handle cursor movement and user input.)
Optimization 1
Now, because you spent a ridiculous amount of money on a computer science degree, you try to optimize your work. You decide to put the clothes in first, then go to the market; by the time you’re back, they will be ready to go into the dryer. That’s how you save time in sequential execution: you optimize your existing algorithm. You try to go from a higher Big O time complexity to a lower one. Say you were looking for an element in an array in O(n) (little silly you, applying linear search), but then you optimized it to O(log n). That’s optimization.
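That O(n) to O(log n) jump is exactly linear search vs. binary search. Here’s a minimal sketch in Python (the example array is my own; the only requirement is that it’s sorted):

```python
def binary_search(sorted_items, target):
    """O(log n): halve the search range each step instead of
    scanning every element the way linear search does."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid            # found it at index mid
        if sorted_items[mid] < target:
            lo = mid + 1          # target must be in the right half
        else:
            hi = mid - 1          # target must be in the left half
    return -1                     # not present

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```

Same answer as linear search, but for a million elements it takes about 20 comparisons instead of up to a million.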
Simply by switching the order and deciding not to complete each task in one go, we save 10 minutes. That’s roughly a 28% improvement.
The same optimization in a uniprocessor system is done by context switching. The system takes a task, runs it a little, moves to the next task, runs that a little, and keeps switching between them. The switching happens so fast that it feels like you are running multiple apps in parallel, even on a uniprocessor system.
There’s just one processor but 4 tasks to run. What can we do? Run the first task for a fixed slice of time (a quantum), then context-switch to the second task for the same slice, and keep doing this for all the tasks until they are finished. There are many ways this can go wrong, but it’s still one of the most famous scheduling algorithms, one you might have studied in college: round robin.
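The scheduler’s bookkeeping for round robin can be sketched in a few lines of Python (the task names and durations below are made up for illustration):

```python
from collections import deque

def round_robin(tasks, quantum):
    """Simulate round robin: run each task for at most `quantum`
    time units, then move it to the back of the queue."""
    queue = deque(tasks.items())  # (name, remaining_time) pairs
    schedule = []                 # order in which slices actually ran
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        schedule.append((name, run))
        if remaining > run:
            # not finished: re-queue with the leftover time
            queue.append((name, remaining - run))
    return schedule

# Two tasks, quantum of 2: A runs 2, B runs 2, then A finishes its last 1.
print(round_robin({"A": 3, "B": 2}, quantum=2))
```

Real schedulers juggle priorities, I/O waits, and the cost of the context switch itself, but the rotate-through-a-queue core is the same.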
But what now?
However, not all algorithms can be optimized. At some point it becomes harder to further optimize your algorithm than to throw more computing power at it. Over the last decades, computing power has grown far faster than engineers’ ability to write better algorithms. So let’s use computing power.
Let’s take the same example again. You have two simple tasks: going to the market and doing the laundry. What if you could make a clone of yourself? A clone so identical even your mom can’t tell you apart. The same physical and mental characteristics, i.e., the same hardware and software access. One of you can then go buy the groceries while the other finishes the laundry.
No context switching is required, and the work is done in almost half the time. If you had 4 tasks and spun up 3 more clones, you could finish the work in one-fourth of the time. The only requirement is that the task must be divisible. Parallelization mostly follows divide and conquer: break your big task into small chunks, then execute each chunk in parallel on multiple processors. That’s it. That’s parallelization.
Whaaaaat. It’s still 25 minutes even after parallelization. Some tasks can’t be sped up just by throwing more computing power at them. But if you take the round-robin example from above, parallelization can improve it.
As you can see from the above image, if your computer has more than one processor, it can execute things in parallel and save time.
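Here’s a minimal sketch of that divide-and-conquer parallelization in Python, using the standard multiprocessing module. Summing a list is just a stand-in for any divisible task; the worker count and data are my own examples:

```python
from multiprocessing import Pool

def chunk_sum(chunk):
    # each worker process ("clone") handles one chunk independently
    return sum(chunk)

def parallel_sum(numbers, workers=4):
    # divide: split the big task into roughly equal chunks
    size = max(1, len(numbers) // workers)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # conquer: each process sums its chunk in parallel, then we combine
    with Pool(workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # 499500, same as sum(range(1000))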
Why can’t we just parallelize everything then?
That’s a very good question. In the beginning, I said we would talk about uniprocessor systems, i.e., computers with only one processor. But that’s not true of real machines anymore. You now have a system a thousand times better than the ones used in the Apollo space missions, so you might as well use it, right?
Even if you have multiple processors, the memory that each process accesses is shared: the same hard disk and the same cache. Isolating each process becomes difficult. One process might change something in the shared resource in a way that makes another process fail.
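A classic way this goes wrong is a lost update. Here’s a deliberately exaggerated sketch in Python using threads: the sleep is artificial, inserted only to force both threads to read the shared value before either writes it back:

```python
import threading
import time

counter = 0  # shared resource both "clones" want to update

def unsafe_increment():
    global counter
    local = counter       # 1. read the shared value
    time.sleep(0.2)       # 2. artificial pause: the other thread reads the stale value here
    counter = local + 1   # 3. write back, clobbering the other thread's update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads read 0 and both write 1, so one increment is lost:
# counter ends up 1 instead of 2.
print(counter)
```

Without some coordination (a lock, for instance), two parallel workers touching the same data can silently lose each other’s work.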
I will explain this more in the next blog.