
What is an Algorithm?
“Computer science is the study of algorithms”
Donald E. Knuth,
Author of “The Art of Computer Programming”
An algorithm is the procedure of getting an output for a certain computational problem when given an input.
Simply, it is the correct procedure for solving problems.
This is not necessary to be a programming code. Consider this classic humorous example, shampoo algorithm 🙂
- Lather
- Rinse
- Repeat
This is a loop structure because the last word “Repeat” will repeat the previous two procedures again and again. Following this algorithm will result an empty shampoo bottle soon!
There are a lot of examples in our day-to-day life where we follow algorithms.
An example of application of algorithm in day-to-day life is, using shortcuts! When there’s a path perpendicular to the path you are currently moving and if you want to go to that path, probably you would choose the hypotenuse because you know it is shorter than the sum of the two paths.
Now when it comes to the field of programming, computers always need a very clear, unambiguous procedure to perform an operation. The way of writing code fulfilling this requirement is the real challenge. Learning about programming algorithms will make us more comfortable on the path to achieve it.
One way of solving a problem could be faster and clearer than another. Likewise, one algorithm could be effective than another one according to the given problem.
But how to know which algorithm is better?
We calculate the runtime of an algorithm by just counting the number of steps it takes to perform a certain task.
This should be obvious. Lesser the number of steps, faster the algorithm.
But, there’s a problem. How do we define a single step?
For example, we may consider addition of two numbers as a single step.
1 |
int x = 3 + 4; |
This is fine. But how about multiplication?
1 |
int y = 3*4; |
We know that, multiplication is analogous to adding the same item several times.
1 |
int y = 3 + 3 + 3 + 3; |
So, does multiplication take more time than addition? Yes. It does. But not much!
Another thing. The accessing of memory in RAM (Random Access Memory) is faster than accessing the memory stored in the hard drive. The process of accessing hard drive has several additional steps to perform.
Now, suppose your code has a step which requires the access of an item saved in hard drive. You don’t know this fact. So you count this as a single step as well.
But since this actually consist of multiple steps when running, it would take more time to complete than a normal single step.
So when measuring the runtime of an algorithm in real computers, it would be overwhelming to consider all the tiny details like these.
Hence, we need an alternative approach to find the runtime of algorithms so that we can find which one is better.
RAM Model for Computations
RAM – Random Access Machine (Machine, NOT Memory!) is a hypothetical computer which is used in developing machine independent algorithms.
This model is used to overcome the problems that we have encountered when calculating the run-time of an algorithm.
This model consists of a computer with the following rules:
- All the basic operations
addition (+)
multiplication (*)
subtraction (-)
assignment (=)
if
call
takes exactly one step.
- Loops and methods are not single step operations. This is obvious because, inside a loop or a method, there are several steps to run.
- There is no any time delay in accessing memory. Also we have infinite amount of memory.
So, with the help of RAM, all the problems that we met before were solved!
Best, Worst and Average Case
The code that we type would not follow the same number of steps always.
For example, having a control structure like switch or if statements in the program would make it run different pieces of codes at different times according to its state.
The pseudo code here would run one of the two blocks by considering the state of the variable X. State is the value stored in that variable. If the value stored in X is the word “true”, then block A is executed. Else, if X is “false”, the next block B is executed. (Block is just a collection of several lines of code)
1 2 3 4 |
If (X is True) run Block A; else run Block B; |
We don’t know how many lines in Block A and Block B. If A has 1000 lines and B has 10 lines, A would surely take a longer time than B. This all depends on the value stored in X variable. So if the value of X could change randomly each time the algorithm is running, there would be two possible run-times.
Note that, the code is running on a Random Access Machine. So, there’s no any previous issues that we discussed previously affecting the run-time. Only problem here is the Uncertainty.
There are 3 basic categories of running time of an algorithm.
- Best-Case : Algorithm would run in a minimum number of steps. So, the best case is the fastest one. Takes the least time.
- Average Case : Average value of the run time of an algorithm, for a given number of inputs.
- Worst-case : The highest time taken by an algorithm to run a particular number of inputs.
The number of inputs to an algorithm is a very important factor. When we have more number of inputs, more items will be processed by the algorithm so, the run time depends on this parameter. Hence, we measure the running time of all the 3 different cases for a given number of steps.
The 1st and 3rd cases have a very low probability to occur when the number of inputs to the algorithm is large.
We can represent each of these 3 cases for a particular number of inputs say, n, in a graph as follows:
Think about the next number of inputs, n+1, we can add the three cases for n+1 in the same graph as follows.
Likewise, this can be repeated for very number of inputs, starting from 0. By joining each point of 3 cases separately, we get 3 nice curves.
Thus, we get a numerical function for each case. They are called time-complexity. They give a fine idea on how the running time of a certain algorithm behaves with the increase in the number of inputs, that is, the problem size.
But unfortunately, there are various practical problems. It is hard to formulate a numerical function for these graphs as the function has to map each item of the curve correctly. Even if we build the function, it would be complex, so, algorithm analysis would then be really difficult. To overcome these problems, the “Big O notation” is used. I’ll describe more about this notation in the next article.
References
- The RAM Model of Computation. 2017. The RAM Model of Computation. [ONLINE] Available at: https://www8.cs.umu.se/kurser/TDBA77/VT06/algorithms/BOOK/BOOK/NODE12.HTM. [Accessed 18 April 2017].