
CodingBlocks

84. Algorithms You Should Know

It’s time we discuss algorithms we all need to know as we continue diving into Rob Conery’s The Imposter’s Handbook while Michael will read anything, Allen questions Greenland’s name, and Joe talks wormholes.

Sponsors

  • Datadog.com/codingblocks – Sign up today for a free 14-day trial and get a free Datadog t-shirt after creating your first dashboard.

Survey Says

What is your most productive season?
  • Spring. Best way to avoid the pollen!
  • Summer. Let me stay behind my keyboard in the nice air conditioned room.
  • Fall. The sky, er, leaves are falling. Safer to stay inside.
  • Winter. Olaf scares me, I'm not going out there.
  • Seasons? We don't have seasons here!!

News

  • As always, we take a moment to give a huge thank you to everyone that left us a review:
    • iTunes: AKeith047, ThinkingWithMyHareBrain, T.Todorov, MeroIT, YourWorstEn3my, fiery_ginger, Emchar86, Joshua Reynvaan, Michael-not-an-outlaw, “Michael Joe Allen, SerialKilla”
    • Stitcher: LiamM, daniel65, CBFan7629045419, Radulan
  • Default interface methods might be coming to C# 8. (InfoQ)
  • We released our second Community Talk, this time discussing RAD Studio. (YouTube)
  • Joe releases more videos adding to the QIT Diary. (YouTube)
  • Allen explains why a cube or data warehouse doesn’t solve the search problems we discussed in episode 83.

Algorithms Made Easy

Bubble Sort

What is it?

  • Iterate through the list to be sorted, comparing adjacent items and swapping them if they are in the wrong order. Repeat this process until a full pass requires no swaps (see the code sketch after this list).
    • Consider the values: 1, 3, 5, 2, 4
    • First compare the 1, 3. They are in the correct order. Then compare the 3 and 5. They are also in the correct order.
    • Now compare the 5 and 2. They need to be swapped. The values are now 1, 3, 2, 5, 4.
    • Now compare the 5 and 4. They also need to be swapped. The values are now 1, 3, 2, 4, 5
    • We need to iterate through the list again. Compare 1 and 3. They’re good. Now 2 and 3. Swap ’em! The values are now 1, 2, 3, 4, 5.
    • Now we iterate over the list again. Nothing needs to be swapped. But we had to do this to verify that.
    • So we’ve sorted the list. It took 3 iterations over the list to rearrange 1, 3, 5, 2, 4 into 1, 2, 3, 4, 5.
  • Best time complexity is O(n) while the average and worst time complexities are O(n^2). The worst space complexity is O(1).
  • Visualizations available at Wikipedia and Toptal.
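
Here’s a minimal Python sketch of the approach above (our own illustration, not from the book – the function name and sorting a copy are our choices):

```python
def bubble_sort(values):
    """Repeatedly swap adjacent out-of-order pairs until a full pass needs no swaps."""
    items = list(values)  # work on a copy so the input is left alone
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items

print(bubble_sort([1, 3, 5, 2, 4]))  # [1, 2, 3, 4, 5]
```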

Strengths

  • Might be the easiest of sorting algorithms to implement. Certainly easy to understand.
    • A case can be made that this makes it a nice introduction to algorithms, although, some argue that better algorithms should be used as an introduction instead, like insertion sort or quicksort.

Weaknesses

  • Might also be one of the worst sorting algorithms with regard to time complexity.
  • Horribly inefficient time-wise; there is almost always a better choice.

When to use it?

  • According to Wikipedia, it has one popular, very specific use in computer graphics, but aside from that, avoid it unless you specifically know why it’s the best choice.
    • Basically when you can *assume* that the list is already “almost-sorted”
    • From Wikipedia: _”In computer graphics bubble sort is popular for its capability to detect a very small error (like swap of just two elements) in almost-sorted arrays and fix it with just linear complexity (2n). For example, it is used in a polygon filling algorithm, where bounding lines are sorted by their x coordinate at a specific scan line (a line parallel to the x axis) and with incrementing y their order changes (two elements are swapped) only at intersections of two lines.”_

Merge Sort

What is it?

  • Probably the most common sort; outperforms most other sorts.
  • Keep splitting your array in half until you can’t anymore – then merge the sorted halves back together (see the sketch after this list).
  • Note: you don’t have to allocate new arrays every time you split, but you probably will end up creating new arrays when you put Humpty back together again.
  • Considered a stable sort.
  • Doesn’t _sound_ more efficient than bubble sort, but the trick is that merging sorted lists is much faster (less repeated work).
  • Best, average, and worst time complexities are O(n log(n)) and the worst space complexity is O(n).
  • Visualizations available at Wikipedia and Toptal.
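
Here’s a rough Python sketch of the split-and-merge idea (our own code – slicing into new lists on every split is the lazy approach; the note above points out you can avoid some of that allocation):

```python
def merge_sort(values):
    """Split until single elements remain, then merge the sorted halves back together."""
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge step: repeatedly take the smaller head of the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in order, which makes the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one half will have leftovers; they're already sorted
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 7, 4, 1, 8, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```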

Strengths

  • Solid performer, great all around routine if you don’t know much about your list going in.
  • In the worst case, merge sort does about 39% fewer comparisons than quicksort does in the average case.

Weaknesses

  • Mildly annoying to program; you generally want two functions – one to split and recurse, and another to merge.
  • Not commonly done in place, because of the merge step.

When to use it

  • This is your default choice.
  • Despite its memory overhead, it scales well with large data because of its “divide and conquer” nature.

Quicksort

What is it?

  • Breaks a larger list into smaller lists with the use of a pivoting technique.
  • The smaller lists are pivoted in turn until they are sorted.
  • The pivot is chosen essentially arbitrarily (here, the last element), which can be a disadvantage for some inputs.
  • Let’s consider this example list (a code sketch follows the walkthrough): 5, 2, 7, 4, 1, 8, 3, 6
    • Take the last number and use it as the partition value: 6 (Lomuto partition scheme).
    • All values less than the partition stay to the left of the partition value, 6.
    • All values more than the partition value move to the right.
    • 5 is less than 6, so it stays where it is.
    • 2 is less than 6, so it stays where it is.
    • 7 is greater than 6, so it needs to move to the right of the pivot. Doing that means the pivot’s eventual position shifts down one, and there’s already a number there, 3, so the 3 moves back to where the 7 was.
    • Now the list is 5, 2, 3, 4, 1, 8, 6, 7.
    • Now we have to look at the 3 that just took the 7’s place, and it’s less than 6, so it can stay there.
    • 4 and 1 are both less than 6, so we’ll leave those alone as well.
    • 8 is more than 6, so it’s going to swap places with the 6 now.
    • Now the list is 5, 2, 3, 4, 1, 6, 8, 7.
    • That’s the end of the first partition sort…
    • Now we’ve got two new lists – the left of the old pivot 6, and the right: 5, 2, 3, 4, 1, _6_, 8, 7.
    • 6 is where it will be in the end, so you leave that one alone for now…
    • 1 is the new pivot point of the left list, 7 is the pivot point of the right list…let’s start with the left list.
    • 5 is greater than 1, so we’re going to move it after 1, which means 4 needs to move where 5 was.
    • So our remaining comparison operations will transition like this as the 1 moves to the left:
      • 4, 2, 3, 1, 5
      • 3, 2, 1, 4, 5
      • 2, 1, 3, 4, 5
      • 1, 2, 3, 4, 5
    • That was it for the left list…
    • Now we need to sort the right list: 8, 7.
    • Which is simple in this case, so the 7 moves and we get: 7, 8.
    • That’s it for the right…
  • Best and average time complexity is O(n log(n)) while the worst is O(n^2). Worst space complexity is O(log(n)).
  • Visualizations available at Wikipedia and Toptal.
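
Here’s a textbook Lomuto-scheme quicksort sketch in Python (our own code – the exact swap order differs a bit from the walkthrough above, but the idea is the same: last element as the pivot, smaller values kept to its left):

```python
def quicksort(items, lo=0, hi=None):
    """In-place quicksort using the Lomuto partition scheme (last element as pivot)."""
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = partition(items, lo, hi)
        quicksort(items, lo, p - 1)  # recurse on the list left of the pivot...
        quicksort(items, p + 1, hi)  # ...and the list to its right
    return items

def partition(items, lo, hi):
    pivot = items[hi]  # take the last number as the partition value
    i = lo
    for j in range(lo, hi):
        if items[j] < pivot:  # smaller values stay to the left of the pivot
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[hi] = items[hi], items[i]  # drop the pivot into its final spot
    return i

print(quicksort([5, 2, 7, 4, 1, 8, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```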

Strengths

  • If you choose your pivot intelligently by finding the median value (which requires a scan of the list first), it will usually outperform merge sort – the sorting happens in place, so the space complexity is lower and fewer operations are needed.

Weaknesses

  • The interesting thing here: if the list is already sorted and you use this version (last element as the pivot), you hit O(n^2) – terrible performance.

Selection Sort

What is it?

  • This is how we might actually sort something, like playing cards for example.
  • Iterate through the list, finding the smallest element, and swap it with the first element. Repeat for each remaining position in the list (sketched in code after this list).
    • Consider the values: 1, 3, 5, 2, 4.
    • Scan the list looking for the smallest value. After going through the list, we determine that 1 is already in the correct position.
    • Scan the list again looking for the next smallest value (which could be equal to 1 by the way). We eventually find 2. Swap 2 and 3. Now the values are 1, 2, 5, 3, 4.
    • Scan the list again looking for the next smallest value and we eventually find 3. Swap 3 and 5. Now the values are 1, 2, 3, 5, 4.
    • Scan the list again and we eventually find 4. Swap 4 and 5 and now our values are 1, 2, 3, 4, 5.
    • We iterated over the list 4 times.
  • Best, average, and worst time complexities are O(n^2) and the worst case space complexity is O(1).
  • Visualizations available at Wikipedia and Toptal.
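
A short Python sketch of the scan-and-swap loop described above (our own illustration – note that the swap is skipped when an element is already in place, which is where the “fewer writes” property mentioned below comes from):

```python
def selection_sort(items):
    """Repeatedly find the smallest remaining value and swap it into position."""
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):  # scan the unsorted remainder
            if items[j] < items[smallest]:
                smallest = j
        if smallest != i:  # only write when needed - few writes is the point
            items[i], items[smallest] = items[smallest], items[i]
    return items

print(selection_sort([1, 3, 5, 2, 4]))  # [1, 2, 3, 4, 5]
```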

Strengths

  • No recursion necessary meaning no stack overflow exceptions to worry about.
  • Good for small lists
  • Can be advantageous in some circumstances, particularly when memory is limited.

Weaknesses

  • Similar to bubble sort, it’s not very efficient. Worst case, the time complexity is O(n^2).
  • We have to scan each element in order to find the smallest value.
  • Not a good choice for large arrays.

When to use it?

  • Good (enough?) for small arrays.
    • But how many items in the array is still considered small? Some say 10-20 elements, others say 20-30 elements …
  • Can result in fewer write operations, which could be important if you’re sorting (and writing) on Flash memory.

Heapsort

What is it?

  • Divide the input into a sorted and an unsorted region, then keep shrinking the unsorted region by moving the largest remaining number out of it.
  • The trick here is that the heap data structure lets you look up the maximum quickly, rather than having to look at every item, which would be O(n).
  • It’s “in place”, meaning low overhead, but it is not stable.
So…what’s a heap?
  • It’s a tree where the parent node has a numerical relationship with its children – the parent is either bigger or equal (max heap) or smaller or equal (min heap).
  • For our purpose here, the main point is that the root of the tree is always bigger than either of its children.

So, let’s revisit – start with an unsorted list, build a binary max heap, cut the root off, then decide which child becomes the new root. Repeat until the heap is empty (see the sketch below).

  • Best, average, and worst time complexities are O(n log(n)) and the worst space complexity is O(1).
  • Visualizations available at Wikipedia and Toptal.
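
Here’s a rough in-place heapsort sketch in Python (our own code): build the max heap bottom-up, then repeatedly “cut the root off” by swapping it to the end and sifting the new root down:

```python
def heapsort(items):
    """In-place heapsort: build a max heap, then repeatedly move the root to the end."""
    n = len(items)
    for i in range(n // 2 - 1, -1, -1):  # build the max heap bottom-up
        sift_down(items, i, n)
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]  # "cut the root off" - largest goes last
        sift_down(items, 0, end)  # restore the heap over the shrunken unsorted region
    return items

def sift_down(items, root, size):
    """Push items[root] down until both children are smaller or equal."""
    while True:
        child = 2 * root + 1  # left child index
        if child >= size:
            return
        if child + 1 < size and items[child + 1] > items[child]:
            child += 1  # pick the larger of the two children
        if items[root] >= items[child]:
            return
        items[root], items[child] = items[child], items[root]
        root = child

print(heapsort([5, 2, 7, 4, 1, 8, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```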

Strengths

  • Better worst case scenario than quicksort – O(n log(n)) vs O(n^2).
  • Can be done in place.

Weaknesses

  • On average, performs worse than quicksort despite the better worst case scenario.

When to use it

  • When you really care about the average sort time and memory usage – heapsort is nice and consistent (its best, worst, and average cases are all the same).

Binary Search

  • The list to be searched has to be sorted first.
  • Divide and conquer algorithm.
  • Keep splitting the list in half and ignore the half you don’t need (see the sketch below).
  • Let’s consider the following list: 1, 3, 5, 7, 9, 11.
    • Let’s say we’re looking for 9.
    • We divide the list, so we have 1, 3, 5 in one list, 7, 9, 11 in the other.
    • 1, 3, 5 can be ignored as we know 9 is greater than all of those. Remember, we know the list is already sorted.
    • 7, 9, 11 gets split again in the middle…but because we can’t evenly split an odd-sized list, we’ll put 7, 9 on the left and 11 on the right.
    • 11 is greater than 9, so we’ll ignore it…
    • We’re just left with 7 and 9.
    • One last comparison and we now know where our item is.
  • Best time complexity is O(1). Average and worst time complexity is O(log n). Worst space complexity is O(1).
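
A small iterative sketch in Python (our own code – real implementations, like Python’s bisect module, typically return an insertion point instead of -1):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it's not there."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1  # ignore the left half
        else:
            hi = mid - 1  # ignore the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 9))  # 4
```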

Graph Traversal

What is it?

  • Graph traversal is the concept of navigating a graph.
    • This gets tricky. We can do this either via recursion or iteratively. If we choose recursion, we risk stack overflow exceptions. So choose wisely.
      • Recall that every recursive call keeps its local variables on the stack until it returns.
  • Consider your family tree. There are a couple ways we could search that tree (er, graph).
  • One way, might be to go as far down one path as possible before going to the next node.
    • Think of researching your ancestry. You might explore the lineage as far back as you can go starting with your mother, before exploring your father’s lineage.
    • This is a Depth First Search.
  • Another search, might be to first search all of the nodes on the same level, before going to the next.
    • Again, considering the family example, you might be more inclined to ask closer family members for money (or help) before going further down the tree. In other words, you might ask a sibling before your parents, or your parents before your grandparents, etc.
    • This is a Breadth First Search.

Depth First Search

What is it?

  • If you take a little bit of time up front, you can save a lot of time retrieving data.
  • You can implement it with recursion or an explicit stack; the stack is nice on memory since you only ever hold about a tree depth’s worth of nodes, and it’s easy to limit your algorithm based on the depth (see the sketch below).
  • You can store a pointer back to your parent to avoid having to track your path in some situations (e.g., a binary search tree), but if you’re in a cyclic graph you need to know the full path.
  • Lots of different types of trees – generally the more balanced the tree is, the better.
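
Here’s an iterative depth-first search sketch in Python using the explicit-stack approach mentioned above. The family graph and node names are our own toy example, purely for illustration:

```python
def dfs(graph, start, goal):
    """Iterative depth-first search with an explicit stack (no recursion limit worries).

    graph is a dict mapping each node to a list of neighbors.
    Returns True if goal is reachable from start.
    """
    stack = [start]
    visited = set()  # needed so cyclic graphs don't loop forever
    while stack:
        node = stack.pop()  # LIFO: always dig deeper before backing up
        if node == goal:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))
    return False

family = {"me": ["mother", "father"], "mother": ["grandma"], "father": [], "grandma": []}
print(dfs(family, "me", "grandma"))  # True
```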

Strengths

  • Simple way of traversing a tree.
  • Better at finding “farther” data points.

Weaknesses

  • Worse at finding closer data points.
  • Some people might mention a problem with infinitely deep trees, but that’s a silly example since you can just as easily imagine a tree that’s infinitely broad.

Breadth First Search

  • From a starting node, traverse the tree level by level, or layer-wise, fully searching each level before moving on to the next.
  • You’ll visit all peers/siblings on a given level before moving to the next level.
  • Need to track each node on a level before going to the next level.
    • Have to track every node and its children in order.
    • Use a queue for storing that info (see the sketch below).
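
And the breadth-first version – the same shape as the DFS sketch, but a FIFO queue (collections.deque) instead of a stack, so each level is exhausted before the next (again, the family graph is our own toy example):

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search: visit every node on a level before moving a level deeper."""
    queue = deque([start])  # FIFO queue, as described above
    visited = {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False

family = {"me": ["mother", "father"], "mother": ["grandma"], "father": [], "grandma": []}
print(bfs(family, "me", "grandma"))  # True
```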

Strengths

  • Good for finding the shortest path between two nodes.
  • According to Wikipedia: _”When applied to infinite graphs represented implicitly, breadth-first search will eventually find the goal state, but depth-first search may get lost in parts of the graph that have no goal state and never return.”_

Weaknesses

  • Can take more time if the node being searched for is further down the tree and/or the tree is wide.
  • Can be memory intensive when the tree is wide.

Resources We Like

Tip of the Week
