VitaminAlgo

VitaminAlgo

Problem Decomposition, TradeOffs, Concept Map:  
Kth Largest Element in a Stream

Problem Decomposition, TradeOffs, Concept Map: Kth Largest Element in a Stream

Anup Kalburgi's photo
Anup Kalburgi
ยทFeb 21, 2022ยท

5 min read

Table of contents

  • Let's dive into the problem
  • Solution Exploration
  • Concept Map
  • Code for Optimum Solution
  • Questions ๐Ÿค”

Previously, in the blogs, problem solving has involved implementing what is obvious from the problem description and then optimizing the solution based upon the identifying constraints.

This time we will explore how to decompose the apparent problem into parts, where each part involves multiple ways to solve and optimizing a solution. As we are decomposing the problem, we will understand the trade-offs between optimizing the different parts.

During the process, understanding the sort of concepts we are connecting to build a concept map which can be further used as a pattern for related problems.

Let's dive into the problem

Design a class to find the kth largest element in a stream. Note that it is the kth largest element in the sorted order, not the kth distinct element.

Problem Description in Detail here.

image.png

Solution Exploration

As it is a stream of elements, we would start with some state of collection of elements and when we add a new element, we look for Kth largest element at that state.

We would break this implementation down into two functions

  • init() : which initializes the collection and Kth variable
  • add() : takes an element as input and returns Kth Largest element

Bruteforce

Screen Shot 2022-02-20 at 11.34.03 AM.png

  • init() is called only once for object initialization, we assume an Array / List. That will take O(c) ~ O(1) time complexity and O(c) space due fixed size of collection.
  • add() is getting called every element in a stream.
    • To find Kth Largest , we would iterate k times the array to compare the elements, which is in worst case O(k*n) time complexity. When k == n, that is O(n^2) time complexity.
    • Space complexity is O(n) [Original init() c==n] + O(n) [k==n].

Sort and Find Kth

Screen Shot 2022-02-20 at 11.34.12 AM.png

  • init() is called only once for object initialization, we assume an Array / List. Complexity same as above.
  • add() is getting called every element in a stream.
    • To find Kth largest , we first sort the array of n elements ,at any given point, which has time complexity of O(nlogn) and space O(n).
    • Once array is sorted, finding Kth largest is O(1) time complexity.

Optimize both Sorting and Finding Kth

In previous approach, we have init() function and find kth problem optimized to constant time complexity. Only sorting is now taking longer O(nlogn), is there a solution where sorting can be optimized?

  • The best case for a sorting algorithm out there is O(nlogn).
  • As this a collection of stream, and at any given point, we add only a new single element to the collection, is there a way to know where this new element fits into the collection?

Screen Shot 2022-02-20 at 11.34.24 AM.png

Right! After sorting once maybe in the init() or add(), we can keep maintaining a sorted array by finding position for this new element from the stream getting added using search methods.

  • Linear search and comparing will perform worst time complexity of O(n), but we can improve the searching using Two Pointer O(n) (some search will be reduced compare to linear search, in the average case) or even Binary Search O(logn).
  • As in the problem, we only care for Kth largest, we do not need to maintain all the elements coming in for a stream, which can be million/billions of elements. We can maintain only the top k elements , reducing the space complexity to O(k), improving further the searching methods , as shown in the diagram above.

Trade Offs Made

So far, we have only tried to optimize the function add(), keeping init() function with constant time and linear space complexity.

Can we make some trade-off here?

  • init() is basically call only once for object initialization.
  • add() is getting called every element in a stream.

Is there a solution which can make init() little slower but add() faster than O(nlogn)?

  • Instead of using array / list in init(), we can explore data structure which will arrange the incoming elements in stream in such a way that finding kth element is now faster.

Data Structure Exploration

What data structure would keep the order of the element in largest or smallest sequence and look up ( finding kth in the order) is optimum than array / list ? Ans : Priority Queue

There are two types of Priority Queue Screen Shot 2022-02-20 at 11.34.29 AM.png

Max Heap

We could build a max heap which will keep the largest element on the root node. To find the Kth largest, we would have to deepcopy the PQ, then pop the root node for k times to get the Kth largest.

Min Heap

We could build a min Heap which will keep the smallest element on the root node. As we do not care for smallest element, we do not need extra space to deep copy and simply start popping the elements until the tree size == k. At this point, the root node is the Kth largest element.

We can further optimize, to only maintain k elements in the PQ and a new element when added to PQ size > k, we can throw the smallest element on the root node, which is O(1) operation. The size of the PQ tree remains k elements.

Concept Map

Putting everything together, we have now build the concept map.

For problems like finding Kth Largest/Smallest in sorted, unsorted, or almost sorted array, now we can see a pattern and use this concept map of problem solving!

kthLargestConceptMap.png

Code for Optimum Solution

class KthLargest {

    private PriorityQueue<Integer> minHeap = new PriorityQueue<>((a,b) -> a-b);
    private int capacity ;

    public KthLargest(int k, int[] nums) {
        capacity= k;
        for(int num:nums){
            minHeap.offer(num);
            if(minHeap.size() > capacity){
                minHeap.poll();
            }
        }
    }

    public int add(int val) {
        minHeap.add(val);
        if(minHeap.size() > capacity){
                minHeap.poll();
        }
        return minHeap.peek();
    }
}

Questions ๐Ÿค”

  • Constraints to this problem guaranteed that there will be at least k elements in the array when you search for the kth element.
  • In the event the constraint wasn't guaranteed, what would you have to consider in your implementation?

Author : Rasika.Warade

ย 
Share this