How to efficiently find the n’th biggest element in a collection

Looking at std::nth_element the other day, I noticed that it’s complexity is O(n) for a collection of size n, and wondered how that was achieved. A basic implementation of an algorithm to find the ith-biggest element might start by sorting the collection and index to the ith afterwards – complexity O(n*log(n)):

int ith_element_by_sorting( std::vector<int> input, int i )
{
    std::sort( std::begin(input), std::end(input) );
    return input[i];
}

It’s a small step to realise then that you don’t need to sort all n elements of the collection – only the first i elements, so complexity O(n*log(i)):

int ith_element_by_partial_sort( std::vector<int> input, int i )
{
    std::partial_sort( std::begin( input ), std::begin( input ) + i + 1, std::end( input ) );
    return input[i];
}

But the real trick is that you don’t need to do any sorting at all. That’s the approach taken by quickselect, which is the selection sibling to quicksort, and achieves O(n) complexity on average:

int partition( std::vector<int>& values, int left, int right, int pivot )
{
     auto pivotValue = values[ pivot ];

     std::swap( values[pivot], values[right] ); // Move pivot to end
     auto store_pos = left;

     for ( int j = left; j < right; ++j )
     {
         if ( values[j] < pivotValue )
         {
             std::swap( values[ store_pos ], values[j] );
             ++store_pos;
         }
     }

     std::swap( values[right], values[ store_pos ] );  // Move pivot to its final place

     return store_pos;
}

int quickselect( std::vector<int> values, int left, int right, int i )
{
    // Unordered - no need to sort values at all, 
    // instead we recursively partition only the subset of values
    // containing the i'th element, until we have either
    // a) trivial subset of 1
    // b) pivot is moved to exactly the location we wanted 
    while( 1 )
    {
         if ( left == right )
         {
             return values[left];
         }

         // Pick a pivot from middle of values.
         // Better options are a random pivot or median of 3
         auto pivot = (left + right)/2; 

         // Move anything smaller than values[pivot] to the left of pivot,
         // and return updated position of pivot
         pivot = partition( values, left, right, pivot );

         if ( pivot == i )
         {
             return values[i];
         }
         else if ( i < pivot )
         {
             right = pivot - 1;
         }
         else
         {
             left = pivot + 1;
         }
    }
}

int ith_element_by_quickselect( const std::vector<int>& input, int i )
{
    return quickselect( input, 0, input.size()-1, i );
}

int ith_element( const std::vector<int>& input, int i )
{
    if ( i < 0 || i >= input.size() )
    {
        std::ostringstream ss;
        ss << "Input '" << i << "' outside range [0," << input.size() << ")";
        throw std::out_of_range( ss.str() );
    }

    return ith_element_by_quickselect( input, i );
}

Here’s some test code to check that the implementation works:

template <typename F, typename T>
void should_be( T t, F f, const std::string& message )
{
    try
    {
        std::ostringstream ss;

        auto got = f();
        if ( got != t )
        {
            ss << message << " got " << got << ", expected " << t;
            throw std::runtime_error( ss.str() );
        }
        else
        {
            ss << "OK: " << message << ": got " << got;
            std::cout << ss.str() << "\n";
        }
    }
    catch( const std::exception& ex )
    {
        // Report error if either f() threw or we found unexpected value
        std::cout << "ERROR: " << ex.what() << "\n";
    }
}

template <typename F>
void should_throw( F f, const std::string& message )
{
    try
    {
        f();
    }
    catch( const std::exception& ex )
    {
        std::cout << "OK: " << message << ": threw \"" << ex.what() << "\"\n";
        return;
    }

    std::cout << "ERROR: " << message << " should have thrown\n";
}

#define SHOULD_BE( t, expr ) should_be( t, [](){ return expr; }, #expr )
#define SHOULD_THROW( expr ) should_throw( [](){ expr; }, #expr )

void testIthElement()
{
    SHOULD_THROW( ith_element( {}, 0 ) );
    SHOULD_THROW( ith_element( {1,2}, -1 ) );
    SHOULD_THROW( ith_element( {1,2,3}, 3 ) );

    SHOULD_BE( 1, ith_element( {1}, 0 ) );
    SHOULD_BE( 0, ith_element( {0,1,2,3,4,5,6,7,8}, 0 ) );
    SHOULD_BE( 2, ith_element( {0,1,2,3,4,5,6,7,8}, 2 ) );
    SHOULD_BE( 6, ith_element( {5,4,7,6,1,2,0,8,3}, 6 ) );
    SHOULD_BE( 8, ith_element( {5,4,7,6,1,2,0,8,3}, 8 ) );
    SHOULD_BE( 5, ith_element( {5,5,5,5,5,5}, 1 ) );
}

Here’s the output:
QuickSelect

In fact, reviewing old posts on this blog, I found this link that dates back to 2013, when std::nth_element was only required by the standard to have worst-case O(N^2) complexity, with O(N) on average – precisely what you’d get from quickselect. Now though, thanks to introselect, O(N) is possible.

How to efficiently find the n’th biggest element in a collection

Leave a comment Cancel reply

Recent Posts

Top Posts & Pages

Categories

Tags

How to efficiently find the n’th biggest element in a collection

Share this:

Related

Leave a comment Cancel reply

Recent Posts

Top Posts & Pages

Categories

Tags