Looking at std::nth_element the other day, I noticed that it’s complexity is O(n) for a collection of size n, and wondered how that was achieved. A basic implementation of an algorithm to find the ith-biggest element might start by sorting the collection and index to the ith afterwards – complexity O(n*log(n)):

int ith_element_by_sorting( std::vector<int> input, int i ) { std::sort( std::begin(input), std::end(input) ); return input[i]; }

It’s a small step to realise then that you don’t need to sort all n elements of the collection – only the first i elements, so complexity O(n*log(i)):

int ith_element_by_partial_sort( std::vector<int> input, int i ) { std::partial_sort( std::begin( input ), std::begin( input ) + i + 1, std::end( input ) ); return input[i]; }

But the real trick is that you don’t need to do any sorting at all. That’s the approach taken by quickselect, which is the selection sibling to quicksort, and achieves O(n) complexity on average:

int partition( std::vector<int>& values, int left, int right, int pivot ) { auto pivotValue = values[ pivot ]; std::swap( values[pivot], values[right] ); // Move pivot to end auto store_pos = left; for ( int j = left; j < right; ++j ) { if ( values[j] < pivotValue ) { std::swap( values[ store_pos ], values[j] ); ++store_pos; } } std::swap( values[right], values[ store_pos ] ); // Move pivot to its final place return store_pos; } int quickselect( std::vector<int> values, int left, int right, int i ) { // Unordered - no need to sort values at all, // instead we recursively partition only the subset of values // containing the i'th element, until we have either // a) trivial subset of 1 // b) pivot is moved to exactly the location we wanted while( 1 ) { if ( left == right ) { return values[left]; } // Pick a pivot from middle of values. // Better options are a random pivot or median of 3 auto pivot = (left + right)/2; // Move anything smaller than values[pivot] to the left of pivot, // and return updated position of pivot pivot = partition( values, left, right, pivot ); if ( pivot == i ) { return values[i]; } else if ( i < pivot ) { right = pivot - 1; } else { left = pivot + 1; } } } int ith_element_by_quickselect( const std::vector<int>& input, int i ) { return quickselect( input, 0, input.size()-1, i ); } int ith_element( const std::vector<int>& input, int i ) { if ( i < 0 || i >= input.size() ) { std::ostringstream ss; ss << "Input '" << i << "' outside range [0," << input.size() << ")"; throw std::out_of_range( ss.str() ); } return ith_element_by_quickselect( input, i ); }

Here’s some test code to check that the implementation works:

template <typename F, typename T> void should_be( T t, F f, const std::string& message ) { try { std::ostringstream ss; auto got = f(); if ( got != t ) { ss << message << " got " << got << ", expected " << t; throw std::runtime_error( ss.str() ); } else { ss << "OK: " << message << ": got " << got; std::cout << ss.str() << "\n"; } } catch( const std::exception& ex ) { // Report error if either f() threw or we found unexpected value std::cout << "ERROR: " << ex.what() << "\n"; } } template <typename F> void should_throw( F f, const std::string& message ) { try { f(); } catch( const std::exception& ex ) { std::cout << "OK: " << message << ": threw \"" << ex.what() << "\"\n"; return; } std::cout << "ERROR: " << message << " should have thrown\n"; } #define SHOULD_BE( t, expr ) should_be( t, [](){ return expr; }, #expr ) #define SHOULD_THROW( expr ) should_throw( [](){ expr; }, #expr ) void testIthElement() { SHOULD_THROW( ith_element( {}, 0 ) ); SHOULD_THROW( ith_element( {1,2}, -1 ) ); SHOULD_THROW( ith_element( {1,2,3}, 3 ) ); SHOULD_BE( 1, ith_element( {1}, 0 ) ); SHOULD_BE( 0, ith_element( {0,1,2,3,4,5,6,7,8}, 0 ) ); SHOULD_BE( 2, ith_element( {0,1,2,3,4,5,6,7,8}, 2 ) ); SHOULD_BE( 6, ith_element( {5,4,7,6,1,2,0,8,3}, 6 ) ); SHOULD_BE( 8, ith_element( {5,4,7,6,1,2,0,8,3}, 8 ) ); SHOULD_BE( 5, ith_element( {5,5,5,5,5,5}, 1 ) ); }

Here’s the output:

In fact, reviewing old posts on this blog, I found this link that dates back to 2013, when std::nth_element was only required by the standard to have worst-case O(N^2) complexity, with O(N) on average – precisely what you’d get from quickselect. Now though, thanks to introselect, O(N) is possible.