Category Archives: C++ Code

How to efficiently find the n’th biggest element in a collection

Looking at std::nth_element the other day, I noticed that it’s complexity is O(n) for a collection of size n, and wondered how that was achieved. A basic implementation of an algorithm to find the ith-biggest element might start by sorting the collection and index to the ith afterwards – complexity O(n*log(n)):

int ith_element_by_sorting( std::vector<int> input, int i )
{
    std::sort( std::begin(input), std::end(input) );
    return input[i];
}

It’s a small step to realise then that you don’t need to sort all n elements of the collection – only the first i elements, so complexity O(n*log(i)):

int ith_element_by_partial_sort( std::vector<int> input, int i )
{
    std::partial_sort( std::begin( input ), std::begin( input ) + i + 1, std::end( input ) );
    return input[i];
}

But the real trick is that you don’t need to do any sorting at all. That’s the approach taken by quickselect, which is the selection sibling to quicksort, and achieves O(n) complexity on average:

int partition( std::vector<int>& values, int left, int right, int pivot )
{
     auto pivotValue = values[ pivot ];

     std::swap( values[pivot], values[right] ); // Move pivot to end
     auto store_pos = left;

     for ( int j = left; j < right; ++j )
     {
         if ( values[j] < pivotValue )
         {
             std::swap( values[ store_pos ], values[j] );
             ++store_pos;
         }
     }

     std::swap( values[right], values[ store_pos ] );  // Move pivot to its final place

     return store_pos;
}

int quickselect( std::vector<int> values, int left, int right, int i )
{
    // Unordered - no need to sort values at all, 
    // instead we recursively partition only the subset of values
    // containing the i'th element, until we have either
    // a) trivial subset of 1
    // b) pivot is moved to exactly the location we wanted 
    while( 1 )
    {
         if ( left == right )
         {
             return values[left];
         }

         // Pick a pivot from middle of values.
         // Better options are a random pivot or median of 3
         auto pivot = (left + right)/2; 

         // Move anything smaller than values[pivot] to the left of pivot,
         // and return updated position of pivot
         pivot = partition( values, left, right, pivot );

         if ( pivot == i )
         {
             return values[i];
         }
         else if ( i < pivot )
         {
             right = pivot - 1;
         }
         else
         {
             left = pivot + 1;
         }
    }
}

int ith_element_by_quickselect( const std::vector<int>& input, int i )
{
    return quickselect( input, 0, input.size()-1, i );
}

int ith_element( const std::vector<int>& input, int i )
{
    if ( i < 0 || i >= input.size() )
    {
        std::ostringstream ss;
        ss << "Input '" << i << "' outside range [0," << input.size() << ")";
        throw std::out_of_range( ss.str() );
    }

    return ith_element_by_quickselect( input, i );
}

Here’s some test code to check that the implementation works:

template <typename F, typename T>
void should_be( T t, F f, const std::string& message )
{
    try
    {
        std::ostringstream ss;

        auto got = f();
        if ( got != t )
        {
            ss << message << " got " << got << ", expected " << t;
            throw std::runtime_error( ss.str() );
        }
        else
        {
            ss << "OK: " << message << ": got " << got;
            std::cout << ss.str() << "\n";
        }
    }
    catch( const std::exception& ex )
    {
        // Report error if either f() threw or we found unexpected value
        std::cout << "ERROR: " << ex.what() << "\n";
    }
}

template <typename F>
void should_throw( F f, const std::string& message )
{
    try
    {
        f();
    }
    catch( const std::exception& ex )
    {
        std::cout << "OK: " << message << ": threw \"" << ex.what() << "\"\n";
        return;
    }

    std::cout << "ERROR: " << message << " should have thrown\n";
}

#define SHOULD_BE( t, expr ) should_be( t, [](){ return expr; }, #expr )
#define SHOULD_THROW( expr ) should_throw( [](){ expr; }, #expr )

void testIthElement()
{
    SHOULD_THROW( ith_element( {}, 0 ) );
    SHOULD_THROW( ith_element( {1,2}, -1 ) );
    SHOULD_THROW( ith_element( {1,2,3}, 3 ) );

    SHOULD_BE( 1, ith_element( {1}, 0 ) );
    SHOULD_BE( 0, ith_element( {0,1,2,3,4,5,6,7,8}, 0 ) );
    SHOULD_BE( 2, ith_element( {0,1,2,3,4,5,6,7,8}, 2 ) );
    SHOULD_BE( 6, ith_element( {5,4,7,6,1,2,0,8,3}, 6 ) );
    SHOULD_BE( 8, ith_element( {5,4,7,6,1,2,0,8,3}, 8 ) );
    SHOULD_BE( 5, ith_element( {5,5,5,5,5,5}, 1 ) );
}

Here’s the output:
QuickSelect

In fact, reviewing old posts on this blog, I found this link that dates back to 2013, when std::nth_element was only required by the standard to have worst-case O(N^2) complexity, with O(N) on average – precisely what you’d get from quickselect. Now though, thanks to introselect, O(N) is possible.

Leave a comment

Filed under C++, C++ Code, Programming

Tech Book: Effective Modern C++, Scott Meyers

EffectiveModernC++When Scott Meyers announced his retirement from C++ duties, I thought I’d get a copy of his latest book. There’s plenty in it to interest even seasoned C++ developers – as always, Scott’s insight and in-depth examples are worth studying. I’ve tried out most of the C++11 features, but still learnt a lot working through this book.

Uniform Initialisation and Auto
Meyers points out that compared to parentheses or just plain “=”, braced initialisation can be used in the most contexts, avoids his famous “C++’s most vexing parse”, and will issue compiler errors if narrowing occurs. However, for types that can take a std::initialiser_list in their constructor, the compiler is required to favour std::initialiser_list over any other interpretation. That’s a deterrent to use of braced initialisation with types that have a std::initializer_list constructor – and also precludes using auto with braced initialisation.

auto x = { 4 }; // type of x is inferred to be std::initializer_list&lt;int&gt;!

How to use the new “= delete” idiom
Meyers recommends declaring deleted methods as public to get better error messages from compilers that check for accessibility before deleted status. He also points out that you can declare non-member functions as deleted (I’d only seen examples of copy-constructor/copy-assignment operator as deleted before).

Use of noexcept on move operations
This one is a classic example of a combination of language features having an unexpected effect. std::vector::push_back offers the strong exception guarantee. It can achieve this using any copy constructor and being sure not to change the original contents of the vector until any new elements or copies have been constructed. Hence, move constructors are only preferred if it is exception-safe to do so – which means they must be marked as noexcept (and if that isn’t appropriate, you just won’t get move semantics pushing instances of your type into a std::vector).

Recursion with constexpr
Having toyed with template meta-programming in the past, this use of constexpr appeals to me:

constexpr int pow( int b, int exp ) noexcept
{
    return (exp == 0 ? 1 : b * pow( b, exp-1 ));
}
constexpr auto eight = pow( 2, 3 );

void testConstExpr()
{
    std::cout &lt;&lt; &quot;2^3 = &quot; &lt;&lt; eight &lt;&lt; &quot;\n&quot;;
}

It’s much more succinct than the equivalent TMP version, and still performs the calculation at compile time.

Summary
In the past, I’ve recommended Effective C++ for developers who have little experience of the language and wish to improve their skills. I think this book is a bit too advanced for that, particularly given the chapters on template type deduction and decltype being in the first part of the book! So read this book if you’ve got a few years’ experience of C++ already, but look at Effective C++ (3rd Edition) to improve on the basics.

Four stars

Leave a comment

Filed under C++, C++ Code, Programming, Tech Book

How to write generic, duck-typing code in F#

When writing generic code in C++, it’s easy to imply constraints in the generic type when developing a template. For example:

template<class T> 
class Announcer{
public:
	static void announce( const T& t )
	{
		std::cout << "And now from " <<  t.country << ", we have: ";
		bool first = true;
		for ( auto person : t.people )
		{
			std::cout << person;
			if (!first) std::cout << "\n";
			first = false;
		}
	}
};

Here, we’re assuming that T has two properties, ‘country’ and ‘people’, with the latter being enumerable (i.e. supports begin/end methods and iterators). This is one of the attractions of C++ for generic code – although, in the future, concepts will enable you to document the constraints in order to make it easier for developers to use your library, and for the compiler to give clearer errors.

What about something similar in F#? I found Tomas Petricek’s post from StackOverflow very useful.

type IHasCountryProperty =
    abstract Country : string

type Announcer<'T when 'T :> IHasCountryProperty>( item : 'T) =
    member this.Announce() =
        printfn "And now the party from %s" item.Country

let inline countrifiedItem<'T when 'T : (member Country : string)> (item : 'T) =
    {
        new IHasCountryProperty with
            member this.Country = (^T : (member Country : string) item)
    }

type London() =
    member this.Country = "England"

// Use it
let city = London()
Announcer( countrifiedItem city ).Announce()

The Announcer class uses a member constraint with statically resolved type parameters which bind to the Country property. Then, the inline static function ‘countrifiedItem’ uses static member constraints to capture the existing Country property on any other type, even if like ‘London’ it doesn’t support the IHasCountryProperty. The consensus seems to be that this approach is encouraged because it documents the requirements on ‘T in an interface.

Leave a comment

Filed under C++, C++ Code, F#, Programming

C++: std::make_array (N4315)

I read in May 2015’s CVU that there’s a proposal for std::make_array, a utility method in the same family as std::make_tuple and std::make_pair.

This would be a useful shorthand:

// Existing usage
std::array<double, 3> a = { 1.1, 2.2, 3.3 };

// Proposed usage
auto a = std::make_array( 1.1, 2.2, 3.3 };

The obvious benefit is that you would no longer need to specify the number of elements in the array, making the code more maintainable.

Leave a comment

Filed under C++, C++ Code

How to achieve pattern matching in C++ using boost::regex

Suppose you are parsing text input and need to handle the string representation of a field that could have several formats. In that case, using regular expressions is appealing because you can try a number of patterns sequentially (taking the most likely first) and exit when you get a match.

When I tried to do this in C++, I found it hard to find an easy example to follow. Here’s the way I’ve been doing it, based on a simple example to parse a length that could be in any of several units of measure:

    #include <boost/algorithm/string_regex.hpp>

    void parse( const std::string& candidate )
    {
        boost::smatch value;

        if ( boost::regex_search( candidate, value, boost::regex( "(.*)ft(.*)in" )) )
        {
            auto feet = atol( value[1].str().c_str() );
            auto inches = atof( value[2].str().c_str() );
            std::cout << "Matched " << feet << " feet and " << inches << " inches\n";
        }

        if ( boost::regex_search( candidate, value, boost::regex( "(.*)m(.*)cm" )) )
        {
            auto metres = atol( value[1].str().c_str() );
            auto centimetres = atof( value[2].str().c_str() );
            std::cout << "Matched " << metres << " metres and " << centimetres << " centimetres\n";
        }

        if ( boost::regex_search( candidate, value, boost::regex( "([0-9]+)mm" )) )
        {
            auto millimetres = atol( value[1].str().c_str() );
            std::cout << "Matched " << millimetres << " millimetres\n";
        }

        throw std::runtime_error( (boost::format( "Failed to match candidate '%1%' with ft/in, m/cm or mm" ) % candidate).str() );
    }

This only uses a fraction of what can be done with regex – the point is to show how to use boost::regex. One gotcha is that, as per the code above, the string matches that are written into value are indexed from 1 – for some reason, the zero’th index accesses the whole candidate expression. Another gotcha is that boost::regex is one of the few boost libraries that isn’t just implemented using templates in a header file, so you also have to add the appropriate .lib into the linker inputs.

1 Comment

Filed under C++, C++ Code, Programming

How to generate tests under GTest

One of the many features provided by GTest is the ability to generate test at runtime. One useful application of this is that you can execute data-driven testing by obtaining a list of test file names, then generating a test for each of them. It’s actually very simple to do this in GTest – they call these value-parameterized tests (see documentation). Here’s a snippet of code to demonstrate how little overhead is involved:

#include <gtest/gtest.h>

using namespace testing;

// placeholder - could be used for test setup/cleanup
class MyTest : public ::testing::TestWithParam<std::string>
{
};

TEST_P( MyTest, IntegrationTest )
{
    std::string test_file_name = GetParam();

    // open file and run the integration test
}

std::vector<std::string> test_file_names();

INSTANTIATE_TEST_CASE_P( MyLibraryName, MyTest, test::ValuesIn( test_file_names() ) );

On one hand, this approach isn’t really in the spirit of unit test – data-driven tests have a habit of quickly escalating to integration tests between libraries. However, such integration tests also have their place in a testing strategy, and this is a neat way to accomplish it with minimal overhead.

1 Comment

Filed under C++, C++ Code, Programming

How to get file names under a folder in C++

As per my earlier post How to transform between a double date-time and std::string in C++, some things are surprisingly hard in C++, especially compared to writing in a .NET language. I recently tracked down this useful snippet of code on the internet in order to recursively build a list of files matching some file extension under a folder. This only compiles on Windows – if you want something cross-platform, look at boost (I wanted to avoid bringing the boost filesystem library into my project as yet another dependency).

void accumulate_files(
  const std::string& folder,
  const std::string& extension,
  std::vector<std::string>& file_names )
{
  std::ostringstream oss;
  oss << folder << "\\";
  std::string search_path = oss.str();

  WIN32_FIND_DATA fd;
  HANDLE hFind = ::FindFirstFile( search_path.c_str(), &fd );
  if( hFind != INVALID_HANDLE_VALUE)
  {
    do
    {
      std::string file_name = fd.cFileName;
      std::ostringstream full_file_name;
      full_file_name << folder << "\\" << file_name;

      if ( boost::algorithm::ends_with( file_name, extension ) )
      {
        file_names.push_back( file_name );
      }
      else if ( file_name != "." && file_name != ".." && !file_name.empty() )
      {
        // Recursively call into next directory
        accumulate_files( full_file_name.c_str(), extension, file_names );
      }
    }
    while( ::FindNextFile(hFind, &fd) );

    ::FindClose(hFind);
  }
}

For production code, you’d also want to wrap the file handle in a smart pointer to ensure it gets closed properly for exception safety.

4 Comments

Filed under C++, C++ Code, Programming

How to emulate C++11’s ‘enum class’ in C++03

One of many neat features in C++11 is the introduction of ‘enum class’ – this addresses problems arising from enum declarations such as scoping (the enum cases leak in the same scope as the enum type) and implicit conversion to integer. See Stroustrup’s C++11 FAQ for more details.

However, in the papers that motivated the new language feature (N1513 and N2347) they discuss the current workarounds for C++03. One such workaround is to define a class to represent the enum type – it’s much more verbose than the new C++11 feature, but it solves the scoping and conversion issues.

// Header file
class Weekday
{
private:
  // Note that the private enum cases have underscores to differentiate
  // from the public cases
  typedef enum Weekday_ { Mon_, Tues_, Wed_, Thurs_, Fri_, Sat_, Sun_ };
  Weekday_ value_;

public:
  static const Weekday Mon, Tues, Wed, Thurs, Fri, Sat, Sun;
  explicit Weekday( const Weekday_& value ) : value_( value ){}

  bool operator<( const Weekday& rhs ) const { return this->value_ < rhs.value_; }
  bool operator==( const Weekday& rhs ) const { return this->value_ == rhs.value_; }
  bool operator!=( const Weekday& rhs ) const { return !(this->operator==(rhs)); }
};

// Source file
// Definitions for the public Weekday instances 
// in terms of the private enum
const Weekday Weekday::Mon( Mon_ );
const Weekday Weekday::Tues( Tues_ );
const Weekday Weekday::Wed( Wed_ );
const Weekday Weekday::Thurs( Thurs_ );
const Weekday Weekday::Fri( Fri_ );
const Weekday Weekday::Sat( Sat_ );
const Weekday Weekday::Sun( Sun_ );

That’s it – I’ve been using this recently and it works pretty well. I also added a “to_int()” member and a static “from_int()” member to explicitly convert bewteen the enum class and integers – the implementation is simply a switch over the enum cases.

8 Comments

Filed under C++, C++ Code, Programming

How to transform between a double date-time and std::string in C++

One of the attractions of writing software using the .NET framework is the wealth of support for doing simple things like translating between different data formats. These tasks are typically much harder to achieve in C++ due to the lack of an equivalent framework. One such task that I came across the other day is that date-times are often represented by a double in Windows, where the integer part represents the date since some epoch and the fractional part is the time as a fraction of 24 hours. Even with access to the Boost library, I still had to do some work to produce a simple transformation in C++.

#include <boost/format.hpp>
#include <boost/date_time/gregorian/gregorian.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

typedef double DateTime;

namespace
{
  boost::gregorian::date parse_date( DateTime date_time )
  {
      boost::gregorian::date dt = boost::date_time::parse_date<boost::gregorian::date>( "1899-12-30", boost::date_time::ymd_order_iso );
      dt += boost::gregorian::date_duration( static_cast<long>( floor(date_time) ) );
  }

  boost::posix_time::time_duration parse_time( DateTime date_time )
  {
    double fractionalDay = date_time - floor(date_time);
    long milliseconds = static_cast<long>( floor( fractionalDay * 24.0 * 60.0 * 60.0 * 1000.0 + 0.5) );
    return boost::posix_time::milliseconds( milliseconds );
  }
}

std::string to_date_string( DateTime date_time )
{
  boost::gregorian::date dt = parse_date( date_time );
  return (boost::format( "%4-%02d-%02d" ) % dt.year() % dt.month().as_number() % dt.day().as_number()).str();
}

DateTime from_date_string( const std::string& value )
{
  boost::gregorian::date epoch = boost::date_time::parse_date<boost::gregorian::date>( "1899-12-30", boost::date_time::ymd_order_iso);
  boost::gregorian::date dt = boost::date_time::parse_date<boost::gregorian::date>( value, boost::date_time::ymd_order_iso);

  boost::gregorian::date_duration diff = dt - epoch;
  return diff.days();
}

std::string to_date_time_string( DateTime date_time )
{
  boost::gregorian::date date_part = parse_date( date_time );
  boost::posix_time::time_duration time_part = parse_time( date_time );

  long long fractional_seconds = time_part.fractional_seconds();
  boost::date_time::time_resolutions resolution = time_part.resolution();
  if ( resolution == boost::date_time::micro )
  {
    fractional_seconds /= 1000;
  } 
  else
  {
    if (resolution != boost::date_time::milli)
      throw std::logic_error( "Unexpected time resolution" );
  }

  return (boost::format( "%d-%02d-%02d %02d:%02d:%02d.%03d" )
    % date_part.Year() % date_part.month().as_number() % date_part.day().as_number()
    % time_part.hours() % time_part.minutes() % time_part.seconds() % fractional_seconds ).str();
}

DateTime from_date_time_string( const std::string& value )
{
  DateTime date = from_date_string( value );
 
  boost::posix_time::ptime t = boost::posix_time::time_from_string( value );
  double milliseconds = static_cast<double>(t.time_of_day().total_milliseconds());

  return date + (milliseconds / 24.0 / 60.0 / 60.0 / 1000.0);
}

Please comment if you know a more straight-forward way to achieve this transformation, especially using Boost. Syntactically, the code could be simplified using C++11 auto, but I’ve spelt out the types explicitly throughout because I found it helpful to see which parts of the boost library are being used.

1 Comment

Filed under C++, C++ Code, Programming

Going Native 2013 – ‘new’ STL algorithms

I’ve watched a few of the Going Native 2013 videos now and, whilst I was impressed by Herb Sutter’s games written in Cinder, I particularly like the slide and gather algorithms demonstrated by Sean Parent in his C++ seasoning presentation:

template<typename Iter>
auto slide(Iter begin, Iter end, Iter target) -> std::pair<Iter,Iter>
{
    if (target < begin) return std::make_pair( target, std::rotate(target, begin, end) );
    if (end < target) return std::make_pair( std::rotate( begin, end, target ), target );
    return std::make_pair( begin, end );
}

template<typename Iter, typename Pred>
auto gather(Iter begin, Iter end, Iter target, Pred predicate) -> std::pair<Iter,Iter>
{
    auto start = std::stable_partition( begin, target, std::not1(predicate) );
    auto finish = std::stable_partition( target, end, predicate );
    return std::make_pair(start, finish);
}

Whilst I haven’t used std::rotate or std::stable_partition, I’m sure to use these two algorithms built on top of them, probably because their behaviour is so much easier to describe.

Leave a comment

Filed under C++, C++ Code, Programming, Video