Wednesday, August 23, 2006

copy_n Looses Input Tokens


Dealing with input streams it is impossible to obtain an iterator which represents the n'th element without consuming the previous n-1 elements. But such an iterator would be needed as end marker for copy. This is where copy_n helps out.

From http://www.sgi.com/tech/stl/copy_n.html
Copy_n is almost, but not quite, redundant. If first [the first parameter] is an input iterator, as opposed to a forward iterator, then the copy_n operation can't be expressed in terms of copy.
Sounds good, but isn't. Using copy_n is cumbersome and error-prone. If your use copy_n the way one IMO expects it should be use you end in losing elements.

#include <ext/algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>

using namespace std;
using namespace __gnu_cxx;

int main()
{
    istringstream in("This is a test string for copy_n! Bye Bye.");
    vector<string> vec;

    // copy four words to the vector
    copy_n(istream_iterator<string>(in), 4, back_inserter(vec));

    /** now we lost one word! **/

    // append a string to the vector
    vec.push_back("oops");

    // append "the rest" of the sentence to the vector
    copy(istream_iterator<string>(in), istream_iterator<string>(), back_inserter(vec));

    // output the vector
    copy(vec.begin(), vec.end(), ostream_iterator<string>(cout, " "));

    return 0;
}

The output of the program is "This is a test oops for copy_n! Bye Bye."
The reason for the missing word is that istream iterators consume elements from the underlaying input stream when they are constructed and every time they are incremented. So the following correct implementation of copy_n results in count + 1 elements being consumed from the input stream.

template <class InputIterator, class Size, class OutputIterator>
OutputIterator copy_n(InputIterator first, Size count, 
                      OutputIterator result)
{
    for ( ; count > 0; --count) {
        *result= *first;
        ++first;
        ++result;
    }
    return result;
}

The element which is consumed by the last call of operator++ is lost. This turns the SGI version of copy_n unusable. I guess thats why the GNU changed the API to return pair<InputIterator, OutputIterator>. So one can rescue the lost element like this:

    // [...]
    pair <istream_iterator <string>, back_insert_iterator <vector <string> > > tmp_it
                =  copy_n(istream_iterator <string> (in), 4, back_inserter (vec));
    vec.push_back(*tmp_it.first);
    // [...]

What a hack! But I don't know any better solution either. The problem arises from the fact that input streams do not behave like other sequences but have side effects when elements are consumed. But nevertheless they are forced to be accessible through a common iterator interface. This must cause problems somewhere...

This issue was reported by Marco Baur. Thanks a lot.

No comments:

Post a Comment