Saturday, October 13, 2007

std::vector


std::vector<T> is more intrusive than one might expect. Besides the default and copy constructor T must have an assignment operator. "So what?", you think? Well, look at this:

struct Foo {
    Foo() : bar(42) { }
    const int bar;
};
//...
    std::vector<Foo> v;
    Foo foo;
    v.push_back(foo);

Compiling this program makes the g++ dump this error:
error: non-static const member 'const int Foo::bar', 
can't use default assignment operator

Hmm, right. The default assignment operator does a bit-wise copy but the const member is not allowed to be overwritten. So the compiler asks you, what to do.

What could be a meaningful definition of the assignment operator? What if the member const variable differs from the operant's one? I think there is no general answer to this question. But perhaps we can do something if we know which objects are assigned to each other. Let's illuminate the vector class to find out.

class Illuminate {
//... see the Illuminate post in the C++ section
};

int main()
{
    std::vector<Illuminate> v;
    Illuminate object;
    v.push_back(object);
}

Running this program shows:
default constructor 0
copy constructor 1 <- 0
destructor 0
destructor 1

Hmm, operator= is not called, altough the compiler was eager to get one. Why is this? Let's find out what is going on by diving into GNU's libstdc++ source code. In bits/stl_vector.h we find vector::push_back: (reformatted for the sake of readability)

//stl_vector.h
void push_back(const value_type& __x) {
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
        this->_M_impl.construct(this->_M_impl._M_finish, __x);
        ++this->_M_impl._M_finish;
    } else {
        _M_insert_aux(end(), __x);
    }
}

What this code does, is: if the capacity of the vector was not reached we let the allocator construct a new object at the right place (copy constructor and placement new, by default) and increase the size of the vector. No operator= yet. So let's see what _M_insert_aux does which is in bits/vector.tcc. To get the point you must know that this function is also called from iterator insert(iterator pos, const T& x).

void _M_insert_aux(iterator __position, const _Tp& __x) {
    if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
        this->_M_impl.construct(this->_M_impl._M_finish, 
                            *(this->_M_impl._M_finish - 1));
        ++this->_M_impl._M_finish;
        _Tp __x_copy = __x;
        std::copy_backward(__position, 
                            iterator(this->_M_impl._M_finish-2), 
                            iterator(this->_M_impl._M_finish-1));
        *__position = __x_copy;
    } else {
    //... reallocate
    }
}

This code again checks if the capacity was reached. If not the last element is copied to the end and the size of the vector is increased by one. Then all elements from the position of the new element to the former end are copied one place to the right. Finally the new element is placed at it's position by overwriting the old element with the assignment operator. Here it is.

But if you look at the code flow you see that in our case the first branch of the if statement is never executed as the push_back function calls this method only if the condition is false. This means I must define a operator= with unclear semantics just because it is never used m(


If I can guarante that the assignment operator is never called at runtime I could implement it like

Foo& operator=(const Foo& rhs) { 
    abort();
    return *this;
}

To find out if this abortion will ever be triggered let's modify the STL by removing the use of the assignment operator when the compiler complains and recompile the program. By doing so we find out that the only places where the assignment operator is used in our case is the one we found above and the implementation of std::copy_backward which is called from within the same branch. If we remove the content of this branch from _M_insert_aux our example compiles.
   
This means that our example would work with this unusual assignment operator. But generally this is no solution. As soon as we forget about this topic and extend our code to use other vector functions the assignment operator can be triggered.

Please note that this is no GNU specific problem here. The ISO C++ standard requires that T has an assignment operator (see section 23.2.4.3). I just showed on the example of GNU's STL implementation where this can lead to.

I think this is a design bug of the STL. Instead of using the assigment operator one can destroy the old element and use the copy constructor to place the new one. And don't tell me about efficiency. As no memory allocation or deallocation must be done and the assignment opertator typically does the same work as the copy constructor there is only one destructor call extra. Furthermore C++ is full of traps and inconsistencies to get some doubtful efficiency.

No comments:

Post a Comment