Monday, October 31, 2011

IteratorIterator - PHP Inconsistencies And WTFs

Last night I was discussing the SPL Iterators with @go_oh in the PHP StackOverflow chat room when I came across something very interesting.  We were talking about why some of the SPL Iterators accept only an Iterator as the constructor argument (Such as LimitIterator), and others accept either an Iterator or an IteratorAggregate as the argument (Such as IteratorIterator).  Feeling that this would be a useful feature to add (having all of them accept an IteratorAggregate), I opened up the PHP source and started looking at how hard of a change this would be.  What I found was... Interesting...

The C Code.


So I started by opening up ext/spl/spl_iterators.c.  I started off by looking at the constructor for IteratorIterator as that's what I would model the rest off of.  You can see the source of it here.  We can clearly see (even if you don't know C well) that it's calling another function with a few parameters.  So, to figure out what it does, let's look at that function (With lxr.php.net, all you need to do is click on a function name to be taken to its definition).

If we look at spl_dual_it_construct, we can see that it's basically a centralized constructor body for most of the SPL Iterators.  The majority of it is normal code for PHP and was expected.  What wasn't expected was inside of the body for the switch statement for the case DIT_IteratorIterator.  At first, it seems fairly straight forward, until we hit this block:

if (!instanceof_function(ce, zend_ce_iterator TSRMLS_CC)) {
    if (ZEND_NUM_ARGS() > 1) {
        if (zend_lookup_class(class_name, class_name_len, &pce_cast TSRMLS_CC) == FAILURE 
     || !instanceof_function(ce, *pce_cast TSRMLS_CC)
            || !(*pce_cast)->get_iterator
        ) {
            zend_throw_exception(spl_ce_LogicException, "Class to downcast to not found or not base class or does not implement Traversable", 0 TSRMLS_CC);
            zend_restore_error_handling(&error_handling TSRMLS_CC);
            return NULL;
        }
        ce = *pce_cast;
    }

Let's walk through that code to see why it's such a WTF.  To preface this, let's talk about the variables and functions that are defined here.  `ce` is a variable holding the object that was passed to the constructor as the first parameter.  ZEND_NUM_ARGS() is basically the C equivalent of func_num_args() in PHP.  Zend_lookup_class takes a string and returns the class structure for that string (Vaguely similar to ReflectionClass).  And instanceof_function() is basically the internal way of calling $foo instanceof bar.  The rest of it should either be easy to understand, or isn't really that important anyway.

So, the first if statement basically checks if the object that was passed in implements Iterator or not.  Remember that since IteratorIterator accepts a Traversable object, it could be either a Iterator or an IteratorAggregate.  So the entire branch is conditional upon the object being an IteratorAggregate.  The second if statement checks to see if we have a second argument (which hereto is undocumented).  So far so good.

Here's where it starts to get interesting.  The next if statement has three parts.  The first one determines if the string passed as the second argument is a valid class (and looks up the definition to store it in pce_cast).  If it can't find the class specified, it throws an exception.  The second part of the if statement is making sure that the object that was passed as the first parameter is an instance of the class provided as a second parameter (that the class name passed as the second parameter is in the class hierarchy for the object).  The final part simply ensures that the found class implements Traversable as well (can function as an iterator).

If all three of parts pass, something very interesting happens.  The original object is cast down to the specified class.  Now I know what you're thinking: Cast down?  PHP has no concept of casting...?  Well, in this case, it appears it does.

What it means


So basically, the second parameter to IteratorIterator controls which method is used to get the iterator.  To best understand it, let's use it in an example...  First, let's define a few classes.
class Foo implements IteratorAggregate {
    public function getIterator() {
        return new ArrayIterator(array(1, 2, 3));
    }
}

class Bar extends Foo {
    public function getIterator() {
        return new ArrayIterator(array(4, 5, 6));
    }
}

class Baz extends Bar {
    public function getIterator() {
        return new ArrayIterator(array(7, 8, 9));
    }
}

Now that we have them defined, let's go ahead and try iterating over them a few times to see what our behavior is.
var_export(iterator_to_array(new Foo, true));
// array(1, 2, 3)

var_export(iterator_to_array(new Bar, true));
// array(4, 5, 6)

var_export(iterator_to_array(new Baz, true));
// array(7, 8, 9)

That's exactly what we would expect.  But now let's look at iterating over them again, this time using IteratorIterator instead.
var_export(iterator_to_array(new IteratorIterator(new Foo), true));
// array(1, 2, 3)

var_export(iterator_to_array(new IteratorIterator(new Bar), true));
// array(4, 5, 6)

var_export(iterator_to_array(new IteratorIterator(new Baz), true));
// array(7, 8, 9)

So far, so good. But now, let's pass the hereto unknown and undocumented second parameter. Since we want to downcast, it wouldn't make sense to pass Foo for the new Foo one. So let's pass "Foo" to the bar one...
var_export(iterator_to_array(new IteratorIterator(new Bar, "Foo"), true));
// array(1, 2, 3)

Did you see what happened there? It used the method defined in Foo instead to iterate over. So we just told it to use a method from a child class instead of the parent! Let's look at another example:
var_export(iterator_to_array(new IteratorIterator(new Baz, "Bar"), true));
// array(4, 5, 6)

And let's look at a failure:
new IteratorIterator(new Baz, "StdClass");
// LogicException("Class to downcast to not found or not base class or does not implement Traversable")

It's quite interesting that we can downcast like this. I did some quick searches in core, and this appears to be the only place that does this.

Finally


Personally, I can't see any use-case for this. Doing anything with it requires explicit knowledge of the class tree when creating an IteratorIterator object. So I'm not even sure what in practice it would be useful for. So I don't know whether it being there classifies as a bug, or if it is just another "weird quirk" of PHP.

When I saw it, all I could say was "WTF"...

7 comments:

  1. Yes they are useful. Consider a case like this:

    class Base {
    public function getIterator() { new stdclass; }
    public function asArray() {
    return iterator_to_array(new IteratorITerator($this));
    }
    }
    class Extended extends Base implements IteratorAggregate {
    public function getIterator() { throw new Exception(); }
    }

    $o = new Extended;
    $o->asArray();

    This will always use Extended's getIterator() not the Base's classes. Whereas an foreach($this) in Base will use Base's getIterator().

    This comes particular "interesting" with internal classes implementing Traversable which no necessarily provide a getIterator() method.

    This can be considered bad and whatever but can be very handy in a few cases.

    ReplyDelete
  2. I can't understand the utility of your code, because all of these examples will throw an exception.


    $o = new Base;
    $o->asArray(); // $this inside of asArray is not an iterator
    foreach($o as $j){ } // stdclass is not an iterator

    $o = new Extended;
    $o->asArray(); $this inside of asArray is not an iterator

    foreach($o as $j){} //getIterator always throws an exception

    ReplyDelete
  3. yeah, the stdclass and exception were there due to an copy and paste error (the system here lost my comment ... so i quickly tried to recover it out of snippets i had, should have typed it again) you can replace it with any iterator ...

    ReplyDelete
  4. @Johannes,

    Just because you can think of a use for something doesn't mean that it's useful. The question is whether it's a solution looking for a problem or a solution to a problem. From what I've seen so far, it's the former.

    Right now, the only way to do something similar in areas other than iterating is to use reflection to make the call. So if this functionality is needed, it should be on reflection ($reflectionClass->getIterator(), or something similar).

    And considering the lack of documentation, I'm really curious how "handy" it truly is. I'd wage a bet that not many people at all know it exists, and out of those I wonder how many use it. So if there hasn't been a significant enough need in the past 6 years it's existed to be documented, I really doubt that it is necessary (especially here).

    And just to point out, the reason that I have not documented it myself is that I don't feel it should be documented. My opinion on it is that it should be removed, or at least moved to reflection...

    But thanks for the thoughts!

    ReplyDelete
  5. I'll agree with ircmaxell. This is probably a case of, "someone thought it might have a use case at some point in the future." It was included because of this and then promptly forgotten because no one remembered why anyone would want this.

    Actually, its presence is decidedly a *bad idea* -- it encourages violation of encapsulation which is the entire bloody point of OOP to begin with.

    ReplyDelete
  6. I assume it was needed for some thing and was therefore implemented. Why should their be such limitations? Try to create a clean approach. The option doesn't harm. Changing it and kicking it out is more risky (change is risk of breaking something) than keeping it. If you want a "clean" language you're completely wrong with PHP ;-)

    ReplyDelete
  7. @Johannes: I'm not about to start an RFC to pull it from core. Just questioning if it should be documented or remain an undocumented feature to be used at your own risk.

    As far as creating a clean approach, I would add it to ReflectionClass. That way you can get an iterator from reflection class which basically does the same thing, but it's separate and located to an area of the core that's supposed to be voodoo.

    Thanks for the commment though!

    ReplyDelete