Reading paged LDAP results with PHP is a show-stopper

I was writing the schema introspection code for Zend_Ldap when I came across a problem with Active Directory’s MaxPageSize restriction. By default Active Directory returns at most 1000 items for a single query, a number which is easily exceeded when reading an Active Directory’s classes and especially attributes from the schema tree. OK – one option would be to increase the MaxPageSize variable, but as the component should be usable on every Active Directory server I couldn’t go for that.

The second option that seemed possible makes use of the paged result sets that Active Directory returns on a query. This way led me into the world of LDAP server controls and deep into the ext/ldap source code. There is astonishingly little information on the topic of paged result sets and LDAP server controls with respect to PHP and ext/ldap. To be honest, I assume that only one person has really looked into this area seriously and even come up with a solution: Iñaki Arenaza (Blog, Twitter, Facebook, LinkedIn). The information he provides is the foundation of this article – the discoveries are absolutely not my work, they are all based on what Iñaki Arenaza dug out. I just wanted to shed a little light on this very specific topic (and summarize what I’ve answered on stackoverflow.com).

To make it short right from the beginning: it’s currently not possible to use paged results from an Active Directory with an unpatched PHP (ext/ldap).

Let’s take a closer look at what’s happening.

Active Directory uses a server control to accomplish server-side result paging. This control is described in RFC 2696 “LDAP Control Extension for Simple Paged Results Manipulation”. LDAP controls, which come in the flavors “server” and “client”, are extensions to the LDAP protocol that provide enhancements – result paging is one example, password policy is another one. Generally ext/ldap offers access to LDAP control extensions via ldap_set_option() and the LDAP_OPT_SERVER_CONTROLS and LDAP_OPT_CLIENT_CONTROLS options respectively. To set up the paged control we need the control-oid, which is 1.2.840.113556.1.4.319, and we need to know how to encode the control-value (this is described in the RFC). The value is an octet string wrapping the BER-encoded version of the following SEQUENCE (copied from the RFC):

realSearchControlValue ::= SEQUENCE {
    size    INTEGER (0..maxInt),
                     -- requested page size from client
                     -- result set size estimate from server
    cookie  OCTET STRING
}

So we can set up the control prior to executing the desired LDAP query:

$pageSize    = 100;
$pageControl = array(
    // the control-oid
    'oid'        => '1.2.840.113556.1.4.319',
    // the operation should fail if the server is not able to support this control
    'iscritical' => true,
    // the required BER-encoded control-value; encoding the size as a single
    // byte like this only works for page sizes below 128
    'value'      => sprintf ("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0)
);
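The sprintf() call hard-codes a one-byte INTEGER and an empty cookie. A BER INTEGER is a signed two's complement value, so a single byte can only carry sizes up to 127. As a rough sketch of how the control-value could be built for larger page sizes or for a non-empty cookie, something like the following might do. The function names berLength() and encodePagedResultsControlValue() are made up for this illustration and are not part of ext/ldap.

```php
<?php
/**
 * Hypothetical helper: encodes a BER length field
 * (short form below 128, long form otherwise).
 */
function berLength($len)
{
    if ($len < 0x80) {
        return chr($len);
    }
    $bytes = '';
    while ($len > 0) {
        $bytes = chr($len & 0xFF) . $bytes;
        $len >>= 8;
    }
    return chr(0x80 | strlen($bytes)) . $bytes;
}

/**
 * Hypothetical helper: builds the realSearchControlValue SEQUENCE
 * from RFC 2696 for a non-negative page size and an optional cookie.
 */
function encodePagedResultsControlValue($pageSize, $cookie = '')
{
    // INTEGER content: big-endian, minimal number of bytes
    $intBytes = '';
    $n = $pageSize;
    do {
        $intBytes = chr($n & 0xFF) . $intBytes;
        $n >>= 8;
    } while ($n > 0);
    // a leading byte with the high bit set would read as negative, so pad it
    if (ord($intBytes[0]) & 0x80) {
        $intBytes = "\x00" . $intBytes;
    }

    $body = "\x02" . berLength(strlen($intBytes)) . $intBytes   // size
          . "\x04" . berLength(strlen($cookie)) . $cookie;      // cookie
    return "\x30" . berLength(strlen($body)) . $body;           // SEQUENCE
}

$pageControl = array(
    'oid'        => '1.2.840.113556.1.4.319',
    'iscritical' => true,
    'value'      => encodePagedResultsControlValue(1000)
);
```

For a page size of 100 and an empty cookie this yields the same seven bytes as the sprintf() call above.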

This allows us to send a paged query to the LDAP/AD server. But how do we know if there are more pages to follow and how do we have to send a query to get the next page of our result set?

The server responds with a result set that includes the required paging information – but PHP lacks a method to retrieve exactly this information from the result set. In fact ext/ldap provides the required function (ldap_parse_result()) but it fails to expose the seventh argument, serverctrlsp, of the C function ldap_parse_result() in the LDAP API, which contains exactly the information we need to requery for consecutive pages. If we had this argument available in our PHP code, using paged controls would be straightforward:

$l = ldap_connect('somehost.mydomain.com');
$pageSize    = 100;
$pageControl = array(
    'oid'        => '1.2.840.113556.1.4.319',
    'iscritical' => true,
    'value'      => sprintf ("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0)
);
$controls = array($pageControl);

ldap_set_option($l, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($l, 'CN=bind-user,OU=my-users,DC=mydomain,DC=com', 'bind-user-password');

$continue = true;
while ($continue) {
    ldap_set_option($l, LDAP_OPT_SERVER_CONTROLS, $controls);
    $sr = ldap_search($l, 'OU=some-ou,DC=mydomain,DC=com', 'cn=*', array('sAMAccountName'), null, null, null, null);
    // there's the rub
    ldap_parse_result ($l, $sr, $errcode, $matcheddn, $errmsg, $referrals, $serverctrls);
    if (isset($serverctrls)) {
        foreach ($serverctrls as $i) {
            if ($i["oid"] == '1.2.840.113556.1.4.319') {
                    $i["value"][8]   = chr($pageSize);
                    $i["iscritical"] = true;
                    $controls        = array($i);
                    break;
            }
        }
    }

    $info = ldap_get_entries($l, $sr);
    if ($info["count"] < $pageSize) {
        $continue = false;
    }

    for ($entry = ldap_first_entry($l, $sr); $entry != false; $entry = ldap_next_entry($l, $entry)) {
        $dn = ldap_get_dn($l, $entry);
    }
}

As you can see, the only way to make all this work is to mess with the ext/ldap source code and compile your own extension. Iñaki Arenaza provides several patches that can be applied to the PHP source to make patching a lot easier. The patches can be found here (last one for PHP 5.2.10 from June 24th 2009) and there is an accompanying blog post available. Iñaki Arenaza even opened an issue in the PHP bug tracker on September 13th 2005 offering his help – but there has been no reaction from the developers’ side. What a great pity.

So, if you have to use paged result sets in an Active Directory environment from within a PHP application you can choose between:

  • patch your ext/ldap and compile your own extension as described in this article
  • raise the MaxPageSize limit in your Active Directory server
  • use a completely different approach bypassing ext/ldap and make use of the appropriate COM components (ADODB) as described here (this only works on Windows machines)

It could have been so easy, if…, yes if only the PHP developers considered applying the available patch to the ext/ldap source code.

November 6, 2009 at 16:52 11 comments

Remove nodes in SimpleXMLElement

Matthias Willerich just picked up my proposal for removing XML nodes from an XML tree when using SimpleXML, which I wrote some time ago on stackoverflow.com, and posted a small how-to on Content with Style. Nice to know that this idea proved helpful.

July 16, 2009 at 23:56 Leave a comment

Zend_Ldap promoted to standard trunk

After some last modifications to make the original Zend_Ldap tests pass with the new extended Zend_Ldap classes, the ZF team has promoted the component to the standard trunk.

I assume that we’ll have a public release of the code with the 1.9 version of Zend Framework.

July 10, 2009 at 09:46 5 comments

Extended Zend_Ldap passes penultimate hurdle

After Ralph Schindler (member of the Zend Technologies Zend Framework team) reviewed my Zend_Ldap extensions and successfully ran all unit tests, the last hurdle the component has to pass is the documentation. Currently the documentation is somewhat rudimentary but I think we’ll get the docs ready by the 1.9 release.

It’s been a long way for this component but at last I think we can have a GO for moving the component to trunk shortly.

July 9, 2009 at 12:08 Leave a comment

Screenshots of game sheets-app for Eishockey.net

Currently I’m helping out Eishockey.net with a web application for the presentation and management of game sheets for the regional ice-hockey leagues in North Rhine-Westphalia. The public part (viewing game sheets, showing statistics) will be integrated in a Joomla installation running JoomLeague, which is a league management module for Joomla. The internal part (creating, editing and managing game sheets) is a stand-alone application.

The application is fully built on Zend Framework and uses its MVC component. Authentication is handled by Joomla itself and is provided via a proxy script.

The underlying database is tightly integrated into the JoomLeague database to use its resources (teams, schedules, players and so on). However the application only reads from those resources and writes to its own tables only. This minimizes the impact an installation has on the core components.

The user interface resembles the UI used on the main page of Eishockey.net and uses some AJAX enhancements to improve user experience especially within the management backend, although both back- and frontend also work with Javascript disabled – resulting only in a decrease of comfort and speed (when editing a game sheet).

I put together some screenshots to give some impression about the application that will (hopefully) come online in August this year.

Screenshots

Backend

Screenshot: List of games

Screenshot: Edit game sheet details

Screenshot: Edit roster

Screenshot: Edit goals

Screenshot: Edit penalties

Screenshot: Edit goalkeepers

Frontend

Screenshot: Show goals scored

Screenshot: Show penalties given

Screenshot: Show goalkeepers’ statistics

Screenshot: Show players’ statistics

Screenshot: Tooltip shown when hovering over a player

Screenshot: Show misc. information about the game

Statistics

Screenshot: Player statistics

Screenshot: Team statistics

Features

The list of features includes:

  • editing rosters including setting captains and assistants, setting the position (offense, defense, goalkeeper) and the jersey number
  • editing scored goals
  • editing given penalties
  • editing goalkeepers on ice (goalkeeper replacement) to keep track of which goalkeeper was on ice at which point in time
  • editing meta data such as referees and attendance
  • determination of game-winning- or game-tying-goal
  • determination of game-winning, game-tying and/or game-losing goalkeeper
  • calculation of player statistics
    • games played
    • goals, assists, points
    • powerplay goals, assists, points
    • shorthanded goals, assists, points
    • game-winning-goals, game-tying-goals, game-winning-assists, game-tying-assists
    • penalty minutes
  • calculation of goalkeeper statistics
    • games played in goal
    • games won, games lost, games tied
    • time-on-ice
    • goals against
    • goals against average
    • we currently lack a save percentage as the data gathering for shots on goal will be quite difficult in these lower leagues

At this point in time the application is feature complete and only some minor things like “top-20-scorer in the league xy” are missing. It’s up to the guys at Eishockey.net to get the thing online now.

July 9, 2009 at 11:36 1 comment

Update on strcoll() UTF-8 issue

I just stumbled upon a comment in the Zend Framework Issue Tracker about a UTF-8 issue with PHP on Windows (the issue was about some problem within Zend_Ldap) which pointed to an MSDN page about setlocale and _wsetlocale. It’s clearly stated there that the CRT function setlocale() does not work with multi-byte charsets on Windows:

The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page like UTF-7 or UTF-8, setlocale will fail, returning NULL. The set of language and country/region codes supported by setlocale is listed in Language and Country/Region Strings.

That means that setlocale() does not work on Windows when given a locale with a UTF-8 charset, e.g. German_Germany.65001, and therefore you cannot use strcoll() or similar functions for locale-aware string operations with these charsets. It simply is not possible due to a Windows CRT limitation.
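Since PHP’s setlocale() reports such a failure by returning false, a script can at least detect the situation before relying on strcoll(). The following is only a minimal sketch; the helper name trySetCollationLocale() and the list of candidate locales are assumptions made for this illustration.

```php
<?php
/**
 * Hypothetical helper: tries each locale name in turn for LC_COLLATE and
 * returns the name that was actually set, or false if none was accepted.
 */
function trySetCollationLocale(array $candidates)
{
    foreach ($candidates as $locale) {
        $result = setlocale(LC_COLLATE, $locale);
        if ($result !== false) {
            return $result;
        }
    }
    return false;
}

// Windows-style name first, then a *nix-style name
$locale = trySetCollationLocale(array('German_Germany.65001', 'de_DE.utf8'));
if ($locale === false) {
    echo "No UTF-8 collation locale available, strcoll() cannot be used here\n";
}
```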

December 8, 2008 at 11:45 1 comment

How to use the new Zend_Ldap functionality

The following post demonstrates some of the new features of the extended Zend_Ldap component that is now available in the Zend Framework Standard Incubator. The component was originally designed to augment the current Zend_Ldap component with methods for all those operations on an LDAP server that everybody is used to when dealing with databases: searching, retrieving, updating, adding and deleting items on the directory server. The component has recently been accepted for Standard Incubator development with the request to explore the possibility of migrating all new features directly into the existing Zend_Ldap class. This has been achieved and the component can be considered an all-round solution for working with LDAP directory servers, although naturally there is still a lot of room for improvement. The transition from the authentication-only Zend_Ldap to a fully featured data-access component went smoothly and backwards compatibility could be maintained (at least when connecting to OpenLDAP servers, as I don’t have any other LDAP server at hand to run regression tests against). Enough of that blah, let’s have a look at the new features…

Connecting to an LDAP server

There is no change from the old Zend_Ldap – provide the options to connect to the server and call bind().

$options = array(
    'host'     => 'localhost',
    'port'     => 389,
    'username' => 'uid=php,ou=People,dc=zend,dc=com',
    'password' => 'password',
    'baseDn'   => 'dc=zend,dc=com'
);
$ldap=new Zend_Ldap($options);
$ldap->bind();

Retrieving a single entry

To retrieve a single entry from the server you use the getEntry() method.

$carl = $ldap->getEntry('cn=Carl Baker,ou=People,dc=zend,dc=com');
/*
 * $carl is an array containing
 * array(
 *    'dn'          => 'cn=Carl Baker,ou=People,dc=zend,dc=com',
 *    'cn'          => array('Carl Baker'),
 *    'givenname'   => array('Carl'),
 *    'objectclass' => array('account', 'inetOrgPerson', 'top'),
 *    'sn'          => array('Baker'),
 *    'title'       => array('PR Manager'),
 *    'uid'         => array('cbaker'),
 *    ...);
 */

Searching entries

You can retrieve a list of entries that match a given LDAP filter by using the search() method.

$items=$ldap->search('(objectClass=account)', 'ou=People,dc=zend,dc=com',
    Zend_Ldap::SEARCH_SCOPE_ONE);
// $items is an object of type Zend_Ldap_Collection
$count=count($items); // the returned object implements Countable
foreach ($items as $item) { // the returned object also implements Iterator
    /*
     * $item is an array containing
     * array(
     *    'dn'          => '...',
     *    'cn'          => array('...'),
     *    'givenname'   => array('...'),
     *    'objectclass' => array('account', 'inetOrgPerson', 'top'),
     *    'sn'          => array('...'),
     *    'title'       => array('...'),
     *    'uid'         => array('...'),
     *    ...);
     */
}
// alternatively you can use searchEntries() to get back an array of entries
$items=$ldap->searchEntries('(objectClass=account)', 'ou=People,dc=zend,dc=com',
    Zend_Ldap::SEARCH_SCOPE_ONE);
// $items is a plain PHP array

Searching entries with filter objects

Instead of using a filter string you can create your filter with the object-oriented interface of Zend_Ldap_Filter.

$filter=Zend_Ldap_Filter::equals('objectClass', 'account')
    ->addAnd(Zend_Ldap_Filter::greater('salary', 10000));
$items=$ldap->search($filter, 'ou=People,dc=zend,dc=com',
    Zend_Ldap::SEARCH_SCOPE_ONE);
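To give an idea of what such filter objects save you from, here is a small sketch that assembles an LDAP filter string by hand, including the escaping of special characters required by RFC 4515. This is not how Zend_Ldap_Filter is implemented; the function names are made up for this illustration.

```php
<?php
/**
 * Hypothetical helper: escapes the characters that have a special meaning
 * in an LDAP filter value (backslash, asterisk, parentheses and NUL).
 */
function ldapEscapeFilterValue($value)
{
    // strtr() replaces all keys in one pass, so nothing gets escaped twice
    return strtr($value, array(
        '\\'   => '\5c',
        '*'    => '\2a',
        '('    => '\28',
        ')'    => '\29',
        "\x00" => '\00',
    ));
}

/** Hypothetical helper: builds an equality filter such as (cn=value). */
function ldapFilterEquals($attribute, $value)
{
    return '(' . $attribute . '=' . ldapEscapeFilterValue($value) . ')';
}

/** Hypothetical helper: combines already built filters with a logical AND. */
function ldapFilterAnd(array $filters)
{
    return '(&' . implode('', $filters) . ')';
}

$filter = ldapFilterAnd(array(
    ldapFilterEquals('objectClass', 'account'),
    ldapFilterEquals('cn', 'Baker (PR)*'),
));
```

Forgetting the escaping step is the classic mistake when filters are concatenated manually, which is one reason to prefer an object-oriented filter builder.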

Manipulating DN strings

DN strings can be parsed into Zend_Ldap_Dn objects which allow for easy modification of the DN.

$dnString='cn=Baker\\, Alice,cn=Users+ou=Lab,ou=People,dc=zend,dc=com';
$dn=Zend_Ldap_Dn::fromString($dnString);
unset($dn[2]);
$dn[1]=array('cn' => 'Users', 'ou' => 'PR');
$dn->insert(1, array('ou' => 'Dismissed'));
$dnString=(string)$dn;
// cn=Baker\, Alice,cn=Users+ou=PR,ou=Dismissed,dc=zend,dc=com
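The numeric indices used above refer to the RDNs of the DN counted from the left, and a multi-valued RDN (joined with +) counts as one element. The following sketch shows this structure with a deliberately naive splitter. It is not the parser used by Zend_Ldap_Dn, the function name is hypothetical, and it ignores corner cases such as an escaped backslash directly in front of a comma.

```php
<?php
/**
 * Hypothetical, simplified DN splitter: splits on commas and plus signs
 * that are not preceded by a backslash. Values are kept in escaped form.
 */
function splitDnNaively($dn)
{
    $result = array();
    foreach (preg_split('/(?<!\\\\),/', $dn) as $rdn) {
        $parts = array();
        foreach (preg_split('/(?<!\\\\)\\+/', $rdn) as $pair) {
            list($name, $value) = explode('=', $pair, 2);
            $parts[trim($name)] = $value;
        }
        $result[] = $parts;
    }
    return $result;
}

$rdns = splitDnNaively('cn=Baker\\, Alice,cn=Users+ou=Lab,ou=People,dc=zend,dc=com');
// $rdns[1] is the multi-valued RDN cn=Users+ou=Lab,
// $rdns[2] is ou=People, the element removed by unset($dn[2]) above
```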

Adding new entries

Zend_Ldap provides an add() method to add entries to the directory.

$ldap->add('uid=created,ou=People,dc=zend,dc=com', array(
    'uid'         => 'created',
    'objectClass' => 'account'));

Updating existing entries

The update() method can be used to update existing entries in the directory.

$ldap->update('cn=Carl Baker,ou=People,dc=zend,dc=com', array(
    'title' => array('Barkeeper'),
    'uid'   => 'cabaker'));
// it's enough to provide those attributes that will be changed

Deleting entries

Deletions can be done recursively when providing true as the second parameter.

// this will throw an exception if the given entry has subordinates
$ldap->delete('cn=Baker\\, Alice,cn=Users+ou=Lab,ou=People,dc=zend,dc=com', false);
// this will delete the complete subtree
$ldap->delete('ou=Dismissed,dc=zend,dc=com', true);

Copying and moving entries and subtrees

Zend_Ldap provides methods to move and copy entries and subtrees to other locations within the directory tree. The third parameter specifies whether the operation will be carried out recursively.

// this will throw an exception if the given entry has subordinates
$ldap->copy('cn=Baker\\, Alice,cn=Users+ou=Lab,ou=People,dc=zend,dc=com',
    'cn=Baker\\, Charlie,ou=Lab,ou=People,dc=zend,dc=com', false);
// the operation will be done recursively
$ldap->move('ou=Dismissed,dc=zend,dc=com', 'ou=Fired,dc=zend,dc=com', true);

There is a lot more to be discovered in Zend_Ldap, such as checking server capabilities via the RootDSE, browsing the schema using the subSchemaSubentry, LDIF importing and exporting and especially Zend_Ldap_Node, the active-record-like interface to LDAP entries, which we’ll cover in the next post.

I hope this post has provided some insight into the features of Zend_Ldap. You’re invited to try and test the new features and give your feedback to further improve the component and get it into production, which means into the standard trunk, as soon as possible. I’m especially looking for testers who can verify that this component works against the different LDAP servers available: e.g. ActiveDirectory, ADAM, Siemens, Novell, Sun, and so on.

December 7, 2008 at 20:57 6 comments

Extended Zend_Ldap is in Standard Incubator

After being accepted for Standard Incubator development by the Zend Framework team on November 1st, 2008, the extended Zend_Ldap component (formerly known as Zend_Ldap_Ext) has been moved into the Zend Framework Standard Incubator and can be checked out from there.

This is what the component is all about (from the proposal page):

The existing Zend_Ldap component currently just responds to authentication use cases in all their varieties. There is no possibility to query an LDAP directory service in a unified and consistent way. The current component also lacks core CRUD (Create, Retrieve, Update and Delete) functionality – operations that are crucial to, for example, database abstraction layers.
This proposal tries to resolve these deficiencies in that it provides a simple two-ply object-oriented model to connect to, query and perform CRUD operations on an LDAP server. The first layer is a wrapper around the ext/ldap functions, spiced up with extended functionality such as copying and moving (renaming in an LDAP context) nodes and subtrees.
The second layer (Zend_Ldap_Node) provides an active-record-like interface to LDAP entries and stresses the tree structure of LDAP data by providing (recursive) tree traversal methods.
To simplify the usage of the unfamiliar LDAP filter syntax this component proposes an object-oriented approach to LDAP filter string generation, which can loosely be compared to Zend_Db_Select.
Useful helper classes for creating and modifying LDAP DNs and converting attribute values complete this component.
Furthermore it is possible to do some LDAP schema browsing and to read and write LDIF files.
It is important to note that this proposal is a complete replacement for the current Zend_Ldap component and does not break backwards compatibility.

Later today I’ll try to publish a short tutorial on the usage of the (hopefully) new Zend_Ldap component.

December 7, 2008 at 16:50 1 comment

On how to sort an array of UTF-8 strings

This article is based on a question asked by me on stackoverflow.com and illustrates the way I solved the question myself and discovered a PHP bug on Windows.

Sorting an array of strings in PHP seems to be a no-brainer. There are a lot of sort functions with sort() being the most common one. The problem arises when the strings used in the array are multi-byte encoded, for example UTF-8 encoded. Because PHP comparison functions do a byte-by-byte comparison, they cannot operate correctly on those strings and sorting does not work as expected either. Furthermore language-specific sorting properties are not taken into consideration when sorting with sort() and the default parameters. In Swedish for example an Ä is sorted at the end of the alphabet while in German Ä normally is equivalent to A (when using the DIN 5007 sorting method).

Fortunately PHP provides a function which copes with this problem: strcoll(). The function can be used for array sorting by just specifying the function name as the second parameter to usort(). The sort() function also has a flag (SORT_LOCALE_STRING) which actually seems to do the same as usort() together with a strcoll() callback.

To summarize, we can say that sorting an array of UTF-8 strings in a language-aware manner is more or less simply a question of setting the correct locale. Let’s look at the following example using German as the reference language and saved with a UTF-8 encoding.

$array=array('Übergabe', 'Ostfriesland', 'Äpfel', 'Unterführung', 'Apfel', 'Österreich');
$oldLocale=setlocale(LC_COLLATE, "0");
setlocale(LC_COLLATE, 'de_DE.utf8');
usort($array, 'strcoll'); // or equivalent sort($array, SORT_LOCALE_STRING);
setlocale(LC_COLLATE, $oldLocale);

This will result in an array of Apfel, Äpfel, Österreich, Ostfriesland, Übergabe, Unterführung (obviously we’re using DIN 5007 sorting here).

As sorting now is locale-dependent we have to respect the PHP environment, that is, the kind of machine our script is running on – Windows or *nix?

First of all, if we have a *nix machine, the used locale must be installed on the system. You can get a list of installed locales by issuing the command locale -a on the command line. Be sure to use the correct encoding with the desired locale – the encoding must match the string encoding.

Things get more complicated on Windows machines as locales are named differently. The default naming scheme is Language_Country.Encoding. Information on locales on Windows can be found on MSDN: Language and Country/Region Strings, Language Strings, Country/Region Strings and Code Pages. Furthermore encodings are not specified like on *nix machines but rather by using code pages. As we’re using UTF-8 in our example we have to use the UTF-8 Windows code page, which is 65001. Putting all these things together we get to a locale of German_Germany.65001 for our example. For the sake of completeness, the normal code page for Western Europe would be 1252.

This leads us to the following code snippet (UTF-8 encoded strings):

$array=array('Übergabe', 'Ostfriesland', 'Äpfel', 'Unterführung', 'Apfel', 'Österreich');
$oldLocale=setlocale(LC_COLLATE, "0");
setlocale(LC_COLLATE, 'German_Germany.65001');
usort($array, 'strcoll'); // or equivalent sort($array, SORT_LOCALE_STRING);
setlocale(LC_COLLATE, $oldLocale);

What the heck???? Übergabe, Apfel, Ostfriesland, Unterführung, Äpfel, Österreich?? That obviously doesn’t work… What’s the problem? Let’s try to use non UTF-8 strings (don’t forget to recode the file to ANSI, Windows-1252 or ISO-8859-1):

$array=array('Übergabe', 'Ostfriesland', 'Äpfel', 'Unterführung', 'Apfel', 'Österreich');
$oldLocale=setlocale(LC_COLLATE, "0");
setlocale(LC_COLLATE, 'German_Germany.1252');
usort($array, 'strcoll'); // or equivalent sort($array, SORT_LOCALE_STRING);
setlocale(LC_COLLATE, $oldLocale);

Now we get Apfel, Äpfel, Österreich, Ostfriesland, Übergabe, Unterführung. OK, non-UTF-8 is working correctly. Let’s dig in deeper. What does strcoll() do with my array? Let’s trace what’s going on (thanks to Huppie for the idea of tracing what strcoll() is doing):

function traceStrColl($a, $b) {
    $outValue=strcoll($a, $b);
    echo "$a $b $outValue\r\n";
    return $outValue;
}

$array=array('Übergabe', 'Ostfriesland', 'Äpfel', 'Unterführung', 'Apfel', 'Österreich');
$oldLocale=setlocale(LC_COLLATE, "0");
setlocale(LC_COLLATE, 'German_Germany.65001');
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);

The output is:

Äpfel Ostfriesland 2147483647
Äpfel Übergabe 2147483647
Äpfel Unterführung 2147483647
Äpfel Apfel 2147483647
Österreich Äpfel 2147483647
Ostfriesland Apfel 2147483647
Ostfriesland Übergabe 2147483647
Unterführung Ostfriesland 2147483647
Apfel Übergabe 2147483647

As you can see, strcoll() returns 2147483647 on every comparison operation. This is reproducible and occurs only on Windows machines (by the way, the PHP version does not seem to matter as I tried the snippet on PHP 5.2.4, 5.2.5 and 5.2.6). Actually this is what I’d classify as a bug. Therefore I filed a bug report on bugs.php.net: Bug #46165 strcoll() does not work with UTF-8 strings on Windows

Summary: Currently it is not possible to sort UTF-8 strings on a Windows machine simply using PHP-provided functions. A possible solution would be to recode the strings to Windows-1252 or ISO-8859-1 encoding (using mb_convert_encoding() or iconv()) and do a sort on the recoded array (suggested by ΤΖΩΤΖΙΟΥ on stackoverflow.com).
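The recoding workaround could look roughly like the following sketch. The function name sortUtf8ViaRecoding() is made up for this illustration, the locale name has to match your platform (for example German_Germany.1252 on Windows), and every string must be representable in the single-byte target encoding.

```php
<?php
/**
 * Hypothetical helper: sorts UTF-8 strings by recoding them to a single-byte
 * encoding, sorting with strcoll() under the given locale and converting
 * the result back to UTF-8.
 */
function sortUtf8ViaRecoding(array $strings, $locale, $encoding = 'Windows-1252')
{
    $recoded = array();
    foreach ($strings as $string) {
        $recoded[] = iconv('UTF-8', $encoding, $string);
    }

    // remember the current collation locale so it can be restored afterwards
    $oldLocale = setlocale(LC_COLLATE, '0');
    setlocale(LC_COLLATE, $locale);
    usort($recoded, 'strcoll');
    setlocale(LC_COLLATE, $oldLocale);

    $sorted = array();
    foreach ($recoded as $string) {
        $sorted[] = iconv($encoding, 'UTF-8', $string);
    }
    return $sorted;
}

$array  = array('Übergabe', 'Ostfriesland', 'Äpfel', 'Unterführung', 'Apfel', 'Österreich');
$sorted = sortUtf8ViaRecoding($array, 'German_Germany.1252');
```

If the requested locale is not available, setlocale() returns false and the sort silently falls back to whatever collation is currently active, so checking that return value is advisable in real code.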

September 24, 2008 at 12:16 8 comments

Installed phpUnderControl on our development server

Just installed phpUnderControl and CruiseControl on our development server. Actually everything went quite smoothly and only the Java installation caused some problems, as it was mentioned nowhere that you need to install the Java SE Development Kit (JDK) and that the Java SE Runtime Environment (JRE) is not sufficient.

These are the required software packages:

The only disadvantage of the current installation is that we have to use SSH tunnels to get to the phpUnderControl dashboard. By a fluke I just stumbled on an article by Max Horvath, who describes how to set up Apache to access the phpUnderControl pages via a proxy and thereby avoid the SSH tunnels. I’ll try this later today.

August 15, 2008 at 14:43 3 comments
