Wednesday, August 13, 2008

LDAP - What's in a string?

Sometimes when I'm doing projects I run across things that make me wonder... huh?

When you are building LDAP objectclasses and attributes for your Identity Management project, should you be using Directory String or IA5 String for your typical attributes? Actually there are several string types supported by LDAP:

IA5String, DirectoryString, PrintableString, OctetString, PostalAddress, CountryString and NumericString.

In the projects I have worked on, we used DirectoryString for most custom attributes. Looking at the RFCs, you can pick up bits and pieces about the differences between IA5String and DirectoryString. My friend Thom Anderson does a great comparison of these two string types. Read on:

"The IA5 is more constrained than Directory String. You can think of it as ASCII on steroids . . . ASCII is a 7-bit protocol and for years, persons have been finding themselves with an eight-bit byte wondering what to do with the extra bit. Normally, they use the ‘zero’ value of the extra bit for ASCII characters and then use the ‘one’ value for things such as special characters (early IBM PC) or European characters (IA5). Although the ‘IA’ in IA5 means ‘international alphabet’, it does not include all languages, as that would require more than 8 bits. That is where Directory String comes in. Directory String is basically UTF-8, a version of Unicode that has only 8 bits for Western languages, but requires more bits (in 8-bit increments) as one moves East."

"Only in IA5 can you be assured that the number of characters and number of bytes is the same. Of course, that would limit one to Western characters, but that is not such a bad thing. In many cases, it will not make any difference. In the U.S. ASCII is sufficient and it is a subset of both IA5 and UTF-8."
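To make the characters-versus-bytes point concrete, here is a minimal Python sketch. It does not talk to any directory server; it only compares the character count of a value with the size of its UTF-8 encoding, which is how a Directory String value is stored. The helper names and sample values are my own illustrations, and the 7-bit check is an assumption about what a strict IA5 String attribute would accept.

```python
from typing import Tuple


def char_and_byte_counts(value: str) -> Tuple[int, int]:
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(value), len(value.encode("utf-8"))


def is_7bit_ascii(value: str) -> bool:
    """True when every character fits in 7 bits (hypothetical IA5-safe check)."""
    return all(ord(ch) < 128 for ch in value)


if __name__ == "__main__":
    # Illustrative sample values only, not taken from a real directory.
    for sample in ["Anderson", "Müller", "東京"]:
        chars, octets = char_and_byte_counts(sample)
        print(f"{sample}: {chars} chars, {octets} bytes, 7-bit safe: {is_7bit_ascii(sample)}")
```

For plain ASCII values the two counts match, so either syntax stores the same bytes. Once a value contains an accented or non-Latin character, the UTF-8 byte count grows past the character count, which matters if you size attributes or buffers in bytes.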
