If you already understand what LDAP is, what it is good for, Schemas, objectClasses, Attributes, matchingRules, Operational objects and all that jazz - skip this section. But if you are going to do anything except blindly follow HOWTOs you must understand most of this stuff.
LDAP and X.500 are feet deep in terminology. Some terminology is important, some is just fluff. We have created a glossary to jog your memory. We introduce terms either because they are important or because they are frequently used in the literature.
A brief note about case sensitivity in LDAP: It's confusing - well, we found it confusing. Truth be told we find a lot of things confusing. The only case sensitive things in LDAP are passwords and the contents of certain (very obscure) attributes based on their matchingRule. Period. You will see both in this and other documentation: objectclasses or objectClasses and even ObjectClasses. They all work. Period. And you have enough to worry about in the first six years of learning LDAP (just kidding, it will only take four years) without getting in a sweat every time you approach a keyboard in case you mistype the name of something.
Once upon a time, in the dim and distant past (the late 70's - early 80's) the ITU (International Telecommunication Union) started work on the X.400 series of email standards. This email standard required a directory of names (and other information) that could be accessed across networks in a hierarchical fashion not dissimilar to DNS for those familiar with its architecture.
This need for a global network based directory led the ITU to develop the X.500 series of standards and specifically X.519, which defined DAP (Directory Access Protocol), the protocol for accessing a networked directory service.
The X.400 and X.500 series of standards came bundled with the whole OSI stack and were big, fat and consumed serious resources. Standard ITU stuff in fact.
Fast forward to the early 90's and the IETF saw the need for access to global directory services (originally for many of the same email based reasons as the ITU) but without picking up all the gruesome protocol (OSI) overheads and started work on a Lightweight Directory Access Protocol (LDAP). LDAP was designed to provide almost as much functionality as the original X.519 standard but using the TCP/IP protocol - while still allowing inter-working with X.500 based directories. Indeed, X.500 (DAP) inter-working and mapping is still part of the IETF LDAP series of RFCs.
A number of the more serious angst issues in the LDAP specs, most notably the directory root naming convention, can be traced back to X.500 inter-working and the need for global directories.
LDAP - broadly - differs from DAP in the following respects:
Technically, LDAP is just a protocol that defines the method by which directory data is accessed. Necessarily, it also defines and describes how data is represented in the directory service (the Data (Information) Model). Finally, it defines how data is loaded (imported) into and saved (exported) from a directory service (using LDIF). LDAP does not define how data is stored or manipulated. Data storage and access methods are automagical processes as far as the standard is concerned and are generally handled by back-end modules (typically using some form of transaction database) within any specific LDAP implementation.
LDAP defines four models which we list and briefly describe - you can then promptly forget them since they bring very little to the understanding of LDAP.
Information Model: We tend to use the term Data Model, in our view a more intuitive and understandable term. The Data (or Informational) Model defines how the information or data is represented in an LDAP enabled system - this may, or may not, be the way the data is actually stored since this issue lies outside the scope of the LDAP standards as described above.
Naming Model: This defines all that 'dc=example,dc=com' stuff that you stumble across in LDAP systems. We stick pretty much to the specifications here because the terms are so widely used.
Functional Model: When you read, search, write or modify the LDAP you are using the Functional Model - yipee!
Security Model: You can control, in a very fine-grained manner, who can do what to which data. This is complex but powerful stuff. We progressively introduce the concepts and have dedicated a specific chapter to it. To begin with - forget security. You can always go back and retro-fit security in LDAP. Where you cannot retro-fit, we reference security implications in the text. This model also encompasses wire-level security such as TLS/SSL. Good solid mind numbing stuff.
The scope of the LDAP standards is shown in the diagram below. The red stuff is defined in the protocol (the various RFCs that define LDAP). What happens inside the black boxes (or in this case the green, yellow and mauve boxes) and on the black line to the Database(s) is 'automagical' and outside the scope of the standards.
Each component is briefly described here, in a bit more detail below and in excruciating detail in subsequent chapters. But there are four important points first:
LDAP does not define how data is stored, only how it is accessed; BUT most LDAP implementations do use a standard database as a back-end and indeed OpenLDAP offers a choice of back-end database support.
When you talk to an LDAP server you have no idea where the data comes from: in fact the whole point about the standard is to hide this level of detail. In theory the data may come from one OR MORE local databases or one OR MORE X.500 services. Where and how you access the data is an implementation detail and is only important when you define the configuration of your LDAP server(s).
Keep the two concepts - access to the LDAP service and operation of the LDAP service - very clearly separate in your mind. When you design a directory based system figure out what you want it to do (the schemas and data organization) and forget the implementation. Then figure out as a second phase where the data is and how and where you want to store it - during LDAP configuration.
A number of commercial database products provide an LDAP view (an LDAP wrapper or an LDAP abstraction) of relational or other database types.
LDAP is characterized as a 'write-once-read-many-times' service. That is to say, the type of data that would normally be stored in an LDAP service would not be expected to change on every access. To illustrate: LDAP would NOT be suitable for maintaining banking transaction records since, by their nature, they change on every access (transaction). LDAP would, however, be eminently suitable for maintaining details of the bank branches, hours of opening, employees etc. which change far less frequently.
It is never clear in the phrase 'write-once-read-many-times' just how many is many?
Where is the line between sensible use of LDAP and a classic transaction oriented relational database, for example, SQLite, MySQL or PostgreSQL? If we update on every second access, is this a sensible LDAP application, or should it be every 1,000 or 1 million accesses?
The literature is a tad sparse on this topic and tends to stick with 'slam-dunk' LDAP applications like address books which change perhaps once in living memory.
There is no simple answer but the following notes may be useful:
The performance hit during writes lies in updating the indexes. The more indexes (for faster reading) the less frequently you want to update the directory. Aim for read:write ratios of 1,000:1 or higher for heavily read optimised LDAP directories.
LDAP Replication generates multiple transactions for every update so you want the lowest practical update load (1,000:1 or higher).
If data volumes are large (say > 100,000 records) the time to update even a small number of indexes may be serious so you want to keep updates as low as practical (10,000:1).
If data volumes are relatively small (say < 1,000 records), indexes modest and no replication is being used we see no inherent reason why you could not use LDAP in a form which approaches a transaction based system, that is, every 5 - 10 accesses involve a read followed by write cycle (a modify in the LDAP jargon).
We suspect the real answer to this question (with apologies to the memory of the late, lamented Douglas Noel Adams): the ratio of reads to writes is 42!
The primitives (LDAP protocol elements) that are used to access LDAP enabled directories use a model of the data that is (may be) abstracted from its physical organization. The primitives assume an object data model without being aware of the actual structure of the data. Indeed the relative simplicity of LDAP comes from this characteristic. The specific implementation will in its back-end function perform the LDAP primitive to physical data organization mapping in a completely 'automagical' way.
This is in marked contrast to, say, SQL, in which the queries used to interrogate the data have complete and detailed knowledge of the data structures and their organization into tables, joins, etc.
Relational and transactional databases go to extreme lengths to ensure that data is consistent during write/update cycles using such techniques as transactions, locking, roll-backs and other methods. It is a vital and necessary requirement for this type of database. This extreme form of data synchronisation also persists when the data is replicated on multiple hosts or servers. All views of the data will be consistent.
A master LDAP server and its slaves (or its peers in a multi-master environment) are kept in step by a simple asynchronous replication process. This has the effect of leaving the master and slave (or peer) systems out of data synchronisation during the replication cycle. A query to the master and slave during this (usually short) period of time may yield a different answer. If the world will come to a shuddering halt as a consequence of this discrepancy, LDAP is not suitable for this application. However, if Bob Smith is shown in the accounting department on one LDAP server and in the sales department on another for a few seconds or less, who cares? A surprising number of applications fall into this category.
Note: Modern LDAP implementations, especially those that support Multi-master configurations, have become increasingly sophisticated in replicating updates. Additionally, high-speed communication networks permit significantly faster replication operations. These improvements, however, simply reduce the time window during which any two systems may be out-of-sync. They do not eliminate the out-of-sync behaviour of LDAP - even though the window may be sub-second in most modern implementations.
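The out-of-sync window described above can be sketched in a few lines of Python. This is a toy model and not how any real LDAP server is implemented: the Directory and Master classes, the pending queue and the replicate() method are all names invented for this illustration.

```python
from collections import deque


class Directory:
    """A toy directory: maps a DN string to a dict of attributes."""

    def __init__(self):
        self.entries = {}

    def read(self, dn, attr):
        return self.entries.get(dn, {}).get(attr)


class Master(Directory):
    """Accepts writes and queues them for later delivery to its replicas."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = deque()  # updates not yet sent to the replicas

    def modify(self, dn, attr, value):
        self.entries.setdefault(dn, {})[attr] = value
        self.pending.append((dn, attr, value))  # replicated later, not now

    def replicate(self):
        """One replication cycle: push all queued updates to every replica."""
        while self.pending:
            dn, attr, value = self.pending.popleft()
            for replica in self.replicas:
                replica.entries.setdefault(dn, {})[attr] = value


slave = Directory()
master = Master([slave])
bob = "cn=Bob Smith,ou=people,dc=example,dc=com"

master.modify(bob, "ou", "accounting")
master.replicate()
master.modify(bob, "ou", "sales")

# Inside the replication window the two servers disagree.
during = (master.read(bob, "ou"), slave.read(bob, "ou"))
master.replicate()
# Once the cycle completes they agree again.
after = (master.read(bob, "ou"), slave.read(bob, "ou"))
print(during, after)
```

Between the second modify and the second replicate() call the master says sales while the slave still says accounting. That is the Bob Smith discrepancy in miniature.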
So what are LDAP (Directory) advantages and why would any sane human being use a directory?
Before attempting to answer the question let's dismiss the tactical issue of performance. In general, RDBMS systems are still significantly faster than LDAP implementations. This is changing with the development of second generation Directory Servers but while RDBMS will always remain faster than LDAP the gap is reducing significantly to the point where, assuming you compare like with like (a measured network initiated transaction), the differences will become increasingly trivial. Unless, of course, you update a highly indexed attribute on every operation - in which case you deserve everything you (don't) get.
So why use LDAP? Here is our list of key characteristics which make the (currently) high level of pain worthwhile.
LDAP provides a remote and local data access method that is standardized. It is thus possible to replace the LDAP implementation completely without affecting the external interface to the data. RDBMS systems provide local access standards, such as SQL, but remote interfaces are always proprietary.
Because LDAP uses standardized data access methods Clients and Servers may be sourced (or developed) independently. By extension of this point LDAP may be used to abstract the view of data contained in transaction oriented databases, say for the purpose of running user queries, while allowing the user to transparently (to the LDAP queries) change the transactional database supplier.
LDAP provides a method whereby data may be moved (delegated) to multiple locations without affecting any external access to that data. By using referral methods LDAP data can be moved to alternate LDAP servers by changing operational parameters only.
LDAP systems can be operationally configured to replicate data to one or more applications without either adding code or changing the external access to that data.
The above definition focuses exclusively on the standard nature of LDAP data access and does not consider the ratio of reads to writes which, as noted above, depends on the number of operational indices maintained. It does implicitly ignore the use of Directories for transaction processing - though there are signs that some LDAP implementations are looking toward such capabilities.
LDAP enabled directories use a data model that assumes or represents the data as a hierarchy of objects. This does not imply that LDAP is an object-oriented database. As pointed out above, LDAP itself is a protocol that allows access to an LDAP enabled service and does not define how the data is stored - but the operational primitives (read, delete, modify) operate on a model (description) of the data that has object-like characteristics (mostly).
This section defines the essence of LDAP. If you understand this section and the various terms and relationships involved you understand LDAP.
Data is represented in an LDAP enabled directory as a hierarchy of objects, each of which is called an entry. The resulting tree structure is called a Directory Information Tree (DIT). The top of the tree is commonly called the root (a.k.a base or the suffix).
Each entry in the tree (other than the root) has one parent entry (object) and zero or more child entries (objects). Each child entry (object) is a sibling of its parent's other child entries.
Each entry is composed of (is an instance of) one or more objectClasses. Objectclasses contain zero or more attributes. Attributes have names (and sometimes abbreviations or aliases) and typically contain data (at last!).
The characteristics (properties) of objectClasses and their attributes are described by ASN.1 definitions.
Phew! You now know everything there is to know about LDAP. The rest is just detail - there is quite a lot of detail. But this is the core of LDAP.
The diagram below illustrates these relationships:
LDAP DIT Information (Data) Model
Now take the rest of the day off and celebrate.
Each attribute has a name and normally contains data. Attributes are always associated with (are members of) one or more ObjectClasses. Attributes have a number of interesting characteristics:
All attributes are members of one, or more, objectclass(es)
Each attribute defines the data type that it may contain.
Attributes can be optional (keyword is MAY) or mandatory (keyword is MUST) as described in the ASN.1 definitions for the objectclass of which they are a member. An attribute may be optional in one objectclass and mandatory in another. It is the objectclass which determines this property.
One sees apparently random attributes being picked up from all over the place in the documentation - it's confusing at first but comes from the optional characteristic of most attributes. It allows a 'pick-n-mix' approach to populating an entry. Find the attribute you want, include its objectclass, and hope that all the other attributes that you don't want to use in the objectclass are optional! Try browsing here to get a feel for this.
Attributes can have single or multi values (as described in their ASN.1 definitions). Single means that only one data value may be present for the attribute. Multi means there can be one or more data values for the attribute. If the attribute describes say, an email address, there can be one, two or 500 values - this is one of a number of methods of dealing with email aliases in directory designs. The default for an attribute is multi (allow multiple values).
Attributes have names and sometimes aliases or abbreviations (as described in their ASN.1 definitions), for example, commonName is a member of the object class named person (and many others) and has an abbreviated name of cn. Either commonName or cn may be used to reference this attribute.
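The alias and single/multi value characteristics above can be sketched as follows. The three-attribute SCHEMA table, the add_value helper and the mail addresses are made up for this illustration. Real attribute definitions live in the server's schema and carry far more detail (syntax, matching rules and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeType:
    names: tuple                 # primary name first, then any aliases
    single_value: bool = False   # multi-valued unless stated otherwise


# Hypothetical mini-schema used only for this sketch.
SCHEMA = [
    AttributeType(names=("cn", "commonName")),
    AttributeType(names=("mail",)),
    AttributeType(names=("displayName",), single_value=True),
]

# Attribute names are matched case-insensitively and aliases share one type.
BY_NAME = {name.lower(): attr for attr in SCHEMA for name in attr.names}


def add_value(entry, name, value):
    """Add a value to an entry, enforcing the single-value restriction."""
    attr = BY_NAME[name.lower()]
    values = entry.setdefault(attr.names[0], [])
    if attr.single_value and values:
        raise ValueError(f"{attr.names[0]} is single-valued")
    values.append(value)


entry = {}
add_value(entry, "commonName", "Joe Schmo")        # alias resolves to cn
add_value(entry, "mail", "joe@example.com")
add_value(entry, "MAIL", "j.schmo@example.com")    # a second value is fine
add_value(entry, "displayName", "Joe")
print(entry)
```

A second add_value(entry, "displayName", ...) call would raise ValueError, whereas mail happily accepts as many values as you care to give it.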
At each level in the hierarchy the data contained in an attribute can be used to uniquely identify the entry. It can be any attribute in the entry. It can even be a combination of two or more attributes.
Suppose you have a classic white-pages directory containing names, phone number, addresses, favorite drink (yes, there is a standard favoritedrink attribute) and so on. To uniquely identify a particular entry you may choose the person's name (the commonName or cn attribute). If the name is not unique in the LDAP directory, for example, 'Bob Smith', then a search for 'Bob Smith' will return all the entries containing the name 'Bob Smith' in the directory and the user will have to select the best one. For reading and searching this may be acceptable or even desirable since the human 'interface' uses well known information - the person's name.
For writing or updating an entry if 'Bob Smith' is not absolutely unique it is useless - which of the returned entries do we update? In this case it may be necessary to select another attribute that is absolutely unique. It is perfectly permissible to use more than one attribute to access the data depending on the context, for instance, a person's name for reading and searching, telephone number for writing. Furthermore, as defined above, it is perfectly permissible to use more than one attribute (cn+favoritedrink!) to define the unique entry.
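The difference between searching on a non-unique attribute and writing to a uniquely named entry can be sketched like this. The DIRECTORY contents, the uid values and the search and modify helpers are invented for the illustration and only imitate the behaviour of the real LDAP operations.

```python
# Toy directory keyed by a unique name; two different people share the same cn.
DIRECTORY = {
    "uid=bsmith1,ou=people,dc=example,dc=com": {"cn": ["Bob Smith"], "ou": ["accounting"]},
    "uid=bsmith2,ou=people,dc=example,dc=com": {"cn": ["Bob Smith"], "ou": ["sales"]},
    "uid=jschmo,ou=people,dc=example,dc=com": {"cn": ["Joe Schmo"], "ou": ["sales"]},
}


def search(attr, value):
    """Return the names of all entries where attr holds value (case-insensitive)."""
    wanted = value.lower()
    return [
        name
        for name, attrs in DIRECTORY.items()
        if any(v.lower() == wanted for v in attrs.get(attr, []))
    ]


def modify(name, attr, values):
    """A write addresses exactly one entry, so it needs the unique name."""
    if name not in DIRECTORY:
        raise KeyError(name)
    DIRECTORY[name][attr] = list(values)


hits = search("cn", "bob smith")
print(hits)  # two entries: fine for a human reader, useless for an update

modify("uid=bsmith2,ou=people,dc=example,dc=com", "ou", ["marketing"])
```

Reading by cn gives the user a list to choose from, while the update goes to one entry only because uid was chosen to be unique.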
Browse some common objectClasses and attributes. If it looks pretty scary at this stage just forget you took the link and carry on reading.
More on Attributes - only if you are comfortable - otherwise continue here.
Objectclasses are essentially packages of attributes. There are a confusing number of pre-defined objectclasses, each of which contains bucket-loads of attributes suitable for almost all common LDAP implementations. It goes without saying, however, that in spite of all those pre-defined objectClasses, the one you really, really need is never there! objectclasses have two more characteristics:
The objectclass defines whether an attribute member MUST (mandatory) be present or MAY (optional) be present.
The objectclass may be part of a hierarchy in which case it inherits all the characteristics of its parent objectclasses.
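The MUST/MAY and inheritance characteristics can be sketched as a small validation routine. The OBJECT_CLASSES table below is a heavily trimmed, hand-written stand-in for the real definitions (the real classes carry many more attributes), and the effective and check helpers are names invented for this example. For simplicity the sketch compares attribute names exactly rather than case-insensitively.

```python
# Trimmed, illustrative objectclass table: superior class, MUST and MAY sets.
OBJECT_CLASSES = {
    "person": {
        "sup": None,
        "must": {"cn", "sn"},
        "may": {"telephoneNumber", "description"},
    },
    "organizationalPerson": {"sup": "person", "must": set(), "may": {"ou", "title"}},
    "inetOrgPerson": {"sup": "organizationalPerson", "must": set(), "may": {"mail", "uid"}},
}


def effective(name):
    """Return (must, may) for a class, including everything it inherits."""
    must, may = set(), set()
    while name is not None:
        oc = OBJECT_CLASSES[name]
        must |= oc["must"]
        may |= oc["may"]
        name = oc["sup"]
    return must, may


def check(object_class, attributes):
    """Return a list of problems; an empty list means the entry is acceptable."""
    must, may = effective(object_class)
    present = set(attributes)
    problems = [f"missing MUST attribute: {a}" for a in sorted(must - present)]
    problems += [f"attribute not allowed: {a}" for a in sorted(present - must - may)]
    return problems


print(check("inetOrgPerson", {"cn": ["Joe Schmo"], "sn": ["Schmo"], "mail": ["x"]}))
print(check("inetOrgPerson", {"cn": ["Joe Schmo"], "mail": ["x"]}))
print(check("person", {"cn": ["Joe Schmo"], "sn": ["Schmo"], "mail": ["x"]}))
```

inetOrgPerson inherits the mandatory cn and sn from person, which is why the second call complains about sn. The third call shows the 'pick-n-mix' rule from the other side: mail is not available until an objectclass that allows it is included.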
More on objectClasses - only if you are comfortable otherwise continue here.
Eventually we want to slap some data into our directory and actually use the stupid thing.
Describing the tree structure and initial population of data is performed by adding entries (with their associated objectClasses and attributes) starting from the root of the DIT and progressing down the hierarchy. Thus, a parent entry must always have been added before attempting to add any child entries. Adding entries may be done in a variety of ways, one of which is using LDAP Data Interchange Files (LDIF) which are fully described in a later chapter. LDIFs are textual files that describe the tree hierarchy - the Directory Information Tree (DIT) - and the data to be added to each attribute. The following is a simple example of an LDIF file which sets up a root DN (dc=example,dc=com) and adds a single child entry under a people entry.
It is not important to understand what all the values in this LDIF file do at this stage. Chapter 5 (samples) covers the details of setting up LDIF files and Chapter 8 explains LDIF files in painful detail. It is enough, at this stage, to know that LDIF files can be used to set up the DIT and that LDIF files look like the one below.
version: 1

## version not strictly necessary (and some implementations reject it) but generally good practice

## DEFINE DIT ROOT/BASE/SUFFIX ####
## uses RFC 2377 (domain name) format

## dcObject is an AUXILIARY objectclass and MUST
## have a STRUCTURAL objectclass (organization in this case)
# this is an ENTRY sequence and is preceded by a BLANK line

dn: dc=example,dc=com
dc: example
description: The best company in the whole world
objectClass: dcObject
objectClass: organization
o: Example, Inc.

## FIRST Level hierarchy - people
# this is an ENTRY sequence and is preceded by a BLANK line

dn: ou=people, dc=example,dc=com
ou: people
description: All people in organisation
objectClass: organizationalUnit

## SECOND Level hierarchy - people entries
# this is an ENTRY sequence and is preceded by a BLANK line

dn: cn=Joe Schmo,ou=people,dc=example,dc=com
objectclass: inetOrgPerson
cn: Joe Schmo
sn: Schmo
uid: jschmo
mail: firstname.lastname@example.org
mail: email@example.com
ou: sales
Important Note: The lines in the above LDIF file beginning with 'dn:' essentially tell the LDAP server how to structure or place the entry within the DIT. In general, it does not matter what attribute value is used for this purpose as long as the 'dn:' is unique. The above example has chosen to use cn=Joe Schmo in the last entry for this purpose. It could equally have been, say, uid=jschmo. LDAP searching can be used with any combination of attributes and can thus find entries irrespective of the 'dn:' value used to create them. However, if the entry is going to be used for user authentication, say, a logon or Single Sign-On type use, the 'dn:' value becomes extremely important and defines the logon (or Bind DN in the jargon) identifier. It is sometimes (especially in the context of LDAP used within Microsoft's AD) referred to as a Principal DN though this term is not used within the LDAP standards definitions.
We'll explain about LDIF files later as we need them but the above LDIF sets up the structure below:
Once the DIT is up and running, further data may be added using LDIF, an LDAP Browser, a web or other application interface.
Data can be exported (saved) for backup or other purposes using LDIF files.
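To make the LDIF layout and the parent-before-child rule concrete, here is a rough Python sketch. It is a deliberately simplified reader, not a full LDIF implementation: it ignores continuation lines, base64 ("::") values and change records, and it splits DNs naively on commas. The function names parse_ldif, normalise and load, and the dict-based DIT, are assumptions made for this example.

```python
def parse_ldif(text):
    """Parse simplified LDIF text into a list of (dn, {attr: [values]}) pairs."""
    entries = []
    dn, attrs = None, {}
    for line in text.splitlines() + [""]:   # the trailing "" flushes the last entry
        line = line.strip()
        if line.startswith("#") or line.startswith("version:"):
            continue
        if not line:                         # a blank line ends an entry
            if dn is not None:
                entries.append((dn, attrs))
            dn, attrs = None, {}
            continue
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "dn":
            dn = value
        else:
            attrs.setdefault(name, []).append(value)
    return entries


def normalise(dn):
    """Naive DN normalisation: lower-case and strip spaces around commas."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def load(entries, suffix):
    """Add entries to a dict-based DIT, insisting that each parent exists first."""
    dit = {}
    for dn, attrs in entries:
        key = normalise(dn)
        parent = key.partition(",")[2]
        if key != normalise(suffix) and parent not in dit:
            raise ValueError(f"parent entry missing for {dn}")
        dit[key] = attrs
    return dit


SAMPLE = """\
version: 1

# root (suffix) entry
dn: dc=example,dc=com
dc: example
objectClass: dcObject
objectClass: organization
o: Example, Inc.

dn: ou=people, dc=example,dc=com
ou: people
objectClass: organizationalUnit

dn: cn=Joe Schmo,ou=people,dc=example,dc=com
objectclass: inetOrgPerson
cn: Joe Schmo
sn: Schmo
"""

entries = parse_ldif(SAMPLE)
dit = load(entries, "dc=example,dc=com")
print(sorted(dit))
```

Feeding the same entries to load() in reverse order raises ValueError, because the child entry arrives before its parent has been added.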
Having gotten data into our Tree (DIT) we would now - normally - like to use it!
To do that we have to send commands (read, search, modify etc.) to the LDAP server, and in order to do that we have to be able to tell the LDAP server where the data is (for a write) or roughly where it is (for a search).
In short we have to navigate (or crawl around) the directory.
We defined previously that attributes have names and that at each level in the hierarchy one or more of the attributes must contain data that somewhat uniquely identifies each entry.
By constructing paths that comprise these named attributes and their data content we can get to our desired entry or search start position.
These paths are quaintly called Distinguished Names (DN). Each unique data attribute that is a part of this DN is called a Relative Distinguished Name (RDN). Alternatively, a DN is the sum of all of its RDNs. Whatever your views of the merits, or otherwise, of LDAP and X.500 the ability of the standard group to generate unique terminology is beyond question.
The following diagram illustrates the DN and RDNs.
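The DN and RDN relationship can also be sketched in code. The helpers below are a simplified illustration only: split_dn respects backslash-escaped commas but none of the other escaping rules for DN strings, split_rdn does not handle an escaped '+', and all three function names are invented for this example.

```python
def split_dn(dn):
    """Split a DN into its RDNs, most specific first (unescaped commas only)."""
    rdns, current, escaped = [], "", False
    for ch in dn:
        if escaped:
            current += ch          # keep the escaped character as-is
            escaped = False
        elif ch == "\\":
            current += ch
            escaped = True
        elif ch == ",":
            rdns.append(current.strip())
            current = ""
        else:
            current += ch
    rdns.append(current.strip())
    return rdns


def split_rdn(rdn):
    """Split an RDN into (attribute, value) pairs; '+' joins a multi-valued RDN."""
    return [tuple(part.split("=", 1)) for part in rdn.split("+")]


def parent_dn(dn):
    """The parent's DN is simply the DN with its first RDN removed."""
    return ",".join(split_dn(dn)[1:])


dn = "cn=Joe Schmo,ou=people,dc=example,dc=com"
print(split_dn(dn))
print(parent_dn(dn))
print(split_rdn("cn=Bob Smith+uid=bsmith"))
```

Reading the list from right to left walks down the tree from the root, and each RDN only has to be unique among the children of its parent.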
One of the more powerful aspects of LDAP (and X.500) is the inherent ability within the design to delegate the responsibility for maintenance of a part of the directory while continuing to see the directory as a consistent whole. Thus, a company directory may create a delegation (referral is the LDAP term) of the responsibility for a particular department's part of the overall directory to that department's LDAP server. In this respect LDAP almost exactly mirrors the DNS delegation concept.
Unlike the DNS system, there is no option in the standards to tell the LDAP server to follow (resolve) a referral - it is left to the LDAP client to directly contact the new server using the returned referral. Equally, because the standard does not define LDAP data organisation it does not contravene the standard for an LDAP server to follow (resolve) the link and some LDAP servers perform this function automatically using a process that is usually called chaining.
OpenLDAP takes a literal view of the standard and does not chain by default: it always returns a referral. However, OpenLDAP can be configured to provide chaining by use of the overlay chain directive.
The built-in replication features of LDAP allow one or more copies of a directory (DIT) to be slaved from a single master (and even in some implementations between multiple masters) thus inherently creating a resilient structure.
It is important, however, to emphasize the difference between LDAP and a transactional database. When an update is performed on a master LDAP enabled directory, it may take some time (in computing terms) to update the slave(s) (or a peer master) - the master and slaves (or peer masters) may be unsynchronised for a period of time.
In the LDAP context, temporary lack of DIT synchronisation is regarded as unimportant. In the case of a transactional database, even a temporary lack of synchronisation is regarded as catastrophic. This emphasises the differences in the characteristics of data that should be maintained in an LDAP enabled directory versus a transactional database.
Figure 2.5-1 below shows a search request with a base DN of dn:cn=thingie,o=widgets,dc=example,dc=com, to a referral based LDAP system, that is fully satisfied from the first LDAP server (LDAP1):
Figure 2.5-1 - Request satisfied from LDAP1 only
Figure 2.5-2 below shows a search request with a base DN of dn:cn=cheri,ou=uk,o=grommets,dc=example,dc=com, to a referral based LDAP system, that results in a series of referrals to the LDAP2 and LDAP3 servers:
Figure 2.5-2 - Request generates referrals to LDAP2 and LDAP3
If the LDAP server is configured to chain (follow the referrals as shown by the alternate dotted lines) then a single data response will be supplied to the LDAP client. Chaining is controlled by LDAP server configuration and by values in the search request.
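The difference between returning a referral and chaining can be sketched with three toy servers. Everything here is an assumption made for illustration: the Server class, the chain flag, the client_search helper and the way the o=grommets and ou=uk subtrees are spread over LDAP1, LDAP2 and LDAP3 (inferred from the figure captions). The suffix test is a plain string comparison, which a real server would not rely on.

```python
class Server:
    """A toy LDAP server that holds some entries and delegates some subtrees."""

    def __init__(self, name, entries, referrals=None, chain=False):
        self.name = name
        self.entries = entries            # DN -> attributes held locally
        self.referrals = referrals or {}  # delegated subtree suffix -> Server
        self.chain = chain                # True: follow referrals for the client

    def search(self, base):
        """Return ("entry", attrs), ("referral", server) or ("noSuchObject", None)."""
        if base in self.entries:
            return ("entry", self.entries[base])
        for suffix, target in self.referrals.items():
            if base.endswith(suffix):
                if self.chain:
                    return target.search(base)   # chaining: resolve it ourselves
                return ("referral", target)      # default: hand it to the client
        return ("noSuchObject", None)


def client_search(server, base, max_hops=5):
    """Client-side referral chasing; also returns the servers contacted."""
    path = []
    for _ in range(max_hops):
        path.append(server.name)
        kind, payload = server.search(base)
        if kind != "referral":
            return kind, payload, path
        server = payload
    raise RuntimeError("too many referrals")


thingie = "cn=thingie,o=widgets,dc=example,dc=com"
cheri = "cn=cheri,ou=uk,o=grommets,dc=example,dc=com"

ldap3 = Server("LDAP3", {cheri: {"cn": ["cheri"]}})
ldap2 = Server("LDAP2", {}, {"ou=uk,o=grommets,dc=example,dc=com": ldap3})
ldap1 = Server("LDAP1", {thingie: {"cn": ["thingie"]}},
               {"o=grommets,dc=example,dc=com": ldap2})

local = client_search(ldap1, thingie)      # answered by LDAP1 alone
referred = client_search(ldap1, cheri)     # client visits LDAP1, LDAP2, LDAP3

ldap1.chain = ldap2.chain = True
chained = client_search(ldap1, cheri)      # client only ever talks to LDAP1
print(local[2], referred[2], chained[2])
```

With referrals the client does the leg work and contacts three servers. With chaining switched on the same answer comes back from a single request to LDAP1.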
Replication features allow LDAP DIT updates to be copied to one or more LDAP systems for backup and/or performance reasons. In this context it is worth emphasizing that replication operates at the DIT level not the LDAP server level since there may be multiple DITs within an LDAP server. Replication occurs periodically within what is known as the replication cycle time (essentially the time taken to send the updated data to the replica and to receive acknowledgement of success). In general there are methods to reduce the replication cycle time by configuration but these will typically have a performance or network usage overhead. OpenLDAP, historically, used a separate daemon (slurpd) to perform replication but since version 2.3 the replication strategy has changed significantly with important gains in flexibility and configurable changes in the replication cycle time. There are two possible replication configurations and multiple variations on each configuration type.
Master-Slave: In a master-slave configuration a single DIT is updated (the Master or Provider in OpenLDAP jargon) and these updates are replicated or copied to one or more designated LDAP servers running slave DITs (a consumer in OpenLDAP jargon). The slave servers operate with read-only copies of the master DIT. Read-only users can operate quite happily with the servers containing the slave DITs but users who need to update the directory will need to access the server containing the master DIT. In certain conditions a Master-Slave configuration can provide significant load-balancing. However a Master-Slave configuration has two obvious shortcomings:
If all/most users have the ability/need to update the DIT then they will have to access one server (with the slave DIT) for normal read access and another server (with the master DIT) to perform the update. Alternatively, they can always point to the server running the master DIT. In this latter case replication provides backup functionality only.
Since there is only one server containing a master DIT it represents a single point of failure for write operations (though the Slave DIT could be reconfigured to act as a Master in the event of a major failure).
Multi-Master: In a multi-master configuration one or more servers running master DITs may be updated and the resulting updates are propagated to the peer masters.
Historically, OpenLDAP did not support multi-master operation for quite some time but version 2.4 finally introduced multi-master capabilities. In this context, it may be worth pointing out two specific variations of the generic update-contention problem identified by the OpenLDAP project that apply to multi-master configurations and which are true for all LDAP systems:
Value-contention: If two attribute updates are performed at the same time (within the replication cycle time) with different values, then, depending on the attribute type (SINGLE or MULTI-VALUED) the resulting entry may be in an incorrect or unusable state.
Delete-contention: If one user adds a child entry at the same time (within the replication cycle time) as another user deletes the original entry, then the deleted entry will re-appear.
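Value-contention can be illustrated with two toy peer masters that simply swap their queued updates once per replication cycle. This sketch deliberately has no conflict resolution at all, which is not how real multi-master servers behave (they carry extra ordering information with each update that is not modelled here). The Peer class, its outbox and the exchange function are invented names, and delete-contention is not covered.

```python
class Peer:
    """A toy master holding a single entry; updates are queued for the other peer."""

    def __init__(self):
        self.entry = {"telephoneNumber": ["111"], "mail": []}
        self.outbox = []

    def update(self, kind, attr, value):
        """Apply a local write ("replace" or "add") and queue it for replication."""
        self._apply((kind, attr, value))
        self.outbox.append((kind, attr, value))

    def receive(self, ops):
        for op in ops:
            self._apply(op)

    def _apply(self, op):
        kind, attr, value = op
        if kind == "replace":
            self.entry[attr] = [value]
        elif value not in self.entry[attr]:
            self.entry[attr].append(value)


def exchange(a, b):
    """One replication cycle: each peer sends its queued updates to the other."""
    to_b, to_a = a.outbox, b.outbox
    a.outbox, b.outbox = [], []
    b.receive(to_b)
    a.receive(to_a)


a, b = Peer(), Peer()

# Two users write to different masters within the same replication cycle.
a.update("replace", "telephoneNumber", "222")
b.update("replace", "telephoneNumber", "333")
a.update("add", "mail", "first@example.com")
b.update("add", "mail", "second@example.com")

exchange(a, b)
print(a.entry)
print(b.entry)
```

After the exchange the two masters hold different telephone numbers, each having overwritten its own value with the other's. The multi-valued mail attribute ends up with both values on both masters, which may or may not be what either user intended.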
Figure 2.5-3 shows a number of possible replication configurations.
Figure 2.5-3 - Replication Configurations
RO = Read-only, RW = Read-Write.
LDAP1 Client facing system is a Slave and is read only. Clients must issue modify operations (writes) to the Master.
LDAP2 Client facing system is a Master and it is replicated to two slaves.
LDAP3 is a Multi-Master and clients may issue reads and/or writes to either system. Each master in this configuration could, in turn, have one or more slave DITs.
This work is licensed under a Creative Commons License.