threebit.NET

A Nested Set Implementation in Java and PostgreSQL

This tutorial intends to give the reader:

See an alternate implementation that uses varchar fields.

  1. Preparation

    You're gonna want to have a JVM and Postgres or MySQL or some other RDBMS installed to do this. If you are using Oracle or DB2 you should use their built in Adjencency list operators instead.

  2. Motivation

    The idea behind this project came from an article by Joe Celko in his excellent book, SQL for Smarties.

    What is a Nested Set?
    The Nested Set is an adjacent list realized in SQL as a tree. For a good slide show on the Nested Set theory check this site out.

    Why bother with a Nested Set?
    Imagine a filesystem program that stores its data in a Relational Database Table. Presumably, this table would have at least two columns, one being the unique id of the item, and the second being the id of the object's parent. Additionally, the id is usually the primary key and the parentId is referentially constrained to values in the set of id

    The top of the tree is called the root. The nodes of the tree that have no subtrees beneath them are called the leaf nodes. The descendants, or children, of a parent node are every node at all lower levels in the subtree that has the parent node as its root. In the table the data looks like this:

    id    parentId
    ______________
    1     null | 0
    2         1
    3         1
    4         2
    5         2
    6         3
    7         2
    8         7
    

    This approach is a pretty standard list linking used in Online Transaction Processing type databases. It works pretty well, and doesn't require a lot of integrety checks. Make sure the parentId points to a valid id and you're set. Consider again the filesystem program. It would have the following use-cases.

    1. User browses a node.
    2. User adds a node below an existing node.
    3. User deletes a node and all of its children.
    4. User moves a node to beneath another node.
    5. User searches all nodes beneath a single node.

    The Linked-List approach works great for -1-, cause all you have to do is the O(1) query:

    SELECT * FROM table WHERE parentId = node;
    And the Linked-List approach works well for use-case -2-, as all you have to do is something like this, again an O(1) statement:
    INSERT INTO table ( id, parentId ) VALUES ( newId, NODE );
    But things just totally fall apart in use case -3- there is no easy or efficient way to implement this using the Linked list. Most solutions for this problem drop into a stored procedure or use a middle tier using some iterative loop or a stack. The farther down the tree you have to go, the longer the statement runs.

    Most solutions to this problem are unelegent hacks that take too long and often leave the RDBMS integrity in a vulnrable state while its running. One elegent way to do this still using the linked list approach is to use in a recursive trigger like this to handle the parent/child stuff. Its still not really that fast, but it does keep all the logic in the RDBMS instead of in an outside method.

    For use case -4- you can use:

    UPDATE table SET parentId = NODE;
    which is also very quick and easy.

    However, you're fucked again for use case -5-. This is a big problem, as most users expect to get search information back quickly. If you are running some sort of EIS system where people are constantly searching through deeply nested set of relational information you will soon be fired. This means we have to find an O(x) or quicker algorithm, or rethink the data schema. As very few DBA's have the luxury of rewriting their schemas on a whim, you have to think up a quick and easy way to improve the performace of the system without messing up your legacy stuff. This is why you want to use the Nested Set Model.

  3. Theory

    How do you use Nested Sets?
    Now, in the above diagram the data was shown as a tree of nodes. You can also think the data as a set. Which, is actually a lot more SQL friendly as SQL is a set-based language. Using the same table as about imagine a group of encompassing sets like this:

    You see that the root, node number 1 is the largest set and contains every other node in the system. So, how do we represent this in SQL? We do it using a number line where each set is given a left and right hand bounds.

    For example, below you can see the following:

    Now, that being said the selection of all nodes in an arbitrarily deep nest is done very simply. Generally, this query will be used to feed some other query's in clause or as a constraint for a DML statement.
    SELECT
      n0.nid
    FROM
      NestedSet n0,
      NestedSet n1
    WHERE
        n1.nid = 0
    AND	n0.lft BETWEEN n1.lft AND n1.rft;
    

  4. SQL Implementation

    The Nested Set implementation will most likely be used as an indexing scheme to provide better access times for an existing system. The nested set table may o
    Table "nestedset"
     Column  |  Type  | Notes
    ---------+--------+-----------
    nid      | bigint | Primary Key
    parentId | bigint | Foreign Key (nestedSet.nid)
    rft      | bigint | Unique Key
    lft      | bigint | Unique Key
    

    What's the downside to using Nested Sets?

  5. Java Implementation

    Introduction
    I am going to present the Java Implementation as schema independent as possible, to this end, all the returned values from the imaginary tables in the example will be realized as Lists of Maps where each item in the List corresponds to a row in the resultset, and each row in the Map corresponds to a column with a discrete datatype. This is pretty portable.

    Prime the NestedSet
    Before you can use the NestedSet schema, you need to determine the scope of it. That is, how many tables will be represented (or bound) by the nested set, and do they all share a unique namespace? (eg. the id column.)

    Here is an example in PostgreSQL's procedural language.

    // First we need to assure the language is loaded.
    CREATE FUNCTION plpgsql_call_handler () RETURNS OPAQUE AS '/usr/local/pgsql/lib/plpgsql' LANGUAGE C;
    CREATE TRUSTED LANGUAGE plpgsql HANDLER plpgsql_call_handler;
    
    // Now create the functions you'll need for operation.
    // CloseGap is not necessary, but keeps the NestedSet structure clean.
    CREATE OR REPLACE FUNCTION closeGap(BIGINT) RETURNS INTEGER AS '
    DECLARE r RECORD;
    BEGIN
      SELECT INTO r * FROM nestedSet WHERE nid = $1;
      UPDATE nestedSet
      SET
      lft = CASE
        WHEN lft > r.lft
        THEN lft - (r.rft - r.lft 1 )
      ELSE lft END,
      rft = CASE
        WHEN rft > r.lft
        THEN rft - (r.rft - r.lft 1 )
      ELSE rft END;
    RETURN 1;
    END;
    ' LANGUAGE 'plpgsql';
    CREATE OR REPLACE FUNCTION deleteSubtree(BIGINT) RETURNS INTEGER AS '
    DECLARE r RECORD ;
    BEGIN
      SELECT INTO r * FROM nestedSet WHERE nid = $1;
      DELETE FROM NestedSet
      WHERE lft BETWEEN r.lft AND r.rft;
      RETURN 1;
    END;
    ' LANGUAGE 'plpgsql';
    
    heres some reasonable unportable code as an example. Thank you joe celko.
    
    /**
     *
     * Copyright 2001 by Shawn Deleurme
     * All rights reserved.
     *
     * $Id: NestedSet.java,v 1.28 2002/07/02 16:46:04 sdeleurme Exp $
     */
    package com.deleurme.facility.rdbms;
    
    import java.util.*;
    import java.sql.*;
    import java.util.logging.*;
    import java.util.logging.*;
    
    /**
     * This class encapsulates everything you need to handle a Nested Set
     * structure.
     * 
    * The connection object is utilized as a pass-through ref and should * not be messed with as general-use methods that use this object may * want to roll the entire transaction back. * * * @version 1.0 * */ public class NestedSet /*implements Storable*/ { public static final int ID = 1; public static final int PARENT_ID = 2; public static final int LEFT = 3; public static final int RIGHT = 4; /** Insert a new record into the NestedSet. */ static final String INSERT_NODE = "INSERT INTO NestedSet VALUES ( ?,?,0,0 ) \n" + // Set existing objects left hand bounds " UPDATE NestedSet " + " SET " + " lft = CASE " + " WHEN lft > (SELECT F1.lft FROM NestedSet as F1 WHERE F1.id = ?) " + " THEN lft + 2 " + // Set new objects right hand bounds " WHEN lft = 0 " + " THEN (SELECT F1.lft + 1 FROM NestedSet as F1 WHERE F1.id = ?) " + " ELSE lft " + " END, " + // Set existing objects right hand bounds " rft = CASE " + " WHEN rft > (SELECT F1.lft FROM NestedSet as F1 WHERE F1.id = ?) " + " THEN rft + 2 " + // Set new objects right hand bounds " WHEN lft = 0 " + " THEN (SELECT F2.lft + 2 FROM NestedSet as F2 WHERE F2.id = ?) " + " ELSE rft " + " END "; /** * Private connection. */ private Connection c ; /** * General use constructor inside of entities. */ public NestedSet(Connection c) { this.c = c ; } /** * Constructor for use with the coreConfig startup. */ public NestedSet() { } public void setParent( Datastore datastore ) {} /** * Logger */ Logger logger = Logger.getLogger("com.deleurme"); /** * Simple fast method to fetch a flattened tree. This method * removes the need for fancy stored procs and weird unportable * sql. * * This method simply returns Lists of Longs from the db. * For more general use, just use the NestedContraint. * * * * @return an ArrayList representation of the subtree. * * @param the root node of the tree * * @throws SQLException Thrown if something is wrong * with the connection. */ public List getSubtreeAsMaps( Long root ) throws SQLException { ResultSet results; // I'm big cause resizing me is slow. List list = new ArrayList(64); try { results = c.createStatement().executeQuery( " SELECT e.name, e.entityType " + " FROM NestedSet as n, entityView as e, NestedSet as n2 " + " WHERE " + " n2.nid =" + root.toString() + " AND e.id = n.nid " + " AND n.lft BETWEEN n2.lft AND n2.rft "); ArrayList row = new ArrayList(4); while ( results.next() ) { row.add(new Long(results.getLong(1))); row.add(new Long(results.getLong(2))); row.add(new Long(results.getLong(3))); row.add(new Long(results.getLong(4))); list.add(row.clone()); } } catch (NullPointerException npe) { logger.throwing("NestedSet","getSubtreeAsMaps", npe ); } catch (Exception e) { // throw (e) ; logger.throwing("NestedSet","getSubtreeAsMaps", e ); } finally { return list; } } /** * Retrieves a list of nodes beneath a tree. I strongly recommend. * method should only be invoked in some sort of serialized transactional state. * * * @param root the target node to delete from. * @return A list of maps corresponding to the columns returned. * @throws SQLException Thrown if something is wrong with the connection. */ public List getSubtree( Long root ) throws SQLException { ResultSet results; // I'm big cause resizing me is slow. List list = new ArrayList(64); try { results = c.createStatement().executeQuery( " SELECT * " + " FROM nestedSet as n, nestedSet as n2 " + " WHERE " + " n2.nid =" + root.toString() + " AND n.lft BETWEEN n2.lft AND n2.rft "); // ArrayList row = new ArrayList(4); HashMap row = new HashMap (4); while ( results.next() ) { row.clear(); // don't forget to clear results Collections across loop boundaries row.put( "nid", new Long(results.getLong("nid"))); row.put( "parentId", new Long(results.getLong("parentId"))); row.put( "lft", new Long(results.getLong("lft"))); row.put( "rft", new Long(results.getLong("rft"))); list.add(row.clone()); } } catch (NullPointerException npe) { logger.throwing("NestedSet","getSubtree", npe ); } catch (Exception e) { // throw (e) ; logger.throwing("NestedSet","getSubtree", e ); } finally { return list; } } /** * This method removes all the records beneath a node. * * @param root the target node to delete from. * @throws SQLException Thrown if something is wrong with the connection. */ public void deleteSubtree( Long root ) throws SQLException { int rows = c.createStatement().executeUpdate( " DECLARE " + " @dropLft BIGINT, " + " @dropRft BIGINT " + " SELECT " + " @dropLft=lft, " + " @dropRft=rft " + " FROM NestedSet " + " WHERE nid = " + root.toString() + " DELETE FROM NestedSet " + " WHERE lft BETWEEN @dropLft AND @dropRft; " ); /////////////////////////////////////////////// // Close Gaps! /////////////////////////////// /////////////////////////////////////////////// closeGap( root ); } /** * This method moves a node. This is the hardest * thing to do in the nested set model. * * @param the current node. * @param the destination node. * @throws SQLException Thrown if something is wrong with the connection. */ public void moveNode( Long root, Long destination) throws SQLException { //try { // Get the Subtree and save it. List subTree = getSubtree( root ); // Remove the subtree from its original spot. deleteSubtree( root ); // Insert the subtree at the destination node. insertSubtree( destination, subTree ); // } // catch (SQLException se) { // try { // c.rollback() ; // } // catch (SQLException sse) { // throw new RuntimeException ("Nested Set Severe Failure. Could not roll back.", sse) ; // } // logger.info( "\nmoveNode failed - but was rolled back." ); // throw (se); // } } /** * Inserts an entire subtree at a given node. * * * @param the current connection. * @param List a list of maps representing rows. * @throws SQLException Thrown if something is wrong with the connection. */ public void insertSubtree( Long destination, List subTree ) throws SQLException { if (subTree.size() == 0) return; StringBuffer buffer = new StringBuffer(256); // // Update the root object parentId to // the destination objects id. // //((ArrayList) subTree.get(0)).set(1, destination ); Map rootRow = (Map) subTree.get(0); rootRow.put("parentId",destination); for (Iterator i=subTree.iterator();i.hasNext(); ){ Map row = (Map) i.next(); Long id = (Long) row.get("nid"); Long parentId = (Long) row.get("parentId"); buffer.append( "INSERT INTO NestedSet (nid,\"parentId\", lft, rft) " + "VALUES ( " + id + "," + parentId + ",0,0 )\n" ); } buffer.append ( // Set existing objects left hand bounds "UPDATE NestedSet " + "SET " + "lft = CASE " + " WHEN lft > (SELECT F1.lft FROM NestedSet as F1 WHERE F1.nid = " + destination + " ) THEN lft + 2 " + // Set new objects right hand bounds " WHEN lft = 0 " + " THEN (SELECT F1.lft + 1 FROM NestedSet as F1 WHERE F1.nid = " + destination + " ) ELSE lft " + " END, " + // Set existing objects right hand bounds " rft = CASE " + " WHEN rft > (SELECT F1.lft FROM NestedSet as F1 WHERE F1.nid = " + destination + " ) THEN rft + 2 " + // Set new objects right hand bounds " WHEN lft = 0 " + " THEN (SELECT F2.lft + 2 FROM NestedSet as F2 WHERE F2.nid = " + destination + " ) ELSE rft " + " END "); // logger.info( buffer.toString() ); // for debugging c.createStatement().execute(buffer.toString()); } /** * Inserts and then synchronizes the NestedSets * across the table. * * @param a map of relevent information. * @throws SQLException Thrown if something is wrong with the connection. */ public void insertNode( Map map ) throws SQLException { PreparedStatement statement; long parentId = ((Long) map.get("parentId")).longValue(); long id = ((Long) map.get("id")).longValue(); statement = c.prepareStatement( INSERT_NODE ); //////////////////////////////////////////////// // Sets up parentId and the objects id // //////////////////////////////////////////////// statement.setLong(1, id ); statement.setLong(2, parentId ); statement.setLong(3, parentId ); statement.setLong(4, parentId ); statement.setLong(5, parentId ); statement.setLong(6, parentId ); statement.execute(); } /** * A debugging method. * */ public void printTable() throws SQLException { ResultSet r = null ; r = c.createStatement().executeQuery("SELECT * FROM nestedSet") ; StringBuffer s = new StringBuffer(512); s.append( "-------------------------------------\n" ); s.append( "nid\tpid\tleft\tright\n" ); s.append( "-------------------------------------\n" ); while (r.next()){ s.append ( r.getString( "nid" ) ).append ( "\t" ); s.append ( r.getString( "parentId" ) ).append ( "\t" ); s.append ( r.getString( "lft" ) ).append ( "\t" ); s.append ( r.getString( "rft" ) ).append ( "\t\n" ); } logger.info( "\n\n" + s.toString() ) ; } /////////////////////////////////////////////////////////////////////////////////// // // Handy little methods. // /////////////////////////////////////////////////////////////////////////////////// /** * Useful to maintain the index. */ public void closeAllGaps(){ // someday someone will write me.... } /** * Closes a gap between the points of a root object. While not * doing this does not break things, the nestedset indexing will become harder to * understand. * * @return the number of rows affected. * @param the root of the gap. * @throws SQLException on horrible failure. * */ public int closeGap(Long root) throws SQLException { return c.createStatement().executeUpdate ( // Find the gap. " DECLARE @LEFT BIGINT, @RIGHT BIGINT " + " SELECT @LEFT = lft, @RIGHT = rft " + " FROM NestedSet " + " WHERE nid = " + root.toString() // Close the gap. + " UPDATE NestedSet " + " SET " + " lft = CASE " + " WHEN lft > @LEFT " + " THEN lft - ( @RIGHT - @LEFT + 1 ) " + " ELSE lft END, " + " rft = CASE " + " WHEN rft > @LEFT " + " THEN rft - ( @RIGHT - @LEFT + 1 ) " + " ELSE rft END; " ); } /** * @params Long node. */ public void deleteNode(Long node){ throw new RuntimeException ("NestedSet.deleteNode(Long node) not implemented yet"); } /** * Returns map * * * */ public List getPathBetweenNodes(Long start, Long end) throws SQLException { ResultSet r = c.createStatement().executeQuery( " SELECT t2.nid,( " + " SELECT COUNT(*) " + " FROM NestedSet as T4 " + " WHERE t4.lft BETWEEN t1.lft AND t1.rft " + " AND t2.lft BETWEEN t4.lft AND t4.rft ) AS path " + " FROM nestedSet as T1, nestedSet as T2, nestedSet as T3 " + " WHERE t1.nid = " + start + " AND t3.nid = " + end + " AND t2.lft BETWEEN t1.lft AND t1.rft " + " AND t3.lft BETWEEN t2.lft AND t2.rft ") ; // you know, i used to work. i can't remember why this is broten. return null; } // // // ------------------------------------ // // /** * Useful to maintain the index. */ public void makeSpace(){ } public int getHeight(){ return 0; } public List getChildrenAtLevel(int level){ return null; } public List getImmediateChildren(){ return null; } public List getLeaves() { return null; } public List getPath() { return null; } }
  6. C++ Implementation

    to be delivered (anybody?)
  7. C# Implementation

    to be delivered (anybody?)
  8. Conclusion

    to be delivered (anybody?)