[XAR2] New implementation of Blocklayout compiler
- From: Marcel van der Boom <marcel (at) hsdev.com>
- Date: Fri, 03 Nov 2006 11:59:11 +0100
In the 2.x development branch I've switched the Blocklayout compiler
to a new XSL-based implementation.
Instead of a hand-crafted character parser and code generator, the
Blocklayout compiler is now made up of an XSLT processor based on the
PHP5 XSL extension, using a set of XSL templates to transform Xaraya
templates to the desired output. As a consequence, besides the PHP5
requirement we already imposed, there is now also the XSL extension
requirement for the 2.x branch. This extension is bundled by default
in PHP5, so we hope this will not cause any problems.
The implementation is far from complete but works well enough to
warrant testing by a broader group of people. The templates as they
are at this moment are still compatible with both BL compilers, but
this is likely to change rapidly, moving template constructs to BL2
syntax where appropriate. It is likely that the BL2 compiler will not
remain 100% backward compatible with the existing compiler in some
areas. We keep a record of these changes, so tracking them down and
updating the relevant template constructs should not cause
unnecessary pain.
All xar: tags as they are defined in RFC-0010 should work as
documented (with a few exceptions which are not finished yet), as well
as the custom tags defined in the core repository (base and
dynamicdata have their own tag definitions). Modules have not been
considered yet, in general, for the 2.x branch (but we secretly test
them anyway, and it's no disaster yet).
The implementation has been around for a while here locally. Going the
last mile and implementing the remaining tags was done last week. The
preliminary test results are very promising. If you can stand the
occasional ugly exception here and there, I encourage you to take a
look and report your findings (no bugzilla for 2.x yet, it can't keep
up with the change-rate we have ;-) ).
Some observations from testing I'd like to share:
- as XSL is very strict in what it will and won't do for you, there's
very little room for 'cheating' with undefined entities, unclosed tags
or other invalid XML constructs. This in itself can be a pain
sometimes, but the good news is that in core I only had to change a
handful of things to bring it into shape.
- we are transforming the input .xd/.xt/.xml directly to the compiled
cache, skipping the step of explicitly building a node tree in PHP. Of
course, behind the scenes the XSL processor does roughly the same, but
inside the extension and invisible to us. This is a lot faster (the
extension is written in C, which in general should be faster than
interpreted PHP). For some templates, running with the compile cache
off is even faster than with the cache turned on!! That was sort of a
surprise.
- being forced to adhere to the XML rules showed a couple of
inconsistencies in the definition of the XAR template language. Design
errors, if you will. These will be corrected in the syntax, where they
will not lead to massive incompatibilities.
- the new compiler is 'context aware', which means we have much more
control over how templates will be processed. One of the more
interesting consequences could be, for example, that the
<xar:mlstring/> tag in its plain form (that is, not containing #(1)
like constructs) is redundant. The compiler has enough contextual info
to detect text nodes as such, and automatically put them up for
translation. Whether this is going to happen is not yet certain, but
it would make templating quite a bit more comfortable in my opinion.
- creating a new tag, assuming a little XSL knowledge, is very easy.
Most of the tags were literally implemented in 30 minutes or less.
On the other hand, if a tag *is* complex, it can take a whole day to
figure out how to do it.
- it has become clear that we probably will need to provide some
'convenience' settings to go with the new compiler. For example:
everyone is sort of used to using &nbsp; or &eacute; etc. These are
HTML entities, and by default XML has no knowledge of them, so we need
to put some rules into place for those types of situations (either
switching to numeric entities only, or predefining some of the more
common ones). There are a couple of other areas like that which we
will need to address to try to keep complications to a minimum.
- similarly for the expressions we use in BL (the things between
#-pairs) there is less room for cheating, especially in attributes.
One of the challenges is to precisely define how '#' is handled, as it
also has a special meaning in numeric entities (like &#160;), in
anchors in XML dialects like XHTML (named #anchors) and in URI
addressing (#target in a URI). Solvable, but easy to do wrong.
- we get a couple of things for free now which were problematic in the
earlier implementation. Most importantly, <![CDATA[...]]> sections work
out of the box now (very, very comfortable in templating). Another
example is processing instructions (PI) like <?php, <?perl or <?xar,
which can now be processed natively if we like (they are disabled for
now, as they are in BL 1).
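
To illustrate the entity observation in the list above, here is a
minimal sketch of the "numeric entities only" option. It is written in
Python with only the standard library, purely as an illustration: it is
not part of the BL compiler, and the names to_numeric_entities and
parses_as_xml are made up for this example. It shows that a plain XML
parser rejects a named HTML entity, and that rewriting it to its
numeric form makes the same template parse.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities an XML parser knows without a DTD; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &nbsp; or &eacute;
# (numeric references like &#160; do not match).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(source):
    """Rewrite known HTML named entities to numeric character references.

    Hypothetical helper for illustration only. Unknown names are left
    untouched so the XML parser can still report them as errors.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _NAMED_ENTITY.sub(replace, source)


def parses_as_xml(source):
    """Return True if the string is well-formed XML for a plain parser."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


template = "<p>caf&eacute;&nbsp;&amp; bar</p>"
converted = to_numeric_entities(template)

print(parses_as_xml(template))   # named HTML entities are undefined in XML
print(converted)
print(parses_as_xml(converted))  # numeric references need no declaration
```

A plain regular expression like this would also rewrite entity-like
text inside CDATA sections, so a real preprocessing step would need to
be more careful than this sketch. The other option mentioned,
predefining the common entities, would avoid touching the template
source at all.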
Personally, I'm very excited about this because it opens up a whole
new playground, giving more power to the output production (which is
already pretty powerful in my opinion), and it marks the next step in
creating an output-independent templating system.
So, good times :-)
marcel.
_______________________________________________
Xaraya_devel mailing list
Xaraya_devel (at) xaraya.com
http://xaraya.com/mailman/listinfo/xaraya_devel