Spotlight Engine (engine.txt)
                       -----------------------------
                                    by
                              Jochen Wilhelmy
                                  a.k.a.
                                 digisnap
                         digisnap@cs.tu-berlin.de
                                21.06.1997

Introduction
============
This text file describes how the engine of the "Spotlight" 64Kb intro
works. This file should be read together with the source code, since I
often refer to it. Only very important structures are also in this file,
but descriptions of procedures are just an extension to the comments in the
source.

second section describes the dungeon which is static and has a special
structure to make sorting and casting shadows easier. The third section
is about the shadow algorithm.


Index
=====

1. Animated objects
1.1. Object classes
1.2. Loading objects
1.3. Animating objects
1.4. Tracks
1.5. Viewers

2. Dungeon
2.1. Loading cubes
2.2. Tracing objects
2.3. Visible cube determination

3. Texture draw routines
3.1. Drawing cubes and meshes
3.2. Projection
3.3. Clipping
3.4. Drawing polygons

4. Lights

Casting shadows


Appendix: layout of the world data file



1. Animated objects
===================
This section describes all animated objects in the dungeon, how they are
stored, read into memory and processed during the runtime. The dungeon is a
static object and described in section 2.

1.1. Oject classes
==================
I used a pseudo OOP system to describe these objects: polygon mesh, camera,
spotlight, target (for cameras and lights) and bound. The base class called
object possesses information common to all other objects. This is a
position vector, a rotation matrix and some pointers to build up a
hierarchical structure. The spotlight (only called light) is inherited from
the camera, since it has the same behaviour. A bound is a point to describe
a bounding box. It is used in conjunction with the dungeon system. This is
the inheritance diagram:


        ____________ object ____________
        |         |          |         |
       mesh     camera     target     bound
                  |
                light


The structure of the base class, the object, looks like this in memory:

tobject struc
 o_vmt          dd ?

 ;hierarchy-system: (pointer)
 o_owner        dd ?
 o_next         dd ?
 o_child        dd ?

 ;linear chain: (pointer)
 o_chain        dd ?                    ;links all objects globally

 ;descent-like cube system: (pointer)
 o_nextincube   dd ?                    ;link to next object in cube
 o_cube         dd ?                    ;link to cube

 ;state: (vektor/matrix)
 o_p            dd 3 dup(?)             ;actual position (position and matrix have fixed order in memory)
 o_A            dd 9 dup(?)             ;actual rotation and scale
 o_hidden       dd ?

 o_stamp        dd ?                    ;stamp to mark object as "used"
ends

Each object (a mesh, a camera ...) in memory has this structure as a kind
of header. After it, there come the specific fields for the actual object.
Look in the source for them. The first field o_vmt is a pointer to the
virtual method table (vmt) of the object. If it is a mesh, then it points
to m_vmt in the data segment. If it is a camera, then it points to c_vmt.
The virtual method table looks like this:

tobjvmt struc
 o_size         dd ?                    ;size of object
 o_firstobj     dd ?                    ;points to memory pool of the type
 o_nextobj      dd ?                    ;for memory allocation from the pool
 o_loaddata     dd ?                    ;virtual procedure to load data
 o_dotracks     dd ?                    ;virtual procedure to calculate the tracks
ends

In an object oriented language like C++ you can only make procedures
virtual, but here are three variables and two procedures virtual. The
variables are used for memory allocation, which is done in the readworld
procedure, and to link targets to cameras and lights.

1.2. Loading objects
====================
Now I describe how these objects are loaded into memory from the world data
file. This data file is included from the file "world.dat" and starts at
the label "world" in the data segment. The readworld procedure reads in
this file how many objects of a type are in the world, multiplies it with
the corresponding o_size to calculate the amount of memory needed and
stores a pointer to it in the o_firstobj and o_nextobj field. This is done
five times since there are five different object classes (also called
types). Then it calls the recursive procedure readobjects, which builds up
the hierarchical structure of all objects in memory. This procedure can
process all different objects via the vmt. It first reads an id from the
world data file. This are the possible id's: 0=mesh, 1=camera, 2=light,
3=target, 4=bound. Then it calculates the offset to the vmt by multiplying
the id with the size of the vmt structure and adding the offset of the
first vmt (m_vmt). Now it gets memory by loading the o_nextobj pointer and
incrementing it by o_size. The o_next field is loaded with a pointer to the
previois object, which is NULL for the first one. This is why the order of
the objects in one hierarchy level is reversed. o_owner is a pointer to
the owner (the caller). With o_chain all objects are linked to a big chain,
regardless of their type. With o_cube the object is linked to the part of
the dungeon, where it is in and o_nextincube links all objects in the same
cube. Read about these cubes in section 2. With the virtual call via the
vmt-field o_loaddata the object specific data is loaded. See the procedures
m_loaddata to b_loaddata which are selected by the actual vmt. Now the
procedure calls itself recursively and passes the actual object as owner
and the number of childs. The recursive call returns a pointer to the
sub-tree, which is stored in o_child (may be NULL). It repeats until no
more objects have to be loaded in this hierarchy level.

Imagine a walker with a body and two legs. This is how the id and childs
field are stored in the world data file and how a hierarchy structure is
build up:

Hierarchy level/Name           Data

                               2 (objects in highest hierarchy)

0  body                        0 (id: mesh)
                               2 (childs)

1  upper right leg             0 (id: mesh)
                               1 (childs)

2  lower right leg             0 (id: mesh)
                               1 (childs)

3  right foot                  0 (id:mesh)
                               0 (childs)

1  upper left leg              0 (id: mesh)
                               1 (childs)

2  lower left leg              0 (id: mesh)
                               1 (childs)

3  left foot                   0 (id:mesh)
                               0 (childs)

0  camera                      1 (id: camera)
                               0 (childs)

This is the structure in memory. The child pointer is a downlink and is
marked with c, the owner pointer is an uplink and is marked with o. The
next pointer is marked with n.

  (objtree, global variable)
   |
   |
   |   NULL                               NULL
   |   |                                    |
   |   |o                                  o|
   |   |                                    |
  (light) --------n-----> (******* body *****) -n-> NULL
   |                       |     |          |
  c|                      c|    o|         o|
   |                       |     |          |
  NULL             (upper left leg) --n--> (upper right leg) -n-> NULL
                           |     |          |      |
                          c|    o|         c|     o|
                           |     |          |      |
         NULL <-n- (lower left leg)        (lower right leg) -n-> NULL
                           |     |          |      |
                          c|    o|         c|     o|
                           |     |          |      |
              NULL <-n- (left foot)        (right foot) -n-> NULL
                           |                |
                          c|               c|
                           |                |
                         NULL             NULL


All other things concerning the loading of objects can be seen in the
source code.


1.3. Animating objects
======================
This is done with the recursive procedure dotracks. It processes all types
via the vmt, like the procedure readobjects. First it calls the virtual
procedure o_dotracks. See the procedures m_dotracks to b_dotracks which are
selected by the actual vmt. These procedures calculate a new position and
matrix and set the hidden-flag if the object has a hide-track. If the
object has an owner, the procedure readobjects recalculates the position
p and the maptrix A like this:

A = Ao*A,
p = Ao*p + po,

where po and Ao are the position and matrix of the owner. Then it calls
itself recursively and repeats until there are no more objects in the
same hierarchy level.

1.4. Tracks
===========
The tracks stay in the world data file and are referenced by pointers. The
mode field is a boolean and is set, if the track is looped. The keys field
is one less than the actual number of keys, since nothing has to be
interpolated if there is only one key. After all frame numbers come the
points (P1, P4) and tangents (R1, R4) for the hermite spline interpolation.
The formula is with t running from 0.0 to 1.0:

Q(t) = (2t^3-3t^2+1)*P1 + (t^3-2t^2+t)*R1 + (-2t^3+3t^2)*P4 + (t^3-t^2)*R4.

At joint points only one point is stored as P4 for the incoming and P1 for
the outgoing spline. The procedure dotrack does this for all tracks except
the rotation track, which has to be handled extra. In a rotation track
there are quaternions stored after all frame numbers. They are called qn
instead of P1, an instead of R1, b(n+1) instead of R4 and q(n+1) instead
of P4. If q0 to q2 are temporary quaternions, you do it like this:

q0 = slerp(qn,an,t);
q1 = slerp(an,b(n+1),t);
q2 = slerp(b(n+1),q(n+1),t);

q0 = slerp(q0,q1,t);
q1 = slerp(q1,q2,t);

q0 = slerp(q0,q1,t);

If you treat a quaternion as a four dimensional vector, this is the slerp
function:

slerp(q1,q2) = sin((1-t)*a)/sin(a) * q1  +  sin(t*a)/sin(a) * q2,

Where a is the angle between the vectors. Since all quaternions have unit
length, it can be calculated like this:

cos(a) = q1 * q2.

The interpolated rotation is now in q0. The rotation matrix A is calculated
like this if q0 = [w,(x,y,z)]:

     ( 1-2yy-2zz        2xy+2wz          2xz-2wy  )
A =  (  2xy-2wz        1-2xx-2zz         2yz+2wx  )
     (  2xz+2wy         2yz-2wx         1-2xx-2yy )

See the text file splines.txt if you want to know more about tracks and how
the control points are calculated.

1.5. Viewers
============
In my simple hierarchy structure a light is derived from a camera. Better
would be a viewer class and derive the camera and the light from it. But
anyway, I refer to both light and camera as a viewer. A light has only a
hide track and a pointer to a map in addition. After all tracks are
calculated, the procedure doviewer is first called for the camera and later
for all lights. It generates from a camera with a target, a rotation (roll)
about the view axis and an opening angle (FOV) a viewer. This is a
structure (tviewer, prefix v_) containing a local coordinate system, which
already takes roll, FOV and for cameras the screen ratio of x:y = 4:3 into
account. The three axes of the local coordinate system are called l1, l2
and l3. l1 is the right hand vector, l2 is pointing upwards and l3 is the
view direction. In the first step the local coordinate system is calculated
without roll, FOV and the x:y ratio. If u is a vector pointing from the
viewer to the target, then the formulas are:

l3 = u/|u|,
l1 = ((u) x (0,0,1))/|(u) x (0,0,1)|,
l2 = l1 x l3.

The x3 (z) component of a vector indicates the height in the world. The
cross product with (0,0,1) aligns the viewer with the floor. Then l1 and l2
are recalculated to include roll, FOV and x:y ratio:

l1' = (l1*cos(roll) + l2*sin(roll))/tan(FOV),
l2' = (l2*cos(roll) + l1*sin(roll))*(x:y ratio)/tan(FOV).


2. Dungeon
==========
The dungeon consists of linked convex cubes. This has many advantages to
just having all polygons in an array and sorting them. The first thing is
that normal 3D editors do not support the kind of texture alignment needed
here. That's why I made an extra editor for this. The second are speed
reasons. The dungeon engine only needs to draw the cubes which are seen by
the camera to a certain depth. All other cubes don't need to be touched.
Theoretically the dungeon can have infinite size, which obviously can not
be handeled with a sorting algorithm. Additionally, it reduces errors of
the shadow algorithm and makes it faster.


2.1. Loading cubes
==================
Unlike the objects the cubes are not pointer linked in memory. All cubes
have equal size, so the offset to a cube can be calculated with the number.
A cube consists of six sides and eight vertices, in memory it also has some
other fields. The vertices are referenced by pointers to a vertex list. The
sides contain links to adjacent cubes and mapping coordinates. If the four
points of a side do not lie in one plane, it is split into two triangles.
The sfTile flag specifies one of two possible directions, so that the cube
stays convex. While loading, for each triangle (half of a side) a normal
vector is calculated. It is used to describe the plane in which the
triangle lies, together with one of the points as a supporting vector.
Therefore there are twelve planes for each cube, which are used to trace
the animated objects (see below). The procedure readcubes loads them into
memory. It reads the number of vertices and cubes from the world data file
and enters a loop to process all cubes. First, the information for the six
sides are read into the memory. Remember that the procedure readworld has
already allocated the memory for all cubes. This is why the pointers to
linked cubes can already be calculated, even if they are not loaded yet.
Then there is a little loop to generate the vertex pointers. The most
complicated part calculates the normals. The part is executed twice and at
each pass one point of the side is skipped, since each side is split into
two triangles. If sfTile is 0, then the two triangles have the points 1,2,3
and 0,1,3. If sfTile is 1, then they have the points 0,2,3 and 0,1,2. The
supporting vector of a triangle is stored as a pointer to one of the three
vertices in d_o, the normal in d_n. The whole procedure repeats until
there are no more cubes to read.


2.2. Tracing objects
====================
To use the advantages of the dungeon system for the animated objects, we
should know in which cube they are. So every frame the procedure
traceobjects is called to check if an object has left one cube and entered
the side-cube. This is where we need the global chain and the normal
vectors of the sides. The procedure enters a loop to process all objects
and checks if the actual object is linked to a cube. Targets and meshes
with bound objects are not linked to cubes. Bound objects, cameras and
lights are always linked to a cube. If an object is linked to a cube, it
calculates the distance d = (p-o)*n to the sides of the cube which are
connected to another cube (max. twelve calculations for each cube), where p
is the position of the object, o is the supporting vector and n the normal
vector of the triangle. If the distance is negative, the object is
disconnected from the cube and linked to the side-cube. The loop repeats
until there are no more objects to check. If the frame rate gets too slow,
it may happen that objects can't be traced correctly. They will disappear
as soon as the cube gets out of view, to which they are linked. This is why
the procedure traceobjects is called extra by a loop in the main program if
the distance between two frames gets too high.


2.3. Visible cube determination
===============================
The procedure cubetree generates a list of all cubes which have to be
rendered. It can be called with a camera or a light as the viewer. At this
point a new structure is introduced, the temporary cube (ttempcube, prefix
tc_). A temporary cube is used for every cube which is visible from the
viewer. It contains the transformed vertices, a pointer to the original
cube, a recursion level number and a pointer to a temporary mesh list.
The temp cubes are organized array-like. There are two temp cube lists, one
for the camera and one for the lights. Their memory is allocated at program
start. A viewer contains a pointer to the temp cube list to use. Although
the procedure is called cubetree, it is linear organized. In general, it
transforms the cube where the viewer is in and places it at the beginning
of the temp cube list with a recursion level of 0. Then it transforms all
cubes which are connected to this one and inserts them into the list with
recursion level 1. Then all cubes which are connected to cubes of level 1
are transformed and inserted as level 2 and so on. This process terminates
when no more cubes are found or the temp cube list is full. Of course each
cube can be inserted only once. This is guaranteed by the stamp value which
marks a cube as "used". The cubes are also tested for visibility before
they are inserted into the list. This would be the recursion levels of some
connected cubes, the viewer with his direction is marked with V:

  -----------------------------------
    |   0 |  1  |  2  |  3  |  4  |
    | V-> |     |     |     |     |
  --------------|--------------------
                |  3  |  4  |
                |     |     |
                -------------

If we draw all cubes in reverse order (starting with the highest level) we
don't need to sort them. This are the transform formulas:

x1 = x = (v-l)*l1,
x2 = y = (v-l)*l2,
x3 = z = (v-l)*l3,

where v is the vertex to transform, l is the location of the viewer and l1,
l2 and l3 are the local coordinate system of the viewer. x1, x2 and x3 are
the components of the transformed vertex. The determination if the cube
can be seen by the viewer tests the vertices against the four edges of the
view area (for a camera the screen edges). Since the vertices are already
transformed, we see them through a "unit viewer" with an opening angle of
45 degree of all four edges (this is a "overall" opening angle of 90
degree). At first, look at this method to test if a single vertex is seen
by the viewer:

b = (x1 < x3) and (-x1 < x3) and (x2 < x3) and (-x2 < x3),

where b is true if the vertex is visible. But we can't use this test,
because we have to test if all vertices of a cube lie beyond the same edge.
If, for example, one half lies beyond the upper edge and the other half
beyont the right edge, the cube may be partially visible. This is why
there are four counters (c1 to c4) in the procedure cubetree to count how
many vertices lie beyond each edge. After each pass there is a test if all
vertices lie beyond one edge so far. If not, the cube may be visible. The
whole loop is skipped for the first cube where the viewer is in, because
this one can be seen in any case. If a cube is visible and the actual
viewer is a camera, an extra stamp and link to the temp cube is stored.
They are used where the lights are processed to test if a cube seen by a
light is also seen by the camera and, if this is true, to access the
transformed vertices from the camera's point of view. There are two temp
cube lists, because the lights need the transformed vertices of a cube from
thier own view and the view of the camera. Look in the next section for
more. Now the next cube to transform is searched. It is a side-cube of
the cube pointed to by readptr at the side indexed by side. If there is no
side-cube or if it is already used, the next side is tested. After the
last side, the next cube in the temp cube list becomes the actual
read-cube. If no more cubes can be added or the list is full, the loop
terminates.

Now the temporary mesh list is generated for all temporary cubes. There is
a temp cube loop and a mesh loop. Only whole hierarchy trees are inserted
into the temp mesh list. This is to avoid sorting errors that occur when
the objects of a hierarchy tree like the walker from section 1 are assigned
to different cubes. This is also the place where the bound objects are
used. Without them the textures of a wall or the floor can cover the
hierarchy tree partially when it reaches into a cube which is closer to the
viewer. But if a hierarchy tree is surrounded by some bounding objects, the
whole hierarchy tree will be inserted into the cube closest to the viewer
containing at least one bounding object (actually into the temp mesh list
of the temp cube linked to that cube). To this point no sorting algorithm
had to be used. Only the temp mesh list of a temp cube has to be sorted
later.


3. Texture draw routines
========================
First some words about the screen buffer, where the routines draw to: it
contains four bytes for each pixel. From lowest to highest byte they are:
unused, pixel value, light value, z value. All texture draw routines set
the light value to the actual ambient light value, which is modified by
the procedure doambienttrack once per frame. The light draw procedures add
to the light value for each light source later. The z value is a kind of z
buffer, but it contains for polygons of a cube the recursion level. For
polygons of a mesh it contains a number which is decreased with every mesh
and starts at 255. The use of this is that the light draw routines can
check if they are illuminating the right polygons. Imagine a wall which is
illuminated by a light source, but covered with another wall which is dark.
Without an extra identification value for each pixel the light draw
routines can not avoid lighting the wall which is dark. The limitations to
this are that there can be a maximum of about 220 meshes (the rest is used
by the recursion levels of the dungeon) and all meshes have to be convex.
Not convex meshes can not cast shadows on themselves.


3.1. Drawing cubes and meshes
=============================
The procedure texturedraw goes through the temp cube list from the last to
the first cube and draws first all textures of a cube, then all meshes
which are in the temp mesh list of that cube. The mesh list also has to be
sorted. It first calls the procedure drawcube for a cube. It draws all
sides of a cube with a texture. If all four points of a side lie
approximately in one plane, then the whole side drawed, otherwise two
triangles. Here a new structure is used, the screenpoint (tscreenpoint,
prefix sp_). It contains the location and mapping coordinates for one
point. A polygon consists of several screenpoints, organized array-like.
After a whole or a halve side is converted into polygon, it is projected
to the screen by the procedure projection and then drawed either
perspective corrected or not. This depends on the recursion level t_sub.
Far polygons usually do not need to be corrected.
Then the procedure texturedraw calls some procedures to draw the meshes.
The procedure xformincube calls the procedure meshxform to transform each
mesh and sorts them. The sorting routine does not need to be fast, because
it sorts just some meshes. If it would sort hundrets of polygons a quick-
or mergesort would have been better. There is also a special case testing
if a mesh has the mfUnitenext flag set. If yes, the next mesh is treated as
if it belonged to the same object: it can't cast shadows on it. The sorting
algorithm therefore must not seperate such meshes. It was introduced that
the walker does not have bad looking shadows on the knees and on the feet,
where two meshes overlap a bit. The procedure meshxform first calculates a
new matrix and position relative to the viewer, then it transforms the
vertices of the mesh. A and p are the rotation and position of the mesh,
Al and l define the local coordinate system of the viewer, where l1, l2 and
l3 are the rows of Al, and v is the vertex to transform, then

A' = Al*A,
p' = Al*(p-l)
v' = A*v + p.

The first two equations have to be calculated once per mesh, the last once
per vertex. Like the temp cube lists there are two temporary lists for
transformed vertices, one for the camera and the other for the lights. If
the actual viewer is a camera, an extra reference to the transformed
vertices is saved for later use in the light procedures. The procedure
meshxform returns a z distance for the sorting algorithm in xformincube.
The procedure makefaces makes faces from the transformed vertices with the
face data and inserts them into the screenpoints list. The procedure
drawfaces finally draws all faces in the screenpoints list without
perspective correction.


3.2. Projection
===============
The porcedure projection does the projection and the z-clipping of the
polygons with a loop repeated once for each point of the polygon. If a
point of the polygon is in front of the projection plane, then the screen
coordinates are calculated like this:

dx = xres/2 *(1 + x/z),
dy = yres/2 *(1 - y/z),
dz = z,

where xres and yres are the screen resolution. This new point is inserted
to a destination array. If an edge of the polygon passes through the
projection plane (one point is in front, the other behind of it) the point
of intersection is calculated like this:

r = (1.0-z)/(nz-z),
dx = xres/2 *(1 + x+r*(nx-x)),
dy = yres/2 *(1 - y+r*(ny-y)),
dz = 1.0.

This point is also inserted to the destination array. Now the loop repeats
to process the next point. Note that the projection plane has a distance of
one to the viewer. The visible area ranges from (1.0,1.0) to (-1.0,-1.0) on
that plane. The terms xres/2 *(1 + x) and yres/2 *(1 - y) adjust this range
to the actual screen resolution. If all points are behind the projection
plane, no points will arrive in the destination array and therefore the
polygon will not be drawed. In the end the destination array is copied to
the source array.


3.3. Clipping
=============


3.4. Drawing polygons
=====================


4. Lights
=========


Appendix: layout of the world data file
=======================================

structure:
==========

contents:
---------
 meshes         (int)                   ;for memory allocation
 cameras        (int)
 lights         (int)
 targets        (int)
 bounds         (int)
 cubes          (int)

bitmaps:
--------
 palette        (768 chars)             ;palette for all bitmaps
 bitmapdatalen  (int)                   ;length of bitmap data
 bitmapdata     (n chars)               ;all bitmap data

materials:
----------
 materials      (int)                   ;number of materials
 materialdata   (n materials)

objects:
--------
 objects        (int)                   ;objects in highest hierary level
 objectdata     (? objects)             ;recursively read object tree

level:
------
 vertices       (int)                   ;(number of vertices)*3*4
 vertexdata     (n vectors)
 cubes          (int)                   ;number of cubes
 cubedata       (n cubes)


data types:
===========

material:
---------
 mappos         (int)                   ;position in bitmap-data
 mapand         (int)                   ;resolution (and-value for inner-loop)

object:
-------
 id             (int)                   ;0 = mesh
                                        ;1 = camera
                                        ;2 = light
                                        ;3 = target
                                        ;4 = bound
 cubenum        (int)                   ;number of cube this object is in
 childs         (int)                   ;number of childs to read

Mesh:
-----
 (object)                               ;object has been read first
 flags          (int)
 material       (int)

 vertices       (int)
 vertexdata     (n vektors)
 *normals       (n vektors)             ;not implemented
 *mapping       (n*2 floats)            ;only if mfMapping is set in flags
 faces          (int)
 facedata       (n faces)

 position track (track)
 rotation track (track)
 scale track    (track)
 hide keys      (int)
 hide track     (n ints)

camera:
-------
 (object)
 targetnum      (int)
 position track (track)
 FOV track      (track)
 roll track     (track)

light:
------
 (camera)                               ;inherited from camera
 hide keys      (int)                   ;use hotspot-track as hide-track
 hide track     (n ints)
 map-boolean    (int)                   ;0: round map, 1: map with logo

target:
-------
 (object)
 position track (track)

bound:
------
 (object)
 position       (vector)

track:
------
 size           (int)                   ;ttrackh
 mode           (int)                   ;
 keys           (int)                   ;
 startframe     (int)                   ;
 for each key:
 easefrom       (float)                 ;not implemented
 easeto         (float)                 ;not implemented
 frame          (int)

 P1             (n floats)
 for each key:
 R1             (n floats)
 R4             (n floats)
 P4/P1          (n floats)

 size = (ttrackh_ints)*4 + keys*(ttrack_ints-1)*4 + typesize*4 + keys*3*typesize*4
      = (ttrackh_ints + keys*(ttrack_ints-1) + typesize*(1 + keys*3))*4

 keypos = (site ttrackh) - 4 + key*(size ttrack - 4)



cube:
-----
 side           (6 sides)
 vertex         (8 ints)                ;(vertex to use)*3*4

side:
-----
 flags          (int)
 mappos         (int)                   ;if sfBitmap
 mapand         (int)                   ;if sfBitmap
 mapping        (2*4 floats)            ;if sfBitmap
 nextcube       (int)                   ;if sfNextcube