Spotlight Engine (engine.txt) ----------------------------- by Jochen Wilhelmy a.k.a. digisnap digisnap@cs.tu-berlin.de 21.06.1997 Introduction ============ This text file describes how the engine of the "Spotlight" 64Kb intro works. This file should be read together with the source code, since I often refer to it. Only very important structures are also in this file, but descriptions of procedures are just an extension to the comments in the source. second section describes the dungeon which is static and has a special structure to make sorting and casting shadows easier. The third section is about the shadow algorithm. Index ===== 1. Animated objects 1.1. Object classes 1.2. Loading objects 1.3. Animating objects 1.4. Tracks 1.5. Viewers 2. Dungeon 2.1. Loading cubes 2.2. Tracing objects 2.3. Visible cube determination 3. Texture draw routines 3.1. Drawing cubes and meshes 3.2. Projection 3.3. Clipping 3.4. Drawing polygons 4. Lights Casting shadows Appendix: layout of the world data file 1. Animated objects =================== This section describes all animated objects in the dungeon, how they are stored, read into memory and processed during the runtime. The dungeon is a static object and described in section 2. 1.1. Oject classes ================== I used a pseudo OOP system to describe these objects: polygon mesh, camera, spotlight, target (for cameras and lights) and bound. The base class called object possesses information common to all other objects. This is a position vector, a rotation matrix and some pointers to build up a hierarchical structure. The spotlight (only called light) is inherited from the camera, since it has the same behaviour. A bound is a point to describe a bounding box. It is used in conjunction with the dungeon system. This is the inheritance diagram: ____________ object ____________ | | | | mesh camera target bound | light The structure of the base class, the object, looks like this in memory: tobject struc o_vmt dd ? ;hierarchy-system: (pointer) o_owner dd ? o_next dd ? o_child dd ? ;linear chain: (pointer) o_chain dd ? ;links all objects globally ;descent-like cube system: (pointer) o_nextincube dd ? ;link to next object in cube o_cube dd ? ;link to cube ;state: (vektor/matrix) o_p dd 3 dup(?) ;actual position (position and matrix have fixed order in memory) o_A dd 9 dup(?) ;actual rotation and scale o_hidden dd ? o_stamp dd ? ;stamp to mark object as "used" ends Each object (a mesh, a camera ...) in memory has this structure as a kind of header. After it, there come the specific fields for the actual object. Look in the source for them. The first field o_vmt is a pointer to the virtual method table (vmt) of the object. If it is a mesh, then it points to m_vmt in the data segment. If it is a camera, then it points to c_vmt. The virtual method table looks like this: tobjvmt struc o_size dd ? ;size of object o_firstobj dd ? ;points to memory pool of the type o_nextobj dd ? ;for memory allocation from the pool o_loaddata dd ? ;virtual procedure to load data o_dotracks dd ? ;virtual procedure to calculate the tracks ends In an object oriented language like C++ you can only make procedures virtual, but here are three variables and two procedures virtual. The variables are used for memory allocation, which is done in the readworld procedure, and to link targets to cameras and lights. 1.2. Loading objects ==================== Now I describe how these objects are loaded into memory from the world data file. This data file is included from the file "world.dat" and starts at the label "world" in the data segment. The readworld procedure reads in this file how many objects of a type are in the world, multiplies it with the corresponding o_size to calculate the amount of memory needed and stores a pointer to it in the o_firstobj and o_nextobj field. This is done five times since there are five different object classes (also called types). Then it calls the recursive procedure readobjects, which builds up the hierarchical structure of all objects in memory. This procedure can process all different objects via the vmt. It first reads an id from the world data file. This are the possible id's: 0=mesh, 1=camera, 2=light, 3=target, 4=bound. Then it calculates the offset to the vmt by multiplying the id with the size of the vmt structure and adding the offset of the first vmt (m_vmt). Now it gets memory by loading the o_nextobj pointer and incrementing it by o_size. The o_next field is loaded with a pointer to the previois object, which is NULL for the first one. This is why the order of the objects in one hierarchy level is reversed. o_owner is a pointer to the owner (the caller). With o_chain all objects are linked to a big chain, regardless of their type. With o_cube the object is linked to the part of the dungeon, where it is in and o_nextincube links all objects in the same cube. Read about these cubes in section 2. With the virtual call via the vmt-field o_loaddata the object specific data is loaded. See the procedures m_loaddata to b_loaddata which are selected by the actual vmt. Now the procedure calls itself recursively and passes the actual object as owner and the number of childs. The recursive call returns a pointer to the sub-tree, which is stored in o_child (may be NULL). It repeats until no more objects have to be loaded in this hierarchy level. Imagine a walker with a body and two legs. This is how the id and childs field are stored in the world data file and how a hierarchy structure is build up: Hierarchy level/Name Data 2 (objects in highest hierarchy) 0 body 0 (id: mesh) 2 (childs) 1 upper right leg 0 (id: mesh) 1 (childs) 2 lower right leg 0 (id: mesh) 1 (childs) 3 right foot 0 (id:mesh) 0 (childs) 1 upper left leg 0 (id: mesh) 1 (childs) 2 lower left leg 0 (id: mesh) 1 (childs) 3 left foot 0 (id:mesh) 0 (childs) 0 camera 1 (id: camera) 0 (childs) This is the structure in memory. The child pointer is a downlink and is marked with c, the owner pointer is an uplink and is marked with o. The next pointer is marked with n. (objtree, global variable) | | | NULL NULL | | | | |o o| | | | (light) --------n-----> (******* body *****) -n-> NULL | | | | c| c| o| o| | | | | NULL (upper left leg) --n--> (upper right leg) -n-> NULL | | | | c| o| c| o| | | | | NULL <-n- (lower left leg) (lower right leg) -n-> NULL | | | | c| o| c| o| | | | | NULL <-n- (left foot) (right foot) -n-> NULL | | c| c| | | NULL NULL All other things concerning the loading of objects can be seen in the source code. 1.3. Animating objects ====================== This is done with the recursive procedure dotracks. It processes all types via the vmt, like the procedure readobjects. First it calls the virtual procedure o_dotracks. See the procedures m_dotracks to b_dotracks which are selected by the actual vmt. These procedures calculate a new position and matrix and set the hidden-flag if the object has a hide-track. If the object has an owner, the procedure readobjects recalculates the position p and the maptrix A like this: A = Ao*A, p = Ao*p + po, where po and Ao are the position and matrix of the owner. Then it calls itself recursively and repeats until there are no more objects in the same hierarchy level. 1.4. Tracks =========== The tracks stay in the world data file and are referenced by pointers. The mode field is a boolean and is set, if the track is looped. The keys field is one less than the actual number of keys, since nothing has to be interpolated if there is only one key. After all frame numbers come the points (P1, P4) and tangents (R1, R4) for the hermite spline interpolation. The formula is with t running from 0.0 to 1.0: Q(t) = (2t^3-3t^2+1)*P1 + (t^3-2t^2+t)*R1 + (-2t^3+3t^2)*P4 + (t^3-t^2)*R4. At joint points only one point is stored as P4 for the incoming and P1 for the outgoing spline. The procedure dotrack does this for all tracks except the rotation track, which has to be handled extra. In a rotation track there are quaternions stored after all frame numbers. They are called qn instead of P1, an instead of R1, b(n+1) instead of R4 and q(n+1) instead of P4. If q0 to q2 are temporary quaternions, you do it like this: q0 = slerp(qn,an,t); q1 = slerp(an,b(n+1),t); q2 = slerp(b(n+1),q(n+1),t); q0 = slerp(q0,q1,t); q1 = slerp(q1,q2,t); q0 = slerp(q0,q1,t); If you treat a quaternion as a four dimensional vector, this is the slerp function: slerp(q1,q2) = sin((1-t)*a)/sin(a) * q1 + sin(t*a)/sin(a) * q2, Where a is the angle between the vectors. Since all quaternions have unit length, it can be calculated like this: cos(a) = q1 * q2. The interpolated rotation is now in q0. The rotation matrix A is calculated like this if q0 = [w,(x,y,z)]: ( 1-2yy-2zz 2xy+2wz 2xz-2wy ) A = ( 2xy-2wz 1-2xx-2zz 2yz+2wx ) ( 2xz+2wy 2yz-2wx 1-2xx-2yy ) See the text file splines.txt if you want to know more about tracks and how the control points are calculated. 1.5. Viewers ============ In my simple hierarchy structure a light is derived from a camera. Better would be a viewer class and derive the camera and the light from it. But anyway, I refer to both light and camera as a viewer. A light has only a hide track and a pointer to a map in addition. After all tracks are calculated, the procedure doviewer is first called for the camera and later for all lights. It generates from a camera with a target, a rotation (roll) about the view axis and an opening angle (FOV) a viewer. This is a structure (tviewer, prefix v_) containing a local coordinate system, which already takes roll, FOV and for cameras the screen ratio of x:y = 4:3 into account. The three axes of the local coordinate system are called l1, l2 and l3. l1 is the right hand vector, l2 is pointing upwards and l3 is the view direction. In the first step the local coordinate system is calculated without roll, FOV and the x:y ratio. If u is a vector pointing from the viewer to the target, then the formulas are: l3 = u/|u|, l1 = ((u) x (0,0,1))/|(u) x (0,0,1)|, l2 = l1 x l3. The x3 (z) component of a vector indicates the height in the world. The cross product with (0,0,1) aligns the viewer with the floor. Then l1 and l2 are recalculated to include roll, FOV and x:y ratio: l1' = (l1*cos(roll) + l2*sin(roll))/tan(FOV), l2' = (l2*cos(roll) + l1*sin(roll))*(x:y ratio)/tan(FOV). 2. Dungeon ========== The dungeon consists of linked convex cubes. This has many advantages to just having all polygons in an array and sorting them. The first thing is that normal 3D editors do not support the kind of texture alignment needed here. That's why I made an extra editor for this. The second are speed reasons. The dungeon engine only needs to draw the cubes which are seen by the camera to a certain depth. All other cubes don't need to be touched. Theoretically the dungeon can have infinite size, which obviously can not be handeled with a sorting algorithm. Additionally, it reduces errors of the shadow algorithm and makes it faster. 2.1. Loading cubes ================== Unlike the objects the cubes are not pointer linked in memory. All cubes have equal size, so the offset to a cube can be calculated with the number. A cube consists of six sides and eight vertices, in memory it also has some other fields. The vertices are referenced by pointers to a vertex list. The sides contain links to adjacent cubes and mapping coordinates. If the four points of a side do not lie in one plane, it is split into two triangles. The sfTile flag specifies one of two possible directions, so that the cube stays convex. While loading, for each triangle (half of a side) a normal vector is calculated. It is used to describe the plane in which the triangle lies, together with one of the points as a supporting vector. Therefore there are twelve planes for each cube, which are used to trace the animated objects (see below). The procedure readcubes loads them into memory. It reads the number of vertices and cubes from the world data file and enters a loop to process all cubes. First, the information for the six sides are read into the memory. Remember that the procedure readworld has already allocated the memory for all cubes. This is why the pointers to linked cubes can already be calculated, even if they are not loaded yet. Then there is a little loop to generate the vertex pointers. The most complicated part calculates the normals. The part is executed twice and at each pass one point of the side is skipped, since each side is split into two triangles. If sfTile is 0, then the two triangles have the points 1,2,3 and 0,1,3. If sfTile is 1, then they have the points 0,2,3 and 0,1,2. The supporting vector of a triangle is stored as a pointer to one of the three vertices in d_o, the normal in d_n. The whole procedure repeats until there are no more cubes to read. 2.2. Tracing objects ==================== To use the advantages of the dungeon system for the animated objects, we should know in which cube they are. So every frame the procedure traceobjects is called to check if an object has left one cube and entered the side-cube. This is where we need the global chain and the normal vectors of the sides. The procedure enters a loop to process all objects and checks if the actual object is linked to a cube. Targets and meshes with bound objects are not linked to cubes. Bound objects, cameras and lights are always linked to a cube. If an object is linked to a cube, it calculates the distance d = (p-o)*n to the sides of the cube which are connected to another cube (max. twelve calculations for each cube), where p is the position of the object, o is the supporting vector and n the normal vector of the triangle. If the distance is negative, the object is disconnected from the cube and linked to the side-cube. The loop repeats until there are no more objects to check. If the frame rate gets too slow, it may happen that objects can't be traced correctly. They will disappear as soon as the cube gets out of view, to which they are linked. This is why the procedure traceobjects is called extra by a loop in the main program if the distance between two frames gets too high. 2.3. Visible cube determination =============================== The procedure cubetree generates a list of all cubes which have to be rendered. It can be called with a camera or a light as the viewer. At this point a new structure is introduced, the temporary cube (ttempcube, prefix tc_). A temporary cube is used for every cube which is visible from the viewer. It contains the transformed vertices, a pointer to the original cube, a recursion level number and a pointer to a temporary mesh list. The temp cubes are organized array-like. There are two temp cube lists, one for the camera and one for the lights. Their memory is allocated at program start. A viewer contains a pointer to the temp cube list to use. Although the procedure is called cubetree, it is linear organized. In general, it transforms the cube where the viewer is in and places it at the beginning of the temp cube list with a recursion level of 0. Then it transforms all cubes which are connected to this one and inserts them into the list with recursion level 1. Then all cubes which are connected to cubes of level 1 are transformed and inserted as level 2 and so on. This process terminates when no more cubes are found or the temp cube list is full. Of course each cube can be inserted only once. This is guaranteed by the stamp value which marks a cube as "used". The cubes are also tested for visibility before they are inserted into the list. This would be the recursion levels of some connected cubes, the viewer with his direction is marked with V: ----------------------------------- | 0 | 1 | 2 | 3 | 4 | | V-> | | | | | --------------|-------------------- | 3 | 4 | | | | ------------- If we draw all cubes in reverse order (starting with the highest level) we don't need to sort them. This are the transform formulas: x1 = x = (v-l)*l1, x2 = y = (v-l)*l2, x3 = z = (v-l)*l3, where v is the vertex to transform, l is the location of the viewer and l1, l2 and l3 are the local coordinate system of the viewer. x1, x2 and x3 are the components of the transformed vertex. The determination if the cube can be seen by the viewer tests the vertices against the four edges of the view area (for a camera the screen edges). Since the vertices are already transformed, we see them through a "unit viewer" with an opening angle of 45 degree of all four edges (this is a "overall" opening angle of 90 degree). At first, look at this method to test if a single vertex is seen by the viewer: b = (x1 < x3) and (-x1 < x3) and (x2 < x3) and (-x2 < x3), where b is true if the vertex is visible. But we can't use this test, because we have to test if all vertices of a cube lie beyond the same edge. If, for example, one half lies beyond the upper edge and the other half beyont the right edge, the cube may be partially visible. This is why there are four counters (c1 to c4) in the procedure cubetree to count how many vertices lie beyond each edge. After each pass there is a test if all vertices lie beyond one edge so far. If not, the cube may be visible. The whole loop is skipped for the first cube where the viewer is in, because this one can be seen in any case. If a cube is visible and the actual viewer is a camera, an extra stamp and link to the temp cube is stored. They are used where the lights are processed to test if a cube seen by a light is also seen by the camera and, if this is true, to access the transformed vertices from the camera's point of view. There are two temp cube lists, because the lights need the transformed vertices of a cube from thier own view and the view of the camera. Look in the next section for more. Now the next cube to transform is searched. It is a side-cube of the cube pointed to by readptr at the side indexed by side. If there is no side-cube or if it is already used, the next side is tested. After the last side, the next cube in the temp cube list becomes the actual read-cube. If no more cubes can be added or the list is full, the loop terminates. Now the temporary mesh list is generated for all temporary cubes. There is a temp cube loop and a mesh loop. Only whole hierarchy trees are inserted into the temp mesh list. This is to avoid sorting errors that occur when the objects of a hierarchy tree like the walker from section 1 are assigned to different cubes. This is also the place where the bound objects are used. Without them the textures of a wall or the floor can cover the hierarchy tree partially when it reaches into a cube which is closer to the viewer. But if a hierarchy tree is surrounded by some bounding objects, the whole hierarchy tree will be inserted into the cube closest to the viewer containing at least one bounding object (actually into the temp mesh list of the temp cube linked to that cube). To this point no sorting algorithm had to be used. Only the temp mesh list of a temp cube has to be sorted later. 3. Texture draw routines ======================== First some words about the screen buffer, where the routines draw to: it contains four bytes for each pixel. From lowest to highest byte they are: unused, pixel value, light value, z value. All texture draw routines set the light value to the actual ambient light value, which is modified by the procedure doambienttrack once per frame. The light draw procedures add to the light value for each light source later. The z value is a kind of z buffer, but it contains for polygons of a cube the recursion level. For polygons of a mesh it contains a number which is decreased with every mesh and starts at 255. The use of this is that the light draw routines can check if they are illuminating the right polygons. Imagine a wall which is illuminated by a light source, but covered with another wall which is dark. Without an extra identification value for each pixel the light draw routines can not avoid lighting the wall which is dark. The limitations to this are that there can be a maximum of about 220 meshes (the rest is used by the recursion levels of the dungeon) and all meshes have to be convex. Not convex meshes can not cast shadows on themselves. 3.1. Drawing cubes and meshes ============================= The procedure texturedraw goes through the temp cube list from the last to the first cube and draws first all textures of a cube, then all meshes which are in the temp mesh list of that cube. The mesh list also has to be sorted. It first calls the procedure drawcube for a cube. It draws all sides of a cube with a texture. If all four points of a side lie approximately in one plane, then the whole side drawed, otherwise two triangles. Here a new structure is used, the screenpoint (tscreenpoint, prefix sp_). It contains the location and mapping coordinates for one point. A polygon consists of several screenpoints, organized array-like. After a whole or a halve side is converted into polygon, it is projected to the screen by the procedure projection and then drawed either perspective corrected or not. This depends on the recursion level t_sub. Far polygons usually do not need to be corrected. Then the procedure texturedraw calls some procedures to draw the meshes. The procedure xformincube calls the procedure meshxform to transform each mesh and sorts them. The sorting routine does not need to be fast, because it sorts just some meshes. If it would sort hundrets of polygons a quick- or mergesort would have been better. There is also a special case testing if a mesh has the mfUnitenext flag set. If yes, the next mesh is treated as if it belonged to the same object: it can't cast shadows on it. The sorting algorithm therefore must not seperate such meshes. It was introduced that the walker does not have bad looking shadows on the knees and on the feet, where two meshes overlap a bit. The procedure meshxform first calculates a new matrix and position relative to the viewer, then it transforms the vertices of the mesh. A and p are the rotation and position of the mesh, Al and l define the local coordinate system of the viewer, where l1, l2 and l3 are the rows of Al, and v is the vertex to transform, then A' = Al*A, p' = Al*(p-l) v' = A*v + p. The first two equations have to be calculated once per mesh, the last once per vertex. Like the temp cube lists there are two temporary lists for transformed vertices, one for the camera and the other for the lights. If the actual viewer is a camera, an extra reference to the transformed vertices is saved for later use in the light procedures. The procedure meshxform returns a z distance for the sorting algorithm in xformincube. The procedure makefaces makes faces from the transformed vertices with the face data and inserts them into the screenpoints list. The procedure drawfaces finally draws all faces in the screenpoints list without perspective correction. 3.2. Projection =============== The porcedure projection does the projection and the z-clipping of the polygons with a loop repeated once for each point of the polygon. If a point of the polygon is in front of the projection plane, then the screen coordinates are calculated like this: dx = xres/2 *(1 + x/z), dy = yres/2 *(1 - y/z), dz = z, where xres and yres are the screen resolution. This new point is inserted to a destination array. If an edge of the polygon passes through the projection plane (one point is in front, the other behind of it) the point of intersection is calculated like this: r = (1.0-z)/(nz-z), dx = xres/2 *(1 + x+r*(nx-x)), dy = yres/2 *(1 - y+r*(ny-y)), dz = 1.0. This point is also inserted to the destination array. Now the loop repeats to process the next point. Note that the projection plane has a distance of one to the viewer. The visible area ranges from (1.0,1.0) to (-1.0,-1.0) on that plane. The terms xres/2 *(1 + x) and yres/2 *(1 - y) adjust this range to the actual screen resolution. If all points are behind the projection plane, no points will arrive in the destination array and therefore the polygon will not be drawed. In the end the destination array is copied to the source array. 3.3. Clipping ============= 3.4. Drawing polygons ===================== 4. Lights ========= Appendix: layout of the world data file ======================================= structure: ========== contents: --------- meshes (int) ;for memory allocation cameras (int) lights (int) targets (int) bounds (int) cubes (int) bitmaps: -------- palette (768 chars) ;palette for all bitmaps bitmapdatalen (int) ;length of bitmap data bitmapdata (n chars) ;all bitmap data materials: ---------- materials (int) ;number of materials materialdata (n materials) objects: -------- objects (int) ;objects in highest hierary level objectdata (? objects) ;recursively read object tree level: ------ vertices (int) ;(number of vertices)*3*4 vertexdata (n vectors) cubes (int) ;number of cubes cubedata (n cubes) data types: =========== material: --------- mappos (int) ;position in bitmap-data mapand (int) ;resolution (and-value for inner-loop) object: ------- id (int) ;0 = mesh ;1 = camera ;2 = light ;3 = target ;4 = bound cubenum (int) ;number of cube this object is in childs (int) ;number of childs to read Mesh: ----- (object) ;object has been read first flags (int) material (int) vertices (int) vertexdata (n vektors) *normals (n vektors) ;not implemented *mapping (n*2 floats) ;only if mfMapping is set in flags faces (int) facedata (n faces) position track (track) rotation track (track) scale track (track) hide keys (int) hide track (n ints) camera: ------- (object) targetnum (int) position track (track) FOV track (track) roll track (track) light: ------ (camera) ;inherited from camera hide keys (int) ;use hotspot-track as hide-track hide track (n ints) map-boolean (int) ;0: round map, 1: map with logo target: ------- (object) position track (track) bound: ------ (object) position (vector) track: ------ size (int) ;ttrackh mode (int) ; keys (int) ; startframe (int) ; for each key: easefrom (float) ;not implemented easeto (float) ;not implemented frame (int) P1 (n floats) for each key: R1 (n floats) R4 (n floats) P4/P1 (n floats) size = (ttrackh_ints)*4 + keys*(ttrack_ints-1)*4 + typesize*4 + keys*3*typesize*4 = (ttrackh_ints + keys*(ttrack_ints-1) + typesize*(1 + keys*3))*4 keypos = (site ttrackh) - 4 + key*(size ttrack - 4) cube: ----- side (6 sides) vertex (8 ints) ;(vertex to use)*3*4 side: ----- flags (int) mappos (int) ;if sfBitmap mapand (int) ;if sfBitmap mapping (2*4 floats) ;if sfBitmap nextcube (int) ;if sfNextcube