Fenix Manual

Considered Optimisations
Last Modified: May 30, 2005 10:05AM
(Any)
Description

Optimisations being considered!

The following is a list of the most important optimisations that we would be most interested in studying and trying to implement for version 1.0.

Subroutines

  • At the moment, creating a process carries considerable overhead: all of its local variables are initialized (potentially 2 to 4K of data), the use of all of its local strings is marked, memory is reserved for the structure that describes it, and so on. Even worse, these resources are not released until the following frame. Traditional programming styles (for example, solving simple mathematical problems with recursive functions) are prohibitively expensive.

  • It would be necessary to be able to mark a process as a subroutine. A process of this type does not appear on screen, contains no local data, and never executes the FRAME instruction. It behaves like a function in C or another traditional language: it receives a series of parameters, it has space for private variables, and these are destroyed when it exits. Subroutines would be executed from within the interpreter without leaving the function that executes a process [ instance_go ], using stack space to hold their private variables temporarily until they return, and behaving more or less like a JMP with some additional overhead to initialize the private variables.


  • The FXC compiler would be responsible for:

      • Identifying a process as a subroutine (verifying that it uses no local variables and does not contain the FRAME instruction) and marking it as such.

      • Generating code that reserves stack space for the private variables and initializes their values (for example, incrementing the usage count of the strings stored there). On entry to the subroutine, a "stack frame" register is initialized with the location of this space.

      • Generating code that releases the occupied stack space (for example, decrementing the usage count of the strings stored in these variables). The subroutine is also responsible for restoring the stack frame register to its previous value.

      • Replacing accesses to the process's private variables with stack accesses relative to the stack frame register mentioned above.

  • The FXI interpreter would be responsible for:

      • Maintaining the new stack frame register.

      • Recognizing and interpreting any new mnemonics needed to access stack values at an offset from the new frame register.

      • Recognizing and interpreting the new subroutine jump mnemonics.
        It would be desirable to fold the maintenance of the stack frame into the new subroutine call mnemonic and its return.
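    The frame register described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Interp structure and the sub_enter, sub_private and sub_leave helpers are hypothetical names, not part of FXC or FXI, and string usage counts are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names come from FXC or FXI. */
enum { STACK_SIZE = 256 };

typedef struct {
    int    stack[STACK_SIZE];
    size_t sp;  /* index of the next free stack slot */
    size_t fp;  /* "stack frame" register: base of the current private variables */
} Interp;

/* Enter a subroutine: save the caller's frame register, then reserve
   and zero stack space for n_private private variables. */
void sub_enter(Interp *vm, size_t n_private)
{
    assert(vm->sp + 1 + n_private <= STACK_SIZE);
    vm->stack[vm->sp++] = (int)vm->fp;  /* saved frame register */
    vm->fp = vm->sp;
    for (size_t i = 0; i < n_private; i++)
        vm->stack[vm->sp++] = 0;
}

/* Access a private variable relative to the frame register. */
int *sub_private(Interp *vm, size_t index)
{
    assert(vm->fp + index < vm->sp);
    return &vm->stack[vm->fp + index];
}

/* Leave a subroutine: release the private variables and restore the
   caller's frame register. */
void sub_leave(Interp *vm)
{
    assert(vm->fp >= 1);
    vm->sp = vm->fp - 1;
    vm->fp = (size_t)vm->stack[vm->sp];
}

/* Recursive use: each call gets its own private variable on the stack,
   and it is still intact after the nested call returns. */
int factorial(Interp *vm, int n)
{
    int result = 1;
    sub_enter(vm, 1);
    *sub_private(vm, 0) = n;
    if (n > 1) {
        int rest = factorial(vm, n - 1);
        result = *sub_private(vm, 0) * rest;
    }
    sub_leave(vm);
    return result;
}
```

    Starting from a zeroed Interp, calling factorial(&vm, 5) leaves sp and fp back at their initial values, which is the property that lets recursion run without creating any process.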


    Masks or RLE sprites

    At the moment, the most common kind of graphic drawing checks pixel by pixel for zeros (the transparent color). The MMX version of the same process combines the values read with those already in the destination buffer, which is much slower in video memory but faster in RAM, since it executes no internal jumps.

    There are techniques that speed this process up considerably, in exchange for making the graphic immutable: creating a bit mask of it (with 1 for transparent pixels), or an RLE version that encodes the runs of empty pixels separately. Masks also speed up collision detection considerably.
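    As a rough illustration of the RLE variant only, one row of an 8-bit graphic could be encoded as "skip, then copy" spans. The RleSpan layout and function names below are assumptions made for this sketch, not existing Fenix structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical span: skip `skip` transparent pixels, then copy `count`
   opaque ones. Color 0 is transparent, as in the text above. */
typedef struct {
    unsigned short skip;
    unsigned short count;
} RleSpan;

/* Encode one row; returns the number of spans written.
   Trailing transparent pixels produce no span. */
size_t rle_encode_row(const unsigned char *row, size_t width,
                      RleSpan *spans, size_t max_spans)
{
    size_t n = 0, x = 0;
    assert(width <= 65535);
    while (x < width) {
        size_t start = x;
        while (x < width && row[x] == 0) x++;   /* transparent run */
        size_t first = x;
        while (x < width && row[x] != 0) x++;   /* opaque run */
        if (x == first) break;                  /* nothing left to draw */
        assert(n < max_spans);
        spans[n].skip  = (unsigned short)(first - start);
        spans[n].count = (unsigned short)(x - first);
        n++;
    }
    return n;
}

/* Draw an encoded row: no per-pixel transparency test, one memcpy per span. */
void rle_blit_row(const unsigned char *row, const RleSpan *spans, size_t n,
                  unsigned char *dest)
{
    size_t x = 0;
    for (size_t i = 0; i < n; i++) {
        x += spans[i].skip;
        memcpy(dest + x, row + x, spans[i].count);
        x += spans[i].count;
    }
}
```

    The spans only stay valid while the source row is unchanged, which is the immutability trade-off mentioned above.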

    Neither technique is useful when sprites are rotated, enlarged, or reduced in size. In addition, horizontal mirroring would require special versions of the mask-based or RLE drawing routines. All of this suggests that the present versions of the blitter would have to remain, with these optimizations coexisting as special cases.

    The biggest problem is that a more or less expensive conversion from sprite to mask or RLE version is required. A change to the contents of the sprite may require converting it again. There are two possible solutions to this problem:

    Have the programmer explicitly create the mask or RLE version of a graphic, by means of a dedicated function, and call it again to update it whenever the original graphic is modified. The disadvantage is that this can be tedious.


    Use a "dirty" flag on the graphic to detect automatically when it has been modified. The disadvantages are that some changes are undetectable (e.g. MAP_BUFFER) and that time may be wasted creating masks for graphics that are always going to be drawn scaled, for example.

    Personally I prefer the first option, possibly with variants of the mask-creation function that create masks for all graphics in memory, or for all those in a library, or that enable a mode that creates them automatically for all graphics loaded in the future. This would increase the number of functions by three or four.

    As for the implementation, each GRAPH structure would have to contain a pointer to a specific drawing function, to be called instead of gr_blit when the drawing conditions are normal (no scaling, no rotation, and no flags or only a reduced set of flags), perhaps with a mask specified in another field of the structure. The check for this routine could come at the beginning of gr_blit itself, making the improvement transparent to the rest of the system.
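    A minimal sketch of that dispatch, reduced to a single row for brevity, might look like this. The Graph fields, the DrawFn signature and blit_row are hypothetical stand-ins; the real GRAPH structure and gr_blit are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Graph;

/* Hypothetical signature for a per-graphic drawing routine. */
typedef void (*DrawFn)(const struct Graph *g, unsigned char *dest);

typedef struct Graph {
    size_t               width;
    const unsigned char *pixels;           /* one row, 0 = transparent */
    DrawFn               custom_draw;      /* NULL: use the generic blitter */
    unsigned             supported_flags;  /* flags the custom routine handles */
} Graph;

/* Generic path: test every pixel for transparency. */
void generic_blit_row(const Graph *g, unsigned char *dest)
{
    for (size_t x = 0; x < g->width; x++)
        if (g->pixels[x] != 0)
            dest[x] = g->pixels[x];
}

/* Example specialised routine for a graphic known to be fully opaque. */
void opaque_blit_row(const Graph *g, unsigned char *dest)
{
    memcpy(dest, g->pixels, g->width);
}

/* Stand-in for the start of gr_blit: use the custom routine only when
   drawing conditions are "normal", otherwise fall back. */
void blit_row(const Graph *g, unsigned char *dest, unsigned flags, int transformed)
{
    assert(g != NULL && dest != NULL);
    if (g->custom_draw != NULL && !transformed
        && (flags & ~g->supported_flags) == 0) {
        g->custom_draw(g, dest);
        return;
    }
    generic_blit_row(g, dest);
}
```

    Because the check sits inside the blit entry point, callers do not need to know whether a graphic has a specialised routine.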

    Giving graphics customizable drawing routines has many additional applications. For example, DLLs could be created that implement blitters alternative to the original one (with special effects, or optimizations like those discussed). It would also be possible to create compound maps (a special version of an 8-bit map where each pixel, instead of being a color, represents the index of a graphic within an FPG library), which would be very useful as scroll backgrounds, where there are currently size and memory limitations because the whole background is a single graphic. It would be enough to create a drawing routine specialized for the case.

    Signals
    Right now, signals sent with SIGNAL take effect when the current statement finishes. A child process can send a sleep signal to its parent process and return: the parent process will not get to execute its next statement.

    The price to pay is that the current state of the process is checked continuously. After each mnemonic there is an empty-stack check to find out whether we are at the end of a statement. Although it is only a register comparison, it is unnecessary work, and the interpreter would be sped up by a certain percentage without it.

    Unfortunately, that would make signal behavior erratic, because a signal would take effect only when the main loop reached the corresponding process. Some signals would take effect almost immediately (those sent to a process that has not yet executed this frame), while others would not act until the following frame. To solve this, the SIGNAL function would not change the state of a process immediately, but would store the new state in an auxiliary variable, so that it replaces the real state only at the end of the current frame. This would emulate the behavior of DIV, where signals do not take effect until just before display, and that is already sufficient as a means of interaction between processes.
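    The auxiliary-variable idea can be sketched as follows. The Process fields, state names and function names are hypothetical and only illustrate the deferral; they are not the interpreter's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process states for this sketch. */
typedef enum { STATE_RUNNING, STATE_SLEEPING, STATE_FROZEN, STATE_KILLED } ProcState;

typedef struct {
    ProcState state;        /* state the interpreter acts on */
    ProcState pending;      /* state requested by the last signal */
    int       has_pending;  /* non-zero if a signal arrived this frame */
} Process;

/* Deferred SIGNAL: record the request, leave the real state untouched. */
void signal_deferred(Process *p, ProcState new_state)
{
    assert(p != NULL);
    p->pending = new_state;
    p->has_pending = 1;
}

/* Called once at the end of each frame: pending states become real. */
void apply_pending_signals(Process *procs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (procs[i].has_pending) {
            procs[i].state = procs[i].pending;
            procs[i].has_pending = 0;
        }
    }
}
```

    With this arrangement every signal takes effect at the same point in the frame, regardless of the order in which processes run.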

    Worse still, compatibility would be lost with existing programs that depend on the current behavior of signals.

    The best solution is to create two versions of the interpreter core (it will probably be necessary to create more than one anyway, for example to enable step-by-step execution or breakpoints). One of them keeps the present mechanism (although the stack check will no longer be valid if subroutines are implemented, since these can create processes), and in the other, somewhat faster alternative, signals would work as in DIV. A global variable could decide which version to execute. It would not be necessary to duplicate the step-by-step versions of the interpreter, since performance is not so important in them and they could check that global variable to emulate both behaviors.

    Main execution and drawing loop

    At the moment the main loop of Fenix creates at least two dynamic lists that are sorted every frame with qsort: a list of processes to execute, and a list of objects to draw.

    This is not optimal, since in many cases these lists are long and fairly constant (with little or no variation from one frame to the next).

    Some performance improvement, especially in programs with many simple processes, would be obtained if these lists were kept created and sorted at all times (inserting new elements in the correct position, for example). The main difficulty lies in detecting changes in the processes (for example, the PRIORITY and Z variables can be changed by remote processes that access them through pointers or the dot operator [.]).
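    A minimal sketch of a list that stays sorted between frames is shown below. The ProcList type, the fixed capacity and the descending-priority order are assumptions made for the example; detecting that a priority changed is assumed to happen elsewhere and then reported through proclist_update.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_PROCS = 64 };  /* arbitrary capacity for the sketch */

typedef struct { int id; int priority; } ProcEntry;

typedef struct {
    ProcEntry items[MAX_PROCS];  /* kept sorted, highest priority first */
    size_t    count;
} ProcList;

/* Insert at the correct position using a binary search, so the list
   never needs a full qsort. Equal priorities keep insertion order. */
void proclist_insert(ProcList *list, ProcEntry e)
{
    size_t lo = 0, hi = list->count;
    assert(list->count < MAX_PROCS);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list->items[mid].priority >= e.priority)
            lo = mid + 1;
        else
            hi = mid;
    }
    memmove(&list->items[lo + 1], &list->items[lo],
            (list->count - lo) * sizeof(ProcEntry));
    list->items[lo] = e;
    list->count++;
}

/* Re-position one process after its priority changed: remove it and
   insert it again. Returns 0 if the id was not found. */
int proclist_update(ProcList *list, int id, int new_priority)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->items[i].id == id) {
            ProcEntry e = list->items[i];
            memmove(&list->items[i], &list->items[i + 1],
                    (list->count - i - 1) * sizeof(ProcEntry));
            list->count--;
            e.priority = new_priority;
            proclist_insert(list, e);
            return 1;
        }
    }
    return 0;
}
```

    Each insertion costs a binary search plus one block move, which is cheap when only a few processes are created or re-prioritised per frame.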

    The drawing method currently in use, which redraws the whole frame, is not optimal either. In many cases (especially hi-res games with a static background) it would be better to build a list of rectangular "hot" zones and limit the drawing of the frame to those zones (by reviewing the processes that have been created, moved, or altered in shape or graphic). This type of implementation has the problem that the resulting rectangles must be merged. For example, a large number of "particle" processes can create 15 or 20 rectangles on screen that overlap in a non-trivial configuration. In an extreme case, this can be slower than drawing the whole screen. It would be better to leave the choice in the hands of the user, supporting the DUMP_TYPE and RESTORE_TYPE variables of the original DIV. A game with a large number of static processes (for example an isometric game in the style of "Alien 8", where each cube on screen forming the scenery is a process) is not practical right now, but could be if not everything were redrawn each frame.

    [ The file that outlines the present operation of the interpreter contains a possible implementation of this system. ]

    Another alternative is to divide the screen into a grid and mark each grid cell that a process "steps on" to be redrawn. With a suitable number of cells this may fit certain cases better than the rectangle solution above, at the cost of potentially drawing a somewhat larger area of the screen.
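    The grid approach can be sketched as follows. The 640x480 screen and 32-pixel cells are arbitrary choices for the example, and DirtyGrid and its functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Assumed screen and cell sizes for this sketch: a 20 x 15 grid. */
enum { SCREEN_W = 640, SCREEN_H = 480, CELL = 32,
       GRID_W = SCREEN_W / CELL, GRID_H = SCREEN_H / CELL };

typedef struct { unsigned char dirty[GRID_H][GRID_W]; } DirtyGrid;

/* Start of frame: nothing needs redrawing yet. */
void grid_clear(DirtyGrid *g)
{
    memset(g->dirty, 0, sizeof g->dirty);
}

/* Mark every cell touched by the rectangle [x, x+w) x [y, y+h),
   clipped to the screen. */
void grid_mark(DirtyGrid *g, int x, int y, int w, int h)
{
    int x0 = x, y0 = y, x1 = x + w, y1 = y + h;
    assert(w >= 0 && h >= 0);
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > SCREEN_W) x1 = SCREEN_W;
    if (y1 > SCREEN_H) y1 = SCREEN_H;
    if (x0 >= x1 || y0 >= y1) return;  /* entirely off screen or empty */
    for (int cy = y0 / CELL; cy <= (y1 - 1) / CELL; cy++)
        for (int cx = x0 / CELL; cx <= (x1 - 1) / CELL; cx++)
            g->dirty[cy][cx] = 1;
}

/* Number of cells that must be redrawn this frame. */
int grid_count(const DirtyGrid *g)
{
    int n = 0;
    for (int cy = 0; cy < GRID_H; cy++)
        for (int cx = 0; cx < GRID_W; cx++)
            n += g->dirty[cy][cx];
    return n;
}
```

    Unlike the rectangle list, overlapping processes need no merging step: marking the same cell twice is harmless.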

    "Parallax" scroll

    Right now there is no dedicated routine for drawing scrolls; the layers are simply drawn one on top of the other. This is slow, especially in hi-res.

    A first step is to write a "parallax" drawing routine that combines two or more layers, drawing the lower layers through the holes in the upper layers. This routine can benefit from RLE compression of the graphic that forms the upper layer. The present structures, designed for only two layers per scroll, would have to be changed. The result might not be compatible with existing games.
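    A naive per-pixel version of such a routine, for one screen row, is sketched below purely to illustrate the idea. The function name, the layer representation and the wrap-around offsets are assumptions; a real version would likely use RLE spans on the upper layers instead of testing each pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Compose one row from several layers. layers[0] is the topmost layer,
   color 0 is transparent, and each layer has its own horizontal scroll
   offset and wraps around at layer_width. */
void parallax_row(const unsigned char *const *layers, const size_t *offsets,
                  size_t n_layers, size_t layer_width,
                  unsigned char *dest, size_t width)
{
    assert(layer_width > 0);
    for (size_t x = 0; x < width; x++) {
        unsigned char c = 0;
        /* Stop at the first (topmost) layer with an opaque pixel. */
        for (size_t l = 0; l < n_layers && c == 0; l++)
            c = layers[l][(x + offsets[l]) % layer_width];
        dest[x] = c;  /* each destination pixel is written exactly once */
    }
}
```

    Compared with drawing each layer in full on top of the previous one, the destination is written once per pixel, which matters most when the destination is slow video memory.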

    A disadvantage is that it may be interesting to use particular routines to draw scroll planes, such as the routine discussed above that draws a graphic in blocks taken from other graphics. To support this case, the scroll routine could keep a cache of the visible zone of each plane (slightly larger than the visible screen area) and update it by calling the drawing routine as the screen moves. This would also make it possible to enable simple flags on planes, such as horizontal mirroring, without implementing a special case of the parallax routine for them, at the cost of increased memory use (fairly high: at 640x480, a full-screen 16-bit plane is 600K of RAM).

    A classic problem of DIV games with scroll is the lack of upper scroll planes (or rather, the inability to place processes between scroll planes). With only two planes this is not especially important, and the user resorts to processes to simulate an upper scroll plane, but if more planes are allowed it would be better to support this feature. Fenix now allows scroll layers to be stacked, but except at low resolution this is not an optimal solution. If parallax with more than two planes is implemented, it would be interesting to be able to draw processes between one plane and the next. This problem has no simple solution. The parallax routine could draw the processes on top of the scroll once it has been drawn completely, and then redraw the layers that lie above them (restricted to the area they occupy on screen, a technique similar to a rectangle-based RESTORE_TYPE). This would be faster than drawing the planes in multiple passes, but quite a few pixels would still be drawn several times. A better but more complex solution is to study the on-screen layout of all the processes and partition the screen into rectangular zones, similar to the way GUI.DLL draws windows without painting the same pixel twice. In rectangles that contain no processes, draw normally; where there is a process, draw the lower planes, followed by the processes, followed by the upper planes.

    To complicate matters, it remains interesting to be able to apply transparency or effects between scroll planes. For this kind of effect, however, there is no alternative but to draw them after the lower scroll has been drawn. The simplest solution is to keep allowing stacked scroll planes as at present, so that a user who wants, for example, a strip of moving clouds can stack a scroll over a reduced region of the screen.

    Last updated: Mon, 30 May 2005 - 10:01:13

    Manual © 2005 By Gary Moncrieff(Dazzy), Notes belong to their respective posters