Wednesday, May 2, 2012

60 FPS Optim: 16bit Textures and Mipmapping

My game now runs at 60 FPS on the iPod! This is mainly thanks to 16-bit textures and mipmapping.

Previously, the game ran quite slowly on the iPod touch 2G: 24 FPS while rendering 1028 triangles with 7 different material groups. My geometry is still far from optimized (my meshes are too high-res for the iPod), so only the textures are optimized for now.

16-bit textures are a must on the iPod: not only do they use half the video memory of 32-bit textures and get handled very quickly by the GPU, but the quality loss is barely noticeable. However, I had to convert my 32-bit PNG data to 16 bits. The conversion is currently done at load time, but I will soon move it to build time, as part of a resource processing tool. Here is how to convert the PNG data returned by libPNG into 16-bit data for use with glTexImage2D (RGBA8888 becomes RGBA4444, and RGB888 becomes RGB565):

// pngData holds the rows returned by libPNG, row_bytes bytes per row
u16* pData = snew u16[width*height];
switch (info_ptr->color_type)
{
    case PNG_COLOR_TYPE_RGBA: // convert RGBA8888 to RGBA4444
    {
        channels = 4;
        for (s32 i=0; i<height; ++i)
        {
            for (s32 j=0; j<row_bytes; j += channels)
            {
                u8 r = pngData[i*row_bytes + j + 0];
                u8 g = pngData[i*row_bytes + j + 1];
                u8 b = pngData[i*row_bytes + j + 2];
                u8 a = pngData[i*row_bytes + j + 3];
                // scale each channel from 0..255 down to 0..15 (truncating)
                r = u8((f32(r)*0xf)/0xff)&0xf;
                g = u8((f32(g)*0xf)/0xff)&0xf;
                b = u8((f32(b)*0xf)/0xff)&0xf;
                a = u8((f32(a)*0xf)/0xff)&0xf;
                s32 column = j/channels;
                s32 index = i*width + column;
                // RRRRGGGGBBBBAAAA, the layout expected by GL_UNSIGNED_SHORT_4_4_4_4
                pData[index] = u16((r<<12) | (g<<8) | (b<<4) | a);
            }
        }
    }
    break;

    case PNG_COLOR_TYPE_RGB: // convert RGB888 to RGB565
    {
        channels = 3;
        for (s32 i=0; i<height; ++i)
        {
            for (s32 j=0; j<row_bytes; j += channels)
            {
                u8 r = pngData[i*row_bytes + j + 0];
                u8 g = pngData[i*row_bytes + j + 1];
                u8 b = pngData[i*row_bytes + j + 2];
                // 5 bits for red and blue, 6 bits for green (truncating)
                r = u8((f32(r)*0x1f)/0xff)&0x1f;
                g = u8((f32(g)*0x3f)/0xff)&0x3f;
                b = u8((f32(b)*0x1f)/0xff)&0x1f;
                s32 column = j/channels;
                s32 index = i*width + column;
                // RRRRRGGGGGGBBBBB, the layout expected by GL_UNSIGNED_SHORT_5_6_5
                pData[index] = u16((r<<11) | (g<<5) | b);
            }
        }
    }
    break;

    default: SHOOT_ASSERT(false, "Unsupported PNG format");
}
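The loops above truncate when scaling down: for example, a channel value of 128 becomes 7 in 4 bits, while rounding to nearest gives 8. Here is a minimal per-pixel alternative that rounds instead, sketched with standard <cstdint> types in place of the engine's u8/u16. QuantizeChannel, PackRGBA4444 and PackRGB565 are hypothetical names, not part of Shoot:

```cpp
#include <cassert>
#include <cstdint>

// Scales an 8-bit channel (0..255) down to 'bits' bits, rounding to nearest
// instead of truncating.
inline uint16_t QuantizeChannel(uint8_t value, unsigned bits)
{
    const unsigned maxOut = (1u << bits) - 1u;   // 15 for 4 bits, 31 for 5, 63 for 6
    return static_cast<uint16_t>((value * maxOut + 127u) / 255u);
}

// Packs one RGBA8888 pixel as RRRRGGGGBBBBAAAA (GL_UNSIGNED_SHORT_4_4_4_4 layout).
inline uint16_t PackRGBA4444(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return static_cast<uint16_t>((QuantizeChannel(r, 4) << 12) | (QuantizeChannel(g, 4) << 8) |
                                 (QuantizeChannel(b, 4) << 4) | QuantizeChannel(a, 4));
}

// Packs one RGB888 pixel as RRRRRGGGGGGBBBBB (GL_UNSIGNED_SHORT_5_6_5 layout).
inline uint16_t PackRGB565(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint16_t>((QuantizeChannel(r, 5) << 11) | (QuantizeChannel(g, 6) << 5) |
                                 QuantizeChannel(b, 5));
}
```

These could replace the inline scaling and shifts in the conversion loops; the integer arithmetic also avoids the float conversions.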


Finally, here is how to pass the 16-bit texture data to OpenGL, with mipmaps enabled:


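glEnable(GL_TEXTURE_2D);
glGenTextures(1, &m_GLTextureID);
glBindTexture(GL_TEXTURE_2D, m_GLTextureID);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// bilinear filtering within the nearest mipmap level
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
// OpenGL ES 1.1: let the driver build the mip chain when level 0 is uploaded,
// so this must be set before calling glTexImage2D
glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
// 16-bit rows are only guaranteed to be 2-byte aligned (the default unpack alignment is 4)
glPixelStorei(GL_UNPACK_ALIGNMENT, 2);

switch(m_eFormat)
{
    case TF_RGB: // 16-bit RGB565 data
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, s32(m_vSize.X), s32(m_vSize.Y), 0, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, m_pData);
        break;

    case TF_RGBA: // 16-bit RGBA4444 data
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, s32(m_vSize.X), s32(m_vSize.Y), 0, GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, m_pData);
        break;
}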
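To put numbers on the memory savings: halving the bytes per texel halves the whole mipmap chain, and a full chain adds roughly one third on top of the base level. A small hypothetical helper (not part of Shoot) that sums the size of every level:

```cpp
#include <cassert>
#include <cstddef>

// Returns the bytes needed for a texture plus its full mipmap chain,
// halving each dimension (clamped at 1) until the 1x1 level is reached.
inline std::size_t MipChainBytes(std::size_t width, std::size_t height, std::size_t bytesPerTexel)
{
    std::size_t total = 0;
    for (;;)
    {
        total += width * height * bytesPerTexel;
        if (width == 1 && height == 1)
            break;
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For a 256x256 texture this gives 174,762 bytes at 16 bits per texel versus 349,524 bytes at 32 bits.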


On the Windows port, the quality loss of 16-bit textures is noticeable, so I will probably make them exclusive to the iPod. Here is how 16-bit textures with mipmapping look in the editor:



Friday, April 27, 2012

Virtual File System / zlib integration

I'm currently trying to optimize resource loading before starting a full level design swing on my 3D shooter.

The main problem is that disk access is slow. This is true for virtually all platforms, including the ones I care about: iPhone and Windows. Several factors contribute: disk storage is typically abundant but slow, and it can be shared by several processes, which makes it a costly resource to access.

Data packing and partitioning


So to optimize resource loading, I made sure the disk is accessed in a managed and efficient way. First, the game data is now packed into a single "data.fs" file instead of a free-for-all of loose files. Second, this data file is organized into "partitions". The main idea is to pre-load a set of files into RAM and access them later very quickly.

But since RAM is itself a scarce resource, partitions must be carefully managed: loaded when needed and disposed of when not. In particular, partitions should not be so big that they bloat the RAM and take ages to load. Also, the first partition to be loaded must be small so that the game boots quickly, which gives an impression of speed. Here is a quick description of my partitions:
  • Boot Partition: data needed to boot the game (fonts, main menu graphics, etc.); loaded in the foreground thread as soon as the game starts.
  • Common Partition: data common to all game levels (player data, HUD data, etc.); loaded in the background as soon as the game starts.
  • Partition1: data for Level1; loaded in the background when Level1 is requested.
  • Partition2: data for Level2; loaded in the background when Level2 is requested.
  • Partition3: data for Level3; loaded in the background when Level3 is requested.
  • etc.
To ease development, the data files are accessed directly in Debug mode (the data.fs pack is not used). However, the data folder must be organized into sub-folders, one per partition.

In Release mode, data.fs is used, so I have a build step that prepares it: a tool reads the structure of the data folder and outputs the corresponding pack in a binary format. The pack starts with a metadata header describing the files: the offset of each partition in the pack, and the offset of each file relative to its partition. The content of each partition is appended after the header.
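The exact binary layout of data.fs isn't given here, so the following is only a sketch of one possible layout matching the description: a header listing each partition's offset and each file's offset relative to its partition, followed by the partition contents. PackFile, PackPartition, BuildPack, ParseHeader and the little-endian u32 encoding are all assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical build-tool input: one PackPartition per data sub-folder.
struct PackFile { std::string path; std::vector<uint8_t> content; };
struct PackPartition { std::vector<PackFile> files; };

// Records recovered when parsing the header.
struct PartitionEntry { uint32_t offset; uint32_t size; };   // offset relative to the data section
struct FileEntry { uint32_t partition; uint32_t offsetInPartition; uint32_t size; };
struct PackHeader
{
    std::size_t dataStart = 0;                 // absolute offset of the first partition
    std::vector<PartitionEntry> partitions;
    std::map<std::string, FileEntry> files;    // file path -> location
};

static void WriteU32(std::vector<uint8_t>& out, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));   // little-endian
}

static uint32_t ReadU32(const std::vector<uint8_t>& in, std::size_t& pos)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Layout: [u32 headerSize][header][partition 0 data][partition 1 data]...
// Header: [u32 partitionCount], then per partition [u32 offset][u32 size][u32 fileCount],
// then per file [u32 pathLength][path bytes][u32 offsetInPartition][u32 size].
std::vector<uint8_t> BuildPack(const std::vector<PackPartition>& partitions)
{
    std::vector<uint8_t> header, body;
    WriteU32(header, static_cast<uint32_t>(partitions.size()));
    for (const PackPartition& partition : partitions)
    {
        const uint32_t start = static_cast<uint32_t>(body.size());
        std::vector<uint8_t> fileTable;
        for (const PackFile& file : partition.files)
        {
            WriteU32(fileTable, static_cast<uint32_t>(file.path.size()));
            fileTable.insert(fileTable.end(), file.path.begin(), file.path.end());
            WriteU32(fileTable, static_cast<uint32_t>(body.size()) - start);   // offset within partition
            WriteU32(fileTable, static_cast<uint32_t>(file.content.size()));
            body.insert(body.end(), file.content.begin(), file.content.end());
        }
        WriteU32(header, start);                                        // partition offset
        WriteU32(header, static_cast<uint32_t>(body.size()) - start);   // partition size
        WriteU32(header, static_cast<uint32_t>(partition.files.size()));
        header.insert(header.end(), fileTable.begin(), fileTable.end());
    }
    std::vector<uint8_t> pack;
    WriteU32(pack, static_cast<uint32_t>(header.size()));
    pack.insert(pack.end(), header.begin(), header.end());
    pack.insert(pack.end(), body.begin(), body.end());
    return pack;
}

// Reads only the header; partition data can then be loaded on demand.
PackHeader ParseHeader(const std::vector<uint8_t>& pack)
{
    PackHeader h;
    std::size_t pos = 0;
    h.dataStart = 4 + ReadU32(pack, pos);
    const uint32_t partitionCount = ReadU32(pack, pos);
    for (uint32_t p = 0; p < partitionCount; ++p)
    {
        PartitionEntry part;
        part.offset = ReadU32(pack, pos);
        part.size = ReadU32(pack, pos);
        const uint32_t fileCount = ReadU32(pack, pos);
        for (uint32_t f = 0; f < fileCount; ++f)
        {
            const uint32_t length = ReadU32(pack, pos);
            std::string path(pack.begin() + pos, pack.begin() + pos + length);
            pos += length;
            FileEntry entry;
            entry.partition = p;
            entry.offsetInPartition = ReadU32(pack, pos);
            entry.size = ReadU32(pack, pos);
            h.files[path] = entry;
        }
        h.partitions.push_back(part);
    }
    return h;
}
```

Storing file offsets relative to their partition means a partition can be loaded into its own RAM buffer and its files addressed without knowing where the partition sits in the pack.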


Reading files from memory


The last step is to actually use this file system :) In Release mode, every time the engine requests file data, the request is transparently redirected to the file data in RAM (part of a previously loaded partition) instead of being read from disk. A file map, loaded at startup from the data.fs header, maps each file path to its location in RAM.

This integration is completely transparent to engine code: when libpng or tinyxml calls shoot::File::Read (Shoot's equivalent of fread), it is served the corresponding data from RAM very quickly.
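The File class itself isn't shown, so here is a hypothetical stand-in for the Release-mode path: once the file map has located a file inside a loaded partition, reading it is just a cursor over bytes already in RAM. MemoryFile and its methods are illustrative names; Read() follows fread-style semantics (copy up to the requested size, return the count actually copied):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical read-only view over a file that lives inside a loaded partition.
class MemoryFile
{
public:
    MemoryFile(const uint8_t* data, std::size_t size) : m_pData(data), m_Size(size), m_Pos(0) {}

    // Copies up to 'size' bytes into 'dest', advances the cursor,
    // and returns the number of bytes copied (0 at end of file).
    std::size_t Read(void* dest, std::size_t size)
    {
        const std::size_t count = std::min(size, m_Size - m_Pos);
        std::memcpy(dest, m_pData + m_Pos, count);
        m_Pos += count;
        return count;
    }

    bool Seek(std::size_t pos)
    {
        if (pos > m_Size)
            return false;
        m_Pos = pos;
        return true;
    }

    std::size_t Tell() const { return m_Pos; }
    bool EndOfFile() const { return m_Pos == m_Size; }

private:
    const uint8_t* m_pData;   // points into the partition buffer; not owned
    std::size_t m_Size;
    std::size_t m_Pos;
};
```

No copy of the file is made: the partition buffer owns the bytes, and each open file only carries a pointer, a size and a cursor.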

Compression using zlib


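Since I have a lot of data in XML format (levels, resource descriptors, entity templates, etc.), I decided to optimize further by compressing each partition with zlib. zlib was already in the codebase as part of the libpng integration, so I just had to use it. zlib is very easy to use; here is a quick example:

// compressBound() returns the worst-case compressed size for the input size
uLongf compressedDataSize = compressBound(originalDataSize);
Bytef* dataCompressed = snew Bytef[compressedDataSize];

// on return, compressedDataSize holds the actual compressed size (returns Z_OK on success)
compress(dataCompressed, &compressedDataSize, data, originalDataSize);

// destLen must hold the capacity of the destination buffer before the call (returns Z_OK on success)
uLongf uncompressedDataSize = originalDataSize;
uncompress(data, &uncompressedDataSize, dataCompressed, compressedDataSize);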

Monday, April 23, 2012

libPNG integration

I decided to integrate libPNG while I was tracking down a memory leak that seemed to originate from the SOIL image library. It later turned out that the leak was caused by my engine code, not by SOIL, but integrating libPNG was still the right thing to do. Not only did the integration go very smoothly, it also worked on iPhone on the first try.

The main advantage of libPNG is that, unlike SOIL, it is designed to let you customize the memory allocation and file I/O functions. I did not need to modify a single line of libPNG code to make it use my custom allocator and File class. Here is my full PNGLoader class, which returns the texture data needed by glTexImage2D. It is largely inspired by the PNG loading code from Morten Nobel's blog.


/* 

Amine Rehioui
Created: April 22nd 2012

*/

#include "Precompiled.h"

#include "PNGLoader.h"

#include "File.h"

#include "png.h"
#include "pnginfo.h"

namespace shoot
{
    //! PNG malloc
    png_voidp PNGMalloc(png_structp png_ptr, png_size_t size)
    {
        return snew u8[size];
    }

    //! PNG free
    void PNGFree(png_structp png_ptr, png_voidp ptr)
    {
        delete[] (u8*)ptr;
    }

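    //! PNG error
    //! note: libPNG's documentation requires a custom error handler not to return
    //! (normally it longjmps through png_jmpbuf); this one only asserts
    void PNGError(png_structp png_ptr, png_const_charp msg)
    {
        SHOOT_ASSERT(false, msg);
    }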

    //! PNG warning
    void PNGWarning(png_structp png_ptr, png_const_charp msg)
    {
        SHOOT_WARNING(false, msg);
    }

    //! PNG read
    void PNGRead(png_structp png_ptr, png_bytep data, png_size_t length)
    {
         File* pFile = (File*)png_get_io_ptr(png_ptr);
         pFile->Read(data, length);
    }

    //! loads a texture
    u8* PNGLoader::Load(const char* strPath, s32& width, s32& height, s32& channels)
    {
        File* pFile = File::Create(strPath);
        pFile->Open(File::M_ReadBinary);

        // route libPNG's allocations and error reporting through the engine
        png_structp png_ptr = png_create_read_struct_2(PNG_LIBPNG_VER_STRING,
                                                       /*error_ptr*/NULL,
                                                       PNGError,
                                                       PNGWarning,
                                                       /*mem_ptr*/NULL,
                                                       PNGMalloc,
                                                       PNGFree);
        SHOOT_ASSERT(png_ptr, "png_create_read_struct_2 failed");

        png_infop info_ptr = png_create_info_struct(png_ptr);
        SHOOT_ASSERT(info_ptr, "png_create_info_struct failed");

        // read through the engine's File class instead of stdio
        png_set_read_fn(png_ptr, pFile, PNGRead);

        // no signature bytes were consumed before handing the file to libPNG
        png_set_sig_bytes(png_ptr, 0);

        // 16-bit channels to 8 bits, unpack sub-byte pixels, expand palettes and tRNS
        png_read_png(png_ptr, info_ptr, PNG_TRANSFORM_STRIP_16 | PNG_TRANSFORM_PACKING | PNG_TRANSFORM_EXPAND, NULL);

        // accessor functions instead of reading png_info fields directly
        width = s32(png_get_image_width(png_ptr, info_ptr));
        height = s32(png_get_image_height(png_ptr, info_ptr));
        switch (png_get_color_type(png_ptr, info_ptr))
        {
        case PNG_COLOR_TYPE_RGBA:
            channels = 4;
            break;

        case PNG_COLOR_TYPE_RGB:
            channels = 3;
            break;

        default: SHOOT_ASSERT(false, "Unsupported PNG format");
        }

        // copy the rows allocated by libPNG into one contiguous buffer
        u32 row_bytes = u32(png_get_rowbytes(png_ptr, info_ptr));
        png_bytepp row_pointers = png_get_rows(png_ptr, info_ptr);

        u8* data = snew u8[row_bytes * height];
        for (s32 i=0; i<height; ++i)
        {
            memcpy(data+(row_bytes*i), row_pointers[i], row_bytes);
        }

        // also frees the row buffers allocated by png_read_png
        png_destroy_read_struct(&png_ptr, &info_ptr, NULL);
        pFile->Close();
        delete pFile;
        return data;
    }
}

The drawback is that libPNG obviously supports only PNG, unlike SOIL, which also handles JPG, BMP, and TGA. But the smooth integration and the fact that it worked on iPhone make it well worth it. Also, PNG covers all the needs of my 3D game for now.

Rendering Strategy

I recently optimized the rendering in Shoot and came up with something faster, yet with a very simple implementation.

My goal was to minimize graphics driver state changes. The main challenge was to organize the entities in the world so that the ones sharing the same material and the same geometry are rendered with a single "SetMaterial" and a single "SetVertexBuffer" call. That way, the only call made per entity is "SetTransform", which applies the entity's world transformation.

As a first step, I made sure Material and VertexBuffer instances are only created if no existing instance matches the request. For example, Materials can only be obtained through a "MaterialProvider", which queries the ObjectManager for a Material with the requested properties (texture, color, etc.). If the ObjectManager finds one, it is re-used; otherwise the MaterialProvider creates a new one. This way, entities with the same material always point to the same material instance. Roughly the same thing is done for vertex buffers, except they are matched in a slightly different way.
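A minimal sketch of the find-or-create idea, assuming a std::map keyed on the material properties in place of the ObjectManager query. MaterialDesc, MaterialProvider and the ID counter are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical material request: the properties that decide whether
// two entities can share one Material instance.
struct MaterialDesc
{
    std::string texture;
    unsigned color;   // packed RGBA
    bool operator<(const MaterialDesc& other) const
    {
        return std::tie(texture, color) < std::tie(other.texture, other.color);
    }
};

struct Material
{
    Material(const MaterialDesc& d, unsigned materialID) : desc(d), id(materialID) {}
    MaterialDesc desc;
    unsigned id;
};

// Returns the existing Material matching a description, or creates one.
class MaterialProvider
{
public:
    Material* GetMaterial(const MaterialDesc& desc)
    {
        auto it = m_Materials.find(desc);
        if (it != m_Materials.end())
            return it->second.get();   // re-use the shared instance
        auto material = std::make_unique<Material>(desc, m_NextID++);
        Material* raw = material.get();
        m_Materials.emplace(desc, std::move(material));
        return raw;
    }

    std::size_t Count() const { return m_Materials.size(); }

private:
    std::map<MaterialDesc, std::unique_ptr<Material>> m_Materials;
    unsigned m_NextID = 0;
};
```

Because equal requests return the same instance, its ID can serve directly as the RenderMap key in the rendering loop below.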

Second, I implemented the new rendering loop. Here is roughly how it works:
//! vertex info
struct VertexInfo
{
    VertexBuffer* pVertexBuffer;
    std::vector<Matrix44> aTransforms;
};

typedef std::map< u32, VertexInfo > VertexMap;

//! render info
struct RenderInfo
{
    Material* pMaterial;
    VertexMap m_VertexMap;
};

typedef std::map< u32, RenderInfo > RenderMap;

//! adds an entity to a render map
void EntityRenderer::AddToRenderMap(RenderMap& renderMap, RenderableEntity* pEntity)
{
    Material* pMaterial = pEntity->GetMaterial();
    VertexBuffer* pVertexBuffer = pEntity->GetVertexBuffer();

    // one RenderInfo per material, one VertexInfo per vertex buffer within it
    RenderInfo& renderInfo = renderMap[pMaterial->GetID()];
    renderInfo.pMaterial = pMaterial;

    VertexInfo& vertexInfo = renderInfo.m_VertexMap[pVertexBuffer->GetID()];
    vertexInfo.pVertexBuffer = pVertexBuffer;
    vertexInfo.aTransforms.push_back(pEntity->GetTransformationMatrix());
}

//! renders from a render map
void EntityRenderer::Render(RenderMap& renderMap)
{
    for(RenderMap::iterator it = renderMap.begin(); it != renderMap.end(); ++it)
    {
        Material* pMaterial = (*it).second.pMaterial;
        pMaterial->Begin();

        for(VertexMap::iterator it2 = (*it).second.m_VertexMap.begin();
            it2 != (*it).second.m_VertexMap.end();
            ++it2)
        {
            VertexInfo& vertexInfo = (*it2).second;
            vertexInfo.pVertexBuffer->Begin();

            std::vector<Matrix44>& aTransforms = vertexInfo.aTransforms;
            for(u32 i=0; i<aTransforms.size(); ++i)
            {
                GraphicsDriver::Instance()->SetTransform(GraphicsDriver::TS_World, aTransforms[i]);
                vertexInfo.pVertexBuffer->Draw();
            }
            vertexInfo.pVertexBuffer->End();
        }

        pMaterial->End();
    }
}

//! renders the entities (pseudo code)
void EntityRenderer::Render()
{
    // ... set up the 3D view

    Render(m_SkyBoxMap);

    // opaque geometry first, then transparent geometry so it blends over what is behind it
    Render(m_Solid3DMap);

    Render(m_Transparent3DMap);

    // ... set up the 2D view

    Render(m_Solid2DMap);

    Render(m_Transparent2DMap);
}
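To see what the grouping buys, here is a self-contained sketch that counts state changes instead of issuing real driver calls. DrawRequest, DriverStats and both functions are hypothetical; integer IDs stand in for Material and VertexBuffer pointers, and an int stands in for the world matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-entity draw request.
struct DrawRequest { unsigned materialID; unsigned vertexBufferID; int transform; };

// Counters standing in for Material::Begin, VertexBuffer::Begin and Draw calls.
struct DriverStats { std::size_t materialBinds = 0, bufferBinds = 0, draws = 0; };

// Groups requests as material -> vertex buffer -> transforms, like RenderMap.
DriverStats RenderBatched(const std::vector<DrawRequest>& requests)
{
    std::map<unsigned, std::map<unsigned, std::vector<int>>> renderMap;
    for (const DrawRequest& r : requests)
        renderMap[r.materialID][r.vertexBufferID].push_back(r.transform);

    DriverStats stats;
    for (const auto& material : renderMap)
    {
        ++stats.materialBinds;                     // pMaterial->Begin()
        for (const auto& buffer : material.second)
        {
            ++stats.bufferBinds;                   // pVertexBuffer->Begin()
            stats.draws += buffer.second.size();   // SetTransform + Draw per entity
        }
    }
    return stats;
}

// Draws in submission order, re-binding whenever the material or buffer changes.
DriverStats RenderUnsorted(const std::vector<DrawRequest>& requests)
{
    DriverStats stats;
    const DrawRequest* previous = nullptr;
    for (const DrawRequest& r : requests)
    {
        if (!previous || r.materialID != previous->materialID)
            ++stats.materialBinds;
        if (!previous || r.vertexBufferID != previous->vertexBufferID)
            ++stats.bufferBinds;
        ++stats.draws;
        previous = &r;
    }
    return stats;
}
```

With five entities alternating between two materials, drawing in submission order costs five material binds, while the grouped version needs only two.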

Monday, April 9, 2012

Shoot Engine Demo Video

I made a video to show the current progress of the Shoot Engine. It shows the following features: 2D Scene Graph, 3D Scene Graph, Skyboxes, Particles, Curves, Visitors, Billboards, and a small game-play sequence from my upcoming space shooter. Enjoy!



Sunday, October 17, 2010

XBOX 360 Port kicking!

Screen shot of the Shoot Editor running on Windows with OpenGL

Since creating the iPhone port, I have added more features to the engine, such as a particle system and skyboxes, and then decided to start an XBOX 360 port. Here are the steps I took:

- I bought an XBOX 360 Elite and an annual XNA Creators Club membership

- At first, I went with an approach that turned out to be a development "cul-de-sac". I had decided to use my C++ engine from C#/XNA, through the Common Language Runtime (CLR) support in Visual Studio. This looked good in theory, and it started well: my initial tests invoking C++ from C# and vice versa worked fine. On the C++ side, I used the CLR syntax to access XNA objects, in particular the GraphicsDevice object. A big part of the work was on resource loading, where I made the engine use XNA's ContentManager for loading resources instead of accessing the file system directly. I also made sure the projection and view matrices generated by the engine worked as expected in XNA. However, when I got to the stage of testing the rendering, I could not get a single polygon to render correctly. After checking everything, I gave up on this approach, concluding that there must be a security layer within XNA that prevents C++ code invoked through the CLR from accessing the graphics device. I still think so, but I am not sure at all, so if anyone knows more about this, please let me know.

- Then I resorted to a second approach: rewriting the engine in C#! This was a bit disappointing, especially since my goal was to keep a C++ core across all platforms, but unfortunately the XBOX C++ SDK is not accessible to indies. To offset the code duplication, I made it a top priority to keep the game level data exactly the same across all platforms. This will let me develop primarily on the Windows port using the Editor, generate the game levels, then load them with the C# version of the engine. Yes, I am going to have to port the gameplay code from C++ to C#, but that seems acceptable to me, as the two languages are very similar and I have already adopted practices that make porting as easy as possible.

- Writing the C# port, and working with XNA in general, went more smoothly than I expected, and I truly respect Microsoft for that. Since XNA already has a lot of utility classes and support for resources (meshes, textures, etc.), I only had to rewrite a thin skeleton of engine classes: mainly the entity system and the serialization system.

- I hammered through it smoothly until I started working on the 3D entities. Since I decided to use XNA's content manager to load 3D models, and it does not support the MilkShape 3D format used on the engine's other platforms, I had to find a 3D format that I could use on all platforms. After a short attempt at using Autodesk's FBX SDK, which seemed gigantic and over-engineered, I decided to use the .X format, since I found some code for loading .X files in OpenGL. The code was by Paul Coppens; it seemed a bit outdated, but it was short and came with samples. So now, thanks to the XBOX port, the Windows/OpenGL and iPhone ports support .X meshes too.

- On a final note, did you know that XNA uses a right-handed coordinate system? This caused me trouble: I assumed it would be left-handed like DirectX, and I spent some time dealing with a seemingly inverted Z axis, doing things such as scaling the view matrix by -1 along Z, changing the culling mode, etc. After a few failures, I finally got the 3D scene to look exactly the same on all platforms, and realized that XNA is in fact right-handed like OpenGL! A late reading of the XNA documentation confirmed this. Unfortunately, some things can only be learned the hard way.

iPhone Port kicking!

I created an iPhone port for the Shoot engine. Here are the steps I took:

- I bought a mac mini and joined the Apple developer program
- I first created a C++ static library project under XCode that contains all the engine code
- I implemented an OpenGL ES version of the graphics driver
- I had problems using SDL from within the static library, so I removed it completely and used a modified SOIL library for loading textures.
- I implemented a cross-platform File access class that is transparently used to read resources (XML, images, meshes, etc.) on both iPhone and Windows.
- I created an OpenGL ES application and invoked the engine from it

Generally speaking, things went smoothly, but not as smoothly as I would have liked. It was painful to get rid of SDL, on which I relied for loading textures, but on the other hand, integrating texture loading code straight into the engine was a great experience. Now Shoot has native support for the .bmp, .jpg, and .tga formats on both Windows and iPhone!

On a personal note, I do not much enjoy developing on a Mac. I find the environment cute but not very functional, and sometimes frustrating. But this is coming from a first-time Mac user.