Equinox, my little renderer, is slowly making progress. I had reached a point where I could render a simple shape (a sphere) that had its own transformation matrix. With a working renderer that supports a pinhole orthographic camera and a sphere, I figured this was the perfect time to try something I had never done before: multi-threaded programming! I will discuss the multithreading portion of Equinox in another post. For now, I would like to cover a nasty little side effect that popped out of nowhere (as usual) and bit me square in the ass!
Equinox was already rendering an array of spheres. Granted, the render times were higher than I would have liked, but at this stage that was to be expected: I usually concentrate on getting my code to work before I concentrate on optimization, which is a common practice in software development. While implementing the multi-threading, I saw that using 8 cores did not give the render any significant speedup. I opened the Activity Monitor window to track processor usage and, to my surprise, the eight worker threads would only pick up the first 8 buckets; afterward, only two or three threads would be performing calculations. No wonder I was not seeing much of a speedup. What strange behavior, I thought. But even more surprising was the memory usage.
As I watched the threads act all weird, I also saw the memory usage skyrocket to 1.6 GIGS! Something was severely wrong, as there is no way a single EXR image and 20 spheres would use so much memory. It seemed that I had just run into what is known as a memory leak. This is something that I had never really had to deal with in the past. All the scripting languages I had ever used handle dynamic memory allocation and deallocation for you: MaxScript, MEL, HScript, RSL, PHP and Python all do automatic memory management. Even in C++, I had relied on constructors and destructors, which take care of much of the memory management.
As you might have read, Equinox is being developed in C, which means the programmer (that would be me :D) is 100% in charge of all memory management. So I put on my "CSI" hat and I began to dive into the code, hoping to find where I was leaking memory. After a little bit of digging I was able to track the issue to this function:
void EiProcessBucket(EtBucket bucket, Rgba *px)
{
    extern EtWorld world;
    int x,y;
    for (int y = bucket.pos.y; y < bucket.pos.y + bucket.height; y++) {
        for (int x = bucket.pos.x; x < bucket.pos.x + bucket.width; x++) {
            EtCameraInput camIn;
            camIn.x = x;
            camIn.y = y;
            camIn.xres = world.film.xres;
            camIn.yres = world.film.yres;
            EtCameraOutput *camOut = EiCameraOutput;
            EtCameraMethods *cammtds = (EtCameraMethods*)world.cameras->mtds;
            cammtds->createRay(world.cameras, camIn, camOut);

            float r,g,b;
            int j;
            g = y / (float)camIn.yres;
            r = x / (float)camIn.xres;
            b = 1;
            for (j = 0; j < buf_len(world.shapes); j++) {
                EtNode *shape = &world.shapes[j];
                EtShapeMethods *mtds = (EtShapeMethods*)shape->mtds;
                if (mtds->intersectP(shape, camOut->I)) {
                    r = g = b = 1;
                }
            }
            // (y * camIn.xres) + x makes sure the pixel
            // values are stored in scan lines
            Rgba *p = &px[(y * camIn.xres) + x];
            p->r = r;
            p->g = g;
            p->b = b;
            p->a = 1;
        }
    }
}
I started by commenting out most of the lines in the inner loop. I left only this code inside the loop:
EtCameraInput camIn;
camIn.x = x;
camIn.y = y;
camIn.xres = world.film.xres;
camIn.yres = world.film.yres;
EtCameraOutput *camOut = EiCameraOutput;
EtCameraMethods *cammtds = (EtCameraMethods*)world.cameras->mtds;
cammtds->createRay(world.cameras, camIn, camOut);

float r,g,b;
int j;
g = y / (float)camIn.yres;
r = x / (float)camIn.xres;
b = 1;
I ran the code again and saw that the memory usage had dropped from 1.6 gigs to 100 MB. A huge improvement, but still not what it should be. As far as I could see, I was not manually allocating any memory on the heap and all my variables were on the stack. This seemed to tell me that I did not really have a memory leak, but was simply consuming way too much memory. I analyzed the code a little longer and spotted what I thought was the problem: for every pixel of the image, I was creating a new variable of type EtCameraInput and a new pointer to an EtCameraOutput. This looked bad, so I moved the declaration of those variables outside of the double "for" loop. This greatly reduced memory consumption; however, it still did not feel like the right answer to the problem.
The stack should deallocate "camIn", "camOut" and "cammtds" once they go out of scope, so these can't be the issue. I looked at the code again and found this line to be quite interesting:
EtCameraOutput *camOut = EiCameraOutput;
In an effort to write a framework that would allow me to write code faster, I had written this preprocessor macro:
#define EiCameraOutput (EtCameraOutput*)malloc(sizeof(EtCameraOutput))
So I was allocating memory on the heap and I was not deallocating it. Because the macro expanded inside the inner loop, a new EtCameraOutput was being allocated for every single pixel and never freed. Once again, I seem to conspire against myself by trying to be a little too smart. Maybe using such shorthand macros is not such a good idea. I rearranged the code as listed below, and the memory usage dropped to 3 MB.
void EiProcessBucket(EtBucket bucket, Rgba *px)
{
    extern EtWorld world;
    int x,y;
    float r,g,b;
    int j;
    Rgba *p;
    EtCameraOutput *camOut = EiCameraOutput;
    EtCameraMethods *cammtds = (EtCameraMethods*)world.cameras->mtds;
    for (int y = bucket.pos.y; y < bucket.pos.y + bucket.height; y++) {
        for (int x = bucket.pos.x; x < bucket.pos.x + bucket.width; x++) {
            EtCameraInput camIn;
            camIn.xres = world.film.xres;
            camIn.yres = world.film.yres;
            camIn.x = x;
            camIn.y = y;
            g = y / (float)camIn.yres;
            r = x / (float)camIn.xres;
            b = 1;
            ......
            ......
        }
    }
    free(camOut);
}
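Another way to look at this fix: if the camera output only needs to live for the duration of the bucket, it does not have to come from malloc at all. The sketch below is not Equinox code. CameraInput, CameraOutput, create_ray and render_origins are hypothetical stand-ins I made up for illustration, and the "camera" is a trivial orthographic one. It only shows the shape of the idea: the caller owns the output struct on the stack and passes its address down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EtCameraInput / EtCameraOutput. */
typedef struct { int x, y, xres, yres; } CameraInput;
typedef struct { float origin[3]; float dir[3]; } CameraOutput;

/* Hypothetical ray generator: writes into storage owned by the caller,
   so it never has to allocate anything itself. */
void create_ray(const CameraInput *in, CameraOutput *out)
{
    assert(in != NULL && out != NULL);
    out->origin[0] = in->x / (float)in->xres;
    out->origin[1] = in->y / (float)in->yres;
    out->origin[2] = 0.0f;
    out->dir[0] = 0.0f;
    out->dir[1] = 0.0f;
    out->dir[2] = -1.0f;
}

/* Fills px (xres * yres floats) with each ray's origin x, in scan-line order. */
void render_origins(int xres, int yres, float *px)
{
    CameraOutput out; /* automatic storage: reused for every pixel, gone on return */
    for (int y = 0; y < yres; y++) {
        for (int x = 0; x < xres; x++) {
            CameraInput in = { x, y, xres, yres };
            create_ray(&in, &out);
            px[y * xres + x] = out.origin[0];
        }
    }
}
```

With this layout there is nothing to free, so there is nothing to forget to free. Whether it fits the real createRay signature is a separate question, since that one takes a pointer that may be expected to outlive the call.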
Next, I uncommented the lower part of the loop, the part where the intersections are actually performed, and vroom, memory usage again launched to 1.5 gigs! There was a very serious memory leak in this block of code. After a good amount of digging, I found that the code responsible for the leak was the intersectP method of the sphere shape. Inside this function, I perform several operations to apply the transformation to the shape. Here is the code that handles the transformation:
EtPoint pos = EiNodeGetPnt(node,"center");
EtMatrix m = EiNodeGetMtrx(node,"matrix");
EtMatrix mi;
EiMatrixInvert(&m,&mi);
// create a transform
EtTransform xf = EiTransform(m);
// create a transform for the center
EtTransform xpoint = EiTranslate(pos);
// Multiply the object transform by the pos
EiTransformMult(&xf,&xpoint);
I commented out lines of code one by one and realized that the issue was in EiTransformMult. I opened the transform module and found 2 variables that were dynamically allocated and were not being deallocated at all. Here is the old code:
void EiTransformMult(EtTransform *aa, const EtTransform *bb)
{
    EtMatrix *mat = malloc(sizeof(EtMatrix));
    float *mm = (float*)mat;
    float m[16];
    EtTransform *tmp = malloc(sizeof(EtTransform));
    float *a = (float*)&aa->m;
    float *b = (float*)&bb->m;
    ......
    memcpy(&tmp->m, &m, sizeof(float) * 16);
    EtMatrix mi;
    EiMatrixInvert(&tmp->m, &mi);
    memcpy(&tmp->mInv, &mi, sizeof(float) * 16);
    memcpy(aa, tmp, sizeof(EtTransform));
}
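For comparison, the temporary in a function like this does not need to come from malloc. Here is a minimal sketch of an in-place 4x4 multiply whose scratch space lives on the stack. Mat4 and mat4_mul_inplace are hypothetical names, the row-major layout is my assumption, and the inverse that the real function also computes is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for EtMatrix: 16 floats, assumed row-major. */
typedef struct { float m[16]; } Mat4;

/* Computes a = a * b in place. The result is built in a stack temporary
   first, so the call is also safe when a and b point to the same matrix. */
void mat4_mul_inplace(Mat4 *a, const Mat4 *b)
{
    assert(a != NULL && b != NULL);
    float tmp[16]; /* automatic storage: released when the function returns */
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                sum += a->m[r * 4 + k] * b->m[k * 4 + c];
            }
            tmp[r * 4 + c] = sum;
        }
    }
    memcpy(a->m, tmp, sizeof tmp);
}
```

A function like this gets called for every ray and every shape, so skipping the allocator here also saves a malloc/free pair per call, not just the leak.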
First of all, one of the allocated pointers (*mat) is not even used anymore, so I deleted it. The other pointer, *tmp, is never deallocated. I added a free(tmp) at the end of the function and voilà! The renderer's memory usage now stays at a mere 3 MB per render and, as a side effect, using 8 threads now greatly improves the render times. A scene that takes 71 seconds to render on 1 thread takes about 20 seconds with 8 threads. Here is an image rendered with the latest version of Equinox.
So keep an eye open for those malloc()s while programming in C. Remember to always deallocate whatever you allocated, or evil leprechauns will spawn and consume as much memory as they can get their hands on. Oh, and like a good friend told me once, don't try to be too smart or overcomplicate things with C. It is already super simple, which makes it super powerful.
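If you want a cheap way to keep that eye open, one option is to route allocations through a pair of counting wrappers and check the count at the end of a render. This is only a sketch: counted_malloc, counted_free, live_allocations and assert_no_leaks are hypothetical helpers, not part of Equinox or of the C standard library, and the plain counter is not thread-safe, so a multi-threaded renderer would need an atomic counter or a lock around it.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks handed out by counted_malloc and not yet released.
   NOTE: plain counter, not thread-safe (assumption: single-threaded use). */
static long live_allocs = 0;

/* Hypothetical wrapper: behaves like malloc but counts successful allocations. */
void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        live_allocs++;
    }
    return p;
}

/* Hypothetical wrapper: behaves like free but decrements the live count. */
void counted_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Returns how many counted blocks are still outstanding. */
long live_allocations(void)
{
    return live_allocs;
}

/* Aborts (in builds with assertions enabled) if anything is still allocated. */
void assert_no_leaks(void)
{
    assert(live_allocs == 0);
}
```

Calling assert_no_leaks() after each bucket would have flagged both leaks described above on the very first bucket, long before the process reached 1.6 gigs.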