Dec 23, 2013

Snow Effect in Google+ Photo

I didn't know Google+'s Auto Awesome mode did snow effects too. It was a pleasant surprise to find out what it did to a picture I took recently. It would make a good x-mas card :)

Dec 16, 2013

CUDA Compile Error MSB3721 with Exit Code 1

So I finally decided to play with CUDA a bit. I love Visual Studio, so I created a CUDA project in VS2012 (the CUDA SDK doesn't support VS2013 yet) and tried to compile it. But uh-oh.. I got this error:

error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd  " -o Debug\ "C:\test\"" exited with code 1.

I love how the error message doesn't tell me anything about the error. Yay error code 1..

Anyways, it turns out that COMODO's auto-sandbox was sandboxing it. I added nvcc.exe and msbuild.exe to the sandbox exception list, but I was still getting the same error. Eh.. I wish I could have found a permanent solution for this, but I ran out of ideas.

For now, temporarily disabling the auto-sandbox mode in COMODO whenever I wanna play with CUDA will have to cut it. Why can't COMODO and NVIDIA work it out together?

Dec 10, 2013

How Much Canadian Programmers Make

I didn't know that the Canadian government publishes salary survey information for programmers until today. But here it is.. this shows how much we, Canadian programmers, make in each region.

This shows low, median, and high hourly wages, and the data is pretty up-to-date (for example, the BC stat is from 2011-2012). To calculate the annual salary out of this, simply use this formula:

hourly wage * 40 (hr/wk) * 52 (wk/yr) = annual salary

Alberta, the oil nation, comes in first, followed by BC and ON. If you are my neighbour living in BC, you should make:

  • low: $44,990
  • median: $79,997
  • high: $131,997
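To sanity-check the formula, here it is as code. The hourly rate in the example is my own back-calculation from the BC median above, so treat it as approximate:

```cpp
#include <cassert>
#include <cmath>

// annual salary = hourly wage * 40 hours/week * 52 weeks/year
double annualSalary(double hourlyWage)
{
    return hourlyWage * 40.0 * 52.0;
}
```

For instance, the BC median of about $38.46/hr works out to roughly the $79,997 listed above.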

So are you making enough? :P

Dec 2, 2013

Lessons Learned from Voxel Editing

I have been looking into voxel editing stuff on and off recently, and this is a short (or long) list of practical lessons I learned. All the factual information presented here is already available online; I'm just gathering and summarizing it here with my 2 cents.

1. Understand how isosurfaces work first
One of the most practical ways of manipulating voxel data is by assigning an iso value to each voxel. I have tried some other ways (and even a very original one I came up with), but most of them came up short. After all, reconstructing meshes from iso values is the most practical and memory-smart way. There are a lot of papers explaining isosurface extraction and whatnot, but none of them really explains what an isosurface is.

I found this tutorial. It's good and it will help you understand the concept of an isosurface.
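To make the concept concrete, here's a tiny sketch of my own (not from the tutorial): each voxel stores one scalar iso value, here the signed distance to a sphere, and the surface is wherever those values cross zero between neighbours. A mesher like marching cubes finds exactly those crossings.

```cpp
#include <array>
#include <cmath>

// An 8x8x8 grid of iso values: negative inside the sphere, positive
// outside. The isosurface is the zero crossing; no mesh is stored at all.
struct VoxelGrid
{
    static const int N = 8;
    std::array<float, N * N * N> iso{};

    void fillSphere(float cx, float cy, float cz, float r)
    {
        for (int z = 0; z < N; ++z)
            for (int y = 0; y < N; ++y)
                for (int x = 0; x < N; ++x)
                {
                    float dx = x - cx, dy = y - cy, dz = z - cz;
                    iso[(z * N + y) * N + x] =
                        std::sqrt(dx * dx + dy * dy + dz * dz) - r;
                }
    }

    float at(int x, int y, int z) const { return iso[(z * N + y) * N + x]; }
};
```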

Also the developer(s) of 3D Coat kindly shared how they store their voxel data. You will be surprised how much you can learn about voxel editing by looking at their data structure.

If your interest is not so much voxel editing, but procedurally generating large terrains with some help from temporary voxel data, read VoxelFarm Inc's blog. He's a genius.

2. You can "only" modify voxel data implicitly
Once you use iso values to represent your volume data, the only way you can modify your voxel data directly is by changing the iso values. The operations are very mathematical, like increasing/decreasing each iso value or averaging across neighbours, unless you stamp a new isosurface shape directly by evaluating a distance function (e.g., the functions you can find in the tutorial linked above).

There's no one-to-one relationship between your iso values and the final mesh shape, so you can't simply say, "I want this edge to extend about 0.2 meters, so let me change this voxel value to 0.752." In other words, you don't have fine control over the final shape. This is still true even if you increase your voxel density crazily high (e.g., a voxel per centimeter). So that's why I call it implicit editing.
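Here's what that implicit editing looks like as a toy sketch of mine: stamping a sphere into existing iso data doesn't move any vertices, it just combines distance values. min() acts as a CSG union (add material); max(a, -b) would be a subtraction. The mesh shape only emerges later when a mesher re-extracts the surface.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Stamp a sphere into an n*n*n iso grid as a union. The voxel keeps
// whichever value is "more inside" - that's the entire edit operation.
void stampSphereUnion(std::vector<float>& iso, int n,
                      float cx, float cy, float cz, float r)
{
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
            {
                float dx = x - cx, dy = y - cy, dz = z - cz;
                float d = std::sqrt(dx * dx + dy * dy + dz * dz) - r;
                float& v = iso[(z * n + y) * n + x];
                v = std::min(v, d); // union: add material implicitly
            }
}
```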

There are some newer mesh construction methods that can give you very sharp edges, but these are not really made for voxel editing, as explained in the next point.

3. Traditional marching cubes is sufficient
As I just said, there has been new research on how to generate meshes out of iso data. In particular, Cubical Marching Squares and Efficient Voxel Octrees were something I really enjoyed reading about and playing with.

CMS can give you a sharp edge IF you capture your data from polygons. It encodes where polygon edges intersect the cube faces. But good luck with editing this data in real-time (e.g., voxel editing like the C4 engine or 3D Coat). I just couldn't find any practical way to edit this data while preserving the edge intersection info correctly.

EVO is not about polygon generation. It's actually a ray-tracing voxel visualizer running on CUDA. It's fast enough to be real-time (very impressive! check out the demo!), and its genius way of storing voxel data as a surface (so, sorta 2D) instead of a volume deserves some recognition. But it's still the same problem here: I don't see any easy way to edit this data.

So basically, all these new methods are good for capturing polygons as voxel data and visualizing them without any alteration.

Sure, there can be some differences in the final meshes generated from traditional MC and CMS, but if your intention is editing voxel data instead of replicating the polygon data precisely, the difference is negligible. As long as there's WYSIWYG in your editing tool, either method is fine. But MC was about 30% faster in my test. (both unoptimized and implemented in a multi-threaded C# application)

4. Optimize your vertex buffer
It's very common to have 1,000~2,000 triangles for a mesh generated from 10 x 10 x 10 voxels. If your voxel density is high and your world is large, it can easily give you 15 million triangles. Sending this much vertex data to the GPU is surely a bottleneck. I tested on two different GPUs, one very powerful and the other okay-ish. Both were choking.

So at least do some basic vertex buffer optimization: share vertices between triangles and average normals from the neighbouring surfaces. You will see a 30~50% reduction in vertex buffer size.
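A rough sketch of what that vertex sharing might look like (my own illustration, using exact position matching as the merge criterion): each position is looked up in a map and its index reused, and face normals from every triangle touching a vertex are accumulated, which also smooths the shading.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vec3 { float x, y, z; };

// Builds an indexed mesh instead of emitting 3 unique vertices per
// triangle. Normals are accumulated here and would be normalized later.
struct MeshBuilder
{
    std::map<std::tuple<float, float, float>, uint32_t> lookup;
    std::vector<Vec3> positions, normals;
    std::vector<uint32_t> indices;

    uint32_t addVertex(const Vec3& p, const Vec3& faceNormal)
    {
        auto key = std::make_tuple(p.x, p.y, p.z);
        auto it = lookup.find(key);
        if (it == lookup.end())
        {
            it = lookup.emplace(key, (uint32_t)positions.size()).first;
            positions.push_back(p);
            normals.push_back({0, 0, 0});
        }
        Vec3& n = normals[it->second]; // accumulate; normalize afterwards
        n.x += faceNormal.x; n.y += faceNormal.y; n.z += faceNormal.z;
        indices.push_back(it->second);
        return it->second;
    }
};
```

A real implementation would probably quantize or hash positions instead of comparing raw floats, but the idea is the same.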

5. Use level-of-details
Simply sharing vertices wouldn't be enough. A 30~50% saving from 15 million is still 8~10 million triangles. Still not good enough! Now you need to generate LOD meshes. If you want to do it in voxel space (and you probably should), you simply sample every n-th voxel for LOD meshes (e.g., every 2nd for the 1st LOD, every 4th for the 2nd LOD, and so on).

If you use CMS, doing LODs is not that hard. This algorithm has intersection point information, so there will be no discontinuity between lower and higher LODs (if you have full octree data, that is). But if you use MC as I suggested, you will see gaps. I haven't tried to fill in the gaps yet, but if I do, I will probably try the Transvoxel algorithm first. Also, Don Williams had great success with this.

You can also generate lower LODs from your higher-LOD polygons instead of using voxel data directly. That's how Voxel Farm does it. This approach has a nice benefit: you can reproject higher-LOD meshes and generate normal maps for the lower-LOD mesh. But I'm not a big fan of mesh simplification, for the reasons I'm gonna explain below.

6. Mesh simplification might not satisfy you
Another way of optimizing your vertex data is mesh simplification. Voxel Farm is doing it and it looks good enough for them, but with my test case, it was not giving me satisfactory results. I just didn't like the fact that I didn't have much control over which triangles would be merged and which would not. It's all based on the error calculation method you use, but I just was not able to find a one-size-fits-all function for this.

Again, you might find it works fine for your use, but that was not the case for me.

K, that's all I can think of for now.. so happy voxeling... lol

Nov 25, 2013

Blendable RGBM

What is RGBM?
Using a fat buffer (16 bits per channel) is the most intuitive way to store an HDR render target, but it was too slow and used too much memory, so we decided to encode (pack) it into an 8-bit-per-channel buffer. There are different packing methods, but I believe the most widely used one was RGBM. (also, this link has a nice summary of LogLUV too, so give it a read if you are not familiar with this topic at all)

Packing.. Yum.... It's nice: it saves bandwidth and memory...

RGBM Limitations
But... YUCK we couldn't do one thing that artists absolutely love to see.... ALPHA BLENDING..

We simply can't do alpha blending on an RGBM buffer. In other words, you shouldn't have semi-transparent objects. Why? With RGBM, you need this formula to blend pixels properly.

    ( DestColour * DestMultiplier * InvSrcAlpha + SrcColor * SrcAlpha ) / SrcMultiplier

Let's look at some not-too-obvious parameters:
  • DestMultiplier: stored in DestAlpha channel, which is easily available to our blend unit
  • SrcMultiplier: alpha channel output from shader
  • SrcAlpha: the transparency value. But where do we store this? Usually in the alpha channel, but with RGBM, the alpha channel is used for the multiplier... eh... eh... oh yeah! there is something called "premultiplied alpha." By multiplying the alpha value into SrcColor in the shader, you can still get this working. yay?
  • InvSrcAlpha: but this is where we fail miserably. We don't have the transparency value anymore, and there's no easy-n-efficient way to premultiply InvSrcAlpha into DestColour....... yes.... we are doomed.

Existing Solution
So what did we do to solve this problem? Eh.. nothing... We simply bypassed it by introducing a separate off-screen buffer. We drew all semi-transparent objects onto this buffer with no HDR encoding, and merged the result onto the scene buffer later, after tone mapping was done on the HDR (opaque) buffer.

So basically you get HDR only on opaque objects, and all semi-transparent objects will be in LDR. But merging, or running an extra full-screen pass, was also slow, so we experimented with half-res or quarter-res approaches here to counterbalance the extra cost of keeping another render target.

Well, but it still uses more memory than simply using only an RGBM buffer.

Blendable RGBM
So here I present my hack: Blendable RGBM. Here is a short summary.
  • it blends transparent objects in a mathematically correct way
    • so it doesn't use any extra memory
  • transparent pixels reuse the multipliers from the opaque pixels, which are already in the render target
  • but it suffers from lower precision if there is a huge difference in range between the opaque and alpha pixels:
    • you might see a banding effect (when alpha objects are darkening), and
    • the alpha objects might not be able to brighten pixels beyond what the opaque multiplier allows
Okay, now some more details....

Making Math Work
Let's see the RGBM blending formula again:

    ( DestColour * DestMultiplier * InvSrcAlpha + SrcColor * SrcAlpha ) / SrcMultiplier

I said we were failing miserably because of InvSrcAlpha, and this was because we were filling the shader's alpha output with SrcMultiplier. So what are we going to do about it? Just throw away SrcMultiplier... LOL.... Instead, we will simply match SrcMultiplier to DestMultiplier. After this hackery hack, the formula becomes this:

    ( DestColour * DestMultiplier * InvSrcAlpha + SrcColor * SrcAlpha ) / DestMultiplier

Once simplified, it becomes like this:

    DestColour * InvSrcAlpha + SrcColor * SrcAlpha  / DestMultiplier

Alright, now the alpha channel is free, so we can output our SrcAlpha there.

Setting Up Correct Blending Render States
I love to mess with hardware blending operations to make new things. I've done this once before with Screen Space Decals. And I did it again here!

To make this magic work, we need to change render states like this:
  • SrcBlend: DestAlpha
  • DestBlend: InvSrcAlpha
After this change, the blending becomes very similar to the above formula with one exception: the multiplication by SrcAlpha is missing. Since we are already using DestAlpha as SrcBlend, we can't feed SrcAlpha to the blending unit. The solution? Premultiplied alpha again. :-)   Premultiply SrcAlpha into SrcColor in the shader.
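For reference, here's that setup expressed as a D3D11 blend description. This is just a config sketch, not tied to any engine; the shader is expected to output premultiplied color (SrcColor * SrcAlpha) with plain SrcAlpha in the alpha output.

```cpp
#include <d3d11.h>

// Blend states for the blendable-RGBM transparent pass (sketch).
D3D11_BLEND_DESC makeBlendableRgbmDesc()
{
    D3D11_BLEND_DESC desc = {};
    desc.RenderTarget[0].BlendEnable = TRUE;
    desc.RenderTarget[0].SrcBlend    = D3D11_BLEND_DEST_ALPHA;    // * DestMultiplier
    desc.RenderTarget[0].DestBlend   = D3D11_BLEND_INV_SRC_ALPHA; // * InvSrcAlpha
    desc.RenderTarget[0].BlendOp     = D3D11_BLEND_OP_ADD;
    // (keeping the multiplier stored in dest alpha intact is handled
    //  separately, by masking alpha out of the color write)
    return desc;
}
```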

So are we good? No.. not really. With our current blending state setup, we get this:

    DestColour * InvSrcAlpha + SrcColor * SrcAlpha * DestMultiplier

Doh! We are multiplying by DestMultiplier!! But we have to divide by it :(  To fix this, we will change the RGBM encoding and decoding a bit.

Changing RGBM Encoding/Decoding
This is essentially how "original" RGBM encoding works:
  • M: max(RGB)
  • encoding: RGB / M
  • decoding: RGB * M
  • alpha encoding: M / 6
To make it work with our new way, we simply invert M, so that we multiply while encoding and divide while decoding. It looks something like this:

  • M: 1 / clamp( length(RGB), 1, 6 )
  • encoding: RGB * M
  • decoding: RGB / M
  • alpha encoding: M
The reason why I used length(RGB) instead of max(RGB) was to give it some extra room which alpha objects can later use to brighten the pixels. How I get M is very hacky. Maybe I could replace the two 1s with a much lower value like 0.125 to allow better precision in LDR, but keeping the min range at 1 guarantees that the full LDR range is available for the alpha pass. I'm pretty sure someone can come up with a better way of calculating M here. :)
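In code form, the modified encode/decode looks something like this (a sketch of the scheme above; the [1, 6] clamp is the same hacky choice discussed, not a tuned constant):

```cpp
#include <algorithm>
#include <cmath>

struct Rgb { float r, g, b; };

// Inverted multiplier: M is a reciprocal, so encoding multiplies and
// decoding divides, which is what lets the blend unit apply it via
// DestAlpha instead of needing a divide.
float multiplierM(const Rgb& c)
{
    float len = std::sqrt(c.r * c.r + c.g * c.g + c.b * c.b);
    return 1.0f / std::min(std::max(len, 1.0f), 6.0f);
}

Rgb encode(const Rgb& c, float m) { return {c.r * m, c.g * m, c.b * m}; }
Rgb decode(const Rgb& c, float m) { return {c.r / m, c.g / m, c.b / m}; }
// the alpha channel simply stores m itself
```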

Finally New Blending Formula 
With all the changes up to here, our new blending formula looks like this:

    ( DestColour / DestMultiplier * InvSrcAlpha + SrcColor * SrcAlpha ) * DestMultiplier

Once simplified, it becomes this:

    DestColour * InvSrcAlpha + SrcColor * SrcAlpha * DestMultiplier
Oh, hello there! This is exactly what our blending render state setup gave us! Yay! It works!

Disabling Alpha Write
But wait.. if we output our transparency value in the alpha channel, the multiplier in the HDR buffer will be overwritten, right? We don't want this to happen. We need the alpha value only for blending, so we have to make sure this value is not written to the render target.

We can do this easily by masking alpha out of the color write. Another render state change will do the magic.
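In D3D11 terms, that's just the render target write mask (a config sketch; the rest of the blend description is omitted here):

```cpp
#include <d3d11.h>

// Write RGB but never alpha, so the opaque pass's multiplier survives.
D3D11_BLEND_DESC makeAlphaMaskedDesc()
{
    D3D11_BLEND_DESC desc = {};
    desc.RenderTarget[0].RenderTargetWriteMask =
        D3D11_COLOR_WRITE_ENABLE_RED |
        D3D11_COLOR_WRITE_ENABLE_GREEN |
        D3D11_COLOR_WRITE_ENABLE_BLUE; // alpha bit intentionally left out
    return desc;
}
```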

Additive Blending
My way works with additive blending too. Set these blending states:
  • SrcBlend: DestAlpha
  • DestBlend: One
As I mentioned earlier, this approach suffers from two side effects.
  • When the transparent objects darken pixels too much, you will see banding effects due to the lack of precision with this approach. This happens only when the dest pixel was in the HDR range.
  • Transparent objects cannot brighten what's already written in the opaque pass by very much.

Using blendable RGBM makes sense only under certain conditions, which I have personally run into. Use it only when you can't afford a separate offscreen buffer for memory or performance reasons, and when you are willing to sacrifice some visual quality in return. After all, it's a trade-off between visual quality and memory use.

If you are working on newer consoles, HDR packing might not be needed anymore. But I'm pretty sure there are still a lot of developers who are stuck with older machines, so hopefully they will find this post helpful. :)


Jun 24, 2013

Mipmap Quality

So mipmaps are often blurry, and artists think we are a bunch of morons making their awesome artwork so blurry.. :-) You know, we see lower mips most of the time in a game. It depends on camera settings, but I found it's very rare that we see the highest mip in most of the games out there.

3 Widely Used Tricks

I know there are 3 widely used ways: 1) increased texture size, 2) mipmap bias, or 3) use of a Kaiser filter when generating the mip chain.

The first way can only be used selectively because it increases the memory footprint. Probably this is more viable with next-gen consoles with way bigger memories.

The second way usually introduces too much aliasing because it only pushes back the point where mips start to change. If there were a way to control the curve with which consecutive mips kick in, I would love it more. (I can do all this in HLSL, of course)

The third way is something where I didn't see any improvement personally. Some people say a Kaiser filter makes mipmaps so much better than a box filter, but when I tried it, I didn't really see much improvement.

Less Used But More Effective Trick - Sharpening Filter

And there's another way that is not as widely used, but I found it works really well: apply sharpening. (This is why I contributed a generic convolution filter to NVIDIA Texture Tools 2. There were a couple of people requesting this feature, but the maintainer didn't see it as important, so I had to make it myself.)

The concept is very simple. You just do this for a few top mips while generating the mip chain, hopefully in your texture baker.

  1. Generate mip 1 from original texture(mip 0) by using box filter
  2. Apply sharpening filter on mip 1
  3. Generate mip 2 from the result from Step 2 by using box filter
  4. Apply sharpening filter on mip 2
  5. Repeat....

So which technique do you use to increase mipmap quality? Or is there any other trick that I'm not aware of?

Jun 10, 2013

[Curiosity Research] Pet Fox

This weekend, I stumbled into this video showing a pet fox.

I was like.. "Really? You can have a pet fox?" So I called my brother, Google, and did some research. And yeah, it's possible, though quite restrictive. This PopSci article, posted in Oct 2012, explains it very well.

Popular Science Article: Can I Have A Pet Fox?

Short Summary:
  • There's a distinction between tamed and domesticated animals
  • You can tame animals by training them from a young age. So if you tame a wild fox, she won't bite you, but she won't play with you either. In other words, no ball chasing.. no belly rubbing... no wiggling. Most importantly, the offspring of tamed animals need to be tamed again.
  • Domesticated animals are the result of very selective breeding of "human-friendly" wild foxes. They act more like dogs. It's in their DNA, so their children will love you too :)
  • The only way you can buy a domesticated fox is from Russia. They've been doing this selective breeding since the 1950s, so after tens of generations those foxes are really dog-like.
  • The price is about $6,000
  • One interesting thing though: those domesticated foxes show some physical differences from their purely wild counterparts.

so that is today's dose of my curiosity... :P

May 21, 2013

PIMPL Pattern - Me No Likey

I haven't used the PIMPL pattern so far. I remember reading about it once a long time ago, but I was just not convinced it was something I wanted.

And now I'm working in a codebase where the PIMPL pattern is abundant. My verdict? Yeah, I don't like it. I just don't see the benefit of it in game programming. So what are the benefits of the PIMPL pattern? I can only guess these two:

  1. clear separation between API design and implementation: for example, the grand master designer designs API and minion programmers implement those in hidden places
  2. less header file interdependency = faster compilation time
I don't think #1 means a lot for game programming. Just like in any entertainment business, our requirements keep changing, and so do a lot of APIs. Also, hiding implementation from other programmers sounds just weird in this industry. We usually have full access to any source code. After all, we are not library vendors.

I think the main reason game developers use the PIMPL pattern is #2. C++ sucks with compilation times, blaa blaa... But there are other ways to make the build faster, and PIMPL is definitely not the fastest one I have seen. Is it IncrediBuild, which is pretty expensive? No, the fastest one (I have seen) is unity builds, which are completely free. (duh)

So I don't see benefits of PIMPL at all. Instead, I see more disadvantages:
  1. Code is harder to read: who likes to jump around different classes to see the actual implementation? Maybe you do, but I don't.
  2. Discrete memory allocation: you need an extra memory allocation to instantiate the pimpl object. I'd much prefer to have all my members in one class, so that there's only one memory allocation and it's straightforward to find out the size of an object.
  3. Another pointer dereference = slower: sure, some people might say "it's not that slow," but I'm just nazi about it. One additional pointer dereference is probably only about 4 cycles extra... but with the separate memory allocation for the pimpl object, there's more chance it will thrash cache lines. You can probably avoid that by carefully controlling the allocation order, but that sounds like very tedious work that I don't want to deal with.
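For reference, a minimal PIMPL sketch (a generic illustration, not from any codebase I've worked in) showing exactly where the extra allocation and pointer hop live:

```cpp
#include <memory>

// --- what goes in the header ---
class Renderer
{
public:
    Renderer();
    ~Renderer();
    int drawCallCount() const;
private:
    struct Impl;                 // forward declaration only
    std::unique_ptr<Impl> impl;  // separate heap allocation (#2 above)
};

// --- what goes in the .cpp ---
struct Renderer::Impl
{
    int drawCalls = 0;
};

Renderer::Renderer() : impl(new Impl) {}
Renderer::~Renderer() = default; // must live where Impl is complete

int Renderer::drawCallCount() const
{
    return impl->drawCalls; // the extra dereference (#3 above)
}
```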
So, me no likey.... or am I missing something here?

May 18, 2013

Random thought on 1080+P

some random thought I had while sitting on a bench at a rainy park today....

I finally watched James Cameron's Avatar the other day on DVD. It was a good movie, but the DVD quality was not that great on my 1080p display. It's, after all, 480p, right? Now people are used to 1080p, so DVD quality wouldn't cut it. (even a YouTube video looks better than DVD, jesus!)

So then what's next? Will we need anything over 1080p? I believe the common belief was that our eyes can't process more than a few million pixels, so going over 1080p would be a waste. (1080p is about 2 million pixels) But apparently we can perceive way more than that.... So we will need more than 1080p.

But one day TV resolutions will overcome our eyes' capabilities, right? Then can we stop adding more pixels to the TV? I don't think our eyes are the limit.. the manufacturing cost would be the deciding factor.. Say our eyes can only process 1,000 million pixels, but we are given a TV with 4,000 million pixels; our eyes will see 4 pixels as one pixel, right? So it would be the same as blending those 4 pixels into 1. Oh sweet! It's 4x supersampling antialiasing! If it gets to 16,000 million pixels, it's 16x SSAA.. I can't say no to this.. :)

But as I said, it might become too expensive to add that many pixels one day... but the price of LEDs is going down every day, and smart people are making LEDs use less power, so that might not be a problem.... Or it might become physically impossible to add more pixels one day? Just like how we couldn't make single-core CPUs faster anymore? Maybe.... but 576 million pixels is still a long way to go....

Jan 9, 2013

Goodbye XNA, Hello SharpDX

I've been using XNA since the beginning. I liked this platform a lot because it allowed me to write something that runs both on PC and Xbox 360. It was even possible to monetize your games on the Xbox 360 Indie Games channel. However, the importance of XNA games on Xbox Live turned out to be very minimal, and MS changed XNA a lot to support Windows Phone 7.0. From that moment (XNA 4.0, I believe), XNA became somewhat awkward. PC and Xbox 360 became second-class citizens, I think.

Still, I used XNA quite often. It was a great tool to prototype anything. Its content pipeline and the ability to use DirectX in C# really gave me a huge advantage in making any demo. For example, I even made the demo for my SIGGRAPH presentation in XNA.

Now XNA is dead because Windows Phone 8 is not gonna support it. WinRT is apparently the new winner.... Also, Visual Studio 2012 doesn't support XNA officially. (an unofficial way exists, but whatever?) Sure, one can still stay behind with Visual Studio 2010 and XNA 4.0, but by now XNA, after being neglected for a while, has some limits:
  • no 64-bit support
  • only supports DirectX 9 (no DX11 support)
So now it's time to find an alternative.... The reasons why I liked XNA were:
  • C# support
  • content pipeline

So I just need to find an alternative which covers these two... right?

C# Support

There are two major libraries that support DirectX in C#, or any .NET language: SharpDX and SlimDX. Both libraries have almost identical APIs, thus the usage is very similar. Furthermore, both support 64-bit. I needed 64-bit support for my recent work, so I tried both libraries and decided to go with SharpDX for the following reasons:

  • SharpDX is faster (less overhead on API calls)
  • it's easier to install SharpDX (simple DLL file copies)
So with SharpDX, the C# support issue is covered.

Content Pipeline
Honestly, I still don't see anything comparable to XNA's content pipeline yet. Textures are not a big issue since both C# and SharpDX's DX calls support textures pretty well. When it comes to other asset types, such as audio and meshes, it's still not that easy.

One piece of good news is that Visual Studio 2012 contains an example which loads FBX files, I heard, and I believe Microsoft will enhance this side for WinRT if they are really betting on Windows Phone 8. So maybe it's just a matter of time?

I wish I had some time to make a pretty solid content pipeline for SharpDX or something.... But looking at my schedule for this year, I don't think I'll ever get to do it.

My Conclusion - SharpDX
So I decided to live with SharpDX for now. My main reason was the 64-bit support. My prototype's already using more memory than a 32-bit application can handle. I'm not worried too much about texture and mesh support... For other content types, I'll solve the problems when I need them.

Update - Jan 12, 2013
Nicolas just told me his Assimp.NET can handle some asset import/export, and it's actually used by SharpDX too. So if you need an asset pipeline, check it out. Apparently FBX files are not supported yet, but since there's already a free FBX SDK from Autodesk, I would think it's just a matter of time.

Jan 4, 2013

Legendary Football Player, Dale Winners

I used to live in a townhouse with a very spacious basement, where my friends and I used to hang out, drink, do some random jams/recordings, and sometimes work together. And this is one of the recordings we did... (or should I say Dale did, because he's the only one doing all the voice acting?)

It's been on my HDD for the last 2 years, and I finally decided to edit it and show it to the world because it's hilarious. Enjoy

Jan 3, 2013

Another Useless Meeting

I don't remember exactly, but I think it was a presentation on how Apple works internally. A few key points I thought were cool:
  • Every meeting must produce some decisions (an action plan)
  • Don't even bother to do a meeting if no decision will be made
  • Don't include any employee who has no authority to make decisions
Personally, I don't enjoy wasting my time in useless meetings: I'd rather go back to my desk and do some coding, so I liked the idea.

Later, I actually found myself in a meeting which followed the above rules very faithfully but still managed to waste my time. Better yet, it was a very lengthy meeting. I won't go into too many details about what was discussed. The problem with the meeting was not that it didn't make any decisions: it actually made decisions, lots of them, and they were very sensible. However, those decisions were obvious to anyone even before the meeting, and no other decisions could even have been made.

I think the following should be added to the above rules to save my time.
  • Don't even bother to do a meeting if only one decision can be made from it