Thursday, December 10, 2009

Pet Peeve

If you ever make a recently used files menu, make sure it's updated as soon as a file is opened, preferably between the moment you choose to load the file and the moment the program actually loads it!

I often use editors and tools that are in an unstable state and crash all the time. Typically the crash is related to a particular file, which makes the problem easy to reproduce. When you work on this kind of problem you typically make some change to the program that you hope fixes the problem, load whatever you loaded last time and perform the actions that crashed the editor again. The recently used files menu is your best friend for this kind of workflow, but lots of tools don't save the recently opened files until they do a clean exit. This means that when you work with a support case in some obscure temp directory you find yourself browsing for the file ten times, and at some point you get so tired of it that you load the file, exit the editor cleanly and open it again, just to make sure it shows up in the recently used files menu the next time you start the editor.
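A minimal C++ sketch of the ordering I'm asking for, with a made-up RecentFiles helper class: the path goes into the list and gets flushed to disk before the load even starts, so a crash during loading can't lose the menu entry.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical recent-files list that persists itself immediately.
class RecentFiles {
public:
    explicit RecentFiles(std::string storePath) : storePath_(std::move(storePath)) {}

    // Move the path to the top of the list and write the list to disk right
    // away, instead of waiting for a clean exit that may never happen.
    void addAndSave(const std::string& path) {
        entries_.erase(std::remove(entries_.begin(), entries_.end(), path), entries_.end());
        entries_.insert(entries_.begin(), path);
        std::ofstream out(storePath_);
        for (const std::string& entry : entries_)
            out << entry << '\n';
    }

private:
    std::string storePath_;
    std::vector<std::string> entries_;
};

bool openDocument(RecentFiles& recent, const std::string& path) {
    recent.addAndSave(path);                   // update the menu *before* loading
    std::ifstream in(path, std::ios::binary);  // stand-in for the load that may crash
    return static_cast<bool>(in);
}

int main() {
    RecentFiles recent("recent_files.txt");
    return openDocument(recent, "crashy_scene.dat") ? 0 : 1;
}
```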

Saturday, November 21, 2009

File Format Rant

File formats are a necessary evil that shows up in almost every program of any complexity.
Modern languages often have good serialization systems that can take care of much of this automatically, but in the end, if the data is to be passed between different applications written in different languages on different platforms, you most likely need some kind of standardized format to store it.
I've used and created many file formats in my life, and there is one thing that makes me really upset: when the format designers think they are doing the users a favor by supporting many ways of storing the same data.
Typical examples:
  • Is the file format big endian or little endian when it comes to storing the words? Let's introduce a flag so it's up to the writer; after all, you don't want to introduce unnecessary conversions!
  • Is the image stored as RGBA or ARGB? Well, introduce a big enum with lots of different encodings; why should two be enough when someone may have ABGR encoded images?
  • Does the image coordinate system have an upward or downward pointing Y axis? Let's support both; after all, it makes it so much easier to write the file if you don't have to convert it first!
  • Is the camera stored as a point and a look direction, or perhaps a look-at point, or perhaps a matrix? And what about the projection: horizontal fov, aspect ratio, or vertical fov? Let's support all of them, preferably with some kind of redundancy convention, like: if horizontal fov, vertical fov and aspect ratio are all stored, the vertical fov should be ignored.
I personally think a file format should have the confidence to be a standard rather than supporting every standard. For a file format that plays some kind of data interchange role, there are guaranteed to be way more readers than writers out there, so make sure reading the file is the priority, not writing it!

From the reading point of view it's a nightmare to have to support thousands of permutations of how the file can be stored. Often you need a big test suite with sample data and differently encoded files to make sure you cover every single way of doing things. I ran into a TIFF file the other week that failed in my application because it was stored in a non interleaved way (i.e. rrrrrr...gggggg...bbbbbb rather than rgbrgbrgb...). I've never seen one of those in my entire life, and Maya didn't handle it properly either!
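Just to show the kind of conversion a reader suddenly has to carry around, here is a small C++ sketch; the planar layout is assumed to be exactly rrr...ggg...bbb with 8 bits per channel, which is a simplification of what TIFF actually allows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a planar RGB buffer (rrrrrr...gggggg...bbbbbb) into the
// interleaved layout (rgbrgbrgb...) that the rest of the pipeline expects.
std::vector<std::uint8_t> planarToInterleaved(const std::vector<std::uint8_t>& planar,
                                              std::size_t pixelCount)
{
    std::vector<std::uint8_t> interleaved(pixelCount * 3);
    const std::uint8_t* r = planar.data();
    const std::uint8_t* g = r + pixelCount;
    const std::uint8_t* b = g + pixelCount;
    for (std::size_t i = 0; i < pixelCount; ++i) {
        interleaved[3 * i + 0] = r[i];
        interleaved[3 * i + 1] = g[i];
        interleaved[3 * i + 2] = b[i];
    }
    return interleaved;
}
```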

There are of course times when the requirements make it important to be flexible when it comes to writing:
  1. Your format has high performance requirements. Perhaps you need to know that you can memory map your image into a buffer to load it blazingly fast, or perhaps you want to read it as a blob and typecast it to a struct with well known padding to avoid having to translate every word you read in some fancy way (see the sketch after this list). This is however a dangerous path: there are many platforms out there, and if you start using a highly platform dependent format you may make your program perform worse or become hard to port. It makes complete sense for data stored in the temp directory though.
  2. Your format represents the internal state of your program. A 3d modelling package that supports many ways of working and specifying a camera needs to make sure that whatever was used is what's in the file when it's loaded again. These formats should generally not be used for data exchange though.
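Here is a rough sketch of the shortcut in point 1, using a hypothetical packed ImageHeader and a made-up magic number. It only works when the file's endianness and padding happen to match the platform reading it, which is exactly why it belongs in a temp directory rather than in an interchange format.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical fixed-layout header: packed and written in the platform's
// native byte order, so it can be read with a single fread straight into
// the struct, no per-field translation needed.
#pragma pack(push, 1)
struct ImageHeader {
    std::uint32_t magic;          // made-up identifier, here "IMG1"
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytesPerPixel;
};
#pragma pack(pop)

bool readHeader(const char* path, ImageHeader& out) {
    std::FILE* file = std::fopen(path, "rb");
    if (!file)
        return false;
    const bool ok = std::fread(&out, sizeof(ImageHeader), 1, file) == 1;
    std::fclose(file);
    return ok && out.magic == 0x31474D49u;  // 'I','M','G','1' on a little endian machine
}
```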
So please think about the people reading the files rather than the ones writing them; there are more readers than writers. If there are programs that claim to be compatible with your format but fail to load a file someone wrote, you didn't really do anyone a favour by making your format easy to write.

Sunday, January 25, 2009

Memory mapped misunderstanding

I've become a Stack Overflow addict recently. Stack Overflow is actually a really awesome programming community. It seems to work very well when it comes to pushing the good and relevant answers to the top of the list, so you don't need to spend time reading stupid, redundant or incorrect answers.
Regardless of that, reading programming forums is always stressful for me, since sometimes people seem to be cooperatively clueless.

I recently had one of those experiences, started by someone claiming one of my posts was stupid because I suggested using memory mapped files for large (4GB+) random file access. His motivation was that a memory mapped file that large would use up all the address space of a 32 bit process. This is plain wrong: if anyone tells you that you need to map the entire file into your address space, they don't know what they are talking about. You can choose how much of the file you want to map, using one or multiple windows into the file, and these windows can be moved and manipulated in many cool ways. It's not free, but it's definitely not a reason to avoid memory mapped files!
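For the record, here is roughly what a windowed mapping looks like with the Win32 API. The file name, the 6 GB offset and the 64 MB window size are just assumptions for the example; the only real constraint is that the offset must be a multiple of the system allocation granularity (usually 64 KB).

```cpp
#include <windows.h>
#include <cstdint>
#include <iostream>

int main() {
    const wchar_t* path = L"huge_data.bin";                  // assumed sample file
    const std::uint64_t offset = 6ull * 1024 * 1024 * 1024;  // 6 GB into the file
    const SIZE_T windowSize = 64 * 1024 * 1024;              // 64 MB view

    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) {
        CloseHandle(file);
        return 1;
    }

    // Map only a 64 MB window starting 6 GB into the file. This is all the
    // address space the view costs, no matter how big the file is; unmap and
    // remap at another offset to move the window.
    void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                               static_cast<DWORD>(offset >> 32),
                               static_cast<DWORD>(offset & 0xFFFFFFFFu),
                               windowSize);
    if (view) {
        const unsigned char* bytes = static_cast<const unsigned char*>(view);
        std::cout << "first byte of the window: " << static_cast<int>(bytes[0]) << "\n";
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```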

Anyway, I did some more searching among the postings about memory mapped files on the site, and it seemed like people considered this address space issue a reason not to use them!

The irony of the story is that the reason I started using memory mapped files in the first place was to get around a problem caused by lack of (contiguous) address space in a 32 bit process :)
They are not only good for fast IO, but also for pushing large runtime data out to disk and accessing it more like memory than files.

Wednesday, January 14, 2009

CMake

Instead of telling you about the purpose of this blog I'm going to write a note about my experiences with CMake.
I've been upset about not having a good cross platform tool for writing make files. "Make files" is actually a somewhat too specific description; perhaps a better one is build configurations.

First of all, I prefer to develop in Visual Studio, so I really need a Visual Studio project file to work with. Visual Studio projects are a source of problems themselves. Typically, when introducing new library paths or other global changes, I tend to forget to update all configurations (Release, Debug, Release with symbols, ia32, x64, etc.), so after committing my code, a co-worker who works on a different configuration, or even worse in a different project file (we used to need both Visual Studio 2003 and 2005 to build for all platforms), ends up in a shitty situation. Add to this a separate build system based on make for Linux and Mac that also needs to be updated.
You can always blame this kind of error on sloppy coders, but the problem is really in the process. If there are things that are easy to get wrong, people will make errors, so it's better to make things easy to get right.

My personal experience with setting up a build system that works properly on Windows, Linux and Mac with working dependency tracking is that it's quite painful. The root of all evil in our situation is that you want no redundancy in the included files and settings for the projects; they should be specified in only one place. You also need quite a lot of flexibility in platform specific file lists and compiler flags. The natural approach is to have the same build system on all platforms with the ability to make platform specific exceptions. I've seen lots of programs that claim to be better than good old make, but CMake is the first I've found that lets me work in Visual Studio without pain and still supports Linux and Mac properly.

My first impression of CMake wasn't that good. It was quite obvious that the script language was not designed by Donald Knuth. It is built around macro-like constructions with poor type checking and global variables where you feed in your compiler options. Accidentally getting a character wrong in a global variable name means you silently define a new nonsense variable rather than changing the setting you wanted. This is really a pity. I definitely think the CMake developers should have built their system around Python or some other decent scripting language rather than creating their own less than stellar one.

After playing around some more with it, it seems very much like a program that does what it's supposed to do. Perhaps not in the most fancy and extensible way, but it gets the job done. That's truly awesome; I've been looking for this program for ages!

The major change in how I will develop with this system is that adding new files and library dependencies is done in a separate configuration file instead of using the GUI in Visual Studio. Then you just regenerate the projects and all is fine. The added bonus is that this one change updates all configurations and platforms. To make things more convenient, the generated Visual Studio projects actually have a dependency on the specification files, so when you change the project definition it will trigger a regeneration of the Visual Studio project. It seems to work well in most situations, but I've managed to get things wrong in a way where the old project file was ruined and no new one was generated. This meant I had to fix the error and manually run CMake to get the project file generated properly.

We've run into lots of problems so far, but all of them have been possible to work around. I've noticed when googling about the problems we've had that many of them have been solved quite recently. I choose to think of this as a good thing: when there are serious problems they provide some kind of solution rather than just telling the users that they are doing things wrong.

So far I've only been using it on Windows and it seems to do well. I've got it to use precompiled headers, build DLLs and generate projects for both x64 and ia32. Soon I'll see if the system passes the final tests: Linux and Mac.