~~~~~~~~~~~~~~~~ foreword ~~~~~~~~~~~~~~~~
When I started this blog, one of the kinds of content I had in mind was writing about programming. Besides the fact that I work as a programmer, I believe it's good to share what I've learned with anyone who may face the same problem (for which a ready solution will be waiting here). Another reason is that I've always found it difficult to find a "safe" place to put down these valuable ideas, mainly because of my personal weariness disorder (I'll probably explain this in the future).
So, here comes the 1st post in my programming dimension.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Today, I ran into a problem with one of my programs, which is running in production at the site of a customer I'm responsible for. It's a "translator" program that picks up wire feed files coming from news agencies and translates them into a native format acceptable to the system that consumes them. These feed files come in non-stop, and a single wire news item might be split into multiple files. For that reason, the files carry a running number in their file names to indicate the sequence in which they were generated (and the sequence in which they should be reassembled).
OK, that's the background we need. My translator program runs on a configurable interval. Say, every 5 minutes, it goes into the source folder where the feed files are generated, gathers all the files currently in it, and starts parsing their contents (merging them if necessary). Then the problem happened. First, note that for a wire news item split into multiple files, those files can only be merged if the file names are presented as a continuous series of running numbers (e.g. 1, 2, 3, 4, 5, 6 and so on). Here's an example (file names all start with xxx_):
File #1: xxx_1 (1st legitimate file of wire news a)
File #2: xxx_2 (2nd legitimate file of a)
File #3: xxx_3 (3rd)
...
File #10: xxx_10
File #11: xxx_11 (last legitimate file of wire news a)
File #12: xxx_12 (1st legitimate file of wire news b)
... (and so on)
In Windows Explorer, these files are sorted perfectly in the order seen above. However, my program is built in VB.NET, and the GetFiles() function I'm using returns the file names in this order instead:
File #1: xxx_1 (1st legitimate file of wire news a)
File #2: xxx_10 (10th legitimate file of a)
File #3: xxx_11 (last legitimate file of wire news a)
File #4: xxx_12 (1st legitimate file of wire news b)
File #5: xxx_2 (2nd)
File #6: xxx_3
...
File #12: xxx_9
... (and so on)
Now, don't get this wrong: wire news 'a' was not successfully built from the files xxx_1, xxx_10 and xxx_11, even though xxx_11 really is the last file of wire news 'a'. As I said, the split files can only be reassembled if they are processed as a continuous series of running numbers. In this case, the reconstruction of wire news 'a' failed as soon as file xxx_10 was received, because xxx_2 was expected after xxx_1.
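The order I got back is exactly what a plain character-by-character string comparison produces. Here is a minimal sketch of that effect, written in Python rather than the VB.NET of my actual program; the file names are the placeholders from the example above, and the variable names are made up for illustration.

```python
# Placeholder file names from the example: xxx_1 .. xxx_12.
names = [f"xxx_{n}" for n in range(1, 13)]

# A plain string sort compares character by character, so "xxx_10"
# lands before "xxx_2" because the character "1" is less than "2".
plain_order = sorted(names)

for position, name in enumerate(plain_order, start=1):
    print(f"File #{position}: {name}")
```

The digits are treated as ordinary characters, not as numbers, which is why xxx_10, xxx_11 and xxx_12 all jump ahead of xxx_2.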
Once I knew this was the order in which the GetFiles() function returns the file names, I immediately built a function just to sort the file names the legitimate way, as seen in Windows Explorer every day. Oh wait, this was the order used back in Windows 98, where names were compared character by character and the numeric value of the digits in file names was totally ignored. So, the GetFiles() function was built on Windows 98?
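For the record, the idea behind such a sorting function can be sketched as follows. This is not my actual VB.NET code and not necessarily the exact algorithm Windows Explorer uses; it is just one common approach (often called a "natural" sort), shown in Python, and the names natural_key and sort_like_explorer are hypothetical.

```python
import re

def natural_key(name):
    """Split a name into alternating text and digit chunks.

    re.split with a capturing group always puts the digit runs at the
    odd indices, so those are turned into integers and compared by value.
    """
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

def sort_like_explorer(names):
    """Return the names sorted so that xxx_2 comes before xxx_10."""
    return sorted(names, key=natural_key)

# The order the files came back in, as described in this post.
returned = ["xxx_1", "xxx_10", "xxx_11", "xxx_12", "xxx_2", "xxx_3", "xxx_9"]
print(sort_like_explorer(returned))
```

With the digit runs compared as numbers, the running numbers line up as a continuous series again and the split files can be merged in sequence.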
OK, I solved my problem, and I'm putting it into my archive (this blog). Next time I won't take things for granted in the manner of What You See Is What You Get (WYSIWYG).
What I See (the sort order of file names in Windows Explorer)
Is 'Not' What I Get (the sort order of the GetFiles() function). :S