In terms of my original posting with its original title and purpose, I think this item can be closed out with my thanks for the assistance provided. I would like to pass along some further information concerning the application which motivated the original questions because I think it might be useful to someone somewhere working within the broader scope of this AS3Lang forum. [Probably that means I should post in some other topic with some other tags, but I will just provide the information here and trust that others might locate it by their own search methods.]
- To the comment that I have been “cheating” or otherwise not forthcoming on the application on which I am working and must be “definitely building a product for users, not just for” myself, I apologize for whatever may have led to such an observation. From the beginning, as I briefly stated in asking for help, I am attempting to write an application, not just writing some exercise for some programming class. As with all such endeavors, it is my hope that my family and friends will find the application well-enough conceived and executed to be of some use to them. There has never been any commercialization of anything I have produced and there has never been any attempt to sell anything to anyone.
Indeed, the primary reason for taking a moment to pass along some information I have gained in my work to date is only to openly share that information with anyone who might find it useful, and, by all means, if anyone would like to have any of the sources for what I am working on, please just send me an email and I will gladly reply with a zip file and some instructions.
- That last sentence provides a useful connection back to the original questions and recommendations concerning how best to architect a solution to the challenges of keeping track of a rather large collection of files spread across a rather large number of devices attached to a relatively small number of locally-networked host computers. I have found that the best combination based on AIR and AS3 is to offload as much of the ‘heavy lifting’ of file system operations as possible to low-level code running as ‘close to the metal’ as I know how to handle.
My original desire – the reason that I have used AIR for the last decade – was always to develop applications like these and have them run as nearly as possible exactly the same on my kids’ MacOS machines as on the elder folks’ Windows boxes. However, with Mr. Jobs having done all that he did to kill the use of the Adobe tools on his Apple devices, I have to say, up front, that I have had to abandon all MacOS efforts. So, when I write about getting as close to the metal as I know how, I am talking only about Windows-based PCs. Just trying to work with the APIs provided there gives me headaches enough. Since I have never been able to get through the C++ learning curve, I am really just talking about the Windows C Runtime, and within that domain, about shared DLLs compiled and tested for 32-bit and 64-bit deployment.
So, the instructions on how to use what I have been using are, unfortunately, pretty messy. On the ActionScript side, I am completely devoted to FlashDevelop where, sadly, we may have seen the last release of that outstanding IDE. On the C side of the street I have eschewed Visual Studio in favor of old-fashioned command-line tools like ‘vi,’ ‘make’ and ‘ant.’ [I should, however, pass along a plug for another open-source tool that could very well be used to do what I am doing. The FRESharp ANE GitHub project lets those of us who can’t handle C++ work with the very comprehensive C# framework. If I decide that C# does not drastically reduce the performance of my ‘backend,’ I will change from using C with only command-line tools to taking advantage of Visual Studio to manage C# projects.]
- The connective glue is the AIR Native Extension (ANE), rather than NativeProcess, as the means of communicating between the user-facing code and the file-facing code, and the main information I am passing along here relates to my use of threads to increase the throughput of the application. You likely know that AIR Workers provide a useful means of achieving some multi-tasking, but, unfortunately, a Worker is not permitted to use an ANE. The creation and management of threads, therefore, has to be handled inside the ANE implementation – Windows [not POSIX] threads, in my non-C++ case.
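To make that concrete, here is a minimal sketch of the pattern, assuming the standard Adobe FRE C API from FlashRuntimeExtensions.h. The names startVolumeList, ListTask and listThreadProc are mine, made up for this post rather than taken from my actual sources; the point is only that the thread is created on the native side, and completion is reported back with FREDispatchStatusEventAsync, which, as far as I know, is the one FRE call documented as safe from a background thread.

// Minimal sketch: launching a worker thread from inside an ANE function.
// Assumes the standard Adobe FRE C API (FlashRuntimeExtensions.h); the
// names startVolumeList, ListTask and listThreadProc are illustrative only.
#include <windows.h>
#include <stdlib.h>
#include <string.h>
#include "FlashRuntimeExtensions.h"

typedef struct {
    FREContext ctx;                 // needed later for FREDispatchStatusEventAsync
    char       rootPath[MAX_PATH];  // root directory for this listing task
} ListTask;

static DWORD WINAPI listThreadProc(LPVOID lpParam)
{
    ListTask* task = (ListTask*)lpParam;
    // ... the recursive directory walk for task->rootPath would go here ...
    // Tell the AS3 side we are done; it listens for StatusEvent on the context.
    FREDispatchStatusEventAsync(task->ctx, (const uint8_t*)"taskComplete",
                                (const uint8_t*)task->rootPath);
    free(task);
    return 0;
}

// Exposed to AS3 as context.call("startVolumeList", rootPath)
FREObject startVolumeList(FREContext ctx, void* funcData,
                          uint32_t argc, FREObject argv[])
{
    uint32_t len = 0;
    const uint8_t* path = NULL;
    FREGetObjectAsUTF8(argv[0], &len, &path);

    ListTask* task = (ListTask*)calloc(1, sizeof(ListTask));
    task->ctx = ctx;
    strncpy(task->rootPath, (const char*)path, MAX_PATH - 1);

    // The thread must be created here, on the native side; an AIR Worker
    // cannot load an ANE, so all of the concurrency lives behind this call.
    HANDLE h = CreateThread(NULL, 0, listThreadProc, task, 0, NULL);
    if (h) CloseHandle(h);   // fire-and-forget; completion comes back via the event

    FREObject result = NULL;
    FRENewObjectFromBool(h != NULL, &result);
    return result;
}

On the AS3 side, the ExtensionContext simply listens for StatusEvent.STATUS to pick up the completion and heartbeat reports.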
I only have a modest i5 [6th-generation] quad-core [with no hyper-threading, therefore maxing out at four threads at the hardware level] host on which I work, but I believe my findings would extrapolate naturally to Intel- and AMD-based boxes with 6-12 hardware threads. The first rule is that it never seems worthwhile to launch more than “One Less Than the Maximum” number of back-end tasks. If I try to run four parallel disk-search-intensive tasks, things break. If I run two or three parallel tasks, there is a nice, linear increase in throughput.
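For anyone who wants to apply the same rule on a different box, the hardware thread count is easy to query. A minimal sketch, assuming nothing more than the Win32 GetSystemInfo call (the little test program itself is just for illustration):

// Sketch of the "one less than the maximum" rule, using GetSystemInfo
// to discover the number of logical processors on the host.
#include <stdio.h>
#include <windows.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);                      // dwNumberOfProcessors = logical CPUs
    DWORD cap = si.dwNumberOfProcessors > 1 ? si.dwNumberOfProcessors - 1 : 1;
    printf("hardware threads: %lu, back-end task cap: %lu\n",
           (unsigned long)si.dwNumberOfProcessors, (unsigned long)cap);
    return 0;
}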
- So, with all that as background/context, here are some numbers for testing against only locally-connected hard drives. [The application eventually needs to work smoothly across local and remote hard drives, and I have done some preliminary tests to show that the software gets the correct results from remote drives, but, as you would know, the bandwidth reduction from local attachment to LAN-cable or WiFi remote attachment has a decided impact on how many bits can be handled per unit of time.]
My D:, E: and F: drives are current-generation [Western Digital Black] one- and two-terabyte devices. [I also have volumes C:, G: and Z: which are mounted on SSD devices, but, for now, there is still such a large price/performance differential between the two types of storage that the real target of the application is the high-volume, lower-performance HDD farm, not the SSD farm.]
My test case is to produce a list of all the directories and all the files on the D:, E: and F: volumes. [The output is written into .txt files which are themselves archived with a date/time stamp for later analysis.] For purer benchmarking purposes, it would be good to have an automated launch of the three tasks, but since I am mostly trying to make sure the application works and that I can see what breaks and how to fix it, I don’t have any such test harness. Rather, I am actually launching each VolumeList task using the U/I, which involves opening an Explorer-like popup window and allowing the user to select any directory as the root of his search/listing. Thus, there is some necessary time delay between the start of each task.
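For anyone curious what one of those VolumeList tasks looks like in shape, here is a stripped-down sketch of a recursive listing using the Win32 FindFirstFileW/FindNextFileW calls. It is not my actual backend code – the function names are made up for this post, error handling is minimal, and a real version would need the long-path prefix and much larger buffers – but it shows the kind of ‘close to the metal’ loop each task runs:

// Sketch of one VolumeList task: recursively list every directory and file
// under a root and append one full path per line to an output .txt file.
#include <windows.h>
#include <stdio.h>
#include <wchar.h>

static void listTree(const wchar_t* root, FILE* out)
{
    wchar_t pattern[MAX_PATH];
    swprintf(pattern, MAX_PATH, L"%s\\*", root);

    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE) return;

    do {
        // Skip the '.' and '..' pseudo-entries.
        if (wcscmp(fd.cFileName, L".") == 0 || wcscmp(fd.cFileName, L"..") == 0)
            continue;

        wchar_t full[MAX_PATH];
        swprintf(full, MAX_PATH, L"%s\\%s", root, fd.cFileName);
        fwprintf(out, L"%s\n", full);            // one line per directory or file

        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            listTree(full, out);                 // recurse into each sub-directory
    } while (FindNextFileW(h, &fd));

    FindClose(h);
}

int main(void)
{
    FILE* out = _wfopen(L"D_volume_list.txt", L"w, ccs=UTF-8");
    if (!out) return 1;
    listTree(L"D:", out);                        // root of the listing
    fclose(out);
    return 0;
}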
Here’s a table of results:
Volume   Order of completion   Lines of output   Elapsed seconds
D:       3                     424,614           120
E:       1                     526,914           67
F:       2                     120,973           70
And this is how to understand the four columns. The first is the volume ID. They are listed in the order of launch. The second is the order of completion; the third is the number of lines of output; the fourth is the elapsed number of seconds for the task.
While the test is running, I am looking at the Windows 10 Task Manager Performance tab, with its nice display of overall CPU percent usage, scrolling graphs of the utilization in each of the four cores, and displays of percent busy for each disk drive. I am also getting heartbeat messages displayed in the U/I [an event dispatched for every 50,000 lines of output generated by each task]. The U/I output lets me see that whatever scheduling algorithm Windows is using, it seems to be giving each task a nice opportunity to get some concurrent work done – the event reports come back in no predictable order. The aggregate CPU utilization jumps up to about 30% when I start the search of drive D:, jumps up to about 65% when I add the search for drive E:, and pretty much maxes out when I add the third concurrent search.
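The heartbeat itself is nothing more than a counter on the native side. A sketch of the idea, again with made-up names (reportLine, HEARTBEAT_INTERVAL) and again assuming the standard FRE header – FREDispatchStatusEventAsync is queued back to the AIR main thread and arrives on the AS3 side as a StatusEvent on the ExtensionContext:

// Sketch of the heartbeat: inside the listing loop each worker counts output
// lines and, every 50,000, dispatches an async status event to the AS3 side.
#include <stdio.h>
#include <stdint.h>
#include "FlashRuntimeExtensions.h"

#define HEARTBEAT_INTERVAL 50000

static void reportLine(FREContext ctx, const char* volumeId, uint64_t lineCount)
{
    if (lineCount % HEARTBEAT_INTERVAL == 0) {
        char msg[64];
        snprintf(msg, sizeof msg, "%s:%llu", volumeId, (unsigned long long)lineCount);
        // Safe to call from a worker thread; AIR marshals it to the main thread.
        FREDispatchStatusEventAsync(ctx, (const uint8_t*)msg,
                                    (const uint8_t*)"heartbeat");
    }
}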
My two primary conclusions are: a.) the total elapsed time to get about a million directories and files listed – approximately two minutes – is very much less than it was when using AIR’s File.getDirectoryListing; b.) since the work has been divided in such a manner that there is very little contention between the separate threads, the total elapsed time is very close [within about 5 percent] to the elapsed time of the longest-running single task.
The last point probably needs a little clarification. In the real use case – not this one – the user is free to select any directory anywhere as the root for obtaining the fully-recursive list of all sub-directories and files. So, in a real use case, I have to expect that competing threads will sometimes be running against the same physical HDD. In that case, the read/write head of the actuator will be pulled this way and that by the competing requests, to the point that I expect the benefits of concurrency may turn negative. I have yet to conduct those tests.
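If and when I do, one possible way to detect the situation – and this is only a sketch of an approach I have not yet tried, not something in my current sources – would be to ask Windows which physical drive a given volume lives on, so that tasks aimed at the same spindle could be queued rather than run concurrently:

// Sketch: find out which physical drive hosts a volume (here D:), so that
// listing tasks aimed at the same spindle could be serialized.
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    // Open the volume itself (not a file on it), requesting no access rights.
    HANDLE h = CreateFileW(L"\\\\.\\D:", 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    STORAGE_DEVICE_NUMBER sdn;
    DWORD bytes = 0;
    if (DeviceIoControl(h, IOCTL_STORAGE_GET_DEVICE_NUMBER, NULL, 0,
                        &sdn, sizeof sdn, &bytes, NULL)) {
        printf("D: is on physical drive %lu\n", (unsigned long)sdn.DeviceNumber);
    }
    CloseHandle(h);
    return 0;
}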
- So let me wrap up with a request for any comments that anyone may have from their similar work. Here’s a rerun of my test:
Volume   Order of completion   Lines of output   Elapsed seconds
D:       1                     424,614           25
E:       3                     526,914           31
F:       2                     120,977           19
And here is what is interesting and perplexing. You can see that the time required for each volume listing is drastically reduced – reductions of roughly 55 to 80 percent, depending on the volume. As a result of the reductions, the order of completion is somewhat changed. Another observation is that the elapsed time in relation to the number of lines of output behaves somewhat unexpectedly.
In this table:
Volume   µs per line, run 1   µs per line, run 2
D:       282                  58
E:       127                  58
F:       578                  157
the value in column 2 is the number of microseconds per line of output for the first run and the value in column 3 is the number of microseconds per line of output for the second run [for D: in the first run, for example, 120 seconds ÷ 424,614 lines ≈ 282 µs per line]. To which I need to add what I was seeing in the Task Manager display during each run. In the first run, as reported above, the CPU utilization jumped with each additional task, and as each task started the disk utilization associated with each volume remained pretty steady at around 80-90 percent. In the second run the same stepwise jumps in aggregate CPU utilization were observed, but they were somewhat lower – each task added about 25-30%, so that the maximum was in the range of 90%, not 100% as in the first run.
But here was the very unexpected result, and one for which I would be very interested in gaining a proper understanding. If the tests are run right after a system boot, you get the results for the first case; if, however, the tests are run a second time, you get the results for the second case, and in the second case Task Manager shows NO DISK UTILIZATION. The reason for the drastic reduction in elapsed time for the subsequent tests is that, by some miracle, the operating system is able to provide a list of the files with no read access to the drives themselves to obtain that data.
I have never read anything indicating that the caching mechanisms built into an OS as a sensible performance enhancement are massive enough to retain in main memory [or in the page file – I cannot tell which] a complete listing of all the millions of files available across its disk drive farm. I would appreciate being pointed to any sort of solid reading material concerning how Windows, MacOS and the various Linux distros compare in this regard. Being able to access over a million lines of searchable text representing the full path of each file on those three drives in about half a minute is a real eye-opener to me, and will greatly facilitate developing the application’s ability to identify likely duplicates amongst those millions of possibilities.
That’s about a wrap. If you sense some excitement in this small achievement, you sense correctly. Having to go back and relearn the C coding that I first encountered on some DEC PDP-9 box circa 1974 was more exasperating than I ought to admit, but the results make the pain go away. In the spirit of this forum, I hope that what we do with ActionScript may be as long-lived.