Performance versus AIR -- Assistance in Evaluating


#1

Hello,
I am looking for some specific facts concerning why I should or should not consider incorporating redshell in an ActionScript 3 application. [I am not at all interested in the sort of flame warfare that some might think of as a helpful response, and since, at the end of the day, I have a private development decision to make, if there is some way to take this thread offline, that would be preferable.]

A brief description of my application would be that it is intended to help solve the problem that, after so many years of so many PC-based projects, I have hundreds of thousands of files in almost a hundred thousand directories spread across a time-varying set of local and remote Windows volumes. I need a tool to help me identify duplicates so that I might eliminate and/or reorganize directory hierarchies to better support quick navigation to content of choice.

I am well aware that there are a variety of tools, some open source, others commercially developed and made available in a “basic” version at no charge or a low license cost, so, to answer your question of “why,” the answer is really two-fold. First, I will complete the project because, like most of the folks from the as3lang community, I have a powerful set of development tools and a never-dying interest in using them to solve meaningful problems, if only to keep my 80-year-old brain from becoming too mushy. Second, with regards to the heuristics for determining that two files “very likely might represent the same source material,” I have some ideas different from any I have found in use by the available applications.

So, to cut to the chase: I already have some of this functionality prototyped in AIR and have some idea where the very long execution times will be encountered in handling the required file system operations. Having done something similar with extracting factoids from JPEG and PNG files [which, as can be imagined in a world in which every grandchild was turned into a photographer by a cell phone, constitute a significant proportion of the files to be compared], I am at the stage of “going parallel” with workers. Even with that, especially for files served either over WiFi, or even over the yards of Ethernet cable hiding under the floor mats of the house, some levels of file content comparison [as contrasted with file metadata comparison] will result in rather long run times – sometimes close to an hour.

With all of that as preface I would appreciate any facts that would help me decide on the next necessary optimizations:

a. Are there any implementation reasons why the runtime performance of the flash.filesystem classes would differ significantly between an AIR application and a Redtamarin application?

b. Are there some demonstrable reasons why I might expect to see significantly different performance in reading file streams if I created an ANE to call the C-library equivalents?

Let me end by saying that my use of the adverb “significantly” is, of course, not precise. It is rather intended to indicate that I am entirely comfortable with the trade-off of performance and language/framework features that AIR has provided us for the last ten years. So if someone says, for example, that it might be possible to obtain a sorted list of the files within a directory 20% faster using native code, that would fall below my threshold of significance because, even across tens of thousands of directory searches, I find the AIR application quite performant enough. It is primarily in reading the file byte streams, memory-managing the Flash objects, and dispatching and handling the associated events where differences would have to aggregate to some interval of time, or perhaps some level of CPU load, that would make a difference.

If anyone has hard comparison numbers, let’s say, for reading a thousand files with a range of file sizes on the order of small [1-5KB], medium [6-50KB], large [51KB-500KB], and very large [anything else], that would be most helpful.

Thanks to anyone, and thank you for keeping ActionScript 3 in focus.


#2

yeah OK …

as is, redshell being a command-line tool, the only way to integrate it with an AS3 app
is with AIR on the desktop using NativeProcess

right there the wording of the question is quite bad, as you can build AS3 apps with redshell too

so, let’s clarify: using Flash you can build AS3 GUI apps running in the browser,
using Adobe AIR you can build AS3 GUI apps running on the desktop and/or mobile,
and using redshell you can build AS3 CLI apps running on the desktop or the server.

please no, as the maintainer of redshell I would rather have all questions/discussions public
because I do not have time to repeat the same answers for each individual


it seems you’re overcomplicating the problem, but here is what I can say


  • if it is an archiving/backup problem, eg. “I don’t want to lose those files”
    • favour a centralised solution
    • remove the “remote Windows volumes”
    • ignore the file-duplication problem, because hardware is cheap enough to store duplicates
    • reuse a known hardware/software setup for consistency

simply put, use a NAS, either a well-established solution like Synology/QNAP or a custom build,
and use something like OpenMediaVault that will do the software part for you; for extra plugins check omv-extras.

Check blog posts like My perfect 2018 media server. Openmediavault: NAS/storage + multimedia services; there are plenty of guides, videos, etc.

In the category of backup/archiving/NAS different people have a lot of different needs, but with a solution based on OMV you’ll find plugins to do stuff like syncing to the cloud (AWS, etc.) or even “find duplicate files”.

It would be much the same with Synology/QNAP: you’ll find an “application portal” to customise what you can do with the NAS.


  • if it is solely a file-duplication problem, eg. “I absolutely want to control what is duplicated or not”

I get the idea of wanting to do things yourself, but when the problem is a very common one that has existed for a long time, you really have to weigh “reuse stuff” vs “reinvent the wheel myself”.

At the very worst, if for some reason you absolutely want to build it yourself, port an already existing solution to your own environment, eg. port the Python source code of dupeGuru to AS3.


With AS3 in general you will have to be able to list directories, collect file metadata, and compare file contents.

Note:
that “identical file” can be defined differently
some people will consider identical: same name and same size
other people will consider identical: same size
etc.

Now for AS3, the difference between Adobe AIR and redshell would be related to the API that can allow you to access the filesystem.

With AIR you get flash.filesystem.File, which inherits from flash.net.FileReference,
so you can access name, size, isDirectory, getDirectoryListing(), etc.
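
for ex, a minimal sketch with that AIR API (no error handling; names and the starting directory are just for illustration):

import flash.filesystem.File;

// walk a directory tree with the AIR API and
// collect name/size/path for every file found
function collect( dir:File, result:Array ):void
{
    for each( var entry:File in dir.getDirectoryListing() )
    {
        if( entry.isDirectory )
        {
            collect( entry, result ); // recurse into sub-directories
        }
        else
        {
            result.push( { name: entry.name, size: entry.size, path: entry.nativePath } );
        }
    }
}

var files:Array = [];
collect( File.documentsDirectory, files );
trace( files.length + " files found" );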

With redshell, you can reuse shell.FileSystem, but you can also directly use the C functions in C.stdio, and those C functions will allow you to do stuff that AIR may not do by default, like stat() a file without read()ing it, which should be faster for getting the file information

and in fact, you can look at the implementation,
which reuses C functions like readdir(), stat(), closedir(), etc.,
to build your own “more optimised for what you need” implementation

you will also be able to find numerous hash functions here

that you can reuse with either Adobe AIR or redshell


You will find that there are many different ways to approach the problem, for ex: favour the C function stat() for its speed, to avoid reading the file itself; use a hash table to store all the file information: name, size, directory or file, etc.; only at the end use a hash function; recurse, till at the end you get groups of “duplicated files”; etc.
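
here a rough sketch of that “group by size first, hash only the candidates” logic in plain AS3/AIR (hashFile() below is a dummy placeholder, swap in a real hash function from as3corelib, hashlib, etc.):

import flash.filesystem.File;
import flash.filesystem.FileMode;
import flash.filesystem.FileStream;
import flash.utils.ByteArray;

// placeholder: replace with a real content hash (SHA-1, MD5, ...)
function hashFile( file:File ):String
{
    var bytes:ByteArray = new ByteArray();
    var stream:FileStream = new FileStream();
    stream.open( file, FileMode.READ );
    stream.readBytes( bytes );
    stream.close();
    return "hash-of-" + bytes.length; // <-- dummy value, replace me
}

// files with different sizes cannot be duplicates,
// so bucket by size first and only hash the candidates
function findDuplicates( files:Array ):Array
{
    var bySize:Object = {};
    for each( var f:File in files )
    {
        var sizeKey:String = String( f.size );
        if( !bySize[sizeKey] ) { bySize[sizeKey] = []; }
        bySize[sizeKey].push( f );
    }

    var duplicates:Array = [];
    for( var size:String in bySize )
    {
        var group:Array = bySize[size];
        if( group.length < 2 ) { continue; } // unique size, skip the expensive read

        var byHash:Object = {};
        for each( var candidate:File in group )
        {
            var h:String = hashFile( candidate );
            if( !byHash[h] ) { byHash[h] = []; }
            byHash[h].push( candidate );
        }

        for( var hash:String in byHash )
        {
            if( byHash[hash].length > 1 ) { duplicates.push( byHash[hash] ); }
        }
    }
    return duplicates;
}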

You can prototype using redshell on the CLI, and then either port it to Adobe AIR for a GUI, or use NativeProcess to reuse redshell from AIR, or keep doing it on the CLI, etc.
Or port the C functions you need to an ANE, so you can reuse that kind of native access in AIR.

Again, there are many ways to do it.


We don’t know the details of how Adobe implemented the flash.filesystem classes;
we have the API, we can compare similar functions for speed for ex, but that’s about it

Even if Redtamarin were to provide an implementation of flash.filesystem.File
we would try to match the API, but we could not match exactly 100% the native implementation
(as we don’t have access to it)

but who cares? that’s where I would say you are overcomplicating things
if your worry is speed, then test/compare the speed


depends, in general I would say C functions would trump anything else in terms of speed,
but badly implemented they could also fail to bring the expected speed

or it could be a case where, as long as you are in the ANE context, it is fast,
but as soon as you transfer the data to AIR, the serialisation of the data slows things down

but again, if you want to really know, you have to demonstrate it yourself,
as it seems nobody has done that kind of comparison

measure it, compare it, and then interpret the results
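
something like this crude timing harness in AIR would do; getTimer() has millisecond resolution, so time a whole batch of files, not a single one:

import flash.filesystem.File;
import flash.filesystem.FileMode;
import flash.filesystem.FileStream;
import flash.utils.ByteArray;
import flash.utils.getTimer;

// time how long it takes to fully read every file in the list
function timeReads( files:Array ):int
{
    var start:int = getTimer();
    for each( var f:File in files )
    {
        var bytes:ByteArray = new ByteArray();
        var stream:FileStream = new FileStream();
        stream.open( f, FileMode.READ ); // synchronous open
        stream.readBytes( bytes );       // read the whole file
        stream.close();
    }
    return getTimer() - start; // elapsed milliseconds
}

// beware: the first run after boot hits the disk, the following runs
// hit the OS cache, so always compare cold runs against cold runs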

For example, when I implemented hashlib, at first I did not want to implement it: I would have preferred to reuse an already existing library like mx.utils.HashUtil, but comparing it to the original C implementations revealed that HashUtil had a faulty implementation, and I could not have that.


Because of your very specific need and scope, I’m afraid you’re gonna have to do those comparisons yourself.

it is always the problem of what is “good enough” for your need.


#3

Thanks for considering my request for assistance in evaluating the various means by which I might implement the features required of my AIR application.

For the possible benefit of any other members of the AS3Lang Community following Redtamarin topics, I thought I would provide some of what I have learned from my investigations.

  1. For the task of providing a text file containing a breadth-first, ordered listing of the sub-directories and files starting at the root of a Windows 10 volume [e.g. C:\ or D:\], my test bed is a volume mounted on a 2TB local drive. The output is approximately 110,000 lines, and the average depth of a file is about 4.5 directories.

  2. The typical duration to create that output using AIR’s AS3 Classes is about 41 seconds.

  3. My first comparison was to configure the parameters of the Unix find command, under Cygwin, to produce the same output. That proved to take about 23 seconds.

  4. After learning how to create the comparable application as a redshell projector, I got the time down to about 21 seconds.

  5. Since, however, so much of the total application’s appeal will have to do with its User Interface being nicely designed/oriented to non-technical users, I decided I ought to rewrite the same C code as an AIR Native Extension. That has now been accomplished, and the elapsed time is down to 19 seconds.

Different developers with different application requirements will, of course, have different opinions about what these test results mean. I have no agenda, but for my purposes these are my conclusions:

A. Cutting the elapsed time by more than half [41 seconds down to 19] for a workload which will be frequently encountered in the use of my application is significant, and is worth the additional engineering effort to make it happen.

B. As one with years of AS3 and AIR experience, any opportunity to take advantage of those skills, rather than devoting man-months to achieving some level of competence with other languages, other frameworks and other IDEs, is worth pursuing. [Just spending a couple of days re-learning the spareness of C string handling is enough to convince most folks that ‘looking back to those good old days’ is not the best use of one’s time.]

C. Redtamarin, unfortunately for me, does not provide as good a means of designing and implementing the communication between the AIR runtime and the maximum-performing native code needed by my application as does the ANE contract, with its shared byte arrays and asynchronous events.

It was certainly easier to write and debug the application logic taking advantage of the amalgamation of AS3-ness and C-ness in a single source file. If I had an application better suited to the Redtamarin paradigm of a command-line application and CGI-style inter-process communication, that is the way I would go.


#4

Fair enough.

Look, you’re trying to combine two almost opposite things:

  • performance / speed
  • ease of use / nice UI

AIR + ANE is a pretty good combo for that, but yeah, the performance part will not come “free”:
you will have to go into the gruesome details of the ANE part, but that’s perfectly normal.

For Redtamarin, at one moment I was thinking of publishing an ANE for the C API part, at least for the desktop, maybe for mobile, so that if people use the C API Redtamarin-side they could reuse it almost “as is” AIR-side; but it’s not a high priority, it could happen but other things are more important.

Now I think you’re “cheating” on the context: to me it seems you’re trying to build a commercial product but presenting it as a personal project, I mean this

User Interface being nicely designed/oriented to non-technical users

oh yeah, this guy is definitely building a product for users, not just for himself.

Not a big deal, I would have answered the same things, but it does put higher requirements on the UI part, and that was not that obvious from the original post (I thought you were trying to solve that problem just for yourself, so yeah, different context).

In that context, yeah you could build a redshell projector that is distributed with the AIR app as said earlier

you don’t really need shared byte arrays and asynchronous events in redshell to do that; on the contrary, you want the command-line exe to stay synchronous so it behaves as expected with NativeProcess.

here is a pseudo-code example

package something
{
    import flash.desktop.NativeProcess;
    import flash.desktop.NativeProcessStartupInfo;
    import flash.events.IOErrorEvent;
    import flash.events.NativeProcessExitEvent;
    import flash.events.ProgressEvent;
    import flash.filesystem.File;
    import flash.system.Capabilities;

    public class foobar
    {

        //...
        
        private function onProcessOutput( event:ProgressEvent ):void
        {
            var process:NativeProcess = event.target as NativeProcess;
            var data:String = process.standardOutput.readUTFBytes( process.standardOutput.bytesAvailable );
            trace( "stdout: " + data );
        }

        private function onProcessError( event:ProgressEvent ):void
        {
            var process:NativeProcess = event.target as NativeProcess;
            trace( "stderr: " + process.standardError.readUTFBytes( process.standardError.bytesAvailable ) );
        }
        
        private function onProcessExit( event:NativeProcessExitEvent ):void
        {
            trace( "Process exited with " + event.exitCode );
        }
        
        private function onProcessIOError( event:IOErrorEvent ):void
        {
            trace( event.toString() );
        }

        private function _getPlatform():String
        {
            var platform:String = "";
            var manufacturer:String = Capabilities.manufacturer;
            
            if( (manufacturer != "") && (manufacturer.indexOf( " " ) > -1) )
            {
                var tmp:String = manufacturer.split( " " )[1];
                platform = tmp.toLowerCase();
            }
            else
            {
                platform = manufacturer.toLowerCase();
            }
            
            return platform;
        }

        private function _launch():void
        {
            var filepath:String;
            var args:Vector.<String> = new Vector.<String>();

            switch( _getPlatform() )
            {
                case "windows":
                // very weak way to do it
                filepath = "C:\\Windows\\System32\\cmd.exe";
                args[0] = "/C";
                args[1] = "echo %USERNAME%";
                break;
                
                case "macintosh":
                case "linux":
                filepath = "/bin/sh";
                args[0] = "-c";
                args[1] = "echo $USER";
                break;
            }


            var file:File = new File( filepath );


            var startupInfo:NativeProcessStartupInfo = new NativeProcessStartupInfo();
                startupInfo.executable = file;
                startupInfo.arguments = args;

            var process:NativeProcess = new NativeProcess();
                process.addEventListener( ProgressEvent.STANDARD_OUTPUT_DATA, onProcessOutput );
                process.addEventListener( ProgressEvent.STANDARD_ERROR_DATA, onProcessError );
                process.addEventListener( NativeProcessExitEvent.EXIT, onProcessExit );
                process.addEventListener( IOErrorEvent.STANDARD_OUTPUT_IO_ERROR, onProcessIOError );
                process.addEventListener( IOErrorEvent.STANDARD_ERROR_IO_ERROR, onProcessIOError );

                process.start( startupInfo );            
        }

        //...
    }
}

With something like that, you launch the native process from your UI and you wait for the result by reading the standard output; it is perfectly async, even if the command-line part is not async at all.

If you were to build that with an ANE, the C part would be pretty similar: you would use one of the C exec or spawn functions, run the process from the ANE, wait for the output/result, and send back an event from the ANE to the AIR context.

Done differently, using the getenv() C function in that particular case of “getting the username”, you could avoid the “waiting to read the output” part and directly get the result from an ANE function call.
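
AIR-side that would look something like this (the extension ID and function name are made up, they have to match whatever your own ANE declares):

import flash.external.ExtensionContext;

// "com.example.envane" and "getEnv" are placeholder names
// that must match what the ANE registers on the native side
var context:ExtensionContext = ExtensionContext.createExtensionContext( "com.example.envane", null );
var username:String = context.call( "getEnv", "USERNAME" ) as String;
trace( "username: " + username );
context.dispose();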

There is desktop software that does it this way, providing a UI on top of an “obscure” command-line call:
under macOS you have lsof, and you can find a desktop app, Sloth, that just parses the output of that command-line tool to provide a nice UI for it.

So, yeah, you can build an ANE to communicate back to AIR the listing of files and directories, etc.

But I would argue that you can also build a command-line exe based on redshell, embed it with the AIR app, let it do all the “C function calls” part, and read its output with NativeProcess; the speed will be pretty similar.

It is always a game of pros and cons.

The advantages (or pros) of using redshell would be being able to write this “native” part in AS3 instead of C/C++, and being able to embed your own myprogram.exe for Windows or myprogram for macOS (or even Linux).

Not that the ANE is a bad solution, but even if your ANE lists the files in 10 seconds, it is still 10 seconds you have to wait on the ANE side before you can send back an event with the result to the AIR UI side.

And when it comes to ANEs, they execute by default on the main thread; see this post for example:
[Air ANE Java] How use FREObject in Thread?.

My point is that you don’t get asynchronous for free with an ANE; that’s why NativeProcess may not be a bad idea, because it does the asynchronous part for free.

My advice would be this: continue your experimentations a bit further, implement an ANE, see how you wait and/or pass the data back to AIR, and compare that to using NativeProcess and reading the standard output.

What will be smoother for the UI: waiting N seconds on the main thread for the ANE to process the data,
or waiting for the standard output event from the NativeProcess (not on the main thread)?

My bet is on NativeProcess.


#5

In terms of my original posting with its original title and purpose, I think this item can be closed out with my thanks for the assistance provided. I would like to pass along some further information concerning the application which motivated the original questions because I think it might be useful to someone somewhere working within the broader scope of this AS3Lang forum. [Probably that means I should post in some other topic with some other tags, but I will just provide the information here and trust that others might locate it by their own search methods.]

  1. To the comment that I have been “cheating” or otherwise not forthcoming about the application on which I am working, and must be “definitely building a product for users, not just for” myself: I apologize for whatever may have led to such an observation. From the beginning, as I briefly stated in asking for help, I have been attempting to write a real application, not just some exercise for a programming class. As with all such endeavors, it is my hope that my family and friends will find the application well-enough conceived and executed to be of some use to them. There has never been any commercialization of anything I have produced, and there has never been any attempt to sell anything to anyone.

Indeed, the primary reason for taking a moment to pass along some information I have gained in my work to date is only to openly share that information with anyone who might find it useful, and, by all means, if anyone would like to have any of the sources for what I am working on, please just send me an email and I will gladly reply with a zip file and some instructions.

  2. That last sentence provides a useful connection back to the original questions and recommendations concerning how best to architect a solution to the challenge of keeping track of a rather large collection of files spread across a rather large number of devices attached to a relatively small number of locally-networked host computers. I have found that the best combination based on AIR and AS3 is to offload as much of the ‘heavy lifting’ of the File System operations as possible to low-level code running as ‘close to the metal’ as I know how to handle.

My original desire – the reason that I have used AIR for the last decade – was always to develop applications such as these and have them run as nearly as possible exactly the same on my kids’ macOS machines as on the elder folks’ Windows boxes. With, however, Mr. Jobs having done all that he did to kill the use of the Adobe tools on his Apple devices, I have to say, up front, that I have had to abandon all macOS efforts. So, when I write about getting as close to the metal as I know how, I am talking only about Windows-based PCs. Just trying to work with the APIs provided there gives me headaches enough. Since I have never been able to get through the C++ learning curve, I am really just talking about the Windows C Runtime, and within that domain, about shared DLLs compiled and tested for 32-bit and 64-bit deployment.

So, the instructions on how to use what I have been using are, unfortunately, pretty messy. On the ActionScript side, I am completely devoted to FlashDevelop where, sadly, we may have seen the last release of that outstanding IDE. On the C side of the street, I have eschewed Visual Studio in favor of old-fashioned command-line tools like ‘vi’, ‘make’ and ‘ant’. [I should, however, pass along a plug for another open-source tool that could very well be used to do what I am doing. The FRESharp ANE GitHub project lets those of us who can’t handle C++ work with the very comprehensive C# framework. If I decide that C# does not drastically reduce the performance of my ‘backend’, I will change from using C with only command-line tools to using Visual Studio to manage C# projects.]

  3. The connective glue is the AIR Native Extension, not the NativeProcess means of communicating between the user-facing code and the file-facing code, and the main information I am passing along here relates to my use of threads to increase the throughput of the application. You likely know that AIR Workers provide a useful means of achieving some multi-tasking but, unfortunately, they do not permit a Worker to use an ANE. The creation and management of threads, therefore, has to be handled inside the ANE implementation – Windows [not POSIX] threads, in my non-C++ case.

I only have a modest i5 [6th-generation] quad-core [with no hyper-threading, therefore, maxing out at four threads at the hardware level] host on which I work, but I believe my findings would extrapolate naturally to Intel and AMD-based boxes with 6-12 thread possibilities. The first rule is that it never seems worthwhile to launch more than “One Less Than the Maximum” number of back-end tasks. If I try to run four parallel disk-search-intensive tasks, things break. If I run two or three parallel tasks, there is a nice, linear increase in throughput.

  4. So, with all that as background/context, here are some numbers for testing against only locally-connected hard drives. [The application eventually needs to work smoothly across local and remote hard drives, and I have done some preliminary tests showing that the software gets correct results from remote drives but, as you would expect, the bandwidth reduction from local attachment to LAN-cable or WiFi remote attachment has a decided impact on how many bits can be handled per unit of time.]

My D:, E: and F: drives are current-generation [Western Digital Black] one- and two-terabyte devices. [I also have volumes C:, G: and Z: which are mounted on SSD devices but, for now, there is still such a large price/performance differential between the two types of storage that the real target of the application is the high-volume, lower-performance HDD farm, not the SSD farm.]

My test case is to produce a list of all the directories and all the files on the D:, E: and F: volumes. [The output is written into .txt files which are themselves archived with a datetime stamp for later analysis.] For purer benchmarking purposes it would be good to have an automated launch of the three tasks, but since I am mostly trying to make sure the application works, and that I can see what breaks and how to fix it, I don’t have any such test harness. Rather, I am actually launching each VolumeList task using the U/I, which involves opening an Explorer-like popup window and allowing the user to select any directory as the root of his search/listing. Thus, there is some necessary time delay between the start of each task.

Here’s a table of results, with volumes listed in the order of launch:

Volume   Completion order   Lines of output   Elapsed seconds
D:       3                  424,614           120
E:       1                  526,914           67
F:       2                  120,973           70

While the test is running, I am looking at the Windows 10 Task Manager Performance tab, with its displays of overall CPU percent usage, scrolling graphs of the utilization of each of the four cores, and displays of percent busy for each disk drive. I am also getting heartbeat messages displayed in the U/I [an event dispatched for each 50,000 lines of output generated by each task]. The U/I output lets me see that whatever scheduling algorithm Windows is using, it seems to be giving each task a nice opportunity to get some concurrent work done – the event reports come back in no predictable order. The aggregate CPU utilization jumps up to about 30% when I start the search of drive D:, jumps up to about 65% when I add the search for drive E:, and pretty much maxes out when I add the third concurrent search.

My two primary conclusions are: a.) the total elapsed time to get about a million directories and files listed – approximately two minutes – is very much less than it was when using AIR’s File.getDirectoryListing(); b.) since the parallelization has been divided in such a manner that there is very little contention between the separate threads, the total elapsed time is very close [within about 5 percent] to the elapsed time of the longest-running single task.

The last point probably needs a little clarification. In the real use case – not this one – the user is free to select any directory anywhere as the root of a fully-recursive listing of all sub-directories and files. So, in a real use case, I have to expect that competing threads will sometimes be running against the same physical HDD. In that case, the read/write heads will be pulled this way and that in such a manner that I expect the benefits of concurrency may turn negative. I have yet to conduct those tests.

  5. So let me wrap up with a request for any comments that anyone may have from their similar work. Here’s a rerun of my test:

Volume   Completion order   Lines of output   Elapsed seconds
D:       1                  424,614           25
E:       3                  526,914           31
F:       2                  120,977           19

And here is what is interesting and perplexing. You can see that the time required for each volume listing is drastically reduced – reductions of roughly 55 to 80 percent. As a result of the reductions, the order of completion is somewhat modified. A further observation is that the elapsed time in relation to the number of lines of output behaves somewhat unexpectedly.

In this table:

Volume   Run 1 [µs/line]   Run 2 [µs/line]
D:       282               58
E:       127               58
F:       578               157

the values are the number of microseconds per line of output for the first run and for the second run. To which I need to add what I was seeing in the Task Manager display during each run. In the first run, as reported above, the CPU utilization jumped with each additional task and, as each task started, the disk utilization associated with each volume remained pretty steady at around 80-90 percent. In the second run the same jumps in aggregate CPU utilization were observed, but they were somewhat lower – each task adding about 25-30%, so that the max was in the range of 90%, not 100% as in the first run.

But here was the very unexpected result, and one for which I would be very interested in gaining a proper understanding. If the tests are run right after a system boot, you get the results of the first case; if, however, the tests are run a second time, you get the results of the second case, and in the second case Task Manager shows NO DISK UTILIZATION. The reason for the drastic reduction in elapsed time for the subsequent tests is that, by some miracle, the operating system is able to provide the list of files with no read access to the drives themselves.

I have never read anything indicating that the caching mechanisms built into an OS as a sensible performance enhancement are massive enough to retain in main memory a complete listing of the millions of files available across its disk drive farm. I would appreciate being pointed to any solid reading material concerning how Windows, macOS and the various Linux distros compare in this regard. Being able to access over a million lines of searchable text, representing the full path of each file on those three drives, in about half a minute is a real eye-opener to me, and will greatly facilitate developing the application’s ability to identify likely duplicates amongst those millions of possibilities.

That’s about a wrap. If you sense some excitement in this small achievement, you sense correctly. Having to go back and relearn the C coding that I first encountered on a DEC PDP-9 circa 1974 was more exasperating than I ought to admit, but the results make the pain go away. In the spirit of this forum, I hope that what we do with ActionScript may be as long-lived.


#6

ok that’s a lot of things to digest

so I hope the comment about “cheating” was seen as tongue in cheek and not as criticism;
this forum is mainly about building apps with AS3, commercial or not is not really an issue

so quickly: yes, operating systems have a cache mechanism for disk I/O

things are not exactly the same everywhere, but the principle and the goal are identical:
if you access something and then re-access it within a certain amount of time,
yes, the second access should be faster because of the cache

it depends also on the file system and which features it supports,
see Comparison of file systems,
for ex under Windows accessing a FAT32 or an NTFS drive will probably have different performance

and it depends on the hardware, for ex the Apple Fusion Drive that combines both SSD and HDD to speed up access

and it can also depend on other things like RAID,
see Selecting the Best RAID Level from Oracle
(not even getting into the difference in perf between software RAID and true hardware RAID with dedicated RAID controller cards)

anyway, here are a few articles that look into those disk caching mechanisms from different angles

the best read being On Random vs. Streaming I/O Performance; Or Seek(), and You Shall Find – Eventually, although very technical

and keep the Latency Numbers Every Programmer Should Know in mind

Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Credit
------
By Jeff Dean:               http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Now, the good and bad news is that at the programming level, with any language, there is not much you can do about the hardware or the OS drivers: you will basically get the speed you get.

That said, here are a couple of things to follow or try

  • keep as much as possible in memory
    if you generate a long list of all the files, do not write to an external file line by line;
    keep everything in memory and write it all to disk at the end

  • compare “list by breadth” and “list by depth”
    it is not the same sequential access
    see FileSystem.listByDepth() and FileSystem.listByBreadth()
    also Breadth is sometimes better than depth (a breadth-first sketch follows these bullets)

  • if you want to try multithreading
    do not do it per drive, but spread it over the top hierarchy of one drive
    eg. if in C:\ you have 5 top directories (list by breadth)
    you could spawn a worker for each one

  • for multithreading, if you want to use the max cores available
    usually the rule is max workers = number of cores + 1
    for info, AS3/AIR has an internal limit of 64 max concurrent workers
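
in AIR terms (redshell’s FileSystem.listByBreadth() is the same idea), breadth-first is just a FIFO queue instead of recursion; a rough sketch:

import flash.filesystem.File;

// breadth-first: process a whole directory level before going deeper
function listByBreadth( root:File ):Array
{
    var result:Array = [];
    var queue:Array = [ root ];

    while( queue.length > 0 )
    {
        var dir:File = queue.shift() as File; // FIFO: take from the front
        for each( var entry:File in dir.getDirectoryListing() )
        {
            result.push( entry.nativePath );
            if( entry.isDirectory )
            {
                queue.push( entry ); // visit it after the current level is done
            }
        }
    }
    return result;
}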

so a good formula imho would be (a code skeleton of it follows these lists)

  • based on the number of cores, let’s say 4
  • use a queue, keep it in memory
  • for N drives, list the top directories of each by breadth
  • add them to the queue
  • you will have something like
C:\one
C:\two
C:\three
C:\four
D:\alpha
D:\beta
D:\gamma
etc.
  • for each entry of the queue, spawn a worker till you reach the max workers (4)
  • when a worker ends, get the next line in the queue
  • rinse and repeat till the queue is empty
  • when all is done and you have all this data in memory
  • then you write the “file lists” on the hard drive

you can swap the “worker” part with

  • actually spawning a worker in AS3 with File.getDirectoryListing()
  • using File.getDirectoryListingAsync() (which should use a worker underneath to be async)
  • using NativeProcess (which is async by default and probably uses a worker underneath too)
  • listing via an ANE on another thread
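
and here the skeleton of that queue logic; startJob() and writeAllToDisk() are placeholders, to be filled with whichever of the options above you pick:

// 4 in this example; leave a core free for the UI if that works better for you
const MAX_WORKERS:int = 4;

var queue:Array = [ "C:\\one", "C:\\two", "C:\\three", "D:\\alpha", "D:\\beta" ];
var running:int = 0;
var results:Array = []; // keep everything in memory, write once at the end

// placeholder: spawn a worker, call getDirectoryListingAsync(),
// launch a NativeProcess, or call into an ANE, then call done( listing )
function startJob( path:String, done:Function ):void
{
    done( [] );
}

// placeholder: one single write of the whole in-memory result
function writeAllToDisk():void
{
    trace( results.length + " entries collected" );
}

function pump():void
{
    while( (running < MAX_WORKERS) && (queue.length > 0) )
    {
        running++;
        startJob( queue.shift() as String, onJobDone );
    }
}

function onJobDone( listing:Array ):void
{
    running--;
    results = results.concat( listing );

    if( queue.length > 0 ) { pump(); }            // rinse and repeat
    else if( running == 0 ) { writeAllToDisk(); } // queue empty, all jobs done
}

pump();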

Adobe AIR does not have an API to obtain the number of cores

  • in Redtamarin you can use HardwareInformation.processors
  • under Windows you can use echo %NUMBER_OF_PROCESSORS% with NativeProcess
  • under macOS you can use sysctl -n hw.ncpu with NativeProcess (both sketched just below)
  • or from within an ANE
    see win32-platform.h and mac-platform.h
    for VMPI_processorQtyAtBoot()
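
the NativeProcess variants would look something like this (executable paths assumed, adjust to your setup):

import flash.desktop.NativeProcess;
import flash.desktop.NativeProcessStartupInfo;
import flash.events.ProgressEvent;
import flash.filesystem.File;
import flash.system.Capabilities;

// ask the OS for its core count, Windows and macOS variants
function detectCores( onResult:Function ):void
{
    var isWindows:Boolean = Capabilities.os.indexOf( "Windows" ) > -1;

    var info:NativeProcessStartupInfo = new NativeProcessStartupInfo();
    var args:Vector.<String> = new Vector.<String>();

    if( isWindows )
    {
        info.executable = new File( "C:\\Windows\\System32\\cmd.exe" );
        args.push( "/C", "echo %NUMBER_OF_PROCESSORS%" );
    }
    else
    {
        info.executable = new File( "/usr/sbin/sysctl" );
        args.push( "-n", "hw.ncpu" );
    }
    info.arguments = args;

    var process:NativeProcess = new NativeProcess();
    process.addEventListener( ProgressEvent.STANDARD_OUTPUT_DATA, function( e:ProgressEvent ):void
    {
        var output:String = process.standardOutput.readUTFBytes( process.standardOutput.bytesAvailable );
        onResult( parseInt( output ) ); // e.g. 4
    } );
    process.start( info );
}

// usage
detectCores( function( cores:int ):void { trace( "cores: " + cores ); } );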

so I don’t have time to prototype it with a small app, but compared against your notes and tests it should help a little.