[SOLVED] Command to keep track of folder content

DynV

There's a program whose output is only in files. The directory structure is: inside the main output folder there's a subdirectory which contains incomplete files; when a file is complete it is moved up (i.e. into the main output folder). Rarely a file will be left behind in the subfolder, but a file can never appear in the main output folder without having been in the subdirectory first. Often a file will be deleted directly from the subdirectory (without being moved to the main output folder); there's no telling how many times a particular file name has gone through this (e.g. about 15% of the files created have the same file name as another). Here's an example of the output in chronological order:
Code:
1 MAIN
    SUBDIRECTORY
        1.abc
    [no file]
2 MAIN
    SUBDIRECTORY
        2.abc
    1.abc
3 MAIN
    SUBDIRECTORY
        3.abc
    2.abc
    1.abc
4 MAIN
    SUBDIRECTORY
        1.abc
    3.abc
    2.abc
    1.abc
5 MAIN
    SUBDIRECTORY
        4.abc
    3.abc
    2.abc
    1.abc
6 MAIN
    SUBDIRECTORY
        5.abc
        4.abc
    3.abc
    2.abc
    1.abc
7 MAIN
    SUBDIRECTORY
        3.abc
        4.abc
    5.abc
    3.abc
    2.abc
    1.abc
8 MAIN
    SUBDIRECTORY
        5.abc
        4.abc
    5.abc
    3.abc
    2.abc
    1.abc
As you can see in the last (8th) case, the same situation can arise again, but unlike the earlier identical-looking case (the 6th), the file that was not left behind (5.abc) now has a newer creation date; a reminder that only the subdirectory activity should be analysed. So when file information is collected, the creation date of the files should be included and compared to see which file is actually being worked on. In the example above, the only file that should have more than one entry is 5.abc, with the two entries having different timestamps and very likely different file sizes.

I'd like to have a report of the ongoing activity in the subdirectory. It should use few resources, and probing it a couple of times to a few times a minute is sufficient. How can I have that done?

Thank you kindly
 
What OS? That's something you can do with a script.

If Windows, a script can capture the output of the tree or dir command to a text file. The next time it runs, it does the same to a new file, and then the script compares the two. You can use the fc command or any third-party tool to check the differences.

I think WinMerge would be a decent app for making a report on the differences.
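For what it's worth, a rough PowerShell take on that same snapshot-and-compare idea might look like this; the folder and snapshot paths below are placeholders, not anything from this thread.
Code:
# Snapshot-and-compare sketch (assumed paths; adjust to your layout).
$sub = 'C:\output\sub'              # hypothetical subdirectory path
$old = 'C:\temp\snapshot-old.txt'
$new = 'C:\temp\snapshot-new.txt'

# Capture the current listing (name + creation time, since names repeat).
Get-ChildItem -Path $sub -File |
    Select-Object Name, CreationTime, Length |
    Out-File $new

# Compare with the previous listing, if one exists.
if (Test-Path $old) {
    Compare-Object (Get-Content $old) (Get-Content $new)
}

# Keep the new listing as the baseline for the next run.
Move-Item -Force $new $old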
 
I thought I had posted in the Windows 10 forum... anyway, that's the OS. The software is constantly running and files keep getting created in the subdirectory. So if it's a script, it either has to be in a loop with a wait, or be run by the Windows equivalent of cron.
 
Thank you for the suggestion, but if I'm understanding it right, that gives live information; I don't need that, I need something per shift or per day. My ultimate goal is to find out what portion of files was not moved to the main directory, either by remaining in the subdirectory or by being deleted from it (likely because a file with that name had already been moved up to the main directory), along with the top such file names, e.g.: 22% not moved in the last X hours, top 10 file names not moved: 6.abc, 15.abc, 38.abc, ...
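Purely as an illustration, one way such a report could be computed in PowerShell, assuming some probing job has been appending rows with Folder ('SUB' or 'MAIN'), Name and CreationTime columns to a CSV log; the log format and paths here are all assumptions, not anything already in place.
Code:
# Rough report sketch. Assumes a probe log CSV where every row records
# Folder ('SUB' or 'MAIN'), Name and CreationTime for each file seen.
$log = Import-Csv 'C:\logs\probe-log.csv'             # hypothetical log file

# A unique file instance = (Name, CreationTime), since names repeat.
$subSeen  = $log | Where-Object Folder -eq 'SUB'  | Sort-Object Name, CreationTime -Unique
$mainSeen = $log | Where-Object Folder -eq 'MAIN' | Sort-Object Name, CreationTime -Unique |
            ForEach-Object { "$($_.Name)|$($_.CreationTime)" }

# Instances seen in SUB that were never observed in MAIN count as "not moved".
$notMoved = $subSeen | Where-Object { $mainSeen -notcontains "$($_.Name)|$($_.CreationTime)" }

"{0}% not moved" -f [math]::Round(100 * @($notMoved).Count / @($subSeen).Count, 1)
$notMoved | Group-Object Name | Sort-Object Count -Descending |
    Select-Object -First 10 Name, Count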
 
You can use Task Scheduler to launch the cmdlet or script automatically at the desired times.

Or simply set up a Desktop shortcut to do so on an ad hoc basis.
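For example, a minimal sketch of registering such a task from PowerShell; the task name, trigger and script path are all placeholders.
Code:
# Register a Task Scheduler job that starts a probe script at every logon.
# Task name and script path are placeholders.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
           -Argument '-NoProfile -File C:\scripts\probe-subdir.ps1'
$trigger = New-ScheduledTaskTrigger -AtLogOn
Register-ScheduledTask -TaskName 'SubdirProbe' -Action $action -Trigger $trigger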

= = = =

So not real time. Just per shift or end of day snapshot of where files are located.

Then determine which files have not moved as expected or required within X hours or by other measurable means.

= = = =

Requirements and logic are now becoming a bit more complicated...

Files not moved, time in folders, etc... All need some sort of "comparison value" to determine if the file has not moved or has been too long somewhere.

Time stamps.

PowerShell could "Get" the data, then save it to a file (CSV, TXT, XML), which in turn could be opened in Excel/Access to present the filtered and sorted data as desired.
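A minimal sketch of that Get-then-save step, with made-up paths for the subdirectory and the CSV log:
Code:
# One probe: append a timestamped snapshot of the subdirectory to a CSV log.
# Paths are placeholders.
$sub = 'C:\output\sub'
$log = 'C:\logs\subdir-activity.csv'

Get-ChildItem -Path $sub -File |
    Select-Object @{n='Probed'; e={Get-Date}}, Name, CreationTime, Length |
    Export-Csv -Path $log -Append -NoTypeInformation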

Likely that some basic script will accomplish 80% of what is necessary provided that the end user is focused on determining potential issues or problems with the files.

Automating the last 20% of the requirements to flag those potential issues or problems in some manner - that is much more of a challenge.

But some of that work has been done.

Reference:

https://dotnet-helpers.com/powershell/how-to-monitor-a-folder-changes-using-powershell/

https://devblogs.microsoft.com/scripting/use-powershell-to-monitor-for-the-creation-of-new-files/

You can easily google for more information. Search criteria = "using powershell to track files"

Then vary and revise the search criteria as you learn and filter the results more to your requirements and environment.
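The linked articles cover folder monitoring from PowerShell; one common pattern they point towards is .NET's FileSystemWatcher. A bare-bones sketch of that pattern, with placeholder paths:
Code:
# Event-driven alternative: log creation events in the subdirectory as they happen.
# The watched path and log path are placeholders.
$watcher = New-Object System.IO.FileSystemWatcher 'C:\output\sub'
$watcher.EnableRaisingEvents = $true

Register-ObjectEvent -InputObject $watcher -EventName Created -Action {
    # $Event is supplied automatically inside the -Action script block.
    "$(Get-Date -Format o)  created  $($Event.SourceEventArgs.Name)" |
        Add-Content 'C:\logs\watcher.log'
}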
 
Thank you for your input. I'm assuming you understood the report I'm after, but in case you missed it: I'd like the subdirectory to be probed 2-4 times a minute and, from that probing, a report made every few hours (or on demand, with the option of specifying the start & end times, the end defaulting to the current time).
 
Probing 2-4 times a minute and capturing the results of those probes is, in my mind, a real-time process rather than just a snapshot.

Using an "average" of 3 times per minute, your requirements amount to a data sweep every 20 seconds.

As to how much information needs to be captured with each sweep I can only speculate but that speculation concludes "a lot".

And with 15% of the files having the same name you will need to have some form of unique identification for each file added in.

Not to mention storing the probe results and subsequently summing everything up into some sort of report.

Whether the report is simply presented visually on a monitor (a dashboard?) and/or printed out.

Either on a schedule and/or ad hoc.

The requirements are growing both in number and scope.

You are getting, to some degree, into data analysis, which may or may not be a completely accurate assessment.

Real time file tracking comes to mind also.

= = = =

What are the first indications that some file has fallen through the cracks, been stuck too long somewhere, been moved to the wrong subfolder, etc.?

Focus on some simple cmdlets or scripts to identify those occurrences.

Work out the logic and algorithms that will find and flag those files. Or at least the majority of them.

First manually and then later automate when viable.

And it does not need to be PowerShell. Perhaps Python, or some other language.
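For instance, a small sketch of one such check, flagging files that have sat in the subdirectory longer than some cut-off; the path and the 4-hour threshold are assumptions.
Code:
# Flag files that have sat in the subdirectory longer than a threshold.
# Path and the 4-hour cut-off are assumptions.
$sub       = 'C:\output\sub'
$threshold = New-TimeSpan -Hours 4

Get-ChildItem -Path $sub -File |
    Where-Object { ((Get-Date) - $_.CreationTime) -gt $threshold } |
    Select-Object Name, CreationTime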
 
Solution
Thank you for your input. I'm assuming you understood the report I'm after, but in case you missed it: I'd like the subdirectory to be probed 2-4 times a minute and, from that probing, a report made every few hours (or on demand, with the option of specifying the start & end times, the end defaulting to the current time).
You can do all that in PowerShell, which is built into Windows 10. It runs in a console-like text window, and the script has control over where the output goes and what colour the text is, so you could create a script that does a pass over the files and directories and produces a nice little summary in the console window, then waits for 10 seconds and does it again. Once every few hours it could pump out a summary to a text or HTML file, and you could create those files with unique date-stamped names so that you have a record of what was going on over time.

Since the directory is being actively updated by whatever your process is anyway, it'll be cached in main memory and running a PowerShell script to scan all the folders should be extremely low overhead.
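A skeleton of that kind of loop, purely as a sketch; the paths, the 20-second probe interval and the 4-hour report cadence are arbitrary placeholders.
Code:
# Skeleton loop: probe the subdirectory every 20 seconds and write a
# date-stamped CSV report every 4 hours. Paths and intervals are assumptions.
$sub        = 'C:\output\sub'
$reportDir  = 'C:\logs'
$nextReport = (Get-Date).AddHours(4)
$samples    = @()

while ($true) {
    # One probe: record what is currently sitting in the subdirectory.
    $probe = Get-ChildItem -Path $sub -File |
        Select-Object @{n='Probed'; e={Get-Date}}, Name, CreationTime, Length
    $samples += $probe

    Write-Host ("{0}  {1} file(s) in subdirectory" -f (Get-Date), @($probe).Count)

    # Every few hours, dump the accumulated samples to a date-stamped file.
    if ((Get-Date) -ge $nextReport) {
        $stamp = Get-Date -Format 'yyyyMMdd-HHmmss'
        $samples | Export-Csv -Path (Join-Path $reportDir "subdir-report-$stamp.csv") -NoTypeInformation
        $samples    = @()
        $nextReport = (Get-Date).AddHours(4)
    }

    Start-Sleep -Seconds 20
}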
 