[comment {-*- text -*- doctools manpage}]
[vset VERSION 0.6]
[manpage_begin fileutil_traverse n [vset VERSION]]
[keywords {directory traversal}]
[keywords traversal]
[moddesc   {file utilities}]
[titledesc {Iterative directory traversal}]
[category  {Programming tools}]
[require Tcl 8.3]
[require fileutil::traverse [opt [vset VERSION]]]
[require fileutil]
[require control]
[description]
[para]

This package provides objects for the programmable traversal of
directory hierarchies.

The main command exported by the package is:

[list_begin definitions]

[call [cmd ::fileutil::traverse] [opt [arg objectName]] \
      [arg path] [opt "[arg option] [arg value]..."]]

The command creates a new traversal object with an associated global
Tcl command whose name is [arg objectName]. This command may be used
to invoke various operations on the traverser.

If the string [const %AUTO%] is used as the [arg objectName] then a
unique name will be generated by the package itself.

[para]

The recognized options are described in section [sectref OPTIONS].
Note that all of these options can be set only when the traversal
object is created. Changing them later is not possible, and any
attempt to do so throws an error.

[para]

The object command has the following general form:

[list_begin definitions]
[call [cmd \$traverser] [method command] [opt [arg "arg arg ..."]]]

[arg Command] and its [arg arg]uments determine the exact behavior of
the object.

[list_end]
[list_end]

The following commands are possible for traversal objects:

[list_begin definitions]

[call [cmd \$traverser] [method files]]

This is the highest-level method provided by traversal objects. When
invoked it returns a list containing the names of all files and
directories matching the current configuration of the traverser.
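
[para] As a minimal sketch, assuming Tcllib is installed, the whole
hierarchy below the current working directory can be collected in one
call. The object name [const T] and the choice of base directory are
illustrative only. The [method destroy] call is an assumption about the
object system behind the traverser and is not described in this manpage.

[example {
    package require fileutil::traverse

    # Create a traverser rooted at the current working directory.
    fileutil::traverse T [pwd]

    # Collect every matching path into a list.
    set paths [T files]

    # Assumption: the object supports 'destroy' for cleanup.
    T destroy

    puts "[llength $paths] paths found"
}]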

[call [cmd \$traverser] [method foreach] [arg filevar] [arg script]]

The high-level [method files] method (see above) is based on this
mid-level method. When invoked it finds all files and directories
matching the current configuration and executes the [arg script]
for each path. The path currently under consideration is stored in the
variable named by [arg filevar]. The variable lives, and the script is
executed, in the context of the caller of the method. In the method
[method files] the script simply appends each path found to the list
to return.
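
[para] The following sketch, under the same assumptions as before
(Tcllib installed, illustrative object name [const T], the current
working directory as base), uses [method foreach] to count directories
and other paths without building an intermediate list. Because the
script runs in the caller's context it can update the caller's
variables directly.

[example {
    package require fileutil::traverse

    fileutil::traverse T [pwd]

    set ndirs  0
    set nother 0

    # The script body sees and modifies the variables of this scope.
    T foreach p {
        if {[file isdirectory $p]} {
            incr ndirs
        } else {
            incr nother
        }
    }

    # Assumption: the object supports 'destroy' for cleanup.
    T destroy

    puts "$ndirs directories, $nother other paths"
}]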

[call [cmd \$traverser] [method next] [arg filevar]]

This is the lowest-level interface to the traverser, the core on which
all higher-level methods are built. When invoked it returns a boolean
value indicating whether it found a path matching the current
configuration ([const True]), or not ([const False]). If a path was
found it is stored into the variable named by [arg filevar], in the
context of the caller.

[para] The [method foreach] method simply calls this method in a loop
until it returns [const False]. This method is exposed so that a
directory hierarchy can also be traversed incrementally, in an
event-based manner.

[para] Note that the traverser does follow symbolic links, except when
doing so would cause it to enter a link-cycle. In other words, the
command takes care to [emph not] lose itself in infinite loops upon
encountering circular link structures. Note that even links which are
not followed will still appear in the result.
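
[para] A sketch of such an incremental, event-based traversal is shown
below. It handles one path per event-loop iteration, so that other
events can be serviced in between. The helper procedure [cmd step],
the variable [var done], and the object name [const T] are hypothetical
names chosen for this illustration.

[example {
    package require fileutil::traverse

    fileutil::traverse T [pwd]

    # Hypothetical helper: fetch one path, then reschedule itself.
    proc step {} {
        if {[T next path]} {
            puts $path
            after 0 step
        } else {
            # Traversal finished. Clean up and release the vwait below.
            # Assumption: the object supports 'destroy'.
            T destroy
            set ::done 1
        }
    }

    after 0 step
    vwait done
}]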

[list_end]

[section OPTIONS]

[list_begin options]
[opt_def -prefilter command_prefix]

This callback is executed for directories. Its result determines if
the traverser recurses into the directory or not. The default is to
always recurse into all directories. The callback is invoked with a
single argument, the [emph absolute] path of the directory, and has to
return a boolean value, [const True] when the directory passes the
filter, and [const False] if not.
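
[para] As an illustration, the sketch below keeps the traverser out of
directories named [file .git]. The procedure name [cmd skipVcs] and the
directory name are arbitrary choices for this example. Since the
prefilter only controls recursion, the directory itself can still show
up as a result. Rejecting it as well is the job of [option -filter].

[example {
    package require fileutil::traverse

    # Hypothetical prefilter: false for directories not to descend into.
    proc skipVcs {dir} {
        expr {![string equal [file tail $dir] .git]}
    }

    fileutil::traverse T [pwd] -prefilter skipVcs
    set paths [T files]

    # Assumption: the object supports 'destroy' for cleanup.
    T destroy
}]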

[opt_def -filter command_prefix]

This callback is executed for all paths. Its result determines if the
current path is a valid result, and returned by [method next]. The
default is to accept all paths as valid. The callback is invoked with
a single argument, the [emph absolute] path to check, and has to
return a boolean value, [const True] when the path passes the filter,
and [const False] if not.
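
[para] For example, a filter along the following lines restricts the
results to regular files with the extension [file .tcl], while the
traverser still recurses into all directories. The procedure name
[cmd isTclFile] and the object name [const T] are hypothetical.

[example {
    package require fileutil::traverse

    # Hypothetical filter: accept only regular files ending in .tcl
    proc isTclFile {path} {
        expr {[file isfile $path] &&
              [string equal [file extension $path] .tcl]}
    }

    fileutil::traverse T [pwd] -filter isTclFile
    set tclFiles [T files]

    # Assumption: the object supports 'destroy' for cleanup.
    T destroy
}]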

[opt_def -errorcmd command_prefix]

This callback is executed for all paths the traverser has trouble
with, for example when it is unable to change into them or to get
their status. The default is to ignore any such problems. The callback
is invoked with two arguments, the [emph absolute] path for which the
error occurred, and the error message. Errors thrown by the filter
callbacks are handled through this callback too. Errors thrown by the
error callback itself are not caught and ignored, but allowed to pass
to the caller, i.e. whoever invoked [method next]. Any other
results from the callback are ignored.
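
[para] A possible use, sketched under the assumption that problems
should be collected and reported after the traversal instead of being
silently dropped, is shown below. The procedure [cmd noteProblem] and
the variable [var problems] are hypothetical names.

[example {
    package require fileutil::traverse

    set problems {}

    # Hypothetical error callback: record the path and the message.
    proc noteProblem {path msg} {
        global problems
        lappend problems [list $path $msg]
    }

    fileutil::traverse T [pwd] -errorcmd noteProblem
    set paths [T files]

    # Assumption: the object supports 'destroy' for cleanup.
    T destroy

    foreach item $problems {
        puts stderr "skipped [lindex $item 0]: [lindex $item 1]"
    }
}]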

[list_end]


[section {Warnings and Incompatibilities}]

[list_begin definitions]

[def [const 0.4.4]]
In this version the traverser's broken system for handling symlinks
was replaced with one working correctly and properly enumerating all
the legal non-cyclic paths under a base directory.

[para] While correct this means that certain pathological directory
hierarchies with cross-linked sym-links will now take about O(n**2)
time to enumerate whereas the original broken code managed O(n) due to
its brokenness.

[para] A concrete example and extreme case is the [file /sys]
hierarchy under Linux where some hundred devices exist under both
[file /sys/devices] and [file /sys/class] with the two sub-hierarchies
linking to the other, generating millions of legal paths to enumerate.
The structure, reduced to three devices, roughly looks like

[include include/cross-index.inc]

[para] When having to handle such a pathological hierarchy it is
recommended to use the [option -prefilter] option to prevent the
traverser from following symbolic links, like so:

[include include/cross-index-trav.inc]

[list_end]

[vset CATEGORY fileutil]
[include ../doctools2base/include/feedback.inc]
[manpage_end]