------------------------------------------------------------------------
Description of the new macros to control feature exclusion and stack
handling
------------------------------------------------------------------------

All the macros reside in "generic/tclInt.h" and can be set in the build
environment. In particular, the macros controlling stack usage are set
up so that a value defined in the build environment takes priority over
the value defined in the header.

Feature exclusion. Simply define any of the macros below to exclude the
associated feature of the core.

    TCL_NO_SOCKETS          /* Disable "tcp" channel driver */
    TCL_NO_TTY              /* Disable "tty" channel driver */
    TCL_NO_PIPES            /* Disable "pipe" channel driver */
    TCL_NO_PIDCMD           /* Disable "pid" command */
    TCL_NO_NONSTDCHAN       /* Disable creation of channels beyond std* */
    TCL_NO_CHANNELCOPY      /* Disable channel copying, C/Tcl [fcopy] */
    TCL_NO_CHANNEL_READ     /* Disable Tcl_ReadChars, [read] */
    TCL_NO_CHANNEL_EOF      /* Disable [eof] */
    TCL_NO_CHANNEL_CONFIG   /* Disable [fconfigure] and Tcl_GetChannelOption */
    TCL_NO_CHANNEL_BLOCKED  /* Disable [fblocked] */
    TCL_NO_FILEEVENTS       /* Disable [fileevent] and underlying APIs */
    TCL_NO_FILESYSTEM       /* Disable everything related to the filesystem */
    TCL_NO_LOADCMD          /* Disable [load] and machinery below */
    TCL_NO_SLAVEINTERP      /* No slave interp's */
    TCL_NO_CMDALIASES       /* No command aliases */

    MODULAR_TCL             /* All of the above */

Controlling the stack. Define TCL_STRUCT_ON_HEAP to switch a number of
structures to allocation off the heap. The other macros are numeric and
define how many variables of a kind are placed on the stack by the
functions using the macros.

      TCL_STRUCT_ON_HEAP                    /* Allocate temp. big structures off the heap */

    * TCL_FMT_STATIC_FLOATBUFFER_SZ     320 /* size of various information placed */
      TCL_FMT_STATIC_VALIDATE_LIST       16 /* on the stack */
    * TCL_FOREACH_STATIC_ARGS             9
    * TCL_FOREACH_STATIC_LIST_SZ          4
      TCL_FOREACH_STATIC_VARLIST_SZ       5
    * TCL_RESULT_APPEND_STATIC_LIST_SZ   16
      TCL_MERGE_STATIC_LIST_SZ           20
    * TCL_PROC_STATIC_CLOCALS            20
      TCL_PROC_STATIC_ARGS               20
      TCL_INVOKE_STATIC_ARGS             20
      TCL_EVAL_STATIC_VARCHARS           30
      TCL_STATS_COUNTERS                 10
      TCL_LSORT_STATIC_MERGE_BUCKETS     30
    * TCL_DSTRING_STATIC_SIZE           200 /* Exception: Resides in "tcl.h" */

Only the macros marked by '*' have been tested so far (-Dxxx=1). This
means that usage of the other macros may result in a crash
(FLOATBUFFER... for example did for a while).

It is advisable to use "-O" when compiling the core so that the compiler
optimizes the allocation of local variables on the stack, i.e.
collapsing variables with non-overlapping lifetimes into one memory
location.

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Scratchpad

Everything below may change at will.

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Pre-notes

The cutting of the channel system is not as clean as I would like it to
be, simply because Cisco has the special need of a channel system
trimmed down to the std* channels, without complete removal. I am not
sure that I have removed the maximum amount of C APIs and functions
possible for this specific configuration.

A first step in rationalizing this section would be NO_CHANNELS to
remove the I/O system completely, and then NO_NONSTDCHAN for minimal
exposure of channels. NO_FILEEVENTS is orthogonal to NO_NONSTDCHAN.
Drivers are possible only if NO_CHANNELS is not set, but can be disabled
separately. The standard channels need the "file" driver (currently not
disable-able); #ifdef's should be used to ensure integrity.

=> Would be interesting to have a configuration tool which is able to
   express and enforce these constraints.

=> The domain specific language CML2 (Eric Raymond, written in Python)
   was developed for configuring the Linux kernel.

!  Investigate possible usage of SourceNavigator as the basis for
   parsing the Tcl core. Use custom tools to follow dependencies
   between structures and functions. (What-If tools: What if I exclude
   this function/struct, what else can be removed, or what requires
   it?) Also: What are the leaf functions in the system ...

!  Mapping help: Associate functions with functional areas and see how
   the areas relate, how much can be removed whenever an area is
   excluded ...

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Shrinking the core.

    Filesystem

    Shrinking the usage of stack
        Large static arrays on the stack
            Look for #define's, check usage,
            create #defines if necessary
        DString !! (initial dstring data in structure!)
        RE's ?
        NRE1

    == running a stack test of the full test suite for a build is 1.5 hours ==
    == something for the evening and the night ==

    Document methodology of testing stack

    Macros TCL_NO_ to deactivate/cut feature
    MODULAR_TCL activates all TCL_NO_ macros

------------------------------------------------------------------------
Cut 1

The cut currently restricts itself to the UNIX and GENERIC parts. No
changes in Win* and Mac areas.

channel system
    - no sockets                  TCL_NO_SOCKETS               /
    - no serial/tty               TCL_NO_TTY                   /
    - no pipes                    TCL_NO_PIPES                 /
    - no pid command              TCL_NO_PIDCMD                /
    - channel system provides     TCL_NO_NONSTDCHAN            /
      only std* channels [x]
    - no channel copying          TCL_NO_CHANNELCOPY           /
    - no [read]ing                TCL_NO_CHANNEL_READ          /
    - no [eof] [/]                TCL_NO_CHANNEL_EOF           /
    - no channel set/get cfg      TCL_NO_CHANNEL_CONFIG [+]    /
    - no [fblocked]               TCL_NO_CHANNEL_BLOCKED [/]   /
    - no fileevents               TCL_NO_FILEEVENTS [=]        /

filesystem
    - disable filesystem          TCL_NO_FILESYSTEM [%]        /*
    - disable load'ing            TCL_NO_LOADCMD               /

master/slave interpreters
    - disable slave interp        TCL_NO_SLAVEINTERP           /*
    - disable command aliases     TCL_NO_CMDALIASES            /*

[*] Access from the C level is not removed.

[x] Implies that no .rc can be read during unix init.
    Implies that no startup script can be read by tclsh.
    Implies NO_SOCKETS, NO_TTY, NO_PIPES.
    Currently implies 'no "source" cmd' and no loading of encoding
    files. In the generic case this functionality can be reimplemented
    by direct OS calls without using the channel system. That makes the
    implementation platform dependent. As Cisco doesn't want this
    functionality we disable it without adding a new implementation.
    Implies that channels cannot be moved/shared between master/slave
    interps.
    (seek is removed under the assumption that the std* channels are
    not seekable)

[/] Tcl_Eof, Tcl_InputBlocked stay because they are required by [gets].

[+] Tcl_SetChannelOption stays, required for initial config of std
    channels.

[=] Implies no socket servers. Reason: The accept callback for a socket
    server is done through fileevents.

[%] Ripping out the filesystem intrudes heavily on the startup sequence
    of the interpreter as auto_path, package paths, etc. can't be
    initialized anymore. This also cuts into the initialization of
    encodings. Given that encodings will be changed later to not use
    UTF internally this is no big deal. For Cisco. Others might want to
    have 'no fs', but UTF. We have to check that the startup sequence
    is still operational.
    Given that loading of encodings from files is impossible without a
    FS, the loss of initialization is again not so big a deal.

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Handling of stub table when cutting features:

1. Disable all functions for the feature, from the bottom up to the top
   (script level command). This includes full disabling of stub
   functions too.

   The bottom-up approach enforces link errors in the higher levels and
   thus allows us to use the compiler to find all relevant places where
   we have to cut. Cutting stub functions is essential to find
   everything.

2. Go through the functions causing link errors in tclStubInit.o
   == stub functions. Add variants which are empty, return errors etc.
   and compile these when the feature is disabled.

   ** Changed **
   Add suppressor definitions to "tcl*.decls" and regen the code.

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Future: Implement a mechanism for 'tcl.decls' which allows the
definition of (static, loadable) sub packages, so that the stub table
is minimally initialized and sub packages initialize their slots when
loaded.

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
Cut 2, Working on the stacksize.

* Spliced the NRE1 engine by Miguel Sofer into the core. Stack testing
  the testsuite shows an average saving of 4 K stack space.

* Reducing the number of characters directly stored in a DString
  structure from 200 to 1.
  Average savings when going through testsuite:

* Going through #defines in headers and sources to identify more
  locations placing data on the stack.

New controlling macros.

#define TCL_NO_RECURSE /* enables the NRE modifications */
        set by default

Tcl_ExternalToUtfDString is in trouble for TCL_DSTRING_STATIC_SIZE=1.
I guess TDSS < UTF_MAX is trouble because the function does not check
before attempting the first conversion.

... Ok, SZ=25 is ok for the testsuite. This doesn't mean that it is ok
in real life, but encoding is cut off for Cisco, so we can ignore this
here. Keep in mind for later.

    1,4,5,10,17 fail
    21,25       ok

#define TCL_FMT_STATIC_FLOATBUFFER_SZ 320
#define TCL_FMT_STATIC_VALIDATE_LIST 16
#define TCL_FOREACH_STATIC_ARGS 9
#define TCL_FOREACH_STATIC_LIST_SZ 4
#define TCL_FOREACH_STATIC_VARLIST_SZ 5
#define TCL_RESULT_APPEND_STATIC_LIST_SZ 16
#define TCL_MERGE_STATIC_LIST_SZ 20
#define TCL_PROC_STATIC_CLOCALS 20
#define TCL_PROC_STATIC_ARGS 20
#define TCL_INVOKE_STATIC_ARGS 20
#define TCL_EVAL_STATIC_VARCHARS 30
#define TCL_STATS_COUNTERS 10
#define TCL_LSORT_STATIC_MERGE_BUCKETS 30

-DTCL_FMT_STATIC_FLOATBUFFER_SZ=320
-DTCL_FMT_STATIC_VALIDATE_LIST=16
-DTCL_FOREACH_STATIC_ARGS=9
-DTCL_FOREACH_STATIC_LIST_SZ=4
-DTCL_FOREACH_STATIC_VARLIST_SZ=5
-DTCL_RESULT_APPEND_STATIC_LIST_SZ=16
-DTCL_MERGE_STATIC_LIST_SZ=20
-DTCL_PROC_STATIC_CLOCALS=20
-DTCL_PROC_STATIC_ARGS=20
-DTCL_INVOKE_STATIC_ARGS=20
-DTCL_EVAL_STATIC_VARCHARS=30
-DTCL_STATS_COUNTERS=10
-DTCL_LSORT_STATIC_MERGE_BUCKETS=30

cut_dstring ...
-DTCL_FMT_STATIC_FLOATBUFFER_SZ=0
-DTCL_FMT_STATIC_VALIDATE_LIST=0
-DTCL_FOREACH_STATIC_ARGS=0
-DTCL_FOREACH_STATIC_LIST_SZ=0
-DTCL_FOREACH_STATIC_VARLIST_SZ=0
-DTCL_RESULT_APPEND_STATIC_LIST_SZ=0
-DTCL_MERGE_STATIC_LIST_SZ=0
-DTCL_PROC_STATIC_CLOCALS=0
-DTCL_PROC_STATIC_ARGS=0
-DTCL_INVOKE_STATIC_ARGS=0
-DTCL_EVAL_STATIC_VARCHARS=0
-DTCL_STATS_COUNTERS=0
-DTCL_LSORT_STATIC_MERGE_BUCKETS=0

------------------------------------------------------
General look through the code for static buffers on the stack.

tclAlloc    /ok
tclAsync    /ok
tclBasic    Tcl_CallWhenDeleted   32+INT_SPACE
            Tcl_ExprString        TCL_DOUBLE_SPACE
tclBinary   /ok
tclClock    /ok
tclCmdAH    StoreStatData         TCL_INTEGER_SPACE
            Tcl_FormatObjCmd

...(Obj)Cmd functions often hold quite a lot of state in local
variables.

For exact measurements we have to instrument the C code with additional
(macroized) function calls to record exact sizes for every invoked C
function. Automatic instrumentation is difficult. Could instrument the
dispatchers first (where commands are invoked) to get stack sizes for
bigger blocks of execution (command + utility functionality called by
it).

-----------------------------------------------------------------------------------------------
A big structure is 'CompileEnv'. Instead of trying to reduce its size
it might be better to allocate the whole structure off the heap.

#define TCL_COMPENV_ON_HEAP /* Allocate temp. CompileEnv structs off the heap */

Stack measure
@ TclSetByteCodeFromAny ../../src/tcl834_stkr/unix/../generic/tclCompile.c 300 = 2036
@ TclCompileByteCodesForExpr ../../src/tcl834_stkr/unix/../generic/tclExecute.c 6022 = 2008

On Heap
@ TclSetByteCodeFromAny ../../src/tcl834_stkr/unix/../generic/tclCompile.c 300 = 100
@ TclCompileByteCodesForExpr ../../src/tcl834_stkr/unix/../generic/tclExecute.c 6022 = 68

-----------------------------------------------------------------------------------------------
Ditto Tcl_Parse

@ TclCompileSetCmd ../../src/tcl834_stkr/unix/../generic/tclCompCmds.c 1618 = 640
@ TclCompileIncrCmd ../../src/tcl834_stkr/unix/../generic/tclCompCmds.c 1356 = 636

On Heap
@ TclCompileSetCmd ../../src/tcl834_stkr/unix/../generic/tclCompCmds.c 1619 = 268
@ TclCompileIncrCmd ../../src/tcl834_stkr/unix/../generic/tclCompCmds.c 1356 = 264

-----------------------------------------------------------------------------------------------
TclObjInterpProc 832, 792, 752
    => STATIC_CLOCALS 20 ==> 716 bytes accounted for.
    => STATIC_CLOCALS is of help and changing it does not crash the
       interp.

TclInvokeStringCommand
    TCL_INVOKE_STATIC_ARGS => 20 x char* = 80

TclExecuteByte
    Uses 868 bytes of stack. Where?
    .... The compiler places all local variables immediately on the
    stack, independent of where they are defined (i.e. even variables
    declared in sub scopes are placed immediately).

    868 -> /4 about 217 variables
    ... Yes, that is on the order of the number of variables declared
    in this behemoth.

    Why are variables with non-intersecting lifetimes not collapsed
    into one memory location?
    ... Ok, compilation was just -g, without any optimizations ...

    Compile -g -O  => 480 bytes stack
    Compile -g -O2 => 460 bytes stack

!   Ok, compiling the whole instrumented core with -g -O to get
    standard stack usage numbers.
    => Have to compile the baseline with that as well.

    Also look for variable declarations hidden in internal blocks. ...
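
The "macroized" instrumentation mentioned above could look roughly like
the sketch below. It is only an illustration under stated assumptions:
the names StkDistance, STK_BASE, STK_MEASURE and stk_base are
hypothetical and not part of the Tcl sources, and the method that
produced the "@ function file line = bytes" numbers in these notes is
not documented here. The sketch reports the stack depth relative to a
mark recorded in a dispatcher, not the size of a single frame, and it
relies on comparing addresses of locals as integers, which works on
common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* All names below are hypothetical; they do not exist in the Tcl core. */

/* Address recorded at the reference point (e.g. a command dispatcher). */
static uintptr_t stk_base = 0;

/* Distance in bytes between two stack addresses, independent of the
 * direction in which the stack grows on the platform. */
static size_t
StkDistance(uintptr_t base, uintptr_t here)
{
    return (size_t) (base > here ? base - here : here - base);
}

/* Record the reference point. Call once near the top of the call chain. */
#define STK_BASE() \
    do { \
        volatile char stkMark_; \
        stk_base = (uintptr_t) &stkMark_; \
    } while (0)

/* Print the stack depth reached at this point, relative to the base,
 * in the "@ function file line = bytes" form. The value is approximate:
 * the compiler decides where the marker lives inside the frame. */
#define STK_MEASURE() \
    do { \
        volatile char stkMark_; \
        assert(stk_base != 0); /* STK_BASE() must have run first */ \
        fprintf(stderr, "@ %s %s %d = %lu\n", __func__, __FILE__, __LINE__, \
                (unsigned long) StkDistance(stk_base, (uintptr_t) &stkMark_)); \
    } while (0)
```

Under these assumptions STK_BASE() would go into the dispatcher and
STK_MEASURE() at the entry of each function of interest. Compiling the
instrumented and the baseline build with the same flags (-g -O) keeps
the numbers comparable.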