


                                 CleanWAD V1.54
                WAD file cleaner and optimizer by Serge Smirnov
                              (sxs111@po.cwru.edu)

                       Extended 2006/05/04 by Martin Howe
                           (martinhowe@myprivacy.ca)

                       Comments and suggestions welcome.

         The most up-to-date version of CleanWAD can always be found at
                 http://www.martinsobservationpost.net/index.html

                Refer to the legal statements and disclaimers in
               appendices 5 and 6, near the end of this document.

                             CleanWAD is freeware.



Extensions
----------

Recognizes BEHAVIOR and SCRIPTS lumps as map items
Recognizes OpenGL map entries as map items
Recognizes Heretic [sound,] graphic and music names
Recognizes HeXen sound, graphic and music names
Recognizes Strife sound, graphic and music names
Recognizes BOOM colormap lists
Recognizes ZDOOM ACS library lists
Recognizes ZDOOM auto-texture lists
Recognizes STRIFE voice lists
Recognizes Heretic and HeXen graphic font lists
Optionally restricts entry identification to recognized names
Optionally corrects double-letter list markers (PP_START -> P_START)
Optionally removes sub-list markers (e.g., F1_START, P3_END)
Optionally removes SCRIPTS and SCRIPTnn entries (often decompiled)
Optionally sorts PNAMES just after any TEXTURE lumps
Optionally attempts to recognize maps with nonstandard names
Optionally attempts to recognize DOOM and WAVE sounds by format
Optionally attempts to recognize DOOM and MIDI musics by format
Optionally attempts to recognize DOOM graphics by format
Optionally uses a specified game to minimize entry misidentification
Optionally uses directory sort orders other than "by list"
Optionally packs the file by sharing lumps between directory entries
Completely rewritten and now using a totally new engine
Recognizes an environment variable from which to take extra default options



Overview
--------

CleanWAD simply copies a WAD file to a file with another name, rebuilding it as
it goes. This procedure includes the following:

    * sorting directory entries by type;
    * eliminating any repeated entries;
    * eliminating 'null' entries (i.e., entries with "\0\0\0..." for name);
    * truncating WAV resources whose length exceeds their number of samples;
    * rebuilding pictures and (optionally) blockmaps;
    * optional lossless picture and blockmap compression (typically 1.0x-1.6x);
    * optional lossless picture and blockmap decompression.

What you should end up with is a cleaner, smaller copy of your WAD file, which
is identical in functionality to the original. The amount of space saved will
vary, but may be quite significant. Here are the results of running CleanWAD on
the standard IWADs with all default options:

File     | Version      | Old bytes   |  New bytes  | Saved     | Saved%
---------+--------------+-------------+-------------+-----------+-------
DOOM     | 1.9 Ultimate |  12,408,292 |  11,363,480 | 1,044,812 |  8.42%
DOOM2    | 1.9          |  14,604,584 |  14,298,600 |   305,984 |  2.10%
TNT      | 1.9 Final    |  18,654,796 |  17,827,788 |   827,008 |  4.43%
PLUTONIA | 1.9 Final    |  18,240,172 |  17,140,932 | 1,099,240 |  6.03%
HERETIC  | 1.3 SotSR    |  14,189,976 |  13,939,404 |   250,572 |  1.77%
HEXEN    | 1.1          |  20,083,672 |  19,878,624 |   205,048 |  1.02%
HEXDD    | 1.0 DKotDC   |   4,429,700 |   4,377,940 |    51,760 |  1.17%
STRIFE   | 1.2          |  28,377,364 |  27,608,080 |   769,284 |  2.71%
VOICES   | 1.2          |  27,319,149 |  27,319,152 |        -3 |  0.00%
---------+--------------+-------------+-------------+-----------+-------
TOTAL    | N/A          | 158,307,705 | 153,754,000 | 4,553,705 |  2.88%

Note that the slight increase in VOICES.WAD (from Strife) is due to aligning the
directory ("-ad") and that TNT.WAD is the raw original (i.e., without the MAP31
Yellow Key Patch applied).

Usage
-----

The basic command line for CleanWAD is very simple:

    cleanwad.exe [options] <input-file> <output-file>

The original file is left intact; the output goes to output-file. If output-file
already exists, it will be OVERWRITTEN without ceremony, so please be careful.
In particular, if you specify the SAME file as both the input and output file in
a way that CleanWAD cannot yet detect (e.g., if one is a hard link to the
other), then you will have problems (see Other Problems later in this manual).

While CleanWAD processes a WAD file, it will occasionally issue diagnostic
messages. Errors always cause the program to abort; unlike in previous versions
of CleanWAD, the term "error" now means either a failure of a system or library
call (such as malloc()) or a genuinely broken WAD file that needs fixing before
CleanWAD can optimize it. All messages fall into eight categories:

    S)  Statements: program startup and shutdown messages
    E)      Errors: problems that are uncorrectable or indicate serious trouble
    W)    Warnings: problems that are correctable but might indicate trouble
    C) Corrections: problems that can be completely corrected
    P)    Progress: completion of large steps (there are only a few of these)
    A)     Actions: successful optimizations or other notable events
    D)     Details: detailed progress logging (too verbose for normal use)
    I)   Internals: internal logic diagnostics of resource optimization

You can tell CleanWAD how verbose to be by setting the "verbosity level" to one
of the characters in the table above; every category up to and including the
specified level is displayed. STATEMENTS means "startup banner only", INTERNALS
means "everything" and ACTIONS (the default) displays everything except DETAILS
and INTERNALS. For example:

    cleanwad -vp a.wad b.wad

sets the verbosity level to PROGRESS. You might want to do this if CleanWAD
generates more optimization reports than you want to see while it runs.
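The "up to and including" rule amounts to comparing positions in the ordered list of category letters. The sketch below assumes exactly the order given in the table (S, E, W, C, P, A, D, I) and accepts either case, since the option is typed in lower case ("-vp"); the function names are hypothetical and not taken from CleanWAD's source.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Message categories in the order of the table, least to most verbose. */
static const char CATEGORY_ORDER[] = "SEWCPADI";

/* Hypothetical helper: position of a category letter (either case), or -1. */
static int category_rank(char c)
{
    const char *p;

    if (c == '\0')
        return -1;   /* strchr() would otherwise match the terminator */
    p = strchr(CATEGORY_ORDER, toupper((unsigned char)c));
    return p != NULL ? (int)(p - CATEGORY_ORDER) : -1;
}

/* A message is shown if its category comes at or before the selected level. */
static bool should_display(char message_category, char verbosity_level)
{
    int msg = category_rank(message_category);
    int lvl = category_rank(verbosity_level);

    return msg >= 0 && lvl >= 0 && msg <= lvl;
}
```

With the default level ("a"), a PROGRESS message passes this check but a DETAILS message does not; "-vp" additionally hides ACTIONS messages.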

You can tell CleanWAD which game to assume is being used by setting the game to
a single character; for example:

    cleanwad -gx a.wad b.wad

will select HeXen as the game. The legal values are:

    n) Neutral (make assumptions)
    1) Doom
    2) Doom II
    t) TNT: Evilution (currently equivalent to g2)
    p) The Plutonia Experiment (currently equivalent to g2)
    h) Heretic
    x) HeXen
    d) Deathkings of the Dark Citadel (currently equivalent to gx)
    s) Strife

You can tell CleanWAD how to sort the directory by setting the sort order to a
single character; for example:

    cleanwad -sl a.wad b.wad

will select "by list" as the sort order. The legal values are:

    n) None ........................................................ do not sort
    l) List .......................... sort into MAPS, SPRITES, LUMPS, and so on
    a) Alphabetic ................ as list, then alphabetically within each list
    i) Intrinsic ...... as alphabetic, but maintain intrinsic order in each list

The "alphabetic" and "intrinsic" orders sort list markers in the correct place
in their lists (start or end, as applicable) and so maintain the list structure.

The "alphabetic" and "intrinsic" orders should not be used if the flat or wall
patch lists contain animated textures or flats, unless those entries are already
in alphabetical order and the sort would not insert any other entries between
them. Otherwise, depending on the engine being used, they may fail to animate.

The "intrinsic" sort is like "alphabetic" in that it sorts alphabetically within
each list; however, some types of entry have their own intrinsic order and this
takes priority over alphabetic sortation if "intrinsic" is used. For example,
the DOOM2 musics have an intrinsic order which is that of the maps that they
represent (D_RUNNIN, D_STALKS, ..., D_ULTIMA) and which is not alphabetic.

If you specify "intrinsic" sort for a PWAD file supposedly containing DOOM2
musics, you will get recognized DOOM2 musics in map order (followed by D_READ_M,
D_DM2TTL and D_DM2INT, if present), then (in alphabetical order) any musics
whose names do not match (for example, a music called D_NEWMUS). Generally
speaking, if a PWAD contains custom music or generic lump names (i.e., lumps
whose names do not exist in the applicable IWAD), then it is best to avoid the
"intrinsic" sort, as CleanWAD will not know how to order the new names. In this
context, "custom" includes the use of existing names as if they were custom
names; for example, using D_E1M1 in a DOOM2 game (e.g., via a SNDINFO lump).
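The "intrinsic" rule can be pictured as a qsort() comparator: names found in a known-order table sort by their table position, and every other name sorts alphabetically after them. The table below is deliberately truncated to a few DOOM2 music names for illustration; CleanWAD's real ordering tables are not listed in this manual, and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative, truncated intrinsic order: MAP01..MAP03 music, then the
   system musics. A real table would list every known name. */
static const char *const INTRINSIC_ORDER[] = {
    "D_RUNNIN", "D_STALKS", "D_COUNTD", "D_READ_M", "D_DM2TTL", "D_DM2INT"
};
#define INTRINSIC_COUNT (sizeof INTRINSIC_ORDER / sizeof INTRINSIC_ORDER[0])

/* Position in the known order, or INTRINSIC_COUNT for an unknown name. */
static size_t intrinsic_rank(const char *name)
{
    size_t i;

    for (i = 0; i < INTRINSIC_COUNT; i++)
        if (strcmp(name, INTRINSIC_ORDER[i]) == 0)
            return i;
    return INTRINSIC_COUNT;
}

/* qsort() comparator over an array of const char * lump names. */
static int compare_intrinsic(const void *a, const void *b)
{
    const char *na = *(const char *const *)a;
    const char *nb = *(const char *const *)b;
    size_t ra = intrinsic_rank(na), rb = intrinsic_rank(nb);

    if (ra != rb)
        return ra < rb ? -1 : 1;   /* known names first, in table order */
    return strcmp(na, nb);         /* unknown names alphabetically */
}
```

This also shows why custom names gain nothing from "intrinsic": a name such as D_NEWMUS has no table position and simply falls back to alphabetical order after all the known names.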

Currently, "intrinsic" only works for music and generic lump names. The
"intrinsic" order for music lumps is level music, then system musics such as
D_INTRO. The "intrinsic" order for generic lumps is text or nearly-text first
(e.g., MAPINFO, ENDOOM, ...), then binary (e.g., COLORMAP). The specific
ordering chosen reflects the author's personal preferences, and it is hoped that
future versions of CleanWAD will make it configurable, along with the default
options; for example, via an INI file.

In any case, bear in mind that sorting a WAD file directory, even as performed
by the original CleanWAD, is purely cosmetic; it is useful when editing or
debugging a WAD file but, except for its effect on animated textures and flats,
has no meaning to the games.

These three options are not boolean: each takes a single character and can only
be set to a value or left at its default. All the other options are boolean and
can be turned on by giving them as-is, or turned off by preceding them with the
word "no".

For example:

    cleanwad a.wad b.wad -noad -rb -pp

turns the "ad" option off and turns the "rb" and "pp" options on (stay tuned
for an explanation of what these do). Needless to say, none of the options
themselves start with the word "no", as that would lead to much confusion :-)
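The option syntax described above can be sketched as a small parser. Only the "-xx" / "-noxx" convention and the single-character value options ("-v", "-g", "-s") come from this manual; the option table, the handful of options in it, the defaults other than verbosity "a", and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the boolean options; the defaults are illustrative. */
struct bool_option {
    const char *code;
    bool value;
};

static struct bool_option bool_options[] = {
    { "ad", true }, { "rb", false }, { "pp", false }
};

/* Value options: -v<level>, -g<game> and -s<sort>. */
static char verbosity = 'a', game = 'n', sort_order = 'l';

/* Apply one argument such as "-vp", "-rb" or "-noad".
   Returns false if the argument is not a recognized option. */
static bool apply_option(const char *arg)
{
    bool value = true;
    size_t i;

    if (arg[0] != '-' || arg[1] == '\0')
        return false;
    arg++;

    /* Value options are exactly two characters: the letter and its value. */
    if (arg[1] != '\0' && arg[2] == '\0') {
        switch (arg[0]) {
        case 'v': verbosity = arg[1]; return true;
        case 'g': game = arg[1]; return true;
        case 's': sort_order = arg[1]; return true;
        default: break;
        }
    }

    /* Boolean options may be preceded by "no" to turn them off. */
    if (strncmp(arg, "no", 2) == 0) {
        value = false;
        arg += 2;
    }
    for (i = 0; i < sizeof bool_options / sizeof bool_options[0]; i++) {
        if (strcmp(arg, bool_options[i].code) == 0) {
            bool_options[i].value = value;
            return true;
        }
    }
    return false;
}
```

Because no option code begins with "no", stripping that prefix is unambiguous; this sketch does not validate the character given to a value option.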

The options that are on by default are the ones that most aggressively optimize
the output file. To see which options are ON by default, run CleanWAD with no
arguments. Note that some of the code letters for options have changed, one
has disappeared and that there are some new options not previously included.

The "is" option used to stand for "ignore syntax", but has been removed as it
is now TOTALLY unnecessary. CleanWAD builds a symbol table of all the lists that
it finds and can thus handle arbitrary lists like ?_START/?_END/??_START/??_END.
CleanWAD also knows how to pair the double-letter sprite, patch and flat markers
together and with standard markers, even if they are mixed in the same list.
CleanWAD also knows how to handle sprite, flat and patch sub-list markers such
as F1_START and P3_END. Any remaining problems, such as mismatched START and END
markers, are genuine errors that indicate a broken WAD file.
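A minimal sketch of the pairing rules follows. Each marker name is reduced to a canonical list prefix, so that "PP_START", "P_START" and the sub-list marker "P3_START" all belong to the P list, and a stack then checks that each END closes the most recently opened list. The normalization rules, the nesting limit and the function names are assumptions for illustration; CleanWAD's own symbol table is not described in detail here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PREFIX 8     /* a lump name has at most 8 characters */
#define MAX_DEPTH 16     /* assumed limit on nested lists */

enum marker_kind { NOT_MARKER, LIST_START, LIST_END };

/* Classify a lump name and store its canonical list prefix in out. */
static enum marker_kind parse_marker(const char *name, char out[MAX_PREFIX + 1])
{
    size_t len = strlen(name), plen;
    enum marker_kind kind;

    if (len > 6 && strcmp(name + len - 6, "_START") == 0) {
        kind = LIST_START;
        plen = len - 6;
    } else if (len > 4 && strcmp(name + len - 4, "_END") == 0) {
        kind = LIST_END;
        plen = len - 4;
    } else {
        return NOT_MARKER;
    }
    if (plen > MAX_PREFIX)
        return NOT_MARKER;
    memcpy(out, name, plen);
    out[plen] = '\0';

    /* Double-letter markers (SS, PP, FF) and numbered sub-list markers
       (e.g. F1, P3) belong to the same list as S, P or F. */
    if (plen == 2 && strchr("SPF", out[0]) != NULL &&
        (out[1] == out[0] || (out[1] >= '0' && out[1] <= '9')))
        out[1] = '\0';
    return kind;
}

/* Check that every END closes the most recently opened list of its type. */
static bool markers_balanced(const char *const *names, size_t count)
{
    char stack[MAX_DEPTH][MAX_PREFIX + 1];
    char prefix[MAX_PREFIX + 1];
    size_t depth = 0, i;

    for (i = 0; i < count; i++) {
        switch (parse_marker(names[i], prefix)) {
        case LIST_START:
            if (depth == MAX_DEPTH)
                return false;
            strcpy(stack[depth++], prefix);
            break;
        case LIST_END:
            if (depth == 0 || strcmp(stack[depth - 1], prefix) != 0)
                return false;       /* END without a matching START */
            depth--;
            break;
        default:
            break;                  /* ordinary lump */
        }
    }
    return depth == 0;              /* every START must be closed */
}
```

Arbitrary prefixes such as "TX" or "QK" pass through unchanged, so any list whose START and END names agree is accepted.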

CleanWAD recognizes the environment variable CLEANWADDFLTOPS if it is set. This
should contain any options, in command-line format, that the user wishes to be
on or off by default; any option settings in the variable override the built-in
defaults for those options. For example, to silently "ignore and preserve" ALL
existing lump reuse (see later) by default, set CLEANWADDFLTOPS to the string
"-rl -qc" (without the quotes, in Windows; with the quotes, in Linux or CygWin).

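One way to honour such a variable is to split it on whitespace and apply its tokens before the real command-line arguments, so that it overrides the built-in defaults. The manual does not say whether command-line options in turn override the variable; applying them last, as here, is an assumption. The callback type and function names are hypothetical, and no quote stripping is done, which is consistent with the advice above to omit the quotes on Windows.

```c
#include <stdlib.h>
#include <string.h>

/* Callback that applies (or records) one option token. */
typedef void (*option_handler)(const char *arg, void *context);

/* Apply the tokens in CLEANWADDFLTOPS (if set) first, then argv[1..argc-1],
   so later settings win. Returns 0 on success, -1 if memory runs out. */
static int apply_all_options(int argc, char **argv, option_handler apply, void *context)
{
    const char *env = getenv("CLEANWADDFLTOPS");
    int i;

    if (env != NULL) {
        char *copy = malloc(strlen(env) + 1);   /* strtok() modifies its input */
        char *token;

        if (copy == NULL)
            return -1;
        strcpy(copy, env);
        for (token = strtok(copy, " \t"); token != NULL; token = strtok(NULL, " \t"))
            apply(token, context);
        free(copy);
    }
    for (i = 1; i < argc; i++)
        apply(argv[i], context);
    return 0;
}

/* Example handler that just counts the tokens it is given. */
static void count_option(const char *arg, void *context)
{
    (void)arg;
    ++*(int *)context;
}
```

A real handler would also have to tell options apart from the input and output file names, which this sketch passes through unchanged.
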
-rd (or -nord) stands for "remove duplicates". Turning this option on causes
    CleanWAD to check for a previous occurrence before adding an entry to the
    output file. If an entry with the same name already exists, it is replaced
    with the latest one. This happens as many times as necessary to ensure that
    the output file contains no redundant entries. Maps are handled the same
    way, i.e., a second occurrence of E1M1 causes the first one, including all
    the map data following it, to be removed. Of course, a second occurrence of
    THINGS in the same WAD isn't considered redundant.
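The replacement rule can be sketched on a flat array of names: when a name appears again later, the earlier entry is dropped. This sketch deliberately ignores maps, whose sub-entries (THINGS and so on) legitimately repeat and which CleanWAD treats as a unit, and it keeps each survivor at the position of its last occurrence; whether CleanWAD keeps the earlier position instead is not stated here. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define LUMP_NAME_LEN 8

/* Keep only the last occurrence of each name, preserving relative order.
   names holds NUL-terminated lump names; returns the new count.
   Maps are NOT handled here. O(n^2), for clarity. */
static size_t remove_duplicates(const char *names[], size_t count)
{
    size_t out = 0, i, j;

    for (i = 0; i < count; i++) {
        int superseded = 0;

        for (j = i + 1; j < count; j++) {
            if (strncmp(names[i], names[j], LUMP_NAME_LEN) == 0) {
                superseded = 1;     /* a later entry replaces this one */
                break;
            }
        }
        if (!superseded)
            names[out++] = names[i];
    }
    return out;
}
```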

-tw (or -notw) stands for "truncate waves". Turning this option on causes
    CleanWAD to remove unused space at the end of wave resources. Some editors
    save wave format entries with extra space after the actual sound samples and
    CleanWAD can truncate such sound lumps to their correct actual length.
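This manual does not spell out which header "-tw" relies on. The sketch below assumes DOOM's native sound lump layout (a little-endian 16-bit format number of 3, a 16-bit sample rate, a 32-bit sample count, then one byte per sample) and computes the length such a lump actually needs. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DMX_HEADER_SIZE 8

/* Read little-endian integers from a byte buffer. */
static uint32_t read_le16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the length a DOOM-format sound lump actually needs, or size
   unchanged if the lump is not in that format or claims more samples
   than it contains. */
static size_t truncated_sound_length(const unsigned char *lump, size_t size)
{
    uint32_t samples;

    if (size < DMX_HEADER_SIZE || read_le16(lump) != 3)
        return size;                      /* not a DOOM-format sound */
    samples = read_le32(lump + 4);
    if (samples > size - DMX_HEADER_SIZE)
        return size;                      /* header claims more than exists */
    return DMX_HEADER_SIZE + (size_t)samples;
}
```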

*** The following three options are mutually exclusive:

-rb (or -norb) stands for "rebuild blockmaps" but WITHOUT packing or unpacking.
    Turning this option on causes CleanWAD to remove the empty holes that the
    format of DOOM blockmaps allows to exist within the resources (these exist
    in a similar way to the wasted space in wave format sounds, but can be
    littered throughout the lump as well as at its end). This option will also
    preserve any EXISTING blockmap packing that may have been applied to any
    non-hole parts of the resource BUT IT WILL NOT by itself pack any further or
    unpack at all: you must use the "-pb" or "-ub" options for that.

-pb (or -nopb) stands for "pack blockmaps". Turning this option on causes
    CleanWAD to remove the empty holes within the resources, as does "-rb", but
    also applies a packing algorithm of a type similar to that used by LINEDEF
    sharing WAD file packers. CleanWAD can shrink blockmap resources by a factor
    of up to 8 (though ratios between 1.1 and 1.6 are much more typical), without
    changing their functionality in any way.
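A DOOM blockmap consists of a header of four 16-bit values (origin x, origin y, column count, row count), one 16-bit offset per block counted in 16-bit words from the start of the lump, and the block lists themselves, each conventionally starting with 0 and ending with -1 (0xFFFF). Packing of the kind described above can be sketched by letting blocks with identical lists point at a single copy. The in-memory representation and the function are hypothetical, and CleanWAD's own packing algorithm may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One block's list as 16-bit words, including the leading 0 and final 0xFFFF. */
struct block_list {
    const uint16_t *words;
    size_t length;
};

/* Hypothetical packer: writes header, offset table and lists into out
   (capacity counted in 16-bit words), letting blocks with identical lists
   share one copy. Returns the words used, or 0 if out is too small or an
   offset would not fit in 16 bits. */
static size_t pack_blockmap(const int16_t header[4], const struct block_list *blocks,
                            size_t nblocks, uint16_t *out, size_t capacity)
{
    size_t used = 4 + nblocks, i, j;

    if (used > capacity)
        return 0;
    for (i = 0; i < 4; i++)
        out[i] = (uint16_t)header[i];      /* origin x, origin y, columns, rows */

    for (i = 0; i < nblocks; i++) {
        bool found = false;

        /* Look for an identical list among earlier blocks (O(n^2) for clarity). */
        for (j = 0; j < i && !found; j++) {
            if (blocks[j].length == blocks[i].length &&
                memcmp(blocks[j].words, blocks[i].words,
                       blocks[i].length * sizeof blocks[i].words[0]) == 0) {
                out[4 + i] = out[4 + j];   /* share the earlier copy */
                found = true;
            }
        }
        if (!found) {
            if (used > 0xFFFF || used + blocks[i].length > capacity)
                return 0;
            memcpy(out + used, blocks[i].words, blocks[i].length * sizeof out[0]);
            out[4 + i] = (uint16_t)used;
            used += blocks[i].length;
        }
    }
    return used;
}
```

Empty blocks (just 0 and -1) are usually the most common list in a map, so sharing them alone often accounts for much of the saving.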

-ub (or -noub) stands for "unpack blockmaps". Turning this option on causes
    CleanWAD to remove the empty holes within the resources, as does "-rb", but
    also applies an unpacking algorithm that removes any existing packing. If
    you have packed blockmaps in your WAD with "-pb", but for some reason want
    to bring them back to normalized format, CleanWAD can unpack them for you.
    The result is a resource with no empty holes in the data, but in which data
    is specified in full even if it is duplicated elsewhere.

-fb (or -nofb) stands for "flush blockmaps". Turning this option on causes
    CleanWAD to always write a modified blockmap to the output file, even if the
    size has not changed. The unpack method used by CleanWAD will always
    canonicalize a blockmap completely and the rebuild and pack methods used by
    CleanWAD will do so to the maximum extent possible for those methods (refer
    to Appendix 3 for details). Forcing the modified blockmap to be written
    ensures that it will exist in the output file in the most canonical form
    possible for that blockmap processing method.

*** The following error conditions are explicitly checked for by CleanWAD when
    processing blockmaps: (a) any blockmap pointer pointing *outside* the lump
    and (b) any invalid linedef number. Regarding case (b), linedef numbers are
    unsigned 15-bit integers stored in a signed 16-bit integer, with the "rogue"
    value (-1) being used to mean "end of linedef list" inside a blockmap. Thus
    no linedef number in a blockmap should ever be less than -1 or greater than
    the number of linedefs implied by the size of the LINEDEFS lump minus one
    (linedefs are numbered from zero).
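Checks (a) and (b) can be sketched as follows, assuming the blockmap has already been read into 16-bit words and that the linedef count has been computed from the LINEDEFS lump size (14 bytes per linedef in DOOM-format maps, 16 in HeXen-format maps). The function name and return convention are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator for a blockmap held as nwords 16-bit words.
   The conventional leading 0 of each list is checked as linedef 0. */
static bool blockmap_valid(const uint16_t *bm, size_t nwords, size_t num_linedefs)
{
    size_t nblocks, i, pos;

    if (nwords < 4)
        return false;
    nblocks = (size_t)bm[2] * bm[3];          /* columns * rows */
    if (4 + nblocks > nwords)
        return false;                         /* offset table is truncated */

    for (i = 0; i < nblocks; i++) {
        pos = bm[4 + i];
        if (pos >= nwords)
            return false;                     /* error (a): pointer outside the lump */
        for (;;) {
            uint16_t entry;

            if (pos >= nwords)
                return false;                 /* list runs off the end of the lump */
            entry = bm[pos++];
            if (entry == 0xFFFF)
                break;                        /* -1 ends this block's list */
            if (entry >= 0x8000 || entry >= num_linedefs)
                return false;                 /* error (b): invalid linedef number */
        }
    }
    return true;
}
```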

*** The result of playing a map with error (b) is undefined; many engines appear
    at first sight to play such a map because they will never ask for linedef
    numbers that do not exist; however other errors can subsequently occur, such
    as monsters walking through walls. This problem is often symptomatic of an
    attempt to build a blockmap that is logically valid, but physically too big
    for the format of a blockmap as stored in a WAD file (see next note).

*** Some blockmaps are so big that they fit the format of a blockmap, as stored
    in a WAD file, only when packed; attempting to unpack them would cause
    integer overflow (sometimes known as wrap-around) in the linedef list
    pointers. This will usually lead to error (b) as described above, but in any
    case the resulting blockmap will not be valid. CleanWAD will warn you that
    the blockmap cannot be processed and will output the blockmap unmodified.
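Because each block offset is a 16-bit count of words from the start of the lump, whether a fully unpacked copy would still fit can be decided before unpacking, as in this sketch. Engines that read the offsets as signed 16-bit values can address only half the range, so the limit is passed in as a parameter. The function name and its inputs are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Would a fully unpacked blockmap (one private list per block) keep every
   list offset within max_offset words? list_lengths[i] counts block i's
   words, including its 0 and -1 entries. */
static bool unpacked_blockmap_fits(const size_t *list_lengths, size_t nblocks,
                                   size_t max_offset)
{
    size_t offset = 4 + nblocks;   /* header (4 words) plus the offset table */
    size_t i;

    for (i = 0; i < nblocks; i++) {
        if (offset > max_offset)
            return false;          /* this list's offset would wrap around */
        offset += list_lengths[i];
    }
    return true;
}
```

Pass 0xFFFF for engines that treat the offsets as unsigned, or 0x7FFF for those that treat them as signed.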

*** The following three options are mutually exclusive:

-rp (or -norp) stands for "rebuild pictures" but WITHOUT packing or unpacking.
    Turning this option on causes CleanWAD to remove the empty holes that the
    format of DOOM pictures allows to exist within the resources (these exist in
    a similar way to the wasted space in wave format sounds, but can be littered
    throughout the lump as well as at its end). This option will also preserve
    any EXISTING picture packing that may have been applied to any non-hole
    parts of the resource BUT IT WILL NOT by itself pack any further or unpack
    at all: you must use the "-pp" or "-up" options for that.

-pp (or -nopp) stands for "pack pictures". Turning this option on causes
    CleanWAD to remove the empty holes within the resources, as does "-rp", but
    also applies a packing algorithm of a type similar to that used by LINEDEF
    sharing WAD file packers. CleanWAD can shrink picture resources by a factor
    of up to 8 (though ratios between 1.1 and 1.6 are much more typical), without
    changing their functionality in any way.

-up (or -noup) stands for "unpack pictures". Turning this option on causes
    CleanWAD to remove the empty holes within the resources, as does "-rp", but
    also applies an unpacking algorithm that removes any existing packing. If
    you have packed pictures in your WAD with "-pp", but for some reason want
    to bring them back to normalized format, CleanWAD can unpack them for you.
    The result is a resource with no empty holes in the data, but in which data
    is specified in full even if it is duplicated elsewhere.

-fp (or -nofp) stands for "flush pictures". Turning this option on causes
    CleanWAD to always write a modified picture to the output file, even if the
    size has not changed. The unpack method used by CleanWAD will always
    canonicalize a picture completely and the rebuild and pack methods used by
    CleanWAD will do so to the maximum extent possible for those methods (refer
    to Appendix 3 for details). Forcing the modified picture to be written
    ensures that it will exist in the output file in the most canonical form
    possible for that picture processing method.

*** CleanWAD is not intended to handle high-resolution graphics as used in source
    ports like JDoom and the Doom Retexturing Project. If your WAD includes any
    of these, be sure to turn the "-rp", "-pp" and "-up" options off.

*** Packing and unpacking are idempotent operations: in other words, attempting
    to pack a resource that is already packed does nothing and attempting to
    unpack a resource that is not packed also does nothing. In the same way,
    some resources don't pack at all because they inherently have no redundancy
    to exploit or were saved in packed format by whichever program created them.

*** The following error condition is explicitly checked for by CleanWAD when
    processing pictures: any pixel run pointer pointing *outside* the lump.

*** The following three options are mutually exclusive:

-rl (or -norl) stands for "reuse lumps" but WITHOUT packing or unpacking.
    Turning this option on causes CleanWAD to suppress the file write for a data
    lump if its directory entry specifies the same size and offset in the input
    file as that of an already-written directory entry. This is because some
    authors use this technique (sharing data lumps between directory entries) as
    a means of packing the PWAD file where, for example, several sprite frames
    must display the same picture. This option will only PRESERVE any EXISTING
    lump reuse or packing that may have been applied. It is important to note
    that CleanWAD will NEVER create lump reuse or packing by itself, unless you
    ask it to. In particular, this option WILL NOT by itself pack any further or
    unpack at all: you must use the "-pl" or "-ul" options for that.
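The "-rl" rule depends only on directory entries, not on lump contents: an entry whose input offset and size match an entry already written can point at the data already written. The sketch below keeps a small table of written entries; its structure and names are hypothetical, and it omits the list-type eligibility rules discussed later in this manual.

```c
#include <stddef.h>
#include <stdint.h>

struct written_lump {
    uint32_t in_offset, in_size;   /* where the data was in the input file */
    uint32_t out_offset;           /* where it was written in the output file */
};

/* Return the output offset of an already-written lump with the same input
   offset and size, or -1 if this entry's data must be written afresh. */
static long long find_existing_reuse(const struct written_lump *written, size_t count,
                                     uint32_t in_offset, uint32_t in_size)
{
    size_t i;

    if (in_size == 0)
        return -1;                 /* markers and empty lumps carry no data */
    for (i = 0; i < count; i++)
        if (written[i].in_offset == in_offset && written[i].in_size == in_size)
            return (long long)written[i].out_offset;
    return -1;
}
```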

-pl (or -nopl) stands for "pack lumps". Turning this option on causes CleanWAD
    to suppress the file write for a data lump if its directory entry specifies
    the same size and offset in the input file as that of an already-written
    directory entry, as does "-rl", but also applies a packing algorithm that
    shares one data lump between multiple directory entries for each group of
    lumps that are identical byte-for-byte. In other words, it preserves any
    existing lump reuse and also creates any extra reuse that may be possible.

-ul (or -noul) stands for "unpack lumps". Turning this option on causes CleanWAD
    to write out each lump explicitly, even if it could be shared or reused;
    that is, even if its contents or size-and-offset in the input file are the
    same as those of an existing entry. The algorithm removes any existing entry
    reuse and packing. If you have packed your WAD with "-rl" or "-pl", but want
    to bring it back to normalized format, for example to edit the file,
    CleanWAD can normalize it for you. The result is a file in which each data
    lump having a non-zero size is explicitly present (even if it is duplicated
    elsewhere) and in which no two directory entries that specify a non-zero
    lump size specify the same file offset for their data lumps.

*** Reusing, unlike rebuilding, is "almost" idempotent in the sense that (a)
    you have to request it and (b) attempting to reuse lumps that are already
    reused only preserves existing reuse. Packing is also "almost" idempotent
    in the sense that (a) you have to request it and (b) the physical order of
    the lumps in the file depends on how the directory was sorted. However, as
    with packing blockmaps and pictures, it does not save any more space in
    total if the PWAD is already packed and in the same way, some PWADs don't
    pack at all because they inherently have no redundancy to exploit or were
    saved in packed format by whichever program created them.

*** When creating lump reuse itself or when considering an existing reuse,
    CleanWAD considers MAPS (and their sub-entries), TEXTURES, PNAMES and generic
    LUMPS to be ineligible, even if the two lumps are of the same type. MAPS are
    structured entries that should really be left alone. Lumps in the other
    three lists are expected to be unique by definition and matches between them
    are unlikely to occur anyway. Entries in all other list types are eligible
    for reuse if they belong to the same list or to sub-lists that are logically
    of the same type (e.g., HEXEN fonts and HERETIC fonts).

*** The "-pl" option would seem to be potentially very slow when used on an IWAD
    or large PWAD, as it would seem to have to read each entry as many times as
    there are entries. However, entries are only considered if they are the same
    size and of types that are eligible and compatible for reuse, since two
    lumps of different sizes cannot be equal. Even in the standard IWADs, very
    few sets of entries have exactly the same size, so it is reasonably fast in
    practice.
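The size screen can be sketched as follows: two lumps are compared byte for byte only when they are in compatible lists and their sizes already match, so most pairs are rejected without reading any data. The lump structure, the eligible_group field standing in for the list-compatibility rules, and the function name are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct lump {
    const unsigned char *data;
    size_t size;
    int eligible_group;   /* hypothetical: 0 = never shared, else a compatible-list id */
};

/* Return the index of an earlier lump identical to lumps[k], or -1. */
static long find_identical(const struct lump *lumps, size_t k)
{
    size_t i;

    if (lumps[k].eligible_group == 0 || lumps[k].size == 0)
        return -1;
    for (i = 0; i < k; i++) {
        if (lumps[i].eligible_group != lumps[k].eligible_group)
            continue;                      /* incompatible list types */
        if (lumps[i].size != lumps[k].size)
            continue;                      /* different sizes cannot be equal */
        if (memcmp(lumps[i].data, lumps[k].data, lumps[k].size) == 0)
            return (long)i;
    }
    return -1;
}
```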

*** To speed things up further, the "-pl" option compares the packed or unpacked
    lumps, if either method is being used. The packing and unpacking methods
    always build the same canonical form of the resource, no matter how the data
    was spread across the resource. Thus, by comparing the processed resources,
    CleanWAD correctly identifies matching lumps.

*** Where the "-pl" option CAN be slow is with a large WAD in which resources
    are being subjected only to rebuilding (instead of to packing or unpacking).
    While packing or unpacking pictures and blockmaps is guaranteed in itself to
    create a canonical form in which two resources that are logically equivalent
    will always pack or unpack identically, rebuilding is NOT (see Appendix 3).
    Therefore in this situation, CleanWAD must canonicalize every resource when
    it reads it back from the output file in order to compare it and cannot even
    rely on the lump size to screen out obvious mismatches. The number of such
    comparisons can grow with the SQUARE of the number of resources in the file.

*** This should not be a problem in practice, since the use of the "-pl" option
    indicates that the user wishes to pack the file as far as possible and will
    therefore likely be using the packing methods anyway, but it should be borne
    in mind. It is hoped that a future version of CleanWAD will use some sort of
    caching mechanism to assist this function.

*** The "-ul" option would appear at first sight to be redundant. However, there
    is a subtle distinction between "specifying none of these options" and
    "turning -ul on". Not mentioning any of these options in the command line
    does NOT literally mean "turn on -ul as a sort of default", it means "output
    the entries normally". Now under NORMAL circumstances, this means the same
    thing as what "-ul" does; however, having the "-ul" option provides a way of
    EXPLICITLY requesting that the directory and file data area be normalized,
    perhaps to override a behaviour that occurs under some ABNORMAL circumstance
    provided as part of new functionality in a future version of CleanWAD.

-ar (or -noar) stands for "align resources". Turning this option on causes
    CleanWAD to output every lump in the output file (but not the directory) on
    a four-byte (32-bit) boundary. This can have some effect on the speed of
    32-bit applications, but may not be noticeable in practice. You can turn the
    option off to save a little space.

-ad (or -noad) stands for "align directory". Turning this option on causes
    CleanWAD to output the directory in the output file (but not the data lumps)
    on a four-byte (32-bit) boundary. This can have some effect on the speed of
    32-bit applications, but may not be noticeable in practice. You can turn the
    option off to save a little space (up to three bytes). You could also use it
    to align the directory without aligning the resources ("-ad" and "-noar"),
    for example, if you are debugging a WAD tool or other editing application.
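Both alignment options come down to the same arithmetic: before writing at a given file offset, pad with enough bytes to reach the next multiple of four. The helpers below are a sketch of that calculation with hypothetical names, not CleanWAD's code; they also show why "-ad" costs at most three bytes.

```c
#include <stdint.h>

/* Padding bytes needed to bring offset up to the next 4-byte boundary (0..3). */
static uint32_t padding_to_align4(uint32_t offset)
{
    return (4u - (offset & 3u)) & 3u;
}

/* The offset at which an aligned lump or directory would actually start. */
static uint32_t align4(uint32_t offset)
{
    return offset + padding_to_align4(offset);
}
```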

-in (or -noin) stands for "identify names". Turning this option on causes
    CleanWAD to identify lumps with Heretic/HeXen page names by name if needed
    (see "Heretic and HeXen Problems" for how to use this option).

-ip (or -noip) stands for "identify pages". Turning this option on causes
    CleanWAD to identify lumps with Heretic/HeXen page names as pages if
    needed (see "Heretic and HeXen Problems" for how to use this option).

-ig (or -noig) stands for "identify graphics". Turning this option on causes
    CleanWAD to identify lumps with Heretic/HeXen page names as graphics if
    needed (see "Heretic and HeXen Problems" for how to use this option).

*** Summary of "in", "ip" and "ig" usage (see "Heretic and HeXen Problems"):

     in  ip  ig  | UNKNOWN  DOOM    DOOM2   HERETIC HEXEN   STRIFE
     ------------+------------------------------------------------
     off off off | INVALID  INVALID INVALID INVALID INVALID INVALID
     off off on  | GRAPHIC  GRAPHIC GRAPHIC GRAPHIC GRAPHIC GRAPHIC
     off on  off | PAGE     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
     off on  on  | AUTO     GRAPHIC GRAPHIC AUTO    AUTO    GRAPHIC
     on  off off | LUMP     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
     on  off on  | GRAPHIC  GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
     on  on  off | PAGE     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
     on  on  on  | AUTO     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
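Read as data, the summary table is a lookup indexed by the three flags and the game column. The sketch below transcribes it row for row; the enum and function names are hypothetical, and games without their own column (such as TNT or Deathkings) would first have to be mapped onto one of the columns shown.

```c
enum page_treatment { PT_INVALID, PT_GRAPHIC, PT_PAGE, PT_AUTO, PT_LUMP };
enum game_column { G_UNKNOWN, G_DOOM, G_DOOM2, G_HERETIC, G_HEXEN, G_STRIFE };

/* Rows are indexed by (in << 2) | (ip << 1) | ig, in the table's order. */
static const enum page_treatment PAGE_TABLE[8][6] = {
    /* off off off */ { PT_INVALID, PT_INVALID, PT_INVALID, PT_INVALID, PT_INVALID, PT_INVALID },
    /* off off on  */ { PT_GRAPHIC, PT_GRAPHIC, PT_GRAPHIC, PT_GRAPHIC, PT_GRAPHIC, PT_GRAPHIC },
    /* off on  off */ { PT_PAGE,    PT_GRAPHIC, PT_GRAPHIC, PT_PAGE,    PT_PAGE,    PT_GRAPHIC },
    /* off on  on  */ { PT_AUTO,    PT_GRAPHIC, PT_GRAPHIC, PT_AUTO,    PT_AUTO,    PT_GRAPHIC },
    /* on  off off */ { PT_LUMP,    PT_GRAPHIC, PT_GRAPHIC, PT_PAGE,    PT_PAGE,    PT_GRAPHIC },
    /* on  off on  */ { PT_GRAPHIC, PT_GRAPHIC, PT_GRAPHIC, PT_PAGE,    PT_PAGE,    PT_GRAPHIC },
    /* on  on  off */ { PT_PAGE,    PT_GRAPHIC, PT_GRAPHIC, PT_PAGE,    PT_PAGE,    PT_GRAPHIC },
    /* on  on  on  */ { PT_AUTO,    PT_GRAPHIC, PT_GRAPHIC, PT_PAGE,    PT_PAGE,    PT_GRAPHIC },
};

/* Look up how a lump with a Heretic/HeXen page name would be treated. */
static enum page_treatment lookup_page_treatment(int in, int ip, int ig, enum game_column game)
{
    return PAGE_TABLE[(in ? 4 : 0) | (ip ? 2 : 0) | (ig ? 1 : 0)][game];
}
```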

-ds (or -nods) stands for "detect sounds". Turning this option on causes
    CleanWAD to detect SOUND resources by their internal format, which is
    unlikely to occur by chance in any other kind of DOOM lump. This option has
    two effects. Firstly, unless the recognized_names ("-rn") option is turned
    on, entries identified by NAME as sounds are checked as to their format and
    rejected as sounds-by-name if they do not match the DOOM sound format.
    Secondly, irrespective of the recognized_names ("-rn") option, any lump that
    has not been identified by name during sounds-by-name checking is identified
    as a sound, if its contents match the DOOM sound format.

*** This option should not be used if there are sounds in your WAD file with
    standard DOOM names, but not in either DOOM sound or Microsoft(tm) WAVE
    format. Many source ports support exotic sound formats, including things
    like AU, OGG, MP3 or even DOOM sound format but with a different sample rate
    than the default value of 11025 Hz. It is intended that the "-ds" option will
    be extended to recognize these in due course, but that is work still to be
    done. Fortunately, WAD files that use such things are still relatively rare
    at the time of writing.
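Format-based detection for the two sound formats named above can be sketched as simple header checks. The DOOM-format test assumes the native layout (16-bit format number 3, 16-bit sample rate, 32-bit sample count, then the samples); requiring a non-zero rate and a sample count that fits in the lump are assumptions, and CleanWAD's exact tests are not documented here. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Heuristic check for a DOOM-format sound: format number 3, a non-zero
   sample rate and a sample count that fits inside the lump. */
static bool looks_like_doom_sound(const unsigned char *p, size_t size)
{
    unsigned long rate, samples;

    if (size < 8 || p[0] != 3 || p[1] != 0)
        return false;
    rate = p[2] | ((unsigned long)p[3] << 8);
    samples = p[4] | ((unsigned long)p[5] << 8) |
              ((unsigned long)p[6] << 16) | ((unsigned long)p[7] << 24);
    return rate != 0 && samples <= size - 8;
}

/* A RIFF WAVE file starts with "RIFF", a 32-bit size, then "WAVE". */
static bool looks_like_wave(const unsigned char *p, size_t size)
{
    return size >= 12 && memcmp(p, "RIFF", 4) == 0 && memcmp(p + 8, "WAVE", 4) == 0;
}
```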

-dm (or -nodm) stands for "detect musics". Turning this option on causes
    CleanWAD to detect MUSIC resources by their internal format, which is
    unlikely to occur by chance in any other kind of DOOM lump. This option has
    two effects. Firstly, unless the recognized_names ("-rn") option is turned
    on, entries identified by NAME as musics are checked as to their format and
    rejected as musics-by-name if they do not match the DOOM music format.
    Secondly, irrespective of the recognized_names ("-rn") option, any lump that
    has not been identified by name during musics-by-name checking is identified
    as a music, if its contents match the DOOM music format.

*** This option should not be used if there are musics in your WAD file with
    standard DOOM names, but not in DOOM music or Standard MIDI format. Many
    source ports support exotic music formats, including things like MOD, MP3
    and XM. It is intended that the "-dm" option will be extended to recognize
    these in due course, but that is work still to be done. Fortunately, WAD
    files that use such things are still relatively rare at the time of writing.
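Both music formats named above do carry a signature, which is why format detection for them is cheap. A sketch with hypothetical function names: DOOM (MUS) music lumps begin with the bytes "MUS" followed by 0x1A, and Standard MIDI files begin with an "MThd" chunk whose 32-bit big-endian length is 6. CleanWAD's own checks may be stricter or looser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* DOOM (MUS) music lumps begin with the four bytes "MUS" 0x1A. */
static bool looks_like_mus(const unsigned char *p, size_t size)
{
    return size >= 4 && memcmp(p, "MUS\x1A", 4) == 0;
}

/* Standard MIDI files begin with an "MThd" chunk whose length is 6. */
static bool looks_like_midi(const unsigned char *p, size_t size)
{
    return size >= 8 && memcmp(p, "MThd", 4) == 0 &&
           p[4] == 0 && p[5] == 0 && p[6] == 0 && p[7] == 6;
}
```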

-dg (or -nodg) stands for "detect graphics". Turning this option on causes
    CleanWAD to detect GRAPHIC resources by their internal format, which is
    unlikely to occur by chance in any other kind of DOOM lump. This option has
    two effects which, unlike the corresponding SOUND and MUSIC options, are
    applied irrespective of other options. Firstly, entries identified by NAME
    as graphics are checked as to their format and rejected as graphics-by-name
    if they do not match the DOOM graphic format. Secondly, any lump that has
    not been identified by name during graphics-by-name checking is identified
    as a graphic, if its contents match the DOOM graphic format.

*** The behaviour of this option is subtly different from its SOUND and MUSIC
    counterparts. In particular, although an alias of the GRAPHIC internal
    format is unlikely to occur by chance in any other kind of DOOM lump, it is
    more likely to happen than with sounds and musics. However, such aliasing
    is less likely to cause problems in practice than it would be for other
    resource types, because most graphics occur in structured lists and
    membership of a list overrides many aspects of graphic handling that would
    otherwise be affected.

*** The "detect graphics" ("-dg") option can cause CleanWAD to run slower, as
    DOOM format pictures do not have a "proper" header in the same way that
    sounds, musics and indeed real-world graphic files (such as BMPs and JPGs)
    do; specifically, there is no "magic" code of any kind. Thus, CleanWAD must
    often resort to checking the entire lump in order to make a determination.
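A sketch of such a structural check, assuming the standard DOOM picture layout: a header of four 16-bit values (width, height, left offset, top offset), one 32-bit offset per column, and columns made of posts (top delta, length, an unused byte, the pixels, another unused byte) ending with a top delta of 0xFF. The size limits and the function name are assumptions, and CleanWAD's own test may differ. Walking every post is what makes the check touch the whole lump; it also rejects any pixel run pointer that points outside the lump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Heuristic check for a DOOM-format picture. With no magic number, the only
   test is structural: the header, the column offset table and every post
   in every column must lie inside the lump. */
static bool looks_like_doom_picture(const unsigned char *p, size_t size)
{
    size_t width, height, col, pos;

    if (size < 8)
        return false;
    width = p[0] | ((size_t)p[1] << 8);
    height = p[2] | ((size_t)p[3] << 8);
    if (width == 0 || height == 0 || width > 4096 || height > 4096)
        return false;                         /* assumed sanity limits */
    if (8 + 4 * width > size)
        return false;                         /* column table does not fit */

    for (col = 0; col < width; col++) {
        const unsigned char *q = p + 8 + 4 * col;

        pos = q[0] | ((size_t)q[1] << 8) | ((size_t)q[2] << 16) | ((size_t)q[3] << 24);
        /* Walk the posts; a top delta of 0xFF ends the column. */
        for (;;) {
            if (pos >= size)
                return false;                 /* run pointer outside the lump */
            if (p[pos] == 0xFF)
                break;
            if (pos + 1 >= size)
                return false;
            pos += 4 + (size_t)p[pos + 1];    /* 3 header bytes + pixels + pad */
        }
    }
    return true;
}
```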

*** The "identify graphics" ("-ig") option is only relevant if there is doubt
    as to whether or not the graphic is really a Heretic/HeXen screen page
    image. If a graphic has been identified by name and, according to the settings
    of the "identify" options ("-in", "-ip" and "-ig"), is NOT suspected of
    being a Heretic/HeXen screen page, then the check for a valid graphic format
    is made REGARDLESS of the setting of the "-ig" option.

*** If, when the ("-dg") option is turned on, the "identify" options ("-in",
    "-ip" and "-ig") DO cause a lump to be treated as a graphic after being
    suspected of being a Heretic/HeXen screen page, the test for a valid graphic
    will still be applied and, if failed, will cause the lump to be flagged as a
    generic lump; this must be done because the lump cannot be a valid graphic.

*** This option should not be used if there are graphics in your WAD file with
    standard DOOM names, but not in DOOM graphic format. Many source ports
    support exotic graphic formats, including things like BMP, JPG and PNG. It
    is intended that the "-dg" option will be extended to recognize these in due
    course, but that is work still to be done. Fortunately, WAD files that use
    such things are still relatively rare at the time of writing.

-rn (or -norn) stands for "recognized names". Turning this option on causes
    CleanWAD to identify SOUND, MUSIC, DIALOG and CONVERSATION resources purely
    by the recognized names (as found in the standard IWADs). It avoids
    potential misidentification of entries, especially if the game to use is
    specified, but will not recognize new resource names that exist only in the
    input WAD file. In particular, CleanWAD does not yet parse MAPINFO and
    SNDINFO lumps, so use this option with care on the more recent WADs.

-lm (or -nolm) stands for "loose markers". Turning this option on causes
    CleanWAD to recognize not only 'S' (sprites), 'P' (patches), 'F' (flats),
    'C' (colormaps), 'V' (voices), 'A' (ACS libraries) and 'TX' (auto-textures)
    as valid list markers before "_START" and "_END", but any arbitrary string
    as a list marker (e.g., "QK_START" and "QK_END"). It does this by building a
    symbol table of all the marker types that it encounters.

-nm (or -nonm) stands for "named markers". Turning this option on causes
    CleanWAD to take into account only the name of a lump (or, more
    specifically, the SYNTAX of the name) if loose markers is on ("-lm") and
    there is still any doubt as to whether or not a lump name is a marker. If
    loose markers is on ("-lm") and named markers is on ("-nm") then any lump
    whose name is not a predefined marker, but does have the SYNTAX of a marker,
    is identified as a marker irrespective of all other considerations. If loose
    markers is on ("-lm") and named markers is off ("-nonm") then any lump whose
    name is not a predefined marker but does have the SYNTAX of a marker is
    identified as a marker ONLY if the lump has zero size. The value of this
    option is IGNORED if the loose markers option ("-lm") is turned off.
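The interplay of "-lm" and "-nm" can be sketched in Python as below. The sketch assumes that a marker has the syntax <prefix>_START or <prefix>_END and that the predefined prefixes are exactly those listed under "-lm"; sub-list markers (F1_START) and double-letter markers (PP_START), which CleanWAD handles separately, are not modeled. The names collect_marker_prefixes and is_marker are hypothetical.

```python
import re

# Predefined list prefixes, as listed for the "-lm" option.
PREDEFINED_PREFIXES = {"S", "P", "F", "C", "V", "A", "TX"}
MARKER_SYNTAX = re.compile(r"(.+)_(START|END)")


def collect_marker_prefixes(names):
    """-lm: build a symbol table of every marker prefix seen in a directory."""
    prefixes = set()
    for name in names:
        match = MARKER_SYNTAX.fullmatch(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def is_marker(name, size, loose_markers, named_markers):
    """Decide whether a directory entry is a list marker (sketch)."""
    match = MARKER_SYNTAX.fullmatch(name)
    if match is None:
        return False
    if match.group(1) in PREDEFINED_PREFIXES:
        return True
    if not loose_markers:
        return False  # -nolm: only predefined markers count; -nm is ignored
    if named_markers:
        return True  # -lm -nm: the syntax of the name alone is enough
    return size == 0  # -lm -nonm: marker syntax AND zero size
```

With "-lm" on and "-nm" off, a nonzero-size lump such as Strife's D_END music is not taken as a marker, because only its name has marker syntax.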

*** As regards the "sn", "lm" and "nm" options, some games (e.g., Strife) and
    some PWADs are known to have music entries that have the same syntax as list
    markers; for example "D_END" in Strife. If you get unexpected results when
    cleaning a PWAD then this may be the cause. You can also use the "-g?"
    option to ensure that only names from a particular game are recognized.

-mb (or -nomb) stands for "maintain blockmaps". Turning this option on causes
    CleanWAD to preserve the linedef lists in a blockmap whenever any of the
    "-rb", "-pb" or "-ub" options is on. A blockmap linedef list may include
    duplicate entries and entries that are not in numerical order. The result
    of playing a map with such lists is undefined; many engines can play such a
    map because they only look up the linedefs, but others might ignore
    subsequent linedef numbers once they have found the one being searched
    for. The outcome therefore depends on the engine. Such lists also mean
    that two logically equivalent linedef lists can differ byte-for-byte,
    which prevents blockmap packing ("-pb") and lump reuse ("-pl") from being
    fully effective. For this reason, the blockmap processing logic in
    CleanWAD will by default rebuild any linedef list that includes repeated
    or out-of-order linedef numbers. If your map includes such things
    deliberately (for example, because your engine uses them to mean
    something), you can prevent CleanWAD from doing this with the "-mb"
    option. The value of this option is IGNORED if none of the "-rb", "-pb"
    or "-ub" options are turned on.
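A minimal Python sketch of the default rebuild, assuming each blockmap list is stored as the conventional leading 0x0000 word, the linedef numbers and a 0xFFFF terminator, all as 16-bit little-endian values. Sorting and removing duplicates makes logically equivalent lists byte-identical, which is what allows packing and lump reuse to find them. The helper names are hypothetical and CleanWAD's actual logic may differ.

```python
import struct


def needs_rebuild(linedefs):
    """True if the list has repeated or out-of-order linedef numbers."""
    return any(b <= a for a, b in zip(linedefs, linedefs[1:]))


def normalize_block_list(linedefs):
    """Rebuild the list: each linedef once, in ascending order."""
    return sorted(set(linedefs))


def encode_block_list(linedefs):
    """Serialize as 0x0000, the linedef numbers, then 0xFFFF (uint16 LE)."""
    return struct.pack(f"<{len(linedefs) + 2}H", 0, *linedefs, 0xFFFF)
```

For example, the lists [5, 3, 3, 9] and [3, 9, 5] both normalize to [3, 5, 9] and therefore encode to the same bytes.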

-mp (or -nomp) stands for "maintain pictures". Turning this option on causes
    CleanWAD to preserve the pixel columns in a picture whenever any of the
    "-rp", "-pp" or "-up" options is on. A picture column may include pixel
    runs whose starting row numbers are duplicated, pixel runs whose starting
    row numbers are not in numerical order, or even pixel runs that partially
    overlap! Pixel runs may also be contiguous, which should only ever happen
    when a pixel run is longer than 255 pixels. The result of displaying a
    picture with such columns is undefined; many engines work fine, as long as
    they render each pixel run strictly in the order in which it occurs in the
    column data. However, others might ignore subsequent runs if they have
    already rendered one further down the column. The outcome therefore
    depends on the engine. Such columns also mean that two logically
    equivalent columns can differ byte-for-byte, which prevents picture
    packing ("-pp") and lump reuse ("-pl") from being fully effective. For
    this reason, the picture processing logic in CleanWAD will by default
    rebuild any picture column that includes repeated or out-of-order pixel
    run start row numbers, or whose pixel runs partially or completely overlap
    or are contiguous (except where they break at the standard row 254
    boundary and exist only to extend a 255-pixel run). If your WAD file
    includes such things deliberately (for example, ZDoom uses
    lower-than-before row numbers as part of a means of having pictures taller
    than 509 pixels), you can prevent CleanWAD from doing this with the "-mp"
    option. The value of this option is IGNORED if none of the "-rp", "-pp" or
    "-up" options are turned on.
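The test that decides whether a column gets rebuilt can be sketched in Python as below. Each pixel run is reduced to a hypothetical (start_row, length) pair, in the order stored in the column. For brevity, the sketch does NOT model the row-254 exception described above, so it would also flag contiguous runs that exist only to extend a 255-pixel run; it is an illustration, not CleanWAD's code.

```python
def column_needs_rebuild(posts):
    """posts: (start_row, length) pairs in stored order (sketch)."""
    prev_start = prev_end = None
    for start, length in posts:
        if prev_start is not None:
            if start <= prev_start:
                return True  # duplicated or out-of-order start row
            if start < prev_end:
                return True  # overlaps the previous run
            if start == prev_end:
                return True  # contiguous with the previous run
        prev_start, prev_end = start, start + length
    return False
```

ZDoom-style tall pictures, whose start rows go backwards, are reported as out of order by this test, which is why "-mp" exists.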

-kw (or -nokw) stands for "keep wintex". Turning this option on causes CleanWAD
    to preserve the leftover _DEUTEX_ lump that WinTex puts in PWADs. This lump
    "iz no good" to anyone in a final build of a PWAD and should be deleted. If
    for some reason you want to preserve it then turn the "-kw" option on.

-kf (or -nokf) stands for "keep platForm". Turning this option on causes CleanWAD
    to preserve the leftover PLATFORM lump that some editor puts in PWADs. This
    lump is meaningless in a final build of a PWAD and should be deleted. If for
    some reason you want to preserve it then turn the "-kf" option on.

-kh (or -nokh) stands for "keep history". Turning this option on causes CleanWAD
    to preserve the leftover HISTORY lump that some editor puts in PWADs. This
    lump is meaningless in a final build of a PWAD and should be deleted. If for
    some reason you want to preserve it then turn the "-kh" option on.

-kt (or -nokt) stands for "keep tagdesc". Turning this option on causes CleanWAD
    to preserve the leftover TAGDESC lump that some editor puts in PWADs. This
    lump is meaningless in a final build of a PWAD and should be deleted. If for
    some reason you want to preserve it then turn the "-kt" option on.

-kp (or -nokp) stands for "keep pcsfx". Turning this option on causes CleanWAD
    to preserve the PC sound effect lumps that are present in DOOM, DOOM2, TNT,
    PLUTONIA and STRIFE and which could also exist in some PWADs. These are
    obsolete (since almost everybody has a sound card nowadays) and by default,
    CleanWAD removes any such lumps that it finds. If you are cleaning an IWAD,
    instead of a PWAD, you may need to turn this option on, depending on whether
    your engine REQUIRES these lumps to exist.

-kd (or -nokd) stands for "keep doubles". Turning this option on causes CleanWAD
    to preserve the double first letter in list markers SS_START, SS_END,
    PP_START, PP_END, FF_START and FF_END. These markers are used in old-style
    sprite/flat-in-a-pwad hacks and possibly by other WAD tools. However, they
    are not actually valid markers as far as the games are concerned and such
    hacks are unneeded by modern source ports anyway. By default, CleanWAD will
    correct these markers to their single-letter version.

-kb (or -nokb) stands for "keep borders". Turning this option on causes CleanWAD
    to preserve commonly occurring, but redundant, sub-list border markers. Many
    WADs, including some IWADs, include extra list markers such as F1_START and
    F1_END and some WAD editors use or create these as well. They are usually
    used to distinguish between shareware, registered or commercial resources.
    The engines, however, ignore them; they are not a true part of the WAD
    format and are totally unnecessary. By default, CleanWAD will remove them.

-ke (or -noke) stands for "keep empties". Turning this option on causes CleanWAD
    to keep any structured lists, that is, the type delimited by markers such as
    F_START and F_END, that have no actual entries between the markers; in other
    words, empty lists. By default, CleanWAD will remove empty structured lists.

-tm (or -notm) stands for "tolerate multiples". Turning this option on causes
    CleanWAD to only issue a warning, instead of an error, if it finds multiple
    occurrences of the same structured list in the input file. Some WAD tools do
    not merge entries into one list when importing, but create a separate list
    WITHIN a file for each group of entries of the same type. This is, strictly
    speaking, an error; however, some source ports tolerate this organization,
    usually as an unintentional by-product of the way they handle structured
    lists BETWEEN files (such as when loading sprites and/or flats from a PWAD).
    The "-tm" option tells CleanWAD to warn the user of it, but not disallow it.

-qm (or -noqm) stands for "quiet multiples". This option is exactly the same as
    option "tolerate multiples" ("-tm"), but also suppresses any warning
    messages that would be generated by option "tolerate multiples" ("-tm").
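Detecting multiples and choosing how loudly to report them might look like the Python sketch below. It assumes that a repeated list shows up as a repeated <prefix>_START entry in the directory; the function names and the "error"/"warning"/"silent" labels are illustrative only.

```python
from collections import Counter


def find_multiple_lists(names):
    """Prefixes of structured lists that are opened more than once."""
    starts = Counter(name[:-len("_START")] for name in names
                     if name.endswith("_START") and len(name) > len("_START"))
    return sorted(prefix for prefix, count in starts.items() if count > 1)


def multiples_severity(tolerate, quiet):
    """Map the -tm/-qm settings to a reporting level."""
    if quiet:
        return "silent"  # -qm: like -tm, but without the warning
    if tolerate:
        return "warning"  # -tm
    return "error"  # default
```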

*** As regards multiples, CleanWAD should ideally merge such lists; however,
    doing so requires duplicating much code of the various source ports and is
    not strictly necessary. It will therefore not appear in CleanWAD any time
    soon unless a lot of free time becomes available to the Author (or unless
    some third party contributes the necessary code :) ).

-tl (or -notl) stands for "tolerate links". Turning this option on causes
    CleanWAD to only issue a warning, instead of an error, if it finds lump
    reuse in the input file when it is not expecting any. The reuse of lumps
    between directory entries is, strictly speaking, an error and by default,
    CleanWAD treats it as such. However, if you turn on any of the options
    "-rl", "-pl" or "-ul" you are, in effect, telling CleanWAD (a) to expect the
    possibility and (b) how you want it handled if it occurs (there will always
    be a diagnostic issued for each lump under such circumstances, but as an
    action or detail, rather than as a warning). However, if you do not do that,
    then CleanWAD will treat any lump reuse that it finds as an error, unless
    "-tl" is turned on, in which case it issues a warning. In such a situation,
    whether treated as a warning or an error, the reuse of the lumps will be
    subjected to the default lump reuse handling which, as stated above, is
    equivalent to "-ul". The value of this option is IGNORED if any of the
    "-rl", "-pl" or "-ul" options are turned on.

-ql (or -noql) stands for "quiet links". This option is exactly the same as
    option "tolerate links" ("-tl"), but also suppresses any warning messages
    that would be generated by option "tolerate links" ("-tl").
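The reporting rules for unexpected lump reuse can be summarized as a small Python decision function. The option set (names without the leading dash), the return labels and the assumption that at most one of "-rl", "-pl" and "-ul" is given are all conveniences of this hypothetical sketch, not CleanWAD's internals.

```python
def reuse_handling(opts):
    """Return (how reuse is reported, how it is handled) for a set of options."""
    for action in ("rl", "pl", "ul"):
        if action in opts:
            # Reuse is expected: reported per lump as an action or detail.
            return ("detail", action)
    # Reuse is unexpected: default handling is equivalent to -ul.
    if "ql" in opts:
        return ("silent", "ul")
    if "tl" in opts:
        return ("warning", "ul")
    return ("error", "ul")
```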

*** You might ask why the "-tl" and "-ql" options are provided at all, why not
    just give one of "-rl", "-pl" or "-ul" every time you run the program? The
    reason is that you may want, like the author, to treat lump reuse as an
    error and stop the program if any is found or sometimes, depending on the
    situation, to warn about it and unpack it. Depending on the verbosity level,
    giving "-ul" does not necessarily tell you that the reuse was there, hence
    the need for a warning or error message if you want to be notified of it.

-tc (or -notc) stands for "tolerate conflicts". Turning this option on causes
    CleanWAD to only issue a warning, instead of an error, if it finds lump
    reuse in the input file when it IS expecting the possibility, but finds it
    occurring between lumps that are ineligible for reuse (either individually or
    because the pairing itself is ineligible). For example, attempting to reuse
    a graphic font lump between directory entries that occur in different
    Heretic/HeXen font lists is fine, because the entry types are physically the
    same; however, attempting to reuse a sprite for a sound would normally be a
    severe error. Nevertheless, many PWADs include dummy entries for blank
    sounds and so forth, usually in conjunction with a DeHackEd patch; while
    some source ports can even use sprite graphics as wall patches. The "-tc"
    option, in effect, tells CleanWAD that the PWAD author may have actually
    meant to do it and therefore to warn if it happens, but not disallow it.

-qc (or -noqc) stands for "quiet conflicts". This option is exactly the same as
    option "tolerate conflicts" ("-tc"), but also suppresses any warning
    messages that would be generated by option "tolerate conflicts" ("-tc").

*** Attempting to use options "-tc" or "-qc" without also using one of options
    "-rl", "-pl", "-ul", "-tl" or "-ql" triggers an error message. This is
    because "-tc" and "-qc" are actually qualifiers to the other five lump reuse
    options and not strictly options in their own right; you would, in effect,
    be telling CleanWAD HOW to qualify without telling it WHAT to qualify.

-uc (or -nouc) stands for "unpack conflicts". Turning this option on causes
    CleanWAD to write out each lump that is a part of an ineligible lump reuse
    wherever ineligible reuse is being tolerated; i.e., whenever (one of "-tc"
    or "-qc") is turned on along with either ("-tl") or (one of "-rl", "-pl" or
    "-ul"). It is equivalent to "-ul" but for those lumps ONLY, in which case,
    it overrides whichever of "-rl" or "-pl" may have been turned on. In effect,
    this option allows ineligible lump reuse to be removed without interfering
    with other lump reuse. The value of this option is IGNORED if neither of the
    "-tc" or "-qc" options is turned on.
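The rules for "-tc", "-qc" and "-uc" can be sketched in Python as a validation step that rejects a qualifier with nothing to qualify, followed by the choice of handling for a lump that is part of an ineligible reuse. Options are given as a set of names without the leading dash; the function names are hypothetical, and at most one of "-rl", "-pl" and "-ul" is assumed to be given.

```python
REUSE_ACTIONS = ("rl", "pl", "ul")
REUSE_OPTIONS = {"rl", "pl", "ul", "tl", "ql"}
CONFLICT_OPTIONS = {"tc", "qc"}


def validate_options(opts):
    """-tc/-qc only qualify another lump reuse option."""
    if opts & CONFLICT_OPTIONS and not opts & REUSE_OPTIONS:
        raise ValueError("-tc/-qc need one of -rl, -pl, -ul, -tl or -ql")


def conflict_handling(opts):
    """Handling for a lump involved in an *ineligible* reuse (sketch)."""
    validate_options(opts)
    if not opts & CONFLICT_OPTIONS:
        return "error"  # ineligible reuse is not being tolerated
    if "uc" in opts:
        return "ul"  # -uc: unpack these lumps only
    for action in REUSE_ACTIONS:
        if action in opts:
            return action  # same handling as for any other reuse
    return "ul"  # -tl/-ql alone: the default handling is -ul
```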

*** Note that specifying any of ("-tl", "-ql", "-tc" or "-qc") does not by
    itself cause any existing lump reuse to be preserved; these options merely
    specify how seriously to treat lump reuse, not what to do with it when it is
    found (that is what the options "-rl", "-pl", "-ul" and "-uc" are for).

-rs (or -nors) stands for "remove scripts". Turning this option on causes
    CleanWAD to remove any SCRIPTS or SCRIPTnn entries (where nn is 01..99) from
    the file. Several HeXen and ZDoom WADs include with each map a script entry
    that is merely a disassembly of the BEHAVIOR lump of that map and not really
    true source code. Existing ACS disassemblers have severe difficulty with
    ZDoom ACS code and usually get it wrong, which is not really surprising
    given the new p-codes available and the way in which they are arranged. Even
    with original HeXen scripts, several WAD editors have trouble with them and
    often misplace them in the WAD. UNLESS THEY ACTUALLY CONTAIN ORIGINAL
    SCRIPT SOURCE CODE, you can get rid of them and try to disassemble the
    corresponding BEHAVIOR yourself. This also saves space in the output file.
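The naming rule above (SCRIPTS, or SCRIPT followed by 01 to 99) can be expressed as a regular expression. This Python sketch is illustrative and the function name is hypothetical; a name test alone cannot distinguish HeXen script source from Strife conversation scripts that follow the same naming.

```python
import re

# SCRIPTS, or SCRIPTnn where nn is 01..99.
SCRIPT_LUMP = re.compile(r"SCRIPT(S|0[1-9]|[1-9][0-9])")


def is_script_lump(name):
    return SCRIPT_LUMP.fullmatch(name) is not None
```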

*** This option should be used with caution on STRIFE files; conversation
    scripts share the same naming conventions as those used by some WAD editors
    for HeXen script source lumps. CleanWAD does not simply disallow this
    situation, because it might one day be possible to make a "Strife in HeXen
    format" WAD in which ACS is used for everything (including NPC interaction).

*** Do not use this option on any WAD that includes both types of script,
    because CleanWAD will not necessarily be able to tell the two types apart.

-dp (or -nodp) stands for "declassify pnames". Turning this option on causes
    CleanWAD to not classify PNAMES lumps specially, but to treat them as normal
    unclassified lumps. By default, CleanWAD places PNAMES in its own list and
    places that list just after any TEXTURE lumps, the net effect being to treat
    PNAMES as if it were part of the textures (i.e., the way WinTex does).

-lh (or -nolh) stands for "loose headers". Turning this option on causes
    CleanWAD to attempt to recognize nonstandard map names as well as E?M? and
    MAP??. This has some limitations as it (for now) does not involve reading
    MAPINFO or DDFLEVL lumps. These issues are explained in appendix 1.

-qh (or -noqh) stands for "quiet headers". Turning this option on causes
    CleanWAD to not issue a warning for any map name header lumps that do not
    have zero size. Unless your WAD contains FraggleScript, or whatever, map
    name header lumps are, by convention, expected to be zero length objects
    that exist only in the WAD file directory and are not present as actual data
    in the file; hence, by default, CleanWAD warns you about any that do not
    have zero size.

-fr (or -nofr) stands for "force removal". Turning this option on causes
    CleanWAD to always remove duplicate entries when option "-rd" is turned on,
    even if the entries are in different lists. By convention, there should
    never be entries with the same name but in different lists and it makes
    sense not to allow such ambiguities to occur. However, if any are found,
    CleanWAD will by default issue a warning and not remove the duplicate. You
    can force CleanWAD to remove such duplicates by turning the "-fr" option on.
    The value of this option is IGNORED if the "-rd" option is turned off.

-iv (or -noiv) stands for "identify voices". Turning this option on causes
    CleanWAD to identify any standard Strife voice sounds (from VOICES.WAD) that
    are present in the WAD by NAME, even if they do not appear between V_START
    and V_END markers. These sounds are, in fact, all in standard DOOM format
    and it is easy to envisage that a WAD author might someday replace them in
    a PWAD but without the markers. This option causes SOUNDS and VOICES having
    these names to be considered eligible for duplicate removal and lump reuse.
    The value of this option is IGNORED if the game is not Neutral or Strife.

Heretic and HeXen Problems
------- --- ----- --------

If CleanWAD attempts to process a graphic with an invalid header, then it will
issue a warning and not modify the graphic. This applies to all graphics,
irrespective of source. However, some Heretic IWAD pictures (CREDIT, E2END,
FINAL1, FINAL2, HELP1, HELP2, TITLE) and some HeXen IWAD pictures (CREDIT,
FINALE1, FINALE2, FINALE3, HELP1, HELP2, INTERPIC, TITLE) are in fact pure
screen, or "page" images, not DOOM format (i.e., no header but just a bitmap).
This is a serious problem, because although in these IWAD pictures, none of the
bytes in the position where the image header would be are valid height or width
values, THE SAME MIGHT NOT BE TRUE WHERE THEY ARE REPLACED IN A PWAD.

You cannot even replace these pictures after converting them into proper DOOM
format versions, because Heretic, and several source ports for that matter,
EXPECT the images to be in this format; at the time of writing, ZDoom 2.0.63 and
CleanWAD itself are the only known programs that can attempt to auto-detect the
format of these images and of course, auto-detection is not GUARANTEED to work.

Of course, CleanWAD cannot (and should not) just ignore ANY pictures so named,
because the DOOM and DOOM II IWADs have similarly named resources that ARE real
DOOM pictures. In other words, there is no way to RELIABLY detect this condition
from the image contents alone. By SHEER LUCK, as mentioned above, with the
original Heretic CREDIT graphic (for example), you will get the message:

  WARNING: Picture CREDIT has an invalid header -- not processed

if you try this on the Heretic IWAD and similar results hold for HeXen. However,
if a PWAD replaces them with a different graphic in the same format and in which
the first eight bytes of color (pixel) information form 16-bit words in the
same range as proper height and width values, then CleanWAD could wrongly treat
them as DOOM format pictures AND CORRUPT THEM!

As listed previously, there are three options that control how CleanWAD handles
this situation, all of which are ON by default. They are IGNORED if the lump
name is not one of the special Heretic/HeXen screen names discussed herein or if
the lump has already been identified by name alone as a graphic or page. In the
table below, the DOOM2 settings also apply to TNT and PLUTONIA and the HEXEN
settings also apply to DEATHKINGS. These options are combined as follows:

  in  ip  ig  | UNKNOWN  DOOM    DOOM2   HERETIC HEXEN   STRIFE
  ------------+------------------------------------------------
  off off off | INVALID  INVALID INVALID INVALID INVALID INVALID
  off off on  | GRAPHIC  GRAPHIC GRAPHIC GRAPHIC GRAPHIC GRAPHIC
  off on  off | PAGE     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
  off on  on  | AUTO     GRAPHIC GRAPHIC AUTO    AUTO    GRAPHIC
  on  off off | LUMP     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
  on  off on  | GRAPHIC  GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
  on  on  off | PAGE     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
  on  on  on  | AUTO     GRAPHIC GRAPHIC PAGE    PAGE    GRAPHIC
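The table can be encoded directly as a lookup, as in this Python sketch. It only applies to the special Heretic/HeXen screen names and assumes that the lump has not already been identified by name; the function and parameter names are hypothetical, and the TNT/PLUTONIA and DEATHKINGS aliases follow the note preceding the table.

```python
GAMES = ("UNKNOWN", "DOOM", "DOOM2", "HERETIC", "HEXEN", "STRIFE")
ALIASES = {"TNT": "DOOM2", "PLUTONIA": "DOOM2", "DEATHKINGS": "HEXEN"}

# Rows keyed by the (in, ip, ig) settings; columns follow GAMES.
TABLE = {
    (False, False, False): ("INVALID",) * 6,
    (False, False, True): ("GRAPHIC",) * 6,
    (False, True, False): ("PAGE", "GRAPHIC", "GRAPHIC", "PAGE", "PAGE", "GRAPHIC"),
    (False, True, True): ("AUTO", "GRAPHIC", "GRAPHIC", "AUTO", "AUTO", "GRAPHIC"),
    (True, False, False): ("LUMP", "GRAPHIC", "GRAPHIC", "PAGE", "PAGE", "GRAPHIC"),
    (True, False, True): ("GRAPHIC", "GRAPHIC", "GRAPHIC", "PAGE", "PAGE", "GRAPHIC"),
    (True, True, False): ("PAGE", "GRAPHIC", "GRAPHIC", "PAGE", "PAGE", "GRAPHIC"),
    (True, True, True): ("AUTO", "GRAPHIC", "GRAPHIC", "PAGE", "PAGE", "GRAPHIC"),
}


def classify_screen_lump(opt_in, opt_ip, opt_ig, game):
    """Look up how a special screen-named lump is treated."""
    game = ALIASES.get(game, game)
    return TABLE[(opt_in, opt_ip, opt_ig)][GAMES.index(game)]
```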

Meanwhile, the sound GFRAG in Heretic gives similar results:

  WARNING: Sound GFRAG has an invalid header -- not processed

and indeed, a hex dump of GFRAG shows that it genuinely does not resemble a
standard sound resource. In fact, the first half is just one character repeated
over and over.  According to the Heretic source code, GFRAG indeed IS a sound
and it gets used in multiplayer games when somebody gets fragged. This is
believed to be an error in the IWAD file so no special provision is made for it;
in any case, if you never play multiplayer games then you would never hear this
sound anyway. Of course, if you ever replace it in a PWAD, use a normal sound.



Structured List Problems
---------- ---- --------

Another issue that may cause confusion is how CleanWAD handles entries like
S_START and FF_END.  The assumption is that the '*_START' and '*_END' entries in
a WAD must form something similar to a set of matched parentheses.  In other
words, if every type of start and end represents a unique type of parenthesis,
based on the characters preceding the underscore, then the WAD must look like a
proper algebraic expression, with data entries for variables and operators.
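Treating each prefix as its own kind of parenthesis, the check reduces to the classic stack-based bracket matcher. This Python sketch reports problems rather than repairing them, and it treats every name of the form <prefix>_START or <prefix>_END as a marker, so it would also flag a music lump such as Strife's D_END; the function name is hypothetical.

```python
import re

MARKER = re.compile(r"(.+)_(START|END)")


def check_nesting(names):
    """Return a list of problems; empty if the markers nest properly."""
    stack = []
    problems = []
    for index, name in enumerate(names):
        match = MARKER.fullmatch(name)
        if match is None:
            continue  # a data entry: the "variables and operators"
        prefix, kind = match.groups()
        if kind == "START":
            stack.append(prefix)
        elif not stack:
            problems.append(f"{name} (entry {index}) has no matching {prefix}_START")
        else:
            opened = stack.pop()
            if opened != prefix:
                problems.append(f"{name} (entry {index}) closes {opened}_START")
    problems.extend(f"{prefix}_START is never closed" for prefix in stack)
    return problems
```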

However, in real WADs out there, this rule is broken all the time, sometimes
because of "ye olde flat hack" that was used in the days before source ports,
and sometimes for no known reason. Then there are F1_START, P2_END and similar
markers that are used by some editors and indeed by the Heretic and HeXen IWADs
but not by the games. In any case, these kludges are not needed by source ports.

Thus, as of version 1.1, CleanWAD provides various options to normalize this
insanity into clean lists (for details, refer to the documentation earlier on).

CleanWAD does, however, detect various conditions that are definitely errors;
this is possible because the rules for structured lists are better understood
than when Serge Smirnov first wrote CleanWAD and because we now have the source
code of the games (except Strife, but that behaves like DOOM II) and thus know
how the markers are supposed to be used.



Other Problems
----- --------

(1) CleanWAD does not yet handle Legacy SKIN format lists. The author is not
    really sure what the DOOM community would want done with them, but any
    special treatment could be really hard work. Any suggestions???

(2) CleanWAD cannot parse MAPINFO and SNDINFO lumps and so name-based
    identification of sounds and musics other than those in the original IWADs
    will fail. This also applies, for those ports that support them, to maps
    that are not named E?M? and MAP?? (for example, see QDOOM for EDGE), unless
    the -lh (loose headers) option is used.

(3) CleanWAD has no "magic" power to prevent the user from specifying the input
    file as the output file. The basic check is made that the canonicalized file
    names are not the same; however, this will fail if one is a hard link to the
    other. To properly check, in a cross-platform compatible way, that one file
    is not a link to the other requires major effort that is way beyond the
    scope of this version of the program. If CleanWAD behaves badly after
    appearing to open the files correctly, then this might be the problem.
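For comparison, Python's standard library can catch hard links in a few lines, because os.path.samefile compares device and inode numbers, which hard links share. This is a sketch of the idea, not CleanWAD's code, and it relies on the platform reporting meaningful device and inode numbers (true on POSIX systems; behaviour elsewhere depends on the Python version and file system). The function name same_file is hypothetical.

```python
import os


def same_file(input_path, output_path):
    """True if both paths name the same file, including via hard links."""
    real_in = os.path.realpath(input_path)
    real_out = os.path.realpath(output_path)
    if os.path.normcase(real_in) == os.path.normcase(real_out):
        return True  # the "canonicalized names" check
    try:
        return os.path.samefile(real_in, real_out)  # device + inode check
    except OSError:
        return False  # e.g. the output file does not exist yet
```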

(4) CleanWAD does not know the "intrinsic" ordering of STRIFE musics, as the
    source is not available and the author has not yet played through the game.
    If anybody knows which of the musics, other than D_INTRO and D_LOGO, are
    system and level musics, the author would appreciate being informed :-)

(5) CleanWAD is not intended to handle hi-resolution graphics as used in source
    ports like JDoom and the Doom Retexturing Project. If your WAD includes any
    of these, be sure to turn the "-rp", "-pp" and "-up" options off.

(6) The sounds identified using the "-iv" option are kept in the "normal" list
    of sounds and no V_START/V_END markers are created. It is not certain that
    this is a problem, because nobody knows for sure what STRIFE would expect for
    VOICES added via a PWAD.

(7) The "detect sounds" ("-ds") option currently recognises only DOOM DMX and
    Microsoft(tm) WAVE formats; the "detect musics" ("-dm") option currently
    recognises only DOOM MUS and Standard MIDI formats; the "detect graphics"
    ("-dg") option currently recognises only DOOM picture format. It is intended
    that these options will in due course be extended to recognize the more
    exotic sound, music and graphic types supported by the source ports.
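The signatures involved are simple enough to sketch. In this Python example, MUS lumps start with "MUS" followed by 0x1A, Standard MIDI files with "MThd", and WAVE files with "RIFF" plus "WAVE" at offset 8; a DMX sound is assumed to start with a 16-bit format number of 3, a 16-bit sample rate and a 32-bit sample count. The function name detect_format is hypothetical, and CleanWAD's real checks may be stricter.

```python
import struct


def detect_format(lump: bytes) -> str:
    """Guess a lump's sound or music format from its header (sketch)."""
    if lump[:4] == b"RIFF" and lump[8:12] == b"WAVE":
        return "WAVE"
    if lump[:4] == b"MUS\x1a":
        return "MUS"
    if lump[:4] == b"MThd":
        return "MIDI"
    if len(lump) >= 8:
        fmt, rate, count = struct.unpack_from("<HHI", lump, 0)
        # DMX: format 3, a nonzero rate, and a count that fits in the lump.
        if fmt == 3 and rate > 0 and 8 + count <= len(lump):
            return "DMX"
    return "UNKNOWN"
```

A lump like Heretic's GFRAG, whose leading bytes are one repeated character, falls through to "UNKNOWN".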



Bug reports
--- -------

CleanWAD may crash if the input file contains garbage - that is, pointers to
places past the end of file, or entries whose names imply a certain type but
contain something else. Such behavior is acceptable if the reason is that the
input file is corrupt. If CleanWAD crashes while processing a file that you
believe is a valid WAD file, send it to the author so that he can attempt to
figure out what is wrong.

CleanWAD should never write to uninitialized places in memory, no matter what
the input file is. A lot of effort has gone into making CleanWAD bulletproof
in this sense, so if it locks up your computer, let the author know so that
he can attempt to fix it.

A lot of effort has been made to ensure that whatever CleanWAD produces works
exactly like the original file. Of course it might happen that somebody finds a
WAD that would be damaged by CleanWAD. If you're one such lucky person, don't
hesitate to email the author.

Finally, feel free to report anything you find annoying about using CleanWAD.
This includes things like excessive output to screen, confusing error/warning
messages, bad command line design, or anything that you think can be improved.



Credits
-------

ID Software      -- wrote DOOM and DOOM II and released the source code
Raven Software   -- wrote Heretic and HeXen and released the source code
Matt Fell        -- wrote The Unofficial Doom Specs
Olivier Montanuy -- gave Serge Smirnov the idea and suggested some of the design
Serge Smirnov    -- the original author of CleanWAD who worked so hard on it
Loads of people  -- wrote the source ports that make gaming so much better
Jacob Navia      -- wrote the freeware C compiler Lcc-Win32
Reign            -- for the music I mostly listened to while doing this :-)



Appendix 1:  Non-Standard Map Name Detection
-----------  -------------------------------

Some source ports allow nonstandard map names, that is, maps whose name is not
E?M? or MAP??. There is thus no longer a simple way of detecting the start of a
map data sequence. It is the opinion of the author that nonstandard map names
are a Bad Thing (TM) as they make things UNNECESSARILY complicated.

As a professional programmer for over a decade (and an amateur long before
that), it is the author's opinion that doing something that looks "cool" should
not be done "just because we can" if it involves a severe amount of effort to
support yet doesn't provide any tangible benefit. The author further believes
that nonstandard map names (in this format at least) are not TRULY necessary and
that extending a standard should be done by evolution rather than by revolution.

However, this particular genie is already out of Pandora's box and must be
supported to a reasonable extent. Even so, nonstandard map
names are rare at the time of writing and it is thus wasteful to expend
disproportionate effort on supporting them.

This is how CleanWAD handles them; any suggestions for improvements are welcome.

REQUIREMENT:
	Detect nonstandard map names, WITHOUT relying on being able to parse a
	MAPINFO lump, DDFLEVL lump or whatever.
REASON:
	Nonstandard map names are a Bad Thing (TM) in the opinion of the author,
		as they make things UNNECESSARILY complicated. However, this
		particular genie is already out of Pandora's box.
	Nonstandard map names are rare at the time of writing and it is thus
		wasteful to expend disproportionate effort on supporting them.
	Parsing control information lumps involves messing around with code from
		source ports or else reverse-engineering them.
	One cannot predict future control information lumps in other ports.

ASSUMPTION (1)
	Any contiguous list of THINGS, SIDEDEFS, LINEDEFS, etc., is always
	preceded by a map name header that either is a standard map name or else
	has zero size.
REASON:
	The DOOM family games mandate a map header before the map data entries.
	This must be zero size, except in Doom Legacy where it might contain
		FraggleScript code. However, in Doom Legacy, nonstandard map
		names are not allowed and thus no ambiguity can arise.

ASSUMPTION (2)
	Any lump (a) of zero size (b) whose name is not a standard map name and
	(c) that immediately precedes a contiguous list of THINGS, SIDEDEFS,
	LINEDEFS, etc., is a map name header or else the WAD is broken and needs
	to be manually corrected.
REASON:
	The reasons in ASSUMPTION (1) above mean that if ASSUMPTION (2) is not
		true at any point, the only valid reason is a broken WAD file.

ASSUMPTION (3)
	Any contiguous list of GL_VERT, GL_SEGS, GL_SSECT, etc., is always
	preceded by an OpenGL map name header that either is a standard [base]
	map name or else has zero size.
REASON:
	The DOOM family OpenGL compliant source ports mandate an OpenGL map
		header before the OpenGL map data entries.
	This must be zero size [this is a property of the standard used by DOOM
		family OpenGL compliant source ports].

ASSUMPTION (4)
	Any lump (a) of zero size (b) whose name begins "GL_" and (c) whose
	supposed base name is not a standard map name and (d) that immediately
	precedes a contiguous list of GL_VERT, GL_SEGS, GL_SSECT, etc., is an
	OpenGL map name header or else the WAD is broken and needs to be
	manually corrected.
REASON:
	The reasons in ASSUMPTION (3) above mean that if ASSUMPTION (4) is not
		true at any point, the only valid reason is a broken WAD file.

ASSUMPTION (5)
	Any OpenGL map name header immediately following a contiguous list of
	THINGS, SIDEDEFS, LINEDEFS, etc., must have the same base name as the
	map preceding it and must be the OpenGL entries for that map [that is,
	if the normal map name lump is MYMAP then the OpenGL map name lump MUST
	be GL_MYMAP].
REASON:
	Inside a GWA file (a normal PWAD containing ONLY OpenGL map data), no
		other entry types than OpenGL data are allowed [this is a
		property of GWA files] and thus no ambiguity arises.
	Inside a PWAD file that contains OpenGL data, each normal map data
		sequence must be immediately followed by its own OpenGL map data
		sequence [this is a property of the standard used by DOOM family
		OpenGL compliant source ports].

ASSUMPTION (6)
	Any normal map data sequence that also has an OpenGL map data sequence
	would be unable to have a name longer than five characters irrespective
	of whether the map name is standard or nonstandard.
REASON:
	WAD file entry names are eight characters or fewer [this is a property
		of all DOOM family games].
	The OpenGL map header name format reserves three characters for a "GL_"
		prefix thus leaving only five available for the map name.
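ASSUMPTIONS (5) and (6) can be made concrete with a short C sketch, assuming
NUL-terminated, upper-case lump names. Both function names are hypothetical.
make_gl_header_name() derives the OpenGL header name from a map name and
refuses names longer than five characters; gl_header_matches_map() checks that
an OpenGL header following a map is that map's own GL_ header.

```c
#include <stdbool.h>
#include <string.h>

/* Builds "GL_" + map name into out (9 bytes: 8 name chars + NUL).
   Returns false if the map name cannot carry OpenGL data because
   "GL_" plus the name would exceed the 8-character lump name limit. */
static bool make_gl_header_name(const char *map, char out[9])
{
    size_t len = strlen(map);
    if (len == 0 || len > 5)
        return false;
    memcpy(out, "GL_", 3);
    memcpy(out + 3, map, len + 1);   /* copies the terminating NUL too */
    return true;
}

/* ASSUMPTION (5): an OpenGL header that follows map MYMAP must be GL_MYMAP. */
static bool gl_header_matches_map(const char *map, const char *gl)
{
    char expected[9];
    return make_gl_header_name(map, expected) && strcmp(expected, gl) == 0;
}
```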

ASSUMPTION (7)
	Any source ports that wish to support nonstandard map names AND the
	OpenGL standard used by the DOOM family OpenGL compliant source ports
	would have to accept that the only way to have a map name that is
	greater than five characters in length would be to forgo having OpenGL
	data for that map.
REASON:
	If this were not true, such a map name would violate the standard used
		by the DOOM family OpenGL compliant source ports, thus rendering
		the source port in question noncompliant with the standard.



Appendix 2:  Blockmap and Picture Optimization
-----------  ---------------------------------

The purpose of OptimizeBlockmap() and OptimizePicture() is to rebuild a resource
and eliminate any holes in it. Both blockmaps and pictures consist of a header
followed by pointers to blocks of data, followed by the actual data. This means
that the actual data could be located anywhere in the lump after the pointers
and thus that the size of the lump could include unused areas that are not
pointed to by anything at all. For example, this can occur where several
pointers point to only one data portion (because it is identical to some of the
others), but the other copies of that portion are still present even though they
are ignored because nothing points to them. These functions get rid of any holes
by copying out only data that is actually pointed to by something.

To conserve space, the functions can also identify identical blocks of data and
copy the data only once, adjusting the pointers as necessary. This can greatly
reduce the amount of space required for blockmaps and pictures while preserving
their functionality: instead of two copies of the same data, there are two or
more pointers pointing to the same physical data.

The original author got the idea from node builders that do blockmap packing and
the current author has seen the same method manifest in other tools (e.g., WARM)
as LINEDEF sharing. Such packing is completely lossless in itself, but the data
must be unpacked in order to modify (instead of just use) the resources.

Resources packed in this fashion can be unpacked using the same function that
was used to pack them and this feature is provided for two reasons: (a) although
highly unlikely, some day somebody may write a program that does not work with
packed resources and (b) implementing it was simple.

For speed efficiency reasons, the functions were not coded in an easily
understandable way. For that reason, here is some pseudocode describing how they
work. Here, the word "block" refers to a block of data, which is either a list
of linedefs (in a blockmap) or a column (in a picture).

for (every block)
{
    if (unpack option is on)
    {
        Copy the data for this block, whether or not it duplicates any
        other block. The pointer for this block points to it. No other
        block's pointer will ever point to this copy, under any
        circumstances.
    }
    else
    {
        if (pack is on) AND (block is identical to a previous block)
        {
            The pointer to that previous block can be used for this block:
            make the pointer for this block a copy of the pointer to that
            previous block. The pointer is duplicated, but not the data.
        }
        else
        {
            Copy the data for this block, as it is different. The pointer
            for this block points to it. No other block's pointer points to
            this copy (unless a later block is found which duplicates this
            one).
        }
    }
}
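The pack and unpack branches of the pseudocode above might look like this in C.
This is a minimal sketch, not the actual OptimizeBlockmap() code: the block_ref
type and rebuild_blocks() are hypothetical, blocks are compared with plain
memcmp(), the search for an earlier identical block is a simple linear scan,
and the offsets written to ptr[] are relative to the start of the data area
rather than to the start of the lump, as they would be in a real blockmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One block as seen through its pointer in the source resource. */
typedef struct {
    const uint16_t *data;  /* start of the block's data */
    size_t          len;   /* number of 16-bit words in the block */
} block_ref;

/* Copies the blocks into out[] and stores each block's word offset in ptr[].
   pack == true: a block identical to an earlier one reuses that pointer.
   pack == false: every block gets its own copy (the unpack case).
   out[] must hold the worst case, i.e. the sum of all block lengths.
   Returns the number of words written to out[]. */
static size_t rebuild_blocks(const block_ref *blocks, size_t nblocks,
                             bool pack, uint16_t *out, size_t *ptr)
{
    size_t used = 0;
    for (size_t i = 0; i < nblocks; i++) {
        bool reused = false;
        if (pack) {
            for (size_t j = 0; j < i; j++) {
                if (blocks[j].len == blocks[i].len &&
                    memcmp(blocks[j].data, blocks[i].data,
                           blocks[i].len * sizeof(uint16_t)) == 0) {
                    ptr[i] = ptr[j];   /* duplicate the pointer, not the data */
                    reused = true;
                    break;
                }
            }
        }
        if (!reused) {                 /* copy the data out for this block */
            memcpy(out + used, blocks[i].data, blocks[i].len * sizeof(uint16_t));
            ptr[i] = used;
            used += blocks[i].len;
        }
    }
    return used;
}
```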

It may be hard to believe that OptimizeBlockmap() and OptimizePicture() are
really that simple, but that's basically what they do. The only real difference
is that in OptimizePicture(), each block is a column consisting of zero or more
posts (a "post" is a vertical run of non-transparent pixels) followed by a 0xFF
end-of-column marker, and every block is composed of unsigned bytes instead of
unsigned 16-bit integers.
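Because a picture column has no stored length, its extent has to be found by
walking its posts before it can be compared with another column or copied out.
A minimal C sketch follows. column_size() is a hypothetical helper, and the post
layout it assumes (top delta, length, an unused byte, the pixels, another unused
byte) is the standard DOOM picture format.

```c
#include <stddef.h>

/* Returns the size in bytes of the column starting at col, including the
   0xFF terminator, or 0 if the column would run past limit bytes. */
static size_t column_size(const unsigned char *col, size_t limit)
{
    size_t pos = 0;
    for (;;) {
        if (pos >= limit)
            return 0;                     /* ran off the end: no terminator */
        if (col[pos] == 0xFF)
            return pos + 1;               /* end-of-column marker */
        if (pos + 2 > limit)
            return 0;                     /* post header is truncated */
        pos += 4 + (size_t)col[pos + 1];  /* 3 header bytes + pixels + 1 pad */
    }
}
```

Two columns can then be treated as identical when their sizes match and a
memcmp() over that many bytes finds no difference.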

In both cases the "in" argument is a pointer to a copy of the original
resource, and the "out" argument is a pointer to a pointer to the (available)
bytes preallocated for the processed version. If the result is larger than the
original (which can only happen when unpacking a resource), OptimizeBlockmap()
or OptimizePicture() will reallocate the destination. Hence the need for double
indirection.
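The double indirection is needed because realloc() may move a buffer, so any
function that grows the buffer must be able to update the caller's pointer. The
helper below is a hypothetical illustration of that pattern, not CleanWAD's
actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Appends n bytes from src to *out, which holds *used bytes out of *cap
   allocated. Grows the buffer with realloc() when needed and stores the
   (possibly moved) buffer back through out. Returns 0 on success, or -1
   if memory runs out, in which case the old buffer is left intact. */
static int append_bytes(unsigned char **out, size_t *used, size_t *cap,
                        const void *src, size_t n)
{
    if (*used + n > *cap) {
        size_t newcap = *cap ? *cap : 64;
        while (newcap < *used + n)
            newcap *= 2;
        unsigned char *p = realloc(*out, newcap);
        if (p == NULL)
            return -1;
        *out = p;        /* this is why a pointer to a pointer is passed */
        *cap = newcap;
    }
    memcpy(*out + *used, src, n);
    *used += n;
    return 0;
}
```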



Appendix 3:  The Canonical Forms of Blockmaps and Pictures
-----------  ---------------------------------------------

When a data structure or other object has several physical forms that are all
equivalent, it is logical to ask, "Is there a natural-and-logical form of a
particular object that is always the same, whichever specific version of the
object you have at the moment? Like, all the parts of the object appear in the
same order, given explicitly in full, with no unintentional duplicates and no
wasted space? Is there a sort-of master version of an object that I can always
guarantee to get from the specific version I have, no matter which one it is?"

The proper term for the "natural-and-logical" form of something, or if you like,
the "master" version of it, is the "canonical" form. For example, consider a
directory DIR1 that contains a file called FILE.EXT. When in that directory, you
would only need to refer to the file as "FILE.EXT"; indeed, depending on the
file's type, you might only need to refer to FILE without the extension. You may
also refer to the file as ".\FILE" (or on Linux or Cygwin as "./FILE") or, if
DIR1 contains a subdirectory named DIR2, from DIR2 you would refer to it as
"..\FILE.EXT". However, you will also be familiar with the concept of a "full
path from the root directory", in this example, "C:\DIR1\FILE.EXT". It is a way
of specifying the file that works *every* time whatever directory you are in,
contains no redundancy or indirection and in which everything is specified
literally in logical order. In other words, it is the "canonical form" of that
file's name. If you want to know whether two file names refer to the same file,
you have to convert them to canonical form in order to compare them;
"..\DIR1\FILE.EXT" and "FILE.EXT" are different names, but from DIR1, they both
refer to the same file.

The same applies to blockmaps, pictures and many other DOOM data structures.

Many blockmaps that are logically equivalent in the game can be physically
different, because the data pointed to by each block pointer can be located
anywhere in the resource. Every blockmap therefore has a canonical form in which
the data is located contiguously after the block pointer array, in block number
order (the pointers are always in block number order anyway). In the same way,
every picture has a canonical form in which the pixel data is located
contiguously after the column pointer array, in column number order (the
pointers are always in column number order anyway).
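Whether a blockmap is already in this canonical form can be checked
mechanically. The C sketch below assumes the whole BLOCKMAP lump has already
been read into an array of 16-bit values in host byte order. The function name
is hypothetical, and the check does not look at the 0x0000 that conventionally
starts each block list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lump layout: x origin, y origin, columns, rows, then columns*rows block
   offsets (counted in 16-bit words from the start of the lump), then the
   block lists, each ending with 0xFFFF. Returns true if the lists are
   stored contiguously, in block number order, straight after the offset
   array, with no sharing, gaps or trailing bytes. */
static bool blockmap_is_canonical(const uint16_t *words, size_t nwords)
{
    if (nwords < 4)
        return false;
    size_t nblocks = (size_t)words[2] * words[3];
    size_t expect = 4 + nblocks;            /* first list follows the offsets */
    if (expect > nwords)
        return false;
    for (size_t i = 0; i < nblocks; i++) {
        if ((size_t)words[4 + i] != expect)
            return false;                   /* out of order, shared or gapped */
        while (expect < nwords && words[expect] != 0xFFFF)
            expect++;
        if (expect == nwords)
            return false;                   /* unterminated block list */
        expect++;                           /* step past the 0xFFFF */
    }
    return expect == nwords;                /* nothing left over at the end */
}
```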

Note that the "true" canonical form of a blockmap or picture is that produced by
CleanWAD's unpack method -- everything is spelled out in full, even though it
wastes space. CleanWAD's pack method, however, is still canonical in the
sense that the *packed* forms of any two logically equivalent blockmaps packed
by CleanWAD are always identical, even if the supplied forms are not.
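One way to see why the packed output is canonical is to model it with the
digit-and-letter notation used for blockmap versions A and B in this appendix:
each letter stands for one block's data (equal letters mean identical blocks)
and each digit is the data slot that a block's pointer refers to. The toy
function below is hypothetical and not CleanWAD's code; it stores each distinct
block once, in block order, so physically different inputs yield the same
packed output. Single-digit slots limit it to ten blocks.

```c
#include <string.h>

/* ptrs: one digit per block, giving the data slot it points to.
   data: one letter per slot. Writes the packed form to out_ptrs and
   out_data (NUL-terminated); each needs strlen(ptrs) + 1 chars. */
static void pack_canonical(const char *ptrs, const char *data,
                           char *out_ptrs, char *out_data)
{
    size_t nblocks = strlen(ptrs), ndata = 0;
    for (size_t i = 0; i < nblocks; i++) {
        char block = data[ptrs[i] - '0'];          /* what block i points at */
        const char *hit = memchr(out_data, block, ndata);
        size_t slot = hit ? (size_t)(hit - out_data) : ndata;
        if (!hit)
            out_data[ndata++] = block;             /* first occurrence: copy it */
        out_ptrs[i] = (char)('0' + slot);          /* duplicates share the slot */
    }
    out_ptrs[nblocks] = '\0';
    out_data[ndata] = '\0';
}
```

Packing version A (pointers "1023245" over data "BACDCE") and version B
(pointers "3012456" over data "BCDACCE") both give pointers "0123224" over data
"ABCDE".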

Note that CleanWAD's pack method will always produce the "packed canonical form"
of a resource, even though it reuses existing packing. This is worth bearing in
mind because a resource can be completely packed yet not canonical. For example,
even if all available pointer reuse is performed, the data need not be stored in
the resource in pointer order. This is why the author has referred to
"CleanWAD's .... method" in this text -- other WAD tools may not do this.

CleanWAD's rebuild method, however, is not canonical, except in the trivial
sense that data is resorted to appear in blockmap order. The method does not
look for new data reuse; it uses any existing reuse and copies out in full (like
the unpack method) where there isn't any. Some of the data copied out could
therefore be the same as existing data.

Thus two blockmaps (or two pictures) that are logically equivalent will NOT
necessarily be output in the same form by CleanWAD's rebuild method. For
example, consider two versions of the same blockmap, call them version A and
version B, that have three identical blocks.

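In blockmap version A, the data for these blocks is *physically* present twice;
one copy is reused between two block pointers and the other copy is only used by
one pointer. In blockmap version B it is *physically* present three times, once
for each pointer that uses it (i.e., version A is partly unpacked and partly
packed, while version B is totally unpacked and not packed at all).

In the diagrams below, the digits after [HDR] are the seven block pointers, each
giving the number of the data slot it points to; the letters are the data slots
themselves, numbered by the row of digits above them. Identical letters stand
for identical blocks.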

	CANONICAL
	                       0123456
		   [HDR]0123456ABCDCCE

	BEFORE REBUILD
	                       0123456
		A: [HDR]1023245BACDCE
		B: [HDR]3012456BCDACCE

	AFTER REBUILD
	                       0123456
		A: [HDR]0123245ABCDCE
		B: [HDR]0123456ABCDCCE

Since the pointer reuse level is not changed by CleanWAD's rebuild method,
blockmap version B has one more physical block than A both before and after
rebuilding. Thus the two versions are different sizes, and hence not identical.
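The rebuild behaviour described above can be reproduced with a toy model in the
same digit-and-letter notation as the diagrams. In this hypothetical sketch,
which is not CleanWAD's code, blocks whose pointers already shared a data slot
keep sharing it, while identical data held in separate slots is copied out
separately, in block order. Single-digit slots limit it to ten blocks.

```c
#include <string.h>

/* ptrs: one digit per block (data slot it points to); data: one letter per
   slot. Writes the rebuilt form to out_ptrs/out_data (NUL-terminated);
   each needs strlen(ptrs) + 1 chars. */
static void rebuild_model(const char *ptrs, const char *data,
                          char *out_ptrs, char *out_data)
{
    int new_slot[10];                  /* old slot -> new slot, -1 = not copied */
    size_t nblocks = strlen(ptrs), ndata = 0;
    for (int s = 0; s < 10; s++)
        new_slot[s] = -1;
    for (size_t i = 0; i < nblocks; i++) {
        int old = ptrs[i] - '0';
        if (new_slot[old] < 0) {       /* first block using this old slot */
            new_slot[old] = (int)ndata;
            out_data[ndata++] = data[old];
        }
        out_ptrs[i] = (char)('0' + new_slot[old]);  /* existing reuse is kept */
    }
    out_ptrs[nblocks] = '\0';
    out_data[ndata] = '\0';
}
```

Applied to versions A and B, it yields pointers "0123245" over data "ABCDCE" and
pointers "0123456" over data "ABCDCCE" respectively, so the rebuilt versions
still differ by one physical block.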



Appendix 4:  Directory Normalising
-----------  ---------------------

The routine that normalizes the WAD directory must guard against identifying an
OpenGL map sequence (GL_MAP01 .. GL_PVS) as belonging to the standard map
sequence (MAP01 .. BLOCKMAP) that immediately precedes it, unless it has the same
base name. In practice this guard should never be needed, as GWA-style data is
not normally mixed into more general WAD files; however, it might happen.

By this point, the only entries still explicitly flagged as OpenGL entries will
be standalone ones, that is, those not associated with a standard map structure,
and this SHOULD only occur in GWA files.

This means that there is no risk caused by THIS function of falsely associating
OpenGL entries with any adjacent normal map data (although any preprocessor
function called after reading the WAD file directory must be designed to detect
and handle this possibility).

We set the type of map data to find in a variable to simplify the code. We
normally test for standard map data entries, so these are assumed by default. If
an OpenGL entry is detected, we switch to looking for OpenGL map data instead.
This cannot introduce errors: if no OpenGL entry is found in the first place, the
affected tests are never taken, and if the entries that follow are not map
entries at all, the tests fail (as they should).
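A minimal C sketch of this approach follows. The enum and function names are
hypothetical, and the lump lists are the usual DOOM/HeXen and OpenGL map lumps
rather than CleanWAD's full tables. The kind of map data sought starts as
normal and switches to OpenGL only when the header itself begins with "GL_", so
a GL_MAP01 header is never absorbed into a preceding MAP01 sequence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MAP_DATA_NORMAL, MAP_DATA_OPENGL } map_data_kind;

static bool name_in(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return true;
    return false;
}

/* True if name is a map data lump of the kind currently being sought. */
static bool is_map_lump(const char *name, map_data_kind kind)
{
    static const char *const normal[] = {
        "THINGS", "LINEDEFS", "SIDEDEFS", "VERTEXES", "SEGS", "SSECTORS",
        "NODES", "SECTORS", "REJECT", "BLOCKMAP", "BEHAVIOR", "SCRIPTS"
    };
    static const char *const opengl[] = {
        "GL_VERT", "GL_SEGS", "GL_SSECT", "GL_NODES", "GL_PVS"
    };
    if (kind == MAP_DATA_OPENGL)
        return name_in(name, opengl, sizeof opengl / sizeof opengl[0]);
    return name_in(name, normal, sizeof normal / sizeof normal[0]);
}

/* Counts the map data lumps that follow the header at names[i]. */
static size_t count_map_lumps(const char *const *names, size_t count, size_t i)
{
    map_data_kind kind = strncmp(names[i], "GL_", 3) == 0
                             ? MAP_DATA_OPENGL : MAP_DATA_NORMAL;
    size_t n = 0;
    while (i + 1 + n < count && is_map_lump(names[i + 1 + n], kind))
        n++;
    return n;
}
```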



Appendix 5:  Copyrights and Permissions
-----------  --------------------------

The original permissions message was "You may use any part of this source
file provided you give me credit.", so the original work of Serge Smirnov is
hereby acknowledged - he put the hard work in when CleanWAD was first made.

Authors MAY use the contents of this file as a base for modification or reuse
provided that you e-mail the current author a copy and give them the right to
use parts of your work in future releases. [The same ought really to be done for
the original author of CleanWAD but he seems to have totally disappeared from
the DOOM community]. You must also credit the work of all previous authors and
that of the original author, Serge Smirnov. Permissions have been obtained from
the original author for any of his resources modified or included in this file
(he gave permission in the original source code provided he was credited).

You MAY (and are encouraged to) distribute and use CleanWAD, provided:

    (1) This entire collection of files is distributed UNMODIFIED,
	preferably in the original ZIP file in which it should have come. I
	have received permission from the original authors of any modified
	or included content in this file to allow further distribution.
    (2) The distribution is on a non-commercial basis; you may put CleanWAD
	on FTP sites or CD or other media as part of a collection for which
	you are charging a fee as long as you understand that the fee is for
	the collecting and (if applicable) the media, *not* for CleanWAD.
    (3) You accept that as with all free systems provided free of charge for
	like-minded people, CleanWAD is NOT guaranteed to work completely or
	correctly although it is likely to do so; thus it is a condition of
	use that you accept that as with most such products, you use it at
	your own risk. CleanWAD is **NOT** designed to be fault-tolerant!
    (4) Any legal disputes over CleanWAD or any part of it, including these
	terms and conditions, shall be governed by the Laws of England;
	furthermore, you do not have permission to use CleanWAD in any
	jurisdiction whose laws modify or limit these conditions unless you
	VOLUNTARILY accept these conditions as if under the Laws of England.

Note: although it won't happen straight away, once I have ensured that none of
Serge Smirnov's original code is left, I will CONSIDER releasing it under GPL.



Appendix 6:  Acknowledgements
-----------  ----------------

DOOM and DOOM2 are copyright of id Software.
HERETIC and HEXEN are copyright of Raven Software.
STRIFE is copyright of Velocity.
CLEANWAD was written originally by Serge Smirnov.

Appendix 7:  Revision History
-----------  ----------------

04 May 2006
    Fixed non-recognition of 22050Hz sounds. It seems that vanilla DOOM does use
    them in places (such as DSDBCLS) whereas the standard sample rate is 11025Hz.

22 Nov 2005
    Fixed crash in pixel column rebuild logic. Cleaned and improved code of
    pixel column and blockmap block rebuilding and error detection in both.

08 Aug 2005
    Added options ("-ds"), ("-dm") and ("-dg") to allow auto-detection of DMX
    and WAV sounds, MUS and MIDI musics and DOOM format graphics; changed "sound
    not processed" warning to sound less pejorative. Renamed endianness
    functions more meaningfully.

23 Jul 2005
    Added option ("-iv") to handle loose sound lumps by name as Strife voices.
    A few minor cosmetic bug fixes and documentation updates here and there.

13 Dec 2004
    Removes PLATFORM.

08 Dec 2004
    First version separating command-line parse from cleaning functionality.

19 Sep 2004
    Fixed bug in which lumps with names like E4M1SKY (i.e., sky patch for
    E4M1) or MAP25MUS (i.e., music track for MAP25) would be wrongly
    recognised as map name headers.

18 Aug 2004
    Added option ("-fr") to handle duplicate entry removal when the entries are
    in different lists. Final tidying of the source code, manual and text file.

09 Aug 2004
    Final touches for release version of V1.50. Removes _DEUTEX_, HISTORY and
    TAGDESC. Rebuilds dodgy blocks and columns if not told to preserve them.
    Final static checks and testing. Added DDF and RSCRIPT stuff to intrinsic
    sort lump list (as a former EDGE Team member, I really should do it :-) ).

03 Aug 2004
    Compiles cleanly under MSVC7 in 32-bit mode with 64-bit portability warnings
    enabled GENERALLY. If _WIN64 is defined, certain operations are explicitly
    checked for possible overflow; this should never happen with CleanWAD, as
    WAD files are inherently 32-bit objects. If _WIN64 is not defined, then the
    applicable warnings are disabled at those points and reenabled afterwards.

09 Jul 2004

    Fixed "find new lump reuse" bug - it actually works now! Somewhat slower
    therefore, since the current algorithm is O(n**2), but not unreasonably
    slow. For example, without turning on optimization in the compiler, the
    resulting program finds all possible lump reuse on DOOM2.WAD in about three
    minutes on a P4/2.6GHz. With most PWADs this should be a lot less.

    Fixed SCRIPTS entries in ACS Library list not sorting with the owning
    BEHAVIOR lumps when alpha or intrinsic sortation is applied. For example,
    sequence (A_START, ZZMYLIB, SCRIPTS, AAMYLIB, SCRIPTS, A_END) would end up
    as (A_START, AAMYLIB, SCRIPTS, SCRIPTS, ZZMYLIB, A_END) when it should (in
    both cases) end up as (A_START, AAMYLIB, SCRIPTS, ZZMYLIB, SCRIPTS, A_END).

08 Jul 2004

    Reworked all the "find new lump reuse" code. It is cleaner and more reliable
    and it always finds matches, if they exist, because it creates canonicalized
    copies of the lumps and compares those in situations where they would not by
    then have been canonicalized. It is still reasonably fast, even if the input
    is an IWAD, because it detects many kinds of definite mismatches in advance.

02 Jul 2004

    Cleaned up after putting in the "-fb" and "-fp" options and clarified and
    expanded documentation regarding canonical forms and resource compression.

01 Jul 2004

    Fixed reporting of picture size changes to be totally accurate. Originally,
    CleanWAD would only report the change of a picture size if the sizes, each
    divided by 4 with no remainder, were different. Presumably this was some
    sort of attempt to not bother the user too much if the difference was only
    slight? Who knows, apart from Serge that is. Note that the modified picture
    was ALWAYS written out even if the size in bytes wasn't different.

    Fixed reappearance of "ERROR" if the user runs CleanWAD with no arguments; it
    should in that case simply print the usage message without complaining. Note
    that it still returns status code 1 as this is, strictly speaking, an error.

30 Jun 2004

    Implemented environment variable for user-specified default options and
    reworked options handling so that the default options, any options from the
    environment and any command-line options can all be merged sensibly. The
    internal default options are now parsed and validated the same way as any
    others, using an internal "argv[]" format structure but at the same time,
    neatly laid out in the source code to provide easy modification.

28 Jun 2004

    Finalized the lump-reuse handling and created a formal beta version.

26 Jun 2004

    Added option ("-te") to enable toleration of empty structured lists and also
    code to remove them by default. Massive cleanup of the lump reuse handling.

06 Jun 2004

    Added option ("-tm") to enable toleration of multiple occurrences of the same
    structured list. If enabled it is a warning and if disabled it is an error.
    Added lump sharing between directory entries, in various forms, as requested
    by "Chilvence" on the ZDoom forums. Some PWAD authors do this kind of thing
    as a sort of packing; but the PWAD rebuilding process, of course, undoes it,
    so you can tell CleanWAD to preserve it or even do more of it when possible.

05 Jun 2004

    Improved diagnostics upon duplicate map removal and applied same to ACS
    libraries that have SCRIPTS entries for each module. CleanWAD will still
    complain about multiple occurrences of the same structured list, but this is
    now a warning and not an error.

26 May 2004

    Finalized directory sortation options and reworked "intrinsic" sortation.
    Experimentally extended "intrinsic" sortation to graphics; however, this
    is disabled for now as it needs a lot of work to make it reliable and there
    is a compiler limitation that has to be worked around in order to do it.

24 May 2004

    Added main directory sortation options and started work on experimental
    implementation of the "intrinsic" sortation using music names only.

22 May 2004

    Enhanced the command-line parser (and manual) such that all options follow
    the standard practice of always starting with a minus sign, with the word
    "no" being the prefix applied to turn a boolean option off (e.g., "-norp").
    This is to make the syntax closer to current argument-parsing practice.
    None of the options start with the string "no" and none will in the future.

12 May 2004

    Massive tidy-up of the command-line parser and the manual. Fixed options to
    appear before any arguments, in line with current argument-parsing practice.

10 May 2004

    Added the three "identify by" options and the "named marker" option to aid
    minimization of entry misidentification in the extreme cases. If only Raven,
    Velocity and some PWAD authors had adhered to the DOOM specifications :-) !

06 May 2004

    Support for big-endian machines has been added. The file services.h contains
    a block of definitions that need expanding or modifying for each new platform
    or group of platforms. It would be helpful if someone would do this and report
    any bugs, as the author does not have access to a big-endian machine to test.

02 May 2004

    The program was huge and unwieldy. It needed several things added, even over
    the additions that I had already made and I also felt that a lot of the
    functionality could be converted into libraries for generic use in writing
    WAD tools. Splitting it into modules would also make maintenance easier.

    This version has been re-written almost from the ground up and uses a
    totally new engine that is cleaner, more secure and - importantly - more
    extensible. There is hardly a line left of Serge Smirnov's original code.

    The "ignore wad syntax" option has been removed completely as it is now
    totally unnecessary. CleanWAD builds a symbol table of all the lists that it
    finds and any remaining problems are genuine errors that should NOT be
    treated with "oh well output it anyway so at least I can use the WAD".

    A lot of work was also spent expanding and updating the documentation.

10 Apr 2004

    I was fed up with WAD cleaners that couldn't handle modern standards of WAD
    files, but really liked CleanWAD. Then I remembered that Serge Smirnov had
    released the source code, so I rolled my sleeves up and here it is!

26 Sep 1995

    Serge Smirnov released the original CleanWAD. It was his second DOOM program
    project and was intended for a much wider range of users than his first one,
    that being CUSTMWAD (a set of two very low level WAD tools). CleanWAD owed
    much of its existence to Olivier Montanuy (author of DeuTex and WinTex) who,
    in the aftermath of CUSTMWAD, ended up asking Serge to write a WAD cleaner.
