============================================================================== EXTENSIBLE ROMHACK INTERCHANGE FORMAT File Format Specification, edition of January 10, 2008 =============================================================================== This document defines the Extensible ROMhack Interchange Format (XRIF), a gen- eral-purpose and extensible binary patching format. This document is designed to be viewed with a fixed width (monospace) font, in a display environment at least 80 characters wide. Tab size doesn't matter. ----------------------------------- WEBSITE ----------------------------------- http://zerosoul.arc-nova.org/Technology/XRIF/ (Failing that, use a search engine.) ========================== INTRODUCTION AND RATIONALE ========================= ---------------------------- THE CURRENT SITUATION ---------------------------- There is a format called IPS; whether this stands for "International Patching System" or "Intelligent Patching System" is a matter of some disagreement among users. (I myself hold that neither title accurately describes it.) It is a sim- ple patching format, with data arranged into "blocks", each of which contains an address into the target file at which to begin overwriting data, the amount of data to write there, and finally the data itself. IPS is well-supported and widely used in the ROM hacking community, but years of experience have revealed its limitations. Foremost among these is that it is remarkably unintelligent about handling shif- ted data, for example if some data is *inserted* into the middle of the modi- fied data. In this case, all IPS can do is perform meaningless comparisons be- tween the data at that location and the unrelated data that *used* to be at that location, and record all the (numerous) differences. This gets the job done, but is not at all elegant, intelligent or efficient. 
There are some other concerns as well, such as the fact that it is limited to 24-bit addressing, which is quite enough for most purposes, but has become a much more visible limitation with the advent of Game Boy Advance hacking: it is not uncommon to see 16 MB ROM images, which completely fills the entire addres- sing range of IPS. Also, the end of the file is marked by a special signature that occupies the space where normally the "address" field of the next block is to be found. As such, for patches of larger files, it is possible that there is a block that starts at the address that happens to be represented in the same way as the end-of-file signature, which will cause problems, and, at best, is something that has to be worked around. And, IPS doesn't allow for embedded annotation (the title of the work, who made it, and so on), or for storing a checksum of the target data to help ensure the correct file is being patched. ----------------------- HOW XRIF ADDRESSES THESE ISSUES ----------------------- FLEXIBLE FILE STRUCTURE Instead of consisting almost entirely of a sequence of blocks, an XRIF file can contain many different types of data, which are encapsulated in "chunks", each with a label to distinguish one type of data from another. There are many dif- ferent potential kinds of data (pure overwriting data, data shifting instruc- tions, annotation, and so forth), and they are all cleanly separated from other types of data in the file itself. ADDRESS RANGE IPS uses 24-bit addressing; this fact is hard-wired into the format. XRIF can use up to 63-bit addressing, and each patch can choose its addressing size, from 8-bit, 16-bit, 31-bit, 32-bit and 63-bit. However, the latter two are not required to be implemented by patchers (fortunately, they won't be needed ex- cept for patches of files over 2 GB), but are made available in the format in case they will be useful. 
MOVING DATA AROUND

XRIF can contain instructions that copy some range of data from one place to
another. This is a quick and efficient way to implement insertion of data into
the middle of the target file: take the data after the insertion point and copy
it, en masse, to its new location, then go in and write the inserted data in
the usual way.

EMBEDDED ANNOTATION

XRIF can also store annotation (for example, who created the hack), in a manner
that is fully internationalized, by recording the natural language (such as
English) of this annotation-- and allowing for annotation in multiple
languages-- and recording this annotation in Unicode, by way of UTF-8. There
are two methods of recording annotation in XRIF-- a patch can use either
method, or both-- one of which is simple and straightforward, while the other
is much more expressive.

VARIATIONS

XRIF can also record multiple patches in a single file, with data common to all
of them being recorded as such. Such "variations" can be used for many
purposes, such as accommodating different versions of the target data, or for
distributing an upgrade to a previous patch that includes the original patch
too, or simply conveying multiple patches in a single XRIF file.

EXTENSIONS

But perhaps XRIF's most notable and defining characteristic is its ability to
invoke Extensions. The base XRIF format, as defined by this document, is meant
to be general-purpose, but XRIF can be adapted to highly specialized purposes
by means of Extensions. These are invoked by name, and are defined separately
from the base XRIF format itself, and inherit XRIF's features.

OPTIONAL FEATURES

XRIF has a number of exotic features, most of which are summarized above.
However, being mindful of how the format might be implemented, many of these
features are declared as "optional": meaning that programs that implement XRIF
are not required to implement some of the more unusual parts of the format.
If a patch uses any of those optional features, it will say so in the file
header, so that a program can see whether the patch uses a feature the program
doesn't provide and, if so, exit gracefully, without having to pre-scan the
file or abort in the middle of the patching process.

Of course, I do not control others: they are able to write programs that do not
actually implement everything marked as "required". The purpose, then, of
marking features as "required" is so that patch creators can feel free to use
those features, with the reasonable expectation that patchers will support
them. Conversely, a patch creator should weigh the benefits of using optional
features against the fact that they cannot be assumed to be widely supported.
Thus, the "required" features comprise a lowest common denominator format that
all implementations of XRIF are expected to support.

VERSION HANDLING

The area of the header that says what features are used also indicates which of
those features are merely "advisory" (meaning, if the patcher doesn't recognize
the feature, it can safely ignore it) and which are "mandatory" (meaning the
patcher CAN'T safely ignore it). This helps ensure compatibility of current
programs with future versions of XRIF, in a way much more flexible and
meaningful than using version numbers.

------------------------ A BRIEF HISTORY OF THE FORMAT ------------------------

Today's XRIF format originated from my attempts to address NES (Nintendo
Entertainment System) cartridge file patching. The predominant NES cartridge
file format is called iNES (after the emulator that introduced it), and it is a
simple format. However, a more flexible and expressive format called UNIF
(Universal NES Interchange Format) has been defined. The only major problem,
for the purposes of ROM hacking, is that it does not store ROM data in the same
location within the file, both as compared to iNES, and possibly even to other
UNIF files of the same game.
However, the IPS format relies on the data being at the same location in the source file as in the target file, which UNIF cannot guar- antee, so IPS obviously does not work well with UNIF. My first attempt at addressing this problem was a format I called UNIF-IPS, which addressed UNIF chunks by name. Although this would work well with UNIF it- self, it was not so suitable for iNES, and so the idea was abandoned. Later, I created a format I called UNRIF (Universal NES ROMhack Interchange Format) which was meant to be format-neutral. However, although I even got as far as to make a public proposition for the format, it was never implemented in code. At one point, I had visions of a whole family of patching formats based on the idea of UNRIF, each tailored to a particular platform (NES, Super NES, and so forth). Although UNRIF was eventually abandoned, the idea of a family of for- mats was expanded upon, and I devised a new, general-purpose format that could be "extended" to serve specialized needs. This format was called XRIF. The det- ails of the format were worked out, and eventually it was first published on January 3, 2006, soon thereafter followed by the XRIF+NES specification (an Extension of the format), both of which were implemented not long afterward (by myself, no less). =============================================================================== =================== CONVENTIONS, STRUCTURE, AND TERMINOLOGY =================== * All multi-byte integers are represented in little-endian byte order, where the least significant bytes are at the lower file address (that is, they come first). For example, given the integer 0x12345678, this is represented as the byte sequence [78 56 34 12]. * [NN] means a byte with the value NN as expressed in hexadecimal (base 16). [NN NN NN...] means a byte sequence. "Byte", as used in this document (and, overwhelmingly, as used elsewhere too), means an 8-bit unit, an "octet". 
* All integers are to be interpreted as being unsigned.

* A "FOO chunk" means a chunk having the label FOO.

* The "XRIF header" means the contents of the XRIF chunk.

* If something is "nul-terminated", it is followed by a [00], which serves to
  mark the end of it. This is commonly used with text data.

* The "Nth byte" of a chunk refers to the Nth byte of the chunk's contents.
  These "Nth byte" ordinals start at 1; however, ordinals of the form "at
  offset N" start at 0. Therefore, "the 1st byte", and "the byte at offset 0"
  (that is, 0 bytes away from the 1st byte), both refer to the same location.

* As used in this document, "KB", "MB", "GB", and so forth, refer to quantities
  of bytes based on the powers-of-two definition: 2^10, 2^20, 2^30 and so on:
  equivalent to "KiB", "MiB", "GiB".

* The "target data" is the data to which the patch is applied. Such data will
  usually, but not necessarily, be a file on disk.

* A "Generic" XRIF file is one that does not use any Extensions; that is, which
  uses only the functionality described in *this* document.

* A "patcher" is a program that applies an XRIF file to the target data.

* A "patch generator" is a program that creates an XRIF file.

* A "patch creator" is a person who uses a patch generator.

* An "XRIF implementation" is a program or other system that implements XRIF.
  Patchers and patch generators are examples of this.

* Digressions (which explore a specific subject in greater detail) are set off
  from the main text by being indented and wrapped in a border; this is to aid
  readability of the main text without losing the insight the digressions
  provide.

=============================== CHUNK STRUCTURE ===============================

An XRIF file consists of "chunks", which are atomic units of data with an
associated "label" specifying the purpose and format of the chunk's contents.
Each chunk consists of the following, in this order:

* The LABEL, which is 4 bytes of US-ASCII text such as "TREP". The label must
  consist entirely of upper- or lower-case letters and digits, with one
  exception: out-of-band chunks start with a "." (period or full stop).

* The LENGTH, which is a 32-bit (4 byte) integer specifying the length of the
  chunk's contents. This can be zero, in which case the mere presence of the
  chunk is enough to convey useful information. The value is limited to the
  0x00000000 through 0x7FFFFFFF range (that is, 31-bit range).

* The CONTENTS
  The interpretation of this data is based on the chunk's LABEL; the amount of
  it is given by the LENGTH.

For example, here's a complete example chunk:

       +------------------------------------ Label (in this case, "XRIF")
       |           +------------------------ Length (of contents)
       |           |              +--------- Contents
   ____|____   ____|____   _______|_______
  /         \ /         \ /               \
  58 52 49 46 06 00 00 00 7F 00 02 01 00 00
              \_________/ \_______________/
                   |              |
                   |              +--------- This is 6 bytes long...
                   +------------------------ therefore this is set to 6.

Chunks are laid out in the file sequentially, with no intervening bytes: the
last byte of a chunk is followed by the first byte of the label of the next.

=============================== STANDARD CHUNKS ===============================

These are the chunks for the base XRIF format, which thus comprise the set of
chunks available to Generic XRIF files, and which are inherited by Extensions.

-------------------------------------------------------------------------------
------------------------------ XRIF: File header ------------------------------
-------------------------------------------------------------------------------

(Note: this refers to a chunk with the label "XRIF".)

This is the first chunk in an XRIF file. It specifies some basic information
about the patch, such as some structural parameters. Also, the fact that it is
present at the very start of an XRIF file can serve as a way to quickly filter
out things that *aren't* XRIF files.
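
As an illustration of the chunk layout described under CHUNK STRUCTURE, here is
a minimal sketch, in Python, of walking the chunks of a file. It is not part of
the format: the function name iter_chunks is made up for this example, and it
assumes the whole file has already been read into memory.

```python
def iter_chunks(data):
    """Yield (label, contents) for each chunk in an XRIF byte string."""
    pos = 0
    while pos < len(data):
        # Every chunk starts with a 4-byte label and a 4-byte length.
        if len(data) - pos < 8:
            raise ValueError("truncated chunk header")
        label = data[pos:pos + 4].decode("ascii")
        # Multi-byte integers are little-endian and unsigned.
        length = int.from_bytes(data[pos + 4:pos + 8], "little")
        if length > 0x7FFFFFFF:
            raise ValueError("chunk length exceeds the 31-bit range")
        start = pos + 8
        end = start + length
        if end > len(data):
            raise ValueError("chunk contents run past the end of the file")
        yield label, data[start:end]
        # The next label follows immediately, with no intervening bytes.
        pos = end


# The example chunk shown above: label "XRIF", length 6, six content bytes.
example = bytes.fromhex("58524946 06000000 7F0002010000")
for label, contents in iter_chunks(example):
    print(label, contents.hex())
```

A generator is used here so that a patcher can stop at the first chunk it
cannot handle; this sketch does not check that labels contain only letters,
digits, or a leading ".".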
The first two bytes of the chunk (that is, of its contents) are always, without exception, [7F 00]. This serves as a "magic number" of sorts: every valid XRIF file will have this signature. The next byte indicates the size of "Address"-type fields, found in target-modi- fying chunks such as TREP: [00] = 8-bit addressing (0 through 0xFF) [01] = 16-bit addressing (0 through 0xFFFF) [02] = 31-bit addressing (0 through 0x7FFFFFFF, but see below) [03] = 63-bit addressing (0 through 0x7FFFFFFFFFFFFFFF) The next byte is the same, but defines the size of "Length"-type fields instead. "Address"-type and "Length"-type fields are collectively referred to as "block values". +--------------------------------------------------------------------------- | ON BLOCK VALUE SIZES | |----------------------+ | | 32-bit integers are, by default, limited to the positive range of signed | 32-bit integers (0 through 0x7FFFFFFF). For addresses, this yields an ad- | dress range of 2 GB, which is more than enough for the vast majority of | applications. For example, any *two* entire CD images could fit quite com- | fortably within that range; or, if the average size of a disc doesn't ex- | ceed about 682 MB, then it could contain *three* CD images. | | But, if even THAT isn't enough, patches can request that 32-bit integers | be allowed to use their full unsigned range (up to 0xFFFFFFFF), thereby | doubling the addressable range to 4 GB, which could address any byte ac- | ross about *six* CD images, and the better part of a lot of DVD images. | | If somehow even 4 GB is not enough, then a patch can use 63-bit addressing. | This gives an addressable range of no less than 8589934592 GB (8 EB), | which can address about 170,000,000 entire full size (50 GB) Blu-ray discs, | or could probably address most, if not ALL, the static content of the | entire Internet, including the Internet Archive and search engine data- | bases, at the individual byte level. 
| | Obviously, 64-bit addressing is massive overkill, but the format allows | for it just in case it is ever needed (that is, if full 32-bit addressing | isn't enough, which, when it comes to DVD images-- not to mention newer | technologies like HD DVD and Blu-ray-- it probably *will* be). It is by no | means required to be supported, though. Also, 64-bit integers are always, | without exception, limited to their positive signed range (that is, up to | 0x7FFFFFFFFFFFFFFF); the figures above take this into account. | | Besides being more than will be needed by all conceivable and realistic | uses of the format, it can also be troublesome to actually implement. In | the C language for example, there is not universal agreement on how to ob- | tain a 64-bit datatype to begin with (the use of notwithstand- | ing), and then there is the issue of using filesystem functions capable of | handling files larger than 2 GB; after all, there's no point in using 64 | bit-- or even "merely" full 32-bit-- addressing if only the first 2 GB are | addressed! | | And, on some platforms, obtaining even just an unsigned 32-bit datatype | can be difficult if not impossible; since 2 GB is more than enough for the | vast majority of applications, it is not required to support the full un- | signed range of 32-bit integers. | | +----------------------------------------------------------------------- | | Even if unsigned 32-bit datatypes ARE available, it is still easier to | | work with "31-bit" integers, for example by using unsigned 32-bit data- | | types to hold values, and you can add any two positive "31-bit" inte- | | gers and store the result in a 32-bit integer without worrying about | | overflow; this can be useful for checking that a TFSM operation does | | not exceed its boundaries, for example. | +----------------------------------------------------------------------- | | In any event, it is always required to support the full, unsigned range of | 8 and 16-bit block values. 
| Furthermore, the above rules regarding "signed"
| and "unsigned" are merely concessions to environments where unsigned data-
| types may be unavailable: in XRIF, integers are NEVER to be interpreted as
| being signed.
|
|---------------------------------------------------------------------------
| ON LENGTH SIZES |
|-----------------+
|
| The block length size has the same implications as above, except that, as
| of the 2006-04-26 edition of XRIF, it is no longer required to support 32
| bit lengths at all (positive signed range or otherwise). This is to allow
| implementations to, for example, allocate a 64 KB copy buffer (and 64 KB
| is not much to ask for) and use *that*, and not have to worry about handl-
| ing blocks that might exceed the buffer size.
|
|---------------------------------------------------------------------------
| THE BOTTOM LINE |
|-----------------+
|
| Once and for all, what is and is not required to be supported:
|
| +--------+--------------+---------------+
| | SIZE   | ADDRESSING   | BLOCK LENGTHS |
| +--------+--------------+---------------+
| | 8-bit  | REQUIRED     | REQUIRED      |
| | 16-bit | REQUIRED     | REQUIRED      |
| | 31-bit | REQUIRED     | not required* |
| | 32-bit | not required | not required  |
| | 63-bit | not required | prohibited**  |
| +--------+--------------+---------------+
|
|  * As of the 2006-04-26 edition of XRIF. Prior to that, it was required.
|
| ** If lengths are 64-bit, then a given length-type field will either be:
|    1. Within the 32-bit range, in which case, 32-bit lengths will suffice.
|    2. Outside the 32-bit range, which almost always means it spills past
|       the end of the containing chunk-- an individual chunk is limited to
|       2 GB in size-- which is an error.
|    Besides, the choice of length size is merely one of efficiency: ANY
|    length size will get the job done, it is only a question of how many
|    data blocks it'll take. So, prohibiting certain block length sizes does
|    not reduce the format's capabilities.
| +--------------------------------------------------------------------------- After the block address size and block length size bytes, are at least two bytes specifying "flags" that inform the patcher what features are used by the patch: each bit of these bytes is a flag. Of these bytes, ones at even numbered offsets are "advisory" and those at odd numbered offsets are "mandatory". The difference is in how a patcher is expected to handle unrecognized flags: * Unrecognized "advisory" flags can be safely ignored. In this case, chunks that are not recognized by their label can, and should, be silently ignored. * Unrecognized "mandatory" flags mean the patcher should exit gracefully, be- cause it will be required to do something it doesn't even know about, let alone know how to do. The first byte (5th byte of the XRIF chunk contents overall) is advisory: -------X : For variations, whether the "common segment" by itself is a select- able variation in its own right. (0=Yes 1=No) ------X- : For variations, whether to apply the common segment before (0) or after (1) the variation proper. -----X-- : Whether the target data is "streamable" (1=Yes). An explanation of this is given below. ----X--- : Whether the TFSM chunk is used. (1=Yes) ---X---- : Whether simplified annotation chunks are used. (1=Yes) You don't have to set this if you use the lower-case-"i" flag, below. --X----- : For variations, whether only one can/should be selected (0) or each individual variation can be independently enabled/disabled (1). -X------ : Whether simplified annotation chunks' labels start with a lower-case "i", and all such chunks are considered to be annotation. (1=Yes) Note that the first two flags, as well as the TFSM flag, should be treated as "mandatory"; these flags were assigned to this byte before I devised the alter- nate-between-advisory-and-mandatory scheme. 
However, these flags have been part of the specification since the beginning, therefore they are not "unrecognized", so this shouldn't be a problem. +--------------------------------------------------------------------------- | TARGET DATA STREAMABILITY | |---------------------------+ | | If the target data is "streamable", it means the patcher could potentially | apply the patch to the target data in a sequential manner, such as by read- | ing a target file from the standard input or decompressing it on the fly, | and/or writing it sequentially to the standard output or recompressing it | on the fly. This could be crucial for patches of very large files, so that | the patcher does not need to load the entire file into memory. | | Specifically, this means the target file is modified sequentially: after | an access (read or write) takes place, subsequent accesses shall only | occur AFTER that point. | | It is NOT required for patchers to implement streaming: this flag is sim- | ply a hint to the patcher that it COULD stream the target file if it want- | ed to. Also, it is not required-- but is of course recommended-- that a | patch generator mark an eligible patch as streamable. | | In short, if nobody paid any attention to this flag, or if some people did | and others did not, everything would still work. | | Extensions to XRIF can define additional rules as to whether a given patch | is or is not eligible to be marked target-streamable. | +--------------------------------------------------------------------------- The second of the flag bytes (6th byte of the XRIF chunk contents overall) con- tains "mandatory" flags: -------X : Whether the patch uses VARY. ------X- : Whether the patch uses TCFS. -----X-- : Whether the patch uses TADD. ----X--- : Whether the patch uses TROT. ---X---- : Whether the full unsigned range of 32-bit integers is used. --X----- : Whether the patch uses .AKA. -X------ : Whether the patch uses any of TAND, TIOR or TXOR. 
X------- : Whether the patch uses the new formats of INFO and VARY. (Refer to
           the definitions of those chunks for what that means.)

The third of the flag bytes is advisory, but at the moment, no flags are
defined for it. However, because these flag bytes are defined as alternating
between advisory and mandatory, compliant programs will recognize this byte as
such.

The fourth byte contains "mandatory" flags:

-------X : Whether a block "length" field with a value of 0 actually means one
           higher than the maximum value. (1=Yes 0=No) For example, if using
           8-bit lengths, a length of zero would be interpreted as 0x100. This
           only applies to 8- and 16-bit lengths, and not to addresses at all.
------X- : Whether TAND, TIOR and TXOR use a TREP-style block format. (1=Yes)

Future editions of XRIF may specify additional flags; more bytes can be added
to the XRIF chunk to accommodate them (alternating between advisory and
mandatory, as described above) if necessary; patchers shall allow for these
extra bytes.

The XRIF chunk contents will never, ever, exceed 0xFF (255) bytes in size, so a
patcher does not have to accommodate anything larger, while still remaining in
full compliance with this Specification. This also means that the upper 24 bits
of the XRIF chunk's "length" header will always be [00 00 00], thereby adding
yet another "magic number" to the header.

Therefore, in a valid XRIF file, the first ten bytes will always be like so:

       +------------------------ Chunk label ("XRIF")
       |           +------------ Chunk content length (>= 0x00000006)
       |           |        +--- "Magic number" (first two bytes of contents)
   ____|____   ____|____   _|_
  /         \ /         \ /   \
  58 52 49 46 xx 00 00 00 7F 00

This allows programs to determine fairly confidently that a file is, or is not,
an XRIF file, without even having to actually implement it (with "mime.magic"
files, for example).
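
A rough sketch of such a check, in Python, follows. The function name
looks_like_xrif is hypothetical, and treating a content length below 6 as a
mismatch is an assumption drawn from the ">= 0x00000006" note in the diagram
above.

```python
def looks_like_xrif(prefix):
    """Return True if the first ten bytes match the fixed XRIF prefix."""
    if len(prefix) < 10:
        return False
    return (
        prefix[0:4] == b"XRIF"                  # chunk label
        and prefix[4] >= 6                      # low byte of content length
        and prefix[5:8] == b"\x00\x00\x00"      # upper 24 bits of the length
        and prefix[8:10] == b"\x7f\x00"         # magic number
    )
```

Only the first ten bytes are examined, so a caller can make the decision after
a single short read and before committing to parse anything else.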
A summary of the content bytes:

OFFSET  FUNCTION
  00    Magic number: must always be [7F]
  01    Magic number: must always be [00]
  02    Addressing size: 00 (8-bit), 01 (16-bit), 02 (31-bit), or 03 (63-bit).
  03    Block length size: as above.
  04+   Flags. Bytes at even offsets (04, 06...) are advisory, others mandatory.

-------------------------------------------------------------------------------
------------------------- XAPP: Extension Declaration -------------------------
-------------------------------------------------------------------------------

One of XRIF's defining features is its ability to invoke Extensions, which are
formats that build on XRIF. XRIF itself is meant to be a general purpose
format; Extensions can give it specialized functionality or adapt it for a
specialized purpose.

The XAPP chunk invokes such an Extension. It consists of the name of the
Extension so invoked, which is nul-terminated, and optionally followed by zero
or more bytes giving flags or parameters that can be used by the Extension, in
much the same manner as for the XRIF chunk. However, whether the Extension's
parameter bytes alternate between "advisory" and "mandatory", or even if it
uses them as flags at all, is a decision left to the designer of the Extension.

+---------------------------------------------------------------------------
| ON INHERITANCE |
|----------------+
|
| An Extension inherits all the functionality that is available to the XRIF
| format that it extends. For example, if an Extension is defined, and then
| later the base XRIF format is updated to add new features, those features
| are inherited by the Extension, and should be taken into account by the
| patcher. (A properly written patcher, though, will be able to distinguish
| which unrecognized features are safe to ignore and which are not.)
|
| Remember, Extensions are built on top of XRIF: they are not self-contained,
| complete formats unto themselves.
|
+---------------------------------------------------------------------------

A patch can have zero or more XAPP chunks, all of which must occur immediately
after the XRIF chunk, and, if there is more than one XAPP chunk, they must all
be contiguous (occurring one after the other). A given XAPP chunk operates on
whatever came before it: for example, if there are two XAPP chunks, the first
one extends XRIF itself, the second one extends the first Extension.

+---------------------------------------------------------------------------
| HISTORICAL NOTE |
|-----------------+
|
| Originally, XRIF was supposed to be able to invoke not only Extensions,
| but also "Supersets" (which might be thought of as "advisory Extensions",
| where if a patcher doesn't recognize a Superset, it can safely ignore it)
| and "Platform Declarations" (which do not affect functionality, but merely
| declare the intended platform of the patch; for example, a patch for a
| Game Boy Advance game-- which only requires Generic XRIF functionality--
| could be declared with an appropriate Platform Declaration).
|
| Extensions, Supersets and Platform Declarations are (or were) collectively
| referred to as "Applications" of XRIF. When I was first designing what be-
| came today's XRIF format, the chunk that invoked an Application was label-
| ed "XAPP". Even though I later dispensed with the idea of Supersets and
| Platform Declarations, the name of the chunk-- XAPP-- remained.
|
| In order to distinguish these types of Applications, the Application name
| started with a "sigil" identifying what type of Application it was: for
| Extensions this was "+", which is where the XRIF+FOO naming convention
| came from. (For Supersets it was "/"; for Platform Declarations, "=".)
|
+---------------------------------------------------------------------------

No patcher is required to implement any Extension, unless it purports to.
It is up to the patcher to recognize an Extension: if it does not, then it
should exit gracefully; and if it *does* recognize an Extension, it should use
it in accordance with that Extension's specification.

As with the XRIF chunk, the size of an XAPP chunk shall never exceed 255 bytes.
This is on the basis that the XAPP chunk itself should only contain flags that
serve to indicate what functionality is required, so that a patcher can see if
it will be called upon to do something it can't do, and exit gracefully if so.
If the Extension needs bulk data, it can be conveyed in separate chunks defined
by the Extension (XRIF+NES for example has a separate CSIZ chunk instead of
adding this information to its XAPP), perhaps with its XAPP chunk only
indicating whether or not this data is present.

================================ PAYLOAD CHUNKS ===============================

There are a number of chunks which perform the actual operation of modifying
the target file, which is of course the fundamental purpose of an XRIF file, or
indeed of any patch format.

The data in these chunks are arranged into "blocks", each of which performs a
single operation, containing fields such as addresses and data. The size of the
block values (addresses and data-lengths) is determined by the XRIF header,
described above. Each block occurs immediately after the previous block, with
no intervening bytes.

Be aware that it is NOT allowed to modify data outside the range of the target
data; if this needs to be done, then a TFSM chunk will increase the size of the
target data, and the extra space can then be addressed. This is because, unlike
IPS, XRIF does not assume the target data is a file on disk: perhaps it will be
applied to a set of data in memory (such as by an emulator that can perform on
the fly XRIF patching), for example, and, unlike files on disk, memory doesn't
magically expand to accommodate extra data if you put more into it than has
been allocated.
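
The two rules above (block values sized by the XRIF header, and no access
outside the target data) can be sketched in Python as follows. This is only an
illustration: the names BLOCK_VALUE_WIDTH, read_block_value and check_in_range
are made up for this example. The 1, 2 and 4 byte widths for size codes 00, 01
and 02 follow the examples in this document; the 8 byte width for code 03 is
an assumption.

```python
# Bytes occupied by a block value, keyed by the size code in the XRIF header.
# The 8-byte entry for code 0x03 (63-bit) is an assumption of this sketch.
BLOCK_VALUE_WIDTH = {0x00: 1, 0x01: 2, 0x02: 4, 0x03: 8}


def read_block_value(buf, pos, size_code):
    """Read one little-endian block value; return (value, next position)."""
    width = BLOCK_VALUE_WIDTH[size_code]
    raw = buf[pos:pos + width]
    if len(raw) != width:
        raise ValueError("block value runs past the end of the chunk")
    return int.from_bytes(raw, "little"), pos + width


def check_in_range(address, length, target_size):
    """Reject a block that would touch bytes outside the target data."""
    if address + length > target_size:
        raise ValueError("block would modify data outside the target data")


# A 31-bit address (0x000158BF) followed by a 16-bit length (0x0004).
fields = bytes.fromhex("BF580100 0400")
address, pos = read_block_value(fields, 0, 0x02)
length, pos = read_block_value(fields, pos, 0x01)
check_in_range(address, length, target_size=0x20000)
```

Checking the range before writing anything lets a patcher refuse a bad block
without leaving the target data partly modified by that block.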
-------------------------------------------------------------------------------
------------------------------ TREP: Replacement ------------------------------
-------------------------------------------------------------------------------

This modifies the target data by replacing parts of it with new data. This is
the most basic method of binary patching, and is comparable to IPS's native--
and only-- format. The bulk of most XRIF files will consist of TREP chunks.

Each block consists of the following:

1. TARGET ADDRESS (Address type)
   The offset into the target data at which to begin overwriting data.

2. DATA LENGTH (Length type)
   The amount of data to write there.

3. DATA
   The data itself, the amount of which is given by the DATA LENGTH field.

For example, if the patch uses 31-bit addressing and 16-bit lengths, then this
block would write the bytes [6C 6F 6C 7A] at offset 0x000158BF:

       +------------------------ Address (0x000158BF)
       |        +--------------- Amount of data (0x0004 bytes)
       |        |        +------ Data
   ____|____   _|_   ____|____
  /         \ /   \ /         \
  BF 58 01 00 04 00 6C 6F 6C 7A
              \___/ \_________/
                |        |
                |        +------ This field is 4 bytes long...
                +--------------- ...therefore this is set to 0x0004.

-------------------------------------------------------------------------------
------------------------- TRLE: Repeated replacement --------------------------
-------------------------------------------------------------------------------

This is similar to TREP, except that the data is written multiple times
successively; each such time is called an Iteration. This can be used to fill
in a region of the target file with a repeated pattern (including a one byte
pattern) while only having to record the pattern one time.

Each block consists of:

1. TARGET ADDRESS (Address type)
   The offset into the target data at which to begin writing.

2. DATA LENGTH (Length type)
   The amount of data in each iteration. This is the amount of data stored in
   the patch file itself.

3. ITERATION COUNT (Length type)
   The number of iterations.

4. DATA
   The data itself, the amount of which is given by the DATA LENGTH field.

For example, if for some reason you want to write "SourceSourceSource" onto the
target file, you can repeat "Source" three times. "Source" is six bytes long,
so DATA LENGTH = 6. You write it three times, so ITERATION COUNT = 3. If you
write it at 0x158BF, again using 31-bit addressing and 16-bit lengths:

       +----------------------------- Target address (0x000158BF)
       |        +-------------------- Data length (here, length of "Source")
       |        |     +-------------- Iteration count
       |        |     |           +-- Data ("Source")
   ____|____   _|_   _|_   _______|_______
  /         \ /   \ /   \ /               \
  BF 58 01 00 06 00 03 00 53 6F 75 72 63 65
              \___/       \_______________/
                |                 |
                |                 +--- This is six bytes long...
                +--------------------- ...therefore this is set to 0x0006.

Then "SourceSourceSource" is written onto the target data starting at TARGET
ADDRESS.

If you want to fill in some region with a single byte value (such as [00]), you
would have a DATA LENGTH of 1, with ITERATION COUNT being the number of times
to write that byte. For example, to zero out everything from 0x10000 to
0x11FFF:

       +---------------------- Where to start writing (0x10000)
       |        +------------- Data length (just one byte, so, 0x0001)
       |        |     +------- Iteration count (0x2000)
       |        |     |   +--- What to fill it with ([00] in this case)
   ____|____   _|_   _|_  |
  /         \ /   \ /   \ /\
  00 00 01 00 01 00 00 20 00

+---------------------------------------------------------------------------
| IMPLEMENTATION ADVICE |
|-----------------------+
|
| TRLE will likely be frequently used to "zero out" parts of the target data.
| In this case, the data is a single byte (with ITERATION COUNT specifying | how many bytes to write), so implementations are advised to optimize for | the case of DATA LENGTH == 1, for example by reading the data byte and | using memset(3) as found in the Standard C Library: | | memset ( targetdata or buffer, data byte, iteration count ); | +--------------------------------------------------------------------------- As of the 2008-01-10 edition of XRIF, it's not allowed for DATA LENGTH x ITER- ATION COUNT to exceed 65536 bytes. ------------------------------------------------------------------------------- --------------------------- TCFR: Copy from result ---------------------------- ------------------------------------------------------------------------------- This copies a group of data from one location to another, all at once. This will mainly be of use if the target data is to have some data inserted into the middle of it: 1. Use TFSM (defined later in this document) to increase the target data size by the required amount. 2. Use TCFR to take the data after the insertion point and copy it to its new location. 3. (If desired) Use TRLE to zero out the insertion area. 4. Use TREP to write the new data to the insertion area. Each block of TCFR consists of: 1. SOURCE ADDRESS (Address type) The offset into the target data at which to begin copying data. 2. TARGET ADDRESS (Address type) The offset into the target data at which to begin writing the copied data. 3. DATA LENGTH (Address type) The amount of data to copy. This is of the Address type to allow for large scale copying without needing a large Block Length size (which would affect other chunks such as TREP). For example, if the target data size is 0x30000 bytes, and you want to insert 0x10000 bytes of data after the first 0x20000 bytes: 1. First, use TFSM to set the target data size to its new size of 0x40000 bytes. 2. 
Use TCFR to copy 0x10000 bytes of data from 0x20000 (old) to 0x30000 (new): +-------------------------- Start of source area (0x20000) | +-------------- Start of destination area (0x30000) | | +-- Amount of data to copy (0x10000) ____|____ ____|____ ____|____ / \ / \ / \ 00 00 02 00 00 00 03 00 00 00 01 00 It is legal for the two ranges to overlap, but not to be identical (which would be useless anyway). If the ranges overlap, the implementor should take care to ensure that, in the process of copying data, it doesn't clobber data it will need to read later; such as by using memmove(3) rather than memcpy(3). ------------------------------------------------------------------------------- --------------------------- TCFS: Copy from source ---------------------------- ------------------------------------------------------------------------------- This is like TCFR, except the data is copied from the *original* target data, the way it existed before any modifications that had been made to it by the patch. The block format is exactly the same; but unlike TCFR, it is perfectly allowable for the source and target addresses to be the same. (This is because the target data is no longer being used as its own source; it's copying from one set of data to the same offset of another set of data.) In order to implement this, a patcher will need to make arrangements to have the original target data available throughout the patching process. In general, this is a good idea anyway (for example, if something goes wrong in the middle of applying the patch, the target data won't have been irreversibly modified). However, making these arrangements can be impractical, such as if the target data is very large. As such, it is not required for patchers to implement this chunk; but, it is recommended that they do. 
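The difference between TCFR and TCFS can be sketched in C as follows. This is
a minimal illustration and not a reference implementation: the function names
are hypothetical, the target data is assumed to be held in memory, and
`original` is assumed to be a separate, untouched copy of the target data made
before patching began. Rejecting identical ranges in TCFR with an error is
also an assumption; the format only says they are not legal.

```c
#include <stdint.h>
#include <string.h>

/* TCFR: copy within the target data itself. The ranges may overlap, so
   memmove (not memcpy) is used. Returns 0 on success, -1 on a bad block. */
static int xrif_apply_tcfr(unsigned char *target, size_t target_size,
                           size_t src, size_t dst, size_t len)
{
    if (src > target_size || len > target_size - src) return -1;
    if (dst > target_size || len > target_size - dst) return -1;
    if (src == dst) return -1;  /* identical ranges are not legal in TCFR */
    memmove(target + dst, target + src, len);
    return 0;
}

/* TCFS: copy from the pristine original into the target data. The two
   buffers are distinct, so memcpy is safe and src == dst is allowed. */
static int xrif_apply_tcfs(unsigned char *target, size_t target_size,
                           const unsigned char *original, size_t original_size,
                           size_t src, size_t dst, size_t len)
{
    if (src > original_size || len > original_size - src) return -1;
    if (dst > target_size || len > target_size - dst) return -1;
    memcpy(target + dst, original + src, len);
    return 0;
}
```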
------------------------------------------------------------------------------- ----------------------- TAND, TIOR, TXOR: Bit twiddling ----------------------- ------------------------------------------------------------------------------- These perform "bit twiddling" operations: instead of simply overwriting data wholesale, they tweak individual bits, leaving others unchanged. They work by comparing the data in the patch with the target data, bit by bit, and writing the result of the comparison. The block structure is the same for all three. However, there are two formats available: either the default structure, or a TREP-style structure (the latter being available as of the 2008-01-10 edition of XRIF). +-----------------------------------------------------------------------+ | DEFAULT STRUCTURE | TREP-STYLE STRUCTURE | | | | | | This can be selected by a flag in | | | the XRIF header. | |-----------------------------------|-----------------------------------| | | | | 1. TARGET ADDRESS (Address type) | 1. TARGET ADDRESS (Address type) | | The address at which to begin | The address at which to begin | | twiddling bytes. | twiddling bytes. | | | | | 2. AMOUNT (Length type) | 2. DATA LENGTH (Length type) | | The number of bytes to twiddle.| The length of the twiddler | | | data. | | 3. DATA (one byte) | | | The byte with which to twiddle | 3. DATA | | the target data. | That data. | | | | |-----------------------------------|-----------------------------------| | | | | In this case, a single byte is | This works like TREP, except the | | used to twiddle a number of bytes | target data is twiddled rather | | in the target data. | than simply overwritten. | | | | +-----------------------------------------------------------------------+ Either way: for each byte of the target data, that byte is compared to the res- pective patch byte, and the result written back to the target file. 
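The two block structures can be sketched in C like so, with the comparison
itself left as a parameter. The names here are hypothetical, and `target` is
assumed to already point at TARGET ADDRESS within target data that has been
range-checked by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* A twiddling operation: combines one target byte with one patch byte. */
typedef uint8_t (*xrif_twiddle_fn)(uint8_t target_byte, uint8_t patch_byte);

/* Default structure: a single patch byte twiddles `amount` target bytes. */
static void xrif_twiddle_default(unsigned char *target, size_t amount,
                                 uint8_t patch_byte, xrif_twiddle_fn op)
{
    for (size_t i = 0; i < amount; i++)
        target[i] = op(target[i], patch_byte);
}

/* TREP-style structure: patch byte i twiddles target byte i. */
static void xrif_twiddle_trep_style(unsigned char *target,
                                    const unsigned char *data, size_t length,
                                    xrif_twiddle_fn op)
{
    for (size_t i = 0; i < length; i++)
        target[i] = op(target[i], data[i]);
}

/* Example operation: bitwise AND of the two bytes. */
static uint8_t xrif_op_and(uint8_t t, uint8_t p)
{
    return t & p;
}
```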
The three chunks differ in the specific comparison performed: TAND: If both bits are 1, the result is 1, else 0. TIOR: If either bit is 1, the result is 1, else 0. TXOR: If the bits are the same, the result is 0, else 1. These correspond to the C language operators & | and ^ respectively. You might also think of them in this manner: TAND "masks out" bits: if the patch bit is 0, the result is 0, else unchanged. TIOR "sets" bits: if the patch bit is 1, the result is 1, else unchanged. TXOR "toggles" bits: if the patch bit is 1, the result is inverted (1<=>0), else unchanged. These are not required to be supported. However, if *any* of these are, then *all three* must be. It is not required to support both of the above formats (original and TREP-style); a patcher can support just one or just the other if it wishes, or it can support both (or neither). ------------------------------------------------------------------------------- -------------------------- TADD: Mathematic addition -------------------------- ------------------------------------------------------------------------------- This will perform mathematic addition on integers in the target file. This is not required to be supported. 1. TARGET ADDRESS (Address type) The offset into the target data at which to begin the operation. 2. INTEGER SIZE AND FLAGS (one byte) -----XXX : The size of the integer(s) in question, in bytes, minus one: 000 = 8-bit 001 = 16-bit 010 = 24-bit 011 = 32-bit 100 = 40-bit 101 = 48-bit 110 = 56-bit 111 = 64-bit At least 8/16/24/32-bit must be supported. Also, those are the only allowed values, unless the patch uses 63-bit addressing, in which case any of those values are allowed. -X------ : METHOD FLAG: how to interpret the next field. See below. X------- : Byte order 0 = Little endian 1 = Big endian Both of these must be supported. 3. 
DATA AMOUNT (Length type)
   If the "method flag" (above) is 0, then the DATA (below) is one integer (of
   the size and endianness given by the above byte), which is repeated the
   number of times given by this field. For example, you would use this if you
   want to adjust a number of fields by the same amount.

   If the method flag is 1, then the DATA is this many integers (of the size
   and endianness given by the above byte). You would use this if you want to
   adjust a number of fields each by their own amount (without needing
   separate blocks for each of them).

   Both of these methods must be supported.

4. DATA
   The integer(s) themselves. If the METHOD FLAG is zero, this is one integer;
   otherwise the number of integers is given by DATA AMOUNT. The integers are
   stored in the same byte order as specified in the flags byte, above.

-------------------------------------------------------------------------------
------------------------- TROT: Shifting and rotation -------------------------
-------------------------------------------------------------------------------

This performs shifting or rotation of bits within bytes (or, more precisely,
byte groups). This is not required to be supported.

1. TARGET ADDRESS (Address type)
   The offset into the target data at which to begin the operation.

2. PARAMETERS (one byte)

   -----XXX : The number of bytes in each group (minus one): bits can be
              shifted from one byte to another within a given group.

                 000 = 1 byte
                 001 = 2 bytes
                 010 = 3 bytes
                 011 = 4 bytes
                 100 = 5 bytes
                 101 = 6 bytes
                 110 = 7 bytes
                 111 = 8 bytes

              Group sizes up to 4 bytes must be supported. Those are also the
              only allowed values unless the patch uses 63-bit addressing, in
              which case any of the values listed above is allowed.

   ---X---- : Whether to shift/rotate left (0) or right (1). "Left" means
              multiplication by 2; "right" means division by 2.

   --X----- : Whether to shift (0) or rotate (1).
              If a bit is rotated off the end of a group, it reappears at the
              other end; if shifted, it is discarded.

   -X------ : What to shift "in" for shift operations. (This is unnecessary
              for rotations: the bit shifted in is the one that was just
              shifted out.)

   X------- : Whether multi-byte groups are to be considered little endian (0)
              or big endian (1). When a bit is shifted or rotated off the end
              of a byte, this flag controls what byte it will go to. This is
              ignored if the group size (see above) is 1 byte.

3. GROUP COUNT (Length type)
   The number of groups in question.

4. SHIFT AMOUNT (one byte)
   The number of bits to shift or rotate in the given manner. This must be
   less than: {number of bytes in a group} * 8.

-------------------------------------------------------------------------------
------------------------ TFSM: Data size modification -------------------------
-------------------------------------------------------------------------------

This changes the size of the target data.

No chunk is allowed to read, write, or otherwise address anything past the end
of the target data. If that data needs to be increased in size, a TFSM chunk
will make it so.

This is not required to be supported.

+---------------------------------------------------------------------------
| HISTORICAL NOTE (1) |
|---------------------+
|
| Prior to the 2006-01-22 edition of XRIF, it *was* required, except in
| situations where it was not possible to change the target data size (for
| any reason). It is for this reason that the TFSM flag in the XRIF header is
| in an "advisory" byte rather than a "mandatory" one, even though it ought to
| be in the latter.
|
+---------------------------------------------------------------------------

The chunk consists of a [00], followed by a value of the Address type, which
specifies the new size of the target data.
+--------------------------------------------------------------------------- | HISTORICAL NOTE (2) | |---------------------+ | | Actually, the byte before the SIZE field can take on one of these values: | | [00] = the target data is set to exactly SIZE bytes (as above). | [01] = the target data is increased by SIZE bytes. | [02] = the target data is decreased by SIZE bytes. | | But, as of the 2008-01-10 edition of XRIF, the relative size methods are | formally deprecated: new patches should not use them. They would only be | useful if the patch could do things such as addressing N bytes *from the | end of the data*, which, at the moment, it can't. Therefore, the only use- | ful method is to explicitly state the exact final size. | +--------------------------------------------------------------------------- If the data is increased in size, [00]s are added to the end to fill out the new space. If the data is decreased in size, data is truncated off the end. The hard, format-imposed upper limit on the target size is whatever can be ex- pressed with an Address type field: * If the patch uses 16-bit addressing, this is 64 KB - 1 = 0xFFFF * If the patch uses 31-bit addressing, this is 2 GB - 1 = 0x7FFFFFFF * If the patch uses 32-bit addressing, this is 4 GB - 1 = 0xFFFFFFFF * If the patch uses 63-bit addressing, this is 8 EB - 1 = 0x7FFFFFFFFFFFFFFF It is up to the patcher to determine and respect the platform/filesystem file size limit in addition to the above rules. If such limits are surpassed even if the above, format-imposed limits are not, the patcher should do its best to han- dle the situation gracefully. As of the 2008-01-10 edition of XRIF, it is required-- in the sense of "can be assumed to be widely supported"-- that if TFSM is present, it is used before any payload chunks (such as TREP). This is so a program can check what the req- uested size will be, and see if it's possible to carry out, before it starts changing the target data. 
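For target data held in memory, the exact-size method ([00]) might be carried
out roughly as below. This is a sketch under stated assumptions: the function
name is hypothetical, and the target data is assumed to live in a buffer
obtained from malloc(3). The deprecated relative methods are not handled.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Resize heap-allocated target data to exactly `new_size` bytes. Any new
   space at the end is filled with [00]; shrinking truncates the end.
   Returns 0 on success, or -1 on failure (target data left unchanged). */
static int xrif_apply_tfsm(unsigned char **target, size_t *size,
                           uint64_t new_size)
{
    unsigned char *p;

    if (new_size > SIZE_MAX)
        return -1;                    /* beyond what this platform can hold */

    /* Request at least one byte so that a new size of 0 stays well-defined */
    p = realloc(*target, new_size ? (size_t)new_size : 1);
    if (p == NULL)
        return -1;

    if (new_size > *size)
        memset(p + *size, 0, (size_t)new_size - *size);

    *target = p;
    *size = (size_t)new_size;
    return 0;
}
```

A patcher working on a file on disk would use the platform's own means of
changing the file size instead; the zero-fill and truncation rules stay the
same.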
=============================== META-DATA CHUNKS ============================== ------------------------------------------------------------------------------- ------------------------------- TCRC: Checksums ------------------------------- ------------------------------------------------------------------------------- This records one or more checksums of the intended target data, computed at the time the patch was generated. The patcher can compute the checksum of the actual target data (when it goes to apply the patch) and compare it to this: if they match, it's reasonable to assume the patch is being applied to the cor- rect, intended target data. It is not required for patchers to implement this. Multiple checksums can be recorded in a single TCRC chunk (for example, if a number of different, yet known, versions of the target data are compatible with the patch). If there is more than one checksum in a chunk, then if *any* of them passes, the target data is considered to have passed the CRC check. If there are no TCRC chunks at all, the target data passes unconditionally. +--------------------------------------------------------------------------- | HISTORICAL NOTE | |-----------------+ | | Prior to the 2008-01-10 edition of XRIF, multiple checksums were done by | recording each in its own separate TCRC chunk, but this behavior has been | deprecated in favor of recording them all in a single chunk, so that an | implementation can deal with them all and pass judgement in a single oper- | ation, rather than having to keep track of prior results and possibly hav- | ing to pre-scan the patch to know how many checksums there are. | +--------------------------------------------------------------------------- +--------------------------------------------------------------------------- | AUTO SELECTION | |----------------+ | | TCRC can be used in tandem with VARY to auto-select a variation, based on | which variation is the first to pass the CRC check. 
|
+---------------------------------------------------------------------------

The actual checksum algorithm is Adler-32. The only difference is that the
result is recorded in little endian order in the TCRC chunk (for consistency
with the rest of the XRIF format), rather than big endian as suggested by the
Adler-32 specification.

The algorithm works like so: let there be two counters, call them "A" and "B".
"A" starts with the value 1, "B" with 0. For each byte of the target data, add
that byte to "A", then add the new value of "A" to "B". When done with the
target data, take both values modulo 65521 (that is, the remainder of dividing
them by 65521); the value of "A" forms the lower 16 bits of the result, "B" the
high 16 bits, yielding a 32-bit value: this value is then inscribed (in little
endian byte order) into the TCRC chunk.

Here is some sample code in the C language that will do this. I wrote this
myself; for the purposes of licensing, consider this to be in the public
domain:

   #include <stdint.h>

   uint32_t adler32 ( const unsigned char * data, unsigned int length )
   {
      uint32_t a = 1, b = 0;
      uint16_t counter = 5803;   /* bytes left before a modulo is due */

      while (length--)
      {
         if (counter == 0)
         {
            a %= 65521;
            b %= 65521;
            counter = 5552;      /* shorter limit for every later run */
         }
         counter--;
         a += *(data++);
         b += a;
      }
      a %= 65521;
      b %= 65521;
      return (b << 16) | a;
   }

Here, "uint32_t" means an unsigned 32-bit datatype; "uint16_t" means an
unsigned 16-bit datatype. Both of these are available if you #include
<stdint.h>, or you can replace them with whatever is appropriate for your
system, if that header is unavailable.

The value 65521 is the highest prime number less than 65536; the value 5803 is
the maximum number of bytes that can be safely calculated from the starting
values (A = 1, B = 0) before there is a possibility that "B" will overflow its
datatype. So, after doing that many iterations, the code will modulo both
counters in order to be safe. After a modulo, "A" and "B" can each be as large
as 65520 rather than 1 and 0, so each later run is limited to 5552 bytes
before the next modulo.
The point of all this is to delay the modulo operation as long as possible, to
avoid doing it on *every* iteration.

+---------------------------------------------------------------------------
| FOR DIFFERENT DATATYPES |
|-------------------------+
|
| If A and B are implemented as *signed* 32-bit datatypes, then the maximum
| number of safe iterations (before B could surpass 0x7FFFFFFF) is 4103
| rather than 5803, so code accordingly.
|
| Conversely, if A and B are implemented as unsigned 64-bit datatypes, then
| the value becomes 380368696 iterations.
|
| These figures count from the starting values A = 1, B = 0. A run that
| starts from values already reduced modulo 65521 must be somewhat shorter.
|
+---------------------------------------------------------------------------

+---------------------------------------------------------------------------
| SECURITY |
|----------+
|
| This checksum algorithm is not meant as a security measure, but as a
| convenience measure. It's meant to catch the user applying the patch to the
| wrong game entirely, or to the wrong version of it (such as a prior hack),
| not to prevent someone composing a file that has the same checksum by this
| algorithm and then tricking the user into applying the patch to that. Such
| security measures are beyond the scope of XRIF.
| Adler-32 was chosen over more comprehensive algorithms (such as MD5)
| because of the above, and because it is much easier to understand and
| implement.
|
+---------------------------------------------------------------------------

-------------------------------------------------------------------------------
--------------------------- VARY: Variation marker ----------------------------
-------------------------------------------------------------------------------

XRIF is not limited to a single patch for a single target file: "variations"
allow for separate sets of patch data to be conveyed in a single XRIF file,
while allowing data common to all variations to be marked as such.
This can be used for: * Multiple versions of a work conveyed in the patch: for example, an upgrade to an existing work could be distributed as a patch that can either modify an un- modified target file to the full version, or to upgrade a target file that has already been patched with the previous version of the work. * Different versions of the same target file can be supported. It is not required for patchers to support variations; for this reason, a flag in the XRIF header indicates whether they're used. If they are supported and used, some other flags will give parameters for the handling of the variations. The VARY chunk marks the beginning of a variation, which encompasses everything until the next VARY chunk (or the end of the file). If there is data (TREP, ...) before the first VARY, then that data is considered part of the "common segment" and forms a common part of all the variations in the file. +--------------------------------------------------------------------------- | ON THE COMMON SEGMENT | |-----------------------+ | | Whether the common segment can be selected as a variation in its own right | is determined by a flag in the XRIF header. By default, it *is*, in order | to allow patchers to handle variationless patches in the same manner as | variable patches. | | Another flag determines whether the common segment is to be applied before | or after any selected regular variation. Most of the time it'll be applied | first, but, for example for the full-and-upgrade example given above, the | common segment would contain the part that upgrades the original work, and | so would be applied *after* the part that defines the original work. | +--------------------------------------------------------------------------- The VARY chunk itself consists of an identifier (for the purpose of selecting a variation by name, using a command-line tool for example) and one or more desc- riptions (each with a locale declaration) that describe it. 
There are two ways to encode this data in the VARY chunk; which one is used is selected by a flag in the XRIF header. The default format is straightforward: the Identifier is limited to 15 characters in length (not including the nul- terminator), and each Description consists of a locale declaration (as for ILOC) followed by a [0A] character, followed by the description itself. The alternate format has the Identifier field fixed to exactly 16 bytes long with all unused bytes set to [00], of which there must be at least one, which serves as a nul-terminator (thus making the 16th byte of the chunk contents al- ways be [00]), and each Description string is prefixed by a 16-bit integer giv- ing its length (including the locale and nul-terminator). The idea behind this is to make it easier to parse for a computer program. This format became avail- able in the 2006-05-01 edition of XRIF. The Description field is meant only as a short title for the purposes of distin- guishing one variation from another. More detailed "readme" text can be done using the usual annotation features, described below. It is not required for patchers to support both formats. It is allowable for a patcher to support just the alternate format without supporting the original. There is an arbitrary limit of 255 variations (not including the Common Segment) in a single XRIF file. As of the 2006-05-01 edition of XRIF, each variation can be independently selected, and multiple variations selected at once, with the appropriate flag set in the header. +---------------------------------------------------------------------------- | ON TCRC AND ANNOTATION | |------------------------+ | | Each variation can have its own set of TCRC and annotation chunks. | | A patcher could use each variation's TCRC to auto-select a variation based | on which is the first to pass the TCRC check. 
| The patcher should not, however, actually *apply* the patch automatically
| unless requested to; but it can certainly highlight it or make it the
| default variation if none other is selected by the user.
|
| Annotation present in the Common Segment applies to the patch as a whole;
| annotation in an individual Variation applies only to that Variation.
|
+----------------------------------------------------------------------------

-------------------------------------------------------------------------------
------------------------- Ixxx: Simplified annotation -------------------------
-------------------------------------------------------------------------------

The 2006-03-31 edition of XRIF adds a simplified way to add annotation.
Although the INFO chunk is very expressive and powerful, it is a bit much to
process, and is quite different in structure from the rest of the XRIF format.

The simplified annotation chunks contain a basic set of annotation abilities,
with each entry given in a separate chunk, as listed in the table below.

The chunk contents consist of a nul-terminated Unicode text stream, encoded
using UTF-8. It may consist of any valid Unicode text except U+0000 (which
would be interpreted as the nul terminator).

+----------------------------------------------------------------------------
| UNICODE? |
|----------+
|
| Unicode is a "character repertoire", a set of characters encompassing all
| major writing systems in the world, and not a few minor ones. These
| characters are identified by "codepoints", given in the form "U+NNNN", where
| NNNN is the codepoint, expressed in base 16 (hexadecimal). Codepoints can
| range from U+0000 to U+10FFFF.
|
| A codepoint is a rather abstract concept. There are several "encoding forms"
| which are used to represent Unicode text as a sequence of bytes for computer
| use.
| Foremost among these is UTF-8, which encodes each codepoint as a sequence
| of 1 to 4 bytes, with codepoints in the U+0000 to U+007F range (that is,
| US-ASCII characters) encoded as a single byte, in the same way as US-ASCII
| itself, which makes it much more palatable to existing computer systems.
|
| Many people, when they think of "Unicode", think of UTF-16, which is another
| encoding form of Unicode, one that encodes each codepoint as one or two
| 16-bit values (codepoints U+10000 and above require two; the rest only one).
|
| UTF-8 and UTF-16 are both different ways to represent the *same thing*:
| Unicode codepoints.
|
| XRIF uses UTF-8 rather than UTF-16 because:
| * Characters in US-ASCII range are represented exactly the same as in
|   US-ASCII itself. This makes it much easier to "hand hack" text data.
| * UTF-8 is more compact than UTF-16 for US-ASCII characters: only one byte
|   per character, rather than two.
| * The nul terminator byte ([00]) will not occur in a valid UTF-8 bytestream
|   except to represent the U+0000 codepoint, which, for that reason, is not
|   allowed to be used.
|
| XRIF uses Unicode rather than, say, ISO-8859-1 because it can represent
| scripts other than Latin, and all characters are unambiguous with respect
| to similar encodings (see http://en.wikipedia.org/wiki/Mojibake). Rather
| than having text specify the encoding used (which would require patchers to
| detect and support all of them), I stuck with UTF-8 so there is only one
| encoding that has to be dealt with.
|
| Links:
|    http://www.unicode.org/
|    http://en.wikipedia.org/wiki/UTF-8
|
+----------------------------------------------------------------------------

As of the 2008-01-10 edition of XRIF, these chunks are changed so that they
start with a lower-case "i" instead of an upper-case one.
Also, for future com- patibility, *all* chunks starting with a lower-case "i" are considered annota- tion, so that new annotation chunks can be defined in the future without having to give each of them their own flags. The lower-case versions are marked with an Advisory flag in the XRIF header (so that old patchers will accept them). The upper-case versions are deprecated. +-------------------------------------------------------------------------+ | iNAM | The complete name of the work, including subtitles, for example: | | | "Foo II: Revenge of Foo". | |-------------------------------------------------------------------------| | iCRE | The person who created the work. If there is more than one crea- | | | tor, each is given in a separate iCRE chunk. If desired, this | | | can be followed by a parenthetical comment explaining their role | | | in the creation process, for example, "Whoever (Translator)". | |-------------------------------------------------------------------------| | iGRP | The group that created, directed and/or published the work; for | | | example, "Yoyodyne Translations". There can be more than one, in | | | which case each is given in a separate iGRP chunk. | |-------------------------------------------------------------------------| | iTHX | A person or group or such that the creators wish to thank. This | | | implies they weren't directly involved in the creation of the | | | work, though (otherwise they would be in iCRE or iGRP). It's sug-| | | gested that, if desired, the name of the person/whatever being | | | thanked be followed by a parenthesized comment explaining why; | | | for example, "Whoever (advice and moral support)". | |-------------------------------------------------------------------------| | iTAR | Specifies the name of the target data (for example, the name of | | | the game being modified). 
This can be a filename (if the file is | | | expected to have a specific filename), or an arbitrary label for | | | human consideration. Extensions might define other uses for this;| | | XRIF+NES for example expects a GoodNES name here. | |-------------------------------------------------------------------------| | iDAT | The date on which the work or patch was finished or published, | | | or some other date as the patch creator sees fit. Unlike the | | | other annotation chunks, this is in a binary format, four bytes | | | in the following order: | | | | | | 1. A 16-bit value giving the YEAR. (for example, 2007) | | | 2. An 8-bit value giving the MONTH. 01=January, 02=February, ... | | | 3. An 8-bit value giving the DAY of the month, starting at 01 | | | | | | For example: | | | | | | +---------- Year (0x07D7 = 2007) | | | | +------ Month 11 (November) | | | _|_ | +--- Day of the month (20) | | | / \ | | | | | D7 07 0B 14 | | | | | | The date is in the Gregorian calendar. | |-------------------------------------------------------------------------| | iVER | The version number of the work, in an arbitrary text format such | | | as "1.0", "Beta 3", whatever is appropriate. | |-------------------------------------------------------------------------| | iTXT | General-purpose "readme" text, such as an overview of what the | | | work is for, how to use it, and so on, whatever is deemed fit. | | | | | | This is plain text, but should be formatted under the following | | | assumptions: | | | * It is displayed with a fixed-width (monospace) font | | | * The display area is 80 columns wide | | | (This document you're reading now is formatted in that way) | | | | | | Also, U+000A is the only acceptable end-of-line sequence; it is | | | up to the patcher to translate this to the local line-ending con-| | | ventions (such as [0D 0A] for Microsoft platforms) if necessary. 
| |-------------------------------------------------------------------------| | iURL | A website address (or some other thing that can be expressed as | | | a URL) for the last mentioned iNAM (that is, the whole work), | | | iCRE, iGRP or iTAR. If mentioned before any of these, then they | | | apply to iNAM. | | | This must be a valid absolute URL in accordance with RFC 2396 | | | (see: http://www.ietf.org/rfc/rfc2396.txt). | |-------------------------------------------------------------------------| | iADR | As for iURL, but specifies an e-mail address. | |-------------------------------------------------------------------------| | iIRC | As for iURL, but specifies an IRC channel: the name of the ser- | | | ver, then a space, then a list of one or more channel names, sep-| | | arated by commas. For example: | | | "irc.example.com #yoyodyne,#haxign" | | | The server name is preferably a canonical name for the network; | | | a specific server (like "bumfuck.ia.us.irc.example.com") should | | | not be mentioned unless necessary. | |-------------------------------------------------------------------------| | iGEN | The name of the patch generator (the program that created the | | | patch). This is provided to give patch generator authors a place | | | to insert the program's name into a patch in a non-intrusive way.| |-------------------------------------------------------------------------| | iAUT | The person who created the patch itself. (This is basically the | | | human version of iGEN.) | |-------------------------------------------------------------------------| | iLOC | Contains a short string specifying the locale for all subsequent | | | annotation chunks; for example, "en-US" to represent American | | | English. (This also affects INFO and VARY chunks that have an | | | empty locale field.) 
Locale support is not required to be sup- | | | ported; if it isn't, the implementation shall at least recognize | | | (that is, not complain about seeing) the iLOC chunk and ignore | | | it. | | | If other annotation chunks are given before iLOC, they are as- | | | sumed to be in the "en" locale (that is, in non-region-specific | | | English). | +-------------------------------------------------------------------------+ You can include more than one set of these chunks, separated by an iLOC chunk to introduce a new locale, but unless the implementation supports using iLOC in this manner, it can ignore any annotation chunks after an iLOC (unless the iLOC came before any other annotation chunks). For example, suppose there is an iNAM chunk, in English, followed by an iLOC declaring the "de" (German) locale, followed by another iNAM containing the title in German. In this case, unless the implementation supports multiple loc- ales in one patch, only the first one (in this case, the one in English) is dealt with. The use of these chunks is indicated with a flag in the XRIF header. New chunks should use the lower-case-"i" versions, in which case only that flag needs to be set; you don't need to set both the original and lower-case annotation flags. Older patches, however, might use the upper-case versions (with the attendant flag in the header), so patchers should, but are not required to, be prepared to handle them. ------------------------------------------------------------------------------- ------------------------ INFO: Consolidated annotation ------------------------ ------------------------------------------------------------------------------- This was the original XRIF annotation method: one chunk containing all the anno- tation, in NAME=VALUE format. This is semi-deprecated in favor of the simpli- fied annotation chunks (iNAM, iCRE, ...), since those are adequate and much less complex. 
However, there's nothing wrong with a patch using both methods; if a patch uses both and a patcher recognizes both, INFO takes precedence over the simplified annotation chunks. As with VARY, there are two different ways to encode this data. In the default format, the chunk begins with a nul-terminated locale declara- tion, which applies to the entire chunk. This is followed by the "entries" that comprise the actual content, each being in a NAME=VALUE format. The NAME is in US-ASCII, consisting only of upper-case letters, digits, underscores ("_") and dashes ("-"), at a maximum of 64 characters in length, including the "=" which separates it from the Value part. Also, it may not start with a dash. The VALUE part is in UTF-8 as for other annotation, and is nul-terminated. The alternate format works the same way, except the locale declaration and each entry are prefixed by a 16-bit integer giving its length (including the nul- terminator), to make it easier for programs to parse. This format became avail- able in the 2006-05-01 edition of XRIF. Either way, an entire entry, including name, "=", value, and nul-terminator, is limited to 65536 bytes in length. A given NAME may occur more than once, in which case they are taken to be mem- bers of a list. For example, if there is more than one CREATOR, each gets their own CREATOR entry (plus CREATOR:LINK and so forth). Some names can be "attached" to some other name, to convey some subset of infor- mation about the subject. For example, a CREATOR entry gives the name of the creator of the work; one could then associate, for example, a website address with them by using a CREATOR:LINK entry. As that suggests, ":" is used for this purpose. Names which are not "attached" to anything are assumed to be "attached" to the work conveyed in the patch. 
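As a rough illustration of the default (nul-terminated) INFO layout described above, here is a minimal Python sketch. The function name parse_info_default is hypothetical, and two things are assumptions on my part rather than rules stated here: that ":" is accepted inside a NAME (since attached names such as CREATOR:LINK need it), and that the locale field is plain ASCII. Entries are returned in file order, since attachments refer to the last mentioned entry.

```python
import re

# NAME: upper-case letters, digits, "_" and "-", not starting with a dash.
# ASSUMPTION: ":" is also accepted, because attached names use it.
NAME_RE = re.compile(r"[A-Z0-9_:][A-Z0-9_:-]*\Z")


def parse_info_default(payload: bytes):
    """Parse the default INFO layout into (locale, [(name, value), ...])."""
    fields = payload.split(b"\x00")
    # A well-formed payload ends with a nul, so the last split piece is empty.
    if fields[-1] != b"":
        raise ValueError("INFO payload is not nul-terminated")
    fields.pop()
    if not fields:
        raise ValueError("INFO payload has no locale field")

    locale = fields[0].decode("ascii")  # ASSUMPTION: locale is ASCII
    entries = []
    for raw in fields[1:]:
        # Whole entry (name, "=", value, nul) is limited to 65536 bytes.
        if len(raw) + 1 > 65536:
            raise ValueError("entry is too long")
        name_bytes, sep, value_bytes = raw.partition(b"=")
        if not sep:
            raise ValueError("entry has no '=' separator")
        name = name_bytes.decode("ascii")
        # The 64-character limit on NAME includes the "=".
        if len(name) + 1 > 64 or not NAME_RE.match(name):
            raise ValueError("bad entry name: %r" % name)
        entries.append((name, value_bytes.decode("utf-8")))
    return locale, entries


sample = (b"en\x00TITLE=Example Hack\x00CREATOR=Alice\x00"
          b"CREATOR:LINK=http://example.com/\x00CREATOR=Bob\x00")
locale, entries = parse_info_default(sample)
```

Keeping the entries as an ordered list, rather than a dictionary, preserves both repeated names (the two CREATOR entries) and the "last mentioned" relationship that attachments depend on.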
------------ STANDARD ENTRY NAMES ----------- The NAME field can be any arbitrary string fitting the above constraints, but for the purposes of interoperability, a few names are defined here with stand- ard semantic properties; these should be used in preference to other labels, if they are adequate. TITLE The primary title of the work. Equivalent to the iNAM chunk. LOCALE The locale (language) of the work. If the work is a translation, this entry refers to the destination language. (The source language is given with TARGET:LOCALE.) LINK A website address-- or some other thing that can be expressed as a URL-- to associate with whatever this label is attached to. Equivalent to iURL. EMAIL An e-mail address to associate with whatever this is attached to. Equival- ent to iADR. IRC An IRC server and channel list to associate with whatever this is attach- ed to. Equivalent to iIRC. VERSION The version number of the work, in an arbitrary text format such as "1.0" or "RC3", whatever is appropriate. It is suggested that numeric version numbers (such as "1.0") omit the "v" at the beginning. Equivalent to iVER. CREATOR The (or A) person responsible for having actually created the work. If there is more than one such person, each gets their own CREATOR entry. Equivalent to iCRE. CREATOR:ROLE A description of the role the last mentioned CREATOR played in the creat- ion of the work. This can be arbitrary, but it is recommended that one of the following labels be used if one of them is appropriate: Hacker Translator Designer (for example, storyline designer) Writer (as of dialogue and such) Programmer (as for purpose-built utilities and such) Composer (as of music) GROUP The organization for which the work was created and/or by which it was released, for example "Yoyodyne Translations". Equivalent to iGRP. THANKS A person or group or such to which the creators wish to offer thanks. 
   Being mentioned here implies that they weren't directly involved in the
   creation of the work (otherwise they'd be listed as a CREATOR or such), but
   that the creators nevertheless wish to give them credit. Equivalent to iTHX.

THANKS:ROLE
   Equivalent to CREATOR:ROLE but applies to the last THANKS. Beta-testers, if
   listed, should be listed using THANKS instead of CREATOR, and their
   THANKS:ROLE should be given as "Beta-tester".

TYPE
   A description of the intent of the work in the context of being a
   modification rather than a work unto itself. This can be arbitrary, but it
   is recommended that one of the following labels be used if one of them is
   appropriate:

      Hack
      Translation
      Update
      Bugfix

TARGET
   The name of the intended target file. Equivalent to iTAR.

TARGET:VERSION
   A version number thereof.

TARGET:LOCALE
   Its locale. If the work is a translation, this indicates the source
   language. (The destination language is given with LOCALE.)

DATE
   Equivalent to iDAT, except it's in text form, in the following notation:

      YYYY-MM-DD

   For example, January 10, 2008 becomes "2008-01-10".

README
   Equivalent to iTXT.

README:FONT-FAMILY
   Specifies the general type of font recommended for use with the README
   text. This must be exactly one of "serif", "sans serif" or "monospace".

   This does NOT specify a specific font, only the font TYPE: if it's
   specified as anything other than those listed, it is ignored. It is up to
   the patcher to associate specific fonts with these general font types; it
   is recommended the patcher allow the user to select them. For example, on
   MS Windows systems, it will probably work well to start out with Times New
   Roman for the "serif" type, Arial for the "sans serif" type, and Courier
   New for the "monospace" type.

   There is no default: if this isn't specified, then the text of README
   should not make any assumptions about the font type. For example, if the
   README text contains ASCII diagrams or something, then it should specify a
   monospace font.
It's not required for patchers to distinguish between "serif" and "sans serif". README:TABSPACING The spacing of tabs in the text: if a U+0009 character is encountered, the "cursor" (at which the next character will be written) is advanced to the next multiple of this many characters. The default is 8. README:COLORS This consists of two RGB color values in the form "RRGGBB", where RR is red, GG green, BB blue, each expressed in capitalized hexadecimal. The two colors are separated by a comma. The first color specifies the text color, the second specifies the back- ground color. These apply to the README text. Patchers are free to disregard this, including being configured to do so by the end user. PATCH:DATE The date on which the patch itself was created, which might not necessar- ily be the same as the date on which the work was completed. PATCH:CREATOR The person who created the patch itself. Equivalent to iAUT. PATCH:GENERATOR The program that generated the patch. This is provided so programs can inscribe their name in the patch in a supported way (avoiding "DiskDude!" type crap). Equivalent to iGEN. This list is not exhaustive, but you can mix-and-match attachments; for example, "GROUP:IRC" or "PATCH:GENERATOR:VERSION" (the latter demonstrating multiple at- tachments). =========================== OUT-OF-BAND DATA CHUNKS =========================== These chunks represent out-of-band data, relating to the patch itself. The XRIF and XAPP chunks logically fall under this group, but the labels of other chunks in this group all start with "." as their first character. Extensions are not allowed to redefine or remove these, or to add their own. 
-------------------------------------------------------------------------------
-------------------------- .END: End-of-patch marker --------------------------
-------------------------------------------------------------------------------

This chunk marks the end of the patch, and is required for this purpose in
every XRIF file. Patchers shall ignore everything in the file past the end of
this chunk. The chunk itself has no content.

-------------------------------------------------------------------------------
--------------------- .SYS: Implementation-specific data ----------------------
-------------------------------------------------------------------------------

This chunk conveys program-specific information in a program-specific format.
The chunk starts out with a nul-terminated identifier, 32 characters maximum
(including the nul-terminator), that identifies the program in question; other
programs use this to ignore .SYS chunks other than their own. After this
identifier is data in any format the program may specify, and the
interpretation of it is left to that program.

An example of how this chunk might be used: a patch generator might record the
local filenames of the two files from which the patch was created, as well as
any relevant program options used in the process. Then, if the patch creator
updated the hack, they might open the patch in that program and tell it to
re-create the patch, which it could then do without the user having to
re-configure anything. It might even automatically create an "upgrade" version
using variations.

No program is required to use this chunk, and programs are allowed to look in
other programs' .SYS chunks if they want.
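To make the .SYS layout concrete, here is a small Python sketch of separating the identifier from the program-specific data. The function name split_sys_chunk and the identifier "examplegen" are made up for illustration. I am also assuming that the 32-character limit can be treated as 32 bytes, and the identifier is left as raw bytes because no encoding is stated for it.

```python
def split_sys_chunk(payload: bytes):
    """Split a .SYS chunk's contents into (identifier, program_data)."""
    # The identifier is at most 32 bytes including its nul-terminator, so the
    # terminator has to show up somewhere in the first 32 bytes.
    end = payload.find(b"\x00", 0, 32)
    if end < 0:
        raise ValueError("no nul-terminator within the first 32 bytes")
    return payload[:end], payload[end + 1:]


MY_IDENTIFIER = b"examplegen"  # hypothetical program identifier

ident, data = split_sys_chunk(b"examplegen\x00\x01\x02\x03")
is_mine = (ident == MY_IDENTIFIER)  # other programs' chunks can be skipped
```

A program would normally compare the identifier against its own and skip the chunk on a mismatch, without trying to interpret the data that follows.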
------------------------------------------------------------------------------- -------------------------- .PAD: Padding or comments -------------------------- ------------------------------------------------------------------------------- The purpose of this chunk is to take up space, such as to align the following chunk so its contents start on a 0x10-byte boundary, or to delete a chunk from a file without rearranging subsequent chunks, or whatever. The contents are ignored. However, it's suggested that, if the chunk does not start with [00] or [20], then it might be interpreted as an inline comment; but it's certainly not required to do so. Basically, there will not be any harm in ignoring any .PAD chunks you encounter. They have no effect on how the patch operates. ------------------------------------------------------------------------------- -------------------------- .AKA: Chunk name aliasing -------------------------- ------------------------------------------------------------------------------- This chunk was created (as of the 2006-01-22 edition of XRIF) to address a pot- ential problem with Extensions. Suppose, hypothetically speaking: * An Extension is created, which defines a new chunk, say FROB. * A later edition of XRIF specifies a chunk named FROB with a different meaning. Precedence, in this case, is not an issue: if the Extension is used, then FROB chunks would be interpreted as the Extension's and not XRIF's. But in such a case, the patch could not make use of XRIF's FROB. To address this problem, an .AKA chunk can be used to assign an "alias" to the base XRIF version's FROB chunk, so that it is known by another name. Supposing for example that it is aliased as "BARF", then instances of "FROB" chunks refer to the Extension's FROB, and instances of "BARF" chunks refer to the base XRIF version's FROB. The .AKA chunk itself consists of nine bytes: two chunk names (each four bytes), and a byte identifying the Extension layer. 
The first chunk name is the name of the chunk being aliased. The second chunk
name is the alias itself. The byte at the end indicates which Extension layer
the first chunk name refers to: layer 0x00 refers to the base XRIF version,
0x01 refers to the Extension given in the first XAPP, and so forth. In the
example above, you want to create an alias for the base XRIF version's FROB,
so the Extension layer would be 0x00.

It is an error if any of the following are true:

 * The chunk name to be aliased is not defined by the specified layer's
   format.
 * The chunk name to be aliased is, itself, an alias.
 * The chunk name to be aliased already has an alias.
 * The chunk name to be aliased is an out-of-band chunk (starting with "."),
   or is either the XRIF or XAPP chunks.
 * The alias name is not a valid chunk name, format-wise.
 * The alias name refers to an existing chunk name or alias. (But see note
   below)
 * The Extension layer is the highest layer or does not exist. (Thus, it is
   not allowed-- or necessary-- to use .AKA in Generic XRIF files.)

If the chosen alias refers to a chunk that is defined by the layer but
requires a flag in its XRIF/XAPP header in order to be used, and that flag
isn't set, then it's a legal alias. This is to allow for future updates to
XRIF and/or the Extension in question to add new chunks. Otherwise, you might
define an alias name for your patch; then a later edition of XRIF or the
Extension is published which specifies a chunk that happens to use the same
name as the alias you chose; then a program is written against the new
edition, and suddenly the once-legal alias name in the already-created patch
will have been clobbered, which would be ironic given the purpose of the .AKA
chunk. So, if XRIF or the Extension adds a new chunk, then it would have to be
invoked by a flag in its header, so that old patches will continue working
with new programs.
As an example: if the patch does not use, say, TADD (according to the header), then "TADD" is a legal alias name. Once a valid alias is created, then the alias becomes a legal chunk name for the remainder of the patch, and to use that alias is equivalent to using the chunk that was aliased. Being itself an addition to XRIF, the .AKA chunk has a flag in the XRIF header indicating whether it's used, and it is not required to be supported (in fact, when dealing with Generic XRIF files, it's prohibited to support it). ============================= MISCELLANEOUS TOPICS ============================ -------------------------- FILESYSTEM CONSIDERATIONS -------------------------- It is recommended that XRIF files have the filename extension ".xrif". For file- systems (such as that of MS-DOS) that do not allow filename extensions longer than three characters, then ".xri" shall be used, because this is what would be produced by automatically truncating ".xrif" to three letters, as would be done by MS Windows (which supports longer filename extensions) for example. For Macintosh HFS filesystems, which include a four-character string describing the file type, "XRIF" (if anything) is recommended for this purpose. The "res- ource fork" of the file shall be empty; the "data fork" shall contain an XRIF file as described in this document. --------------------------------- COMPRESSION --------------------------------- XRIF itself does not address the issue of data compression. This is expected to be handled externally, for example by gzip-compressing an XRIF file. If the tar- get file is compressed, it is up to the patcher and/or user to decompress it first so that it can be patched, and if necessary to recompress it afterwards. 
For example, this could be done by using a decompressor to pipe the
uncompressed data to the patcher, which in turn can pipe the patched data to a
recompressor, assuming of course that the patcher supports piping:

   zcat foo.gz | xrif - foo.xrif | gzip -c9 - > foo.new.gz && mv foo.new.gz foo.gz

(The output goes to a temporary file first, because redirecting straight into
foo.gz would truncate that file before zcat had finished reading it.)

Such a command would deal with compression external to XRIF itself, which
means the patcher itself is not required to deal with compression. This, in
turn, means that you can use any compression format for which suitable
decompressors and compressors exist on your computer.

The same considerations exist for the patch itself:

   zcat foo.xrif.gz | xrif foo.bin -

There's nothing wrong with a patcher being equipped to do all this itself; but
in any case, compression issues are expected to be dealt with externally to
XRIF itself.

-------------------------------- PRONUNCIATION --------------------------------

I usually pronounce the name of the format as "EKS-riff", and the standards of
basic human decency compel you to do likewise. To do otherwise, especially on
a Tuesday, will inexorably lead to Burnination. Consider yourself warned.

--------------------- SUGGESTED IMPLEMENTATION APPROACHES ---------------------

XRIF has a number of features that could be difficult for patch generators,
without human intervention, to take full advantage of (such as block-copy
operations). Strictly speaking, such matters are beyond the scope of XRIF
itself: this document only defines the file format, and it is left to
implementations to take full advantage of it.

Having said that, I might offer some advice on certain details.

BLOCK-COPY OPERATIONS

The primary use of TCFR and TCFS is to make it practical to insert, remove, or
otherwise move data around, by copying the moved data to its new location
before performing other modifications.

For some cases, such as the NES, they could be used on a smaller scale, for
dealing with pointers.
In many games, there might be an array of data (such as dialogue text) where each item is variable in length and marked by a terminator of some sort. There will be a set of pointers nearby, which record the starting address of each item in the array, so that it can be found quickly without need- ing to scan the data from within the game itself. Suppose you change one of the array entries, and in the process, you increase its length. All subsequent array entries would have to be moved forward to make room, but otherwise they would not be changed. For a patcher to go in and detect which ones were changed and which were simply moved, without any context (such as knowing this to be, in fact, a set of point- ers and their variable-sized array entries), could be a difficult task, but it would need to be done in order to use TCFR/TCFS operations to move unchanged strings, which is more efficient than hard-wiring the changes with simple-mind- ed TREP operations and conveys less of the original data in the patch itself. Now, one approach might be to tell the patch generator just where such data structures exist: then at least it knows where to look. It can analyze the data (when it might not have analyzed the data before), and see which array entries were merely moved but otherwise unchanged, and insert the appropriate copy oper- ations, followed by TREP to implement actual changes in the contents, and not merely positions, of the other entries. (As an aside, it might be easier to use TCFS, as opposed to TCFR, to do that: with TCFR, the patch generator would be tasked with trying to juggle array en- tries around without clobbering data that needs to be copied later. Even if it could be done, it would be difficult to write a program to do it. Instead, the program can just use TCFS and not have to worry about such things. It is for precisely that reason that TCFS support is required, and not merely optional, for the XRIF+NES Extension.) 
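The clobbering problem mentioned in the aside above can be seen in a toy Python simulation. This is only an in-memory illustration of the difference between copying from the result and copying from the source; it does not model the actual TCFR/TCFS chunk encoding, and the function names copy_from_result and copy_from_source are made up for the purpose.

```python
def copy_from_result(result: bytearray, src: int, dst: int, n: int) -> None:
    """Copy n bytes within the data as modified so far (TCFR-like)."""
    result[dst:dst + n] = bytes(result[src:src + n])


def copy_from_source(source: bytes, result: bytearray,
                     src: int, dst: int, n: int) -> None:
    """Copy n bytes out of the pristine original data (TCFS-like)."""
    result[dst:dst + n] = source[src:src + n]


original = b"AAAABBBB"  # two 4-byte "array entries" that we want to swap

# Copy-from-result: the first copy destroys data the second copy needs.
r = bytearray(original)
copy_from_result(r, 4, 0, 4)
copy_from_result(r, 0, 4, 4)
via_result = bytes(r)

# Copy-from-source: both copies read the untouched original, so the order of
# operations does not matter.
s = bytearray(original)
copy_from_source(original, s, 4, 0, 4)
copy_from_source(original, s, 0, 4, 4)
via_source = bytes(s)
```

With copy-from-result, a generator would have to order its copies (or stage data elsewhere) to avoid this; with copy-from-source it can emit them in any order, at the cost of the patcher keeping the original data on hand.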
For much the same reasons: if you insert data into the middle of a file, it
would be more effective if the patch creator *told* the patch generator where
these things had occurred, so that it can insert the appropriate copy rules
beforehand, rather than trying to compare every byte sequence to every other
byte sequence, and apply heuristics, to determine where such deliberate data
shifts had occurred; or taking the crude approach of comparing the shifted
data with the unrelated data that used to be at that file location (as must be
done with IPS).

HANDLING UNICODE

For graphical programs, it is suggested that annotation and VARY descriptions
be handed off to the window system, which is probably already well-equipped to
handle Unicode text. If you have a means to, then pass the locale information
along as well (for example by writing , if using an HTML engine for the
purpose).

If you must perform text processing by yourself (such as if writing a
terminal-based program), you might do any of the following:

 * Handling only US-ASCII range codepoints (U+0000 to U+007F), displaying the
   rest with placeholder text. Just be sure to indicate the number of
   characters correctly: codepoints beyond US-ASCII range take up multiple
   bytes apiece. This isn't very attractive, but is about as simple as it
   gets, and will be sufficient for English text.

 * Handling only ISO-8859-1 range codepoints (U+0000 to U+00FF), displaying
   the rest with placeholder text. This might be done under the assumption
   that the display environment expects text encoded in ISO-8859-1. This isn't
   attractive either, but will suffice for most Western European languages.

 * Transliterating characters into US-ASCII or ISO-8859-1 versions; for
   example representing the EM DASH character as "--", curly quotes with
   straight quotes, accented characters with the nearest ISO-8859-1 versions,
   or even transcribing non-Latin writing systems (such as Greek or Katakana)
   into a Latin approximation thereof.
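The fallback approaches in the list above might look something like this in Python. The function names, the "?" placeholder, and the contents of the TRANSLIT table are all just illustrative choices of mine. Decoding the UTF-8 first means that each out-of-range codepoint becomes exactly one placeholder, however many bytes it occupied.

```python
# A few sample transliterations; a real table would be much longer.
TRANSLIT = {
    "\u2014": "--",  # EM DASH
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
}


def to_ascii_display(raw: bytes, placeholder: str = "?") -> str:
    """Keep U+0000..U+007F; show one placeholder per other codepoint."""
    text = raw.decode("utf-8", errors="replace")
    return "".join(ch if ord(ch) < 0x80 else placeholder for ch in text)


def to_latin1_display(raw: bytes, placeholder: str = "?") -> str:
    """Keep U+0000..U+00FF; transliterate what we can, else placeholder."""
    text = raw.decode("utf-8", errors="replace")
    out = []
    for ch in text:
        if ord(ch) <= 0xFF:
            out.append(ch)
        else:
            out.append(TRANSLIT.get(ch, placeholder))
    return "".join(out)


ascii_view = to_ascii_display("Pok\u00e9mon \u65e5\u672c\u8a9e".encode("utf-8"))
latin1_view = to_latin1_display(
    "\u201cCaf\u00e9\u201d \u2014 ok".encode("utf-8"))
```

Invalid UTF-8 sequences are turned into U+FFFD by the decoder here, so they end up as placeholders too instead of aborting the display.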
HANDLING ADDRESS/LENGTH FIELDS If you are using the C programming language-- or some other language that can use function pointers or otherwise "assign" a function or method call to a vari- able-- you might handle loading block values (addresses and length) by using function pointers. Just as an example, you could define the functions/methods readint8(), readint16(), readint32() and so forth. These would all return the largest data- type your program supports (64-bit if your program supports that, 32-bit other- wise). Then, have two function pointers, say read_address and read_length, and assign to them one of the above functions depending on the contents of the XRIF header, such as by using a switch() statement. Then, you can handle block values simply by invoking the relevant function pointer, without having to check which one to load every time. ==================== SUMMARY OF IMPLEMENTATION REQUIREMENTS =================== The following things are required to be supported; that is, patch creators are free to use these and can expect that they will be widely supported by XRIF implementations: Full, unsigned range of 8-bit and 16-bit integers. Positive signed range of 32-bit addresses (0 through 0x7FFFFFFF). XRIF (File header chunk) TREP (Replacement) TRLE (Repeated replacement) TCFR (Copy from result) .END (End-of-patch marker) .PAD (Padding/Comments) The following chunks can be ignored, but must at least be recognized without an error being reported or something: TCRC (Checksum) INFO and Simplified Annotation chunks .SYS (Implementation-specific data) The following are not required to be implemented at all. For all of these, the XRIF header will report, ahead of time, which ones are used: TCFS (Copy from source) If this is used, the patcher will need to make arrangements to keep the original, unmodified target data on hand throughout the patching process. 
   This can be impractical, such as with large files, and some patchers might
   be designed or configured to operate directly on the target file rather
   than on a copy.

TFSM (Data size change)
   This is optional as of the 2006-01-22 edition of XRIF. Prior to then, it
   was required to be supported, except in situations where it's impossible,
   for some reason, to resize the file.

TAND, TIOR, TXOR (Bit twiddlers)
   These are all optional as of the 2006-01-22 edition of XRIF. Prior to then,
   they were required to be supported. Originally, their format was closer to
   that of TRLE; this was simplified to having the patch-side data be only a
   single byte. This simplification was done because, at the time, these
   chunks were required to be supported.

   The 2008-01-10 edition added an alternate, TREP-style format in order to
   allow for XOR-type patches (that use TXOR instead of TREP). This is not
   required to be supported either. In fact, it's permissible to support just
   one of the two to the exclusion of the other.

TADD (Addition), TROT (Shifting/Rotation)
   These have always been optional on account of relative complexity.

Variations (VARY)
   This has always been optional.

.AKA (Chunk name alias)
   This is the first *new* chunk since the original 2006-01-03 edition of
   XRIF, which necessarily makes it non-required. However, even if it had been
   a part of XRIF from the very beginning, I don't think I would have made it
   required to be supported anyway.

Integers greater than 0x7FFFFFFF
   These are optional because not all filesystem libraries support addressing
   beyond this range, and some development environments make it difficult, if
   not impossible, to obtain a datatype that can represent such integers
   anyway.

Block lengths larger than 16-bit
   Optional as of the 2006-04-26 edition of XRIF.

=========================== FORMAT REVISION HISTORY ===========================

------------------------------- JANUARY 3, 2006 -------------------------------

Initial publication of XRIF.
------------------------------- JANUARY 9, 2006 ------------------------------- Added "this patch uses TFSM" flag to the XRIF header. Changed TCFR to allow the source and target ranges to overlap. ------------------------------- JANUARY 18, 2006 ------------------------------ Changed TCFR and TCFS so their "Data length" field is of the Address type rath- er than the Length type. Added the rule that a TFSM operation may not set the file size to be anything larger than what a block address can specify. Changed TAND, TIOR and TXOR so the patch-side data is just one byte, instead of being in a TRLE-style format. ------------------------------- JANUARY 22, 2006 ------------------------------ Added .AKA chunk. TFSM, TAND, TIOR and TXOR are no longer required to be supported. -------------------------------- MARCH 31, 2006 ------------------------------- Added the Simplified Annotation Chunks. -------------------------------- APRIL 26, 2006 ------------------------------- The XRIF flag bytes now alternate between Advisory and Mandatory. Block lengths greater than 16-bit are no longer required to be supported. Added README:COLORS to INFO. --------------------------------- MAY 1, 2006 --------------------------------- Added alternate format of INFO and VARY, where each string is prepended with a 16-bit integer giving its length. ICRE, IGRP, IURL and IADR now contain only a single item (name, ...) per chunk, instead of possibly multiple values in a single chunk. IURL and IADR no longer have a label. Just an address. Added an Advisory flag stating that each of a patch's variations can be enabled and disabled independently, as opposed to selecting just one variation. (The idea for this came when contemplating having a patch full of bugfixes/tweaks that could be independently applied without making the user apply the patch sev- eral times, selecting a different one each time.) Imposed an arbitrary limit of 255 distinct variations, not including the Common Segment. 
I might have set this lower-- such as 15-- but in light of the above change (independent variations), a patch might have numerous bugfixes/tweaks to offer, so 255 seems reasonable. ------------------------------- JANUARY 10, 2008 ------------------------------ Added a Mandatory flag that would make Length-type fields with a value of zero be interpreted as meaning one higher than the maximum Length value. Added alternate format for TAND, TIOR and TXOR to give them a structure like that of TREP, chiefly to allow for XOR-style patches. Added Mandatory flag that invokes it. TFSM's relative-sizing methods are deprecated. Added requirement that TFSM, if used, be used before any Payload chunks. Changed TCRC to allow multiple checksums in a single chunk, instead of each being in a separate chunk. INFO is semi-deprecated, though THANKS and IRC entry names were added to the official list. Made all simplified annotation chunks start with a lower-case "i" instead of an upper-case one, and added an Advisory flag to that effect, and rendered the up- per-case versions deprecated. Added iTHX, iGEN, iAUT and iIRC annotation chunks. A TRLE block's range of effect (length x count) can't exceed 64 KB. =========================== NOTICE OF COPYING RIGHTS ========================== This document (as well as the XRIF format itself) was created by Vystrix Nexoth. However, this document, and the format it describes, are both committed to the public domain. This means you're free to distribute copies of this document, and to implement the format described, with no restrictions or requirements imposed on you by me. -- Vystrix Nexoth