File Pre-Processing

Introduction

Pre-processing alters a fixed-length file so that the paygate importer can understand and read it.

Pre-processing consists of a number of user-defined rules that alter the incoming file. For example, a rule might chop off the first 100 characters of a file or pad each line to a set length. The ‘fixed’ file can then be read into paygate more easily using the standard fixed-length importer.

When Pre-Processing Takes Place

Pre-processing takes place after the file cleaning process.

Pre-Processing Order

The image below shows the path that the importer takes when importing a fixed-length file.

Pre-Processing path

Rules are carried out in order, one at a time. After each rule runs, the file is re-saved.

Before each rule runs, the file is re-loaded, so each rule sees all of the changes made by the previous rules.
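
The re-load and re-save cycle can be sketched as follows. This is only an illustration, not paygate's actual implementation: the function name `run_rules`, the idea of modelling a rule as a Python function, and the file name used in `demo` are all assumptions made for the example.

```python
import os
import tempfile
from typing import Callable, Iterable

# A rule is modelled here as a function that takes the file text and
# returns the altered text (an assumption made for this sketch).
Rule = Callable[[str], str]


def run_rules(path: str, rules: Iterable[Rule]) -> None:
    """Apply each rule in order, re-loading and re-saving the file each time."""
    for rule in rules:
        # Re-load the file so this rule sees every earlier rule's changes.
        with open(path, "r", newline="") as handle:
            text = handle.read()
        # Run the rule, then re-save the result before the next rule runs.
        with open(path, "w", newline="") as handle:
            handle.write(rule(text))


def demo() -> str:
    """Run two illustrative rules against a temporary file."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "payments.txt")  # hypothetical file name
        with open(path, "w", newline="") as handle:
            handle.write("HEADERABCDEF")
        run_rules(path, [
            lambda text: text[6:],                  # remove the first 6 characters
            lambda text: text.replace("ABC", "X"),  # then replace text
        ])
        with open(path, "r", newline="") as handle:
            return handle.read()
```

Because the second rule works on the output of the first, swapping the two rules could give a different result, which is why rule order matters.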

Using the Pre-Processor

The pre-processor is part of the Fixed-Length Importer configuration. Pre-processing is disabled by default; enable it by clicking the ‘Enable Pre-processing’ switch.

Pre-Processing UI

Click ‘Add Rule’ to display the rule picker, then select a rule from the drop-down box to add it to the rule list. You can add as many rules as you like, and the same rule can be added more than once. Because rules run in order from top to bottom, their ordering is important. The ordering of rules can be changed if required using the grab handles.

Pre-Processing Rules

Replace Text

Replaces all instances of a matching fragment of text within the entire file.

Parameters

Old The text within the file that will be replaced. Note: matching is case sensitive.

New The text that will be used as a replacement.
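
As a rough illustration of the behaviour described above, the rule can be modelled with Python's built-in `str.replace`, which is also case sensitive and replaces every occurrence. The function name `replace_text` is an assumption made for this sketch and is not part of paygate.

```python
def replace_text(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` with `new` (case sensitive)."""
    return text.replace(old, new)


# Only the lower-case matches are replaced; "ABC" is left alone.
result = replace_text("abc ABC abc", "abc", "xyz")
```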


Remove First Characters

Removes a set number of characters from the beginning of a file.

Parameters

Number The number of characters to remove from the beginning of the file.


Remove Last Characters

Removes a set number of characters from the end of a file.

Parameters

Number The number of characters to remove from the end of the file.
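
Remove First Characters and Remove Last Characters can both be pictured as simple string slices. The sketch below is illustrative only; the function names are assumptions, and it assumes that Number is never negative.

```python
def remove_first_characters(text: str, number: int) -> str:
    """Drop `number` characters from the beginning of the text."""
    return text[number:]


def remove_last_characters(text: str, number: int) -> str:
    """Drop `number` characters from the end of the text."""
    # Computing the end index explicitly avoids the text[:-0] pitfall,
    # which would return an empty string when number is 0.
    return text[:max(len(text) - number, 0)]
```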


Remove Between

Removes a section of text between two positions within the file.

Parameters

Start The (zero based) location within the document that marks the start of text that will be removed from the file.

Finish The (zero based) location within the document that marks the end of text that will be removed from the file.

For example: a Start value of 5 and a Finish value of 10 removes the 5 characters between positions 5 and 10.
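
The sketch below shows one way to read the Start and Finish values. It assumes that the character at Start is removed and the character at Finish is kept, which is the reading that gives exactly 5 removed characters for the values 5 and 10. That interpretation and the function name `remove_between` are assumptions, not confirmed paygate behaviour.

```python
def remove_between(text: str, start: int, finish: int) -> str:
    """Remove the characters from `start` up to, but not including, `finish`."""
    # Assumption: Finish is exclusive, so finish - start characters are removed.
    return text[:start] + text[finish:]


# Positions 5 to 9 (F, G, H, I, J) are removed; K at position 10 is kept.
result = remove_between("ABCDEFGHIJKLMNOP", 5, 10)
```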


Slice File

Slices a file with no row delimiters into equal length rows.

Parameters

Length The length of each slice.

For example, let’s say your file contains the following text:

ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ

Slicing this text using a value of 26 will result in the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ

At the end of the slice process, the long run of text has been split into rows of 26 characters. Each row is terminated with the standard Windows row terminator, CR LF (Carriage Return, Line Feed).
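
A minimal sketch of the slicing step is shown below. The function name `slice_file` is an assumption, and so is the handling of a final row that is shorter than Length (here it is kept as a short row), since the behaviour for leftover characters is not described above.

```python
def slice_file(text: str, length: int) -> str:
    """Split undelimited text into rows of `length` characters, each ending in CR LF."""
    if length <= 0:
        raise ValueError("length must be a positive number")
    # Take consecutive chunks of `length` characters.
    rows = [text[i:i + length] for i in range(0, len(text), length)]
    # Terminate every row with the Windows row terminator.
    return "".join(row + "\r\n" for row in rows)


sliced = slice_file("ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 5, 26)
```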


Remove Line if Starts With

If the file is split into separate rows, this rule can selectively remove entire rows if the row starts with a particular text value.

Parameters

Text The text to use to determine if the row should be removed.

For example: let’s say we had a file that contains the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
XXXDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
XXXDEFGHIJKLMNOPQRSTUVWXYZ

If we used a text value of XXX, the third and fifth rows would be removed from the file leaving:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ
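
The row filter in this example can be sketched as follows. The function name `remove_line_if_starts_with` is an assumption, and the sketch assumes rows are separated by ordinary line endings such as CR LF.

```python
def remove_line_if_starts_with(text: str, prefix: str) -> str:
    """Drop every row that begins with `prefix`, keeping the other rows intact."""
    # keepends=True preserves each row's original line ending.
    rows = text.splitlines(keepends=True)
    return "".join(row for row in rows if not row.startswith(prefix))


source = "\r\n".join([
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "XXXDEFGHIJKLMNOPQRSTUVWXYZ",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "XXXDEFGHIJKLMNOPQRSTUVWXYZ",
]) + "\r\n"
filtered = remove_line_if_starts_with(source, "XXX")
```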


Remove Line if Contains

If the file is split into separate rows, this rule can selectively remove entire rows if the row contains a particular text value.

Parameters

Text The text to use to determine if the row should be removed.

For example: let’s say we had a file that contains the following:

ABCDEFGHIJKLMNXXXRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
ABCDEXXXXXXLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNXXXRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ

If we used a text value of XXX, the first, third and fourth rows would be removed from the file leaving:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Note that the second row remains: although it contains three X’s, they are not consecutive, so they do not exactly match the text XXX.
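
The matching in this example behaves like a plain substring test, as the sketch below illustrates. The function name `remove_line_if_contains` is an assumption, and rows are assumed to be separated by ordinary line endings.

```python
def remove_line_if_contains(text: str, fragment: str) -> str:
    """Drop every row that contains `fragment` as an exact, unbroken substring."""
    rows = text.splitlines(keepends=True)
    return "".join(row for row in rows if fragment not in row)


source = "\r\n".join([
    "ABCDEFGHIJKLMNXXXRSTUVWXYZ",
    "ABCDEFGHIXXLXNOPQRSTUVWXYZ",  # three X's, but never three in a row
    "ABCDEXXXXXXLMNOPQRSTUVWXYZ",
    "ABCDEFGHIJKLMNXXXRSTUVWXYZ",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
]) + "\r\n"
filtered = remove_line_if_contains(source, "XXX")
```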


Insert at Position

Inserts a block of text at a specific position within the file.

Parameters

Text The text to be inserted into the file.

Position The (zero based) position within the file where the text will be inserted.

For example: Say you have a file containing the following:

ABCDEFGHIXXLXNOPQRSTUVWXYZ

Positions are zero based, so A is at position 0, B is at position 1, C is at position 2, and so on.

If we insert the text ‘XXXX’ at position 6, the file will look as follows:

ABCDEFXXXXGHIXXLXNOPQRSTUVWXYZ

Note that the number of characters has increased, because the inserted text did not replace any existing text.
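
The same insertion can be reproduced with two string slices. This is an illustrative sketch; the function name `insert_at_position` is an assumption.

```python
def insert_at_position(text: str, insert: str, position: int) -> str:
    """Insert `insert` before the character at zero-based `position`."""
    # Everything before the position, then the new text, then the rest.
    return text[:position] + insert + text[position:]


result = insert_at_position("ABCDEFGHIXXLXNOPQRSTUVWXYZ", "XXXX", 6)
```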


Pad Start of Lines

In a file that is split into delimited rows, this rule pads the start of each row so that each row is at least a certain length.

Parameters

Pad Char The character that will be used to pad the row. The default is a space (ASCII 32).

Length The length that each row will be padded out to. Note: rows that are already this length or longer are left unchanged.

For example: Say we have a file that contains the following:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
FGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
CDEFGHIXXLXNOPQRSTUVWXYZ
HIXXLXNOPQRSTUVWXYZ

We use the default pad character and a length of 26. The file will now look like this:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
     FGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
  CDEFGHIXXLXNOPQRSTUVWXYZ
       HIXXLXNOPQRSTUVWXYZ
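
Padding the start of a row corresponds to right-justifying it, which the sketch below does with Python's `str.rjust`. The function name `pad_start_of_lines` is an assumption, and the sketch assumes that the row terminator does not count towards the row length.

```python
def pad_start_of_lines(text: str, length: int, pad_char: str = " ") -> str:
    """Left-pad every row with `pad_char` until it is at least `length` long."""
    padded = []
    for row in text.splitlines(keepends=True):
        body = row.rstrip("\r\n")   # row content without its terminator
        ending = row[len(body):]    # the original terminator, if any
        # rjust leaves rows that are already long enough unchanged.
        padded.append(body.rjust(length, pad_char) + ending)
    return "".join(padded)


result = pad_start_of_lines(
    "ABCDEFGHIXXLXNOPQRSTUVWXYZ\r\nFGHIXXLXNOPQRSTUVWXYZ\r\n", 26
)
```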


Pad End of Lines

In a file that is split into delimited rows, this rule pads the end of each row so that each row is at least a certain length.

Parameters

Pad Char The character that will be used to pad the row. The default is a space (ASCII 32).

Length The length that each row will be padded out to. Note: rows that are already this length or longer are left unchanged.

For example: Say we have a file that contains the following:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
FGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
CDEFGHIXXLXNOPQRSTUVWXYZ
HIXXLXNOPQRSTUVWXYZ

We use the pad character * and a length of 26. The file will now look like this:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
FGHIXXLXNOPQRSTUVWXYZ*****
ABCDEFGHIXXLXNOPQRSTUVWXYZ
CDEFGHIXXLXNOPQRSTUVWXYZ**
HIXXLXNOPQRSTUVWXYZ*******
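
Padding the end of a row corresponds to left-justifying it, which can be sketched with Python's `str.ljust`. The function name `pad_end_of_lines` is an assumption, and the sketch assumes the padding goes before the row terminator rather than after it.

```python
def pad_end_of_lines(text: str, length: int, pad_char: str = " ") -> str:
    """Right-pad every row with `pad_char` until it is at least `length` long."""
    padded = []
    for row in text.splitlines(keepends=True):
        body = row.rstrip("\r\n")   # row content without its terminator
        ending = row[len(body):]    # the original terminator, if any
        # The padding is added to the content, then the terminator is restored.
        padded.append(body.ljust(length, pad_char) + ending)
    return "".join(padded)


result = pad_end_of_lines(
    "FGHIXXLXNOPQRSTUVWXYZ\r\nCDEFGHIXXLXNOPQRSTUVWXYZ\r\n", 26, "*"
)
```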


Truncate Start of Lines

In a file that is split into delimited rows, this rule removes characters from the start of each row so that no row is longer than a maximum length.

Parameters

Length The length that each row will be truncated to. Note: rows that are already this length or shorter are left unchanged.

For example: Say we have a file that contains the following:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
FGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
CDEFGHI
HIXXLXNOPQRSTUVWXYZ

We use a length of 12. The file will now look like this:

OPQRSTUVWXYZ
OPQRSTUVWXYZ
OPQRSTUVWXYZ
CDEFGHI
OPQRSTUVWXYZ
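
As the example shows, truncating the start of a row keeps its last Length characters. A minimal sketch, assuming the hypothetical function name `truncate_start_of_lines` and that the row terminator does not count towards the row length:

```python
def truncate_start_of_lines(text: str, length: int) -> str:
    """Keep only the last `length` characters of every row that is too long."""
    result = []
    for row in text.splitlines(keepends=True):
        body = row.rstrip("\r\n")   # row content without its terminator
        ending = row[len(body):]    # the original terminator, if any
        if len(body) > length:
            # Drop characters from the start, keeping the tail of the row.
            body = body[len(body) - length:]
        result.append(body + ending)
    return "".join(result)


result = truncate_start_of_lines(
    "ABCDEFGHIXXLXNOPQRSTUVWXYZ\r\nCDEFGHI\r\nHIXXLXNOPQRSTUVWXYZ\r\n", 12
)
```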


Truncate End of Lines

In a file that is split into delimited rows, this rule removes characters from the end of each row so that no row is longer than a maximum length.

Parameters

Length The length that each row will be truncated to. Note: rows that are already this length or shorter are left unchanged.

For example: Say we have a file that contains the following:

ABCDEFGHIXXLXNOPQRSTUVWXYZ
FGHIXXLXNOPQRSTUVWXYZ
ABCDEFGHIXXLXNOPQRSTUVWXYZ
CDEFGHI
HIXXLXNOPQRSTUVWXYZ

We use a length of 12. The file will now look like this:

ABCDEFGHIXXL
FGHIXXLXNOPQ
ABCDEFGHIXXL
CDEFGHI
HIXXLXNOPQRS
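
As the example shows, truncating the end of a row keeps its first Length characters. A minimal sketch, assuming the hypothetical function name `truncate_end_of_lines` and that the row terminator is preserved and does not count towards the row length:

```python
def truncate_end_of_lines(text: str, length: int) -> str:
    """Keep only the first `length` characters of every row that is too long."""
    result = []
    for row in text.splitlines(keepends=True):
        body = row.rstrip("\r\n")   # row content without its terminator
        ending = row[len(body):]    # the original terminator, if any
        # Slicing leaves rows that are already short enough unchanged.
        result.append(body[:length] + ending)
    return "".join(result)


result = truncate_end_of_lines(
    "ABCDEFGHIXXLXNOPQRSTUVWXYZ\r\nCDEFGHI\r\nHIXXLXNOPQRSTUVWXYZ\r\n", 12
)
```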