I have written several programs that scan files. I like writing them. But it's getting harder.

Ancient History

Back in the '70s, at The Bank, my boss wrote a COBOL program to read a file, and PRINT records that contained a specific character string (maybe his name). The logic was simple:

Read a record
Start at col-1, is this my string? If yes, go print it.
Bump to col-2, and repeat for the entire record.
And every record in the file.
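
A rough Python sketch of that logic, for readers who want to see it spelled out. This is not the original COBOL; the function name and the sample records are made up for illustration.

```python
def naive_scan(records, target):
    """Return the records that contain target, testing every column in turn."""
    hits = []
    for record in records:                       # every record in the file
        last_start = len(record) - len(target)
        for col in range(last_start + 1):        # col-1, col-2, ... to the end
            if record[col:col + len(target)] == target:
                hits.append(record)              # "go print it"
                break                            # one hit per record is enough
    return hits

# Made-up sample records, just to exercise the function.
sample = [b"SMITH JOHN  000123", b"JONES MARY  000456", b"BROWN SMITHSON 0789"]
for rec in naive_scan(sample, b"SMITH"):
    print(rec.decode("ascii"))
```

Every column of every record gets a full string comparison, which is why this approach burns so many CPU cycles on a large file.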

One might keep in mind that the checking account master file, at that time, consisted of 40 reels of tape.
At the time, we had an IBM 360/195, IBM's screamingly fastest machine of the day (and to this day, one of my favorite machines ever). However, that program completely locked up the system - nothing else would run, because he was using ALL of the available CPU cycles. After 10 minutes of nothing else running, the operator cancelled it. My boss told me what he'd done, so I wrote a program that was inordinately faster. To this day, I don't know what he was looking for, or whether he found it.

The idea in my program was to pick one character from the string, search for just that character, and test the entire string (his name???) only when that character is found. I also went one better: it could search for any of several different strings at the same time, really fast.

A couple of years later, a guy named Joe Blank, who had worked at Fireman's Fund Insurance, came by selling a program he'd written while he was there.

The original need was a program to read a huge master file and strip out various record types for different applications to use, in production or in testing, without having to pass the entire 20-50 reels of master file. It did that very well. He'd also written a 'front end' that allowed users to run it from their desks, selecting and/or editing records. It wasn't faster than my program, but it was far, far better.

Maybe the other thing to understand is that I like programming. A LOT. You get paid for doing puzzles all day. What's not to like? I'm older, but haven't changed much in that regard, although I'm certainly thinking slower and less clearly. However, I haven't been in an IBM shop since about 2000, and haven't been anywhere near operations (where all the fun stuff happens) since the late '80s. Maybe worse, computer operations at The Bank have moved to Texas, so there might not be anything local any more.

And today

There is now a simulator that allows folks to run IBM assembler, or COBOL, programs on a PC. There are, in fact, 2 related simulators that share quite a bit of code.

Hercules provides the instruction processing. Users have to provide an operating system, and all the stuff needed to run programs. I don't know, but suspect that this is used in many computer programming schools.
Z390.COM is the other simulator. It provides users with both the instruction processing and the operating system, so users just need to feed it their source code, and it gets run (and crashes when it doesn't work as the author thought). I use Z390. It works amazingly well.

There's another program that I use, SPFlite, which is a line editor that's very similar to the IBM ISPF editor that I've used for decades. Between the two, I'm pretty comfortable, other than all the mistakes I make. Getting old hasn't made me smarter. But I do have spare time, and Elaine humors me far more than I deserve. So I've written several programs - some of which even work. Of those, this is only about SCAN programs.
Generally, when I'm happy with how they work, I put them in:

https://sites.google.com/site/linlyons

Which has lots of other stuff as well. In general, the programming stuff is down at the bottom of the index page.

A little more background

There are a couple of words or acronyms that I'm going to use, so I'll describe them here. That's way easier than trying to describe the function each time I use one.
JCL -- Job Control Language - In an IBM mainframe system, it's what is used to tell the computer what to do. It contains stuff like the name of the program to run, and the files that are to be used. There are several books written just about JCL. It's that important.
PARM field -- A way to pass information to the program being run. Another way to do that would be to have a control card file that contains instructions. With a control card file, you can have as many instructions as you'd like. With the PARM field, you are limited to only 100 bytes. That limitation is often the deciding factor.

In an IBM production environment, programmers write their (or fix someone else's) program, and then test it.
To test it, they submit it to the system with some JCL that tells the system:

JOB - Who to charge for the computer time
PGM= What program to run
PARM= Some information that the program needs for this specific run
DD -- (Data Definition) What file to use. (usually several of these)

Such a 'job stream' might look like (this stretches my memory):

//APLPROG  JOB  MSGCLASS=T,MSGLEVEL=1,CLASS=X
//STEPNAM  EXEC  PGM=MYPROG     [,PARM='TEST']             [my program]
//STEPLIB  DD  DISP=SHR,DSN=APL.MY.EXEC.LIBRARY            [is in this file]
//SYSPRINT DD  SYSOUT=*                             [I want to see the report]
//IN       DD  DISP=SHR,DSN=APL.MY.FILE                    [input file]
//OUT      DD  DISP=(NEW,CATLG),DSN=APL.MY.NEW.FILE        [output file]
//SYSIN    DD  DISP=SHR,DSN=APL.MY.CONTROL.CARDS(SCAN32)   [control cards]

The 360 computers support several number formats.

Character format, which is readable. But the system cannot do math with it.
Binary format, either 2 or 4 bytes (half word or full word)
Packed decimal format, in which there are 2 decimal digits in each 8-bit byte of memory.
Floating point, which I don't understand.

Of those, character format is used for reading and writing.
Binary is used for indexing, both in a program, and in system operation.
Packed decimal is used for numbers, like your bank balance.
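
To make the difference concrete, here is a small Python sketch that shows the same number, 12345, in three of those formats. It assumes Python's cp037 codec as a stand-in for EBCDIC, and it uses the usual packed decimal layout in which the last half-byte holds the sign (C for plus, D for minus). The function names are made up for illustration.

```python
import struct

def pack_decimal(value, length):
    """Pack an integer into `length` bytes of packed decimal."""
    sign = 0x0D if value < 0 else 0x0C            # last nibble: C = plus, D = minus
    digits = str(abs(value)).rjust(length * 2 - 1, "0")
    if len(digits) > length * 2 - 1:
        raise ValueError("value does not fit in the field")
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def unpack_decimal(field):
    """Turn a packed decimal field back into an integer."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)                 # high half of the byte
        nibbles.append(byte & 0x0F)               # low half of the byte
    sign = nibbles.pop()                          # last nibble is the sign
    value = int("".join(str(n) for n in nibbles))
    return -value if sign in (0x0B, 0x0D) else value

print("12345".encode("cp037").hex())     # character:     f1f2f3f4f5
print(struct.pack(">i", 12345).hex())    # binary (full): 00003039
print(pack_decimal(12345, 3).hex())      # packed:        12345c
```

The packed form is still readable in a hex dump, which is part of its appeal for things like account balances.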

There is a TR (translate) instruction that's used to translate character strings from one character set to another. There are, in fact, 2 popular character sets, ASCII (used in your PC) and EBCDIC (used in IBM mainframes), so having the ability to easily translate from one to the other is nice. To translate EBCDIC to ASCII, for example, you build a 256-byte table and put the ASCII character '0' at the offset equal to the value of the EBCDIC '0', and do the same for all the other characters. After a TR instruction, the line will mean exactly the same thing in the other character set.
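
The table idea can be sketched in Python. This is only an imitation of what TR does, not assembler. It assumes Python's cp037 codec as a stand-in for EBCDIC (real shops use several code pages), and the names `EBCDIC_TO_ASCII` and `tr` are made up for illustration.

```python
# Build a 256-byte table: position = EBCDIC code, content = ASCII code.
# Assumption: cp037 is the EBCDIC code page in use.
_table = bytearray(b"?" * 256)                   # anything unmapped becomes '?'
for ascii_code in range(0x20, 0x7F):             # printable ASCII only
    ebcdic_code = bytes([ascii_code]).decode("ascii").encode("cp037")[0]
    _table[ebcdic_code] = ascii_code
EBCDIC_TO_ASCII = bytes(_table)

def tr(data, table):
    """Replace each byte with the table entry at that byte's offset, like TR."""
    return bytes(table[b] for b in data)

ebcdic_line = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])   # "HELLO" in EBCDIC
print(tr(ebcdic_line, EBCDIC_TO_ASCII))               # b'HELLO'
```

Python's built-in `bytes.translate` does the same table lookup, so `ebcdic_line.translate(EBCDIC_TO_ASCII)` gives the same result.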

There is another instruction which is a head-scratcher at first, but it's the heart of search programs. TRT (translate and test) does not translate anything. (But they couldn't figure out another name.) To find the letter 'Q', for example, you create a 256-byte table that is all 0s, except that at the "Q" offset there is a non-zero value. If you're looking for "Q", "T", and "Z", then those 3 locations will be non-zero. This allows you to search for 3 different things with one single instruction. It's really extraordinary. And, of course, that's what I use.
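
A Python imitation of the TRT idea may make it less of a head-scratcher. The real instruction reports its result in registers; in this sketch the result is simply returned. The function names are made up for illustration, and ASCII bytes are used instead of EBCDIC.

```python
def build_trt_table(stop_chars):
    """256 zeros, with a distinct non-zero value at each character of interest."""
    table = bytearray(256)
    for code, ch in enumerate(stop_chars, start=1):
        table[ch] = code
    return bytes(table)

def trt(data, table, start=0):
    """Return (offset, table value) of the first byte whose entry is non-zero."""
    for offset in range(start, len(data)):
        value = table[data[offset]]
        if value:
            return offset, value
    return None                                  # nothing of interest found

table = build_trt_table(b"QTZ")                  # Q -> 1, T -> 2, Z -> 3
data = b"ABC ZEBRA QUIT"
print(trt(data, table))                          # (4, 3): the Z
print(trt(data, table, start=5))                 # (10, 1): the Q
```

Because each character of interest has its own non-zero value, the caller learns not only where the scan stopped but also which character stopped it.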

So what kinds of scan programs have I written?

Maybe some more notes are in order.

I don't expect that they'll ever be used "for real".
I'd be absolutely delighted if that were to really happen.
I've not written anything to make them usable from a work-station, because IBM already does that, and there's no way I'll be able to compete with IBM.
I've not ever dealt with the code to make them 'terminal friendly'.
I'm just going to stick with what I know, and make 'em as good as I can.

My general prejudice is to have everything in front of you at the same time. This implies that both the JCL to run the program, and the control information are together. And that implies that it's nice to have the 'instructions' included as part of the run deck, rather than have them in a separate control card file.

Another thing, at least for me, the coder, is how complicated to make it vs. how fast I want it to run. KISS (Keep It Simple, Stupid) really is a good model to follow. Sometimes I write 'em looking for the first character of the string. Other times I try to figure out which character in the string is likely to be found the least frequently in the file, and search for that. Sometimes the code to do that correctly is a major part of the program, but I think it is the right thing to do.
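
Here is a minimal Python sketch of the "least frequent character" approach. The ranking in `COMMON_FIRST` is a made-up stand-in (a commonly quoted English letter order with blank first), not the list any of the programs actually use, and the function names are hypothetical.

```python
# Assumed ranking, most common first; a real program would use a measured list.
COMMON_FIRST = b" ETAOINSHRDLCUMWFGYPBVKJXQZ"

def rarity(byte):
    """Higher number = expected to be rarer. Unlisted bytes count as rarest."""
    pos = COMMON_FIRST.find(bytes([byte]))
    return pos if pos >= 0 else len(COMMON_FIRST)

def pick_anchor(target):
    """Return (offset, byte) of the rarest character in a non-empty target."""
    offset = max(range(len(target)), key=lambda i: rarity(target[i]))
    return offset, target[offset]

def find_with_anchor(record, target):
    """Search for the anchor byte; compare the whole string only on a hit."""
    offset, byte = pick_anchor(target)
    pos = record.find(bytes([byte]), offset)
    while pos != -1:
        start = pos - offset                     # where the string would begin
        if record[start:start + len(target)] == target:
            return start
        pos = record.find(bytes([byte]), pos + 1)
    return -1

print(pick_anchor(b"THE QUIZ"))                              # (7, 90): the Z
print(find_with_anchor(b"ASK THE QUIZ MASTER", b"THE QUIZ")) # 4
```

The extra bookkeeping is the offset: the anchor is usually not the first character, so the comparison has to back up to where the string would start.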

I wrote a program to read multiple files, and count character frequency, but I'm not happy with it. Later, I found a frequency list promoted by IBM. Again, I don't think competing with them is a good idea. I've just fairly recently found that list in Wikipedia, so it's not in some of the older programs, but I'll gladly put it in, if someone is interested in using any of them.

SCAN1

Use PARM field for control,
scan for least frequent char,
AND can edit the strings.
I think this is the only one so far that can do an edit. Eventually MYSCAN will do that also.

QSTRING

Written quite a while ago. Maybe one of the first here.
Again, use PARM field for control
Scans for the first character of the string, which cannot be a blank. This is one of the first that I've written in the last few years, possibly before I found Z390.

SCANFAST

Use PARM field for control
Scan for multiple strings, only using PARM field for input

SCANTEXT

Scan and test both upper and lower case chars. (First time I'd done that.)
Uses either PARM or control cards to scan for multiple characters.
Can scan for multiple strings in a single pass of the file. Strings can be a mix of upper and lower case.
Scans for least frequent char - either upper or lower case.
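
A rough Python sketch of scanning for several strings in one pass, assuming the comparison ignores case. That assumption, the function name, and the sample record are all made up for illustration. For simplicity it keys on the first character of each string rather than the least frequent one.

```python
def scan_record(record, targets):
    """Return the set of targets found in record, ignoring case (ASCII only)."""
    # One 256-entry table: non-zero wherever a byte could start some target,
    # in either upper or lower case.
    table = bytearray(256)
    for t in targets:
        table[t[:1].upper()[0]] = 1
        table[t[:1].lower()[0]] = 1

    folded = record.upper()                      # compare in a single case
    found = set()
    for pos, byte in enumerate(record):          # one pass over the record
        if table[byte]:
            for t in targets:
                if folded.startswith(t.upper(), pos):
                    found.add(t)
    return found

record = b"Dear Mr. Smith, your LOAN is due"
print(scan_record(record, [b"smith", b"Loan", b"mortgage"]))
```

Only the positions flagged by the table get the full comparison, so adding more strings to the list does not mean re-reading the record.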

PSCAN1 and PSCAN2

I think that the setup for MYSCAN works, but I wanted a break. So I wrote 2 programs.
They scan, just like the others, but I needed something I could understand.
The results appear the same, but the internal processing is a bit different.
PSCAN1 saves the string(s) coded in the PARM field, and looks for the first character (in each)
PSCAN2 also saves the string(s), but looks for the least frequently occurring character in each string
That makes it run faster internally.
PSCAN descriptions

MYSCAN

This is what I'm working on now. It's inordinately more complicated than any of the preceding programs.
I'm having trouble because of the complication. I've actually started this several times, and not been able to finish. Wish me luck.