Chapter 6 - String Data Types

A string is a list of text characters. We tell BASIC that we are dealing with text rather than variable names by enclosing the text in double quotation marks. Applying this, you should be able to see why, if we want to write 'Hello' on the screen we use:

PRINT "Hello"

and not

PRINT Hello

The second example would send BASIC scurrying off to its variable list to find a variable called Hello. If you happened to have one, BASIC would print its value; most likely you won't, so BASIC will complain.

We can think of the way a string variable holds its value as a series of memory locations, each of which holds a character, like this:

H   e   l   l   o

The quotes are not kept as part of the text; they are just used as delimiters during programming. Each position is one byte in size, which means it can hold a number in the range 0-255. So, if a byte can only hold a number, how does it store letters like those above? The answer is that the operating system has a lookup table which it uses to translate your text into numeric codes for storing in memory, and then to translate them back again when we want to print them out. The table is called the ASCII table (American Standard Code for Information Interchange; programmers love acronyms) and it lists the codes for letters, digits, punctuation marks and other assorted characters. When we store the letters for "Hello", BASIC actually represents them internally like this:

72   101   108   108   111

You can see the full table in the online help under Reference Information. Note that not all the codes have a visible representation, and codes less than 32 may cause strange things to happen if you try to print them. These lower codes are often referred to as control characters; they represent things like horizontal tab, form feed etc. Also, the codes above 127 are a non-standard standard (!) and so will give different characters depending on the font that is being used.

Look at character 32, space. Our brains normally filter out spaces when we read text: they are there, but we ignore them. To a computer, a space still needs a representation and so is given a code, just like any of the punctuation marks such as comma (code 44) or decimal point / full stop (code 46). As BASIC is so picky about spaces, this means that the two strings:

"Hello" and " Hello"

would be considered different. As stated, we tend to filter the space out, but to the computer it's just another character code.
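A quick way to convince yourself is to let BASIC compare the two strings. The following is only a small sketch: it assumes the = comparison and the single-line IF ... THEN ... ELSE statement, neither of which has been covered in this chapter, and the names A$ and B$ are made up for the illustration.

REM Compare strings with and without a leading space

A$ = "Hello"

B$ = " Hello"

IF A$ = B$ THEN PRINT "Same" ELSE PRINT "Different"

END

This should print Different, because the second string begins with character code 32 and the first does not.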

As numbers have limits, so too do strings. The limits are the code of the character (0 to 255) and the number of characters the string variable can hold, or the length of the string. In BBC BASIC the length can range between 0 and 65535 characters. Zero because you can have a string with nothing in it. In fact when you first declare a string variable, BASIC creates it with zero length, i.e. with nothing in it. This may seem a little odd but is a useful concept. If at any time you wish to set a string to hold nothing, this is how you do it:

MyString$ = ""

That's double quotes with no gap between them. As a space is a character, this is not the same as:

MyString$ = " "

If you were to print them out, you would not see any difference, but that, of course, doesn't mean they are the same. A string with nothing in it is variously called an empty string or null string. You'll come across both.

Now we've got to grips with what a string is, what can we do with one? With numeric variables, you can add, subtract, take square roots etc. Strings are a little more limited. You can only add them:

REM Adding strings

S1$ = "Hello"

S2$ = ", "

S3$ = "world"

S4$ = S1$ + S2$

PRINT S4$

S4$ = S4$ + S3$

PRINT S4$

END

This is called concatenation, which is a fancy word meaning chain them together. The program copies the contents of S1$, splices S2$ onto the end of it and puts the result into S4$. Line 7 adds S3$ onto the end of S4$. There would be nothing to stop you doing all this in one line.

REM Adding strings

S1$ = "Hello"

S2$ = ", "

S3$ = "world"

S4$ = S1$ + S2$ + S3$

PRINT S4$

END

This is the only mathematical operation that is allowed on strings. None of the others make much sense anyway: how do you find the square root of "Hello"? Don't for one moment think that's it, though: BASIC has a very comprehensive set of functions for manipulating string variables. These are dealt with in the following section.

String Functions

LEN

One of the most useful things we can know about a string is its length. The function LEN tells us exactly this. It always takes exactly one argument, though, as usual, this can be an expression. The result must be assigned to a numeric variable or used in an expression where a numeric value is expected. In immediate mode, try the following:

PRINT LEN("Hello")

PRINT LEN("Hello, world")

LEN can be used to distinguish between empty strings and strings with no visible characters:

REM LEN of an empty string

A$=""

B$=" "

PRINT LEN(A$)

PRINT LEN(B$)

END

STRING$

There are times when you want to be able to generate a repeating pattern of text without typing it all in manually. STRING$ does just this. It takes as its parameters a number of repetitions and a base string. It returns a string which is the base string repeated the given number of times:

PRINT STRING$(3,"+++===")

Here is a little program that will take a string then underline it.

REM Underline using LEN and STRING$

Title$ = "BBC BASIC"

L%=LEN(Title$)

PRINT Title$

PRINT STRING$(L%,"*")

END

Or, just to get carried away, we could put the title in a box:

REM Box using LEN and STRING$

Title$ = "BBC BASIC"

L%=LEN(Title$)

PRINT STRING$(L%+4,"*")

PRINT "* ";Title$;" *"

PRINT STRING$(L%+4,"*")

END

INSTR

Although it's easy to create strings, there are times when we want to inspect their contents. The function INSTR allows us to search a string for a character or pattern of characters. INSTR takes two or three arguments. The first is the string we wish to search. The second is a string containing the characters we wish to search for. The third is optional; we'll get to it in a minute. When supplied with two parameters, INSTR returns the position in the first string at which the first occurrence of the second string begins. This example will return the position of the first letter C in the target string:

PRINT INSTR("BBC BASIC", "C")

The first character in a string is position 1. If INSTR returns 0, it means no match was found.

PRINT INSTR("BBC BASIC", "Z")

The optional third parameter can force INSTR to start at a position other than 1. This means we can search the entire string by remembering the last position returned and starting one character after that.

REM INSTR Demo

Posn%=INSTR("BBC BASIC","C")

PRINT "C found in position: ";Posn%

Posn%=Posn%+1

Posn%=INSTR("BBC BASIC","C",Posn%)

PRINT "C found in position: ";Posn%

END

Notice how we have to increment Posn% to get it past the first C. If we hadn't, we would have started from position 3 again. As position 3 is a C, the search would have returned the same value again. If the start position is larger than the length of the string, you get 0 (not found) in return.
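If you don't know in advance how many matches there are, the same idea can be wrapped in a loop. This is a sketch only: it assumes your version of BBC BASIC has the WHILE ... ENDWHILE loop (BBC BASIC for Windows does), which hasn't been introduced in this chapter, and Target$ is simply a name chosen for the example.

REM Find every C in a string

Target$ = "BBC BASIC"

Posn% = INSTR(Target$, "C")

WHILE Posn% > 0

  PRINT "C found in position: ";Posn%

  REM Restart the search one character past the last match

  Posn% = INSTR(Target$, "C", Posn%+1)

ENDWHILE

END

The loop should report positions 3 and 9. After the second match the search restarts at position 10, which is beyond the end of the string, so INSTR returns 0 and the loop stops.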

INSTR can also search for a sequence of characters in the target string.

PRINT INSTR("BBC BASIC","BBC")

The thing to be wary of here is how you specify the string to search for:

PRINT INSTR("BBC BASIC FOR WINDOWS","FOR")

PRINT INSTR("FORTUNE FAVOURS THE BOLD","FOR")

Both of these report that the string contains "FOR", when clearly the second one doesn't contain the word. This again is because BASIC has no concept of language: it just looks for a pattern of characters and stops when it finds a match. A more correct way would be to search for the word followed by a space:

PRINT INSTR("BBC BASIC FOR WINDOWS","FOR ")

PRINT INSTR("FORTUNE FAVOURS THE BOLD","FOR ")
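Searching for the word with a space after it still misses a word that sits at the very end of the string. One possible workaround, offered here only as a sketch, is to add a space to both ends of the text and to both ends of the word before searching. Text$ is a name invented for this example.

REM Whole-word search by padding with spaces

Text$ = "WHAT IS IT FOR"

REM No space follows FOR here, so this search fails

PRINT INSTR(Text$, "FOR ")

REM With padding, the word is found even at the end

PRINT INSTR(" " + Text$ + " ", " FOR ")

END

The first PRINT should give 0 and the second 12. Because the padded text has one extra character at the front and the match begins on the space before the word, the number returned is the position of the word in the original string.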

LEFT$ and RIGHT$

The next two functions return a subsection of a string and are dealt with together as they are functionally similar.

LEFT$ takes two parameters: a target string and a number of characters. It returns a string containing that number of characters, taken from the target string starting at position 1.

PRINT LEFT$("Hello, world", 5)

If the number is greater than the total length of the string, you just get the whole string.

PRINT LEFT$("Hello, world", 100)

LEFT$ will also accept one parameter only:

PRINT LEFT$("Hello")

This will return all the characters but the last one and is the same as:

PRINT LEFT$("Hello", LEN("Hello")-1)

It is also possible to use LEFT$ as an assignment. In this mode, LEFT$ will overwrite the characters in the string with the ones being assigned, starting at the first character.

REM LEFT$ as an assignment

MyStr$="Hello, world"

LEFT$(MyStr$,6)="Byebye"

PRINT MyStr$

END

If you specify a number less than the length of the replacement, BASIC will only overwrite the number of characters specified. Should you specify more, BASIC will only overwrite up to the maximum characters in the replacement string.
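The two cases can be tried out side by side. This short sketch simply applies the rules just described to the same starting string; the numbers 3 and 10 are arbitrary choices for the illustration.

REM LEFT$ assignment with mismatched lengths

MyStr$ = "Hello, world"

REM Number smaller than the replacement: only 3 characters are overwritten

LEFT$(MyStr$,3) = "Byebye"

PRINT MyStr$

MyStr$ = "Hello, world"

REM Number larger than the replacement: only the 3 characters of "Bye" are used

LEFT$(MyStr$,10) = "Bye"

PRINT MyStr$

END

Going by the rules above, both PRINT statements should show Byelo, world, and in neither case does the length of MyStr$ change.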

RIGHT$ takes the same arguments as LEFT$ but returns the rightmost number of characters.

PRINT RIGHT$("Hello, world", 5)

Again, if the number is too big, you just get the whole string back. With only one argument, RIGHT$ will return just the last character.

Predictably, when used in an assignment, RIGHT$ will overwrite the characters at the end of the string.

REM RIGHT$ as an assignment

MyStr$="Hello, world"

RIGHT$(MyStr$,5)="mummy"

PRINT MyStr$

END

Exactly what happens if you specify fewer characters than the length of the replacement string is probably best illustrated by example. Change the 5 to 4 in line 3 above and see what happens. It starts 4 characters from the end of the string and copies the first 4 characters from the replacement string. If you tell BASIC to use more characters than are contained in the string, our friendly computer will effectively derive its own number. Substitute 8 in line 3 and see. The replacement doesn't start 8 characters away from the end of the string, it merely works out that the replacement has 5 characters, and starts at that position instead.

Please note that with both LEFT$ and RIGHT$, you cannot lengthen the original string by giving more characters in the replacement than are in the target. BASIC will just truncate the substitute string at the length of the target.

MID$

LEFT$ and RIGHT$ allow us to manipulate the start and end of a string, but what happens if you want to extract from the middle? MID$ will do this for us.

In its more common application, MID$ has three parameters: a string, the start position and a number of characters. As with all strings, the left most character is position 1. Try this:

PRINT MID$("Fortune favours the bold", 9, 7)

This returns 7 characters starting at position 9 i.e. "favours" in this case.

If the last number is bigger than the length of the string, you just get everything up to the end.

PRINT MID$("Fortune favours the bold", 9, 1000)

This case is so common that BASIC allows us to omit the final parameter. If you do this, BASIC assumes that you want all the characters from the start position to the end.

PRINT MID$("Fortune favours the bold", 9)

OK, that was painless enough, but we're not finished. Like RIGHT$ and LEFT$, MID$ can also be used on the other side of the equals sign. This means that you can get BASIC to replace a section of a string:

REM MID$ demo

A$ = "Give me patience!!"

MID$(A$,9,8) = "strength"

PRINT A$

END

From the above description, you should be able to guess what it's doing. For completeness: line 3 takes the string "strength" which is 8 characters long and, starting at position 9 in A$, replaces the characters one for one with the characters in "strength".

There are several things to be aware of when dealing with the number of characters. Usually, the number is the same as the length of the replacement string. If the number of characters specified is shorter than the length of the replacement, only that number of characters are copied:

MID$(A$,9,4) = "strength"

Also, if the start position in the target string plus the number of characters is greater than the total length of the target string, BASIC will only copy characters up to the end of the target string and ignore anything after:

MID$(A$,9,13) = "all your cash"

To put it another way, BASIC will not extend the length of the target string.

You can leave out the number of characters. In this case BASIC assumes the length of the replacement string, but still obeys the rules given above.
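To see these rules in action, the fragments above can be collected into one program. This is just a sketch that resets A$ before each assignment so that every case starts from the same string.

REM MID$ assignment with different lengths

A$ = "Give me patience!!"

REM Only the first 4 characters of the replacement are copied

MID$(A$,9,4) = "strength"

PRINT A$

A$ = "Give me patience!!"

REM The replacement runs past the end of A$, so it is cut short

MID$(A$,9,13) = "all your cash"

PRINT A$

A$ = "Give me patience!!"

REM No count given: the length of the replacement is used

MID$(A$,9) = "strength"

PRINT A$

END

Going by the rules above, the three lines printed should be Give me streence!!, Give me all your c and Give me strength!!. In every case A$ stays 18 characters long.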

Now for a little demo that uses INSTR, LEFT$ and MID$. Suppose we have someone's full name and we want to separate it into first name and surname. We know that the two names are separated by a space, so first we use INSTR to locate the space. Then we copy all the characters up to, but not including, the space into the string that keeps the first name. Next we take all the characters from just after the space up to the end and save them in the surname. If you want to, have a crack at this yourself before looking at my result; it's the only way to learn.

REM Separate names

FullName$ = "Joe Soap"

Posn% = INSTR(FullName$, " ")

FirstName$ = LEFT$(FullName$, Posn%-1)

Surname$ = MID$(FullName$, Posn%+1)

PRINT "Your first name is: ";FirstName$

PRINT "Your surname is: ";Surname$

END

How did you do? There are always as many ways to code the solution to a program as there are people trying to code it, so if you got a different solution that's fine. Also don't be upset if you didn't get it completely right first go, I didn't: it's all part of the programming process.
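One thing the program above takes for granted is that the name really does contain a space. If it doesn't, INSTR returns 0 and the LEFT$ and MID$ calls are handed positions that make little sense. A possible safeguard, sketched here on the assumption that you are happy to use a single-line IF ... THEN (not covered in this chapter), is to test Posn% before going any further. The name "Cher" is just sample data.

REM Separate names, checking for a space first

FullName$ = "Cher"

Posn% = INSTR(FullName$, " ")

REM Stop early if there is nothing to split on

IF Posn% = 0 THEN PRINT "No space found in: ";FullName$ : END

FirstName$ = LEFT$(FullName$, Posn%-1)

Surname$ = MID$(FullName$, Posn%+1)

PRINT "Your first name is: ";FirstName$

PRINT "Your surname is: ";Surname$

END

With the sample name, only the "No space found" message should appear. Change FullName$ back to "Joe Soap" and the program behaves as before.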

ASC and CHR$

We have already made the acquaintance of the ASCII table. It is very useful to be able to find the codes that correspond to the letters and vice versa. That's the job of ASC and CHR$.

ASC returns an integer which is the ASCII code for the character passed as a parameter:

PRINT ASC("A")

Gives 65, as expected.
Note also:

PRINT ASC("1")

Gives 49, which is the code for the character "1", NOT the value 1.

If the string is bigger than one character, ASC just returns the code for the first character. To inspect other positions, we need to use MID$:

PRINT ASC(MID$("BBC BASIC",2,1))

which gives the code for the second character, "B".

As you may expect, CHR$ does the reverse of ASC: give it a number and it will return a single-character string containing the character with that ASCII code.

PRINT CHR$(65)

CHR$ is particularly useful for making strings out of the characters you can't get on the standard keyboard:

PRINT "The temperature is 21.2"+CHR$(176)+"C"

This can be a useful technique for printing cursor control characters or user defined characters, which are described in a later section.

If you give CHR$ a number which is bigger than 255, BASIC divides it by 256 and gives the character corresponding to the remainder.
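You can check this wrap-around in immediate mode. The sketch below assumes the MOD operator, which gives the remainder after integer division; the value 65+256 is chosen only because its remainder is the familiar code for "A".

PRINT CHR$(65)

PRINT CHR$(65+256)

PRINT 321 MOD 256

If the rule holds, both CHR$ lines print A, and the last line prints 65, the remainder that CHR$ actually used.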

Tip: Printing quotation marks

If you want to print a double quote in a string, you can do it in two ways. The first involves building a string using CHR$(34), which is the code for double quote.

Greeting$ = CHR$(34) + "Hello, world" + CHR$(34)

PRINT Greeting$

The other way is a little trick that BBC BASIC allows us. You can actually put the quote in the string, but you use two double quotes together so BASIC knows that we want to print the quote character and not end the string.

Greeting$ = """Hello, world"""

PRINT Greeting$

As the quotes in this string are at the beginning and end, there are three lots, which definitely looks odd. Take the beginning, the first indicates the start of the string and the next two tell BASIC to store a quote. The end is the same but in reverse.
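The doubling looks less strange when the quotes fall in the middle of a string. As a small illustration (the sentence itself is made up), these two lines use the two techniques to produce the same text:

PRINT "She said ""Hello"" to me"

PRINT "She said " + CHR$(34) + "Hello" + CHR$(34) + " to me"

Both should print She said "Hello" to me.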

VAL and STR$

The next two functions allow us to convert between numeric and string data types.

VAL takes as its argument a string representation of a number and returns the numeric equivalent of that number.

PRINT VAL("123")

If the string contains non-numeric information, it will convert until it fails:

PRINT VAL("123xyz")

or if the non-numeric stuff comes first, you just get 0 back.

PRINT VAL("xyz123")

The counterpart of VAL is STR$, which you probably guessed. You might also have guessed that this takes a number or numeric variable and converts it into a string representation. Now we can add a number to a string:

REM STR$ demo

A$ = "The temperature outside is " + STR$(21.6)

PRINT A$

END

There are default settings which control the format of the string produced. This is well documented in the online help and is changeable at runtime if you require, but is a little beyond the scope of this tutorial.
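It is worth seeing VAL and STR$ next to plain concatenation, because a string of digits and a number behave quite differently under +. This is only a sketch, and A$ and B$ are names chosen for the example.

REM Numbers versus strings of digits

A$ = "12"

B$ = "30"

REM Adding strings chains them together

PRINT A$ + B$

REM Adding their values does arithmetic

PRINT VAL(A$) + VAL(B$)

REM STR$ turns the sum back into text so it can be joined to a string

PRINT STR$(VAL(A$) + VAL(B$)) + " is the answer"

END

The first PRINT should give 1230, the second 42, and the third 42 is the answer.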

EVAL

The last string command that must be mentioned is EVAL. I'll give a flavour of what it can do rather than a full description because it is such a powerful command. In essence, it allows you to evaluate the contents of a string expression. Take the description of VAL, which converts a string to a number. At some point programmers try, inadvertently or otherwise, something like this:

PRINT VAL("22/7")

VAL returns 22 as described above. Now try:

PRINT EVAL("22/7")

Not impressed? Try:

PRINT EVAL("SIN(PI/2)")

Take it from me, that's not something you get with any old BASIC. You can pass any string expression and EVAL will evaluate it and return a numeric or string value, just as if you had entered the code into a line of a program. As demonstrated above, you can use internal BBC BASIC functions (though commands like CLS etc. will not work). You can even use variables within the program:

REM EVAL demo

Side1 = 3

Side2 = 4

Hyp = EVAL("SQR(Side1^2+Side2^2)")

PRINT "Hypotenuse is: ";Hyp

END

The possibilities that this presents spiral off into infinity, so that's all I'm going to say about it here.

Exercises

1) Set a string to hold the days of the week like this:

"Sun Mon TuesWed ThurFri Sat"

All names are 4 characters in length including a space if necessary. Given a number for a day, use MID$ to extract the correct abbreviation for the day.

2) Set a string to hold your first name. Use MID$ and ASC to find the ASCII codes of the letters in the name.

3) Set three strings to hold your first name, second name (if you haven't got one, make it up) and surname. Use LEFT$ to find your initials and concatenation to create a new string in the format "R. T. Russell".
