Module 2. Basic components of ‘C’ programming language

Lesson 4

CHARACTERISTICS OF ‘C’ LANGUAGE

4.1  Introduction

The main characteristics of ‘C’ include modularity, portability, extendibility, speed and flexibility. Modularity is the ability to split a larger programme into small, manageable segments called modules. This is an important feature of structured programming languages: it helps to complete software development projects on time and makes debugging easier and quicker. Code written in ‘C’ is highly portable, i.e., software developed in ‘C’ can be installed on different platforms with minimal (or no) alterations to the source code. ‘C’ also allows existing software to be extended by adding new features to it. ‘C’ is often described as a middle-level language because programmes written in ‘C’ run at speeds close to those of the same programmes written in assembly language, a low-level language (LLL), while retaining the conveniences of a high-level language (HLL). That is, ‘C’ combines the merits of both HLL and LLL. Thus, ‘C’ is widely used for developing system software such as operating systems, e.g., UNIX was written in ‘C’. ‘C’ has a small set of reserved words, also called keywords (ANSI ‘C’ has 32 keywords), which gives programmers flexibility and fine control over their programmes. Hence, ‘C’ is known as a programmer’s language, as it leaves room for creativity in programme design.

4.2  Character Set

The character set in ‘C’ can be grouped into four major categories, viz., letters, digits, special characters and white spaces. The letters are the uppercase letters A-Z and the lowercase letters a-z; the digits are 0-9. The special characters supported by ‘C’ are given in Table 4.1.

Table 4.1 ‘C’ Special Character Set

Serial Number    Character    Description
1.               ,            Comma
2.               .            Period
3.               ;            Semicolon
4.               :            Colon
5.               ?            Question mark
6.               '            Apostrophe
7.               "            Quotation mark
8.               !            Exclamation mark
9.               |            Vertical bar
10.              /            Slash
11.              \            Backslash
12.              ~            Tilde
13.              _            Underscore
14.              =            Equal sign
15.              %            Percentage sign
16.              &            Ampersand
17.              ^            Caret
18.              *            Asterisk
19.              -            Minus sign
20.              +            Plus sign
21.              <            Opening angle (‘less than’ sign)
22.              >            Closing angle (‘greater than’ sign)
23.              (            Left parenthesis
24.              )            Right parenthesis
25.              [            Left bracket
26.              ]            Right bracket
27.              {            Left brace
28.              }            Right brace
29.              #            Number sign

 

Most versions of ‘C’ allow certain other characters such as $ and @ to be included within strings and comments. ‘C’ also uses certain combinations of these characters, like \b, \n and \t, to represent backspace, newline and horizontal tab, respectively. Such character combinations are known as escape sequences (to be discussed later). For completeness, note that each escape sequence represents a single character even though it is written as a combination of two or more characters.

White space characters include blank space, horizontal tab, vertical tab, carriage return, newline and form feed. They are so called because they serve the same purpose as the spaces between words and lines on a printed page, i.e., they make the programme easier for humans to read. White space is ignored by the compiler unless it is part of a string constant. White space may be used to separate words, but it must not appear between the characters of a keyword or an identifier.
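As a brief illustration of the white-space rule, the sketch below writes the same statements first compactly and then spaciously; the compiler treats both alike, whereas blank spaces inside a string constant are preserved. The function names are hypothetical and chosen only for this example.

```c
#include <string.h>

/* Written with the minimum of white space. A space is still needed
   to separate words such as int and a, or return and a. */
int sum_compact(void){int a=1,b=2;return a+b;}

/* The same function with generous white space between tokens. */
int sum_spaced(void)
{
    int a = 1,
        b = 2;

    return a + b;
}

/* White space inside a string constant is significant:
   "a b" holds three characters, "ab" holds two. */
size_t spaced_length(void)  { return strlen("a b"); }
size_t compact_length(void) { return strlen("ab"); }
```

Both sum functions return the same value; only the two string lengths differ.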

4.3  Tokens

The tokens of a language are the basic building blocks, which can be assembled in a systematic order to construct a programme. Thus, in a ‘C’ source programme, the basic element recognised by the compiler is the “token”. A token is text in the source programme that the compiler does not break down further into component elements, i.e., tokens are the smallest individual words, punctuation marks, etc., recognised by the ‘C’ compiler. ‘C’ has six types of token, viz., keywords, identifiers, constants, string literals, operators and punctuators.
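To make the idea of tokens concrete, here is a minimal sketch in which the comments name the token type of each piece of two short statements. The function token_demo and the variable total are hypothetical names used only for illustration.

```c
/* Each statement below is annotated with the tokens it contains. */
int token_demo(void)
{
    int total;          /* int: keyword; total: identifier; ';': punctuator */

    total = 786 + 1;    /* = and +: operators; 786 and 1: constants */
    return total;       /* return: keyword */
}
```

The compiler sees the second statement as the six tokens total, =, 786, +, 1 and ; regardless of the spacing between them.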

Instructions in ‘C’ are formed using syntax and keywords. It is necessary to follow the ‘C’ syntax rules strictly, i.e., all programmes must conform to the grammatical rules pre-defined in the language. Any instruction that does not match the prescribed syntax causes an error when the programme is compiled. Each keyword (or reserved word) in ‘C’ has its own pre-defined meaning; hence, keywords must not be used as names of variables or constants. A list of ‘C’ keywords is given in Table 4.2 below.

Table 4.2 ‘C’ Keywords

Serial Number    Keyword     Description
1.               auto        The default storage class for local variables.
2.               break       Command that exits for, while, switch and do...while statements unconditionally.
3.               case        Command used within the switch statement to label a branch.
4.               char        The simplest ‘C’ data type; it holds a single character.
5.               const       Data modifier that prevents a variable from being changed. Also, see volatile.
6.               continue    Command that skips the rest of the current pass through a for, while or do...while loop and moves on to the next iteration.
7.               default     Command used within the switch statement to catch any instances not specified with a case statement.
8.               do          Looping command used in conjunction with the while statement. The loop will always execute at least once.
9.               double      Data type that can hold double-precision floating-point values.
10.              else        Statement signalling alternative statements to be executed when the condition of the if statement evaluates as FALSE.
11.              enum        Data type that allows variables to be declared that accept only certain values.
12.              extern      Data modifier indicating that a variable is defined in another part of the programme.
13.              float       Data type used for single-precision floating-point numbers.
14.              for         Looping command that contains initialisation, condition and increment sections.
15.              goto        Command that causes a jump to a pre-defined label.
16.              if          Command used to change programme flow based on a TRUE/FALSE decision.
17.              int         Data type used to hold integer values.
18.              long        Data type used to hold integer values at least as large as those of int.
19.              register    Storage modifier that specifies that a variable should be stored in a register if possible.
20.              return      Command that causes programme flow to exit from the current function and return to the calling function. It can also be used to return a single value.
21.              short       Data type used to hold integers. It occupies at least 16 bits and is never larger than an int.
22.              signed      Modifier used to signify that a variable can have both positive and negative values. Also, see unsigned.
23.              sizeof      Operator that returns the size of the item in bytes.
24.              static      Modifier used to signify that the compiler should retain the variable’s value between function calls.
25.              struct      Keyword used to group ‘C’ variables of any data type together.
26.              switch      Statement used to change programme flow in various directions. It is used in combination with the case statement.
27.              typedef     Modifier used to create new names for existing data types.
28.              union       Keyword used to allow multiple variables to share the same memory space.
29.              unsigned    Modifier used to signify that a variable will contain only non-negative values. Also, see signed.
30.              void        Keyword used to signify either that a function doesn't return anything or that a pointer is generic, i.e., able to point to any data type.
31.              volatile    Modifier that signifies that a variable’s value can change in ways the compiler cannot foresee (e.g., through hardware), so that accesses to it are not optimised away. Also, see const.
32.              while       Looping statement that executes a section of code as long as a condition remains TRUE.

 

Also, some compilers may include some or all of the following keywords:

 

ada      far        near
asm      fortran    pascal
entry    huge

 

 

The keywords that the C++ programming language supports in addition to those noted above are listed below. These C++ reserved words are outside the scope of this e-course; however, you may in due course wish to port your ‘C’ programmes to C++, so you should avoid using these words as identifiers in your ‘C’ programmes as well.

 

catch      inline       template
class      new          this
delete     operator     throw
except     private      try
finally    protected    virtual
friend     public

4.4  Identifiers

The term identifier refers to the name of a programme element, e.g., a variable, a function or an array. Both uppercase and lowercase letters are permitted, although common practice is to use lowercase letters. The underscore character is also permitted in identifiers. It is usually used as a link between two words in an identifier with a lengthy name, e.g., fat_content. The syntactic rules for naming an identifier are:

·         The first character must be a letter or an underscore

·         It must consist of only letters, digits or underscores

·         The first 31 characters are significant, i.e., if the programmer writes an identifier of more than 31 characters, the compiler may consider only the first 31 characters and ignore the rest (note that some implementations of ‘C’ recognise only the first eight characters)

·         A keyword cannot be used as an identifier name

·         It cannot contain any spaces.

Examples 

The following names are valid identifiers:

 

X    y12    trial_1    _temperature    specie    sensory_score    win_tmp    TABLE

The following names are invalid identifiers for the reasons stated against each:

4th                :    First character must be a letter
"x"                :    Illegal character (")
order-number       :    Illegal character (-)
viscosity score    :    Illegal character (blank space).

 

Note: All the keywords are lowercase. Since the ‘C’ compiler treats uppercase and lowercase characters as different, it is possible to use an uppercase keyword as an identifier. Generally, this is not done and is considered poor programming practice. For instance, the keyword ‘double’ and the identifier ‘DOUBLE’ are different.
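The identifier rules above can be sketched as a small checking function. This is only an illustration under stated assumptions: is_valid_identifier is a hypothetical helper (not a library function), it tests names against the 32 keywords of Table 4.2, and it does not model the 31-character significance limit.

```c
#include <ctype.h>
#include <string.h>

/* The 32 ANSI 'C' keywords listed in Table 4.2. */
static const char *const keywords[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Returns 1 if name obeys the identifier rules, 0 otherwise. */
int is_valid_identifier(const char *name)
{
    size_t i;

    /* Rule: the first character must be a letter or an underscore. */
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return 0;

    /* Rule: only letters, digits and underscores (hence no spaces). */
    for (i = 1; name[i] != '\0'; i++)
        if (!(isalnum((unsigned char)name[i]) || name[i] == '_'))
            return 0;

    /* Rule: a keyword cannot be used; the comparison is case-sensitive,
       so DOUBLE is accepted while double is rejected. */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(name, keywords[i]) == 0)
            return 0;

    return 1;
}
```

For example, is_valid_identifier("trial_1") returns 1, whereas is_valid_identifier("4th") and is_valid_identifier("viscosity score") both return 0.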

4.5  Constants

Constants in ‘C’ refer to the fixed values that do not change during the execution of the programme. There are four basic types of constant in ‘C’, viz., integer constants; floating-point constants; character constants and string constants. Besides, there are enumeration constants (to be discussed later). The integer and floating-point constants are collectively known as numeric constants. The rules governing all numeric constants are as follows:

·         Commas and blank spaces cannot be included within the constant

·         The constant can be preceded by a minus (-) sign, if needed

·         The value of a constant cannot exceed specified minimum and maximum bounds; for each type of constant, these bounds vary from compiler to compiler.
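Since the bounds vary from compiler to compiler, one way to find them is to ask the compiler itself through the standard header limits.h. The sketch below is illustrative only; print_integer_bounds and fits_in_int are hypothetical helper names.

```c
#include <limits.h>
#include <stdio.h>

/* Prints the integer bounds defined by the compiler in use. */
void print_integer_bounds(void)
{
    printf("int : %d to %d\n", INT_MIN, INT_MAX);
    printf("long: %ld to %ld\n", LONG_MIN, LONG_MAX);
}

/* Returns 1 if value fits in an int on this compiler, 0 otherwise. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

ANSI ‘C’ guarantees only that INT_MAX is at least 32767; the actual figure printed depends on the compiler and machine.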

Integer constants

An integer constant is an integer-valued number. Thus, it consists of a sequence of digits. Integer constants can be written in three different number systems, viz., decimal (base 10), octal (base 8) and hexadecimal (base 16). Novice programmers generally use decimal integer constants. A decimal integer constant may consist of any combination of digits taken from the set 0 through 9. If the constant contains two or more digits, the first digit must be other than 0.

Examples

Several valid constants are given below:

0    1    786    9797    32767    9999

A decimal integer constant is incorrect for any of the following reasons:

·         Illegal character (,)

·         Illegal character (.)

·         Illegal character (blank spaces)

·         Illegal character (-)

·         The first digit cannot be a zero.
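The three number systems mentioned above can be compared in a short sketch. The function names are hypothetical, and the invalid examples in the comment are illustrative values rather than examples taken from this lesson.

```c
/* Decimal, octal and hexadecimal spellings of the same value. */
int decimal_value(void) { return 786; }    /* decimal: digits 0-9, no leading 0 */
int octal_value(void)   { return 01422; }  /* octal: a leading 0, digits 0-7 */
int hex_value(void)     { return 0x312; }  /* hexadecimal: a leading 0x or 0X */

/* None of the following is a single valid integer constant:
       12,245     contains a comma
       10 20 30   contains blank spaces
       0900       the leading 0 marks it as octal, and 9 is not an octal digit */
```

The last case shows why a decimal constant may not begin with 0: the compiler reads a leading zero as the start of an octal constant.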

 

Floating-point constants

A floating-point (or real) constant is a base-10 number that contains either a decimal point or an exponent (or both), i.e., floating-point constants take two forms, viz., the fractional form and the exponential form. A floating-point constant in fractional form must have:

·         At least one digit

·         A decimal point

·         An optional positive or negative sign (default sign is positive)

·         No commas or spaces embedded in it.

For example, -26.9876, +867.9 and 654.0 are valid floating-point constants in fractional form. A floating-point value stored in a variable of type float typically uses four bytes of memory.

In exponential form, the floating-point constant is represented in two parts, e.g., the floating-point constant 0.00032 is written in exponential form as +3.2e-4. The part preceding the ‘e’ is known as the ‘mantissa’ and the part following the ‘e’ is called the ‘exponent’. A floating-point constant in exponential form must follow these rules:

·         The mantissa part and the exponential part should be separated by the letter ‘e’

·         The mantissa may have a positive or negative sign (default sign is positive)

·         The exponent must have at least one digit

·         The exponent may be a positive or negative integer (default sign is positive)

Floating-point constants encompass a much greater range than integer constants. Typically, the magnitude of a floating-point constant might range from a minimum value of approximately 3.4e-38 to a maximum of 3.4e+38. Some versions of the language allow floating-point constants that cover a wider range, such as 1.7e-308 to 1.7e+308. Note that the value 0.0 (although less than both 3.4e-38 and 1.7e-308) is a valid floating-point constant. You should find out the applicable values for the version of ‘C’ used on your computer system.
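One way to find these values for a particular compiler is the standard header float.h. The following is a minimal sketch; print_float_ranges and in_float_range are hypothetical helper names introduced for this example.

```c
#include <float.h>
#include <stdio.h>

/* Prints the floating-point ranges of the compiler in use. */
void print_float_ranges(void)
{
    printf("float : %e to %e\n", FLT_MIN, FLT_MAX);
    printf("double: %e to %e\n", DBL_MIN, DBL_MAX);
}

/* Returns 1 if a non-negative magnitude is zero or lies between the
   smallest normalised float and the largest float, 0 otherwise. */
int in_float_range(double magnitude)
{
    return magnitude == 0.0 || (magnitude >= FLT_MIN && magnitude <= FLT_MAX);
}
```

The exact figures printed differ between systems; ANSI ‘C’ guarantees only that FLT_MAX and DBL_MAX are at least 1e+37.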

Examples of some valid floating-point constants:

 

A floating-point constant is invalid for any of the following reasons:

·         Either a decimal point or an exponent must be present

·         Illegal character (,)

·         The exponent must be an integer quantity (it cannot contain a decimal point)

·         Illegal character (blank space) in the exponent.

 

Floating-point constants are normally represented as ‘double-precision’ quantities. Thus, each floating-point constant typically occupies eight bytes of memory. ANSI ‘C’ allows a ‘single-precision’ floating-point constant to be specified by appending the letter ‘F’ (in either uppercase or lowercase) to the constant, e.g., 3e5F. Similarly, a ‘long’ floating-point constant is specified by appending the letter ‘L’ (uppercase or lowercase), e.g., 0.987654321e-22L. Note that the precision of floating-point constants, i.e., the number of significant figures, differs between versions of ‘C’. All versions permit at least six significant figures and some permit as many as eighteen. You should determine the appropriate number of significant figures for your particular version of ‘C’.
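The effect of the suffixes, and the number of significant figures available, can be checked with a short sketch such as the one below. The function names are hypothetical; FLT_DIG and DBL_DIG are the standard float.h macros giving the guaranteed number of significant decimal digits.

```c
#include <float.h>

float       single_value(void)   { return 3e5F; }  /* F suffix: type float */
double      double_value(void)   { return 3e5; }   /* no suffix: type double */
long double extended_value(void) { return 3e5L; }  /* L suffix: type long double */

/* Significant decimal digits guaranteed for each type on this compiler. */
int float_digits(void)  { return FLT_DIG; }
int double_digits(void) { return DBL_DIG; }
```

On any ANSI ‘C’ compiler float_digits() is at least 6 and double_digits() is at least 10; applying sizeof to each constant shows how many bytes it occupies on your system.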

Character constants

A character constant is a single letter, a single digit or a single special character enclosed within single quotation marks. A character constant can contain only one character. When stored in a variable of type char, it occupies one byte of memory.

Examples

'A'    'x'    '3'    ' '

 

Note that the last constant consists of a blank space enclosed in quotation marks.

Character constants have integer values that are determined by the machine’s specific character set. Thus, the value of a character constant may vary from computer to computer. However, the constants themselves are independent of the character set. The majority of computer systems and virtually all personal computers use the American Standard Code for Information Interchange (ASCII) character set (visit http://blob.perl.org/books/beginning-perl/3145_AppF.pdf for further details), in which each individual character is numerically encoded with a unique 7-bit combination (hence, a total of 2^7 = 128 different characters).

Example: several character constants and their corresponding values as per the ASCII character set are shown below:

Constant    Value    Constant    Value    Constant    Value
'A'         65       'x'         120      '3'         51
'?'         63       ' '         32

 

 

 

Note that these values will be the same for all computers using the ASCII character set. However, the values will be different for computers using an alternative character set; for example, IBM mainframe computers use the EBCDIC character set, which is based on its own unique 8-bit combinations.
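The integer value behind a character constant can be inspected directly, as in the sketch below. The helper names char_code and digit_value are hypothetical, and the specific codes mentioned in the comments assume an ASCII system.

```c
/* Returns the integer code of a character in the machine's character set.
   On an ASCII system char_code('A') is 65 and char_code(' ') is 32. */
int char_code(char c)
{
    return (int)c;
}

/* Converts a digit character such as '3' to the number 3.
   This works for any character set, because 'C' guarantees that
   the codes of '0' to '9' are consecutive. */
int digit_value(char digit)
{
    return digit - '0';
}
```

Writing digit - '0' rather than digit - 48 keeps the programme independent of the character set, in line with the remark that the constants themselves do not depend on it.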

Escape sequences

Certain non-printing characters, as well as the backslash (\), the apostrophe (') and the quotation mark ("), can be expressed in terms of escape sequences. An escape sequence always begins with a backslash (\) followed by one or more characters, e.g., a line feed (called newline in ‘C’) is represented as \n. Such escape sequences always represent single characters, even though they consist of two or more characters. The commonly used escape sequences along with their corresponding ASCII values are listed in Table 4.3.

Table 4.3 Some Escape Sequences vis-à-vis ASCII Values

Character          Escape Sequence    ASCII Value
null               \0                 000
bell               \a                 007
backspace          \b                 008
horizontal tab     \t                 009
newline            \n                 010
vertical tab       \v                 011
form feed          \f                 012
carriage return    \r                 013
quotation mark     \"                 034
apostrophe         \'                 039
question mark      \?                 063
backslash          \\                 092

 

Example

Some character constants expressed in terms of escape sequences are given below:

'\n'    '\t'    '\b'    '\''    '\\'    '\"'

 

Note that the character constants ‘\0’ and ‘0’ are distinct: the former is the null character (value 0), whereas the latter is the digit zero (ASCII value 48). Also, note that a character can be written as an escape sequence that gives its code in octal or hexadecimal notation. Generally, such octal or hexadecimal escape sequences are used less often than writing the character constant directly.
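These points about escape sequences can be checked with a few one-line functions. The sketch below uses hypothetical function names, and the octal and hexadecimal codes shown assume the ASCII character set.

```c
#include <string.h>

/* '\0' is the null character (value 0); '0' is the digit zero. */
int null_code(void) { return '\0'; }
int zero_code(void) { return '0'; }

/* Octal and hexadecimal escape sequences name a character by its code.
   In ASCII both of these denote the letter A (decimal 65). */
int octal_escape(void) { return '\101'; }
int hex_escape(void)   { return '\x41'; }

/* An escape sequence counts as a single character,
   so the string "a\nb" has length 3, not 4. */
size_t escaped_length(void) { return strlen("a\nb"); }
```

Writing 'A' directly is usually clearer than '\101' or '\x41', and it does not tie the programme to one character set.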

String constants

A string constant consists of any number of consecutive characters (including none) enclosed in double quotation marks, e.g., “Operation Flood”; “NDRI, Karnal-132001”; “+91-184-2559015”; “19.95”; “The correct answer is: ”; “2*(I+3)/J”; “Line 1\nLine 2\nLine 3”;   “    ”; “ ”; and so on.

Note that a character constant, e.g., ‘A’ and corresponding single-character string constant, “A” are not equivalent! Also, mind that a character constant has an equivalent integer value, whereas a single-character string constant does not have such an integer value.
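The difference stems from how strings are stored: a string constant is held as an array of characters ended by a null character ('\0'), so "A" holds two characters while 'A' is a single integer value. The sketch below illustrates this; the function names are hypothetical.

```c
#include <stddef.h>

/* "A" is an array of two characters: 'A' and the terminating '\0'. */
size_t string_size(void) { return sizeof "A"; }

/* 'A' is a single character with an integer value; in 'C' its type is int. */
size_t char_constant_size(void) { return sizeof 'A'; }

/* The first element of the string constant equals the character constant. */
int first_char_matches(void) { return "A"[0] == 'A'; }
```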

4.6  Variables and Arrays

A variable is a named memory location (in the main memory or RAM) in which information can be written and from which it can be read. You may think of a variable as a placeholder for a value. In an expression, a variable is equivalent to its current value. Thus, if a variable i is initialised (or set equal) to 0, then i+1 will be equal to 1. Hence, a variable is an identifier that denotes some specified type of information within a designated portion of the programme. The variable must be assigned a value at some point in the programme. Subsequently, the value can be accessed in the programme by referring to the variable name. A given variable can be assigned different values at various places within the programme. Thus, the information contained in the variable can change during the execution of the programme. However, the type of the information (e.g., integer, floating-point or character) associated with the variable cannot change. The rules governing the naming of variables in ‘C’ are as follows:

·         The name can contain letters, digits and the underscore

·         The first character must be a letter or the underscore

·         An underscore as the first character should be avoided as it may conflict with standard system variables

·         The name can be of any length, although only the first 31 characters are guaranteed to be significant, so names should differ within their first 31 characters

·         Keywords cannot be used as a variable name

·         Of course, the variable name should be meaningful in the programming context.

Because ‘C’ is a relatively low-level programming language, a ‘C’ programme must request the memory needed to store a variable’s values before it can use the variable. This is done by declaring the variable. A declaration is a statement in which the programme specifies the name of each variable it needs and its data type, which in turn determines the amount of memory reserved for it.
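A minimal sketch of declaring and using a variable follows, based on the variable i discussed above. The function names are hypothetical, and the sizes returned by sizeof vary from compiler to compiler.

```c
#include <stddef.h>

/* Declares a variable, assigns it a value and then changes that value. */
int counter_demo(void)
{
    int i;        /* declaration: reserves memory for one int named i */

    i = 0;        /* the variable is assigned a value */
    i = i + 1;    /* i now holds 1; the value changed, the type did not */
    return i;
}

/* The memory a variable needs depends on its declared type. */
size_t int_size(void)    { return sizeof(int); }
size_t double_size(void) { return sizeof(double); }
```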

The array is another kind of variable that is commonly used in ‘C’. An array is an identifier that refers to a collection of data items that all have the same name. The data items must all be of the same type such as integer, character, etc. The individual data items are represented by their corresponding array elements, i.e., the first data item is represented by the first array element and so on. The individual array elements are distinguished from each other by the value that is assigned to a subscript. The detailed description of variables and arrays will be made in the following modules/lessons.
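As a short preview, the sketch below stores five data items of the same type under one name and picks out each element by its subscript. The names sum_scores, total_of_five and sensory_score, and the values stored, are hypothetical; note that in ‘C’ subscripts start at 0.

```c
/* Sums the elements of an array of n integers. */
int sum_scores(const int scores[], int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)    /* subscripts run from 0 to n - 1 */
        total += scores[i];
    return total;
}

/* Five data items of the same type sharing one name. */
int total_of_five(void)
{
    int sensory_score[5] = {7, 8, 6, 9, 8};

    return sum_scores(sensory_score, 5);
}
```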

4.7  Operators

An operator is a symbol that instructs the computer to perform a certain mathematical or logical manipulation. Operators are used in ‘C’ programmes to operate on data and variables. ‘C’ supports a rich collection of operators, viz., arithmetic operators, relational operators, logical operators, assignment operators, increment and decrement operators, the conditional operator, bitwise operators and special operators (to be discussed later).
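By way of preview, the sketch below uses one operator from each category in turn, with the running value of result noted in the comments. The function operator_demo and its variables are hypothetical and serve only as an illustration.

```c
/* Applies one operator from each category to the same variable. */
int operator_demo(void)
{
    int a = 7, b = 2, result;

    result = a % b;                        /* arithmetic (remainder): 1 */
    result += (a > b);                     /* relational gives 1; assignment +=: 2 */
    result += (a > 0 && b > 0);            /* logical AND gives 1: 3 */
    result++;                              /* increment: 4 */
    result = (result > 3) ? result : 0;    /* conditional: stays 4 */
    result = result << 1;                  /* bitwise left shift: 8 */
    return result + (int)sizeof(char);     /* special operator sizeof(char) is 1: 9 */
}
```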