Module 2. Basic components of C programming language
Lesson 4
CHARACTERISTICS OF C LANGUAGE
4.1 Introduction
The main characteristics of C include modularity, portability, extendibility, speed and flexibility. Modularity is the ability to split a large programme into small, manageable segments called modules. This is an important feature of structured programming languages: it helps to complete software development projects on time and makes debugging easier and quicker. Code written in C is highly portable, i.e., software developed in C can be built on different platforms with minimal (or no) alterations to the source code. C also allows existing software to be extended by adding new features to it. C is often described as a middle-level language because, although it is a high-level language (HLL), C programmes run at speeds close to those of the same programmes written in a low-level language (LLL) such as assembly language. That is, C combines the merits of HLL and LLL. Thus, C is mainly used in developing system software such as operating systems, e.g., UNIX was written in C. C has a small set of reserved words, also called keywords (ANSI C has 32 keywords), which gives programmers flexibility and a high degree of control over the language. Hence, C is known as a programmer's language, as it leaves room for creativity in their programmes.
4.2 Character Set
The character set in C can be grouped into four major categories, viz., letters, digits, special characters and white spaces. The letters are the uppercase letters A-Z and the lowercase letters a-z; the digits are 0-9. The set of special characters supported by C is given in Table 4.1.
Table 4.1 C Special Character Set
Serial Number | Character | Description |
1. | , | Comma |
2. | . | Period |
3. | ; | Semicolon |
4. | : | Colon |
5. | ? | Question mark |
6. | ' | Apostrophe |
7. | " | Quotation mark |
8. | ! | Exclamation mark |
9. | | | Vertical bar |
10. | / | Slash |
11. | \ | Backslash |
12. | ~ | Tilde |
13. | _ | Underscore |
14. | = | Equal sign |
15. | % | Percentage sign |
16. | & | Ampersand |
17. | ^ | Caret |
18. | * | Asterisk |
19. | - | Minus sign |
20. | + | Plus sign |
21. | < | Opening angle (less than sign) |
22. | > | Closing angle (greater than sign) |
23. | ( | Left parenthesis |
24. | ) | Right parenthesis |
25. | [ | Left bracket |
26. | ] | Right bracket |
27. | { | Left brace |
28. | } | Right brace |
29. | # | Number sign |
Most versions of C allow certain other characters such as $ and @ to be included within strings and comments. C also uses certain combinations of these characters, such as \b, \n and \t, to represent special characters called backspace, newline and horizontal tab, respectively. Typically, these character combinations are known as escape sequences (to be discussed later). At this juncture, it suffices to mention that each escape sequence represents a single character even though it is written as a combination of two or more characters.
White space characters include blank space, horizontal tab, vertical tab, carriage return, newline and form feed. They are known as white space characters because they serve the same purpose as the spaces between words and lines on a printed page, i.e., they make the programme easier for humans to read. White spaces are ignored by the compiler unless they are part of a character or string constant. White space may be used to separate words, but it is strictly prohibited between the characters of a keyword or an identifier.
4.3 Tokens
The tokens of a language are the basic building blocks, which can be assembled in a systematic order to construct a programme. Thus, in a C source programme, the basic element recognised by the compiler is the token. A token is text in the source programme that the compiler does not break down further into component elements, i.e., tokens are the smallest individual words, punctuation marks, etc., recognised by the C compiler. C has six types of token, viz., keywords, identifiers, constants, string literals, operators and punctuators.
Instructions in C are formed using syntax and keywords. It is necessary to follow C syntax rules strictly, i.e., all programmes must conform to the grammatical rules pre-defined in the language. Any instruction that does not match the prescribed syntax causes an error when the programme is compiled. Each keyword (or reserved word) in C has its own pre-defined meaning; hence, keywords must not be used as names of variables or constants. A list of C keywords is given in Table 4.2 below.
Table 4.2 C Keywords
Serial Number | Keyword | Description |
1. | auto | The default storage class for local variables. |
2. | break | Command that unconditionally exits the enclosing for, while, do...while or switch statement. |
3. | case | Label used within the switch statement. |
4. | char | The simplest C data type; it holds a single character. |
5. | const | Data modifier that prevents a variable from being changed. Also, see volatile. |
6. | continue | Command that skips the rest of the current pass of a for, while or do...while loop and moves on to the next iteration. |
7. | default | Label used within the switch statement to catch any instances not specified with a case statement. |
8. | do | Looping command used in conjunction with the while statement. The loop will always execute at least once. |
9. | double | Data type that can hold double-precision floating-point values. |
10. | else | Statement signalling alternative statements to be executed when the condition underlying the if statement evaluates as FALSE. |
11. | enum | Data type that allows variables to be declared that accept only certain values. |
12. | extern | Data modifier indicating that a variable is defined in another part of the programme. |
13. | float | Data type used for floating-point numbers. |
14. | for | Looping command that contains initialisation, condition and increment sections. |
15. | goto | Command that causes a jump to a pre-defined label. |
16. | if | Command used to change programme flow based on a TRUE/FALSE decision. |
17. | int | Data type used to hold integer values. |
18. | long | Data type used to hold integer values that may be larger than those of an int. |
19. | register | Storage modifier that specifies that a variable should be stored in a register if possible. |
20. | return | Command that causes programme flow to exit from the current function and return to the calling function. It can also be used to return a single value. |
21. | short | Data type used to hold small integers. It is never larger than an int and, on some machines, it is the same size as an int. |
22. | signed | Modifier used to signify that a variable can have both positive and negative values. Also, see unsigned. |
23. | sizeof | Operator that returns the size of the item in bytes. |
24. | static | Modifier used to signify that the compiler should retain the variable's value between function calls. |
25. | struct | Keyword used to group C variables of any data type together. |
26. | switch | Statement used to change programme flow in various directions. It is used in combination with the case statement. |
27. | typedef | Modifier used to create new names for existing variable and function types. |
28. | union | Keyword used to allow multiple variables to share the same memory space. |
29. | unsigned | Modifier used to signify that a variable will contain only non-negative values (zero or positive). Also, see signed. |
30. | void | Keyword used to signify either that a function doesn't return anything or that a pointer being used is considered generic, i.e., able to point to any data type. |
31. | volatile | Modifier that signifies that a variable's value can change in ways the compiler cannot foresee. Also, see const. |
32. | while | Looping statement that executes a section of code as long as a condition remains TRUE. |
Also, some compilers may include some or all of the following keywords:
ada | asm | entry | far | fortran | huge | near | pascal |
The keywords supported by the C++ programming language in addition to those noted above are listed below. These C++ reserved words aren't within the scope of this e-course; however, you may in due course wish to port your C programmes to C++, so you should avoid using these words as identifiers in your C programmes as well.
catch | class | delete | except | finally | friend |
inline | new | operator | private | protected | public |
template | this | throw | try | virtual |
4.4 Identifiers
The term identifier refers to the name of a programme element, i.e., a variable, a function, an array, etc. Both uppercase and lowercase letters are permitted, although common practice is to use lowercase letters. The underscore character is also permitted in identifiers. It is usually used as a link between two words in a lengthy identifier, e.g., fat_content. The syntactic rules for naming an identifier are:
· The first character must be a letter or an underscore
· It must consist of only letters, digits or underscore
· Only the first 31 characters are significant, i.e., if the programmer writes an identifier of more than 31 characters, the compiler considers only the first 31 characters and ignores the rest (note that some implementations of C recognise only the first eight characters)
· A keyword cannot be used as an identifier name
· It cannot contain any spaces.
Examples
The following names are valid identifiers:
X | y12 | trial_1 | _temperature |
specie | sensory_score | win_tmp | TABLE |
The following names are invalid identifiers for the reasons stated against each:
4th : First character must be a letter or an underscore
"x" : Illegal character (")
order-number : Illegal character (-)
viscosity score : Illegal character (blank space)
Note: All the keywords are lowercase. Since the C compiler treats uppercase and lowercase characters as different, it is possible to use an uppercase version of a keyword as an identifier. Generally, this is not done, and it is considered poor programming practice. For instance, the keyword double and the identifier DOUBLE are different.
4.5 Constants
Constants in C refer to the fixed values that do not change during the execution of the programme. There are four basic types of constant in C, viz., integer constants; floating-point constants; character constants and string constants. Besides, there are enumeration constants (to be discussed later). The integer and floating-point constants are collectively known as numeric constants. The rules governing all numeric constants are as follows:
· Commas and blank spaces cannot be included within the constant
· The constant can be preceded by a minus (-) sign, if needed
· The value of a constant cannot exceed specified minimum and maximum bounds; for each type of constant, these bounds vary from compiler to compiler.
Integer constants
An integer constant is an integer-valued number. Thus, it consists of a sequence of digits. Integer constants can be written in three different number systems, viz., decimal (base 10), octal (base 8) and hexadecimal (base 16). Novice programmers generally use decimal integer constants. A decimal integer constant may consist of any combination of digits taken from the set 0 through 9. If the constant contains two or more digits, the first digit must be other than 0, because a leading 0 marks the constant as octal.
Examples
Several valid decimal integer constants are given below:
0 | 1 | 786 | 9797 | 32767 | 9999 |
A decimal integer constant is incorrect if it has any of the following faults:
· It contains an illegal character such as a comma (,)
· It contains an illegal character such as a period (.)
· It contains blank spaces
· It contains any other embedded non-digit character
· Its first digit is a zero (when it has two or more digits).
Floating-point constants
A floating-point (or real) constant is a base-10 number that contains either a decimal point or an exponent (or both), i.e., floating-point constants take two forms, viz., the fractional form and the exponential form. A floating-point constant in fractional form must have:
· At least one digit
· A decimal point
· An optional positive or negative sign (the default sign is positive)
· No commas or spaces embedded in it.
For example, -26.9876, +867.9 and 654.0 are valid floating-point constants in fractional form. A value stored as a single-precision float typically uses four bytes of memory.
In exponential form, the floating-point constant is represented in two parts, e.g., the floating-point constant 0.00032 is written in exponential form as +3.2e-4. The part preceding the e is known as the mantissa, and the part following the e is called the exponent. A floating-point constant in exponential form must follow these rules:
· The mantissa and the exponent must be separated by the letter e (uppercase E is also accepted)
· The mantissa may have a positive or negative sign (the default sign is positive)
· The exponent must have at least one digit
· The exponent may be a positive or negative integer (the default sign is positive)
Floating-point constants encompass a much greater range than integer constants. Typically, the magnitude of a floating-point constant might range from a minimum of approximately 3.4e-38 to a maximum of 3.4e+38. Some versions of the language allow floating-point constants that cover a wider range, such as 1.7e-308 to 1.7e+308. Note that the value 0.0 (although smaller than both 3.4e-38 and 1.7e-308) is a valid floating-point constant. You should find out the applicable values for the version of C used on your computer system.
A floating-point constant is invalid if it has any of the following faults:
· Neither a decimal point nor an exponent is present
· It contains an illegal character such as a comma (,)
· The exponent is not an integer quantity (it cannot contain a decimal point)
· It contains an illegal character such as a blank space in the exponent.
Floating-point constants are normally represented as double-precision quantities. Thus, each floating-point constant typically occupies eight bytes of memory. ANSI C allows a single-precision floating-point constant to be specified by appending the letter F (in either uppercase or lowercase) to the constant, e.g., 3e5F. Similarly, a long floating-point (long double) constant is specified by appending the letter L (uppercase or lowercase), e.g., 0.987654321e22L. Note that the precision of floating-point constants, i.e., the number of significant figures, differs between versions of C. Basically, all versions permit at least six significant figures, and some versions permit as many as eighteen. You should determine the appropriate number of significant figures for your particular version of C.
Character constants
A character constant is a single letter, a single digit or a single special character enclosed within single quotation marks (apostrophes). A character constant can be at most one character long. When stored in a variable of type char, it occupies one byte of memory.
Examples
'A' | 'x' | '3' | ' ' |
Note that the last constant consists of a blank space enclosed in single quotation marks.
Character constants have integer values that are determined by the machine's specific character set. Thus, the value of a character constant may vary from computer to computer. However, the constants themselves are independent of the character set. The majority of computer systems, and virtually all personal computers, use the American Standard Code for Information Interchange (ASCII) character set (visit http://blob.perl.org/books/beginning-perl/3145_AppF.pdf for further details), in which each individual character is numerically encoded with a unique 7-bit combination (hence, a total of 2^7 = 128 different characters).
Example: several character constants and their corresponding values as per the ASCII character set are shown below:
Constant | Value |
'A' | 65 |
'x' | 120 |
'3' | 51 |
'?' | 63 |
' ' | 32 |
Note that these values will be the same for all computers using the ASCII character set. However, the values will be different for computers using an alternative character set, e.g., IBM mainframe computers use the EBCDIC character set, which is based on its own unique 8-bit combinations.
Escape sequences
Certain non-printing characters, as well as the backslash (\) and the apostrophe ('), can be expressed in terms of escape sequences. An escape sequence always begins with a backslash (\) followed by one or more special characters, e.g., a line feed (called newline in C) is represented as \n. Such escape sequences always represent single characters, even though they are written using two or more characters. The commonly used escape sequences, along with their corresponding ASCII values, are listed in Table 4.3.
Table 4.3 Some Escape Sequences vis-à-vis ASCII Values
Character | Escape Sequence | ASCII Value |
null | \0 | 000 |
bell | \a | 007 |
backspace | \b | 008 |
horizontal tab | \t | 009 |
newline | \n | 010 |
vertical tab | \v | 011 |
form feed | \f | 012 |
carriage return | \r | 013 |
quotation mark | \" | 034 |
apostrophe | \' | 039 |
question mark | \? | 063 |
backslash | \\ | 092 |
Example
Some character constants expressed in terms of escape sequences are given below:
'\n' | '\t' | '\b' | '\'' | '\\' | '\"' |
Note that the character constants '\0' and '0' are distinct. Also, note that an escape sequence can be expressed in terms of octal or hexadecimal digits, e.g., '\101' and '\x41' both denote the letter A in the ASCII character set. Generally, an octal or hexadecimal escape sequence is used less often than writing the character constant directly.
String constants
A string constant consists of any number of consecutive characters (including none) enclosed in double quotation marks, e.g., "Operation Flood"; "NDRI, Karnal-132001"; "+91-184-2559015"; "19.95"; "The correct answer is: "; "2*(I+3)/J"; "Line 1\nLine 2\nLine 3"; " "; ""; and so on.
Note that a character constant, e.g., 'A', and the corresponding single-character string constant, "A", are not equivalent! Also, mind that a character constant has an equivalent integer value, whereas a single-character string constant does not have such an integer value.
4.6 Variables and Arrays
A variable is a named memory location (in the main memory or RAM) that can be used to store and retrieve information. You may think of a variable as a placeholder for a value. A variable is treated as being equivalent to its assigned value. Thus, if a variable i is initialised (or set equal) to 0, then i+1 will be equal to 1. Hence, a variable is an identifier that denotes some specified type of information within a designated portion of the programme. The variable must be assigned a value at some point in the programme. Subsequently, the value can be accessed by referring to the variable name. A given variable can be assigned different values at various places within the programme. Thus, the information contained in the variable can change during the execution of the programme. However, the type of the information (i.e., numeric or string, etc.) associated with the variable cannot change. The rules governing the naming of variables in C are as follows:
· The name can contain letters, digits and the underscore
· The first character must be a letter or the underscore
· An underscore as the first character should be avoided, as it may conflict with standard system variables
· The name can be of any length, although only the first 31 characters are guaranteed to be significant
· Keywords cannot be used as variable names
· Of course, the variable name should be meaningful in the programming context.
As C is a relatively low-level programming language, a C programme must request the memory needed to store the values of a variable before it can use that variable. This is done by declaring variables. A variable declaration is a statement in which a C programme specifies the variables it needs, their names and their types, which determine the amount of memory they will occupy.
The array is another kind of variable that is commonly used in C. An array is an identifier that refers to a collection of data items that all share the same name. The data items must all be of the same type, such as integer, character, etc. The individual data items are represented by their corresponding array elements, i.e., the first data item is represented by the first array element, and so on. The individual array elements are distinguished from each other by the value assigned to a subscript. A detailed description of variables and arrays is given in the following modules/lessons.
4.7 Operators
An operator is a symbol that instructs the computer to perform certain mathematical or logical manipulations. Operators are used in a C programme to operate on data and variables. C supports a rich collection of operators, viz., arithmetic operators, relational operators, logical operators, assignment operators, increment and decrement operators, conditional operators, bitwise operators and special operators (to be discussed later).