Module 2. Basic components of ‘C’ programming language

Lesson 5

DATA TYPES IN ‘C’

5.1  Introduction

‘C’ uses data types to define a variable before it is used in a programme. Defining a variable allocates storage for it and fixes the type of data held in that memory location. Hence, the data type determines: the amount of storage allocated to the variable; the values that the variable can accept; and the operations that can be performed on it. ‘C’ data types can be broadly classified as:

·         Primary data type

·         Derived data type

·         User-defined data type

5.2  Primary Data Types

All ‘C’ compilers accept the following basic data types:

Table 5.1 Basic data types supported by ‘C’

Serial Number | Data Type                       | Keyword | Range of Values
--------------|---------------------------------|---------|---------------------
1.            | Integer                         | int     | -32768 to +32767
2.            | Character                       | char    | -128 to 127
3.            | Floating-point                  | float   | 3.4e-38 to 3.4e+38
4.            | Double precision floating-point | double  | 1.7e-308 to 1.7e+308
5.            | Void                            | void    | -

Integer types

Integers are whole numbers with a machine-dependent range of values. ‘C’ has three classes of integer storage, namely, short int, int and long int. All of these data types have signed and unsigned forms. A short int never occupies more space than an int, and on many machines it occupies less; the ‘C’ standard requires at least 16 bits for both. Unsigned numbers are never negative and use all the bits for the magnitude of the number, so an unsigned type can hold positive values about twice as large as its signed counterpart. The long and unsigned forms are used when a wider range of values is needed.

Floating-point types

A floating-point number represents a real number with about six significant digits of precision. Floating-point numbers are declared with the keyword float. When the accuracy of float is insufficient, the data type double is used instead. A double is like a float but with greater precision (typically about 15 significant digits). To extend the precision further, use long double, which occupies 80 bits on the machine described in Table 5.2; its size varies from one compiler to another.

Void type

The void data type is used to specify the return type of a function that returns nothing. It is good practice to declare functions that do not return any value to the calling function as void.

Character type

A single character can be defined as a character type of data. Characters are usually stored in 8 bits of internal storage. The qualifier signed or unsigned can be explicitly applied to char. While unsigned characters have values between 0 and 255, signed characters have values from -128 to 127. The size and range of data types on a 16-bit machine are given in Table 5.2 below:

Table 5.2 Size and range of data types on 16 bit machine

Type                          | Size (Bits) | Range
------------------------------|-------------|---------------------------
char or signed char           | 8           | -128 to 127
unsigned char                 | 8           | 0 to 255
int or signed int             | 16          | -32768 to 32767
unsigned int                  | 16          | 0 to 65535
short int or signed short int | 16          | -32768 to 32767
unsigned short int            | 16          | 0 to 65535
long int or signed long int   | 32          | -2147483648 to 2147483647
unsigned long int             | 32          | 0 to 4294967295
float                         | 32          | 3.4e-38 to 3.4e+38
double                        | 64          | 1.7e-308 to 1.7e+308
long double                   | 80          | 3.4e-4932 to 3.4e+4932

 

5.3  Declaration of Variables

Every variable used in the programme should be declared to the compiler. The declaration serves two purposes:

a)      Tells the compiler the variables’ names.

b)      Specifies types of data the variables will hold.

The general format of any declaration is:

datatype variable_1, variable_2, …, variable_n;

where variable_1, variable_2, etc., are variable names. Variables are separated by commas. A declaration statement must end with a semicolon.

Examples

int sum;
int number, salary;
double average, mean;

5.4  Programmer-defined Type Declaration

In ‘C’, the ‘typedef’ feature allows programmers to define new data type(s) that is/are equivalent to existing data type(s) [see Table-5.3]. Once a programmer-defined data type is defined, new identifiers such as variables, arrays, etc., can be declared in terms of the newly defined data type. The general syntax is:

typedef type user-defined-type;

where ‘type’ represents existing data type (including a standard data type or earlier programmer-defined data type) and ‘user-defined-type’ refers to the new programmer-defined name given to the data type.

Examples

typedef int age;
typedef float marks;

Here, ‘age’ is a programmer-defined data type, which is equivalent to integer data type. Hence, the variable declaration

age male, female;

is equivalent to writing

int male, female;

That is, the variables, ‘male’ and ‘female’ are regarded as variables of type ‘age’, though these are actually variables of integer type.

Table 5.3 The standard data types in ‘C’

Data Type     | Description | Typical Memory Requirements
--------------|-------------|----------------------------
int           | Integer quantity | 2 bytes or 1 word (varies from one computer to another)
short         | Short integer quantity (may contain fewer digits than int) | 2 bytes or 1 word (varies from one computer to another)
long          | Long integer quantity (may contain more digits than int) | 1 or 2 words (varies from one computer to another)
unsigned      | Unsigned (non-negative) integer quantity (maximum permissible quantity is approximately twice as large as int) | 2 bytes or 1 word (varies from one computer to another)
char          | Single character | 1 byte
signed char   | Single character, with numerical values ranging from -128 to 127 | 1 byte
unsigned char | Single character, with numerical values ranging from 0 to 255 | 1 byte
float         | Floating-point number (i.e., a number containing a decimal point and/or an exponent) | 1 word
double        | Double-precision floating-point number (i.e., more significant figures, and an exponent that may be larger in magnitude) | 2 words
long double   | Double-precision floating-point number (may be higher precision than double) | 2 or more words (varies from one computer to another)
void          | Special data type for functions that do not return any value | (not applicable)
enum          | Enumeration constant (special type of int) | 2 bytes or 1 word (varies from one computer to another)

Note: The qualifier unsigned may appear with short int or long int, e.g., unsigned short int (or unsigned short), or unsigned long int (or unsigned long).

 

Similarly, the following declarations:

typedef float height[30];

height boys, girls;

define ‘height’ as a 30-element floating-point array type. Hence, boys and girls are 30-element floating-point arrays (arrays are discussed later). The typedef feature is quite suitable while defining structures as it avoids the need to repeatedly write the struct tag whenever a structure is referenced. Further discussion will be continued on this topic in the last module.

5.5  Symbolic Constants

Symbolic constants are names for a sequence of characters. The characters may represent a numeric constant, a character constant or a string constant. Thus, a symbolic constant allows a name to appear in place of a numeric, character or string constant. When the programme is compiled, the pre-processor replaces each occurrence of a symbolic constant with its corresponding character sequence. Symbolic constants are usually defined at the beginning of a programme, e.g., the following statements:

#define ANGLE_MIN 0

#define ANGLE_MAX 360

#define PI 3.141593
#define TRUE 1
#define FALSE 0

would define the symbolic constants, ANGLE_MIN, ANGLE_MAX, PI, TRUE and FALSE to the values 0, 360, 3.141593, 1 and 0, respectively.  Recall that ‘C’ distinguishes between lowercase and uppercase letters in variable names. It is a tradition to use capital letters in defining global constants (like symbolic constants). Note that the symbolic constant definitions do not end with a semicolon unlike other ‘C’ statements.

Important Note: The following Sections 5.6 and 5.7 deal with the advanced topics on Enumerations and Macros. Therefore, the students may better comprehend these advanced topics after completing other lessons of this module.

5.6  Enumerations

An enumeration is a data type like a structure or a union (to be discussed later). It consists of a set of named values that represent integral constants, known as enumeration constants. An enumeration is also referred to as an enumerated type because you must list (enumerate) every value in creating a name for each of them. In addition to providing a way of defining and grouping sets of integral constants, enumerations are useful for variables that have a small number of possible values. You can declare an enumeration type separately from the definition of variables.

Enumeration type definition

Generally, an enumeration type definition begins with the enum keyword followed by an optional identifier (the enumeration tag) and a brace-enclosed list of enumerators. A comma separates each enumerator in the enumerator list.

enum tag {member 1, member 2, …, member m};

where enum is the required keyword; tag is a name that identifies enumerations having this composition; and member 1, member 2, …, member m represent the individual identifiers that may be assigned to variables of this type. The member names must be unique, and they must also be distinct from other identifiers whose scope is the same as that of the enumeration.

Enumeration variable declaration

Once the enumeration is defined, corresponding enumeration variables can be declared as follows:

storage-class enum tag variable 1, variable 2, …, variable n;

where storage-class is an optional storage class specifier, enum is the required keyword, tag is the name that appeared in the enumeration definition, and variable 1, variable 2, …, variable n are enumeration variables of the type tag.

The enumeration definition can be clubbed with the variable declarations, as follows:

storage-class enum tag {member 1, member 2, …, member m}
variable 1, variable 2, …, variable n;

The tag is optional in this situation.

An illustration – Consider the following statements as a part of a ‘C’ programme:

enum colours {black, blue, cyan, green, magenta, red, white, yellow};

enum colours foreground, background;

Note that the first statement defines an enumeration named colours (i.e., the tag is colours). The enumeration consists of eight constants whose names are black, blue, cyan, green, magenta, red, white and yellow. The second statement declares the variables foreground and background to be enumeration variables of type colours; in ‘C’, the enum keyword must be repeated before the tag. Thus, each variable can be assigned any one of the constants black, blue, …, yellow.

The two declarations can be combined as follows:

enum colours {black, blue, cyan, green, magenta, red, white, yellow} foreground, background;

or without the tag, simply:

enum {black, blue, cyan, green, magenta, red, white, yellow} foreground, background;

Enumeration constants are automatically assigned equivalent integer values, beginning with 0 for the first constant and with each successive constant increasing by 1. Thus, member 1 will automatically be assigned the value 0; member 2 will be assigned 1, and so on.

Example 1: Consider the following code demonstrating the use of enumeration constants. (For detailed description about the printf statement, see Lesson-5).

#include <stdio.h>

int main(void)
{
  enum compass_direction {north, east, south, west};
  enum compass_direction my_direction;

  my_direction = east;   /* east is the second member, so its value is 1 */
  printf("%d", my_direction);
  return 0;
}

Output: 1

It is quite interesting to note that the aforementioned automatic assignments can be overridden within the definition of the enumeration! That is, some of the constants can be assigned explicit integer values, which differ from default values. To do so, each constant (i.e., each member), which is assigned an explicit value is expressed as an ordinary assignment expression; member = int, where int represents a signed integer quantity. Those constants that are not assigned explicit values will automatically be assigned values, which increase successively by 1 from last explicit assignment. This may cause two or more enumeration constants to have the same integer value.

Example 2:  Consider the following code demonstrating the use of enumeration constants with implicit and explicit assignments.

#include <stdio.h>

int main(void)
{
  enum colour {black=-1, blue, cyan, green,
         magenta, red=2, white, yellow};
  enum colour background;   /* the tag colour names the type */

  background = green;
  printf("%d\n", background);
  return 0;
}

Output: 2

The constants black and red are now assigned the explicit values -1 and 2, respectively. The remaining enumeration constants are automatically assigned values that increase successively by 1 from the last explicit assignment. Thus, blue, cyan, green and magenta are assigned the values 0, 1, 2 and 3, respectively. Similarly, white and yellow are assigned the values 3 and 4, respectively. Note that there are now duplicate assignments, i.e., green and red both represent 2, whereas magenta and white both represent 3.

Enumeration variables can be processed in the same manner as other integer variables. Thus, they can be assigned new values, compared, etc. However, it should be understood that enumeration variables are generally used internally, to indicate various conditions that can arise within a programme. Hence, there are certain restrictions associated with their use. In particular, an enumeration constant cannot be read into the computer and assigned to an enumeration variable. (It is possible to enter an integer and assign it to an enumeration variable, though this is generally not done.) Moreover, only the integer value of an enumeration variable can be written out of the computer.

5.7  Macros

As discussed earlier, the #define statement is used to define symbolic constants within a ‘C’ programme. All symbolic constants are replaced by their equivalent text at the beginning of the compilation process. Thus, symbolic constants provide shorthand notation to simplify the organisation of a programme. Besides, #define statement can be used to define macros. A macro is a single identifier that is equivalent to expressions, complete statements or group of statements; i.e., a fragment of code, which has been given a name. Whenever this name is used within the programme, it is replaced by the contents of the macro. In this sense, macros resemble functions; however, they are defined in an entirely different manner than functions.

You may define any valid identifier as a macro, even if it is a ‘C’ keyword, because the pre-processor does not know about keywords. This can be useful if you wish to hide a keyword such as const from an older compiler that does not recognise it. However, the pre-processor operator ‘defined’ can never be defined as a macro. Macros add some work to the compiling process; in return, a macro is expanded in place and so avoids the overhead of a function call (passing arguments and returning), which can make the compiled programme slightly faster, although modern compilers often remove this overhead for small functions themselves.

The formal syntax of a macro is:

#define name(dummy1[,dummy2][,...]) token string

The symbols dummy1, dummy2, ... are called dummy arguments (the square brackets indicate optional items).

Example 1:

Consider the following simple example of a macro (Students should never emulate this in any real project):

#define SquareOf(x) x*x

It defines a kind of function, which, used in an actual piece of code, looks exactly like any other function call:

double y_out, x_in=3;

y_out = SquareOf(x_in);

As you would see subsequently, the problem is that the macro ‘SquareOf’ only pretends to be a function call, while it is absolutely different.

There are a few additional rules such as that the macro can extend over several lines, provided one uses a backslash to indicate line continuation:

#define ThirdPowerOf(dummy_argument) \
         dummy_argument \
        *dummy_argument \
        *dummy_argument

Of course, you should break the line at a reasonable position; and not, for example, in the middle of a symbol.

How does a compiler handle a macro?

What makes a macro different from a standard function is primarily the fact that a macro is a scripted directive for the compiler rather than a scripted piece of run-time code; and, therefore, it is dealt with at compilation time rather than at run time. When the compiler encounters a previously defined macro, it first isolates its actual arguments, handling them as plain text strings separated by commas. Then it parses (i.e., divides the code into functional components; compiler must parse source code in order to translate it into object code) the token string, isolates all occurrences of each dummy-argument symbol and replaces it by the actual argument string. The whole process consists entirely of mechanical string substitutions with almost no semantic (logical) testing!

The compiler then substitutes the modified token string for the original macro call and compiles the resulting code. It is only in that phase that compilation errors can occur. When they do, the result is often either amusing or frustrating, depending upon how you feel at that moment, because you may get mysterious-looking error messages that result from the modified text and thus refer to something you have never written!

This is explained with the help of the following small programme, which is formally correct and compiles without any problem:

#include <stdio.h>

#define SquareOf(x) x*x

int main(void)
{
   int x_in=3;

   printf("\nx_in=%i",x_in);
   printf("\nSquareOf(x_in)=%i",SquareOf(x_in));
   printf("\nSquareOf(x_in+4)=%i",SquareOf(x_in+4));
   printf("\nSquareOf(x_in+x_in)=%i",SquareOf(x_in+x_in));
   return 0;
}

Naturally, you would expect the output of this programme as:

x_in=3
SquareOf(x_in)=9
SquareOf(x_in+4)=49
SquareOf(x_in+x_in)=36

However, what you actually get is:

x_in=3
SquareOf(x_in)=9
SquareOf(x_in+4)=19
SquareOf(x_in+x_in)=15

Let us see what happened. When the compiler encountered the string “SquareOf(x_in+4)”, it replaced it with the token string “x*x” and then replaced each occurrence of the dummy argument “x” with the actual argument string “x_in+4”, obtaining the final string “x_in+4*x_in+4”. Since multiplication takes precedence over addition, this evaluates to 3 + (4*3) + 4 = 19 and not to the expected value of 49. Similarly, SquareOf(x_in+x_in) expands to “x_in+x_in*x_in+x_in”, which evaluates to 3 + (3*3) + 3 = 15 instead of 36.

The problem would have never happened if SquareOf(x) were a normal function. In that case, the argument x_in+4 would be first evaluated as a self-standing expression and only then would the result be passed to the function SquareOf for the evaluation of the square.

Actually, both ways are correct! They are just two different recipes for handling the respective scripts. However, given the formal similarity between a function-like macro call and a standard function call, the discrepancy is dangerous and should be removed. Luckily, there is a simple remedy, i.e., replace the original definition of the SquareOf macro by

#define SquareOf(x) ((x)*(x))

The problem vanishes because, for example, the macro-call string "SquareOf(x_in+4)" is transformed into "((x)*(x))" and then into "((x_in+4)*(x_in+4))", which evaluates exactly as intended. The outer pair of parentheses keeps the expansion intact when the macro is used inside a larger expression such as 100/SquareOf(5).

On the basis of the foregoing discussion about macros, the students are advised to keep in mind the following rules while using macros in their programmes (For detailed description about the do...while statement, see Module-III):

Rule 1: Always write multi-line macros using the following pattern:

 #define name \
    do { \
       macro definition here \
    } while (0)

Rule 2: Always surround macro arguments with parentheses inside the macro body.

Rule 3: Keep your macros as short as possible.

Example 2:

#include <stdio.h>
#define MUL(x,y) ((x) * (y))
int main(void)
{
    printf("%d\n" , MUL(3, 5));
    return 0;
}

Example 3:

#include <stdio.h>

/*
 * It swaps two integer numbers.
 * Requires tmp variable to be defined.
 */
#define SWAP(x, y) \
  do { \
    tmp = x; \
    x = y; \
    y = tmp; } \
  while (0)

int main(void)
{
     int x=10, y=20;
     int tmp;   /* temporary variable required by SWAP */

     SWAP(x,y);
     printf("%d %d\n" , x, y);
     return 0;
}