## Lexical analysis of a chemical formula

The first operation in a chemistry program is to transform a chemical formula, a string made of letters and digits, into a molecular weight (a real number). This is obvious to a chemist but is not easily achieved by a computer.

Lexical analysis of the chemical formula is performed by the class analysis:

    analysis an = new analysis(chemical_formula);

Characters are analysed in the following sequence:

- Beginning of the lexical analysis of a chemical formula with n atoms and n coefficients
- Atom n and its coefficient:
  - The first letter has to be uppercase: H, Cl, ...
  - If it exists, the second letter has to be lowercase: Cl, Al, ...
  - Then there may be a number (digit): H2O, Al2O3, ...
  - There may be a second digit if the coefficient is greater than 9
  - Then there may be a dot if the coefficient is a real number: Fe0.9O, ...
  - Then there may be a digit (first decimal)
  - Then there may be another digit (second decimal)
  - The next character may be a comma: NaCl,H2O
  - Atom n and its coefficient are obtained; go to atom n+1
- End of the lexical analysis

**class analysis**

    class analysis {
        String chemForm;
        float molmas = 0f;

        analysis(String cForm) {
            chemForm = cForm; // Chemical formula
            String s[] = new String[20]; // Symbols of the elements in the chemical formula
            float coeff[] = new float[20]; // Coefficients
            int len = cForm.length(); // Number of characters in the formula
            char c;
            String ch, coefficient;
            int a = 0, i = 0, end = 0;
            cForm = cForm + " "; // Trailing space keeps charAt(i) in range

            // Lexical analysis of the chemical formula in args[0]
            do {
                ch = "";
                coefficient = "1";
                coeff[a] = 0;
                // First letter has to be uppercase
                c = cForm.charAt(i);
                if (Character.isUpperCase(c)) {
                    ch = String.valueOf(c);
                    s[a] = ch;
                    i++;
                }
                // If exists, second letter has to be lowercase
                c = cForm.charAt(i);
                if (Character.isLowerCase(c)) {
                    ch = String.valueOf(c);
                    s[a] = s[a] + ch; // The symbol of the element is obtained
                    i++;
                }
                // Then could be a number (digit)
                c = cForm.charAt(i);
                if (Character.isDigit(c)) {
                    coefficient = String.valueOf(c);
                    i++;
                }
                // Could be again a number
                c = cForm.charAt(i);
                if (Character.isDigit(c)) {
                    coefficient = coefficient + String.valueOf(c);
                    i++;
                }
                // Then could be a dot if it is a real number
                c = cForm.charAt(i);
                if (c == '.') {
                    coefficient = coefficient + ".";
                    i++;
                }
                // Then could be a digit (first decimal)
                c = cForm.charAt(i);
                if (Character.isDigit(c)) {
                    coefficient = coefficient + String.valueOf(c);
                    i++;
                }
                // Then could be again a digit (second decimal)
                c = cForm.charAt(i);
                if (Character.isDigit(c)) {
                    coefficient = coefficient + String.valueOf(c);
                    i++;
                }
                // The next character could be a comma
                c = cForm.charAt(i);
                if (c == ',') i++;
                coeff[a] = Float.parseFloat(coefficient);
                if (coeff[a] == 0) coeff[a] = 1;
                a++;
            } while (i <= len - 1);
            // End of the lexical analysis of the chemical formula

            end = a - 1; // Index of the last atom
            calc_masmol ms = new calc_masmol(end, s, coeff);
            molmas = ms.mt();
        }

        float result() { return molmas; }
    }
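The character-by-character rules above can also be written as a single regular expression. The following is only a sketch of that idea, not the author's method: the class name FormulaLexerSketch and its tokenize method are hypothetical, and it assumes that a coefficient always starts with a digit and that repeated symbols may simply be summed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's application.
final class FormulaLexerSketch {

    // One token: an uppercase letter, an optional lowercase letter, an optional
    // coefficient (one or two digits, optional dot, up to two decimals),
    // and an optional comma separating additive parts.
    private static final Pattern ATOM =
            Pattern.compile("([A-Z][a-z]?)(\\d{1,2}(?:\\.\\d{0,2})?)?,?");

    // Returns symbol -> coefficient in order of first appearance;
    // coefficients of repeated symbols are summed.
    static Map<String, Double> tokenize(String formula) {
        Map<String, Double> atoms = new LinkedHashMap<>();
        Matcher m = ATOM.matcher(formula);
        int pos = 0;
        while (pos < formula.length()) {
            // Each token must start exactly where the previous one ended.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + pos + " in " + formula);
            }
            String number = m.group(2);
            double coefficient = (number == null) ? 1.0 : Double.parseDouble(number);
            atoms.merge(m.group(1), coefficient, Double::sum);
            pos = m.end();
        }
        return atoms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Ba0.5Sr0.5TiO3")); // {Ba=0.5, Sr=0.5, Ti=1.0, O=3.0}
        System.out.println(tokenize("NaClO4,H2O"));     // {Na=1.0, Cl=1.0, O=5.0, H=2.0}
    }
}
```

Unlike the class analysis, this sketch rejects any character that does not fit the grammar instead of continuing, which may or may not be the desired behaviour.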

## Calculation of the molecular weight

Molecular weights are obtained from the class calc_masmol. The atomic symbols symb[] and weights ma[] are stored in the program as final arrays. These data could be read from a separate file, but as they are fixed it is more convenient to compile them in.

**class calc_masmol**

    class calc_masmol {
        float masmol;

        // Atomic symbols; ma[i] is the atomic weight of symb[i]
        static final String symb[] = {
            "Ac", "Ag", "Al", "Am", "As", "At", "Au", "B", "Ba", "Be",
            "Bi", "Bk", "Br", "C", "Ca", "Cd", "Ce", "Cf", "Cl", "Co",
            "Cr", "Cs", "Cu", "Dy", "Er", "Es", "Eu", "F", "Fe", "Ga",
            "Gd", "Ge", "H", "Hf", "Hg", "Ho", "I", "In", "Ir", "K",
            "La", "Li", "Lu", "Lr", "Md", "Mg", "Mn", "Mo", "N", "Na",
            "Nb", "Nd", "Ni", "No", "Np", "Os", "P", "Pa", "Pb", "Pd",
            "Pm", "Po", "Pr", "Pt", "Pu", "Ra", "Rb", "Re", "Rh", "Ru",
            "S", "Sb", "Sc", "Se", "Si", "Sm", "Sn", "Sr", "Ta", "Tb",
            "Tc", "Te", "Th", "Ti", "Tl", "Tm", "U", "V", "W", "Y",
            "Yb", "Zn", "Zr", "O"};

        static final float ma[] = {
            227.0278f, 107.8682f, 26.98f, 243.0614f, 74.9216f, 209.987f, 196.966f, 10.811f, 137.327f, 9.012f,
            208.980f, 247.07f, 79.904f, 12.011f, 40.078f, 112.411f, 140.115f, 251.0796f, 35.4527f, 58.933f,
            51.996f, 132.905f, 63.546f, 162.50f, 167.26f, 252.083f, 151.965f, 18.998f, 55.847f, 69.723f,
            157.25f, 72.61f, 1.00794f, 178.49f, 200.59f, 164.930f, 126.905f, 114.82f, 192.22f, 39.0983f,
            138.906f, 6.941f, 174.967f, 260.1053f, 258.099f, 24.305f, 54.938f, 95.94f, 14.007f, 22.90f,
            92.906f, 144.24f, 58.69f, 259.1009f, 237.048f, 190.2f, 30.974f, 231.036f, 207.2f, 106.42f,
            146.915f, 208.9824f, 140.908f, 195.08f, 244.064f, 226.03f, 85.47f, 186.207f, 102.91f, 101.07f,
            32.066f, 121.75f, 44.96f, 78.96f, 28.09f, 150.36f, 118.71f, 87.62f, 180.95f, 158.93f,
            98.91f, 127.6f, 232.04f, 47.88f, 204.38f, 168.93f, 238.029f, 50.94f, 183.85f, 88.91f,
            173.04f, 65.39f, 91.224f, 15.994f};

        calc_masmol(int ed, String s[], float coeff[]) {
            float massat[] = new float[ed + 1];
            // Look up the atomic weight of each symbol of the formula
            for (int a = 0; a <= ed; a++) {
                for (int i = 0; i < symb.length; i++) {
                    // symb[i] is never null, so a missing symbol in s[a] cannot throw here
                    if (symb[i].equals(s[a])) {
                        massat[a] = ma[i];
                        break;
                    }
                }
            }
            // Sum the contributions; an unknown symbol gives a molecular weight of 0
            for (int a = 0; a <= ed; a++) {
                if (massat[a] > 0) {
                    masmol = masmol + massat[a] * coeff[a];
                } else {
                    masmol = 0;
                    break;
                }
            }
        }

        float mt() { return masmol; }
    }
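The two parallel arrays work because symb[i] and ma[i] share an index. As an alternative, the same lookup could be written with a map. The following is a minimal sketch under that assumption, not the author's implementation: the class AtomicMassTableSketch is hypothetical and only a few entries, copied from the ma[] array above, are filled in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to the parallel arrays symb[] and ma[].
final class AtomicMassTableSketch {

    private static final Map<String, Float> MASSES = new HashMap<>();
    static {
        // Only a few entries are shown; the values are those of the ma[] array.
        MASSES.put("H", 1.00794f);
        MASSES.put("O", 15.994f);
        MASSES.put("Na", 22.90f);
        MASSES.put("Cl", 35.4527f);
        MASSES.put("K", 39.0983f);
    }

    // Returns 0 when a symbol is unknown, as calc_masmol does.
    static float molecularWeight(String[] symbols, float[] coefficients) {
        float total = 0f;
        for (int a = 0; a < symbols.length; a++) {
            Float mass = MASSES.get(symbols[a]);
            if (mass == null) {
                return 0f;
            }
            total += mass * coefficients[a];
        }
        return total;
    }

    public static void main(String[] args) {
        // H2O: two hydrogen atoms and one oxygen atom
        System.out.println(molecularWeight(new String[] {"H", "O"}, new float[] {2f, 1f}));
    }
}
```

A map lookup avoids scanning the whole table for every atom, although with fewer than a hundred elements the difference is unlikely to matter in practice.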

## Usage

The two previous classes can be used for many purposes in chemistry, such as calculating a molecular weight, preparing a solution of a given concentration, or preparing a mixture of two compounds (or, more generally, of n compounds).

In the following applications, formulae are case-sensitive: NaCl. The coefficient of an element has to be written after its symbol: C6H6. Non-integer coefficients are accepted: Ba0.5Sr0.5TiO3. An additive formula such as NaClO4,H2O is also possible, but a formula like FeCl3,6H2O is not accepted and has to be written FeCl3,H12O6.
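Rewriting 6H2O as H12O6 amounts to multiplying every coefficient of the fragment by the leading factor. A minimal sketch of that rewriting is given below; the class MultiplierSketch and its multiply method are hypothetical, and the sketch assumes a well-formed fragment with integer coefficients only. Since the lexical analysis reads at most two integer digits, each product should stay below 100.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's application.
final class MultiplierSketch {

    // An element symbol followed by an optional integer coefficient.
    private static final Pattern ATOM = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    // Multiplies every coefficient of the fragment by the given factor.
    // Assumes the fragment contains only symbols and integer coefficients.
    static String multiply(int factor, String fragment) {
        StringBuilder out = new StringBuilder();
        Matcher m = ATOM.matcher(fragment);
        while (m.find()) {
            int coefficient = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            out.append(m.group(1)).append(coefficient * factor);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("FeCl3," + multiply(6, "H2O")); // FeCl3,H12O6
    }
}
```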

To shorten this article, exceptions are not handled in the following listings, but they are in the downloadable application (chemCalcApp.java).

### Example 1: Calculation of molecular weight

A very simple application can be written:

**class chemCalcApp**

    public class chemCalcApp {
        public static void main(String[] args) {
            analysis an = new analysis(args[0]);
            System.out.println("Molecular weight of " + args[0] + " = " + an.result() + "g");
        }
    }

The result in the console is as follows:

    D>java chemCalcApp H2O
    Molecular weight of H2O = 18.00988g
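As a check, the same sum can be worked by hand with the values stored in ma[] (H = 1.00794, O = 15.994):

$$
M(\mathrm{H_2O}) = 2 \times 1.00794 + 1 \times 15.994 = 18.00988
$$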

### Example 2: Preparation of a solution with a given concentration

**class calcsol**

    public class calcsol {
        public static void main(String[] args) {
            analysis an = new analysis(args[0]);
            // Mass to weigh = molecular weight x concentration
            System.out.println("Weigh " + an.result() * Float.parseFloat(args[1])
                    + " g of " + args[0] + " for 1 liter of solvent");
        }
    }

The chemical formula and the desired concentration (in moles per liter) are obtained from the command line as args[0] and args[1].

In the console:

    D>java calcsol NaCl 0.01
    Weigh 0.58352697 g of NaCl for 1 liter of solvent
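The calculation is a single product: molecular weight times concentration, for one liter. A small sketch that also takes the volume into account could look like the following; the class SolutionSketch and the volume parameter are assumptions added for illustration and are not part of the original application.

```java
// Hypothetical helper, not part of the article's application.
final class SolutionSketch {

    // Mass in grams = molecular weight (g/mol) x concentration (mol/L) x volume (L).
    static double massToWeigh(double molecularWeight, double concentration, double volumeLiters) {
        return molecularWeight * concentration * volumeLiters;
    }

    public static void main(String[] args) {
        // 58.3527 is the NaCl molecular weight obtained from the ma[] table (22.90 + 35.4527).
        System.out.println(massToWeigh(58.3527, 0.01, 1.0) + " g for 1 liter");
        System.out.println(massToWeigh(58.3527, 0.01, 0.25) + " g for 250 milliliters");
    }
}
```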

### Example 3: Preparation of a mixture of two compounds

**class mixing**

    public class mixing {
        public static void main(String[] args) {
            analysis an1 = new analysis(args[0]);
            float w1 = an1.result();
            analysis an2 = new analysis(args[1]);
            float w2 = an2.result();
            float coef1 = Float.parseFloat(args[2]);
            float coef2 = Float.parseFloat(args[3]);
            float total = Float.parseFloat(args[4]);
            // Mass of one formula unit of the mixture
            float totalmolmas = w1 * coef1 + w2 * coef2;
            System.out.println("Amount to weigh for a total mass of " + total + "g");
            // Each compound contributes in proportion to coefficient x molecular weight
            System.out.println(args[0] + " = " + coef1 * w1 * total / totalmolmas + " g ");
            System.out.println(args[1] + " = " + coef2 * w2 * total / totalmolmas + " g ");
        }
    }

The chemical formulae (args[0] and args[1]), the molar coefficients (args[2] and args[3]) and the desired total weight (args[4]) are obtained from the command line.

In the console:

    D>java mixing NaCl KCl 1 1 10
    Amount to weigh for a total mass of 10.0g
    NaCl = 4.3905997 g
    KCl = 5.6094 g
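The same proportion extends to a mixture of n compounds: each mass is the total mass multiplied by coefficient times molecular weight, divided by the sum of those products over all compounds. The sketch below illustrates this under the assumption that the molecular weights have already been computed; the class MixtureSketch and its method are hypothetical names.

```java
// Hypothetical helper, not part of the article's application.
final class MixtureSketch {

    // masses[i] = coefficients[i] * weights[i] * totalMass / sum_j(coefficients[j] * weights[j])
    static double[] massesToWeigh(double[] weights, double[] coefficients, double totalMass) {
        if (weights.length != coefficients.length) {
            throw new IllegalArgumentException("One coefficient is needed per compound");
        }
        double formulaMass = 0.0;
        for (int i = 0; i < weights.length; i++) {
            formulaMass += coefficients[i] * weights[i];
        }
        double[] masses = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            masses[i] = coefficients[i] * weights[i] * totalMass / formulaMass;
        }
        return masses;
    }

    public static void main(String[] args) {
        // NaCl and KCl molecular weights as computed from the ma[] table
        double[] masses = massesToWeigh(new double[] {58.3527, 74.551}, new double[] {1, 1}, 10);
        for (double mass : masses) {
            System.out.println(mass + " g");
        }
    }
}
```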


## About the Author

*Josik Portier is Directeur de Recherche at the Institut de Chimie de la Matière Condensée de Bordeaux of the Centre National de la Recherche Scientifique.*