Molecular Clustering of Phenylurea Herbicides: Comparison with Sulphonylureas, Pesticides and Persistent Organic Pollutants

Chromatographic retention times of phenylurea herbicides are modelled by structure–property relationships. The properties are the hydration free energy and the dipole moment. Bioplastic evolution is an evolutionary perspective conjugating the effect of acquired characters and the relations that emerge among the principles of evolutionary indeterminacy, morphological determination and natural selection. Classification algorithms are proposed based on information entropy and its production. Phenylureas are classified by the presence of Cl2, O2 and N2; their different behaviour depends on the number of Cl atoms. When the procedures are applied to moderate-sized sets, an excessive number of results appear compatible with the data and suffer a combinatorial explosion; however, the equipartition conjecture provides a selection criterion among the hierarchical trees resulting from classification. Information entropy permits classifying the compounds and agrees with principal component analyses. The periodic table of phenylureas shows that those in the same group present similar properties; phenylureas that are also in the same period show maximum resemblance. The classification extends to phenylureas, sulphonylureas, pesticides and persistent organic pollutants.


INTRODUCTION
Phenylurea herbicides are used in formulations and for nonagricultural purposes. Sørensen et al. (2008) detected their residues as water contaminants. Diuron and linuron are substituted ureas, which are soluble in water, migrate in soil and enter the food chain. Both are a toxicological risk to humans and wildlife. The former, which is used in cotton growing and fruit crops, is the third most hazardous pesticide for groundwater resources. Both are applied on railways to maintain quality and provide a safer working environment, which leads to groundwater contamination because of their leaching potential (Cederlund et al., 2007). Phenylureas enter the environment via several pathways, e.g., spray drift, runoff from treated fields and leaching into groundwater; they penetrate into soil, where they are subjected to micro-organism action and degradation (Canonica & Laubscher, 2008). They are photochemically unstable, but they persist in water for several days or weeks depending on temperature and pH. Pesticide pollution of water reservoirs has become frequent (Berrada et al., 2004). Phenylurea residues were found in water sources, processed products and crops. In India, most soft-drink bottling plants used surface water from canals and rivers, which present pesticide contamination. Water treatment was insufficient for the removal of pesticides, which were above permissible limits. Evidence was provided in a 2003 Centre for Science and Environment (CSE) report, which found several pesticide residues in many soft drinks; the findings were confirmed by a Joint Parliamentary Committee. In 2006, CSE conducted tests and found pesticides. Sensitive, selective and efficient methods for herbicide analysis are designed.
Analytical methods include high-performance liquid chromatography (HPLC) with ultraviolet detection (HPLC-UV), solid-phase microextraction (SPME)-HPLC, diode-array detection, immunosorbent trace enrichment with HPLC, liquid chromatography (LC)-mass spectrometry (MS), gas chromatography (GC)-MS and capillary electrophoresis (Chicharro et al., 2005). Yoshioka & Ichihashi (2008) and Cao et al. (2009) reported on solid-phase extraction (SPE) of soft drinks. As the use of polar and degradable pesticides becomes widespread, more sensitive analytical methods are developed for residue analysis in matrices. HPLC shows advantages over GC because it allows the simultaneous analysis of thermally unstable, nonvolatile, polar and neutral species without a derivatization step. Because of the thermally unstable nature of phenylurea herbicides, direct application of GC is not possible and derivatization is needed. HPLC with UV/FL detection was therefore preferred over GC. Kaur et al. (2012) described a simple and sensitive HPLC-UV method for the analysis of phenylurea herbicides (monuron, diuron, linuron, metazachlor, metoxuron), which involved single-step preconcentration by SPE. Phenylurea herbicidal action is based on the ability to inhibit photosynthesis (cf. Blasco & Picó, 2009).

The starting point is to use information entropy for pattern recognition. Entropy is formulated based on a similarity matrix between two biochemical species. As entropy is weakly discriminating for classification, the more powerful concepts of entropy production and the equipartition conjecture are introduced. The aim of the present report is to review the properties that distinguish the phenylurea structures according to their retention times. This study applies a chemical index to phenylureas. The goal of this work is to validate the usefulness of the index via its capability to distinguish phenylureas, and its interest as a predictive index for retention time as compared with hydration free energy and dipole moment. The following section describes the computational method. The next section illustrates and discusses the calculation results. Finally, the last section summarizes our conclusions.

RESEARCH METHOD
The problem in classification studies is to define similarity indices when several criteria of comparison are involved (Iordache, 2011, 2012, 2014). The first step in quantifying similarity for phenylureas is to list the main chemical characteristics of the molecules. A vector of properties $\bar{i} = \langle i_1, i_2, \ldots, i_k, \ldots \rangle$ should be associated with every urea $i$, whose components correspond to different molecular features in a hierarchical order according to their expected importance for retention. If the $m$-th characteristic is chromatographically more significant for retention than the $k$-th, then $m < k$. Components $i_k$ are either "1" or "0" according to whether a similar characteristic of rank $k$ is present or absent in urea $i$ compared to the reference. The analysis includes three characteristics of the urea molecules: the presence of two Cl, two O or two N atoms (cf. Fig. 1). It is assumed that the chemical characteristics of a urea molecule can be ranked according to their contribution to retention in the following order of decreasing importance: Cl2 > O2 > N2. Index $i_1 = 1$ denotes Cl2, $i_2 = 1$ means O2 and $i_3 = 1$ signifies N2. In linuron, the numbers of atoms are {Cl2, O2, N2}; its vector is therefore <111>. Table 1 contains the vectors associated with the ureas. The similarity index between two ureas $\bar{i} = \langle i_1, i_2, \ldots, i_k, \ldots \rangle$ and $\bar{j} = \langle j_1, j_2, \ldots, j_k, \ldots \rangle$ is defined as:

$$r_{ij} = \sum_k t_k (a_k)^k \qquad (k = 1, 2, \ldots) \qquad (1)$$

where $0 \le a_k \le 1$, and $t_k = 1$ if $i_k = j_k$ but $t_k = 0$ if $i_k \ne j_k$. The definition assigns a weight $(a_k)^k$ to any property involved in the description of molecule $i$ or $j$.

The grouping algorithm uses the stabilized matrix of similarity, obtained by applying the max-min composition rule $\circ$ defined by:

$$(R \circ S)_{ij} = \max_k \left[ \min \left( r_{ik}, s_{kj} \right) \right] \qquad (2)$$

where $R = [r_{ij}]$ and $S = [s_{ij}]$ are matrices of equal type and $(R \circ S)_{ij}$ is the $(i,j)$-th element of matrix $R \circ S$. When the max-min composition rule is applied iteratively so that $R(n+1) = R(n) \circ R$, an integer $n$ exists such that $R(n) = R(n+1) = \ldots$ The resulting matrix $R(n)$ is called the stabilized similarity matrix. The importance of stabilization lies in the fact that it generates a partition into disjoint classes.

The stabilized matrix is designated by $R(n) = [r_{ij}(n)]$. The grouping rule follows: $i$ and $j$ are assigned to the same class if $r_{ij}(n) \ge b$. The class of $i$, noted $\hat{i}$, is the set of species $j$ that satisfies the rule $r_{ij}(n) \ge b$. The matrix of classes is:

$$\hat{R}(b) = \left[ \hat{r}_{\hat{i}\hat{j}} \right] = \max_{s,t} \left( r_{st} \right) \qquad (s \in \hat{i},\ t \in \hat{j}) \qquad (3)$$

where $s$ stands for any index of a species belonging to class $\hat{i}$ (similarly for $t$ and $\hat{j}$). Rule (3) means finding the largest similarity index between species of two different classes.

ETET Volume 1

The information entropy $h$ associated with similarity matrix $R$ results:

$$h(R) = -\sum_{i,j} r_{ij} \ln r_{ij} - \sum_{i,j} \left( 1 - r_{ij} \right) \ln \left( 1 - r_{ij} \right) \qquad (4)$$

Every hierarchical tree corresponds to a dependence of entropy on the grouping level, and a diagram $h$–$b$ is obtained. The equipartition conjecture of entropy production is proposed as a selection criterion among the different variants resulting from classification among hierarchical trees. The best dendrogram is that in which entropy production is most uniformly distributed. The equipartition line is:

$$h_{eqp} = h_{max}\, b \qquad (5)$$

Since classification is discrete, a way of expressing equipartition would be a regular staircase function. The best variant is chosen as that minimizing the sum of squared deviations:

$$SS = \sum_{b_i} \left( h - h_{eqp} \right)^2 \qquad (6)$$

Learning procedures similar to stochastic methods are implemented. Consider a given partition into classes as good from practical observations, which corresponds to the reference similarity matrix $S = [s_{ij}]$ obtained for an arbitrary number of fictitious properties. Consider also the same set of species as in the good classification, together with the actual properties. The degree of similarity $r_{ij}$ is computed [Eq. (1)], giving matrix $R$. The number of properties for $R$ and $S$ differs. The learning procedure consists in finding classification results for $R$ as close as possible to the good classification. The new similarity matrix is obtained [Eq. (1)]. The distance between the partitions characterized by $R$ and $S$ is then evaluated.
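The grouping procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the MolClas program: all weights are taken equal (a_k = 0.5), the diagonal of the similarity matrix is set to 1, and the five 0/1 vectors are hypothetical (they are not the entries of Table 1). The function names are likewise invented for illustration.

```python
def similarity(u, v, a=0.5):
    """Similarity index: add the weight a**k for every rank k where u and v agree."""
    return sum(a ** k for k, (x, y) in enumerate(zip(u, v), start=1) if x == y)


def similarity_matrix(vectors, a=0.5):
    """Pairwise similarity matrix; the diagonal is set to 1 (assumption)."""
    n = len(vectors)
    return [[1.0 if i == j else similarity(vectors[i], vectors[j], a)
             for j in range(n)] for i in range(n)]


def maxmin(r, s):
    """Max-min composition: (R o S)_ij = max over k of min(r_ik, s_kj)."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def stabilize(r):
    """Iterate R(n+1) = R(n) o R until the matrix no longer changes."""
    current = r
    while True:
        nxt = maxmin(current, r)
        if nxt == current:
            return current
        current = nxt


def classes(stable, b):
    """Partition species: i and j share a class when stable[i][j] >= b."""
    assigned, groups = set(), []
    for i in range(len(stable)):
        if i not in assigned:
            group = [j for j in range(len(stable)) if stable[i][j] >= b]
            assigned.update(group)
            groups.append(group)
    return groups


# Hypothetical vectors <i1,i2,i3>; they are NOT the entries of Table 1.
vectors = [(0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
stable = stabilize(similarity_matrix(vectors))
for level in (0.7, 0.5, 0.3):
    print(level, classes(stable, level))
```

Lowering the grouping level b merges classes step by step, which is what produces the hierarchical tree; indices in the printed classes are zero-based positions in `vectors`.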

RESULTS AND ANALYSIS
For phenylureas, HPLC retention times $R_t$ were taken from Kaur et al. (2012). Metoxuron was taken as reference $R_t^\circ$ because of its least retention (cf. Table 1); ratios $(R_t - R_t^\circ)/R_t^\circ$ were calculated. The molecular dipole moment $\mu$ was computed with Molecular Orbital Package-Austin Model 1 (MOPAC-AM1). For linuron, an alternative conformation with a 1.1 kJ·mol$^{-1}$ energy difference and a higher dipole $\mu^*$ was also used. The standard Gibbs free energy of hydration $\Delta G_{hydr}^\circ$ was calculated via program SCAP (Torrens & Castellano, 2012b). The ratio $(R_t - R_t^\circ)/R_t^\circ$ correlated with the dipole moment $\mu$. The fit turned out to be: (8) where $n$ is the number of points, $r$ the Pearson correlation coefficient, $s$ the standard deviation, $F$ the Fisher ratio, MAPE the mean absolute percentage error and AEV the approximation error variance. The alternative conformation of linuron improved the results: (9) and AEV decayed by 46%.

International Journal of Engineering and Technologies Vol. 1

Both dipoles together gave a fit for which AEV dropped by 83%. The hydration free energy $\Delta G_{hydr}^\circ$ improved the fit and AEV decreased by 91%. Additional fitting parameters were tested: partition coefficients, free energies of solvation and of water–organic-solvent transfer, molecular weight, volume, surface, globularity and rugosity, total and hydrophobic/hydrophilic solvent-accessible surfaces, molecular fractal dimension and that for external atoms, numbers of Cl atoms, total atoms and cycles, total and differential formation enthalpy, etc.; however, the results did not improve. Inclusion of dipole $\mu$ allowed a fit for which AEV decayed by 93%. Addition of dipole $\mu^*$ permitted a fit for which AEV dropped by 94%. Both dipoles improved the fit and AEV decreased by 98.6%.

Chemical indices required variables $T$, $S$ and $W$: $T$ is minus the standard formation enthalpy, $S$ the molecular surface and $W$ the molecular weight. They allowed defining $I_m = S/W$ and $I_c = T/I_m$. The molecular surface was computed with program TOPO (Torrens, 2003). The pair $\{I_m, I_c\}$ improved the fit and AEV decayed by 78%. Correction of $\{\mu^*, \Delta G_{hydr}^\circ\}$ with $I_c$ improved the fit and AEV dropped by 98.7%.
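The statistics quoted with the fits (n, r, s, F and MAPE) can be illustrated for a one-descriptor linear fit as follows. This is a minimal sketch with made-up numbers; the data are not the dipoles or retention ratios of the study. AEV is left out because the text does not state its definition, and MAPE is averaged only over nonzero observations (an assumption, since the reference compound has a ratio of zero by construction).

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares line y = intercept + slope*x with summary statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    ssr = syy - sse                                          # explained sum of squares
    nonzero = [(yi, fi) for yi, fi in zip(y, fitted) if yi != 0]
    return {
        "n": n,
        "slope": slope,
        "intercept": intercept,
        "r": sxy / sqrt(sxx * syy),       # Pearson correlation coefficient
        "s": sqrt(sse / (n - 2)),         # standard deviation of the fit
        "F": ssr / (sse / (n - 2)),       # Fisher ratio for one descriptor
        "MAPE": 100 * sum(abs((yi - fi) / yi) for yi, fi in nonzero) / len(nonzero),
    }


# Made-up descriptor values and responses, for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
stats = linear_fit(x_demo, y_demo)
print(stats)
```

A perfectly linear data set would give a zero residual sum of squares, so that F is undefined; the sketch does not guard against that case.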
A simplified analysis was obtained by hierarchical QSPR (HQSPR): the variables were split into logical blocks (electronic $\{\mu, \mu^*\}$, solvation $\{\Delta G_{hydr}^\circ\}$, plastic $\{I_m, I_c\}$); every block was represented by its most important variable: $\mu^*$, $\Delta G_{hydr}^\circ$ or $I_c$.
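The two indices of the plastic block follow directly from their definitions, I_m = S/W and I_c = T/I_m. A minimal sketch, with hypothetical function names and made-up input values in arbitrary units:

```python
def morphological_index(surface, weight):
    """I_m = S / W: molecular surface divided by molecular weight."""
    return surface / weight


def coordination_index(minus_formation_enthalpy, surface, weight):
    """I_c = T / I_m, where T is minus the standard formation enthalpy."""
    return minus_formation_enthalpy / morphological_index(surface, weight)


# Made-up values, for illustration only.
print(morphological_index(200.0, 100.0))
print(coordination_index(50.0, 200.0, 100.0))
```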
The Pearson correlation coefficient matrix $R$ was calculated between pairs of vector properties $\langle i_1, i_2, i_3 \rangle$ of the five ureas. Intercorrelations are illustrated in the partial correlation diagram (PCD), which contains high ($r \ge 0.75$), medium ($0.50 \le r < 0.75$), low ($0.25 \le r < 0.50$) and zero ($r < 0.25$) partial correlations. Pairs of ureas with higher partial correlations show similar vector properties. However, the results should be taken with care because Entry 4 with constant vector <000> and Entry 5 with constant vector <111> show null standard deviation, causing the greatest partial correlations $r = 1$ with any urea, which is an artefact. With the equipartition conjecture, the upper triangle of matrix $R$ was obtained. Some correlations are high, e.g., $R_{2,4} = 0.750$. They are illustrated in the partial correlation diagram, which contains one high (cf. Fig. 2, red lines), three medium (orange), three low (yellow) and three zero (black) partial correlations. Three out of four high partial correlations of Entry 4 are corrected: its correlation with Entry 1 is medium, its correlation with Entry 3 is low and its correlation with Entry 5 is a zero partial correlation. All four high partial correlations of Entry 5 are corrected: the correlation with Entry 3 is medium, the correlation with Entry 1 is low, and the correlations with Entries 2 and 4 are zero partial correlations. In all, 6 out of 7 high partial correlations of Entries 4 and 5 are corrected.
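The artefact mentioned for constant vectors is easy to see in code: the Pearson coefficient divides by the standard deviations, which vanish for <000> and <111>. The sketch below returns `None` in that case instead of reporting r = 1 (a design choice of this sketch, not of the original programs) and bins a coefficient into the four bands of the partial correlation diagram.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((x - mu) ** 2 for x in u)
    svv = sum((y - mv) ** 2 for y in v)
    if suu == 0 or svv == 0:
        return None  # null standard deviation: the coefficient is undefined
    suv = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    return suv / sqrt(suu * svv)


def band(r):
    """Band of the partial correlation diagram for a coefficient r."""
    if r >= 0.75:
        return "high"
    if r >= 0.50:
        return "medium"
    if r >= 0.25:
        return "low"
    return "zero"


print(pearson((0, 0, 0), (1, 0, 1)))  # constant vector <000>
print(band(0.750), band(0.594))
```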

Three groupings are obtained with associated entropy $h$–$R$–$b_1$ = 4.70 matching $\langle i_1, i_2, i_3 \rangle$ and $C$–$b_1$; the binary taxonomy of Table 1 separates clusters 1, 2 and 3 with two, two and one ureas, respectively. Ureas 3 and 5, with two Cl atoms and great retention, are grouped into the same class. Compounds belonging to the same grouping appear highly correlated in the partial correlation diagram (Fig. 2); however, the $C$–$b_1$ results should be taken with care because class (4) with only one substance could be an outlier. At level $b_2$ with $0.13 \le b_2 \le 0.50$ the set of groupings is: $C$–$b_2$ = (1,2,4)(3,5). Two classes result and the entropy decays to $h$–$R$–$b_2$ = 2.08 matching $\langle i_1, i_2, i_3 \rangle$ and $C$–$b_2$, dividing clusters 1 and 2 with three and two ureas. Again, ureas with Cl2 and great retention are grouped into the same class (3,5). Ureas belonging to the same cluster appear highly correlated in the partial correlation diagram (Fig. 2). Table 2 shows a comparative analysis of the set containing 1-5 classes, in agreement with the partial correlation diagram (Fig. 2).
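How an entropy value per grouping level can be turned into a selection criterion is sketched below. Two assumptions are made and should be read as such: the entropy is taken in the Shannon-type form h(R) = −Σ r ln r − Σ (1 − r) ln(1 − r), and the equipartition line is taken as h_eqp = h_max·b with h_max the largest entropy of the tree. The two candidate trees are invented for illustration and do not correspond to Table 2.

```python
from math import log


def entropy_term(r):
    """-r ln r - (1 - r) ln(1 - r), with the convention 0 ln 0 = 0."""
    return -sum(p * log(p) for p in (r, 1.0 - r) if p > 0.0)


def entropy(matrix):
    """Entropy of a similarity matrix, summed over all of its elements."""
    return sum(entropy_term(r) for row in matrix for r in row)


def deviation_ss(levels):
    """Sum of squared deviations from the line h_eqp = h_max * b.

    levels is a list of (b, h) pairs, one per grouping level of a tree.
    """
    h_max = max(h for _, h in levels)
    return sum((h - h_max * b) ** 2 for b, h in levels)


def best_variant(variants):
    """Name of the tree whose entropy is most uniformly distributed."""
    return min(variants, key=lambda name: deviation_ss(variants[name]))


# Invented (b, h) staircases for two candidate trees.
variants = {
    "tree_a": [(0.25, 1.0), (0.5, 2.0), (1.0, 4.0)],
    "tree_b": [(0.25, 3.0), (0.5, 3.5), (1.0, 4.0)],
}
print(best_variant(variants))
```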

Table 2. Classification level, number of classes and entropy for phenylureas vector property
From the previous partial correlation diagram (Fig. 2) and the set of five classifications (Table 2), we suggest splitting the data into three groupings: (1)(2,4)(3,5). The dendrogram of ureas (cf. Fig. 3) shows different behaviour depending on the number of Cl atoms. Once more, ureas with Cl2 and great retention are grouped into class (3,5). The illustration of the classification above in a radial tree (cf. Fig. 4) shows the different behaviour of ureas depending on the number of Cl atoms. The same classes above are recognized, in qualitative agreement with the partial correlation diagram and dendrogram (Figs. 2 and 3). One more time, ureas with Cl2 and great retention are grouped into class (3,5).

Figure 4. Radial tree of phenylurea herbicides
Program SplitsTree allows examining cluster analysis (CA) data. Based on split decomposition, it takes as input a distance matrix and produces as output a graph, which represents the relations between taxa. For ideal data, the graph is a tree, whereas less ideal data will give rise to a tree-like net, which could be interpreted as possible evidence for conflicting data. As split decomposition does not attempt to force data onto a tree, it can provide a good indication of how tree-like the given data are. The splits graph for the five ureas (cf. Fig. 5) reveals a conflicting relation between class 3 and groupings 1 and 2 because of interdependences. It indicates a spurious relation from base-composition effects. It shows different urea behaviour depending on the number of Cl atoms, in qualitative agreement with the partial correlation diagram and binary/radial trees (Figs. 2-4).

Figure 5. Splits graph of phenylurea herbicides
A principal components analysis (PCA) was performed for the urea vector properties (Shaw, 2003). The importance of PCA factors $F_1$–$F_3$ for $\{i_1, i_2, i_3\}$ is collected in Table 3. Factor $F_1$ explains 56% of the variance (44% error); $F_{1-2}$, 83% of the variance (17% error); $F_{1-3}$, 100% of the variance (0% error).
Table 3. Importance of principal component analysis factors for phenylurea vector property
The PCA factor loadings are shown in Table 4. The PCA $F_1$–$F_3$ profile for the vector property is listed in Table 5. For $F_1$ and $F_3$, variable $i_3$ shows the greatest weight in the profile; however, $F_1$ and $F_3$ cannot be reduced to the two variables $\{i_1, i_3\}$ without 29% and 21% errors, respectively. For $F_2$, variables $\{i_1, i_2\}$ present 100% weight and $F_2$ is reduced to both variables with 0% error. Factors $F_{1-3}$ can be considered as linear combinations of $\{i_1, i_3\}$, $\{i_1, i_2\}$ and $\{i_1, i_3\}$ with 29%, 0% and 21% errors.
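The cumulative percentages quoted for the factors are running sums of the individual variance shares, and the "error" is the remainder up to 100%. A minimal sketch follows; the input values 56, 27 and 17 are chosen only so that the running sums match the percentages in the text, and they are not the eigenvalues of Table 3.

```python
def cumulative_variance(eigenvalues):
    """Cumulative explained variance and remaining error (both in %) per factor."""
    total = sum(eigenvalues)
    running, table = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        explained = 100.0 * running / total
        table.append((explained, 100.0 - explained))
    return table


# Hypothetical variance shares, for illustration only.
for explained, error in cumulative_variance([56.0, 27.0, 17.0]):
    print(round(explained), round(error))
```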

The PCA $F_2$–$F_1$ scores plot for ureas (cf. Fig. 6) shows different behaviour depending on the number of Cl atoms. It distinguishes three groupings: class 1 with one urea ($F_1 \gg F_2$, bottom), grouping 2 with two ureas ($F_1 < F_2 = 0$, left) and class 3 with two ureas ($F_1 \approx F_2 > 0$, top). From the PCA factor loadings of ureas, the $F_2$–$F_1$ loadings plot (cf. Fig. 7) depicts the three properties. In addition, as a complement to the scores plot (Fig. 6), the loadings (Fig. 7) confirm that the ureas in class 3, located on the top, present a contribution of Cl2, situated on the same position (Fig. 6). The urea in grouping 1, on the bottom, shows a more pronounced contribution of O2. Two clusters of properties are clearly distinguished in the loadings plot: class 1 {Cl2} ($0 < F_1 < F_2$, Fig. 7, top) and grouping 2 {O2, N2} ($F_1 \gg F_2$, bottom). Some correlations are relatively high, e.g., $R_{2,3} = 0.594$. All are represented in the partial correlation diagram, which contains one medium (cf. Fig. 8, orange), one low (yellow) and one zero (black) partial correlation. The dendrogram for vector properties (cf. Fig. 9) separates first Cl2 (class 1) and then O2 from N2 (grouping 2), in agreement with the PCA loadings plot and partial correlation diagram (Figs. 7 and 8).

Figure 9. Dendrogram for the vector properties corresponding to phenylurea herbicides
The radial tree for the vector properties (cf. Fig. 10) separates the same two classes as the PCA loadings plot, partial correlation diagram and dendrogram (Figs. 7-9).

Figure 10. Radial tree for the vector properties corresponding to phenylurea herbicides
The splits graph for properties (cf. Fig. 11) reveals no conflicting relation between vector components. It is in agreement with the PCA loadings plot, partial correlation diagram and binary/radial trees (Figs. 7-10).

The variation of property $P$ (cf. Fig. 13) of vector $\langle i_1, i_2, i_3 \rangle$, expressed in the decimal system as $P = 10^2 i_1 + 10 i_2 + i_3$, is plotted vs. structural parameters $\{i_1, i_2, i_3\}$ for the ureas. The property was not used in the development of the periodic table (PT, Table 6) and serves to validate it. Most points appear superimposed. The results show a hierarchy of parameters, $i_1 > i_2 > i_3$, in agreement with the PT of properties, with vertical groups defined by $\{i_1, i_2\}$ and horizontal periods by $\{i_3\}$.
Figure 13. Variation of property P(p) of phenylurea herbicides vs. counts {i1,i2,i3}
The change of property $P$ of vector $\langle i_1, i_2, i_3 \rangle$ in base 10 (cf. Fig. 14) vs. the number of the group in the PT, for the ureas, reveals minima corresponding to compounds with $\langle i_1, i_2 \rangle$ ca. <00> (group g00) and maxima with $\langle i_1, i_2 \rangle$ ca. <11> (group g11). For group 1, period 2 is superimposed on period 1. Periods p0 and p1 represent rows 1 and 2, respectively, in Table 6. The corresponding function $P(i_1, i_2, i_3)$ denotes two periodic waves clearly limited by minima and maxima, which suggest a periodic behaviour that recalls the form of a trigonometric function. For $\langle i_1, i_2, i_3 \rangle$, a minimum is clearly shown. The distance in $\langle i_1, i_2, i_3 \rangle$ units between each pair of consecutive minima is four, which coincides with the urea sets in successive periods. The minima occupy analogous positions in the curve and are in phase. The representative points in phase should correspond to elements in the same group in the PT. For the minima, $\langle i_1, i_2, i_3 \rangle$ coherence exists between the two representations; however, the consistency is not general. Comparison of the waves shows two differences: period 1 is incomplete and period 2 is somewhat staircase-like. The most characteristic points of the plot are the minima, which lie about group g00. Values of $\langle i_1, i_2, i_3 \rangle$ are repeated, as the periodic law (PL) states.
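The decimal encoding of the property vector and its reading as a position in the periodic table can be sketched as follows. The group and period labels follow the g00/g11 and p0/p1 naming used in the text; the function names are hypothetical.

```python
def property_p(vector):
    """P = 10^2*i1 + 10*i2 + i3 for a 0/1 vector <i1,i2,i3>."""
    i1, i2, i3 = vector
    return 100 * i1 + 10 * i2 + i3


def group_and_period(vector):
    """Group label from {i1,i2} and period label from {i3}."""
    i1, i2, i3 = vector
    return f"g{i1}{i2}", f"p{i3}"


print(property_p((1, 1, 1)), group_and_period((1, 1, 1)))
print(property_p((0, 0, 0)), group_and_period((0, 0, 0)))
```

Because $i_1$ carries the largest factor, ordering compounds by $P$ reproduces the hierarchy $i_1 > i_2 > i_3$.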
Order relations (10) should repeat at intervals equal to the period size and are equivalent to: (11) As relations (11) are valid only for minima, other, more general relations are desired for all values of $p$. The differences $D(p)$ are calculated, assigning every value to phenylurea $p$:

$$D(p) = P(p+1) - P(p) \qquad (12)$$

Instead of $D(p)$, the ratios $R(p) = P(p+1)/P(p)$ can be taken, allocating them to urea $p$. If PL were general, elements in the same group in analogous positions in different periods would satisfy:

$$\text{either } D(p) > 0 \text{ or } D(p) < 0 \qquad (13)$$

and

$$\text{either } R(p) > 1 \text{ or } R(p) < 1 \qquad (14)$$

However, the results show that this is not the case, so that PL is not general and some anomalies exist; e.g., the variation of $D(p)$ vs. group number (cf. Fig. 15) presents a lack of coherence between the $\langle i_1, i_2, i_3 \rangle$ Cartesian and PT representations. If consistency were rigorous, all points in each period would present the same sign. In general, a trend exists to give $D(p) > 0$, especially for the lower groups. The change of $R(p)$ vs. group number (cf. Fig. 16) confirms the lack of constancy between the Cartesian and PT charts. If steadiness were exact, all points in each period would show $R(p)$ either less than or greater than one. A trend exists to give $R(p) > 1$, especially for the lower groups.
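The differences D(p) = P(p+1) − P(p) and ratios R(p) = P(p+1)/P(p) can be computed from an ordered list of P values as sketched below. The list used here is simply every possible value of P in increasing order, for illustration only; it is not the sequence of the five ureas. The ratio is returned as `None` when P(p) = 0, a choice of this sketch, since the ratio is undefined there.

```python
def differences(p_values):
    """D(p) = P(p+1) - P(p) for consecutive entries."""
    return [nxt - cur for cur, nxt in zip(p_values, p_values[1:])]


def ratios(p_values):
    """R(p) = P(p+1) / P(p); None where P(p) is zero."""
    return [nxt / cur if cur != 0 else None
            for cur, nxt in zip(p_values, p_values[1:])]


def same_sign(values):
    """True if all differences are positive or all are negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


# Illustrative sequence: all values of P for 0/1 vectors, in increasing order.
p_demo = [0, 1, 10, 11, 100, 101, 110, 111]
print(differences(p_demo))
print(ratios(p_demo))
print(same_sign(differences(p_demo)))
```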

CONCLUSION
From the present results and discussion the following conclusions can be drawn.
1. The objective was to develop a structure-property model for the qualitative and quantitative prediction of phenylurea retention. The results contribute to pesticide residue prediction in food and environmental samples. Program SCAP allows calculating hydration and solvation free energies. Hydration free energy and dipole differentiated the ureas. An alternative conformation above the global minimum was used with a higher dipole. The parameters needed for the co-ordination index are the molar formation enthalpy, weight and surface area. Morphological and co-ordination indices improved the correlations for retention. The co-ordination index was predictive for retention when used with others. The correlation between molecular area and weight points to a homogeneous molecular structure of the ureas, and to the ability to predict and tailor herbicide properties; the latter is nontrivial in environmental toxicology. A hierarchical quantitative structure-property relation provided simplified analyses of the properties.
2. The criteria selected to reduce the analysis to a manageable quantity of phenylureas referred to structural characteristics related to the presence of two O, N and, especially, Cl atoms. The classification agreed with principal component analyses. Program MolClas is a simple, reliable, efficient and fast procedure for molecular categorization based on the equipartition conjecture of entropy production. It was written not only to analyze the equipartition conjecture of entropy production but also to explore the world of molecular clustering.
3. The periodic law does not satisfy physics-law status: (a) phenylurea retentions are not repeated; perhaps their chemical character is; (b) order relations are repeated with exceptions. The analysis forces the statement: the relations that any compound p has with its neighbour p + 1 are approximately repeated for each period. Periodicity is not general; however, if a natural order of substances is accepted, the law must be phenomenological. Retention is not used in the generation of the periodic table and serves to validate it. The analysis of other properties would give insight into the possible generality of the periodic table. The periodic classification was extended to phenylureas, sulphonylureas, pesticides and persistent organic pollutants.
Work is in progress on the quantitative structure-antioxidant activity models of isoflavonoids from Dalbergia parviflora.