Appl Biochem Biotechnol DOI 10.1007/s12010-013-0710-4

Phylogenetic Analysis of Putative Genes Involved in the Tryptophan-Dependent Pathway of Auxin Biosynthesis in Rice

Yousef M. Abu-Zaitoon

Received: 24 October 2013 / Accepted: 25 December 2013 © Springer Science+Business Media New York 2014

Abstract Plant proteome databases were mined for flavin monooxygenase (YUCCA), tryptophan decarboxylase (TDC), nitrilase (NIT), and aldehyde oxidase (AO) enzymes that could be involved in the tryptophan-dependent pathway of auxin biosynthesis. Phylogenetic trees were constructed for the enzyme sequences obtained. The YUCCA and TDC trees showed that these enzymes are conserved across the plant kingdom and could therefore be involved in auxin synthesis. YUCCAs branched into two clades. Most experimentally studied YUCCAs were found in the first clade. The second clade, which has representatives only from seed plants, contained Arabidopsis sequences linked to embryonic development; sequences in this clade are therefore suggested to have evolved with seed development. Examination of TDC activity and expression had previously linked this enzyme to the synthesis of secondary products. However, the phylogenetic finding of a TDC clade conserved across land plants suggests an essential role in plant growth. Phylogenetic analysis of AOs showed that plants inherited one AO. Recent gene duplication is suggested because AO sequences from each species were more similar to each other than to AOs from other species. Taken together with the experimental support for the involvement of AO in abscisic acid synthesis, these findings led to AO being excluded from a role in IAA production. The phylogenetic tree for NIT showed that the first clade contained sequences from species across the plant kingdom, whereas the second clade contained sequences only from Brassicaceae. Even though NIT4 orthologues were conserved in the second clade, their major role seems to be detoxification of hydrogen cyanide rather than production of IAA.

Keywords Indole-3-acetic acid · Phylogenetic analysis · Flavin monooxygenase · Tryptophan decarboxylase · Nitrilase · Aldehyde oxidase

Y. M. Abu-Zaitoon (*) Department of Biology, Faculty of Science, Islamic University in Madinah, Madinah, Saudi Arabia e-mail: [email protected]

Introduction Several tryptophan-dependent pathways, including the indole-3-pyruvic acid (IPA) pathway, the indole-3-acetamide (IAM) pathway, the indole-3-acetaldoxime (IAOx) pathway, and the tryptamine (TAM) pathway (Fig. 1), have been proposed for the production of IAA in plants. Although the role of YUCCA in auxin biosynthesis has been well established, the involvement of other enzymes is unclear. In addition, the involvement of tryptophan decarboxylase and nitrilase in the synthesis and metabolism of secondary products further complicates the situation.

[Fig. 1 pathway diagram not reproduced. Labels recovered from the image: compounds Trp, IAOx, IPA, TAM, HTAM, IAN, IAAld, IAM, IAA; enzymes and genes CYP79B2/3, TAA1/TAR1/2, TDC, YUCCA, CYP71A13, NIT1/2/3, AO, AMI; mutants cyp79b2/3, wei8/sav3/tir2, pad4, nit1.]

Fig. 1 Proposed routes from tryptophan to IAA in plants. Bold lines refer to the first completely dissected pathway of IAA synthesis in plants. Dashed lines indicate that neither a gene nor an enzyme activity has been identified in any plant. Trp tryptophan, IAOx indole-3-acetaldoxime, IAM indole-3-acetamide, IAA indole-3-acetic acid, IPA indole-3-pyruvic acid, TAM tryptamine, HTAM N-hydroxyl tryptamine, IAAld indole-3-acetaldehyde, IAN indole-3-acetonitrile, TAA tryptophan aminotransferase, AMI amidase, NIT nitrilase, AO aldehyde oxidase, YUCCA a flavin monooxygenase. Mutant phenotypes are described in [1] (wei8), [2] (sav3), [3] (tir2), [16] (cyp79b2/3), [4] (pad4), and [5] (nit1). Gene abbreviations are given in upper case italics. Mutant alleles are given in lower case italics

In this study, bioinformatics was used to investigate the phylogenetic relationships of four enzymes proposed to be involved in the tryptophan-dependent pathways. YUCCA, TDC, NIT, and AO proteins from Arabidopsis thaliana, Zea mays, and Catharanthus roseus were used as query sequences. Homologues of the targeted proteins were identified from diverse taxonomic groups: mosses, non-seed vascular plants, monocots, and dicots. Physcomitrella patens, Selaginella moellendorffii, Oryza sativa, Z. mays, Sorghum bicolor, Populus trichocarpa, and A. thaliana, whose genomes are fully sequenced, were used as representatives of these groups. The phylogenetic trees constructed for YUCCA, TDC, AO, and NIT enzymes were then examined to identify any clades that are conserved across diverse plant species.

Because auxin is implicated in most aspects of plant growth and development, enzymes involved in IAA synthesis were expected to be conserved across the plant kingdom. On this hypothesis, enzymes with a role in IAA synthesis were expected to form conserved clades containing sequences from all studied species. For example, in addition to a suggested role in IAA synthesis, AO has been shown to be involved in the biosynthesis of abscisic acid. If AO is involved in the biosynthesis of both hormones, two clades of AOs were expected, one conserved for abscisic acid synthesis and one for IAA synthesis. Furthermore, TDC and nitrilase have been proposed to play a role in the metabolism of secondary products such as terpenoid indole alkaloids and indole glucosinolates [6, 7]. These products differ from one taxonomic group to another, so, unlike enzymes of IAA synthesis, enzymes involved in secondary metabolism were expected to be restricted to particular plant clades.

Amino acid rather than nucleotide sequences were aligned in this study because spurious alignments arise more easily with DNA, which consists of only four nucleotides, than with proteins, which are built from 20 different amino acids. In addition, DNA sequences diverge faster than the proteins they encode, because about 24 % of single-base changes produce the same amino acid. ClustalW (accurate) was used to align all sequences because it is among the most widely used multiple-sequence alignment programs. The program first computes all possible pairwise alignments and uses them to generate a rough guide tree. The pairs with the highest similarity scores are aligned first, and the remaining sequences are then aligned to them [8].

Distance-matrix [9], maximum parsimony [10], and maximum likelihood [11] are the three major methods used to construct phylogenetic trees. The distance-matrix method was used here. In this method, the multiple-sequence alignment is converted into a matrix of genetic distances between all pairs of sequences, with mismatches between aligned sequences determining the distance. Protdist calculates the distance matrix from protein sequences under one of three models of amino acid substitution: the Dayhoff PAM matrix [12], Kimura's distance [13], and the Categories distance [14]. In this study the default Dayhoff PAM matrix was used because it is based on observed rates of amino acid changes [15].
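The figure of about 24 % quoted above for single-base changes that leave the amino acid unchanged can be checked against the standard genetic code. The sketch below is an illustration only: it assumes the standard code (NCBI translation table 1), counts every single-base substitution starting from the 61 sense codons, and weights all substitutions equally. The function name `count_single_base_changes` is hypothetical and not part of any program used in the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_single_base_changes():
    """Count single-base substitutions from sense codons and how many are synonymous."""
    total = synonymous = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # stop codons are not used as starting points
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous += 1
    return synonymous, total


synonymous, total = count_single_base_changes()
print(f"{synonymous} of {total} single-base changes are synonymous "
      f"({100 * synonymous / total:.1f} %)")
```

Under these assumptions the count is 134 synonymous changes out of 549, that is 24.4 %, in line with the value given in the text.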
The involvement of YUCCA in IAA synthesis is backed by strong experimental evidence. Since the first report of YUCCA as the enzyme catalyzing the rate-limiting step of the tryptamine pathway in Arabidopsis [16], YUCCA orthologues have been identified in a range of plants including tomato, maize, Petunia, and rice [17–20]. Phylogenetic analysis showed that YUCCA is conserved across diverse plant groups. Similar relationships were expected in the phylogenetic trees of other enzymes involved in auxin production. The YUCCA phylogenetic tree was also used to identify rice orthologues of experimentally tested genes from other plants. The amino acid sequence of Arabidopsis YUCCA1 (AtYUCCA1) [16] was used in a BlastP search [21] for YUCCA orthologues.

Tryptophan decarboxylase has been studied extensively in plants including C. roseus, Camptotheca acuminata, and Ophiorrhiza pumila, in which it has been shown to catalyze the first committed step in the production of indole alkaloids [22–24]. Tryptophan decarboxylase may also catalyze the first step in auxin biosynthesis by converting tryptophan to tryptamine. The Catharanthus TDC has been shown experimentally to have tryptophan decarboxylase activity [24]. Putative TDC proteins from various plants were identified by BlastP using Catharanthus TDC (emb|CAA47898|) as the query. It should be noted that no Arabidopsis TDC has a characterized enzyme activity.

Indole-3-acetaldoxime has been proposed to be converted to IAA via either indole-3-acetaldehyde (IAAld) or indole-3-acetonitrile (IAN). IAOx may be converted to IAAld, which may then be converted to IAA by aldehyde oxidase; no enzyme has yet been found to catalyze the former reaction. In Arabidopsis, CYP71A13 has been shown to convert IAOx to IAN [4], which may be converted to IAA through the action of nitrilase. If aldehyde oxidase or nitrilase is involved in IAA synthesis, it would be expected to be conserved across diverse plant species.

Aldehyde oxidase has been proposed to be involved in the production of two plant hormones. Arabidopsis aldehyde oxidase delta (AOδ), the product of the AAO3 gene, efficiently oxidizes abscisic aldehyde to abscisic acid in Arabidopsis leaves [25]. Among five aldehydes tested, AOδ expressed in Pichia pastoris exhibited the lowest Km value for abscisic aldehyde, indicating an essential role in abscisic acid production [25]. AO has also been implicated in IAA synthesis, as cell-free preparations from oat, cucumber, and maize have been shown to oxidize IAAld to IAA [26–28]. Arabidopsis aldehyde oxidase 3 (At2g27150) was used as the query sequence to identify all AO orthologues in rice.

Nitrilase 2 of Z. mays (ZmNIT2; gb|AAO11742.1|) has been found to convert indole-3-acetonitrile to indole-3-acetic acid, with an activity reported to be seven to 20 times greater than that of Arabidopsis nitrilases 1, 2, and 3. Moreover, ZmNIT2 is expressed in maize kernels, where elevated levels of IAA have been detected [29]. The sequence of ZmNIT2 was therefore used as a query to identify all orthologues in rice and other plants.

The aim of this article was to find conserved clades, if any, for enzymes proposed to be involved in the tryptophan-dependent pathways of IAA production. To achieve this aim, phylogenetic trees for YUCCA, TDC, AO, and NIT sequences from rice and from plants of diverse taxonomic groups were analyzed. A potential clade of seed-specific YUCCA genes is discussed.

Methods Homologues of YUCCA, TDC, NIT, and AO were identified by BlastP searches [21] against the proteomes of rice (TIGR rice genome database, version 5.0), Arabidopsis (TIGR Arabidopsis genome database, version 5.0), poplar (JGI P. trichocarpa version 1.1), Sorghum (S. bicolor v1.0), the moss Physcomitrella (P. patens subsp. patens version 1.1), and the non-seed vascular plant Selaginella (S. moellendorffii v1.0). The GenBank database (non-redundant protein sequences) was screened for sequences from other plants.

BioManager, provided by the Australian National Genomic Information Service [30], was used to construct both bootstrapped and non-bootstrapped trees for YUCCA, TDC, nitrilase, and aldehyde oxidase protein sequences. As a first step, ClustalW (accurate) was used to carry out multiple-sequence alignments [8]. A matrix of genetic distances between all pairs of sequences was then computed with Protdist [15], and the Protdist output was analyzed with Fitch to construct the phylogenetic trees [15].

Bootstrapping is a resampling method used to measure the reliability of phylogenetic results; it was first introduced for phylogenetic analysis by Felsenstein [11]. In this method, columns of the original multiple-sequence alignment are randomly selected with replacement, giving a new data matrix with the same number of columns as the original. In BioManager this procedure can be repeated from 100 to 1,000 times. A tree is then built from each resampled alignment, and the confidence value for each node is obtained by counting how often the grouping of sequences below that node in the original tree recurs in the replicate trees. The bootstrap confidence value is thus a measure of the statistical support for a given clade: the higher the value, the more likely the sequences are to share a clade.
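The column resampling described above can be sketched in a few lines. This is a minimal illustration of the idea, not the Seqboot implementation: the function name `bootstrap_alignment`, the toy sequences, and the fixed random seed are assumptions made for the example, and only the figure of 500 replicates is taken from the study.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are applied to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy aligned sequences (hypothetical, for illustration only).
alignment = {
    "seqA": "GCGNSGMEVS",
    "seqB": "GAGNSGMEIA",
    "seqC": "GCGNSGFEIA",
}

rng = random.Random(0)  # fixed seed so that the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(500)]
print(replicates[0])
```

Each replicate has the same number of columns as the original alignment, and a tree would then be built from every replicate before the node support is counted.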
Even though a 75 % cutoff value can be used [30], a value of 95 %, as applied in other statistical analyses, was used in this study to identify nodes with stronger support [15]. Bootstrap values were calculated to test the reliability of the constructed trees using the following programs: ClustalW (Accurate), Seqboot with 500 replicates [15], Protdist, Fitch, and Consense. Rooted trees were constructed using homologous sequences from species other than plants, or paralogous plant proteins, as outgroups. All phylogenetic trees were displayed using TreeView [31]. In the resulting trees, branches in bold indicate bootstrap values of less than 95 %.

Arabidopsis and rice proteins were identified by TIGR locus numbers, whereas protein IDs were used to identify Selaginella and Physcomitrella proteins. Proteins from Sorghum were referred to by the locus number if available; otherwise the ID numbers were used. Abbreviated names, as explained in the figure legends, were used to identify Populus proteins. GenBank accession numbers were used to identify proteins from other plant species.
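The way a bootstrap value is read against the 95 % cutoff can be illustrated as follows. The sketch assumes that each replicate tree has already been reduced to the set of clades it contains, each clade being the set of tip names below one node. The function `clade_support`, the tip names, and the counts are hypothetical, and this is not how Consense is implemented internally.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Percentage of replicate trees in which each clade (set of tips) appears."""
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    n = len(replicate_trees)
    return {clade: 100 * k / n for clade, k in counts.items()}


# Each replicate tree is represented only by its clades (hypothetical data).
ab = frozenset({"A", "B"})
abc = frozenset({"A", "B", "C"})
bc = frozenset({"B", "C"})
replicate_trees = [{ab, abc}] * 97 + [{bc, abc}] * 3

CUTOFF = 95  # per cent, the threshold used in this study
for clade, pct in clade_support(replicate_trees).items():
    label = "supported" if pct >= CUTOFF else "below cutoff"
    print(sorted(clade), f"{pct:.0f} %", label)
```

With these toy data the clade {A, B} recurs in 97 % of replicates and would be drawn as supported, whereas {B, C} recurs in only 3 % and would be drawn in bold.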

Results YUCCA A BlastP search of the TIGR rice proteome database identified 14 putative proteins as orthologues of Arabidopsis YUCCA1. These sequences have been described and numbered previously [18, 20], and the same numbering system is used here. The BlastP search returned orthologues of AtYUCCA from all studied plants with fully sequenced genomes as well as from other plants; the species represented were A. thaliana, Medicago truncatula, Solanum lycopersicum, Petunia x hybrida, Z. mays, S. bicolor, P. patens, S. moellendorffii, and P. trichocarpa.

The phylogenetic tree for YUCCA (Fig. 2) showed that all sequences fall into two separate branches supported at 100 % by bootstrap analysis. Similarity at the amino acid level between OsYUCCAs from the two branches was relatively low; for instance, OsYUCCA1 from the first branch shared only 44 % similarity with OsYUCCA11 from the second. Angiosperms with fully sequenced genomes (rice, Populus, and Arabidopsis) had YUCCA sequences in both major branches, whereas sequences from the moss and the non-seed vascular plant were found only in the first branch. Most angiosperms had more YUCCA orthologues in the first branch; Arabidopsis, for example, had nine proteins in the first branch but only two in the second. Cereals, and rice in particular, were unusual: six of the 14 rice sequences were found in the second branch.

The first major branch, where most sequences were found, was divided into five minor groups. Not all of these are supported by bootstrap analysis, indicating that some of the structure is ambiguous. The first minor group contained sequences only from primitive species, namely the moss Physcomitrella and the non-seed plant Selaginella. The second minor group contained sequences only from Populus and the cereals rice and Sorghum. The third minor group, which was supported at 100 % by bootstrap analysis, was the largest in this branch. It contained YUCCA proteins from both monocots and dicots, including Sorghum, rice, Populus, and Arabidopsis; five of the 11 AtYUCCA sequences were found in this group. The fourth minor group contained AtYUCCA2 and AtYUCCA6 together with sequences from rice, Populus, Medicago, and Sorghum. The last minor group contained AtYUCCA1, AtYUCCA4, OsYUCCA1, and OsYUCCA8 as well as YUCCAs from Sorghum, Populus, tomato, Petunia, and Medicago.

In the first major branch, OsYUCCAs were distributed evenly, with two sequences in each minor group that contained rice sequences. Amino acid similarity between the rice sequences within each minor group was relatively low: OsYUCCA4 and OsYUCCA5; OsYUCCA6 and OsYUCCA7; OsYUCCA2 and OsYUCCA3; and OsYUCCA1 and OsYUCCA8 were 52, 68, 52, and 54 % similar, respectively. OsYUCCA1 was most similar to AtYUCCA1 and AtYUCCA4, while OsYUCCA2 and OsYUCCA3 were most similar to AtYUCCA2 and AtYUCCA6.

[Fig. 2 tree image not reproduced. Tip labels in the order extracted from the figure: 108750 P. patens; 75206 S. moellendorffii; 80431 S. moellendorffii; 64527 S. moellendorffii; 66487 P. patens; 121794 P. patens; 55917 P. patens; VII.3430 P. trichocarpa; 129405 S. bicolor; OsYUCCA5 Os12g32750 O. sativa; 5052389 S. bicolor; OsYUCCA4 Os01g12490 O. sativa; 4873899 S. bicolor; OsYUCCA6 Os07g25540 O. sativa; 4832201 S. bicolor; OsYUCCA7 Os04g03980 O. sativa; LGII002544 P. trichocarpa; YUCCA8 At4g28720 A. thaliana; YUCCA9 At1g04180 A. thaliana; YUCCA5 At5g43890 A. thaliana; 00100414 P. trichocarpa; YUCCA3 At1g04610 A. thaliana; YUCCA7 At2g33230 A. thaliana; OsYUCCA2 Os05g45240 O. sativa; 4753204 S. bicolor; OsYUCCA3 Os01g53200 O. sativa; YUCCA2 At4g13260 A. thaliana; YUCCA6 At5g25620 A. thaliana; VI.663.1 P. trichocarpa; ABE83417 M. truncatula; 124093 S. bicolor; OsYUCCA8 Os03g06654 O. sativa; 4804814 S. bicolor; OsYUCCA1 Os01g45760 O. sativa; YUCCA1 At4g32540 A. thaliana; XVIII.1433 P. trichocarpa; CAJ46041 S. lycopersicum; AAK74069 P. x hybrida; ABE92660 M. truncatula; YUCCA4 At5g11320 A. thaliana; 57000067 P. trichocarpa; YUCCA11 At1g21430 A. thaliana; YUCCA10 At1g48910 A. thaliana; OsYUCCA11 Os12g08780 O. sativa; ABL61263 Z. mays; OsYUCCA12 Os02g17230 O. sativa; OsYUCCA14 Os11g10170 O. sativa; OsYUCCA13 Os11g10140 O. sativa; LGXIV000597 P. trichocarpa; OsYUCCA10 Os01g16750 O. sativa; OsYUCCA9 Os01g16714 O. sativa; 108142 M. loti (outgroup). Scale bar 0.1.]

Fig. 2 Phylogenetic tree showing the relationships between YUCCAs from various plants. The tree was constructed using ClustalW (Accurate) [8], Protdist [15], and Fitch [15]. Branches drawn in bold have bootstrap values of less than 95 %. The tree was displayed using TreeView [31]. The outgroup was dimethylaniline monooxygenase from Mesorhizobium loti. Bootstrap values (500 replicates) were calculated in Seqboot [15] to test the reliability of nodes. All programs were accessed via BioManager of the Australian National Genomic Information Service [30]. Arabidopsis and rice proteins are identified by TIGR locus numbers. Protein IDs are used to identify Selaginella and Physcomitrella proteins. Proteins from Sorghum are referred to by the locus number if available; otherwise the ID numbers are used. Populus proteins are abbreviated as follows: gw1.VII.3430.1 as VII.3430; fgenesh4_pg.C_LG_II002544 as LGII002544; eugene3.00100414 as 00100414; gw1.VI.663.1 as VI.663; gw1.XVIII.1433.1 as XVIII.1433; fgenesh4_pg.C_scaffold_57000067 as 57000067; fgenesh4_pm.C_LG_XIV000597 as LGXIV000597. GenBank accession numbers are used to identify proteins from other plant species

The second major branch contained only angiosperm sequences. OsYUCCA9, OsYUCCA10, OsYUCCA11, OsYUCCA12, OsYUCCA13, and OsYUCCA14 were most similar to AtYUCCA10 and AtYUCCA11. Sequences from Populus and maize were also found in this branch. Similarity at the amino acid level between rice sequences in the second major branch was high, reaching 87 % for OsYUCCA13 and OsYUCCA14, 81 % for OsYUCCA9 and OsYUCCA10, and 79 % for OsYUCCA13 and OsYUCCA12. Table 1 shows the percentage amino acid identity between rice proteins and the similar Arabidopsis and maize sequences in the same clade.
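Pairwise percentages of this kind are computed from aligned sequences. The sketch below shows one common convention for percent identity, in which columns containing a gap in either sequence are ignored; this convention, the function name `percent_identity`, and the use of short excerpts are assumptions, and the program used in the study may define identity differently. It computes identity only, whereas similarity would additionally count conservative substitutions under a scoring matrix. The two strings are 60-column excerpts of the OsYUCCA13 and OsYUCCA14 rows of the alignment in Fig. 3a, so the value is not expected to reproduce the full-length figures quoted in the text.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100 * matches / len(pairs)


# 60-column excerpts of two rows of the alignment shown in Fig. 3a.
os_yucca13 = "-----LTAQFLVVATGENSAASIPPVPGLTRFEG-EAIHSSAYKSGRAYTGKNVLVVGAG"
os_yucca14 = "-----LATQFLVVAAGENSAANIPPIPGLSRFEG-EAIHSSAYKSGRAYTGKSVLVVGAG"

print(f"{percent_identity(os_yucca13, os_yucca14):.1f} % identity over the excerpt")
```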

Table 1 Amino acid identity of OsYUCCAs with Arabidopsis and maize YUCCAs represented in the same clade

            AtYUCCA1  AtYUCCA4  AtYUCCA2  AtYUCCA6  AtYUCCA10  AtYUCCA11  ZmYUCCA
OsYUCCA1    60 %      65 %      -         -         -          -          -
OsYUCCA2    -         -         63 %      64 %      -          -          -
OsYUCCA3    -         -         66 %      62 %      -          -          -
OsYUCCA9    -         -         -         -         47 %       47 %       49 %
OsYUCCA10   -         -         -         -         46 %       47 %       47 %
OsYUCCA11   -         -         -         -         50 %       52 %       64 %
OsYUCCA12   -         -         -         -         50 %       49 %       55 %
OsYUCCA13   -         -         -         -         51 %       48 %       59 %
OsYUCCA14   -         -         -         -         51 %       50 %       56 %

It has been shown that the prosthetic group FAD and the cofactor NADPH are essential for the activity of flavin monooxygenase enzymes [16, 32]. Analysis of the ClustalW alignment of the identified OsYUCCA proteins showed that both binding motifs are conserved, one for NADPH (Fig. 3a) and the other for FAD (Fig. 3b).

Tryptophan/Tyrosine Decarboxylase Tryptophan (TDC), tyrosine (TYDC), and phenylalanine decarboxylases (PADC), as well as phenylacetaldehyde synthase (PAAS), belong to a family of aromatic L-amino acid

a
OsYUCCA1   ---EVLMARWLVVATGENAEPRVPDFPGMQKFAG-CAMHTSEYKSGEQFAGKKVLVVGCG  186
AtYUCCA1   ---TEYLSKWLIVATGENADPYFPEIPGRKKFSGGKIVHASEYKSGEEFRRQKVLVVGCG  191
OsYUCCA8   ---TEYASPWLVVASGENAEVVVPKVKGRERFAG-EALHSSEYRSGERFRGMRVLVVGCG  203
OsYUCCA2   ---VEYVSQWLVVATGENSEAVLPVIDGLEEFRG-SVIHTSAYKSGSKFAGKTVLVVGCG  184
OsYUCCA3   ---VEYVSRWLVVATGENAEVVLPEIDGLDDFKG-TVMHTSSYKSGGAFAGKRVLVVGSG  229
OsYUCCA4   CTMTVYRSRWLVVATGENAEPVVPEMDGAGRFKG-QMMHSSEYRNGDGYAGKKVLVVGCG  228
OsYUCCA6   --VTEYIGRWLVVATGENAERVVPEIDGADDFEG-PVSHVAEYKSGAAYRGKRVLVVGCG  188
OsYUCCA7   --EMEYIGSWLVVATGENAESVVPDIPGIDGFGG-EVVHVADYKSGEAYRGKRVLVVGCG  196
OsYUCCA5   --ETEFLSRWLVVATGENAVAAWP-AEGVGAYRG-AVMHTSSYKRGDEFAGKKVLVVGCG  224
OsYUCCA9   -----YAARHLVAAAGENDERVVPEVPGMETFPG-KVVHAADYRSAEGFKGKSVLVVGGG  187
OsYUCCA10  -----YAARHLVAAAGENDERVVPEVPGMDTFPG-KVVHSADYRSAGAFKGRSVLVVGCG  219
OsYUCCA13  -----LTAQFLVVATGENSAASIPPVPGLTRFEG-EAIHSSAYKSGRAYTGKNVLVVGAG  183
OsYUCCA14  -----LATQFLVVAAGENSAANIPPIPGLSRFEG-EAIHSSAYKSGRAYTGKSVLVVGAG  183
OsYUCCA12  -----LTARFLIMATGEKSAASIPLVPGLAGFEG-EAIHSSAYKSGNGYTGKSVLVVGAG  180
OsYUCCA11  -----YAARFLVVASGENSAGNIPSIPGLEDFSG-HVIHSSSFRSADSYAAQRVLVVGCG  180

OsYUCCA1   NSGMEVSLDLCRHG-AKPSMVVRNT-VHVLPREMFGLSTFGIAMALLRWLPVQLVDRFLL  244
AtYUCCA1   NSGMEISLDLVRHN-ASPHLVVRNT-VHVLPREILGVSTFGVGMTLLKCLPLRLVDKFLL  249
OsYUCCA8   NSGMEMCLDLCEHG-AMPFMSVRSG-VHVLPREMFGASTFGIAMKLLRWLPIKMVDRFLL  261
OsYUCCA2   NSGMEVCLDLCNHN-GYPRIVV-----HILPREMLGQPTFRLAMWLLKWLPIHIVDRILL  238
OsYUCCA3   NSGMEVCLDLCNHN-ANPHIVVRDA-VHILPREMLGQSTFGLSMWLLKWLPVHVVDRILL  287
OsYUCCA4   NSGMEVSLDLCNHN-ARASMVVRDT-VHVLPREILGFSTFGLSMWLLRWLSVQTVDWLVL  286
OsYUCCA6   NSGMEVCLDLCHHN-ALPAMVVRDSKVHVLPREMLGVATFSVAVFLLRFLPLWVVDRILV  247
OsYUCCA7   NSGMEVSLDLCDHG-ARPAMVVRDA-VHVLPREVLGKSTFELAVLLMAWLPLWLVDKILV  254
OsYUCCA5   NSGMEVSLDLCNNG-AATSMVVRDK-IHVLPREILGISTFGLSVFLLKWFPIKWVDALLL  282
OsYUCCA9   NSGMEIAYDLAVGG-AATSIVIRSE-LHLVSKEIW---NLAMTLYR--YLPVWVIDKVVL  240
OsYUCCA10  NSGFEIAYDLAAGGAAAVSIAVRGE-VHLVSREVW---SVGMALQR--YLPTWAVDKVVL  273
OsYUCCA13  NSGMEIAYDLATHG-AHTSIVVRSP-IHIMTKELI---RFGMTVVQNLGLTVTTADSLLV  238
OsYUCCA14  NSGMEIAYDLATHG-AHTSIVVRSP-VHIMTKELI---WYGMTMVQNLGLNVTAVDSLLV  238
OsYUCCA12  NSGMEIAYDLATHG-AHTSIVVRSP-VHIMTKELI---RFGMTMVQNLGLSVTIVDPLLV  235
OsYUCCA11  NSGMEIAYDLSSHG-ANTSIVIRSP-LHVMTKELI---HMGMKLAS-WSLPVKFVDFILV  234

b
OsYUCCA1   ----MD----NKPAQ-----------------------------ERRETWVPGAVIVGAG  23
AtYUCCA1   ----MESHPHNKTDQ-----------------------------TQHIILVHGPIIIGAG  27
OsYUCCA8   ----MQGQQKQNAGGGGG------------------------DNASPCIVLDGPIIVGAG  32
OsYUCCA2   ----------------------------------------------MLVWVQGPIVVGAG  14
OsYUCCA3   MDPWSEIEGKRAHDPIFQNYFSQNCRQSVDGFCKKRSADAAVARAERCIRVLGPIIVGAG  60
OsYUCCA4   MDCFAETEGKRAHDPLYQRRAAAAATPATG---------VPVDDVDKVVDVPGAVIVGAG  51
OsYUCCA6   ----------------MA---------------------------ARVVWVNGPIVVGAG  17
OsYUCCA7   ----MVLQHSDRMDSLFS---------------------------PQTSWVSGPIIVGAG  29
OsYUCCA5   -MPTSQQDGGGGGERLFHGGGAGDGDGGGGG-------GAAAAAAARQVWVPGPVIVGAG  52
OsYUCCA9   ------------------------------------MAAVEQ------DQEEEVIIVGAG  18
OsYUCCA10  ----------MHPTPFTSIRSTRATIDELCSAKAAAMAVAAEGEEMTPEHEEEVIIVGAG  50
OsYUCCA13  ---------------------------------------------MEAVEEVVVLIVGAG  15
OsYUCCA14  ---------------------------------------------MNAMEDVVVLIVGAG  15
OsYUCCA12  ------------------------------------------------MEEVVVLIIGAG  12
OsYUCCA11  ------------------------------------------------MEKALVLIVGAG  12

OsYUCCA1   PSGLAAAACLAARGVP-ATVLERSDSLASTWRHRMYDRLALHLPKRFCELPLLP-FPEEY  81
AtYUCCA1   PSGLATSACLSSRGVP-SLILERSDSIASLWKSKTYDRLRLHLPKHFCRLPLLD-FPEYY  85
OsYUCCA8   PSGLAVAATLRQHGAP-FTVVERSGGVADLWTNRTYDRLRLHLPKVFCELPHVA-FPPDF  90
OsYUCCA2   PSGLAAAACLKEKGID-SLVLERSSCLAPLWQLKMYDRLSLHLPRQFCELPLFP-FPASY  72
OsYUCCA3   PSGLAVAACLKEKGVD-SLVLERSNCIASLWQLKTYDRLSLHLPRQFCELPLMP-FPAYY  118
OsYUCCA4   PAGVAVGALLGLRGVA-YVVLERCGCIASLWRHRTYDRLCLHLPKRFCELPLRP-FPASF  109
OsYUCCA6   PAGLSVAACLRERGVP-SVLLERADCIASLWQRRTYDRLRLHLPKHFCELPGMP-FPDGY  75
OsYUCCA7   PSGLAVAASLREQGVP-FTMLERADCIASLWQKRTYDRLKLHLPKQFCELPRMA-FPAHY  87
OsYUCCA5   PSGLATAACLKARGVP-SLVLDKDATVAASWRERTYERLRLHLPRGFCELPLAPPFPPGT  111
OsYUCCA9   PSGLAAAACLSVRGVTGCLVLERDDCVASLWRHRTYDRVRLHLAKRYCALPHAP-HGEAS  77
OsYUCCA10  QSGLAAAACLSVRGVASCLVLERDDCVASLWRHRAYDRLRLHLPKRHCALPRAP-HAAAA  109
OsYUCCA13  PAGLATAACLAQRHVP-YVIVERESCTASLWRHRAYDRLKLHLAKEFCELPHMA-YPVGT  73
OsYUCCA14  PAGLATAACLAQRHVP-YIIVERESSTASLWRHRAYDRLKLHLAKEFCELPHMA-YPAGT  73
OsYUCCA12  PAGLATAACLTLQHVA-YAIIERESCTASLWRHRTYDRLKLHLAKEFCELPHMA-YPSGT  70
OsYUCCA11  PSGLATAACLGQLSIP-YVIIEREDCTASLWRKHTYDRLKLHLAKEFCEMPHMP-YPEDT  70

Fig. 3 a OsYUCCAs have a conserved binding motif for NADPH with the amino acid sequence G-C-G-N-S-G. However, OsYUCCA12, OsYUCCA13, and OsYUCCA14 have A in this motif instead of C, while OsYUCCA3 and OsYUCCA9 have S and G, respectively, instead of C. Conserved residues appear in bold font. b OsYUCCAs have a conserved binding motif for FAD with the amino acid sequence G-A-G-P-S-G. However, OsYUCCA4, OsYUCCA6, OsYUCCA12, OsYUCCA13, and OsYUCCA14 have A in this motif instead of S, while OsYUCCA10 has Q instead of P. Conserved residues appear in bold font
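The motif variants listed in the legend of Fig. 3 can be written as regular expressions and checked against the sequences. The sketch below is illustrative only: the patterns encode just the substitutions named in the legend, the helper `find_motif` is hypothetical, and each test string is a short excerpt formed by joining the end of the first alignment block to the start of the second for a few of the proteins shown.

```python
import re

# Consensus motif plus the variants listed in the Fig. 3 legend.
NADPH_MOTIF = re.compile(r"G[CASG]GNSG")  # G-C-G-N-S-G, with A, S or G replacing C
FAD_MOTIF = re.compile(r"GAG[PQ][SA]G")   # G-A-G-P-S-G, with Q for P or A for S

# Short excerpts spanning each motif, joined across the two blocks of Fig. 3.
nadph_region = {
    "OsYUCCA1": "KVLVVGCGNSGMEVSLDL",
    "OsYUCCA3": "RVLVVGSGNSGMEVCLDL",
    "OsYUCCA9": "SVLVVGGGNSGMEIAYDL",
    "OsYUCCA13": "NVLVVGAGNSGMEIAYDL",
}
fad_region = {
    "OsYUCCA1": "GAVIVGAGPSGLAAAACL",
    "OsYUCCA4": "GAVIVGAGPAGVAVGALL",
    "OsYUCCA10": "EVIIVGAGQSGLAAAACL",
}


def find_motif(pattern, sequences):
    """Map each sequence name to the matched motif string, or None if absent."""
    result = {}
    for name, seq in sequences.items():
        match = pattern.search(seq)
        result[name] = match.group(0) if match else None
    return result


print(find_motif(NADPH_MOTIF, nadph_region))
print(find_motif(FAD_MOTIF, fad_region))
```

A stricter pattern containing only the consensus residues would flag the variant sequences instead of accepting them, which may be preferable when screening new candidates.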

decarboxylases. Aromatic L-amino acid decarboxylases have been studied intensively for their role in the synthesis of a wide range of secondary metabolites including indole alkaloids, hydroxycinnamic acid amides, serotonin, and phenylacetaldehyde [7, 33, 34]. TDC and TYDC, which convert tryptophan and tyrosine into tryptamine and tyramine, respectively, have attracted considerable attention because they are involved in the biosynthesis of a number of pharmaceutical compounds. Because the proteins are similar, automated annotation of decarboxylases as TDC or TYDC is unreliable.

Figure 4 shows a phylogenetic tree for TDC homologues from a wide variety of plants including Hordeum vulgare, C. acuminata, O. pumila, Papaver somniferum, Aristolochia contorta, Thalictrum flavum, Rosa hybrid cultivar, P. hybrida, C. roseus, Petroselinum crispum, Rhodiola sachalinensis, S. bicolor, P. patens, S. moellendorffii, and P. trichocarpa. Two separate lineages supported at 100 % by bootstrap analysis were found. The first major branch contained sequences from all taxonomic groups, including the moss P. patens and the non-seed vascular plant S. moellendorffii; the monocots rice and S. bicolor and the dicots A. thaliana and P. trichocarpa were also represented. The second major branch consisted of three minor groups. The first minor group, supported at 100 % by bootstrap analysis, contained TYDC and PAAS sequences but no experimentally characterized TDC. The second minor group contained four of the seven rice TDC/TYDC sequences. The third minor group was supported at 98 % by bootstrap analysis and was further divided into two subgroups: the first contained TDC sequences only from grasses (rice, S. bicolor, and H. vulgare), whereas the second contained the experimentally characterized TDCs from C. roseus, O. pumila, and C. acuminata.

[Fig. 4 tree image not reproduced. Tip labels in the order extracted from the figure: XP 419032 G. gallus AADC; 89475 S. moellendorffii; 121532 S. moellendorffii; 78872 S. moellendorffii; 79134 S. moellendorffii; 187201 P. patens; 187205 P. patens; Sb02g010470 S. bicolor; Os07g25590 O. sativa TDC; At2g20340 A. thaliana; At4g28680 A. thaliana; II2533 P. trichocarpa; 00040229 P. trichocarpa; P54768 P. somniferum TYDC1; P54771 P. somniferum TYDC5; ABJ16446 A. contorta TYDC; AAG60665 T. flavum TYDC; P54770 P. somniferum TYDC3; ABB04522 R. x hybrida PAAS; Q06086 P. crispum TYDC2; Q06088 P. crispum TYDC4; ABB72475 P. x hybrida PAAS; 4992213 S. bicolor; Os05g43510 O. sativa; Os10g23900 O. sativa; Os01g56380 O. sativa TYDC2; 141709 S. bicolor; Os10g26110 O. sativa; 00130520 P. trichocarpa; ABF06560 R. sachalinensis TYDC; Os08g04560 O. sativa; Os08g04540 O. sativa TDC; Sb07g00304 S. bicolor; BAD11768 H. vulgare TDC; BAD11769 H. vulgare TDC; CAA47898 C. roseus TDC; BAC41515 O. pumila TDC; AAB39708 C. acuminata TDC; AAB39709 C. acuminata TDC. Scale bar 0.1.]

Fig. 4 Phylogenetic tree showing the relationships between TDCs from various plants. The tree was constructed using ClustalW (Accurate) [8], Protdist [15], and Fitch [15]. Branches drawn in bold have bootstrap values of less than 95 %. The tree was displayed using TreeView [31]. The outgroup was AADC from G. gallus. Bootstrap values (500 replicates) were calculated in Seqboot [15] to test the reliability of nodes. All programs were accessed via BioManager of the Australian National Genomic Information Service [30]. Arabidopsis and rice proteins are labeled by TIGR locus numbers. Protein IDs are used to name Selaginella and Physcomitrella proteins. Proteins from Sorghum are referred to by the locus number if available; otherwise the ID numbers are used. Populus proteins are abbreviated as follows: estExt_fgenesh4_pg.C_LG_II2533 as LGII2533; eugene3.00040229 as 00040229; eugene3.00130520 as 00130520. GenBank accession numbers are used to identify proteins from other plant species. AADC aromatic amino-acid decarboxylase, TDC tryptophan decarboxylase, TYDC tyrosine decarboxylase, PAAS phenylacetaldehyde synthase

Pyridoxal phosphate has been reported to be essential for the activity of decarboxylases [24]. De Luca [24] suggested that lysine residue 319 of C. roseus TDC binds pyridoxal phosphate. Alignment of the amino acid sequences of TDC proteins from C. roseus (CAA47898), A. thaliana (At2g20340), and all TDCs from O. sativa showed that all have the lysine residue for the pyridoxal phosphate cofactor (Fig. 5).

Aldehyde Oxidase Orthologues The most striking observation from the phylogenetic tree of AO sequences (Fig. 6) is that sequences from the same species are more similar to each other than to sequences from other species. Aldehyde oxidase proteins clustered into two lineages supported at 100 % by bootstrap analysis. In the first lineage, AO proteins from dicots clustered into three subgroups: the first contained three sequences from S. lycopersicum, the second two sequences from P. trichocarpa, and the third four sequences from A. thaliana. The highest similarity at the amino acid level was found between the sequences from S. lycopersicum (85 %), followed by P. trichocarpa (74 %) and then A. thaliana (65 %). Aldehyde oxidases from monocots were found in the second lineage and also clustered into three subgroups: the first contained three sequences from S. bicolor and one from O. sativa, the second contained only three sequences from rice, and the third contained two sequences from Z. mays and two from S. bicolor. Sequences from P. patens and S. moellendorffii clustered separately in the tree. There are four putative rice AO proteins; alignment showed that they are very similar, especially Os10g04860, Os03g57690, and Os03g57680. Os10g04860, as an example,

Os01g56380  DPVRELCAVAARHGGVWVHVDAAYAGSALVCPEFRDVIAG-AEAVDSLSMNAHKWLLANN  345
Os10g23900  DPVGELCAAAAPHG-AWVHVDAAYAGSAMVCPELRGAVAGGVEAVDSFSMNAHKWLLANN  340
Os05g43510  DPLRELCAAVEGRG-VWVHVDAAYAGAACVCPEFRHAIAG-AEAVDSFSTNPHKWLLANM  330
Os10g26110  DPVRELGEVARRHG-MWLHVDAAYAGSAAICPEYQGYLDG-AELADSVSMNPHKWFLTNM  323
Os08g04540  DPVGAVADVAARFA-AWVHVDAAYAGSACICPEFRHHLDG-VERVDSISMSPHKWLMTCL  336
Os08g04560  DPVGAVADVAARFA-AWVHVDAAYAGSACICPEFRHHLDG-VERVDSISMSPHKWLMTCL  337
C. roseus   DPVDSLSEIANEFG-IWIHVDAAYAGSACICPEFRHYLDG-IERVDSLSLSPHKWLLAYL  325
At2g20340   DPLAALGKIANSNG-IWFHVDAAYAGSACICPEYRQYIDG-VETADSFNMNAHKWFLTNF  315
Os07g25590  DPLPELGQIAKSND-MWFHIDAAYAGSACICPEYRHHLNG-VEEADSFNMNAHKWFLTNF  320

Fig. 5 Lysine residue (K) number 319 in C. roseus, which is reported to bind pyridoxal phosphate, is conserved in all TDCs from rice and Arabidopsis. Conserved residues appear in bold font
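The conservation claim in Fig. 5 can be checked mechanically. The following is a minimal sketch, assuming that the labels, aligned rows, and end positions in the figure pair up in the order in which they are printed, and that each number to the right of a row is the residue number of its last amino acid. The helper name `column_of_residue` is hypothetical and is not part of any tool used in the study.

```python
# Alignment block from Fig. 5: (label, aligned fragment, number of last residue).
# The pairing of labels, rows, and end positions by order is an assumption.
FIG5 = [
    ("Os01g56380", "DPVRELCAVAARHGGVWVHVDAAYAGSALVCPEFRDVIAG-AEAVDSLSMNAHKWLLANN", 345),
    ("Os10g23900", "DPVGELCAAAAPHG-AWVHVDAAYAGSAMVCPELRGAVAGGVEAVDSFSMNAHKWLLANN", 340),
    ("Os05g43510", "DPLRELCAAVEGRG-VWVHVDAAYAGAACVCPEFRHAIAG-AEAVDSFSTNPHKWLLANM", 330),
    ("Os10g26110", "DPVRELGEVARRHG-MWLHVDAAYAGSAAICPEYQGYLDG-AELADSVSMNPHKWFLTNM", 323),
    ("Os08g04540", "DPVGAVADVAARFA-AWVHVDAAYAGSACICPEFRHHLDG-VERVDSISMSPHKWLMTCL", 336),
    ("Os08g04560", "DPVGAVADVAARFA-AWVHVDAAYAGSACICPEFRHHLDG-VERVDSISMSPHKWLMTCL", 337),
    ("C. roseus", "DPVDSLSEIANEFG-IWIHVDAAYAGSACICPEFRHYLDG-IERVDSLSLSPHKWLLAYL", 325),
    ("At2g20340", "DPLAALGKIANSNG-IWFHVDAAYAGSACICPEYRQYIDG-VETADSFNMNAHKWFLTNF", 315),
    ("Os07g25590", "DPLPELGQIAKSND-MWFHIDAAYAGSACICPEYRHHLNG-VEEADSFNMNAHKWFLTNF", 320),
]


def column_of_residue(aligned: str, end_pos: int, residue_no: int) -> int:
    """Return the 0-based alignment column that holds residue `residue_no`,
    given that the last residue in `aligned` is numbered `end_pos`."""
    pos = end_pos
    for col in range(len(aligned) - 1, -1, -1):
        if aligned[col] == "-":
            continue  # gaps do not consume a residue number
        if pos == residue_no:
            return col
        pos -= 1
    raise ValueError("residue number lies outside this alignment block")


rows = {name: (seq, end) for name, seq, end in FIG5}
croseus_seq, croseus_end = rows["C. roseus"]

# Locate C. roseus residue 319, then read the same column in every row.
lys_col = column_of_residue(croseus_seq, croseus_end, 319)
residues_at_col = {name: seq[lys_col] for name, seq, _ in FIG5}
all_lysine = all(res == "K" for res in residues_at_col.values())

print(lys_col, croseus_seq[lys_col], all_lysine)
```

Counting back from the end position avoids needing the start coordinate of each fragment, which the figure does not give.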

Tip labels of the tree in Fig. 6: 444239 S. moellendorffii; AAG22606 S. lycopersicum; AAG22607 S. lycopersicum; AAG22605 S. lycopersicum; 66000030 P. trichocarpa; LG IX000157 P. trichocarpa; At3g43600 A. thaliana AAO2; At5g20960 A. thaliana AAO1; At1g04580 A. thaliana AAO4; At2g27150 A. thaliana AAO3; Sb02g003720 S. bicolor; Os07g18120 O. sativa; 4873782 S. bicolor; 4873789 S. bicolor; Os10g04860 O. sativa; Os03g57680 O. sativa; Os03g57690 O. sativa; Sb01g005680 S. bicolor; 7431398 Z. mays AO1; 7431399 Z. mays AO2; Sb01g005650 S. bicolor; 106708 P. patens; 140802 P. patens; Os03g31550 O. sativa. Scale bar: 0.1

Fig. 6 Phylogenetic tree showing the relationships between AO proteins from various plants. The tree was constructed using ClustalW (Accurate) [8], Protdist [15], and Fitch [15]. Bold line branches indicate that the bootstrap values are less than 95 %. The phylogenetic tree was displayed using TreeView [31]. The outgroup used was xanthine dehydrogenase from rice (Os03g31550). Bootstrap values (500 replicates) were calculated in Seqboot [15] to test the reliability of nodes. All programs were accessed via BioManager by the Australian National Genomic Information Service [30]. Arabidopsis and rice proteins are named after TIGR locus numbers. Protein IDs are used to name Selaginella and Physcomitrella proteins. Proteins from Sorghum are referred to by the locus number if available; otherwise the ID numbers are used. Populus proteins are abbreviated as follows: fgenesh4_pm.C_scaffold_66000030 as 66000030 and fgenesh4_pg.C_LG_IX000157 as LGIX000157. GenBank accession numbers are used to identify proteins from other plant species
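The bootstrap step named in the caption rests on resampling alignment columns with replacement. The sketch below illustrates that idea only; it does not call PHYLIP or reproduce Seqboot's file formats, and the toy alignment and the function name `bootstrap_alignment` are hypothetical. In the published workflow each replicate would then pass through the distance and tree-building steps.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate: columns drawn with replacement,
    the same draw applied to every sequence so columns stay intact."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Hypothetical toy alignment (not data from the study).
toy = {"seqA": "MKVLAT", "seqB": "MKILAS", "seqC": "MRVLGT"}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]

print(len(replicates), replicates[0])
```

A node's bootstrap value is then the percentage of replicate trees in which that grouping reappears, which is why the captions treat values below 95 % as weaker support.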

shared 84 and 85 % amino acid identity with Os03g57680 and Os03g57690, respectively. On the other hand, Os10g04860 shared only 54 % similarity with the query sequence, Arabidopsis AOα.

Nitrilase

Figure 7 shows a phylogenetic tree constructed for nitrilase proteins from plants including O. sativa, Nicotiana tabacum, Lupinus angustifolius, A. thaliana, Brassica rapa, Z. mays, S. bicolor, P. patens, S. moellendorffii, and P. trichocarpa. Nitrilases in the tree separated into two branches supported at 100 % by bootstrap analysis. All cereals included in this tree had orthologues of ZmNIT1 and ZmNIT2. The first branch (orthologues of AtNIT4) contained sequences from diverse taxonomic groups, including moss, non-seed plants, monocots, and dicots. All representatives of the grasses, including Sorghum, maize, and rice, appeared to have two isoforms of AtNIT4. There were five subgroups in the first branch; the first subgroup contained NIT4 of A. thaliana and B. rapa. The second subgroup, supported at 100 % by bootstrap analysis, contained sequences

Tip labels of the tree in Fig. 7: ABM55735 B. rapa NIT4; At5g22300 A. thaliana NIT4; 66000137 P. trichocarpa; ABA28312 L. angustifolius; Q42965 N. tabacum; 171766 S. moellendorffii; LG XVI000624 P. trichocarpa; 88254 P. patens; Os02g42330 O. sativa; 148687 S. bicolor; 148686 S. bicolor; ABO93185 Z. mays NIT1; Os02g42350 O. sativa; 148688 S. bicolor; AAO11742 Z. mays NIT2; ABM55734 B. rapa NIT2; ABM55733 B. rapa NIT1; AAK57436 B. rapa NIT3; At3g44310 A. thaliana NIT1; At3g44300 A. thaliana NIT2; At3g44320 A. thaliana NIT3; NP 770037 B. japonicum. Scale bar: 0.1 substitutions per site

Fig. 7 Phylogenetic tree showing the relationships between NIT proteins from various plants. The tree was constructed using ClustalW (Accurate) [8], Protdist [15], and Fitch [15]. Bold line branches indicate that the bootstrap values are less than 95 %. The phylogenetic tree was displayed using TreeView [31]. The outgroup used was a nitrilase protein from Bradyrhizobium japonicum (NP 770037). Bootstrap values (500 replicates) were calculated in Seqboot [15] to test the reliability of nodes. All programs were accessed via BioManager by the Australian National Genomic Information Service [30]. Arabidopsis and rice proteins are named after TIGR locus numbers. Protein IDs are used to name Selaginella and Physcomitrella proteins. Proteins from Sorghum are referred to by the locus number if available; otherwise the ID numbers are used. Populus proteins are abbreviated as follows: fgenesh4_pg.C_scaffold_66000137 as 66000137. GenBank accession numbers are used to identify proteins from other plant species

from P. trichocarpa, L. angustifolius, and N. tabacum. Sequences from P. patens, S. moellendorffii, and P. trichocarpa were found in the third subgroup, which was supported at 100 % by bootstrap analysis. The fourth and fifth subgroups contained sequences from grasses, including Z. mays, S. bicolor, and rice. Orthologues of Arabidopsis NIT1, NIT2, and NIT3 were found in a second branch, which only had members from the Brassicaceae. Rice had two nitrilase proteins in the first branch: Os02g42330 (OsNIT1) and Os02g42350 (OsNIT2). Alignment of rice nitrilase proteins with ZmNIT2 showed that OsNIT2 has 90 % amino acid identity to ZmNIT2 compared to 75 % similarity to OsNIT1.
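Percentages such as these are derived from pairwise alignments. The sketch below shows one common way to compute percent identity, assuming that columns containing a gap in either sequence are excluded from the denominator. Other conventions exist, and the text does not state which one was used. The function name and the two toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments: 6 identical residues over 7 gap-free columns.
print(round(percent_identity("MKV-LATG", "MKVALSTG"), 1))
```

Because the denominator changes with the convention (gap-free columns, shorter sequence, or full alignment length), identity values from different tools are comparable only when the convention matches.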

Discussion

The presence of 6 to 14 YUCCA sequences in all plants with known genome sequences suggested a high degree of functional redundancy. This has been backed up by experimental observation: Cheng et al. [35] showed that deletion of more than one Arabidopsis gene was required to obtain a phenotypic effect. On the other hand, phylogenetic analysis revealed two major branches of YUCCA genes supported at 100 % by bootstrap analysis. All higher plants appeared to have sequences in both clades, suggesting the evolution of functional specialization. Most YUCCA proteins from Arabidopsis, Petunia, tomato, rice, and maize that have been studied for their role in auxin biosynthesis and found to have essential impacts on various developmental processes were found in the first major branch, suggesting that YUCCAs in the second branch may be involved in a more limited and specific biological process.

AtYUCCA1, AtYUCCA4, AtYUCCA2, and AtYUCCA6 in the first major branch have overlapping functions, as no obvious developmental defects were observed for any single knockout mutant of the four AtYUCCA genes. Among the six double-mutant combinations generated for these four genes (AtYUCCA1 AtYUCCA2, AtYUCCA1 AtYUCCA6, AtYUCCA2 AtYUCCA4, AtYUCCA4 AtYUCCA6, AtYUCCA1 AtYUCCA4, and AtYUCCA2 AtYUCCA6), only the last two displayed developmental defects [35]. Interestingly, phylogenetic analysis placed AtYUCCA1 and AtYUCCA4 in one clade and AtYUCCA2 and AtYUCCA6 in another. Following the finding that YUCCA genes in the same clade had overlapping functions [35], the pairs OsYUCCA1/OsYUCCA8 and OsYUCCA2/OsYUCCA3 were hypothesized to have overlapping functions as well. Despite its severe phenotype, the quadruple mutant was not lethal, suggesting that other AtYUCCA genes may be involved in IAA synthesis in Arabidopsis [35]. Analysis of AtYUCCA1 orthologues in Petunia, rice, and maize revealed that genetic redundancy in these plants was less complex, because loss of function of a single YUCCA gene led to obvious developmental defects [18–20].
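The double-mutant argument above comes down to a small piece of counting: four genes give six pairs, and the two pairs with defects are exactly the pairs whose members share a clade. The sketch below makes that explicit. The clade labels "A" and "B" are arbitrary placeholders for the two clades described in the text, not names used in the study.

```python
from itertools import combinations

# Clade membership as described in the text: AtYUCCA1 and AtYUCCA4 in one clade,
# AtYUCCA2 and AtYUCCA6 in another. The labels "A" and "B" are placeholders.
CLADE = {"AtYUCCA1": "A", "AtYUCCA4": "A", "AtYUCCA2": "B", "AtYUCCA6": "B"}

# All unordered pairs of the four genes: C(4, 2) = 6 double mutants.
double_mutants = list(combinations(sorted(CLADE), 2))

# Pairs whose two members fall in the same clade.
same_clade = [pair for pair in double_mutants if CLADE[pair[0]] == CLADE[pair[1]]]

print(len(double_mutants))
print(same_clade)
```

The two same-clade pairs it reports are the two double mutants that the text describes as showing developmental defects [35].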
The importance of the four AtYUCCA genes, AtYUCCA1, AtYUCCA4, AtYUCCA2, and AtYUCCA6, has been highlighted by the findings that overexpression of each gene caused auxin overproduction, whereas the triple and quadruple knockout mutants of these genes caused severe developmental defects, confirming the essential role of this group of YUCCA genes in IAA synthesis.

For every OsYUCCA there was a very close orthologue from Sorghum with a high percentage of amino acid similarity, suggesting that IAA synthesis in these two model plants is very similar. Sorghum sequences in the second branch were left out of the tree because it was becoming very large. The second branch contained YUCCA sequences only from angiosperms, indicating that they may have evolved with seed plants or even angiosperms. Similarity at the amino acid level between rice sequences in the second major branch was high, reaching 87 % for OsYUCCA13 and OsYUCCA14, 81 % for OsYUCCA9 and OsYUCCA10, and 79 % for OsYUCCA13 and OsYUCCA12, suggesting that these groups of similar YUCCA genes may have arisen by recent gene duplication. The tandem positions of OsYUCCA9 and OsYUCCA10, and of OsYUCCA13 and OsYUCCA14, also suggested recent gene duplication. A link between AtYUCCA10 and AtYUCCA11 in this branch and the formation of embryonic and postembryonic organs has been established [36]. Taken together with the observation that there was no representative from Physcomitrella or Selaginella, it was hypothesized that this branch may have evolved with seed development in plants. The YUCCA phylogenetic tree showed without ambiguity that YUCCA was conserved across the plant kingdom, as YUCCA sequences from diverse taxonomic groups including dicots, monocots, non-seed vascular plants, and moss share common ancestors with sequences having an experimentally demonstrated role in IAA synthesis.

TDC/TYDC proteins in this study separated into two major clades.
The first major clade had members from diverse groups including moss, vascular non-seed plants, monocots, and dicot plants. The catalytic activity of a rice sequence in this branch, Os07g25590, has been identified as that of a tryptophan decarboxylase [37]. Genes in this clade also had a similar number of introns: Sb02g010470 and At4g28680 have 11 introns, whereas Os07g25590 and At2g20340 have 12 introns. Therefore, TDC proteins in this branch were suggested to be highly conserved and could have a role in IAA synthesis.

The second major clade in the TDC/TYDC tree contained three subgroups. The first subgroup contained characterized TYDC and PAAS sequences. Even though these share extensive similarity with TDCs, plant TYDCs exhibit high substrate specificity for tyrosine. Several TYDCs from parsley, Eschscholzia californica, Thalictrum rugosum, and opium poppy accept tyrosine as a substrate but not tryptophan [34, 38, 39], indicating that sequences in this subgroup are not involved in IAA production. It is also unlikely that the second subgroup is involved in auxin synthesis, as it contained sequences from a limited number of taxonomic groups (grasses and Populus). In addition, one member of this subgroup (Os01g56380) was identified as a TYDC [37]. There were no Arabidopsis sequences in the third subgroup, and its TDC proteins were also limited to specific taxonomic groups. TDCs in this subgroup contained few or no introns. The catalytic function of TDCs from C. roseus, O. pumila, and C. acuminata has been experimentally identified [22–24]. In addition, expression and activity of these proteins are correlated with accumulation of alkaloids including ajmalicine and camptothecin, an anticancer monoterpene alkaloid [22, 23, 40]. Despite the large amount of tryptamine accumulated in transgenic tobacco over-expressing C. roseus TDC, no significant difference in IAA content was detected, indicating that TDCs in this subgroup may not be involved in IAA synthesis [41].

There were seven TDC/TYDC proteins in rice.
One rice TDC sequence (Os07g25590) was found in the first (conserved) clade and may be involved in IAA synthesis. The remaining sequences were found in the second clade, which contained most of the characterized TDC/TYDC sequences, drawn from a limited range of plants. The activity of TDC sequences in the second clade is more likely to be linked to the production of secondary metabolites. The two identified TDC sequences in rice (Os07g25590 and Os08g04540) were 48 % similar at the amino acid level, whereas these two sequences were 46 % similar to the identified TYDC sequence (Os01g56380), suggesting that annotation of TDCs and TYDCs based on amino acid identity is unreliable. Even though most TDC/TYDC sequences do not contain introns, they are functional genes with gene products able to catalyze decarboxylation of either tryptophan or tyrosine [37]. Moreover, examination of the rice sequences revealed that they all had the conserved lysine residue for binding the pyridoxal phosphate cofactor.

The phylogenetic tree for AO showed two separate lineages, one for dicots and the other for monocots. Sequences from each species in the tree were usually most closely related to other homologues in the same plant rather than to sequences from other species. This suggested that each group of plants inherited one AO gene that later underwent recent duplications, as sequences from each group shared high similarity, reaching up to 85 % for rice AOs. It is well established that AO plays an important role in abscisic acid synthesis. AO has been shown to oxidize abscisic aldehyde efficiently to abscisic acid in Arabidopsis rosette leaves as well as in pea roots and leaves [25, 42]. Three abscisic acid-deficient mutants that are defective in the Arabidopsis aldehyde oxidase 3 gene (AAO3) have been isolated and characterized (aao3-1, aao3-2, and aao3-3) [43, 44].
The abscisic acid level was found to be significantly reduced in these mutants, suggesting that AOδ is involved in the control of abscisic acid synthesis in Arabidopsis leaves and seeds. It is clear from the discussion above that AO has stronger experimental links to abscisic acid synthesis than to IAA synthesis. Phylogenetic results showed that all studied plants had more than one AO sequence. However, sequences from the same species were much more similar to each other than to sequences from other species, indicating that these plants inherited only one AO, which was then recently duplicated. Based on these observations, it was expected that the conserved AO is involved in abscisic acid synthesis but not in IAA synthesis.

The phylogenetic tree of nitrilase (Fig. 7) clearly showed two clades. One of these had representatives from across the plant kingdom, including lower plants. The second branch, however, had representatives only from members of the Brassicaceae. Evidence for catalytic activity with IAN has been obtained for members of both branches [29, 45]. AtNIT1, AtNIT2, AtNIT3, and AtNIT4 expressed in Escherichia coli have been found to produce IAA from IAN [46, 47]. Normanly et al. [5] generated point mutations in AtNIT1 that resulted in low sensitivity to IAN. The sensitivity to IAN and the rate of conversion of exogenous IAN to IAA in the nit1 lines were increased by overexpressing AtNIT2. Moreover, the amount of exogenous [13C6]IAN hydrolyzed to [13C6]IAA was reported to be fourfold higher in the overexpressing lines than in the control plants. Vorwerk et al. [45] found that AtNIT2 in the second branch was expressed only, but strongly, during embryo development and maturation. Together with the observation that embryo development depends on local auxin supply, this suggested a role for AtNIT2 in IAA synthesis [45]. In addition to auxin biosynthesis, IAN has been investigated in Arabidopsis for its role in indole glucosinolate metabolism [48] and in production of the indolic phytoalexin camalexin [4]. Being the major sinks for tryptophan and IAOx, glucosinolates and camalexin further complicate investigation of IAA synthesis in Arabidopsis.
Among a wide group of compounds, IAN is the least preferred substrate for AtNIT1, AtNIT2, and AtNIT3 in the second branch, suggesting that these enzymes may have a role in the production of secondary metabolites rather than auxin [45]. Phylogenetic analysis supported this suggestion, as NIT sequences in the second branch came only from glucosinolate-producing species.

In the first branch, AtNIT4 showed high activity toward β-cyano-Ala but not IAN. Tobacco nitrilase in the same branch, in addition to protein extracts of A. thaliana, N. tabacum, and Lupinus angustifolius, was also reported to catalyze the hydrolysis of β-cyano-Ala [49]. On the other hand, ZmNIT2 from the same branch showed 7 to 20 times higher activity toward IAN than AtNIT1, AtNIT2, and AtNIT3. Unlike for AtNIT4, β-cyano-Ala is not a substrate for ZmNIT2. Among 18 tested nitrile compounds, IAN was the preferred substrate for ZmNIT2 [29]. Additionally, endogenous IAN and ZmNIT2 activity were detected in maize kernels, where high levels of IAA are synthesized. Taken together, Park et al. [29] concluded that ZmNIT2 is involved in IAA synthesis in maize kernels.

Phylogenetic analysis showed that Sorghum, maize, and rice had two isoforms of AtNIT4, suggesting that the catalytic activity of grass NITs may be determined by subunit structure. Jenrich et al. [50] have shown that grasses have two different homologues of AtNIT4. Neither of these homologues alone was capable of catalyzing the conversion of β-cyano-Ala, whereas the heteromeric complexes had this activity. Furthermore, the S. bicolor heteromeric complex has been reported to have the highest activity toward IAN ever reported for plant nitrilases.

The importance of NIT in IAA synthesis was further supported by knockout studies. Kriechbaumer et al. [51] have shown that the total IAA concentration of the Zmnit2 knockout mutant was reduced by 35 % in maize kernels and 25 % in roots. Pollmann et al. [52] isolated an IAA-synthase complex from 14 different species representing nine families and found that the activity of IAA synthase to produce IAA was reduced by 18 % in the triple knockout mutant of nit1-3. The free IAA content in the knockout mutant was found to be 29 % less than that of the sterile-grown wild type.


It appeared from the phylogenetic tree of NIT that this enzyme was conserved in the first branch, as it is widespread across the plant kingdom. Nevertheless, this tree differs from the YUCCA and TDC trees in that sequences from each group were similar to each other rather than to sequences from other groups. Therefore, results from phylogenetic analysis alone may not be enough to judge a possible role of nitrilase in IAA synthesis. The phylogenetic tree of NIT indicated that NIT4 orthologues were conserved across the plant kingdom. However, contradictory evidence regarding the ability of these enzymes to accept IAN as a substrate requires further investigation.

References

1. Stepanova, A., et al. (2008). Cell, 133(1), 177–191.
2. Tao, Y., et al. (2008). Cell, 133(1), 164–176.
3. Yamada, M., et al. (2009). Plant Physiology, 151, 168–179.
4. Nafisi, M., et al. (2007). Plant Cell, 19(6), 2039–2052.
5. Normanly, J., et al. (1997). Plant Cell, 9(10), 1781–1790.
6. Bak, S., et al. (2001). Plant Cell, 13(1), 101–111.
7. Noe, W., Mollenschott, C., & Berlin, J. (1984). Plant Molecular Biology, 3(5), 281–288.
8. Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). Nucleic Acids Research, 22, 4673–4680.
9. Fitch, W. M. (1981). Journal of Molecular Evolution, 18(1), 30–37.
10. Fitch, W. M. (1971). Systematic Zoology, 20, 406–416.
11. Felsenstein, J. (1981). Journal of Molecular Evolution, 17(6), 368–376.
12. Dayhoff, M. O. (1978), in A model of evolutionary change in proteins, vol. 5: Atlas of Protein Sequence and Structure (Dayhoff, M.O., ed.), Washington, D.C., pp 345–352.
13. Kimura, M. (1983) The neutral theory of molecular evolution, Cambridge, pp 50–56.
14. George, D. G., Hunt, L. T., & Barker, W. C. (1988), in Current methods in sequence comparison and analysis, vol. 2: Macromolecular Sequencing and Synthesis (Schlesinger, D.H. and Liss, A.R., ed.), New York, pp 127–149.
15. Felsenstein, J. (1989). Cladistics, 5, 164–166.
16. Zhao, Y. D., et al. (2001). Science, 291(5502), 306–309.
17. Exposito-Rodriguez, M., et al. (2007). Journal of Plant Growth Regulation, 26(4), 329–340.
18. Gallavotti, A., et al. (2008). Proceedings of the National Academy of Sciences of the United States of America, 105(39), 15196–15201.
19. Tobena-Santamaria, R., et al. (2002). Genes & Development, 16(6), 753–763.
20. Yamamoto, Y., et al. (2007). Plant Physiology, 143, 1362–1371.
21. Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Nucleic Acids Research, 25, 3389–3402.
22. Yamazaki, Y., et al. (2003). Plant & Cell Physiology, 44(4), 395–403.
23. Lopez-Meyer, M., & Nessler, C. L. (1997). Plant Journal, 11(6), 1167–1175.
24. De Luca, V., Marineau, C., & Brisson, N. (1989). Proceedings of the National Academy of Sciences of the United States of America, 86(8), 2582–2586.
25. Seo, M., et al. (2000). Plant Journal, 23(4), 481–488.
26. Rajagopal, R. (1971). Physiologia Plantarum, 24, 272–281.
27. Bower, P. J., Brown, H. M., & Purves, W. K. (1978). Plant Physiology, 61(1), 107–110.
28. Koshiba, T., et al. (1996). Plant Physiology, 110(3), 781–789.
29. Park, W. J., et al. (2003). Plant Physiology, 133(2), 794–802.
30. ANGIS/BioManager. 2009, The University of Sydney.
31. Page, R. D. M. (1996). Computer Applications in the Biosciences, 12, 357–358.
32. Woo, Y. M., et al. (2007). Plant Molecular Biology, 65(1–2), 125–136.
33. Kaminaga, Y., et al. (2006). Journal of Biological Chemistry, 281(33), 23357–23366.
34. Kawalleck, P., et al. (1993). Journal of Biological Chemistry, 268(3), 2189–2194.
35. Cheng, Y. F., Dai, X. H., & Zhao, Y. D. (2006). Genes & Development, 20(13), 1790–1799.
36. Cheng, Y. F., Dai, X. H., & Zhao, Y. D. (2007). Plant Cell, 19(8), 2430–2439.
37. Kang, S., et al. (2008). Planta, 227(1), 263–272.
38. Facchini, P. J., & De Luca, V. (1995). Phytochemistry, 38(5), 1119–1126.
39. Marques, I. A., & Brodelius, P. E. (1988). Plant Physiology, 88(1), 52–55.
40. Islas, I., Loyola-Vargas, V. M., & Miranda-Ham, M. D. (1994). In Vitro Cell Dev-Pl, 30(1), 81–83.
41. Songstad, D. D., et al. (1990). Plant Physiology, 94(3), 1410–1413.
42. Zdunek-Zastocka, E., et al. (2004). Journal of Experimental Botany, 55(401), 1361–1369.
43. Gonzalez-Guzman, M., et al. (2004). Plant Physiology, 135(1), 325–333.
44. Seo, M., et al. (2000). Proceedings of the National Academy of Sciences of the United States of America, 97(23), 12908–12913.
45. Vorwerk, S., et al. (2001). Planta, 212(4), 508–516.
46. Bartel, B., & Fink, G. (1994). Proceedings of the National Academy of Sciences of the United States of America, 91(14), 6649–6653.
47. Bartling, D., et al. (1994). Proceedings of the National Academy of Sciences of the United States of America, 91(13), 6021–6025.
48. Wittstock, U., & Halkier, B. (2002). Trends in Plant Science, 7(6), 263–270.
49. Piotrowski, M., Schonfelder, S., & Weiler, E. W. (2001). Journal of Biological Chemistry, 276(4), 2616–2621.
50. Jenrich, R., et al. (2007). Proceedings of the National Academy of Sciences of the United States of America, 104(47), 18848–18853.
51. Kriechbaumer, V., et al. (2007). Journal of Experimental Botany, 58(15–16), 4225–4233.
52. Pollmann, S., Düchting, P., & Weiler, E. (2009). Phytochemistry, 70(4), 523–531.
