Research

The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function Marcßal Soler1, Eduardo Leal Oliveira Camargo1, Victor Carocha1,2,3, Hua Cassan-Wang1, Helene San Clemente1, Bruno Savelli1, Charles A. Hefer4, Jorge A. Pinto Paiva3,5, Alexander A. Myburg6,7 and Jacqueline Grima-Pettenati1 1

LRSV Laboratoire de Recherche en Sciences Vegetales, UMR5546, Universite Toulouse III/CNRS, BP 42617 Auzeville, 31326 Castanet Tolosan, France; 2Instituto de Tecnologia Quımica e

Biologica (ITQB), Universidade Nova de Lisboa, Av. da Rep ublica, 2780-157, Oeiras, Portugal; 3Instituto de Biologia Experimental e Tecnologica (iBET) Av. da Rep ublica, Quinta do Marqu^es, 2781-901 Oeiras, Portugal; 4Bioinformatics and Computational Biology Unit, Department of Biochemistry, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private Bag X20, Pretoria 0028, South Africa; 5Instituto de Investigacßao Cientıfica e Tropical (IICT/MNE) Palacio Burnay - Rua da Junqueira, 30, 1349-007 Lisboa, Portugal; 6

Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private Bag X20, Pretoria 0028, South Africa; 7Genomics Research Institute (GRI),

University of Pretoria, Private Bag X20, Pretoria 0028, South Africa

Summary Author for correspondence: Jacqueline Grima-Pettenati Tel: +33 534323813 Email: [email protected] Received: 24 June 2014 Accepted: 5 August 2014

New Phytologist (2014) doi: 10.1111/nph.13039

Key words: cambium, Eucalyptus grandis, evolution, R2R3-MYB transcription factors, secondary growth, tandem duplications, wood formation.

 The R2R3-MYB family, one of the largest transcription factor families in higher plants, con-

trols a wide variety of plant-specific processes including, notably, phenylpropanoid metabolism and secondary cell wall formation. We performed a genome-wide analysis of this superfamily in Eucalyptus, one of the most planted hardwood trees world-wide.  A total of 141 predicted R2R3-MYB sequences identified in the Eucalyptus grandis genome sequence were subjected to comparative phylogenetic analyses with Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera. We analysed features such as gene structure, conserved motifs and genome location. Transcript abundance patterns were assessed by RNAseq and validated by high-throughput quantitative PCR.  We found some R2R3-MYB subgroups with expanded membership in E. grandis, V. vinifera and P. trichocarpa, and others preferentially found in woody species, suggesting diversification of specific functions in woody plants. By contrast, subgroups containing key genes regulating lignin biosynthesis and secondary cell wall formation are more conserved across all of the species analysed.  In Eucalyptus, R2R3-MYB tandem gene duplications seem to disproportionately affect woody-preferential and woody-expanded subgroups. Interestingly, some of the genes belonging to woody-preferential subgroups show higher expression in the cambial region, suggesting a putative role in the regulation of secondary growth.

Introduction Eucalyptus grandis is one of the most widely planted species in industrial plantations because of its fast growth and high-quality wood. It is also the second hardwood forest tree whose genome has been sequenced (Myburg et al., 2014), which represents a crucial milestone in investigating wood formation at the genomic level. Moreover, the genus Eucalyptus belongs to a basal rosid lineage, which evolved mostly in the isolation of the Australian continent and therefore represents an independent evolutionary experiment for studies of the woody perennial lifestyle (Myburg et al., 2014). Wood is the most abundant raw material on Earth and is used in many industrial applications. Wood properties and final uses are mainly determined by the relative abundance of the three major polymers deposited in the secondary cell walls of xylem cells, namely cellulose (40–50%), hemicellulose (25%) and lignin (25–35%) (Plomion et al., 2001). Wood, which is also called Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

secondary xylem, is formed during secondary (radial) growth through the activity of the vascular cambium, a continuous layer of meristematic cells on the outer periphery of the stem. Cambial initials produce xylem mother cells that differentiate into mature xylem cells (vessels and fibres) through cell division, elongation, secondary cell wall formation (including lignin deposition) and programmed cell death (Hertzberg et al., 2001; Ye, 2002). Cambial activity and xylem differentiation are spatially and temporally regulated, mainly at the transcriptional level, involving a complex hierarchical network of transcription factors (Zhong et al., 2010). R2R3-MYB genes constitute one of the largest families of transcription factors in plants and regulate many aspects of plant biology, such as primary and secondary metabolism, cell fate, developmental processes and responses to biotic and abiotic stresses (Jin & Martin, 1999; Dubos et al., 2010). The first demonstration that R2R3-MYB genes regulate lignin deposition came from transgenic tobacco (Nicotiana tabacum) plants overexpressing AmMYB308 and AmMYB330 from Antirrhinum majus New Phytologist (2014) 1 www.newphytologist.com

2 Research

(Tamagnone et al., 1998). After this discovery, several R2R3MYB genes were shown to regulate aspects of phenylpropanoid metabolism, including the biosynthesis and deposition of lignin during secondary wall formation (reviewed by Rogers & Campbell, 2004; Dubos et al., 2010; Grima-Pettenati et al., 2012). Some R2R3-MYB genes have also been characterized in woody plants such as Populus and Pinus (Patzlaff et al., 2003; GomezMaldonado et al., 2004; Karpinska et al., 2004; Bomal et al., 2008; McCarthy et al., 2010; Winzell et al., 2010). Only two R2R3MYB genes have been characterized to date in Eucalyptus (Eucalyptus gunnii), EgMYB1 and EgMYB2, two master regulators acting, respectively, as repressors and activators of secondary cell wall formation (Goicoechea et al., 2005; Legay et al., 2007, 2010). R2R3-MYB proteins have a highly conserved N-terminal DNA-binding domain composed of two adjacent repeats of the MYB domain (the R2R3 MYB domain) and a highly variable C-terminal activation or repression domain (reviewed in Jin & Martin, 1999; Rogers & Campbell, 2004). In Arabidopsis thaliana, phylogenetic reconstruction of the R2R3-MYB family was performed by motif detection in the C-terminal variable region to define 22 subgroups of genes, although for many genes no motif was found to group them (Kranz et al., 1998; Stracke et al., 2001). Phylogenetic studies including studies on rice (Oryza sativa; Jiang et al., 2004; Yanhui et al., 2006), grapevine (Vitis vinifera; Matus et al., 2008), poplar (Populus trichocarpa; Wilkins et al., 2009) and, more recently, maize (Zea mays; Du et al., 2012a), soybean (Glycine max; Du et al., 2012b), apple (Malus domestica; Cao et al., 2013) and switchgrass (Panicum virgatum; Zhao & Bartley, 2014) were also performed, showing that most of the subgroups described in A. thaliana were conserved. However, in some cases new subgroups absent in A. thaliana were found. Sequence-based homology classification is important to construct hypotheses about the function of R2R3-MYB genes not yet studied in model species, as genes in the same subgroup are thought to have relatively similar roles (Jiang et al., 2004), and also to determine the function of genes in nonmodel species by defining putative orthology relationships with known genes in model species. In this study, we took advantage of the newly sequenced Eucalyptus grandis genome to characterize the entire R2R3-MYB family, including its phylogenetic relationships with other plant lineages, genomic distribution, expansion by tandem duplication events, exon–intron structure and gene expression. Because of the industrial importance of wood in Eucalyptus, we focus on R2R3MYB genes putatively involved in the regulation of secondary cell wall formation and woody biomass production.

Materials and Methods Identification of R2R3-MYB gene models and phylogenetic analysis Eucalyptus grandis R2R3-MYB genes were retrieved from the E. grandis genome version 1.1 available in Phytozome (http://www.phytozome.net/) using INTERPROSCAN (Zdobnov & Apweiler, 2001) to search for the MYB domain signature. In total, 167 predicted gene models were found with two New Phytologist (2014) www.newphytologist.com

New Phytologist consecutive repeats of the MYB domain (excluding alternative transcripts). All but one (166) were also retrieved when performing a BLASTP analysis using the previously identified E. grandis genes against the whole A. thaliana R2R3-MYB gene data set (Dubos et al., 2010) with a cut-off e-value of e-40. The number of gene models was reduced to 141 after manual curation to eliminate partial (19) and uncertain (six) sequences. We applied the same procedure to identify and retrieve the R2R3-MYB sequences from Populus trichocarpa (v2.2), V. vinifera (v2010) and rice (O. sativa) (v7.0). The number of R2R3-MYB gene models identified by our methodology in the genome of A. thaliana (126) was the same as described in the literature (Dubos et al., 2010). However, the numbers of genes in P. trichocarpa (180), V. vinifera (123) and O. sativa (106) were different from those previously reported (192, 108 and 109, respectively; Yanhui et al., 2006; Matus et al., 2008; Wilkins et al., 2009). Taking into account that in P. trichocarpa 21 predicted genes could be alleles, whereas in V. vinifera a posterior study modified the number of R2R3-MYB proteins to 118 (Wilkins et al., 2009), the number of proteins found in our approach was very close to that reported, with just small differences that could be explained by the use of an updated version of the genome or by a different search methodology. R2R3-MYB sequences were aligned using MAFFT with the FFT-NS-i algorithm (Katoh et al., 2002) and evolutionary history was inferred by constructing a neighbour-joining phylogenetic tree with 1000 bootstrap replicates using MEGA5 (Tamura et al., 2011). Sequences were compared using the complete deletion method in MEGA5, which does not consider the positions containing gaps or missing data (mostly contained outside the conserved R2R3-MYB domain) as performed previously by Matus et al. (2008). The evolutionary distances were computed using the Jones–Taylor–Thornton substitution model and the rate variation among sites was modelled with a gamma distribution of 1. All of the E. grandis R2R3-MYBs belonging to subgroups with members in E. grandis, P. trichocarpa and V. vinifera but without members in A. thaliana and O. sativa (putative woody-preferential subgroups) were used in a reverse BLAST search against several different available plant genomes. The best hit obtained for a BLAST search in each plant genome was then used in another BLAST search against the E. grandis genome, and only if this retrieved the original E. grandis gene, or another member of the same subgroup, did we consider the subgroup to exist in the analysed genome. The following plant genomes, available in Phytozome, were used: Citrus sinensis (v1.1), Gossypium raimondii (v2.1), A. thaliana (TAIR annotation release 10), Capsella rubella (v1.0), Brassica rapa (v1.2), Carica papaya (39 draft), Medicago truncatula (v3.5), Prunus persica (v1.0), Malus domestica (v1.0), Cucumis sativus (v1), Manihot esculenta (v4.1), Ricinus communis (v0.1), Linum usitatissium (v1.0), P. trichocarpa (v2.2), V. vinifera (v2010), Solanum lycopersicum (v2.3), Mimmulus guttatus (v1.1), Aquilegia coerulea (v1.1), O. sativa (v7.0), Zea mays (v5b.60), Selaginella moellendorfii (v1.0) and Physcomitrella patens (v1.6). The same strategy was applied using the EST (Expressed Sequence Tags) database (http://www.ncbi.nlm.nih. gov/dbEST/) for Pinus spp. and Picea spp. Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist In silico characterization of the Eucalyptus grandis R2R3-MYB family Information on intron–exon position and splicing sites for each gene model in the corresponding chromosome scaffold was extracted from the Phytozome database (E. grandis annotation v1.1). Gene sequences were normalized to start always at position 1 and to have the same orientation. FANCYGENE (Rambaldi & Ciccarelli, 2009) was used to obtain a graphic representation of the structure of all the genes. Gene models were plotted according to their physical position in the 11 chromosome scaffolds representing the 11 linkage groups of E. grandis using MAPCHART 2.2 (Voorrips, 2002). Genes belonging to the same subgroup, physically close (within 100 kb of each other) on a chromosome scaffold, and separated by 10 or fewer spacer gene models were considered to be in tandem duplicated arrays, similar to the procedure of Hanada et al. (2008). The same methodology was used to plot the gene models from P. trichocarpa, V. vinifera, A. thaliana and O. sativa. The rate of synonymous versus nonsynonymous substitutions per site in sequence pairs was calculated for each cluster of tandem duplicated genes using the codon-based Z-test for selection in MEGA5 with the Nei–Gojobori method using purifying selection as an alternative hypothesis (Tamura et al., 2011). Positions containing gaps and missing data were not considered for the analysis. Amino acid motif identification was performed using all the sequences in each woody-preferential subgroup using the MEME Suite (Bailey et al., 2009) with parameters described by Jiang et al. (2004). RNAseq expression analysis RNAseq data from six different tissues of three field-grown E. grandis trees, as described in Mizrachi et al. (2010), were obtained from EucGenIE (http://www.eucgenie.org/). Mean fragments per kilobase of exon per million fragments mapped (FPKM) values per gene were calculated for each tissue, and FPKM values of 0 were set to 1 before performing log2 transformation. Values were standardized using EXPANDER 6 (Ulitsky et al., 2010), resulting in a mean of 0 and a variance of 1 across tissues. Standardization was required because the absolute FPKM values were too divergent among genes for confident clustering of transcript abundance values. K-means clustering with EXPANDER 6 (Ulitsky et al., 2010) grouped the genes into 12 main clusters with unique expression patterns. Microfluidic quantitative PCR (qPCR) expression analysis Juvenile xylem, mature xylem, tension xylem, opposite xylem, primary stem, secondary stem, young leaves (expanding leaves), mature leaves (fully expanded leaves), lateral roots, floral buds and fruit capsules were collected from Eucalyptus globulus. Shoot tips, secondary xylem, secondary phloem and a cambiumenriched fraction were collected from 7-yr-old and 25-yr-old trees of Eucalyptus ‘Gundal’ hybrids (Eucalyptus Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

Research 3

gunnii 9 Eucalyptus dalrympleana). Vascular tissues were sampled as described in Foucart et al. (2009). Briefly, bark was removed from the trunk and a cambium-enriched fraction was obtained by gently scraping both exposed surfaces (i.e. the inner side of the bark and the external side of the xylogenic tissue), thereby including cambial initials as well as xylem and phloem mother cells. After sampling of the cambium-enriched region, secondary phloem and xylem were obtained by deeper scraping of the exposed surfaces. Plant material provenance, RNA extraction and cDNA synthesis are described in Cassan-Wang et al. (2012). Gene-specific primers for microfluidic qPCR were designed using QUANTPRIME (Arvidsson et al., 2008) with default parameters (Supporting Information Table S1). Transcript abundance was assessed by microfluidic qPCR using the BioMark® 96.96 Dynamic Array platform (Fluidigm, San Francisco, CA, USA) as explained in Cassan-Wang et al. (2012). A dissociation step was performed after amplification to confirm the presence of a single amplicon. The contribution of primer dimers in the fluorometric reads and the absence of environmental contamination were assessed using nontemplate controls. When no amplification was detected, the cycle threshold (Ct) was assigned a value of 30, an arbitrary value when transcript was not detected (this was not done for the housekeeping genes, as it was not necessary and may introduce data bias). Amplification efficiency for each gene was calculated based on a dilution series (Table S1) and relative transcript abundance was calculated according to Pfaffl (2001). A mix with equal amounts of each sample was used to standardize data, and the geometric mean of five validated housekeeping genes, SAND (Eucgr.B02502), PP2A1 (Protein Phosphatase 2A1, Eucgr.B03386), EF1a (Elongation Factor 1 alpha, Eucgr.B02473), IDH (NADP Isocitrate dehydrogenase, Eucgr.F02901) and PP2A3 (Protein Phosphatase 2A3, Eucgr.B03031), was used as reference to normalize data (Cassan-Wang et al., 2012). Finally, data were log2transformed and hierarchical clustering was performed using EXPANDER 6 (Ulitsky et al., 2010).

Results Comparative phylogenetic analysis of the R2R3-MYB family in five angiosperm species A total of 141 R2R3-MYB gene models were identified in the E. grandis genome after manual curation and exclusion of alternative transcripts. In addition to the accession numbers attributed by Phytozome (http://www.phytozome.net/), we assigned short names to the 141 E. grandis R2R3-MYB genes. We kept the names of EgrMYB1 and EgrMYB2, the two first and only Eucalyptus R2R3-MYB genes functionally characterized to date (Goicoechea et al., 2005; Legay et al., 2007, 2010). The 139 remaining genes were numbered consecutively, from the first gene on the first chromosome scaffold (EgrMYB3) to the last gene on the last chromosome scaffold (EgrMYB141). Correspondence of gene names with accession numbers is shown in Table S2. All identified R2R3-MYB genes from E. grandis (141) were aligned with those of A. thaliana (126) and their evolutionary New Phytologist (2014) www.newphytologist.com

4 Research

history was inferred by constructing a neighbour-joining phylogenetic tree. Taking into account the topology of the tree and the bootstrap values, 42 subgroups were defined (Fig. S1), with high bootstrap values defining most of the subgroups (above 75 for > 90% of the subgroups). Because A. thaliana is, by far, the species in which the R2R3-MYB genes have been most extensively studied and because nearly all of the subgroups contain at least one A. thaliana gene, we used the nomenclature of Kranz et al. (1998) revised by Stracke et al. (2001) and Dubos et al. (2010) to name the subgroups. When no subgroup name had already been proposed in A. thaliana, we named the subgroup after the bestknown functionally characterized A. thaliana member. We then used all of the R2R3-MYB genes from V. vinifera (123), P. trichocarpa (180) and O. sativa (106) identified, and we included them with those from A. thaliana and E. grandis to construct a neighbour-joining phylogenetic tree. The topology of the resulting tree was mostly the same as obtained when using only sequences from E. grandis and A. thaliana, although bootstrap values were somewhat lower because 676 sequences from five different plant species were used (Figs 1, S2; Table S3). Generally, a good agreement exists between the phylogeny presented herein and the phylogeny described previously for A. thaliana (Stracke et al., 2001; Dubos et al., 2010), with nearly all of the subgroups maintained in this five-species phylogeny. The only exceptions are the A. thaliana genes of the subgroups S9 and S18, which were split in two (S9a, S9b, S18a and S18b), and the genes from subgroups 2 and 3, and 10 and 24, which were merged (S2&3 and S10&24). Although, in general, bootstrap values used to define subgroups were relatively high, branches at the base of the tree were very short and had low bootstrap support, probably because many gene duplications in the R2R3-MYB proteins took place early in the history of land plants (Rabinowicz et al., 1999) and they were followed by rapid diversification of the family (Dias et al., 2003). Inside the subgroups, bootstrap values were also low, impairing in many cases the identification of direct coorthology relationships between genes from different species, probably because many genes evolved differently after divergence from a common ancestor, including several events of lineage-specific gene duplication and gene loss. The vast majority of the subgroups (70%) contained members from all five species, whereas the remaining subgroups were found only in some species. For instance, three subgroups (S6, S15 and SAtM82) were present in all species but rice. These subgroups were also absent in maize (Du et al., 2012a), suggesting that they are not present in monocots. Some subgroups were specific to A. thaliana, V. vinifera or rice. The A. thaliana-specific subgroup S12, for instance, is known to contain members involved in the regulation of glucosinolate biosynthesis, which is specific to the Brassicaceae family (Matus et al., 2008). Three subgroups (S5, S6 and SAtM5, highlighted in yellow in Fig. 1) contained a significantly higher number of sequences from the woody perennial species (E. grandis, V. vinifera and P. trichocarpa), as compared with the herbaceous species, (A. thaliana and O. sativa). We designated these subgroups as putative ‘woody-expanded subgroups’ (Fig. 1). Remarkably, A. thaliana New Phytologist (2014) www.newphytologist.com

New Phytologist members of these three woody-expanded subgroups are involved in the phenylpropanoid pathway, more specifically in proanthocyanidin and anthocyanin biosynthesis (AtMYB123/TT2; Nesi et al., 2001; AtMYB75/PAP1, AtMYB90/PAP2, AtMYB113, and AtMYB114; Borevitz et al., 2000; Gonzalez et al., 2008; AtMYB5; Gonzalez et al., 2009), suggesting diversified regulation of these pathways in woody plants. Five subgroups were completely absent from A. thaliana and rice, containing only members from the three woody species (E. grandis, V. vinifera and P. trichocarpa). In order to determine whether these five subgroups were woody-specific, we assessed their occurrence in other plant genomes. We performed a reverse BLAST search in which all of the E. grandis genes belonging to these subgroups were used in similarity searches against the genomes of other plant species from diverse phylogenetic divisions (Fig. 2). The results showed that all five subgroups were completely absent in all the Bryophyte, Lycophyte and Monocot genomes analysed, but were present to different extents in Eudicots (except in the Brassicaceae family) and Gymnosperms, the two phylogenetic divisions containing cambium-derived woody plants. Within the Eudicots, we found the presence of members of at least four of these five subgroups in the perennial trees and shrubs (E. grandis, C. sinensis, G. raimondii, C. papaya, P. persica, M. domestica, M. esculenta, P. trichocarpa and V. vinifera) with the only exception being R. communis. In contrast, we found fewer subgroups (from one to three) in nonwoody Eudicot plants (M. truncatula, C. sativus, L. usitatissium, S. lycopersicum, M. guttatus and A. coerulea) and zero in nonwoody Brassicacea (A. thaliana, C. rubella and B. rapa). Therefore, accordingly, we named the five subgroups ‘woody-preferential subgroups’ (WPS-I, II, III, IV and V) as potentially involved in regulating aspects of cambium-derived woody growth that may be absent or less developed in some plant lineages. In silico characterization of the E. grandis R2R3-MYB gene models Analysis of the exon–intron structure revealed that most of the E. grandis R2R3-MYB genes contain three exons (Fig. S3). In most of these three-exon R2R3-MYB genes, exons 1 and 2 encoded nearly the whole MYB domain, whereas the third exon encoded the rest of the MYB domain and the C-terminal region of the protein. This structure is similar to that described for A thaliana and V. vinifera (Matus et al., 2008). The size of the third exon is much more variable than those of the first two exons, and changes in the sequence of the third exon are thought to be associated with the functional divergence among R2R3-MYB genes (Dias et al., 2003; Matus et al., 2008). Within each subgroup, all the genes had similar patterns of intron–exon structure, as was also observed by Jiang et al. (2004). In general, the sizes of the introns were variable, contrasting with the positions and phases of the intron sites, which were quite conserved. Indeed, the great majority of the genes contained two conserved intron sites, one in the R2 domain and the other in the R3 domain, in phases 1 and 2, respectively (Fig. S3). The remaining genes exhibited different splicing patterns, Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist

Research 5 NJ tree

Group name

78

45

87

4

4

0

2 3

2 2

1 3

0 0

WPS-I

6

3

5

0

0

S5

16

11

6

1

2

WPS-II SAtM5

3

1

5

2

7

Woody Preferential Subgroup III

WPS-III

4

4

5

0 1 0

0

2

Subgroup 4 + AtMYB6 & AtMYB8

S4

4

5

7

6

5

S7

1

2

3

3

2

S14

10

7

7

6

9

SAtM35 SAtM80 WPS-IV

1 1 6

2 1 1

3 2 4

1 1

WPS-V

2

2

4

0 0

1 1 0

S2&3

4

5

5

7

5

S1

4

3

4

5

7

S9a

0 1

0 3

0 4

2 2

0 3

S12

0 0

2 0

0 0

0 6

0 0

Woody Preferential Subgroup II Subgroup AtMYB5

86 95 44 54 64

Subgroup 7

94 48

Os

10

Subgroup 5

26

At

1 4

Woody Preferential Subgroup I

94

Ptr

11

Subgroup 15

98

Vv

S6

Subgroup AtMYB82

33

Egr

SAtM82 S15

Subgroup 6

98

Short name

Subgroup 14

1 0

41 97 48 34

Subgroup AtMYB35 Subgroup AtMYB80

99

64

Woody Preferential Subgroup IV Woody Preferential Subgroup V

38

Subgroup 2 & Subgroup 3 + AtMYB10 & AtMYB72

31 97

Subgroup 1 AtMYB47 & AtMYB95

99 40

Subgroup 9a

89

GSVIVT01013979001 & GSVIVT01024357001 Subgroup 12

75 58

S11

5

3

6

4

1

S9b

2

2

3

1

2

S10&24

5

5

7

6

6

Subgroup 13 + AtMYB26 + AtMYB67

S13

8

7

12

6

10

Subgroup AtMYB46 & AtMYB83

SAtM46 SAtM103 S16

2 1 2

1 1 1

4 2 2

2 1 3

1 1 3

Subgroup AtMYB20, AtMYB40, AtMYB42, AtMYB43, AtMYB85 & AtMYB99

SAtM85

6

3

7

6

5

Subgroup AtMYB27, AtMYB48 & AtMYB59

SAtM59

1

2

3

3

3

Subgroup AtMYB71, AtMYB79 & AtMYB121

SAtM71 S19

1 2

2 1

4 2

3 3

3 2

S20

4

5

8

6

7

Subgroup 18a

S18a

2

4

5

6

5

Subgroup AtMYB125

Subgroup 18b (AtMYB104)

SAtM125 SAtM91 S18b

1 1 0

1 2 0

3 3 0

1 1 1

1 1 1

Subgroup 21 + AtMYB89

S21

6

6

13

8

4

Subgroup 22

S22

5

2

8

4

5

Subgroup 23

S23

2 0

2 0

1 0

3 0

1 1

Subgroup 25 + AtMYB98

S25

2

3

6

7

6

Subgroup AtMYB88 & AtMYB124

SAtM88

2

1

2

2

1

141

123

180

126

106

Subgroup 11+ AtMYB49 Subgroup 9b

38

Subgroup 10 + Subgroup 24

28

0

27 54 99

68

Subgroup AtMYB103

99

Subgroup 16

94

88 95

72

72

44

82

Subgroup 19 + AtMYB57

97

Subgroup 20

65

99

99

58

99

Subgroup AtMYB91

37 50 34 35 65

LOC_Os03g13310.1 & LOC_Os08g34960.1

77

78

44 44 99

0.2

Fig. 1 Neighbour-joining phylogenetic tree constructed using 676 amino acid sequences (Supporting Information Table S2) including all of the R2R3-MYB proteins from Eucalyptus grandis (Egr), Vitis vinifera (Vv), Populus trichocarpa (Ptr), Arabidopsis thaliana (At) and Oryza sativa (Os). Each triangle represents a R2R3-MYB subgroup (information about which genes are in each subgroup is included in Table S3), defined based on the topology of the tree and the bootstrap values. Bootstrap values are shown next to the branches (values < 25 are not shown). Two subgroups (S10&24 and S18a) had low bootstrap values, suggesting high divergence between these proteins from these five different plant species. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree (amino acid substitutions per site). Subgroup names are included next to each clade together with a short name to simplify nomenclature. The number of genes of each species for each subgroup is also included. Subgroups in general expanded in woody species (E. grandis, V. vinifera and P. trichocarpa) are highlighted in yellow, whereas subgroups preferentially found in woody species are highlighted in red. Other groups are in grey. Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist (2014) www.newphytologist.com

6 Research

New Phytologist

Fig. 2 Presence/absence of genes belonging to the woody-preferential subgroups in different plant genomes from diverse phylogenetic divisions obtained by a reverse BLAST strategy. Black boxes indicate the presence of woody-preferential subgroups, whereas white boxes indicate their absence.

which were generally conserved within the same subgroups in different species (Jiang et al., 2004). The E. grandis R2R3-MYB genes are distributed throughout the 11 chromosome scaffolds, with chromosome scaffolds B, C and J containing more genes than the others (Fig. 3). Interestingly, whereas chromosome scaffold G contains the smallest number of R2R3-MYB genes, it does contain the E. grandis orthologues of the two well-described EgMYB1 and EgMYB2 genes involved in secondary cell wall formation (Goicoechea et al., 2005; Legay et al., 2007, 2010). Ten tandem duplication arrays are highlighted in Fig. 3. Remarkably, these tandem arrays include mostly genes from the woody-expanded and woody-preferential subgroups. These subgroups also contain many gene models in tandem duplication arrays in P. trichocarpa and V. vinifera, but not in A. thaliana and O. sativa (Fig. S4). Analysis of the ratio of synonymous to nonsynonymous substitutions per site revealed that nearly all tandem duplicated genes were under purifying selection (Table S4). This type of selection, in which the rate of synonymous mutations is greater than that of nonsynonymous mutations, is the most common form of evolution for genes under functional constraints. However, for some gene pairs from subgroup S6 on scaffold J, the rates of synonymous and nonsynonymous substitutions per site were similar, suggesting that these genes evolved without selective constraints. This type of neutral evolution is typical although not exclusive for pseudogenes (Koonin & Wolf, 2010). Arabidopsis thaliana subgroups were defined by the presence in the variable C-terminal region of the R2R3-MYB genes of conserved motifs thought to play important roles in specifying functional divergence (Kranz et al., 1998; Stracke et al., 2001). For New Phytologist (2014) www.newphytologist.com

example, it has been demonstrated that AtMYB4, the putative orthologue of EgMYB1, contains a C-terminal motif also present in other R2R3-MYB proteins from subgroup S4 that is essential to bind and repress the C4H promoter, thus modifying phenylpropanoid metabolism (Jin et al., 2000). We searched for conserved motifs in the C-terminal region of the R2R3-MYB genes belonging to the woody-preferential subgroups including E. grandis, P. trichocarpa and V. vinifera genes. Using the MEME Suite (Bailey et al., 2009), we identified one to two putative motifs in each subgroup (Fig. S5). These previously unidentified motifs could be important for the function or regulation of these woody-preferential genes. Deep transcript abundance profiling of the R2R3-MYB family by RNAseq Eucalyptus grandis R2R3-MYB transcript abundance was analysed in six E. grandis organs and tissues by RNAseq (Table S5). Clustering based on transcript abundance patterns revealed 12 RNAseq-based expression clusters (Fig. 4). In most cases, genes present in the same phylogenetic subgroup exhibited distinct transcript profiles. As pointed out by Dubos et al. (2010), genes belonging to the same subgroup can perform similar functions, but in different cell types or in response to different developmental or environmental cues. In some cases, however, transcript abundance patterns of all genes belonging to the same phylogenetic subgroup were quite similar. This is the case for all members of subgroup S1, which were grouped in RNAseq-based cluster 1. In A. thaliana, members of this subgroup are mostly involved in stress responses and defence mechanisms (Cominelli Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist

Research 7

Fig. 3 Physical position of the 141 R2R3-MYB genes in the 11 chromosome scaffolds of Eucalyptus grandis. Genes belonging to the same subgroup, physically close (within 100 kb of each other) and separated by 10 or fewer spacer gene models were considered to be the result of tandem duplication events. Each duplication array is highlighted in yellow when corresponding to woody-expanded subgroups (S5, S6 and SAtM5), in red when corresponding to woody-preferential subgroups (WPS-I, II, III, IV and V), and in grey for the other subgroups.

et al., 2005; Raffaele et al., 2008; Seo et al., 2011). Similarly, all genes of subgroup S22 were grouped in RNAseq-based cluster 3. Arabidopsis thaliana members of subgroup S22 are generally related to abiotic stress responses (Jung et al., 2008). Most of the genes involved in secondary cell wall formation and lignin biosynthesis exhibit preferential expression in secondary xylem, where these processes mainly occur. Therefore, the best candidates to play roles in wood formation in E. grandis are probably grouped in RNAseq-based clusters 2 (preferentially expressed in immature xylem and phloem) and 10 (preferentially expressed in immature xylem). In line with this hypothesis, c. 70% of genes found in clusters 2 and 10 belong to subgroups S2&3, S4, S13, S21, SAtM46, SAtM85 and SAtM103. All these subgroups contain members that have been shown to be involved in secondary cell wall formation in functional studies, mostly in A. thaliana and poplar (reviewed in Dubos et al., 2010; GrimaPettenati et al., 2012). Moreover, the E. grandis orthologues of EgMYB1 and EgMYB2 (Goicoechea et al., 2005; Legay et al., 2007, 2010) were also included in RNAseq-based clusters 2 and 10, respectively. No transcripts were detected for seven genes in RNAseq-based cluster 12. A lack of expression data may indicate that these are pseudogenes. In support of this hypothesis, this cluster contained EgrMYB113, a gene that apparently evolved without selective constraints (Table S4) and has a large deletion in its C-terminal domain (data not shown). However, some genes might only be Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

expressed under very specific treatments. This may be the case for AtMYB19, which, in a large survey carried out by Kranz et al. (1998) in A. thaliana, was found to be expressed only after infection with Pseudomonas syringae. Thus, it is not possible to make the distinction in RNAseq-based cluster 12 between pseudogenes (with the likely exception of EgrMYB113) and genes only expressed in specific conditions not included in our RNAseq survey. Detailed transcript abundance profiling of selected R2R3-MYBs by microfluidic qPCR Based on a combination of phylogenetic analysis and RNAseq transcript abundance data, we selected 63 R2R3-MYB genes for expression profiling in a larger survey of developmental and environmental conditions. These genes were selected among those belonging to the woody-expanded subgroups (S5, S6 and SAtM5), woody-preferential subgroups (WPS-I, II, III, IV and V) or subgroups related to secondary cell wall formation and lignin biosynthesis (S2&3, S4, S13, S21, SAtM46, SAtMYB85 and SAtM103). Relative transcript abundance was analysed by microfluidic qPCR in 16 different tissues of E. globulus and E. gunnii 9 dalrympleana hybrids (Table S6). Hierarchical clustering was performed and genes were grouped in three main qPCR-based expression clusters (Fig. 5): genes preferentially expressed in xylem samples; genes highly expressed in lateral New Phytologist (2014) www.newphytologist.com

8 Research

New Phytologist

Fig. 4 Heat map of the RNAseq transcript abundance pattern of the 141 R2R3-MYB genes from Eucalyptus grandis in six different tissues clustered in 12 expression groups using K-means. For each gene, its name is shown to the left of the heatmap, whereas the short name of the phylogenetic subgroup is included to the right. Transcript abundance is expressed in standardized log2 fragments per kilobase of exon per million fragments mapped (FPKM) values. Next to each RNAseq-based cluster there is a graph with the mean transcript abundance  SD for the entire cluster. ST, shoot tips; YL, young leaves; MT, mature leaves; FL, flowers; PH, phloem; IX, immature xylem.

roots, leaves, shoot tips, flowers and fruits; and genes with preferential expression in the cambium-enriched region. The qPCR-based cluster of genes highly and preferentially expressed in xylem (Fig. 5) contained most of the E. grandis R2R3-MYB genes putatively involved in secondary cell wall New Phytologist (2014) www.newphytologist.com

formation, which were previously found in RNAseq-based clusters 2 and 10 (in Fig. 4). The qPCR-based cluster of genes highly expressed in lateral roots, leaves, flowers and fruits was composed of a large set of genes exhibiting different transcript abundance patterns that have in common the general relatively low transcript Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist

Research 9

Fig. 5 Heat map of the transcript abundance patterns assessed using microfluidic qPCR of 63 Eucalyptus genes in 16 different tissues from Eucalyptus globulus and from Eucalyptus Gundal hybrids (Eucalyptus gunnii 9 Eucalyptus dalrympleana). Genes and samples were hierarchically clustered according to their transcript abundance (expressed in relation to the mean of all samples and log2-transformed) showing a topology that allows a clear differentiation of three different qPCR-based expression clusters. Gene names are shown to the left of the heat map, whereas the short name of the phylogenetic subgroup is included to the right. Next to each qPCR-based cluster there is a graph with the transcript abundance of all the genes within the cluster (in grey) and the mean of all the genes within the cluster (in black). R, lateral roots; SS, secondary stem; PS, primary stem; OX, opposite xylem; TX, tension xylem; X, xylem; JX, juvenile xylem; MX, mature xylem; Ph, phloem; C25y, cambium from 25-yr-old trees, C7y, cambium from 7-yr-old trees; FB, floral buds; FC, fruit capsule; ST, shoot tips; ML, mature leaves; YL, young leaves; Egl, E. globulus; Egn, E. gunnii 9 E. dalrympleana. Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist (2014) www.newphytologist.com

10 Research

abundance in the vascular tissues (xylem, cambium and phloem) and high transcript abundance in shoot tips (Fig. 5). This qPCR-based cluster contains all genes analysed belonging to WPS-IV and V, which were not preferentially expressed in vascular tissues. The qPCR-based cluster of genes preferentially expressed in the cambium-enriched fraction is highly enriched in genes from subgroups S5 and WPS-I, II and III. In fact, all of the genes profiled with qPCR included in these woody-preferential subgroups were grouped in this cluster, with the exception of EgrMYB68, which was included in the xylem qPCR-based cluster because of its high transcript abundance in xylem tissues, although it was also quite abundant in the cambial-enriched fraction.

Discussion Analysis of the genome sequence of E. grandis, the second hardwood forest tree to be sequenced, identified 141 members of the R2R3-MYB family, which were further characterized by co-phylogenetic analysis with the corresponding gene families of A. thaliana, P. trichocarpa, V. vinifera and O. sativa. This comparative phylogenetic approach allowed us to identify three subgroups expanded in woody species (S5, S6 and SAtM5) and, strikingly, five new subgroups only present in woody species (WPS-I, II, III, IV and V). A reverse BLAST search on 24 genomes from relevant taxonomic lineages revealed that these five woody-preferential subgroups were totally absent in the basal lineages of the Bryophytes and Lycophytes, as well as in the more modern Monocot and Brassicaceae lineages. Within the Eudicots, all five subgroups were found only in woody perennial species, whereas in nonwoody plants only zero to three subgroups were identified. Moreover, the presence of some subgroups in Gymnosperms and Eudicots suggested that they are derived from a common ancestor of the Gymnosperms and Angiosperms, and were lost in some lineages during evolution. Eucalyptus grandis genes from WPS-I, II and III are preferentially expressed in the cambium-enriched region, the meristem responsible for the extensive secondary growth that leads to wood formation. It is therefore tempting to speculate that these genes could regulate specific aspects related to cambium development and metabolism, However, it should be taken into account that genes regulating secondary growth could be present in nearly all Eudicot plants, and that the distinction between woody and herbaceous lifestyles could be mainly a measure of the degree of gene regulation (Groover, 2005; Spicer & Groover, 2010). It is well known, for example, that A. thaliana plants grown under specific conditions are able to develop secondary growth in parts of the inflorescence stem and in the hypocotyl, producing secondary xylem with vessels and fibres similar to those of angiosperm woody species, although some structural differences are still notable such as the lack of rays (Lev-Yadun, 1994; Chaffey et al., 2002; Nieminen et al., 2004; Melzer et al., 2008; Lens et al., 2012). Even these differences were questioned in a recent paper showing the presence of rays during secondary growth of the A. thaliana inflorescence stems (Mazur & Kurczynska, 2012). Nevertheless, some aspects of tree secondary growth, such as the New Phytologist (2014) www.newphytologist.com

New Phytologist seasonal variations of cambial activity and the developmental separation between juvenile and mature wood, do not occur in short-lived annuals such as A. thaliana (Chaffey et al., 2002; Nieminen et al., 2004). It is conceivable, therefore, that perennial woody plants require more sophisticated control to adapt secondary growth to many different environmental conditions over their long lifespans. Nearly all the transcripts of the genes from the woody-preferential subgroups preferentially expressed in the cambium were also relatively highly expressed in shoot tips. This finding supports a possible dual role in meristematic regulation, as found for many genes regulating both the vascular cambium and shoot apical meristems (Schrader et al., 2004; reviewed in Groover, 2005). However, taking into account the close phylogenetic relationship between these genes and those from subgroups involved in the biosynthesis of anthocyanins and flavonoids (S4, S5, S6, S7 and SAtM5), another plausible possibility is that genes belonging to WPS-I, II and III regulate some phenylpropanoidderived compounds mostly synthesized in cambium-derived cells from woody species. In fact, it is well known that flavonoids can interfere with auxin transport capacity by competing with the auxin efflux inhibitor 1-N-naphthylphthalamic acid (NPA), thus increasing local auxin concentrations (reviewed by Taylor & Grotewold, 2005). As cambium stimulation depends on auxin and local NPA treatments induce cambium activity (Suer et al., 2011), it is tempting to speculate that these woody-preferential genes could regulate the biosynthesis of flavonoids or some phenylpropanoid-derived compounds that modify auxin transport in cambium. Indirect evidence to support this hypothesis is that, in A. thaliana, R2R3-MYBs from the S7 subgroup negatively regulate auxin response via interacting with other transcription factors (reviewed by Li, 2014). Considering the importance of cambial activity for woody plants, the existence of such a fine regulatory system, which could be absent or less developed in herbaceous plants, is plausible. This sophisticated mechanism could, for example, modulate cambium activity in response to environmental factors. The functions of woody-preferential subgroups should, however, be further studied using reverse genetic approaches in woody model plants with the aim of characterizing their roles. The C-terminal motifs conserved within the different woodypreferential subgroups could assist in the characterization of these genes, as they are thought to specify the function of these R2R3MYB genes. Genes from subgroups expanded in woody species (S5, S6 and SAtM5) include members involved in the regulation of the phenylpropanoid pathway leading to proanthocyanidin and anthocyanin biosynthesis, which is important for pigment production and proper seed development in A. thaliana (Borevitz et al., 2000; Nesi et al., 2001; Gonzalez et al., 2008, 2009). Matus et al. (2008) hypothesized that the expansion of these subgroups in V. vinifera compared with A. thaliana is attributable to the need for more sophisticated control of phenylpropanoid-related processes regulating the formation of grapes. Some genes from these subgroups have been found to control fruit colour in some Rosid species (Allan et al., 2008 and references therein; Lin-Wang et al., 2010). It is conceivable that such a mechanism could also Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist be involved in the fruiting structures of E. grandis and P. trichocarpa, but other roles for these genes are also possible. For example, as flavonoid biosynthesis and lignin biosynthesis share many reactions from the phenylpropanoid pathway, some genes regulating anthocyanin biosynthesis could also be related to lignin biosynthesis and secondary cell wall formation. This is the case with the gene AtMYB75/PAP1 from the S6 subgroup, which, apart from its role in anthocyanin biosynthesis, interacts with the transcriptional repressor KNAT7 (KNOTTED-LIKE HOMEOBOX OF ARABIDOPSIS THALIANA 7) and, in A thaliana loss-of-function mutants, increases cell wall thickness in the xylary and interfascicular fibres of the inflorescence stems (Bhargava et al., 2010). In addition, genes from woody-expanded subgroups are also involved in stress responses, as was found for some A. thaliana, poplar and apple members of these subgroups (Lea et al., 2007; Mellway et al., 2009; Olsen et al., 2009; Rowan et al., 2009; Chen et al., 2012; Page et al., 2012; Cao et al., 2013). S5 is the most expanded subgroup in E. grandis and only two of the 16 genes belonging to this subgroup found in the E. grandis genome were not included in tandem arrays. Of the 12 genes from the S5 subgroup analysed by microfluidic qPCR, six grouped in the qPCR-based cluster of genes preferentially expressed in the cambium-enriched fraction. The fact that S5 is the subgroup most expanded in E. grandis, even compared with other woody species analysed, points to putative specializations relating to unique aspects of Eucalyptus biology (Myburg et al., 2014). To date, the only functional data available for genes belonging to subgroup S5 come from maize C1, A. thaliana AtMYB123 (TT2), and their putative orthologues in P. trichocarpa and V. vinifera, all of which have been related to the biosynthesis of proanthocyanidins and have no described roles in cambium (Paz-Ares et al., 1987; Nesi et al., 2001; Bogs et al., 2007; Mellway et al., 2009). Subgroup S5 genes in E. grandis could also be involved in the regulation of the biosynthesis of flavonoids, for example related to production of kino, a polyphenolic exudate that seems to contain high amounts of proanthocyanidins (Hillis & Yazaki, 1974). This substance forms veins particularly in the wood of Eucalyptus species, apparently in response to stresses such as damage to the vascular cambium, decreasing the commercial value of the wood produced under these circumstances (Hillis & Yazaki, 1974; Eyles & Mohammed, 2002). Twenty-four per cent of R2R3-MYB genes in E. grandis were arranged in tandem duplication arrays, somewhat lower that the percentage found at the whole-genome level (34%; Myburg et al., 2014), but this percentage was not equally distributed across subfamilies. The E. grandis genome has a larger proportion of tandem duplicated genes than any other plant genome studied and this seems to be attributable to an unusually high rate of tandem duplicate gain (Myburg et al., 2014). However, it is remarkable that 94% of the R2R3-MYB genes of E. grandis that are spatially arranged in tandem arrays belong to subgroups either expanded or preferentially found in woody species. Specifically, the subgroups containing the highest proportion of tandem duplicated genes were S5 and S6, with 88% and 82%, respectively, of their genes arranged in tandem in E. grandis. Populus trichocarpa and V. vinifera also contained a high percentage of R2R3-MYB gene models from woody-expanded and Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

Research 11

woody-preferential subgroups arranged in tandem repeats (c. 80% of the tandem duplicated genes). The percentage of tandem genes in woody-expanded groups was much lower in A. thaliana (33%), whereas no R2R3-MYB genes arranged in tandem were found in the genome of O. sativa. It has been previously shown that tandem duplicated genes tend to be more frequently retained during evolution when they are involved in responses to environmental stimuli (Hanada et al., 2008 and references therein). Taking into account that perennial plants such as trees and shrubs have to overcome many biotic and abiotic challenges over their lifetimes in order to be evolutionarily successful, whereas annual herbs can simply die after producing seeds, it could be hypothesized that the adaptive mechanisms for stress resistance in woody perennial plants should be more elaborated than those in annual herbaceous plants. Therefore, the higher number of tandem duplicated genes from the woody-expanded and woody-preferential subgroups in E. grandis, P. trichocarpa and V. vinifera could be related to a more sophisticated response to stress conditions in these woody perennial plants. In sharp contrast to the highly duplicated genes belonging to the woody-expanded and woody-preferential subgroups, R2R3MYBs controlling secondary cell wall formation and lignin biosynthesis (included in subgroups S2&3, S4, S13, S21, SAtM46, SAtM85 and SAtM103) are quite conserved among the five different plant species studied here and, in general, they are not arranged in tandem. De Smet et al. (2013) pointed out that some crucial genes are maintained in single-copy status by gene-loss mechanisms after duplication. They formulated two main hypotheses to explain why some duplicated genes are so frequently lost in different plant genomes: the first hypothesis is that duplications may increase gene expression and disrupt the balance of their interactions with other proteins in the cell, and the second is that mutations in one of the duplicated copies may produce dominant-negative phenotypes that disrupt the function of the nonmutated proteins, especially when part of multiprotein complexes. It is therefore conceivable that R2R3-MYBs regulating lignin biosynthesis and secondary cell wall are preserved from being excessively duplicated by one of these two hypothetical mechanisms. Conclusions The use of E. grandis in the R2R3-MYB phylogeny led to the identification of three subgroups expanded in the woody plants analysed and five subgroups preferentially found in woody species. In contrast to genes regulating lignin biosynthesis and secondary cell wall formation, which are quite conserved among the five different species analysed and mostly not affected by duplication events, woody-expanded and woody-preferential subgroups are greatly affected by tandem duplication events. Moreover, the combination of phylogenetic analysis and expression studies provided interesting insights, such as in the case of the genes from WPS-I, II and III, which were shown to be preferentially expressed in the cambial-enriched fraction. This work represents a starting point for characterizing the function of many newly described R2R3-MYB transcription factors in order to better New Phytologist (2014) www.newphytologist.com

12 Research

understand their role in Eucalyptus, one of the most widely used tree genera in forestry plantations all over the world.

Acknowledgements This work was supported by the Centre National pour la Recherche Scientifique (CNRS), the University Paul Sabatier Toulouse III (UPS), the French Laboratory of Excellence project ‘TULIP’ (ANR-10-LABX-41; ANR-11-IDEX-0002-02), and the TREEFORJOULES project (ANR-2010-KBBE-007-01 and FCT, P-KBBE/AGR_GPL/0001/2010). M.S. is grateful for support through the postdoc fellowship ‘Beatriu de Pinos’ thanks to the Departament d’Universitats, Recerca i Societat de la Informacio de la Generalitat de Catalunya. V.C. was supported by a PhD grant from FCT (SFRH/BD/72982/2010). E.L.O.C. was supported by a PhD grant from Fundacß~ao de Amparo a Pesquisa do Estado de S~ao Paulo (FAPESP). J.P.P. acknowledges a research contract from the Ci^e ncia 2008 programme (FCT). RNASeq analysis was supported by Mondi and Sappi through the Forest Molecular Genetics (FMG) Programme, the Technology and Human Resources for Industry Programme (THRIP, UID 80118), the National Research Foundation (NRF, UID 71255 and 86936) and the Department of Science and Technology (DST) of South Africa. We are grateful to F. Melun and L. Harvengt (FCBA, FR), C. Ara ujo, L. Neves, L. Leal (Altri Florestal, PT) and C. Marques (RAIZ, PT) for kindly providing or allowing collection of Eucalyptus organs and tissues; to the Genotoul Bioinformatics Platform Toulouse Midi-Pyrenees for computing and storage resources; and to A. Plasencia, M. Bonhomme and P. Boher for helping in the phylogeny, in silico analysis and discussion, respectively. The authors in particular acknowledge the EUCAGEN consortium and the DOE JGI (USA) for making available the E. grandis genome.

References Allan AC, Hellens RP, Laing WA. 2008. MYB transcription factors that colour our fruit. Trends in Plant Science 13: 99–102. Arvidsson S, Kwasniewski M, Ria~ no-Pachon DM, Mueller-Roeber B. 2008. QuantPrime- – a flexible tool for reliable high-throughput primer design for quantitative PCR. BMC Bioinformatics 9: 465. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. 2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Research 37: W202–W208. Bhargava A, Mansfield SD, Hall HC, Douglas CJ, Ellis BE. 2010. MYB75 functions in regulation of secondary cell wall formation in the Arabidopsis inflorescence stem. Plant Physiology 154: 1428–1438. Bogs J, Jaffe FW, Takos AM, Walker AR, Robinson SP. 2007. The grapevine transcription factor VvMYBPA1 regulates proanthocyanidin synthesis during fruit development. Plant Physiology 143: 1347–1361. Bomal C, Bedon F, Caron S, Mansfield SD, Levasseur C, Cooke JEK, Blais S, Tremblay L, Morency MJ, Pavy N et al. 2008. Involvement of Pinus taeda MYB1 and MYB8 in phenylpropanoid metabolism and secondary cell wall biogenesis: a comparative in planta analysis. Journal of Experimental Botany 59: 3925–3939. Borevitz JO, Xia Y, Blount J, Dixon RA, Lamb C. 2000. Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. The Plant Cell 12: 2383–2394. Cao ZH, Zhang SZ, Wang RK, Zhang RF, Hao YJ. 2013. Genome wide analysis of the apple MYB transcription factor family allows the identification New Phytologist (2014) www.newphytologist.com

New Phytologist of MdoMYB121 gene confering abiotic stress tolerance in plants. PLoS ONE 8: e69955. Cassan-Wang H, Soler M, Yu H, Camargo ELO, Carocha V, Ladouce N, Savelli B, Paiva JAP, Leple JC, Grima-Pettenati J. 2012. Reference genes for high-throughput quantitative reverse transcription-PCR analysis of gene expression in organs and tissues of Eucalyptus grown in various environmental conditions. Plant & Cell Physiology 53: 2101–2116. Chaffey N, Cholewa E, Regan S, Sundberg B. 2002. Secondary xylem development in Arabidopsis: a model for wood formation. Physiologia Plantarum 114: 594–600. Chen M, Wang Z, Zhu Y, Li Z, Hussain N, Xuan L, Guo W, Zhang G, Jiang L. 2012. The effect of TRANSPARENT TESTA2 on seed fatty acid biosynthesis and tolerance to environmental stresses during young seedling establishment in Arabidopsis. Plant Physiology 160: 1023–1036. Cominelli E, Galbiati M, Vavasseur A, Conti L, Sala T, Vuylsteke M, Leonhardt N, Dellaporta SL, Tonelli C. 2005. A guard-cell-specific MYB transcription factor regulates stomatal movements and plant drought tolerance. Current Biology 15: 1196–1200. De Smet R, Adams KL, Vandepoele K, Van Montagu MCE, Maere S, Van de Peer Y. 2013. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proceedings of the National Academy of Sciences, USA 110: 2898–2903. Dias AP, Braun EL, McMullen MD, Grotewold E. 2003. Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiology 131: 610–620. Du H, Feng BR, Yang SS, Huang YB, Tang YX. 2012a. The R2R3-MYB Transcription Factor Gene Family in Maize. PLoS ONE 7: e37463. Du H, Yang SS, Feng BR, Liu L, Tang YX, Huang YB, Liang Z. 2012b. Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC plant biology 12: 106. Dubos C, Stracke R, Grotewold E, Weisshaar B, Martin C, Lepiniec L. 2010. MYB transcription factors in Arabidopsis. Trends in Plant Science 15: 573–581. Eyles A, Mohammed C. 2002. Kino vein formation in Eucalyptus globulus and E. nitens. Australian Forestry 66: 206–212. Foucart C, Jauneau A, Gion JM, Amelot N, Martinez Y, Panegos P, Grima-Pettenati J, Sivadon P. 2009. Overexpression of EgROP1, a Eucalyptus vascular-expressed Rac-like small GTPase, affects secondary xylem formation in Arabidopsis thaliana. New Phytologist 183: 1014–1029. Goicoechea M, Lacombe E, Legay S, Mihaljevic S, Rech P, Jauneau A, Lapierre C, Pollet B, Verhaegen D, Chaubet-Gigot N et al. 2005. EgMYB2, a new transcriptional activator from Eucalyptus xylem, regulates secondary cell wall formation and lignin biosynthesis. Plant Journal 43: 553–567. Gomez-Maldonado J, Avila C, Torre F, Ca~ nas R, Ca novas FM, Campbell MM. 2004. Functional interactions between a glutamine synthetase promoter and MYB proteins. Plant Journal 39: 513–526. Gonzalez A, Mendenhall J, Huo Y, Lloyd A. 2009. TTG1 complex MYBs, MYB5 and TT2, control outer seed coat differentiation. Developmental Biology 325: 412–421. Gonzalez A, Zhao M, Leavitt JM, Lloyd AM. 2008. Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. Plant Journal 53: 814–827. Grima-Pettenati J, Soler M, Camargo ELO, Wang H. 2012. Transcriptional Regulation of the lignin biosynthetic pathway revisited: new players and insights. In: Jouanin L, Lapierre C, eds. Lignins biosynthesis, biodegradation and bioengineering. Advances in Botanical Research, 61. Amsterdam, the Netherlands: Elsevier, 173–218. Groover AT. 2005. What genes make a tree a tree? Trends in Plant Science 10: 210–214. Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH. 2008. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiology 148: 993–1003. Hertzberg M, Aspeborg H, Schrader J, Andersson A, Erlandsson R, Blomqvist K, Bhalerao R, Uhlen M, Teeri TT, Lundeberg J et al. 2001. A transcriptional roadmap to wood formation. Proceedings of the National Academy of Sciences, USA 98: 14732–14737. Hillis WE, Yazaki Y. 1974. Kinos of Eucalyptus species and their acid degradation products. Phytochemistry 13: 495–498. Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

New Phytologist Jiang C, Gu X, Peterson T. 2004. Identification of conserved gene structures and carboxy-terminal motifs in the Myb gene family of Arabidopsis and Oryza sativa L. ssp. indica. Genome Biology 5: R46. Jin H, Cominelli E, Bailey P, Parr A, Mehrtens F, Jones J, Tonelli C, Weisshaar B, Martin C. 2000. Transcriptional repression by AtMYB4 controls production of UV-protecting sunscreens in Arabidopsis. The EMBO Journal 19: 6150–6161. Jin H, Martin C. 1999. Multifunctionality and diversity within the plant MYB-gene family. Plant Molecular Biology 41: 577–585. Jung C, Seo JS, Han SW, Koo YJ, Kim CH, Song SI, Nahm BH, Choi YD, Cheong JJ. 2008. Overexpression of AtMYB44 enhances stomatal closure to confer abiotic stress tolerance in transgenic Arabidopsis. Plant Physiology 146: 623–635. Karpinska B, Karlsson M, Srivastava M, Stenberg A, Schrader J, Sterky F, Bhalerao R, Wingsle G. 2004. MYB transcription factors are differentially expressed and regulated during secondary vascular tissue development in hybrid aspen. Plant Molecular Biology 56: 255–270. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059–3066. Koonin EV, Wolf YI. 2010. Constraints and plasticity in genome and molecular-phenome evolution. Nature reviews. Genetics 11: 487–498. Kranz HD, Denekamp M, Greco R, Jin H, Leyva A, Meissner RC, Petroni K, Urzainqui A, Bevan M, Martin C et al. 1998. Towards functional characterisation of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant Journal 16: 263–276. Lea US, Slimestad R, Smedvig P, Lillo C. 2007. Nitrogen deficiency enhances expression of specific MYB and bHLH transcription factors and accumulation of end products in the flavonoid pathway. Planta 225: 1245–1253. Legay S, Lacombe E, Goicoechea M, Briere C, Seguin A, Mackay J, Grima-Pettenati J. 2007. Molecular characterization of EgMYB1, a putative transcriptional repressor of the lignin biosynthetic pathway. Plant Science 173: 542–549. Legay S, Sivadon P, Blervacq AS, Pavy N, Baghdady A, Tremblay L, Levasseur C, Ladouce N, Lapierre C, Seguin A et al. 2010. EgMYB1, an R2R3 MYB transcription factor from eucalyptus negatively regulates secondary cell wall formation in Arabidopsis and poplar. New phytologist 188: 774–786. Lens F, Smets E, Melzer S. 2012. Stem anatomy supports Arabidopsis thaliana as model for insular woodiness. New Phytologist 193: 12–17. Lev-Yadun S. 1994. Induction of sclereid differentiation in the pith of Arabidopsis thaliana (L.) Heynh. Journal of Experimental Botany 45: 1845–1849. Li S. 2014. Transcriptional control of flavonoid biosynthesis: fine-tuning of the MYB-bHLH-WD40 (MBW) complex. Plant Signaling & Behavior 8: e27522. Lin-Wang K, Bolitho K, Grafton K, Kortstee A, Karunairetnam S, McGhie TK, Espley RV, Hellens RP, Allan AC. 2010. An R2R3 MYB transcription factor associated with regulation of the anthocyanin biosynthetic pathway in Rosaceae. BMC Plant Biology 10: 50. Matus JT, Aquea F, Arce-Johnson P. 2008. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biology 8: 83. Mazur E, Kurczynska EU. 2012. Rays, intrusive growth, and storied cambium in the inflorescence stems of Arabidopsis thaliana (L.) Heynh. Protoplasma 249: 217–220. McCarthy RL, Zhong R, Fowler S, Lyskowski D, Piyasena H, Carleton K, Spicer C, Ye ZH. 2010. The poplar MYB transcription factors, PtrMYB3 and PtrMYB20, are involved in the regulation of secondary wall biosynthesis. Plant & Cell Physiology 51: 1084–1090. Mellway RD, Tran LT, Prouse MB, Campbell MM, Constabel CP. 2009. The wound-, pathogen-, and ultraviolet B-responsive MYB134 gene encodes an R2R3 MYB transcription factor that regulates proanthocyanidin synthesis in poplar. Plant Physiology 150: 924–941. Melzer S, Lens F, Gennen J, Vanneste S, Rohde A, Beeckman T. 2008. Flowering-time genes modulate meristem determinacy and growth form in Arabidopsis thaliana. Nature Genetics 40: 1489–1492. Mizrachi E, Hefer CA, Ranik M, Joubert F, Myburg AA. 2010. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics 11: 681. Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

Research 13 Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D et al. 2014. The genome of Eucalyptus grandis. Nature 510: 356–362. Nesi N, Jond C, Debeaujon I, Caboche M, Lepiniec L. 2001. The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. The Plant Cell 13: 2099–2114. Nieminen KM, Kauppinen L, Helariutta Y. 2004. A weed for wood ? Arabidopsis as a genetic model. Plant Physiology 135: 653–659. Olsen KM, Slimestad R, Lea US, Brede C, Løvdal T, Ruoff P, Verheul M, Lillo C. 2009. Temperature and nitrogen effects on regulators and products of the flavonoid pathway: experimental and kinetic model studies. Plant, Cell & Environment 32: 286–299. Page M, Sultana N, Paszkiewicz K, Florance H, Smirnoff N. 2012. The influence of ascorbate on anthocyanin accumulation during high light acclimation in Arabidopsis thaliana: further evidence for redox control of anthocyanin synthesis. Plant, Cell & Environment 35: 388–404. Patzlaff A, McInnis S, Courtenay A, Surman C, Newman LJ, Smith C, Bevan MW, Mansfield S, Whetten RW, Sederoff RR et al. 2003. Characterisation of a pine MYB that regulates lignification. Plant Journal 36: 743–754. Paz-Ares J, Ghosal D, Wienand U, Petersont A, Saedler H. 1987. The regulatory c1 locus of Zea mays encodes a protein with homology to myb proto-oncogene products and with structural similarities to transcriptional activators. The EMBO Journal 6: 3553–3558. Pfaffl MW. 2001. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research 29: e45. Plomion C, Leprovost G, Stokes A. 2001. Wood formation in trees. Plant Physiology 127: 1513–1523. Rabinowicz PD, Braun EL, Wolfe AD, Bowen B, Grotewold E. 1999. Maize R2R3 Myb genes: sequence analysis reveals amplification in the higher plants. Genetics 153: 427–444. Raffaele S, Vailleau F, Leger A, Joubes J, Miersch O, Huard C, Blee E, Mongrand S, Domergue F, Roby D. 2008. A MYB transcription factor regulates very-long-chain fatty acid biosynthesis for activation of the hypersensitive cell death response in Arabidopsis. The Plant Cell 20: 752–767. Rambaldi D, Ciccarelli FD. 2009. FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci. Bioinformatics (Oxford, UK) 25: 2281–2282. Rogers LA, Campbell MM. 2004. The genetic control of lignin deposition during plant growth and development. New Phytologist 164: 17–30. Rowan DD, Cao M, Lin-Wang K, Cooney JM, Jensen DJ, Austin PT, Hunt MB, Norling C, Hellens RP, Schaffer RJ et al. 2009. Environmental regulation of leaf colour in red 35S:PAP1 Arabidopsis thaliana. New Phytologist 182: 102–115. Schrader J, Nilsson J, Mellerowicz E, Berglund A, Nilsson P, Hertzberg M, Sandberg G. 2004. A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. The Plant Cell 16: 2278–2292. Seo PJ, Lee SB, Suh MC, Park MJ, Go YS, Park CM. 2011. The MYB96 transcription factor regulates cuticular wax biosynthesis under drought conditions in Arabidopsis. The Plant Cell 23: 1138–1152. Spicer R, Groover A. 2010. Evolution of development of vascular cambia and secondary growth. New Phytologist 186: 577–592. Stracke R, Werber M, Weisshaar B. 2001. The R2R3-MYB gene family in Arabidopsis thaliana. Current Opinion in Plant Biology 4: 447–456. Suer S, Agusti J, Sanchez P, Schwarz M, Greb T. 2011. WOX4 imparts auxin responsiveness to cambium cells in Arabidopsis. The Plant Cell 23: 3247–3259. Tamagnone L, Merida A, Parr A, Mackay S, Culianez-Macia FA, Roberts K, Martin C. 1998. The AmMYB308 and AmMYB330 transcription factors from antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco. The Plant Cell 10: 135–154. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28: 2731–2739. Taylor LP, Grotewold E. 2005. Flavonoids as developmental regulators. Current Opinion in Plant Biology 8: 317–323. New Phytologist (2014) www.newphytologist.com

14 Research Ulitsky I, Maron-Katz A, Shavit S, Sagir D, Linhart C, Elkon R, Tanay A, Sharan R, Shiloh Y, Shamir R. 2010. Expander: from expression microarrays to networks and functions. Nature Protocols 5: 303–322. Voorrips RE. 2002. MapChart: software for the graphical presentation of linkage maps and QTLs. The Journal of Heredity 93: 77–78. Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM. 2009. Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiology 149: 981–993. Winzell A, Aspeborg H, Wang Y, Ezcurra I. 2010. Conserved CA-rich motifs in gene promoters of Pt x tMYB021-responsive secondary cell wall carbohydrate-active enzymes in Populus. Biochemical and Biophysical Research Communications 394: 848–853. Yanhui C, Xiaoyuan Y, Kun H, Meihua L, Jigang L, Zhaofeng G, Zhiqiang L, Yunfei Z, Xiaoxiao W, Xiaoming Q et al. 2006. The MYB transcription factor superfamily of Arabidopsis: expression analysis and phylogenetic comparison with the rice MYB family. Plant Molecular Biology 60: 107–124. Ye ZH. 2002. Vascular tissue differentiation and pattern formation in plants. Annual Review of Plant Biology 53: 183–202. Zdobnov EM, Apweiler R. 2001. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics (Oxford, UK) 17: 847–848. Zhao K, Bartley LE. 2014. Comparative genomic analysis of the R2R3 MYB secondary cell wall regulators of Arabidopsis, poplar, rice, maize, and switchgrass. BMC Plant Biology 14: 135. Zhong R, Lee C, Ye ZH. 2010. Evolutionary conservation of the transcriptional network regulating secondary cell wall biosynthesis. Trends in Plant Science 15: 625–632.

Supporting Information Additional supporting information may be found in the online version of this article. Fig. S1 Neighbour-joining phylogenetic tree constructed using 267 predicted amino acid sequences including all of the R2R3MYB proteins from E. grandis and A. thaliana. Fig. S2 Neighbour-joining phylogenetic tree constructed using 676 predicted amino acid sequences including all of the R2R3MYB proteins from E. grandis, V. vinifera, P. trichocarpa, A. thaliana and O. sativa. Fig. S3 Neighbour-joining phylogenetic tree constructed using 141 amino acid sequences comprising all of the R2R3-MYB proteins from E. grandis; information about exon–intron structure is included next to each gene.

New Phytologist (2014) www.newphytologist.com

New Phytologist Fig. S4 Physical position of the R2R3-MYB gene models in the corresponding chromosome scaffolds of P. trichocarpa, V. vinifera, A. thaliana and O. sativa. Fig. S5 Putative conserved motifs outside the MYB domain found in the woody-preferential subgroups. Table S1 Nucleotide sequences of the primers used for the microfluidic qPCR analysis including gene name, accession number, subgroup and efficiency calculated during the microfluidic qPCR Table S2 R2R3-MYB predicted gene models of Eucalyptus grandis, Vitis vinifera, Populus trichocarpa, Arabidopsis thaliana and Oryza sativa, including accession numbers, size in amino acids, position of the R2 and R3 repeats and position of the gene in the respective genomes Table S3 List of the 676 predicted R2R3-MYB proteins from E. grandis, V. vinifera, P. trichocarpa, A. thaliana and O. sativa classified in the 43 different subgroups Table S4 Rate of synonymous versus nonsynonymous substitutions per site in sequence pairs calculated from all tandem duplicated genes in E. grandis Table S5 Eucalyptus grandis R2R3-MYB transcript abundance analysed in six different tissues from three field-grown E. grandis trees by RNAseq Table S6 Transcript abundance of 63 R2R3-MYB genes analysed by microfluidic qPCR in 16 different tissues of E. globulus and E. gundal (E. gunnii 9 E. dalrympleana) Please note: Wiley Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

Ó 2014 The Authors New Phytologist Ó 2014 New Phytologist Trust

The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function.

The R2R3-MYB family, one of the largest transcription factor families in higher plants, controls a wide variety of plant-specific processes including,...
3MB Sizes 2 Downloads 8 Views