DNA AND CELL BIOLOGY Volume 11, Number 8, 1992 Mary Ann Liebert, Inc., Publishers

Pp. 605-611

Human Interferon Regulatory Factor 1: Intron-Exon Organization

YING CHA, SIMON H. SIMS, MARGARET F. ROMINE,* MARY KAUFMANN, and ALBERT B. DEISSEROTH

ABSTRACT

Interferon (IFN) regulatory factor 1 (IRF-1) is a transcriptional regulatory protein that mediates the transcriptional activation of the IFN-α and IFN-β genes by viruses and IFNs. To characterize the mechanisms that govern the level of IRF-1 in cells, we isolated the IRF-1 gene and characterized the structure of its intronic and exonic domains and of its regulatory promoter region. A human placental genomic library was screened with an IRF-1 cDNA probe, and two clones that contained the IRF-1 gene and its 5' regulatory region were obtained. We used these clones to determine the complete nucleotide sequence of the IRF-1 gene, finding that the gene spanned 7.72 kb of DNA and included 10 exons and 9 introns. When the deduced amino acid sequences were compared among different species, the most conserved exons were exons 2, 3, and 4, in which the putative DNA-binding domain of the IRF-1 protein is located.

INTRODUCTION

Interferons (IFNs) are a heterogeneous family of cytokines involved in the regulation of cell growth, differentiation, and antiviral activity (Weissmann and Weber, 1986). Type I IFNs are products of a supergene family, including the IFN-α family and a single copy of the IFN-β gene. IFN-γ is described as type II IFN and has a distinct receptor. The transcriptional activation of the IFN and IFN-inducible genes is mediated by binding of nuclear proteins to the promoters of these genes. The cDNA for IFN regulatory factor 1 (IRF-1) has been identified in mice (Miyamoto et al., 1988), humans (Maruyama et al., 1989; Pine et al., 1990), and rats (Yu-Lee et al., 1990). Upon induction, IRF-1 is post-translationally modified (Watanabe et al., 1991) and binds to the promoter of the IFN-β gene at a site designated PRD-I, along with a nuclear binding protein, NF-κB (Baeuerle, 1991), that binds at a neighboring site called PRD-II. This binding results in the transcriptional activation of the IFN-β gene.

The regulation of the IRF-1 promoter has been studied and transcription initiation sites have been defined by Sims et al. (1992). The IRF-1 promoter contains a CpG island, with several GC boxes and a CAAT box, but no TATA box. In addition to three putative κB motifs, several PRD I-like hexamers (AARKGA, where R = G or A and K = G or T) and GAAANN sequences are found in the IRF-1 promoter region. These sequences are found frequently in the promoter regions of type I IFN genes and of IFN-inducible genes. Functional analysis of the IRF-1 promoter suggests that it is strongly induced by IFN-γ and more weakly and transiently induced by IFN-α in K562 and HeLa cells (Sims et al., 1992).

In this study, we have determined the complete nucleotide sequence of the IRF-1 gene, defined all junctions between exons and introns, and analyzed the features that appear to be important for its function in the regulation of IFN genes.
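The degenerate hexamer notation used for the promoter motifs can be made concrete with a short sketch. It assumes the standard IUPAC nucleotide codes (R = A or G, K = G or T, N = any base); the function names and the toy sequence are hypothetical and are not taken from the IRF-1 promoter.

```python
import re

# Standard IUPAC nucleotide codes needed for the motifs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "K": "[GT]", "N": "[ACGT]"}


def motif_to_pattern(motif):
    """Translate a degenerate motif such as AARKGA into a regex pattern string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead matches without consuming characters, so overlaps are reported.
    scanner = re.compile("(?=" + motif_to_pattern(motif) + ")")
    return [match.start() for match in scanner.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration only (not real promoter data).
toy = "ccAAGTGAttGAAACTgg"
prd1_like_sites = find_motif("AARKGA", toy)
gaaann_sites = find_motif("GAAANN", toy)
```

Only one strand is scanned here; a fuller analysis would also search the reverse complement.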

MATERIALS AND METHODS

A human placental genomic DNA library (Clontech Laboratories, Inc.) was plated at 1 × 10^5 pfu/plate (150-mm plates × 20), and duplicate nitrocellulose filters (Schleicher & Schuell, Inc.) containing phage DNA were hybridized overnight at 65°C with hybridization buffer containing 5× SSPE, 3× Denhardt's solution, 0.3% SDS, 100 µg/ml sonicated and denatured salmon sperm DNA, and [32P]IRF-1 cDNA (Sims et al., 1992) labeled by nick

Department of Hematology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030.
*Current address: Battelle Pacific Northwest Laboratories, P.O. Box 999, K4-06, Richland, WA 99352.

CHA ET AL.


translation. The filters were washed twice at 65°C with 0.1× SSPE and 0.1% NaDodSO4 and then autoradiographed. After the putative positive clones were plaque-purified, they were digested with Sst I and subcloned into the pBS+ vector (Stratagene) for sequencing. The nucleotide sequence was determined by the dideoxy chain-termination method using Sequenase (United States Biochemical Corp.) from denatured double-stranded templates (Sanger et al., 1977; Kraft et al., 1988). Oligonucleotide primers were designed from the published cDNA sequence to sequence the exons and from newly obtained intronic sequence for subsequent sequencing.

The polymerase chain reaction (PCR) used 100 ng of each primer, 50 ng of target DNA, 200 µM dNTPs, 1 unit of Taq polymerase, and 1× Taq buffer (Promega), in a total volume of 50 µl. The reaction was run for 35 cycles (95°C for 30 sec, 55°C for 30 sec, and 72°C for 3 min), and products were analyzed by 2% agarose gel electrophoresis.

All oligonucleotides were synthesized on a Cruachem PS-250 DNA synthesizer.
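The screening depth and the length of the PCR program follow from the numbers given above. The sketch below is only a bookkeeping aid; the variable names are illustrative, and ramp times between temperatures are ignored because they are not stated.

```python
# Library screen: 20 plates at 1 x 10^5 pfu per plate.
plates = 20
pfu_per_plate = 1e5
total_pfu = plates * pfu_per_plate

# PCR program: 35 cycles of 95 C for 30 sec, 55 C for 30 sec, 72 C for 3 min.
cycles = 35
step_seconds = {"denature_95C": 30, "anneal_55C": 30, "extend_72C": 180}
cycle_seconds = sum(step_seconds.values())

# Time spent at the programmed temperatures only (ramping is not included).
total_minutes = cycles * cycle_seconds / 60
```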

RESULTS

To study the genomic organization of IRF-1, a human genomic library was screened with 32P-labeled IRF-1 cDNA. Two positive clones (G5 and G8) were identified from 2 × 10^6 pfu and were plaque-purified. Seven Sst I fragments (1, 3, 4, 6, 17, 28, and 46) from clone G5 were subcloned in the pBS+ vector and their nucleotide sequences were obtained. The intron-exon organization of the IRF-1 gene and all seven Sst I fragments are diagramed in Fig. 1A. The junction and order of each fragment were determined from the size of the PCR product (Fig. 1C) and the published cDNA sequence. Clone G5 was used as target DNA, and primers from each fragment were synthesized in sense or antisense orientation for PCR. As shown in Fig. 1C, the following products were detected: a 221-bp product (lane 2) between fragments 46 and 3 (from -130 bp to 91 bp), a 1,184-bp product (lane 3) between fragments 3 and 1 (1,580 bp to 2,764 bp), an 896-bp product (lane 4) between fragments 6 and 28 (3,048 bp to 3,944 bp), a 1,409-bp product (lane 5) between fragments 1 and 28 (2,535 bp to 3,944 bp), and a 347-bp product (lane 6) between fragments 28 and 4 (7,256 bp to 7,603 bp). The same PCR products were obtained when clone G8 was used as target DNA (data not shown).

The sequence of fragment 46 (the promoter region, 1,312 nucleotides) has been described by Sims et al. (1992). The complete nucleotide sequence, beginning with the major transcription initiation site (Sims et al., 1992) and ending at a site 66 nucleotides 3' to the end of the published cDNA sequence for IRF-1 (Maruyama et al., 1989), is shown in Fig. 2. All intron-exon boundaries conformed to known splice junction consensus sequences (Senapathy et al., 1990), except for the one associated with the first intron, which was located within the 5' untranslated region of the IRF-1 cDNA. As shown in Table 1, each of the other introns began with the dinucleotide GT and ended with the dinucleotide AG, which was preceded by a pyrimidine-rich sequence. The first intron began with an AG dinucleotide and ended with a pyrimidine-rich sequence only. A large CpG island extended from the promoter region to the first intron, as indicated in Fig. 1B, where several Hpa II sites were also present. A putative site for the formation of Z-DNA was


FIG. 1. Diagram of the human IRF-1 gene. A. Map of the Sst I (S) fragments, exons (black bars), and introns (open bars) of the IRF-1 gene. The numbers underneath the bars correspond to the Sst I fragments. The arrows (from left to right) indicate the transcription initiation site, the translation start site, and the location of the putative Z-DNA. B. The locations of CpG islands, Hpa II sites, and GC boxes were determined along the sequence of the IRF-1 genomic DNA. A CpG island is a stretch of DNA in which the G + C composition is greater than 50% and the observed/expected ratio of CpG is greater than 0.6. Restriction enzyme Hpa II recognizes "CCGG" sites in DNA. GC boxes (GC) are clustered within the CpG island and are putative Sp1 binding sites. C. Orientation of each Sst I fragment from clone G5 as determined by PCR. PCR products were analyzed by 2% agarose gel electrophoresis and stained with ethidium bromide. Lane 1 is the λ Hind III and φX174 Hae III marker. The size of each product is indicated by an arrow. Primers are written 5' to 3', with the genomic positions of their 5' and 3' ends in parentheses. Primers used for lane 2 were AGCCTGATTTCCCCGAAATGACGGCACGCA (-130 to -101) and CGGCCGGCGTGGACTGGGCACGGCTCCGGGTGGCCTCGGTTCGG (91 to 48); lane 3, CACCCACCTGAAGGCACTTCCAGTTT (1,580 to 1,606) and CCATGGGATCTGGAAGATCATCT (2,764 to 2,742); lanes 4 and 5, AAGTGTGTGGCTTCAGGGGGTTTGT (3,048 to 3,072) or ACTAGGTGTCCTCCTGAGAG (2,535 to 2,554) with AGCATCTCGGCTGGACTTCGACTTT (3,944 to 3,920); and lane 6, CTGGGGACTGTTCCAAAGC (7,256 to 7,274) and TGTTCACACATGACTTAATGGTTA (7,603 to 7,580).
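The CpG island criterion given in the legend to Fig. 1 (G + C above 50%, observed/expected CpG above 0.6) can be sketched as follows. The expected CpG count is taken here as (number of C × number of G) / length, which is the commonly used definition; the legend does not spell out the formula or a window length, so both are assumptions, as are the function names.

```python
def cpg_stats(sequence):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = sequence.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap one another
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(sequence, min_gc=0.5, min_ratio=0.6):
    """Apply the two thresholds quoted in the legend to one stretch of DNA."""
    gc_fraction, obs_exp = cpg_stats(sequence)
    return gc_fraction > min_gc and obs_exp > min_ratio


# First 80 nucleotides of exon 1 as printed in Fig. 2.
exon1_start = ("AGAGCTCGCCACTCCTTAGTCGAGGCAAGACGTGCGCCCG"
               "AGCCCCGCCGAACCGAGGCCACCCGGAGCCGTGCCCAGTC")
exon1_gc, exon1_ratio = cpg_stats(exon1_start)
```

In practice the statistic is usually computed in sliding windows along the sequence rather than on one fixed stretch.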


HUMAN INTERFERON REGULATORY FACTOR 1 GENE
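The PCR product sizes reported for Fig. 1C can be checked against the primer coordinates. The sketch below assumes, as the reported values imply, that each size is the simple difference between the 5' positions of the sense and antisense primers; the tuple layout and names are illustrative.

```python
# (lane, sense primer 5' position, antisense primer 5' position, reported size in bp)
reactions = [
    (2, -130, 91, 221),
    (3, 1580, 2764, 1184),
    (4, 3048, 3944, 896),
    (5, 2535, 3944, 1409),
    (6, 7256, 7603, 347),
]


def product_size(sense_start, antisense_start):
    """Difference between the outer primer coordinates, as used in the text."""
    return antisense_start - sense_start


sizes = {lane: product_size(sense, antisense)
         for lane, sense, antisense, _ in reactions}

# Lanes whose computed size disagrees with the size reported in the text.
mismatches = [lane for lane, sense, antisense, reported in reactions
              if product_size(sense, antisense) != reported]
```

An inclusive count of nucleotides would be one larger than each of these differences.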

EXON 1 1 81

AGAGCTCGCCACTCCTTAGTCGAGGCAAGACGTGCGCCCGAGCCCCGCCGAACCGAGGCCACCCGGAGCCGTGCCCAGTC CACGCCGGCCGTGCCCGCGCCTTAAGAACCCGGCAACCTCTGCCTTCTTCCCTCTTCCACTCGGAGTCGCGCTCCGCCCT

161 24 1

CACTGCAGCCCCTGCGTCGCCGGGACCCTCGCGCGCGACCGCCGAATCGCTCCTGCAGCagaggtgagtacgcctttgag

INTRON 1

32 1 401

481 561 64 1 72 1

801 881 961 1041 1121

(1059 bp)

gccggggcaccggcggcgtcgaataaaaggcgcgcggggcaccaggaagtggggggtcgaaagctccaggctggagactc gccggcgcgcggcgtgtcgcccgggcctccgcgcgggctccggggggcgccggaggagctgcgagccgcgggccgcggcg cggggagggcgggaccgggcgtggaccgcccacccggacgaggctgccggcgcccggcagctttcgcagatctgcgtgcg cgcagccgccaggggcctgtaggtggcccgctatgttcgtcccgcgcatccacacgccgtgccggggaccgagtgtcagc ccacgcgtgggcgcccagtgctcccggctttcggcggtcccacgtccggccccaggcgacaggttttgggctccctgtgc

tggtggcaagggctgcttactgcccaggtggctggagggaatcgtgactacggagactgcgggaagaggcgccacaggtg ttccttgggccacttctccagaggaggggaaaccgggccggaagggttagcgtcctggtcttagcgttgtgggcgctgtg gctgtcaggaaggcgtagaatggattcaggggcgcgggagggggctgttcagggtgacggctagccctttgctagctagt ggttacaactcaagtcaagggaatttcttcttggcatcaagcaaaagaagtccctcccttcccaaaggatttgaattttg agcgaaaagttctgaaattagggtatctgtgcattttgtctcttttcctgcatatgaatcctgaagccatcacttgcatg cctgtctcctccagagactggctgggaggggctgaaggaaggggcaaaagcatttttgcctaagatgctgaaaaaatttg gagagcagttttattccagcgcagctcccctccgcactgagtgtagtacctagcagctggctgaggtgaggggagggtaa EXON 2

1201 1281

ctaagtgacctcgggtgggcaggtcactgcccaggtactgttcaacagattccagactggagcctctgtgttctctttAC

AGCCAACATGCCCATCACTCGGATGCGCATGAGACCCTGGCTAGAGATGCAGATTAATTCCAACCAAATCCCGGGGCTCA

MetProIleThrArgMetArgMetArgProTrpLeuGluMetGlnIleAsnSerAsnGlnIleProGlyLeuI INTRON 2

(1363 bp)

13 61

TCTGGATTAATAAAgtgagtgtaactctttgggttttcctgccactgttttaacccatgtacttctggagggaccaaagc

1441 1521 1601 1681 17 61 1841

ttcagatgcagctcaaaaagggaagtgataacgggacaagcaggtgtttctcccagtgggtcctgcatgcagggagtgtg

192 1 2 001

2 081 2161 224 1 2 321 2401

agtatgttatttacattgtgtcatttaatcctcgcagtagtcctgtgggttagatcttactaatgtcatcattttcagat aagtaaacagaggcactgagaggtagatcataagatcacacaagaagtgatgaagccaagatttgaacttgaacggctcg actcagaaatctttactgttaaccataagtgatataataacagtaagaccttagacttcatatttgtcactgtgtcccta cacatcctctggtttttaatcctcaaaattttgttggatatgttttctcatttccgagaagagaaaactgaggggcaaag agatacagtgacaatgccagggttacacagtgttcaccatccaagtctagcccagagctccctcagtggtatgaccagga ccccctgtgtaagagcccatgctcccaggtgtcctgaggagtcctttctaatggaagaagttcttacttccatgtgggtg cttacaagccagagagaaacatcccagagcttcaaaaccagggctttgggggagggtgccctgtgtgggtcctagcacat

2481 2 561 2 641

gtgtaacaggcagagggaggtctttgtgagctaataatgctgcagctcatccaaactaggtgtcctcctgagagatccag agtggtctgtttaagccagcctcaagatgggtgtccaagccagatgtcaggggaaaaaaggggaagtcagccttttctca gacctgtctggctgggcaggcctgggtctcagactcagccccaaagtctgtggtctctgacctgacacagccttatgtgt

2 721

atgtgtgtattgttcagGAGGAGATGATCTTCCAGATCCCATGGAAGCATGCTGCCAAGCATGGCTGGGACATCAACAAG

leTrpIleAsnLys

cacggcccagcctgggcctcacttgcatgactcctgccttcttcccttcttgaggtagggcacccacctgaaggcacttc cagtttccagcagcaagactttccagcatctgcagagctggagttctgctctcctctaagcgagacccttacaaacatac acagcactctgcagggctccaatcgaacaaatagaagactgagaagtggatgctgctgggcagaaacgtgctggcttagc agaggacaaacgagttaatcttgcaccagtcactctggcccaagaagcctatagctggtgcacttggggcaacatagacc

ctatagacttagtagcaatgatagtatcataataatagctaatgcttactgaacactccctgtgtgcctggcacctgcta

EXON 3

GluGluMetIlePheGlnIleProTrpLysHisAlaAlaLysHisGlyTrpAspIleAsnLys INTRON 3 2801

(792 bp)

GATGCCTGTTTGTTCCGGAGCTGGGCCATTCACACAGgtgtgtgcctgggactcaggcctaggaagcccagggtagagac AspAlaCysLeuPheArgSerTrpAlaIleHisThrG

2881 2961 3 041 3121 32 01 3281 3441 3 521

aagaggaggcactcacgttaacacagaggctcttcactggggtccctgagctccctgagacaacatgcagaattactggg aagaggggctggtggcagacttgtgtttctggagaagagagtcgatcatctcagcaaattctcaaagggaaaagccaaga tcttagaaagtgtgtggcttcagggggtttgtggctagatgaaagttctccctggcaaaagcatctgtgaaaagcagctg taagccagggcactgaaagagacccaggtctgcctttttcttcgtgttgaccaaggcccttggtccaagcctcatgtggt tggtggcctccttatccttgagagatggagctctaggcccatctcagaacagtcagcccacccatttagtaactgttctc tgctgcccagtctgtgcccactctaccctctggctgctgatagcccaaggaggaagactgggcatagtctgagacacaga tagtacactttggggatatggggactctagtgcttctggctgggcccttcactgaggccgctagatgtgtttaagccaag cctgggcatttgagaaggcccagggcctaggacctgcagagtgtcaccgggagtacctgctggtttgaccactgtggctc tctggtagcataagaggtcaggggtaccttgccttcctccttcagccaggggcagctgaggatccctacccatggccctg

3 601

acgatcctcttttttctcctgccctctagGCCGATACAAAGCAGGGGAAAAGGAGCCAGATCCCAAGACGTGGAAGGCCA

368 1

ACTTTCGCTGTGCCATGAACTCCCTGCCAGATATCGAGGAGGTGAAAGACCAGAGCAGGAACAAGGGCAGCTCAGCTGTG

3 361

EXON 4

lyArgTyrLysAlaGlyGluLysGluProAspProLysThrTrpLysAlaA

snPheArgCysAlaMetAsnSerLeuProAspIleGluGluValLysAspGlnSerArgAsnLysGlySerSerAlaVal 3 7 61

INTRON 4 (109 bp) CGAGTGTACCGGATGCTTCCACCTCTCACCAAGAACCAGAGAAAAGgtatccaaggactctgggtccttgggaagccctc ArgValTyrArgMetLeuProProLeuThrLysAsnGlnArgLysG EXON 5

3 84 1

3 92 1

agggagggagggtagaaggaggtcagctggggctggagagcctgcaccaaggctgacagcccgtctgccccacagAAAGA luArg INTRON 5 (106 bp) AAGTCGAAGTCCAGCCGAGATGCTAAGAGCAAGGCCAAGAGGAAGgtgagtgtggtcctaagcagccaggcctttggtca LysSerLysSerSerArgAspAlaLysSerLysAlaLysArgLys EXON 6

4 001

cctgtgggccagggtgagcagtggaagaaatgctaaggtggcctgggcctaagctgctttctccctcgacagTCATGTGG

4081

GGATTCCAGCCCTGATACCTTCTCTGATGGACTCAGCAGCTCCACTCTGCCTGATGACCACAGCAGCTACACAGTTCCAG

SerCysGl

yAspSerSerProAspThrPheSerAspGlyLeuSerSerSerThrLeuProAspAspHisSerSerTyrThrValProG 4161

INTRON 6 (183 bp) GCTACATGCAGGACTTGGAGGTGGAGCAGGCCCTGACTCCAGgtgagctggtccaggtctggcaggagaccccacaggtc lyTyrMetGlnAspLeuGluValGluGlnAlaLeuThrProA


4241

agtgggatgactctttctcttggaggcatggtgctggcacatggtggcccattagtgcaggctgcagggttggtcggagg

4 3 21

gcgctcgatgtcttgcaaactaagaaagcacacaaccttgacctgtggcttctgctgttccccagCACTGTCGCCATGTG

EXON 7

laLeuSerProCysA

44 01

CTGTCAGCAGCACTCTCCCCGACTGGCACATCCCAGTGGAAGTTGTGCCGGACAGCACCAGTGATCTGTACAACTTCCAG

laValSerSerThrLeuProAspTrpHisIleProValGluValValProAspSerThrSerAspLeuTyrAsnPheGln INTRON 7

4481

(531 bp)

GTGTCACCCATGCCCTCCACCTCTGAAGgttggtgctcctggggcctggcctgctgcttgactgtctgggtctgtgaagg

ValSerProMetProSerThrSerGluA 4 561

4641

4721 4 801

4 881 4961

gcttcctgagagagaaaagatgatcagaactccacctggcactgaattgattgagttgggcattgcagtcttagccacca tagggggaggcaagcgacggggacactaggaaggcagttcagagtgggctgcagtacagtgggggctggtgagaggaggg aagggggccaggggtgcattttgggtgtgctggttctccttcctcctctgtagcccagcatcgtggagggtgaggaagga agtagggtaggggtgggaagcggcgtggcttcagggtttgagaggctgagtcaccaggccagggtcctgttctggaatct ctatggcagataggtccaccgggagggtgtgtgtgtgtgtgtgtgtgtcagagagacagagagacagagaaagggcaggg ggatctggtgggctggaactggaactgcagggtgagtgtgctgactgccagccaacctctctgctttccccatccacagC 1

EXON 8 5041

INTRON 8

(1158 bp)

TACAACAGATGAGGATGAGGAAGGGAAATTACCTGAGGACATCATGAAGgtaaagccccttcctacctgggcactcttga aThrThrAspGluAspGluGluGlyLysLeuProGluAspIleMetLys

5121 52 01 5281 53 61 5441 5521 5601

5681 5761

agtgaccgtttctcagtgaggagagagaaccagtgaagcgttccaaatcagaggatgggtagctgctgttgtcacctggc tgcttgcattgtcccacaagtgccacattcacgtggcttgactggtgggaaacccaccatgggaaggcaggtgggaggcc tggcctctgacagcgtcctgaagcaagccttggggcatcagacagctctgtgagtcaggcactatcagcgatgggtccct ggcctgcatcctctgccccaacatgccccagccctgctagttcgggaaatgcacatcaggcttcaataatcagcctttag gatccgttaatatgatgatggctttatagaaaaagttagcaaattatcctccaggtttttttttctgcttcagttttgaa agtgaatatagtttttgcagccgggggcagtgctcatgcctgtaatcccagcactttggaaggcgaaggtgggtggatca cctgaggtcaggagtttgagaccagcctgactaacatggtgaaacccatctctaccaaaaatataaaaattagctgggcc tggtgcgcatgcctgtaatcccagctactctgaaggctgaggcaggagaatcgcttgaacctgagaggcggaggttgcag tgagctgagattgtgtcattgcactccagcctgggcaacaagagcaaaactccatttcaaaaaaaagtttttgcagtagt

6081 6161

tgtacgccagctgttccattagcccaaaaaattgagacatggatgtcgttccttatctctagcttttctagtcatctttt cttgatttattatgctaacctttgttttaagccacattccctcttactatgtccttacacagttgagagggaagtcgtgg agatgctataccagagagtgggtgtgagaggggtgggaaaatgaattgaggaccagtgccaacatgcatttctgcctcct ctcccgggcccttgtcctgactgcagtgcacttctgcatcctatctgagattgtgaaaatggccaagggtgtgatactgg ctgagaggagctggctcattgaggcagggccacagggtgagtctgcactggaagggagttgatagcctcttgctcttctg

6241

tccccagCTCTTGGAGCAGTCGGAGTGGCAGCCAACAAACGTGGATGGGAAGGGGTACCTACTCAATGAACCTGGAGTCC

5841 5921

6001

EXON 9

LeuLeuGluGlnSerGluTrpGlnProThrAsnValAspGlyLysGlyTyrLeuLeuAsnGluProGlyValG

63 21

64 01

6481 6561

INTRON 9 (286 bp) AGCCCACCTCTGTCTATGGAGACTTTAGCTGTAAGGAGGAGCCAGAAATTGACAGCCCAGGGGgtaagaaggccctggat lnProThrSerValTyrGlyAspPheSerCysLysGluGluProGluIleAspSerProGlyG

ccttatggcttcttagatgagggagaaccacgtagggatggagaaagcttgggggcagggccagggagcagggcggtaaa gcatctggggtactgacacattgtgaattagctacggctgccatgccttaaggtttgcctgaagctgagtggatgtttac tgctgtgctgggaagagcagaggccatgtctatggccttcaggggtagggggaagcacacctgatgccaccgtcccctac EXON 10

6641

cctcatacaaccttcttcacatcttctagGGGATATTGGGCTGAGTCTACAGCGTGTCTTCACAGATCTGAAGAACATGG lyAspIleGlyLeuSerLeuGlnArgValPheThrAspLeuLysAsnMetA

6721

ATGCCACCTGGCTGGACAGCCTGCTGACCCCAGTCCGGTTGCCCTCCATCCAGGCCATTCCCTGTGCACCGTAGCAGGGC

6801

CCCTGGGCCCCTCTTATTCCTCTAGGCAAGCAGGACCTGGCATCATGGTGGATATGGTGCAGAGAAGCTGGACTTCTGTG GGCCCCTCAACAGCCAAGTGTGACCCCACTGCCAAGTGGGGATGGGGCCTCCCTCCTTGGGTCATTGACCTCTCAGGGCC TGGCAGGCCAGTGTCTGGGTTTTTCTTGTGGTGTAAAGCTGGCCCTGCCTCCTGGGAAGATGAGGTTCTGAGACCAGTGT ATCAGGTCAGGGACTTGGACAGGAGTCAGTGTCTGGCTTTTTCCTCTGAGCCCAGCTGCCTGGAGAGGGTCTCGCTGTCA CTGGCTGGCTCCTAGGGGAACAGACCAGTGACCCCAGAAAAGCATAACACCAATCCCAGGGCTGGCTCTGCACTAAGAGA

spAlaThrTrpLeuAspSerLeuLeuThrProValArgLeuProSerIleGlnAlaIleProCysAlaProEnd 6881 6961 7041

7121 7 2 01

7281 7 361 7441 7521

AAATTGCACTAAATGAATCTCGTTCCCAAAGAACTACCCCCTTTTCAGCTGAGCCCTGGGGACTGTTCCAAAGCCAGTGA AATGTGAAGGAAAGTGGGGTCCTTCGGGGCGATGCTCCCTCAGCCTCAGAGGAGCTCTACCCTGCTCCCTGCTTTGGCTG AGGGGCTTGGGAAAAAAACTTGGCACTTTTTCGTGTGGATCTTGCCACATTTCTGATCAGAGGTGTACACTAACATTTCC CCCGAGCTCTTGGCCTTTGCATTTATTTATACAGTGCCTTGCTCGGCGCCCACCACCCCCTCAAGCCCCAGCAGCCCTCA ACAGGCCCAGGGAGGGAAGTGTGAGCGCCTTGGTATGACTTAAAATTGGAAATGTCATCTAACCATTAAGTCATGTGTGA

7601

ACACATAGGACGTGTGTAAATATGTACATTTGTCTTTTTATAAAAAGTAAATTGTTtataaggggtgtggcctttttagg

7681

aagagaaatttaacttgtaggaatgattttactttttatgg (3' flanking sequence)

FIG. 2. The nucleotide sequence of the human IRF-1 gene and the deduced amino acid sequence of the protein. Exon sequences are in uppercase letters; intron and 3'-flanking sequences are in lowercase letters. Nucleotides are numbered to the left of the sequence, with nucleotide 1 corresponding to the major transcription initiation site. The exact size of each intron is given in parentheses. The translational start codon (ATG) and stop codon (TAG) are underlined. The amino acid sequence appears below the nucleotide sequence for each exon.
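The deduced amino acid sequence in Fig. 2 can be reproduced by translating the exon sequence with the standard genetic code. The sketch below translates the coding part of exon 2 (from the ATG through the last complete codon) as printed in the figure; the compact codon-table construction and the function name are illustrative choices, not the authors' procedure.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1 (one-letter codes), ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Coding portion of exon 2 from Fig. 2, starting at the initiation ATG.
exon2_coding = ("ATGCCCATCACTCGGATGCGCATGAGACCCTGGCTAGAGATG"
                "CAGATTAATTCCAACCAAATCCCGGGGCTC")
peptide = translate(exon2_coding)
```

The result corresponds to the first 24 residues of the human sequence shown in Fig. 3.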


Table 1. Junction Sequences Between the Exons and Introns

Intron         Exon                                 Intron
gcgccctggc*    AGAGCTCGCC  -1-   CTCCTGCAGC         agaggtgagt
tgttctcttt     ACAGCCAACA  -2-   GATTAATAAA         gtgagtgtaa
tattgttcag     GAGGAGATGA  -3-   ATTCACACAG         gtgtgtgcct
tgccctctag     GGCCGATACA  -4-   CAGAGAAAAG         gtatccaagg
tgccccacag     AAAGAAAGTC  -5-   CAAGAGGAAG         gtgagtgtgg
ccctcgacag     TCATGTGGGG  -6-   CTGACTCCAG         gtgagctggt
tgttccccag     CACTGTCGCC  -7-   ACCTCTGAAG         gttggtgctc
ccatccacag     CTACAACAGA  -8-   CATCATGAAG         gtaaagcccc
ctgtccccag     CTCTTGGAGC  -9-   AGCCCAGGGG         gtaagaaggc
catcttctag     GGGATATTGG  -10-  GTAAATTGTT         tataaggggt*

*Underlined sequences (marked here with an asterisk) represent the promoter next to the beginning of the transcription and the sequence after the end of the cDNA, respectively.
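The GT-AG rule can be checked directly against the intron ends listed in Table 1. In the sketch below, the donor entry for intron n is the intron sequence following exon n and the acceptor entry is the intron sequence preceding exon n + 1; the dictionary layout and function name are illustrative.

```python
# First ten nucleotides of each intron (donor side), from Table 1.
donors = {
    1: "agaggtgagt", 2: "gtgagtgtaa", 3: "gtgtgtgcct",
    4: "gtatccaagg", 5: "gtgagtgtgg", 6: "gtgagctggt",
    7: "gttggtgctc", 8: "gtaaagcccc", 9: "gtaagaaggc",
}

# Last ten nucleotides of each intron (acceptor side), from Table 1.
acceptors = {
    1: "tgttctcttt", 2: "tattgttcag", 3: "tgccctctag",
    4: "tgccccacag", 5: "ccctcgacag", 6: "tgttccccag",
    7: "ccatccacag", 8: "ctgtccccag", 9: "catcttctag",
}


def follows_gt_ag(intron):
    """True if the intron starts with GT and ends with AG."""
    return donors[intron].startswith("gt") and acceptors[intron].endswith("ag")


# Introns that do not follow the GT-AG rule.
exceptions = [intron for intron in sorted(donors) if not follows_gt_ag(intron)]
```

With the sequences as listed, only intron 1 is flagged, in line with the description in the text.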

found in intron 7, with 11 copies of the dinucleotide GT (nucleotides 4,907-4,928). The IRF-1 gene spanned 7.72 kb and consisted of 10 exons, which ranged in size from 50 nucleotides (exons 5 and 8) to 988 nucleotides (exon 10) (Table 2). The sizes of the introns varied from 106 nucleotides (intron 5) to 1,363 nucleotides (intron 2) (Fig. 2). The first exon contained only 5' untranslated sequence and was separated by a 1,059-nucleotide intron from the second exon, which contained the initiation ATG preceded by 9 nucleotides of 5' untranslated sequence, as indicated in Figs. 1A and 2. The open reading frame of IRF-1 continued in exons 3-9 until encountering a termination codon (TAG) in exon 10.

The sequence of the exons confirmed most of the cDNA sequence published by Maruyama et al. (1989) and was in agreement with the sequence modifications made by Pine et al. (1990). In addition, a single base substitution from T to C was detected when exon 6 was sequenced. This resulted in an amino acid change from isoleucine to threonine at position 220. At this position, threonine was found to be conserved in mouse and rat IRF-1 gene products (Fig. 3). The greatest degree of sequence conservation was seen in exons 2, 3, and 4 (the putative DNA-binding domain) when deduced amino acid sequences were compared among the human, mouse, and rat species. Homologies of amino acid sequence reached 100% in the second and third exons between human and mouse and in the fourth exon between human and rat, as shown in Fig. 3. A conserved basic domain was located at the end of exon 4 and in exon 5. A conserved acidic domain was found in exon 8.
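The reported gene length can be reconciled with the individual exon and intron sizes. The sketch below uses the exon sizes from Table 2, the intron sizes given in parentheses in Fig. 2, and the 66 nucleotides of 3' flanking sequence; the helper name is illustrative, and the derived coordinates simply follow from the stated sizes.

```python
exon_sizes = [219, 96, 100, 177, 50, 130, 123, 50, 135, 988]      # Table 2
intron_sizes = [1059, 1363, 792, 109, 106, 183, 531, 1158, 286]   # Fig. 2
flanking_3prime = 66  # nucleotides sequenced past the end of the cDNA


def exon_coordinates(exons, introns):
    """Return 1-based (start, end) pairs, numbering from the transcription start."""
    coords = []
    position = 1
    for index, size in enumerate(exons):
        coords.append((position, position + size - 1))
        position += size
        if index < len(introns):
            position += introns[index]  # skip over the following intron
    return coords


coords = exon_coordinates(exon_sizes, intron_sizes)
transcribed = sum(exon_sizes) + sum(intron_sizes)
sequenced = transcribed + flanking_3prime
```

The exons and introns sum to 7,655 nucleotides, and adding the flanking sequence gives 7,721 nucleotides, which is the 7.72 kb quoted in the text.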

Table 2. Size and Number of Exons

Exon    Size (nucleotides)    Codons
1       219                   -
2       96                    1-29
3       100                   30-62
4       177                   63-121
5       50                    122-138
6       130                   139-181
7       123                   182-222
8       50                    223-239
9       135                   240-284
10      988                   285-325

DISCUSSION

The IRF-1 gene was located on chromosome 5q (Itoh et al., 1991) and fine-mapped adjacent to interleukin-3 (IL-3) by Dr. Janet Warrington (University of California, Irvine,

personal communication). In this study, we determined the complete structure of the human IRF-1 gene. The gene is 7.72 kb in length and consists of 9 introns and 10 exons. The first exon contains only 5' untranslated sequence and is separated by an intron (1,059 bp) from the second exon. The translation of the IRF-1 gene may be initiated in exon 2, where the translation initiation codon (first ATG) is located. We have previously shown that the region 5' to exon 1 (fragment 46) is GC-rich, does not contain an obvious TATA box but does contain a CAAT box and Sp1 binding sites, and has promoter function (Sims et al., 1992). Exons 2, 3, and 4, which contain the putative DNA-binding domain, are the most highly conserved coding regions when compared with the other exonic domains. The last exon is quite large (988 bp) but codes for only 40 amino acids in addition to the stop codon and 3' untranslated sequence. All introns are under 2 kb.

Intron 1 has several distinctive characteristics. It is located between exons 1 and 2 and interrupts the 5' untranslated region. Unlike the other introns, it does not contain the regular GT-AG donor-acceptor splicing sequence that usually borders an intron. A large CpG island extends from the 5' regulatory region to intron 1. These features suggest that intron 1 may be involved in gene regulation events as well as in differentiation (Senapathy et al., 1990). In mouse EC cells, it has been shown that expression of the IRF-1 gene is developmentally regulated (Harada et al., 1990). Further studies will be required to determine the significance and function of this intron.

A putative Z-DNA site like the one found in intron 7 of the IRF-1 gene was also detected in intron 2 of the mouse cytochrome c oxidase subunit IV gene (Carter and Avadhani, 1991). The significance of the Z-DNA in the IRF-1 gene is unknown. Z-DNA structures have been shown to be important in the function of the SV40 enhancer (Herr and Gluzman, 1985) and the metallothionein gene promoter (Karin et al., 1984) and to be associated with the DNase I-sensitive sites of active genes (Nordheim and Rich, 1983).

IRF-1 mRNA has been detected in various types of cells and tissues (Miyamoto et al., 1988; Yu-Lee et al., 1990). The AUUUA sequence found in the 3' untranslated region of IRF-1 mRNA in the human, mouse, and rat species implies rapid mRNA turnover (Shaw and Kamen, 1986). The expression of IRF-1 mRNA is regulated by IFN-α, IFN-γ, prolactin, double-stranded RNA, viruses, and some other cytokines (Miyamoto et al., 1988; Reis et al., 1992). Previous studies have shown that IFN-γ induces the expression of IRF-1 mRNA more strongly than IFN-α does and that this induction is not inhibitable by cycloheximide (Yu-Lee et al., 1990; Pine et al., 1990; Sims et al., 1992). The affinity of IRF-1 for the PRD-I site increases with its phosphorylation (Pine et al., 1990); however, the amount of binding depends mainly on the abundance of IRF-1 and only slightly on the degree of its phosphorylation. Since IRF-1 exhibits affinity for transcriptional enhancers (the IFN-stimulated response element, or ISRE) of the IFN-inducible genes as well as for the IFN genes themselves (Reis et al., 1992), it has been proposed that IRF-1 (also called ISGF-2 by Pine et al., 1990) may share structural similarities with other ISRE-binding proteins, such as ISGF-3. Further studies of the IRF-1 gene may deepen our understanding of the regulation not only of the IFN genes, but also of the IFN-inducible genes, and may result in the isolation of the transcriptional regulatory proteins for IFN-inducible genes.

Human   MPITRMRMRPWLEMQINSNQIPGLIWINKEEMIFQIPWKHAAKHGWDINKDACLF    55
Human   RSWAIHTGRYKAGEKEPDPKTWKANFRCAMNSLPDIEEVKDQSRNKGSSAVRVYR   110
Human   MLPPLTKNQRKERKSKSSRDAKSKAKRKSCGDSSPDTFSDGLSSSTLPDDHSSYT   165
Human   VPGYM.QDLEVEQALTPALSPCAVSSTLPDWHIPVEVVPDSTSDLYNFQVSPMPS   219
Human   TSEATTDEDEEGKLPEDIMKLLEQSEWQPTNVDGKGYLLNEPGVQPTSVYGDFSC   274
Human   KEEPEIDSPGGDIGLSLQRVFTDLKNMDAT.WLDSLLT.PVRL.PSIQAIPCAP    325

FIG. 3. Comparison of deduced amino acid sequences of IRF-1 in different species. The homologies were determined with the University of Wisconsin GCG program BESTFIT. Mouse and rat IRF-1 sequences were compared with the human sequence, with the conserved amino acid residues in mouse and rat labeled as "-". The splice sites in the human sequence, where the exonic sequences were interrupted by introns, were marked by an asterisk at the top of the sequences. Single-letter amino acid codes are shown. Sequences were numbered with respect to the human sequence.
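Two of the sequence features discussed above, the GT dinucleotide repeat in intron 7 and the AUUUA motif in the 3' untranslated region, can be located with simple pattern searches. The sketch below uses the line of Fig. 2 that begins at nucleotide 4,881 and a short excerpt of the exon 10 3' untranslated sequence (written as DNA, so AUUUA appears as ATTTA); the function names and the minimum repeat number are illustrative assumptions.

```python
import re

# Line of Fig. 2 that begins at nucleotide 4,881 (within intron 7).
LINE_START = 4881
intron7_line = ("ctatggcagataggtccaccgggagggtgtgtgtgtgtgtgtgtgtgt"
                "cagagagacagagagacagagaaagggcaggg")


def longest_gt_run(sequence, offset, min_repeats=5):
    """Return (start, end, copies) of the longest (GT)n run, in 1-based coordinates."""
    runs = list(re.finditer(r"(?:gt){%d,}" % min_repeats, sequence.lower()))
    if not runs:
        return None
    best = max(runs, key=lambda m: m.end() - m.start())
    copies = (best.end() - best.start()) // 2
    return offset + best.start(), offset + best.end() - 1, copies


def find_overlapping(motif, sequence):
    """Return 0-based start positions of a motif, allowing overlapping matches."""
    scanner = re.compile("(?=%s)" % re.escape(motif.upper()))
    return [m.start() for m in scanner.finditer(sequence.upper())]


z_dna_site = longest_gt_run(intron7_line, LINE_START)

# Excerpt of the exon 10 3' untranslated region from Fig. 2 (DNA sense strand).
utr_excerpt = "GGCCTTTGCATTTATTTATACAGTGCC"
attta_sites = find_overlapping("ATTTA", utr_excerpt)
```

The repeat search recovers the 11 GT copies at nucleotides 4,907-4,928, and the excerpt contains two overlapping ATTTA pentamers.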

ACKNOWLEDGMENTS

The authors recognize support to Albert Deisseroth from the National Cancer Institute (PO1 CA49639-01A1), the American Cancer Society (IM-580), the Sid Richardson Foundation, the Bush Leukemia Fund, the Ladies Leukemia League of Louisiana, the Kleberg Foundation, and the Anderson Chair for Cancer Treatment and Research. Many thanks to Rosemarie Lauzon for her expert editorial assistance.

REFERENCES

BAEUERLE, P.A. (1991). The inducible transcriptional activator NF-κB: Regulation by distinct protein subunits. Biochim. Biophys. Acta 1072, 63-80.
CARTER, R.S., and AVADHANI, N.G. (1991). Cloning and characterization of the mouse cytochrome c oxidase subunit IV gene. Arch. Biochem. Biophys. 288, 97-106.
HARADA, H., FUJITA, T., MIYAMOTO, M., KIMURA, Y., MARUYAMA, M., FURIA, A., MIYATA, T., and TANIGUCHI, T. (1989). Structurally similar but functionally distinct factors, IRF-1 and IRF-2, bind to the same regulatory elements of IFN and IFN-inducible genes. Cell 58, 729-739.
HARADA, H., WILLISON, K., SAKAKIBARA, J., MIYAMOTO, M., FUJITA, T., and TANIGUCHI, T. (1990). Absence of the type I IFN system in EC cells: Transcriptional activator (IRF-1) and repressor (IRF-2) genes are developmentally regulated. Cell 63, 303-312.
HERR, W., and GLUZMAN, Y. (1985). Duplications of a mutated simian virus 40 enhancer restore its activity. Nature 313, 711-714.
ITOH, S., HARADA, H., NAKAMURA, Y., WHITE, R., and TANIGUCHI, T. (1991). Assignment of the human interferon regulatory factor-1 (IRF-1) gene to chromosome 5q23-q31. Genomics 10, 1097-1099.
KARIN, M., HASLINGER, A., HOLTGREVE, H., RICHARDS, R.I., KRAUTER, P., WESTPHAL, H.M., and BEATO, M. (1984). Characterization of DNA sequences through which cadmium and glucocorticoid hormones induce human metallothionein-IIA gene. Nature 308, 513-519.
KRAFT, R., TARDIFF, J., KRAUTER, K.S., and LEINWAND, L.A. (1988). Using mini-prep plasmid DNA for sequencing double stranded templates with Sequenase. BioTechniques 6, 544-549.
MARUYAMA, M., FUJITA, T., and TANIGUCHI, T. (1989). Sequence of a cDNA coding for human IRF-1. Nucleic Acids Res. 17, 3292.
MIYAMOTO, M., FUJITA, T., KIMURA, Y., MARUYAMA, M., HARADA, H., SUDO, Y., MIYATA, T., and TANIGUCHI, T. (1988). Regulated expression of a gene encoding a nuclear factor, IRF-1, that specifically binds to IFN-β gene regulatory elements. Cell 54, 903-913.
NORDHEIM, A., and RICH, A. (1983). Negatively supercoiled simian virus 40 DNA contains Z-DNA segments within transcriptional enhancer sequences. Nature 303, 674-679.
PINE, R., DECKER, T., KESSLER, D.S., LEVY, D.E., and DARNELL, J.E. JR. (1990). Purification and cloning of interferon-stimulated gene factor 2 (ISGF2): ISGF2 (IRF-1) can bind to the promoters of both beta interferon- and interferon-stimulated genes but is not a primary transcriptional activator of either. Mol. Cell. Biol. 10, 2448-2457.
REIS, L.F.L., HARADA, H., WOLCHOK, J.D., TANIGUCHI, T., and VILCEK, J. (1992). Critical role of a common transcription factor, IRF-1, in the regulation of IFN-β and IFN-inducible genes. EMBO J. 11, 185-193.
SANGER, F., NICKLEN, S., and COULSON, A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
SENAPATHY, P., SHAPIRO, M.B., and HARRIS, N.L. (1990). Splice junctions, branch point sites, and exons: Sequence statistics, identification, and applications to genome project. Methods Enzymol. 183, 252-278.
SHAW, G., and KAMEN, R. (1986). A conserved AU sequence from the 3' untranslated region of GM-CSF mRNA mediates selective mRNA degradation. Cell 46, 659-667.
SIMS, H.S., CHA, Y., ROMINE, M.F., GAO, P.Q., GOTTLIEB, K., and DEISSEROTH, A.B. (1992). A novel interferon-inducible domain: Structural and functional analysis of the human IRF-1 gene promoter (submitted).
WATANABE, N., SAKAKIBARA, J., HOVANESSIAN, A.G., TANIGUCHI, T., and FUJITA, T. (1991). Activation of IFN-β element by IRF-1 requires a post-translational event in addition to IRF-1 synthesis. Nucleic Acids Res. 19, 4421-4428.
WEISSMANN, C., and WEBER, H. (1986). The interferon genes. Prog. Nucleic Acid Res. Mol. Biol. 33, 251-300.
YU-LEE, L.Y., HRACHOVY, J.A., STEVENS, A.M., and SCHWARZ, L.A. (1990). Interferon-regulatory factor 1 is an immediate-early gene under transcriptional regulation by prolactin in Nb2 T cells. Mol. Cell. Biol. 10, 3087-3094.

Address reprint requests to:
Dr. Albert B. Deisseroth
Department of Hematology
Box 24
The University of Texas M.D. Anderson Cancer Center
1515 Holcombe Boulevard
Houston, TX 77030

Received for publication April 1, 1992; accepted June 3, 1992.
