Cell, Vol. 70, 631-645, August 21, 1992, Copyright © 1992 by Cell Press

A Tissue-Specific MAR/SAR DNA-Binding Protein with Unusual Binding Site Recognition

Liliane A. Dickinson, Tadashi Joh, Yoshinori Kohwi, and Terumi Kohwi-Shigematsu
La Jolla Cancer Research Foundation
La Jolla, California 92037

Summary

A human cDNA was cloned that encodes a DNA-binding protein (SATB1) that is expressed predominantly in thymus and binds selectively to the nuclear matrix/scaffold-associating DNAs (MARs/SARs). Missing nucleoside experiments showed that SATB1 selectively binds in a special AT-rich sequence context where one strand consists of mixed A’s, T’s, and C’s, excluding G’s (ATC sequences). When this feature is destroyed by mutation, SATB1 binding is greatly reduced even if the direct contact sequence remains intact. Conjunctional SATB1-binding sequences become stably unpaired in supercoiled DNA. Specific mutations that diminish the unwinding potential greatly reduce SATB1 binding. However, SATB1 does not bind single-stranded DNA. Chemical interference assays show that SATB1 binds along the minor groove with very little contact with the bases. This suggests that SATB1 recognizes the ATC sequence indirectly through the altered sugar-phosphate backbone structure present in the double-stranded DNA.

Introduction

Eukaryotic chromosomes are thought to be organized into a higher-order structure consisting of discrete and topologically independent loop domains, which would be fastened at their bases to the intranuclear framework by nonhistone proteins (reviewed in Gasser and Laemmli, 1987). The loop organization of chromosomes may be important not only for compaction of the chromatin fiber, but also for the regulation of gene expression and replication. Each domain might represent an independent unit of gene activity that would be insulated from the regulatory mechanisms of neighboring domains and thus protected from chromosomal position effects. The above model implies that specific DNA sequences exist at the bases of the DNA loops, together with proteins that bind to these sequences to separate one domain from another.
A biochemical criterion for putative boundary sequences is their high binding affinity to the nuclear matrix or scaffold (Mirkovitch et al., 1984; Cockerill and Garrard, 1986), which is defined as the residual structures left in the nucleus after extraction of nuclei with either high salt (Berezney and Coffey, 1974) or the detergent lithium 3,5-diiodosalicylate (Mirkovitch et al., 1984). Specific DNA segments that have this property have been identified in a number of different species (reviewed in Phi-Van and Strätling, 1990). These sequences are called MARs or SARs (for nuclear matrix- or scaffold-associating regions; in the current study, they are called MARs) and often contain or are located in close vicinity to regulatory sequences, including enhancer sequences (Gasser and Laemmli, 1986a; Cockerill and Garrard, 1986; Cockerill et al., 1987; Jarman and Higgs, 1988). Some MARs are found at the boundaries of transcription units where they may delimit the ends of an active chromatin domain (Mirkovitch et al., 1984; Gasser and Laemmli, 1986a, 1986b; Phi-Van and Strätling, 1988; Bode and Maass, 1988; Dijkwel and Hamlin, 1988; Levy-Wilson and Fortier, 1989). Furthermore, A elements of the chicken lysozyme gene, which contain DNA with high affinity to the nuclear matrix (Phi-Van and Strätling, 1988), augment the transcriptional activity of a linked gene in a position-independent, copy number-dependent manner in stably transfected cells (Stief et al., 1989; Phi-Van et al., 1990). The locus control region of the human β-globin domain, characterized by a set of tissue-specific DNAase I hypersensitive sites (Tuan et al., 1985; Forrester et al., 1987), also contains MARs (Jarman and Higgs, 1988) and confers copy number-dependent high levels of erythroid-specific expression to a linked gene (Grosveld et al., 1987). The specific role of MARs in either A elements or locus control region activity remains unclear. Recently, specialized chromatin structures, which are MAR-like AT-rich sequences located at the boundaries of a Drosophila heat shock gene, were shown to insulate the regulatory influence of adjacent domains (Kellum and Schedl, 1991, 1992), while a MAR from the Drosophila 87A7 heat shock locus could not (Kellum and Schedl, 1992).

What special sequence features are found in MARs? MARs are in general 70% AT-rich and are preferentially bound and cleaved by topoisomerase II (reviewed in Gasser and Laemmli, 1987). However, there is no consensus sequence known for MARs.
The topoisomerase II consensus derived from Drosophila (Sander and Hsieh, 1985) and vertebrates (Spitzner and Muller, 1988) is only loosely defined. A specialized DNA structure formed by certain AT-rich sequences may be important for their biological function. The significance of structural characteristics for MARs, such as DNA bending (Anderson, 1986; Homberger, 1989; von Kries et al., 1990) and a narrow minor groove attributable to oligo(dA) tracts (Adachi et al., 1989), has been previously proposed. By employing an unpaired DNA-specific probe, chloroacetaldehyde (CAA) (Kohwi-Shigematsu et al., 1983; Kohwi-Shigematsu and Kohwi, 1985), we recently demonstrated that naturally occurring MARs from different species are characterized by their strong potential for extensive unpairing (or unwinding) when subjected to superhelical strain (Kohwi-Shigematsu and Kohwi, 1990; Bode et al., 1992). This inherent unwinding property was shown to be important for binding to the nuclear matrix and for the augmentation of gene expression in stable transformants (Bode et al., 1992). For example, two MARs flanking the immunoglobulin heavy chain (IgH) gene enhancer (Cockerill et al., 1987) were shown to be continuously unpaired over a distance of more than 200 bp in supercoiled plasmid DNA. A short sequence motif, ATATAT, within the MAR located 3’ of the IgH enhancer was delineated to be a nucleation site for unwinding (or unpairing). Point mutations substituting three bases in this sequence completely abolished the unwinding property of the MAR (Kohwi-Shigematsu and Kohwi, 1990). In a subsequent study (Bode et al., 1992), we showed that a concatemerized, double-stranded 25 bp oligonucleotide containing the core unwinding sequence of the 3’ MAR behaved like a typical MAR. This synthetic MAR was capable of unwinding under superhelical strain, strongly bound to the nuclear matrix, and enhanced transcription of a linked reporter gene in stable transformants (Bode et al., 1992) to a degree comparable with the enhancement observed with the 2.2 kb MAR of the human interferon β (huIFN-β) gene (Mielke et al., 1990; Klehr et al., 1991). However, none of these features was observed with a similarly concatemerized, double-stranded oligonucleotide derived from the mutated core sequence; the unpairing property was lost, the binding affinity to the nuclear matrix was greatly reduced, and no enhancement of transcription was detected. The unwinding property of MARs may be important in effectively relieving negative superhelical strain that would accumulate in a looped DNA domain and in preventing its influence on neighboring domains.

If certain AT-rich sequences are biologically significant owing to their intrinsic structural property, there might be a protein that recognizes and distinguishes AT-rich sequences that unwind from those that cannot unwind themselves. Such a protein could be a MAR-binding protein. Except for topoisomerase II and yeast RAP-1 (Hofmann et al., 1989a), little is known about scaffold proteins in eukaryotes. Recently, a MAR-binding protein, ARBP, has been purified from chicken oviduct (von Kries et al., 1991). The amino acid sequence of ARBP has not yet been reported.
The present study describes a human cDNA encoding a protein that was cloned owing to its high affinity to the synthetic MAR probe described above. This is a tissue-specific nuclear matrix protein that binds strongly and selectively to naturally occurring MARs from different species. This protein exhibits an unusual mode of binding. Instead of a primary sequence consensus, it selects for AT-rich sequences with a high unwinding potential and binds in the minor groove with little contact to DNA bases.

Results

cDNA Cloning of an 85.9 kd MAR-Binding Protein

Complementary oligonucleotides, 5’-TCTTTAATTTCTAATATATTTAGAAttc-3’ and 5’-TTCTAAATATATTAGAAATTAAAGAgaa-3’, containing the ATATAT nucleation site for unpairing (Kohwi-Shigematsu and Kohwi, 1990) and its immediate flanking sequences derived from the MAR 3’ of the mouse IgH gene enhancer were made into duplex DNA. These monomers were concatemerized by hybridization through overlapping single-stranded ends (indicated by lower-case letters) to seven repeats, so that their orientation was identical throughout the multimer. The resulting wild-type (25)7 exhibited the following three features: a strong unpairing property under superhelical strain, a high affinity to the nuclear matrix, and an ability to enhance transcription when linked to a reporter gene in stable transformants (Bode et al., 1992). We used this double-stranded wild-type (25)7 as a probe to screen protein replica filters of a λgt11 cDNA library (Singh et al., 1988; Vinson et al., 1988) prepared from human testis RNA (kindly provided by Dr. Jose Millan). Screening of 500,000 phage plaques yielded one clone, λAT11, containing a 2 kb insert that expressed a protein that bound to the DNA probe. It was subcloned in Bluescript plasmid (pAT11) and sequenced (Figure 1A). Because of the absence of a complete 3’ end, the same library was rescreened with the pAT11 cDNA insert and five additional clones were isolated. These were sequenced, and all five clones were found to be homologous to pAT11. One clone, pAT46, contained a poly(A) tail, but was 300 bp shorter than pAT11 at the 5’ end. The overlapping sequences of pAT11 and pAT46 were confirmed to be identical, and their inserts were fused at their common single BglII site to yield the putative full-length cDNA pAT1146. The complete 2946 bp nucleotide sequence of pAT1146 is shown in Figure 1B. It includes a short, AT-rich 5’ untranslated region (215 nt), followed by an ATG codon that meets the criteria for translation initiation (Kozak, 1984), followed by a long open reading frame of 763 aa that would encode an 85.9 kd protein, which we named SATB1 (special AT-rich sequence binding protein 1). To confirm that the protein size predicted from the cDNA corresponded to the size of the expressed protein, the cDNA was placed under control of the T7 RNA polymerase promoter and RNA was synthesized and translated in vitro in reticulocyte lysate. Analysis of the translation products on an SDS-polyacrylamide gel revealed a major band of approximately 90 kd (data not shown), which is in agreement with the predicted size.
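The probe design described at the start of this section can be checked with a short script. The sketch below is only an illustration, not part of the original work: it takes the two oligonucleotide sequences as given in the text (with the top strand read against the wild-type monomer of Figure 2B), and the helper names `reverse_complement` and `split_oligo` are our own. It tests whether the upper-case parts pair along their full length, whether the lower-case single-stranded ends can pair with each other, and whether the overhang is self-complementary. A self-complementary overhang would also permit head-to-head joining, so a non-palindromic one is consistent with the statement that all repeats share one orientation.

```python
# Oligonucleotide sequences as given in the text; lower case marks the
# single-stranded 3' ends used for concatemerization.
TOP = "TCTTTAATTTCTAATATATTTAGAAttc"
BOTTOM = "TTCTAAATATATTAGAAATTAAAGAgaa"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_oligo(oligo: str) -> tuple[str, str]:
    """Split an oligo into its duplex part (upper case) and overhang (lower case)."""
    duplex = "".join(base for base in oligo if base.isupper())
    overhang = "".join(base for base in oligo if base.islower())
    return duplex, overhang


top_duplex, top_overhang = split_oligo(TOP)
bottom_duplex, bottom_overhang = split_oligo(BOTTOM)

# The upper-case parts should pair along their full length.
duplex_pairs = reverse_complement(top_duplex) == bottom_duplex
# The overhang of one unit should pair with the overhang of the next unit.
overhangs_pair = reverse_complement(top_overhang) == bottom_overhang
# A self-complementary overhang would also allow head-to-head joining.
overhang_is_palindromic = reverse_complement(top_overhang) == top_overhang

print(len(top_duplex), duplex_pairs, overhangs_pair, overhang_is_palindromic)
```

With these sequences the duplex part is 25 nt long and contains the ATATAT motif, the two overhangs are complementary to each other, and neither is self-complementary.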
The coding sequence is followed by a long 3’ untranslated region (444 nt) that includes a poly(A) tail. From a mouse cDNA library, we cloned a single cDNA that encodes a 764 aa protein 98% homologous to the human SATB1 (unpublished data). This confirms that the fused pAT1146 cDNA represents a naturally occurring human cDNA. Except for the presence of a continuous stretch of 15 glutamine residues at the carboxyl terminus (residues 592 to 607; Figure 1B), the deduced amino acid sequence does not contain any obvious sequence motifs such as zinc fingers and leucine zippers (reviewed in Struhl, 1989). Basic and acidic amino acids are evenly distributed throughout the 763 aa protein. Computer-assisted comparison of SATB1 to the GenBank and National Biomedical Research Foundation data bases revealed no homologous sequences, except for the glutamine stretch, indicating that SATB1 is a novel DNA-binding protein.

Figure 1. Structural Analysis of Cloned SATB1 cDNA
(A) Schematic structure of cDNA clone pAT1146, which was derived from pAT11 and pAT46 (shown below). The hatched area represents the open reading frame. The solid bar indicates the glutamine stretch (Q15) in the deduced amino acid sequence. Major restriction endonuclease sites are indicated. (B) Nucleotide sequence of pAT1146 and the deduced primary structure of SATB1. The 15 glutamine repeat is underlined.
[Figure 1 graphics and the 2946 bp sequence listing of panel B are not reproduced here.]

Sequence Selectivity for SATB1 Binding

To examine whether SATB1 has sequence selectivity among different AT-rich sequences, we first tested whether SATB1 could distinguish the wild-type (25)7 sequence described above from a similarly concatemerized mutated DNA, mutated (24)8, prepared from oligonucleotide 5’-TCTTTAATTTCTACTGCTTTAGAAttc-3’ and its complementary sequence, in which the ATATAT motif in the wild-type oligonucleotide was replaced by CTGCT (the sequence of a single repeating unit for each DNA is shown in Figure 2B). In contrast to wild-type (25)7, mutated (24)8 totally resists unwinding under superhelical strain even though it is still AT rich (Bode et al., 1992). Binding of SATB1 to the wild-type and mutated oligonucleotide sequences was studied by mobility shift assays with bacterially produced SATB1 (see Experimental Procedures; Figure 2A). The wild-type (25)7 probe, which was used to isolate the SATB1 cDNA in the first place, was strongly complexed by SATB1 (Figure 2A, lane 2). A monomer of the same sequence also bound to SATB1, although the affinity was somewhat reduced (Figure 2A, lane 6). The mutant sequence virtually failed to bind; only a very faint shifted band was seen for a concatemer of 8 repeats, and no shift was detected with a mutant concatemer of 2 repeats (Figure 2A, lanes 4 and 8). Therefore, the previously defined nucleation site for unpairing, important for MAR binding and transcriptional activation, was successfully distinguished by SATB1 from the AT-rich mutated version. Unlike other AT-rich sequence-binding proteins such as α protein, a high mobility group protein that binds to any runs of 6 or more AT base pairs (Solomon et al., 1986), SATB1 clearly selects a particular type of AT-rich sequence for binding.

Figure 2. Sequence Requirement for SATB1 Binding
(A) Autoradiograph of a gel retardation assay with SATB1, synthesized in reticulocyte lysate, and various repeats of wild-type or mutated synthetic oligonucleotide probes. Lanes 1 and 2, wild-type (25)7; lanes 3 and 4, mutated (24)8; lanes 5 and 6, wild-type (25)1; and lanes 7 and 8, mutated (24)2. Odd numbered lanes are controls without added protein (indicated by minus signs); even numbered lanes (indicated by plus signs) contain SATB1 synthesized in reticulocyte lysate. (B) Sequences of the top strand of the wild-type (wt) and mutated (mut) oligonucleotide monomers. The nucleation site for unpairing, ATATAT, in the wild-type oligonucleotide was changed to CTG-CT in the mutated oligonucleotide.
wt: 5’-TCTTTAATTTCTAATATATTTAGAA-3’
mut: 5’-TCTTTAATTTCTACTG-CTTTAGAA-3’

Identification of Nucleosides Contacted by SATB1

To determine the sequences that make contact with SATB1, we initially attempted DNAase I footprinting using the dimerized wild-type (25)2. However, we were unable to detect regions protected by SATB1 from DNAase I cleavage, even though the probe was strongly bound by protein. This failure to obtain DNAase I footprints was possibly attributable to the rapid on-off rate of protein binding: under the DNAase I footprinting assay conditions, the dissociation rate of SATB1, determined according to Weeks and Crothers (1991), was 1 min⁻¹ for the wild-type (25)2 and 0.3 min⁻¹ for the wild-type (25)7 (data not shown). Therefore, we employed the missing nucleoside experiment developed by Hayes and Tullius (1989) to determine which nucleosides are contacted by SATB1 on the DNA. Hydroxyl radical treatment results in the loss of a deoxyribose residue with its attached base and causes the phosphodiester backbone to break at this position. The DNA fragments that contain on average one random nucleoside gap per molecule are used as probes in a standard chemical interference experiment. The wild-type (25)2 XbaI-HindIII fragment (cloned in Bluescript) was selectively labeled at the XbaI site (either at the 3’ end by Klenow polymerase or at the 5’ end by T4 polynucleotide kinase) and subjected to a limited hydroxyl radical treatment as described in Experimental Procedures. The gapped DNA molecules were incubated with bacterially produced SATB1 in a typical binding reaction, followed by native polyacrylamide gel electrophoresis. The free (nonshifted band) and bound DNA (shifted band) were purified and run on a denaturing urea-polyacrylamide gel. An autoradiogram of a sequencing gel from one of these experiments is shown in Figure 3A.
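To give a feel for what the dissociation rates quoted above mean in practice, the following sketch converts them into complex half-lives. It assumes simple first-order dissociation of the protein-DNA complex, which is our simplifying assumption and is not stated in the text; the function names and the dictionary labels are illustrative only.

```python
import math


def half_life_min(k_off_per_min: float) -> float:
    """Half-life (min) of a complex that dissociates with first-order rate k_off."""
    return math.log(2) / k_off_per_min


def fraction_remaining(k_off_per_min: float, t_min: float) -> float:
    """Fraction of complexes still intact after t_min minutes (first-order decay)."""
    return math.exp(-k_off_per_min * t_min)


# Dissociation rates (min^-1) quoted in the text for the two wild-type probes.
k_off = {"k_off = 1 per min": 1.0, "k_off = 0.3 per min": 0.3}

for label, rate in k_off.items():
    print(f"{label}: half-life {half_life_min(rate):.2f} min, "
          f"{fraction_remaining(rate, 1.0):.0%} of complexes intact after 1 min")
```

Under this assumption the half-lives come out at roughly 0.7 min and 2.3 min, so a large share of complexes would fall apart within a minute, which fits the suggestion that rapid exchange prevented a stable DNAase I footprint.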
Missing nucleosides essential for binding (marked by vertical brackets) are detected as low intensity or lacking bands in the lane containing DNA bound to protein (lane B) or as high intensity bands in the lane containing free, unbound probe (lane F). The control lane (C) contained naked DNA treated with hydroxyl radical that gave rise to a relatively even banding pattern. The bands of the autoradiogram were aligned to control bands corresponding to the G and A residues of the fragment (lane G+A) and scanned by laser densitometry. The relative strengths of the nucleoside contacts were estimated by comparing the relative peak heights of the densitometer scans and are summarized in Figure 3B. The contact site spans about 11 nucleosides on each strand, and the contacted region on the bottom strand is offset by 1 base to the 3’ end. The strongest contacted motif is 5’-TTCTAATATAT-3’ on the top strand and 5’-TATATTAGAAA-3’ on the bottom strand. This result shows that the 6 bp motif ATATAT is located within the site contacted by SATB1, which is in agreement with the mutational analysis; but it seems not to be solely responsible for binding, since neighboring sequences (toward the 5’ end on the top strand and toward the 3’ end on the bottom strand) are contacted just as strongly. An identical result (summarized in Figure 3B) was obtained for each of the two repeated sequences (Figure 3A).

Figure 3. Missing Nucleoside Experiment of the SATB1-Wild-Type (25)2 Complex
(A) Sequencing gel showing the products obtained from a missing nucleoside experiment. Bacterially produced SATB1 protein was bound to the hydroxyl radical-treated wild-type (25)2, which was 5’-labeled by polynucleotide kinase (top strand, right panel) or 3’-labeled by DNA polymerase I Klenow fragment (bottom strand, left panel) at the XbaI site of the Bluescript polylinker (see Experimental Procedures). Lanes marked G+A represent Maxam-Gilbert GA-specific sequencing reactions performed on the intact DNA fragment. The control lane C shows the hydroxyl radical cleavage pattern of the DNA probe in the absence of bound protein. Lanes F and B represent the free and protein-bound probes, respectively, that were separated in a gel mobility shift assay. Vertical brackets indicate nucleosides contacted by SATB1. The 5’-ATATAT-3’ motif is indicated alongside the GA ladder. (B) SATB1 contact sites on the oligomer wild-type (25)2 were deduced from densitometer scans of the autoradiograms shown in (A) and are given by vertical bars. The height of the bars approximates the relative strength of the contacts between protein and individual nucleosides. Nucleosides with no indication were not contacted. Stars indicate adenines that interfered slightly with protein binding when methylated (see Figure 6C).

Multiple Discrete SATB1-Binding Sequences within the IgH Enhancer

Mutation analysis revealed a certain sequence selectivity for SATB1 binding (see Figure 2). We examined a potential SATB1-binding site(s) within the whole 997 bp IgH enhancer region. The original 25 bp sequence used as a probe for SATB1 binding occurs only once in this region and is located 3’ of the enhancer. Because the 997 bp XbaI fragment contains two sets of MARs that surround the IgH enhancer (Cockerill et al., 1987) and coincide with the sequences that become unpaired and reactive with CAA under superhelical strain (Kohwi-Shigematsu and Kohwi, 1990), SATB1 might also bind at multiple sites if it is a MAR-binding protein. (Figure 5C shows a map of the IgH enhancer fragment.) We employed the missing nucleoside experiment, as described for wild-type (25)2, to map precise contact regions within the 5’ enhancer fragment (5’-En) (XbaI[1]-PvuII[384]) and the 3’ enhancer fragment (3’-En) (EcoRI[683]-XbaI[997]) (numbering is according to Gillies et al., 1983). Among the three DNA fragments, the core (PvuII[384]-EcoRI[683]) did not contain SATB1-binding sites, as determined by gel mobility shift assays (data not shown). The results for the 5’-En region employing the 384 bp 5’ XbaI-PvuII fragment labeled at either end (Figures 4A and 4B) revealed three contact sites. Vertical brackets with roman numerals indicate site I, which consists of 3’-ATAATCTTC-5’ (69-77) (Figure 4A), and sites II and III, which are 10 bp apart and correspond to 5’-AATAATAAAT-3’ (293-302) and 5’-ATATTTTT-3’ (313-320), respectively (Figure 4B). In the 314 bp EcoRI-3’ XbaI restriction fragment containing the 3’-En region, one major contact site (site IV) was found that corresponded to 5’-TTCTAATATAT-3’ (740-750; Figure 4C). As expected, site IV is identical to the contact site determined for the oligomer probe (Figure 3B) containing this sequence. Although this SATB1 contact sequence, site IV, exhibits the strongest CAA reactivity, a very high reactivity that persists at high salt was also detected in its immediate 3’ neighboring sequence (Kohwi-Shigematsu and Kohwi, 1990).
When the major contact site, site IV, was mutated in the same way as shown in Figure 2B within the natural sequence context of the 314 bp EcoRI-3’ XbaI fragment, this mutated DNA fragment still retained SATB1 binding based on the gel shift assay (data not shown). Therefore, we examined whether SATB1 binding could also be detected in the neighboring sequences by analyzing the mutated 314 bp DNA using the missing nucleoside experiment. Two additional contact sites were detected downstream of site IV, which were designated site V and site VI and consisted of 5’-AATAATAGAGTAATTTT-3’ (765-781) and 5’-ACCAATAATCA-3’ (790-800), respectively (Figure 4D). These two sites appear to be low affinity binding sites, since their binding to SATB1 can only be detected by the missing nucleoside interference assay when the major site IV is destroyed.

Figure 4. Missing Nucleoside Experiment of SATB1 Bound to the IgH Enhancer Region
The missing nucleoside experiment was performed as described in the legend to Figure 3. Annotations at the top of the lanes are identical to those in Figure 3. Each autoradiograph shows the results obtained with restriction fragments derived from the enhancer 5’- and 3’-flanking regions as probes (for map see Figure 5C). (A and B) The 5’-En fragment (XbaI[1]-PvuII[384]) was subcloned in Bluescript (see Experimental Procedures) and 3’-labeled with Klenow either at the BamHI site of the polylinker, which flanks the XbaI(1) site of the IgH enhancer fragment (A), or at the XhoI site of the polylinker, flanking the PvuII(384) site of the IgH enhancer (B). (C) Wild-type 3’-En region (EcoRI[683]-XbaI[997]) 5’-labeled with T4 polynucleotide kinase at the HindIII site of the Bluescript polylinker, which is adjacent to EcoRI (683). (D) Mutated 3’-En fragment (EcoRI[683]-XbaI[997]) labeled with Klenow at the EcoRI (683) site. The roman numerals alongside vertical brackets indicate specific contact sites of protein with DNA. The sequences of these sites and their relative locations within the IgH enhancer fragment are given in Figure 5.

Special AT-Rich Sequences Recognized by SATB1

When the direct contact sequences were compared, no sequence homology was found, except for their high AT content. Nevertheless, SATB1 is capable of recognizing certain types of sequences, since only a limited set of sequences was selected among approximately 750 bp of MAR regions surrounding the IgH enhancer delimited by Cockerill et al. (1987). To understand better the basis for SATB1 recognition, the direct contact sequences in their natural sequence contexts were compared (Figure 5A). One common characteristic among these sequences was found: SATB1 contact sites are located in stretches of AT-rich sequences with an asymmetric distribution of guanine residues between the two strands; one strand consists exclusively of C, A, and T that are mixed, and the other strand contains only G, T, and A (indicated by capital letters and brackets in Figure 5A). We call this the ATC sequence. To test whether the asymmetric distribution of C or G in one strand is important for SATB1 recognition, we employed a mutated DNA (shown in Figure 5B) that was derived from the ATC sequence 51-83 (which includes site I) where the Gs in the top strand were replaced with alternating Gs and Cs throughout the ATC sequence, but the 10 bp direct contact site was kept intact. As a result, 4 bases were mutated in the surrounding sequences among the 33 bp DNA containing the contact site. These changes in the base sequence greatly reduced the binding affinity to SATB1 as revealed by gel shift assays (Figure 5B). This experiment strongly supports the notion that an ATC sequence is the SATB1 binding consensus. This also confirms the hypothesis that sequence recognition by SATB1 is mediated not only through the direct contact site, but also through the sequences immediately surrounding this site.

As summarized in Figure 5C, the SATB1 contact sites are clustered, and some ATC sequences are superimposed (indicated by brackets in Figure 5A) within the two matrix attachment regions surrounding the enhancer core (Cockerill et al., 1987). These clustered ATC sequences coincide with the CAA-reactive regions (Kohwi-Shigematsu and Kohwi, 1990). The adjacent sites II and III correspond to the CAA-reactive region 5’ of the enhancer, whereas the adjacent sites IV, V, and VI correspond to the CAA-reactive region 3’ of the enhancer. Site I confers SATB1 binding, but this region is not CAA reactive (data not shown). It appears that CAA hyperreactivity corresponds to the region where ATC sequences occur in multiples, but a single isolated ATC sequence such as site I does not confer CAA reactivity. For example, we found that even the unpairing activity of site IV, which contains the core unwinding sequence, is greatly reduced when this site is placed by itself in a random sequence context (unpublished data). Although conjunctional SATB1-binding sites have a high unwinding potential, SATB1 does not bind single-stranded DNA (data not shown).

SATB1 Binding in the Minor Groove of DNA with Little Contact to the Bases

A small group of sequence-specific DNA-binding proteins exist that interact with AT-rich sequences within the minor groove. This includes Escherichia coli integration host factor (IHF) protein, members of the eukaryotic high mobility group family (reviewed in Churchill and Travers, 1991), and RNA polymerase II transcription factor TFIID (Starr and Hawley, 1991; Lee et al., 1991). To test whether SATB1 binds in the minor groove, we employed distamycin A, which specifically binds minor grooves of AT-rich DNA (Coll et al., 1987), as a competitor for SATB1 binding. The conformational effect on DNA exerted by distamycin is very small according to Kopka et al. (1985) and Coll et al. (1987). Mobility shift assays showed that the quantitative binding of the wild-type (25) probe to SATB1 in the absence of distamycin A (Figure 6A, lane 2; indicated by open arrowhead) decreased with increasing distamycin A concentrations (Figure 6A, lanes 3-7), until almost all the SATB1-DNA complex was ultimately replaced by a distamycin-DNA complex (indicated by a closed arrowhead) at a concentration of 100 µM distamycin A (Figure 6A, lane 7). Thus, distamycin A prevents binding of SATB1 to DNA in a manner dependent on concentration, implying a minor groove recognition of SATB1. We obtained similar results with a

Figure 5. ATC Sequences and SATB1 Contact Sites in the IgH Enhancer Fragment

wt 51:
5’ AGAAGTATGTGTATGGAATATTAGAAGATGTTG 3’
3’ TCTTCATACACATACCTTATAATCTTCTACAAC 5’

(A) ATC sequences within the 5’ and the 3’ regions of the IgH enhancer fragment are indicated by capital letters and brackets, and their positions are shown by arabic numerals. The six SATB1 contact sites are underlined and indicated by roman numerals. (B) Band shift assay with bacterially produced SATB1 and oligomers derived from the ATC sequence 51-83. Only the shifted bands are shown for better comparison between the wild-type (wt 51) and mutated (mut 51) ATC sequence 51-83. Lanes 1-8 contain the following protein concentrations in ng: 0 (lane 1), 31 (lane 2), 62.5 (lane 3), 125 (lane 4), 250 (lane 5), 500 (lane 6), 1000 (lane 7), and 2000 (lane 8). The sequences of the double-stranded oligomers wt 51 and mut 51 are shown at the bottom; the dots indicate G’s that were replaced by C’s in the mutated DNA. The SATB1 contact site is underlined. (C) Relative location of SATB1-binding sites and sequences with known activities in the IgH enhancer. The 997 bp XbaI fragment containing the enhancer, flanked by the 5’ and 3’ regions, is pictured with the major restriction sites. The arrow indicates the location of the ATATAT nucleation site for unpairing. The MARs located 5’ and 3’ of the enhancer core are shown by hatched bars (Cockerill et al., 1987). CAA-reactive areas are represented by double-headed arrows (Kohwi-Shigematsu and Kohwi, 1990). The four major and two minor recognition sites for SATB1 are given by closed and open bars, respectively.
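The ATC criterion used in this figure (one strand of the duplex built only from A, T, and C, with no G) can be checked mechanically. The following is a minimal sketch, not the authors' procedure: the function names `is_atc` and `longest_atc_run` are illustrative assumptions, and the only data taken from the paper is the top strand of the wt 51 oligomer shown in panel B.

```python
def is_atc(top_strand: str) -> bool:
    """Return True if one strand of the duplex contains only A, T and C.

    The bottom strand is G-free exactly when the top strand is C-free,
    so it is enough to test the top strand for a missing G or a missing C.
    """
    s = top_strand.upper()
    return "G" not in s or "C" not in s


def longest_atc_run(top_strand: str) -> int:
    """Length of the longest window in which one of the two strands lacks G."""
    s = top_strand.upper()
    best = 0
    # Excluding "C" on the top strand is the same as excluding "G" on the bottom strand.
    for excluded in ("G", "C"):
        run = 0
        for base in s:
            run = 0 if base == excluded else run + 1
            best = max(best, run)
    return best


# Top strand of the wt 51 oligomer (Figure 5B).
WT51_TOP = "AGAAGTATGTGTATGGAATATTAGAAGATGTTG"

if __name__ == "__main__":
    print(is_atc(WT51_TOP), longest_atc_run(WT51_TOP), len(WT51_TOP))
```

The check is deliberately simple: it captures only the "no G on one strand" part of the definition and says nothing about how well mixed the A’s, T’s, and C’s are, or about unwinding potential.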

Figure 6. Distamycin A Competition of SATB1 Binding and Chemical Interference Experiments

(A) Distamycin A competition. Autoradiograph of a gel retardation assay with SATB1 synthesized in rabbit reticulocyte lysate and radiolabeled 3’-En fragment (EcoRI[683]-XbaI[997]). Distamycin A (Dist.) was added at the start of the binding reaction at the micromolar concentrations indicated above the lanes. The open arrowhead indicates SATB1-bound DNA and the closed arrowhead indicates distamycin-bound DNA. SATB1 was omitted or present in the samples indicated by a minus sign or a plus sign, respectively. (B and C) Depurination interference (B) and methylation interference (C) with end-labeled wild-type (25)2 bound to thrombin-cleaved bacterially produced SATB1. Lane B shows the bound probe and lane F shows the free probe, as described in Figure 3 legend. The dashed bars indicate SATB1 contact sites as determined by the missing nucleoside experiment. The full contact sequence and the location of the ATATAT sequence within the other contact site are given in (C). Two adenine residues in each repeat that interfered with binding when methylated are indicated by stars. Arrowheads indicate G.

smaller minor groove-binding drug, berenil (kindly provided by Dr. Stephen Neidle; Brown et al., 1990; data not shown). To determine whether SATB1 directly contacts specific bases, the wild-type (25)2 DNA was sparingly depurinated by formic acid and complexed with SATB1 under the same conditions employed for the missing nucleoside experiment that showed interference for binding. After binding with the protein, free and protein-bound probes were isolated from a native polyacrylamide gel, cleaved with piperidine, and separated on a sequencing gel to determine whether depurination of certain nucleotides interfered with binding (Brunelle and Schleif, 1987). This interference assay removes only the bases and differs from that with hydroxyl radical, which removes both the bases and the sugar moieties. Figure 6B shows that depurination did not interfere with SATB1 binding; band intensities in the purine ladders were identical for free and protein-bound probes, even at the previously determined sites that interfered with binding after hydroxyl radical treatment (indicated by dashed brackets). Chemical modifications of bases in the major groove also did not affect SATB1 binding. Methylation of the N-7 position of guanine residues by dimethylsulfate did not interfere with protein binding (Figure 6C). We have also employed CAA, which specifically modifies the N-6 and N-1 positions of adenine and the N-3 and N-4 positions of cytosine residues (Kochetkov et al., 1971; Barrio et al., 1972), for interference assays. When a DNA fragment containing CAA-modified SATB1 contact sites was prepared from CAA-treated supercoiled plasmid DNA and used for the assay, we found that binding was not affected (data not shown). The minor groove modification at the N-3 position of adenine residues by dimethylsulfate did not show major interference with SATB1 binding (Figure 6C). Only two modified adenine residues within the 11 bp contact site IV, which contains six adenine residues, interfered slightly with protein binding, as revealed by reduced band intensities in the lane containing the bound probe compared with the lane containing the free probe. We conclude from these results that SATB1 binds along the minor groove of DNA without directly contacting any bases other than the two adenine residues. The results described above suggest an unusual mode of binding for SATB1: the protein recognizes a particular type of AT-rich sequence (ATC sequences) and binding occurs in the minor groove without extensive base contacts, strongly suggesting that sequence recognition is determined indirectly by a specific structure of the sugar-phosphate backbone in the minor groove at specific sets of AT-rich sequences in double-stranded DNA.

Specific Binding of SATB1 to MAR Sequences from Different Species

An important question is whether SATB1 is indeed a MAR-


binding protein. If so, SATB1 should bind to MARs from different species, as does the MAR-binding protein ARBP, purified from chicken oviduct (von Kries et al., 1991). We tested this hypothesis by mobility shift assays. Figure 7 shows an autoradiogram of gel mobility shift experiments carried out with yeast histone H4 ARS (Umek and Kowalski, 1988) and yeast centromere CENIII probes (Fitzgerald-Hayes et al., 1982) (Figure 7, lanes 1-7), which have previously been shown to attach to the nuclear scaffold (Amati and Gasser, 1988), a MAR fragment from the human interferon β gene (Klehr et al., 1991; Figure 7, lanes 8-11), and a MAR from the potato ST-LS1 gene (Stockhaus et al., 1987; Figure 7, lanes 12-14). In each case, addition of SATB1 to the binding reaction resulted in the appearance of a shifted band (Figure 7, lanes 2, 6, 9, and 13) that was inhibited by specific, unlabeled competitor DNA (Figure 7, lanes 3, 7, 10, and 14) but not by a Bluescript fragment used as a nonspecific competitor (shown only in Figure 7, lanes 4 and 11). The partial nonspecific competition observed with the human interferon β probe (Figure 7, lane 11) probably reflects the low binding affinity of this probe to SATB1 (unpublished data). In addition, the yeast centromere CEN II, MARs from the human β-globin gene (Jarman and Higgs, 1988), and the Drosophila 87A7 heat shock gene (Mirkovitch et al., 1984) all bound to SATB1 (data not shown). A synthetic poly(dA-dT)16 repeat was found to be complexed weakly by the protein (Figure 7, lanes 21-23), which is consistent with the fact that poly(dA-dT) repeats are present in the MAR sequence of the human β-globin gene. When we employed the wild-type (25 bp) multimer probe as a competitor, SATB1 binding to MARs was inhibited, showing that a common SATB1 recognition site(s) exists in the two types of DNA (unpublished data). We tested other types of AT-rich sequences, which are not MARs and which do not unwind, for SATB1 binding (Figure 7). SATB1 did not bind to DNA fragments containing a TATA box of the sequence TATAAT or a poly(dA)18-poly(dT)18 tract (Figure 7, lanes 15-20). The AT-rich spacer region from Xenopus oocyte 5S DNA (Fedoroff and Brown, 1978), which consists mostly of &T4 repeats, the 33 bp poly(dG-dA) and 31 bp poly(dA-dC) oligonucleotides, as well as an Oxytricha telomere sequence (A&&), all failed to bind SATB1 (data not shown). Thus, SATB1

Figure 7. Mobility Shift Assay with MAR and Non-MAR DNA Probes

SATB1 synthesized in reticulocyte lysate was used in gel mobility shift assays with the probes indicated above the lanes. Lanes indicated by a minus sign contain samples with no protein (SATB1) or competitor (comp.) added to the binding reaction; lanes marked by a plus sign contain samples to which protein was added. A 200-fold molar excess of specific competitor (sp) was present in the samples shown in lanes 3, 7, 10, 14, 17, 20, and 23, and nonspecific competitor (ns) was added to the samples in lanes 4 and 11. The nonspecific competitor was a 445 bp PvuII restriction fragment isolated from Bluescript.
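The 200-fold molar excess quoted in this legend is consistent with the amounts given for the binding reactions in Experimental Procedures (0.5 ng of labeled probe and 100 ng of unlabeled probe as specific competitor). As a hedged illustration, the sketch below computes a molar excess from masses and fragment lengths. The function name and the 300 bp probe length used in the example are assumptions made for illustration only, and the calculation assumes that the mass of double-stranded DNA is proportional to its length.

```python
def molar_excess(competitor_ng: float, probe_ng: float,
                 competitor_bp: int, probe_bp: int) -> float:
    """Molar ratio of competitor to probe for double-stranded DNA.

    Moles are proportional to mass / length, so the ratio
    (competitor_ng / competitor_bp) / (probe_ng / probe_bp)
    simplifies to the expression below.
    """
    return (competitor_ng * probe_bp) / (probe_ng * competitor_bp)


if __name__ == "__main__":
    # Specific competitor: the same fragment as the probe, so the lengths cancel.
    print(molar_excess(100, 0.5, competitor_bp=300, probe_bp=300))
    # A competitor of different length (here 445 bp against a hypothetical
    # 300 bp probe) gives a different molar ratio for the same mass.
    print(molar_excess(100, 0.5, competitor_bp=445, probe_bp=300))
```

When competitor and probe are the same fragment, the mass ratio equals the molar ratio; for a competitor of different length, such as the 445 bp nonspecific fragment, equal mass does not imply equal molar excess.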

distinguishes AT-rich MAR sequences from AT-rich non-MAR sequences.

Tissue-Specific SATB1 Expression

The nature of the SATB1 mRNA and its tissue distribution were examined by Northern and RNAase protection experiments with total cellular RNA isolated from mouse and human tissues. A Northern blot with RNA from mouse tissues (5-8 weeks old) was hybridized with a uniformly labeled restriction fragment (containing nucleotides 294-1278) derived from pAT1146, which was expected to be very specific for SATB1 transcripts since it lacked the sequences encoding the glutamine stretch (present in other genes) and the AT-rich untranslated 5’ region. Unexpectedly, a major transcript of 3.7 kb and some minor bands were detected in thymus (Figure 8A), but not in other tissues examined. Brain expressed a minute amount of SATB1 transcripts, visible only after overexposure of the films (data not shown). The size and the pattern of bands were identical between human and mouse thymus (data not shown). No transcript was detected in testis RNA (even after long film exposure), although the SATB1-expressing cDNA was originally isolated from a testis library. To confirm a very low level of SATB1 transcript in testis, we analyzed RNA from human testis and thymus by RNAase protection. The autoradiogram in Figure 8B reveals the presence of a small amount of SATB1-specific transcript in total cellular RNA from human testis (Figure 8B, lanes 3, 6, and 9) and from a human testicular cell line, Germa-1 (Hofmann et al., 1989b) (Figure 8B, lanes 4, 7, and 10). Lanes 5-10 in Figure 8B are from an overexposure of the same gel to confirm the presence of the SATB1-specific band in testis and Germa-1. The amount of SATB1-specific transcript was substantially lower in testis than in thymus, and the weak signals were not caused by contamination from the strong band, since the same intensity was observed in weak bands that were well separated (Figure 8B, lanes 9 and 10).
Figure 8. Tissue Specificity of SATB1 Expression

(A) Northern blot analysis. Ten micrograms each of total cellular RNA from mouse tissues was analyzed using a radiolabeled AvaI(392)-BglII(1278) restriction fragment from pAT1146 (Figure 1A) as a probe. The major transcript of 3.7 kb in thymus is marked. A transcript of the same size (3.7 kb) was also detected in human thymus RNA (data not shown). Abbreviations are: Br = brain, He = heart, Ki = kidney, Li = liver, Lu = lung, Mu = muscle, Ov = ovaries, Sp = spleen, Th = thymus, Te = testis. The bottom panel is a photograph of the ethidium bromide-stained 28S and 18S rRNA bands on the formaldehyde gel before transfer to the hybridization membrane, to show that equal amounts of RNA were loaded in each track. (B) Ribonuclease protection assay. The synthesis of the RNA probe is described in Experimental Procedures and represented schematically below the autoradiograph. The sizes of the undigested probe (244 bases) and of the RNAase-resistant fragments (196 bp; lane M) are indicated. The control lane (lane 1) contains undigested probe. Total cellular RNA (25 µg each) was from human thymus (lanes 2 and 5), testis (lanes 3, 6, and 9), and the human testicular cell line Germa-1 (lanes 4, 7, and 10; Hofmann et al., 1989b). Lane 8 does not contain RNA. The panel on the right with lanes 5-10 is an overexposure of the autoradiograph shown on the left, to confirm the presence of a 196 bp protected RNA transcript in testis and Germa-1. (C) Western blot with rabbit anti-SATB1 serum. Lane 1, bacterially produced, thrombin-cleaved ΔSATB1; lane 2, immunoprecipitated whole-cell extract from mouse thymus, containing 10 µg total protein; lanes 3 and 4, immunoprecipitated whole-cell extract from mouse liver, containing 10 µg (lane 3) and 20 µg (lane 4) total protein per lane. Size markers are shown on the right in kilodaltons. The predicted sizes of 85.9 kd for the full-length SATB1 protein and 80.0 kd for the truncated, thrombin-cleaved GST-ΔSATB1 protein (see Experimental Procedures) are indicated on the left. Western blot analysis using thymus and liver cell extract without immunoprecipitation revealed a specific band in thymus, but not in liver (data not shown). Since the absence of a signal in liver could have been attributable to lack of sensitivity, we used a 30- to 60-fold larger volume of cell extract for immunoprecipitation for the Western blot analysis shown here.

These results indicate that thymus is the major tissue of SATB1 expression, both in human and mouse, and that the SATB1-encoding cDNA isolated from testis originated from a very rare RNA in this tissue. Immunofluorescence data obtained with anti-SATB1 antiserum revealed that SATB1 is expressed in thymocytes (unpublished data). Thus, SATB1 is the first example of a tissue-specific MAR-binding protein. To confirm that pAT1146 contains the complete SATB1-coding sequence, rabbit anti-SATB1 antibodies were generated to compare the size of the bacterially produced SATB1 with the native protein isolated from mouse thymus. Cell extracts from thymus and liver, and thrombin-cleaved glutathione S-transferase (GST)-ΔSATB1 fusion protein, were subjected to immunoprecipitation with anti-SATB1 antibody. The immunoprecipitates were separated on SDS-polyacrylamide gels and identified by Western blotting with anti-SATB1 rabbit serum (see the legend to Figure 8C). A prominent band of about 85 kd is visible in the thymus extract (Figure 8C, lane 2), which is in good agreement with the 85.9 kd size predicted by the cDNA pAT1146. No band was detected in liver extract (Figure 8C, lanes 3 and 4). Bacterially produced SATB1 was slightly smaller (Figure 8C, lane 1), because 59 amino-terminal amino acids were deleted from pAT1146 prior to

cloning it into the expression vector pGEX-2T (see Experimental Procedures), thereby reducing the size of the protein to 80.0 kd. These results indicate that the cDNA clone pAT1146 encodes a full-length SATB1 protein, even though the size of the cDNA (2.9 kb) is smaller than the mRNA size (3.7 kb).

Discussion

cDNA Cloning for a Novel MAR-Binding Protein

We selected an AT-rich sequence previously determined to be the core unwinding sequence of a MAR segment located 3’ of the IgH enhancer (Kohwi-Shigematsu and Kohwi, 1990) to isolate a cDNA for a MAR-binding protein. Employing a concatemerized oligonucleotide of this sequence as a probe that behaves like a naturally occurring MAR (Bode et al., 1992), we isolated and characterized a novel human cDNA of 2.9 kb in length that encodes a 763 aa or 85.9 kd SATB1 protein, which is expressed predominantly in thymus. We demonstrate that SATB1 not only binds to the synthetic MAR probe, but also binds with high affinity to various MARs from human, mouse, Drosophila, yeast, and plant. SATB1 binding can be specifically competed by MARs from any species, but not by non-MAR DNAs. SATB1 is therefore a tissue-specific MAR-binding protein.

Sequence Selectivity of SATB1 Binding

Although MARs are known to be AT rich in general, the MAR-binding protein SATB1 is not merely an AT-rich sequence-binding protein. Our results revealed a selectivity for only certain subsets of AT-rich sequences. Using SATB1 as a new tool, we discovered important features of the sequences that constitute MARs. SATB1 showed a strong affinity to the double-stranded, linear wild-type (25)7 oligonucleotide, which has a high potential to unwind under superhelical strain, and a substantially lower affinity to the mutated (24)8 sequence, in which 3 base changes and 1 base deletion lead to a complete loss of the unwinding property. Thus, specific point mutations that dramatically alter the inherent structural property, but do not significantly alter the overall AT content of the DNA, were clearly distinguished by SATB1. Specific recognition by SATB1 was further demonstrated by the fact that within the 997 bp IgH enhancer region containing two sets of MARs, only certain subsets of short motifs in the MARs made direct contact with SATB1. Regions of conjunctional SATB1-binding sites correspond to the sequences that have a potential to unwind or to the CAA-reactive sites in supercoiled DNA (Kohwi-Shigematsu and Kohwi, 1990). None of these direct contact sites revealed a primary base sequence consensus. However, when these sites were compared in their natural sequence context, an important common feature was found: in addition to being AT rich, they show a bias in the distribution of cytosines or guanines on either strand.
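To put numbers on the comparison between the wild-type 25 bp and mutated 24 bp repeat units, the sketch below computes the length and AT fraction of the two top-strand monomer sequences given under DNA Probes in Experimental Procedures (upper-case portion only, without the lowercase overhang). It is an illustrative helper and not part of the original analysis; the name `at_fraction` and the constant names are chosen here for illustration.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence (case-insensitive)."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in s) / len(s)


# Top-strand repeat units, without the three-base overhang used for concatemerization.
WT_25 = "TCTTTAATTTCTAATATATTTAGAA"
MUT_24 = "TCTTTAATTTCTACTGCTTTAGAA"

if __name__ == "__main__":
    for name, seq in (("wild-type", WT_25), ("mutated", MUT_24)):
        print(f"{name}: {len(seq)} bp, AT fraction {at_fraction(seq):.2f}")
```

A composition measure of this kind describes only base content; it does not capture the unwinding property or the strand bias that the text identifies as the features SATB1 distinguishes.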
Such special AT-rich sequences, in which one strand consists exclusively of well-mixed A’s, T’s, and C’s and avoids long stretches of homopurines and homopyrimidines (ATC sequences), confer SATB1 binding. The 30 bp ATC sequence for site I in the 5’ region of the IgH enhancer is only 66% AT rich. Mutations that destroyed the ATC feature for site I but kept the direct contact site intact severely reduced SATB1 binding. ATC sequences appear to be nearly as specific as a primary sequence consensus; ATC sequences thus represent a novel concept for a protein binding consensus. A loosely defined consensus drawn by aligning SATB1 direct contact sequences, as is generally done for other DNA-binding proteins, would have been misleading in terms of defining binding specificity in the case of SATB1.

Unusual Binding Characteristic of SATB1

SATB1 binding has high sequence selectivity, and yet it apparently makes little contact with the DNA bases. Neither chemical modifications in the major groove nor depurination of approximately 1 base per molecule interfered with binding. The findings that SATB1 did not bind in the major groove, that the missing nucleoside experiment revealed specific contact sites for SATB1 binding, and that the minor groove-binding distamycin inhibited SATB1

binding strongly suggest that SATB1 binds in the minor groove and recognizes ATC sequences indirectly through their effects on the geometry of the sugar-phosphate backbone. SATB1 seems to recognize a sequence-dependent distortion of the DNA structure that already exists in the linear double-stranded DNA, presumably at the phosphate backbone and the minor groove, before the DNA sequence becomes unpaired as a result of superhelical strain. Unlike an autonomously replicating sequence consensus-binding protein (ACBP) (Hofmann and Gasser, 1991), SATB1 does not bind single-stranded ATC sequences (data not shown). It will be interesting to examine whether SATB1 binding to linear double-stranded ATC sequences induces an unusual conformation similar to that observed with ATC sequences in supercoiled DNA. It is known that most DNA-binding proteins that have an obvious primary sequence consensus make contacts with the DNA bases in the major groove, whereas many proteins with low sequence selectivity, like the histone-like HU class of proteins, histone H1, topoisomerase II, DNAase I, and the nonhistone chromosomal α protein or HMG-I, which is part of the nuclear matrix, contact the phosphate backbone and the minor groove of DNA (for a review see Churchill and Travers, 1991; Adachi et al., 1989). Unlike these proteins, the minor groove-binding protein SATB1 selects a particular group of sequences for binding. The trp repressor is an example of a sequence-specific protein that interacts with the phosphate backbone but not directly with the DNA bases (Otwinowski et al., 1988). This binding mode is similar to that of SATB1 in the sense that sequence recognition is achieved indirectly through the DNA structure. An important difference, however, is that SATB1 binds in the minor groove while trp repressor binds in the major groove (Kumamoto et al., 1987; Otwinowski et al., 1988). Some known sequence-specific binding proteins bind in the minor groove.
IHF recognizes specific DNA sites (Goodrich et al., 1990) and yet it primarily contacts the minor groove of DNA (Yang and Nash, 1989). Transcription factor IID has a similar mode of binding: it binds to the TATA box, or the consensus 5’-TATAAA-3’ (Breathnach and Chambon, 1981), in the minor groove (Starr and Hawley, 1991; Lee et al., 1991). Both IHF and TFIID proteins, however, bind deeply in the minor groove so that they make direct contact with the bases. In contrast, SATB1 binding in the minor groove hardly touches DNA bases except for two adenines. Furthermore, although substitutions of adenine to inosine, together with thymine to cytosine, within the direct contact site did not alter binding of TFIID (Starr and Hawley, 1991), these substitutions diminished SATB1 binding (unpublished data). These substitutions can cause conformational changes of DNA (Koo and Crothers, 1987), and therefore the minor groove conformation specifically determined by unsubstituted, natural ATC sequences must be important for SATB1 binding.

Biological Significance of SATB1

MAR DNA segments are interchangeable across the species barrier (reviewed in Phi-Van and Strätling, 1990); in support of this, MARs from different cell origins bind chicken ARBP (von Kries et al., 1991) and human SATB1


(Figure 7). Assuming that MAR-binding proteins are solely structural components of the nuclear matrix and serve to anchor bases of chromatin loops, one may expect that common MAR-binding proteins would exist in every cell type of a given species. However, too little is known to date about MAR-binding proteins to elaborate on their function or possible cell specificity. A recent study showed that the attachment of the avian β-globin enhancer to the nuclear matrix follows a tissue-specific pattern (Brotherton et al., 1991), implying the possible existence of a tissue-specific MAR-binding protein. Here we demonstrate the existence of a MAR-binding protein, SATB1, that is tissue specific and expressed predominantly in thymus, although it is also present in minute quantities in other tissues. This tissue-specific SATB1 was also found to be a nuclear protein and is a component of the nuclear matrix prepared from thymocytes (unpublished data). The existence of a tissue-specific MAR-binding protein strongly suggests a new role(s) for a MAR-binding protein in addition to anchoring chromatin to the nuclear matrix. In fact, our preliminary data show that SATB1 acts as a suppressor for transcription based on transient cotransfection assays with a reporter gene (unpublished data). We are currently studying whether SATB1 suppresses a specific gene or multiple endogenous genes in nuclei. It is possible that SATB1 suppresses multiple genes at the level of whole transcriptional domains by interacting with the bases of chromatin loops. Because MARs often comap with regulatory regions including enhancers, it is not surprising that regulatory proteins would exist in the nuclear matrix so that they could have easy access to the target DNA sequences. In this sense, it is interesting that SATB1, which has transcriptional suppression activity, is not only a nuclear matrix component but also a MAR-binding protein itself. The cloning of SATB1 provides us with a molecular tool to dissect and reevaluate MARs and to elucidate the role of special AT-rich sequences with unusual structural properties in biological processes such as transcription and possibly replication.

Experimental Procedures

DNA Probes

The wild-type (25 bp) and mutated (24 bp) oligonucleotides containing the wild-type or mutated sequence derived from the IgH 3’-En (position 731-756) were synthesized, made into duplex DNA, and multimerized to dimers, pentamers, heptamers, or octamers as described by Bode et al. (1992). In brief, complementary oligonucleotides for the wild-type (25 bp) 5’-TCTTTAATTTCTAATATATTTAGAAttc-3’ and 5’-TTCTAAATATATTAGAAATTAAAGAgaa-3’ or for the mutated (24 bp) 5’-TCTTTAATTTCTACTGCTTTAGAAttc-3’ and 5’-TTCTAAAGCAGTAGAAATTAAAGAgaa-3’ were hybridized into double-stranded DNA and concatemerized by hybridization through overlapping single-stranded ends (indicated by lowercase letters). The orientation of the sequence was the same throughout the multimer. The single-stranded ends of multimers were digested with mung bean nuclease. Multimers were separated on a polyacrylamide gel and cloned into the EcoRV site of a Bluescript vector. The concatemerized sequences were isolated by digesting the Bluescript recombinant DNA with BamHI and HindIII. The XbaI restriction fragment (1-992) containing the entire IgH enhancer region as described by Gillies et al. (1983) was cloned by use of EcoRI linkers in the EcoRI site of plasmid pUC18. The subfragments of the IgH enhancer region used in gel retardation assays were the following: 5’-En was a fragment located between XbaI(1) and PvuII(382), the core enhancer was excised by PvuII(382) and EcoRI(683) digestion, and the 3’-En spanned the region between EcoRI(683) and XbaI(992). The 5’-En and the 3’-En fragments were subcloned in the XbaI-HincII and EcoRI sites of the vector Bluescript (Stratagene), respectively. The wt 51 and mut 51 probes were prepared by hybridizing the oligonucleotide sequences shown in Figure 5B, which contain the 5’-most SATB1 contact site (site I), followed by purification on a polyacrylamide gel. The yeast histone H4 ARS is located on a HindIII(398)-EcoRI(798) restriction fragment derived from plasmid pAE9 (Bouton and Smith, 1988; kind gift of Dr. D. Kowalski). The CENIII sequence was isolated as a 0.6 kb Sau3A fragment from plasmid pYE(CEN3)30 (Fitzgerald-Hayes et al., 1982; kind gift of Dr. J. Carbon). The human interferon β probe was an XmnI(564)-BamHI(995) fragment purified from plasmid pCL (Klehr et al., 1991; kind gift of Dr. J. Bode). The human β-globin MAR fragment was located on a 390 bp HindIII-ApaLI fragment from plasmid pRYG (Muller et al., 1988; kind gift of Dr. M. Muller). The plant MAR probe was a StyI(2643)-XbaI(3140) fragment isolated from the potato gene ST-LS1, in pUC19 (Stockhaus et al., 1987; kind gift of Dr. J. Bode). A DNA sequence containing a TATA box was isolated from the Ptac promoter of the vector pGEX-2T (Pharmacia) by SspI(184)-EcoNI(264) digestion. The poly(dA)18 sequence in the BspEI(2826)-BamHI(polylinker) region of the cDNA clone pAT1146 was used as a probe. The synthetic oligonucleotides 32 bp poly(dA-dT), 33 bp poly(dG-dA), and 31 bp poly(dA-dC), as well as the Oxytricha telomere sequence (kind gift of Dr. V. Zakian), were cloned in the polylinker of pUC19 and purified from the plasmids after EcoRI-HindIII digestion. The AT-rich spacer region from Xenopus oocyte 5S DNA was a 530 bp HaeIII-HindIII fragment from the cloned insert X 1036 of the plasmid pMB9 (kind gift of Dr. W. Reynolds; Fedoroff and Brown, 1978).

Probes used in gel mobility shift assays were generated by cutting plasmids with appropriate restriction enzymes, purifying the fragments on agarose or polyacrylamide gels, and end labeling with DNA polymerase I Klenow fragment by standard procedures (Maniatis et al., 1982). For protein binding interference experiments, DNA probes were selectively labeled at one end by digesting the plasmid containing the respective fragment with an appropriate restriction enzyme, then labeling with the Klenow fragment at the 3’ end or with T4 polynucleotide kinase at the 5’ end after dephosphorylation with calf intestinal phosphatase. Unincorporated deoxyribonucleotides were removed by spin column chromatography through Sephadex G50, followed by a second restriction enzyme digestion at the opposite end of the probe. The labeled fragment was purified from a native polyacrylamide gel by electroelution into dialysis bags or by soaking the gel slice in 0.2 M NaCl-TE at 37°C overnight and subsequent Elutip-D (Schleicher and Schuell) purification. DNA probes for Northern hybridizations were labeled by random oligonucleotide priming (Pharmacia).

Cloning and Sequencing of cDNA Encoding SATB1

The cloned wild-type (25 bp) multimer was excised from the vector Bluescript by BamHI-HindIII digestion and end labeled with the DNA polymerase I Klenow fragment to a specific activity of 4 x lOa cpm/ng. It was used as a probe to screen protein replica filters of a human testis cDNA library cloned in λgt11. Plating of the library and screening of the filters were done as described by Vinson et al. (1988), except that the filter overlays were incubated at room temperature overnight and the guanidine hydrochloride denaturation-renaturation cycles were omitted. Binding reactions were carried out at 4°C for 3 hr; the binding buffer was supplemented with 10 µg/ml double-stranded poly(dI-dC), 10 µg/ml denatured salmon sperm DNA, and 125 mM NaCl. The phage clone λAT11 was purified through four successive rounds of filter hybridizations and its insert was subcloned in the EcoRI site of the vector Bluescript. cDNA clones were digested from both 5’ and 3’ ends with exonuclease III-mung bean nuclease to generate overlapping deletion clones. Sequencing was performed with modified T7 DNA polymerase (Pharmacia) according to the manufacturer’s instructions. To isolate additional cDNA clones homologous to the pAT11 clone, we rescreened the human testis cDNA library using the pAT11 cDNA sequence as a hybridization probe. A full-length cDNA in Bluescript


vector (pAT1146) was generated by replacing the Bglll-Xbal fragment of pATI cDNA in Bluescript by the Bglll-Xbal fragment of pAT46. Expression of SATBl For synthesis of a GST-ASATBI fusion protein, the cDNA clone pATll46 was cut with Aval(392) downstream of the start codon and with Xbal (polylinksr) at the 3’ end, blunt ended with Klenow, and ligated in the blunt-ended EcoRl siteof theexpression vector pGEX-PT (Pharmacia). The plasmid with the correct orientation was selected. The synthesized, thrombin-cleaved protein was therefore 80 kd in size, because 59 aaof the N-terminal were missing. Protein synthesis, purification of the GST-ASATBI fusion protein, and removal of the GST portion by cleavage with thrombin were as described (Smith and Johnson, 1988; Gearing et al., 1989). The resulting 80 kd ASATBI protein was used for gel shift assays where indicated and referred to as “bacterially produced SATBl” in the text. A mammalian expression plasmid that expresses SATBl driven by the SV40 enhancer-promoter sequence was constructed using Bluescript that contains the BamHIPvull fragment of pECE (Ellis et al., 1986; recloned into the Bluescript vector and kindly provided by Dr. C. Hauser). The BamHI-Pvull fragment was blunt ended and recloned into the Pvull sites of Bluescript. The whole pATl146 was inserted into this vector at the EcoRl site. In vitro transcription reactions were done with 1 ug of linearized proteinase K-treated plasmid DNA with a purchased RNA synthesis kit (Stratagene). The quality of the synthesized RNA was examined by formaldehyde-agarose gel electrophoresis (Maniatis et al., 1982) and the yield was determined by ODm measurement. Protein was synthesized from 0.5 trg of RNA per 50 ul reaction in a rabbit reticulocyte lysate system (Promega) with [%]methionine. The full-length SATBI wasproducedfrom theT3promoterof pATI linearized with BamHI. Proteins were separated on SDS-polyacrylamide minigels (Laemmli, 1970). 
and the bacterially produced proteins were visualized by staining the gels with Coomassie brilliant blue, whereas the 35S-labeled proteins produced in reticulocyte lysate were exposed to XAR film. Protein concentrations were determined using a protein assay kit (Bio-Rad).

Gel Mobility Shift Assays

Binding reactions were done in 10 µl total volume containing 10 mM HEPES (pH 7.9), 1 mM dithiothreitol, 50 mM KCl, 2.5 mM MgCl2, 10% glycerol, 0.1 mg/ml poly(dI-dC)-poly(dI-dC), and 0.5-1 ng of thrombin-cleaved 80 kd ΔSATB1 (described in the preceding section of Experimental Procedures) or 1 µl of the reticulocyte lysate reaction containing in vitro synthesized SATB1. When purified fusion protein was used, the reaction mixture was supplemented with 5 mg/ml bovine serum albumin. Samples were preincubated at room temperature for 5 min before radiolabeled DNA probe (0.5 ng) was added. In some cases, 100 ng of unlabeled DNA probe was added as a specific competitor. The binding reaction was allowed to proceed at room temperature for 15 min. Aliquots (4 µl) of each sample were electrophoresed at 120 V for 2 hr through a 6% polyacrylamide minigel (8 × 10 cm) containing 0.05% bisacrylamide, 5% glycerol, and 0.5× Tris-borate-EDTA (Maniatis et al., 1982). The gels were dried onto Whatman 3MM paper and exposed to XAR film. Two films were superimposed to distinguish the [35S]methionine-labeled protein from the [32P]DNA.

Protein Binding Interference Experiments

The missing nucleoside experiment described by Hayes and Tullius (1989) was adopted with minor modifications. Approximately 150 ng (3 × 10⁶ cpm) of a radiolabeled DNA fragment was used for hydroxyl radical treatment at 37°C. The final Fe(II) concentration was 20 µM. Binding reactions were carried out as described above, with 8 µg of thrombin-cleaved 80 kd ΔSATB1 and 50 ng (10⁶ cpm) DNA probe in a total volume of 30 µl.
Under these conditions, at least 75% of the labeled probe was routinely found to be shifted by protein after gel electrophoresis. The protein-DNA complex and the remaining free probe were separated by polyacrylamide minigel electrophoresis and visualized by exposure of the wet gel to XAR film for 30 min at 4°C. Free and bound probes were eluted from the gel by soaking the gel slices containing the respective fragments overnight at 37°C in 0.2 M NaCl-TE, followed by purification through Elutip-D columns (Schleicher & Schuell), phenol-chloroform extraction, and ethanol precipitation with 20 µg of glycogen as a carrier. The purified DNA was electrophoresed through 8% denaturing polyacrylamide gels. Autoradiographs of the sequencing gels were scanned by laser densitometry to determine the relative strengths of contacts between SATB1 and the target DNA.

For methylation interference experiments, a single end-labeled fragment (7 × 10⁶ cpm) was chemically modified with 1 µl of dimethylsulfate in a 200 µl total volume for 5 min at 15°C according to Maxam and Gilbert (1977). The subsequent band shift assay as well as the isolation and purification of the free and protein-bound probes were done exactly as described for the missing nucleoside experiment, except that the purified probes were cleaved with piperidine before loading on an 8% sequencing gel. This same procedure was used for the acid depurination interference experiments (Brunelle and Schleif, 1987), except that the target probe (2 × 10⁷ cpm in 15 µl) was treated with 1 µl of 4% formic acid for 25 min at 37°C to partially eliminate guanines and adenines.

For CAA interference analysis, supercoiled pSERCy3OObp containing the EcoRI(883)-XbaI(997) fragment of the IgH enhancer region was modified with CAA at a 50 mM Na+ concentration (pH 5.0) as described by Kohwi-Shigematsu and Kohwi (1990). The extent and the site of modification were confirmed by the chemical cleavage method (Kohwi and Kohwi-Shigematsu, 1988). The CAA-modified EcoRI-XbaI fragment was used as a probe for protein binding.

RNA Analysis

Total cellular RNA was prepared from frozen mouse tissues or tissue culture cells by the guanidinium thiocyanate method (Chirgwin et al., 1979). Human RNA was purchased from Clontech. For Northern blots, total cellular RNA (10 µg per lane) was separated by formaldehyde-agarose gel electrophoresis as described by Maniatis et al. (1982). RNA was transferred to a Zetaprobe membrane in 10× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and fixed to the membrane by ultraviolet cross-linking.
Filters were prehybridized, hybridized to a radiolabeled probe, and washed according to standard procedures (Maniatis et al., 1982).

RNAase protection experiments were carried out as follows: the 5′ BamHI (polylinker)-BglII(1278) fragment of pAT1148 was subcloned in the BamHI restriction site of Bluescript and linearized with NcoI(1082), and radiolabeled antisense RNA was synthesized with T7 RNA polymerase (Stratagene) as specified by the manufacturer. The riboprobe was 244 bases in length (including 48 bases derived from Bluescript) and the size of the protected fragment was 196 bp. RNAase probe protection was performed with 25 µg of total cellular RNA as described (Melton et al., 1984), except that digestion was at 30°C for 30 min with 50 µg/ml RNAase A (Sigma) and 900 U/ml RNAase T1 (Bethesda Research Laboratories). RNA was precipitated in the presence of 20 µg of glycogen (as a carrier) per sample.

Preparation of Rabbit Anti-SATB1 Antibody

Rabbits were immunized subcutaneously with approximately 500 µg of thrombin-cleaved 80 kd ΔSATB1 protein in a 1:1 mixture with Freund's complete adjuvant. Subsequent immunizations were performed at 4 week intervals. Sera were obtained 10-14 days after each boost. The antiserum that exhibited 160,000 titer with enzyme-linked immunosorbent assay was used in this study.

Gel Electrophoresis and Western Blots

For Western blot analysis, whole-cell extracts from mouse tissues were prepared by Dounce homogenizing tissue samples in 10 mM sodium phosphate (pH 7.5), 0.5 mM dithiothreitol, 10% glycerol, 0.4 M KCl, 100 µg/ml aprotinin, and 34 µg/ml phenylmethylsulfonyl fluoride, followed by centrifugation for 1 hr at 100,000 × g. The cleared supernatants were incubated for 1 hr at room temperature with rabbit anti-SATB1 serum and the antigen-antibody complexes were precipitated with protein A-Sepharose glass beads.
The precipitate was washed three times with phosphate-buffered saline containing 0.1% NP-40, resuspended by boiling in SDS sample buffer, and separated on SDS-polyacrylamide gels (Laemmli, 1970). The separated proteins were electrophoretically transferred to Immobilon P membranes (Millipore) in 10 mM CAPS and 10% methanol (Matsudaira, 1987). Antigenic protein was detected by incubating the blots with SATB1 antiserum in Tris-saline with 10% bovine serum albumin, followed by several washes in Tris-saline, incubation with horseradish peroxidase-labeled goat anti-rabbit immunoglobulin (1:2000 dilution), and subsequent treatment of the blots with 5 mg/ml diaminobenzidine and 1% hydrogen peroxide.
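
As an illustration of the densitometric comparison described under Protein Binding Interference Experiments, the following Python sketch compares band intensities between the free and the protein-bound fractions. In a missing nucleoside or chemical interference experiment, a modification that interferes with binding is depleted from the bound fraction and enriched in the free fraction, so a free/bound ratio well above 1 at a position suggests a contact. The function names, the normalization of each lane to its total signal, and the example intensities are assumptions made for illustration only; the procedure actually applied to the scans is not specified in the text.

```python
from typing import List, Sequence


def normalize_lane(intensities: Sequence[float]) -> List[float]:
    """Scale one lane so its band intensities sum to 1.

    Assumption: dividing by the lane total corrects for unequal loading.
    """
    total = float(sum(intensities))
    if total <= 0:
        raise ValueError("lane contains no signal")
    return [value / total for value in intensities]


def interference_ratios(free: Sequence[float], bound: Sequence[float]) -> List[float]:
    """Return the free/bound ratio of normalized band intensities per position.

    Ratios above 1 mark positions whose modification is enriched in the
    free fraction, i.e. positions where the modification hinders binding.
    """
    if len(free) != len(bound):
        raise ValueError("free and bound lanes must cover the same positions")
    free_norm = normalize_lane(free)
    bound_norm = normalize_lane(bound)
    return [
        f / b if b > 0 else float("inf")
        for f, b in zip(free_norm, bound_norm)
    ]


if __name__ == "__main__":
    # Hypothetical densitometry readings for five adjacent positions.
    free_lane = [1.0, 1.0, 6.0, 1.0, 1.0]
    bound_lane = [2.0, 2.0, 2.0, 2.0, 2.0]
    ratios = interference_ratios(free_lane, bound_lane)
    for position, ratio in enumerate(ratios, start=1):
        print(f"position {position}: free/bound = {ratio:.2f}")
```

With these invented readings, the third position stands out from its neighbors, which is the pattern a contact would be expected to produce. A ratio is used instead of a difference so that positions with weak overall cleavage are not automatically scored as weak contacts.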

Acknowledgments

This work was supported by the National Institutes of Health (R01 CA39681 to T. K.-S. and R01 CA51377 to Y. K.), an American Cancer Society Faculty Award to T. K.-S., and funds provided by the Cigarette and Tobacco Surtax Fund of the State of California through the Tobacco-Related Disease Research Program of the University of California (1KT98 to Y. K.). We thank Drs. Harold Weintraub, James Kadonaga, and Craig Dickinson for kindly reviewing the manuscript and providing helpful comments and Drs. Jeffrey Hayes and Tom Tullius for their valuable advice on the missing nucleoside experiment. We also thank Drs. Jürgen Bode, Robert Oshima, Charles van Beverlin, and Richard Maki for valuable discussion.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC Section 1734 solely to indicate this fact.

Received March 16, 1992; revised July 2, 1992.

References

Adachi, Y., Käs, E., and Laemmli, U. K. (1989). Preferential, cooperative binding of DNA topoisomerase II to scaffold-associated regions. EMBO J. 8, 3997-4006.

Amati, B. B., and Gasser, S. M. (1988). Chromosomal ARS and CEN elements bind specifically to the yeast nuclear scaffold. Cell 54, 967-976.

Anderson, J. N. (1986). Detection, sequence patterns and function of unusual DNA structures. Nucl. Acids Res. 14, 8513-8533.

Barrio, J. R., Secrist, J. A., III, and Leonard, N. J. (1972). Fluorescent adenosine and cytosine derivatives. Biochem. Biophys. Res. Commun. 46, 597-604.

Berezney, R., and Coffey, D. S. (1974). Identification of a nuclear protein matrix. Biochem. Biophys. Res. Commun. 60, 1410-1417.

Bode, J., and Maass, K. (1988). Chromatin domain surrounding the human interferon-β gene as defined by scaffold-attached regions. Biochemistry 27, 4706-4711.

Bode, J., Kohwi, Y., Dickinson, L., Joh, T., Klehr, D., Mielke, C., and Kohwi-Shigematsu, T. (1992). Biological significance of unwinding capability of nuclear matrix-associating DNAs. Science 255, 195-197.

Bouton, A., and Smith, M. M. (1986). Fine-structure analysis of the DNA sequence requirements for autonomous replication of Saccharomyces cerevisiae. Mol. Cell. Biol. 6, 2354-2363.

Breathnach, R., and Chambon, P. (1981). Organization and expression of eukaryotic split gene coding for proteins. Annu. Rev. Biochem. 50, 349-383.

Brotherton, T., Zenk, D., Kahanic, S., and Reneker, J. (1991). Avian nuclear matrix proteins bind very tightly to cellular DNA of the β-globin gene enhancer in a tissue-specific fashion. Biochemistry 30, 5845-5850.

Brown, D. G., Sanderson, M. R., Skelly, J. V., Jenkins, T. C., Brown, T., Garman, E., Stuart, D. I., and Neidle, S. (1990). Crystal structure of a berenil-dodecanucleotide complex: the role of water in sequence-specific ligand binding. EMBO J. 9, 1329-1334.

Brunelle, A., and Schleif, R. F. (1987). Missing contact probing of DNA-protein interactions. Proc. Natl. Acad. Sci. USA 84, 6673-6676.

Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979). Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18, 5294-5299.

Churchill, M. E. A., and Travers, A. A. (1991). Protein motifs that recognize structural features of DNA. Trends Biochem. Sci. 16, 92-97.

Cockerill, P. N., and Garrard, W. T. (1986). Chromosomal loop anchorage of the kappa immunoglobulin gene occurs next to the enhancer in a region containing topoisomerase II sites. Cell 44, 273-282.

Cockerill, P. N., Yuen, M.-H., and Garrard, W. T. (1987). The enhancer of the immunoglobulin heavy chain locus is flanked by presumptive chromosomal loop anchorage elements. J. Biol. Chem. 262, 5394-5397.

Coll, M., Frederick, C. A., Wang, A. H.-J., and Rich, A. (1987). A bifurcated hydrogen-bonded conformation in the d(A.T) base pairs of the DNA dodecamer d(CGCAAATTTGCG) and its complex with distamycin. Proc. Natl. Acad. Sci. USA 84, 8385-8389.

Dijkwel, P. A., and Hamlin, J. C. (1988). Matrix attachment regions are positioned near replication initiation sites, genes, and an interamplicon junction in the amplified dihydrofolate reductase domain of Chinese hamster ovary cells. Mol. Cell. Biol. 8, 5398-5409.

Ellis, L., Clauser, E., Morgan, D. O., Edery, M., Roth, R. A., and Rutter, W. J. (1986). Replacement of insulin receptor tyrosine residues 1162 and 1163 compromises insulin-stimulated kinase activity and uptake of 2-deoxyglucose. Cell 45, 721-732.

Fedoroff, N. V., and Brown, D. D. (1978). The nucleotide sequence of oocyte 5S DNA in Xenopus laevis. I. The AT-rich spacer. Cell 13, 701-716.

Fitzgerald-Hayes, M., Clarke, L., and Carbon, J. (1982). Nucleotide sequence comparisons and functional analysis of yeast centromere DNAs. Cell 29, 235-244.

Forrester, W. C., Takegawa, S., Papayannopoulou, T., Stamatoyannopoulos, G., and Groudine, M. (1987). Evidence for a locus activating region: the formation of developmentally stable hypersensitive sites in globin expressing hybrids. Nucl. Acids Res. 15, 10159-10177.

Gasser, S. M., and Laemmli, U. K. (1986a). Cohabitation of scaffold binding regions with upstream/enhancer elements of three developmentally regulated genes of D. melanogaster. Cell 46, 521-530.

Gasser, S. M., and Laemmli, U. K. (1986b). The organization of chromatin loops: characterization of a scaffold attachment site. EMBO J. 5, 511-518.

Gasser, S. M., and Laemmli, U. K. (1987). A glimpse at chromosomal order. Trends Genet. 3, 16-22.

Gearing, D. P., Nicola, N. A., Metcalf, D., Foote, S., Willson, T. A., Gough, N. M., and Williams, R. L. (1989). Production of leukemia factor in Escherichia coli by a novel procedure and its use in maintaining embryonic stem cells in culture. Biotechniques 7, 1157-1161.

Gillies, S. D., Morrison, S. L., Oi, V. T., and Tonegawa, S. (1983). A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene. Cell 33, 717-728.

Goodrich, J. A., Schwartz, M. L., and McClure, W. R. (1990). Searching for and predicting the activity of sites for DNA binding proteins: compilation and analysis of the binding sites for Escherichia coli integration host factor (IHF). Nucl. Acids Res. 18, 4993-5000.

Grosveld, F., van Assendelft, G. B., Greaves, D. R., and Kollias, G. (1987). Position-independent, high-level expression of the human β-globin gene in transgenic mice. Cell 51, 975-985.

Hayes, J. J., and Tullius, T. D. (1989). The missing nucleoside experiment: a new technique to study recognition of DNA by protein. Biochemistry 28, 9521-9527.

Hofmann, J. F.-X., and Gasser, S. M. (1991). Identification and purification of a protein that binds the yeast ARS consensus sequence. Cell 64, 951-960.

Hofmann, J. F.-X., Laroche, T., Brand, A. H., and Gasser, S. M. (1989a). RAP-1 loop formation is necessary for DNA loop formation in vitro at the silent mating type locus HML. Cell 57, 725-737.

Hofmann, M.-C., Jeltsch, W., Brecher, J., and Walt, H. (1989b). Alkaline phosphatase isozymes in human testicular germ cell tumors, their precancerous stage, and three related cell lines. Cancer Res. 49, 4696-4700.

Homberger, H. P. (1989). Bent DNA is a structural feature of scaffold-attached regions in Drosophila melanogaster interphase nuclei. Chromosoma 98, 99-104.

Jarman, A. P., and Higgs, D. R. (1988). Nuclear scaffold attachment sites in the human globin gene complexes. EMBO J. 7, 3337-3344.

Kellum, R., and Schedl, P. (1991). A position-effect assay for boundaries of higher order chromosomal domains. Cell 64, 941-950.

Kellum, R., and Schedl, P. (1992). A group of scs elements function as domain boundaries in an enhancer-blocking assay. Mol. Cell. Biol. 12, 2424-2431.

Klehr, D., Maass, K., and Bode, J. (1991). Scaffold-attached regions from the human interferon β domain can be used to enhance the stable expression of genes under the control of various promoters. Biochemistry 30, 1264-1270.

Kochetkov, N. K., Shibaev, V. N., and Kost, A. A. (1971). New reaction of adenine and cytosine derivatives, potentially useful for nucleic acid modification. Tetrahedron Lett. 22, 1993-1996.

Kohwi, Y., and Kohwi-Shigematsu, T. (1988). Magnesium ion-dependent triple-helix structure formed by homopurine-homopyrimidine sequences in supercoiled plasmid DNA. Proc. Natl. Acad. Sci. USA 85, 3781-3785.

Kohwi-Shigematsu, T., and Kohwi, Y. (1985). Poly(dG)-poly(dC) sequences, under torsional stress, induce an altered DNA conformation upon neighboring DNA sequences. Cell 43, 199-206.

Kohwi-Shigematsu, T., and Kohwi, Y. (1990). Torsional stress stabilizes extended base unpairing in suppressor sites flanking immunoglobulin heavy chain enhancer. Biochemistry 29, 9551-9560.

Kohwi-Shigematsu, T., Gelinas, R., and Weintraub, H. (1983). Detection of an altered DNA conformation at specific sites in chromatin and supercoiled DNA. Proc. Natl. Acad. Sci. USA 80, 4389-4393.

Koo, H.-S., and Crothers, D. M. (1987). Chemical determinants of DNA bending at adenine-thymine tracts. Biochemistry 26, 3745-3748.

Kopka, M. L., Yoon, C., Goodsell, D., Pjura, P., and Dickerson, R. E. (1985). The molecular origin of DNA-drug specificity in netropsin and distamycin. Proc. Natl. Acad. Sci. USA 82, 1376-1380.

Kozak, M. (1984). Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucl. Acids Res. 12, 857-872.

Kumamoto, A. A., Miller, W. G., and Gunsalus, R. P. (1987). Escherichia coli tryptophan repressor binds multiple sites within the aroH and trp operators. Genes Dev. 1, 556-564.

Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage. Nature 227, 680-685.

Lee, D. K., Horikoshi, M., and Roeder, R. G. (1991). Interaction of TFIID in the minor groove of the TATA element. Cell 67, 1241-1250.

Levy-Wilson, B., and Fortier, C. (1989). The limits of the DNAase I-sensitive domain of the human apolipoprotein B gene coincide with the locations of the chromosomal anchorage loops and define the 5′ and 3′ boundaries of the gene. J. Biol. Chem. 264, 21196-21204.

Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982). Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory).

Matsudaira, P. (1987). Sequence from picomole quantities of protein electroblotted onto polyvinylidene difluoride membranes. J. Biol. Chem. 262, 10035-10038.

Maxam, A. M., and Gilbert, W. (1977). A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74, 560-564.

Melton, D. A., Krieg, P. A., Rebagliati, M. R., Maniatis, T., Zinn, K., and Green, M. R. (1984). Efficient in vitro synthesis of biologically active RNA and RNA hybridization probes from plasmids containing a bacteriophage SP6 promoter. Nucl. Acids Res. 12, 7035-7056.

Mielke, C., Kohwi, Y., Kohwi-Shigematsu, T., and Bode, J. (1990). Hierarchical binding of DNA fragments derived from scaffold-attached regions: correlation of properties in vitro and function in vivo. Biochemistry 29, 7475-7485.

Mirkovitch, J., Mirault, M.-E., and Laemmli, U. K. (1984). Organization of the higher-order chromatin loop: specific DNA attachment sites on nuclear scaffold. Cell 39, 223-232.

Muller, M. T., Spitzner, J. R., DiDonato, J. A., Mehta, V. B., and Tsutsui, K. (1988). Single-strand DNA cleavages by eukaryotic topoisomerase II. Biochemistry 27, 8369-8379.

Otwinowski, Z., Schevitz, R. W., Zhang, R.-G., Lawson, C. L., Joachimiak, A., Marmorstein, R. Q., Luisi, B. F., and Sigler, P. B. (1988). Crystal structure of trp repressor/operator complex at atomic resolution. Nature 335, 321-329.

Phi-Van, L., and Strätling, W. H. (1988). The matrix attachment regions of the chicken lysozyme gene co-map with the boundaries of the chromatin domain. EMBO J. 7, 655-664.

Phi-Van, L., and Strätling, W. H. (1990). Association of DNA with nuclear matrix. Prog. Mol. Subcell. Biol. 11, 1-11.

Phi-Van, L., von Kries, J. P., Ostertag, W., and Strätling, W. H. (1990). The chicken lysozyme 5′ matrix attachment region increases transcription from a heterologous promoter in heterologous cells and dampens position effects on the expression of transfected genes. Mol. Cell. Biol. 10, 2302-2307.

Sander, M., and Hsieh, T.-S. (1985). Drosophila topoisomerase II double-strand DNA cleavage: analysis of DNA sequence homology at the cleavage site. Nucl. Acids Res. 13, 1057-1072.

Singh, H., LeBowitz, J. H., Baldwin, A. S., Jr., and Sharp, P. A. (1988). Molecular cloning of an enhancer binding protein: isolation by screening of an expression library with a recognition site DNA. Cell 52, 415-423.

Smith, D. B., and Johnson, K. S. (1988). Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene 67, 31-40.

Solomon, M. J., Strauss, F., and Varshavsky, A. (1986). A mammalian high mobility group protein recognizes any stretch of six A.T base pairs in duplex DNA. Proc. Natl. Acad. Sci. USA 83, 1276-1280.

Spitzner, J. R., and Muller, M. T. (1988). A consensus sequence for cleavage by vertebrate DNA topoisomerase II. Nucl. Acids Res. 16, 5533-5556.

Starr, D. B., and Hawley, D. K. (1991). TFIID binds in the minor groove of the TATA box. Cell 67, 1231-1240.

Stief, A., Winter, D. M., Strätling, W. H., and Sippel, A. E. (1989). A nuclear DNA attachment element mediates elevated and position-independent gene activity. Nature 341, 343-345.

Stockhaus, J., Eckes, P., Blau, A., Schell, J., and Willmitzer, L. (1987). Organ-specific and dosage-dependent expression of a leaf/stem specific gene from potato after tagging and transfer into potato and tobacco plants. Nucl. Acids Res. 15, 3479-3491.

Struhl, K. (1989). Helix-turn-helix, zinc-finger, and leucine-zipper motifs for eukaryotic transcriptional regulatory proteins. Trends Biochem. Sci. 14, 137-140.

Tuan, D., Solomon, W., Li, Q., and London, I. M. (1985). The "β-like globin" gene domain in human erythroid cells. Proc. Natl. Acad. Sci. USA 82, 6384-6388.

Umek, R. M., and Kowalski, D. (1988). The ease of DNA unwinding as a determinant of initiation at yeast replication origins. Cell 52, 559-567.

Vinson, C. R., La Marco, K. L., Johnson, P. F., Landschulz, W. H., and McKnight, S. L. (1988). In situ detection of sequence-specific DNA binding activity specified by a recombinant bacteriophage. Genes Dev. 2, 801-806.

von Kries, J. P., Phi-Van, L., Diekmann, S., and Strätling, W. H. (1990). A non-curved chicken lysozyme 5′ matrix attachment site is 3′ followed by a strongly curved DNA sequence. Nucl. Acids Res. 18, 3881-3885.

von Kries, J. P., Buhrmester, H., and Strätling, W. H. (1991). A matrix/scaffold attachment region binding protein: identification, purification, and mode of binding. Cell 64, 123-135.

Weeks, K. M., and Crothers, D. M. (1991). RNA recognition by Tat-derived peptides: interaction in the major groove? Cell 66, 577-588.

Yang, C.-C., and Nash, H. A. (1989). The interaction of E. coli IHF protein with its specific binding sites. Cell 57, 869-880.

GenBank Accession Number

The accession number for the sequence reported in this paper is M97287.
