Structural Insight into the Binding of TGIF1 to SIN3A PAH2

1. Introduction

TGIF1 (TG-interacting factor 1 or TGFβ-induced factor homeobox 1) is a transcription factor (TF) belonging to the three-amino acid loop extension (TALE) subfamily of the homeodomain (HD) protein family [1,2,3]. By regulating the transcription of numerous genes, TGIF1 plays crucial roles in various aspects of human development and function, such as the development of the fetal face and brain, hematopoiesis, bone formation and energy metabolism [4,5,6,7,8]. Gene mutations that impair TGIF1 function are associated with holoprosencephaly (HPE), a genetic disorder involving fetal craniofacial malformation [9,10,11,12,13]. Over the last decade, increasing evidence has established a close correlation between TGIF1 expression and the progression of various cancers [14,15,16,17,18,19].

TGIF1 functions mainly as a DNA-binding transcription repressor that interacts with general corepressors, including CtBP1/2, SIN3A and histone deacetylases (HDACs) [20,21,22]. TGIF1 was initially identified as a protein of 272 amino acid residues (isoform a, the highly abundant form), while a subsequent study identified a variety of TGIF1 isoforms resulting from alternative mRNA splicing, the longest of which contains 401 residues (isoform c) [23]. Three major functional domains, namely the HD and repressive domains 1 and 2 (RD1/2), have been identified in TGIF1 (Figure 1A) [20,21,22,24]. The TGIF1 HD binds to gene promoters containing the consensus sequence 5′-TGTCA-3′, and thousands of genes have been identified as direct targets of TGIF1 [6,25,26,27]. Following DNA binding via the HD, TGIF1 recruits the corepressor CtBP1/2 through RD1, which contains a conserved “PLDLS” motif [20], and/or SIN3A and HDACs through RD2 [21,22], leading to a change in chromatin state and thereby to transcription repression. Moreover, some functions of TGIF1 that are independent of DNA binding have also been reported in the TGFβ, nuclear receptor and Wnt signaling pathways [3,17,28,29,30].

RD2, which is located C-terminal to the HD, contains more than 150 residues, of which residues M256–D375 (numbered according to isoform c hereafter) are essentially disordered, as revealed by previous NMR studies [31,32]. Functionally, RD2 can be divided into at least two parts, previously denoted RD2a and RD2b [24]. RD2a was suggested to mediate the interactions of TGIF1 with various proteins, including Smad2 [3], PHRF1 [33], Itch [34] and Axin1/2 [17], in cells to regulate TGFβ and Wnt signaling. RD2b harbors a phosphodegron targeted by Fbxw7 for ubiquitin-mediated degradation of TGIF1 [35]. However, the binding site of the transcription corepressor SIN3A in TGIF1 RD2 remains obscure, as two independent studies reported SIN3A binding to RD2a and to RD2b, respectively [21,22]. Thus, the mechanism by which TGIF1 recruits SIN3A is still unknown.

SIN3A is a general transcription co-regulatory factor that commonly acts as a scaffold protein, assembling DNA-binding TFs, HDACs and other chromatin-modifying enzymes into a large complex that facilitates transcription repression by altering chromatin compaction [36,37]. In recent years, emerging roles of SIN3A in cancer development have been revealed [38]. SIN3A contains five defined domains: three paired amphipathic helix (PAH) domains, an HDAC-interaction domain (HID) and a C-terminal domain. Among these, PAH2 is a major docking site for diverse TFs and mediates SIN3A-targeted regulation of specific gene transcription [39,40,41,42]. Peptide and small-molecule inhibitors of SIN3A PAH2 have been designed to block its binding to TFs and to inhibit triple-negative breast cancer (TNBC) cell metastasis [43,44,45]. The structures of SIN3A PAH2 in complex with the SIN3A-interaction domains (SIDs) of Mad1, HBP1 and Pf1 have each been solved [40,46,47]. In these structures, SIN3A PAH2 adopts a similar four-helix bundle fold with a hydrophobic cleft, into which the single-helix SID of Mad1, HBP1 or Pf1 inserts. However, because the sequence conservation among the known SIDs is low, it is hard to predict the location of the SID in other TFs, such as TGIF1, through sequence analysis.

In this study, we applied NMR spectroscopy, structure simulation and biochemical methods to investigate the structural basis of TGIF1 binding to SIN3A PAH2. The results reveal that the TGIF1 RD2 truncation covering residues M256–A401 was structurally disordered and existed as a monomer in solution. Within this truncation, the region comprising residues F376–E394 served as the SID that binds SIN3A. Structure simulation indicated that the TGIF1 SID forms an amphipathic helix that binds in the hydrophobic cleft of SIN3A PAH2. Furthermore, F379, L382 and V383 were identified as the key residues of the TGIF1 SID for SIN3A-PAH2 binding through site-directed mutagenesis combined with the yeast two-hybrid (Y2H) assay. Interestingly, although TGIF1 RD2 was not observed to dimerize in solution, the Y2H assay indicated that it can dimerize via the SID in cells, implying a potential dual role of the TGIF1 SID and a relationship between dimerization and SIN3A binding of TGIF1.
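The observation that F379, L382 and V383 can contribute to a single nonpolar face is consistent with ideal α-helix geometry, in which successive residues are rotated by about 100° around the helix axis. The following Python sketch is purely illustrative and is not part of the modeling pipeline used in this study; it assumes an ideal helix, and the function name is hypothetical.

```python
# Illustrative sketch: angular positions of residues on an ideal alpha-helix.
# Assumption: ideal geometry with 3.6 residues per turn (100 degrees per residue).

DEGREES_PER_RESIDUE = 100.0


def helical_wheel_angle(residue_number, reference_number):
    """Angle in degrees, wrapped to [-180, 180), relative to a reference residue."""
    raw = (residue_number - reference_number) * DEGREES_PER_RESIDUE
    return (raw + 180.0) % 360.0 - 180.0


# Key residues named in the text; F379 is used as the reference position.
key_residues = {"F379": 379, "L382": 382, "V383": 383}
angles = {name: helical_wheel_angle(num, 379) for name, num in key_residues.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:+.0f} degrees")

spread = max(angles.values()) - min(angles.values())
print(f"Angular spread: {spread:.0f} degrees")
```

Under this idealization, the three residues fall within a 100° arc of the helical wheel, which is what one would expect for side chains that point toward the same side of a helix.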

4. Materials and Methods

4.1. Production of Recombinant Proteins

The coding DNA fragments of TGIF1(256–375), TGIF1(256–401) and SIN3A PAH2 (residues S295–N384) were cloned from human HeLa cells and inserted into a modified pET32 vector, which allows recombinant expression of each protein fused to a purification tag of “MHHHHHHSSGLVPRGS”. After DNA sequencing, the plasmids containing the coding sequences were each transformed into E. coli Rosetta (DE3) cells, and protein expression was induced using a method similar to that described previously [32].

The purification of TGIF1(256–375) was carried out as previously described [32]. For TGIF1(256–401) purification, E. coli cells were collected by centrifugation and resuspended in solution A (20 mM Tris-HCl, 500 mM NaCl, 1 mM phenylmethylsulfonyl fluoride (PMSF), pH 7.8) for lysis by sonication. After high-speed centrifugation, the supernatant was discarded, and the pellet was washed twice with solution B (20 mM Tris-HCl, 500 mM NaCl, 2 mM EDTA, pH 7.8) and then twice with Milli-Q water. Subsequently, the pellet was dissolved in solution C (20 mM Tris-HCl, 500 mM NaCl, 6 M guanidine hydrochloride, pH 7.8) and clarified by high-speed centrifugation and filtration. The clarified solution was loaded onto an ÄKTAxpress™ chromatography system (GE Healthcare, Boston, MA, USA) equipped with a Ni-affinity column (HisTrap IMAC HP™ column, 5 mL). The TGIF1(256–401) protein was eluted with solution D (20 mM Tris-HCl, 500 mM NaCl, 6 M guanidine hydrochloride, 250 mM imidazole, pH 7.8) and then dialyzed several times against solution E (20 mM Tris-HCl, 500 mM NaCl, pH 7.8). The purification tag was removed by incubation with thrombin at 20 °C for 3 h. The resulting TGIF1(256–401), carrying only two additional residues (“GS”) at its N-terminus, was further purified by gel filtration chromatography using an NGC chromatography system (Bio-Rad, Hercules, CA, USA) equipped with a HiLoad 26/60 Superdex 75 column (GE Healthcare). The purified protein was concentrated to a final concentration of 0.4 mM for NMR study in solution F (90% H2O/10% D2O (v/v), 20 mM HEPES, 80 mM NaCl, 2 mM dithiothreitol (DTT), 0.05% NaN3, pH 6.4).
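The tag-removal step described above leaves two extra residues because thrombin cleaves its LVPR|GS recognition sequence between the arginine and the glycine. The short Python sketch below illustrates this on the tag sequence given in the text; the function name and the placeholder target sequence are hypothetical and are used only for illustration.

```python
# Illustrative sketch: residues left on the target protein after thrombin
# removes the purification tag. Thrombin cleaves LVPR|GS between R and G.

PURIFICATION_TAG = "MHHHHHHSSGLVPRGS"  # tag sequence given in the text
THROMBIN_SITE = "LVPRGS"


def cleave_with_thrombin(fusion_sequence):
    """Split a fusion protein at the first thrombin site.

    Returns a tuple (released_tag, product).
    """
    site_start = fusion_sequence.find(THROMBIN_SITE)
    if site_start == -1:
        raise ValueError("no thrombin recognition site found")
    cut = site_start + len("LVPR")  # cleavage occurs after the arginine
    return fusion_sequence[:cut], fusion_sequence[cut:]


# "XXXXX" is a hypothetical placeholder for the target protein sequence.
target = "XXXXX"
released_tag, product = cleave_with_thrombin(PURIFICATION_TAG + target)
print(released_tag)
print(product)
```

The product retains “GS” ahead of the target sequence, matching the two additional N-terminal residues noted for the purified protein.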

For purification of SIN3A PAH2, the harvested E. coli cells were resuspended in solution G (20 mM Tris-HCl, 100 mM NaCl, 1 mM PMSF, pH 8.0) for lysis by sonication. The supernatant was clarified by centrifugation and filtration and then loaded onto the ÄKTAxpress™ system equipped with a Ni-affinity column (HisTrap IMAC HP™ column, 5 mL). The protein was eluted with solution H (20 mM Tris-HCl, 100 mM NaCl, 250 mM imidazole, pH 8.0) and dialyzed against solution I (20 mM Tris-HCl, 100 mM NaCl, pH 8.0). Subsequently, the purification tag was removed, and the resulting SIN3A PAH2 was further purified by gel filtration chromatography as described above. The purified protein was exchanged into solution F and concentrated just before NMR titration.

The purities of the obtained TGIF1(256–375), TGIF1(256–401) and SIN3A-PAH2 proteins were assessed to be over 90% by SDS-PAGE, and their molecular weights (MWs) were verified by electrospray ionization mass spectrometry (ESI-MS).

4.2. Circular Dichroism

CD spectra of TGIF1(256–375) and TGIF1(256–401) from 190 to 260 nm were recorded on a Chirascan™ CD spectrometer (Applied Photophysics, Leatherhead, Surrey, UK) using a 0.2 cm path length quartz cell, with a step size of 1 nm and a bandwidth of 1 nm at 25 °C. Measurements were conducted with 10 μM protein in 10 mM KH2PO4, pH 6.5. Each sample was scanned three times; the scans were averaged, and the spectrum of the buffer solution (recorded as the baseline) was subtracted to generate the final spectrum.
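The scan averaging and baseline subtraction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the instrument software's procedure; the function names are hypothetical and all numeric values are placeholders, not measured data.

```python
# Illustrative sketch: average replicate CD scans and subtract a buffer baseline.
# All numeric values below are hypothetical placeholders, not measured data.


def average_scans(scans):
    """Point-by-point mean of scans sampled at the same wavelengths."""
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must have the same number of points")
    return [sum(values) / len(scans) for values in zip(*scans)]


def subtract_baseline(sample, baseline):
    """Subtract the buffer spectrum from the sample spectrum, point by point."""
    if len(sample) != len(baseline):
        raise ValueError("sample and baseline must have the same length")
    return [s - b for s, b in zip(sample, baseline)]


wavelengths_nm = [190, 191, 192]  # 1 nm step, as in the text
protein_scans = [
    [-10.0, -12.0, -11.0],
    [-11.0, -13.0, -12.0],
    [-12.0, -14.0, -13.0],
]
buffer_scan = [-1.0, -1.0, -1.0]

final_spectrum = subtract_baseline(average_scans(protein_scans), buffer_scan)
for wavelength, value in zip(wavelengths_nm, final_spectrum):
    print(wavelength, value)
```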

4.3. Multi-Angle Light Scattering

Multi-angle static light scattering (MALS) analysis of TGIF1(256–401) was carried out on a DAWN HELEOS II MALS detector (Wyatt Technology Corp., Santa Barbara, CA, USA) coupled with a Superdex™ 75 10/300 GL column (GE Healthcare) at a flow rate of 0.5 mL/min at room temperature in a solution of 20 mM HEPES, 80 mM NaCl, 2 mM DTT, 0.05% NaN3, pH 6.4. The concentration of TGIF1(256–401) was 0.2 mM. The data were analyzed using the ASTRA 7.1 software package (Wyatt Technology Corp.). The weight-average molar mass was calculated using the theoretical UV extinction coefficient (280 nm) of TGIF1(256–401) and a protein dn/dc value of 0.185 mL/g.
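For readers unfamiliar with the term, the weight-average molar mass over an elution peak is the concentration-weighted mean of the molar masses measured in the individual elution slices, M_w = Σ(c_i·M_i) / Σ(c_i). The Python sketch below shows only this definition; it is not the ASTRA implementation, the function name is hypothetical, and the slice values are placeholders rather than data from this study.

```python
# Illustrative sketch: weight-average molar mass across elution slices,
# M_w = sum(c_i * M_i) / sum(c_i). Values are hypothetical placeholders.


def weight_average_molar_mass(concentrations, molar_masses):
    """Concentration-weighted mean molar mass over elution slices."""
    if len(concentrations) != len(molar_masses):
        raise ValueError("inputs must have the same length")
    total_concentration = sum(concentrations)
    if total_concentration <= 0:
        raise ValueError("total concentration must be positive")
    weighted_sum = sum(c * m for c, m in zip(concentrations, molar_masses))
    return weighted_sum / total_concentration


slice_concentrations = [1.0, 2.0, 1.0]           # arbitrary units, hypothetical
slice_molar_masses = [15000.0, 16000.0, 17000.0]  # g/mol, hypothetical

mw = weight_average_molar_mass(slice_concentrations, slice_molar_masses)
print(f"Weight-average molar mass: {mw:.0f} g/mol")
```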

4.4. NMR Experiments

NMR spectra were collected on a Bruker Avance III 850 MHz spectrometer equipped with a cryogenic probe at 293 K (20 °C). The TGIF1(256–401) concentration was 0.4 mM. The NMR data were processed using NMRPipe [53] and analyzed using Sparky [54]. NMR titration experiments were performed by mixing 0.1 mM 15N-labeled TGIF1(256–375) or TGIF1(256–401) with non-labeled SIN3A PAH2 at the indicated molar ratios in solution F. After gentle shaking for 1 h to allow the binding to reach equilibrium, 1H–15N HSQC spectra were collected.
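When preparing titration points such as those described above, the amount of unlabeled ligand needed for a given molar ratio follows directly from the amount of labeled protein in the sample. The Python sketch below illustrates the arithmetic; the function name, the sample volume and the ligand stock concentration are assumptions made for illustration and are not taken from this study.

```python
# Illustrative sketch: volume of ligand stock needed for a target
# ligand:protein molar ratio. Sample volume and stock concentration
# are hypothetical; only the 0.1 mM protein concentration is from the text.


def ligand_volume_ul(protein_conc_mm, sample_volume_ul, molar_ratio, ligand_stock_mm):
    """Volume of ligand stock (microliters) giving the requested molar ratio."""
    if ligand_stock_mm <= 0:
        raise ValueError("ligand stock concentration must be positive")
    protein_nmol = protein_conc_mm * sample_volume_ul  # mM x uL = nmol
    ligand_nmol = molar_ratio * protein_nmol
    return ligand_nmol / ligand_stock_mm               # nmol / mM = uL


PROTEIN_CONC_MM = 0.1     # 15N-labeled TGIF1 construct, from the text
SAMPLE_VOLUME_UL = 500.0  # hypothetical sample volume
LIGAND_STOCK_MM = 2.0     # hypothetical SIN3A PAH2 stock concentration

for ratio in (0.5, 1.0, 2.0):
    volume = ligand_volume_ul(PROTEIN_CONC_MM, SAMPLE_VOLUME_UL, ratio, LIGAND_STOCK_MM)
    print(f"ratio {ratio}: add {volume:.1f} uL of ligand stock")
```

Adding ligand stock dilutes the sample, so absolute concentrations fall slightly at each point, but the ligand:protein molar ratio computed here is unaffected by that dilution.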

4.5. Yeast Two-Hybrid

Yeast two-hybrid assays were performed using the Matchmaker Yeast Transformation System (Clontech, Palo Alto, CA, USA). The coding DNA fragments of TGIF1(256–375), TGIF1(256–401) and SIN3A-PAH2 were inserted into the pGADT7 and pGBKT7 vectors. Yeast AH109 cells were co-transformed with different pairs of pGADT7 and pGBKT7 constructs, as indicated, following the manufacturer's manual. All yeast transformants were grown on SD2 (–Trp/–Leu) medium to confirm successful transformation and on SD4 (–Trp/–Leu/–His/–Ade) medium to test prey–bait interactions.
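The readout logic of the two selection media can be summarized as a small decision rule: growth on SD2 confirms that both plasmids are present, and growth on SD4 additionally indicates a prey–bait interaction. The Python sketch below encodes this rule; the function name and the example observations are hypothetical and do not represent results from this study.

```python
# Illustrative sketch: interpreting growth on the two selection media.
# SD2 (-Trp/-Leu) reports co-transformation; SD4 (-Trp/-Leu/-His/-Ade)
# reports a prey-bait interaction. Example observations are hypothetical.


def interpret_y2h(grows_on_sd2, grows_on_sd4):
    """Classify a transformant from its growth on SD2 and SD4 media."""
    if not grows_on_sd2:
        return "co-transformation failed"
    if grows_on_sd4:
        return "interaction detected"
    return "no interaction detected"


example_observations = {
    "pair A": (True, True),
    "pair B": (True, False),
    "pair C": (False, False),
}

for pair, (sd2, sd4) in example_observations.items():
    print(f"{pair}: {interpret_y2h(sd2, sd4)}")
```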

4.6. Molecular Modeling

A structural model of the TGIF1 SID was built by de novo modeling using I-TASSER [55] (accessed on 19 November 2021). The sequence of the TGIF1 SID (F376–E394) was submitted to I-TASSER with the recommended settings for structure modeling. The first model of the TGIF1 SID generated by I-TASSER and the SIN3A-PAH2 structure from the PDB (ID: 2L9S.B) were used to build a structural model of the complex of the two proteins through molecular simulation, which included multiple steps of docking and optimization. First, initial structures of the complex were calculated using ZDOCK [56]. The model with the highest score was selected from the calculated models for further optimization using the molecular docking tool ClusPro [57], which, in multiple steps, optimized the binding of the receptor and the ligand by exhaustively sampling the free energy landscape. Ten structural models representing the most-populated clusters were generated, of which the model with the highest score was refined to give the final structural model of the complex. PyMOL 2.5 and its related programs were used to analyze the structure and produce the images.

4.7. Sequence Alignment

Sequence alignment was carried out using Clustal Omega (accessed on 19 November 2021) and then rendered using ESPript [58] with default settings for similarity calculations. The NCBI accession numbers of the aligned TGIF1 sequences from different vertebrates are as follows: Homo sapiens, NP_003235.1; Mus musculus, NP_001157547.1; Columba livia, XP_021139826.1; Gavialis gangeticus, XP_019381338.1; Xenopus laevis, NP_001080420.1; Danio rerio, NP_955861.1. The alignment result of human TGIF1 isoform c is shown because the additional residues in isoform a showed very low sequence similarity with the analyzed TGIF1 homologs from other vertebrates.

5. Conclusions

In conclusion, we demonstrated that TGIF1 utilizes a C-terminal motif (termed the SID) ranging from F376 to E394 to bind SIN3A PAH2. The TGIF1 SID is disordered in the apo state, whereas it forms an amphipathic helix upon binding to SIN3A PAH2. In the complex, SIN3A PAH2 adopts a four-helix bundle structure with a deep hydrophobic cleft, into which the TGIF1 SID binds through the nonpolar side of the amphipathic helix. Residues F379, L382 and V383 of the TGIF1 SID, which are conserved among SIN3A-PAH2 binders, are buried in the hydrophobic core of the complex and are critical for the binding. Although recombinant TGIF1(256–401) exists as a monomer in solution, homodimerization of TGIF1 through the SID was detected in a Y2H assay, which suggests a dual role of the TGIF1 SID and a correlation between homodimerization and SIN3A-PAH2 binding of TGIF1. This study provides insight into the mechanism by which TGIF1 binds SIN3A, improves the understanding of the structure–function relationship of TGIF1 and extends knowledge of the sequence and structure characteristics of SIN3A-PAH2 binders. The results can be applied to interpret the function of TGIF1 homologs not only from humans but also from other vertebrates, to recognize potential SIN3A-PAH2 binders and to design peptide inhibitors that block SIN3A–TF interactions for cancer treatment.