

Data Analysis Part 1 – Mutation
Human cystic fibrosis transmembrane conductance regulator (CFTR) mRNA 
  1. The cDNA sequence representing the complete mRNA of the normal or wild-type human CFTR gene is given in Appendix I.  Use this to determine the amino acid sequence of the protein.

There are several DNA translation tools that you can use for this:

https://web.expasy.org/translate/

https://www.ebi.ac.uk/Tools/st/emboss_transeq/

     Which reading frame gives you the correct protein sequence (remember you only need to translate the top strand)?
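The reading-frame question can also be explored offline. The following is a minimal Python sketch, not a replacement for the tools above. It assumes the standard genetic code and translates only the top strand; the names `translate` and `three_frames` are illustrative.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate a DNA/cDNA string starting at a 0-based frame offset (0, 1 or 2).

    Stop codons are shown as '*'; codons with unexpected letters become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq):
    """Return {1: ..., 2: ..., 3: ...} with the translation of each forward frame."""
    return {frame + 1: translate(seq, frame) for frame in range(3)}
```

The correct frame is usually the one containing a long stretch beginning with M and uninterrupted by `*`.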

  • Identify the protein coding sequence (that is, the region which is translated into protein), and the 3’ and 5’ untranslated regions and complete Table 1. How many amino acids are encoded by this region?

Table 1. CFTR mRNA features

Feature                  Nucleotide positions
3’ UTR
Coding sequence (CDS)
5’ UTR
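One way to cross-check the entries for Table 1 is to search the top strand for its longest open reading frame. The sketch below makes two assumptions: the CDS is the longest ATG-to-stop stretch, and the stop codon is counted as part of the CDS (as GenBank annotations do). The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, n_amino_acids) for the longest ATG...stop ORF.

    Positions are 1-based and inclusive; `end` is the last base of the stop
    codon. Returns None if no complete ORF exists on this strand.
    """
    seq = seq.upper()
    best = None
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from this ATG until the first in-frame stop.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                n_aa = (i - start) // 3  # stop codon is not an amino acid
                if best is None or n_aa > best[2]:
                    best = (start + 1, i + 3, n_aa)
                break
    return best


def utr_ranges(seq_len, cds_start, cds_end):
    """Return the (5' UTR, 3' UTR) ranges flanking a CDS, or None if absent."""
    five_prime = (1, cds_start - 1) if cds_start > 1 else None
    three_prime = (cds_end + 1, seq_len) if cds_end < seq_len else None
    return five_prime, three_prime
```

For example, `longest_orf("CCATGAAATTTTAGGG")` reports an ORF spanning positions 3 to 14 that encodes 3 amino acids.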
  • Over 1700 mutations have been identified in the CFTR gene that are associated with cystic fibrosis. Four of the most common ones are listed in Table 2.  

For each of the mutations, determine (i) what kind of mutation it is (for example missense, nonsense or deletion) and (ii) what effect the mutation has on the protein sequence.

To do this you will need to identify and change the relevant nucleotide(s) in the wild-type cDNA and then use a translation tool to determine the effect on the protein. You can then use programmes designed to align multiple sequences to compare the mutant proteins with the wild-type. The most commonly used is Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).

Table 2. Common CFTR mutations

Mutation      Nucleotide position(s)*   Mutation type    Amino acid change   Protein domain affected
M1: G → A     420                       Missense         R117H               TMD 1
M2: del CTT   1591 – 1593               3 bp deletion    Deletion of F508    ATP binding domain 1
M3: G → T     1694                      Nonsense         G542X               ATP binding domain 1 (premature termination)
M4: G → A     1722                      Missense         G551D               ATP binding domain 1

* Refers to the nucleotide position in the cDNA
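Editing the cDNA by hand is error-prone, so it can help to script the change and have the script confirm that the expected base is actually present at the stated position. This is a small sketch with hypothetical helper names; positions are 1-based, matching the cDNA numbering used in Table 2.

```python
def substitute(seq, pos, ref, alt):
    """Replace `ref` with `alt` at 1-based position `pos`, checking `ref` first."""
    i = pos - 1
    found = seq[i:i + len(ref)]
    if found.upper() != ref.upper():
        raise ValueError(f"expected {ref!r} at position {pos}, found {found!r}")
    return seq[:i] + alt.lower() + seq[i + len(ref):]


def delete_range(seq, start, end, expected=None):
    """Delete bases start..end (1-based, inclusive), optionally checking them."""
    removed = seq[start - 1:end]
    if expected is not None and removed.upper() != expected.upper():
        raise ValueError(f"expected {expected!r} at {start}-{end}, found {removed!r}")
    return seq[:start - 1] + seq[end:]
```

On a toy sequence, `substitute("atgcgcaaa", 5, "G", "A")` gives `"atgcacaaa"`, and `delete_range("atcatctttggt", 6, 8, "CTT")` gives `"atcattggt"`. The edited sequence can then be translated and aligned against the wild-type protein.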

D. Using the information in Table 3, draw a cartoon of the CFTR protein showing the domain organisation and indicate on this the positions of the amino acid changes identified in the mutants.

Table 3. CFTR protein domain regions

Domain                    Amino acids
Transmembrane domain 1    85 – 303
ATP binding domain 1      389 – 670
Regulatory (R) domain     639 – 849
Transmembrane domain 2    866 – 1147
ATP binding domain 2      1208 – 1480
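To place a mutation on the cartoon, a cDNA position first has to be converted to a codon (residue) number and then matched against the ranges in Table 3. The sketch below shows that arithmetic. `cds_start` is the position of the A of the start codon, which you determine in Table 1, and the function names are illustrative. Because two of the ranges in Table 3 overlap, the lookup returns a list.

```python
# Domain ranges copied from Table 3 (amino acid numbers, inclusive).
DOMAINS = [
    ("Transmembrane domain 1", 85, 303),
    ("ATP binding domain 1", 389, 670),
    ("Regulatory (R) domain", 639, 849),
    ("Transmembrane domain 2", 866, 1147),
    ("ATP binding domain 2", 1208, 1480),
]


def codon_number(nt_pos, cds_start):
    """Convert a 1-based cDNA position to a 1-based codon number."""
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nt_pos - cds_start) // 3 + 1


def domains_at(residue):
    """Return the names of all Table 3 domains that contain this residue."""
    return [name for name, first, last in DOMAINS if first <= residue <= last]
```

With a made-up `cds_start` of 4, position 10 falls in codon 3. `domains_at(117)` returns `["Transmembrane domain 1"]`, while a residue outside every listed range returns an empty list.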

E.  A patient was diagnosed with CF; neither parent had CF.  Routine analysis identified a known mutant allele on one chromosome and an unknown mutant allele on the other chromosome. The novel mutation was found to be located between exons 8 and 11.  RT-PCR analysis was carried out on RNA samples from the patient and both parents, using primers designed to amplify this region (Figure 1B).  The PCR products were analysed by agarose gel electrophoresis (Figure 1A) and then sequenced (Appendix II).


Figure 1: (A) Analysis of RT-PCR products of RNA isolated from blood samples of the patient (J28), and his mother (Mo) and father (Fa).  The position of the 500 bp band in the 100 bp marker lane is indicated. (B) Sequences of the forward (F) and reverse (R) primers used in the PCR.

  • Locate the position of the primers on the complete cDNA. Remember you will first need to determine the reverse complement of the R primer in order to do this.
  • What size product is expected for amplification of the wild-type cDNA?
  • How do you interpret the results of the RT-PCR?
  • Compare the wild-type and mutant DNA sequences to determine the nature of the mutation, using an alignment programme such as Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).
  • Can you speculate how this mutation arose (hint: look at the positions of the exon junctions in the cDNA; Appendix III)?
  • What protein domain is affected by the mutation and in what way?
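The primer-location and product-size steps can be sketched in Python as follows. The primer sequences themselves are given in Figure 1B and are not reproduced here, so the example uses a made-up template and primers. The function names are hypothetical, and the search assumes exact matches and takes the first occurrence of each primer site.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Locate a primer pair on the top strand of `template`.

    Returns (start, end, size) in 1-based inclusive coordinates, or None if
    either primer has no exact match or the sites are in the wrong order.
    """
    template = template.upper()
    f = template.find(forward.upper())
    # The reverse primer anneals to the top strand, so search for its
    # reverse complement.
    r = template.find(reverse_complement(reverse).upper())
    if f == -1 or r == -1 or r < f:
        return None
    start = f + 1
    end = r + len(reverse)
    return start, end, end - start + 1
```

For the toy template `"AACCGGTTAGCTAGGATCCA"` with forward primer `"CCGG"` and reverse primer `"ATCCTA"`, `amplicon` returns `(3, 17, 15)`: the product runs from the first base of the forward primer to the last base of the reverse primer site.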

Appendix I. CFTR cDNA sequence

Note: the sequence below is presented in FASTA format, a text-based format used in bioinformatics to represent nucleotide or protein sequences using single-letter codes. The first line, which begins with ‘>’, is a header line used to identify or describe the sequence.

>human_CFTR_wildtype_cDNA_6070 bp

gtagtaggtctttggcattaggagcttgagcccagacggccctagcagggaccccagcgcccgagagaccatgcagaggtcgcctctggaaaaggccagcgttgtctccaaactttttttcagctggaccagaccaattttgaggaaaggatacagacagcgcctggaattgtcagacatataccaaatcccttctgttgattctgctgacaatctatctgaaaaattggaaagagaatgggatagagagctggcttcaaagaaaaatcctaaactcattaatgcccttcggcgatgttttttctggagatttatgttctatggaatctttttatatttaggggaagtcaccaaagcagtacagcctctcttactgggaagaatcatagcttcctatgacccggataacaaggaggaacgctctatcgcgatttatctaggcataggcttatgccttctctttattgtgaggacactgctcctacacccagccatttttggccttcatcacattggaatgcagatgagaatagctatgtttagtttgatttataagaagactttaaagctgtcaagccgtgttctagataaaataagtattggacaacttgttagtctcctttccaacaacctgaacaaatttgatgaaggacttgcattggcacatttcgtgtggatcgctcctttgcaagtggcactcctcatggggctaatctgggagttgttacaggcgtctgccttctgtggacttggtttcctgatagtccttgccctttttcaggctgggctagggagaatgatgatgaagtacagagatcagagagctgggaagatcagtgaaagacttgtgattacctcagaaatgattgaaaatatccaatctgttaaggcatactgctgggaagaagcaatggaaaaaatgattgaaaacttaagacaaacagaactgaaactgactcggaaggcagcctatgtgagatacttcaatagctcagccttcttcttctcagggttctttgtggtgtttttatctgtgcttccctatgcactaatcaaaggaatcatcctccggaaaatattcaccaccatctcattctgcattgttctgcgcatggcggtcactcggcaatttccctgggctgtacaaacatggtatgactctcttggagcaataaacaaaatacaggatttcttacaaaagcaagaatataagacattggaatataacttaacgactacagaagtagtgatggagaatgtaacagccttctgggaggagggatttggggaattatttgagaaagcaaaacaaaacaataacaatagaaaaacttctaatggtgatgacagcctcttcttcagtaatttctcacttcttggtactcctgtcctgaaagatattaatttcaagatagaaagaggacagttgttggcggttgctggatccactggagcaggcaagacttcacttctaatggtgattatgggagaactggagccttcagagggtaaaattaagcacagtggaagaatttcattctgttctcagttttcctggattatgcctggcaccattaaagaaaatatcatctttggtgtttcctatgatgaatatagatacagaagcgtcatcaaagcatgccaactagaagaggacatctccaagtttgcagagaaagacaatatagttcttggagaaggtggaatcacactgagtggaggtcaacgagcaagaatttctttagcaagagcagtatacaaagatgctgatttgtatttattagactctccttttggatacctagatgttttaacagaaaaagaaatatttgaaagctgtgtctgtaaactgatggctaacaaaactaggattttggtcacttctaaaatggaacatttaaagaaagctgacaaaatattaattttgcatgaaggtagcagctatttttatgggacattttcagaactccaaaatctacagccagactttagctcaaaac
tcatgggatgtgattctttcgaccaatttagtgcagaaagaagaaattcaatcctaactgagaccttacaccgtttctcattagaaggagatgctcctgtctcctggacagaaacaaaaaaacaatcttttaaacagactggagagtttggggaaaaaaggaagaattctattctcaatccaatcaactctatacgaaaattttccattgtgcaaaagactcccttacaaatgaatggcatcgaagaggattctgatgagcctttagagagaaggctgtccttagtaccagattctgagcagggagaggcgatactgcctcgcatcagcgtgatcagcactggccccacgcttcaggcacgaaggaggcagtctgtcctgaacctgatgacacactcagttaaccaaggtcagaacattcaccgaaagacaacagcatccacacgaaaagtgtcactggcccctcaggcaaacttgactgaactggatatatattcaagaaggttatctcaagaaactggcttggaaataagtgaagaaattaacgaagaagacttaaaggagtgcttttttgatgatatggagagcataccagcagtgactacatggaacacataccttcgatatattactgtccacaagagcttaatttttgtgctaatttggtgcttagtaatttttctggcagaggtggctgcttctttggttgtgctgtggctccttggaaacactcctcttcaagacaaagggaatagtactcatagtagaaataacagctatgcagtgattatcaccagcaccagttcgtattatgtgttttacatttacgtgggagtagccgacactttgcttgctatgggattcttcagaggtctaccactggtgcatactctaatcacagtgtcgaaaattttacaccacaaaatgttacattctgttcttcaagcacctatgtcaaccctcaacacgttgaaagcaggtgggattcttaatagattctccaaagatatagcaattttggatgaccttctgcctcttaccatatttgacttcatccagttgttattaattgtgattggagctatagcagttgtcgcagttttacaaccctacatctttgttgcaacagtgccagtgatagtggcttttattatgttgagagcatatttcctccaaacctcacagcaactcaaacaactggaatctgaaggcaggagtccaattttcactcatcttgttacaagcttaaaaggactatggacacttcgtgccttcggacggcagccttactttgaaactctgttccacaaagctctgaatttacatactgccaactggttcttgtacctgtcaacactgcgctggttccaaatgagaatagaaatgatttttgtcatcttcttcattgctgttaccttcatttccattttaacaacaggagaaggagaaggaagagttggtattatcctgactttagccatgaatatcatgagtacattgcagtgggctgtaaactccagcatagatgtggatagcttgatgcgatctgtgagccgagtctttaagttcattgacatgccaacagaaggtaaacctaccaagtcaaccaaaccatacaagaatggccaactctcgaaagttatgattattgagaattcacacgtgaagaaagatgacatctggccctcagggggccaaatgactgtcaaagatctcacagcaaaatacacagaaggtggaaatgccatattagagaacatttccttctcaataagtcctggccagagggtgggcctcttgggaagaactggatcagggaagagtactttgttatcagcttttttgagactactgaacactgaaggagaaatccagatcgatggtgtgtcttgggattcaataactttgcaacagtggaggaaagcctttggagtgataccacagaaagtatttattttttctggaacatttagaaaaaacttggatccctatgaacagtgg
agtgatcaagaaatatggaaagttgcagatgaggttgggctcagatctgtgatagaacagtttcctgggaagcttgactttgtccttgtggatgggggctgtgtcctaagccatggccacaagcagttgatgtgcttggctagatctgttctcagtaaggcgaagatcttgctgcttgatgaacccagtgctcatttggatccagtaacataccaaataattagaagaactctaaaacaagcatttgctgattgcacagtaattctctgtgaacacaggatagaagcaatgctggaatgccaacaatttttggtcatagaagagaacaaagtgcggcagtacgattccatccagaaactgctgaacgagaggagcctcttccggcaagccatcagcccctccgacagggtgaagctctttccccaccggaactcaagcaagtgcaagtctaagccccagattgctgctctgaaagaggagacagaagaagaggtgcaagatacaaggctttagagagcagcataaatgttgacatgggacatttgctcatggaattggagctcgtgggacagtcacctcatggaattggagctcgtggaacagttacctctgcctcagaaaacaaggatgaattaagtttttttttaaaaaagaaacatttggtaaggggaattgaggacactgatatgggtcttgataaatggcttcctggcaatagtcaaattgtgtgaaaggtacttcaaatccttgaagatttaccacttgtgttttgcaagccagattttcctgaaaacccttgccatgtgctagtaattggaaaggcagctctaaatgtcaatcagcctagttgatcagcttattgtctagtgaaactcgttaatttgtagtgttggagaagaactgaaatcatacttcttagggttatgattaagtaatgataactggaaacttcagcggtttatataagcttgtattcctttttctctcctctccccatgatgtttagaaacacaactatattgtttgctaagcattccaactatctcatttccaagcaagtattagaataccacaggaaccacaagactgcacatcaaaatatgccccattcaacatctagtgagcagtcaggaaagagaacttccagatcctggaaatcagggttagtattgtccaggtctaccaaaaatctcaatatttcagataatcacaatacatcccttacctgggaaagggctgttataatctttcacaggggacaggatggttcccttgatgaagaagttgatatgccttttcccaactccagaaagtgacaagctcacagacctttgaactagagtttagctggaaaagtatgttagtgcaaattgtcacaggacagcccttctttccacagaagctccaggtagagggtgtgtaagtagataggccatgggcactgtgggtagacacacatgaagtccaagcatttagatgtataggttgatggtggtatgttttcaggctagatgtatgtacttcatgctgtctacactaagagagaatgagagacacactgaagaagcaccaatcatgaattagttttatatgcttctgttttataattttgtgaagcaaaattttttctctaggaaatatttattttaataatgtttcaaacatatataacaatgctgtattttaaaagaatgattatgaattacatttgtataaaataatttttatatttgaaatattgactttttatggcactagtatttctatgaaatattatgttaaaactgggacaggggagaacctagggtgatattaaccaggggccatgaatcaccttttggtctggagggaagccttggggctgatgcagttgttgcccacagctgtatgattcccagccagcacagcctcttagatgcagttctgaagaagatggtaccaccagtctgactgtttccatcaagggtacactgccttctcaactccaaactgactcttaa
gaagactgcattatatttattactgtaagaaaatatcacttgtcaataaaatccatacatttgtgtgaaa
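A FASTA record like the one above may be wrapped over several lines, so the lines need to be joined before the sequence is analysed. The following is a minimal reader, assuming plain-text input with one or more records; `read_fasta` is an illustrative name rather than a function from any particular library.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary.

    The leading '>' is dropped from each header, blank lines are ignored and
    wrapped sequence lines are concatenated.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

After parsing, checking `len()` of the sequence against the length stated in the header is a quick way to confirm that nothing was lost when copying.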

Appendix II. Sequence of RT-PCR products

>Wild-type_RT-PCR product

ctgcgcatggcggtcactcggcaatttccctgggctgtacaaacatggtatgactctcttggagcaataaacaaaatacaggatttcttacaaaagcaagaatataagacattggaatataacttaacgactacagaagtagtgatggagaatgtaacagccttctgggaggagggatttggggaattatttgagaaagcaaaacaaaacaataacaatagaaaaacttctaatggtgatgacagcctcttcttcagtaatttctcacttcttggtactcctgtcctgaaagatattaatttcaagatagaaagaggacagttgttggcggttgctggatccactggagcaggcaagacttcacttctaatggtgattatgggagaactggagccttcagagggtaaaattaagcacagtggaagaatttcattctgttctcagttttcctggattatgcctggcaccattaaagaaaatatcatctttggtgtttcctatgatg

>Mutant_RT-PCR product

ctgcgcatggcggtcactcggcaatttccctgggctgtacaaacatggtatgactctcttggagcaataaacaaaatacaggatttcttacaaaagcaagaatataagacattggaatataacttaacgactacagaagtagtgatggagaatgtaacagccttctgggaggagacttcacttctaatggtgattatgggagaactggagccttcagagggtaaaattaagcacagtggaagaatttcattctgttctcagttttcctggattatgcctggcaccattaaagaaaatatcatctttggtgtttcctatgatg
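If the two products differ by a single contiguous deletion, it can be located without a full alignment by trimming the shared prefix and suffix. This sketch covers only that case and returns `None` for anything else. The function name is hypothetical. When the bases flanking the deletion repeat, the exact boundaries are ambiguous, and this approach reports the position furthest towards the 3’ end.

```python
def find_deletion(wild_type, mutant):
    """Locate a single contiguous deletion in `mutant` relative to `wild_type`.

    Returns (start, end, deleted_sequence) in 1-based inclusive wild-type
    coordinates, or None if the difference is not one simple deletion.
    """
    wt, mut = wild_type.upper(), mutant.upper()
    if len(mut) >= len(wt):
        return None
    # Length of the shared prefix.
    prefix = 0
    while prefix < len(mut) and wt[prefix] == mut[prefix]:
        prefix += 1
    # Length of the shared suffix, not overlapping the prefix in the mutant.
    suffix = 0
    while suffix < len(mut) - prefix and wt[-1 - suffix] == mut[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(mut):
        return None
    return prefix + 1, len(wt) - suffix, wt[prefix:len(wt) - suffix]
```

On toy input, `find_deletion("AAGGCC", "AACC")` returns `(3, 4, "GG")`. The coordinates are relative to the RT-PCR product, so they must be offset by the product's start position to map back onto the full cDNA.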

Appendix III. Exon positions

exon      nucleotides    exon      nucleotides    exon      nucleotides
exon 1    1 – 123        exon 10   1280 – 1462    exon 19   3059 – 3209
exon 2    124 – 234      exon 11   1463 – 1654    exon 20   3210 – 3437
exon 3    235 – 343      exon 12   1655 – 1749    exon 21   3438 – 3538
exon 4    344 – 559      exon 13   1750 – 1836    exon 22   3539 – 3787
exon 5    560 – 649      exon 14   1837 – 2560    exon 23   3788 – 3943
exon 6    650 – 813      exon 15   2561 – 2689    exon 24   3944 – 4033
exon 7    814 – 939      exon 16   2690 – 2727    exon 25   4034 – 4206
exon 8    940 – 1186     exon 17   2728 – 2978    exon 26   4207 – 4312
exon 9    1187 – 1279    exon 18   2979 – 3058    exon 27   4313 – 6070
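Because the exons in the table above are contiguous, a cDNA position can be assigned to its exon with a binary search over the exon start positions. The sketch below uses the start positions from this appendix; the function names are illustrative. Comparing the boundaries of a sequence change with `exon_range` shows whether it coincides with exon junctions.

```python
from bisect import bisect_right

# First nucleotide of exons 1..27, taken from the table in Appendix III.
EXON_STARTS = [
    1, 124, 235, 344, 560, 650, 814, 940, 1187,
    1280, 1463, 1655, 1750, 1837, 2561, 2690, 2728, 2979,
    3059, 3210, 3438, 3539, 3788, 3944, 4034, 4207, 4313,
]
CDNA_LENGTH = 6070


def exon_at(pos):
    """Return the number of the exon containing 1-based cDNA position `pos`."""
    if not 1 <= pos <= CDNA_LENGTH:
        raise ValueError("position is outside the cDNA")
    return bisect_right(EXON_STARTS, pos)


def exon_range(number):
    """Return the (first, last) cDNA positions of the given exon."""
    start = EXON_STARTS[number - 1]
    end = EXON_STARTS[number] - 1 if number < len(EXON_STARTS) else CDNA_LENGTH
    return start, end
```

For example, `exon_at(123)` is 1 and `exon_at(124)` is 2, and `exon_range(5)` returns `(560, 649)`.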
