24.6: Genetic Mapping - Biology



Because the frequency of recombination between two loci (up to 50%) is roughly proportional to the chromosomal distance between them, we can use recombination frequencies to produce genetic maps of all the loci along a chromosome and ultimately in the whole genome. The units of genetic distance are called map units (mu) or centiMorgans (cM), a name chosen in honor of Thomas Hunt Morgan by his student Alfred Sturtevant, who developed the concept. Geneticists routinely convert recombination frequencies into cM: the recombination frequency in percent is approximately the same as the map distance in cM. For example, if two loci have a recombination frequency of 25%, they are said to be ~25 cM apart on a chromosome (Figure 9). Note: this approximation works well for small distances (RF < 30%) but progressively fails at longer distances because the RF reaches a maximum at 50%. Some chromosomes are >100 cM long, but loci at their tips only have an RF of 50%. The method for mapping these long chromosomes is shown below.
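As a minimal sketch of the conversion described above, the recombination frequency can be computed from offspring counts and read directly as an approximate map distance. The function names and the offspring counts are hypothetical and chosen only for illustration; the 30% limit is the rule of thumb stated in the text.

```python
def recombination_frequency(parental_count, recombinant_count):
    """Return the recombination frequency (RF) in percent."""
    total = parental_count + recombinant_count
    if total == 0:
        raise ValueError("no offspring were scored")
    return 100.0 * recombinant_count / total


def approximate_map_distance_cm(rf_percent):
    """Read an RF (in percent) as a map distance in cM.

    The text treats this as a good approximation only for RF < 30%;
    RF itself can never exceed 50%.
    """
    if not 0.0 <= rf_percent <= 50.0:
        raise ValueError("RF must lie between 0% and 50%")
    return rf_percent


# Hypothetical cross: 750 parental-type and 250 recombinant-type offspring.
rf = recombination_frequency(750, 250)
distance = approximate_map_distance_cm(rf)
```

With these illustrative counts the RF is 25%, so the two loci would be placed about 25 cM apart.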

Note that the map distance of two loci alone does not tell us anything about the orientation of these loci relative to other features, such as centromeres or telomeres, on the chromosome.

Map distances are always calculated for one pair of loci at a time. However, by combining the results of multiple pairwise calculations, a genetic map of many loci on a chromosome can be produced (Figure 10). A genetic map shows the map distance, in cM, that separates any two loci, and the position of these loci relative to all other mapped loci. The genetic map distance is roughly proportional to the physical distance, i.e. the amount of DNA between two loci. For example, in Arabidopsis, 1.0 cM corresponds to approximately 150,000 bp and contains approximately 50 genes. The exact number of DNA bases in a cM depends on the organism, and on the particular position in the chromosome; some parts of chromosomes (“crossover hot spots”) have higher rates of recombination than others, while other regions have reduced crossing over and often correspond to large regions of heterochromatin.
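Combining pairwise distances into an ordered map can be sketched for the simplest case of three loci: the pair with the largest pairwise distance forms the two ends, and the remaining locus lies between them. This is an illustrative sketch, not a full mapping procedure; the locus names and distances are hypothetical.

```python
def order_three_loci(pairwise_cm):
    """Order three loci given their pairwise map distances.

    pairwise_cm maps frozenset({locus_a, locus_b}) -> distance in cM.
    The pair with the largest distance is taken as the two outer loci.
    """
    if len(pairwise_cm) != 3:
        raise ValueError("expected exactly three pairwise distances")
    outer_pair = max(pairwise_cm, key=pairwise_cm.get)
    all_loci = set().union(*pairwise_cm)
    middle = (all_loci - outer_pair).pop()
    left, right = sorted(outer_pair)
    return [left, middle, right]


# Hypothetical distances: A-B 10 cM, B-C 15 cM, A-C 23 cM.
distances = {
    frozenset({"A", "B"}): 10.0,
    frozenset({"B", "C"}): 15.0,
    frozenset({"A", "C"}): 23.0,
}
locus_order = order_three_loci(distances)
```

In this made-up example the A-C distance (23 cM) is less than the sum of the two shorter intervals (25 cM), which is the pattern expected when the longest interval is measured directly.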

When a novel gene or locus is identified by mutation or polymorphism, its approximate position on a chromosome can be determined by crossing it with previously mapped genes, and then calculating the recombination frequency. If the novel gene and the previously mapped genes show complete or partial linkage, the recombination frequency will indicate the approximate position of the novel gene within the genetic map. This information is useful in isolating (i.e. cloning) the specific fragment of DNA that encodes the novel gene, through a process called map-based cloning.

Genetic maps are also useful to track genes/alleles in breeding crops and animals, in studying evolutionary relationships between species, and in determining the causes and individual susceptibility of some human diseases.

Genetic maps are useful for showing the order of loci along a chromosome, but the distances are only an approximation. The correlation between recombination frequency and actual chromosomal distance is more accurate for short distances (low RF values) than for long distances. Observed recombination frequencies between two relatively distant markers tend to underestimate the actual number of crossovers that occurred. This is because as the distance between loci increases, so does the possibility of a second (or further) crossover occurring between the loci. This is a problem for geneticists because, with respect to the loci being studied, these double crossovers produce gametes with the same genotypes as if no recombination events had occurred (Figure 11); that is, they have parental genotypes. Thus a double crossover will appear to be a parental type and will not be counted as a recombinant, despite having two (or more) crossovers. Geneticists will sometimes use specific mathematical formulae to adjust large recombination frequencies to account for the possibility of multiple crossovers and thus get a better estimate of the actual distance between two loci.
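The text does not say which formulae are meant. Two standard choices are the Haldane and Kosambi mapping functions, sketched here as an illustration of how such a correction behaves. Haldane's function assumes crossovers occur independently, while Kosambi's allows for some interference between neighbouring crossovers. The function names are hypothetical.

```python
import math


def haldane_cm(rf_fraction):
    """Haldane mapping function: d = -50 * ln(1 - 2r), with d in cM."""
    if not 0.0 <= rf_fraction < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * rf_fraction)


def kosambi_cm(rf_fraction):
    """Kosambi mapping function: d = 25 * ln((1 + 2r) / (1 - 2r)), d in cM."""
    if not 0.0 <= rf_fraction < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * rf_fraction) / (1.0 - 2.0 * rf_fraction))


# Corrections are negligible at small r and grow as r approaches 0.5.
small = haldane_cm(0.01)   # close to 1 cM
large = haldane_cm(0.25)   # noticeably more than 25 cM
```

Both functions return values close to the raw percentage for small recombination frequencies and increasingly larger distances as the frequency approaches 50%, which matches the underestimation described above.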

Introduction of Drug Metabolism and Overview of Disease Effect on Drug Metabolism

Protein Therapeutics

Therapeutic proteins are extensively used in the treatment of cancer, HIV, and other diseases. Monoclonal antibodies, IFNs, and cytokines are examples of some of the macromolecular therapeutic proteins. Proteins are not good substrates for CYP enzymes and are generally cleared by renal filtration or degraded to smaller peptides or amino acids in several tissues by circulating phagocytic cells or by their target antigen-containing cells ( Keizer et al., 2010 ). In addition, antibodies and endogenous immunoglobulins can sometimes be protected from degradation by binding to protective receptors [the neonatal Fc-receptor (FcRn)], which explains their long elimination half-lives (up to 4 weeks).

FcRn recycles both albumin and IgG thereby circumventing each from being degraded and extending their half-life ( Sand et al., 2014 ). The fusion of a therapeutic protein with albumin enhances the half-life of the therapeutic protein by taking advantage of the recycling of human albumin by FcRn receptors. A deficiency in FcRn receptors due to a mutation (β2-microglobulin gene) can result in decreased plasma concentrations of IgG and albumin ( Wani et al., 2006 ). This disorder is known as familial hypercatabolic hypoproteinemia, and the hypercatabolism (high clearance) of IgG and albumin can result in lower concentrations of albumin fusion proteins or therapeutic antibodies ( Kim et al., 2007 ). Disorders such as this are likely to influence the pharmacokinetics of therapeutic antibodies and albumin-fused therapeutic proteins.

Use Y-DNA, surname histories, and geographical analysis tools to identify your genetic homeland and facilitate further research and visits. Genealogy maps formatted for mobile and tablet platforms as well as traditional computers.

Genetic Homeland is a set of online tools for enhancing genealogy research by geographically pinpointing the latitude and longitude of historical records and events. If you would like to learn the specific geographical place where your ancestors came from, try our web applications, which enable you to generate data maps in real time to see the spatial relationship between multiple surnames and datasets.

Beyond just geographically plotting historical records like tax rolls and censuses, we are also digitizing and geocoding non-traditional locations like castles, clan histories, and Y-DNA signatures.

Web query tool to search our geo-coded database of genetic and traditional genealogy records with dynamic mapping and triangulation across multiple data sources.



We report on the joint analysis of inter-individual variation in the levels of DNA methylation, total and allelic expression, and DNA sequence of 62 healthy parents of 31 parent-child trios of European descent. Here, we start by introducing each data set individually before discussing the relationships among them.

DNA methylation assays

DNA methylation was assayed in forearm skin fibroblast samples using the Illumina 450 K assay (Materials and methods). For each sample, methylation was measured at approximately 485,000 CpG sites, but we only considered the approximately 392,000 sites uniquely mapped in autosomes and containing no known SNPs. Methylation levels are measured in populations of diploid cells using beta values [18], which range from 0 (no methylation) to 1 (complete methylation of the two alleles). Methylation measurements were highly replicable, with the Pearson correlation coefficient between beta values of two replicates exceeding 0.99 in each of three pairs of biological replicates, while the average pairwise correlation coefficient between methylation levels from different samples is around 0.95 (Additional file 1). Surrogate Variable Analysis [19] was used to identify possible batch effects accounting for inter-individual methylation variation but none were detected, suggesting that the observed variation may mostly be due to stochastic, environmental, or genetic effects.

The Illumina 450 K assay includes both type I probes utilizing two query probes per CpG locus (largely concentrated around genes’ TSSs), and type II probes utilizing a single probe per locus (dispersed somewhat more uniformly across the genome; see Materials and methods). The distributions of methylation beta values differ for type I and type II probes due to their localization biases but both are bimodal, with modes corresponding to CpG sites that are unmethylated in most cells of the sample (hypomethylated), and those that are methylated in most cells of the sample (hypermethylated) (Figure 1A (type II probes); Additional file 2A (type I probes)). Consistent with previous reports [7], hypomethylated sites are mainly located in CpG islands and within 1.5 kb of the TSS of a gene (53% of probes with mean beta value <0.3 are located near a TSS versus 34% of all probes; in the case of CpG islands, it is 60% versus 32%), whereas hypermethylated sites are generally located in the rest of the genome (distal intergenic and gene body regions).

Fibroblast methylation beta values are bimodal and the two modes show different breakdown in terms of CpG islands and genes. Distribution of methylation beta values in type II probes across the genome, partitioned by position relative to (A) CpG islands (with a shore defined by Illumina as less than 2 kb from an annotated CpG island, a shelf as 2 to 4 kb, and open sea as more than 4 kb) and (B) annotated genes.

Hypomethylated CpG sites are preferentially located in active regulatory regions characterized by DHS and H3K4me3, as measured by the ENCODE consortium in fibroblast cell lines [20] (Figure 2A (type II probes); Additional file 3A (type I probes)). Of hypomethylated CpG sites, 59% overlap with a DHS peak in the BJ foreskin fibroblast line, and 72% with an H3K4me3 peak. This is approximately twice the fraction seen among all CpG sites (29% and 34%, respectively). On the contrary, hypermethylated sites show a considerable overlap with H3K36me3, an intragenic marker of active transcription [21], with 19% of sites with mean beta >0.7 overlapping with a peak for this mark, compared to 9% among all sites. However, 62% of hypermethylated sites overlap none of the features considered in our analyses. Consistent with observations of low methylation in regions of DHS and active histone marks, genes with high expression levels show considerably lower methylation in the region proximal to the TSS (up to 1,500 bp from the TSS) and higher methylation in the gene body region compared to genes with lower average expression levels (Figure 3A; Additional file 4A), with probes adjacent to genes in the top quartile of expression having mean beta <0.3 81% of the time and mean beta >0.7 only 11% of the time. Those in the lowest quartile still have a plurality of hypomethylated probes near the TSS, but with numbers considerably diminished, that is, 42% hypomethylated versus 30% hypermethylated.
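The mean-beta cutoffs quoted above (<0.3 and >0.7) can be illustrated with a small classifier. This is a sketch under stated assumptions rather than the authors' pipeline: the function name and the "intermediate" label for probes between the two cutoffs are hypothetical.

```python
from statistics import mean


def methylation_class(beta_values, low=0.3, high=0.7):
    """Classify one CpG probe by its mean beta value across samples.

    The 0.3 and 0.7 cutoffs are those quoted in the text; the
    'intermediate' label is an assumption made for this sketch.
    """
    probe_mean = mean(beta_values)
    if probe_mean < low:
        return "hypomethylated"
    if probe_mean > high:
        return "hypermethylated"
    return "intermediate"


def fraction_in_class(probes, wanted):
    """Fraction of probes (dict: probe id -> beta values) in a class."""
    labels = [methylation_class(betas) for betas in probes.values()]
    return labels.count(wanted) / len(labels)


# Hypothetical probes with beta values from three samples each.
example_probes = {
    "probe_a": [0.05, 0.10, 0.08],
    "probe_b": [0.85, 0.90, 0.88],
    "probe_c": [0.45, 0.55, 0.50],
    "probe_d": [0.12, 0.20, 0.15],
}
hypo_fraction = fraction_in_class(example_probes, "hypomethylated")
```

Fractions such as "81% hypomethylated" in the text correspond to this kind of proportion computed over the probes adjacent to a given set of genes.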

Mean and variance of beta values of CpG probes associate with several genome marks. Proportion of type II CpG probes falling in various types of genomics regions identified by ENCODE, partitioned by (A) CpG probe mean beta value and (B) percentile of beta value standard deviation (Std. dev.). All data types, except for 28-way conservation, are derived from broad peaks in BJ human foreskin fibroblast cells.

The mean and variance of beta values of CpG probes near transcription start sites depend on the gene’s expression level. Mean (A) and standard deviation (B) of type II CpG probes with respect to their position relative to TSSs of annotated genes. Each green dot corresponds to a CpG probe, and the four lines show the running median for probes based on the quartile of the expression level (from RNA-seq in four individuals) of the gene they are associated with.

We examined the levels of inter-individual variation of methylation probes, finding a drop in variation of probes located within 1,500 bp of a TSS annotated for an actively expressed gene (Figure 3B; Additional file 4B), with only 11% of probes near the TSS of a top quartile expression gene also being in the top quartile of methylation variation, compared to 30% for CpG sites adjacent to the TSS of a bottom quartile expression gene. These results were corroborated by the finding that sites with low inter-individual methylation variation were enriched for DHS and H3K4me3, and, to a lesser degree, sequence conservation (Figure 2B; Additional file 3B).

On the contrary, highly variable CpG probes (top 25%, standard deviation >0.0932) are usually located far away from the TSS (either in intergenic regions or in the gene body), or are located near the TSS of genes with low expression in fibroblasts and generally lack regulatory or evolutionary marks of function. The majority of these CpG sites show a unimodal distribution (Additional file 5). Genes whose TSS regions contain highly variable CpG probes were enriched for Gene Ontology (GO) terms related to multicellular organismal development (Additional file 6, worksheet 1), compared to the full set of genes having at least one CpG probe in the TSS region. Unexpectedly, extremely variable CpG probes (top 5%, standard deviation >0.15) show a marked increase in their overlap with DHS and H3K4me3 marks. Genes collocated with these CpG probes are even more strongly enriched for having functions related to development, and include a large number of genes from the HOX clusters (see Discussion).
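The two variability tiers can be sketched as a simple threshold test on the per-probe standard deviation. The cutoffs 0.0932 and 0.15 are taken from the text; the use of the sample standard deviation, the tier labels, and the function name are assumptions made for illustration.

```python
from statistics import stdev

HIGHLY_VARIABLE_SD = 0.0932     # top 25% cutoff quoted in the text
EXTREMELY_VARIABLE_SD = 0.15    # top 5% cutoff quoted in the text


def variability_tier(beta_values):
    """Assign a probe to a variability tier from its beta values.

    Uses the sample standard deviation; the text does not state which
    estimator was used, so this choice is an assumption.
    """
    sd = stdev(beta_values)
    if sd > EXTREMELY_VARIABLE_SD:
        return "extremely variable"
    if sd > HIGHLY_VARIABLE_SD:
        return "highly variable"
    return "low variability"


# Hypothetical beta values across four individuals.
tier_stable = variability_tier([0.50, 0.51, 0.49, 0.50])
tier_high = variability_tier([0.35, 0.50, 0.65, 0.50])
tier_extreme = variability_tier([0.10, 0.50, 0.90, 0.30])
```

Note that every extremely variable probe also exceeds the highly variable cutoff; the function reports only the strictest tier a probe reaches.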

Gene expression analysis

RNA expression levels for the 62 individuals were measured using the Illumina HumanRef8 microarray platform, giving expression levels for 21,916 probes mapping to a total of 16,952 genes. Only probes that showed moderate to high inter-individual expression variation (standard deviation >0.1127, corresponding to a total of 9,493 genes) were considered for further analyses.

To complement total expression data, allelic expression (AE) was assayed at a set of approximately 900,000 SNP locations dispersed in annotated genes and intergenic regions of all autosomes using hybridization to genotyping arrays, as previously described [22] (see Materials and methods). For each sample and each heterozygous SNP, the ratio of the expression level of each allele is estimated, after normalization to genomic DNA. Of 24,814 known canonical UCSC genes, 81% have at least one assayed SNP within their boundaries. A previously described [23] hidden Markov model was used to reduce the noise in the data and estimate, for each SNP of each sample, the expected true allele expression log-ratio. We note that because this approach does not make use of gene annotation, it is able to detect AE at transcripts that do not, or only partially, overlap annotated genes. However, detection power for genes that are short or contain a small number of SNPs is reduced.

As previously reported for other cell types [22], AE was seen to be widespread. We defined an aeSNP as a SNP whose expected log2 allele ratio is above 0.2 in at least two samples (which corresponds to a 5% false discovery rate (FDR); see Materials and methods), and found 74,624 aeSNPs within annotated gene regions (corresponding to 15.8% of genic/intronic SNPs), and 25,467 outside (corresponding to 5.4% of intergenic SNPs). aeSNPs were clustered into 3,327 aeRegions (consisting of two or more consecutive aeSNPs), of which more than 80% had full or partial overlap with an annotated gene (Additional file 7), similar to results previously obtained in lymphoblasts [23] (for a full list of aeRegions, see Additional file 8).
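The aeSNP definition and the clustering into aeRegions can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it takes the absolute value of the log2 ratio, treats "consecutive" as adjacent in the list of assayed SNPs, and uses hypothetical function names and data.

```python
def call_ae_snps(log2_ratios_per_snp, threshold=0.2, min_samples=2):
    """Flag each SNP whose |log2 allele ratio| exceeds the threshold
    in at least min_samples samples.

    log2_ratios_per_snp: one list per SNP, in genomic order, holding one
    value per sample (None where the sample is not heterozygous).
    """
    flags = []
    for sample_values in log2_ratios_per_snp:
        passing = sum(
            1 for ratio in sample_values
            if ratio is not None and abs(ratio) > threshold
        )
        flags.append(passing >= min_samples)
    return flags


def cluster_ae_regions(flags, min_length=2):
    """Group runs of consecutive aeSNPs into (first_index, last_index)."""
    regions = []
    run_start = None
    for index, is_ae in enumerate(flags):
        if is_ae and run_start is None:
            run_start = index
        elif not is_ae and run_start is not None:
            if index - run_start >= min_length:
                regions.append((run_start, index - 1))
            run_start = None
    if run_start is not None and len(flags) - run_start >= min_length:
        regions.append((run_start, len(flags) - 1))
    return regions


# Hypothetical ratios for five SNPs measured in three samples.
ratios = [
    [0.30, 0.25, 0.00],
    [0.50, -0.40, None],
    [0.10, 0.30, 0.00],
    [0.21, 0.22, 0.90],
    [0.30, 0.30, 0.30],
]
ae_flags = call_ae_snps(ratios)
ae_regions = cluster_ae_regions(ae_flags)
```

In this toy input the third SNP passes the threshold in only one sample, so it breaks the run and two separate regions are reported.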

Linking methylation and genetic variation

Inter-individual methylation variation is likely due to both genetic and environmental variation between samples. To determine the relationship between genetic variation and CpG methylation levels, we first genotyped our 62 samples (Materials and methods). We then mapped CpG beta values to the imputed genotype at polymorphic sites within 250 kb (absolute value Spearman’s rho above 0.452, which corresponds to a P-value of 6 × 10⁻⁶ and an FDR of 5% (Materials and methods)). A set of 27,486 pairs (Additional file 9) were retained as significant, involving a total of 1,676 mappable CpG probes and 19,561 candidate mQTLs. Whole genome bisulfite sequencing-derived DNA methylation data were generated for four fibroblast cell lines (Additional file 10) and used to validate array methylation detected at mappable CpG loci. We observe high concordance between array- and sequencing-derived methylation for highly variable CpG sites across the four cell lines (254 loci; median Pearson correlation coefficient = 0.84).
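The mapping step can be sketched as a scan over SNPs within a 250 kb window of a CpG probe, keeping those whose genotypes show a strong rank correlation with the probe's beta values. This is a minimal standard-library sketch, not the authors' code: genotypes are assumed to be coded as 0, 1, or 2 copies of an allele, and all names and data are hypothetical. The 0.452 cutoff was calibrated in the paper for 62 samples at 5% FDR and carries no statistical meaning for the six-sample toy data below.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def candidate_mqtls(cpg_position, betas, snps, window=250_000, rho_cutoff=0.452):
    """Return (snp_id, rho) for SNPs within the window with |rho| > cutoff.

    snps maps snp_id -> (position, genotypes coded as 0/1/2).
    """
    hits = []
    for snp_id, (position, genotypes) in snps.items():
        if abs(position - cpg_position) > window:
            continue
        rho = spearman_rho(genotypes, betas)
        if abs(rho) > rho_cutoff:
            hits.append((snp_id, rho))
    return hits


# Hypothetical probe and SNPs measured in six individuals.
betas = [0.10, 0.20, 0.50, 0.55, 0.90, 0.95]
snps = {
    "rs_near": (1_010_000, [0, 0, 1, 1, 2, 2]),
    "rs_unrelated": (1_020_000, [1, 0, 2, 2, 0, 1]),
    "rs_far": (2_000_000, [0, 0, 1, 1, 2, 2]),
}
hits = candidate_mqtls(1_000_000, betas, snps)
```

Only the nearby SNP whose genotypes track the beta values is returned; the distant SNP is excluded by the window even though its genotypes are identical.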

Remarkably, mappable CpG probes are 1.5-fold enriched in fibroblast DHS regions, but 1.75-fold depleted in highly conserved regions. While CpG probes found within CpG islands are underrepresented in the set of highly variable CpG probes (Figure 1B), CpG island probes are 1.66-fold enriched in mappable probes when compared to the set of highly variable CpG probes. Although mappable CpG probes represent only 1.7% of all highly variable CpG probes, they are approximately four times more frequent among extremely variable CpG probes relative to the set of highly variable probes (Figure 4). Most mappable CpG probes have a distribution of methylation levels that is unimodal, consistent with a moderate effect of genetic variation on methylation. However, bimodality and trimodality are much more frequent among this set of CpG probes than in highly variable CpG sites in general (29.7% and 4.8% of mappable probes, corresponding to 1.5- and 2.6-fold enrichments, respectively; Additional file 5). These correspond to cases where the impact of genetic variation is strong enough that classes of methylation levels are clearly distinct.

Variable CpG sites are more likely to be correlated with expression or sequence. Proportion of probes being significantly correlated (5% FDR) to either an mQTL or a gene’s expression levels, by percentile of population standard deviation.

The majority (67%) of mappable CpG probes have a significant mQTL within 5 kb but in 6% of cases the closest significant mQTL lies more than 100 kb away (Figure 5A). Despite their relative rarity, these distal regulators of methylation appear genuine, since even at these larger distances, such pairs are seen much more often than expected by chance (Figure 5B).

mQTLs are preferentially close to CpG sites. (A) Distribution of the mQTL to CpG probe distances for all correlated SNP-CpG pairs at 5% FDR. For each CpG probe, when more than one SNP is significantly correlated, a single one is retained as having either the most significant correlation (gray bars) or being located closest to the CpG probe (black bars). (B) Quantile-quantile plot of SNP/CpG probe Spearman’s rho P-values, grouped by pairwise distances. For each CpG probe included in the mQTL analysis, the most strongly correlated SNP within 250 kb was identified and the P-value obtained included in the set of P-values to be plotted for the distance bin in question. All SNPs in linkage disequilibrium with the selected SNP (R² > 0.8) were removed, and the next most strongly correlated SNP was taken, until all SNPs within the range of the CpG probe in question were considered. The number of significant mQTLs decays with distance, but is still more than expected by chance at distances greater than 100 kb.

Linking gene expression and genetic variation (eQTLs)

We sought eQTLs within 250 kb of each gene with variable expression (absolute value Spearman’s rho >0.537, P-value <1.4 × 10⁻⁵, corresponding to a 5% FDR; see Materials and methods). Such eQTLs were found for 420 (4.4%) genes and involved 9,674 SNPs (Additional file 11). This is comparable to previous reports from Veyrieras et al. [24] (6.5% of genes mapping to an eQTL in LCLs, with a larger sample size of 210), but larger than the 2 to 3% seen by Stranger et al. [25] in four different HapMap populations. Consistent with previous reports [25], genes with eQTLs were not enriched for any specific GO annotations. As previously reported [24], eQTLs are most strongly over-represented near the TSS and transcription end site (TES) of genes, with a stronger enrichment within the gene body than outside (Figure 6).

eQTLs are concentrated near the transcription start and end sites of genes. (A) Distribution of the distance between eQTLs and the closest of the boundaries (TSS or TES) of the gene whose expression they correlated with, for all pairs at 5% FDR. When a gene’s expression correlates significantly with more than one SNP, a single SNP is retained as having either the set of genotypes with the most significant correlation (gray bars) or being the most proximal to one of the two gene boundaries (TSS or TES). (B,C) Quantile-quantile plot of SNP/gene P-values, grouped by distances from the SNP to TSS (B) and TES (C). Selection of P-values to be plotted followed a similar procedure to that in Figure 4B, with all SNPs located up to 250 kb on either side of the gene boundaries or within the gene body included for consideration.

These eQTL data were complemented with the mapping of allelic expression ratios in aeRegions to candidate regulatory allelic expression quantitative trait loci (aeQTLs) within 250 kb (Spearman rho >0.452, P-value = 0.00029, corresponding to a 5% FDR; see Materials and methods). A total of 95,949 aeQTL-aeRegion pairs were obtained (Additional file 12), involving a total of 2,360 (or 71%) aeRegions and 89,874 candidate aeQTLs (many of which are in linkage disequilibrium with each other). These mappable aeRegions had a significant overlap with 1,452 annotated genes, three times more than the number of genes for which eQTLs were detected. We found 127 genes in both sets, corresponding to a 2.05-fold enrichment. A slightly larger overlap (2.92-fold enrichment) was observed in terms of the SNPs these genes mapped to. This significant but imperfect overlap between the two methods is explained by multiple assay-specific factors: aeRegions are dependent on the presence of informative SNPs, are largely driven by primary transcript variation (intronic expressed SNPs) and in general allow for greater statistical power in terms of detecting statistically significant correlated SNPs [26], whereas eQTL mapping (conducted on Illumina expression arrays) assesses both transcriptional and post-transcriptional variation and is skewed towards measuring exon-specific variation [27]. Consequently, these methods can be used in a complementary way to capture different compartments of expression variation. Roughly 70% of mappable aeRegions have at least one candidate aeQTL within 5 kb of one of their boundaries (Figure 7), which is comparable to results seen using eQTL analysis with known genes.

aeQTLs are concentrated near boundaries of aeRegions. Distribution of the distance between aeRegion boundary and the SNP they correlate with (5% FDR). When an aeRegion’s allelic expression correlates significantly with more than one SNP, a single SNP is retained as having either the set of genotypes with the most significant correlation (gray bars) or being the most proximal to one of the two aeRegion boundaries (black bars).

Linking gene expression to DNA methylation

We identified genes whose expression levels correlated with methylation levels of high-variance CpG probes located within their body or 250 kb on either end (absolute value Spearman’s rho >0.506, P-value <5.132 × 10⁻⁵, resulting in an FDR of 5%; see Materials and methods). This resulted in the identification of 587 genes with correlation to at least one of 1,793 CpG probes (Additional file 13). Extremely variable CpG sites are strongly over-represented amongst sites correlated with gene expression (Figure 4), and correlated CpG sites are 1.6-fold and 3.2-fold enriched, respectively, for bimodal and trimodal sites relative to the set of highly variable CpG sites.

Remarkably, methylation-correlated genes are far from representing an unbiased sample of the genome, with 78 (13%) of them being known transcription factors (GO enrichment P-value = 8.23 × 10⁻¹⁶) and 145 (24%) involved in multicellular organismal development (GO enrichment P-value = 6.1 × 10⁻²²) (Additional file 6, worksheet 2). These include a number of genes from each of the four HOX clusters, together with several other key regulators of development and cellular differentiation such as EN1, HAND2, TBX1, TBX2, TBX3, TBX5, and TBX15.

We sought to further characterize the CpG sites having methylation-expression correlations. Although about a quarter of methylation correlated genes had their closest correlated probe located within 1.5 kb of the TSS and 30% in their gene body, more than a third showed only correlation with distal intergenic probes (Figure 8). Since highly expressed genes have on average low DNA methylation near the TSS and higher DNA methylation at the gene body (Figure 3A), one might expect to see negative methylation-expression correlations for CpG probes located near a gene’s TSS and positive correlations for CpG probes located in its body. However, this is only partially verified, with one-third of the former type of pairs showing a positive correlation and nearly half of the latter showing a negative correlation. Overall, strong enrichments were seen for both negatively and positively correlated probes in both the gene body and TSS region, compared to other regions 3′ or more than 5 kb 5′ of the gene (Figure 9).

CpG sites where methylation positively or negatively correlates with expression differ with respect to chromatin marks. Proportion of CpG probes having various chromatin marks in at least one of five ENCODE fibroblast cell lines or located at various positions with respect to genes, with CpG probes grouped into three categories based on the type of correlation seen with an adjacent gene's expression values.

Positive and negative methylation/expression correlations are seen at all positions with respect to the gene. (A) Distribution of the distance between expression-correlated CpGs and the closest of the boundaries (TSS or TES) of the gene whose expression they correlated with, for all pairs at 5% FDR. When a gene's expression correlates significantly with more than one CpG site, it is retained as having either the set of methylation beta values with the most significant correlation (gray bars) or being the most proximal to one of the two gene boundaries (TSS or TES) (black bars). (B) Quantile-quantile plot of methylation/expression rank based correlation (Spearman’s rho), grouped by distances from the SNP to gene boundaries.

In order to find genomic features that may help distinguish CpG probes that correlate positively and negatively with gene expression, we turned to DHS and histone modification data obtained by the ENCODE consortium [8], considering data from five human fibroblast cell lines. Though these cell lines were not derived from the same donors as used in this study, we found in general that they allowed a clear separation between the two types of CpG probes (Figure 9). CpG probes where methylation levels correlated negatively with gene expression are for the most part located in regions with marks of regulatory activity (H3K4me3 or DHS): marks that are less frequent among CpG probes that show no correlation with expression and even less frequent among those that show a positive correlation. In contrast, positively correlated probes were slightly more often seen with the inactive gene-associated marker H3K27me3 when compared with negatively correlated probes.

As illustrated in Figure 10, CpG sites in all types of genomic regions are more likely to be negatively correlated with gene expression if they are located in regions of DNase I HS in at least one of the five FB cell lines considered. A similar pattern was seen with the active transcription mark H3K4me3, with the notable difference that regions having this mark in all five fibroblast cell lines considered were under-represented for negatively correlated CpG marks, indicating perhaps that invariably active regions will also be subject to less consequential variability in terms of DNA methylation and expression. We also observe that regions containing H3K27me3 in at least one of the two fibroblast cell lines where this type of data was available are more likely to contain positively correlated CpG sites.

The proportion of CpG sites where methylation correlates with expression depends on the site location, DHS and histone marks. Proportion of CpG probes showing correlation with gene expression, ±95% confidence interval, for probes located in intergenic regions (left), within 1.5 kb of the TSS (middle), or within the gene body (right), and showing either negative (top row) and positive (bottom row) correlation, depending on the presence of DHS, H3K4me3 and H3K27me3. For DHS and H3K4me3 marks, the individual bars are based on the number (out of five) of ENCODE fibroblast cell lines that have the mark in question.

In our samples, the four HOX clusters represent the densest centers of methylation-expression relationships in the genome. As seen in Figure 11A-D, each cluster is rich in both positive and negative methylation-expression correlations, involving CpG sites both within genes and within intergenic regions, with many but not all negatively correlated sites lying in regions marked by H3K4me3 and/or DHS. Also of interest in HOXA and HOXD are the topological domains obtained from a recent Hi-C study in IMR-90 cell lines [28]. In HOXD, a 40 kb region representing a boundary between the two domains contains the majority of CpG sites that have negative correlation with expression, whereas the boundary between two domains in HOXA also roughly delimits the positively and negatively correlated CpG sites in this gene cluster. TBX1 and TBX3 represent other developmentally significant transcription factors having both positively and negatively correlated probes, of which the latter largely coincide with DHS regions (Figure 11E,F).

Methylation-expression relationships in genomic context. Schematic of significant methylation-expression relationships for (A-D) the four HOX clusters, and (E,F) genes TBX1 and TBX3. Gold and blue lines link the TSS of the gene and the CpG probes correlated to that gene’s expression, with gold indicating negative correlation and blue indicating positive correlation. Red and blue blocks above indicate the presence of DHS or H3K4me3 marks in at least one of five ENCODE fibroblast cell lines. Where a domain boundary from Dixon et al. [28] was found, the domains are indicated with distinct colors.

Overlap between mQTLs and eQTLs

Three main types of relationships have so far been considered: methylation to sequence (mQTLs), expression to sequence (eQTLs and aeQTLs) and methylation to expression. To quantify the degree of overlap between the various relationships studied, we used genes, rather than CpG probes or SNPs, as the primary unit of interest. As seen in Figure 12, genes exhibiting two or three of the possible relationships form a relatively small but still non-negligible set. eQTLs and aeQTLs that were also mQTLs are termed in our report 'expression and methylation quantitative trait loci' (emQTLs), and correspond to a total of 52 eQTL-mappable genes and 234 aeQTL-mappable aeRegions, which together form the set of emQTL-mappable loci obtained in our analyses. When emQTL-mappable aeRegions are broken into annotated genes they overlap with, and merged with the list of emQTL-mappable genes obtained via combining eQTLs and mQTLs, we obtain a set of 242 emQTL mappable genes, plus 23 emQTL mappable aeRegions not overlapping with any annotated genes. Compared to a random selection of SNPs matched for minor allele frequency, we find 5.9 times more mQTLs are also emQTLs than expected by chance.
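The emQTL definition amounts to a set intersection, and the comparison against random SNPs to a ratio of observed to expected overlap. The sketch below illustrates both under stated assumptions: the function names and SNP identifiers are hypothetical, and the expected overlap is taken as the mean over random, frequency-matched SNP sets, which is one plausible reading of the comparison and not a procedure stated in the text.

```python
from statistics import mean


def emqtl_snps(eqtl_snps, aeqtl_snps, mqtl_snps):
    """SNPs that are an eQTL or aeQTL and are also an mQTL."""
    return (set(eqtl_snps) | set(aeqtl_snps)) & set(mqtl_snps)


def fold_enrichment(observed_overlap, random_overlaps):
    """Observed overlap divided by the mean overlap of random SNP sets.

    random_overlaps holds overlap counts obtained from random SNP
    selections (assumed here to be matched for minor allele frequency).
    """
    expected = mean(random_overlaps)
    if expected == 0:
        raise ValueError("expected overlap is zero; enrichment undefined")
    return observed_overlap / expected


# Hypothetical SNP identifiers and overlap counts.
shared = emqtl_snps({"rs1", "rs2"}, {"rs3"}, {"rs2", "rs3", "rs4"})
enrichment = fold_enrichment(59, [10, 10, 10])
```

The counts in this example were chosen so that the ratio comes out at 5.9; they are not taken from the paper's data.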

Overlap of genes with an eQTL, genes with expression correlated with methylation, and genes adjacent to mQTLs. Number of genes corresponding to various categories or relationships.

One example of an emQTL-mappable gene is C21orf56 (Figure 13A), which had previously been reported as having mappable CpG probes near the TSS [11]. These probes overlap with DHS and H3K4me3 regions and are negatively correlated with expression. Also of note are positively correlated CpG probes located in the body of the gene, which are also mappable to a similar set of mQTLs.

emQTL relationships in genomic context. Schematic of methylation-sequence-expression relationships in the loci surrounding the (A) C21ORF56, (B) PAX8, (C) GSTM1-GSTM5, and (D) GSTT1-GSTT2 genes. Annotations are similar to those in Figure 13, with added grey and cyan lines indicating mQTL and eQTL relationships, respectively.

Homeodomain transcription factor PAX8, transcription of which has been identified as an important biomarker in distinguishing various tumor types (reviewed in [29]), presented another particularly interesting case of overlap between the various types of relationships (Figure 13B): CpG probes located near the gene's TSS were unexpectedly positively correlated with the gene's expression, whereas those located in its body were negatively correlated. A possible explanation involves the putative uncharacterized transcript DKFZP686E10196, antisense to and located within PAX8, whose expression would be negatively correlated with CpG methylation at sites near its own TSS (which lie in the body of PAX8) but positively correlated with probes in the body of the transcript (which lie near the TSS of PAX8). Indeed, RNA-seq data obtained from three individuals with differing genotypes at the cis-associated emQTLs suggest that the expression of PAX8 and its antisense transcript are positively correlated, ruling out an interference between the two and instead hinting at a possible chromatin-linked role of DKFZP686E10196 activation in regulating PAX8 transcription. (For a recent review of antisense regulation, see [30].)

Gene clusters of glutathione transferase families GSTM and GSTT also show multiple genes being mappable to similar sets of CpG probes and SNPs (Figure 13C,D), with active marks DHS and H3K4me3 located near negatively correlated CpG probes.

We estimated the proportion of gene expression variation that could be explained by either sequence variation alone or by a combination of sequence variation and DNA methylation, using a simple linear model and five-fold cross-validation (Materials and methods). For each gene, the five SNPs (within 250 kb) jointly explaining the largest portion of the expression variation on the training data were sequentially identified and regressed out. Independently of this we regressed out the five CpG sites explaining the largest portion of the expression variation. We found a total of 25.5% of gene expression variation to be explained by sequence variation, whereas methylation explained only 8.9% of expression variation. We applied a third model in which the top five SNPs were regressed out and then the top five CpGs were regressed from the residuals, finding in this case the variation explained by methylation dropped to 5.9%. This suggests that 5.9/8.9 = 66% of methylation-facilitated gene expression variation was independent of sequence variation. These figures are considerably higher than the 1.2% and 3.3% variation of expression explained, respectively, by DNA sequence and DNA methylation found by Li et al. in breast tumors [31], indicative perhaps of much greater variation of gene expression brought about by other factors in the tumor micro-environment.
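The nested comparison described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the authors' pipeline: it fits ordinary least squares in-sample and omits both the five-fold cross-validation and the sequential selection of the top five SNPs and CpGs. The function names (`fit_r2`, `variance_partition`) and the simulated effect sizes are assumptions made for the example.

```python
import numpy as np


def fit_r2(X, y):
    """Ordinary least squares of y on X (with intercept).

    Returns the in-sample fraction of variance explained and the residuals.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var(), resid


def variance_partition(snps, cpgs, expr):
    """Return (R2 from SNPs, R2 from CpGs, R2 from CpGs after SNPs are regressed out).

    All three values are expressed as fractions of the total variance of expr.
    """
    r2_snp, resid = fit_r2(snps, expr)
    r2_cpg, _ = fit_r2(cpgs, expr)
    r2_on_resid, _ = fit_r2(cpgs, resid)
    # Rescale from "fraction of residual variance" to "fraction of total variance".
    r2_cpg_after_snp = r2_on_resid * resid.var() / expr.var()
    return r2_snp, r2_cpg, r2_cpg_after_snp


# Simulated example (hypothetical effect sizes): 200 individuals, 5 SNPs, 5 CpGs.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(200, 5)).astype(float)  # genotypes coded 0/1/2
cpgs = 0.3 * snps[:, :1] + rng.normal(size=(200, 5))     # methylation partly driven by the first SNP
expr = snps @ np.full(5, 0.4) + cpgs @ np.full(5, 0.2) + rng.normal(size=200)

r2_snp, r2_cpg, r2_cpg_after_snp = variance_partition(snps, cpgs, expr)
print(f"SNPs alone:      {r2_snp:.3f}")
print(f"CpGs alone:      {r2_cpg:.3f}")
print(f"CpGs after SNPs: {r2_cpg_after_snp:.3f}")
print(f"Share of methylation signal independent of sequence: {r2_cpg_after_snp / r2_cpg:.2f}")
```

The last printed ratio plays the same role as the 5.9/8.9 ≈ 0.66 figure in the text: the part of the methylation-explained variation that survives once sequence variation has been regressed out, divided by the part explained by methylation alone.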

Population Genetics


Population genetics is the study of genetic variation within and among populations and the evolutionary factors that explain this variation. Its foundation is the Hardy-Weinberg law, which holds as long as population size is large, mating is at random, and mutation, selection and migration are negligible. If these conditions are not met, allele frequencies and genotype frequencies may change from one generation to the next. Ethnic variation in allele frequencies is found throughout the genome, and by examining this genetic diversity, evolutionary patterns can be inferred, and variants contributing to the cause of common diseases can be identified. As a result of major international initiatives, extensive databases containing millions of genetic variants are available. Together with automated technology for genotyping, sequencing and bioinformatic analysis, these datasets provide the population geneticist with a huge set of densely mapped polymorphisms for reconciling genome variation with population histories of bottlenecks, admixture, and migration, for revealing evidence of natural selection, and for advancing understanding of many diseases.
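As a small worked illustration of the Hardy-Weinberg law for a single locus with two alleles (A and a), the sketch below computes the expected genotype frequencies p², 2pq and q², checks that the allele frequency is unchanged after one round of random mating, and computes a chi-square statistic against observed counts. The function names and the genotype counts are hypothetical and chosen only for the example.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency p of allele A estimated from genotype counts."""
    total = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * total)


def hardy_weinberg_expected(p):
    """Expected genotype frequencies (AA, Aa, aa) under random mating."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def next_generation_allele_frequency(p):
    """Allele frequency after one round of random mating with no other forces acting."""
    f_AA, f_Aa, _ = hardy_weinberg_expected(p)
    return f_AA + 0.5 * f_Aa


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic of observed counts against Hardy-Weinberg expectations."""
    total = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    expected = [f * total for f in hardy_weinberg_expected(p)]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


counts = (298, 489, 213)  # hypothetical genotype counts for AA, Aa, aa
p = allele_frequency(*counts)
print("p =", round(p, 4))
print("expected genotype frequencies:", hardy_weinberg_expected(p))
print("p after random mating:", round(next_generation_allele_frequency(p), 4))
print("chi-square statistic:", round(hw_chi_square(*counts), 4))
```

For a locus with two alleles, the statistic is usually compared against a chi-square distribution with one degree of freedom; a large value suggests that one of the assumptions listed above (random mating, negligible selection, migration or mutation, large population) does not hold.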

1,000 Inspiring Black Scientists in America

Biology Prof. Jill Bargonetti and dept. alumnus Eric Jarvis included in list.

Hunter College Chosen as Capstone College

Hunter College has been designated as one of 11 Capstone Colleges in the United States by the Howard Hughes Medical Institute as a result of long-term funding to PI Shirley Raps.

Bratu Lab Images Featured In Cell

Images from Diana Bratu's lab, taken by Dr. Irina Catrina, are featured in Cell's Journal Picture Show, called Reproduction.

Program Learning Outcomes For Biology Students

  1. Recognize, critique, design and carry out experiments according to the scientific method
  2. Synthesize and integrate abstract and practical concepts to address biological problems
  3. Discuss mechanisms of life at the organismal level, at the cellular level and molecular/genetic level
  4. Perform quantitative analyses in Biology

Biology Graduate (M.A.) and B.A./M.A. in Biotechnology

  1. Summarize and Articulate Advanced Research and Theoretical Concepts
  2. Use and Interpret Experimental Design from Current Literature in Molecular, Cellular and Developmental Biology
  3. Interpret Experimental Data Independently

Bio. Department News

New Doctoral Students Fall 2020

The New Biology Department Ph.D. Students Fall 2020

New Doctoral Students Fall 2019

The New Biology Department Ph.D. Students Fall 2019

Melendez Lab Work Highlighted in DOD Booklet & Published in Scientific Reports

Prof. Carmen Melendez's work was recently highlighted in a booklet of the Department of Defense Multiple Sclerosis Research Program, and published in Scientific Reports.

New Doctoral Students Fall 2018

The New Biology Department Ph.D. Students Fall 2018

Hunter's Quantitative Biology Initiative

Hunter College is one of only nine institutions of higher education in the country to offer an interdisciplinary program in quantitative biology. Students majoring in Biology, Chemistry, Computer Science, Mathematics or Statistics can add a quantitative biology concentration to their major. Among the many benefits of this innovative program are access to competitive scholarships, small classes, training by a multidisciplinary team of research scientists and dedicated academics, individual mentoring, the opportunity to participate in research conducted at Hunter and nationally, topnotch preparation for graduate studies and for scientific careers in this new frontier.

Financial support available for:

Summer Internship, Graduate School, Post Bac. Experiences, Medical School, M.D./Ph.D. Programs, Postdoctoral Fellowships. Are you interested in a summer internship or perhaps seeking support for graduate school, medical school, an M.D./Ph.D. degree program or a postdoctoral research experience?

Skirball Science Learning Center

The Skirball Science Learning Center (SSLC) provides comprehensive assistance to all Hunter College students in all areas of the natural sciences and technology.
Location: 7th Floor, East Building

Phone (Desk): 212-396-6458

Director: Christina Medina Ramirez (212-650-3283)

Our Department

Hunter College is located at the intersection of 68th Street and Lexington Avenue on Manhattan's Upper East Side. The Biology department occupies the 8th and 9th floors of the Hunter North building. The mission of the Department of Biological Sciences parallels that of Hunter College: to provide a quality education for our undergraduate and graduate students, enabling them to participate productively in their chosen pursuits.

GENTalks: Introduction

GENTalks were designed to be a series of short recorded presentations on novel, exciting and futuristic topics in genomics, that might capture the imagination of students and scientists. These presentations were recorded by students and staff in the studio of the Department of Film and Media Studies at Hunter College. Although the COVID pandemic delayed the editing and timely release of these presentations, we are happy to begin providing this material for our community. We hope you find them stimulating and informative, and we welcome your feedback in writing.

Harvey Lodish: YouTube

The Howard Hughes Medical Institute Undergraduate Science Education Program

CUNY Hunter College has been designated as one of 11 Capstone Colleges in the United States by the Howard Hughes Medical Institute.
This is the result of long-term funding to PI Professor Shirley Raps from Sept. 1, 1993 to Aug. 31, 2017, resulting in “mature and successful” programs at the college. This includes recruiting a faculty member to create a bioinformatics program, supporting research by junior faculty in biology, curriculum changes and innovations in biology for undergraduates, increasing support for undergraduate research, enhancing science education for teachers and their students at public high schools and middle schools, collaboration with the Manhattan/Hunter Science High School to provide research opportunities for their students, and developing a Science Policy track in the Public Policy Program at Roosevelt House for undergraduates. Hunter is the only public urban college to receive this designation. The other colleges are Barnard, Bryn Mawr, Carleton, Grinnell, Hope, Morehouse, Smith, Spelman, Swarthmore, and Xavier. Further information can be found on the Biology HHMI website.

MARC Program at Hunter

The MARC Programs at Hunter College are supported by the National Institutes of Health (NIH) and are intended to encourage talented undergraduate minority students to pursue a career in research and science. Students in both programs receive a scholarship and financial support for conducting research throughout the academic year in a Hunter College laboratory.

Bio Department Advising Schedule

Students: To talk with a faculty advisor check out the department Spring Advising Schedule

Minority Graduate Student Network

MGSN is a student-run network of graduate students, medical students, and postdocs that aims to retain and increase the number of underrepresented minority students pursuing advanced degrees in STEM and medical fields.

If you’re interested in boosting MGSN activities at your school, helping plan events, or even starting a chapter, we’d love to hear from you. Email us at [email protected] to join our listserv and come to our meetings, networking events, and socials. MGSN provides the following to its members:

  • Academic/research support
  • Mentoring opportunities with established scientists
  • Career and personal development workshops
  • Mixers to foster networking and collaborations
  • Community outreach projects

Biology Club News

Interested in Pursuing a Graduate Degree??

The Biology Club Presents

Q&A Event on the Graduate School Process?

Hunter RISE Program

The Minority Biomedical Research Support (MBRS) program was initiated at Hunter College in 1981 and in 2000 was changed to the Research Initiative for Scientific Enhancement (RISE) program. RISE provides underrepresented students majoring in biology, biochemistry, psychology and physics opportunities to complete research training. Students participating in the program are provided with financial, research and professional support to prepare them for Ph.D. programs in biomedical sciences. The students are involved in research projects with faculty members in the biomedical research field on a year-round basis. Since its inception, the RISE program has produced approximately 75 Ph.D.s. Currently, the program provides financial and professional development support for up to 15 undergraduates and 14 Ph.D. students.

Scholarship Opportunity

SciMON (Science Mathematics Opportunities Network) is an innovative institutional initiative designed to enhance the extraordinary research and mentoring programs available to students who study science and mathematics at Hunter College.

Important Events in NHGRI history

1988 — Program advisory committee on the human genome is established to advise NIH on all aspects of research in the area of genomic analysis.

1988 — The Office for Human Genome Research is created within the NIH Office of the Director. Also, NIH and the Department of Energy (DOE) sign a memorandum of understanding, outlining plans for cooperation on genome research.

1988 — NIH Director James Wyngaarden, M.D., assembles scientists, administrators, and science policy experts in Reston, Virginia, to lay out an NIH plan for the Human Genome Project.

1989 — The program advisory committee on the human genome holds its first meeting in Bethesda, Maryland.

1989 — The NIH-DOE Ethical, Legal and Social Implications (ELSI) working group is created to explore and propose options for the development of the ELSI component of the Human Genome Project.

1989 — The National Center for Human Genome Research (NCHGR) is established to carry out the NIH's component of the Human Genome Project. James Watson, Ph.D., co-discoverer of the structure of DNA, is appointed as NCHGR’s first director.

1990 — The first five-year plan with specific goals for the Human Genome Project is published.

1990 — The National Advisory Council for Human Genome Research (NACHGR) is established.

1990 — The genome research review committee is created so the center can conduct appropriate peer review of human genome grant applications.

1990 — The Human Genome Project officially begins.

1991 — NACHGR meets for the first time in Bethesda, Maryland.

1992 — James Watson resigns as first director of NCHGR. Michael Gottesman, M.D., is appointed acting center director.

1993 — The center's Division of Intramural Research is established.

1993 — Francis S. Collins, M.D., Ph.D., is appointed NCHGR director.

1993 — The Human Genome Project revises its five-year goals and extends them to September 1998.

1994 — The first genetic linkage map of the human genome is achieved one year ahead of schedule. Such maps consist of DNA patterns, called markers, positioned on chromosomes, and help researchers search for disease-related genes.

1995 — Task Force on Genetic Testing is established as a subgroup of the NIH-DOE Ethical, Legal, and Social Implications (ELSI) working group.

1996 — Human DNA sequencing begins with pilot studies at six U.S. universities.

1996 — An international team completes the DNA sequence of the first eukaryotic genome, Saccharomyces cerevisiae, or common brewer's yeast. (A eukaryote is any organism whose cells contain a nucleus and other organelles enclosed within membranes.)

1996 — The Center for Inherited Disease Research, a project co-funded by eight NIH institutes and centers to study the genetic components of complex disorders, is established on the Johns Hopkins Bayview Medical Center campus in Baltimore, Maryland.

1996 — Scientists from government, university, and commercial laboratories around the world reveal a map that pinpoints the locations of more than 16,000 genes in human DNA.

1996 — NCHGR and other researchers identify the location of the first gene associated with Parkinson's disease.

1996 — NCHGR and other researchers identify the location of the first major gene that predisposes men to prostate cancer.

1996 — The Joint NIH-DOE Committee issues an evaluation of the ELSI program of the Human Genome Project.

1997 — Department of Health and Human Services Secretary Donna E. Shalala signs documents elevating NCHGR to an NIH institute, the National Human Genome Research Institute.

1997 — A federal government-citizen group – the NIH-DOE ELSI Working Group and the National Action Plan on Breast Cancer (NAPBC) – suggests policies to limit genetic discrimination in the workplace.

1997 — NHGRI and other scientists show that three specific alterations in the breast cancer genes BRCA1 and BRCA2 are associated with an increased risk of breast, ovarian and prostate cancers.

1997 — A map of human chromosome 7 is completed. Changes in the number or structure of chromosome 7 occur frequently in human cancers.

1997 — NHGRI and other researchers identify an altered gene that causes Pendred syndrome, a genetic disorder that causes early hearing loss in children.

1998 — Vice President Al Gore announces that the Clinton administration is calling for legislation to bar employers from discriminating against workers in hiring or promotion because of their genetic makeup.

1998 — At a meeting of the Human Genome Project’s main advisory body, project planners present a new five-year plan to produce a “finished” version of the DNA sequence of the human genome by the end of year 2003, two years ahead of its original schedule. The Human Genome Project plans to generate a “working draft” that, together with the finished sequence, will cover at least 90 percent of the genome in 2001. The “working draft” will be immediately valuable to researchers and form the basis for a high-quality, “finished” genome sequence.

1998 — A major international collaborative research study finds the site of a gene for susceptibility to prostate cancer on the X chromosome. This is the first time a gene for a common type of cancer is mapped to the X chromosome.

1998 — NHGRI and other Human Genome Project-funded scientists sequence the genome of the tiny roundworm Caenorhabditis elegans. It marks the first time scientists have spelled out the instructions for a complete animal that, like humans, has a nervous system, digests food and has sex.

1999 — The pilot phase of the Human Genome Project is completed. A large-scale effort to sequence the human genome begins.

1999 — NHGRI, DOE, and the Wellcome Trust, a global charity based in London, hold a celebration of the completion and deposition of 1 billion base pairs of the human genome DNA sequence into GenBank. (GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.)

1999 — For the first time, NHGRI and other Human Genome Project-funded scientists unravel the genetic code of an entire human chromosome (chromosome 22). The findings are reported in Nature.

2000 — President Clinton signs an Executive Order to prevent genetic discrimination in the federal workplace. NHGRI programs on the ethical, legal and social implications of the Human Genome Project played a role in the development of policy principles on this issue.

2000 — A public consortium of scientists and a private company release a substantially complete genome sequence of the fruit fly, Drosophila melanogaster. Science publishes the findings.

2000 — Scientists in Japan and Germany report that they have unraveled the genetic code of human chromosome 21, known to be involved with Down syndrome, Alzheimer's disease, Usher syndrome, and amyotrophic lateral sclerosis, also known as Lou Gehrig's disease. Nature publishes these findings.

2000 — President Bill Clinton, NHGRI Director Francis Collins, British Prime Minister Tony Blair (via satellite), and Craig Venter, president, Celera Genomics Corp., announce the completion of the first survey of the human genome in a White House ceremony.

2000 — An international team led by NHGRI scientists discovers a genetic “signature” that may help explain how malignant melanoma, a deadly form of skin cancer, can spread to other parts of the body. The findings are reported in Nature.

2000 — The NIH, the Wellcome Trust, and three private companies collaborate to form the Mouse Sequencing Consortium to accelerate the sequencing of the mouse genome.

2000 — The Human Genome Project is the recipient of the American Society of Human Genetics' Allan Award to honor the hundreds of scientists involved in deciphering the human genetic code.

2001 — The ELSI Research Programs of NHGRI and DOE cosponsor a conference to celebrate a decade of research and consider the impact of the new science on genetic research, health and policy.

2001 — The Human Genome Project publishes the first analysis of the human genome sequence, describing how it is organized and how it evolved. The analysis, published in the journal Nature, reveals that the human genome only contains 30,000 to 40,000 genes, far fewer than the 100,000 previously estimated.

2001 — NHGRI scientists use DNA microarray technology to develop a gene test that differentiates hereditary and sporadic breast cancer types. The New England Journal of Medicine publishes the findings. (DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface that scientists use to measure the expression levels of large numbers of genes simultaneously.)

2001 — NHGRI and Human Genome Project-funded scientists find a new tumor suppressor gene on human chromosome 7 that is involved in breast, prostate and other cancers. A single post-doctoral researcher, using the “working draft” data, pins down the gene in weeks. In the past, the same work would have taken several years and contributions from many scientists.

2001 — The Mouse Genome Sequencing Consortium announces it has achieved three-fold coverage of the mouse DNA sequence. The publicly available data represents 95 percent of the mouse sequence, and can be used to uncover human genes by comparing the genomes of mouse and human to each other.

2001 — Researchers from NHGRI and Sweden's Lund University develop a method of accurately diagnosing four complex, hard-to-distinguish childhood cancers using DNA microarray technology and artificial neural networks. Nature Medicine publishes the results.

2001 — NHGRI creates the Centers for Excellence in Genomic Sciences (CEGS) program, which supports interdisciplinary research teams that use data sets and technologies developed by the Human Genome Project. The initial CEGS grants for innovative genomic research projects are awarded to the University of Washington and Yale University.

2001 — To inform the public, students, and healthcare providers in minority communities about the scientific advances and the ethical, legal, and social impact of the Human Genome Project, NHGRI co-sponsors a forum, entitled The Human Genome Project: The Challenges and Impact of Human Genome Research for Minority Communities.

2001 — NHGRI holds a planning conference called Beyond the Beginning: The Future of Genomics, at the Airlie Conference Center in Warrenton, Virginia. Attendees help develop a broad vision for the future of genomics research as the achievement of the Human Genome Project goals approaches.

2002 — NHGRI scientists and collaborators at Johns Hopkins Medical Institution in Baltimore and The Cleveland Clinic identify a gene on chromosome 1 that is associated with an inherited form of prostate cancer in some families. Nature Genetics publishes the findings.

2002 — NHGRI and the NIH Office of Rare Diseases launch a new information center – the Genetic and Rare Diseases Information Center (GARD) — to provide accurate, reliable information about genetic and rare diseases to patients and their families.

2002 — NHGRI chooses the next set of model organisms to sequence as DNA sequencing capacity becomes available. They include the chicken, chimpanzee, several species of fungi, a sea urchin, the honeybee and Tetrahymena, a single-celled organism commonly used in laboratory studies.

2002 — NHGRI launches a redesigned Web site, which provides improved usability and easy access to new content for a wide range of users.

2002 — An international team of researchers led by NHGRI pinpoints the gene defect responsible for a form of the devastating brain disorder microcephaly, found in nine generations of infants among the Old Order Amish. Nature Genetics publishes the results, which may shed new light on normal brain development.

2002 — NHGRI publishes “A User's Guide to the Human Genome” in Nature Genetics. The “how-to” manual is designed to encourage scientists to explore the human genome sequence available in public databases.

2002 — NHGRI, in cooperation with five other NIH institutes, awards a grant to combine three of the world's current protein databases into a single global resource called UniProt.

2002 — NHGRI launches the International HapMap Project, a $100 million, public-private effort to create a new type of genome map that will chart genetic variation among human populations. The HapMap serves as a tool to speed the search for the genes involved in common disorders such as asthma, diabetes, heart disease and cancer. The SNP Consortium, a collaborative effort among industry, academic centers and the Wellcome Trust, helps provide an instrumental public catalog of genetic variation.

2002 — NHGRI names Alan E. Guttmacher, M.D., as its new deputy director. It selects Eric D. Green, M.D., Ph.D., as its new scientific director, and William A. Gahl, M.D., Ph.D., as its new intramural clinical director.

2003 — NHGRI launches the ENCyclopedia of DNA Elements (ENCODE) pilot project to identify all functional elements in human DNA.

2003 — NHGRI celebrates the successful completion of the Human Genome Project — two years ahead of schedule and under budget. The event coincides with the 50th anniversary of the description of DNA’s double helix and the 2003 publication of the vision document for the future of genomics research.

2003 — NHGRI publishes a new strategic plan for the “genomics era” in Nature, titled “A Vision for the Future of Genomics Research,” the culmination of two years of planning with the research community.

2003 — NHGRI researchers identify the gene that causes the premature aging disorder progeria. Nature publishes the findings.

2003 — NHGRI researchers make discoveries in mice that may lead to safer methods of gene therapy. They show that a genetically engineered virus used in gene therapy trials tends to insert itself at the beginning of genes in the target cell, potentially disrupting gene function.

2003 — A detailed analysis of the sequence of the human Y chromosome is published in the journal Nature.

2003 — A detailed analysis of the sequence of chromosome 7 uncovers structural features that appear to promote genetic changes that can cause disease. The findings by a multinational team of scientists are reported in the journal Nature.

2003 — A team of researchers, led by NHGRI, compares the genomes of 13 vertebrate animals. The results, published in Nature, suggest that comparing a wide variety of species' genomes will illuminate genomic evolution and help identify functional elements in the human genome.

2003 — NHGRI establishes the Education and Community Involvement Branch to engage the public in understanding genomics and accompanying ethical, legal and social issues.

2003 — NHGRI announces the first grants in a three-year, $36 million scientific program called ENCyclopedia Of DNA Elements (ENCODE), aimed at discovering all parts of the human genome that are crucial to biological function.

2003 — NHGRI selects five centers to carry out a new generation of large-scale genome sequencing projects to realize the promise of the Human Genome Project and expand understanding of human health and disease.

2003 — NHGRI announces the first draft version of the chimpanzee genome sequence and its alignment with the human genome.

2003 — The International HapMap Consortium publishes a paper that sets forth the scientific rationale and strategy behind its effort to create a map of human genetic variation.

2004 — NHGRI announces that the first draft version of the honey bee genome sequence has been deposited into free public databases.

2004 — NHGRI and other scientists successfully create transgenic zebra fish using sperm genetically modified and grown in a laboratory dish. This achievement has implications for wide ranging research, from developmental biology to gene therapy. The study is published in the Proceedings of the National Academy of Sciences.

2004 — The Genetic and Rare Disease Information Center announces efforts to enable healthcare workers, patients and families who speak Spanish to take advantage of its free services.

2004 — NHGRI's Large-Scale Sequencing Research Network announces it will begin genome sequencing of the first marsupial, the gray short-tailed South American opossum, and more than a dozen other model organisms to further understanding of the human genome.

2004 — NHGRI announces that the first draft version of the chicken genome sequence has been deposited into free public databases.

2004 — NHGRI researchers and other scientists find variants in a gene that may predispose people to type 2 diabetes, the most common form of the disease.

2004 — NHGRI announces that the International Sequencing Consortium has launched a free online resource, where scientists and the public can view the latest information on sequencing projects for animal, plant and eukaryotic genomes.

2004 — The International Rat Genome Sequencing Project Consortium announces the publication of a high-quality draft sequence of the rat genome. The publication is important because of the rat’s ubiquitous use as a disease research model.

2004 — NHGRI and the Melbourne-based Australian Genome Research Facility, Ltd., announce a partnership to sequence the genome of the tammar wallaby, a member of the kangaroo family.

2004 — NHGRI announces that the first draft version of the dog genome sequence has been deposited into free public databases.

2004 — NHGRI launches the NHGRI Policy and Legislative Database, an online resource to enable researchers, health professionals, and the public to locate information on laws and policies related to genetic discrimination and other genomic issues.

2004 — NHGRI scientists and an interdisciplinary consortium of researchers from 11 universities and institutions discover a possible inherited component for lung cancer, a disease normally associated with external causes, such as cigarette smoking.

2004 — NHGRI's Large-Scale Sequencing Research Network announces a comprehensive strategic plan to sequence 18 additional organisms, including the African savannah elephant, the domestic cat, and the orangutan to help interpret the human genome.

2004 — NHGRI launches four interdisciplinary Centers for Excellence in Ethical, Legal and Social Implications Research to address some of the most pressing societal questions raised by recent advances in genetic and genomic research.

2004 — NHGRI announces that the first draft version of the cow genome sequence has been deposited into free public databases.

2004 — NHGRI awards more than $38 million in grants to develop new genome sequencing technologies to accomplish the near-term goal of sequencing a mammalian-sized genome for $100,000, and the longer-term challenge of sequencing an individual human genome for $1,000 or less. These are the first grants from the Advanced Sequencing Technology Program.

2004 — The International Human Genome Sequencing Consortium, led in the United States by NHGRI and the Department of Energy, publishes its scientific description of the finished human genome sequence. The analysis, published in Nature, reduces the estimated number of human protein-coding genes from 35,000 to only 20,000-25,000, a surprisingly low number for our species.

2004 — The ENCODE Consortium publishes a paper in Science that sets forth the scientific rationale and strategy behind its quest to produce a comprehensive catalog of all parts of the human genome crucial to biological function.

2004 — NHGRI partners with the Office of the U.S. Surgeon General to launch a free computer program, My Family Health Portrait, which the public can use to record important information about their family health history and share with their health care providers.

2004 — NHGRI and the International Chicken Genome Sequencing Consortium publish in Nature an analysis comparing the chicken and human genomes. It is the first bird to have its genome sequenced and analyzed.

2005 — NIH hails the first comprehensive analysis of the sequence of the human X chromosome. The work, some of which was carried out as part of the Human Genome Project, is published in Nature. It provides sweeping new insights into the evolution of sex chromosomes and the biological differences between males and females.

2005 — The first comprehensive comparison of the genetic blueprints of humans and chimpanzees is published in the journal Nature, showing that we share 96 percent of our DNA with our closest living relatives.

2005 — NIH awards contracts that will give researchers unprecedented access to two private collections of knockout mice, providing valuable models for the study of human disease and laying the groundwork for a public, genome-wide library of knockout mice.

2005 — The International HapMap Consortium publishes a comprehensive catalog of human genetic variation. This landmark achievement, published in Nature, will serve to accelerate the search for genes involved in common diseases, such as asthma, diabetes, cancer, and heart disease.

2005 — NHGRI and the National Cancer Institute (NCI) launch The Cancer Genome Atlas (TCGA), a comprehensive effort to accelerate understanding of the molecular basis of cancer through the application of genome analysis technologies.

2006 — The Genetic Association Information Network (GAIN), a public-private partnership led by NHGRI, is established to help find the genetic causes of common diseases by conducting large-scale genomic studies and making their results broadly available to researchers worldwide.

2006 — NIH launches the Genes, Environment and Health Initiative (GEI) to understand the interactions of genetics and environment in common conditions and disease. It is managed by NHGRI and the National Institute of Environmental Health Sciences.

2006 — Researchers at the NIH Chemical Genomics Center – a trans-NIH center administered by NHGRI – develop a new screening approach that can profile compounds in large chemical libraries more accurately and precisely than standard methods. This advance speeds the production of data that can be used to identify molecular leads for drug discovery.

2006 — NHGRI awards grants totaling more than $13 million to further speed the development of innovative sequencing technologies that reduce the cost of DNA sequencing and expand the use of genomics in medical research and health care.

2007 — In the most comprehensive look at genetic risk factors for type 2 diabetes to date, NHGRI researchers, working in close collaboration with two other scientists, identify at least four new genetic variants associated with increased risk of diabetes and confirm existence of another six. All three reports are published in Science.

2007 — NHGRI establishes the Genomic Healthcare Branch to promote the effective integration of genomic discoveries into healthcare.

2007 — NHGRI establishes the Office of Population Genomics to promote multidisciplinary research in epidemiology and genomics.

2007 — The Electronic Medical Records and Genomics (eMERGE) Network is announced in September 2007. Researchers use DNA biorepositories and electronic medical records in large-scale studies to better understand the underlying genomics of disease.

2007 — NHGRI awards grants totaling more than $80 million over four years to expand the ENCODE project, which, in its pilot phase, yielded provocative new insights into the organization and function of the human genome.

2007 — An international team of scientists, supported in part by NHGRI, announces that its systematic effort to map the genomic changes underlying lung cancer has uncovered a critical gene alteration not previously linked to any form of cancer. The results are published in Nature.

2007 — In a White House Ceremony, NHGRI Director Francis S. Collins is awarded the Presidential Medal of Freedom by President George W. Bush for his leadership of and contributions to the Human Genome Project.

2007 — To better understand the role that bacteria, fungi, and other microbes play in human health, NIH launches the Human Microbiome Project. The human microbiome is all microorganisms present in or on the human body. NHGRI, the National Institute of Allergy and Infectious Diseases, and the National Institute of Dental and Craniofacial Research lead the project on behalf of NIH.

2008 — The NIH Genome-Wide Association Studies (GWAS) data sharing policy goes into effect to promote access to genomics research data while ensuring research participant protections.

2008 — An international research consortium announces the establishment of the 1000 Genomes Project. This effort will involve sequencing the genomes of at least 1000 people from around the world to create the most detailed and medically useful picture to date of human genetic variation. NHGRI is a major funder of the 1000 Genomes Project.

2008 — NHGRI and the National Institute of Environmental Health Sciences collaborate with the U.S. Environmental Protection Agency to begin testing the safety of chemicals, ranging from pesticides to household cleaners. The initiative uses the NIH Chemical Genomics Center's high-speed, automated screening robots to test suspected toxic compounds using cells and isolated molecular targets instead of laboratory animals.

2008 — NIH announces the establishment of the NIH Intramural Center for Research on Genomics and Global Health (CRGGH), a new venue for research about the way populations are impacted by diseases such as obesity, diabetes and hypertension. CRGGH is part of the NIH Office of Intramural Research and administered by NHGRI.

2008 — The first analysis of the genome sequence of the duck-billed platypus reveals clues about how genomes were organized during the early evolution of mammals. The research, published in Nature, was supported in part by NHGRI.

2008 — President George W. Bush signs into law the Genetic Information Nondiscrimination Act (GINA) that will protect Americans against discrimination based on their genetic information when it comes to health insurance and employment. The bill passed the Senate unanimously and the House by a vote of 414 to 1.

2008 — Francis S. Collins steps down as NHGRI director. Alan E. Guttmacher is named acting director of NHGRI.

2008 — NIH funds a network of nine centers across the country that will use high tech screening methods to identify small molecules for use as biological probes and targets for drug development. The NIH Chemical Genomics Center, administered by NHGRI, is funded as part of the network.

2008 — The TCGA Research Network reports the first results of its large-scale, comprehensive study of the most common form of brain cancer, glioblastoma. In a paper published in Nature, the TCGA team describes the discovery of new genetic mutations and other types of DNA alterations with potential implications for the diagnosis and treatment of glioblastoma.

2008 — NHGRI researchers help to identify a protein that plays matchmaker between two key types of white blood cells, T and B cells, enabling them to interact in a way that is crucial to establishing long-lasting immunity after an infection. The results are published in Nature.

2008 — The NIH Human Microbiome Project, collaborating with scientists around the globe, announces they will form the International Human Microbiome Consortium, an effort that will enable researchers to characterize the relationship of the human microbiome in the maintenance of health and in disease.

2008 — A multi-institution team, funded by NHGRI, reports results in Nature of the largest effort to date to chart the genetic changes involved in the most common form of lung cancer, lung adenocarcinoma.

2008 — An international consortium including NHGRI researchers, in search of the genetic risk factors for obesity, identifies six new genetic variants associated with BMI, or body mass index, a measurement that compares height to weight. The results, funded in part by NIH, are published online in the journal Nature Genetics.

2009 — Researchers from NIH and NHGRI find a new way of detecting functional regions in the human genome. The novel approach involves looking at the three-dimensional shape of the genome's DNA and not just reading the sequence of the four-letter alphabet of its DNA bases. The results are published online in Science.

2009 — A team led by NHGRI scientists identifies a gene that suppresses tumor growth in melanoma, the deadliest form of skin cancer. The finding is reported in the journal Nature Genetics as part of a systematic genetic analysis of a group of enzymes implicated in skin cancer and many other types of cancer.

2009 — NHGRI announces the release of the first version of PhenX, a free online toolkit aimed at standardizing measurements of research subjects' physical characteristics and environmental exposures. The tools give researchers more power to compare data from multiple studies, accelerating efforts to understand the complex genetic and environmental factors that cause cancer, heart disease, depression and other common diseases.

2009 — The U.S. Department of Agriculture and NIH announce that an international consortium of researchers has completed an analysis of the genome of domestic cattle, the first livestock mammal to have its genetic blueprint sequenced and analyzed. The landmark research, which received major support from NHGRI, bolsters efforts to produce better beef and dairy products and will lead to a better understanding of the human genome.

2009 — NIH launches the first integrated drug development pipeline to produce new treatments for rare and neglected diseases. The $24 million program, whose laboratory operations are managed by NHGRI at the NIH Chemical Genomics Center, jumpstarts a trans-NIH initiative called the Therapeutics for Rare and Neglected Diseases program.

2009 — NHGRI researchers studying the skin's microbiome publish an analysis in Science revealing that our skin is home to a much wider array of bacteria than previously thought. The study, done in collaboration with other NIH researchers, also shows the bacteria that live under your arms are likely to be more similar to those under another person's arm than they are to the bacteria that live on your forearm.

2009 — An NIH research team led by NHGRI researchers finds that a single evolutionary event appears to explain the short, curved legs that characterize all of today's dachshunds, corgis, basset hounds and at least 16 other breeds of dogs. The unexpected discovery provides new clues about how physical differences may arise within species and suggests new approaches to understanding a form of human dwarfism. The results are reported in Science.

2009 — NIH researchers report in the online issue of PLoS Genetics the discovery of five genetic variants related to blood pressure in African Americans, findings that may provide new clues to treating and preventing hypertension. This effort, which includes NHGRI researchers, marks the first time that a relatively new research approach, called a genome-wide association study, has focused on blood pressure and hypertension in an African-American population.

2009 — Researchers, supported in part by NHGRI, generate massive amounts of DNA sequencing data of the complete set of exons, or “exomes,” from the genomes of 12 people. The findings, which demonstrate the feasibility of this strategy to find rare genetic variants that may cause or contribute to disease, are published online in Nature.

2009 — NHGRI researchers lead a study that identifies a new group of genetic mutations involved in melanoma, the deadliest form of skin cancer. This discovery, published in Nature Genetics, is particularly encouraging because some of the mutations, which were found in nearly one-fifth of melanoma cases, reside in a gene already targeted by a drug approved for certain types of breast cancer.

2009 — NHGRI launches the next generation of its online Talking Glossary of Genetic Terms. The glossary contains several new features, including more than 100 colorful illustrations and more than two dozen 3-D animations that allow the user to dive in and see genetic concepts in action at the cellular level.

2009 — An NHGRI-led research team finds that carriers of a rare, genetic condition called Gaucher disease face a risk of developing Parkinson's disease more than five times greater than the general public. The findings are published in the New England Journal of Medicine.

2009 — NIH director Francis S. Collins, M.D., Ph.D., announces the appointment of Eric D. Green, M.D., Ph.D., to be director of NHGRI. It is the first time an institute director has risen to lead the entire NIH and subsequently picked his own successor.

2010 — NHGRI launches the Genetics/Genomics Competency Center (G2C2), an online tool to help educators teach the next generation of health professionals about genetics and genomics.

2010 — An international research team, including researchers from NHGRI, produces the first whole genome sequence of the 3 billion letters in the Neanderthal genome.

2010 — NIH and the Wellcome Trust, a global charity based in London, announce a partnership called the Human Heredity and Health in Africa project (H3Africa) to support population-based genetic studies in Africa by African researchers. NHGRI helps administer H3Africa.

2010 — Daniel L. Kastner, M.D., Ph.D., is appointed scientific director of the NHGRI.

2010 — NIH announces awards to support the Genotype-Tissue Expression (GTEx) project, an initiative to understand how genetic variation may control gene activity and its relationship to disease. GTEx is managed in part by NHGRI.

2011 — NHGRI's new strategic plan for the future of human genome research, Charting a course for genomic medicine from base pairs to bedside, is published in the February 10, 2011, issue of Nature.

2011 — A research team from the NIH Undiagnosed Diseases Program, which is co-led by NHGRI, reports in the New England Journal of Medicine the first genetic finding of a rare, adult-onset vascular disorder associated with progressive and painful arterial calcification.

2011 — The Partnership for Public Service selects NHGRI Clinical Director William A. Gahl, M.D., Ph.D., to receive its Science and Environmental Medal (one of nine annual Service to America Awards, or Sammies).

2011 — P. Paul Liu, M.D., Ph.D., a world expert in the onset, development and progression of leukemia, is named NHGRI's deputy scientific director.

2011 — Mark S. Guyer, Ph.D., is named NHGRI deputy director.

2011 — NHGRI announces funding for its five Clinical Sequencing Exploratory Research projects aimed at studying ways that healthcare professionals can use genome sequencing information in the clinic.

2012 — For the first time, researchers in the NIH Human Microbiome Project (HMP) Consortium, including NHGRI investigators, map the normal microbial make-up of healthy humans. They report their findings in a series of coordinated papers in Nature and other journals.

2012 — ENCODE researchers produce a more dynamic picture of the human genome that gives the first holistic view of how the human genome actually does its job. The findings are reported in two papers appearing in Nature.

2012 — NHGRI reorganizes the institute's Extramural Research Program into four new divisions and promotes to division status the office overseeing policy, communications, and education, and the office overseeing administration and management. The divisions and their inaugural directors are: Division of Genome Sciences, Jeffery Schloss, Ph.D.; Division of Genomic Medicine, Teri Manolio, M.D., Ph.D.; Division of Extramural Operations, Bettie Graham, Ph.D.; Division of Genomics and Society, Mark Guyer, Ph.D. (acting director); Division of Policy, Communications, and Education, Laura Lyman Rodriguez, Ph.D.; and Division of Management, Janis Mullaney, M.B.A.

2013 — A special symposium, The Genomics Landscape: A Decade After the Human Genome Project, marks the 10th anniversary of the completion of the Human Genome Project.

2013 — The Smithsonian Institution in Washington, D.C. opens a high-tech, high-intensity exhibition Genome: Unlocking Life's Code to celebrate the 10th anniversary of researchers producing the first complete human genome sequence. The exhibition is a collaboration between the Smithsonian Institution’s National Museum of Natural History and NHGRI. The exhibition will travel across North America following its time at the Smithsonian.

2013 — NIH awards the initial four grants for NHGRI’s Implementing Genomics in Practice (IGNITE) focused on developing new approaches to incorporating genomic information into patient care.

2013 — In a long-running legal case over a patent held by Myriad Genetics on a gene linked to breast cancer, the U.S. Supreme Court rules that isolated but otherwise unmodified DNA cannot be the subject of a patent.

2013 — NHGRI and the Eunice Kennedy Shriver National Institute of Child Health and Human Development announce awards for pilot projects to explore the use of genomic sequencing in newborn healthcare.

2013 — A team of scientists from NHGRI and the NIH Clinical Center receives a Service to America Medal for their efforts to protect patients from infections with drug-resistant bacteria.

2013 — NHGRI selects Lawrence C. Brody, Ph.D., to be the first director of the Division of Genomics and Society, established through the October 2012 reorganization.

2014 — NHGRI celebrates the 10th anniversary of the Social and Behavioral Research Branch, which it launched as a branch of the Division of Intramural Research in December 2003.

2014 — NHGRI Scientific Director Daniel Kastner, M.D., Ph.D., implements a reorganization of NHGRI's 45 intramural investigators and associated research programs into nine branches.

2014 — NHGRI Deputy Director Mark Guyer, who played a critical role in the Human Genome Project and countless other genomics programs, retires from federal service.

2014 — NIH issues the NIH Genomic Data Sharing policy to promote data sharing as a way to speed the translation of data into knowledge, products and procedures that improve health while protecting the privacy of research participants. The final policy will be effective for all NIH-supported research beginning in January 2015.

2014 — The first Clinical Center Genomics Opportunity awards of exome data go to 10 intramural investigators for research at the NIH Clinical Center.

2014 — NIH announces the two-site DNA Sequencing Core Undiagnosed Diseases Network, awarded to Baylor College of Medicine, Houston, and the Medical College of Wisconsin, Milwaukee.

2014 — Scientists looking across human, fly, and worm genomes find that these species have shared biology. The findings, appearing in the journal Nature, offer insights into embryonic development, gene regulation and other biological processes vital to understanding human biology and disease.

2014 — An international team including researchers from NIH completes the first comprehensive characterization of genomic diversity across sub-Saharan Africa. The study provides clues to medical conditions in people of sub-Saharan African ancestry, and indicates that the migration from Africa in the early days of the human race was followed by a migration back into the continent.

2014 — Investigators with The Cancer Genome Atlas (TCGA) Research Network identify new potential therapeutic targets for a major form of bladder cancer.

2014 — Ellen Rolfes, M.A., is appointed the NHGRI executive officer and director of the NHGRI Division of Management.

2015 — NHGRI celebrates the 25th anniversary of the launch of the Human Genome Project (HGP). To commemorate this anniversary, NHGRI’s History of Genomics Program hosts a seminar series titled, “A Quarter Century after the Human Genome Project: Lessons Beyond Base Pairs,” featuring HGP participants sharing their perspectives about the project and its impact on their careers.

2015 — The Undiagnosed Diseases Network (UDN) opens an online patient application, the UDN Gateway, to streamline the patient application process across its individual clinical sites.

2015 — An international team of scientists from the 1000 Genomes Project Consortium creates the world’s largest catalog of genomic differences among humans, providing researchers with powerful clues to help them establish why some people are susceptible to various diseases.

2015 — NHGRI awards grants of more than $28 million aimed at deciphering the language of how and when genes are turned on and off. The awards emanate from NHGRI’s Genomics of Gene Regulation (GGR) program.

2015 — Shawn Burgess, Ph.D., and colleagues develop transgenic zebrafish as a live animal model of metastasis, offering cancer researchers a new, potentially more accurate way to screen for drugs and to identify new targets against disease.

2015 — Experts from academic and non-profit institutions across the United States join NHGRI and NIH staff at a roundtable meeting to discuss opportunities and challenges associated with the inclusion and engagement of underrepresented populations in genomics research.

2015 — Research funded by NHGRI’s Centers for Excellence in Genome Sciences and published in Nature Genetics provides new insights into the effects and roles of genetic variation and parental influence on gene activity in mice and humans.

2015 — NIH researchers discover the genomic switches of a blood cell are key to regulating the human immune system. The findings, published in Nature, open the door to new research and development in drugs and personalized medicine to help those with autoimmune disorders.

2015 — The Electronic Medical Records and Genomics (eMERGE) Network begins Phase III with nine new investigator sites, two central sequencing and genotyping facilities, and a coordinating center.

2016 — NHGRI launches the Centers for Common Disease Genomics, which will use genome sequencing to explore the genomic contributions to common diseases such as heart disease, diabetes, stroke and autism.

2016 — NHGRI awards approximately $11.1 million to support research aimed at identifying differences, called genetic variants, in the less-studied regions of the genome that are responsible for regulating gene activity.

2016 — NHGRI funds researchers at its Centers of Excellence in Ethical, Legal and Social Implications Research program to examine the use of genomic information in the prevention and treatment of infectious diseases; genomic information privacy; communication about prenatal and newborn genomic testing results; and the impact of genomics in American Indian and Alaskan Native communities.

2016 — NIH scientists identify a genetic mutation responsible for a rare form of inherited hives induced by vibration, also known as vibratory urticaria.

2016 — NHGRI Senior Investigator Dr. Francis Collins and an international team of more than 300 scientists conduct a comprehensive investigation of the underlying genetic architecture of type 2 diabetes. Their findings suggest that most of the genetic risk for type 2 diabetes can be attributed to common shared genomic variants.

2016 — NHGRI researchers collaborate with physicians and medical geneticists around the world to create the Atlas of Human Malformation Syndromes in Diverse Populations.

2016 — The Genomic Healthcare Branch convenes a meeting with 14 family health history tool developers and vendors to highlight their approaches to addressing gaps in most current electronic health records.

2016 — The Policy and Program Analysis Branch holds a public workshop, “Investigational Device Exemptions and Genomics,” to help investigators and institutional review board members learn more about Food and Drug Administration regulations and their application to genomics research.

Burrows-Wheeler Aligner

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, the most recent of the three, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100bp Illumina reads.

    Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]
    Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub. [PMID: 20080505]
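
BWA takes its name from the Burrows-Wheeler transform named in the references above. As a rough illustration of the idea only, and not BWA's actual implementation, the following Python sketch builds the transform of a short sequence by sorting its rotations and then counts exact matches of a pattern with backward search. The function names `bwt` and `count_occurrences` are hypothetical, and the naive construction shown here would be far too slow for a real genome.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (illustration only)."""
    s = text + "$"  # '$' is a sentinel that sorts before A, C, G and T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first_pos[c]: number of characters in the text that sort before c.
    counts = Counter(bwt_text)
    first_pos = {}
    total = 0
    for c in sorted(counts):
        first_pos[c] = total
        total += counts[c]

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i]; a real index would precompute this.
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted suffixes
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first_pos:
            return 0
        lo = first_pos[c] + occ(c, lo)
        hi = first_pos[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # toy reference, assumed for illustration
    index = bwt(reference)
    print(index)
    print(count_occurrences(index, "ATTA"))
```

The search touches the pattern one character at a time and never scans the reference itself, which is the property that makes this family of indexes attractive for mapping many short reads against one large genome.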

Connectivity Map (CMAP)

Creating and analyzing large perturbational datasets to aid our understanding of human disease and to accelerate the discovery of novel therapeutics.

The sequencing of the human genome has led to an explosion of new insights into the genetic basis of disease. A challenge, however, is that while the identity of disease-associated genes may be known, in many cases their function remains obscure.

At the same time, approaches to chemical biology and drug discovery have dramatically expanded. New types of chemical libraries have been generated, powerful screening methods have been developed, and novel classes of therapeutics are entering the clinic. And yet, there has been no method to systematically determine the cellular effects of a given compound – unexpected off-target activities are often discovered only late in the drug development process, resulting in side effects that limit clinical use.

We hypothesized that a potential solution to these problems might be the creation of a comprehensive catalog of cellular signatures representing systematic perturbation with genetic (thus reflecting protein function) and pharmacologic (thus reflecting small-molecule function) perturbagens. Signatures with high similarity might represent useful and previously unrecognized connections (e.g. between two proteins operating in the same pathway, between a small-molecule and its protein target, or between two small-molecules of similar function but structural dissimilarity). Such a catalog of connections could serve as a functional look-up table of the genome. We termed this concept the Connectivity Map (CMap).

To date, CMap has generated a library containing over 1.5M gene expression profiles from 5,000 small-molecule compounds and 3,000 genetic reagents, tested in multiple cell types. To produce data of that scale, we’ve developed L1000, a relatively inexpensive and rapid high-throughput gene expression profiling technology. Expression data are processed through a computational pipeline that converts raw fluorescence intensity into signatures, which can be used to query the CMap database for perturbations that give a related gene expression response.
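
The idea of querying a catalog of signatures for related responses can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the text above does not say how CMap scores similarity, so cosine similarity is used here as a stand-in, and the library entries, gene ordering, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length expression signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero signature carries no direction
    return dot / (norm_a * norm_b)


def query_signatures(query, library, top_n=3):
    """Rank library signatures by similarity to the query, most similar first."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy library: each signature is a vector of expression changes
# over the same four genes, in the same order (values invented for illustration).
library = {
    "compound_A": [2.0, -1.0, 0.5, 0.0],
    "compound_B": [-2.0, 1.0, -0.5, 0.0],
    "knockdown_X": [1.8, -0.9, 0.4, 0.1],
}
query = [2.1, -1.1, 0.6, 0.0]

if __name__ == "__main__":
    for name, score in query_signatures(query, library):
        print(f"{name}: {score:.3f}")
```

In this toy setup, a strongly positive score suggests a perturbation that produces a similar response to the query, and a strongly negative score suggests one that produces the opposite response.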

To house and use these vast amounts of data, we have built a cloud-based compute infrastructure termed CLUE (CMap and LINCS Unified Environment), a suite of user-friendly web applications and software tools that enable researchers to access and manipulate CMap data and integrate it with their own datasets.

We invite biologists and computational scientists to use CMap to further their research.

Funding for our work comes from the NIH LINCS (Library of Integrated Cellular Signatures) project, as well as from philanthropic grants, collaborative projects with industry and Broad Institute funds.

Author information


Department of Biomedical Informatics, Harvard Medical School, Countway Library, 10 Shattuck St, Boston, MA, 02115, USA

Peter Kerpedjiev, Chuck McCallum, Jacob M. Luber, Scott B. Ouellette, Alaleh Azhir, Nikhil Kumar, Soohyun Lee, Burak H. Alver, Peter J. Park & Nils Gehlenborg

Computational and Systems Biology Program, MIT, Cambridge, USA

School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA

Fritz Lekschas, Kasper Dinkla, Hendrik Strobelt, Jeewon Hwang & Hanspeter Pfister

Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA

Department of Physics, MIT, Cambridge, USA

Institute for Medical Engineering and Science, MIT, Cambridge, USA


PK and NG conceived the research. PK, NA, and NG wrote the manuscript with input from LAM, PJP, and BHA. PK, NA, FL, and CM wrote the software with help from KD, HS, JML, SO, AA, NK, JH, and SL. BHA, HP, LAM, and PJP provided valuable input and advice for the project. All authors read and approved the final manuscript.
