Omicron, a variant of concern (VOC), was the predominant SARS-CoV2 variant circulating worldwide during the third wave of the pandemic. The World Health Organisation designated this highly mutated variant as a variant of concern because of its high transmissibility and risk of reinfection. The Omicron variant has spread to most cities and has become the predominant variant driving the 3rd wave in India. Whole-genome sequencing was performed on 192 SARS-CoV2 PCR-positive samples collected between December 2021 and January 10th, 2022, as a part of INSACOG. The 133 Omicron isolates detected were analyzed in the context of 1586 complete Omicron genomes from across India downloaded from the GISAID database.
The prevalence of the Omicron variant in India increased in log phase within 3 months, and the variant has been seen in most of the metropolitan cities. Although the BA.1 sublineage was observed first in the country, the BA.2 sublineage, which was introduced to Delhi in mid-December 2021, became predominant in early January. The two outbreaks observed were both of the BA.2 sublineage and spread to multiple cities within a few days. The rapid spread and the specific mutations in the outbreak samples indicate that Omicron is highly transmissible compared with previous variants. The study shows the importance of genomic sequencing in identifying emerging clusters and taking action to prevent further super-spreading events.
Whole-Genome Sequencing; Omicron; GISAID database; SARS-CoV2 pandemic
Coronaviruses are viruses that infect the nose, sinuses, and upper respiratory tract; most of them do not cause severe effects. After an outbreak in Wuhan, China, in December 2019, the World Health Organisation (WHO) recognized severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) as a novel coronavirus that posed a serious threat to humans. The SARS-CoV2 virus, like other coronaviruses, has been undergoing continuous mutation, resulting in multiple variants of concern (VOC).
On November 26, 2021, the WHO designated the Omicron (B.1.1.529) variant of SARS-CoV-2 a variant of concern after its initial identification in Gauteng Province, South Africa. The Omicron variant's rapid spread sparked the fourth wave of SARS-CoV-2 infections in South Africa, with daily diagnosed cases exceeding the totals reported in the country during all preceding periods. Reports from across the globe have shown a significant increase in the spread of the Omicron variant compared with Delta [5,6]. As of January 2022, the Omicron variant had disseminated worldwide, including to India, where an estimated 90% of all SARS-CoV-2 infections diagnosed in the metro cities in January 2022 were caused by Omicron. The rise in the positivity rate resulted in the declaration of a third-wave crisis in the country. The wide spread of SARS-CoV2 and the resulting high risk of infection have created a greater opportunity for mutation and possible positive evolution. This variant is known to contain numerous mutations in the receptor-binding domain (RBD) of the spike protein, which have been linked to its high transmissibility. It was initially hypothesized that Omicron emerged from Alpha in a persistently infected patient who had been infected with multiple variants. However, as the number of cases increases and mutation analysis becomes more robust, this hypothesis is losing support, and the evidence points instead to Omicron having emerged with multiple mutations shared with different variants. At least 50 mutations differentiate the Omicron variant from the SARS-CoV-2 reference strain. The spike protein alone carries 30 of these mutations, some overlapping with previously identified strains and others specific to Omicron. Preliminary laboratory studies have shown that Omicron can escape extensively, but incompletely, from neutralizing antibodies in vaccinated and convalescent sera.
The COVID-19 pandemic is the first in which real-time Whole Genome Sequencing (WGS) was performed to track transmission, identify vaccine targets, and design therapeutics for the virus. The accurate identification and sequencing of SARS-CoV-2 in patient samples provides extensive information, ranging from reducing false negatives and supporting contact tracing to evaluating the suitability of diagnostic assays (particularly nucleic acid-based ones) and determining whether vaccines are likely to be or remain effective despite genetic recombination and the diverse mutations that arise during virus replication.
We performed WGS of SARS-CoV2 samples as a part of INSACOG, analyzed the resulting genomes, and compared them with the existing Omicron genome sequences from India available in GISAID to study the following:
- Geo-spatial distribution
- Demographic characterization
- Sub lineage proportions
- SNP characterization
2.1 Sequencing of SARS-CoV2 PCR positive isolates
2.1a Clinical specimens:
This study was conducted after obtaining ethical approval vide letter number KIMS/IEC/A012/M/2022. The Central Research Laboratory is an ICMR-recognized COVID-19 testing laboratory at the Kempegowda Institute of Medical Sciences, a tertiary care hospital and postgraduate medical college situated in the heart of Bengaluru city. Nasopharyngeal swabs collected from 01 December 2021 to 10 January 2022 as a part of routine diagnostics were tested for COVID-19 using the ICMR-approved RT-PCR test. Positive samples with a Ct value between 20 and 30 were taken for sequencing as a part of the national SARS-CoV2 surveillance program, INSACOG. The epidemiological metadata were obtained from the SRF forms submitted during sample collection.
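The Ct-based selection described above can be illustrated with a minimal Python sketch. The `Sample` record, its field names, and the example values are hypothetical, and treating the 20 to 30 window as inclusive at both ends is an assumption, since the text does not state how boundary values were handled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str   # hypothetical laboratory identifier
    ct_value: float  # RT-PCR cycle threshold of the positive sample


def select_for_sequencing(samples, ct_min=20.0, ct_max=30.0):
    """Return the samples whose Ct value lies in [ct_min, ct_max].

    Inclusive bounds are an assumption made for this sketch.
    """
    return [s for s in samples if ct_min <= s.ct_value <= ct_max]


# Illustrative values only, not study data.
samples = [
    Sample("S1", 18.4),
    Sample("S2", 24.9),
    Sample("S3", 30.0),
    Sample("S4", 33.2),
]
selected = select_for_sequencing(samples)
print([s.sample_id for s in selected])
```

Keeping the thresholds as parameters makes it easy to check how many positives would be retained under a different Ct window.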
2.1b Genome Sequencing:
The 192 samples positive for COVID-19 by RT-PCR were sequenced on the nanopore MK1C platform. Initially, 1200 bp amplicons were generated using the Midnight PCR primers according to the manufacturer's specification. The concentrations of cDNA were determined using the dsDNA HS Assay Kit with a Qubit 4 Fluorometer. The RBK-110-96 rapid barcoding kit was used to barcode each sample, and sequencing was performed on MIN-106 flowcells.
2.1c Sequence analysis:
Guppy 3.6.0 was used for base calling and demultiplexing the samples with the 450 bps fast base-calling configuration. The demultiplexed reads were assembled using the nf-core viralrecon pipeline with the medaka g303 model for read polishing. A minimum of 250 bp and a maximum of 1400 bp were used as read-length cutoffs, as per the standards used for the Midnight protocol. The resulting consensus assembly was used for variant calling and lineage analysis.
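The read-length cutoff described above can be sketched in a few lines of Python. This is only an illustration of the 250 to 1400 bp window, not the filtering step used inside the pipeline; the function names and the synthetic reads are hypothetical, and the parser assumes well-formed four-line FASTQ records.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def filter_by_length(records, min_len=250, max_len=1400):
    """Keep reads whose length lies within [min_len, max_len] bases."""
    return [r for r in records if min_len <= len(r[1]) <= max_len]


# Synthetic reads (hypothetical names): one too short, one near the
# 1200 bp amplicon size, and one too long.
example = []
for name, n in [("short", 100), ("amplicon", 1200), ("long", 1500)]:
    example += [f"@{name}", "A" * n, "+", "I" * n]

kept = filter_by_length(parse_fastq(example))
print([header for header, _, _ in kept])
```

A window around the 1200 bp amplicon size discards truncated reads as well as overlong reads that are unlikely to be single amplicons.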
2.2 Retrieval of Genomic Data from the database:
From the Global Initiative on Sharing All Influenza Data (GISAID), we downloaded 1586 high-coverage, complete VOC Omicron GRA (B.1.1.529+BA.*) SARS-CoV-2 genome sequences submitted from different centres across India as of 19th January 2022. The patient information and metadata for the downloaded sequences were also retrieved. Sequences with large gaps and more than 50 ambiguous bases were removed. The sequence of the established SARS-CoV-2 reference genome (NC_045512.2; Wuhan, December 2019) was downloaded from NCBI for further variant analysis and gene annotation.
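The ambiguous-base criterion described above can be sketched as follows. The function names and toy sequences are hypothetical. The sketch counts every character other than A, C, G, or T as ambiguous, which is an assumption, and it does not implement the "large gaps" criterion because no threshold for it is given in the text.

```python
def count_ambiguous(seq):
    """Count characters that are not an unambiguous A, C, G, or T."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


def keep_high_quality(genomes, max_ambiguous=50):
    """Drop genomes with more than `max_ambiguous` ambiguous bases."""
    return {
        name: seq
        for name, seq in genomes.items()
        if count_ambiguous(seq) <= max_ambiguous
    }


# Toy sequences (not real genomes): 10, 50, and 51 ambiguous bases.
genomes = {
    "genome_a": "ACGT" * 100 + "N" * 10,
    "genome_b": "ACGT" * 100 + "N" * 50,
    "genome_c": "ACGT" * 100 + "N" * 51,
}
filtered = keep_high_quality(genomes)
print(sorted(filtered))
```

Because the rule is "more than 50", a genome with exactly 50 ambiguous bases is retained in this sketch.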
2.3 SNP identification and Phylogenetic analysis
2.3a SNP identification and Lineage prediction
The consensus sequence files from the in-house sequence analysis and those downloaded from GISAID were combined and submitted to https://clades.nextstrain.org/ to obtain the type-defining markers and clade assignments. Nextclade, an open-source web-based tool, provided mutation detection, clade assignment, quality checks, and phylogenetic placement of the isolates. The genomic dataset was classified into sub-lineages using PANGOLIN (Phylogenetic Assignment of Named Global Outbreak LINeages).
2.3b Phylogenetic tree construction
The consensus sequences were aligned to the reference genome to create a multiple sequence alignment using MAFFT, and the alignment positions flagged as inappropriate for phylogenetic analysis were removed. Maximum likelihood phylogenetic reconstruction was performed on the curated alignment with a General Time Reversible (GTR) model using FASTTREE with 1000 bootstrap replicates.
2.4 Visualisation of the tree:
Microreact was used to view the tree and to analyze the SNPs and the distribution of the sub-lineages across various states over a span of 3 months, i.e., November 2021, December 2021, and January 2022.
3.1 Genomic data
A total of 3429 in-house samples from Bengaluru, Karnataka, were tested by RT-PCR for COVID-19 from December 2021 to January 2022. Among them, 235 samples tested positive, and the 192 samples that had Ct values between 20 and 30 were taken for sequencing. The sequence data provided even genome coverage, allowing high-quality viral genomes to be reconstructed at an average depth of 500x with an average of 0.1 million reads. Of the 192 samples, the Pangolin tool classified 133 isolates as Omicron variants. In addition, 1586 isolates corresponding to the Omicron variant were included from GISAID. Thus, 1719 isolates were included in the further analysis.
3.2 Geospatial distribution:
On the 10th of November 2021, the first Omicron isolate was collected and tested from Mumbai, Maharashtra. By the time the WHO declared Omicron a VOC, it had already spread to multiple states such as Delhi, Kerala, Gujarat, and Karnataka. As shown in Figure 1a, the largest number of sequences was uploaded in December 2021, of which 29% were classified as Omicron. In January 2022, up to the 19th, 84% of the total sequences were classified as Omicron, showing the takeover by the Omicron variant from the other variants. By the beginning of January 2022, the virus had extended its reach to further states such as Telangana, Punjab, and Andhra Pradesh, and community transmission was seen. The state-wise distribution is shown in Figure 1b.
Figure 1: Geospatial distribution of Omicron sequences.
Figure 1a: Number of samples in different months. The bar represents the total samples in that month, and the red color shows the percentage of samples belonging to the Omicron variant.
Figure 1b: Number of Omicron samples distributed across different states of India. The height of the bar is proportional to the number of samples.
3.3 Lineage prevalence
Pango lineage B.1.1.529 was the first to be declared the Omicron VOC. As it spread globally and acquired mutations, the sub-lineages BA.1, BA.2, and BA.3 emerged.
Until mid-December 2021, we observed the prevalence of the BA.1 sublineage along with B.1.1.529 in different states of India, as shown in Figure 2. Karnataka and Delhi were the first states to report this sublineage, in mid-November 2021.
The first case of the BA.2 sublineage was found in Delhi on the 16th of December, within a cluster of 22 isolates from a single area. We analyzed the SNP differences within this cluster and found that the isolates differed by 0-1 SNP, which suggested a potential outbreak. BA.2 was then observed in other states such as Gujarat, Telangana, and Maharashtra, with an incidence of 18% (n/total = 312/1719) by the end of December. The nonsynonymous mutations N:S413R, S:A27S, S:V213G, and ORF1a:L3027F were specific to the outbreak. However, these mutations were found in less than 30% of the total isolates from India.
Figure 2: Phylogenetic tree of Omicron sequences between Nov 2021 and Dec 15th, 2021, depicting the prevalence of BA.1 in different states of India. The nodes are colored by state and the metadata block represents the lineages. Can be viewed at https://microreact.org/project/6hU3Rp3Wf8yDK24E2mBvh5-omicronindiafirst
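The within-cluster SNP comparison described above can be sketched as a pairwise count of differing positions in aligned consensus sequences. This is a minimal illustration, not the method used in the study: the function names and toy sequences are hypothetical, positions where either sequence has a gap or an ambiguous base are skipped by assumption, and reading "0-1 SNP" as a maximum pairwise distance of at most 1 is one possible interpretation.

```python
from itertools import combinations

VALID = set("ACGT")


def snp_distance(a, b):
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID and y in VALID and x != y
    )


def max_pairwise_distance(aligned):
    """Largest SNP distance between any two sequences in the cluster."""
    return max(
        (snp_distance(aligned[p], aligned[q]) for p, q in combinations(aligned, 2)),
        default=0,
    )


# Toy aligned sequences (hypothetical isolate names).
cluster = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",  # one substitution relative to isolate_1
    "isolate_3": "ACGTNCGT",  # ambiguous position is ignored
}
print(max_pairwise_distance(cluster))
```

A cluster whose maximum pairwise distance stays at 0 or 1 is consistent with recent shared transmission, although epidemiological metadata are still needed to support an outbreak call.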
We observed another cluster, spanning multiple states, with an SNP difference of 0-1. The first case in this cluster was collected in Maharashtra on the 20th of Dec 2021, and the cluster spread to Karnataka and Punjab simultaneously. Of the 51 samples in the cluster, 32 were our in-house samples from Bengaluru. The remaining 18 isolates were from Punjab, where the cluster was observed 4 days after the first detection. This could mean that there were multiple introductions to Bengaluru and Punjab from other metropolitan cities that drove the BA.2 spread. SNP analysis showed that the predominant deletions N:E31-, N:R32-, N:S33-, ORF9b:E27-, ORF9b:N28-, and ORF9b:A29- and the substitutions N:P13L and ORF9b:P10S were detected in only 2% (1/51) and 17.6% (9/51) of the isolates, respectively (Figure 3a & 3b).
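The within-cluster mutation frequencies quoted above are simple proportions, and the sketch below reproduces the arithmetic. The isolate names and the assignment of mutations to particular isolates are synthetic; only the counts (1 of 51 and 9 of 51) come from the text.

```python
def mutation_frequency(isolate_mutations, mutation):
    """Return (count, total, percent) of isolates carrying `mutation`."""
    total = len(isolate_mutations)
    count = sum(1 for muts in isolate_mutations.values() if mutation in muts)
    percent = 100.0 * count / total if total else 0.0
    return count, total, percent


# Synthetic cluster of 51 isolates (hypothetical names and assignments).
isolates = {f"isolate_{i:02d}": set() for i in range(1, 52)}
for i in range(1, 10):
    isolates[f"isolate_{i:02d}"].add("N:P13L")  # 9 isolates carry the substitution
isolates["isolate_10"].add("N:E31-")            # 1 isolate carries the deletion

for mutation in ("N:E31-", "N:P13L"):
    count, total, percent = mutation_frequency(isolates, mutation)
    print(f"{mutation}: {count}/{total} = {percent:.1f}%")
```

Reporting the count and denominator alongside the percentage, as the text does, keeps small-cluster frequencies easy to verify.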
Figure 3: Cluster from multiple states with an SNP difference of 0-1.
Figure 3a: Phylogenetic tree showing the outbreak of BA.2 sublineage in Delhi. The view is available at
Figure 3b: Tree showing multistate outbreak of BA.2 variant, the first blocks showing variant and the second blocks showing the states. The view is available at
Omicron had spread rapidly to become the dominant variant of SARS-CoV2 worldwide by the end of December 2021. The Omicron variant belongs to PANGO lineage B.1.1.529; in comparison with the original SARS-CoV-2 strain, it has 30 amino acid modifications, three small deletions, and one short insertion in the spike protein. In addition to these mutations, a number of substitutions and deletions are present in other genomic regions of Omicron. Our study showed the presence of specific mutations in the outbreak isolates and the microevolution of the sublineages. The B.1.1.529 lineage was given the name Omicron after it was designated a variant of concern. Later, it was discovered that this lineage had its own sub-variants, and the three most common were designated B.1.1.529.1, B.1.1.529.2, and B.1.1.529.3, known simply as BA.1, BA.2, and BA.3. In January, BA.1 was the prevalent Omicron sub-variant in circulation, while the proportion of BA.2 rose significantly at the same time.
According to the World Health Organisation, >98% of the Omicron genetic sequences submitted to global databases up to January 15 were of the BA.1 variant. BA.2 shares most of the characteristics of BA.1 but has some additional mutations that distinguish it. Important mutations in BA.2 that differ from the other sub-lineages include spike protein mutations such as T19I, the 24-26 deletion, V213G, T376A, and R408S. BA.2 has been detected in 49 countries, according to the outbreak.info website, which tracks the prevalence of different lineages of this virus around the world. Since the first sequence was made public on the 10th of November 2021, we have seen varied proportions of sublineages circulating in India, following the global trend.
Omicron was introduced to India in early November 2021. Reports say that it was first seen in Karnataka [28, 29], but the first sequence collected and submitted to GISAID was from Maharashtra on the 10th of Nov 2021. At the beginning of December 2021, Omicron sequences made up 1% of the total sequences from India. Within a month, Omicron replaced all other variants to account for 98% of total sequences. At present, 100% of the cases in India are due to Omicron. Even though reports in early Jan 2022 stated that Omicron was in the community transmission stage, the data in GISAID show that community transmission was already seen in mid-Dec 2021 and that Omicron was the dominant variant by the end of Dec 2021. Although the number of Omicron cases remained lower than that of other variants in Dec 2021, Omicron had already extended to most of the cities in India, signifying community transmission. While it took a month for Omicron to reach 50% of the total cases in the African continent, Europe, Asia, and the American continent saw 75% of the total cases reported as Omicron in December 2021.
BA.1 prevailed in different states, accounting for 61% (n=1046/1719) of the total cases analyzed, followed by BA.2 (n=545/1719), which was seen in 37% of the cases. We noticed the introduction of sublineages in different parts of the country at different times. The first outbreak cluster involved the introduction of BA.2 in mid-Dec 2021, which then spread across different states within a few days. Both outbreak clusters were caused by the BA.2 sub-lineage. This suggests that BA.2 is highly variable compared with BA.1, which may be driving higher transmission and the community spread of Omicron. During the analysis period, the BA.3 sublineage was not found in India.
The study provides an in-depth view of the spread and transmission of the Omicron variant in India and shows that community transmission had already begun in mid-December 2021. WGS significantly enhanced the understanding of SARS-CoV-2 Omicron variant transmission and outbreak dynamics, identifying spatial and mutational variation within the variant and within its sublineages. Examination of the genomic epidemiology of the Omicron variant across India showed multiple introduction events at different times, which led to the 3rd wave of the COVID-19 pandemic.
The work was carried out at the Central Research Laboratory, Kempegowda Medical College, Bengaluru as a part of The Indian SARS-CoV-2 Genomics Consortium (INSACOG).
Conflict of Interests
The authors declare that there is no conflict of interest.
Shreedhanya DM drafted the manuscript with all authors contributing by editing and conducting the laboratory analyses, data collection, curation, and analyses. All authors read and approved the final version of the manuscript.
All supporting data, code, and protocols have been provided within the article or through supplementary data files.
The authors declare no funding was received for the study.
Ethics Approval Statement
This study was conducted after obtaining ethical approval from the institutional ethical committee vide letter number KIMS/IEC/A012/M/2022.
- Weiss SR, Leibowitz JL (2011) Coronavirus pathogenesis. Adv Virus Res 81: 85-164.
- Zhu N, Zhang D, Wang W, Li X, Yang B, et al. (2020) A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med 382: 727-733.
- Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, et al. (2021) SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol 19: 409-424.
- Thakur V, Ratho RK (2022) OMICRON (B.1.1.529): A new SARS-CoV-2 variant of concern mounting worldwide fear. J Med Virol 94: 1821-1824.
- New evidence shows Omicron may spread twice as fast as Delta in South Africa, Infectious Disease, 2021.
- South Africa study suggests Omicron could displace Delta, healthcare-pharmaceuticals, 2022.
- Omicron in 90 percent of Metro COVID cases, The New Indian Express, 2022.
- Ghosh P (2021) Omicron and 3rd wave of pandemic in India: What experts, scientific projections have predicted so far.
- Kuchipudi SV (2021) COVID Omicron Variant: How Did It Emerge and Is It More Contagious Than Delta?
- Cele S, Jackson L, Khoury DS, Khan K, Moyo-Gwete T, et al. (2021) Omicron extensively but incompletely escapes Pfizer BNT162b2 neutralization. Nature 602: 654-656.
- Meredith LW, Hamilton WL, Warne B, Houldcroft CJ, Hosmillo M, et al. (2021) Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care-associated COVID-19: a prospective genomic surveillance study. Lancet Infect Dis 20: 1263-1272.
- Freed NE, Vlková M, Faisal MB, Silander OK (2020) Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and Oxford Nanopore Rapid Barcoding. Biol Methods Protoc 5: bpaa014.
- Wick RR, Judd LM, Holt KE (2019) Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20: 129.
- Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, et al. (2020) The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38: 276-278.
- medaka: Sequence correction provided by ONT Research, 2022.
- Khare S, Gurry C, Freitas L, Schultz MB, Bach G, et al. (2021) GISAID's Role in Pandemic Response. China CDC Wkly 3: 1049-1051.
- Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, et al. (2018) Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34: 4121-4123.
- O'Toole Á, Scher E, Underwood A, Jackson B, Hill V, et al. (2021) Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol 7: veab064.
- Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059-3066.
- Price MN, Dehal PS, Arkin AP (2010) FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490.
- Argimón S, Abudahab K, Goater RJE, Fedosejev A, Bhai J, et al. (2016) Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb Genom 2: e000093.
- Kenneth M, EXPLAINED: Why Omicron Is a Family Of Four And How Its BA.1 Sublineage Is The One To Watch Out For, news18, 2022.
- Andre Hudson and Crista Wadsworth, Genome-Sequencing: How Researchers Identify Omicron and Other CoV Variants, Science the wire, Health The Sciences, 2022.
- Chen J, Wei GW (2022) Omicron BA.2 (B.1.1.529.2): high potential to becoming the next dominating variant. Preprint. arXiv: 2202.05031v1.
- Desingu PA, Nagarajan K (2022) Omicron BA.2 lineage spreads in clusters and is concentrated in Denmark. J Med Virol 94: 2360-2364.
- William A. Haseltine, Birth of The Omicron Family: BA.1, BA.2, BA.3. Each As Different as Alpha Is from Delta, 2022.
- Saxena SK, Kumar S, Ansari S, Paweska JT, Maurya VK, et al. (2022) Characterization of the novel SARS-CoV-2 Omicron (B.1.1.529) variant of concern and its global perspective. J Med Virol 94: 1738-1744.
- Rhythma Kaul, India's first Omicron cases detected in Karnataka, Hindustan times, Dec 02, 2021.
- COVID-19: Journey of Omicron in India so far, The Economic Times, Dec 18, 2021.
- Chen C, Nadeau S, Yared M, Voinov P, Xie N, et al. (2022) CoV-Spectrum: analysis of globally shared SARS-CoV-2 data to identify and characterize new variants. Bioinformatics 38: 1735-1737.
- Omicron in community transmission stage in India, dominant in multiple metros: INSACOG, 2022.
- Garg R, Gautam P, Suroliya V, Agarwal R, Bhugra A, et al. (2022) Evidence of early community transmission of Omicron (B1.1.529) in Delhi- A city with very high seropositivity and past-exposure. Travel Med Infect Dis 46: 102276.