Background

Our client is a service provider to the Government of India, conducting internet-based skill assessments at the village level with Aadhaar-based biometric authentication. The company had been handling one state, with a load of about 5,000 assessments per day and an average session duration of 60 minutes. It was then awarded a new contract to expand pan-India, increasing the load to about 50,000-1,00,000 assessments per day. This posed an enormous challenge for the AWS-hosted assessment system: even after the cloud infrastructure was scaled up at high cost, the unoptimized system could not handle the high concurrent load and crashed frequently during live sessions. The client approached Decillio to redesign its cloud infrastructure and optimize costs. Decillio’s mandate was to ensure high availability and high resiliency, and to implement top-notch security and recoverability.

Challenges

The specific objectives of the assignment were:

  • Ensure the system could handle a huge concurrent load, with zero tolerance for system crashes

  • Handle 5,00,000-6,00,000 Aadhaar authentications daily, with a peak concurrency of 30,000 authentications per minute

  • Handle 50,000-1,00,000 assessments every day, each 60 minutes long (along with remote video proctoring), with a peak concurrency of 5,000 assessments per minute

  • Handle about 7,00,000 financial transactions over the payment gateway. These transactions were a part of the test assessing candidates’ technology literacy.

  • Process 3,00,000-4,00,000 inbound emails daily, which were a part of the test assessing candidates’ email knowledge.

  • Optimize the cost of servers and of licenses for the different software systems

Our Solution

After multiple rounds of discussions with the client, the teams agreed to do away with the existing cloud infrastructure, which sufficed for a smaller system but was not designed to handle traffic at the new scale.

Scalability

The key elements of the approach to building a new cloud infrastructure from scratch, meeting the objectives of high availability and high resiliency, were:

  • Leveraged both serverless and managed services along with dedicated instances to ensure scalability

  • Changed the architecture to stateless microservices to ensure horizontal scalability and remove single points of failure, resulting in high availability

  • Moved the database from non-clustered to clustered indexes to reduce database I/O, improving database performance by over 60%

  • Used Bloom filters to reduce database search calls
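
To illustrate how a Bloom filter can cut down database search calls, here is a minimal sketch in Python. It is not the project's actual implementation: the class, the `lookup_candidate` helper, the `query_database` callback and the candidate IDs are all hypothetical names chosen for illustration. The idea is that a key the filter reports as absent is definitely absent, so the database call can be skipped; a key it reports as present still has to be checked, because false positives are possible.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        if expected_items <= 0 or not 0 < false_positive_rate < 1:
            raise ValueError("expected_items must be > 0 and rate in (0, 1)")
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2
        self.size = math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def lookup_candidate(candidate_id, bloom, query_database):
    """Skip the database whenever the filter proves the ID is absent."""
    if not bloom.might_contain(candidate_id):
        return None  # definitely not stored, so no database call is made
    # Possibly stored: the database remains the source of truth.
    return query_database(candidate_id)
```

In such a setup the filter would be populated with every key written to the database, so that lookups for unknown keys are answered from memory.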

Cost Optimization

The following measures were taken to optimize costs:

  • Implemented intelligent auto scaling based on demand prediction.

  • Used a mix of spot, on-demand and reserved instances.

  • Used open-source technologies to reduce cost, e.g. replaced the SQL Server database with PostgreSQL

  • Used serverless functions for unpredictable load to save on cloud cost

  • Optimized storage cost by separating frequently accessed data from infrequently accessed data.

  • Scaled database clusters based on volume, resulting in significant cost savings.

  • Replicated and separated databases by use case, e.g. using Redshift for analytics.

  • Routed communication between the microservices, databases and storage through the internal network to save on I/O cost.

  • Optimized code artifact storage, integrated with the CI/CD pipeline, to save on storage cost.
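
As a rough illustration of auto scaling driven by demand prediction, the sketch below turns a forecast of concurrent sessions into an instance count per time slot. It is a simplified sketch under stated assumptions, not the system that was built: the function name, the per-instance capacity, the headroom percentage and the minimum fleet size are all hypothetical parameters, and the forecast itself is assumed to come from elsewhere.

```python
def plan_capacity(forecast_sessions, sessions_per_instance,
                  headroom_percent=20, min_instances=2):
    """Return the instance count to provision for each forecast slot.

    forecast_sessions: predicted concurrent sessions per time slot (hypothetical input)
    sessions_per_instance: sessions one instance can serve (assumed, measured by load test)
    headroom_percent: safety margin added on top of the forecast
    min_instances: floor kept running for availability
    """
    if sessions_per_instance <= 0:
        raise ValueError("sessions_per_instance must be positive")
    plan = []
    for expected in forecast_sessions:
        padded = expected * (100 + headroom_percent)
        # Integer ceiling division avoids floating-point rounding surprises.
        needed = -(-padded // (100 * sessions_per_instance))
        plan.append(max(min_instances, needed))
    return plan


# Example: an idle slot, a ramp-up slot and a peak slot.
print(plan_capacity([0, 1000, 5000], sessions_per_instance=500))  # [2, 3, 12]
```

Scaling ahead of a predicted peak rather than reacting to it means capacity is ready when sessions start, while idle slots fall back to the minimum fleet.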

Security

The following measures were taken to ensure top-notch security and resilient recoverability:

  • Dynamic encryption to secure data storage

  • Custom encryption to secure data transmission

  • Data segregation at storage level

  • Database versioning implementation

  • Access level controls ensuring no unauthorized access

  • Handled cross-site scripting attacks with a text whitelisting strategy

  • All cloud components were placed in a private network, with public endpoints allowed only when necessary

  • Optimized backup strategy to ensure recoverability
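
A text whitelisting strategy against cross-site scripting can be sketched as follows. This is an illustrative example only: the permitted character set, the length limit and the function names are assumptions, not the rules used in the project. Input is accepted only if every character is on the allow-list, and it is still HTML-escaped on output as a second layer of defence.

```python
import html
import re

# Hypothetical allow-list: letters, digits, spaces and basic punctuation,
# between 1 and 500 characters. Anything else (such as < or >) is rejected.
ALLOWED_TEXT = re.compile(r"[A-Za-z0-9 .,'?!@\-]{1,500}")


def validate_text(text: str) -> str:
    """Accept the input only if it consists entirely of allow-listed characters."""
    if not ALLOWED_TEXT.fullmatch(text):
        raise ValueError("input contains characters outside the allow-list")
    return text


def render_text(text: str) -> str:
    """Validate first, then escape before embedding the text in an HTML page."""
    return html.escape(validate_text(text))
```

Rejecting input outright, rather than trying to strip dangerous fragments, keeps the rule simple to audit: a payload such as `<script>alert(1)</script>` fails validation because `<`, `>`, `(`, `)` and `/` are not on the list.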

The entire project was implemented without any system downtime.

Decillio Impact

  • While the load grew from 5,000 to 1,00,000 skill assessments per day, crashes during live sessions were brought down to zero

  • Cloud infrastructure costs reduced by 70%

  • CERT-In certification obtained, a mandatory security standard of the Government of India

Aadhaar, used in India, is the world's largest biometric ID system, with more than 1.3 billion Aadhaar cards issued as of 31st October 2021.