Building robust, web-based applications backed by solid software practices
Data Engineering Case Study
Munich Re’s Data Engineering team (DE), within the Integrated Analytics group, brings amazing models to life. Data Engineers wear many hats: they are designers, developers and debuggers. Our team creates and maintains the infrastructure that our business partners use to integrate the insights from data science into pre-existing or novel workflows.
Our DE team recently delivered a web-based, client-facing application that is now used in the client’s daily underwriting workflow – allowing the client to receive case data and deliver model scores through API requests. In this case study, we explore the many design decisions and technical expertise that made it possible to build and integrate a non-trivial underwriting solution within a short timeframe. This is the software development and deployment cycle in action, customized to align with the evolving insurance industry and our client’s specific needs.
Designing the application
Based on client requirements and our standard software best practices, the Data Engineering team approached the project by focusing on two primary aspects:
- A web-based application
- Receives and automatically processes life insurance applicant data
- Returns model-based underwriting decision and other relevant information
- Easily integrate data flow with existing underwriting workbench
- Ensure the app is continuously available without interruption
- Secure data at all steps
- Implement monitoring to detect anomalies
The process followed an Agile framework, an approach to software development that encourages fast, frequent deliverables and high levels of collaboration. As features were developed, they were deployed and made available for the client to test in real-time, resulting in iterative improvements. This project management structure both ensured the client’s needs were met and distributed the code development, feature improvements, testing, bug fixes, and maintenance evenly among team members.
To that end, DE devised a secure and robust technology stack that leveraged the best-suited technologies for the task. The stack consists of two main parts: the backend (a collection of the pieces responsible for processing and storing the data) and the frontend (a client-facing user interface). The table below highlights a few technical components of the application and the rationale for selection. The design decisions were oriented around a cloud native architecture, which does not require a physical, on-site server. Security was also given due concern, as discussed in detail later in this article.
| App Component | Technology | What is it? | Why is it used? |
|---|---|---|---|
| Backend | Flask (with gunicorn) | Python libraries and tools for creating web-based applications. | Allows for lightweight applications that can handle complexity. Gunicorn is also used to distribute API request loads. |
| Backend | CosmosDB | A NoSQL Azure database that supports SQL querying. Enables data retrieval via PowerBI. | Stores the parsed details of submitted requests. It is designed for fast retrieval of items and does not require a schema. |
| Backend | Azure Blob Storage | Storage for unstructured data (like model object files). | Provides a simple service for saving and loading files in the cloud. |
| Deployment | Docker | A tool designed to make it easier to create, deploy, and run applications by using containers. | Allows a developer to package up the needed parts for an application (libraries, dependencies, etc.) and ship it as one package. |
| Deployment | Azure Pipeline and Azure App Service | Continuous integration and continuous deployment tooling. Code is deployed to “slots” set up in App Service, with separate regions for Development, Non-Production (also called UAT), and Production. | Streamlines the integration of new changes into the system by automatically building, testing, and deploying application code. |
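To make the backend concrete, here is a minimal sketch of a Flask scoring endpoint of the kind the table describes. All names (the route, the payload fields, and the `score_case` function) are hypothetical illustrations, not the application's actual API.

```python
# Minimal sketch of a Flask scoring endpoint. The route, payload fields,
# and scoring function are hypothetical; the real application's API and
# model are not public.
from flask import Flask, jsonify, request

app = Flask(__name__)

def score_case(case: dict) -> float:
    """Stand-in for the real model; returns a fixed dummy score."""
    return 0.5

@app.route("/score", methods=["POST"])
def score():
    case = request.get_json(silent=True)
    if not case or "applicant_id" not in case:
        # Reject malformed requests before they reach the model.
        return jsonify({"error": "missing applicant_id"}), 400
    return jsonify({
        "applicant_id": case["applicant_id"],
        "score": score_case(case),
    })

# In deployment, gunicorn would serve this app across worker
# processes, e.g.:  gunicorn --workers 4 app:app
```

Flask handles the routing and JSON parsing, while gunicorn (run as a separate process manager) spreads incoming API requests across multiple workers.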
Making the application available to the client
The application runs on Azure Cloud Services and relies heavily on Azure Pipeline to build and deploy the application hosted in Azure App Services. When code changes are made or the application needs to be restarted, Azure Pipeline runs a predefined series of jobs encompassing the app’s components, including automated testing. This process is commonly called continuous integration and continuous deployment (CI/CD): manual intervention is limited, and the testing environments are identical copies of the production environment where the final application runs.
- Developer commits new changes via pull requests to Azure Repos.
- Another developer peer-reviews and approves the changes.
- On approval, Azure Pipeline is triggered to create containerized, development instances of the application and run automated unit tests and check code coverage.
- Once the application is tested in the development region, Azure Pipeline is triggered to deploy the new code to a User Acceptance Testing (UAT) environment, which is otherwise identical to the production environment where the final app runs.
- The client runs regression tests on this copy of the application to validate the changes.
- Once the client approves the change, this version of the application is moved to production.
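The automated unit tests the pipeline runs in the steps above can be sketched as follows. The `parse_request` function and its fields are hypothetical stand-ins for the application's real request-parsing code.

```python
# Sketch of the kind of unit test the pipeline runs automatically on
# each build. parse_request and its fields are hypothetical examples.

def parse_request(payload: dict) -> dict:
    """Normalize an incoming case payload (illustrative only)."""
    return {
        "applicant_id": str(payload["applicant_id"]),
        "age": int(payload.get("age", 0)),
    }

def test_parse_request_casts_types():
    parsed = parse_request({"applicant_id": 123, "age": "45"})
    assert parsed == {"applicant_id": "123", "age": 45}

def test_parse_request_defaults_missing_age():
    assert parse_request({"applicant_id": "A1"})["age"] == 0

if __name__ == "__main__":
    test_parse_request_casts_types()
    test_parse_request_defaults_missing_age()
    print("all tests passed")
```

A test runner such as pytest would discover and execute these inside the containerized development instance, and the pipeline would block the merge if any assertion failed or code coverage dropped.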
Security: Protecting the application
Security is paramount since data submitted to the API contains personally identifiable information, and the model response is also proprietary. During development, sensitive material was received from the client using Secure File Transfer Protocol (SFTP) rather than email; SFTP encrypts everything by default. In deployment, all traffic to and from the API is securely encrypted. To ensure no malicious user can make API requests, the API endpoint is accessible only to users who hold an API access key that has been authenticated by Azure App Services. When request data is received by the API endpoint, it is decrypted and securely stored in CosmosDB, which is also encrypted by Azure. In addition, the production and non-production app instances are isolated from each other and use different databases with different access keys, which are safely stored in Azure Key Vault. If a malicious user ever got access to the API key, that user could post requests, but would not be able to read any of the data stored in the databases.
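In the deployed app, key validation is handled by Azure App Services rather than application code, but the underlying idea can be sketched with Python's standard library. The key value here is of course hypothetical.

```python
# Sketch of API-key validation. In the real system the key lives in
# Azure Key Vault and validation is performed by Azure App Services;
# this only illustrates the principle. EXPECTED_KEY is hypothetical.
import hmac

EXPECTED_KEY = "s3cret-api-key"

def is_authorized(presented_key: str) -> bool:
    """Compare keys in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented_key, EXPECTED_KEY)
```

Using `hmac.compare_digest` instead of `==` means the comparison takes the same time whether the first or last character differs, so an attacker cannot recover the key byte-by-byte from response timings.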
Using App Services has additional benefits – it helps monitor the web app by connecting to Azure Monitor and Azure Log Analytics, which help in viewing the request and response rates, exceptions, diagnostics, session counts and much more in an easy-to-view dashboard. This ensures that any problem with the app is diagnosed in real-time, maximizing availability and reducing downtime.
This case study presents a high-level view of the decisions and processes that Data Engineering undertook to deploy machine learning model predictions as a web-based service for our client. Throughout, we considered various choices for cloud architecture and security, with some options ruled out due to the client’s requirements or after testing. As Data Engineers, we strive to improve continuously, and our CI/CD process was likewise enhanced over time; for example, resource allocation between the production and non-production applications was modified to optimize performance. The application is stable in production, returning secure, real-time results to the client.
Integrated Analytics provides white-labelled, customized analytics solutions for Munich Re Life US’s clients. To that end, Data Science and Data Engineering work hand in hand with a client to develop products that meet business needs and technical requirements. Data Engineering provides the expertise in cloud architecture and software engineering to make real-time, secure, and accurate model serving a reality.