Data Resurgence: Advanced Analytics Driving Cost Breakdowns
Providing transparent cost breakdowns with AWS and Databricks.
Date
January 20, 2025
Topic
United States Department of Defense

Background

The United States Department of Defense (DoD) is an executive branch department of the federal government charged with coordinating and supervising all agencies and functions of the U.S. government directly related to national security and the United States Armed Forces. One of its subordinate agencies is the Defense Logistics Agency (DLA), a combat support agency that provides supplies to the military services and supports their acquisition of weapons, fuel, repair parts, and other materials.

Eric was brought onto the Advana program to support activities directly tied to the DLA's global operations. During his two-year tenure, he worked on three large-scale projects.

Architecture

Amazon Web Services, Databricks, Tableau, Qlik

Implementation

USA Spending

In the procurement subject area, there was a significant need for large-scale data transformation services to understand contract spending across thousands of contracts. The USA Spending government website provides a rich suite of API endpoints from which extensive analytics can be performed, and DLA officials needed a daily pipeline built on Databricks, a data processing platform with data warehousing built into the tool. File storage was handled by Amazon Web Services (AWS) S3 and the platform's databases.

Eric started by interacting with USA Spending API endpoints to bring data into a landing zone. Once data was downloaded to the landing zone, it was scanned by security to ensure the data being transmitted to the platform was safe. Databricks advocates a medallion architecture, with bronze, silver, and gold layers, to process data effectively. Eric followed the medallion approach when transforming data with Databricks jobs and scheduled the notebooks for daily updates. The job started in the development environment while DLA officials worked to integrate the data into Qlik dashboards. After two months of running in development, the pipeline was promoted to production to power Qlik dashboards used by DLA officials.
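
As a rough illustration of the ingestion step, the sketch below pulls one page of results from a USA Spending API endpoint and writes the raw JSON to an S3 landing-zone prefix. The endpoint, request payload, bucket, and key layout are illustrative assumptions rather than the actual pipeline configuration.

# Minimal ingestion sketch; endpoint, payload, bucket, and prefix are placeholders.
import json
from datetime import date

import boto3
import requests

API_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"  # one of many available endpoints
payload = {
    "filters": {"agencies": [{"type": "awarding", "tier": "toptier", "name": "Department of Defense"}]},
    "fields": ["Award ID", "Recipient Name", "Award Amount"],
    "page": 1,
    "limit": 100,
}

response = requests.post(API_URL, json=payload, timeout=60)
response.raise_for_status()

# Drop the raw response into the S3 landing zone for the security scan.
s3 = boto3.client("s3")
key = f"landing-zone/usaspending/spending_by_award/{date.today():%Y-%m-%d}/page_0001.json"
s3.put_object(Bucket="example-landing-bucket", Key=key, Body=json.dumps(response.json()))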

Procurement Integrated Enterprise Environment (PIEE)

The Procurement Integrated Enterprise Environment (PIEE) is the primary enterprise procure-to-pay (P2P) application for the Department of Defense and its supporting agencies and is trusted by companies reporting over $7.1 billion in spending. PIEE is a massive application generating terabytes of data. My task was to replicate the Oracle PIEE databases into Databricks to power Qlik dashboards for DLA officials.

In the PIEE environment, AWS Database Migration Service (DMS) read the change logs from the Oracle databases and wrote them to the landing zone for security scans. With over 120 tables being replicated to the landing zone, a metadata-driven approach was taken to manage them all.
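
For context, a change-data-capture task of this shape could be defined through the AWS SDK roughly as sketched below; the ARNs, schema name, and identifiers are placeholders, not the actual DMS configuration.

# Sketch of a CDC replication task definition with boto3 (all identifiers are placeholders).
import json

import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-piee-schema",
        "object-locator": {"schema-name": "PIEE", "table-name": "%"},  # placeholder schema
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="piee-oracle-cdc",                                # placeholder identifier
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:source",     # placeholder ARN
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:target-s3",  # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:instance",   # placeholder ARN
    MigrationType="cdc",  # stream ongoing change logs rather than a one-time full load
    TableMappings=json.dumps(table_mappings),
)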

I built a pipeline that scanned each table's S3 location to collect metadata such as size, location, and the latest file processed. I enriched this metadata with primary and secondary keys to make merging the data easier in downstream steps. The resulting metadata table was a critical integration point that unblocked data engineers developing the merging pipelines.
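
A simplified version of that scan might look like the sketch below, which lists each table's landing-zone prefix and records file counts, total size, and the most recent file. The bucket name and prefix layout are assumptions for the example.

# Sketch of the S3 metadata scan; bucket and table prefixes are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-landing-bucket"
TABLE_PREFIXES = ["piee/contract_lines/", "piee/invoices/"]  # in practice, one prefix per replicated table

metadata_rows = []
for prefix in TABLE_PREFIXES:
    total_bytes, file_count, latest_key, latest_modified = 0, 0, None, None
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            file_count += 1
            total_bytes += obj["Size"]
            if latest_modified is None or obj["LastModified"] > latest_modified:
                latest_modified, latest_key = obj["LastModified"], obj["Key"]
    metadata_rows.append({
        "table_prefix": prefix,
        "file_count": file_count,
        "total_bytes": total_bytes,
        "latest_file": latest_key,
        "latest_modified": str(latest_modified),
    })

# In the real pipeline these rows were enriched with primary and secondary keys
# and persisted as a metadata table for the merge pipelines to consume.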

The merging pipelines used primary keys and column names to dynamically build merge statements for each of the 124 tables. One of the largest tables in the database started at 6 billion rows and grew by over 100 million records per month.
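
The sketch below shows one way such merge statements could be generated from a table's primary keys and column list; the table, view, and column names are hypothetical.

# Sketch of dynamic MERGE generation inside a Databricks notebook (names are hypothetical).
def build_merge_sql(target_table, staging_view, primary_keys, columns):
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in primary_keys)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns if c not in primary_keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target_table} AS t "
        f"USING {staging_view} AS s "
        f"ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

merge_sql = build_merge_sql(
    target_table="piee_silver.contract_lines",   # placeholder target table
    staging_view="contract_lines_updates",       # placeholder staging view of new change records
    primary_keys=["contract_id", "line_item_id"],
    columns=["contract_id", "line_item_id", "amount", "last_modified"],
)
spark.sql(merge_sql)  # `spark` is provided by the Databricks notebook runtime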

As the data started to flow into the data warehouse, DLA officials were able to begin building analytics views to answer questions that could not previously be answered. DLA is continuing this analytics work to address the many questions a procurement official is typically asked in the performance of their duties in the DoD.

Pipeline Reporting and Insights Systems Monitoring (PRiSM)

PRiSM is a platform monitoring solution I proposed so that engineering teams managing more than 15 pipelines could monitor them efficiently.

Before the solution was developed, engineers would receive an email if pipelines failed overnight. This worked well when the Advana platform was small, but with over 100,000 end users and 3,000 people building on the platform, the approach was not scalable.

I sought to build a solution that integrated information from the Databricks API and surfaced that data in a dashboard accessible to the engineering team.
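
As an illustration of that integration, the sketch below polls the Databricks Jobs API for recent runs and collects failures; the workspace URL, token handling, and 24-hour lookback are assumptions made for the example.

# Sketch of polling the Databricks Jobs API for failed runs (workspace URL and token are placeholders).
import os
import time

import requests

HOST = "https://example-workspace.cloud.databricks.com"
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

since_ms = int((time.time() - 24 * 3600) * 1000)  # the Jobs API expects millisecond timestamps

failed_runs = []
params = {"start_time_from": since_ms, "limit": 25}
while True:
    resp = requests.get(f"{HOST}/api/2.1/jobs/runs/list", headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    body = resp.json()
    for run in body.get("runs", []):
        if run.get("state", {}).get("result_state") == "FAILED":
            failed_runs.append({"run_id": run["run_id"], "job_id": run["job_id"]})
    if not body.get("has_more"):
        break
    params["page_token"] = body["next_page_token"]

# These records would feed the dashboard's failure view.
print(f"{len(failed_runs)} failed runs in the last 24 hours")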

I started development in March 2024 and built the bulk of the application through the summer. I gathered data about job permissions, database growth metrics, pipeline failures, and pipeline metadata. The application's foundational building blocks were the pipeline number and the names of the databases used to gather growth metrics. In development, I built the pipeline to monitor over 100 pipelines and 14 databases.
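
One way to gather the database growth metrics, sketched below, is to iterate over the tables in each monitored database and append their Delta table details to a daily snapshot; the database names and metrics table are assumptions.

# Sketch of collecting per-table growth metrics in a Databricks notebook (names are placeholders).
from pyspark.sql import functions as F

MONITORED_DATABASES = ["procurement_bronze", "procurement_silver"]  # placeholder database names

metrics = []
for db in MONITORED_DATABASES:
    for row in spark.sql(f"SHOW TABLES IN {db}").collect():
        detail = spark.sql(f"DESCRIBE DETAIL {db}.{row.tableName}").collect()[0]  # Delta tables only
        metrics.append((db, row.tableName, detail["sizeInBytes"], detail["numFiles"]))

snapshot = (
    spark.createDataFrame(metrics, ["database", "table", "size_in_bytes", "num_files"])
    .withColumn("captured_at", F.current_timestamp())
)
snapshot.write.mode("append").saveAsTable("prism.table_growth_metrics")  # hypothetical metrics table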

With this section of the application built, the focus of the project shifted to the dashboard. The dashboard gave engineers a single pane of glass for seeing which pipelines had failed over the past 24 hours. This information is essential for responding quickly to pipeline failures and minimizing the impact on customers during early morning business hours.

The application quickly grew in popularity, with other engineering teams asking for demonstrations on a weekly basis. Among the teams that expressed interest were financial analytics, acquisition analytics, and enterprise applications.

The project was a resounding success for the procurement analytics team, leading to multiple members receiving promotions.

Supporting Images

PRiSM Dashboard Slide

PRiSM Unified Modeling Language (UML) Diagram