Contract Management Application (Spark/Scala/SBT)
This project showcases a Scala application built on Apache Spark to process and manage contract data. It supports multi-format file ingestion, data transformation, and analytical operations, and was created as a forked learning project to build skills in big data processing and software development. The application is a practical example of how modern technologies can be used to extract insights from contract data, calculate key financial metrics, and determine contract statuses based on defined rules.

Note: This is an educational project. The original source code is owned by the repository's creator and is not distributed under the MIT license.
Tags: Apache Spark · Scala · Data Transformation
Main Features
Multi-Format File Reading
Supports various input file formats: JSON, CSV, Parquet, ORC, and XML.
Provides a flexible reader architecture for seamless data ingestion.
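One way to realize such a flexible reader is to dispatch on a format string to the matching Spark `DataFrameReader` call. This is a hedged sketch, not the project's actual architecture; the object name `MultiFormatReader`, the method `readAny`, and the reader options shown are assumptions (XML additionally requires the external `spark-xml` package):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch: one entry point that routes each supported
// format to the corresponding Spark reader.
object MultiFormatReader {
  def readAny(spark: SparkSession, format: String, path: String): DataFrame =
    format.toLowerCase match {
      case "json"    => spark.read.option("multiLine", "true").json(path)
      case "csv"     => spark.read.option("header", "true").option("inferSchema", "true").csv(path)
      case "parquet" => spark.read.parquet(path)
      case "orc"     => spark.read.orc(path)
      case "xml"     => // needs the spark-xml package on the classpath
        spark.read.format("com.databricks.spark.xml").option("rowTag", "row").load(path)
      case other     => throw new IllegalArgumentException(s"Unsupported format: $other")
    }
}
```

Centralizing the dispatch keeps ingestion code uniform: callers pass a format and a path and always get back a `DataFrame`.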
Advanced DataFrame Operations
Implements a JsonReader class to read JSON files based on custom configurations.
Leverages Spark SQL functions for efficient data manipulation and transformation.
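A `JsonReader` driven by a configuration map might look like the following. This is an illustrative sketch only; the project's actual class and its configuration format are not shown here, so the constructor shape and option names are assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical JsonReader: Spark reader options are supplied as a
// plain Map and applied before reading the JSON file.
class JsonReader(spark: SparkSession, options: Map[String, String]) {
  def read(path: String): DataFrame =
    spark.read.options(options).json(path)
}

// Usage (illustrative):
// val reader    = new JsonReader(spark, Map("multiLine" -> "true"))
// val contracts = reader.read("data/contracts.json")
```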
Total Cost Calculation
Calculates Total Cost (TTC) using the formula: TTC = HTT + (TVA × HTT)
Rounds results to two decimal places and removes intermediate columns (TVA, HTT) for cleaner outputs.
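The same formula can be expressed as a small pure-Scala function (the project itself does this with Spark SQL column expressions, roughly `round(col("HTT") * (lit(1) + col("TVA")), 2)` followed by `.drop("TVA", "HTT")`; the function name below is illustrative):

```scala
// Sketch of TTC = HTT + (TVA × HTT), rounded to two decimal places.
// BigDecimal with HALF_UP avoids the usual floating-point rounding surprises.
def computeTTC(htt: Double, tva: Double): Double =
  BigDecimal(htt + tva * htt).setScale(2, BigDecimal.RoundingMode.HALF_UP).toDouble
```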
Date and City Extraction
Parses and extracts contract end dates (Date_End_contrat) in the YYYY-MM-DD format.
Extracts city information (Ville) using advanced string parsing methods.
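The two extraction steps above can be sketched in pure Scala as follows. The real project uses Spark SQL functions (e.g. `to_date` and string functions on columns); the raw address layout `"City - Country"` and the function names here are assumptions for illustration:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Parse a contract end date in yyyy-MM-dd form into a LocalDate.
def parseEndDate(raw: String): LocalDate =
  LocalDate.parse(raw, DateTimeFormatter.ofPattern("yyyy-MM-dd"))

// Extract the city as the text before the first "-", trimmed
// (assumes a "City - Country" layout).
def extractCity(address: String): String =
  address.split("-").head.trim
```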
Contract Status Determination
Adds a new column, Contrat_Status:
"Expired" for contracts with an end date in the past.
"Actif" (French for "active") for contracts still in effect.
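The status rule above boils down to a date comparison. In Spark this would typically be a `when(...).otherwise(...)` column expression; the pure-Scala sketch below shows the same logic (the function name and explicit `today` parameter are illustrative):

```scala
import java.time.LocalDate

// Status rule: contracts whose end date is strictly before "today"
// are "Expired"; everything else is "Actif".
def contractStatus(endDate: LocalDate, today: LocalDate): String =
  if (endDate.isBefore(today)) "Expired" else "Actif"
```

Passing `today` explicitly (rather than calling `LocalDate.now()` inside) keeps the rule deterministic and easy to test.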
Technology Stack
Development Environment: IntelliJ IDEA
Programming Language: Scala 2.12.15
Framework: Apache Spark
Build Tool: SBT (Simple Build Tool)
Java Version: Java 1.8