Contract Management Application (Spark/Scala/SBT)

This project showcases a Scala application built with Apache Spark to process and manage contract data. It supports multi-format file ingestion, data transformation, and analytical operations, and was created as a forked learning project to build skills in big data processing and software development. The application is a practical example of how modern tooling can be used to extract insights from contract data, calculate key financial metrics, and determine contract statuses based on defined rules.

Note: This is an educational project. The original source code is owned by the repository's creator and is not distributed under the MIT license.

Tags: Apache Spark · Scala · Data Transformation


Main Features

  • Multi-Format File Reading

    • Supports various input file formats: JSON, CSV, Parquet, ORC, and XML.

    • Provides a flexible reader architecture for seamless data ingestion.

  • Advanced DataFrame Operations

    • Implements a JsonReader class to read JSON files based on custom configurations.

    • Leverages Spark SQL functions for efficient data manipulation and transformation.

  • Total Cost Calculation

    • Calculates Total Cost (TTC) using the formula: TTC = HTT + (TVA × HTT)

    • Rounds results to two decimal places and removes intermediate columns (TVA, HTT) for cleaner outputs.

  • Date and City Extraction

    • Parses and extracts contract end dates (Date_End_contrat) in the YYYY-MM-DD format.

    • Extracts city information (Ville) using advanced string parsing methods.

  • Contract Status Determination

    • Adds a new column, Contrat_Status:

      • "Expired" for contracts with an end date in the past.

      • "Actif" for contracts still in effect.
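The features above can be sketched as a single Spark pipeline. This is a minimal illustration, not the project's actual code: the sample rows, the `Contrat_Id` column, and the "city before a dash" parsing rule are assumptions; the column names `HTT`, `TVA`, `Date_End_contrat`, `Ville`, and `Contrat_Status`, the TTC formula, the rounding, and the two status values come from the feature list.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ContractPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ContractPipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical in-memory rows standing in for a JSON/CSV/Parquet input.
    val contracts = Seq(
      ("C-001", 100.0, 0.20, "2020-06-30", "Paris - 75001"),
      ("C-002", 250.0, 0.10, "2099-01-15", "Lyon - 69002")
    ).toDF("Contrat_Id", "HTT", "TVA", "Date_End_contrat", "Ville")

    // TTC = HTT + (TVA × HTT), rounded to two decimal places;
    // the intermediate TVA and HTT columns are then dropped.
    val withTtc = contracts
      .withColumn("TTC", round(col("HTT") + col("TVA") * col("HTT"), 2))
      .drop("TVA", "HTT")

    // Normalise the end date to YYYY-MM-DD and extract the city token
    // (assumed here to be the text before the dash).
    val parsed = withTtc
      .withColumn("Date_End_contrat",
        to_date(col("Date_End_contrat"), "yyyy-MM-dd"))
      .withColumn("Ville", trim(split(col("Ville"), "-").getItem(0)))

    // "Expired" if the end date is in the past, "Actif" otherwise.
    val withStatus = parsed.withColumn(
      "Contrat_Status",
      when(col("Date_End_contrat") < current_date(), lit("Expired"))
        .otherwise(lit("Actif"))
    )

    withStatus.show(truncate = false)
    spark.stop()
  }
}
```

Reading from a real file would replace the `Seq(...).toDF` stand-in with `spark.read.json(path)` (or `.csv`, `.parquet`, `.orc`) as the multi-format reader feature describes.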

Technology Stack

  • Development Environment: IntelliJ IDEA

  • Programming Language: Scala 2.12.15

  • Framework: Apache Spark

  • Build Tool: SBT (Simple Build Tool)

  • Java Version: Java 1.8
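The stack above maps onto a short `build.sbt`. This is a hedged sketch: the Spark and spark-xml versions are assumptions (the original does not pin them), while the Scala version matches the one stated.

```scala
// build.sbt — minimal sketch for the stack listed above.
name := "contract-management"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // Spark version is an assumption; any 3.x built for Scala 2.12 fits.
  "org.apache.spark" %% "spark-core" % "3.3.2",
  "org.apache.spark" %% "spark-sql"  % "3.3.2",
  // spark-xml supplies the XML input format mentioned in the features.
  "com.databricks"   %% "spark-xml"  % "0.16.0"
)
```

With Java 1.8 installed, `sbt run` would compile and launch the application from IntelliJ IDEA or the command line.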