A professional with 5 years of experience as a Data Engineer, 3 years as a Data Architect, and 3 years as a Big Data Developer. My main skills are related to the Azure stack, Scala, Python, Databricks, and Spark, and I hold the Microsoft Azure Data Engineer certification. The business areas in which I have developed as a professional are direct sales, banking, retail, logistics, and services. I currently work as a Data Architect in a fintech company. My main role is to design architectures that enable the business with new data and information capabilities such as standard BI, data warehousing, and streaming data for applications.
During this time, I participated in the following projects:
Project: Synapse Ingestion – Migration from dataflows to PySpark batch processes
Refactoring of important data products:
- Definition of a standard for developing batch processes with PySpark.
- Refactoring of dataflow processes to PySpark to improve peer reviews.
Project: Synapse Spark – Streaming processes
Enabling streaming processes to provide near-real-time features for critical data products:
- Developed Azure Function trigger processes in Python that send data to Event Hubs queues (see the sketch below).
- Creation of a standard for Spark streaming processes.
- Creation of streaming processes for invoice KPIs.
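A minimal sketch of the Azure Function pattern described above, assuming a hypothetical connection-string setting and hub name; the real processes used their own payloads and configuration:

    import json
    import os
    import azure.functions as func
    from azure.eventhub import EventHubProducerClient, EventData

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Hypothetical app settings; the real function read its own configuration.
        producer = EventHubProducerClient.from_connection_string(
            os.environ["EVENTHUB_CONN_STR"], eventhub_name="invoices")
        payload = req.get_json()
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(payload)))  # one event per request
        producer.send_batch(batch)
        producer.close()
        return func.HttpResponse("queued", status_code=202)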
Project: Synapse Serverless – Data Governance refactor
Refactoring of the orchestration and batch architecture:
- Defined a mesh architecture to standardize new batch development and data product organization.
- Refactored processes to centralize business logic and data-cleaning routines.
- Standardized the data format across the layers of the Datawarehouse.
- Created new Synapse services and automated black-box tests with pytest.
Project: Synapse Datawarehouse – Interfaces
The legacy DW, hosted on a SQL VM, needed to serve data to third parties through a REST API.
Main tasks executed in the project:
- Defined the architecture for the new Datawarehouse.
- Developed a base Spark ETL application framework for batch data processing, deployed as a Python wheel (see the sketch below).
- Created a roadmap for the initial migration.
- Interfaced data to the REST API using Synapse pipelines.
- Developed an in-house framework for batch and streaming processing with PySpark.
Technologies: Azure Storage Gen 2, Azure Synapse Workspace, Azure Key Vault, Azure Logic App.
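A minimal sketch of the wheel-deployed Spark ETL idea mentioned above, with hypothetical module, path, and table names:

    # etl_app/jobs/invoices.py -- packaged as a wheel, attached to the Spark pool,
    # and invoked from a thin Synapse Spark job definition.
    from pyspark.sql import SparkSession, functions as F

    def run(source_path: str, target_table: str) -> None:
        spark = SparkSession.builder.getOrCreate()
        df = (spark.read.parquet(source_path)
                   .withColumn("load_dts", F.current_timestamp()))
        df.write.mode("append").saveAsTable(target_table)

    if __name__ == "__main__":
        run("abfss://raw@account.dfs.core.windows.net/invoices/", "dw.invoices")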
Project: Integration Freight DB – LFS
The company needed to integrate third-party application data with the LFS Freight DB. We built a Python application that read from the third-party REST API and loaded the data into the Freight DB, the centralized operational database.
Main tasks executed in the project:
- Defined the application architecture.
- Developed a Python Spark application to run on an on-demand Synapse Spark cluster (see the sketch below).
- Managed the source code in Azure DevOps.
Technologies: Azure Storage Gen 2, Azure Synapse Workspace, Python, Databricks, Azure DevOps
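A minimal sketch of the integration pattern: read a paginated REST API and load the records over JDBC. The endpoint, connection string, and table names are hypothetical:

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    rows, page = [], 1
    while True:  # walk the paginated third-party API
        resp = requests.get("https://api.example.com/shipments",
                            params={"page": page}, timeout=30)
        resp.raise_for_status()
        data = resp.json()["items"]
        if not data:
            break
        rows.extend(data)
        page += 1
    # Load the collected records into the operational database over JDBC.
    (spark.createDataFrame(rows)
          .write.mode("append")
          .jdbc("jdbc:sqlserver://freight-db:1433;databaseName=Freight",
                "dbo.Shipments", properties={"user": "...", "password": "..."}))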
During this time, I participated in the following projects:
Project: Synapse Datawarehouse Enhancements
The USA office of Cision consumes data from Salesforce and NetSuite for operational reports. Departments relied on Power BI reports to measure sales and marketing goals and budgets.
Main tasks executed in the project:
- Provided a framework that enables adding new sources dynamically to the Synapse warehouse using Synapse workspace pipelines.
- Secured all credentials and passwords with Azure Key Vault.
- Created alerts for pipeline failures.
- Created data models for finance, marketing, and sales reports.
I proposed a dynamic way to generate T-SQL scripts that create new tables (raw vault and mart) in the Datawarehouse based on configuration tables (see the sketch below).
Technologies: Azure Storage Gen 2, Azure Synapse Workspace, Azure Key Vault, Azure Logic App.
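A minimal sketch of the configuration-driven T-SQL generation idea; the configuration structure and names here are hypothetical:

    # Each config row describes one target table and its columns.
    config = [
        {"schema": "rawvault", "table": "h_customer",
         "columns": [("customer_hk", "CHAR(32)"), ("load_dts", "DATETIME2")]},
    ]

    def build_create_table(entry: dict) -> str:
        cols = ",\n    ".join(f"[{name}] {dtype}" for name, dtype in entry["columns"])
        return (f"CREATE TABLE [{entry['schema']}].[{entry['table']}] (\n"
                f"    {cols}\n);")

    for entry in config:
        print(build_create_table(entry))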
Project: Unpaid Invoice Recommendation Grouping
The USA office of Cision needed to process unpaid invoices and decided to group them by reason for non-payment. Some invoices were poorly documented, so text analysis of the invoice reasons alone was not enough; analysis of related data such as claims and notes was required to group and manage them properly.
Main tasks executed in the project:
- Prepared the invoice data.
- Developed a Python script to run in a notebook to process the invoice text data (see the sketch below).
- Automated the Python notebook using Spark pools in Synapse.
- Created reports on top of the results.
Technologies: Azure Storage Gen 2, Azure Synapse Workspace, Spark Pools, PySpark, Python, Azure Key Vault, Azure Logic App.
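A minimal sketch of the text-grouping approach, using TF-IDF and k-means as one plausible technique; the input file and column names are hypothetical, and the real notebook ran on a Synapse Spark pool:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    df = pd.read_csv("invoice_reasons.csv")          # hypothetical input
    # Combine the invoice reason with the related claim/note text.
    text = df["reason"].fillna("") + " " + df["notes"].fillna("")
    features = TfidfVectorizer(stop_words="english").fit_transform(text)
    df["group"] = KMeans(n_clusters=8, n_init=10).fit_predict(features)
    print(df.groupby("group").size())                # invoices per group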
Project: HR Centralized data
The USA office of Cision had many HR applications supporting many processes, and the main issue was data integrity across all the systems. They decided to integrate every source system into a database that pushes a consolidated CSV file, which we then needed to load into the warehouse and the final HR mart model.
Main tasks executed in the project:
- Coordinated access to Azure Data Lake with the Cision development team.
- Created a pipeline to ingest the CSV file from the data lake into stage tables.
- Created a pipeline to process the data into the raw vault in the Datawarehouse.
- Created a pipeline to add new data into the HR model.
Project: Azure DevOps Repository Configuration
New developers joined the BI team, so we needed an organized way to collaborate. We decided to integrate with Azure DevOps to store the Synapse workspace code and deploy new features into the production environment, and configured an Azure App Service to connect resource group security, workspace security, and DevOps processes.
- Tested branch security and policies.
- Created deployment pipelines for the Azure Synapse workspace.
- Assigned as Senior Data Engineer to an outsourced account, as part of a developer squad. Among my responsibilities is the development of Spark batch and streaming applications in Scala to load information into NoSQL databases. We work with the Scrum methodology using Jira; for code management and deployment we use Bitbucket and Jenkins. Most of the data is consumed by back-end applications used in banking apps.
- Part of my responsibility is to take part in new project proposals for our clients. I actively participate in the design of the solutions; most of them are data-related projects for large datasets in semi-structured and structured formats.
- I'm also responsible for the technical interviews of applicants for Azure Data Services related roles. Training of new team members is also part of my current responsibilities.
During this time, I participated in the following projects:
Project: FATCA Batch Process (Nov 2020 – March 2021)
Some master data for foreign customers needed to be processed and encrypted using a salted-key algorithm. The data came from the main Data Warehouse and was sunk into an Azure SQL database using Databricks clusters, to be consumed by an API that was already developed. The whole batch process was developed in Scala.
Main tasks executed in the project:
- Identified the sensitive data.
- Developed the batch application using the bank's Databricks libraries for batch processing.
- Started the continuous-integration approval workflow.
- Coordinated the certification process for the application.
- Coordinated the production rollout of the application.
I proposed a standard to record the progress of the process in a SQL table so it can be monitored in real time (see the sketch below).
Technologies: Scala, Databricks, Azure Storage Gen 2, Azure Data Factory, Jira, Bitbucket.
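A minimal sketch of the proposed progress-recording standard, shown in PySpark for illustration (the project itself used Scala); the table name and connection details are hypothetical:

    from datetime import datetime, timezone
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def log_step(job: str, step: str, status: str) -> None:
        # Append one audit row per step so operators can query dbo.JobProgress live.
        row = [(job, step, status, datetime.now(timezone.utc).isoformat())]
        (spark.createDataFrame(row, "job string, step string, status string, ts string")
              .write.mode("append")
              .jdbc("jdbc:sqlserver://monitor-db:1433;databaseName=Ops",
                    "dbo.JobProgress", properties={"user": "...", "password": "..."}))

    log_step("fatca_batch", "encrypt_master_data", "STARTED")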
Project: RCC to SalesForce Project (Feb 2021 – May 2021)
The regulatory entity SBS provides summaries of each person's debt status, which the bank needs to expose through an API consumed by its internal web portal. The data arrives already partially filtered from the bank's data lake.
Main tasks executed in the project:
- Collected the transformation requirements from the Salesforce team.
- Designed the document database in Cosmos DB.
- Developed the batch application using the bank's Databricks libraries for batch processing.
- Started the continuous-integration approval workflow.
- Coordinated the certification process for the application.
- Coordinated the production rollout of the application.
I proposed an improvement to the delta lake layer strategy by adding a business layer that can be reused in the future.
Technologies: Scala, Databricks, Azure Storage Gen 2, Azure Cosmos DB, Azure Data Factory, Jira, Bitbucket.
- In charge of designing and advising on improvements to the architecture and analytics distributed across different components: Hive processes, NoSQL databases, Azure Stream Analytics, and streaming applications on Spark.
- Responsible for the operational analysis of the application that processes the CPE (Electronic Payment Receipt) and for business analytics for SUNAT. All dashboards that consumed the incoming data were built in Power BI.
- Part of my responsibility was to interact with those in charge of the infrastructure and the application to identify and diagnose bugs that surfaced in reports, among other data defects. The entire architecture is built on Big Data services in Azure: from Cosmos DB as the database of taxpayer documents to the data processed into SQL tables for operational reports.
Achievements:
- Improvement of the health indicators of the CPE solution.
- Improvement of the streaming application's resource consumption by roughly 40% (from a 32-node cluster to 20 nodes).
- Automation of HDInsight cluster provisioning with Azure DevOps.
- Development of the streaming solution that feeds Application Insights data into Power BI for real-time reporting (see the sketch below).
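A minimal sketch of pushing rows to a Power BI streaming (push) dataset over its REST API, the pattern behind this kind of real-time reporting; the dataset ID, table name, token, and metric fields are hypothetical:

    import requests

    # Hypothetical dataset/table IDs and AAD token.
    url = ("https://api.powerbi.com/v1.0/myorg/datasets/"
           "<dataset-id>/tables/Health/rows")
    rows = [{"metric": "cpe_latency_ms", "value": 125,
             "ts": "2020-01-01T00:00:00Z"}]
    resp = requests.post(url, json={"rows": rows},
                         headers={"Authorization": "Bearer <aad-token>"})
    resp.raise_for_status()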
During this time, I participated in the following project:
Project: Comprobante Electronico SUNAT
SUNAT is the entity in charge of taxes in Peru. Microsoft developed a multi-component solution that processes electronic taxpayer invoices, with all the moving parts built on Azure services. Part of the solution was deployed as microservices and the other part as a real-time invoice process for analysis. The solution replaced part of the on-premises solution for invoice tax analysis.
Main tasks executed in the project:
- Developed new features and improved the streaming and batch Scala applications deployed in the HDInsight clusters.
- Developed monitoring solutions for the applications.
Saved project budget by improving the streaming application's performance: the 32-node Spark and Hive cluster was reduced to 24 nodes.
Technologies: Scala, Apache Spark, Apache Hive, Azure Data Factory, Azure Service Fabric, Azure Application Insight, Azure Cosmos DB, Azure Stream Analytics, Power BI.
Project: Dynamic Database for Finance App
Mobiik had a very important suite of applications that allows financial entities to create forms to collect information from potential customers in order to evaluate them.
- In charge of the Factory BI/BA team (ETL, Data Warehouse, reporting, dashboards, and automation of predictive models), made up of five data engineers, working together with the technical leader of the Factory team. Part of my role was to ensure that the Factory complied with the standards and SLAs agreed with the client teams. The software products used in our projects were SAP HANA, Azure SQL Data Warehouse, Azure Databricks, SAP Lumira, Azure Machine Learning (mostly R), SSRS, SQL Server 2014, and Power BI.
- One of my main activities was to ensure that the team correctly applied Data Vault 2.0 as the methodology for the Data Warehouse architecture in SAP HANA, and likewise to ensure the scalability of the technological components that interact with business applications, both source and destination.
- As an architect, I constantly interacted with the operations and support teams of the data-integration and advanced-analytics processes once the projects were delivered to production environments.
Achievements:
- Design and creation of the data lake environment as a source for machine learning consumption.
- Automation of the first cloud-hosted client-leaving-probability model for the direct-sales business unit.
- Creation of the predictive-results database in Azure Synapse. This database centralizes all the scores produced by the machine learning processes.
During this time, I participated in the following projects:
Project: Consultant Leaving Probability Model
The company needed to know how likely a consultant was to leave during their first year of work.
Main tasks executed in the project:
- Collected data from the data warehouse into Azure Synapse.
- Developed the model as an R script in Azure Machine Learning.
- Automated the model-retraining process with Azure Data Factory.
- Automated the monthly scoring of consultant data with Azure Data Factory (see the sketch below).
- Developed a Power BI dashboard for managers.
- Automated the life cycle of the model.
Technologies: Azure Synapse, Azure Data Lake, Azure Data Factory, SQL Server 2014, Azure Machine Learning, R Script.
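A minimal sketch of how an automated pipeline can call an Azure Machine Learning Studio scoring web service; the endpoint, API key, and feature names here are hypothetical:

    import requests

    endpoint = ("https://ussouthcentral.services.azureml.net/workspaces/"
                "<ws>/services/<svc>/execute?api-version=2.0")
    payload = {"Inputs": {"input1": {"ColumnNames": ["tenure_months", "sales_avg"],
                                     "Values": [[3, 1250.0]]}},
               "GlobalParameters": {}}
    resp = requests.post(endpoint, json=payload,
                         headers={"Authorization": "Bearer <api-key>"})
    resp.raise_for_status()
    print(resp.json())  # scored leaving probability for the consultant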
Project: Yanbal Data Lake in Azure (May 2019 – Oct 2019)
The company needed a data lake for incoming chatbot data, for future analysis and reporting.
Main tasks executed in the project:
- Collected data from Azure Table storage into Azure Data Lake.
- Developed an Azure Functions Python application to process the data (see the sketch below).
- Developed Power BI dashboards for managers.
Designed and developed the whole solution in Azure.
Technologies: Azure Synapse, Azure Data Lake, Azure Data Factory, SQL Server 2014, Azure Machine Learning, R Script.
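A minimal sketch of the Azure Functions processing step that moves chatbot records from Azure Table storage into the data lake; the table name, filesystem, and connection settings are hypothetical:

    import json
    import os
    import azure.functions as func
    from azure.data.tables import TableServiceClient
    from azure.storage.filedatalake import DataLakeServiceClient

    def main(mytimer: func.TimerRequest) -> None:
        # Read the raw chatbot entities from Table storage.
        tables = TableServiceClient.from_connection_string(os.environ["TABLES_CONN"])
        entities = [dict(e) for e in
                    tables.get_table_client("chatbot").list_entities()]
        # Land them as JSON in the data lake for later analysis.
        lake = DataLakeServiceClient.from_connection_string(os.environ["LAKE_CONN"])
        file = (lake.get_file_system_client("raw")
                    .create_file("chatbot/messages.json"))
        file.upload_data(json.dumps(entities, default=str), overwrite=True)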
- Responsible for leading the BI/BA implementation team, made up of six external consultants.
- Centralized and analyzed the requirements of each implementation in an integral manner for the corporation and for each country.
- Responsible for the analytics architecture and data architecture with SAP HANA, SAP Lumira, Azure Machine Learning, SSRS, SQL Server 2008/2012/2014, and Power BI.
- Responsible for the definition of the processes, procedures, and standards for the development factory.
Achievements:
- Project of Integration of Google Analytics with SAP HANA: Implemented the integration of information from the mobile business applications so it could be analyzed in an integral way from the warehouse.
- Benchmark of NoSQL platforms for data loads: Research and benchmarking of Cassandra, HBase, and MongoDB.
- Optimization project for the descriptive (OLAP) marketing model in SAP HANA: The load times of the corporate marketing model were optimized, making the information available at 6 am at every weekly close. The optimization also enabled a daily data refresh, giving users near-real-time information.
- Project Yanbal Global Analytics: Implementation of the analytical models portfolio for Finance, Sales, and Commercial Support Operations. It was possible to centralize several years of information from other systems alongside the current systems.
- Creation of the Data Integration Management (DIM) team: A team was proposed to centralize and order development across the whole platform, in order to avoid islands of information and promote more formal data governance.
- Establishment of the development process with GitLab: The formal process of the Factory team was designed and instituted to improve version control and code promotion to production using GitLab.
- Data Vault implementation in SAP HANA: The Data Vault methodology was researched and implemented in order to model corporate data historically, avoiding redundant data in 90% of the warehouse.
- Implementation of predictive models with Azure Machine Learning: A hybrid architecture was proposed to reduce the ownership costs of prediction and statistical tools. New models are designed in a SaaS (Azure Machine Learning Studio) and their deployment is supported by Azure PaaS services.
- Implementation of hybrid cloud architectures: Cloud load-automation processes were implemented that consume Azure ML predictive models via web service, and the results are inserted into on-premises servers. The display and self-service layer is integrated with Office 365/SharePoint and Power BI.
- Developed an Azure SQL database for reporting operational data and for application integration of the predictive results about clients that would potentially stop buying from us.
During this time, I participated in the following projects:
Project: Integration Google Analytics and SAP HANA
The company needed to analyze its mobile usability data from Google Analytics and consume it from SAP HANA.
Main tasks executed in the project:
- Found a driver for API consumption in SAP Data Services.
- Developed the ETL for staging the data.
- Developed the model for data delivery.
- Developed a Power BI dashboard for managers.
Automated the whole process from Google Analytics to the SAP HANA in-memory models.
Technologies: SAP HANA, SAP Data Services, Power BI.
Project: Optimization Near Real Time Model in SAP HANA
The company needed to analyze operational sales data in corporate models in HANA, but the models suffered from response-time latency.
Main tasks executed in the project:
- Analyzed the main join bottlenecks in the view model.
- Developed new views to optimize summary data.
- Optimized the latency of the SAP HANA views.
Technologies: SAP HANA, SAP Data Services, Power BI.
Project: DataWarehouse migration to SAP HANA
The company needed to upgrade its Datawarehouse and data delivery, so it invested in SAP HANA to migrate its entire legacy warehouse.
Main tasks executed in the project:
- Documented the data sources and the legacy ETL processes.
- Developed the migration roadmap.
- Assembled the team for model and ETL development.
- Kicked off and monitored the development.
- Incremental go-live with the new model deliveries.
Designed the data model with Data Vault 2.0 and trained the developers in SAP HANA (see the sketch below).
Technologies: SAP HANA, SAP Data Services.
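A minimal sketch of a Data Vault 2.0 hub load, shown in PySpark for illustration rather than the SAP Data Services tooling used in the project; the table and key names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    src = spark.table("staging.customers")  # hypothetical staging table

    # Hash the normalized business key; a hub keeps exactly one row per key.
    hub_rows = (src.select(F.upper(F.trim("customer_bk")).alias("customer_bk"))
                   .distinct()
                   .withColumn("hub_customer_hk", F.md5("customer_bk"))
                   .withColumn("load_dts", F.current_timestamp())
                   .withColumn("record_source", F.lit("legacy_dw")))
    hub_rows.write.mode("append").saveAsTable("dv.hub_customer")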
Project: Analytical Marketing Model in Tabular model
The company needed to consolidate all sales data in detail for marketing analysis of its commercial strategies and promotions. The model needed to be at product, commercial-condition, customer, and order-number detail level, and had to cover all business units, which meant copying data from different Datawarehouses.
Main tasks executed in the project:
- Mapped all the main data mart models for the required level of detail.
- Developed the ETL process to consolidate the data.
- Developed the Analysis Services tabular model.
- Provisioned the Microsoft Analysis Services instance.
- Deployed the model.
- Go-live and user training.
Designed and developed the entire solution.
Technologies: Microsoft Analysis Services, Microsoft SQL Server, Microsoft Integration Services, Microsoft Windows Server, and Microsoft Power BI.
- BI analyst responsible for gathering requirements and for designing and implementing BI solutions with MSSQL, SSIS, SSRS, and SSAS.
- After a few months, my assignments focused exclusively on solutions for the Marketing area.
Achievements:
- Corporate Marketing Model Project: The volume of ad hoc requests for marketing and sales information cross-analysis was reduced by 60% with the implementation of the tabular model in Microsoft SQL Server 2012.
- Project for the Separation of Business Marts in Europe: Separation of the business marts and ETL processes for the Spain and Italy marts so that each could run on a separate SQL Server for country-specific analyses.
- Internal Optimization Project: The redundancy of summary tables was reduced by meeting marketing and sales requirements with a single commercial fact table that served as the source for any report needed in Reporting Services.
- BI Developer in charge of the migration of the SQL Server warehouse for the business units of Spain and Italy.
- Handling of Marketing BI requirements.
Achievements:
- Project for the Separation of Business Marts in Europe: Separation of the business marts and ETL processes for the Spain and Italy marts so that each could run on a separate SQL Server for country-specific analyses.
- Cognos DataManager Developer
- SSIS (Integration Services) Developer
- SSRS (Reporting Services) Developer
- Optimization of Oracle packages and batch work packages.
- Automation of data loading processes with DataManager (Mapfre Project).
- Automation of data loading processes in DataStage (COFIDE Project).
- Development of dashboards and reports with Reporting Services (UNIQUE Project).
- Implementation of the Starsoft ERP.
- In charge of the administration of the systems infrastructure, the ERP, and the IT budget. Implementation of the company's integrated management system, detection and correction of bugs in coordination with the supplier, and customization of reports.
- Management and development of small and medium-scale BI projects, generating automated reports for analysis and controls within the company's processes (SSIS/SSRS/SSAS).
- Administration and tuning of the database in SQL Server.
- Development of small software tools for the automation of business processes in C# and Visual Basic (reservation management system, attendance ticket management system).
- Administration of domain servers and Active Directory (Windows Server 2008 R2).
- Implementation of disaster recovery plans for database information, as well as the configuration of backup routines and optimization practices.
- Training of users in the use of the Starsoft system.
- Training of users in the use of MS Excel.
- In charge of the administration of the systems infrastructure, the ERP, and the IT budget. Implementation of the STARSOFT business management system. Management of computer equipment to meet the demand of mining projects, and support for administrative users.
- Administration of the company's database, with development, administration, and tuning work on the SQL Server 2008 R2 platform.
- Training of users in the use of the Starsoft system.