DE Jobs

Fusemachines: Sr. Data Engineer (Azure Databricks) in Kathmandu, Nepal

About Fusemachines

Fusemachines is a leading provider of AI strategy, talent, and education services and products. Founded by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 400 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

About the role

This is a full-time position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).

We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, and with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.

Qualification & Experience

  • Must have a full-time Bachelor's degree in Computer Science or a similar field.

  • At least 5 years of experience as a data engineer, with strong expertise in Databricks and Azure (or other hyperscalers) and in DevOps practices.

  • 5+ years of experience with Azure DevOps and GitHub.

  • Proven experience delivering large-scale data and analytics projects and products as a data engineer, including migrations.

  • The following certifications:

      • Databricks Certified Associate Developer for Apache Spark

      • Databricks Certified Data Engineer Associate

      • Microsoft Certified: Azure Fundamentals

      • Microsoft Certified: Azure Data Engineer Associate

      • Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)

Required Skills & Competencies

  • Strong programming skills in one or more languages such as Python (must have) and Scala, with proficiency in writing efficient, optimized code for data integration, migration, storage, processing, and manipulation.

  • Strong understanding and experience with SQL and writing advanced SQL queries.

  • Thorough understanding of big data principles, techniques, and best practices.

  • Strong experience with scalable and distributed data processing technologies such as Spark/PySpark (must have: experience with Azure Databricks), DBT, and Kafka, to handle large volumes of data.

  • Solid Databricks development experience with significant Python, PySpark, Spark SQL, Pandas, NumPy in Azure environment.

  • Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks, using open-source solutions and developing custom integration solutions as needed.

  • Skilled in data integration from different sources such as APIs, databases, flat files, and event streaming.

  • Expertise in data cleansing, transformation, and validation.

  • Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table).

  • Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.

  • Strong experience in designing and implementing data warehousing, data lake, and data lakehouse solutions in Azure and Databricks.

  • Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT).

  • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.

  • Strong knowledge of SDLC tools and technologies such as Azure DevOps and GitHub, including project management software (Jira, Azure Boards, or similar), source code management (GitHub, Azure Repos, or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins, or similar), and binary repository managers (Azure Artifacts or similar).

  • Strong understanding of DevOps principles, including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC, with hands-on Terraform and ARM experience), configuration management, automated testing, performance tuning, and cost management and optimization.

  • Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.

  • Experience in orchestration using technologies like Databricks Workflows and Apache Airflow.

  • Strong knowledge of data structures and algorithms and good software engineering practices.

  • Proven experience migrating from Azure Synapse to Azure Data Lake, or other technologies.

  • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.

  • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.

  • Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent. 

  • Experience with BI solutions, including Power BI, is a plus.

  • Strong written and verbal communication skills to collaborate and articulate complex situations concisely with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.

  • Ability to document processes, procedures, and deployment configurations.

  • Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.

  • Ability to implement security controls and best practices within data and analytics solutions, including working knowledge of, and hands-on experience with, common cloud security vulnerabilities and ways to mitigate them.

  • Self-motivated with the ability to work well in a team, and experienced in mentoring and coaching different members of the team.

  • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.

  • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.

  • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.

Responsibilities

  • Architect, design, develop, test, and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration, and infrastructure, ensuring the scalability, reliability, and performance of data systems, with a focus on Databricks and Azure.

  • Contribute to detailed design, architectural discussions, and customer requirements sessions.

  • Actively participate in the design, development, and testing of big data products.

  • Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform.

  • Migrate from Azure Synapse to Azure Data Lake or other technologies.

  • Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive).

  • Design and implement data models and schemas that support efficient data processing and analytics.

  • Design and develop clear, maintainable code with automated testing using Pytest, unittest, integration tests, performance tests, regression tests, etc.

  • Collaborate with cross-functional teams across Product, Engineering, Data Science, and Analytics to understand data requirements and develop data solutions, including reusable components that meet product deliverables.

  • Evaluate and implement new technologies and tools to improve data integration, processing, storage, and analysis.

  • Evaluate, design, implement, and maintain data governance solutions: cataloging, lineage, data quality, and governance frameworks suitable for a modern analytics solution, considering industry-standard best practices and patterns.

  • Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.

  • Provide guidance and mentorship to junior team members, sharing knowledge and best practices.

  • Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented.

  • Promote and enforce best practices in data engineering, data governance, and data quality.

  • Ensure data quality and accuracy.

  • Design, implement, and maintain data security and privacy measures.

  • Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, being able to work independently as well as collaboratively.

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

Powered by JazzHR