This position offers an exciting opportunity for leadership, academic contribution, and mentorship. Our projects are delivered in partnership with clinical and industry sponsors, and you will lead the data aspects of these initiatives. Given our innovation mandate, you will contribute to academic literature and conferences to share our work with the larger research and innovation community; for example, read more about our Data and Analytics Platform, published in JCO Clinical Cancer Informatics and presented at the American Medical Informatics Association. Across all of these initiatives, this role is responsible for mentoring and supporting interns and junior team members. Finally, the CREATE team highly prioritizes work/life balance and offers excellent benefits, including a defined pension plan.
Data Infrastructure Development, Security and Governance:
1. Design, develop, optimize, and maintain scalable data architectures, pipelines, and workflows to support data ingestion, processing, storage, and retrieval for operational and research projects.
2. Configure and maintain the infrastructure required to support the team’s data operations, including high-performance computing and data storage systems such as databases, data lakes, and data warehouses, in collaboration with infrastructure and operations teams.
3. Ensure data privacy and security by implementing appropriate access controls, encryption, and data anonymization techniques (a minimal sketch follows this list).
4. Develop and enforce data governance policies, standards, and procedures.
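To make item 3 concrete, here is a minimal Python sketch of salted pseudonymization, one common building block of de-identification. It is an illustration only, not a description of the team's actual tooling; the field names are invented and the salt handling is deliberately simplified.

```python
import hashlib
import os

# Hypothetical salt handling; a real deployment would fetch this from a
# secrets manager, never from source code or a default value.
SALT = os.environ.get("PSEUDONYM_SALT", "change-me")

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier (e.g., an MRN) with a salted SHA-256 hash.

    Salting prevents re-identification by dictionary lookup, while the
    deterministic hash keeps the mapping stable for record linkage.
    """
    digest = hashlib.sha256((SALT + identifier).encode("utf-8"))
    return digest.hexdigest()

record = {"mrn": "123456", "diagnosis": "C50.9"}  # invented example record
record["mrn"] = pseudonymize(record["mrn"])
print(record)  # the MRN field is no longer directly identifying
```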
Data Processing, ETL, Data Modeling and Optimization:
1. Develop and maintain robust Extract, Transform, Load (ETL) processes to ensure high-quality data delivery (see the ETL sketch after this list).
2. Transform raw data into a usable format by applying data cleansing, normalization, aggregation, and enrichment techniques.
3. Collaborate with data scientists and analysts to understand data requirements and optimize data structures for analytical purposes.
4. Collaborate with stakeholders to define data engineering requirements, prioritize tasks, and manage project timelines.
5. Document data engineering processes, systems, and workflows for knowledge sharing and future reference.
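As a hedged illustration of items 1 and 2 above, here is a minimal ETL sketch in Python using pandas and SQLite. The file, table, and column names are invented, and a production pipeline would add validation, logging, error handling, and incremental loads.

```python
import sqlite3

import pandas as pd

# Hypothetical source file and target database, for illustration only.
SOURCE_CSV = "lab_results.csv"
TARGET_DB = "warehouse.db"

def extract(path: str) -> pd.DataFrame:
    """Extract: read raw records from a flat file."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: cleanse, normalize, and aggregate the raw data."""
    # Cleansing: drop rows missing the key fields.
    df = df.dropna(subset=["patient_id", "test_code", "value"])
    # Normalization: harmonize unit casing.
    df["unit"] = df["unit"].str.lower()
    # Aggregation: mean value per patient, test, and unit.
    return df.groupby(
        ["patient_id", "test_code", "unit"], as_index=False
    )["value"].mean()

def load(df: pd.DataFrame, db_path: str) -> None:
    """Load: write the curated table to the target store."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("lab_results_summary", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract(SOURCE_CSV)), TARGET_DB)
```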
Qualifications:
• Bachelor’s or Master’s degree in computer science, data science, or a related field.
• Proficiency in working with databases (relational and NoSQL) and data warehousing concepts.
• Knowledge of data modeling, ETL frameworks, and data pipeline architectures.
• Familiarity with cloud platforms like AWS, Azure, or Google Cloud Platform.
• Strong programming skills in languages like Python, SQL, or Scala.
• Experience with data integration tools such as MS SSIS, AWS Glue, or Apache Kafka.
• Understanding of distributed computing principles and frameworks like Apache Hadoop or Apache Spark.
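For a sense of the Spark familiarity the last item refers to, here is a minimal PySpark aggregation. It assumes a local Spark installation, and the data and column names are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Hypothetical in-memory data; a real job would read from a lake or warehouse.
df = spark.createDataFrame(
    [("p1", "HGB", 13.5), ("p1", "HGB", 14.1), ("p2", "WBC", 6.2)],
    ["patient_id", "test_code", "value"],
)

# groupBy/agg is planned lazily and executed in parallel across executors,
# which is what lets the same code scale from a laptop to a cluster.
summary = df.groupBy("patient_id", "test_code").agg(
    F.avg("value").alias("mean_value")
)
summary.show()

spark.stop()
```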