Josephine P.
Paris (75) 700 €/day Experience: 4-6 years Responds within 4h
Data scientist MySQL Python aws azure gcp airflow LLM RAG Computer Vision Deep Learning
In a few words
Overview: A graduate engineer from Mines de Douai, I worked at Thales as a Data Scientist for 1 year, then at Accenture for 2.5 years as a consultant (in both technical and strategic roles), working as a Data Scientist and Data Engineer.
I have been working as a freelancer since September 2023 on data science and data engineering assignments.
Work location: Paris or remote
Daily rate (TJM): 670
References
Experiences :
Cadusha - Tech Lead / Data Scientist Freelance - Ongoing
Development of an AI-driven product matching system for large-scale tenders, processing millions of vendor catalog entries.
Development and Deployment of a RAG-based Product Matching System
Design and implementation of a product matching system leveraging OpenAI’s text-embedding-ada-002 model and Pinecone for vector search.
Implementation of a hybrid retrieval-ranking approach, where GPT-4 refines and ranks the top retrieved matches for better accuracy.
Deployment of a RAG pipeline combining semantic search and LLM-based ranking
Strategic Technical Leadership
Provided technical and strategic guidance on the selection of technologies and system architecture to ensure scalability, efficiency, and maintainability.
Main Technologies : Python, OpenAI API (for embedding), Pinecone, FAISS (vector database indexing), RAG (Retrieval-Augmented Generation), NLP, FastAPI (backend framework), React (frontend framework)
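The retrieve-then-rerank idea described in this entry can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for an embedding model, the in-memory list stands in for a vector index, and `toy_judge` is a hypothetical placeholder for an LLM relevance call. It is not the system's actual code.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, catalog: list[str], k: int = 3) -> list[str]:
    """Stage 1: return the k catalog entries most similar to the query."""
    q = embed(query)
    return sorted(catalog, key=lambda item: cosine(q, embed(item)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], judge) -> list[str]:
    """Stage 2: reorder the shortlist by a relevance score returned by `judge`."""
    return sorted(candidates, key=lambda item: judge(query, item), reverse=True)


def toy_judge(query: str, item: str) -> float:
    """Hypothetical placeholder for an LLM call: share of query terms found in the item."""
    terms = set(query.lower().split())
    return len(terms & set(item.lower().split())) / len(terms)


# Hypothetical catalog entries, invented for the example.
catalog = [
    "blue nitrile gloves size M box of 100",
    "latex gloves size L box of 50",
    "stainless steel surgical scissors",
    "nitrile gloves size M box of 200",
]
query = "nitrile gloves size M"
shortlist = retrieve(query, catalog, k=3)
ranked = rerank(query, shortlist, toy_judge)
```

Keeping the two stages as separate functions means the cheap similarity search narrows millions of entries to a shortlist, and only that shortlist is sent to the slower, more expensive ranking step.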
Arcascience - Data Engineer Freelance - 5 months
Implementation of a Data Retrieval Pipeline
Design & development of a daily data ingestion pipeline that collects and processes data from various medical data sources (e.g., MedLine, PubMed, clinicalTrials.gov) to build a robust dataset of medical research publications.
Implementation of an Annotation Pipeline
Design & development of a data processing pipeline to extract features from raw XML data and convert them into structured CSV files. Implemented NER models.
Implementation of CI/CD and Automated Testing
Implemented CI/CD pipelines to ensure the reliability of data workflows : developed unit and integration tests to validate pipeline processes (including file extraction validation and CSV content verification after XML parsing).
Automated test execution for each Merge Request using GitLab CI/CD.
Main Technologies : Python (Pandas, PySpark), Airflow DAGs, Airflow Server (for Orchestration & Scheduling), GitLab (versioning), Docker, Backblaze (for data storage), PyTest, GitLab CI/CD
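As an illustration of the XML-to-CSV step and the kind of content check mentioned in this entry, here is a minimal sketch using only the Python standard library. The sample document is a heavily simplified, PubMed-like record invented for the example, and the chosen fields (`pmid`, `title`, `year`) and function names are assumptions, not the project's actual schema or code.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["pmid", "title", "year"]  # assumed output columns


def xml_to_rows(xml_text: str) -> list[dict]:
    """Extract one flat record per <PubmedArticle> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        rows.append({
            "pmid": article.findtext(".//PMID", default=""),
            "title": article.findtext(".//ArticleTitle", default=""),
            "year": article.findtext(".//PubDate/Year", default=""),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample input, simplified for the example.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <ArticleTitle>Example title, with a comma</ArticleTitle>
        <Journal><JournalIssue><PubDate><Year>2020</Year></PubDate></JournalIssue></Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def test_csv_content_after_xml_parsing():
    """PyTest-style check: the CSV round-trips to the values found in the XML."""
    csv_text = rows_to_csv(xml_to_rows(SAMPLE))
    parsed = list(csv.DictReader(io.StringIO(csv_text)))
    assert parsed == [
        {"pmid": "12345", "title": "Example title, with a comma", "year": "2020"}
    ]
```

A test of this shape can run on every Merge Request: it reads the generated CSV back and compares it with the expected values, so quoting or parsing regressions (such as the comma in the title) are caught early.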
Sebia - Data Scientist / Data Engineer Freelance - 8 months
Implementation of an Automatic Classification Model for Medical Diagnoses Data based on Blood Analyses
Types of processed data : clinical data and time-series data (1-dimensional images) from in vitro diagnostic devices based on capillary electrophoresis (analysis of serum blood samples)
Implementation of deep learning model architectures (CNN type) and development of a cascading multi-model architecture (combining random forest classifiers & customized Deep Learning architectures)
Main technologies : Python (Pandas, Scikit-learn, TensorFlow, Keras), Jupyter Notebook
Automated Data Pipeline and Machine Learning Deployment on API
Development and deployment of a machine learning model API using Flask on Google App Engine.
Integration of the API with BigQuery to allow real-time querying of the patient database.
API design to accept patient ID and blood analysis date as inputs and return the corresponding data row from BigQuery.
Main Technologies : GCS, BigQuery (SQL), Google App Engine, Flask, Google Looker Studio, Google Cloud Composer (Airflow), Git (GitHub Actions)
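The lookup by patient ID and analysis date can be sketched as input validation plus a parameterized query. The sketch below is standard-library only and stops before any network call. The table name, column names, and the function `build_row_query` are hypothetical, since the profile does not give them. The `@name` placeholders follow BigQuery's named query parameter syntax, and in a real handler the returned values would be passed to the BigQuery client as query parameters rather than interpolated into the SQL string.

```python
from datetime import date

# Hypothetical table name; the real dataset and columns are not given in the profile.
TABLE = "my_project.lab.blood_analyses"


def build_row_query(patient_id: str, analysis_date: str) -> tuple[str, dict]:
    """Validate the two inputs and return a parameterized query with its parameters."""
    if not patient_id or not patient_id.isalnum():
        raise ValueError("patient_id must be a non-empty alphanumeric string")
    # Raises ValueError if the date is not in ISO format (YYYY-MM-DD).
    parsed_date = date.fromisoformat(analysis_date)
    sql = (
        f"SELECT * FROM `{TABLE}` "
        "WHERE patient_id = @patient_id AND analysis_date = @analysis_date "
        "LIMIT 1"
    )
    return sql, {"patient_id": patient_id, "analysis_date": parsed_date}


sql, params = build_row_query("P001", "2024-01-31")
```

Separating query construction from the web handler keeps user input out of the SQL text and makes the validation logic testable without a database connection.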
Accenture - Data Scientist / Data Engineer Consultant (2 years and 5 months)
∙ Data Scientist / Engineer for an association that helps people enter the labor market - 5 months
o Chatbot and API refactoring
o Social Media Web Scraping
o Modification of the similarity computation and the databases so that the chatbot returns inclusive answers
o Main technologies : Python (BeautifulSoup, Selenium), MySQL, Docker, AWS Elastic Beanstalk, AWS Lambda, AWS EC2, AWS CloudWatch, Azure DevOps, Postman
∙ Data Scientist for an international Oil & Gas company - Robot pressure gauge detection and reading - 5 months
o Dataset generation from scratch leveraging 3D models
o Implementation, Training & Testing of the model
o Reading of pressure values
o Main technologies : TensorFlow, Data Visualization (Matplotlib, Seaborn), Google OCR API
∙ Python Programmer for Food International Co. - Constrained optimization problem - 4 months
o Improvement of a constraint programming solution using Google OR-Tools
o Translation of the business needs into a mathematical problem with constraints
o Industrialization of the execution of optimization algorithms on Databricks Jobs (distributed computing)
o Main technologies : Databricks Jobs, Google OR-Tools with CP-SAT solver, Python
∙ Tech Lead & Data Engineer - Development of a cloud-hosted call center for a set of clients - 5 months
o Development of a call center entirely hosted on the cloud
o Main technologies : AWS Connect / DynamoDB / Lambda / SES, Transmission of calls to a CRM (Salesforce)
∙ Data Engineer for a food waste reduction project in company canteens - 2 months
o Food stock forecasting (food quantity detection, dish recognition)
o Winner of the Accenture Global Innovation Contest 2023 (>500 applicants) at Gallia level (France, Benelux, Belgium)
∙ Data Scientist - Data generation and scenario simulation for a set of clients - 6 months
o Generation of energy consumption data from industrial robots
o Simulation of data retrieval from a warehouse in the cloud
o Automated cross-referencing of consumption data with energy consumption cost APIs
o Display of consumption and cost dashboards, to help decision-makers reduce energy consumption in an industrial warehouse.
∙ Project Manager of the startup SOS Forest - Internal Entrepreneurial Project - 17 months
o Product :
o Early forest fire detection with an AI trained on synthetic images
o Regular retraining of the AI (MLOps) on a database continuously enriched with new images (synthetic and real) coming from different actors (collaborative database)
o On-edge integration of the AI on an energy-autonomous system, equipped with a camera, solar panel, 4G module, solar charge controller, battery, SIM card …
o Automatic alerts sent to firefighters
o Winner of the Accenture Global Innovation Contest 2022 (50,000 applicants) at European level
o Fundraising within Accenture to prototype and deploy the product
o Management of an international team of 5 people with a 200K funding for 10 months
o Integration in the Accenture incubator for 6 months
o Deployment of three prototype cameras in the south of France during summer 2022
o Development of partnerships with a fire department (France), a forestry operation (France), and a camera network (West USA) during summer 2023
o Main technologies : Open-source neural network (YOLO architecture), Synthetic image generation and labeling (Unity, Omniverse), Generative AI (Stable Diffusion), MLOps (Azure ML), API deployment to host detection model (Azure App Service, Postman), Real-time interactions with deployed IoT (Azure IoT Hub, Azure Functions, VPNs), Integration on Raspberry Pi, Energy optimization process (solar panel, solar charge controller), Project Monitoring (Azure DevOps)
Thales France - Deep Learning Internship (6 months)
∙ Detection of anomalies in an aircraft satcom system from data log files, with Python
o Regression Neural Networks (TensorFlow)
o Decision Trees (Scikit-Learn, Random Forest)
Thales France - Data capitalization Internship (4 months)
∙ Modeling of an engineering portal for project monitoring with FreePlane (Software)
LeTROT Paris - Human resources assistant (1 month)
∙ Payroll management, monthly preparation of pay slips, processing of social security declarations
Projects :
Object Detection & Automatic Alert on Raspberry Pi
o Extraction of labeled images of a specific pet with the MobileNet V2 model
o Transfer learning on the MobileNet V2 architecture for the detection of a specific pet
o Integration of the model on a Raspberry Pi 4
o Real-time detection and alerting of the owner upon detection
o Main technologies : TensorFlow, Twilio
Video game development
o Development of a two-dimensional (2D) puzzle platform game set in a three-dimensional (3D) world with Unity
Detection and classification of malicious network packets (SQL injection, XSS, Brute Force…)
Publications :
Medium and Analytics Vidhya publications
Visualize the gradient descent of a cost function with its level circles - Python
Scrape an online newspaper and display the hot topics in a Word Cloud - Python
Python application with a tkinter graphical interface
TLE processing: Cartesian position and Velocity vectors of a satellite at a given time with Python
Education
2018 - 2021 : French Grande Ecole Mines de Douai - Master's degree in Data Science
2015 - 2018 : Claude Bernard Preparatory Classes - Bachelor's degree in mathematics
2021 : AWS - Cloud Practitioner Certification
2019 : Stanford University - Machine Learning Certification
2018 : Centrale Lille - Project Management Certification