Internship Openings

CV0075: Internship - Multimodal Embodied AI
- MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands on experience in using animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
  Required Specific Experience
  - Experience in designing 3D interactive scenes
  - Experience with vision based embodied AI using simulators (implementation on real robotic hardware would be a plus).
  - Experience training large language models on multimodal data
  - Experience with training reinforcement learning algorithms
  - Strong foundations in machine learning and programming
  - Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian
- Apply Now
CV0078: Internship - Audio-Visual Learning with Limited Labeled Data
- MERL is looking for a highly motivated intern to work on an original research project on multimodal learning, such as audio-visual learning, using limited labeled data. A strong background in computer vision and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, continual learning, and large (vision-) language models is an added plus and will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI, and possess solid programming skills in Python and popular deep learning frameworks such as Pytorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
  Required Specific Experience
  - Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI.
  - Knowledge of the latest self-supervised and weakly-supervised learning techniques.
  - Experience with Large (Vision-) Language Models.
  - Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or Tensorflow.
- Research Areas: Computer Vision, Machine Learning, Speech & Audio, Artificial Intelligence
- Host: Moitreya Chatterjee
- Apply Now