The ENG AI - NLP Platform team maintains the industry's premier natural language processing (NLP) platform for finance. As one of the teams in the CTO-sponsored ENG AI - Platforms group, this team maintains a key part of the foundational toolkit that Bloomberg's AI researchers and engineers use to build innovative AI applications for our products, including news, research, and dialog. This team is also the infrastructure partner of the NLP Engineering Guild, which boasts over 1,000 members.
What We Do
The NLP platform team supports all stages of NLP application development, including:
- NLP Model Training Frameworks: We maintain LIT [1], a suite of commonly used, well-tested modules built on top of popular open-source projects such as PyTorch and Hugging Face Transformers. It supports all major NLP tasks, including classification, tagging, and representation learning.
- NLP Inference Frameworks: We also maintain libnlp [2], a widely used C++/Python NLP library that contains best-in-class NLP algorithms and gives users a framework for constructing pipelines of successive NLP tasks. libnlp can run arbitrary PyTorch and ONNX models, which makes it very flexible, and it provides sensible defaults for common tasks. libnlp is used in over a hundred projects across Bloomberg [3], making it one of the most heavily relied-upon ML libraries in the company. We also maintain libnlpserver, a custom KServe-based framework that makes it easy to serve libnlp pipelines on the Data Science Platform.
- Out-of-the-box NLP Models: Finally, the team builds state-of-the-art NLP models for general use, with Bloomberg data and applications in mind. These include pretrained language models, embeddings, and named entity recognition (NER) and disambiguation (NED) models [4] for multiple languages and domains. These models and their variants are used in a wide range of applications, including news, research, and IB/MSG. We also maintain pipelines for training and deploying models regularly, and we provide several interfaces to these models through KServe, BAS, libnlp, and our Python interface (pyNED).
Where We're Going
- Continued development of the Python model training library by implementing new functionality and integrating open-source and Bloomberg projects
- Simplification of the libnlp build system and splitting it into multiple functional components
- Support for standardized NLP model inference by benchmarking open-source solutions and adapting best practices to Bloomberg's serving ecosystem
- Development of NLP inference pipelines in libnlp, with a stronger focus on our growing Python user base
- Continuous training and deployment workflows for all our models
What We Need From You
- Passion for building and maintaining widely used software libraries
- High level of proficiency in Python and C++
- Knowledge of Bloomberg infrastructure (Devise, DPKG, BAS, BCS/BCOS, DSP, etc.)
- Interest in open-source projects (HF transformers, PyTorch, Kubernetes, Triton, Airflow, etc.)
Get In Touch!
We're growing the team, and if you'd like to be involved in the future of NLP at Bloomberg, we'd love to hear from you! Please contact any of the following:
- Shefaet Rahman krahman7@bloomberg.net
- Anju Kambadur pkambadur@bloomberg.net
- Irinka Toidze itoidze@bloomberg.net
References
[1] http://bburl/litdocs
[2] http://bburl/libnlpdocs
[3] As of June 2022, libnlp was deployed on over 100 RHST clusters and depended on by over 70 DPKGs, and the library was initialized over 4 million times during the month.
[4] http://bburl/neddocs