Projects

Analytics Zoo provides a collection of reference user applications and demos, which can be modified or even used off the shelf in real-world applications. Some are listed below.

BigDL allows users to write deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters to process Big Data.

Analytics Zoo seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink and Ray programs into an integrated pipeline, which can transparently scale from laptops to large clusters to process production big data.

It leverages emerging AI technologies (e.g., Ray, hyperparameter optimization, sequence generation models) to automatically generate features, select models and tune hyperparameters for time series prediction in a distributed fashion.

Building a customer service chatbot using NLP technologies (e.g., text classification and text matching models) in Analytics Zoo with the Microsoft Azure China team.

Building an end-to-end product recommendation pipeline (using session-based recommendation models, Analytics Zoo, MLeap, Play Framework, etc.) on AWS with the Office Depot team.

Building an experiment platform that uses both DRL algorithms (e.g., imitation learning, DQN, policy gradient) and computer vision models (e.g., object detection, object tracking, OCR) to play FIFA18.

Recent Publications

BigDL: A Distributed Deep Learning Framework for Big Data

In ACM Symposium on Cloud Computing, SoCC 2019

Build Deep Learning Applications for Big Data Platforms

Tutorial in the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-2019)

Building Deep Learning Applications on Big Data Platforms

Tutorial in the Conference on Computer Vision and Pattern Recognition, CVPR 2018

Experience from Hadoop Benchmarking with HiBench: From Micro-Benchmarks Toward End-to-End Pipelines

In Proceedings of the 2013 Workshop Series on Big Data Benchmarking

HiTune: Dataflow-Based Performance Analysis for Big Data Cloud

In USENIX Annual Technical Conference (ATC), 2011

The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis

In Proceedings of the 26th International Conference on Data Engineering Workshops, ICDEW 2010

Design Patterns for Internet-Scale Services

In Proceedings of the 25th International Conference on Data Engineering Workshops, ICDEW 2009

Latency Hiding in Multi-Threading and Multi-Processing of Network Applications

In the 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007

Automatically Partitioning Packet Processing Applications for Pipelined Architectures

In ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (PLDI)

Contact