FasTensor / Stream Processing

FasTensor (https://github.com/BinDong314/FasTensor) is a generic tensor processing engine that scales from a single node to thousands of nodes on HPC systems. FasTensor supports applications ranging from traditional SQL queries to complex DFT solvers in scientific computing. It has a 1000X performance advantage over MapReduce and Spark in supporting generic data processing functions on tensor structures. In this project, we propose to extend FasTensor with streaming functionality to support online data processing. Specifically, participants will develop a stream endpoint that retrieves live data output from applications such as DAS (Distributed Acoustic Sensing). The stream endpoint maintains a pointer into the data, which may refer to an n-dimensional subset of a tensor.
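To make "maintaining a pointer into the data" concrete, here is a minimal C++ sketch, assuming the tensor grows only along its first dimension (for example, time samples of a DAS recording, with channels along the second dimension). The `StreamCursor` type and its members are hypothetical and not part of FasTensor; the cursor simply stores the start index and extent of the next n-dimensional chunk and advances once that chunk has been fully produced.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cursor (not a FasTensor API) over an n-dimensional tensor that
// grows only along dimension 0, e.g. time samples x channels in DAS data.
// It remembers the sub-array [start, start + count) covered by the next read.
struct StreamCursor {
    std::vector<std::size_t> start;  // first index of the next chunk, per dimension
    std::vector<std::size_t> count;  // extent of each chunk, per dimension

    // True if the next chunk lies entirely within the rows produced so far.
    bool ready(std::size_t available_rows) const {
        return start[0] + count[0] <= available_rows;
    }

    // Exclusive end index of the next chunk along dimension d.
    std::size_t end(std::size_t d) const { return start[d] + count[d]; }

    // Move past the chunk that was just consumed (only dimension 0 advances).
    void advance() { start[0] += count[0]; }
};
```

For example, with chunks of 4 samples by 3 channels (`StreamCursor{{0, 0}, {4, 3}}`) and 10 samples produced so far, the cursor covers rows [0, 4) and [4, 8), then waits at row 8 until at least 12 samples exist.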


  • Topics: FasTensor, Stream Processing
  • Skills: C++, GitHub
  • Difficulty: Difficult
  • Size: Large (350 hours)
  • Mentor: Bin Dong, John Wu

The specific tasks of the project include:

  • Building a mock workflow based on our DAS application (https://github.com/BinDong314/DASSA) to test stream processing. The mock workflow comprises a data producer, which generates DAS data, and a data consumer, which processes the data.
  • Developing a Stream Endpoint (i.e., an I/O driver) that iteratively reads data from a directory whose contents keep growing. The Stream Endpoint essentially consists of open, read, and write functions, plus a pointer that remembers the current file position.
  • Integrating the Stream Endpoint into the FasTensor library.
  • Evaluating the performance of the mock workflow with the new Stream Endpoint.
  • Documenting the execution mechanism.
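The Stream Endpoint task could start from something like the following minimal C++17 sketch, assuming the producer in the mock workflow drops each new chunk of DAS data into a directory as a separate file, with names that sort in production order (e.g., zero-padded sequence numbers). The class `DirStreamEndpoint` and its `Open`/`Read`/`Write`/`Close` methods are hypothetical and do not reproduce FasTensor's actual endpoint interface. Here the "current file pointer" is the set of file names already consumed, so each `Read` returns only data that arrived since the previous call.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical directory-backed stream endpoint (not FasTensor's Endpoint API).
class DirStreamEndpoint {
public:
    // Remember the directory to watch; create it if it does not exist yet.
    bool Open(const fs::path& dir) {
        dir_ = dir;
        std::error_code ec;
        fs::create_directories(dir_, ec);
        return !ec && fs::is_directory(dir_);
    }

    // Producer side: publish one chunk. It is written to "<name>.tmp" first and
    // renamed afterwards, so a consumer never picks up a half-written file.
    bool Write(const std::string& name, const std::vector<char>& bytes) {
        const fs::path tmp = dir_ / (name + ".tmp");
        {
            std::ofstream out(tmp, std::ios::binary);
            out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
            if (!out) return false;
        }  // stream closed and flushed here
        std::error_code ec;
        fs::rename(tmp, dir_ / name, ec);
        return !ec;
    }

    // Consumer side: return the contents of all files not read before, in name
    // order, and record them as consumed (this set is the stream "pointer").
    std::vector<std::vector<char>> Read() {
        std::vector<fs::path> fresh;
        for (const auto& entry : fs::directory_iterator(dir_)) {
            if (!entry.is_regular_file()) continue;
            if (entry.path().extension().string() == ".tmp") continue;  // still being written
            if (seen_.count(entry.path().filename().string()) == 0)
                fresh.push_back(entry.path());
        }
        std::sort(fresh.begin(), fresh.end());
        std::vector<std::vector<char>> chunks;
        for (const auto& p : fresh) {
            std::ifstream in(p, std::ios::binary);
            chunks.emplace_back(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
            seen_.insert(p.filename().string());
        }
        return chunks;
    }

    // Forget the read position; a later Open/Read starts from the beginning.
    void Close() { seen_.clear(); }

private:
    fs::path dir_;
    std::set<std::string> seen_;
};
```

Writing each chunk to a temporary file and renaming it is one common way to signal that a file is complete; it relies on rename being atomic within a single file system, which holds on POSIX systems. The consumer would call `Read` in a polling loop and hand each returned chunk to FasTensor for processing.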
Bin Dong
Research Scientist, Lawrence Berkeley National Laboratory

Bin’s research interests span high-performance computing, big data, and both AI and non-AI techniques.

John Wu
Senior Computer Scientist, Lawrence Berkeley National Laboratory

John is a Senior Computer Scientist at Lawrence Berkeley National Laboratory. He works on data management, data analysis, and scientific computing. His algorithmic research includes feature extraction, indexing techniques, and tensor data processing.