Commit 1355c449 authored by Roque

Project and tool description on README

parent f57f73de
# ------ EBULK INGESTION-DOWNLOAD TOOL ------
# CONTENT:
# TOOL DESCRIPTION
The ebulk tool is a wrapper for Embulk, an open-source bulk data loader that transfers data between various databases, storage systems, file formats, and cloud services. It supports any kind of input file format, parallel and distributed execution for handling big data sets, transaction control to guarantee all-or-nothing file transfer, and resuming of interrupted operations. Ebulk is as easy to use as git, so big data transfers can be done with very few commands.
# BIG DATA SHARING PLATFORM
Combined with the Wendelin platform, ebulk forms an easy-to-use Data Lake for sharing petabytes of data grouped into data sets. This project offers a solution to the big data sharing problem by addressing the following key points:
- Huge transfer (over slow and unreliable network)
- Huge storage (with little budget)
- Many protocols (S3, HTTP, FTP, etc.)
- Many binary formats (ndarray, video, etc.)
- Trade secret
# PROJECT CONTENT:
- Bash script for ingestion and download
- Embulk plugins
- Configuration files (yml)