Commit e20b5ac0 by yu

fix the error in README.md

parent 6253b2d8
@@ -4,7 +4,7 @@
 ## Overview
 * **Plugin type**: input
-* **Resume supported**: not yet
+* **Resume supported**: no
 * **Cleanup supported**: yes
 * **Guess supported**: no
@@ -12,10 +12,10 @@
 - **multi_dir**: path prefixes (directory plus file-name prefix) of the files to read (ArrayList<String>, required)
 - **multi_tag**: list of tags (ArrayList<String>, default: `[]`)
-- **order**: description (String, default: `ALPHABETICAL`)
+- **load_order**: order in which the files are loaded (String, default: `ALPHABETICAL`)
 - **chunk_size**: chunk size in bytes (int, default: `10485760` (10 MiB))
-for the order option. There are many alternative:
+Attention: the load_order option accepts the following values:
 ALPHABETICAL (default value)
 ASCEND_MODIFIED
 DESCEND_MODIFIED
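The three load_order values can be read as three ways of sorting the files before upload. Below is a minimal Java sketch of that idea. It assumes that ALPHABETICAL compares file names and that the two *_MODIFIED values compare last-modified timestamps. The `LoadOrderSketch` class, the `Entry` type and the method names are hypothetical and are not taken from the plugin.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: one possible mapping from load_order values to a sort order.
public class LoadOrderSketch {
    enum LoadOrder { ALPHABETICAL, ASCEND_MODIFIED, DESCEND_MODIFIED }

    // A file name with its last-modified time in epoch milliseconds.
    static final class Entry {
        final String name;
        final long modifiedMillis;

        Entry(String name, long modifiedMillis) {
            this.name = name;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Comparator for each order: by name, oldest first, or newest first.
    static Comparator<Entry> comparatorFor(LoadOrder order) {
        switch (order) {
            case ASCEND_MODIFIED:
                return Comparator.comparingLong((Entry e) -> e.modifiedMillis);
            case DESCEND_MODIFIED:
                return Comparator.comparingLong((Entry e) -> e.modifiedMillis).reversed();
            case ALPHABETICAL:
            default:
                return Comparator.comparing((Entry e) -> e.name);
        }
    }

    // Returns the file names sorted according to `order`, leaving the input list untouched.
    static List<String> sortedNames(List<Entry> entries, LoadOrder order) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(comparatorFor(order));
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entry> files = new ArrayList<>();
        files.add(new Entry("sample_02.txt", 3000L));
        files.add(new Entry("sample_01.txt", 2000L));
        files.add(new Entry("sample_03.txt", 1000L));
        for (LoadOrder order : LoadOrder.values()) {
            System.out.println(order + " -> " + sortedNames(files, order));
        }
    }
}
```

A comparator per value keeps the sorting logic in one place. How the real plugin breaks ties or reads timestamps is not shown in this diff.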
@@ -29,34 +29,36 @@ exec:
   min_output_tasks: 1
 in:
   type: filename
-  mulit_dir: ["../sample/sample_","../example/example_"]
+  multi_dir: ["../sample/sample_","../example/example_"]
   multi_tag: ["tag1","tag2"]
-  order: ASCEND_MODIFIED
+  load_order: ASCEND_MODIFIED
   chunk_size: 1000
 ```
-Attention:
-exec:
-min_output_tasks: 1
-is necessary!
-Embulk will optimize the task according the core number which means that, it will re-distribute the task, which will cause
-errors.
-If the multi_dir contains more than one directory. each directory will be treated as a task. the embulk will distribute those tasks to multi
-thread. As each task will one consistently, the files in each directory will be uploading in order.
+**exec:**
+**min_output_tasks: 1**
+This configuration is mandatory!
+Otherwise Embulk re-distributes the tasks according to the number of CPU cores of the PC, which causes errors.
+If multi_dir contains more than one directory, each directory is treated as one task, and Embulk distributes those tasks over multiple
+threads. Each task runs sequentially, so the files in each directory are uploaded in order.
 For example, the upload order may be:
-example1.txt
-sample1.txt
-sample2.txt
-example2.txt
-sample3.txt
+- example1.txt
+- sample1.txt
+- sample2.txt
+- example2.txt
+- sample3.txt
-If you want to upload the directory one by one, you need to configure the
-**max_thread: 1**
+To upload the directories one by one, also set **max_threads: 1** in the `exec` section.
 Then you will get:
-example1.txt
-example2.txt
-sample1.txt
-sample2.txt
-sample3.txt
+- example1.txt
+- example2.txt
+- sample1.txt
+- sample2.txt
+- sample3.txt
 ## Build
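The scheduling described above can be modelled with a plain `ExecutorService`: one task per directory, tasks spread over several threads, and the files inside a task handled in order. This is a hedged sketch of that model only, not of Embulk's executor. `TaskOrderSketch`, `simulate` and the `maxThreads` parameter are hypothetical, and adding a name to a list stands in for uploading a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: one task per directory, run on a pool of maxThreads threads.
public class TaskOrderSketch {
    static List<String> simulate(List<List<String>> directories, int maxThreads) {
        final List<String> uploaded = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (final List<String> files : directories) {
            // Each directory is a single task; its files are handled one after another.
            pool.submit(() -> {
                for (String file : files) {
                    uploaded.add(file); // stands in for "upload this file"
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return new ArrayList<>(uploaded);
    }

    public static void main(String[] args) {
        List<List<String>> dirs = Arrays.asList(
                Arrays.asList("example1.txt", "example2.txt"),
                Arrays.asList("sample1.txt", "sample2.txt", "sample3.txt"));
        // With 2 threads the directories may interleave; within a directory the order is kept.
        System.out.println("2 threads: " + simulate(dirs, 2));
        // With 1 thread the directories are processed one by one, in submission order.
        System.out.println("1 thread:  " + simulate(dirs, 1));
    }
}
```

With a single thread the pool's FIFO queue runs the directory tasks one after another. This reproduces the grouped listing above. With more threads only the order within each directory is guaranteed.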
@@ -65,3 +67,85 @@ java 1.8 is required.
 ```
 $ ./gradlew gem  # -t to watch change of files and rebuild continuously
 ```
+## Usage
+If you are new to Embulk, here are some tips to help you start using this plugin quickly.
+First of all, you need a Java 8 environment on a Linux system.
+You also need an ERP5 instance. [If you do not have one, follow this tutorial to set one
+up](https://nexedi.erp5.net/web_page_module/7056/WebPage_view?ignore_layout:int=1&selection_index=0&portal_status_message=Status%20changed.&selection_name=web_page_module_view_web_page_list_selection&editable_mode:int=1)
+Then you need Embulk on your PC. There is currently a bug when loading the plugin with the newest Embulk, so I recommend
+using Embulk v0.8.27 instead of the newest version.
+To install Embulk:
+```
+curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.bintray.com/embulk/maven/embulk-0.8.27.jar"
+chmod +x ~/.embulk/bin/embulk
+echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
+source ~/.bashrc
+```
+After installing Embulk, build this filename input plugin on your PC:
+```
+git clone -b multiThread https://lab.nexedi.com/caiyu/embulk-input-filename.git
+cd embulk-input-filename
+./gradlew package
+```
+To test the plugin, just run `./gradlew test`.
+This input plugin is meant to be used with the Wendelin output plugin, so build that on your PC too:
+```
+git clone -b java-output https://lab.nexedi.com/caiyu/embulk-output-wendelin.git
+cd embulk-output-wendelin
+./gradlew package
+```
+Now you can use Embulk with these two plugins to upload the data.
+In your working directory, create a YAML file, for example config.yml, and fill in the configuration:
+```yaml
+exec:
+  min_output_tasks: 1
+in:
+  type: filename
+  multi_dir: ["../sample/sample_","../example/example_"]
+  multi_tag: ["tag1","tag2"]
+  load_order: ASCEND_MODIFIED
+  chunk_size: 1000
+out:
+  type: wendelin
+  tag: "weather-cc"
+  streamtool_uri: https://softinstxxxxx.host.vifib.net/erp5/portal_ingestion_policies/weather-cc
+  user: zope
+  password: yourpassword
+```
+Prepare the sample and example data to upload:
+```
+mkdir ../sample
+vim ../sample/sample_01.txt
+vim ../sample/sample_02.txt
+vim ../sample/sample_03.txt
+mkdir ../example
+vim ../example/example_01.txt
+vim ../example/example_02.txt
+```
+Then run Embulk:
+```
+embulk run -L path/to/embulk-input-filename -L path/to/embulk-output-wendelin config.yml
+```
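Instead of creating the sample files in vim, you can create the same layout with a small Java program. This is a hypothetical helper: `SampleDataSketch` and `createSampleData` are not part of the plugin. It gives each file a modification time one second newer than the previous one. That choice is an assumption, made so that `load_order: ASCEND_MODIFIED` has a predictable order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: creates sample/ and example/ under `base` with
// strictly increasing modification times, one second apart.
public class SampleDataSketch {
    static List<Path> createSampleData(Path base) throws IOException {
        String[][] layout = {
            {"sample", "sample_01.txt", "sample_02.txt", "sample_03.txt"},
            {"example", "example_01.txt", "example_02.txt"},
        };
        List<Path> created = new ArrayList<>();
        long t = System.currentTimeMillis() - 60_000L; // start one minute in the past
        for (String[] dir : layout) {
            Path d = Files.createDirectories(base.resolve(dir[0]));
            for (int i = 1; i < dir.length; i++) {
                Path f = d.resolve(dir[i]);
                Files.write(f, ("content of " + dir[i] + "\n").getBytes(StandardCharsets.UTF_8));
                Files.setLastModifiedTime(f, FileTime.fromMillis(t));
                t += 1000L; // next file is one second newer
                created.add(f);
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // ".." matches the paths used in config.yml (../sample/, ../example/).
        for (Path p : createSampleData(Paths.get(".."))) {
            System.out.println("created " + p);
        }
    }
}
```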
@@ -19,15 +19,14 @@ sourceCompatibility = 1.8
 targetCompatibility = 1.8
 dependencies {
-    compile "org.embulk:embulk-core:0.8.27"
-    provided "org.embulk:embulk-core:0.8.27"
-    compile "org.embulk:embulk-standards:0.8.27"
-    provided "org.embulk:embulk-standards:0.8.27"
-    compile "commons-codec:commons-codec:1.9"
+    compile "org.embulk:embulk-core:0.8.29"
+    provided "org.embulk:embulk-core:0.8.29"
+    compile "org.embulk:embulk-standards:0.8.29"
+    provided "org.embulk:embulk-standards:0.8.29"
     // compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
     testCompile "junit:junit:4.+"
-    testCompile "org.embulk:embulk-core:0.8.27:tests"
-    testCompile 'org.embulk:embulk-test:0.8.27'
+    testCompile "org.embulk:embulk-core:0.8.29:tests"
+    testCompile 'org.embulk:embulk-test:0.8.29'
 }
 test {
......
@@ -305,7 +305,7 @@ public class FilenameInputPlugin
     // This method walks through the directory and records the files in it. It compares each filename with lastPath.
     // If we upload the files in ALPHABETICAL order, filenames "smaller than" lastPath are skipped.
     // Be careful: since there are several choices for the order, what "smaller than" means depends on the order!
-    public ArrayList<String> listFiles(PluginTask task,Path pathPrefix,String lastPath,String order)
+    public ArrayList<String> listFiles(PluginTask task,Path pathPrefix,final String lastPath,final String order)
     {
         //Path pathPrefix = Paths.get(task.getPathPrefix()).normalize();
         final Path directory;
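The only code change in this hunk marks `lastPath` and `order` as `final`, and the diff does not say why. One common reason is that the values are read inside an anonymous inner class, for example a `SimpleFileVisitor` passed to `Files.walkFileTree`. Java 7 allows that only for `final` locals; Java 8 also accepts effectively-final ones. The sketch below shows that pattern together with the ALPHABETICAL "smaller than lastPath" rule from the comment. `ListFilesSketch` and its parameters are assumptions, not the plugin's implementation, and so is the choice to leave the other orders unfiltered.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for listFiles; not the plugin's code.
public class ListFilesSketch {
    // Walks `directory`, keeps regular files whose name starts with `prefix`, and,
    // for ALPHABETICAL only, skips paths that sort before `lastPath`.
    public static List<String> listFiles(Path directory, final String prefix,
                                         final String lastPath, final String order) throws IOException {
        final List<String> files = new ArrayList<>();
        Files.walkFileTree(directory, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // prefix, lastPath and order are captured by this anonymous class,
                // which is why they are declared final.
                String name = file.getFileName().toString();
                if (!attrs.isRegularFile() || !name.startsWith(prefix)) {
                    return FileVisitResult.CONTINUE;
                }
                String path = file.toString();
                if ("ALPHABETICAL".equals(order) && lastPath != null && path.compareTo(lastPath) < 0) {
                    return FileVisitResult.CONTINUE; // "smaller than" lastPath: skip
                }
                files.add(path); // other orders are not filtered in this sketch
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listFiles(Paths.get("."), "sample_", null, "ALPHABETICAL"));
    }
}
```

For the *_MODIFIED orders, "smaller than" would need a timestamp comparison rather than a string comparison, as the original comment warns.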
@@ -319,7 +319,6 @@ public class FilenameInputPlugin
             directory = (d == null ? CURRENT_DIR : d);
         }
         //final ImmutableList.Builder<String> builder = ImmutableList.builder();
         final ArrayList<String> filesArray = new ArrayList<String>();
         try {
             log.info("Listing local files at directory '{}' filtering filename by prefix '{}'", directory.equals(CURRENT_DIR) ? "." : directory.toString(), fileNamePrefix);
@@ -380,8 +379,7 @@ public class FilenameInputPlugin
     // End
     // Static method to return a FileTime of a file
     public static FileTime getCreationTime(String filename) throws IOException{
         File file = new File(filename);
         Path p = Paths.get(file.getAbsolutePath());
......
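The body of `getCreationTime` is cut off in this view. For reference, here is a standalone sketch of such a helper using `java.nio.file`; it is not the plugin's actual code. The class name and the extra `getLastModifiedTime` helper are hypothetical. According to the `BasicFileAttributes.creationTime()` documentation, a file system that does not record creation times returns an implementation-specific default, typically the last-modified time or the epoch. The value may therefore not be a true creation time.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical helpers; only the first two lines of getCreationTime appear in the diff.
public class FileTimeSketch {
    // Creation time as reported by the file system (may fall back to another timestamp).
    public static FileTime getCreationTime(String filename) throws IOException {
        File file = new File(filename);
        Path p = Paths.get(file.getAbsolutePath());
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        return attrs.creationTime();
    }

    // Last-modified time, the timestamp named by the *_MODIFIED load orders.
    public static FileTime getLastModifiedTime(String filename) throws IOException {
        return Files.getLastModifiedTime(Paths.get(filename));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : ".";
        System.out.println("created:  " + getCreationTime(name));
        System.out.println("modified: " + getLastModifiedTime(name));
    }
}
```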