Build Loader
To build the loader, run the following command.
> sbt "project loader" "clean" "assembly"
This produces s2graph-loader-assembly-X.X.X-SNAPSHOT.jar under loader/target/scala-2.xx/.
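Because the version and the Scala directory in the jar path vary by build, a script that consumes the jar may prefer to locate it by pattern. A minimal sketch, assuming the sbt command was run from the repository root; the helper name `find_loader_jar` is hypothetical and not part of the project:

```python
from pathlib import Path


def find_loader_jar(repo_root="."):
    """Return the most recently modified loader assembly jar, or None."""
    # The pattern mirrors the output location described above; the Scala
    # directory and the version are matched with wildcards.
    pattern = "loader/target/scala-2.*/s2graph-loader-assembly-*.jar"
    jars = sorted(Path(repo_root).glob(pattern), key=lambda p: p.stat().st_mtime)
    return jars[-1] if jars else None


if __name__ == "__main__":
    jar = find_loader_jar()
    print(jar if jar else "no assembly jar found; run the sbt command first")
```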
Source Data Storage Options
For bulk loading, source data can be either in HDFS or a Kafka queue.
For source data in HDFS
An example run of the following steps is provided in loader/loader.py.
1. subscriber.TransferToHFile
Transfers edges in TSV format directly into HFiles.
The parameters for this job are as follows.
parameter index | note | example |
---|---|---|
0 | input path in HDFS | |
1 | output path in HDFS | |
2 | zkQuorum for the target HBase | |
3 | table name to use in the target HBase | |
4 | dbUrl for s2graph core | |
5 | maxHFilePerRegionServer | |
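Because the job takes its parameters by position, it is easy to pass them in the wrong order. The sketch below assembles the argument list in the index order of the table. The helper `build_transfer_args` and every value in the usage example are hypothetical placeholders, not taken from the project.

```python
def build_transfer_args(input_path, output_path, zk_quorum, table_name,
                        db_url, max_hfile_per_region_server):
    """Return the positional arguments for subscriber.TransferToHFile.

    The order follows the parameter index column (0 through 5).
    """
    args = [
        input_path,                        # 0: input path in HDFS
        output_path,                       # 1: output path in HDFS
        zk_quorum,                         # 2: zkQuorum for the target HBase
        table_name,                        # 3: table name in the target HBase
        db_url,                            # 4: dbUrl for s2graph core
        str(max_hfile_per_region_server),  # 5: maxHFilePerRegionServer
    ]
    # Reject empty values early so a missing parameter does not silently
    # shift the remaining ones to the wrong index.
    for index, value in enumerate(args):
        if not value:
            raise ValueError(f"parameter {index} must not be empty")
    return args


if __name__ == "__main__":
    # All values below are made-up placeholders for illustration only.
    print(build_transfer_args(
        "/tmp/edges.tsv", "/tmp/hfiles", "localhost", "example_table",
        "jdbc:mysql://localhost:3306/graph_dev", 1))
```

Keeping the order in one function means callers pass named values and the positional layout lives in a single place.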
2. distcp the HFiles into the production HBase cluster (optional)
3. chmod the HFiles
4. Complete the bulk load
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles #{pathToHFile} #{hbase table name}
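Steps 2 through 4 can be scripted as plain command lines. The following is a sketch, not the contents of loader/loader.py: it only builds the commands and prints them, without running anything. It assumes `hadoop distcp` and `hadoop fs -chmod -R` are the intended tools for steps 2 and 3. The helper name `build_load_commands`, the permission mode, the paths, and the table name are all hypothetical and must be supplied by the caller.

```python
import shlex


def build_load_commands(hfile_path, table_name, mode, remote_hfile_path=None):
    """Return the command lines for steps 2 to 4 as lists of arguments."""
    commands = []
    target = hfile_path
    if remote_hfile_path is not None:
        # Step 2 (optional): copy the HFiles to the production cluster.
        commands.append(["hadoop", "distcp", hfile_path, remote_hfile_path])
        target = remote_hfile_path
    # Step 3: change permissions on the HFiles. The mode is an assumption
    # left to the caller; the steps above do not specify one.
    commands.append(["hadoop", "fs", "-chmod", "-R", mode, target])
    # Step 4: complete the bulk load with LoadIncrementalHFiles.
    commands.append([
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        target, table_name,
    ])
    return commands


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in build_load_commands("/tmp/hfiles", "example_table", "777"):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings lets the caller hand each command to `subprocess.run` without extra quoting.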