HBase and Phoenix Secondary Indexes: Configuring Global and Local Indexing

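HBase has only one index: the rowKey, sorted in lexicographic order. Queries by rowKey are fast, but a query on anything else falls back to a filter that scans the whole table, which is very slow. In my test, querying 80,000 rows of simple data took 268 seconds. Phoenix provides secondary indexes to solve this problem.
Phoenix offers two kinds of index: Global Indexing, which favors read performance, and Local Indexing, which favors write performance. Below I try each of them briefly.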
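
To make the difference between a rowKey lookup, a filter scan, and an indexed lookup concrete, here is a minimal toy sketch in Python. It is not how HBase or Phoenix is implemented; the table contents and the function names (get_by_row_key, scan_by_name, get_by_name_indexed) are made up for illustration only.

```python
import bisect

# Toy stand-in for an HBase table: rows kept sorted by row key (hypothetical data).
rows = [
    ("id001", {"name": "acme", "address": "1 Main St"}),
    ("id002", {"name": "globex", "address": "2 Side St"}),
    ("id003", {"name": "initech", "address": "3 Back St"}),
]
row_keys = [key for key, _ in rows]


def get_by_row_key(key):
    """Binary search over the sorted row keys, like a rowKey lookup."""
    i = bisect.bisect_left(row_keys, key)
    if i < len(row_keys) and row_keys[i] == key:
        return rows[i][1]
    return None


def scan_by_name(name):
    """Filter-style full scan: every row is examined."""
    return [key for key, cols in rows if cols["name"] == name]


# Conceptually, a secondary index is a second sorted structure keyed on the
# indexed column that points back to the row key.
name_index = sorted((cols["name"], key) for key, cols in rows)
name_index_keys = [name for name, _ in name_index]


def get_by_name_indexed(name):
    """Binary search over the sorted index instead of scanning all rows."""
    lo = bisect.bisect_left(name_index_keys, name)
    hi = bisect.bisect_right(name_index_keys, name)
    return [key for _, key in name_index[lo:hi]]
```

Both scan_by_name and get_by_name_indexed return the same row keys; the indexed version just avoids touching every row, at the cost of keeping a second structure up to date on every write.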

Add the following configuration to hbase-site.xml on the master node of the HBase cluster:

<property>
    <name>hbase.master.loadbalancer.class</name>
    <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
</property>
<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
</property>
<property>
    <name>hbase.region.server.rpc.scheduler.factory.class</name>
    <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
    <name>hbase.rpc.controllerfactory.class</name>
    <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>

Add the following configuration to hbase-site.xml on every regionserver node of the HBase cluster:

<property>
    <name>hbase.coprocessor.regionserver.classes</name>
    <value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>

With only the settings above, creating an index may fail with an error like this:

Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=MY_INDEX (state=42Y88,code=1029)

So, following the error message, also add the following configuration to hbase-site.xml on every regionserver node of the HBase cluster:

<property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
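
Since the error above comes from a property missing on a regionserver, it can help to check each node's hbase-site.xml before restarting. The following is a minimal sketch using only the Python standard library. The property names and values are the regionserver ones from this post; the helper names (read_properties, missing_properties) and the SAMPLE text are hypothetical, and the location of the real file depends on your installation.

```python
import xml.etree.ElementTree as ET

# Regionserver property names and values copied from the configuration in this post.
REQUIRED_REGIONSERVER_PROPS = {
    "hbase.coprocessor.regionserver.classes":
        "org.apache.hadoop.hbase.regionserver.LocalIndexMerger",
    "hbase.regionserver.wal.codec":
        "org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec",
}


def read_properties(xml_text):
    """Parse hbase-site.xml text into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_properties(xml_text, required):
    """Return the required property names that are absent or have another value."""
    props = read_properties(xml_text)
    return sorted(name for name, value in required.items()
                  if props.get(name) != value)


# Hypothetical file content that lacks the WAL codec property.
SAMPLE = """<configuration>
<property>
    <name>hbase.coprocessor.regionserver.classes</name>
    <value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>
</configuration>"""

print(missing_properties(SAMPLE, REQUIRED_REGIONSERVER_PROPS))
```

To check a real node, read the file's text with open(path).read() and pass it in place of SAMPLE. An exact value match is a simplification: if a property already lists several comma-separated classes, this check would report it as missing.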

Restart the HBase cluster:

xyhadoop/hbase-1.0.1.1/bin/stop-hbase.sh
xyhadoop/hbase-1.0.1.1/bin/start-hbase.sh

Test creating a Phoenix global index and a local index:

create table company(id varchar primary key, name varchar, address varchar);
create index global_index on company(name);
create local index local_index on company(name);
!indexes company

Output similar to the following means it worked (the index in this sample output is named MY_INDEX):

Connected to: Phoenix (version 4.7)
Driver: PhoenixEmbeddedDriver (version 4.7)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
115/115 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:localhost> !indexes company
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | NON_UNIQUE  | INDEX_QUALIFIER  | INDEX_NAME  | TYPE  | ORDINAL_POSITION  | COLUMN_N |
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+----------+
|            |              | COMPANY     | true        |                  | MY_INDEX    | 3     | 1                 | _INDEX_I |
|            |              | COMPANY     | true        |                  | MY_INDEX    | 3     | 2                 | 0:NAME   |
|            |              | COMPANY     | true        |                  | MY_INDEX    | 3     | 3                 | :ID      |
+------------+--------------+-------------+-------------+------------------+-------------+-------+-------------------+----------+
0: jdbc:phoenix:localhost>

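Performance comparison

I imported 2.5 million rows. With the global index, writes were slower and reads took about 0.1 seconds; with the local index, writes were faster and reads took about 4 seconds. So reads differ by about 40x between the global index and the local index. The difference in write time was much smaller than that, and both imports took a long time.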
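
The 40x figure is simply the ratio of the two read times. For anyone repeating the comparison, here is a minimal timing sketch. The helper names (time_call, slowdown) are hypothetical, and run_query stands for whatever callable issues your query; it is not part of Phoenix.

```python
import statistics
import time


def time_call(run_query, repeats=5):
    """Return the median wall-clock seconds of run_query() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(slow_seconds, fast_seconds):
    """How many times slower the slow measurement is than the fast one."""
    return slow_seconds / fast_seconds


# Read times reported above: about 4 s (local index) vs about 0.1 s (global index).
ratio = slowdown(4.0, 0.1)
print(round(ratio))
```

Taking the median of several runs reduces the effect of one-off delays such as cold caches on the first query.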

References:
http://www.aboutyun.com/forum.php?mod=viewthread&tid=15570
http://phoenix.apache.org/secondary_indexing.html
http://www.icaijing.org/hot/article4940159