PostgreSQL Programmer's Guide

Chapter 6. Extending SQL: Operators

Postgres supports left unary, right unary and binary operators. Operators can be overloaded; that is, the same operator name can be used for different operators that have different numbers and types of arguments. If there is an ambiguous situation and the system cannot determine the correct operator to use, it will return an error. You may have to typecast the left and/or right operands to help it understand which operator you meant to use.

Every operator is "syntactic sugar" for a call to an underlying function that does the real work; so you must first create the underlying function before you can create the operator. However, an operator is not merely syntactic sugar, because it carries additional information that helps the query planner optimize queries that use the operator. Much of this chapter will be devoted to explaining that additional information.

Here is an example of creating an operator for adding two complex numbers. We assume we've already created the definition of type complex. First we need a function that does the work; then we can define the operator:

CREATE FUNCTION complex_add(complex, complex)
    RETURNS complex
    AS '$PWD/obj/complex.so'
    LANGUAGE 'c';

CREATE OPERATOR + (
    leftarg = complex,
    rightarg = complex,
    procedure = complex_add,
    commutator = +
);

Now we can do:

SELECT (a + b) AS c FROM test_complex;

+----------------+
|c               |
+----------------+
|(5.2,6.05)      |
+----------------+
|(133.42,144.95) |
+----------------+

We've shown how to create a binary operator here. To create unary operators, just omit one of leftarg (for left unary) or rightarg (for right unary). The procedure clause and the argument clauses are the only required items in CREATE OPERATOR. The COMMUTATOR clause shown in the example is an optional hint to the query optimizer. Further details about COMMUTATOR and other optimizer hints appear below.
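
As a minimal sketch of the unary case, the following defines a left unary operator by giving only rightarg. The function int4_twice and the operator name ## are hypothetical names chosen for illustration; they are not part of Postgres, and the function is written in SQL only to keep the example self-contained.

```sql
-- Hypothetical helper function: returns twice its argument.
CREATE FUNCTION int4_twice(int4)
    RETURNS int4
    AS 'SELECT $1 * 2 AS result'
    LANGUAGE 'sql';

-- leftarg is omitted, so the single operand is written after the operator.
CREATE OPERATOR ## (
    rightarg = int4,
    procedure = int4_twice
);

-- The operator call is shorthand for int4_twice(21).
SELECT ## 21 AS doubled;
```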

Operator Optimization Information

Author: Written by Tom Lane.

A Postgres operator definition can include several optional clauses that tell the system useful things about how the operator behaves. These clauses should be provided whenever appropriate, because they can make for considerable speedups in execution of queries that use the operator. But if you provide them, you must be sure that they are right! Incorrect use of an optimization clause can result in backend crashes, subtly wrong output, or other Bad Things. You can always leave out an optimization clause if you are not sure about it; the only consequence is that queries might run slower than they need to.

Additional optimization clauses might be added in future versions of Postgres. The ones described here are all the ones that release 6.5 understands.

COMMUTATOR

The COMMUTATOR clause, if provided, names an operator that is the commutator of the operator being defined. We say that operator A is the commutator of operator B if (x A y) equals (y B x) for all possible input values x,y. Notice that B is also the commutator of A. For example, operators '<' and '>' for a particular datatype are usually each other's commutators, and operator '+' is usually commutative with itself. But operator '-' is usually not commutative with anything.

The left argument type of a commuted operator is the same as the right argument type of its commutator, and vice versa. So the name of the commutator operator is all that Postgres needs to be given to look up the commutator, and that's all that need be provided in the COMMUTATOR clause.

When you are defining a self-commutative operator, you simply name the operator itself in its own COMMUTATOR clause, as the '+' example above does. When you are defining a pair of commutative operators, things are a little trickier: how can the first one to be defined refer to the other one, which you haven't defined yet? There are two solutions to this problem:

One way is to omit the COMMUTATOR clause in the first operator that you define, and then provide one in the second operator's definition. Since Postgres knows that commutative operators come in pairs, when it sees the second definition it will automatically go back and fill in the missing COMMUTATOR clause in the first definition.
The other, more straightforward way is just to include COMMUTATOR clauses in both definitions. When Postgres processes the first definition and realizes that COMMUTATOR refers to a non-existent operator, the system will make a dummy entry for that operator in the system's pg_operator table. This dummy entry will have valid data only for the operator name, left and right argument types, and result type, since that's all that Postgres can deduce at this point. The first operator's catalog entry will link to this dummy entry. Later, when you define the second operator, the system updates the dummy entry with the additional information from the second definition. If you try to use the dummy operator before it's been filled in, you'll just get an error message. (Note: this procedure did not work reliably in Postgres versions before 6.5, but it is now the recommended way to do things.)
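
The second approach might look like the sketch below, in which both definitions carry a COMMUTATOR clause. The functions int4_before and int4_after and the operator names <<< and >>> are hypothetical, chosen so that they do not collide with the built-in int4 comparison operators.

```sql
-- Hypothetical helper functions wrapping the built-in int4 comparisons.
CREATE FUNCTION int4_before(int4, int4)
    RETURNS bool
    AS 'SELECT $1 < $2 AS result'
    LANGUAGE 'sql';

CREATE FUNCTION int4_after(int4, int4)
    RETURNS bool
    AS 'SELECT $1 > $2 AS result'
    LANGUAGE 'sql';

-- >>> does not exist yet, so a dummy pg_operator entry is made for it.
CREATE OPERATOR <<< (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_before,
    commutator = >>>
);

-- This definition fills in the dummy entry created above.
CREATE OPERATOR >>> (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_after,
    commutator = <<<
);
```

The pair is a valid commutator pair because (x <<< y) and (y >>> x) both reduce to x < y.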

NEGATOR

The NEGATOR clause, if provided, names an operator that is the negator of the operator being defined. We say that operator A is the negator of operator B if both return boolean results and (x A y) equals NOT (x B y) for all possible inputs x,y. Notice that B is also the negator of A. For example, '<' and '>=' are a negator pair for most datatypes. An operator can never validly be its own negator.

Unlike COMMUTATOR, a pair of unary operators could validly be marked as each other's negators; that would mean (A x) equals NOT (B x) for all x, or the equivalent for right-unary operators.

An operator's negator must have the same left and/or right argument types as the operator itself, so just as with COMMUTATOR, only the operator name need be given in the NEGATOR clause.

Providing NEGATOR is very helpful to the query optimizer since it allows expressions like NOT (x = y) to be simplified into x <> y. This comes up more often than you might think, because NOTs can be inserted as a consequence of other rearrangements.

Pairs of negator operators can be defined using the same methods explained above for commutator pairs.
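
For instance, a negator pair could be declared as in this sketch, with each definition naming the other. The functions int4_same and int4_differs and the operator names === and !== are hypothetical stand-ins for an equality and inequality operator on some type.

```sql
-- Hypothetical helper functions wrapping the built-in int4 operators.
CREATE FUNCTION int4_same(int4, int4)
    RETURNS bool
    AS 'SELECT $1 = $2 AS result'
    LANGUAGE 'sql';

CREATE FUNCTION int4_differs(int4, int4)
    RETURNS bool
    AS 'SELECT $1 <> $2 AS result'
    LANGUAGE 'sql';

-- Each operator is its own commutator and the other's negator.
CREATE OPERATOR === (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_same,
    commutator = ===,
    negator = !==
);

CREATE OPERATOR !== (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_differs,
    commutator = !==,
    negator = ===
);
```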

RESTRICT

The RESTRICT clause, if provided, names a restriction selectivity estimation function for the operator (note that this is a function name, not an operator name). RESTRICT clauses only make sense for binary operators that return boolean. The idea behind a restriction selectivity estimator is to guess what fraction of the rows in a table will satisfy a WHERE-clause condition of the form

    		field OP constant

for the current operator and a particular constant value. This assists the optimizer by giving it some idea of how many rows will be eliminated by WHERE clauses that have this form. (What happens if the constant is on the left, you may be wondering? Well, that's one of the things that COMMUTATOR is for...)

Writing new restriction selectivity estimation functions is far beyond the scope of this chapter, but fortunately you can usually just use one of the system's standard estimators for many of your own operators. These are the standard restriction estimators:

	eqsel		for =
	neqsel		for <>
	intltsel	for < or <=
	intgtsel	for > or >=

It might seem a little odd that these are the categories, but they make sense if you think about it. '=' will typically accept only a small fraction of the rows in a table; '<>' will typically reject only a small fraction. '<' will accept a fraction that depends on where the given constant falls in the range of values for that table column (which, it just so happens, is information collected by VACUUM ANALYZE and made available to the selectivity estimator). '<=' will accept a slightly larger fraction than '<' for the same comparison constant, but they're close enough to not be worth distinguishing, especially since we're not likely to do better than a rough guess anyhow. Similar remarks apply to '>' and '>='.

You can frequently get away with using either eqsel or neqsel for operators that have very high or very low selectivity, even if they aren't really equality or inequality. For example, the regular expression matching operators (~, ~*, etc) use eqsel on the assumption that they'll usually only match a small fraction of the entries in a table.
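
As an illustration of borrowing eqsel for an operator that is not really equality, consider this sketch. The function int4_same_magnitude and the operator name |=| are hypothetical; the assumption is that, like the regular expression operators, this operator matches only a small fraction of rows.

```sql
-- Hypothetical helper: true when the arguments are equal or opposite in sign.
CREATE FUNCTION int4_same_magnitude(int4, int4)
    RETURNS bool
    AS 'SELECT ($1 = $2 OR $1 = 0 - $2) AS result'
    LANGUAGE 'sql';

-- Not a true equality operator, but expected to be about as selective,
-- so the standard equality estimator is reused.
CREATE OPERATOR |=| (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_same_magnitude,
    restrict = eqsel
);
```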

JOIN

The JOIN clause, if provided, names a join selectivity estimation function for the operator (note that this is a function name, not an operator name). JOIN clauses only make sense for binary operators that return boolean. The idea behind a join selectivity estimator is to guess what fraction of the rows in a pair of tables will satisfy a WHERE-clause condition of the form

                table1.field1 OP table2.field2

for the current operator. As with the RESTRICT clause, this helps the optimizer very substantially by letting it figure out which of several possible join sequences is likely to take the least work.

As before, this chapter will make no attempt to explain how to write a join selectivity estimator function, but will just suggest that you use one of the standard estimators if one is applicable:

	eqjoinsel	for =
	neqjoinsel	for <>
	intltjoinsel	for < or <=
	intgtjoinsel	for > or >=
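
A definition that supplies both estimators might look like the following sketch. The function text_eq_nocase and the operator name ~== are hypothetical; the operator behaves like equality, so the '=' entries from the two lists are used. An ordering-style operator would instead name the '<' or '>' entries.

```sql
-- Hypothetical helper: case-insensitive text comparison.
CREATE FUNCTION text_eq_nocase(text, text)
    RETURNS bool
    AS 'SELECT lower($1) = lower($2) AS result'
    LANGUAGE 'sql';

-- RESTRICT covers "field OP constant" conditions;
-- JOIN covers "table1.field1 OP table2.field2" conditions.
CREATE OPERATOR ~== (
    leftarg = text,
    rightarg = text,
    procedure = text_eq_nocase,
    commutator = ~==,
    restrict = eqsel,
    join = eqjoinsel
);
```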

HASHES

The HASHES clause, if present, tells the system that it is OK to use the hash join method for a join based on this operator. HASHES only makes sense for binary operators that return boolean, and in practice the operator had better be equality for some data type.

The assumption underlying hash join is that the join operator can only return TRUE for pairs of left and right values that hash to the same hash code. If two values get put in different hash buckets, the join will never compare them at all, implicitly assuming that the result of the join operator must be FALSE. So it never makes sense to specify HASHES for operators that do not represent equality.

In fact, logical equality is not good enough either; the operator had better represent pure bitwise equality, because the hash function will be computed on the memory representation of the values regardless of what the bits mean. For example, equality of time intervals is not bitwise equality; the interval equality operator considers two time intervals equal if they have the same duration, whether or not their endpoints are identical. What this means is that a join using "=" between interval fields would yield different results if implemented as a hash join than if implemented another way, because a large fraction of the pairs that should match will hash to different values and will never be compared by the hash join. But if the optimizer chose to use a different kind of join, all the pairs that the equality operator says are equal will be found. We don't want that kind of inconsistency, so we don't mark interval equality as hashable.

There are also machine-dependent ways in which a hash join might fail to do the right thing. For example, if your datatype is a structure in which there may be uninteresting pad bits, it's unsafe to mark the equality operator HASHES. (Unless, perhaps, you write your other operators to ensure that the unused bits are always zero.) Another example is that the FLOAT datatypes are unsafe for hash joins. On machines that meet the IEEE floating point standard, minus zero and plus zero are different values (different bit patterns) but they are defined to compare equal. So, if float equality were marked HASHES, a minus zero and a plus zero would probably not be matched up by a hash join, but they would be matched up by any other join process.

The bottom line is that you should probably only use HASHES for equality operators that are (or could be) implemented by memcmp().
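
A sketch of an operator that plausibly meets this rule is shown below. The function int4_identical and the operator name =#= are hypothetical. The assumption is that two int4 values are equal exactly when their stored bits are equal, which is what makes HASHES reasonable here. The same definition over a FLOAT type would be unsafe for the reasons given above.

```sql
-- Hypothetical helper wrapping int4 equality, which is plain bitwise equality.
CREATE FUNCTION int4_identical(int4, int4)
    RETURNS bool
    AS 'SELECT $1 = $2 AS result'
    LANGUAGE 'sql';

-- HASHES is a bare keyword; it takes no value.
CREATE OPERATOR =#= (
    leftarg = int4,
    rightarg = int4,
    procedure = int4_identical,
    commutator = =#=,
    restrict = eqsel,
    join = eqjoinsel,
    hashes
);
```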

SORT1 and SORT2

The SORT clauses, if present, tell the system that it is permissible to use the merge join method for a join based on the current operator. Both must be specified if either is. The current operator must be equality for some pair of data types, and the SORT1 and SORT2 clauses name the ordering operator ('<' operator) for the left and right-side data types respectively.

Merge join is based on the idea of sorting the left and righthand tables into order and then scanning them in parallel. So, both data types must be capable of being fully ordered, and the join operator must be one that can only succeed for pairs of values that fall at the "same place" in the sort order. In practice this means that the join operator must behave like equality. But unlike hashjoin, where the left and right data types had better be the same (or at least bitwise equivalent), it is possible to mergejoin two distinct data types so long as they are logically compatible. For example, the int2-versus-int4 equality operator is mergejoinable. We only need sorting operators that will bring both datatypes into a logically compatible sequence.

When specifying merge sort operators, the current operator and both referenced operators must return boolean; the SORT1 operator must have both input datatypes equal to the current operator's left argument type, and the SORT2 operator must have both input datatypes equal to the current operator's right argument type. (As with COMMUTATOR and NEGATOR, this means that the operator name is sufficient to specify the operator, and the system is able to make dummy operator entries if you happen to define the equality operator before the other ones.)

In practice you should only write SORT clauses for an '=' operator, and the two referenced operators should always be named '<'. Trying to use merge join with operators named anything else will result in hopeless confusion, for reasons we'll see in a moment.

There are additional restrictions on operators that you mark mergejoinable. These restrictions are not currently checked by CREATE OPERATOR, but a merge join may fail at runtime if any are not true:

The mergejoinable equality operator must have a commutator (itself if the two data types are the same, or a related equality operator if they are different).
There must be '<' and '>' ordering operators having the same left and right input datatypes as the mergejoinable operator itself. These operators must be named '<' and '>'; you do not have any choice in the matter, since there is no provision for specifying them explicitly. Note that if the left and right data types are different, neither of these operators is the same as either SORT operator. But they had better order the data values compatibly with the SORT operators, or mergejoin will fail to work.
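
Putting these requirements together for the simple case where both sides have the same type, a set of definitions might look like the sketch below. The type serial_no and the functions serial_no_lt, serial_no_gt and serial_no_eq are hypothetical and are assumed to exist already, in the same way the earlier example assumed type complex.

```sql
-- Assumption: type serial_no and the three functions named below are
-- hypothetical and were created beforehand; they are not part of Postgres.

-- Ordering operators, which must be named '<' and '>'.
CREATE OPERATOR < (
    leftarg = serial_no,
    rightarg = serial_no,
    procedure = serial_no_lt,
    commutator = >
);

CREATE OPERATOR > (
    leftarg = serial_no,
    rightarg = serial_no,
    procedure = serial_no_gt,
    commutator = <
);

-- Mergejoinable equality: it is its own commutator, and both SORT clauses
-- name '<' because the left and right types are the same.
CREATE OPERATOR = (
    leftarg = serial_no,
    rightarg = serial_no,
    procedure = serial_no_eq,
    commutator = =,
    sort1 = <,
    sort2 = <
);
```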
