Tomcat rewrite mechanism

Introduction to the rewrite mechanism

The rewrite Valve implements URL rewriting in a way that is very similar to the mod_rewrite module of Apache HTTP Server.

Configuration

The rewrite Valve is configured as a Valve using the org.apache.catalina.valves.rewrite.RewriteValve class name.

The rewrite Valve can be added to a Host as a Valve. Refer to the virtual server documentation for configuration details. In that case the Valve uses a rewrite.config file that contains the rewrite directives and must be placed in the Host configuration folder.

The rewrite Valve can also be declared in the context.xml of a web application. In that case the Valve uses a rewrite.config file that contains the rewrite directives and must be placed in the web application's WEB-INF folder.

Directives

The rewrite.config file contains a series of directives similar to those used by mod_rewrite, especially the core RewriteRule and RewriteCond directives.

Note: This section is modified from the mod_rewrite documentation, copyrighted by the Apache Software Foundation (1995-2006) and released under the Apache License.

1. RewriteCond

Format: RewriteCond TestString CondPattern

The RewriteCond directive defines a rule condition. One or more RewriteCond directives may precede a RewriteRule directive. The following rule is then used only if its pattern matches the current state of the URI and these additional conditions are met.

A TestString is a string that, in addition to simple text, can contain the following extended structures.

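  • RewriteRule backreferences
    Backreferences of the form $N (0 <= N <= 9). They provide access to the grouped parts (in parentheses) of the pattern of the RewriteRule to which the current set of RewriteCond conditions belongs.

  • RewriteCond backreferences
    Backreferences of the form %N. They refer to the grouped parts of the pattern of the last matched RewriteCond.

  • RewriteMap expansions
    Expansions of the form ${mapname:key|default}.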

  • Server variables
    Variables of the form %{NAME_OF_VARIABLE}, where NAME_OF_VARIABLE is a string taken from the following list:

  • HTTP headers:
    HTTP_USER_AGENT  HTTP_REFERER  HTTP_COOKIE  HTTP_FORWARDED  HTTP_HOST  HTTP_PROXY_CONNECTION  HTTP_ACCEPT

  • Connection and request:
    REMOTE_ADDR  REMOTE_HOST  REMOTE_PORT  REMOTE_USER  REMOTE_IDENT  REQUEST_METHOD  SCRIPT_FILENAME  REQUEST_PATH  CONTEXT_PATH  SERVLET_PATH  PATH_INFO  QUERY_STRING  AUTH_TYPE

  • Server internals:
    DOCUMENT_ROOT  SERVER_NAME  SERVER_ADDR  SERVER_PORT  SERVER_PROTOCOL  SERVER_SOFTWARE

  • Date and time:
    TIME_YEAR  TIME_MON  TIME_DAY  TIME_HOUR  TIME_MIN  TIME_SEC  TIME_WDAY  TIME

  • Specials:
    THE_REQUEST  REQUEST_URI  REQUEST_FILENAME  HTTPS

These variables correspond to the similarly named HTTP MIME headers and Servlet API methods. Most are documented in various manuals and in the CGI specification. The following variables are specific to the rewrite Valve or need further explanation:

  • REQUEST_PATH
    corresponds to the full path applied to the mapping.

  • CONTEXT_PATH
    corresponds to the path of the mapped context.

  • SERVLET_PATH
    corresponds to the servlet path.

  • THE_REQUEST
    The full HTTP request line sent by the browser to the server (e.g., GET /index.html HTTP/1.1). It does not include any additional headers sent by the browser.

  • REQUEST_URI
    The resource requested in the HTTP request line (in the example above, /index.html).

  • REQUEST_FILENAME
    The full local filesystem path to the file or script that matches the request.

  • HTTPS
    contains the text "on" when the connection uses SSL/TLS, and "off" otherwise.

Also note that:

  1. SCRIPT_FILENAME and REQUEST_FILENAME contain the same value: the value of the filename field of the Apache server's internal request_rec structure. The first name is the commonly known CGI variable name, while the second is the counterpart of REQUEST_URI (which contains the value of the uri field of request_rec).

  2. %{ENV:variable}, where variable can be any Java system property, is also available.

  3. %{SSL:variable}, where variable is the name of an SSL environment variable, is not yet implemented. Example: %{SSL:SSL_CIPHER_USEKEYSIZE} may expand to 128.

  4. %{HTTP:header}, where header can be any HTTP MIME header name, can always be used to obtain the value of a header sent in the HTTP request. Example: %{HTTP:Proxy-Connection} is the value of the HTTP header Proxy-Connection:.

CondPattern, the condition pattern, is a regular expression applied to the current instance of TestString. TestString is evaluated first, and the result is then matched against CondPattern.

Remember: CondPattern is a Perl-compatible regular expression with some extensions.

  1. A non-matching pattern can be specified by prefixing the pattern string with a ! character.

  2. There are some special variants of CondPattern. Instead of a real regular expression string, one of the following can be used:

  • <CondPattern (lexicographically precedes)
    Treats CondPattern as a plain string and compares it lexicographically to TestString. True if TestString lexicographically precedes CondPattern.

  • >CondPattern (lexicographically follows)
    Treats CondPattern as a plain string and compares it lexicographically to TestString. True if TestString lexicographically follows CondPattern.

  • =CondPattern (lexicographically equal)
    Treats CondPattern as a plain string and compares it lexicographically to TestString. True if TestString is equal to CondPattern character for character.

  • -d (directory)
    Treats TestString as a pathname and tests whether it exists and is a directory.

  • -f (regular file)
    Treats TestString as a pathname and tests whether it exists and is a regular file.

  • -s (regular file, with size)
    Treats TestString as a pathname and tests whether it exists and is a regular file with a size greater than 0.

Note: All of these tests can be prefixed with ! to negate their meaning.

  3. A [flags] argument can be given as the third parameter of the RewriteCond directive to set flags for CondPattern, where flags is a comma-separated list of any of the following flags:

  • nocase|NC (case-insensitive)
    Makes the test case-insensitive: there is no difference between 'A-Z' and 'a-z', both in the expanded TestString and in CondPattern. This flag is effective only for comparisons between TestString and CondPattern. It has no effect on filesystem and subrequest checks.

  • ornext|OR (or next condition)
    Combines rule conditions with a local OR instead of the implicit AND. A typical example:

RewriteCond %{REMOTE_HOST}  ^host1.*  [OR]
RewriteCond %{REMOTE_HOST}  ^host2.*  [OR]
RewriteCond %{REMOTE_HOST}  ^host3.*
RewriteRule ...some special stuff for any of these hosts...

Without this flag, you would have to write the condition/rule pair three times.
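The CondPattern variants listed above can be illustrated with a small Java sketch. This is not Tomcat's implementation: the class name CondPatternSketch and the method evaluate are hypothetical, the TestString is assumed to be already expanded, and flags such as NC and OR are left out.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tomcat): evaluates one CondPattern
// against an already-expanded TestString.
class CondPatternSketch {

    static boolean evaluate(String testString, String condPattern) {
        // A leading '!' negates whatever test follows.
        if (condPattern.startsWith("!")) {
            return !evaluate(testString, condPattern.substring(1));
        }
        // Filesystem tests treat TestString as a pathname.
        if (condPattern.equals("-d")) {
            return new File(testString).isDirectory();
        }
        if (condPattern.equals("-f")) {
            return new File(testString).isFile();
        }
        if (condPattern.equals("-s")) {
            File f = new File(testString);
            return f.isFile() && f.length() > 0;
        }
        // Lexicographic comparisons treat the rest of CondPattern as a plain string.
        if (condPattern.startsWith("<")) {
            return testString.compareTo(condPattern.substring(1)) < 0;
        }
        if (condPattern.startsWith(">")) {
            return testString.compareTo(condPattern.substring(1)) > 0;
        }
        if (condPattern.startsWith("=")) {
            return testString.equals(condPattern.substring(1));
        }
        // Anything else is a regular expression; find() gives unanchored matching.
        return Pattern.compile(condPattern).matcher(testString).find();
    }
}
```

For instance, evaluate("abc", "<abd") is true because "abc" sorts before "abd", while evaluate("abc", "!=abc") is false.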

Example:

To rewrite the home page of a site according to the User-Agent: header of the request, you can use the following:

RewriteCond  %{HTTP_USER_AGENT}  ^Mozilla.*
RewriteRule  ^/$                 /homepage.max.html  [L]

RewriteCond  %{HTTP_USER_AGENT}  ^Lynx.*
RewriteRule  ^/$                 /homepage.min.html  [L]

RewriteRule  ^/$                 /homepage.std.html  [L]

Note: If the browser identifies itself as 'Mozilla' (this includes Netscape Navigator, Mozilla, and so on), it gets the max homepage, which may contain frames or other special features. If the browser is Lynx (terminal-based), it gets the min homepage, a version designed for easy text-only browsing. If neither condition applies (another browser is in use, or the browser identifies itself in a non-standard way), it gets the standard (std) homepage.
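To make the decision logic of this example concrete, here is a minimal Java sketch of the same three rules. It only mimics the outcome. The class HomepageSketch and its rewrite method are illustrative names, not Tomcat APIs, and the real Valve evaluates the directives generically rather than with hard-coded branches.

```java
import java.util.regex.Pattern;

// Illustrative only: mirrors the three RewriteRule blocks of the example.
class HomepageSketch {
    private static final Pattern ROOT = Pattern.compile("^/$");
    private static final Pattern MOZILLA = Pattern.compile("^Mozilla.*");
    private static final Pattern LYNX = Pattern.compile("^Lynx.*");

    static String rewrite(String path, String userAgent) {
        // Every rule uses the pattern ^/$, so other paths stay unchanged.
        if (!ROOT.matcher(path).find()) {
            return path;
        }
        String ua = (userAgent == null) ? "" : userAgent;
        // First rule: condition on Mozilla, then stop because of [L].
        if (MOZILLA.matcher(ua).find()) {
            return "/homepage.max.html";
        }
        // Second rule: condition on Lynx, then stop because of [L].
        if (LYNX.matcher(ua).find()) {
            return "/homepage.min.html";
        }
        // Third rule: no condition, acts as the default.
        return "/homepage.std.html";
    }
}
```

The [L] flag on the first two rules is what the early returns stand for: once a rule has rewritten the URL, the later rules are no longer considered.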

2. RewriteMap

Format: RewriteMap name rewriteMapClassName optionalParameters.

Maps are implemented through an interface that the user must implement. The interface is org.apache.catalina.valves.rewrite.RewriteMap, and its code is:

package org.apache.catalina.valves.rewrite;

public interface RewriteMap {
    public String setParameters(String params);
    public String lookup(String key);
}
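As an illustration, a map that upper-cases its key could look like the sketch below. To keep the example compilable without Tomcat on the classpath, it declares a local stand-in for the interface with the same two methods. In a real deployment the class would implement org.apache.catalina.valves.rewrite.RewriteMap instead. The names RewriteMapSketch and UpperCaseMap are assumptions made for this example, and since the text does not say what the return value of setParameters is used for, the sketch simply returns null.

```java
import java.util.Locale;

class RewriteMapSketch {

    // Local stand-in mirroring the interface quoted above (assumption:
    // used here only so that the sketch is self-contained).
    interface RewriteMap {
        String setParameters(String params);
        String lookup(String key);
    }

    // Hypothetical map implementation: returns the key in upper case.
    static class UpperCaseMap implements RewriteMap {

        @Override
        public String setParameters(String params) {
            // This map takes no parameters, so there is nothing to store.
            return null;
        }

        @Override
        public String lookup(String key) {
            if (key == null) {
                return null;
            }
            return key.toUpperCase(Locale.ENGLISH);
        }
    }
}
```

Assuming such a class were deployed under a package like example.maps, it would be declared with a line such as RewriteMap uc example.maps.UpperCaseMap and then called from a substitution string as ${uc:$1}.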

3. RewriteRule

Format: RewriteRule Pattern Substitution

The RewriteRule directive is the heart of the rewrite mechanism. This directive can be used multiple times, each instance defines a separate rewrite rule. The order in which these rules are defined is particularly important, as this is the order in which they are applied at runtime.

Pattern is a Perl-compatible regular expression applied to the current URL. "Current" means the value of the URL at the time the rule is applied. This may differ from the originally requested URL, because earlier rules may already have matched and altered it.

Here are some tips on regular expression formatting:

Text:

  • . - matches any single character

  • [chars] - character class: matches any one of the listed characters

  • [^chars] - negated character class: matches any one character that is not listed

  • text1|text2 - alternative: matches text1 or text2

Quantifiers:

  • ? - 0 or 1 occurrence of the preceding text

  • * - 0 or N occurrences of the preceding text (N > 0)

  • + - 1 or N occurrences of the preceding text (N > 1)

Grouping:

  • (text) - groups text, either to set the borders of an alternative or to create a backreference; the Nth group can be referenced in the substitution of a RewriteRule as $N

Anchors:

  • ^ - matches the empty string at the beginning of a line

  • $ - matches the empty string at the end of a line

Escaping:

  • \char - escapes the given character (for instance, to specify the characters ".[]()" literally)
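The constructs above behave the same way in java.util.regex, so they can be tried out quickly in plain Java. The following sketch is only a playground, not part of the rewrite Valve. The class name RegexTipsSketch and its two helper methods are made up for this example.

```java
import java.util.regex.Pattern;

// Small playground for the regular expression constructs listed above.
class RegexTipsSketch {

    // True if the whole input matches the regular expression.
    static boolean matchesFully(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regular expression matches somewhere in the input.
    static boolean matchesSomewhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesFully("colou?r", "color"));        // ? : 0 or 1 'u'
        System.out.println(matchesFully("ab*", "a"));                // * : 0 or more 'b'
        System.out.println(matchesFully("ab+", "a"));                // + : needs at least one 'b'
        System.out.println(matchesFully("[^abc]", "d"));             // negated character class
        System.out.println(matchesFully("cat|dog", "dog"));          // alternative
        System.out.println(matchesFully("a\\.b", "axb"));            // escaped dot is literal
        System.out.println(matchesSomewhere("^/docs", "/docs/index")); // ^ anchors at the start
        System.out.println(matchesSomewhere("\\.html$", "/a.html"));   // $ anchors at the end
    }
}
```

Note that the + quantifier requires at least one occurrence, which is why "ab+" does not match "a" while "ab*" does.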

For more information on regular expressions, see the Perl regular expression manual page (perldoc perlre). For more information on regular expressions and their variants (POSIX regular expressions and so on), see the following book:

*Mastering Regular Expressions, 2nd Edition* (the book is now in its 3rd edition)
Jeffrey E.F. Friedl
O'Reilly & Associates, Inc., 2002
ISBN 978-0-596-00289-3

In rules, the NOT character (!) can be used as a prefix to the pattern to negate it, as in "if the current URL does not match this pattern". This is useful in exceptional cases where the negative pattern is easier to write, or as a last default rule.

Note: When the NOT character is used to negate a pattern, the pattern cannot contain grouped wildcard parts. This is because when the pattern does not match (that is, when the negation matches), the groups have no contents. Consequently, if a negated pattern is used, $N cannot be used in the substitution string.

The substitution of a rewrite rule is the string that replaces the original URL matched by the pattern. In addition to plain text, it can include:

  1. Backreferences ($N) to the RewriteRule pattern.

  2. Backreferences (%N) to the last matched RewriteCond pattern.

  3. Server variables, as in rule condition test strings (%{VARNAME}).

  4. Mapping function calls (${mapname:key|default}).

Backreferences are identifiers of the form $N (N in the range 0-9), which are replaced by the contents of the Nth group of the matched pattern. The server variables are the same as for the TestString of a RewriteCond directive. The mapping functions come from the RewriteMap directive. These three types of variables are expanded in the order listed above.

As mentioned before, all rewrite rules are applied to the substitution (in the order in which they are defined in the configuration file). The URL is completely replaced by the substitution, and the rewriting process continues until all rules have been applied or until it is explicitly terminated by an L flag.

There is also a special substitution string: - (a dash), which means no substitution. It is useful for rules that should only match the URL without rewriting it. It is commonly used together with the C (chain) flag, so that several patterns can be applied before a substitution occurs.
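The way $N backreferences and the special - substitution behave can be sketched in a few lines of Java. This is a simplified model under stated assumptions: only $N is expanded (no %N, server variables, or maps), and SubstitutionSketch and rewrite are hypothetical names rather than Tomcat APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of a single RewriteRule: Pattern plus Substitution.
class SubstitutionSketch {

    static String rewrite(String url, String pattern, String substitution) {
        Matcher m = Pattern.compile(pattern).matcher(url);
        if (!m.find()) {
            return url; // rule does not match: URL is left alone
        }
        if (substitution.equals("-")) {
            return url; // special substitution: match only, no rewrite
        }
        // The URL is completely replaced by the expanded substitution.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < substitution.length(); i++) {
            char c = substitution.charAt(i);
            boolean isBackref = c == '$'
                    && i + 1 < substitution.length()
                    && Character.isDigit(substitution.charAt(i + 1));
            if (isBackref) {
                int n = substitution.charAt(i + 1) - '0';
                // Groups that do not exist or did not participate expand to nothing.
                if (n <= m.groupCount() && m.group(n) != null) {
                    out.append(m.group(n));
                }
                i++; // skip the digit
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the pattern ^/product/(\d+)$ and the substitution /catalog?item=$1, the URL /product/42 becomes /catalog?item=42, whereas the substitution - leaves it unchanged.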

Additionally, you can set special flags for the substitution by appending [flags] as the third parameter of RewriteRule. flags is a comma-separated list of the following flags:

  • chain|C (chain with next rule)

This flag chains the current rule with the next rule (which can itself be chained with the following rule, and so on). It has the following effect: if a rule matches, processing continues as usual and the flag has no effect; if the rule does not match, all following chained rules are skipped. For example, it can be used to remove the ".www" part inside a per-directory rule set when an external redirect is performed (where the ".www" part should not occur).

  • cookie|CO=NAME:VAL:domain[:lifetime[:path]] (set cookie)

Sets a cookie in the client's browser. The cookie's name is NAME and its value is VAL. The domain field is the domain of the cookie, such as .apache.org; the optional lifetime is the lifetime of the cookie in minutes; and the optional path is the path of the cookie.

  • env|E=VAR:VAL (set environment variable)

Forces a request attribute named VAR to be set to the value VAL, where VAL can contain regular expression backreferences ($N and %N) that are expanded. This flag can be used more than once to set several variables.

  • forbidden|F (force URL to be forbidden)

Forbids access to the current URL: an HTTP response code 403 is returned immediately. Use this flag together with appropriate RewriteCond directives to block certain URLs conditionally.

  • gone|G (force URL to be gone)

Marks the current URL as gone: an HTTP response code 410 (requested resource no longer available) is returned immediately. Use this flag to indicate that a page has been removed and no longer exists.

  • host|H=Host (rewrite virtual host)

Rewrites the virtual host rather than the URL.

  • last|L (last rule)

Stops the rewriting process immediately; no further rewrite rules are applied. This corresponds to the last command in Perl or the break statement in C. Use this flag to prevent the currently rewritten URL from being rewritten further by subsequent rules. For example, use it to rewrite the root path URL (/) to a real one, such as /e/www/.

  • next|N (next round)

Re-runs the rewriting process, starting again with the first rewrite rule. The URL to be matched is then no longer the original URL but the URL produced by the last rewrite rule. This corresponds to the next command in Perl or the continue statement in C. Use this flag to restart the rewriting process, that is, to go back immediately to the top of the loop.

But be careful not to create an infinite loop!

  • nocase|NC (case insensitive)

Makes the pattern case-insensitive: there is no difference between 'A-Z' and 'a-z' when the pattern is matched against the current URL.

  • noescape|NE (do not escape URIs in output)

This flag prevents the rewrite Valve from applying the usual URI escaping rules to the result of a rewrite. Ordinarily, special characters (such as %, $, ;, and so on) are escaped into their hexadecimal equivalents. This flag prevents that, which allows percent symbols to appear in the output, as in:

RewriteRule /foo/(.*) /bar?arg=P1\%3d$1 [R,NE]

This turns /foo/zed into a safe request for /bar?arg=P1=zed.

  • qsappend|QSA (append query string)

This flag forces the rewrite engine to append the query string part of the substitution string to the existing query string instead of replacing it. Use it when you want to add more data to the query string through a rewrite rule.

  • redirect|R[=code] (force redirection)

Prefixes the substitution with http://thishost[:thisport]/ (which makes the new URL a URI) to force an external redirect. If no code is given, an HTTP response code 302 (Moved Temporarily) is used. To use another response code in the range 300-400, specify it as a number, or use one of the following symbolic names: temp (default), permanent, seeother. Use this flag to send canonicalized URLs back to the client, for example to rewrite /~ to /u/, or to append a slash to /u/user.

Note: When using this flag, make sure that the substitution field is a valid URL; otherwise the redirect points to an invalid location. Also keep in mind that this flag by itself only prefixes the URL with http://thishost[:thisport]/ and does not stop the rewriting process. Usually you will want to stop rewriting and redirect immediately, and for that you also need to add the L flag.

  • skip|S=num (skip next rules)

If the current rule matches, this flag forces the rewrite engine to skip the next num rules in sequence. It can be used to build an if-then-else construct: the last rule of the then clause becomes skip=N, where N is the number of rules in the else clause. (This is not the same as the chain|C flag.)

  • type|T=MIME-type (force MIME type)

Forces the MIME type of the target file to be MIME-type. This can be used to set the content type based on certain conditions. For example, the following snippet allows .php files to be displayed by the mod_php module when they are requested with the .phps extension:

RewriteRule ^(.+\.php)s$ $1 [T=application/x-httpd-php-source]
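The interplay of the last|L and next|N flags described above, including the infinite-loop risk of N, can be pictured with a minimal Java sketch of the rule loop. It is a simplified model, not the Valve's actual code: conditions and all other flags are ignored, the matched part of the URL is replaced via Matcher.replaceFirst, and the restart limit maxRestarts is an assumption added here as a safety net. The names RuleLoopSketch, Rule, and apply are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of the rule loop with only the L and N flags.
class RuleLoopSketch {

    // Hypothetical rule holder: pattern, substitution, and two flags.
    static final class Rule {
        final Pattern pattern;
        final String substitution;
        final boolean last; // L flag
        final boolean next; // N flag

        Rule(String regex, String substitution, boolean last, boolean next) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
            this.last = last;
            this.next = next;
        }
    }

    static String apply(List<Rule> rules, String url, int maxRestarts) {
        int restarts = 0;
        int i = 0;
        while (i < rules.size()) {
            Rule rule = rules.get(i);
            Matcher m = rule.pattern.matcher(url);
            if (m.find()) {
                // Replace the matched part; $N in the substitution refers to group N.
                url = m.replaceFirst(rule.substitution);
                if (rule.last) {
                    break; // L: stop, no further rules are applied
                }
                if (rule.next) {
                    // N: start over from the first rule with the rewritten URL.
                    if (++restarts > maxRestarts) {
                        throw new IllegalStateException("too many restarts, rules probably loop");
                    }
                    i = 0;
                    continue;
                }
            }
            i++;
        }
        return url;
    }
}
```

A rule with the N flag whose pattern keeps matching its own output would restart forever, so the sketch gives up after maxRestarts rounds.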

