
In-depth understanding of JVM (5): class file structure

Today we introduce the structure of the Class file.

1 Overview

Although computers can only recognize 0 and 1, more and more programming languages choose a platform-neutral format, independent of the operating system and machine instruction set, as the storage format for compiled programs. The Java virtual machine is not bound to any language, including Java; it is only associated with a specific binary file format, the "Class file". The Class file contains the Java virtual machine instruction set, a symbol table, and several other pieces of auxiliary information. As a general-purpose, machine-independent execution platform, the Java virtual machine can be used by implementers of other languages as the delivery medium for their language products.

2 Class file structure

A Class file is a binary stream based on 8-bit bytes. The data items are arranged compactly in strict order, with no separators between them, so almost everything stored in a Class file is data the program needs in order to run; there are no gaps. A data item that needs more than 8 bits of space is split into several 8-bit bytes and stored high-order byte first (big-endian).
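As a minimal sketch of the high-order-first layout, the following reads 2-byte and 4-byte unsigned values from a byte array. The class and method names are hypothetical helpers made up for illustration; they are not part of the JDK.

```java
// Hypothetical helper for illustration; the names are not part of any JDK API.
final class BigEndianReader {
    // Reads an unsigned 16-bit value starting at offset, high-order byte first.
    static int readU2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Reads an unsigned 32-bit value; a long is used so the result stays non-negative.
    static long readU4(byte[] data, int offset) {
        return ((long) readU2(data, offset) << 16) | readU2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x02, 0x03, 0x04};
        System.out.println(readU2(bytes, 0)); // 0x0102 = 258
        System.out.println(readU4(bytes, 0)); // 0x01020304 = 16909060
    }
}
```

The masking with 0xFF is needed because Java's byte type is signed, while the data items in a Class file are unsigned.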

Unlike XML and other description languages, the Class file format has no delimiters, so the number, the order, and even the byte order of its data items are strictly fixed: which byte means what, how long each item is, and in what order the items appear may not be changed.

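The Class file format uses a pseudo-structure similar to a C struct to store data. This pseudo-structure has only two data types: unsigned numbers and tables. Unsigned numbers are the basic data type; u1, u2, u4, and u8 denote unsigned values of 1, 2, 4, and 8 bytes respectively.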

A table is a composite data type made up of multiple unsigned numbers or other tables as data items; by convention, the names of all tables end in "_info". Tables describe hierarchically structured composite data, and the whole Class file is essentially one table. Its data items, in order, are: magic, minor_version, major_version, constant_pool_count, constant_pool, access_flags, this_class, super_class, interfaces_count, interfaces, fields_count, fields, methods_count, methods, attributes_count, attributes.

2.1 Magic number and version of Class file

The first four bytes of every Class file are the magic number (0xCAFEBABE), whose only purpose is to determine whether the file can be accepted by the virtual machine as a Class file. The next four bytes store the version number of the Class file: the fifth and sixth bytes are the minor version, and the seventh and eighth bytes are the major version.
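The eight-byte header described above can be checked with a few lines of Java. This is only a sketch: ClassHeaderSketch and readVersion are hypothetical names, and the input is a hand-built byte array rather than a real compiled class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDK.
final class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {minor, major}; rejects input whose first four bytes are not the magic number.
    static int[] readVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("not a Class file: bad magic number");
            }
            int minor = in.readUnsignedShort(); // bytes 5-6
            int major = in.readUnsignedShort(); // bytes 7-8
            return new int[] {minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = readVersion(header);
        // Major version 52 (0x34) is the version written by Java 8 compilers.
        System.out.println("minor=" + version[0] + ", major=" + version[1]);
    }
}
```

DataInputStream reads multi-byte values high-order byte first, which matches the byte order of the Class file.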

2.2 Constant pool

Immediately after the version numbers comes the constant pool. The constant pool can be understood as the resource warehouse of the Class file. It is the data structure most closely tied to the other items in the Class file structure, it is usually one of the largest data items in the file, and it is the first table-type data item to appear in the Class file. The constant pool holds two main kinds of constants: literals and symbolic references. Literals are close to the concept of constants at the Java language level, such as text strings and constant values declared final. Symbolic references are a concept from compiler theory and include the following three kinds of constants:

  1. Fully qualified names of classes and interfaces

  2. Field names and descriptors

  3. Method names and descriptors

2.3 Access flags

After the constant pool, the next two bytes are the access flags, which identify class-level or interface-level access information: whether this Class is a class or an interface, whether it is public, whether it is abstract, and, if it is a class, whether it is declared final.
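A rough sketch of decoding these two bytes follows. The four flag values are the ones the Class file format defines for ACC_PUBLIC, ACC_FINAL, ACC_INTERFACE, and ACC_ABSTRACT; the class name and the describe method are hypothetical, and the other class-level flags are deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for illustration; only a subset of the class-level flags is handled.
final class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    // Each flag is a single bit, so a bitwise AND tells whether it is set.
    static List<String> describe(int flags) {
        List<String> out = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) out.add("public");
        if ((flags & ACC_FINAL) != 0) out.add("final");
        if ((flags & ACC_ABSTRACT) != 0) out.add("abstract");
        out.add((flags & ACC_INTERFACE) != 0 ? "interface" : "class");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0601)); // [public, abstract, interface]
        System.out.println(describe(0x0011)); // [public, final, class]
    }
}
```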

2.4 Class index, parent class index and interface index collection

The class index, parent class index and interface index collection follow the access flags in that order, and the Class file uses these three items to determine the inheritance relationships of the class. The class index determines the fully qualified name of the class, and the parent class index determines the fully qualified name of its parent class. Because the Java language allows only single inheritance, there is only one parent class index. Every Java class except java.lang.Object has a parent class, so the parent class index of every class except java.lang.Object is non-zero. The interface index collection describes which interfaces this class implements. The implemented interfaces are listed in the collection in the left-to-right order in which they appear after the implements keyword (or after extends, if the class itself is an interface).
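The same information can be observed at run time through reflection. The sketch below uses a made-up Sample class; it does not parse the Class file itself, it only shows the parent class and the declaration order of the interfaces as reported by java.lang.Class.

```java
// Illustration only: Sample is a hypothetical class defined for this example.
final class InheritanceSketch {
    static class Sample implements Runnable, Comparable<Sample> {
        public void run() {}
        public int compareTo(Sample other) { return 0; }
    }

    public static void main(String[] args) {
        // java.lang.Object is the only class without a parent class.
        System.out.println(Object.class.getSuperclass());           // null
        System.out.println(Sample.class.getSuperclass().getName()); // java.lang.Object
        // getInterfaces() reports the interfaces in the order of the implements clause.
        for (Class<?> implemented : Sample.class.getInterfaces()) {
            System.out.println(implemented.getName());
        }
    }
}
```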

2.5 Field table collection

Field tables are used to describe variables declared in interfaces or classes. Fields include class-level variables as well as instance variables, but not local variables declared inside methods.

What information is needed to describe a field in Java?

The information includes the field's scope (the public, private, and protected modifiers), whether it is an instance variable or a class variable (the static modifier), mutability (final), concurrent visibility (the volatile modifier, which forces reads and writes to go through main memory), whether it can be serialized (the transient modifier), the field's data type, and the field's name. Each modifier is a boolean value, either present or absent, so it is well suited to being represented by a flag bit. The field's name and data type are not fixed in advance, so they can only be described by referring to constants in the constant pool.
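To make the flag idea concrete, the sketch below decodes a field's access flags with java.lang.reflect.Modifier, whose constants for these modifiers use the same bit values as the Class file (for example 0x0001 for public, 0x0008 for static, 0x0010 for final, 0x0040 for volatile, 0x0080 for transient). FieldFlagsSketch is a hypothetical wrapper written for this example.

```java
import java.lang.reflect.Modifier;

// Hypothetical wrapper for illustration; the decoding itself is done by the JDK's Modifier class.
final class FieldFlagsSketch {
    // Turns a field access_flags value into the modifier keywords it encodes.
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0019)); // public static final
        System.out.println(describe(0x0042)); // private volatile
        System.out.println(describe(0x0082)); // private transient
    }
}
```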

2.6 Method table collection

The Class file format describes methods in almost exactly the same way as fields. The method table has the same structure as the field table: access flags, name index, descriptor index, and attribute table collection, in that order.

One difference is worth mentioning: because the volatile and transient keywords cannot modify methods, the access flags of the method table have no corresponding flags. Conversely, methods can be modified by keywords such as synchronized, native, and abstract, so the method table's access flags include flags for those keywords.

2.7 Attribute table collection

Class files, field tables, and method tables can all carry their own attribute table collections to describe information specific to certain scenarios. Unlike the strict order, length, and content required of other data items in the Class file, the restrictions on the attribute table collection are slightly looser: the attribute tables are not required to appear in any strict order, and as long as its name does not duplicate an existing attribute name, any compiler implementation can write attribute information it defines itself into the attribute table. The Java virtual machine ignores attributes it does not recognize at run time.

Attribute names are all referenced from CONSTANT_Utf8_info constants stored in the constant pool.
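Unknown attributes can be ignored because every attribute starts with the same header: a 2-byte name index into the constant pool, then a 4-byte length, then that many bytes of content. The sketch below skips attributes on that basis. AttributeSkipper and readNameIndexes are hypothetical names, the input is a hand-built byte array, and lengths are assumed to fit in a signed int.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the JDK.
final class AttributeSkipper {
    // Walks over `count` attributes, collecting each name index and skipping the content.
    static int[] readNameIndexes(byte[] attributeBytes, int count) {
        int[] nameIndexes = new int[count];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(attributeBytes))) {
            for (int i = 0; i < count; i++) {
                nameIndexes[i] = in.readUnsignedShort(); // u2 attribute_name_index
                int length = in.readInt();               // u4 attribute_length (assumed < 2^31 here)
                in.readFully(new byte[length]);          // content is skipped without being interpreted
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return nameIndexes;
    }

    public static void main(String[] args) {
        // Two made-up attributes: name index 5 with 2 content bytes, name index 7 with none.
        byte[] data = {0x00, 0x05, 0, 0, 0, 2, 0x11, 0x22, 0x00, 0x07, 0, 0, 0, 0};
        for (int index : readNameIndexes(data, 2)) {
            System.out.println(index);
        }
    }
}
```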

3 Introduction to bytecode instructions

In the instruction set of the Java virtual machine, most instructions carry the data type of their operands in their names. For example, the iload instruction loads int data from the local variable table onto the operand stack, while the fload instruction loads float data.

Most instructions do not support the integer types byte, char, and short, or the boolean type. Most operations on byte, char, short, and boolean values actually use the int type as the operation type.
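The effect is visible in Java source: arithmetic on byte or char operands produces an int, and an explicit cast is needed to get back to the smaller type. The sketch below uses hypothetical method names; the instruction sequence in the comment is what javac typically emits, which can be checked with javap -c.

```java
// Illustration only; the class and method names are made up for this example.
final class SmallIntSketch {
    // a + b is computed as an int; the cast narrows the result back to byte.
    // Typical bytecode: iload_0, iload_1, iadd, i2b, ireturn.
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b);
    }

    // A char operand is widened to int before the addition.
    static int nextCodePoint(char c) {
        return c + 1;
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 100)); // 200 does not fit in a byte: prints -56
        System.out.println(nextCodePoint('A'));               // 66
    }
}
```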

3.1 Load and Store Instructions

Load and store instructions are used to transfer data back and forth between the local variable table in the stack frame and the operand stack.

3.2 Operation Instructions

Operation (arithmetic) instructions perform a specific operation on values on the operand stack and store the result back on top of the operand stack. In general, arithmetic instructions fall into two kinds: those that operate on integer data and those that operate on floating-point data. (Because there are no arithmetic instructions for the byte, char, short, and boolean types, operations on such data use the int instructions instead.)
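As a small illustration of how load and arithmetic instructions cooperate, the sketch below shows two static methods together with the bytecode javac typically produces for them. The names are hypothetical, and the exact instruction sequence may differ between compilers, so treat the comments as an assumption to verify with javap -c.

```java
// Illustration only; the class and method names are made up for this example.
final class ArithmeticSketch {
    // Typical bytecode: iload_0, iload_1, iload_2, imul, iadd, ireturn.
    static int multiplyAdd(int a, int b, int c) {
        return a + b * c;
    }

    // Typical bytecode: dload_0, dload_2, ddiv, dreturn (a double occupies two local variable slots).
    static double divide(double x, double y) {
        return x / y;
    }

    public static void main(String[] args) {
        System.out.println(multiplyAdd(1, 2, 3)); // 7
        System.out.println(divide(1.0, 0.0));     // Infinity: floating-point division by zero does not throw
    }
}
```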

3.3 Type Conversion Instructions

Type conversion instructions convert between two different numeric types (for example, from int to float).
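The following sketch shows a few conversions and the instruction each one usually maps to. The class and method names are hypothetical; the results follow the Java language rules for widening and narrowing conversions.

```java
// Illustration only; the class and method names are made up for this example.
final class ConversionSketch {
    static float intToFloat(int i) { return i; }          // widening, typically i2f
    static int doubleToInt(double d) { return (int) d; }  // narrowing, typically d2i
    static byte intToByte(int i) { return (byte) i; }     // narrowing, typically i2b

    public static void main(String[] args) {
        System.out.println(intToFloat(16777217));    // 1.6777216E7: float cannot represent 2^24 + 1 exactly
        System.out.println(doubleToInt(3.99));       // 3: the fraction is truncated toward zero
        System.out.println(doubleToInt(Double.NaN)); // 0
        System.out.println(intToByte(300));          // 44: only the low 8 bits are kept
    }
}
```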

3.4 Object Creation and Access Instructions

Although both class instances and arrays are objects, the Java virtual machine uses different bytecodes for the creation and manipulation of class instances and arrays.
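The difference shows up when a class instance and arrays are created side by side. In the sketch below, the comments name the instructions javac typically emits for each statement; the class itself is hypothetical and exists only for this example.

```java
// Illustration only; the class is made up for this example.
final class CreationSketch {
    int value = 7;

    static int demo() {
        CreationSketch obj = new CreationSketch(); // new, dup, invokespecial <init>
        int[] ints = new int[3];                   // newarray int
        String[] names = new String[2];            // anewarray java/lang/String
        ints[0] = obj.value;                       // getfield, then iastore
        return ints[0] + ints.length + names.length; // iaload and arraylength
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 7 + 3 + 2 = 12
    }
}
```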

3.5 Operand stack management instructions

Just like manipulating the stack in the data structure, the Java virtual machine also provides some instructions for directly manipulating the operand stack.

3.6 Control transfer instructions

A control transfer instruction can be thought of as conditionally or unconditionally modifying the value of the PC register.

3.7 Method call and return instructions

  • The invokevirtual instruction is used to invoke an instance method of an object

  • The invokeinterface instruction is used to invoke interface methods

  • The invokespecial instruction is used to invoke some instance methods that require special handling

  • The invokestatic instruction is used to invoke a class method (static method)

  • The invokedynamic instruction is used to dynamically resolve, at run time, the method referenced by the call site qualifier and then execute it

Method invocation instructions are independent of data type, while method return instructions are distinguished according to the type of the return value.
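The invocation instructions listed above can be seen together in one small program. This is a sketch with hypothetical names; the comments state which instruction javac typically chooses for each call, and the details can vary with the compiler version.

```java
import java.util.function.IntSupplier;

// Illustration only; all names in this class are made up for this example.
final class InvokeSketch {
    interface Greeter { String greet(); }

    static class English implements Greeter {
        public String greet() { return "hello"; }
    }

    static int twice(int x) { return 2 * x; } // returns with ireturn

    static String demo() {                    // returns with areturn
        English english = new English();      // invokespecial: the constructor <init>
        Greeter greeter = english;
        IntSupplier supplier = () -> 21;      // invokedynamic: the lambda is linked at run time
        int n = twice(supplier.getAsInt());   // invokeinterface getAsInt, then invokestatic twice
        String viaClass = english.greet();    // invokevirtual: receiver has a class type
        String viaInterface = greeter.greet(); // invokeinterface: receiver has an interface type
        return viaClass + viaInterface + n;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hellohello42
    }
}
```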

3.8 Exception Handling Instructions

In the Java virtual machine, exception handling (the catch clause) is implemented not by bytecode instructions but by exception tables.
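A minimal sketch of what this means in practice: for the method below, the compiled code contains no instruction that "jumps to the catch block". Instead, the method carries an exception table entry that maps the range of the try block and the exception type to the start of the handler. The class name is hypothetical; the table can be inspected with javap -c.

```java
// Illustration only; the class is made up for this example.
final class ExceptionTableSketch {
    static int safeDivide(int a, int b) {
        try {
            return a / b;   // idiv throws ArithmeticException when b is 0
        } catch (ArithmeticException ex) {
            return -1;      // handler located through the method's exception table
        }
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(6, 3)); // 2
        System.out.println(safeDivide(1, 0)); // -1
    }
}
```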

3.9 Synchronization Instructions

The Java virtual machine supports both method-level synchronization and synchronization of a sequence of instructions within a method. Both synchronization structures are implemented using monitors.
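The two forms look like this in source code. In this sketch (hypothetical class name), the synchronized method is marked with the ACC_SYNCHRONIZED access flag and contains no monitor instructions, while the synchronized block is compiled to a monitorenter and monitorexit pair around its body.

```java
// Illustration only; the class is made up for this example.
final class SyncSketch {
    private int count;

    // Method-level synchronization: expressed by the ACC_SYNCHRONIZED flag of the method.
    synchronized void incrementViaMethod() {
        count++;
    }

    // Synchronization of an instruction sequence: javac emits monitorenter and monitorexit.
    void incrementViaBlock() {
        synchronized (this) {
            count++;
        }
    }

    synchronized int current() {
        return count;
    }

    public static void main(String[] args) {
        SyncSketch counter = new SyncSketch();
        counter.incrementViaMethod();
        counter.incrementViaBlock();
        System.out.println(counter.current()); // 2
    }
}
```

Both methods lock the same monitor, the one belonging to the SyncSketch instance, so they exclude each other.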

