Shell Script to List ACPI tables

ACPI (Advanced Configuration and Power Interface) defines platform-independent interfaces for hardware discovery, configuration, power management and monitoring. Its tables contain lots of useful information for low-level programmers such as myself.

Here is a short shell script which I cobbled together a few days ago to list the ACPI tables on a system, together with a short description of each table where possible.

#!/bin/bash
#
#   Author:  Finnbarr P. Murphy
#     Date:  January 2015
#  Purpose:  List ACPI tables
#  License:  BSD
#

# create the temporary files (without -u, mktemp actually creates each
# file instead of just printing a name that someone else could grab)
TMP1=$(mktemp -p /var/tmp/ acpiXXXXXX) || exit 1
TMP2=$(mktemp -p /var/tmp/ acpiXXXXXX) || exit 1
TMP3=$(mktemp -p /var/tmp/ acpiXXXXXX) || exit 1

# remove the temporary files however the script exits
trap 'rm -f "$TMP1" "$TMP2" "$TMP3"' EXIT

# signature, description (leading and trailing whitespace is stripped)
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' > "$TMP1" << EOF
    APIC, Multiple APIC Description
    BERT, Boot Error Record
    BGRT, Boot Graphics Resource
    BOOT, Simple Boot Flag
    CPEP, Corrected Platform Error Polling
    CSRT, Core System Resource
    DBG2, Microsoft Debug Port 2
    DSDT, Differentiated System Description
    ECDT, Embedded Controller Boot Resources
    EINJ, Error Injection
    ERST, Error Record Serialization
    FACP, Fixed ACPI Description
    FACS, Firmware ACPI Control Structure
    FADT, Fixed ACPI Description
    GTDT, Generic Timer Description
    HEST, Hardware Error Source
    HPET, High Precision Event Timer
    MADT, Multiple APIC Description
    MCFG, Memory Configuration
    MSCT, Maximum System Characteristics
    MSDM, Microsoft Data Management
    PMTT, Platform Memory Topology
    PSDT, Persistent System Description
    RASF, RAS Feature
    RSDP, Root System Description Pointer
    RSDT, Root System Description
    SBST, Smart Battery
    SLIC, Software Licensing Description
    SLIT, System Locality Distance Information
    SPCR, Serial Port Console Redirection
    SRAT, System Resource Affinity
    SSDT, Secondary System Description
    XSDT, Extended System Description
EOF

# get the list of ACPI tables that Linux knows about
ls -1 /sys/firmware/acpi/tables | sed '/dynamic/d' > "$TMP2"
ls -1 /sys/firmware/acpi/tables/dynamic >> "$TMP2"
sort "$TMP2" > "$TMP3"

# output each table with a description where possible;
# sysfs numbers repeated tables (SSDT1, SSDT2, ...), so match on 4 chars
awk -F, 'NR == FNR { a[$1] = $2; next }
         {
             desc = a[substr($1,1,4)]
             if (desc == "") {
                 desc = " Unknown ACPI"
             }
             printf "%s\t-%s Table\n", $1, desc
         }' "$TMP1" "$TMP3"

exit 0


The above code should be self-explanatory to most experienced Linux administrators who are familiar with bash and awk.

Here is the output for my current system:

$ ./listapci
APIC	- Multiple APIC Description Table
BGRT	- Boot Graphics Resource Table
DSDT	- Differentiated System Description Table
FACP	- Fixed ACPI Description Table
FACS	- Firmware ACPI Control Structure Table
HPET	- High Precision Event Timer Table
MCFG	- Memory Configuration Table
SSDT1	- Secondary System Description Table
SSDT2	- Secondary System Description Table
SSDT3	- Secondary System Description Table
SSDT4	- Secondary System Description Table
SSDT5	- Secondary System Description Table
SSDT6	- Secondary System Description Table
$


The list of ACPI tables included with the script contains the most common ACPI tables up to revision 5 of the ACPI specification. It does not include some of the more uncommon tables. You can easily add those yourself if you are so inclined.
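If you have bash 4 or later, the temporary files can be avoided altogether by keeping the descriptions in an associative array. The following is a minimal sketch under that assumption; describe_table is a hypothetical helper name, and only the signatures seen on my system are filled in, so you would extend ACPI_DESC with the rest of the list from the script above.

```bash
#!/bin/bash
# Sketch: describe ACPI tables without temporary files (requires bash 4+).
declare -A ACPI_DESC=(
    [APIC]="Multiple APIC Description"
    [BGRT]="Boot Graphics Resource"
    [DSDT]="Differentiated System Description"
    [FACP]="Fixed ACPI Description"
    [FACS]="Firmware ACPI Control Structure"
    [HPET]="High Precision Event Timer"
    [MCFG]="Memory Configuration"
    [SSDT]="Secondary System Description"
)

# Print "NAME<tab>- Description Table"; sysfs appends an instance number
# to repeated tables (SSDT1, SSDT2, ...), so only the first 4 chars are used.
describe_table() {
    local name=$1
    local sig=${name:0:4}
    local desc=${ACPI_DESC[$sig]:-Unknown ACPI}
    printf '%s\t- %s Table\n' "$name" "$desc"
}

# Walk sysfs only if it exists; [ -f ] skips subdirectories such as dynamic/
tables=/sys/firmware/acpi/tables
if [ -d "$tables" ]; then
    for f in "$tables"/* "$tables"/dynamic/*; do
        [ -f "$f" ] && describe_table "${f##*/}"
    done | sort
fi
```

Only the table names are needed here, so this works as an ordinary user even though the table files themselves are readable only by root.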

Out of Memory Killer

I am logged in on pts/1 and using the Bash shell. As shown below, my Bash shell process has three procfs pseudo-files whose names start with oom. This post discusses the purpose of these files.

# ps
  PID TTY          TIME CMD
 1688 pts/1    00:00:00 ps
10290 pts/1    00:00:00 sudo
10291 pts/1    00:00:00 su
10294 pts/1    00:00:00 bash
# ls -l /proc/10294/oom*
-rw-r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_adj
-r--r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_score
-rw-r--r--. 1 root root 0 Dec 26 17:13 /proc/10294/oom_score_adj
# cat /proc/10294/oom_score
0


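It turns out that these three files have to do with Linux out of memory (OOM) management. Linux can overcommit memory, and its policy is controlled by the overcommit_memory kernel tunable (/proc/sys/vm/overcommit_memory, also available as the vm.overcommit_memory sysctl). The kernel documentation (Documentation/sysctl/vm.txt) describes it as follows: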

overcommit_memory:

This value contains a flag that enables memory overcommitment.

When this flag is 0, the kernel attempts to estimate the amount
of free memory left when userspace requests more memory.

When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.

When this flag is 2, the kernel uses a "never overcommit"
policy that attempts to prevent any overcommit of memory.
Note that user_reserve_kbytes affects this policy.

This feature can be very useful because there are a lot of
programs that malloc() huge amounts of memory "just-in-case"
and don't use much of it.

The default value is 0.

Overcommitment allows memory allocation functions such as malloc() to allocate virtual memory with no guarantee that physical storage for it exists.

Memory overcommitment is useful: without it, a system may fail to fully utilize its memory, because memory that programs reserve but never touch could not be given to anyone else. The price is the risk of running out of physical memory. All is well until the kernel cannot find sufficient physical memory to back a virtual memory page when it is needed.
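A quick way to see which of the three policies is in force is to read /proc/sys/vm/overcommit_memory (sysctl vm.overcommit_memory reports the same value). Here is a small sketch; overcommit_policy is a hypothetical helper name, and its optional file argument exists only so it can be tried against a test file.

```bash
#!/bin/bash
# Sketch: report the current overcommit policy in words.
# The optional argument defaults to the real procfs file.
overcommit_policy() {
    local file=${1:-/proc/sys/vm/overcommit_memory}
    local mode
    read -r mode < "$file" || return 1
    case $mode in
        0) echo "0: heuristic overcommit (default)" ;;
        1) echo "1: always overcommit" ;;
        2) echo "2: never overcommit" ;;
        *) echo "$mode: unrecognized value"; return 1 ;;
    esac
}

# Show the policy of the running system, if the file is present
[ -r /proc/sys/vm/overcommit_memory ] && overcommit_policy
```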

The purpose of the kernel's OOM Killer is to free up memory for the system when all other memory-freeing techniques have failed. It does this by killing selected processes until enough memory is freed to stabilize the system. The OOM Killer has several configuration options that give some control over how the system behaves when it is faced with an out-of-memory condition.

The OOM Killer attempts to select the “best” processes to kill to achieve system stability, i.e. the smallest number of processes which will free up the maximum amount of memory upon termination and which are also the least important processes as far as the system is concerned. It also kills any other process sharing the same mm_struct (that is, the same address space) as the selected process, since that memory cannot be released while any of those processes is still running.

To facilitate process selection, the kernel maintains an oom_score for each process. The higher the value, the more likely a process and its children are to be killed by the OOM Killer in an out-of-memory situation.
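To see which processes the OOM Killer would consider first on your own system, you can rank the oom_score files of all processes. This is a small sketch; oom_top is a hypothetical helper, and its optional root argument (default /proc) exists only so it can be exercised against a fake directory tree.

```bash
#!/bin/bash
# Sketch: list the COUNT processes with the highest oom_score as
# "score pid command", highest first.
oom_top() {
    local root=${1:-/proc} count=${2:-5} dir score comm
    for dir in "$root"/[0-9]*; do
        # a process may exit while we look at it, so skip unreadable entries
        { read -r score < "$dir/oom_score"; } 2>/dev/null || continue
        { read -r comm < "$dir/comm"; } 2>/dev/null || comm="?"
        printf '%s %s %s\n' "$score" "${dir##*/}" "$comm"
    done | sort -k1,1nr -k2,2n | head -n "$count"
}

oom_top /proc 5
```

Ties are broken by PID so the output is stable from run to run.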

The per-process oom_score_adj file enables a user to have some control over the OOM Killer's process selection. The deprecated oom_adj file provides similar functionality. Both are described in the kernel's procfs documentation (Documentation/filesystems/proc.txt):

3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
--------------------------------------------------------------------------------

These files can be used to adjust the badness heuristic used to select which
process gets killed in out of memory conditions.

The badness heuristic assigns a value to each candidate task ranging from 0
(never kill) to 1000 (always kill) to determine which process is targeted.  The
units are roughly a proportion along that range of allowed memory the process
may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000.  If it is using half of its allowed memory, its score will be 500.

There is an additional factor included in the badness score: the current memory
and swap usage is discounted by 3% for root processes.

The amount of "allowed" memory depends on the context in which the oom killer
was called.  If it is due to the memory assigned to the allocating task's cpuset
being exhausted, the allowed memory represents the set of mems assigned to that
cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
memory represents the set of mempolicy nodes.  If it is due to a memory
limit (or swap limit) being reached, the allowed memory is that configured
limit.  Finally, if it is due to the entire system being out of memory, the
allowed memory represents all allocatable resources.

The value of /proc/<pid>/oom_score_adj is added to the badness score before it
is used to determine which task to kill.  Acceptable values range from -1000
(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
polarize the preference for oom killing either by always preferring a certain
task or completely disabling it.  The lowest possible value, -1000, is
equivalent to disabling oom killing entirely for that task since it will always
report a badness score of 0.

Consequently, it is very simple for userspace to define the amount of memory to
consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
example, is roughly equivalent to allowing the remainder of tasks sharing the
same system, cpuset, mempolicy, or memory controller resources to use at least
50% more memory.  A value of -500, on the other hand, would be roughly
equivalent to discounting 50% of the task's allowed memory from being considered
as scoring against the task.

For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
be used to tune the badness score.  Its acceptable values range from -16
(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
scaled linearly with /proc/<pid>/oom_score_adj.

The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.

Caveat: when a parent task is selected, the oom killer will sacrifice any first
generation children with separate address spaces instead, if possible.  This
prevents servers and important system daemons from being killed and loses the
minimal amount of work.
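The linear scaling between oom_adj and oom_score_adj mentioned in that excerpt is easy to work through by hand. The sketch below is modelled on the kernel's procfs code as I understand it (check fs/proc/base.c in your own kernel source before relying on the exact numbers); oom_adj_to_score_adj is a hypothetical helper name. Note the special case: +15 maps straight to +1000, since plain scaling would only reach 882.

```bash
#!/bin/bash
# Sketch of the oom_adj -> oom_score_adj scaling (assumed, see lead-in).
OOM_DISABLE=-17
OOM_ADJUST_MAX=15
OOM_SCORE_ADJ_MAX=1000

oom_adj_to_score_adj() {
    local oom_adj=$1
    if (( oom_adj == OOM_ADJUST_MAX )); then
        echo "$OOM_SCORE_ADJ_MAX"          # +15 maps to the maximum, +1000
    else
        # linear scaling by 1000/17; bash, like C, truncates toward zero
        echo $(( oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE ))
    fi
}

for v in -17 -16 0 8 15; do
    echo "oom_adj $v -> oom_score_adj $(oom_adj_to_score_adj "$v")"
done
```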

As stated earlier, the processes to be killed are selected based on their badness score, which is visible to a user as /proc/<PID>/oom_score. LWN (Linux Weekly News) has an article with more information about how badness is calculated. The process (and any children) with the highest badness score is killed first.
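The arithmetic described in the proc.txt excerpt can be illustrated with a deliberately simplified model: express the task's memory and swap use as a proportion of its allowed memory on a 0 to 1000 scale, add oom_score_adj, and clamp the result to the 0 to 1000 range. This is not the kernel's oom_badness() code (it ignores the root discount and the kernel's exact accounting), and badness is a hypothetical function name.

```bash
#!/bin/bash
# Simplified model of the documented badness arithmetic (not kernel code):
#   base: memory+swap use as a proportion of allowed memory, 0..1000
#   adj:  the task's oom_score_adj, -1000..1000
badness() {
    local base=$1 adj=$2 score
    (( adj == -1000 )) && { echo 0; return; }   # OOM_SCORE_ADJ_MIN: never kill
    score=$(( base + adj ))
    (( score < 0 )) && score=0
    (( score > 1000 )) && score=1000
    echo "$score"
}

# A task using half of its allowed memory, with various adjustments
for adj in -1000 -500 0 500; do
    printf 'adj=%5d  badness=%d\n' "$adj" "$(badness 500 "$adj")"
done
```

With a base of 500, an adjustment of +500 already pushes the task to the maximum score, while -500 drops it to zero, which matches the "polarize the preference" wording in the excerpt.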

There is a lot more to the OOM Killer than I have time to cover in this post. Just do an Internet search and you will find plenty of additional information.
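If you want to experiment, the following sketch range-checks a value before writing it to /proc/<PID>/oom_score_adj. set_oom_score_adj is a hypothetical helper name, and PID 1234 in the comments is purely illustrative. As the excerpt explains, lowering the value below what a CAP_SYS_RESOURCE process last set requires that capability, so in practice you need to be root for negative values.

```bash
#!/bin/bash
# Sketch: set a process's oom_score_adj after validating the value.
set_oom_score_adj() {
    local pid=$1 value=$2
    local re='^-?(0|[1-9][0-9]*)$'   # plain integer, no leading zeros
    if ! [[ $value =~ $re ]] || (( value < -1000 || value > 1000 )); then
        echo "oom_score_adj must be an integer from -1000 to 1000" >&2
        return 1
    fi
    echo "$value" > "/proc/$pid/oom_score_adj"
}

# Examples:
#   set_oom_score_adj "$$" 500      # make this shell a preferred victim
#   set_oom_score_adj 1234 -1000    # exempt PID 1234 (needs CAP_SYS_RESOURCE)
```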

Decompiling ACPI Tables

Advanced Configuration and Power Interface (ACPI) is a specification which defines platform-independent interfaces for hardware discovery, configuration, power management and monitoring. It was initially developed by Intel, Microsoft and Toshiba in 1996 and has been revised regularly since then. In 2013, stewardship of the specification was transferred to the Unified Extensible Firmware Interface (UEFI) Forum. At the time of writing, the latest version of the specification is 5.1, which was released in July 2014.

The functional areas of the ACPI specification include:

  • System power management
  • Device power management
  • Processor power management
  • Configuration / Plug and Play
  • System Events
  • Battery management
  • Thermal management
  • Embedded controllers
  • SMBus controller

ACPI mainly consists of ACPI tables. The tables define sets of TLV (Tag Length Value) structures. The firmware passes these tables to the kernel as a pointer to a root table, and the kernel is free to use or ignore them. All the tables can be decompiled with open source tools. The Linux kernel has a built-in ACPI interpreter which executes the bytecode that some of these tables contain.

ACPI tables contain either blocks of configuration information or executable code in a compiled bytecode called AML (ACPI Machine Language). The AML bytecode provided by the firmware is executed in kernel space, and is found in the DSDT and in the SSDTs (Secondary System Description Tables).

The ACPI tables are accessible on Linux via the sysfs pseudo-filesystem, which is mounted at /sys:

# cd /sys/firmware/acpi
# ls -al
total 0
drwxr-xr-x. 5 root root    0 Dec 27 08:32 .
drwxr-xr-x. 6 root root    0 Dec 27 11:03 ..
drwxr-xr-x. 5 root root    0 Dec 27 11:14 hotplug
drwxr-xr-x. 2 root root    0 Dec 27 11:14 interrupts
-r--r--r--. 1 root root 4096 Dec 27 11:14 pm_profile
drwxr-xr-x. 3 root root    0 Dec 27 08:32 tables
# cd tables
# ls -al
total 0
drwxr-xr-x. 3 root root     0 Dec 27 13:53 .
drwxr-xr-x. 5 root root     0 Dec 27 13:53 ..
-r--------. 1 root root   114 Dec 27 11:14 APIC
-r--------. 1 root root    56 Dec 27 11:14 BGRT
-r--------. 1 root root 38840 Dec 27 11:14 DSDT
drwxr-xr-x. 2 root root     0 Dec 27 11:14 dynamic
-r--------. 1 root root   244 Dec 27 11:14 FACP
-r--------. 1 root root    64 Dec 27 11:14 FACS
-r--------. 1 root root    56 Dec 27 11:14 HPET
-r--------. 1 root root    60 Dec 27 11:14 MCFG
-r--------. 1 root root   877 Dec 27 11:14 SSDT1
-r--------. 1 root root  2474 Dec 27 11:14 SSDT2
-r--------. 1 root root  2706 Dec 27 11:14 SSDT3
# cd dynamic
# ls -al
total 0
drwxr-xr-x. 2 root root    0 Dec 27 11:14 .
drwxr-xr-x. 3 root root    0 Dec 27 13:53 ..
-r--------. 1 root root 2107 Dec 27 11:19 SSDT4
-r--------. 1 root root  771 Dec 27 11:19 SSDT5
-r--------. 1 root root  281 Dec 27 11:19 SSDT6


Let us examine one of the more interesting tables, the Differentiated System Description Table (DSDT), which supplies information about the power events supported by a given system. This table contains the Differentiated Definition Block, which supplies the implementation and configuration information about the base system.

First of all, you need to figure out which ASL (ACPI Source Language) compiler was used to build the table.

# dmesg | grep DSDT 
[    0.000000] ACPI: DSDT 0x00000000DA7B1170 0097B8 (v02 ALASKA A M I    00000015 INTL 20051117)
# 


There are two popular ASL compilers – Intel (INTL) and Microsoft (MSFT). From the above output, we can see the table was compiled using the Intel ASL compiler.
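dmesg is not the only source of this information, and its early boot messages can scroll out of the buffer on a long-running system. Every ACPI table except the FACS begins with the standard 36-byte description header, in which bytes 28 to 31 hold the four-character Creator ID, i.e. the INTL in the dmesg line above. Here is a minimal sketch that reads the signature, OEM ID and Creator ID straight from a table file; acpi_field and acpi_header are hypothetical helper names, and reading the sysfs table files requires root.

```bash
#!/bin/bash
# Sketch: print signature, OEM ID and Creator (compiler) ID from the
# standard ACPI table header. Header offsets used here:
#   0 Signature (4 bytes)   10 OEM ID (6 bytes)   28 Creator ID (4 bytes)
acpi_field() {   # acpi_field FILE OFFSET LENGTH -> bytes as text, NULs dropped
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\0'
}

acpi_header() {
    local f=$1
    printf '%s %s %s\n' "$(acpi_field "$f" 0 4)" \
        "$(acpi_field "$f" 10 6)" "$(acpi_field "$f" 28 4)"
}

# Example (as root): acpi_header /sys/firmware/acpi/tables/DSDT
```

On my system this would be expected to print DSDT ALASKA INTL, matching the dmesg output.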

To decompile the DSDT, you need to have an Intel ASL decompiler installed. On Fedora 21 x86_64, for example, this means you need to install the acpica-tools-20140828-1.fc21.x86_64 package. Alternatively, you can download a compiler/decompiler from the ACPICA (ACPI Component Architecture) project, which provides an OS-independent reference implementation of the ACPI Specification.

To decompile and view the DSDT source:

# cd /var/tmp
# cat /sys/firmware/acpi/tables/DSDT > dsdt.dat
# iasl -d dsdt.dat
# less dsdt.dsl

/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20141107-64 [Dec  2 2014]
 * Copyright (c) 2000 - 2014 Intel Corporation
 *
 * Disassembling to symbolic ASL+ operators
 *
 * Disassembly of /tmp/dsdt.dat, Sat Dec 27 13:23:33 2014
 *
 * Original Table Header:
 *     Signature        "DSDT"
 *     Length           0x000097B8 (38840)
 *     Revision         0x02
 *     Checksum         0x59
 *     OEM ID           "ALASKA"
 *     OEM Table ID     "A M I"
 *     OEM Revision     0x00000015 (21)
 *     Compiler ID      "INTL"
 *     Compiler Version 0x20051117 (537202967)
 */
DefinitionBlock ("/tmp/dsdt.aml", "DSDT", 2, "ALASKA", "A M I", 0x00000015)
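The same copy-then-disassemble steps work for any of the tables under /sys/firmware/acpi/tables, including the dynamically loaded SSDTs. Here is a minimal sketch that copies every table into a directory of your choice as NAME.dat files, ready for iasl -d; dump_acpi_tables is a hypothetical helper, and it must run as root because the table files are readable only by root.

```bash
#!/bin/bash
# Sketch: copy every ACPI table exposed in sysfs (including dynamically
# loaded ones) into DEST as NAME.dat.
dump_acpi_tables() {
    local dest=$1 src=${2:-/sys/firmware/acpi/tables} f
    [ -n "$dest" ] || { echo "usage: dump_acpi_tables DEST [SRC]" >&2; return 1; }
    mkdir -p "$dest" || return 1
    for f in "$src"/* "$src"/dynamic/*; do
        [ -f "$f" ] || continue              # skip directories such as dynamic/
        cat "$f" > "$dest/${f##*/}.dat" || return 1
    done
}

# Example (as root):
#   dump_acpi_tables /var/tmp/acpi
#   cd /var/tmp/acpi && for t in *.dat; do iasl -d "$t"; done
```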

Here is the section of DSDT which deals with a standard keyboard:

Device (PS2K)
{
    Name (_HID, EisaId ("PNP0303") /* IBM Enhanced Keyboard (101/102-key, PS/2 Mouse) */)  // _HID: Hardware ID
    Name (_CID, EisaId ("PNP030B"))  // _CID: Compatible ID
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        If ((IOST & 0x0400))
        {
            Return (0x0F)
        }
        Else
        {
            Return (Zero)
        }
    }

    Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
    {
        IO (Decode16,
            0x0060,             // Range Minimum
            0x0060,             // Range Maximum
            0x00,               // Alignment
            0x01,               // Length
            )
        IO (Decode16,
            0x0064,             // Range Minimum
            0x0064,             // Range Maximum
            0x00,               // Alignment
            0x01,               // Length
            )
        IRQNoFlags ()
            {1}
    })
    Name (_PRS, ResourceTemplate ()  // _PRS: Possible Resource Settings
    {
        StartDependentFn (0x00, 0x00)
        {
            FixedIO (
                0x0060,             // Address
                0x01,               // Length
                )
            FixedIO (
                0x0064,             // Address
                0x01,               // Length
                )
            IRQNoFlags ()
                {1}
        }
        EndDependentFn ()
    })
    Method (_PSW, 1, NotSerialized)  // _PSW: Power State Wake
    {
        KBFG = Arg0
    }
}


And here is the section dealing with the HPET:

Device (HPET)
{
    Name (_HID, EisaId ("PNP0103") /* HPET System Timer */)  // _HID: Hardware ID
    Name (_UID, Zero)  // _UID: Unique ID
    Name (BUF0, ResourceTemplate ()
    {
        Memory32Fixed (ReadWrite,
            0xFED00000,         // Address Base
            0x00000400,         // Address Length
            _Y10)
    })
    Method (_STA, 0, NotSerialized)  // _STA: Status
    {
        If ((OSYS >= 0x07D1))
        {
            If (HPAE)
            {
                Return (0x0F)
            }
        }
        Else
        {
            If (HPAE)
            {
                Return (0x0B)
            }
        }

        Return (Zero)
    }

    Method (_CRS, 0, Serialized)  // _CRS: Current Resource Settings
    {
        If (HPAE)
        {
            CreateDWordField (BUF0, \_SB.PCI0.LPCB.HPET._Y10._BAS, HPT0)  // _BAS: Base Address
            If ((HPAS == One))
            {
                HPT0 = 0xFED01000
            }

            If ((HPAS == 0x02))
            {
                HPT0 = 0xFED02000
            }

            If ((HPAS == 0x03))
            {
                HPT0 = 0xFED03000
            }
        }

        Return (BUF0) /* \_SB_.PCI0.LPCB.HPET.BUF0 */
    }
}
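Both _STA (Status) methods above return small bit masks. Per the ACPI specification's definition of _STA, bit 0 means the device is present, bit 1 that it is enabled and decoding its resources, bit 2 that it should be shown in the user interface, bit 3 that it is functioning properly, and bit 4 (for batteries) that a battery is present. A tiny decoder makes the 0x0F and 0x0B values above easier to read; decode_sta is a hypothetical name.

```bash
#!/bin/bash
# Sketch: decode an ACPI _STA (device status) return value.
decode_sta() {
    local sta=$(( $1 )) out=()
    (( sta & 0x01 )) && out+=("present")
    (( sta & 0x02 )) && out+=("enabled")
    (( sta & 0x04 )) && out+=("shown-in-UI")
    (( sta & 0x08 )) && out+=("functioning")
    (( sta & 0x10 )) && out+=("battery-present")
    echo "${out[*]:-absent}"
}

decode_sta 0x0F   # PS2K, and HPET when HPAE is set and OSYS >= 0x07D1
decode_sta 0x0B   # HPET when HPAE is set and OSYS < 0x07D1
```

So the only difference between the two HPET cases is bit 2: with the older OSYS value the timer is still present, enabled and functioning, but hidden from the user interface.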


If you examine the above “code”, you will see that decompiling ACPI tables can reveal lots of interesting information about your hardware and firmware. In fact, there is an active community of ACPI hackers who patch ACPI tables to improve the functionality of laptops and gaming systems.

If you are interested in learning more about ACPI support in Linux, a good paper to read is ACPI in Linux: Architecture, Advances, and Challenges by Brown et al.

Installing Google Chrome 38 on Fedora 21 Alpha

Download the current stable Google Chrome RPM. As of today’s date this is revision 38.0.2125.

You also need to install two packages to get libXss.so and LSB (Linux Standard Base) compatibility:

# yum install libXScrnSaver redhat-lsb
# rpm -Uvh google-chrome-stable_current_x86_64.rpm

One of the major new features in this version of Chrome is support for security keys, such as the Yubico FIDO U2F key, for two-factor authentication.

Note that FIDO U2F devices do not currently work with Fedora 21 Alpha. Such devices work perfectly well with the Windows build of the Google Chrome browser. However, on the alpha version of Fedora 21, Google Chrome simply hangs when you use a FIDO U2F device, because Fedora 21 does not yet have support for U2F devices.

A few days ago, Andy Lutomirski raised a bug report (Bug 1155826) requesting a review of a new library, libu2f-host, which implements the host side of the U2F protocol, so hopefully U2F support will be available for the Chrome browser soon.

If you download and build the source RPM referenced in the above bug report and install the resulting libu2f-host library, Google Chrome two-factor authentication will then work with the Yubico U2F device on Fedora 21.

Assuming you have installed the development tools and libraries, here is how to build and install the library:

# rpmbuild --rebuild libu2f-host-0.0-3.fc20.src.rpm
# cd <path_to>/rpmbuild/RPMS/x86_64
# rpm -Uvh libu2f-host-0.0-3.fc21.x86_64.rpm


Hopefully, this library will make its way into Fedora 21 and other Linux distributions before too long.

Control Group Subsystems in RHEL7

Control groups (cgroups) are a Linux kernel feature that enables you to allocate resources — such as CPU time, system memory, disk I/O, network bandwidth, etc. — among hierarchically ordered groups of processes running on a system. Initially developed by Google engineers Paul Menage and Rohit Seth in 2006 under the name “process containers”, it was merged into kernel version 2.6.24 and extensively enhanced since then. RHEL6 was the first Red Hat distribution to support cgroups.

Cgroups provide system administrators with fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. A cgroup is a collection of processes that are bound by the same criteria. These groups are typically hierarchical, where each group inherits limits from its parent group.

The problem that cgroups address is summarized by the following excerpt from a Red Hat guide:

Control Groups provide a way to hierarchically group and label processes, and to apply resource limits to them. Traditionally, all processes received similar amount of system resources that administrator could modulate with the process niceness value. With this approach, applications that involved a large number of processes got more resources than applications with few processes, regardless of the relative importance of these applications.

In RHEL6, administrators had to build custom cgroup hierarchies to meet their application needs. In RHEL7, it is no longer necessary to build custom cgroups as resource management settings have moved from the process level to the application level via binding the system of cgroup hierarchies with the systemd unit tree. By default, systemd automatically creates a hierarchy of slices, scopes and services to provide a unified structure for the cgroup tree.

A resource controller, also called a cgroup subsystem, represents a single resource, such as CPU time or memory. The Linux kernel provides a range of resource controllers, which are listed in /proc/cgroups:

# cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	2	1	1
cpu	3	1	1
cpuacct	3	1	1
memory	4	1	1
devices	5	1	1
freezer	6	1	1
net_cls	7	1	1
blkio	8	1	1
perf_event	9	1	1
hugetlb	10	1	1
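Controllers that share a value in the hierarchy column (cpu and cpuacct, both 3, in the output above) are attached to the same hierarchy, which is why they are mounted together. The following sketch groups the enabled controllers by hierarchy ID with awk; controllers_by_hierarchy is a hypothetical name, and the optional file argument defaults to /proc/cgroups.

```bash
#!/bin/bash
# Sketch: group enabled controllers by hierarchy ID.
# Columns of /proc/cgroups: subsys_name hierarchy num_cgroups enabled
controllers_by_hierarchy() {
    awk '!/^#/ && $4 == 1 {
             if ($2 in names) names[$2] = names[$2] "," $1
             else { names[$2] = $1; order[++n] = $2 }
         }
         END { for (i = 1; i <= n; i++) print order[i] ": " names[order[i]] }' \
        "${1:-/proc/cgroups}"
}

[ -r /proc/cgroups ] && controllers_by_hierarchy
```

For the output above this would be expected to print one line per hierarchy, with 3: cpu,cpuacct on a single line.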


A quick explanation of each of the above cgroup subsystems:

  • cpuset: Assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
  • cpu: Uses the scheduler to provide cgroup tasks access to the CPU.
  • cpuacct: Generates automatic reports on CPU resources used by tasks in a cgroup.
  • memory: Sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks.
  • devices: Allows or denies access to devices by tasks in a cgroup.
  • freezer: Suspends or resumes tasks in a cgroup.
  • net_cls: Tags network packets with a class identifier (classid) to enable the Linux traffic controller to identify packets originating from a particular cgroup task.
  • blkio: Sets limits on input/output access to and from block devices.
  • perf_event: Permits monitoring cgroups with the perf tool.
  • hugetlb: Allows the use of large virtual memory pages and enforces resource limits on these pages.
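To see which cgroup a particular process belongs to for one of these controllers, you can read /proc/<PID>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. Here is a sketch; cgroup_path is a hypothetical helper name, and it defaults to /proc/self/cgroup, i.e. the calling shell.

```bash
#!/bin/bash
# Sketch: print the cgroup path of a process for one controller.
# Each line of /proc/<pid>/cgroup is  hierarchy-ID:controller-list:path
cgroup_path() {
    local want=$1 file=${2:-/proc/self/cgroup} id list path c
    local -a ctrls
    while IFS=: read -r id list path; do
        IFS=, read -ra ctrls <<< "$list"
        for c in "${ctrls[@]}"; do
            [ "$c" = "$want" ] && { echo "$path"; return 0; }
        done
    done < "$file"
    return 1
}

cgroup_path memory || echo "no memory cgroup found"
```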

You can also use the lssubsys utility (part of the libcgroup-tools package on RHEL7) to view the control group subsystems:

# lssubsys
cpuset
cpu,cpuacct
memory
devices
freezer
net_cls
blkio
perf_event
hugetlb

# lssubsys -im
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls /sys/fs/cgroup/net_cls
blkio /sys/fs/cgroup/blkio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb

# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)


If you install the kernel-doc RPM, you will find documentation on each of the above cgroup subsystems under /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups/.