Systems using S0ix don't reach SLP_S0 #506

Open
DrymarchonShaun opened this issue Jan 11, 2024 · 5 comments

DrymarchonShaun commented Jan 11, 2024

  • Model: darp8
  • BIOS version: 2024-01-10_6c402c3
  • EC version: 2024-01-10_6c402c3
  • OS: NixOS 23.11
  • Kernel: 6.1.69 and 6.6.8

Intel's S0ixSelftestTool reports that the device isn't reaching the deepest S0ix state; the log for that is here.

Manually checking the files I assume that script reads, I get the same result:

$ sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
0
$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
0

However, the CPU is hitting C10:

$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
1115299596
$ sudo cat /sys/kernel/debug/pmc_core/package_cstate_show                      
Package C2 : 434296807
Package C3 : 300541816
Package C6 : 61166
Package C7 : 0
Package C8 : 252830
Package C9 : 0
Package C10 : 105541521
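
For anyone comparing their own numbers, here is a minimal sketch that picks out the deepest package C-state with non-zero residency from output in the format above. The function name `deepest_pkg_cstate` is made up for illustration, and it assumes the "Package Cn : value" layout shown in this report.

```shell
# Print the deepest package C-state with non-zero residency.
# Reads text in the "Package Cn : value" format from stdin.
# deepest_pkg_cstate is a hypothetical helper name, not part of any tool.
deepest_pkg_cstate() {
    awk -F' *: *' '
        /^Package C[0-9]+/ && $2 + 0 > 0 {
            n = $1
            sub(/^Package C/, "", n)
            if (n + 0 > max) { max = n + 0; name = $1 }
        }
        END { if (name != "") print name; else print "none" }
    '
}
```

It could be fed with `sudo cat /sys/kernel/debug/pmc_core/package_cstate_show | deepest_pkg_cstate`.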

Steps to reproduce

  1. Suspend the system
  2. Resume
  3. Run sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec

Expected behavior

sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec should return a value greater than 0

Actual behavior

sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec returns 0
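
The reproduction steps above amount to comparing the counter before and after a suspend cycle. A minimal sketch of that comparison follows; the names `SLP_S0_FILE`, `read_slp_s0` and `check_slp_s0_delta` are made up for illustration, and reading the debugfs file is assumed to require root.

```shell
# Counter path from this report; can be overridden, e.g. for testing.
SLP_S0_FILE="${SLP_S0_FILE:-/sys/kernel/debug/pmc_core/slp_s0_residency_usec}"

# Print the current counter value (microseconds).
# Hypothetical helper; run as root when reading the real debugfs file.
read_slp_s0() {
    cat "$SLP_S0_FILE"
}

# Compare two readings and report whether SLP_S0 residency increased.
# Returns non-zero when the counter did not move.
check_slp_s0_delta() {
    before=$1
    after=$2
    delta=$((after - before))
    if [ "$delta" -gt 0 ]; then
        echo "SLP_S0 residency increased by ${delta} us"
    else
        echo "SLP_S0 residency did not increase"
        return 1
    fi
}
```

Take one reading with `read_slp_s0`, suspend and resume, take a second reading, then pass both values to `check_slp_s0_delta`.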

DrymarchonShaun changed the title from "darp8 doesn't enter SLP_S0" to "darp8 doesn't reach SLP_S0" on Jan 11, 2024
crawfxrd changed the title from "darp8 doesn't reach SLP_S0" to "Systems using S0ix don't reach SLP_S0" on Jan 11, 2024
@crawfxrd
Member

I don't know what's required for SLP_S0#, but I expect a part of it is missing or incorrect RTD3 configs.

@DrymarchonShaun
Author

DrymarchonShaun commented Jan 20, 2024

> I don't know what's required for SLP_S0#, but I expect a part of it is missing or incorrect RTD3 configs.

I assume that would be what's causing the "Pcieport is not in D3cold:" parts of the selftest tool's output?

Checking PCI Devices D3 States:
[  309.689726] nvme 0000:2f:00.0: PCI PM: Suspend power state: D0
[  309.689730] nvme 0000:2f:00.0: PCI PM: Skipped
[  309.691814] i801_smbus 0000:00:1f.4: PCI PM: Suspend power state: D0
[  309.691817] i801_smbus 0000:00:1f.4: PCI PM: Skipped
[  309.693959] pcieport 0000:00:1d.0: PCI PM: Suspend power state: D0
[  309.693962] pcieport 0000:00:1d.0: PCI PM: Skipped
[  309.695756] snd_hda_intel 0000:00:1f.3: PCI PM: Suspend power state: D3hot
[  309.695762] i915 0000:00:02.0: PCI PM: Suspend power state: D3hot
[  309.696360] xhci_hcd 0000:00:0d.0: PCI PM: Suspend power state: D3hot
[  309.702096] r8169 0000:2e:00.0: PCI PM: Suspend power state: D3hot
[  309.705955] sdhci-pci 0000:2d:00.0: PCI PM: Suspend power state: D3hot
[  309.706697] nvme 0000:01:00.0: PCI PM: Suspend power state: D3hot
[  309.706773] mei_me 0000:00:16.0: PCI PM: Suspend power state: D3hot
[  309.706825] pcieport 0000:00:1c.0: PCI PM: Suspend power state: D0
[  309.706827] pcieport 0000:00:1c.0: PCI PM: Skipped
[  309.707008] intel-lpss 0000:00:15.0: PCI PM: Suspend power state: D3hot
[  309.707208] xhci_hcd 0000:00:14.0: PCI PM: Suspend power state: D3hot
[  309.707892] iwlwifi 0000:00:14.3: PCI PM: Suspend power state: D3hot
[  309.711623] thunderbolt 0000:00:0d.2: PCI PM: Suspend power state: D3hot
[  309.714532] pcieport 0000:00:1c.7: PCI PM: Suspend power state: D3hot
[  309.740534] pcieport 0000:00:06.0: PCI PM: Suspend power state: D3cold


Checking PCI Devices tree diagram:
-[0000:00]-+-00.0  Intel Corporation Device 4621
           +-02.0  Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics]
           +-06.0-[01]----00.0  Sandisk Corp SanDisk Ultra 3D / WD Blue SN550 NVMe SSD
           +-07.0-[02-2c]--
           +-0a.0  Intel Corporation Platform Monitoring Technology
           +-0d.0  Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller
           +-0d.2  Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0
           +-14.0  Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller
           +-14.2  Intel Corporation Alder Lake PCH Shared SRAM
           +-14.3  Intel Corporation Alder Lake-P PCH CNVi WiFi
           +-15.0  Intel Corporation Alder Lake PCH Serial IO I2C Controller #0
           +-15.1  Intel Corporation Alder Lake PCH Serial IO I2C Controller #1
           +-16.0  Intel Corporation Alder Lake PCH HECI Controller
           +-1c.0-[2d]----00.0  O2 Micro, Inc. SD/MMC Card Reader Controller
           +-1c.7-[2e]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1d.0-[2f]----00.0  Sandisk Corp SanDisk Ultra 3D / WD Blue SN570 NVMe SSD (DRAM-less)
           +-1f.0  Intel Corporation Alder Lake PCH eSPI Controller
           +-1f.3  Intel Corporation Alder Lake PCH-P High Definition Audio Controller
           +-1f.4  Intel Corporation Alder Lake PCH-P SMBus Host Controller
           \-1f.5  Intel Corporation Alder Lake-P PCH SPI Controller

The pcieport 0000:00:1d.0 ASPM enable status:
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+

Pcieport is not in D3cold:          
0000:00:1d.0

The pcieport 0000:00:1c.0 ASPM enable status:
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-

Pcieport is not in D3cold:          
0000:00:1c.0

Pcieport is not in D3cold:     
0000:00:1c.7

Available bridge device: 0000:00:06.0 0000:00:07.0 0000:00:1c.0 0000:00:1c.7 0000:00:1d.0
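
To pull the same information out of a kernel log without the selftest tool, here is a minimal sketch that lists the devices whose logged suspend power state was D0. It assumes the log contains "PCI PM: Suspend power state" lines in the format shown above; `devices_left_in_d0` is a made-up name. It only looks for D0, so ports that stop at D3hot (such as 0000:00:1c.7 above) are not listed.

```shell
# List devices whose logged suspend power state was D0.
# Reads kernel log lines in the format shown above from stdin.
# devices_left_in_d0 is a hypothetical helper name.
devices_left_in_d0() {
    awk '/PCI PM: Suspend power state: D0[ \t]*$/ {
        sub(/^\[[^]]*\] */, "")   # drop the "[  309.689726] " timestamp
        sub(/: PCI PM:.*/, "")    # keep only "driver address"
        print
    }'
}
```

On the log above this would print nvme 0000:2f:00.0, i801_smbus 0000:00:1f.4, pcieport 0000:00:1d.0 and pcieport 0000:00:1c.0, one per line.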

I'm not sure what this part is about:


The PCIe bridge link power management state is:
0000:00:06.0 Link is in L0

The link power management state of PCIe bridge: 0000:00:06.0 is not expected.
which is expected to be L1.1 or L1.2, or user would run this script again.


The L1SubCap of the failed 0000:00:06.0 is:
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+

The L1SubCtl1 of the failed 0000:00:06.0 is:
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-

I did notice that the way Intel formatted it makes it look like it's one of the SSDs, but checking lspci -vv shows 00:06.0 as:

00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02) (prog-if 00 [Normal decode])
	Subsystem: CLEVO/KAPOK Computer Device 7716
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin D routed to IRQ 122
	IOMMU group: 2
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: [disabled] [16-bit]
	Memory behind bridge: 80400000-804fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: [disabled] [64-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #5, Speed 16GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 75W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCap: CRSVisible-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR+
			10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt- EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
			AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
			AtomicOpsCtl: ReqEn+ EgressBlck+
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
			EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00218  Data: 0000
	Capabilities: [90] Subsystem: CLEVO/KAPOK Computer Device 7716
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn- NFERptEn- FERptEn-
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
	Capabilities: [220 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			 PortCommonModeRestoreTime=110us PortTPowerOnTime=500us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			  T_CommonMode=110us LTR1.2_Threshold=616448ns
		L1SubCtl2: T_PwrOn=500us
	Capabilities: [150 v1] Precision Time Measurement
		PTMCap: Requester:- Responder:+ Root:+
		PTMClockGranularity: 4ns
		PTMControl: Enabled:+ RootSelected:+
		PTMEffectiveGranularity: Unknown
	Capabilities: [a30 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [a90 v1] Data Link Feature <?>
	Capabilities: [a9c v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [edc v1] Lane Margining at the Receiver <?>
	Kernel driver in use: pcieport
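
The L1SubCap and L1SubCtl1 lines in the dump above differ, and the difference can be extracted mechanically. A minimal sketch, assuming `lspci -vv` output for a single device with the L1SubCap line appearing before L1SubCtl1 (as it does here); `l1ss_supported_but_disabled` is a made-up name. A substate showing up in its output only means it is advertised but not enabled; that alone does not show it is the cause of the missing SLP_S0 residency.

```shell
# Print L1 substates that L1SubCap advertises (+) but L1SubCtl1 has off (-).
# Reads `lspci -vv` output for a single device from stdin.
# l1ss_supported_but_disabled is a hypothetical helper name.
l1ss_supported_but_disabled() {
    awk '
        /L1SubCap:/ {
            # Remember every L1.1 / L1.2 flag marked as supported.
            for (i = 1; i <= NF; i++)
                if ($i ~ /L1\.[12][+]$/)
                    cap[substr($i, 1, length($i) - 1)] = 1
        }
        /L1SubCtl1:/ {
            # Report flags that are disabled here but were advertised above.
            for (i = 1; i <= NF; i++)
                if ($i ~ /L1\.[12]-$/) {
                    name = substr($i, 1, length($i) - 1)
                    if (name in cap) print name
                }
        }
    '
}
```

It could be fed with `sudo lspci -vv -s 00:06.0 | l1ss_supported_but_disabled`; for the dump above it would print PCI-PM_L1.1 and ASPM_L1.1.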

@peterpeterp

This comment was marked as off-topic.

@DrymarchonShaun
Author

> I think I have the same issue on my Thinkpad T14 Gen4 with manjaro
> Did you fix it?

I haven't checked recently to see if it's still an issue, but as far as I know it hasn't been fixed.

@danielstuart14

danielstuart14 commented Aug 12, 2024

Exact same issue on a Lenovo V14 (i5 12th Gen, kernel 6.10). To me this screams kernel bug rather than an EC problem.
S0ixSelftestTool also states that the NVMe SSD / controller is the culprit, but in my case it is a Samsung PM9B1.
