Commit d5ebb498 authored by Russell King's avatar Russell King

[ARM PATCH] 1389/1: update iop3xx support for 2.5 (patch 2 of 4)

Patch from Eli Carter

Supercedes patch 1382
This patch copies the relevant documentation for the iop3xx support
from the 2.4.19-rmk4-ds2 kernel.
parent 76efaa97
...@@ -49,61 +49,14 @@ Preparing the Hardware ...@@ -49,61 +49,14 @@ Preparing the Hardware
----------------------------- -----------------------------
This document assumes you're using a Rev D or newer board running This document assumes you're using a Rev D or newer board running
Redboot as the bootloader. Redboot as the bootloader. Note that the version of RedBoot provided
with the boards has a major issue and you need to replace it with the
latest RedBoot. You can grab the source from the ECOS CVS or you can
get a prebuilt image and burn it in using FRU at:
The as-supplied RedBoot image appears to leave the first page of RAM ftp://source.mvista.com/pub/xscale/iq80310/redboot.bin
in a corrupt state such that certain words in that page are unwritable
and contain random data. The value of the data, and the location within
the first page changes with each boot, but is generally in the range
0xa0000150 to 0xa0000fff.
You can grab the source from the ECOS CVS or you can get a prebuilt image Make sure you do an 'fis init' command once you boot with the new
from:
ftp://source.mvista.com/pub/xscale/iop310/IQ80310/redboot.bin
which is:
# strings redboot.bin | grep bootstrap
RedBoot(tm) bootstrap and debug environment, version UNKNOWN - built 14:58:21, Aug 15 2001
md5sum of this version:
bcb96edbc6f8e55b16c165930b6e4439 redboot.bin
You have two options to program it:
1. Using the FRU program (see the instructions in the user manual).
2. Using a Linux host, with MTD support built into the host kernel:
- ensure that the RedBoot image is not locked (issue the following
command under the existing RedBoot image):
RedBoot> fis unlock -f 0 -l 0x40000
- switch S3-1 and S3-2 on.
- reboot the host
- login as root
- identify the 80310 card:
# lspci
...
00:0c.1 Memory controller: Intel Corporation 80310 IOP [IO Processor] (rev 01)
- in this example, bus 0, slot 0c, function 1.
- insert the MTD modules, and the PCI map module:
# insmod drivers/mtd/maps/pci.o
- locate the MTD device (using the bus, slot, function)
# cat /proc/mtd
dev: size erasesize name
mtd0: 00800000 00020000 "00:0c.1"
- in this example, it is mtd device 0. Yours will be different.
Check carefully.
- program the flash
# cat redboot.bin > /dev/mtdblock0
- check the kernel message log for errors (some cat commands don't
error on failure)
# dmesg
- switch S3-1 and S3-2 off
- reboot host
In any case, make sure you do an 'fis init' command once you boot with the new
RedBoot image. RedBoot image.
...@@ -235,7 +188,7 @@ JFFS RedBoot partition mapped into the MTD partition scheme. ...@@ -235,7 +188,7 @@ JFFS RedBoot partition mapped into the MTD partition scheme.
You can grab a pre-built JFFS image to use as a root file system at: You can grab a pre-built JFFS image to use as a root file system at:
ftp://source.mvista.com/pub/xscale/iop310/IQ80310/jffs.img ftp://source.mvista.com/pub/xscale/iq80310/jffs.img
For detailed info on using MTD and creating a JFFS image go to: For detailed info on using MTD and creating a JFFS image go to:
......
Board Overview
-----------------------------
The Worcester IQ80321 board is an evaluation platform for Intel's 80321 Xscale
CPU (sometimes called IOP321 chipset).
The 80321 contains a single PCI hose (called the ATUs), a PCI-to-PCI bridge,
two DMA channels, I2C, I2O messaging unit, XOR unit for RAID operations,
a bus performance monitoring unit, and a memory controller with ECC features.
For more information on the board, see http://developer.intel.com/iio
Port Status
-----------------------------
Supported:
- MTD/JFFS/JFFS2 root
- NFS root
- RAMDISK root
- Serial port (ttyS0)
- Cache/TLB locking on 80321 CPU
- Performance monitoring unit on 80321 CPU
TODO:
- DMA engines
- I2C
- 80321 Bus Performance Monitor
- Application Accelerator Unit (XOR engine for RAID)
- I2O Messaging Unit
- I2C unit
- SSP
Building the Kernel
-----------------------------
make iq80321_config
make oldconfig
make dep
make zImage
This will build an image setup for BOOTP/NFS root support. To change this,
just run make menuconfig and disable nfs root or add a "root=" option.
Preparing the Hardware
-----------------------------
Make sure you do an 'fis init' command once you boot with the new
RedBoot image.
Downloading Linux
-----------------------------
Assuming you have your development system setup to act as a bootp/dhcp
server and running tftp:
NOTE: The 80321 board uses a different default memory map than the 80310.
RedBoot> load -r -b 0x01008000 -m y
Once the download is completed:
RedBoot> go 0x01008000
There is a version of RedBoot floating around that has DHCP support, but
I've never been able to cleanly transfer a kernel image and have it run.
Root Devices
-----------------------------
A kernel is not useful without a root filesystem, and you have several
choices with this board: NFS root, RAMDISK, or JFFS/JFFS2. For development
purposes, it is suggested that you use NFS root for easy access to various
tools. Once you're ready to deploy, probably want to utilize JFFS/JFFS2 on
the flash device.
MTD on the IQ80321
-----------------------------
Linux on the IQ80321 supports RedBoot FIS paritioning if it is enabled.
Out of the box, once you've done 'fis init' on RedBoot, you will get
the following partitioning scheme:
root@192.168.0.14:~# cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00020000 "RedBoot"
mtd1: 00040000 00020000 "RedBoot[backup]"
mtd2: 0075f000 00020000 "unallocated space"
mtd3: 00001000 00020000 "RedBoot config"
mtd4: 00020000 00020000 "FIS directory"
To create an FIS directory, you need to use the fis command in RedBoot.
As an example, you can burn the kernel into the flash once it's downloaded:
RedBoot> fis create -b 0x01008000 -l 0x8CBAC -r 0x01008000 -f 0x80000 kernel
... Erase from 0x00080000-0x00120000: .....
... Program from 0x01008000-0x01094bac at 0x00080000: .....
... Unlock from 0x007e0000-0x00800000: .
... Erase from 0x007e0000-0x00800000: .
... Program from 0x01fdf000-0x01fff000 at 0x007e0000: .
... Lock from 0x007e0000-0x00800000: .
RedBoot> fis list
Name FLASH addr Mem addr Length Entry point
RedBoot 0x00000000 0x00000000 0x00040000 0x00000000
RedBoot[backup] 0x00040000 0x00040000 0x00040000 0x00000000
RedBoot config 0x007DF000 0x007DF000 0x00001000 0x00000000
FIS directory 0x007E0000 0x007E0000 0x00020000 0x00000000
kernel 0x00080000 0x01008000 0x000A0000 0x00000000
This leads to the following Linux MTD setup:
mtroot@192.168.0.14:~# cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00020000 "RedBoot"
mtd1: 00040000 00020000 "RedBoot[backup]"
mtd2: 000a0000 00020000 "kernel"
mtd3: 006bf000 00020000 "unallocated space"
mtd4: 00001000 00020000 "RedBoot config"
mtd5: 00020000 00020000 "FIS directory"
Note that there is not a 1:1 mapping to the number of RedBoot paritions to
MTD partitions as unused space also gets allocated into MTD partitions.
As an aside, the -r option when creating the Kernel entry allows you to
simply do an 'fis load kernel' to copy the image from flash into memory.
You can then do an 'fis go 0x01008000' to start Linux.
If you choose to use static partitioning instead of the RedBoot partioning:
/dev/mtd0 0x00000000 - 0x0007ffff: Boot Monitor (512k)
/dev/mtd1 0x00080000 - 0x0011ffff: Kernel Image (640K)
/dev/mtd2 0x00120000 - 0x0071ffff: File System (6M)
/dev/mtd3 0x00720000 - 0x00800000: RedBoot Reserved (896K)
To use a JFFS1/2 root FS, you need to donwload the JFFS image using either
tftp or ymodem, and then copy it to flash:
RedBoot> load -r -b 0x01000000 /tftpboot/jffs.img
Raw file loaded 0x01000000-0x01600000
RedBoot> fis create -b 0x01000000 -l 0x600000 -f 0x120000 jffs
... Erase from 0x00120000-0x00720000: ..................................
... Program from 0x01000000-0x01600000 at 0x00120000: ..................
......................
... Unlock from 0x007e0000-0x00800000: .
... Erase from 0x007e0000-0x00800000: .
... Program from 0x01fdf000-0x01fff000 at 0x007e0000: .
... Lock from 0x007e0000-0x00800000: .
RedBoot> fis list
Name FLASH addr Mem addr Length Entry point
RedBoot 0x00000000 0x00000000 0x00040000 0x00000000
RedBoot[backup] 0x00040000 0x00040000 0x00040000 0x00000000
RedBoot config 0x007DF000 0x007DF000 0x00001000 0x00000000
FIS directory 0x007E0000 0x007E0000 0x00020000 0x00000000
kernel 0x00080000 0x01008000 0x000A0000 0x01008000
jffs 0x00120000 0x00120000 0x00600000 0x00000000
This looks like this in Linux:
root@192.168.0.14:~# cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00020000 "RedBoot"
mtd1: 00040000 00020000 "RedBoot[backup]"
mtd2: 000a0000 00020000 "kernel"
mtd3: 00600000 00020000 "jffs"
mtd4: 000bf000 00020000 "unallocated space"
mtd5: 00001000 00020000 "RedBoot config"
mtd6: 00020000 00020000 "FIS directory"
You need to boot the kernel once and watch the boot messages to see how the
JFFS RedBoot partition mapped into the MTD partition scheme.
You can grab a pre-built JFFS image to use as a root file system at:
ftp://source.mvista.com/pub/xscale/iq80310/jffs.img
For detailed info on using MTD and creating a JFFS image go to:
http://www.linux-mtd.infradead.org.
For details on using RedBoot's FIS commands, type 'fis help' or consult
your RedBoot manual.
BUGS and ISSUES
-----------------------------
* As shipped from Intel, pre-production boards have two issues:
- The on board ethernet is disabled S8E1-2 is off. You will need to turn it on.
- The PCIXCAPs are configured for a 100Mhz clock, but the clock selected is
actually only 66Mhz. This causes the wrong PPL multiplier to be used and the
board only runs at 400Mhz instead of 600Mhz. The way to observe this is to
use a independent clock to time a "sleep 10" command from the prompt. If it
takes 15 seconds instead of 10, you are running at 400Mhz.
- The experimental IOP310 drivers for the AAU, DMA, etc. are not supported yet.
Contributors
-----------------------------
The port to the IQ80321 was performed by:
Rory Bolt <rorybolt@pacbell.net> - Initial port, debugging.
This port was based on the IQ80310 port with the following contributors:
Nicolas Pitre <nico@cam.org> - Initial port, cleanup, debugging
Matt Porter <mporter@mvista.com> - PCI subsystem development, debugging
Tim Sanders <tsanders@sanders.org> - Initial PCI code
Deepak Saxena <dsaxena@mvista.com> - Cleanup, debug, cache lock, PMU
The port is currently maintained by Deepak Saxena <dsaxena@mvista.com>
-----------------------------
Enjoy.
Support functions for the Intel 80310 AAU
===========================================
Dave Jiang <dave.jiang@intel.com>
Last updated: 09/18/2001
The Intel 80312 companion chip in the 80310 chipset contains an AAU. The
AAU is capable of processing up to 8 data block sources and perform XOR
operations on them. This unit is typically used to accelerated XOR
operations utilized by RAID storage device drivers such as RAID 5. This
API is designed to provide a set of functions to take adventage of the
AAU. The AAU can also be used to transfer data blocks and used as a memory
copier. The AAU transfer the memory faster than the operation performed by
using CPU copy therefore it is recommended to use the AAU for memory copy.
------------------
int aau_request(u32 *aau_context, const char *device_id);
This function allows the user the acquire the control of the the AAU. The
function will return a context of AAU to the user and allocate
an interrupt for the AAU. The user must pass the context as a parameter to
various AAU API calls.
int aau_queue_buffer(u32 aau_context, aau_head_t *listhead);
This function starts the AAU operation. The user must create a SGL
header with a SGL attached. The format is presented below. The SGL is
built from kernel memory.
/* hardware descriptor */
typedef struct _aau_desc
{
u32 NDA; /* next descriptor address [READONLY] */
u32 SAR[AAU_SAR_GROUP]; /* src addrs */
u32 DAR; /* destination addr */
u32 BC; /* byte count */
u32 DC; /* descriptor control */
u32 SARE[AAU_SAR_GROUP]; /* extended src addrs */
} aau_desc_t;
/* user SGL format */
typedef struct _aau_sgl
{
aau_desc_t aau_desc; /* AAU HW Desc */
u32 status; /* status of SGL [READONLY] */
struct _aau_sgl *next; /* pointer to next SG [READONLY] */
void *dest; /* destination addr */
void *src[AAU_SAR_GROUP]; /* source addr[4] */
void *ext_src[AAU_SAR_GROUP]; /* ext src addr[4] */
u32 total_src; /* total number of source */
} aau_sgl_t;
/* header for user SGL */
typedef struct _aau_head
{
u32 total; /* total descriptors allocated */
u32 status; /* SGL status */
aau_sgl_t *list; /* ptr to head of list */
aau_callback_t callback; /* callback func ptr */
} aau_head_t;
The function will call aau_start() and start the AAU after it queues
the SGL to the processing queue. When the function will either
a. Sleep on the wait queue aau->wait_q if no callback has been provided, or
b. Continue and then call the provided callback function when DMA interrupt
has been triggered.
int aau_suspend(u32 aau_context);
Stops/Suspends the AAU operation
int aau_free(u32 aau_context);
Frees the ownership of AAU. Called when no longer need AAU service.
aau_sgl_t * aau_get_buffer(u32 aau_context, int num_buf);
This function obtains an AAU SGL for the user. User must specify the number
of descriptors to be allocated in the chain that is returned.
void aau_return_buffer(u32 aau_context, aau_sgl_t *list);
This function returns all SGL back to the API after user is done.
int aau_memcpy(void *dest, void *src, u32 size);
This function is a short cut for user to do memory copy utilizing the AAU for
better large block memory copy vs using the CPU. This is similar to using
typical memcpy() call.
* User is responsible for the source address(es) and the destination address.
The source and destination should all be cached memory.
void aau_test()
{
u32 aau;
char dev_id[] = "AAU";
int size = 2;
int err = 0;
aau_head_t *head;
aau_sgl_t *list;
u32 i;
u32 result = 0;
void *src, *dest;
printk("Starting AAU test\n");
if((err = aau_request(&aau, dev_id))<0)
{
printk("test - AAU request failed: %d\n", err);
return;
}
else
{
printk("test - AAU request successful\n");
}
head = kmalloc(sizeof(aau_head_t), GFP_KERNEL);
head->total = size;
head->status = 0;
head->callback = NULL;
list = aau_get_buffer(aau, size);
if(!list)
{
printk("Can't get buffers\n");
return;
}
head->list = list;
src = kmalloc(1024, GFP_KERNEL);
dest = kmalloc(1024, GFP_KERNEL);
while(list)
{
list->status = 0;
list->aau_desc->SAR[0] = (u32)src;
list->aau_desc->DAR = (u32)dest;
list->aau_desc->BC = 1024;
/* see iop310-aau.h for more DCR commands */
list->aau_desc->DC = AAU_DCR_WRITE | AAU_DCR_BLKCTRL_1_DF;
if(!list->next)
{
list->aau_desc->DC = AAU_DCR_IE;
break;
}
list = list->next;
}
printk("test- Queueing buffer for AAU operation\n");
err = aau_queue_buffer(aau, head);
if(err >= 0)
{
printk("AAU Queue Buffer is done...\n");
}
else
{
printk("AAU Queue Buffer failed...: %d\n", err);
}
#if 1
printk("freeing the AAU\n");
aau_return_buffer(aau, head->list);
aau_free(aau);
kfree(src);
kfree(dest);
kfree((void *)head);
#endif
}
All Disclaimers apply. Use this at your own discretion. Neither Intel nor I
will be responsible if anything goes wrong. =)
TODO
____
* Testing
* Do zero-size AAU transfer/channel at init
so all we have to do is chainining
Support functions forthe Intel 80310 DMA channels
==================================================
Dave Jiang <dave.jiang@intel.com>
Last updated: 09/18/2001
The Intel 80310 XScale chipset provides 3 DMA channels via the 80312 I/O
companion chip. Two of them resides on the primary PCI bus and one on the
secondary PCI bus.
The DMA API provided is not compatible with the generic interface in the
ARM tree unfortunately due to how the 80312 DMACs work. Hopefully some time
in the near future a software interface can be done to bridge the differences.
The DMA API has been modeled after Nicholas Pitre's SA11x0 DMA API therefore
they will look somewhat similar.
80310 DMA API
-------------
int dma_request(dmach_t channel, const char *device_id);
This function will attempt to allocate the channel depending on what the
user requests:
IOP310_DMA_P0: PCI Primary 1
IOP310_DMA_P1: PCI Primary 2
IOP310_DMA_S0: PCI Secondary 1
/*EOF*/
Once the user allocates the DMA channel it is owned until released. Although
other users can also use the same DMA channel, but no new resources will be
allocated. The function will return the allocated channel number if successful.
int dma_queue_buffer(dmach_t channel, dma_sghead_t *listhead);
The user will construct a SGL in the form of below:
/*
* Scattered Gather DMA List for user
*/
typedef struct _dma_desc
{
u32 NDAR; /* next descriptor adress [READONLY] */
u32 PDAR; /* PCI address */
u32 PUADR; /* upper PCI address */
u32 LADR; /* local address */
u32 BC; /* byte count */
u32 DC; /* descriptor control */
} dma_desc_t;
typedef struct _dma_sgl
{
dma_desc_t dma_desc; /* DMA descriptor */
u32 status; /* descriptor status [READONLY] */
u32 data; /* user defined data */
struct _dma_sgl *next; /* next descriptor [READONLY] */
} dma_sgl_t;
/* dma sgl head */
typedef struct _dma_head
{
u32 total; /* total elements in SGL */
u32 status; /* status of sgl */
u32 mode; /* read or write mode */
dma_sgl_t *list; /* pointer to list */
dma_callback_t callback; /* callback function */
} dma_head_t;
The user shall allocate user SGL elements by calling the function:
dma_get_buffer(). This function will give the user an SGL element. The user
is responsible for creating the SGL head however. The user is also
responsible for allocating the memory for DMA data. The following code segment
shows how a DMA operation can be performed:
#include <asm/arch/iop310-dma.h>
void dma_test(void)
{
char dev_id[] = "Primary 0";
dma_head_t *sgl_head = NULL;
dma_sgl_t *sgl = NULL;
int err = 0;
int channel = -1;
u32 *test_ptr = 0;
DECLARE_WAIT_QUEUE_HEAD(wait_q);
*(IOP310_ATUCR) = (IOP310_ATUCR_PRIM_OUT_ENAB |
IOP310_ATUCR_DIR_ADDR_ENAB);
channel = dma_request(IOP310_DMA_P0, dev_id);
sgl_head = (dma_head_t *)kmalloc(sizeof(dma_head_t), GFP_KERNEL);
sgl_head->callback = NULL; /* no callback created */
sgl_head->total = 2; /* allocating 2 DMA descriptors */
sgl_head->mode = (DMA_MOD_WRITE);
sgl_head->status = 0;
/* now we get the two descriptors */
sgl = dma_get_buffer(channel, 2);
/* we set the header to point to the list we allocated */
sgl_head->list = sgl;
/* allocate 1k of DMA data */
sgl->data = (u32)kmalloc(1024, GFP_KERNEL);
/* Local address is physical */
sgl->dma_desc.LADR = (u32)virt_to_phys(sgl->data);
/* write to arbitrary location over the PCI bus */
sgl->dma_desc.PDAR = 0x00600000;
sgl->dma_desc.PUADR = 0;
sgl->dma_desc.BC = 1024;
/* set write & invalidate PCI command */
sgl->dma_desc.DC = DMA_DCR_PCI_MWI;
sgl->status = 0;
/* set a pattern */
memset(sgl->data, 0xFF, 1024);
/* User's responsibility to keep buffers cached coherent */
cpu_dcache_clean(sgl->data, sgl->data + 1024);
sgl = sgl->next;
sgl->data = (u32)kmalloc(1024, GFP_KERNEL);
sgl->dma_desc.LADR = (u32)virt_to_phys(sgl->data);
sgl->dma_desc.PDAR = 0x00610000;
sgl->dma_desc.PUADR = 0;
sgl->dma_desc.BC = 1024;
/* second descriptor has interrupt flag enabled */
sgl->dma_desc.DC = (DMA_DCR_PCI_MWI | DMA_DCR_IE);
/* must set end of chain flag */
sgl->status = DMA_END_CHAIN; /* DO NOT FORGET THIS!!!! */
memset(sgl->data, 0x0f, 1024);
/* User's responsibility to keep buffers cached coherent */
cpu_dcache_clean(sgl->data, sgl->data + 1024);
/* queing the buffer, this function will sleep since no callback */
err = dma_queue_buffer(channel, sgl_head);
/* now we are woken from DMA complete */
/* do data operations here */
/* free DMA data if necessary */
/* return the descriptors */
dma_return_buffer(channel, sgl_head->list);
/* free the DMA */
dma_free(channel);
kfree((void *)sgl_head);
}
dma_sgl_t * dma_get_buffer(dmach_t channel, int buf_num);
This call allocates DMA descriptors for the user.
void dma_return_buffer(dmach_t channel, dma_sgl_t *list);
This call returns the allocated descriptors back to the API.
int dma_suspend(dmach_t channel);
This call suspends any DMA transfer on the given channel.
int dma_resume(dmach_t channel);
This call resumes a DMA transfer which would have been stopped through
dma_suspend().
int dma_flush_all(dmach_t channel);
This completely flushes all queued buffers and on-going DMA transfers on a
given channel. This is called when DMA channel errors have occured.
void dma_free(dmach_t channel);
This clears all activities on a given DMA channel and releases it for future
requests.
Buffer Allocation
-----------------
It is the user's responsibility to allocate, free, and keep track of the
allocated DMA data memory. Upon calling dma_queue_buffer() the user must
relinquish the control of the buffers to the kernel and not change the
state of the buffers that it has passed to the kernel. The user will regain
the control of the buffers when it has been woken up by the bottom half of
the DMA interrupt handler. The user can allocate cached buffers or non-cached
via pci_alloc_consistent(). It is the user's responsibility to ensure that
the data is cache coherent.
*Reminder*
The user is responsble to ensure the ATU is setup properly for DMA transfers.
All Disclaimers apply. Use this at your own discretion. Neither Intel nor I
will be responsible ifanything goes wrong.
Support functions for the Intel 80310 MU
===========================================
Dave Jiang <dave.jiang@intel.com>
Last updated: 10/11/2001
The messaging unit of the IOP310 contains 4 components and is utilized for
passing messages between the PCI agents on the primary bus and the Intel(R)
80200 CPU. The four components are:
Messaging Component
Doorbell Component
Circular Queues Component
Index Registers Component
Messaging Component:
Contains 4 32bit registers, 2 in and 2 out. Writing to the registers assert
interrupt on the PCI bus or to the 80200 depend on incoming or outgoing.
int mu_msg_request(u32 *mu_context);
Request the usage of Messaging Component. mu_context is written back by the
API. The MU context is passed to other Messaging calls as a parameter.
int mu_msg_set_callback(u32 mu_context, u8 reg, mu_msg_cb_t func);
Setup the callback function for incoming messages. Callback can be setup for
outbound 0, 1, or both outbound registers.
int mu_msg_post(u32 mu_context, u32 val, u8 reg);
Posting a message in the val parameter. The reg parameter denotes whether
to use register 0, 1.
int mu_msg_free(u32 mu_context, u8 mode);
Free the usage of messaging component. mode can be specified soft or hard. In
hardmode all resources are unallocated.
Doorbell Component:
The doorbell registers contains 1 inbound and 1 outbound. Depending on the bits
being set different interrupts are asserted.
int mu_db_request(u32 *mu_context);
Request the usage of the doorbell register.
int mu_db_set_callback(u32 mu_context, mu_db_cb_t func);
Setting up the inbound callback.
void mu_db_ring(u32 mu_context, u32 mask);
Write to the outbound db register with mask.
int mu_db_free(u32 mu_context);
Free the usage of doorbell component.
Circular Queues Component:
The circular queue component has 4 circular queues. Inbound post, inbound free,
outbound post, outbound free. These queues are used to pass messages.
int mu_cq_request(u32 *mu_context, u32 q_size);
Request the usage of the queue. See code comment header for q_size. It tells
the API how big of queues to setup.
int mu_cq_inbound_init(u32 mu_context, mfa_list_t *list, u32 size,
mu_cq_cb_t func);
Init inbound queues. The user must provide a list of free message frames to
be put in inbound free queue and the callback function to handle the inbound
messages.
int mu_cq_enable(u32 mu_context);
Enables the circular queues mechanism. Called once all the setup functions
are called.
u32 mu_cq_get_frame(u32 mu_context);
Obtain the address of an outbound free frame for the user.
int mu_cq_post_frame(u32 mu_context, u32 mfa);
The user can post the frame once getting the frame and put information in the
frame.
int mu_cq_free(u32 mu_context);
Free the usage of circular queues mechanism.
Index Registers Component:
The index register provides the mechanism to receive inbound messages.
int mu_ir_request(u32 *mu_context);
Request of Index Register component usage.
int mu_ir_set_callback(u32 mu_context, mu_ir_cb_t callback);
Setting up callback for inbound messages. The callback will receive the
value of the register that IAR offsets to.
int mu_ir_free(u32 mu_context);
Free the usage of Index Registers component.
void mu_set_irq_threshold(u32 mu_context, int thresh);
Setup the IRQ threshold before relinquish processing in IRQ space. Default
is set at 10 loops.
*NOTE: Example of host driver that utilize the MU can be found in the Linux I2O
driver. Specifically i2o_pci and some functions of i2o_core. The I2O driver
only utilize the circular queues mechanism. The other 3 components are simple
enough that they can be easily setup. The MU API provides no flow control for
the messaging mechanism. Flow control of the messaging needs to be established
by a higher layer of software on the IOP or the host driver.
All Disclaimers apply. Use this at your own discretion. Neither Intel nor I
will be responsible if anything goes wrong. =)
TODO
____
Intel's XScale Microarchitecture 80312 companion processor provides a
Performance Monitoring Unit (PMON) that can be utilized to provide
information that can be useful for fine tuning of code. This text
file describes the API that's been developed for use by Linux kernel
programmers. Note that to get the most usage out of the PMON,
I highly reccomend getting the XScale reference manual from Intel[1]
and looking at chapter 12.
To use the PMON, you must #include <asm-arm/arch-iop310/pmon.h> in your
source file.
Since there's only one PMON, only one user can currently use the PMON
at a given time. To claim the PMON for usage, call iop310_pmon_claim() which
returns an identifier. When you are done using the PMON, call
iop310_pmon_release() with the id you were given earlier.
The PMON consists of 14 registers that can be used for performance measurements.
By combining different statistics, you can derive complex performance metrics.
To start the PMON, just call iop310_pmon_start(mode). Mode tells the PMON what
statistics to capture and can each be one of:
IOP310_PMU_MODE0
Performance Monitoring Disabled
IOP310_PMU_MODE1
Primary PCI bus and internal agents (bridge, dma Ch0, dam Ch1, patu)
IOP310_PMU_MODE2
Secondary PCI bus and internal agents (bridge, dma Ch0, dam Ch1, patu)
IOP310_PMU_MODE3
Secondary PCI bus and internal agents (external masters 0..2 and Intel
80312 I/O companion chip)
IOP310_PMU_MODE4
Secondary PCI bus and internal agents (external masters 3..5 and Intel
80312 I/O companion chip)
IOP310_PMU_MODE5
Intel 80312 I/O companion chip internal bus, DMA Channels and Application
Accelerator
IOP310_PMU_MODE6
Intel 80312 I/O companion chip internal bus, PATU, SATU and Intel 80200
processor
IOP310_PMU_MODE7
Intel 80312 I/O companion chip internal bus, Primary PCI bus, Secondary
PCI bus and Secondary PCI agents (external masters 0..5 & Intel 80312 I/O
companion chip)
To get the results back, call iop310_pmon_stop(&results) where results is
defined as follows:
typedef struct _iop310_pmon_result
{
u32 timestamp; /* Global Time Stamp Register */
u32 timestamp_overflow; /* Time Stamp overflow count */
u32 event_count[14]; /* Programmable Event Counter
Registers 1-14 */
u32 event_overflow[14]; /* Overflow counter for PECR1-14 */
} iop310_pmon_res_t;
--
This code is still under development, so please feel free to send patches,
questions, comments, etc to me.
Deepak Saxena <dsaxena@mvista.com>
Intel's XScale Microarchitecture provides support for locking of data
and instructions into the appropriate caches. This file provides
an overview of the API that has been developed to take advantage of this
feature from kernel space. Note that there is NO support for user space
cache locking.
For example usage of this code, grab:
ftp://source.mvista.com/pub/xscale/cache-test.c
If you have any questions, comments, patches, etc, please contact me.
Deepak Saxena <dsaxena@mvista.com>
API DESCRIPTION
I. Header File
#include <asm/xscale-lock.h>
II. Cache Capability Discovery
SYNOPSIS
int cache_query(u8 cache_type,
struct cache_capabilities *pcache);
struct cache_capabilities
{
u32 flags; /* Flags defining capabilities */
u32 cache_size; /* Cache size in K (1024 bytes) */
u32 max_lock; /* Maximum lockable region in K */
}
/*
* Flags
*/
/*
* Bit 0: Cache lockability
* Bits 1-31: Reserved for future use
*/
#define CACHE_LOCKABLE 0x00000001 /* Cache can be locked */
/*
* Cache Types
*/
#define ICACHE 0x00
#define DCACHE 0x01
DESCRIPTION
This function fills out the pcache capability identifier for the
requested cache. cache_type is either DCACHE or ICACHE. This
function is not very useful at the moment as all XScale CPU's
have the same size Cache, but is is provided for future XScale
based processors that may have larger cache sizes.
RETURN VALUE
This function returns 0 if no error occurs, otherwise it returns
a negative, errno compatible value.
-EIO Unknown hardware error
III. Cache Locking
SYNOPSIS
int cache_lock(void *addr, u32 len, u8 cache_type, const char *desc);
DESCRIPTION
This function locks a physically contigous portion of memory starting
at the virtual address pointed to by addr into the cache referenced
by cache_type.
The address of the data/instruction that is to be locked must be
aligned on a cache line boundary (L1_CACHE_ALIGNEMENT).
The desc parameter is an optional (pass NULL if not used) human readable
descriptor of the locked memory region that is used by the cache
management code to build the /proc/cache_locks table.
Note that this function does not check whether the address is valid
or not before locking it into the cache. That duty is up to the
caller. Also, it does not check for duplicate or overlaping
entries.
RETURN VALUE
If the function is successful in locking the entry into cache, a
zero is returned.
If an error occurs, an appropriate error value is returned.
-EINVAL The memory address provided was not cache line aligned
-ENOMEM Could not allocate memory to complete operation
-ENOSPC Not enough space left on cache to lock in requested region
-EIO Unknown error
III. Cache Unlocking
SYNOPSIS
int cache_unlock(void *addr)
DESCRIPTION
This function unlocks a portion of memory that was previously locked
into either the I or D cache.
RETURN VALUE
If the entry is cleanly unlocked from the cache, a 0 is returned.
In the case of an error, an appropriate error is returned.
-ENOENT No entry with given address associated with this cache
-EIO Unknown error
Intel's XScale Microarchitecture processors provide a Performance
Monitoring Unit (PMU) that can be utilized to provide information
that can be useful for fine tuning of code. This text file describes
the API that's been developed for use by Linux kernel programmers.
When I have some extra time on my hand, I will extend the code to
provide support for user mode performance monitoring (which is
probably much more useful). Note that to get the most usage out
of the PMU, I highly reccomend getting the XScale reference manual
from Intel and looking at chapter 12.
To use the PMU, you must #include <asm/xscale-pmu.h> in your source file.
Since there's only one PMU, only one user can currently use the PMU
at a given time. To claim the PMU for usage, call pmu_claim() which
returns an identifier. When you are done using the PMU, call
pmu_release() with the identifier that you were given by pmu_claim.
In addition, the PMU can only be used on XScale based systems that
provide an external timer. Systems that the PMU is currently supported
on are:
- Cyclone IQ80310
Before delving into how to use the PMU code, let's do a quick overview
of the PMU itself. The PMU consists of three registers that can be
used for performance measurements. The first is the CCNT register with
provides the number of clock cycles elapsed since the PMU was started.
The next two register, PMN0 and PMN1, are eace user programmable to
provide 1 of 20 different performance statistics. By combining different
statistics, you can derive complex performance metrics.
To start the PMU, just call pmu_start(pm0, pmn1). pmn0 and pmn1 tell
the PMU what statistics to capture and can each be one of:
EVT_ICACHE_MISS
Instruction fetches requiring access to external memory
EVT_ICACHE_NO_DELIVER
Instruction cache could not deliver an instruction. Either an
ICACHE miss or an instruction TLB miss.
EVT_ICACHE_DATA_STALL
Stall in execution due to a data dependency. This counter is
incremented each cycle in which the condition is present.
EVT_ITLB_MISS
Instruction TLB miss
EVT_DTLB_MISS
Data TLB miss
EVT_BRANCH
A branch instruction was executed and it may or may not have
changed program flow
EVT_BRANCH_MISS
A branch (B or BL instructions only) was mispredicted
EVT_INSTRUCTION
An instruction was executed
EVT_DCACHE_FULL_STALL
Stall because data cache buffers are full. Incremented on every
cycle in which condition is present.
EVT_DCACHE_FULL_STALL_CONTIG
Stall because data cache buffers are full. Incremented on every
cycle in which condition is contigous.
EVT_DCACHE_ACCESS
Data cache access (data fetch)
EVT_DCACHE_MISS
Data cache miss
EVT_DCACHE_WRITE_BACK
Data cache write back. This counter is incremented for every
1/2 line (four words) that are written back.
EVT_PC_CHANGED
Software changed the PC. This is incremented only when the
software changes the PC and there is no mode change. For example,
a MOV instruction that targets the PC would increment the counter.
An SWI would not as it triggers a mode change.
EVT_BCU_REQUEST
The Bus Control Unit(BCU) received a request from the core
EVT_BCU_FULL
The BCU request queue if full. A high value for this event means
that the BCU is often waiting for to complete on the external bus.
EVT_BCU_DRAIN
The BCU queues were drained due to either a Drain Write Buffer
command or an I/O transaction for a page that was marked as
uncacheable and unbufferable.
EVT_BCU_ECC_NO_ELOG
The BCU detected an ECC error on the memory bus but noe ELOG
register was available to to log the errors.
EVT_BCU_1_BIT_ERR
The BCU detected a 1-bit error while reading from the bus.
EVT_RMW
An RMW cycle occurred due to narrow write on ECC protected memory.
To get the results back, call pmu_stop(&results) where results is defined
as a struct pmu_results:
struct pmu_results
{
u32 ccnt; /* Clock Counter Register */
u32 ccnt_of; /
u32 pmn0; /* Performance Counter Register 0 */
u32 pmn0_of;
u32 pmn1; /* Performance Counter Register 1 */
u32 pmn1_of;
};
Pretty simple huh? Following are some examples of how to get some commonly
wanted numbers out of the PMU data. Note that since you will be dividing
things, this isn't super useful from the kernel and you need to printk the
data out to syslog. See [1] for more examples.
Instruction Cache Efficiency
pmu_start(EVT_INSTRUCTION, EVT_ICACHE_MISS);
...
pmu_stop(&results);
icache_miss_rage = results.pmn1 / results.pmn0;
cycles_per_instruction = results.ccnt / results.pmn0;
Data Cache Efficiency
pmu_start(EVT_DCACHE_ACCESS, EVT_DCACHE_MISS);
...
pmu_stop(&results);
dcache_miss_rage = results.pmn1 / results.pmn0;
Instruction Fetch Latency
pmu_start(EVT_ICACHE_NO_DELIVER, EVT_ICACHE_MISS);
...
pmu_stop(&results);
average_stall_waiting_for_instruction_fetch =
results.pmn0 / results.pmn1;
percent_stall_cycles_due_to_instruction_fetch =
results.pmn0 / results.ccnt;
ToDo:
- Add support for usermode PMU usage. This might require hooking into
the scheduler so that we pause the PMU when the task that requested
statistics is scheduled out.
--
This code is still under development, so please feel free to send patches,
questions, comments, etc to me.
Deepak Saxena <dsaxena@mvista.com>
Intel's XScale Microarchitecture provides support for locking of TLB
entries in both the instruction and data TLBs. This file provides
an overview of the API that has been developed to take advantage of this
feature from kernel space. Note that there is NO support for user space.
In general, this feature should be used in conjunction with locking
data or instructions into the appropriate caches. See the file
cache-lock.txt in this directory.
If you have any questions, comments, patches, etc, please contact me.
Deepak Saxena <dsaxena@mvista.com>
API DESCRIPTION
I. Header file
#include <asm/xscale-lock.h>
II. Locking an entry into the TLB
SYNOPSIS
xscale_tlb_lock(u8 tlb_type, u32 addr);
/*
* TLB types
*/
#define ITLB 0x0
#define DTLB 0x1
DESCRIPTION
This function locks the virtual to physical mapping for virtual
address addr into the requested TLB.
RETURN VALUE
If the entry is properly locked into the TLB, a 0 is returned.
In case of an error, an appropriate error is returned.
-ENOSPC No more entries left in the TLB
-EIO Unknown error
III. Unlocking an entry from a TLB
SYNOPSIS
xscale_tlb_unlock(u8 tlb_type, u32 addr);
DESCRIPTION
This function unlocks the entry for virtual address addr from the
specified cache.
RETURN VALUE
If the TLB entry is properly unlocked, a 0 is returned.
In case of an error, an appropriate error is returned.
-ENOENT No entry for given address in specified TLB
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment