This section describes how to build and use SPPARKS via a Python interface.
The SPPARKS distribution includes the file python/spparks.py, which wraps the library interface to SPPARKS. This file makes it possible to run SPPARKS, invoke SPPARKS commands or give it an input script, extract SPPARKS results, and modify internal SPPARKS variables, either from a Python script or interactively from a Python prompt. Running from a Python script can be done in serial or parallel. Running Python interactively in parallel does not generally work, unless you have a package installed that extends your Python to enable multiple instances of Python to read what you type.
Python is a powerful scripting and programming language which can be used to wrap software like SPPARKS and other packages. It can be used to glue multiple pieces of software together, e.g. to run a coupled or multiscale model. See other sections of the manual for more ideas about coupling SPPARKS to other codes. See Section_start 4 for how to build SPPARKS as a library, and Section_howto 3 for a description of the library interface provided in src/library.cpp and src/library.h and how to extend it for your needs. As described below, that interface is what is exposed to Python. It is designed to be easy to add functions to, which extends the Python interface as well; see details below.
By using the Python interface, SPPARKS can also be coupled with a GUI or other visualization tools that display graphs or animations in real time as SPPARKS runs. Examples of such scripts may eventually be included in the python directory.
Two advantages of using Python are that the language is concise and that it can be run interactively, enabling rapid development and debugging of programs. If you use it mostly to invoke costly operations within SPPARKS, such as running a simulation for a reasonable number of timesteps, then the overhead cost of invoking SPPARKS thru Python will be negligible.
Before using SPPARKS from a Python script, you need to do two things. You need to build SPPARKS as a dynamic shared library, so it can be loaded by Python. And you need to tell Python how to find the library and the Python wrapper file python/spparks.py. Both these steps are discussed below. If you wish to run SPPARKS in parallel from Python, you also need to extend your Python with MPI. This is also discussed below.
The Python wrapper for SPPARKS uses the amazing and magical (to me) "ctypes" package in Python, which auto-generates the interface code needed between Python and a set of C interface routines for a library. Ctypes is part of standard Python for versions 2.5 and later. You can check which version of Python you have installed by simply typing "python" at a shell prompt.
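You can also verify directly that ctypes is available; if this import succeeds without error at the interactive prompt, your Python is recent enough:

>>> import ctypes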
Instructions on how to build SPPARKS as a shared library are given in Section_start 5. A shared library is one that is dynamically loadable, which is what Python requires. On Linux this is a library file that ends in ".so", not ".a".
From the src directory, type
make makeshlib
make -f Makefile.shlib foo
where foo is the machine target name, such as linux or g++ or serial. This should create the file libspparks_foo.so in the src directory, as well as a soft link libspparks.so, which is what the Python wrapper will load by default. Note that if you are building multiple machine versions of the shared library, the soft link is always set to the most recently built version.
If this fails, see Section_start 5 for more details, especially if your SPPARKS build uses auxiliary libraries like MPI which may not be built as shared libraries on your system.
For Python to invoke SPPARKS, there are 2 files it needs to know about:
python/spparks.py is the Python wrapper on the SPPARKS library interface. libspparks.so is the shared SPPARKS library that Python loads, as described above.
You can insure Python can find these files in one of two ways:
If you set the paths to these files as environment variables, you only have to do it once. For the csh or tcsh shells, add something like this to your ~/.cshrc file, one line for each of the two files:
setenv PYTHONPATH $PYTHONPATH:/home/sjplimp/spparks/python
setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/home/sjplimp/spparks/src
If you use the python/install.py script, you need to invoke it every time you rebuild SPPARKS (as a shared library) or make changes to the python/spparks.py file.
You can invoke install.py from the python directory as
% python install.py [libdir] [pydir]
The optional libdir is where to copy the SPPARKS shared library to; the default is /usr/local/lib. The optional pydir is where to copy the spparks.py file to; the default is the site-packages directory of the version of Python that is running the install script.
Note that libdir must be a location that is in your default LD_LIBRARY_PATH, like /usr/local/lib or /usr/lib. And pydir must be a location that Python looks in by default for imported modules, like its site-packages dir. If you want to copy these files to non-standard locations, such as within your own user space, you will need to set your PYTHONPATH and LD_LIBRARY_PATH environment variables accordingly, as above.
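Alternatively, a Python script can extend its own module search path at run time instead of relying on PYTHONPATH. Note that this only helps Python find the spparks.py wrapper; the shared library must still be found by the system loader via LD_LIBRARY_PATH or a standard system directory. A minimal sketch, using the example path from above:

import sys
sys.path.append("/home/sjplimp/spparks/python")   # path to the python/ dir of your SPPARKS tree
from spparks import spparks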
If you do not have permission to copy files into the system directories, prefix the python command with "sudo". If you do this, make sure that the Python that root runs is the same as the Python you run. E.g. you may need to do something like
% sudo /usr/local/bin/python install.py [libdir] [pydir]
You can also invoke install.py from the make command in the src directory as
% make install-python
In this mode you cannot append optional arguments. Again, you may need to prefix this with "sudo", and in this case you cannot control which Python is invoked by root.
Note that if you want Python to be able to load different versions of the SPPARKS shared library (see this section below), you will need to manually copy files like libspparks_g++.so into the appropriate system directory. This is not needed if you set the LD_LIBRARY_PATH environment variable as described above.
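For example, the optional argument to the spparks() constructor (see the method list below) selects which library file is loaded:

from spparks import spparks
spk = spparks("g++")   # loads libspparks_g++.so instead of the default libspparks.so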
If you wish to run SPPARKS in parallel from Python, you need to extend your Python with an interface to MPI. This also allows you to make MPI calls directly from Python in your script, if you desire.
There are several Python packages available that purport to wrap MPI as a library and allow MPI functions to be called from Python. These include pyMPI and Pypar, among others.
All of these except pyMPI work by wrapping the MPI library and exposing (some portion of) its interface to your Python script. This means Python cannot be used interactively in parallel, since they do not address the issue of interactive input to multiple instances of Python running on different processors. The one exception is pyMPI, which alters the Python interpreter to address this issue, and (I believe) creates a new alternate executable (in place of "python" itself) as a result.
In principle any of these Python/MPI packages should work to invoke SPPARKS in parallel and to make MPI calls from a Python script which is itself running in parallel. However, when I downloaded and looked at a few of them, their documentation was incomplete and I had trouble with their installation. It's not clear if some of the packages are still being actively developed and supported.
The one I recommend, since I have successfully used it with SPPARKS, is Pypar. Pypar requires that the ubiquitous Numpy package be installed in your Python. After launching Python, type
import numpy
to see if it is installed. If not, here is how to install it (version 1.3.0b1 as of April 2009). Unpack the numpy tarball and from its top-level directory, type
python setup.py build
sudo python setup.py install
The "sudo" is only needed if required to copy Numpy files into your Python distribution's site-packages directory.
To install Pypar (version pypar-2.1.4_94 as of Aug 2012), unpack it and from its "source" directory, type
python setup.py build
sudo python setup.py install
Again, the "sudo" is only needed if required to copy Pypar files into your Python distribution's site-packages directory.
If you have successfully installed Pypar, you should be able to run Python and type
import pypar
without error. You should also be able to run python in parallel on a simple test script
% mpirun -np 4 python test.py
where test.py contains the lines
import pypar
print "Proc %d out of %d procs" % (pypar.rank(),pypar.size())
and see one line of output for each processor you run on.
IMPORTANT NOTE: To use Pypar and SPPARKS in parallel from Python, you must insure both are using the same version of MPI. If you only have one MPI installed on your system, this is not an issue, but it can be if you have multiple MPIs. Your SPPARKS build is explicit about which MPI it is using, since you specify the details in your low-level src/MAKE/Makefile.foo file. Pypar uses the "mpicc" command to find information about the MPI it uses to build against. And it tries to load "libmpi.so" from the LD_LIBRARY_PATH. This may or may not find the MPI library that SPPARKS is using. If you have problems running both Pypar and SPPARKS together, this is an issue you may need to address, e.g. by moving other MPI installations so that Pypar finds the right one.
To test if SPPARKS is callable from Python, launch Python interactively and type:
>>> from spparks import spparks
>>> spk = spparks()
If you get no errors, you're ready to use SPPARKS from Python. If the 2nd command fails, the most common error to see is
OSError: Could not load SPPARKS dynamic library
which means Python was unable to load the SPPARKS shared library. This typically occurs if the system can't find the SPPARKS shared library or one of the auxiliary shared libraries it depends on, or if something about the library is incompatible with your Python. The error message should give you an indication of what went wrong.
You can also test the load directly in Python as follows, without first importing from the spparks.py file:
>>> from ctypes import CDLL
>>> CDLL("libspparks.so")
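If loading by the bare name fails, you can also pass CDLL the full path to the library file, which rules out a search-path problem; adjust the path to match your own build:

>>> CDLL("/home/sjplimp/spparks/src/libspparks.so")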
If an error occurs, carefully go thru the steps in Section_start 5 and above about building a shared library and about insuring Python can find the two files it needs.
To run a SPPARKS test in serial, type these lines into Python interactively from the examples/ising directory:
>>> from spparks import spparks
>>> spk = spparks()
>>> spk.file("in.ising")
Or put the same lines in the file test.py and run it as
% python test.py
Either way, you should see the results of running the in.ising example on a single processor appear on the screen, the same as if you had typed something like:
spk_g++ < in.ising
To run SPPARKS in parallel, assuming you have installed the Pypar package as discussed above, create a test.py file containing these lines:
import pypar
from spparks import spparks
spk = spparks()
spk.file("in.ising")
print "Proc %d out of %d procs has" % (pypar.rank(),pypar.size()),spk
pypar.finalize()
You can then run it in parallel as:
% mpirun -np 4 python test.py
and you should see the same output as if you had typed
% mpirun -np 4 spk_g++ < in.ising
Note that if you leave out the 3 lines from test.py that specify Pypar commands, you will instantiate and run SPPARKS independently on each of the P processors specified in the mpirun command. In this case you should get 4 sets of output, each showing that a SPPARKS run was made on a single processor, instead of one set of output showing that SPPARKS ran on 4 processors. If the 1-processor outputs occur, it means that Pypar is not working correctly.
Also note that once you import the Pypar module, Pypar initializes MPI for you, and you can use MPI calls directly in your Python script, as described in the Pypar documentation. The last line of your Python script should be pypar.finalize(), to insure MPI is shut down correctly.
Note that any Python script (not just for SPPARKS) can be invoked in one of several ways:
% python foo.script
% python -i foo.script
% foo.script
The last command requires that the first line of the script be something like this:
#!/usr/local/bin/python
#!/usr/local/bin/python -i
where the path points to where you have Python installed, and that you have made the script file executable:
% chmod +x foo.script
Without the "-i" flag, Python will exit when the script finishes. With the "-i" flag, you will be left in the Python interpreter when the script finishes, so you can type subsequent commands. As mentioned above, you can only run Python interactively when running Python on a single processor, not in parallel.
The Python interface to SPPARKS consists of a Python "spparks" module, the source code for which is in python/spparks.py. The module creates a "spparks" object, with a set of methods that can be invoked on that object. The sample Python code below assumes you have first imported the "spparks" module in your Python script, as follows:
from spparks import spparks
These are the methods defined by the spparks module. If you look at the file src/library.cpp you will see that they correspond one-to-one with calls you can make to the SPPARKS library from a C++ or C or Fortran program.
spk = spparks()           # create a SPPARKS object using the default libspparks.so library
spk = spparks("g++")      # create a SPPARKS object using the libspparks_g++.so library
spk = spparks("",list)    # ditto, with command-line args, e.g. list = ["-echo","screen"]
spk = spparks("g++",list)
spk.close() # destroy a SPPARKS object
spk.file(file)      # run an entire input script, file = "in.ising"
spk.command(cmd)    # invoke a single SPPARKS command, cmd = "run 100.0"
xlo = spk.extract(name,type)   # extract a global quantity
                               # name = "boxxlo", "nlocal", "id", "xyz", "site", "iarray2", "darray1", etc
                               # type = 0 = int
                               #        1 = int vector
                               #        2 = int array
                               #        3 = double
                               #        4 = double vector
                               #        5 = double array
eng = spk.energy() # query current energy of system
IMPORTANT NOTE: Currently, the creation of a SPPARKS object from within spparks.py does not take an MPI communicator as an argument. There should be a way to do this, so that the SPPARKS instance runs on a subset of processors if desired, but I don't know how to do it from Pypar. So for now, it runs with MPI_COMM_WORLD, which is all the processors. If someone figures out how to do this with one or more of the Python wrappers for MPI, like Pypar, please let us know and we will amend these doc pages.
Note that you can create multiple SPPARKS objects in your Python script, and coordinate and run multiple simulations, e.g.
from spparks import spparks
spk1 = spparks()
spk2 = spparks()
spk1.file("in.file1")
spk2.file("in.file2")
The file() and command() methods allow an input script or single commands to be invoked.
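For example, you can read a setup script and then drive the simulation one command at a time. A minimal sketch, assuming the in.ising script from the examples/ising directory used above:

from spparks import spparks
spk = spparks()
spk.file("in.ising")       # read and run an entire input script
spk.command("run 100.0")   # then invoke additional single commands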
The extract() method returns values or pointers to data structures internal to SPPARKS. See the src/app.cpp file and its extract() method for a list of what is recognized as "name" arguments. Other values could easily be added.
For example, "boxxlo" returns the lower x-bound of the simulation box. "Nlocal" and "nglobal" return the number of lattice sites owned by a proc or the total # of lattice sites in the simulation. "Xyz" returns the Nx3 array of lattice site coordinates. "Site" and "iarrayN" and "darrayN" return a vector of integer or floating-point per-site values.
As noted above, these Python class methods correspond one-to-one with the functions in the SPPARKS library interface in src/library.cpp and library.h. This means you can extend the Python wrapper via the following steps: add a new interface function to src/library.cpp and src/library.h, rebuild SPPARKS as a shared library, and add a wrapper method for the new function to python/spparks.py. You should then be able to invoke the new interface function from a Python script.
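As a hypothetical sketch only, suppose you added a C function with the prototype double spparks_foo(void *) to library.cpp and library.h (the function name, and the self.lib and self.spk attribute names below, are illustrative assumptions, not part of the actual interface). The corresponding method added to the spparks class in python/spparks.py might look like:

from ctypes import c_double

def foo(self):
    self.lib.spparks_foo.restype = c_double   # assumed C prototype: double spparks_foo(void *)
    return self.lib.spparks_foo(self.spk)     # pass the opaque SPPARKS instance pointer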
These are the Python scripts included as demos in the python/examples directory of the SPPARKS distribution, to illustrate the kinds of things that are possible when Python wraps SPPARKS. If you create your own scripts, send them to us and we can include them in the SPPARKS distribution.
trivial.py | read/run a SPPARKS input script thru Python |
demo.py | invoke various SPPARKS library interface routines |
See the python/README file for instructions on how to run them and the source code for individual scripts for comments about what they do.