Jan 29 14:04:40 the speaker is David Malcolm, packager of python for Fedora
Jan 29 14:05:11 he is planning to talk about different species of python (jython, cython etc.)
Jan 29 14:05:35 seems to be a technical audience (experience with python, Java, fedora packaging etc)
Jan 29 14:05:46 it probably would be
Jan 29 14:05:52 pypy etc
Jan 29 14:06:09 * attempting to get his laptop to behave and show the slides properly
Jan 29 14:06:25 So why do we care about the different species of python?
Jan 29 14:06:59 * interruption for request to transcribe the presentation
Jan 29 14:07:27 * bcl (~bcl@neil.brianlane.com) has joined #fudcon-room-3
Jan 29 14:07:31 * interruption completed
Jan 29 14:07:49 slides will be uploaded to David Malcolm's fedora people page once he has internet access
Jan 29 14:07:57 so why do we care about different species of python?
Jan 29 14:08:17 intellectually interested in different implementations, different strengths/weaknesses
Jan 29 14:08:25 memory usage, debugging ability, etc.
Jan 29 14:08:53 also interacting with other technologies (e.g. jython for interacting with java)
Jan 29 14:09:18 Doesn't assert that there is a single best implementation of python - they all have their strengths and best places
Jan 29 14:09:23 so what is python for?
Jan 29 14:09:30 - one off scripts
Jan 29 14:09:37 tflink: thanks
Jan 29 14:09:44 - simple hacks that can be changed into something long-term
Jan 29 14:10:05 - highly readable high-level language
Jan 29 14:10:18 - Python is "Batteries Included"
Jan 29 14:10:33 * feel free to ask questions (even remote) - I will try to relay
Jan 29 14:10:54 Python can also be used as glue code for bridging libraries with high level code
Jan 29 14:11:08 sometimes, the linux community is too independent - won't accept a common runtime
Jan 29 14:11:23 * jsmith-mobile (95a98657@gateway/web/freenode/ip.149.169.134.87) has joined #fudcon-room-3
Jan 29 14:11:42 python can be used as something of a "common runtime" (as much as anything)
Jan 29 14:12:07 Since python can be easily plugged into c++, easy to use with gdb
Jan 29 14:12:14 easy to bind to C libs is a strength
Jan 29 14:12:23 So where is python used in Fedora?
Jan 29 14:12:37 * rdieter (~foo@fedora/rdieter) has joined #fudcon-room-3
Jan 29 14:12:46 powers *.fedoraproject.org
Jan 29 14:13:08 also used by TurboGears, Django, other apps (koji et al.)
Jan 29 14:13:23 Fedora infrastructure does use some Django, but it is minimal
Jan 29 14:13:46 So we have all these possible uses of python (glue code, web development, simple scripts ...)
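The remark that binding to C libraries is one of python's strengths can be illustrated with the standard-library ctypes module. This is only a sketch and was not shown in the talk: it assumes a Unix-like system (such as Fedora) where the C library can be located, and the helper name c_strlen is made up for the example.

```python
import ctypes
import ctypes.util

# Locate and load the platform C library. On Linux, find_library("c")
# typically yields a name such as "libc.so.6"; if it returns None there,
# CDLL(None) still exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Hypothetical helper: measure a string with C's strlen()."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("fedora"))
```

The explicit argtypes/restype declarations are the part that is easy to skip; without them ctypes has to guess how to convert arguments and return values.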
Jan 29 14:14:05 -> "Python" vs "CPython"
Jan 29 14:14:10 Python -> language
Jan 29 14:14:31 CPython -> what most people think of as python (generally /usr/bin/python) and the original implementation
Jan 29 14:14:42 * DiscordianUK nods
Jan 29 14:15:37 * missed the bullets on slide about kloc in sections of CPython
Jan 29 14:15:55 CPython's object system
Jan 29 14:15:58 many klocs I'm sure
Jan 29 14:16:24 CPython is an implementation in C and has objects and types hand-coded in C
Jan 29 14:16:34 Objects are C structs with a ref count
Jan 29 14:16:57 references between objects are just C pointers -> objects can't move around in memory
Jan 29 14:17:13 * Cerlyn (~Cerlyn@66.87.11.113) has joined #fudcon-room-3
Jan 29 14:17:51 there is one big mutex in python (for counting references, if I heard correctly)
Jan 29 14:18:05 * question about a patch by google to remove that mutex
Jan 29 14:18:13 The Global Interpreter Lock (GIL)
Jan 29 14:18:26 there was an attempt to remove the mutex in the past (0.99 era?) but it failed
Jan 29 14:18:58 * transcriber's note - sorry, I'm having a bit of trouble keeping up.
missing a little bit
Jan 29 14:19:23 thanks for what you are doing
Jan 29 14:19:25 the other issue with CPython is reference counting
Jan 29 14:19:54 these pointers are being passed around by hand, and it's easy to get wrong
Jan 29 14:20:08 can end up with memory leaks, segfaults and other hard to debug situations
Jan 29 14:20:12 but on the other hand, it is simple
Jan 29 14:20:23 * burriedu2 (95a9ac77@gateway/web/freenode/ip.149.169.172.119) has joined #fudcon-room-3
Jan 29 14:20:32 the next part of CPython is the interpreter
Jan 29 14:20:51 python compiles the code down to bytecode which is a series of simple operations
Jan 29 14:21:35 the .py files are turned into a syntax tree that is turned into instructions that run on the "fake" CPU, and some operations are collapsed into just data
Jan 29 14:21:54 example: if using the len() function, it is possible to redefine that
Jan 29 14:22:19 also possible to redefine stuff like true and false, so it's hard to do traditional optimizations at compile time
Jan 29 14:22:36 * question - is byte code consistent between implementations?
Jan 29 14:22:46 no, the bytecode is not consistent between the implementations
Jan 29 14:23:20 that's a fail then
Jan 29 14:23:25 there is a marker in the .pyc that identifies the version of the bytecode generated and a timestamp of the associated .py file
Jan 29 14:23:54 so when you compile a .py file to .pyc, it's kind of like a makefile
Jan 29 14:24:12 when you run a .py, the bytecode has to exactly match the runtime, or else it will be recompiled
Jan 29 14:24:49 but the bytecode generally stays consistent between updates of the same version (ex. python2.7 versions all have the same bytecode "magic number")
Jan 29 14:25:02 but the bytecode number could change between development versions
Jan 29 14:25:30 * missed the question
Jan 29 14:25:44 ahhh so the bytecode is consistent across OSes?
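A small sketch of the points about bytecode and redefining len(), using only the standard library on a current CPython 3 (the talk itself was about the 2.x series). dis lists the bytecode of a function, and rebinding len in the namespace the function runs in changes its behaviour without recompiling anything. The names namespace, count and fake_len are invented for the illustration. In Python 3, True and False are keywords and can no longer be rebound; that part of the remark applies to Python 2.

```python
import dis
import importlib.util

# Compile a tiny function inside its own namespace so that the experiment
# does not disturb the real built-in len().
namespace = {}
exec("def count(items):\n    return len(items)\n", namespace)
count = namespace["count"]

# The function body is a short series of simple bytecode operations.
# len is looked up by name at run time (LOAD_GLOBAL), not bound at compile time.
opnames = [instr.opname for instr in dis.get_instructions(count)]
print(opnames)

before = count([1, 2, 3])  # resolves to the built-in len


def fake_len(obj):
    """Hypothetical replacement that ignores its argument."""
    return 42


# Rebinding the name changes what the already-compiled bytecode does.
namespace["len"] = fake_len
after = count([1, 2, 3])
print(before, after)

# The 4-byte marker this interpreter writes at the start of every .pyc.
print(importlib.util.MAGIC_NUMBER)
```

Because the lookup happens on every call, a compiler cannot safely replace len(items) with a fixed operation ahead of time, which is the optimization problem described in the talk.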
Jan 29 14:26:03 the problem is that the .pyc files were generally living next to the .py files, but there is a proposal to change that
Jan 29 14:26:17 the new proposal would have a separate directory for bytecode
Jan 29 14:26:26 in a .pycache directory
Jan 29 14:26:38 that would have a dir for each bytecode version
Jan 29 14:27:13 DiscordianUK: the bytecode should be consistent across OSs
Jan 29 14:27:20 the important variable is the runtime
Jan 29 14:28:06 so the opcodes (and the byte code) will change between pypy, CPython, Jython, IronPython etc.
Jan 29 14:28:16 but should stay the same for all versions of CPython 2.6 etc.
Jan 29 14:28:36 * question - are the python versions backward compatible (can you run 2.3 bytecode on a 2.6 interpreter)
Jan 29 14:28:53 no, you can't do that because they may have removed opcodes or added opcodes
Jan 29 14:29:07 even the functions could have changed between versions
Jan 29 14:29:19 * djf_jeff (~jeff@184-106-95-233.static.cloud-ips.com) has joined #fudcon-room-3
Jan 29 14:29:32 i.e. there are semantic differences between the different versions of python
Jan 29 14:30:21 * example on screen - not sure that I can type fast enough
Jan 29 14:30:52 there will hopefully be slides
Jan 29 14:31:07 talking about decrementing the reference count inside a while loop and some of the potential problems in CPython implementation
Jan 29 14:31:18 DiscordianUK: he said he would post them when he gets internet access
Jan 29 14:31:34 The good parts of CPython:
Jan 29 14:31:38 main loop is a giant switch() statement to process the .pyc opcodes
Jan 29 14:31:49 easy to bind to C code
Jan 29 14:31:56 (just please do it correctly)
Jan 29 14:32:22 you can wrap other C code with "python like" data types to be able to include it into python code
Jan 29 14:32:43 it is a rather simple implementation, in the grand scheme of things
Jan 29 14:32:49 the bad parts of CPython:
Jan 29 14:33:14 it is a bit slow since you're always interpreting the bytecode - never
going to be as fast as machine code
Jan 29 14:33:32 since the language is so dynamic, you can't use a lot of the traditional compile-time optimizations
Jan 29 14:34:10 The Global Interpreter Lock is another disadvantage
Jan 29 14:34:25 * question - what about google's unladen swallow?
Jan 29 14:34:33 that was a project to add a JIT to CPython
Jan 29 14:35:09 they tried to take LLVM (low level virtual machine) which is a library that implements a lot of the things that a compiler could use
Jan 29 14:35:36 so you could construct fragments of code and say "give me machine code"
Jan 29 14:36:00 when python code is being called 1000 times with the same int values
Jan 29 14:36:26 so the JIT would make machine code instead of a big switch statement
Jan 29 14:36:34 the hope was that it would provide a HUGE speedup
Jan 29 14:36:42 unfortunately, it was only about a 20% speedup
Jan 29 14:37:10 since you are generating all that code at runtime, you are doing a LOT of checks (~5 conditionals before adding 2 ints)
Jan 29 14:37:24 in theory, you could optimize all of that away with clever coding
Jan 29 14:37:45 but the last word on the mailing list was that all the people at google who were working on this have moved on to other projects
Jan 29 14:38:01 at this time, there don't seem to be any people primed to take over the unladen swallow project
Jan 29 14:38:03 moving on ...
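To make the "~5 conditionals before adding 2 ints" remark concrete, here is a rough sketch in plain Python of the kind of guards specialized code has to pass before it can take a fast path. It is an illustration only and is not Unladen Swallow's actual generated code: the 64-bit word size, the function names guarded_add and fits_word, and the exact choice of five checks are all assumptions made for the example.

```python
# Assumed machine word range (64-bit signed); purely illustrative.
WORD_MIN = -2**63
WORD_MAX = 2**63 - 1


def fits_word(n):
    """Hypothetical check that an int fits in one machine word."""
    return WORD_MIN <= n <= WORD_MAX


def guarded_add(a, b):
    """Return (path, value), where path records which route was taken."""
    if (
        type(a) is int          # guard 1: left operand is exactly int
        and type(b) is int      # guard 2: right operand is exactly int
        and fits_word(a)        # guard 3: left operand is a small int
        and fits_word(b)        # guard 4: right operand is a small int
    ):
        result = a + b
        if fits_word(result):   # guard 5: the sum did not overflow
            return "fast", result
        return "generic", result
    # Any failed guard falls back to the fully dynamic addition.
    return "generic", a + b


print(guarded_add(2, 3))
print(guarded_add(2.0, 3))
```

Every one of those checks runs on each addition, which is one way to see why generating machine code alone did not give the hoped-for speedup.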
Jan 29 14:38:23 reference counting fun -> it is too easy to get it wrong and cause a crash or other problems
Jan 29 14:38:36 Objects can't move around in memory and this can fragment the heap
Jan 29 14:39:06 lots of reference twiddling -> impossible to have readonly data in shared memory pages (e.g. KVM's KSM)
Jan 29 14:39:22 There is a non-opaque object API
Jan 29 14:39:30 the implementation details are visible to C extensions
Jan 29 14:39:45 this makes them hard to change without breaking hundreds of extensions
Jan 29 14:39:57 example: strings are merely string + length
Jan 29 14:40:13 * notes that there isn't much time left
Jan 29 14:40:23 if you're going to do extensions, please use Cython
Jan 29 14:40:36 it auto-generates .c code, handling a lot of the details
Jan 29 14:40:50 PLEASE don't use SWIG (this will probably be controversial)
Jan 29 14:41:05 all you get are python objects that wrap C and C++ pointers
Jan 29 14:41:12 * going faster now
Jan 29 14:41:18 Debug builds
Jan 29 14:41:30 Compile CPython --with-pydebug
Jan 29 14:41:49 adds lots of useful debugging instrumentation and makes it easier to debug in gdb
Jan 29 14:41:52 but it is a LOT slower
Jan 29 14:42:33 keep in mind that while the .h files are the same between debug and non-debug builds, the .so files are NOT COMPATIBLE with the regular optimized python (There are ABI differences)
Jan 29 14:42:51 so for example, you couldn't run yum with debug python
Jan 29 14:42:57 -- Python 3
Jan 29 14:43:13 Python 3 is a big rewrite of CPython 2, fixing lots of long-standing problems
Jan 29 14:43:26 there are syntactic differences from Python 2
Jan 29 14:43:44 changes in the standard library, different .pyc files
Jan 29 14:44:04 but it should be much nicer to use than Python 2 (in the speaker's opinion)
Jan 29 14:44:17 there is slowly growing 3rd party module support
Jan 29 14:44:34 * question - Is the global lock gone in python 3?
Jan 29 14:44:45 no, it is there still
Jan 29 14:44:58 there are architectural
issues that won't be fixed in Python 3
Jan 29 14:45:05 -> alternate Python
Jan 29 14:45:08 Jython:
Jan 29 14:45:24 Java base class: org.python.core.PyObject
Jan 29 14:45:38 Can wrap arbitrary objects in python
Jan 29 14:45:45 so what is the runtime of Jython?
Jan 29 14:45:50 it's java
Jan 29 14:45:59 * badkittydaddy (95a9941d@gateway/web/freenode/ip.149.169.148.29) has joined #fudcon-room-3
Jan 29 14:46:05 the .py files are compiled to syntax trees, then converted directly into java bytecode
Jan 29 14:46:26 in theory, it should be fast (JIT-compiled machine code)
Jan 29 14:46:53 however, much of the time, Java bytecode is calling back into the core PyObject code which has to implement some messy switch statements
Jan 29 14:47:08 * q - doesn't java do a lot of the same things?
Jan 29 14:47:36 kind of, but java is a lot more static and there is some hacking in order to bridge the two worlds
Jan 29 14:47:45 so what are the advantages of jython?
Jan 29 14:47:57 you can embed it inside a java appserver
Jan 29 14:48:17 you can use the java garbage collector (the CPython GC is not very good)
Jan 29 14:48:33 the java VM is perhaps the best open source runtime that we have:
Jan 29 14:48:52 JIT, GC, years of research and competition, no GIL
Jan 29 14:49:21 * missed question
Jan 29 14:49:58 the GIL tends to be an issue when you're trying to get max performance
Jan 29 14:50:13 for example, some of yum's perf issues come from talking to disk
Jan 29 14:50:23 you want it to have a simple interface, though
Jan 29 14:50:36 but when you are working on a script, how fast does it really have to be?
Jan 29 14:50:50 the performance issues are more prevalent in webserver space
Jan 29 14:50:59 GIL hampers multiprocessing
Jan 29 14:51:09 -> back to Jython
Jan 29 14:51:26 you can also use Java DB bindings in python code
Jan 29 14:51:31 but Jython is still at 2.6
Jan 29 14:51:39 ==> IronPython
Jan 29 14:51:53 IP is similar to Jython but on top of the CLR (.NET Runtime)
Jan 29 14:52:06 question was about using the multiprocessing module instead of threads. It breaks them out into subprocesses instead of threads so multicores can be taken advantage of.
Jan 29 14:52:14 apparently it works on Mono (not sure who is working on this)
Jan 29 14:52:39 * jdob (95a97df7@gateway/web/freenode/ip.149.169.125.247) has joined #fudcon-room-3
Jan 29 14:52:48 * q - didn't microsoft fire the person who was working on IP (not sure if I'm right on this - transcriber)
Jan 29 14:52:55 ... moving on to PyPy
Jan 29 14:53:10 PyPy is very different from the others talked about
Jan 29 14:53:28 it is an implementation of an interpreter for the FULL python language (with JIT compilation)
Jan 29 14:53:37 it is written in a high level language
Jan 29 14:54:02 the implementation language is compiled down to .c code from which we get a binary
Jan 29 14:54:11 can also compile to C#, Java etc.
Jan 29 14:54:29 PyPy is actually written in Python (hence PyPy)
Jan 29 14:54:46 * Diagram on slides about how many pythons
Jan 29 14:54:53 * jdob (95a97df7@gateway/web/freenode/ip.149.169.125.247) has left #fudcon-room-3
Jan 29 14:55:21 you are supposed to run python code through the PyPy code, which spits out lots of generated c code
Jan 29 14:55:36 which should have the same behavior as the python code would have running through the interpreter
Jan 29 14:56:03 so you end up with c code that takes a while to translate but it can end up much faster than interpreted python code
Jan 29 14:56:13 PyPy has limitations
Jan 29 14:56:32 so we go through this strange process of translation to allow different optimizations
Jan 29 14:56:55 pypy does have .pyc files by default (similar but different from CPython)
Jan 29 14:57:09 it is starting to have support for the CPython extension APIs
Jan 29 14:57:27 it is different, but every time the speaker has tried them, it has segfaulted, crashed and burned
Jan 29 14:57:39 bcl: sorry, I missed your question, will try after talk
Jan 29 14:57:47 advantages of pypy
Jan 29 14:57:54 speed: see http://speed.pypy.org
Jan 29 14:58:10 it is fast because the object implementations are better and have smarter data structures
Jan 29 14:58:26 JIT: based on tracing itself, interpreting "hot" loops
Jan 29 14:59:02 tflink: I'm sitting behind you :) comment was about the multiprocessing question you missed.
Jan 29 14:59:17 bcl: thanks, just trying to keep up
Jan 29 14:59:45 memory usage should be better, because of smarter data structures
Jan 29 14:59:54 disadvantages of pypy:
Jan 29 15:00:06 currently at python 2.5 (2.7 is on its way)
Jan 29 15:00:16 6 million lines of autogenerated .c code
Jan 29 15:00:25 they also only seem to care about the 2 archs
Jan 29 15:00:34 * badkittydaddy has quit (Ping timeout: 265 seconds)
Jan 29 15:00:48 also wanted to go into packaging and other implementations
Jan 29 15:01:00 but he will probably leave that discussion for the mailing list
Jan 29 15:01:10 * out of time
Jan 29 15:01:25 ==> This is the end of the python presentation