Maintaining Perspective with Multiple Cameras | TouchDesigner

File this away under “interesting theoretical concepts that I’ll never use… or will I?”

At some point while making realtime generative art for massive installations you may find that you’re beyond the capabilities of traditional realtime rendering in Touch. Let’s say, for example, that you need to render 12 hd outputs for a 3 x 4 array of screens – a resolution of 7680 x 3240 certainly can be done with a single render TOP, but delivering that final texture is a little more tricky.

I’m well aware that there are a number of possible solutions to this problem but before you find yourself composing that email to me about how to hack a way to a solution… what if it wasn’t 12 ouputs, what if it was 120? 200? What if every output was 4k? The answer we’re really after here is how to draw a scene with consistent perspective across multiple machines… because at some point you’ll have to use multiple machines. So, what do we do?

Forget what we do… what does that even look like? I’m still so confused.

Okay, so let’s first look at some examples of what it looks like as reference.

In this example we can see one large canvas that spans multiple screens. This is great – it’s huge and beautiful. This example shows a large desktop, which is also great… but what if we’re after some real-time rendering? This is a great illustration of the problem we might encounter. What if these displays were all 1920 x 1080. It’s a 7 x 4 array, so that’s going to be a definite challenge for a complex scene on a single machine. At this point we probably can’t realistically produce a single pixel to pixel texture for this array on a single machine. Instead we’d have to have a system of distributed rendering machines. Okay, that’s pretty straightforward and we can do some hip flat rendering that’s all orthographic no problem. What if we want perspective? If you want perspective in your real time rendering you need a way to conceptualize what the entire “screen” is, and then how to selectively render just a portion of that larger scene.


Consider the beautiful work of Refik Anadol. I can’t speak to exactly what technique is being used here, but it’s a good illustration of the same challenge. How can you maintain the illusion of perspective if you need to render your generative art on multiple machines? That’s the real question we’re trying to answer… and now we can look at some ideas to help us better understand that challenge.

The process and methodology described below aim to solve that problem. For this example I’m going to work in a scale that’s unrealistic… but will allow those without a commercial license to play along from home. If you have a commercial license feel free to turn up the resolution as long as you keep mathematics involved in mind.

First things first, let’s build out a simple proof of concept that will make sure we understand this problem more completely.

Let’s imagine that we have a large composition that we need to cut up (for the sake of rendering) into 4 smaller slices. That might look something like this:


Remember, this is just a proof of concept so we’re going to start with a very easy implementation first, before we start to dig into the more complex questions. An important lesson to consider when it comes to programming is to start by reducing the problem to its basic elements, then when you have a foundational understanding of the issue start to scale up – don’t worry, we’ll get there we just have to start small.

Okay, so we’ve got our 2 x 2 array that we want to render. Let’s see how we can set up some cameras to render just one of those squares a piece, but all from the same point of view.

Wait! Why do they need the same point of view? We’re after the same point of view so we can maintain perspective. It’d be easy enough to use four different cameras that were translated into positions to only see their section of the larger quad, but the results from lighting and perspective calculations wouldn’t match. You can use this multi-camera transformation technique if you’re doing orthographic rendering with emissive lighting, but not if you want to maintain perspective and use non-emissive lighting. It’s okay if you don’t believe me – I didn’t believe me either, and I had to set it up and test it bunch of times before I really understood what’s happening.

What’s that going to look like? Well, for the perspective of a single camera that can see the whole scene – the effect we want to recreate eventually – we might see something like this:


From the vantage point of a single camera, we want to be able to zero in on just a single quadrant in our scene, something like this:


Eventually, we want to reassemble the view of four cameras to look like our original single camera view.

Matt, I still don’t get it. That’s okay. Keep reading, and if by the time we get done with this example it’s still not a useful technique you can stop reading. If, however, you want a means to do perspective based illusions across multiple machines for massive installations, keep reading to the end.

We’re going to set up our test by by using a part of our camera COMP that you may not have used before. Specifically, when it comes to the view page, we’re going to use the Viewing Angle and Method called “Focal Length and Aperture.” I wish I could tell you exactly what this means to TouchDesigner – spoiler, I can’t – what I can help you understand is how these values relate to one another in order to achieve our particular illusion.

We’re going to start by setting up a simple example. Add a geo to your scene, and replace the torus inside with a grid. Set your sizex parameter to be 16/9. If you want to follow along step for step , change your grid to be a polygon, and add a noise SOP with a period of 0.02 and an amplitude of 0.5. Connect that to a facet SOP with unique points and computed normals. Connect your chain of operators to a null SOP and make sure that your display and render flag are turned on for your null. You should have a simple network that looks like this:


Outside of your geo add two moviefile in TOPs. In the first you can use the supplied quad arrangement in the assets portion of the git repository that accompanies this post. It’s called multi_perspective.png.In your second moviefile in TOP select the FiledGuide.tif. Composite these two together with a composite TOP, and change the operand method to Add. Connect this to a null TOP, and finally assign this null to a phong material as the color map. When you’re done you should have something that looks like this:


Whew. Alright, now we can finally get to the interesting part. Let’s add a light and a camera to our network.

Now, in our camera comp let’s set the tz parameter to 10 units:


Next let’s move over the view page of the camera COMP. Here we’re going to leave the projection as perspective, but we’re going to change the viewing angle method to “Focal length and Aperture. Next we’ll change our Focal Length to 10 and our aperture to 16/9:


This is our camera that can see the entire scene. Let’s add a render TOP to our network so we can see what our camera sees:


If we were to bypass our noise SOP the view of this piece of geometry would fit exactly within our view-port. From here forward it’s going to get a little interesting. We want to maintain the perspective calculations from this vantage point, but we want only a single quadrant at a time to fill our view-port. It’s almost like zooming in and cropping to only a sub-section of our view. How do we do that?

Let’s copy our first camera, and then make a few adjustments. I’m going to call my new camera cam_single_p1. Next we’re going to leave our Focal Length and Aperture settings just as they were. We are, however, going to change our window x/y parameters to be:

winx -0.25
winy 0.25

We’re also going to change our window size to be 0.5.


Let’s render that camera and see what we get:


Woah!! That works just like we wanted! Thinking through our other cameras, we can quickly see that the combination of our windows size and offsets act as a zoom and translation mechanism. Try adding 3 more cameras with the following tx/y settings:


  • tx 0.25
  • ty 0.25


  • tx -0.25
  • ty -0.25


  • tx 0.25
  • ty -0.25

If you render these cameras you should see something like this:


Okay. That’s all pretty slick, but how do all of these parameters relate?

What does it mean?!

The meat and potatoes of this technique is to define a view-port’s aspect ratio, the number of vertical slices that a single window represents, and then to specify where the offsets sit that represent the center of a given window.


Let’s think through our simple example. We made our rendered quad a 1.7778 x 1 rectangle – a rectangle with a 16:9 aspect ratio. This was the same value we used for our aperture. We also set the distance of our camera from our geo to be 10 units, which was the same value we used to define our focal length. The window size is a ratio of 1 over the number of vertical sections… in our case we had two vertical sections, so 1 over 2 is 0.5. Our xy window offsets represent the center of our windows in our sections. That’s a little harder to wrap our heads around, and a better way to think about it is the UV coordinates of the center of a given window into our scene. Let’s break that out a little more.

Window Size

  • 1 / number of vertical windows that can fit within our viewport

Focal Length

  • the distance of our camera to our scene window


  • the aspect ratio of our scene window

Win X

  • ( U * 2) – 1 ) / 2

Win Y

  • – ( ( U * 2) – 1 ) / 2 )

To really dig into the power of this technique we need to push beyond just a quad based set up we need a more abstract configuration of windows in our scene. Now that we understand the mechanics of our set up, let’s look at some arbitrary configuration of windows that might be spread across a large number of machines. What if our window arrangement looked something like the below:


What’s going on here?

Well,  let’s imagine that you’re working on a large format LED installation where you need to slice up your scene into uniform HD chunks that then feed an LED controller. In some cases those regions overlap even though only a single LED screen is outputting the content. Relationally, however, you still need to be able to control and designate the regions for the screens. Output 5 and 7 are a good example of these. The final shape of that screen is going to be the combined outline of the two displays, but the video feed to the LED controllers needs to be consistent HD cutouts. All of the dimensions in this example could probably be managed by doing a single rendering of the full scene then cropping to a given region – but at some point your ability to render the full scene and then use TOPs to crop out pixels is going to fail. This example has 7 outputs, but it’s not hard to imagine a project that had 20 or more – as reference, the need to understand this technique came out of working on an installation with 68 discrete outputs.

Okay, okay, okay… fine. So what are all these targets and numbers about?

We’ll remember in our first proof of concept example that we were able to take advantage of a simple 0.25 offset – that makes sense right?

If our geometry is placed with it’s bottom left corner at the origin (0,0), then the center of our first slice is going to be at (0.25, 0.25):

2016-12-13 13.35.19.jpg

That’s great Matt, but that doesn’t make any sense… following this logic, our slices would have been more like:

  • Slice1 ( 0.25, 0.75 )
  • Slice2 ( 0.75, 0.75 )
  • Slice3 ( 0.25, 0.25 )
  • Slice4 ( 0.75, 0.25 )

So what gives?!

Well, we have to remember that we set up our geometry to have it’s center at the origin. If we take this into account, what we see is more like:

2016-12-13 13.36.58.jpg

Looking at this, it should make sense why we used the translation coordinates that we did. The other interesting thing to understand in this look into our geometry is to consider the following.

If the full extent of our scene represents the bounds of a complete window, then we can begin to think about our winx and winy coords as being more akin to a UV – a normalized coordinate on our full window. We need to do some additional math to compensate for the translation of our origin, but that’s pretty straightforward.

If you’re still scratching your head, that’s okay. Let’s look at how to take our new window map and create a programmatic means of slicing up that full scene.

Thinking back to our proof of concept test we need a few things in place in order for this all to work as we expect:

  • We need the transforms for all our cameras to be the same – these are on the xform page: tx, ty, tz.
  • We also need several pieces for the view page:
    • Focal Length
    • Aperture
    • Winx
    • Winy
    • Winsize

Given that our cameras aren’t going to be doing any moving once we set up our calibration, we can safely use python expressions in our parameter fields. In terms of optimization, if an op does a lot of cooking it’s often better to use exports instead of expressions – expressions end up getting complied on demand and evaluated when op op cooks. Since we want fixed cameras looking into a moving world, we can use expressions for our cameras – it’s also going to be less of a hassle to set up, which is great.

For starters let’s add our reference template into our scene. We can start by dragging in our texture, connecting a Null TOP, and then assigning this to the color of a constant Material.


Next let’s add a geometry to our scene. We can replace the torus inside with a rectangle that has our source window’s aspect ratio, which in this case is 16:9 or 1.778 : 1.


Next I’m going to use a Null COMP to hold the transforms of our camera system. I’m going to set this to have a tz value of 5, and otherwise leave this alone.


Let’s also add an object CHOP to help us with determining the distance between the null and geometry – to be clear, we don’t need to do this with an object CHOP, we could do this with a math CHOP, or with Python.

In the Object CHOP I’m going to set our null as the target object, geo1 as the reference object, and I’m going to set this to compute distance.


I’m also going to add some tables that hold some reference information for us. I want to know the width and height of our scene, as well as the width and height of a given cut-out.


We’re also going to need a table with all of our pixel space coordinates:


Now that we have all our primary ingredients ready, we can build out a system to convert our pixel space coordinates into a set of winx and winy translation values.

Let’s start by looking at this process in general. We can first start with our coords in pixel space:

Pixel Space

output x y
output1 640 360
output2 205 130
output3 237 518
output4 525 130
output5 752 593
output6 1013 188
output7 1012 483

These values represent the actual center of these windows in the full pixel scene. I started by making this template in Photoshop, and then measured the location of the center of each given viewport.

Once we have these values, we need to convert them into a normalized values. In other words, how do these pixel coordinates translate to UV coordinates. This is a pretty straightforward calculation – the pixel value divided by it’s respective dimension for the full scene:

  • outputx / full_scene_x
  • outputy / full_scene_y

We can set up a quick eval DAT to do all of this for us:


The two python expressions that drive this in the table2 DAT are:

me.inputCell / op( 'table_scene' )[ 'scene_x', 1 ]
me.inputCell / op( 'table_scene' )[ 'scene_y', 1 ]

Our results from this can be found in the table below.

UV Space

output x y
output1 0.5 0.5
output2 0.16015625 0.18055555
output3 0.18515625 0.71944444
output4 0.41015625 0.18055555
output5 0.5875 0.82361111
output6 0.79140625 0.26111111
output7 0.790625 0.67083333

Now that we have the UV coords that represent the center of each window, we need to convert these values into a scale that takes into account that the center of our geometry is located at the origin. For our x values we multiply our value by 2, subtract 1, and divide by 2. For our y values we use the same operation and multiply by negative 1. We can use another eval DAT to do just this for us.


The two python expressions that drive this in table3 are:

( ( me.inputCell * 2 ) - 1 ) / 2
( ( ( me.inputCell * 2 ) - 1 ) / 2 ) * -1

Now we have taken our original pixel coords and then converted them into our winx and winy transforms.

Converted into winx winy transfroms

output x y
output1 0.0 -0.0
output2 -0.33984375 0.31944444
output3 -0.31484375 -0.21944444
output4 -0.08984375 0.319444444
output5 0.08750000 -0.323611111
output6 0.29140625 0.238888888
output7 0.290625 -0.170833333

With these values set, we just need to make sure that we compute our window size, aperture, and focal_length. Looking back to the above, calculating these remaining values should be a snap.

Window Size

Window size is 1 over the number of windows that can fit into our full scene. This also means that our total scene’s width should be n x the width of a given window.

  • In our case a single window (measured in Photoshop) is 320 pixels, and our full scene is 1280 pixels.
  • 1280 / 320 = 4
  • 1 / 4 = 0.25
  • Window Size = 0.25

Focal Length

Focal Length is the distance between our point of view (in our case the null), and our full scene. We’ve used an object CHOP to compute this distance.


Our aperture is the aspect ratio of our full scene.

  • 1280/720 = 1.778
  • Aperture = 1.778

I’m using an eval DAT to do the computation and organization of all of this:


The python for these looks like this:

1 / ( op( 'table_scene' )[ 'scene_x', 1 ] / op( 'table_render_attr' )[ 'width', 1 ] )
op( 'table_scene' )[ 'scene_x', 1 ] / op( 'table_scene' )[ 'scene_y', 1 ]
op( 'object1' )[ 'dist' ]

Python in touch is name dependent, so I’d recommend looking at this example network if you’re trying to replicate this effect.

Now we can set up our cameras to correctly crop out a given viewport of the entire scene. I’m going to rely on the digits of a camera to be correspondences to the output. So in my case output1 and camera1 should be the same thing. I’m also going to use the translation values of our null to set the location of the camera.

All of that said, our expressions for our camera should look like this:


Again, all of our expressions are name dependent, so I’d recommend looking over how I’ve organized this in the sample file in order to make sure you know exactly what’s referencing what.

Now we can copy past our camera 6 more times – I’m using digits in many of these expressions to match the camera digit to the output digit. Looking over my results it looks like we’re right on the money.


To really appreciate what’s happening here, I’m going to turn off the rendering on our calibration plate, and turn on a sample piece of geometry.


In geo2 you can see the entire geometry, and in each of our viewports we can see that we’re only rendering the region of our geometry that falls into single view.

“That’s a mess… I don’t get it.” You might well be saying. That’s okay. Let’s rearranged our TOPs to mirror more closely what’s in our template:


Hopefully, doing this we should better be able to see how these various pieces work together.

At this point we now have a means of rendering a complex scene across multiple machines (or GPUs if you’re able to use affinity on Quadro cards), and maintain perspective. That unlocks a whole new avenue for realtime rendering that breaks you away from the limitations of single machine configurations or reliance on baked media for distributed realtime rendering.

Download this example from github

It’s always lovely to get an email from Derivative headquarters.

In this case I just got a lovely ping from Malcolm to let me know that the crop parameters on the render TOP can be used for the same functions described above.

Let’s look back at our first example to understand how that might work. In this case I’m going to use the same initial camera that we set up – our cam_full_scene COMP. I’m going to use this one already since I know that it’s correctly configured to capture the entire width and height of our reference plate. Next I’m going to add a render TOP, and under the CROP page I’m going to change my crop right and crop bottom pars to 0.5. For the sake of understanding the concept I’m going to leave this in a fractional unit space, but we could just as easily determine these values as absolute pixel measures. The result of this looks like this:


Next up, rather than using another render TOP I’m going to use a render pass – there’s lots of good reasons to use the render pass but one of the most important considerations here is that it’s a very efficient rendering operation. We do, however, need to make a few other adjustments. We need to target our render2 as our render TOP on the Render Pass page, and we also need to toggle on clear to camera color, and clear depth buffer:


On our render pass we’ll need to use the following crop parameters:


As we add additional render pass tops we need to target the previous render pass – a given render or render pass TOP can only have a single render pass assigned to it.

Our crop parameters should look like this:


  • crop left 0.0
  • crop right 0.5
  • crop bottom 0.0
  • crop top 0.5


  • crop left 0.5
  • crop right 1.0
  • crop bottom 0.0
  • crop top 0.5

All in all when we’re done we should have something that looks like this:


Here the result is the same, the methodology going into it all is just a little different. Like all things TouchDesigner, there are multiple means of solving the same challenge and the “right” one ultimatly comes down to the choice that’s best for your particular installation.

Happy Programming everyone.

* The git repository and support files have been updated to reflect this additional material.