Decomposition of 3D Objects into Geometric Primitives

Sakshi Cholli, Shrusti Girmath and Anilkumar Kulkarni

School of Electrical and Electronics, KLE Technological University, Hubballi, Karnataka, India

Keywords:

3D Object Decomposition, Geometric Primitives, Point Clouds, Convolutional Neural Networks (CNNs),

Deep Learning, Shape Analysis, 3D Geometry, Segmentation, Feature Extraction, Surface Approximation,

Computational Geometry, Machine Learning for 3D Data, Voxelization, Mesh Processing, Point Cloud

Processing, Primitive Fitting, 3D Shape Recognition, Object Reconstruction, Computer Vision, Artiﬁcial

Intelligence

Abstract:

For applications in robotics, CAD systems, and scene comprehension, 3D objects must be broken down into

their component geometric primitives. Current methods frequently depend on manually created features or

regularised transformations, such as voxelization, which result in quantisation artifacts and inefﬁciencies.

This study, which draws inspiration from PointNet, suggests a uniﬁed neural network architecture that breaks

down 3D objects into basic geometric forms like spheres, cylinders, and planes by directly processing raw

point clouds. Our model incorporates extra modules to learn local geometric characteristics for accurate

decomposition, while utilizing PointNet’s capability to handle unordered point sets, guaranteeing permutation

invariance.

1 INTRODUCTION

The decomposition of 3D objects into geometric

primitives is a fundamental problem in computer vi-

sion and computational geometry, with applications

spanning robotics, augmented reality, computer-aided

design (CAD), and autonomous systems. This pro-

cess involves breaking down complex 3D shapes into

simpler components such as planes, spheres, cylin-

ders, and other primitive geometries. Such decompo-

sition enables a more interpretable and efﬁcient repre-

sentation of 3D objects, facilitating downstream tasks

like simulation, manipulation, and rendering.

Traditional methods for 3D object decomposition

often rely on handcrafted algorithms and feature en-

gineering, which may struggle to generalize across

diverse object geometries and noisy datasets. With

the rise of deep learning, data-driven approaches have

shown remarkable promise in addressing these limita-

tions. PointNet, a pioneering deep learning architec-

ture, has emerged as a robust solution for processing

raw point cloud data. By directly operating on un-

ordered sets of 3D points, PointNet preserves permu-

tation invariance and learns global and local geomet-

ric features effectively.

This research aims to leverage the PointNet frame-

work to address the challenge of decomposing 3D

objects into their constituent geometric primitives.

The study explores how PointNet’s feature extrac-

tion capabilities can be adapted to identify and seg-

ment primitives within complex point cloud represen-

tations. By focusing on a data-driven approach, this

work seeks to overcome challenges associated with

noise, incomplete data, and diverse object geometries,

contributing to a scalable and generalized solution for

3D object decomposition. 3D object decomposition

into geometric primitives simpliﬁes complex object

representations, enabling efﬁcient storage, manipu-

lation, and analysis. Traditional methods rely heav-

ily on data transformation into regular grids or hand-

crafted geometric approximations, which are compu-

tationally intensive and fail to generalize across varied

datasets.

PointNet, a deep learning framework designed for

point cloud data, has demonstrated success in 3D clas-

siﬁcation and segmentation tasks by directly process-

ing unordered sets of points. This work extends Point-

Net to address the speciﬁc challenge of geometric de-

composition. By identifying local and global point

features, our approach achieves robust decomposition

of objects into primitive shapes, providing a scalable

and interpretable solution.


Cholli, S., Girmath, S. and Kulkarni, A.

Decomposition of 3D Objects into Geometric Primitives.

DOI: 10.5220/0013622800004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 3, pages 496-502

ISBN: 978-989-758-763-4

2 RELATED WORK

This section reviews existing methods and research relevant to the decomposition of 3D objects into geometric primitives, organized under the following topics:

1. Traditional Methods for Geometric Decomposition

Traditional approaches rely on geometric

algorithms and handcrafted rules to identify primi-

tives in 3D models: RANSAC Based Plane Detec-

tion: Techniques such as Random Sample Consensus

(RANSAC) are commonly used for detecting planes

in point clouds by ﬁtting geometric models iteratively.

While effective for simple shapes, these methods

struggle with noise and complex structures. Model

Fitting and Optimization: Optimization-based tech-

niques aim to ﬁt geometric primitives like spheres,

cylinders, and cones by minimizing error metrics.

However, these are computationally expensive and

sensitive to parameter tuning. Region Growing Meth-

ods: Algorithms that group points into regions based

on geometric similarity have been used but often fail

with incomplete or noisy data.

2. Deep Learning for 3D Data Processing The

rise of deep learning has led to novel methods for un-

derstanding and processing 3D data: Volumetric Rep-

resentations: Early works converted point clouds into

3D voxel grids and applied 3D convolutional neu-

ral networks (e.g., VoxNet, 3DShapeNets). These

methods suffer from high memory and computational

costs due to voxelization (Ding et al., 2023; Liu et al.,

2019; Ioannidou et al., 2017). Multiview CNNs:

Techniques like Multi-View CNNs render 3D shapes

into 2D projections and process them using 2D convo-

lutional networks. While effective for classiﬁcation,

this approach loses detailed geometric information,

making it unsuitable for precise primitive decompo-

sition. Spectral CNNs: Applied on mesh data, these

methods process shapes using graph-based represen-

tations. However, they are limited to manifold meshes

and struggle with generalizing to point clouds.

3. PointNet and Point-Based Networks

PointNet introduced a breakthrough in processing

raw point clouds directly: It demonstrated how

symmetric functions, like max pooling, ensure

permutation invariance for unordered point sets.

Extensions of PointNet, such as PointNet++ and

PointCNN, focused on capturing local point fea-

tures and hierarchical structures, enabling ﬁner

segmentation and part identification (Liu and Tian,

2024). However, these methods primarily target

classiﬁcation and segmentation tasks, without spe-

ciﬁc adaptations for decomposing 3D objects into

primitives.

4. Primitive Decomposition with Deep Learning

Recent works have explored the application

of deep learning for geometric decomposition:

Learning-Based Fitting: Some models directly pre-

dict primitive parameters (e.g., plane equations, cylin-

der radii) using neural networks. While promising,

these require large labeled datasets with annotated

primitives, which are scarce (Fu et al., 2023; Huang

et al., 2018). Hybrid Methods: Approaches combining

traditional RANSAC with learned features from

deep networks have shown improvements in robust-

ness and efﬁciency. Self-Supervised Learning: Tech-

niques leveraging self-supervision to identify geomet-

ric primitives without labeled data are emerging but

remain experimental.

5. Limitations of Existing Methods Scalability:

Most methods struggle to handle large-scale, dense

point clouds due to computational inefﬁciency. Ro-

bustness: Traditional and some learning-based meth-

ods are sensitive to noise, occlusions, and incom-

plete data. Generalization: Models trained on speciﬁc

datasets or object categories often fail to generalize to

unseen shapes and environments.

3 PROBLEM STATEMENT

The framework is designed to process 3D objects represented as unordered point sets and to decompose them into geometric primitives. Each 3D object is defined as a set of n points $\{P_i \mid i = 1, \dots, n\}$, where each point $P_i$ is characterized by its (x, y, z) coordinates. Additional feature channels, such as normals or colors, may also be incorporated when available; however, the focus remains primarily on the spatial (x, y, z) representation.

For the decomposition task, the input point cloud is uniformly sampled from a 3D object. The proposed deep learning framework identifies and segments points into groups corresponding to specific geometric primitives, including planes, cylinders, and spheres. The network outputs n × p scores, where n is the number of points in the object and p is the number of primitive categories. Each score represents the likelihood that a point belongs to a specific primitive. By integrating deep learning with geometric post-processing techniques, the framework decomposes 3D objects into their constituent geometric primitives.
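As a minimal sketch of how an n × p score matrix could be turned into a per-point labelling, the snippet below assigns each point to its highest-scoring category. The category order, the example scores, and the helper name `assign_primitives` are illustrative assumptions and are not taken from the described framework.

```python
from collections import defaultdict

# Hypothetical category order; the text names planes, cylinders, and spheres.
PRIMITIVES = ("plane", "cylinder", "sphere")


def assign_primitives(scores):
    """Group point indices by the primitive category with the highest score.

    `scores` is a list of n rows, each holding p = len(PRIMITIVES) scores.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(scores):
        if len(row) != len(PRIMITIVES):
            raise ValueError("each row needs one score per primitive category")
        best = max(range(len(row)), key=row.__getitem__)
        groups[PRIMITIVES[best]].append(idx)
    return dict(groups)


# Made-up scores for n = 4 points and p = 3 categories.
scores = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.80, 0.10, 0.10],
    [0.10, 0.20, 0.70],
]
groups = assign_primitives(scores)
```

Here points 0 and 2 fall into the plane group, point 1 into the cylinder group, and point 3 into the sphere group.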


3.1 PointNet

PointNet, introduced by researchers at Stanford University in 2016, aims to classify and segment 3D shapes. The approach operates on a data structure known as a point cloud, which consists of a collection of points representing the geometry of a 3D object or shape. However, because of their irregular structure, point clouds are directly applicable only to specific use cases. Traditionally, many researchers transformed point clouds into alternative representations,

such as voxels (3D volumetric pixels), before process-

ing them through deep neural networks. This transfor-

mation, however, often results in overly large datasets

and introduces quantization errors, potentially alter-

ing the natural characteristics of the 3D structure.

In their work, the authors present a novel technique

that directly processes point clouds, enabling efﬁcient

classiﬁcation and segmentation without requiring in-

termediate transformations.

3.2 Architecture

The input to the proposed architecture is a point set extracted from a point cloud. A point cloud is a collection of 3D points $P_i$, with each point represented by its coordinates $(x_i, y_i, z_i)$.

For the object classification task, the input is either sampled directly from the object's shape or extracted from a segmented scene point cloud. For semantic segmentation, the input may be a smaller region of a 3D scene obtained via object region segmentation, or it could be a single object for part segmentation.

Key Characteristics of Point Sets

1. Permutation Invariance: Point Clouds lack in-

herent structure, and a collection of N points can

have N! different permutations. Any processing

method must ensure that the output remains con-

sistent regardless of the order of points.

2. Transformation Invariance: Outputs for classi-

ﬁcation and segmentation tasks should not be af-

fected by geometric transformations such as rota-

tions or translations applied to the input points.

3. Point Interactions: Neighboring points frequently carry important contextual information, so points should not be treated in isolation. These interactions are particularly crucial for segmentation tasks, which rely on local information more heavily than classification tasks do.

Figure 1: Our point-based classification model, combining PointNet (shown in green) with three transformer modules (shown in yellow). FC denotes a fully connected layer and ⊕ denotes matrix addition.

4 POINTNET ARCHITECTURE

The PointNet architecture is designed with simplic-

ity and effectiveness in mind. The classiﬁcation net-

work begins by applying a shared multi-layer percep-

tron (MLP) to transform each of the n input points

from 3 dimensions to 64 dimensions. Notably, the

same MLP is applied to all n points to ensure consis-

tency. In the subsequent layer, each of these n points

is further transformed from 64 dimensions to 1024 di-

mensions. A max-pooling operation is then used to

aggregate these features into a single global feature

vector in $\mathbb{R}^{1024}$. Finally, a three-layer fully connected

network (FCN) maps this global feature vector to k

output classiﬁcation scores.
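The forward pass described above can be sketched in plain Python as follows. This is an untrained illustration, not the authors' implementation: the random weights, the ReLU activations, the hidden widths 512 and 256 of the fully connected head, and k = 10 are all assumptions. The point of the sketch is that the channel-wise maximum makes the output independent of point order.

```python
import random


def make_layer(rng, n_in, n_out):
    """Random weights and zero biases for one dense layer (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu=True):
    """Apply one dense layer to a single feature vector."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def pointnet_scores(points, shared_layers, head_layers):
    """Return the global feature vector and the k classification scores."""
    # Shared MLP: the same layers are applied to every point independently.
    per_point = []
    for p in points:
        h = list(p)
        for layer in shared_layers:
            h = dense(layer, h)
        per_point.append(h)
    # Symmetric aggregation: channel-wise max over all points.
    global_feature = [max(channel) for channel in zip(*per_point)]
    # Fully connected head: ReLU on hidden layers, raw scores at the output.
    h = global_feature
    for i, layer in enumerate(head_layers):
        h = dense(layer, h, relu=(i < len(head_layers) - 1))
    return global_feature, h


rng = random.Random(0)
k = 10  # assumed number of classes
shared = [make_layer(rng, 3, 64), make_layer(rng, 64, 1024)]
head = [make_layer(rng, 1024, 512), make_layer(rng, 512, 256), make_layer(rng, 256, k)]

points = [(rng.random(), rng.random(), rng.random()) for _ in range(8)]
feat, scores = pointnet_scores(points, shared, head)

# Feeding the same points in a different order gives identical results.
shuffled = points[:]
rng.shuffle(shuffled)
feat_shuffled, scores_shuffled = pointnet_scores(shuffled, shared, head)
```

Because each point passes through identical layers and the maximum ignores ordering, `scores` and `scores_shuffled` agree exactly, which is the permutation invariance the architecture relies on.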


5 METHODOLOGY

5.1 Dataset Preparation

• The data set is fetched from the Kaggle repository

using the Kaggle API.

• The ModelNet10 dataset, which contains 3D ob-

ject ﬁles in .off format, is downloaded, ex-

tracted, and loaded into the working directory for

processing.

• The data set structure is printed to verify the pres-

ence of ﬁles and directories.


5.2 Point Cloud Preprocessing

5.2.1 File Loading

• If the input file is a mesh (e.g., .off), it is converted

to a point cloud using Poisson disk sampling

to generate a uniform set of points.

• If the input ﬁle is already a point cloud, it is di-

rectly loaded.

• A validation step ensures that the loaded data con-

tains valid points. If the point cloud is empty, an

error is raised.
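The loading and validation steps above can be illustrated with a small, dependency-free sketch. It is not the pipeline's own loader: it parses only plain OFF text without comments or per-face colors, and it swaps Poisson disk sampling for simpler area-weighted random sampling. The helper names `read_off` and `sample_points` are hypothetical.

```python
import io
import random


def read_off(stream):
    """Parse vertices and faces from an OFF file object (faces are fan-triangulated)."""
    tokens = stream.read().split()
    if not tokens or not tokens[0].startswith("OFF"):
        raise ValueError("not an OFF file")
    # Tolerate a count fused onto the header token, e.g. "OFF8 6 0".
    rest = tokens[0][3:]
    tokens = ([rest] if rest else []) + tokens[1:]
    n_vertices, n_faces = int(tokens[0]), int(tokens[1])
    pos = 3  # skip the vertex, face, and edge counts
    vertices = []
    for _ in range(n_vertices):
        vertices.append(tuple(float(t) for t in tokens[pos:pos + 3]))
        pos += 3
    triangles = []
    for _ in range(n_faces):
        size = int(tokens[pos])
        idx = [int(t) for t in tokens[pos + 1:pos + 1 + size]]
        pos += 1 + size
        for j in range(1, size - 1):
            triangles.append((idx[0], idx[j], idx[j + 1]))
    return vertices, triangles


def triangle_area(a, b, c):
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    return 0.5 * sum(x * x for x in cross) ** 0.5


def sample_points(vertices, triangles, count, rng):
    """Area-weighted uniform surface sampling (a stand-in, not Poisson disk sampling)."""
    areas = [triangle_area(*(vertices[i] for i in tri)) for tri in triangles]
    # Validation step: refuse to return an empty point cloud.
    if not triangles or sum(areas) == 0.0:
        raise ValueError("mesh has no surface to sample; point cloud would be empty")
    points = []
    for tri in rng.choices(triangles, weights=areas, k=count):
        a, b, c = (vertices[i] for i in tri)
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:  # reflect the sample back into the triangle
            r1, r2 = 1.0 - r1, 1.0 - r2
        points.append(tuple(a[i] + r1 * (b[i] - a[i]) + r2 * (c[i] - a[i]) for i in range(3)))
    return points


# A unit square in the z = 0 plane, written as a tiny in-memory OFF file.
square = io.StringIO("OFF\n4 1 0\n0 0 0\n1 0 0\n1 1 0\n0 1 0\n4 0 1 2 3\n")
vertices, triangles = read_off(square)
cloud = sample_points(vertices, triangles, 100, random.Random(0))
```

Area-weighted sampling does not enforce the minimum spacing between points that Poisson disk sampling provides, so it should be read only as the simplest way to obtain a roughly uniform point set from a mesh.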

5.2.2 Input Representation

• Each 3D object is represented as a point cloud,

where each point contains its (x, y, z) coordinates.

Additional features such as normals or colors may

also be included if available.

5.3 Deep Learning Framework for

Feature Extraction

A custom deep learning framework is implemented to

extract features from the point cloud:

• Convolutional Layers (conv2d): Extract spa-

tial features from the input point cloud with batch

normalization and ReLU activation.

• Max Pooling (max_pool2d): Aggregate local

features into a compact representation.

• Fully Connected Layers (fully_connected): Map features to

higher dimensions and combine them for global

understanding.

• Transform Network (get_transform): A

spatial transformer ensures alignment of the in-

put point cloud and feature space to a canonical

coordinate system.

• Dropout (dropout): Regularization is applied

to reduce overﬁtting during training.

5.4 Plane Segmentation Using RANSAC

5.4.1 Plane Detection

• The RANSAC (Random Sample Consensus) al-

gorithm is used iteratively to detect planes in the

point cloud.

• Key parameters, such as distance_threshold,

ransac_n, and num_iterations, are optimized

for precise plane detection.

Plane Detection Probability

The success probability of RANSAC is related to:

• The fraction of inliers (w) in the data.

• The number of iterations (N) required to ensure a

good model with high probability (p).

The probability p of at least one sample being free

of outliers is given by:

$$p = 1 - (1 - w^n)^N$$

Where:

• w: Fraction of inliers.

• n: Minimum number of points needed to fit the model; for a plane, n = 3.

• N: Number of iterations.

This formula is derived from binomial probability.
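Assuming the standard RANSAC relation $p = 1 - (1 - w^n)^N$, solving for N gives $N = \log(1 - p) / \log(1 - w^n)$, rounded up to the next integer. The short sketch below evaluates both directions. The function names and the example values (w = 0.5, p = 0.99) are illustrative choices, not settings reported for the experiments.

```python
import math


def success_probability(w, n, N):
    """Probability that at least one of N samples of size n contains only inliers."""
    return 1.0 - (1.0 - w ** n) ** N


def required_iterations(w, p, n=3):
    """Smallest N for which the success probability reaches p."""
    if not 0.0 < w <= 1.0:
        raise ValueError("inlier fraction w must lie in (0, 1]")
    if not 0.0 < p < 1.0:
        raise ValueError("target probability p must lie in (0, 1)")
    if w == 1.0:
        return 1  # every sample is outlier-free
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** n))


# With half of the points on the plane and a 99% target, 35 iterations suffice.
iterations = required_iterations(w=0.5, p=0.99, n=3)
```

The required count grows quickly as the inlier fraction drops, because $w^n$ shrinks with the third power of w for plane fitting.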

RANSAC is a robust statistical algorithm de-

signed to estimate the parameters of a mathematical

model from a dataset that may contain a signiﬁcant

proportion of outliers.

i. Random Sampling Randomly select a minimal

subset of the data points required to ﬁt the desired

model. For example, to estimate a plane with the

equation:

ax + by + cz + d = 0,

a minimum of three points is required.

ii. Model Fitting Fit the model to the sampled points

using standard techniques (e.g., solving linear

equations or optimization).

iii. Consensus Measurement Compute the residuals

for all data points to evaluate how well the model

fits. The residual for a point $(x_i, y_i, z_i)$ is calculated

as:

$$\frac{|a x_i + b y_i + c z_i + d|}{\sqrt{a^2 + b^2 + c^2}}$$

Count the number of points (inliers) whose resid-

uals are below a predeﬁned threshold ε (e.g., ε =

0.01).

iv. Statistical Concept RANSAC iteratively per-

forms the above steps (random sampling, model

ﬁtting, and consensus measurement). The quality

of a model is evaluated based on the number of

inliers it explains. After a predeﬁned number of

iterations, the model with the highest consensus

(most inliers) is selected as the best estimate.
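The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library routine used in the experiments: the function names are hypothetical, the plane normal is normalised so that the residual denominator equals 1, and the synthetic data (80 points on z = 0 plus 20 points well above it) is made up for the example.

```python
import random


def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with a unit normal, or None if the points are collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    normal = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    length = sum(comp * comp for comp in normal) ** 0.5
    if length < 1e-12:
        return None
    a, b, c = (comp / length for comp in normal)
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def residual(plane, point):
    """Point-to-plane distance; sqrt(a^2 + b^2 + c^2) is 1 for a unit normal."""
    a, b, c, d = plane
    return abs(a * point[0] + b * point[1] + c * point[2] + d)


def ransac_plane(points, threshold=0.01, iterations=100, rng=None):
    """Return the plane with the most inliers and the indices of those inliers."""
    rng = rng or random.Random()
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))  # i. sampling, ii. fitting
        if plane is None:
            continue
        inliers = [i for i, p in enumerate(points) if residual(plane, p) < threshold]  # iii.
        if len(inliers) > len(best_inliers):  # iv. keep the highest consensus
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


rng = random.Random(42)
plane_pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0) for _ in range(80)]
noise_pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.5, 1.5)) for _ in range(20)]
cloud = plane_pts + noise_pts

plane, inliers = ransac_plane(cloud, threshold=0.01, iterations=100, rng=rng)
inlier_set = set(inliers)
remaining = [p for i, p in enumerate(cloud) if i not in inlier_set]
```

The `remaining` list holds the points left over after the plane is removed, which is what a subsequent detection pass would operate on.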

5.4.2 Primitive Extraction

• Points classiﬁed as inliers for a detected plane are

grouped and visualized in a distinct color.

• Remaining points (outliers) are retained for fur-

ther segmentation.


5.5 Thresholding

The algorithm employs a distance threshold (ε) to sep-

arate inliers from outliers:

Points with distances less than ε are classiﬁed as

part of the plane.

Points with distances greater than ε are excluded (out-

liers).

This involves binary classiﬁcation of points based

on a ﬁxed threshold, a common statistical decision

boundary.

5.6 Residual Segmentation for Other

Primitives

• After removing inlier points of detected planes,

the residual point cloud is iteratively processed to

detect other geometric primitives such as cylin-

ders and spheres.

• Points not belonging to any primitive are classi-

ﬁed as outliers and visualized in gray.
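The iterative residual processing described above can be sketched as a loop that repeatedly detects one primitive, removes its inliers, and continues on what is left. The helper `peel_primitives` and the two toy detectors are hypothetical stand-ins chosen to keep the example short; a real pipeline would plug in RANSAC-style fitters for each primitive type.

```python
def peel_primitives(points, detectors, min_inliers=3):
    """Repeatedly remove the inliers of detected primitives from `points`.

    `detectors` is a list of (name, detect) pairs, where `detect` takes the
    remaining points and returns the indices of one detected primitive.
    Returns the found primitives and the points that belong to none of them.
    """
    remaining = list(points)
    found = []
    for name, detect in detectors:
        while True:
            inliers = set(detect(remaining))
            if len(inliers) < min_inliers:
                break  # nothing more of this type; move on to the next detector
            found.append((name, [remaining[i] for i in sorted(inliers)]))
            remaining = [p for i, p in enumerate(remaining) if i not in inliers]
    return found, remaining


# Toy detectors (assumptions for illustration only).
def on_ground(pts):
    return [i for i, p in enumerate(pts) if p[2] == 0.0]


def on_unit_sphere(pts):
    return [i for i, p in enumerate(pts) if abs(sum(c * c for c in p) ** 0.5 - 1.0) < 1e-9]


points = [
    (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0), (3.0, 1.0, 0.0),  # plane z = 0
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, 0.0, 0.8),                  # unit sphere
    (5.0, 5.0, 5.0),                                                     # outlier
]
found, outliers = peel_primitives(points, [("plane", on_ground), ("sphere", on_unit_sphere)])
```

Running planes first and other primitives afterwards mirrors the order given in the text; the final `outliers` list corresponds to the points that would be shown in gray.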

5.7 Visualization and Saving Results

5.7.1 Color-Coding

• Each detected geometric primitive is assigned a

random color for easy visualization.

5.7.2 Output Generation

• The decomposed point cloud, including all prim-

itives and outliers, is merged into a single point

cloud.

• The result is saved in .ply format for further use

or analysis.
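The color-coding and merging steps can be illustrated with a small writer for the ASCII variant of the PLY format. The text does not show how the pipeline writes its output, so the helper `write_colored_ply`, the specific gray value, and the example points below are assumptions made for this sketch.

```python
import io
import random

GRAY = (128, 128, 128)  # assumed color for points outside every primitive


def write_colored_ply(stream, primitives, outliers, rng):
    """Write all primitives and outliers as one ASCII PLY point cloud.

    `primitives` is a list of point lists; each list gets one random color.
    Returns the number of vertices written.
    """
    rows = []
    for points in primitives:
        color = tuple(rng.randrange(256) for _ in range(3))
        rows.extend((p, color) for p in points)
    rows.extend((p, GRAY) for p in outliers)

    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(rows)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    stream.write("\n".join(header) + "\n")
    for (x, y, z), (r, g, b) in rows:
        stream.write(f"{x} {y} {z} {r} {g} {b}\n")
    return len(rows)


buffer = io.StringIO()
count = write_colored_ply(
    buffer,
    primitives=[[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]],
    outliers=[(5.0, 5.0, 5.0)],
    rng=random.Random(0),
)
lines = buffer.getvalue().splitlines()
```

Writing to a file instead of the in-memory buffer only requires passing an object opened with `open(path, "w")`.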

5.8 Workﬂow

• An example file (chair_0904.off) is processed

through the framework.

• The point cloud is decomposed into geometric

primitives.

• The output is saved as trial4.ply.

6 RESULTS

The decomposition results indicate that the pipeline can reconstruct and simplify complex 3D objects into geometric primitives. The reconstructed primitives closely align with the original structures, preserving intricate details such as the ornate back design of the chair and the precise alignment of the table's legs. This suggests that PointNet captures and retains fine-grained features from raw point cloud data and represents detailed geometries effectively. Furthermore, the use of geometric primitives allows for efficient approximation of the objects by reducing their complexity while maintaining their essential structural attributes. Such simplified representations are advantageous in computationally intensive applications like CAD modeling, robotics, and virtual reality, where reduced complexity leads to improved processing efficiency and faster computations. Moreover, the algorithm handles both simple flat surfaces and complex, intricate designs, performing reliably across a wide range of geometries. This makes PointNet a useful tool for tasks requiring precise, scalable, and efficient 3D object decomposition.

Figure 2: Monitor

Figure 3: Sofa

The results show how well the plane identification algorithm works on various 3D objects. The algorithm uses Open3D's Random Sample Consensus (RANSAC) implementation. Accuracy was calculated using the formula below:

$$\text{Accuracy} = \min\left(\frac{\text{Detected Planes}}{\text{Expected Planes}},\ 1.0\right) \times 100 \qquad (1)$$

Figure 4: Monitor; Bed

Detected planes are those identified by the algorithm, while expected planes represent the known number of planes in the object. The elapsed time indicates the total time taken to process each object, including reading, detecting planes, and writing results.
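As a worked example of the accuracy metric, Accuracy = min(Detected Planes / Expected Planes, 1.0) × 100, the snippet below evaluates it for the detected and expected plane counts reported in Table 1. The function name `plane_accuracy` is a hypothetical helper introduced for illustration.

```python
def plane_accuracy(detected, expected):
    """Capped ratio of detected to expected planes, expressed in percent."""
    if expected <= 0:
        raise ValueError("expected plane count must be positive")
    return min(detected / expected, 1.0) * 100.0


# (detected, expected) plane counts as reported in Table 1.
counts = {
    "Sofa": (7, 5),
    "Bed": (11, 5),
    "Chair": (18, 5),
    "Table": (4, 5),
    "Monitor": (17, 5),
}
accuracy = {name: plane_accuracy(d, e) for name, (d, e) in counts.items()}
```

Because the ratio is capped at 1.0, detecting more planes than expected (for example 18 against 5 for the chair) still yields 100%, so the metric penalises only under-detection, as in the table's 80% for the Table object.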

Table 1: Comparison of Detected and Expected Planes for Various Objects

Object     Detected Planes   Expected Planes   Accuracy (%)   Elapsed Time (s)
Sofa       7                 5                 100.00         13.68
Bed        11                5                 100.00         9.88
Chair      18                5                 100.00         14.49
Table      4                 5                 80.00          32.16
Monitor    17                5                 100.00         27.41

7 CONCLUSION

• PointNet demonstrates high efﬁciency and effec-

tiveness in the decomposition of 3D objects into

geometric primitives. Its use of symmetric func-

tions like max-pooling enables efﬁcient aggre-

gation of global features from unordered point

clouds, capturing key geometric structures such

as planes, edges, and curves. This eliminates the

need for manual feature extraction and parame-

ter tuning, making PointNet superior to traditional

methods.

• The architecture supports end-to-end training with

standard back-propagation, optimizing perfor-

mance without manual intervention. It directly

processes raw point cloud data, avoiding compu-

tationally expensive conversions like voxelization

and retaining ﬁner details of the object. Com-

pared to voxel-based methods like VoxNet and

3D-CNNs, PointNet achieves better memory efficiency

and faster inference by operating directly

on unstructured data.

PointNet's ability to learn semantic representations of geometric primitives enhances its robustness in handling complex designs, while its lightweight architecture ensures efficient training and scalability for large datasets. In contrast to PointNet++, which improves local feature capture but increases computational cost, PointNet balances accuracy and efficiency, making it an effective tool for 3D object decomposition.


ACKNOWLEDGEMENTS

The authors thank Sir Virupakshi Hosallimath, Asst.

Prof. of Electrical and Electronics at KLE Technolog-

ical University for their guidance and valuable feed-

back throughout this research. Special thanks to the

other faculty members, Asst. Prof. Renuka Ganiger and Asst.

Prof. Pooja Chandargi for their assistance. Their sup-

port greatly enhanced the quality of this work.

REFERENCES

Ding, Z., Sun, Y., Xu, S., Pan, Y., Peng, Y., and Mao, Z.

(2023). Recent advances and perspectives in deep

learning techniques for 3d point cloud data process-

ing. Robotics, 12(4).

Fu, R., Wen, C., Li, Q., Xiao, X., and Alliez, P. (2023). BPNet:

Bézier primitive segmentation on 3d point clouds.

In IJCAI International Joint Conference on Artificial

Intelligence.


Huang, J. et al. (2018). Deepprimitive: Image decompo-

sition by layered primitive detection. Computational

Visual Media, 4(4).

Ioannidou, A., Chatzilari, E., Nikolopoulos, S., and Kom-

patsiaris, I. (2017). Deep learning advances in com-

puter vision with 3d data: A survey. ACM Computing

Surveys, 50(2).

Liu, H. and Tian, S. (2024). Deep 3d point cloud classi-

ﬁcation and segmentation network based on gatenet.

Visual Computer, 40(2).

Liu, Z., Han, Z., and Bu, S. (2019). Deep learning for 3d

data processing. In Deep Learning in Object Detec-

tion and Recognition.
